sastrawi 0.1.0.pre

Files changed (89)
  1. checksums.yaml +7 -0
  2. data/.gitignore +50 -0
  3. data/.travis.yml +8 -0
  4. data/Gemfile +4 -0
  5. data/LICENSE.txt +21 -0
  6. data/README.md +70 -0
  7. data/Rakefile +6 -0
  8. data/data/kata-dasar.txt +29932 -0
  9. data/lib/sastrawi/dictionary/array_dictionary.rb +33 -0
  10. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule10.rb +17 -0
  11. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule11.rb +17 -0
  12. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule12.rb +17 -0
  13. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule13a.rb +17 -0
  14. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule13b.rb +17 -0
  15. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule14.rb +17 -0
  16. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule15a.rb +17 -0
  17. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule15b.rb +17 -0
  18. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule16.rb +17 -0
  19. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule17a.rb +17 -0
  20. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule17b.rb +17 -0
  21. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule17c.rb +17 -0
  22. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule17d.rb +17 -0
  23. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule18a.rb +17 -0
  24. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule18b.rb +17 -0
  25. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule19.rb +17 -0
  26. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule1a.rb +17 -0
  27. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule1b.rb +17 -0
  28. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule2.rb +19 -0
  29. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule20.rb +17 -0
  30. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule21a.rb +17 -0
  31. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule21b.rb +17 -0
  32. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule23.rb +19 -0
  33. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule24.rb +19 -0
  34. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule25.rb +17 -0
  35. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule26a.rb +17 -0
  36. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule26b.rb +17 -0
  37. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule27.rb +17 -0
  38. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule28a.rb +17 -0
  39. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule28b.rb +17 -0
  40. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule29.rb +17 -0
  41. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule3.rb +19 -0
  42. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule30a.rb +17 -0
  43. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule30b.rb +17 -0
  44. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule30c.rb +17 -0
  45. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule31a.rb +17 -0
  46. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule31b.rb +17 -0
  47. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule32.rb +19 -0
  48. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule34.rb +19 -0
  49. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule35.rb +17 -0
  50. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule36.rb +17 -0
  51. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule37a.rb +17 -0
  52. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule37b.rb +17 -0
  53. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule38a.rb +17 -0
  54. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule38b.rb +17 -0
  55. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule39a.rb +17 -0
  56. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule39b.rb +17 -0
  57. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule4.rb +11 -0
  58. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule40a.rb +17 -0
  59. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule40b.rb +17 -0
  60. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule41.rb +17 -0
  61. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule42.rb +17 -0
  62. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule5.rb +17 -0
  63. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule6a.rb +17 -0
  64. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule6b.rb +17 -0
  65. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule7.rb +19 -0
  66. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule8.rb +19 -0
  67. data/lib/sastrawi/morphology/disambiguator/disambiguator_prefix_rule9.rb +19 -0
  68. data/lib/sastrawi/morphology/invalid_affix_pair_specification.rb +24 -0
  69. data/lib/sastrawi/stemmer/cache/array_cache.rb +25 -0
  70. data/lib/sastrawi/stemmer/cached_stemmer.rb +33 -0
  71. data/lib/sastrawi/stemmer/confix_stripping/precedence_adjustment_specification.rb +20 -0
  72. data/lib/sastrawi/stemmer/context/context.rb +170 -0
  73. data/lib/sastrawi/stemmer/context/removal.rb +17 -0
  74. data/lib/sastrawi/stemmer/context/visitor/dont_stem_short_word.rb +17 -0
  75. data/lib/sastrawi/stemmer/context/visitor/prefix_disambiguator.rb +46 -0
  76. data/lib/sastrawi/stemmer/context/visitor/remove_derivational_suffix.rb +28 -0
  77. data/lib/sastrawi/stemmer/context/visitor/remove_inflectional_particle.rb +26 -0
  78. data/lib/sastrawi/stemmer/context/visitor/remove_inflectional_possessive_pronoun.rb +26 -0
  79. data/lib/sastrawi/stemmer/context/visitor/remove_plain_prefix.rb +26 -0
  80. data/lib/sastrawi/stemmer/context/visitor/visitor_provider.rb +157 -0
  81. data/lib/sastrawi/stemmer/filter/text_normalizer.rb +15 -0
  82. data/lib/sastrawi/stemmer/stemmer.rb +85 -0
  83. data/lib/sastrawi/stemmer/stemmer_factory.rb +45 -0
  84. data/lib/sastrawi/stop_word_remover/stop_word_remover.rb +24 -0
  85. data/lib/sastrawi/stop_word_remover/stop_word_remover_factory.rb +152 -0
  86. data/lib/sastrawi/version.rb +3 -0
  87. data/lib/sastrawi.rb +12 -0
  88. data/sastrawi.gemspec +25 -0
  89. metadata +173 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: 85c8c97313e9ebf76008045f30bec9a9eaca39dd
+   data.tar.gz: f4353fd69f5a722fd8003c37e8c0a61c43ce8c34
+ SHA512:
+   metadata.gz: 86ac9f0de919863b86bc7e41b2428f958c43d921453fc34336282996352005bde335c5790ac509a894251b79fa9c79685a638c1ca97469921493fc54f91cad7a
+   data.tar.gz: 826cef4036c182fd855b399c0db1f7a417b43950f8ac9e710a4e8bb335880369f589196bc947162ccb664e427b6ea7e79d014a6ee9cbf0053084665c6df82fa9
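A RubyGems `.gem` file is a plain tar archive whose members include `metadata.gz`, `data.tar.gz` and a gzipped `checksums.yaml`, so the digests above can be checked against the unpacked members. The Ruby sketch below assumes the archive has already been unpacked and the YAML parsed into a Hash; `verify_checksums` and `DIGESTS` are hypothetical helper names, not part of RubyGems or of this gem.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Algorithm names used as top-level keys in checksums.yaml, mapped to Ruby's
# digest classes (hypothetical helper constant).
DIGESTS = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# checksums: Hash shaped like the parsed checksums.yaml, e.g.
#   { 'SHA512' => { 'data.tar.gz' => '826cef...' } }
# dir: directory containing the members unpacked from the .gem archive.
# Returns a Hash such as { 'SHA512 data.tar.gz' => true }.
def verify_checksums(checksums, dir)
  checksums.each_with_object({}) do |(algo, members), results|
    digest_class = DIGESTS.fetch(algo) # raises KeyError for unknown algorithms
    members.each do |member, expected|
      # Hash the file on disk and compare with the recorded hex digest.
      actual = digest_class.file(File.join(dir, member)).hexdigest
      results["#{algo} #{member}"] = (actual == expected.to_s.downcase)
    end
  end
end
```

To obtain the inputs, one could run `tar -xf sastrawi-0.1.0.pre.gem` and parse the digests with `YAML.safe_load(Zlib::GzipReader.open('checksums.yaml.gz', &:read))` (after `require 'yaml'` and `require 'zlib'`); for an unmodified package every entry should then be `true`.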
data/.gitignore ADDED
@@ -0,0 +1,50 @@
+ *.gem
+ *.rbc
+ /.config
+ /coverage/
+ /InstalledFiles
+ /pkg/
+ /spec/reports/
+ /spec/examples.txt
+ /test/tmp/
+ /test/version_tmp/
+ /tmp/
+
+ # Used by dotenv library to load environment variables.
+ # .env
+
+ ## Specific to RubyMotion:
+ .dat*
+ .repl_history
+ build/
+ *.bridgesupport
+ build-iPhoneOS/
+ build-iPhoneSimulator/
+
+ ## Specific to RubyMotion (use of CocoaPods):
+ #
+ # We recommend against adding the Pods directory to your .gitignore. However
+ # you should judge for yourself, the pros and cons are mentioned at:
+ # https://guides.cocoapods.org/using/using-cocoapods.html#should-i-check-the-pods-directory-into-source-control
+ #
+ # vendor/Pods/
+
+ ## Documentation cache and generated files:
+ /.yardoc/
+ /_yardoc/
+ /doc/
+ /rdoc/
+
+ ## Environment normalization:
+ /.bundle/
+ /vendor/bundle
+ /lib/bundler/man/
+
+ # for a library or gem, you might want to ignore these files since the code is
+ # intended to run in multiple environments; otherwise, check them in:
+ # Gemfile.lock
+ # .ruby-version
+ # .ruby-gemset
+
+ # unless supporting rvm < 1.11.0 or doing something fancy, ignore this:
+ .rvmrc
data/.travis.yml ADDED
@@ -0,0 +1,8 @@
+ sudo: false
+ language: ruby
+ rvm:
+ - 1.9.3
+ - 2.2.6
+ - 2.3.1
+ - 2.4.0
+ before_install: gem install bundler -v 1.12.5
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in sastrawi.gemspec
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2016-2017 Andrias Meisyal
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,70 @@
+ # Sastrawi Bindings for Ruby [![Build Status](https://travis-ci.org/meisyal/sastrawi-ruby.svg?branch=master)](https://travis-ci.org/meisyal/sastrawi-ruby)
+
+ sastrawi-ruby provides Ruby bindings for [Sastrawi][sastrawi], a library that
+ stems words in Bahasa Indonesia. The original implementation of Sastrawi is
+ written in PHP; this library is written in Ruby.
+
+ According to [Wikipedia][stemmingwiki], stemming is the process of reducing
+ inflected (or sometimes derived) words to their word stem, base or root form.
+ For instance, the base form of "menahan" is "tahan".
+
+ ## Documentation
+
+ Documentation for this library is not available yet, but you can check the
+ [sastrawi-ruby GitHub Wiki][documentation], which contains a TODO list.
+
+ ## Installation
+
+ There are two ways to install this library. If you manage dependencies with
+ Bundler, add this line to your application's Gemfile:
+
+ gem 'sastrawi'
+
+ and then execute:
+
+ bundle install
+
+ Or install it directly:
+
+ gem install sastrawi
+
+ This library requires Ruby 1.9.3 or later. Using a stable Ruby release is
+ recommended.
+
+ ## Usage
+
+ Currently, this library stems words using only the base forms provided with
+ it; you cannot add or remove base forms yet. This feature is planned for the
+ next release.
+
+ ```ruby
+ require 'sastrawi'
+
+ # prepare a sentence (or words) to be stemmed and call the stem API
+ sentence = 'Perekonomian Indonesia sedang dalam pertumbuhan yang membanggakan.'
+ stemming_result = Sastrawi.stem(sentence)
+
+ # the stemming result should be:
+ # "ekonomi indonesia sedang dalam tumbuh yang bangga"
+ puts stemming_result
+ ```
+
+ ## Contributing
+
+ Contributions are welcome. If you find a bug, please report it to the issue
+ tracker. For pull requests, use the `dev` branch as the target of your feature
+ branch. Both issues and pull requests should be written in English.
+
+ ## License
+
+ This library is released under the terms of the MIT License. See the
+ [LICENSE][license] file for more details. sastrawi-ruby contains base forms of
+ words from [Kateglo][kateglo], which are licensed under a [Creative Commons
+ Attribution-NonCommercial-ShareAlike 3.0 Unported License][kateglolicense].
+
+ [sastrawi]: https://github.com/sastrawi/sastrawi
+ [stemmingwiki]: https://en.wikipedia.org/wiki/Stemming
+ [documentation]: https://github.com/meisyal/sastrawi-ruby/wiki
+ [license]: https://github.com/meisyal/sastrawi-ruby/blob/master/LICENSE.txt
+ [kateglo]: http://kateglo.com
+ [kateglolicense]: https://creativecommons.org/licenses/by-nc-sa/3.0/
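The README's examples ("menahan" to "tahan", "membanggakan" to "bangga") come from dictionary-based stemming, where candidate stems produced by removing affixes are checked against a list of base words (`data/kata-dasar.txt` in this gem). The sketch below is a deliberately tiny, hypothetical illustration of that idea for the Indonesian meN- prefix and the -kan/-an/-i suffixes. Its seven-word dictionary, its simplified rules and its names (`BASE_WORDS`, `men_prefix_candidates`, `suffix_variants`, `toy_stem`) are assumptions for illustration only; they are not this gem's API and not Sastrawi's rule set, which is implemented by the disambiguator and visitor classes in the file list.

```ruby
require 'set'

# Toy illustration only (hypothetical names and rules, NOT Sastrawi's algorithm).
# Hypothetical mini dictionary standing in for data/kata-dasar.txt.
BASE_WORDS = Set.new(%w[tahan bangga pukul kirim sapu lihat dengar]).freeze

# Suffixes tried in this order ("kan" before the shorter "an").
SUFFIXES = %w[kan an i].freeze

# Candidate remainders after removing meN-. The nasal can replace the stem's
# first consonant, so some rules restore it (men + ahan -> tahan).
def men_prefix_candidates(word)
  case word
  when /\Ameny([aiueo].*)\z/ then ["s#{$1}"]           # menyapu -> sapu
  when /\Ameng([aiueo].*)\z/ then ["k#{$1}", $1]       # mengirim -> kirim
  when /\Ameng([gh].*)\z/    then [$1]                 # menggali -> gali
  when /\Amen([aiueo].*)\z/  then ["t#{$1}"]           # menahan -> tahan
  when /\Amen([cdjz].*)\z/   then [$1]                 # mendengar -> dengar
  when /\Amem([aiueo].*)\z/  then ["p#{$1}", "m#{$1}"] # memukul -> pukul
  when /\Amem([bfv].*)\z/    then [$1]                 # membanggakan -> banggakan
  when /\Ame([lrwy].*)\z/    then [$1]                 # melihat -> lihat
  else []
  end
end

# The word itself plus the word with each matching suffix removed.
def suffix_variants(word)
  variants = [word]
  SUFFIXES.each do |suffix|
    next unless word.end_with?(suffix) && word.length > suffix.length
    variants << word[0...-suffix.length]
  end
  variants
end

# Return the first candidate found in the dictionary, or the word unchanged.
def toy_stem(word)
  word = word.downcase
  return word if BASE_WORDS.include?(word)
  ([word] + men_prefix_candidates(word)).each do |candidate|
    suffix_variants(candidate).each do |stem|
      return stem if BASE_WORDS.include?(stem)
    end
  end
  word
end
```

With this toy, `toy_stem('menahan')` returns `"tahan"` and `toy_stem('membanggakan')` returns `"bangga"`, mirroring the README; words it cannot resolve, such as `"rumah"`, are returned unchanged. The dictionary check is what keeps a rule from producing a non-word; the real library handles many more prefixes, ambiguous cases and rule ordering.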
data/Rakefile ADDED
@@ -0,0 +1,6 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec