pragmatic_segmenter 0.3.16 → 0.3.17
- checksums.yaml +4 -4
- data/NEWS +4 -0
- data/README.md +108 -105
- data/lib/pragmatic_segmenter/cleaner/rules.rb +2 -2
- data/lib/pragmatic_segmenter/version.rb +1 -1
- data/spec/pragmatic_segmenter/languages/english_spec.rb +7 -0
- metadata +3 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: baf0f9cb38a40398530e15df0d28ab8a654c49c6
+  data.tar.gz: 4a184ffa70092a3f99fe7d194a71e7a935b1d3b2
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 089a40744464256d33ce30b4c6d40a6f00bcd5dd0178757a3d65b3f5ff18242751e5074902bd0aab0460e97e5db0040a30f206648e98c14ec861abf0ba92680c
+  data.tar.gz: e49bdbe345e2d2e394d2f4ef03eac2ca0488f3dac158dfd5c193a328368326963785bf643a565d599d0e4d49196e3480d7f8054ec1df4927c192958bb1e1de47
data/NEWS
CHANGED
data/README.md
CHANGED
@@ -1,27 +1,27 @@
-# Pragmatic Segmenter
+# Pragmatic Segmenter
 
 [![Gem Version](https://badge.fury.io/rb/pragmatic_segmenter.svg)](http://badge.fury.io/rb/pragmatic_segmenter) [![Code Climate](https://codeclimate.com/github/diasks2/pragmatic_segmenter/badges/gpa.svg)](https://codeclimate.com/github/diasks2/pragmatic_segmenter) [![Build Status](https://travis-ci.org/diasks2/pragmatic_segmenter.png)](https://travis-ci.org/diasks2/pragmatic_segmenter) [![Test Coverage](https://codeclimate.com/github/diasks2/pragmatic_segmenter/badges/coverage.svg)](https://codeclimate.com/github/diasks2/pragmatic_segmenter) [![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/diasks2/pragmatic_segmenter/blob/master/LICENSE.txt)
 
-Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
+Pragmatic Segmenter is a rule-based sentence boundary detection gem that works out-of-the-box across many languages.
 
-## Install
+## Install
 
-**Ruby**
-*Supports Ruby 2.1.5 and above*
+**Ruby**
+*Supports Ruby 2.1.5 and above*
 ```
 gem install pragmatic_segmenter
 ```
 
-**Ruby on Rails**
-Add this line to your application’s Gemfile:
-```ruby
+**Ruby on Rails**
+Add this line to your application’s Gemfile:
+```ruby
 gem 'pragmatic_segmenter'
 ```
 
-## Usage
+## Usage
 
-* If no language is specified, the library will default to English.
-* To specify a language use its two character [ISO 639-1 code](https://www.tm-town.com/languages).
+* If no language is specified, the library will default to English.
+* To specify a language use its two character [ISO 639-1 code](https://www.tm-town.com/languages).
 
 ```ruby
 text = "Hello world. My name is Mr. Smith. I work for the U.S. Government and I live in the U.S. I live in New York."
@@ -64,11 +64,11 @@ According to Wikipedia, [sentence boundary disambiguation](http://en.wikipedia.o
 
 > Sentence boundary disambiguation (SBD), also known as sentence breaking, is the problem in natural language processing of deciding where sentences begin and end. Often natural language processing tools require their input to be divided into sentences for a number of reasons. However sentence boundary identification is challenging because punctuation marks are often ambiguous. For example, a period may denote an abbreviation, decimal point, an ellipsis, or an email address – not the end of a sentence. About 47% of the periods in the Wall Street Journal corpus denote abbreviations. As well, question marks and exclamation marks may appear in embedded quotations, emoticons, computer code, and slang. Languages like Japanese and Chinese have unambiguous sentence-ending markers.
 
-The goal of **Pragmatic Segmenter** is to provide a "real-world" segmenter that works out of the box across many languages and does a reasonable job when the format and domain of the input text are unknown. Pragmatic Segmenter does not use any machine-learning techniques and thus does not require training data.
+The goal of **Pragmatic Segmenter** is to provide a "real-world" segmenter that works out of the box across many languages and does a reasonable job when the format and domain of the input text are unknown. Pragmatic Segmenter does not use any machine-learning techniques and thus does not require training data.
 
-Pragmatic Segmenter aims to improve on other segmentation engines in 2 main areas:
-1) Language support (most segmentation tools only focus on English)
-2) Text cleaning and preprocessing
+Pragmatic Segmenter aims to improve on other segmentation engines in 2 main areas:
+1) Language support (most segmentation tools only focus on English)
+2) Text cleaning and preprocessing
 
 Pragmatic Segmenter is opinionated and made for the explicit purpose of segmenting texts to create translation memories. Therefore, things such as parenthesis within a sentence are kept as one segment, even if technically there are two or more sentences within the segment in order to maintain coherence. The algorithm is also conservative in that if it comes across an ambiguous sentence boundary it will ignore it rather than splitting.
 
@@ -85,11 +85,11 @@ Pragmatic Segmenter is specifically used for the purpose of segmenting texts for
 
 ## The Golden Rules
 
-*The Golden Rules* are a set of tests I developed that can be run through a segmenter to check its accuracy in regards to edge case scenarios. Most of the papers cited below in *Segmentation Papers and Books* either use the WSJ corpus or Brown corpus from the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) to test their segmentation algorithm. In my opinion there are 2 limits to using these corpora:
-1) The corpora may be too expensive for some people ($1,700).
-2) The majority of the sentences in the corpora are sentences that end with a regular word followed by a period, thus testing the same thing over and over again.
+*The Golden Rules* are a set of tests I developed that can be run through a segmenter to check its accuracy in regards to edge case scenarios. Most of the papers cited below in *Segmentation Papers and Books* either use the WSJ corpus or Brown corpus from the [Penn Treebank](https://catalog.ldc.upenn.edu/LDC99T42) to test their segmentation algorithm. In my opinion there are 2 limits to using these corpora:
+1) The corpora may be too expensive for some people ($1,700).
+2) The majority of the sentences in the corpora are sentences that end with a regular word followed by a period, thus testing the same thing over and over again.
 
-> In the Brown Corpus 92% of potential sentence boundaries come after a regular word. The WSJ Corpus is richer with abbreviations and only 83% [53% according to Gale and Church, 1991] of sentences end with a regular word followed by a period.
+> In the Brown Corpus 92% of potential sentence boundaries come after a regular word. The WSJ Corpus is richer with abbreviations and only 83% [53% according to Gale and Church, 1991] of sentences end with a regular word followed by a period.
 
 Andrei Mikheev - *Periods, Capitalized Words, etc.*
 
@@ -664,7 +664,7 @@ Pragmatic Segmenter | Ruby
 [SRX English](https://github.com/apohllo/srx-english) | Ruby | [GNU GPLv3](http://www.gnu.org/copyleft/gpl.html) | 30.77% | 28.57% | 6.19 s
 [Scapel](https://github.com/louismullie/scalpel) | Ruby | [GNU GPLv3](http://www.gnu.org/copyleft/gpl.html) | 28.85% | 20.00% | 0.13 s
 
-†GRS (Other Languages) is the total of the Golden Rules listed above for all languages other than English. This metric by no means includes all languages, only the ones that have Golden Rules listed above.
+†GRS (Other Languages) is the total of the Golden Rules listed above for all languages other than English. This metric by no means includes all languages, only the ones that have Golden Rules listed above.
 ‡ Speed is based on the performance benchmark results detailed in the section "Speed Performance Benchmarks" below. The number is an average of 10 runs.
 
 Other tools not yet tested:
@@ -729,140 +729,143 @@ To test the relative performance of different segmentation tools and libraries I
 * Add additional language support
 * Add abbreviation lists for any languages that do not currently have one (only relevant for languages that have the concept of abbreviations with periods)
 * Get Golden Rule #18 passing - Handling of a.m. or p.m. followed by a capitalized non sentence starter (ex. "At 5 p.m. Mr. Smith went to the bank. He left the bank at 6 p.m. Next he went to the store." --> ["At 5 p.m. Mr. Smith went to the bank.", "He left the bank at 6 p.m.", "Next he went to the store."])
-* Support for Thai. This is a very challenging problem due to the absence of explicit sentence markers (i.e. like a period in English) and the ambiguity in Thai regarding what constitutes a sentence even among native speakers. For more information see the following research papers ([#1](http://www.cs.cmu.edu/~paisarn/papers/iccpol2001.pdf) | [#2](http://pioneer.chula.ac.th/~awirote/ling/snlp2007-wirote.pdf)).
+* Support for Thai. This is a very challenging problem due to the absence of explicit sentence markers (i.e. like a period in English) and the ambiguity in Thai regarding what constitutes a sentence even among native speakers. For more information see the following research papers ([#1](http://www.cs.cmu.edu/~paisarn/papers/iccpol2001.pdf) | [#2](http://pioneer.chula.ac.th/~awirote/ling/snlp2007-wirote.pdf)).
 
 ## Change Log
 
-**Version 0.0.1**
-* Initial Release
+**Version 0.0.1**
+* Initial Release
 
-**Version 0.0.2**
-* Major design refactor
+**Version 0.0.2**
+* Major design refactor
 
 **Version 0.0.3**
-* Add travis.yml
-* Add Code Climate
-* Update README
+* Add travis.yml
+* Add Code Climate
+* Update README
 
-**Version 0.0.4**
-* Add `ConsecutiveForwardSlashRule` to cleaner
-* Refactor `segmenter.rb` and `process.rb`
+**Version 0.0.4**
+* Add `ConsecutiveForwardSlashRule` to cleaner
+* Refactor `segmenter.rb` and `process.rb`
 
-**Version 0.0.5**
-* Make symbol substitution safer
-* Refactor `process.rb`
-* Update cleaner with escaped newline rules
+**Version 0.0.5**
+* Make symbol substitution safer
+* Refactor `process.rb`
+* Update cleaner with escaped newline rules
 
-**Version 0.0.6**
-* Add rule for escaped newlines that include a space between the slash and character
-* Add Golden Rule #52 and code to make it pass
+**Version 0.0.6**
+* Add rule for escaped newlines that include a space between the slash and character
+* Add Golden Rule #52 and code to make it pass
 
-**Version 0.0.7**
-* Add change log to README
-* Add passing spec for new end of sentence abbreviation (EN)
-* Add roman numeral list support
+**Version 0.0.7**
+* Add change log to README
+* Add passing spec for new end of sentence abbreviation (EN)
+* Add roman numeral list support
 
-**Version 0.0.8**
-* Fix error in `list.rb`
+**Version 0.0.8**
+* Fix error in `list.rb`
 
-**Version 0.0.9**
-* Improve handling of alphabetical and roman numeral lists
+**Version 0.0.9**
+* Improve handling of alphabetical and roman numeral lists
 
-**Version 0.1.0**
-* Add Kommanditgesellschaft Rule
+**Version 0.1.0**
+* Add Kommanditgesellschaft Rule
 
-**Version 0.1.1**
-* Fix handling of German dates
+**Version 0.1.1**
+* Fix handling of German dates
 
-**Version 0.1.2**
-* Fix missing abbreviations
-* Add footnote rule to `cleaner.rb`
+**Version 0.1.2**
+* Fix missing abbreviations
+* Add footnote rule to `cleaner.rb`
 
-**Version 0.1.3**
-* Improve punctuation in bracket replacement
+**Version 0.1.3**
+* Improve punctuation in bracket replacement
 
-**Version 0.1.4**
-* Fix missing abbreviations
+**Version 0.1.4**
+* Fix missing abbreviations
 
-**Version 0.1.5**
-* Fix comma at end of quotation bug
+**Version 0.1.5**
+* Fix comma at end of quotation bug
 
-**Version 0.1.6**
-* Fix bug in numbered list finder (ignore longer digits)
+**Version 0.1.6**
+* Fix bug in numbered list finder (ignore longer digits)
 
-**Version 0.1.7**
-* Add Alice in Wonderland specs
-* Fix parenthesis between double quotations bug
-* Fix split after quotation ending in dash bug
+**Version 0.1.7**
+* Add Alice in Wonderland specs
+* Fix parenthesis between double quotations bug
+* Fix split after quotation ending in dash bug
 
-**Version 0.1.8**
-* Fix bug in splitting new sentence after single quotes
+**Version 0.1.8**
+* Fix bug in splitting new sentence after single quotes
 
-**Version 0.2.0**
-* Add Dutch Golden Rules and abbreviations
-* Update README with additional tools
-* Update segmentation test scores in README with results of new Golden Rule tests
-* Add Polish abbreviations
+**Version 0.2.0**
+* Add Dutch Golden Rules and abbreviations
+* Update README with additional tools
+* Update segmentation test scores in README with results of new Golden Rule tests
+* Add Polish abbreviations
 
-**Version 0.3.0**
-* Add support for square brackets
-* Add support for continuous exclamation points or questions marks or combinations of both
-* Fix Roman numeral support
-* Add English abbreviations
+**Version 0.3.0**
+* Add support for square brackets
+* Add support for continuous exclamation points or questions marks or combinations of both
+* Fix Roman numeral support
+* Add English abbreviations
 
-**Version 0.3.1**
-* Fix undefined method 'gsub!' for nil:NilClass issue
+**Version 0.3.1**
+* Fix undefined method 'gsub!' for nil:NilClass issue
 
-**Version 0.3.2**
-* Add English abbreviations
+**Version 0.3.2**
+* Add English abbreviations
 
-**Version 0.3.3**
-* Fix cleaner bug
+**Version 0.3.3**
+* Fix cleaner bug
 
-**Version 0.3.4**
-* Large refactor
+**Version 0.3.4**
+* Large refactor
 
-**Version 0.3.5**
-* Reduce GC by replacing `#gsub` with `#gsub!` where possible
+**Version 0.3.5**
+* Reduce GC by replacing `#gsub` with `#gsub!` where possible
 
-**Version 0.3.6**
-* Refactor SENTENCE_STARTERS to each individual language and add SENTENCE_STARTERS for German
+**Version 0.3.6**
+* Refactor SENTENCE_STARTERS to each individual language and add SENTENCE_STARTERS for German
 
-**Version 0.3.7**
-* Add `unicode` gem and use it for downcasing to better handle cyrillic languages
+**Version 0.3.7**
+* Add `unicode` gem and use it for downcasing to better handle cyrillic languages
 
-**Version 0.3.8**
-* Fix bug that cleaned away single letter segments
+**Version 0.3.8**
+* Fix bug that cleaned away single letter segments
 
-**Version 0.3.9**
-* Remove `guard-rspec` development dependency
+**Version 0.3.9**
+* Remove `guard-rspec` development dependency
 
-**Version 0.3.10**
-* Change load order of dependencies to fix bug
+**Version 0.3.10**
+* Change load order of dependencies to fix bug
 
-**Version 0.3.11**
+**Version 0.3.11**
 * Update German abbreviation list
-* Refactor 'remove_newline_in_middle_of_sentence' method
+* Refactor 'remove_newline_in_middle_of_sentence' method
 
-**Version 0.3.12**
+**Version 0.3.12**
 * Fix issue involving words with leading apostrophes
 
-**Version 0.3.13**
+**Version 0.3.13**
 * Fix issue involving unexpected sentence break between abbreviation and hyphen
 
-**Version 0.3.14**
+**Version 0.3.14**
 * Add English abbreviation Rs. to denote the Indian currency
 
-**Version 0.3.15**
+**Version 0.3.15**
 * Handle em dashes that appear in the middle of a sentence and include a sentence ending punctuation mark
 
-**Version 0.3.16**
+**Version 0.3.16**
 * Add support and tests for Danish
 
+**Version 0.3.17**
+* Fix issue involving the HTML regex in the cleaner
+
 ## Contributing
 
 If you find a text that is incorrectly segmented using this gem, please submit an issue.
-
+
 1. Fork it ( https://github.com/diasks2/pragmatic_segmenter/fork )
 2. Create your feature branch (`git checkout -b my-new-feature`)
 3. Commit your changes (`git commit -am 'Add some feature'`)
data/lib/pragmatic_segmenter/cleaner/rules.rb
CHANGED
@@ -64,8 +64,8 @@ module PragmaticSegmenter
 
 
 module HTML
-  # Rubular: http://rubular.com/r/
-  HTMLTagRule = Rule.new(
+  # Rubular: http://rubular.com/r/9d0OVOEJWj
+  HTMLTagRule = Rule.new(/<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[\^'">\s]+))?)+\s*|\s*)\/?>/, '')
 
   # Rubular: http://rubular.com/r/XZVqMPJhea
   EscapedHTMLTagRule = Rule.new(/&lt;\/?[^gt;]*gt;/, '')
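Note on the HTMLTagRule change: the sketch below applies the pattern from the added line to plain strings, outside the gem, to show why text containing bare `<` and `>` comparison signs (as in the new spec #117) should pass through the cleaner untouched. The constant names, the `strip_tags` helper, and the sample strings are illustrative assumptions. `NAIVE_TAG` is a hypothetical over-greedy pattern included only for contrast; the diff truncates the removed line, so it is not claimed to be the previous rule.

```ruby
# Pattern from the added HTMLTagRule line in the hunk above.
HTML_TAG = /<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[\^'">\s]+))?)+\s*|\s*)\/?>/

# Hypothetical naive pattern, for contrast only (an assumption, not taken from the diff).
NAIVE_TAG = /<\/?[^>]*>/

# Illustrative helper; the gem itself applies its rules through Rule objects.
def strip_tags(text, pattern)
  text.gsub(pattern, '')
end

# Shortened sample in the style of spec #117, plus a small piece of real markup.
CLINICAL = "values <11 g/ dL) were observed. Bosentan is highly bound (>98%) to plasma proteins."
MARKUP = %q(<p class="note">Hello <b>world</b>.</p>)

puts strip_tags(MARKUP, HTML_TAG)     # genuine tags are removed
puts strip_tags(CLINICAL, HTML_TAG)   # comparison signs are left alone
puts strip_tags(CLINICAL, NAIVE_TAG)  # everything from "<" to the next ">" is lost
```

The stricter pattern requires a tag name followed only by attribute-like tokens before the closing `>`, so a run such as `<11 g/ dL) ... (>` does not qualify as a tag.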
data/spec/pragmatic_segmenter/languages/english_spec.rb
CHANGED
@@ -1399,5 +1399,12 @@ RSpec.describe PragmaticSegmenter::Languages::English, "(en)" do
       ps = PragmaticSegmenter::Segmenter.new(text: "What do you see? - Posted like silent sentinels all around the town, stand thousands upon thousands of mortal men fixed in ocean reveries.", clean: false)
       expect(ps.segment).to eq(["What do you see?", "- Posted like silent sentinels all around the town, stand thousands upon thousands of mortal men fixed in ocean reveries."])
     end
+
+    it 'correctly segments text #117' do
+      text = "In placebo-controlled studies of all uses of Tracleer, marked decreases in hemoglobin (>15% decrease from baseline resulting in values <11 g/ dL) were observed in 6% of Tracleer-treated patients and 3% of placebo-treated patients. Bosentan is highly bound (>98%) to plasma proteins, mainly albumin."
+      ps = PragmaticSegmenter::Segmenter.new(text: text)
+      expect(ps.segment).to eq(["In placebo-controlled studies of all uses of Tracleer, marked decreases in hemoglobin (>15% decrease from baseline resulting in values <11 g/ dL) were observed in 6% of Tracleer-treated patients and 3% of placebo-treated patients.", "Bosentan is highly bound (>98%) to plasma proteins, mainly albumin."])
+    end
+
   end
 end
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: pragmatic_segmenter
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.17
 platform: ruby
 authors:
 - Kevin S. Dias
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2017-
+date: 2017-12-07 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: unicode
@@ -180,7 +180,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.6.
+rubygems_version: 2.6.14
 signing_key:
 specification_version: 4
 summary: A rule-based sentence boundary detection gem that works out-of-the-box across