ruby-spellchecker 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 3d7994756cdab030a88d162801d9c0bae3981a1cf0c80041769dda663c42dbdb
4
- data.tar.gz: b397558b59a1cd70229916a1876c55c91c2ca703c7f8dfbda5cc81eb6f107848
3
+ metadata.gz: 19fe4bc1957bb2abc21b2cdba7ccc55ab33cefc7c064055d35a338a01fbd910d
4
+ data.tar.gz: 74f000046be2ba09622d6bf725058a4b018e7fdd81e340b9a61593709312c626
5
5
  SHA512:
6
- metadata.gz: '09e6946ae6358ea3dec8e8681b23e83370a301434e32984998e36732fc8bf78f1d2d84709fb019b1d1eb28d9dbf400d943ca2254a4b67e444e84a3f611d77974'
7
- data.tar.gz: d0473248fd1703f17b4e7c4061e1827bf72d82127baacdc7dddf50c93630579aa76861af06013a6f2bb5ddc89b295e46de6c730570b5c5e6b4934d03a45870b3
6
+ metadata.gz: 18c5dfde1bb90223e24a87da7a68c15f85b57cfd8709c8a1938d2e0c2e9dfbb12382cc99091209afd5f6d9951840113658a9c0d5e57d34a24dc394a935522481
7
+ data.tar.gz: 46c48cb356d5f3f825bfe3bb0ce4998e36d3b69ed233592905e9d7eee1a456c524019f8606673ef8bdbe329bf8a68ea3663c5395fdb8e6b47647379942a6dbce
data/.github/workflows/benchmark.yml ADDED
@@ -0,0 +1,14 @@
1
+ name: Benchmark
2
+ on: push
3
+
4
+ jobs:
5
+ verify:
6
+ runs-on: ubuntu-latest
7
+ steps:
8
+ - uses: actions/checkout@v2
9
+ - name: Set up Ruby 2.6.0
10
+ uses: ruby/setup-ruby@v1
11
+ with:
12
+ ruby-version: 2.6.0
13
+ - name: Run benchmarks
14
+ run: ruby benchmark/benchmark.rb
data/.github/workflows/rspec.yml ADDED
@@ -0,0 +1,26 @@
1
+ name: Rspec
2
+ on: push
3
+
4
+ jobs:
5
+ verify:
6
+ runs-on: ubuntu-latest
7
+ steps:
8
+ - uses: actions/checkout@v2
9
+ - name: Set up Ruby 2.6.0
10
+ uses: ruby/setup-ruby@v1
11
+ with:
12
+ ruby-version: 2.6.0
13
+ - name: Cache gems
14
+ uses: actions/cache@v1
15
+ with:
16
+ path: vendor/bundle
17
+ key: ${{ runner.os }}-gem-${{ hashFiles('**/Gemfile.lock') }}
18
+ restore-keys: |
19
+ ${{ runner.os }}-gem-
20
+ - name: Install gems
21
+ run: |
22
+ gem install bundler
23
+ bundle config path vendor/bundle
24
+ bundle install --jobs 4 --retry 3
25
+ - name: Run RSpec
26
+ run: bundle exec rspec
data/.github/workflows/rubocop.yml ADDED
@@ -0,0 +1,26 @@
1
+ name: Rubocop
2
+ on: push
3
+
4
+ jobs:
5
+ rubocop:
6
+ runs-on: ubuntu-latest
7
+ steps:
8
+ - uses: actions/checkout@v2
9
+ - name: Set up Ruby 2.6.0
10
+ uses: ruby/setup-ruby@v1
11
+ with:
12
+ ruby-version: 2.6.0
13
+ - name: Cache gems
14
+ uses: actions/cache@v1
15
+ with:
16
+ path: vendor/bundle
17
+ key: ${{ runner.os }}-gem-${{ hashFiles('**/Gemfile.lock') }}
18
+ restore-keys: |
19
+ ${{ runner.os }}-gem-
20
+ - name: Install gems
21
+ run: |
22
+ gem install bundler
23
+ bundle config path vendor/bundle
24
+ bundle install --jobs 4 --retry 3
25
+ - name: Run RuboCop
26
+ run: bundle exec rubocop
data/.rspec CHANGED
@@ -1,3 +1,2 @@
1
- --format documentation
2
1
  --color
3
2
  --require spec_helper
data/.rubocop.yml CHANGED
@@ -1,7 +1,3 @@
1
- require:
2
- - rubocop-performance
3
- - rubocop-rails
4
-
5
1
  AllCops:
6
2
  NewCops: enable
7
3
  Exclude:
@@ -40,6 +36,3 @@ Naming/FileName:
40
36
 
41
37
  Naming/MethodParameterName:
42
38
  MinNameLength: 2
43
-
44
- Rails/SkipsModelValidations:
45
- Enabled: false
data/README.md CHANGED
@@ -1,15 +1,11 @@
1
- # Spellchecker
2
-
3
- Welcome to your new gem! In this directory, you'll find the files you need to be able to package up your Ruby library into a gem. Put your Ruby code in the file `lib/spellchecker`. To experiment with that code, run `bin/console` for an interactive prompt.
4
-
5
- TODO: Delete this and the text above, and describe your gem
1
+ # Ruby Spellchecker
6
2
 
7
3
  ## Installation
8
4
 
9
5
  Add this line to your application's Gemfile:
10
6
 
11
7
  ```ruby
12
- gem 'spellchecker'
8
+ gem 'ruby-spellchecker'
13
9
  ```
14
10
 
15
11
  And then execute:
@@ -22,14 +18,77 @@ Or install it yourself as:
22
18
 
23
19
  ## Usage
24
20
 
25
- TODO: Write usage instructions here
21
+ ### Get list of errors
22
+
23
+ ```ruby
24
+ Spellchecker.check(text)
25
+ ```
26
+
27
+ ### Autocorrection
26
28
 
27
- ## Development
29
+ ```ruby
30
+ text = <<~TEXT
31
+ I started my schooling as the majority did in my area, at the local
32
+ primarry school. I then went to the local secondarry school and
33
+ recieved grades in English, Maths, Phisics, Biology, Geography,
34
+ Art, Graphical Comunication and Philosophy of Religeon. I'll not
35
+ bore you with the 'A' levels and above.
36
+
37
+ Notice the ambigous English qualification above. It was, in truth,
38
+ a cource dedicated to reading "Lord of the flies" and other gems,
39
+ and a weak atempt at getting us to commprehend them. Luckilly my
40
+ middle-class upbringing gave me a head start as I was was already
41
+ aquainted with that sort of langauge these books used (and not just
42
+ the Peter and Jane books) and had read simillar books before. I will
43
+ never be able to put that paticular course down as much as I desire
44
+ to because, for all its faults, it introduced me to Steinbeck,
45
+ Malkovich and the wonders of Lenny, mice and pockets.
46
+
47
+ My education never included one iota of grammar. Lynn Truss points
48
+ out in "Eats, shoots and leaves" that many people were excused from
49
+ the rigours of learning English grammar during their schooling over
50
+ the last 30 or so years because the majority or decision-makers
51
+ decided one day that it might hinder imagination and expresion (so
52
+ what, I ask, happened to all those expresive and imaginative people
53
+ before the ruling?).
54
+ TEXT
55
+
56
+ corrected = Spellchecker.correct(text)
57
+ ```
58
+
59
+ Wdiff:
28
60
 
29
- After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
61
+ ```ruby
62
+ require 'wdiff'
30
63
 
31
- To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
64
+ Wdiff.diff(text, corrected)
32
65
 
33
- ## Contributing
66
+ ```
34
67
 
35
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/spellchecker.
68
+ Result:
69
+
70
+ ```diff
71
+ I started my schooling as the majority did in my area, at the local
72
+ [-primarry-] {+primary+} school. I then went to the local [-secondarry-] {+secondary+} school and
73
+ [-recieved-] {+received+} grades in English, Maths, [-Phisics,-] {+Physics,+} Biology, Geography,
74
+ Art, Graphical Comunication and Philosophy of [-Religeon.-] {+Religion.+} I'll not
75
+ bore you with the 'A' levels and above.
76
+
77
+ Notice the [-ambigous-] {+ambiguous+} English qualification above. It was, in truth,
78
+ a [-cource-] {+course+} dedicated to reading "Lord of the flies" and other gems,
79
+ and a weak [-atempt-] {+attempt+} at getting us to [-commprehend-] {+comprehend+} them. [-Luckilly-] {+Luckily+} my
80
+ middle-class upbringing gave me a head start as I was [-was-] already
81
+ [-aquainted-] {+acquainted+} with that sort of [-langauge-] {+language+} these books used (and not just
82
+ the Peter and Jane books) and had read [-simillar-] {+similar+} books before. I will
83
+ never be able to put that [-paticular-] {+particular+} course down as much as I desire
84
+ to because, for all its faults, it introduced me to Steinbeck,
85
+ Malkovich and the wonders of Lenny, mice and pockets.
86
+
87
+ My education never included one iota of grammar. Lynn Truss points
88
+ out in "Eats, shoots and leaves" that many people were excused from
89
+ the rigours of learning English grammar during their schooling over
90
+ the last 30 or so years because the majority or decision-makers
91
+ decided one day that it might hinder imagination and [-expresion-] {+expression+} (so
92
+ what, I ask, happened to all those [-expresive-] {+expressive+} and imaginative people
93
+ before the ruling?).
94
+ ```
data/benchmark/benchmark.rb ADDED
@@ -0,0 +1,29 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'benchmark'
4
+ require_relative '../lib/spellchecker'
5
+
6
+ text1 = <<~TEXT
7
+ I started my schooling as the majority did in my area, at the local primarry school. I then went to the local secondarry school and recieved grades in English, Maths, Phisics, Biology, Geography, Art, Graphical Comunication and Philosophy of Religeon. I'll not bore you with the 'A' levels and above.
8
+ Notice the ambigous English qualification above. It was, in truth, a cource dedicated to reading "Lord of the flies" and other gems, and a weak atempt at getting us to commprehend them. Luckilly my middle-class upbringing gave me a head start as I was already aquainted with that sort of langauge these books used (and not just the Peter and Jane books) and had read simillar books before. I will never be able to put that paticular course down as much as I desire to because, for all its faults, it introduced me to Steinbeck, Malkovich and the wonders of Lenny, mice and pockets.
9
+ My education never included one iota of grammar. Lynn Truss points out in "Eats, shoots and leaves" that many people were excused from the rigours of learning English grammar during their schooling over the last 30 or so years because the majority or decision-makers decided one day that it might hinder imagination and expresion (so what, I ask, happened to all those expresive and imaginative people before the ruling?).
10
+
11
+ I started my schooling as the majority did in my area, at the local primary school. I then went to the local secondary school and received grades in English, Maths, Physics, Biology, Geography, Art, Graphical Communication and Philosophy of Religion. I'll not bore you with the 'A' levels and above.
12
+ Notice the ambiguous English qualification above. It was, in truth, a course dedicated to reading "Lord of the flies" and other gems, and a weak attempt at getting us to comprehend them. Luckily my middle-class upbringing gave me a head start as I was already acquainted with that sort of language these books used (and not just the Peter and Jane books) and had read similar books before. I will never be able to put that particular course down as much as I desire to because, for all its faults, it introduced me to Steinbeck, Malkovich and the wonders of Lenny, mice and pockets.
13
+ My education never included one iota of grammar. Lynn Truss points out in "Eats, shoots and leaves" that many people were excused from the rigours of learning English grammar during their schooling over the last 30 or so years because the majority or decision-makers decided one day that it might hinder imagination and expression (so what, I ask, happened to all those expressive and imaginative people before the ruling?).
14
+ TEXT
15
+
16
+ text2 = <<~TEXT
17
+ Mail Attachment Support Viewable document types (apple.com)
18
+ .jpg, .tiff, .gif (images); .doc and .docx (Microsoft Word); .htm and .html (web pages); .key (Keynote); .numbers (Numbers); .pages (Pages); .pdf (Preview and Adobe Acrobat); .ppt and .pptx (Microsoft PowerPoint); .txt (text); .rtf (rich text format); .vcf (contact information); .xls and .xlsx (Microsoft Excel); .zip; .ics; .usdz (USDZ-Universal).
19
+ TEXT
20
+
21
+ text = text1 + ([text2] * 5).join("\n")
22
+
23
+ Spellchecker.check(text)
24
+
25
+ Benchmark.bm do |x|
26
+ x.report('tokenize') { 500.times { Spellchecker::Tokenizer.call(text) } }
27
+ x.report('check ') { 500.times { Spellchecker.check(text) } }
28
+ x.report('correct ') { 500.times { Spellchecker.correct(text) } }
29
+ end
@@ -350588,9 +350588,6 @@ Comunicatii
350588
350588
  Comunicatiilor
350589
350589
  Comunicating
350590
350590
  Comunicatio
350591
- Comunication
350592
- Comunicational
350593
- Comunications
350594
350591
  Comunicatistampa
350595
350592
  Comunicativa
350596
350593
  Comunicativas
@@ -452,7 +452,6 @@ atlanta journal and constitution,Atlanta Journal-Constitution
452
452
  atlanta journal constitution,Atlanta Journal-Constitution
453
453
  atlanta-journal and constitution,Atlanta Journal-Constitution
454
454
  atlanta-journal constitution,Atlanta Journal-Constitution
455
- atlantic ocean,atlantic Ocean
456
455
  award winning,award-winning
457
456
  b'nai brith,B'nai B'rith
458
457
  b'nai b’rith,B'nai B'rith
@@ -1318,7 +1317,6 @@ in tact,intact
1318
1317
  in their life time,in their lifetime
1319
1318
  in their life-time,in their lifetime
1320
1319
  in united states,in the United States
1321
- indian ocean,indian Ocean
1322
1320
  indira gahndi,Indira Gandhi
1323
1321
  indira ghandi,Indira Gandhi
1324
1322
  inherlife time,inher lifetime
@@ -1585,7 +1583,6 @@ lloyds of london,Lloyd's of London
1585
1583
  long awaited,long-awaited
1586
1584
  longer then,longer than
1587
1585
  loosing on penalties,losing on penalties
1588
- lorem ipsum dolor sit,[default text]
1589
1586
  los angelas,los Angeles
1590
1587
  los angels,los Angeles
1591
1588
  los angles,los Angeles
@@ -1732,8 +1729,6 @@ mostly knowed as,mostly known as
1732
1729
  mostly knowed for,mostly known for
1733
1730
  mostly knows as,mostly known as
1734
1731
  mostly knows for,mostly known for
1735
- moyen age,moyen Âge
1736
- moyen âge,moyen Âge
1737
1732
  muhammed ali,Muhammad Ali
1738
1733
  mullerian duct,Müllerian Duct
1739
1734
  mullerian ducts,Müllerian Ducts
@@ -1911,7 +1906,6 @@ over sized,oversized
1911
1906
  over-size,oversize
1912
1907
  over-sized,oversized
1913
1908
  owning to,owing to
1914
- pacific ocean,pacific Ocean
1915
1909
  palm d'or,Palme d'Or
1916
1910
  palm d`or,Palme d'Or
1917
1911
  palm d’or,Palme d'Or
@@ -2263,8 +2257,6 @@ the fist time,the first time
2263
2257
  the frist time,the first time
2264
2258
  the just the,just the
2265
2259
  the least the least,the least
2266
- the on going,the ongoing
2267
- the on-going,the ongoing
2268
2260
  the question how,the question of how
2269
2261
  the question where,the question of where
2270
2262
  the roughly the,roughly the
@@ -8898,7 +8898,6 @@ arful,awful
8898
8898
  arfull,awful
8899
8899
  arfully,artfully
8900
8900
  arfument,argument
8901
- arg,argument
8902
8901
  argement,argument
8903
8902
  argentia,argentina
8904
8903
  argentinia,argentina
@@ -10568,7 +10567,6 @@ attrbibutes,attributes
10568
10567
  attrbiutes,attributes
10569
10568
  attrbute,attribute
10570
10569
  attrbutes,attributes
10571
- attrib,attribute
10572
10570
  attribbutes,attributes
10573
10571
  attribites,attributes
10574
10572
  attribte,attribute
@@ -10609,7 +10607,6 @@ attrivute,attribute
10609
10607
  attrocious,atrocious
10610
10608
  attrocities,atrocities
10611
10609
  attrocity,atrocity
10612
- attrs,attributes
10613
10610
  attruibutes,attributes
10614
10611
  atttempts,attempts
10615
10612
  atttract,attract
@@ -20329,6 +20326,8 @@ commpletion,completion
20329
20326
  commplexity,complexity
20330
20327
  commplishion,completion
20331
20328
  commpm,common
20329
+ commprehend,comprehend
20330
+ commprehended,comprehended
20332
20331
  commpression,compression
20333
20332
  commptiblity,commptibility
20334
20333
  commpunted,competent
@@ -26494,7 +26493,8 @@ countufersey,controversy
26494
26493
  countuness,countenance
26495
26494
  couontable,countable
26496
26495
  coupld,couple
26497
- cource,source
26496
+ cource,course
26497
+ primarry,primary
26498
26498
  cources,courses
26499
26499
  courcework,coursework
26500
26500
  courching,crouching
@@ -40347,7 +40347,6 @@ enusre,ensure
40347
40347
  enusres,ensures
40348
40348
  enusring,ensuring
40349
40349
  enuthic,enthusiastic
40350
- env,environment
40351
40350
  enveloppe,envelope
40352
40351
  envelopped,envelope
40353
40352
  enveloppen,envelope
@@ -61708,7 +61707,6 @@ isolatuon,isolation
61708
61707
  isoldation,isolation
61709
61708
  isomorphim,isomorphism
61710
61709
  isomorphims,isomorphisms
61711
- isort,frosted
61712
61710
  isotretioin,isotretion
61713
61711
  isotrop,isotope
61714
61712
  ispired,inspired
@@ -96580,6 +96578,7 @@ secodns,seconds
96580
96578
  secods,seconds
96581
96579
  secomdary,secondary
96582
96580
  secondady,secondary
96581
+ secondarry,secondary
96583
96582
  seconday,secondary
96584
96583
  seconderies,secondaries
96585
96584
  secondery,secondary
@@ -112694,7 +112693,6 @@ unitl,until
112694
112693
  unitoligist,unitologist
112695
112694
  unitoligists,unitologists
112696
112695
  unitomious,unanimous
112697
- unittests,unit
112698
112696
  uniue,unique
112699
112697
  univeral,universal
112700
112698
  univeralism,universalism
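The typo dictionary is a flat list of `typo,correction` pairs, so this release's edits (remapping `cource` from `source` to `course`, adding `primarry`, `secondarry` and `commprehend`, and dropping abbreviations such as `arg`, `attrib` and `env`) change results purely through lookup. The sketch below assumes the file is read line by line into a Hash; `load_typos` and `suggest` are hypothetical helpers, not part of the gem's API, and the sample rows are copied from the hunks above.

```ruby
# Hypothetical loader: the diff only shows the data (one "typo,correction"
# pair per line); the gem's real loading code is not part of this diff.
def load_typos(lines)
  lines.each_with_object({}) do |line, map|
    typo, correction = line.chomp.split(',', 2)
    next if typo.nil? || correction.nil? # skip blank or malformed lines

    map[typo] = correction
  end
end

# Sample rows taken from the 0.1.2 side of the hunks above.
TYPOS = load_typos(<<~DATA.lines)
  cource,course
  primarry,primary
  secondarry,secondary
  commprehend,comprehend
DATA

# Hypothetical lookup: returns the correction, or nil for unknown words.
def suggest(word)
  TYPOS[word.downcase]
end

puts suggest('cource') # → course
```

With the 0.1.1 data the same lookup would have returned `source` for `cource`, which is the wrong word in the README's example sentence.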
@@ -13,8 +13,6 @@ require_relative 'spellchecker/detect_typo'
13
13
  require_relative 'spellchecker/detect_ngram'
14
14
 
15
15
  module Spellchecker
16
- NGRAM_NUMBER = 5
17
-
18
16
  module MistakeTypes
19
17
  ALL = [
20
18
  DUPLICATE = 'duplicate',
@@ -60,7 +58,7 @@ module Spellchecker
60
58
  # @param mistakes [Array<Spellchecker::Mistake>]
61
59
  # @return [String]
62
60
  def apply_fixes(text, mistakes)
63
- mistakes_hash = mistakes.map { |m| [m.context, m.context.sub(m.text, m.correction)] }.to_h
61
+ mistakes_hash = mistakes.map { |m| [m.text, m.correction] }.to_h
64
62
  regexp = Regexp.union(mistakes_hash.keys)
65
63
 
66
64
  text.gsub(regexp, mistakes_hash)
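In 0.1.2, `apply_fixes` keys the replacement Hash on each mistake's `text` instead of its surrounding `context`. A minimal, self-contained sketch of the same `Regexp.union` plus `String#gsub(pattern, hash)` technique follows; the `Mistake` Struct is a hypothetical stand-in that exposes only the two attributes the method reads.

```ruby
# Hypothetical stand-in for Spellchecker::Mistake (only text and correction).
Mistake = Struct.new(:text, :correction, keyword_init: true)

def apply_fixes(text, mistakes)
  # Map each misspelled string to its replacement.
  mistakes_hash = mistakes.map { |m| [m.text, m.correction] }.to_h
  # One alternation pattern; Regexp.union escapes each key literally.
  regexp = Regexp.union(mistakes_hash.keys)

  # With a Hash as the second argument, each match is replaced by
  # mistakes_hash[matched_text].
  text.gsub(regexp, mistakes_hash)
end

mistakes = [
  Mistake.new(text: 'primarry', correction: 'primary'),
  Mistake.new(text: 'recieved', correction: 'received')
]

puts apply_fixes('the local primarry school; I recieved grades', mistakes)
# → the local primary school; I received grades
```

Because the keys carry no word boundaries or positions, every occurrence of a misspelled string is rewritten, including one that happens to appear inside a longer word.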
@@ -12,7 +12,8 @@ module Spellchecker
12
12
  yum yummy agar kori lai please mumble extremely
13
13
  highly root whoa knock check woof bounce bouncy
14
14
  million tut wow mola paw hubba histrio cha nom
15
- chop same extra more bang big go no pom]
15
+ chop same extra more bang big go no pom la ah
16
+ ha oh ew]
16
17
  ).freeze
17
18
 
18
19
  SKIP_PHRASES = Set.new(['try and', 'and try', 'and again', 'again and',
@@ -46,13 +47,13 @@ module Spellchecker
46
47
  text, correction = find_duplicate(t1, t2, t3, t4)
47
48
 
48
49
  return unless text
49
- return if t2.text.match?(/\A[A-Z]/)
50
+ return if t2.capital? || t3.capital?
50
51
  return if SKIP_PHRASES.include?(correction.downcase)
51
52
  return unless Dictionaries::EnglishWords.include?(t2.text)
52
53
 
53
54
  return if skip_phrase?(t1, t2, t3, t4)
54
55
  return if repetition?(t1, t2, t3, t4)
55
- return if from_to_phrase?(t1, t2, t3, t4)
56
+ return if from_to_phrase?(t1, t2, t3)
56
57
  return if quoted?(t1, t2, t3, t4)
57
58
 
58
59
  Mistake.new(text: text, correction: correction,
@@ -79,22 +80,25 @@ module Spellchecker
79
80
  false
80
81
  end
81
82
 
83
+ # rubocop:disable Metrics/AbcSize
82
84
  def repetition?(t1, t2, t3, t4)
83
85
  return true if t1.downcased == t3.downcased && t1.downcased == t4.next.downcased
84
86
  return true if t1.prev.downcased == t2.downcased && t2.downcased == t4.downcased
87
+ return true if t1.prev.downcased == t1.downcased && t1.downcased == t3.downcased
85
88
  return true if t1.downcased == t2.downcased && (t1.downcased == t3.downcased ||
86
89
  t1.downcased == t1.prev.downcased ||
87
90
  t1.downcased == t4.downcased)
88
91
 
89
92
  false
90
93
  end
94
+ # rubocop:enable Metrics/AbcSize
91
95
 
92
96
  def quoted?(t1, _t2, t3, t4)
93
97
  t1.prev.text == '"' && (t3.text == '"' || t4.text == '"')
94
98
  end
95
99
 
96
- def from_to_phrase?(t1, t2, t3, t4)
97
- t1.downcased == 'from' && t3.downcased == 'to' && t2.downcased == t4.downcased
100
+ def from_to_phrase?(t1, t2, t3)
101
+ t1.prev.downcased == 'from' && t2.downcased == 'to' && t1.downcased == t3.downcased
98
102
  end
99
103
  end
100
104
  end
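The reworked `from_to_phrase?` looks one token back: it now takes three tokens and returns true when the word before `t1` is "from", `t2` is "to" and `t1` equals `t3`, so a phrase like "from time to time" is not reported as the duplicate "time to time". The sketch reuses the method body from the diff; the `Token` Struct and the `tokens` helper are hypothetical stand-ins for `Spellchecker::Tokenizer::Token`, and here `prev` is simply `nil` for the first token, so `t1` must not be the first word.

```ruby
# Hypothetical minimal token: only text, prev and downcased.
Token = Struct.new(:text, :prev) do
  def downcased
    text.downcase
  end
end

# Hypothetical helper: build tokens linked to their predecessor.
def tokens(words)
  words.each_with_object([]) do |word, list|
    list << Token.new(word, list.last)
  end
end

# Method body as it appears in the 0.1.2 side of the diff.
def from_to_phrase?(t1, t2, t3)
  t1.prev.downcased == 'from' && t2.downcased == 'to' && t1.downcased == t3.downcased
end

_from, t1, t2, t3 = tokens(%w[from time to time])
p from_to_phrase?(t1, t2, t3) # → true
```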
@@ -4,15 +4,9 @@ module Spellchecker
4
4
  module DetectTypo
5
5
  PROPER_NAME_REGEXP = /\A(?:[a-z]+[A-Z])|(?:[A-Z]+.+[A-Z]+)|(?:[A-Z]{2,}[^A-Z]+)/.freeze
6
6
  ABBREVIATION_REGEXP = /\A(?:[A-Z]{2,4})|(?:[A-Z][a-z])\z/.freeze
7
- MUTEX = Mutex.new
8
7
 
9
8
  LENGTH_LIMIT = 2
10
9
 
11
- POSTFILTERS = {
12
- 'aan' => :all_english_words?,
13
- 'dont' => :any_english_word?
14
- }.freeze
15
-
16
10
  module_function
17
11
 
18
12
  # @param token [Spellchecker::Tokenizer::Token]
@@ -29,12 +23,9 @@ module Spellchecker
29
23
  return if ABBREVIATION_REGEXP.match?(word)
30
24
  return if Dictionaries::EnglishWords.include?(Utils.replace_quote(word))
31
25
 
32
- is_capital = word.match?(/\A[A-Z]/)
33
-
34
- return if is_capital && proper_noun?(word)
35
- return if postfilter?(token)
26
+ return if token.capital? && proper_noun?(word)
36
27
 
37
- correction = correction.sub(/\S/, &:upcase) if is_capital
28
+ correction = correction.sub(/\S/, &:upcase) if token.capital?
38
29
 
39
30
  Mistake.new(text: word, correction: correction,
40
31
  position: token.position, type: MistakeTypes::SPELLING)
@@ -47,29 +38,5 @@ module Spellchecker
47
38
  Dictionaries::CompanyNames.include?(word) ||
48
39
  Dictionaries::UsToponyms.include?(word)
49
40
  end
50
-
51
- # @param token [Spellchecker::Tokenizer::Token]
52
- # @return [Boolean]
53
- def postfilter?(token)
54
- filter = POSTFILTERS[token.downcased]
55
-
56
- return false unless filter
57
-
58
- !method(filter).call(token)
59
- end
60
-
61
- # @param token [Spellchecker::Tokenizer::Token]
62
- # @return [Boolean]
63
- def all_english_words?(token)
64
- Dictionaries::EnglishWords.include?(token.prev.text) &&
65
- Dictionaries::EnglishWords.include?(token.next.text)
66
- end
67
-
68
- # @param token [Spellchecker::Tokenizer::Token]
69
- # @return [Boolean]
70
- def any_english_word?(token)
71
- Dictionaries::EnglishWords.include?(token.prev.text) ||
72
- Dictionaries::EnglishWords.include?(token.next.text)
73
- end
74
41
  end
75
42
  end
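After this change `DetectTypo` reads capitalization from `token.capital?` instead of a local `is_capital` variable, but the effect on the suggestion is the same: when the misspelled word starts with an uppercase ASCII letter, the first non-space character of the correction is upcased. A minimal sketch of that step in isolation; `capital?` and `match_case` are hypothetical names.

```ruby
# Same check as the diff's Token#capital?: first character is A-Z.
def capital?(word)
  word.match?(/\A[A-Z]/)
end

# Upcase only the first non-whitespace character of the correction,
# as `correction.sub(/\S/, &:upcase)` does in DetectTypo.
def match_case(word, correction)
  capital?(word) ? correction.sub(/\S/, &:upcase) : correction
end

puts match_case('Phisics', 'physics')   # → Physics
puts match_case('recieved', 'received') # → received
```

The character class is ASCII-only, so a word starting with, say, "É" is not treated as capitalized.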
@@ -17,8 +17,8 @@ module Spellchecker
17
17
 
18
18
  # @param word [String]
19
19
  # @return [Boolean]
20
- def include?(name)
21
- !match(name).nil?
20
+ def include?(word)
21
+ !match(word).nil?
22
22
  end
23
23
 
24
24
  # @param word [String]
@@ -4,6 +4,7 @@ module Spellchecker
4
4
  module Dictionaries
5
5
  module UsToponyms
6
6
  MUTEX = Mutex.new
7
+ # https://github.com/grammakov/USA-cities-and-states
7
8
  PATH = Dictionaries.path.join('us_toponyms.csv')
8
9
 
9
10
  module_function
@@ -28,10 +29,10 @@ module Spellchecker
28
29
  csv = CSV.parse(PATH.read, headers: true, col_sep: '|')
29
30
 
30
31
  csv.each_with_object(Set.new) do |row, set|
31
- set.add(row['City'])
32
- set.add(row['State full'])
32
+ set.add(row['City']) if row['City']
33
+ set.add(row['State full']) if row['State full']
33
34
  set.add(row['County'].to_s.split(/\s+/).map(&:capitalize).join(' ')) unless row['County'].to_s.empty?
34
- set.add(row['City alias'])
35
+ set.add(row['City alias']) if row['City alias']
35
36
  end
36
37
  end
37
38
  end
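Ruby's `CSV` returns `nil` for unquoted empty fields, so the unguarded `set.add(row['City alias'])` of 0.1.1 could put `nil` into the toponym set; 0.1.2 adds `if` guards. The sketch runs the 0.1.2 loop body against a hypothetical two-row sample in the same pipe-separated layout; the column names come from the diff, but the rows are illustrative and the real file has additional columns.

```ruby
require 'csv'
require 'set'

# Hypothetical sample; the first row has an empty "City alias" cell.
SAMPLE = <<~DATA
  City|State full|County|City alias
  Springfield|Illinois|SANGAMON|
  Saint Paul|Minnesota|RAMSEY|St Paul
DATA

def load_toponyms(data)
  csv = CSV.parse(data, headers: true, col_sep: '|')

  csv.each_with_object(Set.new) do |row, set|
    # Empty unquoted cells parse as nil; the guards keep nil out of the set.
    set.add(row['City']) if row['City']
    set.add(row['State full']) if row['State full']
    # County names are upper-case in the data, so title-case each word.
    set.add(row['County'].to_s.split(/\s+/).map(&:capitalize).join(' ')) unless row['County'].to_s.empty?
    set.add(row['City alias']) if row['City alias']
  end
end

p load_toponyms(SAMPLE).include?(nil) # → false
```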
@@ -10,6 +10,8 @@ module Spellchecker
10
10
  WORD_REGEXP = /[[:word:]]/.freeze
11
11
  LINEBREAK = "\n"
12
12
 
13
+ DOT = '.'
14
+
13
15
  SIMPLE_PRE = ['¿', '¡'].freeze
14
16
  SIMPLE_POST = ['!', '?', ',', ':', ';', '.'].freeze
15
17
  PAIR_PRE = ['(', '{', '[', '<', '«', '„', '‘'].freeze
@@ -22,9 +24,9 @@ module Spellchecker
22
24
 
23
25
  module_function
24
26
 
25
- # rubocop:disable Metrics/AbcSize
26
- # @param [String] str String to be tokenized.
27
- # @return [Array<String>] Array of list.
27
+ # rubocop:disable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
28
+ # @param str [String] string to be tokenized.
29
+ # @return [Spellchecker::Tokenizer::List]
28
30
  def call(str)
29
31
  chars = str.chars
30
32
  pos = 0
@@ -36,33 +38,40 @@ module Spellchecker
36
38
  if char.nil?
37
39
  list << Token.new(acc.join, pos) unless acc.empty?
38
40
 
39
- break list
41
+ break
40
42
  end
41
43
 
42
44
  if char.match?(BLANK_REGEXP)
43
45
  list << Token.new(acc.join, pos) unless acc.empty?
44
46
  acc.clear
45
- elsif splitable?(char, chars[i + 1], chars[i - 1])
46
- list << Token.new(acc.join, pos) unless acc.empty?
47
- list << Token.new(char, i)
47
+ elsif splitable?(char)
48
+ is_next_wordchar = word_char?(chars[i + 1])
48
49
 
49
- acc.clear
50
+ if acc.empty? && char == DOT && is_next_wordchar
51
+ pos = i
52
+ acc << char
53
+ elsif !word_char?(chars[i - 1]) || !is_next_wordchar
54
+ list << Token.new(acc.join, pos) unless acc.empty?
55
+ list << Token.new(char, i)
56
+
57
+ acc.clear
58
+ else
59
+ acc << char
60
+ end
50
61
  else
51
62
  pos = i if acc.empty?
52
63
  acc << char
53
64
  end
54
65
  end
66
+
67
+ list
55
68
  end
56
- # rubocop:enable Metrics/AbcSize
69
+ # rubocop:enable Metrics/AbcSize, Metrics/MethodLength, Metrics/PerceivedComplexity
57
70
 
58
- # @param cur [String]
59
- # @param prev [String]
60
- # @param nxt [String]
71
+ # @param char [String]
61
72
  # @return [Boolean]
62
- def splitable?(cur, prev, nxt)
63
- return true if SPLITTABLES_REGEXP.match?(cur) && (!word_char?(prev) || !word_char?(nxt))
64
-
65
- cur == LINEBREAK
73
+ def splitable?(char)
74
+ SPLITTABLES_REGEXP.match?(char) || char == LINEBREAK
66
75
  end
67
76
 
68
77
  # @param char [String]
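The tokenizer now decides per punctuation character whether to split: a dot that starts a token and is followed by a word character is kept (so `.docx` stays whole, where 0.1.1 emitted `.` and `docx`), punctuation between two word characters stays inside the token (`apple.com`), and anything else becomes its own token. A simplified sketch of that rule, returning plain strings; it assumes `BLANK_REGEXP` is `/\s/` and uses a reduced punctuation set, because neither `BLANK_REGEXP` nor `SPLITTABLES_REGEXP` appears in the diff, and it drops position tracking.

```ruby
# Assumptions: simplified stand-ins for constants not shown in the diff.
BLANK_REGEXP = /\s/
SPLITTABLE_REGEXP = /[!?,:;.()"]/
WORD_REGEXP = /[[:word:]]/
DOT = '.'

def word_char?(char)
  !char.nil? && WORD_REGEXP.match?(char)
end

def tokenize(str)
  chars = str.chars
  list = []
  acc = []

  chars.each_with_index do |char, i|
    if BLANK_REGEXP.match?(char)
      list << acc.join unless acc.empty?
      acc.clear
    elsif SPLITTABLE_REGEXP.match?(char)
      prev_char = i.zero? ? nil : chars[i - 1] # avoid chars[-1] wrapping
      next_is_word = word_char?(chars[i + 1])

      if acc.empty? && char == DOT && next_is_word
        acc << char # leading dot starts a token: ".docx"
      elsif !word_char?(prev_char) || !next_is_word
        list << acc.join unless acc.empty? # flush the current word
        list << char                       # punctuation as its own token
        acc.clear
      else
        acc << char # inside a word: "apple.com"
      end
    else
      acc << char
    end
  end

  list << acc.join unless acc.empty?
  list
end

p tokenize('Viewable types (apple.com): .docx, .pdf')
# → ["Viewable", "types", "(", "apple.com", ")", ":", ".docx", ",", ".pdf"]
```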
@@ -38,6 +38,11 @@ module Spellchecker
38
38
  @normalized ||= Utils.replace_quote(downcased)
39
39
  end
40
40
 
41
+ # @return [Boolean]
42
+ def capital?
43
+ @capital ||= text.match?(/\A[A-Z]/)
44
+ end
45
+
41
46
  # @return [String]
42
47
  def downcased
43
48
  @downcased ||= text.downcase
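The new `Token#capital?` memoizes with `||=`, which only caches truthy values, so for a lowercase token the regexp runs again on every call (results stay correct; `DetectTypo` alone calls it twice per token). If that matters, a sketch that caches both outcomes with `defined?` follows; `MemoToken` is a hypothetical stand-in for `Spellchecker::Tokenizer::Token`.

```ruby
class MemoToken
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Cache true and false alike; `@capital ||= ...` would re-run the
  # match whenever the cached value is false.
  def capital?
    return @capital if defined?(@capital)

    @capital = text.match?(/\A[A-Z]/)
  end
end

token = MemoToken.new('mice')
token.capital? # computes and caches false
token.capital? # returns the cached false without matching again
```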
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Spellchecker
4
- VERSION = '0.1.1'
4
+ VERSION = '0.1.2'
5
5
  end
@@ -25,6 +25,8 @@ Gem::Specification.new do |spec|
25
25
 
26
26
  spec.require_paths = ['lib']
27
27
 
28
- spec.add_development_dependency 'rspec', '~> 3.0'
29
- spec.add_development_dependency 'rubocop', '~> 1.0'
28
+ spec.add_development_dependency 'rspec'
29
+ spec.add_development_dependency 'rubocop'
30
+ spec.add_development_dependency 'simplecov'
31
+ spec.add_development_dependency 'yard'
30
32
  end
metadata CHANGED
@@ -1,43 +1,71 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: ruby-spellchecker
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.1
4
+ version: 0.1.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Pete Matsyburka
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-11-15 00:00:00.000000000 Z
11
+ date: 2020-11-27 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: rspec
15
15
  requirement: !ruby/object:Gem::Requirement
16
16
  requirements:
17
- - - "~>"
17
+ - - ">="
18
18
  - !ruby/object:Gem::Version
19
- version: '3.0'
19
+ version: '0'
20
20
  type: :development
21
21
  prerelease: false
22
22
  version_requirements: !ruby/object:Gem::Requirement
23
23
  requirements:
24
- - - "~>"
24
+ - - ">="
25
25
  - !ruby/object:Gem::Version
26
- version: '3.0'
26
+ version: '0'
27
27
  - !ruby/object:Gem::Dependency
28
28
  name: rubocop
29
29
  requirement: !ruby/object:Gem::Requirement
30
30
  requirements:
31
- - - "~>"
31
+ - - ">="
32
32
  - !ruby/object:Gem::Version
33
- version: '1.0'
33
+ version: '0'
34
34
  type: :development
35
35
  prerelease: false
36
36
  version_requirements: !ruby/object:Gem::Requirement
37
37
  requirements:
38
- - - "~>"
38
+ - - ">="
39
39
  - !ruby/object:Gem::Version
40
- version: '1.0'
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: simplecov
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - ">="
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: yard
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
41
69
  description: Ruby spelling and grammar checker that can be used for autocorrection.
42
70
  email:
43
71
  - pete.matsy@gmail.com
@@ -45,14 +73,16 @@ executables: []
45
73
  extensions: []
46
74
  extra_rdoc_files: []
47
75
  files:
76
+ - ".github/workflows/benchmark.yml"
77
+ - ".github/workflows/rspec.yml"
78
+ - ".github/workflows/rubocop.yml"
48
79
  - ".gitignore"
49
80
  - ".rspec"
50
81
  - ".rubocop.yml"
51
- - ".travis.yml"
52
82
  - Gemfile
53
- - LICENSE
54
83
  - README.md
55
84
  - Rakefile
85
+ - benchmark/benchmark.rb
56
86
  - bin/console
57
87
  - bin/setup
58
88
  - dictionaries/company_names.txt
data/.travis.yml DELETED
@@ -1,6 +0,0 @@
1
- ---
2
- language: ruby
3
- cache: bundler
4
- rvm:
5
- - 2.7.1
6
- before_install: gem install bundler -v 2.1.4
data/LICENSE DELETED
@@ -1,21 +0,0 @@
1
- The MIT License (MIT)
2
-
3
- Copyright (c) 2020 Pete Matsyburka
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in
13
- all copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
- THE SOFTWARE.