pascoale 0.0.1 → 0.1.0
- checksums.yaml +4 -4
- data/.ruby-gemset +1 -0
- data/.ruby-version +1 -0
- data/README.md +32 -4
- data/data/errors.txt +1124 -0
- data/data/everything.txt +177302 -0
- data/data/unique_errors.txt +957 -0
- data/lib/pascoale/constants.rb +8 -0
- data/lib/pascoale/edits.rb +1 -1
- data/lib/pascoale/syllable_separator.rb +44 -0
- data/lib/pascoale/syllable_separator_benchmark.rb +29 -0
- data/lib/pascoale/version.rb +1 -1
- data/lib/pascoale.rb +8 -3
- data/pascoale.gemspec +1 -0
- data/spec/lib/pascoale/syllable_separator_spec.rb +150 -0
- metadata +38 -14
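As a rough cross-check, the line counts listed above for `data/everything.txt` and `data/errors.txt` can be compared with the accuracy figure quoted in the README diff further down. This is only a sketch under two assumptions that the diff does not confirm: each file holds one word per line, and `errors.txt` lists every word that was separated incorrectly. The method name `accuracy_percent` is made up for illustration.

```ruby
# Line counts taken from the file list above.
TOTAL_WORDS = 177_302 # data/everything.txt (assumed: one word per line)
ERROR_WORDS = 1_124   # data/errors.txt (assumed: one failing word per line)

# Share of words separated correctly, as a percentage.
def accuracy_percent(total, errors)
  (total - errors) * 100.0 / total
end

puts format('%.3f%%', accuracy_percent(TOTAL_WORDS, ERROR_WORDS))
```

Under those assumptions the result lands just above 99.36%, which is consistent with the README's claim if that figure was truncated rather than rounded.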
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: bfd6cfb79a1e6ef372e86ccd6eba1e414f256ac7
+  data.tar.gz: 85e7e768620a374cb72588e54de90e6166cb2f4f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: b3f0f665e8daab808873ca392de860ddbd1da1e7e361466bee69aed3728032b73aa926720056bc01e416eee12cc4270de3df036b45f2f9801f839d0c38aea705
+  data.tar.gz: 69e36730f1c1f809f8e5d4117f044e5948a2ab89065dae57bb69160117d499181ded61ba873d2a6640953018155c03671964dfd1e9cead0c9f9b65e53e385896
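The digests above are the SHA1 and SHA512 checksums of the `metadata.gz` and `data.tar.gz` members of the packaged gem. A minimal sketch of how such a record could be checked against an unpacked file, using only Ruby's standard library; the helper name `checksums_match?` and its arguments are hypothetical and not part of RubyGems or of this gem:

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: true when the file at `path` has the SHA1 and SHA512
# digests recorded for `name` in the given checksums.yaml content.
def checksums_match?(checksums_yaml, name, path)
  recorded = YAML.safe_load(checksums_yaml)
  recorded['SHA1'][name] == Digest::SHA1.file(path).hexdigest &&
    recorded['SHA512'][name] == Digest::SHA512.file(path).hexdigest
end
```

A call would look like `checksums_match?(File.read('checksums.yaml'), 'data.tar.gz', 'data.tar.gz')`, assuming both files have already been extracted from the `.gem` archive into the current directory.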
data/.ruby-gemset
ADDED
@@ -0,0 +1 @@
+pascoale
data/.ruby-version
ADDED
@@ -0,0 +1 @@
+ruby-2.1
data/README.md
CHANGED
@@ -1,11 +1,17 @@
 # Pascoale
 
-Minor utilities for text processing in Brazilian Portuguese
+Minor utilities for text processing in **Brazilian Portuguese**.
 
-I'm going to add new functions as I need them.
+I'm going to add new functions as I need them.
+
+Currently it has:
+- variations of a word at one and two **edit distances** (Reference: http://norvig.com/spell-correct.html).
+- Syllabic separation. My tests against a corpus of ~170K words shows 99.36% of correctness \o/.
 
 The code is kinda slow, but I'm not worried about speed (yet).
 
+The name of the gem is a homage to "Prof. Pasquale Cipro Neto" (http://pt.wikipedia.org/wiki/Pasquale_Cipro_Neto), a great teacher! And yes, the name of the gem is wrong spelled as a joke ^_^
+
 ## Installation
 
 Add this line to your application's Gemfile:
|
@@ -27,15 +33,37 @@ Variations of a word (typos and misspelling)
 ```ruby
 require 'pascoale'
 
-edits = Pascoale
+edits = Pascoale::Edits.new('você')
 
 # 1 edit distance
 puts edits.editions.inspect
 
 # 2 edits distance
-puts edits.editions2.inspect # LOTS of output,
+puts edits.editions2.inspect # LOTS of output, beware.
 ```
 
+Syllabic separation
+
+```ruby
+require 'pascoale'
+
+separator = Pascoale::SyllableSeparator.new('exceção')
+puts separator.separated.inspect # ["ex", "ce", "ção"]
+
+separator = Pascoale::SyllableSeparator.new('aéreo')
+puts separator.separated.inspect # ["a", "é", "re", "o"]
+
+separator = Pascoale::SyllableSeparator.new('apneia')
+puts separator.separated.inspect # ["ap", "nei", "a"]
+
+separator = Pascoale::SyllableSeparator.new('construir')
+puts separator.separated.inspect # ["cons", "tru", "ir"]
+
+# Known error :( :( :(
+separator = Pascoale::SyllableSeparator.new('traidor')
+puts separator.separated.inspect # ["tra", "i", "dor"] should be ["trai", "dor"]
+
+```
 
 ## Contributing
 
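The README change above points to Norvig's spelling-corrector article for the one- and two-edit variations. For readers unfamiliar with that approach, here is a minimal, self-contained sketch of it in Ruby. It does not reproduce the gem's `Pascoale::Edits` class; the `ALPHABET` constant and the `edits1` and `edits2` method names are assumptions made for illustration only.

```ruby
require 'set'

# Hypothetical alphabet: ASCII lowercase plus common Portuguese accented letters.
ALPHABET = (('a'..'z').to_a + %w[á à â ã é ê í ó ô õ ú ç]).freeze

# Every string reachable from `word` by one deletion, transposition,
# replacement, or insertion (the word itself is included via replacement).
def edits1(word)
  chars = word.chars
  # All ways to split the word into a left and a right part.
  splits = (0..chars.length).map { |i| [chars[0...i].join, chars[i..-1].join] }
  with_right = splits.reject { |_, right| right.empty? }

  deletes    = with_right.map { |l, r| l + r[1..-1] }
  transposes = splits.select { |_, r| r.length > 1 }
                     .map { |l, r| l + r[1] + r[0] + r[2..-1] }
  replaces   = with_right.flat_map { |l, r| ALPHABET.map { |c| l + c + r[1..-1] } }
  inserts    = splits.flat_map { |l, r| ALPHABET.map { |c| l + c + r } }

  Set.new(deletes + transposes + replaces + inserts)
end

# Every string reachable by applying edits1 twice.
def edits2(word)
  edits1(word).flat_map { |edited| edits1(edited).to_a }.to_set
end
```

For example, `edits1('você')` contains `'voce'` (replacement), `'voc'` (deletion), and `'ovcê'` (transposition). The two-edit set grows quickly with word length and alphabet size, which matches the README's warning about the amount of output.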