segment_ruby 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/data/segment_ruby/test_bigram/2_frequencies.tsv +10 -0
- data/data/segment_ruby/test_bigram/2_total.tsv +1 -0
- data/data/segment_ruby/test_bigram/frequencies.tsv +10 -0
- data/data/segment_ruby/test_bigram/total.tsv +1 -0
- data/data/segment_ruby/test_unigram/frequencies.tsv +10 -0
- data/data/segment_ruby/test_unigram/total.tsv +1 -0
- data/data/segment_ruby/us_names/2_frequencies.tsv.save +0 -0
- data/data/segment_ruby/us_names/2_total.tsv.save +1 -0
- data/data/segment_ruby/us_names/README.md +15 -0
- data/data/segment_ruby/us_names/frequencies.tsv +78637 -0
- data/data/segment_ruby/us_names/total.tsv +1 -0
- data/lib/segment_ruby/version.rb +1 -1
- data/lib/segment_ruby.rb +3 -3
- metadata +13 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a44ebf20192579cea4c945942fb1275614ccd4cf
+  data.tar.gz: 59fc1cd835d97ca68459dccfd9f3b61d44667292
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0d422443d5be858b3b328d40f3dd174372971aa7364d84118d436d4381b03d815906883bb66fc4fba16a2804a8f7ad509c47c41ed8c979cc0c8b0277bfd91e64
+  data.tar.gz: cd72e99abe2f9b59a6c2dcafcfedc912702aa46621c7086506191fe8cd06a0177d71fd5dbd8df99caf5aaf730cfb6aea9917f7e57708a6859003a02646958847
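The SHA1 and SHA512 entries in checksums.yaml are hex digests of the two archives packed inside the .gem file. As a rough sketch of how one might recompute them for comparison, assuming the gem (a tar archive) has already been unpacked so that `metadata.gz` and `data.tar.gz` sit in one directory. The helper name `archive_checksums` and the directory argument are hypothetical and are not part of RubyGems or segment_ruby:

```ruby
require "digest"

# Hypothetical helper (an assumption, not a RubyGems API): returns hex digests
# for the two archives that checksums.yaml describes, read from the directory
# where a .gem file was unpacked.
def archive_checksums(dir)
  sums = { "SHA1" => {}, "SHA512" => {} }
  %w[metadata.gz data.tar.gz].each do |name|
    path = File.join(dir, name)
    sums["SHA1"][name]   = Digest::SHA1.file(path).hexdigest
    sums["SHA512"][name] = Digest::SHA512.file(path).hexdigest
  end
  sums
end
```

Comparing the returned hash with the values listed in checksums.yaml would show whether the unpacked archives match the published release.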
data/.gitignore
CHANGED
@@ -0,0 +1 @@
+431675447550
@@ -0,0 +1 @@
+468285774779
@@ -0,0 +1 @@
+468285774779
File without changes
@@ -0,0 +1 @@
+1
@@ -0,0 +1,15 @@
+# US names
+
+These frequencies come from the US 2013 Social Security Death Master Index.
+First and last names were lowercased, all spaces were removed, and
+then frequencies were counted.
+
+For example:
+
+- "Jane" was counted as both a first and as a last name.
+- The first name "Mary Ann" had spaces removed, and so was counted along
+with "MaryAnn"
+- Variants such as "O'Reilly" and "OReilly" were _not_ merged (although "O Reilly" and "OReilly" were merged).
+
+The original data comes from the US Social Security Agency; the data was
+provided by [Tom Alciere](http://cancelthesefunerals.com/).
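The normalization that the US names README describes (lowercase each name, strip spaces, then count every first and last name) can be sketched as follows. This is only an illustration under stated assumptions: the helper names `normalize_name` and `name_frequencies` and the sample records are hypothetical, and are not taken from segment_ruby or from the pipeline that produced frequencies.tsv.

```ruby
# Hypothetical helper: lowercase a name and remove spaces only.
# Apostrophes are kept, which is why "O'Reilly" and "OReilly" stay distinct.
def normalize_name(name)
  name.downcase.delete(" ")
end

# Hypothetical helper: count normalized names across [first, last] pairs.
# First and last names share one table, so "Jane" counts in both roles.
def name_frequencies(records)
  counts = Hash.new(0)
  records.each do |first, last|
    counts[normalize_name(first)] += 1
    counts[normalize_name(last)] += 1
  end
  counts
end

# Made-up sample records, for illustration only.
records = [
  ["Mary Ann", "O Reilly"],
  ["MaryAnn",  "OReilly"],
  ["Jane",     "O'Reilly"],
  ["Tom",      "Jane"]
]

freqs = name_frequencies(records)
# freqs["maryann"]  is 2: "Mary Ann" and "MaryAnn" merge
# freqs["oreilly"]  is 2: "O Reilly" and "OReilly" merge
# freqs["o'reilly"] is 1: the apostrophe variant is not merged
# freqs["jane"]     is 2: counted once as a first name, once as a last name
```

Keeping a single shared table mirrors the README's note that a name is counted regardless of whether it appeared as a first or a last name.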