segment_ruby 0.1.1 → 0.1.2
- checksums.yaml +4 -4
- data/.gitignore +2 -0
- data/data/segment_ruby/test_bigram/2_frequencies.tsv +10 -0
- data/data/segment_ruby/test_bigram/2_total.tsv +1 -0
- data/data/segment_ruby/test_bigram/frequencies.tsv +10 -0
- data/data/segment_ruby/test_bigram/total.tsv +1 -0
- data/data/segment_ruby/test_unigram/frequencies.tsv +10 -0
- data/data/segment_ruby/test_unigram/total.tsv +1 -0
- data/data/segment_ruby/us_names/2_frequencies.tsv.save +0 -0
- data/data/segment_ruby/us_names/2_total.tsv.save +1 -0
- data/data/segment_ruby/us_names/README.md +15 -0
- data/data/segment_ruby/us_names/frequencies.tsv +78637 -0
- data/data/segment_ruby/us_names/total.tsv +1 -0
- data/lib/segment_ruby/version.rb +1 -1
- data/lib/segment_ruby.rb +3 -3
- metadata +13 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a44ebf20192579cea4c945942fb1275614ccd4cf
+  data.tar.gz: 59fc1cd835d97ca68459dccfd9f3b61d44667292
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 0d422443d5be858b3b328d40f3dd174372971aa7364d84118d436d4381b03d815906883bb66fc4fba16a2804a8f7ad509c47c41ed8c979cc0c8b0277bfd91e64
+  data.tar.gz: cd72e99abe2f9b59a6c2dcafcfedc912702aa46621c7086506191fe8cd06a0177d71fd5dbd8df99caf5aaf730cfb6aea9917f7e57708a6859003a02646958847
data/.gitignore
CHANGED
@@ -0,0 +1 @@
+431675447550
@@ -0,0 +1 @@
+468285774779
@@ -0,0 +1 @@
+468285774779
File without changes
@@ -0,0 +1 @@
+1
@@ -0,0 +1,15 @@
+# US names
+
+These frequencies from from US 2013 Social Security Death Master Index.
+First and last names were lowercased, all spaces were removed, and
+then frequencies were counted.
+
+For example:
+
+- "Jane" was counted as both a first and as a last name.
+- The first name "Mary Ann" had spaces removed, and so was counted along
+with "MaryAnn"
+- Variants such as "O'Reilly" and "OReilly" were _not_ merged (Although "O Reilly" and "OReilly" were merged).
+
+The original data comes from the US Social Security Agency; the data was
+provided by [Tom Alciere](http://cancelthesefunerals.com/).
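The normalization described in the us_names README (lowercase, strip spaces, then count) can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions: `normalize_name`, `count_name_frequencies`, and the sample `records` are hypothetical and do not appear in the gem, and the actual script used to build `frequencies.tsv` is not shown in this diff.

```ruby
# Hypothetical helpers, not part of segment_ruby. They follow the steps the
# README describes: lowercase each name, remove spaces, then count occurrences.

# Lowercases a name and removes every space character.
# Apostrophes and other punctuation are left untouched.
def normalize_name(name)
  name.downcase.delete(" ")
end

# Takes [first_name, last_name] pairs and returns a Hash mapping each
# normalized name to the number of times it occurred in either position.
def count_name_frequencies(records)
  counts = Hash.new(0)
  records.each do |first, last|
    counts[normalize_name(first)] += 1
    counts[normalize_name(last)] += 1
  end
  counts
end

# Made-up sample records, for illustration only.
records = [
  ["Jane", "Doe"],
  ["Mary Ann", "Jane"],
  ["MaryAnn", "O'Reilly"],
  ["John", "O Reilly"],
  ["John", "OReilly"]
]

counts = count_name_frequencies(records)
counts.each { |name, n| puts "#{name}\t#{n}" }
```

With this sample, "jane" is counted once as a first name and once as a last name, "Mary Ann" and "MaryAnn" share the key "maryann", and "o'reilly" stays separate from "oreilly" because only spaces are removed.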