word_count_analyzer 0.0.1
- checksums.yaml +7 -0
- data/.gitignore +14 -0
- data/.rspec +1 -0
- data/.travis.yml +5 -0
- data/Gemfile +4 -0
- data/LICENSE.txt +22 -0
- data/README.md +554 -0
- data/Rakefile +2 -0
- data/lib/word_count_analyzer.rb +14 -0
- data/lib/word_count_analyzer/analyzer.rb +34 -0
- data/lib/word_count_analyzer/contraction.rb +176 -0
- data/lib/word_count_analyzer/counter.rb +230 -0
- data/lib/word_count_analyzer/date.rb +149 -0
- data/lib/word_count_analyzer/ellipsis.rb +48 -0
- data/lib/word_count_analyzer/hyperlink.rb +53 -0
- data/lib/word_count_analyzer/hyphenated_word.rb +23 -0
- data/lib/word_count_analyzer/number.rb +23 -0
- data/lib/word_count_analyzer/numbered_list.rb +61 -0
- data/lib/word_count_analyzer/punctuation.rb +52 -0
- data/lib/word_count_analyzer/slash.rb +84 -0
- data/lib/word_count_analyzer/version.rb +3 -0
- data/lib/word_count_analyzer/xhtml.rb +26 -0
- data/spec/spec_helper.rb +1 -0
- data/spec/word_count_analyzer/analyzer_spec.rb +11 -0
- data/spec/word_count_analyzer/contraction_spec.rb +124 -0
- data/spec/word_count_analyzer/counter_spec.rb +647 -0
- data/spec/word_count_analyzer/date_spec.rb +257 -0
- data/spec/word_count_analyzer/ellipsis_spec.rb +69 -0
- data/spec/word_count_analyzer/hyperlink_spec.rb +77 -0
- data/spec/word_count_analyzer/hyphenated_word_spec.rb +81 -0
- data/spec/word_count_analyzer/number_spec.rb +63 -0
- data/spec/word_count_analyzer/numbered_list_spec.rb +69 -0
- data/spec/word_count_analyzer/punctuation_spec.rb +91 -0
- data/spec/word_count_analyzer/slash_spec.rb +105 -0
- data/spec/word_count_analyzer/xhtml_spec.rb +65 -0
- data/word_count_analyzer.gemspec +26 -0
- metadata +153 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: e8bc0afc6af503e2184304985535e15d90594603
  data.tar.gz: d74add77be74ac5be7ba89f7afc3d6f8b17a10bd
SHA512:
  metadata.gz: 16dc74cf00181059fcb607815cb7f3c86828e20e21a0cb5b10cd2c150b815667426345cb43958bafd7b5288067130e76b5f23d6faa9c1f5763398ae2d0e317fa
  data.tar.gz: c07418f17ed2d6c3f1bed4aa5479ca165b800979f24e67887106a66ccb3c44f61c51688e9e4d7f7c2debc59912cce66a6b63df437b0b0fe06bedccd9eda78704
data/.gitignore
ADDED
data/.rspec
ADDED
@@ -0,0 +1 @@
--color
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2015 Kevin S. Dias

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,554 @@
# Word Count Analyzer

[![Gem Version](https://badge.fury.io/rb/word_count_analyzer.svg)](http://badge.fury.io/rb/word_count_analyzer) [![Build Status](https://travis-ci.org/diasks2/word_count_analyzer.png)](https://travis-ci.org/diasks2/word_count_analyzer) [![License](https://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat)](https://github.com/diasks2/word_count_analyzer/blob/master/LICENSE.txt)

See what word count [gray areas](#gray-area-details) might be affecting your word count.

Word Count Analyzer is a Ruby gem that analyzes a string for potential areas of the text that might cause word count discrepancies depending on the tool used. It also provides comprehensive configuration options so you can easily customize how different gray areas should be counted and find the right word count for your purposes.

If you prioritize speed over accuracy, then I recommend not using this gem. There are most definitely faster gems for getting a word count. However, if accuracy is important, and you want control over the gray areas that affect word count, then this gem is for you.

## Install

**Ruby**
*Supports Ruby 2.1.0 and above*
```
gem install word_count_analyzer
```

**Ruby on Rails**
Add this line to your application’s Gemfile:
```ruby
gem 'word_count_analyzer'
```

## Usage

### Analyze the word count gray areas of a string

Common word count gray areas include (*[more details below](#gray-area-details)*):
- Ellipses
- Hyperlinks
- Contractions
- Hyphenated Words
- Dates
- Numbers
- Numbered Lists
- XML and HTML tags
- Forward slashes and backslashes
- Punctuation

Other gray areas not covered by this gem:
- Headers
- Footers
- Hidden Text (*specific to Microsoft Word*)

```ruby
text = "This string has a date: Monday, November 3rd, 2011. I was thinking... it also shouldn't have too many contractions, maybe 4. <html> Some HTML and a hyphenated-word</html>. Don't count stray punctuation ? ? ? Please visit the ____________ ------------ ........ go-to site: https://www.example-site.com today. Let's add a list 1. item a 2. item b 3. item c. Now let's add he/she/it or a c:\\Users\\john. 2/15/2012 is the date! { HYPERLINK 'http://www.hello.com' }"
WordCountAnalyzer::Analyzer.new(text: text).analyze

# => {
#   "ellipsis": 1,
#   "hyperlink": 2,
#   "contraction": 4,
#   "hyphenated_word": 2,
#   "date": 2,
#   "number": 1,
#   "numbered_list": 3,
#   "xhtml": 1,
#   "forward_slash": 1,
#   "backslash": 1,
#   "dotted_line": 1,
#   "dashed_line": 1,
#   "underscore": 1,
#   "stray_punctuation": 5
# }
```
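
To give a feel for what such a tally involves, here is a minimal sketch in plain Ruby that counts three of the gray areas with regular expressions. It is not the gem's implementation: `GRAY_AREA_PATTERNS` and `tally_gray_areas` are hypothetical names, and the patterns are deliberately simplistic.

```ruby
# Hypothetical, simplified patterns; the gem's own detection rules are more thorough.
GRAY_AREA_PATTERNS = {
  # Exactly three consecutive periods, or the single ellipsis character.
  ellipsis: /(?<!\.)\.{3}(?!\.)|…/,
  # Two or more alphabetic parts joined by hyphens.
  hyphenated_word: /\b[[:alpha:]]+(?:-[[:alpha:]]+)+\b/,
  # One punctuation mark with whitespace on both sides.
  stray_punctuation: /(?<=\s)[[:punct:]](?=\s)/
}.freeze

# Returns a hash mapping each gray area to its number of occurrences.
def tally_gray_areas(text)
  GRAY_AREA_PATTERNS.transform_values { |pattern| text.scan(pattern).size }
end

p tally_gray_areas("I was thinking... a go-to site ? ? ok")
```

For this sample the tally is one ellipsis, one hyphenated word, and two stray punctuation marks.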

### Count the words in a string

```ruby
text = "This string has a date: Monday, November 3rd, 2011. I was thinking... it also shouldn't have too many contractions, maybe 2. <html> Some HTML and a hyphenated-word</html>. Don't count punctuation ? ? ? Please visit the ____________ ------------ ........ go-to site: https://www.example-site.com today. Let's add a list 1. item a 2. item b 3. item c. Now let's add he/she/it or a c:\\Users\\john. 2/15/2012 is the date! { HYPERLINK 'http://www.hello.com' }"

WordCountAnalyzer::Counter.new(text: text).count
# => 64

# Overrides all settings to match the way Pages handles word count.
# N.B. The developers of Pages may change the algorithm at any time, so this should be treated only as an approximation.
WordCountAnalyzer::Counter.new(text: text).pages_count
# => 79

# Overrides all settings to match the way Microsoft Word and wc (Unix) handle word count.
# N.B. The developers of these tools may change the algorithm at any time, so this should be treated only as an approximation.

WordCountAnalyzer::Counter.new(text: text).mword_count
# => 71

# Highly configurable (see all options below)
WordCountAnalyzer::Counter.new(
  text: text,
  ellipsis: 'no_special_treatment',
  hyperlink: 'no_special_treatment',
  contraction: 'count_as_multiple',
  hyphenated_word: 'count_as_multiple',
  date: 'count_as_one',
  number: 'ignore',
  numbered_list: 'ignore',
  xhtml: 'keep',
  forward_slash: 'count_as_multiple',
  backslash: 'count_as_multiple',
  dotted_line: 'count',
  dashed_line: 'count',
  underscore: 'count',
  stray_punctuation: 'count'
).count

# => 77
```

#### Counter `options`

##### `ellipsis`
**default** = `'ignore'`
- `'ignore'`
  Ignores all ellipses in the word count total.
- `'no_special_treatment'`
  Ellipses will not be searched for in the string.

<hr>

##### `hyperlink`
**default** = `'count_as_one'`
- `'count_as_one'`
  Counts a hyperlink as one word.
- `'no_special_treatment'`
  Hyperlinks will not be searched for in the string. Therefore, how a hyperlink is handled in the word count will depend on other settings (mainly slashes).
- `'split_at_period'`
  Pages will split hyperlinks at a period and count each token as a separate word.

<hr>

##### `contraction`
**default** = `'count_as_one'`
- `'count_as_one'`
  Counts a contraction as one word.
- `'count_as_multiple'`
  Splits a contraction into the words that make it up. Examples:
    - `don't` => `do not` (2 words)
    - `o'clock` => `of the clock` (3 words)
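
The difference between the two `contraction` settings can be sketched in a few lines of plain Ruby. This is an illustration only, not the gem's code: `EXPANSIONS` and `count_words` are hypothetical names, and the lookup table holds just the two examples above plus one extra entry.

```ruby
# Hypothetical lookup table; a real list of contractions would be much larger.
EXPANSIONS = {
  "don't"   => "do not",
  "o'clock" => "of the clock",
  "let's"   => "let us"
}.freeze

# Counts whitespace-delimited tokens, optionally expanding known contractions.
def count_words(text, contraction: 'count_as_one')
  text.split.sum do |token|
    # Normalize the token: lowercase it and strip leading/trailing punctuation.
    word = token.downcase.gsub(/\A[[:punct:]]+|[[:punct:]]+\z/, '')
    if contraction == 'count_as_multiple' && EXPANSIONS.key?(word)
      EXPANSIONS[word].split.size
    else
      1
    end
  end
end

sentence = "Don't be late, it starts at nine o'clock."
puts count_words(sentence)                                    # 8
puts count_words(sentence, contraction: 'count_as_multiple')  # 11
```

With `'count_as_multiple'`, `Don't` contributes two words and `o'clock` three, which accounts for the three extra words.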

<hr>

##### `hyphenated_word`
**default** = `'count_as_one'`
- `'count_as_one'`
  Counts a hyphenated word as one word.
- `'count_as_multiple'`
  Breaks a hyphenated word at each hyphen and counts each word separately. Example:
    - `devil-may-care` (3 words)

<hr>

##### `date`
**default** = `'no_special_treatment'`
- `'count_as_one'`
  Counts a date as one word. This is more commonly seen in translation CAT tools where a date is thought of as a *placeable* that can usually be automatically translated. Examples:
    - Monday, April 4th, 2011 (1 word)
    - April 4th, 2011 (1 word)
    - 04/04/2011 (1 word)
    - 04.04.2011 (1 word)
    - 2011/04/04 (1 word)
    - 2011-04-04 (1 word)
    - 2003Nov9 (1 word)
    - 2003 November 9 (1 word)
    - 2003-Nov-9 (1 word)
    - and others...
- `'no_special_treatment'`
  Dates will not be searched for in the string. Therefore, how a date is handled in the word count will depend on other settings.
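
One way to picture `'count_as_one'` is to collapse every recognized date into a single placeholder token before splitting on whitespace. The sketch below does this in plain Ruby for a handful of the formats listed above. `DATE_PATTERNS` and `count_with_dates_as_one` are hypothetical names, and the gem's own date detection covers many more formats.

```ruby
# Hypothetical patterns covering only a few of the formats listed above.
DATE_PATTERNS = [
  # Monday, April 4th, 2011
  /\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday),\s+[A-Z][a-z]+\s+\d{1,2}(?:st|nd|rd|th)?,\s+\d{4}\b/,
  # April 4th, 2011
  /\b[A-Z][a-z]+\s+\d{1,2}(?:st|nd|rd|th)?,\s+\d{4}\b/,
  # 04/04/2011 and 04.04.2011
  /\b\d{1,2}[\/.]\d{1,2}[\/.]\d{4}\b/,
  # 2011/04/04 and 2011-04-04
  /\b\d{4}[\/-]\d{1,2}[\/-]\d{1,2}\b/
].freeze

def count_with_dates_as_one(text)
  # Replace every recognized date with one placeholder token, then split.
  collapsed = DATE_PATTERNS.reduce(text) { |str, pattern| str.gsub(pattern, 'DATE') }
  collapsed.split.size
end

puts "Due Monday, April 4th, 2011 at noon".split.size              # 7
puts count_with_dates_as_one("Due Monday, April 4th, 2011 at noon") # 4
```

The longest pattern is tried first so that a weekday-prefixed date is collapsed as a whole rather than leaving the weekday behind as its own word.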

<hr>

##### `number`
**default** = `'count'`
- `'count'`
  Counts a number as one word.
- `'ignore'`
  Ignores any numbers in the string (with the exception of `dates` and `numbered_lists`) and does not count them towards the word count.

<hr>

##### `numbered_list`
**default** = `'count'`
- `'count'`
  Counts a number in a numbered list as one word.
- `'ignore'`
  Ignores any numbers that are part of a numbered list and does not count them towards the word count.

<hr>

##### `xhtml`
**default** = `'remove'`
- `'remove'`
  Removes any XML or HTML opening and closing tags from the string.
- `'keep'`
  Keeps any XML or HTML in the string (tags are not removed before counting).

<hr>

##### `forward_slash`
**default** = `'count_as_multiple_except_dates'`
- `'count_as_one'`
  Counts any tokens that include a forward slash as one word. Example:
    - she/he/it (1 word)
- `'count_as_multiple'`
  Separates any tokens that include a forward slash at the slash(es) and counts each token individually. Whether dates, hyperlinks and xhtml are included depends on what is set for those options. Example:
    - she/he/it (3 words)
- `'count_as_multiple_except_dates'`
  Separates any tokens that include a forward slash (except dates) at the slash(es) and counts each token individually. Example:
    - she/he/it 4/25/2014 (4 words)
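
The three `forward_slash` settings can be compared side by side with a small plain-Ruby sketch. It is not taken from the gem: `DATE_LIKE` and `count_slash_tokens` are hypothetical names, and the date test is intentionally loose.

```ruby
# Hypothetical, deliberately loose test for slash-separated dates.
DATE_LIKE = /\A\d{1,4}\/\d{1,2}\/\d{1,4}\z/

def count_slash_tokens(text, forward_slash: 'count_as_multiple_except_dates')
  text.split.sum do |token|
    # Number of non-empty pieces when the token is broken at each slash.
    pieces = token.split('/').reject(&:empty?).size
    case forward_slash
    when 'count_as_one'      then 1
    when 'count_as_multiple' then pieces
    else token.match?(DATE_LIKE) ? 1 : pieces # 'count_as_multiple_except_dates'
    end
  end
end

sample = "she/he/it 4/25/2014"
puts count_slash_tokens(sample, forward_slash: 'count_as_one')      # 2
puts count_slash_tokens(sample, forward_slash: 'count_as_multiple') # 6
puts count_slash_tokens(sample)                                     # 4
```

The default setting gives 4 for this sample, matching the example above: three words for `she/he/it` and one for the date.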

<hr>

##### `backslash`
**default** = `'count_as_one'`
- `'count_as_one'`
  Counts any tokens that include a backslash as one word. Example:
    - c:\Users\johndoe (1 word)
- `'count_as_multiple'`
  Separates any tokens that include a backslash at the slash(es) and counts each token individually. Example:
    - c:\Users\johndoe (3 words)

<hr>

##### `dotted_line`
**default** = `'ignore'`
- `'count'`
  Counts a dotted line as one word.
- `'ignore'`
  Ignores any dotted lines in the string and does not count them towards the word count.

<hr>

##### `dashed_line`
**default** = `'ignore'`
- `'count'`
  Counts a dashed line as one word.
- `'ignore'`
  Ignores any dashed lines in the string and does not count them towards the word count.

<hr>

##### `underscore`
**default** = `'ignore'`
- `'count'`
  Counts a series of underscores as one word.
- `'ignore'`
  Ignores any series of underscores in the string and does not count them towards the word count.

<hr>

##### `stray_punctuation`
**default** = `'ignore'`
- `'count'`
  Counts a punctuation mark surrounded on both sides by whitespace as one word.
- `'ignore'`
  Ignores any punctuation marks surrounded on both sides by whitespace in the string and does not count them towards the word count.

### Gray Area Details

#### Ellipsis

Checks for any occurrences of ellipses in your text. Writers tend to use different formats for ellipses, and although there are [style guides](http://www.thepunctuationguide.com/ellipses.html), these rules are rarely followed.

##### Three Consecutive Periods
```
...
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

##### Four Consecutive Periods
```
....
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

##### Three Periods With Spaces
```
. . .
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 3
Pages          | 0
wc (Unix)      | 3

##### Four Periods With Spaces
```
. . . .
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 4
Pages          | 0
wc (Unix)      | 4

##### Horizontal Ellipsis
```
…
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1
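
The Microsoft Word and wc figures in the tables above are what plain whitespace splitting produces, which a short Ruby sketch can confirm. The `whitespace_count` and `count_ignoring_ellipses` helpers are hypothetical names used only for illustration; the second is a crude stand-in for an `'ignore'`-style count and is not the gem's logic.

```ruby
# Number of whitespace-delimited tokens in the text.
def whitespace_count(text)
  text.split.size
end

# Crude 'ignore'-style count: drop tokens made up only of periods
# or the ellipsis character (this also drops a lone period).
def count_ignoring_ellipses(text)
  text.split.reject { |token| token.match?(/\A[.…]+\z/) }.size
end

["...", "....", ". . .", ". . . .", "…"].each do |sample|
  puts "#{sample.inspect} => #{whitespace_count(sample)}"
end

puts count_ignoring_ellipses("I was thinking . . . maybe")  # 4
```

The spaced variants are where the tools diverge most: each period becomes its own token under whitespace splitting, so a single ellipsis can add three or four words to the total.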

#### Hyperlink

```
http://www.example.com
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 4
wc (Unix)      | 1
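
The gap between 1 and 4 is easy to reproduce in plain Ruby. In the sketch below, `alnum_tokens` is a hypothetical helper; breaking at every run of non-alphanumeric characters happens to match the Pages figure for this URL, but that is an observation about this one example, not a description of how Pages actually works.

```ruby
# Splits text into runs of letters and digits, discarding everything else.
def alnum_tokens(text)
  text.scan(/[[:alnum:]]+/)
end

url = "http://www.example.com"

puts url.split.size        # 1 whitespace-delimited token
p alnum_tokens(url)        # ["http", "www", "example", "com"]
puts alnum_tokens(url).size # 4
```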

#### Contraction

Most tools count contractions as one word. [Some might argue](http://english.stackexchange.com/questions/80635/counting-contractions-as-one-or-two-words) a contraction is technically more than one word.

```
can't
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 1
wc (Unix)      | 1

#### Hyphenated Word

```
devil-may-care
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 3
wc (Unix)      | 1

#### Date

Most word processing tools do not recognize dates, but translation CAT tools tend to recognize dates as one word or [placeable](http://www.wordfast.net/wiki/Placeables). This gem checks for many date formats, including those that include day or month abbreviations. A few examples are listed below (*not an exhaustive list*).

##### Date (example A)
```
Monday, April 4th, 2011
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 4
Pages          | 4
wc (Unix)      | 4

##### Date (example B)
```
04/04/2011
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 3
wc (Unix)      | 1

##### Date (example C)
```
04.04.2011
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 1
wc (Unix)      | 1

#### Number

##### Simple number
```
200
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 1
wc (Unix)      | 1

##### Number with preceding unit
```
$200
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 1
wc (Unix)      | 1

##### Number with unit following
```
50%
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 1
wc (Unix)      | 1

#### Numbered List

```
1. List item a
2. List item b
3. List item c
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 12
Pages          | 9
wc (Unix)      | 12
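
Both figures in the table can be reproduced with a few lines of plain Ruby: 12 when the list markers are counted as tokens and 9 when they are skipped. `LIST_MARKER` and `count_list_words` are hypothetical names, and the marker test is naive (it would also drop a number that merely ends a sentence, such as `maybe 4.`).

```ruby
# Hypothetical marker pattern: a whole token made of digits followed by a period.
LIST_MARKER = /\A\d+\.\z/

def count_list_words(text, numbered_list: 'count')
  tokens = text.split
  # With 'ignore', leave the list markers out of the total.
  tokens = tokens.reject { |token| token.match?(LIST_MARKER) } if numbered_list == 'ignore'
  tokens.size
end

list = <<~TEXT
  1. List item a
  2. List item b
  3. List item c
TEXT

puts count_list_words(list)                           # 12
puts count_list_words(list, numbered_list: 'ignore')  # 9
```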

#### XML and HTML Tags

```html
<span class="large-text">Hello world</span> <new-tag>Hello</new-tag>
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 4
Pages          | 12
wc (Unix)      | 4
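
The sketch below reproduces these figures in plain Ruby and adds the count obtained once the tags are stripped. `TAG` and `count_markup_words` are hypothetical names; the tag pattern is adequate for this sample but not for arbitrary markup, and the alphanumeric split only happens to match the Pages figure here.

```ruby
# Hypothetical tag pattern: an opening or closing tag with any attributes.
TAG = /<\/?[^>]+>/

def count_markup_words(text, xhtml: 'remove')
  # With 'remove', replace each tag with a space before splitting.
  text = text.gsub(TAG, ' ') if xhtml == 'remove'
  text.split.size
end

html = '<span class="large-text">Hello world</span> <new-tag>Hello</new-tag>'

puts count_markup_words(html)                 # 3 (tags removed)
puts count_markup_words(html, xhtml: 'keep')  # 4 (whitespace-delimited tokens)
puts html.scan(/[[:alnum:]]+/).size           # 12 (every alphanumeric run)
```

Only three of the tokens are actual words of content, which is why stripping markup before counting matters for text extracted from web pages or XML files.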

#### Slashes

##### Forward slash
```
she/he/it
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 3
wc (Unix)      | 1

##### Backslash
```
c:\Users\johndoe
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 3
wc (Unix)      | 1

#### Punctuation

##### Dotted line
```
.........
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

```
………………………
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

##### Dashed line
```
-----------
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

##### Underscore
```
____________
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

##### Punctuation mark surrounded by spaces
```
:
```
Tool           | Word Count
-------------- | ----------
Microsoft Word | 1
Pages          | 0
wc (Unix)      | 1

## Research

- *[So how many words do you think it is?](http://multifarious.filkin.com/2012/11/13/wordcount)* - Paul Filkin
- [Word Count](http://en.wikipedia.org/wiki/Word_count) - Wikipedia
- [Words Counted Ruby Gem](https://github.com/abitdodgy/words_counted) - Mohamad El-Husseini

## TODO

- Add language support for languages other than English
  - For most languages this is probably as simple as adding in the translations and abbreviations for months and days.
  - For languages that use a character count (Japanese, Chinese) there will be larger changes. For these languages, an option is needed for how to handle Roman words within the text.

## Contributing

1. Fork it ( https://github.com/diasks2/word_count_analyzer/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request

## License

The MIT License (MIT)

Copyright (c) 2015 Kevin S. Dias

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.