tweet_compressor 0.8.2

data/README.md ADDED
@@ -0,0 +1,125 @@
1
+ # tweet\_compressor
2
+
3
+ ## Copyright and Licensing
4
+
5
+ ### Copyright Notice
6
+
7
+ The copyright for the software, documentation, and associated files is
8
+ held by the author.
9
+
10
+ Copyright 2013 Todd A. Jacobs
11
+ All rights reserved.
12
+
13
+ The AUTHORS file is also included in the source tree.
14
+
15
+ ### Software License
16
+
17
+ ![GPLv3 Logo](http://www.gnu.org/graphics/gplv3-88x31.png)
18
+
19
+ The software is licensed under the
20
+ [GPLv3](http://www.gnu.org/copyleft/gpl.html). The LICENSE file is also
21
+ included in the source tree.
22
+
23
+ ### README License
24
+
25
+ ![Creative Commons BY-NC-SA
26
+ Logo](http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png)
27
+
28
+ This README is licensed under the [Creative Commons
29
+ Attribution-NonCommercial-ShareAlike 3.0 United States
30
+ License](http://creativecommons.org/licenses/by-nc-sa/3.0/us/).
31
+
32
+ ## Purpose
33
+
34
+ tweet\_compressor is a Ruby gem that performs successive text
35
+ transformations in order to shrink input text below Twitter's
36
+ 140-character limit while preserving the integrity of hashtags and
37
+ links.
38
+
39
+ ## Features
40
+
41
+ - Treats hashtags as sacrosanct.
42
+ - Relies on Twitter to shorten URLs for you, counting URLs as 20
43
+ characters.
44
+ - Skips the remaining shortening stages once the character count is 140 or fewer.
45
+ - Remains vaguely intelligible even under heavy compression.
46
+
47
+ ## Caveats and Limitations
48
+
49
+ 1. The gem performs text transformations; it's not a full parser.
50
+ 2. Some of the transformations may be naive or rely on brute force to
51
+ get the job done. YMMV.
52
+ 3. No sanity checking is performed on the semantics of the output text.
53
+ It Works for Me™, but it's not a substitute for applying common
54
+ sense and a keen eye to your tweets before posting on Twitter.
55
+ 4. Works best when you only need to trim a handful of characters. If
56
+ you're vastly over the limit, readability suffers as compression gets
57
+ tighter.
58
+
59
+ ## Supported Software Versions
60
+
61
+ This software is tested against the current Ruby 2.x series. It is
62
+ unlikely to work without minor editing on 1.9.3, and you're on your own
63
+ for anything earlier than 1.9.1.
64
+
65
+ - See [.ruby-version][20] for the currently-supported Ruby versions.
66
+ - See [Gemfile.lock][30] for a complete list of gems, including supported
67
+ versions, needed to build or run this project.
68
+
69
+ ## Installation and Setup
70
+
71
+ Installing tweet\_compressor couldn't be easier. Just follow these two
72
+ simple steps:
73
+
74
+ 1. `gem install tweet_compressor`
75
+ 2. There is no step two.
76
+
77
+ ## Usage
78
+
79
+ tweet_compressor <tweet>
80
+
81
+ ## Examples
82
+
83
+ No screenshots here, just samples of what you can expect to see on
84
+ standard output when you run the program.
85
+
86
+
87
+ - Example of text that requires no compression.
88
+
89
+ $ tweet_compressor foo
90
+ Chars: 3, Compression: 0.0%
91
+
92
+ foo
93
+
94
+ - Example of extremely heavy compression. Trims a 196-character tweet
95
+ quoting the Gettysburg Address down to 137 characters.
96
+
97
+ $ tweet_compressor 'Four score and seven years ago our fathers
98
+ brought forth on this continent a new nation, conceived in liberty,
99
+ and dedicated to the proposition that all men are created equal.
100
+ #speech #Lincoln'
101
+ Chars: 137, Compression: 28.65%
102
+
103
+ 4 scr &7 yrs ago our fthrs brght frth on ths cntnt a new ntn,cncvd
104
+ in lbrty,& dctd to the prpstn tht al men are crtd eql.#speech
105
+ #Lincoln
106
+
107
+ - Example of assumed compression from [Twitter's built-in URL
108
+ shortener.][10]
109
+
110
+ $ tweet_compressor 'http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters'
111
+ Chars: 20, Compression: 70.59%
112
+
113
+ http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters
114
+
115
+ ## Contributions Welcome
116
+
117
+ This is an open-source project. Contributors are highly encouraged to
118
+ open pull requests on GitHub.
119
+
120
+ ----
121
+ [Project Home Page](https://github.com/CodeGnome/tweet_compressor)
122
+
123
+ [10]: https://support.twitter.com/entries/109623
124
+ [20]: https://raw.github.com/CodeGnome/tweet_compressor/master/.ruby-version
125
+ [30]: https://raw.github.com/CodeGnome/tweet_compressor/master/Gemfile.lock
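The character counts and compression percentages in the README examples follow from simple arithmetic. The sketch below is a minimal illustration rather than the gem's own code: `virtual_length`, `compression_percent`, and the simplified `SIMPLE_URL` pattern are hypothetical names assumed for this example. Only the fixed 20-character cost per URL comes from the README.

```ruby
# Hypothetical helpers for illustration; not part of tweet_compressor.
URL_LENGTH = 20                 # README: each URL counts as 20 characters
SIMPLE_URL = %r{https?://\S+}   # simplified assumption; the gem's pattern is broader

# Length of the text as Twitter would count it, with every URL costing URL_LENGTH.
def virtual_length(text)
  urls = text.scan(SIMPLE_URL)
  text.size - urls.join.size + urls.size * URL_LENGTH
end

# Percentage saved relative to the original text, rounded to two places.
def compression_percent(original, effective_length)
  (100 - (effective_length / original.size.to_f) * 100).round(2)
end

url = 'http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters'
puts virtual_length(url)                            # 20
puts compression_percent(url, virtual_length(url))  # 70.59
```

The 68-character URL counts as 20, so the saving is 100 - (20 / 68) * 100, which rounds to 70.59 and matches the README's last example.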
data/Rakefile ADDED
@@ -0,0 +1,55 @@
1
+ begin
2
+ require 'bundler/gem_tasks' if Dir.glob('*gemspec').any?
3
+ require 'bundler/setup' if File.exist? 'Gemfile'
4
+ rescue LoadError => bundler_missing
5
+ $stderr.puts bundler_missing
6
+ end
7
+
8
+ require 'rake'
9
+
10
+ PROJECT_NAME = File.basename(Dir.pwd).sub /\.rb$/, ''
11
+
12
+ desc 'Update exuberant-ctags'
13
+ task :etags do
14
+ sh %{etags -R}
15
+ end
16
+
17
+ if Dir.exist? 'test'
18
+ require 'rake/testtask'
19
+
20
+ Rake::TestTask.new do |t|
21
+ t.test_files = FileList[ 'test*' ]
22
+ end
23
+ task :default => :test
24
+ end
25
+
26
+ if Dir.exist? 'spec'
27
+ require 'rspec/core/rake_task'
28
+ RSpec::Core::RakeTask.new(:spec)
29
+ task :default => :spec
30
+ end
31
+
32
+ desc 'Generate rdoc files'
33
+ task :rdoc do
34
+ excludes = %w[AUTHORS LICENSE README* *gemspec]
35
+ system "rdoc #{excludes.map { |file| "-x #{file}" }.join ' '}"
36
+ end
37
+
38
+ task :rename_objects do
39
+ FileList['lib/**/**', 'README*', '.ruby-version', '.rvm'].each do |oldfile|
40
+ next if File.directory? oldfile
41
+ text = File.read(oldfile)
42
+
43
+ next unless text.match /(require|module|class).*foo/i
44
+ text.gsub!(/foo/i, PROJECT_NAME)
45
+ File.open(oldfile, 'w') { |f| f.puts text }
46
+ end
47
+ end
48
+
49
+ desc 'Rename lib files/objects'
50
+ task :rename => :rename_objects do
51
+ libfiles = FileList['lib/**/**']
52
+ libfiles.gsub(/foo/, PROJECT_NAME).zip(libfiles).each do |f|
53
+ FileUtils.mv f[1], f[0] unless f.uniq.count == 1
54
+ end
55
+ end
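The `rename` task pairs each renamed path with its original and moves only the pairs that differ. A minimal sketch of that pairing, using plain arrays and hypothetical file names (nothing is touched on disk, and `project`, `old_names`, and `moves` are names assumed for this example):

```ruby
# Hypothetical inputs; the Rakefile derives these from the working directory.
project   = 'tweet_compressor'
old_names = ['lib/foo.rb', 'lib/foo/version.rb', 'lib/helper.rb']

# Build [new_name, old_name] pairs, then drop the pairs where nothing changed.
new_names = old_names.map { |name| name.gsub(/foo/, project) }
moves = new_names.zip(old_names).reject { |new_name, old_name| new_name == old_name }

moves.each { |new_name, old_name| puts "#{old_name} -> #{new_name}" }
```

Filtering on `new_name == old_name` plays the same role as the `f.uniq.count == 1` check in the task: a path that does not contain the placeholder is left alone.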
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative File.join '..', 'lib', 'tweet_compressor'
4
+
5
+ unless ARGV.size == 1
6
+ puts "Usage: #{File.basename $0} <tweet>"
7
+ exit 1
8
+ end
9
+
10
+ tweet = TweetCompressor::Tweet.new ARGV.join ' '
11
+ tweet.compress
12
+
13
+ $stderr.puts "Chars: #{tweet.char_count}, Compression: #{tweet.compression_level}%"
14
+ $stdout.puts ?\n, tweet.compressed
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $: << File.expand_path('..', __FILE__)
4
+
5
+ require 'tweet_compressor/compress'
6
+ require 'tweet_compressor/tweet'
7
+ require 'tweet_compressor/version'
@@ -0,0 +1,187 @@
1
+ # This module is a mixin for classes that want to use a very basic alphabetic
2
+ # shorthand to reduce text size. The module performs in-place operations, and
3
+ # expects to find a @compressed instance variable to work from.
4
+ #
5
+ # Example:
6
+ #
7
+ # include Compress
8
+ # @original = 'JavaScript'
9
+ # @compressed = @original.dup
10
+ # abbr
11
+ # # => "JS"
12
+ #
13
+ #
14
+ module Compress
15
+ URL_HOLDER = '__PLACEHOLDER4URLS__'
16
+ URL_LENGTH = 20
17
+ URL_PATTERN = %r{
18
+ \b
19
+ (
20
+ (?: [a-z][\w-]+:
21
+ (?: /{1,3} | [a-z0-9%] ) |
22
+ www\d{0,3}[.] |
23
+ [a-z0-9.\-]+[.][a-z]{2,4}/
24
+ )
25
+ (?:
26
+ [^\s()<>]+ | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
27
+ )+
28
+ (?:
29
+ \(([^\s()<>]+|(\([^\s()<>]+\)))*\) |
30
+ [^\s`!()\[\]{};:'".,<>?«»“”‘’]
31
+ )
32
+ )
33
+ }ix
34
+
35
+ # Calculate the current character count, taking the "virtual size" of
36
+ # Twitter-shortened URLs into account.
37
+ def char_count
38
+ real_url_chars = @urls.join.size
39
+ virt_url_chars = @urls.count * URL_LENGTH
40
+ @compressed.size - real_url_chars + virt_url_chars
41
+ end
42
+
43
+ private
44
+
45
+ # Special abbreviations to increase clarity.
46
+ #
47
+ # TODO: A YAML dictionary would be preferable to case statements if the list
48
+ # grows to any significant length.
49
+ def abbr
50
+ @compressed = @compressed.split.map do |word|
51
+ case word.downcase
52
+ when 'and' then '&'
53
+ when 'javascript' then 'JS'
54
+ when 'string' then 'str'
55
+ when 'one' then '1'
56
+ when 'two' then '2'
57
+ when 'three' then '3'
58
+ when 'four' then '4'
59
+ when 'five' then '5'
60
+ when 'six' then '6'
61
+ when 'seven' then '7'
62
+ when 'eight' then '8'
63
+ when 'nine' then '9'
64
+ when 'ten' then '10'
65
+ when 'eleven' then '11'
66
+ when 'twelve' then '12'
67
+ when 'thirteen' then '13'
68
+ when 'fourteen' then '14'
69
+ when 'fifteen' then '15'
70
+ when 'sixteen' then '16'
71
+ when 'seventeen' then '17'
72
+ when 'eighteen' then '18'
73
+ when 'nineteen' then '19'
74
+ when 'twenty' then '20'
75
+ else word
76
+ end
77
+ end.join ' '
78
+ @compressed.gsub! /is (?:an?|the)/, '='
79
+ @compressed.gsub! /(in|with)? regards? (to)?/i, 're'
80
+ @compressed.gsub! /about|regarding|related( to)?|(in response to)/, 're'
81
+ end
82
+
83
+ # Remove apostrophes from contractions to save more space.
84
+ def apostrophes
85
+ @compressed.gsub! /n't/i, 'nt'
86
+ end
87
+
88
+ # Identify common contractions, taking a few pains to preserve capitalization
89
+ # of the initial letter.
90
+ def contractions
91
+ @compressed.gsub! /I would/i, %q{I'd}
92
+ @compressed.gsub! /i will(?! ?not)/i, %q{I'll}
93
+ @compressed.gsub! /(i)t is/i, %q{\1t's}
94
+ @compressed.gsub! /(i)s not/i, %q{\1sn't}
95
+ @compressed.gsub! /(w)ill not/i, %q{\1on't}
96
+ @compressed.gsub! /(c)an ?not/i, %q{\1an't}
97
+ @compressed.gsub! /(d)o(es)? not/i, %q{\1o\2n't}
98
+ @compressed.gsub! /(s)hould not/i, %q{\1houldn't}
99
+ @compressed.gsub! /(m)ust not/i, %q{\1ustn't}
100
+ end
101
+
102
+ # Fix common grammar mistakes that also save space.
103
+ def correct_grammar
104
+ @compressed.gsub! /(s)'s/i, %q{\1'}
105
+ end
106
+
107
+ # Remove duplicate lowercase consonants. Assume duplicate capital letters
108
+ # like 'LLC' are intentional.
109
+ def dedupe_consonants
110
+ consonants = ('a'..'z').reject { |c| c =~ /[aeiou]/ }.join
111
+ regex = /([#{consonants}])\1+/
112
+ @compressed = @compressed.split.map do |word|
113
+ next word unless word =~ regex
114
+ word.gsub regex, '\1'
115
+ end.join ' '
116
+ end
117
+
118
+ # Remove duplicate punctuation characters. Make an exception for ellipses
119
+ # and dashes.
120
+ def dedupe_punct
121
+ regex = /([[:punct:]])\1+/
122
+ @compressed = @compressed.split.map do |word|
123
+ word.gsub! /\.{4,}/, '...'
124
+ word.gsub! /-{3,}/, '--'
125
+ next word if word.include? '...' or word.match /-{2,3}/
126
+ next word unless word =~ regex
127
+ word.gsub! regex, '\1'
128
+ end.join ' '
129
+ end
130
+
131
+ # Replace 'ing' with 'g'. Excludes short words like "ring" and "sing," and
132
+ # checks an exception list for special cases.
133
+ def ing
134
+ exceptions = %w[fling]
135
+ @compressed = @compressed.split.map do |word|
136
+ next word unless word.end_with? 'ing'
137
+ next word if word.start_with? '#'
138
+ next word if word.size <= 4
139
+ next word if exceptions.include? word
140
+ word.sub(/ing$/, 'g')
141
+ end.join ' '
142
+ end
143
+
144
+ # Remove lowercase vowels in longer words, unless it is the starting letter.
145
+ def remove_vowels
146
+ @compressed = @compressed.split.map do |word|
147
+ next word if word.start_with? '#'
148
+ word.size >= 4 ? word.gsub(/(?<!\A)[aeiou]/, '') : word
149
+ end.join ' '
150
+ end
151
+
152
+ # Remove spaces between punctuation marks and the following words.
153
+ def sentences
154
+ @compressed.gsub! /([[:punct:]])\s*(\S)/, '\1\2'
155
+ end
156
+
157
+ # Abbreviations common in texting, but with a higher cognitive load.
158
+ def texting
159
+ @compressed.gsub! /is (?:an?|the)/, '='
160
+ @compressed.gsub! /:.\)|\(.:/, ':)'
161
+ @compressed.gsub! /(in|with)? regards? (to)?/i, 're'
162
+ @compressed.gsub! /about|regarding|related( to)?|(in response to)/i, 're'
163
+ @compressed.gsub! /(RT @[^:\b]+):?/, '\1'
164
+ @compressed.gsub! /\bare\b/, 'r'
165
+ @compressed.gsub! /\bfor\b/, '4'
166
+ @compressed.gsub! /\bto/, '2'
167
+ @compressed.gsub! /why/, 'y'
168
+ @compressed.gsub! /you/, 'u'
169
+ end
170
+
171
+ # Regularize whitespace.
172
+ def whitespace
173
+ @compressed = @compressed.split.join ' '
174
+ end
175
+
176
+ # Temporarily remove URLs from the pattern space so that they don't get horked
177
+ # during other text transformations.
178
+ def url_preserve
179
+ @urls = @compressed.scan(URL_PATTERN).map(&:first)
180
+ @urls.each { |url| @compressed.gsub! url, URL_HOLDER }
181
+ end
182
+
183
+ # Return stored URLs to the pattern space.
184
+ def url_restore
185
+ @urls.each { |url| @compressed.sub! URL_HOLDER, url }
186
+ end
187
+ end
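The `url_preserve` / `url_restore` pair is a placeholder swap: URLs are lifted out before the destructive transformations run and put back afterwards. The sketch below shows the idea in isolation. It is not the module's code: `preserve_urls`, `restore_urls`, `HOLDER`, and the simplified `SIMPLE_URL` pattern are hypothetical names assumed for this example.

```ruby
# Hypothetical stand-ins for illustration; the module's own pattern is far broader.
HOLDER     = '__PLACEHOLDER4URLS__'
SIMPLE_URL = %r{https?://\S+}

# Swap every URL for a placeholder and return the URLs in order of appearance.
def preserve_urls(text)
  urls = text.scan(SIMPLE_URL)
  # A String pattern is matched literally, so '.' and '?' inside a URL are safe.
  urls.each { |url| text.sub!(url, HOLDER) }
  urls
end

# Put the stored URLs back, one placeholder at a time.
def restore_urls(text, urls)
  # The block form avoids backslash interpretation in the replacement.
  urls.each { |url| text.sub!(HOLDER) { url } }
  text
end

text = 'read the docs at http://example.com/a?b=1 today'.dup
urls = preserve_urls(text)
text.gsub!(/[aeiou]/, '')   # stand-in for a destructive transformation
restore_urls(text, urls)
puts text                   # rd th dcs t http://example.com/a?b=1 tdy
```

Passing the URL as a String rather than interpolating it into a Regexp matters here: characters such as `.` and `?` would otherwise be treated as regex metacharacters.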
@@ -0,0 +1,38 @@
1
+ module TweetCompressor
2
+ class Tweet
3
+ MAX_LENGTH = 140
4
+ attr_reader :compressed, :original, :urls
5
+
6
+ def initialize tweet=''
7
+ @original, @compressed = tweet, tweet.dup
8
+ @urls = []
9
+ end
10
+
11
+ # The workhorse method that calls each compression stage in turn as long as
12
+ # the tweet text remains larger than 140 characters.
13
+ def compress
14
+ # Always perform, in order to track URL shortening.
15
+ url_preserve
16
+
17
+ stages = %i[whitespace correct_grammar contractions
18
+ dedupe_punct abbr remove_vowels dedupe_consonants apostrophes
19
+ sentences]
20
+ stages.each do |stage|
21
+ break if char_count <= MAX_LENGTH
22
+ self.send stage
23
+ end
24
+
25
+ # Must not be a stage, which may be bypassed.
26
+ url_restore
27
+
28
+ @compressed
29
+ end
30
+
31
+ def compression_level
32
+ (100 - ((char_count / @original.size.to_f) * 100)).round 2
33
+ end
34
+
35
+ private
36
+ include Compress
37
+ end
38
+ end
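The `compress` method is a staged pipeline with an early exit: each stage runs only while the text is still over the limit. A minimal sketch of that control flow, assuming a hypothetical `TinyCompressor` class with illustrative stage names and a deliberately small limit (none of these names come from the gem):

```ruby
# Hypothetical miniature of a staged compressor; stage names are illustrative.
class TinyCompressor
  LIMIT = 12 # small limit so the example is easy to follow

  attr_reader :text, :applied

  def initialize(text)
    @text = text.dup # work on a copy so the caller's string is untouched
    @applied = []
  end

  # Run each stage in order, stopping as soon as the text fits.
  def compress
    %i[squeeze_whitespace ampersand drop_vowels].each do |stage|
      break if @text.size <= LIMIT
      send(stage)
      @applied << stage
    end
    @text
  end

  private

  def squeeze_whitespace
    @text = @text.split.join(' ')
  end

  def ampersand
    @text = @text.gsub(/\band\b/, '&')
  end

  def drop_vowels
    @text = @text.gsub(/(?<!\A)[aeiou]/, '')
  end
end

c = TinyCompressor.new('tea   and   toast')
puts c.compress   # tea & toast
p c.applied       # [:squeeze_whitespace, :ampersand]
```

The most aggressive stage, `drop_vowels`, never runs here because the text already fits after the second stage, which is why ordering stages from least to most destructive keeps the output readable.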
@@ -0,0 +1,3 @@
1
+ module TweetCompressor
2
+ VERSION = '0.8.2'
3
+ end
@@ -0,0 +1,6 @@
1
+ $: << Dir.pwd
2
+
3
+ require 'simplecov'
4
+ SimpleCov.start
5
+
6
+ require 'tweet_compressor'