tweet_compressor 0.8.2

data/README.md ADDED
@@ -0,0 +1,125 @@
1
+ # tweet\_compressor
2
+
3
+ ## Copyright and Licensing
4
+
5
+ ### Copyright Notice
6
+
7
+ The copyright for the software, documentation, and associated files is
8
+ held by the author.
9
+
10
+ Copyright 2013 Todd A. Jacobs
11
+ All rights reserved.
12
+
13
+ The AUTHORS file is also included in the source tree.
14
+
15
+ ### Software License
16
+
17
+ ![GPLv3 Logo](http://www.gnu.org/graphics/gplv3-88x31.png)
18
+
19
+ The software is licensed under the
20
+ [GPLv3](http://www.gnu.org/copyleft/gpl.html). The LICENSE file is also
21
+ included in the source tree.
22
+
23
+ ### README License
24
+
25
+ ![Creative Commons BY-NC-SA
26
+ Logo](http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png)
27
+
28
+ This README is licensed under the [Creative Commons
29
+ Attribution-NonCommercial-ShareAlike 3.0 United States
30
+ License](http://creativecommons.org/licenses/by-nc-sa/3.0/us/).
31
+
32
+ ## Purpose
33
+
34
+ tweet\_compressor is a Ruby gem that performs successive text
35
+ transformations in order to shrink input text below Twitter's
36
+ 140-character limit while preserving the integrity of hashtags and
37
+ links.
38
+
39
+ ## Features
40
+
41
+ - Treats hashtags as sacrosanct.
42
+ - Relies on Twitter to shorten URLs for you, counting URLs as 20
43
+ characters.
44
+ - Skips remaining shortening stages once the character count is 140 or less.
45
+ - Remains vaguely intelligible even under heavy compression.
46
+
47
+ ## Caveats and Limitations
48
+
49
+ 1. The gem performs text transformations; it's not a full parser.
50
+ 2. Some of the transformations may be naive or rely on brute force to
51
+ get the job done. YMMV.
52
+ 3. No sanity checking is performed on the semantics of the output text.
53
+ It Works for Me™, but it's not a substitute for applying common
54
+ sense and a keen eye to your tweets before posting on Twitter.
55
+ 4. Works best when you only need to trim a handful of characters. If
56
+ you're vastly over the limit, readability suffers as compression gets
57
+ tighter.
58
+
59
+ ## Supported Software Versions
60
+
61
+ This software is tested against the current Ruby 2.x series. It is
62
+ unlikely to work without minor editing on 1.9.3, and you're on your own
63
+ for anything earlier than 1.9.1.
64
+
65
+ - See [.ruby-version][20] for the currently-supported Ruby versions.
66
+ - See [Gemfile.lock][30] for a complete list of gems, including supported
67
+ versions, needed to build or run this project.
68
+
69
+ ## Installation and Setup
70
+
71
+ Installing tweet\_compressor couldn't be easier. Just follow these two
72
+ simple steps:
73
+
74
+ 1. `gem install tweet_compressor`
75
+ 2. There is no step two.
76
+
77
+ ## Usage
78
+
79
+ tweet_compressor <tweet>
80
+
81
+ ## Examples
82
+
83
+ No screenshots here, just samples of what you can expect to see on
84
+ standard output when you run the program.
85
+
86
+
87
+ - Example of text that requires no compression.
88
+
89
+ $ tweet_compressor foo
90
+ Chars: 3, Compression: 0.0%
91
+
92
+ foo
93
+
94
+ - Example of extremely heavy compression. Trims 196 characters from the
95
+ Gettysburg Address down to 137.
96
+
97
+ $ tweet_compressor 'Four score and seven years ago our fathers
98
+ brought forth on this continent a new nation, conceived in liberty,
99
+ and dedicated to the proposition that all men are created equal.
100
+ #speech #Lincoln'
101
+ Chars: 137, Compression: 28.65%
102
+
103
+ 4 scr &7 yrs ago our fthrs brght frth on ths cntnt a new ntn,cncvd
104
+ in lbrty,& dctd to the prpstn tht al men are crtd eql.#speech
105
+ #Lincoln
106
+
107
+ - Example of assumed compression from [Twitter's built-in URL
108
+ shortener.][10]
109
+
110
+ $ tweet_compressor 'http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters'
111
+ Chars: 20, Compression: 70.59%
112
+
113
+ http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters
114
+
115
+ ## Contributions Welcome
116
+
117
+ This is an open-source project. Contributors are highly encouraged to
118
+ open pull-requests on GitHub.
119
+
120
+ ----
121
+ [Project Home Page](https://github.com/CodeGnome/tweet_compressor)
122
+
123
+ [10]: https://support.twitter.com/entries/109623
124
+ [20]: https://raw.github.com/CodeGnome/tweet_compressor/master/.ruby-version
125
+ [30]: https://raw.github.com/CodeGnome/tweet_compressor/master/Gemfile.lock
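The README's claim that URLs count as 20 characters can be illustrated with a short sketch. This is an illustration under stated assumptions, not the gem's code: `virtual_length` and `SIMPLE_URL` are hypothetical names, and the pattern is a simplified stand-in that only recognizes `http://` and `https://` links.

```ruby
# Illustrative sketch only: not the gem's implementation.
URL_LENGTH = 20                  # virtual size of a link, per the README
SIMPLE_URL = %r{https?://\S+}    # hypothetical, simplified URL pattern

# Length of the text when each URL is counted as URL_LENGTH characters.
def virtual_length(text)
  urls = text.scan(SIMPLE_URL)
  text.size - urls.join.size + urls.count * URL_LENGTH
end

puts virtual_length('foo') # prints 3
puts virtual_length('http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters') # prints 20
```

The second call mirrors the README's URL example: the 68-character link is counted as 20.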
data/Rakefile ADDED
@@ -0,0 +1,55 @@
1
+ begin
2
+ require 'bundler/gem_tasks' if Dir.glob('*gemspec').any?
3
+ require 'bundler/setup' if File.exist? 'Gemfile'
4
+ rescue LoadError => bundler_missing
5
+ $stderr.puts bundler_missing
6
+ end
7
+
8
+ require 'rake'
9
+
10
+ PROJECT_NAME = File.basename(Dir.pwd).sub /\.rb$/, ''
11
+
12
+ desc 'Update exuberant-ctags'
13
+ task :etags do
14
+ sh %{etags -R}
15
+ end
16
+
17
+ if Dir.exist? 'test'
18
+ require 'rake/testtask'
19
+
20
+ Rake::TestTask.new do |t|
21
+ t.test_files = FileList[ 'test*' ]
22
+ end
23
+ task :default => :test
24
+ end
25
+
26
+ if Dir.exist? 'spec'
27
+ require 'rspec/core/rake_task'
28
+ RSpec::Core::RakeTask.new(:spec)
29
+ task :default => :spec
30
+ end
31
+
32
+ desc 'Generate rdoc files'
33
+ task :rdoc do
34
+ excludes = %w[AUTHORS LICENSE README* *gemspec]
35
+ system "rdoc #{excludes.map { |file| "-x #{file}" }.join ' '}"
36
+ end
37
+
38
+ task :rename_objects do
39
+ FileList['lib/**/**', 'README*', '.ruby-version', '.rvm'].each do |oldfile|
40
+ next if File.directory? oldfile
41
+ text = File.read(oldfile)
42
+
43
+ next unless text.match /(require|module|class).*foo/i
44
+ text.gsub!(/foo/i, PROJECT_NAME)
45
+ File.open(oldfile, 'w') { |f| f.puts text }
46
+ end
47
+ end
48
+
49
+ desc 'Rename lib files/objects'
50
+ task :rename => :rename_objects do
51
+ libfiles = FileList['lib/**/**']
52
+ libfiles.gsub(/foo/, PROJECT_NAME).zip(libfiles).each do |f|
53
+ FileUtils.mv f[1], f[0] unless f.uniq.count == 1
54
+ end
55
+ end
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require_relative File.join '..', 'lib', 'tweet_compressor'
4
+
5
+ unless ARGV.size == 1
6
+ puts "Usage: #{File.basename $0} <tweet>"
7
+ exit 1
8
+ end
9
+
10
+ tweet = TweetCompressor::Tweet.new ARGV.join ' '
11
+ tweet.compress
12
+
13
+ $stderr.puts "Chars: #{tweet.char_count}, Compression: #{tweet.compression_level}%"
14
+ $stdout.puts ?\n, tweet.compressed
@@ -0,0 +1,7 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $: << File.join(Dir.pwd, 'lib')
4
+
5
+ require 'tweet_compressor/compress'
6
+ require 'tweet_compressor/tweet'
7
+ require 'tweet_compressor/version'
@@ -0,0 +1,187 @@
1
+ # This module is a mixin for classes that want to use a very basic alphabetic
2
+ # shorthand to reduce text size. The module performs in-place operations, and
3
+ # expects to find a @compressed instance variable to work from.
4
+ #
5
+ # Example:
6
+ #
7
+ # include Compress
8
+ # @original = 'JavaScript'
9
+ # @compressed = @original.dup
10
+ # abbr
11
+ # # => "JS"
12
+ #
13
+ #
14
+ module Compress
15
+ URL_HOLDER = '__PLACEHOLDER4URLS__'
16
+ URL_LENGTH = 20
17
+ URL_PATTERN = %r{
18
+ \b
19
+ (
20
+ (?: [a-z][\w-]+:
21
+ (?: /{1,3} | [a-z0-9%] ) |
22
+ www\d{0,3}[.] |
23
+ [a-z0-9.\-]+[.][a-z]{2,4}/
24
+ )
25
+ (?:
26
+ [^\s()<>]+ | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
27
+ )+
28
+ (?:
29
+ \(([^\s()<>]+|(\([^\s()<>]+\)))*\) |
30
+ [^\s`!()\[\]{};:'".,<>?«»“”‘’]
31
+ )
32
+ )
33
+ }ix
34
+
35
+ # Calculate the current character count, taking the "virtual size" of
36
+ # Twitter-shortened URLs into account.
37
+ def char_count
38
+ real_url_chars = @urls.join.size
39
+ virt_url_chars = @urls.count * URL_LENGTH
40
+ @compressed.size - real_url_chars + virt_url_chars
41
+ end
42
+
43
+ private
44
+
45
+ # Special abbreviations to increase clarity.
46
+ #
47
+ # TODO: A YAML dictionary would be preferable to case statements if the list
48
+ # grows to any significant length.
49
+ def abbr
50
+ @compressed = @compressed.split.map do |word|
51
+ case word.downcase
52
+ when 'and' then '&'
53
+ when 'javascript' then 'JS'
54
+ when 'string' then 'str'
55
+ when 'one' then '1'
56
+ when 'two' then '2'
57
+ when 'three' then '3'
58
+ when 'four' then '4'
59
+ when 'five' then '5'
60
+ when 'six' then '6'
61
+ when 'seven' then '7'
62
+ when 'eight' then '8'
63
+ when 'nine' then '9'
64
+ when 'ten' then '10'
65
+ when 'eleven' then '11'
66
+ when 'twelve' then '12'
67
+ when 'thirteen' then '13'
68
+ when 'fourteen' then '14'
69
+ when 'fifteen' then '15'
70
+ when 'sixteen' then '16'
71
+ when 'seventeen' then '17'
72
+ when 'eighteen' then '18'
73
+ when 'nineteen' then '19'
74
+ when 'twenty' then '20'
75
+ else word
76
+ end
77
+ end.join ' '
78
+ @compressed.gsub! /is (?:an?|the)/, '='
79
+ @compressed.gsub! /(in|with)? regards? (to)?/i, 're'
80
+ @compressed.gsub! /about|regarding|related( to)?|(in response to)/, 're'
81
+ end
82
+
83
+ # Remove apostrophes from contractions to save more space.
84
+ def apostrophes
85
+ @compressed.gsub! /n't/i, 'nt'
86
+ end
87
+
88
+ # Identify common contractions, taking a few pains to preserve capitalization
89
+ # of the initial letter.
90
+ def contractions
91
+ @compressed.gsub! /I would/i, %q{I'd}
92
+ @compressed.gsub! /i will(?! ?not)/i, %q{I'll}
93
+ @compressed.gsub! /(i)t is/i, %q{\1t's}
94
+ @compressed.gsub! /(i)s not/i, %q{\1sn't}
95
+ @compressed.gsub! /(w)ill not/i, %q{\1on't}
96
+ @compressed.gsub! /(c)an ?not/i, %q{\1an't}
97
+ @compressed.gsub! /(d)o(es)? not/i, %q{\1o\2n't}
98
+ @compressed.gsub! /(s)hould not/i, %q{\1houldn't}
99
+ @compressed.gsub! /(m)ust not/i, %q{\1ustn't}
100
+ end
101
+
102
+ # Fix common grammar mistakes that also save space.
103
+ def correct_grammar
104
+ @compressed.gsub! /(s)'s/i, %q{\1'}
105
+ end
106
+
107
+ # Remove duplicate lowercase consonants. Assume duplicate capital letters
108
+ # like 'LLC' are intentional.
109
+ def dedupe_consonants
110
+ consonants = [*'a'..'z'].flatten.reject { |c| c =~ /[aeiou]/ }
111
+ regex = /([#{consonants.join}])\1+/
112
+ @compressed = @compressed.split.map do |word|
113
+ next word unless word =~ regex
114
+ word.gsub regex, '\1'
115
+ end.join ' '
116
+ end
117
+
118
+ # Remove duplicate punctuation characters. Make an exception for ellipses
119
+ # and dashes.
120
+ def dedupe_punct
121
+ regex = /([[:punct:]])\1+/
122
+ @compressed = @compressed.split.map do |word|
123
+ word.gsub! /\.{4,}/, '...'
124
+ word.gsub! /-{3,}/, '--'
125
+ next word if word.include? '...' or word.match /-{2,3}/
126
+ next word unless word =~ regex
127
+ word.gsub! regex, '\1'
128
+ end.join ' '
129
+ end
130
+
131
+ # Replace 'ing' with 'g'. Excludes short words like "ring" and "sing," and
132
+ # checks an exception list for special cases.
133
+ def ing
134
+ exceptions = %w[fling]
135
+ @compressed = @compressed.split.map do |word|
136
+ next word unless word.end_with? 'ing'
137
+ next word if word.start_with? '#'
138
+ next word if word.size <= 4
139
+ next word if exceptions.include? word
140
+ word.sub(/ing$/, 'g')
141
+ end.join ' '
142
+ end
143
+
144
+ # Remove lowercase vowels in longer words, unless it is the starting letter.
145
+ def remove_vowels
146
+ @compressed = @compressed.split.map do |word|
147
+ next word if word.start_with? '#'
148
+ word.size >= 4 ? word.gsub(/(?<!\A)[aeiou]/, '') : word
149
+ end.join ' '
150
+ end
151
+
152
+ # Remove spaces between punctuation marks and the following words.
153
+ def sentences
154
+ @compressed.gsub! /([[:punct:]])\s*(\S)/, '\1\2'
155
+ end
156
+
157
+ # Abbreviations common in texting, but with a higher cognitive load.
158
+ def texting
159
+ @compressed.gsub! /is (?:an?|the)/, '='
160
+ @compressed.gsub! /:.\)|\(.:/, ':)'
161
+ @compressed.gsub! /(in|with)? regards? (to)?/i, 're'
162
+ @compressed.gsub! /about|regarding|related( to)?|(in response to)/i, 're'
163
+ @compressed.gsub! /(RT @[^:\b]+):?/, '\1'
164
+ @compressed.gsub! /\bare\b/, 'r'
165
+ @compressed.gsub! /\bfor\b/, '4'
166
+ @compressed.gsub! /\bto/, '2'
167
+ @compressed.gsub! /why/, 'y'
168
+ @compressed.gsub! /you/, 'u'
169
+ end
170
+
171
+ # Regularize whitespace.
172
+ def whitespace
173
+ @compressed = @compressed.split.join ' '
174
+ end
175
+
176
+ # Temporarily remove URLs from the pattern space so that they don't get horked
177
+ # during other text transformations.
178
+ def url_preserve
179
+ @urls = @compressed.scan(/#{URL_PATTERN}/).flatten.compact
180
+ @urls.each { |url| @compressed.gsub! url, URL_HOLDER }
181
+ end
182
+
183
+ # Return stored URLs to the pattern space.
184
+ def url_restore
185
+ @urls.each { |url| @compressed.sub! URL_HOLDER, url }
186
+ end
187
+ end
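The placeholder technique used by `url_preserve` and `url_restore` can be sketched in isolation. The following is a minimal illustration under stated assumptions, not the module's code: `with_urls_preserved`, `HOLDER`, and `LINK_PATTERN` are hypothetical names, and the pattern is a simplified stand-in for `URL_PATTERN`. It passes each URL to `sub` as a String, which Ruby matches literally, so characters such as `?` and `.` in a link are not treated as regex metacharacters.

```ruby
# Illustrative sketch only: hypothetical names, simplified URL pattern.
HOLDER       = '__PLACEHOLDER4URLS__'
LINK_PATTERN = %r{https?://\S+}

# Mask URLs, yield the masked text to a transformation block, then
# put the URLs back in the order in which they were found.
def with_urls_preserved(text)
  urls = text.scan(LINK_PATTERN)
  # A String pattern is matched literally by String#sub.
  masked = urls.reduce(text) { |acc, url| acc.sub(url, HOLDER) }
  transformed = yield masked
  # The block form avoids interpreting backslashes in the replacement.
  urls.reduce(transformed) { |acc, url| acc.sub(HOLDER) { url } }
end

result = with_urls_preserved('you should read http://example.com/?q=you today') do |text|
  text.gsub('you', 'u')
end
puts result # prints "u should read http://example.com/?q=you today"
```

The transformation shortens "you" in the prose but leaves the query string inside the link untouched.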
@@ -0,0 +1,38 @@
1
+ module TweetCompressor
2
+ class Tweet
3
+ MAX_LENGTH = 140
4
+ attr_reader :compressed, :original, :urls
5
+
6
+ def initialize tweet=''
7
+ @original, @compressed = tweet, tweet.dup
8
+ @urls = []
9
+ end
10
+
11
+ # The workhorse method that calls each compression stage in turn as long as
12
+ # the tweet text remains larger than 140 characters.
13
+ def compress
14
+ # Always perform, in order to track URL shortening.
15
+ url_preserve
16
+
17
+ stages = %i[whitespace correct_grammar contractions
18
+ dedupe_punct abbr remove_vowels dedupe_consonants apostrophes
19
+ sentences]
20
+ stages.each do |stage|
21
+ break if char_count <= MAX_LENGTH
22
+ self.send stage
23
+ end
24
+
25
+ # Must not be a stage, which may be bypassed.
26
+ url_restore
27
+
28
+ @compressed
29
+ end
30
+
31
+ def compression_level
32
+ (100 - ((char_count / @original.size.to_f) * 100)).round 2
33
+ end
34
+
35
+ private
36
+ include Compress
37
+ end
38
+ end
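The staged approach in `compress`, together with the percentage reported by `compression_level`, can be sketched as a standalone pipeline. This is a hedged illustration, not the class's code: the three lambdas in `STAGES` are simplified stand-ins for the gem's transformations, and `compress_text` and `compression_percent` are hypothetical helpers that work on plain strings and sizes.

```ruby
# Illustrative sketch only: stand-in stages, hypothetical helper names.
MAX_LENGTH = 140

STAGES = [
  ->(s) { s.split.join(' ') },        # regularize whitespace
  ->(s) { s.gsub(/\band\b/i, '&') },  # abbreviate "and"
  lambda do |s|                       # drop non-leading vowels in longer words
    s.split.map { |w| w.size >= 4 ? w[0] + w[1..-1].delete('aeiou') : w }.join(' ')
  end
]

# Apply stages in order, stopping as soon as the text fits the limit.
def compress_text(text, limit = MAX_LENGTH)
  STAGES.reduce(text) do |current, stage|
    break current if current.size <= limit
    stage.call(current)
  end
end

# Percentage saved, rounded to two decimal places.
def compression_percent(original_size, final_size)
  (100 - (final_size / original_size.to_f) * 100).round(2)
end

puts compress_text('salt  and   pepper', 13) # prints "salt & pepper"
puts compress_text('salt  and   pepper', 10) # prints "slt & pppr"
puts compression_percent(68, 20)             # prints 70.59
```

With a limit of 13 the vowel-removal stage is never reached, which is the early-exit behavior the `break` in `compress` is meant to provide.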
@@ -0,0 +1,3 @@
1
+ module TweetCompressor
2
+ VERSION = '0.8.2'
3
+ end
@@ -0,0 +1,6 @@
1
+ $: << Dir.pwd
2
+
3
+ require 'simplecov'
4
+ SimpleCov.start
5
+
6
+ require 'tweet_compressor'