tweet_compressor 0.8.2
- data/.gitignore +50 -0
- data/.rspec +2 -0
- data/.ruby-version +1 -0
- data/AUTHORS +2 -0
- data/Gemfile +12 -0
- data/Gemfile.lock +48 -0
- data/Guardfile +5 -0
- data/LICENSE +676 -0
- data/README.md +125 -0
- data/Rakefile +55 -0
- data/bin/tweet_compressor +14 -0
- data/lib/tweet_compressor.rb +7 -0
- data/lib/tweet_compressor/compress.rb +187 -0
- data/lib/tweet_compressor/tweet.rb +38 -0
- data/lib/tweet_compressor/version.rb +3 -0
- data/spec/spec_helper.rb +6 -0
- data/spec/tweet_compressor_spec.rb +274 -0
- data/tweet_compressor.gemspec +26 -0
- metadata +75 -0
data/README.md
ADDED
@@ -0,0 +1,125 @@
|
|
1
|
+
# tweet\_compressor
|
2
|
+
|
3
|
+
## Copyright and Licensing
|
4
|
+
|
5
|
+
### Copyright Notice
|
6
|
+
|
7
|
+
The copyright for the software, documentation, and associated files is
|
8
|
+
held by the author.
|
9
|
+
|
10
|
+
Copyright 2013 Todd A. Jacobs
|
11
|
+
All rights reserved.
|
12
|
+
|
13
|
+
The AUTHORS file is also included in the source tree.
|
14
|
+
|
15
|
+
### Software License
|
16
|
+
|
17
|
+
![GPLv3 Logo](http://www.gnu.org/graphics/gplv3-88x31.png)
|
18
|
+
|
19
|
+
The software is licensed under the
|
20
|
+
[GPLv3](http://www.gnu.org/copyleft/gpl.html). The LICENSE file is also
|
21
|
+
included in the source tree.
|
22
|
+
|
23
|
+
### README License
|
24
|
+
|
25
|
+
![Creative Commons BY-NC-SA
|
26
|
+
Logo](http://i.creativecommons.org/l/by-nc-sa/3.0/us/88x31.png)
|
27
|
+
|
28
|
+
This README is licensed under the [Creative Commons
|
29
|
+
Attribution-NonCommercial-ShareAlike 3.0 United States
|
30
|
+
License](http://creativecommons.org/licenses/by-nc-sa/3.0/us/).
|
31
|
+
|
32
|
+
## Purpose
|
33
|
+
|
34
|
+
tweet\_compressor is a Ruby gem that performs successive text
|
35
|
+
transformations in order to shrink input text below Twitter's
|
36
|
+
140-character limit while preserving the integrity of hashtags and
|
37
|
+
links.
|
38
|
+
|
39
|
+
## Features
|
40
|
+
|
41
|
+
- Treats hashtags as sacrosanct.
|
42
|
+
- Relies on Twitter to shorten URLs for you, counting URLs as 20
|
43
|
+
characters.
|
44
|
+
- Skips shortening stages whenever the character length drops below 140.
|
45
|
+
- Remains vaguely intelligible even under heavy compression.
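
The URL-counting rule in the list above can be illustrated with a minimal sketch. It assumes a deliberately simplified URL pattern (`SIMPLE_URL`) and a hypothetical helper name (`virtual_length`); neither is part of the gem, whose own pattern is considerably more thorough. Only the 20-character figure comes from this README.

```ruby
# Assumption: every URL is counted as 20 characters, as stated above.
URL_LENGTH = 20

# Assumption: a simplified stand-in pattern, not the gem's URL_PATTERN.
SIMPLE_URL = %r{https?://\S+}

# Hypothetical helper: length of the text as Twitter would count it,
# with each URL's real length swapped for the fixed virtual length.
def virtual_length(text)
  urls = text.scan(SIMPLE_URL)
  text.size - urls.sum(&:size) + urls.size * URL_LENGTH
end

puts virtual_length('foo')
puts virtual_length('see http://example.com/a/very/long/path now')
```

Text without URLs is counted as-is; each URL contributes exactly 20 characters regardless of its real length.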
|
46
|
+
|
47
|
+
## Caveats and Limitations
|
48
|
+
|
49
|
+
1. The gem performs text transformations; it's not a full parser.
|
50
|
+
2. Some of the transformations may be naive or rely on brute force to
|
51
|
+
get the job done. YMMV.
|
52
|
+
3. No sanity checking is performed on the semantics of the output text.
|
53
|
+
It Works for Me™, but it's not a substitute for applying common
|
54
|
+
sense and a keen eye to your tweets before posting on Twitter.
|
55
|
+
4. Works best when you only need to trim a handful of characters. If
|
56
|
+
you're vastly over the limit, readability suffers as compression gets
|
57
|
+
tighter.
|
58
|
+
|
59
|
+
## Supported Software Versions
|
60
|
+
|
61
|
+
This software is tested against the current Ruby 2.x series. It is
|
62
|
+
unlikely to work without minor editing on 1.9.3, and you're on your own
|
63
|
+
for anything earlier than 1.9.1.
|
64
|
+
|
65
|
+
- See [.ruby-version][20] for the currently-supported Ruby versions.
|
66
|
+
- See [Gemfile.lock][30] for a complete list of gems, including supported
|
67
|
+
versions, needed to build or run this project.
|
68
|
+
|
69
|
+
## Installation and Setup
|
70
|
+
|
71
|
+
Installing tweet\_compressor couldn't be easier. Just follow these two
|
72
|
+
simple steps:
|
73
|
+
|
74
|
+
1. `gem install tweet_compressor`
|
75
|
+
2. There is no step two.
|
76
|
+
|
77
|
+
## Usage
|
78
|
+
|
79
|
+
tweet_compressor <tweet>
|
80
|
+
|
81
|
+
## Examples
|
82
|
+
|
83
|
+
No screenshots here, just samples of what you can expect to see on
|
84
|
+
standard output when you run the program.
|
85
|
+
|
86
|
+
|
87
|
+
- Example of text that requires no compression.
|
88
|
+
|
89
|
+
$ tweet_compressor foo
|
90
|
+
Chars: 3, Compression: 0.0%
|
91
|
+
|
92
|
+
foo
|
93
|
+
|
94
|
+
- Example of extremely heavy compression. Trims 196 characters about the
|
95
|
+
Gettysburg Address down to 137.
|
96
|
+
|
97
|
+
$ tweet_compressor 'Four score and seven years ago our fathers
|
98
|
+
brought forth on this continent a new nation, conceived in liberty,
|
99
|
+
and dedicated to the proposition that all men are created equal.
|
100
|
+
#speech #Lincoln'
|
101
|
+
Chars: 137, Compression: 28.65%
|
102
|
+
|
103
|
+
4 scr &7 yrs ago our fthrs brght frth on ths cntnt a new ntn,cncvd
|
104
|
+
in lbrty,& dctd to the prpstn tht al men are crtd eql.#speech
|
105
|
+
#Lincoln
|
106
|
+
|
107
|
+
- Example of assumed compression from [Twitter's built-in URL
|
108
|
+
shortener.][10]
|
109
|
+
|
110
|
+
$ tweet_compressor 'http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters'
|
111
|
+
Chars: 20, Compression: 70.59%
|
112
|
+
|
113
|
+
http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters
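
The percentage reported above is consistent with comparing the counted length against the original length. A minimal sketch of that arithmetic, using a hypothetical helper name (`compression_percent`) rather than the gem's API:

```ruby
# Hypothetical helper: percentage saved, rounded to two decimal places.
# original_size is the raw input length; counted_size is the length
# after compression, with URLs counted at their virtual size.
def compression_percent(original_size, counted_size)
  (100 - (counted_size / original_size.to_f) * 100).round(2)
end

puts compression_percent(68, 20) # the 68-character URL above => 70.59
puts compression_percent(3, 3)   # "foo" needs no compression => 0.0
```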
|
114
|
+
|
115
|
+
## Contributions Welcome
|
116
|
+
|
117
|
+
This is an open-source project. Contributors are highly encouraged to
|
118
|
+
open pull requests on GitHub.
|
119
|
+
|
120
|
+
----
|
121
|
+
[Project Home Page](https://github.com/CodeGnome/tweet_compressor)
|
122
|
+
|
123
|
+
[10]: https://support.twitter.com/entries/109623
|
124
|
+
[20]: https://raw.github.com/CodeGnome/tweet_compressor/master/.ruby-version
|
125
|
+
[30]: https://raw.github.com/CodeGnome/tweet_compressor/master/Gemfile.lock
|
data/Rakefile
ADDED
@@ -0,0 +1,55 @@
|
|
1
|
+
begin
|
2
|
+
require 'bundler/gem_tasks' if Dir.glob('*gemspec').any?
|
3
|
+
require 'bundler/setup' if File.exist? 'Gemfile'
|
4
|
+
rescue LoadError => bundler_missing
|
5
|
+
$stderr.puts bundler_missing
|
6
|
+
end
|
7
|
+
|
8
|
+
require 'rake'
|
9
|
+
|
10
|
+
PROJECT_NAME = File.basename(Dir.pwd).sub /\.rb$/, ''
|
11
|
+
|
12
|
+
desc 'Update exuberant-ctags'
|
13
|
+
task :etags do
|
14
|
+
sh %{etags -R}
|
15
|
+
end
|
16
|
+
|
17
|
+
if Dir.exist? 'test'
|
18
|
+
require 'rake/testtask'
|
19
|
+
|
20
|
+
Rake::TestTask.new do |t|
|
21
|
+
t.test_files = FileList[ 'test*' ]
|
22
|
+
end
|
23
|
+
task :default => :test
|
24
|
+
end
|
25
|
+
|
26
|
+
if Dir.exist? 'spec'
|
27
|
+
require 'rspec/core/rake_task'
|
28
|
+
RSpec::Core::RakeTask.new(:spec)
|
29
|
+
task :default => :spec
|
30
|
+
end
|
31
|
+
|
32
|
+
desc 'Generate rdoc files'
|
33
|
+
task :rdoc do
|
34
|
+
excludes = %w[AUTHORS LICENSE README* *gemspec]
|
35
|
+
system "rdoc #{excludes.map { |file| "-x #{file}" }.join ' '}"
|
36
|
+
end
|
37
|
+
|
38
|
+
task :rename_objects do
|
39
|
+
FileList['lib/**/**', 'README*', '.ruby-version', '.rvm'].each do |oldfile|
|
40
|
+
next if File.directory? oldfile
|
41
|
+
text = File.read(oldfile)
|
42
|
+
|
43
|
+
next unless text.match /(require|module|class).*foo/i
|
44
|
+
text.gsub!(/foo/i, PROJECT_NAME)
|
45
|
+
File.open(oldfile, 'w') { |f| f.puts text }
|
46
|
+
end
|
47
|
+
end
|
48
|
+
|
49
|
+
desc 'Rename lib files/objects'
|
50
|
+
task :rename => :rename_objects do
|
51
|
+
libfiles = FileList['lib/**/**']
|
52
|
+
libfiles.gsub(/foo/, PROJECT_NAME).zip(libfiles).each do |f|
|
53
|
+
FileUtils.mv f[1], f[0] unless f.uniq.count == 1
|
54
|
+
end
|
55
|
+
end
|
data/bin/tweet_compressor
ADDED
@@ -0,0 +1,14 @@
|
|
1
|
+
#!/usr/bin/env ruby
|
2
|
+
|
3
|
+
require_relative File.join '..', 'lib', 'tweet_compressor'
|
4
|
+
|
5
|
+
unless ARGV.size == 1
|
6
|
+
puts "Usage: #{File.basename $0} <tweet>"
|
7
|
+
exit 1
|
8
|
+
end
|
9
|
+
|
10
|
+
tweet = TweetCompressor::Tweet.new ARGV.join ' '
|
11
|
+
tweet.compress
|
12
|
+
|
13
|
+
$stderr.puts "Chars: #{tweet.char_count}, Compression: #{tweet.compression_level}%"
|
14
|
+
$stdout.puts ?\n, tweet.compressed
|
data/lib/tweet_compressor/compress.rb
ADDED
@@ -0,0 +1,187 @@
|
|
1
|
+
# This module is a mixin for classes that want to use a very basic alphabetic
|
2
|
+
# shorthand to reduce text size. The module performs in-place operations, and
|
3
|
+
# expects to find a @compressed instance variable to work from.
|
4
|
+
#
|
5
|
+
# Example:
|
6
|
+
#
|
7
|
+
# include Compress
|
8
|
+
# @original = 'JavaScript'
|
9
|
+
# @compressed = @original.dup
|
10
|
+
# abbr
|
11
|
+
# # => "JS"
|
12
|
+
#
|
13
|
+
#
|
14
|
+
module Compress
|
15
|
+
URL_HOLDER = '__PLACEHOLDER4URLS__'
|
16
|
+
URL_LENGTH = 20
|
17
|
+
URL_PATTERN = %r{
|
18
|
+
\b
|
19
|
+
(
|
20
|
+
(?: [a-z][\w-]+:
|
21
|
+
(?: /{1,3} | [a-z0-9%] ) |
|
22
|
+
www\d{0,3}[.] |
|
23
|
+
[a-z0-9.\-]+[.][a-z]{2,4}/
|
24
|
+
)
|
25
|
+
(?:
|
26
|
+
[^\s()<>]+ | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
|
27
|
+
)+
|
28
|
+
(?:
|
29
|
+
\(([^\s()<>]+|(\([^\s()<>]+\)))*\) |
|
30
|
+
[^\s`!()\[\]{};:'".,<>?«»“”‘’]
|
31
|
+
)
|
32
|
+
)
|
33
|
+
}ix
|
34
|
+
|
35
|
+
# Calculate the current character count, taking the "virtual size" of
|
36
|
+
# Twitter-shortened URLs into account.
|
37
|
+
def char_count
|
38
|
+
real_url_chars = @urls.join.size
|
39
|
+
virt_url_chars = @urls.count * URL_LENGTH
|
40
|
+
@compressed.size - real_url_chars + virt_url_chars
|
41
|
+
end
|
42
|
+
|
43
|
+
private
|
44
|
+
|
45
|
+
# Special abbreviations to increase clarity.
|
46
|
+
#
|
47
|
+
# TODO: A YAML dictionary would be preferable to case statements if the list
|
48
|
+
# grows to any significant length.
|
49
|
+
def abbr
|
50
|
+
@compressed = @compressed.split.map do |word|
|
51
|
+
case word.downcase
|
52
|
+
when 'and' then '&'
|
53
|
+
when 'javascript' then 'JS'
|
54
|
+
when 'string' then 'str'
|
55
|
+
when 'one' then '1'
|
56
|
+
when 'two' then '2'
|
57
|
+
when 'three' then '3'
|
58
|
+
when 'four' then '4'
|
59
|
+
when 'five' then '5'
|
60
|
+
when 'six' then '6'
|
61
|
+
when 'seven' then '7'
|
62
|
+
when 'eight' then '8'
|
63
|
+
when 'nine' then '9'
|
64
|
+
when 'ten' then '10'
|
65
|
+
when 'eleven' then '11'
|
66
|
+
when 'twelve' then '12'
|
67
|
+
when 'thirteen' then '13'
|
68
|
+
when 'fourteen' then '14'
|
69
|
+
when 'fifteen' then '15'
|
70
|
+
when 'sixteen' then '16'
|
71
|
+
when 'seventeen' then '17'
|
72
|
+
when 'eighteen' then '18'
|
73
|
+
when 'nineteen' then '19'
|
74
|
+
when 'twenty' then '20'
|
75
|
+
else word
|
76
|
+
end
|
77
|
+
end.join ' '
|
78
|
+
@compressed.gsub! /is (?:an?|the)/, '='
|
79
|
+
@compressed.gsub! /(in|with)? regards? (to)?/i, 're'
|
80
|
+
@compressed.gsub! /about|regarding|related( to)?|(in response to)/, 're'
|
81
|
+
end
|
82
|
+
|
83
|
+
# Remove apostrophes from contractions to save more space.
|
84
|
+
def apostrophes
|
85
|
+
@compressed.gsub! /n't/i, 'nt'
|
86
|
+
end
|
87
|
+
|
88
|
+
# Identify common contractions, taking a few pains to preserve capitalization
|
89
|
+
# of the initial letter.
|
90
|
+
def contractions
|
91
|
+
@compressed.gsub! /I would/i, %q{I'd}
|
92
|
+
@compressed.gsub! /i will(?! ?not)/i, %q{I'll}
|
93
|
+
@compressed.gsub! /(i)t is/i, %q{\1t's}
|
94
|
+
@compressed.gsub! /(i)s not/i, %q{\1sn't}
|
95
|
+
@compressed.gsub! /(w)ill not/i, %q{\1on't}
|
96
|
+
@compressed.gsub! /(c)an ?not/i, %q{\1an't}
|
97
|
+
@compressed.gsub! /(d)o(es)? not/i, %q{\1o\2n't}
|
98
|
+
@compressed.gsub! /(s)hould not/i, %q{\1houldn't}
|
99
|
+
@compressed.gsub! /(m)ust not/i, %q{\1ustn't}
|
100
|
+
end
|
101
|
+
|
102
|
+
# Fix common grammar mistakes that also save space.
|
103
|
+
def correct_grammar
|
104
|
+
@compressed.gsub! /(s)'s/i, %q{\1'}
|
105
|
+
end
|
106
|
+
|
107
|
+
# Remove duplicate lowercase consonants. Assume duplicate capital letters
|
108
|
+
# like 'LLC' are intentional.
|
109
|
+
def dedupe_consonants
|
110
|
+
consonants = [*'a'..'z'].flatten.reject { |c| c =~ /[aeiou]/ }
|
111
|
+
regex = /([#{consonants.join}])\1+/
|
112
|
+
@compressed = @compressed.split.map do |word|
|
113
|
+
next word unless word =~ regex
|
114
|
+
word.gsub regex, '\1'
|
115
|
+
end.join ' '
|
116
|
+
end
|
117
|
+
|
118
|
+
# Remove duplicate punctuation characters. Make an exception for ellipses
|
119
|
+
# and dashes.
|
120
|
+
def dedupe_punct
|
121
|
+
regex = /([[:punct:]])\1+/
|
122
|
+
@compressed = @compressed.split.map do |word|
|
123
|
+
word.gsub! /\.{4,}/, '...'
|
124
|
+
word.gsub! /-{3,}/, '--'
|
125
|
+
next word if word.include? '...' or word.match /-{2,3}/
|
126
|
+
next word unless word =~ regex
|
127
|
+
word.gsub! regex, '\1'
|
128
|
+
end.join ' '
|
129
|
+
end
|
130
|
+
|
131
|
+
# Replace 'ing' with 'g'. Excludes short words like "ring" and "sing," and
|
132
|
+
# checks an exception list for special cases.
|
133
|
+
def ing
|
134
|
+
exceptions = %w[fling]
|
135
|
+
@compressed = @compressed.split.map do |word|
|
136
|
+
next word unless word.end_with? 'ing'
|
137
|
+
next word if word.start_with? '#'
|
138
|
+
next word if word.size <= 4
|
139
|
+
next word if exceptions.include? word
|
140
|
+
word.sub(/ing$/, 'g')
|
141
|
+
end.join ' '
|
142
|
+
end
|
143
|
+
|
144
|
+
# Remove lowercase vowels in longer words, unless it is the starting letter.
|
145
|
+
def remove_vowels
|
146
|
+
@compressed = @compressed.split.map do |word|
|
147
|
+
next word if word.start_with? '#'
|
148
|
+
word.size >= 4 ? word.gsub(/(?<!\A)[aeiou]/, '') : word
|
149
|
+
end.join ' '
|
150
|
+
end
|
151
|
+
|
152
|
+
# Remove spaces between punctuation marks and the following words.
|
153
|
+
def sentences
|
154
|
+
@compressed.gsub! /([[:punct:]])\s*(\S)/, '\1\2'
|
155
|
+
end
|
156
|
+
|
157
|
+
# Abbreviations common in texting, but with a higher cognitive load.
|
158
|
+
def texting
|
159
|
+
@compressed.gsub! /is (?:an?|the)/, '='
|
160
|
+
@compressed.gsub! /:.\)|\(.:/, ':)'
|
161
|
+
@compressed.gsub! /(in|with)? regards? (to)?/i, 're'
|
162
|
+
@compressed.gsub! /about|regarding|related( to)?|(in response to)/i, 're'
|
163
|
+
@compressed.gsub! /(RT @[^:\b]+):?/, '\1'
|
164
|
+
@compressed.gsub! /\bare\b/, 'r'
|
165
|
+
@compressed.gsub! /\bfor\b/, '4'
|
166
|
+
@compressed.gsub! /\bto/, '2'
|
167
|
+
@compressed.gsub! /why/, 'y'
|
168
|
+
@compressed.gsub! /you/, 'u'
|
169
|
+
end
|
170
|
+
|
171
|
+
# Regularize whitespace.
|
172
|
+
def whitespace
|
173
|
+
@compressed = @compressed.split.join ' '
|
174
|
+
end
|
175
|
+
|
176
|
+
# Temporarily remove URLs from the pattern space so that they don't get horked
|
177
|
+
# during other text transformations.
|
178
|
+
def url_preserve
|
179
|
+
@urls = @compressed.scan(/#{URL_PATTERN}/).flatten.compact
|
180
|
+
@urls.each { |url| @compressed.gsub! url, URL_HOLDER }
|
181
|
+
end
|
182
|
+
|
183
|
+
# Return stored URLs to the pattern space.
|
184
|
+
def url_restore
|
185
|
+
@urls.each { |url| @compressed.sub! URL_HOLDER, url }
|
186
|
+
end
|
187
|
+
end
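
The placeholder technique used by `url_preserve` and `url_restore` above can be sketched in isolation. This is a simplified illustration, not the module's code: `DEMO_URL`, `protect_urls`, and `restore_urls` are hypothetical names, and the pattern is a stand-in for `URL_PATTERN`. Only the placeholder string is taken from the module.

```ruby
# Placeholder string copied from the module above.
DEMO_HOLDER = '__PLACEHOLDER4URLS__'

# Assumption: a simplified stand-in for the module's URL_PATTERN.
DEMO_URL = %r{https?://\S+}

# Hypothetical helper: swap every URL for the placeholder and return
# the masked text together with the URLs in order of appearance.
def protect_urls(text)
  urls = text.scan(DEMO_URL)
  [text.gsub(DEMO_URL, DEMO_HOLDER), urls]
end

# Hypothetical helper: put the URLs back, one placeholder at a time.
# The block form of sub inserts each URL literally, so backslashes in
# a URL are not interpreted as backreferences.
def restore_urls(masked, urls)
  urls.reduce(masked) { |acc, url| acc.sub(DEMO_HOLDER) { url } }
end

masked, urls = protect_urls('read http://example.com/a?b=1 and http://example.org/x today')
shortened = masked.gsub('and', '&') # any transformation can run here
puts restore_urls(shortened, urls)
```

Because the transformations only ever see the placeholder, characters such as `?` or `.` inside a URL cannot be altered by them.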
|
data/lib/tweet_compressor/tweet.rb
ADDED
@@ -0,0 +1,38 @@
|
|
1
|
+
module TweetCompressor
|
2
|
+
class Tweet
|
3
|
+
MAX_LENGTH = 140
|
4
|
+
attr_reader :compressed, :original, :urls
|
5
|
+
|
6
|
+
def initialize tweet=''
|
7
|
+
@original, @compressed = tweet, tweet.dup
|
8
|
+
@urls = []
|
9
|
+
end
|
10
|
+
|
11
|
+
# The workhorse method that calls each compression stage in turn as long as
|
12
|
+
# the tweet text remains larger than 140 characters.
|
13
|
+
def compress
|
14
|
+
# Always perform, in order to track URL shortening.
|
15
|
+
url_preserve
|
16
|
+
|
17
|
+
stages = %i[whitespace correct_grammar contractions
|
18
|
+
dedupe_punct abbr remove_vowels dedupe_consonants apostrophes
|
19
|
+
sentences]
|
20
|
+
stages.each do |stage|
|
21
|
+
break if char_count <= MAX_LENGTH
|
22
|
+
self.send stage
|
23
|
+
end
|
24
|
+
|
25
|
+
# Must not be a stage, which may be bypassed.
|
26
|
+
url_restore
|
27
|
+
|
28
|
+
@compressed
|
29
|
+
end
|
30
|
+
|
31
|
+
def compression_level
|
32
|
+
(100 - ((char_count / @original.size.to_f) * 100)).round 2
|
33
|
+
end
|
34
|
+
|
35
|
+
private
|
36
|
+
include Compress
|
37
|
+
end
|
38
|
+
end
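
The staged approach in `compress` above (apply one transformation at a time and stop as soon as the text fits) can be sketched independently of the class. The stage list, `drop_inner_vowels`, and `shrink` below are hypothetical and much cruder than the gem's stages; only the 140-character limit comes from the source.

```ruby
DEMO_LIMIT = 140

# Hypothetical stage helper: keep hashtags and short words intact,
# otherwise drop lowercase vowels after the first letter.
def drop_inner_vowels(word)
  return word if word.start_with?('#') || word.size < 4
  word[0] + word[1..-1].delete('aeiou')
end

# Hypothetical stages, ordered from least to most destructive.
DEMO_STAGES = [
  ->(s) { s.split.join(' ') },                              # squeeze whitespace
  ->(s) { s.gsub(/\band\b/i, '&') },                        # abbreviate "and"
  ->(s) { s.split.map { |w| drop_inner_vowels(w) }.join(' ') }
].freeze

# Run each stage in turn, stopping as soon as the text fits the limit.
def shrink(text, limit: DEMO_LIMIT)
  DEMO_STAGES.reduce(text) do |current, stage|
    break current if current.size <= limit
    stage.call(current)
  end
end

puts shrink('salt and pepper #seasoning', limit: 25)
puts shrink('salt and pepper #seasoning', limit: 20)
```

With a limit of 25 only the first two stages run; with a limit of 20 the vowel-stripping stage runs as well, which is why lightly oversized input stays more readable.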
|