tweet_compressor 0.8.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.gitignore +50 -0
- data/.rspec +2 -0
- data/.ruby-version +1 -0
- data/AUTHORS +2 -0
- data/Gemfile +12 -0
- data/Gemfile.lock +48 -0
- data/Guardfile +5 -0
- data/LICENSE +676 -0
- data/README.md +125 -0
- data/Rakefile +55 -0
- data/bin/tweet_compressor +14 -0
- data/lib/tweet_compressor.rb +7 -0
- data/lib/tweet_compressor/compress.rb +187 -0
- data/lib/tweet_compressor/tweet.rb +38 -0
- data/lib/tweet_compressor/version.rb +3 -0
- data/spec/spec_helper.rb +6 -0
- data/spec/tweet_compressor_spec.rb +274 -0
- data/tweet_compressor.gemspec +26 -0
- metadata +75 -0
data/README.md
ADDED
@@ -0,0 +1,125 @@
# tweet\_compressor

## Copyright and Licensing

### Copyright Notice

The copyright for the software, documentation, and associated files is
held by the author.

Copyright 2013 Todd A. Jacobs
All rights reserved.

The AUTHORS file is also included in the source tree.

### Software License

The software is licensed under the
[GPLv3](http://www.gnu.org/copyleft/gpl.html). The LICENSE file is also
included in the source tree.

### README License

This README is licensed under the [Creative Commons
Attribution-NonCommercial-ShareAlike 3.0 United States
License](http://creativecommons.org/licenses/by-nc-sa/3.0/us/).

## Purpose

tweet\_compressor is a Ruby gem that performs successive text
transformations in order to shrink input text to fit within Twitter's
140-character limit while preserving the integrity of hashtags and
links.
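The "successive transformations" idea can be sketched in a few lines of Ruby. This is a hypothetical illustration, not the gem's code: the three lambdas in `STAGES` and the `shrink` method are stand-ins invented here, and only the 140-character limit comes from the gem. Each stage runs only while the text is still too long, so the later, lossier stages are skipped once the text fits.

```ruby
MAX_LENGTH = 140

# Hypothetical stand-in stages, applied in order; each takes and returns a String.
STAGES = [
  ->(text) { text.split.join(' ') },            # regularize whitespace
  ->(text) { text.gsub(/\band\b/i, '&') },      # abbreviate "and"
  ->(text) { text.gsub(/(?<=\w)[aeiou]/, '') }  # drop non-initial lowercase vowels
].freeze

# Apply stages until the text fits; any remaining stages are skipped.
def shrink(text, max_length = MAX_LENGTH)
  STAGES.each do |stage|
    break if text.size <= max_length
    text = stage.call(text)
  end
  text
end

puts shrink('foo')  # prints "foo": already short enough, so no stage runs
```

With `'x and ' * 30` (180 characters), the first two stages bring the text down to 119 characters, so the vowel-dropping stage never runs.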

## Features

- Treats hashtags as sacrosanct.
- Relies on Twitter to shorten URLs for you, counting each URL as 20
  characters.
- Skips the remaining shortening stages as soon as the character count
  reaches 140 or fewer.
- Remains vaguely intelligible even under heavy compression.
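Counting URLs as 20 characters means the length that matters is a virtual one. A minimal sketch, assuming a deliberately simple `SIMPLE_URL` pattern (hypothetical; the gem's own `URL_PATTERN` is far more thorough) and the 20-character figure stated above:

```ruby
URL_LENGTH = 20                   # every URL counts as 20 characters
SIMPLE_URL = %r{\bhttps?://\S+}i  # hypothetical, much looser than the gem's pattern

# Length of the text as Twitter would count it, with each URL counted as
# URL_LENGTH characters regardless of its real length.
def virtual_length(text)
  urls = text.scan(SIMPLE_URL)
  text.size - urls.sum(&:size) + urls.size * URL_LENGTH
end

puts virtual_length('http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters') # 20
puts virtual_length('see https://example.com/some/long/path')                              # 24
```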

## Caveats and Limitations

1. The gem performs text transformations; it's not a full parser.
2. Some of the transformations may be naive or rely on brute force to
   get the job done. YMMV.
3. No sanity checking is performed on the semantics of the output text.
   It Works for Me™, but it's not a substitute for applying common
   sense and a keen eye to your tweets before posting on Twitter.
4. Works best when you only need to trim a handful of characters. If
   you're vastly over the limit, readability suffers as compression gets
   tighter.

## Supported Software Versions

This software is tested against the current Ruby 2.x series. It is
unlikely to work without minor editing on 1.9.3, and you're on your own
for anything earlier than 1.9.1.

- See [.ruby-version][20] for the currently-supported Ruby versions.
- See [Gemfile.lock][30] for a complete list of gems, including supported
  versions, needed to build or run this project.

## Installation and Setup

Installing tweet\_compressor couldn't be easier. Just follow these two
simple steps:

1. `gem install tweet_compressor`
2. There is no step two.

## Usage

Pass the entire tweet as a single argument, quoting it if it contains
spaces:

    tweet_compressor <tweet>

## Examples

No screenshots here, just samples of what you can expect to see in your
terminal when you run the program. The statistics line is written to
standard error; the compressed text follows on standard output.

- Example of text that requires no compression.

        $ tweet_compressor foo
        Chars: 3, Compression: 0.0%

        foo

- Example of extremely heavy compression. Trims a 192-character excerpt
  of the Gettysburg Address down to 137 characters.

        $ tweet_compressor 'Four score and seven years ago our fathers
        brought forth on this continent a new nation, conceived in liberty,
        and dedicated to the proposition that all men are created equal.
        #speech #Lincoln'
        Chars: 137, Compression: 28.65%

        4 scr &7 yrs ago our fthrs brght frth on ths cntnt a new ntn,cncvd
        in lbrty,& dctd to the prpstn tht al men are crtd eql.#speech
        #Lincoln

- Example of assumed compression from [Twitter's built-in URL
  shortener][10].

        $ tweet_compressor 'http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters'
        Chars: 20, Compression: 70.59%

        http://tweet_compressor/knows/twitter/shortens/urls/to/20/characters
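The compression percentage in these examples is measured against the original input length. The sketch below reproduces the arithmetic of `TweetCompressor::Tweet#compression_level` as a standalone method; the two-argument signature is an assumption made for illustration. The 28.65% figure corresponds to a 192-character input compressed to 137 characters, and the URL example to a 68-character URL counted as 20.

```ruby
# Percentage by which the (virtual) character count shrank, rounded to two
# decimal places, following the formula in Tweet#compression_level.
def compression_level(char_count, original_size)
  (100 - ((char_count / original_size.to_f) * 100)).round(2)
end

puts compression_level(3, 3)      # 0.0
puts compression_level(137, 192)  # 28.65
puts compression_level(20, 68)    # 70.59
```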

## Contributions Welcome

This is an open-source project. Contributors are highly encouraged to
open pull requests on GitHub.

----
[Project Home Page](https://github.com/CodeGnome/tweet_compressor)

[10]: https://support.twitter.com/entries/109623
[20]: https://raw.github.com/CodeGnome/tweet_compressor/master/.ruby-version
[30]: https://raw.github.com/CodeGnome/tweet_compressor/master/Gemfile.lock
data/Rakefile
ADDED
@@ -0,0 +1,55 @@
begin
  require 'bundler/gem_tasks' if Dir.glob('*gemspec').any?
  require 'bundler/setup' if File.exist? 'Gemfile'
rescue LoadError => bundler_missing
  $stderr.puts bundler_missing
end

require 'rake'

PROJECT_NAME = File.basename(Dir.pwd).sub(/\.rb$/, '')

desc 'Update exuberant-ctags'
task :etags do
  sh %{etags -R}
end

if Dir.exist? 'test'
  require 'rake/testtask'

  Rake::TestTask.new do |t|
    # Match test files inside test/, not the test/ directory itself.
    t.test_files = FileList['test/**/test*.rb']
  end
  task :default => :test
end

if Dir.exist? 'spec'
  require 'rspec/core/rake_task'
  RSpec::Core::RakeTask.new(:spec)
  task :default => :spec
end

desc 'Generate rdoc files'
task :rdoc do
  excludes = %w[AUTHORS LICENSE README* *gemspec]
  system "rdoc #{excludes.map { |file| "-x #{file}" }.join ' '}"
end

task :rename_objects do
  FileList['lib/**/**', 'README*', '.ruby-version', '.rvm'].each do |oldfile|
    next if File.directory? oldfile
    text = File.read(oldfile)

    next unless text.match(/(require|module|class).*foo/i)
    text.gsub!(/foo/i, PROJECT_NAME)
    File.open(oldfile, 'w') { |f| f.puts text }
  end
end

desc 'Rename lib files/objects'
task :rename => :rename_objects do
  libfiles = FileList['lib/**/**']
  libfiles.gsub(/foo/, PROJECT_NAME).zip(libfiles).each do |f|
    FileUtils.mv f[1], f[0] unless f.uniq.count == 1
  end
end
data/bin/tweet_compressor
ADDED
@@ -0,0 +1,14 @@
#!/usr/bin/env ruby

require_relative File.join '..', 'lib', 'tweet_compressor'

unless ARGV.size == 1
  puts "Usage: #{File.basename $0} <tweet>"
  exit 1
end

tweet = TweetCompressor::Tweet.new ARGV.join ' '
tweet.compress

$stderr.puts "Chars: #{tweet.char_count}, Compression: #{tweet.compression_level}%"
$stdout.puts ?\n, tweet.compressed
data/lib/tweet_compressor/compress.rb
ADDED
@@ -0,0 +1,187 @@
# This module is a mixin for classes that want to use a very basic alphabetic
# shorthand to reduce text size. The module performs in-place operations, and
# expects to find a @compressed instance variable to work from.
#
# Example:
#
#   include Compress
#   @original   = 'JavaScript'
#   @compressed = @original.dup
#   abbr
#   @compressed # => "JS"
#
module Compress
  # Placeholder swapped in for each URL while the other stages run. It
  # contains no punctuation and no lowercase letters, so stages such as
  # dedupe_punct, sentences, and remove_vowels leave it intact.
  URL_HOLDER = 'PLACEHOLDER4URLS'
  URL_LENGTH = 20
  URL_PATTERN = %r{
    \b
    (
      (?: [a-z][\w-]+:
          (?: /{1,3} | [a-z0-9%] ) |
        www\d{0,3}[.] |
        [a-z0-9.\-]+[.][a-z]{2,4}/
      )
      (?:
        [^\s()<>]+ | \(([^\s()<>]+|(\([^\s()<>]+\)))*\)
      )+
      (?:
        \(([^\s()<>]+|(\([^\s()<>]+\)))*\) |
        [^\s`!()\[\]{};:'".,<>?«»“”‘’]
      )
    )
  }ix
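
  # Calculate the current character count, taking the "virtual size" of
  # Twitter-shortened URLs into account. While the URLs are swapped out, each
  # placeholder stands in for one URL; once they are restored, the real URL
  # text is what gets counted as URL_LENGTH characters.
  def char_count
    real_url_chars = if @compressed.include? URL_HOLDER
                       @urls.count * URL_HOLDER.size
                     else
                       @urls.join.size
                     end
    virt_url_chars = @urls.count * URL_LENGTH
    @compressed.size - real_url_chars + virt_url_chars
  end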

  private

  # Readable abbreviations that save space without much loss of clarity.
  #
  # TODO: A YAML dictionary would be preferable to case statements if the list
  # grows to any significant length.
  def abbr
    @compressed = @compressed.split.map do |word|
      case word.downcase
      when 'and'        then '&'
      when 'javascript' then 'JS'
      when 'string'     then 'str'
      when 'one'        then '1'
      when 'two'        then '2'
      when 'three'      then '3'
      when 'four'       then '4'
      when 'five'       then '5'
      when 'six'        then '6'
      when 'seven'      then '7'
      when 'eight'      then '8'
      when 'nine'       then '9'
      when 'ten'        then '10'
      when 'eleven'     then '11'
      when 'twelve'     then '12'
      when 'thirteen'   then '13'
      when 'fourteen'   then '14'
      when 'fifteen'    then '15'
      when 'sixteen'    then '16'
      when 'seventeen'  then '17'
      when 'eighteen'   then '18'
      when 'nineteen'   then '19'
      when 'twenty'     then '20'
      else word
      end
    end.join ' '
    # Word boundaries keep these from firing inside words ("this apple",
    # "roundabout") and keep the surrounding spaces intact.
    @compressed.gsub!(/\bis (?:an?|the)\b/, '=')
    @compressed.gsub!(/\b(?:(?:in|with) )?regards?(?: to)?\b/i, 're')
    @compressed.gsub!(/\b(?:about|regarding|related(?: to)?|in response to)\b/, 're')
  end

  # Remove the apostrophe from "n't" contractions (e.g. "don't" becomes
  # "dont") to save more space.
  def apostrophes
    @compressed.gsub!(/n't/i, 'nt')
  end

  # Identify common contractions, taking a few pains to preserve capitalization
  # of the initial letter.
  def contractions
    @compressed.gsub!(/\bI would\b/i, %q{I'd})
    # Leave "I will not" alone here so the "will not" rule can turn it into
    # "I won't".
    @compressed.gsub!(/\bi will\b(?! not\b)/i, %q{I'll})
    @compressed.gsub!(/\b(i)t is\b/i, %q{\1t's})
    @compressed.gsub!(/\b(i)s not\b/i, %q{\1sn't})
    @compressed.gsub!(/\b(w)ill not\b/i, %q{\1on't})
    @compressed.gsub!(/\b(c)an ?not\b/i, %q{\1an't})
    @compressed.gsub!(/\b(d)o(es)? not\b/i, %q{\1o\2n't})
    @compressed.gsub!(/\b(s)hould not\b/i, %q{\1houldn't})
    @compressed.gsub!(/\b(m)ust not\b/i, %q{\1ustn't})
  end

  # Fix common grammar mistakes that also save space, e.g. "James's" becomes
  # "James'".
  def correct_grammar
    @compressed.gsub!(/(s)'s\b/i, %q{\1'})
  end

  # Remove duplicate lowercase consonants. Assume duplicate capital letters
  # like 'LLC' are intentional, and leave hashtags alone.
  def dedupe_consonants
    consonants = ('a'..'z').reject { |c| 'aeiou'.include? c }.join
    regex = /([#{consonants}])\1+/
    @compressed = @compressed.split.map do |word|
      next word if word.start_with? '#'
      # '\1' collapses each run to its own letter ("Mississippi" -> "Misisipi").
      word.gsub(regex, '\1')
    end.join ' '
  end

  # Remove duplicate punctuation characters. Make an exception for ellipses
  # and dashes.
  def dedupe_punct
    regex = /([[:punct:]])\1+/
    @compressed = @compressed.split.map do |word|
      word = word.gsub(/\.{4,}/, '...').gsub(/-{3,}/, '--')
      next word if word.include?('...') || word.include?('--')
      word.gsub(regex, '\1')
    end.join ' '
  end

  # Replace 'ing' with 'g'. Excludes short words like "ring" and "sing," and
  # checks an exception list for special cases.
  def ing
    exceptions = %w[fling]
    @compressed = @compressed.split.map do |word|
      next word unless word.end_with? 'ing'
      next word if word.start_with? '#'
      next word if word.size <= 4
      next word if exceptions.include? word
      word.sub(/ing$/, 'g')
    end.join ' '
  end

  # Remove lowercase vowels from words of four or more characters, except when
  # the vowel is the word's first letter. Hashtags are left alone.
  def remove_vowels
    @compressed = @compressed.split.map do |word|
      next word if word.start_with? '#'
      word.size >= 4 ? word.gsub(/(?<!\A)[aeiou]/, '') : word
    end.join ' '
  end

  # Remove spaces between punctuation marks and the following words.
  def sentences
    @compressed.gsub!(/([[:punct:]])\s*(\S)/, '\1\2')
  end

  # Abbreviations common in texting, but with a higher cognitive load.
  def texting
    @compressed.gsub!(/\bis (?:an?|the)\b/, '=')
    @compressed.gsub!(/:.\)|\(.:/, ':)')
    @compressed.gsub!(/\b(?:(?:in|with) )?regards?(?: to)?\b/i, 're')
    @compressed.gsub!(/\b(?:about|regarding|related(?: to)?|in response to)\b/i, 're')
    # Drop the colon after "RT @user". Inside a character class \b means
    # backspace, so \s is what stops the match at the end of the username.
    @compressed.gsub!(/(RT @[^:\s]+):?/, '\1')
    @compressed.gsub!(/\bare\b/, 'r')
    @compressed.gsub!(/\bfor\b/, '4')
    @compressed.gsub!(/\bto/, '2')
    @compressed.gsub!(/why/, 'y')
    @compressed.gsub!(/you/, 'u')
  end

  # Regularize whitespace.
  def whitespace
    @compressed = @compressed.split.join ' '
  end

  # Temporarily remove URLs from the pattern space so that they don't get horked
  # during other text transformations.
  def url_preserve
    # Each scan result is an array of capture groups. The first group holds
    # the whole URL; the nested parenthesis groups are discarded.
    @urls = @compressed.scan(URL_PATTERN).map(&:first)
    # A String pattern matches literally, so '.', '?', '+' and friends in a
    # URL are not treated as regex metacharacters.
    @urls.each { |url| @compressed.sub! url, URL_HOLDER }
  end

  # Return stored URLs to the pattern space, in their original order.
  def url_restore
    # The block form keeps backslashes in a URL from being read as
    # backreferences.
    @urls.each { |url| @compressed.sub!(URL_HOLDER) { url } }
  end
end
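The swap-out/swap-in approach used by `url_preserve` and `url_restore` can be illustrated in isolation. In this sketch `HOLDER`, `SIMPLE_URL`, `preserve_urls`, and `restore_urls` are hypothetical names, and the URL pattern is much simpler than `URL_PATTERN`. Passing each URL to `sub!` as a String matters: interpolating it into a regex (`/#{url}/`) would treat characters such as `?` and `.` as metacharacters, so `http://x.com/a?b=1` would fail to match itself.

```ruby
HOLDER     = 'PLACEHOLDER'          # hypothetical marker text
SIMPLE_URL = %r{\bhttps?://\S+}i    # hypothetical, much looser than URL_PATTERN

# Swap each URL for HOLDER, remembering the URLs in order.
def preserve_urls(text)
  urls = text.scan(SIMPLE_URL)      # no capture groups, so plain strings
  masked = text.dup
  urls.each { |url| masked.sub!(url, HOLDER) }    # String pattern: literal match
  [masked, urls]
end

# Put the URLs back, left to right.
def restore_urls(masked, urls)
  text = masked.dup
  urls.each { |url| text.sub!(HOLDER) { url } }   # block: no backreference parsing
  text
end

masked, urls = preserve_urls('see http://x.com/a?b=1 now')
squeezed = masked.gsub(/(?<=\w)[aeiou]/, '')      # stand-in for a lossy stage
puts restore_urls(squeezed, urls)                 # s http://x.com/a?b=1 nw
```

The uppercase marker survives the lowercase-vowel stand-in stage untouched, which is the same reason a placeholder should avoid characters that any stage rewrites.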
data/lib/tweet_compressor/tweet.rb
ADDED
@@ -0,0 +1,38 @@
module TweetCompressor
  class Tweet
    include Compress

    MAX_LENGTH = 140
    attr_reader :compressed, :original, :urls

    def initialize tweet=''
      # Keep @original and @compressed as separate objects: several stages
      # modify @compressed in place, which would otherwise corrupt @original.
      @original   = tweet
      @compressed = tweet.dup
      @urls = []
    end

    # The workhorse method that calls each compression stage in turn as long as
    # the tweet text remains longer than MAX_LENGTH characters.
    def compress
      # Always perform, in order to track URL shortening. It is deliberately
      # not listed as a stage: a second run would find only placeholders,
      # overwrite @urls with an empty list, and leave the URLs unrestored.
      url_preserve

      stages = %i[whitespace correct_grammar contractions dedupe_punct abbr
                  remove_vowels dedupe_consonants apostrophes sentences]
      stages.each do |stage|
        break if char_count <= MAX_LENGTH
        send stage
      end

      # Run unconditionally: as a stage, it could be skipped by the early exit.
      url_restore

      @compressed
    end

    def compression_level
      (100 - ((char_count / @original.size.to_f) * 100)).round 2
    end
  end
end
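Why `@compressed` should be a copy of the input rather than the same object: several stages use in-place methods such as `gsub!`, and Ruby strings are mutable, so an alias would carry those edits back into `@original` (and into the caller's string). This standalone sketch uses a hypothetical `original_after_edit` helper to show the difference between aliasing and `dup`:

```ruby
# Returns what the caller's string looks like after a stage edits the
# "compressed" copy in place. make_copy decides how that copy is made.
def original_after_edit(make_copy)
  original   = 'see http://example.com/page now'.dup  # unfrozen string
  compressed = make_copy.call(original)
  compressed.sub!('http://example.com/page', 'URL')    # in-place edit
  original
end

shared = original_after_edit(->(s) { s })      # alias: same object
copied = original_after_edit(->(s) { s.dup })  # independent copy

puts shared  # see URL now
puts copied  # see http://example.com/page now
```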