substitution_solver 0.5.0
- data/README.txt +93 -0
- data/dictionary_builder.rb +16 -0
- data/dictionary_inspector.rb +14 -0
- data/substitution_solver.rb +113 -0
- metadata +45 -0
data/README.txt
ADDED
@@ -0,0 +1,93 @@
+For command-line usage, see the bottom of this readme.
+
+This archive contains several Ruby scripts. The main script
+will decode most mono-alphabetic simple substitution ciphers
+by using a shotgun hill-climbing algorithm. It works by
+selecting a random key and scoring the result, then making small
+adjustments to the key that improve the score (this is the
+hill-climbing part). If it can't find any single letter swap
+that will improve the score, it will make a small
+random change to the key and then start hill climbing again.
+The algorithm will do this 10 times after the score plateaus, at
+which point, if it can't come up with a better score, the
+program gives up on that key (assuming that it has either reached
+a dead end or that it has the answer). Then the program
+will generate a new, completely random key and start the process
+over from the beginning (this is the shotgun part of the
+algorithm). It's a little like trying to find the highest peak
+of a long polynomial expression where you can't plot the curve
+ahead of time. The tactic here is to pick a random point
+on the curve, climb the hill until you can't climb any
+higher, then pick another random point on the curve and start
+climbing again. If you have a curve with a small number of
+relatively uniformly distributed peaks, then this
+is a moderately efficient, albeit uncertain, way of
+finding the correct answer.
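The curve analogy can be tried on its own. The following is a minimal sketch, not the solver itself: `PEAKS`, `height`, `climb`, and `shotgun_climb` are hypothetical names, the landscape is made-up data, and the "10 small random changes" step is left out so that only the climb-then-restart idea remains.

```ruby
# Made-up landscape: a few triangular bumps on the integers 0..100.
# Each entry maps a peak position to its height; the tallest is at x = 70.
PEAKS = { 10 => 3, 40 => 6, 70 => 9, 90 => 5 }

# Height of the curve at x: each bump drops by 1 per step away from its peak.
def height(x)
  PEAKS.map { |centre, top| [top - (x - centre).abs, 0].max }.max
end

# Hill climbing: keep stepping to the higher neighbour until neither is higher.
def climb(x)
  loop do
    best = [x - 1, x + 1].select { |n| n.between?(0, 100) }.max_by { |n| height(n) }
    return x if height(best) <= height(x)   # stuck on a peak or a flat stretch
    x = best
  end
end

# Shotgun part: restart from random points and remember the best result.
def shotgun_climb(restarts, rng)
  best_x = nil
  restarts.times do
    x = climb(rng.rand(101))
    best_x = x if best_x.nil? || height(x) > height(best_x)
  end
  best_x
end

peak = shotgun_climb(200, Random.new(2005))
puts "best x = #{peak}, height = #{height(peak)}"
```

A single climb often ends on a low bump or on a flat stretch, which is why the restarts matter: only starts that happen to land on the slopes of the tallest bump reach it.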
+
+This program has no way of knowing whether or not it has hit
+upon the correct answer, so it will keep looking for better
+scores until the user hits CTRL-C to end the program. It's
+up to the user to decide whether the answer the program has
+come up with makes any sense or not. The scoring algorithm
+is really the heart of this program. Basically, the program
+has a dictionary of tetragraph frequencies, that is, how
+often each run of four letters appears in a sample of the
+English language. Using these counts as a guide, it produces
+the score by taking the log of (count + 1) for each tetragraph
+in the decoded text and adding them all together to
+produce a score. Adding up the logs of the frequencies to
+produce the score, rather than adding up the frequencies themselves,
+appears to be important in making the program able to score
+plain text higher than gibberish. I didn't realize this the
+first time I wrote this program and didn't get very good
+results. I should note that I took somebody else's code as
+a guide in producing these Ruby scripts. I've modified the
+strategy slightly from the original author's, and obviously
+I translated it from C to Ruby. I'm very grateful to the
+original author for sharing his C code with the world. I
+would have had a hard time getting this program working
+without using his program as a guide. In particular, taking
+the logs of the frequencies would not have occurred to me.
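One plausible reading of why the logs matter can be shown with a toy dictionary. This is a sketch under stated assumptions: the counts in `COUNTS` are invented (they are not taken from english.dic), and `tetragraphs`, `raw_score`, and `log_score` are hypothetical helper names. A plain sum lets one very common tetragraph outweigh an impossible neighbour, while a sum of logs rewards text in which every tetragraph is at least plausible.

```ruby
# Invented counts for illustration only; unseen tetragraphs count as 0.
COUNTS = Hash.new(0)
COUNTS.update("THER" => 1000, "WITH" => 50, "ITHI" => 50)

# All overlapping runs of four letters in the text.
def tetragraphs(text)
  text.chars.each_cons(4).map(&:join)
end

# Naive score: add up the raw counts.
def raw_score(text)
  tetragraphs(text).sum { |t| COUNTS[t] }
end

# Score described in the readme: add up log(count + 1).
def log_score(text)
  tetragraphs(text).sum { |t| Math.log(COUNTS[t] + 1) }
end

%w[WITHI THERQ].each do |text|
  puts "#{text}: raw = #{raw_score(text)}, log = #{log_score(text).round(3)}"
end
```

Here "THERQ" wins on the raw sum because of the single count of 1000, even though "HERQ" never occurs, while "WITHI" wins on the log sum because both of its tetragraphs were seen.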
+
+This script runs pretty slowly, owing mostly to the fact that
+I've written it as a Ruby script. I've seen other implementations
+of this idea written in C which run orders of magnitude faster
+than this implementation does. I plan to retranslate this code
+into several languages to see how well they can run it.
+
+At any rate, included in this archive are the following:
+1. the substitution solver script, which will, over time,
+   (hopefully) recover the plain text.
+2. a sample tetragraph dictionary, which was generated using
+   "Harry Potter and the Goblet of Fire" as its source text
+   (yes, I own a legal copy of the book).
+3. a script which will generate a tetragraph dictionary from
+   an ASCII text file.
+4. this readme.
+5. an inspector script which will show the contents of the
+   english.dic file in a readable format.
+
+This program is distributed under the GNU General Public
+License. If you would like to use this source code in
+your own programs, or rewrite it or whatever, just be
+sure to read the GNU GPL so that you know your rights
+and responsibilities.
+
+Examples of usage
+
+To decrypt a cipher text stored in an ASCII text file:
+  ruby substitution_solver.rb cipher.txt
+
+To generate an english.dic file from another source text:
+  ruby dictionary_builder.rb novel.txt
+
+Finally, I'm including a script which will print out the english.dic
+file in a readable form. It's interesting to note that the
+frequencies increase exponentially as you go down the list.
+
+Example:
+  ruby dictionary_inspector.rb > output.txt
+or
+  ruby dictionary_inspector.rb | less
+or just plain old
+  ruby dictionary_inspector.rb
data/dictionary_builder.rb
ADDED
@@ -0,0 +1,16 @@
+#!/usr/bin/env ruby
+
+hash = Hash.new(0)                            # tetragraph => number of occurrences
+
+File.readlines(ARGV[0]).each do |line|
+  line = line.gsub(/[^a-zA-Z]/, "").upcase    # gsub! returns nil when nothing is replaced, so avoid chaining on it
+  for x in 0..line.length-4                   # inclusive range, so the last tetragraph of the line is counted too
+    hash[line[x...x+4]] += 1
+  end
+end
+
+#puts hash.to_a.sort {|x, y| x[1] <=> y[1]}
+
+File.open("english.dic", "wb") do |f|         # binary mode, since Marshal output is binary data
+  Marshal.dump(hash, f)
+end
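The counting step can be tried without touching the file system. The following is a small sketch, assuming a hypothetical helper `count_tetragraphs` that takes an array of lines instead of a file name. It treats each line separately, as the builder does, so tetragraphs that would span a line break are not counted.

```ruby
# Count overlapping four-letter runs, one input line at a time.
def count_tetragraphs(lines)
  counts = Hash.new(0)
  lines.each do |line|
    letters = line.gsub(/[^a-zA-Z]/, "").upcase   # keep letters only
    # The range is empty when the line has fewer than four letters.
    (0..letters.length - 4).each { |x| counts[letters[x, 4]] += 1 }
  end
  counts
end

counts = count_tetragraphs(["The then\n", "abcd", "hi!"])
counts.each { |tetragraph, n| puts "#{tetragraph}: #{n}" }
```

The first line collapses to "THETHEN" and yields four tetragraphs, the second yields exactly one, and the third is too short to contribute anything.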
data/dictionary_inspector.rb
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+$dictionary = Hash.new(0)                 # The dictionary of tetragraph frequencies
+
+File.open("english.dic", "rb") do |f|     # Open the saved tetragraph information
+  $dictionary = Marshal.load(f)           # And load this information into our dictionary
+end
+
+array = $dictionary.to_a.sort {|x, y| x[1] <=> y[1]}   # sort from least to most frequent
+
+x = 0
+array.each do |tetragraph, freq|
+  puts "#{x+=1}, #{tetragraph}, #{freq}"  # rank, tetragraph, count
+end
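What the inspector prints can be illustrated with an in-memory round trip. This is a sketch with invented counts: `Marshal.dump` is called without a file so that it returns the serialized bytes as a string, and the variable names are hypothetical.

```ruby
# Invented counts standing in for the contents of english.dic.
counts = Hash.new(0)
counts.update("THER" => 1000, "WITH" => 50, "QZXJ" => 1)

blob   = Marshal.dump(counts)   # the kind of bytes the builder writes to disk
loaded = Marshal.load(blob)     # what the inspector and solver read back

# Sort from least to most frequent and number the rows.
rows = loaded.sort_by { |_tetragraph, freq| freq }
             .each_with_index
             .map { |(tetragraph, freq), i| "#{i + 1}, #{tetragraph}, #{freq}" }
puts rows
```

The default value of 0 survives the round trip, so looking up a tetragraph that was never seen still returns 0 after loading.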
data/substitution_solver.rb
ADDED
@@ -0,0 +1,113 @@
+#!/usr/bin/env ruby
+
+$iteration = 0                                # To record how many iterations the program
+                                              # had to churn through
+
+ciphertext = String.new
+
+File.readlines(ARGV[0]).each do |line|        # Read the input from the file named on the command line
+  ciphertext << line
+end
+
+ciphertext = ciphertext.gsub(/[^a-zA-Z]/, "").upcase  # get rid of any non-alphabetic characters (gsub! would return nil if there were none)
+
+key = Hash.new                                # Create a hash that will represent the translation key
+
+$dictionary = Hash.new(0)                     # The dictionary of tetragraph frequencies
+File.open("english.dic", "rb") do |f|         # Open the saved tetragraph information
+  $dictionary = Marshal.load(f)               # And load this information into our dictionary
+end
+
+def score(string)                             # This function will score a string against the tetragraph statistics
+  $iteration += 1                             # Increment the iteration count as this is probably the most fundamental loop of the program
+  tally = 0                                   # Set a counter to 0
+  0.upto(string.length-4) do |x|              # Iterate through the string
+    tally += Math.log(($dictionary[string[x...x+4]].to_i)+1)  # tally up the tetragraph frequencies after applying log to each one (the log is where the magic happens)
+  end
+  return tally                                # and return our grand total when we're finished adding it all up
+end
+
+def small_adj!(key)                           # this function makes small random adjustments to the key when we've hill climbed our way into a dead end
+  for i in 0...rand(5)                        # pick a random number of changes to make (0 to 4)
+    j = rand(26)                              # now pick two random letters in the alphabet to swap
+    k = rand(26)
+    if j != k                                 # if the random letters aren't equal
+      temp = key[(j+65).chr]                  # then go ahead and swap them
+      key[(j+65).chr] = key[(k+65).chr]
+      key[(k+65).chr] = temp
+    end
+  end
+end
+
+def plaintext(ciphertext, key)                # This function will return the decoded ciphertext using a given key to do the decoding
+  return_string = String.new                  # create a return string
+
+  for x in 0...ciphertext.length              # loop through the ciphertext
+    return_string << key[ciphertext[x].chr]   # swap the letters out using the key and build up the return string
+  end
+  return return_string                        # return the answer
+end
+
+def randomize!(key)                           # completely randomize the key, i.e. start over from scratch
+  array = Array.new                           # create an array of letters to pick from
+
+  for x in 0...26
+    array[x] = (x+65).chr                     # populate the array with characters
+  end
+
+  for x in 0...26                             # now loop through the array taking a letter out
+    y = rand(array.length)                    # one at a time randomly and adding it to the key
+    key[(x+65).chr] = array[y]
+    array.delete_at(y)
+  end
+end
+
+print "best overall = ", score(ciphertext), " : best score = ", score(ciphertext), "\n"  # print the score of the original ciphertext
+puts ciphertext.gsub(/(.....)/, '\1 ')        # and the ciphertext itself, in groups of five letters
+
+randomize!(key)                               # randomize the key
+
+best_score = score(ciphertext)                # set the best score to the score of the ciphertext
+best_overall = best_score-1                   # set the best overall score to the best score -1
+num_small_adjusts = 0                         # set the number of small adjustments to 0
+
+loop do                                       # loop forever
+  best_adj = best_score                       # set the best adjustment to the current best score
+
+  for i in 0...26                             # loop through all possible "trivial" letter replacements
+    for j in i...26                           # in the key looking for the best swap. This in effect is
+      test_key = key.dup                      # the so-called "Hill Climbing" part of our program
+      temp = test_key[(i+65).chr]
+      test_key[(i+65).chr] = test_key[(j+65).chr]
+      test_key[(j+65).chr] = temp
+      sc = score(plaintext(ciphertext, test_key))  # score the change we've made
+      if sc > best_adj                        # if it's better than any so far
+        best_adj = sc                         # then record the change so we can apply it later if it
+        best_i = i                            # turns out to be the best one
+        best_j = j
+      end
+    end
+  end
+
+  if best_adj > best_score                    # if we found an adjustment that improves the best score
+    temp = key[(best_i+65).chr]               # then apply that adjustment to the key
+    key[(best_i+65).chr] = key[(best_j+65).chr]
+    key[(best_j+65).chr] = temp
+    best_score = best_adj
+    if best_score > best_overall              # if that adjustment is the best overall
+      num_small_adjusts = 0                   # then reset the number of small adjusts counter
+      best_overall = best_score               # set this new score as the best overall
+      print "best overall = ", best_overall, " : best score = ", best_score, " : iteration = #{$iteration}\n"
+      puts plaintext(ciphertext, key).gsub(/(.....)/, '\1 ')  # and print our newfound best overall value
+    end
+  else                                        # otherwise none of the adjustments raised our score
+    if num_small_adjusts < 10                 # so make a small random adjustment to the key
+      small_adj!(key)                         # as long as we haven't already made too many small adjustments
+      num_small_adjusts += 1                  # increment the number of small adjustments
+    else                                      # otherwise we've made too many small adjustments, we're
+      randomize!(key)                         # probably not getting anywhere and need to start looking
+      num_small_adjusts = 0                   # someplace else, randomize the key and start climbing the
+    end                                       # hill again
+    best_score = score(plaintext(ciphertext, key))  # set the best score to either the small adjustment value or the new randomized string value depending on what we did above.
+  end
+end
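For readers experimenting with the key handling, here is a sketch of an alternative representation. It is not how the script above stores its key: it assumes the key is kept as a 26-letter string rather than a Hash, and `ALPHABET`, `random_key`, `decode`, and `swap` are hypothetical names. Decoding then becomes a single `String#tr` call.

```ruby
ALPHABET = ("A".."Z").to_a.join

# A fresh random key: the alphabet in shuffled order.
def random_key(rng)
  ALPHABET.chars.shuffle(random: rng).join
end

# Replace each ciphertext letter by the key letter at the same alphabet position.
def decode(ciphertext, key)
  ciphertext.tr(ALPHABET, key)
end

# Return a copy of the key with positions i and j exchanged.
def swap(key, i, j)
  copy = key.dup
  copy[i], copy[j] = copy[j], copy[i]
  copy
end

shift_by_one = ALPHABET[1..-1] + "A"   # maps A to B, B to C, ..., Z to A
puts decode("HAL", shift_by_one)       # prints "IBM"
puts swap(ALPHABET, 0, 1)[0, 4]        # prints "BACD"
```

Because `swap` returns a new string, a candidate key can be scored and discarded without copying a Hash for every trial.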
metadata
ADDED
@@ -0,0 +1,45 @@
+--- !ruby/object:Gem::Specification
+rubygems_version: 0.8.10
+specification_version: 1
+name: substitution_solver
+version: !ruby/object:Gem::Version
+  version: 0.5.0
+date: 2005-11-09
+summary: "Program for solving mono-alphabetic simple substitution ciphers, (as in
+  cryptoquotes), without word lengths."
+require_paths:
+  - lib
+email: pfharlock@yahoo.com
+homepage:
+rubyforge_project:
+description:
+autorequire:
+default_executable:
+bindir: "."
+has_rdoc: false
+required_ruby_version: !ruby/object:Gem::Version::Requirement
+  requirements:
+    -
+      - ">"
+      - !ruby/object:Gem::Version
+        version: 0.0.0
+  version:
+platform: ruby
+authors:
+  - Gary Watson
+files:
+  - substitution_solver.rb
+  - dictionary_builder.rb
+  - dictionary_inspector.rb
+  - README.txt
+test_files: []
+rdoc_options: []
+extra_rdoc_files:
+  - README.txt
+executables:
+  - substitution_solver.rb
+  - dictionary_builder.rb
+  - dictionary_inspector.rb
+extensions: []
+requirements: []
+dependencies: []