substitution_solver 0.5.0

data/README.txt ADDED
@@ -0,0 +1,93 @@
1
+ For command line usage see the bottom of this readme.
2
+
3
+ This archive contains a couple of ruby scripts. The main script
4
+ will decode most mono-alphabetic simple substitution ciphers
5
+ by using a shotgun hill-climbing algorithm. It works by
6
+ selecting a random key and scoring the result, then making small
7
+ adjustments to the key that improve the score (which is the
8
+ hill climbing part). If it can't find any obvious adjustments
9
+ that will improve the score, it will make a small
10
+ random change to the key and then start hill climbing again.
11
+ The algorithm will do this 10 times after the score plateaus, at
12
+ which point if it can't come up with a better score, the
13
+ program gives up (assuming that it has either reached
14
+ a dead end or that it has the answer). Then the program
15
+ will generate a new, completely random key and start the process
16
+ over from the beginning (this is the shotgun part of the
17
+ algorithm). It's a little like trying to find the highest peak
18
+ of a long polynomial expression where you can't plot the line
19
+ ahead of time. The tactic here would be to pick a random point
20
+ on the line, start climbing the hill until you can't climb any
21
+ higher, then pick another random point on the line and start
22
+ climbing again. If you have a curve with a small number of
23
+ relatively uniformly distributed peaks, then this
24
+ is a moderately efficient, albeit uncertain, way of
25
+ ascertaining the correct answer.
26
+
27
+ This program has no way of knowing whether or not it has hit
28
+ upon the correct answer, so it will keep on looking for better
29
+ scores until the user hits CTRL-C to end the program. It's
30
+ up to the user to decide whether the answer the program has
31
+ come up with makes any sense or not. The scoring algorithm
32
+ is really the heart of this program. Basically the program
33
+ has a dictionary of tetra graph frequencies, that is, how
34
+ often do these four letters appear next to one another in the
35
+ English language. Using this number as a guide, it produces
36
+ the score by taking the log of each of the tetra graph
37
+ frequencies in the cipher text and adding them all together to
38
+ produce a score. Adding up the logs of the frequencies to
39
+ produce the score rather than adding up the frequencies themselves
40
+ appears to be important in making the program able to score
41
+ plain text higher than gibberish. I didn't realize this the
42
+ first time I wrote this program and didn't get very good
43
+ results. I should note that I took somebody else's code as
44
+ a guide in producing these ruby scripts. I've modified the
45
+ strategy slightly from the original author's, and obviously
46
+ I translated it from C to Ruby. I'm very grateful to the
47
+ original author for sharing his C code with the world. I
48
+ would have had a hard time getting this program working
49
+ without using his program as a guide. In particular, taking
50
+ the logs of the frequencies would not have occurred to me.
51
+
52
+ This script runs pretty slowly, owing mostly to the fact that
53
+ I've written it as a ruby script. I've seen other implementations
54
+ of this idea written in C which run orders of magnitude faster
55
+ than this implementation does. I plan to retranslate this code
56
+ into several languages to see how well they can run it.
57
+
58
+ At any rate, included in this archive are the following:
59
+ 1. the substitution solver script, which will, over time,
60
+ (hopefully) recover the plain text.
61
+ 2. a sample tetra graph dictionary which was generated using
62
+ "Harry Potter and the Goblet of Fire" as its source text,
63
+ (yes I own a legal copy of the book)
64
+ 3. a script which will generate a tetra graph dictionary from
65
+ an ascii text file.
66
+ 4. this readme.
67
+ 5. an inspector script which will show the contents of the
68
+ english.dic file in a readable format.
69
+
70
+ This program is distributed under the GNU General Public
71
+ License. If you would like to use this source code in
72
+ your own programs, or rewrite it or whatever, just be
73
+ sure to read the GNU GPL so that you know your rights
74
+ and responsibilities.
75
+
76
+ examples of usage
77
+
78
+ to decrypt a cipher text in an ascii text file
79
+ ruby substitution_solver.rb cipher.txt
80
+
81
+ to generate an english.dic file from another source text, use
82
+ ruby dictionary_builder.rb novel.txt
83
+
84
+ finally I'm including a script which will print out the english.dic
85
+ file in a readable form. It's interesting to note that the
86
+ frequencies increase exponentially as you go down the list.
87
+
88
+ example
89
+ ruby dictionary_inspector.rb > output.txt
90
+ or
91
+ ruby dictionary_inspector.rb | less
92
+ or just plain old
93
+ ruby dictionary_inspector.rb
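The README's analogy of hunting for the highest peak on a curve you cannot plot ahead of time can be tried out on a toy landscape. The sketch below is not part of the package: the `LANDSCAPE` heights and the names `hill_climb` and `shotgun_climb` are invented for illustration, and the real solver climbs over substitution keys rather than array positions.

```ruby
# Toy landscape: heights indexed by position (made-up numbers for illustration).
LANDSCAPE = [1, 3, 7, 4, 2, 5, 9, 12, 8, 6, 10, 11, 3].freeze

# Hill climbing: keep stepping to the higher neighbour until neither is higher.
def hill_climb(heights, start)
  pos = start
  loop do
    neighbours = [pos - 1, pos + 1].select { |n| n >= 0 && n < heights.length }
    best = neighbours.max_by { |n| heights[n] }
    return pos if best.nil? || heights[best] <= heights[pos]
    pos = best
  end
end

# Shotgun part: restart from several random points and keep the best peak found.
def shotgun_climb(heights, restarts, rng = Random.new)
  peaks = Array.new(restarts) { hill_climb(heights, rng.rand(heights.length)) }
  peaks.max_by { |p| heights[p] }
end

peak = shotgun_climb(LANDSCAPE, 20, Random.new(42))
puts "best peak found at index #{peak}, height #{LANDSCAPE[peak]}"
```

A single climb can stall on a local peak (starting at index 0 stops at the height-7 peak), which is why the restarts matter. As the README says, nothing guarantees that any given run has found the true maximum.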
@@ -0,0 +1,16 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ hash = Hash.new(0)
4
+
5
+ File::readlines(ARGV[0]).each do |line|
6
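+ line = line.gsub(/[^a-zA-Z]/, "").upcase # non-bang forms: gsub!/upcase! return nil when nothing changes
7
+ for x in 0..line.length-4 # inclusive upper bound so the final tetragraph is counted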
8
+ hash[line[x...x+4]] += 1
9
+ end
10
+ end
11
+
12
+ #puts hash.to_a.sort {|x, y| x[1] <=> y[1]}
13
+
14
+ File.open("english.dic", "w+") do |f|
15
+ Marshal.dump(hash, f)
16
+ end
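The sliding four-letter window used by the dictionary builder can be checked on a short string. This is a minimal sketch, not the packaged script: `count_tetragraphs` is a hypothetical helper name, it works on a string in memory rather than a file, and it uses `each_cons` so that the window bounds cannot be off by one.

```ruby
# Count every overlapping four-letter window in `text`.
def count_tetragraphs(text)
  letters = text.gsub(/[^a-zA-Z]/, "").upcase # non-bang forms never return nil
  counts = Hash.new(0)                         # unseen tetragraphs read as 0
  letters.each_char.each_cons(4) { |window| counts[window.join] += 1 }
  counts
end

counts = count_tetragraphs("The theme, then.")
counts.sort_by { |_, n| -n }.each { |tetra, n| puts "#{tetra} #{n}" }
```

The input reduces to the twelve letters THETHEMETHEN, which contain nine windows. The last of them, THEN, is the one an exclusive upper bound of `length-4` would skip.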
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $dictionary = Hash.new(0) # The dictionary of tetragraph frequencies
4
+
5
+ File.open("english.dic") do |f| # Open the saved tetragraph information
6
+ $dictionary = Marshal.load(f) # And load this information into our dictionary
7
+ end
8
+
9
+ array = $dictionary.to_a.sort {|x, y| x[1] <=> y[1]}
10
+
11
+ x = 0
12
+ array.each do |tetragraph, freq|
13
+ puts "#{x+=1}, #{tetragraph}, #{freq}"
14
+ end
@@ -0,0 +1,113 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ $iteration = 0 # To record how many iterations the program
4
+ # has had to churn through
5
+
6
+ ciphertext = String.new
7
+
8
+ File::readlines(ARGV[0]).each do |line| # Read the ciphertext from the file named on the command line
9
+ ciphertext << line
10
+ end
11
+
12
+ ciphertext = ciphertext.gsub(/[^a-zA-Z]/, "").upcase # get rid of any non-alphabetic characters and upcase (gsub! returns nil if nothing is removed)
13
+
14
+ key = Hash.new # Create a hash that will represent the translation key
15
+
16
+ $dictionary = Hash.new(0) # The dictionary of tetragraph frequencies
17
+ File.open("english.dic") do |f| # Open the saved tetragraph information
18
+ $dictionary = Marshal.load(f) # And load this information into our dictionary
19
+ end
20
+
21
+ def score(string) # This function will score a string against the tetragraph statistics
22
+ $iteration += 1 # Increment the iteration count as this is probably the most fundamental loop to the program
23
+ tally = 0 # Set a counter to 0
24
+ 0.upto(string.length-4) do |x| # Iterate through the string
25
+ tally += Math.log(($dictionary[string[x...x+4]].to_i)+1) # tally up the tetragraph frequencies after applying log to each one (the log is where the magic happens)
26
+ end
27
+ return tally # and return our grand total when we're finished adding it all up
28
+ end
29
+
30
+ def small_adj!(key) # this function makes small random adjustments to the key when we've hill climbed our way into a dead end
31
+ for i in 0...rand(5) # pick a random number of changes to make
32
+ j = rand(26) # now pick two random letters in the alphabet to swap
33
+ k = rand(26)
34
+ if j != k # if the random letters aren't equal
35
+ temp = key[(j+65).chr] # then go ahead and swap them
36
+ key[(j+65).chr] = key[(k+65).chr]
37
+ key[(k+65).chr] = temp
38
+ end
39
+ end
40
+ end
41
+
42
+ def plaintext(ciphertext, key) # This function will return the decoded ciphertext using a given key to do the decoding
43
+ return_string = String.new # create a return string
44
+
45
+ for x in 0...ciphertext.length # loop through the ciphertext
46
+ return_string << key[ciphertext[x].chr] # swap the letters out using the key and build up the return string
47
+ end
48
+ return return_string # return the answer
49
+ end
50
+
51
+ def randomize!(key) # completely randomize the key, ie start over from scratch
52
+ array = Array.new # create an array of letters to pick from
53
+
54
+ for x in 0...26
55
+ array[x] = (x+65).chr # populate the array with characters
56
+ end
57
+
58
+ for x in 0...26 # now loop through the array taking a letter out
59
+ y = rand(array.length) # one at a time randomly and adding it to the key
60
+ key[(x+65).chr] = array[y]
61
+ array.delete_at(y)
62
+ end
63
+ end
64
+
65
+ print "best overall = ", score(ciphertext), " : best score = ", score(ciphertext), "\n" #print the original ciphertext
66
+ puts ciphertext.gsub(/(.....)/, '\1 ')
67
+
68
+ randomize!(key) # randomize the key
69
+
70
+ best_score=score(ciphertext); # set the best score to the score of the ciphertext
71
+ best_overall=best_score-1; # set the best overall score to the best score -1
72
+ num_small_adjusts=0; # set the number of small adjustments to 0
73
+
74
+ loop do # loop forever
75
+ best_adj = best_score # set the best adjustment to the current best score
76
+
77
+ for i in 0...26 # loop through all possible "trivial" letter replacements
78
+ for j in i...26 # in the key looking for the best swap. This in effect is
79
+ test_key = key.dup # the so called "Hill Climbing" part of our program
80
+ temp = test_key[(i+65).chr]
81
+ test_key[(i+65).chr] = test_key[(j+65).chr]
82
+ test_key[(j+65).chr] = temp
83
+ sc = score(plaintext(ciphertext, test_key)) # score the change we've made
84
+ if sc > best_adj # if it's better than any so far
85
+ best_adj=sc # then record the change so we can apply it later if it
86
+ best_i = i # turns out to be the best one
87
+ best_j = j
88
+ end
89
+ end
90
+ end
91
+
92
+ if best_adj > best_score # if we found an adjustment that improves the best score
93
+ temp = key[(best_i+65).chr] # then apply that adjustment to the key
94
+ key[(best_i+65).chr] = key[(best_j+65).chr]
95
+ key[(best_j+65).chr] = temp
96
+ best_score = best_adj
97
+ if best_score > best_overall # if that adjustment is the best overall
98
+ num_small_adjusts = 0 # then reset the number of small adjusts counter
99
+ best_overall = best_score # set this new score as the best overall
100
+ print "best overall = ", best_overall, " : best score = ", best_score, " : iteration = #{$iteration}\n"
101
+ puts plaintext(ciphertext, key).gsub(/(.....)/, '\1 ') # and print our new found best overall value
102
+ end
103
+ else # otherwise none of the adjustments raised our score
104
+ if num_small_adjusts < 10 # so make a small random adjustment to the key
105
+ small_adj!(key) # as long as we haven't already made too many small adjustments
106
+ num_small_adjusts += 1 # increment the number of small adjustments
107
+ else # otherwise we've made too many small adjustments, we're
108
+ randomize!(key) # probably not getting anywhere and need to start looking
109
+ num_small_adjusts = 0 # someplace else, randomize the key and start climbing the
110
+ end # hill again
111
+ best_score=score(plaintext(ciphertext, key)) # set the best score to either the small adjustment value or the new randomized string value depending on what we did above.
112
+ end
113
+ end
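One way to see why the solver sums logs of the tetragraph counts instead of the counts themselves is to compare the two on a tiny example. Everything below is illustrative and not taken from the package: the `COUNTS` values are invented, and `windows`, `raw_score` and `log_score` are hypothetical names. Only the `Math.log(count + 1)` form mirrors the `score` function above.

```ruby
# Hypothetical tetragraph counts, invented for this illustration only.
COUNTS = Hash.new(0)
COUNTS.update("THAT" => 5000, "WHER" => 300, "HERE" => 400)

# All overlapping four-letter windows of `text`.
def windows(text)
  (0..text.length - 4).map { |i| text[i, 4] }
end

# Plain sum of counts: one very common tetragraph dominates the total.
def raw_score(text)
  windows(text).inject(0) { |sum, w| sum + COUNTS[w] }
end

# Sum of log(count + 1): an unseen tetragraph contributes log(1) = 0,
# and several plausible tetragraphs outweigh a single very common one.
def log_score(text)
  windows(text).inject(0.0) { |sum, w| sum + Math.log(COUNTS[w] + 1) }
end

%w[THATQ WHERE].each do |candidate|
  puts "#{candidate}: raw = #{raw_score(candidate)}, log = #{log_score(candidate)}"
end
```

With these made-up numbers the raw sum prefers THATQ, which contains one very frequent tetragraph followed by an unseen one, while the log sum prefers WHERE, where every window is plausible English. That is consistent with the README's remark that the logs help plain text score higher than gibberish.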
metadata ADDED
@@ -0,0 +1,45 @@
1
+ --- !ruby/object:Gem::Specification
2
+ rubygems_version: 0.8.10
3
+ specification_version: 1
4
+ name: substitution_solver
5
+ version: !ruby/object:Gem::Version
6
+ version: 0.5.0
7
+ date: 2005-11-09
8
+ summary: "Program for solving mono-alphabetic simple substitution ciphers, (as in
9
+ cryptoquotes), without word lengths."
10
+ require_paths:
11
+ - lib
12
+ email: pfharlock@yahoo.com
13
+ homepage:
14
+ rubyforge_project:
15
+ description:
16
+ autorequire:
17
+ default_executable:
18
+ bindir: "."
19
+ has_rdoc: false
20
+ required_ruby_version: !ruby/object:Gem::Version::Requirement
21
+ requirements:
22
+ -
23
+ - ">"
24
+ - !ruby/object:Gem::Version
25
+ version: 0.0.0
26
+ version:
27
+ platform: ruby
28
+ authors:
29
+ - Gary Watson
30
+ files:
31
+ - substitution_solver.rb
32
+ - dictionary_builder.rb
33
+ - dictionary_inspector.rb
34
+ - README.txt
35
+ test_files: []
36
+ rdoc_options: []
37
+ extra_rdoc_files:
38
+ - README.txt
39
+ executables:
40
+ - substitution_solver.rb
41
+ - dictionary_builder.rb
42
+ - dictionary_inspector.rb
43
+ extensions: []
44
+ requirements: []
45
+ dependencies: []