winnow 0.0.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 09e56c471afeb1bf113f07da677460d4c6b4eb9f
4
+ data.tar.gz: e778e453f5f50fa49d305b90b97813aaccce4cdd
5
+ SHA512:
6
+ metadata.gz: 941aaca687e3350ce2bafc6c6e7a9d2209bc36ad23078ed659d308cec42f25435195e561ebfa9f3f8e824d21da437a53abc983821440b16deddd04f39cf765b9
7
+ data.tar.gz: 0e0075ad638eb46ab271ae5d0c0c896db3c3598200af5e89493eab28b117f9660a2e624a09b7080baf64976dfbd18d7e745ab6d2a2ce3e98d9e484c1e4ab700e
data/.gitignore ADDED
@@ -0,0 +1,17 @@
1
+ *.gem
2
+ *.rbc
3
+ .bundle
4
+ .config
5
+ .yardoc
6
+ Gemfile.lock
7
+ InstalledFiles
8
+ _yardoc
9
+ coverage
10
+ doc/
11
+ lib/bundler/man
12
+ pkg
13
+ rdoc
14
+ spec/reports
15
+ test/tmp
16
+ test/version_tmp
17
+ tmp
data/Gemfile ADDED
@@ -0,0 +1,2 @@
1
+ source 'https://rubygems.org'
2
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2014 Ulysse Carion
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,150 @@
1
+ # Winnow
2
+
3
+ A tiny Ruby library for document fingerprinting.
4
+
5
+ ## What is document fingerprinting?
6
+
7
+ Document fingerprinting converts a document (e.g. a book, a piece of code, or
8
+ any other string) into a much smaller number of hashes called *fingerprints*. If
9
+ two documents share any fingerprints, then this means there is an exact
10
+ substring match between the two documents.
11
+
12
+ Document fingerprinting has many applications, but the most obvious one is for
13
+ plagiarism detection. By taking fingerprints of two documents, you can detect if
14
+ parts of one document were copied from another.
15
+
16
+ This library implements a fingerprinting algorithm called *winnowing*, described
17
+ by Saul Schleimer, Daniel S. Wilkerson, and Alex Aiken in a paper titled
18
+ [*Winnowing: Local Algorithms for Document Fingerprinting*][swa_paper].
19
+
20
+ ## Usage
21
+
22
+ The `Fingerprinter` class takes care of fingerprinting documents. To create a
23
+ fingerprint, you need to provide two parameters, called the *noise threshold*
24
+ and the *guarantee threshold*. When comparing two documents' fingerprints, no
25
+ match shorter than the noise threshold will be detected, but any match at least
26
+ as long as the guarantee threshold is guaranteed to be found.
27
+
28
+ The proper values for your noise and guarantee thresholds vary by context.
29
+ Experiment with the data you're looking at until you're happy with the results.
30
+
31
+ Creating a fingerprinter is easy:
32
+
33
+ ```ruby
34
+ fingerprinter = Winnow::Fingerprinter.new(noise_threshold: 10, guarantee_threshold: 18)
35
+ ```
36
+
37
+ Then, use `#fingerprints` to get the fingerprints. Optionally, pass `:source`
38
+ (default is `nil`) so that Winnow can later report which document a match is
39
+ from.
40
+
41
+ ```ruby
42
+ document = File.new('hamlet.txt')
43
+ fingerprints = fingerprinter.fingerprints(document.read, source: document)
44
+ ```
45
+
46
+ `#fingerprints` just returns a plain-old Ruby `Hash`. The keys of the hash are
47
+ generated from substrings of the document being fingerprinted. Finding shared
48
+ substrings between documents is as simple as seeing if they share any of the
49
+ keys in their `#fingerprints` hash.
50
+
51
+ To make things easier for you, Winnow comes with a `Matcher` class that will
52
+ find matches between two documents.
53
+
54
+ Here's an example that puts everything together:
55
+
56
+ ```ruby
57
+ require 'winnow'
58
+
59
+ str_a = <<-EOF
60
+ 'Twas brillig, and the slithy toves
61
+ Did gyre and gimble in the wabe;
62
+ This is copied.
63
+ All mimsy were the borogoves,
64
+ And the mome raths outgrabe.
65
+ EOF
66
+
67
+ str_b = <<-EOF
68
+ "Beware the Jabberwock, my son!
69
+ The jaws that bite, the claws that catch!
70
+ Beware the Jubjub bird, and shun
71
+ The frumious -- This is copied. -- Bandersnatch!"
72
+ EOF
73
+
74
+ fprinter = Winnow::Fingerprinter.new(
75
+ guarantee_threshold: 13, noise_threshold: 9)
76
+
77
+ f1 = fprinter.fingerprints(str_a, source: "Stanza 1")
78
+ f2 = fprinter.fingerprints(str_b, source: "Stanza 2")
79
+
80
+ matches = Winnow::Matcher.find_matches(f1, f2)
81
+
82
+ # Because 'This is copied' is longer than the guarantee threshold, there might
83
+ # be a couple of matches found here. For the sake of brevity, let's only look at
84
+ # the first match found.
85
+ match = matches.first
86
+
87
+ # It's possible for the same key to appear in a document multiple times (e.g. if
88
+ # 'This is copied' appears more than once). Winnow::Matcher will return all
89
+ # matches from the same key in an array.
90
+ #
91
+ # In this case, we know there's only one match (because 'This is copied' appears
92
+ # only once in each document), so let's only look at the first one.
93
+ match_a = match.matches_from_a.first
94
+ match_b = match.matches_from_b.first
95
+
96
+ p match_a.index, match_b.index # 71, 125
97
+
98
+ match_context_a = str_a[match_a.index - 10 .. match_a.index + 20]
99
+ match_context_b = str_b[match_b.index - 10 .. match_b.index + 20]
100
+
101
+ # Match from Stanza 1: "e wabe;\nThis is copied.\nAll mim"
102
+ puts "Match from #{match_a.source}: #{match_context_a.inspect}"
103
+
104
+ # Match from Stanza 2: "ious -- This is copied. -- Band"
105
+ puts "Match from #{match_b.source}: #{match_context_b.inspect}"
106
+ ```
107
+
108
+ You may find that `Matcher` doesn't handle your exact use-case. That's not a
109
+ problem. [The built-in matcher.rb file](lib/winnow/matcher.rb)
110
+ is only about 10 lines of code, so you could easily make your own.
111
+
112
+ ## :boom: :bomb: A major caveat with `String#hash` :bomb: :boom:
113
+
114
+ In order to avoid [algorithmic complexity attacks][wiki_aca], the value returned
115
+ from Ruby's `String#hash` method [changes every time you restart the
116
+ interpreter][hash_stackoverflow]:
117
+
118
+ ```sh
119
+ $ irb
120
+ 2.0.0p353 :001 > "hello".hash
121
+ => 482951767139383391
122
+ 2.0.0p353 :002 > exit
123
+
124
+ $ irb
125
+ 2.0.0p353 :001 > "hello".hash
126
+ => 3216751850140847920
127
+ 2.0.0p353 :002 > exit
128
+ ```
129
+
130
+ (This is the case even if you're using JRuby.)
131
+
132
+ This means that although the winnowing algorithm *should* allow you to
133
+ precalculate a document's fingerprints and store them somewhere, doing so in
134
+ Ruby will not work unless you're careful to make sure you never restart your
135
+ Ruby runtime.
136
+
137
+ ### A workaround
138
+
139
+ Winnow looks for the presence of a `String#consistent_hash` method. If it finds
140
+ one, it'll call that rather than call `String#hash`. You can therefore define
141
+ your own hash function if you want to precalculate fingerprint data.
142
+
143
+ I've put together a super-simple (but effective) gem called
144
+ [consistent_hash][consistent_hash] that implements exactly this. It's about a
145
+ dozen lines of MRI C code and it'll probably work for you as well.
146
+
147
+ [swa_paper]: http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf
148
+ [wiki_aca]: http://en.wikipedia.org/wiki/Algorithmic_complexity_attack
149
+ [hash_stackoverflow]: http://stackoverflow.com/questions/23331725/why-are-ruby-hash-methods-randomized
150
+ [consistent_hash]: https://github.com/ucarion/consistent_hash
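The workaround described in the README could be sketched in pure Ruby as follows. This is only an illustration, assuming a digest-based hash is acceptable for your data: it is not the implementation used by the consistent_hash gem (which the README describes as MRI C code), and the choice of SHA-1 truncated to 64 bits is an assumption made here. The only thing taken from the source is the method name `String#consistent_hash`, which Winnow checks for.

```ruby
require 'digest/sha1'

# Hypothetical pure-Ruby stand-in for the consistent_hash gem. It is NOT the
# gem's implementation, just one way to get a value that does not change
# between interpreter restarts.
class String
  def consistent_hash
    # Use the first 16 hex digits (64 bits) of the SHA-1 digest as an Integer.
    Digest::SHA1.hexdigest(self)[0, 16].to_i(16)
  end
end

# Equal strings always produce the same value, in this process and the next.
puts "hello".consistent_hash == "hello".dup.consistent_hash
```

Because the value depends only on the string's bytes, fingerprints computed this way could be stored and compared later, at the cost of a slower hash than `String#hash`.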
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task default: :spec
data/lib/winnow.rb ADDED
@@ -0,0 +1,12 @@
1
+ require 'winnow/version'
2
+ require 'winnow/preprocessor'
3
+ require 'winnow/fingerprinter'
4
+ require 'winnow/matcher'
5
+
6
+ module Winnow
7
+ class Location < Struct.new(:source, :index)
8
+ end
9
+
10
+ class MatchDatum < Struct.new(:matches_from_a, :matches_from_b)
11
+ end
12
+ end
data/lib/winnow/fingerprinter.rb ADDED
@@ -0,0 +1,60 @@
1
+ module Winnow
2
+ class Fingerprinter
3
+ attr_reader :guarantee, :noise, :preprocessor
4
+ alias_method :guarantee_threshold, :guarantee
5
+ alias_method :noise_threshold, :noise
6
+
7
+ def initialize(params)
8
+ @guarantee = params[:guarantee_threshold] || params[:t]
9
+ @noise = params[:noise_threshold] || params[:k]
10
+ @preprocessor = params[:preprocessor] || Preprocessors::Plaintext.new
11
+ end
12
+
13
+ def fingerprints(str, params = {})
14
+ source = params[:source]
15
+
16
+ fingerprints = {}
17
+
18
+ windows(str, source).each do |window|
19
+ least_fingerprint = window.min_by { |fingerprint| fingerprint[:value] }
20
+ value = least_fingerprint[:value]
21
+ location = least_fingerprint[:location]
22
+
23
+ (fingerprints[value] ||= []) << location
24
+ end
25
+
26
+ fingerprints
27
+ end
28
+
29
+ private
30
+
31
+ def windows(str, source)
32
+ k_grams(str, source).each_cons(window_size)
33
+ end
34
+
35
+ def window_size
36
+ guarantee - noise + 1
37
+ end
38
+
39
+ def k_grams(str, source)
40
+ tokens(str).each_cons(noise).map do |tokens_k_gram|
41
+ value = hash(tokens_k_gram.map(&:first).join)
42
+ location = Location.new(source, tokens_k_gram.first[1])
43
+
44
+ {value: value, location: location}
45
+ end
46
+ end
47
+
48
+ def tokens(str)
49
+ preprocessor.preprocess(str)
50
+ end
51
+
52
+ def hash(str)
53
+ if str.respond_to?(:consistent_hash)
54
+ str.consistent_hash
55
+ else
56
+ str.hash
57
+ end
58
+ end
59
+ end
60
+ end
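The steps in `Fingerprinter` (hash every k-gram, slide a window of `guarantee - noise + 1` k-grams, keep the smallest hash in each window) can be sketched in isolation. The sketch below is a simplified illustration, not the library's code: `winnow_sketch` is a hypothetical name, it works on raw characters only, and it uses `Zlib.crc32` instead of `String#hash` so that results are reproducible between runs.

```ruby
require 'zlib'

# Hypothetical helper illustrating the winnowing steps; k is the noise
# threshold and t is the guarantee threshold.
def winnow_sketch(str, k, t)
  window_size = t - k + 1

  # Hash every run of k consecutive characters, remembering where it starts.
  k_grams = str.chars.each_cons(k).each_with_index.map do |chars, index|
    { value: Zlib.crc32(chars.join), index: index }
  end

  # In each window of consecutive k-grams, keep the one with the smallest hash.
  fingerprints = Hash.new { |hash, key| hash[key] = [] }
  k_grams.each_cons(window_size) do |window|
    least = window.min_by { |gram| gram[:value] }
    fingerprints[least[:value]] << least[:index]
  end

  fingerprints
end

# With k = 9 and t = 13, each window holds 13 - 9 + 1 = 5 k-grams.
p winnow_sketch("This is copied.", 9, 13)
```

Like the original loop, this appends a position once per window, so a k-gram that is the minimum of several adjacent windows appears several times in its list.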
data/lib/winnow/matcher.rb ADDED
@@ -0,0 +1,16 @@
1
+ module Winnow
2
+ class Matcher
3
+ class << self
4
+ def find_matches(fingerprints_a, fingerprints_b, params = {})
5
+ whitelist = params[:whitelist] || []
6
+
7
+ matched_values = fingerprints_a.keys & fingerprints_b.keys - whitelist
8
+
9
+ matched_values.map do |value|
10
+ matches_a, matches_b = fingerprints_a[value], fingerprints_b[value]
11
+ MatchDatum.new(matches_a, matches_b)
12
+ end
13
+ end
14
+ end
15
+ end
16
+ end
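Since the README suggests writing your own matcher when `Matcher` does not fit, here is a minimal sketch of the core step: intersecting the keys of two fingerprint hashes and dropping whitelisted values. `shared_values` is a hypothetical helper and the fingerprint hashes are toy data (value mapped to a list of positions), not output from Winnow. The last lines also show why the unparenthesized expression in `find_matches` works: in Ruby, binary `-` binds more tightly than `&`.

```ruby
# Hypothetical helper, not part of Winnow: returns the fingerprint values two
# documents share, minus any whitelisted values.
def shared_values(fingerprints_a, fingerprints_b, whitelist = [])
  (fingerprints_a.keys & fingerprints_b.keys) - whitelist
end

# Toy fingerprint hashes: value => list of positions.
fprint_a = { 0 => [0], 1 => [1, 2] }
fprint_b = { 0 => [3], 1 => [4], 3 => [5] }

p shared_values(fprint_a, fprint_b)      # => [0, 1]
p shared_values(fprint_a, fprint_b, [0]) # => [1]

# `a & b - c` parses as `a & (b - c)`, which yields the same set here.
keys_a, keys_b, whitelist = fprint_a.keys, fprint_b.keys, [0]
implicit = keys_a & keys_b - whitelist
explicit = keys_a & (keys_b - whitelist)
p implicit == explicit                   # => true
```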
data/lib/winnow/preprocessor.rb ADDED
@@ -0,0 +1,10 @@
1
+ module Winnow
2
+ class Preprocessor
3
+ def preprocess(str)
4
+ raise NotImplementedError
5
+ end
6
+ end
7
+ end
8
+
9
+ require 'winnow/preprocessors/plaintext'
10
+ require 'winnow/preprocessors/source_code'
data/lib/winnow/preprocessors/plaintext.rb ADDED
@@ -0,0 +1,9 @@
1
+ module Winnow
2
+ module Preprocessors
3
+ class Plaintext < Preprocessor
4
+ def preprocess(str)
5
+ str.chars.each_with_index.to_a
6
+ end
7
+ end
8
+ end
9
+ end
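`Fingerprinter` accepts a `:preprocessor` option and only ever calls `preprocess(str)` on it, expecting an array of `[char, index]` pairs like the one `Plaintext` returns. A custom preprocessor could therefore be sketched as below. The class name `DowncaseNoWhitespace` and its normalization rules are assumptions made for illustration; nothing like it ships with the gem.

```ruby
# Hypothetical preprocessor: ignores whitespace and letter case, but keeps each
# character's index in the original string so matches can still be located.
class DowncaseNoWhitespace
  def preprocess(str)
    str.chars.each_with_index
       .reject { |char, _index| char =~ /\s/ }
       .map { |char, index| [char.downcase, index] }
  end
end

p DowncaseNoWhitespace.new.preprocess("A b\nC")
# => [["a", 0], ["b", 2], ["c", 4]]
```

With the gem loaded, an instance would presumably be passed as `Winnow::Fingerprinter.new(noise_threshold: 9, guarantee_threshold: 13, preprocessor: DowncaseNoWhitespace.new)`.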
data/lib/winnow/preprocessors/source_code.rb ADDED
@@ -0,0 +1,41 @@
1
+ require 'rouge'
2
+
3
+ module Winnow
4
+ module Preprocessors
5
+ class SourceCode < Preprocessor
6
+ attr_reader :lexer
7
+
8
+ def initialize(language)
9
+ @lexer = Rouge::Lexer.find(language)
10
+ end
11
+
12
+ def preprocess(str)
13
+ current_index = 0
14
+ processed_chars = []
15
+
16
+ lexer.lex(str).to_a.each do |token|
17
+ type, chunk = token
18
+
19
+ processed_chunk = case
20
+ when type <= Rouge::Token::Tokens::Name
21
+ 'x'
22
+ when type <= Rouge::Token::Tokens::Comment
23
+ ''
24
+ when type <= Rouge::Token::Tokens::Text
25
+ ''
26
+ else
27
+ chunk
28
+ end
29
+
30
+ processed_chars += processed_chunk.chars.map do |c|
31
+ [c, current_index]
32
+ end
33
+
34
+ current_index += chunk.length
35
+ end
36
+
37
+ processed_chars
38
+ end
39
+ end
40
+ end
41
+ end
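The idea behind `SourceCode#preprocess` (collapse identifiers to `x`, drop whitespace, and point every emitted character back at the start of its original token) can be shown without Rouge. The sketch below is a toy: `KEYWORDS` and `normalize_sketch` are hypothetical names, the keyword list is an arbitrary sample, and the regex tokenizer is far cruder than a real lexer (it does not handle comments or string literals).

```ruby
# Toy stand-in for a real lexer; the keyword list is an arbitrary sample.
KEYWORDS = %w[int class return].freeze

def normalize_sketch(str)
  processed = []

  # Split into identifiers/keywords, whitespace runs, and single other chars.
  str.scan(/[A-Za-z_]\w*|\s+|./m) do |chunk|
    start = Regexp.last_match.begin(0) # offset of this chunk in the original

    replacement =
      if chunk.strip.empty?
        ''      # drop whitespace
      elsif chunk =~ /\A[A-Za-z_]/ && !KEYWORDS.include?(chunk)
        'x'     # collapse every identifier to the same placeholder
      else
        chunk   # keep keywords, operators, and literals
      end

    # Every emitted character points back at the start of its source chunk.
    replacement.each_char { |char| processed << [char, start] }
  end

  processed
end

p normalize_sketch("i = 5") # => [["x", 0], ["=", 2], ["5", 4]]
p normalize_sketch("int i") # => [["i", 0], ["n", 0], ["t", 0], ["x", 4]]
```

Renaming variables or reflowing whitespace leaves the processed characters unchanged, which is what lets fingerprints survive that kind of cosmetic edit.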
data/lib/winnow/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module Winnow
2
+ VERSION = "0.0.1"
3
+ end
data/spec/fingerprinter_spec.rb ADDED
@@ -0,0 +1,56 @@
1
+ require 'spec_helper'
2
+
3
+ describe Winnow::Fingerprinter do
4
+ describe '#initialize' do
5
+ it 'accepts :guarantee_threshold, :noise_threshold' do
6
+ fprinter = Winnow::Fingerprinter.new(
7
+ guarantee_threshold: 0, noise_threshold: 1)
8
+ expect(fprinter.guarantee_threshold).to eq 0
9
+ expect(fprinter.noise_threshold).to eq 1
10
+ end
11
+
12
+ it 'accepts :t and :k' do
13
+ fprinter = Winnow::Fingerprinter.new(t: 0, k: 1)
14
+ expect(fprinter.guarantee_threshold).to eq 0
15
+ expect(fprinter.noise_threshold).to eq 1
16
+ end
17
+ end
18
+
19
+ describe '#fingerprints' do
20
+ it 'hashes strings to get keys' do
21
+ # if t = k = 1, then each character will become a fingerprint
22
+ fprinter = Winnow::Fingerprinter.new(t: 1, k: 1)
23
+ fprints = fprinter.fingerprints("abcdefg")
24
+
25
+ hashes = ('a'..'g').map(&:hash)
26
+
27
+ expect(fprints.keys).to eq hashes
28
+ end
29
+
30
+ it 'chooses the smallest hash per window' do
31
+ # window size = t - k + 1 = 2 ; for a two-char string, the sole
32
+ # fingerprint should just be from the char with the smallest hash value
33
+ fprinter = Winnow::Fingerprinter.new(t: 2, k: 1)
34
+ fprints = fprinter.fingerprints("ab")
35
+
36
+ expect(fprints.keys.length).to eq 1
37
+ expect(fprints.keys.first).to eq ["a", "b"].map(&:hash).min
38
+ end
39
+
40
+ it 'correctly reports the location of a fingerprint' do
41
+ fprinter = Winnow::Fingerprinter.new(t: 1, k: 1)
42
+ fprints = fprinter.fingerprints("a\nb\ncde\nfg", source: "example")
43
+
44
+ fprint_d = fprints['d'.hash].first
45
+
46
+ expect(fprint_d.index).to eq 5
47
+ expect(fprint_d.source).to eq "example"
48
+ end
49
+
50
+ it 'uses #consistent_hash when possible' do
51
+ String.any_instance.should_receive(:consistent_hash)
52
+
53
+ Winnow::Fingerprinter.new(t: 1, k: 1).fingerprints("a")
54
+ end
55
+ end
56
+ end
data/spec/matcher_spec.rb ADDED
@@ -0,0 +1,61 @@
1
+ require 'spec_helper'
2
+
3
+ describe Winnow::Matcher do
4
+ describe '#find_matches' do
5
+ def make_locations(*indices)
6
+ indices.map { |n| Winnow::Location.new(nil, n) }
7
+ end
8
+
9
+ let(:fprint1) do
10
+ {
11
+ 0 => make_locations(0),
12
+ 1 => make_locations(1, 2),
13
+ }
14
+ end
15
+
16
+ let(:fprint2) do
17
+ {
18
+ 0 => make_locations(3),
19
+ 1 => make_locations(4),
20
+ 3 => make_locations(5)
21
+ }
22
+ end
23
+
24
+ let(:matches) { Winnow::Matcher.find_matches(fprint1, fprint2) }
25
+
26
+ def match_with_loc(index, candidates = matches)
27
+ candidates.find do |data|
28
+ data.matches_from_a.find { |loc| loc.index == index } ||
29
+ data.matches_from_b.find { |loc| loc.index == index }
30
+ end
31
+ end
32
+
33
+ it 'returns an array of match data' do
34
+ expect(matches).to be_an(Array)
35
+ expect(matches.first).to be_a(Winnow::MatchDatum)
36
+ end
37
+
38
+ it 'reports a match when values are equal' do
39
+ match = match_with_loc(0)
40
+ matchloc_b = match.matches_from_b.first
41
+ expect(matchloc_b.index).to eq 3
42
+ end
43
+
44
+ it 'reports nothing when there is no match' do
45
+ match = match_with_loc(5)
46
+ expect(match).to be_nil
47
+ end
48
+
49
+ it 'reports all matches when multi matches' do
50
+ match = match_with_loc(1)
51
+ expect(match.matches_from_a.length).to eq 2
52
+ expect(match.matches_from_b.length).to eq 1
53
+ end
54
+
55
+ it 'ignores whitelisted values' do
56
+ matches = Winnow::Matcher.find_matches(fprint1, fprint2, whitelist: [0])
57
+
58
+ expect(match_with_loc(0, matches)).to be_nil
59
+ end
60
+ end
61
+ end
data/spec/preprocessor_spec.rb ADDED
@@ -0,0 +1,49 @@
1
+ require 'spec_helper'
2
+
3
+ describe Winnow::Preprocessors::Plaintext do
4
+ subject { Winnow::Preprocessors::Plaintext.new }
5
+
6
+ it 'converts a string to an array of chars and indices' do
7
+ str = "abcde"
8
+ char_indices = [['a', 0], ['b', 1], ['c', 2], ['d', 3], ['e', 4]]
9
+
10
+ expect(subject.preprocess(str)).to eq char_indices
11
+ end
12
+ end
13
+
14
+ describe Winnow::Preprocessors::SourceCode do
15
+ subject { Winnow::Preprocessors::SourceCode.new(:java) }
16
+
17
+ it 'simplifies a string, but remembers correct locations' do
18
+ str = "i = 5"
19
+ result = [['x', 0], ['=', 2], ['5', 4]]
20
+
21
+ expect(subject.preprocess(str)).to eq result
22
+ end
23
+
24
+ it 'groups up the indices of a single token' do
25
+ str = "int i"
26
+ result = [['i', 0], ['n', 0], ['t', 0], ['x', 4]]
27
+
28
+ expect(subject.preprocess(str)).to eq result
29
+ end
30
+
31
+ def reconstruct_string(processed)
32
+ processed.map { |(char, _)| char }.join
33
+ end
34
+
35
+ it 'removes whitespace' do
36
+ str = '3; '
37
+ expect(reconstruct_string(subject.preprocess(str))).to eq '3;'
38
+ end
39
+
40
+ it 'removes variable names' do
41
+ str = 'class MyClass { int myInteger = 5 }'
42
+ expect(reconstruct_string(subject.preprocess(str))).to eq 'classx{intx=5}'
43
+ end
44
+
45
+ it 'removes comments' do
46
+ str = 'fooBar();//this is a comment'
47
+ expect(reconstruct_string(subject.preprocess(str))).to eq 'x();'
48
+ end
49
+ end
data/spec/spec_helper.rb ADDED
@@ -0,0 +1,8 @@
1
+ require 'bundler/setup'
2
+ Bundler.setup
3
+
4
+ require 'winnow'
5
+
6
+ RSpec.configure do |config|
7
+ config.color_enabled = true
8
+ end
data/winnow.gemspec ADDED
@@ -0,0 +1,27 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'winnow/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "winnow"
8
+ spec.version = Winnow::VERSION
9
+ spec.authors = ["Ulysse Carion"]
10
+ spec.email = ["ulyssecarion@gmail.com"]
11
+ spec.description = %q{A tiny Ruby library that implements Winnowing,
12
+ an algorithm for document fingerprinting.}
13
+ spec.summary = %q{Simple document fingerprinting and plagiarism detection.}
14
+ spec.homepage = "https://github.com/ucarion/winnow"
15
+ spec.license = "MIT"
16
+
17
+ spec.files = `git ls-files`.split($/)
18
+ spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
19
+ spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
20
+ spec.require_paths = ["lib"]
21
+
22
+ spec.add_dependency 'rouge', '~> 1.3'
23
+
24
+ spec.add_development_dependency "bundler", "~> 1.3"
25
+ spec.add_development_dependency "rake"
26
+ spec.add_development_dependency "rspec", "~> 2.14"
27
+ end
metadata ADDED
@@ -0,0 +1,123 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: winnow
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Ulysse Carion
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2014-05-08 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: rouge
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ~>
18
+ - !ruby/object:Gem::Version
19
+ version: '1.3'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ~>
25
+ - !ruby/object:Gem::Version
26
+ version: '1.3'
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ~>
32
+ - !ruby/object:Gem::Version
33
+ version: '1.3'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ~>
39
+ - !ruby/object:Gem::Version
40
+ version: '1.3'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - '>='
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - '>='
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ~>
60
+ - !ruby/object:Gem::Version
61
+ version: '2.14'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ~>
67
+ - !ruby/object:Gem::Version
68
+ version: '2.14'
69
+ description: |-
70
+ A tiny Ruby library that implements Winnowing,
71
+ an algorithm for document fingerprinting.
72
+ email:
73
+ - ulyssecarion@gmail.com
74
+ executables: []
75
+ extensions: []
76
+ extra_rdoc_files: []
77
+ files:
78
+ - .gitignore
79
+ - Gemfile
80
+ - LICENSE.txt
81
+ - README.md
82
+ - Rakefile
83
+ - lib/winnow.rb
84
+ - lib/winnow/fingerprinter.rb
85
+ - lib/winnow/matcher.rb
86
+ - lib/winnow/preprocessor.rb
87
+ - lib/winnow/preprocessors/plaintext.rb
88
+ - lib/winnow/preprocessors/source_code.rb
89
+ - lib/winnow/version.rb
90
+ - spec/fingerprinter_spec.rb
91
+ - spec/matcher_spec.rb
92
+ - spec/preprocessor_spec.rb
93
+ - spec/spec_helper.rb
94
+ - winnow.gemspec
95
+ homepage: https://github.com/ucarion/winnow
96
+ licenses:
97
+ - MIT
98
+ metadata: {}
99
+ post_install_message:
100
+ rdoc_options: []
101
+ require_paths:
102
+ - lib
103
+ required_ruby_version: !ruby/object:Gem::Requirement
104
+ requirements:
105
+ - - '>='
106
+ - !ruby/object:Gem::Version
107
+ version: '0'
108
+ required_rubygems_version: !ruby/object:Gem::Requirement
109
+ requirements:
110
+ - - '>='
111
+ - !ruby/object:Gem::Version
112
+ version: '0'
113
+ requirements: []
114
+ rubyforge_project:
115
+ rubygems_version: 2.1.11
116
+ signing_key:
117
+ specification_version: 4
118
+ summary: Simple document fingerprinting and plagiarism detection.
119
+ test_files:
120
+ - spec/fingerprinter_spec.rb
121
+ - spec/matcher_spec.rb
122
+ - spec/preprocessor_spec.rb
123
+ - spec/spec_helper.rb