tf-idf-similarity 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: bcd5811852a4ec0e65c55ac09854024c1256483b
4
+ data.tar.gz: 9978891aa4e76e8badec85da898b575ed91098cc
5
+ SHA512:
6
+ metadata.gz: 3219126f9ea91f3d2bfb8db0954d6637f090d4c75bf5f7c57c8208b0a812c1d8e2fd488d438cedd5e188a6a169ff7b9bf1470e82146085bc2bdf7298de1572fb
7
+ data.tar.gz: d523d3e77ab1cfd31ddc4e6b7f429726b4d6b3520377e3504d8b0c2bc7a51b11dcaa55991ed19ff497ae75e993ce49aef34dcef619f066e7a7c6ac9e7ee1aa35
@@ -23,10 +23,6 @@ before_install:
23
23
  # Installing ATLAS will install BLAS.
24
24
  - if [ $MATRIX_LIBRARY = 'nmatrix' ]; then sudo apt-get install -qq libatlas-dev libatlas-base-dev libatlas3gf-base; fi
25
25
  - if [ $MATRIX_LIBRARY = 'nmatrix' ]; then export CPLUS_INCLUDE_PATH=$CPLUS_INCLUDE_PATH:/usr/include/atlas; fi
26
- - if [ $MATRIX_LIBRARY = 'nmatrix' ]; then git clone git://github.com/SciRuby/nmatrix.git; fi
27
- - if [ $MATRIX_LIBRARY = 'nmatrix' ]; then cd nmatrix && ORIGINAL_BUNDLE_GEMFILE=$BUNDLE_GEMFILE; fi
28
- - if [ $MATRIX_LIBRARY = 'nmatrix' ]; then BUNDLE_GEMFILE=`pwd`/Gemfile && bundle && bundle exec rake install; fi
29
- - if [ $MATRIX_LIBRARY = 'nmatrix' ]; then cd .. && BUNDLE_GEMFILE=$ORIGINAL_BUNDLE_GEMFILE; fi
30
26
  # Travis sometimes runs without Bundler.
31
27
  install: bundle
32
28
  script: bundle exec rake --trace
data/Gemfile CHANGED
@@ -2,7 +2,7 @@ source "http://rubygems.org"
2
2
 
3
3
  gem 'gsl', '~> 1.15.3' if ENV['MATRIX_LIBRARY'] == 'gsl'
4
4
  gem 'narray', '~> 0.6.0.0' if ENV['MATRIX_LIBRARY'] == 'narray'
5
- gem 'nmatrix', :git => 'git://github.com/SciRuby/nmatrix.git' if ENV['MATRIX_LIBRARY'] == 'nmatrix' && RUBY_VERSION >= '1.9'
5
+ gem 'nmatrix', '~> 0.0.9' if ENV['MATRIX_LIBRARY'] == 'nmatrix' && RUBY_VERSION >= '1.9'
6
6
 
7
7
  # Specify your gem's dependencies in the gemspec
8
8
  gemspec
data/README.md CHANGED
@@ -5,8 +5,7 @@
5
5
  [![Coverage Status](https://coveralls.io/repos/opennorth/tf-idf-similarity/badge.png?branch=master)](https://coveralls.io/r/opennorth/tf-idf-similarity)
6
6
  [![Code Climate](https://codeclimate.com/github/opennorth/tf-idf-similarity.png)](https://codeclimate.com/github/opennorth/tf-idf-similarity)
7
7
 
8
- Calculates the similarity between texts using a [bag-of-words](http://en.wikipedia.org/wiki/Bag_of_words_model) [Vector Space Model](http://en.wikipedia.org/wiki/Vector_space_model) with [Term Frequency-Inverse Document Frequency (tf*idf)](http://en.wikipedia.org/wiki/
9
- ) weights. If your use case demands performance, use [Lucene](http://lucene.apache.org/core/) or similar (see below).
8
+ Calculates the similarity between texts using a [bag-of-words](http://en.wikipedia.org/wiki/Bag_of_words_model) [Vector Space Model](http://en.wikipedia.org/wiki/Vector_space_model) with [Term Frequency-Inverse Document Frequency (tf*idf)](http://en.wikipedia.org/wiki/Tf*idf) weights. If your use case demands performance, use [Lucene](http://lucene.apache.org/core/) (see below).
10
9
 
11
10
  ## Usage
12
11
 
@@ -20,34 +19,24 @@ Create a set of documents:
20
19
  corpus << TfIdfSimilarity::Document.new("Pellentesque sed ipsum dui...")
21
20
  corpus << TfIdfSimilarity::Document.new("Nam scelerisque dui sed leo...")
22
21
 
23
- Create a document-term matrix using [Term Frequency-Inverse Document Frequency function](http://en.wikipedia.org/wiki/) (default:
22
+ Create a document-term matrix using [Term Frequency-Inverse Document Frequency function](http://en.wikipedia.org/wiki/):
24
23
 
25
- model = TfIdfSimilarity::TfIdfModel(corpus, :function => :tf_idf)
24
+ model = TfIdfSimilarity::TfIdfModel.new(corpus)
26
25
 
27
26
  Create a document-term matrix using the [Okapi BM25 ranking function](http://en.wikipedia.org/wiki/Okapi_BM25):
28
27
 
29
- model = TfIdfSimilarity::TfIdfModel(corpus, :function => :bm25)
28
+ model = TfIdfSimilarity::BM25Model.new(corpus)
30
29
 
31
30
  [Read the documentation at RubyDoc.info.](http://rubydoc.info/gems/tf-idf-similarity)
32
31
 
33
32
  ## Speed
34
33
 
35
- Instead of using the Ruby Standard Library's [Matrix](http://www.ruby-doc.org/stdlib-2.0/libdoc/matrix/rdoc/Matrix.html) class, you can use one of the `gsl`, `narray` or `nmatrix` gems for faster matrix operations, e.g.:
34
+ Instead of using the Ruby Standard Library's [Matrix](http://www.ruby-doc.org/stdlib-2.0/libdoc/matrix/rdoc/Matrix.html) class, you can use one of the [GNU Scientific Library (GSL)](http://www.gnu.org/software/gsl/), [NArray](http://narray.rubyforge.org/) or [NMatrix](https://github.com/SciRuby/nmatrix) (0.0.9 or greater) gems for faster matrix operations. For example:
36
35
 
37
36
  require 'gsl'
38
- model = TfIdfSimilarity::TfIdfModel(corpus, :library => :gsl)
37
+ model = TfIdfSimilarity::TfIdfModel.new(corpus, :library => :gsl)
39
38
 
40
- ### [GNU Scientific Library (GSL)](http://www.gnu.org/software/gsl/)
41
-
42
- gem install gsl
43
-
44
- ### [NArray](http://narray.rubyforge.org/)
45
-
46
- gem install narray
47
-
48
- ### [NMatrix](https://github.com/SciRuby/nmatrix)
49
-
50
- The nmatrix gem gives access to [Automatically Tuned Linear Algebra Software (ATLAS)](http://math-atlas.sourceforge.net/), which you may know of through [Linear Algebra PACKage (LAPACK)](http://www.netlib.org/lapack/) or [Basic Linear Algebra Subprograms (BLAS)](http://www.netlib.org/blas/). Follow [these instructions](https://github.com/SciRuby/nmatrix#synopsis) to install the nmatrix gem. You may need [additional instructions for Mac OS X Lion](https://github.com/SciRuby/nmatrix/wiki/Installation).
39
+ The NMatrix gem gives access to [Automatically Tuned Linear Algebra Software (ATLAS)](http://math-atlas.sourceforge.net/), which you may know of through [Linear Algebra PACKage (LAPACK)](http://www.netlib.org/lapack/) or [Basic Linear Algebra Subprograms (BLAS)](http://www.netlib.org/blas/). Follow [these instructions](https://github.com/SciRuby/nmatrix#synopsis) to install the NMatrix gem. You may need [additional instructions for Mac OS X Lion](https://github.com/SciRuby/nmatrix/wiki/Installation).
51
40
 
52
41
  ## Extras
53
42
 
@@ -68,22 +57,25 @@ At the time of writing, no other Ruby gem implemented the tf*idf formula used by
68
57
 
69
58
  ### Term frequencies
70
59
 
71
- The [vss](https://github.com/mkdynamic/vss) gem does not normalize the frequency of a term in a document; this occurs frequently in the academic literature, but only to demonstrate why normalization is important. The [tf_idf](https://github.com/reddavis/TF-IDF) and similarity gems normalize the frequency of a term in a document to the number of terms in that document, which never occurs in the literature. The [tf-idf](https://github.com/mchung/tf-idf) gem normalizes the frequency of a term in a document to the number of *unique* terms in that document, which never occurs in the literature.
60
+ * The [vss](https://github.com/mkdynamic/vss) gem does not normalize the frequency of a term in a document; this occurs frequently in the academic literature, but only to demonstrate why normalization is important.
61
+ * The [tf_idf](https://github.com/reddavis/TF-IDF) and similarity gems normalize the frequency of a term in a document to the number of terms in that document, which never occurs in the literature.
62
+ * The [tf-idf](https://github.com/mchung/tf-idf) gem normalizes the frequency of a term in a document to the number of *unique* terms in that document, which never occurs in the literature.
72
63
 
73
64
  ### Document frequencies
74
65
 
75
- The vss gem does not normalize the inverse document frequency. The treat, tf_idf, tf-idf and similarity gems use variants of the typical inverse document frequency formula.
66
+ * The vss gem does not normalize the inverse document frequency.
67
+ * The treat, tf_idf, tf-idf and similarity gems use variants of the typical inverse document frequency formula.
76
68
 
77
69
  ### Normalization
78
70
 
79
- The treat, tf_idf, tf-idf, rsemantic and vss gems have no normalization component.
71
+ * The treat, tf_idf, tf-idf, rsemantic and vss gems have no normalization component.
80
72
 
81
73
  ## Additional adapters
82
74
 
83
75
  Adapters for the following projects were also considered:
84
76
 
85
77
  * [Ruby-LAPACK](http://ruby.gfd-dennou.org/products/ruby-lapack/) is a very thin wrapper around LAPACK, which has an opaque Fortran-style naming scheme.
86
- * [Linalg](https://github.com/quix/linalg) and [RNum](http://rnum.rubyforge.org/) give access to LAPACK from Ruby, but are old and unavailable as gems.
78
+ * [Linalg](https://github.com/quix/linalg) and [RNum](http://rnum.rubyforge.org/) give access to LAPACK from Ruby but are old and unavailable as gems.
87
79
 
88
80
  ## Reference
89
81
 
@@ -13,5 +13,6 @@ end
13
13
  require 'tf-idf-similarity/matrix_methods'
14
14
  require 'tf-idf-similarity/term_count_model'
15
15
  require 'tf-idf-similarity/tf_idf_model'
16
+ require 'tf-idf-similarity/bm25_model'
16
17
  require 'tf-idf-similarity/document'
17
18
  require 'tf-idf-similarity/token'
@@ -0,0 +1,69 @@
1
+ # A document-term matrix using the BM25 function.
2
+ #
3
+ # @see http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/BM25Similarity.html
4
+ # @see http://en.wikipedia.org/wiki/Okapi_BM25
5
+ class TfIdfSimilarity::BM25Model
6
+ include TfIdfSimilarity::MatrixMethods
7
+
8
+ extend Forwardable
9
+ def_delegators :@model, :documents, :terms, :document_count
10
+
11
+ # @param [Array<TfIdfSimilarity::Document>] documents documents
12
+ # @param [Hash] opts optional arguments
13
+ # @option opts [Symbol] :library :gsl, :narray, :nmatrix or :matrix (default)
14
+ def initialize(documents, opts = {})
15
+ @model = TfIdfSimilarity::TermCountModel.new(documents, opts)
16
+ @library = (opts[:library] || :matrix).to_sym
17
+
18
+ array = Array.new(terms.size) do |i|
19
+ idf = inverse_document_frequency(terms[i])
20
+ Array.new(documents.size) do |j|
21
+ term_frequency(documents[j], terms[i]) * idf
22
+ end
23
+ end
24
+
25
+ @matrix = initialize_matrix(array)
26
+ end
27
+
28
+ # Return the term's inverse document frequency.
29
+ #
30
+ # @param [String] term a term
31
+ # @return [Float] the term's inverse document frequency
32
+ def inverse_document_frequency(term)
33
+ df = @model.document_count(term)
34
+ log((documents.size - df + 0.5) / (df + 0.5))
35
+ end
36
+ alias_method :idf, :inverse_document_frequency
37
+
38
+ # Returns the term's frequency in the document.
39
+ #
40
+ # @param [Document] document a document
41
+ # @param [String] term a term
42
+ # @return [Float] the term's frequency in the document
43
+ #
44
+ # @note Like Lucene, we use a b value of 0.75 and a k1 value of 1.2.
45
+ def term_frequency(document, term)
46
+ tf = document.term_count(term)
47
+ (tf * 2.2) / (tf + 0.3 + 0.9 * documents.size / @model.average_document_size)
48
+ end
49
+ alias_method :tf, :term_frequency
50
+
51
+ # Return the term frequency–inverse document frequency.
52
+ #
53
+ # @param [Document] document a document
54
+ # @param [String] term a term
55
+ # @return [Float] the term frequency–inverse document frequency
56
+ def term_frequency_inverse_document_frequency(document, term)
57
+ inverse_document_frequency(term) * term_frequency(document, term)
58
+ end
59
+ alias_method :tfidf, :term_frequency_inverse_document_frequency
60
+
61
+ # Returns a similarity matrix for the documents in the corpus.
62
+ #
63
+ # @return [GSL::Matrix,NMatrix,Matrix] a similarity matrix
64
+ # @note Columns are normalized to unit vectors, so we can calculate the cosine
65
+ # similarity of all document vectors.
66
+ def similarity_matrix
67
+ multiply_self(normalize)
68
+ end
69
+ end
@@ -17,7 +17,7 @@ private
17
17
  norm[norm.where2[1]] = 1.0 # avoid division by zero
18
18
  NMatrix.refer(@matrix / norm) # must be NMatrix for matrix multiplication
19
19
  when :nmatrix # @see https://github.com/SciRuby/nmatrix/issues/38
20
- normal = NMatrix.new(:dense, @matrix.shape, :float64)
20
+ normal = NMatrix.new(:dense, @matrix.shape, 0, :float64)
21
21
  (0...@matrix.shape[1]).each do |j|
22
22
  column = @matrix.column(j)
23
23
  norm = Math.sqrt(column.transpose.dot(column)[0, 0])
@@ -100,7 +100,7 @@ private
100
100
  def values
101
101
  case @library
102
102
  when :nmatrix
103
- @matrix.each.to_a
103
+ @matrix.each.to_a # faster than NMatrix's `to_a` and `to_flat_a`
104
104
  else
105
105
  @matrix.to_a.flatten
106
106
  end
@@ -124,8 +124,8 @@ private
124
124
  GSL::Matrix[*array]
125
125
  when :narray
126
126
  NArray[*array]
127
- when :nmatrix # @see https://github.com/SciRuby/nmatrix/issues/91
128
- NMatrix.new(:dense, [array.size, array.empty? ? 0 : array[0].size], array.flatten)
127
+ when :nmatrix # @see https://github.com/SciRuby/nmatrix/issues/91#issuecomment-18870619
128
+ NMatrix.new(:dense, [array.size, array.empty? ? 0 : array[0].size], array.flatten, :float64)
129
129
  else
130
130
  Matrix[*array]
131
131
  end
@@ -1,8 +1,4 @@
1
1
  # A simple document-term matrix.
2
- #
3
- # @see http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
4
- # @see http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/BM25Similarity.html
5
- # @see http://en.wikipedia.org/wiki/Okapi_BM25
6
2
  class TfIdfSimilarity::TermCountModel
7
3
  include TfIdfSimilarity::MatrixMethods
8
4
 
@@ -63,7 +59,7 @@ class TfIdfSimilarity::TermCountModel
63
59
  when :gsl, :narray
64
60
  row(index).sum
65
61
  when :nmatrix
66
- row(index).each.reduce(0, :+)
62
+ row(index).each.reduce(0, :+) # NMatrix's `sum` method is slower
67
63
  else
68
64
  vector = row(index)
69
65
  unless vector.respond_to?(:reduce)
@@ -1,8 +1,6 @@
1
- # A document-term matrix using either the tf*idf or BM25 functions.
1
+ # A document-term matrix using the tf*idf function.
2
2
  #
3
3
  # @see http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
4
- # @see http://lucene.apache.org/core/4_0_0-BETA/core/org/apache/lucene/search/similarities/BM25Similarity.html
5
- # @see http://en.wikipedia.org/wiki/Okapi_BM25
6
4
  class TfIdfSimilarity::TfIdfModel
7
5
  include TfIdfSimilarity::MatrixMethods
8
6
 
@@ -12,11 +10,9 @@ class TfIdfSimilarity::TfIdfModel
12
10
  # @param [Array<TfIdfSimilarity::Document>] documents documents
13
11
  # @param [Hash] opts optional arguments
14
12
  # @option opts [Symbol] :library :gsl, :narray, :nmatrix or :matrix (default)
15
- # @option opts [Symbol] :function :tfidf (default) or :bm25
16
13
  def initialize(documents, opts = {})
17
14
  @model = TfIdfSimilarity::TermCountModel.new(documents, opts)
18
15
  @library = (opts[:library] || :matrix).to_sym
19
- @function = (opts[:function] || :tfidf).to_sym
20
16
 
21
17
  array = Array.new(terms.size) do |i|
22
18
  idf = inverse_document_frequency(terms[i])
@@ -34,11 +30,7 @@ class TfIdfSimilarity::TfIdfModel
34
30
  # @return [Float] the term's inverse document frequency
35
31
  def inverse_document_frequency(term)
36
32
  df = @model.document_count(term)
37
- if @function == :bm25
38
- log((documents.size - df + 0.5) / (df + 0.5))
39
- else
40
- 1 + log(documents.size / (df + 1.0))
41
- end
33
+ 1 + log(documents.size / (df + 1.0))
42
34
  end
43
35
  alias_method :idf, :inverse_document_frequency
44
36
 
@@ -47,15 +39,9 @@ class TfIdfSimilarity::TfIdfModel
47
39
  # @param [Document] document a document
48
40
  # @param [String] term a term
49
41
  # @return [Float] the term's frequency in the document
50
- #
51
- # @note Like Lucene, we use a b value of 0.75 and a k1 value of 1.2.
52
42
  def term_frequency(document, term)
53
43
  tf = document.term_count(term)
54
- if @function == :bm25
55
- (tf * 2.2) / (tf + 0.3 + 0.9 * documents.size / @model.average_document_size)
56
- else
57
- sqrt(tf)
58
- end
44
+ sqrt(tf)
59
45
  end
60
46
  alias_method :tf, :term_frequency
61
47
 
@@ -73,8 +59,7 @@ class TfIdfSimilarity::TfIdfModel
73
59
  #
74
60
  # @return [GSL::Matrix,NMatrix,Matrix] a similarity matrix
75
61
  # @note Columns are normalized to unit vectors, so we can calculate the cosine
76
- # similarity of all document vectors. BM25 doesn't normalize columns, but
77
- # BM25 wasn't written with this use case in mind.
62
+ # similarity of all document vectors.
78
63
  def similarity_matrix
79
64
  multiply_self(normalize)
80
65
  end
@@ -16,29 +16,16 @@ class TfIdfSimilarity::Token < String
16
16
  #
17
17
  # @return [Boolean] whether the string is a token
18
18
  def valid?
19
- if RUBY_VERSION < '1.9'
20
- !self[%r{
21
- \A
22
- (
23
- \d | # number
24
- [[:cntrl:]] | # control character
25
- [[:punct:]] | # punctuation
26
- [[:space:]] # whitespace
27
- )+
28
- \z
29
- }x]
30
- else
31
- !self[%r{
32
- \A
33
- (
34
- \d | # number
35
- \p{Cntrl} | # control character
36
- \p{Punct} | # punctuation
37
- \p{Space} # whitespace
38
- )+
39
- \z
40
- }x] # The Ruby 1.8 parser will complain about this regular expression.
41
- end
19
+ !self[%r{
20
+ \A
21
+ (
22
+ \d | # number
23
+ [[:cntrl:]] | # control character
24
+ [[:punct:]] | # punctuation
25
+ [[:space:]] # whitespace
26
+ )+
27
+ \z
28
+ }x]
42
29
  end
43
30
 
44
31
  # Returns a lowercase string.
@@ -1,3 +1,3 @@
1
1
  module TfIdfSimilarity
2
- VERSION = "0.1.0"
2
+ VERSION = "0.1.1"
3
3
  end
@@ -9,6 +9,7 @@ Gem::Specification.new do |s|
9
9
  s.email = ["info@opennorth.ca"]
10
10
  s.homepage = "http://github.com/opennorth/tf-idf-similarity"
11
11
  s.summary = %q{Calculates the similarity between texts using tf*idf}
12
+ s.license = 'MIT'
12
13
 
13
14
  s.files = `git ls-files`.split("\n")
14
15
  s.test_files = `git ls-files -- {test,spec,features}/*`.split("\n")
@@ -19,4 +20,5 @@ Gem::Specification.new do |s|
19
20
  s.add_development_dependency('rspec', '~> 2.10')
20
21
  s.add_development_dependency('rake')
21
22
  s.add_development_dependency('coveralls')
23
+ s.add_development_dependency('mime-types', '~> 1.25') # 2.0 requires Ruby 1.9.2
22
24
  end
metadata CHANGED
@@ -1,85 +1,102 @@
1
- --- !ruby/object:Gem::Specification
1
+ --- !ruby/object:Gem::Specification
2
2
  name: tf-idf-similarity
3
- version: !ruby/object:Gem::Version
4
- hash: 27
5
- prerelease:
6
- segments:
7
- - 0
8
- - 1
9
- - 0
10
- version: 0.1.0
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.1
11
5
  platform: ruby
12
- authors:
6
+ authors:
13
7
  - Open North
14
8
  autorequire:
15
9
  bindir: bin
16
10
  cert_chain: []
17
-
18
- date: 2013-06-03 00:00:00 -04:00
19
- default_executable:
20
- dependencies:
21
- - !ruby/object:Gem::Dependency
22
- name: rspec
11
+ date: 2014-03-28 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: unicode_utils
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
23
21
  prerelease: false
24
- requirement: &id001 !ruby/object:Gem::Requirement
25
- none: false
26
- requirements:
27
- - - ~>
28
- - !ruby/object:Gem::Version
29
- hash: 23
30
- segments:
31
- - 2
32
- - 10
33
- version: "2.10"
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rspec
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '2.10'
34
34
  type: :development
35
- version_requirements: *id001
36
- - !ruby/object:Gem::Dependency
37
- name: rake
38
35
  prerelease: false
39
- requirement: &id002 !ruby/object:Gem::Requirement
40
- none: false
41
- requirements:
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '2.10'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
42
45
  - - ">="
43
- - !ruby/object:Gem::Version
44
- hash: 3
45
- segments:
46
- - 0
47
- version: "0"
46
+ - !ruby/object:Gem::Version
47
+ version: '0'
48
48
  type: :development
49
- version_requirements: *id002
50
- - !ruby/object:Gem::Dependency
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ - !ruby/object:Gem::Dependency
51
56
  name: coveralls
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
52
63
  prerelease: false
53
- requirement: &id003 !ruby/object:Gem::Requirement
54
- none: false
55
- requirements:
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
56
66
  - - ">="
57
- - !ruby/object:Gem::Version
58
- hash: 3
59
- segments:
60
- - 0
61
- version: "0"
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: mime-types
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '1.25'
62
76
  type: :development
63
- version_requirements: *id003
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '1.25'
64
83
  description:
65
- email:
84
+ email:
66
85
  - info@opennorth.ca
67
86
  executables: []
68
-
69
87
  extensions: []
70
-
71
88
  extra_rdoc_files: []
72
-
73
- files:
74
- - .gitignore
75
- - .travis.yml
76
- - .yardopts
89
+ files:
90
+ - ".gitignore"
91
+ - ".travis.yml"
92
+ - ".yardopts"
77
93
  - Gemfile
78
94
  - LICENSE
79
95
  - README.md
80
96
  - Rakefile
81
97
  - USAGE
82
98
  - lib/tf-idf-similarity.rb
99
+ - lib/tf-idf-similarity/bm25_model.rb
83
100
  - lib/tf-idf-similarity/document.rb
84
101
  - lib/tf-idf-similarity/extras/document.rb
85
102
  - lib/tf-idf-similarity/extras/tf_idf_model.rb
@@ -95,41 +112,31 @@ files:
95
112
  - spec/tf_idf_model_spec.rb
96
113
  - spec/token_spec.rb
97
114
  - td-idf-similarity.gemspec
98
- has_rdoc: true
99
115
  homepage: http://github.com/opennorth/tf-idf-similarity
100
- licenses: []
101
-
116
+ licenses:
117
+ - MIT
118
+ metadata: {}
102
119
  post_install_message:
103
120
  rdoc_options: []
104
-
105
- require_paths:
121
+ require_paths:
106
122
  - lib
107
- required_ruby_version: !ruby/object:Gem::Requirement
108
- none: false
109
- requirements:
123
+ required_ruby_version: !ruby/object:Gem::Requirement
124
+ requirements:
110
125
  - - ">="
111
- - !ruby/object:Gem::Version
112
- hash: 3
113
- segments:
114
- - 0
115
- version: "0"
116
- required_rubygems_version: !ruby/object:Gem::Requirement
117
- none: false
118
- requirements:
126
+ - !ruby/object:Gem::Version
127
+ version: '0'
128
+ required_rubygems_version: !ruby/object:Gem::Requirement
129
+ requirements:
119
130
  - - ">="
120
- - !ruby/object:Gem::Version
121
- hash: 3
122
- segments:
123
- - 0
124
- version: "0"
131
+ - !ruby/object:Gem::Version
132
+ version: '0'
125
133
  requirements: []
126
-
127
134
  rubyforge_project:
128
- rubygems_version: 1.6.2
135
+ rubygems_version: 2.2.2
129
136
  signing_key:
130
- specification_version: 3
137
+ specification_version: 4
131
138
  summary: Calculates the similarity between texts using tf*idf
132
- test_files:
139
+ test_files:
133
140
  - spec/document_spec.rb
134
141
  - spec/extras/tf_idf_model_spec.rb
135
142
  - spec/spec_helper.rb