carrot2 0.2.0 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
- SHA1:
-   metadata.gz: 33734687ea74c1750a823b92eb21648910a7c11c
-   data.tar.gz: 3683ea29b97fa311a23919048d49b2c4c3784701
+ SHA256:
+   metadata.gz: e21b80531294f5840aceea2f020c79a6651a02287cd3246d8f7ca493588b1d7e
+   data.tar.gz: a64d7804ed34e9b3fc2ec30eddb54d94b119e9620b0a88afaa8e68ef01b3bf7e
  SHA512:
-   metadata.gz: 8c87a0029e7d304717778fc7fadeee896895cb412524e7fe0d435aec2f84308cbd9d6967481abd8493dff75303024bcdd467b33f7f25947931c8702f261d0f6c
-   data.tar.gz: 5028411dc8eb0bf3a5629824e9fba106846f2759a3c687a43fa5d19636b1cce0bc9b564daf5ab2b0e23ab2a6ca318061345584c0641db0933d7e5b834dc70e39
+   metadata.gz: 72b690c8088d295a3813e6d0a625e4d8972845f533af993c774cc179e1a1c800fe27e8b8c07c62e35f45705111f3053e74c40362ba36412f187aff31d55d50a0
+   data.tar.gz: b0e3e09432846648aebbbd2a599d0a51c1c862bde5074d06b7d29ba0e9e3c199b6e53e1c05eed040d9dbde8f1afdb37e66ad4b1ba3211a2725ea63c5880ac48a
data/CHANGELOG.md CHANGED
@@ -1,14 +1,27 @@
- ## 0.2.0
+ ## 0.4.0 (2022-08-28)
+
+ - Dropped support for Ruby < 2.7
+
+ ## 0.3.0 (2020-07-29)
+
+ - Added support for Carrot2 4
+ - Dropped support for Carrot2 < 4
+
+ ## 0.2.1 (2019-10-28)
+
+ - Added `open_timeout` and `read_timeout` options
+
+ ## 0.2.0 (2017-01-22)
 
  - Added `request` method
  - Removed dependency on `rest-client`
  - Better error messages
 
- ## 0.1.0
+ ## 0.1.0 (2017-01-17)
 
  - Added support for env var
  - Added tests
 
- ## 0.0.1
+ ## 0.0.1 (2012-06-20)
 
  - First version
data/{LICENSE → LICENSE.txt} RENAMED
@@ -1,4 +1,4 @@
- Copyright (c) 2012 Andrew Kane
+ Copyright (c) 2012-2021 Andrew Kane
 
  MIT License
 
@@ -19,4 +19,4 @@ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
  NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
  LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
  OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
- WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,12 +1,12 @@
- # Carrot2
+ # Carrot2 Ruby
 
- Ruby client for [Carrot2](http://project.carrot2.org/) - the open-source document clustering server
+ Ruby client for [Carrot2](https://github.com/carrot2/carrot2) - the open-source document clustering server
 
- ## Installation
+ [![Build Status](https://github.com/ankane/carrot2-ruby/workflows/build/badge.svg?branch=master)](https://github.com/ankane/carrot2-ruby/actions)
 
- First, [download and run](http://project.carrot2.org/download-dcs.html) the Carrot2 server. It’s the one on [this page](https://github.com/carrot2/carrot2/releases) that begins with `carrot2-dcs`.
+ ## Installation
 
- With Homebrew, use:
+ First, [download and run](https://github.com/carrot2/carrot2#installation) the Carrot2 server. With Homebrew, use:
 
  ```sh
  brew install carrot2
@@ -16,9 +16,11 @@ brew services start carrot2
  Then add this line to your application’s Gemfile:
 
  ```ruby
- gem 'carrot2'
+ gem "carrot2"
  ```
 
+ The latest version works with Carrot2 4. For Carrot2 3, use version 0.2.1 and [this readme](https://github.com/ankane/carrot2-ruby/blob/v0.2.1/README.md).
+
  ## How to Use
 
  To cluster documents, use:
@@ -31,7 +33,7 @@ documents = [
    "This is completely unrelated to the other documents."
  ]
 
- carrot2 = Carrot2.new
+ carrot2 = Carrot2::Client.new
  carrot2.cluster(documents)
  ```
 
@@ -39,79 +41,99 @@ This returns:
 
  ```ruby
  {
-   "processing-time-total"=>1,
-   "clusters"=> [
-     {
-       "id"=>0,
-       "size"=>3,
-       "phrases"=>["Coupon"],
-       "score"=>0.06462323710740674,
-       "documents"=>[0, 1, 2],
-       "attributes"=>{"score"=>0.06462323710740674}
-     },
+   "clusters" => [
      {
-       "id"=>1,
-       "size"=>2,
-       "phrases"=>["Exclusive"],
-       "score"=>0.05873148311034013,
-       "documents"=>[0, 1],
-       "attributes"=>{"score"=>0.05873148311034013}
+       "labels" => ["Coupon"],
+       "documents" => [0, 1, 2],
+       "clusters" => [],
+       "score" => 0.06418006702675011
      },
      {
-       "id"=>2,
-       "size"=>1,
-       "phrases"=>["Other Topics"],
-       "score"=>0.0,
-       "documents"=>[3],
-       "attributes"=>{"other-topics"=>true, "score"=>0.0}
+       "labels" => ["Exclusive"],
+       "documents" => [0, 1],
+       "clusters" => [],
+       "score" => 0.7040290701763807
      }
-   ],
-   "processing-time-algorithm"=>1,
-   "query"=>nil
+   ]
  }
  ```
 
  Documents are numbered in the order provided, starting with 0.
 
- For other requests, use:
+ Specify a language with:
 
  ```ruby
- carrot2.request(
-   "dcs.c2stream" => xml_str
- )
+ carrot2.cluster(documents, language: "French")
  ```
 
- ## Configuration
+ Specify an [algorithm](https://carrot2.github.io/release/4.0.0/doc/algorithms/) with:
 
- To specify the Carrot2 server, set `ENV["CARROT2_URL"]` or use:
+ ```ruby
+ carrot2.cluster(documents, algorithm: "Lingo")
+ ```
+
+ Get a list of supported languages and algorithms with:
 
  ```ruby
- Carrot2.new(url: "http://localhost:8080")
+ carrot2.list
  ```
 
- ## Heroku
+ Specify parameters with:
+
+ ```ruby
+ parameters = {
+   preprocessing: {
+     phraseDfThreshold: 1,
+     wordDfThreshold: 1
+   }
+ }
+ carrot2.cluster(documents, parameters: parameters)
+ ```
 
- Carrot2 can be easily deployed to Heroku thanks to support for [WAR deployment](https://devcenter.heroku.com/articles/war-deployment).
+ See supported parameters for [Lingo](https://carrot2.github.io/release/4.0.0/doc/lingo-attributes/), [STC](https://carrot2.github.io/release/4.0.0/doc/stc-attributes/), and [Bisecting K-Means](https://carrot2.github.io/release/4.0.0/doc/kmeans-attributes/).
 
- You can find the `.war` file in the `war` directory in the dcs download. Then run:
+ Specify a [template](https://carrot2.github.io/release/4.0.0/doc/dcs-templates/) with:
 
- ```sh
- heroku plugins:install heroku-cli-deploy
- heroku create <app_name>
- heroku war:deploy carrot2-dcs.war --app <app_name>
+ ```ruby
+ carrot2.cluster(documents, template: "lingo")
  ```
 
- And set `ENV["CARROT2_URL"]` in your application.
+ ## Configuration
+
+ To specify the Carrot2 server, set `ENV["CARROT2_URL"]` or use:
+
+ ```ruby
+ Carrot2::Client.new(url: "http://localhost:8080")
+ ```
+
+ Set timeouts
+
+ ```ruby
+ Carrot2::Client.new(open_timeout: 3, read_timeout: 5)
+ ```
+
+ ## Resources
+
+ - [Carrot2 REST API Basics](https://carrot2.github.io/release/4.0.0/doc/rest-api-basics/)
 
  ## History
 
- View the [changelog](https://github.com/ankane/carrot2/blob/master/CHANGELOG.md)
+ View the [changelog](https://github.com/ankane/carrot2-ruby/blob/master/CHANGELOG.md)
 
  ## Contributing
 
  Everyone is encouraged to help improve this project. Here are a few ways you can help:
 
- - [Report bugs](https://github.com/ankane/carrot2/issues)
- - Fix bugs and [submit pull requests](https://github.com/ankane/carrot2/pulls)
+ - [Report bugs](https://github.com/ankane/carrot2-ruby/issues)
+ - Fix bugs and [submit pull requests](https://github.com/ankane/carrot2-ruby/pulls)
  - Write, clarify, or fix documentation
  - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone https://github.com/ankane/carrot2-ruby.git
+ cd carrot2-ruby
+ bundle install
+ bundle exec rake test
+ ```
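
Since the new README response refers to documents only by position ("numbered in the order provided, starting with 0"), callers will likely want to map those indexes back to the original strings. The sketch below is illustrative only: `label_documents` is a hypothetical helper that is not part of the gem, `result` is a literal copy of the sample response from the README rather than a live server reply, and the four documents are taken from the gem's former test file.

```ruby
# Hypothetical helper (not part of the carrot2 gem): maps each cluster's
# document indexes back to the original strings, keyed by the first label.
def label_documents(result, documents)
  result["clusters"].to_h do |cluster|
    [cluster["labels"].first, cluster["documents"].map { |i| documents[i] }]
  end
end

documents = [
  "Sign up for an exclusive coupon.",
  "Exclusive members get a free coupon.",
  "Coupons are going fast.",
  "This is completely unrelated to the other documents."
]

# Literal copy of the sample response shown in the README (no server call)
result = {
  "clusters" => [
    {"labels" => ["Coupon"], "documents" => [0, 1, 2], "clusters" => [], "score" => 0.06418006702675011},
    {"labels" => ["Exclusive"], "documents" => [0, 1], "clusters" => [], "score" => 0.7040290701763807}
  ]
}

grouped = label_documents(result, documents)
grouped.each { |label, docs| puts "#{label}: #{docs.length} document(s)" }
```

In this sample the fourth document (index 3) appears in no cluster, so it is absent from `grouped`; the 0.2.0 response instead placed it in an "Other Topics" cluster.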
data/lib/carrot2/client.rb ADDED
@@ -0,0 +1,74 @@
+ module Carrot2
+   class Client
+     HEADERS = {
+       "Content-Type" => "application/json",
+       "Accept" => "application/json"
+     }
+
+     def initialize(url: nil, open_timeout: 3, read_timeout: nil)
+       url ||= ENV["CARROT2_URL"] || "http://localhost:8080"
+       @uri = URI.parse(url)
+       @http = Net::HTTP.new(@uri.host, @uri.port)
+       @http.use_ssl = true if @uri.scheme == "https"
+       @http.open_timeout = open_timeout if open_timeout
+       @http.read_timeout = read_timeout if read_timeout
+     end
+
+     def list
+       get("service/list")
+     end
+
+     def cluster(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
+       # no defaults if template
+       unless template
+         language ||= "English"
+         algorithm ||= "Lingo"
+         parameters ||= {}
+       end
+
+       # data
+       data = {
+         documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }
+       }
+       data[:language] = language if language
+       data[:algorithm] = algorithm if algorithm
+       data[:parameters] = parameters if parameters
+
+       # path
+       path = "service/cluster"
+       path = "#{path}?#{URI.encode_www_form(template: template)}" if template
+
+       post(path, data)
+     end
+
+     private
+
+     def get(path)
+       handle_response do
+         @http.get("#{@uri.request_uri.chomp("/")}/#{path}", HEADERS)
+       end
+     end
+
+     def post(path, data)
+       handle_response do
+         @http.post("#{@uri.request_uri.chomp("/")}/#{path}", data.to_json, HEADERS)
+       end
+     end
+
+     def handle_response
+       begin
+         response = yield
+       rescue Errno::ECONNREFUSED => e
+         raise Carrot2::Error, e.message
+       end
+
+       unless response.kind_of?(Net::HTTPSuccess)
+         body = JSON.parse(response.body) rescue {}
+         message = body["message"] || "Bad response: #{response.code}"
+         raise Carrot2::Error, message
+       end
+
+       JSON.parse(response.body)
+     end
+   end
+ end
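
To see what `cluster` sends without running a Carrot2 server, the payload logic can be restated as a standalone function. This is a minimal sketch, not the gem's API: `build_cluster_request` is a hypothetical name, and it returns the path and data instead of posting them.

```ruby
require "json"
require "uri"

# Standalone restatement of the payload logic in Carrot2::Client#cluster.
# `build_cluster_request` is a hypothetical name used only for this sketch.
def build_cluster_request(documents, language: nil, algorithm: nil, parameters: nil, template: nil)
  # defaults apply only when no template is given
  unless template
    language ||= "English"
    algorithm ||= "Lingo"
    parameters ||= {}
  end

  # plain strings are wrapped as {field: "..."}; hashes pass through unchanged
  data = {documents: documents.map { |v| v.is_a?(String) ? {field: v} : v }}
  data[:language] = language if language
  data[:algorithm] = algorithm if algorithm
  data[:parameters] = parameters if parameters

  # the template, if any, travels as a query parameter rather than in the body
  path = "service/cluster"
  path = "#{path}?#{URI.encode_www_form(template: template)}" if template

  [path, data]
end

path, data = build_cluster_request(["Coupons are going fast."])
puts path
puts JSON.generate(data)

template_path, template_data = build_cluster_request(["Coupons are going fast."], template: "lingo")
puts template_path
puts JSON.generate(template_data)
```

With a template, the body carries only the documents unless the caller passes `language`, `algorithm`, or `parameters` explicitly, which matches the "no defaults if template" comment in the diff.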
data/lib/carrot2/version.rb CHANGED
@@ -1,3 +1,3 @@
- class Carrot2
-   VERSION = "0.2.0"
+ module Carrot2
+   VERSION = "0.4.0"
  end
data/lib/carrot2.rb CHANGED
@@ -1,48 +1,15 @@
- require "carrot2/version"
- require "builder"
- require "net/http"
+ # dependencies
  require "json"
+ require "net/http"
 
- class Carrot2
-   class Error < StandardError; end
-
-   def initialize(url: nil)
-     @url = url || ENV["CARROT2_URL"] || "http://localhost:8080"
-
-     # add dcs/rest
-     @url = "#{@url.sub(/\/\z/, "")}/dcs/rest"
-     @uri = URI.parse(@url)
-   end
-
-   def cluster(documents, language: "ENGLISH")
-     xml = Builder::XmlMarkup.new
-     xml.instruct! :xml, version: "1.0", encoding: "UTF-8"
-     xml.searchresult do |s|
-       documents.each do |document|
-         s.document do |d|
-           d.title document
-         end
-       end
-     end
+ # modules
+ require "carrot2/client"
+ require "carrot2/version"
 
-     request(
-       "dcs.clusters.only" => true,
-       "dcs.c2stream" => xml.target!,
-       "MultilingualClustering.defaultLanguage" => language,
-       multipart: true
-     )
-   end
+ module Carrot2
+   class Error < StandardError; end
 
-   def request(params)
-     response = Net::HTTP.post_form(@uri, params.merge("dcs.output.format" => "JSON"))
-     if response.code == "200"
-       JSON.parse(response.body)
-     else
-       body = response.body.to_s
-       # try to get reason from title
-       m = body.match(/<title>(.+)<\/title>/)
-       message = m ? m[1] : body
-       raise Carrot2::Error, message
-     end
+   def self.new(**options)
+     Client.new(**options)
    end
  end
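
`Carrot2` changes here from a class to a module, and the added `self.new` appears intended to keep the older `Carrot2.new(...)` call working by forwarding to `Carrot2::Client`. The pattern can be tried in isolation as below; `Demo` and its `Client` are hypothetical stand-ins that only mirror the shape of the real code.

```ruby
# Stand-in module for illustration; `Demo` and `Demo::Client` are hypothetical
# and only mirror the shape of Carrot2 and Carrot2::Client.
module Demo
  class Client
    attr_reader :url

    def initialize(url: nil)
      @url = url || "http://localhost:8080"
    end
  end

  # Modules have no `new` of their own, so this singleton method is what
  # makes `Demo.new(...)` callable; it forwards keyword options unchanged.
  def self.new(**options)
    Client.new(**options)
  end
end

client = Demo.new(url: "http://example.test:8080")
puts client.class
puts client.url
```

Because options are forwarded with `**options`, an unknown keyword still raises `ArgumentError` from the client's initializer rather than being silently dropped.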
metadata CHANGED
@@ -1,93 +1,32 @@
  --- !ruby/object:Gem::Specification
  name: carrot2
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.4.0
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2017-01-23 00:00:00.000000000 Z
- dependencies:
- - !ruby/object:Gem::Dependency
-   name: builder
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :runtime
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: bundler
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: rake
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: minitest
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- description: Ruby client for Carrot2
- email:
- - andrew@chartkick.com
+ date: 2022-08-29 00:00:00.000000000 Z
+ dependencies: []
+ description:
+ email: andrew@ankane.org
  executables: []
  extensions: []
  extra_rdoc_files: []
  files:
- - ".gitignore"
  - CHANGELOG.md
- - Gemfile
- - LICENSE
+ - LICENSE.txt
  - README.md
- - Rakefile
- - carrot2.gemspec
  - lib/carrot2.rb
+ - lib/carrot2/client.rb
  - lib/carrot2/version.rb
- - test/carrot2_test.rb
- - test/test_helper.rb
- homepage: https://github.com/ankane/carrot2
- licenses: []
+ homepage: https://github.com/ankane/carrot2-ruby
+ licenses:
+ - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -95,18 +34,15 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: '0'
+       version: '2.7'
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.6.8
- signing_key:
+ rubygems_version: 3.3.7
+ signing_key:
  specification_version: 4
  summary: Ruby client for Carrot2
- test_files:
- - test/carrot2_test.rb
- - test/test_helper.rb
+ test_files: []
data/.gitignore DELETED
@@ -1,17 +0,0 @@
- *.gem
- *.rbc
- .bundle
- .config
- .yardoc
- Gemfile.lock
- InstalledFiles
- _yardoc
- coverage
- doc/
- lib/bundler/man
- pkg
- rdoc
- spec/reports
- test/tmp
- test/version_tmp
- tmp
data/Gemfile DELETED
@@ -1,4 +0,0 @@
- source "https://rubygems.org"
-
- # Specify your gem's dependencies in carrot2.gemspec
- gemspec
data/Rakefile DELETED
@@ -1,9 +0,0 @@
- require "bundler/gem_tasks"
- require "rake/testtask"
-
- task default: :test
- Rake::TestTask.new do |t|
-   t.libs << "test"
-   t.pattern = "test/**/*_test.rb"
-   t.warning = false
- end
data/carrot2.gemspec DELETED
@@ -1,23 +0,0 @@
- # -*- encoding: utf-8 -*-
- require File.expand_path("../lib/carrot2/version", __FILE__)
-
- Gem::Specification.new do |spec|
-   spec.authors = ["Andrew Kane"]
-   spec.email = ["andrew@chartkick.com"]
-   spec.description = "Ruby client for Carrot2"
-   spec.summary = "Ruby client for Carrot2"
-   spec.homepage = "https://github.com/ankane/carrot2"
-
-   spec.files = `git ls-files`.split($OUTPUT_RECORD_SEPARATOR)
-   spec.executables = spec.files.grep(%r{^exe/}).map { |f| File.basename(f) }
-   spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
-   spec.name = "carrot2"
-   spec.require_paths = ["lib"]
-   spec.version = Carrot2::VERSION
-
-   spec.add_dependency "builder"
-
-   spec.add_development_dependency "bundler"
-   spec.add_development_dependency "rake"
-   spec.add_development_dependency "minitest"
- end
data/test/carrot2_test.rb DELETED
@@ -1,23 +0,0 @@
- require_relative "test_helper"
-
- class Carrot2Test < Minitest::Test
-   def test_cluster
-     documents = [
-       "Sign up for an exclusive coupon.",
-       "Exclusive members get a free coupon.",
-       "Coupons are going fast.",
-       "This is completely unrelated to the other documents."
-     ]
-
-     assert_equal ["Coupon", "Exclusive", "Other Topics"], carrot2.cluster(documents)["clusters"].map { |c| c["phrases"].first }
-   end
-
-   def test_bad_request
-     error = assert_raises(Carrot2::Error) { carrot2.request({}) }
-     assert_includes error.message, "Error 400"
-   end
-
-   def carrot2
-     @carrot2 ||= Carrot2.new
-   end
- end
data/test/test_helper.rb DELETED
@@ -1,6 +0,0 @@
- require "bundler/setup"
- Bundler.require(:default)
- require "minitest/autorun"
- require "minitest/pride"
-
- Minitest::Test = Minitest::Unit::TestCase unless defined?(Minitest::Test)