crm114 1.0.1 → 1.0.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/AUTHORS +1 -0
- data/README +48 -51
- data/Rakefile +0 -13
- data/UNLICENSE +24 -0
- data/VERSION +1 -1
- data/lib/crm114.rb +2 -3
- data/test/test_code_or_text.rb +1 -1
- metadata +7 -6
- data/LICENSE +0 -19
data/AUTHORS
ADDED
@@ -0,0 +1 @@
+* Arto Bendiken <arto.bendiken@gmail.com> (Lead developer)
data/README
CHANGED
@@ -1,90 +1,87 @@
-
+CRM114.rb: CRM114 Controllable Regex Mutilator for Ruby
+=======================================================
 
 This is a Ruby interface to the CRM114 Controllable Regex Mutilator, an
 advanced and fast text classifier that uses sparse binary polynomial
 matching with a Bayesian Chain Rule evaluator and a hidden Markov model to
 categorize data with up to a 99.87% accuracy.
 
-* http://crm114.rubyforge.org
-* http://github.com/bendiken/crm114
-* http://ar.to/2006/07/spam-filters-alien-technology-and-ruby-on-rails
+* <http://crm114.rubyforge.org/>
+* <http://github.com/bendiken/crm114>
+* <http://ar.to/2006/07/spam-filters-alien-technology-and-ruby-on-rails>
 
+### About CRM114
 
-
+* <http://crm114.sourceforge.net/>
+* <http://en.wikipedia.org/wiki/CRM114>
+* <http://en.wikipedia.org/wiki/Dr_Strangelove>
+* <http://www.paulgraham.com/wsy.html>
 
-
-
-* http://en.wikipedia.org/wiki/Dr_Strangelove
-* http://www.paulgraham.com/wsy.html
-
-
-== Usage
+Usage
+-----
 
 The CRM114 library interface is very similar to that of the
-Classifier
+[Classifier](http://rubyforge.org/projects/classifier) project.
 
 Here follows a brief example:
 
-
+require 'crm114'
 
-
+crm = Classifier::CRM114.new([:interesting, :boring])
 
-
-
+crm.train! :interesting, 'Some data set with a decent signal to noise ratio.'
+crm.train! :boring, 'Pig latin, as in lorem ipsum dolor sit amet.'
 
-
-
-
+crm.classify 'Lorem ipsum' => [:boring, 0.99]
+crm.interesting? 'Lorem ipsum' => false
+crm.boring? 'Lorem ipsum' => true
 
 Have a look at the included unit tests for more comprehensive examples.
 
+Dependencies
+------------
 
-
-
-Requires the CRM114 binaries to be installed. Specifically, the '+crm+'
-binary should be accessible in the current user's +PATH+ environment
-variable.
+Requires the CRM114 binaries to be installed. Specifically, the `crm` binary
+should be accessible in the current user's `PATH` environment variable.
 
-
-
+Download
+--------
 
 To get a local working copy of the development repository, do:
 
-
+% git clone git://github.com/bendiken/crm114.git
 
 Alternatively, you can download the latest development version as a tarball
 as follows:
 
-
-
+% wget http://github.com/bendiken/crm114/tarball/master
 
-
+Installation
+------------
 
 The recommended installation method is via RubyGems. To install the latest
-official release from
-
-% [sudo] gem install crm114
-
-To use the very latest bleeding-edge development version, install the gem
-directly from GitHub as follows:
-
-% [sudo] gem install bendiken-crm114 -s http://gems.github.com
-
-
-== Resources
+official release from Gemcutter, do:
 
-
-* http://www.elegantchaos.com/node/129 (crm.py)
-* http://rubyforge.org/projects/classifier
-* http://rubyforge.org/projects/bishop
+% [sudo] gem install crm114
 
+Resources
+---------
 
-
+* <http://gemcutter.org/gems/crm114>
+* <http://rubyforge.org/projects/crm114>
+* <http://raa.ruby-lang.org/project/crm114/>
+* <http://www.ohloh.net/p/crm114-ruby>
+* <http://www.elegantchaos.com/node/129> (crm.py)
+* <http://rubyforge.org/projects/classifier>
+* <http://rubyforge.org/projects/bishop>
 
-
+Author
+------
 
+* [Arto Bendiken](mailto:arto.bendiken@gmail.com) - <http://ar.to/>
 
-
+License
+-------
 
-
-information, see the accompanying
+CRM114.rb is free and unencumbered public domain software. For more
+information, see <http://unlicense.org/> or the accompanying UNLICENSE file.
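Reviewer note on the README's Dependencies section: it requires the `crm` binary to be reachable through the current user's `PATH`. The sketch below shows one way a caller might check that before using the library. The helper name `find_executable` is hypothetical and is not part of crm114.rb; it only walks the directories listed in `PATH` using Ruby's standard `File` methods.

```ruby
# Hypothetical helper (an assumption for illustration, not part of crm114.rb):
# returns the full path of the first executable file called `name` found in
# the given PATH string, or nil when no directory contains one.
def find_executable(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

crm_path = find_executable('crm')
if crm_path
  puts "crm found at #{crm_path}"
else
  warn 'crm not found on PATH; install the CRM114 binaries first'
end
```

Running the check up front gives a clearer message than whatever error would surface later if the binary is missing.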
data/Rakefile
CHANGED
@@ -3,16 +3,3 @@ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), 'lib')))
 require 'rubygems'
 require 'rakefile' # http://github.com/bendiken/rakefile
 require 'crm114'
-
-desc "Generate YARD documentation (with title)"
-task :yardocs => :yardoc do
-  # FIXME: fork YARD and patch it to allow the title to be configured
-  sh "sed -i 's/YARD Documentation/CRM114.rb Documentation/' doc/yard/index.html"
-
-  # TODO: investigate why YARD doesn't auto-link URLs like RDoc does
-  html = File.read(file = 'doc/yard/readme.html')
-  html.gsub!(/>(http:\/\/)([\w\d\.\/\-]+)/, '><a href="\1\2" target="_blank">\2</a>')
-  html.gsub!(/(http:\/\/ar\.to)([^\/]+)/, '<a href="\1" target="_top">ar.to</a>\2')
-  html.gsub!(/(mailto:[^\)]+)/, '<a href="\1">\1</a>')
-  File.open(file, 'wb') { |f| f.puts html }
-end
data/UNLICENSE
ADDED
@@ -0,0 +1,24 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to <http://unlicense.org/>
data/VERSION
CHANGED
@@ -1 +1 @@
-1.0.
+1.0.2
data/lib/crm114.rb
CHANGED
@@ -1,12 +1,11 @@
 # Author:: Arto Bendiken (mailto:arto.bendiken@gmail.com)
-#
-# License:: MIT
+# License:: Public domain
 
 module Classifier
 
   class CRM114
 
-    VERSION = '1.0.
+    VERSION = '1.0.2'
 
     CLASSIFICATION_TYPE = '<osb unique microgroom>'
     FILE_EXTENSION = '.css'
data/test/test_code_or_text.rb
CHANGED
@@ -10,7 +10,7 @@ class TestCodeOrText < Test::Unit::TestCase
     @crm = Classifier::CRM114.new([:code, :text], :path => @path)
     assert_nothing_raised do
       Dir["#{@path}/../lib/*.rb"].each { |file| @crm.code! File.read(file) }
-      ['
+      ['README', 'UNLICENSE'].each { |file| @crm.text! File.read(file) }
     end
   end
 
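Reviewer note: the test above calls `@crm.code!` and `@crm.text!`, and the README uses `crm.interesting?` and `crm.boring?`, so the classifier evidently exposes one training method and one predicate per category. This diff does not show how crm114.rb provides them. The sketch below is only one plausible approach, using `method_missing`. `ToyClassifier` and its word-overlap scoring are assumptions made for illustration; the class does not invoke the `crm` binary.

```ruby
# Toy stand-in for illustration only: it does NOT call the crm binary and is
# not the crm114.rb implementation.
class ToyClassifier
  def initialize(categories)
    @categories = categories.map(&:to_sym)
    # category => { word => count }
    @words = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # Record each word of +text+ under +category+.
  def train!(category, text)
    tokenize(text).each { |word| @words[category.to_sym][word] += 1 }
    self
  end

  # Return the category whose training data shares the most words with +text+.
  def classify(text)
    tokens = tokenize(text)
    @categories.max_by { |category| tokens.sum { |word| @words[category][word] } }
  end

  # Map `<category>!` to train! and `<category>?` to a classification check.
  def method_missing(name, *args)
    category, suffix = split_name(name)
    return super unless category
    suffix == '!' ? train!(category, *args) : classify(*args) == category
  end

  def respond_to_missing?(name, include_private = false)
    !split_name(name).first.nil? || super
  end

  private

  # Split :code! into [:code, '!']; return [nil, nil] for unknown categories.
  def split_name(name)
    match = name.to_s.match(/\A(\w+)([!?])\z/)
    return [nil, nil] unless match && @categories.include?(match[1].to_sym)
    [match[1].to_sym, match[2]]
  end

  def tokenize(text)
    text.downcase.scan(/\w+/)
  end
end

toy = ToyClassifier.new([:code, :text])
toy.code! 'def hello; puts 1; end'
toy.text! 'Anyone is free to copy, modify, publish, or use this software.'
puts toy.classify('free software') # prints "text"
puts toy.text?('free software')    # prints "true"
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` accurate for the generated names, and unknown names still raise `NoMethodError`.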
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: crm114
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - Arto Bendiken
@@ -9,11 +9,11 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-
+date: 2009-12-20 00:00:00 +01:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: rakefile
+  name: bendiken-rakefile
   type: :development
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
@@ -31,7 +31,8 @@ extensions: []
 extra_rdoc_files: []
 
 files:
--
+- UNLICENSE
+- AUTHORS
 - README
 - Rakefile
 - VERSION
@@ -41,7 +42,7 @@ files:
 has_rdoc: false
 homepage: http://crm114.rubyforge.org/
 licenses:
--
+- Public Domain
 post_install_message:
 rdoc_options: []
 
@@ -62,7 +63,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - CRM114
 rubyforge_project: crm114
-rubygems_version: 1.3.
+rubygems_version: 1.3.5
 signing_key:
 specification_version: 3
 summary: Ruby interface to the CRM114 Controllable Regex Mutilator text classification engine.
data/LICENSE
DELETED
@@ -1,19 +0,0 @@
-Copyright (c) 2005-2009 Arto Bendiken <http://ar.to/>
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to
-deal in the Software without restriction, including without limitation the
-rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
-sell copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
-IN THE SOFTWARE.