crm114 1.0.1 → 1.0.2
- data/AUTHORS +1 -0
- data/README +48 -51
- data/Rakefile +0 -13
- data/UNLICENSE +24 -0
- data/VERSION +1 -1
- data/lib/crm114.rb +2 -3
- data/test/test_code_or_text.rb +1 -1
- metadata +7 -6
- data/LICENSE +0 -19
data/AUTHORS
ADDED
@@ -0,0 +1 @@
+* Arto Bendiken <arto.bendiken@gmail.com> (Lead developer)
data/README
CHANGED
@@ -1,90 +1,87 @@
-
+CRM114.rb: CRM114 Controllable Regex Mutilator for Ruby
+=======================================================
 
 This is a Ruby interface to the CRM114 Controllable Regex Mutilator, an
 advanced and fast text classifier that uses sparse binary polynomial
 matching with a Bayesian Chain Rule evaluator and a hidden Markov model to
 categorize data with up to a 99.87% accuracy.
 
-* http://crm114.rubyforge.org
-* http://github.com/bendiken/crm114
-* http://ar.to/2006/07/spam-filters-alien-technology-and-ruby-on-rails
+* <http://crm114.rubyforge.org/>
+* <http://github.com/bendiken/crm114>
+* <http://ar.to/2006/07/spam-filters-alien-technology-and-ruby-on-rails>
 
+### About CRM114
 
-
+* <http://crm114.sourceforge.net/>
+* <http://en.wikipedia.org/wiki/CRM114>
+* <http://en.wikipedia.org/wiki/Dr_Strangelove>
+* <http://www.paulgraham.com/wsy.html>
 
-
-
-* http://en.wikipedia.org/wiki/Dr_Strangelove
-* http://www.paulgraham.com/wsy.html
-
-
-== Usage
+Usage
+-----
 
 The CRM114 library interface is very similar to that of the
-Classifier
+[Classifier](http://rubyforge.org/projects/classifier) project.
 
 Here follows a brief example:
 
-
+require 'crm114'
 
-
+crm = Classifier::CRM114.new([:interesting, :boring])
 
-
-
+crm.train! :interesting, 'Some data set with a decent signal to noise ratio.'
+crm.train! :boring, 'Pig latin, as in lorem ipsum dolor sit amet.'
 
-
-
-
+crm.classify 'Lorem ipsum' => [:boring, 0.99]
+crm.interesting? 'Lorem ipsum' => false
+crm.boring? 'Lorem ipsum' => true
 
 Have a look at the included unit tests for more comprehensive examples.
 
+Dependencies
+------------
 
-
-
-Requires the CRM114 binaries to be installed. Specifically, the '+crm+'
-binary should be accessible in the current user's +PATH+ environment
-variable.
+Requires the CRM114 binaries to be installed. Specifically, the `crm` binary
+should be accessible in the current user's `PATH` environment variable.
 
-
-
+Download
+--------
 
 To get a local working copy of the development repository, do:
 
-
+% git clone git://github.com/bendiken/crm114.git
 
 Alternatively, you can download the latest development version as a tarball
 as follows:
 
-
-
+% wget http://github.com/bendiken/crm114/tarball/master
 
-
+Installation
+------------
 
 The recommended installation method is via RubyGems. To install the latest
-official release from
-
-% [sudo] gem install crm114
-
-To use the very latest bleeding-edge development version, install the gem
-directly from GitHub as follows:
-
-% [sudo] gem install bendiken-crm114 -s http://gems.github.com
-
-
-== Resources
+official release from Gemcutter, do:
 
-
-* http://www.elegantchaos.com/node/129 (crm.py)
-* http://rubyforge.org/projects/classifier
-* http://rubyforge.org/projects/bishop
+% [sudo] gem install crm114
 
+Resources
+---------
 
-
+* <http://gemcutter.org/gems/crm114>
+* <http://rubyforge.org/projects/crm114>
+* <http://raa.ruby-lang.org/project/crm114/>
+* <http://www.ohloh.net/p/crm114-ruby>
+* <http://www.elegantchaos.com/node/129> (crm.py)
+* <http://rubyforge.org/projects/classifier>
+* <http://rubyforge.org/projects/bishop>
 
-
+Author
+------
 
+* [Arto Bendiken](mailto:arto.bendiken@gmail.com) - <http://ar.to/>
 
-
+License
+-------
 
-
-information, see the accompanying
+CRM114.rb is free and unencumbered public domain software. For more
+information, see <http://unlicense.org/> or the accompanying UNLICENSE file.
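The Dependencies section of the new README requires the `crm` binary to be reachable through the `PATH` environment variable. A minimal Ruby sketch of such a check is shown here; the helper name `executable_in_path?` is hypothetical and is not part of the crm114 gem, which may detect a missing binary differently or not at all.

```ruby
# Hypothetical helper (not part of crm114.rb): reports whether an
# executable called `name` exists in one of the directories listed in PATH.
def executable_in_path?(name, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).any? do |dir|
    next false if dir.empty?
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Warn early instead of failing later when the classifier shells out.
unless executable_in_path?('crm')
  warn 'crm binary not found in PATH; install CRM114 before using the gem'
end
```

The second parameter defaults to the current `PATH` but can be overridden, which keeps the check easy to exercise against a temporary directory.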
data/Rakefile
CHANGED
@@ -3,16 +3,3 @@ $:.unshift(File.expand_path(File.join(File.dirname(__FILE__), 'lib')))
 require 'rubygems'
 require 'rakefile' # http://github.com/bendiken/rakefile
 require 'crm114'
-
-desc "Generate YARD documentation (with title)"
-task :yardocs => :yardoc do
-  # FIXME: fork YARD and patch it to allow the title to be configured
-  sh "sed -i 's/YARD Documentation/CRM114.rb Documentation/' doc/yard/index.html"
-
-  # TODO: investigate why YARD doesn't auto-link URLs like RDoc does
-  html = File.read(file = 'doc/yard/readme.html')
-  html.gsub!(/>(http:\/\/)([\w\d\.\/\-]+)/, '><a href="\1\2" target="_blank">\2</a>')
-  html.gsub!(/(http:\/\/ar\.to)([^\/]+)/, '<a href="\1" target="_top">ar.to</a>\2')
-  html.gsub!(/(mailto:[^\)]+)/, '<a href="\1">\1</a>')
-  File.open(file, 'wb') { |f| f.puts html }
-end
data/UNLICENSE
ADDED
@@ -0,0 +1,24 @@
+This is free and unencumbered software released into the public domain.
+
+Anyone is free to copy, modify, publish, use, compile, sell, or
+distribute this software, either in source code form or as a compiled
+binary, for any purpose, commercial or non-commercial, and by any
+means.
+
+In jurisdictions that recognize copyright laws, the author or authors
+of this software dedicate any and all copyright interest in the
+software to the public domain. We make this dedication for the benefit
+of the public at large and to the detriment of our heirs and
+successors. We intend this dedication to be an overt act of
+relinquishment in perpetuity of all present and future rights to this
+software under copyright law.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.
+
+For more information, please refer to <http://unlicense.org/>
data/VERSION
CHANGED
@@ -1 +1 @@
-1.0.
+1.0.2
data/lib/crm114.rb
CHANGED
@@ -1,12 +1,11 @@
 # Author:: Arto Bendiken (mailto:arto.bendiken@gmail.com)
-#
-# License:: MIT
+# License:: Public domain
 
 module Classifier
 
   class CRM114
 
-    VERSION = '1.0.
+    VERSION = '1.0.2'
 
     CLASSIFICATION_TYPE = '<osb unique microgroom>'
     FILE_EXTENSION = '.css'
data/test/test_code_or_text.rb
CHANGED
@@ -10,7 +10,7 @@ class TestCodeOrText < Test::Unit::TestCase
     @crm = Classifier::CRM114.new([:code, :text], :path => @path)
     assert_nothing_raised do
       Dir["#{@path}/../lib/*.rb"].each { |file| @crm.code! File.read(file) }
-      ['
+      ['README', 'UNLICENSE'].each { |file| @crm.text! File.read(file) }
     end
   end
 
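The test above calls `code!` and `text!`, and the README example uses `interesting?` and `boring?`: one method pair per category passed to the constructor. The sketch below shows one way such category-named methods could be routed through `method_missing`. `ToyClassifier` and its word-overlap scoring are hypothetical stand-ins chosen so the example runs without the `crm` binary; they are not the gem's implementation and say nothing about how CRM114 itself scores text.

```ruby
# Hypothetical stand-in for illustration only; not Classifier::CRM114.
class ToyClassifier
  def initialize(categories)
    @categories = categories
    # One word-count table per category, created on first access.
    @counts = Hash.new { |hash, key| hash[key] = Hash.new(0) }
  end

  # Record every word of `text` under `category`.
  def train!(category, text)
    words(text).each { |word| @counts[category][word] += 1 }
  end

  # Return the category whose training data shares the most words with `text`.
  def classify(text)
    @categories.max_by do |category|
      words(text).sum { |word| @counts[category][word] }
    end
  end

  # Route `code!`-style calls to train! and `code?`-style calls to classify.
  def method_missing(name, *args)
    category = name.to_s.chop.to_sym
    return super unless @categories.include?(category)
    case name.to_s[-1]
    when '!' then train!(category, *args)
    when '?' then classify(*args) == category
    else super
    end
  end

  def respond_to_missing?(name, include_private = false)
    (name.to_s.end_with?('!', '?') && @categories.include?(name.to_s.chop.to_sym)) || super
  end

  private

  # Split text into lowercase word tokens.
  def words(text)
    text.downcase.scan(/\w+/)
  end
end

toy = ToyClassifier.new([:code, :text])
toy.code! 'def classify(text) end'
toy.text! 'This is free and unencumbered software.'
puts toy.code?('def train(text) end') # prints "true"
puts toy.text?('free software')       # prints "true"
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?(:code!)` truthful for the generated names, while unknown categories still raise `NoMethodError`.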
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: crm114
 version: !ruby/object:Gem::Version
-  version: 1.0.
+  version: 1.0.2
 platform: ruby
 authors:
 - Arto Bendiken
@@ -9,11 +9,11 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-
+date: 2009-12-20 00:00:00 +01:00
 default_executable:
 dependencies:
 - !ruby/object:Gem::Dependency
-  name: rakefile
+  name: bendiken-rakefile
   type: :development
   version_requirement:
   version_requirements: !ruby/object:Gem::Requirement
@@ -31,7 +31,8 @@ extensions: []
 extra_rdoc_files: []
 
 files:
--
+- UNLICENSE
+- AUTHORS
 - README
 - Rakefile
 - VERSION
@@ -41,7 +42,7 @@ files:
 has_rdoc: false
 homepage: http://crm114.rubyforge.org/
 licenses:
--
+- Public Domain
 post_install_message:
 rdoc_options: []
 
@@ -62,7 +63,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements:
 - CRM114
 rubyforge_project: crm114
-rubygems_version: 1.3.
+rubygems_version: 1.3.5
 signing_key:
 specification_version: 3
 summary: Ruby interface to the CRM114 Controllable Regex Mutilator text classification engine.
data/LICENSE
DELETED
@@ -1,19 +0,0 @@
-Copyright (c) 2005-2009 Arto Bendiken <http://ar.to/>
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to
-deal in the Software without restriction, including without limitation the
-rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
-sell copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
-FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
-IN THE SOFTWARE.