scylla 0.2.0 → 0.3.0

data/README.rdoc CHANGED
@@ -2,27 +2,43 @@
 
  Scylla is a language categorizing gem that allows you to guess the language of a given text. Scylla is a Ruby port of TextCat (http://www.let.rug.nl/~vannoord/TextCat) and is based on the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, ``N-Gram-Based Text Categorization'' In Proceedings of Third Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, UNLV Publications/Reprographics, pp. 161-175, 11-13 April 1994.
 
- Installation:
+ == Installation
 
- gem install scylla
+ gem install scylla
 
- Usage:
+ == Usage
 
- require 'scylla'
+ require 'scylla'
 
- "this is english text".language
+ "this is english text".language
  => "english"
 
- "Este es un texto español".language
- => "spanish"
+ "Este es un texto español".language
+ => "spanish"
 
  Multiple results for other possible languages:
 
- "isso poderia ser confundido com espanhol, bem".language
- => "portuguese"
+ "isso poderia ser confundido com espanhol, bem".language
+ => "portuguese"
 
- "isso poderia ser confundido com espanhol, bem".guess
- => ["portuguese", "spanish"]
+ "isso poderia ser confundido com espanhol, bem".guess
+ => ["portuguese", "spanish"]
+
+ == Training
+
+ You can train scylla in new languages by providing sample texts in different languages. The default set is located in the 'source_texts' folder in the gem directory. Add new .txt files to this directory named according to the language i.e. a text file full of Hebrew text should be called 'hebrew.txt'. At least 500 lines of text recommended. Then, in the gem folder, run this:
+
+ rake scylla:train
+
+ If you want to store texts in your own folder, you can specify that to the rake task.
+ WARNING: specifying a different folder deletes all language support for files located in the default directory if they are not copied over.
+
+ rake scylla:train[/Users/hash/mytextdir]
+ "Creating language map for /Users/hash/mytextdir/english.txt"
+ "Creating language map for /Users/hash/mytextdir/kannada.txt"
+ .
+ .
+ etc
 
  == Contributing to scylla
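The README above credits the Cavnar and Trenkle n-gram algorithm, but Scylla's own implementation is not part of this diff. As a rough illustration only, the sketch below builds ranked n-gram profiles and compares them with the "out-of-place" rank distance described in that paper. The method names `ngram_profile` and `out_of_place`, the profile size limit, and the sample texts are all hypothetical and are not Scylla's API.

```ruby
# Build a ranked n-gram profile for a text: most frequent n-grams first.
# Words are padded with "_" so that word boundaries contribute n-grams too.
# (Hypothetical helper, not part of Scylla.)
def ngram_profile(text, max_n = 5, limit = 300)
  counts = Hash.new(0)
  text.downcase.scan(/[[:alpha:]']+/).each do |word|
    padded = "_#{word}_"
    (1..max_n).each do |n|
      (0..padded.length - n).each { |i| counts[padded[i, n]] += 1 }
    end
  end
  # Sort by descending count; break ties alphabetically for a stable order.
  counts.sort_by { |gram, count| [-count, gram] }.first(limit).map(&:first)
end

# "Out-of-place" distance: for each n-gram in the document profile, add the
# difference between its rank there and its rank in the language profile.
# N-grams missing from the language profile cost a fixed maximum penalty.
def out_of_place(doc_profile, lang_profile)
  max_penalty = lang_profile.length
  ranks = {}
  lang_profile.each_with_index { |gram, i| ranks[gram] = i }

  total = 0
  doc_profile.each_with_index do |gram, i|
    total += ranks.key?(gram) ? (ranks[gram] - i).abs : max_penalty
  end
  total
end

# Tiny, made-up training samples; real profiles need far more text.
samples = {
  "english" => "the quick brown fox jumps over the lazy dog and the cat",
  "spanish" => "el rápido zorro marrón salta sobre el perro perezoso y el gato"
}
profiles = samples.transform_values { |text| ngram_profile(text) }

doc = ngram_profile("the dog and the fox")
# Languages ordered from smallest distance (best guess) to largest.
ranked = profiles.sort_by { |_, profile| out_of_place(doc, profile) }.map(&:first)
puts ranked.inspect
```

Returning the whole ordering rather than only the closest language loosely mirrors the difference between `language` and `guess` in the README, although how Scylla decides which runners-up to report is not shown here.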
 
data/VERSION CHANGED
@@ -1 +1 @@
- 0.2.0
+ 0.3.0
@@ -6,6 +6,7 @@ module Scylla
  # minsize: The minimum size of the ngrams that you would like to store
  def initialize(dirtext = DEFAULT_SOURCE_DIR, dirlm = DEFAULT_TARGET_DIR, minsize = 0, silent = false)
    @dirtext = dirtext
+   p @dirtext
    @dirlm = dirlm
    @minsize = minsize
  end
data/lib/scylla/tasks.rb CHANGED
@@ -10,8 +10,9 @@ module Scylla
  def define_training_task
    namespace :scylla do
      desc "Trains Scylla in new languages"
-     task :train do
-       sg = Scylla::Generator.new
+     task :train, :dir do |t, args|
+       args.with_defaults(:dir => DEFAULT_SOURCE_DIR)
+       sg = Scylla::Generator.new(args[:dir])
        sg.train
      end
    end
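The hunk above turns `scylla:train` into a Rake task that takes an optional `:dir` argument and falls back to a default through `args.with_defaults`. The standalone sketch below shows that pattern in isolation. The `sketch:train` task name, the `EXAMPLE_DEFAULT_DIR` constant, and the `TRAINED_DIRS` array are hypothetical stand-ins; Scylla's real `DEFAULT_SOURCE_DIR` value and its `Generator` are not reproduced here.

```ruby
require 'rake'

# Hypothetical stand-in for Scylla's DEFAULT_SOURCE_DIR (real value not shown in the diff).
EXAMPLE_DEFAULT_DIR = "source_texts"

# Records which directory each run would have trained on, so the effect is visible.
TRAINED_DIRS = []

module TrainSketch
  extend Rake::DSL

  namespace :sketch do
    desc "Pretends to train on the given directory"
    task :train, :dir do |t, args|
      # Fill in :dir only when the caller did not supply one.
      args.with_defaults(:dir => EXAMPLE_DEFAULT_DIR)
      TRAINED_DIRS << args[:dir]
    end
  end
end

train = Rake::Task["sketch:train"]
train.invoke                  # no argument: falls back to the default
train.reenable                # Rake runs a task only once unless re-enabled
train.invoke("/tmp/mytexts")  # same as: rake sketch:train[/tmp/mytexts]

puts TRAINED_DIRS.inspect
```

On the command line the bracket syntax passes the argument, as in the README's `rake scylla:train[/Users/hash/mytextdir]`; some shells require quoting the brackets.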
data/scylla.gemspec CHANGED
@@ -5,7 +5,7 @@
 
  Gem::Specification.new do |s|
    s.name = %q{scylla}
-   s.version = "0.2.0"
+   s.version = "0.3.0"
 
    s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
    s.authors = ["Ashwin Hegde"]
data/test/loader_test.rb CHANGED
@@ -9,7 +9,7 @@ class LoaderTest < Test::Unit::TestCase
  end
 
  context "when being read" do
-   should_eventually "only load from disk once" do
+   should "only load from disk once" do
      Scylla::Loader.expects(:load_language_maps).once.returns([])
      Scylla::Loader.languages
      Scylla::Loader.languages
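The test enabled above expects `load_language_maps` to be called once across two calls to `languages`, which is the behavior of a memoized reader. `Scylla::Loader` itself is not shown in this diff, so the sketch below is only an assumption about how such caching might look; `LoaderSketch`, its `load_count` counter, and the returned language names are hypothetical.

```ruby
# Hypothetical loader that caches the result of an expensive disk read.
module LoaderSketch
  @load_count = 0

  class << self
    # Number of times the (pretend) disk read has happened.
    attr_reader :load_count

    # Stand-in for reading language maps from disk.
    def load_language_maps
      @load_count += 1
      ["english", "spanish"]
    end

    # Memoized reader: the first call loads, later calls reuse the cached value.
    def languages
      @languages ||= load_language_maps
    end
  end
end

LoaderSketch.languages
LoaderSketch.languages
puts LoaderSketch.load_count
```

An empty array is truthy in Ruby, so `||=` would also cache the `[]` that the mocked `load_language_maps` returns in the test; only `nil` or `false` would trigger a reload.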
metadata CHANGED
@@ -1,13 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: scylla
  version: !ruby/object:Gem::Version
-   hash: 23
+   hash: 19
    prerelease:
    segments:
    - 0
-   - 2
+   - 3
    - 0
-   version: 0.2.0
+   version: 0.3.0
  platform: ruby
  authors:
  - Ashwin Hegde