kusari 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +9 -0
- data/.rspec +2 -0
- data/.travis.yml +4 -0
- data/CODE_OF_CONDUCT.md +13 -0
- data/Gemfile +4 -0
- data/README.md +61 -0
- data/Rakefile +6 -0
- data/bin/console +14 -0
- data/bin/setup +7 -0
- data/kusari.gemspec +26 -0
- data/lib/kusari.rb +18 -0
- data/lib/kusari/markov_sentence_generator.rb +80 -0
- data/lib/kusari/version.rb +3 -0
- metadata +113 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 1bd23a95575a6e870596ea70b119829c01cbeaf0
+  data.tar.gz: e52a64cf85a4993763df1121d9cd0b7adb53d6c2
+SHA512:
+  metadata.gz: 8ead23f301fa3745b1dc25e19b11f7d225595868cd365b1c067c17ae7dc393502cbaed79dcd63ae121ba15a352dbd0971c1ae3b55a34570258ef168ca589f0db
+  data.tar.gz: 7afdbd2f11989ddcdde5ae67d8bcd3d59156e7fe1a4965c870827c699c6e9a6ef3d8b67965604eb5e6c13e24f9c1e9b077a9d01a6ca8a78c66e9b8b80e7fec79
data/.gitignore
ADDED
data/.rspec
ADDED
data/.travis.yml
ADDED
data/CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,13 @@
+# Contributor Code of Conduct
+
+As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
+
+We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
+
+Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
+
+Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
+
+This Code of Conduct is adapted from the [Contributor Covenant](http://contributor-covenant.org), version 1.0.0, available at [http://contributor-covenant.org/version/1/0/0/](http://contributor-covenant.org/version/1/0/0/)
data/Gemfile
ADDED
data/README.md
ADDED
@@ -0,0 +1,61 @@
+# :link: Kusari
+
+A random Japanese sentence generator based on a Markov chain.
+
+## Installation
+
+Add this line to your application's Gemfile:
+
+```ruby
+gem 'kusari'
+```
+
+And then execute:
+
+    $ bundle
+
+Or install it yourself as:
+
+    $ gem install kusari
+
+## Usage
+
+First, load the gem and create a generator instance:
+
+```ruby
+require 'kusari'
+generator = Kusari::Generator.new
+# by default, the above statement is the same as:
+# generator = Kusari::Generator.new(3, "./ipadic")
+```
+
+The first argument, `3`, is the N of the N-gram model used to build the table of tokenized words. You can pass an arbitrary number. The second argument, `./ipadic`, tells the generator the path to the [IPA dictionary](http://taku910.github.io/mecab/#download), a dictionary for parsing Japanese strings.
+
+Next, add strings (the reference sentences for the Markov chain):
+
+```ruby
+generator.add_string("ネロとパトラッシュは、この世で二人きりでした。")
+generator.add_string("彼らは、実の兄弟よりも仲のよい大の親友でした。")
+generator.add_string("ネロは、アルデンネ生まれの少年でした。")
+```
+
+Finally, obtain a randomly generated sentence:
+
+```ruby
+sentence = generator.generate(140)
+p sentence
+# => "ネロは、アルデンネ生まれの兄弟よりも仲のよい大の少年でした。"
+```
+
+Here, the argument of the `generate` method sets the length limit for the generated sentence; for example, `generator.generate(140)` creates a sentence that can be posted on Twitter.
+
+## Development
+
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+
+To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+## Contributing
+
+Bug reports and pull requests are welcome on GitHub at https://github.com/takuti/kusari. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](http://contributor-covenant.org) code of conduct.
+
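The README describes the first constructor argument as the N of an N-gram model. As a rough illustration of what that means, the sketch below builds the N-gram windows for one sentence that has already been split into tokens. The token list is written by hand as an assumption (the gem obtains its tokens from the Igo tagger), and `ngram_windows` is a hypothetical helper name, not part of the gem's API.

```ruby
HEAD = "[HEAD]"
TAIL = "[TAIL]"

# Hypothetical helper: wrap the tokens in HEAD/TAIL markers and
# slide a window of size n over them, one token at a time.
def ngram_windows(tokens, n = 3)
  ([HEAD] + tokens + [TAIL]).each_cons(n).to_a
end

# Hand-tokenized input; real tokenization would come from a tagger.
tokens = ["ネロ", "は", "少年", "でし", "た", "。"]
windows = ngram_windows(tokens, 3)
windows.each { |w| p w }
```

With N = 3, six tokens plus the two markers yield six windows, the first starting at `[HEAD]` and the last ending at `[TAIL]`.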
data/Rakefile
ADDED
data/bin/console
ADDED
@@ -0,0 +1,14 @@
+#!/usr/bin/env ruby
+
+require "bundler/setup"
+require "kusari"
+
+# You can add fixtures and/or initialization code here to make experimenting
+# with your gem easier. You can also use a different console, if you like.
+
+# (If you use this, don't forget to add pry to your Gemfile!)
+# require "pry"
+# Pry.start
+
+require "irb"
+IRB.start
data/bin/setup
ADDED
data/kusari.gemspec
ADDED
@@ -0,0 +1,26 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'kusari/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "kusari"
+  spec.version = Kusari::VERSION
+  spec.license = "MIT"
+  spec.authors = ["takuti"]
+  spec.email = ["k.takuti@gmail.com"]
+
+  spec.summary = %q{Japanese random sentence generator based on Markov chain.}
+  spec.homepage = "https://github.com/takuti/kusari"
+
+  spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+  spec.bindir = "exe"
+  spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+  spec.require_paths = ["lib"]
+
+  spec.add_dependency "igo-ruby", "~> 0.1.5"
+
+  spec.add_development_dependency "bundler", "~> 1.10"
+  spec.add_development_dependency "rake", "~> 10.0"
+  spec.add_development_dependency "rspec"
+end
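The gemspec pins its dependencies with the pessimistic operator `~>`. As a small check of which versions those constraints admit, the sketch below evaluates two of them with RubyGems' own `Gem::Requirement` and `Gem::Version` classes. The candidate version numbers are arbitrary examples chosen for illustration, not versions known to exist.

```ruby
require "rubygems" # usually loaded already; explicit here for clarity

igo = Gem::Requirement.new("~> 0.1.5")     # >= 0.1.5 and < 0.2.0
bundler = Gem::Requirement.new("~> 1.10")  # >= 1.10 and < 2.0

p igo.satisfied_by?(Gem::Version.new("0.1.9"))      # => true
p igo.satisfied_by?(Gem::Version.new("0.2.0"))      # => false
p bundler.satisfied_by?(Gem::Version.new("1.17.3")) # => true
p bundler.satisfied_by?(Gem::Version.new("2.0.0"))  # => false
```

The `rspec` development dependency carries no constraint, which the metadata records as `>= 0`.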
data/lib/kusari.rb
ADDED
@@ -0,0 +1,18 @@
+require "kusari/markov_sentence_generator"
+require "kusari/version"
+
+module Kusari
+  class Generator
+    def initialize(gram=3, ipadic_path="./ipadic")
+      @generator = MarkovSentenceGenerator.new(gram, ipadic_path)
+    end
+
+    def add_string(string)
+      @generator.add(string)
+    end
+
+    def generate(limit)
+      @generator.generate(limit)
+    end
+  end
+end
data/lib/kusari/markov_sentence_generator.rb
ADDED
@@ -0,0 +1,80 @@
+# coding: utf-8
+
+require "igo-ruby"
+
+class MarkovSentenceGenerator
+  HEAD = "[HEAD]"
+  TAIL = "[TAIL]"
+
+  def initialize(gram=3, ipadic_path="./ipadic")
+    @gram = gram
+
+    # Japanese tokenizer
+    @tagger = Igo::Tagger.new(ipadic_path)
+
+    # save arrays of tokenized words based on the N-gram model
+    @markov_table = Array.new
+  end
+
+  def tokenize(string)
+    tokens = Array.new
+    tokens << HEAD
+    tokens += @tagger.wakati(string)
+    tokens << TAIL
+  end
+
+  def add(string)
+    tokens = tokenize(string)
+
+    # if there are at least 4 tokens, we can create both of HEAD-started and TAIL-ended array of words
+    return if tokens.size < 4
+
+    # update markov_table
+    i = 0
+    loop do
+      @markov_table << tokens[i..(i+@gram-1)]
+      break if tokens[i+@gram-1] == TAIL
+      i += 1
+    end
+  end
+
+  def generate(limit)
+    # select all HEAD-started arrays
+    head_arrays = Array.new
+    @markov_table.each do |markov_array|
+      if markov_array[0] == HEAD
+        head_arrays << markov_array
+      end
+    end
+
+    # sample one HEAD-started array and create initial sentence based on that
+    sampled_array = head_arrays.sample
+    sentence = sampled_array[1] + sampled_array[2]
+
+    # start Markov chain until getting the TAIL flag
+    loop do
+      # select all arrays which can chain their head word to current tail of the sentence
+      chain_arrays = Array.new
+      @markov_table.each do |markov_array|
+        if markov_array[0] == sampled_array[2]
+          chain_arrays << markov_array
+        end
+      end
+
+      # finish here if we cannot continue to chain
+      break if chain_arrays.length == 0
+
+      # grow current sentence and check if it has the TAIL flag
+      sampled_array = chain_arrays.sample
+      if sampled_array[2] == TAIL
+        sentence += sampled_array[1]
+        break
+      else
+        concat_string = sampled_array[1] + sampled_array[2]
+        break if sentence.length + concat_string.length > limit
+        sentence += concat_string
+      end
+    end
+    sentence
+  end
+end
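The chaining loop in `generate` can be exercised without a tokenizer or dictionary. The sketch below restates that loop for trigram windows (N = 3) over a table passed in as a plain array, so its behavior can be checked on a tiny hand-built table. `chain_sentence`, its `rng` parameter, and the early return for a table with no `[HEAD]` window are assumptions added for illustration; they are not part of the gem. It also assumes every `[HEAD]` window comes from a sentence of at least two tokens, as the size check in `add` arranges.

```ruby
HEAD = "[HEAD]"
TAIL = "[TAIL]"

# Hypothetical restatement of the chaining step for 3-element windows.
def chain_sentence(table, limit, rng: Random.new)
  heads = table.select { |w| w.first == HEAD }
  return "" if heads.empty? # added guard: nothing to start from

  current = heads.sample(random: rng)
  sentence = current[1] + current[2]

  loop do
    # Windows whose first word matches the last word emitted so far.
    candidates = table.select { |w| w.first == current.last }
    break if candidates.empty?

    current = candidates.sample(random: rng)
    if current.last == TAIL
      sentence += current[1] # closing piece, appended without a limit check
      break
    end

    piece = current[1] + current[2]
    break if sentence.length + piece.length > limit
    sentence += piece
  end
  sentence
end

# A table with a single possible path, so the result is deterministic.
table = [[HEAD, "a", "b"], ["a", "b", "c"], ["b", "c", TAIL]]
puts chain_sentence(table, 140) # prints "abc"
```

Because the closing piece is appended without consulting `limit`, a result can run past the limit by the length of that last token, which matters if the limit is meant as a hard cap.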
metadata
ADDED
@@ -0,0 +1,113 @@
+--- !ruby/object:Gem::Specification
+name: kusari
+version: !ruby/object:Gem::Version
+  version: 0.1.0
+platform: ruby
+authors:
+- takuti
+autorequire:
+bindir: exe
+cert_chain: []
+date: 2015-12-09 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: igo-ruby
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.1.5
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: 0.1.5
+- !ruby/object:Gem::Dependency
+  name: bundler
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.10'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '1.10'
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '10.0'
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+description:
+email:
+- k.takuti@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- ".gitignore"
+- ".rspec"
+- ".travis.yml"
+- CODE_OF_CONDUCT.md
+- Gemfile
+- README.md
+- Rakefile
+- bin/console
+- bin/setup
+- kusari.gemspec
+- lib/kusari.rb
+- lib/kusari/markov_sentence_generator.rb
+- lib/kusari/version.rb
+homepage: https://github.com/takuti/kusari
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.4.5.1
+signing_key:
+specification_version: 4
+summary: Japanese random sentence generator based on Markov chain.
+test_files: []