zombie_writer 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: d2fc89617a2463cd8ee6a523de15d6b35d39669a
+   data.tar.gz: e5205c355f563885805cf3e0bfa36f1814a0725d
+ SHA512:
+   metadata.gz: b74df2a53753e40d64772771d7d9e26791e2e1c9c545d495f736065ef2aff193fe6a0d2454dbaf6c6e6421d8505227c0b9eca4241ea406729c72292010e4a1f2
+   data.tar.gz: 4fc4596727922929a099110e0942969c136fd7f94a144c6d3074506497bdac6e267f5fbcdcf758d408adcdddaae84eb073cbcd3de7caf35b9a904568b8613d38
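The digests in checksums.yaml can be compared against a downloaded package. A minimal sketch, assuming the `.gem` archive has already been unpacked so that `data.tar.gz` is available on disk; the helper name `checksum_matches?` is hypothetical and not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper: true when the file's SHA-512 hex digest equals the
# expected value (for example, the SHA512 entry from checksums.yaml).
def checksum_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.strip.downcase
end

# Usage, assuming data.tar.gz sits in the current directory:
#   expected = "<SHA512 value for data.tar.gz from checksums.yaml>"
#   puts checksum_matches?("data.tar.gz", expected)
```

Comparing the SHA512 entry rather than the SHA1 entry is a design choice here; both are listed, and SHA-512 is the stronger of the two.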
data/.gitignore ADDED
@@ -0,0 +1,10 @@
+ /.bundle/
+ /.yardoc
+ /Gemfile.lock
+ /_yardoc/
+ /coverage/
+ /doc/
+ /pkg/
+ /spec/reports/
+ /tmp/
+ /toy_zone/
data/.rspec ADDED
@@ -0,0 +1,2 @@
+ --format documentation
+ --color
data/.travis.yml ADDED
@@ -0,0 +1,4 @@
+ language: ruby
+ rvm:
+   - 2.3.0
+ before_install: gem install bundler -v 1.11.2
data/Gemfile ADDED
@@ -0,0 +1,4 @@
+ source 'https://rubygems.org'
+
+ # Specify your gem's dependencies in zombie_writer.gemspec
+ gemspec
data/LICENSE.md ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2017 Tariq Ali
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,157 @@
+ # ZombieWriter
+
+ ![Logo](zombie_writer_logo.png)
+
+ ZombieWriter is a Ruby gem that will enable users to generate news articles by aggregating paragraphs from other sources.
+
+ While you have to provide the paragraphs, ZombieWriter will arrange the paragraphs into different articles for you to use and edit to your heart's content. You may choose between Machine Learning (Latent Semantic Analysis and k-means clustering) or Randomization.
+
+ ## Installation
+
+ ### Command Line
+
+ ```
+ gem install zombie_writer
+ ```
+
+ ### Gemfile
+
+ ```ruby
+ gem 'zombie_writer'
+ ```
+
+ ## Usage
+ First, decide whether you want to use ZombieWriter::MachineLearning or ZombieWriter::Randomization. ZombieWriter::MachineLearning uses Latent Semantic Analysis and k-means clustering to group your content into different articles, while ZombieWriter::Randomization simply picks random paragraphs to put in each article.
+
+ ZombieWriter::MachineLearning has the potential of producing better-quality articles, but is slightly slower than ZombieWriter::Randomization.
+
+ To create a MachineLearning zombie...
+ ```ruby
+ zombie = ZombieWriter::MachineLearning.new
+ ```
+
+ And to create a Randomization zombie...
+ ```ruby
+ zombie = ZombieWriter::Randomization.new
+ ```
+
+ Then, once you have your zombie, add your content.
+ ```ruby
+ zombie.add_string(content: "Lorem ipsum dolor sit amet.",
+                   sourcetext: "Cicero's Great Speech On Ethics",
+                   sourceurl: "http://example.com/lorem-ipsum")
+ ```
+
+ In the generated article, this content will appear in Markdown as:
+ ```markdown
+ Lorem ipsum dolor sit amet.---[Cicero's Great Speech On Ethics](http://example.com/lorem-ipsum)
+ ```
+
+ If your content is located in an external file, such as a CSV file, you can easily automate the process of adding strings to your zombie.
+
+ ```csv
+ Content,SourceText,SourceUrl
+ "Lorem ipsum dolor sit amet.","Cicero's Great Speech On Ethics","http://example.com/lorem-ipsum"
+ "Leverage agile frameworks.","Corporate Ipsum","http://www.cipsum.com/"
+ "Bacon ipsum dolor amet.","Bacon Ipsum","http://baconipsum.com/"
+ "Pork belly seitan photo booth.","Hipster Ipsum","https://hipsum.co/"
+ ```
+
+ ```ruby
+ require 'smarter_csv'
+
+ array_of_paragraphs = SmarterCSV.process("ipsum_quotes.csv")
+
+ array_of_paragraphs.each do |paragraph|
+   zombie.add_string(paragraph)
+ end
+ ```
+
+ Once you have finished giving your zombie all the strings it needs, tell it to generate your articles. It returns them as an array, which you can then save elsewhere. Each article is numbered, starting from zero, and is given a headline (the "most important sentence" in the article). All articles are formatted using Markdown.
+
+ ```ruby
+ array = zombie.generate_articles
+
+ File.open("articles.md", "w+") do |f|
+   array.each { |article| f.puts("#{article}<hr>\n") }
+ end
+ ```
+
+ Here's an example article that might be generated by ZombieWriter:
+
+ ```markdown
+ <h2>0 - Lorem ipsum dolor sit amet.</h2>
+ Lorem ipsum dolor sit amet.---[Cicero's Great Speech On Ethics](http://example.com/lorem-ipsum)
+
+ Bacon ipsum dolor amet.---[Bacon Ipsum](http://baconipsum.com/)
+
+ Leverage agile frameworks.---[Corporate Ipsum](http://www.cipsum.com/)
+
+ Pork belly seitan photo booth.---[Hipster Ipsum](https://hipsum.co/)
+ <hr>
+ ```
+
+ ### Citation
+ You do not need to provide sourcetext or sourceurl. If you exclude the sourceurl, the article will display only the sourcetext as the citation (with no hyperlink).
+
+ ```ruby
+ zombie.add_string(content: "This is some Lorem filler that my friend made up.",
+                   sourcetext: "tra38's anonymous friend")
+ ```
+
+ ```markdown
+ This is some Lorem filler that my friend made up.---tra38's anonymous friend
+ ```
+
+ If you exclude the sourcetext, the article will use the sourceurl as the citation text and hyperlink it as well.
+
+ ```ruby
+ zombie.add_string(content: "Zombie ipsum reversus ab viral inferno.",
+                   sourceurl: "http://www.zombieipsum.com")
+ ```
+
+ ```markdown
+ Zombie ipsum reversus ab viral inferno.---[http://www.zombieipsum.com](http://www.zombieipsum.com)
+ ```
+
+ If you exclude both the sourcetext and the sourceurl, the article will display no citation. This is useful for situations where you don't need to provide any citation metadata (such as if you have handwritten the content).
+ ```ruby
+ zombie.add_string(content: "This is filler text that I invented.")
+ ```
+
+ ```markdown
+ This is filler text that I invented.
+ ```
+
+ ## Real-World Examples
+
+ ### NaNoGenMo articles
+ The "National Novel Generation Month" competition has generated a lot of commentary on the Internet. Rather than hand-writing new commentary, why not reuse what already exists?
+
+ - [Articles generated using ZombieWriter::MachineLearning](https://gist.github.com/tra38/aa7e9c63708f6e21c32db5c3616162b5)
+ - [Articles generated using ZombieWriter::Randomization](https://gist.github.com/tra38/a65408790642560498aa1d40a05be9fe)
+
+ In both instances, we used [this CSV file](https://gist.github.com/tra38/805003ef51ff63093b3c2775f161ce3c) as source data.
+
+ ## Development
+
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment. Run `bundle exec zombie_writer` to use the gem in this directory, ignoring other installed copies of this gem.
+
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
+
+ ## Contributing
+
+ 1. Fork it ( https://github.com/tra38/zombie/fork )
+ 2. Create your feature branch (git checkout -b my-new-feature)
+ 3. Commit your changes (git commit -am 'Add some feature')
+ 4. Push to the branch (git push origin my-new-feature)
+ 5. Create a new Pull Request
+
+ ## License
+
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
+
+ ## Credits
+ The name is inspired by the term "[Philosophical Zombie](https://en.wikipedia.org/wiki/Philosophical_zombie)". According to Wikipedia, a P-Zombie is "a hypothetical being that is indistinguishable from a normal human being except that it lacks conscious experience, qualia, or sentience". AI is the closest we can get to building a P-Zombie of our own.
+
+ The logo for this project was generated using [MarkMaker](http://emblemmatic.org/markmaker/#/).
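The four citation cases described in the README's Citation section can be sketched as one standalone function. This mirrors the logic of `ZombieWriter.citation_constructor` in `lib/zombie_writer.rb`, but it is renamed `citation_for` here (a hypothetical name, not part of the gem's API) so that it runs without the gem installed.

```ruby
# Standalone sketch of the citation rules. `citation_for` is an illustrative
# name; the gem itself exposes ZombieWriter.citation_constructor.
def citation_for(paragraph)
  text = paragraph[:sourcetext]
  url  = paragraph[:sourceurl]

  if text && url
    "---[#{text}](#{url})"   # source title, hyperlinked
  elsif text
    "---#{text}"             # plain-text citation, no hyperlink
  elsif url
    "---[#{url}](#{url})"    # the URL doubles as the link text
  else
    ""                       # no citation at all
  end
end

puts citation_for(sourcetext: "Bacon Ipsum", sourceurl: "http://baconipsum.com/")
# prints: ---[Bacon Ipsum](http://baconipsum.com/)
```

The returned string is appended directly to the paragraph content, which is why each citation starts with `---` and no leading space.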
data/Rakefile ADDED
@@ -0,0 +1,6 @@
+ require "bundler/gem_tasks"
+ require "rspec/core/rake_task"
+
+ RSpec::Core::RakeTask.new(:spec)
+
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
+ #!/usr/bin/env ruby
+
+ require "bundler/setup"
+ require "zombie_writer"
+
+ # You can add fixtures and/or initialization code here to make experimenting
+ # with your gem easier. You can also use a different console, if you like.
+
+ # (If you use this, don't forget to add pry to your Gemfile!)
+ # require "pry"
+ # Pry.start
+
+ require "irb"
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,8 @@
+ #!/usr/bin/env bash
+ set -euo pipefail
+ IFS=$'\n\t'
+ set -vx
+
+ bundle install
+
+ # Do any other automated setup that you need to do here
data/exe/zombie_writer ADDED
@@ -0,0 +1,3 @@
+ #!/usr/bin/env ruby
+
+ require "zombie_writer"
data/lib/zombie_writer/redcarpet_configuration.rb ADDED
@@ -0,0 +1,8 @@
+ require 'redcarpet'
+ require 'redcarpet/render_strip'
+
+ class CustomStripDownRender < Redcarpet::Render::StripDown
+   def link(link, title, content)
+     "#{content}"
+   end
+ end
data/lib/zombie_writer/version.rb ADDED
@@ -0,0 +1,3 @@
+ module ZombieWriter
+   VERSION = "0.1.0"
+ end
data/lib/zombie_writer.rb ADDED
@@ -0,0 +1,132 @@
+ require "zombie_writer/version"
+ require "zombie_writer/redcarpet_configuration"
+ require 'classifier-reborn'
+ require 'kmeans-clusterer'
+
+ module ZombieWriter
+
+   def self.citation_constructor(paragraph)
+     if (paragraph[:sourceurl] && paragraph[:sourcetext])
+       "---[#{paragraph[:sourcetext]}](#{paragraph[:sourceurl]})"
+     elsif paragraph[:sourcetext]
+       "---#{paragraph[:sourcetext]}"
+     elsif paragraph[:sourceurl]
+       "---[#{paragraph[:sourceurl]}](#{paragraph[:sourceurl]})"
+     else
+       ""
+     end
+   end
+
+   class MachineLearning
+     attr_reader :lsi, :labels, :paragraph_data, :renderer, :plain_to_markdown
+
+     def initialize
+       @lsi = ClassifierReborn::LSI.new
+       @labels = []
+       @paragraph_data = Hash.new
+       @plain_to_markdown = Hash.new
+       @renderer = Redcarpet::Markdown.new(CustomStripDownRender)
+     end
+
+     def add_string(paragraph)
+       content = paragraph[:content]
+
+       stripped_down_content = renderer.render(content)
+
+       plain_to_markdown[stripped_down_content] = content
+
+       paragraph_data[content] = ZombieWriter.citation_constructor(paragraph)
+
+       labels << stripped_down_content
+       lsi.add_item(stripped_down_content)
+     end
+
+     def generate_articles
+       number_of_articles = labels.length
+       clusters = determine_number_of_clusters(number_of_articles)
+       clusters = generate_clusters(clusters: clusters, runs: 10)
+       clusters.map do |cluster|
+         article_for_summarization = generate_article(cluster) do |point|
+           point.label
+         end
+
+         final_article = generate_article(cluster) do |point|
+           stripped_down_content = point.label
+           content = plain_to_markdown[stripped_down_content]
+           citation = paragraph_data[content]
+           "#{content}#{citation}"
+         end
+
+         generated_title = ClassifierReborn::Summarizer.summary(article_for_summarization, 1)
+         "<h2>#{cluster.id.to_s} - #{generated_title}</h2>\n#{final_article}\n"
+       end
+     end
+
+     private
+     def generate_clusters(clusters:, runs:)
+       string_data = lsi.instance_variable_get(:"@items")
+       data = labels.map do |string|
+         string_data[string].lsi_norm.to_a
+       end
+       kmeans = KMeansClusterer.run clusters, data, labels: labels, runs: runs
+       kmeans.clusters
+     end
+
+     def determine_number_of_clusters(number_of_articles)
+       [1, ((number_of_articles/5).to_f).floor].max
+     end
+
+     def generate_article(cluster, &block)
+       cluster.points.map do |point|
+         yield(point)
+       end.join("\n\n")
+     end
+   end
+
+   class Randomization
+     attr_reader :labels, :paragraph_data, :renderer
+
+     def initialize
+       @labels = []
+       @paragraph_data = Hash.new
+       @renderer = Redcarpet::Markdown.new(CustomStripDownRender)
+     end
+
+     def add_string(paragraph)
+       content = paragraph[:content]
+
+       paragraph_data[content] = ZombieWriter.citation_constructor(paragraph)
+
+       labels << content
+     end
+
+     def generate_articles
+       number_of_paragraphs = labels.length
+       possible_paragraphs = labels.shuffle
+
+       possible_paragraphs.each_slice(5).with_index.map do |cluster, index|
+         article_for_summarization = generate_article(cluster) do |content|
+           renderer.render(content)
+         end
+
+         final_article = generate_article(cluster) do |content|
+           citation = paragraph_data[content]
+           "#{content}#{citation}"
+         end
+
+         generated_title = ClassifierReborn::Summarizer.summary(article_for_summarization, 1)
+         "<h2>#{index} - #{generated_title}</h2>\n#{final_article}\n"
+       end
+     end
+
+     private
+     def generate_article(cluster, &block)
+       cluster.map do |content|
+         yield(content)
+       end.join("\n\n")
+     end
+
+   end
+
+
+ end
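How many articles each strategy produces follows from two small rules in `lib/zombie_writer.rb`: `MachineLearning` asks k-means for one cluster per five paragraphs (never fewer than one), while `Randomization` shuffles the paragraphs and cuts them into slices of up to five. A minimal sketch of just that arithmetic, with no gem dependencies; the method names `cluster_count` and `random_groups` are hypothetical, and the `rng:` parameter is an addition made only so the sketch can be seeded.

```ruby
# Number of k-means clusters MachineLearning would request:
# integer division by 5, never fewer than 1.
def cluster_count(paragraph_total)
  [1, paragraph_total / 5].max
end

# Groups Randomization would form: shuffle, then slices of up to 5.
# The rng parameter is not in the gem; it only makes this sketch seedable.
def random_groups(paragraphs, rng: Random.new)
  paragraphs.shuffle(random: rng).each_slice(5).to_a
end

puts cluster_count(3)    # prints 1
puts cluster_count(12)   # prints 2
puts random_groups((1..12).to_a).map(&:length).inspect  # prints [5, 5, 2]
```

For 12 paragraphs the two strategies therefore differ in article count: k-means is asked for two clusters of whatever sizes it finds, while randomization yields three articles of 5, 5, and 2 paragraphs.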
data/zombie_writer.gemspec ADDED
@@ -0,0 +1,28 @@
+ # coding: utf-8
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'zombie_writer/version'
+
+ Gem::Specification.new do |spec|
+   spec.name = "zombie_writer"
+   spec.version = ZombieWriter::VERSION
+   spec.authors = ["Tariq Ali"]
+   spec.email = ["tra38@nau.edu"]
+
+   spec.summary = %q{ZombieWriter is a Ruby gem that will enable users to generate news articles by aggregating paragraphs from other sources.}
+   spec.description = %q{While you have to provide the paragraphs, ZombieWriter will arrange the paragraphs into different articles for you to use and edit to your heart's content. You may choose between Machine Learning (Latent Semantic Analysis and k-means clustering) or Randomization.}
+   spec.homepage = "https://github.com/tra38/Zombie"
+   spec.license = "MIT"
+
+   spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+   spec.bindir = "exe"
+   spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
+   spec.require_paths = ["lib"]
+
+   spec.add_development_dependency "bundler", "~> 1.11"
+   spec.add_development_dependency "rake", "~> 10.0"
+   spec.add_development_dependency "rspec", "~> 3.0"
+   spec.add_runtime_dependency "classifier-reborn", "~> 2.1"
+   spec.add_runtime_dependency "kmeans-clusterer", "~> 0.11.4"
+   spec.add_runtime_dependency "redcarpet", "~> 3.4"
+ end
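The `~>` operator in the gemspec's dependency lines is RubyGems' pessimistic version constraint: it lets the last listed segment float upward while pinning the segments before it. A short sketch using `Gem::Requirement` from RubyGems itself; the version numbers tested are arbitrary examples chosen for illustration, not releases known to exist.

```ruby
require 'rubygems'

# "~> 0.11.4" allows 0.11.x at or above 0.11.4, but not 0.12.
patch_level = Gem::Requirement.new("~> 0.11.4")
puts patch_level.satisfied_by?(Gem::Version.new("0.11.9"))  # prints true
puts patch_level.satisfied_by?(Gem::Version.new("0.12.0"))  # prints false

# "~> 2.1" allows any 2.x at or above 2.1, but not 3.0.
minor_level = Gem::Requirement.new("~> 2.1")
puts minor_level.satisfied_by?(Gem::Version.new("2.9.0"))   # prints true
puts minor_level.satisfied_by?(Gem::Version.new("3.0.0"))   # prints false
```

The three-segment constraint on `kmeans-clusterer` is therefore the tightest of the runtime dependencies, accepting only patch-level updates.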
data/zombie_writer_logo.png ADDED
Binary file
metadata ADDED
@@ -0,0 +1,148 @@
+ --- !ruby/object:Gem::Specification
+ name: zombie_writer
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Tariq Ali
+ autorequire:
+ bindir: exe
+ cert_chain: []
+ date: 2017-02-21 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: bundler
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.11'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '1.11'
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '10.0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.0'
+ - !ruby/object:Gem::Dependency
+   name: classifier-reborn
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.1'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '2.1'
+ - !ruby/object:Gem::Dependency
+   name: kmeans-clusterer
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.11.4
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: 0.11.4
+ - !ruby/object:Gem::Dependency
+   name: redcarpet
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.4'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - "~>"
+       - !ruby/object:Gem::Version
+         version: '3.4'
+ description: While you have to provide the paragraphs, ZombieWriter will arrange the
+   paragraphs into different articles for you to use and edit to your heart's content.
+   You may choose between Machine Learning (Latent Semantic Analysis and k-means clustering)
+   or Randomization.
+ email:
+ - tra38@nau.edu
+ executables:
+ - zombie_writer
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - ".gitignore"
+ - ".rspec"
+ - ".travis.yml"
+ - Gemfile
+ - LICENSE.md
+ - README.md
+ - Rakefile
+ - bin/console
+ - bin/setup
+ - exe/zombie_writer
+ - lib/zombie_writer.rb
+ - lib/zombie_writer/redcarpet_configuration.rb
+ - lib/zombie_writer/version.rb
+ - zombie_writer.gemspec
+ - zombie_writer_logo.png
+ homepage: https://github.com/tra38/Zombie
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.4.8
+ signing_key:
+ specification_version: 4
+ summary: ZombieWriter is a Ruby gem that will enable users to generate news articles
+   by aggregating paragraphs from other sources.
+ test_files: []