wiki-yggdrasil 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: eb439b993ec8d953183a78347ba0823ea19f93351315965e22df8765d157ecaa
4
+ data.tar.gz: dec29e7e24a77c1a15bbd0a8af3bac4b5e552cd2de0e4d267e156e05a94a2556
5
+ SHA512:
6
+ metadata.gz: 240261bfd3dd92c15ac716e50ab7951750dd9738560fbed4cf5ff5a6e7ed6a8ef3d1d3fe093fdfab5de66074b2562ed7db383a63128782c31d9662c84b8f9602
7
+ data.tar.gz: 819209e6592bc880931476265e8c6e5a9d365923966c5229868f4e5ddd6ce5759a0c710bb0e25de3a69ce241154ced34a4032ef87a695b7de4ec2f3a87042780
data/.gitignore ADDED
@@ -0,0 +1,14 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /_yardoc/
4
+ /coverage/
5
+ /doc/
6
+ /pkg/
7
+ /spec/reports/
8
+ /tmp/
9
+
10
+ # rspec failure tracking
11
+ .rspec_status
12
+
13
+ # Ignore Gemfile.lock
14
+ *.lock
data/.rspec ADDED
@@ -0,0 +1,3 @@
1
+ --format documentation
2
+ --color
3
+ --require spec_helper
data/.travis.yml ADDED
@@ -0,0 +1,5 @@
1
+ sudo: false
2
+ language: ruby
3
+ rvm:
4
+ - 2.5.0
5
+ before_install: gem install bundler -v 1.16.0
data/Gemfile ADDED
@@ -0,0 +1,6 @@
1
+ source "https://rubygems.org"
2
+
3
+ git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
4
+
5
+ # Specify your gem's dependencies in wiki-yggdrasil.gemspec
6
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2018 alex0112
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,55 @@
1
+ # Wiki::Yggdrasil
2
+ ![Travis CI Build](https://travis-ci.org/alex0112/wiki-yggdrasil.svg?branch=master)
3
+
4
+ You. You're up late at night again reading up on some obscure mathematical topic. You find yourself with *so many* open tabs on Wikipedia. Wouldn't it be nice if you could just pick an article, and then view a tree of the articles it references?
5
+
6
+ Introducing Wiki::Yggdrasil. Named after the tree in Norse mythology that drinks from the well of all wisdom, Wiki::Yggdrasil is here to help you drink just as deeply from the well of wisdom that is Wikipedia.
7
+
8
+ Wiki::Yggdrasil takes a Wikipedia URI as an argument, and proceeds to spider out a dependency tree of referenced articles.
9
+
10
+ ## Usage
11
+ ```ruby
12
+ require 'wiki/yggdrasil'
13
+
14
+ @tree = Wiki::Yggdrasil::Yggdrasil.new(uri: 'http://en.wikipedia.org/wiki/Yggdrasil')
15
+ referenced_articles = @tree.children(depth: 3) ## A hash of articles linked by the parent
16
+ ```
17
+
18
+ ## FAQ
19
+
20
+ ### This is taking a long time. Is that normal?
21
+ Yes, this is normal. Calling `children` with a depth of three or higher will likely take a few minutes, because every linked article is fetched and scraped in turn.
22
+
23
+ ### Why didn't you just use Wikipedia's API?
24
+ Wikipedia's API doesn't have an endpoint that allows you to programmatically view the summary section of each article and its children. If it did, that would obviously be the ideal choice.
25
+
26
+ ## Installation
27
+
28
+ Add this line to your application's Gemfile:
29
+
30
+ ```ruby
31
+ gem 'wiki-yggdrasil'
32
+ ```
33
+
34
+ And then execute:
35
+
36
+ $ bundle
37
+
38
+ Or install it yourself as:
39
+
40
+ $ gem install wiki-yggdrasil
41
+
42
+
43
+ ## Development
44
+
45
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
46
+
47
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
48
+
49
+ ## Contributing
50
+
51
+ Bug reports and pull requests are welcome on GitHub at https://github.com/alex0112/wiki-yggdrasil.
52
+
53
+ ## License
54
+
55
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
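The README's Usage section says `children` returns a hash of linked articles. Judging from the library code in this release, that hash is nested by URI with `nil` at the leaves. A minimal sketch for turning such a hash into an indented outline follows; `tree_lines` and the sample data are hypothetical and not part of the gem:

```ruby
# Hypothetical helper (not part of wiki-yggdrasil): flattens a nested
# { uri => subtree_or_nil } hash into indented lines, one per article.
def tree_lines(tree, indent = 0)
  (tree || {}).flat_map do |uri, subtree|
    [('  ' * indent) + uri] + tree_lines(subtree, indent + 1)
  end
end

# Made-up sample in the assumed shape of the hash returned by #children.
sample = {
  'https://en.wikipedia.org/wiki/Norse_mythology' => {
    'https://en.wikipedia.org/wiki/Odin' => nil
  },
  'https://en.wikipedia.org/wiki/Tree' => nil
}

puts tree_lines(sample)
```

Because leaves are `nil` rather than empty hashes, the helper treats `nil` as an empty subtree instead of raising.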
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "wiki/yggdrasil"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start(__FILE__)
data/bin/setup ADDED
@@ -0,0 +1,8 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+ set -vx
5
+
6
+ bundle install
7
+
8
+ # Do any other automated setup that you need to do here
data/lib/wiki/article.rb ADDED
@@ -0,0 +1,43 @@
1
+ require 'nokogiri'
2
+ require 'open-uri'
3
+ module Wiki::Yggdrasil
4
+
5
+ class Article
6
+ attr_reader :uri
7
+
8
+ def initialize(uri:)
9
+ raise ArgumentError unless Wiki::Yggdrasil::Article.is_valid_wiki_article?(uri: uri)
10
+ @uri = uri
11
+ @summary = nil
12
+ @child_links = nil
13
+ end
14
+
15
+ def summary
16
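+ @summary ||= Nokogiri::HTML(Nokogiri::HTML(URI.open(self.uri)).to_s.split('<div id="toc" class="toc">')[0]).css('p') ## Keep only the paragraphs before the table of contents. URI.open (Ruby 2.5+) replaces the bare open-uri Kernel#open, which was removed in Ruby 3.0. TODO: Cleanup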
17
+ end
18
+
19
+ def child_links
20
+ return @child_links if @child_links ## Memoized: skip re-filtering on repeat calls.
21
+ formatted_links = format_links
22
+ @child_links = formatted_links.select { |uri| Wiki::Yggdrasil::Article.is_valid_wiki_article?(uri: uri) }
23
+ end
24
+
25
+ def scrape_all_summary_links
26
+ self.summary.css('p a')
27
+ end
28
+
29
+ def format_links(anchors: self.scrape_all_summary_links)
30
+ uris = anchors.map do |anchor|
31
+ anchor.nil? || anchor['href'].nil? ? next : 'https://en.wikipedia.org' + anchor['href'] ## nil href attributes are often self refs (but possibly not always). Ignore them. Uses + rather than << so the string literal is never mutated.
32
+ end
33
+
34
+ uris.compact
35
+ end
36
+
37
+ def self.is_valid_wiki_article?(uri:)
38
+ ## Is this URI a wikipedia article?
39
+ uri =~ /.*wikipedia\.org\/wiki\/.+/ ? true : false
40
+ end
41
+
42
+ end
43
+ end
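The `is_valid_wiki_article?` check above accepts any string that contains `wikipedia.org/wiki/` followed by at least one character, so a URL that merely carries that text in its query string also passes. A stricter sketch, assuming it is acceptable to parse the URI with Ruby's standard `uri` library; `wiki_article_uri?` is a hypothetical name and not part of the gem:

```ruby
require 'uri'

# Hypothetical stricter variant of Article.is_valid_wiki_article?.
# Checks scheme, host, and path separately instead of matching a substring.
def wiki_article_uri?(uri)
  parsed = URI.parse(uri.to_s)
  return false unless %w[http https].include?(parsed.scheme)

  host = parsed.host.to_s
  return false unless host == 'wikipedia.org' || host.end_with?('.wikipedia.org')

  path = parsed.path.to_s
  path.start_with?('/wiki/') && path.length > '/wiki/'.length
rescue URI::InvalidURIError
  # Strings that are not parseable as URIs are simply not articles.
  false
end

puts wiki_article_uri?('http://en.wikipedia.org/wiki/Yggdrasil')
puts wiki_article_uri?('https://example.com/?q=wikipedia.org/wiki/Foo')
```

The trade-off is that non-article pages under `/wiki/` (for example namespaced pages) still pass; filtering those would need rules the source does not describe.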
data/lib/wiki/yggdrasil/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ module Wiki
2
+ module Yggdrasil
3
+ VERSION = "0.1.0"
4
+ end
5
+ end
data/lib/wiki/yggdrasil.rb ADDED
@@ -0,0 +1,33 @@
1
+ require "wiki/yggdrasil/version"
2
+
3
+ module Wiki
4
+ module Yggdrasil
5
+ require 'wiki/article'
6
+
7
+ class Yggdrasil
8
+ attr_reader :root
9
+
10
+ def initialize(uri:)
11
+ @root = Wiki::Yggdrasil::Article.new(uri: uri)
12
+ @children = nil
13
+ end
14
+
15
+ def children(depth: 4, article_children: self.root.child_links)
16
+ get_children = lambda do |depth, article_children|
17
+ article_children.each_with_object({}) do |uri, tree|
18
+ if (depth == 1)
19
+ tree[uri] = nil
20
+ else
21
+ article = Wiki::Yggdrasil::Article.new(uri: uri)
22
+ ## @children is assigned once, in #children below; setting it here would memoize a partial tree if a fetch failed.
23
+ tree[uri] = get_children.call(depth - 1, article.child_links)
24
+ end
25
+ end
26
+ end
27
+
28
+ @children ||= get_children.call(depth, article_children)
29
+ end
30
+
31
+ end
32
+ end
33
+ end
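The recursion in `children` can be illustrated without any network access. The sketch below mirrors its shape (a hash per level, `nil` once the depth limit is reached) but takes the link lookup as a block; `build_tree` and the `LINKS` table are hypothetical stand-ins, whereas in the gem the links come from `Article#child_links`:

```ruby
# Made-up link graph standing in for scraped Wikipedia summaries.
LINKS = {
  'A' => %w[B C],
  'B' => %w[D],
  'C' => [],
  'D' => %w[A]
}.freeze

# Hypothetical depth-limited tree builder. The block returns the child
# links for a URI; it is not called for entries at the final level.
def build_tree(uris, depth, &fetch_links)
  uris.each_with_object({}) do |uri, tree|
    tree[uri] = if depth <= 1
                  nil
                else
                  build_tree(fetch_links.call(uri), depth - 1, &fetch_links)
                end
  end
end

tree = build_tree(LINKS['A'], 2) { |uri| LINKS.fetch(uri, []) }
p tree # => {"B"=>{"D"=>nil}, "C"=>{}}
```

The depth limit is also what keeps cycles such as `D` linking back to `A` from recursing forever. Each extra level multiplies the number of fetches, which is consistent with the README's note that a depth of three or more takes minutes.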
data/wiki-yggdrasil.gemspec ADDED
@@ -0,0 +1,38 @@
1
+
2
+ lib = File.expand_path("../lib", __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require "wiki/yggdrasil/version"
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = "wiki-yggdrasil"
8
+ spec.version = Wiki::Yggdrasil::VERSION
9
+ spec.authors = ["alex0112"]
10
+ spec.email = ["alarsen0112@gmail.com"]
11
+
12
+ spec.summary = %q{ Scrape Wikipedia articles and generate a json tree }
13
+ spec.description = %q{ Given a Wikipedia article, generate a tree of linked articles from the summary of the first.}
14
+ spec.homepage = "https://github.com/alex0112/wiki-yggdrasil"
15
+ spec.license = "MIT"
16
+
17
+ # Prevent pushing this gem to RubyGems.org. To allow pushes either set the 'allowed_push_host'
18
+ # to allow pushing to a single host or delete this section to allow pushing to any host.
19
+ if spec.respond_to?(:metadata)
20
+ spec.metadata["allowed_push_host"] = 'https://rubygems.org'
21
+ else
22
+ raise "RubyGems 2.0 or newer is required to protect against " \
23
+ "public gem pushes."
24
+ end
25
+
26
+ spec.files = `git ls-files -z`.split("\x0").reject do |f|
27
+ f.match(%r{^(test|spec|features)/})
28
+ end
29
+ spec.bindir = "exe"
30
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
31
+ spec.require_paths = ["lib"]
32
+
33
+ spec.add_runtime_dependency "nokogiri", "~> 1.8.2"
34
+
35
+ spec.add_development_dependency "bundler", "~> 1.16"
36
+ spec.add_development_dependency "rake", "~> 10.0"
37
+ spec.add_development_dependency "rspec", "~> 3.0"
38
+ end
metadata ADDED
@@ -0,0 +1,115 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: wiki-yggdrasil
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - alex0112
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2018-06-12 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: 1.8.2
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: 1.8.2
27
+ - !ruby/object:Gem::Dependency
28
+ name: bundler
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '1.16'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '1.16'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rake
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '10.0'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '10.0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rspec
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '3.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '3.0'
69
+ description: " Given a Wikipedia article, generate a tree of linked articles from
70
+ the summary of the first."
71
+ email:
72
+ - alarsen0112@gmail.com
73
+ executables: []
74
+ extensions: []
75
+ extra_rdoc_files: []
76
+ files:
77
+ - ".gitignore"
78
+ - ".rspec"
79
+ - ".travis.yml"
80
+ - Gemfile
81
+ - LICENSE.txt
82
+ - README.md
83
+ - Rakefile
84
+ - bin/console
85
+ - bin/setup
86
+ - lib/wiki/article.rb
87
+ - lib/wiki/yggdrasil.rb
88
+ - lib/wiki/yggdrasil/version.rb
89
+ - wiki-yggdrasil.gemspec
90
+ homepage: https://github.com/alex0112/wiki-yggdrasil
91
+ licenses:
92
+ - MIT
93
+ metadata:
94
+ allowed_push_host: https://rubygems.org
95
+ post_install_message:
96
+ rdoc_options: []
97
+ require_paths:
98
+ - lib
99
+ required_ruby_version: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '0'
104
+ required_rubygems_version: !ruby/object:Gem::Requirement
105
+ requirements:
106
+ - - ">="
107
+ - !ruby/object:Gem::Version
108
+ version: '0'
109
+ requirements: []
110
+ rubyforge_project:
111
+ rubygems_version: 2.7.4
112
+ signing_key:
113
+ specification_version: 4
114
+ summary: Scrape Wikipedia articles and generate a json tree
115
+ test_files: []