jruby-boilerpipe 0.0.1

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d2bd36cfc4e41996a09592e1dcd8fcfd9a6e77b3
4
+ data.tar.gz: 3ab824412d7cd4122ac4a90d2c256c8a98ce274a
5
+ SHA512:
6
+ metadata.gz: de47596c123950baa83bdb8b3d9b5d5d27db205eb75689c0ecdfc23da79405e80619dbd1c5758d60acb304f0b57f40f8a3827168bf328edb4041aa1b730d9de7
7
+ data.tar.gz: 953cd72c4e6bf21afe8f792e395dee996dab8a6dec6a0615feafa7902eda99b49a01bb821bfb1555e97c550cae85b04c005b91274a0aa8cd6c88fbbf21f39d72
data/.gitignore ADDED
@@ -0,0 +1,9 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --format documentation
2
+ --color
data/Gemfile ADDED
@@ -0,0 +1,3 @@
1
+ source 'https://rubygems.org'
2
+
3
+ gemspec
data/README.md ADDED
@@ -0,0 +1,44 @@
1
+ # Boilerpipe
2
+
3
+ I saw other gems wrapping boilerpipe but they seemed to be outdated, hit the free api or I couldn't get them to work because of dependency issues. I went directly to the original author's github https://github.com/kohlschutter/boilerpipe and forked that code base here https://github.com/gregors/boilerpipe . I made one notible change that is to add a user agent so article extractor wouldn't return a 403 for a variety of web servers. I compiled all dependencies into the jar using the maven-assembly-plugin using Java 8. Until the original author releases a proper 2.0 release I'll be appending rcx to reflect the latest snapshots.
4
+
5
+ ## Installation
6
+
7
+ Add this line to your application's Gemfile:
8
+
9
+ ```ruby
10
+ gem 'jruby-boilerpipe', require: 'boilerpipe'
11
+ ```
12
+
13
+ And then execute:
14
+
15
+ $ bundle
16
+
17
+ Or install it yourself as:
18
+
19
+ $ gem install jruby-boilerpipe
20
+
21
+ ## Usage
22
+ You can feed Boilerpipe:ArticleExractor either a valid url or html content.
23
+
24
+ jruby-9.1.7.0 :001 > require 'boilerpipe'
25
+ => true
26
+ jruby-9.1.7.0 :003 > Boilerpipe::ArticleExtractor.text("https://github.com/jruby/jruby/wiki/AboutJRuby")
27
+ => "Clone this wiki locally\nAbout JRuby\nJRuby is a 100% Java implementation of the Ruby programming language. It is Ruby for the JVM.\nJRuby provides a complete set of core \"builtin\" classes and syntax for the Ruby language, as well as most of the Ruby Standard Libraries. The standard libraries are mostly Ruby's own complement of .rb files, but a few that depend on C language-based extensions have been reimplemented. Some are still missing, but we hope to implement as many as is feasible.\nSee Differences Between MRI And JRuby for more information on potential incompatibilities between JRuby and the C implementation of Ruby.\nDevelopment Team\nJRuby's current core development team consists of nine developers:\nCharles Oliver Nutter (Red Hat) aka headius\nThomas Enebo (Red Hat) aka enebo\nNick Sieger (LivingSocial)\nThere are also many past contributors who still help out from time to time:\nOla Bini (Thoughtworks)\nMany more...check out the JRuby commit logs!\nLinks\n"
28
+
29
+ jruby-9.1.7.0 :002 > require 'open-uri'
30
+ => true
31
+ jruby-9.1.7.0 :004 > contents = open("https://github.com/jruby/jruby/wiki/AboutJRuby").read
32
+ jruby-9.1.7.0 :005 > Boilerpipe::ArticleExtractor.text(contents)
33
+ => "Clone this wiki locally\nAbout JRuby\nJRuby is a 100% Java implementation of the Ruby programming language. It is Ruby for the JVM.\nJRuby provides a complete set of core \"builtin\" classes and syntax for the Ruby language, as well as most of the Ruby Standard Libraries. The standard libraries are mostly Ruby's own complement of .rb files, but a few that depend on C language-based extensions have been reimplemented. Some are still missing, but we hope to implement as many as is feasible.\nSee Differences Between MRI And JRuby for more information on potential incompatibilities between JRuby and the C implementation of Ruby.\nDevelopment Team\nJRuby's current core development team consists of nine developers:\nCharles Oliver Nutter (Red Hat) aka headius\nThomas Enebo (Red Hat) aka enebo\nNick Sieger (LivingSocial)\nThere are also many past contributors who still help out from time to time:\nOla Bini (Thoughtworks)\nMany more...check out the JRuby commit logs!\nLinks\n"
34
+
35
+ ## Development
36
+
37
+ After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
38
+
39
+ To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).
40
+
41
+ ## Contributing
42
+
43
+ Bug reports and pull requests are welcome on GitHub at https://github.com/gregors/boilerpipe.
44
+
data/Rakefile ADDED
@@ -0,0 +1,6 @@
1
+ require "bundler/gem_tasks"
2
+ require "rspec/core/rake_task"
3
+
4
+ RSpec::Core::RakeTask.new(:spec)
5
+
6
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "boilerpipe"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
@@ -0,0 +1,24 @@
1
+ # coding: utf-8
2
+ lib = File.expand_path('../lib', __FILE__)
3
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
4
+ require 'boilerpipe/version'
5
+
6
+ Gem::Specification.new do |spec|
7
+ spec.name = 'jruby-boilerpipe'
8
+ spec.version = Boilerpipe::VERSION
9
+ spec.authors = ["Gregory Ostermayr"]
10
+ spec.email = ["<gregory.ostermayr@gmail.com>"]
11
+
12
+ spec.summary = %q{Ruby wrapper around boilerpipe java library}
13
+ spec.description = %q{Java8 compiled - latest boilerpipe-2.0-SNAPSHOT - including all dependencies}
14
+ spec.homepage = "https://github.com/gregors/jruby-boilerpipe"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
17
+ spec.bindir = "exe"
18
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_development_dependency "bundler", "~> 1.10"
22
+ spec.add_development_dependency "rake", "~> 10.0"
23
+ spec.add_development_dependency "rspec"
24
+ end
data/lib/boilerpipe.rb ADDED
@@ -0,0 +1,25 @@
1
+ require 'boilerpipe/version'
2
+ require_relative 'boilerpipe-common-2.0-SNAPSHOT-jar-with-dependencies.jar'
3
+
4
+ module Boilerpipe
5
+ java_import 'com.kohlschutter.boilerpipe.extractors.ArticleExtractor'
6
+ java_import java.net.URL
7
+
8
+ class ArticleExtractor
9
+ def self.get_text(s)
10
+ url = nil
11
+
12
+ begin
13
+ url = Java::JavaNet::URL.new(s)
14
+ rescue Java::JavaNet::MalformedURLException => e
15
+ # not a URL
16
+ end
17
+ input = url ? url : s
18
+ ArticleExtractor::INSTANCE.get_text(input)
19
+ end
20
+
21
+ class <<self
22
+ alias_method :text, :get_text
23
+ end
24
+ end
25
+ end
@@ -0,0 +1,3 @@
1
+ module Boilerpipe
2
+ VERSION = '0.0.1'
3
+ end
metadata ADDED
@@ -0,0 +1,96 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: jruby-boilerpipe
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Gregory Ostermayr
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2017-08-30 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ requirement: !ruby/object:Gem::Requirement
15
+ requirements:
16
+ - - "~>"
17
+ - !ruby/object:Gem::Version
18
+ version: '1.10'
19
+ name: bundler
20
+ prerelease: false
21
+ type: :development
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.10'
27
+ - !ruby/object:Gem::Dependency
28
+ requirement: !ruby/object:Gem::Requirement
29
+ requirements:
30
+ - - "~>"
31
+ - !ruby/object:Gem::Version
32
+ version: '10.0'
33
+ name: rake
34
+ prerelease: false
35
+ type: :development
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ requirement: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ name: rspec
48
+ prerelease: false
49
+ type: :development
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '0'
55
+ description: Java8 compiled - latest boilerpipe-2.0-SNAPSHOT - including all dependencies
56
+ email:
57
+ - "<gregory.ostermayr@gmail.com>"
58
+ executables: []
59
+ extensions: []
60
+ extra_rdoc_files: []
61
+ files:
62
+ - ".gitignore"
63
+ - ".rspec"
64
+ - Gemfile
65
+ - README.md
66
+ - Rakefile
67
+ - bin/console
68
+ - bin/setup
69
+ - jruby-boilerpipe.gemspec
70
+ - lib/boilerpipe-common-2.0-SNAPSHOT-jar-with-dependencies.jar
71
+ - lib/boilerpipe.rb
72
+ - lib/boilerpipe/version.rb
73
+ homepage: https://github.com/gregors/jruby-boilerpipe
74
+ licenses: []
75
+ metadata: {}
76
+ post_install_message:
77
+ rdoc_options: []
78
+ require_paths:
79
+ - lib
80
+ required_ruby_version: !ruby/object:Gem::Requirement
81
+ requirements:
82
+ - - ">="
83
+ - !ruby/object:Gem::Version
84
+ version: '0'
85
+ required_rubygems_version: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ requirements: []
91
+ rubyforge_project:
92
+ rubygems_version: 2.6.8
93
+ signing_key:
94
+ specification_version: 4
95
+ summary: Ruby wrapper around boilerpipe java library
96
+ test_files: []