nhkore 0.3.1 → 0.3.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +28 -2
- data/README.md +468 -3
- data/lib/nhkore/app.rb +1 -0
- data/lib/nhkore/article_scraper.rb +12 -11
- data/lib/nhkore/cli/news_cmd.rb +1 -1
- data/lib/nhkore/lib.rb +58 -0
- data/lib/nhkore/scraper.rb +18 -4
- data/lib/nhkore/sifter.rb +15 -10
- data/lib/nhkore/util.rb +7 -2
- data/lib/nhkore/variator.rb +1 -0
- data/lib/nhkore/version.rb +1 -1
- metadata +4 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cf151c3859812632f09b1a464164f31bb0ce050f37ed7e7377f76265571ebd41
+  data.tar.gz: 1f3ee801e7557731cae4aeacd3f18fea4d7f33ac65b6ec77511a7d3d8f17856a
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 7e7d0d5b805ad6fa4312e8be26f3115dff18665b3762073c56db3a7a6a343a3ee6a05e47889e0abf7b62df3bb84cf5c977fce3efdfeb8a65c7bcff8167839d35
+  data.tar.gz: 957bc3da8492310d287a8947b9080f8be417f0874c3226db4f0bb63d020bee06c51a3da81c1fa3f779de22d354a32ab4cf41fc6f3018840774c31fd7060fbec3
data/CHANGELOG.md
CHANGED
@@ -2,7 +2,33 @@
 
 Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
-## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.
+## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.2...master)
+
+## [v0.3.2] - 2020-04-22
+
+### Added
+- lib/nhkore/lib.rb
+  - Requires all files, excluding CLI-related files, for speed when using this Gem as a library.
+- Scraper
+  - Added open_file() & reopen().
+- samples/looper.rb
+  - Script example of continuously scraping all articles.
+
+### Changed
+- README
+  - Finished writing the initial version of all sections.
+- ArticleScraper
+  - Changed the `year` param to expect an int, instead of a string.
+- Sifter
+  - In filter_by_datetime(), renamed keyword args `from_filter,to_filter` to simply `from,to`.
+
+### Fixed
+- Reduced load time of app from ~1s to 0.3~0.5s.
+  - Moved many `require '...'` statements into methods.
+  - It looks ugly & is not a good coding practice, but a necessary evil.
+  - Load time is still pretty slow (but a lot better!).
+- ArticleScraper
+  - Renamed `mode` param to `strict`. `mode` was overshadowing File.open()'s in Scraper.
 
 ## [v0.3.1] - 2020-04-20
 
@@ -11,7 +37,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 - NewsCmd/SiftCmd
   - Added `--no-sha256` option to not check if article links have already been scraped based on their contents' SHA-256.
 - Util
-  - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows &
+  - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows & Linux, it will check for both `/` & `\`.
 
 ### Fixed
 - Reduced load time of app from ~1s to ~0.3-5s by moving some requires into methods.
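Two of the renames above change call sites. A minimal before/after sketch, based on the README examples later in this diff (the URL and times are placeholder values):

```Ruby
require 'nhkore/lib'

# ArticleScraper: the `mode` param is now `strict`, and `year` expects an int.
as = NHKore::ArticleScraper.new(
  'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html',
  strict: false, # v0.3.1: mode: :lenient
  year: 2020,    # v0.3.1: year: '2020'
)

# Sifter: `from_filter`/`to_filter` are now simply `from`/`to`.
sifter = NHKore::Sifter.new(NHKore::YasashiiNews.load_file())
sifter.filter_by_datetime(from: Time.new(2019,12,4),to: Time.new(2019,12,7))
```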
data/README.md
CHANGED
@@ -293,7 +293,7 @@ links:
 
 If you don't wish to edit this file by hand (or programmatically), that's where the `search` command comes into play.
 
-Currently, it only searches &
+Currently, it only searches & scrapes `bing.com`, but other search engines and/or methods can easily be added in the future.
 
 Example usage:
 
@@ -319,6 +319,49 @@ Complete demo:
 
 #### News Command [^](#contents)
 
+In [The Basics](#the-basics-), you learned how to scrape 1 article using the `-u/--url` option with the `news` command.
+
+After creating a file of links from the [search](#search-command-) command (or manually/programmatically), you can also scrape multiple articles from this file using the `news` command.
+
+The defaults will scrape the 1st unscraped article from the `links` file:
+
+`$ nhkore news easy`
+
+You can scrape the 1st **X** unscraped articles with the `-s/--scrape` option:
+
+```
+# Scrape the 1st 11 unscraped articles.
+$ nhkore news -s 11 easy
+```
+
+You may wish to re-scrape articles that have already been scraped with the `-r/--redo` option:
+
+`$ nhkore news -r -s 11 easy`
+
+If you only wish to scrape specific article links, then you should use the `-k/--like` option, which does a fuzzy search on the URLs. For example, `--like '00123'` will match these links:
+
+- http<span>s://w</span>ww3.nhk.or.jp/news/easy/k1**00123**23711000/k10012323711000.html
+- http<span>s://w</span>ww3.nhk.or.jp/news/easy/k1**00123**21401000/k10012321401000.html
+- http<span>s://w</span>ww3.nhk.or.jp/news/easy/k1**00123**21511000/k10012321511000.html
+- ...
+
+`$ nhkore news -k '00123' -s 11 easy`
+
+Lastly, you can show the dictionary URL and contents for the 1st article if you're getting dictionary-related errors:
+
+```
+# This will exit after showing the 1st article's dictionary.
+$ nhkore news easy --show-dict
+```
+
+For the rest of the options, please see [The Basics](#the-basics-).
+
+Complete demo:
+
+[![asciinema Demo - News](https://asciinema.org/a/322324.png)](https://asciinema.org/a/322324)
+
+When I first scraped all of the articles in [nhkore-core.zip](https://github.com/esotericpig/nhkore/releases/latest), I had to use this [script](samples/looper.rb) because my internet isn't very good.
+
 ## Using the Library [^](#contents)
 
 ### Setup
@@ -336,11 +379,431 @@ In your *Gemfile*:
 ```Ruby
 # Pick one...
 gem 'nhkore', '~> X.X'
-gem 'nhkore', :git => 'https://github.com/esotericpig/
+gem 'nhkore', :git => 'https://github.com/esotericpig/nhkore.git', :tag => 'vX.X.X'
+```
+
+### Require
+
+In order to not require all of the CLI-related files, require this file instead:
+
+```Ruby
+require 'nhkore/lib'
+
+#require 'nhkore' # Slower
 ```
 
 ### Scraper
 
+All scraper classes extend this class. You can either extend it or use it by itself. It's a simple wrapper around *open-uri*, *Nokogiri*, etc.
+
+`initialize` automatically opens (connects to) the URL.
+
+```Ruby
+require 'nhkore/scraper'
+
+class MyScraper < NHKore::Scraper
+  def initialize()
+    super('https://www3.nhk.or.jp/news/easy/')
+  end
+end
+
+m = MyScraper.new()
+s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/')
+
+# Read all content into a String.
+mstr = m.read()
+sstr = s.read()
+
+# Get a Nokogiri::HTML object.
+mdoc = m.html_doc()
+sdoc = s.html_doc()
+
+# Get a RSS object.
+s = NHKore::Scraper.new('https://www.bing.com/search?format=rss&q=site%3Anhk.or.jp%2Fnews%2Feasy%2F&count=100')
+
+rss = s.rss_doc()
+```
+
+There are several useful options:
+
+```Ruby
+require 'nhkore/scraper'
+
+s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
+  open_timeout: 300, # Open timeout in seconds (default: nil)
+  read_timeout: 300, # Read timeout in seconds (default: nil)
+
+  # Maximum number of times to retry the URL
+  # - default: 3
+  # - Open/connect will fail a couple of times on a bad/slow internet connection.
+  max_retries: 10,
+
+  # Maximum number of redirects allowed.
+  # - default: 3
+  # - You can set this to nil or -1, but I recommend using a number
+  #   for safety (infinite-loop attack).
+  max_redirects: 1,
+
+  # How to check redirect URLs for safety.
+  # - default: :strict
+  # - nil => do not check
+  # - :lenient => check the scheme only
+  #   (i.e., if https, redirect URL must be https)
+  # - :strict => check the scheme and domain
+  #   (i.e., if https://bing.com, redirect URL must be https://bing.com)
+  redirect_rule: :lenient,
+
+  # Set the HTTP header field 'cookie' from the 'set-cookie' response.
+  # - default: false
+  # - Currently uses the 'http-cookie' Gem.
+  # - This is currently a time-consuming operation because it opens the URL twice.
+  # - Necessary for Search Engines or other sites that require cookies
+  #   in order to block bots.
+  eat_cookie: true,
+
+  # Set HTTP header fields.
+  # - default: nil
+  # - Necessary for Search Engines or other sites that try to block bots.
+  # - Simply pass in a Hash (not nil) to set the default ones.
+  header: {'user-agent' => 'Skynet'}, # Must use strings
+)
+
+# Open the URL yourself. This will be passed in directly to Nokogiri::HTML().
+# - In this way, you can use Faraday, HTTParty, RestClient, httprb/http, or
+#   some other Gem.
+s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
+  str_or_io: URI.open('https://www3.nhk.or.jp/news/easy/',redirect: false)
+)
+
+# Open and parse a file instead of a URL (for offline testing or slow internet).
+s = NHKore::Scraper.new('./my_article.html',is_file: true)
+
+doc = s.html_doc()
+```
+
+Here are some other useful methods:
+
+```Ruby
+require 'nhkore/scraper'
+
+s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/')
+
+s.reopen() # Re-open the current URL.
+
+# Get a relative URL.
+url = s.join_url('../../monkey.html')
+puts url # https://www3.nhk.or.jp/monkey.html
+
+# Open a new URL or file.
+s.open(url)
+s.open(url,URI.open(url,redirect: false))
+
+s.open('./my_article.html',is_file: true)
+
+# Open a file manually.
+s.open_file('./my_article.html')
+
+# Fetch the cookie & open a new URL manually.
+s.fetch_cookie(url)
+s.open_url(url)
+```
+
+### SearchScraper & BingScraper
+
+`SearchScraper` is used for scraping Search Engines for NHK News Web (Easy) links. It can also be used for search in general.
+
+By default, it sets the default HTTP header fields and fetches & sets the cookie.
+
+```Ruby
+require 'nhkore/search_scraper'
+
+ss = NHKore::SearchScraper.new('https://www.bing.com/search?q=nhk&count=100')
+
+doc = ss.html_doc()
+
+doc.css('a').each() do |anchor|
+  link = anchor['href']
+
+  next if ss.ignore_link?(link)
+
+  if link.include?('https://www3.nhk')
+    puts link
+  end
+end
+```
+
+`BingScraper` will search `bing.com` for you.
+
+```Ruby
+require 'nhkore/search_link'
+require 'nhkore/search_scraper'
+
+bs = NHKore::BingScraper.new(:yasashii)
+slinks = NHKore::SearchLinks.new()
+
+next_page = bs.scrape(slinks)
+page_num = 1
+
+while !next_page.empty?()
+  puts "Page #{page_num += 1}: #{next_page.count}"
+
+  bs = NHKore::BingScraper.new(:yasashii,url: next_page.url)
+
+  next_page = bs.scrape(slinks,next_page)
+end
+
+slinks.links.values.each() do |link|
+  puts link.url
+end
+```
+
+### ArticleScraper & DictScraper
+
+`ArticleScraper` scrapes an NHK News Web Easy article. Regular articles aren't currently supported.
+
+```Ruby
+require 'nhkore/article_scraper'
+
+as = NHKore::ArticleScraper.new(
+  'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html',
+
+  # If false, scrape the article leniently (for older articles which
+  #   may not have certain tags, etc.).
+  # - default: true
+  strict: false,
+
+  # {Dict} to use as the dictionary for words (Easy articles).
+  # - default: :scrape
+  # - nil => don't scrape/use it (necessary for Regular articles)
+  # - :scrape => auto-scrape it using {DictScraper}
+  # - {Dict} => your own {Dict}
+  dict: nil,
+
+  # Date time to use as a fallback if the article doesn't have one
+  #   (for older articles).
+  # - default: nil
+  datetime: Time.new(2020,2,2),
+
+  # Year to use as a fallback if the article doesn't have one
+  #   (for older articles).
+  # - default: nil
+  year: 2020,
+)
+
+article = as.scrape()
+
+article.datetime
+article.futsuurl
+article.sha256
+article.title
+article.url
+
+article.words.each() do |key,word|
+  word.defn
+  word.eng
+  word.freq
+  word.kana
+  word.kanji
+  word.key
+end
+
+puts article.to_s(mini: true)
+puts '---'
+puts article
+```
+
+`DictScraper` scrapes an Easy article's dictionary file (JSON).
+
+```Ruby
+require 'nhkore/dict_scraper'
+
+url = 'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html'
+ds = NHKore::DictScraper.new(
+  url,
+
+  # Change the URL appropriately to the dictionary URL.
+  # - default: true
+  parse_url: true,
+)
+
+puts NHKore::DictScraper.parse_url(url)
+puts
+
+dict = ds.scrape()
+
+dict.entries.each() do |key,entry|
+  entry.id
+
+  entry.defns.each() do |defn|
+    defn.hyoukis.each() {|hyouki| }
+    defn.text
+    defn.words.each() {|word| }
+  end
+
+  puts entry.build_hyouki()
+  puts entry.build_defn()
+  puts '---'
+end
+
+puts
+puts dict
+```
+
+### Fileable
+
+Any class that includes the `Fileable` mixin will have the following methods:
+
+- Class.load_file(file,mode: 'rt:BOM|UTF-8',**kargs)
+- save_file(file,mode: 'wt',**kargs)
+
+Any *kargs* will be passed to `File.open()`.
+
+```Ruby
+require 'nhkore/news'
+require 'nhkore/search_link'
+
+yn = NHKore::YasashiiNews.load_file()
+sl = NHKore::SearchLinks.load_file(NHKore::SearchLinks::DEFAULT_YASASHII_FILE)
+
+yn.articles.each() {|key,article| }
+yn.sha256s.each() {|sha256,url| }
+
+sl.links.each() do |key,link|
+  link.datetime
+  link.futsuurl
+  link.scraped?
+  link.sha256
+  link.title
+  link.url
+end
+
+#yn.save_file()
+#sl.save_file(NHKore::SearchLinks::DEFAULT_YASASHII_FILE)
+```
+
+### Sifter
+
+`Sifter` will sift & sort the `News` data into a single file. The data is sorted by frequency in descending order (i.e., most frequent words first).
+
+```Ruby
+require 'nhkore/news'
+require 'nhkore/sifter'
+require 'time'
+
+news = NHKore::YasashiiNews.load_file()
+
+sifter = NHKore::Sifter.new(news)
+
+sifter.caption = 'Sakura Fields Forever!'
+
+# Filter the data.
+#sifter.filter_by_datetime(Time.new(2019,12,5))
+sifter.filter_by_datetime(
+  from: Time.new(2019,12,4),to: Time.new(2019,12,7)
+)
+sifter.filter_by_title('桜')
+sifter.filter_by_url('k100')
+
+# Ignore (or blank out) certain columns from the output.
+sifter.ignore(:defn)
+sifter.ignore(:eng)
+
+# An array of the filtered & sorted words.
+words = sifter.sift()
+
+# Choose the file format.
+#sifter.put_csv!()
+#sifter.put_html!()
+sifter.put_yaml!()
+
+# Save to a file.
+file = 'sakura.yml'
+
+if !File.exist?(file)
+  sifter.save_file(file)
+end
+```
+
+### Util & UserAgents
+
+These provide a variety of useful methods/constants.
+
+Here are some of the most useful ones:
+
+```Ruby
+require 'nhkore/user_agents'
+require 'nhkore/util'
+
+include NHKore
+
+puts '======='
+puts '[ Net ]'
+puts '======='
+# Get a random User Agent for HTTP header field 'User-Agent'.
+# - This is used by default in Scraper/SearchScraper.
+puts "User-Agent: #{UserAgents.sample()}"
+
+uri = URI('https://www.bing.com/search?q=nhk')
+Util.replace_uri_query!(uri,q: 'banana')
+
+puts "URI query: #{uri}" # https://www.bing.com/search?q=banana
+# nhk.or.jp
+puts "Domain: #{Util.domain(URI('https://www.nhk.or.jp/news/easy').host)}"
+# Ben & Jerry's<br>
+puts "Escape HTML: #{Util.escape_html("Ben & Jerry's\n")}"
+puts
+
+puts '========'
+puts '[ Time ]'
+puts '========'
+puts "JST now: #{Util.jst_now}"
+# Drops in JST_OFFSET, does not change hour/min.
+puts "JST time: #{Util.jst_time(Time.now)}"
+puts "JST year: #{Util::JST_YEAR}"
+puts "1999 sane? #{Util.sane_year?(1999)}" # true
+puts "1776 sane? #{Util.sane_year?(1776)}" # false
+puts "Guess 5: #{Util.guess_year(5)}" # 2005
+puts "Guess 99: #{Util.guess_year(99)}" # 1999
+puts
+puts "JST timezone offset: #{Util::JST_OFFSET}"
+puts "JST timezone offset hour: #{Util::JST_OFFSET_HOUR}"
+puts "JST timezone offset minute: #{Util::JST_OFFSET_MIN}"
+puts
+
+puts '============'
+puts '[ Japanese ]'
+puts '============'
+
+JPN = ['桜','ぶ','ブ']
+
+def fmt_jpn()
+  fmt = []
+
+  JPN.each() do |x|
+    x = yield(x)
+    x = x ? "\u2B55" : Util::JPN_SPACE unless x.is_a?(String)
+    fmt << x
+  end
+
+  return "[ #{fmt.join(' | ')} ]"
+end
+
+puts " #{fmt_jpn{|x| x}}"
+puts "Hiragana? #{fmt_jpn{|x| !!Util.hiragana?(x)}}"
+puts "Kana? #{fmt_jpn{|x| !!Util.kana?(x)}}"
+puts "Kanji? #{fmt_jpn{|x| !!Util.kanji?(x)}}"
+puts "Reduce: #{Util.reduce_jpn_space("' '")}"
+puts
+
+puts '========='
+puts '[ Files ]'
+puts '========='
+puts "Dir str? #{Util.dir_str?('dir/')}" # true
+puts "Dir str? #{Util.dir_str?('dir')}" # false
+puts "File str? #{Util.filename_str?('file')}" # true
+puts "File str? #{Util.filename_str?('dir/file')}" # false
+```
+
 ## Hacking [^](#contents)
 
 ```
@@ -370,7 +833,9 @@ $ bundle exec rake nokogiri_other # macOS, Windows, etc.
 
 `$ bundle exec rake doc`
 
-### Installing Locally
+### Installing Locally
+
+You can make some changes/fixes to the code and then install your local version:
 
 `$ bundle exec rake install:local`
 
data/lib/nhkore/app.rb
CHANGED
data/lib/nhkore/article_scraper.rb
CHANGED
@@ -47,19 +47,21 @@ module NHKore
     attr_accessor :dict
     attr_reader :kargs
     attr_accessor :missingno
-    attr_accessor :mode
     attr_reader :polishers
     attr_accessor :splitter
+    attr_accessor :strict
     attr_reader :variators
     attr_accessor :year
 
+    alias_method :strict?,:strict
+
     # @param dict [Dict,:scrape,nil] the {Dict} (dictionary) to use for {Word#defn} (definitions)
     #   [+:scrape+] auto-scrape it using {DictScraper}
     #   [+nil+] don't scrape/use it
     # @param missingno [Missingno] data to use as a fallback for Ruby words without kana/kanji,
     #   instead of raising an error
-    # @param
-    def initialize(url,cleaners: [BestCleaner.new()],datetime: nil,dict: :scrape,missingno: nil,
+    # @param strict [true,false]
+    def initialize(url,cleaners: [BestCleaner.new()],datetime: nil,dict: :scrape,missingno: nil,polishers: [BestPolisher.new()],splitter: BestSplitter.new(),strict: true,variators: [BestVariator.new()],year: nil,**kargs)
       super(url,**kargs)
 
       @cleaners = Array(cleaners)
@@ -67,9 +69,9 @@ module NHKore
       @dict = dict
       @kargs = kargs
       @missingno = missingno
-      @mode = mode
      @polishers = Array(polishers)
       @splitter = splitter
+      @strict = strict
       @variators = Array(variators)
       @year = year
     end
@@ -188,7 +190,7 @@ module NHKore
       tag = doc.css('div.article-body') if tag.length < 1
 
       # - https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html
-      tag = doc.css('div#main') if tag.length < 1 &&
+      tag = doc.css('div#main') if tag.length < 1 && !@strict
 
       if tag.length > 0
         text = Util.unspace_web_str(tag.text.to_s())
@@ -481,7 +483,7 @@ module NHKore
     def scrape_title(doc,article)
       tag = doc.css('h1.article-main__title')
 
-      if tag.length < 1 &&
+      if tag.length < 1 && !@strict
         # This shouldn't be used except for select sites.
         # - https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html
 
@@ -583,7 +585,7 @@ module NHKore
       end
 
       # As a last resort, use our user-defined fallbacks (if specified).
-      return @year unless
+      return @year.to_i() unless @year.nil?()
       return @datetime.year if !@datetime.nil?() && Util.sane_year?(@datetime.year)
 
       raise ScrapeError,"could not scrape year at URL[#{@url}]"
@@ -604,11 +606,10 @@ module NHKore
     end
 
     def warn_or_error(klass,msg)
-
-      when :lenient
-        Util.warn(msg)
-      else
+      if @strict
         raise klass,msg
+      else
+        Util.warn(msg)
       end
     end
   end
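The net effect of the `mode` → `strict` rename: `warn_or_error()` now raises the given error class when `@strict` is true and only warns otherwise, and the lenient fallbacks (`div#main`, the `@year`/`@datetime` year) only kick in when `strict` is false. A small sketch, reusing the fallback-needing URL from the comments above (whether it scrapes cleanly depends on the live page):

```Ruby
require 'nhkore/article_scraper'

url = 'https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html'

# strict: true (the default) raises a ScrapeError on missing tags;
# strict: false warns and falls back leniently.
as = NHKore::ArticleScraper.new(url,strict: false,year: 2020)
article = as.scrape()
```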
data/lib/nhkore/cli/news_cmd.rb
CHANGED
data/lib/nhkore/lib.rb
ADDED
@@ -0,0 +1,58 @@
+#!/usr/bin/env ruby
+# encoding: UTF-8
+# frozen_string_literal: true
+
+#--
+# This file is part of NHKore.
+# Copyright (c) 2020 Jonathan Bradley Whited (@esotericpig)
+#
+# NHKore is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# NHKore is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public License
+# along with NHKore. If not, see <https://www.gnu.org/licenses/>.
+#++
+
+
+require 'nhkore/article'
+require 'nhkore/article_scraper'
+require 'nhkore/cleaner'
+require 'nhkore/defn'
+require 'nhkore/dict'
+require 'nhkore/dict_scraper'
+require 'nhkore/entry'
+require 'nhkore/error'
+require 'nhkore/fileable'
+require 'nhkore/missingno'
+require 'nhkore/news'
+require 'nhkore/polisher'
+require 'nhkore/scraper'
+require 'nhkore/search_link'
+require 'nhkore/search_scraper'
+require 'nhkore/sifter'
+require 'nhkore/splitter'
+require 'nhkore/user_agents'
+require 'nhkore/util'
+require 'nhkore/variator'
+require 'nhkore/version'
+require 'nhkore/word'
+
+
+module NHKore
+  ###
+  # Include this file to only require the files needed to use this
+  # Gem as a library (i.e., don't include CLI-related files).
+  #
+  # @author Jonathan Bradley Whited (@esotericpig)
+  # @since 0.3.2
+  ###
+  module Lib
+  end
+end
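As the doc comment says, requiring this one file loads everything except the CLI. A usage sketch (assuming `version.rb` defines `NHKore::VERSION`, per the usual Gem convention):

```Ruby
# Fast path for library users (skips the CLI-related requires):
require 'nhkore/lib'

#require 'nhkore' # Slower; also loads the CLI.

puts NHKore::VERSION # Assumed constant from nhkore/version.
```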
data/lib/nhkore/scraper.rb
CHANGED
@@ -82,7 +82,7 @@ module NHKore
       @max_retries = max_retries
       @redirect_rule = redirect_rule
 
-      open(url,str_or_io)
+      open(url,str_or_io,is_file: is_file)
     end
 
     def fetch_cookie(url)
@@ -119,14 +119,14 @@
       return URI::join(@url,relative_url)
     end
 
-    def open(url,str_or_io=nil)
+    def open(url,str_or_io=nil,is_file: @is_file)
+      @is_file = is_file
       @str_or_io = str_or_io
       @url = url
 
       if str_or_io.nil?()
         if @is_file
-
-          @str_or_io = File.open(url,'rt:UTF-8',**@kargs)
+          open_file(url)
         else
           fetch_cookie(url) if @eat_cookie
           open_url(url)
@@ -136,6 +136,16 @@
       return self
     end
 
+    def open_file(file)
+      @is_file = true
+      @url = file
+
+      # NHK's website tends to always use UTF-8.
+      @str_or_io = File.open(file,'rt:UTF-8',**@kargs)
+
+      return self
+    end
+
     def open_url(url)
       max_redirects = (@max_redirects.nil?() || @max_redirects < 0) ? 10_000 : @max_redirects
       max_retries = (@max_retries.nil?() || @max_retries < 0) ? 10_000 : @max_retries
@@ -194,6 +204,10 @@
       return @str_or_io
     end
 
+    def reopen()
+      return open(@url)
+    end
+
     def rss_doc()
       require 'rss'
 
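A short sketch of the new methods, mirroring the README examples added in this release (the local file path is a placeholder):

```Ruby
require 'nhkore/scraper'

s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/')

s.reopen() # Re-open (re-fetch) the current URL.

# open() now remembers is_file, so a file-backed scraper stays file-backed.
s.open('./my_article.html',is_file: true)
s.open_file('./my_article.html') # Manual form; always treats the arg as a file.

doc = s.html_doc()
```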
data/lib/nhkore/sifter.rb
CHANGED
@@ -93,24 +93,29 @@
       return false
     end
 
-    def filter_by_datetime(datetime_filter=nil,
+    def filter_by_datetime(datetime_filter=nil,from: nil,to: nil)
       if !datetime_filter.nil?()
-
-
-
+        if datetime_filter.respond_to?(:'[]')
+          # If out-of-bounds, just nil.
+          from = datetime_filter[0] if from.nil?()
+          to = datetime_filter[1] if to.nil?()
+        else
+          from = datetime_filter if from.nil?()
+          to = datetime_filter if to.nil?()
+        end
       end
 
-
-
+      from = to if from.nil?()
+      to = from if to.nil?()
 
-
-
+      from = Util.jst_time(from) unless from.nil?()
+      to = Util.jst_time(to) unless to.nil?()
 
-      datetime_filter = [
+      datetime_filter = [from,to]
 
       return self if datetime_filter.flatten().compact().empty?()
 
-      @filters[:datetime] = {from:
+      @filters[:datetime] = {from: from,to: to}
 
       return self
     end
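After this rework, `filter_by_datetime()` accepts a single datetime, anything indexable as a `[from,to]` pair, or the new `from:`/`to:` keywords; a missing bound is filled from the other, and both are converted to JST. A sketch of the three call forms (each call overwrites the previous `:datetime` filter; times are placeholders):

```Ruby
require 'nhkore/news'
require 'nhkore/sifter'

sifter = NHKore::Sifter.new(NHKore::YasashiiNews.load_file())

sifter.filter_by_datetime(Time.new(2019,12,5))                        # from = to
sifter.filter_by_datetime([Time.new(2019,12,4),Time.new(2019,12,7)]) # [from,to]
sifter.filter_by_datetime(from: Time.new(2019,12,4),to: Time.new(2019,12,7))
```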
data/lib/nhkore/util.rb
CHANGED
@@ -22,8 +22,7 @@
 
 
 require 'cgi'
-require '
-require 'public_suffix'
+require 'set'
 require 'time'
 require 'uri'
 
@@ -68,6 +67,8 @@
     end
 
     def self.domain(host,clean: true)
+      require 'public_suffix'
+
       domain = PublicSuffix.domain(host)
       domain = unspace_web_str(domain).downcase() if !domain.nil?() && clean
 
@@ -75,6 +76,8 @@
     end
 
     def self.dump_yaml(obj,flow_level: 8)
+      require 'psychgus'
+
       return Psychgus.dump(obj,
         deref_aliases: true, # Dereference aliases for load_yaml()
         line_width: 10000, # Try not to wrap; ichiman!
@@ -142,6 +145,8 @@
     end
 
     def self.load_yaml(data,file: nil,**kargs)
+      require 'psychgus'
+
       return Psych.safe_load(data,
         aliases: false,
         filename: file,
data/lib/nhkore/variator.rb
CHANGED
data/lib/nhkore/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: nhkore
 version: !ruby/object:Gem::Version
-  version: 0.3.
+  version: 0.3.2
 platform: ruby
 authors:
 - Jonathan Bradley Whited (@esotericpig)
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2020-04-
+date: 2020-04-21 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bimyou_segmenter
@@ -349,6 +349,7 @@ files:
 - lib/nhkore/entry.rb
 - lib/nhkore/error.rb
 - lib/nhkore/fileable.rb
+- lib/nhkore/lib.rb
 - lib/nhkore/missingno.rb
 - lib/nhkore/news.rb
 - lib/nhkore/polisher.rb
@@ -374,7 +375,7 @@ metadata:
   changelog_uri: https://github.com/esotericpig/nhkore/blob/master/CHANGELOG.md
   homepage_uri: https://github.com/esotericpig/nhkore
   source_code_uri: https://github.com/esotericpig/nhkore
-post_install_message: " \n NHKore v0.3.
+post_install_message: " \n NHKore v0.3.2\n \n You can now use [nhkore] on the
   command line.\n \n Homepage: https://github.com/esotericpig/nhkore\n \n Code:
   \ https://github.com/esotericpig/nhkore\n Changelog: https://github.com/esotericpig/nhkore/blob/master/CHANGELOG.md\n
   \ Bugs: https://github.com/esotericpig/nhkore/issues\n \n"