nhkore 0.3.3 → 0.3.8
- checksums.yaml +4 -4
- data/.yardopts +3 -0
- data/CHANGELOG.md +97 -2
- data/Gemfile +0 -18
- data/Gemfile.lock +89 -0
- data/README.md +58 -30
- data/Rakefile +68 -42
- data/bin/nhkore +4 -15
- data/lib/nhkore.rb +8 -20
- data/lib/nhkore/app.rb +231 -236
- data/lib/nhkore/article.rb +56 -53
- data/lib/nhkore/article_scraper.rb +308 -289
- data/lib/nhkore/cleaner.rb +20 -32
- data/lib/nhkore/cli/fx_cmd.rb +41 -53
- data/lib/nhkore/cli/get_cmd.rb +59 -70
- data/lib/nhkore/cli/news_cmd.rb +145 -154
- data/lib/nhkore/cli/search_cmd.rb +110 -120
- data/lib/nhkore/cli/sift_cmd.rb +111 -227
- data/lib/nhkore/datetime_parser.rb +328 -0
- data/lib/nhkore/defn.rb +48 -55
- data/lib/nhkore/dict.rb +26 -38
- data/lib/nhkore/dict_scraper.rb +31 -40
- data/lib/nhkore/entry.rb +43 -55
- data/lib/nhkore/error.rb +16 -21
- data/lib/nhkore/fileable.rb +10 -21
- data/lib/nhkore/lib.rb +6 -17
- data/lib/nhkore/missingno.rb +21 -33
- data/lib/nhkore/news.rb +61 -66
- data/lib/nhkore/polisher.rb +22 -34
- data/lib/nhkore/scraper.rb +75 -82
- data/lib/nhkore/search_link.rb +85 -78
- data/lib/nhkore/search_scraper.rb +89 -92
- data/lib/nhkore/sifter.rb +157 -171
- data/lib/nhkore/splitter.rb +19 -31
- data/lib/nhkore/user_agents.rb +28 -32
- data/lib/nhkore/util.rb +72 -101
- data/lib/nhkore/variator.rb +20 -32
- data/lib/nhkore/version.rb +4 -16
- data/lib/nhkore/word.rb +105 -99
- data/nhkore.gemspec +58 -65
- data/samples/looper.rb +71 -0
- data/test/nhkore/test_helper.rb +3 -15
- data/test/nhkore_test.rb +6 -18
- metadata +53 -30
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c63efbc2f65cfe83c7b55e53a0dfca329c2aded4c22ae05c2fb50583876452b4
+  data.tar.gz: 87c5116e11cb7e2dd4a5cdb86d6fc1a80ea58dd4efa7bc27ad448c25c4fad724
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 68eb93da6d8f5c8ba3c4c58e0a9a71803dd4eefc6063df4ead9f0d06c0f1ba59892f5ddb43a9735c30ceaf85db63ed80c1b155bac1d5f0daf73f9cebbc7f6c6e
+  data.tar.gz: 33e9f4f770bceb2c0eb5d6d62781af400bc2b66f4ba4d4092b01b224bc365edef98cc47032f4b2389e04664f6cabcd2c02024139971bee207a121570805a6015
data/.yardopts
ADDED
data/CHANGELOG.md
CHANGED
|
@@ -1,8 +1,96 @@
|
|
|
1
1
|
# Changelog | NHKore
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
All notable changes to this project will be documented in this file.
|
|
4
|
+
|
|
5
|
+
Format is based on [Keep a Changelog v1.0.0](https://keepachangelog.com/en/1.0.0),
|
|
6
|
+
and this project adheres to [Semantic Versioning v2.0.0](https://semver.org/spec/v2.0.0.html).
|
|
7
|
+
|
|
8
|
+
## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.8...HEAD)
|
|
9
|
+
-
|
|
10
|
+
|
|
11
|
+
|
|
12
|
+
## [v0.3.8] - 2021-06-26
|
|
13
|
+
|
|
14
|
+
### Fixed
|
|
15
|
+
- Fixed `App#refresh_cmd()` to also copy Cri's `default_proc` to the new Hash for the command options.
|
|
16
|
+
- Fixed to check for non-strings for JSON & URI.
|
|
17
|
+
- For JSON, convert `StringIO` to string in `DictScraper.scrape()`.
|
|
18
|
+
- For URL, convert URL using `URI()` because `URI.parse()` will crash with a non-string (URI object) in `Scraper.open_url()`.
|
|
19
|
+
- Fixed to scrape multiple HTML Ruby tag words (instead of just 1).
|
|
20
|
+
- I thought multiple Ruby bases/texts (`<rb>`/`<rt>`) were invalid, but after running into the article below and checking the HTML with a validator, it's actually valid HTML:
|
|
21
|
+
- https://www3.nhk.or.jp/news/easy/k10012759201000/k10012759201000.html
|
|
22
|
+
- No previous articles/URLs ran into this problem (would have raised an error), so it should only be a problem with this specific, new article.
|
|
23
|
+
|
|
24
|
+
### Changed
|
|
25
|
+
- Formatted/Linted all code using RuboCop.
|
|
26
|
+
- Updated Gems.
|
|
27
|
+
|
|
28
|
+
|
|
29
|
+
## [v0.3.7] - 2020-11-07
|
|
30
|
+
|
|
31
|
+
### Changed
|
|
32
|
+
- Updated Gem `attr_bool` to v0.2
|
|
33
|
+
- Changed upper-case *'-V'* flag for *version* to be a lower-case *'-v'*
|
|
34
|
+
- Seems like a lot of apps/people expect this
|
|
35
|
+
- Refactored/Formatted some code
|
|
36
|
+
- *nhkore.gemspec* especially
|
|
37
|
+
- Added *samples/*, *Gemfile.lock*, and *.yardopts* to the files in *nhkore.gemspec*
|
|
38
|
+
|
|
39
|
+
### Fixed
|
|
40
|
+
- ArticleScraper
|
|
41
|
+
- Fixed to accept text nodes that have Kanji, due to bad article:
|
|
42
|
+
- https://www3.nhk.or.jp/news/easy/k10012639271000/k10012639271000.html
|
|
43
|
+
- `第3のビール` should have HTML ruby tags around *第*
|
|
44
|
+
|
|
45
|
+
|
|
46
|
+
## [v0.3.6] - 2020-08-18
|
|
47
|
+
|
|
48
|
+
### Added
|
|
49
|
+
- `update_showcase` Rake task for development & personal site (GitHub Page)
|
|
50
|
+
- `$ bundle exec rake update_showcase`
|
|
51
|
+
|
|
52
|
+
### Changed
|
|
53
|
+
- Updated Gems
|
|
54
|
+
|
|
55
|
+
### Fixed
|
|
56
|
+
- ArticleScraper for title for specific site
|
|
57
|
+
- https://www3.nhk.or.jp/news/easy/article/disaster_earthquake_illust.html
|
|
58
|
+
- Ignored `/cgi2.*enqform/` URLs from SearchScraper (Bing)
|
|
59
|
+
- Added more detail to dictionary error in ArticleScraper
|
|
60
|
+
|
|
61
|
+
|
|
62
|
+
## [v0.3.5] - 2020-05-04
|
|
63
|
+
|
|
64
|
+
### Added
|
|
65
|
+
- Added check for environment var `NO_COLOR`
|
|
66
|
+
- [https://no-color.org/](https://no-color.org/)
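A `NO_COLOR` check along these lines might look like the sketch below. The helper names `no_color?` and `paint` are hypothetical and not taken from NHKore; the rule that a present, non-empty `NO_COLOR` disables color follows the convention described at no-color.org.

```ruby
# Hypothetical helper: true if the environment asks for no ANSI color.
# +env+ defaults to ENV, but any Hash-like object works (handy for testing).
def no_color?(env = ENV)
  value = env['NO_COLOR']

  return !value.nil? && !value.empty?
end

# Hypothetical helper: wrap +text+ in green unless color is disabled.
def paint(text, env = ENV)
  return text if no_color?(env)

  return "\e[32m#{text}\e[0m"
end

puts paint('NHKore', { 'NO_COLOR' => '1' })
puts paint('NHKore', {}).inspect
```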
|
|
67
|
+
|
|
68
|
+
### Fixed
|
|
69
|
+
- Fixed URLs stored in YAML data to always be of type String (not URI)
|
|
70
|
+
- This initially caused a problem in DictScraper.parse_url() from ArticleScraper, but fixed it for all data
|
|
71
|
+
|
|
72
|
+
|
|
73
|
+
## [v0.3.4] - 2020-04-25
|
|
74
|
+
|
|
75
|
+
### Added
|
|
76
|
+
- DatetimeParser
|
|
77
|
+
- Extracted from SiftCmd into its own class
|
|
78
|
+
- Fixed some minor logic bugs from the old code
|
|
79
|
+
- Added new feature where 1 range can be empty:
|
|
80
|
+
- `sift ez -d '...2019'` (from = 1924)
|
|
81
|
+
- `sift ez -d '2019...'` (to = current year)
|
|
82
|
+
- `sift ez -d '...'` (still an error)
|
|
83
|
+
- Added `update_core` rake task for dev
|
|
84
|
+
- Makes pushing a new release much easier
|
|
85
|
+
- See *Hacking.Releasing* section in *README*
|
|
86
|
+
|
|
87
|
+
### Fixed
|
|
88
|
+
- SiftCmd `parse_sift_datetime()` for `-d/--datetime` option
|
|
89
|
+
- Didn't work exactly right (as written in *README*) for some special inputs:
|
|
90
|
+
- `-d '2019...3'`
|
|
91
|
+
- `-d '3-3'`
|
|
92
|
+
- `-d '3'`
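The open-ended range rule from this release (`'...2019'`, `'2019...'`, and `'...'` as an error) can be sketched for years only. This is a simplified illustration, not the real `DatetimeParser`, which also handles months, days, and times; `parse_year_range`, `MIN_YEAR`, and the `current_year:` parameter are assumptions made for the example.

```ruby
MIN_YEAR = 1924 # Lower bound quoted in the changelog entry.

# Hypothetical, years-only sketch of a range where one side may be empty.
def parse_year_range(value, current_year: Time.now.year)
  from_str, to_str = value.strip.split('...', 2)

  from_str = from_str.to_s.strip
  to_str = (to_str || from_str).strip # No '...' given: a single year.

  if from_str.empty? && to_str.empty?
    raise ArgumentError, "range needs at least one side: #{value.inspect}"
  end

  from_year = from_str.empty? ? MIN_YEAR : Integer(from_str, 10)
  to_year = to_str.empty? ? current_year : Integer(to_str, 10)

  return [from_year, to_year]
end

p parse_year_range('...2019')
p parse_year_range('2019...', current_year: 2021)
```

Passing `current_year:` explicitly keeps the result reproducible, since the default depends on the clock.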
|
|
4
93
|
|
|
5
|
-
## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.3...master)
|
|
6
94
|
|
|
7
95
|
## [v0.3.3] - 2020-04-23
|
|
8
96
|
|
|
@@ -10,6 +98,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
|
|
10
98
|
- Added JSON support to Sifter & SiftCmd.
|
|
11
99
|
- Added use of `attr_bool` Gem for `attr_accessor?` & `attr_reader?`.
|
|
12
100
|
|
|
101
|
+
|
|
13
102
|
## [v0.3.2] - 2020-04-22
|
|
14
103
|
|
|
15
104
|
### Added
|
|
@@ -33,6 +122,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
|
|
33
122
|
- ArticleScraper
|
|
34
123
|
- Renamed `mode` param to `strict`. `mode` was overshadowing File.open()'s in Scraper.
|
|
35
124
|
|
|
125
|
+
|
|
36
126
|
## [v0.3.1] - 2020-04-20
|
|
37
127
|
|
|
38
128
|
### Changed
|
|
@@ -50,6 +140,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
|
|
50
140
|
- BingScraper
|
|
51
141
|
- Fixed possible RSS infinite loop.
|
|
52
142
|
|
|
143
|
+
|
|
53
144
|
## [v0.3.0] - 2020-04-12
|
|
54
145
|
|
|
55
146
|
### Added
|
|
@@ -84,7 +175,9 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
|
|
|
84
175
|
- ignore empty filenames in the Zip for safety.
|
|
85
176
|
- ask to overwrite files instead of erroring.
|
|
86
177
|
|
|
178
|
+
|
|
87
179
|
## [v0.2.0] - 2020-04-01
|
|
180
|
+
|
|
88
181
|
First working version.
|
|
89
182
|
|
|
90
183
|
### Added
|
|
@@ -120,7 +213,9 @@ First working version.
|
|
|
120
213
|
- test/nhkore_tester.rb
|
|
121
214
|
- Renamed to `test/nhkore/test_helper.rb`
|
|
122
215
|
|
|
216
|
+
|
|
123
217
|
## [v0.1.0] - 2020-02-24
|
|
218
|
+
|
|
124
219
|
### Added
|
|
125
220
|
- .gitignore
|
|
126
221
|
- CHANGELOG.md
|
data/Gemfile
CHANGED
@@ -1,24 +1,6 @@
 # encoding: UTF-8
 # frozen_string_literal: true
 
-#--
-# This file is part of NHKore.
-# Copyright (c) 2020 Jonathan Bradley Whited (@esotericpig)
-#
-# NHKore is free software: you can redistribute it and/or modify
-# it under the terms of the GNU Lesser General Public License as published by
-# the Free Software Foundation, either version 3 of the License, or
-# (at your option) any later version.
-#
-# NHKore is distributed in the hope that it will be useful,
-# but WITHOUT ANY WARRANTY; without even the implied warranty of
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-# GNU Lesser General Public License for more details.
-#
-# You should have received a copy of the GNU Lesser General Public License
-# along with NHKore. If not, see <https://www.gnu.org/licenses/>.
-#++
-
 
 source 'https://rubygems.org'
 
data/Gemfile.lock
ADDED
|
@@ -0,0 +1,89 @@
|
|
|
1
|
+
PATH
|
|
2
|
+
remote: .
|
|
3
|
+
specs:
|
|
4
|
+
nhkore (0.3.8)
|
|
5
|
+
attr_bool (~> 0.2)
|
|
6
|
+
bimyou_segmenter (~> 1.2)
|
|
7
|
+
cri (~> 2.15)
|
|
8
|
+
down (~> 5.2)
|
|
9
|
+
highline (~> 2.0)
|
|
10
|
+
http-cookie (~> 1.0)
|
|
11
|
+
japanese_deinflector (~> 0.0)
|
|
12
|
+
nokogiri (~> 1.11)
|
|
13
|
+
psychgus (~> 1.3)
|
|
14
|
+
public_suffix (~> 4.0)
|
|
15
|
+
rainbow (~> 3.0)
|
|
16
|
+
rubyzip (~> 2.3)
|
|
17
|
+
tiny_segmenter (~> 0.0)
|
|
18
|
+
tty-progressbar (~> 0.18)
|
|
19
|
+
tty-spinner (~> 0.9)
|
|
20
|
+
|
|
21
|
+
GEM
|
|
22
|
+
remote: https://rubygems.org/
|
|
23
|
+
specs:
|
|
24
|
+
addressable (2.7.0)
|
|
25
|
+
public_suffix (>= 2.0.2, < 5.0)
|
|
26
|
+
attr_bool (0.2.2)
|
|
27
|
+
bimyou_segmenter (1.2.0)
|
|
28
|
+
cri (2.15.11)
|
|
29
|
+
domain_name (0.5.20190701)
|
|
30
|
+
unf (>= 0.0.5, < 1.0.0)
|
|
31
|
+
down (5.2.2)
|
|
32
|
+
addressable (~> 2.5)
|
|
33
|
+
highline (2.0.3)
|
|
34
|
+
http-cookie (1.0.4)
|
|
35
|
+
domain_name (~> 0.5)
|
|
36
|
+
japanese_deinflector (0.0.2)
|
|
37
|
+
mini_portile2 (2.5.3)
|
|
38
|
+
minitest (5.14.4)
|
|
39
|
+
nokogiri (1.11.7)
|
|
40
|
+
mini_portile2 (~> 2.5.0)
|
|
41
|
+
racc (~> 1.4)
|
|
42
|
+
psych (4.0.1)
|
|
43
|
+
psychgus (1.3.4)
|
|
44
|
+
psych (>= 3.0)
|
|
45
|
+
public_suffix (4.0.6)
|
|
46
|
+
racc (1.5.2)
|
|
47
|
+
rainbow (3.0.0)
|
|
48
|
+
rake (13.0.3)
|
|
49
|
+
raketeer (0.2.13)
|
|
50
|
+
rake
|
|
51
|
+
rdoc (6.3.1)
|
|
52
|
+
redcarpet (3.5.1)
|
|
53
|
+
rubyzip (2.3.0)
|
|
54
|
+
strings-ansi (0.2.0)
|
|
55
|
+
tiny_segmenter (0.0.6)
|
|
56
|
+
tty-cursor (0.7.1)
|
|
57
|
+
tty-progressbar (0.18.2)
|
|
58
|
+
strings-ansi (~> 0.2)
|
|
59
|
+
tty-cursor (~> 0.7)
|
|
60
|
+
tty-screen (~> 0.8)
|
|
61
|
+
unicode-display_width (>= 1.6, < 3.0)
|
|
62
|
+
tty-screen (0.8.1)
|
|
63
|
+
tty-spinner (0.9.3)
|
|
64
|
+
tty-cursor (~> 0.7)
|
|
65
|
+
unf (0.1.4)
|
|
66
|
+
unf_ext
|
|
67
|
+
unf_ext (0.0.7.7)
|
|
68
|
+
unicode-display_width (2.0.0)
|
|
69
|
+
yard (0.9.26)
|
|
70
|
+
yard_ghurt (1.2.1)
|
|
71
|
+
rake
|
|
72
|
+
yard
|
|
73
|
+
|
|
74
|
+
PLATFORMS
|
|
75
|
+
ruby
|
|
76
|
+
|
|
77
|
+
DEPENDENCIES
|
|
78
|
+
bundler (~> 2.2)
|
|
79
|
+
minitest (~> 5.14)
|
|
80
|
+
nhkore!
|
|
81
|
+
rake (~> 13.0)
|
|
82
|
+
raketeer (~> 0.2)
|
|
83
|
+
rdoc (~> 6.3)
|
|
84
|
+
redcarpet (~> 3.5)
|
|
85
|
+
yard (~> 0.9)
|
|
86
|
+
yard_ghurt (~> 1.2)
|
|
87
|
+
|
|
88
|
+
BUNDLED WITH
|
|
89
|
+
2.2.20
|
data/README.md
CHANGED
|
@@ -26,6 +26,8 @@ This is similar to a [core word/vocabulary list](https://www.fluentin3months.com
|
|
|
26
26
|
- [News Command](#news-command-)
|
|
27
27
|
- [Using the Library](#using-the-library-)
|
|
28
28
|
- [Hacking](#hacking-)
|
|
29
|
+
- [Updating](#updating-)
|
|
30
|
+
- [Releasing](#releasing-)
|
|
29
31
|
- [License](#license-)
|
|
30
32
|
|
|
31
33
|
## For Non-Power Users [^](#contents)
|
|
@@ -433,18 +435,18 @@ require 'nhkore/scraper'
|
|
|
433
435
|
s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
|
|
434
436
|
open_timeout: 300, # Open timeout in seconds (default: nil)
|
|
435
437
|
read_timeout: 300, # Read timeout in seconds (default: nil)
|
|
436
|
-
|
|
438
|
+
|
|
437
439
|
# Maximum number of times to retry the URL
|
|
438
440
|
# - default: 3
|
|
439
441
|
# - Open/connect will fail a couple of times on a bad/slow internet connection.
|
|
440
442
|
max_retries: 10,
|
|
441
|
-
|
|
443
|
+
|
|
442
444
|
# Maximum number of redirects allowed.
|
|
443
445
|
# - default: 3
|
|
444
446
|
# - You can set this to nil or -1, but I recommend using a number
|
|
445
447
|
# for safety (infinite-loop attack).
|
|
446
448
|
max_redirects: 1,
|
|
447
|
-
|
|
449
|
+
|
|
448
450
|
# How to check redirect URLs for safety.
|
|
449
451
|
# - default: :strict
|
|
450
452
|
# - nil => do not check
|
|
@@ -453,7 +455,7 @@ s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
|
|
|
453
455
|
# - :strict => check the scheme and domain
|
|
454
456
|
# (i.e., if https://bing.com, redirect URL must be https://bing.com)
|
|
455
457
|
redirect_rule: :lenient,
|
|
456
|
-
|
|
458
|
+
|
|
457
459
|
# Set the HTTP header field 'cookie' from the 'set-cookie' response.
|
|
458
460
|
# - default: false
|
|
459
461
|
# - Currently uses the 'http-cookie' Gem.
|
|
@@ -461,7 +463,7 @@ s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
|
|
|
461
463
|
# - Necessary for Search Engines or other sites that require cookies
|
|
462
464
|
# in order to block bots.
|
|
463
465
|
eat_cookie: true,
|
|
464
|
-
|
|
466
|
+
|
|
465
467
|
# Set HTTP header fields.
|
|
466
468
|
# - default: nil
|
|
467
469
|
# - Necessary for Search Engines or other sites that try to block bots.
|
|
@@ -524,9 +526,9 @@ doc = ss.html_doc()
|
|
|
524
526
|
|
|
525
527
|
doc.css('a').each() do |anchor|
|
|
526
528
|
link = anchor['href']
|
|
527
|
-
|
|
528
|
-
next if ss.ignore_link?(link)
|
|
529
|
-
|
|
529
|
+
|
|
530
|
+
next if ss.ignore_link?(link,cleaned: false)
|
|
531
|
+
|
|
530
532
|
if link.include?('https://www3.nhk')
|
|
531
533
|
puts link
|
|
532
534
|
end
|
|
@@ -547,9 +549,9 @@ page_num = 1
|
|
|
547
549
|
|
|
548
550
|
while !next_page.empty?()
|
|
549
551
|
puts "Page #{page_num += 1}: #{next_page.count}"
|
|
550
|
-
|
|
552
|
+
|
|
551
553
|
bs = NHKore::BingScraper.new(:yasashii,url: next_page.url)
|
|
552
|
-
|
|
554
|
+
|
|
553
555
|
next_page = bs.scrape(slinks,next_page)
|
|
554
556
|
end
|
|
555
557
|
|
|
@@ -564,27 +566,28 @@ end
|
|
|
564
566
|
|
|
565
567
|
```Ruby
|
|
566
568
|
require 'nhkore/article_scraper'
|
|
569
|
+
require 'time'
|
|
567
570
|
|
|
568
571
|
as = NHKore::ArticleScraper.new(
|
|
569
572
|
'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html',
|
|
570
|
-
|
|
573
|
+
|
|
571
574
|
# If false, scrape the article leniently (for older articles which
|
|
572
575
|
# may not have certain tags, etc.).
|
|
573
576
|
# - default: true
|
|
574
577
|
strict: false,
|
|
575
|
-
|
|
578
|
+
|
|
576
579
|
# {Dict} to use as the dictionary for words (Easy articles).
|
|
577
580
|
# - default: :scrape
|
|
578
581
|
# - nil => don't scrape/use it (necessary for Regular articles)
|
|
579
582
|
# - :scrape => auto-scrape it using {DictScraper}
|
|
580
583
|
# - {Dict} => your own {Dict}
|
|
581
584
|
dict: nil,
|
|
582
|
-
|
|
585
|
+
|
|
583
586
|
# Date time to use as a fallback if the article doesn't have one
|
|
584
587
|
# (for older articles).
|
|
585
588
|
# - default: nil
|
|
586
589
|
datetime: Time.new(2020,2,2),
|
|
587
|
-
|
|
590
|
+
|
|
588
591
|
# Year to use as a fallback if the article doesn't have one
|
|
589
592
|
# (for older articles).
|
|
590
593
|
# - default: nil
|
|
@@ -621,7 +624,7 @@ require 'nhkore/dict_scraper'
|
|
|
621
624
|
url = 'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html'
|
|
622
625
|
ds = NHKore::DictScraper.new(
|
|
623
626
|
url,
|
|
624
|
-
|
|
627
|
+
|
|
625
628
|
# Change the URL appropriately to the dictionary URL.
|
|
626
629
|
# - default: true
|
|
627
630
|
parse_url: true,
|
|
@@ -634,13 +637,13 @@ dict = ds.scrape()
|
|
|
634
637
|
|
|
635
638
|
dict.entries.each() do |key,entry|
|
|
636
639
|
entry.id
|
|
637
|
-
|
|
640
|
+
|
|
638
641
|
entry.defns.each() do |defn|
|
|
639
642
|
defn.hyoukis.each() {|hyouki| }
|
|
640
643
|
defn.text
|
|
641
644
|
defn.words.each() {|word| }
|
|
642
645
|
end
|
|
643
|
-
|
|
646
|
+
|
|
644
647
|
puts entry.build_hyouki()
|
|
645
648
|
puts entry.build_defn()
|
|
646
649
|
puts '---'
|
|
@@ -687,6 +690,7 @@ end
|
|
|
687
690
|
`Sifter` will sift & sort the `News` data into a single file. The data is sorted by frequency in descending order (i.e., most frequent words first).
|
|
688
691
|
|
|
689
692
|
```Ruby
|
|
693
|
+
require 'nhkore/datetime_parser'
|
|
690
694
|
require 'nhkore/news'
|
|
691
695
|
require 'nhkore/sifter'
|
|
692
696
|
require 'time'
|
|
@@ -698,7 +702,8 @@ sifter = NHKore::Sifter.new(news)
|
|
|
698
702
|
sifter.caption = 'Sakura Fields Forever!'
|
|
699
703
|
|
|
700
704
|
# Filter the data.
|
|
701
|
-
|
|
705
|
+
sifter.filter_by_datetime(NHKore::DatetimeParser.parse_range('2019-12-4...7'))
|
|
706
|
+
sifter.filter_by_datetime([Time.new(2019,12,4),Time.new(2019,12,7)])
|
|
702
707
|
sifter.filter_by_datetime(
|
|
703
708
|
from: Time.new(2019,12,4),to: Time.new(2019,12,7)
|
|
704
709
|
)
|
|
@@ -727,13 +732,14 @@ if !File.exist?(file)
|
|
|
727
732
|
end
|
|
728
733
|
```
|
|
729
734
|
|
|
730
|
-
### Util &
|
|
735
|
+
### Util, UserAgents, & DatetimeParser
|
|
731
736
|
|
|
732
737
|
These provide a variety of useful methods/constants.
|
|
733
738
|
|
|
734
739
|
Here are some of the most useful ones:
|
|
735
740
|
|
|
736
741
|
```Ruby
|
|
742
|
+
require 'nhkore/datetime_parser'
|
|
737
743
|
require 'nhkore/user_agents'
|
|
738
744
|
require 'nhkore/util'
|
|
739
745
|
|
|
@@ -759,14 +765,16 @@ puts
|
|
|
759
765
|
puts '========'
|
|
760
766
|
puts '[ Time ]'
|
|
761
767
|
puts '========'
|
|
762
|
-
puts "JST now: #{Util.jst_now}"
|
|
768
|
+
puts "JST now: #{Util.jst_now()}"
|
|
763
769
|
# Drops in JST_OFFSET, does not change hour/min.
|
|
764
770
|
puts "JST time: #{Util.jst_time(Time.now)}"
|
|
765
771
|
puts "JST year: #{Util::JST_YEAR}"
|
|
766
772
|
puts "1999 sane? #{Util.sane_year?(1999)}" # true
|
|
767
773
|
puts "1776 sane? #{Util.sane_year?(1776)}" # false
|
|
768
|
-
puts "Guess 5: #{
|
|
769
|
-
puts "Guess 99: #{
|
|
774
|
+
puts "Guess 5: #{DatetimeParser.guess_year(5)}" # 2005
|
|
775
|
+
puts "Guess 99: #{DatetimeParser.guess_year(99)}" # 1999
|
|
776
|
+
# => [2020-12-01 00:00:00 +0900, 2020-12-31 23:59:59 +0900]
|
|
777
|
+
puts "Parse: #{DatetimeParser.parse_range('2020-12')}"
|
|
770
778
|
puts
|
|
771
779
|
puts "JST timezone offset: #{Util::JST_OFFSET}"
|
|
772
780
|
puts "JST timezone offset hour: #{Util::JST_OFFSET_HOUR}"
|
|
@@ -781,20 +789,20 @@ JPN = ['桜','ぶ','ブ']
|
|
|
781
789
|
|
|
782
790
|
def fmt_jpn()
|
|
783
791
|
fmt = []
|
|
784
|
-
|
|
792
|
+
|
|
785
793
|
JPN.each() do |x|
|
|
786
794
|
x = yield(x)
|
|
787
795
|
x = x ? "\u2B55" : Util::JPN_SPACE unless x.is_a?(String)
|
|
788
796
|
fmt << x
|
|
789
797
|
end
|
|
790
|
-
|
|
798
|
+
|
|
791
799
|
return "[ #{fmt.join(' | ')} ]"
|
|
792
800
|
end
|
|
793
801
|
|
|
794
802
|
puts " #{fmt_jpn{|x| x}}"
|
|
795
|
-
puts "Hiragana? #{fmt_jpn{|x|
|
|
796
|
-
puts "Kana? #{fmt_jpn{|x|
|
|
797
|
-
puts "Kanji? #{fmt_jpn{|x|
|
|
803
|
+
puts "Hiragana? #{fmt_jpn{|x| Util.hiragana?(x)}}"
|
|
804
|
+
puts "Kana? #{fmt_jpn{|x| Util.kana?(x)}}"
|
|
805
|
+
puts "Kanji? #{fmt_jpn{|x| Util.kanji?(x)}}"
|
|
798
806
|
puts "Reduce: #{Util.reduce_jpn_space("' '")}"
|
|
799
807
|
puts
|
|
800
808
|
|
|
@@ -842,16 +850,36 @@ You can make some changes/fixes to the code and then install your local version:
|
|
|
842
850
|
|
|
843
851
|
`$ bundle exec rake install:local`
|
|
844
852
|
|
|
845
|
-
###
|
|
853
|
+
### Updating [^](#contents)
|
|
854
|
+
|
|
855
|
+
This will update *core/* for you:
|
|
856
|
+
|
|
857
|
+
`$ bundle exec rake update_core`
|
|
858
|
+
|
|
859
|
+
### Releasing [^](#contents)
|
|
860
|
+
|
|
861
|
+
1. Update *CHANGELOG.md*, *version.rb*, & *Gemfile.lock*
|
|
862
|
+
- *Raketary*: `$ raketary bump -v`
|
|
863
|
+
- Run: `$ bundle update`
|
|
864
|
+
2. Run: `$ bundle exec rake update_core`
|
|
865
|
+
3. Run: `$ bundle exec rake clobber pkg_core`
|
|
866
|
+
4. Create a new release & tag
|
|
867
|
+
- Add `pkg/nhkore-core.zip`
|
|
868
|
+
5. Run: `$ git pull`
|
|
869
|
+
6. Upload GitHub package
|
|
870
|
+
- *Raketary*: `$ raketary github_pkg`
|
|
871
|
+
7. Run: `$ bundle exec rake release`
|
|
872
|
+
|
|
873
|
+
Releasing new HTML file for website:
|
|
846
874
|
|
|
847
|
-
`$ bundle exec rake
|
|
875
|
+
1. `$ bundle exec rake update_showcase`
|
|
848
876
|
|
|
849
877
|
## License [^](#contents)
|
|
850
878
|
|
|
851
879
|
[GNU LGPL v3+](LICENSE.txt)
|
|
852
880
|
|
|
853
881
|
> NHKore (<https://github.com/esotericpig/nhkore>)
|
|
854
|
-
> Copyright (c) 2020 Jonathan Bradley Whited
|
|
882
|
+
> Copyright (c) 2020-2021 Jonathan Bradley Whited
|
|
855
883
|
>
|
|
856
884
|
> NHKore is free software: you can redistribute it and/or modify
|
|
857
885
|
> it under the terms of the GNU Lesser General Public License as published by
|