nhkore 0.3.1 → 0.3.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: fb2c0e6e53995b874a9e53c44b024f993032433d1a87c37e7b7bdea69965902d
- data.tar.gz: 13d34c53fe9af9efa985c05089b1588eb1e76d6321f9aff18cc5da80598a52d4
+ metadata.gz: 445adf6e8abd4da9fd6dd25e9632d5f477b467f6ce8c3dcecae87e3f61305d98
+ data.tar.gz: ca812639ff1edd8da835f5bbb2cde403c9cb63e17568fb3ec367eec00605ec17
  SHA512:
- metadata.gz: 643723d42e939a7852eca3b90c3ec4e65085838317eb59c1d8f21f79dd647d2e77e5ea68ab2ff3b5a208608f9bf350121a9918cb318dec6c3047731b73f59294
- data.tar.gz: 3481fea3a3895a5b85ac3fcd5a77fe9b811f84e9a19b395a1de1d2e9b31fda93c5fb49a8d7d43581e05cb90c6f844f8537c5a97d73937c2b8ee97728ac7c7a1f
+ metadata.gz: 392607205c53aa2a5dfcde244e5fa6137483d216dc27becf06c76798209d2dcf328f17abee2026d795207d4e783a23fd108e615525445f52ca6442560600cd42
+ data.tar.gz: 7a1219623b6645bbc633ba9c94e767dcf86be8852a7228c1d5ddd3936f61b884897f680369d4c9d9db5aba8ab4561048d59aed15cecf7ba05695c1957f31b0ea
data/CHANGELOG.md CHANGED
@@ -2,7 +2,82 @@

  Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.1...master)
+ ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.6...master)
+
+ ## [v0.3.6] - 2020-08-18
+
+ ### Added
+ - `update_showcase` Rake task for development & personal site (GitHub Page)
+   - `$ bundle exec rake update_showcase`
+
+ ### Changed
+ - Updated Gems
+
+ ### Fixed
+ - ArticleScraper title parsing for a specific article:
+   - https://www3.nhk.or.jp/news/easy/article/disaster_earthquake_illust.html
+ - Ignored `/cgi2.*enqform/` URLs from SearchScraper (Bing)
+ - Added more detail to dictionary error in ArticleScraper
+
+ ## [v0.3.5] - 2020-05-04
+
+ ### Added
+ - Added check for environment var `NO_COLOR`
+   - [https://no-color.org/](https://no-color.org/)
+
+ ### Fixed
+ - Fixed URLs stored in YAML data to always be of type String (not URI)
+   - This initially caused a problem in DictScraper.parse_url() from ArticleScraper, but it has now been fixed for all data
+
+ ## [v0.3.4] - 2020-04-25
+
+ ### Added
+ - DatetimeParser
+   - Extracted from SiftCmd into its own class
+   - Fixed some minor logic bugs from the old code
+   - Added new feature where 1 range can be empty:
+     - `sift ez -d '...2019'` (from = 1924)
+     - `sift ez -d '2019...'` (to = current year)
+     - `sift ez -d '...'` (still an error)
+ - Added `update_core` rake task for dev
+   - Makes pushing a new release much easier
+   - See *Hacking.Releasing* section in *README*
+
+ ### Fixed
+ - SiftCmd `parse_sift_datetime()` for `-d/--datetime` option
+   - Didn't work exactly right (as written in *README*) for some special inputs:
+     - `-d '2019...3'`
+     - `-d '3-3'`
+     - `-d '3'`
+
+ ## [v0.3.3] - 2020-04-23
+
+ ### Added
+ - Added JSON support to Sifter & SiftCmd.
+ - Added use of `attr_bool` Gem for `attr_accessor?` & `attr_reader?`.
+
+ ## [v0.3.2] - 2020-04-22
+
+ ### Added
+ - lib/nhkore/lib.rb
+   - Requires all files, excluding CLI-related files for speed when using this Gem as a library.
+ - Scraper
+   - Added open_file() & reopen().
+ - samples/looper.rb
+   - Script example of continuously scraping all articles.
+
+ ### Changed
+ - README
+   - Finished writing the initial version of all sections.
+ - ArticleScraper
+   - Changed the `year` param to expect an int, instead of a string.
+ - Sifter
+   - In filter_by_datetime(), renamed keyword args `from_filter,to_filter` to simply `from,to`.
+
+ ### Fixed
+ - Reduced load time of app a tiny bit more (see v0.3.1 for details).
+ - ArticleScraper
+   - Renamed `mode` param to `strict`. `mode` was shadowing File.open()'s in Scraper.

  ## [v0.3.1] - 2020-04-20

@@ -11,10 +86,13 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - NewsCmd/SiftCmd
    - Added `--no-sha256` option to not check if article links have already been scraped based on their contents' SHA-256.
  - Util
-   - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows & Linux, it will check for both `/` & `\`.
+   - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows & Linux, it will check for both `/` & `\`.

  ### Fixed
- - Reduced load time of app from ~1s to ~0.3-5s by moving some requires into methods.
+ - Reduced load time of app from about 1s to about 0.3-0.5s.
+   - Moved many `require '...'` statements into methods.
+   - It looks ugly & is not good coding practice, but a necessary evil.
+   - Load time is still pretty slow (but a lot better!).
  - BingScraper
    - Fixed possible RSS infinite loop.

data/README.md CHANGED
@@ -26,6 +26,8 @@ This is similar to a [core word/vocabulary list](https://www.fluentin3months.com
  - [News Command](#news-command-)
  - [Using the Library](#using-the-library-)
  - [Hacking](#hacking-)
+ - [Updating](#updating-)
+ - [Releasing](#releasing-)
  - [License](#license-)

  ## For Non-Power Users [^](#contents)
@@ -110,11 +112,12 @@ Example usage:

  `$ nhkore -t 300 -m 10 news -D -L -M -d '2011-03-07 06:30' easy -u 'https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html'`

- Now that the data from the article has been scraped, you can generate a CSV/HTML/YAML file of the words ordered by frequency:
+ Now that the data from the article has been scraped, you can generate a CSV/HTML/JSON/YAML file of the words ordered by frequency:

  ```
  $ nhkore sift easy -e csv
  $ nhkore sift easy -e html
+ $ nhkore sift easy -e json
  $ nhkore sift easy -e yml
  ```

@@ -154,11 +157,11 @@ After obtaining the scraped data, you can `sift` all of the data (or select data
  | --- | --- |
  | CSV | For uploading to a flashcard website (e.g., Memrise, Anki, Buffl) after changing the data appropriately. |
  | HTML | For comfortable viewing in a web browser or for sharing. |
- | YAML | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
+ | YAML/JSON | For developers to automatically add translations or to manipulate the data in some other way programmatically. |

  The data is sorted by frequency in descending order (i.e., most frequent words first).

- If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/HTML file.
+ If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/JSON/HTML file.

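If you go the programmatic route, here is a minimal sketch (not part of the README diff) that re-sorts a sifted CSV file with Ruby's standard `csv` library; the file name and column names below are assumptions, so check the header row of your own output first:

```Ruby
# Hedged sketch: re-sort a sifted CSV by kana instead of frequency.
# 'sift.csv' and the 'kana' column name are placeholders/assumptions.
require 'csv'

rows = CSV.read('sift.csv', headers: true)
sorted = rows.sort_by { |row| row['kana'].to_s }

CSV.open('sift_by_kana.csv', 'w') do |csv|
  csv << rows.headers
  sorted.each { |row| csv << row }
end
```
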
  The defaults will sift all of the data into a CSV file, which may not be what you want:

@@ -203,7 +206,7 @@ You can save the data to a different format using one of these options:

  ```
  -e --ext=<value> type of file (extension) to save;
- valid options: [csv, htm, html, yaml, yml];
+ valid options: [csv, htm, html, json, yaml, yml];
  not needed if you specify a file extension with
  the '--out' option: '--out sift.html'
  (default: csv)
@@ -293,7 +296,7 @@ links:

  If you don't wish to edit this file by hand (or programmatically), that's where the `search` command comes into play.

- Currently, it only searches &amp; scrapes `bing.com`, but other search engines and/or methods can easily be added in the future.
+ Currently, it only searches & scrapes `bing.com`, but other search engines and/or methods can easily be added in the future.

  Example usage:

@@ -319,6 +322,49 @@ Complete demo:

  #### News Command [^](#contents)

+ In [The Basics](#the-basics-), you learned how to scrape 1 article using the `-u/--url` option with the `news` command.
+
+ After creating a file of links from the [search](#search-command-) command (or manually/programmatically), you can also scrape multiple articles from this file using the `news` command.
+
+ The defaults will scrape the 1st unscraped article from the `links` file:
+
+ `$ nhkore news easy`
+
+ You can scrape the 1st **X** unscraped articles with the `-s/--scrape` option:
+
+ ```
+ # Scrape the 1st 11 unscraped articles.
+ $ nhkore news -s 11 easy
+ ```
+
+ You may wish to re-scrape articles that have already been scraped with the `-r/--redo` option:
+
+ `$ nhkore news -r -s 11 easy`
+
+ If you only wish to scrape specific article links, then you should use the `-k/--like` option, which does a fuzzy search on the URLs. For example, `--like '00123'` will match these links:
+
+ - http<span>s://w</span>ww3.nhk.or.jp/news/easy/k1**00123**23711000/k10012323711000.html
+ - http<span>s://w</span>ww3.nhk.or.jp/news/easy/k1**00123**21401000/k10012321401000.html
+ - http<span>s://w</span>ww3.nhk.or.jp/news/easy/k1**00123**21511000/k10012321511000.html
+ - ...
+
+ `$ nhkore news -k '00123' -s 11 easy`
+
+ Lastly, you can show the dictionary URL and contents for the 1st article if you're getting dictionary-related errors:
+
+ ```
+ # This will exit after showing the 1st article's dictionary.
+ $ nhkore news easy --show-dict
+ ```
+
+ For the rest of the options, please see [The Basics](#the-basics-).
+
+ Complete demo:
+
+ [![asciinema Demo - News](https://asciinema.org/a/322324.png)](https://asciinema.org/a/322324)
+
+ When I first scraped all of the articles in [nhkore-core.zip](https://github.com/esotericpig/nhkore/releases/latest), I had to use this [script](samples/looper.rb) because my internet isn't very good.
+
  ## Using the Library [^](#contents)

  ### Setup
@@ -336,11 +382,439 @@ In your *Gemfile*:
  ```Ruby
  # Pick one...
  gem 'nhkore', '~> X.X'
- gem 'nhkore', :git => 'https://github.com/esotericpig/psychgus.git', :tag => 'vX.X.X'
+ gem 'nhkore', :git => 'https://github.com/esotericpig/nhkore.git', :tag => 'vX.X.X'
+ ```
+
+ ### Require
+
+ In order to not require all of the CLI-related files, require this file instead:
+
+ ```Ruby
+ require 'nhkore/lib'
+
+ #require 'nhkore' # Slower
  ```

  ### Scraper

+ All scraper classes extend this class. You can either extend it or use it by itself. It's a simple wrapper around *open-uri*, *Nokogiri*, etc.
+
+ `initialize` automatically opens (connects to) the URL.
+
+ ```Ruby
+ require 'nhkore/scraper'
+
+ class MyScraper < NHKore::Scraper
+   def initialize()
+     super('https://www3.nhk.or.jp/news/easy/')
+   end
+ end
+
+ m = MyScraper.new()
+ s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/')
+
+ # Read all content into a String.
+ mstr = m.read()
+ sstr = s.read()
+
+ # Get a Nokogiri::HTML object.
+ mdoc = m.html_doc()
+ sdoc = s.html_doc()
+
+ # Get a RSS object.
+ s = NHKore::Scraper.new('https://www.bing.com/search?format=rss&q=site%3Anhk.or.jp%2Fnews%2Feasy%2F&count=100')
+
+ rss = s.rss_doc()
+ ```
+
+ There are several useful options:
+
+ ```Ruby
+ require 'nhkore/scraper'
+
+ s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
+   open_timeout: 300, # Open timeout in seconds (default: nil)
+   read_timeout: 300, # Read timeout in seconds (default: nil)
+
+   # Maximum number of times to retry the URL
+   # - default: 3
+   # - Open/connect will fail a couple of times on a bad/slow internet connection.
+   max_retries: 10,
+
+   # Maximum number of redirects allowed.
+   # - default: 3
+   # - You can set this to nil or -1, but I recommend using a number
+   #   for safety (infinite-loop attack).
+   max_redirects: 1,
+
+   # How to check redirect URLs for safety.
+   # - default: :strict
+   # - nil      => do not check
+   # - :lenient => check the scheme only
+   #               (i.e., if https, redirect URL must be https)
+   # - :strict  => check the scheme and domain
+   #               (i.e., if https://bing.com, redirect URL must be https://bing.com)
+   redirect_rule: :lenient,
+
+   # Set the HTTP header field 'cookie' from the 'set-cookie' response.
+   # - default: false
+   # - Currently uses the 'http-cookie' Gem.
+   # - This is currently a time-consuming operation because it opens the URL twice.
+   # - Necessary for Search Engines or other sites that require cookies
+   #   in order to block bots.
+   eat_cookie: true,
+
+   # Set HTTP header fields.
+   # - default: nil
+   # - Necessary for Search Engines or other sites that try to block bots.
+   # - Simply pass in a Hash (not nil) to set the default ones.
+   header: {'user-agent' => 'Skynet'}, # Must use strings
+ )
+
+ # Open the URL yourself. This will be passed in directly to Nokogiri::HTML().
+ # - In this way, you can use Faraday, HTTParty, RestClient, httprb/http, or
+ #   some other Gem.
+ s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/',
+   str_or_io: URI.open('https://www3.nhk.or.jp/news/easy/',redirect: false)
+ )
+
+ # Open and parse a file instead of a URL (for offline testing or slow internet).
+ s = NHKore::Scraper.new('./my_article.html',is_file: true)
+
+ doc = s.html_doc()
+ ```
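As a concrete, hedged illustration of the `str_or_io` option above: you could fetch the page with a different HTTP client and hand the raw HTML to the scraper, since (per the comment above) the value is passed directly to Nokogiri::HTML(). This sketch uses `Net::HTTP` from Ruby's standard library and assumes any client that returns the body as a String will do:

```Ruby
require 'net/http'
require 'nhkore/scraper'

# Fetch the HTML yourself (Net::HTTP here, but Faraday/HTTParty/etc. would work the
# same way), then let the Scraper parse it instead of opening the URL itself.
url = 'https://www3.nhk.or.jp/news/easy/'
html = Net::HTTP.get(URI(url))

s = NHKore::Scraper.new(url, str_or_io: html)
doc = s.html_doc()
```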
+
+ Here are some other useful methods:
+
+ ```Ruby
+ require 'nhkore/scraper'
+
+ s = NHKore::Scraper.new('https://www3.nhk.or.jp/news/easy/')
+
+ s.reopen() # Re-open the current URL.
+
+ # Get a relative URL.
+ url = s.join_url('../../monkey.html')
+ puts url # https://www3.nhk.or.jp/monkey.html
+
+ # Open a new URL or file.
+ s.open(url)
+ s.open(url,URI.open(url,redirect: false))
+
+ s.open('./my_article.html',is_file: true)
+
+ # Open a file manually.
+ s.open_file('./my_article.html')
+
+ # Fetch the cookie & open a new URL manually.
+ s.fetch_cookie(url)
+ s.open_url(url)
+ ```
+
+ ### SearchScraper & BingScraper
+
+ `SearchScraper` is used for scraping Search Engines for NHK News Web (Easy) links. It can also be used for search in general.
+
+ By default, it sets the default HTTP header fields and fetches & sets the cookie.
+
+ ```Ruby
+ require 'nhkore/search_scraper'
+
+ ss = NHKore::SearchScraper.new('https://www.bing.com/search?q=nhk&count=100')
+
+ doc = ss.html_doc()
+
+ doc.css('a').each() do |anchor|
+   link = anchor['href']
+
+   next if ss.ignore_link?(link,cleaned: false)
+
+   if link.include?('https://www3.nhk')
+     puts link
+   end
+ end
+ ```
+
+ `BingScraper` will search `bing.com` for you.
+
+ ```Ruby
+ require 'nhkore/search_link'
+ require 'nhkore/search_scraper'
+
+ bs = NHKore::BingScraper.new(:yasashii)
+ slinks = NHKore::SearchLinks.new()
+
+ next_page = bs.scrape(slinks)
+ page_num = 1
+
+ while !next_page.empty?()
+   puts "Page #{page_num += 1}: #{next_page.count}"
+
+   bs = NHKore::BingScraper.new(:yasashii,url: next_page.url)
+
+   next_page = bs.scrape(slinks,next_page)
+ end
+
+ slinks.links.values.each() do |link|
+   puts link.url
+ end
+ ```
+
+ ### ArticleScraper & DictScraper
+
+ `ArticleScraper` scrapes an NHK News Web Easy article. Regular articles aren't currently supported.
+
+ ```Ruby
+ require 'nhkore/article_scraper'
+ require 'time'
+
+ as = NHKore::ArticleScraper.new(
+   'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html',
+
+   # If false, scrape the article leniently (for older articles which
+   # may not have certain tags, etc.).
+   # - default: true
+   strict: false,
+
+   # {Dict} to use as the dictionary for words (Easy articles).
+   # - default: :scrape
+   # - nil     => don't scrape/use it (necessary for Regular articles)
+   # - :scrape => auto-scrape it using {DictScraper}
+   # - {Dict}  => your own {Dict}
+   dict: nil,
+
+   # Date time to use as a fallback if the article doesn't have one
+   # (for older articles).
+   # - default: nil
+   datetime: Time.new(2020,2,2),
+
+   # Year to use as a fallback if the article doesn't have one
+   # (for older articles).
+   # - default: nil
+   year: 2020,
+ )
+
+ article = as.scrape()
+
+ article.datetime
+ article.futsuurl
+ article.sha256
+ article.title
+ article.url
+
+ article.words.each() do |key,word|
+   word.defn
+   word.eng
+   word.freq
+   word.kana
+   word.kanji
+   word.key
+ end
+
+ puts article.to_s(mini: true)
+ puts '---'
+ puts article
+ ```
+
+ `DictScraper` scrapes an Easy article's dictionary file (JSON).
+
+ ```Ruby
+ require 'nhkore/dict_scraper'
+
+ url = 'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html'
+ ds = NHKore::DictScraper.new(
+   url,
+
+   # Change the URL appropriately to the dictionary URL.
+   # - default: true
+   parse_url: true,
+ )
+
+ puts NHKore::DictScraper.parse_url(url)
+ puts
+
+ dict = ds.scrape()
+
+ dict.entries.each() do |key,entry|
+   entry.id
+
+   entry.defns.each() do |defn|
+     defn.hyoukis.each() {|hyouki| }
+     defn.text
+     defn.words.each() {|word| }
+   end
+
+   puts entry.build_hyouki()
+   puts entry.build_defn()
+   puts '---'
+ end
+
+ puts
+ puts dict
+ ```
+
+ ### Fileable
+
+ Any class that includes the `Fileable` mixin will have the following methods:
+
+ - Class.load_file(file,mode: 'rt:BOM|UTF-8',**kargs)
+ - save_file(file,mode: 'wt',**kargs)
+
+ Any *kargs* will be passed to `File.open()`.
+
+ ```Ruby
+ require 'nhkore/news'
+ require 'nhkore/search_link'
+
+ yn = NHKore::YasashiiNews.load_file()
+ sl = NHKore::SearchLinks.load_file(NHKore::SearchLinks::DEFAULT_YASASHII_FILE)
+
+ yn.articles.each() {|key,article| }
+ yn.sha256s.each() {|sha256,url| }
+
+ sl.links.each() do |key,link|
+   link.datetime
+   link.futsuurl
+   link.scraped?
+   link.sha256
+   link.title
+   link.url
+ end
+
+ #yn.save_file()
+ #sl.save_file(NHKore::SearchLinks::DEFAULT_YASASHII_FILE)
+ ```
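Because the extra keyword args are forwarded to `File.open()`, you can also tweak the mode or encoding when loading/saving. A small sketch under that assumption (the output file name here is just a placeholder):

```Ruby
require 'nhkore/search_link'

# Assumption: mode (and any other kargs) go straight through to File.open().
sl = NHKore::SearchLinks.load_file(
  NHKore::SearchLinks::DEFAULT_YASASHII_FILE, mode: 'rt:BOM|UTF-8'
)

sl.save_file('links_backup.yml', mode: 'wt:UTF-8')
```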
+
+ ### Sifter
+
+ `Sifter` will sift & sort the `News` data into a single file. The data is sorted by frequency in descending order (i.e., most frequent words first).
+
+ ```Ruby
+ require 'nhkore/datetime_parser'
+ require 'nhkore/news'
+ require 'nhkore/sifter'
+ require 'time'
+
+ news = NHKore::YasashiiNews.load_file()
+
+ sifter = NHKore::Sifter.new(news)
+
+ sifter.caption = 'Sakura Fields Forever!'
+
+ # Filter the data.
+ sifter.filter_by_datetime(NHKore::DatetimeParser.parse_range('2019-12-4...7'))
+ sifter.filter_by_datetime([Time.new(2019,12,4),Time.new(2019,12,7)])
+ sifter.filter_by_datetime(
+   from: Time.new(2019,12,4),to: Time.new(2019,12,7)
+ )
+ sifter.filter_by_title('桜')
+ sifter.filter_by_url('k100')
+
+ # Ignore certain columns from the output.
+ sifter.ignore(:defn)
+ sifter.ignore(:eng)
+
+ # An array of the sifted words.
+ words = sifter.sift() # Filtered & Sorted array of Word
+ rows = sifter.build_rows(words) # Ignored array of array
+
+ # Choose the file format.
+ #sifter.put_csv!()
+ #sifter.put_html!()
+ #sifter.put_json!()
+ sifter.put_yaml!()
+
+ # Save to a file.
+ file = 'sakura.yml'
+
+ if !File.exist?(file)
+   sifter.save_file(file)
+ end
+ ```
+
+ ### Util, UserAgents, & DatetimeParser
+
+ These provide a variety of useful methods/constants.
+
+ Here are some of the most useful ones:
+
+ ```Ruby
+ require 'nhkore/datetime_parser'
+ require 'nhkore/user_agents'
+ require 'nhkore/util'
+
+ include NHKore
+
+ puts '======='
+ puts '[ Net ]'
+ puts '======='
+ # Get a random User Agent for HTTP header field 'User-Agent'.
+ # - This is used by default in Scraper/SearchScraper.
+ puts "User-Agent: #{UserAgents.sample()}"
+
+ uri = URI('https://www.bing.com/search?q=nhk')
+ Util.replace_uri_query!(uri,q: 'banana')
+
+ puts "URI query: #{uri}" # https://www.bing.com/search?q=banana
+ # nhk.or.jp
+ puts "Domain: #{Util.domain(URI('https://www.nhk.or.jp/news/easy').host)}"
+ # Ben &amp; Jerry&#39;s<br>
+ puts "Escape HTML: #{Util.escape_html("Ben & Jerry's\n")}"
+ puts
+
+ puts '========'
+ puts '[ Time ]'
+ puts '========'
+ puts "JST now: #{Util.jst_now()}"
+ # Drops in JST_OFFSET, does not change hour/min.
+ puts "JST time: #{Util.jst_time(Time.now)}"
+ puts "JST year: #{Util::JST_YEAR}"
+ puts "1999 sane? #{Util.sane_year?(1999)}" # true
+ puts "1776 sane? #{Util.sane_year?(1776)}" # false
+ puts "Guess 5: #{DatetimeParser.guess_year(5)}" # 2005
+ puts "Guess 99: #{DatetimeParser.guess_year(99)}" # 1999
+ # => [2020-12-01 00:00:00 +0900, 2020-12-31 23:59:59 +0900]
+ puts "Parse: #{DatetimeParser.parse_range('2020-12')}"
+ puts
+ puts "JST timezone offset: #{Util::JST_OFFSET}"
+ puts "JST timezone offset hour: #{Util::JST_OFFSET_HOUR}"
+ puts "JST timezone offset minute: #{Util::JST_OFFSET_MIN}"
+ puts
+
+ puts '============'
+ puts '[ Japanese ]'
+ puts '============'
+
+ JPN = ['桜','ぶ','ブ']
+
+ def fmt_jpn()
+   fmt = []
+
+   JPN.each() do |x|
+     x = yield(x)
+     x = x ? "\u2B55" : Util::JPN_SPACE unless x.is_a?(String)
+     fmt << x
+   end
+
+   return "[ #{fmt.join(' | ')} ]"
+ end
+
+ puts " #{fmt_jpn{|x| x}}"
+ puts "Hiragana? #{fmt_jpn{|x| Util.hiragana?(x)}}"
+ puts "Kana? #{fmt_jpn{|x| Util.kana?(x)}}"
+ puts "Kanji? #{fmt_jpn{|x| Util.kanji?(x)}}"
+ puts "Reduce: #{Util.reduce_jpn_space("' '")}"
+ puts
+
+ puts '========='
+ puts '[ Files ]'
+ puts '========='
+ puts "Dir str? #{Util.dir_str?('dir/')}" # true
+ puts "Dir str? #{Util.dir_str?('dir')}" # false
+ puts "File str? #{Util.filename_str?('file')}" # true
+ puts "File str? #{Util.filename_str?('dir/file')}" # false
+ ```
+

 
  ```
@@ -370,13 +844,35 @@ $ bundle exec rake nokogiri_other # macOS, Windows, etc.

  `$ bundle exec rake doc`

- ### Installing Locally (without Network Access)
+ ### Installing Locally
+
+ You can make some changes/fixes to the code and then install your local version:

  `$ bundle exec rake install:local`

- ### Releasing/Publishing
+ ### Updating [^](#contents)
+
+ This will update *core/* for you:
+
+ `$ bundle exec rake update_core`
+
+ ### Releasing [^](#contents)
+
+ 1. Update *CHANGELOG.md*, *version.rb*, & *Gemfile.lock*
+    - *Raketary*: `$ raketary bump -v`
+    - Run: `$ bundle update`
+ 2. Run: `$ bundle exec rake update_core`
+ 3. Run: `$ bundle exec rake clobber pkg_core`
+ 4. Create a new release & tag
+    - Add `pkg/nhkore-core.zip`
+ 5. Run: `$ git pull`
+ 6. Upload GitHub package
+    - *Raketary*: `$ raketary github_pkg`
+ 7. Run: `$ bundle exec rake release`
+
+ Releasing new HTML file for website:

- `$ bundle exec rake release`
+ 1. `$ bundle exec rake update_showcase`

  ## License [^](#contents)