nhkore 0.3.2 → 0.3.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cf151c3859812632f09b1a464164f31bb0ce050f37ed7e7377f76265571ebd41
- data.tar.gz: 1f3ee801e7557731cae4aeacd3f18fea4d7f33ac65b6ec77511a7d3d8f17856a
+ metadata.gz: 2b9464ae2a62f0c9cc797f2f70028d2b7afd6f0677a431c54e5453690175ca29
+ data.tar.gz: 577987179a9001926629f1efd8e8d39fb61ad62543d24ea1caa4f1fe063fd1a4
  SHA512:
- metadata.gz: 7e7d0d5b805ad6fa4312e8be26f3115dff18665b3762073c56db3a7a6a343a3ee6a05e47889e0abf7b62df3bb84cf5c977fce3efdfeb8a65c7bcff8167839d35
- data.tar.gz: 957bc3da8492310d287a8947b9080f8be417f0874c3226db4f0bb63d020bee06c51a3da81c1fa3f779de22d354a32ab4cf41fc6f3018840774c31fd7060fbec3
+ metadata.gz: '09d90011d4d641ea54c9dd7ebc8fd95efc8f7e68211e4322c1f3294e15a21303147de1eea2532694d5a01caaaf3c73f9a5172479193113be86b5b7a9fd08b910'
+ data.tar.gz: 65512547e6ee13503b345402e2eb1ba799e492131975f518cc96576a684bc97d48efb0f9c22787518f8dc62998e44fba1089ca2a1687b98e0533a489091e61c1
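The SHA256 entries in checksums.yaml can be compared against an unpacked archive. The following is a minimal sketch, assuming `data.tar.gz` has already been extracted from the `.gem` file onto disk; the helper name `sha256_matches?` is hypothetical and is not part of RubyGems or nhkore.

```ruby
require 'digest'

# Hypothetical helper (not a RubyGems API): returns true when the file at
# +path+ has the given SHA256 hex digest. Comparison ignores letter case.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# Usage (assumed file location; pass the `+ data.tar.gz:` SHA256 value
# listed in checksums.yaml as the second argument):
#   sha256_matches?('data.tar.gz', expected_hex_from_checksums_yaml)
```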
data/.yardopts ADDED
@@ -0,0 +1,3 @@
+ --files 'CHANGELOG.md,LICENSE.txt'
+ --protected
+ --readme 'README.md'
data/CHANGELOG.md CHANGED
@@ -2,7 +2,75 @@
 
  Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
- ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.2...master)
+ ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.7...HEAD)
+
+ ## [v0.3.7] - 2020-11-07
+
+ ### Changed
+ - Updated Gem `attr_bool` to v0.2
+ - Changed upper-case *'-V'* flag for *version* to be a lower-case *'-v'*
+ - Seems like a lot of apps/people expect this
+ - Refactored/Formatted some code
+ - *nhkore.gemspec* especially
+ - Added *samples/*, *Gemfile.lock*, and *.yardopts* to the files in *nhkore.gemspec*
+
+ ### Fixed
+ - ArticleScraper
+ - Fixed to accept text nodes that have Kanji, due to bad article:
+ - https://www3.nhk.or.jp/news/easy/k10012639271000/k10012639271000.html
+ - `第3のビール` should have HTML ruby tags around *第*
+
+ ## [v0.3.6] - 2020-08-18
+
+ ### Added
+ - `update_showcase` Rake task for development & personal site (GitHub Page)
+ - `$ bundle exec rake update_showcase`
+
+ ### Changed
+ - Updated Gems
+
+ ### Fixed
+ - ArticleScraper for title for specific site
+ - https://www3.nhk.or.jp/news/easy/article/disaster_earthquake_illust.html
+ - Ignored `/cgi2.*enqform/` URLs from SearchScraper (Bing)
+ - Added more detail to dictionary error in ArticleScraper
+
+ ## [v0.3.5] - 2020-05-04
+
+ ### Added
+ - Added check for environment var `NO_COLOR`
+ - [https://no-color.org/](https://no-color.org/)
+
+ ### Fixed
+ - Fixed URLs stored in YAML data to always be of type String (not URI)
+ - This initially caused a problem in DictScraper.parse_url() from ArticleScraper, but fixed it for all data
+
+ ## [v0.3.4] - 2020-04-25
+
+ ### Added
+ - DatetimeParser
+ - Extracted from SiftCmd into its own class
+ - Fixed some minor logic bugs from the old code
+ - Added new feature where 1 range can be empty:
+ - `sift ez -d '...2019'` (from = 1924)
+ - `sift ez -d '2019...'` (to = current year)
+ - `sift ez -d '...'` (still an error)
+ - Added `update_core` rake task for dev
+ - Makes pushing a new release much easier
+ - See *Hacking.Releasing* section in *README*
+
+ ### Fixed
+ - SiftCmd `parse_sift_datetime()` for `-d/--datetime` option
+ - Didn't work exactly right (as written in *README*) for some special inputs:
+ - `-d '2019...3'`
+ - `-d '3-3'`
+ - `-d '3'`
+
+ ## [v0.3.3] - 2020-04-23
+
+ ### Added
+ - Added JSON support to Sifter & SiftCmd.
+ - Added use of `attr_bool` Gem for `attr_accessor?` & `attr_reader?`.
 
  ## [v0.3.2] - 2020-04-22
 
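The v0.3.4 entry describes ranges where one side may be left empty. The rule can be sketched for years only, as below. This is an illustration and not the gem's `DatetimeParser`, which handles full datetimes; `parse_year_range` and its `min_year`/`now_year` parameters are hypothetical names, with 1924 taken from the changelog's "from = 1924".

```ruby
# Hypothetical, year-only illustration of the "one side may be empty" rule.
# Not NHKore's DatetimeParser; inputs without '...' are out of scope here.
def parse_year_range(str, min_year: 1924, now_year: Time.now.year)
  raise ArgumentError, "not a range: #{str.inspect}" unless str.include?('...')

  # A limit of 2 keeps a trailing empty string, so '2019...' => ['2019', ''].
  from, to = str.split('...', 2).map(&:strip)
  from = nil if from.empty?
  to = nil if to.empty?

  # Both sides empty is still an error, as in the changelog.
  raise ArgumentError, "empty range: #{str.inspect}" if from.nil? && to.nil?

  [from ? Integer(from, 10) : min_year, to ? Integer(to, 10) : now_year]
end
```

An empty left side falls back to `min_year` and an empty right side to `now_year`, while `'...'` alone raises.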
@@ -23,10 +91,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - In filter_by_datetime(), renamed keyword args `from_filter,to_filter` to simply `from,to`.
 
  ### Fixed
- - Reduced load time of app from ~1s to 0.3~0.5s.
- - Moved many `require '...'` statements into methods.
- - It looks ugly & is not a good coding practice, but a necessary evil.
- - Load time is still pretty slow (but a lot better!).
+ - Reduced load time of app a tiny bit more (see v0.3.1 for details).
  - ArticleScraper
  - Renamed `mode` param to `strict`. `mode` was overshadowing File.open()'s in Scraper.
 
@@ -40,7 +105,10 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows & Linux, it will check for both `/` & `\`.
 
  ### Fixed
- - Reduced load time of app from ~1s to ~0.3-5s by moving some requires into methods.
+ - Reduced load time of app from about 1s to about 0.3-0.5s.
+ - Moved many `require '...'` statements into methods.
+ - It looks ugly & is not good coding practice, but a necessary evil.
+ - Load time is still pretty slow (but a lot better!).
  - BingScraper
  - Fixed possible RSS infinite loop.
 
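Moving `require` statements into methods defers loading a library until the method first runs, which is the load-time trade-off the changelog describes. A minimal sketch of the idea, using the standard `json` library as a stand-in; the module and method names are hypothetical and do not come from nhkore.

```ruby
# Hypothetical module illustrating a method-local require.
module LazyExport
  # 'json' is loaded the first time this method is called, not at startup.
  # Later calls are cheap: require does nothing once the file is loaded.
  def self.to_json_text(data)
    require 'json'

    JSON.generate(data)
  end
end
```

A program that never calls `LazyExport.to_json_text` never pays for loading `json`, at the cost of a less conventional file layout.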
data/Gemfile.lock ADDED
@@ -0,0 +1,86 @@
+ PATH
+ remote: .
+ specs:
+ nhkore (0.3.7)
+ attr_bool (~> 0.2)
+ bimyou_segmenter (~> 1.2)
+ cri (~> 2.15)
+ down (~> 5.1)
+ highline (~> 2.0)
+ http-cookie (~> 1.0)
+ japanese_deinflector (~> 0.0)
+ nokogiri (~> 1.10)
+ psychgus (~> 1.3)
+ public_suffix (~> 4.0)
+ rainbow (~> 3.0)
+ rubyzip (~> 2.3)
+ tiny_segmenter (~> 0.0)
+ tty-progressbar (~> 0.17)
+ tty-spinner (~> 0.9)
+
+ GEM
+ remote: https://rubygems.org/
+ specs:
+ addressable (2.7.0)
+ public_suffix (>= 2.0.2, < 5.0)
+ attr_bool (0.2.1)
+ bimyou_segmenter (1.2.0)
+ cri (2.15.10)
+ domain_name (0.5.20190701)
+ unf (>= 0.0.5, < 1.0.0)
+ down (5.2.0)
+ addressable (~> 2.5)
+ highline (2.0.3)
+ http-cookie (1.0.3)
+ domain_name (~> 0.5)
+ japanese_deinflector (0.0.2)
+ mini_portile2 (2.4.0)
+ minitest (5.14.2)
+ nokogiri (1.10.10)
+ mini_portile2 (~> 2.4.0)
+ psych (3.2.0)
+ psychgus (1.3.3)
+ psych (>= 3.0)
+ public_suffix (4.0.6)
+ rainbow (3.0.0)
+ rake (13.0.1)
+ raketeer (0.2.9)
+ rake
+ rdoc (6.2.1)
+ redcarpet (3.5.0)
+ rubyzip (2.3.0)
+ strings-ansi (0.1.0)
+ tiny_segmenter (0.0.6)
+ tty-cursor (0.7.1)
+ tty-progressbar (0.17.0)
+ strings-ansi (~> 0.1.0)
+ tty-cursor (~> 0.7)
+ tty-screen (~> 0.7)
+ unicode-display_width (~> 1.6)
+ tty-screen (0.8.1)
+ tty-spinner (0.9.3)
+ tty-cursor (~> 0.7)
+ unf (0.1.4)
+ unf_ext
+ unf_ext (0.0.7.7)
+ unicode-display_width (1.7.0)
+ yard (0.9.25)
+ yard_ghurt (1.2.0)
+ rake
+
+ PLATFORMS
+ ruby
+
+ DEPENDENCIES
+ bundler (~> 2.1)
+ minitest (~> 5.14)
+ nhkore!
+ rake (~> 13.0)
+ raketeer (~> 0.2)
+ rdoc (~> 6.2)
+ redcarpet (~> 3.5)
+ yard (~> 0.9)
+ yard_ghurt (~> 1.2)
+
+ BUNDLED WITH
+ 2.1.4
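The `~>` entries in the lockfile are RubyGems pessimistic version constraints. How they relate to the locked versions can be checked with `Gem::Requirement`, which ships with Ruby. The helper name `allowed?` below is hypothetical; the constraints and versions are copied from the lockfile above.

```ruby
require 'rubygems'

# Hypothetical helper: does +version+ satisfy the +constraint+ string?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

# '~> 0.2' allows >= 0.2 and < 1.0, so the locked attr_bool 0.2.1 qualifies.
puts allowed?('~> 0.2', '0.2.1')
# '~> 2.4.0' allows >= 2.4.0 and < 2.5, the tighter form used for mini_portile2.
puts allowed?('~> 2.4.0', '2.4.0')
puts allowed?('~> 2.4.0', '2.5.0')
```

The number of segments in the constraint decides which segment may float: two segments let the minor version grow, three segments only the patch version.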
data/README.md CHANGED
@@ -26,6 +26,8 @@ This is similar to a [core word/vocabulary list](https://www.fluentin3months.com
  - [News Command](#news-command-)
  - [Using the Library](#using-the-library-)
  - [Hacking](#hacking-)
+ - [Updating](#updating-)
+ - [Releasing](#releasing-)
  - [License](#license-)
 
  ## For Non-Power Users [^](#contents)
@@ -110,11 +112,12 @@ Example usage:
 
  `$ nhkore -t 300 -m 10 news -D -L -M -d '2011-03-07 06:30' easy -u 'https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html'`
 
- Now that the data from the article has been scraped, you can generate a CSV/HTML/YAML file of the words ordered by frequency:
+ Now that the data from the article has been scraped, you can generate a CSV/HTML/JSON/YAML file of the words ordered by frequency:
 
  ```
  $ nhkore sift easy -e csv
  $ nhkore sift easy -e html
+ $ nhkore sift easy -e json
  $ nhkore sift easy -e yml
  ```
 
@@ -154,11 +157,11 @@ After obtaining the scraped data, you can `sift` all of the data (or select data
  | --- | --- |
  | CSV | For uploading to a flashcard website (e.g., Memrise, Anki, Buffl) after changing the data appropriately. |
  | HTML | For comfortable viewing in a web browser or for sharing. |
- | YAML | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
+ | YAML/JSON | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
 
  The data is sorted by frequency in descending order (i.e., most frequent words first).
 
- If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/HTML file.
+ If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/JSON/HTML file.
 
  The defaults will sift all of the data into a CSV file, which may not be what you want:
 
@@ -203,7 +206,7 @@ You can save the data to a different format using one of these options:
 
  ```
  -e --ext=<value> type of file (extension) to save;
- valid options: [csv, htm, html, yaml, yml];
+ valid options: [csv, htm, html, json, yaml, yml];
  not needed if you specify a file extension with
  the '--out' option: '--out sift.html'
  (default: csv)
@@ -524,7 +527,7 @@ doc = ss.html_doc()
  doc.css('a').each() do |anchor|
  link = anchor['href']
 
- next if ss.ignore_link?(link)
+ next if ss.ignore_link?(link,cleaned: false)
 
  if link.include?('https://www3.nhk')
  puts link
@@ -563,6 +566,7 @@ end
 
  ```Ruby
  require 'nhkore/article_scraper'
+ require 'time'
 
  as = NHKore::ArticleScraper.new(
  'https://www3.nhk.or.jp/news/easy/k10011862381000/k10011862381000.html',
@@ -686,6 +690,7 @@ end
  `Sifter` will sift & sort the `News` data into a single file. The data is sorted by frequency in descending order (i.e., most frequent words first).
 
  ```Ruby
+ require 'nhkore/datetime_parser'
  require 'nhkore/news'
  require 'nhkore/sifter'
  require 'time'
@@ -697,23 +702,26 @@ sifter = NHKore::Sifter.new(news)
  sifter.caption = 'Sakura Fields Forever!'
 
  # Filter the data.
- #sifter.filter_by_datetime(Time.new(2019,12,5))
+ sifter.filter_by_datetime(NHKore::DatetimeParser.parse_range('2019-12-4...7'))
+ sifter.filter_by_datetime([Time.new(2019,12,4),Time.new(2019,12,7)])
  sifter.filter_by_datetime(
  from: Time.new(2019,12,4),to: Time.new(2019,12,7)
  )
  sifter.filter_by_title('桜')
  sifter.filter_by_url('k100')
 
- # Ignore (or blank out) certain columns from the output.
+ # Ignore certain columns from the output.
  sifter.ignore(:defn)
  sifter.ignore(:eng)
 
- # An array of the filtered & sorted words.
- words = sifter.sift()
+ # An array of the sifted words.
+ words = sifter.sift() # Filtered & Sorted array of Word
+ rows = sifter.build_rows(words) # Ignored array of array
 
  # Choose the file format.
  #sifter.put_csv!()
  #sifter.put_html!()
+ #sifter.put_json!()
  sifter.put_yaml!()
 
  # Save to a file.
@@ -724,13 +732,14 @@ if !File.exist?(file)
  end
  ```
 
- ### Util & UserAgents
+ ### Util, UserAgents, & DatetimeParser
 
  These provide a variety of useful methods/constants.
 
  Here are some of the most useful ones:
 
  ```Ruby
+ require 'nhkore/datetime_parser'
  require 'nhkore/user_agents'
  require 'nhkore/util'
 
@@ -756,14 +765,16 @@ puts
  puts '========'
  puts '[ Time ]'
  puts '========'
- puts "JST now: #{Util.jst_now}"
+ puts "JST now: #{Util.jst_now()}"
  # Drops in JST_OFFSET, does not change hour/min.
  puts "JST time: #{Util.jst_time(Time.now)}"
  puts "JST year: #{Util::JST_YEAR}"
  puts "1999 sane? #{Util.sane_year?(1999)}" # true
  puts "1776 sane? #{Util.sane_year?(1776)}" # false
- puts "Guess 5: #{Util.guess_year(5)}" # 2005
- puts "Guess 99: #{Util.guess_year(99)}" # 1999
+ puts "Guess 5: #{DatetimeParser.guess_year(5)}" # 2005
+ puts "Guess 99: #{DatetimeParser.guess_year(99)}" # 1999
+ # => [2020-12-01 00:00:00 +0900, 2020-12-31 23:59:59 +0900]
+ puts "Parse: #{DatetimeParser.parse_range('2020-12')}"
  puts
  puts "JST timezone offset: #{Util::JST_OFFSET}"
  puts "JST timezone offset hour: #{Util::JST_OFFSET_HOUR}"
@@ -789,9 +800,9 @@ def fmt_jpn()
  end
 
  puts " #{fmt_jpn{|x| x}}"
- puts "Hiragana? #{fmt_jpn{|x| !!Util.hiragana?(x)}}"
- puts "Kana? #{fmt_jpn{|x| !!Util.kana?(x)}}"
- puts "Kanji? #{fmt_jpn{|x| !!Util.kanji?(x)}}"
+ puts "Hiragana? #{fmt_jpn{|x| Util.hiragana?(x)}}"
+ puts "Kana? #{fmt_jpn{|x| Util.kana?(x)}}"
+ puts "Kanji? #{fmt_jpn{|x| Util.kanji?(x)}}"
  puts "Reduce: #{Util.reduce_jpn_space("' '")}"
  puts
 
@@ -839,9 +850,29 @@ You can make some changes/fixes to the code and then install your local version:
 
  `$ bundle exec rake install:local`
 
- ### Releasing/Publishing
+ ### Updating [^](#contents)
+
+ This will update *core/* for you:
+
+ `$ bundle exec rake update_core`
+
+ ### Releasing [^](#contents)
+
+ 1. Update *CHANGELOG.md*, *version.rb*, & *Gemfile.lock*
+ - *Raketary*: `$ raketary bump -v`
+ - Run: `$ bundle update`
+ 2. Run: `$ bundle exec rake update_core`
+ 3. Run: `$ bundle exec rake clobber pkg_core`
+ 4. Create a new release & tag
+ - Add `pkg/nhkore-core.zip`
+ 5. Run: `$ git pull`
+ 6. Upload GitHub package
+ - *Raketary*: `$ raketary github_pkg`
+ 7. Run: `$ bundle exec rake release`
+
+ Releasing new HTML file for website:
 
- `$ bundle exec rake release`
+ 1. `$ bundle exec rake update_showcase`
 
  ## License [^](#contents)
 
data/Rakefile CHANGED
@@ -35,7 +35,7 @@ require 'nhkore/version'
 
  PKG_DIR = 'pkg'
 
- CLEAN.exclude('.git/','stock/')
+ CLEAN.exclude('{.git,core,stock}/**/*')
  CLOBBER.include('doc/',File.join(PKG_DIR,''))
 
 
@@ -49,7 +49,7 @@ desc "Package '#{File.join(NHKore::Util::CORE_DIR,'')}' data as a Zip file into
  task :pkg_core do |task|
  mkdir_p PKG_DIR
 
- pattern = File.join(NHKore::Util::CORE_DIR,'*.{csv,html,yml}')
+ pattern = File.join(NHKore::Util::CORE_DIR,'*.{csv,html,json,yml}')
  zip_file = File.join(PKG_DIR,'nhkore-core.zip')
 
  sh 'zip','-9rv',zip_file,*Dir.glob(pattern).sort()
@@ -59,17 +59,57 @@ Rake::TestTask.new() do |task|
  task.libs = ['lib','test']
  task.pattern = File.join('test','**','*_test.rb')
  task.description += ": '#{task.pattern}'"
- task.verbose = true
+ task.verbose = false
  task.warning = true
  end
 
- YARD::Rake::YardocTask.new() do |task|
- task.files = [File.join('lib','**','*.rb')]
+ # If you need to run a part after the 1st part,
+ # just type 'n' to not overwrite the file and then 'y' for continue.
+ desc "Update '#{File.join(NHKore::Util::CORE_DIR,'')}' files for release"
+ task :update_core do |task|
+ require 'highline'
+
+ CONTINUE_MSG = "\nContinue (y/n)? "
+
+ cmd = ['ruby','-w','./lib/nhkore.rb','-t','300','-m','10']
+ hl = HighLine.new()
+
+ next unless sh(*cmd,'se','ez','bing')
+ next unless hl.agree(CONTINUE_MSG)
+ puts
 
- task.options += ['--files','CHANGELOG.md,LICENSE.txt']
- task.options += ['--readme','README.md']
+ next unless sh(*cmd,'news','-s','100','ez')
+ next unless hl.agree(CONTINUE_MSG)
+ puts
+
+ next unless sh(*cmd,'sift','-e','csv' ,'ez')
+ next unless sh(*cmd,'sift','-e','html','ez')
+ next unless sh(*cmd,'sift','-e','json','ez')
+ next unless sh(*cmd,'sift','-e','yml' ,'ez')
+ end
+
+ # @since 0.3.6
+ desc 'Update showcase file for release'
+ task :update_showcase do |task|
+ require 'highline'
 
- task.options << '--protected' # Show protected methods
+ SHOWCASE_FILE = File.join('.','nhkore-ez.html')
+
+ hl = HighLine.new()
+
+ next unless sh('ruby','-w','./lib/nhkore.rb',
+ 'sift','ez','--no-eng',
+ '--out',SHOWCASE_FILE,
+ )
+
+ next unless hl.agree("\nMove the file (y/n)? ")
+ puts
+ next unless sh('mv','-iv',SHOWCASE_FILE,
+ File.join('..','esotericpig.github.io','showcase',''),
+ )
+ end
+
+ YARD::Rake::YardocTask.new() do |task|
  task.options += ['--template-path',File.join('yard','templates')]
  task.options += ['--title',"NHKore v#{NHKore::VERSION} Doc"]
  end
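The new `CLEAN.exclude('{.git,core,stock}/**/*')` pattern can be tried out on sample paths with plain `File.fnmatch?`. The sketch below assumes that Rake compares string exclude patterns using the `FNM_PATHNAME` and `FNM_EXTGLOB` flags; that is an assumption about Rake's `FileList` and worth verifying against the installed Rake version. The `excluded?` helper and the sample paths are hypothetical.

```ruby
# Pattern copied from the Rakefile diff above.
PATTERN = '{.git,core,stock}/**/*'

# Assumed flags: FNM_EXTGLOB enables the {a,b} braces, and FNM_PATHNAME
# makes '**/' match directories recursively.
FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: would this path be protected from `rake clean`?
def excluded?(path)
  File.fnmatch?(PATTERN, path, FLAGS)
end

puts excluded?('core/nhkore-ez.csv') # file directly under core/
puts excluded?('stock/a/b.txt')      # nested file under stock/
puts excluded?('lib/nhkore.rb')      # outside the three directories
```

Compared with the old `'.git/','stock/'` arguments, the glob also protects the files in *core/*, the directory the `update_core` task is described as updating.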