nhkore 0.3.2 → 0.3.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cf151c3859812632f09b1a464164f31bb0ce050f37ed7e7377f76265571ebd41
- data.tar.gz: 1f3ee801e7557731cae4aeacd3f18fea4d7f33ac65b6ec77511a7d3d8f17856a
+ metadata.gz: 0ca67a215cda7c49a82aa824c1322b49285abe332f627c9ad4fae774043cbfc9
+ data.tar.gz: b62a7e518787e89a3a54bcc66c191b4d3f005a911ab76861e3b118258f31b85f
  SHA512:
- metadata.gz: 7e7d0d5b805ad6fa4312e8be26f3115dff18665b3762073c56db3a7a6a343a3ee6a05e47889e0abf7b62df3bb84cf5c977fce3efdfeb8a65c7bcff8167839d35
- data.tar.gz: 957bc3da8492310d287a8947b9080f8be417f0874c3226db4f0bb63d020bee06c51a3da81c1fa3f779de22d354a32ab4cf41fc6f3018840774c31fd7060fbec3
+ metadata.gz: b4e84a07685c71400bd50b270c4ae662e6885f7149fc7ec3dec9476bf9b6b80f402d7f874ddcbef920c2b5034a1d39b44fbcb7e9ece06f3a2d517ca89e37de3d
+ data.tar.gz: 2527b477b7b7088f2612e4a05e0369b60cacb34bedb6ac59a3296643b6f59fcfce0c054ede67c68e0f4299864795bd79f04a85020d8f4c87b67f56c5a5dbeb77
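checksums.yaml records SHA256 and SHA512 digests of the two archives packed inside the .gem file (metadata.gz and data.tar.gz). A minimal sketch for checking them, assuming you have already unpacked the .gem (a plain tar archive) so those two files are on disk; the helper names `gem_part_digests` and `digest_matches?` are hypothetical and use only Ruby's standard `digest` library.

```ruby
require 'digest'

# Hypothetical helper: compute the two digests that checksums.yaml lists
# for one archive member (e.g. metadata.gz or data.tar.gz).
def gem_part_digests(path)
  {
    'SHA256' => Digest::SHA256.file(path).hexdigest,
    'SHA512' => Digest::SHA512.file(path).hexdigest,
  }
end

# True if the file's SHA256 equals the published hex digest.
def digest_matches?(path, expected_sha256)
  gem_part_digests(path)['SHA256'] == expected_sha256
end
```

For example, `digest_matches?('metadata.gz', '0ca67a215cda7c49a82aa824c1322b49285abe332f627c9ad4fae774043cbfc9')` should be true for the v0.3.3 metadata.gz if the download is intact.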
@@ -2,7 +2,13 @@
 
  Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
- ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.2...master)
+ ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.3...master)
+
+ ## [v0.3.3] - 2020-04-23
+
+ ### Added
+ - Added JSON support to Sifter & SiftCmd.
+ - Added use of `attr_bool` Gem for `attr_accessor?` & `attr_reader?`.
 
  ## [v0.3.2] - 2020-04-22
 
@@ -23,10 +29,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - In filter_by_datetime(), renamed keyword args `from_filter,to_filter` to simply `from,to`.
 
  ### Fixed
- - Reduced load time of app from ~1s to 0.3~0.5s.
- - Moved many `require '...'` statements into methods.
- - It looks ugly & is not a good coding practice, but a necessary evil.
- - Load time is still pretty slow (but a lot better!).
+ - Reduced load time of app a tiny bit more (see v0.3.1 for details).
  - ArticleScraper
  - Renamed `mode` param to `strict`. `mode` was overshadowing File.open()'s in Scraper.
 
@@ -40,7 +43,10 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows & Linux, it will check for both `/` & `\`.
 
  ### Fixed
- - Reduced load time of app from ~1s to ~0.3-5s by moving some requires into methods.
+ - Reduced load time of app from about 1s to about 0.3-0.5s.
+ - Moved many `require '...'` statements into methods.
+ - It looks ugly & is not good coding practice, but a necessary evil.
+ - Load time is still pretty slow (but a lot better!).
  - BingScraper
  - Fixed possible RSS infinite loop.
 
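The changelog entries describe moving `require` statements from the top of files into the methods that use them, trading style for startup time. A minimal sketch of that pattern, using a hypothetical `Exporter` class: Ruby's `require` does nothing (and returns false) once a feature is loaded, so the cost is paid only the first time the method runs.

```ruby
# Hypothetical class illustrating a method-local (lazy) require.
class Exporter
  def to_json_text(data)
    # Loaded only when this method is first called; later calls are no-ops
    # because Ruby records loaded features in $LOADED_FEATURES.
    require 'json'

    JSON.generate(data)
  end
end
```

The trade-off is that a missing dependency surfaces only when the method is first called, not at program start.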
data/README.md CHANGED
@@ -110,11 +110,12 @@ Example usage:
 
  `$ nhkore -t 300 -m 10 news -D -L -M -d '2011-03-07 06:30' easy -u 'https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html'`
 
- Now that the data from the article has been scraped, you can generate a CSV/HTML/YAML file of the words ordered by frequency:
+ Now that the data from the article has been scraped, you can generate a CSV/HTML/JSON/YAML file of the words ordered by frequency:
 
  ```
  $ nhkore sift easy -e csv
  $ nhkore sift easy -e html
+ $ nhkore sift easy -e json
  $ nhkore sift easy -e yml
  ```
 
@@ -154,11 +155,11 @@ After obtaining the scraped data, you can `sift` all of the data (or select data
  | --- | --- |
  | CSV | For uploading to a flashcard website (e.g., Memrise, Anki, Buffl) after changing the data appropriately. |
  | HTML | For comfortable viewing in a web browser or for sharing. |
- | YAML | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
+ | YAML/JSON | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
 
  The data is sorted by frequency in descending order (i.e., most frequent words first).
 
- If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/HTML file.
+ If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/JSON/HTML file.
 
  The defaults will sift all of the data into a CSV file, which may not be what you want:
 
@@ -203,7 +204,7 @@ You can save the data to a different format using one of these options:
 
  ```
  -e --ext=<value> type of file (extension) to save;
- valid options: [csv, htm, html, yaml, yml];
+ valid options: [csv, htm, html, json, yaml, yml];
  not needed if you specify a file extension with
  the '--out' option: '--out sift.html'
  (default: csv)
@@ -704,16 +705,18 @@ sifter.filter_by_datetime(
  sifter.filter_by_title('桜')
  sifter.filter_by_url('k100')
 
- # Ignore (or blank out) certain columns from the output.
+ # Ignore certain columns from the output.
  sifter.ignore(:defn)
  sifter.ignore(:eng)
 
- # An array of the filtered & sorted words.
- words = sifter.sift()
+ # An array of the sifted words.
+ words = sifter.sift() # Filtered & Sorted array of Word
+ rows = sifter.build_rows(words) # Ignored array of array
 
  # Choose the file format.
  #sifter.put_csv!()
  #sifter.put_html!()
+ #sifter.put_json!()
  sifter.put_yaml!()
 
  # Save to a file.
data/Rakefile CHANGED
@@ -49,7 +49,7 @@ desc "Package '#{File.join(NHKore::Util::CORE_DIR,'')}' data as a Zip file into
  task :pkg_core do |task|
  mkdir_p PKG_DIR
 
- pattern = File.join(NHKore::Util::CORE_DIR,'*.{csv,html,yml}')
+ pattern = File.join(NHKore::Util::CORE_DIR,'*.{csv,html,json,yml}')
  zip_file = File.join(PKG_DIR,'nhkore-core.zip')
 
  sh 'zip','-9rv',zip_file,*Dir.glob(pattern).sort()
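The `pkg_core` task selects files with a brace pattern, so adding `json` to `*.{csv,html,json,yml}` is enough for JSON sift files to be zipped. A minimal sketch, with a temporary directory standing in for `NHKore::Util::CORE_DIR` (an assumption for illustration) and a hypothetical `core_files` helper; `Dir.glob` expands the braces itself.

```ruby
require 'tmpdir'

# Hypothetical helper: sorted file names that the brace pattern selects in dir.
def core_files(dir, exts = %w[csv html json yml])
  pattern = File.join(dir, "*.{#{exts.join(',')}}") # e.g. "dir/*.{csv,html,json,yml}"
  Dir.glob(pattern).sort.map { |path| File.basename(path) }
end
```

Note that the pattern matches `.yml` but not `.yaml`, so a sift file saved with the longer extension would be left out of the zip.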
@@ -29,28 +29,7 @@ if TESTING
  end
 
  require 'nhkore/app'
- require 'nhkore/article'
- require 'nhkore/article_scraper'
- require 'nhkore/cleaner'
- require 'nhkore/defn'
- require 'nhkore/dict'
- require 'nhkore/dict_scraper'
- require 'nhkore/entry'
- require 'nhkore/error'
- require 'nhkore/fileable'
- require 'nhkore/missingno'
- require 'nhkore/news'
- require 'nhkore/polisher'
- require 'nhkore/scraper'
- require 'nhkore/search_link'
- require 'nhkore/search_scraper'
- require 'nhkore/sifter'
- require 'nhkore/splitter'
- require 'nhkore/user_agents'
- require 'nhkore/util'
- require 'nhkore/variator'
- require 'nhkore/version'
- require 'nhkore/word'
+ require 'nhkore/lib'
 
  require 'nhkore/cli/fx_cmd'
  require 'nhkore/cli/get_cmd'
@@ -21,6 +21,7 @@
  #++
 
 
+ require 'attr_bool'
  require 'digest'
 
  require 'nhkore/article'
@@ -49,12 +50,10 @@ module NHKore
  attr_accessor :missingno
  attr_reader :polishers
  attr_accessor :splitter
- attr_accessor :strict
+ attr_accessor? :strict
  attr_reader :variators
  attr_accessor :year
 
- alias_method :strict?,:strict
-
  # @param dict [Dict,:scrape,nil] the {Dict} (dictionary) to use for {Word#defn} (definitions)
  # [+:scrape+] auto-scrape it using {DictScraper}
  # [+nil+] don't scrape/use it
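Before this release, each boolean attribute was written as an `attr_accessor` plus an `alias_method` for the `?` predicate; 0.3.3 replaces those pairs with `attr_accessor?` from the attr_bool gem. The sketch below is a hypothetical plain-Ruby macro (`bool_accessor`, not attr_bool's actual implementation) that reproduces what the removed pair of lines defined: a reader, a writer, and a `name?` predicate.

```ruby
# Hypothetical macro reproducing the removed pattern:
#   attr_accessor :strict
#   alias_method :strict?,:strict
module BoolAccessor
  def bool_accessor(*names)
    names.each do |name|
      attr_accessor name
      alias_method :"#{name}?", name # predicate shares the reader's body
    end
  end
end

# Hypothetical class using the macro.
class ScraperOptions
  extend BoolAccessor

  bool_accessor :strict, :eat_cookie

  def initialize(strict: true, eat_cookie: false)
    @strict = strict
    @eat_cookie = eat_cookie
  end
end
```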
@@ -39,7 +39,7 @@ module CLI
  DEFAULT_SIFT_EXT = :csv
  DEFAULT_SIFT_FUTSUU_FILE = "#{Sifter::DEFAULT_FUTSUU_FILE}{search.criteria}{file.ext}"
  DEFAULT_SIFT_YASASHII_FILE = "#{Sifter::DEFAULT_YASASHII_FILE}{search.criteria}{file.ext}"
- SIFT_EXTS = [:csv,:htm,:html,:yaml,:yml]
+ SIFT_EXTS = [:csv,:htm,:html,:json,:yaml,:yml]
 
  # Order matters.
  SIFT_DATETIME_FMTS = [
@@ -364,6 +364,8 @@ module CLI
  sifter.put_csv!()
  when :htm,:html
  sifter.put_html!()
+ when :json
+ sifter.put_json!()
  when :yaml,:yml
  sifter.put_yaml!()
  else
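SiftCmd now accepts `json` in `SIFT_EXTS` and routes it to `put_json!`. A standalone sketch of the same dispatch, assuming a hypothetical `put_method_for` helper that maps a user-supplied extension to the Sifter method name used in the hunk; it returns the method name as a symbol instead of calling a real Sifter, and the normalization of case and a leading dot is an illustrative assumption.

```ruby
SIFT_EXTS = [:csv, :htm, :html, :json, :yaml, :yml].freeze

# Hypothetical helper: pick the Sifter output method for an extension.
def put_method_for(ext)
  ext = ext.to_s.downcase.sub(/\A\./, '').to_sym # accept "JSON", ".json", :json

  raise ArgumentError, "invalid ext: #{ext}" unless SIFT_EXTS.include?(ext)

  case ext
  when :csv then :put_csv!
  when :htm, :html then :put_html!
  when :json then :put_json!
  when :yaml, :yml then :put_yaml!
  end
end
```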
@@ -21,6 +21,7 @@
  #++
 
 
+ require 'attr_bool'
  require 'nokogiri'
  require 'open-uri'
 
@@ -40,8 +41,8 @@ module NHKore
  'dnt' => '1',
  }
 
- attr_accessor :eat_cookie
- attr_accessor :is_file
+ attr_accessor? :eat_cookie
+ attr_accessor? :is_file
  attr_reader :kargs
  attr_accessor :max_redirects
  attr_accessor :max_retries
@@ -49,9 +50,6 @@ module NHKore
  attr_accessor :str_or_io
  attr_accessor :url
 
- alias_method :eat_cookie?,:eat_cookie
- alias_method :is_file?,:is_file
-
  # +max_redirects+ defaults to 3 for safety (infinite-loop attack).
  #
  # All URL options: https://ruby-doc.org/stdlib-2.7.0/libdoc/open-uri/rdoc/OpenURI/OpenRead.html
@@ -21,6 +21,7 @@
  #++
 
 
+ require 'attr_bool'
  require 'time'
 
  require 'nhkore/fileable'
@@ -35,13 +36,11 @@ module NHKore
  class SearchLink
  attr_accessor :datetime
  attr_accessor :futsuurl
- attr_accessor :scraped
+ attr_accessor? :scraped
  attr_accessor :sha256
  attr_accessor :title
  attr_accessor :url
 
- alias_method :scraped?,:scraped
-
  def initialize(url,scraped: false)
  super()
 
@@ -60,6 +60,40 @@ module NHKore
  @output = nil
  end
 
+ def build_header()
+ header = []
+
+ header << 'Frequency' unless @ignores[:freq]
+ header << 'Word' unless @ignores[:word]
+ header << 'Kana' unless @ignores[:kana]
+ header << 'English' unless @ignores[:eng]
+ header << 'Definition' unless @ignores[:defn]
+
+ return header
+ end
+
+ def build_rows(words)
+ rows = []
+
+ words.each() do |word|
+ rows << build_word_row(word)
+ end
+
+ return rows
+ end
+
+ def build_word_row(word)
+ row = []
+
+ row << word.freq unless @ignores[:freq]
+ row << word.word unless @ignores[:word]
+ row << word.kana unless @ignores[:kana]
+ row << word.eng unless @ignores[:eng]
+ row << word.defn unless @ignores[:defn]
+
+ return row
+ end
+
  def filter?(article)
  return false if @filters.empty?()
 
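`build_header` and `build_word_row` give every output format one shared rule: a column is emitted only if it is not in `@ignores`, and because the header and each row skip the same columns, they stay aligned. A self-contained sketch of that logic, with a `Struct` named `SampleWord` standing in for nhkore's `Word` (an assumption; the real class has more behavior) and a hypothetical `MiniSifter` class. It is table-driven rather than the hunk's explicit `unless` lines, but the output is the same for these five columns.

```ruby
# Stand-in for NHKore::Word with only the fields the rows use.
SampleWord = Struct.new(:freq, :word, :kana, :eng, :defn)

# Hypothetical class mirroring the column-filtering logic in the hunk.
class MiniSifter
  # Column key => header title, in output order.
  COLUMNS = [
    [:freq, 'Frequency'], [:word, 'Word'], [:kana, 'Kana'],
    [:eng, 'English'], [:defn, 'Definition'],
  ].freeze

  def initialize
    @ignores = {}
  end

  def ignore(column)
    @ignores[column] = true
  end

  def build_header
    kept_columns.map { |_key, title| title }
  end

  def build_word_row(word)
    kept_columns.map { |key, _title| word.public_send(key) }
  end

  def build_rows(words)
    words.map { |word| build_word_row(word) }
  end

  private

  def kept_columns
    COLUMNS.reject { |key, _title| @ignores[key] }
  end
end
```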
@@ -151,26 +185,10 @@ module NHKore
  words = sift()
 
  @output = CSV.generate(headers: :first_row,write_headers: true) do |csv|
- row = []
-
- row << 'Frequency' unless @ignores[:freq]
- row << 'Word' unless @ignores[:word]
- row << 'Kana' unless @ignores[:kana]
- row << 'English' unless @ignores[:eng]
- row << 'Definition' unless @ignores[:defn]
-
- csv << row
+ csv << build_header()
 
  words.each() do |word|
- row = []
-
- row << word.freq unless @ignores[:freq]
- row << word.word unless @ignores[:word]
- row << word.kana unless @ignores[:kana]
- row << word.eng unless @ignores[:eng]
- row << word.defn unless @ignores[:defn]
-
- csv << row
+ csv << build_word_row(word)
  end
  end
 
@@ -232,7 +250,7 @@ module NHKore
  <h2>#{@caption}</h2>
  <table>
  EOH
- #" # Fix for editor
+ #"
 
  # If have too few or too many '<col>', invalid HTML.
  @output << %Q{<col style="width:6em;">\n} unless @ignores[:freq]
@@ -242,20 +260,20 @@ module NHKore
  @output << "<col>\n" unless @ignores[:defn] # No width for defn, fills rest of page
 
  @output << '<tr>'
- @output << '<th>Frequency</th>' unless @ignores[:freq]
- @output << '<th>Word</th>' unless @ignores[:word]
- @output << '<th>Kana</th>' unless @ignores[:kana]
- @output << '<th>English</th>' unless @ignores[:eng]
- @output << '<th>Definition</th>' unless @ignores[:defn]
+
+ build_header().each() do |h|
+ @output << "<th>#{h}</th>"
+ end
+
  @output << "</tr>\n"
 
  words.each() do |word|
  @output << '<tr>'
- @output << "<td>#{Util.escape_html(word.freq.to_s())}</td>" unless @ignores[:freq]
- @output << "<td>#{Util.escape_html(word.word.to_s())}</td>" unless @ignores[:word]
- @output << "<td>#{Util.escape_html(word.kana.to_s())}</td>" unless @ignores[:kana]
- @output << "<td>#{Util.escape_html(word.eng.to_s())}</td>" unless @ignores[:eng]
- @output << "<td>#{Util.escape_html(word.defn.to_s())}</td>" unless @ignores[:defn]
+
+ build_word_row(word).each() do |w|
+ @output << "<td>#{Util.escape_html(w.to_s())}</td>"
+ end
+
  @output << "</tr>\n"
  end
 
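The HTML writer now loops over `build_header` and `build_word_row` and escapes each data cell. The body of `Util.escape_html` is not part of this diff, so this sketch substitutes the standard library's `ERB::Util.html_escape`; `html_row` is a hypothetical helper, not part of nhkore.

```ruby
require 'erb'

# Hypothetical helper: one <tr> of escaped cells, like the hunk's loop builds.
def html_row(cells, tag: 'td')
  out = +'<tr>'
  cells.each do |cell|
    # to_s first so Integer frequencies and nil definitions are handled.
    out << "<#{tag}>#{ERB::Util.html_escape(cell.to_s)}</#{tag}>"
  end
  out << "</tr>\n"
end
```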
@@ -264,31 +282,63 @@ module NHKore
  </body>
  </html>
  EOH
- #/ # Fix for editor
+ #/
 
  return @output
  end
 
- def put_yaml!()
+ def put_json!()
+ require 'json'
+
  words = sift()
 
- # Just blank out ignores.
- if !@ignores.empty?()
- words.each() do |word|
- # word/kanji/kana do not have setters/mutators.
- word.defn = nil if @ignores[:defn]
- word.eng = nil if @ignores[:eng]
- word.freq = nil if @ignores[:freq]
+ @output = ''.dup()
+
+ @output << <<~EOJ
+ {
+ "caption": #{JSON.generate(@caption)},
+ "header": #{JSON.generate(build_header())},
+ "words": [
+ EOJ
+
+ if !words.empty?()
+ 0.upto(words.length - 2) do |i|
+ @output << " #{JSON.generate(build_word_row(words[i]))},\n"
  end
+
+ @output << " #{JSON.generate(build_word_row(words[-1]))}\n"
  end
 
+ @output << "]\n}\n"
+
+ return @output
+ end
+
+ def put_yaml!()
+ require 'psychgus'
+
+ words = sift()
+
  yaml = {
  caption: @caption,
- words: words
+ header: build_header(),
+ words: build_rows(words),
  }
 
+ header_styler = Class.new() do
+ include Psychgus::Styler
+
+ def style_sequence(sniffer,node)
+ parent = sniffer.parent
+
+ if !parent.nil?() && parent.node.respond_to?(:value) && parent.value == 'header'
+ node.style = Psychgus::SEQUENCE_FLOW
+ end
+ end
+ end
+
  # Put each Word on one line (flow/inline style).
- @output = Util.dump_yaml(yaml,flow_level: 4)
+ @output = Util.dump_yaml(yaml,flow_level: 4,stylers: header_styler.new())
 
  return @output
  end
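`put_json!` writes an object with three keys: `caption`, `header` (the column titles from `build_header`), and `words` (one array per word from `build_word_row`). It writes every row except the last with a trailing comma, which is why the loop stops at `words.length - 2` and `words[-1]` is written separately. A minimal sketch of consuming such a file, assuming the caller wants one Hash per word keyed by column title; `rows_as_hashes` is a hypothetical helper, and `SAMPLE` is made-up data in that shape, not real nhkore output.

```ruby
require 'json'

# Turn put_json!-shaped text into an array of Hashes keyed by column title.
def rows_as_hashes(json_text)
  doc = JSON.parse(json_text)
  header = doc.fetch('header')

  doc.fetch('words').map { |row| header.zip(row).to_h }
end

# Made-up example in the same layout that put_json! produces.
SAMPLE = <<~EOJ
  {
  "caption": "example caption",
  "header": ["Frequency","Word","Kana"],
  "words": [
   [3,"桜","さくら"],
   [1,"雨","あめ"]
  ]
  }
EOJ
```

Because ignored columns are dropped from both `header` and each row, the zip stays aligned no matter which columns were ignored.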
@@ -306,7 +356,7 @@ module NHKore
 
  words = master_article.words.values()
 
- words = words.sort() do |word1,word2|
+ words.sort!() do |word1,word2|
  # Order by freq DESC (most frequent words to top).
  i = (word2.freq <=> word1.freq)
 
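The switch from `words = words.sort()` to `words.sort!()` sorts the array in place instead of allocating a sorted copy and reassigning it. The comparator's first key is frequency, descending (`word2.freq <=> word1.freq`); the hunk ends before any tie-breaker, so the secondary key below (word text, ascending) is an assumption for illustration, as are the `WordFreq` struct and `sort_by_freq!` helper.

```ruby
# Hypothetical stand-in with just the fields the comparator reads.
WordFreq = Struct.new(:word, :freq)

# Sort in place: most frequent first; ties broken by word (assumed key).
def sort_by_freq!(words)
  words.sort! do |word1, word2|
    i = (word2.freq <=> word1.freq)           # freq DESC
    i = (word1.word <=> word2.word) if i == 0 # hypothetical tie-breaker
    i
  end
end
```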
@@ -75,17 +75,20 @@ module NHKore
  return domain
  end
 
- def self.dump_yaml(obj,flow_level: 8)
+ def self.dump_yaml(obj,flow_level: 8,stylers: nil)
  require 'psychgus'
 
+ stylers = Array(stylers)
+
  return Psychgus.dump(obj,
  deref_aliases: true, # Dereference aliases for load_yaml()
+ header: true, # %YAML [version]
  line_width: 10000, # Try not to wrap; ichiman!
  stylers: [
  Psychgus::FlowStyler.new(flow_level), # Put extra details on one line (flow/inline style)
  Psychgus::NoSymStyler.new(cap: false), # Remove symbols, don't capitalize
  Psychgus::NoTagStyler.new(), # Remove class names (tags)
- ],
+ ].concat(stylers),
  )
  end
 
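`dump_yaml` gains a `stylers:` keyword that may be `nil`, a single styler, or an array of them; `Array(stylers)` normalizes all three before they are appended after the default stylers. A minimal sketch of that normalization, with symbols standing in for the Psychgus styler objects (an assumption, so the example runs without the gem); `merged_stylers` is a hypothetical helper.

```ruby
# Symbols stand in for FlowStyler, NoSymStyler and NoTagStyler instances.
DEFAULT_STYLERS = [:flow, :no_sym, :no_tag].freeze

# Mirrors the hunk: defaults first, then any caller-supplied stylers.
def merged_stylers(stylers = nil)
  DEFAULT_STYLERS.dup.concat(Array(stylers)) # nil => [], x => [x], [a, b] => [a, b]
end
```

One caveat of `Array()`: an argument that responds to `to_ary` or `to_a` (a Hash, for example) is converted rather than wrapped, so it suits single objects and arrays of them.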
@@ -22,5 +22,5 @@
 
 
  module NHKore
- VERSION = '0.3.2'
+ VERSION = '0.3.3'
  end
@@ -59,6 +59,7 @@ Gem::Specification.new() do |spec|
 
  spec.requirements << 'Nokogiri: https://www.nokogiri.org/tutorials/installing_nokogiri.html'
 
+ spec.add_runtime_dependency 'attr_bool' ,'~> 0.1' # For attr_accessor?/attr_reader?
  spec.add_runtime_dependency 'bimyou_segmenter' ,'~> 1.2' # For splitting Japanese sentences into words
  spec.add_runtime_dependency 'cri' ,'~> 2.15' # For CLI commands/options
  spec.add_runtime_dependency 'down' ,'~> 5.1' # For downloading files (GetCmd)
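The new dependency is pinned with the pessimistic operator `'~> 0.1'`. With a two-segment version, `~>` allows any later 0.x release but not 1.0; likewise `'~> 1.2'` for bimyou_segmenter means at least 1.2 and below 2.0. RubyGems' own `Gem::Requirement` can confirm this; `allowed?` is a hypothetical wrapper around it.

```ruby
require 'rubygems'

# True if version satisfies the requirement string, per RubyGems' rules.
def allowed?(requirement, version)
  Gem::Requirement.new(requirement).satisfied_by?(Gem::Version.new(version))
end
```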
metadata CHANGED
@@ -1,15 +1,29 @@
  --- !ruby/object:Gem::Specification
  name: nhkore
  version: !ruby/object:Gem::Version
- version: 0.3.2
+ version: 0.3.3
  platform: ruby
  authors:
  - Jonathan Bradley Whited (@esotericpig)
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-04-21 00:00:00.000000000 Z
+ date: 2020-04-23 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: attr_bool
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.1'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.1'
  - !ruby/object:Gem::Dependency
  name: bimyou_segmenter
  requirement: !ruby/object:Gem::Requirement
@@ -375,7 +389,7 @@ metadata:
  changelog_uri: https://github.com/esotericpig/nhkore/blob/master/CHANGELOG.md
  homepage_uri: https://github.com/esotericpig/nhkore
  source_code_uri: https://github.com/esotericpig/nhkore
- post_install_message: " \n NHKore v0.3.2\n \n You can now use [nhkore] on the
+ post_install_message: " \n NHKore v0.3.3\n \n You can now use [nhkore] on the
  command line.\n \n Homepage: https://github.com/esotericpig/nhkore\n \n Code:
  \ https://github.com/esotericpig/nhkore\n Changelog: https://github.com/esotericpig/nhkore/blob/master/CHANGELOG.md\n
  \ Bugs: https://github.com/esotericpig/nhkore/issues\n \n"