nhkore 0.3.2 → 0.3.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: cf151c3859812632f09b1a464164f31bb0ce050f37ed7e7377f76265571ebd41
- data.tar.gz: 1f3ee801e7557731cae4aeacd3f18fea4d7f33ac65b6ec77511a7d3d8f17856a
+ metadata.gz: 0ca67a215cda7c49a82aa824c1322b49285abe332f627c9ad4fae774043cbfc9
+ data.tar.gz: b62a7e518787e89a3a54bcc66c191b4d3f005a911ab76861e3b118258f31b85f
  SHA512:
- metadata.gz: 7e7d0d5b805ad6fa4312e8be26f3115dff18665b3762073c56db3a7a6a343a3ee6a05e47889e0abf7b62df3bb84cf5c977fce3efdfeb8a65c7bcff8167839d35
- data.tar.gz: 957bc3da8492310d287a8947b9080f8be417f0874c3226db4f0bb63d020bee06c51a3da81c1fa3f779de22d354a32ab4cf41fc6f3018840774c31fd7060fbec3
+ metadata.gz: b4e84a07685c71400bd50b270c4ae662e6885f7149fc7ec3dec9476bf9b6b80f402d7f874ddcbef920c2b5034a1d39b44fbcb7e9ece06f3a2d517ca89e37de3d
+ data.tar.gz: 2527b477b7b7088f2612e4a05e0369b60cacb34bedb6ac59a3296643b6f59fcfce0c054ede67c68e0f4299864795bd79f04a85020d8f4c87b67f56c5a5dbeb77
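A `.gem` file is a tar archive that contains `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`. The hashes above cover the first two. The sketch below shows one way to check an extracted file against one of these values. It assumes you have already unpacked the gem (e.g., `tar -xf nhkore-0.3.3.gem`). `checksum_ok?` is a hypothetical helper, not part of nhkore.

```ruby
require 'digest'

# Hypothetical helper: compare a file's hex digest with an expected value,
# e.g. one of the metadata.gz / data.tar.gz entries in checksums.yaml.
# +algo+ is any Digest class (Digest::SHA256, Digest::SHA512, ...).
def checksum_ok?(path, expected_hex, algo: Digest::SHA256)
  algo.file(path).hexdigest == expected_hex.strip.downcase
end
```

For example, `checksum_ok?('data.tar.gz', expected)` should return true when `expected` is the `+ data.tar.gz` SHA256 value above and the archive is intact. Pass `algo: Digest::SHA512` to check the SHA512 entries.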
@@ -2,7 +2,13 @@
 
  Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
- ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.2...master)
+ ## [[Unreleased]](https://github.com/esotericpig/nhkore/compare/v0.3.3...master)
+
+ ## [v0.3.3] - 2020-04-23
+
+ ### Added
+ - Added JSON support to Sifter & SiftCmd.
+ - Added use of `attr_bool` Gem for `attr_accessor?` & `attr_reader?`.
 
  ## [v0.3.2] - 2020-04-22
 
@@ -23,10 +29,7 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - In filter_by_datetime(), renamed keyword args `from_filter,to_filter` to simply `from,to`.
 
  ### Fixed
- - Reduced load time of app from ~1s to 0.3~0.5s.
- - Moved many `require '...'` statements into methods.
- - It looks ugly & is not a good coding practice, but a necessary evil.
- - Load time is still pretty slow (but a lot better!).
+ - Reduced load time of app a tiny bit more (see v0.3.1 for details).
  - ArticleScraper
  - Renamed `mode` param to `strict`. `mode` was overshadowing File.open()'s in Scraper.
 
@@ -40,7 +43,10 @@ Format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
  - Changed `dir_str?()` and `filename_str?()` to check any slash. Previously, it only checked the slash for your system. But now on both Windows & Linux, it will check for both `/` & `\`.
 
  ### Fixed
- - Reduced load time of app from ~1s to ~0.3-5s by moving some requires into methods.
+ - Reduced load time of app from about 1s to about 0.3-0.5s.
+ - Moved many `require '...'` statements into methods.
+ - It looks ugly & is not good coding practice, but a necessary evil.
+ - Load time is still pretty slow (but a lot better!).
  - BingScraper
  - Fixed possible RSS infinite loop.
 
data/README.md CHANGED
@@ -110,11 +110,12 @@ Example usage:
 
  `$ nhkore -t 300 -m 10 news -D -L -M -d '2011-03-07 06:30' easy -u 'https://www3.nhk.or.jp/news/easy/tsunamikeihou/index.html'`
 
- Now that the data from the article has been scraped, you can generate a CSV/HTML/YAML file of the words ordered by frequency:
+ Now that the data from the article has been scraped, you can generate a CSV/HTML/JSON/YAML file of the words ordered by frequency:
 
  ```
  $ nhkore sift easy -e csv
  $ nhkore sift easy -e html
+ $ nhkore sift easy -e json
  $ nhkore sift easy -e yml
  ```
 
@@ -154,11 +155,11 @@ After obtaining the scraped data, you can `sift` all of the data (or select data
  | --- | --- |
  | CSV | For uploading to a flashcard website (e.g., Memrise, Anki, Buffl) after changing the data appropriately. |
  | HTML | For comfortable viewing in a web browser or for sharing. |
- | YAML | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
+ | YAML/JSON | For developers to automatically add translations or to manipulate the data in some other way programmatically. |
 
  The data is sorted by frequency in descending order (i.e., most frequent words first).
 
- If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/HTML file.
+ If you wish to sort/arrange the data in some other way, CSV editors (e.g., LibreOffice, WPS Office, Microsoft Office) can do this easily and efficiently, or if you are code-savvy, you can programmatically manipulate the CSV/YAML/JSON/HTML file.
 
  The defaults will sift all of the data into a CSV file, which may not be what you want:
 
@@ -203,7 +204,7 @@ You can save the data to a different format using one of these options:
 
  ```
  -e --ext=<value> type of file (extension) to save;
- valid options: [csv, htm, html, yaml, yml];
+ valid options: [csv, htm, html, json, yaml, yml];
  not needed if you specify a file extension with
  the '--out' option: '--out sift.html'
  (default: csv)
@@ -704,16 +705,18 @@ sifter.filter_by_datetime(
  sifter.filter_by_title('桜')
  sifter.filter_by_url('k100')
 
- # Ignore (or blank out) certain columns from the output.
+ # Ignore certain columns from the output.
  sifter.ignore(:defn)
  sifter.ignore(:eng)
 
- # An array of the filtered & sorted words.
- words = sifter.sift()
+ # An array of the sifted words.
+ words = sifter.sift() # Filtered & Sorted array of Word
+ rows = sifter.build_rows(words) # Ignored array of array
 
  # Choose the file format.
  #sifter.put_csv!()
  #sifter.put_html!()
+ #sifter.put_json!()
  sifter.put_yaml!()
 
  # Save to a file.
data/Rakefile CHANGED
@@ -49,7 +49,7 @@ desc "Package '#{File.join(NHKore::Util::CORE_DIR,'')}' data as a Zip file into
  task :pkg_core do |task|
  mkdir_p PKG_DIR
 
- pattern = File.join(NHKore::Util::CORE_DIR,'*.{csv,html,yml}')
+ pattern = File.join(NHKore::Util::CORE_DIR,'*.{csv,html,json,yml}')
  zip_file = File.join(PKG_DIR,'nhkore-core.zip')
 
  sh 'zip','-9rv',zip_file,*Dir.glob(pattern).sort()
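The `:pkg_core` task depends on `Dir.glob` brace expansion, so adding `json` to the braces is the only change needed to package the new JSON files. The sketch below reuses the same pattern. `core_files` is a hypothetical helper. It shows which filenames the task would zip. Note that `.yaml` and `.htm` files, which `SIFT_EXTS` also accepts, are not matched by this pattern.

```ruby
require 'tmpdir'

# Same brace pattern as the Rakefile's :pkg_core task after this change.
CORE_PATTERN = '*.{csv,html,json,yml}'

# Hypothetical helper: basenames in +dir+ that the task's glob would pick up,
# sorted the way the task sorts them before calling `zip`.
def core_files(dir)
  Dir.glob(CORE_PATTERN, base: dir).sort # base: requires Ruby 2.5+
end

Dir.mktmpdir do |dir|
  %w[core.csv core.html core.json core.yml core.yaml core.htm].each do |name|
    File.write(File.join(dir, name), '')
  end
  p core_files(dir) # => ["core.csv", "core.html", "core.json", "core.yml"]
end
```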
@@ -29,28 +29,7 @@ if TESTING
  end
 
  require 'nhkore/app'
- require 'nhkore/article'
- require 'nhkore/article_scraper'
- require 'nhkore/cleaner'
- require 'nhkore/defn'
- require 'nhkore/dict'
- require 'nhkore/dict_scraper'
- require 'nhkore/entry'
- require 'nhkore/error'
- require 'nhkore/fileable'
- require 'nhkore/missingno'
- require 'nhkore/news'
- require 'nhkore/polisher'
- require 'nhkore/scraper'
- require 'nhkore/search_link'
- require 'nhkore/search_scraper'
- require 'nhkore/sifter'
- require 'nhkore/splitter'
- require 'nhkore/user_agents'
- require 'nhkore/util'
- require 'nhkore/variator'
- require 'nhkore/version'
- require 'nhkore/word'
+ require 'nhkore/lib'
 
  require 'nhkore/cli/fx_cmd'
  require 'nhkore/cli/get_cmd'
@@ -21,6 +21,7 @@
  #++
 
 
+ require 'attr_bool'
  require 'digest'
 
  require 'nhkore/article'
@@ -49,12 +50,10 @@ module NHKore
  attr_accessor :missingno
  attr_reader :polishers
  attr_accessor :splitter
- attr_accessor :strict
+ attr_accessor? :strict
  attr_reader :variators
  attr_accessor :year
 
- alias_method :strict?,:strict
-
  # @param dict [Dict,:scrape,nil] the {Dict} (dictionary) to use for {Word#defn} (definitions)
  # [+:scrape+] auto-scrape it using {DictScraper}
  # [+nil+] don't scrape/use it
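This release replaces each `attr_accessor :x` + `alias_method :x?,:x` pair with the attr_bool gem's `attr_accessor?`. The sketch below is not attr_bool's API or source. It uses a hypothetical plain-Ruby helper, `bool_accessor`, that bundles the removed two-line pattern into one call, to show what those lines did.

```ruby
# Hypothetical helper, NOT the attr_bool gem: it recreates the removed
# `attr_accessor :x` + `alias_method :x?,:x` pair in a single call.
module BoolAccessorSketch
  def bool_accessor(*names)
    names.each do |name|
      attr_accessor name             # defines x and x=
      alias_method :"#{name}?", name # defines x? returning the same raw value
    end
  end
end

# Hypothetical class mirroring the two Scraper flags from this diff.
class ScraperFlags
  extend BoolAccessorSketch
  bool_accessor :eat_cookie, :is_file

  def initialize(eat_cookie: false, is_file: false)
    @eat_cookie = eat_cookie
    @is_file = is_file
  end
end

flags = ScraperFlags.new(eat_cookie: true)
flags.eat_cookie? # => true
flags.is_file?    # => false
```

As with the old `alias_method` lines, the predicate returns whatever value was assigned; it does not coerce it to `true`/`false`. This diff does not show whether attr_bool's `attr_accessor?` behaves the same way.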
@@ -39,7 +39,7 @@ module CLI
  DEFAULT_SIFT_EXT = :csv
  DEFAULT_SIFT_FUTSUU_FILE = "#{Sifter::DEFAULT_FUTSUU_FILE}{search.criteria}{file.ext}"
  DEFAULT_SIFT_YASASHII_FILE = "#{Sifter::DEFAULT_YASASHII_FILE}{search.criteria}{file.ext}"
- SIFT_EXTS = [:csv,:htm,:html,:yaml,:yml]
+ SIFT_EXTS = [:csv,:htm,:html,:json,:yaml,:yml]
 
  # Order matters.
  SIFT_DATETIME_FMTS = [
@@ -364,6 +364,8 @@ module CLI
  sifter.put_csv!()
  when :htm,:html
  sifter.put_html!()
+ when :json
+ sifter.put_json!()
  when :yaml,:yml
  sifter.put_yaml!()
  else
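SiftCmd routes `:json` to `put_json!()` through an added `when` branch. A table-driven lookup is one possible alternative. Its keys could also serve as the list of valid extensions, so `SIFT_EXTS` and the dispatch cannot drift apart. This is only a hypothetical sketch: `SIFT_PUT_METHODS` and `put_sifted` are invented names, and only the `put_*!` method names come from the diff.

```ruby
# Hypothetical dispatch table; the put_*! names are the Sifter methods in the diff.
SIFT_PUT_METHODS = {
  csv: :put_csv!, htm: :put_html!, html: :put_html!,
  json: :put_json!, yaml: :put_yaml!, yml: :put_yaml!,
}.freeze

# Calls the matching put method on +sifter+, or raises for unknown extensions
# (playing the role of the `else` branch).
def put_sifted(sifter, ext)
  meth = SIFT_PUT_METHODS.fetch(ext.to_sym) do
    raise ArgumentError, "invalid extension: #{ext.inspect}"
  end

  sifter.public_send(meth)
end
```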
@@ -21,6 +21,7 @@
  #++
 
 
+ require 'attr_bool'
  require 'nokogiri'
  require 'open-uri'
 
@@ -40,8 +41,8 @@ module NHKore
  'dnt' => '1',
  }
 
- attr_accessor :eat_cookie
- attr_accessor :is_file
+ attr_accessor? :eat_cookie
+ attr_accessor? :is_file
  attr_reader :kargs
  attr_accessor :max_redirects
  attr_accessor :max_retries
@@ -49,9 +50,6 @@ module NHKore
  attr_accessor :str_or_io
  attr_accessor :url
 
- alias_method :eat_cookie?,:eat_cookie
- alias_method :is_file?,:is_file
-
  # +max_redirects+ defaults to 3 for safety (infinite-loop attack).
  #
  # All URL options: https://ruby-doc.org/stdlib-2.7.0/libdoc/open-uri/rdoc/OpenURI/OpenRead.html
@@ -21,6 +21,7 @@
  #++
 
 
+ require 'attr_bool'
  require 'time'
 
  require 'nhkore/fileable'
@@ -35,13 +36,11 @@ module NHKore
  class SearchLink
  attr_accessor :datetime
  attr_accessor :futsuurl
- attr_accessor :scraped
+ attr_accessor? :scraped
  attr_accessor :sha256
  attr_accessor :title
  attr_accessor :url
 
- alias_method :scraped?,:scraped
-
  def initialize(url,scraped: false)
  super()
 
@@ -60,6 +60,40 @@ module NHKore
  @output = nil
  end
 
+ def build_header()
+ header = []
+
+ header << 'Frequency' unless @ignores[:freq]
+ header << 'Word' unless @ignores[:word]
+ header << 'Kana' unless @ignores[:kana]
+ header << 'English' unless @ignores[:eng]
+ header << 'Definition' unless @ignores[:defn]
+
+ return header
+ end
+
+ def build_rows(words)
+ rows = []
+
+ words.each() do |word|
+ rows << build_word_row(word)
+ end
+
+ return rows
+ end
+
+ def build_word_row(word)
+ row = []
+
+ row << word.freq unless @ignores[:freq]
+ row << word.word unless @ignores[:word]
+ row << word.kana unless @ignores[:kana]
+ row << word.eng unless @ignores[:eng]
+ row << word.defn unless @ignores[:defn]
+
+ return row
+ end
+
  def filter?(article)
  return false if @filters.empty?()
 
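The new `build_header()` and `build_word_row()` keep the column order and the `@ignores` checks in one place, so the CSV, HTML, JSON and YAML outputs stay aligned. The sketch below applies the same idea under two assumptions: a plain `Struct` stands in for NHKore's `Word`, and a single ordered Hash drives both the header and the rows. `WordRow` and `SifterSketch` are hypothetical names, not the gem's code.

```ruby
# Hypothetical stand-in for NHKore's Word: just the fields the sifter reads.
WordRow = Struct.new(:freq, :word, :kana, :eng, :defn)

class SifterSketch
  # Column key => header label, in output order (Hashes keep insertion order).
  COLUMNS = {
    freq: 'Frequency', word: 'Word', kana: 'Kana', eng: 'English', defn: 'Definition',
  }.freeze

  def initialize(*ignored)
    @ignores = ignored.map { |key| [key, true] }.to_h
  end

  def build_header
    active_columns.values
  end

  def build_word_row(word)
    active_columns.keys.map { |key| word.public_send(key) }
  end

  def build_rows(words)
    words.map { |word| build_word_row(word) }
  end

  private

  # Columns not marked as ignored, still in COLUMNS order.
  def active_columns
    COLUMNS.reject { |key, _label| @ignores[key] }
  end
end

sifter = SifterSketch.new(:eng, :defn)
sifter.build_header # => ["Frequency", "Word", "Kana"]
sifter.build_rows([WordRow.new(3, 'w1', 'k1', 'e1', 'd1')]) # => [[3, "w1", "k1"]]
```

Because the header and each row come from the same table, adding a column means editing one place rather than two parallel `unless @ignores[...]` lists.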
@@ -151,26 +185,10 @@ module NHKore
  words = sift()
 
  @output = CSV.generate(headers: :first_row,write_headers: true) do |csv|
- row = []
-
- row << 'Frequency' unless @ignores[:freq]
- row << 'Word' unless @ignores[:word]
- row << 'Kana' unless @ignores[:kana]
- row << 'English' unless @ignores[:eng]
- row << 'Definition' unless @ignores[:defn]
-
- csv << row
+ csv << build_header()
 
  words.each() do |word|
- row = []
-
- row << word.freq unless @ignores[:freq]
- row << word.word unless @ignores[:word]
- row << word.kana unless @ignores[:kana]
- row << word.eng unless @ignores[:eng]
- row << word.defn unless @ignores[:defn]
-
- csv << row
+ csv << build_word_row(word)
  end
  end
 
@@ -232,7 +250,7 @@ module NHKore
  <h2>#{@caption}</h2>
  <table>
  EOH
- #" # Fix for editor
+ #"
 
  # If have too few or too many '<col>', invalid HTML.
  @output << %Q{<col style="width:6em;">\n} unless @ignores[:freq]
@@ -242,20 +260,20 @@ module NHKore
  @output << "<col>\n" unless @ignores[:defn] # No width for defn, fills rest of page
 
  @output << '<tr>'
- @output << '<th>Frequency</th>' unless @ignores[:freq]
- @output << '<th>Word</th>' unless @ignores[:word]
- @output << '<th>Kana</th>' unless @ignores[:kana]
- @output << '<th>English</th>' unless @ignores[:eng]
- @output << '<th>Definition</th>' unless @ignores[:defn]
+
+ build_header().each() do |h|
+ @output << "<th>#{h}</th>"
+ end
+
  @output << "</tr>\n"
 
  words.each() do |word|
  @output << '<tr>'
- @output << "<td>#{Util.escape_html(word.freq.to_s())}</td>" unless @ignores[:freq]
- @output << "<td>#{Util.escape_html(word.word.to_s())}</td>" unless @ignores[:word]
- @output << "<td>#{Util.escape_html(word.kana.to_s())}</td>" unless @ignores[:kana]
- @output << "<td>#{Util.escape_html(word.eng.to_s())}</td>" unless @ignores[:eng]
- @output << "<td>#{Util.escape_html(word.defn.to_s())}</td>" unless @ignores[:defn]
+
+ build_word_row(word).each() do |w|
+ @output << "<td>#{Util.escape_html(w.to_s())}</td>"
+ end
+
  @output << "</tr>\n"
  end
 
@@ -264,31 +282,63 @@ module NHKore
  </body>
  </html>
  EOH
- #/ # Fix for editor
+ #/
 
  return @output
  end
 
- def put_yaml!()
+ def put_json!()
+ require 'json'
+
  words = sift()
 
- # Just blank out ignores.
- if !@ignores.empty?()
- words.each() do |word|
- # word/kanji/kana do not have setters/mutators.
- word.defn = nil if @ignores[:defn]
- word.eng = nil if @ignores[:eng]
- word.freq = nil if @ignores[:freq]
+ @output = ''.dup()
+
+ @output << <<~EOJ
+ {
+ "caption": #{JSON.generate(@caption)},
+ "header": #{JSON.generate(build_header())},
+ "words": [
+ EOJ
+
+ if !words.empty?()
+ 0.upto(words.length - 2) do |i|
+ @output << " #{JSON.generate(build_word_row(words[i]))},\n"
  end
+
+ @output << " #{JSON.generate(build_word_row(words[-1]))}\n"
  end
 
+ @output << "]\n}\n"
+
+ return @output
+ end
+
+ def put_yaml!()
+ require 'psychgus'
+
+ words = sift()
+
  yaml = {
  caption: @caption,
- words: words
+ header: build_header(),
+ words: build_rows(words),
  }
 
+ header_styler = Class.new() do
+ include Psychgus::Styler
+
+ def style_sequence(sniffer,node)
+ parent = sniffer.parent
+
+ if !parent.nil?() && parent.node.respond_to?(:value) && parent.value == 'header'
+ node.style = Psychgus::SEQUENCE_FLOW
+ end
+ end
+ end
+
  # Put each Word on one line (flow/inline style).
- @output = Util.dump_yaml(yaml,flow_level: 4)
+ @output = Util.dump_yaml(yaml,flow_level: 4,stylers: header_styler.new())
 
  return @output
  end
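`put_json!()` builds the JSON by hand so that each word row sits on its own line. To avoid a trailing comma, it loops with `0.upto(words.length - 2)` and then appends the last row separately. The sketch below produces the same layout but gets the commas from `Array#join`, and an empty word list still yields valid JSON. It assumes the header and rows have already been built (e.g., by `build_header()` and `build_rows()`). `sift_json` is a hypothetical name.

```ruby
require 'json'

# Hypothetical helper mirroring put_json!()'s layout: caption, header, then
# one word row per line. +rows+ is an array of already-built word rows.
def sift_json(caption, header, rows)
  out = ''.dup
  out << "{\n"
  out << "\"caption\": #{JSON.generate(caption)},\n"
  out << "\"header\": #{JSON.generate(header)},\n"
  out << "\"words\": [\n"
  # join() places commas only between rows, so no last-row special case.
  out << rows.map { |row| "  #{JSON.generate(row)}" }.join(",\n")
  out << "\n" unless rows.empty?
  out << "]\n}\n"
  out
end

puts sift_json('Easy', %w[Frequency Word], [[3, 'w1'], [1, 'w2']])
```

`JSON.pretty_generate` would also emit valid JSON, but it puts every array element on its own line. The hand-built layout keeps one word per line, which is easier to scan.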
@@ -306,7 +356,7 @@ module NHKore
 
  words = master_article.words.values()
 
- words = words.sort() do |word1,word2|
+ words.sort!() do |word1,word2|
  # Order by freq DESC (most frequent words to top).
  i = (word2.freq <=> word1.freq)
 
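Switching from `words = words.sort() do ... end` to `words.sort!() do ... end` sorts the array in place. `Array#sort!` returns the receiver, and `Hash#values` already returns a new Array, so the underlying hash is not reordered. Below is a sketch with the same freq-DESC comparison. The diff cuts off after that first comparison, so the tiebreaker on the word text is a hypothetical addition, as is the `WordFreq` struct.

```ruby
# Hypothetical word record; the diff's Word has more fields.
WordFreq = Struct.new(:word, :freq)

def sort_by_freq_desc!(words)
  words.sort! do |w1, w2|
    i = (w2.freq <=> w1.freq)           # freq DESC, as in the diff
    i = (w1.word <=> w2.word) if i == 0 # hypothetical tiebreaker (not shown in the diff)
    i
  end
end

words = [WordFreq.new('b', 1), WordFreq.new('a', 3), WordFreq.new('c', 1)]
sort_by_freq_desc!(words)
words.map(&:word) # => ["a", "b", "c"]
```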
@@ -75,17 +75,20 @@ module NHKore
  return domain
  end
 
- def self.dump_yaml(obj,flow_level: 8)
+ def self.dump_yaml(obj,flow_level: 8,stylers: nil)
  require 'psychgus'
 
+ stylers = Array(stylers)
+
  return Psychgus.dump(obj,
  deref_aliases: true, # Dereference aliases for load_yaml()
+ header: true, # %YAML [version]
  line_width: 10000, # Try not to wrap; ichiman!
  stylers: [
  Psychgus::FlowStyler.new(flow_level), # Put extra details on one line (flow/inline style)
  Psychgus::NoSymStyler.new(cap: false), # Remove symbols, don't capitalize
  Psychgus::NoTagStyler.new(), # Remove class names (tags)
- ],
+ ].concat(stylers),
  )
  end
 
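`dump_yaml()` now accepts extra Psychgus stylers and normalizes the argument with `Array(stylers)`, so `nil`, a single styler, or an array all work. `put_yaml!()` uses this to pass a styler that renders the `header` sequence in flow style. Psychgus's API is not reproduced here. Instead, the sketch below gets a similar effect with stdlib Psych only, by editing the parsed node tree before emitting it. `dump_yaml_flow` and `flow_keys` are hypothetical names.

```ruby
require 'psych'

# Dump +data+ to YAML, then switch any sequence stored under one of
# +flow_keys+ to flow style ([a, b, c]). Like dump_yaml's
# `stylers = Array(stylers)`, Array() accepts nil, one key, or many.
def dump_yaml_flow(data, flow_keys: nil)
  flow_keys = Array(flow_keys).map(&:to_s)
  stream = Psych.parse_stream(Psych.dump(data))

  # Psych::Nodes::Node#each walks the whole tree depth-first.
  stream.each do |node|
    next unless node.is_a?(Psych::Nodes::Mapping)

    # Mapping children alternate key, value, key, value, ...
    node.children.each_slice(2) do |key, value|
      next unless key.is_a?(Psych::Nodes::Scalar) && flow_keys.include?(key.value)

      value.style = Psych::Nodes::Sequence::FLOW if value.is_a?(Psych::Nodes::Sequence)
    end
  end

  stream.yaml
end

data = { 'caption' => 'Easy', 'header' => %w[Frequency Word], 'words' => [[3, 'w1']] }
puts dump_yaml_flow(data, flow_keys: 'header')
```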
@@ -22,5 +22,5 @@
 
 
  module NHKore
- VERSION = '0.3.2'
+ VERSION = '0.3.3'
  end
@@ -59,6 +59,7 @@ Gem::Specification.new() do |spec|
 
  spec.requirements << 'Nokogiri: https://www.nokogiri.org/tutorials/installing_nokogiri.html'
 
+ spec.add_runtime_dependency 'attr_bool' ,'~> 0.1' # For attr_accessor?/attr_reader?
  spec.add_runtime_dependency 'bimyou_segmenter' ,'~> 1.2' # For splitting Japanese sentences into words
  spec.add_runtime_dependency 'cri' ,'~> 2.15' # For CLI commands/options
  spec.add_runtime_dependency 'down' ,'~> 5.1' # For downloading files (GetCmd)
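The new runtime dependency uses the pessimistic constraint `'~> 0.1'`, which RubyGems reads as `>= 0.1` and `< 1.0`. The snippet below checks a few versions with RubyGems' own `Gem::Requirement`. `allowed?` is a hypothetical helper.

```ruby
require 'rubygems'

# Hypothetical helper: true if +version+ satisfies the gemspec-style +constraint+.
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

allowed?('~> 0.1', '0.1.0') # => true
allowed?('~> 0.1', '0.9.3') # => true
allowed?('~> 0.1', '1.0.0') # => false
allowed?('~> 1.2', '1.9')   # => true  (bimyou_segmenter's constraint)
allowed?('~> 1.2', '2.0')   # => false
```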
metadata CHANGED
@@ -1,15 +1,29 @@
  --- !ruby/object:Gem::Specification
  name: nhkore
  version: !ruby/object:Gem::Version
- version: 0.3.2
+ version: 0.3.3
  platform: ruby
  authors:
  - Jonathan Bradley Whited (@esotericpig)
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-04-21 00:00:00.000000000 Z
+ date: 2020-04-23 00:00:00.000000000 Z
  dependencies:
+ - !ruby/object:Gem::Dependency
+ name: attr_bool
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.1'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '0.1'
  - !ruby/object:Gem::Dependency
  name: bimyou_segmenter
  requirement: !ruby/object:Gem::Requirement
@@ -375,7 +389,7 @@ metadata:
  changelog_uri: https://github.com/esotericpig/nhkore/blob/master/CHANGELOG.md
  homepage_uri: https://github.com/esotericpig/nhkore
  source_code_uri: https://github.com/esotericpig/nhkore
- post_install_message: " \n NHKore v0.3.2\n \n You can now use [nhkore] on the
+ post_install_message: " \n NHKore v0.3.3\n \n You can now use [nhkore] on the
  command line.\n \n Homepage: https://github.com/esotericpig/nhkore\n \n Code:
  \ https://github.com/esotericpig/nhkore\n Changelog: https://github.com/esotericpig/nhkore/blob/master/CHANGELOG.md\n
  \ Bugs: https://github.com/esotericpig/nhkore/issues\n \n"