yamd 0.0.2 → 0.0.3

Files changed (5)
  1. checksums.yaml +4 -4
  2. data/bin/yamd +3 -0
  3. data/lib/yamd.rb +51 -5
  4. data/lib/yamd/fakku.rb +47 -0
  5. metadata +54 -12
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 37713855cb84a57329e8ff6cc81a41221d5658a7
- data.tar.gz: c6daab7709db44037bbcb4071733871948ee5ae7
+ metadata.gz: a7cfa4dadcceadf400f49b0d0fc2e353b9a474a7
+ data.tar.gz: ae7c9eb249ca40a34515832316ca20069addd234
  SHA512:
- metadata.gz: 1c6a49b2ee093ccf8f08b63ee41962ef7344d8b2cbaa129462e6033c3e2ac2c27f816138fac68f64fe8761b4d9e02cae821bdfca365d242ff8b1199a38ca0691
- data.tar.gz: 6a8adc74ffa5268c582709bea2438de86208628bff54132a81de8e396402c717df05d60af7b583b675c349286b0eedbca14d59fefc8c6f6a718a2fbe78a9ec7f
+ metadata.gz: 13e0fd6911898fe1eed2b82ef27e55bdda1cbc4df5020694bdbf5236145a5490a1d518e0bb4a1716977644bfcb8e3ce4774bcc6257a76459a8d08a96b1b26a3b
+ data.tar.gz: b9b3c932f1313b3d6909c88cb292a3be96f3d22b3246c8dc190853152b9897845ecc4621e12d46d2180f1af709c854d4743c17c1a60dfd33b093a7bc38f09725
data/bin/yamd CHANGED
@@ -2,6 +2,7 @@
 
  require 'yamd/mangahere'
  require 'yamd/mangafox'
+ require 'yamd/fakku'
 
  unless ARGV.size > 0
  puts 'USAGE: yamd <manga main page url>'
@@ -15,6 +16,8 @@ if /mangafox/.match(manga_main_page_url)
  manga = MangafoxCrawler.new(manga_main_page_url)
  elsif /mangahere/.match(manga_main_page_url)
  manga = MangahereCrawler.new(manga_main_page_url)
+ elsif /fakku/.match(manga_main_page_url)
+ manga = FakkuCrawler.new(manga_main_page_url)
  else
  puts "The argument (#{manga_main_page_url}) doesn't seem to be a URL of one of the supported sites."
  end
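
The executable change above only adds a third branch to the URL-based dispatch. A minimal sketch of the same selection logic, assuming the gem's crawler classes are loaded; the crawler_for helper and the URL are illustrative, not part of the gem:

    require 'yamd/mangafox'
    require 'yamd/mangahere'
    require 'yamd/fakku'

    # Mirror bin/yamd's if/elsif chain: pick a crawler class from the URL.
    def crawler_for(url)
      case url
      when /mangafox/  then MangafoxCrawler
      when /mangahere/ then MangahereCrawler
      when /fakku/     then FakkuCrawler
      end
    end

    url = 'https://www.fakku.net/hentai/some-title'   # illustrative URL
    crawler_class = crawler_for(url)
    manga = crawler_class.new(url) if crawler_class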
data/lib/yamd.rb CHANGED
@@ -3,6 +3,23 @@ require 'open-uri'
  require 'addressable/uri'
  require 'pathname'
 
+ require 'capybara'
+ require 'capybara/poltergeist'
+
+ Capybara.register_driver(:poltergeist) do | app |
+ Capybara::Poltergeist::Driver.new(app, js_errors: false)
+ end
+
+ Capybara.default_driver = :poltergeist
+ Capybara.run_server = false
+ $internet = Capybara.current_session
+
+ def my_open(url)
+ $internet.visit url
+
+ $internet.html
+ end
+
  class PageCrawler
  attr_reader :custom_data, :url, :parsed_html, :number, :chapter
 
@@ -42,7 +59,7 @@ class ChapterCrawler
  Enumerator.new do | yielder |
  number = 1
  pages_info.each do | page_info |
- parsed_html = Nokogiri::HTML(open(page_info[:url]))
+ parsed_html = Nokogiri::HTML(my_open(page_info[:url]))
  yielder.yield self.class.page_class.new(page_info, parsed_html, number, self)
  number += 1
  end
@@ -70,7 +87,7 @@ class MangaCrawler
  Enumerator.new do | yielder |
  number = 1
  chapters_info.each do | chapter_info |
- page = Nokogiri::HTML(open(chapter_info[:url]))
+ page = Nokogiri::HTML(my_open(chapter_info[:url]))
  yielder.yield self.class.chapter_class.new(chapter_info, page, number, self)
  number += 1
  end
@@ -91,22 +108,43 @@ class ImageDownloader
  @base_dir = base_dir
  end
 
+ # TODO: Many, many things:
+ # * Add a hash parameter with the parallelization options.
+ # * What parallelization options should exist? Parallelize chapters? Parallelize pages independently of chapters? Chapters within a sliding window? Pages within a sliding window?
+ # * Add the retryable gem to all the IO actions, INCLUDING THE ABSTRACT CLASSES ABOVE.
+ # * Avoid an error with one page or chapter stopping the whole download. Log the work of the algorithm and all the failures in a file inside the manga directory, so the user can review the problems easily.
+ # * Good and bad points of the parallelization options:
+ # * Chapter - start a thread for each chapter; download the pages of each chapter sequentially.
+ # * Good: The easiest to implement. For the average manga the granularity is good: between 10~100 threads of 19~45 pages each.
+ # * Bad: If things go bad, they go BAD. It's possible that every chapter ends up with missing pages; in that case the best option is to remove everything and start over. Doesn't work for unending shounens (One Piece, in truth ~800 pieces of 19 pages each).
+ # * Chapters within a window - start N threads and put them in a queue; wait for the first to finish, then add a new chapter at the end of the queue and wait again (a rough sketch of this option follows this file's diff).
+ # * Good: Not very complex to implement. If things go bad we remove only the last N chapters, not everything; with a log we may need to remove even fewer. Works for unending shounens.
+ # * Bad: Adds a variable to be hardcoded or received. More complex than simply parallelizing every chapter.
+ # * Pages (or Chapters and Pages) - start a thread for every chapter, then a thread for every page of the chapter. Simple, isn't it?
+ # * Good: If no other option has eaten all of your bandwidth, this one will either freeze your computer or give you the best result. Easy to implement.
+ # * Bad: Have you ever seen chaos? This is it. If something fails and you haven't checked the log, you will discover missing pages in the middle of chapters. The granularity is also bad: there's a lot of thread-creation overhead for little work. It will also almost surely freeze your computer if the manga is big and your bandwidth and processing power are small.
+ # * Pages within a window - the same as chapters within a window, but with pages instead of chapters.
+ # * Good: Not very complex to implement. Works well for mangas where the uploader has compressed an entire volume into a single chapter and there's only one volume. If things go bad, you only need to delete the last N pages; if N is 40, for example, and it's a shounen with no fewer than 19 pages per chapter, delete the last 3 chapters.
+ # * Bad: Bad granularity. Not so bad if your bandwidth is small, but it will probably cost a lot of CPU for little benefit.
  def download(manga)
- manga_dir = Pathname.new(@base_dir).join(manga.name + '/')
+ manga_name = self.class.sanitize_dir_name(manga.name)
+ manga_dir = Pathname.new(@base_dir).join(manga_name + '/')
  if manga_dir.exist?
  p 'Manga dir exists. Skipping each existing chapter. If the script was forced to stop the last downloaded chapter can be incomplete. Remove it to be downloaded again.'
  else
  Dir.mkdir(manga_dir.to_s)
  end
  manga.chapters.each do | chapter |
- chapter_dir = manga_dir.join(chapter.name + '/')
+ chapter_name = self.class.sanitize_dir_name(chapter.name)
+ chapter_dir = manga_dir.join(chapter_name + '/')
  unless chapter_dir.exist?
  Dir.mkdir(chapter_dir.to_s)
  chapter.pages.each do | page |
  page_name = self.class.format_page_name(page, chapter, manga)
  page_abs_path = chapter_dir.join(page_name).to_s
  File.open(page_abs_path, 'wb') do | f |
- open(page.image_url, 'rb') do | image |
+ safe_uri = URI.encode(page.image_url, '[]')
+ open(safe_uri, 'rb') do | image |
  f.write(image.read)
  end
  end
@@ -120,5 +158,13 @@ class ImageDownloader
  page_path = Addressable::URI.parse(page.image_url).path
  format("%04d", page.number) + File.extname(page_path)
  end
+
+ # TODO: check if all text from the site that's used to make dirs or
+ # files is sanitized
+ # thanks to "Ranma 1/2"
+ def self.sanitize_dir_name(name)
+ # TODO: this is a hack, find a serious solution for every possible case
+ name.gsub(/\//, '_')
+ end
  end
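
The switch from open to my_open routes every page fetch through Capybara with the Poltergeist (PhantomJS) driver, presumably so pages that build their content with JavaScript still hand usable HTML to Nokogiri. Separately, the "Chapters within a window" option from the TODO above could look roughly like the sketch below; download_chapter is a hypothetical helper standing in for the per-chapter body of ImageDownloader#download, and the default window size of 4 is arbitrary:

    # Keep at most `window` chapter downloads in flight; when the head of the
    # queue finishes, start the next chapter, as described in the TODO comment.
    def download_chapters_windowed(chapters, window = 4)
      pending = chapters.to_a
      running = []
      until pending.empty? && running.empty?
        # Top the window up with new chapter threads.
        while running.size < window && !pending.empty?
          running << Thread.new(pending.shift) { |chapter| download_chapter(chapter) }
        end
        # Wait for the oldest thread to finish, then refill the window.
        running.shift.join
      end
    end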
 
data/lib/yamd/fakku.rb ADDED
@@ -0,0 +1,47 @@
+ require 'yamd'
+ require 'addressable/uri'
+
+ class FakkuPage < PageCrawler
+ def image_url
+ @parsed_html.at_css('img.current-page')['src']
+ end
+ end
+
+ class FakkuChapter < ChapterCrawler
+ def self.page_class
+ FakkuPage
+ end
+
+ def pages_info
+ # there's no need for a lazy enumerator here, no IO action is taken
+ page_options = @parsed_html.at_css('div#content select.drop').css('option')
+ pages_number = page_options.map { | option | option['value'].to_i }.max
+
+ page_urls = (1..pages_number).to_a.map do | i |
+ { url: self.url + "#page=#{i}" }
+ end
+
+ page_urls
+ end
+
+ def name
+ @custom_data[:name]
+ end
+ end
+
+ class FakkuCrawler < MangaCrawler
+ def chapters_info
+ url = URI.join(self.url, @parsed_html.at_css('a.button.green')['href'])
+ [{ name: 'OnlyChapter',
+ url: url }]
+ end
+
+ def self.chapter_class
+ FakkuChapter
+ end
+
+ def name
+ @parsed_html.at_css('div.content-name h1').text
+ end
+ end
+
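
Wiring the new crawler to the downloader from lib/yamd.rb looks roughly like this. The URL and target directory are illustrative, and ImageDownloader is assumed to take the base directory in its constructor, as the @base_dir assignment in lib/yamd.rb suggests:

    require 'yamd/fakku'   # also pulls in 'yamd' itself

    manga = FakkuCrawler.new('https://www.fakku.net/hentai/some-title')  # illustrative URL
    downloader = ImageDownloader.new('/tmp/manga/')                      # illustrative base dir
    downloader.download(manga)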
metadata CHANGED
@@ -1,43 +1,85 @@
  --- !ruby/object:Gem::Specification
  name: yamd
  version: !ruby/object:Gem::Version
- version: 0.0.2
+ version: 0.0.3
  platform: ruby
  authors:
  - Henrique Becker
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-12-17 00:00:00.000000000 Z
+ date: 2015-12-25 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: nokogiri
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - "~>"
  - !ruby/object:Gem::Version
  version: '1.5'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - "~>"
  - !ruby/object:Gem::Version
  version: '1.5'
  - !ruby/object:Gem::Dependency
  name: addressable
  requirement: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - "~>"
  - !ruby/object:Gem::Version
  version: '2.3'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - ~>
+ - - "~>"
  - !ruby/object:Gem::Version
  version: '2.3'
+ - !ruby/object:Gem::Dependency
+ name: capybara
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '2.5'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '2.5'
+ - !ruby/object:Gem::Dependency
+ name: poltergeist
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.8'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.8'
+ - !ruby/object:Gem::Dependency
+ name: phantomjs
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.9'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - "~>"
+ - !ruby/object:Gem::Version
+ version: '1.9'
  description: 'This gem offers: classes to subclass and create a manga site crawler;
  a dowloader to use with these classes; some site-specific scripts.'
  email: henriquebecker91@gmail.com
@@ -46,10 +88,11 @@ executables:
  extensions: []
  extra_rdoc_files: []
  files:
+ - bin/yamd
+ - lib/yamd.rb
+ - lib/yamd/fakku.rb
  - lib/yamd/mangafox.rb
  - lib/yamd/mangahere.rb
- - lib/yamd.rb
- - bin/yamd
  homepage: http://rubygems.org/gems/yamd
  licenses:
  - Public domain
@@ -60,20 +103,19 @@ require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.0.3
+ rubygems_version: 2.4.5.1
  signing_key:
  specification_version: 4
  summary: YAMD (Yet Another Manga Downloader) - A lazy interface for writting manga
  downloaders
  test_files: []
- has_rdoc: true
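
The dependency entries above correspond to gemspec declarations roughly like the following sketch. The actual yamd.gemspec is not part of this diff; only the gem names and version constraints are taken from the metadata:

    Gem::Specification.new do |s|
      s.name    = 'yamd'
      s.version = '0.0.3'
      # Runtime dependencies listed in the metadata above.
      s.add_runtime_dependency 'nokogiri',    '~> 1.5'
      s.add_runtime_dependency 'addressable', '~> 2.3'
      s.add_runtime_dependency 'capybara',    '~> 2.5'
      s.add_runtime_dependency 'poltergeist', '~> 1.8'
      s.add_runtime_dependency 'phantomjs',   '~> 1.9'
    end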