panchira 0.2.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: f2715a3395e43d5ad43f35bedae84dbfe25a4cd533f964cbcc4cdaf953bc0c4b
4
- data.tar.gz: '059e23e1ca4831bc58c62a4a7ccd4ed87010fee75b7e8997560fe49f43486f01'
3
+ metadata.gz: 066440e461b75b84a9df04fd76f1960243785b26bc7f4c61289029248e0a8bd9
4
+ data.tar.gz: 1fc1f712c6a8d88363cf3c4162be2681e08631c515ffbe6631fba3fd204b91c0
5
5
  SHA512:
6
- metadata.gz: e5ed936514fec2e05dfcaeb727189d1bcc6328e1a27559bd925acba7dc3037c26c57c99fece2c88bad95c7d0d7ae7ffd6840f9e33dde58aef81db81ae600d829
7
- data.tar.gz: 8383db6bdc9c78e2e845651e7206d702f5a8566475b8161c9e464364da7b6aa9c5f9886125771c636e0465e8eec7f1ee1dda3c0865a1f0d478131510451c4a74
6
+ metadata.gz: 63a914d286eaf909f4a2ab7c128f3725a96a6badbac71a878362e4a09a4e29f720f1f81fab2fa4b1f0ddeb513fac04b5c00597132012f5dbe42d783f54b221b2
7
+ data.tar.gz: af6085627c05532b7019a7134da472329c52b0f61b3329079694a2f59115e52f1c7b0bc0acc2c9cc3ea19814a33c3e2cd9116fcd7f692278e2150de7874bb424
@@ -4,6 +4,46 @@ All notable changes to this project will be documented in this file.
4
4
  The format is based on [Keep a Changelog](http://keepachangelog.com/)
5
5
  and this project adheres to [Semantic Versioning](http://semver.org/).
6
6
 
7
+ ## 1.2.0 - 2020-10-31
8
+ ### Added
9
+ - You can now fetch author and circle name in resolvers (Resolver#fetch_author, Resolver#fetch_circle).
10
+
11
+ ### Changed
12
+ - Resolver#fetch_title returns the title of the content (not the original title of the page).
13
+
14
+ ## 1.1.1 - 2020-08-09
15
+ ### Added
16
+ - Added support for Fanza Doujin.
17
+ - Added support for description in Fanza Book.
18
+
19
+ ### Fixed
20
+ - Fixed an issue where fetching images did not work in Fanza Book.
21
+
22
+ ## 1.1.0 - 2020-08-06
23
+ ### Added
24
+ - Added support for Fanza Books.
25
+ - Added support for direct links to an image.
26
+ - You can now set cookie by overriding Resolver#cookie in individual resolvers.
27
+
28
+ ### Changed
29
+ - Resolver::USER_AGENT changed to Resolver#user_agent.
30
+
31
+ ## 1.0.0 - 2020-06-23
32
+ ### Added
33
+ - Added support for tags.
34
+
35
+ ### Fixed
36
+ - Fixed some outdated documents.
37
+
38
+ ## 0.3.0 - 2020-06-04
39
+ ### Added
40
+ - You can now register and use your own Resolver with this gem. (see Panchira::Extensions#register)
41
+ - Added support for new Twitter UI.
42
+
43
+ ### Changed
44
+ - Panchira::fetch now returns an instance of PanchiraResult instead of a hash.
45
+ - Changed default User-Agent slightly.
46
+
7
47
  ## 0.2.0 - 2020-05-18
8
48
  ### Added
9
49
  - Added support for Shousetsuka Ni Narou (novel18.syosetu.com).
@@ -18,6 +58,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
18
58
  ### Added
19
59
  - Released Panchira gem. At this time we can parse only 5 websites.
20
60
 
61
+ [1.1.0]: https://github.com/nuita/panchira/releases/tag/v1.1.0
62
+ [1.0.0]: https://github.com/nuita/panchira/releases/tag/v1.0.0
63
+ [0.3.0]: https://github.com/nuita/panchira/releases/tag/v0.3.0
21
64
  [0.2.0]: https://github.com/nuita/panchira/releases/tag/v0.2.0
22
65
  [0.1.1]: https://github.com/nuita/panchira/releases/tag/v0.1.1
23
66
  [0.1.0]: https://github.com/nuita/panchira/releases/tag/v0.1.0
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- panchira (0.2.0)
4
+ panchira (1.2.0)
5
5
  fastimage (~> 2.1.7)
6
6
  nokogiri (~> 1.10.9)
7
7
 
@@ -10,8 +10,8 @@ GEM
10
10
  specs:
11
11
  fastimage (2.1.7)
12
12
  mini_portile2 (2.4.0)
13
- minitest (5.14.0)
14
- nokogiri (1.10.9)
13
+ minitest (5.14.2)
14
+ nokogiri (1.10.10)
15
15
  mini_portile2 (~> 2.4.0)
16
16
  rake (12.3.3)
17
17
 
data/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
 
7
7
  Due to some legal or ethical issues, most hentai and NSFW platforms don't clarify their content on meta tags. As a result, most hentai platforms are rendered poorly on the card previews on social media.
8
8
 
9
- To solve this issue, Panchira is made to parse correct and uncensored metadata from such web platforms (at this time we cover **DLSite, Komiflo, Melonbooks, Nijie and Pixiv**).
9
+ To solve this issue, Panchira is made to parse correct and uncensored metadata from such web platforms (at this time we cover **DLSite, Komiflo, Melonbooks, Nijie, Pixiv, Shousetsuka ni narou, Fanza and Twitter**).
10
10
 
11
11
  If you need card previews of hentai on your web application, but can't get them with simply parsing metatags, then it is time for Panchira.
12
12
 
@@ -16,7 +16,7 @@ This gem is derived from the [Nuita](https://github.com/nuita/nuita) project.
16
16
 
17
17
  **Please use this gem with appropriate censoring and age-restricting. Never violate local laws and copyrights.**
18
18
 
19
- If you are running one of the websites we cover and feel negative about it, please contact the community or the author([@kypkyp](https://github.com/kypkyp)).
19
+ If you are running one of the websites we cover and feel negative about this gem, please contact the community or the author([@kypkyp](https://github.com/kypkyp)).
20
20
 
21
21
  ## Installation
22
22
 
@@ -39,10 +39,12 @@ Or install it yourself as:
39
39
  ```
40
40
  > Panchira.fetch("https://www.pixiv.net/artworks/61711172")
41
41
 
42
- => {:canonical_url=>"https://pixiv.net/member_illust.php?mode=medium&illust_id=61711172", :title=>"#輿水幸子 すずしい顔で締め切りを破る幸子 - むらためのイラスト - pixiv", :description=>"(UTF16の)Pietで実行すると「すずしい」と出力する幸子(5色+白Pietカラーゴルフ)。解説記事は http://chy72.hatenablog.com/entry/2016/12/24/1", :image=>{:url=>"https://pixiv.cat/61711172.jpg", :width=>810, :height=>500}}
42
+ => #<Panchira::PanchiraResult:0x00007fb95d2c53f8 @canonical_url="https://pixiv.net/member_illust.php?mode=medium&illust_id=61711172", @title="#輿水幸子 すずしい顔で締め切りを破る幸子 - むらためのイラスト - pixiv", @description="(UTF16の)Pietで実行すると「すずしい」と出力する幸子(5色+白Pietカラーゴルフ)。解説記事は http://chy72.hatenablog.com/entry/2016/12/24/1", @image=#<Panchira::PanchiraImage:0x00007fb95f126ea0 @url="https://pixiv.cat/61711172.jpg", @width=810, @height=500>, @tags=["輿水幸子", "Piet", "プログラミング"]>
43
43
  ```
44
44
 
45
- Panchira is in beta at this time and doesn't have stable API documentation yet.
45
+ In most situations you would call `Panchira.fetch`. It is a singleton method that takes a URL and returns an instance of `PanchiraResult`, a simple class that stores the page's information, such as its title and description.
46
+
47
+ Panchira handles each website individually. This site-specific handling lives in `Resolver` classes, and you can use your own `Resolver` class by registering it with Panchira. See the `Panchira::Extensions` documentation in the source code for further details.
46
48
 
47
49
  ## Development
48
50
 
@@ -6,16 +6,21 @@ require 'fastimage'
6
6
  require 'json'
7
7
 
8
8
  require_relative 'panchira/version'
9
+ require_relative 'panchira/panchira_result'
9
10
  require_relative 'panchira/resolvers/resolver'
10
11
  require_relative 'panchira/extensions'
11
12
 
12
13
  project_root = File.dirname(File.absolute_path(__FILE__))
13
14
  Dir.glob(project_root + '/panchira/resolvers/*_resolver.rb').sort.each { |file| require file }
14
15
 
16
+ # Register the fallback ImageResolver last (resolvers are selected in registration order).
17
+ ::Panchira::Extensions.register(Panchira::ImageResolver)
18
+
15
19
  # Main Panchira code goes here.
20
+ # If you simply want to get data from your URL, then ::Panchira::fetch() will do.
16
21
  module Panchira
17
22
  class << self
18
- # Fetch the given URL and returns a hash that contains attributes of hentai.
23
+ # Return a PanchiraResult that contains the attributes of the given URL.
19
24
  def fetch(url)
20
25
  resolver = select_resolver(url)
21
26
 
@@ -1,15 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Panchira
4
+ # This Module manages Resolver classes.
5
+ # To enable your own Resolver, you need to call Extensions::register().
4
6
  module Extensions
5
7
  @resolvers = []
6
8
 
7
9
  class << self
8
- # Register a resolver class which extends Panchira::Resolver.
10
+ # Register the given Resolver in Extensions.resolvers.
9
11
  def register(resolver)
10
12
  @resolvers.push(resolver) unless @resolvers.include?(resolver)
11
13
  end
12
14
 
15
+ # Panchira::fetch will find a correct Resolver based on this list.
13
16
  attr_reader :resolvers
14
17
  end
15
18
  end
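As a rough illustration of how registration order can drive resolver selection, the sketch below re-creates the registry in plain Ruby, without the gem. `DemoExtensions`, `DemoResolver`, `DemoSiteResolver`, `DemoImageResolver`, and `select_resolver` are hypothetical stand-ins, and the "first applicable resolver wins" rule is an assumption based on the comments in this diff; the real selection code is not shown here.

```ruby
# Hypothetical stand-in for Panchira::Extensions: an ordered, duplicate-free list.
module DemoExtensions
  @resolvers = []

  class << self
    attr_reader :resolvers

    def register(resolver)
      @resolvers.push(resolver) unless @resolvers.include?(resolver)
    end

    # Assumed selection rule: the first registered resolver whose pattern matches wins.
    def select_resolver(url)
      @resolvers.find { |resolver| resolver.applicable?(url) }
    end
  end
end

# Hypothetical base class mirroring Resolver.applicable?.
class DemoResolver
  URL_REGEXP = %r{\Ahttps?://}.freeze

  def self.applicable?(url)
    url.match?(self::URL_REGEXP)
  end
end

class DemoSiteResolver < DemoResolver
  URL_REGEXP = %r{example\.com/works/\d+}.freeze
end

class DemoImageResolver < DemoResolver
  URL_REGEXP = /\.(png|gif|jpg|jpeg|webp)$/.freeze
end

DemoExtensions.register(DemoSiteResolver)
DemoExtensions.register(DemoImageResolver)
DemoExtensions.register(DemoSiteResolver) # ignored: already registered

# Both patterns match this URL; the site resolver was registered first.
puts DemoExtensions.select_resolver('https://example.com/works/1.png')
```

Registering the catch-all image resolver last keeps it from shadowing site-specific resolvers whose URLs happen to end in an image extension.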
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Panchira
4
+ # Image attributes in PanchiraResult.
5
+ class PanchiraImage
6
+ attr_accessor :url, :width, :height
7
+ end
8
+
9
+ # Result class for Panchira.fetch.
10
+ class PanchiraResult
11
+ attr_accessor :canonical_url, :title, :description, :image, :tags, :author, :circle
12
+ end
13
+ end
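Because `Panchira.fetch` now returns a `PanchiraResult` instead of a hash, callers read fields through accessors. The sketch below re-declares the two classes locally so it runs without the gem; `result_to_h` is a hypothetical helper (not part of the gem) for code that still expects the old hash shape, and all values are made up.

```ruby
# Local copies of the two classes from this diff, so the sketch runs without the gem.
class PanchiraImage
  attr_accessor :url, :width, :height
end

class PanchiraResult
  attr_accessor :canonical_url, :title, :description, :image, :tags, :author, :circle
end

# Hypothetical helper: rebuild the pre-0.3.0 hash shape from a result object.
def result_to_h(result)
  image = result.image
  {
    canonical_url: result.canonical_url,
    title: result.title,
    description: result.description,
    image: image && { url: image.url, width: image.width, height: image.height }
  }
end

# Made-up sample values.
result = PanchiraResult.new
result.canonical_url = 'https://example.com/works/1'
result.title = 'Sample title'
result.tags = %w[sample demo]
result.image = PanchiraImage.new
result.image.url = 'https://example.com/works/1.jpg'

puts result.title
puts result_to_h(result)[:image][:url]
```

Attributes that a resolver never sets simply stay `nil`, so consumers should be prepared for missing fields.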
@@ -6,9 +6,39 @@ module Panchira
6
6
 
7
7
  private
8
8
 
9
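+ # The value inside [] in a DLSite title is the circle name or publisher name in the title tag,
10
+ # but in og:title, which Panchira prefers, it is the circle name or author name.
11
+ # Fetching has to account for the following three patterns, so both the title tag and the table must be parsed:
12
+ # 1) Some doujin works, especially audio: Title[circle name]. No author or creator is listed in the page body.
13
+ # 2) Some doujin works, especially some doujinshi: Title[circle name]. A "作者" (creator) entry is listed in the page body.
14
+ # 3) Commercial works: Title[author name]. No circle name.
15
+ # The implementation has become convoluted, so parse itself may need to be reworked.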
16
+ def parse_title
17
+ @title_md = super.match(/(.+) \[(\S+)\] \|.+/)
18
+ @title_md[1]
19
+ end
20
+
21
+ def parse_author
22
+ @page.css('table[id*="work_"] tr').each do |tr|
23
+ if tr.css('th').text =~ /(作|著)者/
24
+ return @author = tr.css('td > a').first.text.strip
25
+ end
26
+ end
27
+
28
+ @author = nil
29
+ end
30
+
31
+ def parse_circle
32
+ @title_md[2] if @author != @title_md[2]
33
+ end
34
+
9
35
  def parse_image_url
10
36
  @page.css('//meta[property="og:image"]/@content').first.to_s.sub(/sam/, 'main')
11
37
  end
38
+
39
+ def parse_tags
40
+ @page.css('.main_genre').children.children.map(&:text)
41
+ end
12
42
  end
13
43
 
14
44
  ::Panchira::Extensions.register(Panchira::DlsiteResolver)
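To make the title pattern concrete, here is a small sketch that applies the same regular expression as `parse_title` to invented title strings. `split_title` and the sample titles are hypothetical; real DLSite titles may be formatted differently.

```ruby
# Same pattern as parse_title in the diff above.
TITLE_PATTERN = /(.+) \[(\S+)\] \|.+/.freeze

# Hypothetical helper: returns [title, bracketed_name], falling back to the
# raw string when the pattern does not match.
def split_title(raw_title)
  md = raw_title.match(TITLE_PATTERN)
  return [raw_title, nil] if md.nil?

  [md[1], md[2]]
end

p split_title('Sample Work [SampleCircle] | Example Store')
p split_title('Sample Work [Two Words] | Example Store') # \S+ rejects the space
```

Note that `\S+` does not match a bracketed name containing whitespace. In that case `match` returns `nil`, which is why the sketch adds a fallback rather than indexing the result directly.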
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'net/https'
4
+
5
+ module Panchira
6
+ module Fanza
7
+ FANZA_COOKIE = 'age_check_done=1;'
8
+
9
+ class FanzaResolver < Resolver
10
+ private
11
+
12
+ def cookie
13
+ ::Panchira::Fanza::FANZA_COOKIE
14
+ end
15
+ end
16
+
17
+ class FanzaBookResolver < FanzaResolver
18
+ URL_REGEXP = %r{book\.dmm\.co\.jp\/}.freeze
19
+
20
+ private
21
+
22
+ def parse_author
23
+ @page.css('.m-boxDetailProductInfoMainList__description__list__item > a').first&.text.to_s
24
+ end
25
+
26
+ def parse_image_url
27
+ @page.css('.m-imgDetailProductPack/@src').first.to_s
28
+ end
29
+
30
+ def parse_tags
31
+ @page.css('.m-boxDetailProductInfo__list__description__item > a').map(&:text)
32
+ end
33
+
34
+ def parse_description
35
+ @page.css('.m-boxDetailProduct__info__story').first&.text.to_s.gsub(/[\n\t]/, '')
36
+ end
37
+ end
38
+
39
+ class FanzaDoujinResolver < FanzaResolver
40
+ URL_REGEXP = %r{dmm\.co\.jp\/dc\/doujin\/}.freeze
41
+
42
+ private
43
+
44
+ def parse_circle
45
+ @page.css('a.circleName__txt').first.text
46
+ end
47
+
48
+ def parse_tags
49
+ @page.css('.genreTag__item').map { |t| t.text.strip }
50
+ end
51
+ end
52
+ end
53
+
54
+ ::Panchira::Extensions.register(Panchira::Fanza::FanzaBookResolver)
55
+ ::Panchira::Extensions.register(Panchira::Fanza::FanzaDoujinResolver)
56
+ end
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Panchira
4
+ class ImageResolver < Resolver
5
+ URL_REGEXP = %r{\.(png|gif|jpg|jpeg|webp)$}.freeze
6
+
7
+ def fetch
8
+ result = PanchiraResult.new
9
+ result.canonical_url = @url
10
+ result.image = PanchiraImage.new
11
+ result.image.url = @url
12
+ result
13
+ end
14
+ end
15
+ end
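The following sketch probes which URLs the image pattern accepts. It uses the same regular expression as `ImageResolver::URL_REGEXP` under a local, hypothetical constant name, with made-up URLs.

```ruby
# Same pattern as ImageResolver::URL_REGEXP, under a local name.
IMAGE_URL_REGEXP = /\.(png|gif|jpg|jpeg|webp)$/.freeze

urls = [
  'https://example.com/a.png',            # plain image link
  'https://example.com/a.png?size=large', # query string after the extension
  'https://example.com/a.PNG',            # upper-case extension
  'https://example.com/gallery'           # no extension
]

urls.each { |url| puts "#{url}: #{url.match?(IMAGE_URL_REGEXP)}" }
```

Only the first URL matches: the pattern is case-sensitive and anchored to the end of the line, so links with query strings or upper-case extensions fall through to other resolvers.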
@@ -10,33 +10,36 @@ module Panchira
10
10
  @url = url
11
11
 
12
12
  @id = url.slice(URL_REGEXP, 1)
13
- raw_json = URI.parse("https://api.komiflo.com/content/id/#{@id}").read('User-Agent' => USER_AGENT)
13
+ raw_json = URI.parse("https://api.komiflo.com/content/id/#{@id}").read('User-Agent' => user_agent)
14
14
  @json = JSON.parse(raw_json)
15
15
  end
16
16
 
17
17
  private
18
18
 
19
19
  def parse_title
20
- comic_title = @json['content']['data']['title']
21
- "#{comic_title} | Komiflo"
20
+ @json['content']['data']['title']
22
21
  end
23
22
 
24
23
  def parse_image_url
25
24
  'https://t.komiflo.com/564_mobile_large_3x/' + @json['content']['named_imgs']['cover']['filename']
26
25
  end
27
26
 
28
- def parse_description
29
- author = @json['content']['attributes']['artists']['children'][0]['data']['name']
27
+ def parse_author
28
+ @json['content']['attributes']['artists']['children'][0]['data']['name']
29
+ end
30
30
 
31
- parent = @json['content']['parents'][0]['data']['title']
32
- description = '著: ' + author if author
33
- description += " / #{parent}" if parent
31
+ def parse_description
32
+ @json['content']['parents'][0]['data']['title']
34
33
  end
35
34
 
36
35
  def parse_canonical_url
37
36
  id = @url.slice(%r{komiflo\.com(?:/#!)?/comics/(\d+)}, 1)
38
37
  'https://komiflo.com/comics/' + id
39
38
  end
39
+
40
+ def parse_tags
41
+ @json['content']['attributes']['tags']['children'].map { |content| content['data']['name'] }
42
+ end
40
43
  end
41
44
 
42
45
  ::Panchira::Extensions.register(Panchira::KomifloResolver)
@@ -4,8 +4,41 @@ module Panchira
4
4
  class MelonbooksResolver < Resolver
5
5
  URL_REGEXP = %r{melonbooks.co.jp/detail/detail.php\?product_id=(\d+)}.freeze
6
6
 
7
+ def fetch
8
+ result = PanchiraResult.new
9
+
10
+ @page = fetch_page(@url)
11
+ result.canonical_url = parse_canonical_url
12
+
13
+ @page = fetch_page(result.canonical_url) if @url != result.canonical_url
14
+
15
+ result.title, result.author, result.circle = parse_table
16
+ result.description = parse_description
17
+ result.image = parse_image
18
+ result.tags = parse_tags
19
+
20
+ result
21
+ end
22
+
7
23
  private
8
24
 
25
+ def parse_table
26
+ title, author, circle = nil, nil, nil
27
+
28
+ @page.css('#description > table.stripe > tr').each do |tr|
29
+ case tr.css('th').text
30
+ when 'タイトル'
31
+ title = tr.css('td').text.strip
32
+ when 'サークル名'
33
+ circle = tr.css('td > a').text[/^(.+)\W\(作品数:/, 1]
34
+ when '作家名'
35
+ author = tr.css('td > a').text.strip
36
+ end
37
+ end
38
+
39
+ [title, author, circle]
40
+ end
41
+
9
42
  def parse_canonical_url
10
43
  product_id = @url.slice(URL_REGEXP, 1)
11
44
  'https://www.melonbooks.co.jp/detail/detail.php?product_id=' + product_id + '&adult_view=1'
@@ -25,6 +58,10 @@ module Panchira
25
58
  def parse_image_url
26
59
  @page.css('//meta[property="og:image"]/@content').first.to_s.sub(/&c=1/, '')
27
60
  end
61
+
62
+ def parse_tags
63
+ @page.css('#related_tags .clearfix').children.children.map(&:text)
64
+ end
28
65
  end
29
66
 
30
67
  ::Panchira::Extensions.register(Panchira::MelonbooksResolver)
@@ -3,18 +3,61 @@
3
3
  require 'net/https'
4
4
 
5
5
  module Panchira
6
- class NarouResolver < Resolver
7
- URL_REGEXP = %r{novel18\.syosetu\.com/}.freeze
6
+ module Narou
7
+ class Novel18Resolver < Resolver
8
+ URL_REGEXP = %r{novel18\.syosetu\.com/}.freeze
9
+ ID_REGEXP = %r{novel18\.syosetu\.com/(?<id>[^/]+)}.freeze
8
10
 
9
- def fetch_page(uri)
10
- u = URI.parse(uri)
11
- http = Net::HTTP.new(u.host, u.port)
12
- http.use_ssl = u.port == 443
13
- res = http.get u.request_uri, { 'cookie' => 'over18=yes;' }
11
+ def initialize(url)
12
+ super(url)
14
13
 
15
- Nokogiri::HTML.parse(res.body, uri)
14
+ if id = @url.match(ID_REGEXP)[:id]
15
+ @desc = fetch_page("https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/")
16
+ end
17
+ end
18
+
19
+ def fetch_page(uri)
20
+ u = URI.parse(uri)
21
+ http = Net::HTTP.new(u.host, u.port)
22
+ http.use_ssl = u.port == 443
23
+ res = http.get u.request_uri, { 'cookie' => 'over18=yes;' }
24
+
25
+ Nokogiri::HTML.parse(res.body, uri)
26
+ end
27
+
28
+ def parse_author
29
+ @desc&.xpath('//*[@id="noveltable1"]/tr[2]/td')&.text&.strip
30
+ end
31
+
32
+ def parse_tags
33
+ # Painful.
34
+ @desc&.xpath('//*[@id="noveltable1"]/tr[3]')&.text&.split("\n\n\n")&.dig(1)&.split(' ')
35
+ end
36
+ end
37
+
38
+ class NcodeResolver < Resolver
39
+ URL_REGEXP = /ncode\.syosetu\.com/.freeze
40
+ ID_REGEXP = %r{ncode\.syosetu\.com/(?<id>[^/]+)}.freeze
41
+
42
+ def initialize(url)
43
+ super(url)
44
+
45
+ if id = @url.match(ID_REGEXP)[:id]
46
+ @desc = fetch_page("https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/")
47
+ end
48
+ end
49
+
50
+ def parse_author
51
+ @desc&.xpath('//*[@id="noveltable1"]/tr[2]/td')&.text&.strip
52
+ end
53
+
54
+ def parse_tags
55
+ # Really painful.
56
+ @desc&.xpath('//*[@id="noveltable1"]/tr[3]')&.text&.split("\n\n\n")&.dig(1)&.delete("\u00A0")&.split(' ')&.grep_v('')
57
+ end
16
58
  end
17
59
  end
18
60
 
19
- ::Panchira::Extensions.register(Panchira::NarouResolver)
61
+ ::Panchira::Extensions.register(Panchira::Narou::NcodeResolver)
62
+ ::Panchira::Extensions.register(Panchira::Narou::Novel18Resolver)
20
63
  end
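The ID extraction used by the Narou resolvers can be sketched in isolation as follows. The pattern is written as a regexp literal (`%r{...}`) so that `\.` matches a literal dot; `extract_ncode`, `infotop_url`, and the work code `n0000xx` are hypothetical names and values chosen for illustration.

```ruby
# Regexp-literal form of the ID pattern; the named group captures the work code.
NOVEL18_ID_REGEXP = %r{novel18\.syosetu\.com/(?<id>[^/]+)}.freeze

# Hypothetical helper: returns the work code, or nil when the URL has none.
def extract_ncode(url)
  url.match(NOVEL18_ID_REGEXP)&.[](:id)
end

# Hypothetical helper: builds the info page URL, or nil when no code is found.
def infotop_url(url)
  id = extract_ncode(url)
  id && "https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/"
end

puts infotop_url('https://novel18.syosetu.com/n0000xx/1/')
p infotop_url('https://novel18.syosetu.com/') # no work code in the path
```

Guarding the `match` result with `&.` avoids a `NoMethodError` for URLs that satisfy the host pattern but carry no work code.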
@@ -6,6 +6,21 @@ module Panchira
6
6
 
7
7
  private
8
8
 
9
+ def parse_title
10
+ full_title = super
11
+ @md = full_title.match(/\A(?<title>.+) \| (?<author>.+)\z/)
12
+
13
+ @md[:title]
14
+ end
15
+
16
+ def parse_author
17
+ @md[:author]
18
+ end
19
+
20
+ def parse_description
21
+ @page.css('p.illust_description')&.first&.text&.strip
22
+ end
23
+
9
24
  def parse_canonical_url
10
25
  @url.sub(/sp.nijie/, 'nijie').sub(/view_popup/, 'view')
11
26
  end
@@ -24,6 +39,10 @@ module Panchira
24
39
  @page.css('//meta[property="og:image"]/@content').first.to_s
25
40
  end
26
41
  end
42
+
43
+ def parse_tags
44
+ @page.css('#view-tag span.tag_name').map(&:text)
45
+ end
27
46
  end
28
47
 
29
48
  ::Panchira::Extensions.register(Panchira::NijieResolver)
@@ -7,10 +7,21 @@ module Panchira
7
7
  def initialize(url)
8
8
  super(url)
9
9
  @illust_id = url.slice(URL_REGEXP, 2)
10
+
11
+ raw_json = URI.parse("https://www.pixiv.net/ajax/illust/#{@illust_id}").read('User-Agent' => user_agent)
12
+ @json = JSON.parse(raw_json)
10
13
  end
11
14
 
12
15
  private
13
16
 
17
+ def parse_title
18
+ @json['body']['title']
19
+ end
20
+
21
+ def parse_author
22
+ @json['body']['userName']
23
+ end
24
+
14
25
  def parse_canonical_url
15
26
  'https://pixiv.net/member_illust.php?mode=medium&illust_id=' + @illust_id
16
27
  end
@@ -27,6 +38,10 @@ module Panchira
27
38
  rescue StandardError
28
39
  @page.css('//meta[property="og:image"]/@content').first.to_s
29
40
  end
41
+
42
+ def parse_tags
43
+ @json['body']['tags']['tags'].map { |content| content['tag'] }
44
+ end
30
45
  end
31
46
 
32
47
  ::Panchira::Extensions.register(Panchira::PixivResolver)
@@ -1,39 +1,43 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- # Resolver is a class that actually GET url and resolve attributes.
4
- # This class is the default resolver for pages,
5
- # and is inherited by the other resolvers.
6
3
  module Panchira
4
+ # Resolver is a class that actually gets attributes by fetching the designated URL.
5
+ # This class is the default resolver for pages. <br>
6
+ # To create your own resolver, first you make a class that extends Resolver,
7
+ # and then register it by ::Panchira::Extensions::register().
8
+ # Then ::Panchira::fetch will pick up your resolver when Resolver::applicable?() is true.
7
9
  class Resolver
8
- # The URL pattern that this resolver tries to resolve.
9
- # Should be redefined in subclasses.
10
+ # URL pattern that a resolver tries to resolve.
11
+ # You must override this in subclasses to limit which urls to resolve.
10
12
  URL_REGEXP = URI::DEFAULT_PARSER.make_regexp
11
13
 
12
- USER_AGENT = "Mozilla/5.0 (compatible; Panchira/#{VERSION}; +https://github.com/nuita/panchira)"
13
-
14
14
  def initialize(url)
15
15
  @url = url
16
16
  end
17
17
 
18
+ # This function is called right after this Resolver instance is made.
19
+ # Fetch page from @url and return PanchiraResult.
18
20
  def fetch
19
- attributes = {}
21
+ result = PanchiraResult.new
20
22
 
21
23
  @page = fetch_page(@url)
22
- attributes[:canonical_url] = parse_canonical_url
24
+ result.canonical_url = parse_canonical_url
23
25
 
24
- if @url != attributes[:canonical_url]
25
- @page = fetch_page(attributes[:canonical_url])
26
- end
26
+ @page = fetch_page(result.canonical_url) if @url != result.canonical_url
27
27
 
28
- attributes[:title] = parse_title
29
- attributes[:description] = parse_description
30
- attributes[:image] = parse_image
28
+ result.title = parse_title
29
+ result.description = parse_description
30
+ result.image = parse_image
31
+ result.tags = parse_tags
32
+ result.author = parse_author
33
+ result.circle = parse_circle
31
34
 
32
- attributes
35
+ result
33
36
  end
34
37
 
35
38
  class << self
36
39
  # Tell whether the url is applicable for this resolver.
40
+ # ::Panchira::fetch uses this method to choose a Resolver for a URL.
37
41
  def applicable?(url)
38
42
  url =~ self::URL_REGEXP
39
43
  end
@@ -42,16 +46,33 @@ module Panchira
42
46
  private
43
47
 
44
48
  def fetch_page(url)
45
- raw_page = URI.parse(url).read('User-Agent' => USER_AGENT)
49
+ read_options = {
50
+ 'User-Agent' => user_agent,
51
+ 'Cookie' => cookie
52
+ }
53
+
54
+ raw_page = URI.parse(url).read(read_options)
46
55
  charset = raw_page.charset
47
56
  Nokogiri::HTML.parse(raw_page, url, charset)
48
57
  end
49
58
 
50
59
  def parse_canonical_url
51
- if (canonical_url = @page.css('//link[rel="canonical"]/@href')).any?
52
- canonical_url.to_s
53
- else
54
- @url
60
+ history = []
61
+
62
+ # fetch page and refresh canonical_url until canonical_url converges.
63
+ loop do
64
+ url_in_res = @page.css('//link[rel="canonical"]/@href').to_s
65
+
66
+ if url_in_res.empty?
67
+ return history.last || @url
68
+ else
69
+ if history.include?(url_in_res) || history.length > 5
70
+ return url_in_res
71
+ else
72
+ history.push(url_in_res)
73
+ @page = fetch_page(url_in_res)
74
+ end
75
+ end
55
76
  end
56
77
  end
57
78
 
@@ -72,9 +93,9 @@ module Panchira
72
93
  end
73
94
 
74
95
  def parse_image
75
- image = {}
76
- image[:url] = parse_image_url
77
- image[:width], image[:height] = FastImage.size(image[:url])
96
+ image = PanchiraImage.new
97
+ image.url = parse_image_url
98
+ image.width, image.height = FastImage.size(image.url)
78
99
 
79
100
  image
80
101
  end
@@ -82,5 +103,25 @@ module Panchira
82
103
  def parse_image_url
83
104
  @page.css('//meta[property="og:image"]/@content').first.to_s
84
105
  end
106
+
107
+ def parse_tags
108
+ []
109
+ end
110
+
111
+ def cookie
112
+ ''
113
+ end
114
+
115
+ def parse_author
116
+ @page.css('//meta[name="author"]/@content').first.to_s
117
+ end
118
+
119
+ def parse_circle
120
+ nil
121
+ end
122
+
123
+ def user_agent
124
+ "Mozilla/5.0 (compatible; PanchiraBot/#{VERSION}; +https://github.com/nuita/panchira)"
125
+ end
85
126
  end
86
127
  end
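The canonical URL loop in `parse_canonical_url` keeps following `<link rel="canonical">` until the value stops changing, disappears, or the history grows too long. The sketch below mimics that control flow with an in-memory table instead of HTTP requests; `CANONICAL_LINKS`, `declared_canonical`, `resolve_canonical`, and the `max_hops` parameter are hypothetical stand-ins.

```ruby
# Made-up link graph: each URL maps to the canonical URL its page declares
# (nil means the page has no <link rel="canonical">).
CANONICAL_LINKS = {
  'https://example.com/a' => 'https://example.com/b',
  'https://example.com/b' => 'https://example.com/c',
  'https://example.com/c' => 'https://example.com/c',
  'https://example.com/plain' => nil
}.freeze

# Hypothetical stand-in for fetching a page and reading its canonical link.
def declared_canonical(url)
  CANONICAL_LINKS[url].to_s
end

# Follows canonical links until they repeat, run out, or exceed the hop limit.
def resolve_canonical(start_url, max_hops: 5)
  history = []
  current = start_url

  loop do
    declared = declared_canonical(current)
    return history.last || start_url if declared.empty?
    return declared if history.include?(declared) || history.length > max_hops

    history.push(declared)
    current = declared
  end
end

puts resolve_canonical('https://example.com/a')
puts resolve_canonical('https://example.com/plain')
```

The history check ends the loop both when a page names itself as canonical and when two pages point at each other, while the hop limit bounds the number of fetches for long chains.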
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Panchira
4
- VERSION = '0.2.0'
4
+ VERSION = '1.2.0'
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: panchira
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 1.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - kyp
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2020-05-18 00:00:00.000000000 Z
11
+ date: 2020-10-31 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -101,7 +101,10 @@ files:
101
101
  - bin/setup
102
102
  - lib/panchira.rb
103
103
  - lib/panchira/extensions.rb
104
+ - lib/panchira/panchira_result.rb
104
105
  - lib/panchira/resolvers/dlsite_resolver.rb
106
+ - lib/panchira/resolvers/fanza_resolver.rb
107
+ - lib/panchira/resolvers/image_resolver.rb
105
108
  - lib/panchira/resolvers/komiflo_resolver.rb
106
109
  - lib/panchira/resolvers/melonbooks_resolver.rb
107
110
  - lib/panchira/resolvers/narou_resolver.rb