panchira 0.2.0 → 1.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: f2715a3395e43d5ad43f35bedae84dbfe25a4cd533f964cbcc4cdaf953bc0c4b
4
- data.tar.gz: '059e23e1ca4831bc58c62a4a7ccd4ed87010fee75b7e8997560fe49f43486f01'
3
+ metadata.gz: 066440e461b75b84a9df04fd76f1960243785b26bc7f4c61289029248e0a8bd9
4
+ data.tar.gz: 1fc1f712c6a8d88363cf3c4162be2681e08631c515ffbe6631fba3fd204b91c0
5
5
  SHA512:
6
- metadata.gz: e5ed936514fec2e05dfcaeb727189d1bcc6328e1a27559bd925acba7dc3037c26c57c99fece2c88bad95c7d0d7ae7ffd6840f9e33dde58aef81db81ae600d829
7
- data.tar.gz: 8383db6bdc9c78e2e845651e7206d702f5a8566475b8161c9e464364da7b6aa9c5f9886125771c636e0465e8eec7f1ee1dda3c0865a1f0d478131510451c4a74
6
+ metadata.gz: 63a914d286eaf909f4a2ab7c128f3725a96a6badbac71a878362e4a09a4e29f720f1f81fab2fa4b1f0ddeb513fac04b5c00597132012f5dbe42d783f54b221b2
7
+ data.tar.gz: af6085627c05532b7019a7134da472329c52b0f61b3329079694a2f59115e52f1c7b0bc0acc2c9cc3ea19814a33c3e2cd9116fcd7f692278e2150de7874bb424
@@ -4,6 +4,46 @@ All notable changes to this project will be documented in this file.
4
4
  The format is based on [Keep a Changelog](http://keepachangelog.com/)
5
5
  and this project adheres to [Semantic Versioning](http://semver.org/).
6
6
 
7
+ ## 1.2.0 - 2020-10-31
8
+ ### Added
9
+ - You can now fetch author and circle name in resolvers (Resolver#fetch_author, Resolver#fetch_circle).
10
+
11
+ ### Changed
12
+ - Resolver#fetch_title returns the title of the content (not the original title of the page).
13
+
14
+ ## 1.1.1 - 2020-08-09
15
+ ### Added
16
+ - Added support for Fanza Doujin.
17
+ - Added support for description in Fanza Book.
18
+
19
+ ### Fixed
20
+ - Fixed an issue that fetching image was not working in Fanza Book.
21
+
22
+ ## 1.1.0 - 2020-08-06
23
+ ### Added
24
+ - Added support for Fanza Books.
25
+ - Added support for direct links to an image.
26
+ - You can now set cookie by overriding Resolver#cookie in individual resolvers.
27
+
28
+ ### Changed
29
+ - Resolver::USER_AGENT changed to Resolver#user_agent.
30
+
31
+ ## 1.0.0 - 2020-06-23
32
+ ### Added
33
+ - Added support for tags.
34
+
35
+ ### Fixed
36
+ - Fixed some outdated documents.
37
+
38
+ ## 0.3.0 - 2020-06-04
39
+ ### Added
40
+ - You can now register and use your own Resolver with this gem. (see Panchira::Extensions#register)
41
+ - Added support for new Twitter UI.
42
+
43
+ ### Changed
44
+ - Panchira::fetch now returns an instance of PanchiraResult instead of a hash.
45
+ - Changed default User-Agent slightly.
46
+
7
47
  ## 0.2.0 - 2020-05-18
8
48
  ### Added
9
49
  - Added support for Shousetsuka Ni Narou (novel18.syosetu.com).
@@ -18,6 +58,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
18
58
  ### Added
19
59
  - Released Panchira gem. At this time we can parse only 5 websites.
20
60
 
61
+ [1.1.0]: https://github.com/nuita/panchira/releases/tag/v1.1.0
62
+ [1.0.0]: https://github.com/nuita/panchira/releases/tag/v1.0.0
63
+ [0.3.0]: https://github.com/nuita/panchira/releases/tag/v0.3.0
21
64
  [0.2.0]: https://github.com/nuita/panchira/releases/tag/v0.2.0
22
65
  [0.1.1]: https://github.com/nuita/panchira/releases/tag/v0.1.1
23
66
  [0.1.0]: https://github.com/nuita/panchira/releases/tag/v0.1.0
@@ -1,7 +1,7 @@
1
1
  PATH
2
2
  remote: .
3
3
  specs:
4
- panchira (0.2.0)
4
+ panchira (1.2.0)
5
5
  fastimage (~> 2.1.7)
6
6
  nokogiri (~> 1.10.9)
7
7
 
@@ -10,8 +10,8 @@ GEM
10
10
  specs:
11
11
  fastimage (2.1.7)
12
12
  mini_portile2 (2.4.0)
13
- minitest (5.14.0)
14
- nokogiri (1.10.9)
13
+ minitest (5.14.2)
14
+ nokogiri (1.10.10)
15
15
  mini_portile2 (~> 2.4.0)
16
16
  rake (12.3.3)
17
17
 
data/README.md CHANGED
@@ -6,7 +6,7 @@
6
6
 
7
7
  Due to some legal or ethical issues, most hentai and NSFW platforms don't clarify their content on meta tags. As a result, most hentai platforms are rendered poorly on the card previews on social media.
8
8
 
9
- To solve this issue, Panchira is made to parse correct and uncensored metadata from such web platforms (at this time we cover **DLSite, Komiflo, Melonbooks, Nijie and Pixiv**).
9
+ To solve this issue, Panchira is made to parse correct and uncensored metadata from such web platforms (at this time we cover **DLSite, Komiflo, Melonbooks, Nijie, Pixiv, Shousetsuka ni narou, Fanza and Twitter**).
10
10
 
11
11
  If you need card previews of hentai on your web application, but can't get them with simply parsing metatags, then it is time for Panchira.
12
12
 
@@ -16,7 +16,7 @@ This gem is derived from the [Nuita](https://github.com/nuita/nuita) project.
16
16
 
17
17
  **Please use this gem with appropriate censoring and age-restricting. Never violate local laws and copyrights.**
18
18
 
19
- If you are running one of the websites we cover and feel negative about it, please contact the community or the author([@kypkyp](https://github.com/kypkyp)).
19
+ If you are running one of the websites we cover and feel negative about this gem, please contact the community or the author([@kypkyp](https://github.com/kypkyp)).
20
20
 
21
21
  ## Installation
22
22
 
@@ -39,10 +39,12 @@ Or install it yourself as:
39
39
  ```
40
40
  > Panchira.fetch("https://www.pixiv.net/artworks/61711172")
41
41
 
42
- => {:canonical_url=>"https://pixiv.net/member_illust.php?mode=medium&illust_id=61711172", :title=>"#輿水幸子 すずしい顔で締め切りを破る幸子 - むらためのイラスト - pixiv", :description=>"(UTF16の)Pietで実行すると「すずしい」と出力する幸子(5色+白Pietカラーゴルフ)。解説記事は http://chy72.hatenablog.com/entry/2016/12/24/1", :image=>{:url=>"https://pixiv.cat/61711172.jpg", :width=>810, :height=>500}}
42
+ => #<Panchira::PanchiraResult:0x00007fb95d2c53f8 @canonical_url="https://pixiv.net/member_illust.php?mode=medium&illust_id=61711172", @title="#輿水幸子 すずしい顔で締め切りを破る幸子 - むらためのイラスト - pixiv", @description="(UTF16の)Pietで実行すると「すずしい」と出力する幸子(5色+白Pietカラーゴルフ)。解説記事は http://chy72.hatenablog.com/entry/2016/12/24/1", @image=#<Panchira::PanchiraImage:0x00007fb95f126ea0 @url="https://pixiv.cat/61711172.jpg", @width=810, @height=500>, @tags=["輿水幸子", "Piet", "プログラミング"]>
43
43
  ```
44
44
 
45
- Panchira is in beta at this time and doesn't have stable API documentation yet.
45
+ In most situation you would call `Panchira#fetch`. It is a singular method that takes a URI and returns an instance of `PanchiraResult`, which is a simple class that stores the website's information, such as title, description and so on.
46
+
47
+ Panchira has a special treatment for each website. `Resolver` classes are where those treatments take place, and you can use your own `Resolver` classes by registering it to Panchira. See `Panchira::Extensions` documentation in source code for further details.
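The registration flow described in the README paragraph above can be sketched without the gem. This is a minimal illustration, not the gem's code: `RegistrySketchResolver`, `ExampleSiteResolver`, `DirectImageResolver`, `SketchExtensions` and the `example.com` URLs are hypothetical names, and the "first applicable resolver wins" rule is an assumption based on the registration-order comment in `lib/panchira.rb`.

```ruby
# Hypothetical stand-ins for Panchira::Resolver and Panchira::Extensions,
# reduced to the parts that decide which resolver handles a URL.
class RegistrySketchResolver
  # Default pattern: accept anything that looks like an http(s) URL.
  URL_REGEXP = %r{\Ahttps?://}.freeze

  # Subclasses only redefine URL_REGEXP; self:: picks up their constant.
  def self.applicable?(url)
    url.match?(self::URL_REGEXP)
  end
end

class ExampleSiteResolver < RegistrySketchResolver
  # example.com is a placeholder host, not a site the gem covers.
  URL_REGEXP = %r{example\.com/works/\d+}.freeze
end

class DirectImageResolver < RegistrySketchResolver
  URL_REGEXP = /\.(png|gif|jpg|jpeg|webp)$/.freeze
end

module SketchExtensions
  @resolvers = []

  class << self
    attr_reader :resolvers

    # Append once and keep insertion order.
    def register(resolver)
      @resolvers.push(resolver) unless @resolvers.include?(resolver)
    end

    # Assumption: the first applicable resolver wins, else the default one.
    def select(url)
      @resolvers.find { |r| r.applicable?(url) } || RegistrySketchResolver
    end
  end
end

SketchExtensions.register(ExampleSiteResolver)
SketchExtensions.register(DirectImageResolver)
SketchExtensions.register(ExampleSiteResolver) # duplicate, ignored

puts SketchExtensions.select('https://example.com/works/42')
puts SketchExtensions.select('https://cdn.example.org/cover.png')
puts SketchExtensions.select('https://example.org/about')
```

Because selection follows registration order in this sketch, a catch-all resolver such as the image fallback has to be registered last, or it shadows the more specific ones.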
46
48
 
47
49
  ## Development
48
50
 
data/lib/panchira.rb CHANGED
@@ -6,16 +6,21 @@ require 'fastimage'
6
6
  require 'json'
7
7
 
8
8
  require_relative 'panchira/version'
9
+ require_relative 'panchira/panchira_result'
9
10
  require_relative 'panchira/resolvers/resolver'
10
11
  require_relative 'panchira/extensions'
11
12
 
12
13
  project_root = File.dirname(File.absolute_path(__FILE__))
13
14
  Dir.glob(project_root + '/panchira/resolvers/*_resolver.rb').sort.each { |file| require file }
14
15
 
16
+ # register fallback ImageResolver at the end. (resolver is selected by registration order)
17
+ ::Panchira::Extensions.register(Panchira::ImageResolver)
18
+
15
19
  # Main Panchira code goes here.
20
+ # If you simply want to get data from your URL, then ::Panchira::fetch() will do.
16
21
  module Panchira
17
22
  class << self
18
- # Fetch the given URL and returns a hash that contains attributes of hentai.
23
+ # Return a PanchiraResult that contains the attributes of given url.
19
24
  def fetch(url)
20
25
  resolver = select_resolver(url)
21
26
 
data/lib/panchira/extensions.rb CHANGED
@@ -1,15 +1,18 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Panchira
4
+ # This Module manages Resolver classes.
5
+ # To enable your own Resolver, you need to call Extensions::register().
4
6
  module Extensions
5
7
  @resolvers = []
6
8
 
7
9
  class << self
8
- # Register a resolver class which extends Panchira::Resolver.
10
+ # Register a given Resolver to Extensions::Resolvers.
9
11
  def register(resolver)
10
12
  @resolvers.push(resolver) unless @resolvers.include?(resolver)
11
13
  end
12
14
 
15
+ # Panchira::fetch will find a correct Resolver based on this list.
13
16
  attr_reader :resolvers
14
17
  end
15
18
  end
data/lib/panchira/panchira_result.rb ADDED
@@ -0,0 +1,13 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Panchira
4
+ # Image attributes in PanchiraResult.
5
+ class PanchiraImage
6
+ attr_accessor :url, :width, :height
7
+ end
8
+
9
+ # Result class for Panchira.fetch.
10
+ class PanchiraResult
11
+ attr_accessor :canonical_url, :title, :description, :image, :tags, :author, :circle
12
+ end
13
+ end
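For callers written against 0.2.0, the practical change is that attributes are read through accessor methods instead of hash keys. The sketch below copies the two class definitions so it runs without the gem. `legacy_hash` is a hypothetical helper that is not part of Panchira, and the sample values are made up.

```ruby
# Copies of the two classes from panchira_result.rb, so the sketch runs standalone.
class PanchiraImage
  attr_accessor :url, :width, :height
end

class PanchiraResult
  attr_accessor :canonical_url, :title, :description, :image, :tags, :author, :circle
end

# Hypothetical helper (not part of the gem): rebuild the 0.2.0-style hash
# for callers that still index the result with symbols.
def legacy_hash(result)
  {
    canonical_url: result.canonical_url,
    title: result.title,
    description: result.description,
    image: result.image && {
      url: result.image.url, width: result.image.width, height: result.image.height
    }
  }
end

# Hand-built result standing in for what Panchira.fetch would return.
image = PanchiraImage.new
image.url = 'https://example.com/cover.jpg'
image.width = 810
image.height = 500

result = PanchiraResult.new
result.title = 'Sample title'
result.image = image
result.tags = %w[sample tags]

puts result.title                 # 0.3.0 and later: reader methods
puts legacy_hash(result)[:title]  # 0.2.0 style: hash lookup
```

Attributes that a resolver does not set simply read as `nil`, since the classes are plain `attr_accessor` containers.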
data/lib/panchira/resolvers/dlsite_resolver.rb CHANGED
@@ -6,9 +6,39 @@ module Panchira
6
6
 
7
7
  private
8
8
 
9
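+ # The value inside [] in a DLSite title is the circle name or publisher name in the title tag,
10
+ # but in og:title, which Panchira prefers, it is the circle name or author name.
11
+ # Fetching has to account for the following three patterns, so both the title tag and the table must be parsed:
12
+ # 1) Some doujin works, especially audio. Title[circle name]. No author listed in the page body.
13
+ # 2) Some doujin works, especially certain doujinshi. Title[circle name]. An author (作者) is listed in the page body.
14
+ # 3) Commercial works. Title[author name]. No circle name.
15
+ # The implementation has become convoluted, so parse itself may need reworking.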
16
+ def parse_title
17
+ @title_md = super.match(/(.+) \[(\S+)\] \|.+/)
18
+ @title_md[1]
19
+ end
20
+
21
+ def parse_author
22
+ @page.css('table[id*="work_"] tr').each do |tr|
23
+ if tr.css('th').text =~ /(作|著)者/
24
+ return @author = tr.css('td > a').first.text.strip
25
+ end
26
+ end
27
+
28
+ @author = nil
29
+ end
30
+
31
+ def parse_circle
32
+ @title_md[2] if @author != @title_md[2]
33
+ end
34
+
9
35
  def parse_image_url
10
36
  @page.css('//meta[property="og:image"]/@content').first.to_s.sub(/sam/, 'main')
11
37
  end
38
+
39
+ def parse_tags
40
+ @page.css('.main_genre').children.children.map(&:text)
41
+ end
12
42
  end
13
43
 
14
44
  ::Panchira::Extensions.register(Panchira::DlsiteResolver)
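The title and circle handling added to the DLsite resolver can be tried on plain strings. This is a sketch under assumptions: the sample titles are invented, `split_dlsite_title` is a hypothetical function, and the `nil` guard for non-matching titles is an addition that the resolver itself does not have.

```ruby
# Same pattern as DlsiteResolver#parse_title: "<title> [<bracket>] |<suffix>".
TITLE_PATTERN = /(.+) \[(\S+)\] \|.+/.freeze

# Returns [title, circle]. The bracket value is treated as the circle name
# unless it equals the author found elsewhere on the page (commercial works).
def split_dlsite_title(page_title, author)
  md = page_title.match(TITLE_PATTERN)
  return [page_title, nil] unless md # guard added for this sketch only

  circle = md[2] == author ? nil : md[2]
  [md[1], circle]
end

p split_dlsite_title('Sample Voice Work [SampleCircle] | DLsite', nil)
p split_dlsite_title('Sample Comic [SampleAuthor] | DLsite', 'SampleAuthor')
p split_dlsite_title('Sample Work [Two Words] | DLsite', nil)
```

The third call shows a limit of the pattern: `\S+` does not match a bracket value containing whitespace, so such a title falls through to the guard.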
data/lib/panchira/resolvers/fanza_resolver.rb ADDED
@@ -0,0 +1,56 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'net/https'
4
+
5
+ module Panchira
6
+ module Fanza
7
+ FANZA_COOKIE = 'age_check_done=1;'
8
+
9
+ class FanzaResolver < Resolver
10
+ private
11
+
12
+ def cookie
13
+ ::Panchira::Fanza::FANZA_COOKIE
14
+ end
15
+ end
16
+
17
+ class FanzaBookResolver < FanzaResolver
18
+ URL_REGEXP = %r{book\.dmm\.co\.jp\/}.freeze
19
+
20
+ private
21
+
22
+ def parse_author
23
+ @page.css('.m-boxDetailProductInfoMainList__description__list__item > a').first&.text.to_s
24
+ end
25
+
26
+ def parse_image_url
27
+ @page.css('.m-imgDetailProductPack/@src').first.to_s
28
+ end
29
+
30
+ def parse_tags
31
+ @page.css('.m-boxDetailProductInfo__list__description__item > a').map(&:text)
32
+ end
33
+
34
+ def parse_description
35
+ @page.css('.m-boxDetailProduct__info__story').first&.text.to_s.gsub(/[\n\t]/, '')
36
+ end
37
+ end
38
+
39
+ class FanzaDoujinResolver < FanzaResolver
40
+ URL_REGEXP = %r{dmm\.co\.jp\/dc\/doujin\/}.freeze
41
+
42
+ private
43
+
44
+ def parse_circle
45
+ @page.css('a.circleName__txt').first.text
46
+ end
47
+
48
+ def parse_tags
49
+ @page.css('.genreTag__item').map { |t| t.text.strip }
50
+ end
51
+ end
52
+ end
53
+
54
+ ::Panchira::Extensions.register(Panchira::Fanza::FanzaBookResolver)
55
+ ::Panchira::Extensions.register(Panchira::Fanza::FanzaDoujinResolver)
56
+ end
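The Fanza resolvers rely on the cookie hook introduced in 1.1.0: the base class builds the request headers from two overridable methods, and a subclass replaces only the one it needs. A minimal sketch of that pattern follows. `HeaderSketchResolver`, `AgeGatedSketchResolver`, `request_headers` and the user-agent string are hypothetical, and no HTTP request is made.

```ruby
class HeaderSketchResolver
  # Builds the header hash from the two hooks, as Resolver#fetch_page does
  # before handing it to URI#read.
  def request_headers
    { 'User-Agent' => user_agent, 'Cookie' => cookie }
  end

  private

  def user_agent
    'Mozilla/5.0 (compatible; SketchBot/0.0; +https://example.com/bot)'
  end

  # Default: no cookie. Subclasses override this for age-gated sites.
  def cookie
    ''
  end
end

class AgeGatedSketchResolver < HeaderSketchResolver
  private

  def cookie
    'age_check_done=1;'
  end
end

p HeaderSketchResolver.new.request_headers['Cookie']
p AgeGatedSketchResolver.new.request_headers['Cookie']
```

Turning the former `USER_AGENT` constant into a method follows the same idea: a method can be overridden per resolver, while a constant referenced from the base class cannot.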
data/lib/panchira/resolvers/image_resolver.rb ADDED
@@ -0,0 +1,15 @@
1
+ # frozen_string_literal: true
2
+
3
+ module Panchira
4
+ class ImageResolver < Resolver
5
+ URL_REGEXP = %r{\.(png|gif|jpg|jpeg|webp)$}.freeze
6
+
7
+ def fetch
8
+ result = PanchiraResult.new
9
+ result.canonical_url = @url
10
+ result.image = PanchiraImage.new
11
+ result.image.url = @url
12
+ result
13
+ end
14
+ end
15
+ end
data/lib/panchira/resolvers/komiflo_resolver.rb CHANGED
@@ -10,33 +10,36 @@ module Panchira
10
10
  @url = url
11
11
 
12
12
  @id = url.slice(URL_REGEXP, 1)
13
- raw_json = URI.parse("https://api.komiflo.com/content/id/#{@id}").read('User-Agent' => USER_AGENT)
13
+ raw_json = URI.parse("https://api.komiflo.com/content/id/#{@id}").read('User-Agent' => user_agent)
14
14
  @json = JSON.parse(raw_json)
15
15
  end
16
16
 
17
17
  private
18
18
 
19
19
  def parse_title
20
- comic_title = @json['content']['data']['title']
21
- "#{comic_title} | Komiflo"
20
+ @json['content']['data']['title']
22
21
  end
23
22
 
24
23
  def parse_image_url
25
24
  'https://t.komiflo.com/564_mobile_large_3x/' + @json['content']['named_imgs']['cover']['filename']
26
25
  end
27
26
 
28
- def parse_description
29
- author = @json['content']['attributes']['artists']['children'][0]['data']['name']
27
+ def parse_author
28
+ @json['content']['attributes']['artists']['children'][0]['data']['name']
29
+ end
30
30
 
31
- parent = @json['content']['parents'][0]['data']['title']
32
- description = '著: ' + author if author
33
- description += " / #{parent}" if parent
31
+ def parse_description
32
+ @json['content']['parents'][0]['data']['title']
34
33
  end
35
34
 
36
35
  def parse_canonical_url
37
36
  id = @url.slice(%r{komiflo\.com(?:/#!)?/comics/(\d+)}, 1)
38
37
  'https://komiflo.com/comics/' + id
39
38
  end
39
+
40
+ def parse_tags
41
+ @json['content']['attributes']['tags']['children'].map { |content| content['data']['name'] }
42
+ end
40
43
  end
41
44
 
42
45
  ::Panchira::Extensions.register(Panchira::KomifloResolver)
data/lib/panchira/resolvers/melonbooks_resolver.rb CHANGED
@@ -4,8 +4,41 @@ module Panchira
4
4
  class MelonbooksResolver < Resolver
5
5
  URL_REGEXP = %r{melonbooks.co.jp/detail/detail.php\?product_id=(\d+)}.freeze
6
6
 
7
+ def fetch
8
+ result = PanchiraResult.new
9
+
10
+ @page = fetch_page(@url)
11
+ result.canonical_url = parse_canonical_url
12
+
13
+ @page = fetch_page(result.canonical_url) if @url != result.canonical_url
14
+
15
+ result.title, result.author, result.circle = parse_table
16
+ result.description = parse_description
17
+ result.image = parse_image
18
+ result.tags = parse_tags
19
+
20
+ result
21
+ end
22
+
7
23
  private
8
24
 
25
+ def parse_table
26
+ title, author, circle = nil, nil, nil
27
+
28
+ @page.css('#description > table.stripe > tr').each do |tr|
29
+ case tr.css('th').text
30
+ when 'タイトル'
31
+ title = tr.css('td').text.strip
32
+ when 'サークル名'
33
+ circle = tr.css('td > a').text.match(/^(.+)\W\(作品数:/)&.values_at(1)[0]
34
+ when '作家名'
35
+ author = tr.css('td > a').text.strip
36
+ end
37
+ end
38
+
39
+ [title, author, circle]
40
+ end
41
+
9
42
  def parse_canonical_url
10
43
  product_id = @url.slice(URL_REGEXP, 1)
11
44
  'https://www.melonbooks.co.jp/detail/detail.php?product_id=' + product_id + '&adult_view=1'
@@ -25,6 +58,10 @@ module Panchira
25
58
  def parse_image_url
26
59
  @page.css('//meta[property="og:image"]/@content').first.to_s.sub(/&c=1/, '')
27
60
  end
61
+
62
+ def parse_tags
63
+ @page.css('#related_tags .clearfix').children.children.map(&:text)
64
+ end
28
65
  end
29
66
 
30
67
  ::Panchira::Extensions.register(Panchira::MelonbooksResolver)
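One detail in `parse_table` is worth checking on plain strings. In `match(...)&.values_at(1)[0]`, the safe-navigation operator covers only `values_at`, so the trailing `[0]` is still called on `nil` when the circle cell does not contain `(作品数:`. The sketch below shows one possible alternative using `String#[]` with a capture index. `circle_name` and the sample cell texts are hypothetical.

```ruby
# Same pattern as the circle-name branch of MelonbooksResolver#parse_table.
CIRCLE_PATTERN = /^(.+)\W\(作品数:/.freeze

# String#[] with a regexp and a capture index returns nil when nothing matches.
def circle_name(cell_text)
  cell_text[CIRCLE_PATTERN, 1]
end

p circle_name('SampleCircle (作品数:12)')
p circle_name('SampleCircle')

# The chained form from the diff raises on the same non-matching input.
begin
  'SampleCircle'.match(CIRCLE_PATTERN)&.values_at(1)[0]
rescue NoMethodError => e
  puts "chained form raised #{e.class}"
end
```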
data/lib/panchira/resolvers/narou_resolver.rb CHANGED
@@ -3,18 +3,61 @@
3
3
  require 'net/https'
4
4
 
5
5
  module Panchira
6
- class NarouResolver < Resolver
7
- URL_REGEXP = %r{novel18\.syosetu\.com/}.freeze
6
+ module Narou
7
+ class Novel18Resolver < Resolver
8
+ URL_REGEXP = %r{novel18\.syosetu\.com/}.freeze
9
+ ID_REGEXP = %{novel18\.syosetu\.com/(?<id>[^/]+)}
8
10
 
9
- def fetch_page(uri)
10
- u = URI.parse(uri)
11
- http = Net::HTTP.new(u.host, u.port)
12
- http.use_ssl = u.port == 443
13
- res = http.get u.request_uri, { 'cookie' => 'over18=yes;' }
11
+ def initialize(url)
12
+ super(url)
14
13
 
15
- Nokogiri::HTML.parse(res.body, uri)
14
+ if id = @url.match(ID_REGEXP)[:id]
15
+ @desc = fetch_page("https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/")
16
+ end
17
+ end
18
+
19
+ def fetch_page(uri)
20
+ u = URI.parse(uri)
21
+ http = Net::HTTP.new(u.host, u.port)
22
+ http.use_ssl = u.port == 443
23
+ res = http.get u.request_uri, { 'cookie' => 'over18=yes;' }
24
+
25
+ Nokogiri::HTML.parse(res.body, uri)
26
+ end
27
+
28
+ def parse_author
29
+ @desc&.xpath('//*[@id="noveltable1"]/tr[2]/td')&.text&.strip
30
+ end
31
+
32
+ def parse_tags
33
+ # Painful.
34
+ @desc&.xpath('//*[@id="noveltable1"]/tr[3]')&.text&.split("\n\n\n")&.dig(1)&.split(' ')
35
+ end
36
+ end
37
+
38
+ class NcodeResolver < Resolver
39
+ URL_REGEXP = /ncode\.syosetu\.com/.freeze
40
+ ID_REGEXP = %{ncode\.syosetu\.com/(?<id>[^/]+)}
41
+
42
+ def initialize(url)
43
+ super(url)
44
+
45
+ if id = @url.match(ID_REGEXP)[:id]
46
+ @desc = fetch_page("https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/")
47
+ end
48
+ end
49
+
50
+ def parse_author
51
+ @desc&.xpath('//*[@id="noveltable1"]/tr[2]/td')&.text&.strip
52
+ end
53
+
54
+ def parse_tags
55
+ # Really painful.
56
+ @desc&.xpath('//*[@id="noveltable1"]/tr[3]')&.text&.split("\n\n\n")&.dig(1)&.delete("\u00A0")&.split(' ')&.grep_v('')
57
+ end
16
58
  end
17
59
  end
18
60
 
19
- ::Panchira::Extensions.register(Panchira::NarouResolver)
61
+ ::Panchira::Extensions.register(Panchira::Narou::NcodeResolver)
62
+ ::Panchira::Extensions.register(Panchira::Narou::Novel18Resolver)
20
63
  end
@@ -6,6 +6,21 @@ module Panchira
6
6
 
7
7
  private
8
8
 
9
+ def parse_title
10
+ full_title = super
11
+ @md = full_title.match(/\A(?<title>.+) \| (?<author>.+)\z/)
12
+
13
+ @md[:title]
14
+ end
15
+
16
+ def parse_author
17
+ @md[:author]
18
+ end
19
+
20
+ def parse_description
21
+ @page.css('p.illust_description')&.first&.text&.strip
22
+ end
23
+
9
24
  def parse_canonical_url
10
25
  @url.sub(/sp.nijie/, 'nijie').sub(/view_popup/, 'view')
11
26
  end
@@ -24,6 +39,10 @@ module Panchira
24
39
  @page.css('//meta[property="og:image"]/@content').first.to_s
25
40
  end
26
41
  end
42
+
43
+ def parse_tags
44
+ @page.css('#view-tag span.tag_name').map(&:text)
45
+ end
27
46
  end
28
47
 
29
48
  ::Panchira::Extensions.register(Panchira::NijieResolver)
@@ -7,10 +7,21 @@ module Panchira
7
7
  def initialize(url)
8
8
  super(url)
9
9
  @illust_id = url.slice(URL_REGEXP, 2)
10
+
11
+ raw_json = URI.parse("https://www.pixiv.net/ajax/illust/#{@illust_id}").read('User-Agent' => user_agent)
12
+ @json = JSON.parse(raw_json)
10
13
  end
11
14
 
12
15
  private
13
16
 
17
+ def parse_title
18
+ @json['body']['title']
19
+ end
20
+
21
+ def parse_author
22
+ @json['body']['userName']
23
+ end
24
+
14
25
  def parse_canonical_url
15
26
  'https://pixiv.net/member_illust.php?mode=medium&illust_id=' + @illust_id
16
27
  end
@@ -27,6 +38,10 @@ module Panchira
27
38
  rescue StandardError
28
39
  @page.css('//meta[property="og:image"]/@content').first.to_s
29
40
  end
41
+
42
+ def parse_tags
43
+ @json['body']['tags']['tags'].map { |content| content['tag'] }
44
+ end
30
45
  end
31
46
 
32
47
  ::Panchira::Extensions.register(Panchira::PixivResolver)
data/lib/panchira/resolvers/resolver.rb CHANGED
@@ -1,39 +1,43 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- # Resolver is a class that actually GET url and resolve attributes.
4
- # This class is the default resolver for pages,
5
- # and is inherited by the other resolvers.
6
3
  module Panchira
4
+ # Resolver is a class that actually get attributes by fetching designated url.
5
+ # This class is the default resolver for pages. <br>
6
+ # To create your own resolver, first you make a class that extends Resolver,
7
+ # and then register it by ::Panchira::Extensions::register().
8
+ # Then ::Panchira::fetch will pick up your resolver when Resolver::applicable?() is true.
7
9
  class Resolver
8
- # The URL pattern that this resolver tries to resolve.
9
- # Should be redefined in subclasses.
10
+ # URL pattern that a resolver tries to resolve.
11
+ # You must override this in subclasses to limit which urls to resolve.
10
12
  URL_REGEXP = URI::DEFAULT_PARSER.make_regexp
11
13
 
12
- USER_AGENT = "Mozilla/5.0 (compatible; Panchira/#{VERSION}; +https://github.com/nuita/panchira)"
13
-
14
14
  def initialize(url)
15
15
  @url = url
16
16
  end
17
17
 
18
+ # This function is called right after this Resolver instance is made.
19
+ # Fetch page from @url and return PanchiraResult.
18
20
  def fetch
19
- attributes = {}
21
+ result = PanchiraResult.new
20
22
 
21
23
  @page = fetch_page(@url)
22
- attributes[:canonical_url] = parse_canonical_url
24
+ result.canonical_url = parse_canonical_url
23
25
 
24
- if @url != attributes[:canonical_url]
25
- @page = fetch_page(attributes[:canonical_url])
26
- end
26
+ @page = fetch_page(result.canonical_url) if @url != result.canonical_url
27
27
 
28
- attributes[:title] = parse_title
29
- attributes[:description] = parse_description
30
- attributes[:image] = parse_image
28
+ result.title = parse_title
29
+ result.description = parse_description
30
+ result.image = parse_image
31
+ result.tags = parse_tags
32
+ result.author = parse_author
33
+ result.circle = parse_circle
31
34
 
32
- attributes
35
+ result
33
36
  end
34
37
 
35
38
  class << self
36
39
  # Tell whether the url is applicable for this resolver.
40
+ # ::Panchira::fetch uses this method to choose a Resolver for a URL.
37
41
  def applicable?(url)
38
42
  url =~ self::URL_REGEXP
39
43
  end
@@ -42,16 +46,33 @@ module Panchira
42
46
  private
43
47
 
44
48
  def fetch_page(url)
45
- raw_page = URI.parse(url).read('User-Agent' => USER_AGENT)
49
+ read_options = {
50
+ 'User-Agent' => user_agent,
51
+ 'Cookie' => cookie
52
+ }
53
+
54
+ raw_page = URI.parse(url).read(read_options)
46
55
  charset = raw_page.charset
47
56
  Nokogiri::HTML.parse(raw_page, url, charset)
48
57
  end
49
58
 
50
59
  def parse_canonical_url
51
- if (canonical_url = @page.css('//link[rel="canonical"]/@href')).any?
52
- canonical_url.to_s
53
- else
54
- @url
60
+ history = []
61
+
62
+ # fetch page and refresh canonical_url until canonical_url converges.
63
+ loop do
64
+ url_in_res = @page.css('//link[rel="canonical"]/@href').to_s
65
+
66
+ if url_in_res.empty?
67
+ return history.last || @url
68
+ else
69
+ if history.include?(url_in_res) || history.length > 5
70
+ return url_in_res
71
+ else
72
+ history.push(url_in_res)
73
+ @page = fetch_page(url_in_res)
74
+ end
75
+ end
55
76
  end
56
77
  end
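The new `parse_canonical_url` keeps following `<link rel="canonical">` until the URL stops changing, with a history list guarding against cycles and long chains. The control flow can be exercised without network access. In this sketch the `CANONICAL_LINKS` table is a hypothetical stand-in for fetching and parsing pages, and `resolve_canonical` and `max_hops` are illustrative names.

```ruby
# Hypothetical stand-in for "fetch the page and read its canonical link":
# maps a URL to the canonical URL that page declares.
CANONICAL_LINKS = {
  'https://example.com/a' => 'https://example.com/b',
  'https://example.com/b' => 'https://example.com/c',
  'https://example.com/c' => 'https://example.com/c',
  'https://example.com/x' => 'https://example.com/y',
  'https://example.com/y' => 'https://example.com/x'
}.freeze

def resolve_canonical(url, max_hops: 5)
  history = []
  current = url

  loop do
    declared = CANONICAL_LINKS.fetch(current, '')
    # No canonical link: fall back to the last URL followed, or the input.
    return history.last || url if declared.empty?
    # Already seen (converged or cycling), or too many hops: stop here.
    return declared if history.include?(declared) || history.length > max_hops

    history.push(declared)
    current = declared
  end
end

puts resolve_canonical('https://example.com/a')
puts resolve_canonical('https://example.com/x')
puts resolve_canonical('https://example.com/plain')
```

The first call converges on `/c`, the second stops as soon as the two-page cycle repeats a URL, and the third returns its input because that page declares no canonical link.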
57
78
 
@@ -72,9 +93,9 @@ module Panchira
72
93
  end
73
94
 
74
95
  def parse_image
75
- image = {}
76
- image[:url] = parse_image_url
77
- image[:width], image[:height] = FastImage.size(image[:url])
96
+ image = PanchiraImage.new
97
+ image.url = parse_image_url
98
+ image.width, image.height = FastImage.size(image.url)
78
99
 
79
100
  image
80
101
  end
@@ -82,5 +103,25 @@ module Panchira
82
103
  def parse_image_url
83
104
  @page.css('//meta[property="og:image"]/@content').first.to_s
84
105
  end
106
+
107
+ def parse_tags
108
+ []
109
+ end
110
+
111
+ def cookie
112
+ ''
113
+ end
114
+
115
+ def parse_author
116
+ @page.css('//meta[name="author"]/@content').first.to_s
117
+ end
118
+
119
+ def parse_circle
120
+ nil
121
+ end
122
+
123
+ def user_agent
124
+ "Mozilla/5.0 (compatible; PanchiraBot/#{VERSION}; +https://github.com/nuita/panchira)"
125
+ end
85
126
  end
86
127
  end
data/lib/panchira/version.rb CHANGED
@@ -1,5 +1,5 @@
1
1
  # frozen_string_literal: true
2
2
 
3
3
  module Panchira
4
- VERSION = '0.2.0'
4
+ VERSION = '1.2.0'
5
5
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: panchira
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 1.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - kyp
8
8
  autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2020-05-18 00:00:00.000000000 Z
11
+ date: 2020-10-31 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -101,7 +101,10 @@ files:
101
101
  - bin/setup
102
102
  - lib/panchira.rb
103
103
  - lib/panchira/extensions.rb
104
+ - lib/panchira/panchira_result.rb
104
105
  - lib/panchira/resolvers/dlsite_resolver.rb
106
+ - lib/panchira/resolvers/fanza_resolver.rb
107
+ - lib/panchira/resolvers/image_resolver.rb
105
108
  - lib/panchira/resolvers/komiflo_resolver.rb
106
109
  - lib/panchira/resolvers/melonbooks_resolver.rb
107
110
  - lib/panchira/resolvers/narou_resolver.rb