panchira 0.2.0 → 1.2.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +43 -0
- data/Gemfile.lock +3 -3
- data/README.md +6 -4
- data/lib/panchira.rb +6 -1
- data/lib/panchira/extensions.rb +4 -1
- data/lib/panchira/panchira_result.rb +13 -0
- data/lib/panchira/resolvers/dlsite_resolver.rb +30 -0
- data/lib/panchira/resolvers/fanza_resolver.rb +56 -0
- data/lib/panchira/resolvers/image_resolver.rb +15 -0
- data/lib/panchira/resolvers/komiflo_resolver.rb +11 -8
- data/lib/panchira/resolvers/melonbooks_resolver.rb +37 -0
- data/lib/panchira/resolvers/narou_resolver.rb +52 -9
- data/lib/panchira/resolvers/nijie_resolver.rb +19 -0
- data/lib/panchira/resolvers/pixiv_resolver.rb +15 -0
- data/lib/panchira/resolvers/resolver.rb +65 -24
- data/lib/panchira/version.rb +1 -1
- metadata +5 -2
checksums.yaml
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
SHA256:
|
|
3
|
-
metadata.gz:
|
|
4
|
-
data.tar.gz:
|
|
3
|
+
metadata.gz: 066440e461b75b84a9df04fd76f1960243785b26bc7f4c61289029248e0a8bd9
|
|
4
|
+
data.tar.gz: 1fc1f712c6a8d88363cf3c4162be2681e08631c515ffbe6631fba3fd204b91c0
|
|
5
5
|
SHA512:
|
|
6
|
-
metadata.gz:
|
|
7
|
-
data.tar.gz:
|
|
6
|
+
metadata.gz: 63a914d286eaf909f4a2ab7c128f3725a96a6badbac71a878362e4a09a4e29f720f1f81fab2fa4b1f0ddeb513fac04b5c00597132012f5dbe42d783f54b221b2
|
|
7
|
+
data.tar.gz: af6085627c05532b7019a7134da472329c52b0f61b3329079694a2f59115e52f1c7b0bc0acc2c9cc3ea19814a33c3e2cd9116fcd7f692278e2150de7874bb424
|
data/CHANGELOG.md
CHANGED
|
@@ -4,6 +4,46 @@ All notable changes to this project will be documented in this file.
|
|
|
4
4
|
The format is based on [Keep a Changelog](http://keepachangelog.com/)
|
|
5
5
|
and this project adheres to [Semantic Versioning](http://semver.org/).
|
|
6
6
|
|
|
7
|
+
## 1.2.0 - 2020-10-31
|
|
8
|
+
### Added
|
|
9
|
+
- You can now fetch author and circle name in resolvers (Resolver#fetch_author, Resolver#fetch_circle).
|
|
10
|
+
|
|
11
|
+
### Changed
|
|
12
|
+
- Resolver#fetch_title returns the title of the content (not the original title of the page).
|
|
13
|
+
|
|
14
|
+
## 1.1.1 - 2020-08-09
|
|
15
|
+
### Added
|
|
16
|
+
- Added support for Fanza Doujin.
|
|
17
|
+
- Added support for description in Fanza Book.
|
|
18
|
+
|
|
19
|
+
### Fixed
|
|
20
|
+
- Fixed an issue where fetching images did not work in Fanza Book.
|
|
21
|
+
|
|
22
|
+
## 1.1.0 - 2020-08-06
|
|
23
|
+
### Added
|
|
24
|
+
- Added support for Fanza Books.
|
|
25
|
+
- Added support for direct links to an image.
|
|
26
|
+
- You can now set cookie by overriding Resolver#cookie in individual resolvers.
|
|
27
|
+
|
|
28
|
+
### Changed
|
|
29
|
+
- Resolver::USER_AGENT changed to Resolver#user_agent.
|
|
30
|
+
|
|
31
|
+
## 1.0.0 - 2020-06-23
|
|
32
|
+
### Added
|
|
33
|
+
- Added support for tags.
|
|
34
|
+
|
|
35
|
+
### Fixed
|
|
36
|
+
- Fixed some outdated documents.
|
|
37
|
+
|
|
38
|
+
## 0.3.0 - 2020-06-04
|
|
39
|
+
### Added
|
|
40
|
+
- You can now register and use your own Resolver with this gem. (see Panchira::Extensions#register)
|
|
41
|
+
- Added support for new Twitter UI.
|
|
42
|
+
|
|
43
|
+
### Changed
|
|
44
|
+
- Panchira::fetch now returns an instance of PanchiraResult instead of a hash.
|
|
45
|
+
- Changed default User-Agent slightly.
|
|
46
|
+
|
|
7
47
|
## 0.2.0 - 2020-05-18
|
|
8
48
|
### Added
|
|
9
49
|
- Added support for Shousetsuka Ni Narou (novel18.syosetu.com).
|
|
@@ -18,6 +58,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
|
|
|
18
58
|
### Added
|
|
19
59
|
- Released Panchira gem. At this time we can parse only 5 websites.
|
|
20
60
|
|
|
61
|
+
[1.1.0]: https://github.com/nuita/panchira/releases/tag/v1.1.0
|
|
62
|
+
[1.0.0]: https://github.com/nuita/panchira/releases/tag/v1.0.0
|
|
63
|
+
[0.3.0]: https://github.com/nuita/panchira/releases/tag/v0.3.0
|
|
21
64
|
[0.2.0]: https://github.com/nuita/panchira/releases/tag/v0.2.0
|
|
22
65
|
[0.1.1]: https://github.com/nuita/panchira/releases/tag/v0.1.1
|
|
23
66
|
[0.1.0]: https://github.com/nuita/panchira/releases/tag/v0.1.0
|
data/Gemfile.lock
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
PATH
|
|
2
2
|
remote: .
|
|
3
3
|
specs:
|
|
4
|
-
panchira (
|
|
4
|
+
panchira (1.2.0)
|
|
5
5
|
fastimage (~> 2.1.7)
|
|
6
6
|
nokogiri (~> 1.10.9)
|
|
7
7
|
|
|
@@ -10,8 +10,8 @@ GEM
|
|
|
10
10
|
specs:
|
|
11
11
|
fastimage (2.1.7)
|
|
12
12
|
mini_portile2 (2.4.0)
|
|
13
|
-
minitest (5.14.
|
|
14
|
-
nokogiri (1.10.
|
|
13
|
+
minitest (5.14.2)
|
|
14
|
+
nokogiri (1.10.10)
|
|
15
15
|
mini_portile2 (~> 2.4.0)
|
|
16
16
|
rake (12.3.3)
|
|
17
17
|
|
data/README.md
CHANGED
|
@@ -6,7 +6,7 @@
|
|
|
6
6
|
|
|
7
7
|
Due to some legal or ethical issues, most hentai and NSFW platforms don't clarify their content on meta tags. As a result, most hentai platforms are rendered poorly on the card previews on social media.
|
|
8
8
|
|
|
9
|
-
To solve this issue, Panchira is made to parse correct and uncensored metadata from such web platforms (at this time we cover **DLSite, Komiflo, Melonbooks, Nijie and
|
|
9
|
+
To solve this issue, Panchira is made to parse correct and uncensored metadata from such web platforms (at this time we cover **DLSite, Komiflo, Melonbooks, Nijie, Pixiv, Shousetsuka ni narou, Fanza and Twitter**).
|
|
10
10
|
|
|
11
11
|
If you need card previews of hentai on your web application, but can't get them with simply parsing metatags, then it is time for Panchira.
|
|
12
12
|
|
|
@@ -16,7 +16,7 @@ This gem is derived from the [Nuita](https://github.com/nuita/nuita) project.
|
|
|
16
16
|
|
|
17
17
|
**Please use this gem with appropriate censoring and age-restricting. Never violate local laws and copyrights.**
|
|
18
18
|
|
|
19
|
-
If you are running one of the websites we cover and feel negative about
|
|
19
|
+
If you are running one of the websites we cover and feel negative about this gem, please contact the community or the author ([@kypkyp](https://github.com/kypkyp)).
|
|
20
20
|
|
|
21
21
|
## Installation
|
|
22
22
|
|
|
@@ -39,10 +39,12 @@ Or install it yourself as:
|
|
|
39
39
|
```
|
|
40
40
|
> Panchira.fetch("https://www.pixiv.net/artworks/61711172")
|
|
41
41
|
|
|
42
|
-
=>
|
|
42
|
+
=> #<Panchira::PanchiraResult:0x00007fb95d2c53f8 @canonical_url="https://pixiv.net/member_illust.php?mode=medium&illust_id=61711172", @title="#輿水幸子 すずしい顔で締め切りを破る幸子 - むらためのイラスト - pixiv", @description="(UTF16の)Pietで実行すると「すずしい」と出力する幸子(5色+白Pietカラーゴルフ)。解説記事は http://chy72.hatenablog.com/entry/2016/12/24/1", @image=#<Panchira::PanchiraImage:0x00007fb95f126ea0 @url="https://pixiv.cat/61711172.jpg", @width=810, @height=500>, @tags=["輿水幸子", "Piet", "プログラミング"]>
|
|
43
43
|
```
|
|
44
44
|
|
|
45
|
-
Panchira is
|
|
45
|
+
In most situations you would call `Panchira.fetch`. It is a singleton method that takes a URL and returns an instance of `PanchiraResult`, a simple class that stores the website's information, such as its title and description.
|
|
46
|
+
|
|
47
|
+
Panchira handles each website with site-specific logic, which lives in `Resolver` classes. You can use your own `Resolver` classes by registering them with Panchira. See the `Panchira::Extensions` documentation in the source code for further details.
|
|
46
48
|
|
|
47
49
|
## Development
|
|
48
50
|
|
data/lib/panchira.rb
CHANGED
|
@@ -6,16 +6,21 @@ require 'fastimage'
|
|
|
6
6
|
require 'json'
|
|
7
7
|
|
|
8
8
|
require_relative 'panchira/version'
|
|
9
|
+
require_relative 'panchira/panchira_result'
|
|
9
10
|
require_relative 'panchira/resolvers/resolver'
|
|
10
11
|
require_relative 'panchira/extensions'
|
|
11
12
|
|
|
12
13
|
project_root = File.dirname(File.absolute_path(__FILE__))
|
|
13
14
|
Dir.glob(project_root + '/panchira/resolvers/*_resolver.rb').sort.each { |file| require file }
|
|
14
15
|
|
|
16
|
+
# register fallback ImageResolver at the end. (resolver is selected by registration order)
|
|
17
|
+
::Panchira::Extensions.register(Panchira::ImageResolver)
|
|
18
|
+
|
|
15
19
|
# Main Panchira code goes here.
|
|
20
|
+
# If you simply want to get data from your URL, then ::Panchira::fetch() will do.
|
|
16
21
|
module Panchira
|
|
17
22
|
class << self
|
|
18
|
-
#
|
|
23
|
+
# Return a PanchiraResult that contains the attributes of given url.
|
|
19
24
|
def fetch(url)
|
|
20
25
|
resolver = select_resolver(url)
|
|
21
26
|
|
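The comment in this hunk says a resolver is selected by registration order, with `ImageResolver` registered last as a fallback. The sketch below illustrates that idea without loading the gem. `SketchRegistry`, `SketchBaseResolver`, `SketchSiteResolver`, `SketchImageFallback` and the `select` method are hypothetical stand-ins; the gem's own `select_resolver` is not shown in this diff, so its exact behaviour is an assumption here.

```ruby
# Hypothetical stand-in for Panchira::Extensions; not the gem's code.
module SketchRegistry
  @resolvers = []

  class << self
    attr_reader :resolvers

    # Append a resolver class once; registration order is preserved.
    def register(resolver)
      @resolvers.push(resolver) unless @resolvers.include?(resolver)
    end

    # Assumed selection rule: the first registered class whose pattern matches.
    def select(url)
      @resolvers.find { |resolver| resolver.applicable?(url) }
    end
  end
end

# Hypothetical base class: subclasses narrow URL_REGEXP.
class SketchBaseResolver
  URL_REGEXP = %r{\Ahttps?://}.freeze

  def self.applicable?(url)
    # self::URL_REGEXP resolves to the subclass constant when one is defined.
    url.match?(self::URL_REGEXP)
  end
end

class SketchSiteResolver < SketchBaseResolver
  URL_REGEXP = %r{example\.com/works/\d+}.freeze
end

class SketchImageFallback < SketchBaseResolver
  URL_REGEXP = /\.(png|gif|jpg|jpeg|webp)$/.freeze
end

SketchRegistry.register(SketchSiteResolver)
SketchRegistry.register(SketchImageFallback) # registered last, so it acts as a fallback
SketchRegistry.register(SketchSiteResolver)  # duplicate registration is ignored
```

Because lookup stops at the first match, a site-specific resolver registered earlier wins over the image fallback even when the URL also ends in an image extension.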
data/lib/panchira/extensions.rb
CHANGED
|
@@ -1,15 +1,18 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
3
|
module Panchira
|
|
4
|
+
# This Module manages Resolver classes.
|
|
5
|
+
# To enable your own Resolver, you need to call Extensions::register().
|
|
4
6
|
module Extensions
|
|
5
7
|
@resolvers = []
|
|
6
8
|
|
|
7
9
|
class << self
|
|
8
|
-
# Register a
|
|
10
|
+
# Register a given Resolver to Extensions::Resolvers.
|
|
9
11
|
def register(resolver)
|
|
10
12
|
@resolvers.push(resolver) unless @resolvers.include?(resolver)
|
|
11
13
|
end
|
|
12
14
|
|
|
15
|
+
# Panchira::fetch will find a correct Resolver based on this list.
|
|
13
16
|
attr_reader :resolvers
|
|
14
17
|
end
|
|
15
18
|
end
|
data/lib/panchira/panchira_result.rb
ADDED
|
@@ -0,0 +1,13 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Panchira
|
|
4
|
+
# Image attributes in PanchiraResult.
|
|
5
|
+
class PanchiraImage
|
|
6
|
+
attr_accessor :url, :width, :height
|
|
7
|
+
end
|
|
8
|
+
|
|
9
|
+
# Result class for Panchira.fetch.
|
|
10
|
+
class PanchiraResult
|
|
11
|
+
attr_accessor :canonical_url, :title, :description, :image, :tags, :author, :circle
|
|
12
|
+
end
|
|
13
|
+
end
|
|
@@ -6,9 +6,39 @@ module Panchira
|
|
|
6
6
|
|
|
7
7
|
private
|
|
8
8
|
|
|
9
|
+
# On DLSite, the value inside [] in the title is the circle name or publisher name in the title tag.
|
|
10
|
+
# In og:title, which Panchira prefers, it is the circle name or author name instead.
|
|
11
|
+
# Fetching these values requires handling the following three patterns, so both the title tag and the table must be parsed:
|
|
12
|
+
# 1) Some doujin works, especially audio: title[circle name]. No author is listed in the page body.
|
|
13
|
+
# 2) Some doujin works, especially some doujinshi: title[circle name]. The page body lists a 作者 (author).
|
|
14
|
+
# 3) Commercial works: title[author name]. No circle name.
|
|
15
|
+
# This implementation has become convoluted, so parse itself may need to be reworked.
|
|
16
|
+
def parse_title
|
|
17
|
+
@title_md = super.match(/(.+) \[(\S+)\] \|.+/)
|
|
18
|
+
@title_md[1]
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def parse_author
|
|
22
|
+
@page.css('table[id*="work_"] tr').each do |tr|
|
|
23
|
+
if tr.css('th').text =~ /(作|著)者/
|
|
24
|
+
return @author = tr.css('td > a').first.text.strip
|
|
25
|
+
end
|
|
26
|
+
end
|
|
27
|
+
|
|
28
|
+
@author = nil
|
|
29
|
+
end
|
|
30
|
+
|
|
31
|
+
def parse_circle
|
|
32
|
+
@title_md[2] if @author != @title_md[2]
|
|
33
|
+
end
|
|
34
|
+
|
|
9
35
|
def parse_image_url
|
|
10
36
|
@page.css('//meta[property="og:image"]/@content').first.to_s.sub(/sam/, 'main')
|
|
11
37
|
end
|
|
38
|
+
|
|
39
|
+
def parse_tags
|
|
40
|
+
@page.css('.main_genre').children.children.map(&:text)
|
|
41
|
+
end
|
|
12
42
|
end
|
|
13
43
|
|
|
14
44
|
::Panchira::Extensions.register(Panchira::DlsiteResolver)
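The `parse_title` and `parse_circle` methods in this hunk split a title of the form `title [name] | site` and then decide whether the bracketed name is a circle or an author. The sketch below reuses the regular expression from the diff on made-up titles. `split_dlsite_title` and `circle_for` are hypothetical helpers, and the nil fallback is an assumption added for illustration: the hunk itself indexes the match result without a nil check.

```ruby
# Pattern copied from the hunk above.
DLSITE_TITLE_PATTERN = /(.+) \[(\S+)\] \|.+/.freeze

# Hypothetical helper: returns the title and the bracketed name,
# falling back to the whole string when the pattern does not match.
def split_dlsite_title(og_title)
  md = og_title.match(DLSITE_TITLE_PATTERN)
  return { title: og_title, bracketed: nil } unless md

  { title: md[1], bracketed: md[2] }
end

# Mirrors the rule in parse_circle: the bracketed name counts as a circle
# only when it differs from the author found in the page table.
def circle_for(bracketed, author)
  bracketed == author ? nil : bracketed
end
```

Note that `\S+` does not match a bracketed name containing whitespace, which is one case where the fallback branch would be taken.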
|
data/lib/panchira/resolvers/dlsite_resolver.rb
CHANGED
|
@@ -0,0 +1,56 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
require 'net/https'
|
|
4
|
+
|
|
5
|
+
module Panchira
|
|
6
|
+
module Fanza
|
|
7
|
+
FANZA_COOKIE = 'age_check_done=1;'
|
|
8
|
+
|
|
9
|
+
class FanzaResolver < Resolver
|
|
10
|
+
private
|
|
11
|
+
|
|
12
|
+
def cookie
|
|
13
|
+
::Panchira::Fanza::FANZA_COOKIE
|
|
14
|
+
end
|
|
15
|
+
end
|
|
16
|
+
|
|
17
|
+
class FanzaBookResolver < FanzaResolver
|
|
18
|
+
URL_REGEXP = %r{book\.dmm\.co\.jp\/}.freeze
|
|
19
|
+
|
|
20
|
+
private
|
|
21
|
+
|
|
22
|
+
def parse_author
|
|
23
|
+
@page.css('.m-boxDetailProductInfoMainList__description__list__item > a').first&.text.to_s
|
|
24
|
+
end
|
|
25
|
+
|
|
26
|
+
def parse_image_url
|
|
27
|
+
@page.css('.m-imgDetailProductPack/@src').first.to_s
|
|
28
|
+
end
|
|
29
|
+
|
|
30
|
+
def parse_tags
|
|
31
|
+
@page.css('.m-boxDetailProductInfo__list__description__item > a').map(&:text)
|
|
32
|
+
end
|
|
33
|
+
|
|
34
|
+
def parse_description
|
|
35
|
+
@page.css('.m-boxDetailProduct__info__story').first&.text.to_s.gsub(/[\n\t]/, '')
|
|
36
|
+
end
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
class FanzaDoujinResolver < FanzaResolver
|
|
40
|
+
URL_REGEXP = %r{dmm\.co\.jp\/dc\/doujin\/}.freeze
|
|
41
|
+
|
|
42
|
+
private
|
|
43
|
+
|
|
44
|
+
def parse_circle
|
|
45
|
+
@page.css('a.circleName__txt').first.text
|
|
46
|
+
end
|
|
47
|
+
|
|
48
|
+
def parse_tags
|
|
49
|
+
@page.css('.genreTag__item').map { |t| t.text.strip }
|
|
50
|
+
end
|
|
51
|
+
end
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
::Panchira::Extensions.register(Panchira::Fanza::FanzaBookResolver)
|
|
55
|
+
::Panchira::Extensions.register(Panchira::Fanza::FanzaDoujinResolver)
|
|
56
|
+
end
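`FanzaResolver` overrides the private `cookie` method so that every request made by its subclasses carries the age-check cookie. The sketch below shows that override pattern in isolation, without any network access. `HeaderSketchResolver`, `HeaderSketchAgeGateResolver`, `request_headers` and the user-agent string are hypothetical names chosen for illustration; only the cookie value comes from the diff.

```ruby
# Hypothetical base class: builds request headers from two overridable hooks.
class HeaderSketchResolver
  def request_headers
    { 'User-Agent' => user_agent, 'Cookie' => cookie }
  end

  private

  # Default: send no cookie.
  def cookie
    ''
  end

  # Illustrative value, not the gem's user agent.
  def user_agent
    'SketchBot/0.0 (+https://example.com)'
  end
end

# A subclass only needs to override the hook it cares about.
class HeaderSketchAgeGateResolver < HeaderSketchResolver
  private

  def cookie
    'age_check_done=1;'
  end
end
```

Keeping the hooks as methods rather than constants lets each subclass change one header without redefining how the request is assembled.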
|
data/lib/panchira/resolvers/fanza_resolver.rb
ADDED
|
@@ -0,0 +1,15 @@
|
|
|
1
|
+
# frozen_string_literal: true
|
|
2
|
+
|
|
3
|
+
module Panchira
|
|
4
|
+
class ImageResolver < Resolver
|
|
5
|
+
URL_REGEXP = %r{\.(png|gif|jpg|jpeg|webp)$}.freeze
|
|
6
|
+
|
|
7
|
+
def fetch
|
|
8
|
+
result = PanchiraResult.new
|
|
9
|
+
result.canonical_url = @url
|
|
10
|
+
result.image = PanchiraImage.new
|
|
11
|
+
result.image.url = @url
|
|
12
|
+
result
|
|
13
|
+
end
|
|
14
|
+
end
|
|
15
|
+
end
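`ImageResolver` decides whether a URL is a direct image link purely from its extension. The sketch below applies the pattern from the diff to a few made-up URLs to show which ones it accepts. `LENIENT_IMAGE_PATTERN` and `direct_image_link?` are hypothetical additions for comparison, not part of the gem.

```ruby
# Pattern copied from the diff: the extension must sit at the end of a line.
ORIGINAL_IMAGE_PATTERN = /\.(png|gif|jpg|jpeg|webp)$/.freeze

# Hypothetical variant: case-insensitive, allows a query string, anchored with \z.
LENIENT_IMAGE_PATTERN = /\.(png|gif|jpg|jpeg|webp)(\?.*)?\z/i.freeze

def direct_image_link?(url, pattern = ORIGINAL_IMAGE_PATTERN)
  url.match?(pattern)
end

direct_image_link?('https://example.com/a.png')            # accepted by both patterns
direct_image_link?('https://example.com/a.png?size=large') # rejected by the original pattern
direct_image_link?('https://example.com/a.png?size=large', LENIENT_IMAGE_PATTERN)
```

Whether query strings or upper-case extensions should count as image links is a design choice; the variant is shown only to make the original pattern's boundaries visible.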
|
data/lib/panchira/resolvers/image_resolver.rb
ADDED
|
@@ -10,33 +10,36 @@ module Panchira
|
|
|
10
10
|
@url = url
|
|
11
11
|
|
|
12
12
|
@id = url.slice(URL_REGEXP, 1)
|
|
13
|
-
raw_json = URI.parse("https://api.komiflo.com/content/id/#{@id}").read('User-Agent' =>
|
|
13
|
+
raw_json = URI.parse("https://api.komiflo.com/content/id/#{@id}").read('User-Agent' => user_agent)
|
|
14
14
|
@json = JSON.parse(raw_json)
|
|
15
15
|
end
|
|
16
16
|
|
|
17
17
|
private
|
|
18
18
|
|
|
19
19
|
def parse_title
|
|
20
|
-
|
|
21
|
-
"#{comic_title} | Komiflo"
|
|
20
|
+
@json['content']['data']['title']
|
|
22
21
|
end
|
|
23
22
|
|
|
24
23
|
def parse_image_url
|
|
25
24
|
'https://t.komiflo.com/564_mobile_large_3x/' + @json['content']['named_imgs']['cover']['filename']
|
|
26
25
|
end
|
|
27
26
|
|
|
28
|
-
def
|
|
29
|
-
|
|
27
|
+
def parse_author
|
|
28
|
+
@json['content']['attributes']['artists']['children'][0]['data']['name']
|
|
29
|
+
end
|
|
30
30
|
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
description += " / #{parent}" if parent
|
|
31
|
+
def parse_description
|
|
32
|
+
@json['content']['parents'][0]['data']['title']
|
|
34
33
|
end
|
|
35
34
|
|
|
36
35
|
def parse_canonical_url
|
|
37
36
|
id = @url.slice(%r{komiflo\.com(?:/#!)?/comics/(\d+)}, 1)
|
|
38
37
|
'https://komiflo.com/comics/' + id
|
|
39
38
|
end
|
|
39
|
+
|
|
40
|
+
def parse_tags
|
|
41
|
+
@json['content']['attributes']['tags']['children'].map { |content| content['data']['name'] }
|
|
42
|
+
end
|
|
40
43
|
end
|
|
41
44
|
|
|
42
45
|
::Panchira::Extensions.register(Panchira::KomifloResolver)
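The Komiflo resolver reads the author and tags through long chains of hash lookups, which raise `NoMethodError` as soon as one level is missing. As a minimal sketch, assuming the JSON has the nesting used in the hunk above, the same lookups can be written with `dig` so that a missing level yields `nil`. The helper names and the sample payload are hypothetical; the real API response is not part of this diff.

```ruby
require 'json'

# Hypothetical helper: same key path as parse_author, but nil-safe.
def komiflo_first_artist(json)
  json.dig('content', 'attributes', 'artists', 'children', 0, 'data', 'name')
end

# Hypothetical helper: same key path as parse_tags, returning [] when absent.
def komiflo_tag_names(json)
  children = json.dig('content', 'attributes', 'tags', 'children') || []
  children.map { |child| child.dig('data', 'name') }.compact
end

# Made-up payload that only imitates the nesting seen in the diff.
KOMIFLO_SAMPLE = JSON.parse(<<~PAYLOAD)
  {
    "content": {
      "attributes": {
        "artists": { "children": [ { "data": { "name": "Sample Artist" } } ] },
        "tags":    { "children": [ { "data": { "name": "tag-a" } }, { "data": { "name": "tag-b" } } ] }
      }
    }
  }
PAYLOAD
```

With this shape, `komiflo_first_artist({})` returns `nil` instead of raising, which lets the caller decide how to handle works without a listed artist.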
|
data/lib/panchira/resolvers/komiflo_resolver.rb
CHANGED
|
@@ -4,8 +4,41 @@ module Panchira
|
|
|
4
4
|
class MelonbooksResolver < Resolver
|
|
5
5
|
URL_REGEXP = %r{melonbooks.co.jp/detail/detail.php\?product_id=(\d+)}.freeze
|
|
6
6
|
|
|
7
|
+
def fetch
|
|
8
|
+
result = PanchiraResult.new
|
|
9
|
+
|
|
10
|
+
@page = fetch_page(@url)
|
|
11
|
+
result.canonical_url = parse_canonical_url
|
|
12
|
+
|
|
13
|
+
@page = fetch_page(result.canonical_url) if @url != result.canonical_url
|
|
14
|
+
|
|
15
|
+
result.title, result.author, result.circle = parse_table
|
|
16
|
+
result.description = parse_description
|
|
17
|
+
result.image = parse_image
|
|
18
|
+
result.tags = parse_tags
|
|
19
|
+
|
|
20
|
+
result
|
|
21
|
+
end
|
|
22
|
+
|
|
7
23
|
private
|
|
8
24
|
|
|
25
|
+
def parse_table
|
|
26
|
+
title, author, circle = nil, nil, nil
|
|
27
|
+
|
|
28
|
+
@page.css('#description > table.stripe > tr').each do |tr|
|
|
29
|
+
case tr.css('th').text
|
|
30
|
+
when 'タイトル'
|
|
31
|
+
title = tr.css('td').text.strip
|
|
32
|
+
when 'サークル名'
|
|
33
|
+
circle = tr.css('td > a').text.match(/^(.+)\W\(作品数:/)&.values_at(1)[0]
|
|
34
|
+
when '作家名'
|
|
35
|
+
author = tr.css('td > a').text.strip
|
|
36
|
+
end
|
|
37
|
+
end
|
|
38
|
+
|
|
39
|
+
[title, author, circle]
|
|
40
|
+
end
|
|
41
|
+
|
|
9
42
|
def parse_canonical_url
|
|
10
43
|
product_id = @url.slice(URL_REGEXP, 1)
|
|
11
44
|
'https://www.melonbooks.co.jp/detail/detail.php?product_id=' + product_id + '&adult_view=1'
|
|
@@ -25,6 +58,10 @@ module Panchira
|
|
|
25
58
|
def parse_image_url
|
|
26
59
|
@page.css('//meta[property="og:image"]/@content').first.to_s.sub(/&c=1/, '')
|
|
27
60
|
end
|
|
61
|
+
|
|
62
|
+
def parse_tags
|
|
63
|
+
@page.css('#related_tags .clearfix').children.children.map(&:text)
|
|
64
|
+
end
|
|
28
65
|
end
|
|
29
66
|
|
|
30
67
|
::Panchira::Extensions.register(Panchira::MelonbooksResolver)
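In `parse_table`, the circle name is extracted with `match(...)&.values_at(1)[0]`. Ruby's `&.` guards only the call it is attached to, so when the match is `nil` the trailing `[0]` is still sent to `nil`. The sketch below uses the same pattern with an explicit nil check. `melonbooks_circle_name` and the sample strings are hypothetical.

```ruby
# Pattern copied from the hunk above: name, one non-word character, then "(作品数:".
MELONBOOKS_CIRCLE_PATTERN = /^(.+)\W\(作品数:/.freeze

# Hypothetical helper: returns the circle name, or nil when the text
# does not carry the work-count suffix.
def melonbooks_circle_name(link_text)
  md = link_text.match(MELONBOOKS_CIRCLE_PATTERN)
  md && md[1]
end

melonbooks_circle_name('SampleCircle (作品数:12)') # name without the count suffix
melonbooks_circle_name('SampleCircle')             # nil rather than an exception
```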
|
data/lib/panchira/resolvers/melonbooks_resolver.rb
CHANGED
|
@@ -3,18 +3,61 @@
|
|
|
3
3
|
require 'net/https'
|
|
4
4
|
|
|
5
5
|
module Panchira
|
|
6
|
-
|
|
7
|
-
|
|
6
|
+
module Narou
|
|
7
|
+
class Novel18Resolver < Resolver
|
|
8
|
+
URL_REGEXP = %r{novel18\.syosetu\.com/}.freeze
|
|
9
|
+
ID_REGEXP = %{novel18\.syosetu\.com/(?<id>[^/]+)}
|
|
8
10
|
|
|
9
|
-
|
|
10
|
-
|
|
11
|
-
http = Net::HTTP.new(u.host, u.port)
|
|
12
|
-
http.use_ssl = u.port == 443
|
|
13
|
-
res = http.get u.request_uri, { 'cookie' => 'over18=yes;' }
|
|
11
|
+
def initialize(url)
|
|
12
|
+
super(url)
|
|
14
13
|
|
|
15
|
-
|
|
14
|
+
if id = @url.match(ID_REGEXP)[:id]
|
|
15
|
+
@desc = fetch_page("https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/")
|
|
16
|
+
end
|
|
17
|
+
end
|
|
18
|
+
|
|
19
|
+
def fetch_page(uri)
|
|
20
|
+
u = URI.parse(uri)
|
|
21
|
+
http = Net::HTTP.new(u.host, u.port)
|
|
22
|
+
http.use_ssl = u.port == 443
|
|
23
|
+
res = http.get u.request_uri, { 'cookie' => 'over18=yes;' }
|
|
24
|
+
|
|
25
|
+
Nokogiri::HTML.parse(res.body, uri)
|
|
26
|
+
end
|
|
27
|
+
|
|
28
|
+
def parse_author
|
|
29
|
+
@desc&.xpath('//*[@id="noveltable1"]/tr[2]/td')&.text&.strip
|
|
30
|
+
end
|
|
31
|
+
|
|
32
|
+
def parse_tags
|
|
33
|
+
# This is painful.
|
|
34
|
+
@desc&.xpath('//*[@id="noveltable1"]/tr[3]')&.text&.split("\n\n\n")&.dig(1)&.split(' ')
|
|
35
|
+
end
|
|
36
|
+
end
|
|
37
|
+
|
|
38
|
+
class NcodeResolver < Resolver
|
|
39
|
+
URL_REGEXP = /ncode\.syosetu\.com/.freeze
|
|
40
|
+
ID_REGEXP = %{ncode\.syosetu\.com/(?<id>[^/]+)}
|
|
41
|
+
|
|
42
|
+
def initialize(url)
|
|
43
|
+
super(url)
|
|
44
|
+
|
|
45
|
+
if id = @url.match(ID_REGEXP)[:id]
|
|
46
|
+
@desc = fetch_page("https://novel18.syosetu.com/novelview/infotop/ncode/#{id}/")
|
|
47
|
+
end
|
|
48
|
+
end
|
|
49
|
+
|
|
50
|
+
def parse_author
|
|
51
|
+
@desc&.xpath('//*[@id="noveltable1"]/tr[2]/td')&.text&.strip
|
|
52
|
+
end
|
|
53
|
+
|
|
54
|
+
def parse_tags
|
|
55
|
+
# This is really painful.
|
|
56
|
+
@desc&.xpath('//*[@id="noveltable1"]/tr[3]')&.text&.split("\n\n\n")&.dig(1)&.delete("\u00A0")&.split(' ')&.grep_v('')
|
|
57
|
+
end
|
|
16
58
|
end
|
|
17
59
|
end
|
|
18
60
|
|
|
19
|
-
::Panchira::Extensions.register(Panchira::
|
|
61
|
+
::Panchira::Extensions.register(Panchira::Narou::NcodeResolver)
|
|
62
|
+
::Panchira::Extensions.register(Panchira::Narou::Novel18Resolver)
|
|
20
63
|
end
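Both Narou resolvers define `ID_REGEXP` with `%{...}`, which builds a String rather than a Regexp. `String#match` converts a String pattern to a Regexp, so the lookup still works, but the backslashes are dropped from the String and each `.` then matches any character. The sketch below contrasts the two literals. `NAROU_ID_AS_STRING`, `NAROU_ID_AS_REGEXP` and `narou_id` are hypothetical names, and the nil check is an addition for illustration.

```ruby
# Same text as in the diff: %{...} is a String literal.
NAROU_ID_AS_STRING = %{novel18\.syosetu\.com/(?<id>[^/]+)}

# Hypothetical alternative: %r{...} is a Regexp literal and keeps the escapes.
NAROU_ID_AS_REGEXP = %r{novel18\.syosetu\.com/(?<id>[^/]+)}.freeze

# Hypothetical helper: returns the first path segment, or nil when the URL
# does not belong to the site.
def narou_id(url)
  md = url.match(NAROU_ID_AS_REGEXP)
  md && md[:id]
end
```

The explicit nil check matters because the hunk above indexes the match result directly, which raises when the URL has no path segment after the host.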
|
data/lib/panchira/resolvers/narou_resolver.rb
CHANGED
|
@@ -6,6 +6,21 @@ module Panchira
|
|
|
6
6
|
|
|
7
7
|
private
|
|
8
8
|
|
|
9
|
+
def parse_title
|
|
10
|
+
full_title = super
|
|
11
|
+
@md = full_title.match(/\A(?<title>.+) \| (?<author>.+)\z/)
|
|
12
|
+
|
|
13
|
+
@md[:title]
|
|
14
|
+
end
|
|
15
|
+
|
|
16
|
+
def parse_author
|
|
17
|
+
@md[:author]
|
|
18
|
+
end
|
|
19
|
+
|
|
20
|
+
def parse_description
|
|
21
|
+
@page.css('p.illust_description')&.first&.text&.strip
|
|
22
|
+
end
|
|
23
|
+
|
|
9
24
|
def parse_canonical_url
|
|
10
25
|
@url.sub(/sp.nijie/, 'nijie').sub(/view_popup/, 'view')
|
|
11
26
|
end
|
|
@@ -24,6 +39,10 @@ module Panchira
|
|
|
24
39
|
@page.css('//meta[property="og:image"]/@content').first.to_s
|
|
25
40
|
end
|
|
26
41
|
end
|
|
42
|
+
|
|
43
|
+
def parse_tags
|
|
44
|
+
@page.css('#view-tag span.tag_name').map(&:text)
|
|
45
|
+
end
|
|
27
46
|
end
|
|
28
47
|
|
|
29
48
|
::Panchira::Extensions.register(Panchira::NijieResolver)
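The Nijie resolver splits the page title into a work title and an author with named captures. The sketch below runs the same pattern on made-up titles to show how the greedy `.+` behaves when the title itself contains the separator. `split_nijie_title` is a hypothetical helper, and its fallback for non-matching titles is an assumption; the hunk above has no such branch.

```ruby
# Pattern copied from the hunk above.
NIJIE_TITLE_PATTERN = /\A(?<title>.+) \| (?<author>.+)\z/.freeze

# Hypothetical helper: returns [title, author], or [full_title, nil]
# when the separator is absent.
def split_nijie_title(full_title)
  md = full_title.match(NIJIE_TITLE_PATTERN)
  return [full_title, nil] unless md

  [md[:title], md[:author]]
end
```

Because the first group is greedy, the split happens at the last ` | ` in the string, so a separator inside the work title stays with the title.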
|
data/lib/panchira/resolvers/nijie_resolver.rb
CHANGED
|
@@ -7,10 +7,21 @@ module Panchira
|
|
|
7
7
|
def initialize(url)
|
|
8
8
|
super(url)
|
|
9
9
|
@illust_id = url.slice(URL_REGEXP, 2)
|
|
10
|
+
|
|
11
|
+
raw_json = URI.parse("https://www.pixiv.net/ajax/illust/#{@illust_id}").read('User-Agent' => user_agent)
|
|
12
|
+
@json = JSON.parse(raw_json)
|
|
10
13
|
end
|
|
11
14
|
|
|
12
15
|
private
|
|
13
16
|
|
|
17
|
+
def parse_title
|
|
18
|
+
@json['body']['title']
|
|
19
|
+
end
|
|
20
|
+
|
|
21
|
+
def parse_author
|
|
22
|
+
@json['body']['userName']
|
|
23
|
+
end
|
|
24
|
+
|
|
14
25
|
def parse_canonical_url
|
|
15
26
|
'https://pixiv.net/member_illust.php?mode=medium&illust_id=' + @illust_id
|
|
16
27
|
end
|
|
@@ -27,6 +38,10 @@ module Panchira
|
|
|
27
38
|
rescue StandardError
|
|
28
39
|
@page.css('//meta[property="og:image"]/@content').first.to_s
|
|
29
40
|
end
|
|
41
|
+
|
|
42
|
+
def parse_tags
|
|
43
|
+
@json['body']['tags']['tags'].map { |content| content['tag'] }
|
|
44
|
+
end
|
|
30
45
|
end
|
|
31
46
|
|
|
32
47
|
::Panchira::Extensions.register(Panchira::PixivResolver)
|
data/lib/panchira/resolvers/pixiv_resolver.rb
CHANGED
|
@@ -1,39 +1,43 @@
|
|
|
1
1
|
# frozen_string_literal: true
|
|
2
2
|
|
|
3
|
-
# Resolver is a class that actually GET url and resolve attributes.
|
|
4
|
-
# This class is the default resolver for pages,
|
|
5
|
-
# and is inherited by the other resolvers.
|
|
6
3
|
module Panchira
|
|
4
|
+
# Resolver is the class that actually gets attributes by fetching the designated URL.
|
|
5
|
+
# This class is the default resolver for pages. <br>
|
|
6
|
+
# To create your own resolver, first you make a class that extends Resolver,
|
|
7
|
+
# and then register it by ::Panchira::Extensions::register().
|
|
8
|
+
# Then ::Panchira::fetch will pick up your resolver when Resolver::applicable?() is true.
|
|
7
9
|
class Resolver
|
|
8
|
-
#
|
|
9
|
-
#
|
|
10
|
+
# URL pattern that a resolver tries to resolve.
|
|
11
|
+
# You must override this in subclasses to limit which urls to resolve.
|
|
10
12
|
URL_REGEXP = URI::DEFAULT_PARSER.make_regexp
|
|
11
13
|
|
|
12
|
-
USER_AGENT = "Mozilla/5.0 (compatible; Panchira/#{VERSION}; +https://github.com/nuita/panchira)"
|
|
13
|
-
|
|
14
14
|
def initialize(url)
|
|
15
15
|
@url = url
|
|
16
16
|
end
|
|
17
17
|
|
|
18
|
+
# This function is called right after this Resolver instance is made.
|
|
19
|
+
# Fetch page from @url and return PanchiraResult.
|
|
18
20
|
def fetch
|
|
19
|
-
|
|
21
|
+
result = PanchiraResult.new
|
|
20
22
|
|
|
21
23
|
@page = fetch_page(@url)
|
|
22
|
-
|
|
24
|
+
result.canonical_url = parse_canonical_url
|
|
23
25
|
|
|
24
|
-
if @url !=
|
|
25
|
-
@page = fetch_page(attributes[:canonical_url])
|
|
26
|
-
end
|
|
26
|
+
@page = fetch_page(result.canonical_url) if @url != result.canonical_url
|
|
27
27
|
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
28
|
+
result.title = parse_title
|
|
29
|
+
result.description = parse_description
|
|
30
|
+
result.image = parse_image
|
|
31
|
+
result.tags = parse_tags
|
|
32
|
+
result.author = parse_author
|
|
33
|
+
result.circle = parse_circle
|
|
31
34
|
|
|
32
|
-
|
|
35
|
+
result
|
|
33
36
|
end
|
|
34
37
|
|
|
35
38
|
class << self
|
|
36
39
|
# Tell whether the url is applicable for this resolver.
|
|
40
|
+
# ::Panchira::fetch uses this method to choose a Resolver for a URL.
|
|
37
41
|
def applicable?(url)
|
|
38
42
|
url =~ self::URL_REGEXP
|
|
39
43
|
end
|
|
@@ -42,16 +46,33 @@ module Panchira
|
|
|
42
46
|
private
|
|
43
47
|
|
|
44
48
|
def fetch_page(url)
|
|
45
|
-
|
|
49
|
+
read_options = {
|
|
50
|
+
'User-Agent' => user_agent,
|
|
51
|
+
'Cookie' => cookie
|
|
52
|
+
}
|
|
53
|
+
|
|
54
|
+
raw_page = URI.parse(url).read(read_options)
|
|
46
55
|
charset = raw_page.charset
|
|
47
56
|
Nokogiri::HTML.parse(raw_page, url, charset)
|
|
48
57
|
end
|
|
49
58
|
|
|
50
59
|
def parse_canonical_url
|
|
51
|
-
|
|
52
|
-
|
|
53
|
-
|
|
54
|
-
|
|
60
|
+
history = []
|
|
61
|
+
|
|
62
|
+
# fetch page and refresh canonical_url until canonical_url converges.
|
|
63
|
+
loop do
|
|
64
|
+
url_in_res = @page.css('//link[rel="canonical"]/@href').to_s
|
|
65
|
+
|
|
66
|
+
if url_in_res.empty?
|
|
67
|
+
return history.last || @url
|
|
68
|
+
else
|
|
69
|
+
if history.include?(url_in_res) || history.length > 5
|
|
70
|
+
return url_in_res
|
|
71
|
+
else
|
|
72
|
+
history.push(url_in_res)
|
|
73
|
+
@page = fetch_page(url_in_res)
|
|
74
|
+
end
|
|
75
|
+
end
|
|
55
76
|
end
|
|
56
77
|
end
|
|
57
78
|
|
|
@@ -72,9 +93,9 @@ module Panchira
|
|
|
72
93
|
end
|
|
73
94
|
|
|
74
95
|
def parse_image
|
|
75
|
-
image =
|
|
76
|
-
image
|
|
77
|
-
image
|
|
96
|
+
image = PanchiraImage.new
|
|
97
|
+
image.url = parse_image_url
|
|
98
|
+
image.width, image.height = FastImage.size(image.url)
|
|
78
99
|
|
|
79
100
|
image
|
|
80
101
|
end
|
|
@@ -82,5 +103,25 @@ module Panchira
|
|
|
82
103
|
def parse_image_url
|
|
83
104
|
@page.css('//meta[property="og:image"]/@content').first.to_s
|
|
84
105
|
end
|
|
106
|
+
|
|
107
|
+
def parse_tags
|
|
108
|
+
[]
|
|
109
|
+
end
|
|
110
|
+
|
|
111
|
+
def cookie
|
|
112
|
+
''
|
|
113
|
+
end
|
|
114
|
+
|
|
115
|
+
def parse_author
|
|
116
|
+
@page.css('//meta[name="author"]/@content').first.to_s
|
|
117
|
+
end
|
|
118
|
+
|
|
119
|
+
def parse_circle
|
|
120
|
+
nil
|
|
121
|
+
end
|
|
122
|
+
|
|
123
|
+
def user_agent
|
|
124
|
+
"Mozilla/5.0 (compatible; PanchiraBot/#{VERSION}; +https://github.com/nuita/panchira)"
|
|
125
|
+
end
|
|
85
126
|
end
|
|
86
127
|
end
|
data/lib/panchira/version.rb
CHANGED
metadata
CHANGED
|
@@ -1,14 +1,14 @@
|
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
|
2
2
|
name: panchira
|
|
3
3
|
version: !ruby/object:Gem::Version
|
|
4
|
-
version:
|
|
4
|
+
version: 1.2.0
|
|
5
5
|
platform: ruby
|
|
6
6
|
authors:
|
|
7
7
|
- kyp
|
|
8
8
|
autorequire:
|
|
9
9
|
bindir: exe
|
|
10
10
|
cert_chain: []
|
|
11
|
-
date: 2020-
|
|
11
|
+
date: 2020-10-31 00:00:00.000000000 Z
|
|
12
12
|
dependencies:
|
|
13
13
|
- !ruby/object:Gem::Dependency
|
|
14
14
|
name: bundler
|
|
@@ -101,7 +101,10 @@ files:
|
|
|
101
101
|
- bin/setup
|
|
102
102
|
- lib/panchira.rb
|
|
103
103
|
- lib/panchira/extensions.rb
|
|
104
|
+
- lib/panchira/panchira_result.rb
|
|
104
105
|
- lib/panchira/resolvers/dlsite_resolver.rb
|
|
106
|
+
- lib/panchira/resolvers/fanza_resolver.rb
|
|
107
|
+
- lib/panchira/resolvers/image_resolver.rb
|
|
105
108
|
- lib/panchira/resolvers/komiflo_resolver.rb
|
|
106
109
|
- lib/panchira/resolvers/melonbooks_resolver.rb
|
|
107
110
|
- lib/panchira/resolvers/narou_resolver.rb
|