bookmeter_scraper 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 6c328c66bbd91ea36ee0471c01f4e16a69ffd347
4
+ data.tar.gz: d1d5fde5a9d223c0aada00c670d1feed0b5e9c3a
5
+ SHA512:
6
+ metadata.gz: 9a2c3a6149faa92850aca03c455bd5ccd95f8eddaee005e4befd4daa328c340e171df5018803f9feb0ebb0f7fdf630f91d9b9259c52fb358f5b3ccf5570595e7
7
+ data.tar.gz: dcbf2db1efa63928b1c00a00d35af13ffba46345578c84b40b26a9b320ba69acd899c66c9319af5e15894a3d4e5ccc5fadbf64136fbbcbea804a0097f0729251
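`checksums.yaml` records SHA1 and SHA512 digests for the two archives inside the packaged .gem: `metadata.gz` and `data.tar.gz`. Below is a minimal sketch of checking those digests with Ruby's standard `digest` library. It assumes the .gem has already been unpacked into a directory. A .gem file is a plain tar archive, so `tar -xf bookmeter_scraper-0.1.0.gem` yields `metadata.gz`, `data.tar.gz` and `checksums.yaml.gz`. `verify_checksums` and `DIGEST_CLASSES` are hypothetical names, not part of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Digest classes for the algorithm keys that appear in checksums.yaml.
DIGEST_CLASSES = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Check every file listed in a checksums.yaml-style Hash, e.g.
#   { 'SHA512' => { 'metadata.gz' => 'hex...', 'data.tar.gz' => 'hex...' } },
# against the files in `dir`.
# Returns { ['SHA512', 'metadata.gz'] => true or false, ... }.
def verify_checksums(checksums, dir)
  results = {}
  checksums.each do |algorithm, files|
    digest_class = DIGEST_CLASSES.fetch(algorithm)
    files.each do |name, expected|
      actual = digest_class.file(File.join(dir, name)).hexdigest
      results[[algorithm, name]] = (actual == expected.to_s)
    end
  end
  results
end

# Usage after unpacking the .gem into the current directory (needs 'yaml' and 'zlib'):
#   checksums = YAML.safe_load(Zlib::GzipReader.open('checksums.yaml.gz', &:read))
#   p verify_checksums(checksums, '.')
```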
data/.gitignore ADDED
@@ -0,0 +1,10 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ spec/examples.txt
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --color
2
+ --require spec_helper
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.0.0
4
+ - 2.1.0
5
+ - 2.2.0
6
+ - 2.3.0
7
+ before_install:
8
+ - gem update bundler
9
+ env:
10
+ - TZ=Asia/Tokyo
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in bookmeter_scraper.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2016 Kohei Yamamoto
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.ja.md ADDED
@@ -0,0 +1,157 @@
1
+ # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
2
+
3
+ A gem for scraping [Bookmeter](http://bookmeter.com) and handling its data in Ruby. It can retrieve:
4
+
5
+ - Book information
6
+ - Read books
7
+ - Reading books
8
+ - Tsundoku (stockpile)
9
+ - Wish list
10
+ - Followings / followers
11
+ - User profiles
12
+
13
+
14
+
15
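+ ## Notice
16
+
17
+ Please keep your scraping frequency within common-sense limits. Article 9 of the Terms of Use prohibits intentionally placing a significant load on Bookmeter's servers.
18
+
19
+ - [Terms of Use - Bookmeter](http://bookmeter.com/terms.php)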
20
+
21
+ ## Usage
22
+
23
+ The following code is required to use this gem.
24
+
25
+ ```ruby
26
+ require 'bookmeter_scraper'
27
+ ```
28
+
29
+ ### Log in
30
+
31
+ To get book information and followings / followers information, you need to log in with `Bookmeter.log_in` first.
32
+
33
+ ```ruby
34
+ bookmeter = BookmeterScraper::Bookmeter.log_in('example@example.com', 'password')
35
+ bookmeter.logged_in? # true
36
+ ```
37
+
38
+ You can also log in with `Bookmeter#log_in`.
39
+
40
+ ```ruby
41
+ bookmeter = BookmeterScraper::Bookmeter.new
42
+ bookmeter.log_in('example@example.com', 'password')
43
+ ```
44
+
45
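+ ### Get book information
46
+
47
+ You can get the following book information:
48
+
49
+ - Read books
50
+ - Reading books
51
+ - Tsundoku (stockpile)
52
+ - Wish list
53
+
54
+ Logging in beforehand is required to get this information.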
55
+
56
+ #### Read books
57
+
58
+ You can get read books with `Bookmeter#read_books`.
59
+
60
+ ```ruby
61
+ books = bookmeter.read_books # get read books of the logged-in user
62
+ bookmeter.read_books('01010101') # get read books of another user specified by ID
63
+ ```
64
+
65
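+ Book information is returned as an array of `Struct`s with two attributes: the title `name` and `read_dates`, an array of finished dates (both the first finished date and any reread dates).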
66
+
67
+ ```ruby
68
+ books[0].name
69
+ books[0].read_dates
70
+ ```
71
+
72
+ In addition, `Bookmeter#read_books_in` gets the read books for a specific year and month.
73
+
74
+ ```ruby
75
+ books = bookmeter.read_books_in(2016, 1) # get books the logged-in user read in January 2016
76
+ books = bookmeter.read_books_in(2016, 1, '01010101') # get books another user, specified by ID, read in January 2016
77
+ ```
78
+
79
+ #### Reading books / Tsundoku / Wish list
80
+
81
+ The book lists other than read books,
82
+
83
+ - Reading books
84
+ - Tsundoku
85
+ - Wish list
86
+
87
+ can be retrieved with the following methods, in the same order:
88
+
89
+ - `Bookmeter#reading_books`
90
+ - `Bookmeter#tsundoku`
91
+ - `Bookmeter#wish_list`
92
+
93
+
94
+
95
+ ```ruby
96
+ books = bookmeter.reading_books # get reading books of the logged-in user
97
+ books[0].name
98
+ books[0].read_dates # the Array of finished dates is empty
99
+
100
+ bookmeter.tsundoku # get tsundoku of the logged-in user
101
+ bookmeter.wish_list # get the wish list of the logged-in user
102
+ ```
103
+
104
+ ### Get followings / followers information
105
+
106
+ `Bookmeter#followings` and `Bookmeter#followers` get the followings / followers that the logged-in user can view. Logging in beforehand is required to get this information.
107
+
108
+ ```ruby
109
+ following_users = bookmeter.followings # get followings information
110
+ followers = bookmeter.followers # get followers information
111
+ ```
112
+
113
+ User information is returned as an array of `Struct`s with the user name `name` and the user ID `id`.
114
+
115
+ ```ruby
116
+ following_users[0].name
117
+ following_users[0].id
118
+ followers[0].name
119
+ followers[0].id
120
+ ```
121
+
122
+ #### Notice
123
+
124
+ **Paginated followings / followers pages are not supported yet.**
125
+
126
+ ### Get user profile
127
+
128
+ You can get a user profile with `Bookmeter#profile`. Profiles can be viewed without logging in, so no login is required.
129
+
130
+ ```ruby
131
+ bookmeter = BookmeterScraper::Bookmeter.new
132
+ user_id = '000000'
133
+ profile = bookmeter.profile(user_id) # any user's ID can be specified
134
+ ```
135
+
136
+ Profile information is returned as a `Struct` with the following attributes. Attributes not set in the profile are `nil`.
137
+
138
+ ```ruby
139
+ profile.name # user name
140
+ profile.gender # gender
141
+ profile.age # age
142
+ profile.blood_type # blood type
143
+ profile.job # occupation
144
+ profile.address # current address
145
+ profile.url # URL / blog
146
+ profile.description # self-introduction
147
+ profile.first_day # first day of recording
148
+ profile.elapsed_days # days elapsed
149
+ profile.read_books_count # number of read books
150
+ profile.read_pages_count # number of read pages
151
+ profile.reviews_count # number of reviews
152
+ profile.bookshelfs_count # number of bookshelves
153
+ ```
154
+
155
+ ## License
156
+
157
+ [MIT License](http://opensource.org/licenses/MIT)
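README.ja.md notes that profile attributes a user has not set come back as `nil`. In `Bookmeter#profile` this falls out of a Hash lookup. The Japanese labels scraped from the profile's `<dl>` tags are zipped with their values, and each attribute is then looked up through `JP_ATTRIBUTE_NAMES`. The pure-Ruby sketch below reproduces only that lookup step, for a subset of the attributes. It passes the scraped labels in as a plain Hash instead of Nokogiri nodes. `build_profile` is a hypothetical helper.

```ruby
# A subset of Bookmeter::PROFILE_ATTRIBUTES and Bookmeter::JP_ATTRIBUTE_NAMES.
PROFILE_ATTRIBUTES = %i(name gender age job).freeze
Profile = Struct.new(*PROFILE_ATTRIBUTES)

JP_ATTRIBUTE_NAMES = {
  gender: '性別',
  age: '年齢',
  job: '職業',
}.freeze

# Hypothetical helper. `name` comes from the profile heading; `scraped` maps
# the Japanese labels shown on the profile page to their text.
# Labels missing from `scraped` yield nil, like unset profile fields.
def build_profile(name, scraped)
  values = PROFILE_ATTRIBUTES.map { |attribute| scraped[JP_ATTRIBUTE_NAMES[attribute]] }
  values[0] = name # :name has no label; the original reads it from the <h3> heading
  Profile.new(*values)
end

profile = build_profile('example_user', { '性別' => '男性', '職業' => '会社員' })
profile.gender # => "男性"
profile.age    # => nil
```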
data/README.md ADDED
@@ -0,0 +1,163 @@
1
+ # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
2
+
3
+ A library for scraping [Bookmeter](http://bookmeter.com).
4
+
5
+ The Japanese README is [here](https://github.com/kymmt90/bookmeter_scraper/blob/master/README.ja.md).
6
+
7
+
8
+ ## Installation
9
+
10
+ Add this line to your application's Gemfile:
11
+
12
+ ```ruby
13
+ gem 'bookmeter_scraper'
14
+ ```
15
+
16
+ And then execute:
17
+
18
+ $ bundle
19
+
20
+ Or install it yourself as:
21
+
22
+ $ gem install bookmeter_scraper
23
+
24
+
25
+ ## Usage
26
+
27
+ Add this line to your code before using this library:
28
+
29
+ ```ruby
30
+ require 'bookmeter_scraper'
31
+ ```
32
+
33
+ ### Log in
34
+
35
+ You need to log in to Bookmeter with `Bookmeter.log_in` to get book and followings / followers information:
36
+
37
+ ```ruby
38
+ bookmeter = BookmeterScraper::Bookmeter.log_in('example@example.com', 'password')
39
+ bookmeter.logged_in? # true
40
+ ```
41
+
42
+ `Bookmeter#log_in` is also available:
43
+
44
+ ```ruby
45
+ bookmeter = BookmeterScraper::Bookmeter.new
46
+ bookmeter.log_in('example@example.com', 'password')
47
+ ```
48
+
49
+ ### Get books information
50
+
51
+ You can get the following book information:
52
+
53
+ - read books
54
+ - reading books
55
+ - tsundoku (stockpile)
56
+ - wish list
57
+
58
+ You need to log in to Bookmeter in advance to get this information.
59
+
60
+ #### Read books
61
+
62
+ You can get read books information with `Bookmeter#read_books`:
63
+
64
+ ```ruby
65
+ books = bookmeter.read_books # get read books of the logged in user
66
+ bookmeter.read_books('01010101') # get read books of a user specified by ID
67
+ ```
68
+
69
+ Book information is returned as an array of `Struct`s with `name` and `read_dates` attributes.
70
+ `read_dates` is an array of the dates the book was finished (the first finished date and any reread dates):
71
+
72
+ ```ruby
73
+ books[0].name
74
+ books[0].read_dates
75
+ ```
76
+
77
+ To get read books for a specific year and month, use `Bookmeter#read_books_in`:
78
+
79
+ ```ruby
80
+ books = bookmeter.read_books_in(2016, 1) # get read books of the logged in user in 2016-01
81
+ books = bookmeter.read_books_in(2016, 1, '01010101') # get read books of a user specified by ID in 2016-01
82
+ ```
83
+
84
+ #### Reading books / Tsundoku / Wish list
85
+
86
+ You can get the other book lists with:
87
+
88
+ - `Bookmeter#reading_books`
89
+ - `Bookmeter#tsundoku`
90
+ - `Bookmeter#wish_list`
91
+
92
+ ```ruby
93
+ books = bookmeter.reading_books
94
+ books[0].name
95
+ books[0].read_dates # this array is empty
96
+
97
+ bookmeter.tsundoku
98
+ bookmeter.wish_list
99
+ ```
100
+
101
+ ### Get following users / followers information
102
+
103
+ You can get following users (followings) and followers information with `Bookmeter#followings` and `Bookmeter#followers`:
104
+
105
+ ```ruby
106
+ following_users = bookmeter.followings
107
+ followers = bookmeter.followers
108
+ ```
109
+
110
+ You need to log in to Bookmeter in advance to get this information.
111
+
112
+ User information is returned as an array of `Struct`s with `name` and `id` attributes.
113
+
114
+ ```ruby
115
+ following_users[0].name
116
+ following_users[0].id
117
+ followers[0].name
118
+ followers[0].id
119
+ ```
120
+
121
+ #### Notice
122
+
123
+ **`Bookmeter#followings` and `Bookmeter#followers` do not support paginated followings / followers pages yet.**
124
+
125
+ ### Get user profile
126
+
127
+ You can get a user profile with `Bookmeter#profile`:
128
+
129
+ ```ruby
130
+ bookmeter = BookmeterScraper::Bookmeter.new
131
+ user_id = '000000'
132
+ profile = bookmeter.profile(user_id) # you can specify any user ID
133
+ ```
134
+
135
+ You do not need to log in to get user profiles.
136
+ Profile information is a `Struct` with the following attributes (any attribute not set in the profile is `nil`):
137
+
138
+ ```ruby
139
+ profile.name
140
+ profile.gender
141
+ profile.age
142
+ profile.blood_type
143
+ profile.job
144
+ profile.address
145
+ profile.url
146
+ profile.description
147
+ profile.first_day
148
+ profile.elapsed_days
149
+ profile.read_books_count
150
+ profile.read_pages_count
151
+ profile.reviews_count
152
+ profile.bookshelfs_count
153
+ ```
154
+
155
+
156
+ ## Contributing
157
+
158
+ Bug reports and pull requests are welcome on GitHub at https://github.com/kymmt90/bookmeter_scraper.
159
+
160
+
161
+ ## License
162
+
163
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
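Every book-list method returns an Array of `Struct`s with `name` and `read_dates`. In bookmeter.rb, each entry of `read_dates` is a `Time` built with `Time.local`. Because the result is plain Ruby data, further analysis needs no extra scraping. Below is a minimal sketch that groups titles by the year-month in which they were finished and counts rereads. The local `Book` Struct is a stand-in that mirrors `BookmeterScraper::Bookmeter::Book`, and the sample titles are made up.

```ruby
# Stand-in with the same members as BookmeterScraper::Bookmeter::Book.
Book = Struct.new(:name, :read_dates)

# Map each "YYYY-MM" to the names of books finished (first read or reread) in that month.
def titles_by_month(books)
  books.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |book, by_month|
    book.read_dates.map { |date| date.strftime('%Y-%m') }.uniq.each do |month|
      by_month[month] << book.name
    end
  end
end

# read_dates holds the first finished date followed by any reread dates.
def reread_count(book)
  [book.read_dates.size - 1, 0].max
end

books = [
  Book.new('Book A', [Time.local(2016, 1, 10), Time.local(2016, 2, 3)]),
  Book.new('Book B', [Time.local(2016, 1, 25)]),
]
titles_by_month(books) # => {"2016-01"=>["Book A", "Book B"], "2016-02"=>["Book A"]}
reread_count(books[0]) # => 1
```

With a logged-in session, the same helper could be called as `titles_by_month(bookmeter.read_books)`. That groups every month from a single `read_books` call instead of making one `read_books_in` call per month.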
data/Rakefile ADDED
@@ -0,0 +1,5 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new("spec")
5
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "bookmeter_scraper"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/bookmeter_scraper.gemspec ADDED
@@ -0,0 +1,28 @@
1
+ lib = File.expand_path('../lib', __FILE__)
2
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
3
+ require 'bookmeter_scraper/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "bookmeter_scraper"
7
+ spec.version = BookmeterScraper::VERSION
8
+ spec.authors = ["Kohei Yamamoto"]
9
+ spec.email = ["kymmt90@gmail.com"]
10
+
11
+ spec.summary = %q{Bookmeter scraping library}
12
+ spec.description = %q{Bookmeter scraping library}
13
+ spec.homepage = "https://github.com/kymmt90/bookmeter_scraper"
14
+ spec.license = "MIT"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
17
+ spec.bindir = "exe"
18
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_development_dependency "bundler", "~> 1.10"
22
+ spec.add_development_dependency "rake", "~> 10.0"
23
+ spec.add_development_dependency "rspec", "~> 3.4"
24
+ spec.add_development_dependency "webmock", "~> 1.22"
25
+
26
+ spec.add_dependency "yasuri", "~> 0.0"
27
+ spec.add_dependency "mechanize", "~> 2.7"
28
+ end
data/exe/bookmeter_scraper ADDED
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bookmeter_scraper"
data/lib/bookmeter_scraper.rb ADDED
@@ -0,0 +1,2 @@
1
+ require 'bookmeter_scraper/bookmeter'
2
+ require 'bookmeter_scraper/version'
data/lib/bookmeter_scraper/bookmeter.rb ADDED
@@ -0,0 +1,414 @@
1
+ require 'mechanize'
2
+ require 'yasuri'
3
+
4
+ module BookmeterScraper
5
+ class Bookmeter
6
+ ROOT_URI = 'http://bookmeter.com'.freeze
7
+ LOGIN_URI = "#{ROOT_URI}/login".freeze
8
+
9
+ PROFILE_ATTRIBUTES = %i(name gender age blood_type job address url description first_day elapsed_days read_books_count read_pages_count reviews_count bookshelfs_count)
10
+ Profile = Struct.new(*PROFILE_ATTRIBUTES)
11
+
12
+ BOOK_ATTRIBUTES = %i(name read_dates)
13
+ Book = Struct.new(*BOOK_ATTRIBUTES)
14
+
15
+ USER_ATTRIBUTES = %i(name id)
16
+ User = Struct.new(*USER_ATTRIBUTES)
17
+
18
+ JP_ATTRIBUTE_NAMES = {
19
+ gender: '性別',
20
+ age: '年齢',
21
+ blood_type: '血液型',
22
+ job: '職業',
23
+ address: '現住所',
24
+ url: 'URL / ブログ',
25
+ description: '自己紹介',
26
+ first_day: '記録初日',
27
+ elapsed_days: '経過日数',
28
+ read_books_count: '読んだ本',
29
+ read_pages_count: '読んだページ',
30
+ reviews_count: '感想/レビュー',
31
+ bookshelfs_count: '本棚',
32
+ }
33
+
34
+ NUM_BOOKS_PER_PAGE = 40
35
+ NUM_USERS_PER_PAGE = 20
36
+
37
+ attr_reader :log_in_user_id
38
+
39
+ def self.mypage_uri(user_id)
40
+ raise ArgumentError unless user_id =~ /^\d+$/
41
+ "#{ROOT_URI}/u/#{user_id}"
42
+ end
43
+
44
+ def self.read_books_uri(user_id)
45
+ raise ArgumentError unless user_id =~ /^\d+$/
46
+ "#{ROOT_URI}/u/#{user_id}/booklist"
47
+ end
48
+
49
+ def self.reading_books_uri(user_id)
50
+ raise ArgumentError unless user_id =~ /^\d+$/
51
+ "#{ROOT_URI}/u/#{user_id}/booklistnow"
52
+ end
53
+
54
+ def self.tsundoku_uri(user_id)
55
+ raise ArgumentError unless user_id =~ /^\d+$/
56
+ "#{ROOT_URI}/u/#{user_id}/booklisttun"
57
+ end
58
+
59
+ def self.wish_list_uri(user_id)
60
+ raise ArgumentError unless user_id =~ /^\d+$/
61
+ "#{ROOT_URI}/u/#{user_id}/booklistpre"
62
+ end
63
+
64
+ def self.followings_uri(user_id)
65
+ raise ArgumentError unless user_id =~ /^\d+$/
66
+ "#{ROOT_URI}/u/#{user_id}/favorite_user"
67
+ end
68
+
69
+ def self.followers_uri(user_id)
70
+ raise ArgumentError unless user_id =~ /^\d+$/
71
+ "#{ROOT_URI}/u/#{user_id}/favorited_user"
72
+ end
73
+
74
+ def self.log_in(mail, password)
75
+ Bookmeter.new.tap do |bookmeter|
76
+ bookmeter.log_in(mail, password)
77
+ end
78
+ end
79
+
80
+
81
+ def initialize(agent = nil)
82
+ @agent = agent.nil? ? Bookmeter.new_agent : agent
83
+ @logged_in = false
84
+ end
85
+
86
+ def log_in(mail, password)
87
+ raise BookmeterError if @agent.nil?
88
+
89
+ next_page = nil
90
+ page = @agent.get(LOGIN_URI) do |page|
91
+ next_page = page.form_with(action: '/login') do |form|
92
+ form.field_with(name: 'mail').value = mail
93
+ form.field_with(name: 'password').value = password
94
+ end.submit
95
+ end
96
+ @logged_in = next_page.uri.to_s == ROOT_URI + '/'
97
+ return unless logged_in?
98
+
99
+ mypage = next_page.link_with(text: 'マイページ').click
100
+ @log_in_user_id = extract_user_id(mypage)
101
+ end
102
+
103
+ def logged_in?
104
+ @logged_in
105
+ end
106
+
107
+ def profile(user_id)
108
+ raise ArgumentError unless user_id =~ /^\d+$/
109
+
110
+ mypage = @agent.get(Bookmeter.mypage_uri(user_id))
111
+
112
+ profile_dl_tags = mypage.search('#side_left > div.inner > div.profile > dl')
113
+ jp_attribute_names = profile_dl_tags.map { |i| i.children[0].children.text }
114
+ attribute_values = profile_dl_tags.map { |i| i.children[1].children.text }
115
+ jp_attributes = Hash[jp_attribute_names.zip(attribute_values)]
116
+ attributes = PROFILE_ATTRIBUTES.map do |attribute|
117
+ jp_attributes[JP_ATTRIBUTE_NAMES[attribute]]
118
+ end
119
+ attributes[0] = mypage.at_css('#side_left > div.inner > h3').text
120
+
121
+ Profile.new(*attributes)
122
+ end
123
+
124
+ def read_books(user_id = @log_in_user_id)
125
+ books = get_books(user_id, :read_books_uri)
126
+ books.each { |b| yield b } if block_given?
127
+ books
128
+ end
129
+
130
+ def read_books_in(year, month, user_id = @log_in_user_id)
131
+ date = Time.local(year, month)
132
+ books = get_read_books(user_id, date)
133
+ books.each { |b| yield b } if block_given?
134
+ books
135
+ end
136
+
137
+ def reading_books(user_id = @log_in_user_id)
138
+ books = get_books(user_id, :reading_books_uri)
139
+ books.each { |b| yield b } if block_given?
140
+ books
141
+ end
142
+
143
+ def tsundoku(user_id = @log_in_user_id)
144
+ books = get_books(user_id, :tsundoku_uri)
145
+ books.each { |b| yield b } if block_given?
146
+ books
147
+ end
148
+
149
+ def wish_list(user_id = @log_in_user_id)
150
+ books = get_books(user_id, :wish_list_uri)
151
+ books.each { |b| yield b } if block_given?
152
+ books
153
+ end
154
+
155
+ def followings(user_id = @log_in_user_id)
156
+ users = get_followings(user_id)
157
+ end
158
+
159
+ def followers(user_id = @log_in_user_id)
160
+ users = get_followers(user_id)
161
+ end
162
+
163
+ private
164
+
165
+ def self.new_agent
166
+ agent = Mechanize.new do |a|
167
+ a.user_agent_alias = 'Mac Safari'
168
+ end
169
+ end
170
+
171
+ def extract_user_id(page)
172
+ page.uri.to_s.match(/\/u\/(\d+)$/)[1]
173
+ end
174
+
175
+ def get_books(user_id, uri_method)
176
+ books = []
177
+ scraped_pages = scrape_book_pages(user_id, uri_method)
178
+ scraped_pages.each do |page|
179
+ books << get_book_structs(page)
180
+ books.flatten!
181
+ end
182
+ books
183
+ end
184
+
185
+ def get_read_books(user_id, target_ym)
186
+ result = []
187
+ scrape_book_pages(user_id, :read_books_uri).each do |page|
188
+ first_book_date = get_read_date(page['book_1_link'])
189
+ last_book_date = get_last_book_date(page)
190
+
191
+ first_book_ym = Time.local(first_book_date['year'].to_i, first_book_date['month'].to_i)
192
+ last_book_ym = Time.local(last_book_date['year'].to_i, last_book_date['month'].to_i)
193
+
194
+ if target_ym < last_book_ym
195
+ next
196
+ elsif target_ym == first_book_ym && target_ym > last_book_ym
197
+ result.concat(get_target_books(target_ym, page))
198
+ break
199
+ elsif target_ym < first_book_ym && target_ym > last_book_ym
200
+ result.concat(get_target_books(target_ym, page))
201
+ break
202
+ elsif target_ym <= first_book_ym && target_ym >= last_book_ym
203
+ result.concat(get_target_books(target_ym, page))
204
+ elsif target_ym > first_book_ym
205
+ break
206
+ end
207
+ end
208
+ result
209
+ end
210
+
211
+ def get_last_book_date(page)
212
+ NUM_BOOKS_PER_PAGE.downto(1) do |i|
213
+ link = page["book_#{i}_link"]
214
+ next if link.empty?
215
+ return get_read_date(link)
216
+ end
217
+ end
218
+
219
+ def get_target_books(target_ym, page)
220
+ target_books = []
221
+
222
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
223
+ next if page["book_#{i}_link"].empty?
224
+
225
+ read_yms = []
226
+ read_date = get_read_date(page["book_#{i}_link"])
227
+ read_dates = [Time.local(read_date['year'], read_date['month'], read_date['day'])]
228
+ read_yms << Time.local(read_date['year'], read_date['month'])
229
+
230
+ reread_dates = []
231
+ reread_dates << get_reread_date(page["book_#{i}_link"])
232
+ reread_dates.flatten!
233
+
234
+ unless reread_dates.empty?
235
+ reread_dates.each do |date|
236
+ read_yms << Time.local(date['reread_year'], date['reread_month'])
237
+ end
238
+ end
239
+
240
+ next unless read_yms.include?(target_ym)
241
+
242
+ unless reread_dates.empty?
243
+ reread_dates.each do |date|
244
+ read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
245
+ end
246
+ end
247
+ book_name = get_book_name(page["book_#{i}_link"])
248
+ book = Book.new(book_name, read_dates)
249
+ target_books << book
250
+ end
251
+
252
+ target_books
253
+ end
254
+
255
+ def scrape_book_pages(user_id, uri_method)
256
+ raise ArgumentError unless user_id =~ /^\d+$/
257
+ raise ArgumentError unless Bookmeter.methods.include?(uri_method)
258
+ return [] unless logged_in?
259
+
260
+ books_page = @agent.get(Bookmeter.method(uri_method).call(user_id))
261
+
262
+ # if books are not found at all
263
+ return [] if books_page.search('#main_left > div > center > a').empty?
264
+
265
+ if books_page.search('span.now_page').empty?
266
+ books_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
267
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
268
+ send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
269
+ send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
270
+ end
271
+ end
272
+ return [books_root.inject(@agent, books_page)]
273
+ end
274
+
275
+ books_root = Yasuri.pages_root '//span[@class="now_page"]/following-sibling::span[1]/a' do
276
+ text_page_index '//span[@class="now_page"]/a'
277
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
278
+ send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
279
+ send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
280
+ end
281
+ end
282
+ books_root.inject(@agent, books_page)
283
+ end
284
+
285
+ def get_book_name(book_link)
286
+ @agent.get(ROOT_URI + book_link).search('#title').text
287
+ end
288
+
289
+ def get_read_date(book_link)
290
+ book_page = @agent.get(ROOT_URI + book_link)
291
+ book_date = Yasuri.struct_date '//*[@id="book_edit_area"]/form[1]/div[2]' do
292
+ text_year '//*[@id="read_date_y"]/option[1]', truncate: /\d+/, proc: :to_i
293
+ text_month '//*[@id="read_date_m"]/option[1]', truncate: /\d+/, proc: :to_i
294
+ text_day '//*[@id="read_date_d"]/option[1]', truncate: /\d+/, proc: :to_i
295
+ end
296
+ book_date.inject(@agent, book_page)
297
+ end
298
+
299
+ def get_reread_date(book_link)
300
+ book_page = @agent.get(ROOT_URI + book_link)
301
+ book_reread_date = Yasuri.struct_reread_date '//*[@id="book_edit_area"]/div/form[1]/div[2]' do
302
+ text_reread_year '//div[@class="reread_box"]/form[1]/div[2]/select[1]/option[1]', truncate: /\d+/, proc: :to_i
303
+ text_reread_month '//div[@class="reread_box"]/form[1]/div[2]/select[2]/option[1]', truncate: /\d+/, proc: :to_i
304
+ text_reread_day '//div[@class="reread_box"]/form[1]/div[2]/select[3]/option[1]', truncate: /\d+/, proc: :to_i
305
+ end
306
+ book_reread_date.inject(@agent, book_page)
307
+ end
308
+
309
+ def get_book_structs(page)
310
+ books = []
311
+
312
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
313
+ break if page["book_#{i}_link"].empty?
314
+
315
+ read_dates = []
316
+ read_date = get_read_date(page["book_#{i}_link"])
317
+ unless read_date.empty?
318
+ read_dates << Time.local(read_date['year'], read_date['month'], read_date['day'])
319
+ end
320
+
321
+ reread_dates = []
322
+ reread_dates << get_reread_date(page["book_#{i}_link"])
323
+ reread_dates.flatten!
324
+
325
+ unless reread_dates.empty?
326
+ reread_dates.each do |date|
327
+ read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
328
+ end
329
+ end
330
+
331
+ book_name = get_book_name(page["book_#{i}_link"])
332
+ book = Book.new(book_name, read_dates)
333
+ books << book
334
+ end
335
+
336
+ books
337
+ end
338
+
339
+ def get_followings(user_id)
340
+ users = []
341
+ scraped_pages = user_id == @log_in_user_id ? scrape_followings_page(user_id)
342
+ : scrape_others_followings_page(user_id)
343
+ scraped_pages.each do |page|
344
+ users << get_user_structs(page)
345
+ users.flatten!
346
+ end
347
+ users
348
+ end
349
+
350
+ def get_followers(user_id)
351
+ users = []
352
+ scraped_pages = scrape_followers_page(user_id)
353
+ scraped_pages.each do |page|
354
+ users << get_user_structs(page)
355
+ users.flatten!
356
+ end
357
+ users
358
+ end
359
+
360
+ def get_user_structs(page)
361
+ users = []
362
+
363
+ 1.upto(NUM_USERS_PER_PAGE) do |i|
364
+ break if page["user_#{i}_name"].empty?
365
+
366
+ user_name = page["user_#{i}_name"]
367
+ user_id = page["user_#{i}_link"].match(/\/u\/(\d+)$/)[1]
368
+ user = User.new(user_name, user_id)
369
+ users << user
370
+ end
371
+
372
+ users
373
+ end
374
+
375
+ def scrape_followings_page(user_id)
376
+ raise ArgumentError unless user_id =~ /^\d+$/
377
+ return [] unless logged_in?
378
+
379
+ followings_page = @agent.get(Bookmeter.followings_uri(user_id))
380
+ followings_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
381
+ 1.upto(NUM_USERS_PER_PAGE) do |i|
382
+ send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@title")
383
+ send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@href")
384
+ end
385
+ end
386
+ [followings_root.inject(@agent, followings_page)]
387
+ end
388
+
389
+ def scrape_others_followings_page(user_id)
390
+ scrape_users_listing_page(user_id, :followings_uri)
391
+ end
392
+
393
+ def scrape_followers_page(user_id)
394
+ scrape_users_listing_page(user_id, :followers_uri)
395
+ end
396
+
397
+ def scrape_users_listing_page(user_id, uri_method)
398
+ raise ArgumentError unless user_id =~ /^\d+$/
399
+ raise ArgumentError unless Bookmeter.methods.include?(uri_method)
400
+ return [] unless logged_in?
401
+
402
+ page = @agent.get(Bookmeter.method(uri_method).call(user_id))
403
+ root = Yasuri.struct_users '//*[@id="main_left"]/div' do
404
+ 1.upto(NUM_USERS_PER_PAGE) do |i|
405
+ send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@title")
406
+ send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@href")
407
+ end
408
+ end
409
+ [root.inject(@agent, page)]
410
+ end
411
+ end
412
+
413
+ class BookmeterError < StandardError; end
414
+ end
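`get_read_books` appears to rely on the read-books list being ordered from newest to oldest. For each scraped page, it compares the target month with the months of the first (newest) and last (oldest) book on that page. It then skips the page, collects from it, or stops. The branch logic can be isolated into a pure function, as sketched below. `page_action`, `walk` and their symbolic values are hypothetical names. Months are represented with `Time.local(year, month)` as in the original. In `walk`, each page is simplified to a `[first, last, matching_books]` triple, standing in for the scraped page plus the filtering that `get_target_books` does.

```ruby
# Decide what to do with one page of the newest-first read-books list.
# target, first, last: Time.local(year, month) values; `first` is the month of
# the page's newest book and `last` the month of its oldest book.
def page_action(target, first, last)
  if target < last
    :skip             # target is older than every book here; try the next page
  elsif target > first
    :stop             # target is newer than every book here; later pages are older still
  elsif target > last
    :collect_and_stop # last < target <= first: all matching books are on this page
  else
    :collect          # target == last: matching books may continue on the next page
  end
end

# Walk pages in order and gather the matching books the branches above allow.
def walk(pages, target)
  result = []
  pages.each do |first, last, matching_books|
    case page_action(target, first, last)
    when :skip then next
    when :stop then break
    when :collect then result.concat(matching_books)
    when :collect_and_stop
      result.concat(matching_books)
      break
    end
  end
  result
end
```

Splitting the decision out this way keeps the page-walking loop free of date comparisons. It also makes the four cases testable without any HTTP requests.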
data/lib/bookmeter_scraper/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module BookmeterScraper
2
+ VERSION = "0.1.0"
3
+ end
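The `*_uri` class methods in bookmeter.rb check user IDs with the pattern `/^\d+$/`. In Ruby, `^` and `$` match at line boundaries, so a string such as `"123\n../login"` passes that check (`"123\n../login" =~ /^\d+$/` returns `0`). `\A` and `\z` anchor to the whole string instead. Below is a sketch of the same URI helpers driven by a table of the path suffixes from bookmeter.rb, using the stricter anchors. `LIST_PATHS` and `list_uri` are hypothetical names, not part of the gem.

```ruby
ROOT_URI = 'http://bookmeter.com'.freeze

# Path suffixes appended by the *_uri class methods in bookmeter.rb.
LIST_PATHS = {
  mypage: '',
  read_books: '/booklist',
  reading_books: '/booklistnow',
  tsundoku: '/booklisttun',
  wish_list: '/booklistpre',
  followings: '/favorite_user',
  followers: '/favorited_user',
}.freeze

# Hypothetical helper: \A and \z anchor to the whole string, so an ID that
# merely contains a line of digits is rejected. Unknown kinds raise KeyError.
def list_uri(kind, user_id)
  unless user_id.is_a?(String) && user_id =~ /\A\d+\z/
    raise ArgumentError, "invalid user ID: #{user_id.inspect}"
  end
  "#{ROOT_URI}/u/#{user_id}#{LIST_PATHS.fetch(kind)}"
end

list_uri(:tsundoku, '01010101') # => "http://bookmeter.com/u/01010101/booklisttun"
```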
metadata ADDED
@@ -0,0 +1,144 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: bookmeter_scraper
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Kohei Yamamoto
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-02-26 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.10'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.10'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.4'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.4'
55
+ - !ruby/object:Gem::Dependency
56
+ name: webmock
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '1.22'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '1.22'
69
+ - !ruby/object:Gem::Dependency
70
+ name: yasuri
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '0.0'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '0.0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: mechanize
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '2.7'
90
+ type: :runtime
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '2.7'
97
+ description: Bookmeter scraping library
98
+ email:
99
+ - kymmt90@gmail.com
100
+ executables:
101
+ - bookmeter_scraper
102
+ extensions: []
103
+ extra_rdoc_files: []
104
+ files:
105
+ - ".gitignore"
106
+ - ".rspec"
107
+ - ".travis.yml"
108
+ - Gemfile
109
+ - LICENSE.txt
110
+ - README.ja.md
111
+ - README.md
112
+ - Rakefile
113
+ - bin/console
114
+ - bin/setup
115
+ - bookmeter_scraper.gemspec
116
+ - exe/bookmeter_scraper
117
+ - lib/bookmeter_scraper.rb
118
+ - lib/bookmeter_scraper/bookmeter.rb
119
+ - lib/bookmeter_scraper/version.rb
120
+ homepage: https://github.com/kymmt90/bookmeter_scraper
121
+ licenses:
122
+ - MIT
123
+ metadata: {}
124
+ post_install_message:
125
+ rdoc_options: []
126
+ require_paths:
127
+ - lib
128
+ required_ruby_version: !ruby/object:Gem::Requirement
129
+ requirements:
130
+ - - ">="
131
+ - !ruby/object:Gem::Version
132
+ version: '0'
133
+ required_rubygems_version: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - ">="
136
+ - !ruby/object:Gem::Version
137
+ version: '0'
138
+ requirements: []
139
+ rubyforge_project:
140
+ rubygems_version: 2.5.1
141
+ signing_key:
142
+ specification_version: 4
143
+ summary: Bookmeter scraping library
144
+ test_files: []