bookmeter_scraper 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 6c328c66bbd91ea36ee0471c01f4e16a69ffd347
4
+ data.tar.gz: d1d5fde5a9d223c0aada00c670d1feed0b5e9c3a
5
+ SHA512:
6
+ metadata.gz: 9a2c3a6149faa92850aca03c455bd5ccd95f8eddaee005e4befd4daa328c340e171df5018803f9feb0ebb0f7fdf630f91d9b9259c52fb358f5b3ccf5570595e7
7
+ data.tar.gz: dcbf2db1efa63928b1c00a00d35af13ffba46345578c84b40b26a9b320ba69acd899c66c9319af5e15894a3d4e5ccc5fadbf64136fbbcbea804a0097f0729251
data/.gitignore ADDED
@@ -0,0 +1,10 @@
1
+ /.bundle/
2
+ /.yardoc
3
+ /Gemfile.lock
4
+ /_yardoc/
5
+ /coverage/
6
+ /doc/
7
+ /pkg/
8
+ /spec/reports/
9
+ /tmp/
10
+ spec/examples.txt
data/.rspec ADDED
@@ -0,0 +1,2 @@
1
+ --color
2
+ --require spec_helper
data/.travis.yml ADDED
@@ -0,0 +1,10 @@
1
+ language: ruby
2
+ rvm:
3
+ - 2.0.0
4
+ - 2.1.0
5
+ - 2.2.0
6
+ - 2.3.0
7
+ before_install:
8
+ - gem update bundler
9
+ env:
10
+ - TZ=Asia/Tokyo
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in bookmeter_scraper.gemspec
4
+ gemspec
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2016 Kohei Yamamoto
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.ja.md ADDED
@@ -0,0 +1,157 @@
1
+ # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
2
+
3
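+ A gem for scraping [Bookmeter](http://bookmeter.com) and handling its information in Ruby.
4
+
5
+ - Book information
6
+ - Read books
7
+ - Reading books
8
+ - Tsundoku (stockpile)
9
+ - Wish list
10
+ - Followings / followers
11
+ - User profiles
12
+
13
+ The items above can be retrieved.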
14
+
15
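+ ## Caution
16
+
17
+ Keep the scraping frequency within reasonable limits. Deliberately placing a significant load on the Bookmeter servers is prohibited by Article 9 of the terms of service.
18
+
19
+ - [Terms of Service - Bookmeter](http://bookmeter.com/terms.php)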
20
+
21
+ ## Usage
22
+
23
+ The following code is required to use this gem.
24
+
25
+ ```ruby
26
+ require 'bookmeter_scraper'
27
+ ```
28
+
29
+ ### Log in
30
+
31
+ To get book information and following / follower information, you need to log in beforehand with `Bookmeter.log_in`.
32
+
33
+ ```ruby
34
+ bookmeter = BookmeterScraper::Bookmeter.log_in('example@example.com', 'password')
35
+ bookmeter.logged_in? # true
36
+ ```
37
+
38
+ `Bookmeter#log_in` is also available for logging in.
39
+
40
+ ```ruby
41
+ bookmeter = BookmeterScraper::Bookmeter.new
42
+ bookmeter.log_in('example@example.com', 'password')
43
+ ```
44
+
45
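+ ### Getting book information
46
+
47
+ The following book information
48
+
49
+ - Read books
50
+ - Reading books
51
+ - Tsundoku (stockpile)
52
+ - Wish list
53
+
54
+ can be retrieved. You need to log in beforehand.
55
+
56
+ #### Read books
57
+
58
+ `Bookmeter#read_books` returns the read books information.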
59
+
60
+ ```ruby
61
+ books = bookmeter.read_books # get the read books of the logged-in user
62
+ bookmeter.read_books('01010101') # get the read books of another user specified by ID
63
+ ```
64
+
65
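+ Book information is returned as an array of `Struct` objects whose attributes are the book title `name` and `read_dates`, an array of finished-reading dates (both the first finished date and reread dates).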
66
+
67
+ ```ruby
68
+ books[0].name
69
+ books[0].read_dates
70
+ ```
71
+
72
+ In addition, `Bookmeter#read_books_in` returns the read books for a specific year and month.
73
+
74
+ ```ruby
75
+ books = bookmeter.read_books_in(2016, 1) # get the books the logged-in user read in January 2016
76
+ books = bookmeter.read_books_in(2016, 1, '01010101') # get the books another user, specified by ID, read in January 2016
77
+ ```
78
+
79
+ #### Reading books / Tsundoku / Wish list
80
+
81
+ The book information other than read books
82
+
83
+ - Reading books
84
+ - Tsundoku (stockpile)
85
+ - Wish list
86
+
87
+ can also be retrieved with
88
+
89
+ - `Bookmeter#reading_books`
90
+ - `Bookmeter#tsundoku`
91
+ - `Bookmeter#wish_list`
92
+
93
+ respectively.
94
+
95
+ ```ruby
96
+ books = bookmeter.reading_books # get the reading books of the logged-in user
97
+ books[0].name
98
+ books[0].read_dates # the array of read dates is empty
99
+
100
+ bookmeter.tsundoku # get the tsundoku of the logged-in user
101
+ bookmeter.wish_list # get the wish list of the logged-in user
102
+ ```
103
+
104
+ ### Getting following / follower information
105
+
106
+ `Bookmeter#followings` and `Bookmeter#followers` return information on the followings / followers that the logged-in user can view. You need to log in beforehand.
107
+
108
+ ```ruby
109
+ following_users = bookmeter.followings # get information on following users
110
+ followers = bookmeter.followers # get information on followers
111
+ ```
112
+
113
+ User information is returned as an array of `Struct` objects with the user name `name` and the user ID `id`.
114
+
115
+ ```ruby
116
+ following_users[0].name
117
+ following_users[0].id
118
+ followers[0].name
119
+ followers[0].id
120
+ ```
121
+
122
+ #### Notice
123
+
124
+ **Paginated followings / followers pages are not supported yet.**
125
+
126
+ ### Getting a user profile
127
+
128
+ `Bookmeter#profile` returns a user's profile. Profiles can be viewed without logging in, so no login is required.
129
+
130
+ ```ruby
131
+ bookmeter = BookmeterScraper::Bookmeter.new
132
+ user_id = '000000'
133
+ profile = bookmeter.profile(user_id) # a profile can be retrieved by specifying any user's ID
134
+ ```
135
+
136
+ Profile information is returned as a `Struct` with the following attributes. Attributes that are not set in the profile are `nil`.
137
+
138
+ ```ruby
139
+ profile.name # user name
140
+ profile.gender # gender
141
+ profile.age # age
142
+ profile.blood_type # blood type
143
+ profile.job # occupation
144
+ profile.address # current address
145
+ profile.url # URL / blog
146
+ profile.description # self-introduction
147
+ profile.first_day # first day of records
148
+ profile.elapsed_days # elapsed days
149
+ profile.read_books_count # number of books read
150
+ profile.read_pages_count # number of pages read
151
+ profile.reviews_count # number of reviews
152
+ profile.bookshelfs_count # number of bookshelves
153
+ ```
154
+
155
+ ## License
156
+
157
+ [MIT License](http://opensource.org/licenses/MIT)
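Note on the caution in the README above: one way to keep request frequency modest is to space out calls on the caller's side. The sketch below is only an illustration under stated assumptions. `Throttle` and its `interval`, `clock`, and `sleeper` parameters are hypothetical names that are not part of bookmeter_scraper, and the strings in the demo stand in for real calls such as `bookmeter.profile(id)`.

```ruby
# Hypothetical helper (not part of bookmeter_scraper): ensures that at least
# `interval` seconds pass between consecutive calls.
class Throttle
  def initialize(interval, clock: -> { Time.now.to_f }, sleeper: ->(s) { sleep(s) })
    @interval = interval
    @clock = clock      # returns the current time in seconds
    @sleeper = sleeper  # blocks for the given number of seconds
    @last = nil         # time of the previous call, nil before the first one
  end

  def call
    unless @last.nil?
      wait = @interval - (@clock.call - @last)
      @sleeper.call(wait) if wait > 0
    end
    @last = @clock.call
    yield
  end
end

# Demo with placeholder work; a real caller would put a scraper call in the block.
throttle = Throttle.new(0.01)
results = %w[000000 000001].map { |id| throttle.call { "profile of #{id}" } }
puts results.inspect
```

The clock and sleeper are injectable only so the waiting logic can be checked without real delays; a caller would normally pass just the interval.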
data/README.md ADDED
@@ -0,0 +1,163 @@
1
+ # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
2
+
3
+ A library for scraping [Bookmeter](http://bookmeter.com).
4
+
5
+ Japanese README is [here](https://github.com/kymmt90/bookmeter_scraper/blob/master/README.ja.md).
6
+
7
+
8
+ ## Installation
9
+
10
+ Add this line to your application's Gemfile:
11
+
12
+ ```ruby
13
+ gem 'bookmeter_scraper'
14
+ ```
15
+
16
+ And then execute:
17
+
18
+ $ bundle
19
+
20
+ Or install it yourself as:
21
+
22
+ $ gem install bookmeter_scraper
23
+
24
+
25
+ ## Usage
26
+
27
+ Add this line to your code before using this library:
28
+
29
+ ```ruby
30
+ require 'bookmeter_scraper'
31
+ ```
32
+
33
+ ### Log in
34
+
35
+ To get book and following / follower information, log in to Bookmeter with `Bookmeter.log_in`:
36
+
37
+ ```ruby
38
+ bookmeter = BookmeterScraper::Bookmeter.log_in('example@example.com', 'password')
39
+ bookmeter.logged_in? # true
40
+ ```
41
+
42
+ `Bookmeter#log_in` is also available:
43
+
44
+ ```ruby
45
+ bookmeter = BookmeterScraper::Bookmeter.new
46
+ bookmeter.log_in('example@example.com', 'password')
47
+ ```
48
+
49
+ ### Get book information
50
+
51
+ You can get the following book information:
52
+
53
+ - read books
54
+ - reading books
55
+ - tsundoku (stockpile)
56
+ - wish list
57
+
58
+ You need to log in to Bookmeter in advance to get this information.
59
+
60
+ #### Read books
61
+
62
+ You can get read books information with `Bookmeter#read_books`:
63
+
64
+ ```ruby
65
+ books = bookmeter.read_books # get read books of the logged in user
66
+ bookmeter.read_books('01010101') # get read books of a user specified by ID
67
+ ```
68
+
69
+ Book information is an array of `Struct` objects, each with `name` and `read_dates` attributes.
70
+ `read_dates` is an array of finished-reading dates (the first finished date and any reread dates):
71
+
72
+ ```ruby
73
+ books[0].name
74
+ books[0].read_dates
75
+ ```
76
+
77
+ To get the books read in a specific year and month, use `Bookmeter#read_books_in`:
78
+
79
+ ```ruby
80
+ books = bookmeter.read_books_in(2016, 1) # get read books of the logged in user in 2016-01
81
+ books = bookmeter.read_books_in(2016, 1, '01010101') # get read books of a user in 2016-01
82
+ ```
83
+
84
+ #### Reading books / Tsundoku / Wish list
85
+
86
+ You can get the other kinds of book information with these methods:
87
+
88
+ - `Bookmeter#reading_books`
89
+ - `Bookmeter#tsundoku`
90
+ - `Bookmeter#wish_list`
91
+
92
+ ```ruby
93
+ books = bookmeter.reading_books
94
+ books[0].name
95
+ books[0].read_dates # this array is empty
96
+
97
+ bookmeter.tsundoku
98
+ bookmeter.wish_list
99
+ ```
100
+
101
+ ### Get following users / followers information
102
+
103
+ You can get following users (followings) and followers information with `Bookmeter#followings` and `Bookmeter#followers`:
104
+
105
+ ```ruby
106
+ following_users = bookmeter.followings
107
+ followers = bookmeter.followers
108
+ ```
109
+
110
+ You need to log in to Bookmeter in advance to get this information.
111
+
112
+ User information is an array of `Struct` objects, each with `name` and `id` attributes:
113
+
114
+ ```ruby
115
+ following_users[0].name
116
+ following_users[0].id
117
+ followers[0].name
118
+ followers[0].id
119
+ ```
120
+
121
+ #### Notice
122
+
123
+ **`Bookmeter#followings` and `Bookmeter#followers` do not support paginated followings / followers pages yet.**
124
+
125
+ ### Get user profile
126
+
127
+ You can get a user profile with `Bookmeter#profile`:
128
+
129
+ ```ruby
130
+ bookmeter = BookmeterScraper::Bookmeter.new
131
+ user_id = '000000'
132
+ profile = bookmeter.profile(user_id) # you can specify any user's ID
133
+ ```
134
+
135
+ You do not need to log in to get user profiles.
136
+ Profile information is a `Struct` with these attributes:
137
+
138
+ ```ruby
139
+ profile.name
140
+ profile.gender
141
+ profile.age
142
+ profile.blood_type
143
+ profile.job
144
+ profile.address
145
+ profile.url
146
+ profile.description
147
+ profile.first_day
148
+ profile.elapsed_days
149
+ profile.read_books_count
150
+ profile.read_pages_count
151
+ profile.reviews_count
152
+ profile.bookshelfs_count
153
+ ```
154
+
155
+
156
+ ## Contributing
157
+
158
+ Bug reports and pull requests are welcome on GitHub at https://github.com/kymmt90/bookmeter_scraper.
159
+
160
+
161
+ ## License
162
+
163
+ The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).
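Note on the README above: it describes book information as an array of `Struct` objects with `name` and `read_dates`, where `read_dates` holds the first finished date and any reread dates. The sketch below illustrates how such an array could be filtered by year and month on the caller's side. It is an assumption-laden illustration, not the gem's implementation: `Book` here is a local stand-in for the gem's struct, `books_read_in` is a hypothetical helper, and the sample data replaces a real `bookmeter.read_books` call.

```ruby
# Local stand-in for the Struct the README describes (name, read_dates).
Book = Struct.new(:name, :read_dates)

# Hypothetical helper (not part of bookmeter_scraper): select the books that
# have at least one read date in the given year and month.
def books_read_in(books, year, month)
  books.select do |book|
    book.read_dates.any? { |date| date.year == year && date.month == month }
  end
end

# Sample data in place of a real `bookmeter.read_books` call.
books = [
  Book.new('Book A', [Time.local(2016, 1, 10)]),
  Book.new('Book B', [Time.local(2015, 12, 31), Time.local(2016, 1, 20)]),
  Book.new('Book C', [Time.local(2016, 2, 1)])
]

january = books_read_in(books, 2016, 1)
puts january.map(&:name).inspect
```

Filtering an already fetched array avoids further requests, which fits the README's advice on scraping frequency, whereas `Bookmeter#read_books_in` fetches pages itself.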
data/Rakefile ADDED
@@ -0,0 +1,5 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rspec/core/rake_task'
3
+
4
+ RSpec::Core::RakeTask.new("spec")
5
+ task :default => :spec
data/bin/console ADDED
@@ -0,0 +1,14 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bundler/setup"
4
+ require "bookmeter_scraper"
5
+
6
+ # You can add fixtures and/or initialization code here to make experimenting
7
+ # with your gem easier. You can also use a different console, if you like.
8
+
9
+ # (If you use this, don't forget to add pry to your Gemfile!)
10
+ # require "pry"
11
+ # Pry.start
12
+
13
+ require "irb"
14
+ IRB.start
data/bin/setup ADDED
@@ -0,0 +1,7 @@
1
+ #!/bin/bash
2
+ set -euo pipefail
3
+ IFS=$'\n\t'
4
+
5
+ bundle install
6
+
7
+ # Do any other automated setup that you need to do here
data/bookmeter_scraper.gemspec ADDED
@@ -0,0 +1,28 @@
1
+ lib = File.expand_path('../lib', __FILE__)
2
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
3
+ require 'bookmeter_scraper/version'
4
+
5
+ Gem::Specification.new do |spec|
6
+ spec.name = "bookmeter_scraper"
7
+ spec.version = BookmeterScraper::VERSION
8
+ spec.authors = ["Kohei Yamamoto"]
9
+ spec.email = ["kymmt90@gmail.com"]
10
+
11
+ spec.summary = %q{Bookmeter scraping library}
12
+ spec.description = %q{Bookmeter scraping library}
13
+ spec.homepage = "https://github.com/kymmt90/bookmeter_scraper"
14
+ spec.license = "MIT"
15
+
16
+ spec.files = `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
17
+ spec.bindir = "exe"
18
+ spec.executables = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
19
+ spec.require_paths = ["lib"]
20
+
21
+ spec.add_development_dependency "bundler", "~> 1.10"
22
+ spec.add_development_dependency "rake", "~> 10.0"
23
+ spec.add_development_dependency "rspec", "~> 3.4"
24
+ spec.add_development_dependency "webmock", "~> 1.22"
25
+
26
+ spec.add_dependency "yasuri", "~> 0.0"
27
+ spec.add_dependency "mechanize", "~> 2.7"
28
+ end
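Note on the dependency declarations above: the `~>` operator is RubyGems' pessimistic version constraint. The sketch below checks a few version numbers against the two runtime requirement strings copied from the gemspec. The version numbers tested and the `satisfied?` helper are illustrative choices, not taken from the gem.

```ruby
require 'rubygems'

# Requirement strings copied from the gemspec above.
yasuri    = Gem::Requirement.new('~> 0.0')
mechanize = Gem::Requirement.new('~> 2.7')

# Illustrative helper: does the version string satisfy the requirement?
def satisfied?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts satisfied?(yasuri, '0.0.12')    # any 0.x release is accepted
puts satisfied?(yasuri, '1.0.0')     # the next major version is rejected
puts satisfied?(mechanize, '2.8.0')  # 2.7 or later within 2.x is accepted
puts satisfied?(mechanize, '2.6.0')  # older than 2.7 is rejected
puts satisfied?(mechanize, '3.0.0')  # the next major version is rejected
```

In other words, `~> 0.0` allows any release below 1.0 and `~> 2.7` allows 2.7 up to, but not including, 3.0.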
data/exe/bookmeter_scraper ADDED
@@ -0,0 +1,3 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require "bookmeter_scraper"
data/lib/bookmeter_scraper.rb ADDED
@@ -0,0 +1,2 @@
1
+ require 'bookmeter_scraper/bookmeter'
2
+ require 'bookmeter_scraper/version'
data/lib/bookmeter_scraper/bookmeter.rb ADDED
@@ -0,0 +1,414 @@
1
+ require 'mechanize'
2
+ require 'yasuri'
3
+
4
+ module BookmeterScraper
5
+ class Bookmeter
6
+ ROOT_URI = 'http://bookmeter.com'.freeze
7
+ LOGIN_URI = "#{ROOT_URI}/login".freeze
8
+
9
+ PROFILE_ATTRIBUTES = %i(name gender age blood_type job address url description first_day elapsed_days read_books_count read_pages_count reviews_count bookshelfs_count)
10
+ Profile = Struct.new(*PROFILE_ATTRIBUTES)
11
+
12
+ BOOK_ATTRIBUTES = %i(name read_dates)
13
+ Book = Struct.new(*BOOK_ATTRIBUTES)
14
+
15
+ USER_ATTRIBUTES = %i(name id)
16
+ User = Struct.new(*USER_ATTRIBUTES)
17
+
18
+ JP_ATTRIBUTE_NAMES = {
19
+ gender: '性別',
20
+ age: '年齢',
21
+ blood_type: '血液型',
22
+ job: '職業',
23
+ address: '現住所',
24
+ url: 'URL / ブログ',
25
+ description: '自己紹介',
26
+ first_day: '記録初日',
27
+ elapsed_days: '経過日数',
28
+ read_books_count: '読んだ本',
29
+ read_pages_count: '読んだページ',
30
+ reviews_count: '感想/レビュー',
31
+ bookshelfs_count: '本棚',
32
+ }
33
+
34
+ NUM_BOOKS_PER_PAGE = 40
35
+ NUM_USERS_PER_PAGE = 20
36
+
37
+ attr_reader :log_in_user_id
38
+
39
+ def self.mypage_uri(user_id)
40
+ raise ArgumentError unless user_id =~ /^\d+$/
41
+ "#{ROOT_URI}/u/#{user_id}"
42
+ end
43
+
44
+ def self.read_books_uri(user_id)
45
+ raise ArgumentError unless user_id =~ /^\d+$/
46
+ "#{ROOT_URI}/u/#{user_id}/booklist"
47
+ end
48
+
49
+ def self.reading_books_uri(user_id)
50
+ raise ArgumentError unless user_id =~ /^\d+$/
51
+ "#{ROOT_URI}/u/#{user_id}/booklistnow"
52
+ end
53
+
54
+ def self.tsundoku_uri(user_id)
55
+ raise ArgumentError unless user_id =~ /^\d+$/
56
+ "#{ROOT_URI}/u/#{user_id}/booklisttun"
57
+ end
58
+
59
+ def self.wish_list_uri(user_id)
60
+ raise ArgumentError unless user_id =~ /^\d+$/
61
+ "#{ROOT_URI}/u/#{user_id}/booklistpre"
62
+ end
63
+
64
+ def self.followings_uri(user_id)
65
+ raise ArgumentError unless user_id =~ /^\d+$/
66
+ "#{ROOT_URI}/u/#{user_id}/favorite_user"
67
+ end
68
+
69
+ def self.followers_uri(user_id)
70
+ raise ArgumentError unless user_id =~ /^\d+$/
71
+ "#{ROOT_URI}/u/#{user_id}/favorited_user"
72
+ end
73
+
74
+ def self.log_in(mail, password)
75
+ Bookmeter.new.tap do |bookmeter|
76
+ bookmeter.log_in(mail, password)
77
+ end
78
+ end
79
+
80
+
81
+ def initialize(agent = nil)
82
+ @agent = agent.nil? ? Bookmeter.new_agent : agent
83
+ @logged_in = false
84
+ end
85
+
86
+ def log_in(mail, password)
87
+ raise BookmeterError if @agent.nil?
88
+
89
+ next_page = nil
90
+ @agent.get(LOGIN_URI) do |page|
91
+ next_page = page.form_with(action: '/login') do |form|
92
+ form.field_with(name: 'mail').value = mail
93
+ form.field_with(name: 'password').value = password
94
+ end.submit
95
+ end
96
+ @logged_in = next_page.uri.to_s == ROOT_URI + '/'
97
+ return unless logged_in?
98
+
99
+ mypage = next_page.link_with(text: 'マイページ').click
100
+ @log_in_user_id = extract_user_id(mypage)
101
+ end
102
+
103
+ def logged_in?
104
+ @logged_in
105
+ end
106
+
107
+ def profile(user_id)
108
+ raise ArgumentError unless user_id =~ /^\d+$/
109
+
110
+ mypage = @agent.get(Bookmeter.mypage_uri(user_id))
111
+
112
+ profile_dl_tags = mypage.search('#side_left > div.inner > div.profile > dl')
113
+ jp_attribute_names = profile_dl_tags.map { |i| i.children[0].children.text }
114
+ attribute_values = profile_dl_tags.map { |i| i.children[1].children.text }
115
+ jp_attributes = Hash[jp_attribute_names.zip(attribute_values)]
116
+ attributes = PROFILE_ATTRIBUTES.map do |attribute|
117
+ jp_attributes[JP_ATTRIBUTE_NAMES[attribute]]
118
+ end
119
+ attributes[0] = mypage.at_css('#side_left > div.inner > h3').text
120
+
121
+ Profile.new(*attributes)
122
+ end
123
+
124
+ def read_books(user_id = @log_in_user_id)
125
+ books = get_books(user_id, :read_books_uri)
126
+ books.each { |b| yield b } if block_given?
127
+ books
128
+ end
129
+
130
+ def read_books_in(year, month, user_id = @log_in_user_id)
131
+ date = Time.local(year, month)
132
+ books = get_read_books(user_id, date)
133
+ books.each { |b| yield b } if block_given?
134
+ books
135
+ end
136
+
137
+ def reading_books(user_id = @log_in_user_id)
138
+ books = get_books(user_id, :reading_books_uri)
139
+ books.each { |b| yield b } if block_given?
140
+ books
141
+ end
142
+
143
+ def tsundoku(user_id = @log_in_user_id)
144
+ books = get_books(user_id, :tsundoku_uri)
145
+ books.each { |b| yield b } if block_given?
146
+ books
147
+ end
148
+
149
+ def wish_list(user_id = @log_in_user_id)
150
+ books = get_books(user_id, :wish_list_uri)
151
+ books.each { |b| yield b } if block_given?
152
+ books
153
+ end
154
+
155
+ def followings(user_id = @log_in_user_id)
156
+ get_followings(user_id)
157
+ end
158
+
159
+ def followers(user_id = @log_in_user_id)
160
+ get_followers(user_id)
161
+ end
162
+
163
+ private
164
+
165
+ def self.new_agent
166
+ Mechanize.new do |a|
167
+ a.user_agent_alias = 'Mac Safari'
168
+ end
169
+ end
170
+
171
+ def extract_user_id(page)
172
+ page.uri.to_s.match(/\/u\/(\d+)$/)[1]
173
+ end
174
+
175
+ def get_books(user_id, uri_method)
176
+ books = []
177
+ scraped_pages = scrape_book_pages(user_id, uri_method)
178
+ scraped_pages.each do |page|
179
+ books << get_book_structs(page)
180
+ books.flatten!
181
+ end
182
+ books
183
+ end
184
+
185
+ def get_read_books(user_id, target_ym)
186
+ result = []
187
+ scrape_book_pages(user_id, :read_books_uri).each do |page|
188
+ first_book_date = get_read_date(page['book_1_link'])
189
+ last_book_date = get_last_book_date(page)
190
+
191
+ first_book_ym = Time.local(first_book_date['year'].to_i, first_book_date['month'].to_i)
192
+ last_book_ym = Time.local(last_book_date['year'].to_i, last_book_date['month'].to_i)
193
+
194
+ if target_ym < last_book_ym
195
+ next
196
+ elsif target_ym == first_book_ym && target_ym > last_book_ym
197
+ result.concat(get_target_books(target_ym, page))
198
+ break
199
+ elsif target_ym < first_book_ym && target_ym > last_book_ym
200
+ result.concat(get_target_books(target_ym, page))
201
+ break
202
+ elsif target_ym <= first_book_ym && target_ym >= last_book_ym
203
+ result.concat(get_target_books(target_ym, page))
204
+ elsif target_ym > first_book_ym
205
+ break
206
+ end
207
+ end
208
+ result
209
+ end
210
+
211
+ def get_last_book_date(page)
212
+ NUM_BOOKS_PER_PAGE.downto(1) do |i|
213
+ link = page["book_#{i}_link"]
214
+ next if link.empty?
215
+ return get_read_date(link)
216
+ end
217
+ end
218
+
219
+ def get_target_books(target_ym, page)
220
+ target_books = []
221
+
222
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
223
+ next if page["book_#{i}_link"].empty?
224
+
225
+ read_yms = []
226
+ read_date = get_read_date(page["book_#{i}_link"])
227
+ read_dates = [Time.local(read_date['year'], read_date['month'], read_date['day'])]
228
+ read_yms << Time.local(read_date['year'], read_date['month'])
229
+
230
+ reread_dates = []
231
+ reread_dates << get_reread_date(page["book_#{i}_link"])
232
+ reread_dates.flatten!
233
+
234
+ unless reread_dates.empty?
235
+ reread_dates.each do |date|
236
+ read_yms << Time.local(date['reread_year'], date['reread_month'])
237
+ end
238
+ end
239
+
240
+ next unless read_yms.include?(target_ym)
241
+
242
+ unless reread_dates.empty?
243
+ reread_dates.each do |date|
244
+ read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
245
+ end
246
+ end
247
+ book_name = get_book_name(page["book_#{i}_link"])
248
+ book = Book.new(book_name, read_dates)
249
+ target_books << book
250
+ end
251
+
252
+ target_books
253
+ end
254
+
255
+ def scrape_book_pages(user_id, uri_method)
256
+ raise ArgumentError unless user_id =~ /^\d+$/
257
+ raise ArgumentError unless Bookmeter.methods.include?(uri_method)
258
+ return [] unless logged_in?
259
+
260
+ books_page = @agent.get(Bookmeter.method(uri_method).call(user_id))
261
+
262
+ # if books are not found at all
263
+ return [] if books_page.search('#main_left > div > center > a').empty?
264
+
265
+ if books_page.search('span.now_page').empty?
266
+ books_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
267
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
268
+ send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
269
+ send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
270
+ end
271
+ end
272
+ return [books_root.inject(@agent, books_page)]
273
+ end
274
+
275
+ books_root = Yasuri.pages_root '//span[@class="now_page"]/following-sibling::span[1]/a' do
276
+ text_page_index '//span[@class="now_page"]/a'
277
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
278
+ send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
279
+ send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
280
+ end
281
+ end
282
+ books_root.inject(@agent, books_page)
283
+ end
284
+
285
+ def get_book_name(book_link)
286
+ @agent.get(ROOT_URI + book_link).search('#title').text
287
+ end
288
+
289
+ def get_read_date(book_link)
290
+ book_page = @agent.get(ROOT_URI + book_link)
291
+ book_date = Yasuri.struct_date '//*[@id="book_edit_area"]/form[1]/div[2]' do
292
+ text_year '//*[@id="read_date_y"]/option[1]', truncate: /\d+/, proc: :to_i
293
+ text_month '//*[@id="read_date_m"]/option[1]', truncate: /\d+/, proc: :to_i
294
+ text_day '//*[@id="read_date_d"]/option[1]', truncate: /\d+/, proc: :to_i
295
+ end
296
+ book_date.inject(@agent, book_page)
297
+ end
298
+
299
+ def get_reread_date(book_link)
300
+ book_page = @agent.get(ROOT_URI + book_link)
301
+ book_reread_date = Yasuri.struct_reread_date '//*[@id="book_edit_area"]/div/form[1]/div[2]' do
302
+ text_reread_year '//div[@class="reread_box"]/form[1]/div[2]/select[1]/option[1]', truncate: /\d+/, proc: :to_i
303
+ text_reread_month '//div[@class="reread_box"]/form[1]/div[2]/select[2]/option[1]', truncate: /\d+/, proc: :to_i
304
+ text_reread_day '//div[@class="reread_box"]/form[1]/div[2]/select[3]/option[1]', truncate: /\d+/, proc: :to_i
305
+ end
306
+ book_reread_date.inject(@agent, book_page)
307
+ end
308
+
309
+ def get_book_structs(page)
310
+ books = []
311
+
312
+ 1.upto(NUM_BOOKS_PER_PAGE) do |i|
313
+ break if page["book_#{i}_link"].empty?
314
+
315
+ read_dates = []
316
+ read_date = get_read_date(page["book_#{i}_link"])
317
+ unless read_date.empty?
318
+ read_dates << Time.local(read_date['year'], read_date['month'], read_date['day'])
319
+ end
320
+
321
+ reread_dates = []
322
+ reread_dates << get_reread_date(page["book_#{i}_link"])
323
+ reread_dates.flatten!
324
+
325
+ unless reread_dates.empty?
326
+ reread_dates.each do |date|
327
+ read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
328
+ end
329
+ end
330
+
331
+ book_name = get_book_name(page["book_#{i}_link"])
332
+ book = Book.new(book_name, read_dates)
333
+ books << book
334
+ end
335
+
336
+ books
337
+ end
338
+
339
+ def get_followings(user_id)
340
+ users = []
341
+ scraped_pages = user_id == @log_in_user_id ? scrape_followings_page(user_id)
342
+ : scrape_others_followings_page(user_id)
343
+ scraped_pages.each do |page|
344
+ users << get_user_structs(page)
345
+ users.flatten!
346
+ end
347
+ users
348
+ end
349
+
350
+ def get_followers(user_id)
351
+ users = []
352
+ scraped_pages = scrape_followers_page(user_id)
353
+ scraped_pages.each do |page|
354
+ users << get_user_structs(page)
355
+ users.flatten!
356
+ end
357
+ users
358
+ end
359
+
360
+ def get_user_structs(page)
361
+ users = []
362
+
363
+ 1.upto(NUM_USERS_PER_PAGE) do |i|
364
+ break if page["user_#{i}_name"].empty?
365
+
366
+ user_name = page["user_#{i}_name"]
367
+ user_id = page["user_#{i}_link"].match(/\/u\/(\d+)$/)[1]
368
+ user = User.new(user_name, user_id)
369
+ users << user
370
+ end
371
+
372
+ users
373
+ end
374
+
375
+ def scrape_followings_page(user_id)
376
+ raise ArgumentError unless user_id =~ /^\d+$/
377
+ return [] unless logged_in?
378
+
379
+ followings_page = @agent.get(Bookmeter.followings_uri(user_id))
380
+ followings_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
381
+ 1.upto(NUM_USERS_PER_PAGE) do |i|
382
+ send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@title")
383
+ send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@href")
384
+ end
385
+ end
386
+ [followings_root.inject(@agent, followings_page)]
387
+ end
388
+
389
+ def scrape_others_followings_page(user_id)
390
+ scrape_users_listing_page(user_id, :followings_uri)
391
+ end
392
+
393
+ def scrape_followers_page(user_id)
394
+ scrape_users_listing_page(user_id, :followers_uri)
395
+ end
396
+
397
+ def scrape_users_listing_page(user_id, uri_method)
398
+ raise ArgumentError unless user_id =~ /^\d+$/
399
+ raise ArgumentError unless Bookmeter.methods.include?(uri_method)
400
+ return [] unless logged_in?
401
+
402
+ page = @agent.get(Bookmeter.method(uri_method).call(user_id))
403
+ root = Yasuri.struct_users '//*[@id="main_left"]/div' do
404
+ 1.upto(NUM_USERS_PER_PAGE) do |i|
405
+ send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@title")
406
+ send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@href")
407
+ end
408
+ end
409
+ [root.inject(@agent, page)]
410
+ end
411
+ end
412
+
413
+ class BookmeterError < StandardError; end
414
+ end
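Note on the user ID checks in the file above: they use the pattern `/^\d+$/`. In Ruby, `^` and `$` anchor to line boundaries rather than to the whole string, so a multi-line string whose first line is numeric still matches. The sketch below contrasts that pattern with a string-anchored alternative. The `\A...\z` pattern and the `matches?` helper are suggestions for illustration, not code from the gem.

```ruby
# Pattern as written in the source above: ^ and $ anchor to line boundaries.
LINE_ANCHORED = /^\d+$/
# Alternative (an assumption, not the gem's code): \A and \z anchor to the
# start and end of the whole string.
STRING_ANCHORED = /\A\d+\z/

# Illustrative helper: true when the pattern matches the given string.
def matches?(pattern, user_id)
  !(user_id =~ pattern).nil?
end

plain     = '01010101'
multiline = "123\nnot-an-id"

puts matches?(LINE_ANCHORED, plain)        # a plain numeric ID matches
puts matches?(STRING_ANCHORED, plain)      # and matches here as well
puts matches?(LINE_ANCHORED, multiline)    # matches because the first line is numeric
puts matches?(STRING_ANCHORED, multiline)  # rejected: the whole string must be digits
```

For IDs typed in by hand the difference rarely matters, but the string-anchored form is the stricter check when the ID comes from untrusted input.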
data/lib/bookmeter_scraper/version.rb ADDED
@@ -0,0 +1,3 @@
1
+ module BookmeterScraper
2
+ VERSION = "0.1.0"
3
+ end
metadata ADDED
@@ -0,0 +1,144 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: bookmeter_scraper
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Kohei Yamamoto
8
+ autorequire:
9
+ bindir: exe
10
+ cert_chain: []
11
+ date: 2016-02-26 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: bundler
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.10'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.10'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rake
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '10.0'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '10.0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: rspec
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.4'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.4'
55
+ - !ruby/object:Gem::Dependency
56
+ name: webmock
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '1.22'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '1.22'
69
+ - !ruby/object:Gem::Dependency
70
+ name: yasuri
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '0.0'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '0.0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: mechanize
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - "~>"
88
+ - !ruby/object:Gem::Version
89
+ version: '2.7'
90
+ type: :runtime
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - "~>"
95
+ - !ruby/object:Gem::Version
96
+ version: '2.7'
97
+ description: Bookmeter scraping library
98
+ email:
99
+ - kymmt90@gmail.com
100
+ executables:
101
+ - bookmeter_scraper
102
+ extensions: []
103
+ extra_rdoc_files: []
104
+ files:
105
+ - ".gitignore"
106
+ - ".rspec"
107
+ - ".travis.yml"
108
+ - Gemfile
109
+ - LICENSE.txt
110
+ - README.ja.md
111
+ - README.md
112
+ - Rakefile
113
+ - bin/console
114
+ - bin/setup
115
+ - bookmeter_scraper.gemspec
116
+ - exe/bookmeter_scraper
117
+ - lib/bookmeter_scraper.rb
118
+ - lib/bookmeter_scraper/bookmeter.rb
119
+ - lib/bookmeter_scraper/version.rb
120
+ homepage: https://github.com/kymmt90/bookmeter_scraper
121
+ licenses:
122
+ - MIT
123
+ metadata: {}
124
+ post_install_message:
125
+ rdoc_options: []
126
+ require_paths:
127
+ - lib
128
+ required_ruby_version: !ruby/object:Gem::Requirement
129
+ requirements:
130
+ - - ">="
131
+ - !ruby/object:Gem::Version
132
+ version: '0'
133
+ required_rubygems_version: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - ">="
136
+ - !ruby/object:Gem::Version
137
+ version: '0'
138
+ requirements: []
139
+ rubyforge_project:
140
+ rubygems_version: 2.5.1
141
+ signing_key:
142
+ specification_version: 4
143
+ summary: Bookmeter scraping library
144
+ test_files: []