bookmeter_scraper 0.1.1 → 0.1.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: a8d0b08d6fe209f678ebda3c621b41412a575aef
4
- data.tar.gz: 1d9db389c1c0ac92216c00e76f7a025327ed77c7
3
+ metadata.gz: eed0f25219959cbcb0f1e74a0db32d7f6ef46de8
4
+ data.tar.gz: bf5981a2fcb2c933c41720cb99846ac8d1df7dad
5
5
  SHA512:
6
- metadata.gz: 9df797a4925a93e8cb8a7f07e38bbd4b1288acadd27a2d0e32fd3e0161120a843dd3b2c5c184c7116e90bb6d8b52a2cacfc246a04f7ede74b6081359882f2d51
7
- data.tar.gz: ba4d5f59e35034facda467d469a0f2ef3cfef10829f2e75a82af6a839233fe2ff0e588a9c1535db25bc48eb24a2d08fd81d614b67f1c92d66b2656b0603fc303
6
+ metadata.gz: 894e75e566f6e547089048bf6872917c79dcb2a9456d36afd59dc624bdd62a67b9bdce23cd811e1035408b14bc7eba48928e03337d50fed102206daee899cf5f
7
+ data.tar.gz: 3077ac2b3b900537f494ed3fe001cb2be7af6a726293945ad738215ea205ac536d53ac3ad70ba1587dfbcedf4d820387845036c31fc80af92445c4ea2ffd9388
@@ -1,4 +1,5 @@
1
- # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
1
+ # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper) [![Gem Version](https://badge.fury.io/rb/bookmeter_scraper.svg)](https://badge.fury.io/rb/bookmeter_scraper)
2
+
2
3
 
3
4
  A gem that scrapes [Bookmeter](http://bookmeter.com) (読書メーター) so that its data can be used from Ruby.
4
5
 
@@ -30,10 +31,11 @@ require 'bookmeter_scraper'
30
31
 
31
32
  To get book information and followings / followers information, you need to log in beforehand with `Bookmeter.log_in` or `Bookmeter#log_in`.
32
33
 
33
- There are 2 ways to supply login information:
34
+ There are 3 ways to supply login information:
34
35
 
35
36
  1. Passing as arguments
36
37
  2. Writing them in `config.yml`
38
+ 3. Configuring in a block
37
39
 
38
40
  #### 1. Passing as arguments
39
41
 
@@ -67,6 +69,28 @@ bookmeter = BookmeterScraper::Bookmeter.log_in
67
69
  bookmeter.logged_in? # true
68
70
  ```
69
71
 
72
+ #### 3. Configuring in a block
73
+
74
+ You can log in by passing a block to `Bookmeter.log_in`, as follows.
75
+
76
+ ```ruby
77
+ bookmeter = BookmeterScraper::Bookmeter.log_in do |configuration|
78
+ configuration.mail = 'example@example.com'
79
+ configuration.password = 'password'
80
+ end
81
+ bookmeter.logged_in? # true
82
+ ```
83
+
84
+ You can also log in with `Bookmeter#log_in`.
85
+
86
+ ```ruby
87
+ bookmeter = BookmeterScraper::Bookmeter.new
88
+ bookmeter.log_in do |configuration|
89
+ configuration.mail = 'example@example.com'
90
+ configuration.password = 'password'
91
+ end
92
+ ```
93
+
70
94
  ### Getting book information
71
95
 
72
96
  The following book information
@@ -76,7 +100,7 @@ bookmeter.logged_in? # true
76
100
  - Tsundoku books (積読本)
77
101
  - Books you want to read (読みたい本)
78
102
 
79
- can be retrieved. Retrieval requires logging in beforehand.
103
+ can be retrieved. Retrieval requires logging in beforehand, for example with `Bookmeter.log_in`.
80
104
 
81
105
  #### Read books
82
106
 
@@ -92,13 +116,17 @@ bookmeter.read_books('01010101') # 他のユーザの ID を指定して、
92
116
  - Title `name`
93
117
  - Author `author`
94
118
  - Array of finished dates (both the first finished date and reread dates) `read_dates`
119
+ - URI of the book page on Bookmeter `uri`
120
+ - URI of the book cover image `image_uri`
95
121
 
96
- can be retrieved as an array of `Struct` that has the above as attributes.
122
+ can be retrieved as an array of `Book` that has the above as attributes.
97
123
 
98
124
  ```ruby
99
125
  books[0].name
100
126
  books[0].author
101
127
  books[0].read_dates
128
+ books[0].uri
129
+ books[0].image_uri
102
130
  ```
103
131
 
104
132
  In addition, `Bookmeter#read_books_in` gets the read books for a specific year and month.
@@ -129,6 +157,8 @@ books = bookmeter.reading_books # ログインユーザの「読んでる本
129
157
  books[0].name
130
158
  books[0].author
131
159
  books[0].read_dates # the Array of finished dates is empty
160
+ books[0].uri
161
+ books[0].image_uri
132
162
 
133
163
  bookmeter.tsundoku # get the logged-in user's tsundoku books
134
164
  bookmeter.wish_list # get the logged-in user's wish list
@@ -143,13 +173,20 @@ following_users = bookmeter.followings # 「お気に入り」ユーザの情
143
173
  followers = bookmeter.followers # get information on followers
144
174
  ```
145
175
 
146
- User information can be retrieved as an array of `Struct` that has a user name `name` and a user ID `id`.
176
+ User information
177
+
178
+ - User name `name`
179
+ - User ID `id`
180
+ - URI of the user page on Bookmeter `uri`
181
+
182
+ can be retrieved as an array of `User` that has the above.
147
183
 
148
184
  ```ruby
149
185
  following_users[0].name
150
186
  following_users[0].id
151
187
  followers[0].name
152
188
  followers[0].id
189
+ followers[0].uri
153
190
  ```
154
191
 
155
192
  #### Notice
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
1
+ # Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper) [![Gem Version](https://badge.fury.io/rb/bookmeter_scraper.svg)](https://badge.fury.io/rb/bookmeter_scraper)
2
2
 
3
3
  A library for scraping [Bookmeter](http://bookmeter.com).
4
4
 
@@ -34,10 +34,11 @@ require 'bookmeter_scraper'
34
34
 
35
35
  You need to log in to Bookmeter with `Bookmeter.log_in` or `Bookmeter#log_in` to get books and followings / followers information.
36
36
 
37
- There are 2 ways to input authentication information:
37
+ There are 3 ways to input authentication information:
38
38
 
39
39
  1. Passing as arguments
40
40
  2. Writing out to `config.yml`
41
+ 3. Configuring in a block
41
42
 
42
43
  #### 1. Passing as arguments
43
44
 
@@ -71,6 +72,27 @@ bookmeter = BookmeterScraper::Bookmeter.log_in
71
72
  bookmeter.logged_in? # true
72
73
  ```
73
74
 
75
+ #### 3. Configuring in a block
76
+
77
+ You can configure the mail address and password in a block.
78
+
79
+ ```ruby
80
+ bookmeter = BookmeterScraper::Bookmeter.log_in do |configuration|
81
+ configuration.mail = 'example@example.com'
82
+ configuration.password = 'password'
83
+ end
84
+ bookmeter.logged_in? # true
85
+ ```
86
+
87
+ `Bookmeter#log_in` is also available:
88
+
89
+ ```ruby
90
+ bookmeter = BookmeterScraper::Bookmeter.new
91
+ bookmeter.log_in do |configuration|
92
+ configuration.mail = 'example@example.com'
93
+ configuration.password = 'password'
94
+ end
95
+ ```
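The block form above can be sketched in isolation as follows. This is a minimal illustration of the "yield a configuration object" pattern, not the gem's implementation: `SketchConfiguration` and `SketchClient` are hypothetical stand-ins, and no network login is performed.

```ruby
# Hypothetical stand-ins for illustration only; these are not the gem's classes.
class SketchConfiguration
  attr_accessor :mail, :password

  def initialize
    @mail = ''
    @password = ''
  end
end

class SketchClient
  attr_reader :mail, :password

  # Accepts credentials either as arguments or through a block.
  def log_in(mail = nil, password = nil)
    config = SketchConfiguration.new
    if block_given?
      yield config # the caller fills in the fields
    else
      config.mail = mail.to_s
      config.password = password.to_s
    end
    @mail = config.mail
    @password = config.password
    self
  end
end

client = SketchClient.new.log_in do |configuration|
  configuration.mail = 'example@example.com'
  configuration.password = 'password'
end
puts client.mail
```

Yielding a mutable object keeps the method signature stable if more settings are added later, which may be one reason to prefer it over a growing argument list.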
74
96
 
75
97
  ### Get books information
76
98
 
@@ -92,12 +114,20 @@ books = bookmeter.read_books # get read books of the logged in user
92
114
  bookmeter.read_books('01010101') # get read books of a user specified by ID
93
115
  ```
94
116
 
95
- Books information is an array of `Struct` which has `name` and `read_dates` as attributes.
117
+ Books information is an array of `Book` which has these attributes:
118
+
119
+ - `name`
120
+ - `read_dates`
121
+ - `uri`
122
+ - `image_uri`
123
+
96
124
  `read_dates` is an array of finished reading dates (first finished date and reread dates):
97
125
 
98
126
  ```ruby
99
127
  books[0].name
100
128
  books[0].read_dates
129
+ books[0].uri
130
+ books[0].image_uri
101
131
  ```
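In this release `Book` is defined as a `Struct` over `name`, `author`, `read_dates`, `uri` and `image_uri`, and the scraper's `Books#concat` skips entries whose name and author are already present. The sketch below reproduces that idea with a hypothetical `SketchBook` struct and `merge_unique` helper; the titles and URIs are made-up sample values.

```ruby
# Hypothetical struct mirroring the attribute names listed in the diff.
SketchBook = Struct.new(:name, :author, :read_dates, :uri, :image_uri)

# Appends only books whose (name, author) pair is not already present.
def merge_unique(existing, incoming)
  incoming.each do |book|
    next if existing.any? { |b| b.name == book.name && b.author == book.author }
    existing << book
  end
  existing
end

first  = SketchBook.new('Title A', 'Author A', [Time.local(2016, 1, 2)],
                        'http://bookmeter.com/b/1', 'http://example.com/a.jpg')
reread = SketchBook.new('Title A', 'Author A', [Time.local(2016, 2, 3)],
                        'http://bookmeter.com/b/1', 'http://example.com/a.jpg')
other  = SketchBook.new('Title B', 'Author B', [],
                        'http://bookmeter.com/b/2', 'http://example.com/b.jpg')

shelf = merge_unique([first], [reread, other])
puts shelf.map(&:name).inspect
```

Note that a duplicate is dropped as a whole, so its `read_dates` are not merged into the entry that is kept.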
102
132
 
103
133
  To specify year-month for read books, you can use `Bookmeter#read_books_in`:
@@ -135,13 +165,19 @@ followers = bookmeter.followers
135
165
 
136
166
  You need to log in to Bookmeter in advance to get this information.
137
167
 
138
- Users information is an array of `Struct` which has `name` and `id` as attributes.
168
+ Users information is an array of `Struct` which has the following attributes:
169
+
170
+ - `name`
171
+ - `id`
172
+ - `uri`
139
173
 
140
174
  ```ruby
141
175
  following_users[0].name
142
176
  following_users[0].id
177
+ following_users[0].uri
143
178
  followers[0].name
144
179
  followers[0].id
180
+ followers[0].uri
145
181
  ```
146
182
 
147
183
  #### Notice
@@ -1,3 +1,49 @@
1
1
  require 'bookmeter_scraper/bookmeter'
2
2
  require 'bookmeter_scraper/configuration'
3
3
  require 'bookmeter_scraper/version'
4
+
5
+ module BookmeterScraper
6
+ ROOT_URI = 'http://bookmeter.com'.freeze
7
+ LOGIN_URI = "#{ROOT_URI}/login".freeze
8
+
9
+ USER_ID_REGEX = /^\d+$/
10
+
11
+ class << self
12
+ def mypage_uri(user_id)
13
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
14
+ "#{ROOT_URI}/u/#{user_id}"
15
+ end
16
+
17
+ def read_books_uri(user_id)
18
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
19
+ "#{ROOT_URI}/u/#{user_id}/booklist"
20
+ end
21
+
22
+ def reading_books_uri(user_id)
23
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
24
+ "#{ROOT_URI}/u/#{user_id}/booklistnow"
25
+ end
26
+
27
+ def tsundoku_uri(user_id)
28
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
29
+ "#{ROOT_URI}/u/#{user_id}/booklisttun"
30
+ end
31
+
32
+ def wish_list_uri(user_id)
33
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
34
+ "#{ROOT_URI}/u/#{user_id}/booklistpre"
35
+ end
36
+
37
+ def followings_uri(user_id)
38
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
39
+ "#{ROOT_URI}/u/#{user_id}/favorite_user"
40
+ end
41
+
42
+ def followers_uri(user_id)
43
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
44
+ "#{ROOT_URI}/u/#{user_id}/favorited_user"
45
+ end
46
+ end
47
+
48
+ class BookmeterError < StandardError; end
49
+ end
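The URI helpers added above all follow one shape: validate the user ID, then append a path suffix. A compact sketch of that shape is shown below. `SketchUris`, `user_uri` and `STRICT_USER_ID_REGEX` are hypothetical names introduced here. The sketch also uses `\A` and `\z` instead of `^` and `$`, because in Ruby `^` and `$` anchor each line rather than the whole string.

```ruby
# Hypothetical sketch module; it is not part of the gem.
module SketchUris
  ROOT_URI = 'http://bookmeter.com'.freeze
  # \A and \z anchor the whole string; ^ and $ would anchor each line.
  STRICT_USER_ID_REGEX = /\A\d+\z/

  # Builds "<root>/u/<id>" or "<root>/u/<id>/<suffix>" after validating the ID.
  def self.user_uri(user_id, suffix = nil)
    unless user_id.to_s =~ STRICT_USER_ID_REGEX
      raise ArgumentError, "invalid user ID: #{user_id.inspect}"
    end
    [ROOT_URI, 'u', user_id, suffix].compact.join('/')
  end
end

puts SketchUris.user_uri('01010101')
puts SketchUris.user_uri('01010101', 'booklist')

# A multi-line string passes the line-anchored pattern but not the strict one.
multi_line = "123\nabc"
puts multi_line =~ /^\d+$/ ? 'line anchors: accepted' : 'line anchors: rejected'
puts multi_line =~ SketchUris::STRICT_USER_ID_REGEX ? 'strict: accepted' : 'strict: rejected'
```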
@@ -0,0 +1,59 @@
1
+ require 'forwardable'
2
+
3
+ module BookmeterScraper
4
+ class Agent
5
+ extend Forwardable
6
+ def_delegator :@agent, :get
7
+ def_delegator :@agent, :click
8
+
9
+ attr_reader :log_in_user_id
10
+
11
+
12
+ def initialize
13
+ @agent = Mechanize.new do |a|
14
+ a.user_agent_alias = Mechanize::AGENT_ALIASES.keys.reject do |ua_alias|
15
+ %w(Android iPad iPhone Mechanize).include?(ua_alias)
16
+ end.sample
17
+ end
18
+ @log_in_user_id = nil
19
+ end
20
+
21
+ def log_in(config)
22
+ raise ArgumentError if config.nil?
23
+
24
+ page_after_submitting_form = nil
25
+ @agent.get(BookmeterScraper::LOGIN_URI) do |page|
26
+ page_after_submitting_form = page.form_with(action: '/login') do |form|
27
+ form.field_with(name: 'mail').value = config.mail
28
+ form.field_with(name: 'password').value = config.password
29
+ end.submit
30
+ end
31
+
32
+ if page_after_logging_in? page_after_submitting_form
33
+ mypage = page_after_submitting_form.link_with(text: 'マイページ').click
34
+ @log_in_user_id = extract_user_id(mypage)
35
+ else
36
+ nil
37
+ end
38
+ end
39
+
40
+ def logged_in?
41
+ !@log_in_user_id.nil?
42
+ end
43
+
44
+
45
+ private
46
+
47
+ def page_after_logging_in?(page)
48
+ raise ArgumentError if page.nil?
49
+
50
+ page.uri.to_s == BookmeterScraper::ROOT_URI + '/'
51
+ end
52
+
53
+ def extract_user_id(page)
54
+ raise ArgumentError if page.nil?
55
+
56
+ page.uri.to_s.match(/\/u\/(\d+)$/)[1]
57
+ end
58
+ end
59
+ end
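The new `Agent` wraps an HTTP client, delegates `get` and `click` to it, and remembers the logged-in user ID taken from the "my page" URI. The following sketch shows the same delegation idea with only the standard library. `FakeHttpClient`, `SketchAgent` and `remember_user_id` are hypothetical, and the fake client stands in for Mechanize. Unlike `extract_user_id` in the diff, the sketch returns `nil` when the URI has no ID instead of calling `[1]` on a failed match.

```ruby
require 'forwardable'

# Hypothetical stand-in for an HTTP client such as Mechanize.
class FakeHttpClient
  def get(uri)
    "GET #{uri}"
  end
end

class SketchAgent
  extend Forwardable
  def_delegator :@client, :get # SketchAgent#get forwards to the wrapped client

  attr_reader :log_in_user_id

  def initialize(client = FakeHttpClient.new)
    @client = client
    @log_in_user_id = nil
  end

  # Pulls the numeric ID out of a "my page" URI such as ".../u/123".
  def remember_user_id(mypage_uri)
    match = mypage_uri.to_s.match(%r{/u/(\d+)\z})
    @log_in_user_id = match && match[1]
  end

  def logged_in?
    !@log_in_user_id.nil?
  end
end

agent = SketchAgent.new
puts agent.get('http://bookmeter.com/login')
agent.remember_user_id('http://bookmeter.com/u/01010101')
puts agent.logged_in?
```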
@@ -1,130 +1,51 @@
1
- require 'forwardable'
2
- require 'mechanize'
3
- require 'yasuri'
1
+ require 'bookmeter_scraper/agent'
2
+ require 'bookmeter_scraper/scraper'
4
3
 
5
4
  module BookmeterScraper
6
5
  class Bookmeter
7
6
  DEFAULT_CONFIG_PATH = './config.yml'.freeze
8
7
 
9
- ROOT_URI = 'http://bookmeter.com'.freeze
10
- LOGIN_URI = "#{ROOT_URI}/login".freeze
11
-
12
- PROFILE_ATTRIBUTES = %i(name gender age blood_type job address url description first_day elapsed_days read_books_count read_pages_count reviews_count bookshelfs_count)
13
- Profile = Struct.new(*PROFILE_ATTRIBUTES)
14
-
15
- BOOK_ATTRIBUTES = %i(name author read_dates)
16
- Book = Struct.new(*BOOK_ATTRIBUTES)
17
- class Books
18
- extend Forwardable
19
-
20
- def_delegator :@books, :[]
21
- def_delegator :@books, :[]=
22
- def_delegator :@books, :<<
23
- def_delegator :@books, :each
24
- def_delegator :@books, :flatten!
25
-
26
- def initialize; @books = []; end
27
-
28
- def concat(books)
29
- books.each do |book|
30
- next if @books.any? { |b| b.name == book.name && b.author == book.author }
31
- @books << book
32
- end
33
- end
34
-
35
- def to_a; @books; end
36
- end
37
-
38
- USER_ATTRIBUTES = %i(name id)
39
- User = Struct.new(*USER_ATTRIBUTES)
40
-
41
- JP_ATTRIBUTE_NAMES = {
42
- gender: '性別',
43
- age: '年齢',
44
- blood_type: '血液型',
45
- job: '職業',
46
- address: '現住所',
47
- url: 'URL / ブログ',
48
- description: '自己紹介',
49
- first_day: '記録初日',
50
- elapsed_days: '経過日数',
51
- read_books_count: '読んだ本',
52
- read_pages_count: '読んだページ',
53
- reviews_count: '感想/レビュー',
54
- bookshelfs_count: '本棚',
55
- }
56
-
57
- NUM_BOOKS_PER_PAGE = 40
58
- NUM_USERS_PER_PAGE = 20
59
-
60
8
  attr_reader :log_in_user_id
61
9
 
62
- def self.mypage_uri(user_id)
63
- raise ArgumentError unless user_id =~ /^\d+$/
64
- "#{ROOT_URI}/u/#{user_id}"
65
- end
66
-
67
- def self.read_books_uri(user_id)
68
- raise ArgumentError unless user_id =~ /^\d+$/
69
- "#{ROOT_URI}/u/#{user_id}/booklist"
70
- end
71
-
72
- def self.reading_books_uri(user_id)
73
- raise ArgumentError unless user_id =~ /^\d+$/
74
- "#{ROOT_URI}/u/#{user_id}/booklistnow"
75
- end
76
-
77
- def self.tsundoku_uri(user_id)
78
- raise ArgumentError unless user_id =~ /^\d+$/
79
- "#{ROOT_URI}/u/#{user_id}/booklisttun"
80
- end
81
-
82
- def self.wish_list_uri(user_id)
83
- raise ArgumentError unless user_id =~ /^\d+$/
84
- "#{ROOT_URI}/u/#{user_id}/booklistpre"
85
- end
86
-
87
- def self.followings_uri(user_id)
88
- raise ArgumentError unless user_id =~ /^\d+$/
89
- "#{ROOT_URI}/u/#{user_id}/favorite_user"
90
- end
91
-
92
- def self.followers_uri(user_id)
93
- raise ArgumentError unless user_id =~ /^\d+$/
94
- "#{ROOT_URI}/u/#{user_id}/favorited_user"
95
- end
96
10
 
97
- def self.log_in(mail = nil, password = nil)
98
- Bookmeter.new.tap do |bookmeter|
99
- bookmeter.log_in(mail, password)
11
+ class << self
12
+ def log_in(mail = nil, password = nil)
13
+ Bookmeter.new.tap do |bookmeter|
14
+ if block_given?
15
+ config = Configuration.new
16
+ yield config
17
+ bookmeter.log_in(config.mail, config.password)
18
+ else
19
+ bookmeter.log_in(mail, password)
20
+ end
21
+ end
100
22
  end
101
23
  end
102
24
 
103
25
 
104
26
  def initialize(agent = nil)
105
- @agent = agent.nil? ? Bookmeter.new_agent : agent
106
- @logged_in = false
27
+ @agent = agent.nil? ? Agent.new : agent
28
+ @scraper = Scraper.new(@agent)
29
+ @logged_in = false
107
30
  @log_in_user_id = nil
108
- @book_pages = {}
109
31
  end
110
32
 
111
33
  def log_in(mail = nil, password = nil)
112
34
  raise BookmeterError if @agent.nil?
113
35
 
114
- config = Configuration.new(DEFAULT_CONFIG_PATH) if mail.nil? && password.nil?
115
-
116
- page_after_submitting_form = nil
117
- page = @agent.get(LOGIN_URI) do |page|
118
- page_after_submitting_form = page.form_with(action: '/login') do |form|
119
- form.field_with(name: 'mail').value = config ? config.mail : mail
120
- form.field_with(name: 'password').value = config ? config.password : password.to_s
121
- end.submit
122
- end
123
- @logged_in = page_after_submitting_form.uri.to_s == ROOT_URI + '/'
124
- return unless logged_in?
36
+ configuration = if block_given?
37
+ Configuration.new.tap { |config| yield config }
38
+ elsif mail.nil? && password.nil?
39
+ Configuration.new(DEFAULT_CONFIG_PATH)
40
+ else
41
+ Configuration.new.tap do |config|
42
+ config.mail = mail
43
+ config.password = password
44
+ end
45
+ end
125
46
 
126
- mypage = page_after_submitting_form.link_with(text: 'マイページ').click
127
- @log_in_user_id = extract_user_id(mypage)
47
+ @log_in_user_id = @agent.log_in(configuration)
48
+ @logged_in = !@log_in_user_id.nil?
128
49
  end
129
50
 
130
51
  def logged_in?
@@ -132,321 +53,59 @@ module BookmeterScraper
132
53
  end
133
54
 
134
55
  def profile(user_id)
135
- raise ArgumentError unless user_id =~ /^\d+$/
136
-
137
- mypage = @agent.get(Bookmeter.mypage_uri(user_id))
138
-
139
- profile_dl_tags = mypage.search('#side_left > div.inner > div.profile > dl')
140
- jp_attribute_names = profile_dl_tags.map { |i| i.children[0].children.text }
141
- attribute_values = profile_dl_tags.map { |i| i.children[1].children.text }
142
- jp_attributes = Hash[jp_attribute_names.zip(attribute_values)]
143
- attributes = PROFILE_ATTRIBUTES.map do |attribute|
144
- jp_attributes[JP_ATTRIBUTE_NAMES[attribute]]
145
- end
146
- attributes[0] = mypage.at_css('#side_left > div.inner > h3').text
147
-
148
- Profile.new(*attributes)
56
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
57
+ @scraper.fetch_profile(user_id)
149
58
  end
150
59
 
151
60
  def read_books(user_id = @log_in_user_id)
152
- books = get_books(user_id, :read_books_uri)
153
- books.each { |b| yield b } if block_given?
154
- books.to_a
61
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
62
+ fetch_books(user_id, :read_books_uri)
155
63
  end
156
64
 
157
65
  def read_books_in(year, month, user_id = @log_in_user_id)
66
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
67
+
158
68
  date = Time.local(year, month)
159
- books = get_read_books(user_id, date)
69
+ books = @scraper.fetch_read_books(user_id, date)
160
70
  books.each { |b| yield b } if block_given?
161
71
  books.to_a
162
72
  end
163
73
 
164
74
  def reading_books(user_id = @log_in_user_id)
165
- books = get_books(user_id, :reading_books_uri)
166
- books.each { |b| yield b } if block_given?
167
- books.to_a
75
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
76
+ fetch_books(user_id, :reading_books_uri)
168
77
  end
169
78
 
170
79
  def tsundoku(user_id = @log_in_user_id)
171
- books = get_books(user_id, :tsundoku_uri)
172
- books.each { |b| yield b } if block_given?
173
- books.to_a
80
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
81
+ fetch_books(user_id, :tsundoku_uri)
174
82
  end
175
83
 
176
84
  def wish_list(user_id = @log_in_user_id)
177
- books = get_books(user_id, :wish_list_uri)
178
- books.each { |b| yield b } if block_given?
179
- books.to_a
85
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
86
+ fetch_books(user_id, :wish_list_uri)
180
87
  end
181
88
 
182
89
  def followings(user_id = @log_in_user_id)
183
- users = get_followings(user_id)
90
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
91
+ @scraper.fetch_followings(user_id)
184
92
  end
185
93
 
186
94
  def followers(user_id = @log_in_user_id)
187
- users = get_followers(user_id)
188
- end
189
-
190
- private
191
-
192
- def self.new_agent
193
- agent = Mechanize.new do |a|
194
- a.user_agent_alias = Mechanize::AGENT_ALIASES.keys.reject do |ua_alias|
195
- %w(Android iPad iPhone Mechanize).include?(ua_alias)
196
- end.sample
197
- end
198
- end
199
-
200
- def extract_user_id(page)
201
- page.uri.to_s.match(/\/u\/(\d+)$/)[1]
202
- end
203
-
204
- def get_books(user_id, uri_method)
205
- books = Books.new
206
- scraped_pages = scrape_book_pages(user_id, uri_method)
207
- scraped_pages.each do |page|
208
- books << get_book_structs(page)
209
- books.flatten!
210
- end
211
- books
95
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
96
+ @scraper.fetch_followers(user_id)
212
97
  end
213
98
 
214
- def get_read_books(user_id, target_ym)
215
- result = Books.new
216
- scrape_book_pages(user_id, :read_books_uri).each do |page|
217
- first_book_date = get_read_date(page['book_1_link'])
218
- last_book_date = get_last_book_date(page)
219
-
220
- first_book_ym = Time.local(first_book_date['year'].to_i, first_book_date['month'].to_i)
221
- last_book_ym = Time.local(last_book_date['year'].to_i, last_book_date['month'].to_i)
222
-
223
- if target_ym < last_book_ym
224
- next
225
- elsif target_ym == first_book_ym && target_ym > last_book_ym
226
- result.concat(get_target_books(target_ym, page))
227
- break
228
- elsif target_ym < first_book_ym && target_ym > last_book_ym
229
- result.concat(get_target_books(target_ym, page))
230
- break
231
- elsif target_ym <= first_book_ym && target_ym >= last_book_ym
232
- result.concat(get_target_books(target_ym, page))
233
- elsif target_ym > first_book_ym
234
- break
235
- end
236
- end
237
- result
238
- end
239
-
240
- def get_last_book_date(page)
241
- NUM_BOOKS_PER_PAGE.downto(1) do |i|
242
- link = page["book_#{i}_link"]
243
- next if link.empty?
244
- return get_read_date(link)
245
- end
246
- end
247
-
248
- def get_target_books(target_ym, page)
249
- target_books = Books.new
250
-
251
- 1.upto(NUM_BOOKS_PER_PAGE) do |i|
252
- next if page["book_#{i}_link"].empty?
253
-
254
- read_yms = []
255
- read_date = get_read_date(page["book_#{i}_link"])
256
- read_dates = [Time.local(read_date['year'], read_date['month'], read_date['day'])]
257
- read_yms << Time.local(read_date['year'], read_date['month'])
258
-
259
- reread_dates = []
260
- reread_dates << get_reread_date(page["book_#{i}_link"])
261
- reread_dates.flatten!
262
-
263
- unless reread_dates.empty?
264
- reread_dates.each do |date|
265
- read_yms << Time.local(date['reread_year'], date['reread_month'])
266
- end
267
- end
268
99
 
269
- next unless read_yms.include?(target_ym)
270
-
271
- unless reread_dates.empty?
272
- reread_dates.each do |date|
273
- read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
274
- end
275
- end
276
- book_name = get_book_name(page["book_#{i}_link"])
277
- book_author = get_book_author(page["book_#{i}_link"])
278
- book = Book.new(book_name, book_author, read_dates)
279
- target_books << book
280
- end
281
-
282
- target_books
283
- end
284
-
285
- def scrape_book_pages(user_id, uri_method)
286
- raise ArgumentError unless user_id =~ /^\d+$/
287
- raise ArgumentError unless Bookmeter.methods.include?(uri_method)
288
- return [] unless logged_in?
289
-
290
- books_page = @agent.get(Bookmeter.method(uri_method).call(user_id))
291
-
292
- # if books are not found at all
293
- return [] if books_page.search('#main_left > div > center > a').empty?
294
-
295
- if books_page.search('span.now_page').empty?
296
- books_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
297
- 1.upto(NUM_BOOKS_PER_PAGE) do |i|
298
- send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
299
- send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
300
- end
301
- end
302
- return [books_root.inject(@agent, books_page)]
303
- end
304
-
305
- books_root = Yasuri.pages_root '//span[@class="now_page"]/following-sibling::span[1]/a' do
306
- text_page_index '//span[@class="now_page"]/a'
307
- 1.upto(NUM_BOOKS_PER_PAGE) do |i|
308
- send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
309
- send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
310
- end
311
- end
312
- books_root.inject(@agent, books_page)
313
- end
314
-
315
- def get_book_page(book_uri)
316
- @book_pages[book_uri] = @agent.get(ROOT_URI + book_uri) unless @book_pages[book_uri]
317
- @book_pages[book_uri]
318
- end
319
-
320
- def get_book_name(book_uri)
321
- get_book_page(book_uri).search('#title').text
322
- end
323
-
324
- def get_book_author(book_uri)
325
- get_book_page(book_uri).search('#author_name').text
326
- end
327
-
328
- def get_read_date(book_uri)
329
- book_date = Yasuri.struct_date '//*[@id="book_edit_area"]/form[1]/div[2]' do
330
- text_year '//*[@id="read_date_y"]/option[1]', truncate: /\d+/, proc: :to_i
331
- text_month '//*[@id="read_date_m"]/option[1]', truncate: /\d+/, proc: :to_i
332
- text_day '//*[@id="read_date_d"]/option[1]', truncate: /\d+/, proc: :to_i
333
- end
334
- book_date.inject(@agent, get_book_page(book_uri))
335
- end
336
-
337
- def get_reread_date(book_uri)
338
- book_reread_date = Yasuri.struct_reread_date '//*[@id="book_edit_area"]/div/form[1]/div[2]' do
339
- text_reread_year '//div[@class="reread_box"]/form[1]/div[2]/select[1]/option[1]', truncate: /\d+/, proc: :to_i
340
- text_reread_month '//div[@class="reread_box"]/form[1]/div[2]/select[2]/option[1]', truncate: /\d+/, proc: :to_i
341
- text_reread_day '//div[@class="reread_box"]/form[1]/div[2]/select[3]/option[1]', truncate: /\d+/, proc: :to_i
342
- end
343
- book_reread_date.inject(@agent, get_book_page(book_uri))
344
- end
345
-
346
- def get_book_structs(page)
347
- books = []
348
-
349
- 1.upto(NUM_BOOKS_PER_PAGE) do |i|
350
- break if page["book_#{i}_link"].empty?
351
-
352
- read_dates = []
353
- read_date = get_read_date(page["book_#{i}_link"])
354
- unless read_date.empty?
355
- read_dates << Time.local(read_date['year'], read_date['month'], read_date['day'])
356
- end
357
-
358
- reread_dates = []
359
- reread_dates << get_reread_date(page["book_#{i}_link"])
360
- reread_dates.flatten!
361
-
362
- unless reread_dates.empty?
363
- reread_dates.each do |date|
364
- read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
365
- end
366
- end
367
-
368
- book_name = get_book_name(page["book_#{i}_link"])
369
- book_author = get_book_author(page["book_#{i}_link"])
370
- book = Book.new(book_name, book_author, read_dates)
371
- books << book
372
- end
373
-
374
- books
375
- end
376
-
377
- def get_followings(user_id)
378
- users = []
379
- scraped_pages = user_id == @log_in_user_id ? scrape_followings_page(user_id)
380
- : scrape_others_followings_page(user_id)
381
- scraped_pages.each do |page|
382
- users << get_user_structs(page)
383
- users.flatten!
384
- end
385
- users
386
- end
387
-
388
- def get_followers(user_id)
389
- users = []
390
- scraped_pages = scrape_followers_page(user_id)
391
- scraped_pages.each do |page|
392
- users << get_user_structs(page)
393
- users.flatten!
394
- end
395
- users
396
- end
397
-
398
- def get_user_structs(page)
399
- users = []
400
-
401
- 1.upto(NUM_USERS_PER_PAGE) do |i|
402
- break if page["user_#{i}_name"].empty?
403
-
404
- user_name = page["user_#{i}_name"]
405
- user_id = page["user_#{i}_link"].match(/\/u\/(\d+)$/)[1]
406
- user = User.new(user_name, user_id)
407
- users << user
408
- end
409
-
410
- users
411
- end
412
-
413
- def scrape_followings_page(user_id)
414
- raise ArgumentError unless user_id =~ /^\d+$/
415
- return [] unless logged_in?
416
-
417
- followings_page = @agent.get(Bookmeter.followings_uri(user_id))
418
- followings_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
419
- 1.upto(NUM_USERS_PER_PAGE) do |i|
420
- send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@title")
421
- send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@href")
422
- end
423
- end
424
- [followings_root.inject(@agent, followings_page)]
425
- end
426
-
427
- def scrape_others_followings_page(user_id)
428
- scrape_users_listing_page(user_id, :followings_uri)
429
- end
430
-
431
- def scrape_followers_page(user_id)
432
- scrape_users_listing_page(user_id, :followers_uri)
433
- end
100
+ private
434
101
 
435
- def scrape_users_listing_page(user_id, uri_method)
436
- raise ArgumentError unless user_id =~ /^\d+$/
437
- raise ArgumentError unless Bookmeter.methods.include?(uri_method)
438
- return [] unless logged_in?
102
+ def fetch_books(user_id, uri_method)
103
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
104
+ raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
439
105
 
440
- page = @agent.get(Bookmeter.method(uri_method).call(user_id))
441
- root = Yasuri.struct_users '//*[@id="main_left"]/div' do
442
- 1.upto(NUM_USERS_PER_PAGE) do |i|
443
- send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@title")
444
- send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@href")
445
- end
446
- end
447
- [root.inject(@agent, page)]
106
+ books = @scraper.fetch_books(user_id, uri_method)
107
+ books.each { |book| yield book } if block_given?
108
+ books.to_a
448
109
  end
449
110
  end
450
-
451
- class BookmeterError < StandardError; end
452
111
  end
@@ -1,11 +1,14 @@
1
- require 'yaml'
2
-
3
1
  module BookmeterScraper
4
2
  class Configuration
5
- attr_reader :mail, :password
3
+ attr_accessor :mail, :password
4
+
5
+ def initialize(config_file = nil)
6
+ if config_file.nil?
7
+ @mail = @password = ''
8
+ return
9
+ end
6
10
 
7
- def initialize(config_file)
8
- config = YAML.load_file(config_file)
11
+ config = load_yaml_file(config_file)
9
12
  unless config.has_key?('mail') && config.has_key?('password')
10
13
  raise ConfigurationError, "#{config_file}: Invalid configuration file"
11
14
  end
@@ -13,6 +16,14 @@ module BookmeterScraper
13
16
  @mail = config['mail']
14
17
  @password = config['password']
15
18
  end
19
+
20
+
21
+ private
22
+
23
+ def load_yaml_file(config_file)
24
+ require 'yaml'
25
+ YAML.load_file(config_file)
26
+ end
16
27
  end
17
28
 
18
29
  class ConfigurationError < StandardError; end
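The reworked `Configuration` has two modes: with no file it starts with empty credentials that can be assigned later, and with a file it loads and validates YAML, requiring `yaml` only at that point. A self-contained sketch of that behavior follows. `SketchConfiguration` and `SketchConfigurationError` are hypothetical names, the temporary file exists only for the demonstration, and the `is_a?(Hash)` guard is an addition that the diff does not have.

```ruby
require 'tempfile'

class SketchConfigurationError < StandardError; end

# Hypothetical sketch of a configuration object with an optional YAML source.
class SketchConfiguration
  attr_accessor :mail, :password

  def initialize(config_file = nil)
    @mail = ''
    @password = ''
    return if config_file.nil?

    config = load_yaml_file(config_file)
    unless config.is_a?(Hash) && config.key?('mail') && config.key?('password')
      raise SketchConfigurationError, "#{config_file}: Invalid configuration file"
    end
    @mail = config['mail']
    @password = config['password']
  end

  private

  def load_yaml_file(config_file)
    require 'yaml' # loaded only when a file is actually read
    YAML.load_file(config_file)
  end
end

Tempfile.create(['config', '.yml']) do |file|
  file.write("mail: example@example.com\npassword: password\n")
  file.flush
  config = SketchConfiguration.new(file.path)
  puts config.mail
end
```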
@@ -0,0 +1,388 @@
1
+ require 'forwardable'
2
+ require 'mechanize'
3
+ require 'yasuri'
4
+
5
+ module BookmeterScraper
6
+ class Scraper
7
+ PROFILE_ATTRIBUTES = %i(
8
+ name
9
+ gender
10
+ age
11
+ blood_type
12
+ job
13
+ address
14
+ url
15
+ description
16
+ first_day
17
+ elapsed_days
18
+ read_books_count
19
+ read_pages_count
20
+ reviews_count
21
+ bookshelfs_count
22
+ )
23
+ Profile = Struct.new(*PROFILE_ATTRIBUTES)
24
+
25
+ JP_ATTRIBUTE_NAMES = {
26
+ gender: '性別',
27
+ age: '年齢',
28
+ blood_type: '血液型',
29
+ job: '職業',
30
+ address: '現住所',
31
+ url: 'URL / ブログ',
32
+ description: '自己紹介',
33
+ first_day: '記録初日',
34
+ elapsed_days: '経過日数',
35
+ read_books_count: '読んだ本',
36
+ read_pages_count: '読んだページ',
37
+ reviews_count: '感想/レビュー',
38
+ bookshelfs_count: '本棚',
39
+ }
40
+
41
+ BOOK_ATTRIBUTES = %i(name author read_dates uri image_uri)
42
+ Book = Struct.new(*BOOK_ATTRIBUTES)
43
+ class Books
44
+ extend Forwardable
45
+
46
+ def_delegator :@books, :[]
47
+ def_delegator :@books, :[]=
48
+ def_delegator :@books, :<<
49
+ def_delegator :@books, :each
50
+ def_delegator :@books, :flatten!
51
+ def_delegator :@books, :empty?
52
+
53
+ def initialize; @books = []; end
54
+
55
+ def concat(books)
56
+ books.each do |book|
57
+ next if @books.any? { |b| b.name == book.name && b.author == book.author }
58
+ @books << book
59
+ end
60
+ end
61
+
62
+ def to_a; @books; end
63
+ end
64
+
65
+ USER_ATTRIBUTES = %i(name id uri)
66
+ User = Struct.new(*USER_ATTRIBUTES)
67
+
68
+ NUM_BOOKS_PER_PAGE = 40
69
+ NUM_USERS_PER_PAGE = 20
70
+
71
+ attr_accessor :agent
72
+
73
+
74
+ def initialize(agent = nil)
75
+ @agent = agent
76
+ @book_pages = {}
77
+ end
78
+
79
+ def fetch_profile(user_id, agent = @agent)
80
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
81
+ raise ScraperError if agent.nil?
82
+
83
+ Profile.new(*scrape_profile(user_id, agent))
84
+ end
85
+
86
+ def scrape_profile(user_id, agent)
87
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
88
+ raise ScraperError if agent.nil?
89
+
90
+ mypage = agent.get(BookmeterScraper.mypage_uri(user_id))
91
+
92
+ profile_dl_tags = mypage.search('#side_left > div.inner > div.profile > dl')
93
+ jp_attribute_names = profile_dl_tags.map { |i| i.children[0].children.text }
94
+ attribute_values = profile_dl_tags.map { |i| i.children[1].children.text }
95
+ jp_attributes = Hash[jp_attribute_names.zip(attribute_values)]
96
+
97
+ attributes = PROFILE_ATTRIBUTES.map do |attribute|
98
+ jp_attributes[JP_ATTRIBUTE_NAMES[attribute]]
99
+ end
100
+ attributes[0] = mypage.at_css('#side_left > div.inner > h3').text
101
+
102
+ attributes
103
+ end
104
+
105
+ def fetch_books(user_id, uri_method, agent = @agent)
106
+ raise ArgumentError unless user_id =~ USER_ID_REGEX
107
+ raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
108
+ raise ScraperError if agent.nil?
109
+ return [] unless agent.logged_in?
110
+
111
+ books = Books.new
112
+ scraped_pages = scrape_books_pages(user_id, uri_method, agent)
113
+ scraped_pages.each do |page|
114
+ books << extract_books(page)
115
+ books.flatten!
116
+ end
117
+ books
118
+ end
+
+     def scrape_books_pages(user_id, uri_method, agent = @agent)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
+       raise ScraperError if agent.nil?
+       return [] unless agent.logged_in?
+
+       books_page = agent.get(BookmeterScraper.method(uri_method).call(user_id))
+
+       # if books are not found at all
+       return [] if books_page.search('#main_left > div > center > a').empty?
+
+       if books_page.search('span.now_page').empty?
+         books_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
+           1.upto(NUM_BOOKS_PER_PAGE) do |i|
+             send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
+             send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
+           end
+         end
+         return [books_root.inject(agent, books_page)]
+       end
+
+       books_root = Yasuri.pages_root '//span[@class="now_page"]/following-sibling::span[1]/a' do
+         text_page_index '//span[@class="now_page"]/a'
+         1.upto(NUM_BOOKS_PER_PAGE) do |i|
+           send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
+           send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
+         end
+       end
+       books_root.inject(agent, books_page)
+     end
+
+     def extract_books(page)
+       raise ArgumentError if page.nil?
+
+       books = []
+       1.upto(NUM_BOOKS_PER_PAGE) do |i|
+         break if page["book_#{i}_link"].empty?
+
+         read_dates = []
+         read_date = scrape_read_date(page["book_#{i}_link"])
+         unless read_date.empty?
+           read_dates << Time.local(read_date['year'], read_date['month'], read_date['day'])
+         end
+
+         reread_dates = []
+         reread_dates << scrape_reread_date(page["book_#{i}_link"])
+         reread_dates.flatten!
+
+         unless reread_dates.empty?
+           reread_dates.each do |date|
+             read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
+           end
+         end
+
+         book_path = page["book_#{i}_link"]
+         book_name = scrape_book_name(book_path)
+         book_author = scrape_book_author(book_path)
+         book_image_uri = scrape_book_image_uri(book_path)
+         book = Book.new(book_name,
+                         book_author,
+                         read_dates,
+                         ROOT_URI + book_path,
+                         book_image_uri)
+         books << book
+       end
+
+       books
+     end
+
+     def fetch_read_books(user_id, target_year_month)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       raise ArgumentError if target_year_month.nil?
+
+       result = Books.new
+       scrape_books_pages(user_id, :read_books_uri).each do |page|
+         first_book_date = scrape_read_date(page['book_1_link'])
+         last_book_date = get_last_book_date(page)
+
+         first_book_year_month = Time.local(first_book_date['year'].to_i, first_book_date['month'].to_i)
+         last_book_year_month = Time.local(last_book_date['year'].to_i, last_book_date['month'].to_i)
+
+         if target_year_month < last_book_year_month
+           next
+         elsif target_year_month == first_book_year_month && target_year_month > last_book_year_month
+           result.concat(fetch_target_books(target_year_month, page))
+           break
+         elsif target_year_month < first_book_year_month && target_year_month > last_book_year_month
+           result.concat(fetch_target_books(target_year_month, page))
+           break
+         elsif target_year_month <= first_book_year_month && target_year_month >= last_book_year_month
+           result.concat(fetch_target_books(target_year_month, page))
+         elsif target_year_month > first_book_year_month
+           break
+         end
+       end
+       result
+     end
+
+     def get_last_book_date(page)
+       raise ArgumentError if page.nil?
+
+       NUM_BOOKS_PER_PAGE.downto(1) do |i|
+         link = page["book_#{i}_link"]
+         next if link.empty?
+         return scrape_read_date(link)
+       end
+     end
+
+     def fetch_target_books(target_year_month, page)
+       raise ArgumentError if target_year_month.nil?
+       raise ArgumentError if page.nil?
+
+       target_books = Books.new
+       1.upto(NUM_BOOKS_PER_PAGE) do |i|
+         next if page["book_#{i}_link"].empty?
+
+         read_year_months = []
+         read_date = scrape_read_date(page["book_#{i}_link"])
+         read_dates = [Time.local(read_date['year'], read_date['month'], read_date['day'])]
+         read_year_months << Time.local(read_date['year'], read_date['month'])
+
+         reread_dates = []
+         reread_dates << scrape_reread_date(page["book_#{i}_link"])
+         reread_dates.flatten!
+
+         unless reread_dates.empty?
+           reread_dates.each do |date|
+             read_year_months << Time.local(date['reread_year'], date['reread_month'])
+           end
+         end
+
+         next unless read_year_months.include?(target_year_month)
+
+         unless reread_dates.empty?
+           reread_dates.each do |date|
+             read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
+           end
+         end
+         book_path = page["book_#{i}_link"]
+         book_name = scrape_book_name(book_path)
+         book_author = scrape_book_author(book_path)
+         book_image_uri = scrape_book_image_uri(book_path)
+         target_books << Book.new(book_name, book_author, read_dates, ROOT_URI + book_path, book_image_uri)
+       end
+
+       target_books
+     end
+
+     def get_book_page(book_uri, agent = @agent)
+       @book_pages[book_uri] = agent.get(ROOT_URI + book_uri) unless @book_pages[book_uri]
+       @book_pages[book_uri]
+     end
+
+     def scrape_book_name(book_uri)
+       get_book_page(book_uri).search('#title').text
+     end
+
+     def scrape_book_author(book_uri)
+       get_book_page(book_uri).search('#author_name').text
+     end
+
+     def scrape_book_image_uri(book_uri)
+       get_book_page(book_uri).search('//*[@id="book_image"]/@src').text
+     end
+
+     def scrape_read_date(book_uri, agent = @agent)
+       book_date = Yasuri.struct_date '//*[@id="book_edit_area"]/form[1]/div[2]' do
+         text_year '//*[@id="read_date_y"]/option[1]', truncate: /\d+/, proc: :to_i
+         text_month '//*[@id="read_date_m"]/option[1]', truncate: /\d+/, proc: :to_i
+         text_day '//*[@id="read_date_d"]/option[1]', truncate: /\d+/, proc: :to_i
+       end
+       book_date.inject(agent, get_book_page(book_uri))
+     end
+
+     def scrape_reread_date(book_uri, agent = @agent)
+       book_reread_date = Yasuri.struct_reread_date '//*[@id="book_edit_area"]/div/form[1]/div[2]' do
+         text_reread_year '//div[@class="reread_box"]/form[1]/div[2]/select[1]/option[1]', truncate: /\d+/, proc: :to_i
+         text_reread_month '//div[@class="reread_box"]/form[1]/div[2]/select[2]/option[1]', truncate: /\d+/, proc: :to_i
+         text_reread_day '//div[@class="reread_box"]/form[1]/div[2]/select[3]/option[1]', truncate: /\d+/, proc: :to_i
+       end
+       book_reread_date.inject(agent, get_book_page(book_uri))
+     end
+
+     def fetch_followings(user_id, agent = @agent)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       raise ScraperError if agent.nil?
+       return [] unless agent.logged_in?
+
+       users = []
+       scraped_pages = user_id == agent.log_in_user_id ? scrape_followings_page(user_id)
+                                                       : scrape_others_followings_page(user_id)
+       scraped_pages.each do |page|
+         users << extract_users(page)
+         users.flatten!
+       end
+       users
+     end
+
+     def fetch_followers(user_id, agent = @agent)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       raise ScraperError if agent.nil?
+       return [] unless agent.logged_in?
+
+       users = []
+       scraped_pages = scrape_followers_page(user_id)
+       scraped_pages.each do |page|
+         users << extract_users(page)
+         users.flatten!
+       end
+       users
+     end
+
+     def scrape_followings_page(user_id, agent = @agent)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       return [] unless agent.logged_in?
+
+       followings_page = agent.get(BookmeterScraper.followings_uri(user_id))
+       followings_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
+         1.upto(NUM_USERS_PER_PAGE) do |i|
+           send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@title")
+           send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@href")
+         end
+       end
+       [followings_root.inject(agent, followings_page)]
+     end
+
+     def scrape_others_followings_page(user_id)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       scrape_users_listing_page(user_id, :followings_uri)
+     end
+
+     def scrape_followers_page(user_id)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       scrape_users_listing_page(user_id, :followers_uri)
+     end
+
+     def scrape_users_listing_page(user_id, uri_method, agent = @agent)
+       raise ArgumentError unless user_id =~ USER_ID_REGEX
+       raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
+       return [] unless agent.logged_in?
+
+       page = agent.get(BookmeterScraper.method(uri_method).call(user_id))
+       root = Yasuri.struct_users '//*[@id="main_left"]/div' do
+         1.upto(NUM_USERS_PER_PAGE) do |i|
+           send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@title")
+           send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@href")
+         end
+       end
+       [root.inject(agent, page)]
+     end
+
+     def extract_users(page)
+       raise ArgumentError if page.nil?
+
+       users = []
+       1.upto(NUM_USERS_PER_PAGE) do |i|
+         break if page["user_#{i}_name"].empty?
+
+         user_name = page["user_#{i}_name"]
+         user_id = page["user_#{i}_link"].match(/\/u\/(\d+)$/)[1]
+         users << User.new(user_name, user_id, ROOT_URI + "/u/#{user_id}")
+       end
+
+       users
+     end
+   end
+
+   class ScraperError < StandardError; end
+ end
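Note on the page-selection loop in `fetch_read_books`: its branches appear to assume that the read-books listing is ordered newest first, so a page can be skipped while it is entirely newer than the target month, and scanning can stop once a page reaches past it. The following is a minimal, self-contained sketch of that windowing idea under that assumption. It is not code from the gem: `month_of`, `pages_for_month`, and `PAGES` are hypothetical names, and each page is modelled as a plain array of read dates rather than a scraped Yasuri result.

```ruby
# Hypothetical helper: truncate a Time to the first instant of its month.
def month_of(time)
  Time.local(time.year, time.month)
end

# Hypothetical sketch: return the indices of the pages that may contain
# books read in target_month. Each page is an array of Time objects,
# and both pages and dates are assumed to be ordered newest first.
def pages_for_month(pages, target_month)
  selected = []
  pages.each_with_index do |dates, index|
    newest = month_of(dates.first)
    oldest = month_of(dates.last)

    next if target_month < oldest   # whole page is newer than the target
    break if target_month > newest  # whole page is older than the target

    selected << index
    break if target_month > oldest  # page already extends past the target month
  end
  selected
end

# Made-up sample data for illustration only.
PAGES = [
  [Time.local(2016, 3, 20), Time.local(2016, 3, 2)],
  [Time.local(2016, 3, 1),  Time.local(2016, 2, 10)],
  [Time.local(2016, 2, 9),  Time.local(2016, 1, 5)],
  [Time.local(2015, 12, 30), Time.local(2015, 11, 1)]
].freeze

p pages_for_month(PAGES, Time.local(2016, 2)) # => [1, 2]
```

The three guards collapse the five `if`/`elsif` branches of the diff into an equivalent form: skip, stop, or collect and then stop only if the page's oldest entry is already older than the target month.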
@@ -1,3 +1,3 @@
  module BookmeterScraper
-   VERSION = "0.1.1"
+   VERSION = "0.1.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: bookmeter_scraper
  version: !ruby/object:Gem::Version
-   version: 0.1.1
+   version: 0.1.2
  platform: ruby
  authors:
  - Kohei Yamamoto
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2016-03-05 00:00:00.000000000 Z
+ date: 2016-03-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -115,8 +115,10 @@ files:
  - bookmeter_scraper.gemspec
  - exe/bookmeter_scraper
  - lib/bookmeter_scraper.rb
+ - lib/bookmeter_scraper/agent.rb
  - lib/bookmeter_scraper/bookmeter.rb
  - lib/bookmeter_scraper/configuration.rb
+ - lib/bookmeter_scraper/scraper.rb
  - lib/bookmeter_scraper/version.rb
  homepage: https://github.com/kymmt90/bookmeter_scraper
  licenses:
@@ -138,7 +140,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
        version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.4.5.1
+ rubygems_version: 2.5.1
  signing_key:
  specification_version: 4
  summary: Bookmeter scraping library