bookmeter_scraper 0.1.1 → 0.1.2

Files changed (10)

checksums.yaml +4 -4
data/README.ja.md +42 -5
data/README.md +40 -4
data/lib/bookmeter_scraper.rb +46 -0
data/lib/bookmeter_scraper/agent.rb +59 -0
data/lib/bookmeter_scraper/bookmeter.rb +52 -393
data/lib/bookmeter_scraper/configuration.rb +16 -5
data/lib/bookmeter_scraper/scraper.rb +388 -0
data/lib/bookmeter_scraper/version.rb +1 -1
metadata +5 -3

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: a8d0b08d6fe209f678ebda3c621b41412a575aef
-  data.tar.gz: 1d9db389c1c0ac92216c00e76f7a025327ed77c7
+  metadata.gz: eed0f25219959cbcb0f1e74a0db32d7f6ef46de8
+  data.tar.gz: bf5981a2fcb2c933c41720cb99846ac8d1df7dad
 SHA512:
-  metadata.gz: 9df797a4925a93e8cb8a7f07e38bbd4b1288acadd27a2d0e32fd3e0161120a843dd3b2c5c184c7116e90bb6d8b52a2cacfc246a04f7ede74b6081359882f2d51
-  data.tar.gz: ba4d5f59e35034facda467d469a0f2ef3cfef10829f2e75a82af6a839233fe2ff0e588a9c1535db25bc48eb24a2d08fd81d614b67f1c92d66b2656b0603fc303
+  metadata.gz: 894e75e566f6e547089048bf6872917c79dcb2a9456d36afd59dc624bdd62a67b9bdce23cd811e1035408b14bc7eba48928e03337d50fed102206daee899cf5f
+  data.tar.gz: 3077ac2b3b900537f494ed3fe001cb2be7af6a726293945ad738215ea205ac536d53ac3ad70ba1587dfbcedf4d820387845036c31fc80af92445c4ea2ffd9388

data/README.ja.md CHANGED

@@ -1,4 +1,5 @@
-# Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
+# Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper) [![Gem Version](https://badge.fury.io/rb/bookmeter_scraper.svg)](https://badge.fury.io/rb/bookmeter_scraper)
 [読書メーター](http://bookmeter.com)の情報をスクレイピングして Ruby で扱えるようにするための gem です。
@@ -30,10 +31,11 @@ require 'bookmeter_scraper'
 書籍情報、お気に入り / お気に入られユーザ情報を取得するには、`Bookmeter.log_in` または `Bookmeter#log_in` でログインしておく必要があります。
-ログイン情報の入力には以下の 2 通りの方法があります。
+ログイン情報の入力には以下の 3 通りの方法があります。
 1. 引数として渡す
 2. `config.yml` へ記述しておく
+3. ブロック内で設定する
 #### 1. 引数として渡す
@@ -67,6 +69,28 @@ bookmeter = BookmeterScraper::Bookmeter.log_in
 bookmeter.logged_in?    # true
 ```
+#### 3. ブロック内で設定する
+以下のように `Bookmeter.log_in` へブロックを渡すことで、ログインできます。
+```ruby
+bookmeter = BookmeterScraper::Bookmeter.log_in do |configuration|
+  configuration.mail     = 'example@example.com'
+  configuration.password = 'password'
+end
+bookmeter.logged_in?    # true
+```
+`Bookmeter#log_in` でもログイン可能です。
+```ruby
+bookmeter = BookmeterScraper::Bookmeter.new
+bookmeter.log_in do |configuration|
+  configuration.mail     = 'example@example.com'
+  configuration.password = 'password'
+end
+```
 ### 書籍情報の取得
 以下の書籍情報
@@ -76,7 +100,7 @@ bookmeter.logged_in?    # true
 - 積読本
 - 読みたい本
-を取得できます。取得には事前のログインが必要です。
+を取得できます。取得には `Bookmeter.log_in` などによる事前のログインが必要です。
 #### 読んだ本
@@ -92,13 +116,17 @@ bookmeter.read_books('01010101')    # 他のユーザの ID を指定して、
 - 書名 `name`
 - 著者 `author`
 - 読了日（初読了日と再読日の両方）の配列 `read_dates`
+- 読書メーター内の書籍ページの URI `uri`
+- 書籍の表紙画像 URI `image_uri`
-を属性として持つ `Struct` の配列として取得できます。
+を属性として持つ `Book` の配列として取得できます。
 ```ruby
 books[0].name
 books[0].author
 books[0].read_dates
+books[0].uri
+books[0].image_uri
 ```
 さらに、`Bookmeter#read_books_in` で特定年月の「読んだ本」情報が取得できます。
@@ -129,6 +157,8 @@ books = bookmeter.reading_books    # ログインユーザの「読んでる本
 books[0].name
 books[0].author
 books[0].read_dates    # 読了日の Array は空
+books[0].uri
+books[0].image_uri
 bookmeter.tsundoku     # ログインユーザの「積読本」を取得
 bookmeter.wish_list    # ログインユーザの「読みたい本」を取得
@@ -143,13 +173,20 @@ following_users = bookmeter.followings    # 「お気に入り」ユーザの情
 followers = bookmeter.followers           # 「お気に入られ」ユーザの情報を取得
 ```
-ユーザ情報はユーザ名 `name` とユーザ ID `id` を持つ `Struct` の配列として取得できます。
+ユーザ情報は
+- ユーザ名 `name`
+- ユーザ ID `id`
+- 読書メーター内のユーザページの URI `uri`
+を持つ `User` の配列として取得できます。
 ```ruby
 following_users[0].name
 following_users[0].id
 followers[0].name
 followers[0].id
+followers[0].uri
 ```
 #### 注意

data/README.md CHANGED

@@ -1,4 +1,4 @@
-# Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper)
+# Bookmeter Scraper [![Build Status](https://travis-ci.org/kymmt90/bookmeter_scraper.svg?branch=master)](https://travis-ci.org/kymmt90/bookmeter_scraper) [![Gem Version](https://badge.fury.io/rb/bookmeter_scraper.svg)](https://badge.fury.io/rb/bookmeter_scraper)
 A library for scraping [Bookmeter](http://bookmeter.com).
@@ -34,10 +34,11 @@ require 'bookmeter_scraper'
 You need to log in Bookmeter to get books and followings / followers information by `Bookmeter.log_in` or `Bookmeter#log_in`.
-There are 2 ways to input authentication information:
+There are 3 ways to input authentication information:
 1. Passing as arguments
 2. Writing out to `config.yml`
+3. Configuring in a block
 #### 1. Passing as arguments
@@ -71,6 +72,27 @@ bookmeter = BookmeterScraper::Bookmeter.log_in
 bookmeter.logged_in?    # true
 ```
+#### 3. Configuring in a block
+You can configure mail address and password in a block.
+```ruby
+bookmeter = BookmeterScraper::Bookmeter.log_in do |configuration|
+  configuration.mail = 'example@example.com'
+  configuration.password = 'password'
+end
+bookmeter.logged_in?    # true
+```
+`Bookmeter#log_in` is also available:
+```ruby
+bookmeter = BookmeterScraper::Bookmeter.new
+bookmeter.log_in do |configuration|
+  configuration.mail = 'example@example.com'
+  configuration.password = 'password'
+end
+```
 ### Get books information
@@ -92,12 +114,20 @@ books = bookmeter.read_books        # get read books of the logged in user
 bookmeter.read_books('01010101')    # get read books of a user specified by ID
 ```
-Books infomation is an array of `Struct` which has `name` and `read_dates` as attributes.
+Books infomation is an array of `Book` which has these attributes:
+- `name`
+- `read_dates`
+- `uri`
+- `image_uri`
 `read_dates` is an array of finished reading dates (first finished date and reread dates):
 ```ruby
 books[0].name
 books[0].read_dates
+books[0].uri
+books[0].image_uri
 ```
 To specify year-month for read books, you can use `Bookmeter#read_books_in`:
@@ -135,13 +165,19 @@ followers = bookmeter.followers
 You need to log in Bookmeter in advance to get these information.
-Users information is an array of `Struct` which has `name` and `id` as attributes.
+Users information is an array of `Struct` which has following attributes:
+- `name`
+- `id`
+- `uri`
 ```ruby
 following_users[0].name
 following_users[0].id
+following_users[0].uri
 followers[0].name
 followers[0].id
+followers[0].uri
 ```
 #### Notice

data/lib/bookmeter_scraper.rb CHANGED

@@ -1,3 +1,49 @@
 require 'bookmeter_scraper/bookmeter'
 require 'bookmeter_scraper/configuration'
 require 'bookmeter_scraper/version'
+module BookmeterScraper
+  ROOT_URI  = 'http://bookmeter.com'.freeze
+  LOGIN_URI = "#{ROOT_URI}/login".freeze
+  USER_ID_REGEX = /^\d+$/
+  class << self
+    def mypage_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}"
+    end
+    def read_books_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}/booklist"
+    end
+    def reading_books_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}/booklistnow"
+    end
+    def tsundoku_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}/booklisttun"
+    end
+    def wish_list_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}/booklistpre"
+    end
+    def followings_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}/favorite_user"
+    end
+    def followers_uri(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      "#{ROOT_URI}/u/#{user_id}/favorited_user"
+    end
+  end
+  class BookmeterError < StandardError; end
+end
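
The `USER_ID_REGEX` check added in this file anchors with `^` and `$`, which in Ruby match at line boundaries rather than at the start and end of the whole string. The following standalone sketch shows how the two anchoring styles differ for a multi-line input. The `UriSketch` module and the `STRICT_USER_ID_REGEX` constant are hypothetical names used only for illustration; they are not part of the gem.

```ruby
# Hypothetical standalone sketch; it mirrors the shape of
# BookmeterScraper.mypage_uri but is not the gem's code.
module UriSketch
  ROOT_URI = 'http://bookmeter.com'.freeze

  # Same pattern as in the diff: ^ and $ match at line boundaries.
  USER_ID_REGEX = /^\d+$/
  # Assumed stricter variant: \A and \z match only at string boundaries.
  STRICT_USER_ID_REGEX = /\A\d+\z/

  def self.mypage_uri(user_id, pattern = USER_ID_REGEX)
    raise ArgumentError unless user_id =~ pattern
    "#{ROOT_URI}/u/#{user_id}"
  end
end

# A plain numeric ID passes either pattern.
UriSketch.mypage_uri('12345')

# A multi-line string passes the line-anchored pattern because
# its first line consists only of digits.
multi_line = "12345\nnot-an-id"
UriSketch.mypage_uri(multi_line)

# The string-anchored pattern rejects the same input.
begin
  UriSketch.mypage_uri(multi_line, UriSketch::STRICT_USER_ID_REGEX)
rescue ArgumentError
  puts 'rejected by the string-anchored pattern'
end
```

Whether the stricter pattern matters in practice depends on where user IDs come from; for IDs extracted from Bookmeter URIs the two patterns should behave the same.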

data/lib/bookmeter_scraper/agent.rb ADDED

@@ -0,0 +1,59 @@
+require 'forwardable'
+module BookmeterScraper
+  class Agent
+    extend Forwardable
+    def_delegator :@agent, :get
+    def_delegator :@agent, :click
+    attr_reader :log_in_user_id
+    def initialize
+      @agent = Mechanize.new do |a|
+        a.user_agent_alias = Mechanize::AGENT_ALIASES.keys.reject do |ua_alias|
+          %w(Android iPad iPhone Mechanize).include?(ua_alias)
+        end.sample
+      end
+      @log_in_user_id = nil
+    end
+    def log_in(config)
+      raise ArgumentError if config.nil?
+      page_after_submitting_form = nil
+      @agent.get(BookmeterScraper::LOGIN_URI) do |page|
+        page_after_submitting_form = page.form_with(action: '/login') do |form|
+          form.field_with(name: 'mail').value     = config.mail
+          form.field_with(name: 'password').value = config.password
+        end.submit
+      end
+      if page_after_logging_in? page_after_submitting_form
+        mypage = page_after_submitting_form.link_with(text: 'マイページ').click
+        @log_in_user_id = extract_user_id(mypage)
+      else
+        nil
+      end
+    end
+    def logged_in?
+      !@log_in_user_id.nil?
+    end
+    private
+    def page_after_logging_in?(page)
+      raise ArgumentError if page.nil?
+      page.uri.to_s == BookmeterScraper::ROOT_URI + '/'
+    end
+    def extract_user_id(page)
+      raise ArgumentError if page.nil?
+      page.uri.to_s.match(/\/u\/(\d+)$/)[1]
+    end
+  end
+end
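
`extract_user_id` indexes the result of `String#match` directly, so a page URI without a trailing `/u/<digits>` segment would raise `NoMethodError` on `nil`. Below is a minimal nil-safe sketch that operates on plain strings rather than Mechanize pages. The method name `user_id_from_uri` is hypothetical and does not appear in the gem.

```ruby
# Hypothetical helper: returns the numeric user ID at the end of a
# URI string, or nil when the URI does not end in /u/<digits>.
def user_id_from_uri(uri_string)
  match = uri_string.match(%r{/u/(\d+)$})
  match ? match[1] : nil
end

puts user_id_from_uri('http://bookmeter.com/u/12345').inspect
puts user_id_from_uri('http://bookmeter.com/').inspect
```

Returning `nil` here would line up with how `Agent#logged_in?` already treats a `nil` user ID as "not logged in", though that is a design choice the gem itself does not make.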

data/lib/bookmeter_scraper/bookmeter.rb CHANGED

@@ -1,130 +1,51 @@
-require 'forwardable'
-require 'mechanize'
-require 'yasuri'
+require 'bookmeter_scraper/agent'
+require 'bookmeter_scraper/scraper'
 module BookmeterScraper
   class Bookmeter
     DEFAULT_CONFIG_PATH = './config.yml'.freeze
-    ROOT_URI  = 'http://bookmeter.com'.freeze
-    LOGIN_URI = "#{ROOT_URI}/login".freeze
-    PROFILE_ATTRIBUTES = %i(name gender age blood_type job address url description first_day elapsed_days read_books_count read_pages_count reviews_count bookshelfs_count)
-    Profile = Struct.new(*PROFILE_ATTRIBUTES)
-    BOOK_ATTRIBUTES = %i(name author read_dates)
-    Book = Struct.new(*BOOK_ATTRIBUTES)
-    class Books
-      extend Forwardable
-      def_delegator :@books, :[]
-      def_delegator :@books, :[]=
-      def_delegator :@books, :<<
-      def_delegator :@books, :each
-      def_delegator :@books, :flatten!
-      def initialize; @books = []; end
-      def concat(books)
-        books.each do |book|
-          next if @books.any? { |b| b.name == book.name && b.author == book.author }
-          @books << book
-        end
-      end
-      def to_a; @books; end
-    end
-    USER_ATTRIBUTES = %i(name id)
-    User = Struct.new(*USER_ATTRIBUTES)
-    JP_ATTRIBUTE_NAMES = {
-      gender: '性別',
-      age: '年齢',
-      blood_type: '血液型',
-      job: '職業',
-      address: '現住所',
-      url: 'URL / ブログ',
-      description: '自己紹介',
-      first_day: '記録初日',
-      elapsed_days: '経過日数',
-      read_books_count: '読んだ本',
-      read_pages_count: '読んだページ',
-      reviews_count: '感想/レビュー',
-      bookshelfs_count: '本棚',
-    }
-    NUM_BOOKS_PER_PAGE = 40
-    NUM_USERS_PER_PAGE = 20
     attr_reader :log_in_user_id
-    def self.mypage_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}"
-    end
-    def self.read_books_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}/booklist"
-    end
-    def self.reading_books_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}/booklistnow"
-    end
-    def self.tsundoku_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}/booklisttun"
-    end
-    def self.wish_list_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}/booklistpre"
-    end
-    def self.followings_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}/favorite_user"
-    end
-    def self.followers_uri(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      "#{ROOT_URI}/u/#{user_id}/favorited_user"
-    end
-    def self.log_in(mail = nil, password = nil)
-      Bookmeter.new.tap do |bookmeter|
-        bookmeter.log_in(mail, password)
+    class << self
+      def log_in(mail = nil, password = nil)
+        Bookmeter.new.tap do |bookmeter|
+          if block_given?
+            config = Configuration.new
+            yield config
+            bookmeter.log_in(config.mail, config.password)
+          else
+            bookmeter.log_in(mail, password)
+          end
+        end
       end
     end
     def initialize(agent = nil)
-      @agent = agent.nil? ? Bookmeter.new_agent : agent
-      @logged_in = false
+      @agent          = agent.nil? ? Agent.new : agent
+      @scraper        = Scraper.new(@agent)
+      @logged_in      = false
       @log_in_user_id = nil
-      @book_pages = {}
     end
     def log_in(mail = nil, password = nil)
       raise BookmeterError if @agent.nil?
-      config = Configuration.new(DEFAULT_CONFIG_PATH) if mail.nil? && password.nil?
-      page_after_submitting_form = nil
-      page = @agent.get(LOGIN_URI) do |page|
-        page_after_submitting_form = page.form_with(action: '/login') do |form|
-          form.field_with(name: 'mail').value     = config ? config.mail     : mail
-          form.field_with(name: 'password').value = config ? config.password : password.to_s
-        end.submit
-      end
-      @logged_in = page_after_submitting_form.uri.to_s == ROOT_URI + '/'
-      return unless logged_in?
+      configuration = if block_given?
+                        Configuration.new.tap { |config| yield config }
+                      elsif mail.nil? && password.nil?
+                        Configuration.new(DEFAULT_CONFIG_PATH)
+                      else
+                        Configuration.new.tap do |config|
+                          config.mail     = mail
+                          config.password = password
+                        end
+                      end
-      mypage = page_after_submitting_form.link_with(text: 'マイページ').click
-      @log_in_user_id = extract_user_id(mypage)
+      @log_in_user_id = @agent.log_in(configuration)
+      @logged_in      = !@log_in_user_id.nil?
     end
     def logged_in?
@@ -132,321 +53,59 @@ module BookmeterScraper
     end
     def profile(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      mypage = @agent.get(Bookmeter.mypage_uri(user_id))
-      profile_dl_tags = mypage.search('#side_left > div.inner > div.profile > dl')
-      jp_attribute_names = profile_dl_tags.map { |i| i.children[0].children.text }
-      attribute_values   = profile_dl_tags.map { |i| i.children[1].children.text }
-      jp_attributes = Hash[jp_attribute_names.zip(attribute_values)]
-      attributes = PROFILE_ATTRIBUTES.map do |attribute|
-        jp_attributes[JP_ATTRIBUTE_NAMES[attribute]]
-      end
-      attributes[0] = mypage.at_css('#side_left > div.inner > h3').text
-      Profile.new(*attributes)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      @scraper.fetch_profile(user_id)
     end
     def read_books(user_id = @log_in_user_id)
-      books = get_books(user_id, :read_books_uri)
-      books.each { |b| yield b } if block_given?
-      books.to_a
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      fetch_books(user_id, :read_books_uri)
     end
     def read_books_in(year, month, user_id = @log_in_user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
       date = Time.local(year, month)
-      books = get_read_books(user_id, date)
+      books = @scraper.fetch_read_books(user_id, date)
       books.each { |b| yield b } if block_given?
       books.to_a
     end
     def reading_books(user_id = @log_in_user_id)
-      books = get_books(user_id, :reading_books_uri)
-      books.each { |b| yield b } if block_given?
-      books.to_a
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      fetch_books(user_id, :reading_books_uri)
     end
     def tsundoku(user_id = @log_in_user_id)
-      books = get_books(user_id, :tsundoku_uri)
-      books.each { |b| yield b } if block_given?
-      books.to_a
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      fetch_books(user_id, :tsundoku_uri)
     end
     def wish_list(user_id = @log_in_user_id)
-      books = get_books(user_id, :wish_list_uri)
-      books.each { |b| yield b } if block_given?
-      books.to_a
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      fetch_books(user_id, :wish_list_uri)
     end
     def followings(user_id = @log_in_user_id)
-      users = get_followings(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      @scraper.fetch_followings(user_id)
     end
     def followers(user_id = @log_in_user_id)
-      users = get_followers(user_id)
-    end
-    private
-    def self.new_agent
-      agent = Mechanize.new do |a|
-        a.user_agent_alias = Mechanize::AGENT_ALIASES.keys.reject do |ua_alias|
-          %w(Android iPad iPhone Mechanize).include?(ua_alias)
-        end.sample
-      end
-    end
-    def extract_user_id(page)
-      page.uri.to_s.match(/\/u\/(\d+)$/)[1]
-    end
-    def get_books(user_id, uri_method)
-      books = Books.new
-      scraped_pages = scrape_book_pages(user_id, uri_method)
-      scraped_pages.each do |page|
-        books << get_book_structs(page)
-        books.flatten!
-      end
-      books
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      @scraper.fetch_followers(user_id)
     end
-    def get_read_books(user_id, target_ym)
-      result = Books.new
-      scrape_book_pages(user_id, :read_books_uri).each do |page|
-        first_book_date = get_read_date(page['book_1_link'])
-        last_book_date  = get_last_book_date(page)
-        first_book_ym = Time.local(first_book_date['year'].to_i, first_book_date['month'].to_i)
-        last_book_ym  = Time.local(last_book_date['year'].to_i, last_book_date['month'].to_i)
-        if target_ym < last_book_ym
-          next
-        elsif target_ym == first_book_ym && target_ym > last_book_ym
-          result.concat(get_target_books(target_ym, page))
-          break
-        elsif target_ym < first_book_ym && target_ym > last_book_ym
-          result.concat(get_target_books(target_ym, page))
-          break
-        elsif target_ym <= first_book_ym && target_ym >= last_book_ym
-          result.concat(get_target_books(target_ym, page))
-        elsif target_ym > first_book_ym
-          break
-        end
-      end
-      result
-    end
-    def get_last_book_date(page)
-      NUM_BOOKS_PER_PAGE.downto(1) do |i|
-        link = page["book_#{i}_link"]
-        next if link.empty?
-        return get_read_date(link)
-      end
-    end
-    def get_target_books(target_ym, page)
-      target_books = Books.new
-      1.upto(NUM_BOOKS_PER_PAGE) do |i|
-        next if page["book_#{i}_link"].empty?
-        read_yms = []
-        read_date = get_read_date(page["book_#{i}_link"])
-        read_dates = [Time.local(read_date['year'], read_date['month'], read_date['day'])]
-        read_yms << Time.local(read_date['year'], read_date['month'])
-        reread_dates = []
-        reread_dates << get_reread_date(page["book_#{i}_link"])
-        reread_dates.flatten!
-        unless reread_dates.empty?
-          reread_dates.each do |date|
-            read_yms << Time.local(date['reread_year'], date['reread_month'])
-          end
-        end
-        next unless read_yms.include?(target_ym)
-        unless reread_dates.empty?
-          reread_dates.each do |date|
-            read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
-          end
-        end
-        book_name = get_book_name(page["book_#{i}_link"])
-        book_author = get_book_author(page["book_#{i}_link"])
-        book = Book.new(book_name, book_author, read_dates)
-        target_books << book
-      end
-      target_books
-    end
-    def scrape_book_pages(user_id, uri_method)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      raise ArgumentError unless Bookmeter.methods.include?(uri_method)
-      return [] unless logged_in?
-      books_page = @agent.get(Bookmeter.method(uri_method).call(user_id))
-      # if books are not found at all
-      return [] if books_page.search('#main_left > div > center > a').empty?
-      if books_page.search('span.now_page').empty?
-        books_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
-          1.upto(NUM_BOOKS_PER_PAGE) do |i|
-            send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
-            send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
-          end
-        end
-        return [books_root.inject(@agent, books_page)]
-      end
-      books_root = Yasuri.pages_root '//span[@class="now_page"]/following-sibling::span[1]/a' do
-        text_page_index '//span[@class="now_page"]/a'
-        1.upto(NUM_BOOKS_PER_PAGE) do |i|
-          send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
-          send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
-        end
-      end
-      books_root.inject(@agent, books_page)
-    end
-    def get_book_page(book_uri)
-      @book_pages[book_uri] = @agent.get(ROOT_URI + book_uri) unless @book_pages[book_uri]
-      @book_pages[book_uri]
-    end
-    def get_book_name(book_uri)
-      get_book_page(book_uri).search('#title').text
-    end
-    def get_book_author(book_uri)
-      get_book_page(book_uri).search('#author_name').text
-    end
-    def get_read_date(book_uri)
-      book_date = Yasuri.struct_date '//*[@id="book_edit_area"]/form[1]/div[2]' do
-        text_year  '//*[@id="read_date_y"]/option[1]', truncate: /\d+/, proc: :to_i
-        text_month '//*[@id="read_date_m"]/option[1]', truncate: /\d+/, proc: :to_i
-        text_day   '//*[@id="read_date_d"]/option[1]', truncate: /\d+/, proc: :to_i
-      end
-      book_date.inject(@agent, get_book_page(book_uri))
-    end
-    def get_reread_date(book_uri)
-      book_reread_date = Yasuri.struct_reread_date '//*[@id="book_edit_area"]/div/form[1]/div[2]' do
-        text_reread_year  '//div[@class="reread_box"]/form[1]/div[2]/select[1]/option[1]', truncate: /\d+/, proc: :to_i
-        text_reread_month '//div[@class="reread_box"]/form[1]/div[2]/select[2]/option[1]', truncate: /\d+/, proc: :to_i
-        text_reread_day   '//div[@class="reread_box"]/form[1]/div[2]/select[3]/option[1]', truncate: /\d+/, proc: :to_i
-      end
-      book_reread_date.inject(@agent, get_book_page(book_uri))
-    end
-    def get_book_structs(page)
-      books = []
-      1.upto(NUM_BOOKS_PER_PAGE) do |i|
-        break if page["book_#{i}_link"].empty?
-        read_dates = []
-        read_date = get_read_date(page["book_#{i}_link"])
-        unless read_date.empty?
-          read_dates << Time.local(read_date['year'], read_date['month'], read_date['day'])
-        end
-        reread_dates = []
-        reread_dates << get_reread_date(page["book_#{i}_link"])
-        reread_dates.flatten!
-        unless reread_dates.empty?
-          reread_dates.each do |date|
-            read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
-          end
-        end
-        book_name = get_book_name(page["book_#{i}_link"])
-        book_author = get_book_author(page["book_#{i}_link"])
-        book = Book.new(book_name, book_author, read_dates)
-        books << book
-      end
-      books
-    end
-    def get_followings(user_id)
-      users = []
-      scraped_pages = user_id == @log_in_user_id ? scrape_followings_page(user_id)
-                                                 : scrape_others_followings_page(user_id)
-      scraped_pages.each do |page|
-        users << get_user_structs(page)
-        users.flatten!
-      end
-      users
-    end
-    def get_followers(user_id)
-      users = []
-      scraped_pages = scrape_followers_page(user_id)
-      scraped_pages.each do |page|
-        users << get_user_structs(page)
-        users.flatten!
-      end
-      users
-    end
-    def get_user_structs(page)
-      users = []
-      1.upto(NUM_USERS_PER_PAGE) do |i|
-        break if page["user_#{i}_name"].empty?
-        user_name = page["user_#{i}_name"]
-        user_id = page["user_#{i}_link"].match(/\/u\/(\d+)$/)[1]
-        user = User.new(user_name, user_id)
-        users << user
-      end
-      users
-    end
-    def scrape_followings_page(user_id)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      return [] unless logged_in?
-      followings_page = @agent.get(Bookmeter.followings_uri(user_id))
-      followings_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
-        1.upto(NUM_USERS_PER_PAGE) do |i|
-          send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@title")
-          send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@href")
-        end
-      end
-      [followings_root.inject(@agent, followings_page)]
-    end
-    def scrape_others_followings_page(user_id)
-      scrape_users_listing_page(user_id, :followings_uri)
-    end
-    def scrape_followers_page(user_id)
-      scrape_users_listing_page(user_id, :followers_uri)
-    end
+    private
-    def scrape_users_listing_page(user_id, uri_method)
-      raise ArgumentError unless user_id =~ /^\d+$/
-      raise ArgumentError unless Bookmeter.methods.include?(uri_method)
-      return [] unless logged_in?
+    def fetch_books(user_id, uri_method)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
-      page = @agent.get(Bookmeter.method(uri_method).call(user_id))
-      root = Yasuri.struct_users '//*[@id="main_left"]/div' do
-        1.upto(NUM_USERS_PER_PAGE) do |i|
-          send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@title")
-          send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@href")
-        end
-      end
-      [root.inject(@agent, page)]
+      books = @scraper.fetch_books(user_id, uri_method)
+      books.each { |book| yield book } if block_given?
+      books.to_a
     end
   end
-  class BookmeterError < StandardError; end
 end
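
The rewritten `Bookmeter#log_in` takes its credentials from one of three sources: a block, `config.yml`, or explicit arguments. The sketch below isolates that branching with a stand-in `Struct`. `SketchConfig` and `resolve_credentials` are hypothetical names; the real method builds a `Configuration` in every branch and reads the file in the second one, which is replaced here by a placeholder symbol.

```ruby
# Stand-in for BookmeterScraper::Configuration (hypothetical).
SketchConfig = Struct.new(:mail, :password)

# Mirrors the three-way branch in Bookmeter#log_in:
#   1. a block configures a fresh object,
#   2. no arguments means "read the config file",
#   3. otherwise the positional arguments are used.
def resolve_credentials(mail = nil, password = nil)
  if block_given?
    SketchConfig.new('', '').tap { |config| yield config }
  elsif mail.nil? && password.nil?
    :from_config_file # placeholder for Configuration.new(DEFAULT_CONFIG_PATH)
  else
    SketchConfig.new(mail, password)
  end
end

from_block = resolve_credentials do |config|
  config.mail     = 'example@example.com'
  config.password = 'password'
end
puts from_block.mail

puts resolve_credentials.inspect
puts resolve_credentials('example@example.com', 'password').password
```

Note that the block takes precedence: if both a block and positional arguments are given, the arguments are ignored.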

data/lib/bookmeter_scraper/configuration.rb CHANGED

@@ -1,11 +1,14 @@
-require 'yaml'
 module BookmeterScraper
   class Configuration
-    attr_reader :mail, :password
+    attr_accessor :mail, :password
+    def initialize(config_file = nil)
+      if config_file.nil?
+        @mail = @password = ''
+        return
+      end
-    def initialize(config_file)
-      config = YAML.load_file(config_file)
+      config = load_yaml_file(config_file)
       unless config.has_key?('mail') && config.has_key?('password')
         raise ConfigurationError, "#{config_file}: Invalid configuration file"
       end
@@ -13,6 +16,14 @@ module BookmeterScraper
       @mail     = config['mail']
       @password = config['password']
     end
+    private
+    def load_yaml_file(config_file)
+      require 'yaml'
+      YAML.load_file(config_file)
+    end
   end
   class ConfigurationError < StandardError; end
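
`Configuration` now accepts an optional file path and requires `yaml` only when a file is actually loaded. The following is a rough standalone sketch of that behaviour, assuming a YAML file with `mail` and `password` keys, as the validation in the diff implies. `SketchConfiguration` is a hypothetical name, it raises `ArgumentError` instead of the gem's `ConfigurationError`, and the temporary file stands in for `config.yml`.

```ruby
require 'tempfile'

# Hypothetical stand-in for BookmeterScraper::Configuration.
class SketchConfiguration
  attr_accessor :mail, :password

  def initialize(config_file = nil)
    # Without a file, start empty so a block or caller can fill it in.
    if config_file.nil?
      @mail = @password = ''
      return
    end

    require 'yaml' # loaded lazily, only when a file is read
    config = YAML.load_file(config_file)
    unless config.key?('mail') && config.key?('password')
      raise ArgumentError, "#{config_file}: Invalid configuration file"
    end

    @mail     = config['mail']
    @password = config['password']
  end
end

# Write a throwaway file in the assumed config.yml format and load it.
Tempfile.create(['config', '.yml']) do |file|
  file.write("mail: example@example.com\npassword: password\n")
  file.flush
  loaded = SketchConfiguration.new(file.path)
  puts loaded.mail
end

# With no file, both fields start as empty strings.
puts SketchConfiguration.new.mail.empty?
```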

data/lib/bookmeter_scraper/scraper.rb ADDED

@@ -0,0 +1,388 @@
+require 'forwardable'
+require 'mechanize'
+require 'yasuri'
+module BookmeterScraper
+  class Scraper
+    PROFILE_ATTRIBUTES = %i(
+      name
+      gender
+      age
+      blood_type
+      job
+      address
+      url
+      description
+      first_day
+      elapsed_days
+      read_books_count
+      read_pages_count
+      reviews_count
+      bookshelfs_count
+    )
+    Profile = Struct.new(*PROFILE_ATTRIBUTES)
+    JP_ATTRIBUTE_NAMES = {
+      gender: '性別',
+      age: '年齢',
+      blood_type: '血液型',
+      job: '職業',
+      address: '現住所',
+      url: 'URL / ブログ',
+      description: '自己紹介',
+      first_day: '記録初日',
+      elapsed_days: '経過日数',
+      read_books_count: '読んだ本',
+      read_pages_count: '読んだページ',
+      reviews_count: '感想/レビュー',
+      bookshelfs_count: '本棚',
+    }
+    BOOK_ATTRIBUTES = %i(name author read_dates uri image_uri)
+    Book = Struct.new(*BOOK_ATTRIBUTES)
+    class Books
+      extend Forwardable
+      def_delegator :@books, :[]
+      def_delegator :@books, :[]=
+      def_delegator :@books, :<<
+      def_delegator :@books, :each
+      def_delegator :@books, :flatten!
+      def_delegator :@books, :empty?
+      def initialize; @books = []; end
+      def concat(books)
+        books.each do |book|
+          next if @books.any? { |b| b.name == book.name && b.author == book.author }
+          @books << book
+        end
+      end
+      def to_a; @books; end
+    end
+    USER_ATTRIBUTES = %i(name id uri)
+    User = Struct.new(*USER_ATTRIBUTES)
+    NUM_BOOKS_PER_PAGE = 40
+    NUM_USERS_PER_PAGE = 20
+    attr_accessor :agent
+    def initialize(agent = nil)
+      @agent = agent
+      @book_pages = {}
+    end
+    def fetch_profile(user_id, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ScraperError if agent.nil?
+      Profile.new(*scrape_profile(user_id, agent))
+    end
+    def scrape_profile(user_id, agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ScraperError if agent.nil?
+      mypage = agent.get(BookmeterScraper.mypage_uri(user_id))
+      profile_dl_tags    = mypage.search('#side_left > div.inner > div.profile > dl')
+      jp_attribute_names = profile_dl_tags.map { |i| i.children[0].children.text }
+      attribute_values   = profile_dl_tags.map { |i| i.children[1].children.text }
+      jp_attributes      = Hash[jp_attribute_names.zip(attribute_values)]
+      attributes = PROFILE_ATTRIBUTES.map do |attribute|
+        jp_attributes[JP_ATTRIBUTE_NAMES[attribute]]
+      end
+      attributes[0] = mypage.at_css('#side_left > div.inner > h3').text
+      attributes
+    end
+    def fetch_books(user_id, uri_method, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
+      raise ScraperError if agent.nil?
+      return [] unless agent.logged_in?
+      books = Books.new
+      scraped_pages = scrape_books_pages(user_id, uri_method)
+      scraped_pages.each do |page|
+        books << extract_books(page)
+        books.flatten!
+      end
+      books
+    end
+    def scrape_books_pages(user_id, uri_method, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
+      raise ScraperError if agent.nil?
+      return [] unless agent.logged_in?
+      books_page = agent.get(BookmeterScraper.method(uri_method).call(user_id))
+      # if books are not found at all
+      return [] if books_page.search('#main_left > div > center > a').empty?
+      if books_page.search('span.now_page').empty?
+        books_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
+          1.upto(NUM_BOOKS_PER_PAGE) do |i|
+            send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
+            send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
+          end
+        end
+        return [books_root.inject(agent, books_page)]
+      end
+      books_root = Yasuri.pages_root '//span[@class="now_page"]/following-sibling::span[1]/a' do
+        text_page_index '//span[@class="now_page"]/a'
+        1.upto(NUM_BOOKS_PER_PAGE) do |i|
+          send("text_book_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a")
+          send("text_book_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i + 1}]/div[2]/a/@href")
+        end
+      end
+      books_root.inject(agent, books_page)
+    end
+    def extract_books(page)
+      raise ArgumentError if page.nil?
+      books = []
+      1.upto(NUM_BOOKS_PER_PAGE) do |i|
+        break if page["book_#{i}_link"].empty?
+        read_dates = []
+        read_date  = scrape_read_date(page["book_#{i}_link"])
+        unless read_date.empty?
+          read_dates << Time.local(read_date['year'], read_date['month'], read_date['day'])
+        end
+        reread_dates = []
+        reread_dates << scrape_reread_date(page["book_#{i}_link"])
+        reread_dates.flatten!
+        unless reread_dates.empty?
+          reread_dates.each do |date|
+            read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
+          end
+        end
+        book_path = page["book_#{i}_link"]
+        book_name = scrape_book_name(book_path)
+        book_author    = scrape_book_author(book_path)
+        book_image_uri = scrape_book_image_uri(book_path)
+        book = Book.new(book_name,
+                        book_author,
+                        read_dates,
+                        ROOT_URI + book_path,
+                        book_image_uri)
+        books << book
+      end
+      books
+    end
+    def fetch_read_books(user_id, target_year_month)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ArgumentError if target_year_month.nil?
+      result = Books.new
+      scrape_books_pages(user_id, :read_books_uri).each do |page|
+        first_book_date = scrape_read_date(page['book_1_link'])
+        last_book_date  = get_last_book_date(page)
+        first_book_year_month = Time.local(first_book_date['year'].to_i, first_book_date['month'].to_i)
+        last_book_year_month  = Time.local(last_book_date['year'].to_i, last_book_date['month'].to_i)
+        if target_year_month < last_book_year_month
+          next
+        elsif target_year_month == first_book_year_month && target_year_month > last_book_year_month
+          result.concat(fetch_target_books(target_year_month, page))
+          break
+        elsif target_year_month < first_book_year_month && target_year_month > last_book_year_month
+          result.concat(fetch_target_books(target_year_month, page))
+          break
+        elsif target_year_month <= first_book_year_month && target_year_month >= last_book_year_month
+          result.concat(fetch_target_books(target_year_month, page))
+        elsif target_year_month > first_book_year_month
+          break
+        end
+      end
+      result
+    end
+    def get_last_book_date(page)
+      raise ArgumentError if page.nil?
+      NUM_BOOKS_PER_PAGE.downto(1) do |i|
+        link = page["book_#{i}_link"]
+        next if link.empty?
+        return scrape_read_date(link)
+      end
+    end
+    def fetch_target_books(target_year_month, page)
+      raise ArgumentError if target_year_month.nil?
+      raise ArgumentError if page.nil?
+      target_books = Books.new
+      1.upto(NUM_BOOKS_PER_PAGE) do |i|
+        next if page["book_#{i}_link"].empty?
+        read_year_months = []
+        read_date  = scrape_read_date(page["book_#{i}_link"])
+        read_dates = [Time.local(read_date['year'], read_date['month'], read_date['day'])]
+        read_year_months << Time.local(read_date['year'], read_date['month'])
+        reread_dates = []
+        reread_dates << scrape_reread_date(page["book_#{i}_link"])
+        reread_dates.flatten!
+        unless reread_dates.empty?
+          reread_dates.each do |date|
+            read_year_months << Time.local(date['reread_year'], date['reread_month'])
+          end
+        end
+        next unless read_year_months.include?(target_year_month)
+        unless reread_dates.empty?
+          reread_dates.each do |date|
+            read_dates << Time.local(date['reread_year'], date['reread_month'], date['reread_day'])
+          end
+        end
+        book_path = page["book_#{i}_link"]
+        book_name = scrape_book_name(book_path)
+        book_author    = scrape_book_author(book_path)
+        book_image_uri = scrape_book_image_uri(book_path)
+        target_books << Book.new(book_name, book_author, read_dates, ROOT_URI + book_path, book_image_uri)
+      end
+      target_books
+    end
+    def get_book_page(book_uri, agent = @agent)
+      @book_pages[book_uri] = agent.get(ROOT_URI + book_uri) unless @book_pages[book_uri]
+      @book_pages[book_uri]
+    end
+    def scrape_book_name(book_uri)
+      get_book_page(book_uri).search('#title').text
+    end
+    def scrape_book_author(book_uri)
+      get_book_page(book_uri).search('#author_name').text
+    end
+    def scrape_book_image_uri(book_uri)
+      get_book_page(book_uri).search('//*[@id="book_image"]/@src').text
+    end
+    def scrape_read_date(book_uri, agent = @agent)
+      book_date = Yasuri.struct_date '//*[@id="book_edit_area"]/form[1]/div[2]' do
+        text_year  '//*[@id="read_date_y"]/option[1]', truncate: /\d+/, proc: :to_i
+        text_month '//*[@id="read_date_m"]/option[1]', truncate: /\d+/, proc: :to_i
+        text_day   '//*[@id="read_date_d"]/option[1]', truncate: /\d+/, proc: :to_i
+      end
+      book_date.inject(agent, get_book_page(book_uri))
+    end
+    def scrape_reread_date(book_uri, agent = @agent)
+      book_reread_date = Yasuri.struct_reread_date '//*[@id="book_edit_area"]/div/form[1]/div[2]' do
+        text_reread_year  '//div[@class="reread_box"]/form[1]/div[2]/select[1]/option[1]', truncate: /\d+/, proc: :to_i
+        text_reread_month '//div[@class="reread_box"]/form[1]/div[2]/select[2]/option[1]', truncate: /\d+/, proc: :to_i
+        text_reread_day   '//div[@class="reread_box"]/form[1]/div[2]/select[3]/option[1]', truncate: /\d+/, proc: :to_i
+      end
+      book_reread_date.inject(agent, get_book_page(book_uri))
+    end
+    def fetch_followings(user_id, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ScraperError if agent.nil?
+      return [] unless agent.logged_in?
+      users = []
+      scraped_pages = user_id == agent.log_in_user_id ? scrape_followings_page(user_id)
+                                                      : scrape_others_followings_page(user_id)
+      scraped_pages.each do |page|
+        users << extract_users(page)
+        users.flatten!
+      end
+      users
+    end
+    def fetch_followers(user_id, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ScraperError if agent.nil?
+      return [] unless agent.logged_in?
+      users = []
+      scraped_pages = scrape_followers_page(user_id)
+      scraped_pages.each do |page|
+        users << extract_users(page)
+        users.flatten!
+      end
+      users
+    end
+    def scrape_followings_page(user_id, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      return [] unless agent.logged_in?
+      followings_page = agent.get(BookmeterScraper.followings_uri(user_id))
+      followings_root = Yasuri.struct_books '//*[@id="main_left"]/div' do
+        1.upto(NUM_USERS_PER_PAGE) do |i|
+          send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@title")
+          send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/a/@href")
+        end
+      end
+      [followings_root.inject(agent, followings_page)]
+    end
+    def scrape_others_followings_page(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      scrape_users_listing_page(user_id, :followings_uri)
+    end
+    def scrape_followers_page(user_id)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      scrape_users_listing_page(user_id, :followers_uri)
+    end
+    def scrape_users_listing_page(user_id, uri_method, agent = @agent)
+      raise ArgumentError unless user_id =~ USER_ID_REGEX
+      raise ArgumentError unless BookmeterScraper.methods.include?(uri_method)
+      return [] unless agent.logged_in?
+      page = agent.get(BookmeterScraper.method(uri_method).call(user_id))
+      root = Yasuri.struct_users '//*[@id="main_left"]/div' do
+        1.upto(NUM_USERS_PER_PAGE) do |i|
+          send("text_user_#{i}_name", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@title")
+          send("text_user_#{i}_link", "//*[@id=\"main_left\"]/div/div[#{i}]/div/div[2]/a/@href")
+        end
+      end
+      [root.inject(agent, page)]
+    end
+    def extract_users(page)
+      raise ArgumentError if page.nil?
+      users = []
+      1.upto(NUM_USERS_PER_PAGE) do |i|
+        break if page["user_#{i}_name"].empty?
+        user_name = page["user_#{i}_name"]
+        user_id   = page["user_#{i}_link"].match(/\/u\/(\d+)$/)[1]
+        users << User.new(user_name, user_id, ROOT_URI + "/u/#{user_id}")
+      end
+      users
+    end
+  end
+  class ScraperError < StandardError; end
+end
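
Note on the scraper.rb diff above: `fetch_read_books` walks the listing pages and, for each page, compares the target year-month with the read dates of the page's first and last book. The branch order suggests that pages arrive newest first. The sketch below restates that branch chain as a decision table under that assumption. `page_action` and its return symbols are hypothetical names introduced for illustration; they are not part of the gem.

```ruby
# Hypothetical helper, not part of bookmeter_scraper.
# Assumes pages are listed newest first, so first >= last on every page.
def page_action(target, first, last)
  return :skip if target < last    # whole page is newer than the target month
  return :stop if target > first   # whole page is older; later pages cannot match
  # Here last <= target <= first, so this page holds candidate books.
  # If the oldest book is still in the target month, the next page may hold more.
  target == last ? :collect_and_continue : :collect_and_stop
end

march    = Time.local(2016, 3)
february = Time.local(2016, 2)
january  = Time.local(2016, 1)

p page_action(february, march, march)      # => :skip
p page_action(february, march, february)   # => :collect_and_continue
p page_action(february, march, january)    # => :collect_and_stop
p page_action(february, january, january)  # => :stop
```

Read this way, the two `break` branches in the diff (target equal to the first book's month, or strictly between first and last) collapse into the single `:collect_and_stop` case.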

data/lib/bookmeter_scraper/version.rb CHANGED

@@ -1,3 +1,3 @@
 module BookmeterScraper
-  VERSION = "0.1.1"
+  VERSION = "0.1.2"
 end
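
The only change in this file is the version constant. For consumers who pin the gem, a quick check with RubyGems' own `Gem::Version` and `Gem::Requirement` classes shows how 0.1.2 relates to 0.1.1. The `~> 0.1.1` constraint is an assumed example of a pin, not something taken from the gem or its README.

```ruby
require "rubygems"

old_version = Gem::Version.new("0.1.1")
new_version = Gem::Version.new("0.1.2")

# Versions compare segment by segment, so 0.1.2 sorts after 0.1.1.
puts new_version > old_version        # => true

# A pessimistic pin on 0.1.1 allows any 0.1.x at or above 0.1.1.
pin = Gem::Requirement.new("~> 0.1.1")
puts pin.satisfied_by?(new_version)   # => true
```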

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: bookmeter_scraper
 version: !ruby/object:Gem::Version
-  version: 0.1.1
+  version: 0.1.2
 platform: ruby
 authors:
 - Kohei Yamamoto
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2016-03-05 00:00:00.000000000 Z
+date: 2016-03-27 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: bundler
@@ -115,8 +115,10 @@ files:
 - bookmeter_scraper.gemspec
 - exe/bookmeter_scraper
 - lib/bookmeter_scraper.rb
+- lib/bookmeter_scraper/agent.rb
 - lib/bookmeter_scraper/bookmeter.rb
 - lib/bookmeter_scraper/configuration.rb
+- lib/bookmeter_scraper/scraper.rb
 - lib/bookmeter_scraper/version.rb
 homepage: https://github.com/kymmt90/bookmeter_scraper
 licenses:
@@ -138,7 +140,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.4.5.1
+rubygems_version: 2.5.1
 signing_key:
 specification_version: 4
 summary: Bookmeter scraping library