RubyGems - yasuri - Versions diffs - 2.0.12 → 3.3.0 - Mend

yasuri 2.0.12 → 3.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

checksums.yaml +5 -5
data/.github/workflows/ruby.yml +35 -0
data/.gitignore +1 -2
data/.ruby-version +1 -0
data/.travis.yml +1 -3
data/README.md +87 -21
data/USAGE.ja.md +368 -120
data/USAGE.md +375 -125
data/examples/example.rb +79 -0
data/examples/github.yml +15 -0
data/examples/sample.json +4 -0
data/examples/sample.yml +11 -0
data/exe/yasuri +5 -0
data/lib/yasuri.rb +1 -0
data/lib/yasuri/version.rb +1 -1
data/lib/yasuri/yasuri.rb +86 -41
data/lib/yasuri/yasuri_cli.rb +64 -0
data/lib/yasuri/yasuri_links_node.rb +11 -5
data/lib/yasuri/yasuri_map_node.rb +40 -0
data/lib/yasuri/yasuri_node.rb +37 -2
data/lib/yasuri/yasuri_node_generator.rb +16 -11
data/lib/yasuri/yasuri_paginate_node.rb +10 -4
data/lib/yasuri/yasuri_struct_node.rb +5 -1
data/lib/yasuri/yasuri_text_node.rb +9 -2
data/spec/cli_resources/tree.json +8 -0
data/spec/cli_resources/tree.yml +5 -0
data/spec/cli_resources/tree_wrong.json +9 -0
data/spec/cli_resources/tree_wrong.yml +6 -0
data/spec/spec_helper.rb +4 -9
data/spec/yasuri_cli_spec.rb +96 -0
data/spec/yasuri_links_node_spec.rb +34 -12
data/spec/yasuri_map_spec.rb +75 -0
data/spec/yasuri_paginate_node_spec.rb +22 -10
data/spec/yasuri_spec.rb +244 -94
data/spec/yasuri_struct_node_spec.rb +13 -17
data/spec/yasuri_text_node_spec.rb +11 -12
data/yasuri.gemspec +5 -3
metadata +52 -18
data/app.rb +0 -52

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: a4fed4a13bb125758515e3c0ced665b1ca3d20b6
-  data.tar.gz: e9dfb2ed6256a367db2e5b6a78d23fa097c422d7
+SHA256:
+  metadata.gz: a7bf438a08fc83fec7e78cb5543577c98f6cc98b4f5fae7b0dd969f2049c0531
+  data.tar.gz: e399c6b57589b7d8ba2e8eff7a1d204fa7f8e676f82f631057e19a9377333060
 SHA512:
-  metadata.gz: 8b9d6345f3f49b1f7d9445ce18bca736b8cbeedc69979a45d541b59af4e09092d7c1d12886801a24296e9e3d73f39a7c2d53a7c2de12e1a0ff890623b47cfe84
-  data.tar.gz: 6d755f266062052dd5244599deefefea85f7570c827a898e48eee22c44510dde287b0554ed2cae85e3b94b44fe4eb6f74b512c44047e6cc1bb43fe27a93143b0
+  metadata.gz: 56f39994972657712cb7d95e5ceaadefca8de41e06c2cd4759363b496d7c8531fad7517f9df99bf2446c144f01c5cd82cbc94146c432d6b5b552f092b975ecd7
+  data.tar.gz: cf74a25615187ecbe5f8ca5f2072679fa9cc1902dfa3bf2190b87e11104f332688cdaee16f4be6cb00f9ed63fa18f2ec8f27cf32b0b27389f81d98229fa212e6
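The hunk above replaces the SHA1 entries with SHA256 and keeps SHA512. As a minimal sketch, assuming the archive members have already been extracted from the `.gem` file (which is a plain tar archive), digests of this shape could be recomputed with Ruby's standard `digest` library. The helper name `gem_digests` is hypothetical and is not part of RubyGems.

```ruby
require 'digest'

# Hypothetical helper (not part of RubyGems): returns the hex digests
# that a checksums.yaml of this shape records for one archive member.
def gem_digests(bytes)
  {
    'SHA256' => Digest::SHA256.hexdigest(bytes),
    'SHA512' => Digest::SHA512.hexdigest(bytes)
  }
end

# Stand-in input; in practice this would be something like
# File.binread('data.tar.gz') after untarring the .gem file.
digests = gem_digests('')
puts digests['SHA256']
puts digests['SHA512']
```

Comparing the printed values against the `+` lines of the hunk would confirm that a downloaded archive matches the published 3.3.0 checksums.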

data/.github/workflows/ruby.yml ADDED

@@ -0,0 +1,35 @@
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+# This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
+# For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
+name: Ruby
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        ruby-version: ['2.6', '2.7', '3.0']
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Ruby
+    # To automatically get bug fixes and new Ruby versions for ruby/setup-ruby,
+    # change this to (see https://github.com/ruby/setup-ruby#versioning):
+    # uses: ruby/setup-ruby@v1
+      uses: ruby/setup-ruby@473e4d8fe5dd94ee328fdfca9f8c9c7afc9dae5e
+      with:
+        ruby-version: ${{ matrix.ruby-version }}
+        bundler-cache: true # runs 'bundle install' and caches installed gems automatically
+    - name: Run tests
+      run: bundle exec rake

data/.gitignore CHANGED

@@ -66,5 +66,4 @@ tramp
 # cask packages
 .cask/
-.ruby-version
-Gemfile.lock
+Gemfile.lock

data/.ruby-version ADDED

@@ -0,0 +1 @@
+3.0.0

data/.travis.yml CHANGED

@@ -1,9 +1,7 @@
 language: ruby
-rvm:
-  - 2.2.0
 script:
   - ruby --version
   - rspec spec
 addons:
   code_climate:
-    repo_token: 0dc78d33107a7f11f257c0218ac1a37e0073005bb9734f2fd61d0f7e803fc151
+    repo_token: 0dc78d33107a7f11f257c0218ac1a37e0073005bb9734f2fd61d0f7e803fc151

data/README.md CHANGED

@@ -1,6 +1,8 @@
-# Yasuri [![Build Status](https://travis-ci.org/tac0x2a/yasuri.svg?branch=master)](https://travis-ci.org/tac0x2a/yasuri) [![Coverage Status](https://coveralls.io/repos/tac0x2a/yasuri/badge.svg?branch=master)](https://coveralls.io/r/tac0x2a/yasuri?branch=master) [![Code Climate](https://codeclimate.com/github/tac0x2a/yasuri/badges/gpa.svg)](https://codeclimate.com/github/tac0x2a/yasuri)
+# Yasuri
+[![Build Status](https://github.com/tac0x2a/yasuri/actions/workflows/ruby.yml/badge.svg)](https://github.com/tac0x2a/yasuri/actions/workflows/ruby.yml)
+[![Coverage Status](https://coveralls.io/repos/tac0x2a/yasuri/badge.svg?branch=master)](https://coveralls.io/r/tac0x2a/yasuri?branch=master) [![Maintainability](https://api.codeclimate.com/v1/badges/c29480fea1305afe999f/maintainability)](https://codeclimate.com/github/tac0x2a/yasuri/maintainability)
-Yasuri (鑢) is an easy web-scraping library for supporting "[Mechanize](https://github.com/sparklemotion/mechanize)".
+Yasuri (鑢) is an easy web-scraping library for supporting "[Mechanize](https://github.com/sparklemotion/mechanize)", and CLI tool using it.
 Yasuri can reduce frequently processes in Scraping.
@@ -31,7 +33,10 @@ or
 ```ruby
 # for Ruby 1.9.3 or lower
-gem 'yasuri', '~> 1.9'
+gem 'yasuri', '~> 2.0', '>= 2.0.13'
+# for Ruby 3.0.0 or lower
+gem 'yasuri', '~> 3.1'
 ```
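The Gemfile lines in this hunk pin two version ranges. As a small sketch, the standard RubyGems classes `Gem::Requirement` and `Gem::Version` can show which releases each constraint admits. The `satisfied?` helper is a hypothetical convenience wrapper introduced here for illustration.

```ruby
require 'rubygems'

# Requirement strings copied from the Gemfile lines in the hunk above.
legacy  = Gem::Requirement.new('~> 2.0', '>= 2.0.13')
current = Gem::Requirement.new('~> 3.1')

# Hypothetical convenience wrapper around Gem::Requirement#satisfied_by?.
def satisfied?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts satisfied?(legacy, '2.0.12')  # below the >= 2.0.13 floor
puts satisfied?(legacy, '2.0.13')  # inside the 2.x range
puts satisfied?(legacy, '3.3.0')   # ~> 2.0 stops before 3.0
puts satisfied?(current, '3.3.0')  # inside the 3.x range
puts satisfied?(current, '4.0.0')  # ~> 3.1 stops before 4.0
```

Under these constraints the old 2.0.12 release is excluded by the first line, and 3.3.0 is only reachable through the `'~> 3.1'` line.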
@@ -44,6 +49,7 @@ Or install it yourself as:
     $ gem install yasuri
 ## Usage
+### Use as library
 ```ruby
 # Node tree constructing by DSL
@@ -52,32 +58,92 @@ root = Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
          text_content '//*[@id="contents"]/p[1]'
        end
+# Node tree constructing by YAML
+src = <<-EOYAML
+links_root:
+  path: "//*[@id='menu']/ul/li/a"
+  text_title: "//*[@id='contents']/h2"
+  text_content: "//*[@id='contents']/p[1]"
+EOYAML
+root = Yasuri.yaml2tree(src)
 # Node tree constructing by JSON
 src = <<-EOJSON
-   { "node"     : "links",
-     "name"     : "root",
-     "path"     : "//*[@id='menu']/ul/li/a",
-     "children" : [
-                    { "node" : "text",
-                      "name" : "title",
-                      "path" : "//*[@id='contents']/h2"
-                    },
-                    { "node" : "text",
-                      "name" : "content",
-                      "path" : "//*[@id='contents']/p[1]"
-                    }
-                  ]
-   }
+{
+  "links_root": {
+    "path": "//*[@id='menu']/ul/li/a",
+    "text_title": "//*[@id='contents']/h2",
+    "text_content": "//*[@id='contents']/p[1]"
+  }
+}
 EOJSON
 root = Yasuri.json2tree(src)
-agent = Mechanize.new
-root_page = agent.get("http://some.scraping.page.net/")
+# Execution and getting scraped result
+result = root.scrape("http://some.scraping.page.tac42.net/")
+# => [
+#      {"title" => "PageTitle 01", "content" => "Page Contents  01" },
+#      {"title" => "PageTitle 02", "content" => "Page Contents  02" },
+#      ...
+#      {"title" => "PageTitle N",  "content" => "Page Contents  N" }
+#    ]
+```
+### Use as CLI
+```sh
+# After gem installation..
+$ yasuri help scrape
+Usage:
+  yasuri scrape <URI> [[--file <TREE_FILE>] or [--json <JSON>]]
+Options:
+  f, [--file=FILE]   # path to file that written yasuri tree as json or yaml
+  j, [--json=JSON]   # yasuri tree format json string
+  i, [--interval=N]  # interval each request [ms]
-result = root.inject(agent, root_page)
-# => [ {"title" => "PageTitle", "content" => "Page Contents" }, ...  ]
+Getting from <URI> and scrape it. with <JSON> or json/yml from <TREE_FILE>. They should be Yasuri's format json or yaml string.
 ```
+Example
+```sh
+$ yasuri scrape "https://www.ruby-lang.org/en/" -j '
+{
+  "text_title": "/html/head/title",
+  "text_desc": "//*[@id=\"intro\"]/p"
+}'
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+```
+## Dev
+```sh
+$ gem install bundler
+$ bundle install
+```
+### Test
+```sh
+$ rake
+# or
+$ rspec spec/*spec.rb
+```
+### Test gem in local
+```sh
+$ gem build yasuri.gemspec
+$ gem install yasuri-*.gem
+```
+### Release RubyGems
+```sh
+# Only first time
+$ curl -u <user_name> https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials
+$ chmod 0600 ~/.gem/credentials
+$ nano lib/yasuri/version.rb # edit gem version
+$ rake release
+```
 ## Contributing
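The README diff above shows the JSON tree format changing from explicit `node`/`name`/`path`/`children` fields to keys of the form `<type>_<name>`. A rough sketch of how an existing 2.x tree definition might be rewritten into the 3.x shape follows. `convert_legacy_node` is a hypothetical helper, not something Yasuri provides, and it ignores node options such as `truncate`.

```ruby
require 'json'

# Hypothetical converter (not part of Yasuri): rewrites the 2.x
# {"node", "name", "path", "children"} layout into the 3.x layout,
# where each key is "<node>_<name>".
def convert_legacy_node(node)
  key      = "#{node['node']}_#{node['name']}"
  children = node.fetch('children', [])

  # A leaf collapses to the shorthand form: key => path.
  return { key => node['path'] } if children.empty?

  body = { 'path' => node['path'] }
  children.each { |child| body.merge!(convert_legacy_node(child)) }
  { key => body }
end

legacy = JSON.parse(<<~EOJSON)
  { "node": "links", "name": "root", "path": "//*[@id='menu']/ul/li/a",
    "children": [
      { "node": "text", "name": "title",   "path": "//*[@id='contents']/h2" },
      { "node": "text", "name": "content", "path": "//*[@id='contents']/p[1]" }
    ] }
EOJSON

puts JSON.pretty_generate(convert_legacy_node(legacy))
```

Fed the removed README example, this yields the same structure as the added `links_root` example.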

data/USAGE.ja.md CHANGED

@@ -1,24 +1,32 @@
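-# How to use Yasuri
+# Yasuri
 ## What is Yasuri
-Yasuri (鑢) is a library that supports "[Mechanize](https://github.com/sparklemotion/mechanize)" for easy web scraping.
+Yasuri (鑢) is a library for declarative web scraping, and a command-line scraping tool built on it.
+You get the scraped result just by describing the expected result in a simple declarative notation.
 Yasuri lets you write common scraping tasks concisely.
-For example,
+For example, the following tasks are easy to implement.
-+ Open multiple links in a page and get the result of scraping each page as a Hash
 + Scrape multiple texts in a page, name them, and collect them into a Hash
++ Open multiple links in a page and get the result of scraping each page as a Hash
 + Scrape each table that appears repeatedly in a page and get them as an array
-+ Scrape only the top 3 of the pages provided by pagination, in order
-These can be implemented easily.
++ Scrape only the first 3 of the pages provided by pagination
 ## Quick Start
+#### Installation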
+```sh
+# for Ruby 2.3.2
+$ gem 'yasuri', '~> 2.0', '>= 2.0.13'
 ```
+or
+```sh
+# for Ruby 3.0.0 or upper
 $ gem install yasuri
 ```
+#### Use as library
 ```ruby
 require 'yasuri'
 require 'machinize'
@@ -29,83 +37,190 @@ root = Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
          text_content '//*[@id="contents"]/p[1]'
        end
-agent = Mechanize.new
-root_page = agent.get("http://some.scraping.page.net/")
-result = root.inject(agent, root_page)
-# => [ {"title" => "PageTitle1", "content" => "Page Contents1" },
-#      {"title" => "PageTitle2", "content" => "Page Contents2" }, ...  ]
+result = root.scrape("http://some.scraping.page.tac42.net/")
+# => [
+#      {"title" => "PageTitle 01", "content" => "Page Contents  01" },
+#      {"title" => "PageTitle 02", "content" => "Page Contents  02" },
+#      ...
+#      {"title" => "PageTitle N",  "content" => "Page Contents  N" }
+#    ]
 ```
 This example scrapes the two texts specified by the xpaths of the TextNodes (`text_title`, `text_content`) from each page linked by the xpath of the LinkNode (`links_root`).
 (In other words, it opens each link matched by `//*[@id="menu"]/ul/li/a` and scrapes the texts specified by `//*[@id="contents"]/h2` and `//*[@id="contents"]/p[1]`.)
-## Basics
-1. Build a parse tree
-2. Pass a Mechanize agent and the target page to start parsing
+#### Use as CLI tool
+The same thing can be run as a CLI command.
+```sh
+$ yasuri scrape "http://some.scraping.page.tac42.net/" -j '
+{
+  "links_root": {
+    "path": "//*[@id=\"menu\"]/ul/li/a",
+    "text_title": "//*[@id=\"contents\"]/h2",
+    "text_content": "//*[@id=\"contents\"]/p[1]"
+    }
+}'
+[
+  {"title":"PageTitle 01","content":"Page Contents  01"},
+  {"title":"PageTitle 02","content":"Page Contents  02"},
+  ...,
+  {"title":"PageTitle N","content":"Page Contents  N"}
+]
+```
+The result is returned as a JSON string.
-### Build a parse tree
+----------------------------
+## Parse tree
-```ruby
-require 'mechanize'
-require 'yasuri'
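+A parse tree is tree-structured data that declaratively defines the elements to scrape and the output structure.
+A parse tree consists of nested Nodes. A Node has the attributes `Type`, `Name`, `Path`, `Childlen`, and `Options`, and scrapes according to its `Type`. (Only `MapNode` has no `Path`.)
-# 1. Build a parse tree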
-tree = Yasuri.links_title '/html/body/a' do
-         text_name '/html/body/p'
-       end
+A parse tree is defined in the following format.
-# 2. Pass a Mechanize agent and the target page to start parsing
-agent = Mechanize.new
-page = agent.get(uri)
+```ruby
+# A simple tree consisting of one node
+Yasuri.<Type>_<Name> <Path> [,<Options>]
+# A nested tree
+Yasuri.<Type>_<Name> <Path> [,<Options>] do
+  <Type>_<Name> <Path> [,<Options>] do
+    <Type>_<Name> <Path> [,<Options>]
+    ...
+  end
+end
+```
+**Example**
+```ruby
+# A simple tree consisting of one node
+Yasuri.text_title '/html/head/title', truncate:/^[^,]+/
-tree.inject(agent, page)
+# A nested tree
+Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
+  struct_table './tr' do
+    text_title    './td[1]'
+    text_pub_date './td[2]'
+  end
+end
 ```
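-The tree can be defined with the DSL or JSON. The example above uses the DSL.
-The following example defines an equivalent parse tree in JSON.
+A parse tree can be defined in the Ruby DSL, JSON, or YAML.
+The following examples define the same parse tree as above in each notation.
+**Defining with the Ruby DSL**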
 ```ruby
-# Building from JSON
-src = <<-EOJSON
-   { "node"     : "links",
-     "name"     : "title",
-     "path"     : "/html/body/a",
-     "children" : [
-                    { "node" : "text",
-                      "name" : "name",
-                      "path" : "/html/body/p"
-                    }
-                  ]
-   }
-EOJSON
-tree = Yasuri.json2tree(src)
+Yasuri.links_title '/html/body/a' do
+  text_name '/html/body/p'
+end
 ```
+**Defining with JSON**
+```json
+{
+  links_title": {
+    "path": "/html/body/a",
+    "text_name": "/html/body/p"
+  }
+}
+```
-### Node
-The tree consists of nested *Node*s.
-A Node has `Type`, `Name`, `Path`, `Childlen`, and `Options`.
+**Defining with YAML**
+```yaml
+links_title:
+  path: "/html/body/a"
+  text_name: "/html/body/p"
+```
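-A Node is defined in the following format.
+**Special cases of the parse tree**
+If there is only one element directly under the root, that element is returned directly instead of a Hash (Object).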
+```json
+{
+  "text_title": "/html/head/title",
+  "text_body": "/html/body",
+}
+# => {"title": "Welcome to yasuri!", "body": "Yasuri is ..."}
+{
+  "text_title": "/html/head/title"}
+}
+# => Welcome to yasuri!
+```
+In JSON or YAML, a node without child Nodes can take its `path` directly as the value. The following two JSON documents produce the same parse tree.
+```json
+{
+  "text_name": "/html/body/p"
+}
+{
+  "text_name": {
+    "path": "/html/body/p"
+  }
+}
+```
+### Running the tree
+Call the `Node#scrape(uri, opt={})` method on the root node of the parse tree.
+**Example**
 ```ruby
-# Top level
-Yasuri.<Type>_<Name> <Path> [,<Options>]
+root = Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
+         text_title '//*[@id="contents"]/h2'
+         text_content '//*[@id="contents"]/p[1]'
+       end
-# Nested case
-Yasuri.<Type>_<Name> <Path> [,<Options>] do
-  <Type>_<Name> <Path> [,<Options>] do
-    <Children>
-  end
-end
+result = root.scrape("http://some.scraping.page.tac42.net/", interval_ms: 1000)
 ```
++ `uri` is the URI of the page to scrape.
++ `opt` specifies options as a Hash. The following options are available.
+Yasuri uses `Mechanize` internally as the agent that performs the scraping.
+To supply your own instance, call `Node#scrape_with_agent(uri, agent, opt={})`.
+```ruby
+require 'logger'
+agent = Mechanize.new
+agent.log = Logger.new $stderr
+agent.request_headers = {
+  # ...
+}
+result = root.scrape_with_agent(
+  "http://some.scraping.page.tac42.net/",
+  agent,
+  interval_ms: 1000)
+```
+### `opt`
+#### `interval_ms`
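+The interval between requests to multiple pages [milliseconds].
+If omitted, requests are sent back to back with no interval. When many page requests are expected, specifying an interval is strongly recommended so that the target host is not overloaded.
+#### `retry_count`
+The number of retries when fetching a page fails. If omitted, it retries 5 times.
+#### `symbolize_names`
+When `true`, the keys of the result set are returned as symbols.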
+--------------------------
+## Node
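+A Node is an element that forms an internal node or a leaf of the parse tree. It has `Type`, `Name`, `Path`, `Childlen`, and `Options`, and scrapes according to its `Type`. (Only `MapNode` has no `Path`.)
 #### Type
 *Type* indicates the behavior of the Node. The following Types are available.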
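The `opt` entries added earlier in this hunk describe `retry_count` (5 retries when omitted) and `interval_ms`. The sketch below illustrates one way such a retry loop could behave. It is an assumption-laden illustration, not Yasuri's implementation: `fetch_with_retry` is a hypothetical helper, and reusing `interval_ms` as the pause before a retry is a choice made only for this example.

```ruby
# Hypothetical helper illustrating the documented `retry_count` and
# `interval_ms` options; Yasuri's own implementation may differ.
def fetch_with_retry(retry_count: 5, interval_ms: 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    raise if attempts > retry_count   # initial try + retry_count retries
    sleep(interval_ms / 1000.0)       # pause before the next attempt
    retry
  end
end

# Simulated fetch that fails twice before succeeding.
page = fetch_with_retry(retry_count: 5, interval_ms: 10) do |attempt|
  raise IOError, 'temporary failure' if attempt < 3

  "page fetched on attempt #{attempt}"
end
puts page
```

When every attempt fails, the last error is re-raised after the initial try plus `retry_count` retries.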
@@ -113,18 +228,21 @@ end
 - *Struct*
 - *Links*
 - *Paginate*
+- *Map*
+See the description of each node for details.
-### Name
+#### Name
 *Name* is the key in the Hash of the parse result.
-### Path
+#### Path
 *Path* specifies a particular node in the HTML by xpath or CSS selector.
 It is used by Mechanize's `search`.
-### Childlen
+#### Childlen
 The child nodes of a nested node. A TextNode is a leaf of the tree, so it has no child nodes.
-### Options
+#### Options
 Parse options. The options differ for each Type.
 Calling the `opt` method on a node returns its available options.
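The Type and Name attributes in this hunk correspond to the `<Type>_<Name>` keys used by the JSON and YAML tree formats, including the shorthand where a childless node's value is just its path. The following sketch shows how such keys could be split and normalized. `normalize_tree` and the `NODE_TYPES` list (taken from the DSL prefixes that appear in this document) are assumptions for illustration, not Yasuri's parser.

```ruby
# Hypothetical normalizer (not Yasuri's parser): splits "<type>_<name>"
# keys and expands the shorthand where the value is just the path.
NODE_TYPES = %w[text struct links pages map].freeze

def normalize_tree(tree)
  tree.map do |key, value|
    type, name = key.split('_', 2)
    raise ArgumentError, "unknown node type: #{key}" unless NODE_TYPES.include?(type)

    # Shorthand: "text_name" => "/path" means { "path" => "/path" }.
    spec = value.is_a?(Hash) ? value : { 'path' => value }
    children, attrs = spec.partition { |k, _| NODE_TYPES.include?(k.split('_', 2).first) }

    { 'type' => type, 'name' => name, 'path' => attrs.to_h['path'],
      'children' => normalize_tree(children.to_h) }
  end
end

short = normalize_tree('text_name' => '/html/body/p')
long  = normalize_tree('text_name' => { 'path' => '/html/body/p' })
puts short == long
```

Splitting on the first underscore only keeps names such as `pub_date` intact, so `text_pub_date` yields type `text` and name `pub_date`.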
@@ -140,7 +258,7 @@ node.opt #=> {:truncate => /^[^,]+/, :proc => nil}
 ### Example
 ```html
-<!-- http://yasuri.example.net -->
+<!-- http://yasuri.example.tac42.net -->
 <html>
   <head></head>
   <body>
@@ -151,25 +269,24 @@ node.opt #=> {:truncate => /^[^,]+/, :proc => nil}
 ```
 ```ruby
-agent = Mechanize.new
-page = agent.get("http://yasuri.example.net")
 p1  = Yasuri.text_title '/html/body/p[1]'
 p1t = Yasuri.text_title '/html/body/p[1]', truncate:/^[^,]+/
-p2u = Yasuri.text_title '/html/body/p[2]', proc: :upcase
+p2u = Yasuri.text_title '/html/body/p[1]', proc: :upcase
-p1.inject(agent, page)   #=> { "title" => "Hello,World" }
-p1t.inject(agent, page)  #=> { "title" => "Hello" }
-node.inject(agent, page) #=> { "title" => "HELLO,YASURI" }
+p1.scrape("http://yasuri.example.tac42.net")   #=> "Hello,World"
+p1t.scrape("http://yasuri.example.tac42.net")  #=> "Hello"
+p2u.scrape("http://yasuri.example.tac42.net")  #=> "HELLO,WORLD"
 ```
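+To scrape multiple elements in the same page at once, use `MapNode`. See the `MapNode` example for details.
 ### Options
 ##### `truncate`
 Extracts the string that matches the regular expression. If a group is specified, only the first matched group is returned.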
 ```ruby
 node  = Yasuri.text_example '/html/body/p[1]', truncate:/H(.+)i/
-node.inject(agent, index_page)
+node.scrape(uri)
 #=> { "example" => "ello,Yasur" }
 ```
@@ -180,7 +297,7 @@ node.inject(agent, index_page)
 ```ruby
 node = Yasuri.text_example '/html/body/p[1]', proc: :upcase, truncate:/H(.+)i/
-node.inject(agent, index_page)
+node.scrape(uri)
 #=> { "example" => "ELLO,YASUR" }
 ```
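The `truncate` and `proc` options shown in these hunks can be mimicked on plain strings. This is a minimal sketch under stated assumptions: `apply_text_options` is a hypothetical stand-in for what a TextNode does, it applies `truncate` before `proc` (the order that reproduces the documented `"ELLO,YASUR"` result), and returning an empty string when nothing matches is a guess.

```ruby
# Hypothetical re-creation of the documented TextNode options;
# not Yasuri's actual implementation.
def apply_text_options(text, truncate: nil, proc: nil)
  if truncate
    match = truncate.match(text)
    # First capture group if the pattern has one, else the whole match.
    # An empty string on no match is an assumption of this sketch.
    text = match ? (match[1] || match[0]) : ''
  end
  text = text.public_send(proc) if proc
  text
end

puts apply_text_options('Hello,World',  truncate: /^[^,]+/)
puts apply_text_options('Hello,Yasuri', truncate: /H(.+)i/)
puts apply_text_options('Hello,Yasuri', truncate: /H(.+)i/, proc: :upcase)
```

Applying `:upcase` first would leave no lowercase `i` for `/H(.+)i/` to match, which is why the sketch truncates before calling the method named by `proc`.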
@@ -195,7 +312,7 @@ Struct Node の `Path` が複数のタグにマッチする場合、配列とし
 ### Example
 ```html
-<!-- http://yasuri.example.net -->
+<!-- http://yasuri.example.tac42.net -->
 <html>
   <head>
     <title>Books</title>
@@ -236,15 +353,12 @@ Struct Node の `Path` が複数のタグにマッチする場合、配列とし
 ```
 ```ruby
-agent = Mechanize.new
-page = agent.get("http://yasuri.example.net")
 node = Yasuri.struct_table '/html/body/table[1]/tr' do
   text_title    './td[1]'
   text_pub_date './td[2]'
-])
+end
-node.inject(agent, page)
+node.scrape("http://yasuri.example.tac42.net")
 #=> [ { "title"    => "The Perfect Insider",
 #       "pub_date" => "1996/4/5" },
 #     { "title"    => "Doctors in Isolated Room",
@@ -258,23 +372,19 @@ Struct Node は xpath `'/html/body/table[1]/tr'` によって、最初の `<tabl
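 In this case, the first `<table>` has three `<tr>` tags, so three Hashes are returned. (Note that it is not four, because `<thead><tr>` does not match the `Path`.)
 Each Hash contains the text parsed by the TextNodes.
 As in the following example, a Struct Node can also have nodes other than TextNode as children.
 ### Example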
 ```ruby
-agent = Mechanize.new
-page = agent.get("http://yasuri.example.net")
 node = Yasuri.strucre_tables '/html/body/table' do
   struct_table './tr' do
     text_title    './td[1]'
     text_pub_date './td[2]'
   end
-])
+end
-node.inject(agent, page)
+node.scrape("http://yasuri.example.tac42.net")
 #=>      [ { "table" => [ { "title"    => "The Perfect Insider",
 #                           "pub_date" => "1996/4/5" },
@@ -306,8 +416,8 @@ node.inject(agent, page)
 Links Node parses each linked page and returns the results.
 ### Example
-```
-<!-- http://yasuri.example.net -->
+```html
+<!-- http://yasuri.example.tac42.net -->
 <html>
   <head><title>Yasuri Test</title></head>
   <body>
@@ -319,8 +429,8 @@ Links Node は リンクされた各ページをパースして結果を返し
 <title>
 ```
-```
-<!-- http://yasuri.example.net/child01.html -->
+```html
+<!-- http://yasuri.example.tac42.net/child01.html -->
 <html>
   <head><title>Child 01 Test</title></head>
   <body>
@@ -333,8 +443,8 @@ Links Node は リンクされた各ページをパースして結果を返し
 <title>
 ```
-```
-<!-- http://yasuri.example.net/child02.html -->
+```html
+<!-- http://yasuri.example.tac42.net/child02.html -->
 <html>
   <head><title>Child 02 Test</title></head>
   <body>
@@ -343,8 +453,8 @@ Links Node は リンクされた各ページをパースして結果を返し
 <title>
 ```
-```
-<!-- http://yasuri.example.net/child03.html -->
+```html
+<!-- http://yasuri.example.tac42.net/child03.html -->
 <html>
   <head><title>Child 03 Test</title></head>
   <body>
@@ -356,22 +466,19 @@ Links Node は リンクされた各ページをパースして結果を返し
 <title>
 ```
-```
-agent = Mechanize.new
-page = agent.get("http://yasuri.example.net")
+```ruby
 node = Yasuri.links_title '/html/body/a' do
   text_content '/html/body/p'
 end
-node.inject(agent, page)
+node.scrape("http://yasuri.example.tac42.net")
 #=> [ {"content" => "Child 01 page."},
       {"content" => "Child 02 page."},
       {"content" => "Child 03 page."}]
 ```
 First, LinksNode finds all links matching `Path` in the first page.
-In this example, LinksNode finds all tags matching `/html/body/a` in `http://yasuri.example.net`.
+In this example, LinksNode finds all tags matching `/html/body/a` in `http://yasuri.example.tac42.net`.
 Next, it opens the pages specified by the href attribute of the found tags. (`./child01.html`, `./child02.html`, `./child03.html`)
 Each opened page is parsed by the child nodes. LinksNode returns the parse results for each page as an array of Hashes.
@@ -384,7 +491,7 @@ PaginateNodeは ページネーション(パジネーション, Pagination) で
 `page02.html` through `page04.html` are similar.
 ```html
-<!-- http://yasuri.example.net/page01.html -->
+<!-- http://yasuri.example.tac42.net/page01.html -->
 <html>
   <head><title>Page01</title></head>
   <body>
@@ -404,17 +511,14 @@ PaginateNodeは ページネーション(パジネーション, Pagination) で
 ```
 ```ruby
-agent = Mechanize.new
-page = agent.get("http://yasuri.example.net/page01.html")
 node = Yasuri.pages_root "/html/body/nav/span/a[@class='next']" , limit:3 do
          text_content '/html/body/p'
        end
-node.inject(agent, page)
+node.scrape("http://yasuri.example.tac42.net/page01.html")
 #=> [ {"content" => "Patination01"},
-      {"content" => "Patination02"},
-      {"content" => "Patination03"}]
+#     {"content" => "Patination02"},
+#     {"content" => "Patination03"}]
 ```
 PaginateNode requires the link pointing to the next page to be specified as `Path`.
 In this example, `NextPage` (`/html/body/nav/span/a[@class='next']`) is the link pointing to the next page.
@@ -427,7 +531,7 @@ PaginateNodeは 次のページ を指すリンクを`Path`として指定する
 node = Yasuri.pages_root "/html/body/nav/span/a[@class='next']" , limit:2 do
          text_content '/html/body/p'
        end
-node.inject(agent, page)
+node.scrape(uri)
 #=> [ {"content" => "Pagination01"}, {"content" => "Pagination02"}]
 ```
 In this case, PaginateNode opens and parses at most 2 pages. The pagination appears to have 4 pages, but because `limit:2` is specified, the result array contains only 2 results.
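The `limit` behaviour described here can be modelled without any network access. The sketch below is illustrative only: `PAGES` is an in-memory stand-in for the four example pages, and `paginate` is a hypothetical walker rather than Yasuri's PaginateNode.

```ruby
# In-memory stand-in for the four example pages; :next plays the role
# of the link matched by the PaginateNode's Path.
PAGES = {
  'page01.html' => { content: 'Pagination01', next: 'page02.html' },
  'page02.html' => { content: 'Pagination02', next: 'page03.html' },
  'page03.html' => { content: 'Pagination03', next: 'page04.html' },
  'page04.html' => { content: 'Pagination04', next: nil }
}.freeze

# Hypothetical walker: follow "next" links, stopping after `limit` pages
# or when there is no further link.
def paginate(start, limit: nil)
  results = []
  current = start
  while current && (limit.nil? || results.size < limit)
    page = PAGES.fetch(current)
    results << { 'content' => page[:content] }
    current = page[:next]
  end
  results
end

p paginate('page01.html', limit: 2)
```

Without `limit`, the same walk continues until a page has no next link, which covers all four pages in this stand-in data.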
@@ -436,33 +540,177 @@ node.inject(agent, page)
 Flattens the results of each fetched page.
 ```ruby
-agent = Mechanize.new
-page = agent.get("http://yasuri.example.net/page01.html")
 node = Yasuri.pages_root "/html/body/nav/span/a[@class='next']" , flatten:true do
          text_title   '/html/head/title'
          text_content '/html/body/p'
        end
-node.inject(agent, page)
+node.scrape("http://yasuri.example.tac42.net/page01.html")
 #=> [ {"title" => "Page01",
-       "content" => "Patination01"},
-      {"title"   => "Page01",
-       "content" => "Patination02"},
-      {"title"   => "Page01",
-       "content" => "Patination03"}]
+#      "content" => "Patination01"},
+#     {"title"   => "Page01",
+#      "content" => "Patination02"},
+#     {"title"   => "Page01",
+#      "content" => "Patination03"}]
 node = Yasuri.pages_root "/html/body/nav/span/a[@class='next']" , flatten:true do
         text_title   '/html/head/title'
         text_content '/html/body/p'
       end
-node.inject(agent, page)
+node.scrape("http://yasuri.example.tac42.net/page01.html")
 #=> [ "Page01",
-      "Patination01",
-      "Page02",
-      "Patination02",
-      "Page03",
-      "Patination03"]
+#     "Patination01",
+#     "Page02",
+#     "Patination02",
+#     "Page03",
+#     "Patination03"]
+```
+## Map Node
+*MapNode* is a node that groups scraped results. This node is always an internal node in the parse tree.
+### Example
+```html
+<!-- http://yasuri.example.tac42.net -->
+<html>
+  <head><title>Yasuri Example</title></head>
+  <body>
+    <p>Hello,World</p>
+    <p>Hello,Yasuri</p>
+  </body>
+</html>
+```
+```ruby
+tree = Yasuri.map_root do
+  text_title  '/html/head/title'
+  text_body_p '/html/body/p[1]'
+end
+tree.scrape("http://yasuri.example.tac42.net") #=> { "title" => "Yasuri Example", "body_p" => "Hello,World" }
+tree = Yasuri.map_root do
+  map_group1 { text_child01  '/html/body/a[1]' }
+  map_group2 do
+    text_child01 '/html/body/a[1]'
+    text_child03 '/html/body/a[3]'
+  end
+end
+tree.scrape("http://yasuri.example.tac42.net") #=> {
+#   "group1" => {
+#           "child01" => "child01"
+#         },
+#         "group2" => {
+#           "child01" => "child01",
+#           "child03" => "child03"
+#         }
+# }
 ```
+### Options
+None
+-------------------------
+## Usage
+### Use as library
+When used as a library, the tree can be defined in DSL, JSON, or YAML form.
+```ruby
+require 'yasuri'
+# 1. Build a parse tree
+# Define with the DSL
+tree = Yasuri.links_title '/html/body/a' do
+         text_name '/html/body/p'
+       end
+# Define with JSON
+src = <<-EOJSON
+{
+  links_title": {
+    "path": "/html/body/a",
+    "text_name": "/html/body/p"
+  }
+}
+EOJSON
+tree = Yasuri.json2tree(src)
+# Define with YAML
+src = <<-EOYAML
+links_title:
+  path: "/html/body/a"
+  text_name: "/html/body/p"
+EOYAML
+tree = Yasuri.yaml2tree(src)
+# 2. Pass a URL to start parsing
+tree.inject(uri)
+```
+### Use as CLI tool
+**Show help**
+```sh
+$ yasuri help scrape
+Usage:
+  yasuri scrape <URI> [[--file <TREE_FILE>] or [--json <JSON>]]
+Options:
+  f, [--file=FILE]  # path to file that written yasuri tree as json or yaml
+  j, [--json=JSON]  # yasuri tree format json string
+  i, [--interval=N]  # interval each request [ms]
+Getting from <URI> and scrape it. with <JSON> or json/yml from <TREE_FILE>. They should be Yasuri's format json or yaml string.
+```
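+With the CLI tool, specify the parse tree in one of the following ways.
++ `--file`, `-f` : read a parse tree written to a file in JSON or YAML format
++ `--json`, `-j` : specify the parse tree directly as a string
+**Example: specifying the parse tree with a file**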
+```sh
+% cat sample.yml
+text_title: "/html/head/title"
+text_desc: "//*[@id=\"intro\"]/p"
+% yasuri scrape "https://www.ruby-lang.org/en/" --file sample.yml
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+% cat sample.json
+{
+  "text_title": "/html/head/title",
+  "text_desc": "//*[@id=\"intro\"]/p"
+}
+% yasuri scrape "https://www.ruby-lang.org/en/" --file sample.json
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+```
+Whether the file is written in JSON or YAML is detected automatically.
+**Example: specifying the parse tree directly as JSON**
+```sh
+$ yasuri scrape "https://www.ruby-lang.org/en/" -j '
+{
+  "text_title": "/html/head/title",
+  "text_desc": "//*[@id=\"intro\"]/p"
+}'
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+```
+#### Other options
++ `--interval`, `-i` : the interval between requests to multiple pages [milliseconds].
+   **Example: request at 1-second intervals**
+   ```sh
+   $ yasuri scrape "https://www.ruby-lang.org/en/" --file sample.yml --interval 1000
+   ```
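The usage text above states that the CLI detects automatically whether a tree file is JSON or YAML, but the detection logic itself is not part of this excerpt. One plausible approach, sketched here purely as an assumption, is to try JSON first and fall back to YAML. `load_tree_source` is a hypothetical name, and the inputs mirror the `sample.yml` and `sample.json` contents shown in the diff.

```ruby
require 'json'
require 'yaml'

# Hypothetical loader (the gem's own detection logic is not shown in
# this diff): try JSON first, then fall back to YAML.
def load_tree_source(text)
  JSON.parse(text)
rescue JSON::ParserError
  YAML.safe_load(text)
end

yaml_src = <<~'EOYAML'
  text_title: "/html/head/title"
  text_desc: "//*[@id=\"intro\"]/p"
EOYAML

json_src = <<~'EOJSON'
  {
    "text_title": "/html/head/title",
    "text_desc": "//*[@id=\"intro\"]/p"
  }
EOJSON

# Both sources should describe the same tree.
puts load_tree_source(yaml_src) == load_tree_source(json_src)
```

Trying JSON first keeps the error path simple: anything that is not valid JSON is handed to the YAML parser, which is also where a genuinely malformed file would finally raise.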