RubyGems - yasuri - Versions diffs - 2.0.11 → 3.2.0 - Mend

yasuri 2.0.11 → 3.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (35)

checksums.yaml +5 -5
data/.github/workflows/ruby.yml +35 -0
data/.gitignore +1 -2
data/.ruby-version +1 -0
data/.travis.yml +1 -3
data/README.md +88 -19
data/USAGE.ja.md +325 -63
data/USAGE.md +335 -69
data/exe/yasuri +5 -0
data/lib/yasuri.rb +1 -0
data/lib/yasuri/version.rb +1 -1
data/lib/yasuri/yasuri.rb +80 -39
data/lib/yasuri/yasuri_cli.rb +64 -0
data/lib/yasuri/yasuri_links_node.rb +10 -6
data/lib/yasuri/yasuri_map_node.rb +39 -0
data/lib/yasuri/yasuri_node.rb +24 -3
data/lib/yasuri/yasuri_node_generator.rb +16 -11
data/lib/yasuri/yasuri_paginate_node.rb +18 -6
data/lib/yasuri/yasuri_struct_node.rb +8 -4
data/lib/yasuri/yasuri_text_node.rb +11 -4
data/spec/cli_resources/tree.json +8 -0
data/spec/cli_resources/tree.yml +5 -0
data/spec/cli_resources/tree_wrong.json +9 -0
data/spec/cli_resources/tree_wrong.yml +6 -0
data/spec/htdocs/struct/structual_links.html +30 -0
data/spec/htdocs/{structual_text.html → struct/structual_text.html} +0 -0
data/spec/spec_helper.rb +1 -6
data/spec/yasuri_cli_spec.rb +83 -0
data/spec/yasuri_links_node_spec.rb +12 -4
data/spec/yasuri_map_spec.rb +76 -0
data/spec/yasuri_paginate_node_spec.rb +43 -0
data/spec/yasuri_spec.rb +199 -84
data/spec/yasuri_struct_node_spec.rb +42 -1
data/yasuri.gemspec +5 -3
metadata +52 -19

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: d8d6bd8c37be444f0c5568bcf20604d7bca5c223
-  data.tar.gz: 8438eee300a7e4f73be7107cbd9417da18f5048d
+SHA256:
+  metadata.gz: cd5fc7327c6d09b37771ac1c3ec40db2c052bf49ec9a1627e9ae49e047102856
+  data.tar.gz: a645f1e09ce72b73c54e2055af6fbf81bb145c8823e1d8428bb19c042bbb661d
 SHA512:
-  metadata.gz: 107ddc8cd0310c646841e6fe6a2695313edb9692418a783e133a5d269d4a1ab39385975276ae167ac68863b9760794ebb2738832dccfc4f599686c5a9e50f244
-  data.tar.gz: b6d089de8cd866f137ca58dd779396cd4948e080d3225cc4384f8f9cdb54f5a778cd4be85b89628938ccacbf11dfefe74ea8bd248e835971470e7a64df597411
+  metadata.gz: 654bd6cfe8012811283b1aa03e0dcc1200ce957ef4641eed2b5fa65956fb974070157b832e42f340d7299031756848c5118a7f43019ff94f088c49974e2304e8
+  data.tar.gz: 5ad07b82672ea2ceebfb8154bb91631c095e9ad8d69f3d62c0bf8d528c4c539fab2597f4112b4212bffe7ad641b30d913686e8e2bfea7dfdbdd9a4468311b6c0
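The hunk above switches the first digest set in `checksums.yaml` from SHA1 to SHA256, and keeps the SHA512 set. A `.gem` file is a tar archive whose members include `metadata.gz` and `data.tar.gz`, so these digests can be rechecked with Ruby's standard `digest` library. The sketch below assumes both members have already been extracted into the current directory (for example with `tar -xf yasuri-3.2.0.gem`). The helper name `checksum_ok?` is illustrative.

```ruby
require 'digest'

# SHA256 digests for yasuri 3.2.0, copied from checksums.yaml above.
EXPECTED_SHA256 = {
  'metadata.gz' => 'cd5fc7327c6d09b37771ac1c3ec40db2c052bf49ec9a1627e9ae49e047102856',
  'data.tar.gz' => 'a645f1e09ce72b73c54e2055af6fbf81bb145c8823e1d8428bb19c042bbb661d'
}.freeze

# Returns true when the file's SHA256 hex digest equals the expected value.
def checksum_ok?(path, expected)
  Digest::SHA256.file(path).hexdigest == expected
end

# Assumes metadata.gz and data.tar.gz were extracted from the .gem beforehand;
# members that are not present are skipped.
EXPECTED_SHA256.each do |name, digest|
  next unless File.exist?(name)
  puts "#{name}: #{checksum_ok?(name, digest) ? 'OK' : 'MISMATCH'}"
end
```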

data/.github/workflows/ruby.yml ADDED

@@ -0,0 +1,35 @@
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+# This workflow will download a prebuilt Ruby version, install dependencies and run tests with Rake
+# For more information see: https://github.com/marketplace/actions/setup-ruby-jruby-and-truffleruby
+name: Ruby
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        ruby-version: ['2.6', '2.7', '3.0']
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Ruby
+    # To automatically get bug fixes and new Ruby versions for ruby/setup-ruby,
+    # change this to (see https://github.com/ruby/setup-ruby#versioning):
+    # uses: ruby/setup-ruby@v1
+      uses: ruby/setup-ruby@473e4d8fe5dd94ee328fdfca9f8c9c7afc9dae5e
+      with:
+        ruby-version: ${{ matrix.ruby-version }}
+        bundler-cache: true # runs 'bundle install' and caches installed gems automatically
+    - name: Run tests
+      run: bundle exec rake

data/.gitignore CHANGED

@@ -66,5 +66,4 @@ tramp
 # cask packages
 .cask/
-.ruby-version
-Gemfile.lock
+Gemfile.lock

data/.ruby-version ADDED

@@ -0,0 +1 @@
+3.0.0

data/.travis.yml CHANGED

@@ -1,9 +1,7 @@
 language: ruby
-rvm:
-  - 2.2.0
 script:
   - ruby --version
   - rspec spec
 addons:
   code_climate:
-    repo_token: 0dc78d33107a7f11f257c0218ac1a37e0073005bb9734f2fd61d0f7e803fc151
+    repo_token: 0dc78d33107a7f11f257c0218ac1a37e0073005bb9734f2fd61d0f7e803fc151

data/README.md CHANGED

@@ -1,6 +1,8 @@
-# Yasuri [![Build Status](https://travis-ci.org/tac0x2a/yasuri.svg?branch=master)](https://travis-ci.org/tac0x2a/yasuri) [![Coverage Status](https://coveralls.io/repos/tac0x2a/yasuri/badge.svg?branch=master)](https://coveralls.io/r/tac0x2a/yasuri?branch=master) [![Code Climate](https://codeclimate.com/github/tac0x2a/yasuri/badges/gpa.svg)](https://codeclimate.com/github/tac0x2a/yasuri)
+# Yasuri
+[![Build Status](https://github.com/tac0x2a/yasuri/actions/workflows/ruby.yml/badge.svg)](https://github.com/tac0x2a/yasuri/actions/workflows/ruby.yml)
+[![Coverage Status](https://coveralls.io/repos/tac0x2a/yasuri/badge.svg?branch=master)](https://coveralls.io/r/tac0x2a/yasuri?branch=master) [![Maintainability](https://api.codeclimate.com/v1/badges/c29480fea1305afe999f/maintainability)](https://codeclimate.com/github/tac0x2a/yasuri/maintainability)
-Yasuri (鑢) is an easy web-scraping library for supporting "[Mechanize](https://github.com/sparklemotion/mechanize)".
+Yasuri (鑢) is an easy-to-use web-scraping library built on "[Mechanize](https://github.com/sparklemotion/mechanize)", together with a CLI tool that uses it.
 Yasuri reduces the code needed for common scraping tasks.
@@ -31,7 +33,10 @@ or
 ```ruby
 # for Ruby 1.9.3 or lower
-gem 'yasuri', '~> 1.9'
+gem 'yasuri', '~> 2.0', '>= 2.0.13'
+# for Ruby 3.0.0 or higher
+gem 'yasuri', '~> 3.1'
 ```
@@ -44,6 +49,7 @@ Or install it yourself as:
     $ gem install yasuri
 ## Usage
+### Use as a library
 ```ruby
 # Constructing the node tree with the DSL
@@ -52,32 +58,95 @@ root = Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
          text_content '//*[@id="contents"]/p[1]'
        end
+# Constructing the node tree with YAML
+src = <<-EOYAML
+links_root:
+  path: "//*[@id='menu']/ul/li/a"
+  text_title: "//*[@id='contents']/h2"
+  text_content: "//*[@id='contents']/p[1]"
+EOYAML
+root = Yasuri.yaml2tree(src)
 # Constructing the node tree with JSON
 src = <<-EOJSON
-   { "node"     : "links",
-     "name"     : "root",
-     "path"     : "//*[@id='menu']/ul/li/a",
-     "children" : [
-                    { "node" : "text",
-                      "name" : "title",
-                      "path" : "//*[@id='contents']/h2"
-                    },
-                    { "node" : "text",
-                      "name" : "content",
-                      "path" : "//*[@id='contents']/p[1]"
-                    }
-                  ]
-   }
+{
+  "links_root": {
+    "path": "//*[@id='menu']/ul/li/a",
+    "text_title": "//*[@id='contents']/h2",
+    "text_content": "//*[@id='contents']/p[1]"
+  }
+}
 EOJSON
 root = Yasuri.json2tree(src)
+# Execute the tree and get the scraped result
 agent = Mechanize.new
-root_page = agent.get("http://some.scraping.page.net/")
+root_page = agent.get("http://some.scraping.page.tac42.net/")
 result = root.inject(agent, root_page)
-# => [ {"title" => "PageTitle", "content" => "Page Contents" }, ...  ]
+# => [
+#      {"title" => "PageTitle 01", "content" => "Page Contents  01" },
+#      {"title" => "PageTitle 02", "content" => "Page Contents  02" },
+#      ...
+#      {"title" => "PageTitle N",  "content" => "Page Contents  N" }
+#    ]
+```
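The YAML and JSON sources above encode the same data: a `links_root` key holding a `path` and two `text_*` children in shorthand form. One quick check is to parse both sources with Ruby's standard `json` and `yaml` libraries and compare the resulting Hashes. This compares only the parsed data. It does not exercise `Yasuri.yaml2tree` or `Yasuri.json2tree` themselves.

```ruby
require 'json'
require 'yaml'

# The YAML and JSON tree sources from the README example above.
yaml_src = <<~EOYAML
  links_root:
    path: "//*[@id='menu']/ul/li/a"
    text_title: "//*[@id='contents']/h2"
    text_content: "//*[@id='contents']/p[1]"
EOYAML

json_src = <<~EOJSON
  {
    "links_root": {
      "path": "//*[@id='menu']/ul/li/a",
      "text_title": "//*[@id='contents']/h2",
      "text_content": "//*[@id='contents']/p[1]"
    }
  }
EOJSON

from_yaml = YAML.safe_load(yaml_src)
from_json = JSON.parse(json_src)

# Both parse to the same nested Hash with String keys.
puts from_yaml == from_json # prints true
```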
+### Use as a CLI tool
+```sh
+# After installing the gem:
+$ yasuri help scrape
+Usage:
+  yasuri scrape <URI> [[--file <TREE_FILE>] or [--json <JSON>]]
+Options:
+  f, [--file=FILE]  # path to file that written yasuri tree as json or yaml
+  j, [--json=JSON]  # yasuri tree format json string
+Getting from <URI> and scrape it. with <JSON> or json/yml from <TREE_FILE>. They should be Yasuri's format json or yaml string.
 ```
+Example
+```sh
+$ yasuri scrape "https://www.ruby-lang.org/en/" -j '
+{
+  "text_title": "/html/head/title",
+  "text_desc": "//*[@id=\"intro\"]/p"
+}'
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+```
+## Dev
+```sh
+$ gem install bundler
+$ bundle install
+```
+### Test
+```sh
+$ rake
+# or
+$ rspec spec/*spec.rb
+```
+### Test gem in local
+```sh
+$ gem build yasuri.gemspec
+$ gem install yasuri-*.gem
+```
+### Release RubyGems
+```sh
+# Only first time
+$ curl -u <user_name> https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials
+$ chmod 0600 ~/.gem/credentials
+$ nano lib/yasuri/version.rb # edit gem version
+$ rake release
+```
 ## Contributing

data/USAGE.ja.md CHANGED

@@ -1,24 +1,31 @@
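-# How to use Yasuri
+# Yasuri
 ## What is Yasuri
-Yasuri (鑢) is a library that supports "[Mechanize](https://github.com/sparklemotion/mechanize)" for easy web scraping.
+Yasuri (鑢) is a library for declarative web scraping, together with a command-line scraping tool built on it.
+You only describe the expected result in a simple declarative notation, and Yasuri runs the scraping with "[Mechanize](https://github.com/sparklemotion/mechanize)".
 Yasuri makes it easy to describe common scraping tasks.
-For example,
+For example, it makes the following tasks easy:
-+ Open multiple links in a page, scrape each linked page, and get the results as a Hash
 + Scrape multiple texts in a page, give them names, and collect them into a Hash
++ Open multiple links in a page, scrape each linked page, and get the results as a Hash
 + Scrape each occurrence of a table that repeats in a page, and get the results as an array
-+ Scrape only the top three of the pages provided by pagination, in order
-All of these can be implemented easily.
++ Scrape only the first three of the pages provided by pagination
 ## Quick Start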
+#### Installation
+```ruby
+# for Ruby 2.3.2 (in your Gemfile)
+gem 'yasuri', '~> 2.0', '>= 2.0.13'
 ```
+or
+```sh
+# for Ruby 3.0.0 or higher
 $ gem install yasuri
 ```
+#### Using it as a library
 ```ruby
 require 'yasuri'
 require 'mechanize'
@@ -30,82 +37,148 @@ root = Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
        end
 agent = Mechanize.new
-root_page = agent.get("http://some.scraping.page.net/")
+root_page = agent.get("http://some.scraping.page.tac42.net/")
 result = root.inject(agent, root_page)
-# => [ {"title" => "PageTitle1", "content" => "Page Contents1" },
-#      {"title" => "PageTitle2", "content" => "Page Contents2" }, ...  ]
+# => [
+#      {"title" => "PageTitle 01", "content" => "Page Contents  01" },
+#      {"title" => "PageTitle 02", "content" => "Page Contents  02" },
+#      ...
+#      {"title" => "PageTitle N",  "content" => "Page Contents  N" }
+#    ]
 ```
 In this example, Yasuri opens each page linked by the xpath of the LinkNode (`links_root`) and scrapes the two texts specified by the xpaths of the TextNodes (`text_title`, `text_content`).
 (In other words, it opens each link indicated by `//*[@id="menu"]/ul/li/a` and scrapes the texts specified by `//*[@id="contents"]/h2` and `//*[@id="contents"]/p[1]`.)
-## Basics
-1. Build a parse tree
-2. Start parsing by passing a Mechanize agent and the target page
+#### Using it as a CLI tool
+The same scraping can be run as a CLI command.
+```sh
+$ yasuri scrape "http://some.scraping.page.tac42.net/" -j '
+{
+  "links_root": {
+    "path": "//*[@id=\"menu\"]/ul/li/a",
+    "text_title": "//*[@id=\"contents\"]/h2",
+    "text_content": "//*[@id=\"contents\"]/p[1]"
+    }
+}'
+[
+  {"title":"PageTitle 01","content":"Page Contents  01"},
+  {"title":"PageTitle 02","content":"Page Contents  02"},
+  ...,
+  {"title":"PageTitle N","content":"Page Contents  N"}
+]
+```
+The result is returned as a JSON string.
-### Build a parse tree
+----------------------------
+## Parse Tree
-```ruby
-require 'mechanize'
-require 'yasuri'
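+A parse tree is tree-structured data that declaratively defines which elements to scrape and the structure of the output.
+A parse tree consists of nested Nodes. Each Node has the attributes `Type`, `Name`, `Path`, `Children`, and `Options`, and scrapes according to its `Type` (the only exception is `MapNode`, which has no `Path`).
-# 1. Build the parse tree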
-tree = Yasuri.links_title '/html/body/a' do
-         text_name '/html/body/p'
-       end
+A parse tree is defined in the following format.
-# 2. Start parsing by passing a Mechanize agent and the target page
-agent = Mechanize.new
-page = agent.get(uri)
+```ruby
+# A simple tree consisting of a single node
+Yasuri.<Type>_<Name> <Path> [,<Options>]
+# A nested tree
+Yasuri.<Type>_<Name> <Path> [,<Options>] do
+  <Type>_<Name> <Path> [,<Options>] do
+    <Type>_<Name> <Path> [,<Options>]
+    ...
+  end
+end
+```
-tree.inject(agent, page)
+**Example**
+```ruby
+# A simple tree consisting of a single node
+Yasuri.text_title '/html/head/title', truncate:/^[^,]+/
+# A nested tree
+Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
+  struct_table './tr' do
+    text_title    './td[1]'
+    text_pub_date './td[2]'
+  end
+end
 ```
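-A tree can be defined with the DSL or with JSON. The example above uses the DSL.
-The following example defines, in JSON, a parse tree equivalent to the one above.
+A parse tree can be defined with the Ruby DSL, JSON, or YAML.
+The following examples define the same parse tree in each notation.
+**Defined with the Ruby DSL**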
 ```ruby
-# Constructing with JSON
-src = <<-EOJSON
-   { "node"     : "links",
-     "name"     : "title",
-     "path"     : "/html/body/a",
-     "children" : [
-                    { "node" : "text",
-                      "name" : "name",
-                      "path" : "/html/body/p"
-                    }
-                  ]
-   }
-EOJSON
-tree = Yasuri.json2tree(src)
+Yasuri.links_title '/html/body/a' do
+  text_name '/html/body/p'
+end
 ```
+**Defined with JSON**
+```json
+{
+  "links_title": {
+    "path": "/html/body/a",
+    "text_name": "/html/body/p"
+  }
+}
+```
-### Node
-A tree consists of nested *Nodes*.
-A Node has `Type`, `Name`, `Path`, `Children`, and `Options`.
+**Defined with YAML**
+```yaml
+links_title:
+  path: "/html/body/a"
+  text_name: "/html/body/p"
+```
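-A Node is defined in the following format.
+**Special cases of the parse tree**
-```ruby
-# Top level
-Yasuri.<Type>_<Name> <Path> [,<Options>]
+If the root has only one direct child element, Yasuri returns that element's value directly instead of a Hash (Object).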
+```json
+{
+  "text_title": "/html/head/title",
+  "text_body": "/html/body"
+}
+# => {"title": "Welcome to yasuri!", "body": "Yasuri is ..."}
-# Nested
-Yasuri.<Type>_<Name> <Path> [,<Options>] do
-  <Type>_<Name> <Path> [,<Options>] do
-    <Children>
-  end
-end
+{
+  "text_title": "/html/head/title"
+}
+# => Welcome to yasuri!
+```
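The rule for a single direct child (a root with one direct child returns that child's value instead of a Hash) can be pictured as a final step applied to the root's result. The helper `unwrap_single` below is hypothetical. It mirrors the rule as described in this document and is not Yasuri's actual code.

```ruby
# Hypothetical helper illustrating the documented rule: when the root's result
# has exactly one entry, return that entry's value instead of the Hash.
def unwrap_single(result)
  return result unless result.is_a?(Hash) && result.size == 1
  result.values.first
end

two_children = { 'title' => 'Welcome to yasuri!', 'body' => 'Yasuri is ...' }
one_child    = { 'title' => 'Welcome to yasuri!' }

p unwrap_single(two_children) # the Hash is returned unchanged
p unwrap_single(one_child)    # => "Welcome to yasuri!"
```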
+In JSON or YAML, when a Node has no child Nodes, you can give its `path` directly as the value. The following two JSON documents produce the same parse tree.
+```json
+{
+  "text_name": "/html/body/p"
+}
+{
+  "text_name": {
+    "path": "/html/body/p"
+  }
+}
 ```
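The shorthand for childless Nodes (giving `path` directly as the value) can be read as a rewrite from the short form to the long form. The sketch below is a hypothetical normalizer, not Yasuri's parser. It assumes that node keys start with one of the DSL prefixes used in this document (`text_`, `struct_`, `links_`, `pages_`, `map_`), and it leaves every other key, such as `path` or option values, unchanged.

```ruby
require 'json'

# Assumption: node keys carry one of the type prefixes used in the DSL examples.
NODE_KEY = /\A(?:text|struct|links|pages|map)_/

# Hypothetical normalizer: expands `"text_name": "/path"` into
# `"text_name": { "path": "/path" }`, recursing into nested node definitions.
def expand_shorthand(tree)
  tree.each_with_object({}) do |(key, value), out|
    out[key] =
      if key.match?(NODE_KEY) && value.is_a?(String)
        { 'path' => value }          # shorthand: the value is the path
      elsif value.is_a?(Hash)
        expand_shorthand(value)      # recurse into nested node definitions
      else
        value                        # "path" and option values stay as-is
      end
  end
end

short = JSON.parse('{ "text_name": "/html/body/p" }')
long  = JSON.parse('{ "text_name": { "path": "/html/body/p" } }')
p expand_shorthand(short) == long # => true
```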
+--------------------------
+## Node
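+A Node is an element that forms a branch or a leaf of the parse tree. It has `Type`, `Name`, `Path`, `Children`, and `Options`, and scrapes according to its `Type` (the only exception is `MapNode`, which has no `Path`).
 #### Type
 *Type* indicates the behavior of the Node. The following Types are available: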
@@ -113,18 +186,21 @@ end
 - *Struct*
 - *Links*
 - *Paginate*
+- *Map*
+See the description of each node for details.
-### Name
+#### Name
 *Name* becomes the key in the resulting Hash.
-### Path
+#### Path
 *Path* specifies particular nodes in the HTML with an xpath or a CSS selector.
 It is passed to Mechanize's `search`.
-### Children
+#### Children
 The child nodes of a nested node. A TextNode is a leaf of the tree, so it has no child nodes.
-### Options
+#### Options
 Parsing options. The available options differ by Type.
 You can get the options available for a node by calling its `opt` method.
@@ -156,13 +232,16 @@ page = agent.get("http://yasuri.example.net")
 p1  = Yasuri.text_title '/html/body/p[1]'
 p1t = Yasuri.text_title '/html/body/p[1]', truncate:/^[^,]+/
-p2u = Yasuri.text_title '/html/body/p[2]', proc: :upcase
+p2u = Yasuri.text_title '/html/body/p[1]', proc: :upcase
-p1.inject(agent, page)   #=> { "title" => "Hello,World" }
-p1t.inject(agent, page)  #=> { "title" => "Hello" }
-node.inject(agent, page) #=> { "title" => "HELLO,YASURI" }
+p1.inject(agent, page)   #=> "Hello,World"
+p1t.inject(agent, page)  #=> "Hello"
+p2u.inject(agent, page)  #=> "HELLO,WORLD"
 ```
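+To scrape multiple elements on the same page at once, use `MapNode`. See the `MapNode` example for details.
 ### Options
 ##### `truncate`
 Extracts the part of the string that matches the regular expression. If the pattern contains groups, only the first matched group is returned.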
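The following is a rough model of the `truncate` option in plain Ruby. It applies the regular expression and returns the first capture group when the pattern has one, and the whole match otherwise. `apply_truncate` is a hypothetical name. Returning `nil` when nothing matches is an assumption, because this document does not say what Yasuri does in that case. The last line models `proc: :upcase` from the TextNode example as sending the symbol to the scraped string, which is also an assumption about how the option is applied.

```ruby
# Hypothetical model of the `truncate` option described above.
def apply_truncate(text, pattern)
  m = pattern.match(text)
  return nil if m.nil?             # assumption: no match yields nil
  m.captures.empty? ? m[0] : m[1]  # first group if any, else the whole match
end

p apply_truncate('Hello,World', /^[^,]+/) # => "Hello"
p apply_truncate('Hello,World', /,(.+)$/) # => "World"
p apply_truncate('Hello,World', /(\d+)/)  # => nil

# `proc: :upcase` modeled as sending the symbol to the scraped text.
p 'Hello,World'.public_send(:upcase)      # => "HELLO,WORLD"
```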
@@ -431,3 +510,186 @@ node.inject(agent, page)
 #=> [ {"content" => "Pagination01"}, {"content" => "Pagination02"}]
 ```
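 In this case, the PaginateNode opens and parses at most 2 pages. The pagination appears to have 4 pages, but because `limit:2` is specified, the result array contains only 2 results.
+##### `flatten`
+Flattens the results obtained from each page into a single array.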
+```ruby
+agent = Mechanize.new
+page = agent.get("http://yasuri.example.net/page01.html")
+# Without flatten: one Hash per page
+node = Yasuri.pages_root "/html/body/nav/span/a[@class='next']" do
+         text_title   '/html/head/title'
+         text_content '/html/body/p'
+       end
+node.inject(agent, page)
+#=> [ {"title"   => "Page01",
+#      "content" => "Pagination01"},
+#     {"title"   => "Page02",
+#      "content" => "Pagination02"},
+#     {"title"   => "Page03",
+#      "content" => "Pagination03"}]
+# With flatten:true, the values of all pages end up in one flat Array
+node = Yasuri.pages_root "/html/body/nav/span/a[@class='next']" , flatten:true do
+        text_title   '/html/head/title'
+        text_content '/html/body/p'
+      end
+node.inject(agent, page)
+#=> [ "Page01",
+#     "Pagination01",
+#     "Page02",
+#     "Pagination02",
+#     "Page03",
+#     "Pagination03"]
+```
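Judging from the flattened output in the `flatten` example, `flatten: true` splices the values of every page's result into one Array, page by page. The sketch below reproduces that shape with plain Ruby data. The `pages` sample is hypothetical, and `flat_map(&:values)` only models the documented output; it is not Yasuri's implementation.

```ruby
# Per-page results in the shape returned without `flatten` (sample data).
pages = [
  { 'title' => 'Page01', 'content' => 'Pagination01' },
  { 'title' => 'Page02', 'content' => 'Pagination02' },
  { 'title' => 'Page03', 'content' => 'Pagination03' }
]

# Model of `flatten: true`: each page's values, in order, spliced into one Array.
flattened = pages.flat_map(&:values)
p flattened
# => ["Page01", "Pagination01", "Page02", "Pagination02", "Page03", "Pagination03"]
```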
+## Map Node
+*MapNode* is a node that groups scraped results together. In the parse tree, this node is always a branch (never a leaf).
+### Example
+```html
+<!-- http://yasuri.example.net -->
+<html>
+  <head><title>Yasuri Example</title></head>
+  <body>
+    <p>Hello,World</p>
+    <p>Hello,Yasuri</p>
+  </body>
+</html>
+```
+```ruby
+agent = Mechanize.new
+page = agent.get("http://yasuri.example.net")
+tree = Yasuri.map_root do
+  text_title  '/html/head/title'
+  text_body_p '/html/body/p[1]'
+end
+tree.inject(agent, page) #=> { "title" => "Yasuri Example", "body_p" => "Hello,World" }
+tree = Yasuri.map_root do
+  map_group1 { text_child01  '/html/body/a[1]' }
+  map_group2 do
+    text_child01 '/html/body/a[1]'
+    text_child03 '/html/body/a[3]'
+  end
+end
+tree.inject(agent, page) #=> {
+#   "group1" => {
+#           "child01" => "child01"
+#         },
+#         "group2" => {
+#           "child01" => "child01",
+#           "child03" => "child03"
+#         }
+# }
+```
+### Options
+None
+-------------------------
+## Usage
+#### Using it as a library
+When using Yasuri as a library, you can define the tree in DSL, JSON, or YAML format.
+```ruby
+require 'mechanize'
+require 'yasuri'
+# 1. Build the parse tree
+# Defined with the DSL
+tree = Yasuri.links_title '/html/body/a' do
+         text_name '/html/body/p'
+       end
+# Defined with JSON
+src = <<-EOJSON
+{
+  "links_title": {
+    "path": "/html/body/a",
+    "text_name": "/html/body/p"
+  }
+}
+EOJSON
+tree = Yasuri.json2tree(src)
+# Defined with YAML
+src = <<-EOYAML
+links_title:
+  path: "/html/body/a"
+  text_name: "/html/body/p"
+EOYAML
+tree = Yasuri.yaml2tree(src)
+# 2. Start parsing by passing a Mechanize agent and the target page
+agent = Mechanize.new
+uri = "http://yasuri.example.net" # the page to scrape
+page = agent.get(uri)
+tree.inject(agent, page)
+```
+#### Using it as a CLI tool
+**Showing help**
+```sh
+$ yasuri help scrape
+Usage:
+  yasuri scrape <URI> [[--file <TREE_FILE>] or [--json <JSON>]]
+Options:
+  f, [--file=FILE]  # path to file that written yasuri tree as json or yaml
+  j, [--json=JSON]  # yasuri tree format json string
+Getting from <URI> and scrape it. with <JSON> or json/yml from <TREE_FILE>. They should be Yasuri's format json or yaml string.
+```
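+With the CLI tool, specify the parse tree in one of the following ways:
++ Use the `--file` (`-f`) option to load a parse tree saved in a file in JSON or YAML format
++ Use the `--json` (`-j`) option to pass the parse tree directly as a string
+**Example: specifying the parse tree with a file**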
+```sh
+% cat sample.yml
+text_title: "/html/head/title"
+text_desc: "//*[@id=\"intro\"]/p"
+% yasuri scrape "https://www.ruby-lang.org/en/" --file sample.yml
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+% cat sample.json
+{
+  "text_title": "/html/head/title",
+  "text_desc": "//*[@id=\"intro\"]/p"
+}
+% yasuri scrape "https://www.ruby-lang.org/en/" --file sample.json
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+```
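`--file` accepts either a JSON or a YAML tree file. One plausible way to choose the parser is by file extension, as in this sketch. `load_tree_file` is a hypothetical helper, and Yasuri's CLI may detect the format differently. Applied to the `sample.yml` and `sample.json` files in the example, both calls would return the same Hash.

```ruby
require 'json'
require 'yaml'

# Hypothetical loader: picks a parser from the file extension, then parses
# the file into a plain Hash describing the tree.
def load_tree_file(path)
  parser =
    case File.extname(path).downcase
    when '.json'         then ->(src) { JSON.parse(src) }
    when '.yml', '.yaml' then ->(src) { YAML.safe_load(src) }
    else raise ArgumentError, "unsupported tree file: #{path}"
    end
  parser.call(File.read(path))
end

# Usage with the files from the example:
#   load_tree_file('sample.yml') == load_tree_file('sample.json')
```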
+**Example: specifying the parse tree directly as JSON**
+```sh
+$ yasuri scrape "https://www.ruby-lang.org/en/" -j '
+{
+  "text_title": "/html/head/title",
+  "text_desc": "//*[@id=\"intro\"]/p"
+}'
+{"title":"Ruby Programming Language","desc":"\n    A dynamic, open source programming language with a focus on\n    simplicity and productivity. It has an elegant syntax that is\n    natural to read and easy to write.\n    "}
+```