yasuri 3.0.0 → 3.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 7f360d6efb02954a5a54e2fc308d0cd0c2e5c129c52eba727fb0dfe4a40ce502
4
- data.tar.gz: 8d8805a55c7ce16c76eb50945b954ad19327a3a63183eca098dac6ac93d2203b
3
+ metadata.gz: f3542a2cc0959a4534520f6104fc2922bdf0dbd368fcd4c149c3d251c2fc2198
4
+ data.tar.gz: 6fdb960db697e9a4ec1d87f2b83bf0e9914e3c9efe90764536bbee6d68774353
5
5
  SHA512:
6
- metadata.gz: ffe02aee78de5f30f1e583b2aca8c0617324bdbf62d7c64e371e90d139bac8b1d26df23e9725df0b81b946c6a465283f88a7d51945872c56e7be892eac1b5e4e
7
- data.tar.gz: c8983dc2cd283c7de0d97357d2a8164426ee3e1017e73c498c0676716a1c9ab4c42cc02a836bf7e559877d50ca23df6fa656c0197b5018a4881997e2fb4c57d0
6
+ metadata.gz: 9df576243bea289f4c285c46f1bd2137b7b69b79b24e0c657e4ac952114dd7bcf82a5f95cd2dae88c6eac4e3e468273b7dbd6ead9d05ffdc8d25861921702333
7
+ data.tar.gz: 13f2ae72b3e8fa6d3ef58932daa2acad49f5d4f57c80f34e5215394940fc2305bc016d949760efe9f43ae2b8c3796064a1b0bd9bccf236cfe3789c2c291dfd8b
data/README.md CHANGED
@@ -1,5 +1,6 @@
1
1
  # Yasuri
2
- [![Build Status](https://travis-ci.org/tac0x2a/yasuri.svg?branch=master)](https://travis-ci.org/tac0x2a/yasuri) [![Coverage Status](https://coveralls.io/repos/tac0x2a/yasuri/badge.svg?branch=master)](https://coveralls.io/r/tac0x2a/yasuri?branch=master) [![Maintainability](https://api.codeclimate.com/v1/badges/c29480fea1305afe999f/maintainability)](https://codeclimate.com/github/tac0x2a/yasuri/maintainability)
2
+ [![Build Status](https://github.com/tac0x2a/yasuri/actions/workflows/ruby.yml/badge.svg)](https://github.com/tac0x2a/yasuri/actions/workflows/ruby.yml)
3
+ [![Coverage Status](https://coveralls.io/repos/tac0x2a/yasuri/badge.svg?branch=master)](https://coveralls.io/r/tac0x2a/yasuri?branch=master) [![Maintainability](https://api.codeclimate.com/v1/badges/c29480fea1305afe999f/maintainability)](https://codeclimate.com/github/tac0x2a/yasuri/maintainability)
3
4
 
4
5
  Yasuri (鑢) is an easy web-scraping library for supporting "[Mechanize](https://github.com/sparklemotion/mechanize)".
5
6
 
@@ -33,6 +34,9 @@ or
33
34
  ```ruby
34
35
  # for Ruby 1.9.3 or lower
35
36
  gem 'yasuri', '~> 1.9'
37
+
38
+ # for Ruby 3.0.0 or lower
39
+ gem 'yasuri', '~> 3.0.1'
36
40
  ```
37
41
 
38
42
 
@@ -104,6 +108,16 @@ $ rake
104
108
  $ rspec spec/*spec.rb
105
109
  ```
106
110
 
111
+ ### Release RubyGems
112
+ ```sh
113
+ # Only first time
114
+ $ curl -u <user_name> https://rubygems.org/api/v1/api_key.yaml > ~/.gem/credentials
115
+ $ chmod 0600 ~/.gem/credentials
116
+
117
+ $ nano lib/yasuri/version.rb # edit gem version
118
+ $ rake release
119
+ ```
120
+
107
121
  ## Contributing
108
122
 
109
123
  1. Fork it ( https://github.com/tac0x2a/yasuri/fork )
data/USAGE.ja.md CHANGED
@@ -104,21 +104,37 @@ tree = Yasuri.yaml2tree(src)
104
104
  ### Node
105
105
  A tree is made up of nested *Node*s.
105
105
  A Node has `Type`, `Name`, `Path`, `Childlen`, and `Options`.
107
+ (However, only `MapNode` does not have a `Path`.)
107
108
 
108
109
  A Node is defined in the following format.
109
110
 
110
111
  ```ruby
111
- # Top level
112
112
  Yasuri.<Type>_<Name> <Path> [,<Options>]
113
113
 
114
114
  # Nested case
115
115
  Yasuri.<Type>_<Name> <Path> [,<Options>] do
116
116
  <Type>_<Name> <Path> [,<Options>] do
117
- <Children>
117
+ <Type>_<Name> <Path> [,<Options>]
118
+ ...
118
119
  end
119
120
  end
120
121
  ```
121
122
 
123
+
124
+
125
+ ```ruby
126
+ Yasuri.text_title '/html/head/title', truncate:/^[^,]+/
127
+
128
+ # Nested case
129
+ Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
130
+ struct_table './tr' do
131
+ text_title './td[1]'
132
+ text_pub_date './td[2]'
133
+ end
134
+ end
135
+ ```
136
+
137
+
122
138
  #### Type
123
139
  *Type* indicates the behavior of a Node. The following types are available.
124
140
 
@@ -126,18 +142,19 @@ end
126
142
  - *Struct*
127
143
  - *Links*
128
144
  - *Paginate*
145
+ - *Map*
129
146
 
130
- ### Name
147
+ #### Name
131
148
  *Name* becomes the key in the Hash of parse results.
132
149
 
133
- ### Path
150
+ #### Path
134
151
  *Path* specifies a particular node in the HTML by XPath or CSS selector.
135
152
  It is used by Mechanize's `search`.
136
153
 
137
- ### Childlen
154
+ #### Childlen
138
155
  The child nodes of a nested node. A TextNode is a leaf of the tree, so it has no child nodes.
139
156
 
140
- ### Options
157
+ #### Options
141
158
  Parse options. The options differ for each Type.
142
159
  You can get the available options by calling the `opt` method on each node.
143
160
 
@@ -169,13 +186,15 @@ page = agent.get("http://yasuri.example.net")
169
186
 
170
187
  p1 = Yasuri.text_title '/html/body/p[1]'
171
188
  p1t = Yasuri.text_title '/html/body/p[1]', truncate:/^[^,]+/
172
- p2u = Yasuri.text_title '/html/body/p[2]', proc: :upcase
189
+ p2u = Yasuri.text_title '/html/body/p[1]', proc: :upcase
173
190
 
174
- p1.inject(agent, page) #=> { "title" => "Hello,World" }
175
- p1t.inject(agent, page) #=> { "title" => "Hello" }
176
- node.inject(agent, page) #=> { "title" => "HELLO,YASURI" }
191
+ p1.inject(agent, page) #=> "Hello,World"
192
+ p1t.inject(agent, page) #=> "Hello"
193
+ p2u.inject(agent, page) #=> "HELLO,WORLD"
177
194
  ```
178
195
 
196
+ Note that to scrape multiple elements in the same page at once, use `MapNode`.
197
+
179
198
  ### Options
180
199
  ##### `truncate`
181
200
  Extracts the string that matches the regular expression. If a group is specified, only the first matched group is returned.
@@ -479,3 +498,54 @@ node.inject(agent, page)
479
498
  "Page03",
480
499
  "Patination03"]
481
500
  ```
501
+
502
+ ## Map Node
503
+ *MapNode* is a node that gathers the results of scraping together. This node is always a branch node in the parse tree.
504
+
505
+ ### Example
506
+
507
+ ```html
508
+ <!-- http://yasuri.example.net -->
509
+ <html>
510
+ <head><title>Yasuri Example</title></head>
511
+ <body>
512
+ <p>Hello,World</p>
513
+ <p>Hello,Yasuri</p>
514
+ </body>
515
+ </html>
516
+ ```
517
+
518
+ ```ruby
519
+ agent = Mechanize.new
520
+ page = agent.get("http://yasuri.example.net")
521
+
522
+
523
+ tree = Yasuri.map_root do
524
+ text_title '/html/head/title'
525
+ text_body_p '/html/body/p[1]'
526
+ end
527
+
528
+ tree.inject(agent, page) #=> { "title" => "Yasuri Example", "body_p" => "Hello,World" }
529
+
530
+
531
+ tree = Yasuri.map_root do
532
+ map_group1 { text_child01 '/html/body/a[1]' }
533
+ map_group2 do
534
+ text_child01 '/html/body/a[1]'
535
+ text_child03 '/html/body/a[3]'
536
+ end
537
+ end
538
+
539
+ tree.inject(agent, page) #=> {
540
+ # "group1" => {
541
+ # "child01" => "child01"
542
+ # },
543
+ # "group2" => {
544
+ # "child01" => "child01",
545
+ # "child03" => "child03"
546
+ # }
547
+ # }
548
+ ```
549
+
550
+ ### Options
551
+ None.
data/USAGE.md CHANGED
@@ -106,18 +106,33 @@ tree = Yasuri.yaml2tree(src)
106
106
  ### Node
107
107
  A tree is constructed from nested Nodes.
108
108
  A Node has `Type`, `Name`, `Path`, `Childlen`, and `Options`.
109
+ (Only `MapNode` does not have a `Path`.)
109
110
 
110
111
  Node is defined by this format.
111
112
 
112
113
 
113
114
  ```ruby
114
- # Top Level
115
115
  Yasuri.<Type>_<Name> <Path> [,<Options>]
116
116
 
117
- # Nested
117
+ # Nested case
118
118
  Yasuri.<Type>_<Name> <Path> [,<Options>] do
119
119
  <Type>_<Name> <Path> [,<Options>] do
120
- <Children>
120
+ <Type>_<Name> <Path> [,<Options>]
121
+ ...
122
+ end
123
+ end
124
+ ```
125
+
126
+ Example
127
+
128
+ ```ruby
129
+ Yasuri.text_title '/html/head/title', truncate:/^[^,]+/
130
+
131
+ # Nested case
132
+ Yasuri.links_root '//*[@id="menu"]/ul/li/a' do
133
+ struct_table './tr' do
134
+ text_title './td[1]'
135
+ text_pub_date './td[2]'
121
136
  end
122
137
  end
123
138
  ```
@@ -129,17 +144,18 @@ Type meen behavior of Node.
129
144
  - *Struct*
130
145
  - *Links*
131
146
  - *Paginate*
147
+ - *Map*
132
148
 
133
- ### Name
149
+ #### Name
134
150
  Name is used as the key in the returned hash.
135
151
 
136
- ### Path
152
+ #### Path
137
153
  Path determines the target node by XPath or CSS selector. It is used by Mechanize's `search`.
138
154
 
139
- ### Childlen
155
+ #### Childlen
140
156
  Child nodes. A TextNode always has an empty set, because a TextNode is a leaf.
141
157
 
142
- ### Options
158
+ #### Options
143
159
  Parse options. They differ for each type. You can get the options and their values with the `opt` method.
144
160
 
145
161
  ```ruby
@@ -170,13 +186,15 @@ page = agent.get("http://yasuri.example.net")
170
186
 
171
187
  p1 = Yasuri.text_title '/html/body/p[1]'
172
188
  p1t = Yasuri.text_title '/html/body/p[1]', truncate:/^[^,]+/
173
- p2u = Yasuri.text_title '/html/body/p[2]', proc: :upcase
189
+ p2u = Yasuri.text_title '/html/body/p[1]', proc: :upcase
174
190
 
175
- p1.inject(agent, page) #=> { "title" => "Hello,World" }
176
- p1t.inject(agent, page) #=> { "title" => "Hello" }
177
- node.inject(agent, page) #=> { "title" => "HELLO,YASURI" }
191
+ p1.inject(agent, page) #=> "Hello,World"
192
+ p1t.inject(agent, page) #=> "Hello"
193
+ p2u.inject(agent, page) #=> "HELLO,WORLD"
178
194
  ```
179
195
 
196
+ Note that if you want to scrape multiple elements in the same page at once, use `MapNode`. See the `MapNode` example for details.
197
+
180
198
  ### Options
181
199
  ##### `truncate`
182
200
  Matches the text against a regexp and returns the matched part. When you use a group, only the first matched group is returned.
@@ -479,3 +497,54 @@ node.inject(agent, page)
479
497
  "Page03",
480
498
  "Patination03"]
481
499
  ```
500
+
501
+ ## Map Node
502
+ *MapNode* is a node that summarizes the results of scraping. This node is always a branch node in the parse tree.
503
+
504
+ ### Example
505
+
506
+ ```html
507
+ <!-- http://yasuri.example.net -->
508
+ <html>
509
+ <head><title>Yasuri Example</title></head>
510
+ <body>
511
+ <p>Hello,World</p>
512
+ <p>Hello,Yasuri</p>
513
+ </body>
514
+ </html>
515
+ ```
516
+
517
+ ```ruby
518
+ agent = Mechanize.new
519
+ page = agent.get("http://yasuri.example.net")
520
+
521
+
522
+ tree = Yasuri.map_root do
523
+ text_title '/html/head/title'
524
+ text_body_p '/html/body/p[1]'
525
+ end
526
+
527
+ tree.inject(agent, page) #=> { "title" => "Yasuri Example", "body_p" => "Hello,World" }
528
+
529
+
530
+ tree = Yasuri.map_root do
531
+ map_group1 { text_child01 '/html/body/a[1]' }
532
+ map_group2 do
533
+ text_child01 '/html/body/a[1]'
534
+ text_child03 '/html/body/a[3]'
535
+ end
536
+ end
537
+
538
+ tree.inject(agent, page) #=> {
539
+ # "group1" => {
540
+ # "child01" => "child01"
541
+ # },
542
+ # "group2" => {
543
+ # "child01" => "child01",
544
+ # "child03" => "child03"
545
+ # }
546
+ # }
547
+ ```
548
+
549
+ ### Options
550
+ None.
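
The way a map node gathers its children's results can be sketched without Mechanize. This is a minimal sketch under stated assumptions, not Yasuri's implementation: `LeafSketch` and `MapSketch` are hypothetical stand-ins, and the "page" is a plain Hash rather than a Mechanize page. It only illustrates that each child's result is stored under the child's name and that map nodes can nest.

```ruby
# Hypothetical stand-ins for illustration only; these are not Yasuri classes.
# A leaf returns one value looked up from a plain Hash that plays the role of a page.
LeafSketch = Struct.new(:name, :key) do
  def inject(page)
    page[key]
  end
end

# A map-like node has a name and children, but no path of its own.
MapSketch = Struct.new(:name, :children) do
  def inject(page)
    # Collect each child's result under the child's name.
    children.map { |child| [child.name, child.inject(page)] }.to_h
  end
end

page = { title: "Yasuri Example", p1: "Hello,World" }

tree = MapSketch.new("root", [
  LeafSketch.new("title", :title),
  MapSketch.new("group1", [LeafSketch.new("body_p", :p1)])
])

result = tree.inject(page)
p result
```

Because a nested map returns a Hash of its own, the outer result contains one nested Hash per inner map, keyed by that map's name.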
data/lib/yasuri/version.rb CHANGED
@@ -1,3 +1,3 @@
1
1
  module Yasuri
2
- VERSION = "3.0.0"
2
+ VERSION = "3.1.0"
3
3
  end
data/lib/yasuri/yasuri.rb CHANGED
@@ -11,6 +11,7 @@ require_relative 'yasuri_text_node'
11
11
  require_relative 'yasuri_struct_node'
12
12
  require_relative 'yasuri_paginate_node'
13
13
  require_relative 'yasuri_links_node'
14
+ require_relative 'yasuri_map_node'
14
15
  require_relative 'yasuri_node_generator'
15
16
 
16
17
  module Yasuri
@@ -54,9 +55,9 @@ module Yasuri
54
55
  body
55
56
  end
56
57
 
57
- def self.method_missing(node_name, pattern, **opt, &block)
58
- generated = Yasuri::NodeGenerator.gen(node_name, pattern, **opt, &block)
59
- generated || super(node_name, **opt)
58
+ def self.method_missing(method_name, pattern=nil, **opt, &block)
59
+ generated = Yasuri::NodeGenerator.gen(method_name, pattern, **opt, &block)
60
+ generated || super(method_name, **opt)
60
61
  end
61
62
 
62
63
  private
@@ -64,49 +65,22 @@ module Yasuri
64
65
  text: Yasuri::TextNode,
65
66
  struct: Yasuri::StructNode,
66
67
  links: Yasuri::LinksNode,
67
- pages: Yasuri::PaginateNode
68
+ pages: Yasuri::PaginateNode,
69
+ map: Yasuri::MapNode
68
70
  }
69
71
  Node2Text = Text2Node.invert
70
72
 
71
73
  ReservedKeys = %i|node name path children|
72
74
  def self.hash2node(node_h)
73
- node, name, path, children = ReservedKeys.map do |key|
74
- node_h[key]
75
- end
76
- children ||= []
75
+ node = node_h[:node]
77
76
 
78
77
  fail "Not found 'node' value in map" if node.nil?
79
- fail "Not found 'name' value in map" if name.nil?
80
- fail "Not found 'path' value in map" if path.nil?
81
-
82
- childnodes = children.map{|c| Yasuri.hash2node(c) }
83
- ReservedKeys.each{|key| node_h.delete(key)}
84
- opt = node_h
85
-
86
78
  klass = Text2Node[node.to_sym]
87
- fail "Undefined node type #{node}" if klass.nil?
88
- klass.new(path, name, childnodes, **opt)
79
+ klass::hash2node(node_h)
89
80
  end
90
81
 
91
82
  def self.node2hash(node)
92
- json = JSON.parse("{}")
93
- return json if node.nil?
94
-
95
- klass = node.class
96
- klass_str = Node2Text[klass]
97
-
98
- json["node"] = klass_str
99
- json["name"] = node.name
100
- json["path"] = node.xpath
101
-
102
- children = node.children.map{|c| Yasuri.node2hash(c)}
103
- json["children"] = children if not children.empty?
104
-
105
- node.opts.each do |key,value|
106
- json[key] = value if not value.nil?
107
- end
108
-
109
- json
83
+ node.to_h
110
84
  end
111
85
 
112
86
  def self.NodeName(name, opt)
data/lib/yasuri/yasuri_links_node.rb CHANGED
@@ -22,5 +22,9 @@ module Yasuri
22
22
  Hash[child_results_kv]
23
23
  end # each named child node
24
24
  end
25
- end
26
- end
25
+
26
+ def node_type_str
27
+ "links"
28
+ end
29
+ end # class
30
+ end # module
data/lib/yasuri/yasuri_map_node.rb ADDED
@@ -0,0 +1,54 @@
1
+
2
+ module Yasuri
3
+ class MapNode
4
+ attr_reader :name, :children
5
+
6
+ def initialize(name, children, opt: {})
7
+ @name = name
8
+ @children = children
9
+ @opt = opt
10
+ end
11
+
12
+ def inject(agent, page, opt = {}, element = page)
13
+ child_results_kv = @children.map do |node|
14
+ [node.name, node.inject(agent, page, opt)]
15
+ end
16
+ Hash[child_results_kv]
17
+ end
18
+
19
+ def opts
20
+ {}
21
+ end
22
+
23
+ def to_h
24
+ h = {}
25
+ h["node"] = "map"
26
+ h["name"] = self.name
27
+ h["children"] = self.children.map{|c| c.to_h} if not children.empty?
28
+
29
+ self.opts.each do |key,value|
30
+ h[key] = value if not value.nil?
31
+ end
32
+
33
+ h
34
+ end
35
+
36
+ def self.hash2node(node_h)
37
+ reservedKeys = %i|node name children|
38
+
39
+ node, name, children = reservedKeys.map do |key|
40
+ node_h[key]
41
+ end
42
+
43
+ fail "Not found 'name' value in map" if name.nil?
44
+ fail "Not found 'children' value in map" if children.nil?
45
+ children ||= []
46
+
47
+ childnodes = children.map{|c| Yasuri.hash2node(c) }
48
+ reservedKeys.each{|key| node_h.delete(key)}
49
+ opt = node_h
50
+
51
+ self.new(name, childnodes, **opt)
52
+ end
53
+ end
54
+ end
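
The `to_h` / `hash2node` pair above amounts to a serialization round trip. The following is a minimal sketch of that idea, assuming simplified hypothetical classes (`TextNodeSketch`, `MapNodeSketch`) and a hypothetical `SKETCH_TYPES` lookup table with string keys; it is not the gem's code and ignores options.

```ruby
require 'json'

# Hypothetical simplified nodes for illustration; not Yasuri's classes.
class TextNodeSketch
  attr_reader :name, :path

  def initialize(path, name)
    @path = path
    @name = name
  end

  def to_h
    { "node" => "text", "name" => name, "path" => path }
  end

  def self.from_h(h)
    new(h.fetch("path"), h.fetch("name"))
  end
end

class MapNodeSketch
  attr_reader :name, :children

  def initialize(name, children)
    @name = name
    @children = children
  end

  # A map node serializes its name and children, but no path.
  def to_h
    h = { "node" => "map", "name" => name }
    h["children"] = children.map(&:to_h) unless children.empty?
    h
  end

  # Rebuild children by looking up each child's class from its "node" value.
  def self.from_h(h)
    kids = h.fetch("children", []).map do |c|
      SKETCH_TYPES.fetch(c.fetch("node")).from_h(c)
    end
    new(h.fetch("name"), kids)
  end
end

SKETCH_TYPES = { "text" => TextNodeSketch, "map" => MapNodeSketch }.freeze

tree = MapNodeSketch.new("parent", [TextNodeSketch.new("/html/body/p[1]", "content01")])
json = JSON.generate(tree.to_h)
restored = SKETCH_TYPES.fetch("map").from_h(JSON.parse(json))
```

Dispatching on the `"node"` value keeps the top-level loader small, since each class decides which keys it requires.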
data/lib/yasuri/yasuri_node.rb CHANGED
@@ -12,10 +12,53 @@ module Yasuri
12
12
  end
13
13
 
14
14
  def inject(agent, page, opt = {}, element = page)
15
- fail "#{Kernel.__method__} is not implemented."
15
+ fail "#{Kernel.__method__} is not implemented in included class."
16
16
  end
17
+
17
18
  def opts
18
19
  {}
19
20
  end
21
+
22
+ def to_h
23
+ h = {}
24
+ h["node"] = self.node_type_str
25
+ h["name"] = self.name
26
+ h["path"] = self.xpath
27
+ h["children"] = self.children.map{|c| c.to_h} if not children.empty?
28
+
29
+ self.opts.each do |key,value|
30
+ h[key] = value if not value.nil?
31
+ end
32
+
33
+ h
34
+ end
35
+
36
+ module ClassMethods
37
+ def hash2node(node_h)
38
+ reservedKeys = %i|node name path children|
39
+
40
+ node, name, path, children = ReservedKeys.map do |key|
41
+ node_h[key]
42
+ end
43
+
44
+ fail "Not found 'name' value in map" if name.nil?
45
+ fail "Not found 'path' value in map" if path.nil?
46
+ children ||= []
47
+
48
+ childnodes = children.map{|c| Yasuri.hash2node(c) }
49
+ reservedKeys.each{|key| node_h.delete(key)}
50
+ opt = node_h
51
+
52
+ self.new(path, name, childnodes, **opt)
53
+ end
54
+
55
+ def node_type_str
56
+ fail "#{Kernel.__method__} is not implemented in included class."
57
+ end
58
+ end
59
+
60
+ def self.included(base)
61
+ base.extend(ClassMethods)
62
+ end
20
63
  end
21
64
  end
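
The `self.included` hook with `base.extend(ClassMethods)` in the hunk above is a common Ruby idiom for giving an including class both instance methods and class methods from one module. A minimal sketch of the idiom follows; `NodeSketch`, `TextSketch`, `type_str`, and `build` are hypothetical names chosen for illustration and do not appear in the gem.

```ruby
# Hypothetical names; a sketch of the Module#included hook, not Yasuri's code.
module NodeSketch
  # Instance-level behaviour shared by every including class.
  def describe
    "#{self.class.type_str}:#{name}"
  end

  # Methods that should become class methods of the including class.
  module ClassMethods
    def type_str
      raise NotImplementedError, "type_str is not implemented in #{self}"
    end

    def build(hash)
      new(hash.fetch(:name))
    end
  end

  # Ruby calls this hook each time the module is included.
  def self.included(base)
    base.extend(ClassMethods)
  end
end

class TextSketch
  include NodeSketch
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Override the class-level default supplied by ClassMethods.
  def self.type_str
    "text"
  end
end

node = TextSketch.build(name: "title")
puts node.describe
```

A method defined directly on the class (`self.type_str`) takes precedence over the one that arrives through `extend`, so the module can ship a failing default that each class replaces.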
data/lib/yasuri/yasuri_node_generator.rb CHANGED
@@ -6,6 +6,7 @@ require_relative 'yasuri_text_node'
6
6
  require_relative 'yasuri_struct_node'
7
7
  require_relative 'yasuri_links_node'
8
8
  require_relative 'yasuri_paginate_node'
9
+ require_relative 'yasuri_map_node'
9
10
 
10
11
  module Yasuri
11
12
  class NodeGenerator
@@ -15,27 +16,33 @@ module Yasuri
15
16
  @nodes
16
17
  end
17
18
 
18
- def method_missing(name, pattern, **args, &block)
19
+ def method_missing(name, pattern=nil, **args, &block)
19
20
  node = NodeGenerator.gen(name, pattern, **args, &block)
20
21
  raise "Undefined Node Name '#{name}'" if node == nil
21
22
  @nodes << node
22
23
  end
23
24
 
24
- def self.gen(name, xpath, **opt, &block)
25
+ def self.gen(method_name, xpath, **opt, &block)
25
26
  children = Yasuri::NodeGenerator.new.gen_recursive(&block) if block_given?
26
27
 
27
- case name
28
+ case method_name
28
29
  when /^text_(.+)$/
29
- Yasuri::TextNode.new(xpath, $1, children || [], **opt)
30
+ # Todo raise error xpath is not valid
31
+ Yasuri::TextNode.new(xpath, $1, children || [], **opt)
30
32
  when /^struct_(.+)$/
33
+ # Todo raise error xpath is not valid
31
34
  Yasuri::StructNode.new(xpath, $1, children || [], **opt)
32
35
  when /^links_(.+)$/
33
- Yasuri::LinksNode.new(xpath, $1, children || [], **opt)
36
+ # Todo raise error xpath is not valid
37
+ Yasuri::LinksNode.new(xpath, $1, children || [], **opt)
34
38
  when /^pages_(.+)$/
39
+ # Todo raise error xpath is not valid
35
40
  Yasuri::PaginateNode.new(xpath, $1, children || [], **opt)
41
+ when /^map_(.+)$/
42
+ Yasuri::MapNode.new($1, children, **opt)
36
43
  else
37
44
  nil
38
45
  end
39
- end # of self.gen(name, *args, &block)
46
+ end # of self.gen(method_name, xpath, **opt, &block)
40
47
  end # of class NodeGenerator
41
48
  end
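
The generator above dispatches on the method name with regular expressions and makes the path argument optional so that `map_*` can be called with only a block. A minimal sketch of that dispatch style is shown here, assuming a hypothetical `DslSketch` builder that records plain Hashes instead of node objects; it is not the gem's generator.

```ruby
# Hypothetical mini-DSL; a sketch of regex dispatch in method_missing.
class DslSketch
  def initialize
    @nodes = []
  end

  # Run the block with this builder as self and return the collected nodes.
  def collect(&block)
    instance_eval(&block)
    @nodes
  end

  # The path is optional so that map_* can be called with only a block.
  def method_missing(method_name, path = nil, **opt, &block)
    children = block ? DslSketch.new.collect(&block) : []
    case method_name.to_s
    when /\Atext_(.+)\z/
      @nodes << { type: :text, name: $1, path: path, opt: opt }
    when /\Amap_(.+)\z/
      @nodes << { type: :map, name: $1, children: children }
    else
      super
    end
  end

  # Keep respond_to? consistent with the names method_missing accepts.
  def respond_to_missing?(method_name, include_private = false)
    method_name.to_s.match?(/\A(text|map)_.+\z/) || super
  end
end

tree = DslSketch.new.collect do
  text_title '/html/head/title', truncate: /^[^,]+/
  map_group1 do
    text_child01 '/html/body/a[1]'
  end
end
```

Names that match neither pattern fall through to `super`, so a misspelled node type still raises `NoMethodError` rather than being silently ignored.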
data/lib/yasuri/yasuri_paginate_node.rb CHANGED
@@ -44,5 +44,9 @@ module Yasuri
44
44
  def opts
45
45
  {limit:@limit, flatten:@flatten}
46
46
  end
47
+
48
+ def node_type_str
49
+ "pages"
50
+ end
47
51
  end
48
52
  end
data/lib/yasuri/yasuri_text_node.rb CHANGED
@@ -34,6 +34,10 @@ module Yasuri
34
34
  text
35
35
  end
36
36
 
37
+ def node_type_str
38
+ "text"
39
+ end
40
+
37
41
  def opts
38
42
  {truncate:@truncate, proc:@proc}
39
43
  end
data/spec/yasuri_map_spec.rb ADDED
@@ -0,0 +1,76 @@
1
+ require_relative 'spec_helper'
2
+
3
+ describe 'Yasuri' do
4
+ include_context 'httpserver'
5
+
6
+ before do
7
+ @agent = Mechanize.new
8
+ @index_page = @agent.get(uri)
9
+ end
10
+
11
+ describe '::MapNode' do
12
+ it "multi scrape in single page" do
13
+ map = Yasuri.map_sample do
14
+ text_title '/html/head/title'
15
+ text_body_p '/html/body/p[1]'
16
+ end
17
+ actual = map.inject(@agent, @index_page)
18
+
19
+ expected = {
20
+ "title" => "Yasuri Test",
21
+ "body_p" => "Hello,Yasuri"
22
+ }
23
+ expect(actual).to include expected
24
+ end
25
+
26
+ it "nested multi scrape in single page" do
27
+ map = Yasuri.map_sample do
28
+ map_group1 { text_child01 '/html/body/a[1]' }
29
+ map_group2 do
30
+ text_child01 '/html/body/a[1]'
31
+ text_child03 '/html/body/a[3]'
32
+ end
33
+ end
34
+ actual = map.inject(@agent, @index_page)
35
+
36
+ expected = {
37
+ "group1" => {
38
+ "child01" => "child01"
39
+ },
40
+ "group2" => {
41
+ "child01" => "child01",
42
+ "child03" => "child03"
43
+ }
44
+ }
45
+ expect(actual).to include expected
46
+ end
47
+
48
+ it "scrape with links node" do
49
+ map = Yasuri.map_sample do
50
+ map_group1 do
51
+ links_a '/html/body/a' do
52
+ text_content '/html/body/p'
53
+ end
54
+ text_child01 '/html/body/a[1]'
55
+ end
56
+ map_group2 do
57
+ text_child03 '/html/body/a[3]'
58
+ end
59
+ end
60
+ actual = map.inject(@agent, @index_page)
61
+
62
+ expected = {
63
+ "group1" => {
64
+ "a" => [
65
+ {"content" => "Child 01 page."},
66
+ {"content" => "Child 02 page."},
67
+ {"content" => "Child 03 page."},
68
+ ],
69
+ "child01" => "child01"
70
+ },
71
+ "group2" => { "child03" => "child03" }
72
+ }
73
+ expect(actual).to include expected
74
+ end
75
+ end
76
+ end
data/spec/yasuri_spec.rb CHANGED
@@ -126,6 +126,27 @@ EOB
126
126
  compare_generated_vs_original(generated, original, @index_page)
127
127
  end
128
128
 
129
+ it "return MapNode with TextNodes" do
130
+ src = %q| { "node" : "map",
131
+ "name" : "parent",
132
+ "children" : [
133
+ { "node" : "text",
134
+ "name" : "content01",
135
+ "path" : "/html/body/p[1]"
136
+ },
137
+ { "node" : "text",
138
+ "name" : "content02",
139
+ "path" : "/html/body/p[2]"
140
+ }
141
+ ]
142
+ }|
143
+ generated = Yasuri.json2tree(src)
144
+ original = Yasuri::MapNode.new('parent', [
145
+ Yasuri::TextNode.new('/html/body/p[1]', "content01"),
146
+ Yasuri::TextNode.new('/html/body/p[2]', "content02"),
147
+ ])
148
+ compare_generated_vs_original(generated, original, @index_page)
149
+ end
129
150
 
130
151
  it "return LinksNode/TextNode" do
131
152
  src = %q| { "node" : "links",
@@ -248,6 +269,31 @@ EOB
248
269
  expect(actual).to match expected
249
270
  end
250
271
 
272
+ it "return map node with text nodes" do
273
+ tree = Yasuri::MapNode.new('parent', [
274
+ Yasuri::TextNode.new('/html/body/p[1]', "content01"),
275
+ Yasuri::TextNode.new('/html/body/p[2]', "content02"),
276
+ ])
277
+ actual_json = Yasuri.tree2json(tree)
278
+
279
+ expected_json = %q| { "node" : "map",
280
+ "name" : "parent",
281
+ "children" : [
282
+ { "node" : "text",
283
+ "name" : "content01",
284
+ "path" : "/html/body/p[1]"
285
+ },
286
+ { "node" : "text",
287
+ "name" : "content02",
288
+ "path" : "/html/body/p[2]"
289
+ }
290
+ ]
291
+ }|
292
+ expected = Yasuri.tree2json(Yasuri.json2tree(expected_json))
293
+ actual = Yasuri.tree2json(Yasuri.json2tree(actual_json))
294
+ expect(actual).to match expected
295
+ end
296
+
251
297
  it "return LinksNode/TextNode" do
252
298
  tree = Yasuri::LinksNode.new('/html/body/a', "root", [
253
299
  Yasuri::TextNode.new('/html/body/p', "content"),
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: yasuri
3
3
  version: !ruby/object:Gem::Version
4
- version: 3.0.0
4
+ version: 3.1.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - TAC
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-03-18 00:00:00.000000000 Z
11
+ date: 2021-03-21 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: bundler
@@ -160,6 +160,7 @@ files:
160
160
  - lib/yasuri/version.rb
161
161
  - lib/yasuri/yasuri.rb
162
162
  - lib/yasuri/yasuri_links_node.rb
163
+ - lib/yasuri/yasuri_map_node.rb
163
164
  - lib/yasuri/yasuri_node.rb
164
165
  - lib/yasuri/yasuri_node_generator.rb
165
166
  - lib/yasuri/yasuri_paginate_node.rb
@@ -181,6 +182,7 @@ files:
181
182
  - spec/servers/httpserver.rb
182
183
  - spec/spec_helper.rb
183
184
  - spec/yasuri_links_node_spec.rb
185
+ - spec/yasuri_map_spec.rb
184
186
  - spec/yasuri_node_spec.rb
185
187
  - spec/yasuri_paginate_node_spec.rb
186
188
  - spec/yasuri_spec.rb
@@ -227,6 +229,7 @@ test_files:
227
229
  - spec/servers/httpserver.rb
228
230
  - spec/spec_helper.rb
229
231
  - spec/yasuri_links_node_spec.rb
232
+ - spec/yasuri_map_spec.rb
230
233
  - spec/yasuri_node_spec.rb
231
234
  - spec/yasuri_paginate_node_spec.rb
232
235
  - spec/yasuri_spec.rb