baiduserp 2.1.14 → 2.2.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: d5fe173a0b067ff22f37a68ab8cd4d633b556630
4
- data.tar.gz: e42a0a457467435480fd7cf7e82dfbfac88b9691
3
+ metadata.gz: 1d1c2a61259d5e3134d0566e002ba551a791b06c
4
+ data.tar.gz: 8b3f78ad2190a17a2f530019c0b0719bb9abfb93
5
5
  SHA512:
6
- metadata.gz: c4d75fa5fb5429aaa2e9a0293203dd624b4724aab1a1a7e5f3b39dba74b40a9c2bcbd82ec21c6318bc8f8955fa70a7d8a17c7dfef57049ba4e8e8f8366d6ab55
7
- data.tar.gz: 90f9f95d9f822d278f0c755ce0f4048b6201f089f999ebf9b3976981d504e35b3f1b09ddf5d85a698e002f56fd30f769689f0e8d56c737f58d815a31a53156ca
6
+ metadata.gz: 859d3c01765741cfe50eed92a05f5cbec93c68c003d13d2a2dad2e6b85e22d63d2171ded85fa3cd2271d7b830c522bfa95b3b27f91a269af621f811f23c3a87a
7
+ data.tar.gz: 5847dd8a81f4c61a0f72d2d8a2e6cd91e4aa1e9eed32f3c495453e31fe006a5c8162f902ebf2e831adb16731114fcb6b7d4a4867ed8ce575aecc39e2feca14c4
data/README.md CHANGED
@@ -1,231 +1 @@
1
- # Baiduserp
2
-
3
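- This gem is built specifically to parse Baidu search result pages (SERPs), with the goal of extracting as much information as the page offers.
4
- (Note: this is not currently a program for batch-processing keyword rankings, but it can serve as the Baidu SERP parsing module inside batch rank-checking software.)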
5
-
6
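- ## Features
7
- ### Parses SERP results as completely as possible
8
- Baidu's SERP pages keep getting more complex: new result styles keep appearing in the left column, and a lot of content has been added to the right column as well.
9
- This gem parses a SERP page into Ruby data structures.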
10
-
11
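- When analysing a SERP page you may want to examine many kinds of information on it, such as SEO rankings, SEM rankings, competitor rankings,
12
- title/description text, related keywords, related recommendations in the right column, whether a Baidu Open Platform result is present, and so on.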
13
-
14
- This gem parses out all of the information above for later analysis. If usage grows, or Baidu launches new products,
15
- parsing for new modules can be added as well.
16
-
17
-
18
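- ### Command-line interface (for testing, and for JSON output)
19
- Besides the Ruby API, users of other programming languages can use the command-line interface, which outputs result data as JSON.
20
- See below for detailed usage.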
21
-
22
- ### Known issues
23
- This is currently only a barely usable version and may have all sorts of problems; bug reports are welcome. [Known issues list](https://github.com/semseo/baiduserp/issues).
24
-
25
- ## Installation
26
-
27
- 1 System requirements
28
-
29
- Linux or Mac. On Linux, a recent Ubuntu or Fedora release is recommended.
30
-
31
- 2 Install Ruby
32
-
33
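- Only Ruby 1.9 and above is supported. The best way to install Ruby is through [RVM](https://rvm.io/); for how to use RVM, see [http://ruby-china.org/wiki/install_ruby_guide](http://ruby-china.org/wiki/install_ruby_guide). That guide also installs some Rails-related software you do not need, but it is very detailed.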
34
-
35
- On recent Ubuntu or Fedora releases, Ruby 1.9 can also be installed with apt-get or yum.
36
-
37
- 3 Install gem dependencies
38
-
39
- The gem depends on nokogiri, which in turn needs two system libraries.
40
- So on Ubuntu or Fedora, run:
41
-
42
- ```
43
- $ sudo apt-get install libxslt-dev libxslt libxml2-dev libxml2 # ubuntu
44
- $ sudo yum install libxml2-devel libxml2 libxslt libxslt-devel # fedora
45
- ```
46
-
47
- Once the dependencies above are installed,
48
-
49
- `$ gem install nokogiri`
50
-
51
- 4 Finally, install the baiduserp gem
52
-
53
- `$ gem install baiduserp`
54
-
55
- ## Usage
56
-
57
- Ruby code example
58
-
59
- ```
60
- require 'baiduserp'
61
- require 'open-uri'
62
- require 'pp'
63
-
64
- pp Baiduserp.search 'keyword'
65
-
66
- pp Baiduserp.parse open('http://www.baidu.com/s?wd=keyword').read.encode('UTF-8')
67
-
68
- ```
69
-
70
- For use from non-Ruby programs and for one-off debugging, a command-line interface is also provided; it can exchange data as JSON:
71
-
72
- ```
73
- $ baiduserp -h
74
- Usage:
75
- 1. baiduserp -s 'keyword' # search 'keyword' and print parse result
76
- 2. baiduserp -s 'keyword' -o output.json # -o means save result to a file
77
- 3. baiduserp -f 'file path' # parse html source code from file
78
- 4. baiduserp -s 'keyword' -j # search 'keyword' and print parse result in JSON format
79
- -s, --search Keyword Search Keyword & Parse SERP
80
- -j, --jsonprint Print result in JSON format
81
- -o, --output Output Save Result to File in JSON format
82
- -f, --file File Parse Local File
83
- ```
84
-
85
- The final result is a data structure of nested hashes and arrays. A sample result:
86
-
87
- ```
88
- $ baiduserp -s 香港
89
- {:ads_right=>
90
- [{:rank=>1,
91
- :title=>"预订香港酒店上携程,全景图..",
92
- :content=>"订香港酒店,享受有房保障,服务好,折扣低,返现高达201元,订香港酒店上携程超划算!",
93
- :site=>"www.ctrip.com"},
94
- {:rank=>2,
95
- :title=>"香港-去哪儿网度假频道,聪明..",
96
- :content=>"香港-去哪儿网度假频道,比价首选!180000条报价实时更新,先比价后出行!",
97
- :site=>"dujia.qunar.com"},
98
- {:rank=>3,
99
- :title=>"香港香港怎么玩最划算?",
100
- :content=>"深圳旅行社香港,香港旅游您的超值之选!天天出团港澳游专线,缤纷全程绝不",
101
- :site=>"www.sztygl128.com"},
102
- {:rank=>4,
103
- :title=>"香港旅游攻略-150000条点评",
104
- :content=>"还没来过香港?507个景点都玩遍?到到网告诉你网友怎么玩(150000张游记照片)!",
105
- :site=>"www.daodao.com"},
106
- {:rank=>5,
107
- :title=>"香港旅游首选北京青年旅行社..",
108
- :content=>"北京青年旅行社专业香港旅游旅行社,高品质服务,天天折扣价,",
109
- :site=>"www.hqly8.com"},
110
- {:rank=>6,
111
- :title=>"香港旅游特价啦!香港旅游价..",
112
- :content=>"本社邀你一起体验超值香港旅游,全程绝无强制购物,行程安排合理.",
113
- :site=>"www.cctbj.net"},
114
- {:rank=>7,
115
- :title=>"香港旅游线路",
116
- :content=>"北京旅行社提供香港咨询服务,多条精品旅游线路供您选择.",
117
- :site=>"www.ctslyw.com"},
118
- {:rank=>8,
119
- :title=>"全新香港旅游报价,香港旅游..",
120
- :content=>"北京国际旅行社,精选多条香港旅游线路,信誉保证,全程无隐性消费.",
121
- :site=>"www.quly8.net"}],
122
- :ads_top=>
123
- [{:rank=>1,
124
- :title=>"香港酒店预订 在Agoda立享1-7折",
125
- :content=>"香港酒店预订,尽在Agoda,网上订购低价回馈,为您节省75%.",
126
- :site=>"www.agoda.com"},
127
- {:rank=>2,
128
- :title=>"香港酒店预订 在Agoda立享1-7折",
129
- :content=>"香港酒店预订,尽在Agoda,网上订购低价回馈,为您节省75%.",
130
- :site=>"www.agoda.com"}],
131
- :pinpaizhuanqu=>false,
132
- :ranks=>
133
- [{:rank=>1,
134
- :url=>
135
- "http://baike.baidu.com/link?url=Ujomxkw-4Whq7C7TI6do9nxHr3G0sO6ywJ3SZfr-lX4qQiht-2rnuGomrclwc4bJ",
136
- :title=>"香港_百度百科",
137
- :content=>nil,
138
- :mu=>"http://baike.baidu.com/view/2607.htm",
139
- :baiduopen=>false},
140
- {:rank=>2,
141
- :url=>"http://lvyou.baidu.com/xianggang/",
142
- :title=>"2013香港旅游攻略_香港景点线路游记_百度旅游",
143
- :content=>nil,
144
- :mu=>"http://lvyou.baidu.com/xianggang/",
145
- :baiduopen=>false},
146
- {:rank=>3,
147
- :url=>
148
- "http://image.baidu.com/i?tn=baiduimage&ct=201326592&lm=-1&cl=2&fr=ala1&word=%CF%E3%B8%DB",
149
- :title=>"香港_百度图片 - 举报图片",
150
- :content=>nil,
151
- :mu=>
152
- "http://image.baidu.com/i?tn=baiduimage&ct=201326592&lm=-1&cl=2&fr=ala1&word=%CF%E3%B8%DB",
153
- :baiduopen=>false},
154
- {:rank=>4,
155
- :url=>"http://www.gov.hk/sc/residents/",
156
- :title=>"GovHK 香港政府一站通:本港居民",
157
- :content=>
158
- "香港政府为当地居民提供的资讯和服务,内容包括通讯及科技、文化、康乐及运动、教育及培训、就业、环境、政府、法律及治安、保健及医疗服务、房屋及社会服务、入境事务...",
159
- :mu=>nil,
160
- :baiduopen=>false},
161
- {:rank=>5,
162
- :url=>"http://www.baidu.com/s?rtt=2&tn=baiduwb&rn=20&cl=2&wd=%CF%E3%B8%DB",
163
- :title=>"香港的最新微博结果",
164
- :content=>nil,
165
- :mu=>"http://www.baidu.com/s?rtt=2&tn=baiduwb&rn=20&cl=2&wd=%CF%E3%B8%DB",
166
- :baiduopen=>false},
167
- {:rank=>6,
168
- :url=>"http://tieba.baidu.com/f?kw=%CF%E3%B8%DB&fr=ala0",
169
- :title=>"香港吧 百度贴吧",
170
- :content=>
171
- "月活跃用户:38万人  累计发贴:202万 图片(1856)  |  视频(61)  |  精品贴(335) 香港和上海的夜景那个美?????????? 点击:439 回复:259 最近怎么那么多自以为漂亮的S!B女求认证啊。 点击:303 回复:69 什么时候,去香港不必签证,和去北京上海一样容... 点击:839 回复:187 查看更多香港吧内容>> tieba.baidu.com/香港?fr=ala0 2013-10-18",
172
- :mu=>"http://tieba.baidu.com/f?kw=%CF%E3%B8%DB&fr=ala0",
173
- :baiduopen=>false},
174
- {:rank=>7,
175
- :url=>"http://www.weather.com.cn/html/weather/101320101.shtml",
176
- :title=>"香港天气预报_一周天气预报_中国天气网 - 最近访问:",
177
- :content=>nil,
178
- :mu=>"http://www.weather.com.cn/html/weather/101320101.shtml",
179
- :baiduopen=>true},
180
- {:rank=>8,
181
- :url=>"http://www.mafengwo.cn/travel-scenic-spot/mafengwo/10189.html",
182
- :title=>"2013香港旅游攻略,香港自助游攻略,蚂蜂窝香港出游攻略游记 - 蚂蜂窝",
183
- :content=>
184
- "在香港寻吃完全就是一场舌尖的盛宴,从街边小吃到世界顶级的米其林餐厅任您选择,茶餐厅、早茶、烧腊和及甜品极具港式风味,世界各地的美食料理也一个不落单。 香港...",
185
- :mu=>nil,
186
- :baiduopen=>false},
187
- {:rank=>9,
188
- :url=>"http://hongkong.cncn.com/",
189
- :title=>"香港旅游攻略_香港香港旅游景点_香港旅游网",
190
- :content=>
191
- "香港欣欣旅游网,提供香港香港旅游景点推荐、10月香港旅游攻略、香港旅行社、香港旅游线路、香港酒店预订、香港旅游地图等出行指南及旅游服务●欣欣旅游网 CNCN.com ...",
192
- :mu=>nil,
193
- :baiduopen=>false},
194
- {:rank=>10,
195
- :url=>"http://www.baidu.com/s?tn=baidurt&rtt=1&bsst=1&wd=%CF%E3%B8%DB",
196
- :title=>"香港的最新相关信息",
197
- :content=>nil,
198
- :mu=>"http://www.baidu.com/s?tn=baidurt&rtt=1&bsst=1&wd=%CF%E3%B8%DB",
199
- :baiduopen=>false}],
200
- :related_keywords=>
201
- ["香港电影",
202
- "香港旅游",
203
- "香港天气",
204
- "香港大学",
205
- "香港购物",
206
- "香港地图",
207
- "香港电视剧",
208
- "香港地铁",
209
- "香港中文大学",
210
- "香港苹果官网"],
211
- :result_num=>100000000,
212
- :right_hotel=>nil,
213
- :right_personinfo=>nil,
214
- :right_relaperson=>
215
- [{:title=>"香港特别行政区行政区划", :names=>["油尖旺区", "九龙城区", "湾仔", "元朗区", "西贡区"]},
216
- {:title=>"全球性国际金融中心", :names=>["新加坡", "纽约", "东京", "伦敦"]},
217
- {:title=>"其他人还搜", :names=>["台北", "上海", "直布罗陀", "海南", "深圳"]}],
218
- :right_weather=>nil}
219
- ```
220
-
221
- ## Contributing
222
-
223
- Contributions to help improve this gem are welcome:
224
-
225
- 1. Fork it
226
- 2. Create your feature branch (`git checkout -b my-new-feature`)
227
- 3. Commit your changes (`git commit -am 'Add some feature'`)
228
- 4. Push to the branch (`git push origin my-new-feature`)
229
- 5. Create new Pull Request
230
-
231
- Alternatively, open an issue on the Issues page: bug reports, feature requests, suggestions, and so on.
1
+ Parses Baidu search result pages and returns structured data for subsequent analysis.
data/bin/baiduserp CHANGED
@@ -4,50 +4,56 @@ require 'baiduserp'
4
4
  require 'optparse'
5
5
  require 'json'
6
6
  require 'pp'
7
+ require 'docopt'
7
8
 
8
- usage = "Usage:
9
+ cmd = File.basename(__FILE__)
10
+
11
+ doc = <<DOCOPT
9
12
  1. baiduserp -s 'keyword' # search 'keyword' and print parse result
10
13
  2. baiduserp -s 'keyword' -o output.json # -o means save result to a file
11
14
  3. baiduserp -f 'file path' # parse html source code from file
12
15
  4. baiduserp -s 'keyword' -j # search 'keyword' and print parse result in JSON format
13
- "
14
-
15
- options = {}
16
- OptionParser.new do |opts|
17
- opts.banner = usage
18
-
19
- opts.on("-s Keyword", "--search Keyword", "Search Keyword & Parse SERP") do |v|
20
- options[:keyword] = v
21
- end
22
-
23
- opts.on("-j","--jsonprint","Print result in JSON format") do |v|
24
- options[:jsonprint] = v
25
- end
26
16
 
27
- opts.on("-o Output", "--output Output", "Save Result to File in JSON format") do |v|
28
- options[:output] = v
29
- end
30
-
31
- opts.on("-f File", "--file File", "Parse Local File") do |v|
32
- options[:file] = v
33
- end
34
- end.parse!
17
+ Usage:
18
+ #{cmd} [options]
19
+
20
+ Options:
21
+ -h --help show this help message and exit
22
+ -v --version show version and exit
23
+ -a --analyse Name analyse as the given name
24
+ --keywords File use with -a; import the given keywords File before searching
25
+ -s --search Keyword search Keyword and show result
26
+ -f --file File parse local file or given url
27
+ -j --json print JSON output
28
+ -o --output File output JSON result to File
29
+
30
+ DOCOPT
31
+
32
+ begin
33
+ options = Docopt::docopt(doc, version: Baiduserp::VERSION)
34
+ # pp options
35
+ rescue Docopt::Exit => e
36
+ puts e.message
37
+ end
35
38
 
36
39
  result = ''
37
-
38
- if options[:file].nil?
39
- result = Baiduserp.search options[:keyword]
40
+ if options['--analyse']
41
+ analyse = Baiduserp.analyse(options['--analyse'])
42
+ analyse.import_keywords(options['--keywords'])
43
+ analyse.search
44
+ result = 'Analyse finished!'
45
+ elsif options['--search']
46
+ result = Baiduserp.search options['--search']
47
+ elsif options['--file']
48
+ result = Baiduserp.parse_file options['--file']
40
49
  else
41
- result = Baiduserp.parse_file options[:file]
50
+ puts "Give at least one of -a/-s/-f"
42
51
  end
43
52
 
44
- if options[:output].nil?
45
- if options[:jsonprint].nil?
46
- pp result
47
- else
48
- puts result.to_json
49
- end
53
+ if options['--json']
54
+ puts result.to_json
50
55
  else
51
- open(options[:output],'w').puts result.to_json
56
+ pp result
52
57
  end
53
58
 
59
+ open(options['--output'],'w').puts result.to_json if options['--output']
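The option dispatch in the new `bin/baiduserp` can be sketched in isolation. This is a minimal sketch, assuming a plain Hash with string keys shaped like the one docopt returns; `choose_action` is a hypothetical helper for illustration and is not part of the gem.

```ruby
# Hypothetical helper, not part of baiduserp: picks the CLI action from a
# docopt-style options Hash (string keys, nil or false when an option is unset).
def choose_action(options)
  if options['--analyse']
    # options is a Hash, so it must be indexed with square brackets;
    # calling options('--keywords') would raise NoMethodError.
    [:analyse, options['--analyse'], options['--keywords']]
  elsif options['--search']
    [:search, options['--search']]
  elsif options['--file']
    [:parse_file, options['--file']]
  else
    # None of -a/-s/-f was given.
    [:usage_error]
  end
end

p choose_action('--search' => 'keyword')
p choose_action('--analyse' => 'demo', '--keywords' => 'kw.csv')
p choose_action({})
```

Returning a small array rather than running the action directly keeps the branching testable without network access or a database.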
data/lib/baiduserp.rb CHANGED
@@ -1,5 +1,6 @@
1
1
  require "baiduserp/version"
2
- require 'baiduserp/parser'
2
+ require "baiduserp/parser"
3
+ require "baiduserp/analyser"
3
4
 
4
5
  module Baiduserp
5
6
  def self.search(keyword,page=1)
@@ -17,4 +18,8 @@ module Baiduserp
17
18
  def self.parse_file(file_path)
18
19
  Parser.new.parse_file file_path
19
20
  end
21
+
22
+ def self.analyse(name,attrs={})
23
+ Analyser.new(name,attrs)
24
+ end
20
25
  end
@@ -0,0 +1,17 @@
1
+ Sequel.migration do
2
+ up do
3
+ create_table :keywords do
4
+ primary_key :id
5
+ String :term
6
+ Integer :weight
7
+ String :category
8
+
9
+ index :term
10
+ end
11
+
12
+ end
13
+
14
+ down do
15
+ drop_table :keywords
16
+ end
17
+ end
@@ -0,0 +1,17 @@
1
+ Sequel.migration do
2
+ up do
3
+ create_table :htmls do
4
+ primary_key :id
5
+ foreign_key :keyword_id, :keywords
6
+ Date :date
7
+ String :content, :text => true
8
+
9
+ index :date
10
+ end
11
+ end
12
+
13
+ down do
14
+ drop_table :htmls
15
+ end
16
+ end
17
+
@@ -0,0 +1,16 @@
1
+ Sequel.migration do
2
+ up do
3
+ create_table :serps do
4
+ primary_key :id
5
+ foreign_key :keyword_id, :keywords
6
+ Date :date
7
+ String :content, :text => true
8
+
9
+ index :date
10
+ end
11
+ end
12
+
13
+ down do
14
+ drop_table :serps
15
+ end
16
+ end
@@ -0,0 +1,69 @@
1
+ require 'sequel'
2
+ require 'csv'
3
+ require 'date'
4
+ require 'yaml'
5
+
6
+ module Baiduserp
7
+ class Analyser
8
+ # Dir[File.expand_path('../analyser/*.rb', __FILE__)].each{|f| require f}
9
+
10
+ def initialize(name,attrs={})
11
+ @db_file = name + ".db"
12
+ @attrs = attrs
13
+ @keywords_imported = File.exist?(@db_file)
14
+
15
+ @db = Sequel.connect("sqlite://" + @db_file)
16
+
17
+ migrate!
18
+
19
+ @keywords = Class.new(Sequel::Model) do
20
+ set_dataset :keywords
21
+ end
22
+
23
+ @htmls = Class.new(Sequel::Model) do
24
+ set_dataset :htmls
25
+ end
26
+
27
+ @serps = Class.new(Sequel::Model) do
28
+ set_dataset :serps
29
+ end
30
+
31
+ import_keywords unless @keywords_imported
32
+ end
33
+
34
+ def run
35
+
36
+ end
37
+
38
+ def migrate!
39
+ Sequel.extension :migration, :core_extensions
40
+ Sequel::Migrator.apply(@db, File.expand_path('../analyser-migrations/',__FILE__))
41
+ end
42
+
43
+ def import_keywords(file=@attrs[:keywords])
44
+ CSV.foreach(file) do |l|
45
+ @keywords.insert(:term => l[0], :weight => l[1], :category => l[2])
46
+ end
47
+ end
48
+
49
+ def search(date=Date.today)
50
+ @keywords.each do |k|
51
+ next if @htmls.where(:date => date, :keyword_id => k[:id]).count > 0
52
+ puts k.to_hash
53
+ html = Baiduserp.get_search_html(k[:term])
54
+ serp = Baiduserp.parse(html)
55
+ @htmls.insert(:keyword_id => k[:id], :date => date, :content => html)
56
+ @serps.insert(:keyword_id => k[:id], :date => date, :content => YAML.dump(serp))
57
+ end
58
+ end
59
+
60
+ def _analyse_competitors(date=Date.today)
61
+ sites = Hash.new(0)
62
+ @serps.where(:date => date).each do |serp|
63
+ serp = YAML.load(serp[:content])
64
+ serp.sem_sites.each {|site| sites[site] += 1}
65
+ end
66
+ puts YAML.dump(sites)
67
+ end
68
+ end
69
+ end
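The `next if ... count > 0` guard in `Analyser#search` makes a run resumable: keywords that already have stored HTML for the given date are skipped. Below is a minimal in-memory sketch of that idea. The `Keyword` and `Html` structs and the `pending_keywords` helper are hypothetical stand-ins for the Sequel tables, not code from the gem.

```ruby
require 'date'

# Hypothetical in-memory stand-ins for rows of the keywords and htmls tables.
Keyword = Struct.new(:id, :term)
Html    = Struct.new(:keyword_id, :date)

# Returns the keywords with no stored HTML for the given date, mirroring
# the skip guard at the top of the search loop.
def pending_keywords(keywords, htmls, date = Date.today)
  fetched_ids = htmls.select { |h| h.date == date }.map(&:keyword_id)
  keywords.reject { |k| fetched_ids.include?(k.id) }
end

day      = Date.new(2013, 12, 2)
keywords = [Keyword.new(1, 'a'), Keyword.new(2, 'b')]
htmls    = [Html.new(1, day)]

p pending_keywords(keywords, htmls, day).map(&:term)
```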
@@ -41,6 +41,10 @@ module Baiduserp
41
41
  response = self.class.get_serp(url)
42
42
  end
43
43
 
44
+ if response.headers['Content-Length'].nil?
45
+ response = self.class.get_serp(url,retries)
46
+ end
47
+
44
48
  if response.headers['Content-Length'].to_i != response.body.bytesize
45
49
  issue_file = "/tmp/baiduserp_crawler_issue_#{Time.now.strftime("%Y%m%d%H%M%S")}.html"
46
50
  open(issue_file,'w').puts(response.body)
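The two checks in this hunk (a missing `Content-Length` header, then a header that disagrees with the body size) can be combined into one predicate. This is a sketch under the assumption that `headers` behaves like a plain Hash; `complete_body?` is a hypothetical name, not a method of the gem's client.

```ruby
# Hypothetical helper: true only when the Content-Length header is present
# and equals the body size in bytes.
def complete_body?(headers, body)
  length = headers['Content-Length']
  # Check for nil explicitly: nil.to_i is 0, which would otherwise make a
  # missing header look like a declared length of zero.
  return false if length.nil?
  length.to_i == body.bytesize
end

body = "\u9999\u6E2F" # two CJK characters, three bytes each in UTF-8
p complete_body?({ 'Content-Length' => '6' }, body)
p complete_body?({}, body)
```

Comparing against `bytesize` rather than `length` matters for multi-byte text, since the header counts bytes, not characters.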
@@ -8,7 +8,7 @@ require 'baiduserp/result'
8
8
 
9
9
  module Baiduserp
10
10
  class Parser
11
- Dir[File.expand_path('../../parsers/*.rb', __FILE__)].each{|f| require f}
11
+ Dir[File.expand_path('../parser/*.rb', __FILE__)].each{|f| require f}
12
12
 
13
13
  def parse(html)
14
14
  html = html.encode!('UTF-8','UTF-8',:invalid => :replace)
File without changes
@@ -3,7 +3,10 @@ class Baiduserp::Parser
3
3
  result = []
4
4
  rank = 0
5
5
 
6
- file[:doc].search('div#content_left').first.children.each do |div|
6
+ part = file[:doc].search('div#content_left').first
7
+ return result if part.nil?
8
+
9
+ part.children.each do |div|
7
10
  id = div['id'].to_i
8
11
  break if id > 0 && id < 3000
9
12
  next unless div['class'].to_s.include?('ec_pp_f')
File without changes
@@ -0,0 +1,8 @@
1
+ class Baiduserp::Parser
2
+ def _parse_pinpaizhuanqu(file)
3
+ part = file[:doc].search("div[@id='content_left']").first
4
+ return false if part.nil?
5
+
6
+ part.children[2].name == 'script'
7
+ end
8
+ end
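The new guard covers a missing `content_left` container, but `part.children[2]` could still be nil when the container has fewer than three children (assuming the node set returns nil for an out-of-range index). A minimal sketch of a fully guarded check follows; the `Node` struct and `brand_zone?` are hypothetical stand-ins, since the gem itself works on Nokogiri nodes.

```ruby
# Hypothetical stand-in for a parsed node; the gem uses Nokogiri nodes.
Node = Struct.new(:name, :children)

# Returns true only when the container exists, has a third child, and that
# child is a <script> element.
def brand_zone?(part)
  return false if part.nil?
  third = part.children[2]
  !third.nil? && third.name == 'script'
end

short = Node.new('div', [Node.new('div', []), Node.new('div', [])])
brand = Node.new('div', [Node.new('div', []), Node.new('div', []), Node.new('script', [])])

p brand_zone?(nil)
p brand_zone?(short)
p brand_zone?(brand)
```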
@@ -1,7 +1,10 @@
1
1
  class Baiduserp::Parser
2
2
  def _parse_ranks(file)
3
3
  result = []
4
- file[:doc].search("div[@id='content_left']").first.children.each do |table|
4
+ part = file[:doc].search("div[@id='content_left']").first
5
+ return result if part.nil?
6
+
7
+ part.children.each do |table|
5
8
  next if table.nil?
6
9
  id = table['id'].to_i
7
10
  next unless id > 0 && id < 3000
File without changes
File without changes
File without changes
File without changes
@@ -1,3 +1,3 @@
1
1
  module Baiduserp
2
- VERSION = "2.1.14"
2
+ VERSION = "2.2.9"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: baiduserp
3
3
  version: !ruby/object:Gem::Version
4
- version: 2.1.14
4
+ version: 2.2.9
5
5
  platform: ruby
6
6
  authors:
7
7
  - MingQian Zhang
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2013-11-28 00:00:00.000000000 Z
11
+ date: 2013-12-02 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: nokogiri
@@ -52,6 +52,34 @@ dependencies:
52
52
  - - '>='
53
53
  - !ruby/object:Gem::Version
54
54
  version: '0'
55
+ - !ruby/object:Gem::Dependency
56
+ name: sequel
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - '>='
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :runtime
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - '>='
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: docopt
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - '>='
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :runtime
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - '>='
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
55
83
  description: Parse Baidu SERP result page.
56
84
  email:
57
85
  - zmingqian@qq.com
@@ -60,24 +88,28 @@ executables:
60
88
  extensions: []
61
89
  extra_rdoc_files: []
62
90
  files:
91
+ - lib/baiduserp/analyser-migrations/001_create_keywords_table.rb
92
+ - lib/baiduserp/analyser-migrations/002_create_htmls_table.rb
93
+ - lib/baiduserp/analyser-migrations/003_create_serps_table.rb
94
+ - lib/baiduserp/analyser.rb
63
95
  - lib/baiduserp/client.rb
64
96
  - lib/baiduserp/helper.rb
97
+ - lib/baiduserp/parser/ads_right.rb
98
+ - lib/baiduserp/parser/ads_top.rb
99
+ - lib/baiduserp/parser/con_ar.rb
100
+ - lib/baiduserp/parser/pinpaizhuanqu.rb
101
+ - lib/baiduserp/parser/ranks.rb
102
+ - lib/baiduserp/parser/related_keywords.rb
103
+ - lib/baiduserp/parser/result_num.rb
104
+ - lib/baiduserp/parser/right_hotel.rb
105
+ - lib/baiduserp/parser/right_personinfo.rb
106
+ - lib/baiduserp/parser/right_relaperson.rb
107
+ - lib/baiduserp/parser/right_weather.rb
108
+ - lib/baiduserp/parser/zhixin.rb
65
109
  - lib/baiduserp/parser.rb
66
110
  - lib/baiduserp/result.rb
67
111
  - lib/baiduserp/version.rb
68
112
  - lib/baiduserp.rb
69
- - lib/parsers/ads_right.rb
70
- - lib/parsers/ads_top.rb
71
- - lib/parsers/con_ar.rb
72
- - lib/parsers/pinpaizhuanqu.rb
73
- - lib/parsers/ranks.rb
74
- - lib/parsers/related_keywords.rb
75
- - lib/parsers/result_num.rb
76
- - lib/parsers/right_hotel.rb
77
- - lib/parsers/right_personinfo.rb
78
- - lib/parsers/right_relaperson.rb
79
- - lib/parsers/right_weather.rb
80
- - lib/parsers/zhixin.rb
81
113
  - bin/baiduserp
82
114
  - README.md
83
115
  - lib/baiduserp/user_agents.yml
@@ -1,5 +0,0 @@
1
- class Baiduserp::Parser
2
- def _parse_pinpaizhuanqu(file)
3
- file[:doc].search("div[@id='content_left']").first.children[2].name == 'script'
4
- end
5
- end