baiduserp 2.1.14 → 2.2.9

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: d5fe173a0b067ff22f37a68ab8cd4d633b556630
-   data.tar.gz: e42a0a457467435480fd7cf7e82dfbfac88b9691
+   metadata.gz: 1d1c2a61259d5e3134d0566e002ba551a791b06c
+   data.tar.gz: 8b3f78ad2190a17a2f530019c0b0719bb9abfb93
  SHA512:
-   metadata.gz: c4d75fa5fb5429aaa2e9a0293203dd624b4724aab1a1a7e5f3b39dba74b40a9c2bcbd82ec21c6318bc8f8955fa70a7d8a17c7dfef57049ba4e8e8f8366d6ab55
-   data.tar.gz: 90f9f95d9f822d278f0c755ce0f4048b6201f089f999ebf9b3976981d504e35b3f1b09ddf5d85a698e002f56fd30f769689f0e8d56c737f58d815a31a53156ca
+   metadata.gz: 859d3c01765741cfe50eed92a05f5cbec93c68c003d13d2a2dad2e6b85e22d63d2171ded85fa3cd2271d7b830c522bfa95b3b27f91a269af621f811f23c3a87a
+   data.tar.gz: 5847dd8a81f4c61a0f72d2d8a2e6cd91e4aa1e9eed32f3c495453e31fe006a5c8162f902ebf2e831adb16731114fcb6b7d4a4867ed8ce575aecc39e2feca14c4
data/README.md CHANGED
@@ -1,231 +1 @@
- # Baiduserp
-
- This gem is built specifically to parse Baidu search result pages (SERPs), with the goal of extracting as much information from the SERP as possible.
- (Note: this is currently not a program for bulk keyword rank checking, but it can serve as the Baidu SERP parsing module inside such a tool.)
-
- ## Features
- ### Parses the SERP as completely as possible
- Baidu's SERP keeps getting more complex: new result styles keep appearing in the left column, and a lot of content has been added on the right.
- This gem parses the SERP into Ruby data structures.
-
- When analysing a SERP you may want many kinds of information from the page, such as SEO rankings, SEM rankings, competitor rankings,
- title and description text, related keywords, the recommendations in the right column, whether a Baidu Open Platform result is present, and so on.
-
- This gem parses out all of the above for later analysis. As usage grows, or when Baidu launches new products,
- parsers for new modules can be added.
-
-
- ### Command-line interface (useful for testing, and can output JSON)
- Besides the Ruby API, users of other programming languages can use the command-line interface, which outputs the result data as JSON.
- See below for detailed usage.
-
- ### Known issues
- This is only a basically usable version and may have all sorts of problems; bug reports are welcome. [Known issues](https://github.com/semseo/baiduserp/issues).
-
- ## Installation
-
- 1 System requirements
-
- Linux or Mac. On Linux, a recent Ubuntu or Fedora release is preferred.
-
- 2 Install Ruby
-
- Only Ruby 1.9 and above is supported. The best way to install Ruby is through [RVM](https://rvm.io/); for how to use RVM, see [http://ruby-china.org/wiki/install_ruby_guide](http://ruby-china.org/wiki/install_ruby_guide). That guide also installs some Rails-related software you do not need, but it is very detailed.
-
- On recent Ubuntu or Fedora releases you can also install Ruby 1.9 with apt-get or yum.
-
- 3 Install gem dependencies
-
- The gem depends on nokogiri, which in turn needs two system libraries.
- So on Ubuntu or Fedora you need:
-
- ```
- $ sudo apt-get install libxslt-dev libxslt libxml2-dev libxml2 # ubuntu
- $ sudo yum install libxml2-devel libxml2 libxslt libxslt-devel # fedora
- ```
-
- Once the dependencies above are installed,
-
- `$ gem install nokogiri`
-
- 4 Finally, install the baiduserp gem
-
- `$ gem install baiduserp`
-
- ## Usage
-
- Ruby example
-
- ```
- require 'baiduserp'
- require 'open-uri'
- require 'pp'
-
- pp Baiduserp.search 'keyword'
-
- pp Baiduserp.parse open(http://www.baidu.com/s?wd=keyword).read.encode('UTF-8')
-
- ```
-
- For non-Ruby programs and one-off debugging, a command-line interface is also provided, exchanging data as JSON:
-
- ```
- $ baiduserp -h
- Usage:
- 1. baiduserp -s 'keyword' # search 'keyword' and print parse result
- 2. baiduserp -s 'keyword' -o output.json # -o means save result to a file
- 3. baiduserp -f 'file path' # parse html source code from file
- 4. baiduserp -s 'keyword' -j # search 'keyword' and print parse result in JSON format
- -s, --search Keyword Search Keyword & Parse SERP
- -j, --jsonprint Print result in JSON format
- -o, --output Output Save Result to File in JSON format
- -f, --file File Parse Local File
- ```
-
- The final result is a nested structure of hashes and arrays. Example result:
-
- ```
- $ baiduserp -s 香港
- {:ads_right=>
-   [{:rank=>1,
-     :title=>"预订香港酒店上携程,全景图..",
-     :content=>"订香港酒店,享受有房保障,服务好,折扣低,返现高达201元,订香港酒店上携程超划算!",
-     :site=>"www.ctrip.com"},
-    {:rank=>2,
-     :title=>"香港-去哪儿网度假频道,聪明..",
-     :content=>"香港-去哪儿网度假频道,比价首选!180000条报价实时更新,先比价后出行!",
-     :site=>"dujia.qunar.com"},
-    {:rank=>3,
-     :title=>"香港香港怎么玩最划算?",
-     :content=>"深圳旅行社香港,香港旅游您的超值之选!天天出团港澳游专线,缤纷全程绝不",
-     :site=>"www.sztygl128.com"},
-    {:rank=>4,
-     :title=>"香港旅游攻略-150000条点评",
-     :content=>"还没来过香港?507个景点都玩遍?到到网告诉你网友怎么玩(150000张游记照片)!",
-     :site=>"www.daodao.com"},
-    {:rank=>5,
-     :title=>"香港旅游首选北京青年旅行社..",
-     :content=>"北京青年旅行社专业香港旅游旅行社,高品质服务,天天折扣价,",
-     :site=>"www.hqly8.com"},
-    {:rank=>6,
-     :title=>"香港旅游特价啦!香港旅游价..",
-     :content=>"本社邀你一起体验超值香港旅游,全程绝无强制购物,行程安排合理.",
-     :site=>"www.cctbj.net"},
-    {:rank=>7,
-     :title=>"香港旅游线路",
-     :content=>"北京旅行社提供香港咨询服务,多条精品旅游线路供您选择.",
-     :site=>"www.ctslyw.com"},
-    {:rank=>8,
-     :title=>"全新香港旅游报价,香港旅游..",
-     :content=>"北京国际旅行社,精选多条香港旅游线路,信誉保证,全程无隐性消费.",
-     :site=>"www.quly8.net"}],
-  :ads_top=>
-   [{:rank=>1,
-     :title=>"香港酒店预订 在Agoda立享1-7折",
-     :content=>"香港酒店预订,尽在Agoda,网上订购低价回馈,为您节省75%.",
-     :site=>"www.agoda.com"},
-    {:rank=>2,
-     :title=>"香港酒店预订 在Agoda立享1-7折",
-     :content=>"香港酒店预订,尽在Agoda,网上订购低价回馈,为您节省75%.",
-     :site=>"www.agoda.com"}],
-  :pinpaizhuanqu=>false,
-  :ranks=>
-   [{:rank=>1,
-     :url=>
-      "http://baike.baidu.com/link?url=Ujomxkw-4Whq7C7TI6do9nxHr3G0sO6ywJ3SZfr-lX4qQiht-2rnuGomrclwc4bJ",
-     :title=>"香港_百度百科",
-     :content=>nil,
-     :mu=>"http://baike.baidu.com/view/2607.htm",
-     :baiduopen=>false},
-    {:rank=>2,
-     :url=>"http://lvyou.baidu.com/xianggang/",
-     :title=>"2013香港旅游攻略_香港景点线路游记_百度旅游",
-     :content=>nil,
-     :mu=>"http://lvyou.baidu.com/xianggang/",
-     :baiduopen=>false},
-    {:rank=>3,
-     :url=>
-      "http://image.baidu.com/i?tn=baiduimage&ct=201326592&lm=-1&cl=2&fr=ala1&word=%CF%E3%B8%DB",
-     :title=>"香港_百度图片 - 举报图片",
-     :content=>nil,
-     :mu=>
-      "http://image.baidu.com/i?tn=baiduimage&ct=201326592&lm=-1&cl=2&fr=ala1&word=%CF%E3%B8%DB",
-     :baiduopen=>false},
-    {:rank=>4,
-     :url=>"http://www.gov.hk/sc/residents/",
-     :title=>"GovHK 香港政府一站通:本港居民",
-     :content=>
-      "香港政府为当地居民提供的资讯和服务,内容包括通讯及科技、文化、康乐及运动、教育及培训、就业、环境、政府、法律及治安、保健及医疗服务、房屋及社会服务、入境事务...",
-     :mu=>nil,
-     :baiduopen=>false},
-    {:rank=>5,
-     :url=>"http://www.baidu.com/s?rtt=2&tn=baiduwb&rn=20&cl=2&wd=%CF%E3%B8%DB",
-     :title=>"香港的最新微博结果",
-     :content=>nil,
-     :mu=>"http://www.baidu.com/s?rtt=2&tn=baiduwb&rn=20&cl=2&wd=%CF%E3%B8%DB",
-     :baiduopen=>false},
-    {:rank=>6,
-     :url=>"http://tieba.baidu.com/f?kw=%CF%E3%B8%DB&fr=ala0",
-     :title=>"香港吧 百度贴吧",
-     :content=>
-      "月活跃用户:38万人  累计发贴:202万 图片(1856)  |  视频(61)  |  精品贴(335) 香港和上海的夜景那个美?????????? 点击:439 回复:259 最近怎么那么多自以为漂亮的S!B女求认证啊。 点击:303 回复:69 什么时候,去香港不必签证,和去北京上海一样容... 点击:839 回复:187 查看更多香港吧内容>> tieba.baidu.com/香港?fr=ala0 2013-10-18",
-     :mu=>"http://tieba.baidu.com/f?kw=%CF%E3%B8%DB&fr=ala0",
-     :baiduopen=>false},
-    {:rank=>7,
-     :url=>"http://www.weather.com.cn/html/weather/101320101.shtml",
-     :title=>"香港天气预报_一周天气预报_中国天气网 - 最近访问:",
-     :content=>nil,
-     :mu=>"http://www.weather.com.cn/html/weather/101320101.shtml",
-     :baiduopen=>true},
-    {:rank=>8,
-     :url=>"http://www.mafengwo.cn/travel-scenic-spot/mafengwo/10189.html",
-     :title=>"2013香港旅游攻略,香港自助游攻略,蚂蜂窝香港出游攻略游记 - 蚂蜂窝",
-     :content=>
-      "在香港寻吃完全就是一场舌尖的盛宴,从街边小吃到世界顶级的米其林餐厅任您选择,茶餐厅、早茶、烧腊和及甜品极具港式风味,世界各地的美食料理也一个不落单。 香港...",
-     :mu=>nil,
-     :baiduopen=>false},
-    {:rank=>9,
-     :url=>"http://hongkong.cncn.com/",
-     :title=>"香港旅游攻略_香港香港旅游景点_香港旅游网",
-     :content=>
-      "香港欣欣旅游网,提供香港香港旅游景点推荐、10月香港旅游攻略、香港旅行社、香港旅游线路、香港酒店预订、香港旅游地图等出行指南及旅游服务●欣欣旅游网 CNCN.com ...",
-     :mu=>nil,
-     :baiduopen=>false},
-    {:rank=>10,
-     :url=>"http://www.baidu.com/s?tn=baidurt&rtt=1&bsst=1&wd=%CF%E3%B8%DB",
-     :title=>"香港的最新相关信息",
-     :content=>nil,
-     :mu=>"http://www.baidu.com/s?tn=baidurt&rtt=1&bsst=1&wd=%CF%E3%B8%DB",
-     :baiduopen=>false}],
-  :related_keywords=>
-   ["香港电影",
-    "香港旅游",
-    "香港天气",
-    "香港大学",
-    "香港购物",
-    "香港地图",
-    "香港电视剧",
-    "香港地铁",
-    "香港中文大学",
-    "香港苹果官网"],
-  :result_num=>100000000,
-  :right_hotel=>nil,
-  :right_personinfo=>nil,
-  :right_relaperson=>
-   [{:title=>"香港特别行政区行政区划", :names=>["油尖旺区", "九龙城区", "湾仔", "元朗区", "西贡区"]},
-    {:title=>"全球性国际金融中心", :names=>["新加坡", "纽约", "东京", "伦敦"]},
-    {:title=>"其他人还搜", :names=>["台北", "上海", "直布罗陀", "海南", "深圳"]}],
-  :right_weather=>nil}
- ```
-
- ## Contributing
-
- Contributions to help keep improving this gem are welcome:
-
- 1. Fork it
- 2. Create your feature branch (`git checkout -b my-new-feature`)
- 3. Commit your changes (`git commit -am 'Add some feature'`)
- 4. Push to the branch (`git push origin my-new-feature`)
- 5. Create new Pull Request
-
- Alternatively, go to the Issues page to report bugs, request new features, or make suggestions of any kind.
+ Parses Baidu search result pages and returns structured data for further analysis.
data/bin/baiduserp CHANGED
@@ -4,50 +4,56 @@ require 'baiduserp'
  require 'optparse'
  require 'json'
  require 'pp'
+ require 'docopt'

- usage = "Usage:
+ cmd = File.basename(__FILE__)
+
+ doc = <<DOCOPT
  1. baiduserp -s 'keyword' # search 'keyword' and print parse result
  2. baiduserp -s 'keyword' -o output.json # -o means save result to a file
  3. baiduserp -f 'file path' # parse html source code from file
  4. baiduserp -s 'keyword' -j # search 'keyword' and print parse result in JSON format
- "
-
- options = {}
- OptionParser.new do |opts|
-   opts.banner = usage
-
-   opts.on("-s Keyword", "--search Keyword", "Search Keyword & Parse SERP") do |v|
-     options[:keyword] = v
-   end
-
-   opts.on("-j","--jsonprint","Print result in JSON format") do |v|
-     options[:jsonprint] = v
-   end

-   opts.on("-o Output", "--output Output", "Save Result to File in JSON format") do |v|
-     options[:output] = v
-   end
-
-   opts.on("-f File", "--file File", "Parse Local File") do |v|
-     options[:file] = v
-   end
- end.parse!
+ Usage:
+   #{cmd} [options]
+
+ Options:
+   -h --help            show this help message and exit
+   -v --version         show version and exit
+   -a --analyse Name    analyse as the given name
+   --keywords File      uses with -a, import give keywords File before search
+   -s --search Keyword  search Keyword and show result
+   -f --file File       parse local file or given url
+   -j --json            print JSON output
+   -o --output File     output JSON result to File
+
+ DOCOPT
+
+ begin
+   options = Docopt::docopt(doc, version: Baiduserp::VERSION)
+   # pp options
+ rescue Docopt::Exit => e
+   puts e.message
+ end

  result = ''
-
- if options[:file].nil?
-   result = Baiduserp.search options[:keyword]
+ if options['--analyse']
+   analyse = Baiduserp.analyse(options['--analyse'])
+   analyse.import_keywords(options('--keywords'))
+   analyse.search
+   result = 'Analyse finished!'
+ elsif options['--search']
+   result = Baiduserp.search options['--search']
+ elsif options['--file']
+   result = Baiduserp.parse_file options['--file']
  else
-   result = Baiduserp.parse_file options[:file]
+   puts "At least given one of -a/-s/-f"
  end

- if options[:output].nil?
-   if options[:jsonprint].nil?
-     pp result
-   else
-     puts result.to_json
-   end
+ if options['--json']
+   puts result.to_json
  else
-   open(options[:output],'w').puts result.to_json
+   pp result
  end

+ open(options['--output'],'w').puts result.to_json if options['--output']
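
The `-a/-s/-f` dispatch in this hunk can be sketched on its own. This is a minimal sketch, not the gem's code: it assumes the options arrive as a plain Hash keyed by the long flag names, and `choose_action` is a hypothetical helper introduced only for illustration. One detail worth noting: the hunk reads the keywords file with `options('--keywords')`, which Ruby parses as a method call rather than a Hash lookup, so the sketch uses `options['--keywords']` instead.

```ruby
# Hypothetical helper: map a docopt-style options Hash to the action the
# script would take. Keys mirror the long flags in the help text.
def choose_action(options)
  if options['--analyse']
    [:analyse, options['--analyse'], options['--keywords']]
  elsif options['--search']
    [:search, options['--search']]
  elsif options['--file']
    [:parse_file, options['--file']]
  else
    [:usage_error]
  end
end

opts = { '--analyse' => 'demo', '--keywords' => 'kw.csv',
         '--search' => nil, '--file' => nil }
p choose_action(opts)

# Square brackets look up a key; parentheses turn the name into a method
# call, which fails because no method called `opts` exists.
begin
  opts('--keywords')
rescue NoMethodError => e
  puts "NoMethodError for: #{e.name}"
end
```

Returning a small array instead of acting directly keeps the branch order easy to check without touching the network or the database.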
data/lib/baiduserp.rb CHANGED
@@ -1,5 +1,6 @@
  require "baiduserp/version"
- require 'baiduserp/parser'
+ require "baiduserp/parser"
+ require "baiduserp/analyser"

  module Baiduserp
    def self.search(keyword,page=1)
@@ -17,4 +18,8 @@ module Baiduserp
    def self.parse_file(file_path)
      Parser.new.parse_file file_path
    end
+
+   def self.analyse(name,attrs={})
+     Analyser.new(name,attrs)
+   end
  end
@@ -0,0 +1,17 @@
+ Sequel.migration do
+   up do
+     create_table :keywords do
+       primary_key :id
+       String :term
+       Integer :weight
+       String :category
+
+       index :term
+     end
+
+   end
+
+   down do
+     drop_table :keywords
+   end
+ end
@@ -0,0 +1,17 @@
+ Sequel.migration do
+   up do
+     create_table :htmls do
+       primary_key :id
+       foreign_key :keyword_id, :keywords
+       Date :date
+       String :content, :text => true
+
+       index :date
+     end
+   end
+
+   down do
+     drop_table :htmls
+   end
+ end
+
@@ -0,0 +1,16 @@
+ Sequel.migration do
+   up do
+     create_table :serps do
+       primary_key :id
+       foreign_key :keyword_id, :keywords
+       Date :date
+       String :content, :text => true
+
+       index :date
+     end
+   end
+
+   down do
+     drop_table :serps
+   end
+ end
@@ -0,0 +1,69 @@
+ require 'sequel'
+ require 'csv'
+ require 'date'
+ require 'yaml'
+
+ module Baiduserp
+   class Analyser
+     # Dir[File.expand_path('../analyser/*.rb', __FILE__)].each{|f| require f}
+
+     def initialize(name,attrs={})
+       @db_file = name + ".db"
+       @attrs = attrs
+       @keywords_imported = File.exists?(@db_file)
+
+       @db = Sequel.connect("sqlite://" + @db_file)
+
+       migrate!
+
+       @keywords = Class.new(Sequel::Model) do
+         set_dataset :keywords
+       end
+
+       @htmls = Class.new(Sequel::Model) do
+         set_dataset :htmls
+       end
+
+       @serps = Class.new(Sequel::Model) do
+         set_dataset :serps
+       end
+
+       import_keywords unless @keywords_imported
+     end
+
+     def run
+
+     end
+
+     def migrate!
+       Sequel.extension :migration, :core_extensions
+       Sequel::Migrator.apply(@db, File.expand_path('../analyser-migrations/',__FILE__))
+     end
+
+     def import_keywords(file=@attrs[:keywords])
+       CSV.foreach(file) do |l|
+         @keywords.insert(:term => l[0], :weight => l[1], :category => l[2])
+       end
+     end
+
+     def search(date=Date.today)
+       @keywords.each do |k|
+         next if @htmls.where(:date => date, :keyword_id => k[:id]).count > 0
+         puts k.to_hash
+         html = Baiduserp.get_search_html(k[:term])
+         serp = Baiduserp.parse(html)
+         @htmls.insert(:keyword_id => k[:id], :date => date, :content => html)
+         @serps.insert(:keyword_id => k[:id], :date => date, :content => YAML.dump(serp))
+       end
+     end
+
+     def _analyse_competitors(date=Date.today)
+       sites = Hash.new(0)
+       @serps.where(:date => date).each do |serp|
+         serp = YAML.load(serp[:content])
+         serp.sem_sites.each {|site| sites[site] += 1}
+       end
+       puts YAML.dump(sites)
+     end
+   end
+ end
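
The crawl-then-tally flow of `search` and `_analyse_competitors` can be illustrated without a database. The following is a sketch under stated assumptions, not the gem's implementation: rows are plain hashes held in arrays rather than Sequel models, dates are ISO strings, and `crawl`, `tally_sites`, and the `fetch` lambda are hypothetical names standing in for the network call and the parser. It stores only plain hashes and strings in YAML, because newer Psych releases restrict which classes `YAML.load` will revive by default.

```ruby
require 'yaml'

# Hypothetical stand-in for the crawl loop: one stored row per keyword and date.
def crawl(keywords, serps, date, fetch)
  keywords.each do |k|
    # Skip keywords that already have a row for this date, so a rerun resumes
    # where the previous run stopped instead of fetching everything again.
    next if serps.any? { |s| s[:keyword_id] == k[:id] && s[:date] == date }

    parsed = fetch.call(k[:term])
    serps << { keyword_id: k[:id], date: date, content: YAML.dump(parsed) }
  end
  serps
end

# Count how often each site appears across the stored results for one date.
def tally_sites(serps, date)
  sites = Hash.new(0)
  serps.select { |s| s[:date] == date }.each do |row|
    YAML.load(row[:content])['sem_sites'].each { |site| sites[site] += 1 }
  end
  sites
end

keywords = [{ id: 1, term: 'a' }, { id: 2, term: 'b' }]
fetch = ->(term) { { 'sem_sites' => ["#{term}.example", 'shared.example'] } }

serps = crawl(keywords, [], '2013-12-02', fetch)
crawl(keywords, serps, '2013-12-02', fetch) # second run adds no rows
puts serps.size
p tally_sites(serps, '2013-12-02')
```

Keying the skip check on both keyword and date is what makes the loop safe to rerun after an interruption.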
@@ -41,6 +41,10 @@ module Baiduserp
          response = self.class.get_serp(url)
        end

+       if response.headers['Content-Length'].nil?
+         response = self.class.get_serp(url,retries)
+       end
+
        if response.headers['Content-Length'].to_i != response.body.bytesize
          issue_file = "/tmp/baiduserp_crawler_issue_#{Time.now.strftime("%Y%m%d%H%M%S")}.html"
          open(issue_file,'w').puts(response.body)
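
The completeness check in this hunk compares the declared `Content-Length` with the number of bytes actually received. Below is a minimal sketch of that idea, assuming a hypothetical `Response` struct and a hypothetical `fetch_with_retries` wrapper whose block stands in for the HTTP request; neither is part of the gem. The missing-header case is handled explicitly because `nil.to_i` is `0`, which would otherwise only match an empty body.

```ruby
# Hypothetical response stand-in; real code would get this from an HTTP client.
Response = Struct.new(:headers, :body)

# True when the declared Content-Length matches the bytes received.
def complete?(response)
  declared = response.headers['Content-Length']
  return false if declared.nil?
  declared.to_i == response.body.bytesize
end

# Hypothetical retry wrapper: call the block again while the response looks
# truncated and retries remain.
def fetch_with_retries(retries)
  response = yield
  while !complete?(response) && retries > 0
    retries -= 1
    response = yield
  end
  response
end

body = '香港'
# Content-Length counts bytes, not characters, hence bytesize.
puts "chars=#{body.length} bytes=#{body.bytesize}"

attempts = [Response.new({}, 'partial'),
            Response.new({ 'Content-Length' => body.bytesize.to_s }, body)]
result = fetch_with_retries(3) { attempts.shift }
p complete?(result)
```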
@@ -8,7 +8,7 @@ require 'baiduserp/result'

  module Baiduserp
    class Parser
-     Dir[File.expand_path('../../parsers/*.rb', __FILE__)].each{|f| require f}
+     Dir[File.expand_path('../parser/*.rb', __FILE__)].each{|f| require f}

      def parse(html)
        html = html.encode!('UTF-8','UTF-8',:invalid => :replace)
File without changes
@@ -3,7 +3,10 @@ class Baiduserp::Parser
      result = []
      rank = 0

-     file[:doc].search('div#content_left').first.children.each do |div|
+     part = file[:doc].search('div#content_left').first
+     return result if part.nil?
+
+     part.children.each do |div|
        id = div['id'].to_i
        break if id > 0 && id < 3000
        next unless div['class'].to_s.include?('ec_pp_f')
File without changes
@@ -0,0 +1,8 @@
+ class Baiduserp::Parser
+   def _parse_pinpaizhuanqu(file)
+     part = file[:doc].search("div[@id='content_left']").first
+     return false if part.nil?
+
+     part.children[2].name == 'script'
+   end
+ end
@@ -1,7 +1,10 @@
  class Baiduserp::Parser
    def _parse_ranks(file)
      result = []
-     file[:doc].search("div[@id='content_left']").first.children.each do |table|
+     part = file[:doc].search("div[@id='content_left']").first
+     return result if part.nil?
+
+     part.children.each do |table|
        next if table.nil?
        id = table['id'].to_i
        next unless id > 0 && id < 3000
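
The parser hunks above all add the same guard: take the first match, return an empty result if there is none, and only then walk its children. A minimal sketch of that pattern follows, assuming a hypothetical `Node` struct in place of a parsed HTML node and a hypothetical `organic_ids` helper; it does not use nokogiri.

```ruby
# Stand-in for a parsed node; only the fields the sketch needs.
Node = Struct.new(:id, :children)

# Hypothetical helper mirroring the guard: an empty match list yields an
# empty result instead of a NoMethodError on nil.
def organic_ids(matches)
  result = []
  part = matches.first # nil when the search matched nothing
  return result if part.nil?

  part.children.each do |child|
    id = child.id.to_i # non-numeric ids become 0 and are skipped
    next unless id > 0 && id < 3000
    result << id
  end
  result
end

page = [Node.new('content_left',
                 [Node.new('1', []), Node.new('ad', []), Node.new('3001', [])])]
p organic_ids(page)
p organic_ids([])
```

Returning the same empty type as the normal path means callers do not need a separate nil check.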
File without changes
File without changes
File without changes
File without changes
@@ -1,3 +1,3 @@
  module Baiduserp
-   VERSION = "2.1.14"
+   VERSION = "2.2.9"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: baiduserp
  version: !ruby/object:Gem::Version
-   version: 2.1.14
+   version: 2.2.9
  platform: ruby
  authors:
  - MingQian Zhang
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-11-28 00:00:00.000000000 Z
+ date: 2013-12-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: nokogiri
@@ -52,6 +52,34 @@ dependencies:
      - - '>='
        - !ruby/object:Gem::Version
          version: '0'
+ - !ruby/object:Gem::Dependency
+   name: sequel
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: docopt
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
  description: Parse Baidu SERP result page.
  email:
  - zmingqian@qq.com
@@ -60,24 +88,28 @@ executables:
  extensions: []
  extra_rdoc_files: []
  files:
+ - lib/baiduserp/analyser-migrations/001_create_keywords_table.rb
+ - lib/baiduserp/analyser-migrations/002_create_htmls_table.rb
+ - lib/baiduserp/analyser-migrations/003_create_serps_table.rb
+ - lib/baiduserp/analyser.rb
  - lib/baiduserp/client.rb
  - lib/baiduserp/helper.rb
+ - lib/baiduserp/parser/ads_right.rb
+ - lib/baiduserp/parser/ads_top.rb
+ - lib/baiduserp/parser/con_ar.rb
+ - lib/baiduserp/parser/pinpaizhuanqu.rb
+ - lib/baiduserp/parser/ranks.rb
+ - lib/baiduserp/parser/related_keywords.rb
+ - lib/baiduserp/parser/result_num.rb
+ - lib/baiduserp/parser/right_hotel.rb
+ - lib/baiduserp/parser/right_personinfo.rb
+ - lib/baiduserp/parser/right_relaperson.rb
+ - lib/baiduserp/parser/right_weather.rb
+ - lib/baiduserp/parser/zhixin.rb
  - lib/baiduserp/parser.rb
  - lib/baiduserp/result.rb
  - lib/baiduserp/version.rb
  - lib/baiduserp.rb
- - lib/parsers/ads_right.rb
- - lib/parsers/ads_top.rb
- - lib/parsers/con_ar.rb
- - lib/parsers/pinpaizhuanqu.rb
- - lib/parsers/ranks.rb
- - lib/parsers/related_keywords.rb
- - lib/parsers/result_num.rb
- - lib/parsers/right_hotel.rb
- - lib/parsers/right_personinfo.rb
- - lib/parsers/right_relaperson.rb
- - lib/parsers/right_weather.rb
- - lib/parsers/zhixin.rb
  - bin/baiduserp
  - README.md
  - lib/baiduserp/user_agents.yml
@@ -1,5 +0,0 @@
- class Baiduserp::Parser
-   def _parse_pinpaizhuanqu(file)
-     file[:doc].search("div[@id='content_left']").first.children[2].name == 'script'
-   end
- end