human_power 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 277dc00cb5eb662ee14dd8982e83e9c2eae68259
- data.tar.gz: 4bf0e2eaf3b4f109be86812b141b788ad3185f8c
+ metadata.gz: 85d862a0641b75ed9ca834a320e636899b0a707e
+ data.tar.gz: 8526d1ddd6b2a51439f7efb4339f397c5f575b07
  SHA512:
- metadata.gz: c0d3358d62276159e40f6bc18e7b40ad49d83e1aea7c4b4c88102c7d56ec57a1c3d220ec3f8113b77c11706f7d95e1fc0c68018053b18337889b4b40341c67c1
- data.tar.gz: 5d0f9a11855094a673582851e7f244a4f605a4c592d92e103bb7d7f1c09cfa4b0305f4d6f9826d459d975fb35d35d139a6c491820a1fa41f09648a67cf360a58
+ metadata.gz: ff9ebfe171ddc808a642e6bda45e3475d0da4ea07d1f063694ae7d92807ff4cfd9042dbd643d11dcf30c6449377f5437c4c84139cd2153149c8dff7adbb347fb
+ data.tar.gz: becb2c0552d849252a8ee195cb5cb0e0dd9274ad093a3dc4ddf31dc6c64b92b6630b598d45ce8d605d54a74de7003801b03814c1b1d4ccbc61ef54b474e12b72
data/README.md CHANGED
@@ -24,7 +24,7 @@ If you are using Rails, you can add a sample *config/robots.rb* configuration fi
 
      $ rails g human_power:install
 
- It will allow crawlers to access to the whole site by default.
+ It will allow crawlers access to the whole site by default.
 
  Now you can restart your server and visit `/robots.txt` to see what's generated from the new configuration file.
 
@@ -70,6 +70,13 @@ sitemap sitemap_url
  sitemap one_url, two_url
  ```
 
+ Then visit `/robots.txt` in your browser.
+
+ ## Crawlers
+
+ Please see [user_agents.yml](https://github.com/lassebunk/human_power/blob/master/user_agents.yml) for a list of 170+ built-in user agents/crawlers you can use as shown above.
+ The list is from [UserAgentString.com](http://www.useragentstring.com/pages/Crawlerlist/).
+
  ## Caveats
 
  Human Power is great for adding rules to your robots.txt.
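
For context on the README changes above: the install generator's allow-everything default plus the `sitemap` calls produce robots.txt output along the lines of the sketch below. The URLs are placeholders, and the gem's exact formatting may differ.

```
# Every crawler may access the whole site (an empty Disallow allows everything).
User-agent: *
Disallow:

# One Sitemap line per URL passed to `sitemap`.
Sitemap: http://example.com/sitemap.xml
Sitemap: http://example.com/pages_sitemap.xml
```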
data/lib/human_power/version.rb CHANGED
@@ -1,3 +1,3 @@
  module HumanPower
-   VERSION = "0.0.2"
+   VERSION = "0.0.3"
  end
data/lib/human_power.rb CHANGED
@@ -1,7 +1,6 @@
  require "human_power/version"
  require "human_power/generator"
  require "human_power/rule"
- require "human_power/user_agents"
  require "human_power/rails" if defined?(Rails)
 
  module HumanPower
@@ -20,8 +19,17 @@ module HumanPower
    user_agents[key] = user_agent_string
  end
 
+ # Hash of registered user agents.
  def user_agents
-   @user_agents ||= DEFAULT_USER_AGENTS
+   @user_agents ||= load_user_agents
  end
+
+ private
+
+ # Loads the built-in user agents from user_agents.yml.
+ def load_user_agents
+   path = File.expand_path("../../user_agents.yml", __FILE__)
+   Hash[YAML.load(open(path).read).map { |k, v| [k.to_sym, v] }]
+ end
  end
  end
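
The new loader simply parses the YAML file and symbolizes its keys. A minimal standalone sketch of that transformation, using an inline YAML string in place of user_agents.yml:

```
require "yaml"

# Stand-in for a slice of user_agents.yml.
yaml = <<-YML
googlebot: Googlebot
yahoo_slurp: Yahoo! Slurp
YML

# Same shape as load_user_agents above: parse, then convert keys to symbols.
agents = Hash[YAML.load(yaml).map { |k, v| [k.to_sym, v] }]

agents[:googlebot]   # => "Googlebot"
agents[:yahoo_slurp] # => "Yahoo! Slurp"
```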
data/user_agents.yml ADDED
@@ -0,0 +1,170 @@
+ 008: 008
+ abacho_bot: ABACHOBot
+ accoona_ai_agent: Accoona-AI-Agent
+ add_sugar_spider_bot: AddSugarSpiderBot
+ any_apex_bot: AnyApexBot
+ arachmo: Arachmo
+ blitzbot: B-l-i-t-z-B-O-T
+ baiduspider: Baiduspider
+ become_bot: BecomeBot
+ beslist_bot: BeslistBot
+ billy_bob_bot: BillyBobBot
+ bimbot: Bimbot
+ bingbot: Bingbot
+ blitz_bot: BlitzBOT
+ boitho_com_dc: boitho.com-dc
+ boitho_com_robot: boitho.com-robot
+ btbot: btbot
+ catch_bot: CatchBot
+ cerberian_drtrs: Cerberian Drtrs
+ charlotte: Charlotte
+ convera_crawler: ConveraCrawler
+ cosmos: cosmos
+ covario_ids: Covario IDS
+ datapark_search: DataparkSearch
+ diamond_bot: DiamondBot
+ discobot: Discobot
+ dotbot: Dotbot
+ emerald_shield_com_web_bot: EmeraldShield.com WebBot
+ envolk_its_spider: envolk[ITS]spider
+ esperanza_bot: EsperanzaBot
+ exabot: Exabot
+ fast_enterprise_crawler: FAST Enterprise Crawler
+ fast_web_crawler: FAST-WebCrawler
+ fdse_robot: FDSE robot
+ find_links: FindLinks
+ furl_bot: FurlBot
+ fyber_spider: FyberSpider
+ g2crawler: g2crawler
+ gaisbot: Gaisbot
+ galaxy_bot: GalaxyBot
+ genie_bot: genieBot
+ gigabot: Gigabot
+ girafabot: Girafabot
+ googlebot: Googlebot
+ googlebot_image: Googlebot-Image
+ guruji_bot: GurujiBot
+ happy_fun_bot: HappyFunBot
+ hl_ftien_spider: hl_ftien_spider
+ holmes: Holmes
+ htdig: htdig
+ iaskspider: iaskspider
+ ia_archiver: ia_archiver
+ ic_crawler: iCCrawler
+ ichiro: ichiro
+ igde_spyder: igdeSpyder
+ irl_bot: IRLbot
+ issue_crawler: IssueCrawler
+ jaxified_bot: Jaxified Bot
+ jyxobot: Jyxobot
+ koepa_bot: KoepaBot
+ l_webis: L.webis
+ lapozz_bot: LapozzBot
+ larbin: Larbin
+ ld_spider: LDSpider
+ lexxe_bot: LexxeBot
+ linguee_bot: Linguee Bot
+ link_walker: LinkWalker
+ lmspider: lmspider
+ lwp_trivial: lwp-trivial
+ mabontland: mabontland
+ magpie_crawler: magpie-crawler
+ mediapartners_google: Mediapartners-Google
+ mj12bot: MJ12bot
+ mnogosearch: Mnogosearch
+ mogimogi: mogimogi
+ mojeek_bot: MojeekBot
+ moreoverbot: Moreoverbot
+ morning_paper: Morning Paper
+ msnbot: msnbot
+ msr_bot: MSRBot
+ mva_client: MVAClient
+ mxbot: mxbot
+ net_research_server: NetResearchServer
+ net_seer_crawler: NetSeer Crawler
+ news_gator: NewsGator
+ ng_search: NG-Search
+ nicebot: nicebot
+ noxtrumbot: noxtrumbot
+ nusearch_spider: Nusearch Spider
+ nutch_cvs: NutchCVS
+ nymesis: Nymesis
+ obot: obot
+ oegp: oegp
+ omgilibot: omgilibot
+ omni_explorer_bot: OmniExplorer_Bot
+ oozbot: OOZBOT
+ orbiter: Orbiter
+ page_bites_hyper_bot: PageBitesHyperBot
+ peew: Peew
+ polybot: polybot
+ pompos: Pompos
+ post_post: PostPost
+ psbot: Psbot
+ pyc_url: PycURL
+ qseero: Qseero
+ radian6: Radian6
+ rampy_bot: RAMPyBot
+ rufus_bot: RufusBot
+ sand_crawler: SandCrawler
+ sb_ider: SBIder
+ scout_jet: ScoutJet
+ scrubby: Scrubby
+ search_sight: SearchSight
+ seekbot: Seekbot
+ semanticdiscovery: semanticdiscovery
+ sensis_web_crawler: Sensis Web Crawler
+ seo_chat_bot: SEOChat::Bot
+ seznam_bot: SeznamBot
+ shim_crawler: Shim-Crawler
+ shop_wiki: ShopWiki
+ shoula_robot: Shoula robot
+ silk: silk
+ sitebot: Sitebot
+ snappy: Snappy
+ sogou_spider: sogou spider
+ sosospider: Sosospider
+ speedy_spider: Speedy Spider
+ sqworm: Sqworm
+ stack_rambler: StackRambler
+ suggybot: suggybot
+ survey_bot: SurveyBot
+ synoo_bot: SynooBot
+ teoma: Teoma
+ terrawiz_bot: TerrawizBot
+ the_su_bot: TheSuBot
+ thumbnail_cz_robot: Thumbnail.CZ robot
+ tin_eye: TinEye
+ truwo_gps: truwoGPS
+ turnitin_bot: TurnitinBot
+ tweeted_times_bot: TweetedTimes Bot
+ twenga_bot: TwengaBot
+ updated: updated
+ urlfilebot: Urlfilebot
+ vagabondo: Vagabondo
+ voila_bot: VoilaBot
+ vortex: Vortex
+ voyager: voyager
+ vyu2: VYU2
+ webcollage: webcollage
+ websquash_com: Websquash.com
+ wf84: wf84
+ wo_finde_ich_robot: WoFindeIch Robot
+ womlpe_factory: WomlpeFactory
+ xaldon_web_spider: Xaldon_WebSpider
+ yacy: yacy
+ yahoo_slurp: Yahoo! Slurp
+ yahoo_slurp_china: Yahoo! Slurp China
+ yahoo_seeker: YahooSeeker
+ yahoo_seeker_testing: YahooSeeker-Testing
+ yandex_bot: YandexBot
+ yandex_images: YandexImages
+ yasaklibot: Yasaklibot
+ yeti: Yeti
+ yodao_bot: YodaoBot
+ yoogli_fetch_agent: yoogliFetchAgent
+ youdao_bot: YoudaoBot
+ zao: Zao
+ zealbot: Zealbot
+ zspider: zspider
+ zy_borg: ZyBorg
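
Each key above becomes a symbol in `HumanPower.user_agents` once the gem loads the file. A usage sketch, assuming the gem is installed (`require "yaml"` is added defensively in case nothing else has loaded it):

```
require "yaml"
require "human_power"

# Built-in crawlers from user_agents.yml, keyed by symbol.
HumanPower.user_agents[:googlebot]   # => "Googlebot"
HumanPower.user_agents[:yahoo_slurp] # => "Yahoo! Slurp"
HumanPower.user_agents.length        # => 170
```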
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: human_power
  version: !ruby/object:Gem::Version
- version: 0.0.2
+ version: 0.0.3
  platform: ruby
  authors:
  - Lasse Bunk
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-12-08 00:00:00.000000000 Z
+ date: 2013-12-09 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: bundler
@@ -90,7 +90,6 @@ files:
  - lib/human_power/rails/controller.rb
  - lib/human_power/rails/engine.rb
  - lib/human_power/rule.rb
- - lib/human_power/user_agents.rb
  - lib/human_power/version.rb
  - test/dummy/README.rdoc
  - test/dummy/Rakefile
@@ -134,6 +133,7 @@ files:
  - test/generator_test.rb
  - test/rails/integration_test.rb
  - test/test_helper.rb
+ - user_agents.yml
  homepage: https://github.com/lassebunk/human_power
  licenses:
  - MIT
@@ -154,7 +154,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
    version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.1.10
+ rubygems_version: 2.0.3
  signing_key:
  specification_version: 4
  summary: Easy generation of robots.txt. Force the robots into submission!
data/lib/human_power/user_agents.rb DELETED
@@ -1,7 +0,0 @@
- module HumanPower
-   DEFAULT_USER_AGENTS = {
-     all: "*",
-     googlebot: "Googlebot",
-     bingbot: "Bingbot"
-   }
- end