bot_challenge_page 0.2.0 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 826b15d243c27b3003ad7c37650836ee67536319177d947b4a046eb200914df7
-   data.tar.gz: 4709e5302b7c4298fc06bb490646273cce5a916e3337ed7d788c5390f345777e
+   metadata.gz: a1913d93cd52d599d33f7217bb3714fbf67c82d08a888a3025cc30e51c51e438
+   data.tar.gz: 6e06e420e625a069132ae89a3ef733c1b2c246878e771e3b459ea0cdc351d14e
  SHA512:
-   metadata.gz: e6e260de1875ebbb96a8b809907b50f8d92abe1a9e1d6aee711fd7f3f08567ba627325a035ac08b67f45e42468ad982a85b2f4c45c6709460abe186acbb71b6a
-   data.tar.gz: 3a11694e124de65ca11df02d6fc3aa38b7ee4d93881b2eba64f89a6c30924f3bee3d98297cc8ff9340aa029823bf5cef9e19077cad3b8acfc39aa9cc57e7fe29
+   metadata.gz: 11504f2622783e4ca4bfb06b981df28260c1903e04a8900fe1797d34cd05bc29bcf3b2a1a6b9c93b85ae0f3a4639ff115699d4a548f6912b839bee62147e595c
+   data.tar.gz: 229ed02d6173b651ffd3cbef26e6444a974a14a1e9e3390d76567774a2385ad38cb4e1c289a78dda2c66f57d0e6a54e375506c9826cdbd7df7f799a7b96f03ff
data/README.md CHANGED
@@ -27,6 +27,8 @@ The motivating use case is fairly dumb (probably AI-related) crawlers, rather th

  * If you do not want to use rack-attack and want challenge on FIRST request, `rails g bot_challenge_page:install --no-rack-attack`

+ * By default, challenge pages are rendered "inline" at the protected URL. To redirect to a separate challenge page URL instead, pass `--redirect-for-challenge` to the generator.
+
  * If you are **not using rack-attack**, you need to add a before_action to the controller(s)
  you'd like to protect, eg:

@@ -59,10 +61,41 @@ To customize the layout or challenge page HTML more further, you can use configu
  ```ruby
  BotChallengePage::BotChallengePageController.bot_challenge_config.challenge_renderer = ->() {
    render "my_local_view_folder/whatever", layout: "another_layout"
-   render layout: "another_layout" # default html but change layout. etc.
  }
  ```

+ ## Logging
+
+ By default we log when a challenge result is submitted to the back-end; you can find challenge passes or failures by searching your logs for `BotChallengePage`.
+
+ We do not log when a challenge is issued -- experience shows challenge issues far outnumber challenge results, and can fill up the logs too fast.
+
+ If you'd like to log or observe challenge issues, you can configure a proc that is executed
+ in the context of the controller, and is called when a page is blocked by a challenge.
+
+ ```ruby
+ BotChallengePage::BotChallengePageController.bot_challenge_config.after_blocked = ->(_bot_challenge_class) {
+   logger.info("page blocked by challenge: #{request.url}")
+ }
+ ```
+
+ Or, here's how I managed to get it in [lograge](https://github.com/roidrage/lograge), so that a blocked page results in a `bot_chlng=true` param in a lograge line.
+
+ ```ruby
+ BotChallengePage::BotChallengePageController.bot_challenge_config.after_blocked =
+   ->(bot_detect_class) {
+     request.env["bot_detect.blocked_for_challenge"] = true
+   }
+
+
+ # production.rb
+ config.lograge.custom_payload do |controller|
+   {
+     bot_chlng: controller.request.env["bot_detect.blocked_for_challenge"]
+   }.compact
+ end
+ ```
+
  ## Example possible Blacklight config

  Many of us in my professional community use [blacklight](https://github.com/projectblacklight/blacklight). Here's a possible sample blacklight config to:
@@ -88,13 +121,12 @@ Rails.application.config.to_prepare do
  BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limit_count = 3

  BotChallengePage::BotChallengePageController.allow_exempt = ->(controller) {
-   # Exempt any Catalog #facet action that looks like an ajax/fetch request, the redirect
-   # ain't gonna work there, we just exempt it.
+   # Exempt any Catalog #facet or #range_limit action that looks like an ajax/fetch request; the challenge isn't going to work there, so we just exempt it.
    #
    # sec-fetch-dest is set to 'empty' by browser on fetch requests, to limit us further;
    # sure an attacker could fake it, we don't mind if someone determined can avoid
    # bot challenge on this one action
-   ( controller.params[:action] == "facet" &&
+   ( controller.params[:action].in?(["facet", "range_limit"]) &&
      controller.request.headers["sec-fetch-dest"] == "empty" &&
      controller.kind_of?(CatalogController)
    )
@@ -117,6 +149,8 @@ Locally one way to test with a specific rails version appraisal is `bundle exec

  If you make any changes to `Gemfile` you may need to run `bundle exec appraisal install` and commit changes.

+ **One reason tests are slow** is, I think, that we're running system tests with real turnstile proof-of-work bot detection JS code? (Or is it, when we are using a CF turnstile testing key that always passes?) There aren't many tests so it's no big deal, but this is something that could potentially be investigated/optimized more.
+
  ## Possible future features?

  * allow regex in default location_matcher? Easy to do if you want it, just say so.
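
The README's new Logging section says the `after_blocked` proc is "executed in the context of the controller". A minimal plain-Ruby sketch of what that means follows. `FakeRequest`, `FakeController`, `log_lines`, and `"DemoFilter"` are hypothetical stand-ins for illustration, not the gem's classes; the only real mechanism shown is Ruby's `instance_exec`, which the diff's filter code also calls.

```ruby
# Hypothetical stand-ins for a Rails request and controller.
class FakeRequest
  attr_reader :url, :env

  def initialize(url)
    @url = url
    @env = {}
  end
end

class FakeController
  attr_reader :request, :log_lines

  def initialize(url)
    @request = FakeRequest.new(url)
    @log_lines = []
  end
end

# A hook written the way the README writes it: `request` has no explicit
# receiver, so it only resolves when the lambda runs inside a controller.
after_blocked = ->(filter_class) {
  request.env["bot_detect.blocked_for_challenge"] = true
  log_lines << "page blocked by #{filter_class}: #{request.url}"
}

controller = FakeController.new("https://example.org/catalog?q=cats")

# instance_exec rebinds self to the controller for the duration of the call
# and passes its arguments through to the lambda.
controller.instance_exec("DemoFilter", &after_blocked)

puts controller.log_lines.first
```

Because the hook sees the controller as `self`, it can reach `request`, `params`, or `session` without the gem having to pass each one in explicitly.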
@@ -19,9 +19,6 @@ module BotChallengePage
    # different paths in your app if you like, is why config is with controller
    class_attribute :bot_challenge_config, default: ::BotChallengePage::Config.new

-   delegate :cf_turnstile_js_url, :cf_turnstile_sitekey, :still_around_delay_ms, to: :bot_challenge_config
-   helper_method :cf_turnstile_js_url, :cf_turnstile_sitekey, :still_around_delay_ms
-
    SESSION_DATETIME_KEY = "t"
    SESSION_IP_KEY = "i"

@@ -29,19 +26,22 @@ module BotChallengePage
    class_attribute :_track_notification_subscription, instance_accessor: false


+   # only used if config.redirect_for_challenge is true
    def challenge
-     # possible custom render to choose layouts or templates, but normally
-     # we just do default rails render and this proc is empty.
-     if self.bot_challenge_config.challenge_renderer
-       instance_exec &self.bot_challenge_config.challenge_renderer
-     end
+     # possible custom render to choose layouts or templates, but
+     # default is what would be default template for this action
+     #
+     # We put it in instancevar as a hacky way of passing to template that can be fulfilled
+     # both here and in arbitrary controllers for direct render.
+     @bot_challenge_config = bot_challenge_config
+     instance_exec &self.bot_challenge_config.challenge_renderer
    end

    def verify_challenge
      body = {
        secret: self.bot_challenge_config.cf_turnstile_secret_key,
        response: params["cf_turnstile_response"],
-       remoteip: request.remote_ip
+       remoteip: request.remote_ip,
      }

      http = HTTP.timeout(self.bot_challenge_config.cf_timeout)
@@ -64,7 +64,10 @@ module BotChallengePage
        Rails.logger.warn("#{self.class.name}: Cloudflare Turnstile validation failed (#{request.remote_ip}, #{request.user_agent}): #{result}: #{params["dest"]}")
      end

-     # let's just return the whole thing to client? Is there anything confidential there?
+     # add config needed by JS to result
+     result["redirect_for_challenge"] = self.bot_challenge_config.redirect_for_challenge
+
+     # and let's just return the whole thing to client? Is there anything confidential there?
      render json: result
    rescue HTTP::Error, JSON::ParserError => e
      # probably an http timeout? or something weird.
@@ -25,9 +25,20 @@ module BotChallengePage
        return
      end

-     Rails.logger.info("#{self.name}: Cloudflare Turnstile challenge redirect: (#{controller.request.remote_ip}, #{controller.request.user_agent}): from #{controller.request.url}")
-     # status code temporary
-     controller.redirect_to controller.bot_detect_challenge_path(dest: controller.request.original_fullpath), status: 307
+     # Prevent caching of bot challenge page
+     controller.response.headers["Cache-Control"] = "no-store"
+
+     if self.bot_challenge_config.redirect_for_challenge
+       # status code temporary
+       controller.redirect_to controller.bot_detect_challenge_path(dest: controller.request.original_fullpath), status: 307
+     else
+       # hacky way to get config to view template in an arbitrary controller, good enough for now
+       controller.instance_variable_set("@bot_challenge_config", self.bot_challenge_config) unless controller.instance_variable_get("@bot_challenge_config")
+       controller.instance_exec &self.bot_challenge_config.challenge_renderer
+     end
+
+     # allow app to see and log if desired
+     controller.instance_exec(self, &self.bot_challenge_config.after_blocked)
    end
  end

@@ -27,6 +27,10 @@ module BotChallengePage
      end
    end

+   # Should we redirect to a challenge page (true) or just display it inline
+   # with a 403 status (false)
+   attribute :redirect_for_challenge, default: false
+
    attribute :enabled, default: false # Must set to true to turn on at all

    attribute :cf_turnstile_sitekey, default: "1x00000000000000000000AA" # a testing key that always passes
@@ -55,7 +59,11 @@ module BotChallengePage
    attribute :allow_exempt, default: ->(controller, config) { false }

    # replace with say `->() { render layout: 'something' }`, or `render "somedir/some_template"`
-   attribute :challenge_renderer, default: nil
+   attribute :challenge_renderer, default: ->() {
+     render "bot_challenge_page/bot_challenge_page/challenge", status: 403
+   }
+
+   attribute :after_blocked, default: ->(bot_detect_class) {}


    # rate limit per subnet, following lehigh's lead, although we use a smaller
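
Taken together, the filter and config hunks above give a blocked request one of two outcomes, depending on `redirect_for_challenge`. The following is only a sketch of that branch in plain Ruby, assuming the `/challenge` route that the generator installs. `ChallengeResponse` and `challenge_response` are hypothetical names for illustration; the gem itself calls `redirect_to` or the configured renderer on a real controller.

```ruby
require "uri"

# Hypothetical value object describing the response; not part of the gem.
ChallengeResponse = Struct.new(:status, :location, :headers)

# Hypothetical helper mirroring the branch in the filter code.
def challenge_response(redirect_for_challenge:, original_fullpath:)
  # Set in both modes so a challenge page is never cached.
  headers = { "Cache-Control" => "no-store" }

  if redirect_for_challenge
    # Redirect mode: temporary redirect to the challenge URL, carrying the
    # original path in `dest` so the browser can return after passing.
    dest = URI.encode_www_form_component(original_fullpath)
    ChallengeResponse.new(307, "/challenge?dest=#{dest}", headers)
  else
    # Inline mode (the default): the challenge is rendered at the protected
    # URL itself with a 403 status, and the page reloads after passing.
    ChallengeResponse.new(403, nil, headers)
  end
end

inline = challenge_response(redirect_for_challenge: false, original_fullpath: "/catalog?q=cats")
redirect = challenge_response(redirect_for_challenge: true, original_fullpath: "/catalog?q=cats")

puts inline.status
puts redirect.location
```

Inline mode keeps the protected URL in the address bar, which is why the browser-side script only needs `window.location.reload()` in that case.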
@@ -1,6 +1,7 @@
+ <%# locals: (bot_challenge_config:) -%>
+
  <%# we deliver our simple javascript as inline script to make deployment more
  reliable without having to deal with different asset pipelines, and it's really a fine choice anyway %>
-
  <script type="text/javascript">
    async function turnstileCallback(token) {
      try {
@@ -31,12 +32,6 @@

        result = await response.json();
        if (result["success"] == true) {
-         const dest = new URLSearchParams(window.location.search).get("dest");
-         // For security make sure it only has path and on
-         if (!dest.startsWith("/") || dest.startsWith("//")) {
-           throw new Error("illegal non-local redirect: " + dest);
-         }
-
          // in case this page stays around, (say it was redirect to media asset), let's add a failsafe message after
          // a couple seconds.
          const delay = document.querySelector("#botChallengePageStillAroundTemplate")?.getAttribute("data-still-around-delay-ms") || 1200;
@@ -44,8 +39,19 @@
            _displayStillAroundNote()
          }, delay);

-         // replace the challenge page in history
-         window.location.replace(dest);
+         if (result["redirect_for_challenge"] == true) {
+           const dest = new URLSearchParams(window.location.search).get("dest");
+           // For security make sure it only has path and on
+           if (!dest.startsWith("/") || dest.startsWith("//")) {
+             throw new Error("illegal non-local redirect: " + dest);
+           }
+
+           // replace the challenge page in history
+           window.location.replace(dest);
+         } else {
+           // just need to reload and now we'll get through
+           window.location.reload();
+         }
        } else {
          console.error("Turnstile response reported as failure: " + JSON.stringify(result))
          _displayChallengeError();
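
The `dest` check in the script above accepts only values that start with a single `/`, which rules out absolute URLs and protocol-relative `//host` forms. For readers who want the same guard on the server side, here is a rough Ruby rendition. `local_redirect_path?` is a hypothetical helper name, not something the gem provides, and treating a missing `dest` as rejected is an assumption of this sketch.

```ruby
# Hypothetical helper; the gem performs this check in browser JavaScript.
def local_redirect_path?(dest)
  # Assumption: a missing or non-string dest is rejected outright.
  return false unless dest.is_a?(String)

  # Must begin with exactly one slash: "/path" is local, "//host" is not.
  dest.start_with?("/") && !dest.start_with?("//")
end

candidates = ["/catalog?q=cats", "//evil.example/x", "https://evil.example/", "", nil]
candidates.each do |dest|
  puts "#{dest.inspect} => #{local_redirect_path?(dest)}"
end
```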
@@ -1,5 +1,6 @@
+ <%# locals: (bot_challenge_config:) -%>
  <div
    class="cf-turnstile"
-   data-sitekey="<%= cf_turnstile_sitekey %>"
+   data-sitekey="<%= bot_challenge_config.cf_turnstile_sitekey %>"
    data-callback="turnstileCallback"
  ></div>
@@ -1,7 +1,7 @@
  <div class="bot_challenge_page">
    <h1 class="mb-4"><%= t('bot_challenge_page.title') %></h1>

-   <%= render "bot_challenge_page/turnstile_widget_placeholder" %>
+   <%= render "bot_challenge_page/turnstile_widget_placeholder", bot_challenge_config: @bot_challenge_config %>

    <noscript>
      <div class="alert alert-danger"><%= t('bot_challenge_page.noscript') %></div>
@@ -16,14 +16,14 @@
      </div>
    </template>

-   <template id="botChallengePageStillAroundTemplate" data-still_around_delay_ms="<%= still_around_delay_ms %>">
+   <template id="botChallengePageStillAroundTemplate" data-still_around_delay_ms="<%= @bot_challenge_config.still_around_delay_ms %>">
      <div class="alert alert-info" role="alert">
        <i class="fa fa-info-circle" aria-hidden="true"></i>
        <%= t('bot_challenge_page.still_around') %>
      </div>
    </template>

-   <script src="<%= cf_turnstile_js_url %>" async defer></script>
+   <script src="<%= @bot_challenge_config.cf_turnstile_js_url %>" async defer></script>

-   <%= render "bot_challenge_page/local_turnstile_script_tag" %>
+   <%= render "bot_challenge_page/local_turnstile_script_tag", bot_challenge_config: @bot_challenge_config %>
  </div>
@@ -1,3 +1,3 @@
  module BotChallengePage
-   VERSION = "0.2.0"
+   VERSION = "0.3.0"
  end
@@ -3,10 +3,14 @@ module BotChallengePage
    source_root File.expand_path("templates", __dir__)

    class_option :'rack_attack', type: :boolean, default: true, desc: "Support rate-limit allowance configuration"
+   class_option :redirect_for_challenge, type: :boolean, default: false, desc: "Redirect to separate challenge page instead of inline challenge"

    def generate_routes
-     route 'get "/challenge", to: "bot_challenge_page/bot_challenge_page#challenge", as: :bot_detect_challenge'
-     route 'post "/challenge", to: "bot_challenge_page/bot_challenge_page#verify_challenge"'
+     route 'post "/challenge", to: "bot_challenge_page/bot_challenge_page#verify_challenge", as: :bot_detect_challenge'
+
+     if options[:redirect_for_challenge]
+       route 'get "/challenge", to: "bot_challenge_page/bot_challenge_page#challenge"'
+     end
    end

    def add_before_filter_enforcement
@@ -4,9 +4,14 @@ Rails.application.config.to_prepare do

    # Get from CloudFlare Turnstile: https://www.cloudflare.com/application-services/products/turnstile/
    # Some testing keys are also available: https://developers.cloudflare.com/turnstile/troubleshooting/testing/
+   #
+   # Always pass testing sitekey: "1x00000000000000000000AA"
    BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_sitekey = "MUST GET"
+   # Always pass testing secret_key: "1x0000000000000000000000000000000AA"
    BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_secret_key = "MUST GET"

+   BotChallengePage::BotChallengePageController.bot_challenge_config.redirect_for_challenge = <%= options[:redirect_for_challenge] %>
+
    <%- if options[:rack_attack] %>
    # What paths do you want to protect?
    #
@@ -33,7 +38,7 @@ Rails.application.config.to_prepare do
    # BotChallengePage::BotChallengePageController.bot_challenge_config.session_passed_good_for = 36.hours

    # Exempt some requests from bot challenge protection
-   # BotChallengePage::BotChallengePageController.allow_exempt = ->(controller) {
+   # BotChallengePage::BotChallengePageController.bot_challenge_config.allow_exempt = ->(controller) {
    #   # controller.params
    #   # controller.request
    #   # controller.session
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: bot_challenge_page
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.3.0
  platform: ruby
  authors:
  - Jonathan Rochkind
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2025-03-03 00:00:00.000000000 Z
+ date: 2025-03-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: appraisal