bot_challenge_page 0.2.0 → 0.3.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 826b15d243c27b3003ad7c37650836ee67536319177d947b4a046eb200914df7
4
- data.tar.gz: 4709e5302b7c4298fc06bb490646273cce5a916e3337ed7d788c5390f345777e
3
+ metadata.gz: 2e582326d0a139a9407ccd31a0bbb822d99d51a7040093d22427d563997dbea8
4
+ data.tar.gz: 68107c7fc6a9aa2d8e3e079e401708c740e3cf6ee840b7be6cc4f7c85e8b64df
5
5
  SHA512:
6
- metadata.gz: e6e260de1875ebbb96a8b809907b50f8d92abe1a9e1d6aee711fd7f3f08567ba627325a035ac08b67f45e42468ad982a85b2f4c45c6709460abe186acbb71b6a
7
- data.tar.gz: 3a11694e124de65ca11df02d6fc3aa38b7ee4d93881b2eba64f89a6c30924f3bee3d98297cc8ff9340aa029823bf5cef9e19077cad3b8acfc39aa9cc57e7fe29
6
+ metadata.gz: 3748c684eeca9e6e66b50fa5e0e5fe4b97c328862799a9dee8fc19632974df5c4489e076eede3d67f0ca8d6f5dc16476ae682d938be015606837c232866256ce
7
+ data.tar.gz: 2b0bf202ef9c4751a07d0acebeaedd722fd4de501b0a0ae94b004b70b03fb77682cd472aa7e1d3a34ff36810ce3b532662751f3cc79c5453c5fa673977afd0db
data/README.md CHANGED
@@ -27,6 +27,8 @@ The motivating use case is fairly dumb (probably AI-related) crawlers, rather th
27
27
 
28
28
  * If you do not want to use rack-attack and want challenge on FIRST request, `rails g bot_challenge_page:install --no-rack-attack`
29
29
 
30
+ * By default the challenge page is rendered "inline" at the protected URL. To redirect to a separate challenge page URL instead, pass `--redirect-for-challenge` to the install generator.
31
+
30
32
  * If you are **not using rack-attack**, you need to add a before_action to the controller(s)
31
33
  you'd like to protect, eg:
32
34
 
@@ -59,10 +61,41 @@ To customize the layout or challenge page HTML more further, you can use configu
59
61
  ```ruby
60
62
  BotChallengePage::BotChallengePageController.bot_challenge_config.challenge_renderer = -> {
61
63
  render "my_local_view_folder/whatever", layout: "another_layout"
62
- render layout: "another_layout" # default html but change layout. etc.
63
64
  }
64
65
  ```
65
66
 
67
+ ## Logging
68
+
69
+ By default we log when a challenge result is submitted to the back-end; you can find challenge passes or failures by searching your logs for `BotChallengePage`.
70
+
71
+ We do not log when a challenge is issued: experience shows that issued challenges far outnumber challenge results, and can fill up the logs too fast.
72
+
73
+ If you'd like to log or observe issued challenges, you can configure a proc that is executed
74
+ in the context of the controller, and is called when a page is blocked by a challenge.
75
+
76
+ ```ruby
77
+ BotChallengePage::BotChallengePageController.bot_challenge_config.after_blocked = ->(_bot_challenge_class) {
78
+ logger.info("page blocked by challenge: #{request.url}")
79
+ }
80
+ ```
81
+
82
+ Or, here's how I managed to get it in [lograge](https://github.com/roidrage/lograge), so a page blocked results in a `bot_chlng=true` param in a lograge line.
83
+
84
+ ```ruby
85
+ BotChallengePage::BotChallengePageController.bot_challenge_config.after_blocked =
86
+ ->(bot_detect_class) {
87
+ request.env["bot_detect.blocked_for_challenge"] = true
88
+ }
89
+
90
+
91
+ # production.rb
92
+ config.lograge.custom_payload do |controller|
93
+ {
94
+ bot_chlng: controller.request.env["bot_detect.blocked_for_challenge"]
95
+ }.compact
96
+ end
97
+ ```
98
+
66
99
  ## Example possible Blacklight config
67
100
 
68
101
  Many of us in my professional community use [blacklight](https://github.com/projectblacklight/blacklight). Here's a possible sample blacklight config to:
@@ -88,13 +121,12 @@ Rails.application.config.to_prepare do
88
121
  BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limit_count = 3
89
122
 
90
123
  BotChallengePage::BotChallengePageController.allow_exempt = ->(controller) {
91
- # Excempt any Catalog #facet action that looks like an ajax/fetch request, the redirect
92
- # ain't gonna work there, we just exempt it.
124
+ # Exempt any Catalog #facet or #range_limit action that looks like an ajax/fetch request; the challenge isn't going to work there, so we just exempt it.
93
125
  #
94
126
  # sec-fetch-dest is set to 'empty' by browser on fetch requests, to limit us further;
95
127
  # sure an attacker could fake it, we don't mind if someone determined can avoid
96
128
  # bot challenge on this one action
97
- ( controller.params[:action] == "facet" &&
129
+ ( controller.params[:action].in?(["facet", "range_limit"]) &&
98
130
  controller.request.headers["sec-fetch-dest"] == "empty" &&
99
131
  controller.kind_of?(CatalogController)
100
132
  )
@@ -117,6 +149,8 @@ Locally one way to test with a specific rails version appraisal is `bundle exec
117
149
 
118
150
  If you make any changes to `Gemfile` you may need to run `bundle exec appraisal install` and commit changes.
119
151
 
152
+ **One reason tests are slow** is, I think, that we're running system tests with real turnstile proof-of-work bot detection JS code? (Or is that not so when we are using a CF turnstile testing key that always passes?) There aren't many tests so it's no big deal, but this is something that could potentially be investigated/optimized more.
153
+
120
154
  ## Possible future features?
121
155
 
122
156
  * allow regex in default location_matcher? Easy to do if you want it, just say so.
@@ -135,6 +169,8 @@ The gem is available as open source under the terms of the [MIT License](https:/
135
169
 
136
170
  * Joe's [similar plugin for drupal](https://drupal.org/project/turnstile_protect)
137
171
 
172
+ * Joe's [similar plugin for traefik reverse-proxy](https://github.com/libops/captcha-protect)
173
+
138
174
  * [Similar feature built into PHP VuFind app](https://github.com/vufind-org/vufind/pull/4079)
139
175
 
140
176
  * [My own blog post about this approach](https://bibwild.wordpress.com/2025/01/16/using-cloudflare-turnstile-to-protect-certain-pages-on-a-rails-app/).
@@ -19,9 +19,6 @@ module BotChallengePage
19
19
  # different paths in your app if you like, is why config is with controller
20
20
  class_attribute :bot_challenge_config, default: ::BotChallengePage::Config.new
21
21
 
22
- delegate :cf_turnstile_js_url, :cf_turnstile_sitekey, :still_around_delay_ms, to: :bot_challenge_config
23
- helper_method :cf_turnstile_js_url, :cf_turnstile_sitekey, :still_around_delay_ms
24
-
25
22
  SESSION_DATETIME_KEY = "t"
26
23
  SESSION_IP_KEY = "i"
27
24
 
@@ -29,19 +26,22 @@ module BotChallengePage
29
26
  class_attribute :_track_notification_subscription, instance_accessor: false
30
27
 
31
28
 
29
+ # only used if config.redirect_for_challenge is true
32
30
  def challenge
33
- # possible custom render to choose layouts or templates, but normally
34
- # we just do default rails render and this proc is empty.
35
- if self.bot_challenge_config.challenge_renderer
36
- instance_exec &self.bot_challenge_config.challenge_renderer
37
- end
31
+ # possible custom render to choose layouts or templates, but
32
+ # default is what would be default template for this action
33
+ #
34
+ # We put it in instancevar as a hacky way of passing to template that can be fulfilled
35
+ # both here and in arbitrary controllers for direct render.
36
+ @bot_challenge_config = bot_challenge_config
37
+ instance_exec &self.bot_challenge_config.challenge_renderer
38
38
  end
39
39
 
40
40
  def verify_challenge
41
41
  body = {
42
42
  secret: self.bot_challenge_config.cf_turnstile_secret_key,
43
43
  response: params["cf_turnstile_response"],
44
- remoteip: request.remote_ip
44
+ remoteip: request.remote_ip,
45
45
  }
46
46
 
47
47
  http = HTTP.timeout(self.bot_challenge_config.cf_timeout)
@@ -64,7 +64,10 @@ module BotChallengePage
64
64
  Rails.logger.warn("#{self.class.name}: Cloudflare Turnstile validation failed (#{request.remote_ip}, #{request.user_agent}): #{result}: #{params["dest"]}")
65
65
  end
66
66
 
67
- # let's just return the whole thing to client? Is there anything confidential there?
67
+ # add config needed by JS to result
68
+ result["redirect_for_challenge"] = self.bot_challenge_config.redirect_for_challenge
69
+
70
+ # and let's just return the whole thing to client? Is there anything confidential there?
68
71
  render json: result
69
72
  rescue HTTP::Error, JSON::ParserError => e
70
73
  # probably an http timeout? or something weird.
@@ -25,9 +25,20 @@ module BotChallengePage
25
25
  return
26
26
  end
27
27
 
28
- Rails.logger.info("#{self.name}: Cloudflare Turnstile challenge redirect: (#{controller.request.remote_ip}, #{controller.request.user_agent}): from #{controller.request.url}")
29
- # status code temporary
30
- controller.redirect_to controller.bot_detect_challenge_path(dest: controller.request.original_fullpath), status: 307
28
+ # Prevent caching of bot challenge page
29
+ controller.response.headers["Cache-Control"] = "no-store"
30
+
31
+ if self.bot_challenge_config.redirect_for_challenge
32
+ # status code temporary
33
+ controller.redirect_to controller.bot_detect_challenge_path(dest: controller.request.original_fullpath), status: 307
34
+ else
35
+ # hacky way to get config to view template in an arbitrary controller, good enough for now
36
+ controller.instance_variable_set("@bot_challenge_config", self.bot_challenge_config) unless controller.instance_variable_get("@bot_challenge_config")
37
+ controller.instance_exec &self.bot_challenge_config.challenge_renderer
38
+ end
39
+
40
+ # allow app to see and log if desired
41
+ controller.instance_exec(self, &self.bot_challenge_config.after_blocked)
31
42
  end
32
43
  end
33
44
 
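The `after_blocked` hook in the hunk above is run with `controller.instance_exec(self, &...)`, so the proc body sees the controller as `self` and receives the enforcing class as its one argument. A minimal plain-Ruby sketch of that calling convention follows; `FakeController` and `FakeEnforcer` are hypothetical stand-ins, not classes from the gem.

```ruby
# Hypothetical stand-in for a Rails controller; only what the hook touches.
class FakeController
  attr_reader :log_lines, :request_url

  def initialize(request_url)
    @request_url = request_url
    @log_lines = []
  end
end

# Hypothetical stand-in for the class that enforces the challenge.
class FakeEnforcer
  # Same shape as the default in the diff: a lambda taking one argument.
  AFTER_BLOCKED = ->(enforcer_class) {
    # `self` is the controller instance here, because of instance_exec below,
    # so log_lines and request_url resolve to the controller's methods.
    log_lines << "blocked by #{enforcer_class.name}: #{request_url}"
  }

  def self.block!(controller)
    controller.instance_exec(self, &AFTER_BLOCKED)
  end
end

controller = FakeController.new("https://example.org/catalog?q=test")
FakeEnforcer.block!(controller)
puts controller.log_lines.first
```

Because the hook is a lambda, its parameter list presumably needs to accept the single argument that `instance_exec` passes.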
@@ -27,6 +27,10 @@ module BotChallengePage
27
27
  end
28
28
  end
29
29
 
30
+ # Should we redirect to a challenge page (true) or just display it inline
31
+ # with a 403 status (false)
32
+ attribute :redirect_for_challenge, default: false
33
+
30
34
  attribute :enabled, default: false # Must set to true to turn on at all
31
35
 
32
36
  attribute :cf_turnstile_sitekey, default: "1x00000000000000000000AA" # a testing key that always passes
@@ -55,7 +59,11 @@ module BotChallengePage
55
59
  attribute :allow_exempt, default: ->(controller, config) { false }
56
60
 
57
61
  # replace with say `->() { render layout: 'something' }`, or `render "somedir/some_template"`
58
- attribute :challenge_renderer, default: nil
62
+ attribute :challenge_renderer, default: ->() {
63
+ render "bot_challenge_page/bot_challenge_page/challenge", status: 403
64
+ }
65
+
66
+ attribute :after_blocked, default: ->(bot_detect_class) {}
59
67
 
60
68
 
61
69
  # rate limit per subnet, following lehigh's lead, although we use a smaller
@@ -63,9 +71,9 @@ module BotChallengePage
63
71
  # https://git.drupalcode.org/project/turnstile_protect/-/blob/0dae9f95d48f9d8cae5a8e61e767c69f64490983/src/EventSubscriber/Challenge.php#L140-151
64
72
  attribute :rate_limit_discriminator, default: (lambda do |req, config|
65
73
  if req.ip.index(":") # ipv6
66
- IPAddr.new("#{req.ip}/24").to_string
67
- else
68
74
  IPAddr.new("#{req.ip}/72").to_string
75
+ else
76
+ IPAddr.new("#{req.ip}/24").to_string
69
77
  end
70
78
  rescue IPAddr::InvalidAddressError
71
79
  req.ip
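The hunk above swaps the two branches so that IPv6 addresses are masked to /72 and IPv4 addresses to /24. A small standalone sketch of the corrected logic, using only Ruby's `ipaddr` standard library; `subnet_key` is a hypothetical helper name, and the addresses are documentation-range examples.

```ruby
require "ipaddr"

# Hypothetical helper mirroring the corrected branch order in the diff.
def subnet_key(ip)
  if ip.include?(":") # IPv6
    IPAddr.new("#{ip}/72").to_string
  else                # IPv4
    IPAddr.new("#{ip}/24").to_string
  end
rescue IPAddr::InvalidAddressError
  # Fall back to the raw value when it cannot be parsed as an address.
  ip
end

puts subnet_key("192.0.2.77")      # masked to the /24 network address
puts subnet_key("192.0.2.200")     # same /24, so the same key
puts subnet_key("2001:db8::1")     # masked to the /72 network address
puts subnet_key("2001:db8::ff:1")  # same /72, so the same key
puts subnet_key("not-an-ip")       # unparseable, returned unchanged
```

Clients in the same subnet map to the same key, so they would share one rate-limit bucket.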
@@ -0,0 +1,90 @@
1
+ require 'digest'
2
+ require "base64"
3
+
4
+ module BotChallengePage
5
+ # A simple proof-of-work algorithm, that we can also do in javascript
6
+ #
7
+ # ## Algorithm
8
+ #
9
+ # We calculate a deterministic "challenge" based on a secret key (salt?), current time period,
10
+ # and the specific client request characteristics (prob just client IP).
11
+ #
12
+ # The client has to find a prefix that, when prepended to the challenge, yields a SHA256 hash
13
+ # that begins with a certain number of zeroes in the hex representation. The number of zeroes is the "difficulty".
14
+ # Each zero in hex rep is 4 bits.
15
+ #
16
+ # They send the prefix back to us as a solution, and we confirm that when prefixed to
17
+ # our challenge, and hashed, it has the required number of leading zeroes.
18
+ #
19
+ # (TODO: Leading zeroes in a hex representation or what?)
20
+ class SimplePow1
21
+ # how long is a challenge good for, it will really be good for somewhere between this and 2x this,
22
+ # since we always try previous challenge to avoid race condition on switch
23
+ CHALLENGE_PERIOD = 6.minutes
24
+
25
+ # how many leading 0 *BITS* -- solve time varies a LOT and expands RAPIDLY as we add bits, we don't totally know what we're doing
26
+ DEFAULT_DIFFICULTY = 18
27
+
28
+ DEFAULT_SECRET = ActiveSupport::KeyGenerator.new(Rails.application.config.secret_key_base).generate_key("BotChallengePage::SimplePow1")
29
+
30
+ attr_reader :client_id, :difficulty
31
+
32
+ def initialize(client_id:, secret: DEFAULT_SECRET, difficulty: DEFAULT_DIFFICULTY)
33
+ @client_id = client_id # usually client ip
34
+ @difficulty = difficulty
35
+ @secret = secret
36
+ end
37
+
38
+ # challenge is deterministic based on our secret, the current time, and the client_id
39
+ def challenge(for_time: Time.now.utc)
40
+ period_normalized_time = for_time - (for_time.to_i % CHALLENGE_PERIOD)
41
+
42
+ Digest::SHA256.hexdigest "#{period_normalized_time.to_s}_#{client_id.to_s}_#{@secret.to_s}"
43
+ end
44
+
45
+ def challenge_for_last_period
46
+ challenge(for_time: Time.now.utc - CHALLENGE_PERIOD)
47
+ end
48
+
49
+ def challenge_params
50
+ {
51
+ challenge: challenge,
52
+ difficulty: difficulty
53
+ }
54
+ end
55
+
56
+ # Check solution against current challenge, AND against the previous period's challenge,
57
+ # in case we just had a race condition, meaning the time a challenge is good for is actually
58
+ # min CHALLENGE_PERIOD and max 2 * CHALLENGE_PERIOD
59
+ #
60
+ # @param solution [String] *Base64-encoded data*, that when prefixed to the challenge,
61
+ # results in a sha256 digest with `difficulty` leading 0 bits.
62
+ #
63
+ def verify_solution(solution)
64
+ solution = Base64.decode64(solution)
65
+
66
+ verify_solution_for_challenge(solution, challenge) ||
67
+ verify_solution_for_challenge(solution, challenge(for_time: Time.now.utc - CHALLENGE_PERIOD))
68
+ end
69
+
70
+ # @param solution [String] actual data, **not** base64 encoded
71
+ #
72
+ def verify_solution_for_challenge(aSolution, aChallenge)
73
+ # there's prob a more efficient mathematical way to do this without converting
74
+ # to hex string, but this is what we've got.
75
+ bindigest = Digest::SHA256.digest(aSolution + aChallenge)
76
+
77
+ # hopefully we are not going to have a problem with endian-ness here. :(
78
+
79
+ bytes_required = (difficulty / 8) + 1
80
+ prefix_bytes = bindigest.byteslice(0, bytes_required).bytes
81
+ prefix_bits = prefix_bytes.collect do |byte|
82
+ reversed_bits = byte.digits(2)
83
+ reversed_bits.fill(0, reversed_bits.length..7).reverse
84
+ end.compact.join.slice(0, difficulty)
85
+
86
+ prefix_bits == ("0" * difficulty)
87
+ end
88
+
89
+ end
90
+ end
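The new file above only contains the verifying side of the proof-of-work. As a rough illustration of what a client would have to do, here is a minimal brute-force solver sketch in plain Ruby. The `solve` and `leading_zero_bits?` helpers, the stand-in challenge string, and the reduced difficulty are all assumptions for illustration; only the "SHA256 of prefix + challenge must start with `difficulty` zero bits" rule and the Base64 transport come from the diff.

```ruby
require "digest"

# True when SHA256(solution + challenge) starts with `difficulty` zero bits.
# unpack1("B*") renders the digest as a bit string, most significant bit first.
def leading_zero_bits?(solution, challenge, difficulty)
  bits = Digest::SHA256.digest(solution + challenge).unpack1("B*")
  bits.start_with?("0" * difficulty)
end

# Hypothetical brute-force solver: tries "0", "1", "2", ... as the prefix.
def solve(challenge, difficulty)
  counter = 0
  counter += 1 until leading_zero_bits?(counter.to_s, challenge, difficulty)
  counter.to_s
end

# Stand-in challenge; the class in the diff derives it from time, client id and secret.
challenge = Digest::SHA256.hexdigest("2025-04-07 12:00:00 UTC_192.0.2.77_example-secret")
difficulty = 10 # kept small so the sketch runs quickly; the default in the diff is 18

solution = solve(challenge, difficulty)
encoded = [solution].pack("m0") # Base64 without newlines, for transport

puts "solution=#{solution} encoded=#{encoded}"
puts leading_zero_bits?(encoded.unpack1("m"), challenge, difficulty)
```

Each extra bit of difficulty doubles the expected number of hashes the client must try, which matches the comment in the file about solve time expanding rapidly.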
File without changes
@@ -1,6 +1,7 @@
1
+ <%# locals: (bot_challenge_config:) -%>
2
+
1
3
  <%# we deliver our simple javascript as inline script to make deployment more
2
4
  reliable without having to deal with different asset pipelines, and it's really a fine choice anyway %>
3
-
4
5
  <script type="text/javascript">
5
6
  async function turnstileCallback(token) {
6
7
  try {
@@ -31,12 +32,6 @@
31
32
 
32
33
  result = await response.json();
33
34
  if (result["success"] == true) {
34
- const dest = new URLSearchParams(window.location.search).get("dest");
35
- // For security make sure it only has path and on
36
- if (!dest.startsWith("/") || dest.startsWith("//")) {
37
- throw new Error("illegal non-local redirect: " + dest);
38
- }
39
-
40
35
  // in case this page stays around (say it was a redirect to a media asset), let's add a failsafe message after
41
36
  // a couple seconds.
42
37
  const delay = document.querySelector("#botChallengePageStillAroundTemplate")?.getAttribute("data-still-around-delay-ms") || 1200;
@@ -44,8 +39,19 @@
44
39
  _displayStillAroundNote()
45
40
  }, delay);
46
41
 
47
- // replace the challenge page in history
48
- window.location.replace(dest);
42
+ if (result["redirect_for_challenge"] == true) {
43
+ const dest = new URLSearchParams(window.location.search).get("dest");
44
+ // For security, make sure it only has path and onward (no scheme or host)
45
+ if (!dest.startsWith("/") || dest.startsWith("//")) {
46
+ throw new Error("illegal non-local redirect: " + dest);
47
+ }
48
+
49
+ // replace the challenge page in history
50
+ window.location.replace(dest);
51
+ } else {
52
+ // just need to reload and now we'll get through
53
+ window.location.reload();
54
+ }
49
55
  } else {
50
56
  console.error("Turnstile response reported as failure: " + JSON.stringify(result))
51
57
  _displayChallengeError();
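The `dest` check in the script above accepts only values that start with a single `/`. The same rule, sketched in Ruby for illustration, in case a server-side equivalent is wanted; `local_redirect_path?` is a hypothetical helper and not part of the gem, and the backslash case is an extra assumption beyond what the original script checks.

```ruby
# Hypothetical helper; not part of the gem.
def local_redirect_path?(dest)
  return false unless dest.is_a?(String)
  # Must be an absolute path on this site.
  return false unless dest.start_with?("/")
  # "//host" is protocol-relative; the "/\host" case is an added precaution,
  # since browsers may treat a backslash like a forward slash.
  return false if dest.start_with?("//", "/\\")
  true
end

puts local_redirect_path?("/catalog?q=test")
puts local_redirect_path?("//evil.example/path")
puts local_redirect_path?("https://evil.example/")
puts local_redirect_path?(nil)
```

Unlike the inline script, this sketch returns false for a missing `dest` instead of raising.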
@@ -1,5 +1,6 @@
1
+ <%# locals: (bot_challenge_config:) -%>
1
2
  <div
2
3
  class="cf-turnstile"
3
- data-sitekey="<%= cf_turnstile_sitekey %>"
4
+ data-sitekey="<%= bot_challenge_config.cf_turnstile_sitekey %>"
4
5
  data-callback="turnstileCallback"
5
6
  ></div>
@@ -1,7 +1,7 @@
1
1
  <div class="bot_challenge_page">
2
2
  <h1 class="mb-4"><%= t('bot_challenge_page.title') %></h1>
3
3
 
4
- <%= render "bot_challenge_page/turnstile_widget_placeholder" %>
4
+ <%= render "bot_challenge_page/turnstile_widget_placeholder", bot_challenge_config: @bot_challenge_config %>
5
5
 
6
6
  <noscript>
7
7
  <div class="alert alert-danger"><%= t('bot_challenge_page.noscript') %></div>
@@ -16,14 +16,14 @@
16
16
  </div>
17
17
  </template>
18
18
 
19
- <template id="botChallengePageStillAroundTemplate" data-still_around_delay_ms="<%= still_around_delay_ms %>">
19
+ <template id="botChallengePageStillAroundTemplate" data-still_around_delay_ms="<%= @bot_challenge_config.still_around_delay_ms %>">
20
20
  <div class="alert alert-info" role="alert">
21
21
  <i class="fa fa-info-circle" aria-hidden="true"></i>
22
22
  <%= t('bot_challenge_page.still_around') %>
23
23
  </div>
24
24
  </template>
25
25
 
26
- <script src="<%= cf_turnstile_js_url %>" async defer></script>
26
+ <script src="<%= @bot_challenge_config.cf_turnstile_js_url %>" async defer></script>
27
27
 
28
- <%= render "bot_challenge_page/local_turnstile_script_tag" %>
28
+ <%= render "bot_challenge_page/local_turnstile_script_tag", bot_challenge_config: @bot_challenge_config %>
29
29
  </div>
@@ -1,3 +1,3 @@
1
1
  module BotChallengePage
2
- VERSION = "0.2.0"
2
+ VERSION = "0.3.1"
3
3
  end
@@ -3,10 +3,14 @@ module BotChallengePage
3
3
  source_root File.expand_path("templates", __dir__)
4
4
 
5
5
  class_option :'rack_attack', type: :boolean, default: true, desc: "Support rate-limit allowance configuration"
6
+ class_option :redirect_for_challenge, type: :boolean, default: false, desc: "Redirect to separate challenge page instead of inline challenge"
6
7
 
7
8
  def generate_routes
8
- route 'get "/challenge", to: "bot_challenge_page/bot_challenge_page#challenge", as: :bot_detect_challenge'
9
- route 'post "/challenge", to: "bot_challenge_page/bot_challenge_page#verify_challenge"'
9
+ route 'post "/challenge", to: "bot_challenge_page/bot_challenge_page#verify_challenge", as: :bot_detect_challenge'
10
+
11
+ if options[:redirect_for_challenge]
12
+ route 'get "/challenge", to: "bot_challenge_page/bot_challenge_page#challenge"'
13
+ end
10
14
  end
11
15
 
12
16
  def add_before_filter_enforcement
@@ -4,9 +4,14 @@ Rails.application.config.to_prepare do
4
4
 
5
5
  # Get from CloudFlare Turnstile: https://www.cloudflare.com/application-services/products/turnstile/
6
6
  # Some testing keys are also available: https://developers.cloudflare.com/turnstile/troubleshooting/testing/
7
+ #
8
+ # Always pass testing sitekey: "1x00000000000000000000AA"
7
9
  BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_sitekey = "MUST GET"
10
+ # Always pass testing secret_key: "1x0000000000000000000000000000000AA"
8
11
  BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_secret_key = "MUST GET"
9
12
 
13
+ BotChallengePage::BotChallengePageController.bot_challenge_config.redirect_for_challenge = <%= options[:redirect_for_challenge] %>
14
+
10
15
  <%- if options[:rack_attack] %>
11
16
  # What paths do you want to protect?
12
17
  #
@@ -33,7 +38,7 @@ Rails.application.config.to_prepare do
33
38
  # BotChallengePage::BotChallengePageController.bot_challenge_config.session_passed_good_for = 36.hours
34
39
 
35
40
  # Exempt some requests from bot challenge protection
36
- # BotChallengePage::BotChallengePageController.allow_exempt = ->(controller) {
41
+ # BotChallengePage::BotChallengePageController.bot_challenge_config.allow_exempt = ->(controller) {
37
42
  # # controller.params
38
43
  # # controller.request
39
44
  # # controller.session
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: bot_challenge_page
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Jonathan Rochkind
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2025-03-03 00:00:00.000000000 Z
11
+ date: 2025-04-07 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: appraisal
@@ -156,6 +156,8 @@ files:
156
156
  - app/controllers/concerns/bot_challenge_page/enforce_filter.rb
157
157
  - app/controllers/concerns/bot_challenge_page/rack_attack_init.rb
158
158
  - app/models/bot_challenge_page/config.rb
159
+ - app/models/bot_challenge_page/simple_pow1.rb
160
+ - app/models/bot_challenge_page/test.html
159
161
  - app/views/bot_challenge_page/_local_turnstile_script_tag.html.erb
160
162
  - app/views/bot_challenge_page/_turnstile_widget_placeholder.html.erb
161
163
  - app/views/bot_challenge_page/bot_challenge_page/challenge.html.erb