bot_challenge_page 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: f388eccf957733cbaab0c019d82e9803b215bf8035b2b10dc7c656b821907e7d
4
+ data.tar.gz: 5b8126932d6bd901ddadab4dc68259bb1d2652992de882421fbfb0cfca67b500
5
+ SHA512:
6
+ metadata.gz: e5a1b5e05aefd618aca8e14570e1e0c9a32f4d5144f6bd93dabae265d1ac69759ebe19bc4dcdaf9fc07a0b121ba81be819f4d46d7ac1b111454d1bb191131906
7
+ data.tar.gz: 58cf73819292626fc40a696fecb7c7737ab89f92b7093e9c707bb42fe95ff83cda7303c510b807ab41a221fd0fe5c0ed5d3251983b466726dea9f252f276bdac
data/MIT-LICENSE ADDED
@@ -0,0 +1,20 @@
1
+ Copyright Jonathan Rochkind
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining
4
+ a copy of this software and associated documentation files (the
5
+ "Software"), to deal in the Software without restriction, including
6
+ without limitation the rights to use, copy, modify, merge, publish,
7
+ distribute, sublicense, and/or sell copies of the Software, and to
8
+ permit persons to whom the Software is furnished to do so, subject to
9
+ the following conditions:
10
+
11
+ The above copyright notice and this permission notice shall be
12
+ included in all copies or substantial portions of the Software.
13
+
14
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
15
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
16
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
17
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
18
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
19
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
20
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,119 @@
1
+ # BotChallengePage
2
+
3
+ [![CI](https://github.com/samvera-labs/bot_challenge_page/actions/workflows/ci.yml/badge.svg)](https://github.com/samvera-labs/bot_challenge_page/actions/workflows/ci.yml)
4
+
5
+ BotChallengePage lets you protect certain routes in your Rails app with the [CloudFlare Turnstile](https://www.cloudflare.com/application-services/products/turnstile/) "CAPTCHA alternative" bot detector. Rather than the typical form submission use case for Turnstile, the user is redirected to an interstitial challenge page, and redirected back on success.
6
+
7
+ The motivating use case is fairly dumb (probably AI-related) crawlers, rather than targeted attacks, although we have tried to pay attention to security. Many of our use cases were crawlers getting caught in "infinite" page variations, following every combination of voluminous facet values in search results through a near-infinite space and causing resource usage issues for us.
8
+
9
+ ![challenge page screenshot](docs/challenge-page-example.png)
10
+
11
+ * You can optionally configure a rate limit that is allowed BEFORE the challenge is triggered
12
+ * Uses rack-attack to track rate, requires `Rails.cache` or `Rack::Attack.cache.store` to be set to a persistent shared high-performance cache, probably redis or memcached.
13
+
14
+ * Once a challenge is passed, the pass is stored in a cookie, and a challenge won't be redisplayed for a configurable amount of time, so long as cookie is present
15
+
16
+ * **Note:** The user agent always needs both cookies and JavaScript enabled to pass the challenge and get through!
17
+
18
+
19
+ ## Installation and Configuration
20
+
21
+ * Get a CloudFlare account and Turnstile widget set up, which should give you a turnstile `sitekey` and `secret_key` you will need later in configuration.
22
+
23
+ * `bundle add bot_challenge_page`, `bundle install`
24
+
25
+ * Run the installer
26
+ * if you want to use rack-attack for some permissive pre-challenge rate, `rails g bot_challenge_page:install`
27
+ * If you do not want to use rack-attack and want challenge on FIRST request, `rails g bot_challenge_page:install --no-rack-attack`
28
+
29
+ * Configure in the generated `./config/initializers/bot_challenge_page.rb`
30
+ * At a minimum you need to configure your Cloudflare Turnstile keys, and some paths to protect!
31
+ * Note that we can only protect GET paths, and also think about making sure you DON'T protect
32
+ any path your front-end needs JS `fetch` access to, as this would block it (at least
33
+ without custom front-end code we haven't really explored)
34
+ * If you are tempted to just protect `/`, that may work, but it is worth thinking about any heartbeat paths, front-end requestable paths, or other machine-access-desired paths.
35
+ * Some other configuration options are offered -- more advanced/specialized ones are available that are not mentioned in generated config file, see [Config class](./app/models/bot_challenge_page/config.rb)
36
+
37
+
38
+ ## Customize challenge page display
39
+
40
+ Some of the default challenge page html uses bootstrap alert classes. You may want to provide custom CSS if you aren't using bootstrap. You can see the default challenge page html at [challenge.html.erb](./app/views/bot_challenge_page/bot_challenge_page/challenge.html). You may wish to CSS-style other parts too!
41
+
42
+ You can customize all text via I18n, see keys in [bot_challenge_page.en.yml](./config/locales/bot_challenge_page.en.yml)
43
+
44
+ The challenge page by default will be displayed in your app's default rails `layout`.
45
+
46
+ To customize the layout or challenge page HTML further, you can use configuration to supply a `render` method for the controller pointing to your own templates or other layouts. You will probably want to re-use the partials we use in our default template, for standard functionality. And you'll want to provide `<template>` elements with the same ids for those elements, but can put whatever you want inside the templates!
47
+
48
+ ```ruby
49
+ BotChallengePage::BotChallengePageController.bot_challenge_config.challenge_renderer = ->() {
50
+ render "my_local_view_folder/whatever", layout: "another_layout"
51
+ # or: render layout: "another_layout" # default html but change layout. etc.
52
+ }
53
+ ```
54
+
55
+ ## Example possible Blacklight config
56
+
57
+ Many of us in my professional community use [blacklight](https://github.com/projectblacklight/blacklight). Here's a possible sample blacklight config to:
58
+
59
+ * Protect default catalog controller, including search results and any other actions
60
+ * But give the user 3 free searches in a 36 hour period before challenged
61
+ * For the #facet action used for "facet… more" links: exempt from protection if the request is being made by a browser JS `fetch`; we just let those through. (This means a determined attacker could do that on purpose; it is not a defense against deliberate DDoS.)
62
+
63
+ ```ruby
64
+ Rails.application.config.to_prepare do
65
+ BotChallengePage::BotChallengePageController.bot_challenge_config.enabled = true
66
+
67
+ # Get from CloudFlare Turnstile: https://www.cloudflare.com/application-services/products/turnstile/
68
+ BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_sitekey = "MUST GET"
69
+ BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_secret_key = "MUST GET"
70
+
71
+ BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limited_locations = [
72
+ "/catalog"
73
+ ]
74
+
75
+ # allow rate_limit_count requests in rate_limit_period, before issuing challenge
76
+ BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limit_period = 36.hour
77
+ BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limit_count = 3
78
+
79
+ BotChallengePage::BotChallengePageController.bot_challenge_config.allow_exempt = ->(controller, config) {
80
+ # Exempt any Catalog #facet action that looks like an ajax/fetch request, the redirect
81
+ # ain't gonna work there, we just exempt it.
82
+ #
83
+ # sec-fetch-dest is set to 'empty' by browser on fetch requests, to limit us further;
84
+ # sure an attacker could fake it, we don't mind if someone determined can avoid
85
+ # rate-limiting on this one action
86
+ ( controller.params[:action] == "facet" &&
87
+ controller.request.headers["sec-fetch-dest"] == "empty" &&
88
+ controller.kind_of?(CatalogController)
89
+ )
90
+ }
91
+
92
+ BotChallengePage::BotChallengePageController.rack_attack_init
93
+ end
94
+
95
+ ```
96
+
97
+ ## Possible future features?
98
+
99
+ * allow regex in default location_matcher? Easy to do if you want it, just say so.
100
+
101
+ * We could support swap-in Turnstile-alternatives, like [hCaptcha](https://www.hcaptcha.com/), [Google reCAPTCHA v3](https://developers.google.com/recaptcha/docs/v3), or even open source proof of work implementations like [ALTCHA](https://altcha.org/docs/get-started/), [pow-bot-deterrent](https://github.com/sequentialread/pow-bot-deterrent), or [Friendly Captcha](https://github.com/FriendlyCaptcha/friendly-captcha-sdk). But the (free) cost/benefit of Turnstile is pretty good, so I don't myself have a lot of motivation to add this complexity.
102
+
103
+ * Something to make it easier to switch the challenge on only based on signals that server/app is under some defined heavy load?
104
+
105
+
106
+ ## License
107
+ The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
108
+
109
+ ## See also/Acknowledgements
110
+
111
+ * [Joe Corral's blog post](https://lehigh-university-libraries.github.io/blog/turnstile.html) about using this approach at Lehigh University Libraries with an islandora/drupal app.
112
+
113
+ * Joe's [similar plugin for drupal](https://drupal.org/project/turnstile_protect)
114
+
115
+ * [Similar feature built into PHP VuFind app](https://github.com/vufind-org/vufind/pull/4079)
116
+
117
+ * Wow only after I developed all this did I notice [rails-cloudflare-turnstile](https://github.com/instrumentl/rails-cloudflare-turnstile) which implements some pieces that could have been re-used here, but I feel good.
118
+
119
+ * And yet another implementation in Rails that perhaps makes more assumptions about use cases, [turnstile-captcha](https://github.com/pfeiffer/turnstile-captcha). Haven't looked at it much.
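A note on the `allow_exempt` hook from the README example: the enforcement filter in the controller invokes it as `allow_exempt.call(controller, config)`, so a replacement lambda should accept two arguments. Below is a minimal sketch of such a lambda exercised outside Rails. `FakeRequest` and `FakeController` are hypothetical stand-ins invented for this illustration, not part of the gem.

```ruby
# Hypothetical stand-ins for a Rails controller and request, with just
# enough surface to exercise the lambda outside of Rails.
FakeRequest = Struct.new(:headers)
FakeController = Struct.new(:params, :request)

# Two parameters, because the enforcement filter calls
# allow_exempt.call(controller, config).
ALLOW_EXEMPT = ->(controller, _config) {
  controller.params[:action] == "facet" &&
    controller.request.headers["sec-fetch-dest"] == "empty"
}

FETCH_FACET = FakeController.new(
  { action: "facet" }, FakeRequest.new({ "sec-fetch-dest" => "empty" })
)
PAGE_LOAD = FakeController.new(
  { action: "index" }, FakeRequest.new({ "sec-fetch-dest" => "document" })
)

puts ALLOW_EXEMPT.call(FETCH_FACET, nil) # the fetch-style facet request is exempt
puts ALLOW_EXEMPT.call(PAGE_LOAD, nil)   # an ordinary page load is not
```

In a real app the lambda would be assigned to `bot_challenge_config.allow_exempt`, and `controller` would be the live Rails controller instance.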
data/Rakefile ADDED
@@ -0,0 +1,10 @@
1
+ require "bundler/setup"
2
+
3
+ APP_RAKEFILE = File.expand_path("spec/dummy/Rakefile", __dir__)
4
+ load "rails/tasks/engine.rake"
5
+
6
+ load "rails/tasks/statistics.rake"
7
+
8
+ require "bundler/gem_tasks"
9
+
10
+ require 'rspec/core/rake_task'
@@ -0,0 +1,160 @@
1
+ require 'http'
2
+
3
+ # This controller has actions for issuing a challenge page for CloudFlare Turnstile product,
4
+ # and then redirecting back to desired page.
5
+ #
6
+ # It also includes logic for configuring rack attack and a Rails controller filter to enforce
7
+ # redirection to these actions. All the logic related to bot detection with turnstile is
8
+ # mostly in this file -- with very flexible configuration in class_attributes -- to facilitate
9
+ # future extraction to a re-usable gem if desired.
10
+ #
11
+ #
12
+ module BotChallengePage
13
+ class BotChallengePageController < ::ApplicationController
14
+ # Config for bot detection is held in class object here -- idea is
15
+ # to support different controllers with different config protecting
16
+ # different paths in your app if you like, is why config is with controller
17
+ class_attribute :bot_challenge_config, default: ::BotChallengePage::Config.new
18
+
19
+ delegate :cf_turnstile_js_url, :cf_turnstile_sitekey, :still_around_delay_ms, to: :bot_challenge_config
20
+ helper_method :cf_turnstile_js_url, :cf_turnstile_sitekey, :still_around_delay_ms
21
+
22
+ SESSION_DATETIME_KEY = "t"
23
+ SESSION_IP_KEY = "i"
24
+
25
+ # for allowing unsubscribe for testing
26
+ class_attribute :_track_notification_subscription, instance_accessor: false
27
+
28
+ # perhaps in an initializer, and after changing any config, run:
29
+ #
30
+ # Rails.application.config.to_prepare do
31
+ # BotChallengePage::BotChallengePageController.rack_attack_init
32
+ # end
33
+ #
34
+ # Safe to call more than once if you change config and want to call again, say in testing.
35
+ def self.rack_attack_init
36
+ self._rack_attack_uninit # make it safe for calling multiple times
37
+
38
+ ## Turnstile bot detection throttling
39
+ #
40
+ # for paths matched by `rate_limited_locations`, after over rate_limit count requests in rate_limit_period,
41
+ # token will be stored in rack env instructing challenge is required.
42
+ #
43
+ # For actual challenge, need before_action in controller.
44
+ #
45
+ # You could rate limit detect on wider paths than you actually challenge on, or the same. You probably
46
+ # don't want to rate-limit detect on narrower list of paths than you challenge on!
47
+ Rack::Attack.track("bot_detect/rate_exceeded/#{self.name}",
48
+ limit: self.bot_challenge_config.rate_limit_count,
49
+ period: self.bot_challenge_config.rate_limit_period) do |req|
50
+ if self.bot_challenge_config.enabled && self.bot_challenge_config.location_matcher.call(req, self.bot_challenge_config)
51
+ self.bot_challenge_config.rate_limit_discriminator.call(req, self.bot_challenge_config)
52
+ end
53
+ end
54
+
55
+ self._track_notification_subscription = ActiveSupport::Notifications.subscribe("track.rack_attack") do |_name, _start, _finish, request_id, payload|
56
+ rack_request = payload[:request]
57
+ rack_env = rack_request.env
58
+ match_name = rack_env["rack.attack.matched"] # name of rack-attack rule
59
+ #
60
+ if match_name == "bot_detect/rate_exceeded/#{self.name}"
61
+ match_data = rack_env["rack.attack.match_data"]
62
+ match_data_formatted = match_data.slice(:count, :limit, :period).map { |k, v| "#{k}=#{v}"}.join(" ")
63
+ discriminator = rack_env["rack.attack.match_discriminator"] # unique key for rate limit, usually includes ip
64
+
65
+ rack_env[self.bot_challenge_config.env_challenge_trigger_key] = true
66
+ end
67
+ end
68
+ end
69
+
70
+ def self._rack_attack_uninit
71
+ Rack::Attack.track("bot_detect/rate_exceeded/#{self.name}") {} # overwrite track name with empty proc
72
+ ActiveSupport::Notifications.unsubscribe(self._track_notification_subscription) if self._track_notification_subscription
73
+ self._track_notification_subscription = nil
74
+ end
75
+
76
+ # Usually in your ApplicationController,
77
+ #
78
+ # before_action { |controller| BotChallengePage::BotChallengePageController.bot_challenge_enforce_filter(controller) }
79
+ #
80
+ # @param immediate [Boolean] always force bot protection, ignore any allowed pre-challenge rate limit
81
+ def self.bot_challenge_enforce_filter(controller, immediate: false)
82
+ if self.bot_challenge_config.enabled &&
83
+ (controller.request.env[self.bot_challenge_config.env_challenge_trigger_key] || immediate) &&
84
+ ! self._bot_detect_passed_good?(controller.request) &&
85
+ ! controller.kind_of?(self) && # don't ever guard ourself, that'd be a mess!
86
+ ! self.bot_challenge_config.allow_exempt.call(controller, self.bot_challenge_config)
87
+
88
+ # we can only do GET requests right now
89
+ if !controller.request.get?
90
+ Rails.logger.warn("#{self}: Asked to protect request we could not, unprotected: #{controller.request.method} #{controller.request.url}, (#{controller.request.remote_ip}, #{controller.request.user_agent})")
91
+ return
92
+ end
93
+
94
+ Rails.logger.info("#{self.name}: Cloudflare Turnstile challenge redirect: (#{controller.request.remote_ip}, #{controller.request.user_agent}): from #{controller.request.url}")
95
+ # status code temporary
96
+ controller.redirect_to controller.bot_detect_challenge_path(dest: controller.request.original_fullpath), status: 307
97
+ end
98
+ end
99
+
100
+ # Does the session already contain a bot detect pass that is good for this request
101
+ # Tie to IP address to prevent session replay shared among IPs
102
+ def self._bot_detect_passed_good?(request)
103
+ session_data = request.session[self.bot_challenge_config.session_passed_key]
104
+
105
+ return false unless session_data && session_data.kind_of?(Hash)
106
+
107
+ datetime = session_data[SESSION_DATETIME_KEY]
108
+ ip = session_data[SESSION_IP_KEY]
109
+
110
+ (ip == request.remote_ip) && (Time.now - Time.iso8601(datetime) < self.bot_challenge_config.session_passed_good_for )
111
+ end
112
+
113
+
114
+ def challenge
115
+ # possible custom render to choose layouts or templates, but normally
116
+ # we just do default rails render and this proc is empty.
117
+ if self.bot_challenge_config.challenge_renderer
118
+ instance_exec &self.bot_challenge_config.challenge_renderer
119
+ end
120
+ end
121
+
122
+ def verify_challenge
123
+ body = {
124
+ secret: self.bot_challenge_config.cf_turnstile_secret_key,
125
+ response: params["cf_turnstile_response"],
126
+ remoteip: request.remote_ip
127
+ }
128
+
129
+ http = HTTP.timeout(self.bot_challenge_config.cf_timeout)
130
+ response = http.post(self.bot_challenge_config.cf_turnstile_validation_url,
131
+ json: body)
132
+
133
+ result = response.parse
134
+ # {"success"=>true, "error-codes"=>[], "challenge_ts"=>"2025-01-06T17:44:28.544Z", "hostname"=>"example.com", "metadata"=>{"result_with_testing_key"=>true}}
135
+ # {"success"=>false, "error-codes"=>["invalid-input-response"], "messages"=>[], "metadata"=>{"result_with_testing_key"=>true}}
136
+
137
+ if result["success"]
138
+ # mark it as successful in session, and record time. They do need a session/cookies
139
+ # to get through the challenge.
140
+ Rails.logger.info("#{self.class.name}: Cloudflare Turnstile validation passed api (#{request.remote_ip}, #{request.user_agent}): #{params["dest"]}")
141
+ session[self.bot_challenge_config.session_passed_key] = {
142
+ SESSION_DATETIME_KEY => Time.now.utc.iso8601,
143
+ SESSION_IP_KEY => request.remote_ip
144
+ }
145
+ else
146
+ Rails.logger.warn("#{self.class.name}: Cloudflare Turnstile validation failed (#{request.remote_ip}, #{request.user_agent}): #{result}: #{params["dest"]}")
147
+ end
148
+
149
+ # let's just return the whole thing to client? Is there anything confidential there?
150
+ render json: result
151
+ rescue HTTP::Error, JSON::ParserError => e
152
+ # probably an http timeout? or something weird.
153
+ Rails.logger.warn("#{self.class.name}: Cloudflare turnstile validation error (#{request.remote_ip}, #{request.user_agent}): #{e}: #{response&.body}")
154
+ render json: {
155
+ success: false,
156
+ http_exception: e
157
+ }
158
+ end
159
+ end
160
+ end
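The pass check in `_bot_detect_passed_good?` combines two conditions: the stored IP must match the current request, and the stored timestamp must be recent enough. The sketch below restates that logic as a plain Ruby method so it can be run without Rails. The method name `pass_still_good?`, the `now:` keyword, and the `rescue` for malformed timestamps are assumptions added for illustration; the controller itself does not rescue.

```ruby
require "time"

SESSION_DATETIME_KEY = "t"
SESSION_IP_KEY = "i"

# Returns true when the stored pass was issued to the same IP and is
# younger than good_for_seconds. `session_data` mimics the hash the
# controller writes to the Rails session after a successful challenge.
def pass_still_good?(session_data, remote_ip, good_for_seconds, now: Time.now)
  return false unless session_data.is_a?(Hash)

  issued_at = Time.iso8601(session_data[SESSION_DATETIME_KEY])
  session_data[SESSION_IP_KEY] == remote_ip && (now - issued_at) < good_for_seconds
rescue ArgumentError, TypeError
  false # assumption for this sketch: a malformed timestamp counts as "not passed"
end

now = Time.utc(2025, 1, 6, 18, 0, 0)
stored = { "t" => "2025-01-06T17:00:00Z", "i" => "203.0.113.7" }

puts pass_still_good?(stored, "203.0.113.7", 24 * 3600, now: now)  # same IP, one hour old
puts pass_still_good?(stored, "198.51.100.1", 24 * 3600, now: now) # different IP
```

Tying the pass to the IP means a copied session cookie does not carry over to another address, at the cost of re-challenging users whose IP changes.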
@@ -0,0 +1,116 @@
1
+ module BotChallengePage
2
+ class Config
3
+ # meh let's do a little accessor definition to make this value class more legible
4
+
5
+ # default can be a proc, in which case it really is a proc as a value for default,
6
+ # the value is the proc!
7
+ def self.attribute(name, default:nil)
8
+ attr_defaults[name] = default
9
+ self.attr_accessor name
10
+ end
11
+
12
+ class_attribute :attr_defaults, default: {}, instance_accessor: false
13
+
14
+ def initialize(**values)
15
+ self.class.attr_defaults.merge(values).each_pair do |key, value|
16
+ # super hacky way to execute any procs in the context of this config,
17
+ # so they can access other config values easily.
18
+ if value.kind_of?(Proc)
19
+ newval = lambda do |*args|
20
+ self.instance_exec(*args, &value)
21
+ end
22
+ else
23
+ newval = value
24
+ end
25
+
26
+ send("#{key}=", newval)
27
+ end
28
+ end
29
+
30
+ attribute :enabled, default: false # Must set to true to turn on at all
31
+
32
+ attribute :cf_turnstile_sitekey, default: "1x00000000000000000000AA" # a testing key that always passes
33
+ attribute :cf_turnstile_secret_key, default: "1x0000000000000000000000000000000AA" # a testing key always passes
34
+ # Turnstile testing keys: https://developers.cloudflare.com/turnstile/troubleshooting/testing/
35
+
36
+ # up to rate_limit_count requests in rate_limit_period before challenged
37
+ attribute :rate_limit_period, default: 12.hour
38
+ attribute :rate_limit_count, default: 10
39
+
40
+ # how long is a challenge pass good for before re-challenge?
41
+ attribute :session_passed_good_for, default: 24.hours
42
+
43
+ # An array, can be:
44
+ # * a string, path prefix
45
+ # * a hash of rails route-decoded params, like `{ controller: "something" }`,
46
+ # or `{ controller: "something", action: "index" }
47
+ # The hash is more expensive to check and uses some not-technically-public
48
+ # Rails api, but it's just so convenient.
49
+ #
50
+ # Used by default :location_matcher, if set custom may not be used
51
+ attribute :rate_limited_locations, default: []
52
+
53
+ # Executed at the _controller_ filter level, to last minute exempt certain
54
+ # actions from protection.
55
+ attribute :allow_exempt, default: ->(controller, config) { false }
56
+
57
+ # replace with say `->() { render layout: 'something' }`, or `render "somedir/some_template"`
58
+ attribute :challenge_renderer, default: nil
59
+
60
+
61
+ # rate limit per subnet, following lehigh's lead, although we use a smaller
62
+ # subnet: /24 for IPv4, and /72 for IPv6
63
+ # https://git.drupalcode.org/project/turnstile_protect/-/blob/0dae9f95d48f9d8cae5a8e61e767c69f64490983/src/EventSubscriber/Challenge.php#L140-151
64
+ attribute :rate_limit_discriminator, default: (lambda do |req, config|
65
+ if req.ip.index(":") # ipv6
66
+ IPAddr.new("#{req.ip}/72").to_string
67
+ else
68
+ IPAddr.new("#{req.ip}/24").to_string
69
+ end
70
+ rescue IPAddr::InvalidAddressError
71
+ req.ip
72
+ end)
73
+
74
+ attribute :location_matcher, default: ->(rack_req, config) {
75
+ parsed_route = nil
76
+ config.rate_limited_locations.any? do |val|
77
+ case val
78
+ when Hash
79
+ begin
80
+ # #recognize_path may not be technically public API, and may be expensive, but
81
+ # no other way to do this, and it's mentioned in rack-attack:
82
+ # https://github.com/rack/rack-attack/blob/86650c4f7ea1af24fe4a89d3040e1309ee8a88bc/docs/advanced_configuration.md#match-actions-in-rails
83
+ # We do it lazily only if needed so if you don't want that don't use it.
84
+ parsed_route ||= rack_req.env["action_dispatch.routes"].recognize_path(rack_req.url, method: rack_req.request_method)
85
+ parsed_route && parsed_route >= val
86
+ rescue ActionController::RoutingError
87
+ false
88
+ end
89
+ when String
90
+ # string must match a complete path prefix: followed by /, ?, or end of string
91
+ /\A#{Regexp.escape val}(\/|\?|\Z)/ =~ rack_req.path
92
+ end
93
+ end
94
+ }
95
+ attribute :cf_turnstile_js_url, default: "https://challenges.cloudflare.com/turnstile/v0/api.js"
96
+ attribute :cf_turnstile_validation_url, default: "https://challenges.cloudflare.com/turnstile/v0/siteverify"
97
+ attribute :cf_timeout, default: 3 # max timeout seconds waiting on Cloudflare Turnstile api
98
+
99
+
100
+ # key stored in Rails session object when challenge pass is confirmed
101
+ attribute :session_passed_key, default: "bot_detection-passed"
102
+
103
+ # key in rack env that says challenge is required
104
+ attribute :env_challenge_trigger_key, default: "bot_detect.should_challenge"
105
+
106
+ attribute :still_around_delay_ms, default: 1200
107
+
108
+ # make sure dup dups all attributes please
109
+ def initialize_dup(source)
110
+ super
111
+ self.class.attr_defaults.keys.each do |attr_key|
112
+ instance_variable_set("@#{attr_key}", instance_variable_get("@#{attr_key}").deep_dup)
113
+ end
114
+ end
115
+ end
116
+ end
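Two of the defaults in this config class are easy to try in isolation: the subnet-based rate-limit discriminator and the string path-prefix matcher. The sketch below is a standalone approximation using only the Ruby standard library. The method names `subnet_discriminator` and `path_prefix_match?` are made up for this example, and the subnet sizes follow the comment in the config (/24 for IPv4, /72 for IPv6).

```ruby
require "ipaddr"

# Collapse an address to its subnet so neighbouring clients share one
# rate-limit bucket: /24 for IPv4, /72 for IPv6.
def subnet_discriminator(ip)
  prefix_length = ip.include?(":") ? 72 : 24
  IPAddr.new("#{ip}/#{prefix_length}").to_string
rescue IPAddr::InvalidAddressError
  ip # fall back to the raw value if it does not parse as an address
end

# True when `path` equals `prefix` or continues past it at a "/" (or "?")
# boundary, so "/catalog" does not accidentally match "/catalogue".
def path_prefix_match?(prefix, path)
  /\A#{Regexp.escape(prefix)}(\/|\?|\z)/.match?(path)
end

puts subnet_discriminator("192.0.2.57")
puts subnet_discriminator("2001:db8:1:2:3456:789a:bcde:f012")
puts path_prefix_match?("/catalog", "/catalog/123")
puts path_prefix_match?("/catalog", "/catalogue")
```

Bucketing by subnet rather than by exact address means a crawler rotating through nearby addresses still counts against a single limit.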
@@ -0,0 +1,72 @@
1
+ <%# we deliver our simple javascript as inline script to make deployment more
2
+ reliable without having to deal with different asset pipelines, and it's really a fine choice anyway %>
3
+
4
+ <script type="text/javascript">
5
+ async function turnstileCallback(token) {
6
+ try {
7
+ // I don't know if we could be disabling CSRF instead for this one, but we'll just use it
8
+ const csrfToken = document.querySelector("[name='csrf-token']");
9
+
10
+ const response = await fetch('<%= bot_detect_challenge_path %>', {
11
+ method: 'POST',
12
+ headers: {
13
+ "X-CSRF-Token": csrfToken?.content,
14
+ "Content-Type": "application/json"
15
+ },
16
+ body: JSON.stringify({ cf_turnstile_response: token }),
17
+ });
18
+
19
+ if (!response.ok) {
20
+ throw new Error('bad response: ' + response.status + ": " + response.url);
21
+ }
22
+
23
+ // This page may end up staying around on success, e.g. if the dest url is a media
24
+ // type that can only be downloaded.
25
+ //
26
+ // When so, if the page stays around, it may end up
27
+ // calling turnstile and this callback over and over again. Called without a widget id
28
+ // (which we're not tracking), this removes the most recent turnstile widget executed;
29
+ // we only expect one.
30
+ turnstile.remove();
31
+
32
+ const result = await response.json();
33
+ if (result["success"] == true) {
34
+ const dest = new URLSearchParams(window.location.search).get("dest");
35
+ // For security make sure it only has path and on
36
+ if (!dest.startsWith("/") || dest.startsWith("//")) {
37
+ throw new Error("illegal non-local redirect: " + dest);
38
+ }
39
+
40
+ // in case this page stays around (say it was redirected to a media asset), let's add a failsafe message after
41
+ // a couple seconds.
42
+ const delay = document.querySelector("#botChallengePageStillAroundTemplate")?.getAttribute("data-still_around_delay_ms") || 1200;
43
+ window.setTimeout(function() {
44
+ _displayStillAroundNote()
45
+ }, delay);
46
+
47
+ // replace the challenge page in history
48
+ window.location.replace(dest);
49
+ } else {
50
+ console.error("Turnstile response reported as failure: " + JSON.stringify(result))
51
+ _displayChallengeError();
52
+ }
53
+ } catch(error) {
54
+ console.error("Error processing turnstile challenge backend action: " + error);
55
+ _displayChallengeError();
56
+ }
57
+ }
58
+
59
+ function _displayChallengeError() {
60
+ const template = document.querySelector("#botChallengePageErrorTemplate");
61
+ const clone = template.content.cloneNode(true);
62
+ document.querySelector(".cf-turnstile").replaceChildren(clone);
63
+ }
64
+
65
+ // If the page is still around after location changed, what's up?
66
+ // Warn them they might need to use the back button, maybe it was a media download
67
+ function _displayStillAroundNote() {
68
+ const template = document.querySelector("#botChallengePageStillAroundTemplate");
69
+ const clone = template.content.cloneNode(true);
70
+ document.querySelector(".cf-turnstile")?.after(clone);
71
+ }
72
+ </script>
@@ -0,0 +1,5 @@
1
+ <div
2
+ class="cf-turnstile"
3
+ data-sitekey="<%= cf_turnstile_sitekey %>"
4
+ data-callback="turnstileCallback"
5
+ ></div>
@@ -0,0 +1,29 @@
1
+ <div class="bot_challenge_page">
2
+ <h1 class="mb-4"><%= t('bot_challenge_page.title') %></h1>
3
+
4
+ <%= render "bot_challenge_page/turnstile_widget_placeholder" %>
5
+
6
+ <noscript>
7
+ <div class="alert alert-danger"><%= t('bot_challenge_page.noscript') %></div>
8
+ </noscript>
9
+
10
+ <%= t('bot_challenge_page.blurb_html') %>
11
+
12
+ <template id="botChallengePageErrorTemplate">
13
+ <div class="alert alert-danger" role="alert">
14
+ <i class="fa fa-exclamation-triangle" aria-hidden="true"></i>
15
+ <%= t('bot_challenge_page.error') %>
16
+ </div>
17
+ </template>
18
+
19
+ <template id="botChallengePageStillAroundTemplate" data-still_around_delay_ms="<%= still_around_delay_ms %>">
20
+ <div class="alert alert-info" role="alert">
21
+ <i class="fa fa-info-circle" aria-hidden="true"></i>
22
+ <%= t('bot_challenge_page.still_around') %>
23
+ </div>
24
+ </template>
25
+
26
+ <script src="<%= cf_turnstile_js_url %>" async defer></script>
27
+
28
+ <%= render "bot_challenge_page/local_turnstile_script_tag" %>
29
+ </div>
@@ -0,0 +1,7 @@
1
+ en:
2
+ bot_challenge_page:
3
+ title: Traffic control and bot detection...
4
+ noscript: Sorry, Javascript is required to be enabled for our traffic check, and does not appear available.
5
+ blurb_html: <p>If this check is preventing you from making use of our resources, make sure you have cookies enabled.</p>
6
+ error: Check failed. Sorry, something has gone wrong, or your traffic looks unusual to us. You can try refreshing this page to try again.
7
+ still_around: The traffic check has completed. You may need to return to your original browser tab or press the back button.
data/config/routes.rb ADDED
@@ -0,0 +1,2 @@
1
+ Rails.application.routes.draw do
2
+ end
@@ -0,0 +1,4 @@
1
+ module BotChallengePage
2
+ class Engine < ::Rails::Engine
3
+ end
4
+ end
@@ -0,0 +1,3 @@
1
+ module BotChallengePage
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,6 @@
1
+ require "bot_challenge_page/version"
2
+ require "bot_challenge_page/engine"
3
+
4
+ module BotChallengePage
5
+ # Your code goes here...
6
+ end
@@ -0,0 +1,45 @@
1
+ module BotChallengePage
2
+ class InstallGenerator < Rails::Generators::Base
3
+ source_root File.expand_path("templates", __dir__)
4
+
5
+ class_option :'rack_attack', type: :boolean, default: true, desc: "Support rate-limit allowance configuration"
6
+
7
+ def generate_routes
8
+ route 'get "/challenge", to: "bot_challenge_page/bot_challenge_page#challenge", as: :bot_detect_challenge'
9
+ route 'post "/challenge", to: "bot_challenge_page/bot_challenge_page#verify_challenge"'
10
+ end
11
+
12
+ def add_before_filter_enforcement
13
+ inject_into_class "app/controllers/application_controller.rb", "ApplicationController" do
14
+ filter_code = if options[:rack_attack]
15
+ "BotChallengePage::BotChallengePageController.bot_challenge_enforce_filter(controller)"
16
+ else
17
+ "BotChallengePage::BotChallengePageController.bot_challenge_enforce_filter(controller, immediate: true)"
18
+ end
19
+
20
+ <<-EOS
21
+ # This will only protect CONFIGURED routes, but also could be put on just certain
22
+ # controllers, it does not need to be in ApplicationController
23
+ before_action do |controller|
24
+ #{filter_code}
25
+ end
26
+
27
+ EOS
28
+ end
29
+ end
30
+
31
+ def add_rack_attack_require_if_needed
32
+ if options[:rack_attack]
33
+ # since it's an intermediate dependency, we need to require it after rails
34
+ # so it will load it's rails stuff
35
+ inject_into_file "config/application.rb", "\nrequire 'rack/attack'\n", after: /require.*rails\/[^\n]+\n/m
36
+
37
+ end
38
+ end
39
+
40
+ def copy_initializer_file
41
+ template "initializer.rb.erb", "config/initializers/bot_challenge_page.rb"
42
+ end
43
+
44
+ end
45
+ end
@@ -0,0 +1,50 @@
1
+ Rails.application.config.to_prepare do
2
+
3
+ BotChallengePage::BotChallengePageController.bot_challenge_config.enabled = true
4
+
5
+ # Get from CloudFlare Turnstile: https://www.cloudflare.com/application-services/products/turnstile/
6
+ BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_sitekey = "MUST GET"
7
+ BotChallengePage::BotChallengePageController.bot_challenge_config.cf_turnstile_secret_key = "MUST GET"
8
+
9
+ # What paths do you want to protect?
10
+ #
11
+ # You can use path prefixes: "/catalog" or even "/"
12
+ #
13
+ # Or hashes with controller and/or action:
14
+ #
15
+ # { controller: "catalog" }
16
+ # { controller: "catalog", action: "index" }
17
+ #
18
+ # Note that we can only protect GET paths, and also think about making sure you DON'T protect
19
+ # any path your front-end needs JS `fetch` access to, as this would block it (at least
20
+ # without custom front-end code we haven't really explored)
21
+
22
+ BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limited_locations = [
23
+ ]
24
+
25
+ # How long will a challenge success exempt a session from further challenges?
26
+ # BotChallengePage::BotChallengePageController.bot_challenge_config.session_passed_good_for = 36.hours
27
+
28
+ <%- if options[:rack_attack] %>
29
+ # allow rate_limit_count requests in rate_limit_period, before issuing challenge
30
+ BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limit_period = 12.hour
31
+ BotChallengePage::BotChallengePageController.bot_challenge_config.rate_limit_count = 2
32
+ <% end -%>
33
+
34
+ # Exempt some requests from bot challenge protection
35
+ # BotChallengePage::BotChallengePageController.allow_exempt = ->(controller) {
36
+ # # controller.params
37
+ # # controller.request
38
+ # # controller.session
39
+
40
+ # # Here's a way to identify browser `fetch` API requests; note
41
+ # # it can be faked by an "attacker"
42
+ # controller.request.headers["sec-fetch-dest"] == "empty"
43
+ # }
44
+
45
+ # More configuration is available
46
+
47
+ <%- if options[:rack_attack] %>
48
+ BotChallengePage::BotChallengePageController.rack_attack_init
49
+ <% end %>
50
+ end
@@ -0,0 +1,4 @@
1
+ # desc "Explaining what the task does"
2
+ # task :bot_challenge_page do
3
+ # # Task goes here
4
+ # end
metadata ADDED
@@ -0,0 +1,193 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: bot_challenge_page
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Jonathan Rochkind
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2025-02-27 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: appraisal
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :development
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ - !ruby/object:Gem::Dependency
28
+ name: rspec-rails
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '7.1'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '7.1'
41
+ - !ruby/object:Gem::Dependency
42
+ name: capybara
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '3.40'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '3.40'
55
+ - !ruby/object:Gem::Dependency
56
+ name: selenium-webdriver
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - ">="
60
+ - !ruby/object:Gem::Version
61
+ version: '0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - ">="
67
+ - !ruby/object:Gem::Version
68
+ version: '0'
69
+ - !ruby/object:Gem::Dependency
70
+ name: webmock
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - "~>"
74
+ - !ruby/object:Gem::Version
75
+ version: '3.5'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - "~>"
81
+ - !ruby/object:Gem::Version
82
+ version: '3.5'
83
+ - !ruby/object:Gem::Dependency
84
+ name: nokogiri
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
97
+ - !ruby/object:Gem::Dependency
98
+ name: rails
99
+ requirement: !ruby/object:Gem::Requirement
100
+ requirements:
101
+ - - ">="
102
+ - !ruby/object:Gem::Version
103
+ version: '7.1'
104
+ - - "<"
105
+ - !ruby/object:Gem::Version
106
+ version: '8.1'
107
+ type: :runtime
108
+ prerelease: false
109
+ version_requirements: !ruby/object:Gem::Requirement
110
+ requirements:
111
+ - - ">="
112
+ - !ruby/object:Gem::Version
113
+ version: '7.1'
114
+ - - "<"
115
+ - !ruby/object:Gem::Version
116
+ version: '8.1'
117
+ - !ruby/object:Gem::Dependency
118
+ name: rack-attack
119
+ requirement: !ruby/object:Gem::Requirement
120
+ requirements:
121
+ - - "~>"
122
+ - !ruby/object:Gem::Version
123
+ version: '6.7'
124
+ type: :runtime
125
+ prerelease: false
126
+ version_requirements: !ruby/object:Gem::Requirement
127
+ requirements:
128
+ - - "~>"
129
+ - !ruby/object:Gem::Version
130
+ version: '6.7'
131
+ - !ruby/object:Gem::Dependency
132
+ name: http
133
+ requirement: !ruby/object:Gem::Requirement
134
+ requirements:
135
+ - - "~>"
136
+ - !ruby/object:Gem::Version
137
+ version: '5.2'
138
+ type: :runtime
139
+ prerelease: false
140
+ version_requirements: !ruby/object:Gem::Requirement
141
+ requirements:
142
+ - - "~>"
143
+ - !ruby/object:Gem::Version
144
+ version: '5.2'
145
+ description:
146
+ email:
147
+ - jonathan@dnil.net
148
+ executables: []
149
+ extensions: []
150
+ extra_rdoc_files: []
151
+ files:
152
+ - MIT-LICENSE
153
+ - README.md
154
+ - Rakefile
155
+ - app/controllers/bot_challenge_page/bot_challenge_page_controller.rb
156
+ - app/models/bot_challenge_page/config.rb
157
+ - app/views/bot_challenge_page/_local_turnstile_script_tag.html.erb
158
+ - app/views/bot_challenge_page/_turnstile_widget_placeholder.html.erb
159
+ - app/views/bot_challenge_page/bot_challenge_page/challenge.html.erb
160
+ - config/locales/bot_challenge_page.en.yml
161
+ - config/routes.rb
162
+ - lib/bot_challenge_page.rb
163
+ - lib/bot_challenge_page/engine.rb
164
+ - lib/bot_challenge_page/version.rb
165
+ - lib/generators/bot_challenge_page/install_generator.rb
166
+ - lib/generators/bot_challenge_page/templates/initializer.rb.erb
167
+ - lib/tasks/bot_challenge_page_tasks.rake
168
+ homepage: https://github.com/samvera-labs/bot_challenge_page
169
+ licenses:
170
+ - MIT
171
+ metadata:
172
+ homepage_uri: https://github.com/samvera-labs/bot_challenge_page
173
+ source_code_uri: https://github.com/samvera-labs/bot_challenge_page
174
+ post_install_message:
175
+ rdoc_options: []
176
+ require_paths:
177
+ - lib
178
+ required_ruby_version: !ruby/object:Gem::Requirement
179
+ requirements:
180
+ - - ">="
181
+ - !ruby/object:Gem::Version
182
+ version: '0'
183
+ required_rubygems_version: !ruby/object:Gem::Requirement
184
+ requirements:
185
+ - - ">="
186
+ - !ruby/object:Gem::Version
187
+ version: '0'
188
+ requirements: []
189
+ rubygems_version: 3.5.9
190
+ signing_key:
191
+ specification_version: 4
192
+ summary: Show a bot challenge interstitial for Rails, usually using Cloudflare Turnstile
193
+ test_files: []