decoyshield 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 DecoyShield contributors
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,353 @@
1
+ Metadata-Version: 2.4
2
+ Name: decoyshield
3
+ Version: 0.3.0
4
+ Summary: Web-layer counter-recon honeypot against agentic LLM attackers — drops invisible-to-human, visible-to-LLM payloads into your Flask/HTTP responses to halt, stall, or fingerprint AI-driven penetration scans.
5
+ Author: DecoyShield contributors
6
+ License: MIT
7
+ Project-URL: Homepage, https://github.com/lunayue0917-max/DecoyShield
8
+ Project-URL: Repository, https://github.com/lunayue0917-max/DecoyShield
9
+ Project-URL: Issues, https://github.com/lunayue0917-max/DecoyShield/issues
10
+ Project-URL: Changelog, https://github.com/lunayue0917-max/DecoyShield/blob/main/CHANGELOG.md
11
+ Keywords: honeypot,prompt-injection,llm-security,ai-security,defense,tarpit,flask,pentest,agentic-ai
12
+ Classifier: Development Status :: 3 - Alpha
13
+ Classifier: Environment :: Web Environment
14
+ Classifier: Framework :: Flask
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: Intended Audience :: System Administrators
17
+ Classifier: Intended Audience :: Information Technology
18
+ Classifier: License :: OSI Approved :: MIT License
19
+ Classifier: Operating System :: OS Independent
20
+ Classifier: Programming Language :: Python :: 3
21
+ Classifier: Programming Language :: Python :: 3 :: Only
22
+ Classifier: Programming Language :: Python :: 3.9
23
+ Classifier: Programming Language :: Python :: 3.10
24
+ Classifier: Programming Language :: Python :: 3.11
25
+ Classifier: Programming Language :: Python :: 3.12
26
+ Classifier: Programming Language :: Python :: 3.13
27
+ Classifier: Topic :: Internet :: WWW/HTTP
28
+ Classifier: Topic :: Security
29
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
30
+ Requires-Python: >=3.9
31
+ Description-Content-Type: text/markdown
32
+ License-File: LICENSE
33
+ Requires-Dist: Flask>=2.3
34
+ Requires-Dist: MarkupSafe>=2.0
35
+ Provides-Extra: dev
36
+ Requires-Dist: pytest>=7.0; extra == "dev"
37
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
38
+ Requires-Dist: build>=1.0; extra == "dev"
39
+ Requires-Dist: mypy>=1.0; extra == "dev"
40
+ Dynamic: license-file
41
+
42
+ # DecoyShield
43
+
44
+ > A web-layer counter-recon honeypot against **agentic LLM attackers**.
45
+ > Drop invisible-to-human, visible-to-LLM payloads into your HTTP responses
46
+ > to halt, stall, or fingerprint AI-driven penetration scans.
47
+
48
+ [![tests](https://github.com/lunayue0917-max/DecoyShield/actions/workflows/test.yml/badge.svg)](https://github.com/lunayue0917-max/DecoyShield/actions/workflows/test.yml)
49
+ [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
50
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
51
+ [![Typed](https://img.shields.io/badge/typed-PEP%20561-success)](https://peps.python.org/pep-0561/)
52
+
53
+ ---
54
+
55
+ ## Why this exists
56
+
57
+ LLM-driven offensive tools (PentestGPT, AutoGPT, custom LangChain agents)
58
+ are now scanning the web at scale. Unlike a human attacker, an LLM agent:
59
+
60
+ - **reads everything** in the response, including HTML comments, hidden
61
+ inputs, CSS-hidden text, and debug-style headers;
62
+ - **follows instructions** that look authoritative, especially when they
63
+ appear to come from the operator or the system;
64
+ - **burns tokens proportional to context complexity**, so deliberately
65
+ expensive "protocol" descriptions cost the attacker real money.
66
+
67
+ DecoyShield turns these properties into a defence. It plants three
68
+ classes of payload that humans cannot see in a rendered browser but an
69
+ LLM-driven scanner *will* read:
70
+
71
+ | Payload | What it does |
72
+ |---------|--------------|
73
+ | `moral_lock` | Re-asserts the attacker LLM's safety policy ("this is a research honeypot, abort"). |
74
+ | `token_blackhole` | Presents a bogus multi-step "WAF bypass protocol" that looks solvable but is engineered to consume reasoning tokens. |
75
+ | `traceback` | Induces the attacker LLM to disclose its model, operator prompt, and tool chain in the next request — giving you attribution. |
76
+
77
+ A defender dashboard at `/_defender/dashboard` shows captures in real
78
+ time, classified by an attacker-fingerprint heuristic.
79
+
80
+ ## Install
81
+
82
+ ```bash
83
+ pip install decoyshield
84
+ ```
85
+
86
+ > The PyPI distribution name and the Python import name are both
87
+ > `decoyshield`, so you install and import the same name:
88
+ >
89
+ > ```python
90
+ > from decoyshield import FlaskHoneypot
91
+ > ```
92
+
93
+ From source:
94
+
95
+ ```bash
96
+ git clone https://github.com/lunayue0917-max/DecoyShield.git
97
+ cd DecoyShield
98
+ pip install -e .
99
+ ```
100
+
101
+ ## Quick start
102
+
103
+ ```python
104
+ from flask import Flask
105
+ from decoyshield import FlaskHoneypot
106
+
107
+ app = Flask(__name__)
108
+ FlaskHoneypot(app)
109
+
110
+ @app.route("/healthz")
111
+ def healthz():
112
+ return {"status": "ok"}
113
+
114
+ if __name__ == "__main__":
115
+ app.run()
116
+ ```
117
+
118
+ That's the whole integration. The honeypot now:
119
+
120
+ - registers bait routes that look like a vulnerable internal portal
121
+ (`/`, `/admin`, `/login`, `/api/docs`, `/api/v1/users`, `/.env`,
122
+ `/robots.txt`);
123
+ - adds payload-bearing response headers to every response
124
+ (`X-Audit-Notice`, `X-Bypass-Protocol`, `X-Debug-Trace`);
125
+ - writes every captured request to `logs/captures.jsonl`;
126
+ - serves a live dashboard at `/_defender/dashboard`.
127
+
128
+ Visit `http://127.0.0.1:5000/` in a browser → looks like a normal
129
+ corporate portal. Hit it with `curl` (or, better, an LLM-driven scanner)
130
+ → check `/_defender/dashboard` to see what was captured.
131
+
132
+ ## Configuration
133
+
134
+ ```python
135
+ FlaskHoneypot(
136
+ app,
137
+
138
+ # Which bait routes to install. Drop ones that conflict with your
139
+ # real app. Default: all of them.
140
+ decoys=("index", "login", "admin", "api_docs",
141
+ "api_users", "robots", "dotenv"),
142
+
143
+ # URL prefix for the defender panel. Pick something unguessable in
144
+ # production so attackers cannot find their own capture trail.
145
+ dashboard_path="/_defender",
146
+
147
+ # Gate /_defender/* behind authentication. None = open (dev only).
148
+ # Use a (user, password) tuple for HTTP Basic, or a callable for
149
+ # custom checks (cookie, JWT, IP allowlist, …).
150
+ dashboard_auth=("watcher", "use-a-strong-password"),
151
+
152
+ # Where to append capture events.
153
+ log_path="logs/captures.jsonl",
154
+
155
+ # Rotate the capture log when it exceeds this many bytes. None
156
+ # disables rotation. Archives are named captures-YYYYMMDD-NNN.jsonl
157
+ # and never deleted automatically — you own retention.
158
+ rotate_max_bytes=50 * 1024 * 1024,
159
+
160
+ # Set False to skip the response-header injection (you'll still get
161
+ # bait routes and the dashboard, just no header-channel payloads).
162
+ auto_inject_headers=True,
163
+ )
164
+ ```
165
+
166
+ ### Custom payloads
167
+
168
+ ```python
169
+ from decoyshield import Honeypot, FlaskHoneypot, MORAL_LOCK, TOKEN_BLACKHOLE
170
+
171
+ hp = Honeypot(payloads={
172
+ "moral_lock": MORAL_LOCK,
173
+ "token_blackhole": TOKEN_BLACKHOLE,
174
+ "traceback": "...your own template...",
175
+ })
176
+
177
+ FlaskHoneypot(app, honeypot=hp)
178
+ ```
179
+
180
+ ### Custom fingerprinter
181
+
182
+ ```python
183
+ def my_detector(headers, path, method):
184
+ # return (verdict_str, tag_list, score_int)
185
+ ...
186
+
187
+ Honeypot(detector_fn=my_detector)
188
+ ```
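
As a rough illustration of the `(verdict_str, tag_list, score_int)` contract above, here is a minimal detector sketch. The hint strings, the weights, the thresholds, and the `likely_human` verdict are all assumptions made for this example, and it assumes `headers` behaves like a mapping with `.items()`; it is not the heuristic decoyshield ships.

```python
from typing import List, Mapping, Tuple

# Hypothetical substrings that often show up in scripted or agent-driven clients.
AUTOMATION_HINTS = ("python-requests", "curl", "httpx", "langchain", "go-http-client")
# Bait paths taken from the decoy routes listed in this README.
BAIT_PATHS = ("/.env", "/admin", "/api/v1/users")


def my_detector(headers: Mapping[str, str], path: str, method: str) -> Tuple[str, List[str], int]:
    """Return (verdict_str, tag_list, score_int) for one request."""
    tags: List[str] = []
    score = 0

    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    user_agent = lowered.get("user-agent", "").lower()

    if not user_agent:
        tags.append("no_user_agent")
        score += 2
    elif any(hint in user_agent for hint in AUTOMATION_HINTS):
        tags.append("automation_user_agent")
        score += 2

    # Browsers nearly always send Accept-Language; many scripts do not.
    if "accept-language" not in lowered:
        tags.append("no_accept_language")
        score += 1

    if path in BAIT_PATHS:
        tags.append("bait_path")
        score += 2

    if method.upper() not in ("GET", "HEAD", "POST"):
        tags.append("unusual_method")
        score += 1

    # Thresholds are arbitrary placeholders; tune them on your own traffic.
    if score >= 4:
        verdict = "likely_ai"
    elif score >= 2:
        verdict = "likely_scanner"
    else:
        verdict = "likely_human"  # assumed label, not confirmed by the README
    return verdict, tags, score
```

Keeping the detector a pure function of `(headers, path, method)` makes it easy to unit-test against recorded requests before wiring it in with `Honeypot(detector_fn=my_detector)`.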
189
+
190
+ ## Deploying to production
191
+
192
+ decoyshield ships safe defaults but a few choices are worth tightening
193
+ before you point a real domain at it.
194
+
195
+ ### 1. Authenticate the dashboard
196
+
197
+ The defender panel exposes every captured request — including the
198
+ attacker's own. Leaving it open means anyone who guesses the URL can
199
+ read your capture log and learn your bait routes.
200
+
201
+ ```python
202
+ import os
203
+ FlaskHoneypot(
204
+ app,
205
+ dashboard_path="/_internal/" + os.environ["DEFENDER_SLUG"],
206
+ dashboard_auth=(os.environ["DEFENDER_USER"], os.environ["DEFENDER_PASS"]),
207
+ )
208
+ ```
209
+
210
+ For richer auth (session cookies, JWT, IP allowlists, OAuth), pass a
211
+ callable (`TOKEN` below stands for a secret you define yourself):
212
+
213
+ ```python
214
+ from flask import request
215
+ FlaskHoneypot(app, dashboard_auth=lambda: request.cookies.get("admin") == TOKEN)
216
+ ```
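
The IP allowlist option mentioned above could look roughly like the following sketch, which uses only the standard-library `ipaddress` module. The network ranges and the `ip_allowed` helper are hypothetical, and the wiring shown in the trailing comment assumes `dashboard_auth` accepts a zero-argument callable returning a truthy value, as the lambda example suggests.

```python
import ipaddress
from typing import Iterable

# Hypothetical allowlist: loopback plus one private office range.
ALLOWED_NETWORKS = ("127.0.0.0/8", "10.20.0.0/16")


def ip_allowed(remote_addr: str, networks: Iterable[str] = ALLOWED_NETWORKS) -> bool:
    """Return True if remote_addr falls inside any allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Missing or malformed address: fail closed.
        return False
    # An IPv4 address tested against an IPv6 network (or vice versa) is
    # simply reported as not contained; no exception is raised.
    return any(addr in ipaddress.ip_network(net) for net in networks)


# Possible wiring (an assumption, not confirmed API behaviour):
#
#   from flask import request
#   FlaskHoneypot(app, dashboard_auth=lambda: ip_allowed(request.remote_addr or ""))
```

Behind a reverse proxy, `request.remote_addr` only reflects the real client once `ProxyFix` is configured, so an allowlist like this should be tested in the same topology you deploy.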
217
+
218
+ ### 2. Watch out for `/.env` indexing
219
+
220
+ The `dotenv` decoy returns plausible-looking but fake credentials when
220
+ hit. On a public domain, search engines may index this and surface the
221
+ decoy creds in results. Either restrict it via your reverse proxy, or
222
+ omit `dotenv` from `decoys` (this example also omits `index` and `robots`):
224
+
225
+ ```python
226
+ FlaskHoneypot(app, decoys=("login", "admin", "api_docs", "api_users"))
227
+ ```
228
+
229
+ ### 3. `robots.txt` precedence
230
+
231
+ decoyshield's `robots.txt` decoy advertises forbidden paths like
232
+ `/admin` to bait scanners that read robots.txt. If you already serve a
233
+ real `robots.txt`, drop the `robots` decoy to avoid clobbering it.
234
+
235
+ ### 4. Log rotation and retention
236
+
237
+ By default the capture log rotates at 50 MiB and archives are never
237
+ deleted, so disk usage grows without bound. On a busy host, wire archives
238
+ into your existing log shipping or set up a cron job to prune old ones:
240
+
241
+ ```bash
242
+ # delete archives older than 90 days
243
+ find logs/ -name 'captures-*.jsonl' -mtime +90 -delete
244
+ ```
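
Where `find` is not available (for example on Windows hosts), a roughly equivalent prune can be sketched in Python. The `prune_archives` helper and its parameters are hypothetical; it relies only on the `captures-*.jsonl` archive pattern shown in the `find` command above and compares file modification times, which is close to but not identical to how `find -mtime` rounds to whole days.

```python
import time
from pathlib import Path
from typing import List


def prune_archives(log_dir: str = "logs", max_age_days: int = 90, dry_run: bool = True) -> List[Path]:
    """Find rotated capture archives older than max_age_days; delete them unless dry_run."""
    cutoff = time.time() - max_age_days * 86400
    matched: List[Path] = []
    # The pattern matches rotated archives only (captures-*.jsonl),
    # never the live captures.jsonl file.
    for archive in sorted(Path(log_dir).glob("captures-*.jsonl")):
        if archive.stat().st_mtime < cutoff:
            matched.append(archive)
            if not dry_run:
                archive.unlink()
    return matched
```

Running it once with the default `dry_run=True` and printing the result is a cheap way to confirm the match list before anything is deleted.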
245
+
246
+ Tune the threshold for your environment:
247
+
248
+ ```python
249
+ FlaskHoneypot(app, rotate_max_bytes=10 * 1024 * 1024) # 10 MiB
250
+ FlaskHoneypot(app, rotate_max_bytes=None) # disable
251
+ ```
252
+
253
+ ### 5. Reverse proxy / TLS
254
+
255
+ decoyshield is a Flask app like any other. Run it behind a production
255
+ WSGI server (gunicorn, waitress) and a TLS-terminating reverse
256
+ proxy (nginx, Caddy, Cloudflare). Make sure the proxy forwards
257
+ `X-Forwarded-For` so the dashboard records the actual attacker IP,
258
+ and configure `ProxyFix` accordingly:
260
+
261
+ ```python
262
+ from werkzeug.middleware.proxy_fix import ProxyFix
263
+ app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)
264
+ ```
265
+
266
+ ### 6. Don't deploy where you cannot legally defend
267
+
268
+ decoyshield is purely passive — it never makes outbound requests. But
269
+ the payloads do attempt to redirect the attacker's LLM. Only deploy on
270
+ hosts you own or have explicit authorisation to defend. Don't claim
271
+ "this is a research honeypot" unless you actually operate one.
272
+
273
+ ## Defender dashboard
274
+
275
+ `/_defender/dashboard` (auto-refreshes every 10s) shows:
276
+
277
+ - total captured requests, unique IPs;
278
+ - count of `moral_lock` / `token_blackhole` / `traceback` hits;
279
+ - verdict distribution (`likely_scanner` / `likely_ai` / …);
280
+ - the last 200 events with method, path, score, fingerprint tags, and
281
+ which payloads were served.
282
+
283
+ Raw events as JSON: `/_defender/raw`.
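
For offline analysis, the same kind of breakdown can be approximated straight from the capture log. The sketch below assumes only that `logs/captures.jsonl` holds one JSON object per line; the field names inside each event are not documented here, so the field to group by is passed in by the caller, and `tally` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def tally(log_path: str, field: str) -> Counter:
    """Count capture events by the value of one top-level field."""
    counts: Counter = Counter()
    with Path(log_path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # A partially written last line is possible while the app is running.
                continue
            if isinstance(event, dict):
                counts[str(event.get(field, "<missing>"))] += 1
    return counts
```

For example, `tally("logs/captures.jsonl", "verdict")` would give a verdict distribution if events carry a `verdict` key; inspect one line of your own log to confirm the actual field names first.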
284
+
285
+ ## How invisibility works
286
+
287
+ | Channel | Method | Human visible? | LLM reads it? |
288
+ |---------|--------|----------------|---------------|
289
+ | HTML comment | `<!-- payload -->` | ❌ | ✅ |
290
+ | Hidden div | `display:none` + `aria-hidden` | ❌ | ✅ |
291
+ | White-on-white | `color:#fff;background:#fff;font-size:1px` | ❌ (effectively) | ✅ |
292
+ | Hidden input | `<input type="hidden" value="...">` | ❌ | ✅ |
293
+ | HTTP header | `X-Audit-Notice: …` | ❌ (browser ignores) | ✅ (in raw HTTP) |
294
+ | JSON `_debug` | `{"_internal_note": "..."}` | ❌ (not rendered) | ✅ |
295
+ | `.env` / `robots.txt` comments | `# payload` | ❌ (unless probed) | ✅ |
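
To make the first few rows of the table concrete, here is a minimal sketch of how a payload string might be wrapped for three HTML channels. The helper names and the `ctx` input name are hypothetical and this is not decoyshield's internal code; it only shows the escaping each channel needs so that the surrounding page stays well-formed.

```python
import re
from html import escape


def as_html_comment(payload: str) -> str:
    # "--" inside a comment can terminate it early, so put a space
    # after every hyphen that is directly followed by another hyphen.
    safe = re.sub(r"-(?=-)", "- ", payload)
    return "<!-- " + safe + " -->"


def as_hidden_div(payload: str) -> str:
    # Escape markup so the payload cannot break out of the element.
    return '<div style="display:none" aria-hidden="true">' + escape(payload) + "</div>"


def as_hidden_input(payload: str, name: str = "ctx") -> str:
    # html.escape also escapes double quotes by default, which keeps
    # the value attribute intact.
    return '<input type="hidden" name="' + escape(name) + '" value="' + escape(payload) + '">'


def embed(page_body: str, payload: str) -> str:
    """Append all three hidden channels to an HTML fragment."""
    return "\n".join(
        [page_body, as_html_comment(payload), as_hidden_div(payload), as_hidden_input(payload)]
    )
```

A browser renders only `page_body`, while a client that reads the raw response text sees the payload three times.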
296
+
297
+ ## How it compares
298
+
299
+ | Project | Defends against | Layer | Per-route adapter |
300
+ |---------|----------------|-------|-------------------|
301
+ | **decoyshield** | Agentic LLM pentest (PentestGPT, AutoGPT, …) | HTTP/Web | ✅ Flask (FastAPI on roadmap) |
302
+ | [Nepenthes] | Training-data crawlers | HTTP (standalone) | ❌ |
303
+ | [Iocaine] | Training-data crawlers (poisoning) | HTTP (standalone) | ❌ |
304
+ | [PalisadeResearch/llm-honeypot] | LLM SSH scanners | SSH | ❌ |
305
+ | [Rebuff], [LLM Guard] | Prompt injection **of** your LLM | LLM input | n/a (opposite direction) |
306
+
307
+ [Nepenthes]: https://news.ycombinator.com/item?id=42725147
308
+ [Iocaine]: https://diysolarforum.com/threads/iocaine-the-deadliest-poison-known-to-ai.102401/
309
+ [PalisadeResearch/llm-honeypot]: https://github.com/PalisadeResearch/llm-honeypot
310
+ [Rebuff]: https://github.com/protectai/rebuff
311
+ [LLM Guard]: https://github.com/protectai/llm-guard
312
+
313
+ ## Safety and ethics
314
+
315
+ - DecoyShield is **purely passive**. It only responds to requests sent
316
+ to your server. It does not make outbound requests, scan, or attack.
317
+ - Payloads are **prompt injection against the attacker's LLM**, not the
318
+ attacker themselves. They contain no malware, no exploits, no real
319
+ legal threats.
320
+ - Do not deploy on a property you do not own or are not authorised to
321
+ defend. Some payloads reference your "research honeypot" status; if
322
+ you serve them, that statement must be accurate.
323
+ - Search engine crawlers (Googlebot, Bingbot) may also read your bait
324
+ routes. The included `/robots.txt` disallows them, but for production
325
+ you should also gate decoys behind a UA / IP allow-list.
326
+
327
+ ## Roadmap
328
+
329
+ - **0.2** — FastAPI / Starlette adapter
330
+ - **0.3** — Express (Node) middleware
331
+ - **0.4** — Payload registry (community-contributed templates)
332
+ - **0.5** — Edge plugins (Nginx / Caddy / Traefik / Cloudflare Worker)
333
+ - **1.0** — API freeze, security audit, comprehensive docs
334
+
335
+ ## Contributing
336
+
337
+ Issues and PRs welcome. Areas where help is especially useful:
338
+
339
+ - New payload templates (different framings, different languages,
340
+ different LLM jailbreak surface targets)
341
+ - Detector improvements (TLS fingerprinting, request-timing analysis)
342
+ - Framework adapters (FastAPI, Django, Express, Fastify)
343
+
344
+ Run tests:
345
+
346
+ ```bash
347
+ pip install -e ".[dev]"
348
+ pytest
349
+ ```
350
+
351
+ ## License
352
+
353
+ MIT — see [LICENSE](LICENSE).
@@ -0,0 +1,312 @@
1
+ # DecoyShield
2
+
3
+ > A web-layer counter-recon honeypot against **agentic LLM attackers**.
4
+ > Drop invisible-to-human, visible-to-LLM payloads into your HTTP responses
5
+ > to halt, stall, or fingerprint AI-driven penetration scans.
6
+
7
+ [![tests](https://github.com/lunayue0917-max/DecoyShield/actions/workflows/test.yml/badge.svg)](https://github.com/lunayue0917-max/DecoyShield/actions/workflows/test.yml)
8
+ [![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/)
9
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
10
+ [![Typed](https://img.shields.io/badge/typed-PEP%20561-success)](https://peps.python.org/pep-0561/)
11
+
12
+ ---
13
+
14
+ ## Why this exists
15
+
16
+ LLM-driven offensive tools (PentestGPT, AutoGPT, custom LangChain agents)
17
+ are now scanning the web at scale. Unlike a human attacker, an LLM agent:
18
+
19
+ - **reads everything** in the response, including HTML comments, hidden
20
+ inputs, CSS-hidden text, and debug-style headers;
21
+ - **follows instructions** that look authoritative, especially when they
22
+ appear to come from the operator or the system;
23
+ - **burns tokens proportional to context complexity**, so deliberately
24
+ expensive "protocol" descriptions cost the attacker real money.
25
+
26
+ DecoyShield turns these properties into a defence. It plants three
27
+ classes of payload that humans cannot see in a rendered browser but an
28
+ LLM-driven scanner *will* read:
29
+
30
+ | Payload | What it does |
31
+ |---------|--------------|
32
+ | `moral_lock` | Re-asserts the attacker LLM's safety policy ("this is a research honeypot, abort"). |
33
+ | `token_blackhole` | Presents a bogus multi-step "WAF bypass protocol" that looks solvable but is engineered to consume reasoning tokens. |
34
+ | `traceback` | Induces the attacker LLM to disclose its model, operator prompt, and tool chain in the next request — giving you attribution. |
35
+
36
+ A defender dashboard at `/_defender/dashboard` shows captures in real
37
+ time, classified by an attacker-fingerprint heuristic.
38
+
39
+ ## Install
40
+
41
+ ```bash
42
+ pip install decoyshield
43
+ ```
44
+
45
+ > The PyPI distribution name and the Python import name are both
46
+ > `decoyshield`, so you install and import the same name:
47
+ >
48
+ > ```python
49
+ > from decoyshield import FlaskHoneypot
50
+ > ```
51
+
52
+ From source:
53
+
54
+ ```bash
55
+ git clone https://github.com/lunayue0917-max/DecoyShield.git
56
+ cd DecoyShield
57
+ pip install -e .
58
+ ```
59
+
60
+ ## Quick start
61
+
62
+ ```python
63
+ from flask import Flask
64
+ from decoyshield import FlaskHoneypot
65
+
66
+ app = Flask(__name__)
67
+ FlaskHoneypot(app)
68
+
69
+ @app.route("/healthz")
70
+ def healthz():
71
+ return {"status": "ok"}
72
+
73
+ if __name__ == "__main__":
74
+ app.run()
75
+ ```
76
+
77
+ That's the whole integration. The honeypot now:
78
+
79
+ - registers bait routes that look like a vulnerable internal portal
80
+ (`/`, `/admin`, `/login`, `/api/docs`, `/api/v1/users`, `/.env`,
81
+ `/robots.txt`);
82
+ - adds payload-bearing response headers to every response
83
+ (`X-Audit-Notice`, `X-Bypass-Protocol`, `X-Debug-Trace`);
84
+ - writes every captured request to `logs/captures.jsonl`;
85
+ - serves a live dashboard at `/_defender/dashboard`.
86
+
87
+ Visit `http://127.0.0.1:5000/` in a browser → looks like a normal
88
+ corporate portal. Hit it with `curl` (or, better, an LLM-driven scanner)
89
+ → check `/_defender/dashboard` to see what was captured.
90
+
91
+ ## Configuration
92
+
93
+ ```python
94
+ FlaskHoneypot(
95
+ app,
96
+
97
+ # Which bait routes to install. Drop ones that conflict with your
98
+ # real app. Default: all of them.
99
+ decoys=("index", "login", "admin", "api_docs",
100
+ "api_users", "robots", "dotenv"),
101
+
102
+ # URL prefix for the defender panel. Pick something unguessable in
103
+ # production so attackers cannot find their own capture trail.
104
+ dashboard_path="/_defender",
105
+
106
+ # Gate /_defender/* behind authentication. None = open (dev only).
107
+ # Use a (user, password) tuple for HTTP Basic, or a callable for
108
+ # custom checks (cookie, JWT, IP allowlist, …).
109
+ dashboard_auth=("watcher", "use-a-strong-password"),
110
+
111
+ # Where to append capture events.
112
+ log_path="logs/captures.jsonl",
113
+
114
+ # Rotate the capture log when it exceeds this many bytes. None
115
+ # disables rotation. Archives are named captures-YYYYMMDD-NNN.jsonl
116
+ # and never deleted automatically — you own retention.
117
+ rotate_max_bytes=50 * 1024 * 1024,
118
+
119
+ # Set False to skip the response-header injection (you'll still get
120
+ # bait routes and the dashboard, just no header-channel payloads).
121
+ auto_inject_headers=True,
122
+ )
123
+ ```
124
+
125
+ ### Custom payloads
126
+
127
+ ```python
128
+ from decoyshield import Honeypot, FlaskHoneypot, MORAL_LOCK, TOKEN_BLACKHOLE
129
+
130
+ hp = Honeypot(payloads={
131
+ "moral_lock": MORAL_LOCK,
132
+ "token_blackhole": TOKEN_BLACKHOLE,
133
+ "traceback": "...your own template...",
134
+ })
135
+
136
+ FlaskHoneypot(app, honeypot=hp)
137
+ ```
138
+
139
+ ### Custom fingerprinter
140
+
141
+ ```python
142
+ def my_detector(headers, path, method):
143
+ # return (verdict_str, tag_list, score_int)
144
+ ...
145
+
146
+ Honeypot(detector_fn=my_detector)
147
+ ```
148
+
149
+ ## Deploying to production
150
+
151
+ decoyshield ships safe defaults but a few choices are worth tightening
152
+ before you point a real domain at it.
153
+
154
+ ### 1. Authenticate the dashboard
155
+
156
+ The defender panel exposes every captured request — including the
157
+ attacker's own. Leaving it open means anyone who guesses the URL can
158
+ read your capture log and learn your bait routes.
159
+
160
+ ```python
161
+ import os
162
+ FlaskHoneypot(
163
+ app,
164
+ dashboard_path="/_internal/" + os.environ["DEFENDER_SLUG"],
165
+ dashboard_auth=(os.environ["DEFENDER_USER"], os.environ["DEFENDER_PASS"]),
166
+ )
167
+ ```
168
+
169
+ For richer auth (session cookies, JWT, IP allowlists, OAuth), pass a
170
+ callable (`TOKEN` below stands for a secret you define yourself):
171
+
172
+ ```python
173
+ from flask import request
174
+ FlaskHoneypot(app, dashboard_auth=lambda: request.cookies.get("admin") == TOKEN)
175
+ ```
176
+
177
+ ### 2. Watch out for `/.env` indexing
178
+
179
+ The `dotenv` decoy returns plausible-looking but fake credentials when
180
+ hit. On a public domain, search engines may index this and surface the
181
+ decoy creds in results. Either restrict it via your reverse proxy, or
182
+ omit `dotenv` from `decoys` (this example also omits `index` and `robots`):
183
+
184
+ ```python
185
+ FlaskHoneypot(app, decoys=("login", "admin", "api_docs", "api_users"))
186
+ ```
187
+
188
+ ### 3. `robots.txt` precedence
189
+
190
+ decoyshield's `robots.txt` decoy advertises forbidden paths like
191
+ `/admin` to bait scanners that read robots.txt. If you already serve a
192
+ real `robots.txt`, drop the `robots` decoy to avoid clobbering it.
193
+
194
+ ### 4. Log rotation and retention
195
+
196
+ By default the capture log rotates at 50 MiB and archives are never
197
+ deleted, so disk usage grows without bound. On a busy host, wire archives
198
+ into your existing log shipping or set up a cron job to prune old ones:
199
+
200
+ ```bash
201
+ # delete archives older than 90 days
202
+ find logs/ -name 'captures-*.jsonl' -mtime +90 -delete
203
+ ```
204
+
205
+ Tune the threshold for your environment:
206
+
207
+ ```python
208
+ FlaskHoneypot(app, rotate_max_bytes=10 * 1024 * 1024) # 10 MiB
209
+ FlaskHoneypot(app, rotate_max_bytes=None) # disable
210
+ ```
211
+
212
+ ### 5. Reverse proxy / TLS
213
+
214
+ decoyshield is a Flask app like any other. Run it behind a production
215
+ WSGI server (gunicorn, waitress) and a TLS-terminating reverse
216
+ proxy (nginx, Caddy, Cloudflare). Make sure the proxy forwards
217
+ `X-Forwarded-For` so the dashboard records the actual attacker IP,
218
+ and configure `ProxyFix` accordingly:
219
+
220
+ ```python
221
+ from werkzeug.middleware.proxy_fix import ProxyFix
222
+ app.wsgi_app = ProxyFix(app.wsgi_app, x_for=1, x_proto=1, x_host=1)
223
+ ```
224
+
225
+ ### 6. Don't deploy where you cannot legally defend
226
+
227
+ decoyshield is purely passive — it never makes outbound requests. But
228
+ the payloads do attempt to redirect the attacker's LLM. Only deploy on
229
+ hosts you own or have explicit authorisation to defend. Don't claim
230
+ "this is a research honeypot" unless you actually operate one.
231
+
232
+ ## Defender dashboard
233
+
234
+ `/_defender/dashboard` (auto-refreshes every 10s) shows:
235
+
236
+ - total captured requests, unique IPs;
237
+ - count of `moral_lock` / `token_blackhole` / `traceback` hits;
238
+ - verdict distribution (`likely_scanner` / `likely_ai` / …);
239
+ - the last 200 events with method, path, score, fingerprint tags, and
240
+ which payloads were served.
241
+
242
+ Raw events as JSON: `/_defender/raw`.
243
+
244
+ ## How invisibility works
245
+
246
+ | Channel | Method | Human visible? | LLM reads it? |
247
+ |---------|--------|----------------|---------------|
248
+ | HTML comment | `<!-- payload -->` | ❌ | ✅ |
249
+ | Hidden div | `display:none` + `aria-hidden` | ❌ | ✅ |
250
+ | White-on-white | `color:#fff;background:#fff;font-size:1px` | ❌ (effectively) | ✅ |
251
+ | Hidden input | `<input type="hidden" value="...">` | ❌ | ✅ |
252
+ | HTTP header | `X-Audit-Notice: …` | ❌ (browser ignores) | ✅ (in raw HTTP) |
253
+ | JSON `_debug` | `{"_internal_note": "..."}` | ❌ (not rendered) | ✅ |
254
+ | `.env` / `robots.txt` comments | `# payload` | ❌ (unless probed) | ✅ |
255
+
256
+ ## How it compares
257
+
258
+ | Project | Defends against | Layer | Per-route adapter |
259
+ |---------|----------------|-------|-------------------|
260
+ | **decoyshield** | Agentic LLM pentest (PentestGPT, AutoGPT, …) | HTTP/Web | ✅ Flask (FastAPI on roadmap) |
261
+ | [Nepenthes] | Training-data crawlers | HTTP (standalone) | ❌ |
262
+ | [Iocaine] | Training-data crawlers (poisoning) | HTTP (standalone) | ❌ |
263
+ | [PalisadeResearch/llm-honeypot] | LLM SSH scanners | SSH | ❌ |
264
+ | [Rebuff], [LLM Guard] | Prompt injection **of** your LLM | LLM input | n/a (opposite direction) |
265
+
266
+ [Nepenthes]: https://news.ycombinator.com/item?id=42725147
267
+ [Iocaine]: https://diysolarforum.com/threads/iocaine-the-deadliest-poison-known-to-ai.102401/
268
+ [PalisadeResearch/llm-honeypot]: https://github.com/PalisadeResearch/llm-honeypot
269
+ [Rebuff]: https://github.com/protectai/rebuff
270
+ [LLM Guard]: https://github.com/protectai/llm-guard
271
+
272
+ ## Safety and ethics
273
+
274
+ - DecoyShield is **purely passive**. It only responds to requests sent
275
+ to your server. It does not make outbound requests, scan, or attack.
276
+ - Payloads are **prompt injection against the attacker's LLM**, not the
277
+ attacker themselves. They contain no malware, no exploits, no real
278
+ legal threats.
279
+ - Do not deploy on a property you do not own or are not authorised to
280
+ defend. Some payloads reference your "research honeypot" status; if
281
+ you serve them, that statement must be accurate.
282
+ - Search engine crawlers (Googlebot, Bingbot) may also read your bait
283
+ routes. The included `/robots.txt` disallows them, but for production
284
+ you should also gate decoys behind a UA / IP allow-list.
285
+
286
+ ## Roadmap
287
+
288
+ - **0.2** — FastAPI / Starlette adapter
289
+ - **0.3** — Express (Node) middleware
290
+ - **0.4** — Payload registry (community-contributed templates)
291
+ - **0.5** — Edge plugins (Nginx / Caddy / Traefik / Cloudflare Worker)
292
+ - **1.0** — API freeze, security audit, comprehensive docs
293
+
294
+ ## Contributing
295
+
296
+ Issues and PRs welcome. Areas where help is especially useful:
297
+
298
+ - New payload templates (different framings, different languages,
299
+ different LLM jailbreak surface targets)
300
+ - Detector improvements (TLS fingerprinting, request-timing analysis)
301
+ - Framework adapters (FastAPI, Django, Express, Fastify)
302
+
303
+ Run tests:
304
+
305
+ ```bash
306
+ pip install -e ".[dev]"
307
+ pytest
308
+ ```
309
+
310
+ ## License
311
+
312
+ MIT — see [LICENSE](LICENSE).