agent-challenge 0.5.0 β†’ 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,458 @@
1
+ <p align="center">
2
+ <h1 align="center">🧩 agent-challenge</h1>
3
+ <p align="center">
4
+ <strong>Drop-in LLM authentication for any API endpoint.</strong>
5
+ </p>
6
+ <p align="center">
7
+ <a href="https://pypi.org/project/agent-challenge/"><img src="https://img.shields.io/pypi/v/agent-challenge?color=blue&label=PyPI" alt="PyPI"></a>
8
+ <a href="https://www.npmjs.com/package/agent-challenge"><img src="https://img.shields.io/npm/v/agent-challenge?color=green&label=npm" alt="npm"></a>
9
+ <a href="https://github.com/Kav-K/agent-challenge/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Kav-K/agent-challenge" alt="License"></a>
10
+ <a href="https://challenge.llm.kaveenk.com"><img src="https://img.shields.io/badge/docs-live-brightgreen" alt="Docs"></a>
11
+ </p>
12
+ </p>
13
+
14
+ > **πŸ“– Full documentation, live demo, and interactive examples: [challenge.llm.kaveenk.com](https://challenge.llm.kaveenk.com)**
15
+
16
+ ---
17
+
18
+ ## Why?
19
+
20
+ You built an API. Now bots are hitting it β€” not the smart kind, the dumb kind. Automated scripts cycling through endpoints, low-effort crawlers scraping your data, or spammy throwaway clients burning through your resources.
21
+
22
+ Traditional CAPTCHAs block *everyone* who isn't a human sitting in a browser. API keys work, but they require manual signup, email verification, approval flows β€” friction that kills adoption for legitimate AI agents.
23
+
24
+ **agent-challenge** sits in the middle: it blocks automated scripts and low-capability bots while letting any competent LLM walk right through. The challenge requires actual reasoning β€” reversing strings, solving arithmetic, decoding ciphers β€” things that a real language model handles instantly but a curl loop or a Python script with `requests.post()` can't fake.
25
+
26
+ Think of it as a **proof of intelligence** gate:
27
+
28
+ - βœ… GPT-4, Claude, Gemini, Llama β€” pass instantly
29
+ - βœ… Any capable LLM-powered agent β€” solves in one shot
30
+ - ❌ Automated scripts β€” can't reason about the prompt
31
+ - ❌ Spammy low-effort bots β€” can't parse randomized templates
32
+ - ❌ Dumb wrappers just forwarding requests β€” no LLM to solve with
33
+
34
+ It's the ultimate automated-script buster. If the other end of your API can't do basic thinking, it doesn't get in. This is "prove you **ARE** a robot", not "prove you're not a robot"!
35
+
36
+ ```python
37
+ # Before: unprotected endpoint
38
+ @app.route("/api/screenshots", methods=["POST"])
39
+ def screenshot():
40
+ return take_screenshot(request.json["url"])
41
+
42
+ # After: agents solve a puzzle once, pass through forever
43
+ @app.route("/api/screenshots", methods=["POST"])
44
+ def screenshot():
45
+ result = ac.gate_http(request.headers, request.get_json(silent=True))
46
+ if result.status != "authenticated":
47
+ return jsonify(result.to_dict()), 401
48
+ return take_screenshot(request.json["url"])
49
+ ```
50
+
51
+ ## How It Works
52
+
53
+ ```
54
+ Agent Your API
55
+ β”‚ β”‚
56
+ β”œβ”€β”€POST /api/your-endpoint────►│
57
+ β”‚ β”œβ”€β”€ gate() β†’ no token
58
+ │◄──401 { challenge_required }───
59
+ β”‚ β”‚
60
+ β”‚ LLM reads prompt, answers β”‚
61
+ β”‚ β”‚
62
+ β”œβ”€β”€POST { answer, token }─────►│
63
+ β”‚ β”œβ”€β”€ gate() β†’ correct!
64
+ │◄──200 { token: "eyJpZ..." }────
65
+ β”‚ β”‚
66
+ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
67
+ β”‚ β”‚ Saves token forever β”‚ β”‚
68
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
69
+ β”‚ β”‚
70
+ β”œβ”€β”€POST + Bearer eyJpZ...─────►│
71
+ β”‚ β”œβ”€β”€ gate() β†’ valid token
72
+ │◄──200 { authenticated }──────── (instant, no puzzle)
73
+ ```
74
+
75
+ One endpoint. Three interactions. Zero database.
76
+
77
+ ## Install
78
+
79
+ ```bash
80
+ pip install agent-challenge
81
+ ```
82
+
83
+ ```bash
84
+ npm install agent-challenge
85
+ ```
86
+
87
+ ## Quick Start
88
+
89
+ ### Python (Flask)
90
+
91
+ ```python
92
+ from agentchallenge import AgentChallenge
93
+
94
+ ac = AgentChallenge(secret="your-secret-key-min-8-chars")
95
+
96
+ @app.route("/api/data", methods=["POST"])
97
+ def protected_endpoint():
98
+ result = ac.gate(
99
+ token=request.headers.get("Authorization", "").removeprefix("Bearer ") or None,
100
+ challenge_token=request.json.get("challenge_token"),
101
+ answer=request.json.get("answer"),
102
+ )
103
+ if result.status != "authenticated":
104
+ return jsonify(result.to_dict()), 401
105
+
106
+ # Your logic here β€” agent is verified
107
+ return jsonify({"data": "secret stuff"})
108
+ ```
109
+
110
+ ### Node.js (Express)
111
+
112
+ ```javascript
113
+ import { AgentChallenge } from 'agent-challenge';
114
+
115
+ const ac = new AgentChallenge({ secret: 'your-secret-key-min-8-chars' });
116
+
117
+ app.post('/api/data', (req, res) => {
118
+ const gate = ac.gateSync({
119
+ token: req.headers.authorization?.slice(7),
120
+ challengeToken: req.body?.challenge_token,
121
+ answer: req.body?.answer,
122
+ });
123
+ if (gate.status !== 'authenticated')
124
+ return res.status(401).json(gate);
125
+
126
+ // Your logic here β€” agent is verified
127
+ res.json({ data: 'secret stuff' });
128
+ });
129
+ ```
130
+
131
+ ## The `gate()` API
132
+
133
+ One function handles everything. Three modes based on what's passed in:
134
+
135
+ | Arguments | Behavior | Returns |
136
+ |-----------|----------|---------|
137
+ | *(none)* | Generate a new challenge | `{ status: "challenge_required", prompt, challenge_token }` |
138
+ | `challenge_token` + `answer` | Verify answer, issue permanent token | `{ status: "authenticated", token: "eyJpZ..." }` |
139
+ | `token` | Validate saved token | `{ status: "authenticated" }` |
140
+
141
+ ```python
142
+ # Mode 1: No args β†’ challenge
143
+ result = ac.gate()
144
+ # β†’ GateResult(status="challenge_required", prompt="Reverse: NOHTYP", ...)
145
+
146
+ # Mode 2: Answer β†’ permanent token
147
+ result = ac.gate(challenge_token="eyJ...", answer="PYTHON")
148
+ # β†’ GateResult(status="authenticated", token="eyJpZCI6ImF0Xy...")
149
+
150
+ # Mode 3: Token β†’ instant pass
151
+ result = ac.gate(token="eyJpZCI6ImF0Xy...")
152
+ # β†’ GateResult(status="authenticated")
153
+ ```
154
+
155
+ ### `gate_http()` / `gateHttp()` β€” Zero-Boilerplate HTTP
156
+
157
+ Instead of manually extracting the Bearer token from headers and fields from the body, pass them directly:
158
+
159
+ ```python
160
+ # Python β€” works with Flask, Django, FastAPI, or anything with headers + body
161
+ result = ac.gate_http(request.headers, request.get_json(silent=True))
162
+ ```
163
+
164
+ ```javascript
165
+ // JavaScript β€” works with Express, Koa, Fastify, or anything with headers + body
166
+ const result = ac.gateHttp(req.headers, req.body);
167
+ ```
168
+
169
+ It reads `Authorization: Bearer <token>` from headers and `challenge_token` / `answer` from the body automatically. Same result as `gate()`, less wiring.
170
+
171
+ ## Challenge Types
172
+
173
+ 25 challenge types across 4 difficulty tiers. All use randomized inputs β€” no fixed word lists.
174
+
175
+ ### Easy (6 types)
176
+ | Type | Example |
177
+ |------|---------|
178
+ | `reverse_string` | Reverse "PYTHON" β†’ `NOHTYP` |
179
+ | `simple_math` | 234 + 567 = `801` |
180
+ | `pattern` | 2, 4, 8, 16, ? β†’ `32` |
181
+ | `counting` | Count vowels in "CHALLENGE" β†’ `3` |
182
+ | `string_length` | How many characters in "HELLO"? β†’ `5` |
183
+ | `first_last` | First and last char of "PYTHON" β†’ `p, n` |
184
+
185
+ ### Medium (11 types)
186
+ | Type | Example |
187
+ |------|---------|
188
+ | `rot13` | Decode "URYYB" β†’ `HELLO` |
189
+ | `letter_position` | A=1,B=2.. sum of "CAT" β†’ `24` |
190
+ | `extract_letters` | Every 2nd char of "HWEOLRLLOD" β†’ `WORLD` |
191
+ | `sorting` | Sort [7,2,9,1] ascending β†’ `1,2,7,9` |
192
+ | `binary` | Convert 42 to binary β†’ `101010` |
193
+ | `ascii_value` | ASCII code for 'M' β†’ `77` |
194
+ | `string_math` | "CAT" has 3 letters, "DOG" has 3 β†’ 3Γ—3 = `9` |
195
+ | *+ all easy types* | |
196
+
197
+ ### Hard (14 types)
198
+ | Type | Example |
199
+ |------|---------|
200
+ | `caesar` | Decrypt "KHOOR" with shift 3 β†’ `HELLO` |
201
+ | `word_math` | 7 + 8 as a word β†’ `fifteen` |
202
+ | `transform` | Uppercase + reverse "hello" β†’ `OLLEH` |
203
+ | `substring` | Characters 3–6 of "PROGRAMMING" β†’ `ogra` |
204
+ | `zigzag` | Read "ABCDEF" in zigzag with 2 rows β†’ `ACEBDF` |
205
+ | *+ all medium types* | |
206
+
207
+ ### Agentic (8 types) β€” for top-tier LLMs only
208
+ | Type | Example |
209
+ |------|---------|
210
+ | `chained_transform` | Reverse "PYTHON", then ROT13 β†’ `ABUGIC` |
211
+ | `multi_step_math` | 17 Γ— 23, then digit sum β†’ `13` |
212
+ | `base_conversion_chain` | Binary 11010 β†’ decimal, +15, β†’ binary = `101001` |
213
+ | `word_extraction_chain` | First letter of each word, sorted alphabetically |
214
+ | `letter_math` | Sum letter values of "BVJCSX" (A=1..Z=26) β†’ `80` |
215
+ | `nested_operations` | ((15 + 7) Γ— 3) - 12 β†’ `54` |
216
+ | `string_interleave` | Interleave "ABC" and "DEF" β†’ `ADBECF` |
217
+ | `caesar` | Decrypt with shift 1–13 |
218
+
219
+ Agentic challenges require multi-step reasoning and working memory β€” smaller models and humans can't solve them under time pressure.
220
+
221
+ Each type has multiple prompt templates (450+) with randomized phrasing. Agentic types use **dynamic prompt assembly** with ~10,000+ structural variations per type, making regex-based solvers impractical even with full source code access.
222
+
223
+ ## Dynamic Challenges (Optional)
224
+
225
+ Use an LLM to generate novel, never-before-seen challenges:
226
+
227
+ ```python
228
+ ac = AgentChallenge(secret="your-secret")
229
+
230
+ # Set an API key (or use OPENAI_API_KEY / ANTHROPIC_API_KEY / GOOGLE_API_KEY env vars)
231
+ ac.set_openai_api_key("sk-...")
232
+
233
+ # Enable dynamic mode
234
+ ac.enable_dynamic_mode() # Auto-detects provider from available keys
235
+ ```
236
+
237
+ Dynamic mode generates a challenge with one LLM call and verifies the answer with another. Falls back to static challenges after 3 failures. Supports OpenAI, Anthropic, and Google Gemini β€” auto-detected from environment variables.
238
+
239
+ ## Challenge Every Time (No Persistent Tokens)
240
+
241
+ By default, agents solve once and get a permanent token. To require a challenge on every request:
242
+
243
+ ```python
244
+ ac = AgentChallenge(
245
+ secret="your-secret",
246
+ persistent=False, # No tokens issued β€” challenge every time
247
+ )
248
+ ```
249
+
250
+ When `persistent=False`:
251
+ - Solving a challenge returns `{ "status": "authenticated" }` with **no token**
252
+ - Passing a saved token returns an error
253
+ - Every request requires solving a new puzzle
254
+
255
+ This is useful for high-security endpoints, rate-limited operations, or when you want proof of LLM capability on every call.
256
+
257
+ ## Agent-Only Mode (Block Humans)
258
+
259
+ Combine a tight time limit with hard difficulty to create endpoints that **only AI agents can access**. A human can't read a caesar cipher, decode it mentally, and type the answer in 10 seconds β€” but an LLM handles it in under 2.
260
+
261
+ ```python
262
+ ac = AgentChallenge(
263
+ secret="your-secret",
264
+ difficulty="agentic", # multi-step chains β€” only top-tier LLMs pass
265
+ ttl=10, # 10 seconds β€” impossible for humans
266
+ persistent=False, # challenge every request
267
+ )
268
+ ```
269
+
270
+ This is useful for:
271
+ - **Agent-to-agent APIs** where human access is unwanted
272
+ - **Internal tooling** that should only be called by AI systems
273
+ - **Preventing manual API abuse** even by authenticated users with the endpoint URL
274
+
275
+ The `ttl` parameter controls how long an agent has to solve the challenge after it's issued. At `difficulty="agentic"` with `ttl=10`, the challenge requires multi-step reasoning (chained transforms, base conversions, letter arithmetic) that no human can solve in time and weaker models fail at consistently.
276
+
277
+ ## Configuration
278
+
279
+ ```python
280
+ ac = AgentChallenge(
281
+ secret="your-secret", # Required β€” HMAC signing key (min 8 chars)
282
+ difficulty="medium", # "easy" | "medium" | "hard" | "agentic" (default: "easy")
283
+ ttl=300, # Challenge expiry in seconds (default: 300)
284
+ types=["rot13", "caesar"], # Restrict to specific challenge types
285
+ persistent=True, # Issue permanent tokens (default: True)
286
+ )
287
+
288
+ # Dynamic mode is enabled separately:
289
+ # ac.set_openai_api_key("sk-...")
290
+ # ac.enable_dynamic_mode()
291
+ ```
292
+
293
+ ## Token Architecture
294
+
295
+ **Stateless. No database. No session store.**
296
+
297
+ Tokens are HMAC-SHA256 signed JSON payloads:
298
+
299
+ ```
300
+ base64url(payload).HMAC-SHA256(payload, secret)
301
+ ```
302
+
303
+ Two token types:
304
+
305
+ | Token | Prefix | Lifetime | Contains |
306
+ |-------|--------|----------|----------|
307
+ | Challenge | `ch_` | 5 minutes | answer hash, expiry, type |
308
+ | Agent | `at_` | Permanent | agent ID, created timestamp |
309
+
310
+ - Tokens can't be forged β€” HMAC verification catches any tampering
311
+ - Challenge tokens are single-use β€” answer hash prevents replay
312
+ - Agent tokens are permanent β€” `verify_token()` validates signature only
313
+ - No database lookups β€” everything is in the token itself
314
+
315
+ ## Lower-Level API
316
+
317
+ If you don't want the `gate()` pattern:
318
+
319
+ ```python
320
+ ac = AgentChallenge(secret="your-secret-key")
321
+
322
+ # Create a challenge
323
+ challenge = ac.create()
324
+ # challenge.prompt β†’ "Reverse the following string: NOHTYP"
325
+ # challenge.token β†’ "eyJpZCI6ImNoXz..."
326
+ # challenge.to_dict() β†’ dict for JSON responses
327
+
328
+ # Verify an answer
329
+ result = ac.verify(token=challenge.token, answer="PYTHON")
330
+ # result.valid β†’ True
331
+ # result.challenge_type β†’ "reverse_string"
332
+
333
+ # Create a persistent agent token directly
334
+ token = ac.create_token("agent-name")
335
+ # token β†’ "eyJpZCI6ImF0Xy..." (base64url-encoded signed payload)
336
+
337
+ # Verify a token
338
+ ac.verify_token(token) # β†’ True
339
+ ```
340
+
341
+ ## Agent Integration
342
+
343
+ Agents don't need an SDK. They just call your endpoint normally:
344
+
345
+ ```python
346
+ import requests
347
+
348
+ def call_api(payload):
349
+ endpoint = "https://your-api.com/api/data"
350
+ token = load_saved_token() # from disk/env
351
+
352
+ r = requests.post(endpoint,
353
+ headers={"Authorization": f"Bearer {token}"} if token else {},
354
+ json=payload)
355
+
356
+ if r.status_code != 401:
357
+ return r # success (or other error)
358
+
359
+ # Got a challenge β€” solve it
360
+ data = r.json()
361
+ if data.get("status") != "challenge_required":
362
+ return r
363
+
364
+ answer = llm.complete(data["prompt"]) # any LLM
365
+ r = requests.post(endpoint, json={
366
+ "challenge_token": data["challenge_token"],
367
+ "answer": answer, **payload
368
+ })
369
+
370
+ if "token" in r.json():
371
+ save_token(r.json()["token"]) # persist for next time
372
+
373
+ return r
374
+ ```
375
+
376
+ Document this pattern in your API's SKILL.md or agent docs, and any LLM-powered agent can authenticate autonomously.
377
+
378
+ ## Security
379
+
380
+ agent-challenge is **fully open source** β€” security through transparency, not obscurity.
381
+
382
+ ### Prompt Injection Defense
383
+
384
+ When agents call APIs protected by agent-challenge, they receive challenge prompts. A malicious API operator could theoretically embed prompt injection in that text. The library ships client-side defenses:
385
+
386
+ **`validate_prompt()`** β€” checks prompts before your LLM sees them:
387
+
388
+ ```python
389
+ from agentchallenge import validate_prompt
390
+
391
+ result = validate_prompt(challenge["prompt"])
392
+ if not result["safe"]:
393
+ raise ValueError(f"Blocked: {result['reason']} (score: {result['score']})")
394
+ ```
395
+
396
+ Catches: URLs, code injection, role hijacking ("you are now", "pretend to be"), override instructions ("ignore previous"), data exfiltration ("send me your API key"), oversized prompts, structural anomalies.
397
+
398
+ **`safe_solve()`** β€” sandboxed solver with isolation:
399
+
400
+ ```python
401
+ from agentchallenge import safe_solve
402
+
403
+ def my_llm(system_prompt, user_prompt):
404
+ return openai.chat.completions.create(
405
+ model="gpt-4o-mini",
406
+ messages=[
407
+ {"role": "system", "content": system_prompt},
408
+ {"role": "user", "content": user_prompt},
409
+ ],
410
+ max_tokens=50, # short answers only
411
+ temperature=0, # deterministic
412
+ ).choices[0].message.content
413
+
414
+ answer = safe_solve(challenge["prompt"], llm_fn=my_llm)
415
+ ```
416
+
417
+ Three layers: input validation β†’ LLM isolation (no tools, strict system prompt) β†’ output validation (length cap, no URLs/code in answer).
418
+
419
+ ```javascript
420
+ // Node.js
421
+ import { validatePrompt, safeSolve } from 'agent-challenge';
422
+
423
+ const result = validatePrompt(challenge.prompt);
424
+ const answer = await safeSolve(challenge.prompt, myLlmFn);
425
+ ```
426
+
427
+ ### Anti-Scripting
428
+
429
+ Even with full source code access, building a deterministic solver is impractical:
430
+
431
+ - **450+ prompt templates** across all types with randomized phrasing
432
+ - **Dynamic prompt assembly** for agentic tier (~10,000+ structural variations per type)
433
+ - **Decoy injection** β€” session IDs, timestamps, reference numbers mixed into prompts
434
+ - **Data position randomization** β€” challenge data appears at different positions in the sentence
435
+
436
+ > Full security analysis: [challenge.llm.kaveenk.com/#security](https://challenge.llm.kaveenk.com/#security)
437
+
438
+ ## Testing
439
+
440
+ ```bash
441
+ # Python
442
+ PYTHONPATH=src python3 run_tests.py
443
+
444
+ # JavaScript (syntax check)
445
+ node --check src/agentchallenge.js
446
+ ```
447
+
448
+ ## Live Demo
449
+
450
+ Try it interactively at **[challenge.llm.kaveenk.com](https://challenge.llm.kaveenk.com)**
451
+
452
+ ## Used By
453
+
454
+ - **[SnapService](https://snap.llm.kaveenk.com)** β€” Screenshot-as-a-Service API for AI agents
455
+
456
+ ## License
457
+
458
+ [MIT](LICENSE)