pentesting 0.72.12 → 0.73.2

@@ -3,8 +3,8 @@ import {
  clearWorkspace,
  loadState,
  saveState
- } from "./chunk-OUS2TZXI.js";
- import "./chunk-GHJPYI4S.js";
+ } from "./chunk-KBJPZDIL.js";
+ import "./chunk-YFDJI3GO.js";
  export {
  StateSerializer,
  clearWorkspace,
@@ -1,6 +1,7 @@
  import {
  clearAllProcesses,
  deleteProcess,
+ getActiveProcessSummary,
  getAllProcessIds,
  getAllProcesses,
  getBackgroundProcessesMap,
@@ -11,10 +12,11 @@ import {
  hasProcess,
  logEvent,
  setProcess
- } from "./chunk-GHJPYI4S.js";
+ } from "./chunk-YFDJI3GO.js";
  export {
  clearAllProcesses,
  deleteProcess,
+ getActiveProcessSummary,
  getAllProcessIds,
  getAllProcesses,
  getBackgroundProcessesMap,
@@ -344,6 +344,7 @@ If auto-install fails, install manually: `run_cmd("apt update && apt install -y
  | `write_file` + `run_cmd` | Build and execute custom scripts in any language |
  | `bg_process` | Shell management, listeners, servers, sniffers |
  | `add_*/update_*` | State management — your long-term memory |
+ | `run_task` | **Delegate complex multi-step operations to a sub-agent** (see Task Delegation rules in main-agent.md) |
 
  **No limits on combining tools.** Tool missing → install or write equivalent.
 
@@ -64,6 +64,13 @@ RULES:
  - Write as much detail as needed — do NOT artificially shorten. Every detail matters for strategy.
  - FILE TYPE: If the output contains HTML tags/CSS in a file expected to be binary, note "File is HTML, not binary data" in Key Findings.
 
+ RUN_TASK OUTPUT HANDLING:
+ If tool.name is run_task, treat the structured sections as the primary source of meaning:
+ - Parse `[Status]` line: success / partial / failed
+ - Extract actionable items from `[Summary]`, `[Findings]`, `[Loot]`, `[Sessions]`, `[Next]`
+ - Do NOT complain about missing raw command output when the delegated result is already summarized
+ - The delegated agent has already recorded canonical state; your job is to assess the overall outcome
+
  ## {REFLECTION}
  - What this output tells us: [1-line assessment]
  - Recommended next action: [1-2 specific follow-up actions]
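
The RUN_TASK OUTPUT HANDLING rules added in this hunk could be mirrored in code roughly as follows. This is a minimal TypeScript sketch, not the package's implementation: the diff only names the section labels, so the layout assumed here (each `[Label]` opens a line, with content on the same or following lines) and the names `parseRunTaskOutput`, `RunTaskResult`, and `SECTION_LABELS` are hypothetical.

```typescript
// Hypothetical sketch: the diff does not show the real output layout.
// Assumption: every section starts on a line that begins with "[Label]".
type RunTaskStatus = "success" | "partial" | "failed" | "unknown";

interface RunTaskResult {
  status: RunTaskStatus;
  sections: Record<string, string>;
}

// Labels named in the prompt text of the diff.
const SECTION_LABELS: string[] = ["Status", "Summary", "Findings", "Loot", "Sessions", "Next"];

function parseRunTaskOutput(raw: string): RunTaskResult {
  const sections: Record<string, string> = {};
  let current: string | null = null;

  for (const line of raw.split(/\r?\n/)) {
    const text = line.trim();
    const match = /^\[([A-Za-z]+)\]\s*(.*)$/.exec(text);
    if (match !== null && SECTION_LABELS.indexOf(match[1]) !== -1) {
      // A known label opens a new section; the rest of the line is its first content.
      current = match[1];
      sections[current] = match[2];
    } else if (current !== null && text !== "") {
      // Continuation line: append to the section currently being read.
      sections[current] = sections[current] === "" ? text : sections[current] + "\n" + text;
    }
  }

  // The first word of [Status] decides the outcome; anything else is "unknown".
  const word = (sections["Status"] || "").toLowerCase().split(/\s+/)[0];
  let status: RunTaskStatus = "unknown";
  if (word === "success") status = "success";
  else if (word === "partial") status = "partial";
  else if (word === "failed") status = "failed";

  return { status, sections };
}
```

Falling back to `"unknown"` rather than throwing keeps a malformed sub-agent report from aborting the analysis step; whether the real analyzer behaves this way is not shown in the diff.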
@@ -1,4 +1,4 @@
- # Strategic Orchestrator — Autonomous Operations Thinking Layer
+ # Main Agent — Autonomous Execution Layer
 
  ## Identity
 
@@ -94,6 +94,36 @@ Failure is information. Extract it and adapt:
  5. Still failing → switch to different vector or target entirely
  6. Record what was tried to prevent repetition
 
+ ## Task Delegation — run_task
+
+ **run_task spawns an autonomous sub-agent loop.** Use it when the task requires
+ multiple sequential decisions that depend on each other's output.
+
+ ### MUST use `run_task` when:
+ - Getting a reverse shell (listener setup → exploit → stabilise → post-exploit)
+ - Exploit development that requires 3+ edit/run cycles (SQLi, SSTI, buffer overflow)
+ - Credential chain: dump → crack / spray → pivot → new shell
+ - Any attack that branches: if-shell-then-escalate, if-cred-then-pivot
+ - Background brute-force while the main thread continues attacking elsewhere
+
+ ### Do NOT use `run_task` for:
+ - Single tool calls: `web_search`, `parse_nmap`, `run_cmd`, `add_finding`
+ - Simple one-off reconnaissance
+ - State updates (`add_finding`, `add_loot`, `update_mission`)
+
+ ### How to call:
+ ```
+ run_task({
+ task: "WHAT to achieve — the goal, not the method",
+ target: "IP:port or URL (optional)",
+ context: "Short context the sub-agent needs (optional)"
+ })
+ ```
+
+ **The sub-agent decides HOW. You decide WHAT.**
+ Results come back as `[Status]`, `[Summary]`, `[Findings]`, `[Loot]`.
+ After run_task completes: record key findings to canonical state if needed.
+
  ## Parallel Operations
 
  Background everything that takes >2 min or can run alongside foreground work:
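
The call shape in the "How to call" snippet added in this hunk can be written down as a type. What follows is a minimal TypeScript sketch under stated assumptions: the interface name `RunTaskArgs` and the helper `buildRunTaskArgs` are hypothetical, and only the three field names (`task`, `target`, `context`) and the "(optional)" markers come from the diff. The real tool schema is not shown here and may differ.

```typescript
// Hypothetical types: only the field names and the "(optional)" markers are
// taken from the prompt text in the diff.
interface RunTaskArgs {
  task: string;      // WHAT to achieve: the goal, not the method
  target?: string;   // IP:port or URL
  context?: string;  // short context the sub-agent needs
}

function buildRunTaskArgs(task: string, target?: string, context?: string): RunTaskArgs {
  const goal = task.trim();
  if (goal === "") {
    throw new Error("run_task needs a non-empty task describing the goal");
  }
  const args: RunTaskArgs = { task: goal };
  // Leave optional fields out entirely rather than sending empty strings.
  if (target !== undefined && target.trim() !== "") args.target = target.trim();
  if (context !== undefined && context.trim() !== "") args.context = context.trim();
  return args;
}

// Illustrative values only; 192.0.2.10 is a documentation-range address.
const exampleArgs = buildRunTaskArgs(
  "Map the exposed services on the authorized lab host",
  "192.0.2.10:8080",
  "Authorized lab engagement; scope is limited to this host"
);
```

Keeping `task` as the only required field matches the rule stated in the prompt text: the caller decides what to achieve, and the sub-agent decides how.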
@@ -409,3 +409,30 @@ CRITICAL RULES:
  ├─ If recon yields nothing after 10 min → still transition to vuln_analysis and probe
  └─ If stuck in a phase > 5 turns with no progress → evaluate if transition is needed
  ```
+
+ ### Rule 12: TASK DELEGATION — run_task
+ ```
+ When the next action requires a branching or multi-step chain, explicitly frame it as a delegated objective suitable for run_task.
+
+ INDICATORS FOR DELEGATION:
+ ├─ Task requires 3+ sequential tool calls with decision points
+ ├─ Execution path branches based on intermediate results
+ ├─ Complex exploit chain: SQLi → shell → privesc → pivot
+ ├─ Reverse shell acquisition with stabilization
+ ├─ Exploit development with edit/run/debug cycles
+ └─ Pwn exploit development and execution
+
+ DELEGATION FORMAT:
+ "Delegate via run_task: {objective}. Context: {what agent should know}. Goal: {success criteria}."
+
+ Examples:
+ ├─ "Delegate via run_task: achieve reverse shell on 10.10.10.5:4444 and stabilize it for post-exploitation."
+ ├─ "Delegate via run_task: exploit the confirmed SQLi on /login to extract credentials and obtain shell access."
+ └─ "Delegate via run_task: develop and execute a pwn exploit for the 64-bit ELF binary."
+
+ DO NOT DELEGATE:
+ ├─ Single tool calls (web_search, parse_nmap, run_cmd)
+ ├─ Simple reconnaissance tasks
+ ├─ Direct state updates (add_finding, add_loot)
+ └─ Tasks requiring user interaction (ask_user)
+ ```
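
One possible reading of Rule 12 as a decision function is sketched below in TypeScript. The rule in the diff is prompt text for a language model, not code, so everything here is an assumption for illustration: the `NextAction` shape, its field names, and the functions `shouldDelegate` and `formatDelegation` are hypothetical. Only the "3+ sequential tool calls with decision points" and branching indicators, the DO NOT DELEGATE exclusions, and the DELEGATION FORMAT string are taken from the rule.

```typescript
// Hypothetical model of Rule 12; field names are illustrative only.
interface NextAction {
  sequentialToolCalls: number;   // expected tool calls in the chain
  hasDecisionPoints: boolean;    // later steps depend on earlier output
  branchesOnResults: boolean;    // execution path forks on intermediate results
  isStateUpdate: boolean;        // e.g. add_finding, add_loot
  needsUserInteraction: boolean; // e.g. ask_user
}

function shouldDelegate(action: NextAction): boolean {
  // Exclusions from "DO NOT DELEGATE" take priority over any indicator.
  if (action.isStateUpdate || action.needsUserInteraction) return false;
  // Single tool calls are never delegated.
  if (action.sequentialToolCalls <= 1) return false;
  // Indicators: "3+ sequential tool calls with decision points" or a branching path.
  const longChain = action.sequentialToolCalls >= 3 && action.hasDecisionPoints;
  return longChain || action.branchesOnResults;
}

// Fills the DELEGATION FORMAT template quoted in the rule.
function formatDelegation(objective: string, context: string, goal: string): string {
  return "Delegate via run_task: " + objective + ". Context: " + context + ". Goal: " + goal + ".";
}
```

Checking the exclusions first reflects the ordering a reader would likely infer from the rule, where a state update or a user prompt is never delegated regardless of how many steps surround it.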
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "pentesting",
- "version": "0.72.12",
+ "version": "0.73.2",
  "description": "Autonomous Penetration Testing AI Agent",
  "type": "module",
  "main": "dist/main.js",
@@ -21,6 +21,11 @@
  "test": "mkdir -p .vitest && TMPDIR=.vitest npx vitest run && rm -rf .vitest .pentesting",
  "test:watch": "vitest",
  "lint": "tsc --noEmit",
+ "verify": "npm run test && npm run build",
+ "verify:docker": "npm run docker:local && bash test.sh",
+ "check": "npm run verify && npm run verify:docker",
+ "check:ci": "npm run verify && npm run verify:docker",
+ "check:clean": "docker system prune -af --volumes && npm run check:ci",
  "prepublishOnly": "npm run build",
  "docker:build": "docker buildx build -f Dockerfile.base --platform linux/amd64,linux/arm64 -t agnusdei1207/pentesting-base:latest --push .",
  "release": "npm run release:patch && npm run release:docker",
@@ -29,8 +34,7 @@
  "release:minor": "npm version minor && npm run build && npm run publish:token",
  "release:major": "npm version major && npm run build && npm run publish:token",
  "docker:local": "docker build -f Dockerfile -t agnusdei1207/pentesting:latest .",
- "release:docker": "docker buildx build --no-cache -f Dockerfile --platform linux/amd64,linux/arm64 -t agnusdei1207/pentesting:latest --push .",
- "check": "docker system prune -af --volumes && npm run test && npm run build && npm run docker:local && bash test.sh"
+ "release:docker": "docker buildx build --no-cache -f Dockerfile --platform linux/amd64,linux/arm64 -t agnusdei1207/pentesting:latest --push ."
  },
  "repository": {
  "type": "git",
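
The script changes in the two `package.json` hunks are easier to compare once the nested `npm run` calls are flattened. The TypeScript sketch below is a reading aid written for this diff, not part of the package: the helper `expandScript` is hypothetical, the script bodies are copied from the hunks, and scripts left out of the map (`test`, `build`, `docker:local`) are deliberately kept as unexpanded `npm run` steps so the result can be compared with the removed `check` line.

```typescript
// Script bodies copied from the package.json hunks in this diff.
const scripts: Record<string, string> = {
  "verify": "npm run test && npm run build",
  "verify:docker": "npm run docker:local && bash test.sh",
  "check": "npm run verify && npm run verify:docker",
  "check:ci": "npm run verify && npm run verify:docker",
  "check:clean": "docker system prune -af --volumes && npm run check:ci",
};

// The "check" script as it appeared in 0.72.12 (removed in this diff).
const OLD_CHECK =
  "docker system prune -af --volumes && npm run test && npm run build && npm run docker:local && bash test.sh";

// Recursively expand "npm run <name>" steps. Names missing from `scripts`
// are kept as leaf steps. Throws if a script ends up calling itself.
function expandScript(name: string, seen: string[] = []): string[] {
  if (seen.indexOf(name) !== -1) throw new Error("cycle at " + name);
  if (!Object.prototype.hasOwnProperty.call(scripts, name)) return ["npm run " + name];
  const steps: string[] = [];
  for (const part of scripts[name].split("&&")) {
    const cmd = part.trim();
    const m = /^npm run (\S+)$/.exec(cmd);
    if (m !== null) {
      for (const s of expandScript(m[1], seen.concat([name]))) steps.push(s);
    } else {
      steps.push(cmd);
    }
  }
  return steps;
}

console.log(expandScript("check:clean").join(" && ") === OLD_CHECK); // true
console.log(expandScript("check").join(" && "));
```

Read this way, `check:clean` in 0.73.2 runs the same steps in the same order as the old `check`, while the new `check` and `check:ci` run the same steps without the `docker system prune` at the front.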