pentesting 0.73.3 → 0.73.4
- package/README.md +121 -0
- package/dist/{agent-tool-JEFUBDZE.js → agent-tool-MP274HWD.js} +3 -3
- package/dist/{chunk-BKWCGMSV.js → chunk-3KWJPPYB.js} +46 -11
- package/dist/{chunk-UB7RW6LM.js → chunk-7E2VUIFU.js} +194 -63
- package/dist/chunk-I52SWXYV.js +1122 -0
- package/dist/main.js +1635 -1005
- package/dist/{persistence-2WKQHGOL.js → persistence-BNVN3WW6.js} +2 -2
- package/dist/{process-registry-QIW7ZIUT.js → process-registry-BI7BKPHN.js} +1 -1
- package/package.json +3 -4
- package/dist/chunk-GLO6TOJN.js +0 -333
package/README.md
CHANGED
@@ -120,3 +120,124 @@ we don't stop until the flag is captured.
 <br/>
 
 </div>
+
+---
+
+## Research References
+
+This section collects representative papers matched to the design themes reflected in `pentesting`.
+
+It is an inference-based reconstruction from topic overlap, not a verbatim personal reading log.
+
+### Mapping
+
+- Offensive security agent papers inform the autonomous pentest workflow.
+- Planner-executor and heterogeneous collaboration papers inform task decomposition and coordination.
+- Multi-agent orchestration papers inform role separation, delegation, and control topology.
+- Benchmark and evaluation papers inform capability framing and validation strategy.
+
+### Offensive Security Agents
+
+1. [PentestGPT: Evaluating and Harnessing Large Language Models for Automated Penetration Testing](https://www.usenix.org/conference/usenixsecurity24/presentation/deng)
+   USENIX Security 2024
+   Relevance: autonomous pentest loop and operator-assist workflow.
+
+2. [D-CIPHER: Dynamic Collaborative Intelligent Agents with Planning and Heterogeneous Execution for Enhanced Reasoning in Offensive Security](https://arxiv.org/abs/2502.10931)
+   arXiv 2025
+   Relevance: collaborative offensive agents, planning, and heterogeneous execution roles.
+
+3. [Towards Automated Software Security Testing: Augmenting Penetration Testing through LLMs](https://conf.researchr.org/room/ssbse-2023/fse-2023-venue-golden-gate-c1)
+   ESEC/FSE 2023
+   Relevance: LLM-augmented penetration testing as a software engineering workflow.
+
+4. [LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks](https://arxiv.org/abs/2310.11409)
+   arXiv 2023
+   Relevance: offensive autonomy in post-exploitation and privilege escalation.
+
+5. [Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks](https://arxiv.org/abs/2502.04227)
+   arXiv 2025
+   Relevance: enterprise network movement and AD-focused agent behavior.
+
+6. [LLM Agents can Autonomously Hack Websites](https://arxiv.org/abs/2402.06664)
+   arXiv 2024
+   Relevance: web exploitation agents and end-to-end task execution.
+
+7. [LLM Agents can Autonomously Exploit One-day Vulnerabilities](https://arxiv.org/abs/2404.08144)
+   arXiv 2024
+   Relevance: exploit execution against known-vulnerability targets.
+
+8. [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](https://arxiv.org/abs/2406.01637)
+   arXiv 2024
+   Relevance: multi-agent offensive workflows for harder vulnerability exploitation.
+
+9. [AutoPentester: An LLM Agent-based Framework for Automated Pentesting](https://arxiv.org/abs/2510.05605)
+   arXiv 2025
+   Relevance: explicit automated pentesting framework alignment.
+
+### Benchmarks and Cyber Evaluation
+
+10. [AutoPenBench: A Vulnerability Testing Benchmark for Generative Agents](https://aclanthology.org/2025.emnlp-industry.114/)
+   EMNLP Industry 2025
+   Relevance: benchmark framing for generative vulnerability-testing agents.
+
+11. [Training Language Model Agents to Find Vulnerabilities with CTF-Dojo](https://arxiv.org/abs/2508.18370)
+   arXiv 2025
+   Relevance: CTF-grounded vulnerability discovery and training/eval setup.
+
+12. [Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models](https://arxiv.org/abs/2408.08926)
+   arXiv 2024
+   Relevance: evaluation of cybersecurity capability and misuse risk.
+
+13. [CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale](https://arxiv.org/abs/2506.02548)
+   arXiv 2025
+   Relevance: large-scale realistic vulnerability evaluation.
+
+14. [CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models](https://arxiv.org/abs/2404.13161)
+   arXiv 2024
+   Relevance: broad cyber eval framing and safety measurement.
+
+15. [When LLMs Meet Cybersecurity: A Systematic Literature Review](https://arxiv.org/abs/2405.03644)
+   arXiv 2024
+   Relevance: survey grounding across offensive and defensive use cases.
+
+16. [Large Language Models in Cybersecurity: State-of-the-Art](https://arxiv.org/abs/2402.00891)
+   arXiv 2024
+   Relevance: landscape overview for positioning the project.
+
+### Multi-Agent Collaboration and Orchestration
+
+17. [A Survey on Large Language Model based Autonomous Agents](https://arxiv.org/abs/2308.11432)
+   arXiv 2023
+   Relevance: agent architecture baseline and terminology.
+
+18. [Large Language Model based Multi-Agents: A Survey of Progress and Challenges](https://arxiv.org/abs/2402.01680)
+   arXiv 2024
+   Relevance: multi-agent coordination patterns and failure modes.
+
+19. [AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation](https://arxiv.org/abs/2308.08155)
+   arXiv 2023
+   Relevance: role-based dialogue and tool-using multi-agent orchestration.
+
+20. [MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework](https://arxiv.org/abs/2308.00352)
+   arXiv 2023
+   Relevance: structured role decomposition and pipeline-style collaboration.
+
+21. [ChatDev: Communicative Agents for Software Development](https://aclanthology.org/2024.acl-long.810/)
+   ACL 2024
+   Relevance: communication protocol and software-task role separation.
+
+22. [CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society](https://arxiv.org/abs/2303.17760)
+   arXiv 2023
+   Relevance: agent role prompting and cooperative interaction patterns.
+
+23. [AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors](https://arxiv.org/abs/2308.10848)
+   arXiv 2023
+   Relevance: multi-agent environment framing and emergent collaboration.
+
+24. [Scaling Large-Language-Model-based Multi-Agent Collaboration](https://arxiv.org/abs/2406.07155)
+   arXiv 2024
+   Relevance: scale behavior and coordination bottlenecks.
+
+25. [Multi-Agent Collaboration via Evolving Orchestration](https://arxiv.org/abs/2505.19591)
+   arXiv 2025
+   Relevance: orchestration policy evolution and adaptive coordination.
@@ -5,7 +5,7 @@ import {
   createContextExtractor,
   getLLMClient,
   getShellSupervisorLifecycleSnapshot
-} from "./chunk-
+} from "./chunk-3KWJPPYB.js";
 import {
   AGENT_ROLES,
   EVENT_TYPES,
@@ -13,14 +13,14 @@ import {
   TOOL_NAMES,
   getProcessOutput,
   listBackgroundProcesses
-} from "./chunk-
+} from "./chunk-7E2VUIFU.js";
 import {
   DETECTION_PATTERNS,
   PROCESS_EVENTS,
   PROCESS_ROLES,
   getActiveProcessSummary,
   getProcessEventLog
-} from "./chunk-
+} from "./chunk-I52SWXYV.js";
 
 // src/engine/agent-tool/completion-box.ts
 function createCompletionBox() {
@@ -53,6 +53,9 @@ import {
   getTorBrowserArgs,
   getUsedPorts,
   listBackgroundProcesses,
+  llmNodeCooldownPolicy,
+  llmNodeOutputParsing,
+  llmNodeSystemPrompt,
   promoteToShell,
   readFileContent,
   runCommand,
@@ -61,7 +64,7 @@ import {
   startBackgroundProcess,
   stopBackgroundProcess,
   writeFileContent
-} from "./chunk-
+} from "./chunk-7E2VUIFU.js";
 import {
   DETECTION_PATTERNS,
   HEALTH_CONFIG,
@@ -74,11 +77,8 @@ import {
   SYSTEM_LIMITS,
   __require,
   getProcessEventLog,
-  llmNodeCooldownPolicy,
-  llmNodeOutputParsing,
-  llmNodeSystemPrompt,
   logEvent
-} from "./chunk-
+} from "./chunk-I52SWXYV.js";
 
 // src/shared/utils/config/env.ts
 var ENV_KEYS = {
@@ -2887,7 +2887,7 @@ var CoreAgent = class _CoreAgent {
       );
       return { output: "", toolsExecuted, isCompleted: false };
   }
-  // ─── AgentController Methods for Dynamic
+  // ─── AgentController Methods for Dynamic Runtime Pipeline ─────────────────
   async runLLMInference(ctx, systemPrompt) {
     const iteration = ctx.memory.iteration || 0;
     const progress = ctx.memory.progress;
@@ -3438,14 +3438,28 @@ var shellTools = [
     ["process_id"],
     async (params) => {
       const processId = params.process_id;
+      const profile = params.profile;
       const result = await handleInteractAction(
         processId,
-        getShellCheckCommand(
+        getShellCheckCommand(profile),
         params.wait_ms
       );
-      if (result.success &&
+      if (result.success && profile === "stability") {
+        logEvent(processId, PROCESS_EVENTS.SHELL_STABILITY_CHECKED, "Shell stability probe executed by shell_check");
+      }
+      if (result.success && profile === "stability" && outputLooksStabilized(result.output)) {
         logEvent(processId, PROCESS_EVENTS.SHELL_STABILIZED, "Shell stability confirmed by shell_check");
       }
+      if (result.success && profile === "stability" && !outputLooksStabilized(result.output)) {
+        logEvent(
+          processId,
+          PROCESS_EVENTS.SHELL_STABILITY_INCOMPLETE,
+          "Shell stability probe did not confirm a stable PTY"
+        );
+      }
+      if (result.success && profile === "post") {
+        logEvent(processId, PROCESS_EVENTS.POST_EXPLOITATION_ACTIVITY, "Post-exploitation probe executed by shell_check");
+      }
       return result;
     }
   ),
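The event logging added to `shell_check` in the hunk above can be read as a small decision table over `profile` and the probe output. The following is a minimal sketch of that table, not the package's implementation: the string values in `PROCESS_EVENTS`, the body of `outputLooksStabilized`, and the helper name `eventsForShellCheck` are all assumptions for illustration, since the diff shows only how they are used.

```javascript
// Hypothetical event names; the real PROCESS_EVENTS values are not shown in this diff.
const PROCESS_EVENTS = {
  SHELL_STABILITY_CHECKED: "shell_stability_checked",
  SHELL_STABILIZED: "shell_stabilized",
  SHELL_STABILITY_INCOMPLETE: "shell_stability_incomplete",
  POST_EXPLOITATION_ACTIVITY: "post_exploitation_activity",
};

// Placeholder heuristic (assumption): output that names a TTY device counts as stable.
function outputLooksStabilized(output) {
  return /\/dev\/(pts\/\d+|tty\w*)/.test(output);
}

// Returns, in order, the events the handler would log for one probe result.
function eventsForShellCheck(profile, result) {
  if (!result.success) return [];
  if (profile === "stability") {
    const verdict = outputLooksStabilized(result.output)
      ? PROCESS_EVENTS.SHELL_STABILIZED
      : PROCESS_EVENTS.SHELL_STABILITY_INCOMPLETE;
    // Every successful stability probe is recorded, followed by its verdict.
    return [PROCESS_EVENTS.SHELL_STABILITY_CHECKED, verdict];
  }
  if (profile === "post") return [PROCESS_EVENTS.POST_EXPLOITATION_ACTIVITY];
  return [];
}

console.log(eventsForShellCheck("stability", { success: true, output: "/dev/pts/0" }));
console.log(eventsForShellCheck("stability", { success: true, output: "not a tty" }));
console.log(eventsForShellCheck("post", { success: true, output: "uid=0(root)" }));
```

Under these assumptions, a successful stability probe always yields two events (the check itself plus exactly one verdict), while a failed probe or an unrecognized profile yields none.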
@@ -3498,6 +3512,10 @@ var offensiveBoundedTools = [
     TOOL_NAMES.EXPLOIT_FOOTHOLD_CHECK,
     "Run a bounded foothold confirmation probe after an exploit chain appears to land access."
   ),
+  createBoundedCommandTool(
+    TOOL_NAMES.EXPLOIT_VECTOR_CHECK,
+    "Run a bounded exploit vector reachability or service confirmation probe before changing vectors."
+  ),
   createBoundedCommandTool(
     TOOL_NAMES.PWN_CRASH_REPRO,
     "Run a bounded pwn crash reproduction command from the preserved crash state."
@@ -11879,7 +11897,7 @@ After completion: record key loot/findings from the sub-agent output to canonica
       workerType: params["worker_type"],
       resumeTaskId: params["resume_task_id"]
     };
-    const { AgentTool } = await import("./agent-tool-
+    const { AgentTool } = await import("./agent-tool-MP274HWD.js");
     const executor = new AgentTool(state, events, scopeGuard, approvalGate);
     const result = await executor.execute(input);
     state.recordDelegatedTask({
@@ -12027,6 +12045,9 @@ function hasEvent(processId, eventName) {
     (event) => event.processId === processId && event.event === eventName
   );
 }
+function getLatestEventTimestamp(processId, eventName) {
+  return getProcessEventLog().filter((event) => event.processId === processId && event.event === eventName).reduce((latest, event) => Math.max(latest, event.timestamp), 0);
+}
 function isPtyUpgradeCommand(detail) {
   return detail.includes("pty.spawn(") || detail.includes("import pty; pty.spawn(") || detail.includes("script -qc") || detail.includes("script -q /dev/null -c /bin/bash") || detail.includes("script /dev/null -c bash") || detail.includes("stty raw -echo") || detail.includes("export term=") || detail.includes("export shell=") || detail.includes("stty rows") || detail.includes("stty columns") || detail.includes("tty") || detail.includes("/usr/bin/expect -c") || detail.includes('exec "/bin/bash"');
 }
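The new `getLatestEventTimestamp` helper filters the event log to one process and event name, then folds the matches down to the largest timestamp, with `0` standing in for "never seen". A minimal sketch of that behavior follows, assuming an in-memory array in place of `getProcessEventLog()`; the entry shape (`processId`, `event`, `timestamp`) is inferred from the diff, while the event-name strings and the extra `log` parameter are hypothetical.

```javascript
// In-memory stand-in for getProcessEventLog(); event-name strings are hypothetical.
const eventLog = [
  { processId: "p1", event: "shell_stabilized", timestamp: 1000 },
  { processId: "p1", event: "shell_stability_incomplete", timestamp: 2500 },
  { processId: "p1", event: "shell_stabilized", timestamp: 1800 },
  { processId: "p2", event: "shell_stabilized", timestamp: 9000 },
];

// Same filter-then-reduce shape as the added helper, with the log passed in explicitly.
function getLatestEventTimestamp(processId, eventName, log = eventLog) {
  return log
    .filter((e) => e.processId === processId && e.event === eventName)
    .reduce((latest, e) => Math.max(latest, e.timestamp), 0);
}

const lastStabilizedAt = getLatestEventTimestamp("p1", "shell_stabilized"); // 1800
const lastIncompleteAt = getLatestEventTimestamp("p1", "shell_stability_incomplete"); // 2500
const neverSeen = getLatestEventTimestamp("p1", "post_exploitation_activity"); // 0

console.log({ lastStabilizedAt, lastIncompleteAt, neverSeen });
```

Because a missing event maps to `0`, a strict comparison such as `lastIncompleteAt > lastStabilizedAt` is false when neither event has been logged, so the absence of both does not count as a regression.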
@@ -12039,6 +12060,11 @@ function isShellStabilized(stdout, commandDetails) {
 function isPostExploitationCommand(detail) {
   return detail.includes("sudo -l") || detail.includes("ip a") || detail.includes("ip route") || detail.includes("ps aux") || detail.includes("ss -tlnp") || detail.includes("netstat -tlnp") || detail.includes("env | grep") || detail.includes("find / -perm -4000") || detail.includes("getcap -r /") || detail.includes("cat /etc/os-release") || detail.includes("uname -a") || detail.includes("whoami && hostname");
 }
+function hasRecentEvent(processId, eventName) {
+  return getProcessEventLog().some(
+    (event) => event.processId === processId && event.event === eventName
+  );
+}
 function getShellSupervisorLifecycleSnapshot() {
   const processes = listBackgroundProcesses().filter(
     (process2) => process2.isRunning && (process2.role === PROCESS_ROLES.ACTIVE_SHELL || process2.role === PROCESS_ROLES.LISTENER)
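As added in the hunk above, `hasRecentEvent` matches on `processId` and `event` only, the same predicate visible in the `hasEvent` context lines; nothing in its body restricts the match to recent entries. If a recency window were intended, one possible shape is sketched below. This is an illustration only: the function name `hasEventWithin`, the `windowMs` and `now` parameters, and the explicit `log` argument are assumptions and do not appear in the package.

```javascript
// Hypothetical time-windowed variant: true only if a matching event was
// logged no more than windowMs milliseconds before `now`.
function hasEventWithin(log, processId, eventName, windowMs, now = Date.now()) {
  return log.some(
    (e) =>
      e.processId === processId &&
      e.event === eventName &&
      now - e.timestamp <= windowMs
  );
}

// Example log entries; the shape is inferred from the diff, the values are made up.
const sampleLog = [
  { processId: "p1", event: "post_exploitation_activity", timestamp: 10000 },
  { processId: "p1", event: "shell_stability_checked", timestamp: 58000 },
];

// With now = 60000 and a 5-second window, only the stability check qualifies.
console.log(hasEventWithin(sampleLog, "p1", "shell_stability_checked", 5000, 60000));
console.log(hasEventWithin(sampleLog, "p1", "post_exploitation_activity", 5000, 60000));
```

Passing `now` as a parameter keeps the check deterministic in tests; whether a window is appropriate at all depends on how long-lived the shell sessions are meant to be.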
@@ -12047,7 +12073,9 @@ function getShellSupervisorLifecycleSnapshot() {
   if (activeShell) {
     const output = getProcessOutput(activeShell.id);
     const commandDetails = getRecentCommandDetails(activeShell.id);
-
+    const lastStabilizedAt = getLatestEventTimestamp(activeShell.id, PROCESS_EVENTS.SHELL_STABILIZED);
+    const lastIncompleteAt = getLatestEventTimestamp(activeShell.id, PROCESS_EVENTS.SHELL_STABILITY_INCOMPLETE);
+    if (hasRecentEvent(activeShell.id, PROCESS_EVENTS.POST_EXPLOITATION_ACTIVITY) || commandDetails.some(isPostExploitationCommand)) {
       return {
         phase: "post_exploitation_active",
         activeShellId: activeShell.id,
@@ -12055,13 +12083,20 @@ function getShellSupervisorLifecycleSnapshot() {
       };
     }
     if (hasEvent(activeShell.id, PROCESS_EVENTS.SHELL_STABILIZED) || output && isShellStabilized(output.stdout, commandDetails)) {
+      if (lastIncompleteAt > lastStabilizedAt) {
+        return {
+          phase: "active_shell_stabilizing",
+          activeShellId: activeShell.id,
+          recommendation: `Active shell ${activeShell.id} lost stable PTY confirmation after the last probe. Re-run shell upgrade and verify TERM/TTY quality again before broad enumeration.`
+        };
+      }
       return {
         phase: "active_shell_stabilized",
         activeShellId: activeShell.id,
         recommendation: `Active shell ${activeShell.id} appears stabilized. Reuse it for controlled enumeration and follow-up operations.`
       };
     }
-    if (hasEvent(activeShell.id, PROCESS_EVENTS.SHELL_UPGRADE_ATTEMPTED) || commandDetails.some(isPtyUpgradeCommand)) {
+    if (hasEvent(activeShell.id, PROCESS_EVENTS.SHELL_UPGRADE_ATTEMPTED) || hasRecentEvent(activeShell.id, PROCESS_EVENTS.SHELL_STABILITY_INCOMPLETE) || hasRecentEvent(activeShell.id, PROCESS_EVENTS.SHELL_STABILITY_CHECKED) || commandDetails.some(isPtyUpgradeCommand)) {
       return {
         phase: "active_shell_stabilizing",
         activeShellId: activeShell.id,
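Taken together, the changes to `getShellSupervisorLifecycleSnapshot` order the checks for an active shell as: post-exploitation first, then stabilized (downgraded to stabilizing when an incomplete probe is newer than the last confirmation), then stabilizing. The sketch below restates that ordering as a pure function so the precedence is easier to see. It is a reading aid under stated assumptions: the function name `selectShellPhase` and its input field names are hypothetical stand-ins for the event-log and command-history lookups in the real function, and the fall-through value is a placeholder because later branches are outside this excerpt.

```javascript
// Pure-function restatement of the phase ordering shown in the diff.
// All input field names are hypothetical summaries of the real lookups.
function selectShellPhase({
  postExploitationSeen = false, // POST_EXPLOITATION_ACTIVITY event or matching command
  stabilizedSeen = false,       // SHELL_STABILIZED event or stabilized-looking output
  upgradeAttempted = false,     // SHELL_UPGRADE_ATTEMPTED event or PTY upgrade command
  stabilityChecked = false,     // SHELL_STABILITY_CHECKED event
  incompleteSeen = false,       // SHELL_STABILITY_INCOMPLETE event
  lastStabilizedAt = 0,         // 0 means the event was never logged
  lastIncompleteAt = 0,
} = {}) {
  if (postExploitationSeen) return "post_exploitation_active";
  if (stabilizedSeen) {
    // A newer incomplete probe overrides an older stabilization confirmation.
    return lastIncompleteAt > lastStabilizedAt
      ? "active_shell_stabilizing"
      : "active_shell_stabilized";
  }
  if (upgradeAttempted || incompleteSeen || stabilityChecked) {
    return "active_shell_stabilizing";
  }
  return null; // Placeholder: the remaining branches are not shown in this excerpt.
}

console.log(selectShellPhase({ stabilizedSeen: true, lastStabilizedAt: 1000, lastIncompleteAt: 2000 }));
console.log(selectShellPhase({ stabilizedSeen: true, lastStabilizedAt: 2000, lastIncompleteAt: 1000 }));
```

One consequence of this ordering is that post-exploitation activity takes precedence over any stability regression, so a shell that has already been used for enumeration is not reported as stabilizing even if a later probe fails.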