@blamejs/exceptd-skills 0.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (136)
  1. package/AGENTS.md +232 -0
  2. package/ARCHITECTURE.md +267 -0
  3. package/CHANGELOG.md +616 -0
  4. package/CONTEXT.md +203 -0
  5. package/LICENSE +200 -0
  6. package/NOTICE +82 -0
  7. package/README.md +307 -0
  8. package/SECURITY.md +73 -0
  9. package/agents/README.md +81 -0
  10. package/agents/report-generator.md +156 -0
  11. package/agents/skill-updater.md +102 -0
  12. package/agents/source-validator.md +119 -0
  13. package/agents/threat-researcher.md +149 -0
  14. package/bin/exceptd.js +183 -0
  15. package/data/_indexes/_meta.json +88 -0
  16. package/data/_indexes/activity-feed.json +362 -0
  17. package/data/_indexes/catalog-summaries.json +229 -0
  18. package/data/_indexes/chains.json +7135 -0
  19. package/data/_indexes/currency.json +359 -0
  20. package/data/_indexes/did-ladders.json +451 -0
  21. package/data/_indexes/frequency.json +2072 -0
  22. package/data/_indexes/handoff-dag.json +476 -0
  23. package/data/_indexes/jurisdiction-clocks.json +967 -0
  24. package/data/_indexes/jurisdiction-map.json +536 -0
  25. package/data/_indexes/recipes.json +319 -0
  26. package/data/_indexes/section-offsets.json +3656 -0
  27. package/data/_indexes/stale-content.json +14 -0
  28. package/data/_indexes/summary-cards.json +1736 -0
  29. package/data/_indexes/theater-fingerprints.json +381 -0
  30. package/data/_indexes/token-budget.json +2137 -0
  31. package/data/_indexes/trigger-table.json +1374 -0
  32. package/data/_indexes/xref.json +818 -0
  33. package/data/atlas-ttps.json +282 -0
  34. package/data/cve-catalog.json +496 -0
  35. package/data/cwe-catalog.json +1017 -0
  36. package/data/d3fend-catalog.json +738 -0
  37. package/data/dlp-controls.json +1039 -0
  38. package/data/exploit-availability.json +67 -0
  39. package/data/framework-control-gaps.json +1255 -0
  40. package/data/global-frameworks.json +2913 -0
  41. package/data/rfc-references.json +324 -0
  42. package/data/zeroday-lessons.json +377 -0
  43. package/keys/public.pem +3 -0
  44. package/lib/framework-gap.js +328 -0
  45. package/lib/job-queue.js +195 -0
  46. package/lib/lint-skills.js +536 -0
  47. package/lib/prefetch.js +372 -0
  48. package/lib/refresh-external.js +713 -0
  49. package/lib/schemas/cve-catalog.schema.json +151 -0
  50. package/lib/schemas/manifest.schema.json +106 -0
  51. package/lib/schemas/skill-frontmatter.schema.json +113 -0
  52. package/lib/scoring.js +149 -0
  53. package/lib/sign.js +197 -0
  54. package/lib/ttp-mapper.js +80 -0
  55. package/lib/validate-catalog-meta.js +198 -0
  56. package/lib/validate-cve-catalog.js +213 -0
  57. package/lib/validate-indexes.js +83 -0
  58. package/lib/validate-package.js +162 -0
  59. package/lib/validate-vendor.js +85 -0
  60. package/lib/verify.js +216 -0
  61. package/lib/worker-pool.js +84 -0
  62. package/manifest-snapshot.json +1833 -0
  63. package/manifest.json +2108 -0
  64. package/orchestrator/README.md +124 -0
  65. package/orchestrator/dispatcher.js +140 -0
  66. package/orchestrator/event-bus.js +146 -0
  67. package/orchestrator/index.js +874 -0
  68. package/orchestrator/pipeline.js +201 -0
  69. package/orchestrator/scanner.js +327 -0
  70. package/orchestrator/scheduler.js +137 -0
  71. package/package.json +113 -0
  72. package/sbom.cdx.json +158 -0
  73. package/scripts/audit-cross-skill.js +261 -0
  74. package/scripts/audit-perf.js +160 -0
  75. package/scripts/bootstrap.js +205 -0
  76. package/scripts/build-indexes.js +721 -0
  77. package/scripts/builders/activity-feed.js +79 -0
  78. package/scripts/builders/catalog-summaries.js +67 -0
  79. package/scripts/builders/currency.js +109 -0
  80. package/scripts/builders/cwe-chains.js +105 -0
  81. package/scripts/builders/did-ladders.js +149 -0
  82. package/scripts/builders/frequency.js +89 -0
  83. package/scripts/builders/jurisdiction-clocks.js +126 -0
  84. package/scripts/builders/recipes.js +159 -0
  85. package/scripts/builders/section-offsets.js +162 -0
  86. package/scripts/builders/stale-content.js +171 -0
  87. package/scripts/builders/summary-cards.js +166 -0
  88. package/scripts/builders/theater-fingerprints.js +198 -0
  89. package/scripts/builders/token-budget.js +96 -0
  90. package/scripts/check-manifest-snapshot.js +217 -0
  91. package/scripts/predeploy.js +267 -0
  92. package/scripts/refresh-manifest-snapshot.js +57 -0
  93. package/scripts/refresh-sbom.js +222 -0
  94. package/skills/age-gates-child-safety/skill.md +456 -0
  95. package/skills/ai-attack-surface/skill.md +282 -0
  96. package/skills/ai-c2-detection/skill.md +440 -0
  97. package/skills/ai-risk-management/skill.md +311 -0
  98. package/skills/api-security/skill.md +287 -0
  99. package/skills/attack-surface-pentest/skill.md +381 -0
  100. package/skills/cloud-security/skill.md +384 -0
  101. package/skills/compliance-theater/skill.md +365 -0
  102. package/skills/container-runtime-security/skill.md +379 -0
  103. package/skills/coordinated-vuln-disclosure/skill.md +473 -0
  104. package/skills/defensive-countermeasure-mapping/skill.md +300 -0
  105. package/skills/dlp-gap-analysis/skill.md +337 -0
  106. package/skills/email-security-anti-phishing/skill.md +206 -0
  107. package/skills/exploit-scoring/skill.md +331 -0
  108. package/skills/framework-gap-analysis/skill.md +374 -0
  109. package/skills/fuzz-testing-strategy/skill.md +313 -0
  110. package/skills/global-grc/skill.md +564 -0
  111. package/skills/identity-assurance/skill.md +272 -0
  112. package/skills/incident-response-playbook/skill.md +546 -0
  113. package/skills/kernel-lpe-triage/skill.md +303 -0
  114. package/skills/mcp-agent-trust/skill.md +326 -0
  115. package/skills/mlops-security/skill.md +325 -0
  116. package/skills/ot-ics-security/skill.md +340 -0
  117. package/skills/policy-exception-gen/skill.md +437 -0
  118. package/skills/pqc-first/skill.md +546 -0
  119. package/skills/rag-pipeline-security/skill.md +294 -0
  120. package/skills/researcher/skill.md +310 -0
  121. package/skills/sector-energy/skill.md +409 -0
  122. package/skills/sector-federal-government/skill.md +302 -0
  123. package/skills/sector-financial/skill.md +398 -0
  124. package/skills/sector-healthcare/skill.md +373 -0
  125. package/skills/security-maturity-tiers/skill.md +464 -0
  126. package/skills/skill-update-loop/skill.md +463 -0
  127. package/skills/supply-chain-integrity/skill.md +318 -0
  128. package/skills/threat-model-currency/skill.md +404 -0
  129. package/skills/threat-modeling-methodology/skill.md +312 -0
  130. package/skills/webapp-security/skill.md +281 -0
  131. package/skills/zeroday-gap-learn/skill.md +350 -0
  132. package/vendor/blamejs/LICENSE +201 -0
  133. package/vendor/blamejs/README.md +54 -0
  134. package/vendor/blamejs/_PROVENANCE.json +54 -0
  135. package/vendor/blamejs/retry.js +335 -0
  136. package/vendor/blamejs/worker-pool.js +418 -0
@@ -0,0 +1,381 @@
1
+ ---
2
+ name: attack-surface-pentest
3
+ version: "1.0.0"
4
+ description: Modern attack surface management + pen testing methodology for AI-era environments — NIST 800-115, OWASP WSTG, PTES, ATT&CK-driven adversary emulation, TIBER-EU
5
+ triggers:
6
+ - attack surface
7
+ - pen test
8
+ - penetration testing
9
+ - red team
10
+ - adversary emulation
11
+ - threat-led testing
12
+ - tlpt
13
+ - tiber-eu
14
+ - asset inventory
15
+ - external footprint
16
+ - asm
17
+ data_deps:
18
+ - cve-catalog.json
19
+ - atlas-ttps.json
20
+ - framework-control-gaps.json
21
+ - cwe-catalog.json
22
+ - d3fend-catalog.json
23
+ atlas_refs:
24
+ - AML.T0043
25
+ - AML.T0051
26
+ - AML.T0010
27
+ attack_refs:
28
+ - T1190
29
+ - T1133
30
+ - T1059
31
+ - T1078
32
+ framework_gaps:
33
+ - NIST-800-115
34
+ - OWASP-Pen-Testing-Guide-v5
35
+ - PTES-Pre-engagement
36
+ - NIS2-Art21-patch-management
37
+ rfc_refs: []
38
+ cwe_refs:
39
+ - CWE-1395
40
+ - CWE-22
41
+ - CWE-269
42
+ - CWE-352
43
+ - CWE-434
44
+ - CWE-732
45
+ - CWE-78
46
+ - CWE-787
47
+ - CWE-79
48
+ - CWE-89
49
+ - CWE-918
50
+ forward_watch:
51
+ - NIST SP 800-115 successor publication (the 2008 original is the active gap)
52
+ - TIBER-EU scenario library refresh under DORA Year-2 supervisory cycle
53
+ - OWASP WSTG v5.x AI/MCP test cases (currently in working-group draft)
54
+ - PTES revision incorporating AI-surface enumeration
55
+ d3fend_refs:
56
+ - D3-CSPP
57
+ - D3-EAL
58
+ - D3-NTA
59
+ last_threat_review: "2026-05-11"
60
+ ---
61
+
62
+ # Attack Surface Management + Penetration Testing
63
+
64
+ ## Threat Context (mid-2026)
65
+
66
+ The attack surface is no longer a list of internet-facing IPs and web apps. By mid-2026 the surface a competent adversary maps for a target enterprise includes seven distinct, simultaneously exploitable layers — and the typical pen test scope covers two of them.
67
+
68
+ ### 1. Classical perimeter (still real, no longer primary)
69
+
70
+ Edge appliances, VPN concentrators, web apps, and exposed RDP/SSH remain in the kill chain (ATT&CK T1190 Exploit Public-Facing Application, T1133 External Remote Services), and edge-device CVEs continue to dominate CISA KEV adds. But initial-access cost on this surface has risen as defenders converge on EDR, WAF, and identity hardening. Adversaries pivot to cheaper surfaces below.
71
+
72
+ ### 2. AI API egress as unmanaged C2 channel
73
+
74
+ Every enterprise now has outbound HTTPS to one or more LLM providers (OpenAI, Anthropic, Google, Azure OpenAI, Bedrock) plus local model inference endpoints. The SesameOp campaign (ATLAS AML.T0096) demonstrated that prompt/completion fields can carry C2 traffic indistinguishable from legitimate AI usage. Egress filtering policies that allow `api.openai.com` and `api.anthropic.com` as trusted destinations create a covert channel the org explicitly authorized. Pen testers in mid-2026 must emulate this — point-in-time network scanners will not surface it.
75
+
76
+ ### 3. MCP servers as RCE surface
77
+
78
+ CVE-2026-30615 (Windsurf MCP, CVSS 9.8) is the canonical example: a malicious MCP server executes arbitrary code in the AI assistant's context with zero user interaction. The affected packages account for 150M+ downloads. Every developer workstation with an MCP-aware client (Cursor, VS Code + Copilot, Windsurf, Claude Code, Gemini CLI) is a potential host for unsigned-package RCE vectors that no traditional asset inventory enumerates. The matching technique is ATLAS AML.T0010 (ML Supply Chain Compromise).
79
+
80
+ ### 4. Prompt-injection footprint
81
+
82
+ Any system that feeds external content (PR descriptions, support tickets, web-retrieved docs, calendar events, email bodies, RAG-retrieved chunks) into an LLM prompt is in scope (ATLAS AML.T0051). CVE-2025-53773 (GitHub Copilot, CVSS 9.6) showed prompt injection achieving RCE via a developer tool. A 2026 meta-analysis found adaptive prompt injection succeeding against state-of-the-art defenses more than 85% of the time. Pen testers must enumerate the prompt-injection footprint as a first-class asset class.
83
+
84
+ ### 5. RAG corpora and embedding stores
85
+
86
+ Vector databases are exfil targets and poisoning targets. ATLAS AML.T0043 (Craft Adversarial Data) covers vector manipulation that forces retrieval to surface proprietary chunks or to retrieve attacker-controlled poisoned chunks. A pen test that ignores the vector store does not test the actual data-exfil path.
87
+
88
+ ### 6. Ephemeral runtimes invisible to point-in-time CMDBs
89
+
90
+ Serverless functions, K8s pods scaled to zero between invocations, ephemeral build runners, sandbox VMs — these exist for seconds to minutes and never appear in a quarterly asset inventory snapshot. A traditional ASM tool that scans on a 24-hour cycle has zero visibility. Adversaries who land in a build pipeline get code execution against the supply chain (ATT&CK T1059) inside ephemeral runners that the blue team's SIEM has no log feed from.
91
+
92
+ ### 7. Supply chain manifests
93
+
94
+ `package.json`, `requirements.txt`, `pyproject.toml`, `go.mod`, `Cargo.toml`, MCP server manifests, model registries (Hugging Face, internal MLflow), and SBOMs (CycloneDX, SPDX) define a transitive dependency graph that is the actual attack surface for software supply chain compromise. 41% of 2025 zero-days were AI-discovered, compressing the window between dependency publication and reliable exploitation to hours. ATT&CK T1078 (Valid Accounts) is increasingly preceded by a supply chain compromise that plants the valid account.
95
+
96
+ ### What this means for pen testing
97
+
98
+ A pen test scoped to layers 1 and (partly) 7 — i.e. "web app + network + nominal SBOM review" — is structurally inadequate against a mid-2026 adversary. Red teams now have to emulate AI-API-as-C2 (SesameOp) and prompt-injection-as-RCE (CVE-2025-53773) alongside classical web/network surfaces. Threat-led penetration testing programs (TIBER-EU under DORA, CBEST under PRA/FCA) require adversary emulation grounded in current threat intelligence — but their published scenario libraries lag mid-2026 TTPs by 12–18 months and tester certifications do not require demonstrated competence in AI-surface attacks.
99
+
100
+ ---
101
+
102
+ ## Framework Lag Declaration
103
+
104
+ | Framework | Control / Document | Why It Fails for Mid-2026 Pen Testing |
105
+ |---|---|---|
106
+ | NIST SP 800-115 (Technical Guide to Information Security Testing and Assessment) | Whole document | Published September 2008. Predates cloud-native architectures, the AI surface, MCP, RAG, and ephemeral runtimes entirely. The methodology section assumes a network-centric perimeter and gives zero guidance on AI APIs, prompt-injection enumeration, vector store assessment, or supply-chain manifest review. No revision has been issued in 18 years. |
107
+ | OWASP Web Security Testing Guide (WSTG) v5 | Sections 4.x (Web App Testing) | Covers web app testing surface (auth, session, input validation, business logic). Does not cover AI API testing, MCP server trust, prompt injection as a webapp-equivalent input vector, or RAG corpus integrity. v5.x AI/MCP test cases are in working-group draft and not yet a stable reference. |
108
+ | PTES (Penetration Testing Execution Standard) | Pre-engagement, Intelligence Gathering, Vulnerability Analysis, Exploitation, Post-Exploitation | The phase structure remains useful, but every phase's asset taxonomy is network-centric: hosts, services, web apps. PTES does not require the tester to enumerate AI APIs, MCP servers, RAG corpora, embedding stores, model registries, or ephemeral runtime telemetry as scoped assets. PTES Pre-engagement scoping templates ship without an AI-surface checklist. |
109
+ | NIS2 (EU Directive 2022/2555) | Article 21 (cybersecurity risk-management measures, including patch management) | Article 21(2)(f) requires policies and procedures to assess the effectiveness of cybersecurity risk-management measures. The directive does not specify a testing methodology, leaving "testing" to be operationalized at national level. Patch-management language in Art. 21 assumes human-speed exploit development; AI-discovered zero-days (Copy Fail class, ~1 hour from disclosure to working PoC) make the typical 30-day patch SLA structurally inadequate. |
110
+ | TIBER-EU (ECB, 2018, refreshed under DORA) | Whole framework | TIBER-EU mandates Threat-Led Penetration Testing for significant financial entities under DORA Article 26. The framework requires a Threat Intelligence Provider report and adversary emulation by a Red Team Provider. As implemented in 2025–2026, the published scenario libraries and accepted TTPs lag current adversary tradecraft by 12–18 months. TLPT engagements that did not include AI-API-as-C2 or prompt-injection-as-RCE scenarios in 2025 missed the dominant new tradecraft. |
111
+ | CBEST (Bank of England / PRA / FCA) | Whole framework | UK equivalent to TIBER-EU for systemically important financial firms. Same lag pattern as TIBER-EU. CBEST-certified providers are not required to demonstrate competence in AI-surface attack emulation as of mid-2026. |
112
+ | Australian ISM (Information Security Manual) + ACSC Essential 8 | ISM controls on penetration testing; Essential 8 Maturity Level 3 testing requirements | Essential 8 mandates regular testing of mitigation strategies (patching, app control, MFA, etc.). The testing requirements do not extend to AI-API egress as C2, MCP trust, or RAG poisoning. ISM control set is network/endpoint centric. |
113
+ | ISO/IEC 27001:2022 | A.5.35 (Independent review of information security) and A.8.29 (Security testing in development and acceptance) | A.5.35 requires independent review of the information security approach at planned intervals or when significant changes occur. The clause is methodology-agnostic: auditors accept a network/web pen test as evidence even when AI surfaces are in production. A.8.29 mandates security testing of new and changed information systems, but does not define what an adequate test of an AI system looks like. |
114
+ | MITRE ATT&CK Enterprise (v17) | Whole matrix | The enterprise matrix does not contain prompt-injection as a technique. AI-as-C2 (SesameOp pattern) is absent from ATT&CK as of mid-2026. Adversary emulation programs that are ATT&CK-only and not ATLAS-extended will not include the mid-2026 dominant new tradecraft in their playbooks. ATLAS v5.1.0 covers it — but ATLAS is not yet a standard requirement for pen testing certification or scoping. |
115
+
116
+ > Global coverage note (AGENTS.md rule #5): the above table spans US (NIST 800-115, ATT&CK), EU (NIS2, TIBER-EU under DORA), UK (CBEST), AU (ISM/Essential 8), and ISO 27001:2022. US-only pen test scoping is incomplete.
117
+
118
+ ---
119
+
120
+ ## TTP Mapping (MITRE ATLAS v5.1.0 + MITRE ATT&CK v17)
121
+
122
+ Pen testers must emulate both classical and AI-class chains. The table below maps the kill-chain phases a mid-2026 adversary emulation engagement must cover.
123
+
124
+ | Phase | Classical TTP (ATT&CK v17) | AI-Class TTP (ATLAS v5.1.0) | Framework Gap Flag |
125
+ |---|---|---|---|
126
+ | Reconnaissance | T1595 (Active Scanning) — implied by T1190 setup | AML.T0000 (Search Open Technical Databases) — model card / dataset / API endpoint discovery | NIST 800-115 §3.x recon guidance is network-only |
127
+ | Initial Access | T1190 (Exploit Public-Facing Application) | AML.T0051 (LLM Prompt Injection) — entered via PR description, support ticket, retrieved doc | OWASP WSTG covers webapp; not prompt-injection as entry vector |
128
+ | Initial Access | T1133 (External Remote Services) | AML.T0010 (ML Supply Chain Compromise) — malicious MCP server installed by developer | PTES scoping templates do not require MCP server enumeration |
129
+ | Execution | T1059 (Command and Scripting Interpreter) | AML.T0051 → tool-use call invoking shell/code execution in agent context | NIS2 Art.21 patch-mgmt language assumes binary exploit; semantic-input exploit lives outside |
130
+ | Persistence | T1078 (Valid Accounts) | AML.T0018 / AML.T0010 — poisoned dependency planting valid creds | TIBER-EU scenario libraries lag this by 12–18 months |
131
+ | Command and Control | T1071 (Application Layer Protocol) | AML.T0096 (LLM Integration Abuse — AI API as C2, SesameOp pattern) | No major framework defines a control for AI-API egress as C2 |
132
+ | Exfiltration | T1041 (Exfil Over C2 Channel) | AML.T0043 (Craft Adversarial Data) → vector-store retrieval forced to surface proprietary chunks | RAG corpus is not enumerated in classical ASM |
133
+ | Defense Evasion | T1027 (Obfuscated Files or Information) | AML.T0051 with adaptive payload generation per attempt | Pen test reports that show "no anomalies" against AI-generated payloads understate evasion |
134
+
135
+ Gap flag legend: every row above maps to at least one framework gap declared in `framework_gaps` frontmatter, ensuring AGENTS.md rule #4 (no orphaned controls) holds in reverse — every TTP we emulate has a framework gap it exposes.
136
+
137
+ ---
138
+
139
+ ## Exploit Availability Matrix
140
+
141
+ Pen testers select tooling based on real exploit availability. The matrix below pulls from `data/exploit-availability.json` and `data/cve-catalog.json`; `last_verified` dates are tracked in the data files, not duplicated here.
142
+
143
+ | Vulnerability | CVSS | RWEP | CISA KEV | Public PoC | AI-Discovered/Accelerated | Live-Patchable | Pen Tester Use |
144
+ |---|---|---|---|---|---|---|---|
145
+ | CVE-2026-31431 (Copy Fail Linux kernel LPE) | 7.8 | 90 | Yes | Yes — 732-byte deterministic exploit | Yes (AI-discovered in ~1 hour) | Yes (kpatch / livepatch) | Post-exploit privilege escalation in unpatched Linux hosts; reliable, no race condition |
146
+ | CVE-2025-53773 (GitHub Copilot prompt-injection RCE) | 9.6 | 91 | No (not enterprise infra in KEV sense) | Yes — demonstrated PoC | Yes (AI tooling generates injection payloads) | No (requires Copilot update + prompt hardening) | Initial access via PR descriptions, support tickets, retrieved docs; emulate AML.T0051 |
147
+ | CVE-2026-30615 (Windsurf MCP RCE) | 9.8 | 94 | No | Partial — concept demonstrated | No | No (requires Windsurf patch) | Lateral movement to developer workstations via malicious MCP server; emulate AML.T0010 |
148
+ | CVE-2026-43284 (covered in fuzz/memory-safety skill) | see `data/cve-catalog.json` | see `data/cve-catalog.json` | see `data/exploit-availability.json` | see `data/exploit-availability.json` | see `data/cve-catalog.json` | see `data/cve-catalog.json` | Memory-corruption chain context — refer to that skill for exploitation detail |
149
+ | CVE-2026-43500 (covered in supply-chain skill) | see `data/cve-catalog.json` | see `data/cve-catalog.json` | see `data/exploit-availability.json` | see `data/exploit-availability.json` | see `data/cve-catalog.json` | see `data/cve-catalog.json` | Supply chain compromise context — refer to that skill for exploitation detail |
150
+ | SesameOp (AI-as-C2 technique, no CVE — adversary tradecraft) | N/A (technique, not vulnerability) | N/A | N/A | Yes — ATLAS-documented adversary pattern (AML.T0096) | Yes | N/A | Emulate covert AI-API C2 during adversary emulation; verifies whether egress monitoring catches AML.T0096 |
151
+
152
+ For any CVE referenced here, the canonical record (with `last_verified` date, exploit URL provenance, KEV add date) lives in `data/exploit-availability.json`. The pen tester pulls fresh values from there at engagement start — these rows are an orientation cross-walk, not the source of truth.
153
+
154
+ ---
155
+
156
+ ## Analysis Procedure
157
+
158
+ The procedure threads three foundational security principles through every phase. A pen test that does not produce explicit findings against all three has not done the work — it has done compliance theater (see Compliance Theater Check below).
159
+
160
+ ### Foundational principle #1 — Defense in depth assessment
161
+
162
+ At each architectural layer the test asks: if this layer fails, does the next layer catch it? A finding is generated whenever a single-layer failure produces complete compromise.
163
+
164
+ The layers under test (mid-2026 inventory):
165
+
166
+ | Layer | Control class | Test question |
167
+ |---|---|---|
168
+ | Perimeter | WAF, edge firewall, VPN, DDoS scrubbing | Does perimeter bypass (T1190) reach the application layer with no further check? |
169
+ | Application | Input validation, auth, session, business logic | Does prompt injection (AML.T0051) reach tool-use without semantic sanitization? |
170
+ | Identity | IAM, MFA, OIDC, role-based access | Does a compromised service account (T1078) cross trust boundaries without re-auth? |
171
+ | Data | Encryption at rest, DLP, RAG corpus access control | Does retrieval (AML.T0043) surface chunks the requesting principal should not see? |
172
+ | Runtime | EDR, seccomp, namespace isolation, container sandbox | Does a kernel LPE (Copy Fail class) escape container isolation? |
173
+ | Supply chain | SBOM, signed manifests, dependency pinning, MCP allowlist | Does an unsigned MCP server (AML.T0010) install with no manifest verification? |
174
+ | Egress | Network egress filtering, AI-API monitoring, DNS RPZ | Does AI-API egress (AML.T0096) carry C2 without behavioral detection? |
175
+
176
+ For each layer the tester records: which control(s) are present, which were probed, which caught the test, which did not. The defense-in-depth heatmap in the output format is built directly from this record.
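The per-layer record described above can be turned into heatmap rows mechanically. The following is a minimal sketch, not the package's implementation: the record shape (`layer`, `controls`, `probed`, `caught`), the status labels, and the rule "a failed layer is flagged when no later layer on the path caught the test" are all assumptions made for illustration.

```javascript
// Hypothetical status labels for one heatmap cell.
const STATUS = { ABSENT: "absent", NOT_PROBED: "not-probed", HELD: "held", FAILED: "failed" };

// Derive the status of one layer from the tester's record.
function layerStatus(rec) {
  if (rec.controls.length === 0) return STATUS.ABSENT;
  if (!rec.probed) return STATUS.NOT_PROBED;
  return rec.caught ? STATUS.HELD : STATUS.FAILED;
}

// One heatmap row per layer, in the order the attack path traverses them.
function buildHeatmap(records) {
  return records.map((rec) => ({ layer: rec.layer, status: layerStatus(rec) }));
}

// Layers that failed with no later layer on the same path catching the test.
function singleLayerFailures(records) {
  return records
    .filter(
      (rec, i) =>
        layerStatus(rec) === STATUS.FAILED &&
        !records.slice(i + 1).some((next) => layerStatus(next) === STATUS.HELD)
    )
    .map((rec) => rec.layer);
}

// Illustrative records only; not results from a real engagement.
const records = [
  { layer: "Perimeter", controls: ["WAF"], probed: true, caught: false },
  { layer: "Application", controls: ["input validation"], probed: true, caught: true },
  { layer: "Data", controls: ["DLP"], probed: true, caught: false },
  { layer: "Runtime", controls: ["EDR"], probed: false, caught: false },
  { layer: "Egress", controls: [], probed: true, caught: false },
];

const heatmap = buildHeatmap(records);
const flagged = singleLayerFailures(records);
console.log(heatmap, flagged);
```

In this example the perimeter failure is not flagged because the application layer caught the test, while the data-layer failure is flagged because nothing after it held.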
177
+
178
+ ### Foundational principle #2 — Least privilege validation
179
+
180
+ Every identified principal — human, service, AI agent, MCP server, build pipeline runner, model inference endpoint — gets a privilege audit. The procedure:
181
+
182
+ 1. Enumerate the principal's actual permissions (IAM policy, Linux capabilities, K8s RBAC, MCP tool allowlist, AI agent function-calling spec).
183
+ 2. Enumerate the principal's required permissions for its real workload.
184
+ 3. Compute the delta. Every permission in (1) not in (2) is an excess-privilege finding.
185
+ 4. Map each excess permission to a TTP it enables (e.g. excess `s3:DeleteObject` enables T1485 Data Destruction; excess MCP `shell_exec` tool enables AML.T0051 → T1059 chain).
186
+
187
+ Excess privilege is itself a finding regardless of whether the test exercised it — least privilege is a posture metric, not an event metric.
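Steps 1 to 4 above reduce to a set difference followed by a lookup. The sketch below assumes permissions are plain strings and uses a hypothetical `TTP_BY_PERMISSION` table; its two entries come from the examples in step 4 (the `mcp:` prefix is an invented naming convention), and the principal name is made up.

```javascript
// Hypothetical lookup from an excess permission to the TTP it enables.
// Only these two examples appear in the procedure; extend per engagement.
const TTP_BY_PERMISSION = {
  "s3:DeleteObject": "T1485 Data Destruction",
  "mcp:shell_exec": "AML.T0051 -> T1059 chain",
};

// Returns one finding per permission that is granted but not required.
function excessPrivilegeFindings(principal, actual, required) {
  const needed = new Set(required);
  return [...new Set(actual)]
    .filter((perm) => !needed.has(perm))
    .sort()
    .map((perm) => ({
      principal,
      permission: perm,
      enablesTtp: TTP_BY_PERMISSION[perm] ?? "unmapped (needs analyst review)",
    }));
}

// Illustrative principal: granted delete rights its workload never uses.
const findings = excessPrivilegeFindings(
  "svc-report-export",
  ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
  ["s3:GetObject", "s3:PutObject"]
);
console.log(findings);
```

The function never looks at whether a permission was exercised during the test, which matches the posture-metric framing: the delta alone produces the finding.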
188
+
189
+ ### Foundational principle #3 — Zero trust assumption inversion
190
+
191
+ The tester assumes every network position is hostile, every identity is untrusted until proven, every egress is suspect. The procedure inverts the implicit-trust assumptions the org operates under:
192
+
193
+ 1. From every network segment the test reaches, attempt to reach the next segment. Implicit-trust between segments is a zero-trust failure.
194
+ 2. For every identity the test assumes, attempt to invoke privileged APIs without step-up authentication. Implicit re-auth-not-required is a zero-trust failure.
195
+ 3. For every authorized egress (including the trusted AI-API endpoints), inject anomalous traffic (high-entropy payloads, base64-encoded blobs in prompt fields, oversized completions). Egress that authorizes destination without inspecting content is a zero-trust failure.
196
+
197
+ A pen test that "achieves access" by abusing implicit trust between segments is a zero-trust failure, not just a network finding. It is reported as such in the output format.
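One way to sanity-check that a test payload for item 3 is actually "high-entropy" is to compute its Shannon entropy per character. This is a rough sketch only: the function name is hypothetical, short strings give unreliable estimates, and any alerting threshold would have to be calibrated against the organisation's own baseline of legitimate prompt traffic.

```javascript
// Shannon entropy of a string, in bits per character.
function shannonEntropy(text) {
  if (text.length === 0) return 0;
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) ?? 0) + 1);
  const total = [...counts.values()].reduce((sum, n) => sum + n, 0);
  let bits = 0;
  for (const n of counts.values()) {
    const p = n / total;
    bits -= p * Math.log2(p);
  }
  return bits;
}

// Compare an ordinary-looking prompt with an encoded-looking blob.
// Both strings are synthetic examples, not captured traffic.
const ordinaryPrompt = "Summarise the attached quarterly report in three bullet points.";
const encodedBlob = "q3VzdG9tZXJfZXhwb3J0XzIwMjZfYmF0Y2hfMDA3X2ZpbmFsLmNzdi5nejpYeVo5";
console.log(shannonEntropy(ordinaryPrompt).toFixed(2), shannonEntropy(encodedBlob).toFixed(2));
```

Entropy alone will not distinguish C2 from legitimate encoded attachments, so it is best treated as one signal the egress control could inspect, not as a detector in its own right.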
198
+
199
+ ### Concrete procedure steps
200
+
201
+ **Step 1 — Scoping and rules of engagement (PTES Pre-engagement + TIBER-EU TI-Provider report)**
202
+
203
+ Define: in-scope systems (must include AI surface, MCP installations, RAG corpora, ephemeral runtimes, and supply chain manifests, not just IPs and URLs), out-of-scope systems, test windows, escalation contacts, and legal authorisation (CFAA / Computer Misuse Act / Australian Cybercrime Act / equivalent in the relevant jurisdiction). For TLPT engagements under DORA/CBEST, the TI-Provider Threat Intelligence report defines the adversary scenario; verify the scenario reflects mid-2026 tradecraft (AI-API-as-C2, prompt-injection-as-RCE) and is not a copy-forward of 2024 scenarios.
204
+
205
+ **Step 2 — Asset inventory (extended for the mid-2026 surface)**
206
+
207
+ Enumerate, at minimum:
208
+
209
+ - External IPs, domains, certificates (Censys/Shodan/crt.sh, RDAP).
210
+ - Web apps and APIs (with OpenAPI/GraphQL schemas where exposed).
211
+ - VPN/SSO/identity providers.
212
+ - AI APIs in use (LLM providers, local inference endpoints, AI agent frameworks).
213
+ - MCP servers installed across developer fleet (enumerate via IDE config files: `~/.cursor/mcp.json`, `~/.vscode/mcp.json`, equivalents).
214
+ - RAG corpora and embedding stores (vector DBs, document indexes).
215
+ - Model registries (Hugging Face orgs, internal MLflow / model registries).
216
+ - Ephemeral runtime fabrics (Lambda/CloudRun/Functions, K8s namespaces with HPA scale-to-zero, ephemeral build runners).
217
+ - Supply chain manifests (`package.json`, `requirements.txt`, `pyproject.toml`, `go.mod`, `Cargo.toml`, MCP manifests, CycloneDX/SPDX SBOMs).
218
+
219
+ If any of the AI-surface categories are reported as "none in scope" by the customer, capture that as a scope-limitation finding — most organisations are wrong about this in 2026.
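The MCP enumeration item above can be partly automated by parsing the client config files collected from the fleet. The sketch below is an assumption-heavy illustration: it assumes the config is JSON with servers listed under a top-level `mcpServers` or `servers` object, each entry carrying either `command`/`args` or `url`. Verify the actual schema for each client version before relying on it. The server name and package name are invented.

```javascript
// Parse one MCP client config (already read from disk) into inventory rows.
// The key names `mcpServers` and `servers` are assumptions to verify per client.
function listMcpServers(configText, sourcePath) {
  let parsed;
  try {
    parsed = JSON.parse(configText);
  } catch {
    return [{ sourcePath, error: "unparseable config" }];
  }
  const servers = (parsed && (parsed.mcpServers ?? parsed.servers)) || {};
  return Object.entries(servers).map(([name, def]) => ({
    sourcePath,
    name,
    // Local servers are launched by command; remote ones are reached by URL.
    launch: def.command
      ? [def.command, ...(def.args ?? [])].join(" ")
      : def.url ?? "unknown",
  }));
}

// Synthetic config text standing in for a file collected from a workstation.
const sampleConfig = JSON.stringify({
  mcpServers: {
    "internal-docs": { command: "npx", args: ["-y", "example-mcp-server"] },
  },
});

const inventory = listMcpServers(sampleConfig, "~/.cursor/mcp.json");
console.log(inventory);
```

Each row gives the tester a concrete server to check against the signed-manifest and allowlist questions in Step 5.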
220
+
221
+ **Step 3 — OSINT and recon**
222
+
223
+ - DNS, subdomain enumeration, certificate transparency.
224
+ - Code-host recon: public GitHub orgs for leaked tokens, exposed `.env`, branch-protection lapses.
225
+ - CVE / advisory cross-walk: pull current `data/cve-catalog.json` entries, cross-reference each in-scope component against CSAF advisories and OSV.dev for OSS components.
226
+ - EPSS-anchored prioritisation: rank candidate vulnerabilities by EPSS percentile and CISA KEV status; CVSS alone is not the ranker (AGENTS.md rule #3).
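The EPSS-anchored ranking in the last bullet can be sketched as a two-key sort: KEV-listed entries first, then EPSS percentile descending. The field names (`kev`, `epssPercentile`, `cvss`) and the sample entries are hypothetical, not the schema of `data/exploit-availability.json`.

```javascript
// Rank candidates: CISA KEV membership first, then EPSS percentile descending.
// CVSS is carried along for reporting but deliberately not used as a sort key.
function rankCandidates(candidates) {
  return [...candidates].sort((a, b) => {
    if (a.kev !== b.kev) return a.kev ? -1 : 1;
    return b.epssPercentile - a.epssPercentile;
  });
}

// Placeholder identifiers and scores, invented for illustration.
const ranked = rankCandidates([
  { id: "VULN-A", kev: false, epssPercentile: 0.99, cvss: 9.8 },
  { id: "VULN-B", kev: true, epssPercentile: 0.6, cvss: 7.8 },
  { id: "VULN-C", kev: true, epssPercentile: 0.95, cvss: 6.5 },
]);
console.log(ranked.map((c) => c.id));
```

Note that the highest-CVSS entry ends up last here, which is the point of not letting CVSS act as the ranker.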
227
+
228
+ **Step 4 — Vulnerability identification (classical surface)**
229
+
230
+ - Authenticated and unauthenticated scans where authorised (Nessus / Nuclei / Burp / equivalent).
231
+ - Manual webapp testing per OWASP WSTG v5 sections 4.1–4.12.
232
+ - CWE coverage from `data/cwe-catalog.json` cross-walked to findings — CWE-79 (XSS), CWE-89 (SQLi), CWE-78 (OS command injection), CWE-787 (out-of-bounds write) are the load-bearing classical categories for the chains tested here; AGENTS.md rule #4 (no orphaned controls) requires every finding be CWE-mapped.
233
+
234
+ **Step 5 — AI-surface enumeration and probing**
235
+
236
+ For each system identified in Step 2 as feeding external content into an LLM prompt:
237
+
238
+ - Prompt-injection probes: a battery of indirect-injection payloads delivered via the system's normal input channels (PR descriptions, support tickets, retrieved docs, calendar/email if integrated). Record bypass rate.
239
+ - MCP tool-allowlist abuse: for each installed MCP server, verify (a) signed manifest enforced, (b) tool allowlist enforced, (c) bearer auth required, (d) outputs sanitized before the model receives them. Test by introducing a controlled non-malicious MCP server with a permissive manifest and observing whether the client installs it without prompting the user.
240
+ - AI-C2 simulation: from a host the test has reached, generate AI-API traffic encoding a synthetic C2 protocol in prompt and completion fields. Observe whether SIEM / DLP / egress monitoring fires any alert. If no alert fires because the AI-API destination is treated as trusted traffic with no behavioural baseline, record the gap as a successful controlled emulation of AML.T0096.
241
+ - RAG corpus assessment: where authorised, attempt vector-store retrieval that surfaces chunks outside the requesting principal's access boundary; attempt to inject a poisoned chunk via the documented ingestion path and verify whether content integrity checks fire.
242
+
243
+ **Step 6 — Exploitation (only as scoped)**
244
+
245
+ For each candidate vulnerability with high RWEP and authorised exploitation scope:
246
+
247
+ - Confirm exploitability against a non-production replica where available, then against production within RoE.
248
+ - Document blast radius before each exploitation step.
249
+ - For Copy Fail class kernel LPEs in scope, refer to the `kernel-lpe-triage` skill — exploitation methodology and live-patch verification belong there.
250
+
251
+ **Step 7 — Defense-in-depth probing**
252
+
253
+ Run the layer-by-layer test from foundational principle #1 above. The output of this step is the heatmap, not a list of bugs. For each row in the heatmap the tester records: was the next layer engaged? did it catch the test? what telemetry fired?
254
+
255
+ **Step 8 — Privilege chain analysis**
256
+
257
+ Apply foundational principle #2 to every principal the test enumerated or compromised. Produce the excess-privilege ledger directly as output.
258
+
259
+ **Step 9 — Zero-trust violation inventory**
260
+
261
+ Apply foundational principle #3. Record every implicit-trust crossing the test exploited — segment-to-segment, identity-to-identity, egress without inspection.
262
+
263
+ **Step 10 — Reporting and remediation roadmap**
264
+
265
+ Map each finding to:
266
+
267
+ - RWEP score (from `lib/scoring.js`, never CVSS alone — AGENTS.md rule #3).
268
+ - EPSS percentile and CISA KEV status (from `data/exploit-availability.json`).
269
+ - CWE category (from `data/cwe-catalog.json`).
270
+ - D3FEND defensive countermeasure (from `data/d3fend-catalog.json`) — see section 8 below.
271
+ - Framework gap exposed (from `data/framework-control-gaps.json`).
272
+
273
+ Sequence remediation by RWEP descending; within each RWEP tier, schedule live-patchable items first when their patch SLA can realistically be met. For supply chain manifests, remediation requires not just patching the called-out version but also tightening the dependency-resolution policy that allowed it in.
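The sequencing rule above can be sketched as a three-key sort. This is an illustration under stated assumptions: the 0 to 100 RWEP scale, the tier width of 25, and the field names `rwep`, `livePatchable`, and `slaAchievable` are hypothetical and are not taken from `lib/scoring.js`.

```javascript
// ASSUMPTION: RWEP runs 0-100 and tiers are 25 points wide.
// The real scale and tier boundaries are whatever lib/scoring.js defines.
const TIER_WIDTH = 25;
const tierOf = (rwep) => Math.min(3, Math.floor(rwep / TIER_WIDTH));

function sequenceRemediation(findings) {
  // Copy first so the caller's array is not reordered in place.
  return [...findings].sort((a, b) => {
    // 1. Higher RWEP tier first.
    const tierDiff = tierOf(b.rwep) - tierOf(a.rwep);
    if (tierDiff !== 0) return tierDiff;
    // 2. Within a tier, live-patchable items whose SLA can be met go first.
    const aFirst = a.livePatchable && a.slaAchievable ? 1 : 0;
    const bFirst = b.livePatchable && b.slaAchievable ? 1 : 0;
    if (aFirst !== bFirst) return bFirst - aFirst;
    // 3. Otherwise plain RWEP descending.
    return b.rwep - a.rwep;
  });
}

// Illustrative findings only.
const sampleFindings = [
  { id: 'F1', rwep: 92, livePatchable: false, slaAchievable: false },
  { id: 'F2', rwep: 80, livePatchable: true, slaAchievable: true },
  { id: 'F3', rwep: 60, livePatchable: true, slaAchievable: true },
  { id: 'F4', rwep: 78, livePatchable: true, slaAchievable: false },
];
console.log(sequenceRemediation(sampleFindings).map((f) => f.id));
```

In this sample, F2 moves ahead of the higher-scoring F1 because both sit in the same assumed tier and only F2 can be live-patched within its SLA; F4 is live-patchable but gets no priority because its SLA cannot be met.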
274
+
275
+ ---
276
+
277
+ ## Output Format
278
+
279
+ ```
280
+ ## Penetration Test Report — [Engagement Name]
281
+
282
+ **Engagement window:** [start] – [end]
283
+ **Tester(s) / firm:** [names + certifications]
284
+ **Authorising party:** [client representative]
285
+ **Scope reference:** [PTES Pre-engagement document / TIBER-EU TI-Provider report / equivalent]
286
+
287
+ ### 1. Executive summary
288
+ - Top 5 findings by RWEP, one sentence each.
289
+ - Defense-in-depth verdict: which layers held, which failed.
290
+ - Zero-trust verdict: were implicit-trust crossings reachable?
291
+ - Overall posture characterisation against the mid-2026 attack surface.
292
+
293
+ ### 2. Scope and rules of engagement
294
+ [Verbatim or referenced from the agreed RoE document. Includes legal authorisation reference per jurisdiction.]
295
+
296
+ ### 3. Asset inventory (mid-2026 extended)
297
+ | Category | Asset | Notes |
298
+ |---|---|---|
299
+ | External IP/domain | ... | ... |
300
+ | Web app/API | ... | ... |
301
+ | Identity provider | ... | ... |
302
+ | AI API | ... | provider, endpoint, principal |
303
+ | MCP server | ... | client, signed?, allowlisted? |
304
+ | RAG corpus | ... | store type, access boundary |
305
+ | Model registry | ... | location, signing posture |
306
+ | Ephemeral runtime | ... | platform, observability coverage |
307
+ | Supply chain manifest | ... | path, lockfile present? |
308
+
309
+ Scope-limitation findings (categories the client said had "none in scope" that the tester confirmed otherwise) listed separately.
310
+
311
+ ### 4. Findings (prioritised by RWEP)
312
+ For each finding:
313
+ - ID, title, affected asset(s).
314
+ - TTP mapping (ATT&CK and/or ATLAS).
315
+ - CWE category.
316
+ - RWEP score + components (CVSS / EPSS / KEV / PoC availability / AI-acceleration / blast radius).
317
+ - Reproduction evidence.
318
+ - Defense-in-depth context: which layers did and did not catch this.
319
+ - Recommended D3FEND countermeasure (see section 8 below).
320
+ - Framework gap exposed.
321
+
322
+ ### 5. Defense-in-depth heatmap
323
+ | Layer | Control(s) probed | Caught? | Telemetry fired? | Notes |
324
+ |---|---|---|---|---|
325
+ | Perimeter | ... | Y/N/Partial | Y/N | ... |
326
+ | Application | ... | ... | ... | ... |
327
+ | Identity | ... | ... | ... | ... |
328
+ | Data | ... | ... | ... | ... |
329
+ | Runtime | ... | ... | ... | ... |
330
+ | Supply chain | ... | ... | ... | ... |
331
+ | Egress (incl. AI-API) | ... | ... | ... | ... |
332
+
333
+ ### 6. Least-privilege excess-privilege ledger
334
+ | Principal | Type | Excess permission(s) | TTP enabled | Recommendation |
335
+ |---|---|---|---|---|
336
+ | ... | human / service / AI agent / MCP server / pipeline | ... | T#### / AML.T#### | revoke / scope to specific resource / re-auth required |
337
+
338
+ ### 7. Zero-trust assumption violations
339
+ | # | Implicit-trust crossing exploited | Where to insert verification | Recommendation |
340
+ |---|---|---|---|
341
+ | 1 | segment A → segment B without identity check | east-west boundary | mTLS + per-call authorisation |
342
+ | 2 | service account assumed trusted across cluster | API gateway | step-up auth on privileged operations |
343
+ | 3 | egress to api.openai.com authorised by destination only | egress inspection | behavioural baseline + content inspection per AML.T0096 |
344
+
345
+ ### 8. Remediation roadmap
346
+ | Priority | Finding | RWEP | Recommended D3FEND counter | Owner | Target date |
347
+ |---|---|---|---|---|---|
348
+
349
+ ### 9. Appendices
350
+ - Tooling and versions used.
351
+ - Raw evidence (logs, captures, payloads) — separately classified.
352
+ - References: PTES section IDs, OWASP WSTG IDs, NIST 800-115 section IDs, ATLAS/ATT&CK IDs cited.
353
+ ```
354
+
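Section 5 of the template can be filled mechanically from the Step 7 observations. The sketch below renders the heatmap table from per-layer records; the record shape (`controls`, `caught`, `telemetry`, `notes`) and the function name `renderHeatmap` are assumptions made for illustration, and only the layer names and column headers come from the template.

```javascript
// Layer names copied from the section 5 template above.
const LAYERS = [
  'Perimeter', 'Application', 'Identity', 'Data',
  'Runtime', 'Supply chain', 'Egress (incl. AI-API)',
];

// Hypothetical observation shape per layer:
// { controls: string[], caught: 'Y' | 'N' | 'Partial', telemetry: boolean, notes?: string }
function renderHeatmap(observations) {
  const header = [
    '| Layer | Control(s) probed | Caught? | Telemetry fired? | Notes |',
    '|---|---|---|---|---|',
  ];
  const rows = LAYERS.map((layer) => {
    const o = observations[layer];
    // A layer with no observation is shown explicitly rather than dropped.
    if (!o) return `| ${layer} | not probed | - | - | - |`;
    const fired = o.telemetry ? 'Y' : 'N';
    return `| ${layer} | ${o.controls.join(', ')} | ${o.caught} | ${fired} | ${o.notes ?? ''} |`;
  });
  return [...header, ...rows].join('\n');
}

// Illustrative observations only.
const sampleObservations = {
  Perimeter: { controls: ['WAF'], caught: 'Partial', telemetry: true, notes: 'alert delayed' },
  'Egress (incl. AI-API)': { controls: ['egress proxy'], caught: 'N', telemetry: false },
};
console.log(renderHeatmap(sampleObservations));
```

Emitting a "not probed" row for every layer without an observation keeps untested layers visible in the report instead of letting them disappear from the table.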
355
+ ---
356
+
357
+ ## Compliance Theater Check
358
+
359
+ Apply each of the following tests to a candidate "we have a pen test" assertion. Any one of them returning a theater verdict invalidates the engagement as evidence of real posture.
360
+
361
+ **Test 1 — Scope theater.** Ask the org for their most recent pen test report and inspect the scope statement. If the scope is only network + web app + nominal SBOM review, and the org operates AI APIs, MCP servers in developer workflow, RAG, or ephemeral runtimes, the engagement did not test the mid-2026 attack surface. A 2026 pen test that did not enumerate AI APIs, MCP servers, RAG corpora, embedding stores, model registries, or ephemeral runtime telemetry as scoped assets is theater for the current threat model regardless of how thorough it was within its declared scope.
362
+
363
+ **Test 2 — Defense-in-depth-finding theater.** Read the findings section. If a pen test of a 10,000+ employee organisation found zero defense-in-depth gaps — i.e. every reported finding either did not chain past a single control, or no chains were attempted — the engagement either had its scope artificially constrained or the tester did not run multi-stage emulation. Real organisations of that size have defense-in-depth gaps; their absence in a report is itself the finding.
364
+
365
+ **Test 3 — Pre-disclosed-scope theater.** Ask: was the pen tester told in advance which specific systems, payloads, or techniques were out of scope to "not surprise" the operations team or "avoid noise in detection systems"? If yes, that is a compliance pen test (a checkbox engagement to satisfy an auditor that "testing occurred"), not an adversary emulation. TIBER-EU and CBEST explicitly require the opposite: the Red Team Provider operates against a Blue Team that is unaware of the engagement specifics, precisely because pre-disclosure invalidates the result. A "pen test" that pre-disclosed the test plan to the SOC has measured the SOC's cooperation, not the organisation's actual resistance to a real adversary.
366
+
367
+ **Test 4 — Currency theater.** Open the report's TTP references. If the engagement is dated 2025–2026 and the TTP coverage is ATT&CK-only with no ATLAS references, the tester did not emulate the dominant new tradecraft. The pen test cannot evidence resistance to attack patterns it did not attempt.
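The four tests above can be sketched as one function over engagement metadata. This is a rough illustration, not the skill's implementation: every field name (`operatedAssetCategories`, `scopedAssetCategories`, `employeeCount`, `defenseInDepthGapsFound`, `planPreDisclosedToSoc`, `engagementYear`, `ttpReferences`) and the category slugs are hypothetical. The only detail taken from the document is that ATLAS technique IDs carry the `AML.` prefix.

```javascript
// Hypothetical category slugs for the asset classes named in Test 1.
const AI_ERA_ASSETS = [
  'ai-api', 'mcp-server', 'rag-corpus',
  'embedding-store', 'model-registry', 'ephemeral-runtime',
];

function theaterVerdicts(e) {
  const verdicts = [];

  // Test 1: AI-era assets the org operates but the engagement never scoped.
  const unscoped = e.operatedAssetCategories
    .filter((c) => AI_ERA_ASSETS.includes(c))
    .filter((c) => !e.scopedAssetCategories.includes(c));
  if (unscoped.length > 0) verdicts.push(`scope: ${unscoped.join(', ')} operated but not scoped`);

  // Test 2: a large organisation with zero defense-in-depth gaps reported.
  if (e.employeeCount >= 10000 && e.defenseInDepthGapsFound === 0) {
    verdicts.push('defense-in-depth-finding: no gaps reported at 10,000+ employees');
  }

  // Test 3: the test plan was disclosed to the SOC in advance.
  if (e.planPreDisclosedToSoc) verdicts.push('pre-disclosed-scope: SOC knew the plan');

  // Test 4: engagement dated 2025 or later with no ATLAS (AML.*) references.
  const hasAtlas = e.ttpReferences.some((id) => id.startsWith('AML.'));
  if (e.engagementYear >= 2025 && !hasAtlas) verdicts.push('currency: ATT&CK-only TTP coverage');

  return verdicts; // Any entry invalidates the engagement as posture evidence.
}

// Illustrative engagement only.
const sampleEngagement = {
  operatedAssetCategories: ['network', 'web-app', 'ai-api', 'mcp-server'],
  scopedAssetCategories: ['network', 'web-app'],
  employeeCount: 12000,
  defenseInDepthGapsFound: 0,
  planPreDisclosedToSoc: false,
  engagementYear: 2026,
  ttpReferences: ['T1190', 'T1078'],
};
console.log(theaterVerdicts(sampleEngagement));
```

The function returns every verdict rather than stopping at the first, so the report can state all the reasons an engagement fails as evidence.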
368
+
369
+ ---
370
+
371
+ ## Defensive Countermeasure Mapping (D3FEND)
372
+
373
+ The findings the pen test typically produces map to D3FEND v0.10+ defensive countermeasures from `data/d3fend-catalog.json`. The table below is the recommended-counter cross-walk used in section 8 of the output format.
374
+
375
+ | Typical finding | D3FEND ID | What the counter actually does |
376
+ |---|---|---|
377
+ | Missing egress inspection / AI-API treated as trusted destination (AML.T0096) | D3-NTA (Network Traffic Analysis) | Baseline + anomaly detect on egress flows including authorised AI API endpoints; surfaces SesameOp-class C2 in prompt/completion fields |
378
+ | Excess service-account / AI-agent / MCP-tool privilege (least-privilege failure) | D3-EAL (Execution Activity Logging) | Logs principal actions with context so excess-privilege exercise is detectable post-hoc and reviewable against a least-privilege baseline |
379
+ | Unsigned MCP server installed by developer client (AML.T0010) | D3-CSPP (Credentialed Scan / Software Provenance Policy — per `data/d3fend-catalog.json`) | Enforces signed-manifest and provenance verification before MCP server load; rejects unsigned tool surfaces at install time |
380
+
381
+ Pen test findings outside this short table map to D3FEND countermeasures present in `data/d3fend-catalog.json` (D3-EHB, D3-MFA, D3-NI, D3-PHRA, D3-PA, D3-IOPR, etc.); the tester selects the counter whose technique-coverage description in the catalog matches the chain the finding exploited. AGENTS.md rule #4 (no orphaned controls) is satisfied because every recommended counter maps to a TTP the test actually emulated and a CWE category the finding exposed.
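The cross-walk and the no-orphaned-controls condition can be sketched as a lookup plus a check. This is a hedged illustration: the finding-kind slugs, the finding shape (`id`, `kind`, `emulatedTtps`, `cwe`), and the function name `recommendCounter` are assumptions. Only the three D3FEND IDs are transcribed from the table above, and the structure of `data/d3fend-catalog.json` is not assumed.

```javascript
// D3FEND IDs transcribed from the table above; the keys are hypothetical slugs.
const COUNTERS = new Map([
  ['uninspected-ai-api-egress', 'D3-NTA'],
  ['excess-principal-privilege', 'D3-EAL'],
  ['unsigned-mcp-server', 'D3-CSPP'],
]);

function recommendCounter(finding) {
  // Findings outside the short table fall through to a manual catalog lookup.
  const d3fend = COUNTERS.get(finding.kind) ?? null;

  // A counter is orphaned unless the finding carries an emulated TTP and a CWE.
  const orphanReasons = [];
  if (finding.emulatedTtps.length === 0) orphanReasons.push('no emulated TTP');
  if (!finding.cwe) orphanReasons.push('no CWE category');

  return {
    findingId: finding.id,
    d3fend,
    needsCatalogLookup: d3fend === null,
    orphaned: orphanReasons.length > 0,
    orphanReasons,
  };
}

// Illustrative findings only; the CWE value is an example, not from the catalog.
const mappedFinding = { id: 'F-01', kind: 'unsigned-mcp-server', emulatedTtps: ['AML.T0010'], cwe: 'CWE-494' };
const unmappedFinding = { id: 'F-02', kind: 'weak-session-handling', emulatedTtps: [], cwe: null };
console.log(recommendCounter(mappedFinding));
console.log(recommendCounter(unmappedFinding));
```

Returning the orphan reasons instead of throwing lets a report generator list every finding that still needs a TTP or CWE before a counter is recommended for it.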