@intentsolutionsio/penetration-tester (npm): version diff 2.0.0 → 3.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (112)

package/skills/generating-executive-summary/references/PLAYBOOK.md ADDED

@@ -0,0 +1,201 @@
+# PLAYBOOK — Audience Customizations and Delivery
+## Per-audience customizations
+### Board summary (default — quarterly board pack)
+Risk score + top 3 priorities + OWASP coverage. The board reader
+wants to know "should we be worried?" The exec summary as
+emitted answers that.
+```bash
+python3 ./scripts/exec_summary.py engagements/acme-2026-q2/
+```
+### C-suite (CISO / CTO) summary
+Same content as the board summary, but the C-suite reader may
+want lengthier remediation discussion. After running the default
+summary, the operator can edit the markdown directly to expand
+specific sections. The skill's output is intentionally treated as
+a draft for editorial expansion.
+### Customer's security leadership summary
+For the customer's own security leadership (Director of Security,
+VP Engineering's security lead, etc.), the most useful expansion
+is per-priority detail with cross-references to the vulnerability
+report. The default output already includes the
+`vulnerability-report.md#finding-XXX` cross-references — these
+are clickable in most markdown viewers.
+### External auditor / insurer summary
+For external parties (SOC2 auditor, cyber-insurance carrier),
+extend the default with an engagement-archive integrity statement:
+```markdown
+## Archive integrity
+The engagement's findings, evidence, and configuration are
+archived at <path> with a SHA-256 manifest signed by <key id>.
+Verification command:
+`sha256sum -c manifest.sha256 && gpg --verify manifest.sha256.asc manifest.sha256`
+```
+Add this manually to the rendered summary before delivery.
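The manifest referenced in the verification command could be produced along the following lines. This is a minimal sketch, not the package's own tooling: the function name `build_manifest` and the default file names are assumptions for illustration. It writes one `<hex digest>  <relative path>` line per file, which is the text format `sha256sum -c` reads.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path, manifest_name: str = "manifest.sha256") -> str:
    """Return manifest text with one '<sha256 hex>  <relative path>' line per file."""
    lines = []
    # Sort paths so the manifest is byte-identical across runs.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        # Never hash the manifest itself or its detached signature.
        if rel in (manifest_name, manifest_name + ".asc"):
            continue
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {rel}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to `manifest.sha256` in the engagement directory and signing it separately keeps hashing and signing as two independent, auditable steps.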
+## Summary length guidelines
+| Use | Target length |
+|---|---|
+| Board summary | 1 page (~60 lines markdown) |
+| C-suite summary | 1.5 pages (~80 lines markdown) |
+| Customer security leadership | 2 pages (~120 lines markdown) |
+| External auditor (with archive section) | 2 pages |
+The skill emits ~80-100 lines by default. Pad or trim by editing
+the rendered markdown.
+## Common rewrite patterns
+### "Soften the risk-score band wording"
+Some customers find "Critical" / "High" risk-band language
+alarming when the underlying findings are routine. The
+risk-band labels are documented in THEORY.md; if your customer
+relationship makes the words land badly, rewrite the band
+description without changing the numeric score.
+Don't change the score itself to fit the rewrite — the score
+remains the canonical number; the descriptive band is the
+audience-facing label.
+### "Add a CFO-readable financial-impact section"
+The skill doesn't compute financial impact (out of scope; too
+many assumptions about asset value). For engagements that warrant
+a CFO-readable section:
+1. Work with the customer's finance lead to assign rough cost
+   ranges to each top priority.
+2. Append a "Financial impact estimates" section to the rendered
+   summary with the agreed numbers.
+3. Annotate clearly: "Cost estimates provided by customer; not
+   computed from finding data."
+### "Add a regulatory-impact narrative"
+For engagements subject to compliance regimes (HIPAA, PCI DSS,
+SOC 2), the exec audience cares about regulatory exposure. Append a
+"Regulatory impact" section that cross-references the top
+findings against specific regulations:
+```markdown
+## Regulatory impact
+The top finding "Hardcoded AWS key in source" falls under the HIPAA
+§164.308(a)(1)(ii)(B) (Risk Management) implementation specification.
+Remediation timing should align with the breach-notification
+window if data has been exposed.
+```
+Don't include this section by default; add it when the engagement
+has specific regulatory framing.
+## Integration with the composing + mapping skills
+The exec-summary skill consumes:
+1. **Unified findings JSONL** (from `composing-vulnerability-report`
+   OR `mapping-findings-to-owasp-top10`)
+2. **OWASP coverage report** (from `mapping-findings-to-owasp-top10`)
+3. **ROE YAML** (from the engagement repository)
+Run order:
+```bash
+# 1. (Optional) Compose to produce the main vulnerability report first
+python3 plugins/security/penetration-tester/skills/composing-vulnerability-report/scripts/compose_report.py \
+    engagements/acme-2026-q2/
+# 2. Map to OWASP, producing the enriched JSONL and coverage report
+python3 plugins/security/penetration-tester/skills/mapping-findings-to-owasp-top10/scripts/map_owasp.py \
+    engagements/acme-2026-q2/
+# 3. Generate the exec summary
+python3 ./scripts/exec_summary.py engagements/acme-2026-q2/
+```
+## Post-delivery follow-up cadence
+The exec summary's "Suggested next steps" section starts a
+remediation cadence. The pentester's role typically follows this schedule:
+1. **End-of-engagement** (week 0): deliver the summary + report.
+2. **Week 2**: optional check-in with the customer's security
+   lead to confirm remediation has been prioritized.
+3. **Week 6**: confirm the top priorities have been addressed.
+4. **Week 12**: re-test the top priorities to verify the fixes
+   hold. The re-test produces a follow-up summary using the same
+   skill against the same engagement directory.
+Re-tests should produce a LOWER risk score after remediation. If
+the score doesn't drop, either the fixes didn't take effect or
+new findings emerged in the re-test scan.
+## Sample delivery package
+The standard engagement closeout package contains:
+```
+engagements/archives/acme-2026-q2.tar.gz
+├── engagement directory contents
+├── manifest.sha256                  # SHA-256 of every file
+├── manifest.sha256.asc              # GPG signature
+└── reports/
+    ├── vulnerability-report.md      # full technical detail
+    ├── owasp-coverage.md            # OWASP Top 10 breakdown
+    └── executive-summary.md         # this skill's output
+```
+Plus separately:
+- The exec summary as a standalone PDF (via `/whiteglove-pdf`).
+- A receipt-signed copy of the archive's manifest.
+## Risk-score auditability
+If a customer or third party questions the risk score, the
+auditable response is to:
+1. Show the score-composition formula from THEORY.md.
+2. Show the severity counts that fed the formula.
+3. Show the OWASP-breadth count.
+4. Show the governance-bonus determination.
+5. Demonstrate that re-running the skill produces the same score.
+The skill is deterministic by design so this audit is always
+possible.
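Step 5 of the audit (re-running produces the same result) can be checked mechanically by hashing two renders while ignoring the generation date. The sketch below is illustrative only: the `Generated:` prefix for the date line is an assumption, since the source does not show the header format.

```python
import hashlib


def stable_digest(markdown_text: str, date_prefix: str = "Generated:") -> str:
    """Hash a rendered summary, skipping the (assumed) generation-date line."""
    kept = [
        line
        for line in markdown_text.splitlines()
        if not line.startswith(date_prefix)  # hypothetical header format
    ]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()
```

Two renders of the same engagement should yield equal digests; a mismatch means something other than the date changed.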
+## When the score and the findings disagree intuitively
+Occasionally the algorithmic score doesn't match the operator's
+gut feeling — usually because:
+- A single very high-impact finding doesn't dominate enough
+  (the formula adds rather than maxing).
+- Numerous low-severity findings produce a higher score than the
+  operator expected.
+- The OWASP-breadth bonus elevates an otherwise-modest engagement.
+The operator can:
+- Document the divergence in the summary's "Suggested next steps"
+  section.
+- Use `--priority-overrides` to set top-3 priorities that match
+  intuition.
+- Refrain from re-engineering the formula for one engagement;
+  consistency across engagements is the formula's value
+  proposition.

package/skills/generating-executive-summary/references/THEORY.md ADDED

@@ -0,0 +1,195 @@
+# THEORY — Executive Summary as a Technical-Communication Discipline
+## What an exec summary actually is
+An executive summary is not a shorter vulnerability report. It's
+a different document for a different reader. The vulnerability
+report says "here is every finding we identified." The exec
+summary says "here is what the board / CEO / CISO needs to decide
+on, and how big the decision is."
+The summary's job is to compress without losing decision-relevant
+information. Compressed too far, it becomes content-free chrome
+("the engagement found several issues; please refer to the full
+report"). Compressed too little, it becomes a redundant technical
+document.
+The middle ground:
+1. A single number that establishes the engagement's bottom line
+   (risk score).
+2. Counts that contextualize the number (severity breakdown).
+3. Specific named priorities the reader can act on.
+4. Just enough scope/authz framing that the reader can validate
+   the summary against their own expectations.
+5. A clear pointer to the deep document.
+## Why a single risk score
+The single-number summary is controversial in security circles —
+it loses information, it can be gamed, it doesn't capture
+exploitation likelihood. All true.
+It's still the right primitive for the exec audience because:
+- Execs are accustomed to making decisions on single numbers
+  (NPS, MAU, ARR, Lighthouse score). A pentest result without a
+  number doesn't fit the decision frame.
+- The number's interpretation band ("Low / Moderate / Elevated /
+  High / Critical") is more decision-relevant than the precise
+  number anyway.
+- Score-to-decision mapping is consistent across engagements;
+  the customer can compare year-over-year and customer-to-customer.
+Critical caveats the summary writer should NOT lose:
+- The score is HEURISTIC. A 65 isn't twice as bad as a 32.
+- The score's composition formula is documented; the reader can
+  audit how it was computed.
+- The score isn't a substitute for reading the finding list.
+## Risk score composition — design rationale
+The chosen weights:
+- 20 per CRITICAL — each critical finding has potential to be a
+  full-organization-impact event. Two criticals alone contribute
+  40 points, well into the moderate band; a third pushes the raw
+  sum into the elevated band.
+- 10 per HIGH — high findings are material but typically scoped
+  to specific systems.
+- 3 per MEDIUM — mediums accumulate to meaning; one isn't a
+  big deal, but ten is.
+- 1 per LOW — low findings are noise individually; in bulk they
+  signal hygiene problems.
+- 0 per INFO — informational findings don't move the score.
+The OWASP-coverage breadth term adds points for engagements that
+land findings in many categories. Reasoning: an engagement that
+finds problems in 8 OWASP categories has surfaced a broader risk
+surface than one that finds problems in 2. The breadth term
+captures that.
+The governance bonus (-10 when ROE is clean) is a deliberate
+adjustment. A clean engagement with the same findings represents
+LESS organizational risk than a chaotic engagement with the same
+findings, because the customer has the operational rigor to
+remediate effectively.
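The composition described above can be sketched as a small function. The severity weights and the -10 governance bonus come from the text; the breadth weight (`BREADTH_POINTS_PER_CATEGORY`) and the clamp to 0-100 are assumptions, because the source states neither. Treat this as an illustration of the additive structure rather than the skill's actual code.

```python
SEVERITY_WEIGHTS = {"CRITICAL": 20, "HIGH": 10, "MEDIUM": 3, "LOW": 1, "INFO": 0}
GOVERNANCE_BONUS = -10  # applied when the ROE is clean, per the text
BREADTH_POINTS_PER_CATEGORY = 2  # ASSUMPTION: breadth weight is not given in the source


def risk_score(severity_counts: dict, owasp_categories_hit: int, roe_clean: bool) -> int:
    """Additive score: weighted severity counts + breadth term + governance bonus."""
    base = sum(SEVERITY_WEIGHTS[sev] * n for sev, n in severity_counts.items())
    breadth = BREADTH_POINTS_PER_CATEGORY * owasp_categories_hit
    bonus = GOVERNANCE_BONUS if roe_clean else 0
    # ASSUMPTION: the total is clamped so it stays on the 0-100 scale.
    return max(0, min(100, base + breadth + bonus))
```

For example, two criticals, one high, and four mediums across three OWASP categories with a clean ROE give 40 + 10 + 12 + 6 - 10 = 58 under these assumed parameters.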
+## Score band interpretation
+The 0-100 score maps to qualitative bands:
+| Range | Band | Decision implication |
+|---|---|---|
+| 0-25 | Low | Continue current cadence; no special action |
+| 26-50 | Moderate | Standard remediation planning |
+| 51-75 | Elevated | Surface to security leadership; quarter-by-quarter tracking |
+| 76-90 | High | Executive attention; structured remediation plan |
+| 91-100 | Critical | Treat as incident; urgent remediation |
+The bands were chosen to match how exec stakeholders naturally
+discretize risk. Five bands beats three (too coarse) or ten (too
+fine).
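The band table translates directly into a lookup. The following is a minimal sketch using the ranges from the table above; the function name `score_band` is illustrative.

```python
# (inclusive upper bound, band name), taken from the table above
BANDS = [(25, "Low"), (50, "Moderate"), (75, "Elevated"), (90, "High"), (100, "Critical")]


def score_band(score: int) -> str:
    """Map a 0-100 risk score to its qualitative band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for upper, name in BANDS:
        if score <= upper:
            return name
    raise AssertionError("unreachable: BANDS covers 0-100")
```

Keeping the bounds in one ordered list means the band wording can be softened for a given audience without touching the numeric thresholds.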
+## Deterministic priority selection — why not human-curated
+The top-3 priorities are picked algorithmically by severity +
+reachability + alphabetical tie-break. Reproducibility wins over
+nuance:
+- Same findings → same top-3, always.
+- The skill can be re-run after remediation to confirm priorities
+  changed.
+- Auditors and customers can verify the prioritization wasn't
+  selectively edited to match a narrative.
+The `--priority-overrides` flag exists for legitimate operator
+intervention (e.g. "this hardcoded AWS key is the single biggest
+risk regardless of count") but the default is algorithmic.
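A deterministic selection of this kind can be expressed as a single sort with a composite key. The sketch below assumes each finding is a dict with `title`, `severity`, and a numeric `reach` field, and that higher reach ranks first; those field names and the reach ordering are assumptions, not details confirmed by the source. Override handling is left out because the flag's semantics are not described here.

```python
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, "INFO": 4}


def top_priorities(findings: list, n: int = 3) -> list:
    """Pick the top-n findings by severity, then reach, then title (alphabetical)."""
    ordered = sorted(
        findings,
        key=lambda f: (
            SEVERITY_RANK[f["severity"]],  # most severe first
            -f["reach"],                   # ASSUMPTION: wider reach ranks higher
            f["title"],                    # alphabetical tie-break for stability
        ),
    )
    return ordered[:n]
```

Because every component of the key is derived from the finding itself, the result does not depend on the order in which findings were loaded.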
+## CVSS vs DREAD vs STRIDE risk frameworks
+Several risk-scoring frameworks compete:
+| Framework | Used for | Strengths | Why not here |
+|---|---|---|---|
+| CVSS | Per-vulnerability scoring | Industry-standard, NVD-blessed | Per-vuln, not aggregate; no exec-summary primitive |
+| DREAD | Threat-modeling risk | Captures Damage/Reproducibility/Exploitability/Affected users/Discoverability | Not pentest-natural; weights are subjective |
+| STRIDE | Threat categorization | Maps to threat types | Categorical, not numeric |
+| EPSS | Real-world exploitation likelihood | Predictive, data-driven | Per-CVE, not per-engagement |
+| FAIR | Quantitative risk in dollars | Most precise | Expensive to compute; requires asset-value modeling |
+The skill's risk score is a custom aggregate optimized for the
+exec-summary use case. It's not a substitute for any of the
+above; it composes with them.
+## Effort + impact estimates
+Each priority gets a rough effort (Hours / Days / Weeks) and
+impact (Limited / Significant / Material) tag. These are
+HEURISTIC and based on the source skill + reach. The skill
+deliberately doesn't try to predict dollar impact or hour count —
+those are owned by the customer's engineering team, not the
+pentester.
+Effort heuristics:
+- Dependency upgrade — Hours to Days
+- Hardcoded secret rotation — Hours
+- Config / header fix — Days
+- Injection / deserialization fix — Weeks (requires code changes)
+- License compliance — Weeks (requires legal + dep swap)
+Impact heuristics:
+- CRITICAL severity → Material
+- HIGH severity + reach ≥ 3 → Material
+- HIGH severity + reach < 3 → Significant
+- MEDIUM severity + reach ≥ 5 → Significant
+- Otherwise → Limited
+The customer's engineering team will refine these; the skill
+provides starting estimates.
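The impact heuristics above reduce to a short decision function. This is a sketch of the listed rules only; the function name and the integer `reach` parameter are illustrative.

```python
def impact_tag(severity: str, reach: int) -> str:
    """Apply the impact heuristics in the order they are listed above."""
    if severity == "CRITICAL":
        return "Material"
    if severity == "HIGH":
        return "Material" if reach >= 3 else "Significant"
    if severity == "MEDIUM" and reach >= 5:
        return "Significant"
    return "Limited"
```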
+## Document length
+Target: 1-2 pages when rendered as PDF. The skill's output is
+typically 60-100 lines of markdown, which fits cleanly on 1-2
+US Letter pages.
+Longer = unread. Shorter = content-free. The sections (Risk
+score, Engagement scope, Top priorities, OWASP coverage, Next
+steps) are non-negotiable; everything else is compression
+optimization.
+## Stable rendering
+Same findings + same ROE + same coverage → same exec summary
+except for the generation date. This is important because the
+exec summary may go to the board, to insurers, to auditors —
+parties who will compare the document against any
+re-rendered version.
+If the summary changes between renderings without a finding
+change, something is wrong. Sources of non-stability to avoid:
+- Random tie-breaks (use alphabetical sort)
+- LLM-based phrasing
+- Time-stamped section content beyond the header date
+- Counts that depend on the iteration order of dicts
+The current implementation is stable. Future modifications must
+preserve this.
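One way to keep counts independent of iteration order is to render rows from a fixed severity list instead of iterating over the counting dict. The sketch below illustrates that idea; the table layout and the `severity` field name are assumptions, not the skill's actual output format.

```python
from collections import Counter

SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]


def render_severity_table(findings: list) -> str:
    """Render severity counts in a fixed row order, regardless of input order."""
    counts = Counter(f["severity"] for f in findings)
    rows = ["| Severity | Count |", "|---|---|"]
    # Iterate over the fixed list, not over `counts`, so row order never varies.
    for sev in SEVERITY_ORDER:
        rows.append(f"| {sev} | {counts.get(sev, 0)} |")
    return "\n".join(rows)
```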
+## Format choices
+- Markdown for portability and version-control friendliness.
+- Numeric tables for severity counts (machine-parseable).
+- Per-priority subsections rather than a single bullet list
+  (gives each priority enough room to be acted on).
+- Pointer to vulnerability-report.md anchors so the reader can
+  jump to the deep detail when needed.
+PDF rendering is downstream; the skill's output is the source
+markdown that a separate pdf-generator (e.g. `/whiteglove-pdf`)
+can render for handoff.