@aegis-scan/skills 0.1.1 → 0.2.0

Files changed (32)

package/skills/ops/README.md CHANGED

@@ -1,6 +1,41 @@
-# Operations Skills
+# Operations Skills — `ops/`
-Reserved for incident-response, post-build-audit, and verify-install-
-integrity skill modules. Target: `skills-v0.3+`.
+Operational runbooks for the AEGIS workflow itself — how to triage
+findings, when to suppress, when to escalate, how to respond to
+exploited vulnerabilities. These skills wrap the AEGIS CLI + reporters
+in process-discipline so teams use AEGIS consistently rather than
+ad-hoc.
-Scope and sources TBD. AEGIS-native authoring anticipated.
+## Sources
+| Source dir | License | Skills |
+|---|---|---|
+| `aegis-native/` | MIT (AEGIS-original) | 3 |
+## AEGIS-native skills
+| Skill | When to use |
+|---|---|
+| `triage-finding` | Receiving a new AEGIS finding (PR comment, JSON, SARIF). Triage decision tree: severity → confidence → verify → fix-vs-suppress-vs-defer. |
+| `suppress-correctly` | About to add a suppression; reviewing existing suppressions for staleness; auditing suppressions before a security review. Three legitimate cases + anti-patterns + audit-trail expectations. |
+| `escalation-runbook` | A BLOCKER reached `main`; a finding suggests active exploitation; a credential leak detected; a suppression has been gamed. Severity ladders, immediate-containment playbook, notification triggers, post-incident review structure. |
+License: MIT. See top-level [`ATTRIBUTION.md`](../../ATTRIBUTION.md) for
+attribution chain.
+## Roadmap
+Future expansions:
+- `evidence-gathering` — collect evidence for review when AEGIS surfaces a real finding.
+- `false-positive-handling` — FP triage workflow + scanner-rule refinement feedback loop.
+- `post-incident-review` — review-meeting structure after a security incident.
+- `fix-mode-discipline` — operational discipline around `aegis fix` (LLM-driven remediation safety).
+- `compliance-audit-prep` — preparing AEGIS evidence for SOC 2 / ISO 27001 / PCI-DSS audits.
+- `ci-gating-tuning` — when to fail the build vs warn vs annotate.
+## See also
+- AEGIS suppressions docs — `docs/suppressions.md`.
+- AEGIS confidence-rules — top-level `README.md` § "Scoring".
+- AEGIS incident-response convention — top-level `SECURITY-INCIDENT-RESPONSE.md`.

package/skills/ops/aegis-native/escalation-runbook/SKILL.md ADDED

@@ -0,0 +1,147 @@
+<!-- aegis-local: AEGIS-native skill, MIT-licensed; escalation runbook for severe AEGIS findings. -->
+---
+name: ops-escalation-runbook
+description: "Escalation runbook for severe AEGIS findings — what to do when a BLOCKER reaches main, when a finding suggests active exploitation, when a credential leak is detected, or when a suppression has been gamed. Covers immediate-containment steps, internal communication, legal/compliance notification triggers, post-incident review structure, and the AEGIS-specific forensic tooling. Use when responding to a high-severity AEGIS finding outside normal triage."
+---
+# Escalation Runbook — Severe AEGIS Findings
+## When to use this skill
+You are escalating one of:
+- A BLOCKER finding has reached `main` (failed CI gate, was force-merged, or AEGIS was bypassed).
+- A finding suggests active exploitation (credential leak in source, public-key/private-key pair leaked, hardcoded admin password in production code).
+- An AEGIS suppression has been gamed (review found a "FP" suppression covering a real vulnerability).
+- A new scanner update has produced findings on previously-shipping code (regression in coverage).
+## Severity ladders — when to escalate
+| Trigger | Escalate to | Within |
+|---|---|---|
+| BLOCKER on main | sec-team-lead + module owner | 1 hour |
+| Credential leaked in source (jwt-detector / entropy-scanner) | sec-team-lead, IT ops, vendor-mgmt (if vendor cred) | 30 min — rotate IMMEDIATELY |
+| RLS / tenant-isolation finding suggesting cross-tenant exposure | sec-team-lead, legal, DPO | 1 hour |
+| GDPR-engine finding involving production PII handling | sec-team-lead, DPO, legal | 4 hours, GDPR Art. 33 timer starts |
+| Multiple new BLOCKERs after a scanner-rule update (regression) | sec-team-lead | 2 hours |
+| Suppression-gaming detected during audit | sec-team-lead + module owner + author of suppression | 24 hours |
+If you're not sure whether to escalate — escalate. False alarms cost you 30 minutes; missed escalations cost you a breach.
+## Immediate-containment playbook (post-finding)
+Order matters. Don't skip steps.
+### Step 1 — Stop the bleeding
+For active-exposure findings:
+```
+Credential leaked → rotate the credential RIGHT NOW; don't wait to investigate
+Auth-bypass on prod → block the affected route at the WAF / load balancer
+RLS bypass → disable the affected query at the gateway, NOT in code (deploy lag is a window)
+```
+The runtime-stop happens BEFORE code-fix because deploys take time and the finding is publicly visible the moment AEGIS surfaces it.
+### Step 2 — Assess reach
+```bash
+# What did this credential / route / query touch in the last N days?
+aegis history . --blame --since "30 days ago"
+# Forensic logs — pull the affected window
+# (project-specific — ingestion logs, audit logs, app logs)
+```
+Bound the exposure: which users? which data classes? which time range? The answer drives notification scope.
+### Step 3 — Patch and verify
+1. Code-fix per the appropriate defensive skill (`defensive-rls-defense`, `defensive-tenant-isolation`, `defensive-ssrf`, etc.).
+2. Add a regression test that the fix prevents re-exploitation.
+3. Re-deploy.
+4. Verify the runtime-stop can be lifted (the WAF block / route disable from Step 1).
+### Step 4 — Notify
+| Trigger | Notification |
+|---|---|
+| PII exposure | GDPR Art. 33: 72 hours to supervisory authority. CCPA equivalent for CA residents. |
+| Cardholder data exposure | PCI-DSS req 12.10 — incident response plan kicks in. |
+| Vendor credentials leaked | Notify the vendor immediately — they need to rotate on their side. |
+| Material breach | SEC Form 8-K Item 1.05: disclose within four business days of determining that the cybersecurity incident is material. |
+| User credentials leaked | Notify affected users; recommend password change. |
+If unsure whether a notification trigger applies, consult legal counsel before notifying. Premature disclosure of an unconfirmed incident has its own risks.
+### Step 5 — Post-incident review (within 7 days)
+Structure:
+1. **Timeline** — when was the vulnerable code introduced (use `git blame`); when was AEGIS run on it; when did AEGIS first flag it; when was it triaged; when was it merged; when was it discovered.
+2. **Root cause** — code-level + process-level. "We bypassed the AEGIS gate because we were rushing the release" is a legitimate process root cause.
+3. **Impact** — who/what was exposed, for how long.
+4. **Action items** — code fix (already done), process fix (e.g., revoke `--no-verify` permission, mandate sec-team review for any AEGIS-bypass), tooling fix (e.g., scanner-rule that would have caught this earlier).
+The review document goes to a permanent location (security-incident log) and is kept auditor-readable for the next compliance review.
+## When AEGIS is the alerting source
+AEGIS is a SAST + light DAST tool. It surfaces findings in:
+- PR comments (CI integration)
+- SARIF (GitHub Code Scanning)
+- Terminal / JSON (`aegis scan` direct output)
+- MCP server (AI-coding-agent surface)
+When a finding ESCALATES to incident-level, AEGIS is the source of the lead, not the source of truth. The source of truth is the runtime evidence:
+- Audit logs from the affected service.
+- Cloud provider logs (CloudTrail, Cloud Audit Logs, Activity Logs).
+- DB query logs.
+- WAF / proxy logs.
+Cross-reference the AEGIS finding with these to determine whether the code was merely vulnerable or was actually exploited.
+## When AEGIS finds a regression in coverage (post-update)
+If a scanner-rule update triggers new findings on shipping code, the situation is:
+- The shipping code was previously vulnerable.
+- AEGIS is now telling you about it.
+- The exposure window is everything-since-the-vulnerable-code-shipped, NOT since-the-scanner-update.
+Treat as a real finding (not as scanner noise). The right response:
+1. Verify (per `ops-triage-aegis-finding`) — false positive, or real?
+2. If real, escalate per the severity ladder above.
+3. Code-fix.
+4. Post-incident review specifically asks: "why did the previous scanner version miss this?" (and: "what other patterns might it still miss?").
+## Anti-patterns
+### Anti-pattern 1 — "It's been there forever, can't be that bad"
+Tenure does not establish safety. SQL injection that's been in the codebase for 4 years is exactly as exploitable as one introduced yesterday — the attacker doesn't care about the tenure.
+### Anti-pattern 2 — "Wait for next sprint"
+For BLOCKERs and credential leaks, "next sprint" is unacceptable. The exposure is active until you fix it.
+### Anti-pattern 3 — Suppress under pressure
+A pressured "suppress to ship" decision is exactly when post-incident reviews most often find process root causes. Resist; document; if you must bypass AEGIS, do it explicitly with the sec-team-lead's signature, time-boxed, and post-incident-reviewed.
+### Anti-pattern 4 — Silent fix
+Fixing a BLOCKER with a one-line PR titled "small fix" hides the incident from future audit. Use a clear PR title (`fix(security): SQL-injection in /api/foo route`); cross-reference the post-incident review document.
+## See also
+- `ops-triage-aegis-finding` — the upstream triage decision.
+- `ops-suppress-correctly` — when suppression is the right call (it isn't, in escalation context).
+- `defensive-rls-defense` / `defensive-tenant-isolation` / `defensive-ssrf` — domain-specific patch playbooks.
+- AEGIS `SECURITY-INCIDENT-RESPONSE.md` — the project's own incident-response convention.
+- GDPR Art. 33 — https://gdpr-info.eu/art-33-gdpr/
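
Note on `escalation-runbook/SKILL.md`: the severity ladder (trigger, escalate-to, deadline) could be kept as data so that a script computes the escalation deadline instead of someone reading the table under pressure. The following is a minimal TypeScript sketch under that assumption. The trigger keys, `LADDER`, and `escalationDeadline` are hypothetical names chosen for illustration and are not part of AEGIS; the recipients and time limits are copied from the table in the runbook.

```typescript
// Hypothetical encoding of the runbook's severity ladder.
// None of these names come from AEGIS itself.
type Trigger =
  | "blocker-on-main"
  | "credential-leak"
  | "cross-tenant-exposure"
  | "gdpr-pii"
  | "scanner-regression"
  | "suppression-gaming";

interface LadderEntry {
  escalateTo: string[];
  withinMinutes: number;
}

const LADDER: Record<Trigger, LadderEntry> = {
  "blocker-on-main": { escalateTo: ["sec-team-lead", "module owner"], withinMinutes: 60 },
  "credential-leak": { escalateTo: ["sec-team-lead", "IT ops", "vendor-mgmt (if vendor cred)"], withinMinutes: 30 },
  "cross-tenant-exposure": { escalateTo: ["sec-team-lead", "legal", "DPO"], withinMinutes: 60 },
  "gdpr-pii": { escalateTo: ["sec-team-lead", "DPO", "legal"], withinMinutes: 240 },
  "scanner-regression": { escalateTo: ["sec-team-lead"], withinMinutes: 120 },
  "suppression-gaming": { escalateTo: ["sec-team-lead", "module owner", "author of suppression"], withinMinutes: 1440 },
};

// Returns who to notify and the latest time by which to do it,
// measured from the moment the trigger was detected.
function escalationDeadline(
  trigger: Trigger,
  detectedAt: Date
): { escalateTo: string[]; deadline: Date } {
  const entry = LADDER[trigger];
  return {
    escalateTo: entry.escalateTo,
    deadline: new Date(detectedAt.getTime() + entry.withinMinutes * 60 * 1000),
  };
}
```

Keeping the ladder in one typed record means a change to the table is a one-line diff, and the compiler rejects a trigger that has no entry.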

package/skills/ops/aegis-native/suppress-correctly/SKILL.md ADDED

@@ -0,0 +1,196 @@
+<!-- aegis-local: AEGIS-native skill, MIT-licensed; operational runbook for suppressing AEGIS findings correctly. -->
+---
+name: ops-suppress-correctly
+description: "Operational runbook for suppressing AEGIS findings correctly. Covers when suppression is appropriate (verified FP, compensating control, accepted risk), the per-finding suppression syntax, the config-level suppression syntax, common anti-patterns, suppression-decay (review cadence), and the audit trail expectations. Use when adding any suppression, reviewing an existing suppression, or auditing suppressions for staleness."
+---
+# Suppress Correctly — AEGIS Suppression Runbook
+## When to use this skill
+- About to add a suppression — understand the policy + format first.
+- Reviewing existing suppressions for staleness.
+- Auditing suppressions before a security review or compliance audit.
+- A scanner-rule update has produced new findings on previously-suppressed code.
+## The three legitimate suppression cases
+A suppression is appropriate ONLY in these cases. Anything else is gaming the score.
+### Case 1 — Verified false positive
+The scanner over-matched. Your verification (per `ops-triage-aegis-finding`) showed the code is safe. The right next steps:
+1. Add the suppression with a one-line rationale.
+2. **File a scanner-rule refinement issue** — link it from the rationale. AEGIS improves precision on real-world FPs; the issue is the feedback loop.
+3. Optionally add a regression-test fixture that captures this specific FP shape.
+```typescript
+// aegis-disable: ssrf-checker — URL is hard-coded to api.partner.com after the line-38 allowlist check. Verified safe 2026-04-22, refinement issue #1234.
+await fetch(allowlistedUrl);
+```
+### Case 2 — Compensating control elsewhere
+The code is vulnerable in isolation but safe in context. The compensating control lives outside the scanner's view (often runtime: a WAF rule, a proxy header strip, a network-level deny).
+```typescript
+// aegis-disable: auth-enforcer — endpoint runs behind internal-only ingress (network-level deny on public traffic). Compensating control: ingress.yaml line 47.
+export async function GET() {
+  /* ... */
+}
+```
+The rationale MUST identify the compensating control's location precisely. "We have a firewall" is not enough; "ingress.yaml line 47" is.
+### Case 3 — Accepted risk (risk-register documented)
+The team has formally accepted this risk class. There's a risk register entry (in your tracking system, ideally). The suppression references the entry.
+```typescript
+// aegis-disable: rate-limit-checker — accepted risk per risk-register entry RR-2026-Q2-04 (internal admin tool, low traffic, MFA-required). Owner: sec-team. Review: 2026-Q3.
+```
+The risk-register entry must include: risk description, accepting authority (CISO / sec-team-lead), review date, compensating controls.
+## Anti-patterns
+### Anti-pattern 1 — Naked suppression
+```typescript
+// aegis-disable: ssrf-checker
+await fetch(userInput);
+```
+Why it's wrong: no rationale. The next reviewer can't tell whether this is a real FP, a compensating control, or just noise-silencing. Fail.
+### Anti-pattern 2 — Vague rationale
+```typescript
+// aegis-disable: ssrf-checker — safe
+await fetch(userInput);
+```
+Why it's wrong: "safe" is an assertion without justification. The reviewer needs to know WHY it's safe.
+### Anti-pattern 3 — Wholesale-disable a scanner
+In `aegis.config.json`:
+```json
+{
+  "scanners": {
+    "ssrf-checker": false
+  }
+}
+```
+Why it's wrong: turns off the scanner globally. One real bug in 10K lines and you've exposed the entire codebase. Always suppress per-finding, not per-scanner.
+### Anti-pattern 4 — Suppress to ship
+If a BLOCKER finding is the only thing blocking a release and the team's response is "suppress it", you're not triaging — you're hiding a real exposure under a permanent rug. The right move: fix the BLOCKER. If you genuinely cannot, file an emergency risk-register entry with the CISO's signature and a 7-day-fix deadline.
+### Anti-pattern 5 — Long-lived TODO suppressions
+```typescript
+// aegis-disable: xss-checker — TODO fix after refactor
+```
+A "TODO" in the rationale already concedes the code is wrong, and the "TODO" rots over years. Don't do this: either fix it now, or risk-register-accept it formally.
+## Suppression syntax
+### Inline (single finding at a specific line)
+```typescript
+// aegis-disable: <scanner-name> — <rationale + date + owner>
+<the line being suppressed>
+```
+The directive applies to the next non-comment line. Place it directly above the offending code. If the scanner emits at a multi-line statement, the directive on the first line covers the whole statement.
+### Config-level (persistent suppressions across scans)
+```json
+{
+  "suppressions": [
+    {
+      "scanner": "ssrf-checker",
+      "file": "lib/internal-fetch.ts",
+      "line": 42,
+      "rationale": "<rationale + date + owner>",
+      "owner": "sec-team",
+      "added": "2026-04-22",
+      "review_by": "2026-10-22"
+    }
+  ]
+}
+```
+Use config-level when:
+- The finding emits at multiple lines (one file-level suppression is cleaner than 5 inline directives).
+- The file is auto-generated (you can't add inline comments without breaking the generator).
+- You need centralized auditability (the config is one diff to review, not 20 inline diffs).
+### Pattern-level (suppress a class of findings across many files)
+Avoid this except for canonical platform-wide compensating controls. If you must:
+```json
+{
+  "suppressions": [
+    {
+      "scanner": "header-checker",
+      "filePattern": "**/*.test.ts",
+      "rationale": "Test files do not serve HTTP — header-checker N/A. Reviewed 2026-04-22 by sec-team."
+    }
+  ]
+}
+```
+The `filePattern` field uses glob syntax. Be conservative; pattern-suppressions hide whole classes of findings.
+## Review cadence — suppressions decay
+Every suppression has an implicit "decay date" — code changes, the surrounding context shifts, the compensating control gets removed. AEGIS supports a `review_by` field; suppressions past their `review_by` get a `[STALE]` flag in the report.
+Recommended cadence:
+- **Inline suppressions** — review at the next major refactor of the affected file.
+- **Config-level suppressions** — review every 6 months minimum.
+- **Pattern-level suppressions** — review every 3 months.
+`aegis history . --blame --suppressions` enumerates all suppressions with age + last-reviewed date.
+## Audit trail expectations
+Every suppression should answer 5 questions for the next reviewer:
+1. **Why is this safe?** (the rationale)
+2. **What compensating control covers it?** (the location of the compensating control)
+3. **Who decided?** (the owner team or individual)
+4. **When was it added?** (the date)
+5. **When does it need re-review?** (the review-by date)
+Suppressions that don't answer these 5 questions are technical debt.
+## Compliance audit considerations
+Auditors (SOC 2, ISO 27001, PCI-DSS) will ask:
+- "Show me your suppressions."
+- "Why is this one suppressed?"
+- "Who approved it?"
+- "When was it last reviewed?"
+- "What's your stale-suppression review process?"
+If your answer to any of these is "we don't track that" — you're failing audit. The structured-rationale pattern above is the audit-ready shape.
+## See also
+- `ops-triage-aegis-finding` — the upstream decision: fix or suppress.
+- `ops-escalation-runbook` — what to do when suppressions are gamed.
+- AEGIS suppressions docs — `docs/suppressions.md`.
+- AEGIS scanner-rule refinement issue tracker — file FPs as actionable feedback.
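
Note on `suppress-correctly/SKILL.md`: the anti-patterns and the five audit-trail questions lend themselves to a mechanical pre-review check on config-level entries. Below is a rough TypeScript sketch, assuming the entry fields shown in the config-level example (`scanner`, `file`, `line`, `rationale`, `owner`, `added`, `review_by`). The `SuppressionEntry` type, the `auditGaps` function, and the four-word threshold for a "vague" rationale are illustrative assumptions, not AEGIS behavior.

```typescript
// Hypothetical shape mirroring the config-level example in the runbook.
interface SuppressionEntry {
  scanner: string;
  file?: string;
  line?: number;
  filePattern?: string;
  rationale?: string;
  owner?: string;
  added?: string; // ISO date, e.g. "2026-04-22"
  review_by?: string; // ISO date
}

// Lists what a reviewer would still have to ask about this suppression.
// An empty result means the audit-trail fields are all present.
function auditGaps(entry: SuppressionEntry): string[] {
  const gaps: string[] = [];
  const rationale = (entry.rationale ?? "").trim();

  if (rationale.length === 0) {
    gaps.push("naked suppression: no rationale");
  } else if (rationale.split(/\s+/).length < 4) {
    // Heuristic only: very short rationales ("safe") rarely explain WHY.
    gaps.push("vague rationale");
  }
  if (/\bTODO\b/i.test(rationale)) {
    gaps.push("TODO rationale");
  }
  if (!entry.owner) gaps.push("no owner");
  if (!entry.added) gaps.push("no added date");
  if (!entry.review_by) gaps.push("no review_by date");
  return gaps;
}
```

A check like this cannot judge whether a rationale is true; it only catches entries that cannot answer the five questions at all, which leaves human review time for the claims that need verifying.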

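Note on the review-cadence section of `suppress-correctly/SKILL.md`: the decay rule can be illustrated as a date comparison. The sketch below treats a suppression as stale once its `review_by` date has passed. Where `review_by` is absent, it falls back to the recommended cadence (6 months for config-level entries, 3 months for pattern-level ones); that fallback is an assumption of this sketch, as are the names `DatedSuppression`, `reviewDue`, and `isStale`. How AEGIS itself computes the `[STALE]` flag is not shown in this diff.

```typescript
// Hypothetical subset of a suppression entry, enough for a staleness check.
interface DatedSuppression {
  added: string; // ISO date, e.g. "2026-04-22"
  review_by?: string; // ISO date
  filePattern?: string; // present only on pattern-level suppressions
}

// Adds calendar months in UTC. Day overflow (e.g. the 31st) rolls into the
// following month, which is acceptable for a review reminder.
function addMonthsUtc(isoDate: string, months: number): Date {
  const d = new Date(isoDate + "T00:00:00Z");
  d.setUTCMonth(d.getUTCMonth() + months);
  return d;
}

// An explicit review_by wins; otherwise fall back to the runbook's cadence.
function reviewDue(s: DatedSuppression): Date {
  if (s.review_by) {
    return new Date(s.review_by + "T00:00:00Z");
  }
  return addMonthsUtc(s.added, s.filePattern ? 3 : 6);
}

// Stale means the review date is strictly in the past.
function isStale(s: DatedSuppression, today: Date): boolean {
  return today.getTime() > reviewDue(s).getTime();
}
```
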
package/skills/ops/aegis-native/triage-finding/SKILL.md ADDED

@@ -0,0 +1,144 @@
+<!-- aegis-local: AEGIS-native skill, MIT-licensed; operational runbook for triaging an AEGIS finding. -->
+---
+name: ops-triage-aegis-finding
+description: "Operational runbook for triaging an AEGIS finding. Covers severity-priority, evidence verification, false-positive determination, fix-vs-suppress decision-tree, ownership routing, and SLA expectations. Use when receiving a new AEGIS finding (PR comment, JSON output, SARIF in code-scanning), reviewing a backlog of findings, or onboarding a team to AEGIS workflow."
+---
+# Triage an AEGIS Finding — Operational Runbook
+## When to use this skill
+- A PR has new AEGIS findings and you need to decide what to fix vs accept vs defer.
+- You're reviewing the AEGIS backlog (existing findings on main).
+- You're onboarding a new team member to the AEGIS triage workflow.
+- An AEGIS finding has been disputed and you need a structured re-triage.
+## The triage decision tree
+```
+For each finding:
+  1. Severity check
+     ├── BLOCKER  → Stop the line; fix before merge.
+     ├── HIGH     → Fix in this sprint; backlog with deadline.
+     ├── MEDIUM   → Triage: fix / suppress-with-reason / defer.
+     └── LOW      → Triage: fix-when-touching-the-code / suppress-with-reason.
+  2. Confidence check
+     ├── high     → Trust the finding.
+     ├── medium   → Verify before deciding. ~30% FP-rate at this level historically.
+     └── low      → Always verify. >50% FP-rate. Many low-confidence findings exist because external tools were missing.
+  3. Verification
+     - Open the file at the finding's line range
+     - Read the code; reproduce the vulnerable shape mentally
+     - If you can't reproduce → likely FP, suppress with rationale
+     - If you can reproduce → it's a real finding
+  4. Decision
+     ├── Fix       → Default for BLOCKER + verified-real findings
+     ├── Suppress  → Verified FP or compensating control elsewhere
+     ├── Defer     → Real but non-blocker, scheduled in backlog with owner + deadline
+     └── Dispute   → Open issue requesting scanner-rule refinement
+```
+## Severity is not optional
+AEGIS uses 4 severity levels: BLOCKER / HIGH / MEDIUM / LOW. Where a finding is labeled CRITICAL, treat it as **semantically equivalent to BLOCKER**: both force the score to 0 / grade F.
+Examples:
+- BLOCKER — eval injection, hardcoded production secret, unauthed admin route, SQL injection on unscoped query.
+- HIGH — missing CSRF on mutation, missing rate-limit on auth endpoint, weak crypto.
+- MEDIUM — header missing, unstructured logging, missing pagination.
+- LOW — debug artifact, minor doc gap.
+If you're triaging a BLOCKER and considering "defer", **stop**. BLOCKERs are by definition not deferrable. Either it's actually a BLOCKER (fix it now) or it's been mis-classified (file a scanner-rule refinement issue).
+## Confidence is the FP-rate signal
+AEGIS findings carry `confidence: 'high' | 'medium' | 'low'`. The signal:
+- **high (default)** — scanner has high confidence based on per-CWE rules. Trust it, fix it.
+- **medium** — scanner has reasonable confidence, but the rule's per-CWE FP-rate hasn't been measured at scale. Common for cross-file taint (today). Verify before fixing.
+- **low** — scanner ran without one or more external tools (e.g., Semgrep not installed). The finding may be incomplete; the report shows a `[LOW-CONFIDENCE]` PR badge.
+Always inspect the finding's source code before suppressing a `medium` or `low` confidence finding.
+## How to verify a finding (15-minute drill)
+For each finding:
+1. **Read the rule.** AEGIS's README scanner inventory tells you what each scanner detects. If the rule description matches the code, it's likely real.
+2. **Read the code.** Open the file, read 10 lines above + below the flagged line. Mental model: "could an attacker make this code do something the author didn't intend?".
+3. **Trace the data flow.** Where does the vulnerable input come from? Where does it end up (the sink)? Are there sanitizers in between?
+4. **Check the per-CWE sanitizer awareness.** AEGIS recognizes common sanitizers and does not flag flows they cover (`parseInt` blocking SQLi, `DOMPurify` blocking XSS, etc.). If a sanitizer exists and AEGIS still flags, either the sanitizer doesn't cover this CWE, or there's a real gap.
+5. **Reproduce mentally.** Walk through an attacker payload in your head. If you can construct one, fix the code. If you can't, it's likely a FP.
+## Fix-vs-suppress decision
+**Fix** is the default. Suppress is for these specific cases:
+1. **Verified FP** — your verification showed the code is safe; AEGIS over-matched. Suppress with rationale; file a scanner-rule refinement issue.
+2. **Compensating control elsewhere** — the code is vulnerable in isolation but safe in context (e.g., a route is auth-bypass-prone in source, but lives behind a proxy that strips auth-bypass headers). Suppress with rationale; document the compensating control's location.
+3. **Architectural decision documented as risk-accepted** — the team has decided this risk class is accepted (e.g., "internal admin tool, accept lower auth bar"). Document in a risk register; suppress with reference.
+Never suppress to silence noise without one of the above.
+## Ownership routing
+| Finding category | Default owner |
+|---|---|
+| Security (BLOCKER + HIGH) | security team |
+| Compliance (gdpr-engine, soc2-checker, etc.) | security + compliance |
+| Quality (logging-checker, console-checker, http-timeout-checker) | the owning team / module owner |
+| i18n | frontend team |
+| Dependencies (supply-chain, dep-confusion-checker) | platform / infra team |
+Wire ownership in `aegis.config.json`'s notification channels per category, or in your team's CODEOWNERS.
+## SLA defaults
+The AEGIS-recommended SLA defaults (these are guidelines, not enforced):
+| Severity | Time-to-fix |
+|---|---|
+| BLOCKER | Stop the line — fix before merge OR within 24h on main |
+| HIGH | Sprint-level (1-2 weeks) |
+| MEDIUM | Backlog — reviewed quarterly |
+| LOW | Best-effort; fix when touching the code |
+If a BLOCKER must merge for emergency reasons (e.g., the BLOCKER itself is blocking a more urgent fix), document the temporary acceptance + owner + deadline; bypass is possible via `--no-verify` on the pre-push hook (pair with explicit team Slack post).
+## How to record triage decisions
+For findings that are accepted (suppressed or deferred), record the decision in a way the next person can audit:
+```typescript
+// Inline suppression
+// aegis-disable: <scanner-name> — <one-line rationale + date + owner>
+const code = doSomething();
+```
+Or in `aegis.config.json`:
+```json
+{
+  "suppressions": [
+    {
+      "scanner": "ssrf-checker",
+      "file": "lib/internal-fetch.ts",
+      "line": 42,
+      "rationale": "Internal fetch wrapper; URL validated against tenant-allowlist by line 38. Reviewed by sec-team 2026-04-22.",
+      "owner": "sec-team",
+      "added": "2026-04-22"
+    }
+  ]
+}
+```
+The `aegis history . --blame` command exposes which suppressions are stale (older than N days, owner left team, etc.).
+## See also
+- `ops-suppress-correctly` skill — when and how to suppress, with the suppression-template.
+- `ops-escalation-runbook` skill — what to do when a BLOCKER reaches main without proper triage.
+- AEGIS suppressions docs — `docs/suppressions.md`.
+- AEGIS confidence-rules — `README.md` § "Scoring" + per-scanner README sections.
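
Note on `triage-finding/SKILL.md`: the four-step decision tree can be written down as a small pure function, which makes the ordering of the checks explicit. This is a simplified TypeScript sketch, not AEGIS code. `TriageInput`, `triage`, and the `verify-first` outcome are hypothetical names, the "Dispute" branch is left out, and MEDIUM / LOW findings that are real and have no compensating control are mapped to `defer` as a stand-in for "scheduled with owner + deadline" and "fix when touching the code".

```typescript
type Severity = "BLOCKER" | "HIGH" | "MEDIUM" | "LOW";
type Confidence = "high" | "medium" | "low";
type Decision = "fix" | "suppress" | "defer" | "verify-first";

// Hypothetical input; field names are illustrative, not the AEGIS schema.
interface TriageInput {
  severity: Severity;
  confidence: Confidence;
  // Outcome of manual verification; undefined until someone has done it.
  verified?: "real" | "false-positive";
  // True when a documented compensating control covers the finding.
  compensatingControl?: boolean;
}

function triage(f: TriageInput): Decision {
  // Medium- and low-confidence findings are verified before any decision.
  if (f.confidence !== "high" && f.verified === undefined) {
    return "verify-first";
  }
  // A verified false positive is suppressed (with rationale) at any severity.
  if (f.verified === "false-positive") {
    return "suppress";
  }
  // From here the finding is treated as real. BLOCKERs are never deferred.
  if (f.severity === "BLOCKER") {
    return "fix";
  }
  if (f.compensatingControl) {
    return "suppress";
  }
  if (f.severity === "HIGH") {
    return "fix";
  }
  return "defer";
}
```

Checking severity before the compensating-control branch is a deliberate choice in this sketch: it keeps a real BLOCKER on the fix path even when someone argues a control exists elsewhere.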