ai-project-maintainer 0.4.0 → 0.4.1

Files changed (51)
  1. package/README.md +16 -6
  2. package/ai-project-maintainer/agents/openai.yaml +6 -6
  3. package/ai-project-maintainer/references/ci-guardrails.md +55 -55
  4. package/ai-project-maintainer/references/database.md +60 -60
  5. package/ai-project-maintainer/references/electron-desktop.md +43 -43
  6. package/ai-project-maintainer/references/incident-response.md +52 -52
  7. package/ai-project-maintainer/references/security.md +48 -48
  8. package/ai-project-maintainer/references/tool-router.md +53 -53
  9. package/ai-project-maintainer/scripts/bootstrap-local-tools.ps1 +109 -109
  10. package/ai-project-maintainer/scripts/ci-smoke-gate.mjs +26 -26
  11. package/ai-project-maintainer/scripts/init-project.mjs +30 -18
  12. package/ai-project-maintainer/scripts/lib/check-registry.mjs +10 -9
  13. package/ai-project-maintainer/scripts/lib/checks.mjs +22 -10
  14. package/ai-project-maintainer/scripts/lib/command-runner.mjs +17 -3
  15. package/ai-project-maintainer/scripts/lib/policy.mjs +6 -4
  16. package/ai-project-maintainer/scripts/lib/report.mjs +56 -32
  17. package/assets/demo-90s-storyboard.svg +98 -0
  18. package/assets/demo-90s.gif +0 -0
  19. package/assets/social-preview.png +0 -0
  20. package/assets/social-preview.svg +55 -0
  21. package/docs/DEMO.md +68 -61
  22. package/docs/DEMO.zh-CN.md +75 -69
  23. package/docs/GITHUB-LAUNCH-CHECKLIST.md +11 -11
  24. package/docs/POLICY-AND-EXCEPTIONS.zh-CN.md +1 -1
  25. package/docs/PROMOTION.md +49 -21
  26. package/docs/SECURITY-WORKFLOW.md +61 -59
  27. package/docs/UPGRADE-ROADMAP.zh-CN.md +58 -58
  28. package/docs/demo-output/90-second-demo.html +187 -0
  29. package/docs/demo-output/before-after-case.md +91 -0
  30. package/docs/demo-output/security-report.md +62 -61
  31. package/docs/superpowers/plans/2026-06-29-ci-dogfooding.md +200 -200
  32. package/examples/demo-ai-app/.ai-maintainer/business-flows.yml +14 -14
  33. package/examples/demo-ai-app/.ai-maintainer/db-migration-policy.yml +6 -6
  34. package/examples/demo-ai-app/.ai-maintainer/evidence-sources.yml +18 -18
  35. package/examples/demo-ai-app/.ai-maintainer/exceptions.yml +1 -1
  36. package/examples/demo-ai-app/.ai-maintainer/incident-runbook.md +11 -11
  37. package/examples/demo-ai-app/.ai-maintainer/observability-checklist.yml +7 -7
  38. package/examples/demo-ai-app/.ai-maintainer/policy.yml +27 -27
  39. package/examples/demo-ai-app/.ai-maintainer/project-profile.yml +15 -15
  40. package/examples/demo-ai-app/.ai-maintainer/release-checklist.yml +7 -7
  41. package/examples/demo-ai-app/.ai-maintainer/risk-policy.yml +5 -5
  42. package/examples/demo-ai-app/.ai-maintainer/threat-model.md +18 -18
  43. package/examples/demo-ai-app/README.md +38 -38
  44. package/examples/demo-ai-app/package-lock.json +15 -15
  45. package/examples/demo-ai-app/package.json +16 -16
  46. package/examples/demo-ai-app/scripts/build.mjs +18 -18
  47. package/examples/demo-ai-app/scripts/create-before-state.mjs +86 -86
  48. package/examples/demo-ai-app/scripts/run-demo-gate.mjs +95 -95
  49. package/examples/demo-ai-app/src/order-risk.js +28 -28
  50. package/examples/demo-ai-app/test/order-risk.test.mjs +24 -24
  51. package/package.json +2 -1
package/README.md CHANGED
@@ -11,7 +11,7 @@
 
  AI can generate code fast. This tool helps you keep the project maintainable after that: collect project evidence, plan the audit, run deterministic gates, let Codex fix blockers, and rerun until the release is defensible.
 
- [See the demo](docs/DEMO.md) · [中文演示](docs/DEMO.zh-CN.md) · [Production audit docs](docs/PRODUCTION-AUDIT.zh-CN.md)
+ [See the demo](docs/DEMO.md) · [Before/after case](docs/demo-output/before-after-case.md) · [中文演示](docs/DEMO.zh-CN.md) · [Production audit docs](docs/PRODUCTION-AUDIT.zh-CN.md)
 
  It is not another scanner wrapper. It turns AI coding maintenance into a repeatable loop:
 
@@ -56,6 +56,8 @@ GitHub Actions templates can either use the npm package or clone this repository
 
  This repository includes a runnable sample project at `examples/demo-ai-app`.
 
+ ![90-second demo storyboard](assets/demo-90s.gif)
+
  ```powershell
  npm test --prefix .\examples\demo-ai-app
  npm run build --prefix .\examples\demo-ai-app
@@ -75,6 +77,12 @@ node .\examples\demo-ai-app\scripts\create-before-state.mjs
  ```
 
  It writes a broken copy under the OS temp directory, where the business tests fail.
+
+ More demo material:
+
+ - [Before/after case](docs/demo-output/before-after-case.md)
+ - [90-second browser demo](docs/demo-output/90-second-demo.html)
+ - [Animated SVG storyboard](assets/demo-90s-storyboard.svg)
 
  ## What It Checks
 
@@ -136,11 +144,12 @@ reports/security-report.sarif
  reports/sbom.cdx.json
  ```
 
- Reports include:
-
- - PASS/FAIL summary
- - blockers and warnings
- - production evidence gaps
+ Reports include:
+
+ - PASS/FAIL summary
+ - `overallStatus`: `FAIL`, `PASS_WITH_GAPS`, `PASS_WITH_WARNINGS`, or `PASS`
+ - blockers and warnings
+ - production evidence gaps
  - user decisions still needed
  - tool versions and commands
  - exception usage
@@ -186,6 +195,7 @@ It is designed for the practical middle ground: a personal developer or small te
 
  - [Demo](docs/DEMO.md)
  - [中文演示](docs/DEMO.zh-CN.md)
+ - [Before/after case](docs/demo-output/before-after-case.md)
  - [Security workflow](docs/SECURITY-WORKFLOW.md)
  - [Production audit workflow](docs/PRODUCTION-AUDIT.zh-CN.md)
  - [Intake schema](docs/INTAKE-SCHEMA.zh-CN.md)
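Reviewer note on the `overallStatus` field that the README change introduces: the four values suggest a precedence from worst to best. The sketch below shows one way such a status could be derived. It is an assumption, not the package's implementation (that logic is in `scripts/lib/report.mjs`, which this part of the diff does not show); the function name, the input shape, and the precedence order are all hypothetical.

```javascript
// Hypothetical sketch: derive a report status from three lists.
// Assumed precedence: blockers > evidence gaps > warnings > clean pass.
function deriveOverallStatus({ blockers = [], evidenceGaps = [], warnings = [] } = {}) {
  if (blockers.length > 0) return "FAIL";
  if (evidenceGaps.length > 0) return "PASS_WITH_GAPS";
  if (warnings.length > 0) return "PASS_WITH_WARNINGS";
  return "PASS";
}

// Example usage with made-up report data.
const status = deriveOverallStatus({
  evidenceGaps: ["no production log source configured"],
  warnings: ["dev-only vulnerability"],
});
console.log(status);
```

Under this assumed ordering, a report with both gaps and warnings is labelled with the more severe of the two.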
package/ai-project-maintainer/agents/openai.yaml CHANGED
@@ -1,6 +1,6 @@
- interface:
-   display_name: "AI Project Maintainer"
-   short_description: "Reusable local and CI safety gates for AI-coded projects."
-   default_prompt: "Use $ai-project-maintainer to run a strict local safety gate for this project and explain any blockers."
- policy:
-   allow_implicit_invocation: true
+ interface:
+   display_name: "AI Project Maintainer"
+   short_description: "Reusable local and CI safety gates for AI-coded projects."
+   default_prompt: "Use $ai-project-maintainer to run a strict local safety gate for this project and explain any blockers."
+ policy:
+   allow_implicit_invocation: true
package/ai-project-maintainer/references/ci-guardrails.md CHANGED
@@ -1,55 +1,55 @@
- # CI Guardrails
-
- Use this reference when the user wants durable automation instead of repeated manual review.
-
- ## Principle
-
- Guardrails should be small, fast, and relevant to the repo. Add blocking checks for high-confidence failures and non-blocking reports for noisy scanners until tuned.
-
- ## Minimum Useful Stack
-
- For most repos:
-
- - Secret scan: Gitleaks or Trivy secret scan.
- - Dependency and image scan: Trivy or native package audit.
- - Static app security: Semgrep auto rules and repo tests.
- - IaC/cloud scan when infra files exist: Checkov or Trivy config.
- - Database migration lint when migrations exist: Squawk/Atlas/strong_migrations/gh-ost/pt-online-schema-change depending on database.
-
- ## PR Gates
-
- Block PRs for:
-
- - Committed secrets.
- - Critical reachable dependency vulnerabilities in production code.
- - Unsafe database migrations likely to lock or rewrite hot tables.
- - Public exposure or privilege escalation in IaC/Kubernetes.
- - Failing tests or type checks in changed service boundaries.
-
- Warn but do not block initially for:
-
- - Low-confidence SAST findings.
- - Dev-only vulnerabilities.
- - Legacy baseline issues unrelated to the PR.
-
- ## Deployment Gates
-
- Before production deploy:
-
- - Confirm migrations have a rollback or forward-fix plan.
- - Confirm schema/app compatibility for rolling deploys.
- - Confirm images are scanned and signed if the org uses signing.
- - Confirm observability exists for the changed path: logs, metrics, traces, and alerts.
- - Confirm feature flags or rollback commands for high-risk releases.
-
- ## Incident Feedback Loop
-
- After every incident, add one guardrail that would have detected or prevented it:
-
- - Migration lint or staging dry run for DB incidents.
- - Synthetic checks or SLO alerts for user-visible breakage.
- - Policy-as-code for cloud/network misconfigurations.
- - Regression tests for the failed path.
- - Runbook automation for known remediation.
-
- Do not add broad tools just to add tools. Tie each new check to a risk surfaced by the project.
+ # CI Guardrails
+
+ Use this reference when the user wants durable automation instead of repeated manual review.
+
+ ## Principle
+
+ Guardrails should be small, fast, and relevant to the repo. Add blocking checks for high-confidence failures and non-blocking reports for noisy scanners until tuned.
+
+ ## Minimum Useful Stack
+
+ For most repos:
+
+ - Secret scan: Gitleaks or Trivy secret scan.
+ - Dependency and image scan: Trivy or native package audit.
+ - Static app security: Semgrep auto rules and repo tests.
+ - IaC/cloud scan when infra files exist: Checkov or Trivy config.
+ - Database migration lint when migrations exist: Squawk/Atlas/strong_migrations/gh-ost/pt-online-schema-change depending on database.
+
+ ## PR Gates
+
+ Block PRs for:
+
+ - Committed secrets.
+ - Critical reachable dependency vulnerabilities in production code.
+ - Unsafe database migrations likely to lock or rewrite hot tables.
+ - Public exposure or privilege escalation in IaC/Kubernetes.
+ - Failing tests or type checks in changed service boundaries.
+
+ Warn but do not block initially for:
+
+ - Low-confidence SAST findings.
+ - Dev-only vulnerabilities.
+ - Legacy baseline issues unrelated to the PR.
+
+ ## Deployment Gates
+
+ Before production deploy:
+
+ - Confirm migrations have a rollback or forward-fix plan.
+ - Confirm schema/app compatibility for rolling deploys.
+ - Confirm images are scanned and signed if the org uses signing.
+ - Confirm observability exists for the changed path: logs, metrics, traces, and alerts.
+ - Confirm feature flags or rollback commands for high-risk releases.
+
+ ## Incident Feedback Loop
+
+ After every incident, add one guardrail that would have detected or prevented it:
+
+ - Migration lint or staging dry run for DB incidents.
+ - Synthetic checks or SLO alerts for user-visible breakage.
+ - Policy-as-code for cloud/network misconfigurations.
+ - Regression tests for the failed path.
+ - Runbook automation for known remediation.
+
+ Do not add broad tools just to add tools. Tie each new check to a risk surfaced by the project.
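Reviewer note on the "PR Gates" section of `ci-guardrails.md`: the block/warn split could be expressed as a small classifier. This is a minimal sketch under assumptions; the finding shape (`kind`, `severity`, `reachable`, `scope`, `confidence`, `legacyBaseline`, and so on) is hypothetical and not taken from the package, and treating non-low-confidence SAST findings as blocking is an interpretation of the "high-confidence failures" principle.

```javascript
// Hypothetical finding shape; none of these field names come from the package.
function classifyFinding(finding) {
  if (finding.kind === "secret") return "block"; // committed secrets always block
  if (finding.legacyBaseline) return "warn"; // legacy issue unrelated to the PR
  switch (finding.kind) {
    case "dependency":
      // Block only critical, reachable vulnerabilities in production code.
      return finding.severity === "critical" && finding.reachable && finding.scope === "production"
        ? "block"
        : "warn";
    case "migration":
      return finding.unsafe ? "block" : "warn";
    case "iac":
      return finding.publicExposure || finding.privilegeEscalation ? "block" : "warn";
    case "test":
      return "block"; // failing tests or type checks
    case "sast":
      return finding.confidence === "low" ? "warn" : "block"; // assumption, see lead-in
    default:
      return "warn";
  }
}

// Aggregate findings into a gate result.
function prGate(findings) {
  const blockers = findings.filter((f) => classifyFinding(f) === "block");
  const warnings = findings.filter((f) => classifyFinding(f) === "warn");
  return { pass: blockers.length === 0, blockers, warnings };
}
```

Unknown kinds default to a warning here so that a new scanner starts non-blocking until tuned.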
package/ai-project-maintainer/references/database.md CHANGED
@@ -1,60 +1,60 @@
- # Database Review
-
- Use this reference when the repo has SQL files, ORM migrations, schema definitions, migration tools, or the user mentions database risk.
-
- ## Review Goals
-
- - Prevent downtime from locks, table rewrites, long transactions, blocking backfills, and incompatible app/schema deploy order.
- - Preserve rollback options and data integrity.
- - Separate migration safety from application query correctness.
-
- ## High-Risk Migration Patterns
-
- Flag and verify these patterns:
-
- - Adding `NOT NULL` columns with defaults on large existing tables.
- - Creating non-concurrent indexes on hot Postgres tables.
- - Dropping, renaming, or changing types of columns used by deployed code.
- - Large backfills inside a schema migration transaction.
- - Adding foreign keys or constraints without staged validation.
- - Rewriting large tables with `ALTER TABLE` operations.
- - Deleting data during deploy instead of after a compatibility window.
- - Mixing app logic changes and irreversible schema/data changes in one deploy.
- - Missing rollback, down migration, or forward-only recovery plan.
-
- ## PostgreSQL Routing
-
- - Use Squawk for raw SQL or Postgres migration linting.
- - Use Atlas for schema diff and migration lint when Atlas config or migration dirs exist.
- - Use pgroll when zero-downtime, expand/contract, or rollback-safe Postgres migrations are needed.
- - In Rails, use `strong_migrations` patterns when `db/migrate` exists.
-
- Prefer staged changes:
-
- 1. Expand schema compatibly.
- 2. Deploy code that reads/writes both old and new shape if needed.
- 3. Backfill in small batches outside the deploy transaction.
- 4. Validate constraints concurrently or in staged form.
- 5. Contract after old code is gone.
-
- ## MySQL Routing
-
- - Use gh-ost or Percona `pt-online-schema-change` for online schema changes on large tables.
- - Watch for metadata locks, long-running transactions, replication lag, and unsafe DDL.
- - Require a rollback or forward-fix plan for destructive changes.
-
- ## ORM-Specific Checks
-
- - Prisma/Drizzle/TypeORM/Knex: inspect generated SQL, not only the migration source.
- - Django/Alembic/Flyway/Liquibase: verify transaction behavior, lock behavior, and migration ordering.
- - Rails: check `strong_migrations` compatibility and avoid data backfills inside migration files.
-
- ## Output Shape
-
- For each migration risk, report:
-
- - Affected table or migration file.
- - Lock/rewrite/compatibility risk.
- - Safer migration sequence.
- - Tool coverage used or missing.
- - Verification: local migration test, shadow DB diff, staging dry run, or CI lint.
+ # Database Review
+
+ Use this reference when the repo has SQL files, ORM migrations, schema definitions, migration tools, or the user mentions database risk.
+
+ ## Review Goals
+
+ - Prevent downtime from locks, table rewrites, long transactions, blocking backfills, and incompatible app/schema deploy order.
+ - Preserve rollback options and data integrity.
+ - Separate migration safety from application query correctness.
+
+ ## High-Risk Migration Patterns
+
+ Flag and verify these patterns:
+
+ - Adding `NOT NULL` columns with defaults on large existing tables.
+ - Creating non-concurrent indexes on hot Postgres tables.
+ - Dropping, renaming, or changing types of columns used by deployed code.
+ - Large backfills inside a schema migration transaction.
+ - Adding foreign keys or constraints without staged validation.
+ - Rewriting large tables with `ALTER TABLE` operations.
+ - Deleting data during deploy instead of after a compatibility window.
+ - Mixing app logic changes and irreversible schema/data changes in one deploy.
+ - Missing rollback, down migration, or forward-only recovery plan.
+
+ ## PostgreSQL Routing
+
+ - Use Squawk for raw SQL or Postgres migration linting.
+ - Use Atlas for schema diff and migration lint when Atlas config or migration dirs exist.
+ - Use pgroll when zero-downtime, expand/contract, or rollback-safe Postgres migrations are needed.
+ - In Rails, use `strong_migrations` patterns when `db/migrate` exists.
+
+ Prefer staged changes:
+
+ 1. Expand schema compatibly.
+ 2. Deploy code that reads/writes both old and new shape if needed.
+ 3. Backfill in small batches outside the deploy transaction.
+ 4. Validate constraints concurrently or in staged form.
+ 5. Contract after old code is gone.
+
+ ## MySQL Routing
+
+ - Use gh-ost or Percona `pt-online-schema-change` for online schema changes on large tables.
+ - Watch for metadata locks, long-running transactions, replication lag, and unsafe DDL.
+ - Require a rollback or forward-fix plan for destructive changes.
+
+ ## ORM-Specific Checks
+
+ - Prisma/Drizzle/TypeORM/Knex: inspect generated SQL, not only the migration source.
+ - Django/Alembic/Flyway/Liquibase: verify transaction behavior, lock behavior, and migration ordering.
+ - Rails: check `strong_migrations` compatibility and avoid data backfills inside migration files.
+
+ ## Output Shape
+
+ For each migration risk, report:
+
+ - Affected table or migration file.
+ - Lock/rewrite/compatibility risk.
+ - Safer migration sequence.
+ - Tool coverage used or missing.
+ - Verification: local migration test, shadow DB diff, staging dry run, or CI lint.
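Reviewer note on the "High-Risk Migration Patterns" list in `database.md`: three of the patterns can be illustrated with a naive text check. This is only a sketch and not a substitute for Squawk or Atlas; the rule ids and function name are hypothetical, the statement split on `;` ignores semicolons inside strings, and a match means "flag and verify", not "definitely unsafe".

```javascript
// Hypothetical rules covering three of the listed patterns.
const RISK_RULES = [
  {
    id: "non-concurrent-index",
    test: (s) => /\bcreate\s+(unique\s+)?index\b/i.test(s) && !/\bconcurrently\b/i.test(s),
  },
  {
    id: "not-null-default-column",
    test: (s) => /\badd\s+column\b/i.test(s) && /\bnot\s+null\b/i.test(s) && /\bdefault\b/i.test(s),
  },
  {
    id: "drop-or-rename-column",
    test: (s) => /\b(drop|rename)\s+column\b/i.test(s),
  },
];

// Return one entry per (statement, matching rule) pair.
function lintMigration(sql) {
  return sql
    .split(";") // naive split; does not handle semicolons in string literals
    .map((stmt) => stmt.trim())
    .filter(Boolean)
    .flatMap((stmt) =>
      RISK_RULES.filter((rule) => rule.test(stmt)).map((rule) => ({ rule: rule.id, statement: stmt }))
    );
}

const findings = lintMigration(
  "CREATE INDEX idx_a ON orders (a); CREATE INDEX CONCURRENTLY idx_b ON orders (b); ALTER TABLE orders DROP COLUMN legacy;"
);
console.log(findings.map((f) => f.rule));
```

Each flagged statement would still need the context the reference asks for: table size, traffic, and whether deployed code uses the column.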
package/ai-project-maintainer/references/electron-desktop.md CHANGED
@@ -1,43 +1,43 @@
- # Electron Desktop Review
-
- Use this reference when `electron` appears in dependencies, `BrowserWindow` is used, or files such as `main.js`, `preload.js`, or `renderer.js` exist.
-
- ## Blockers
-
- Block release for:
-
- - `nodeIntegration: true` in renderer windows.
- - `contextIsolation: false`.
- - `webSecurity: false` or `allowRunningInsecureContent: true`.
- - IPC APIs that read, write, delete, execute, or open arbitrary caller-supplied paths.
- - Preload APIs that expose broad filesystem, shell, child process, or network capabilities without per-operation authorization.
- - Auto-update flows that trust a remote hash from the same mutable location as the update metadata.
-
- ## Review Checklist
-
- - Main process owns privileged actions. Renderers request narrow operations.
- - File import/export paths come from a main-process dialog or a short-lived path token created by the main process.
- - IPC validates sender frame/origin, input schema, allowed path roots, and operation intent.
- - `shell.openExternal` only opens validated `https:` URLs from allowlisted domains.
- - Navigation and new-window handlers deny untrusted destinations.
- - Preload exposes small, named commands instead of raw filesystem or shell primitives.
- - Updates are signed or verified with a trust root not controlled by the update metadata source.
- - Multi-window writes use revision checks, merge-by-scope, or conflict rejection in the main process.
-
- ## Useful Tests
-
- - Renderer cannot read an arbitrary local file through import IPC.
- - Renderer cannot write outside approved export locations.
- - Main process rejects IPC from unexpected sender frames where feasible.
- - Two windows saving stale snapshots cannot overwrite newer data without conflict detection.
- - Update metadata without valid integrity/signature fails closed.
-
- ## Output Standard
-
- For Electron findings, include:
-
- - Exact exposed API or `BrowserWindow` setting.
- - Attack precondition such as XSS, malicious import data, or compromised update metadata.
- - Privilege gained by renderer code.
- - Main-process authorization fix.
- - Regression test to prove the privilege path is closed.
+ # Electron Desktop Review
+
+ Use this reference when `electron` appears in dependencies, `BrowserWindow` is used, or files such as `main.js`, `preload.js`, or `renderer.js` exist.
+
+ ## Blockers
+
+ Block release for:
+
+ - `nodeIntegration: true` in renderer windows.
+ - `contextIsolation: false`.
+ - `webSecurity: false` or `allowRunningInsecureContent: true`.
+ - IPC APIs that read, write, delete, execute, or open arbitrary caller-supplied paths.
+ - Preload APIs that expose broad filesystem, shell, child process, or network capabilities without per-operation authorization.
+ - Auto-update flows that trust a remote hash from the same mutable location as the update metadata.
+
+ ## Review Checklist
+
+ - Main process owns privileged actions. Renderers request narrow operations.
+ - File import/export paths come from a main-process dialog or a short-lived path token created by the main process.
+ - IPC validates sender frame/origin, input schema, allowed path roots, and operation intent.
+ - `shell.openExternal` only opens validated `https:` URLs from allowlisted domains.
+ - Navigation and new-window handlers deny untrusted destinations.
+ - Preload exposes small, named commands instead of raw filesystem or shell primitives.
+ - Updates are signed or verified with a trust root not controlled by the update metadata source.
+ - Multi-window writes use revision checks, merge-by-scope, or conflict rejection in the main process.
+
+ ## Useful Tests
+
+ - Renderer cannot read an arbitrary local file through import IPC.
+ - Renderer cannot write outside approved export locations.
+ - Main process rejects IPC from unexpected sender frames where feasible.
+ - Two windows saving stale snapshots cannot overwrite newer data without conflict detection.
+ - Update metadata without valid integrity/signature fails closed.
+
+ ## Output Standard
+
+ For Electron findings, include:
+
+ - Exact exposed API or `BrowserWindow` setting.
+ - Attack precondition such as XSS, malicious import data, or compromised update metadata.
+ - Privilege gained by renderer code.
+ - Main-process authorization fix.
+ - Regression test to prove the privilege path is closed.
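Reviewer note on the "Blockers" and "Review Checklist" sections of `electron-desktop.md`: the four `webPreferences` blockers and the `shell.openExternal` rule lend themselves to plain-object checks. The sketch below assumes the caller has already extracted a `webPreferences` object and a candidate URL; the function names and the allowlist are hypothetical, and nothing here imports Electron.

```javascript
// Check a BrowserWindow webPreferences object against the four setting-level blockers.
function findWebPreferenceBlockers(webPreferences = {}) {
  const blockers = [];
  if (webPreferences.nodeIntegration === true) blockers.push("nodeIntegration: true");
  if (webPreferences.contextIsolation === false) blockers.push("contextIsolation: false");
  if (webPreferences.webSecurity === false) blockers.push("webSecurity: false");
  if (webPreferences.allowRunningInsecureContent === true) {
    blockers.push("allowRunningInsecureContent: true");
  }
  return blockers;
}

// Validate a URL before it would be handed to shell.openExternal:
// https only, and the hostname must be on an explicit allowlist.
function isAllowedExternalUrl(rawUrl, allowedHosts) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input fails closed
  }
  return url.protocol === "https:" && allowedHosts.includes(url.hostname);
}

console.log(findWebPreferenceBlockers({ nodeIntegration: true, contextIsolation: false }));
console.log(isAllowedExternalUrl("https://example.com/docs", ["example.com"]));
```

The IPC and auto-update blockers need code-path review and are not covered by a settings check like this.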
package/ai-project-maintainer/references/incident-response.md CHANGED
@@ -1,52 +1,52 @@
- # Incident Response
-
- Use this reference when the user reports an outage, degraded service, suspicious production behavior, alert storm, data issue, or asks for SRE/root-cause help.
-
- ## Safety Rules
-
- - Stay read-only unless the user explicitly asks for a mitigation and confirms the target environment.
- - Preserve evidence before cleanup.
- - Prefer reversible mitigations: rollback, traffic shift, feature flag disable, rate limit, queue pause, or read-only mode.
- - Do not run destructive database, Kubernetes, or cloud commands as part of diagnosis.
-
- ## Triage Flow
-
- 1. Define the incident window, affected service, symptoms, and customer impact.
- 2. Build a timeline: deploys, config changes, database migrations, infra changes, traffic changes, alerts, and dependency events.
- 3. Check blast radius: which regions, tenants, endpoints, queues, jobs, or database tables are affected.
- 4. Correlate signals: logs, metrics, traces, Kubernetes events, rollout history, error budgets, and saturation metrics.
- 5. Identify likely trigger and immediate mitigation.
- 6. Record follow-up guardrails that would have caught the issue before production.
-
- ## Tool Routing
-
- - Kubernetes: use `kubectl` read-only commands, k8sgpt, Kubescape, or HolmesGPT.
- - Observability: query Prometheus/Grafana/Loki/ELK/Jaeger/Datadog/Sentry only when credentials or tools are already available.
- - AI SRE: use HolmesGPT or OpenSRE to correlate logs, metrics, traces, and runbooks when configured.
- - Database incidents: read `database.md` and look for recent migrations, locks, slow queries, deadlocks, replication lag, and connection pool exhaustion.
- - Network incidents: read `security.md` and inspect ingress, DNS, certificates, firewall/security-group changes, service mesh, and egress policy.
-
- ## Useful Read-Only Commands
-
- Adapt to the cluster and namespace:
-
- ```bash
- kubectl get pods,deploy,svc,ingress -A
- kubectl get events -A --sort-by=.lastTimestamp
- kubectl rollout history deployment/<name> -n <namespace>
- kubectl logs deployment/<name> -n <namespace> --since=30m
- k8sgpt analyze
- holmes ask "What changed before this alert and what is the most likely root cause?"
- ```
-
- ## Incident Output
-
- Return:
-
- - Current assessment and confidence.
- - Timeline with concrete timestamps where available.
- - Blast radius.
- - Most likely trigger and competing hypotheses.
- - Immediate mitigation options with rollback risk.
- - Evidence still needed.
- - Preventive guardrails for CI, deploy, monitoring, or database migration.
+ # Incident Response
+
+ Use this reference when the user reports an outage, degraded service, suspicious production behavior, alert storm, data issue, or asks for SRE/root-cause help.
+
+ ## Safety Rules
+
+ - Stay read-only unless the user explicitly asks for a mitigation and confirms the target environment.
+ - Preserve evidence before cleanup.
+ - Prefer reversible mitigations: rollback, traffic shift, feature flag disable, rate limit, queue pause, or read-only mode.
+ - Do not run destructive database, Kubernetes, or cloud commands as part of diagnosis.
+
+ ## Triage Flow
+
+ 1. Define the incident window, affected service, symptoms, and customer impact.
+ 2. Build a timeline: deploys, config changes, database migrations, infra changes, traffic changes, alerts, and dependency events.
+ 3. Check blast radius: which regions, tenants, endpoints, queues, jobs, or database tables are affected.
+ 4. Correlate signals: logs, metrics, traces, Kubernetes events, rollout history, error budgets, and saturation metrics.
+ 5. Identify likely trigger and immediate mitigation.
+ 6. Record follow-up guardrails that would have caught the issue before production.
+
+ ## Tool Routing
+
+ - Kubernetes: use `kubectl` read-only commands, k8sgpt, Kubescape, or HolmesGPT.
+ - Observability: query Prometheus/Grafana/Loki/ELK/Jaeger/Datadog/Sentry only when credentials or tools are already available.
+ - AI SRE: use HolmesGPT or OpenSRE to correlate logs, metrics, traces, and runbooks when configured.
+ - Database incidents: read `database.md` and look for recent migrations, locks, slow queries, deadlocks, replication lag, and connection pool exhaustion.
+ - Network incidents: read `security.md` and inspect ingress, DNS, certificates, firewall/security-group changes, service mesh, and egress policy.
+
+ ## Useful Read-Only Commands
+
+ Adapt to the cluster and namespace:
+
+ ```bash
+ kubectl get pods,deploy,svc,ingress -A
+ kubectl get events -A --sort-by=.lastTimestamp
+ kubectl rollout history deployment/<name> -n <namespace>
+ kubectl logs deployment/<name> -n <namespace> --since=30m
+ k8sgpt analyze
+ holmes ask "What changed before this alert and what is the most likely root cause?"
+ ```
+
+ ## Incident Output
+
+ Return:
+
+ - Current assessment and confidence.
+ - Timeline with concrete timestamps where available.
+ - Blast radius.
+ - Most likely trigger and competing hypotheses.
+ - Immediate mitigation options with rollback risk.
+ - Evidence still needed.
+ - Preventive guardrails for CI, deploy, monitoring, or database migration.
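Reviewer note on the "Safety Rules" in `incident-response.md`: the stay-read-only rule could be backed by a verb allowlist in front of any `kubectl` call. The following is a hedged sketch, not part of the package; the function name and the allowlist contents are assumptions, and the check is deliberately conservative (a command with global flags before the verb is rejected, not parsed).

```javascript
// Assumed allowlist of kubectl verbs that do not modify cluster state.
const READ_ONLY_VERBS = new Set(["get", "describe", "logs", "top", "explain"]);
// `kubectl rollout` mixes read-only and mutating subcommands, so it gets its own list.
const READ_ONLY_ROLLOUT = new Set(["history", "status"]);

function isReadOnlyKubectl(command) {
  const [bin, verb, sub] = command.trim().split(/\s+/);
  if (bin !== "kubectl") return false;
  if (verb === "rollout") return READ_ONLY_ROLLOUT.has(sub);
  return READ_ONLY_VERBS.has(verb);
}

console.log(isReadOnlyKubectl("kubectl get events -A --sort-by=.lastTimestamp"));
console.log(isReadOnlyKubectl("kubectl rollout undo deployment/api -n prod"));
```

Mitigations such as `rollout undo` would then require the explicit user confirmation that the safety rules call for.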
package/ai-project-maintainer/references/security.md CHANGED
@@ -1,48 +1,48 @@
- # Security Review
-
- Use this reference for application, network, cloud, IaC, dependency, container, and secret security checks.
-
- ## AI-Coding Risk Hotspots
-
- AI-generated code commonly introduces plausible-looking but unsafe shortcuts. Prioritize:
-
- - Auth bypasses, missing authorization checks, tenant isolation mistakes, and IDOR.
- - Overbroad CORS, public debug endpoints, permissive admin routes, and missing CSRF protections.
- - SSRF through user-controlled URLs, webhooks, image fetchers, importers, or proxy endpoints.
- - SQL/NoSQL injection from string-built queries or unsafe ORM escape hatches.
- - Unsafe file upload, path traversal, zip extraction, and untrusted document parsing.
- - Disabled TLS verification, weak JWT validation, hardcoded secrets, and secret logging.
- - Excessive IAM permissions, public buckets, open security groups, and unauthenticated internal services.
- - Docker/Kubernetes privileged mode, hostPath mounts, root containers, and exposed dashboards.
-
- ## Check Sequence
-
- 1. Secrets: run Gitleaks or Trivy secret scan first; never display secret values.
- 2. Dependencies and images: run Trivy or equivalent package audit and separate production/runtime findings from dev-only noise.
- 3. Static analysis: run Semgrep auto rules, CodeQL if configured, and any repo-native lint/security tests.
- 4. IaC and cloud config: run Checkov, Trivy config, Conftest/OPA, or Kubescape depending on files present.
- 5. Targeted manual review: inspect auth boundaries, request validation, dataflow into dangerous sinks, and changed deploy/network files.
- 6. DAST: use ZAP or Nuclei only for an explicitly scoped local/staging target.
-
- ## Network and Cloud Review
-
- For Terraform, Helm, Kubernetes, Docker Compose, or cloud config, identify:
-
- - Internet-exposed ingress/load balancers and whether auth/TLS/WAF is present.
- - Security group or firewall rules with `0.0.0.0/0` or overly broad ports.
- - IAM wildcard actions/resources and cross-account trust.
- - Lack of Kubernetes NetworkPolicy in multi-tenant or internet-facing namespaces.
- - Secret material in manifests, env vars, logs, or image layers.
- - Missing resource limits, health checks, pod disruption budgets, or rollout safety.
-
- ## Finding Standard
-
- Report only actionable findings. Each security finding needs:
-
- - Asset or code path affected.
- - Attack path or failure mode.
- - Exploit preconditions.
- - Concrete fix.
- - Verification command, test, or policy.
-
- Mark speculative concerns as coverage gaps, not findings.
+ # Security Review
+
+ Use this reference for application, network, cloud, IaC, dependency, container, and secret security checks.
+
+ ## AI-Coding Risk Hotspots
+
+ AI-generated code commonly introduces plausible-looking but unsafe shortcuts. Prioritize:
+
+ - Auth bypasses, missing authorization checks, tenant isolation mistakes, and IDOR.
+ - Overbroad CORS, public debug endpoints, permissive admin routes, and missing CSRF protections.
+ - SSRF through user-controlled URLs, webhooks, image fetchers, importers, or proxy endpoints.
+ - SQL/NoSQL injection from string-built queries or unsafe ORM escape hatches.
+ - Unsafe file upload, path traversal, zip extraction, and untrusted document parsing.
+ - Disabled TLS verification, weak JWT validation, hardcoded secrets, and secret logging.
+ - Excessive IAM permissions, public buckets, open security groups, and unauthenticated internal services.
+ - Docker/Kubernetes privileged mode, hostPath mounts, root containers, and exposed dashboards.
+
+ ## Check Sequence
+
+ 1. Secrets: run Gitleaks or Trivy secret scan first; never display secret values.
+ 2. Dependencies and images: run Trivy or equivalent package audit and separate production/runtime findings from dev-only noise.
+ 3. Static analysis: run Semgrep auto rules, CodeQL if configured, and any repo-native lint/security tests.
+ 4. IaC and cloud config: run Checkov, Trivy config, Conftest/OPA, or Kubescape depending on files present.
+ 5. Targeted manual review: inspect auth boundaries, request validation, dataflow into dangerous sinks, and changed deploy/network files.
+ 6. DAST: use ZAP or Nuclei only for an explicitly scoped local/staging target.
+
+ ## Network and Cloud Review
+
+ For Terraform, Helm, Kubernetes, Docker Compose, or cloud config, identify:
+
+ - Internet-exposed ingress/load balancers and whether auth/TLS/WAF is present.
+ - Security group or firewall rules with `0.0.0.0/0` or overly broad ports.
+ - IAM wildcard actions/resources and cross-account trust.
+ - Lack of Kubernetes NetworkPolicy in multi-tenant or internet-facing namespaces.
+ - Secret material in manifests, env vars, logs, or image layers.
+ - Missing resource limits, health checks, pod disruption budgets, or rollout safety.
+
+ ## Finding Standard
+
+ Report only actionable findings. Each security finding needs:
+
+ - Asset or code path affected.
+ - Attack path or failure mode.
+ - Exploit preconditions.
+ - Concrete fix.
+ - Verification command, test, or policy.
+
+ Mark speculative concerns as coverage gaps, not findings.
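Reviewer note on the "Finding Standard" in `security.md`: the five required elements, plus the rule that speculative concerns become coverage gaps, map naturally onto a completeness check. This is a minimal sketch; the field names (`asset`, `attackPath`, `preconditions`, `fix`, `verification`) and the output shape are hypothetical and not taken from the package.

```javascript
// Hypothetical field names, one per required element of the finding standard.
const REQUIRED_FIELDS = ["asset", "attackPath", "preconditions", "fix", "verification"];

// A candidate with every field filled in is a finding; anything else is a coverage gap.
function triageFinding(candidate) {
  const missing = REQUIRED_FIELDS.filter((key) => !String(candidate[key] ?? "").trim());
  return missing.length === 0
    ? { ...candidate, type: "finding" }
    : { ...candidate, type: "coverage-gap", missing };
}

const result = triageFinding({
  asset: "POST /api/import",
  attackPath: "user-controlled URL is fetched server-side (SSRF)",
});
console.log(result.type, result.missing);
```

A candidate that names only the asset and attack path is kept, but as a gap that lists what evidence is still needed before it can be reported.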