@static-var/keystone 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.agents/plugins/marketplace.json +24 -0
- package/.claude-plugin/marketplace.json +24 -0
- package/.claude-plugin/plugin.json +12 -0
- package/.codex-plugin/plugin.json +12 -0
- package/.pi/extensions/keystone.ts +172 -0
- package/HOW_IT_WORKS.md +424 -0
- package/Makefile +19 -0
- package/README.md +253 -0
- package/package.json +86 -0
- package/packaging.allowlist +32 -0
- package/scripts/build-metadata.py +99 -0
- package/scripts/package-keystone.sh +59 -0
- package/scripts/validate-keystone.py +261 -0
- package/scripts/validate-package.py +140 -0
- package/skills/keystone/SKILL.md +69 -0
- package/skills/keystone/modules/breakdown.md +239 -0
- package/skills/keystone/modules/build.md +284 -0
- package/skills/keystone/modules/debug.md +198 -0
- package/skills/keystone/modules/gates/isolation.md +56 -0
- package/skills/keystone/modules/gates/proof.md +54 -0
- package/skills/keystone/modules/gates/red.md +59 -0
- package/skills/keystone/modules/gates/review.md +56 -0
- package/skills/keystone/modules/gates/ship.md +57 -0
- package/skills/keystone/modules/health.md +124 -0
- package/skills/keystone/modules/helpers/subagents.md +134 -0
- package/skills/keystone/modules/research.md +86 -0
- package/skills/keystone/modules/review.md +270 -0
- package/skills/keystone/modules/router.md +36 -0
- package/skills/keystone/modules/shape.md +125 -0
- package/skills/keystone/modules/ship.md +130 -0
|
@@ -0,0 +1,134 @@
|
|
|
1
|
+
# Keystone Subagents and Reasoning Helper
|
|
2
|
+
|
|
3
|
+
## Purpose
|
|
4
|
+
Teach Keystone when subagents can be used, which hosts support them, and what reasoning level each Keystone module should prefer.
|
|
5
|
+
|
|
6
|
+
This helper is advisory. Host capability wins over Keystone preference. If the current harness cannot set reasoning per subagent, encode the desired reasoning in the subagent prompt or do the work inline.
|
|
7
|
+
|
|
8
|
+
## Delegation rule
|
|
9
|
+
Use subagents only when the task has a clear boundary and a useful handoff artifact.
|
|
10
|
+
|
|
11
|
+
Good delegation targets:
|
|
12
|
+
- read-only reconnaissance
|
|
13
|
+
- independent implementation slices
|
|
14
|
+
- focused debugging/root-cause analysis
|
|
15
|
+
- read-only review
|
|
16
|
+
- documentation/copy drafting
|
|
17
|
+
|
|
18
|
+
Do not delegate when:
|
|
19
|
+
- the task needs tight conversational clarification
|
|
20
|
+
- one agent must continuously coordinate shared mutable state
|
|
21
|
+
- the host cannot preserve enough context for safe handoff
|
|
22
|
+
- the subagent would need secrets or permissions the parent should not share
|
|
23
|
+
- delegation setup, context packaging, merge/review effort, or verification overhead is likely greater than doing the task inline
|
|
24
|
+
|
|
25
|
+
## Delegation cost heuristic
|
|
26
|
+
|
|
27
|
+
Before spawning a subagent, compare expected benefit with coordination cost.
|
|
28
|
+
|
|
29
|
+
Delegate when at least one is true:
|
|
30
|
+
- the subtask can run in parallel with other independent work
|
|
31
|
+
- the subtask requires a different role, perspective, or reasoning depth
|
|
32
|
+
- the repository/source search is large enough that a scout can save parent context
|
|
33
|
+
- independent review or root-cause analysis materially reduces release risk
|
|
34
|
+
|
|
35
|
+
Do not delegate when most of these are true:
|
|
36
|
+
- the task can be done inline in a few minutes
|
|
37
|
+
- the parent would need to explain more context than the subagent can save
|
|
38
|
+
- outputs will require complex merge arbitration
|
|
39
|
+
- the work touches the same mutable files as another active agent without isolation
|
|
40
|
+
- verification cannot be independently stated in the prompt
|
|
41
|
+
|
|
42
|
+
If uncertain, prefer one narrow read-only scout/reviewer over multiple workers.
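The delegation cost heuristic above can be sketched as a small decision function. This is an illustrative TypeScript sketch, not part of Keystone. Several pieces are assumptions: the `DelegationSignals` fields, the `decideDelegation` name, reading "most" as a strict majority, and treating a benefit that comes with any cost signal as the "uncertain" case.

```typescript
// Hypothetical inputs: one boolean per bullet in the heuristic.
interface DelegationSignals {
  // Benefits: any one of these justifies delegation.
  canRunInParallel: boolean;
  needsDifferentRoleOrDepth: boolean;
  searchLargeEnoughToSaveContext: boolean;
  independentCheckReducesRisk: boolean;
  // Costs: "most" of these means do not delegate.
  quickInline: boolean;
  briefingCostsMoreThanItSaves: boolean;
  complexMergeArbitration: boolean;
  sharedMutableFilesWithoutIsolation: boolean;
  verificationNotStatable: boolean;
}

type DelegationDecision = "delegate" | "inline" | "single-read-only-scout";

function decideDelegation(s: DelegationSignals): DelegationDecision {
  const benefits = [
    s.canRunInParallel,
    s.needsDifferentRoleOrDepth,
    s.searchLargeEnoughToSaveContext,
    s.independentCheckReducesRisk,
  ];
  const costs = [
    s.quickInline,
    s.briefingCostsMoreThanItSaves,
    s.complexMergeArbitration,
    s.sharedMutableFilesWithoutIsolation,
    s.verificationNotStatable,
  ];
  const benefitCount = benefits.filter(Boolean).length;
  const costCount = costs.filter(Boolean).length;

  // No benefit, or a strict majority of cost signals: stay inline.
  if (benefitCount === 0 || costCount > costs.length / 2) return "inline";
  // Benefit plus some cost signals is treated as "uncertain" (assumption):
  // prefer one narrow read-only scout/reviewer over multiple workers.
  if (costCount > 0) return "single-read-only-scout";
  return "delegate";
}
```

The function returns the narrow-scout option instead of a plain yes/no. This keeps the closing "if uncertain" rule as a distinct outcome.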
|
|
43
|
+
|
|
44
|
+
## Host capability matrix
|
|
45
|
+
|
|
46
|
+
| Harness | Subagents | Per-subagent reasoning/effort | How to configure | Keystone policy |
|
|
47
|
+
|---|---:|---:|---|---|
|
|
48
|
+
| Pi coding agent with `pi-subagents` | yes | yes | `.pi/agents/<name>.md` frontmatter `model`, `thinking`, `profile`, or `Agent({ subagent_type, thinking, model, profile })` | Use native roles; prefer role defaults unless Keystone needs a one-off override. |
|
|
49
|
+
| Claude Code | yes | partial | `.claude/agents/<name>.md` supports subagent config and `model`; the built-in Explore subagent accepts a thoroughness level (quick, medium, or very thorough), but custom agents do not expose a general reasoning knob | Use `model` plus explicit prompt instructions; request an Explore thoroughness level when applicable. |
|
|
50
|
+
| Codex CLI/app | unclear/host-dependent | partial/global | known global config includes `model_reasoning_effort`; no stable per-subagent effort schema confirmed | Treat Keystone reasoning as advisory text unless the active Codex host exposes a per-agent effort control. |
|
|
51
|
+
| T3 Code | not confirmed | not confirmed | no confirmed public/local schema | Treat as unsupported; run inline or through the underlying Claude/Codex/OpenCode provider if available. |
|
|
52
|
+
| OpenCode | yes | partial/provider-dependent | agent config supports `mode: "subagent"`, `model`, and provider-specific `variant`; no universal reasoning field is confirmed | Use subagent mode; map Keystone reasoning to model/variant only where the provider exposes effort variants, otherwise write the desired reasoning in the prompt. |
|
|
53
|
+
| GitHub Copilot / VS Code | yes | partial | `.github/agents/*.agent.md` supports `model`, `agents`, `user-invocable`, `disable-model-invocation`; no general reasoning knob found | Use custom agents and model choice; put reasoning expectation in the agent prompt. |
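One way to honor "host capability wins over Keystone preference" is to consult the matrix before spawning and decide how the reasoning preference is delivered. The sketch below is illustrative TypeScript. The `Harness` keys, the `HOSTS` record, and `planDelivery` are hypothetical names. The values come from the table, with "partial" collapsed to "no native per-subagent knob". OpenCode's provider-specific `variant` mapping is deliberately left out.

```typescript
type Harness = "pi" | "claude-code" | "codex" | "t3-code" | "opencode" | "copilot";

interface HostCapability {
  subagents: "yes" | "host-dependent" | "unconfirmed";
  // true only where the matrix says per-subagent reasoning is "yes"
  perSubagentReasoning: boolean;
}

// Transcribed from the capability matrix.
const HOSTS: Record<Harness, HostCapability> = {
  pi: { subagents: "yes", perSubagentReasoning: true },
  "claude-code": { subagents: "yes", perSubagentReasoning: false },
  codex: { subagents: "host-dependent", perSubagentReasoning: false },
  "t3-code": { subagents: "unconfirmed", perSubagentReasoning: false },
  opencode: { subagents: "yes", perSubagentReasoning: false },
  copilot: { subagents: "yes", perSubagentReasoning: false },
};

type Delivery =
  | { mode: "inline"; reason: string }
  | { mode: "native"; thinking: string }
  | { mode: "prompt"; instruction: string };

function planDelivery(host: Harness, level: string): Delivery {
  const cap = HOSTS[host];
  // Unsupported hosts: do the work inline instead of delegating.
  if (cap.subagents === "unconfirmed") {
    return { mode: "inline", reason: `${host}: subagent support not confirmed` };
  }
  // A real per-subagent knob exists: set it directly.
  if (cap.perSubagentReasoning) {
    return { mode: "native", thinking: level };
  }
  // Otherwise encode the preference as advisory prompt text.
  return { mode: "prompt", instruction: `Use ${level} reasoning.` };
}
```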
|
|
54
|
+
|
|
55
|
+
## Canonical reasoning scale
|
|
56
|
+
|
|
57
|
+
Keystone uses this host-neutral scale:
|
|
58
|
+
|
|
59
|
+
| Level | Use for |
|
|
60
|
+
|---|---|
|
|
61
|
+
| `off` | deterministic formatting, mechanical edits, no reasoning needed |
|
|
62
|
+
| `minimal` | tiny lookups, simple classification, trivial copy changes |
|
|
63
|
+
| `low` | ordinary reading, straightforward writing, small scoped tasks |
|
|
64
|
+
| `medium` | normal implementation, UI decisions, moderate research |
|
|
65
|
+
| `high` | architecture, debugging, review, planning, ambiguous tradeoffs |
|
|
66
|
+
| `xhigh` | hard root-cause analysis, security-sensitive review, major design decisions |
|
|
67
|
+
|
|
68
|
+
If a host uses another vocabulary, map to the nearest equivalent. If no setting exists, write the desired level into the prompt, for example: "Use high reasoning; explore alternatives before deciding."
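Mapping to "the nearest equivalent" needs some notion of distance. This minimal sketch assumes each host level can be described by the Keystone level it most resembles. It picks the host level whose position on the Keystone scale is closest and breaks ties toward deeper reasoning (an assumption). When the host exposes no setting, it falls back to prompt text. `HostLevel`, `mapReasoning`, and the example labels `quick` and `thorough` are hypothetical.

```typescript
const KEYSTONE_SCALE = ["off", "minimal", "low", "medium", "high", "xhigh"] as const;
type KeystoneLevel = (typeof KEYSTONE_SCALE)[number];

// A host setting, described by the Keystone level it roughly corresponds to.
interface HostLevel {
  name: string;
  equivalent: KeystoneLevel;
}

type Mapped = { kind: "setting"; value: string } | { kind: "prompt"; text: string };

function mapReasoning(level: KeystoneLevel, hostLevels: HostLevel[]): Mapped {
  const target = KEYSTONE_SCALE.indexOf(level);
  let best: HostLevel | undefined;
  let bestDist = Infinity;
  let bestRank = -1;
  for (const h of hostLevels) {
    const rank = KEYSTONE_SCALE.indexOf(h.equivalent);
    const dist = Math.abs(rank - target);
    // Closest wins; on a tie, prefer the deeper setting (assumption).
    if (dist < bestDist || (dist === bestDist && rank > bestRank)) {
      best = h;
      bestDist = dist;
      bestRank = rank;
    }
  }
  // No host setting at all: write the desired level into the prompt.
  return best
    ? { kind: "setting", value: best.name }
    : { kind: "prompt", text: `Use ${level} reasoning.` };
}
```

Consider a host that offers only `quick` (roughly `low`) and `thorough` (roughly `high`). A `medium` request maps to `thorough` because of the tie-break. A `minimal` request maps to `quick`.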
|
|
69
|
+
|
|
70
|
+
## Keystone module defaults
|
|
71
|
+
|
|
72
|
+
| Keystone file | Preferred role | Default reasoning | Escalate when |
|
|
73
|
+
|---|---|---:|---|
|
|
74
|
+
| `modules/router.md` | none or lightweight classifier | `low` | request is ambiguous across several irreversible actions |
|
|
75
|
+
| `modules/research.md` | scout/read-only explorer or oracle | `medium` | repository is large, source relationships are unclear, or claims affect architecture, market, safety, or release decisions (`high`) |
|
|
76
|
+
| `modules/shape.md` | writer, UI/design reviewer, or oracle | `medium` | visual systems, accessibility, complex positioning, architecture, product viability, or irreversible scope decisions are involved (`high`/`xhigh`) |
|
|
77
|
+
| `modules/breakdown.md` | planner plus reviewer | `high` | plan spans multiple independent agents or risky sequencing (`xhigh`) |
|
|
78
|
+
| `modules/build.md` | worker | `medium` | concurrency, migrations, broad refactors, or unfamiliar stack (`high`) |
|
|
79
|
+
| `modules/debug.md` | oracle/root-cause investigator | `high` | intermittent, cross-system, performance, or data-loss failures (`xhigh`) |
|
|
80
|
+
| `modules/review.md` | reviewer/read-only | `high` | security, release, data migration, or public API review (`xhigh`) |
|
|
81
|
+
| `modules/ship.md` | ship coordinator | `medium` | release has unresolved risk or multi-host packaging (`high`) |
|
|
82
|
+
| `modules/health.md` | scout plus reviewer | `medium` | broad repository/tooling drift or release readiness audit (`high`) |
|
|
83
|
+
| `modules/gates/*.md` | none | `low` | evidence is contradictory or safety-critical (`medium`) |
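The module defaults table can also be carried as data, so a dispatcher looks up the level instead of re-reading prose. This sketch rests on a few assumptions. `MODULE_REASONING` and `reasoningFor` are hypothetical names. Whether to escalate is a single boolean decided elsewhere. The `router` and `shape` rows are omitted because the table does not give them a single escalation level.

```typescript
type Level = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";

// Default and escalated levels transcribed from the module defaults table.
const MODULE_REASONING: Record<string, { base: Level; escalated: Level }> = {
  "modules/research.md": { base: "medium", escalated: "high" },
  "modules/breakdown.md": { base: "high", escalated: "xhigh" },
  "modules/build.md": { base: "medium", escalated: "high" },
  "modules/debug.md": { base: "high", escalated: "xhigh" },
  "modules/review.md": { base: "high", escalated: "xhigh" },
  "modules/ship.md": { base: "medium", escalated: "high" },
  "modules/health.md": { base: "medium", escalated: "high" },
  "modules/gates/*.md": { base: "low", escalated: "medium" },
};

function reasoningFor(moduleFile: string, escalate: boolean): Level {
  // All gate files share one row, so normalize them to the wildcard key.
  const key = moduleFile.startsWith("modules/gates/") ? "modules/gates/*.md" : moduleFile;
  const row = MODULE_REASONING[key];
  if (!row) throw new Error(`no single default recorded for ${moduleFile}`);
  return escalate ? row.escalated : row.base;
}
```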
|
|
84
|
+
|
|
85
|
+
## Pi role mapping
|
|
86
|
+
|
|
87
|
+
When Pi subagents are available, use the narrowest role and usually keep its configured defaults:
|
|
88
|
+
|
|
89
|
+
| Need | Pi role | Typical thinking |
|
|
90
|
+
|---|---|---:|
|
|
91
|
+
| codebase exploration | `scout` | `low` |
|
|
92
|
+
| implementation | `worker` | `medium` |
|
|
93
|
+
| code/spec review | `reviewer` | `high` |
|
|
94
|
+
| architecture/root-cause second opinion | `oracle` | `high` or `xhigh` |
|
|
95
|
+
| docs/copy | `writer` | `low` |
|
|
96
|
+
|
|
97
|
+
Only override `thinking` when the module table says to escalate or de-escalate.
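The "keep role defaults, override only on escalation" rule can be sketched as building the options object for the `Agent({ subagent_type, thinking, ... })` form mentioned in the capability matrix. Only the two fields used here are modeled. `PI_ROLE_THINKING` and `agentOptions` are hypothetical names. Oracle's "`high` or `xhigh`" is recorded as `high`. The sketch does not call Pi itself.

```typescript
type Thinking = "off" | "minimal" | "low" | "medium" | "high" | "xhigh";
type PiRole = "scout" | "worker" | "reviewer" | "oracle" | "writer";

// Typical thinking per role, from the Pi role mapping table.
const PI_ROLE_THINKING: Record<PiRole, Thinking> = {
  scout: "low",
  worker: "medium",
  reviewer: "high",
  oracle: "high",
  writer: "low",
};

// Partial model of the options passed to Agent({ ... }).
interface AgentOptions {
  subagent_type: PiRole;
  thinking?: Thinking;
}

function agentOptions(role: PiRole, override?: Thinking): AgentOptions {
  // No escalation or de-escalation requested, or it equals the role default:
  // omit `thinking` so the role's configured value applies.
  if (override === undefined || override === PI_ROLE_THINKING[role]) {
    return { subagent_type: role };
  }
  return { subagent_type: role, thinking: override };
}
```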
|
|
98
|
+
|
|
99
|
+
## Safe parallel work pattern
|
|
100
|
+
|
|
101
|
+
1. `breakdown` identifies independent tasks and their verification commands.
|
|
102
|
+
2. `build` passes `gates/isolation.md` before any mutation.
|
|
103
|
+
3. If the host supports isolated worktrees, each worker gets a separate worktree or host-isolated workspace.
|
|
104
|
+
4. Each worker reports files changed, tests run, and concerns.
|
|
105
|
+
5. Parent reconciles outputs before further mutation: accept, reject, or send back with a narrower prompt.
|
|
106
|
+
6. `review` runs as read-only, preferably in a separate reviewer subagent.
|
|
107
|
+
7. `ship` finalizes only after proof and review gates pass.
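Steps 2, 5, and 7 of the pattern are where the parent makes a go/no-go call. Below is a minimal TypeScript sketch of those checks. It assumes a hypothetical `WorkerReport` shape that mirrors step 4. Out-of-scope edits are rejected, a report with no tests run is sent back, and shipping requires both the proof and review gates. These reconciliation rules are illustrative assumptions, not Keystone policy.

```typescript
// Hypothetical worker report, mirroring step 4.
interface WorkerReport {
  task: string;
  filesChanged: string[];
  testsRun: string[];
  concerns: string[];
}

type Reconciliation = "accept" | "reject" | "send-back";

// Step 5 sketch.
function reconcile(report: WorkerReport, allowedFiles: Set<string>): Reconciliation {
  // Any edit outside the task's allowed files is scope creep: reject it.
  if (report.filesChanged.some((f) => !allowedFiles.has(f))) return "reject";
  // No tests reported means no proof: send back and ask for verification.
  if (report.testsRun.length === 0) return "send-back";
  // Concerns are surfaced to the parent for judgment, not auto-decided here.
  return "accept";
}

interface GateState {
  isolationPassed: boolean;
  proofPassed: boolean;
  reviewPassed: boolean;
}

// Step 2: no mutation before the isolation gate passes.
const mayMutate = (g: GateState): boolean => g.isolationPassed;
// Step 7: ship only after proof and review gates pass.
const mayShip = (g: GateState): boolean => g.proofPassed && g.reviewPassed;
```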
|
|
108
|
+
|
|
109
|
+
## Prompt contract for delegated work
|
|
110
|
+
|
|
111
|
+
Every subagent prompt should include:
|
|
112
|
+
|
|
113
|
+
- exact task scope
|
|
114
|
+
- files or areas allowed to change/read
|
|
115
|
+
- protected files
|
|
116
|
+
- expected output artifact or report
|
|
117
|
+
- reasoning level requested if the host cannot enforce it
|
|
118
|
+
- verification command expected
|
|
119
|
+
- instruction not to broaden scope
|
|
120
|
+
- timeout or stopping condition when the host supports it
|
|
121
|
+
|
|
122
|
+
For read-only subagents, explicitly say: "Do not edit files." For review subagents, explicitly say: "Return findings only; do not fix."
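A prompt builder makes it harder to forget a contract item. This sketch assumes a hypothetical `SubagentContract` shape with one field per bullet. Optional fields are emitted only when set, so `reasoning` can be omitted on hosts that enforce it natively. The read-only and review sentences use the exact phrasings the contract requires.

```typescript
// Hypothetical contract shape: one field per bullet in the prompt contract.
interface SubagentContract {
  scope: string;
  inScope: string[];
  protectedFiles: string[];
  expectedArtifact: string;
  verification: string;
  reasoning?: string; // set only when the host cannot enforce it
  stopCondition?: string; // set only when the host supports it
  kind: "worker" | "read-only" | "review";
}

function buildSubagentPrompt(c: SubagentContract): string {
  const lines = [
    `Task scope: ${c.scope}`,
    `Files or areas in scope: ${c.inScope.join(", ")}`,
    `Protected files (never modify): ${c.protectedFiles.join(", ") || "none"}`,
    `Expected output: ${c.expectedArtifact}`,
    `Verification: ${c.verification}`,
    "Do not broaden the scope beyond this task.",
  ];
  if (c.reasoning) lines.push(`Use ${c.reasoning} reasoning.`);
  if (c.stopCondition) lines.push(`Stop when: ${c.stopCondition}`);
  // Exact phrasings required for read-only and review subagents.
  if (c.kind === "read-only") lines.push("Do not edit files.");
  if (c.kind === "review") lines.push("Do not edit files.", "Return findings only; do not fix.");
  return lines.join("\n");
}
```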
|
|
123
|
+
|
|
124
|
+
## Subagent result handling
|
|
125
|
+
|
|
126
|
+
Treat subagent output as evidence, not truth. The parent remains responsible for verification and final routing.
|
|
127
|
+
|
|
128
|
+
- Timeout or no response: mark the subtask incomplete, preserve any partial logs, and either finish inline or re-delegate with a smaller scope if the remaining work is still worth the overhead.
|
|
129
|
+
- Bad output or scope creep: reject the result, record why, and re-delegate only with a tighter prompt, protected-file list, and explicit expected artifact. Otherwise do it inline.
|
|
130
|
+
- Partial completion: accept only independently verified pieces; carry unfinished work in the handoff packet with files touched, tests run, and remaining risks.
|
|
131
|
+
- Conflicting outputs: compare evidence and reproduction/proof commands first. If both are plausible and risk is material, ask an oracle/reviewer for read-only arbitration or route to `debug`; do not merge contradictory fixes blindly.
|
|
132
|
+
- Failed verification: route through the appropriate Keystone module (`debug` for unexplained failures, `build` for contained fixes, `review` for risk assessment) and rerun the original proof before continuing.
|
|
133
|
+
|
|
134
|
+
Re-delegate only when the next prompt can be narrower than the failed one, the expected artifact is concrete, and the benefit still exceeds coordination cost. Otherwise continue inline and record the reason.
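The failed-verification routing and the re-delegation test can be expressed as two small functions. This sketch uses hypothetical names. Two things are assumptions made only so the three re-delegation conditions can be checked: scope is measured as a number (files or tasks), and benefit and cost share arbitrary units.

```typescript
type VerificationFailure = "unexplained" | "contained-fix" | "risk-assessment";
type RouteTarget = "debug" | "build" | "review";

// Routing for failed verification, per the result-handling rules.
const FAILURE_ROUTE: Record<VerificationFailure, RouteTarget> = {
  unexplained: "debug",
  "contained-fix": "build",
  "risk-assessment": "review",
};

function routeFailedVerification(f: VerificationFailure): RouteTarget {
  return FAILURE_ROUTE[f];
}

// Hypothetical summary of a candidate re-delegation.
interface RedelegationPlan {
  previousScopeSize: number; // e.g. files or tasks in the failed prompt
  nextScopeSize: number;
  artifactIsConcrete: boolean;
  estimatedBenefit: number; // same arbitrary units as cost
  estimatedCoordinationCost: number;
}

// All three conditions must hold; otherwise continue inline and record why.
function shouldRedelegate(p: RedelegationPlan): boolean {
  return (
    p.nextScopeSize < p.previousScopeSize &&
    p.artifactIsConcrete &&
    p.estimatedBenefit > p.estimatedCoordinationCost
  );
}
```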
|
|
@@ -0,0 +1,86 @@
|
|
|
1
|
+
# Keystone Research Module
|
|
2
|
+
|
|
3
|
+
## Core principle
|
|
4
|
+
Research is evidence gathering before action. Inspect available material first, preserve source quality, separate facts from assumptions, and do not mutate the project unless the user explicitly asks for a durable research artifact.
|
|
5
|
+
|
|
6
|
+
## Load when
|
|
7
|
+
Load when the user asks to read, inspect, summarize, inventory, extract, compare, explain, investigate options, gather technical or market context, validate claims, or answer “what is true here?” before a decision.
|
|
8
|
+
|
|
9
|
+
## Not for
|
|
10
|
+
- Implementing, refactoring, editing, or fixing code.
|
|
11
|
+
- Shaping product direction beyond evidence-backed options.
|
|
12
|
+
- Broad tooling risk audits; use `health`.
|
|
13
|
+
- Root-cause repair of a failure; use `debug` after initial context.
|
|
14
|
+
- Guessing when evidence can be inspected.
|
|
15
|
+
|
|
16
|
+
## Outcome contract
|
|
17
|
+
Deliver a research brief that states:
|
|
18
|
+
- question or decision being supported;
|
|
19
|
+
- sources inspected, with file paths, commands, URLs, or other citations;
|
|
20
|
+
- source-quality notes (primary vs secondary, current vs stale, authoritative vs anecdotal);
|
|
21
|
+
- findings separated from assumptions and unknowns;
|
|
22
|
+
- confidence level and why;
|
|
23
|
+
- recommended next module or no-op if no action is warranted.
|
|
24
|
+
|
|
25
|
+
## Modes
|
|
26
|
+
- **Repository read:** inspect files, history, configs, tests, docs, and existing behavior. Prefer primary project evidence.
|
|
27
|
+
- **External research:** compare outside documentation, standards, issues, market examples, or APIs. Cite URLs and note recency.
|
|
28
|
+
- **Synthesis:** combine several sources into a decision-ready summary with tradeoffs and confidence.
|
|
29
|
+
- **Discovery scout:** map a large unknown area without drawing strong conclusions until evidence is sampled.
|
|
30
|
+
|
|
31
|
+
## Process
|
|
32
|
+
1. Restate the research question and the decision it informs.
|
|
33
|
+
2. Inspect before asking: search/read the repo, docs, logs, or provided sources before requesting more context.
|
|
34
|
+
3. Prefer primary evidence: source code, tests, product docs, official docs, reproducible commands, direct user-provided material.
|
|
35
|
+
4. Track citations as you go. Every important claim should point to evidence or be labeled as an assumption.
|
|
36
|
+
5. Evaluate source quality: age, authority, completeness, bias, and whether evidence is direct or inferred.
|
|
37
|
+
6. Compare alternatives when relevant, including costs, risks, constraints, and no-op implications.
|
|
38
|
+
7. State unknowns explicitly. Do not fill gaps with confident-sounding speculation.
|
|
39
|
+
8. Recommend the smallest next step: `shape`, `debug`, `health`, `breakdown`, `build`, `review`, or stop.
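Step 4's rule (every important claim points to evidence or is labeled as an assumption) is mechanical enough to check before writing the brief. This minimal sketch uses hypothetical names: the `Claim` shape, the four claim kinds, and `renderFindings`. Citations are free-form strings such as file paths, commands, or URLs.

```typescript
type ClaimKind = "fact" | "interpretation" | "assumption" | "unknown";

interface Claim {
  text: string;
  kind: ClaimKind;
  citations: string[]; // file paths, commands, URLs
}

// Facts and interpretations need at least one citation;
// assumptions and unknowns are acceptable because they are labeled.
function uncitedClaims(claims: Claim[]): Claim[] {
  return claims.filter(
    (c) => (c.kind === "fact" || c.kind === "interpretation") && c.citations.length === 0,
  );
}

function renderFindings(claims: Claim[]): string {
  const problems = uncitedClaims(claims);
  if (problems.length > 0) {
    throw new Error(`uncited claims: ${problems.map((c) => c.text).join("; ")}`);
  }
  return claims
    .map((c) => {
      const cite = c.citations.length > 0 ? ` (${c.citations.join(", ")})` : "";
      return `- ${c.kind}: ${c.text}${cite}`;
    })
    .join("\n");
}
```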
|
|
40
|
+
|
|
41
|
+
## Subagents and reasoning
|
|
42
|
+
Default reasoning: `medium`. Use read-only scout subagents when the search space is large or evidence can be gathered independently. Use `low` for narrow file summaries. Use `high` when findings affect architecture, security, safety, release decisions, legal/market claims, or irreversible product direction. Subagents must remain read-only unless the user requested an artifact.
|
|
43
|
+
|
|
44
|
+
## Hard rules
|
|
45
|
+
- No mutation by default: do not edit files, run formatters, or alter state except harmless read-only commands.
|
|
46
|
+
- Cite evidence for material claims; if evidence is unavailable, say so.
|
|
47
|
+
- Distinguish facts, interpretations, assumptions, and recommendations.
|
|
48
|
+
- Do not ask for information that can be inspected first.
|
|
49
|
+
- Do not present search results or model knowledge as authoritative without source-quality caveats.
|
|
50
|
+
|
|
51
|
+
## Failure modes
|
|
52
|
+
- **Context theater:** long summaries without citations or decision relevance.
|
|
53
|
+
- **Source laundering:** treating blogs, stale docs, or guesses as facts.
|
|
54
|
+
- **Premature shaping:** deciding product behavior before evidence is clear.
|
|
55
|
+
- **Mutation creep:** “just fixing” or rewriting while researching.
|
|
56
|
+
- **Hidden uncertainty:** omitting confidence, unknowns, or contradictory evidence.
|
|
57
|
+
|
|
58
|
+
## Worked example
|
|
59
|
+
Good research finding: “Official Stripe docs show idempotency keys apply per unique key and preserve the first result, including failures; this means retrying payment capture should reuse the original key, not generate a new one. Confidence: High — primary docs, current page.”
|
|
60
|
+
|
|
61
|
+
Bad research finding: “Stripe probably handles retries safely, so we can just retry the request.”
|
|
62
|
+
|
|
63
|
+
## Output format
|
|
64
|
+
```markdown
|
|
65
|
+
## Research brief
|
|
66
|
+
Question: ...
|
|
67
|
+
|
|
68
|
+
### Evidence inspected
|
|
69
|
+
- `path/or/source`: what it shows, quality note
|
|
70
|
+
|
|
71
|
+
### Findings
|
|
72
|
+
- Fact — citation
|
|
73
|
+
- Interpretation — citation + reasoning
|
|
74
|
+
|
|
75
|
+
### Assumptions / unknowns
|
|
76
|
+
- ...
|
|
77
|
+
|
|
78
|
+
### Options or implications
|
|
79
|
+
- ...
|
|
80
|
+
|
|
81
|
+
### Confidence
|
|
82
|
+
High/Medium/Low — why
|
|
83
|
+
|
|
84
|
+
### Recommended next step
|
|
85
|
+
Module or no-op, with rationale
|
|
86
|
+
```
|
|
@@ -0,0 +1,270 @@
|
|
|
1
|
+
# Keystone Review Module
|
|
2
|
+
## Core principle
|
|
3
|
+
Review is an independent, read-only attempt to disprove readiness.
|
|
4
|
+
|
|
5
|
+
Ask two questions at the same time:
|
|
6
|
+
1. **Spec axis:** does the work satisfy the stated requirements and acceptance criteria?
|
|
7
|
+
2. **Standards axis:** is it secure, correct, maintainable, tested, and safe to operate?
|
|
8
|
+
|
|
9
|
+
Do not assume changed lines are the blast radius. Trace callers, callees, contracts,
|
|
10
|
+
data flow, tests, runtime paths, and user impact before giving a verdict.
|
|
11
|
+
|
|
12
|
+
## Load when
|
|
13
|
+
Load when the user asks for code review, critique, audit, readiness assessment,
|
|
14
|
+
release/merge review, security review, regression review, or review of a diff, branch,
|
|
15
|
+
PR, patch, migration, fix, plan output, or completed implementation.
|
|
16
|
+
|
|
17
|
+
Also load when another Keystone module needs `gates/review.md` satisfied before ship.
|
|
18
|
+
|
|
19
|
+
## Not for
|
|
20
|
+
Do not use Review for:
|
|
21
|
+
- fixing, refactoring, formatting, or rewriting code
|
|
22
|
+
- committing, merging, tagging, publishing, or shipping
|
|
23
|
+
- initial implementation planning before a reviewable artifact exists
|
|
24
|
+
- open-ended research with no concrete artifact to assess
|
|
25
|
+
- debugging where the requested outcome is a fix
|
|
26
|
+
|
|
27
|
+
If asked to review and fix, review first and stop. Hand findings to `build`, `debug`,
|
|
28
|
+
`research`, `ship`, or a human only after explicit permission.
|
|
29
|
+
|
|
30
|
+
## Outcome contract
|
|
31
|
+
A complete review returns:
|
|
32
|
+
- verdict: **Block**, **Caution**, or **Looks good**
|
|
33
|
+
- findings ordered P0, P1, P2, P3, then Nitpicks
|
|
34
|
+
- evidence for every finding: file/line, behavior path, contract, test, log, or doc
|
|
35
|
+
- user impact and why the severity is justified
|
|
36
|
+
- remediation guidance without applying the fix
|
|
37
|
+
- tests that should be added or updated for affected behavior
|
|
38
|
+
- scope reviewed, validation run, limitations, and read-only confirmation
|
|
39
|
+
|
|
40
|
+
The review is incomplete if it only inspects the diff, only comments on style, or
|
|
41
|
+
cannot explain how the work behaves at runtime.
|
|
42
|
+
|
|
43
|
+
## Review passes
|
|
44
|
+
Perform multiple passes. New evidence from one pass expands later passes.
|
|
45
|
+
### Pass 0: scope and baseline
|
|
46
|
+
- Identify artifact reviewed: diff, branch, files, release candidate, or plan result.
|
|
47
|
+
- Read the user request, issue, spec, acceptance criteria, and claimed completion.
|
|
48
|
+
- Check repository status without modifying files.
|
|
49
|
+
- Record uncommitted work as context; do not clean it up.
|
|
50
|
+
|
|
51
|
+
### Pass 1: spec compliance
|
|
52
|
+
- Compare implementation against explicit requirements and non-goals.
|
|
53
|
+
- Check edge cases, error states, and acceptance criteria.
|
|
54
|
+
- Separate spec misses from standards concerns.
|
|
55
|
+
- Treat a clean implementation of the wrong behavior as a finding.
|
|
56
|
+
|
|
57
|
+
### Pass 2: correctness and runtime paths
|
|
58
|
+
- Trace primary success and failure paths end to end.
|
|
59
|
+
- Follow changed functions into helpers, services, adapters, persistence, UI, jobs, and
|
|
60
|
+
serializers.
|
|
61
|
+
- Validate inputs, outputs, invariants, state transitions, retries, ordering,
|
|
62
|
+
concurrency assumptions, and error propagation.
|
|
63
|
+
- Look for nullability, off-by-one, time, encoding, pagination, caching, idempotency,
|
|
64
|
+
cancellation, and partial-failure issues.
|
|
65
|
+
|
|
66
|
+
### Pass 3: regression and compatibility
|
|
67
|
+
- Identify callers, consumers, and workflows that rely on old behavior.
|
|
68
|
+
- Check public APIs, CLIs, schemas, migrations, persisted data, environment variables,
|
|
69
|
+
feature flags, configuration defaults, and documentation.
|
|
70
|
+
- Consider rollback, downgrade, mixed-version, and incremental rollout risks.
|
|
71
|
+
- Search for tests or fixtures that encode previous behavior.
|
|
72
|
+
|
|
73
|
+
### Pass 4: security, privacy, and abuse resistance
|
|
74
|
+
- Review authentication, authorization, tenancy, secrets, logging, validation,
|
|
75
|
+
injection, XSS, SSRF, path traversal, unsafe deserialization, and RCE surfaces.
|
|
76
|
+
- Check whether sensitive data leaks through errors, logs, telemetry, URLs, caches,
|
|
77
|
+
exports, screenshots, or third-party calls.
|
|
78
|
+
- Consider malicious users, compromised clients, replay, races, resource exhaustion,
|
|
79
|
+
privilege escalation, and denial of service.
|
|
80
|
+
|
|
81
|
+
### Pass 5: tests and proof
|
|
82
|
+
- Map changed behavior to existing tests.
|
|
83
|
+
- Identify missing unit, integration, contract, regression, migration, security,
|
|
84
|
+
accessibility, performance, or end-to-end coverage.
|
|
85
|
+
- Prefer behavior assertions over implementation trivia.
|
|
86
|
+
- Run focused read-only validation when practical: existing tests, type checks, lint,
|
|
87
|
+
builds, or targeted commands.
|
|
88
|
+
- If validation cannot run, state why and what should be run.
|
|
89
|
+
|
|
90
|
+
### Pass 6: maintainability and architecture
|
|
91
|
+
- Assess clarity, cohesion, naming, dependency direction, duplication, complexity,
|
|
92
|
+
observability, and debuggability.
|
|
93
|
+
- Check architectural boundaries, local conventions, and API contracts.
|
|
94
|
+
- Flag brittle abstractions, hidden coupling, unnecessary cleverness, and premature
|
|
95
|
+
generalization when they create real maintenance risk.
|
|
96
|
+
|
|
97
|
+
### Pass 7: user impact and final consistency
|
|
98
|
+
- Translate technical issues into affected personas, workflows, data, accessibility,
|
|
99
|
+
performance, reliability, and support burden.
|
|
100
|
+
- Re-rank findings by blast radius, likelihood, recoverability, and detectability.
|
|
101
|
+
- De-duplicate findings, verify evidence, and state limitations honestly.
|
|
102
|
+
|
|
103
|
+
## Severity rubric
|
|
104
|
+
Severity reflects realistic impact, not fix size.
|
|
105
|
+
|
|
106
|
+
### P0: Critical blocker
|
|
107
|
+
Immediate or likely severe harm. Examples:
|
|
108
|
+
- data loss, corruption, or irreversible destructive action
|
|
109
|
+
- unauthorized access, privilege escalation, secret exposure, or major privacy breach
|
|
110
|
+
- production outage or release artifact that cannot safely deploy
|
|
111
|
+
- legal/compliance risk with material impact
|
|
112
|
+
|
|
113
|
+
P0 means do not ship or merge without accountable human acceptance and mitigation.
|
|
114
|
+
|
|
115
|
+
### P1: Blocking defect
|
|
116
|
+
High-impact issue that violates core requirements or creates serious regression risk.
|
|
117
|
+
Examples:
|
|
118
|
+
- primary workflow broken for a meaningful user segment
|
|
119
|
+
- incorrect billing, permissions, persistence, or business logic
|
|
120
|
+
- migration or compatibility gap that can break real deployments
|
|
121
|
+
- high-risk behavior lacking tests plus a plausible failure mode
|
|
122
|
+
|
|
123
|
+
P1 normally blocks ship.
|
|
124
|
+
|
|
125
|
+
### P2: Important non-blocker or conditional blocker
|
|
126
|
+
Material issue with bounded impact, lower likelihood, or workaround. Examples:
|
|
127
|
+
- edge case with clear user impact
|
|
128
|
+
- moderate-risk test gap
|
|
129
|
+
- maintainability issue likely to cause near-term bugs
|
|
130
|
+
- weak observability for a risky path
|
|
131
|
+
|
|
132
|
+
State whether release context makes it blocking.
|
|
133
|
+
|
|
134
|
+
### P3: Low-risk improvement
|
|
135
|
+
Valid concern with limited impact. Examples:
|
|
136
|
+
- confusing name or local complexity that slows future work
|
|
137
|
+
- minor non-hot-path performance inefficiency
|
|
138
|
+
- incomplete docs for non-critical behavior
|
|
139
|
+
- small test organization weakness
|
|
140
|
+
|
|
141
|
+
P3 should not block unless it compounds with related risks.
|
|
142
|
+
|
|
143
|
+
### Nitpick
|
|
144
|
+
Cosmetic, preference-level, or optional feedback: unenforced formatting, wording tweaks,
|
|
145
|
+
or style suggestions with no correctness or maintainability impact. Keep nitpicks
|
|
146
|
+
separate from severity findings.
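How individual findings roll up into a verdict is left to reviewer judgment. The sketch below encodes one reading of the rubric, under these assumptions:

- P0 and P1 always yield **Block**, although the rubric says P1 only *normally* blocks.
- A P2 or P3 flagged as blocking in context also blocks.
- Any other P2 yields **Caution**.
- P3 findings or nitpicks alone yield **Looks good**.

`Finding`, `sortFindings`, and `verdictFor` are hypothetical names.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3" | "Nitpick";
type Verdict = "Block" | "Caution" | "Looks good";

interface Finding {
  title: string;
  severity: Severity;
  // Reviewer judgment: release context (P2) or compounding risk (P3) makes it blocking.
  blockingInContext?: boolean;
}

const SEVERITY_ORDER: Severity[] = ["P0", "P1", "P2", "P3", "Nitpick"];

// Order findings P0 first, nitpicks last, without mutating the input.
function sortFindings(findings: Finding[]): Finding[] {
  return [...findings].sort(
    (a, b) => SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity),
  );
}

function verdictFor(findings: Finding[]): Verdict {
  let caution = false;
  for (const f of findings) {
    if (f.severity === "P0" || f.severity === "P1") return "Block";
    if (f.blockingInContext && (f.severity === "P2" || f.severity === "P3")) return "Block";
    if (f.severity === "P2") caution = true;
  }
  return caution ? "Caution" : "Looks good";
}
```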
|
|
147
|
+
|
|
148
|
+
## Impact tracing
|
|
149
|
+
For each meaningful change, trace:
|
|
150
|
+
- **Entry points:** user action, API route, CLI, job, event, hook, or import.
|
|
151
|
+
- **Callers:** who invokes this and what assumptions they make.
|
|
152
|
+
- **Callees:** helpers, libraries, persistence, network calls, and side effects.
|
|
153
|
+
- **Data flow:** input, validation, transformation, storage, serialization, output.
|
|
154
|
+
- **Contracts:** types, schemas, public APIs, flags, config, docs, and errors.
|
|
155
|
+
- **Runtime paths:** success, failure, retry, timeout, cancellation, concurrency.
|
|
156
|
+
- **Tests:** existing coverage, missing assertions, fixtures, mocks, snapshots.
|
|
157
|
+
- **Users:** visible behavior, accessibility, performance, reliability, trust.
|
|
158
|
+
|
|
159
|
+
If tracing leaves uncertainty, gather more read-only evidence or report the limitation.
|
|
160
|
+
Do not invent confidence.
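Tracking the eight dimensions explicitly makes gaps visible, so they land in "Review limitations" rather than being glossed over. This sketch assumes a hypothetical `ImpactTrace` record that maps each dimension to evidence notes. An absent or empty entry counts as not traced.

```typescript
// The eight dimensions from the impact-tracing list.
const DIMENSIONS = [
  "entryPoints",
  "callers",
  "callees",
  "dataFlow",
  "contracts",
  "runtimePaths",
  "tests",
  "users",
] as const;
type Dimension = (typeof DIMENSIONS)[number];

// Evidence notes per dimension; missing or empty means not traced.
type ImpactTrace = Partial<Record<Dimension, string[]>>;

// Dimensions to report as limitations instead of guessing.
function untracedDimensions(trace: ImpactTrace): Dimension[] {
  return DIMENSIONS.filter((d) => (trace[d]?.length ?? 0) === 0);
}
```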
|
|
161
|
+
|
|
162
|
+
## Security and regression checklist
|
|
163
|
+
Ask for every non-trivial review:
|
|
164
|
+
- Can a user access, modify, infer, or delete data they should not?
|
|
165
|
+
- Are authn, authz, tenancy, and ownership checked at the right layer?
|
|
166
|
+
- Can untrusted input reach queries, interpreters, shells, paths, templates, redirects,
|
|
167
|
+
or deserializers unsafely?
|
|
168
|
+
- Are secrets, tokens, PII, or internal identifiers exposed in logs, errors, telemetry,
|
|
169
|
+
URLs, caches, or client bundles?
|
|
170
|
+
- Did defaults, permissions, feature flags, or safeguards become unsafe?
|
|
171
|
+
- Are races, duplicate submissions, retries, replay, and out-of-order events safe?
|
|
172
|
+
- Can persisted data be corrupted, stranded, or made hard to roll back?
|
|
173
|
+
- Are public APIs, stored data, configs, and integrations backward compatible?
|
|
174
|
+
- Does failure degrade safely without hidden partial success?
|
|
175
|
+
- Are performance, resource use, accessibility, localization, and platform differences
|
|
176
|
+
acceptable for realistic users and abuse?
|
|
177
|
+
- Do tests cover the affected behavior and important regression paths?
|
|
178
|
+
|
|
179
|
+
## Subagents and reasoning
|
|
180
|
+
Default reasoning: `high`.
|
|
181
|
+
|
|
182
|
+
Use read-only reviewer subagents for separable risks: security/privacy, test coverage,
|
|
183
|
+
architecture/API compatibility, persistence/migration, accessibility/user impact,
|
|
184
|
+
performance, concurrency, or release risk. Escalate to `xhigh` for security-sensitive,
|
|
185
|
+
data-loss, billing, permissions, public API, migration, or cross-system reviews.
|
|
186
|
+
|
|
187
|
+
Subagents must receive the read-only contract and return evidence-backed findings, not
|
|
188
|
+
patches. Reconcile duplicates and conflicts before reporting. The primary reviewer
|
|
189
|
+
owns final severity and verdict.
|
|
190
|
+
|
|
191
|
+
## Hard rules
|
|
192
|
+
- Stay read-only: do not edit, format, generate, stage, commit, merge, tag, publish,
|
|
193
|
+
or ship files.
|
|
194
|
+
- Do not silently fix issues discovered during review.
|
|
195
|
+
- Do not run destructive or project-mutating commands.
|
|
196
|
+
- Do not rely only on changed lines; inspect impacted code paths and contracts.
|
|
197
|
+
- Do not approve solely because tests pass.
|
|
198
|
+
- Do not report speculation as fact; mark uncertainty.
|
|
199
|
+
- Do not bury blockers under minor comments.
|
|
200
|
+
- Do not disguise style preferences as correctness findings.
|
|
201
|
+
- Do not omit needed tests when behavior changed.
|
|
202
|
+
- Do not satisfy `gates/review.md` unless blockers and non-blockers are separated.
|
|
203
|
+
|
|
204
|
+
## Failure modes
|
|
205
|
+
Avoid these anti-patterns:
|
|
206
|
+
- **Single-pass skim:** one read of changed lines plus generic comments.
|
|
207
|
+
- **Diff tunnel vision:** missing callers, callees, contracts, and user impact.
|
|
208
|
+
- **Checklist theater:** naming security/tests without tracing actual risk.
|
|
209
|
+
- **Green-test rubber stamp:** assuming current tests prove new behavior.
|
|
210
|
+
- **Spec blindness:** judging code quality while requirements are unmet.
|
|
211
|
+
- **Standards blindness:** accepting unsafe or fragile code because the narrow spec passes.
|
|
212
|
+
- **Severity inflation:** turning preferences into blockers.
|
|
213
|
+
- **Severity deflation:** downgrading real user harm because the fix is small.
|
|
214
|
+
- **Patch creep:** fixing, refactoring, or committing instead of reviewing.
|
|
215
|
+
- **Unowned uncertainty:** failing to state what was not verified.
|
|
216
|
+
|
|
217
|
+
## Output format
|
|
218
|
+
Worked finding example:
|
|
219
|
+
```markdown
|
|
220
|
+
### P1
|
|
221
|
+
- Missing tenant check on invoice export
|
|
222
|
+
- Evidence: `api/exportInvoice.ts:42` accepts `invoiceId` and loads the invoice without comparing `invoice.accountId` to the authenticated account; `/invoices/:id/export` is reachable by any logged-in user.
|
|
223
|
+
- Impact: A user who guesses another invoice ID can download billing data from a different account, which is a privacy and authorization breach.
|
|
224
|
+
- Recommendation: Enforce tenant ownership before export and return the existing unauthorized response on mismatch.
|
|
225
|
+
- Tests needed: Add an integration test where account A requests account B's invoice and receives 403/no file, plus a happy-path same-account export test.
|
|
226
|
+
```

Use this structure:
```markdown
## Verdict
Block | Caution | Looks good

## Scope reviewed
- Artifact reviewed:
- Key files/paths inspected:
- Validation run:
- Review limitations:
- Read-only confirmation: no files changed by this review

## Findings
### P0
- [Title]
- Evidence:
- Impact:
- Recommendation:
- Tests needed:

### P1
None

### P2
None

### P3
None

## Nitpicks
None

## Tests to add or update
- Behavior:
- Suggested coverage:
- Why it matters:

## Handoff
- Blockers:
- Non-blocking follow-up:
- Suggested owner module: build, debug, research, ship, or human
```

If a severity has no findings, write `None`. Recommendations must be actionable, but Review must not apply them.
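Every severity heading must appear even when it has no findings, so the Findings section can be generated mechanically. The following is a minimal TypeScript sketch. It assumes a hypothetical `Finding` record whose fields mirror the template labels; `renderFindings` is an illustrative name, not part of Keystone.

```typescript
// Severity levels, in the order the Review template lists them.
type Severity = "P0" | "P1" | "P2" | "P3";
const SEVERITIES: Severity[] = ["P0", "P1", "P2", "P3"];

// Hypothetical finding record; fields mirror the template labels.
interface Finding {
  severity: Severity;
  title: string;
  evidence: string;
  impact: string;
  recommendation: string;
  testsNeeded: string;
}

// Render "## Findings", writing `None` under every severity with no findings.
function renderFindings(findings: Finding[]): string {
  const lines: string[] = ["## Findings"];
  for (const severity of SEVERITIES) {
    lines.push("", `### ${severity}`);
    const matching = findings.filter((f) => f.severity === severity);
    if (matching.length === 0) {
      lines.push("None");
    }
    for (const f of matching) {
      lines.push(
        `- ${f.title}`,
        `- Evidence: ${f.evidence}`,
        `- Impact: ${f.impact}`,
        `- Recommendation: ${f.recommendation}`,
        `- Tests needed: ${f.testsNeeded}`,
      );
    }
  }
  return lines.join("\n");
}
```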

@@ -0,0 +1,36 @@

# Keystone Router Module

## Intent
Classify the user's request and select one Keystone primary module.

## Load when
The task is ambiguous, explicitly asks for routing, or starts from `/keystone` without a clear module fit.

## Allowed mutation
None, except writing a short routing decision in the conversation.

## Must not
Modify files, perform implementation, or expose internal modules as public slash commands.

## May call
One primary module after classification; gates only if that primary module requires them.

## Subagents and reasoning
Default reasoning: `low`. Do not deploy subagents for simple routing; ask one clarifying question if a safe single route is not clear. See `helpers/subagents.md`.

## Routing heuristics
Prefer the module indicated by the user's strongest current need, not the first verb alone. Weigh multiple signals together:
- Failure, error, broken behavior, repro, or "fix" language points to `debug` before implementation.
- Review, verify, approve, merge, or ship language points to `review` before release or cleanup.
- New capability requests route by maturity: use `shape` when intent is fuzzy, `breakdown` when the outcome is clear but work needs decomposition, and `build` when scope and acceptance criteria are already concrete.

## Disambiguation examples
- "debug this and fix it" -> select `debug`; establish cause before changing code.
- "review and ship" -> select `review`; verify readiness before any shipping step.
- "add feature" -> select `shape`, `breakdown`, or `build` based on maturity; ask one concise clarifying question if maturity is not inferable.
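The heuristics and examples above can be approximated with a keyword-based sketch. This is an illustration built on assumptions, not Keystone's actual router. The following are all hypothetical:

- the signal word stems;
- the priority of `debug` over `review` when both appear;
- the caller-supplied `maturity` hint.

The return type permits exactly one module or one clarifying question, never both.

```typescript
type PrimaryModule = "debug" | "review" | "shape" | "breakdown" | "build";

// Hypothetical maturity hint for new-capability requests.
type Maturity = "fuzzy" | "clear-outcome" | "concrete";

// Exactly one module, or exactly one clarifying question.
type RouteDecision =
  | { kind: "module"; module: PrimaryModule; reason: string }
  | { kind: "clarify"; question: string };

// Hypothetical word stems; each must match at the start of a word.
const DEBUG_SIGNALS = ["debug", "fail", "error", "broken", "repro", "fix"];
const REVIEW_SIGNALS = ["review", "verify", "approve", "merge", "ship"];

function hasSignal(text: string, stems: string[]): boolean {
  return stems.some((stem) => new RegExp("\\b" + stem, "i").test(text));
}

function routeRequest(request: string, maturity?: Maturity): RouteDecision {
  // Assumed priority: failure language first, so cause is found before code changes.
  if (hasSignal(request, DEBUG_SIGNALS)) {
    return { kind: "module", module: "debug", reason: "Failure language: establish cause before changing code." };
  }
  if (hasSignal(request, REVIEW_SIGNALS)) {
    return { kind: "module", module: "review", reason: "Review language: verify readiness before any shipping step." };
  }
  // New capability: route by maturity when it is known.
  if (maturity === "fuzzy") {
    return { kind: "module", module: "shape", reason: "Intent is still fuzzy." };
  }
  if (maturity === "clear-outcome") {
    return { kind: "module", module: "breakdown", reason: "Outcome is clear but the work needs decomposition." };
  }
  if (maturity === "concrete") {
    return { kind: "module", module: "build", reason: "Scope and acceptance criteria are already concrete." };
  }
  // Maturity not inferable: ask one concise question instead of guessing.
  return { kind: "clarify", question: "Is the scope already defined with acceptance criteria, or does the idea still need shaping?" };
}
```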

## Handoff
Name the selected primary module and the reason in one sentence, then continue under that module's contract.

## Exit gate
Exactly one primary module is selected, or one clarifying question is asked.

@@ -0,0 +1,125 @@

# Keystone Shape Module

## Core principle
Shape is a specification algorithm: turn an unclear intent into exact behavior, constraints, tradeoffs, and acceptance criteria before anyone builds. It decides what should be true, not whether code is complete.

## Load when
Load when the user asks to draft, rewrite, design, spec, define product behavior, improve UI/UX, choose visual direction, name or explain a feature, make scope or architecture tradeoffs, prepare acceptance criteria, or turn research into an implementation-ready direction.

## Not for
- Writing implementation code or changing runtime behavior; hand off to `build`.
- Diagnosing failures; use `debug`.
- Broad repository health or release readiness; use `health` or `ship`.
- Inventing facts that should be researched first.
- Polishing completed work as shippable proof.

## Outcome contract
Deliver a shaped proposal that includes:
- goal, user/audience, and success criteria;
- product behavior and UX states, including happy, empty, loading/pending, error/failure, and edge/constraint states where relevant;
- copy or content direction when user-facing text matters;
- architecture and scope tradeoffs at the level needed for planning, not implementation;
- alternatives considered and why one direction is preferred;
- acceptance criteria and non-goals;
- recommended next module (`breakdown`, `build`, `review`, `research`, or no-op).

## Modes
- **Product shape:** specify the user job, trigger, actor permissions, core flow, business rules, constraints, success metrics, non-goals, and acceptance criteria.
- **UX/UI shape:** specify layout hierarchy, navigation, interaction model, responsiveness, accessibility, visual constraints, and the 5-state UX checklist: happy, empty, loading/pending, error/failure, edge/constraint.
- **Copy shape:** specify audience, message hierarchy, claims, tone, CTA, labels, empty/error text, and prohibited vague claims.
- **Technical shape:** specify boundary placement, API granularity, data flow, state ownership, dependency direction, persistence/integration seams, and architectural tradeoffs without writing code.
- **Alternative exploration:** present multiple viable directions before choosing or asking the user to choose.
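One way to stop the 5-state UX checklist from being skipped is to treat it as a completeness check over a draft spec. The following is a sketch. It assumes a hypothetical `UxStateSpec` map from state name to described behavior. A state that does not apply can still be filled with an explicit "not applicable" reason, which passes the check.

```typescript
// The five checklist states: "loading" covers loading/pending, "error" covers
// error/failure, and "edge" covers edge/constraint.
type UxState = "happy" | "empty" | "loading" | "error" | "edge";
const UX_STATES: UxState[] = ["happy", "empty", "loading", "error", "edge"];

// Hypothetical spec shape: each state maps to its described behavior, if any.
type UxStateSpec = Partial<Record<UxState, string>>;

// Return the states that a UX/UI shape has left unspecified or blank.
function missingUxStates(spec: UxStateSpec): UxState[] {
  return UX_STATES.filter((state) => {
    const behavior = spec[state];
    return behavior === undefined || behavior.trim() === "";
  });
}
```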

## Process
1. Classify the request into one or more modes: product, UX/UI, copy, technical, or alternatives.
2. Identify the goal, primary user/audience, job-to-be-done, context of use, and success criteria.
3. Inspect existing product patterns, domain language, and provided material before inventing new conventions.
4. Convert intent into exact rules:
   - Product: actor, trigger, preconditions, action, result, permissions, limits, and measurable success.
   - UX/UI: screen/region hierarchy, controls, transitions, accessibility behavior, responsive behavior, and the 5-state UX checklist.
   - Copy: exact headline/body/CTA/error text or content rules, with claims grounded in known facts.
   - Technical: components/modules involved, ownership boundaries, contracts, data flow, failure handling, and migration or rollout constraints.
5. Apply technical shaping heuristics when architecture matters:
   - **Boundary placement:** put boundaries where ownership, volatility, testability, or external systems change; do not split stable one-step logic.
   - **API granularity:** prefer operations that match caller intent; avoid both chatty micro-methods and god endpoints that hide unrelated behavior.
   - **Data flow:** name source of truth, state transitions, sync/async edges, validation points, and where errors surface.
   - **Architectural tradeoffs:** state what becomes simpler, harder, slower, safer, more testable, or more coupled.
6. Ban fluffy terms unless translated to behavior. Words like “modern,” “clean,” “intuitive,” “delightful,” “seamless,” or “user-friendly” must become observable rules.
7. Offer alternatives when the direction is not obvious. Include the no-op option if legitimate.
8. Convert the chosen direction into acceptance criteria that can be implemented and reviewed.
9. Stop at the spec boundary. If the user asks for design plus build, finish Shape with the spec and a recommended handoff to `build`; do not implement code.
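Step 6 can be partly mechanized as a lint pass over a draft spec. The following sketch only flags the listed descriptors as candidates for translation or removal. It cannot tell whether a flagged word is already backed by an observable rule. `findFluffyTerms` is a hypothetical name.

```typescript
// Descriptors named in step 6 that must become observable rules.
const FLUFFY_TERMS = ["modern", "clean", "intuitive", "delightful", "seamless", "user-friendly"];

// Return each listed term that appears as a whole word, ignoring case.
// None of the terms contain regex metacharacters, so no escaping is needed.
function findFluffyTerms(specText: string): string[] {
  return FLUFFY_TERMS.filter((term) => new RegExp("\\b" + term + "\\b", "i").test(specText));
}
```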

## Subagents and reasoning
Default reasoning: `medium`. Use writer, UI, design, or architecture subagents for bounded alternatives, critique, or parallel concepts. Use `high` for multi-screen flows, accessibility-sensitive experiences, design-system impact, pricing/positioning, architecture boundaries, or major scope decisions. Subagents should produce options or critique, not unrequested implementation.

## Hard rules
- Shape is not build: do not edit production code or runtime behavior.
- If the user asks for design and implementation together, Shape stops after the specification and hands off to `build`.
- Ground claims in research or existing product evidence; call `research` when facts are missing.
- Always identify user/audience and success criteria for product-facing work.
- Include acceptance criteria before handing off to implementation.
- Translate fluffy descriptors into exact behavior; otherwise remove them.
- Do not dodge the no-op: if the best answer is “do nothing” or “decide later,” say so and state the criteria.

## Failure modes
- **Abstract advice:** principles without actors, states, rules, tradeoffs, or acceptance criteria.
- **Pretty but unusable:** visual ideas without behavior, states, or acceptance criteria.
- **Fluffy spec:** “modern/user-friendly” language without exact behavior.
- **Spec as proof:** implying a design solves the problem before implementation or validation.
- **Audience blur:** writing for everyone and satisfying no one.
- **Scope fog:** hiding hard tradeoffs until build time.
- **Premature code:** implementing while still deciding what should exist.

## Examples
Good product shape: “When a workspace has no projects, show an empty state with the title ‘Create your first project,’ a one-sentence explanation, a primary ‘New project’ CTA, and no table chrome. Success: first project creation rate increases.”
Bad product shape: “Make the dashboard more useful and modern.”

Good UX/UI shape: “On save, disable the Save button, keep the form’s editable fields visible, show inline progress text ‘Saving…’, then restore focus to the first invalid field on failure.”
Bad UX/UI shape: “Use a clean, user-friendly save experience.”
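The good UX/UI example is precise enough to model directly as state transitions. The sketch below uses hypothetical names throughout. The example does not specify two behaviors, so these are assumptions:

- re-enabling Save after a failure;
- the `saveSucceeded` branch.

No state hides the form, so its editable fields stay visible while saving.

```typescript
type SaveStatus = "idle" | "saving" | "failed";

// Hypothetical view state for the save flow in the good UX/UI example.
interface SaveViewState {
  status: SaveStatus;
  saveButtonDisabled: boolean;
  progressText: string | null; // inline progress text, if shown
  focusFieldId: string | null; // field that should receive focus, if any
}

type SaveEvent =
  | { type: "save" }
  | { type: "saveFailed"; invalidFieldIds: string[] }
  | { type: "saveSucceeded" };

function reduceSave(state: SaveViewState, event: SaveEvent): SaveViewState {
  if (event.type === "save") {
    // A disabled button cannot start a second save.
    if (state.status === "saving") return state;
    return { status: "saving", saveButtonDisabled: true, progressText: "Saving…", focusFieldId: null };
  }
  if (event.type === "saveFailed") {
    // Assumption: re-enable Save so the user can retry after fixing fields.
    const first = event.invalidFieldIds.length > 0 ? event.invalidFieldIds[0] : null;
    return { status: "failed", saveButtonDisabled: false, progressText: null, focusFieldId: first };
  }
  // Assumed success behavior: back to idle with nothing focused.
  return { status: "idle", saveButtonDisabled: false, progressText: null, focusFieldId: null };
}
```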

Good copy shape: “CTA says ‘Start free trial’ because billing is not required; avoid ‘Buy now.’ Error text names the failed action and recovery: ‘We couldn’t send the invite. Check the email address and try again.’”
Bad copy shape: “Use friendly copy that reduces friction.”

Good technical shape: “Keep validation in the domain service because API and background import both need it; expose one `createInvite` operation that returns accepted, duplicate, or invalid-email outcomes.”
Bad technical shape: “Add a helper/manager layer so the architecture is scalable.”
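The good technical example can be sketched as one domain operation with an explicit outcome type that both callers share. The following are all hypothetical, not part of the example:

- the class and caller names;
- the in-memory array standing in for storage;
- the simple email check and lowercasing;
- the HTTP status mapping.

```typescript
// One operation, three explicit outcomes, as the example describes.
type InviteOutcome =
  | { kind: "accepted"; email: string }
  | { kind: "duplicate"; email: string }
  | { kind: "invalid-email"; email: string };

// Hypothetical domain service that owns validation; the array stands in for storage.
class InviteService {
  private readonly invited: string[] = [];

  createInvite(rawEmail: string): InviteOutcome {
    // Assumed normalization so duplicates are detected case-insensitively.
    const email = rawEmail.trim().toLowerCase();
    // Deliberately simple shape check, not a full address validator.
    if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
      return { kind: "invalid-email", email };
    }
    if (this.invited.indexOf(email) !== -1) {
      return { kind: "duplicate", email };
    }
    this.invited.push(email);
    return { kind: "accepted", email };
  }
}

// API caller: maps outcomes to hypothetical HTTP status codes.
function handleInviteRequest(service: InviteService, body: { email: string }): number {
  const outcome = service.createInvite(body.email);
  if (outcome.kind === "accepted") return 201;
  if (outcome.kind === "duplicate") return 409;
  return 422;
}

// Background import caller: reuses the same validation, one outcome per row.
function importInvites(service: InviteService, emails: string[]): InviteOutcome[] {
  return emails.map((email) => service.createInvite(email));
}
```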

Worked technical shape: “For export retries, keep the queue worker as the owner of retry state, expose `requestExport(accountId, format)` from the API, persist `pending|running|failed|ready` status in `exports`, and surface failures through the existing job status endpoint. Tradeoff: one extra status read, but retry policy stays out of controllers and can be tested without HTTP.”
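The worked export example fixes a status set and an ownership split, so it can be sketched as an allowed-transition table. Only `requestExport(accountId, format)` and the four statuses come from the example. The following are assumptions:

- the transitions, including `failed` back to `pending` as the retry;
- `MAX_ATTEMPTS`;
- the in-memory `exportsTable`;
- the other helper names.

```typescript
type ExportStatus = "pending" | "running" | "failed" | "ready";

// Assumed transitions: the worker starts pending work, finishes it as ready or
// failed, and a retry sends a failed export back to pending.
const ALLOWED_TRANSITIONS: Record<ExportStatus, ExportStatus[]> = {
  pending: ["running"],
  running: ["ready", "failed"],
  failed: ["pending"],
  ready: [],
};

interface ExportRow {
  id: number;
  accountId: string;
  format: string;
  status: ExportStatus;
  attempts: number;
}

// In-memory stand-in for the `exports` table.
const exportsTable: ExportRow[] = [];

// API side: records the request only; it holds no retry policy.
function requestExport(accountId: string, format: string): ExportRow {
  const row: ExportRow = { id: exportsTable.length + 1, accountId, format, status: "pending", attempts: 0 };
  exportsTable.push(row);
  return row;
}

// Worker side: the only code that changes status.
function transition(row: ExportRow, next: ExportStatus): void {
  if (ALLOWED_TRANSITIONS[row.status].indexOf(next) === -1) {
    throw new Error(`illegal transition ${row.status} -> ${next}`);
  }
  if (next === "running") row.attempts += 1;
  row.status = next;
}

const MAX_ATTEMPTS = 3; // hypothetical retry limit

// Worker-owned retry policy, testable without HTTP.
function retryIfAllowed(row: ExportRow): boolean {
  if (row.status !== "failed" || row.attempts >= MAX_ATTEMPTS) return false;
  transition(row, "pending");
  return true;
}
```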

## Output format
```markdown
## Shaped direction
Goal: ...
Audience/user: ...
Mode(s): product | UX/UI | copy | technical | alternatives

### Proposed behavior / experience
- Actor/trigger/preconditions: ...
- Rules/results: ...

### UX states and copy
- Happy: ...
- Empty: ...
- Loading/pending: ...
- Error/failure: ...
- Edge/constraint: ...
- Key copy: ...

### Technical shape
- Boundaries/API/data flow: ...
- Tradeoffs: ...

### Scope and tradeoffs
- In: ...
- Out: ...
- Tradeoffs: ...

### Alternatives considered
- Option A: ...
- Option B/no-op: ...

### Acceptance criteria
- ...

### Recommended next step
Module and rationale
```