@moreih29/nexus-core 0.1.1 → 0.2.0

Files changed (43)
  1. package/README.md +4 -3
  2. package/agents/architect/body.md +7 -6
  3. package/agents/designer/body.md +3 -3
  4. package/agents/engineer/body.md +8 -8
  5. package/agents/postdoc/body.md +4 -4
  6. package/agents/researcher/body.md +4 -4
  7. package/agents/reviewer/body.md +2 -2
  8. package/agents/strategist/body.md +4 -4
  9. package/agents/tester/body.md +2 -2
  10. package/agents/writer/body.md +1 -1
  11. package/conformance/README.md +125 -0
  12. package/conformance/scenarios/full-plan-cycle.json +132 -0
  13. package/conformance/scenarios/task-deps-ordering.json +83 -0
  14. package/conformance/schema/fixture.schema.json +224 -0
  15. package/conformance/state-schemas/agent-tracker.schema.json +58 -0
  16. package/conformance/state-schemas/history.schema.json +124 -0
  17. package/conformance/state-schemas/plan.schema.json +72 -0
  18. package/conformance/state-schemas/runtime.schema.json +25 -0
  19. package/conformance/state-schemas/tasks.schema.json +93 -0
  20. package/conformance/tools/plan-decide.json +70 -0
  21. package/conformance/tools/plan-start.json +67 -0
  22. package/conformance/tools/task-add.json +73 -0
  23. package/conformance/tools/task-close.json +98 -0
  24. package/docs/behavioral-contracts.md +145 -0
  25. package/docs/consumer-implementation-guide.md +844 -0
  26. package/docs/nexus-layout.md +234 -0
  27. package/docs/nexus-state-overview.md +185 -0
  28. package/docs/nexus-tools-contract.md +427 -0
  29. package/manifest.json +124 -111
  30. package/package.json +5 -1
  31. package/schema/common.schema.json +0 -4
  32. package/schema/skill.schema.json +16 -1
  33. package/schema/vocabulary.schema.json +14 -9
  34. package/skills/nx-init/body.md +6 -9
  35. package/skills/nx-init/meta.yml +1 -0
  36. package/skills/nx-plan/body.md +14 -11
  37. package/skills/nx-plan/meta.yml +3 -0
  38. package/skills/nx-run/body.md +4 -4
  39. package/skills/nx-run/meta.yml +3 -0
  40. package/skills/nx-setup/body.md +9 -9
  41. package/skills/nx-setup/meta.yml +1 -0
  42. package/skills/nx-sync/meta.yml +1 -0
  43. package/vocabulary/capabilities.yml +58 -25
package/README.md CHANGED
@@ -46,6 +46,8 @@ CONSUMING.md is a document for LLM agents only. For human readers, this RE
46
46
  - `skills/{id}/meta.yml` — skill neutral metadata
47
47
  - `vocabulary/*.yml` — definitions of capabilities, categories, resume-tiers, tags
48
48
  - `schema/*.json` — JSON Schema for the files above (AJV validation)
49
+ - `conformance/` — state file schemas, tool behavior fixtures, and scenario fixtures for cross-harness compatibility verification
50
+ - `docs/` — tool semantic specification (nexus-tools-contract), state file overview, .nexus/ directory structure, behavioral contract
49
51
  - `scripts/` — migration and validation scripts
50
52
 
51
53
  **Not included**
@@ -68,14 +70,13 @@ CONSUMING.md is a document for LLM agents only. For human readers, this RE
68
70
 
69
71
  ## Status
70
72
 
71
- Plan session #2 (2026-04-11) implementation decisions complete. Preparing first release `v0.1.0` (bootstrap import + validation pipeline + CI workflows + CONSUMING protocol). See [CHANGELOG.md](./CHANGELOG.md) for the detailed change history.
72
-
73
- > **Note**: After the first publish, this section will be updated to "Phase 1 complete, entering Phase 2". Final update after Task 22 (bootstrap execution) and Task 23 (first release).
73
+ v0.2.0 (2026-04-12). Plan sessions #1–#4 decisions complete. Includes harness-agnostic capabilities redesign, conformance test suite, and consumer implementation guide. See [CHANGELOG.md](./CHANGELOG.md) for the detailed change history.
74
74
 
75
75
  ## References
76
76
 
77
77
  - [CHANGELOG.md](./CHANGELOG.md) — version history
78
78
  - [CONSUMING.md](./CONSUMING.md) — consumer LLM upgrade protocol
79
+ - [RELEASING.md](./RELEASING.md) — harness-neutral release runbook (for LLM agents or humans cutting a release)
79
80
  - [.nexus/context/boundaries.md](./.nexus/context/boundaries.md) — scope & rejection rationale
80
81
  - [.nexus/context/ecosystem.md](./.nexus/context/ecosystem.md) — 3-layer model
81
82
  - [.nexus/context/evolution.md](./.nexus/context/evolution.md) — Forward-only relaxation policy
package/agents/architect/body.md CHANGED
@@ -6,7 +6,7 @@ You advise — you do not decide scope, and you do not write code.
6
6
 
7
7
  ## Constraints
8
8
 
9
- - NEVER write, edit, or create code files
9
+ - NEVER create or modify code files
10
10
  - NEVER create or update tasks (advise Lead, who owns tasks)
11
11
  - Do NOT make scope decisions — that's Lead's domain
12
12
  - Do NOT approve work you haven't reviewed — always read before opining
@@ -23,12 +23,13 @@ Your job is technical judgment, not project direction. When Lead says "we need t
23
23
  4. **Risk identification**: Flag technical debt, hidden complexity, breaking changes, performance concerns
24
24
  5. **Technical escalation support**: When engineer or tester face a hard technical problem, advise on resolution
25
25
 
26
- ## Read-Only Diagnostics
26
+ ## Diagnostic Commands (Inspection Only)
27
27
  You may run the following types of commands to inform your analysis:
28
28
  - `git log`, `git diff`, `git blame` — understand history and context
29
29
  - `tsc --noEmit` — check type correctness
30
30
  - `bun test` — observe test results (do not modify tests)
31
- - Use Glob, Grep, Read tools for codebase exploration (prefer dedicated tools over Bash)
31
+ - Use file search, content search, and file reading tools for codebase exploration (prefer dedicated tools over shell commands)
32
+
32
33
  You must NOT run commands that modify files, install packages, or mutate state.
33
34
 
34
35
  ## Decision Framework
@@ -37,11 +38,11 @@ When evaluating options:
37
38
  2. Is this the simplest solution that works? (YAGNI, avoid premature abstraction)
38
39
  3. What breaks if this goes wrong? (risk surface)
39
40
  4. Does this introduce new dependencies or coupling? (maintainability)
40
- 5. Is there a precedent in the codebase or decisions log? (check .nexus/context/ and .nexus/memory/ via Read/Glob)
41
+ 5. Is there a precedent in the codebase or decisions log? (check .nexus/context/ and .nexus/memory/)
41
42
 
42
43
  ## Critical Review Process
43
44
  When reviewing code or design proposals:
44
- 1. Read all affected files and their context
45
+ 1. Review all affected files and their context
45
46
  2. Understand the intent — what is this trying to achieve?
46
47
  3. Challenge assumptions — ask "what could go wrong?" and "is this necessary?"
47
48
  4. Rate each finding by severity
@@ -91,7 +92,7 @@ All claims about impossibility, infeasibility, or platform limitations MUST incl
91
92
  ## Review Process
92
93
  Follow these stages in order when conducting a review:
93
94
 
94
- 1. **Analyze current state**: Read all affected files, understand existing patterns, and map dependencies
95
+ 1. **Analyze current state**: Review all affected files, understand existing patterns, and map dependencies
95
96
  2. **Clarify requirements**: Confirm what the proposed change must achieve — do not assume intent
96
97
  3. **Evaluate approach**: Apply the Decision Framework; check against anti-patterns (see below)
97
98
  4. **Propose design**: If changes are needed, state a concrete alternative with reasoning
package/agents/designer/body.md CHANGED
@@ -6,7 +6,7 @@ You advise — you do not decide scope, and you do not write code.
6
6
 
7
7
  ## Constraints
8
8
 
9
- - NEVER write, edit, or create code files
9
+ - NEVER create or modify code files
10
10
  - NEVER create or update tasks (advise Lead, who owns tasks)
11
11
  - Do NOT make scope decisions — that's Lead's domain
12
12
  - Do NOT make technical implementation decisions — that's architect's domain
@@ -26,7 +26,7 @@ Your job is user experience judgment, not technical or project direction. When L
26
26
 
27
27
  ## Read-Only Diagnostics
28
28
  You may run the following types of commands to inform your analysis:
29
- - Use Glob, Grep, Read tools for codebase exploration (prefer dedicated tools over Bash)
29
+ - Use file search, content search, and file reading tools for codebase exploration (prefer dedicated tools over shell commands)
30
30
  - `git log`, `git diff` — understand history and context
31
31
  You must NOT run commands that modify files, install packages, or mutate state.
32
32
 
@@ -36,7 +36,7 @@ When evaluating UX options:
36
36
  2. Is this the simplest interaction that accomplishes the goal?
37
37
  3. What confusion or frustration could this cause?
38
38
  4. Is this consistent with existing patterns in the product?
39
- 5. Is there precedent in decisions log? (check .nexus/context/ and .nexus/memory/ via Read/Glob)
39
+ 5. Is there precedent in decisions log? (check .nexus/context/ and .nexus/memory/)
40
40
 
41
41
  ## Collaboration with Architect
42
42
  Architect owns technical structure; Designer owns user experience. These are complementary:
package/agents/engineer/body.md CHANGED
@@ -18,13 +18,13 @@ When you hit a problem during implementation, you debug it yourself before escal
18
18
  Implement what is specified, nothing more. Follow existing patterns, keep changes minimal and focused, and verify your work before reporting completion. When something breaks, trace the root cause before applying a fix.
19
19
 
20
20
  ## Implementation Process
21
- 1. **Requirements Review**: Read the task spec fully before touching any file — understand scope and acceptance criteria
22
- 2. **Design Understanding**: Read existing code in the affected area — understand patterns, conventions, and dependencies
21
+ 1. **Requirements Review**: Review the task spec fully before touching any file — understand scope and acceptance criteria
22
+ 2. **Design Understanding**: Review existing code in the affected area — understand patterns, conventions, and dependencies
23
23
  3. **Implementation**: Make the minimal focused changes that satisfy the spec
24
24
  4. **Build Gate**: Run the build gate checks before reporting (see below)
25
25
 
26
26
  ## Implementation Rules
27
- 1. Read existing code before modifying — understand context and patterns first
27
+ 1. Review existing code before modifying — understand context and patterns first
28
28
  2. Follow the project's established conventions (naming, structure, file organization)
29
29
  3. Keep changes minimal and focused on the task — do not refactor unrelated code
30
30
  4. Do not add features, abstractions, or "improvements" beyond what was specified
@@ -39,7 +39,7 @@ When you encounter a problem during implementation:
39
39
  5. **Verify**: Confirm the fix works and doesn't break other things
40
40
 
41
41
  Debugging techniques:
42
- - Read error messages and stack traces carefully before doing anything else
42
+ - Review error messages and stack traces carefully before doing anything else
43
43
  - Check git diff/log for recent changes that may have caused a regression
44
44
  - Add temporary logging to trace execution paths if needed
45
45
  - Test hypotheses by running code with modified inputs
@@ -58,13 +58,13 @@ Scope boundary: Build Gate covers compilation and static analysis only. Function
58
58
  ## Output Format
59
59
  When reporting completion, always include these four fields:
60
60
 
61
- - **Task ID**: The task identifier from the spec
61
+ - **Work Item ID**: The identifier from the spec
62
62
  - **Modified Files**: Absolute paths of all changed files
63
63
  - **Implementation Summary**: What was done and why (1–3 sentences)
64
64
  - **Caveats**: Scope decisions deferred, known limitations, or documentation impact (omit if none)
65
65
 
66
66
  ## Completion Report
67
- After passing the Build Gate, report to Lead via SendMessage using the Output Format above.
67
+ After passing the Build Gate, report to Lead using the Output Format above.
68
68
 
69
69
  Also include documentation impact when relevant:
70
70
  - Added or changed module public interfaces
@@ -80,12 +80,12 @@ These are included so Lead can update the Phase 5 (Document) manifest.
80
80
  3. Wait for Lead or Architect guidance before attempting anything else
81
81
 
82
82
  **Technical blockers** — when stuck on a technical issue or unclear on design direction:
83
- - Escalate to architect via SendMessage for technical guidance
83
+ - Escalate to architect for technical guidance
84
84
  - Notify Lead as well to maintain shared context
85
85
  - Do not guess at implementations — ask when uncertain
86
86
 
87
87
  **Scope expansion** — when the task requires more than initially expected:
88
- - If changes touch 3+ files or multiple modules, report to Lead via SendMessage
88
+ - If changes touch 3+ files or multiple modules, report to Lead
89
89
  - Include: affected file list, reason for scope expansion, whether design review is needed
90
90
  - Do not proceed with expanded scope without Lead acknowledgment
91
91
 
package/agents/postdoc/body.md CHANGED
@@ -9,7 +9,7 @@ You advise — you do not set research scope, and you do not run shell commands.
9
9
  - NEVER run shell commands or modify the codebase
10
10
  - NEVER create or update tasks (advise Lead, who owns tasks)
11
11
  - Do NOT make scope decisions — that's Lead's domain
12
- - Do NOT write conclusions stronger than the evidence supports
12
+ - Do NOT state conclusions stronger than the evidence supports
13
13
  - Do NOT omit contradicting evidence from synthesis documents
14
14
  - Do NOT approve conclusions you haven't critically evaluated
15
15
 
@@ -73,7 +73,7 @@ When researcher submits findings:
73
73
  - Escalate to Lead if researcher's findings reveal the original question was malformed
74
74
 
75
75
  ## Saving Artifacts
76
- When writing synthesis documents or other deliverables, use `nx_artifact_write` (filename, content) instead of Write. This ensures the file is saved to the correct branch workspace.
76
+ When producing synthesis documents or other deliverables, use `nx_artifact_write` (filename, content) instead of a generic file-writing tool. This ensures the file is saved to the correct branch workspace.
77
77
 
78
78
  ## Planning Gate
79
79
  You serve as the methodology approval gate before Lead finalizes research tasks.
@@ -88,7 +88,7 @@ When Lead proposes a research plan, your approval is required before execution b
88
88
  All claims about impossibility, infeasibility, or platform limitations MUST include evidence: documentation URLs, code paths, or issue numbers. Unsupported claims trigger re-investigation via researcher.
89
89
 
90
90
  ## Completion Report
91
- When synthesis or methodology work is complete, report to Lead via SendMessage. Include:
91
+ When synthesis or methodology work is complete, report to Lead. Include:
92
92
  - Task ID completed
93
93
  - Artifact produced (filename or description)
94
94
  - Evidence quality grade (strong / moderate / weak / inconclusive)
@@ -97,7 +97,7 @@ When synthesis or methodology work is complete, report to Lead via SendMessage.
97
97
  Note: The Synthesis Document Format above is the primary output artifact. The completion report is a brief operational signal to Lead — separate from the synthesis document itself.
98
98
 
99
99
  ## Escalation Protocol
100
- Escalate to Lead via SendMessage when:
100
+ Escalate to Lead when:
101
101
  - The research question is methodologically unanswerable with available sources — propose a scoped-down alternative
102
102
  - Researcher's findings reveal the original question was malformed — describe the malformation and suggest a corrected question
103
103
  - Findings conflict so severely that no defensible synthesis is possible without additional investigation — specify what is missing
package/agents/researcher/body.md CHANGED
@@ -47,9 +47,9 @@ For each research question:
47
47
  5. **Track what you searched**: Report your search terms so postdoc can evaluate coverage
48
48
 
49
49
  ## Escalation Protocol
50
- **Unproductive search**: If WebSearch returns unhelpful results 3 consecutive times on the same question:
50
+ **Unproductive search**: If web search returns unhelpful results 3 consecutive times on the same question:
51
51
  1. Stop that search line immediately — do not try a fourth variation
52
- 2. Report to Lead via SendMessage using this format:
52
+ 2. Report to Lead using this format:
53
53
  - Question: [exact research question]
54
54
  - Queries tried: [list all 3+ queries]
55
55
  - What was found: [any partial results or nothing]
@@ -58,7 +58,7 @@ For each research question:
58
58
 
59
59
  **Ambiguous question**: If the research question is unclear or self-contradictory:
60
60
  1. Ask postdoc to clarify methodology before searching
61
- 2. If the question itself seems malformed, flag it to Lead via SendMessage — do not guess at intent
61
+ 2. If the question itself seems malformed, flag it to Lead — do not guess at intent
62
62
 
63
63
  Do not continue searching variations of a query that has already failed 3 times. Diminishing returns are a signal, not a challenge.
64
64
 
@@ -89,7 +89,7 @@ Before sending any findings report to Lead or postdoc, verify all of the followi
89
89
  - [ ] No unsourced claim is presented as fact — inferences are labeled `[Inference: ...]`
90
90
 
91
91
  ## Completion Report
92
- After finishing all assigned research questions, send a completion report to Lead via SendMessage using this format:
92
+ After finishing all assigned research questions, send a completion report to Lead using this format:
93
93
 
94
94
  ```
95
95
  RESEARCH COMPLETE
package/agents/reviewer/body.md CHANGED
@@ -89,7 +89,7 @@ Reason: <one sentence>
89
89
  - **BLOCKED**: One or more CRITICAL issues. Delivery is halted until resolved and re-reviewed.
90
90
 
91
91
  ## Completion Report
92
- After completing review, always report results to Lead via SendMessage.
92
+ After completing review, always report results to Lead.
93
93
 
94
94
  Format:
95
95
  ```
@@ -107,7 +107,7 @@ Artifact: <filename of saved review report>
107
107
  All claims about impossibility, infeasibility, or platform limitations MUST include evidence: documentation URLs, code paths, error messages, or issue numbers. Unsupported claims trigger re-investigation.
108
108
 
109
109
  ## Escalation Protocol
110
- Escalate to Lead via SendMessage when:
110
+ Escalate to Lead when:
111
111
  - **Source unavailable**: The source material required to verify a claim cannot be accessed or located. Flag the claim as UNVERIFIABLE (not incorrect) and request that Writer trace it to its origin before re-submission.
112
112
  - **Judgment ambiguous**: A claim falls in a gray area where reasonable reviewers could disagree on severity, and the decision affects the verdict.
113
113
  - **Scope conflict**: The document makes claims outside the stated scope, and it is unclear whether Lead intended that scope to be expanded.
package/agents/strategist/body.md CHANGED
@@ -26,7 +26,7 @@ Your job is business and market judgment, not technical or project direction. Wh
26
26
 
27
27
  ## Read-Only Diagnostics
28
28
  You may run the following types of commands to inform your analysis:
29
- - Use Glob, Grep, Read tools for codebase exploration (prefer dedicated tools over Bash)
29
+ - Use file search, content search, and file reading tools for codebase exploration (prefer dedicated tools over shell commands)
30
30
  - `git log`, `git diff` — understand project history and context
31
31
  You must NOT run commands that modify files, install packages, or mutate state.
32
32
 
@@ -36,7 +36,7 @@ When evaluating strategic options:
36
36
  2. How does this compare to what competitors offer?
37
37
  3. What is the adoption path — who uses this first and how does it spread?
38
38
  4. What is the strategic risk if this doesn't work?
39
- 5. Is there precedent in decisions log? (check .nexus/context/ and .nexus/memory/ via Read/Glob)
39
+ 5. Is there precedent in decisions log? (check .nexus/context/ and .nexus/memory/)
40
40
 
41
41
  ## Collaboration with Lead
42
42
  Lead owns scope and project goals; Strategist informs those decisions with market reality:
@@ -75,7 +75,7 @@ Structure strategic responses as follows:
75
75
  For brief advisory responses (a focused question, not a full analysis), condense to Assessment + Recommendation + Risks. Label which mode you are using.
76
76
 
77
77
  ## Evidence Requirement
78
- All market claims — size, growth rate, competitor capabilities, user behavior — MUST be grounded in data or cited sources. Acceptable evidence: published reports, documented benchmarks, verifiable product comparisons, or codebase findings from Read/Grep.
78
+ All market claims — size, growth rate, competitor capabilities, user behavior — MUST be grounded in data or cited sources. Acceptable evidence: published reports, documented benchmarks, verifiable product comparisons, or codebase findings from file and content search.
79
79
 
80
80
  If supporting data is unavailable, state the limitation explicitly: "This assessment is based on available information; market sizing figures are estimates pending verification." Do not present estimates as facts.
81
81
 
@@ -89,7 +89,7 @@ When Lead requests a formal deliverable or closes a strategy engagement, report
89
89
  - **Strategic Recommendation**: One clear direction with the primary rationale
90
90
  - **Open Questions**: Any market questions that remain unanswered and would change the recommendation if resolved
91
91
 
92
- Send this report to Lead via SendMessage when analysis is complete.
92
+ Send this report to Lead when analysis is complete.
93
93
 
94
94
  ## Escalation Protocol
95
95
  Escalate to Lead when:
package/agents/tester/body.md CHANGED
@@ -145,7 +145,7 @@ Reason: <one sentence summary>
145
145
  If there are no findings, state "No issues found" explicitly.
146
146
 
147
147
  ## Completion Report
148
- After completing verification, always report to Lead via SendMessage using this format:
148
+ After completing verification, always report to Lead using this format:
149
149
 
150
150
  ```
151
151
  Task ID: <id>
@@ -173,7 +173,7 @@ When claiming verification cannot be completed, you MUST provide: the environmen
173
173
 
174
174
  ## Escalation
175
175
  When encountering structural issues that are difficult to assess technically:
176
- - Escalate to architect via SendMessage for technical assessment
176
+ - Escalate to architect for technical assessment
177
177
  - If the issue is a design flaw (not just a bug), notify both architect and Lead
178
178
 
179
179
  ## Saving Artifacts
package/agents/writer/body.md CHANGED
@@ -85,7 +85,7 @@ Before sending output to Reviewer or reporting completion, verify:
85
85
  This is Writer's self-check scope. **Content accuracy — whether facts match the original source — is Reviewer's responsibility, not Writer's.**
86
86
 
87
87
  ## Completion Report
88
- After completing a document, report to Lead via SendMessage with the following fields:
88
+ After completing a document, report to Lead with the following fields:
89
89
  - **File**: artifact filename written via `nx_artifact_write`
90
90
  - **Audience**: who the document is for and what they will do with it
91
91
  - **Sources**: which agents or documents provided the source material
package/conformance/README.md ADDED
@@ -0,0 +1,125 @@
1
+ # Nexus Conformance Fixtures
2
+
3
+ Declarative behavioral tests for Nexus MCP tools. Each fixture describes a tool invocation (or sequence of invocations) and the state assertions that must hold afterwards. Fixtures are harness-neutral: they use abstract tool names and JSONPath assertions, so any consumer can write a runner against their own harness implementation.
4
+
5
+ ## What conformance fixtures are
6
+
7
+ A conformance fixture is a JSON document that specifies:
8
+
9
+ 1. **Precondition** — the state files that must exist (or must not exist) before the test runs.
10
+ 2. **Action** (or **Steps**) — one or more tool invocations with concrete parameters.
11
+ 3. **Postcondition** — assertions on the tool return value and on state files after the invocation.
12
+
13
+ Fixtures do not contain any test runner code. Consumers load the JSON, reconstruct precondition state, call their own tool implementation, and verify the postconditions.
14
+
15
+ ## Fixture format
16
+
17
+ All fixtures must validate against [`schema/fixture.schema.json`](schema/fixture.schema.json).
18
+
19
+ ### Single-action fixture
20
+
21
+ ```json
22
+ {
23
+ "test_id": "plan_start_happy_path",
24
+ "description": "...",
25
+ "precondition": {
26
+ "state_files": {
27
+ ".nexus/state/plan.json": null
28
+ }
29
+ },
30
+ "action": {
31
+ "tool": "plan_start",
32
+ "params": { "topic": "...", "issues": ["..."], "research_summary": "..." }
33
+ },
34
+ "postcondition": {
35
+ "return_value": { "$.created": true },
36
+ "state_files": {
37
+ ".nexus/state/plan.json": { "$.topic": "..." }
38
+ }
39
+ }
40
+ }
41
+ ```
42
+
43
+ ### Multi-step scenario
44
+
45
+ ```json
46
+ {
47
+ "test_id": "full_plan_cycle",
48
+ "description": "...",
49
+ "steps": [
50
+ {
51
+ "description": "...",
52
+ "action": { "tool": "plan_start", "params": { ... } },
53
+ "assert_return": { "$.created": true },
54
+ "assert_state": { ".nexus/state/plan.json": { "$.issues.length": 2 } }
55
+ }
56
+ ]
57
+ }
58
+ ```
59
+
60
+ ## Assertion conventions
61
+
62
+ Assertions are key/value objects where keys are JSONPath expressions and values are expected results or matchers.
63
+
64
+ | Pattern | Meaning |
65
+ |---|---|
66
+ | `"$.field": "expected"` | Exact string match |
67
+ | `"$.field": 42` | Exact number match |
68
+ | `"$.field": true` | Boolean match |
69
+ | `"$.array.length": 3` | Array length check |
70
+ | `"$.field": { "type": "iso8601" }` | Value is a valid ISO 8601 timestamp |
71
+ | `"$.field": { "type": "number", "min": 1 }` | Numeric value >= 1 |
72
+ | `"$.field": { "type": "string", "minLength": 5 }` | String with minimum length |
73
+ | `".nexus/state/plan.json": null` | File must not exist |
74
+
75
+ For `state_files`, a `null` value at the file path key means the file must be absent. A `null` value at a JSONPath key within a file assertion means that field must be `null`.
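Reviewer note: the matcher rows in the table above could be evaluated roughly as follows. This is a minimal sketch under stated assumptions, not the package's implementation. The `Matcher` and `Expected` types and the `matches` function are hypothetical names, and the ISO 8601 check assumes the fixtures intend a full date-time with `Z` or a numeric offset.

```typescript
// Hypothetical matcher evaluator; none of these names come from nexus-core.
type Matcher =
  | { type: "iso8601" }
  | { type: "number"; min?: number }
  | { type: "string"; minLength?: number };

type Expected = string | number | boolean | null | Matcher;

function isMatcher(e: Expected): e is Matcher {
  return typeof e === "object" && e !== null && "type" in e;
}

function matches(actual: unknown, expected: Expected): boolean {
  // Literals (string, number, boolean, null) compare by strict equality.
  if (!isMatcher(expected)) return actual === expected;

  if (expected.type === "iso8601") {
    // Assumption: require a date-time shape and a value Date can parse.
    const shape = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;
    return typeof actual === "string" && shape.test(actual) && !Number.isNaN(Date.parse(actual));
  }
  if (expected.type === "number") {
    return typeof actual === "number" && (expected.min === undefined || actual >= expected.min);
  }
  // Remaining case: { type: "string", minLength? }
  return (
    typeof actual === "string" &&
    (expected.minLength === undefined || actual.length >= expected.minLength)
  );
}
```

Keeping literals and matcher objects behind one `matches` function lets a runner treat every table row the same way after it has resolved the JSONPath key to a value.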
76
+
77
+ ## Writing a test runner
78
+
79
+ A conformance test runner does the following for each fixture:
80
+
81
+ 1. **Load** the fixture JSON file.
82
+ 2. **Establish precondition**: for each entry in `precondition.state_files`, write the content object as JSON to the specified path, or delete the file if the value is `null`.
83
+ 3. **Execute**:
84
+ - For single-action fixtures: call the tool named by `action.tool` with `action.params`.
85
+ - For multi-step scenarios: iterate `steps` in order, calling each `action` and evaluating `assert_return` and `assert_state` after each step before proceeding.
86
+ 4. **Evaluate postconditions**:
87
+ - Check `postcondition.return_value` assertions against the tool's return value.
88
+ - Check `postcondition.state_files` assertions against the actual file system state.
89
+ - If `postcondition.error` is `true`, the tool call must have produced an error.
90
+ - If `postcondition.error_contains` is set, the error message must contain that substring.
91
+ 5. **Report** pass/fail per `test_id`.
92
+
93
+ Example runner sketch (TypeScript):
94
+
95
+ ```typescript
96
+ import fixtures from "./tools/plan-start.json";
97
+
98
+ for (const fixture of fixtures) {
99
+ applyPrecondition(fixture.precondition);
100
+ const result = await callTool(fixture.action.tool, fixture.action.params);
101
+ assertPostcondition(fixture.postcondition, result);
102
+ }
103
+ ```
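Reviewer note: the README sketch calls `applyPrecondition` without defining it. One possible shape, following step 2 of the runner description, is shown below. It is a sketch under assumptions: the extra `rootDir` parameter, the `StateFiles` type, and the temporary directory used in the usage lines are all hypothetical and not part of the package.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical shape: relative file path -> JSON content, or null when the file must be absent.
type StateFiles = Record<string, unknown>;

function applyPrecondition(rootDir: string, precondition: { state_files?: StateFiles }): void {
  for (const [relPath, content] of Object.entries(precondition.state_files ?? {})) {
    const target = join(rootDir, relPath);
    if (content === null) {
      // null means the file must not exist before the tool runs.
      rmSync(target, { force: true });
    } else {
      // Otherwise write the content object as JSON, creating parent directories.
      mkdirSync(dirname(target), { recursive: true });
      writeFileSync(target, JSON.stringify(content, null, 2));
    }
  }
}

// Usage: seed a throwaway project root so fixtures never touch a real .nexus/ directory.
const root = mkdtempSync(join(tmpdir(), "nexus-fixture-"));
applyPrecondition(root, {
  state_files: {
    ".nexus/state/plan.json": { topic: "demo", issues: [] },
    ".nexus/state/tasks.json": null,
  },
});
const planExists = existsSync(join(root, ".nexus/state/plan.json")); // true
```

Passing an explicit root directory is a design choice of this sketch; it keeps each fixture isolated and makes cleanup a single directory removal.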
104
+
105
+ ## Coverage
106
+
107
+ These fixtures cover the 11 Nexus-core abstract tool names:
108
+
109
+ | Abstract name | Description |
110
+ |---|---|
111
+ | `plan_start` | Start a new plan session |
112
+ | `plan_decide` | Record a decision on a plan issue |
113
+ | `plan_status` | Query the current plan state |
114
+ | `plan_update` | Add, remove, edit, or reopen plan issues |
115
+ | `task_add` | Add a task to the task list |
116
+ | `task_update` | Update a task's status |
117
+ | `task_list` | List tasks with dependency-aware ready set |
118
+ | `task_close` | Archive cycle into history and delete source files |
119
+ | `history_search` | Search past cycles in history.json |
120
+ | `context` | Read or write .nexus/context/ knowledge files |
121
+ | `artifact_write` | Write an artifact output file |
122
+
123
+ ## Excluded tools
124
+
125
+ AST and LSP tools (`ast_search`, `ast_replace`, `lsp_diagnostics`, `lsp_goto_definition`, etc.) are harness utilities that depend on language server infrastructure. They are not ecosystem contracts and are excluded from conformance coverage.
package/conformance/scenarios/full-plan-cycle.json ADDED
@@ -0,0 +1,132 @@
1
+ {
2
+ "test_id": "full_plan_cycle",
3
+ "description": "Verifies the complete plan → decide → task_add → close lifecycle across 5 sequential tool invocations",
4
+ "precondition": {
5
+ "state_files": {
6
+ ".nexus/state/plan.json": null,
7
+ ".nexus/state/tasks.json": null
8
+ }
9
+ },
10
+ "steps": [
11
+ {
12
+ "description": "Start a new plan with 2 issues",
13
+ "action": {
14
+ "tool": "plan_start",
15
+ "params": {
16
+ "topic": "Refactor state persistence layer",
17
+ "issues": [
18
+ "Should state files live in .nexus/state/ or project root?",
19
+ "What is the migration path for existing users?"
20
+ ],
21
+ "research_summary": "Surveyed 3 consumer repos. All use .nexus/state/ already. Migration: provide a one-shot relocate script."
22
+ }
23
+ },
24
+ "assert_return": {
25
+ "$.created": true,
26
+ "$.plan_id": { "type": "number", "min": 1 },
27
+ "$.issueCount": 2
28
+ },
29
+ "assert_state": {
30
+ ".nexus/state/plan.json": {
31
+ "$.topic": "Refactor state persistence layer",
32
+ "$.issues.length": 2,
33
+ "$.issues[0].status": "pending",
34
+ "$.issues[1].status": "pending"
35
+ }
36
+ }
37
+ },
38
+ {
39
+ "description": "Decide issue 1 — confirm .nexus/state/ location",
40
+ "action": {
41
+ "tool": "plan_decide",
42
+ "params": {
43
+ "issue_id": 1,
44
+ "summary": "Keep .nexus/state/ as the canonical location. Documented in state-schemas README."
45
+ }
46
+ },
47
+ "assert_return": {
48
+ "$.decided": true,
49
+ "$.allComplete": false,
50
+ "$.remaining.length": 1
51
+ },
52
+ "assert_state": {
53
+ ".nexus/state/plan.json": {
54
+ "$.issues[0].status": "decided",
55
+ "$.issues[0].decision": "Keep .nexus/state/ as the canonical location. Documented in state-schemas README.",
56
+ "$.issues[1].status": "pending"
57
+ }
58
+ }
59
+ },
60
+ {
61
+ "description": "Decide issue 2 — confirm migration path; all issues now decided",
62
+ "action": {
63
+ "tool": "plan_decide",
64
+ "params": {
65
+ "issue_id": 2,
66
+ "summary": "Ship a one-shot migration script as a standalone npm script. Document in RELEASING.md."
67
+ }
68
+ },
69
+ "assert_return": {
70
+ "$.decided": true,
71
+ "$.allComplete": true
72
+ },
73
+ "assert_state": {
74
+ ".nexus/state/plan.json": {
75
+ "$.issues[0].status": "decided",
76
+ "$.issues[1].status": "decided"
77
+ }
78
+ }
79
+ },
80
+ {
81
+ "description": "Add a task derived from plan issue 1",
82
+ "action": {
83
+ "tool": "task_add",
84
+ "params": {
85
+ "title": "Update state-schemas README for .nexus/state/ location",
86
+ "context": "Decision from plan: .nexus/state/ is the canonical location. Document this clearly.",
87
+ "deps": [],
88
+ "plan_issue": 1,
89
+ "goal": "Land refactored state persistence layer with migration support"
90
+ }
91
+ },
92
+ "assert_return": {
93
+ "$.task.id": 1,
94
+ "$.task.status": "pending",
95
+ "$.task.plan_issue": 1
96
+ },
97
+ "assert_state": {
98
+ ".nexus/state/tasks.json": {
99
+ "$.goal": "Land refactored state persistence layer with migration support",
100
+ "$.tasks.length": 1,
101
+ "$.tasks[0].id": 1,
102
+ "$.tasks[0].plan_issue": 1
103
+ }
104
+ }
105
+ },
106
+ {
107
+ "description": "Close the cycle — archive plan and tasks into history, delete source files",
108
+ "action": {
109
+ "tool": "task_close",
110
+ "params": {}
111
+ },
112
+ "assert_return": {
113
+ "$.closed": true,
114
+ "$.archived.plan": true,
115
+ "$.archived.decisions": 2,
116
+ "$.archived.tasks": 1,
117
+ "$.total_cycles": { "type": "number", "min": 1 }
118
+ },
119
+ "assert_state": {
120
+ ".nexus/state/plan.json": null,
121
+ ".nexus/state/tasks.json": null,
122
+ ".nexus/history.json": {
123
+ "$.cycles.length": { "type": "number", "min": 1 },
124
+ "$.cycles[-1].plan.topic": "Refactor state persistence layer",
125
+ "$.cycles[-1].plan.issues.length": 2,
126
+ "$.cycles[-1].tasks.length": 1,
127
+ "$.cycles[-1].completed_at": { "type": "iso8601" }
128
+ }
129
+ }
130
+ }
131
+ ]
132
+ }
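Reviewer note: the assertion keys in this scenario use only a small subset of JSONPath: dotted names, numeric indexes, a trailing `.length`, and `[-1]`. A runner could resolve that subset without a full JSONPath library, as sketched below. `resolvePath` is a hypothetical helper, and treating `[-1]` as "last element" and `.length` as the array length are assumptions drawn from how the fixtures use them.

```typescript
// Hypothetical resolver for the JSONPath subset seen in the fixtures.
function resolvePath(root: unknown, path: string): unknown {
  if (!path.startsWith("$")) throw new Error(`unsupported path: ${path}`);
  const rest = path.slice(1);
  // Split into ".name" and "[index]" segments.
  const tokens: string[] = rest.match(/\.[A-Za-z_][\w-]*|\[-?\d+\]/g) ?? [];
  if (tokens.join("") !== rest) throw new Error(`unsupported path: ${path}`);

  let current: unknown = root;
  for (const token of tokens) {
    if (current === null || current === undefined) return undefined;
    if (token.startsWith("[")) {
      if (!Array.isArray(current)) return undefined;
      const index = Number(token.slice(1, -1));
      // Assumption: negative indexes count from the end, so [-1] is the last element.
      current = current[index < 0 ? current.length + index : index];
    } else {
      const key = token.slice(1);
      if (Array.isArray(current)) {
        // Only ".length" is meaningful on an array in this subset.
        current = key === "length" ? current.length : undefined;
      } else if (typeof current === "object") {
        current = (current as Record<string, unknown>)[key];
      } else {
        return undefined;
      }
    }
  }
  return current;
}

// Usage with a history document shaped like the final step's assertions.
const history = { cycles: [{ plan: { topic: "old" } }, { plan: { topic: "Refactor state persistence layer" } }] };
const lastTopic = resolvePath(history, "$.cycles[-1].plan.topic");
```

Rejecting any path the tokenizer cannot fully consume makes unsupported syntax fail loudly instead of silently resolving to `undefined`.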
package/conformance/scenarios/task-deps-ordering.json ADDED
@@ -0,0 +1,83 @@
1
+ {
2
+ "test_id": "task_deps_ordering",
3
+ "description": "Verifies that dependency ordering is enforced by task_list: a task with an incomplete dep is not ready, and becomes ready after the dep is completed",
4
+ "precondition": {
5
+ "state_files": {
6
+ ".nexus/state/tasks.json": null
7
+ }
8
+ },
9
+ "steps": [
10
+ {
11
+ "description": "Add task A with no dependencies",
12
+ "action": {
13
+ "tool": "task_add",
14
+ "params": {
15
+ "title": "Task A — foundation work",
16
+ "context": "Must complete before Task B can start",
17
+ "deps": [],
18
+ "goal": "Validate dependency ordering"
19
+ }
20
+ },
21
+ "assert_return": {
22
+ "$.task.id": 1,
23
+ "$.task.status": "pending",
24
+ "$.task.deps.length": 0
25
+ }
26
+ },
27
+ {
28
+ "description": "Add task B that depends on task A (id=1)",
29
+ "action": {
30
+ "tool": "task_add",
31
+ "params": {
32
+ "title": "Task B — requires A",
33
+ "context": "Can only start after Task A is completed",
34
+ "deps": [1]
35
+ }
36
+ },
37
+ "assert_return": {
38
+ "$.task.id": 2,
39
+ "$.task.status": "pending",
40
+ "$.task.deps.length": 1,
41
+ "$.task.deps[0]": 1
42
+ }
43
+ },
44
+ {
45
+ "description": "List tasks — only task A should be ready because task B's dep is not complete",
46
+ "action": {
47
+ "tool": "task_list",
48
+ "params": {}
49
+ },
50
+ "assert_return": {
51
+ "$.summary.total": 2,
52
+ "$.summary.ready.length": 1,
53
+ "$.summary.ready[0]": 1
54
+ }
55
+ },
56
+ {
57
+ "description": "Mark task A as completed",
58
+ "action": {
59
+ "tool": "task_update",
60
+ "params": {
61
+ "id": 1,
62
+ "status": "completed"
63
+ }
64
+ },
65
+ "assert_return": {
66
+ "$.task.id": 1,
67
+ "$.task.status": "completed"
68
+ }
69
+ },
70
+ {
71
+ "description": "List tasks again — task B should now be ready because its dep (A) is completed",
72
+ "action": {
73
+ "tool": "task_list",
74
+ "params": {}
75
+ },
76
+ "assert_return": {
77
+ "$.summary.ready.length": 1,
78
+ "$.summary.ready[0]": 2,
79
+ "$.summary.completed": 1
80
+ }
81
+ }
82
+ ]
83
+ }
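Reviewer note: the ready set that this scenario asserts on can be computed in a few lines. The sketch below assumes that "ready" means a pending task whose dependencies are all completed, which is consistent with the assertions above but is not confirmed by the diff. The `Task` interface and `readyTaskIds` function are hypothetical names.

```typescript
// Hypothetical task shape, limited to the fields the scenario touches.
interface Task {
  id: number;
  status: string; // the fixture uses "pending" and "completed"
  deps: number[];
}

// Assumption: a task is ready when it is pending and every dep is completed.
function readyTaskIds(tasks: Task[]): number[] {
  const completed = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id),
  );
  return tasks
    .filter((t) => t.status === "pending" && t.deps.every((d) => completed.has(d)))
    .map((t) => t.id);
}

// State after step 2: both tasks pending, B depends on A.
const beforeCompletion: Task[] = [
  { id: 1, status: "pending", deps: [] },
  { id: 2, status: "pending", deps: [1] },
];
// State after step 4: A completed.
const afterCompletion: Task[] = [
  { id: 1, status: "completed", deps: [] },
  { id: 2, status: "pending", deps: [1] },
];

const readyBefore = readyTaskIds(beforeCompletion); // [1]
const readyAfter = readyTaskIds(afterCompletion); // [2]
```

These two results line up with the `$.summary.ready` assertions in steps 3 and 5 of the scenario.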