maxsimcli 3.5.3 → 3.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (37)
  1. package/dist/.tsbuildinfo +1 -1
  2. package/dist/assets/CHANGELOG.md +21 -0
  3. package/dist/assets/dashboard/server.js +1 -1
  4. package/dist/assets/templates/agents/maxsim-code-reviewer.md +169 -0
  5. package/dist/assets/templates/agents/maxsim-debugger.md +47 -0
  6. package/dist/assets/templates/agents/maxsim-executor.md +113 -0
  7. package/dist/assets/templates/agents/maxsim-phase-researcher.md +46 -0
  8. package/dist/assets/templates/agents/maxsim-plan-checker.md +45 -0
  9. package/dist/assets/templates/agents/maxsim-planner.md +48 -0
  10. package/dist/assets/templates/agents/maxsim-spec-reviewer.md +150 -0
  11. package/dist/assets/templates/agents/maxsim-verifier.md +43 -0
  12. package/dist/assets/templates/commands/maxsim/init-existing.md +42 -0
  13. package/dist/assets/templates/skills/systematic-debugging/SKILL.md +118 -0
  14. package/dist/assets/templates/skills/tdd/SKILL.md +118 -0
  15. package/dist/assets/templates/skills/verification-before-completion/SKILL.md +102 -0
  16. package/dist/assets/templates/workflows/init-existing.md +1099 -0
  17. package/dist/cli.cjs +55 -4
  18. package/dist/cli.cjs.map +1 -1
  19. package/dist/cli.js +2 -1
  20. package/dist/cli.js.map +1 -1
  21. package/dist/core/core.js +1 -1
  22. package/dist/core/core.js.map +1 -1
  23. package/dist/core/index.d.ts +2 -2
  24. package/dist/core/index.d.ts.map +1 -1
  25. package/dist/core/index.js +2 -1
  26. package/dist/core/index.js.map +1 -1
  27. package/dist/core/init.d.ts +24 -2
  28. package/dist/core/init.d.ts.map +1 -1
  29. package/dist/core/init.js +61 -0
  30. package/dist/core/init.js.map +1 -1
  31. package/dist/core/roadmap.js +2 -2
  32. package/dist/core/roadmap.js.map +1 -1
  33. package/dist/install.cjs +38 -0
  34. package/dist/install.cjs.map +1 -1
  35. package/dist/install.js +49 -0
  36. package/dist/install.js.map +1 -1
  37. package/package.json +1 -1
@@ -34,6 +34,17 @@ Goal-backward verification starts from the outcome and works backwards:
  3. What must be WIRED for those artifacts to function?
 
  Then verify each level against the actual codebase.
+
+ **Evidence Gate:** Every verification finding must be backed by evidence:
+
+ ```
+ CLAIM: [what you are verifying]
+ EVIDENCE: [exact command or file read performed]
+ OUTPUT: [relevant excerpt of actual output]
+ VERDICT: PASS | FAIL
+ ```
+
+ Do NOT state "verified" without producing an evidence block. Do NOT trust SUMMARY.md claims — verify against actual code and command output.
  </core_principle>
 
  <verification_process>
@@ -588,6 +599,38 @@ return <div>No messages</div> // Always shows "no messages"
 
  </stub_detection_patterns>
 
+ <anti_rationalization>
+
+ ## Iron Law
+
+ <HARD-GATE>
+ NO VERIFICATION PASS WITHOUT INDEPENDENT EVIDENCE FOR EVERY TRUTH.
+ SUMMARY.md says it's done. CODE says otherwise. Trust the code.
+ </HARD-GATE>
+
+ ## Common Rationalizations — REJECT THESE
+
+ | Excuse | Why It Violates the Rule |
+ |--------|--------------------------|
+ | "SUMMARY says it's done" | SUMMARYs document what Claude SAID. You verify what EXISTS. |
+ | "Task completed = goal achieved" | Task completion ≠ goal achievement. Verify the goal. |
+ | "Tests pass = requirements met" | Tests can pass with incomplete implementation. Check requirements individually. |
+ | "I trust the executor" | Trust is not verification. Check the code yourself. |
+ | "The build succeeds" | A successful build does not prove functional correctness. |
+ | "Most truths hold" | ALL truths must hold. Partial ≠ complete. |
+
+ ## Red Flags — STOP and reassess if you catch yourself:
+
+ - About to mark a truth as "verified" without reading the actual code
+ - Trusting SUMMARY.md claims without grep/read verification
+ - Skipping a truth because "it was tested"
+ - Writing "PASS" before checking every must_have individually
+ - Feeling rushed to complete verification quickly
+
+ **If any red flag triggers: STOP. Read the code. Run the command. Produce the evidence block. THEN make the claim.**
+
+ </anti_rationalization>
+
  <success_criteria>
 
  - [ ] Previous VERIFICATION.md checked (Step 0)
@@ -0,0 +1,42 @@
+ ---
+ name: maxsim:init-existing
+ description: Initialize MAXSIM in an existing project with codebase scanning and smart defaults
+ argument-hint: "[--auto]"
+ allowed-tools:
+ - Read
+ - Bash
+ - Write
+ - Task
+ - AskUserQuestion
+ ---
+ <context>
+ **Flags:**
+ - `--auto` — Automatic mode. Runs a full codebase scan, infers everything from code, and creates all docs without interaction. Review is recommended after auto mode.
+ </context>
+
+ <objective>
+ Initialize MAXSIM in an existing codebase through a scan-first flow: codebase analysis, conflict resolution, scan-informed questioning, and stage-aware document generation.
+
+ **Creates:**
+ - `.planning/codebase/` — full codebase analysis (4 mapper agents)
+ - `.planning/PROJECT.md` — project context with current state summary
+ - `.planning/config.json` — workflow preferences
+ - `.planning/REQUIREMENTS.md` — stage-aware requirements
+ - `.planning/ROADMAP.md` — milestone + suggested phases
+ - `.planning/STATE.md` — pre-populated project memory
+
+ **After this command:** Run `/maxsim:plan-phase 1` to start execution.
+ </objective>
+
+ <execution_context>
+ @./workflows/init-existing.md
+ @./references/questioning.md
+ @./references/ui-brand.md
+ @./templates/project.md
+ @./templates/requirements.md
+ </execution_context>
+
+ <process>
+ Execute the init-existing workflow from @./workflows/init-existing.md end-to-end.
+ Preserve all workflow gates (conflict resolution, scan completion, validation, approvals, commits).
+ </process>
@@ -0,0 +1,118 @@
+ ---
+ name: systematic-debugging
+ description: Use when encountering any bug, test failure, or unexpected behavior — requires root cause investigation before attempting any fix
+ ---
+
+ # Systematic Debugging
+
+ Random fixes waste time and create new bugs. Find the root cause first.
+
+ **If you have not identified the root cause, you are guessing — not debugging.**
+
+ ## The Iron Law
+
+ <HARD-GATE>
+ NO FIX ATTEMPTS WITHOUT UNDERSTANDING ROOT CAUSE.
+ If you have not completed the REPRODUCE and HYPOTHESIZE steps, you CANNOT propose a fix.
+ "Let me just try this" is guessing, not debugging.
+ Skipping these steps is a rule violation, not a time-saving shortcut.
+ </HARD-GATE>
+
+ ## The Gate Function
+
+ Follow these steps IN ORDER for every bug, test failure, or unexpected behavior.
+
+ ### 1. REPRODUCE — Confirm the Problem
+
+ - Run the failing command or test. Capture the EXACT error output.
+ - Can you trigger it reliably? What are the exact steps?
+ - If not reproducible: gather more data — do not guess.
+
+ ```bash
+ # Example: reproduce a test failure
+ npx vitest run path/to/failing.test.ts
+ ```
+
+ ### 2. HYPOTHESIZE — Form a Theory
+
+ - Read the error message COMPLETELY (stack trace, line numbers, exit codes)
+ - Check recent changes: `git diff`, recent commits, new dependencies
+ - Trace data flow: where does the bad value originate?
+ - State your hypothesis clearly: "I think X is the root cause because Y"
+
+ ### 3. ISOLATE — Narrow the Scope
+
+ - Find the SMALLEST reproduction case
+ - In multi-component systems, add diagnostic logging at each boundary
+ - Identify which SPECIFIC layer or component is failing
+ - Compare against working examples in the codebase
+
+ ### 4. VERIFY — Test Your Hypothesis
+
+ - Make the SMALLEST possible change to test your hypothesis
+ - Change ONE variable at a time — never multiple things simultaneously
+ - If hypothesis is wrong: form a NEW hypothesis, do not stack fixes
+
+ ### 5. FIX — Address the Root Cause
+
+ - Write a failing test that reproduces the bug (see TDD skill)
+ - Implement a SINGLE fix that addresses the root cause
+ - No "while I'm here" improvements — fix only the identified issue
+
+ ### 6. CONFIRM — Verify the Fix
+
+ - Run the original failing test: it must now pass
+ - Run the full test suite: no regressions
+ - Verify the original error no longer occurs
+
+ ```bash
+ # Confirm the specific fix
+ npx vitest run path/to/fixed.test.ts
+ # Confirm no regressions
+ npx vitest run
+ ```
+
+ ## Common Rationalizations — REJECT THESE
+
+ | Excuse | Why It Violates the Rule |
+ |--------|--------------------------|
+ | "I think I know what it is" | Thinking is not evidence. Reproduce first, then hypothesize. |
+ | "Let me just try this fix" | "Just try" = guessing. You have skipped REPRODUCE and HYPOTHESIZE. |
+ | "Quick patch for now, investigate later" | "Later" never comes. Patches mask the real problem. |
+ | "Multiple changes at once saves time" | You cannot isolate what worked. You will create new bugs. |
+ | "The issue is simple, I don't need the process" | Simple bugs have root causes too. The process is fast for simple bugs. |
+ | "I'm under time pressure" | Systematic debugging IS faster than guess-and-check thrashing. |
+ | "The reference is too long, I'll skim it" | Partial understanding guarantees partial fixes. Read it completely. |
+
+ ## Red Flags — STOP If You Catch Yourself:
+
+ - Changing code before reproducing the error
+ - Proposing a fix before reading the full error message and stack trace
+ - Trying random fixes hoping one will work
+ - Changing multiple things simultaneously
+ - Saying "it's probably X" without evidence
+ - Applying a fix that did not work, then adding another fix on top
+ - On your 3rd failed fix attempt (this signals an architectural problem — escalate)
+
+ **If any red flag triggers: STOP. Return to step 1 (REPRODUCE).**
+
+ **If 3+ fix attempts have failed:** The issue is likely architectural, not a simple bug. Document what you have tried and escalate to the user for a design decision.
+
+ ## Verification Checklist
+
+ Before claiming a bug is fixed, confirm:
+
+ - [ ] The original error has been reproduced reliably
+ - [ ] Root cause has been identified with evidence (not guessed)
+ - [ ] A failing test reproduces the bug
+ - [ ] A single, targeted fix addresses the root cause
+ - [ ] The failing test now passes
+ - [ ] The full test suite passes (no regressions)
+ - [ ] The original error no longer occurs when running the original steps
+
+ ## Debugging in MAXSIM Context
+
+ When debugging during plan execution, MAXSIM deviation rules apply:
+ - **Rule 1 (Auto-fix bugs):** You may auto-fix bugs found during execution, but you must still follow this debugging process.
+ - **Rule 4 (Architectural changes):** If 3+ fix attempts fail, STOP and return a checkpoint — this is an architectural decision for the user.
+ - Track all debugging deviations for SUMMARY.md documentation.
@@ -0,0 +1,118 @@
+ ---
+ name: tdd
+ description: Use when implementing any feature or bug fix — requires writing a failing test before any implementation code
+ ---
+
+ # Test-Driven Development (TDD)
+
+ Write the test first. Watch it fail. Write minimal code to pass. Clean up.
+
+ **If you did not watch the test fail, you do not know if it tests the right thing.**
+
+ ## The Iron Law
+
+ <HARD-GATE>
+ NO IMPLEMENTATION CODE WITHOUT A FAILING TEST FIRST.
+ If you wrote production code before the test, DELETE IT. Start over.
+ No exceptions. No "I'll add tests after." No "keep as reference."
+ Writing code first is a rule violation, not a judgment call.
+ </HARD-GATE>
+
+ ## The Gate Function
+
+ Follow this cycle for every behavior change, feature addition, or bug fix.
+
+ ### 1. RED — Write Failing Test
+
+ - Write ONE minimal test that describes the desired behavior
+ - Test name describes what SHOULD happen, not implementation details
+ - Use real code paths — mocks only when unavoidable (external APIs, databases)
+
+ ### 2. VERIFY RED — Run the Test
+
+ ```bash
+ # Run the test suite for this file
+ npx vitest run path/to/test.test.ts
+ ```
+
+ - Test MUST fail (not error — fail with an assertion)
+ - Failure message must match the missing behavior
+ - If test passes immediately: you are testing existing behavior — rewrite it
+
+ ### 3. GREEN — Write Minimal Code
+
+ - Write the SIMPLEST code that makes the test pass
+ - Do NOT add features the test does not require
+ - Do NOT refactor yet — that comes next
+
+ ### 4. VERIFY GREEN — Run All Tests
+
+ ```bash
+ npx vitest run
+ ```
+
+ - The new test MUST pass
+ - ALL existing tests MUST still pass
+ - If any test fails: fix code, not tests
+
+ ### 5. REFACTOR — Clean Up (Tests Still Green)
+
+ - Remove duplication, improve names, extract helpers
+ - Run tests after every change — they must stay green
+ - Do NOT add new behavior during refactor
+
+ ### 6. REPEAT — Next failing test for next behavior
+
+ ## Common Rationalizations — REJECT THESE
+
+ | Excuse | Why It Violates the Rule |
+ |--------|--------------------------|
+ | "Too simple to test" | Simple code breaks. The test takes 30 seconds to write. |
+ | "I'll add tests after" | Tests written after pass immediately — they prove nothing. |
+ | "The test framework isn't set up yet" | Set it up. That is part of the task, not a reason to skip. |
+ | "I know the code works" | Knowledge is not evidence. A passing test is evidence. |
+ | "TDD is slower for this task" | TDD is faster than debugging. Every "quick skip" creates debt. |
+ | "Let me keep the code as reference" | You will adapt it instead of writing test-first. Delete means delete. |
+ | "I need to explore the design first" | Explore, then throw it away. Start implementation with TDD. |
+
+ ## Red Flags — STOP If You Catch Yourself:
+
+ - Writing implementation code before writing a test
+ - Writing a test that passes on the first run (you are testing existing behavior)
+ - Skipping the VERIFY RED step ("I know it will fail")
+ - Adding features beyond what the current test requires
+ - Skipping the REFACTOR step to save time
+ - Rationalizing "just this once" or "this is different"
+ - Keeping pre-TDD code "as reference" while writing tests
+
+ **If any red flag triggers: STOP. Delete the implementation. Write the test first.**
+
+ ## Verification Checklist
+
+ Before claiming TDD compliance, confirm:
+
+ - [ ] Every new function/method has a corresponding test
+ - [ ] Each test was written BEFORE its implementation
+ - [ ] Each test was observed to FAIL before implementation was written
+ - [ ] Each test failed for the expected reason (missing behavior, not syntax error)
+ - [ ] Minimal code was written to pass each test
+ - [ ] All tests pass after implementation
+ - [ ] Refactoring (if any) did not break any tests
+
+ Cannot check all boxes? You skipped TDD. Start over.
+
+ ## When Stuck
+
+ | Problem | Solution |
+ |---------|----------|
+ | Don't know how to test it | Write the assertion first. What should the output be? |
+ | Test setup is too complex | The design is too complex. Simplify the interface. |
+ | Must mock everything | Code is too coupled. Use dependency injection. |
+ | Existing code has no tests | Add tests for the code you are changing. Start the cycle now. |
+
+ ## Integration with MAXSIM
+
+ In MAXSIM plan execution, tasks marked `tdd="true"` follow this cycle with per-step commits:
+ - **RED commit:** `test({phase}-{plan}): add failing test for [feature]`
+ - **GREEN commit:** `feat({phase}-{plan}): implement [feature]`
+ - **REFACTOR commit (if changes made):** `refactor({phase}-{plan}): clean up [feature]`
@@ -0,0 +1,102 @@
+ ---
+ name: verification-before-completion
+ description: Use before claiming any work is complete, fixed, or passing — requires running verification commands and reading output before making success claims
+ ---
+
+ # Verification Before Completion
+
+ Claiming work is complete without verification is dishonesty, not efficiency.
+
+ **Evidence before claims, always.**
+
+ ## The Iron Law
+
+ <HARD-GATE>
+ NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE.
+ If you have not run the verification command in this turn, you CANNOT claim it passes.
+ "Should work" is not evidence. "I'm confident" is not evidence.
+ An unverified claim is a rule violation, not a special case.
+ </HARD-GATE>
+
+ ## The Gate Function
+
+ BEFORE claiming any status, expressing satisfaction, or marking a task done:
+
+ 1. **IDENTIFY:** What command proves this claim?
+ 2. **RUN:** Execute the FULL command (fresh, in this turn — not a previous run)
+ 3. **READ:** Read the FULL output. Check the exit code. Count failures.
+ 4. **VERIFY:** Does the output actually confirm the claim?
+    - If NO: State the actual status with evidence
+    - If YES: State the claim WITH the evidence
+ 5. **CLAIM:** Only now may you assert completion
+
+ **Skip any step = lying, not verifying.**
+
+ ### Evidence Block Format
+
+ When claiming task completion, build completion, or test passage, produce:
+
+ ```
+ CLAIM: [what you are claiming]
+ EVIDENCE: [exact command run in this turn]
+ OUTPUT: [relevant excerpt of actual output]
+ VERDICT: PASS | FAIL
+ ```
+
+ This format is required for task completion claims in MAXSIM plan execution. It is NOT required for intermediate status updates like "I have read the file" or "here is the plan."
+
+ ## Common Rationalizations — REJECT THESE
+
+ | Excuse | Why It Violates the Rule |
+ |--------|--------------------------|
+ | "Should work now" | "Should" is not evidence. RUN the command. |
+ | "I'm confident in the logic" | Confidence is not evidence. Run it. |
+ | "The linter passed" | Linter passing does not mean tests pass or build succeeds. |
+ | "Just this once" | NO EXCEPTIONS. This is the rule, not a guideline. |
+ | "I only changed one line" | One line can break everything. Verify. |
+ | "The subagent reported success" | Trust test output and VCS diffs, not agent reports. |
+ | "Partial check is enough" | Partial proves nothing about the unchecked parts. |
+
+ ## Red Flags — STOP If You Catch Yourself:
+
+ - Using "should", "probably", "seems to", or "looks good" about unverified work
+ - Expressing satisfaction ("Great!", "Perfect!", "Done!") before running verification
+ - About to commit or push without running the test/build command in THIS turn
+ - Trusting a subagent's completion report without independent verification
+ - Thinking "the last run was clean, I only changed one line"
+ - About to mark a MAXSIM task as done without running the `<verify>` block
+ - Relying on a previous turn's test output as current evidence
+
+ **If any red flag triggers: STOP. Run the command. Read the output. THEN make the claim.**
+
+ ## What Counts as Verification
+
+ | Claim | Requires | NOT Sufficient |
+ |-------|----------|----------------|
+ | "Tests pass" | Test command output showing 0 failures | Previous run, "should pass", partial run |
+ | "Build succeeds" | Build command with exit code 0 | Linter passing, "logs look clean" |
+ | "Bug is fixed" | Original failing test now passes | "Code changed, assumed fixed" |
+ | "Task is complete" | All done criteria checked with evidence | "I implemented everything in the plan" |
+ | "No regressions" | Full test suite passing | "I only changed one file" |
+
+ ## Verification Checklist
+
+ Before marking any work as complete:
+
+ - [ ] Identified the verification command for every claim
+ - [ ] Ran each verification command fresh in this turn
+ - [ ] Read the full output (not just the summary line)
+ - [ ] Checked exit codes (0 = success, non-zero = failure)
+ - [ ] Evidence supports every completion claim
+ - [ ] No "should", "probably", or "seems to" in your completion statement
+ - [ ] Evidence block produced for the task completion claim
+
+ ## In MAXSIM Plan Execution
+
+ The executor's task commit protocol requires verification BEFORE committing:
+ 1. Run the task's `<verify>` block (automated checks)
+ 2. Confirm the `<done>` criteria are met with evidence
+ 3. Produce an evidence block for the task completion
+ 4. Only then: stage files and commit
+
+ The verifier agent independently re-checks all claims — do not assume the verifier will catch what you missed.