@callumvass/forgeflow-dev 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json ADDED
@@ -0,0 +1,42 @@
{
  "name": "@callumvass/forgeflow-dev",
  "version": "0.1.0",
  "type": "module",
  "description": "Dev pipeline for Pi — TDD implementation, code review, architecture, and skill discovery.",
  "keywords": [
    "pi-package"
  ],
  "license": "MIT",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/callumvass/forgeflow.git",
    "directory": "packages/dev"
  },
  "publishConfig": {
    "provenance": true
  },
  "pi": {
    "extensions": [
      "./extensions"
    ],
    "skills": [
      "./skills"
    ],
    "agents": [
      "./agents"
    ]
  },
  "scripts": {
    "build": "tsup"
  },
  "dependencies": {
    "@callumvass/forgeflow-shared": "*"
  },
  "peerDependencies": {
    "@mariozechner/pi-ai": "*",
    "@mariozechner/pi-agent-core": "*",
    "@mariozechner/pi-coding-agent": "*",
    "@mariozechner/pi-tui": "*",
    "@sinclair/typebox": "*"
  }
}
@@ -0,0 +1,119 @@
---
name: code-review
description: Structured, checklist-driven code review with confidence scoring and evidence requirements. Precision over recall.
---

# Code Review Skill

## Review Order

Always review in this order. Check each category completely before moving to the next.

### 1. Logic & Correctness
- Business logic errors (wrong conditions, missing branches)
- Control flow bugs (off-by-one, infinite loops, unreachable code)
- Wrong return values or incorrect transformations
- State management bugs (stale state, missing updates, race conditions)
- Dead wiring: new modules/classes only imported in test files, never called from production code

### 2. Security
- Injection vulnerabilities (SQL, XSS, command injection)
- Auth/authz bypass (missing checks, privilege escalation)
- Secrets exposure (hardcoded keys, tokens in logs)
- Unsafe deserialization or eval usage

### 3. Error Handling
- Unhandled null/undefined that will crash at runtime
- Missing error paths that silently fail
- Swallowed exceptions hiding real failures
- Error messages leaking internal details

### 4. Performance
- N+1 queries or unbounded loops over data
- Missing pagination on unbounded result sets
- Memory leaks (event listeners, subscriptions not cleaned up)
- Unnecessary re-renders or recomputation in hot paths

### 5. Test Quality
- TDD compliance: tests verify behavior through public interfaces, not implementation
- Boundary coverage: mocked boundaries have corresponding integration/contract tests
- Mock fidelity: mocks encode correct assumptions about external systems
- Missing tests for new behavior paths

## Evidence Requirements

Every finding MUST include:
- **File path and line number(s)** — exact location
- **Code snippet** — the problematic code, quoted verbatim
- **Explanation** — why this is wrong (not "could be better", but "this WILL cause X")
- **Suggested fix** — concrete, not vague

Findings without evidence are invalid and will be rejected by the review judge.

## Confidence Scoring

Rate each finding 0-100:
- **< 50**: Do not report. Likely a false positive.
- **50-84**: Do not report. Plausible, but below the reporting threshold.
- **85-94**: Report. High confidence this is a real issue.
- **95-100**: Report. Certain. Evidence directly confirms.

**Threshold: only report findings with confidence >= 85.**

## Severity Levels

- **critical**: Will cause a bug, security vulnerability, data loss, or crash in production. Must fix before merge.
- **major**: Significant logic error, missing error handling that will affect users. Must fix.
- **minor**: Code quality issue, edge case gap, suboptimal pattern. Fix and merge.
- **nit**: Style preference, naming suggestion, trivial improvement. Author's discretion.

Guidelines:
- Security findings are always `critical`.
- Test Quality findings are `minor` unless they mask a real bug.

## FINDINGS Output Format

```markdown
## Review: [scope description]

### Finding 1
- **Confidence**: [85-100]
- **Severity**: [critical | major | minor | nit]
- **Category**: [Logic | Security | Error Handling | Performance | Test Quality]
- **File**: path/to/file.ts:42
- **Code**: `the problematic code`
- **Issue**: [clear explanation of what's wrong and what will happen]
- **Fix**: [concrete suggestion]

### Finding 2
...

## Summary
- Total findings: N
- Categories: [breakdown]
- Overall assessment: [one sentence]
```

If no findings meet the confidence threshold:

```markdown
## Review: [scope description]

No issues found above confidence threshold (85).

## Summary
- Reviewed: [what was checked]
- Overall assessment: Code meets standards.
```

## Anti-Patterns — Do NOT Flag

- **Naming/formatting** — linters handle this
- **Style preferences** — subjective choices
- **Theoretical edge cases** — "what if X is null?" when X is guaranteed non-null
- **Architectural suggestions** — out of scope for PR review
- **"You could also..." suggestions** — if it's not broken, don't suggest alternatives
- **Over-engineering suggestions** — "add error handling for..." when the error can't happen
- **Pre-existing issues** — problems that existed before this PR
- **Missing features** — unless it's in the acceptance criteria
- **Documentation gaps** — unless the code is genuinely incomprehensible
@@ -0,0 +1,58 @@
---
name: plugins
description: Domain-specific plugin router. Scans project plugins, matches triggers against codebase/diff, returns which plugins to load for a given pipeline stage.
---

# Plugins

Domain-specific knowledge that is progressively loaded based on the project's codebase and the current pipeline stage. Plugins live in the project repository, not in forgeflow.

## Plugin Location

Plugins are installed per-project at:

```
<repo-root>/.forgeflow/plugins/<name>/PLUGIN.md
```

Use `/discover-skills` to find and install plugins for your project's tech stack.

## Plugin Structure

```
.forgeflow/
  plugins/
    <name>/
      PLUGIN.md      # Triggers + stage-specific guidance (always read when matched)
      references/    # Deep context (read lazily when needed)
        *.md
```

Each `PLUGIN.md` has YAML frontmatter with trigger conditions and stage applicability:

```yaml
---
name: Human-readable name
description: One-line description
triggers:
  files: ["*.tsx", "*.jsx"]     # Glob patterns for project files
  content: ["useQuery", "cn("]  # Literal strings to search for in codebase/diff
stages: [plan, implement, review, refactor, architecture] # Which pipeline stages use this plugin
---
```

## How to Match Plugins

Given a diff or codebase, scan each plugin at `<cwd>/.forgeflow/plugins/*/PLUGIN.md`:

1. **files**: At least one file (changed or in codebase) matches any of the plugin's file glob patterns.
2. **content**: At least one of the plugin's content strings appears in the diff or codebase.
3. **stages**: The current pipeline stage is listed in the plugin's `stages` array.

All three conditions must be true for a plugin to match.
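The three conditions can be sketched in TypeScript roughly as follows. This is a hedged illustration, not forgeflow's actual code: `PluginMeta`, `matchesPlugin`, and the deliberately minimal `*`-only glob handling are assumptions made for the example.

```typescript
// PluginMeta mirrors the PLUGIN.md frontmatter shown above.
interface PluginMeta {
  name: string;
  triggers: { files: string[]; content: string[] };
  stages: string[];
}

// Minimal glob support: only `*` wildcards, enough for patterns like "*.tsx".
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*"); // then expand the glob wildcard
  return new RegExp(`^${escaped}$`);
}

function matchesPlugin(
  plugin: PluginMeta,
  changedFiles: string[],
  diffText: string,
  stage: string,
): boolean {
  // 1. files: any changed file's basename matches any glob pattern
  const fileHit = plugin.triggers.files.some((glob) => {
    const re = globToRegExp(glob);
    return changedFiles.some((f) => re.test(f.split("/").pop() ?? f));
  });
  // 2. content: any literal trigger string appears in the diff/codebase text
  const contentHit = plugin.triggers.content.some((s) => diffText.includes(s));
  // 3. stages: the current pipeline stage is declared by the plugin
  const stageHit = plugin.stages.includes(stage);
  return fileHit && contentHit && stageHit; // all three conditions must hold
}
```

The stage check is what keeps, say, an architecture-only plugin out of the implement stage even when its file and content triggers fire.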

## Progressive Disclosure Layers

1. **Trigger scan** — read only frontmatter, decide which plugins match. Cheap.
2. **Plugin body** — read the matched `PLUGIN.md` body for stage-specific guidance. Medium cost.
3. **Plugin references** — read files from `references/` only when deeper context is needed. Expensive, on-demand only.
@@ -0,0 +1,46 @@
---
name: stitch
description: Stitch UI design reference. DESIGN.md is the styling authority, Stitch screens are the layout authority. Use when implementing UI with design system integration.
---

# Stitch — UI Design Reference

Stitch is the source of truth for UI design. It provides two things:

1. **DESIGN.md** — a design system document in the project root. Contains colors, typography, components, spacing, do's/don'ts. This is your styling bible.
2. **Screens** — HTML mockups stored in a Stitch project. Each screen is a pixel-perfect reference for a specific page or component.

## How Stitch Connects to the Project

- **DESIGN.md** lives in the project root. If it doesn't exist, Stitch is not in use — skip all Stitch workflows.
- **Stitch project ID** is specified in the PRD or issue (not in DESIGN.md).

## Workflow: Implementing UI

### 1. Read the Design System

Read `DESIGN.md` first. Every visual decision (colors, fonts, spacing, elevation, component patterns) must follow its rules.

### 2. Get Screen References

If the issue body contains embedded screen HTML, use that as your layout reference. If not, and the project has Stitch MCP tools available, fetch screens using the project ID.

### 3. Implement to Match

For each screen relevant to your work:
1. Use the HTML as your **exact visual target**.
2. Implement the component to match the HTML structure, styling, and layout.
3. Adapt to the project's framework (React, Vue, etc.), but the visual output must match.

### 4. Generate Missing Screens

When implementing a UI component that has **no matching screen**: describe the component in the issue or consult DESIGN.md for patterns. If Stitch MCP tools are available, generate a screen reference.

## Rules

- **DESIGN.md is the styling authority.** All visual decisions come from DESIGN.md.
- **Screen HTML is the layout authority.** The visual output must match exactly.
- **Copy Stitch classes verbatim.** Stitch HTML uses Tailwind classes. Use the exact same classes — do NOT translate to inline styles, CSS modules, or custom CSS. Inline styles lose hover states, opacity modifiers, and responsive breakpoints.
- **Configure Tailwind theme first.** Stitch HTML relies on custom theme colors (e.g., `bg-primary/20`). Ensure the Tailwind config defines all design system colors from DESIGN.md before implementing components.
- **No custom CSS.** Use Tailwind exclusively. If you need a Stitch class that doesn't resolve, fix the Tailwind config — don't replace the class.
- **Don't deviate from the design.** If the design conflicts with requirements, flag it — don't silently "improve" it.
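The "configure Tailwind theme first" rule amounts to a config along these lines. This is a hedged sketch: the color names and hex values are hypothetical stand-ins for whatever DESIGN.md actually defines.

```typescript
// tailwind.config.ts — illustrative only; replace the placeholder color
// values below with the actual tokens from DESIGN.md.
import type { Config } from "tailwindcss";

export default {
  content: ["./src/**/*.{ts,tsx,html}"],
  theme: {
    extend: {
      colors: {
        // Defining `primary` here is what makes Stitch classes such as
        // `bg-primary/20` (the 20% opacity modifier) resolve correctly.
        primary: "#2563eb",
        surface: "#0f172a",
      },
    },
  },
} satisfies Config;
```

With the theme in place, Stitch's classes can be copied verbatim and the opacity and responsive variants keep working.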
@@ -0,0 +1,115 @@
---
name: tdd
description: Test-driven development with red-green-refactor loop. Use when building features or fixing bugs using TDD, mentions "red-green-refactor", wants integration tests, or asks for test-first development.
---

# Test-Driven Development

## Philosophy

Examples in this skill use framework-neutral pseudocode. Translate test syntax/assertions to the project's language and test runner.

**Core principle**: Test at system boundaries, not internal modules. Mock only what you don't control.

Every system has two testable boundaries:
1. **Server/backend boundary** — test through the real runtime or framework test harness (HTTP handlers, message handlers, queue consumers). Use real storage, real state.
2. **Client/frontend boundary** — test at the route/page level. Mock the network edge (HTTP/WebSocket), but render real components with real stores and real hooks.

Internal modules (stores, hooks, services, helpers) get covered transitively by boundary tests. Don't test them separately — if a store has a bug, a route-level test that exercises the same behavior will catch it.

**Unit test only pure algorithmic functions** where the math matters (rounding, scoring, splitting, validation). Everything else goes through a boundary.

**Good tests** are integration-style: they exercise real code paths through public APIs. They describe _what_ the system does, not _how_ it does it.

**Bad tests** are coupled to implementation. They mock internal collaborators, test private methods, or verify through external means. The warning sign: your test breaks when you refactor, but behavior hasn't changed.

See [tests.md](tests.md) for boundary examples and what not to test, and [mocking.md](mocking.md) for mocking guidelines.

## Anti-Pattern: Horizontal Slices

**DO NOT write all tests first, then all implementation.** This is "horizontal slicing" — treating RED as "write all tests" and GREEN as "write all code."

**Correct approach**: Vertical slices via tracer bullets. One test → one implementation → repeat. Each test responds to what you learned from the previous cycle.

```
WRONG (horizontal):
  RED:   test1, test2, test3, test4, test5
  GREEN: impl1, impl2, impl3, impl4, impl5

RIGHT (vertical):
  RED→GREEN: test1→impl1
  RED→GREEN: test2→impl2
  RED→GREEN: test3→impl3
  ...
```

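One vertical slice might look like this in TypeScript. This is purely illustrative (`cartTotal` and its test are hypothetical); the skill's own examples stay framework-neutral.

```typescript
// RED (written first): one test for one behavior, via the public interface.
// GREEN: the minimal implementation below makes it pass, and nothing more:
// no discounts, no currency handling, nothing the current test doesn't demand.

function cartTotal(items: { price: number; qty: number }[]): number {
  return items.reduce((sum, item) => sum + item.price * item.qty, 0);
}

// The test that drove this slice:
const total = cartTotal([
  { price: 5, qty: 2 },
  { price: 3, qty: 1 },
]);
if (total !== 13) throw new Error(`expected 13, got ${total}`);
// The next RED→GREEN cycle would add the next behavior (say, an empty
// cart), not a batch of speculative tests.
```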
## Workflow

### 1. Planning

Before writing any code:

- Confirm what interface changes are needed
- Confirm which behaviors to test (prioritize)
- Identify opportunities for [deep modules](deep-modules.md) (small interface, deep implementation)
- Design interfaces for [testability](interface-design.md)
- List the behaviors to test (not implementation steps)

**You can't test everything.** Focus testing effort on critical paths and complex logic, not every possible edge case.

### 2. Tracer Bullet

Write ONE test that confirms ONE thing about the system:

```
RED:   Write test for first behavior → test fails
GREEN: Write minimal code to pass → test passes
```

This is your tracer bullet — it proves the path works end-to-end.

### 3. Incremental Loop

For each remaining behavior:

```
RED:   Write next test → fails
GREEN: Minimal code to pass → passes
```

Rules:
- One test at a time
- Only enough code to pass the current test
- Don't anticipate future tests
- Keep tests focused on observable behavior

### 4. Refactor

After all tests pass, look for [refactor candidates](refactoring.md):
- Extract duplication
- Deepen modules (move complexity behind simple interfaces)
- Apply SOLID principles where natural
- Run tests after each refactor step

**Never refactor while RED.** Get to GREEN first.

### 5. Boundary Verification

After all unit tests pass, ask: **"Did any test mock a system boundary?"**

If yes, the mock encodes invisible assumptions about the other side. For each mocked boundary:
1. **Name the assumption** — what does the mock claim the real system does?
2. **Verify it** — write one test that uses the real system to confirm the assumption.
3. **If you can't verify it** — write a contract test.

**Rule of thumb**: If your test mocks something, you need another test that doesn't.

## Checklist Per Cycle

```
[ ] Test describes behavior, not implementation
[ ] Test uses public interface only
[ ] Test would survive internal refactor
[ ] Code is minimal for this test
[ ] No speculative features added
```
@@ -0,0 +1,33 @@
# Deep Modules

From "A Philosophy of Software Design":

**Deep module** = small interface + lots of implementation

```
┌─────────────────────┐
│   Small Interface   │ ← Few methods, simple params
├─────────────────────┤
│                     │
│                     │
│ Deep Implementation │ ← Complex logic hidden
│                     │
│                     │
└─────────────────────┘
```

**Shallow module** = large interface + little implementation (avoid)

```
┌─────────────────────────────────┐
│         Large Interface         │ ← Many methods, complex params
├─────────────────────────────────┤
│       Thin Implementation       │ ← Just passes through
└─────────────────────────────────┘
```

When designing interfaces, ask:

- Can I reduce the number of methods?
- Can I simplify the parameters?
- Can I hide more complexity inside?
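As a sketch of the idea (a hypothetical example, not from the book): a rate limiter whose entire public surface is one method, with the token-bucket bookkeeping hidden inside.

```typescript
// Deep module: one method with simple params; the token-bucket state,
// refill math, and clock handling are all hidden behind it.
class RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number,
    // Injecting the clock keeps the time boundary controllable in tests.
    private readonly now: () => number = () => Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = this.now();
  }

  // The entire interface: "may this request proceed?"
  allow(): boolean {
    const elapsedSec = (this.now() - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = this.now();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

Callers never see tokens, timestamps, or refill rates; shrinking the interface is what makes the module deep.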
@@ -0,0 +1,31 @@
# Interface Design for Testability

Good interfaces make testing natural:

1. **Accept dependencies, don't create them**

```text
// Testable
function processOrder(order, paymentGateway) {}

// Hard to test
function processOrder(order) {
  const gateway = new StripeGateway();
}
```

2. **Return results, don't produce side effects**

```text
// Testable
function calculateDiscount(cart) -> discount

// Hard to test
function applyDiscount(cart) {
  cart.total -= discount;
}
```

3. **Small surface area**
   - Fewer methods = fewer tests needed
   - Fewer params = simpler test setup
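The first principle translates directly to TypeScript. This is a sketch: `PaymentGateway`, `processOrder`, and the fake gateways are hypothetical names used for illustration.

```typescript
// The dependency is accepted, not created, so a test can pass a fake.
interface PaymentGateway {
  charge(amountCents: number): { ok: boolean };
}

function processOrder(
  order: { totalCents: number },
  gateway: PaymentGateway,
): "confirmed" | "failed" {
  return gateway.charge(order.totalCents).ok ? "confirmed" : "failed";
}

// Test setup becomes trivial: hand-rolled fakes, no mocking framework.
const approveAll: PaymentGateway = { charge: () => ({ ok: true }) };
const declineAll: PaymentGateway = { charge: () => ({ ok: false }) };
```

Both success and failure paths are now reachable from a test without touching a real payment provider.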
@@ -0,0 +1,86 @@
# When to Mock

Mock at **system boundaries** only:

- External APIs (payment, email, etc.)
- Databases (sometimes - prefer test DB)
- Time/randomness
- File system (sometimes)

Don't mock:

- Your own classes/modules
- Internal collaborators
- Anything you control

## Designing for Mockability

At system boundaries, design interfaces that are easy to mock:

**1. Use dependency injection**

Pass external dependencies in rather than creating them internally:

```text
// Easy to mock
function processPayment(order, paymentClient) {
  return paymentClient.charge(order.total);
}

// Hard to mock
function processPayment(order) {
  const client = new StripeClient(process.env.STRIPE_KEY);
  return client.charge(order.total);
}
```

**2. Prefer SDK-style interfaces over generic fetchers**

Create specific functions for each external operation instead of one generic function with conditional logic:

```text
// GOOD: Each function is independently mockable
api = {
  getUser: (id) => fetch(`/users/${id}`),
  getOrders: (userId) => fetch(`/users/${userId}/orders`),
  createOrder: (data) => fetch('/orders', { method: 'POST', body: data }),
}

// BAD: Mocking requires conditional logic inside the mock
api = {
  fetch: (endpoint, options) => fetch(endpoint, options),
}
```

The SDK approach means:
- Each mock returns one specific shape
- No conditional logic in test setup
- Easier to see which endpoints a test exercises
- Stronger contracts per endpoint (typed or documented)
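In TypeScript, the SDK-style shape and its per-endpoint mock might look like this (the `Api` interface and `greet` helper are hypothetical, for illustration only):

```typescript
// SDK-style boundary: one named function per external operation.
interface Api {
  getUser(id: string): Promise<{ id: string; name: string }>;
  createOrder(data: { item: string }): Promise<{ orderId: string }>;
}

// In tests, each mock returns one specific shape; no conditional routing
// on endpoint strings, and the compiler checks the mock against the contract.
const mockApi: Api = {
  getUser: async (id) => ({ id, name: "Alice" }),
  createOrder: async () => ({ orderId: "order-1" }),
};

// Code under test depends on the interface, not on fetch.
async function greet(api: Api, userId: string): Promise<string> {
  const user = await api.getUser(userId);
  return `Hello, ${user.name}`;
}
```

Because `mockApi` must satisfy `Api`, a drifting endpoint signature fails the build instead of silently passing tests.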

## The Boundary Rule

Your system has boundaries — edges where your code talks to something you don't control. Mock **at** those edges, never inside.

Examples of boundaries (mock these):
- Network calls (HTTP, WebSocket, gRPC) from client to server
- External service SDKs (payment, email, auth providers)
- Databases when a test DB isn't practical (prefer real test DBs)
- Time, randomness, filesystem

Examples of internals (never mock these):
- Your own stores, reducers, or state management
- Your own hooks, composables, or services
- Your own utility/helper modules
- Internal class collaborators

**If you're mocking something you wrote, you're testing implementation, not behavior.** The test will break on refactor and catch nothing real.

## Mock Fidelity

Mocks lie. Every mock encodes assumptions about external behavior. When those assumptions are wrong, tests pass but the app breaks.

**After writing tests with mocks, verify mock fidelity:**
- Does your mock return the same data shapes the real system returns?
- Does your mock follow the same timing/ordering the real system follows?
- Write at least one test against the real system (or realistic test double) per mocked boundary.
@@ -0,0 +1,10 @@
# Refactor Candidates

After a TDD cycle, look for:

- **Duplication** → Extract function/class
- **Long methods** → Break into private helpers (keep tests on the public interface)
- **Shallow modules** → Combine or deepen
- **Feature envy** → Move logic to where the data lives
- **Primitive obsession** → Introduce value objects
- **Existing code** the new code reveals as problematic
@@ -0,0 +1,98 @@
# Good and Bad Tests

## Good Tests

**Integration-style**: Test through real interfaces, not mocks of internal parts.

```text
// GOOD: Tests observable behavior
test "user can checkout with valid cart":
  cart = create_cart()
  cart.add(product)
  result = checkout(cart, payment_method)
  assert result.status == "confirmed"
```

Characteristics:

- Tests behavior users/callers care about
- Uses public API only
- Survives internal refactors
- Describes WHAT, not HOW
- One logical assertion per test

## Bad Tests

**Implementation-detail tests**: Coupled to internal structure.

```text
// BAD: Tests implementation details
test "checkout calls payment service internals":
  payment_spy = spy(payment_service)
  checkout(cart, payment_spy)
  assert payment_spy.process_called_with(cart.total)
```

Red flags:

- Mocking internal collaborators
- Testing private methods
- Asserting on call counts/order
- Test breaks when refactoring without behavior change
- Test name describes HOW, not WHAT
- Verifying through external means instead of the interface

```text
// BAD: Bypasses interface to verify
test "create_user saves to database":
  create_user(name="Alice")
  row = db.query("SELECT * FROM users WHERE name = ?", ["Alice"])
  assert row is not null

// GOOD: Verifies through interface
test "create_user makes user retrievable":
  user = create_user(name="Alice")
  retrieved = get_user(user.id)
  assert retrieved.name == "Alice"
```

## What NOT to Test

**Test at system boundaries, not internal modules.** A system has two boundaries:

1. **Server/backend boundary**: Test through the real runtime or framework test harness (HTTP requests, WebSocket messages, queue handlers). Exercise real storage, real state, real protocol handling.
2. **Client/frontend boundary**: Test at the route/page level with external dependencies mocked at the edge (e.g., mock the network layer, not your own stores or hooks).

Tests at these two levels cover your internal modules (stores, hooks, services, helpers) transitively. If a store has a bug, a route-level test that exercises the store's behavior will catch it.

**Do not write separate tests for:**

- **State management** (stores, reducers, state machines) — covered by route/page tests that trigger the same state transitions through user interactions
- **Custom hooks / composables** — covered by route/page tests that use the hook through a real component
- **Individual UI components** — covered by route/page tests that render the full page including those components
- **Config files** (CI workflows, bundler config, deploy config) — not behavioral; the test breaks when the config format changes and catches nothing useful
- **Design tokens / CSS classes** — testing class name presence doesn't verify visual fidelity; either trust the design system or use visual regression tools

**Do write separate unit tests for:**

- **Pure algorithmic functions** where the math matters (rounding, scoring, splitting, fuzzy matching, validation logic). These have complex edge cases that are cheaper to test in isolation.

```text
// BAD: Testing internal state management separately
test "store updates count on increment message":
  store = create_store()
  store.handle_message({ type: "increment" })
  assert store.state.count == 1

// GOOD: Testing the same behavior through the UI boundary
test "user sees updated count after server sends increment":
  render(CounterPage, { websocket: mock_ws })
  mock_ws.receive({ type: "increment" })
  assert screen.has_text("Count: 1")

// GOOD: Pure algorithm deserves its own unit test
test "proportional split rounds to exact total using largest-remainder":
  result = split_proportional(total=100, weights=[1, 1, 1])
  assert sum(result) == 100
  assert result == [34, 33, 33]
```
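
The `split_proportional` pseudocode is exactly the kind of pure function worth unit testing in isolation. A TypeScript sketch of the largest-remainder method (the name mirrors the pseudocode above and is otherwise hypothetical):

```typescript
// Split `total` across `weights` proportionally, rounding so the parts sum
// exactly to `total`: floor every share, then hand the leftover units to
// the shares with the largest fractional remainders (largest-remainder method).
function splitProportional(total: number, weights: number[]): number[] {
  const weightSum = weights.reduce((a, b) => a + b, 0);
  const exact = weights.map((w) => (total * w) / weightSum);
  const floored = exact.map(Math.floor);
  let leftover = total - floored.reduce((a, b) => a + b, 0);

  // Indices sorted by descending fractional remainder; ties keep input order.
  const order = exact
    .map((x, i) => ({ i, frac: x - Math.floor(x) }))
    .sort((a, b) => b.frac - a.frac || a.i - b.i)
    .map((e) => e.i);

  for (const i of order) {
    if (leftover === 0) break;
    floored[i] += 1;
    leftover -= 1;
  }
  return floored;
}
```

For `total=100, weights=[1, 1, 1]` the floors are `[33, 33, 33]` with one unit left over; the tie goes to the first index, giving `[34, 33, 33]` as in the pseudocode test. The edge cases (ties, leftovers, uneven weights) are what make the isolated unit test worthwhile.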