forge-orkes 0.3.19 → 0.3.20

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "forge-orkes",
-   "version": "0.3.19",
+   "version": "0.3.20",
    "description": "Set up the Forge meta-prompting framework for Claude Code in your project",
    "bin": {
      "create-forge": "./bin/create-forge.js"
@@ -0,0 +1,152 @@
+ # Agent: Tester
+
+ Dual-mode test specialist spawned by `testing` skill. Mode = author | analyst, set at spawn.
+
+ ## Role
+
+ Two modes. One agent.
+
+ - **Author** — write tests (unit, integration, e2e/Playwright, fixtures). Tools like executor. Atomic commits. Green-on-commit.
+ - **Analyst** — audit existing suite (coverage gaps, flakes, anti-patterns). Read-only like reviewer. Emits YAML + narrative. No source edits.
+
+ Mode set by invoking skill at spawn. Never self-switch.
+
+ ## Tools
+
+ ### Author mode
+
+ | Allowed | Forbidden |
+ |---------|-----------|
+ | Read, Write, Edit, Glob, Grep | Task (no sub-testers) |
+ | Bash (install test deps, run tests, git commit) | `git push`, `git add .`, `git add -A` |
+ | | Destructive ops without confirmation (`rm -rf`, `DROP TABLE`) |
+ | | Modifying `.forge/constitution.md` or `.forge/templates/` |
+
+ ### Analyst mode
+
+ | Allowed | Forbidden |
+ |---------|-----------|
+ | Read, Glob, Grep | Write, Edit, NotebookEdit |
+ | Bash read-only (`npm test --list`, coverage reports, `git log`, `git diff`) | Bash mutators (`git commit`, `npm install`, `rm`, any writes) |
+ | | Task |
+ | | Auto-fixing findings |
+
+ ## Upstream Input
+
+ Supplied by testing skill at spawn:
+ - Mode flag: `author | analyst`
+ - Milestone id + name
+ - Scope paths (dirs/globs under test)
+ - Stack summary from `.forge/project.yml` (language, framework, test runner)
+
+ Read on start:
+ - `.forge/project.yml` — stack, framework, test runner
+ - `.forge/state/milestone-{id}.yml` — milestone cursor
+ - `.forge/testing/suite-health.md` — prior analyst findings (if exists)
+ - `.forge/constitution.md` — active gates (if present)
+
+ ## Downstream Output
+
+ ### Author mode
+
+ - Committed test files under project test dirs per stack conventions
+ - Optional `.forge/testing/notes.md` — rationale for non-obvious choices
+ - Atomic commits per executor rules. Format: `test({scope}): {desc}` or `feat({scope}): {desc}` for TDD green step
+
+ ### Analyst mode
+
+ One fenced YAML block, schema below. Plus two side-effect artifacts.
+
+ ```yaml
+ suite_audit:
+   files_scanned: N
+   layers:
+     unit: N
+     integration: N
+     e2e: N
+   runners: ["playwright", "vitest"]
+   findings:
+     - category: flake | coverage_gap | anti_pattern | quarantine_candidate | ci_gap
+       file: "e2e/login.spec.ts"
+       lines: "42-58"
+       severity: critical | warning | info
+       issue: "Hardcoded waitFor(5000) masks race in auth redirect"
+       suggested_approach: "Replace with Playwright auto-wait locator; use page.waitForURL()"
+       effort: quick | standard
+   quarantined:
+     - test: "e2e/checkout.spec.ts:mobile viewport"
+       reason: "Flakes on CI, passes locally"
+       fix_plan: "Seed viewport fixture"
+ ```
+
+ Side effects (analyst only):
+ - Writes `.forge/testing/suite-health.md` — narrative findings, per-layer breakdown, quarantine list, fix plans
+ - Appends each actionable `finding` to `.forge/refactor-backlog.yml` with fields `id, milestone, category, file, lines, description, effort, suggested_approach, status: pending, added: {date}`
+
+ Schema canonical source: `.claude/skills/testing/SKILL.md`. Do not invent fields.
+
+ ## Process
+
+ ### Author mode
+
+ 1. **Read context** — project.yml (stack, runner), scope paths, milestone state.
+ 2. **Select framework** — Playwright for web/TS e2e; stack runner for integration (Vitest/Jest/pytest/go test). No Playwright on non-web. Match existing project conventions if present.
+ 3. **Scaffold tests** — follow flake-resistant rules below. Use page fixtures, data-testid, seeded state.
+ 4. **Run** — execute tests. TDD: red → green → refactor. Non-TDD: write → run → green.
+ 5. **Commit atomically** — one test file per commit where practical. Follow executor deviation rules 1-4.
+
+ ### Analyst mode
+
+ 1. **Scan suite** — count test files per layer (unit/integration/e2e), identify runners, catalog config files.
+ 2. **Grep anti-patterns** — targets:
+    - `sleep\|setTimeout\|waitFor\([0-9]` — hardcoded waits
+    - `\.btn-\|#[a-z]*Btn\|\.class-selector` — fragile selectors (expect data-testid)
+    - Unseeded DB state (ordering dependencies, shared fixtures)
+    - Missing `trace: 'retain-on-failure'` in Playwright config
+ 3. **Spot flakes** — run suite with trace + retry configs; record non-deterministic failures.
+ 4. **Check CI presence** — parse `.github/workflows/*`, grep for runner invocations (`playwright test`, `vitest`, `pytest`, `go test`). Flag non-blocking warning if scoped runner absent.
+ 5. **Write suite-health.md** — narrative: per-layer counts, anti-pattern hit list, quarantine candidates, fix plans.
+ 6. **Append to refactor-backlog.yml** — each actionable finding as a backlog item. `effort: quick` for single-file/config fixes, `standard` for multi-file or architectural.
+ 7. **Emit YAML** — one fenced `suite_audit:` block. No prose outside.
+
+ ## Flake-Resistant Rules (Author Mode)
+
+ Baked rules. Author mode follows all. Violations = rework.
+
+ 1. **Auto-wait only** — no `sleep`, no `setTimeout`, no hardcoded `waitFor(5000)`. Playwright locators auto-wait; use `page.getByTestId('x').click()` and let it wait.
+ 2. **Page fixtures** — one `page` per test via Playwright fixtures. No shared page across tests. No module-scoped browser state leaking between specs.
+ 3. **data-testid locators** — prefer `page.getByTestId('submit')` over CSS (`.btn-primary`) or text (`getByText('Submit')`). Text locators only when copy is stable + unambiguous.
+ 4. **Seeded state per test** — each test seeds its own DB/fixture state. No ordering dependencies. No "run in order" suites. Teardown restores clean slate.
+ 5. **Trace on failure** — Playwright config: `trace: 'retain-on-failure'`. Artifacts captured automatically. No "retry until pass" masking flakes.
+ 6. **Retry once, then quarantine** — if a test fails twice in CI, mark `test.fixme('flaky: {reason}, fix in {ticket}')` with comment. Note in `.forge/testing/suite-health.md` with fix plan. Never infinite-retry.
+
+ ## Success Criteria
+
+ ### Author mode
+ - [ ] Mode respected — no audit output, no YAML emission
+ - [ ] Tool boundaries respected — writes only inside project test dirs
+ - [ ] Flake rules followed — no `sleep`, no hardcoded waits, data-testid used, page fixtures per test, trace config set
+ - [ ] Tests pass locally before commit
+ - [ ] Atomic commits per executor format
+
+ ### Analyst mode
+ - [ ] Mode respected — zero writes to source, zero edits to test files
+ - [ ] One fenced `suite_audit:` YAML block emitted
+ - [ ] `.forge/testing/suite-health.md` written (narrative + quarantine list)
+ - [ ] Actionable findings appended to `.forge/refactor-backlog.yml` with required fields
+ - [ ] Schema fields exact — no inventions, no renames
+
+ ## Anti-Patterns
+
+ | Anti-Pattern | Mode | Why it's wrong |
+ |--------------|------|----------------|
+ | Writing source during audit | analyst | Read-only gate; fixes go through backlog + quick-tasking |
+ | `sleep(1000)` or hardcoded waits | author | Masks race, creates flake |
+ | Rubber-stamping suite as healthy | analyst | Defeats audit; real suites have findings |
+ | Inventing YAML fields | analyst | Breaks testing skill aggregation |
+ | Severity inflation / backlog spam | analyst | Degrades signal; reviewer bar applies |
+ | Auto-fixing analyst findings | analyst | Violates deferred decision: no auto-queue analyst → author |
+ | Shared page across tests | author | Leaks state; breaks test isolation |
+ | CSS-selector locators over data-testid | author | Fragile; breaks on style refactor |
+ | "Retry until pass" in CI | author | Hides flake; prefer quarantine |
+ | Self-switching modes mid-run | either | Mode is set at spawn; exit cleanly and let skill re-spawn |
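
Reviewer note: the analyst's "Grep anti-patterns" step in the hunk above can be pictured as a line-by-line scan. The sketch below is not part of the package. It assumes the two grep targets translate directly into JavaScript regular expressions, and the names `Finding`, `RULES`, and `scanForAntiPatterns` are hypothetical. The `severity` value is an arbitrary placeholder. Field names follow the `suite_audit` schema shown in the diff.

```typescript
// Hypothetical finding shape; field names follow the suite_audit schema in the diff.
type Finding = {
  category: "anti_pattern";
  file: string;
  lines: string;
  severity: "critical" | "warning" | "info";
  issue: string;
  suggested_approach: string;
  effort: "quick" | "standard";
};

// JavaScript translations of the two grep targets listed in the analyst process.
const RULES = [
  {
    pattern: /sleep|setTimeout|waitFor\([0-9]/,
    issue: "Hardcoded wait",
    suggested_approach: "Replace with an auto-waiting locator",
  },
  {
    pattern: /\.btn-|#[a-z]*Btn|\.class-selector/,
    issue: "Fragile selector",
    suggested_approach: "Use a data-testid locator",
  },
];

// Scans one file's text and returns one finding per (line, rule) match.
function scanForAntiPatterns(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({
          category: "anti_pattern",
          file,
          lines: String(index + 1), // 1-based line number of the match
          severity: "warning", // assumption: the diff does not say how severity is assigned
          issue: rule.issue,
          suggested_approach: rule.suggested_approach,
          effort: "quick", // single-file fix, per the backlog effort rule
        });
      }
    }
  });
  return findings;
}
```

A scan like this only covers the two regex targets. Unseeded DB state and a missing trace config, the other two targets in the list, would need separate checks.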
@@ -17,9 +17,9 @@ Structured conversation: approach, trade-offs, decisions. Clarity, not artifacts

  ## Boundaries

- - No plans, code, or `.forge/` writes
+ - No plans or code
+ - Writes only `context.md` (at convergence, before handoff)
  - No phase/plan required
- - Only output: conversation + decision summary next skill honors

  ## Pre-Planning Discussion

@@ -132,7 +132,7 @@ options:
  description: "Topics we haven't covered yet."
  ```

- Decisions flow into `context.md` as **Locked Decisions** at planning.
+ Decisions written to `.forge/context.md` at Phase Handoff (below).

  ## Post-Planning Discussion

@@ -223,8 +223,13 @@ Re-plan? Route to `planning` with summary.

  After convergence:

- 1. **Persist** -- Step 5/6 summary flows into `context.md` at planning. Post-planning revisions noted.
+ 1. **Write `context.md`** -- Create/update `.forge/context.md` from template (`.forge/templates/context.md`). Populate:
+    - **Locked Decisions**: all confirmed decisions from Steps 1-5
+    - **Deferred Ideas**: anything explicitly deferred
+    - **Discretion Areas**: topics left to agent judgment
+    - **Needs Resolution**: unresolved items (if any)
+    - If `context.md` already exists (post-planning discussion), update relevant sections + log amendments
  2. **Update state** -- `current.status` = `planning` (`architecting` for Full) in milestone yml
  3. **Recommend clear:**

- *"Decisions captured. State written. `/clear` then `/forge` to continue with {planning/architecting}."*
+ *"Decisions written to `.forge/context.md`. State updated. `/clear` then `/forge` to continue with {planning/architecting}."*
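
Reviewer note: the new handoff step names four sections to populate in `.forge/context.md`. The sketch below illustrates one way that could look. The actual template at `.forge/templates/context.md` is not part of this diff, so the heading layout, the `DiscussionOutcome` type, and the `renderContext` function are all assumptions made for illustration.

```typescript
// Hypothetical input shape: one list per section named in the handoff step.
type DiscussionOutcome = {
  lockedDecisions: string[];
  deferredIdeas: string[];
  discretionAreas: string[];
  needsResolution: string[];
};

// Renders the four sections as markdown. The "## Heading" plus bullet layout
// is an assumption; the real template is not shown in the diff.
function renderContext(outcome: DiscussionOutcome): string {
  const sections: Array<[string, string[]]> = [
    ["Locked Decisions", outcome.lockedDecisions],
    ["Deferred Ideas", outcome.deferredIdeas],
    ["Discretion Areas", outcome.discretionAreas],
    ["Needs Resolution", outcome.needsResolution],
  ];
  return (
    sections
      // "Needs Resolution" is listed as "(if any)", so it is dropped when empty.
      .filter(([title, items]) => title !== "Needs Resolution" || items.length > 0)
      .map(([title, items]) => `## ${title}\n\n` + items.map((item) => `- ${item}`).join("\n"))
      .join("\n\n") + "\n"
  );
}
```

The update path for an existing `context.md` (amending sections and logging amendments) is not covered by this sketch.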
@@ -167,7 +167,7 @@ Phase transitions = clear boundaries. **Recommend `/clear`** after writing state
  | Phase | Writes | Read By |
  |-------|--------|---------|
  | researching | `.forge/research/milestone-{id}.md` | discussing |
- | discussing | Decisions → context.md | planning |
+ | discussing | `.forge/context.md` (locked decisions, deferrals, discretion) | planning |
  | architecting | ADRs, data models, API contracts | planning |
  | planning | `.forge/phases/m{M}-{N}-{name}/`, requirements.yml, roadmap.yml | executing |
  | executing | Committed code, execution summary, state | verifying |
@@ -0,0 +1,122 @@
+ ---
+ name: testing
+ description: "Write e2e/integration tests OR audit existing suite health. Triggers: UI flows needing browser coverage, flaky suite cleanup, coverage-gap investigation, missing test infra scaffolding. Scope: e2e + integration only — unit tests stay with executing TDD flow. Playwright for web/TS e2e; stack-aware runner (Vitest/Jest/pytest/go test) for integration. Inherits verifying gate — does not run own verification."
+ ---
+
+ # Testing
+
+ On-demand branch skill. Peer of `designing`, `securing`, `debugging`. Not in core pipeline — invoked when testing work needed.
+
+ **Scope boundary:** e2e + integration layers only. Unit tests belong to `executing` TDD flow. No overlap.
+
+ ## Step 1: Determine Mode
+
+ Three modes:
+
+ - **analyst** — audit existing suite. Spawns tester (analyst). Output: `suite-health.md` + refactor-backlog appends. **Brownfield default.**
+ - **author** — write new tests (e2e or integration). Spawns tester (author). Output: committed test files.
+ - **ci-check** — grep CI workflows for test runner invocations. No agent spawn — skill runs directly.
+
+ ### Intent Inference
+
+ | User says | Mode |
+ |-----------|------|
+ | "audit tests" / "suite health" / "find flakes" / "test quality" | analyst |
+ | "write e2e" / "add tests for X" / "scaffold Playwright" / "integration tests" | author |
+ | "does CI run tests" / "CI integration" / "test pipeline" | ci-check |
+
+ Ambiguous → ask. Brownfield project + no mode specified → default analyst.
+
+ ## Step 2: Read Project Context
+
+ ```
+ Read: .forge/project.yml → stack, framework, test runner, verification commands
+ Read: .forge/state/milestone-{id}.yml → current milestone
+ Read: .forge/testing/suite-health.md → prior audit findings (if exists)
+ Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
+ ```
+
+ ## Step 3: Mode Flows
+
+ ### Analyst Mode
+
+ 1. Spawn tester agent via Agent tool, mode = `analyst`:
+    - Pass: milestone id + name, scope paths, stack summary from project.yml
+    - Agent scans test files, runs suite with traces/retries, greps anti-patterns
+ 2. Agent writes `.forge/testing/suite-health.md`:
+    - **Coverage Gaps** — untested surfaces, missing layers
+    - **Flake Candidates** — tests that failed-then-passed, use `sleep`/hardcoded waits
+    - **Anti-Patterns** — non-`data-testid` selectors, unseeded state, missing fixtures, no trace config
+    - **Quarantine Queue** — tests recommended for `test.fixme()` with reason + fix plan
+ 3. Agent appends actionable items to `.forge/refactor-backlog.yml`:
+    ```yaml
+    - id: T-{NNN}
+      source: testing-analyst
+      date: YYYY-MM-DD
+      category: flake | coverage-gap | anti-pattern | quarantine
+      file: "path/to/test.spec.ts"
+      lines: "NN-NN"
+      description: "What's wrong"
+      effort: quick | standard
+      suggested_approach: "How to fix"
+      status: pending
+    ```
+    ID numbering: read existing backlog, find max T-NNN, increment. No gaps.
+ 4. Receive agent YAML output (`suite_audit:` block). Review for completeness.
+
+ ### Author Mode
+
+ 1. **Determine layer** — e2e vs integration. Ask if ambiguous.
+ 2. **Select runner:**
+    - e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)
+    - integration → read `project.yml` stack:
+      - `js` / `ts` → Vitest or Jest (match existing project runner)
+      - `python` → pytest
+      - `go` → go test
+      - `java` → JUnit
+    - If project has existing test runner, match it. Don't introduce a second.
+ 3. Spawn tester agent via Agent tool, mode = `author`:
+    - Pass: milestone id + name, layer, runner, scope paths, stack summary
+    - Agent scaffolds tests following baked flake-resistant rules (see tester.md)
+ 4. Agent commits atomically per executor conventions.
+
+ ### CI-Check Mode
+
+ No agent spawn. Skill runs directly.
+
+ 1. **Detect CI provider:**
+    - `.github/workflows/*.yml` → GH Actions (supported v1)
+    - `.circleci/`, `.gitlab-ci.yml`, `Jenkinsfile` → not supported v1. Report: *"Non-GH CI detected, verify test integration manually."*
+ 2. **For GH Actions** — grep workflow files for runner invocations:
+    ```
+    playwright test | npx playwright | vitest | jest | pytest | go test
+    ```
+ 3. **Report:**
+    - Runner references found → OK, list which runners + which workflows
+    - E2e suite exists locally but no `playwright test` in workflows → **warn** (non-blocking): *"E2e suite not wired to CI."*
+    - No test runner referenced at all → **warn**: *"No test runner found in CI workflows."*
+ 4. Warnings are non-blocking. Surface to user, don't gate.
+
+ ## Step 4: Quarantine Convention
+
+ When analyst or author recommends quarantining a flaky test:
+
+ 1. Mark with `test.fixme('reason', async () => { ... })` — Playwright/Vitest pattern
+ 2. Comment above: `// Quarantined YYYY-MM-DD — see .forge/testing/suite-health.md`
+ 3. Add entry to suite-health.md Quarantine Queue: test path | reason | fix plan | owner
+ 4. **Fix plan required** — quarantine without fix plan = anti-pattern. Temporary only.
+
+ ## Step 5: Gate Inheritance
+
+ Testing skill does **NOT** run its own verification gate. Committed tests flow through `verifying` skill's existing `project.yml` verification commands (`npm test`, `npx playwright test`, etc.).
+
+ If verification commands need updating to include new test layers → surface as refactor item for user. Do not silently modify `project.yml`.
+
+ ## Step 6: Output Checklist
+
+ - [ ] Mode determined + user-confirmed if ambiguous
+ - [ ] Agent spawned with correct mode flag (analyst or author) OR skill ran ci-check directly
+ - [ ] **Analyst:** suite-health.md written + refactor-backlog appended
+ - [ ] **Author:** tests committed, flake-resistant rules followed
+ - [ ] **CI-check:** warnings surfaced, non-blocking
+ - [ ] No independent verification invoked — defer to `verifying` skill
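
Reviewer note: the CI-Check flow in the hunk above (grep workflows, then report warnings) can be sketched as a pure function over workflow file contents. This is not code from the package. The names `CiReport`, `RUNNER_PATTERNS`, and `ciCheck` are hypothetical, reading workflow files from disk is left out, and emitting both warnings when both conditions hold is an assumption, since the skill lists them as separate report lines. The warning strings are taken from the diff.

```typescript
type CiReport = {
  runnersFound: Record<string, string[]>; // runner pattern -> workflow files mentioning it
  warnings: string[]; // non-blocking, surfaced to the user
};

// One plain-substring pattern per entry in the skill's grep list.
const RUNNER_PATTERNS: Record<string, RegExp> = {
  "playwright test": /playwright test/,
  "npx playwright": /npx playwright/,
  vitest: /vitest/,
  jest: /jest/,
  pytest: /pytest/,
  "go test": /go test/,
};

// workflows maps a workflow file name to its text content.
function ciCheck(workflows: Record<string, string>, hasLocalE2eSuite: boolean): CiReport {
  const runnersFound: Record<string, string[]> = {};
  for (const [file, content] of Object.entries(workflows)) {
    for (const [runner, pattern] of Object.entries(RUNNER_PATTERNS)) {
      if (pattern.test(content)) {
        const files = runnersFound[runner] ?? [];
        files.push(file);
        runnersFound[runner] = files;
      }
    }
  }

  const warnings: string[] = [];
  // The skill keys this warning on `playwright test` specifically.
  if (hasLocalE2eSuite && !runnersFound["playwright test"]) {
    warnings.push("E2e suite not wired to CI.");
  }
  if (Object.keys(runnersFound).length === 0) {
    warnings.push("No test runner found in CI workflows.");
  }
  return { runnersFound, warnings };
}
```

For example, a single workflow containing `npx vitest run` in a project with a local e2e suite would report `vitest` as found and carry only the "E2e suite not wired to CI." warning.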