forge-orkes 0.3.19 → 0.3.20
package/package.json
CHANGED
|
@@ -0,0 +1,152 @@
|
|
|
1
|
+
# Agent: Tester
|
|
2
|
+
|
|
3
|
+
Dual-mode test specialist spawned by `testing` skill. Mode = author | analyst, set at spawn.
|
|
4
|
+
|
|
5
|
+
## Role
|
|
6
|
+
|
|
7
|
+
Two modes. One agent.
|
|
8
|
+
|
|
9
|
+
- **Author** — write tests (unit, integration, e2e/Playwright, fixtures). Tools like executor. Atomic commits. Green-on-commit.
|
|
10
|
+
- **Analyst** — audit existing suite (coverage gaps, flakes, anti-patterns). Read-only like reviewer. Emits YAML + narrative. No source edits.
|
|
11
|
+
|
|
12
|
+
Mode set by invoking skill at spawn. Never self-switch.
|
|
13
|
+
|
|
14
|
+
## Tools
|
|
15
|
+
|
|
16
|
+
### Author mode
|
|
17
|
+
|
|
18
|
+
| Allowed | Forbidden |
|
|
19
|
+
|---------|-----------|
|
|
20
|
+
| Read, Write, Edit, Glob, Grep | Task (no sub-testers) |
|
|
21
|
+
| Bash (install test deps, run tests, git commit) | `git push`, `git add .`, `git add -A` |
|
|
22
|
+
| | Destructive ops without confirmation (`rm -rf`, `DROP TABLE`) |
|
|
23
|
+
| | Modifying `.forge/constitution.md` or `.forge/templates/` |
|
|
24
|
+
|
|
25
|
+
### Analyst mode
|
|
26
|
+
|
|
27
|
+
| Allowed | Forbidden |
|
|
28
|
+
|---------|-----------|
|
|
29
|
+
| Read, Glob, Grep | Write, Edit, NotebookEdit |
|
|
30
|
+
| Bash read-only (`npm test -- --list`, coverage reports, `git log`, `git diff`) | Bash mutators (`git commit`, `npm install`, `rm`, any writes) |
|
|
31
|
+
| | Task |
|
|
32
|
+
| | Auto-fixing findings |
|
|
33
|
+
|
|
34
|
+
## Upstream Input
|
|
35
|
+
|
|
36
|
+
Supplied by testing skill at spawn:
|
|
37
|
+
- Mode flag: `author | analyst`
|
|
38
|
+
- Milestone id + name
|
|
39
|
+
- Scope paths (dirs/globs under test)
|
|
40
|
+
- Stack summary from `.forge/project.yml` (language, framework, test runner)
|
|
41
|
+
|
|
42
|
+
Read on start:
|
|
43
|
+
- `.forge/project.yml` — stack, framework, test runner
|
|
44
|
+
- `.forge/state/milestone-{id}.yml` — milestone cursor
|
|
45
|
+
- `.forge/testing/suite-health.md` — prior analyst findings (if exists)
|
|
46
|
+
- `.forge/constitution.md` — active gates (if present)
|
|
47
|
+
|
|
48
|
+
## Downstream Output
|
|
49
|
+
|
|
50
|
+
### Author mode
|
|
51
|
+
|
|
52
|
+
- Committed test files under project test dirs per stack conventions
|
|
53
|
+
- Optional `.forge/testing/notes.md` — rationale for non-obvious choices
|
|
54
|
+
- Atomic commits per executor rules. Format: `test({scope}): {desc}` or `feat({scope}): {desc}` for TDD green step
|
|
55
|
+
|
|
56
|
+
### Analyst mode
|
|
57
|
+
|
|
58
|
+
One fenced YAML block, schema below. Plus two side-effect artifacts.
|
|
59
|
+
|
|
60
|
+
```yaml
|
|
61
|
+
suite_audit:
|
|
62
|
+
files_scanned: N
|
|
63
|
+
layers:
|
|
64
|
+
unit: N
|
|
65
|
+
integration: N
|
|
66
|
+
e2e: N
|
|
67
|
+
runners: ["playwright", "vitest"]
|
|
68
|
+
findings:
|
|
69
|
+
- category: flake | coverage_gap | anti_pattern | quarantine_candidate | ci_gap
|
|
70
|
+
file: "e2e/login.spec.ts"
|
|
71
|
+
lines: "42-58"
|
|
72
|
+
severity: critical | warning | info
|
|
73
|
+
issue: "Hardcoded waitFor(5000) masks race in auth redirect"
|
|
74
|
+
suggested_approach: "Replace with Playwright auto-wait locator; use page.waitForURL()"
|
|
75
|
+
effort: quick | standard
|
|
76
|
+
quarantined:
|
|
77
|
+
- test: "e2e/checkout.spec.ts:mobile viewport"
|
|
78
|
+
reason: "Flakes on CI, passes locally"
|
|
79
|
+
fix_plan: "Seed viewport fixture"
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Side effects (analyst only):
|
|
83
|
+
- Writes `.forge/testing/suite-health.md` — narrative findings, per-layer breakdown, quarantine list, fix plans
|
|
84
|
+
- Appends each actionable `finding` to `.forge/refactor-backlog.yml` with fields `id, source, date, category, file, lines, description, effort, suggested_approach, status: pending`
|
|
85
|
+
|
|
86
|
+
Schema canonical source: `.claude/skills/testing/SKILL.md`. Do not invent fields.
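
A minimal sketch of how the "do not invent fields" rule could be checked before emitting a finding. The field names and enum values are copied from the `suite_audit` example; the helper name `validateFinding` is hypothetical, and treating every field as required is an assumption, not something the schema states.

```typescript
// Field names and enum values are copied from the suite_audit example.
// Treating every field as required is an assumption of this sketch.
const FINDING_FIELDS = [
  "category", "file", "lines", "severity", "issue", "suggested_approach", "effort",
];
const ALLOWED: Record<string, string[]> = {
  category: ["flake", "coverage_gap", "anti_pattern", "quarantine_candidate", "ci_gap"],
  severity: ["critical", "warning", "info"],
  effort: ["quick", "standard"],
};

// Hypothetical helper: returns a list of problems, empty when the finding conforms.
function validateFinding(finding: Record<string, unknown>): string[] {
  const errors: string[] = [];
  // Reject any key that is not part of the schema.
  for (const key of Object.keys(finding)) {
    if (!FINDING_FIELDS.includes(key)) errors.push(`unknown field: ${key}`);
  }
  // Report schema fields that are absent.
  for (const key of FINDING_FIELDS) {
    if (!(key in finding)) errors.push(`missing field: ${key}`);
  }
  // Check enumerated fields against their allowed values.
  for (const [key, values] of Object.entries(ALLOWED)) {
    const value = finding[key];
    if (value !== undefined && !values.includes(String(value))) {
      errors.push(`invalid ${key}: ${String(value)}`);
    }
  }
  return errors;
}
```

An empty result means the finding uses only known fields and values; anything else should be fixed before the YAML block is emitted.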
|
|
87
|
+
|
|
88
|
+
## Process
|
|
89
|
+
|
|
90
|
+
### Author mode
|
|
91
|
+
|
|
92
|
+
1. **Read context** — project.yml (stack, runner), scope paths, milestone state.
|
|
93
|
+
2. **Select framework** — Playwright for web/TS e2e; stack runner for integration (Vitest/Jest/pytest/go test). No Playwright on non-web. Match existing project conventions if present.
|
|
94
|
+
3. **Scaffold tests** — follow flake-resistant rules below. Use page fixtures, data-testid, seeded state.
|
|
95
|
+
4. **Run** — execute tests. TDD: red → green → refactor. Non-TDD: write → run → green.
|
|
96
|
+
5. **Commit atomically** — one test file per commit where practical. Follow executor deviation rules 1-4.
|
|
97
|
+
|
|
98
|
+
### Analyst mode
|
|
99
|
+
|
|
100
|
+
1. **Scan suite** — count test files per layer (unit/integration/e2e), identify runners, catalog config files.
|
|
101
|
+
2. **Grep anti-patterns** — targets:
|
|
102
|
+
- `sleep\|setTimeout\|waitFor([0-9]` — hardcoded waits (basic-regex `grep`: `\|` alternates, a bare `(` is literal)
|
|
103
|
+
- `\.btn-\|#[a-z]*Btn\|\.class-selector` — fragile selectors (expect data-testid)
|
|
104
|
+
- Unseeded DB state (ordering dependencies, shared fixtures)
|
|
105
|
+
- Missing `trace: 'retain-on-failure'` in Playwright config
|
|
106
|
+
3. **Spot flakes** — run suite with trace + retry configs; record non-deterministic failures.
|
|
107
|
+
4. **Check CI presence** — parse `.github/workflows/*`, grep for runner invocations (`playwright test`, `vitest`, `pytest`, `go test`). Flag non-blocking warning if scoped runner absent.
|
|
108
|
+
5. **Write suite-health.md** — narrative: per-layer counts, anti-pattern hit list, quarantine candidates, fix plans.
|
|
109
|
+
6. **Append to refactor-backlog.yml** — each actionable finding as a backlog item. `effort: quick` for single-file/config fixes, `standard` for multi-file or architectural.
|
|
110
|
+
7. **Emit YAML** — one fenced `suite_audit:` block. No prose outside.
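
The anti-pattern grep in step 2 can be sketched as a pure function over file contents. This is an illustration only: the two patterns are the grep targets from the list, translated to JavaScript regex syntax, while the `Hit` shape and the name `scanForAntiPatterns` are hypothetical.

```typescript
// Hypothetical types and names; only the two regex targets come from the list.
type AntiPatternKind = "hardcoded_wait" | "fragile_selector";

interface Hit {
  kind: AntiPatternKind;
  line: number; // 1-based line number within the scanned text
  text: string; // trimmed source line
}

// JavaScript-regex equivalents of the grep targets (no `g` flag, so test() is stateless).
const HARDCODED_WAIT = /sleep|setTimeout|waitFor\([0-9]/;
const FRAGILE_SELECTOR = /\.btn-|#[a-z]*Btn|\.class-selector/;

function scanForAntiPatterns(source: string): Hit[] {
  const hits: Hit[] = [];
  source.split("\n").forEach((text, index) => {
    if (HARDCODED_WAIT.test(text)) {
      hits.push({ kind: "hardcoded_wait", line: index + 1, text: text.trim() });
    }
    if (FRAGILE_SELECTOR.test(text)) {
      hits.push({ kind: "fragile_selector", line: index + 1, text: text.trim() });
    }
  });
  return hits;
}
```

Each hit carries enough context (kind, line, text) to fill the `file`, `lines`, and `issue` fields of a finding, though that mapping is left to the caller.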
|
|
111
|
+
|
|
112
|
+
## Flake-Resistant Rules (Author Mode)
|
|
113
|
+
|
|
114
|
+
Baked rules. Author mode follows all. Violations = rework.
|
|
115
|
+
|
|
116
|
+
1. **Auto-wait only** — no `sleep`, no `setTimeout`, no hardcoded `waitFor(5000)`. Playwright locators auto-wait; use `page.getByTestId('x').click()` and let it wait.
|
|
117
|
+
2. **Page fixtures** — one `page` per test via Playwright fixtures. No shared page across tests. No module-scoped browser state leaking between specs.
|
|
118
|
+
3. **data-testid locators** — prefer `page.getByTestId('submit')` over CSS (`.btn-primary`) or text (`getByText('Submit')`). Text locators only when copy is stable + unambiguous.
|
|
119
|
+
4. **Seeded state per test** — each test seeds its own DB/fixture state. No ordering dependencies. No "run in order" suites. Teardown restores clean slate.
|
|
120
|
+
5. **Trace on failure** — Playwright config: `trace: 'retain-on-failure'`. Artifacts captured automatically. No "retry until pass" masking flakes.
|
|
121
|
+
6. **Retry once, then quarantine** — if a test fails twice in CI, mark `test.fixme('flaky: {reason}, fix in {ticket}')` with comment. Note in `.forge/testing/suite-health.md` with fix plan. Never infinite-retry.
|
|
122
|
+
|
|
123
|
+
## Success Criteria
|
|
124
|
+
|
|
125
|
+
### Author mode
|
|
126
|
+
- [ ] Mode respected — no audit output, no YAML emission
|
|
127
|
+
- [ ] Tool boundaries respected — writes only inside project test dirs
|
|
128
|
+
- [ ] Flake rules followed — no `sleep`, no hardcoded waits, data-testid used, page fixtures per test, trace config set
|
|
129
|
+
- [ ] Tests pass locally before commit
|
|
130
|
+
- [ ] Atomic commits per executor format
|
|
131
|
+
|
|
132
|
+
### Analyst mode
|
|
133
|
+
- [ ] Mode respected — zero writes to source, zero edits to test files
|
|
134
|
+
- [ ] One fenced `suite_audit:` YAML block emitted
|
|
135
|
+
- [ ] `.forge/testing/suite-health.md` written (narrative + quarantine list)
|
|
136
|
+
- [ ] Actionable findings appended to `.forge/refactor-backlog.yml` with required fields
|
|
137
|
+
- [ ] Schema fields exact — no inventions, no renames
|
|
138
|
+
|
|
139
|
+
## Anti-Patterns
|
|
140
|
+
|
|
141
|
+
| Anti-Pattern | Mode | Why it's wrong |
|
|
142
|
+
|--------------|------|----------------|
|
|
143
|
+
| Writing source during audit | analyst | Read-only gate; fixes go through backlog + quick-tasking |
|
|
144
|
+
| `sleep(1000)` or hardcoded waits | author | Masks race, creates flake |
|
|
145
|
+
| Rubber-stamping suite as healthy | analyst | Defeats audit; real suites have findings |
|
|
146
|
+
| Inventing YAML fields | analyst | Breaks testing skill aggregation |
|
|
147
|
+
| Severity inflation / backlog spam | analyst | Degrades signal; reviewer bar applies |
|
|
148
|
+
| Auto-fixing analyst findings | analyst | Violates deferred decision: no auto-queue analyst → author |
|
|
149
|
+
| Shared page across tests | author | Leaks state; breaks test isolation |
|
|
150
|
+
| CSS-selector locators over data-testid | author | Fragile; breaks on style refactor |
|
|
151
|
+
| "Retry until pass" in CI | author | Hides flake; prefer quarantine |
|
|
152
|
+
| Self-switching modes mid-run | either | Mode is set at spawn; exit cleanly and let skill re-spawn |
|
|
@@ -17,9 +17,9 @@ Structured conversation: approach, trade-offs, decisions. Clarity, not artifacts
|
|
|
17
17
|
|
|
18
18
|
## Boundaries
|
|
19
19
|
|
|
20
|
-
- No plans
|
|
20
|
+
- No plans or code
|
|
21
|
+
- Writes only `context.md` (at convergence, before handoff)
|
|
21
22
|
- No phase/plan required
|
|
22
|
-
- Only output: conversation + decision summary next skill honors
|
|
23
23
|
|
|
24
24
|
## Pre-Planning Discussion
|
|
25
25
|
|
|
@@ -132,7 +132,7 @@ options:
|
|
|
132
132
|
description: "Topics we haven't covered yet."
|
|
133
133
|
```
|
|
134
134
|
|
|
135
|
-
Decisions
|
|
135
|
+
Decisions written to `.forge/context.md` at Phase Handoff (below).
|
|
136
136
|
|
|
137
137
|
## Post-Planning Discussion
|
|
138
138
|
|
|
@@ -223,8 +223,13 @@ Re-plan? Route to `planning` with summary.
|
|
|
223
223
|
|
|
224
224
|
After convergence:
|
|
225
225
|
|
|
226
|
-
1. **
|
|
226
|
+
1. **Write `context.md`** -- Create/update `.forge/context.md` from template (`.forge/templates/context.md`). Populate:
|
|
227
|
+
- **Locked Decisions**: all confirmed decisions from Steps 1-5
|
|
228
|
+
- **Deferred Ideas**: anything explicitly deferred
|
|
229
|
+
- **Discretion Areas**: topics left to agent judgment
|
|
230
|
+
- **Needs Resolution**: unresolved items (if any)
|
|
231
|
+
- If `context.md` already exists (post-planning discussion), update relevant sections + log amendments
|
|
227
232
|
2. **Update state** -- `current.status` = `planning` (`architecting` for Full) in milestone yml
|
|
228
233
|
3. **Recommend clear:**
|
|
229
234
|
|
|
230
|
-
*"Decisions
|
|
235
|
+
*"Decisions written to `.forge/context.md`. State updated. `/clear` then `/forge` to continue with {planning/architecting}."*
|
|
@@ -167,7 +167,7 @@ Phase transitions = clear boundaries. **Recommend `/clear`** after writing state
|
|
|
167
167
|
| Phase | Writes | Read By |
|
|
168
168
|
|-------|--------|---------|
|
|
169
169
|
| researching | `.forge/research/milestone-{id}.md` | discussing |
|
|
170
|
-
| discussing |
|
|
170
|
+
| discussing | `.forge/context.md` (locked decisions, deferrals, discretion) | planning |
|
|
171
171
|
| architecting | ADRs, data models, API contracts | planning |
|
|
172
172
|
| planning | `.forge/phases/m{M}-{N}-{name}/`, requirements.yml, roadmap.yml | executing |
|
|
173
173
|
| executing | Committed code, execution summary, state | verifying |
|
|
@@ -0,0 +1,122 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: testing
|
|
3
|
+
description: "Write e2e/integration tests OR audit existing suite health. Triggers: UI flows needing browser coverage, flaky suite cleanup, coverage-gap investigation, missing test infra scaffolding. Scope: e2e + integration only — unit tests stay with executing TDD flow. Playwright for web/TS e2e; stack-aware runner (Vitest/Jest/pytest/go test) for integration. Inherits verifying gate — does not run own verification."
|
|
4
|
+
---
|
|
5
|
+
|
|
6
|
+
# Testing
|
|
7
|
+
|
|
8
|
+
On-demand branch skill. Peer of `designing`, `securing`, `debugging`. Not in core pipeline — invoked when testing work needed.
|
|
9
|
+
|
|
10
|
+
**Scope boundary:** e2e + integration layers only. Unit tests belong to `executing` TDD flow. No overlap.
|
|
11
|
+
|
|
12
|
+
## Step 1: Determine Mode
|
|
13
|
+
|
|
14
|
+
Three modes:
|
|
15
|
+
|
|
16
|
+
- **analyst** — audit existing suite. Spawns tester (analyst). Output: `suite-health.md` + refactor-backlog appends. **Brownfield default.**
|
|
17
|
+
- **author** — write new tests (e2e or integration). Spawns tester (author). Output: committed test files.
|
|
18
|
+
- **ci-check** — grep CI workflows for test runner invocations. No agent spawn — skill runs directly.
|
|
19
|
+
|
|
20
|
+
### Intent Inference
|
|
21
|
+
|
|
22
|
+
| User says | Mode |
|
|
23
|
+
|-----------|------|
|
|
24
|
+
| "audit tests" / "suite health" / "find flakes" / "test quality" | analyst |
|
|
25
|
+
| "write e2e" / "add tests for X" / "scaffold Playwright" / "integration tests" | author |
|
|
26
|
+
| "does CI run tests" / "CI integration" / "test pipeline" | ci-check |
|
|
27
|
+
|
|
28
|
+
Ambiguous → ask. Brownfield project + no mode specified → default analyst.
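
One way to read the inference table and the fallback rule, as a hedged sketch. The phrases come from the table; substring matching, the `inferMode` name, and returning `"ask"` for a non-brownfield project with no match are assumptions.

```typescript
type Mode = "analyst" | "author" | "ci-check";
type Inference = Mode | "ask";

// Phrases copied from the Intent Inference table (lowercased).
// Plain substring matching is an assumption of this sketch.
const PHRASES: Array<[Mode, string[]]> = [
  ["analyst", ["audit tests", "suite health", "find flakes", "test quality"]],
  ["author", ["write e2e", "add tests for", "scaffold playwright", "integration tests"]],
  ["ci-check", ["does ci run tests", "ci integration", "test pipeline"]],
];

// Hypothetical helper: maps a user request to a mode, or "ask" when unclear.
function inferMode(request: string, brownfield: boolean): Inference {
  const text = request.toLowerCase();
  const matched = PHRASES
    .filter(([, phrases]) => phrases.some((phrase) => text.includes(phrase)))
    .map(([mode]) => mode);
  const [only] = matched;
  if (matched.length === 1 && only !== undefined) return only;
  if (matched.length > 1) return "ask"; // ambiguous: several modes matched
  return brownfield ? "analyst" : "ask"; // no mode specified
}
```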
|
|
29
|
+
|
|
30
|
+
## Step 2: Read Project Context
|
|
31
|
+
|
|
32
|
+
```
|
|
33
|
+
Read: .forge/project.yml → stack, framework, test runner, verification commands
|
|
34
|
+
Read: .forge/state/milestone-{id}.yml → current milestone
|
|
35
|
+
Read: .forge/testing/suite-health.md → prior audit findings (if exists)
|
|
36
|
+
Read: .github/workflows/* → CI config (ci-check mode, analyst CI sub-check)
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## Step 3: Mode Flows
|
|
40
|
+
|
|
41
|
+
### Analyst Mode
|
|
42
|
+
|
|
43
|
+
1. Spawn tester agent via Agent tool, mode = `analyst`:
|
|
44
|
+
- Pass: milestone id + name, scope paths, stack summary from project.yml
|
|
45
|
+
- Agent scans test files, runs suite with traces/retries, greps anti-patterns
|
|
46
|
+
2. Agent writes `.forge/testing/suite-health.md`:
|
|
47
|
+
- **Coverage Gaps** — untested surfaces, missing layers
|
|
48
|
+
- **Flake Candidates** — tests that failed then passed, or that use `sleep`/hardcoded waits
|
|
49
|
+
- **Anti-Patterns** — non-`data-testid` selectors, unseeded state, missing fixtures, no trace config
|
|
50
|
+
- **Quarantine Queue** — tests recommended for `test.fixme()` with reason + fix plan
|
|
51
|
+
3. Agent appends actionable items to `.forge/refactor-backlog.yml`:
|
|
52
|
+
```yaml
|
|
53
|
+
- id: T-{NNN}
|
|
54
|
+
source: testing-analyst
|
|
55
|
+
date: YYYY-MM-DD
|
|
56
|
+
category: flake | coverage-gap | anti-pattern | quarantine
|
|
57
|
+
file: "path/to/test.spec.ts"
|
|
58
|
+
lines: "NN-NN"
|
|
59
|
+
description: "What's wrong"
|
|
60
|
+
effort: quick | standard
|
|
61
|
+
suggested_approach: "How to fix"
|
|
62
|
+
status: pending
|
|
63
|
+
```
|
|
64
|
+
ID numbering: read existing backlog, find max T-NNN, increment. No gaps.
|
|
65
|
+
4. Receive agent YAML output (`suite_audit:` block). Review for completeness.
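
The ID numbering rule from step 3 (find the max `T-NNN`, increment) could look roughly like this. It scans the backlog text with a regex rather than a YAML parser, which is a simplification; the function name `nextBacklogId` and the three-digit zero padding are assumptions based on the `T-{NNN}` placeholder.

```typescript
// Hypothetical helper: derive the next T-NNN id from the raw backlog text.
// Regex scanning instead of real YAML parsing is a simplification.
function nextBacklogId(backlogYaml: string): string {
  const pattern = /^\s*-?\s*id:\s*["']?T-(\d+)["']?\s*$/gm;
  let max = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(backlogYaml)) !== null) {
    max = Math.max(max, Number(match[1]));
  }
  // Zero-pad to three digits to mirror the T-{NNN} placeholder (assumed width).
  return `T-${String(max + 1).padStart(3, "0")}`;
}
```

Ids with other prefixes are ignored, so backlog items from other sources do not affect the testing sequence.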
|
|
66
|
+
|
|
67
|
+
### Author Mode
|
|
68
|
+
|
|
69
|
+
1. **Determine layer** — e2e vs integration. Ask if ambiguous.
|
|
70
|
+
2. **Select runner:**
|
|
71
|
+
- e2e + web/TS → **Playwright** (only option v1 — non-web e2e deferred)
|
|
72
|
+
- integration → read `project.yml` stack:
|
|
73
|
+
- `js` / `ts` → Vitest or Jest (match existing project runner)
|
|
74
|
+
- `python` → pytest
|
|
75
|
+
- `go` → go test
|
|
76
|
+
- `java` → JUnit
|
|
77
|
+
- If project has existing test runner, match it. Don't introduce a second.
|
|
78
|
+
3. Spawn tester agent via Agent tool, mode = `author`:
|
|
79
|
+
- Pass: milestone id + name, layer, runner, scope paths, stack summary
|
|
80
|
+
- Agent scaffolds tests following baked flake-resistant rules (see tester.md)
|
|
81
|
+
4. Agent commits atomically per executor conventions.
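
The runner selection in step 2 can be sketched as a small lookup. The stack keys and runner names follow the list; the `selectRunner` name, the `isWeb` flag, and the choice of `vitest` when a js/ts project has no existing runner are assumptions made for illustration.

```typescript
type Layer = "e2e" | "integration";
type Stack = "js" | "ts" | "python" | "go" | "java";

// Stack-to-runner defaults from the list. "vitest" for js/ts is an arbitrary
// pick for this sketch; the rule in the text is to match the project's runner.
const INTEGRATION_DEFAULTS: Record<Stack, string> = {
  js: "vitest",
  ts: "vitest",
  python: "pytest",
  go: "go test",
  java: "junit",
};

// Hypothetical helper: returns the runner name, or null when v1 has no option.
function selectRunner(
  layer: Layer,
  stack: Stack,
  isWeb: boolean,
  existingRunner?: string,
): string | null {
  if (layer === "e2e") {
    return isWeb ? "playwright" : null; // non-web e2e is deferred in v1
  }
  // An existing project runner always wins; never introduce a second one.
  return existingRunner ?? INTEGRATION_DEFAULTS[stack];
}
```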
|
|
82
|
+
|
|
83
|
+
### CI-Check Mode
|
|
84
|
+
|
|
85
|
+
No agent spawn. Skill runs directly.
|
|
86
|
+
|
|
87
|
+
1. **Detect CI provider:**
|
|
88
|
+
- `.github/workflows/*.yml` → GH Actions (supported v1)
|
|
89
|
+
- `.circleci/`, `.gitlab-ci.yml`, `Jenkinsfile` → not supported v1. Report: *"Non-GH CI detected, verify test integration manually."*
|
|
90
|
+
2. **For GH Actions** — grep workflow files for runner invocations:
|
|
91
|
+
```
|
|
92
|
+
playwright test | npx playwright | vitest | jest | pytest | go test
|
|
93
|
+
```
|
|
94
|
+
3. **Report:**
|
|
95
|
+
- Runner references found → OK, list which runners + which workflows
|
|
96
|
+
- E2e suite exists locally but no `playwright test` in workflows → **warn** (non-blocking): *"E2e suite not wired to CI."*
|
|
97
|
+
- No test runner referenced at all → **warn**: *"No test runner found in CI workflows."*
|
|
98
|
+
4. Warnings are non-blocking. Surface to user, don't gate.
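
A rough sketch of the GH Actions check in steps 2-3, operating on workflow file contents that have already been read. The patterns are the grep targets listed above and the warning strings are the ones quoted in the report step; the `Workflow` and `CiReport` shapes and the `ciCheck` name are hypothetical.

```typescript
// Hypothetical shapes for this sketch.
interface Workflow {
  path: string;
  content: string;
}
interface CiReport {
  found: Record<string, string[]>; // runner name -> workflow paths referencing it
  warnings: string[]; // non-blocking; surfaced to the user only
}

// Runner invocations from the grep list in step 2.
const RUNNER_PATTERNS: Array<[string, RegExp]> = [
  ["playwright", /playwright test|npx playwright/],
  ["vitest", /vitest/],
  ["jest", /jest/],
  ["pytest", /pytest/],
  ["go test", /go test/],
];

function ciCheck(workflows: Workflow[], hasLocalE2eSuite: boolean): CiReport {
  const found: Record<string, string[]> = {};
  for (const { path, content } of workflows) {
    for (const [runner, pattern] of RUNNER_PATTERNS) {
      if (pattern.test(content)) {
        const paths = found[runner] ?? [];
        paths.push(path);
        found[runner] = paths;
      }
    }
  }
  const warnings: string[] = [];
  if (hasLocalE2eSuite && found["playwright"] === undefined) {
    warnings.push("E2e suite not wired to CI.");
  }
  if (Object.keys(found).length === 0) {
    warnings.push("No test runner found in CI workflows.");
  }
  return { found, warnings };
}
```

The function only reports; nothing in it gates or fails a run, in line with step 4.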
|
|
99
|
+
|
|
100
|
+
## Step 4: Quarantine Convention
|
|
101
|
+
|
|
102
|
+
When analyst or author recommends quarantining a flaky test:
|
|
103
|
+
|
|
104
|
+
1. Mark with `test.fixme('reason', async () => { ... })` — Playwright pattern (Vitest has no `test.fixme`; use `test.skip` there)
|
|
105
|
+
2. Comment above: `// Quarantined YYYY-MM-DD — see .forge/testing/suite-health.md`
|
|
106
|
+
3. Add entry to suite-health.md Quarantine Queue: test path | reason | fix plan | owner
|
|
107
|
+
4. **Fix plan required** — quarantine without fix plan = anti-pattern. Temporary only.
|
|
108
|
+
|
|
109
|
+
## Step 5: Gate Inheritance
|
|
110
|
+
|
|
111
|
+
Testing skill does **NOT** run its own verification gate. Committed tests flow through `verifying` skill's existing `project.yml` verification commands (`npm test`, `npx playwright test`, etc.).
|
|
112
|
+
|
|
113
|
+
If verification commands need updating to include new test layers → surface as refactor item for user. Do not silently modify `project.yml`.
|
|
114
|
+
|
|
115
|
+
## Step 6: Output Checklist
|
|
116
|
+
|
|
117
|
+
- [ ] Mode determined + user-confirmed if ambiguous
|
|
118
|
+
- [ ] Agent spawned with correct mode flag (analyst or author) OR skill ran ci-check directly
|
|
119
|
+
- [ ] **Analyst:** suite-health.md written + refactor-backlog appended
|
|
120
|
+
- [ ] **Author:** tests committed, flake-resistant rules followed
|
|
121
|
+
- [ ] **CI-check:** warnings surfaced, non-blocking
|
|
122
|
+
- [ ] No independent verification invoked — defer to `verifying` skill
|