ace-test 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +7 -0
- data/.ace-defaults/nav/protocols/agent-sources/ace-test.yml +19 -0
- data/.ace-defaults/nav/protocols/guide-sources/ace-test.yml +19 -0
- data/.ace-defaults/nav/protocols/tmpl-sources/ace-test.yml +11 -0
- data/.ace-defaults/nav/protocols/wfi-sources/ace-test.yml +19 -0
- data/CHANGELOG.md +169 -0
- data/LICENSE +21 -0
- data/README.md +40 -0
- data/Rakefile +12 -0
- data/handbook/agents/mock.ag.md +164 -0
- data/handbook/agents/profile-tests.ag.md +132 -0
- data/handbook/agents/test.ag.md +99 -0
- data/handbook/guides/SUMMARY.md +95 -0
- data/handbook/guides/embedded-testing-guide.g.md +261 -0
- data/handbook/guides/mocking-patterns.g.md +464 -0
- data/handbook/guides/quick-reference.g.md +46 -0
- data/handbook/guides/test-driven-development-cycle/meta-documentation.md +26 -0
- data/handbook/guides/test-driven-development-cycle/ruby-application.md +18 -0
- data/handbook/guides/test-driven-development-cycle/ruby-gem.md +19 -0
- data/handbook/guides/test-driven-development-cycle/rust-cli.md +18 -0
- data/handbook/guides/test-driven-development-cycle/rust-wasm-zed.md +19 -0
- data/handbook/guides/test-driven-development-cycle/typescript-nuxt.md +18 -0
- data/handbook/guides/test-driven-development-cycle/typescript-vue.md +19 -0
- data/handbook/guides/test-layer-decision.g.md +261 -0
- data/handbook/guides/test-mocking-patterns.g.md +414 -0
- data/handbook/guides/test-organization.g.md +140 -0
- data/handbook/guides/test-performance.g.md +353 -0
- data/handbook/guides/test-responsibility-map.g.md +220 -0
- data/handbook/guides/test-review-checklist.g.md +231 -0
- data/handbook/guides/test-suite-health.g.md +337 -0
- data/handbook/guides/testable-code-patterns.g.md +315 -0
- data/handbook/guides/testing/ruby-rspec-config-examples.md +120 -0
- data/handbook/guides/testing/ruby-rspec.md +87 -0
- data/handbook/guides/testing/rust.md +52 -0
- data/handbook/guides/testing/test-maintenance.md +364 -0
- data/handbook/guides/testing/typescript-bun.md +47 -0
- data/handbook/guides/testing/vue-firebase-auth.md +546 -0
- data/handbook/guides/testing/vue-vitest.md +236 -0
- data/handbook/guides/testing-philosophy.g.md +82 -0
- data/handbook/guides/testing-strategy.g.md +151 -0
- data/handbook/guides/testing-tdd-cycle.g.md +146 -0
- data/handbook/guides/testing.g.md +170 -0
- data/handbook/skills/as-test-create-cases/SKILL.md +24 -0
- data/handbook/skills/as-test-fix/SKILL.md +26 -0
- data/handbook/skills/as-test-improve-coverage/SKILL.md +22 -0
- data/handbook/skills/as-test-optimize/SKILL.md +34 -0
- data/handbook/skills/as-test-performance-audit/SKILL.md +34 -0
- data/handbook/skills/as-test-plan/SKILL.md +34 -0
- data/handbook/skills/as-test-review/SKILL.md +34 -0
- data/handbook/skills/as-test-verify-suite/SKILL.md +45 -0
- data/handbook/templates/e2e-sandbox-checklist.template.md +289 -0
- data/handbook/templates/test-case.template.md +56 -0
- data/handbook/templates/test-performance-audit.template.md +132 -0
- data/handbook/templates/test-responsibility-map.template.md +92 -0
- data/handbook/templates/test-review-checklist.template.md +163 -0
- data/handbook/workflow-instructions/test/analyze-failures.wf.md +120 -0
- data/handbook/workflow-instructions/test/create-cases.wf.md +675 -0
- data/handbook/workflow-instructions/test/fix.wf.md +120 -0
- data/handbook/workflow-instructions/test/improve-coverage.wf.md +370 -0
- data/handbook/workflow-instructions/test/optimize.wf.md +368 -0
- data/handbook/workflow-instructions/test/performance-audit.wf.md +17 -0
- data/handbook/workflow-instructions/test/plan.wf.md +323 -0
- data/handbook/workflow-instructions/test/review.wf.md +16 -0
- data/handbook/workflow-instructions/test/verify-suite.wf.md +343 -0
- data/lib/ace/test/version.rb +7 -0
- data/lib/ace/test.rb +10 -0
- metadata +152 -0
@@ -0,0 +1,163 @@
---
doc-type: template
title: Test Review Checklist
purpose: Test PR review checklist
ace-docs:
  last-updated: 2026-02-19
  last-checked: 2026-03-21
---

# Test Review Checklist

**PR**: #{{number}}
**Package**: {{package}}
**Reviewer**: {{name}}
**Date**: {{date}}

## Quick Summary

- [ ] Tests added/modified: {{count}}
- [ ] Test type: Unit / Integration / E2E
- [ ] Performance verified: Yes / No / N/A

---

## 1. Layer Appropriateness

Is each test at the correct layer?

| Test | Current Layer | Correct? | Notes |
|------|---------------|----------|-------|
| {{test}} | Unit/Integration/E2E | Yes/No | {{notes}} |

**Checklist**:
- [ ] Unit tests have NO real I/O (subprocess, network, filesystem)
- [ ] Integration tests stub external dependencies
- [ ] E2E tests are in `test/e2e/TS-*/` format (scenario.yml + TC-*.tc.md)
- [ ] No flag permutation tests in E2E (should be unit)
- [ ] At most ONE CLI parity test per integration file

## 2. Stubbing Quality

Are mocks/stubs correctly implemented?

**Checklist**:
- [ ] Boundary methods stubbed (not just inner methods)
- [ ] `available?` checks stubbed if `run` is stubbed
- [ ] No zombie mocks (stub targets match actual code)
- [ ] Mock data is realistic (from snapshots or schemas)
- [ ] Composite helpers used where appropriate

**Red Flags**:
- [ ] Deep nesting (>3 levels) without composite helper
- [ ] Stubbing private methods
- [ ] Mock expectations without behavior assertions
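
The boundary and zombie-mock items above can be sketched in Minitest. This is a minimal illustration, not code from this gem: `Runner` and `Report` are hypothetical stand-ins for a real boundary class and its caller.

```ruby
require "minitest/mock"

# Hypothetical stand-ins for a real boundary class and its caller.
class Runner
  def self.available?
    system("which some-tool > /dev/null") # real I/O if not stubbed
  end

  def self.run(args)
    `some-tool #{args}` # real subprocess if not stubbed
  end
end

class Report
  def generate
    return "tool unavailable" unless Runner.available?
    Runner.run("--summary")
  end
end

# Stub BOTH boundary methods: stubbing only `run` leaves `available?`
# doing real I/O, and a stub pointed at a since-renamed method would be
# a zombie mock (stub target no longer matches actual code).
result = Runner.stub(:available?, true) do
  Runner.stub(:run, "summary: ok") do
    Report.new.generate
  end
end
# result == "summary: ok", with no subprocess spawned
```

Because `generate` calls both boundary methods, a reviewer can catch a missing `available?` stub simply by asking whether the test would still pass on a machine without the tool installed.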

## 3. Behavior vs Implementation

Do tests verify behavior, not implementation details?

**Checklist**:
- [ ] Tests assert on OUTPUT, not method calls
- [ ] Tests survive internal refactoring
- [ ] Mock expectations only for side-effect methods
- [ ] No testing of private method order

**Example Check**:
```ruby
# BAD: Tests implementation
mock.verify # "Was X called?"

# GOOD: Tests behavior
assert_equal expected, result.output
```

## 4. Performance

Will tests run fast enough?

**Checklist**:
- [ ] Profiled with `ace-test --profile 5`
- [ ] Unit tests <100ms each
- [ ] No `sleep` calls without stubbing
- [ ] No subprocess calls without stubbing
- [ ] Cache pre-warming if needed

**Performance Check**:
```bash
ace-test {{package}} --profile 10
# Verify no new tests >100ms
```
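
Unstubbed `sleep` is the most common slow-test culprit on the list above. A minimal sketch (the `Fetcher` class is hypothetical, not from this gem) of neutralizing a backoff delay by overriding `sleep` on the one instance under test:

```ruby
# Hypothetical class standing in for code under review that retries with backoff.
class Fetcher
  def fetch_with_retry(attempts: 3)
    attempts.times do |i|
      result = attempt_fetch
      return result if result
      sleep(2**i) # 1s + 2s + 4s of real waiting in an unstubbed test
    end
    nil
  end

  def attempt_fetch
    nil # stand-in: always fails, forcing every retry
  end
end

fetcher = Fetcher.new
slept = []

# Replace `sleep` on this one instance so the retry loop runs instantly;
# the recorded delays can still be asserted on.
fetcher.define_singleton_method(:sleep) { |n| slept << n }

fetcher.fetch_with_retry
# slept == [1, 2, 4] and the run finishes in microseconds
```

The assertion on `slept` keeps the backoff schedule under test even though no real time passes.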

## 5. Coverage Quality

Do tests actually catch bugs?

**Checklist**:
- [ ] Happy path tested
- [ ] Error cases tested
- [ ] Edge cases tested (nil, empty, boundaries)
- [ ] Test fails when code is broken (try breaking it)

**Negative Test Check**:
- [ ] At least one error scenario tested
- [ ] Invalid input handling verified
- [ ] Exception/error messages checked
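
A minimal negative-test sketch in Minitest (`Parser` is a hypothetical stand-in, not a class from this gem) covering all three items: an error scenario, invalid input, and the message itself:

```ruby
require "minitest/autorun"

# Hypothetical unit under review.
class Parser
  def parse(input)
    raise ArgumentError, "input must not be empty" if input.nil? || input.empty?
    input.split(",").map(&:strip)
  end
end

class ParserTest < Minitest::Test
  def test_parses_comma_separated_values
    assert_equal %w[a b], Parser.new.parse("a, b")
  end

  # Negative test: asserts the error class AND its message
  def test_raises_for_empty_input
    error = assert_raises(ArgumentError) { Parser.new.parse("") }
    assert_equal "input must not be empty", error.message
  end
end
```

Checking the message (not just the class) is what makes "try breaking it" meaningful: swapping the raise for a different error path fails the test.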

## 6. E2E Specific (if applicable)

For E2E tests in TS-format (`TC-*.tc.md`):

**Checklist**:
- [ ] PASS/FAIL assertions are explicit
- [ ] File paths discovered at runtime, not hardcoded
- [ ] Error test cases included (not just happy path)
- [ ] Exit codes verified for error scenarios
- [ ] Cleanup documented
- [ ] Prerequisites listed

## 7. Test Organization

Is the test well-structured?

**Checklist**:
- [ ] Test file in correct directory (atoms/molecules/organisms/e2e)
- [ ] Test name describes behavior (`test_returns_error_for_invalid_input`)
- [ ] Arrange-Act-Assert pattern followed
- [ ] No test interdependencies
- [ ] Fixtures in `test/fixtures/` if shared
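
The structure items above in one Minitest sketch (`PriceCalculator` is hypothetical, not from this gem): a behavior-describing name, Arrange-Act-Assert, and no state shared with other tests:

```ruby
require "minitest/autorun"

# Hypothetical unit under test.
class PriceCalculator
  def total(items, discount: 0)
    items.sum * (1 - discount)
  end
end

class PriceCalculatorTest < Minitest::Test
  # Name states the behavior, not the method signature
  def test_applies_discount_to_item_total
    # Arrange: every input built locally, no dependence on other tests
    calculator = PriceCalculator.new
    items = [100, 50]

    # Act: a single call to the unit under test
    result = calculator.total(items, discount: 0.1)

    # Assert: one clear expectation on the output
    assert_in_delta 135.0, result
  end
end
```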

---

## Verdict

- [ ] **Approve**: Tests are well-designed and performant
- [ ] **Request Changes**: Issues identified above
- [ ] **Needs Discussion**: Architectural concerns

**Comments**:

{{reviewer_comments}}

---

## Quick Reference

### Performance Thresholds

| Layer | Target | Warning | Critical |
|-------|--------|---------|----------|
| Unit (atoms) | <10ms | >50ms | >100ms |
| Unit (molecules) | <50ms | >100ms | >200ms |
| Integration | <500ms | >1s | >2s |

### Stub the Boundary Pattern

```ruby
# Always stub availability check if stubbing execution
Runner.stub(:available?, true) do
  Runner.stub(:run, result) do
    subject.process
  end
end
```
@@ -0,0 +1,120 @@
---
doc-type: workflow
title: Analyze Test Failures Workflow
purpose: analyze-test-failures workflow instruction
ace-docs:
  last-updated: 2026-02-24
  last-checked: 2026-03-21
---

# Analyze Test Failures Workflow

## Goal

Analyze failing automated tests and classify each failure before any fix is applied.

This workflow produces a decision report that answers:
- Is this failure caused by implementation code?
- Is this failure caused by test code/spec?
- Is this failure caused by test infrastructure/environment?

## Hard Rule

- Do not edit application code or test files in this workflow.
- Do not run formatting/autofix commands in this workflow.
- This workflow ends with an analysis report only.
- Do not ask the user where/how to fix during this workflow; decide from evidence.

## Prerequisites

- Failing tests have already been executed
- Failure output is available (logs, stack traces, failing test list)
- Project context can be loaded

## Project Context Loading

- Read and follow: `ace-bundle wfi://bundle`
- Check recent changes: `git log --oneline -10`

## Classification Categories

Use exactly one category per failing test:

1. `implementation-bug`
   - Product/runtime behavior is wrong
   - Test expectation is valid

2. `test-defect`
   - Test assertion/setup/fixture is stale or incorrect
   - Product behavior appears correct for current requirements

3. `test-infrastructure`
   - Environment/tooling/framework/configuration/isolation issue
   - Failure is not specific to business behavior

## Analysis Procedure

1. Collect failing tests
   - Identify failing file/test IDs from latest run output
   - Capture exact error signatures

2. Gather evidence per failure
   - Primary stacktrace line
   - Related test file and assertion context
   - Related implementation file/entrypoint context
   - Environment/tooling context (timeouts, missing deps, DB state, network/mocks)

3. Classify each failure
   - Assign one category (`implementation-bug`, `test-defect`, `test-infrastructure`)
   - Add confidence: `high`, `medium`, or `low`
   - Record one disconfirming check (what could prove this classification wrong)
   - If confidence is `medium` or `low`, run at least one additional diagnostic read/search before the final decision

4. Determine fix target
   - `implementation code`
   - `test code`
   - `test infrastructure`

5. Choose autonomous fix decision
   - Select a single primary fix action per failure
   - Provide concrete file targets in priority order
   - Define explicit no-touch boundaries
   - Do not emit option lists that require user selection
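
As an illustration only (this heuristic is not part of the workflow), steps 2-3 amount to mapping an error signature plus one judgment call onto the three categories; real classification also weighs the test file, implementation context, and recent diffs:

```ruby
# Hypothetical sketch of the classification decision in steps 2-3.
INFRA_PATTERNS = [
  /timeout/i, /connection refused/i, /cannot load such file/,
  /database .* does not exist/i, /EADDRINUSE/
].freeze

def classify(error_signature, expectation_valid:)
  # Infrastructure signatures are checked first: they fail tests without
  # saying anything about business behavior.
  return "test-infrastructure" if INFRA_PATTERNS.any? { |p| p.match?(error_signature) }

  if expectation_valid
    "implementation-bug" # valid expectation, wrong runtime behavior
  else
    "test-defect"        # stale assertion/setup/fixture
  end
end

classify("Net::ReadTimeout after 60s", expectation_valid: true)  # => "test-infrastructure"
classify("expected 3, got 4",          expectation_valid: true)  # => "implementation-bug"
classify("expects pre-2.0 format",     expectation_valid: false) # => "test-defect"
```

The `expectation_valid` flag is the human judgment the workflow asks for in step 3; everything else in the sketch is mechanical.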

## Required Output Contract

Produce this section before exiting:

```markdown
## Failure Analysis Report

| Failure | Category | Evidence | Fix Target | Fix Target Layer | Primary Candidate Files | Fallback Candidate Files | Do-Not-Touch Boundaries | Confidence | Disconfirming Check |
|---|---|---|---|---|---|---|---|---|---|
| path/to/test_file.rb:TestName | implementation-bug | stacktrace + behavior mismatch summary | implementation code | implementation | app/service.rb, app/model.rb | test/integration/foo_test.rb | test/e2e/** | high | run related tests after patch |
```

Then include:

```markdown
## Fix Decisions
- First item to fix: ...
- Chosen fix decision: ...
- Why this target first: ...

### Execution Plan Input
- Primary failure to fix first: ...
- Why first: ...
- Required verification commands: ...
- Expected pass criteria per command: ...
```

## Success Criteria

- Every failing test is classified
- Evidence is concrete and traceable
- Fix target is explicit per failure
- Fix target files are explicit per failure (primary + fallback)
- No-touch boundaries are explicit per failure
- A single autonomous chosen fix decision is present per failure
- A prioritized first failure is selected
- No code/test edits were made in this workflow