@harness-engineering/cli 1.4.0 → 1.6.1
- package/dist/agents/personas/architecture-enforcer.yaml +1 -0
- package/dist/agents/personas/code-reviewer.yaml +43 -0
- package/dist/agents/personas/codebase-health-analyst.yaml +32 -0
- package/dist/agents/personas/documentation-maintainer.yaml +2 -0
- package/dist/agents/personas/entropy-cleaner.yaml +3 -0
- package/dist/agents/personas/graph-maintainer.yaml +27 -0
- package/dist/agents/personas/parallel-coordinator.yaml +29 -0
- package/dist/agents/personas/performance-guardian.yaml +26 -0
- package/dist/agents/personas/security-reviewer.yaml +35 -0
- package/dist/agents/personas/task-executor.yaml +41 -0
- package/dist/agents/skills/README.md +8 -0
- package/dist/agents/skills/claude-code/add-harness-component/SKILL.md +10 -0
- package/dist/agents/skills/claude-code/align-documentation/SKILL.md +19 -0
- package/dist/agents/skills/claude-code/cleanup-dead-code/SKILL.md +19 -0
- package/dist/agents/skills/claude-code/detect-doc-drift/SKILL.md +8 -0
- package/dist/agents/skills/claude-code/enforce-architecture/SKILL.md +9 -0
- package/dist/agents/skills/claude-code/harness-architecture-advisor/SKILL.md +9 -0
- package/dist/agents/skills/claude-code/harness-autopilot/SKILL.md +494 -0
- package/dist/agents/skills/claude-code/harness-autopilot/skill.yaml +52 -0
- package/dist/agents/skills/claude-code/harness-code-review/SKILL.md +25 -0
- package/dist/agents/skills/claude-code/harness-debugging/SKILL.md +10 -0
- package/dist/agents/skills/claude-code/harness-dependency-health/SKILL.md +150 -0
- package/dist/agents/skills/claude-code/harness-dependency-health/skill.yaml +41 -0
- package/dist/agents/skills/claude-code/harness-execution/SKILL.md +19 -0
- package/dist/agents/skills/claude-code/harness-hotspot-detector/SKILL.md +135 -0
- package/dist/agents/skills/claude-code/harness-hotspot-detector/skill.yaml +44 -0
- package/dist/agents/skills/claude-code/harness-impact-analysis/SKILL.md +139 -0
- package/dist/agents/skills/claude-code/harness-impact-analysis/skill.yaml +44 -0
- package/dist/agents/skills/claude-code/harness-integrity/SKILL.md +20 -6
- package/dist/agents/skills/claude-code/harness-knowledge-mapper/SKILL.md +154 -0
- package/dist/agents/skills/claude-code/harness-knowledge-mapper/skill.yaml +49 -0
- package/dist/agents/skills/claude-code/harness-onboarding/SKILL.md +10 -0
- package/dist/agents/skills/claude-code/harness-parallel-agents/SKILL.md +9 -0
- package/dist/agents/skills/claude-code/harness-perf/SKILL.md +231 -0
- package/dist/agents/skills/claude-code/harness-perf/skill.yaml +47 -0
- package/dist/agents/skills/claude-code/harness-perf-tdd/SKILL.md +236 -0
- package/dist/agents/skills/claude-code/harness-perf-tdd/skill.yaml +47 -0
- package/dist/agents/skills/claude-code/harness-planning/SKILL.md +9 -0
- package/dist/agents/skills/claude-code/harness-pre-commit-review/SKILL.md +33 -2
- package/dist/agents/skills/claude-code/harness-refactoring/SKILL.md +19 -0
- package/dist/agents/skills/claude-code/harness-release-readiness/SKILL.md +657 -0
- package/dist/agents/skills/claude-code/harness-release-readiness/skill.yaml +57 -0
- package/dist/agents/skills/claude-code/harness-security-review/SKILL.md +206 -0
- package/dist/agents/skills/claude-code/harness-security-review/skill.yaml +50 -0
- package/dist/agents/skills/claude-code/harness-security-scan/SKILL.md +102 -0
- package/dist/agents/skills/claude-code/harness-security-scan/skill.yaml +41 -0
- package/dist/agents/skills/claude-code/harness-state-management/SKILL.md +22 -8
- package/dist/agents/skills/claude-code/harness-tdd/SKILL.md +10 -0
- package/dist/agents/skills/claude-code/harness-test-advisor/SKILL.md +131 -0
- package/dist/agents/skills/claude-code/harness-test-advisor/skill.yaml +44 -0
- package/dist/agents/skills/claude-code/initialize-harness-project/SKILL.md +10 -0
- package/dist/agents/skills/claude-code/validate-context-engineering/SKILL.md +9 -0
- package/dist/agents/skills/gemini-cli/harness-autopilot/SKILL.md +494 -0
- package/dist/agents/skills/gemini-cli/harness-autopilot/skill.yaml +52 -0
- package/dist/agents/skills/gemini-cli/harness-dependency-health/SKILL.md +150 -0
- package/dist/agents/skills/gemini-cli/harness-dependency-health/skill.yaml +41 -0
- package/dist/agents/skills/gemini-cli/harness-hotspot-detector/SKILL.md +135 -0
- package/dist/agents/skills/gemini-cli/harness-hotspot-detector/skill.yaml +44 -0
- package/dist/agents/skills/gemini-cli/harness-impact-analysis/SKILL.md +139 -0
- package/dist/agents/skills/gemini-cli/harness-impact-analysis/skill.yaml +44 -0
- package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/SKILL.md +154 -0
- package/dist/agents/skills/gemini-cli/harness-knowledge-mapper/skill.yaml +49 -0
- package/dist/agents/skills/gemini-cli/harness-perf/SKILL.md +231 -0
- package/dist/agents/skills/gemini-cli/harness-perf/skill.yaml +47 -0
- package/dist/agents/skills/gemini-cli/harness-perf-tdd/SKILL.md +236 -0
- package/dist/agents/skills/gemini-cli/harness-perf-tdd/skill.yaml +47 -0
- package/dist/agents/skills/gemini-cli/harness-release-readiness/SKILL.md +657 -0
- package/dist/agents/skills/gemini-cli/harness-release-readiness/skill.yaml +57 -0
- package/dist/agents/skills/gemini-cli/harness-security-review/skill.yaml +50 -0
- package/dist/agents/skills/gemini-cli/harness-security-scan/SKILL.md +102 -0
- package/dist/agents/skills/gemini-cli/harness-security-scan/skill.yaml +41 -0
- package/dist/agents/skills/gemini-cli/harness-test-advisor/SKILL.md +131 -0
- package/dist/agents/skills/gemini-cli/harness-test-advisor/skill.yaml +44 -0
- package/dist/agents/skills/tests/platform-parity.test.ts +131 -0
- package/dist/agents/skills/tests/schema.ts +2 -0
- package/dist/bin/harness.js +2 -2
- package/dist/{chunk-EFZOLZFB.js → chunk-ACMDUQJG.js} +4 -2
- package/dist/{chunk-C3J2HW4Y.js → chunk-O6NEKDYP.js} +2002 -487
- package/dist/{create-skill-4GKJZB5R.js → create-skill-NZDLMMR6.js} +1 -1
- package/dist/index.d.ts +265 -143
- package/dist/index.js +30 -4
- package/package.json +3 -2
|
@@ -0,0 +1,236 @@
|
|
|
1
|
+
# Harness Perf TDD
|
|
2
|
+
|
|
3
|
+
> Red-Green-Refactor with performance assertions. Every feature gets a correctness test AND a benchmark. No optimization without measurement.
|
|
4
|
+
|
|
5
|
+
## When to Use
|
|
6
|
+
|
|
7
|
+
- Implementing performance-critical features
|
|
8
|
+
- When the spec includes performance requirements (e.g., "must respond in < 100ms")
|
|
9
|
+
- When modifying `@perf-critical` annotated code
|
|
10
|
+
- When adding hot-path logic (parsers, serializers, query resolvers, middleware)
|
|
11
|
+
- NOT for non-performance-sensitive code (use harness-tdd instead)
|
|
12
|
+
- NOT for refactoring existing code that already has benchmarks (use harness-refactoring + harness-perf)
|
|
13
|
+
|
|
14
|
+
## Process
|
|
15
|
+
|
|
16
|
+
### Iron Law
|
|
17
|
+
|
|
18
|
+
**No production code exists without both a failing test AND a failing benchmark that demanded its creation.**
|
|
19
|
+
|
|
20
|
+
If you find yourself writing production code before both the test and the benchmark exist, STOP. Write the test. Write the benchmark. Then implement.
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
### Phase 1: RED — Write Failing Test + Benchmark
|
|
25
|
+
|
|
26
|
+
1. **Write the correctness test** following the same process as harness-tdd Phase 1 (RED):
|
|
27
|
+
- Identify the smallest behavior to test
|
|
28
|
+
- Write ONE minimal test with a clear assertion
|
|
29
|
+
- Follow the project's test conventions
|
|
30
|
+
|
|
31
|
+
2. **Write a `.bench.ts` benchmark file** alongside the test file:
|
|
32
|
+
- Co-locate with source: `handler.ts` -> `handler.bench.ts`
|
|
33
|
+
- Use Vitest bench syntax for benchmark definitions
|
|
34
|
+
- Set a performance assertion if the spec includes one
|
|
35
|
+
|
|
36
|
+
```typescript
import { bench, describe } from 'vitest';
import { processData } from './handler';

// Hypothetical fixtures: the input shape is an assumption, use realistic data for your module.
const smallInput = ['a', 'b', 'c'];
const largeInput = Array.from({ length: 100_000 }, (_, i) => `item-${i}`);

describe('processData benchmarks', () => {
  bench('processData with small input', () => {
    processData(smallInput);
  });

  bench('processData with large input', () => {
    processData(largeInput);
  });
});
```
|
|
50
|
+
|
|
51
|
+
3. **Run the test** — observe failure. The function is not implemented yet, so the test should fail with "not defined" or "not a function."
|
|
52
|
+
|
|
53
|
+
4. **Run the benchmark** — observe failure or no baseline. This establishes that the benchmark exists and will track performance once the implementation lands.
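The Vitest example in step 2 records timings but does not show what a performance assertion could look like. The following is a minimal sketch, assuming two hypothetical helpers, `medianDurationMs` and `assertWithinBudget`; neither is part of Vitest or the harness CLI. The clock is injectable so the helper can itself be tested deterministically.

```typescript
// Hypothetical helpers for illustration: not part of Vitest or the harness CLI.
type Clock = () => number;

/** Runs `fn` `iterations` times and returns the median duration in milliseconds. */
function medianDurationMs(
  fn: () => void,
  iterations: number,
  now: Clock = () => performance.now(),
): number {
  if (iterations < 1) throw new Error('iterations must be >= 1');
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = now();
    fn();
    samples.push(now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  // Odd count: middle sample. Even count: mean of the two middle samples.
  return samples.length % 2 === 1
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}

/** Throws when the measured median exceeds the budget taken from the spec. */
function assertWithinBudget(label: string, medianMs: number, budgetMs: number): void {
  if (medianMs > budgetMs) {
    throw new Error(`${label}: median ${medianMs}ms exceeds budget ${budgetMs}ms`);
  }
}
```

The median is used rather than the mean because a single slow run (garbage collection, a cold cache) distorts it less.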
|
|
54
|
+
|
|
55
|
+
---
|
|
56
|
+
|
|
57
|
+
### Phase 2: GREEN — Pass Test and Benchmark
|
|
58
|
+
|
|
59
|
+
1. **Write the minimum implementation** to make the correctness test pass. Do not optimize yet. The goal is correctness first.
|
|
60
|
+
|
|
61
|
+
2. **Run the test** — observe pass. If it fails, fix the implementation until it passes.
|
|
62
|
+
|
|
63
|
+
3. **Run the benchmark** — capture initial results. This is the first measurement. Note:
|
|
64
|
+
- If a performance assertion exists in the spec, verify it passes
|
|
65
|
+
- If no assertion exists, record the result as a baseline reference
|
|
66
|
+
- Do not optimize at this stage unless the assertion fails
|
|
67
|
+
|
|
68
|
+
4. **If the performance assertion fails,** decide which of two cases applies:
|
|
69
|
+
- The implementation approach is fundamentally wrong (e.g., O(n^2) when O(n) is needed) — revise the algorithm
|
|
70
|
+
- The assertion is too aggressive for a first pass — note it and defer to REFACTOR phase
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
### Phase 3: REFACTOR — Optimize While Green
|
|
75
|
+
|
|
76
|
+
This phase is optional. Enter it when:
|
|
77
|
+
|
|
78
|
+
- The benchmark shows room for improvement against the performance requirement
|
|
79
|
+
- Profiling reveals an obvious bottleneck
|
|
80
|
+
- The code can be simplified while maintaining or improving performance
|
|
81
|
+
|
|
82
|
+
1. **Profile the implementation** if the benchmark result is far from the requirement. Use the profile, alongside the benchmark output, to identify the bottleneck.
|
|
83
|
+
|
|
84
|
+
2. **Refactor for performance** — consider:
|
|
85
|
+
- Algorithm improvements (sort, search, data structure choice)
|
|
86
|
+
- Caching or memoization for repeated computations
|
|
87
|
+
- Reducing allocations (object pooling, buffer reuse)
|
|
88
|
+
- Eliminating unnecessary work (early returns, lazy evaluation)
|
|
89
|
+
|
|
90
|
+
3. **After each change,** run both checks:
|
|
91
|
+
- **Test:** Still passing? If not, the refactor broke correctness. Revert.
|
|
92
|
+
- **Benchmark:** Improved? If not, the refactor was not effective. Consider reverting.
|
|
93
|
+
|
|
94
|
+
4. **Stop when** the benchmark meets the performance requirement, or when further optimization yields diminishing returns (< 1% improvement per change).
|
|
95
|
+
|
|
96
|
+
5. **Do not gold-plate.** If the requirement is "< 100ms" and you are at 40ms, stop. Move on.
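The Phase 3 rules above (revert when the test breaks, stop once the requirement is met, stop on gains below 1%) can be sketched as a small decision function. The names `RefactorStep` and `decideNextStep` are hypothetical, and treating any slowdown as an automatic revert is an assumption; the text only says to consider reverting.

```typescript
// Hypothetical sketch of the Phase 3 stop/revert rules; not a harness API.
type RefactorDecision = 'revert' | 'stop' | 'continue';

interface RefactorStep {
  testsPass: boolean;
  beforeMs: number; // benchmark result before the change
  afterMs: number; // benchmark result after the change
  requirementMs?: number; // performance requirement from the spec, if any
}

function decideNextStep(step: RefactorStep, minGain = 0.01): RefactorDecision {
  if (step.beforeMs <= 0) throw new Error('beforeMs must be positive');
  // Correctness wins: a refactor that breaks the test is reverted.
  if (!step.testsPass) return 'revert';
  // Assumption: a measurable slowdown is reverted rather than debated.
  if (step.afterMs > step.beforeMs) return 'revert';
  // Requirement met: do not gold-plate.
  if (step.requirementMs !== undefined && step.afterMs < step.requirementMs) {
    return 'stop';
  }
  // Diminishing returns: relative gain below the threshold (1% by default).
  const gain = (step.beforeMs - step.afterMs) / step.beforeMs;
  return gain < minGain ? 'stop' : 'continue';
}
```

With the numbers from the hot-path example later in this file (12ms before, 3.8ms after, requirement under 5ms), the function returns `'stop'`.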
|
|
97
|
+
|
|
98
|
+
---
|
|
99
|
+
|
|
100
|
+
### Phase 4: VALIDATE — Harness Checks
|
|
101
|
+
|
|
102
|
+
1. **Run `harness check-perf`** to verify no Tier 1 or Tier 2 violations were introduced by the implementation:
|
|
103
|
+
- Cyclomatic complexity within thresholds
|
|
104
|
+
- Coupling metrics acceptable
|
|
105
|
+
- No benchmark regressions in other modules
|
|
106
|
+
|
|
107
|
+
2. **Run `harness validate`** to verify overall project health:
|
|
108
|
+
- All tests pass
|
|
109
|
+
- Linter clean
|
|
110
|
+
- Type checks pass
|
|
111
|
+
|
|
112
|
+
3. **Update baselines** if this is a new benchmark:
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
harness perf baselines update
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
This persists the current benchmark results so future runs can detect regressions.
|
|
119
|
+
|
|
120
|
+
4. **Commit with a descriptive message** that mentions both the feature and its performance characteristics:
|
|
121
|
+
```
|
|
122
|
+
feat(parser): add streaming JSON parser (<50ms for 1MB payloads)
|
|
123
|
+
```
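To make the purpose of persisted baselines concrete, here is a minimal sketch of how a later run might be compared against them. The baseline shape (benchmark name mapped to a duration in milliseconds), the function name `findRegressions`, and the 5% tolerance are all assumptions; the actual format written by `harness perf baselines update` is not shown in this diff.

```typescript
// Assumed shape: benchmark name -> duration in ms. The real baseline format
// used by the harness CLI is not shown here.
type Baselines = Record<string, number>;

interface Regression {
  name: string;
  baselineMs: number;
  currentMs: number;
  ratio: number; // currentMs / baselineMs
}

function findRegressions(
  baseline: Baselines,
  current: Baselines,
  tolerance = 0.05,
): Regression[] {
  const regressions: Regression[] = [];
  for (const name of Object.keys(current)) {
    const baselineMs = baseline[name];
    const currentMs = current[name];
    // A new benchmark has no baseline yet, so there is nothing to compare.
    if (baselineMs === undefined || baselineMs <= 0) continue;
    const ratio = currentMs / baselineMs;
    if (ratio > 1 + tolerance) {
      regressions.push({ name, baselineMs, currentMs, ratio });
    }
  }
  return regressions;
}
```

A benchmark without a stored baseline is silently skipped in this sketch, which is why persisting baselines for new benchmarks matters: until then, regressions in them cannot be detected.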
|
|
124
|
+
|
|
125
|
+
---
|
|
126
|
+
|
|
127
|
+
## Benchmark File Convention
|
|
128
|
+
|
|
129
|
+
Benchmark files are co-located with their source files, using the `.bench.ts` extension:
|
|
130
|
+
|
|
131
|
+
| Source File                   | Benchmark File                      |
| ----------------------------- | ----------------------------------- |
| `src/parser/handler.ts`       | `src/parser/handler.bench.ts`       |
| `src/api/resolver.ts`         | `src/api/resolver.bench.ts`         |
| `packages/core/src/engine.ts` | `packages/core/src/engine.bench.ts` |
|
|
136
|
+
|
|
137
|
+
Each benchmark file should:
|
|
138
|
+
|
|
139
|
+
- Import only from the module under test
|
|
140
|
+
- Define benchmarks in a `describe` block named after the module
|
|
141
|
+
- Include both small-input and large-input cases when applicable
|
|
142
|
+
- Use realistic data (not empty objects or trivial inputs)
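The co-location rule in the table above is mechanical enough to express as a function. This is an illustrative sketch only; `benchPathFor` is a hypothetical name, and it handles plain `.ts` sources (not `.tsx`), which is an assumption based on the examples shown.

```typescript
/** Maps e.g. `src/parser/handler.ts` to `src/parser/handler.bench.ts`. */
function benchPathFor(sourcePath: string): string {
  if (/\.(bench|test)\.ts$/.test(sourcePath)) {
    throw new Error(`${sourcePath} is already a test or benchmark file`);
  }
  if (!/\.ts$/.test(sourcePath)) {
    throw new Error(`${sourcePath} is not a .ts source file`);
  }
  // Only the final extension changes, so the benchmark stays next to its source.
  return sourcePath.replace(/\.ts$/, '.bench.ts');
}
```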
|
|
143
|
+
|
|
144
|
+
---
|
|
145
|
+
|
|
146
|
+
## Harness Integration
|
|
147
|
+
|
|
148
|
+
- **`harness check-perf`** — Run after implementation to check for violations
|
|
149
|
+
- **`harness perf bench`** — Run benchmarks in isolation
|
|
150
|
+
- **`harness perf baselines update`** — Persist benchmark results as new baselines
|
|
151
|
+
- **`harness validate`** — Full project health check
|
|
152
|
+
- **`harness perf critical-paths`** — View critical path set to understand which benchmarks have stricter thresholds
|
|
153
|
+
|
|
154
|
+
## Success Criteria
|
|
155
|
+
|
|
156
|
+
- Every new function has both a test file (`.test.ts`) and a bench file (`.bench.ts`)
|
|
157
|
+
- Benchmarks run without errors
|
|
158
|
+
- No Tier 1 performance violations after implementation
|
|
159
|
+
- Baselines are updated for new benchmarks
|
|
160
|
+
- Commit message includes performance context when relevant
|
|
161
|
+
|
|
162
|
+
## Examples
|
|
163
|
+
|
|
164
|
+
### Example: Implementing a Performance-Critical Parser
|
|
165
|
+
|
|
166
|
+
**Phase 1: RED**
|
|
167
|
+
|
|
168
|
+
```typescript
// src/parser/json-stream.test.ts
import { expect, it } from 'vitest';
import { parseStream } from './json-stream';

// largeMbPayload and expectedOutput are fixtures assumed to be defined elsewhere.
it('parses a 1MB JSON payload', () => {
  const result = parseStream(largeMbPayload);
  expect(result).toEqual(expectedOutput);
});

// src/parser/json-stream.bench.ts
import { bench } from 'vitest';
import { parseStream } from './json-stream';

bench('parseStream 1MB', () => {
  parseStream(largeMbPayload);
});
```
|
|
180
|
+
|
|
181
|
+
Run test: FAIL (parseStream not defined). Run benchmark: FAIL (no implementation).
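The example uses `largeMbPayload` and `expectedOutput` without defining them. One way such fixtures could be built is sketched below; the record shape and the helper name `buildJsonFixture` are hypothetical, chosen only to produce realistic, non-trivial data of roughly 1MB.

```typescript
// Hypothetical fixture builder; the record shape is an illustration only.
interface Row {
  id: number;
  name: string;
  tags: string[];
}

interface Fixture {
  payload: string; // serialized JSON handed to the parser
  expected: Row[]; // the value the parser should return
}

function buildJsonFixture(minChars: number): Fixture {
  const expected: Row[] = [];
  let payload = '[]';
  // Grow in batches of 1000 rows so the array is re-serialized only a few times.
  while (payload.length < minChars) {
    for (let i = 0; i < 1000; i++) {
      const id = expected.length;
      expected.push({ id, name: `user-${id}`, tags: ['alpha', 'beta'] });
    }
    payload = JSON.stringify(expected);
  }
  return { payload, expected };
}

// The payload is ASCII-only, so its length in characters equals its size in bytes.
const { payload: largeMbPayload, expected: expectedOutput } =
  buildJsonFixture(1024 * 1024);
```

Building the expected value and the payload from the same rows keeps the correctness test and the benchmark on identical data.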
|
|
182
|
+
|
|
183
|
+
**Phase 2: GREEN**
|
|
184
|
+
|
|
185
|
+
```typescript
// src/parser/json-stream.ts
export function parseStream(input: string): ParsedResult {
  return JSON.parse(input); // simplest correct implementation
}
```
|
|
191
|
+
|
|
192
|
+
Run test: PASS. Run benchmark: 38ms average (meets <50ms requirement).
|
|
193
|
+
|
|
194
|
+
**Phase 3: REFACTOR** — skipped (38ms already meets 50ms target).
|
|
195
|
+
|
|
196
|
+
**Phase 4: VALIDATE**
|
|
197
|
+
|
|
198
|
+
```
|
|
199
|
+
harness check-perf — no violations
|
|
200
|
+
harness validate — passes
|
|
201
|
+
harness perf baselines update — baseline saved
|
|
202
|
+
git commit -m "feat(parser): add streaming JSON parser (<50ms for 1MB payloads)"
|
|
203
|
+
```
|
|
204
|
+
|
|
205
|
+
### Example: Optimizing an Existing Hot Path
|
|
206
|
+
|
|
207
|
+
**Phase 1: RED and Phase 2: GREEN** are already complete: the test and benchmark exist from the initial implementation.
|
|
208
|
+
|
|
209
|
+
**Phase 3: REFACTOR**
|
|
210
|
+
|
|
211
|
+
```
|
|
212
|
+
Before: resolveImports 12ms (requirement: <5ms)
|
|
213
|
+
Change: switch from recursive descent to iterative with stack
|
|
214
|
+
After: resolveImports 3.8ms
|
|
215
|
+
Test: still passing
|
|
216
|
+
```
|
|
217
|
+
|
|
218
|
+
**Phase 4: VALIDATE**
|
|
219
|
+
|
|
220
|
+
```
|
|
221
|
+
harness check-perf — complexity reduced from 12 to 8 (improvement)
|
|
222
|
+
harness perf baselines update — new baseline saved
|
|
223
|
+
```
|
|
224
|
+
|
|
225
|
+
## Gates
|
|
226
|
+
|
|
227
|
+
- **No code before test AND benchmark.** Both must exist before implementation begins.
|
|
228
|
+
- **No optimization without measurement.** Run the benchmark before and after refactoring. Gut feelings are not measurements.
|
|
229
|
+
- **No skipping VALIDATE.** `harness check-perf` and `harness validate` must pass after every cycle.
|
|
230
|
+
- **No committing without updated baselines.** New benchmarks must have baselines persisted.
|
|
231
|
+
|
|
232
|
+
## Escalation
|
|
233
|
+
|
|
234
|
+
- **When the performance requirement cannot be met:** Report the best achieved result and propose either relaxing the requirement or redesigning the approach. Include benchmark data.
|
|
235
|
+
- **When benchmarks are flaky:** Increase iteration count, add warmup, or isolate the benchmark from I/O. Report the variance so the team can decide on an acceptable noise margin.
|
|
236
|
+
- **When the test and benchmark have conflicting needs:** Correctness always wins. If a correct implementation cannot meet the performance requirement, escalate to the team for a design decision.
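For the flaky-benchmark case, "report the variance" can be made concrete with a few summary statistics. This is a minimal sketch; `summarizeNoise` and `NoiseReport` are hypothetical names, and what counts as an acceptable coefficient of variation is left to the team.

```typescript
// Hypothetical helper for reporting benchmark noise; not a harness API.
interface NoiseReport {
  meanMs: number;
  stdDevMs: number;
  coefficientOfVariation: number; // stdDevMs / meanMs, unitless
}

function summarizeNoise(samplesMs: number[]): NoiseReport {
  if (samplesMs.length < 2) throw new Error('need at least two samples');
  const meanMs = samplesMs.reduce((sum, s) => sum + s, 0) / samplesMs.length;
  // Sample variance (n - 1 denominator): the runs are a sample of possible timings.
  const variance =
    samplesMs.reduce((sum, s) => sum + (s - meanMs) * (s - meanMs), 0) /
    (samplesMs.length - 1);
  const stdDevMs = Math.sqrt(variance);
  return {
    meanMs,
    stdDevMs,
    coefficientOfVariation: meanMs === 0 ? 0 : stdDevMs / meanMs,
  };
}
```

The coefficient of variation is convenient for a noise margin because it is relative: the same threshold can apply to a 2ms benchmark and a 200ms one.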
|
|
@@ -0,0 +1,47 @@
|
|
|
1
|
+
name: harness-perf-tdd
version: "1.0.0"
description: Performance-aware TDD with benchmark assertions in the red-green-refactor cycle
cognitive_mode: meticulous-implementer
triggers:
  - manual
platforms:
  - claude-code
  - gemini-cli
tools:
  - Bash
  - Read
  - Write
  - Edit
  - Glob
  - Grep
cli:
  command: harness skill run harness-perf-tdd
  args:
    - name: path
      description: Project root path
      required: false
mcp:
  tool: run_skill
  input:
    skill: harness-perf-tdd
    path: string
type: rigid
phases:
  - name: red
    description: Write failing test and benchmark assertion
    required: true
  - name: green
    description: Implement to pass test and benchmark
    required: true
  - name: refactor
    description: Optimize while keeping both green
    required: false
  - name: validate
    description: Run harness check-perf and harness validate
    required: true
state:
  persistent: false
  files: []
depends_on:
  - harness-tdd
  - harness-perf
|
|
@@ -60,6 +60,15 @@ When writing observable truths and acceptance criteria, use EARS (Easy Approach
|
|
|
60
60
|
|
|
61
61
|
**When to use EARS:** Apply these patterns when writing observable truths in Phase 1. Not every criterion needs an EARS pattern — use them when the requirement is behavioral (not structural). File existence checks ("src/types/user.ts exists with User interface") do not need EARS framing.
|
|
62
62
|
|
|
63
|
+
### Graph-Enhanced Context (when available)
|
|
64
|
+
|
|
65
|
+
When a knowledge graph exists at `.harness/graph/`, use graph queries for faster, more accurate context:
|
|
66
|
+
|
|
67
|
+
- `query_graph` — discover module dependencies for realistic task decomposition
|
|
68
|
+
- `get_impact` — estimate which modules a feature touches and their dependencies
|
|
69
|
+
|
|
70
|
+
This enables more accurate effort estimation and task sequencing. Fall back to file-based commands if no graph is available.
|
|
71
|
+
|
|
63
72
|
---
|
|
64
73
|
|
|
65
74
|
### Phase 2: DECOMPOSE — Map File Structure and Create Tasks
|
|
@@ -61,6 +61,12 @@ Mechanical Checks:
|
|
|
61
61
|
Action: Fix lint errors before committing.
|
|
62
62
|
```
|
|
63
63
|
|
|
64
|
+
### Graph Freshness Check
|
|
65
|
+
|
|
66
|
+
If a knowledge graph exists at `.harness/graph/` and code files have changed since the last scan, run `harness scan` before proceeding. The AI review phase uses graph-enhanced MCP tools (impact analysis, harness checks), which return stale results when the graph is outdated.
|
|
67
|
+
|
|
68
|
+
If no graph exists, skip this step — the tools fall back to non-graph behavior.
|
|
69
|
+
|
|
64
70
|
### Phase 2: Classify Changes
|
|
65
71
|
|
|
66
72
|
Determine whether AI review is needed based on what changed.
|
|
@@ -101,7 +107,29 @@ AI Review: SKIPPED (docs/config only)
|
|
|
101
107
|
|
|
102
108
|
If any staged file contains code changes, proceed to Phase 3.
|
|
103
109
|
|
|
104
|
-
### Phase 3:
|
|
110
|
+
### Phase 3: Security Scan
|
|
111
|
+
|
|
112
|
+
Run the built-in security scanner against staged files. This is a mechanical check — no AI judgment involved.
|
|
113
|
+
|
|
114
|
+
```bash
|
|
115
|
+
# Get list of staged source files
|
|
116
|
+
git diff --cached --name-only --diff-filter=d | grep -E '\.(ts|tsx|js|jsx|go|py)$'
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
Use the `run_security_scan` MCP tool or invoke the scanner on the staged files. Report any findings:
|
|
120
|
+
|
|
121
|
+
- **Error findings (blocking):** Hardcoded secrets, eval/injection, weak crypto — these block the commit just like lint failures.
|
|
122
|
+
- **Warning/info findings (advisory):** CORS wildcards, HTTP URLs, disabled TLS — reported but do not block.
|
|
123
|
+
|
|
124
|
+
Include security scan results in the report output:
|
|
125
|
+
|
|
126
|
+
```
|
|
127
|
+
Security Scan: [PASS/WARN/FAIL] (N errors, N warnings)
|
|
128
|
+
```
|
|
129
|
+
|
|
130
|
+
If no source files are staged, skip the security scan.
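The blocking and advisory rules above, together with the report line format, can be sketched as follows. The `Finding` shape and the function `summarizeScan` are hypothetical; the actual output of the `run_security_scan` tool is not shown in this diff. Leaving info-level findings out of the status is also an assumption.

```typescript
// Hypothetical finding shape; the real scanner output is not shown here.
type Severity = 'error' | 'warning' | 'info';
type ScanStatus = 'PASS' | 'WARN' | 'FAIL';

interface Finding {
  ruleId: string;
  severity: Severity;
  location: string; // e.g. "src/cors.ts:5"
  message: string;
}

interface ScanSummary {
  status: ScanStatus;
  blocksCommit: boolean;
  line: string; // report line in the format shown above
}

function plural(count: number, noun: string): string {
  return `${count} ${noun}${count === 1 ? '' : 's'}`;
}

function summarizeScan(findings: Finding[]): ScanSummary {
  const errors = findings.filter((f) => f.severity === 'error').length;
  const warnings = findings.filter((f) => f.severity === 'warning').length;
  // Errors block the commit; warnings are advisory.
  // Assumption: info findings do not change the status.
  const status: ScanStatus = errors > 0 ? 'FAIL' : warnings > 0 ? 'WARN' : 'PASS';
  return {
    status,
    blocksCommit: errors > 0,
    line: `Security Scan: ${status} (${plural(errors, 'error')}, ${plural(warnings, 'warning')})`,
  };
}
```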
|
|
131
|
+
|
|
132
|
+
### Phase 4: AI Review (Lightweight)
|
|
105
133
|
|
|
106
134
|
Perform a focused, lightweight review of staged changes. This is NOT a full code review — it catches obvious issues only.
|
|
107
135
|
|
|
@@ -116,7 +144,7 @@ git diff --cached
|
|
|
116
144
|
Review the staged diff for these high-signal issues only:
|
|
117
145
|
|
|
118
146
|
- **Obvious bugs:** null dereference, infinite loops, off-by-one errors, resource leaks
|
|
119
|
-
- **Security issues:** hardcoded secrets, SQL injection, path traversal, unvalidated input
|
|
147
|
+
- **Security issues:** hardcoded secrets, SQL injection, path traversal, unvalidated input (complements the mechanical scan with semantic analysis — e.g., tracing user input across function boundaries)
|
|
120
148
|
- **Broken imports:** references to files/modules that do not exist
|
|
121
149
|
- **Debug artifacts:** console.log, debugger statements, TODO/FIXME without issue reference
|
|
122
150
|
- **Type mismatches:** function called with wrong argument types (if visible in diff)
|
|
@@ -139,6 +167,7 @@ Mechanical Checks:
|
|
|
139
167
|
- Lint: PASS
|
|
140
168
|
- Types: PASS
|
|
141
169
|
- Tests: PASS (12/12)
|
|
170
|
+
- Security Scan: PASS (0 errors, 0 warnings)
|
|
142
171
|
|
|
143
172
|
AI Review: PASS (no issues found)
|
|
144
173
|
```
|
|
@@ -152,6 +181,8 @@ Mechanical Checks:
|
|
|
152
181
|
- Lint: PASS
|
|
153
182
|
- Types: PASS
|
|
154
183
|
- Tests: PASS (12/12)
|
|
184
|
+
- Security Scan: WARN (0 errors, 1 warning)
|
|
185
|
+
- [SEC-NET-001] src/cors.ts:5 — CORS wildcard origin
|
|
155
186
|
|
|
156
187
|
AI Review: 2 observations
|
|
157
188
|
1. [file:line] Possible null dereference — `user.email` accessed without null check after `findUser()` which can return null.
|
|
@@ -32,6 +32,15 @@ If tests are not green before you start, you are not refactoring — you are deb
|
|
|
32
32
|
|
|
33
33
|
4. **Plan the steps.** Break the refactoring into the smallest possible individual changes. Each step should be independently committable and verifiable. If you cannot describe a step in one sentence, it is too large.
|
|
34
34
|
|
|
35
|
+
### Graph-Enhanced Context (when available)
|
|
36
|
+
|
|
37
|
+
When a knowledge graph exists at `.harness/graph/`, use graph queries for faster, more accurate context:
|
|
38
|
+
|
|
39
|
+
- `get_impact` — precise impact analysis: "if I move this function, what breaks?"
|
|
40
|
+
- `query_graph` — find all transitive consumers, not just direct importers
|
|
41
|
+
|
|
42
|
+
Graph queries catch indirect consumers that grep misses. Fall back to file-based commands if no graph is available.
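To illustrate the difference between direct importers and transitive consumers, here is a small sketch over a hypothetical in-memory import graph. It is not the `query_graph` tool or its data model, only the underlying idea: invert the import edges and walk them breadth-first.

```typescript
// Hypothetical import graph: module -> modules it imports directly.
type ImportGraph = Record<string, string[]>;

function transitiveConsumers(graph: ImportGraph, target: string): string[] {
  // Invert the edges: module -> modules that import it directly.
  const importedBy: Record<string, string[]> = Object.create(null);
  for (const importer of Object.keys(graph)) {
    for (const dep of graph[importer]) {
      if (!importedBy[dep]) importedBy[dep] = [];
      importedBy[dep].push(importer);
    }
  }
  // Breadth-first walk over the inverted edges, starting from the target.
  const seen: Record<string, boolean> = Object.create(null);
  const queue: string[] = [target];
  const consumers: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const importer of importedBy[current] || []) {
      if (seen[importer] || importer === target) continue;
      seen[importer] = true;
      consumers.push(importer);
      queue.push(importer);
    }
  }
  return consumers.sort();
}
```

A text search for the import of `util.ts` would find only the modules that import it directly; the walk also reports modules that reach it through an intermediate import.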
|
|
43
|
+
|
|
35
44
|
### Phase 2: Execute — One Small Change at a Time
|
|
36
45
|
|
|
37
46
|
For EACH step in the plan:
|
|
@@ -62,6 +71,16 @@ For EACH step in the plan:
|
|
|
62
71
|
|
|
63
72
|
2. **Run `harness validate` and `harness check-deps` one final time.** Clean output.
|
|
64
73
|
|
|
74
|
+
### Graph Refresh
|
|
75
|
+
|
|
76
|
+
If a knowledge graph exists at `.harness/graph/`, refresh it after code changes to keep graph queries accurate:
|
|
77
|
+
|
|
78
|
+
```
|
|
79
|
+
harness scan [path]
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
Skipping this step means subsequent graph queries (impact analysis, dependency health, test advisor) may return stale results.
|
|
83
|
+
|
|
65
84
|
3. **Review the cumulative diff.** Does the final state match the intended improvement? Is the code genuinely better, or just different?
|
|
66
85
|
|
|
67
86
|
4. **If the refactoring introduced no improvement,** revert the entire sequence. Refactoring for its own sake is churn.
|