arey-pi 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +159 -0
- package/agents/README.md +313 -0
- package/agents/engineering-reviewer.md +78 -0
- package/agents/project-evaluator.md +136 -0
- package/agents/spec-author.md +82 -0
- package/agents/spec-syncer.md +88 -0
- package/agents/tdd-implementer.md +81 -0
- package/agents/tech-lead.md +92 -0
- package/package.json +48 -0
- package/prompts/assess-project.md +38 -0
- package/rules/README.md +57 -0
- package/rules/architecture/adrs.md +257 -0
- package/rules/architecture/architecture-memory.md +55 -0
- package/rules/assessment/project-readiness.md +224 -0
- package/rules/core/change-modes.md +63 -0
- package/rules/core/conflict-resolution.md +56 -0
- package/rules/core/definition-of-done.md +67 -0
- package/rules/core/principles.md +63 -0
- package/rules/engineering/engineering-quality.md +285 -0
- package/rules/engineering/quality-tooling.md +137 -0
- package/rules/engineering/rebuildability.md +49 -0
- package/rules/engineering/tdd.md +86 -0
- package/rules/engineering/test-quality.md +159 -0
- package/rules/specs/canonical-specs.md +62 -0
- package/rules/specs/database-specs.md +142 -0
- package/rules/specs/gherkin-authoring.md +121 -0
- package/rules/specs/language-style.md +106 -0
- package/rules/specs/spec-sync.md +70 -0
- package/rules/workflow/agent-workflows.md +70 -0
- package/rules/workflow/ai-harness.md +177 -0
- package/rules/workflow/incremental-commits.md +88 -0
- package/skills/project-readiness/SKILL.md +96 -0
@@ -0,0 +1,67 @@
# Definition of Done

## Core Rule

A change is done only when specs, tests, and code are aligned.

Completion is not based only on code compiling or tests passing. The durable project knowledge must also be correct.

## Required Conditions

A change is complete when:

- relevant Gherkin specs exist or are explicitly confirmed unaffected;
- production behaviour is covered by meaningful tests;
- test quality has been assessed with mutation testing, coverage, or explicit review appropriate to risk;
- TDD was followed for production behaviour;
- bug fixes include regression tests;
- tests pass, or any inability to run them is clearly documented;
- weak, shallow, or unvalidated generated tests are not accepted as sufficient evidence;
- architecture and code quality meet the senior engineering standard defined by Arey Pi;
- formatting, linting/static analysis, type checking where applicable, and relevant dynamic analysis have passed or are explicitly blocked with evidence;
- if the project lacks required quality tooling, tooling selection/configuration has been discussed with the user and captured as follow-up or implementation work;
- specs, tests, and code agree;
- architecture docs or high-quality ADRs are updated when durable decisions changed;
- DBML database specs exist and are precisely synchronised when the project has a database and the change touches schema, migrations, ORM models, or persistence contracts;
- glossary is updated when domain language changed;
- code changes are scoped and minimal for the intended behaviour;
- project-facing prose follows the configured language style, UK English by default;
- specs use semantic line breaks, and touched documentation preserves or improves semantic line breaks;
- residual risks are reported;
- incremental Conventional Commits are created when work spans meaningful steps.

## Not Done

A change is not done if:

- behaviour changed but Gherkin was not updated or justified unaffected;
- tests were skipped silently;
- production code was written without TDD evidence;
- failing tests remain unresolved without explicit blocker status;
- code contradicts canonical specs;
- database schema, migrations, ORM models, or persistence code drift from canonical DBML;
- implementation is correct but architecturally weak, brittle, overcomplicated, or low quality;
- formatter, linter/static analyser, type checker, or required dynamic analysis fails without explicit blocker status;
- the project lacks quality tooling and the gap has not been surfaced to the user;
- significant technical decisions exist only in chat, implementation comments, or low-value ADRs that do not explain context, options, tradeoffs, and consequences;
- project-facing prose mixes language styles without reason;
- specs or touched docs ignore semantic line break conventions;
- unrelated cleanup is mixed into the change without approval.

## Completion Report

Agents should close with:

```txt
Done summary:
- Behaviour/spec impact:
- Tests/TDD:
- Validation:
- Quality tooling:
- Spec sync:
- Architecture/code quality:
- Architecture/ADR/glossary:
- Database/DBML:
- Commits:
- Residual risks:
```

@@ -0,0 +1,63 @@
# Framework Principles

## Purpose

Arey Pi defines a software delivery model for projects that must remain understandable, testable, and rebuildable over time.

Its central premise is:

> Specs are durable. Tests are executable truth. Code is disposable.

Arey Pi optimises for preserving intent outside the current implementation so that code can be refactored, replaced, or fully rewritten without losing product or domain knowledge.

Rebuildability never lowers the quality bar. Arey Pi expects architecture and code to be designed and written at an exceptional senior engineering standard.

## Core Model

A project using Arey Pi is governed by three synchronised layers:

1. **Canonical specs** define intended behaviour and durable project knowledge.
2. **Tests** execute and verify the intended behaviour.
3. **Code** implements the behaviour and may be replaced.

The implementation is not the primary memory of the project. If important behaviour, constraints, or decisions only exist in code, the project is not fully captured.

## Durable Knowledge

Durable project knowledge belongs in:

- Gherkin feature specs;
- automated tests;
- architecture documents;
- Architecture Decision Records;
- domain glossary entries;
- explicit project rules and constraints.

Production code may contain useful local explanations, but it must not be the only place where product behaviour, business rules, architectural decisions, or domain vocabulary are preserved.

## Development Guarantees

Every completed change must preserve these guarantees:

1. **Canonical behaviour is represented in specs.**
2. **Production behaviour is covered by meaningful tests.**
3. **TDD is followed for production behaviour.**
4. **Architecture and code meet a high senior engineering standard.**
5. **Specs, tests, and code are synchronised before completion.**
6. **Durable decisions are persisted outside implementation code.**
7. **The resulting system remains rebuildable from durable knowledge.**

## Work Modes

Arey Pi supports two work modes:

- **Spec-Driven Mode** for non-trivial, ambiguous, product, domain, API, or architectural work.
- **Direct Change Mode** for small, local, obvious, or mechanical changes.

Direct Change Mode is a lighter path, not an escape hatch. It still requires TDD where production behaviour is involved and always requires final spec synchronisation.

## Agent Bias

Agents should prefer the lightest workflow that preserves the guarantees.

Do not add ceremony for simple tasks, but do not close work with missing tests, stale specs, undocumented decisions, or unresolved drift.

@@ -0,0 +1,285 @@
# Engineering Quality

## Purpose

Engineering quality is the standard that keeps Arey Pi from becoming a documentation and test exercise around mediocre software.

Specs, TDD, test quality, DBML, ADRs, quality tooling, and rebuildability all exist to support excellent engineering. They do not replace the need for excellent architecture and code.

Arey Pi's expectation is simple:

> Build as the most senior, careful, pragmatic software engineer on the team would build.

## Core Rule

Production code must be correct, well-tested, well-designed, maintainable, and consistent with the project's durable knowledge.

Rebuildability never lowers the quality bar. Code may be replaceable, but the current implementation must still be excellent for its context, risk, and importance.

Agents must not treat passing tests, generated code, or future rewriteability as excuses for weak design.

## Priority Order

When implementation tradeoffs arise, optimise in this order:

1. Correct behaviour according to canonical specs.
2. Meaningful tests that protect the behaviour.
3. Excellent architecture and code quality.
4. Precise synchronisation with durable knowledge.
5. Rebuildability from specs, tests, ADRs, DBML, and architecture docs.
6. Delivery speed.

Speed matters only when it does not compromise correctness, test quality, architecture, maintainability, security, operability, or validation tooling.

## Universal Design Standard

The following principles are not team style preferences. They are baseline expectations for well-built software.

Agents should apply them pragmatically, not dogmatically. A deliberate exception is acceptable only when the tradeoff is explicit and, if durable, documented in an ADR or architecture note.

## Simplicity

Prefer the simplest robust design that satisfies the known requirements.

Good simplicity removes accidental complexity without hiding essential domain complexity.

Agents should avoid:

- speculative abstractions;
- unnecessary layers;
- cleverness that obscures intent;
- generic frameworks for one concrete use case;
- premature optimisation;
- broad rewrites when a focused change is safer.

## Cohesion and Coupling

Code should have high cohesion and low unnecessary coupling.

Responsibilities should be grouped by reason to change. Dependencies should be explicit, directional, and justified.

Agents should avoid hidden coupling through global state, shared mutable data, implicit environment assumptions, temporal ordering, or undocumented side effects.

## SOLID as Engineering Heuristics

Use SOLID as practical design heuristics, not as ceremony.

- **Single Responsibility:** modules should have a clear reason to change.
- **Open/Closed:** design extension points where variation is real, not imagined.
- **Liskov Substitution:** implementations must honour the contracts they claim to satisfy.
- **Interface Segregation:** interfaces should be focused and not force irrelevant dependencies.
- **Dependency Inversion:** depend on stable abstractions when doing so reduces real coupling and improves testability.

Applying SOLID should make the system clearer. If it makes the design more abstract, indirect, or brittle without a real benefit, it is being misused.

## Clean Boundaries

Boundaries should make the system easier to understand, test, and change.

Where applicable:

- domain logic should not depend unnecessarily on infrastructure;
- I/O, frameworks, persistence, and external services should sit at the edges;
- core behaviour should be testable without heavy runtime setup;
- adapters should be replaceable;
- dependencies should point towards stable policy and domain concepts, not incidental mechanisms.

Clean architecture is not mandatory layering. It is disciplined dependency direction and clear separation of responsibilities.
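
As a rough illustration of the boundary guidance above, the following Python sketch keeps a pricing rule independent of where exchange rates come from. The names `RateSource`, `FixedRates`, and `price_in` are hypothetical and not part of Arey Pi; a real project would pick its own ports and adapters.

```python
from dataclasses import dataclass
from typing import Protocol


class RateSource(Protocol):
    """Port: the only thing the domain logic knows about exchange rates."""

    def rate(self, currency: str) -> float: ...


@dataclass(frozen=True)
class FixedRates:
    """In-memory adapter, handy for tests; another adapter could call a remote service."""

    rates: dict[str, float]

    def rate(self, currency: str) -> float:
        return self.rates[currency]


def price_in(currency: str, amount_gbp: float, source: RateSource) -> float:
    """Domain logic depends on the RateSource port, not on any infrastructure."""
    return round(amount_gbp * source.rate(currency), 2)
```

Because `price_in` only sees the port, it can be tested without network or database setup, and the adapter can be swapped without touching the rule.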

## Domain Modelling

Important domain concepts should be named and represented explicitly.

Agents should prefer code that:

- uses the language of the domain;
- makes invariants visible;
- makes invalid states hard to represent where the language allows it;
- keeps business rules close to the concepts they govern;
- avoids scattering domain rules across unrelated technical plumbing.

If durable domain knowledge is discovered while coding, it should be reflected in Gherkin, glossary, tests, or architecture docs as appropriate.
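
One way to make an invariant visible is a small value object that refuses to exist in an invalid state. The sketch below is illustrative only: `Percentage` and `apply_discount` are hypothetical names, and the 0 to 100 rule is an assumed example invariant rather than anything defined by Arey Pi.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """A discount percentage; the 0-100 invariant is checked once, at construction."""

    value: int

    def __post_init__(self) -> None:
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage must be between 0 and 100, got {self.value}")


def apply_discount(price_pence: int, discount: Percentage) -> int:
    """The business rule sits next to the concept it governs; integer pence avoids float drift."""
    return price_pence - (price_pence * discount.value) // 100
```

Callers of `apply_discount` never need to re-validate the percentage, because an out-of-range `Percentage` cannot be constructed.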

## Encapsulation and Information Hiding

Implementation details should be hidden behind stable, meaningful interfaces.

Encapsulation should protect invariants and reduce the cost of change. It should not be used to obscure behaviour, hide poor naming, or create unnecessary indirection.

## Explicit Contracts

Inputs, outputs, errors, side effects, and invariants should be explicit.

Use types, schemas, validation, assertions, preconditions, postconditions, or tests as appropriate for the language and risk.

Agents should avoid APIs where callers must guess:

- what values are valid;
- what errors can occur;
- whether data is mutated;
- whether operations are idempotent;
- what external effects happen.
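
A minimal sketch of a function whose contract answers each of the questions above in its signature and docstring. `basket_total` and `EmptyBasketError` are hypothetical names chosen for illustration.

```python
from typing import Sequence


class EmptyBasketError(ValueError):
    """Raised when a total is requested for a basket with no items."""


def basket_total(prices_pence: Sequence[int]) -> int:
    """Return the sum of the prices in pence.

    Valid input: a non-empty sequence of non-negative integers.
    Errors: EmptyBasketError for an empty basket, ValueError for a negative price.
    Mutation: none; the input sequence is only read.
    Idempotent: yes; there are no external effects.
    """
    if not prices_pence:
        raise EmptyBasketError("basket has no items")
    if any(price < 0 for price in prices_pence):
        raise ValueError("prices must be non-negative")
    return sum(prices_pence)
```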

## Error Handling

Error handling is part of design quality.

Good error handling:

- distinguishes expected failures from programming errors;
- preserves useful diagnostic context;
- avoids swallowing errors silently;
- avoids leaking secrets;
- provides actionable messages where user-facing;
- keeps recovery paths explicit;
- is covered by tests for important failure modes.
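
The sketch below shows one possible way to separate an expected failure from a programming error in Python. `InsufficientFunds`, `withdraw`, and `withdraw_message` are hypothetical names; the point is only that the expected failure is a distinct type carrying diagnostic context, while a broken precondition is left to fail loudly.

```python
class InsufficientFunds(Exception):
    """Expected failure: a normal outcome the caller is meant to handle."""

    def __init__(self, balance: int, requested: int) -> None:
        super().__init__(f"balance {balance} is less than requested {requested}")
        self.balance = balance
        self.requested = requested


def withdraw(balance: int, requested: int) -> int:
    """Return the new balance, or raise InsufficientFunds."""
    if requested <= 0:
        # Programming error: the caller broke the contract, so fail loudly.
        raise ValueError(f"requested must be positive, got {requested}")
    if requested > balance:
        raise InsufficientFunds(balance, requested)
    return balance - requested


def withdraw_message(balance: int, requested: int) -> str:
    """Explicit recovery path: only the expected failure is caught."""
    try:
        return f"New balance: {withdraw(balance, requested)}"
    except InsufficientFunds as error:
        return f"Cannot withdraw {error.requested}: only {error.balance} available"
```

The `ValueError` is deliberately not caught in `withdraw_message`, so a contract violation surfaces instead of being swallowed.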

## Security and Privacy

Security and privacy are default engineering responsibilities.

Agents should consider:

- least privilege;
- input validation;
- output encoding where relevant;
- authentication and authorisation boundaries;
- secret handling;
- safe logging;
- data minimisation;
- retention and deletion rules;
- dependency and supply-chain risk;
- tenant or user data isolation.

Security-sensitive tradeoffs must not be hidden in implementation details.

## Operability

Production-quality systems should be diagnosable and operable.

Where relevant, design should include:

- useful logs;
- metrics;
- traces;
- health checks;
- clear failure modes;
- migration and rollback considerations;
- performance characteristics that can be reasoned about.

Do not add observability noise blindly. Add the signals needed to understand and operate the system.

## Performance Awareness

Avoid premature micro-optimisation, but do not ignore obvious performance risks.

Agents should consider algorithmic complexity, N+1 queries, unbounded memory growth, excessive network calls, unnecessary serial work, lock contention, and expensive operations in hot paths.

Performance decisions that shape architecture or user-visible guarantees should be persisted in architecture docs or ADRs.

## Code Quality

Production code should be:

- consistently formatted by project tooling;
- free of lint/static-analysis violations;
- clear and readable;
- minimal but complete;
- appropriately typed where the language supports it;
- explicit about errors and edge cases;
- named with precision;
- cohesive;
- locally understandable;
- consistent with surrounding patterns;
- free of unrelated cleanup;
- structured around domain behaviour rather than incidental mechanics.

Good code makes intended behaviour obvious and incorrect changes harder.

## Refactoring Discipline

TDD is not only Red → Green. It includes Refactor.

After tests are green, agents must ask whether the implementation is clean enough to accept.

Refactoring should:

- preserve behaviour;
- keep tests green;
- improve names, boundaries, cohesion, duplication, or clarity;
- stay within scope;
- avoid mixing unrelated cleanup with feature work;
- be committed separately when it is a meaningful unit of work.

If the green implementation is correct but low quality, the work is not done.

## Generated and Agent-Written Code

Generated code and agent-written code require review.

Agents must not accept code simply because it compiles or passes tests. They must check whether it is understandable, maintainable, scoped, secure, and architecturally appropriate.

Brittle generated code, shallow abstractions, duplicated implementation logic, and hard-coded paths to satisfy tests are engineering quality failures.

## Relationship to Rebuildability

Rebuildability means the system can be recreated from durable knowledge. It does not mean current code can be careless.

High-quality code improves rebuildability because:

- clean boundaries make partial replacement safer;
- readable code reduces spec drift;
- explicit contracts make tests stronger;
- good domain modelling clarifies Gherkin and glossary entries;
- documented tradeoffs make rewrites more reliable.

## Relationship to Tooling

Formatters, linters, type checkers, static analysers, dynamic analysers, coverage, and mutation testing are part of the engineering quality system.

Tooling does not prove design excellence, but failing or absent tooling is a quality risk that must be addressed or explicitly reported.

## Prohibited Behaviour

Agents must not:

- generate code that merely satisfies tests while being poorly designed;
- hard-code behaviour just to turn tests green;
- create broad abstractions without demonstrated need;
- hide complexity behind vague helpers;
- introduce global state casually;
- weaken boundaries for convenience;
- ignore errors, edge cases, security, or privacy;
- mix unrelated refactors into feature work;
- accept brittle generated code without review;
- optimise for speed over long-term quality;
- leave important design decisions only in chat, comments, or implementation details.

## Review Expectations

Engineering review should check:

- correctness;
- test strength;
- architectural fit;
- simplicity;
- cohesion and coupling;
- SOLID principle violations where relevant;
- clean boundary and dependency direction;
- domain modelling;
- naming;
- explicit contracts;
- error handling;
- security and privacy;
- operability;
- performance risks;
- maintainability;
- consistency with project patterns;
- unnecessary complexity;
- rebuildability.

## Acceptance Rule

A change is complete only when the implementation is not merely working, but designed and written to a high engineering standard appropriate to its risk and importance.

If code is correct but architecturally weak, brittle, overcomplicated, insecure, unmaintainable, or inconsistent with durable project knowledge, it is not done.

@@ -0,0 +1,137 @@
# Quality Tooling

## Purpose

Code quality includes style, formatting, static analysis, and dynamic validation.

Arey Pi requires projects to define and use quality tooling as part of the normal validation and Definition of Done. Tooling is not optional polish; it is part of engineering quality.

## Core Rule

Every project must have explicit quality tooling for its language and stack.

At minimum, a project should define:

- a formatter;
- a linter or static analyser;
- type checking where the language supports it;
- relevant dynamic analysis or runtime validation where practical;
- commands that agents can run consistently.

If this tooling is missing, agents must not silently proceed as if validation is complete. They must propose appropriate tooling and ask the user which option to install or configure.

## Non-Negotiable Requirement

Formatting and analysis are part of the validation phase and Definition of Done.

A change is not complete until the relevant quality commands have been run successfully, or a blocker has been explicitly reported.

## Tool Selection

Tooling should match the project language, ecosystem, and existing conventions.

Preferred examples:

- **TypeScript/JavaScript:** Biome by default when no project standard exists; otherwise follow existing ESLint/Prettier/TypeScript configuration.
- **Python:** Ruff for linting and import/style checks; Black when the project standard uses it or when formatting is not covered by Ruff configuration.
- **Rust:** `cargo fmt`, `cargo clippy`.
- **Go:** `gofmt`, `go vet`, relevant linters when configured.
- **Java/Kotlin:** project formatter/linter plus build tool checks.

These are defaults, not universal mandates. Existing project standards win unless they conflict with Arey Pi's quality guarantees.

## Missing Tooling

When a project lacks quality tooling, agents must:

1. Identify the language and stack.
2. Inspect existing package/build configuration.
3. Propose a minimal quality toolchain.
4. Ask the user before installing dependencies or changing project tooling.
5. Persist the chosen commands in project scripts/docs/config.
6. Use the tooling as part of validation going forward.

Agents should not introduce major tooling churn without approval.

## Validation Commands

Projects should expose stable commands where possible, for example:

```txt
format
lint
typecheck
test
check
```

The exact command names can vary by ecosystem, but agents should be able to discover and run the standard validation path.

A good `check` command should usually compose formatting checks, lint/static analysis, type checking, and tests.
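
As a sketch of what composing those steps could look like, the following Python script runs each step in order and reports which ones failed. `run_check` and `EXAMPLE_STEPS` are hypothetical, and the commands are deliberate placeholders (each just runs a no-op Python process); a real project would substitute its own formatter, linter, type checker, and test runner.

```python
import subprocess
import sys
from typing import Sequence


def run_check(steps: Sequence[tuple[str, Sequence[str]]]) -> list[str]:
    """Run every step in order and return the names of the steps that failed."""
    failed = []
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed.append(name)
    return failed


# Placeholder commands: replace each with the project's real tool invocation.
EXAMPLE_STEPS = [
    ("format", [sys.executable, "-c", "pass"]),
    ("lint", [sys.executable, "-c", "pass"]),
    ("typecheck", [sys.executable, "-c", "pass"]),
    ("test", [sys.executable, "-c", "pass"]),
]

if __name__ == "__main__":
    failures = run_check(EXAMPLE_STEPS)
    if failures:
        print("check failed: " + ", ".join(failures))
        sys.exit(1)
    print("check passed")
```

Running every step rather than stopping at the first failure is a design choice here: it lets the report list all failing tools in one pass.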

## Formatting

Formatting should be automated and deterministic.

Agents should not hand-format large code changes when a formatter exists. They should run the formatter or formatting check according to project convention.

Formatting-only changes should be isolated from behaviour changes unless explicitly approved.

## Static Analysis

Static analysis should catch issues such as:

- type errors;
- unused code;
- unreachable code;
- unsafe patterns;
- style violations;
- import/order problems;
- likely bugs;
- security-sensitive patterns where tooling supports it.

Agents must treat static analysis failures as validation failures unless explicitly classified as unrelated pre-existing issues.

## Dynamic Analysis

Where practical, projects should include dynamic validation appropriate to their risk profile, such as:

- test suites;
- mutation testing;
- property-based tests;
- runtime assertions in test environments;
- integration checks;
- security scanners;
- performance checks for performance-sensitive code.

Dynamic analysis should be selected based on risk, not added blindly.

## Existing Failures

If formatting, linting, type checking, or dynamic analysis already fails before the agent's change, the agent must:

- record the baseline failure;
- avoid making it worse;
- fix it if within scope;
- otherwise report it as pre-existing residual risk.

Do not claim validation success when quality tooling fails.
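
One simple way to keep a baseline honest is to compare the set of failing checks recorded before the change with the set observed afterwards. The helper below is a hypothetical sketch; how failures are identified (rule IDs, test names, file paths) is an assumption left to the project.

```python
def classify_failures(baseline: set[str], current: set[str]) -> dict[str, list[str]]:
    """Separate failures introduced by the change from ones that were already present."""
    return {
        "introduced": sorted(current - baseline),
        "pre_existing": sorted(current & baseline),
        "fixed": sorted(baseline - current),
    }
```

Anything under `introduced` means the change made things worse; `pre_existing` entries are the ones to report as residual risk if fixing them is out of scope.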

## Agent Behaviour

Before changing code, agents should inspect available tooling.

Before completion, agents should run relevant commands and report:

- formatter/check command;
- lint/static analysis command;
- typecheck command where applicable;
- test command;
- dynamic analysis command where applicable;
- failures, skips, and residual risks.

## Acceptance Rule

A code change is not done until style and analysis tooling have either passed or been explicitly blocked with evidence.

If a project has no such tooling, adding or selecting it becomes part of the engineering work and must be discussed with the user.

@@ -0,0 +1,49 @@
# Rebuildability

## Purpose

Arey Pi treats production code as replaceable. A healthy project can discard and recreate implementation modules from durable knowledge.

## Core Rule

A module is rebuildable when another agent or developer can understand and recreate its intended behaviour from:

- Gherkin specs;
- tests;
- architecture docs;
- ADRs;
- glossary;
- project rules.

The old implementation may be useful context, but it must not be required to recover intent.

## Disposable Code Principle

Code is valuable, but it is not the canonical memory of the system.

Agents should actively move durable knowledge out of implementation-only places into specs, tests, architecture docs, ADRs, or glossary entries.

## Rebuildability Signals

A module may not be rebuildable when:

- behaviour is only discoverable by reading implementation code;
- tests assert mechanics but not intended behaviour;
- Gherkin specs are missing or stale;
- architectural constraints are implicit;
- domain terms are undefined;
- critical decisions exist only in comments, commit history, or chat.

## Rewrites

A rewrite should start from canonical specs and tests, not by copying the previous implementation.

The previous code can be inspected for migration clues, edge cases, and compatibility risks, but canonical specs/tests define the target.

## Agent Behaviour

When working on a module, agents should ask:

> Could this module be deleted and recreated from the durable project knowledge?

If the answer is no, improve the specs/tests/docs when the current task touches the relevant area.

@@ -0,0 +1,86 @@
# TDD

## Purpose

TDD is mandatory for production behaviour.

Tests are the executable truth that verifies canonical specs and makes rebuildable code possible.

## Core Rule

Production behaviour must be introduced or changed through:

```txt
Red → Green → Refactor
```

This applies to features, bug fixes, behaviour changes, risky refactors, API/CLI behaviour, validation, permissions, persistence, and error handling.

## Red

Before implementation, create, update, or identify a test that fails for the intended reason.

Valid Red evidence includes:

- a newly added failing test;
- an updated failing test;
- an existing failing test that already captures the intended behaviour;
- a documented inability to run the test, including the exact intended command and reason.

A failure caused by setup, syntax, environment, or unrelated behaviour does not count as valid Red evidence.

## Green

Implement the smallest scoped change that makes the relevant test pass.

Green evidence should show:

- the relevant test passes;
- relevant surrounding tests pass where practical;
- no assertions were weakened just to pass;
- no unrelated behaviour was changed.

## Refactor

After Green, refactor only while tests remain green.

Refactoring should improve clarity, structure, duplication, or maintainability without expanding scope or changing behaviour.
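
To make the cycle concrete, here is a small hypothetical example in Python. The function `slugify` and its test are invented for illustration; the comments describe an assumed history in which the test was written first and failed against a stub before the implementation was added.

```python
import re


def slugify(title: str) -> str:
    """Green: the smallest implementation that satisfies the test below."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_slugify_joins_lowercased_words_with_hyphens() -> None:
    """Red: written first, this failed against a stub that returned the title unchanged."""
    assert slugify("Definition of Done") == "definition-of-done"
    assert slugify("  Red, Green, Refactor!  ") == "red-green-refactor"
```

A Refactor step would then be free to rename or restructure `slugify`, provided this test stays green and no new behaviour is introduced.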

## Bug Fixes

Every bug fix requires a regression test.

The expected flow is:

```txt
Reproduce with failing test → Fix → Passing regression test → Relevant suite green → Spec sync
```

## Pure Refactors

A pure refactor may rely on existing tests if the agent can explain why coverage is sufficient.

If coverage is weak and the refactor is risky, add characterization tests before changing production code.
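
A characterization test pins what the code currently does, including its boundaries, so that a later refactor can be checked against it. The sketch below uses a hypothetical `legacy_shipping_pence` function with made-up pricing rules purely to show the shape of such a test.

```python
def legacy_shipping_pence(weight_grams: int) -> int:
    """Hypothetical legacy function whose rules are not yet captured in specs."""
    if weight_grams <= 1000:
        return 350
    return 350 + ((weight_grams - 1000 + 499) // 500) * 100


def test_characterises_current_shipping_prices() -> None:
    """Pins observed outputs, including the boundaries, before any refactor."""
    observed = {0: 350, 1000: 350, 1001: 450, 1500: 450, 1501: 550}
    for weight_grams, expected_pence in observed.items():
        assert legacy_shipping_pence(weight_grams) == expected_pence
```

The expected values are recorded from the existing behaviour rather than derived from a spec; if a value turns out to be a bug, that is a separate behaviour change with its own Red test.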

## Prohibited Practices

Agents must not:

- write production behaviour first and tests later;
- weaken tests to make implementation pass;
- delete failing tests without justification;
- claim Red from irrelevant failures;
- skip test execution silently;
- expand scope beyond the spec/test intent.

## Evidence

Completion must report:

- related Gherkin scenario if applicable;
- Red evidence;
- Green evidence;
- refactor status;
- commands run;
- tests not run and why;
- residual risks.