npm - arey-pi - Versions diffs - 0.1.0 - Mend

arey-pi 0.1.0

Files changed (33)

package/LICENSE +21 -0
package/README.md +159 -0
package/agents/README.md +313 -0
package/agents/engineering-reviewer.md +78 -0
package/agents/project-evaluator.md +136 -0
package/agents/spec-author.md +82 -0
package/agents/spec-syncer.md +88 -0
package/agents/tdd-implementer.md +81 -0
package/agents/tech-lead.md +92 -0
package/package.json +48 -0
package/prompts/assess-project.md +38 -0
package/rules/README.md +57 -0
package/rules/architecture/adrs.md +257 -0
package/rules/architecture/architecture-memory.md +55 -0
package/rules/assessment/project-readiness.md +224 -0
package/rules/core/change-modes.md +63 -0
package/rules/core/conflict-resolution.md +56 -0
package/rules/core/definition-of-done.md +67 -0
package/rules/core/principles.md +63 -0
package/rules/engineering/engineering-quality.md +285 -0
package/rules/engineering/quality-tooling.md +137 -0
package/rules/engineering/rebuildability.md +49 -0
package/rules/engineering/tdd.md +86 -0
package/rules/engineering/test-quality.md +159 -0
package/rules/specs/canonical-specs.md +62 -0
package/rules/specs/database-specs.md +142 -0
package/rules/specs/gherkin-authoring.md +121 -0
package/rules/specs/language-style.md +106 -0
package/rules/specs/spec-sync.md +70 -0
package/rules/workflow/agent-workflows.md +70 -0
package/rules/workflow/ai-harness.md +177 -0
package/rules/workflow/incremental-commits.md +88 -0
package/skills/project-readiness/SKILL.md +96 -0

package/rules/engineering/test-quality.md ADDED

@@ -0,0 +1,159 @@
+# Test Quality
+## Purpose
+TDD only works when tests meaningfully constrain behaviour.
+This policy defines how agents should assess whether generated or modified tests are valuable, behaviour-focused, and capable of catching real regressions.
+## Core Rule
+A test is not good because it exists, passes, or increases coverage.
+A good test fails when the intended behaviour is broken and passes for the right reason when the behaviour is implemented correctly.
+## Quality Dimensions
+Agents should assess test quality across these dimensions:
+1. **Behavioural relevance:** the test validates behaviour that matters to users, domain rules, APIs, CLIs, integrations, or important internal contracts.
+2. **Spec traceability:** the test can be connected to a Gherkin scenario, regression, ADR, or explicit requirement.
+3. **Failure quality:** the test would fail for a meaningful behavioural regression, not only for incidental implementation changes.
+4. **Assertion strength:** the test asserts outcomes, state changes, side effects, errors, or contracts clearly enough to catch wrong implementations.
+5. **Minimal coupling:** the test avoids depending on private implementation details unless intentionally characterizing legacy code before refactor.
+6. **Maintainability:** the test is readable, focused, deterministic, and not overly broad.
+7. **Regression value:** for bug fixes, the test would have failed before the fix.
+## Coverage
+Coverage is useful but insufficient.
+Agents may use coverage to detect untested areas, but they must not treat high coverage as proof of quality.
+Coverage is most useful for:
+- identifying unexecuted branches in changed code;
+- finding missing edge case tests;
+- detecting untested error paths;
+- guiding review after refactors or rewrites.
+Coverage is weak when:
+- assertions are shallow;
+- tests execute code without validating outcomes;
+- tests only mirror implementation details;
+- generated tests assert mocks rather than behaviour;
+- coverage increases without mutation or failure evidence.
+## Mutation Testing
+Mutation testing is the preferred evidence for test strength when practical.
+A useful test suite should kill meaningful mutants in changed behaviour.
+Agents should consider mutation testing especially for:
+- domain rules;
+- validation logic;
+- permissions and security-sensitive code;
+- financial or billing logic;
+- complex conditionals;
+- bug fixes;
+- rewrites;
+- modules intended to be rebuildable from specs/tests.
+Mutation testing is not required for every tiny change, but Arey Pi should prefer it for critical or high-risk behaviour.
+## Mutation Score
+Mutation score thresholds are project-specific, but agents should report mutation results when available.
+Suggested defaults:
+- **Critical domain/security logic:** 90%+ mutation score for touched code, or explicit justification.
+- **Normal behaviour changes:** 75%+ mutation score for touched code, or explicit justification.
+- **Exploratory/legacy characterization:** no fixed score, but surviving meaningful mutants should be reviewed.
+The score alone is not enough. Surviving mutants must be triaged for whether they represent real test gaps, equivalent mutants, or irrelevant implementation details.
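A minimal TypeScript sketch of how the suggested defaults above might be checked for touched code. The names `RiskLevel`, `MutationRun`, and `checkMutationScore` are hypothetical, and the score formula (killed mutants divided by killed plus survived) is an assumption; real mutation tools may count timeouts, errors, or ignored mutants differently.

```typescript
// Illustrative only: these names are hypothetical and not part of Arey Pi.
type RiskLevel = "critical" | "normal" | "exploratory";

interface MutationRun {
  killed: number;   // mutants detected by at least one failing test
  survived: number; // mutants that no test detected
}

// Suggested defaults from the policy; exploratory work has no fixed score.
const suggestedThreshold: Record<RiskLevel, number | null> = {
  critical: 90,
  normal: 75,
  exploratory: null,
};

interface ScoreCheck {
  score: number;               // percentage in the range 0..100
  meetsThreshold: boolean;     // true when no fixed threshold applies
  needsJustification: boolean; // below threshold, so justify explicitly
}

// Assumed formula: killed / (killed + survived), as a percentage.
function checkMutationScore(risk: RiskLevel, run: MutationRun): ScoreCheck {
  const total = run.killed + run.survived;
  // No mutants means no evidence, so report 0 rather than a perfect score.
  const score = total === 0 ? 0 : (run.killed * 100) / total;
  const threshold = suggestedThreshold[risk];
  const meetsThreshold = threshold === null || score >= threshold;
  return { score, meetsThreshold, needsJustification: !meetsThreshold };
}
```

Meeting the threshold does not end the review in this sketch: any surviving mutants still need to be triaged as the policy describes.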
+## Surviving Mutants
+When mutation testing finds surviving mutants, agents should classify them as:
+- **test gap:** add or strengthen tests;
+- **equivalent mutant:** behaviour is unchanged, document why;
+- **irrelevant mutant:** not tied to durable behaviour, document why;
+- **spec gap:** update Gherkin/specs because intended behaviour is unclear;
+- **design smell:** implementation is too hard to specify or validate.
+Do not ignore surviving mutants silently.
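The classification above could be modelled as a small data structure, so that unclassified or undocumented survivors are easy to spot. This is a sketch under assumptions: `MutantClass`, `SurvivingMutant`, and `triageProblems` are hypothetical names, and the mutant identifiers are made up for illustration.

```typescript
// Illustrative only: these names are hypothetical and not part of Arey Pi.
type MutantClass =
  | "test gap"
  | "equivalent mutant"
  | "irrelevant mutant"
  | "spec gap"
  | "design smell";

interface SurvivingMutant {
  id: string;
  classification?: MutantClass; // undefined means not yet triaged
  note?: string;                // the documented reason, where one is required
}

// Returns one message per mutant that would otherwise be ignored silently.
function triageProblems(mutants: SurvivingMutant[]): string[] {
  const problems: string[] = [];
  for (const m of mutants) {
    if (m.classification === undefined) {
      problems.push(`${m.id}: not classified`);
    } else if (
      (m.classification === "equivalent mutant" ||
        m.classification === "irrelevant mutant") &&
      !m.note?.trim()
    ) {
      // The policy asks for a documented reason in these two cases.
      problems.push(`${m.id}: ${m.classification} needs a documented reason`);
    }
  }
  return problems;
}
```

An empty result means every survivor has been classified and, where the policy asks for it, explained.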
+## Generated Test Review
+Generated tests require extra scrutiny.
+Agents must reject or improve tests that:
+- only assert that a function was called;
+- duplicate implementation logic instead of asserting behaviour;
+- snapshot large outputs without explaining what matters;
+- depend on arbitrary timing;
+- over-mock collaborators so the real contract is not exercised;
+- assert private structure instead of observable behaviour;
+- pass even if important logic is removed;
+- lack a clear connection to a spec, bug, or requirement.
+## Negative and Edge Cases
+For behaviour changes, agents should consider whether tests cover:
+- happy path;
+- invalid input;
+- boundary values;
+- missing permissions;
+- error handling;
+- idempotency or repeated actions;
+- persistence/integration side effects;
+- backwards compatibility where relevant.
+Not every change needs all categories, but omitted important categories should be intentional.
+## Test Review Heuristic
+Before accepting a test, ask:
+> If I intentionally broke the behaviour in the simplest realistic way, would this test fail?
+If the answer is no, the test is probably weak.
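The heuristic can be illustrated with a small TypeScript sketch. All names here (`isAdult`, `isAdultBroken`, `weakTest`, `strongTest`) are hypothetical, and the age rule is an invented example rather than anything from Arey Pi.

```typescript
// Illustrative only: a made-up rule and two candidate tests for it.
type AgeCheck = (age: number) => boolean;

// Intended behaviour: 18 and over counts as adult.
const isAdult: AgeCheck = (age) => age >= 18;

// Simplest realistic break: an off-by-one at the boundary.
const isAdultBroken: AgeCheck = (age) => age > 18;

// Weak test: only exercises a value far from the boundary.
const weakTest = (check: AgeCheck): boolean => check(30) === true;

// Stronger test: pins the boundary on both sides.
const strongTest = (check: AgeCheck): boolean =>
  check(18) === true && check(17) === false;

// Each test is run against the correct and the deliberately broken version.
const results = {
  weakPassesCorrect: weakTest(isAdult),
  weakPassesBroken: weakTest(isAdultBroken),     // still passes: the break goes unnoticed
  strongPassesCorrect: strongTest(isAdult),
  strongPassesBroken: strongTest(isAdultBroken), // fails: the break is caught
};
```

The weak test passes against both versions, so it would not have answered "yes" to the question above; the stronger test passes only for the correct implementation.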
+## Required Evidence
+For non-trivial production changes, agents should report:
+- tests added or modified;
+- related Gherkin scenarios or requirements;
+- Red evidence;
+- Green evidence;
+- coverage results when run;
+- mutation testing results when run;
+- surviving mutants and their classification;
+- test quality concerns or residual risks.
+## When Mutation Testing Is Not Available
+If mutation testing is not configured or practical, agents should report that explicitly and use alternative evidence:
+- focused manual test review;
+- branch/line coverage for touched code;
+- deliberate failure checks;
+- regression reproduction;
+- edge case analysis;
+- reviewer validation.
+This should be reported as weaker evidence than mutation testing.
+## Acceptance Rule
+A change with weak tests is not complete merely because tests pass.
+If tests do not meaningfully protect the intended behaviour, agents must strengthen them, report the gap as a blocker, or ask for approval to proceed with residual risk.

package/rules/specs/canonical-specs.md ADDED

@@ -0,0 +1,62 @@
+# Canonical Specs
+## Purpose
+Canonical specs are the source of truth for intended project behaviour and durable project knowledge.
+They are not secondary documentation. They are the contract that allows the implementation to be safely changed, discarded, or rebuilt.
+## Canonical Sources
+Arey Pi recognises these canonical sources:
+| Source | Purpose |
+| --- | --- |
+| Gherkin feature specs | Observable behaviour, workflows, business rules, API/CLI contracts, important edge cases |
+| Tests | Executable verification of the specs and regression protection |
+| Architecture docs | System boundaries, constraints, major components, integration models |
+| DBML database specs | Canonical database structure for projects with persistent storage |
+| ADRs | Important technical decisions, tradeoffs, accepted consequences |
+| Glossary | Domain language, concepts, meanings, aliases, and forbidden terms |
+| Project rules | Non-negotiable engineering policies and local conventions |
+## Authority
+Canonical specs define intended behaviour by default.
+If existing code disagrees with canonical specs, agents must not assume the code is correct. They must either:
+- align the implementation with the specs;
+- update the specs if the user explicitly approves the new behaviour;
+- or stop and ask for clarification when intent is unclear.
+## Required Persistence
+A behaviour, rule, decision, or constraint should be persisted canonically when it is:
+- user-visible;
+- externally observable;
+- part of a public or internal contract;
+- a business or domain rule;
+- relevant to validation, permissions, errors, security, or persistence;
+- important for future rebuilds;
+- likely to surprise future implementers;
+- needed to understand why the system is shaped a certain way.
+## Non-Canonical Knowledge
+The following are not sufficient as durable sources of truth by themselves:
+- implementation code;
+- inline comments;
+- chat history;
+- temporary plans;
+- stale README prose;
+- agent assumptions;
+- inferred behaviour from current implementation.
+They may inform updates, but durable knowledge must be promoted into canonical specs, tests, DBML database specs, ADRs, architecture docs, or glossary entries.
+## Acceptance Rule
+A change is not complete until canonical specs are either updated or explicitly confirmed unaffected.

package/rules/specs/database-specs.md ADDED

@@ -0,0 +1,142 @@
+# Database Specs
+## Purpose
+Projects with a database must persist their data model as a canonical spec.
+Arey Pi uses DBML as the default canonical format for database structure because it is readable, diffable, tool-friendly, and independent of a specific ORM or migration tool.
+## Core Rule
+If a project has a database, it must have a DBML spec that is kept precisely synchronised with the real database model.
+The DBML spec is not optional documentation. It is a canonical source of truth for the intended database schema and must be maintained with the same care as Gherkin behaviour specs, tests, ADRs, and architecture docs.
+## Default Location
+The default location is:
+```txt
+specs/database/schema.dbml
+```
+Large systems may split DBML by bounded context or database, for example:
+```txt
+specs/database/main.dbml
+specs/database/analytics.dbml
+specs/database/billing.dbml
+```
+If multiple files are used, the project must document how they relate to deployed databases, schemas, services, or bounded contexts.
+## What Belongs in DBML
+DBML should describe the durable intended database structure, including:
+- tables;
+- columns;
+- data types;
+- nullability;
+- primary keys;
+- foreign keys;
+- unique constraints;
+- indexes when behaviourally or operationally relevant;
+- enums or constrained values where applicable;
+- join tables;
+- relationship cardinality;
+- important column notes;
+- table ownership or bounded context notes where useful.
+## What Does Not Belong in DBML
+Avoid encoding incidental implementation details that are not part of the durable data model, such as:
+- temporary migration mechanics;
+- one-off backfill scripts;
+- ORM-only helper fields that do not exist in storage;
+- environment-specific physical tuning unless it is a durable constraint;
+- generated naming noise that does not clarify the model.
+## Synchronisation Rule
+Every database-affecting change must update DBML in the same change set.
+This includes changes to:
+- migrations;
+- ORM models;
+- schema definitions;
+- SQL DDL;
+- persistence code that implies a schema change;
+- indexes or constraints;
+- relationship cardinality;
+- enum values;
+- data ownership boundaries;
+- soft-delete, audit, tenancy, or versioning fields;
+- database-specific behaviour that affects product or operational guarantees.
+If a change touches persistence code but does not alter the schema, agents must explicitly state that the DBML spec is unaffected and why.
+## Precision Requirement
+DBML must match the intended schema exactly, not approximately.
+Agents must not leave approximate, stale, partial, or aspirational database specs. The DBML should match the intended schema represented by migrations and deployed database contracts.
+If the current database, migrations, ORM models, and DBML disagree, agents must report the drift and ask for clarification unless the intended source of truth is explicit.
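A drift report of the kind described above could be produced by a comparison like the following TypeScript sketch. It assumes the column definitions have already been extracted from the DBML spec and from the real schema by some other means (not shown); `Column`, `TableColumns`, and `findDrift` are hypothetical names, not part of any DBML tooling.

```typescript
// Illustrative only: Column, TableColumns, and findDrift are hypothetical names.
interface Column {
  type: string;
  nullable: boolean;
}

// Column name -> definition, for a single table.
type TableColumns = Record<string, Column>;

// Lists every difference between the DBML spec and the real schema.
// It only reports drift; deciding which side is correct is left to the user.
function findDrift(dbml: TableColumns, schema: TableColumns): string[] {
  const names = Object.keys(dbml)
    .concat(Object.keys(schema))
    .filter((name, index, all) => all.indexOf(name) === index)
    .sort();

  const drift: string[] = [];
  for (const name of names) {
    const spec: Column | undefined = dbml[name];
    const real: Column | undefined = schema[name];
    if (spec === undefined) {
      drift.push(`${name}: present in schema, missing from DBML`);
    } else if (real === undefined) {
      drift.push(`${name}: present in DBML, missing from schema`);
    } else {
      if (spec.type !== real.type) {
        drift.push(`${name}: type differs (DBML ${spec.type}, schema ${real.type})`);
      }
      if (spec.nullable !== real.nullable) {
        drift.push(`${name}: nullability differs`);
      }
    }
  }
  return drift;
}
```

A non-empty result is something to report and ask about, in line with the rule above, rather than something to resolve by silently picking one side.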
+## Relationship to Migrations
+Migrations are executable change history. DBML is the canonical current model.
+Both matter:
+- migrations explain how the database changes over time;
+- DBML explains what the database is intended to look like now.
+A migration-only schema change is not complete without DBML synchronisation.
+## Relationship to Gherkin
+Gherkin describes observable behaviour. DBML describes persistent data structure.
+When a behaviour change introduces or changes persisted concepts, agents should update every affected spec:
+- Gherkin for the behaviour;
+- DBML for the data model;
+- glossary for domain terminology;
+- ADR/architecture docs for non-trivial data decisions.
+## Validation
+When tooling exists, agents should validate DBML using the project's chosen DBML tooling or generated diagrams/checks.
+When DBML validation tooling is absent, agents should still review the DBML manually and recommend adding tooling if database work is significant.
+Possible validation includes:
+- DBML parser/check command;
+- diagram generation;
+- comparison against ORM schema;
+- comparison against migration output;
+- database introspection in safe environments;
+- review by a database-aware agent or human.
+## Agent Behaviour
+Before completing database-related work, agents must inspect relevant DBML specs when they exist.
+If database code exists but DBML specs are missing, agents must surface the gap and propose adding DBML. For database-affecting work, adding the initial DBML spec should be treated as part of making the project Arey Pi-aligned unless the user explicitly defers it.
+Agents must not silently make schema changes without DBML synchronisation.
+## Acceptance Rule
+A database-affecting change is not done until:
+- DBML exists for the affected database;
+- DBML reflects the intended schema precisely;
+- migrations/schema/ORM/database code and DBML agree;
+- relevant Gherkin, glossary, ADRs, and architecture docs are updated when the data model change affects behaviour or durable design;
+- DBML validation has run or limitations are reported.

package/rules/specs/gherkin-authoring.md ADDED

@@ -0,0 +1,121 @@
+# Gherkin Authoring
+## Purpose
+Gherkin feature specs persist canonical behaviour in a format that is readable by humans, usable by agents, and traceable to tests.
+Gherkin is the default format for functional and behavioural specs.
+## Location
+Canonical Gherkin specs live under:
+```txt
+specs/features/
+```
+Recommended organisation is by domain or capability:
+```txt
+specs/features/auth/login.feature
+specs/features/billing/subscription.feature
+specs/features/api/users.feature
+```
+## What Belongs in Gherkin
+Use Gherkin for:
+- user-visible behaviour;
+- business rules;
+- domain workflows;
+- API behaviour;
+- CLI behaviour;
+- validation rules;
+- permission rules;
+- error states;
+- externally observable side effects;
+- important edge cases;
+- bug regressions when they clarify intended behaviour.
+## What Does Not Belong in Gherkin
+Avoid encoding incidental implementation details such as:
+- class names;
+- private function names;
+- internal file structure;
+- database queries unless externally contractual;
+- framework mechanics;
+- low-level algorithms unless they are part of the domain contract;
+- temporary implementation workarounds.
+## Style
+Gherkin should be written in domain language.
+Gherkin specs must use semantic line breaks.
+Each step should express one clear condition, action, outcome, or coherent idea.
+Avoid long multi-clause steps when separate semantic steps would be clearer.
+Prefer clear behavioural structure:
+```gherkin
+Feature: Account login
+  Rule: Registered users can authenticate with valid credentials
+    Scenario: Successful login with valid credentials
+      Given a registered user exists
+      When the user submits valid credentials
+      Then the system creates an authenticated session
+      And the user can access their account
+```
+Use `Rule` when it makes business rules clearer. Use `Scenario Outline` when examples express the rule better than prose.
+## Scenario Quality
+A good scenario is:
+- observable;
+- testable;
+- specific;
+- written in business/domain language;
+- focused on one behaviour or rule;
+- free of incidental implementation detail.
+A scenario should be easy to map to one or more tests.
+## Coverage Expectations
+Feature specs should normally include:
+- a happy path;
+- relevant failure paths;
+- important edge cases;
+- permission or validation boundaries when applicable.
+Do not add scenarios only for quantity. Each scenario should clarify intended behaviour.
+## Traceability
+Tests should reference related features/scenarios where practical.
+Example:
+```ts
+// Feature: Account login
+// Scenario: Successful login with valid credentials
+```
+or equivalent test naming.
+## Update Rule
+Every observable behaviour change must do one of the following:
+1. add a Gherkin scenario;
+2. update an existing scenario;
+3. explicitly state that existing scenarios already cover the behaviour;
+4. stop for clarification if the change conflicts with existing Gherkin.

package/rules/specs/language-style.md ADDED

@@ -0,0 +1,106 @@
+# Language Style
+## Purpose
+Arey Pi projects should communicate in the language style expected by their users, customers, and maintainers.
+For Arey Pi, the default written English standard is UK English.
+## Core Rule
+Use UK English for all project-facing writing unless the project explicitly defines a different language standard.
+This applies to:
+- Gherkin specs;
+- DBML notes and database spec prose;
+- documentation;
+- ADRs;
+- glossary entries;
+- README files;
+- AGENTS.md and AI harness instructions;
+- skills and prompts;
+- user-facing messages;
+- error messages;
+- comments intended to explain durable behaviour;
+- agent summaries and reports.
+## Semantic Line Breaks
+Specs must always use semantic line breaks.
+This applies especially to:
+- Gherkin feature files;
+- DBML notes and documentation comments;
+- ADRs;
+- architecture docs;
+- glossary entries;
+- Markdown specs and project rules.
+Use one sentence, clause, or coherent idea per line where practical.
+Break lines at semantic boundaries rather than arbitrary wrapping widths.
+Semantic line breaks improve:
+- git diffs;
+- code review;
+- agent edits;
+- proofreading;
+- localisation;
+- focused updates to canonical specs.
+Do not reflow entire documents just for formatting unless requested.
+When editing specs or docs, preserve or improve semantic line breaks in the touched sections.
+## UK English Expectations
+Prefer UK spellings and phrasing.
+For example, use:
+- behaviour, not the US spelling `behavior`;
+- colour, not the US spelling `color`;
+- organise, not the US spelling `organize`;
+- analyse, not the US spelling `analyze`;
+- centre, not the US spelling `center`;
+- licence as a noun in British usage, and license as a verb;
+- modelling, not the US spelling `modeling`.
+Agents should be consistent within a project and should not mix US and UK spelling casually.
+## Exceptions
+Do not rewrite language when it would break or distort:
+- code identifiers;
+- public APIs;
+- package names;
+- external protocol names;
+- quoted material;
+- third-party documentation references;
+- generated code where spelling is dictated by tooling;
+- established domain terms used by the customer.
+If an existing codebase uses US spelling in identifiers or APIs, preserve compatibility.
+Prefer UK English in prose around it.
+## Gherkin and Product Text
+Gherkin specs should use UK English unless the product or domain requires a different spelling.
+Because specs are customer-facing durable knowledge, agents should treat language consistency and semantic line breaks as part of spec quality.
+## Agent Behaviour
+Agents should write final answers, reports, docs, specs, and project instructions in UK English by default.
+When editing existing prose, agents should avoid large spelling-only rewrites unless requested.
+For touched text, prefer UK English, preserve semantic line breaks, and flag widespread inconsistency as a follow-up.
+## Acceptance Rule
+Project-facing prose is not fully polished if it mixes UK and US English without a reason.
+Specs are not style-compliant if they do not use semantic line breaks.
+For important specs, docs, prompts, and harness instructions, language style consistency and semantic line breaks are part of quality review.

package/rules/specs/spec-sync.md ADDED

@@ -0,0 +1,70 @@
+# Spec Sync
+## Purpose
+Spec sync guarantees that specs, tests, and code agree at the end of every task.
+It applies whether the work started with Spec-Driven Mode or Direct Change Mode.
+## Core Rule
+Every completed change must end with canonical specs synchronised or explicitly confirmed unaffected.
+The final result must be one of:
+```txt
+Specs updated
+```
+or:
+```txt
+Specs unaffected: <reason>
+```
+## Sync Dimensions
+At task completion, agents must consider each canonical dimension:
+- **Gherkin:** Did observable behaviour, API/CLI contracts, rules, errors, permissions, or edge cases change?
+- **Tests:** Do tests represent the intended behaviour and trace to relevant specs where practical?
+- **Architecture:** Did boundaries, dependencies, storage, integrations, or system constraints change?
+- **Database/DBML:** Did migrations, ORM models, SQL DDL, schema definitions, indexes, constraints, relationships, or persistence contracts change?
+- **ADRs:** Was a meaningful technical decision made that future agents/developers need to understand, and is it important enough for a high-quality ADR rather than process noise?
+- **Glossary:** Was a new domain term introduced or an existing meaning changed?
+## Required Behaviour
+If behaviour changed, update Gherkin.
+If architecture changed, update architecture docs or create/update a high-quality ADR when the decision has durable impact.
+If database structure changed, update the canonical DBML spec precisely.
+If durable domain language changed, update the glossary.
+If only implementation changed and behaviour stayed the same, explain why specs are unaffected and name the coverage relied on where practical.
+## Conflict Handling
+If specs, tests, and code disagree, agents must not silently pick the current implementation.
+They must resolve the disagreement by:
+- implementing the canonical spec;
+- updating the spec with explicit user approval;
+- or stopping for clarification.
+## Final Report Format
+Agents should close with a compact sync report:
+```txt
+Spec sync:
+- Gherkin:
+- Tests:
+- Architecture/ADR:
+- Database/DBML:
+- Glossary:
+- Status:
+```
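The report format above could be generated from a typed structure, which makes it harder to omit a dimension or to claim "unaffected" without a reason. This is a sketch only; `SpecSyncReport`, `SyncStatus`, and `formatSpecSync` are hypothetical names and not part of Arey Pi.

```typescript
// Illustrative only: these names are hypothetical and not part of Arey Pi.
type SyncStatus =
  | { kind: "updated" }
  | { kind: "unaffected"; reason: string }; // a reason is mandatory here

interface SpecSyncReport {
  gherkin: string;
  tests: string;
  architectureAdr: string;
  databaseDbml: string;
  glossary: string;
  status: SyncStatus;
}

// Renders the compact sync report, one line per canonical dimension.
function formatSpecSync(report: SpecSyncReport): string {
  const status =
    report.status.kind === "updated"
      ? "Specs updated"
      : `Specs unaffected: ${report.status.reason}`;
  return [
    "Spec sync:",
    `- Gherkin: ${report.gherkin}`,
    `- Tests: ${report.tests}`,
    `- Architecture/ADR: ${report.architectureAdr}`,
    `- Database/DBML: ${report.databaseDbml}`,
    `- Glossary: ${report.glossary}`,
    `- Status: ${status}`,
  ].join("\n");
}
```

Because every field is required by the type, a report cannot be built in this sketch without stating something for each dimension, even if that something is "unaffected".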