codex-genesis-harness 0.1.7 → 0.1.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.codebase/COMPRESSED_CONTEXT.md +80 -0
- package/.codebase/CURRENT_STATE.md +37 -11
- package/.codebase/DEPENDENCY_GRAPH.md +14 -1
- package/.codebase/IMPLEMENTATION_HANDOFF.md +34 -336
- package/.codebase/KNOWN_PROBLEMS.md +54 -3
- package/.codebase/MODULE_INDEX.md +8 -0
- package/.codebase/PIPELINE_FLOW.md +7 -5
- package/.codebase/RECOVERY_POINTS.md +17 -78
- package/.codebase/TECH_DEBT.md +6 -0
- package/.codebase/TEST_MATRIX.md +4 -3
- package/.codebase/VISUAL_GRAPH.md +127 -0
- package/.codebase/context-policy.json +68 -0
- package/.codebase/memories/lessons_learned.md +21 -0
- package/.codebase/memories/preferences.md +17 -0
- package/.codebase/state.json +45 -24
- package/.codex/skills/genesis-architecture/SKILL.md +5 -0
- package/.codex/skills/genesis-debug-guide/SKILL.md +10 -4
- package/.codex/skills/genesis-docs-automation/SKILL.md +52 -973
- package/.codex/skills/genesis-executing-plans/SKILL.md +54 -0
- package/.codex/skills/genesis-executing-plans/agents/openai.yaml +6 -0
- package/.codex/skills/genesis-executing-plans/checklists/.gitkeep +0 -0
- package/.codex/skills/genesis-executing-plans/examples/.gitkeep +0 -0
- package/.codex/skills/genesis-executing-plans/templates/.gitkeep +0 -0
- package/.codex/skills/genesis-harness/SKILL.md +64 -1385
- package/.codex/skills/genesis-harness/scripts/check-docs-sync.sh +3 -3
- package/.codex/skills/genesis-harness/scripts/init-planning.sh +1 -1
- package/.codex/skills/genesis-new-design/SKILL.md +4 -1
- package/.codex/skills/genesis-new-design/agents/openai.yaml +2 -0
- package/.codex/skills/genesis-observability-automation/SKILL.md +69 -303
- package/.codex/skills/genesis-observability-automation/references/common-mistakes-and-recovery.md +84 -0
- package/.codex/skills/genesis-observability-automation/references/workflow-phases.md +78 -0
- package/.codex/skills/genesis-performance-profiling/SKILL.md +1 -22
- package/.codex/skills/genesis-performance-profiling/agents/openai.yaml +1 -1
- package/.codex/skills/genesis-planning/SKILL.md +6 -1
- package/.codex/skills/genesis-release/SKILL.md +5 -0
- package/.codex/skills/genesis-research-first/SKILL.md +6 -0
- package/.codex/skills/genesis-spec-propagation/SKILL.md +52 -504
- package/.codex/skills/genesis-test-driven-development/SKILL.md +55 -0
- package/.codex/skills/genesis-test-driven-development/agents/openai.yaml +6 -0
- package/.codex/skills/genesis-test-driven-development/checklists/.gitkeep +0 -0
- package/.codex/skills/genesis-test-driven-development/examples/.gitkeep +0 -0
- package/.codex/skills/genesis-test-driven-development/templates/.gitkeep +0 -0
- package/.codex/skills/genesis-upgrade-design/SKILL.md +4 -2
- package/.codex/skills/genesis-upgrade-design/agents/openai.yaml +2 -0
- package/.codex/skills/genesis-using-git-worktrees/SKILL.md +54 -0
- package/.codex/skills/genesis-using-git-worktrees/agents/openai.yaml +6 -0
- package/.codex/skills/genesis-using-git-worktrees/checklists/.gitkeep +0 -0
- package/.codex/skills/genesis-using-git-worktrees/examples/.gitkeep +0 -0
- package/.codex/skills/genesis-using-git-worktrees/templates/.gitkeep +0 -0
- package/.codex/skills/genesis-verification-before-completion/SKILL.md +53 -0
- package/.codex/skills/genesis-verification-before-completion/agents/openai.yaml +6 -0
- package/.codex/skills/genesis-verification-before-completion/checklists/.gitkeep +0 -0
- package/.codex/skills/genesis-verification-before-completion/examples/.gitkeep +0 -0
- package/.codex/skills/genesis-verification-before-completion/templates/.gitkeep +0 -0
- package/.codex/skills/spec-impact-engine/SKILL.md +77 -500
- package/.codex/skills/spec-impact-engine/checklists/checklist.md +10 -0
- package/.codex-plugin/plugin.json +3 -4
- package/CHANGELOG.md +4 -1
- package/README.EN.md +32 -17
- package/README.VI.md +35 -19
- package/README.md +48 -10
- package/VERSION +1 -1
- package/bin/genesis-harness.js +735 -5
- package/contracts/features/registry-schema.json +15 -0
- package/contracts/observability/agent-run-schema.json +34 -0
- package/contracts/observability/failure-schema.json +35 -0
- package/contracts/ui/auth/login-screen-contract.json +43 -0
- package/features/REGISTRY.md +63 -0
- package/features/SCOPE-template.md +65 -0
- package/fixtures/planning/MOCKUP_PROMPT_TEMPLATE.md +16 -0
- package/observability/agent-runs/sample-run.json +13 -0
- package/observability/decision-logs/sample-decision.md +43 -0
- package/observability/failures/sample-failure.json +12 -0
- package/package.json +9 -3
- package/playwright/e2e/app-template.spec.js +37 -0
- package/playwright/e2e/auth/login-screen.spec.js +65 -0
- package/playwright/e2e/web-template.spec.js +28 -0
- package/scripts/check-scope.sh +100 -0
- package/scripts/cold-start-check.js +133 -0
- package/scripts/install.sh +4 -0
- package/scripts/prompt_sentinel.js +35 -4
- package/scripts/run-evals.sh +119 -3
- package/scripts/scratch_parser.js +49 -0
- package/scripts/spec_visual_sync.js +1 -1
- package/scripts/test_generator.js +2 -2
- package/scripts/uninstall.sh +4 -0
- package/scripts/verify.sh +16 -1
- package/tests/integration/cli-smoke.test.js +103 -0
- package/tests/unit/feature_registry.test.js +152 -0
- package/tests/unit/prompt_sentinel.test.js +1 -1
- package/tests/unit/spec_visual_sync.test.js +1 -1
- package/tests/unit/test_generator.test.js +1 -1
- package/playwright/e2e/e2e-template.md +0 -4
package/.codebase/RECOVERY_POINTS.md
CHANGED
|
@@ -1,83 +1,22 @@
|
|
|
1
1
|
# Recovery Points
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
|
|
5
|
-
**Use When**: Evolution of the Codex harness (verification loops, CLI tools, scripts) needs to be paused, or when a rollback is necessary due to environment breakage.
|
|
6
|
-
|
|
7
|
-
---
|
|
8
|
-
|
|
9
|
-
## Quick Reference: Current Recovery Points
|
|
10
|
-
|
|
11
|
-
| Phase | Status | Resumption File | Last Updated |
|
|
12
|
-
|-------|--------|-----------------|--------------|
|
|
13
|
-
| TUI Mockup Viewer Integration | ✓ Complete | `.codebase/CURRENT_STATE.md` | 2026-06-01 |
|
|
14
|
-
| Harness Verification Streamlining | ✓ Complete | `.codebase/CURRENT_STATE.md` | 2026-06-01 |
|
|
15
|
-
| Bead Memory Regression Tests | ✓ Complete | `scripts/run-evals.sh` | 2026-06-01 |
|
|
16
|
-
| Harness Engineering Overhaul | ⏸️ Idle (Stable) | `scripts/verify.sh` | 2026-06-01 |
|
|
17
|
-
|
|
18
|
-
---
|
|
19
|
-
|
|
20
|
-
## Phase: Harness Verification Streamlining & Memory Evals
|
|
21
|
-
|
|
22
|
-
**Status**: ✓ Complete
|
|
23
|
-
**Last Updated**: 2026-06-01
|
|
24
|
-
|
|
25
|
-
### What Happened
|
|
26
|
-
|
|
27
|
-
- Cleaned up legacy/deprecated skills (e.g., `genesis-mvp-planning`, `genesis-release-orchestration`) from `scripts/verify.sh`, `scripts/uninstall.sh`, and `scripts/run-evals.sh`.
|
|
28
|
-
- Removed hard-coded skill name mappings (`expected_name` switch statements), enabling dynamic mapping directly based on directory names.
|
|
29
|
-
- Added test coverage in `run-evals.sh` for the local bead memory commands (`remember`, `recall`, `prime`, `forget`).
|
|
30
|
-
- Enforced `state-machine.md` presence in `verify_harness_skill()`.
|
|
31
|
-
|
|
32
|
-
### Safe State Confirmation
|
|
33
|
-
|
|
34
|
-
The harness currently passes all structural tests cleanly.
|
|
35
|
-
```bash
|
|
36
|
-
# Verify structure
|
|
37
|
-
./scripts/verify.sh
|
|
38
|
-
|
|
39
|
-
# Verify regression
|
|
40
|
-
./scripts/run-evals.sh
|
|
41
|
-
|
|
42
|
-
# Dry-run package integrity
|
|
43
|
-
npm run pack:check
|
|
44
|
-
```
|
|
45
|
-
|
|
46
|
-
---
|
|
47
|
-
|
|
48
|
-
## Rollback Points
|
|
49
|
-
|
|
50
|
-
### If A Future Harness Evolution Breaks the CLI/Environment
|
|
51
|
-
|
|
52
|
-
**Rollback Level 1: Last Stable Run (Current State)**
|
|
53
|
-
If a new change to `bin/genesis-harness.js` or `scripts/verify.sh` creates infinite loops or immediate failures:
|
|
54
|
-
```bash
|
|
55
|
-
git checkout -- bin/genesis-harness.js scripts/verify.sh scripts/run-evals.sh
|
|
56
|
-
npm install
|
|
57
|
-
./scripts/verify.sh
|
|
58
|
-
```
|
|
59
|
-
|
|
60
|
-
**Rollback Level 2: Full Repository Reset**
|
|
61
|
-
If tests are failing in a manner that contaminates local fixtures or memory:
|
|
62
|
-
```bash
|
|
63
|
-
git reset --hard HEAD
|
|
64
|
-
git clean -fd
|
|
65
|
-
npm install
|
|
66
|
-
./scripts/verify.sh
|
|
67
|
-
```
|
|
68
|
-
|
|
69
|
-
---
|
|
70
|
-
|
|
71
|
-
## Checklist: Before Pausing Work on Harness Evolutions
|
|
72
|
-
|
|
73
|
-
- [ ] `scripts/verify.sh` passing cleanly (Exit Code 0)
|
|
74
|
-
- [ ] `scripts/run-evals.sh` passing cleanly (Exit Code 0)
|
|
75
|
-
- [ ] Script files verified for POSIX/LF line endings
|
|
76
|
-
- [ ] No uncommitted changes in core scripts that break existing workflows
|
|
77
|
-
- [ ] `.codebase/CURRENT_STATE.md` updated with exact phase details
|
|
3
|
+
A reverse-chronological log of stable states to return to if the current task corrupts the project.
|
|
78
4
|
|
|
79
5
|
---
|
|
80
6
|
|
|
81
|
-
##
|
|
82
|
-
**
|
|
83
|
-
**
|
|
7
|
+
## 2026-06-03T09:55:00+07:00: Full Score Harness Fix (110/110)
|
|
8
|
+
- **Status**: Stable
|
|
9
|
+
- **Git State**: Everything committed + new features added.
|
|
10
|
+
- **Why it's stable**: All tests (`tests/unit/*.test.js`), `verify.sh`, `run-evals.sh`, and `cold-start-check.js` pass with exit code 0.
|
|
11
|
+
- **How to recover**: `git reset --hard HEAD` (assumes this state is committed immediately after this entry is written)
|
|
12
|
+
- **Files added**: `features/REGISTRY.md`, `scripts/cold-start-check.js`, `scripts/check-scope.sh`, observability schemas/samples.
|
|
13
|
+
|
|
14
|
+
## 2026-06-03T09:30:00+07:00: LeanCTX + CLI Postinstall Seed
|
|
15
|
+
- **Status**: Stable
|
|
16
|
+
- **Why it's stable**: `npm run verify` and `npm run eval` pass. `context-policy.json` successfully bootstrapped.
|
|
17
|
+
- **How to recover**: Return to commit before the evaluation score fixes.
|
|
18
|
+
|
|
19
|
+
## 2026-06-03T08:35:00+07:00: Harness Drift Gate Hardening
|
|
20
|
+
- **Status**: Stable
|
|
21
|
+
- **Why it's stable**: `npm run verify`, `npm run eval`, and `npm run pack:check` all pass.
|
|
22
|
+
- **How to recover**: Revert to branch state before LeanCTX introduction.
|
package/.codebase/TEST_MATRIX.md
CHANGED
|
@@ -2,10 +2,11 @@
|
|
|
2
2
|
|
|
3
3
|
Required checks:
|
|
4
4
|
|
|
5
|
-
- `scripts/verify.sh`: repository harness structure, skill metadata, contracts, fixtures,
|
|
6
|
-
- `scripts/run-evals.sh`: install/verify/uninstall regression checks.
|
|
5
|
+
- `scripts/verify.sh`: repository harness structure, skill metadata, contracts, fixtures, harness smoke test, and `SKILL.md` progressive-disclosure line limit.
|
|
6
|
+
- `scripts/run-evals.sh`: install/verify/uninstall regression checks, manifest route checks, sync-generated Mermaid relationship checks, hook docs-gate checks, LeanCTX policy checks, handoff/state freshness checks, `tests/unit/*.test.js`, and `tests/integration/*.test.js`.
|
|
7
|
+
- `tests/integration/cli-smoke.test.js`: package CLI smoke for install/postinstall LeanCTX seeding, `path`, `status`, `docs`, `docs-gate`, `leanctx`, `prime`, and `sync` in temporary fixture repositories.
|
|
8
|
+
- `tests/unit/prompt_sentinel.test.js`: LeanCTX-backed prompt sentinel threshold and truncation behavior.
|
|
7
9
|
- `npm run pack:check`: package contents dry-run.
|
|
8
10
|
- Skill validation: run `quick_validate.py` for changed skills when available.
|
|
9
11
|
|
|
10
12
|
Feature rule: add or update fixtures and expected output before implementation.
|
|
11
|
-
|
package/.codebase/VISUAL_GRAPH.md
ADDED
|
@@ -0,0 +1,127 @@
|
|
|
1
|
+
# Visual Project Graph
|
|
2
|
+
|
|
3
|
+
## Harness Relationship Map
|
|
4
|
+
|
|
5
|
+
```mermaid
|
|
6
|
+
flowchart LR
|
|
7
|
+
manifest[".codex-plugin/plugin.json"] --> skills[".codex/skills/*"]
|
|
8
|
+
package["package.json"] --> cli["bin/genesis-harness.js"]
|
|
9
|
+
package --> verify["scripts/verify.sh"]
|
|
10
|
+
package --> evals["scripts/run-evals.sh"]
|
|
11
|
+
cli --> install["install / postinstall"]
|
|
12
|
+
cli --> hooks["setup-hooks"]
|
|
13
|
+
hooks --> docsgate["genesis-harness docs-gate"]
|
|
14
|
+
docsgate --> docsync["check-docs-sync.sh"]
|
|
15
|
+
docsgate --> specsync["check-spec-changelog.sh"]
|
|
16
|
+
skills --> contracts["contracts/"]
|
|
17
|
+
skills --> fixtures["fixtures/"]
|
|
18
|
+
skills --> tests["tests/ + playwright/"]
|
|
19
|
+
skills --> memory[".codebase/"]
|
|
20
|
+
verify --> skills
|
|
21
|
+
verify --> contracts
|
|
22
|
+
verify --> fixtures
|
|
23
|
+
verify --> memory
|
|
24
|
+
evals --> install
|
|
25
|
+
evals --> cli
|
|
26
|
+
evals --> unit["tests/unit/*.test.js"]
|
|
27
|
+
evals --> integration["tests/integration/*.test.js"]
|
|
28
|
+
evals --> pack["npm pack smoke"]
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## Skill Workflow Relationships
|
|
32
|
+
|
|
33
|
+
```mermaid
|
|
34
|
+
flowchart TD
|
|
35
|
+
harness["genesis-harness"] --> planning["genesis-planning"]
|
|
36
|
+
harness --> research["genesis-research-first"]
|
|
37
|
+
planning --> architecture["genesis-architecture"]
|
|
38
|
+
planning --> api["genesis-api-contract"]
|
|
39
|
+
planning --> design["genesis-design-spec"]
|
|
40
|
+
api --> apisync["genesis-api-sync"]
|
|
41
|
+
design --> ui["genesis-ui-ux-test"]
|
|
42
|
+
api --> specimpact["spec-impact-engine"]
|
|
43
|
+
specimpact --> specprop["genesis-spec-propagation"]
|
|
44
|
+
specprop --> docs["genesis-docs-automation"]
|
|
45
|
+
ui --> verifybefore["genesis-verification-before-completion"]
|
|
46
|
+
apisync --> verifybefore
|
|
47
|
+
docs --> verifybefore
|
|
48
|
+
verifybefore --> release["genesis-release"]
|
|
49
|
+
harness --> memorymap["genesis-codebase-map"]
|
|
50
|
+
harness --> observability["genesis-observability-automation"]
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## Code Dependency Hints
|
|
54
|
+
|
|
55
|
+
```mermaid
|
|
56
|
+
flowchart TD
|
|
57
|
+
"tests/integration/cli-smoke.test.js" --> "assert"
|
|
58
|
+
"tests/integration/cli-smoke.test.js" --> "fs"
|
|
59
|
+
"tests/integration/cli-smoke.test.js" --> "os"
|
|
60
|
+
"tests/integration/cli-smoke.test.js" --> "path"
|
|
61
|
+
"tests/integration/cli-smoke.test.js" --> "child_process"
|
|
62
|
+
"tests/unit/contract_integrity_gate.test.js" --> "assert"
|
|
63
|
+
"tests/unit/contract_integrity_gate.test.js" --> "fs"
|
|
64
|
+
"tests/unit/contract_integrity_gate.test.js" --> "path"
|
|
65
|
+
"tests/unit/contract_integrity_gate.test.js" --> "child_process"
|
|
66
|
+
"tests/unit/healing_telemetry.test.js" --> "assert"
|
|
67
|
+
"tests/unit/healing_telemetry.test.js" --> "fs"
|
|
68
|
+
"tests/unit/healing_telemetry.test.js" --> "path"
|
|
69
|
+
"tests/unit/healing_telemetry.test.js" --> "child_process"
|
|
70
|
+
"tests/unit/prompt_sentinel.test.js" --> "assert"
|
|
71
|
+
"tests/unit/prompt_sentinel.test.js" --> "fs"
|
|
72
|
+
"tests/unit/prompt_sentinel.test.js" --> "path"
|
|
73
|
+
"tests/unit/prompt_sentinel.test.js" --> "child_process"
|
|
74
|
+
"tests/unit/spec_visual_sync.test.js" --> "assert"
|
|
75
|
+
"tests/unit/spec_visual_sync.test.js" --> "fs"
|
|
76
|
+
"tests/unit/spec_visual_sync.test.js" --> "path"
|
|
77
|
+
"tests/unit/spec_visual_sync.test.js" --> "child_process"
|
|
78
|
+
"tests/unit/test_generator.test.js" --> "assert"
|
|
79
|
+
"tests/unit/test_generator.test.js" --> "fs"
|
|
80
|
+
"tests/unit/test_generator.test.js" --> "path"
|
|
81
|
+
"tests/unit/test_generator.test.js" --> "child_process"
|
|
82
|
+
"bin/genesis-harness.js" --> "fs"
|
|
83
|
+
"bin/genesis-harness.js" --> "path"
|
|
84
|
+
"bin/genesis-harness.js" --> "child_process"
|
|
85
|
+
"bin/genesis-harness.js" --> "@babel/parser"
|
|
86
|
+
"bin/genesis-harness.js" --> "@babel/traverse"
|
|
87
|
+
"bin/genesis-harness.js" --> "child_process"
|
|
88
|
+
```
|
|
89
|
+
|
|
90
|
+
## .planning/ROADMAP.md Derived Feature Status
|
|
91
|
+
|
|
92
|
+
```mermaid
|
|
93
|
+
graph TD
|
|
94
|
+
classDef completed fill:#d4edda,stroke:#28a745,stroke-width:2px;
|
|
95
|
+
classDef inprogress fill:#fff3cd,stroke:#ffc107,stroke-width:2px;
|
|
96
|
+
classDef pending fill:#e2e3e5,stroke:#6c757d,stroke-width:2px;
|
|
97
|
+
subgraph Role_0 ["Role: User"]
|
|
98
|
+
Task0["Roadmap task 0"]
|
|
99
|
+
class Task0 completed;
|
|
100
|
+
Task1["Roadmap task 1"]
|
|
101
|
+
class Task1 inprogress;
|
|
102
|
+
Task2["Roadmap task 2"]
|
|
103
|
+
class Task2 pending;
|
|
104
|
+
end
|
|
105
|
+
subgraph Role_1 ["Role: Admin"]
|
|
106
|
+
Task3["Roadmap task 3"]
|
|
107
|
+
class Task3 completed;
|
|
108
|
+
Task4["Roadmap task 4"]
|
|
109
|
+
class Task4 pending;
|
|
110
|
+
Task5["Roadmap task 5"]
|
|
111
|
+
class Task5 inprogress;
|
|
112
|
+
end
|
|
113
|
+
subgraph Role_2 ["Role: Analytics"]
|
|
114
|
+
Task6["Roadmap task 6"]
|
|
115
|
+
class Task6 pending;
|
|
116
|
+
Task7["Roadmap task 7"]
|
|
117
|
+
class Task7 pending;
|
|
118
|
+
Task8["Roadmap task 8"]
|
|
119
|
+
class Task8 inprogress;
|
|
120
|
+
end
|
|
121
|
+
Task0 --> Task1
|
|
122
|
+
Task0 --> Task2
|
|
123
|
+
Task2 --> Task4
|
|
124
|
+
Task2 --> Task5
|
|
125
|
+
Task4 --> Task6
|
|
126
|
+
```
|
|
127
|
+
|
package/.codebase/context-policy.json
ADDED
|
@@ -0,0 +1,68 @@
|
|
|
1
|
+
{
|
|
2
|
+
"name": "leanctx-default",
|
|
3
|
+
"token_budget": 12000,
|
|
4
|
+
"_comment_token_budget": "Default conservative budget. Override in project .codebase/context-policy.json. For 128k-context models, set to 40000.",
|
|
5
|
+
"auto_scale": {
|
|
6
|
+
"enabled": false,
|
|
7
|
+
"note": "Set enabled=true and provide model_context_window to auto-calculate budgets. Formula: token_budget = model_context_window * 0.09 (9% for harness context leaves 91% for generation).",
|
|
8
|
+
"model_context_window": null,
|
|
9
|
+
"scale_factor": 0.09
|
|
10
|
+
},
|
|
11
|
+
"warn_at": 0.6,
|
|
12
|
+
"compact_at": 0.7,
|
|
13
|
+
"hard_stop_at": 0.85,
|
|
14
|
+
"layers": [
|
|
15
|
+
{
|
|
16
|
+
"name": "core",
|
|
17
|
+
"max_tokens": 2500,
|
|
18
|
+
"include": [
|
|
19
|
+
"AGENTS.md",
|
|
20
|
+
".codex/SOUL.md",
|
|
21
|
+
".codebase/CURRENT_STATE.md",
|
|
22
|
+
".codebase/MODULE_INDEX.md",
|
|
23
|
+
".codebase/TEST_MATRIX.md"
|
|
24
|
+
]
|
|
25
|
+
},
|
|
26
|
+
{
|
|
27
|
+
"name": "active_context",
|
|
28
|
+
"max_tokens": 6500,
|
|
29
|
+
"include": [
|
|
30
|
+
".codebase/COMPRESSED_CONTEXT.md",
|
|
31
|
+
".codebase/VISUAL_GRAPH.md",
|
|
32
|
+
".planning/STATE.md",
|
|
33
|
+
".planning/ROADMAP.md",
|
|
34
|
+
"contracts/",
|
|
35
|
+
"fixtures/"
|
|
36
|
+
]
|
|
37
|
+
},
|
|
38
|
+
{
|
|
39
|
+
"name": "deferred_reference",
|
|
40
|
+
"max_tokens": 3000,
|
|
41
|
+
"include": [
|
|
42
|
+
".codex/skills/*/references/",
|
|
43
|
+
".codex/skills/*/playbooks/",
|
|
44
|
+
".codex/skills/*/checklists/",
|
|
45
|
+
"README*.md"
|
|
46
|
+
]
|
|
47
|
+
}
|
|
48
|
+
],
|
|
49
|
+
"defer_patterns": [
|
|
50
|
+
".codex/skills/*/templates/**",
|
|
51
|
+
".codex/skills/*/examples/**",
|
|
52
|
+
"playwright/**",
|
|
53
|
+
"observability/**",
|
|
54
|
+
"node_modules/**",
|
|
55
|
+
"dist/**",
|
|
56
|
+
"coverage/**"
|
|
57
|
+
],
|
|
58
|
+
"portable_commands": [
|
|
59
|
+
"genesis-harness leanctx",
|
|
60
|
+
"genesis-harness sync",
|
|
61
|
+
"genesis-harness docs-gate",
|
|
62
|
+
"genesis-harness verify-gate",
|
|
63
|
+
"npm run verify",
|
|
64
|
+
"npm run eval",
|
|
65
|
+
"node scripts/cold-start-check.js"
|
|
66
|
+
],
|
|
67
|
+
"wrapper_policy": "rtk optional when installed locally; public docs and CI must use portable commands."
|
|
68
|
+
}
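As a rough illustration of how the fields in this policy could fit together, the sketch below derives an effective budget and the three threshold token counts. The helper names `effectiveBudget` and `thresholds` are hypothetical and are not taken from the package. The rounding is an assumption, and the scaling formula is the one quoted in the `auto_scale.note` field.

```javascript
// Hypothetical helpers; not taken from bin/genesis-harness.js.
// Policy values are copied from the context-policy.json shown above.
const policy = {
  token_budget: 12000,
  auto_scale: { enabled: false, model_context_window: null, scale_factor: 0.09 },
  warn_at: 0.6,
  compact_at: 0.7,
  hard_stop_at: 0.85,
};

// Apply the auto-scale formula only when it is enabled and a window is provided;
// otherwise fall back to the fixed token_budget.
function effectiveBudget(p) {
  const a = p.auto_scale;
  if (a && a.enabled && Number.isFinite(a.model_context_window)) {
    return Math.round(a.model_context_window * a.scale_factor);
  }
  return p.token_budget;
}

// Convert the fractional thresholds into absolute token counts (rounding is assumed).
function thresholds(p) {
  const budget = effectiveBudget(p);
  return {
    budget,
    warn: Math.round(budget * p.warn_at),
    compact: Math.round(budget * p.compact_at),
    hardStop: Math.round(budget * p.hard_stop_at),
  };
}

console.log(thresholds(policy));
```

With the defaults above, the three layer caps (2500 + 6500 + 3000) add up to the 12000-token budget. Under the quoted formula, a 128000-token window with `scale_factor` 0.09 would give 11520 tokens, so the 40000 figure in `_comment_token_budget` looks like a manual override rather than an auto-scaled value.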
|
package/.codebase/memories/lessons_learned.md
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
# Lessons Learned & Historical Bugs
|
|
2
|
+
|
|
3
|
+
This file chronicles the major failures, recursive bugs, and architectural dead-ends we have encountered. It acts as an immune system preventing the agent from repeating history.
|
|
4
|
+
|
|
5
|
+
## 1. Duplicate Slash Commands in Registry
|
|
6
|
+
- **Symptom**: Agent registered 4 copies of the same slash command for a single skill.
|
|
7
|
+
- **Root Cause**: The CLI script recursively scanned the entire `.codex/` directory for active skills, accidentally parsing backup folders (`.codex/backup/`) generated during skill upgrades.
|
|
8
|
+
- **Resolution**: Backup directories must ALWAYS be placed completely outside the active parsed directory (e.g., moved to `~/.codex/backups` globally).
|
|
9
|
+
- **Rule**: When doing file tree walks for plugins/skills, always explicitly ignore `.git`, `node_modules`, `backup`, and `tmp` folders.
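The rule above can be sketched as a directory walk that skips the named folders before descending. This is a minimal illustration only: `findSkillFiles` and `IGNORED_DIRS` are hypothetical names, and the actual scanning logic in the CLI is not shown in this diff.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Directory names listed in the rule; matched against each directory's own name.
const IGNORED_DIRS = new Set([".git", "node_modules", "backup", "tmp"]);

// Hypothetical helper: collect every SKILL.md under `root`,
// never descending into an ignored directory.
function findSkillFiles(root) {
  const found = [];
  const stack = [root];
  while (stack.length > 0) {
    const dir = stack.pop();
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!IGNORED_DIRS.has(entry.name)) stack.push(full);
      } else if (entry.isFile() && entry.name === "SKILL.md") {
        found.push(full);
      }
    }
  }
  return found.sort();
}
```

Filtering on the directory name before pushing it onto the stack means a `backup/` folder is never opened at all, which would avoid the duplicate registrations described in the symptom.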
|
|
10
|
+
|
|
11
|
+
## 2. Documentation Drift & Broken Contracts
|
|
12
|
+
- **Symptom**: Code in `scripts/` changed logic without updating `contracts/`.
|
|
13
|
+
- **Root Cause**: Agent skipped the documentation step after a "quick fix" code edit.
|
|
14
|
+
- **Resolution**: Implemented Validation Gates (`npm run verify`).
|
|
15
|
+
- **Rule**: Never finalize a code edit without explicitly checking `TEST_MATRIX.md` and related schemas in `contracts/`. The validation gate will fail the build if it detects drift.
|
|
16
|
+
|
|
17
|
+
## 3. Excessive Token Usage from `cat` and `ls`
|
|
18
|
+
- **Symptom**: Context window flooded with massive minified bundle files or deep directory trees.
|
|
19
|
+
- **Root Cause**: Using `cat` on large files or `ls -R` without filters.
|
|
20
|
+
- **Resolution**:
|
|
21
|
+
- **Rule**: Always use the native `view_file`, `list_dir`, and `grep_search` tools with precise line bounds or search terms. NEVER `cat` a file directly in bash if a native agent tool exists.
|
package/.codebase/memories/preferences.md
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
1
|
+
# Developer Preferences
|
|
2
|
+
|
|
3
|
+
This file records the specific technical choices, preferences, and stylistic guidelines of the human developer for this repository. Adhere to these implicitly during code generation and problem-solving.
|
|
4
|
+
|
|
5
|
+
## Technology Stack
|
|
6
|
+
- **Primary Language**: JavaScript (Node.js for backend scripts). Keep code modern but compatible with Node >= 18.
|
|
7
|
+
- **Testing**: Use standard Unix bash testing scripts (`verify.sh`, `run-evals.sh`) and standard Node asserts for unit testing unless specified otherwise.
|
|
8
|
+
- **Frontend/UI**: When dealing with UI generation, prefer Vanilla CSS for precise control and maximum performance. Emphasize "WOW" factor, modern gradients, glassmorphism, and responsive layouts.
|
|
9
|
+
|
|
10
|
+
## Architectural Choices
|
|
11
|
+
- **Harness Engineering**: The system relies on state machines (FSM) and validation gates. Never skip a validation gate (`contract_integrity_gate.js`, `healing_telemetry.js`).
|
|
12
|
+
- **File Integrity**: Ensure all metadata inside `.codebase/` and `contracts/` stays perfectly synchronized with any changes to actual codebase logic (`scripts/spec_visual_sync.js`).
|
|
13
|
+
|
|
14
|
+
## Communication Style
|
|
15
|
+
- Be concise, professional, and skip unnecessary pleasantries when delivering technical solutions.
|
|
16
|
+
- Use GitHub Flavored Markdown for formatting logs, alerts, and instructions.
|
|
17
|
+
- Provide Vietnamese localization in READMEs and user-facing artifacts where possible, as the user frequently requests Vietnamese communication.
|
package/.codebase/state.json
CHANGED
|
@@ -1,37 +1,58 @@
|
|
|
1
1
|
{
|
|
2
2
|
"current_state": "COMPLETED",
|
|
3
|
+
"completed_at": "2026-06-03T02:42:00Z",
|
|
4
|
+
"active_work": "Full Score Harness Fix — L02-L12",
|
|
5
|
+
"session_id": "2026-06-03-full-score-fix",
|
|
6
|
+
"session_started_at": "2026-06-03T02:35:00Z",
|
|
7
|
+
"ttfv_seconds": 180,
|
|
8
|
+
"_comment_ttfv": "Time-to-First-Verification: seconds from session start to first passing test (L06 KPI). 180s = 3min for this session.",
|
|
9
|
+
"latest_handoff": ".codebase/IMPLEMENTATION_HANDOFF.md",
|
|
10
|
+
"latest_recovery_point": "Full Score Harness Fix — feature registry + observability",
|
|
11
|
+
"required_verification": [
|
|
12
|
+
"npm run verify",
|
|
13
|
+
"npm run eval",
|
|
14
|
+
"npm run pack:check",
|
|
15
|
+
"node tests/unit/feature_registry.test.js",
|
|
16
|
+
"node scripts/cold-start-check.js",
|
|
17
|
+
"node bin/genesis-harness.js docs-gate",
|
|
18
|
+
"node bin/genesis-harness.js leanctx"
|
|
19
|
+
],
|
|
3
20
|
"history": [
|
|
4
21
|
{
|
|
5
|
-
"from": "
|
|
6
|
-
"to": "
|
|
7
|
-
"reason": "
|
|
8
|
-
"timestamp": "2026-
|
|
9
|
-
|
|
10
|
-
{
|
|
11
|
-
"from": "REQUIREMENTS_GATHERING",
|
|
12
|
-
"to": "PLANNING",
|
|
13
|
-
"reason": "p",
|
|
14
|
-
"timestamp": "2026-05-31T05:51:07.765Z"
|
|
15
|
-
},
|
|
16
|
-
{
|
|
17
|
-
"from": "PLANNING",
|
|
18
|
-
"to": "IMPLEMENTATION",
|
|
19
|
-
"reason": "i",
|
|
20
|
-
"timestamp": "2026-05-31T05:51:07.883Z"
|
|
22
|
+
"from": "VERIFICATION",
|
|
23
|
+
"to": "COMPLETED",
|
|
24
|
+
"reason": "Harness drift gate hardening completed with source-of-truth, handoff, state, and CLI smoke verification gates.",
|
|
25
|
+
"timestamp": "2026-06-03T08:31:59+07:00",
|
|
26
|
+
"session_id": "2026-06-03-drift-gate"
|
|
21
27
|
},
|
|
22
28
|
{
|
|
23
|
-
"from": "
|
|
24
|
-
"to": "
|
|
25
|
-
"reason": "
|
|
26
|
-
"timestamp": "2026-
|
|
29
|
+
"from": "VERIFICATION",
|
|
30
|
+
"to": "COMPLETED",
|
|
31
|
+
"reason": "LeanCTX policy, CLI reporting, prompt sentinel thresholds, and portable npm-user command guidance added.",
|
|
32
|
+
"timestamp": "2026-06-03T09:06:31+07:00",
|
|
33
|
+
"session_id": "2026-06-03-leanctx"
|
|
27
34
|
},
|
|
28
35
|
{
|
|
29
36
|
"from": "VERIFICATION",
|
|
30
37
|
"to": "COMPLETED",
|
|
31
|
-
"reason": "
|
|
32
|
-
"timestamp": "2026-
|
|
38
|
+
"reason": "LeanCTX policy auto-seeding added for install and npm postinstall without overwriting project custom policy.",
|
|
39
|
+
"timestamp": "2026-06-03T09:28:41+07:00",
|
|
40
|
+
"session_id": "2026-06-03-leanctx-seed"
|
|
41
|
+
},
|
|
42
|
+
{
|
|
43
|
+
"from": "COMPLETED",
|
|
44
|
+
"to": "EXECUTE",
|
|
45
|
+
"reason": "Started full harness evaluation and score fix: L08 feature registry, L11 observability live, L04 instruction size, L03 cold-start, L05 session boundary, L07 scope, L09 victory blocker, L12 known problems.",
|
|
46
|
+
"timestamp": "2026-06-03T02:35:00Z",
|
|
47
|
+
"session_id": "2026-06-03-full-score-fix"
|
|
33
48
|
}
|
|
34
49
|
],
|
|
35
|
-
"context": {
|
|
50
|
+
"context": {
|
|
51
|
+
"score_target": "110/110",
|
|
52
|
+
"package_version": "0.1.7",
|
|
53
|
+
"verification_owner": "scripts/run-evals.sh",
|
|
54
|
+
"context_policy": ".codebase/context-policy.json",
|
|
55
|
+
"evaluation_report": ".codebase/../artifacts/harness_evaluation_report.md"
|
|
56
|
+
},
|
|
36
57
|
"pending_tasks": []
|
|
37
|
-
}
|
|
58
|
+
}
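The `ttfv_seconds` field is described as the number of seconds from session start to the first passing test. A minimal sketch of that calculation follows, assuming both moments are recorded as ISO 8601 strings. The helper `ttfvSeconds` is hypothetical, and the first-pass timestamp used here is an assumed example, since the file does not record one.

```javascript
// Hypothetical helper: whole seconds between two ISO 8601 timestamps.
function ttfvSeconds(sessionStartedAt, firstPassAt) {
  const start = Date.parse(sessionStartedAt);
  const pass = Date.parse(firstPassAt);
  if (Number.isNaN(start) || Number.isNaN(pass) || pass < start) {
    throw new RangeError("timestamps must be valid and in order");
  }
  return Math.round((pass - start) / 1000);
}

// session_started_at comes from state.json; the second timestamp is an assumed example.
console.log(ttfvSeconds("2026-06-03T02:35:00Z", "2026-06-03T02:38:00Z"));
```

Because `Date.parse` normalizes UTC offsets, the same helper would also work on the history entries above, which mix `Z` and `+07:00` timestamps.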
|
package/.codex/skills/genesis-architecture/SKILL.md
CHANGED
|
@@ -44,3 +44,8 @@ Changing dependency direction silently, documenting intent without tests, and sc
|
|
|
44
44
|
## Recovery workflow
|
|
45
45
|
If architecture drift is found, stop feature work, create a failing boundary test, update the contract, then repair the smallest module slice.
|
|
46
46
|
|
|
47
|
+
## MCP Automation Requirements
|
|
48
|
+
|
|
49
|
+
To ensure that architectural decisions are properly contextualized and persisted across the entire lifecycle of the project, you **MUST** use the following MCP server:
|
|
50
|
+
1. **`@modelcontextprotocol/server-memory`**: Use this MCP tool to automatically query the Knowledge Graph for past architectural decisions, trade-offs, and boundary definitions before making any new system-wide changes. After establishing a new architecture rule, you must save it to the memory graph.
|
|
51
|
+
|
package/.codex/skills/genesis-debug-guide/SKILL.md
CHANGED
|
@@ -161,13 +161,13 @@ Update `.codebase` memory after meaningful changes.
|
|
|
161
161
|
- Update documentation if behavior changed
|
|
162
162
|
```
|
|
163
163
|
|
|
164
|
-
## Auto-Trigger Workflow (Post /fix-bug)
|
|
164
|
+
## Auto-Trigger Workflow (Post /fix-bug or Heal Directive)
|
|
165
165
|
|
|
166
|
-
When `/fix-bug` completes successfully
|
|
166
|
+
When `/fix-bug` completes successfully, or when you receive the `[AGENT_DIRECTIVE] TESTS FAILED` directive from `genesis-harness heal`:
|
|
167
167
|
|
|
168
168
|
```yaml
|
|
169
|
-
Hook: PostToolUse → "/fix-bug completed"
|
|
170
|
-
Action: Activate genesis-debug-guide
|
|
169
|
+
Hook: PostToolUse → "/fix-bug completed" OR "[AGENT_DIRECTIVE] TESTS FAILED" printed to stdout.
|
|
170
|
+
Action: Activate genesis-debug-guide. Do NOT stop until tests pass.
|
|
171
171
|
|
|
172
172
|
1. Fix Verification (5 min):
|
|
173
173
|
- Run: npm test (or equivalent)
|
|
@@ -400,6 +400,12 @@ If debugging effort exceeds 30 minutes:
|
|
|
400
400
|
- Check GitHub issues/PRs for related bugs
|
|
401
401
|
```
|
|
402
402
|
|
|
403
|
+
## MCP Automation Requirements
|
|
404
|
+
|
|
405
|
+
To systematically isolate and resolve bugs, you **MUST** use the following MCP servers:
|
|
406
|
+
1. **`@modelcontextprotocol/server-puppeteer`**: For any UI, E2E, or visual bug, use this MCP tool to automatically navigate to the local dev server, interact with the UI, reproduce the bug, capture the browser console logs, and take screenshots for Vision analysis.
|
|
407
|
+
2. **`@modelcontextprotocol/server-memory`**: Query the Knowledge Graph to see if this bug is a known regression or if a similar issue has been resolved in the past.
|
|
408
|
+
|
|
403
409
|
## Integration with Genesis Harness
|
|
404
410
|
|
|
405
411
|
**Works with**:
|