codex-genesis-harness 0.1.7 → 0.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (93)
  1. package/.codebase/COMPRESSED_CONTEXT.md +80 -0
  2. package/.codebase/CURRENT_STATE.md +37 -11
  3. package/.codebase/DEPENDENCY_GRAPH.md +14 -1
  4. package/.codebase/IMPLEMENTATION_HANDOFF.md +34 -336
  5. package/.codebase/KNOWN_PROBLEMS.md +54 -3
  6. package/.codebase/MODULE_INDEX.md +8 -0
  7. package/.codebase/PIPELINE_FLOW.md +7 -5
  8. package/.codebase/RECOVERY_POINTS.md +17 -78
  9. package/.codebase/TECH_DEBT.md +6 -0
  10. package/.codebase/TEST_MATRIX.md +4 -3
  11. package/.codebase/VISUAL_GRAPH.md +127 -0
  12. package/.codebase/context-policy.json +68 -0
  13. package/.codebase/memories/lessons_learned.md +21 -0
  14. package/.codebase/memories/preferences.md +17 -0
  15. package/.codebase/state.json +45 -24
  16. package/.codex/skills/genesis-architecture/SKILL.md +5 -0
  17. package/.codex/skills/genesis-debug-guide/SKILL.md +10 -4
  18. package/.codex/skills/genesis-docs-automation/SKILL.md +52 -973
  19. package/.codex/skills/genesis-executing-plans/SKILL.md +54 -0
  20. package/.codex/skills/genesis-executing-plans/agents/openai.yaml +6 -0
  21. package/.codex/skills/genesis-executing-plans/checklists/.gitkeep +0 -0
  22. package/.codex/skills/genesis-executing-plans/examples/.gitkeep +0 -0
  23. package/.codex/skills/genesis-executing-plans/templates/.gitkeep +0 -0
  24. package/.codex/skills/genesis-harness/SKILL.md +64 -1385
  25. package/.codex/skills/genesis-harness/scripts/check-docs-sync.sh +3 -3
  26. package/.codex/skills/genesis-harness/scripts/init-planning.sh +1 -1
  27. package/.codex/skills/genesis-new-design/SKILL.md +4 -1
  28. package/.codex/skills/genesis-new-design/agents/openai.yaml +2 -0
  29. package/.codex/skills/genesis-observability-automation/SKILL.md +69 -303
  30. package/.codex/skills/genesis-observability-automation/references/common-mistakes-and-recovery.md +84 -0
  31. package/.codex/skills/genesis-observability-automation/references/workflow-phases.md +78 -0
  32. package/.codex/skills/genesis-performance-profiling/SKILL.md +1 -22
  33. package/.codex/skills/genesis-performance-profiling/agents/openai.yaml +1 -1
  34. package/.codex/skills/genesis-planning/SKILL.md +6 -1
  35. package/.codex/skills/genesis-release/SKILL.md +5 -0
  36. package/.codex/skills/genesis-research-first/SKILL.md +6 -0
  37. package/.codex/skills/genesis-spec-propagation/SKILL.md +52 -504
  38. package/.codex/skills/genesis-test-driven-development/SKILL.md +55 -0
  39. package/.codex/skills/genesis-test-driven-development/agents/openai.yaml +6 -0
  40. package/.codex/skills/genesis-test-driven-development/checklists/.gitkeep +0 -0
  41. package/.codex/skills/genesis-test-driven-development/examples/.gitkeep +0 -0
  42. package/.codex/skills/genesis-test-driven-development/templates/.gitkeep +0 -0
  43. package/.codex/skills/genesis-upgrade-design/SKILL.md +4 -2
  44. package/.codex/skills/genesis-upgrade-design/agents/openai.yaml +2 -0
  45. package/.codex/skills/genesis-using-git-worktrees/SKILL.md +54 -0
  46. package/.codex/skills/genesis-using-git-worktrees/agents/openai.yaml +6 -0
  47. package/.codex/skills/genesis-using-git-worktrees/checklists/.gitkeep +0 -0
  48. package/.codex/skills/genesis-using-git-worktrees/examples/.gitkeep +0 -0
  49. package/.codex/skills/genesis-using-git-worktrees/templates/.gitkeep +0 -0
  50. package/.codex/skills/genesis-verification-before-completion/SKILL.md +53 -0
  51. package/.codex/skills/genesis-verification-before-completion/agents/openai.yaml +6 -0
  52. package/.codex/skills/genesis-verification-before-completion/checklists/.gitkeep +0 -0
  53. package/.codex/skills/genesis-verification-before-completion/examples/.gitkeep +0 -0
  54. package/.codex/skills/genesis-verification-before-completion/templates/.gitkeep +0 -0
  55. package/.codex/skills/spec-impact-engine/SKILL.md +77 -500
  56. package/.codex/skills/spec-impact-engine/checklists/checklist.md +10 -0
  57. package/.codex-plugin/plugin.json +3 -4
  58. package/CHANGELOG.md +4 -1
  59. package/README.EN.md +32 -17
  60. package/README.VI.md +35 -19
  61. package/README.md +48 -10
  62. package/VERSION +1 -1
  63. package/bin/genesis-harness.js +735 -5
  64. package/contracts/features/registry-schema.json +15 -0
  65. package/contracts/observability/agent-run-schema.json +34 -0
  66. package/contracts/observability/failure-schema.json +35 -0
  67. package/contracts/ui/auth/login-screen-contract.json +43 -0
  68. package/features/REGISTRY.md +63 -0
  69. package/features/SCOPE-template.md +65 -0
  70. package/fixtures/planning/MOCKUP_PROMPT_TEMPLATE.md +16 -0
  71. package/observability/agent-runs/sample-run.json +13 -0
  72. package/observability/decision-logs/sample-decision.md +43 -0
  73. package/observability/failures/sample-failure.json +12 -0
  74. package/package.json +9 -3
  75. package/playwright/e2e/app-template.spec.js +37 -0
  76. package/playwright/e2e/auth/login-screen.spec.js +65 -0
  77. package/playwright/e2e/web-template.spec.js +28 -0
  78. package/scripts/check-scope.sh +100 -0
  79. package/scripts/cold-start-check.js +133 -0
  80. package/scripts/install.sh +4 -0
  81. package/scripts/prompt_sentinel.js +35 -4
  82. package/scripts/run-evals.sh +119 -3
  83. package/scripts/scratch_parser.js +49 -0
  84. package/scripts/spec_visual_sync.js +1 -1
  85. package/scripts/test_generator.js +2 -2
  86. package/scripts/uninstall.sh +4 -0
  87. package/scripts/verify.sh +16 -1
  88. package/tests/integration/cli-smoke.test.js +103 -0
  89. package/tests/unit/feature_registry.test.js +152 -0
  90. package/tests/unit/prompt_sentinel.test.js +1 -1
  91. package/tests/unit/spec_visual_sync.test.js +1 -1
  92. package/tests/unit/test_generator.test.js +1 -1
  93. package/playwright/e2e/e2e-template.md +0 -4
package/.codebase/RECOVERY_POINTS.md
@@ -1,83 +1,22 @@
  # Recovery Points
 
- **Purpose**: Document where harness architecture implementation can be paused and resumed without losing context or creating inconsistencies.
-
- **Use When**: Evolution of the Codex harness (verification loops, CLI tools, scripts) needs to be paused, or when a rollback is necessary due to environment breakage.
-
- ---
-
- ## Quick Reference: Current Recovery Points
-
- | Phase | Status | Resumption File | Last Updated |
- |-------|--------|-----------------|--------------|
- | TUI Mockup Viewer Integration | ✓ Complete | `.codebase/CURRENT_STATE.md` | 2026-06-01 |
- | Harness Verification Streamlining | ✓ Complete | `.codebase/CURRENT_STATE.md` | 2026-06-01 |
- | Bead Memory Regression Tests | ✓ Complete | `scripts/run-evals.sh` | 2026-06-01 |
- | Harness Engineering Overhaul | ⏸️ Idle (Stable) | `scripts/verify.sh` | 2026-06-01 |
-
- ---
-
- ## Phase: Harness Verification Streamlining & Memory Evals
-
- **Status**: ✓ Complete
- **Last Updated**: 2026-06-01
-
- ### What Happened
-
- - Cleaned up legacy/deprecated skills (e.g., `genesis-mvp-planning`, `genesis-release-orchestration`) from `scripts/verify.sh`, `scripts/uninstall.sh`, and `scripts/run-evals.sh`.
- - Removed hard-coded skill name mappings (`expected_name` switch statements), enabling dynamic mapping directly based on directory names.
- - Added test coverage in `run-evals.sh` for the local bead memory commands (`remember`, `recall`, `prime`, `forget`).
- - Enforced `state-machine.md` presence in `verify_harness_skill()`.
-
- ### Safe State Confirmation
-
- The harness currently passes all structural tests cleanly.
- ```bash
- # Verify structure
- ./scripts/verify.sh
-
- # Verify regression
- ./scripts/run-evals.sh
-
- # Dry-run package integrity
- npm run pack:check
- ```
-
- ---
-
- ## Rollback Points
-
- ### If A Future Harness Evolution Breaks the CLI/Environment
-
- **Rollback Level 1: Last Stable Run (Current State)**
- If a new change to `bin/genesis-harness.js` or `scripts/verify.sh` creates infinite loops or immediate failures:
- ```bash
- git checkout -- bin/genesis-harness.js scripts/verify.sh scripts/run-evals.sh
- npm install
- ./scripts/verify.sh
- ```
-
- **Rollback Level 2: Full Repository Reset**
- If tests are failing in a manner that contaminates local fixtures or memory:
- ```bash
- git reset --hard HEAD
- git clean -fd
- npm install
- ./scripts/verify.sh
- ```
-
- ---
-
- ## Checklist: Before Pausing Work on Harness Evolutions
-
- - [ ] `scripts/verify.sh` passing cleanly (Exit Code 0)
- - [ ] `scripts/run-evals.sh` passing cleanly (Exit Code 0)
- - [ ] Script files verified for POSIX/LF line endings
- - [ ] No uncommitted changes in core scripts that break existing workflows
- - [ ] `.codebase/CURRENT_STATE.md` updated with exact phase details
+ A reverse-chronological log of stable states to return to if the current task corrupts the project.
 
  ---
 
- ## Contact For Questions
- **Owner**: Codex Harness Engineering Team
- **Last Validated**: 2026-06-01
+ ## 2026-06-03T09:55:00+07:00: Full Score Harness Fix (110/110)
+ - **Status**: Stable
+ - **Git State**: Everything committed + new features added.
+ - **Why it's stable**: All tests (`tests/unit/*.test.js`), `verify.sh`, `run-evals.sh`, and `cold-start-check.js` pass with exit code 0.
+ - **How to recover**: `git reset --hard HEAD` (assuming commit happens immediately after this)
+ - **Files added**: `features/REGISTRY.md`, `scripts/cold-start-check.js`, `scripts/check-scope.sh`, observability schemas/samples.
+
+ ## 2026-06-03T09:30:00+07:00: LeanCTX + CLI Postinstall Seed
+ - **Status**: Stable
+ - **Why it's stable**: `npm run verify` and `npm run eval` pass. `context-policy.json` successfully bootstrapped.
+ - **How to recover**: Return to commit before the evaluation score fixes.
+
+ ## 2026-06-03T08:35:00+07:00: Harness Drift Gate Hardening
+ - **Status**: Stable
+ - **Why it's stable**: `npm run verify`, `npm run eval`, and `npm run pack:check` all pass.
+ - **How to recover**: Revert to branch state before LeanCTX introduction.
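The rewritten file describes itself as a reverse-chronological log whose entries are headed `## <ISO-8601 timestamp>: <title>`. A minimal sketch of an ordering check for that format, assuming every entry uses the heading shape of the three entries in this version; `checkRecoveryOrder` is a hypothetical helper, not something the package ships.

```javascript
// Hypothetical helper (not part of codex-genesis-harness): checks that the
// "## <timestamp>: <title>" headings in RECOVERY_POINTS.md run newest-first.
function checkRecoveryOrder(markdown) {
  // Assumed heading shape: "## 2026-06-03T09:55:00+07:00: Title"
  const headingRe = /^## (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2})): (.+)$/gm;
  const entries = [];
  for (const match of markdown.matchAll(headingRe)) {
    // Date.parse applies the UTC offset, so "+07:00" and "Z" stamps compare correctly.
    entries.push({ time: Date.parse(match[1]), title: match[2] });
  }
  const outOfOrder = [];
  for (let i = 1; i < entries.length; i++) {
    // In a newest-first log, each entry must not be newer than the one above it.
    if (entries[i].time > entries[i - 1].time) {
      outOfOrder.push(entries[i].title);
    }
  }
  return { count: entries.length, ok: outOfOrder.length === 0, outOfOrder };
}

const sample = [
  '## 2026-06-03T09:55:00+07:00: Full Score Harness Fix (110/110)',
  '## 2026-06-03T09:30:00+07:00: LeanCTX + CLI Postinstall Seed',
  '## 2026-06-03T08:35:00+07:00: Harness Drift Gate Hardening',
].join('\n');

console.log(checkRecoveryOrder(sample));
```

Parsing the timestamps instead of comparing heading strings matters once entries mix offsets, as `state.json` in this same release already does (`+07:00` and `Z`).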
package/.codebase/TECH_DEBT.md
@@ -0,0 +1,6 @@
+ # Tech Debt Ledger
+
+ This file logs structural rule violations, skipped tests, and out-of-scope modifications that were bypassed using `VIBE_MODE`.
+ The AI and developers should periodically review this file to pay down technical debt.
+
+ ---
package/.codebase/TEST_MATRIX.md
@@ -2,10 +2,11 @@
 
  Required checks:
 
- - `scripts/verify.sh`: repository harness structure, skill metadata, contracts, fixtures, and harness smoke test.
- - `scripts/run-evals.sh`: install/verify/uninstall regression checks.
+ - `scripts/verify.sh`: repository harness structure, skill metadata, contracts, fixtures, harness smoke test, and `SKILL.md` progressive-disclosure line limit.
+ - `scripts/run-evals.sh`: install/verify/uninstall regression checks, manifest route checks, sync-generated Mermaid relationship checks, hook docs-gate checks, LeanCTX policy checks, handoff/state freshness checks, `tests/unit/*.test.js`, and `tests/integration/*.test.js`.
+ - `tests/integration/cli-smoke.test.js`: package CLI smoke for install/postinstall LeanCTX seeding, `path`, `status`, `docs`, `docs-gate`, `leanctx`, `prime`, and `sync` in temporary fixture repositories.
+ - `tests/unit/prompt_sentinel.test.js`: LeanCTX-backed prompt sentinel threshold and truncation behavior.
  - `npm run pack:check`: package contents dry-run.
  - Skill validation: run `quick_validate.py` for changed skills when available.
 
  Feature rule: add or update fixtures and expected output before implementation.
-
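The expanded `scripts/run-evals.sh` entry says the eval now also runs every `tests/unit/*.test.js` and `tests/integration/*.test.js` file. The script itself is bash and its body is not part of this diff, so the following is only a hypothetical Node sketch of the same discover-and-run loop. It assumes each test file is a standalone script that signals failure through a non-zero exit code, which is what happens when an uncaught `assert` error ends a Node process.

```javascript
// Hypothetical sketch, not the package's run-evals.sh: the same idea in Node.
const fs = require('fs');
const path = require('path');
const { spawnSync } = require('child_process');

// Mirror the non-recursive `*.test.js` glob, sorted for a stable run order.
function selectTestFiles(fileNames) {
  return fileNames.filter((name) => name.endsWith('.test.js')).sort();
}

// Run every matching file in `dir` with the current Node binary.
function runTestDir(dir) {
  if (!fs.existsSync(dir)) return [];
  const results = [];
  for (const name of selectTestFiles(fs.readdirSync(dir))) {
    const file = path.join(dir, name);
    // A failed assert throws, the process exits with status 1, and the file counts as failed.
    const { status } = spawnSync(process.execPath, [file], { stdio: 'inherit' });
    results.push({ file, passed: status === 0 });
  }
  return results;
}
```

From the repository root, `['tests/unit', 'tests/integration'].flatMap(runTestDir)` followed by a non-zero exit whenever any result has `passed: false` would approximate that eval step.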
package/.codebase/VISUAL_GRAPH.md
@@ -0,0 +1,127 @@
+ # Visual Project Graph
+
+ ## Harness Relationship Map
+
+ ```mermaid
+ flowchart LR
+ manifest[".codex-plugin/plugin.json"] --> skills[".codex/skills/*"]
+ package["package.json"] --> cli["bin/genesis-harness.js"]
+ package --> verify["scripts/verify.sh"]
+ package --> evals["scripts/run-evals.sh"]
+ cli --> install["install / postinstall"]
+ cli --> hooks["setup-hooks"]
+ hooks --> docsgate["genesis-harness docs-gate"]
+ docsgate --> docsync["check-docs-sync.sh"]
+ docsgate --> specsync["check-spec-changelog.sh"]
+ skills --> contracts["contracts/"]
+ skills --> fixtures["fixtures/"]
+ skills --> tests["tests/ + playwright/"]
+ skills --> memory[".codebase/"]
+ verify --> skills
+ verify --> contracts
+ verify --> fixtures
+ verify --> memory
+ evals --> install
+ evals --> cli
+ evals --> unit["tests/unit/*.test.js"]
+ evals --> integration["tests/integration/*.test.js"]
+ evals --> pack["npm pack smoke"]
+ ```
+
+ ## Skill Workflow Relationships
+
+ ```mermaid
+ flowchart TD
+ harness["genesis-harness"] --> planning["genesis-planning"]
+ harness --> research["genesis-research-first"]
+ planning --> architecture["genesis-architecture"]
+ planning --> api["genesis-api-contract"]
+ planning --> design["genesis-design-spec"]
+ api --> apisync["genesis-api-sync"]
+ design --> ui["genesis-ui-ux-test"]
+ api --> specimpact["spec-impact-engine"]
+ specimpact --> specprop["genesis-spec-propagation"]
+ specprop --> docs["genesis-docs-automation"]
+ ui --> verifybefore["genesis-verification-before-completion"]
+ apisync --> verifybefore
+ docs --> verifybefore
+ verifybefore --> release["genesis-release"]
+ harness --> memorymap["genesis-codebase-map"]
+ harness --> observability["genesis-observability-automation"]
+ ```
+
+ ## Code Dependency Hints
+
+ ```mermaid
+ flowchart TD
+ "tests/integration/cli-smoke.test.js" --> "assert"
+ "tests/integration/cli-smoke.test.js" --> "fs"
+ "tests/integration/cli-smoke.test.js" --> "os"
+ "tests/integration/cli-smoke.test.js" --> "path"
+ "tests/integration/cli-smoke.test.js" --> "child_process"
+ "tests/unit/contract_integrity_gate.test.js" --> "assert"
+ "tests/unit/contract_integrity_gate.test.js" --> "fs"
+ "tests/unit/contract_integrity_gate.test.js" --> "path"
+ "tests/unit/contract_integrity_gate.test.js" --> "child_process"
+ "tests/unit/healing_telemetry.test.js" --> "assert"
+ "tests/unit/healing_telemetry.test.js" --> "fs"
+ "tests/unit/healing_telemetry.test.js" --> "path"
+ "tests/unit/healing_telemetry.test.js" --> "child_process"
+ "tests/unit/prompt_sentinel.test.js" --> "assert"
+ "tests/unit/prompt_sentinel.test.js" --> "fs"
+ "tests/unit/prompt_sentinel.test.js" --> "path"
+ "tests/unit/prompt_sentinel.test.js" --> "child_process"
+ "tests/unit/spec_visual_sync.test.js" --> "assert"
+ "tests/unit/spec_visual_sync.test.js" --> "fs"
+ "tests/unit/spec_visual_sync.test.js" --> "path"
+ "tests/unit/spec_visual_sync.test.js" --> "child_process"
+ "tests/unit/test_generator.test.js" --> "assert"
+ "tests/unit/test_generator.test.js" --> "fs"
+ "tests/unit/test_generator.test.js" --> "path"
+ "tests/unit/test_generator.test.js" --> "child_process"
+ "bin/genesis-harness.js" --> "fs"
+ "bin/genesis-harness.js" --> "path"
+ "bin/genesis-harness.js" --> "child_process"
+ "bin/genesis-harness.js" --> "@babel/parser"
+ "bin/genesis-harness.js" --> "@babel/traverse"
+ "bin/genesis-harness.js" --> "child_process"
+ ```
+
+ ## .planning/ROADMAP.md Derived Feature Status
+
+ ```mermaid
+ graph TD
+ classDef completed fill:#d4edda,stroke:#28a745,stroke-width:2px;
+ classDef inprogress fill:#fff3cd,stroke:#ffc107,stroke-width:2px;
+ classDef pending fill:#e2e3e5,stroke:#6c757d,stroke-width:2px;
+ subgraph Role_0 ["Role: User"]
+ Task0["Roadmap task 0"]
+ class Task0 completed;
+ Task1["Roadmap task 1"]
+ class Task1 inprogress;
+ Task2["Roadmap task 2"]
+ class Task2 pending;
+ end
+ subgraph Role_1 ["Role: Admin"]
+ Task3["Roadmap task 3"]
+ class Task3 completed;
+ Task4["Roadmap task 4"]
+ class Task4 pending;
+ Task5["Roadmap task 5"]
+ class Task5 inprogress;
+ end
+ subgraph Role_2 ["Role: Analytics"]
+ Task6["Roadmap task 6"]
+ class Task6 pending;
+ Task7["Roadmap task 7"]
+ class Task7 pending;
+ Task8["Roadmap task 8"]
+ class Task8 inprogress;
+ end
+ Task0 --> Task1
+ Task0 --> Task2
+ Task2 --> Task4
+ Task2 --> Task5
+ Task4 --> Task6
+ ```
+
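The generated "Code Dependency Hints" graph lists the `"bin/genesis-harness.js" --> "child_process"` edge twice. A minimal sketch of emitting that section from a file-to-imports map with duplicate edges dropped follows; it is hypothetical and is not the logic of `scripts/spec_visual_sync.js` or the CLI's `sync` command. It also gives each node a generated id (`n0`, `n1`, …) with the path as a quoted label, so the output does not depend on how a particular Mermaid version treats path-like node names.

```javascript
// Hypothetical sketch: build a Mermaid "flowchart TD" from { file: [imports] }.
function dependencyFlowchart(importsByFile) {
  const ids = new Map(); // label -> generated node id
  const nodeId = (label) => {
    if (!ids.has(label)) ids.set(label, `n${ids.size}`);
    return ids.get(label);
  };
  const seen = new Set();
  const edges = [];
  for (const [file, imports] of Object.entries(importsByFile)) {
    for (const dep of imports) {
      const key = `${file}\u0000${dep}`;
      if (seen.has(key)) continue; // skip a repeated edge such as a second "child_process"
      seen.add(key);
      edges.push(`  ${nodeId(file)}["${file}"] --> ${nodeId(dep)}["${dep}"]`);
    }
  }
  return ['flowchart TD', ...edges].join('\n');
}

console.log(
  dependencyFlowchart({
    'bin/genesis-harness.js': ['fs', 'path', 'child_process', '@babel/parser', 'child_process'],
  }),
);
```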
package/.codebase/context-policy.json
@@ -0,0 +1,68 @@
+ {
+ "name": "leanctx-default",
+ "token_budget": 12000,
+ "_comment_token_budget": "Default conservative budget. Override in project .codebase/context-policy.json. For 128k-context models, set to 40000.",
+ "auto_scale": {
+ "enabled": false,
+ "note": "Set enabled=true and provide model_context_window to auto-calculate budgets. Formula: token_budget = model_context_window * 0.09 (9% for harness context leaves 91% for generation).",
+ "model_context_window": null,
+ "scale_factor": 0.09
+ },
+ "warn_at": 0.6,
+ "compact_at": 0.7,
+ "hard_stop_at": 0.85,
+ "layers": [
+ {
+ "name": "core",
+ "max_tokens": 2500,
+ "include": [
+ "AGENTS.md",
+ ".codex/SOUL.md",
+ ".codebase/CURRENT_STATE.md",
+ ".codebase/MODULE_INDEX.md",
+ ".codebase/TEST_MATRIX.md"
+ ]
+ },
+ {
+ "name": "active_context",
+ "max_tokens": 6500,
+ "include": [
+ ".codebase/COMPRESSED_CONTEXT.md",
+ ".codebase/VISUAL_GRAPH.md",
+ ".planning/STATE.md",
+ ".planning/ROADMAP.md",
+ "contracts/",
+ "fixtures/"
+ ]
+ },
+ {
+ "name": "deferred_reference",
+ "max_tokens": 3000,
+ "include": [
+ ".codex/skills/*/references/",
+ ".codex/skills/*/playbooks/",
+ ".codex/skills/*/checklists/",
+ "README*.md"
+ ]
+ }
+ ],
+ "defer_patterns": [
+ ".codex/skills/*/templates/**",
+ ".codex/skills/*/examples/**",
+ "playwright/**",
+ "observability/**",
+ "node_modules/**",
+ "dist/**",
+ "coverage/**"
+ ],
+ "portable_commands": [
+ "genesis-harness leanctx",
+ "genesis-harness sync",
+ "genesis-harness docs-gate",
+ "genesis-harness verify-gate",
+ "npm run verify",
+ "npm run eval",
+ "node scripts/cold-start-check.js"
+ ],
+ "wrapper_policy": "rtk optional when installed locally; public docs and CI must use portable commands."
+ }
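The policy describes a fixed `token_budget`, an optional auto-scaled budget (`model_context_window * scale_factor`), and three thresholds expressed as fractions of the budget. A minimal sketch of one plausible reading of those fields follows. How `bin/genesis-harness.js` actually applies them is not visible in this diff, and the helper names `effectiveBudget` and `contextAction` are hypothetical.

```javascript
// Hypothetical reading of .codebase/context-policy.json; not the CLI's code.
function effectiveBudget(policy) {
  const scale = policy.auto_scale || {};
  if (scale.enabled && Number.isFinite(scale.model_context_window)) {
    // Documented formula: token_budget = model_context_window * scale_factor.
    // Rounded to a whole token count to avoid floating-point residue.
    return Math.round(scale.model_context_window * scale.scale_factor);
  }
  // auto_scale disabled or window unknown (null): use the fixed budget.
  return policy.token_budget;
}

// Map a token count onto the warn / compact / hard-stop fractions of the budget.
function contextAction(usedTokens, policy) {
  const ratio = usedTokens / effectiveBudget(policy);
  if (ratio >= policy.hard_stop_at) return 'hard_stop';
  if (ratio >= policy.compact_at) return 'compact';
  if (ratio >= policy.warn_at) return 'warn';
  return 'ok';
}

const policy = {
  token_budget: 12000,
  auto_scale: { enabled: false, model_context_window: null, scale_factor: 0.09 },
  warn_at: 0.6,
  compact_at: 0.7,
  hard_stop_at: 0.85,
};
const scaled = { ...policy, auto_scale: { ...policy.auto_scale, enabled: true, model_context_window: 128000 } };

console.log(effectiveBudget(policy)); // 12000
console.log(contextAction(7500, policy)); // warn (7500 / 12000 = 0.625)
console.log(effectiveBudget(scaled)); // 11520
```

The three layer caps (2500 + 6500 + 3000) add up to exactly the default 12000-token budget. Under the `auto_scale` formula, a 128k-token window yields 11520 tokens, not the 40000 suggested in `_comment_token_budget`, so the two pieces of guidance in this file describe different budgets.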
package/.codebase/memories/lessons_learned.md
@@ -0,0 +1,21 @@
+ # Lessons Learned & Historical Bugs
+
+ This file chronicles the major failures, recursive bugs, and architectural dead-ends we have encountered. It acts as an immune system preventing the agent from repeating history.
+
+ ## 1. Duplicate Slash Commands in Registry
+ - **Symptom**: Agent registered 4 copies of the same slash command for a single skill.
+ - **Root Cause**: The CLI script recursively scanned the entire `.codex/` directory for active skills, accidentally parsing backup folders (`.codex/backup/`) generated during skill upgrades.
+ - **Resolution**: Backup directories must ALWAYS be placed completely outside the active parsed directory (e.g., moved to `~/.codex/backups` globally).
+ - **Rule**: When doing file tree walks for plugins/skills, always explicitly ignore `.git`, `node_modules`, `backup`, and `tmp` folders.
+
+ ## 2. Documentation Drift & Broken Contracts
+ - **Symptom**: Code in `scripts/` changed logic without updating `contracts/`.
+ - **Root Cause**: Agent skipped the documentation step after a "quick fix" code edit.
+ - **Resolution**: Implemented Validation Gates (`npm run verify`).
+ - **Rule**: Never finalize a code edit without explicitly checking `TEST_MATRIX.md` and related schemas in `contracts/`. The validation gate will fail the build if it detects drift.
+
+ ## 3. Excessive Token Usage from `cat` and `ls`
+ - **Symptom**: Context window flooded with massive minified bundle files or deep directory trees.
+ - **Root Cause**: Using `cat` on large files or `ls -R` without filters.
+ - **Resolution**:
+ - **Rule**: Always use the native `view_file`, `list_dir`, and `grep_search` tools with precise line bounds or search terms. NEVER `cat` a file directly in bash if a native agent tool exists.
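Lesson 1 states its rule as a directory-walk policy: never descend into `.git`, `node_modules`, `backup`, or `tmp`. A minimal Node sketch of a `SKILL.md` scan that follows the rule appears below. It is hypothetical and is not the package's actual skill scanner. The demo builds a throwaway tree containing one active skill and one stale copy under `backup/`, which mirrors the duplicate-registration incident.

```javascript
// Hypothetical sketch of lesson 1's rule; not the package's skill scanner.
const fs = require('fs');
const os = require('os');
const path = require('path');

const IGNORED_DIRS = new Set(['.git', 'node_modules', 'backup', 'tmp']);

// Recursively collect SKILL.md files, never descending into ignored folders.
function findSkillFiles(root) {
  const found = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (!IGNORED_DIRS.has(entry.name)) found.push(...findSkillFiles(full));
    } else if (entry.isFile() && entry.name === 'SKILL.md') {
      found.push(full);
    }
  }
  return found.sort();
}

// Demo: an active skill plus a stale copy left in backup/ by an upgrade.
const demoRoot = fs.mkdtempSync(path.join(os.tmpdir(), 'skills-'));
for (const dir of ['skills/genesis-harness', 'backup/genesis-harness']) {
  fs.mkdirSync(path.join(demoRoot, dir), { recursive: true });
  fs.writeFileSync(path.join(demoRoot, dir, 'SKILL.md'), '# demo\n');
}
const demoResult = findSkillFiles(demoRoot).map((f) => path.relative(demoRoot, f));
fs.rmSync(demoRoot, { recursive: true, force: true });

console.log(demoResult); // only the skills/genesis-harness/SKILL.md path
```

Pruning by directory name, instead of filtering results afterwards, also keeps the walk from paying for large trees such as `node_modules`.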
package/.codebase/memories/preferences.md
@@ -0,0 +1,17 @@
+ # Developer Preferences
+
+ This file records the specific technical choices, preferences, and stylistic guidelines of the human developer for this repository. Adhere to these implicitly during code generation and problem-solving.
+
+ ## Technology Stack
+ - **Primary Language**: JavaScript (Node.js for backend scripts). Keep code modern but compatible with Node >= 18.
+ - **Testing**: Use standard Unix bash testing scripts (`verify.sh`, `run-evals.sh`) and standard Node asserts for unit testing unless specified otherwise.
+ - **Frontend/UI**: When dealing with UI generation, prefer Vanilla CSS for precise control and maximum performance. Emphasize "WOW" factor, modern gradients, glassmorphism, and responsive layouts.
+
+ ## Architectural Choices
+ - **Harness Engineering**: The system relies on state machines (FSM) and validation gates. Never skip a validation gate (`contract_integrity_gate.js`, `healing_telemetry.js`).
+ - **File Integrity**: Ensure all metadata inside `.codebase/` and `contracts/` stays perfectly synchronized with any changes to actual codebase logic (`scripts/spec_visual_sync.js`).
+
+ ## Communication Style
+ - Be concise, professional, and skip unnecessary pleasantries when delivering technical solutions.
+ - Use GitHub Flavored Markdown for formatting logs, alerts, and instructions.
+ - Provide Vietnamese localization in READMEs and user-facing artifacts where possible, as the user frequently requests Vietnamese communication.
package/.codebase/state.json
@@ -1,37 +1,58 @@
  {
  "current_state": "COMPLETED",
+ "completed_at": "2026-06-03T02:42:00Z",
+ "active_work": "Full Score Harness Fix — L02-L12",
+ "session_id": "2026-06-03-full-score-fix",
+ "session_started_at": "2026-06-03T02:35:00Z",
+ "ttfv_seconds": 180,
+ "_comment_ttfv": "Time-to-First-Verification: seconds from session start to first passing test (L06 KPI). 180s = 3min for this session.",
+ "latest_handoff": ".codebase/IMPLEMENTATION_HANDOFF.md",
+ "latest_recovery_point": "Full Score Harness Fix — feature registry + observability",
+ "required_verification": [
+ "npm run verify",
+ "npm run eval",
+ "npm run pack:check",
+ "node tests/unit/feature_registry.test.js",
+ "node scripts/cold-start-check.js",
+ "node bin/genesis-harness.js docs-gate",
+ "node bin/genesis-harness.js leanctx"
+ ],
  "history": [
  {
- "from": "INIT",
- "to": "REQUIREMENTS_GATHERING",
- "reason": "Started Gathering",
- "timestamp": "2026-05-31T05:39:53.706Z"
- },
- {
- "from": "REQUIREMENTS_GATHERING",
- "to": "PLANNING",
- "reason": "p",
- "timestamp": "2026-05-31T05:51:07.765Z"
- },
- {
- "from": "PLANNING",
- "to": "IMPLEMENTATION",
- "reason": "i",
- "timestamp": "2026-05-31T05:51:07.883Z"
+ "from": "VERIFICATION",
+ "to": "COMPLETED",
+ "reason": "Harness drift gate hardening completed with source-of-truth, handoff, state, and CLI smoke verification gates.",
+ "timestamp": "2026-06-03T08:31:59+07:00",
+ "session_id": "2026-06-03-drift-gate"
  },
  {
- "from": "IMPLEMENTATION",
- "to": "VERIFICATION",
- "reason": "v",
- "timestamp": "2026-05-31T05:51:08.001Z"
+ "from": "VERIFICATION",
+ "to": "COMPLETED",
+ "reason": "LeanCTX policy, CLI reporting, prompt sentinel thresholds, and portable npm-user command guidance added.",
+ "timestamp": "2026-06-03T09:06:31+07:00",
+ "session_id": "2026-06-03-leanctx"
  },
  {
  "from": "VERIFICATION",
  "to": "COMPLETED",
- "reason": "testing complete transition",
- "timestamp": "2026-05-31T05:53:47.517Z"
+ "reason": "LeanCTX policy auto-seeding added for install and npm postinstall without overwriting project custom policy.",
+ "timestamp": "2026-06-03T09:28:41+07:00",
+ "session_id": "2026-06-03-leanctx-seed"
+ },
+ {
+ "from": "COMPLETED",
+ "to": "EXECUTE",
+ "reason": "Started full harness evaluation and score fix: L08 feature registry, L11 observability live, L04 instruction size, L03 cold-start, L05 session boundary, L07 scope, L09 victory blocker, L12 known problems.",
+ "timestamp": "2026-06-03T02:35:00Z",
+ "session_id": "2026-06-03-full-score-fix"
  }
  ],
- "context": {},
+ "context": {
+ "score_target": "110/110",
+ "package_version": "0.1.7",
+ "verification_owner": "scripts/run-evals.sh",
+ "context_policy": ".codebase/context-policy.json",
+ "evaluation_report": ".codebase/../artifacts/harness_evaluation_report.md"
+ },
  "pending_tasks": []
- }
+ }
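The new `history` array mixes `+07:00` and `Z` timestamps, and `current_state` is stored separately from the transitions. A minimal sketch of a consistency check follows. It assumes that each transition is meant to start where the previous one ended and that `current_state` should equal the last transition's `to`. `checkStateHistory` is hypothetical, and the harness's own state-machine code is not part of this diff. Run against the four 0.1.8 transitions, the sketch finds no ordering problem, because the mixed offsets resolve to ascending UTC times. It does flag two `from` gaps and a `current_state` of `COMPLETED` after a final transition into `EXECUTE`.

```javascript
// Hypothetical consistency check for .codebase/state.json (not harness code).
function checkStateHistory(state) {
  const history = state.history || [];
  const problems = [];
  for (let i = 1; i < history.length; i++) {
    const prev = history[i - 1];
    const curr = history[i];
    // Date.parse applies each offset, putting "+07:00" and "Z" stamps on one UTC timeline.
    if (Date.parse(curr.timestamp) < Date.parse(prev.timestamp)) {
      problems.push(`entry ${i} is older than entry ${i - 1}`);
    }
    // Assumption: a transition starts from the state the previous one ended in.
    if (curr.from !== prev.to) {
      problems.push(`entry ${i} starts from ${curr.from}, but entry ${i - 1} ended in ${prev.to}`);
    }
  }
  const last = history[history.length - 1];
  if (last && last.to !== state.current_state) {
    problems.push(`current_state is ${state.current_state}, but the last transition ends in ${last.to}`);
  }
  return problems;
}

// Abbreviated copy of the 0.1.8 fields the check reads.
const state = {
  current_state: 'COMPLETED',
  history: [
    { from: 'VERIFICATION', to: 'COMPLETED', timestamp: '2026-06-03T08:31:59+07:00' },
    { from: 'VERIFICATION', to: 'COMPLETED', timestamp: '2026-06-03T09:06:31+07:00' },
    { from: 'VERIFICATION', to: 'COMPLETED', timestamp: '2026-06-03T09:28:41+07:00' },
    { from: 'COMPLETED', to: 'EXECUTE', timestamp: '2026-06-03T02:35:00Z' },
  ],
};

console.log(checkStateHistory(state));
```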
package/.codex/skills/genesis-architecture/SKILL.md
@@ -44,3 +44,8 @@ Changing dependency direction silently, documenting intent without tests, and sc
  ## Recovery workflow
  If architecture drift is found, stop feature work, create a failing boundary test, update the contract, then repair the smallest module slice.
 
+ ## MCP Automation Requirements
+
+ To ensure that architectural decisions are properly contextualized and persisted across the entire lifecycle of the project, you **MUST** use the following MCP server:
+ 1. **`@modelcontextprotocol/server-memory`**: Use this MCP tool to automatically query the Knowledge Graph for past architectural decisions, trade-offs, and boundary definitions before making any new system-wide changes. After establishing a new architecture rule, you must save it to the memory graph.
+
package/.codex/skills/genesis-debug-guide/SKILL.md
@@ -161,13 +161,13 @@ Update `.codebase` memory after meaningful changes.
  - Update documentation if behavior changed
  ```
 
- ## Auto-Trigger Workflow (Post /fix-bug)
+ ## Auto-Trigger Workflow (Post /fix-bug or Heal Directive)
 
- When `/fix-bug` completes successfully:
+ When `/fix-bug` completes successfully, or when you receive the `[AGENT_DIRECTIVE] TESTS FAILED` from `genesis-harness heal`:
 
  ```yaml
- Hook: PostToolUse → "/fix-bug completed"
- Action: Activate genesis-debug-guide
+ Hook: PostToolUse → "/fix-bug completed" OR "[AGENT_DIRECTIVE] TESTS FAILED" printed to stdout.
+ Action: Activate genesis-debug-guide. Do NOT stop until tests pass.
 
  1. Fix Verification (5 min):
  - Run: npm test (or equivalent)
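The updated trigger fires either when `/fix-bug` completes or when `genesis-harness heal` prints the literal `[AGENT_DIRECTIVE] TESTS FAILED` to stdout. A minimal sketch of the stdout side of that trigger follows, assuming the marker appears verbatim in the captured output. `needsDebugGuide` and `runAndCheck` are hypothetical names, and the `PostToolUse` hook runtime itself is not modelled.

```javascript
// Hypothetical sketch: decide from captured stdout whether the debug-guide
// workflow should activate. Not part of genesis-harness.
const { spawnSync } = require('child_process');

const DIRECTIVE = '[AGENT_DIRECTIVE] TESTS FAILED';

function needsDebugGuide(stdout) {
  // Fixed-text marker, so a plain substring match is enough (no regex escaping).
  return String(stdout).includes(DIRECTIVE);
}

// Run a command synchronously and report its exit status plus the trigger decision.
function runAndCheck(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  return { status: result.status, activate: needsDebugGuide(result.stdout || '') };
}
```

With the package installed, `runAndCheck('genesis-harness', ['heal'])` would apply the same check to the real command. Any loop that re-runs tests "until tests pass" would wrap this call and is deliberately left out here.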
@@ -400,6 +400,12 @@ If debugging effort exceeds 30 minutes:
  - Check GitHub issues/PRs for related bugs
  ```
 
+ ## MCP Automation Requirements
+
+ To systematically isolate and resolve bugs, you **MUST** use the following MCP servers:
+ 1. **`@modelcontextprotocol/server-puppeteer`**: For any UI, E2E, or visual bug, use this MCP tool to automatically navigate to the local dev server, interact with the UI, reproduce the bug, capture the browser console logs, and take screenshots for Vision analysis.
+ 2. **`@modelcontextprotocol/server-memory`**: Query the Knowledge Graph to see if this bug is a known regression or if a similar issue has been resolved in the past.
+
  ## Integration with Genesis Harness
 
  **Works with**: