@alucify/cli 0.7.2 → 0.7.3
- package/dist/cli.js +447 -409
- package/dist/docs/help/mcp-server/tools/verify-plan.md +34 -26
- package/dist/mcp-server-vendored.js +24 -24
- package/dist/web/app.css +1 -1
- package/dist/web/app.js +66 -66
- package/package.json +1 -1
@@ -6,15 +6,15 @@ description: "How the verify_plan tool mechanically checks a plan's appgraph ref
 
 # Verifying a Plan
 
-The `verify_plan` tool checks a planning document against the appgraph to catch invented IDs
+The `verify_plan` tool checks a planning document against the appgraph to catch invented IDs and silent omissions before the plan is presented or executed, and surfaces advisory drift warnings for human review.
 
 ## What It Does
 
-Runs
+Runs two hard checks (gate `valid`) and one advisory check (does not gate `valid`):
 
-1. **Invalid IDs** — Extracts every appgraph ID (`nv*`, `nl*`, `ni*`, `nr*`, `ns*`) from the plan and confirms each exists in the current appgraph. Catches hallucinations.
-2. **
-3. **
+1. **Invalid IDs (hard fail)** — Extracts every appgraph ID (`nv*`, `nl*`, `ni*`, `nr*`, `ns*`) from the plan and confirms each exists in the current appgraph. Catches hallucinations.
+2. **Missing coverage (hard fail)** — Lists untested validations and in-progress components that need work but aren't referenced anywhere in the plan. Catches silent omissions.
+3. **Name mismatches (advisory)** — Compares the plan's description near each ID against the actual node's name and description using token overlap. Surfaced as warnings under `warnings.nameMismatches` because the heuristic false-positives on legitimate batch listings (e.g. compact ID enumerations where descriptive prose lives elsewhere in the plan). Reviewed by the agent, not enforced.
 
 No LLM calls — all checks are mechanical.
 
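Since the checks above are described as purely mechanical, the first and third can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's implementation: it assumes an ID is one of the five listed prefixes followed by digits (as in `nv00057`) and that "token overlap" means Jaccard similarity over lowercased word tokens. The real pattern, tokenizer, and threshold are not shown in this diff, and `extractIds`, `findInvalidIds`, `tokenize`, and `tokenOverlap` are hypothetical names.

```typescript
// Hypothetical sketch; not the actual verify_plan implementation.

// Assumption: an appgraph ID is one of the documented prefixes followed by digits.
const ID_PATTERN = /\b(?:nv|nl|ni|nr|ns)\d+\b/g;

/** Collect the distinct appgraph-style IDs mentioned in a plan, in first-seen order. */
function extractIds(plan: string): string[] {
  const matches: string[] = plan.match(ID_PATTERN) || [];
  return matches.filter((id, i) => matches.indexOf(id) === i);
}

/** Return the plan IDs that are absent from the list of known appgraph IDs. */
function findInvalidIds(plan: string, knownIds: ReadonlyArray<string>): string[] {
  return extractIds(plan).filter((id) => knownIds.indexOf(id) === -1);
}

/** Lowercase a string and split it into distinct alphanumeric word tokens. */
function tokenize(text: string): string[] {
  const tokens: string[] = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return tokens.filter((t, i) => tokens.indexOf(t) === i);
}

/** Assumption: overlap is Jaccard similarity, |A ∩ B| / |A ∪ B|, in [0, 1]. */
function tokenOverlap(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  if (ta.length === 0 && tb.length === 0) return 1;
  const shared = ta.filter((t) => tb.indexOf(t) !== -1).length;
  return shared / (ta.length + tb.length - shared);
}
```

A low `tokenOverlap` score between the text near an ID and the node's real name would be the signal for an advisory name mismatch under these assumptions.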
@@ -30,19 +30,10 @@ No LLM calls — all checks are mechanical.
 ```json
 {
   "valid": false,
-  "summary": "Found 44 IDs in plan | 11 INVALID (do not exist in appgraph) |
+  "summary": "Found 44 IDs in plan | 11 INVALID (do not exist in appgraph) | 22 missing coverage | 7 drift warnings (advisory)",
   "invalidIds": [
     { "id": "nv00057", "nearbyText": "Task 5.1: TestAC_AgentAccountFieldSync (nv00057) ..." }
   ],
-  "nameMismatches": [
-    {
-      "id": "nv00042",
-      "actualName": "Orphaned quota entry is warned and excluded",
-      "actualDescription": "...",
-      "planContext": "Task 2.7: TestLoadState_Corrupted (nv00042) — test malformed JSON...",
-      "similarity": 0.04
-    }
-  ],
   "missingIds": [
     {
       "id": "nv00001",
@@ -52,13 +43,27 @@ No LLM calls — all checks are mechanical.
       "reason": "untested validation"
     }
   ],
-  "
+  "missingIdsTruncated": 0,
+  "warnings": {
+    "nameMismatches": [
+      {
+        "id": "nv00042",
+        "actualName": "Orphaned quota entry is warned and excluded",
+        "actualDescription": "...",
+        "planContext": "Task 2.7: TestLoadState_Corrupted (nv00042) — test malformed JSON...",
+        "similarity": 0.04
+      }
+    ]
+  },
+  "guidance": "Fix invalid IDs: these do not exist in the appgraph. Add coverage for missing IDs ... Review drift warnings (advisory, do not block) ..."
 }
 ```
 
+`valid` is `true` when both `invalidIds` and `missingIds` are empty, regardless of how many `warnings.nameMismatches` entries exist.
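The gating rule stated above can be written down directly. The TypeScript sketch below models only the response fields that the rule depends on, taken from the JSON example; `VerifyPlanResult` and `isValid` are illustrative names and are not claimed to be part of the published API.

```typescript
// Hypothetical types mirroring only the fields needed for the gating rule.
interface VerifyPlanResult {
  invalidIds: { id: string }[];
  missingIds: { id: string }[];
  warnings: { nameMismatches: { id: string; similarity: number }[] };
}

/** Hard checks gate validity; advisory warnings are deliberately ignored. */
function isValid(result: VerifyPlanResult): boolean {
  return result.invalidIds.length === 0 && result.missingIds.length === 0;
}

// A result that carries only advisory warnings still counts as valid.
const warningsOnly: VerifyPlanResult = {
  invalidIds: [],
  missingIds: [],
  warnings: { nameMismatches: [{ id: "nv00042", similarity: 0.04 }] },
};
console.log(isValid(warningsOnly)); // true
```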
+
 ## How It's Used
 
-The `/alucify-plan` skill calls this tool as the final step before presenting a plan. If
+The `/alucify-plan` skill calls this tool as the final step before presenting a plan. If hard checks fail, the planning agent fixes the flagged issues and re-runs verification until `valid: true`. Drift warnings under `warnings.nameMismatches` are reviewed but do not block presentation.
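The fix-and-reverify loop described above could be driven by something like the following sketch. Everything in it is hypothetical: `verify` stands in for a call to the `verify_plan` tool, `fix` stands in for the planning agent's repair step, and the `maxRounds` cap is an added safety assumption that the source does not mention.

```typescript
// Hypothetical driver loop; `verify` and `fix` are injected stand-ins, not real APIs.
interface HardCheckResult {
  valid: boolean;
  invalidIds: { id: string }[];
  missingIds: { id: string }[];
}

function verifyUntilValid(
  plan: string,
  verify: (plan: string) => HardCheckResult,
  fix: (plan: string, result: HardCheckResult) => string,
  maxRounds: number = 5 // assumption: cap retries so a stuck agent cannot loop forever
): { plan: string; rounds: number } {
  let current = plan;
  for (let round = 1; round <= maxRounds; round++) {
    const result = verify(current);
    // Only the hard checks decide whether to stop; advisory warnings are reviewed separately.
    if (result.valid) return { plan: current, rounds: round };
    current = fix(current, result);
  }
  throw new Error("plan still failing hard checks after " + maxRounds + " rounds");
}
```

Injecting `verify` and `fix` as parameters keeps the sketch independent of how the tool is actually invoked, which this diff does not show.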
 
 ## Example Workflow
 
@@ -66,17 +71,20 @@ The `/alucify-plan` skill calls this tool as the final step before presenting a
 1. Plan agent writes plan referencing nv00057-nv00067
 2. verify_plan returns invalidIds: ["nv00057", ...] because IDs only go up to nv00055
 3. Plan agent removes invented IDs, replaces with real ones from get_implementation_tasks
-4. verify_plan re-runs
-
-
-
-
+4. verify_plan re-runs and returns valid: true with 7 entries in warnings.nameMismatches
+   (most are batch ID listings — false positives)
+5. Plan agent reviews each warning. For one (nv00042), the plan's description
+   ("TestLoadState_Corrupted") doesn't match nv00042's actual name
+   ("Orphaned quota entry warning") — agent re-maps the task to the correct ID.
+   The other 6 are confirmed false positives (compact ID lists) and left alone.
+6. verify_plan re-runs (optional) — still valid: true, 6 advisory warnings remain
 7. Plan is presented to the user
 ```
 
 ## Notes
 
--
-- The
--
--
+- Drift warnings (`warnings.nameMismatches`) are advisory — they surface possible off-by-one errors but false-positive on legitimate batch ID listings, tables, and reference dumps where descriptive prose lives elsewhere. The agent reviews them; they do not block `valid: true`.
+- The similarity threshold is conservative (low bar) to catch obvious mismatches; tighter thresholds would miss real drift.
+- The tool does not check the quality of the plan's tasks — only whether IDs are real and coverage is complete.
+- `missingIds` is capped at 100 results to keep the response bounded.
+- Safe to call repeatedly — verification is cheap and has no side effects.