@alucify/cli 0.7.1 → 0.7.3

@@ -6,15 +6,15 @@ description: "How the verify_plan tool mechanically checks a plan's appgraph ref
 
  # Verifying a Plan
 
- The `verify_plan` tool checks a planning document against the appgraph to catch invented IDs, off-by-one drift, and silent omissions before the plan is presented or executed.
+ The `verify_plan` tool checks a planning document against the appgraph to catch invented IDs and silent omissions before the plan is presented or executed, and surfaces advisory drift warnings for human review.
 
  ## What It Does
 
- Runs three deterministic checks against the plan text:
+ Runs two hard checks (gate `valid`) and one advisory check (does not gate `valid`):
 
- 1. **Invalid IDs** — Extracts every appgraph ID (`nv*`, `nl*`, `ni*`, `nr*`, `ns*`) from the plan and confirms each exists in the current appgraph. Catches hallucinations.
- 2. **Name mismatches** — Compares the plan's description near each ID against the actual node's name and description using token overlap. Catches off-by-one drift where the plan cites the wrong ID for a described task.
- 3. **Missing coverage** — Lists untested validations and in-progress components that need work but aren't referenced anywhere in the plan. Catches silent omissions.
+ 1. **Invalid IDs (hard fail)** — Extracts every appgraph ID (`nv*`, `nl*`, `ni*`, `nr*`, `ns*`) from the plan and confirms each exists in the current appgraph. Catches hallucinations.
+ 2. **Missing coverage (hard fail)** — Lists untested validations and in-progress components that need work but aren't referenced anywhere in the plan. Catches silent omissions.
+ 3. **Name mismatches (advisory)** — Compares the plan's description near each ID against the actual node's name and description using token overlap. Surfaced as warnings under `warnings.nameMismatches` because the heuristic false-positives on legitimate batch listings (e.g. compact ID enumerations where descriptive prose lives elsewhere in the plan). Reviewed by the agent, not enforced.
 
  No LLM calls — all checks are mechanical.
 
@@ -30,19 +30,10 @@ No LLM calls — all checks are mechanical.
  ```json
  {
  "valid": false,
- "summary": "Found 44 IDs in plan | 11 INVALID (do not exist in appgraph) | 7 likely mismatched | 22 missing coverage",
+ "summary": "Found 44 IDs in plan | 11 INVALID (do not exist in appgraph) | 22 missing coverage | 7 drift warnings (advisory)",
  "invalidIds": [
  { "id": "nv00057", "nearbyText": "Task 5.1: TestAC_AgentAccountFieldSync (nv00057) ..." }
  ],
- "nameMismatches": [
- {
- "id": "nv00042",
- "actualName": "Orphaned quota entry is warned and excluded",
- "actualDescription": "...",
- "planContext": "Task 2.7: TestLoadState_Corrupted (nv00042) — test malformed JSON...",
- "similarity": 0.04
- }
- ],
  "missingIds": [
  {
  "id": "nv00001",
@@ -52,13 +43,27 @@ No LLM calls — all checks are mechanical.
  "reason": "untested validation"
  }
  ],
- "guidance": "Fix invalid IDs: these do not exist in the appgraph..."
+ "missingIdsTruncated": 0,
+ "warnings": {
+ "nameMismatches": [
+ {
+ "id": "nv00042",
+ "actualName": "Orphaned quota entry is warned and excluded",
+ "actualDescription": "...",
+ "planContext": "Task 2.7: TestLoadState_Corrupted (nv00042) — test malformed JSON...",
+ "similarity": 0.04
+ }
+ ]
+ },
+ "guidance": "Fix invalid IDs: these do not exist in the appgraph. Add coverage for missing IDs ... Review drift warnings (advisory, do not block) ..."
  }
  ```
 
+ `valid` is `true` when both `invalidIds` and `missingIds` are empty, regardless of how many `warnings.nameMismatches` entries exist.
+
  ## How It's Used
 
- The `/alucify-plan` skill calls this tool as the final step before presenting a plan. If verification fails, the planning agent fixes the flagged issues and re-runs verification until `valid: true`.
+ The `/alucify-plan` skill calls this tool as the final step before presenting a plan. If hard checks fail, the planning agent fixes the flagged issues and re-runs verification until `valid: true`. Drift warnings under `warnings.nameMismatches` are reviewed but do not block presentation.
 
  ## Example Workflow
 
@@ -66,17 +71,20 @@ The `/alucify-plan` skill calls this tool as the final step before presenting a
  1. Plan agent writes plan referencing nv00057-nv00067
  2. verify_plan returns invalidIds: ["nv00057", ...] because IDs only go up to nv00055
  3. Plan agent removes invented IDs, replaces with real ones from get_implementation_tasks
- 4. verify_plan re-runs; this time nameMismatches flags nv00042 because the plan's
- description ("TestLoadState_Corrupted") doesn't match nv00042's actual name
- ("Orphaned quota entry warning")
- 5. Plan agent corrects the ID mapping
- 6. verify_plan re-runs and returns valid: true
+ 4. verify_plan re-runs and returns valid: true with 7 entries in warnings.nameMismatches
+ (most are batch ID listings — false positives)
+ 5. Plan agent reviews each warning. For one (nv00042), the plan's description
+ ("TestLoadState_Corrupted") doesn't match nv00042's actual name
+ ("Orphaned quota entry warning") — agent re-maps the task to the correct ID.
+ The other 6 are confirmed false positives (compact ID lists) and left alone.
+ 6. verify_plan re-runs (optional) — still valid: true, 6 advisory warnings remain
  7. Plan is presented to the user
  ```
 
  ## Notes
 
- - Name mismatches use a conservative similarity threshold: only obvious mismatches are flagged
- - The tool does not check the quality of the plan's tasks, only whether IDs are real and coverage is complete
- - `missingIds` is capped at 100 results to keep the response bounded
- - Safe to call repeatedly: verification is cheap and has no side effects
+ - Drift warnings (`warnings.nameMismatches`) are advisory: they surface possible off-by-one errors but false-positive on legitimate batch ID listings, tables, and reference dumps where descriptive prose lives elsewhere. The agent reviews them; they do not block `valid: true`.
+ - The similarity threshold is conservative (low bar) to catch obvious mismatches; tighter thresholds would miss real drift.
+ - The tool does not check the quality of the plan's tasks — only whether IDs are real and coverage is complete.
+ - `missingIds` is capped at 100 results to keep the response bounded.
+ - Safe to call repeatedly — verification is cheap and has no side effects.
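
As a rough illustration of the behavior this diff documents for 0.7.3, the TypeScript sketch below shows how two hard checks and one advisory check could combine so that only the hard checks gate `valid`. It is not the package's implementation. The function names (`extractIds`, `tokenOverlap`, `verifyPlan`), the ID regex, the per-line context window, the choice of Jaccard similarity, and the `threshold` default are all assumptions made for illustration. Only the result fields `valid`, `invalidIds`, `missingIds`, and `warnings.nameMismatches` come from the documented response, and `invalidIds` and `missingIds` are simplified here to plain strings.

```typescript
// Hypothetical sketch; none of these helpers are taken from @alucify/cli.
type Mismatch = { id: string; similarity: number };

interface VerifyResult {
  valid: boolean;
  invalidIds: string[];
  missingIds: string[];
  warnings: { nameMismatches: Mismatch[] };
}

// Matches the five documented prefixes (nv, nl, ni, nr, ns); digit count is assumed.
const ID_PATTERN = /\bn[vlirs]\d+\b/g;

// Returns each distinct ID that appears in the text.
function extractIds(text: string): string[] {
  const found: string[] = text.match(ID_PATTERN) ?? [];
  return Array.from(new Set(found));
}

// Jaccard overlap on lowercase word tokens: one possible form of "token overlap".
function tokenOverlap(a: string, b: string): number {
  const tokens = (s: string): Set<string> => {
    const words: string[] = s.toLowerCase().match(/[a-z0-9]+/g) ?? [];
    return new Set(words);
  };
  const ta = tokens(a);
  const tb = tokens(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / (ta.size + tb.size - shared);
}

function verifyPlan(
  planText: string,
  graph: Map<string, string>, // assumed shape: node ID -> node name
  required: string[], // IDs that still need work (assumed input)
  threshold = 0.1, // assumed cutoff, not the real one
): VerifyResult {
  const ids = extractIds(planText);
  // Hard check 1: every cited ID must exist in the graph.
  const invalidIds = ids.filter((id) => !graph.has(id));
  // Hard check 2: every required ID must be cited somewhere in the plan.
  const missingIds = required.filter((id) => ids.indexOf(id) === -1);
  // Advisory check: compare each line that cites an ID with that node's name.
  const nameMismatches: Mismatch[] = [];
  for (const line of planText.split("\n")) {
    for (const id of extractIds(line)) {
      const name = graph.get(id);
      if (name === undefined) continue; // already reported as invalid
      const similarity = tokenOverlap(line, name);
      if (similarity < threshold) nameMismatches.push({ id, similarity });
    }
  }
  return {
    // Warnings are deliberately excluded from the gate.
    valid: invalidIds.length === 0 && missingIds.length === 0,
    invalidIds,
    missingIds,
    warnings: { nameMismatches },
  };
}
```

Under these assumptions, a plan line containing only a bare ID (a compact batch listing) shares no tokens with the node name and is flagged as a warning, yet `valid` stays `true`. That is the false-positive case the changelog gives as the reason for demoting name mismatches to advisory.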