@hacksmith/doraval 0.2.46 → 0.2.47

package/README.md CHANGED
@@ -134,6 +134,21 @@ doraval skill drift ./skills/my-skill/
  | **Guardrail** | Has explicit `MUST` / `MUST NOT` constraints |
  | **Clarity** | Free of ambiguous words (`maybe`, `perhaps`, `consider`) |
 
+ ### `eval` — Did the agent follow the skill?
+
+ After a real session, evaluate whether the coding agent actually adhered to the skills it invoked.
+
+ ```bash
+ doraval eval # pick from recent sessions interactively
+ doraval eval --verbose
+ doraval judge ./skills/improve/ # evaluate latest session for one skill
+ doraval eval history
+ ```
+
+ `eval` uses an LLM judge (via your configured agent) to produce a per-skill `PASS`/`FAIL` with a dynamic checklist, user familiarity score, and closure information (1-shot vs multi-turn vs incomplete).
+
+ Requires `doraval init` first. See the [full docs](https://thehacksmith.dev/commands/eval/).
+
  ### `journal` — Decision memory
 
  Record and sync project principles so future you (and agents) don't accidentally contradict past choices.
@@ -177,6 +192,7 @@ doraval claude new # interactive wizard for skills/plugins (follows off
  doraval validate . --for claude --format json --ci
  doraval skill validate ./my-skill/ --format json --ci
  doraval skill drift ./my-skill/ --format json --ci
+ doraval eval --ci --format json
  ```
 
  Exits with code `1` when errors are found. Pipe `--format json` output to `jq` or consume programmatically.
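
The exit-code behaviour described in the README can be turned into a CI gate. The following is a minimal sketch, not part of the package: the `run_gate` helper, the `REPORT_FILE` variable, and the `doraval-eval.json` file name are hypothetical. Only the `doraval eval --ci --format json` invocation and the exit code `1` on errors come from the README above. The JSON schema is not shown in this diff, so the sketch saves the report without reading any fields from it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, save its stdout as a report file,
# and return the command's own exit code so the CI step fails with it.
run_gate() {
  # REPORT_FILE and its default name are assumptions made for this sketch.
  local report="${REPORT_FILE:-doraval-eval.json}"
  local status=0

  # Capture stdout (the JSON report) and remember a non-zero exit code.
  "$@" > "$report" || status=$?

  if [ "$status" -ne 0 ]; then
    echo "gate failed (exit $status); report saved to $report" >&2
  fi
  return "$status"
}

# Usage in a CI step (assumes doraval is installed and `doraval init` has run):
#   run_gate doraval eval --ci --format json
#   jq . doraval-eval.json   # pretty-print the saved report
```

Saving the report before checking the exit code keeps the JSON available as a build artifact even when the step fails.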