ai-wiki-toolkit-linux-arm64 0.1.31 → 0.1.35

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,7 +1,7 @@
  # ai-wiki-toolkit-linux-arm64

  This package contains the `aiwiki-toolkit` executable for `linux-arm64-glibc`.
- It is published as the platform-specific binary package for `ai-wiki-toolkit` `0.1.31`.
+ It is published as the platform-specific binary package for `ai-wiki-toolkit` `0.1.35`.
  Most users should install `ai-wiki-toolkit` instead of using this package directly.

  ---
@@ -305,7 +305,7 @@ aiwiki-toolkit init
  - create a gitignored `.env.aiwiki` file for the current local actor identity
  - create starter indexes such as `ai-wiki/conventions/index.md`, `ai-wiki/review-patterns/index.md`, `ai-wiki/problems/index.md`, `ai-wiki/features/index.md`, `ai-wiki/trails/index.md`, `ai-wiki/work/index.md`, and `ai-wiki/people/<handle>/index.md`
  - create `ai-wiki/conventions/`, `ai-wiki/review-patterns/`, `ai-wiki/problems/`, `ai-wiki/features/`, `ai-wiki/work/`, `ai-wiki/people/<handle>/drafts/`, `ai-wiki/metrics/`, and repo/home `_toolkit/`
- - generate package-managed `_toolkit/index.md`, `_toolkit/workflows.md`, `_toolkit/catalog.json`, `_toolkit/schema/reuse-v1.md`, `_toolkit/schema/team-memory-v1.md`, `_toolkit/schema/work-v1.md`, `_toolkit/metrics/*.json`, and `_toolkit/work/*`
+ - generate package-managed `_toolkit/index.md`, `_toolkit/workflows.md`, `_toolkit/catalog.json`, `_toolkit/schema/reuse-v1.md`, `_toolkit/schema/team-memory-v1.md`, `_toolkit/schema/work-v1.md`, `_toolkit/metrics/*`, and `_toolkit/work/*`
  - upsert a managed `.gitignore` block that ignores `.env.aiwiki`, AI wiki telemetry, and generated aggregate snapshots so routine agent use does not dirty `git status`
  - create or refresh package-owned `.agents/skills/ai-wiki-reuse-check/`, `.agents/skills/ai-wiki-update-check/`, `.agents/skills/ai-wiki-clarify-before-code/`, `.agents/skills/ai-wiki-capture-review-learning/`, and `.agents/skills/ai-wiki-consolidate-drafts/`
  - update `AGENT.md`, `AGENTS.md`, and/or `CLAUDE.md` with a short managed instruction block that points agents to `ai-wiki/_toolkit/system.md` when the repo contains `ai-wiki/`
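The managed `.gitignore` upsert listed above can be pictured as a marker-delimited block that is replaced in place on each `install` run and appended when missing. The Python sketch below is illustrative only: the marker comments and the ignore entries are hypothetical stand-ins for the categories the README names, not the toolkit's actual output.

```python
# Hypothetical marker comments; the toolkit's real markers are not shown in this README.
BEGIN = "# >>> ai-wiki-toolkit managed block >>>"
END = "# <<< ai-wiki-toolkit managed block <<<"

# Illustrative entries for the categories the README names: the local identity
# file, telemetry shards, and generated aggregate snapshots (paths are assumptions).
ENTRIES = [
    ".env.aiwiki",
    "ai-wiki/metrics/reuse-events/",
    "ai-wiki/metrics/task-checks/",
    "ai-wiki/_toolkit/metrics/",
]


def upsert_managed_block(text: str, entries: list) -> str:
    """Replace the marker-delimited block in `text`, or append it if absent.

    Lines outside the markers are never touched, so running this twice
    produces the same file (an idempotent upsert).
    """
    block = [BEGIN, *entries, END]
    lines = text.splitlines()
    if BEGIN in lines and END in lines:
        start, stop = lines.index(BEGIN), lines.index(END)
        lines[start:stop + 1] = block  # swap the old block for the new one
    else:
        lines.extend(block)  # first install: append a fresh block
    return "\n".join(lines) + "\n"
```

Keeping user-written lines outside the markers is what lets a re-run refresh the block without clobbering a repo's own ignore rules.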
@@ -338,7 +338,7 @@ If package-owned repo-local skill files already exist under `.agents/skills/ai-w

  `init` remains as a backward-compatible alias for `install`. The actual scaffold creation does not happen at package install time; it happens when you run `aiwiki-toolkit install` or `aiwiki-toolkit init` inside a git repository.

- To append one explicit knowledge-reuse observation and refresh managed aggregates:
+ To append one explicit knowledge-reuse observation and refresh handle-scoped managed metrics:

  ```bash
  aiwiki-toolkit record-reuse \
@@ -352,7 +352,9 @@ aiwiki-toolkit record-reuse \
  --saved-seconds 45
  ```

- This appends to the user-owned `ai-wiki/metrics/reuse-events/<handle>.jsonl` shard and refreshes the package-managed aggregate views under `ai-wiki/_toolkit/metrics/`. The installer ignores both the shard and the generated aggregate views by default so these telemetry updates stay local.
+ This appends to the user-owned `ai-wiki/metrics/reuse-events/<handle>.jsonl` shard and refreshes the handle-scoped generated views under `ai-wiki/_toolkit/metrics/by-handle/<handle>/`. The installer ignores both the shard and the generated aggregate views by default so these telemetry updates stay local.
+
+ For post-task diagnosis, reuse events can also carry optional provenance such as `--session-id`, `--source-session-id`, `--source-task-id`, `--consulted-order`, `--signal-status candidate`, `--not-helpful-reason superseded_by_later_doc`, `--resolved-by-doc-id`, and `--superseded-by-doc-id`. Candidate not-helpful signals are review hints; confirmed outcomes still require explicit human or agent judgment.

  Only record user-owned AI wiki knowledge docs with `record-reuse`.

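Appending to a handle shard amounts to writing one JSON object per line to `ai-wiki/metrics/reuse-events/<handle>.jsonl`. One plausible reason for sharding by handle is that each actor then only ever appends to its own file. The Python sketch below shows the append side under assumptions: the event field names (`doc_id`, `saved_seconds`, `session_id`, `signal_status`) are hypothetical, loosely mirroring the CLI flags, and are not the toolkit's documented schema.

```python
import json
import tempfile
from pathlib import Path


def append_reuse_event(repo: Path, handle: str, event: dict) -> Path:
    """Append one event as a single JSON line to the handle's own shard."""
    shard = repo / "ai-wiki" / "metrics" / "reuse-events" / f"{handle}.jsonl"
    shard.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds lines, so earlier events are never rewritten.
    with shard.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return shard


# Usage against a throwaway directory; field names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    shard = append_reuse_event(repo, "your-handle", {
        "doc_id": "problems/retry-loop",
        "saved_seconds": 45,
        "session_id": "sess-001",
    })
    append_reuse_event(repo, "your-handle", {
        "doc_id": "conventions/naming",
        "signal_status": "candidate",
    })
    lines = shard.read_text(encoding="utf-8").splitlines()
    shard_rel = shard.relative_to(repo).as_posix()
```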
@@ -401,7 +403,7 @@ aiwiki-toolkit record-reuse-check \
  --check-outcome wiki_used
  ```

- This appends to the user-owned `ai-wiki/metrics/task-checks/<handle>.jsonl` shard and refreshes the package-managed aggregate views under `ai-wiki/_toolkit/metrics/`. The installer ignores both the shard and the generated aggregate views by default so these telemetry updates stay local.
+ This appends to the user-owned `ai-wiki/metrics/task-checks/<handle>.jsonl` shard and refreshes the handle-scoped generated views under `ai-wiki/_toolkit/metrics/by-handle/<handle>/`. The installer ignores both the shard and the generated aggregate views by default so these telemetry updates stay local.

  Both metrics logs are sharded by handle under:

@@ -410,7 +412,7 @@ Both metrics logs are sharded by handle under:

  These logs are intended as local telemetry by default, not merge-heavy source files.

- If you need a fresh local telemetry and work snapshot, regenerate package-managed aggregate views such as `ai-wiki/_toolkit/catalog.json`, `ai-wiki/_toolkit/metrics/*.json`, or `ai-wiki/_toolkit/work/*` with:
+ If you need a fresh local telemetry and work snapshot, regenerate package-managed aggregate views such as `ai-wiki/_toolkit/catalog.json`, `ai-wiki/_toolkit/metrics/*`, or `ai-wiki/_toolkit/work/*` with:

  ```bash
  aiwiki-toolkit refresh-metrics
@@ -424,7 +426,7 @@ aiwiki-toolkit diagnose memory --since 14d --handle your-handle
  aiwiki-toolkit diagnose memory --focus trial-error
  ```

- This writes regenerated local reports under `ai-wiki/_toolkit/diagnostics/` and prints the report to stdout. The report highlights high-ROI memory, noisy memory, stale or missing docs, conflict notes, missed-memory signals, and coverage gaps such as document reuse events that were never paired with a task-level reuse check. It does not edit user-owned AI wiki docs.
+ This writes regenerated local reports under `ai-wiki/_toolkit/diagnostics/<handle-or-all>/` and prints the report to stdout. The report highlights high-ROI memory, noisy memory, stale or missing docs, conflict notes, missed-memory signals, and coverage gaps such as document reuse events that were never paired with a task-level reuse check. It does not edit user-owned AI wiki docs.

  Use `--focus trial-error` to generate a focused trial/error reduction report from existing
  AI wiki evidence. It summarizes material effects such as `avoided_retry`,
@@ -439,25 +441,142 @@ aiwiki-toolkit consolidate queue
  aiwiki-toolkit consolidate queue --since 14d --handle your-handle
  ```

- This writes regenerated local reports under `ai-wiki/_toolkit/consolidation/` and prints the queue to stdout. The queue suggests one action per draft cluster: keep, refine, promotion candidate, conflict, or supersession. It does not edit user-owned AI wiki docs or create shared conventions, review patterns, problems, features, or decisions; those still require human confirmation.
+ This writes regenerated local reports under `ai-wiki/_toolkit/consolidation/<handle>/` and prints the queue to stdout. The queue suggests one action per draft cluster: keep, refine, promotion candidate, conflict, or supersession. It does not edit user-owned AI wiki docs or create shared conventions, review patterns, problems, features, or decisions; those still require human confirmation.
+
+ To mark handle-local draft promotion candidates from confirmed-useful reuse evidence:
+
+ ```bash
+ aiwiki-toolkit promote candidates --handle your-handle
+ aiwiki-toolkit promote candidates --handle your-handle --apply
+ ```
+
+ The default run is report-only. With `--apply`, a draft is marked only when it has more than three distinct resolved task IDs, no `not_helpful` reuse events, and an existing non-stale source draft. The command refreshes stable links in `ai-wiki/people/<handle>/index.md`; exact reuse counts stay in generated reports under `ai-wiki/_toolkit/reports/promotion-candidates/<handle>/`.
+
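The `--apply` gate described above combines three conditions. The Python sketch below restates them as a predicate, assuming a hypothetical event shape `{"doc_id", "task_id", "status"}` where `status` is `"resolved"` or `"not_helpful"`; whether the draft exists and is stale is passed in by the caller, since the README does not say how staleness is computed.

```python
def is_promotion_candidate(doc_id, events, draft_exists, draft_is_stale):
    """Return True when a draft passes the documented --apply gate.

    Hypothetical event shape: {"doc_id": ..., "task_id": ..., "status": ...};
    statuses other than "resolved" and "not_helpful" are ignored here.
    """
    if not draft_exists or draft_is_stale:
        return False  # the source draft must exist and must not be stale
    resolved_tasks = set()
    for event in events:
        if event.get("doc_id") != doc_id:
            continue
        if event.get("status") == "not_helpful":
            return False  # any not-helpful reuse event blocks promotion
        if event.get("status") == "resolved" and event.get("task_id"):
            resolved_tasks.add(event["task_id"])
    # "More than three distinct resolved task IDs" means at least four;
    # repeated events for the same task count once because of the set.
    return len(resolved_tasks) > 3
```

Counting distinct task IDs rather than raw events keeps one heavily repeated task from pushing a draft over the threshold.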
+ To inspect referenced files and estimated time impact from local reuse evidence:
+
+ ```bash
+ aiwiki-toolkit report usefulness --handle your-handle
+ aiwiki-toolkit report usefulness --handle your-handle --format json
+ ```
+
+ This writes `ai-wiki/_toolkit/reports/usefulness/<handle-or-all>/latest.md` and `.json`. It lists referenced files and sums resolved-event `estimated_savings`. Baseline/current/remaining durations are reported as `unknown` until task logs include explicit timing evidence.
+
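The usefulness arithmetic is a set of referenced files plus a sum over resolved events only. The Python sketch below illustrates that under assumptions: the keys `doc_path`, `status`, and `estimated_savings` are hypothetical, and the savings unit is assumed to be seconds by analogy with `record-reuse --saved-seconds`.

```python
def summarize_usefulness(events):
    """Summarize referenced files and resolved-event savings.

    Hypothetical event keys: "doc_path", "status", "estimated_savings"
    (assumed to be seconds, like record-reuse --saved-seconds).
    """
    referenced = sorted({e["doc_path"] for e in events if e.get("doc_path")})
    # Only resolved events contribute to the savings total.
    saved = sum(
        e.get("estimated_savings", 0) for e in events if e.get("status") == "resolved"
    )
    return {
        "referenced_files": referenced,
        "estimated_savings_total": saved,
        # Without explicit timing evidence in task logs, durations stay
        # "unknown" instead of being back-computed from the savings estimate.
        "baseline_duration": "unknown",
        "current_duration": "unknown",
        "remaining_duration": "unknown",
    }
```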
+ To generate a weekly local HTML review queue:
+
+ ```bash
+ aiwiki-toolkit report weekly --handle your-handle
+ aiwiki-toolkit report weekly --handle your-handle --if-due
+ ```
+
+ This writes a static HTML review queue and JSON payload under `ai-wiki/_toolkit/reports/weekly/<handle>/<iso-week>/`, refreshes `latest.html` and `latest.json`, and records the last generated period in `ai-wiki/_toolkit/reports/weekly/<handle>/state.json`. The HTML page focuses only on items that need human judgment: promotion candidates, personal drafts that may need diagnosis, and not-helpful signals. Coverage, referenced-file, and other raw evidence remains in the JSON payload and supporting reports; saved-time estimates belong in impact-eval reports, not the weekly HTML view. Use `--if-due` from cron, launchd, or an agent workflow so the same ISO week is generated once; use `--force` for local testing before a release.

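The `--if-due` behaviour reduces to comparing the current ISO week with the period recorded in `state.json`. The Python sketch below assumes a `YYYY-Www` label format and a `last_period` key in the state file; both are hypothetical, as the README does not show the real format.

```python
import datetime as dt
import json
import tempfile
from pathlib import Path


def iso_week_label(day: dt.date) -> str:
    """Label a date by ISO year and week, e.g. '2025-W01' (format is an assumption)."""
    year, week, _ = day.isocalendar()
    return f"{year}-W{week:02d}"


def is_due(state_file: Path, today: dt.date, force: bool = False) -> bool:
    """--if-due style check: generate only if this ISO week is not yet recorded."""
    if force or not state_file.exists():
        return True
    state = json.loads(state_file.read_text(encoding="utf-8"))
    return state.get("last_period") != iso_week_label(today)


def record_period(state_file: Path, today: dt.date) -> None:
    """Remember the generated period (the "last_period" key is hypothetical)."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps({"last_period": iso_week_label(today)}), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "ai-wiki/_toolkit/reports/weekly/your-handle/state.json"
    monday = dt.date(2024, 12, 30)  # belongs to ISO week 1 of 2025
    first_run = is_due(state, monday)
    record_period(state, monday)
    same_week = is_due(state, monday + dt.timedelta(days=4))
    next_week = is_due(state, monday + dt.timedelta(days=7))
    forced = is_due(state, monday, force=True)
```

The sketch uses `date.isocalendar()` rather than the calendar year because dates near New Year can belong to the neighbouring ISO year: 2024-12-30 falls in 2025-W01.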
  To summarize first-attempt product impact from a captured eval run:

  ```bash
+ aiwiki-toolkit eval impact families
+ aiwiki-toolkit eval impact families --format json
+ aiwiki-toolkit eval impact discover
+ aiwiki-toolkit eval impact family show ownership_boundary
+ aiwiki-toolkit eval impact family candidates
+ aiwiki-toolkit eval impact family init --name retry_loop --from-candidate problems/retry-loop --baseline-ref HEAD^
+ aiwiki-toolkit eval impact family draft --candidate problems_retry_loop --baseline-ref HEAD^
+ aiwiki-toolkit eval impact family promote --candidate problems_retry_loop
+ aiwiki-toolkit eval impact family promote --candidate problems_retry_loop --apply
+ aiwiki-toolkit eval impact plan --family ownership_boundary
+ aiwiki-toolkit eval impact plan --family ownership_boundary --format json
+ aiwiki-toolkit eval impact prepare --family ownership_boundary
+ aiwiki-toolkit eval impact prepare --family ownership_boundary --format json
+ aiwiki-toolkit eval impact run --run-dir /path/to/eval-run --slot s01
+ aiwiki-toolkit eval impact run --run-dir /path/to/eval-run --all-slots --score-policy command-exit
+ aiwiki-toolkit eval impact run --run-dir /path/to/eval-run --all-slots --score-policy rubric --rubric evals/impact/rubrics/my-family.json
+ aiwiki-toolkit eval impact benchmark --family ownership_boundary --score-policy command-exit
+ aiwiki-toolkit eval impact schedule report --handle your-handle --candidate-max-items 25
+ aiwiki-toolkit eval impact schedule run --family ownership_boundary --score-policy command-exit
+ aiwiki-toolkit eval impact schedule run --all-runnable --if-due --score-policy rubric
+ aiwiki-toolkit eval impact capture --run-dir /path/to/eval-run --slot s01 --prompt-level original --first-pass-success
+ aiwiki-toolkit eval impact validate --run-dir /path/to/eval-run
+ aiwiki-toolkit eval impact score --run-dir /path/to/eval-run --slot s01 --prompt-level original --label success
+ aiwiki-toolkit eval impact manifest --run-dir /path/to/eval-run
+ aiwiki-toolkit eval impact manifest --run-dir /path/to/eval-run --format json
  aiwiki-toolkit eval impact report --run-dir /path/to/eval-run
  aiwiki-toolkit eval impact report --run-dir /path/to/eval-run --format json
  aiwiki-toolkit eval impact summarize --run-dir /path/to/eval-run --run-dir /path/to/another-run
  aiwiki-toolkit eval impact summarize --runs-file evals/impact/runs.json
  ```

- This reads an existing run directory with `metadata.json`, result captures, optional
- `score.json` files, and optional `confounds.json`. It compares the run's primary variants,
- normally `no_aiwiki_workflow` versus `aiwiki_ambient_memory_workflow`, using first-attempt
- metrics only: `first_pass` captures count toward the signal, while `final` repair captures
- stay diagnostic. The command reports first-attempt success rate, average score, attempts, human
- nudges, changed files, untracked files, change-profile splits for project files versus AI wiki
- telemetry and user-owned wiki churn, and whether the run is ready for shareable causal claims.
- It does not run agents or mutate eval artifacts.
+ Use `eval impact families` before running benchmarks. It discovers registered families from
+ `evals/impact/families/*/spec.toml`, reports readiness, prompt and rubric presence, memory fixture
+ counts, baseline refs, historical issues, and next commands. Use `eval impact family show <name>`
+ for one family.
+
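Family discovery can be approximated as a glob over `evals/impact/families/*/spec.toml`, where each matching directory name is a family. The Python sketch below is an illustration only: it does not parse the TOML, and the rubric location `evals/impact/rubrics/<family>.json` is an assumption extrapolated from the `--rubric evals/impact/rubrics/my-family.json` example command.

```python
import tempfile
from pathlib import Path


def discover_families(repo: Path) -> dict:
    """Map each family directory that has a spec.toml to a small readiness record."""
    root = repo / "evals" / "impact"
    families = {}
    for spec in sorted(root.glob("families/*/spec.toml")):
        name = spec.parent.name  # the directory name is the family name
        families[name] = {
            "spec": spec.relative_to(repo).as_posix(),
            # Assumed rubric location, mirroring the --rubric example path.
            "has_rubric": (root / "rubrics" / f"{name}.json").is_file(),
        }
    return families


# Usage: build a tiny fake layout in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    for family in ("ownership_boundary", "retry_loop"):
        spec = repo / "evals/impact/families" / family / "spec.toml"
        spec.parent.mkdir(parents=True)
        spec.write_text("# placeholder\n", encoding="utf-8")
    (repo / "evals/impact/rubrics").mkdir(parents=True)
    (repo / "evals/impact/rubrics/retry_loop.json").write_text("{}", encoding="utf-8")
    (repo / "evals/impact/families/notes").mkdir()  # no spec.toml, so not a family
    found = discover_families(repo)
```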
512
+ Use `eval impact family candidates` to expose trial/error replay candidates from existing AI wiki
513
+ telemetry. It layers over `diagnose memory --focus trial-error` and reports candidate readiness
514
+ without writing user-owned AI wiki docs. Use `eval impact family init --from-candidate ...` only
515
+ after confirming a source incident, baseline ref, prompt shape, and rubric direction; it creates a
516
+ draft family scaffold under `evals/impact/`.
517
+
518
+ Use `eval impact discover` for the continuous loop. It refreshes the managed candidate queue under
519
+ `ai-wiki/_toolkit/evals/candidates/`, preserves first-seen/last-seen/seen-count state, and prints
520
+ the next draft, promotion, and schedule commands. Use `eval impact family draft` to create managed
521
+ candidate files under `ai-wiki/_toolkit/evals/drafts/<candidate>/` without registering a formal
522
+ family. Use `eval impact family promote` as a report-only gate; add `--apply` only after the draft
523
+ has a real baseline ref, prompt, and rubric and you want to write formal files under
524
+ `evals/impact/`.
525
+
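Preserving first-seen/last-seen/seen-count state across discovery passes is a merge of the previous queue with the candidates seen now. The Python sketch below assumes a plain dict keyed by candidate ID and ISO date strings for timestamps; it also assumes candidates missing from a pass are kept unchanged, which the README does not specify.

```python
def refresh_candidate_queue(state, seen_ids, now):
    """Merge one discovery pass into the queue state without mutating the input.

    Entries not seen in this pass are kept as-is (an assumption of this sketch).
    """
    updated = {cid: dict(entry) for cid, entry in state.items()}
    for cid in sorted(set(seen_ids)):  # count each candidate once per pass
        entry = updated.get(cid)
        if entry is None:
            # New candidate: first and last sighting are both this pass.
            updated[cid] = {"first_seen": now, "last_seen": now, "seen_count": 1}
        else:
            # Known candidate: keep first_seen, bump last_seen and the count.
            entry["last_seen"] = now
            entry["seen_count"] += 1
    return updated
```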
+ Use `eval impact plan` to inspect the next run before creating workspaces or invoking agents. It
+ reads `evals/impact/families/<family>/spec.toml` and prompt files, then reports the planned
+ baseline ref, prompt hashes, workflow-primary variants, output paths, and script commands. The plan
+ command does not mutate eval artifacts or call an agent.
+
+ Use `eval impact prepare` to execute the planned setup only: it creates neutral slot workspaces,
+ creates the run directory and metadata, and writes initial `manifest.json` and `manifest.md` files.
+ It still does not call an agent.
+
+ Use `eval impact run` to invoke Codex CLI against one neutral slot or all slots in an already
+ prepared run. The command calls the repo-local slot runner, captures first-pass artifacts,
+ optionally exports visible Codex sessions, validates confounds, applies an explicit score policy,
+ and writes a report bundle under `<run-dir>/report_bundle/`. The default score policy is `none`.
+ `--score-policy command-exit` is useful for smoke tests and execution-health automation, but it
+ only scores Codex/save-result command completion; use manual or semantic scoring before making
+ research-quality correctness claims.
+ `--score-policy rubric` reads an `impact-eval-rubric-v1` JSON file, writes
+ `rubric_judgment.json` next to each slot score, then writes the normal `score.json` artifact.
+ Rubric criteria can inspect captured diffs, final messages, result fields, changed files, and
+ untracked files.
+
+ Use `eval impact benchmark` when you want one command to prepare a family and immediately run all
+ slots. It wraps `prepare` plus `run`, then returns the prepared run directory, run result, validation
+ status, scores, and report bundle.
+
+ Use `eval impact schedule report` to generate a periodic benchmark dashboard under
+ `ai-wiki/_toolkit/evals/reports/<period>/`. It combines registered families, the managed candidate
+ queue, and the run index. Pass the same candidate filters you use for discovery, such as `--handle`,
+ `--since`, and `--candidate-max-items`, so the scheduled report does not accidentally stale a
+ larger queue with a narrower refresh. Use `eval impact schedule run --family <name>` or
+ `--all-runnable` to run benchmarks, append `ai-wiki/_toolkit/evals/runs/index.json`, refresh the
+ report, and record `ai-wiki/_toolkit/evals/schedule/state.json`. `--if-due` is intended for cron,
+ launchd, or an agent workflow that should run at most once per period.
+
+ Use `eval impact capture` after a manual first pass or repaired pass to save `result.json`, the
+ workspace diff, status, head, and optional final-message artifact. It infers slot variant and
+ workspace from `metadata.json` when possible. Use `eval impact validate` after exporting visible
+ sessions to write `confounds.json`; missing exports are reported as critical confounds rather than
+ silently accepted. Use `eval impact score` to write the manual `score.json` artifact for a slot.
+ Each of these commands refreshes `manifest.json` and `manifest.md` so the run inventory stays
+ current.
+
+ The report and manifest commands read an existing run directory with `metadata.json`, result
+ captures, optional `score.json` files, and optional `confounds.json`. The `eval impact report`
+ command compares the run's primary variants, normally `no_aiwiki_workflow` versus
+ `aiwiki_ambient_memory_workflow`, using first-attempt metrics only: `first_pass` captures count
+ toward the signal, while `final` repair captures stay diagnostic. The command reports
+ first-attempt success rate, average score, attempts, human nudges, changed files, untracked files,
+ change-profile splits for project files versus AI wiki telemetry and user-owned wiki churn, and
+ whether the run is ready for shareable causal claims. It does not run agents.
+
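The first-attempt rule above means `final` repair captures are excluded before any rate is computed. The Python sketch below shows that filtering for the two primary variants named in the README; the capture shape `{"variant", "phase", "success"}` is hypothetical and stands in for whatever `result.json` and `score.json` actually contain.

```python
from collections import defaultdict

# The two primary variants named in the README.
PRIMARY_VARIANTS = ("no_aiwiki_workflow", "aiwiki_ambient_memory_workflow")


def first_attempt_success_rates(captures):
    """Success rate per primary variant, counting only first_pass captures.

    Hypothetical capture shape: {"variant": ..., "phase": ..., "success": bool}.
    A variant with no first-pass captures gets None instead of a rate.
    """
    counts = defaultdict(lambda: [0, 0])  # variant -> [successes, attempts]
    for capture in captures:
        if capture["phase"] != "first_pass":
            continue  # `final` repair captures stay diagnostic
        tally = counts[capture["variant"]]
        tally[0] += int(bool(capture["success"]))
        tally[1] += 1
    rates = {}
    for variant in PRIMARY_VARIANTS:
        successes, attempts = counts[variant]
        rates[variant] = successes / attempts if attempts else None
    return rates
```

Dropping repaired passes keeps a successful manual fix from inflating the baseline variant's score.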
+ Use `eval impact manifest` to audit run identity before interpreting scores. It reports the
+ baseline ref, prompt hashes, model, reasoning effort, execution surface, slot-to-variant mapping,
+ session export presence, confounds, and captured artifact paths.

  Use `eval impact summarize` to aggregate multiple captured runs into a product-level dashboard.
  It reports each family's primary outcome, product signal, shareability, success and score deltas,
@@ -516,7 +635,7 @@ Even with `--purge-user-docs --yes`, the shared home wiki under `~/ai-wiki/syste
  - `ai-wiki/index.md` is a repo-owned map and is not treated as a starter-drift upgrade target by `doctor`.
  - `ai-wiki/workflows.md` remains user-owned; package-managed workflow updates land in `ai-wiki/_toolkit/workflows.md` instead of rewriting the repo-owned file.
  - `.env.aiwiki` stores the current local actor identity in a managed block. It is gitignored and should not be committed.
- - `ai-wiki/metrics/reuse-events/<handle>.jsonl` and `ai-wiki/metrics/task-checks/<handle>.jsonl` are user-owned evidence data. `ai-wiki/work/events/<handle>.jsonl` is user-owned work state. Package-managed aggregate views are regenerated under `ai-wiki/_toolkit/metrics/` and `ai-wiki/_toolkit/work/`; memory diagnostics are generated under `ai-wiki/_toolkit/diagnostics/`; consolidation queues are generated under `ai-wiki/_toolkit/consolidation/`. The installer ignores those generated paths by default in `.gitignore`.
+ - `ai-wiki/metrics/reuse-events/<handle>.jsonl` and `ai-wiki/metrics/task-checks/<handle>.jsonl` are user-owned evidence data. `ai-wiki/work/events/<handle>.jsonl` is user-owned work state. Package-managed aggregate views are regenerated under `ai-wiki/_toolkit/metrics/` and `ai-wiki/_toolkit/work/`; handle-scoped metrics are regenerated under `ai-wiki/_toolkit/metrics/by-handle/<handle>/`; memory diagnostics, consolidation queues, promotion reports, usefulness reports, and weekly reports are written under handle-scoped generated paths where they depend on a handle. The installer ignores those generated paths by default in `.gitignore`.
  - Legacy flat files such as `ai-wiki/metrics/reuse-events.jsonl` and `ai-wiki/metrics/task-checks.jsonl` are still read for compatibility, but new writes should use the handle-sharded layout.
  - `aiwiki-toolkit doctor --suggest-index-upgrade` prints suggested replacements for missing repo starter docs and repo-owned companion docs such as `ai-wiki/workflows.md`, but it does not overwrite them automatically.
  - Package-owned `.agents/skills/ai-wiki-reuse-check/**`, `.agents/skills/ai-wiki-update-check/**`, `.agents/skills/ai-wiki-clarify-before-code/**`, `.agents/skills/ai-wiki-capture-review-learning/**`, and `.agents/skills/ai-wiki-consolidate-drafts/**` are refreshed by `install` so package workflow updates reach existing repos.
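The legacy-compatibility note above implies a reader that accepts both layouts while writers use only the sharded one. The Python sketch below shows the read side for reuse events, assuming one JSON object per non-blank line; the order in which the legacy file and shards are read is a choice of this sketch, not documented toolkit behaviour.

```python
import json
import tempfile
from pathlib import Path


def load_reuse_events(repo: Path) -> list:
    """Read the legacy flat log (if present) plus every per-handle shard."""
    metrics = repo / "ai-wiki" / "metrics"
    sources = []
    legacy = metrics / "reuse-events.jsonl"
    if legacy.is_file():
        sources.append(legacy)  # read-only compatibility path
    shard_dir = metrics / "reuse-events"
    if shard_dir.is_dir():
        sources.extend(sorted(shard_dir.glob("*.jsonl")))  # current sharded layout
    events = []
    for path in sources:
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():  # tolerate blank lines
                events.append(json.loads(line))
    return events


# Usage: one legacy event and one sharded event in a throwaway repo.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    metrics_dir = repo / "ai-wiki" / "metrics"
    (metrics_dir / "reuse-events").mkdir(parents=True)
    (metrics_dir / "reuse-events.jsonl").write_text('{"doc_id": "legacy-doc"}\n', encoding="utf-8")
    (metrics_dir / "reuse-events" / "your-handle.jsonl").write_text(
        '{"doc_id": "sharded-doc"}\n\n', encoding="utf-8"
    )
    events = load_reuse_events(repo)
```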
Binary file
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "ai-wiki-toolkit-linux-arm64",
- "version": "0.1.31",
+ "version": "0.1.35",
  "description": "Platform binary package for ai-wiki-toolkit (linux-arm64-glibc).",
  "license": "MIT",
  "homepage": "https://github.com/BochengYin/ai-wiki-toolkit#readme",