@sanity/ailf-studio 0.1.16 → 0.1.18

package/README.md CHANGED
@@ -57,6 +57,28 @@ This registers:
  - The `ailf.evalRequest` document type (evaluation request triggers)
  - The **AI Literacy Framework** dashboard tool in the Studio sidebar
 
+ #### Document Actions
+
+ The plugin registers two document actions for triggering evaluations directly
+ from Studio:
+
+ - **Run Task Eval** (on `ailf.task` documents) — evaluates a single task. Click
+ ▶ in the document actions menu to run all test cases for the task against the
+ current documentation. The button shows the score when complete (~10–15 min).
+ No secrets needed — it creates an `ailf.evalRequest` document that a
+ server-side webhook dispatches to the pipeline.
+
+ - **Run AI Eval** (on content releases) — evaluates all tasks affected by a
+ content release. Appears in the release detail page's action bar. Answers "did
+ my doc changes help or hurt AI agent performance?" Shows score and delta vs
+ baseline when complete.
+
+ Both actions use the same mechanism: they create an `ailf.evalRequest` document
+ in the Content Lake with `status: "pending"`. A server-side Sanity webhook picks
+ up the document and dispatches the pipeline via GitHub Actions. The Studio
+ component polls for the resulting report and updates the button label with the
+ score.
+
  ### 3. Alternative: tool-only installation
 
  If you only want the dashboard tool without the document schemas (e.g., the
@@ -103,6 +125,21 @@ export default defineConfig({
  })
  ```
 
+ ## Task Execution Workflows
+
+ Tasks created in Studio are automatically included in every pipeline run — no
+ registration step needed. There are four ways to execute tasks:
+
+ | Method | Trigger | Scope |
+ | ------------------------------ | ----------------------------------------------------------------- | ----------------------------- |
+ | **Run Task Eval** action | Click ▶ on any `ailf.task` document | Single task |
+ | **Run AI Eval** release action | Click button on a content release page | Tasks affected by the release |
+ | **CLI pipeline** | `ailf pipeline` (with optional `--area`/`--task`/`--tag` filters) | All enabled tasks |
+ | **Scheduled pipeline** | GitHub Actions cron (daily + weekly) | All enabled tasks |
+
+ See [CONTRIBUTING_TASKS.md](../../docs/CONTRIBUTING_TASKS.md#running-your-task)
+ for the full execution flow and details on each method.
+
  ## Dashboard Views
 
  The plugin provides three tab views plus a detail drill-down, accessible from
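
Reviewer sketch (not part of the package): the README hunks above describe both Studio actions as creating an `ailf.evalRequest` document with `status: "pending"` and then polling for a report. The block below is a minimal illustration of that request-then-poll shape, assuming an in-memory stand-in for the Content Lake. Only `_type` and `status` come from the README; `FakeContentLake`, `taskId`, `requestId`, `score`, and `buttonLabel` are hypothetical names, and the real plugin may structure this quite differently.

```typescript
// Only `_type: "ailf.evalRequest"` and `status: "pending"` are taken from the
// README; every other field and name below is an assumption for illustration.
interface EvalRequest {
  _id: string;
  _type: "ailf.evalRequest";
  status: "pending";
  taskId: string; // assumed field that scopes the request to one task
}

interface EvalReport {
  _id: string;
  requestId: string; // assumed back-reference to the originating request
  score: number;
}

// In-memory stand-in for the Content Lake so the sketch runs without a project.
class FakeContentLake {
  private requests: EvalRequest[] = [];
  private reports: EvalReport[] = [];

  // What a document action would do: write a pending request.
  createRequest(taskId: string): EvalRequest {
    const request: EvalRequest = {
      _id: `req-${this.requests.length + 1}`,
      _type: "ailf.evalRequest",
      status: "pending",
      taskId,
    };
    this.requests.push(request);
    return request;
  }

  // Stands in for the webhook and pipeline eventually writing a report back.
  completeRequest(requestId: string, score: number): void {
    this.reports.push({ _id: `report-${requestId}`, requestId, score });
  }

  findReport(requestId: string): EvalReport | undefined {
    return this.reports.find((report) => report.requestId === requestId);
  }
}

// One polling step: returns the label a button could show for this request.
function buttonLabel(lake: FakeContentLake, requestId: string): string {
  const report = lake.findReport(requestId);
  return report ? `Score: ${report.score}` : "Running";
}
```

In this sketch the Studio side never talks to the pipeline directly, which matches the README's "no secrets needed" claim: the only client-side operations are a document write and repeated reads.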
package/dist/index.d.ts CHANGED
@@ -436,10 +436,24 @@ declare const reportSchema: {
  * - A gold-standard implementation (reference solution)
  * - When/how the task runs (execution controls)
  *
+ * ## Execution paths
+ *
+ * Published tasks are automatically discovered by the pipeline — no
+ * registration step needed. There are four ways to execute a task:
+ *
+ * 1. **Run Task Eval** — click ▶ on any ailf.task document in Studio.
+ * Creates an ailf.evalRequest scoped to this task. Webhook dispatches
+ * the pipeline; score appears inline when complete (~10–15 min).
+ * 2. **Run AI Eval** — click on a content release page. Auto-scopes to
+ * tasks whose canonical docs are in the release.
+ * 3. **CLI** — `ailf pipeline --task <id>` or `ailf pipeline --area <area>`.
+ * 4. **Scheduled** — GitHub Actions cron (daily baseline, weekly full).
+ *
  * Tasks can be authored natively in Studio or mirrored from external
  * repositories. Mirrored tasks have a read-only `origin` block that
  * tracks their source repo provenance.
  *
+ * @see docs/CONTRIBUTING_TASKS.md#running-your-task — full execution guide
  * @see docs/design-docs/tasks-as-content.md
  * @see docs/design-docs/tasks-as-content.md#decision-8-domain-specific-assertion-types-not-a-promptfoo-subset
  */
@@ -574,6 +588,10 @@ interface ProvenanceData {
  sha: string;
  };
  graderModel: string;
+ lineage?: {
+ comparedAgainst?: string;
+ rerunOf?: string;
+ };
  mode: string;
  models: {
  id: string;
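
Reviewer sketch (not part of the package): the last hunk adds an optional `lineage` block to `ProvenanceData`, with both inner fields optional as well, so consumers need to guard every level. The example below shows one way that could look. It models only the `lineage` field from the diff; `ProvenanceLineage` and `describeLineage` are hypothetical names, and the diff does not say what the two strings contain (report identifiers seem likely, but that is an assumption).

```typescript
// Models only the `lineage` field shown in the diff; the rest of
// ProvenanceData is omitted. The interface name is hypothetical.
interface ProvenanceLineage {
  lineage?: {
    comparedAgainst?: string;
    rerunOf?: string;
  };
}

// Builds a human-readable summary, tolerating a missing block or missing fields.
function describeLineage(provenance: ProvenanceLineage): string {
  const parts: string[] = [];
  const lineage = provenance.lineage;
  if (lineage?.rerunOf) parts.push(`rerun of ${lineage.rerunOf}`);
  if (lineage?.comparedAgainst) parts.push(`compared against ${lineage.comparedAgainst}`);
  return parts.length > 0 ? parts.join(", ") : "no lineage recorded";
}
```

Because the field is optional, reports produced by 0.1.16 (which lack `lineage`) would still satisfy the 0.1.18 type.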