
@sanity/ailf-studio 0.1.16 → 0.1.18


Files changed (4)

package/README.md CHANGED

@@ -57,6 +57,28 @@ This registers:
 - The `ailf.evalRequest` document type (evaluation request triggers)
 - The **AI Literacy Framework** dashboard tool in the Studio sidebar
+#### Document Actions
+The plugin registers two document actions for triggering evaluations directly
+from Studio:
+- **Run Task Eval** (on `ailf.task` documents) — evaluates a single task. Click
+  ▶ in the document actions menu to run all test cases for the task against the
+  current documentation. The button shows the score when complete (~10–15 min).
+  No secrets needed — it creates an `ailf.evalRequest` document that a
+  server-side webhook dispatches to the pipeline.
+- **Run AI Eval** (on content releases) — evaluates all tasks affected by a
+  content release. Appears in the release detail page's action bar. Answers "did
+  my doc changes help or hurt AI agent performance?" Shows score and delta vs
+  baseline when complete.
+Both actions use the same mechanism: they create an `ailf.evalRequest` document
+in the Content Lake with `status: "pending"`. A server-side Sanity webhook picks
+up the document and dispatches the pipeline via GitHub Actions. The Studio
+component polls for the resulting report and updates the button label with the
+score.
 ### 3. Alternative: tool-only installation
 If you only want the dashboard tool without the document schemas (e.g., the
@@ -103,6 +125,21 @@ export default defineConfig({
 })
 ```
+## Task Execution Workflows
+Tasks created in Studio are automatically included in every pipeline run — no
+registration step needed. There are four ways to execute tasks:
+| Method                         | Trigger                                                           | Scope                         |
+| ------------------------------ | ----------------------------------------------------------------- | ----------------------------- |
+| **Run Task Eval** action       | Click ▶ on any `ailf.task` document                               | Single task                   |
+| **Run AI Eval** release action | Click button on a content release page                            | Tasks affected by the release |
+| **CLI pipeline**               | `ailf pipeline` (with optional `--area`/`--task`/`--tag` filters) | All enabled tasks             |
+| **Scheduled pipeline**         | GitHub Actions cron (daily + weekly)                              | All enabled tasks             |
+See [CONTRIBUTING_TASKS.md](../../docs/CONTRIBUTING_TASKS.md#running-your-task)
+for the full execution flow and details on each method.
 ## Dashboard Views
 The plugin provides three tab views plus a detail drill-down, accessible from
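
Reviewer note on the Document Actions hunk: the README describes a create-then-poll flow, in which an action writes an `ailf.evalRequest` document with `status: "pending"` and the Studio component then polls for a report and relabels the button. The sketch below illustrates that flow as plain functions and is not the plugin's implementation. Only `_type` and `status` come from the README; the `scope` and `requestedAt` fields, the `Report` shape, the label strings, and all function names are hypothetical.

```typescript
// Hypothetical scope type: the README only says a request targets a task or a release.
type EvalScope =
  | { kind: "task"; taskId: string }
  | { kind: "release"; releaseId: string };

interface EvalRequestDoc {
  _type: "ailf.evalRequest"; // from the README
  status: "pending"; // from the README
  scope: EvalScope; // assumed field name
  requestedAt: string; // assumed field name
}

// Build the document that a Studio action would write to the Content Lake.
function buildEvalRequest(scope: EvalScope, now: Date = new Date()): EvalRequestDoc {
  return {
    _type: "ailf.evalRequest",
    status: "pending",
    scope,
    requestedAt: now.toISOString(),
  };
}

// Assumed report shape: the README only says a score is shown.
interface Report {
  score: number;
}

// Poll an injected lookup until a report appears or the attempts run out.
async function pollForReport(
  fetchReport: () => Promise<Report | null>,
  maxAttempts: number,
  delayMs: number,
): Promise<Report | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const report = await fetchReport();
    if (report !== null) return report;
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  return null;
}

// Derive the button label from the polling result (label text is illustrative).
function buttonLabel(report: Report | null): string {
  return report === null ? "Evaluating..." : `Score: ${report.score}`;
}
```

Injecting `fetchReport` keeps the sketch independent of any Sanity client API. In a real Studio component, the lookup would query the Content Lake for the report linked to the request.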

package/dist/index.d.ts CHANGED

@@ -436,10 +436,24 @@ declare const reportSchema: {
  * - A gold-standard implementation (reference solution)
  * - When/how the task runs (execution controls)
  *
+ * ## Execution paths
+ *
+ * Published tasks are automatically discovered by the pipeline — no
+ * registration step needed. There are four ways to execute a task:
+ *
+ * 1. **Run Task Eval** — click ▶ on any ailf.task document in Studio.
+ *    Creates an ailf.evalRequest scoped to this task. Webhook dispatches
+ *    the pipeline; score appears inline when complete (~10–15 min).
+ * 2. **Run AI Eval** — click on a content release page. Auto-scopes to
+ *    tasks whose canonical docs are in the release.
+ * 3. **CLI** — `ailf pipeline --task <id>` or `ailf pipeline --area <area>`.
+ * 4. **Scheduled** — GitHub Actions cron (daily baseline, weekly full).
+ *
  * Tasks can be authored natively in Studio or mirrored from external
  * repositories. Mirrored tasks have a read-only `origin` block that
  * tracks their source repo provenance.
  *
+ * @see docs/CONTRIBUTING_TASKS.md#running-your-task — full execution guide
  * @see docs/design-docs/tasks-as-content.md
  * @see docs/design-docs/tasks-as-content.md#decision-8-domain-specific-assertion-types-not-a-promptfoo-subset
  */
@@ -574,6 +588,10 @@ interface ProvenanceData {
         sha: string;
     };
     graderModel: string;
+    lineage?: {
+        comparedAgainst?: string;
+        rerunOf?: string;
+    };
     mode: string;
     models: {
         id: string;
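
Reviewer note on the `ProvenanceData` hunk: the new `lineage` field is optional, and so are both of its members, so consumers should guard each access. The sketch below shows one way to read it safely. The `Lineage` shape is copied from the diff; the `HasLineage` helper type, the `describeLineage` function, and its output wording are hypothetical. The diff does not say what the two strings identify, so the example treats them as opaque identifiers.

```typescript
// Shape copied from the diff; both members are optional.
interface Lineage {
  comparedAgainst?: string;
  rerunOf?: string;
}

// Hypothetical helper type: the slice of ProvenanceData relevant to this example.
interface HasLineage {
  lineage?: Lineage;
}

// Summarize whatever lineage information is present, tolerating every missing level.
function describeLineage(provenance: HasLineage): string {
  const lineage: Lineage = provenance.lineage ?? {};
  const parts: string[] = [];
  if (lineage.rerunOf) parts.push(`rerun of ${lineage.rerunOf}`);
  if (lineage.comparedAgainst) parts.push(`compared against ${lineage.comparedAgainst}`);
  return parts.length > 0 ? parts.join(", ") : "no lineage recorded";
}
```

Because the field is optional, reports produced by 0.1.16 still satisfy the 0.1.18 type, and code like this treats them as having no lineage.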