@sanity/ailf-studio 0.1.17 → 0.1.18
- package/README.md +37 -0
- package/dist/index.d.ts +14 -0
- package/dist/index.js +20 -0
- package/package.json +1 -1
package/README.md
CHANGED
@@ -57,6 +57,28 @@ This registers:
 - The `ailf.evalRequest` document type (evaluation request triggers)
 - The **AI Literacy Framework** dashboard tool in the Studio sidebar
 
+#### Document Actions
+
+The plugin registers two document actions for triggering evaluations directly
+from Studio:
+
+- **Run Task Eval** (on `ailf.task` documents) — evaluates a single task. Click
+  ▶ in the document actions menu to run all test cases for the task against the
+  current documentation. The button shows the score when complete (~10–15 min).
+  No secrets needed — it creates an `ailf.evalRequest` document that a
+  server-side webhook dispatches to the pipeline.
+
+- **Run AI Eval** (on content releases) — evaluates all tasks affected by a
+  content release. Appears in the release detail page's action bar. Answers "did
+  my doc changes help or hurt AI agent performance?" Shows score and delta vs
+  baseline when complete.
+
+Both actions use the same mechanism: they create an `ailf.evalRequest` document
+in the Content Lake with `status: "pending"`. A server-side Sanity webhook picks
+up the document and dispatches the pipeline via GitHub Actions. The Studio
+component polls for the resulting report and updates the button label with the
+score.
+
 ### 3. Alternative: tool-only installation
 
 If you only want the dashboard tool without the document schemas (e.g., the
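The request flow described in this hunk can be sketched as follows. This is a minimal illustration, not code from the package: only the `ailf.evalRequest` type name and `status: "pending"` come from the README text, while `EvalScope`, `scope`, `requestedAt`, `EvalReport`, `baselineScore`, and both function names are hypothetical. The score is formatted to one decimal place, which is also an assumption.

```typescript
// Hypothetical scope shape: a single task or a content release.
type EvalScope =
  | { kind: "task"; taskId: string }
  | { kind: "release"; releaseId: string };

// Only `_type` and `status` are taken from the README; the rest is assumed.
interface EvalRequestDraft {
  _type: "ailf.evalRequest";
  status: "pending";
  scope: EvalScope;
  requestedAt: string;
}

// Build the document a Studio action would create in the Content Lake.
function buildEvalRequest(scope: EvalScope, now: Date = new Date()): EvalRequestDraft {
  return {
    _type: "ailf.evalRequest",
    status: "pending",
    scope,
    requestedAt: now.toISOString(),
  };
}

// Hypothetical report shape returned by polling; null means "not ready yet".
interface EvalReport {
  score: number;
  baselineScore?: number;
}

// Derive the button label: plain score for a task eval, score plus delta
// against the baseline when a baseline is available.
function buttonLabel(base: string, report: EvalReport | null): string {
  if (report === null) return `${base} (running)`;
  const score = report.score.toFixed(1);
  if (report.baselineScore === undefined) return `${base}: ${score}`;
  const delta = report.score - report.baselineScore;
  const sign = delta >= 0 ? "+" : "";
  return `${base}: ${score} (${sign}${delta.toFixed(1)} vs baseline)`;
}
```

Keeping the label logic a pure function of the polled report makes it testable without a Studio or a network connection; the polling itself would wrap whatever client the Studio component already uses.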
@@ -103,6 +125,21 @@ export default defineConfig({
 })
 ```
 
+## Task Execution Workflows
+
+Tasks created in Studio are automatically included in every pipeline run — no
+registration step needed. There are four ways to execute tasks:
+
+| Method                         | Trigger                                                           | Scope                         |
+| ------------------------------ | ----------------------------------------------------------------- | ----------------------------- |
+| **Run Task Eval** action       | Click ▶ on any `ailf.task` document                               | Single task                   |
+| **Run AI Eval** release action | Click button on a content release page                            | Tasks affected by the release |
+| **CLI pipeline**               | `ailf pipeline` (with optional `--area`/`--task`/`--tag` filters) | All enabled tasks             |
+| **Scheduled pipeline**         | GitHub Actions cron (daily + weekly)                              | All enabled tasks             |
+
+See [CONTRIBUTING_TASKS.md](../../docs/CONTRIBUTING_TASKS.md#running-your-task)
+for the full execution flow and details on each method.
+
 ## Dashboard Views
 
 The plugin provides three tab views plus a detail drill-down, accessible from
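For the CLI row of the table in this hunk, the following sketch shows one way a script might assemble the argument list for `ailf pipeline`. The `--area`, `--task`, and `--tag` flags appear in the README; that each takes a single value as a separate argument, and that they can be combined, are assumptions here. `PipelineFilters` and `buildPipelineArgs` are hypothetical names.

```typescript
// Optional filters mirroring the flags named in the README table.
interface PipelineFilters {
  area?: string;
  task?: string;
  tag?: string;
}

// Build the argv (after the `ailf` binary name) for a pipeline run.
// With no filters the result is just ["pipeline"], i.e. all enabled tasks.
function buildPipelineArgs(filters: PipelineFilters = {}): string[] {
  const args: string[] = ["pipeline"];
  if (filters.area !== undefined) args.push("--area", filters.area);
  if (filters.task !== undefined) args.push("--task", filters.task);
  if (filters.tag !== undefined) args.push("--tag", filters.tag);
  return args;
}
```

The returned array can be passed to a process launcher such as Node's `child_process.spawnSync("ailf", args)`, which avoids shell quoting problems with task or area names.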
package/dist/index.d.ts
CHANGED
@@ -436,10 +436,24 @@ declare const reportSchema: {
  * - A gold-standard implementation (reference solution)
  * - When/how the task runs (execution controls)
  *
+ * ## Execution paths
+ *
+ * Published tasks are automatically discovered by the pipeline — no
+ * registration step needed. There are four ways to execute a task:
+ *
+ * 1. **Run Task Eval** — click ▶ on any ailf.task document in Studio.
+ *    Creates an ailf.evalRequest scoped to this task. Webhook dispatches
+ *    the pipeline; score appears inline when complete (~10–15 min).
+ * 2. **Run AI Eval** — click on a content release page. Auto-scopes to
+ *    tasks whose canonical docs are in the release.
+ * 3. **CLI** — `ailf pipeline --task <id>` or `ailf pipeline --area <area>`.
+ * 4. **Scheduled** — GitHub Actions cron (daily baseline, weekly full).
+ *
  * Tasks can be authored natively in Studio or mirrored from external
  * repositories. Mirrored tasks have a read-only `origin` block that
  * tracks their source repo provenance.
  *
+ * @see docs/CONTRIBUTING_TASKS.md#running-your-task — full execution guide
  * @see docs/design-docs/tasks-as-content.md
  * @see docs/design-docs/tasks-as-content.md#decision-8-domain-specific-assertion-types-not-a-promptfoo-subset
  */
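The release auto-scoping described in the doc comment above ("tasks whose canonical docs are in the release") amounts to a set-intersection check. A minimal sketch, assuming each task carries a list of canonical document IDs; `TaskRef`, `canonicalDocIds`, and `tasksAffectedByRelease` are hypothetical names, not the package's actual types:

```typescript
// Hypothetical, simplified view of a task: its ID plus the IDs of the
// canonical documentation pages it depends on.
interface TaskRef {
  id: string;
  canonicalDocIds: string[];
}

// Keep the tasks that have at least one canonical doc in the release.
function tasksAffectedByRelease(
  tasks: TaskRef[],
  releaseDocIds: Iterable<string>,
): TaskRef[] {
  const inRelease = new Set(releaseDocIds);
  return tasks.filter((task) =>
    task.canonicalDocIds.some((docId) => inRelease.has(docId)),
  );
}
```

Building the `Set` once keeps the check linear in the total number of canonical doc references rather than quadratic in tasks times release documents.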
package/dist/index.js
CHANGED
@@ -2481,6 +2481,26 @@ var taskSchema = defineType5({
 }),
 // -----------------------------------------------------------------------
 // Execution controls
+//
+// Pipeline support status:
+// - `enabled` → ✅ used in GROQ filter (execution.enabled != false)
+// - `blocking` → 🔮 forward-looking — schema reserved, not yet enforced
+//   by the pipeline. PR check workflow reports scores but
+//   does not gate merges on this flag.
+// - `trigger.branches`, `trigger.paths` → ✅ used by repo-based task
+//   threshold evaluator (RepoThresholdEvaluator)
+// - `trigger.labels`, `trigger.schedule` → 🔮 forward-looking — schema
+//   reserved, not yet read by the pipeline GROQ query or
+//   mapped to TaskDefinition.
+// - `threshold.score` → ✅ used by repo-based threshold evaluator
+// - `threshold.dimensions` → 🔮 forward-looking — schema reserved, not
+//   yet read by the pipeline GROQ query or mapped to
+//   TaskDefinition.
+//
+// When implementing a forward-looking field, update the GROQ query in
+// content-lake-task-source.ts, the ContentLakeTask interface, the
+// mapToTaskDefinition() mapping, and (if pipeline-relevant) the
+// TaskDefinition port type in @sanity/ailf-core.
 // -----------------------------------------------------------------------
 defineField5({
 description: "Controls when and how this task runs in the pipeline. Tasks without execution controls use defaults: enabled, non-blocking, no filters, no threshold.",
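The defaults named in the field description above (enabled, non-blocking, no filters, no threshold) can be illustrated with a small resolver. This is a sketch under assumptions, not the pipeline's `mapToTaskDefinition()`: the field names follow the comment block in this hunk, but their TypeScript types, the `ResolvedExecution` shape, and `resolveExecution` are hypothetical. The `enabled` check mirrors the `execution.enabled != false` filter, assuming a missing value counts as enabled.

```typescript
// Field names follow the schema comment; the types are assumptions.
interface ExecutionControls {
  enabled?: boolean;
  blocking?: boolean; // reserved in the schema, not yet enforced
  trigger?: {
    branches?: string[];
    paths?: string[];
    labels?: unknown; // reserved, not yet read by the pipeline
    schedule?: unknown; // reserved, not yet read by the pipeline
  };
  threshold?: {
    score?: number;
    dimensions?: unknown; // reserved, not yet read by the pipeline
  };
}

// Hypothetical resolved form with every default made explicit.
interface ResolvedExecution {
  enabled: boolean;
  blocking: boolean;
  branches: string[];
  paths: string[];
  thresholdScore: number | null;
}

// Apply the documented defaults to a possibly missing execution block.
function resolveExecution(execution?: ExecutionControls | null): ResolvedExecution {
  return {
    // Only an explicit `false` disables the task.
    enabled: execution?.enabled !== false,
    // Only an explicit `true` marks the task as blocking.
    blocking: execution?.blocking === true,
    // Empty arrays stand for "no filters".
    branches: execution?.trigger?.branches ?? [],
    paths: execution?.trigger?.paths ?? [],
    // null stands for "no threshold".
    thresholdScore: execution?.threshold?.score ?? null,
  };
}
```

The forward-looking fields are deliberately left out of the resolved form, matching the comment's statement that the pipeline does not read them yet.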