@sanity/ailf-studio 0.1.5 → 0.1.7

package/README.md CHANGED
@@ -13,28 +13,13 @@ AILF reports are stored.
 
  ### 1. Add the dependency
 
- #### Continuous releases (recommended for external projects)
-
- Every merge to `main` that touches `packages/studio/` automatically publishes
- via [pkg.pr.new](https://pkg.pr.new). Install the latest main build:
-
- ```bash
- pnpm add https://pkg.pr.new/sanity-labs/ai-literacy-framework/@sanity/ailf-studio@main
- ```
-
- Or pin to a specific commit:
-
  ```bash
- pnpm add https://pkg.pr.new/sanity-labs/ai-literacy-framework/@sanity/ailf-studio@<commit-sha>
+ pnpm add @sanity/ailf-studio
  ```
 
- To update to the latest build, re-run the install command the `@main` URL
- always resolves to the most recent build.
-
- #### PR preview packages
-
- PRs labeled `trigger: preview` also publish preview packages. Install URLs are
- posted as PR comments automatically.
+ > **Note:** The package is published with `restricted` access to the `@sanity`
+ > npm scope. You need an npm token with read access — see the root
+ > [README](../../README.md#obtain-secrets) for how to obtain one.
 
  #### Within the monorepo
 
@@ -70,7 +55,7 @@ This registers:
  - The `ailf.referenceSolution` document type (gold-standard reference
  implementations)
  - The `ailf.evalRequest` document type (evaluation request triggers)
- - The **AI Literacy** dashboard tool in the Studio sidebar
+ - The **AI Literacy Framework** dashboard tool in the Studio sidebar
 
  ### 3. Alternative: tool-only installation
 
@@ -120,7 +105,8 @@ export default defineConfig({
 
  ## Dashboard Views
 
- The plugin provides five views accessible from tabs in the dashboard:
+ The plugin provides three tab views plus a detail drill-down, accessible from
+ the **AI Literacy Framework** tool in the Studio sidebar.
 
  ### Latest Reports
 
@@ -128,11 +114,14 @@ A card list of the most recent evaluation reports. Each card shows:
 
  - Overall score, doc lift, and lowest-scoring area
  - Evaluation mode, source, and trigger type
- - Git metadata (branch, PR number) when available
+ - Git metadata (branch, PR number, origin repo) when available
  - Auto-comparison delta against the previous run
 
  Click any card to navigate to the Report Detail view.
 
+ The view includes a **search bar** for filtering reports by document slug, area,
+ or content release perspective.
+
  ### Score Timeline
 
  A line chart of overall and per-area scores over time. Filterable by:
@@ -153,25 +142,22 @@ report from dropdowns, then view:
  - Per-model deltas (when both reports include per-model breakdowns)
  - Noise threshold classification
 
- ### Content Impact
-
- Find all evaluation reports related to a specific Sanity document. Enter a
- document ID to see:
-
- - Which evaluations included that document in their target set
- - Score trends for that document's feature area over time
- - Whether edits to the document improved or regressed scores
-
  ### Report Detail
 
- Full drill-down into a single report:
-
- - Per-area score table with all dimensions (task completion, code correctness,
- doc coverage, lift from docs)
- - Per-model breakdowns with cost-per-quality-point
- - Provenance metadata (trigger, git info, grader model, context hash)
- - Auto-comparison summary against the previous comparable run
- - Link to the Promptfoo web viewer for raw evaluation output
+ Full drill-down into a single report (navigated from Latest Reports or a direct
+ URL):
+
+ - **Overview stats** — composite score, doc lift, cost, duration
+ - **Per-area score table** with all dimensions (task completion, code
+ correctness, doc coverage, lift from docs)
+ - **Three-layer table** floor / ceiling / actual decomposition (when
+ available)
+ - **Per-model breakdowns** with cost-per-quality-point
+ - **Judgment list** — individual grader verdicts with reasoning
+ - **Recommendations** — gap analysis remediation suggestions (when available)
+ - **Provenance card** — trigger, git info (branch, PR, origin repo), grader
+ model, context hash, eval fingerprint
+ - **Auto-comparison summary** against the previous comparable run
 
  ## Filtering
 
@@ -198,13 +184,13 @@ export default defineConfig({
  })
  ```
 
- Reports are written by the evaluation pipeline (`turbo pipeline -- --publish`).
- See the [report store design docs](../../docs/design-docs/report-store/index.md)
- for the full architecture.
+ Reports are written by the evaluation pipeline (`ailf pipeline --publish`). See
+ the [report store design docs](../../docs/design-docs/report-store/index.md) for
+ the full architecture.
 
  ## Exported API
 
- The plugin exports building blocks for custom views or extensions:
+ The plugin exports building blocks for custom views or extensions.
 
  ### Plugin & Tool
 
@@ -230,6 +216,7 @@ The plugin exports building blocks for custom views or extensions:
  | ------------------- | --------------------------------------------------------------------------------------------- |
  | `AssertionInput` | Custom input for task assertions with contextual type descriptions and monospace code styling |
  | `CanonicalDocInput` | Custom input for canonical doc references with polymorphic resolution type help |
+ | `ReleasePicker` | Content release perspective picker for evaluation scoping |
  | `MirrorBanner` | Banner showing repo source, sync status, and provenance for mirrored tasks |
  | `SyncStatusBadge` | Colored badge (green/yellow/red) showing sync freshness of mirrored tasks |
 
@@ -240,31 +227,58 @@ The plugin exports building blocks for custom views or extensions:
  | `GraduateToNativeAction` | Converts a mirrored (read-only) task to a native (editable) task by removing origin |
  | `createRunEvaluationAction` | Factory for creating a Studio action that triggers evaluations |
 
+ ### Glossary
+
+ | Export | Description |
+ | ---------- | ------------------------------------------------------------------------ |
+ | `GLOSSARY` | Centralized tooltip descriptions for all evaluation metrics and concepts |
+
  ### GROQ Queries
 
- | Export | Description |
- | ---------------------- | --------------------------------------- |
- | `latestReportsQuery` | N most recent reports (filterable) |
- | `scoreTimelineQuery` | Score data points over time |
- | `reportDetailQuery` | Full report with all fields |
- | `comparisonPairQuery` | Two reports for side-by-side comparison |
- | `contentImpactQuery` | Reports related to a document ID |
- | `distinctSourcesQuery` | All unique source names |
- | `distinctModesQuery` | All unique evaluation modes |
- | `distinctAreasQuery` | All unique feature areas |
+ | Export | Description |
+ | ------------------------------ | ------------------------------------------ |
+ | `latestReportsQuery` | N most recent reports (filterable) |
+ | `scoreTimelineQuery` | Score data points over time |
+ | `reportDetailQuery` | Full report with all fields |
+ | `comparisonPairQuery` | Two reports for side-by-side comparison |
+ | `contentImpactQuery` | Reports related to a document ID |
+ | `recentDocumentEvalsQuery` | Recent evaluations for a specific document |
+ | `articleSearchQuery` | Full-text search across article documents |
+ | `distinctSourcesQuery` | All unique source names |
+ | `distinctModesQuery` | All unique evaluation modes |
+ | `distinctAreasQuery` | All unique feature areas |
+ | `distinctModelsQuery` | All unique model identifiers |
+ | `distinctPerspectivesQuery` | All unique content release perspectives |
+ | `distinctTargetDocumentsQuery` | All unique target document slugs |
 
  ### Types
 
- | Export | Description |
- | ------------------- | ---------------------------------------------- |
- | `ReportListItem` | Shape returned by `latestReportsQuery` |
- | `ReportDetail` | Shape returned by `reportDetailQuery` |
- | `TimelineDataPoint` | Shape returned by `scoreTimelineQuery` |
- | `ComparisonData` | Auto-comparison data embedded in reports |
- | `ContentImpactItem` | Shape returned by `contentImpactQuery` |
- | `ProvenanceData` | Report provenance metadata |
- | `SummaryData` | Score summary (overall + per-area + per-model) |
- | `ScoreItem` | Individual area score entry |
+ | Export | Description |
+ | ---------------------------- | --------------------------------------------------------------------- |
+ | `ReportListItem` | Shape returned by `latestReportsQuery` |
+ | `ReportDetail` | Shape returned by `reportDetailQuery` |
+ | `TimelineDataPoint` | Shape returned by `scoreTimelineQuery` |
+ | `ComparisonData` | Auto-comparison data embedded in reports |
+ | `ContentImpactItem` | Shape returned by `contentImpactQuery` |
+ | `ProvenanceData` | Report provenance metadata |
+ | `SummaryData` | Score summary (overall + per-area + per-model) |
+ | `ScoreItem` | Individual area score entry |
+ | `RecommendationGap` | Single gap analysis recommendation |
+ | `RecommendationsData` | Full recommendations payload |
+ | `JudgmentData` | Individual grader judgment with reasoning |
+ | `DocumentRef` | Canonical document reference (re-exported from `@sanity/ailf-shared`) |
+ | `ScoreGrade` | Letter grade type (re-exported from `@sanity/ailf-shared`) |
+ | `scoreGrade` | Function to compute letter grade from numeric score |
+ | `RunEvaluationActionOptions` | Options for `createRunEvaluationAction` factory |
+
+ ### Utility Functions
+
+ | Export | Description |
+ | -------------------- | --------------------------------------------------------- |
+ | `formatPercent` | Format a number as a percentage string |
+ | `formatRelativeTime` | Format an ISO timestamp as relative time (e.g., "2h ago") |
+ | `formatDelta` | Format a score delta with +/− sign |
+ | `formatDuration` | Format milliseconds as human-readable duration |
 
  ## Development
 
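Reviewer note on the new "Utility Functions" table: the diff lists only names and one-line descriptions, so the sketch below is a guess at the kind of behavior those descriptions imply. The `sketch*` functions are hypothetical stand-ins written for illustration. They are not the package's implementations, and the parameter lists, rounding, and output formats are all assumptions.

```typescript
// Hypothetical stand-ins. These are NOT the functions shipped in
// @sanity/ailf-studio; signatures and output formats are assumptions.

/** Format a ratio as a percentage string (assumes the input is in 0..1). */
function sketchFormatPercent(ratio: number, digits = 0): string {
  return `${(ratio * 100).toFixed(digits)}%`;
}

/** Format a score delta with an explicit sign; zero gets no sign. */
function sketchFormatDelta(delta: number, digits = 1): string {
  const magnitude = Math.abs(delta).toFixed(digits);
  if (delta > 0) return `+${magnitude}`;
  if (delta < 0) return `\u2212${magnitude}`; // U+2212 minus sign, as in the table
  return magnitude;
}

/** Format milliseconds as a short duration such as "2m 5s" or "4s". */
function sketchFormatDuration(ms: number): string {
  const totalSeconds = Math.round(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
}

/** Format an ISO timestamp relative to `now`, for example "2h ago". */
function sketchFormatRelativeTime(iso: string, now: Date = new Date()): string {
  const diffMs = now.getTime() - new Date(iso).getTime();
  const minutes = Math.floor(diffMs / 60_000);
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Passing `now` explicitly keeps the relative-time helper deterministic in tests. Consult the published type declarations for the real signatures before relying on any of this.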
@@ -279,8 +293,8 @@ pnpm --filter @sanity/ailf-studio dev
  turbo build
  ```
 
- The plugin is pure TypeScript (TSC compilation, no bundler). The consuming
- Studio's bundler (Vite) handles the final bundle.
+ The plugin uses [tsup](https://github.com/egoist/tsup) for bundling. The
+ consuming Studio's bundler (Vite) handles the final bundle.
 
  ## Related Documentation
 
package/dist/index.d.ts CHANGED
@@ -436,10 +436,10 @@ declare const webhookConfigSchema: {
  * supports browser back/forward navigation.
  *
  * Route structure:
- * /ai-literacy → Latest Reports (home)
- * /ai-literacy/report/:reportId → Report Detail
- * /ai-literacy/timeline → Score Timeline
- * /ai-literacy/compare → Compare
+ * /ailf → Latest Reports (home)
+ * /ailf/report/:reportId → Report Detail
+ * /ailf/timeline → Score Timeline
+ * /ailf/compare → Compare
  */
 
  /**
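Reviewer note on the route rename: the doc comment above moves every dashboard route from the `/ai-literacy` prefix to `/ailf`, so deep links saved against 0.1.5 would presumably stop resolving. The sketch below shows how the four documented paths could map to views. `resolveRoute`, `DashboardRoute`, and the view names are hypothetical and are not taken from the package, which may route quite differently.

```typescript
// Hypothetical route resolver mirroring the documented /ailf route structure.
// The view names and the function itself are illustrative assumptions.
type DashboardRoute =
  | { view: "latest" }
  | { view: "report"; reportId: string }
  | { view: "timeline" }
  | { view: "compare" }
  | { view: "unknown" };

function resolveRoute(pathname: string): DashboardRoute {
  // Split into non-empty segments: "/ailf/report/abc" -> ["ailf", "report", "abc"].
  const segments = pathname.split("/").filter(Boolean);
  if (segments[0] !== "ailf") return { view: "unknown" };

  const [, section, param] = segments;
  if (section === undefined) return { view: "latest" };
  if (section === "report" && param !== undefined && segments.length === 3) {
    return { view: "report", reportId: decodeURIComponent(param) };
  }
  if (section === "timeline" && segments.length === 2) return { view: "timeline" };
  if (section === "compare" && segments.length === 2) return { view: "compare" };
  return { view: "unknown" };
}
```

Under this sketch the old `/ai-literacy` prefix falls through to `unknown`, which is the behavior a consumer upgrading from 0.1.5 would want to check for bookmarked URLs.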