@sanity/ailf-studio 0.1.0

package/README.md ADDED
@@ -0,0 +1,255 @@
# @sanity/ailf-studio

Sanity Studio dashboard plugin for the **AI Literacy Framework**. Visualizes
evaluation reports, score trends, comparisons, and content impact directly
inside Sanity Studio, with no external backend.

All data is read from the Sanity Content Lake via GROQ.

## Installation

Install the plugin into any Sanity Studio that has access to the dataset where
AILF reports are stored.

### 1. Add the dependency

#### Continuous releases (recommended for external projects)

Every merge to `main` that touches `packages/studio/` automatically publishes
via [pkg.pr.new](https://pkg.pr.new). Install the latest main build:

```bash
pnpm add https://pkg.pr.new/sanity-labs/ai-literacy-framework/@sanity/ailf-studio@main
```

Or pin to a specific commit:

```bash
pnpm add https://pkg.pr.new/sanity-labs/ai-literacy-framework/@sanity/ailf-studio@<commit-sha>
```

To update to the latest build, re-run the install command. The `@main` URL
always resolves to the most recent build.

#### PR preview packages

PRs labeled `trigger: preview` also publish preview packages. Install URLs are
posted as PR comments automatically.

#### Within the monorepo

```bash
pnpm add @sanity/ailf-studio@workspace:*
```

### 2. Register the plugin

The recommended approach registers both the document schemas and the dashboard
tool in one call:

```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfPlugin } from "@sanity/ailf-studio"

export default defineConfig({
  // ... your existing config
  plugins: [
    ailfPlugin(),
    // ... other plugins
  ],
})
```

This registers:

- The `ailf.report` document type (read-only evaluation reports)
- The `ailf.webhookConfig` document type (webhook-triggered evaluation settings)
- The **AI Literacy** dashboard tool in the Studio sidebar

### 3. Alternative: tool-only installation

If you only want the dashboard tool without the document schemas (e.g., the
schemas are already registered elsewhere):

```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfTool } from "@sanity/ailf-studio"

export default defineConfig({
  // ... your existing config
  tools: [ailfTool()],
})
```

### 4. Alternative: schema-only installation

If you want the document schemas without the dashboard (e.g., to query reports
programmatically):

```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { reportSchema, webhookConfigSchema } from "@sanity/ailf-studio"

export default defineConfig({
  // ... your existing config
  schema: {
    types: [reportSchema, webhookConfigSchema],
  },
})
```

## Dashboard Views

The plugin provides five views accessible from tabs in the dashboard:

### Latest Reports

A card list of the most recent evaluation reports. Each card shows:

- Overall score, doc lift, and lowest-scoring area
- Evaluation mode, source, and trigger type
- Git metadata (branch, PR number) when available
- Auto-comparison delta against the previous run

Click any card to navigate to the Report Detail view.

### Score Timeline

A line chart of overall and per-area scores over time. Filterable by:

- **Source**: which documentation source was evaluated (e.g., production,
  branch deploy)
- **Mode**: evaluation mode (baseline, observed, agentic)

Data points are interactive: click one to jump to the full report.

### Compare

Side-by-side comparison of any two reports. Select a baseline and an experiment
report from the dropdowns, then view:

- Overall score and doc-lift deltas
- Per-area deltas (improved / regressed / unchanged)
- Per-model deltas (when both reports include per-model breakdowns)
- Noise threshold classification

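The per-area classification described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: the type names, the `compareAreas` and `classifyDelta` helpers, and the threshold value of 2 points are all assumptions made for the example.

```typescript
// Hypothetical names throughout; the plugin's real types and threshold
// are not documented in this README.
type DeltaClass = "improved" | "regressed" | "unchanged"

interface AreaDelta {
  area: string
  delta: number
  classification: DeltaClass
}

// Assumed noise threshold in score points; the value AILF uses may differ.
const NOISE_THRESHOLD = 2

// A delta inside the noise band counts as "unchanged".
function classifyDelta(
  delta: number,
  threshold: number = NOISE_THRESHOLD,
): DeltaClass {
  if (Math.abs(delta) <= threshold) return "unchanged"
  return delta > 0 ? "improved" : "regressed"
}

// Compares per-area scores of two reports (area name -> score).
function compareAreas(
  baseline: Record<string, number>,
  experiment: Record<string, number>,
  threshold: number = NOISE_THRESHOLD,
): AreaDelta[] {
  const deltas: AreaDelta[] = []
  for (const [area, baseScore] of Object.entries(baseline)) {
    const expScore = experiment[area]
    // Only areas present in both reports can be compared.
    if (expScore === undefined) continue
    const delta = expScore - baseScore
    deltas.push({ area, delta, classification: classifyDelta(delta, threshold) })
  }
  return deltas
}
```

Treating small deltas as "unchanged" keeps run-to-run variation from being reported as a regression or an improvement.
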
### Content Impact

Find all evaluation reports related to a specific Sanity document. Enter a
document ID to see:

- Which evaluations included that document in their target set
- Score trends for that document's feature area over time
- Whether edits to the document improved or regressed scores

### Report Detail

Full drill-down into a single report:

- Per-area score table with all dimensions (task completion, code correctness,
  doc coverage, lift from docs)
- Per-model breakdowns with cost-per-quality-point
- Provenance metadata (trigger, git info, grader model, context hash)
- Auto-comparison summary against the previous comparable run
- Link to the Promptfoo web viewer for raw evaluation output

## Filtering

The Latest Reports and Score Timeline views share global filters:

- **Source filter**: values are auto-populated from distinct
  `provenance.source.name` values across all reports
- **Mode filter**: values are auto-populated from distinct `provenance.mode`
  values

Filters are applied via GROQ query parameters, so only matching reports are
fetched.

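As a rough illustration of parameter-based filtering, the sketch below maps an optional filter state to GROQ parameters. The query string is illustrative only and is not the plugin's exported `latestReportsQuery`; the `ReportFilters` shape and the `toQueryParams` helper are hypothetical names introduced for this example.

```typescript
// Hypothetical filter state; the plugin's internal shape is not documented here.
interface ReportFilters {
  source?: string
  mode?: string
}

// Illustrative query, not the plugin's actual query.
// When a parameter is null, `!defined(...)` is true and that condition is skipped.
const filteredReportsQuery = `*[_type == "ailf.report"
  && (!defined($source) || provenance.source.name == $source)
  && (!defined($mode) || provenance.mode == $mode)
] | order(_createdAt desc)`

// Every parameter referenced in the query must be supplied,
// so unset filters are passed as null rather than omitted.
function toQueryParams(
  filters: ReportFilters,
): { source: string | null; mode: string | null } {
  return {
    source: filters.source ?? null,
    mode: filters.mode ?? null,
  }
}
```

With this shape, `toQueryParams({ source: "production" })` yields `{ source: "production", mode: null }`, which restricts results by source while leaving the mode unfiltered.
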
## Dataset Configuration

The plugin reads reports from whatever dataset the Studio is configured to use.
To point it at a dedicated report dataset, configure the Studio's dataset:

```ts
// sanity.config.ts
import { defineConfig } from "sanity"
import { ailfPlugin } from "@sanity/ailf-studio"

export default defineConfig({
  projectId: "3do82whm",
  dataset: "my-report-dataset", // or use AILF_REPORT_DATASET
  plugins: [ailfPlugin()],
})
```

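One way to honor the `AILF_REPORT_DATASET` variable mentioned in the comment above is a small resolver with a fallback. This is a sketch under assumptions: the `resolveReportDataset` helper is hypothetical, and how the variable reaches the Studio config depends on your build setup.

```typescript
// Hypothetical helper: returns the dataset name, preferring
// AILF_REPORT_DATASET when it is set and non-empty.
function resolveReportDataset(
  env: Record<string, string | undefined>,
  fallback: string = "my-report-dataset",
): string {
  const fromEnv = env["AILF_REPORT_DATASET"]?.trim()
  return fromEnv ? fromEnv : fallback
}
```

In a config file this could be used as `dataset: resolveReportDataset(process.env)`. Note that Sanity Studio typically exposes only variables prefixed with `SANITY_STUDIO_` to bundled code, so the variable may need to be mapped to such a name.
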
Reports are written by the evaluation pipeline (`turbo pipeline -- --publish`).
See the [report store design docs](../../docs/design-docs/report-store/index.md)
for the full architecture.

## Exported API

The plugin exports building blocks for custom views or extensions:

### Plugin & Tool

| Export       | Description                  |
| ------------ | ---------------------------- |
| `ailfPlugin` | Full plugin (schemas + tool) |
| `ailfTool`   | Dashboard tool only          |

### Schemas

| Export                | Description                            |
| --------------------- | -------------------------------------- |
| `reportSchema`        | `ailf.report` document type definition |
| `webhookConfigSchema` | `ailf.webhookConfig` document type     |

### GROQ Queries

| Export                 | Description                             |
| ---------------------- | --------------------------------------- |
| `latestReportsQuery`   | N most recent reports (filterable)      |
| `scoreTimelineQuery`   | Score data points over time             |
| `reportDetailQuery`    | Full report with all fields             |
| `comparisonPairQuery`  | Two reports for side-by-side comparison |
| `contentImpactQuery`   | Reports related to a document ID        |
| `distinctSourcesQuery` | All unique source names                 |
| `distinctModesQuery`   | All unique evaluation modes             |
| `distinctAreasQuery`   | All unique feature areas                |

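A custom view might run one of these queries through a client and type the result. The sketch below uses a stand-in query string because the parameter names expected by the exported queries are not documented here. The `QueryClient` interface is a minimal structural stand-in modeled on a Sanity client's `fetch(query, params)` method, and `ReportSummary`, `exampleQuery`, `$limit`, and `fetchReports` are assumptions for illustration.

```typescript
// Minimal stand-in modeled on the `fetch(query, params)` method of a Sanity client.
interface QueryClient {
  fetch<T>(query: string, params: Record<string, unknown>): Promise<T>
}

// Hypothetical subset of fields; the real shape is the exported `ReportListItem` type.
interface ReportSummary {
  _id: string
}

// In a real project this would be an exported query, e.g.
//   import { latestReportsQuery } from "@sanity/ailf-studio"
// The string below and its `$limit` parameter are assumptions.
const exampleQuery = `*[_type == "ailf.report"] | order(_createdAt desc)[0...$limit]`

// Fetches up to `limit` reports, normalizing a null result to an empty list.
async function fetchReports(
  client: QueryClient,
  limit: number,
): Promise<ReportSummary[]> {
  const reports = await client.fetch<ReportSummary[] | null>(exampleQuery, { limit })
  return reports ?? []
}
```

Accepting a narrow client interface keeps the helper easy to test with a stub, independent of how the Studio provides its client.
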
### Types

| Export              | Description                                    |
| ------------------- | ---------------------------------------------- |
| `ReportListItem`    | Shape returned by `latestReportsQuery`         |
| `ReportDetail`      | Shape returned by `reportDetailQuery`          |
| `TimelineDataPoint` | Shape returned by `scoreTimelineQuery`         |
| `ComparisonData`    | Auto-comparison data embedded in reports       |
| `ContentImpactItem` | Shape returned by `contentImpactQuery`         |
| `ProvenanceData`    | Report provenance metadata                     |
| `SummaryData`       | Score summary (overall + per-area + per-model) |
| `ScoreItem`         | Individual area score entry                    |

## Development

```bash
# Build the plugin
pnpm --filter @sanity/ailf-studio build

# Watch mode (rebuilds on file changes)
pnpm --filter @sanity/ailf-studio dev

# Build everything (from repo root)
turbo build
```

The plugin is pure TypeScript (compiled with `tsc`, no bundler). The consuming
Studio's bundler (Vite) handles the final bundle.

## Related Documentation

- [Report Store Design](../../docs/design-docs/report-store/index.md): full
  architecture and implementation plan
- [Visibility & Workflows](../../docs/design-docs/report-store/visibility-workflows.md):
  design rationale for the dashboard views
- [Report Store Architecture](../../docs/design-docs/report-store/architecture.md):
  Sanity Content Lake as the system of record