@hachej/boring-bi-dashboard 0.1.60
- package/LICENSE +21 -0
- package/README.md +83 -0
- package/dist/front/index.d.ts +28 -0
- package/dist/front/index.js +831 -0
- package/dist/shared/index.d.ts +16 -0
- package/dist/shared/index.js +131 -0
- package/dist/types-BGZKL9Rs.d.ts +146 -0
- package/docs/issues/bi-dashboard-plugin/README.md +349 -0
- package/docs/issues/bi-dashboard-plugin/data-access-unification.md +638 -0
- package/docs/issues/bi-dashboard-plugin/data-bridge.md +425 -0
- package/example/.gitignore +16 -0
- package/example/.pi/extensions/bi-dashboard/front/index.ts +1 -0
- package/example/.pi/extensions/bi-dashboard/package.json +11 -0
- package/example/README.md +10 -0
- package/example/dashboards/people.dashboard.json +178 -0
- package/example/data/people.csv +13 -0
- package/example/eval/bi-dashboard.yaml +31 -0
- package/package.json +85 -0
- package/playground/README.md +25 -0
- package/playground/run-eval.ts +100 -0
- package/playground/smoke-dashboard.ts +101 -0
- package/skills/bi-dashboard-authoring/SKILL.md +78 -0
@@ -0,0 +1,638 @@
# Data Access Unification Plan

## Status

Draft plan. This plan supersedes the current split-brain prototype where:

- `@hachej/boring-data-bridge` parses workspace CSV/JSON/NDJSON files itself.
- BI dashboard falls back to `/api/v1/files/records` and repeats aggregation in the browser.
- Perspective-like panels render JSON rows in an HTML table instead of using a real Perspective/Arrow preparation path.

## Goal

Unify local file data, semantic queries, database-backed queries, DuckDB workspace files, and Perspective preparation behind a clean set of responsibilities:

1. **Workspace file APIs own file-backed tabular reads.**
2. **`data-bridge` owns semantic/query execution and adapter routing.**
3. **Perspective consumers negotiate transport; server selects the safe transport.**
4. **DuckDB/SQLite workspace files are file-backed databases: preview via file API, query via data bridge.**

The end state should make dashboards, data explorer, agents, and future report/notebook plugins share one data model instead of each plugin inventing its own CSV parser, query runner, or Perspective loader.

## Current Findings

### Existing `/api/v1/files/records`

Implemented in:

- `packages/agent/src/server/http/routes/file.ts`
- `packages/agent/src/server/http/routes/fileRecords.ts`

Current behavior:

- Supports JSON array, NDJSON/JSONL, and CSV.
- Accepts `path`, `offset`, `limit`, `q`.
- Enforces maximum file bytes, output bytes, row scan limits, returned row limits, column sampling limits.
- Uses the workspace adapter (`workspace.stat`, `workspace.readFile`) rather than raw filesystem paths.
- Returns:

```ts
interface FileRecordsResult {
  source: { kind: "file"; path: string; format: "json-array" | "ndjson" | "csv" }
  path: string
  format: string
  columns: { name: string; type: string }[]
  rows: Record<string, unknown>[]
  total: number
  hasMore: boolean
  offset: number
  limit: number
  mtimeMs?: number
}
```
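
As a rough illustration of how a caller might page through this endpoint, the sketch below builds the request URL from the four accepted parameters and derives the next offset from `offset`, `hasMore`, and the number of rows received. The helper names `recordsUrl` and `nextOffset`, and the `RecordsPageInfo` shape, are hypothetical and not part of the existing code.

```ts
// Paging fields taken from FileRecordsResult; other fields are omitted for brevity.
interface RecordsPageInfo {
  offset: number
  limit: number
  hasMore: boolean
  rowsReturned: number // rows.length of the page just received
}

// Builds the query string for the existing records endpoint (hypothetical helper).
function recordsUrl(path: string, offset: number, limit: number, q?: string): string {
  const parts = [
    `path=${encodeURIComponent(path)}`,
    `offset=${offset}`,
    `limit=${limit}`,
  ]
  if (q !== undefined && q !== "") parts.push(`q=${encodeURIComponent(q)}`)
  return `/api/v1/files/records?${parts.join("&")}`
}

// Returns the offset of the next page, or null when paging should stop.
function nextOffset(page: RecordsPageInfo): number | null {
  // An empty page also stops paging, so a misbehaving server cannot cause endless requests.
  if (!page.hasMore || page.rowsReturned === 0) return null
  return page.offset + page.rowsReturned
}
```

A caller would request `recordsUrl(path, 0, 100)`, then keep requesting at `nextOffset(...)` until it returns `null`.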

### Current `data-bridge` prototype

Implemented in:

- `plugins/data-bridge/src/server/index.ts`

Current behavior:

- Registers `data.v1.query.run` through WorkspaceBridge.
- Has a workspace-file adapter that reads files directly from `workspaceRoot`.
- Has its own CSV/JSON/NDJSON parsing and aggregation.
- Has BSL subprocess execution.
- Has realpath containment checks, but still duplicates file API behavior.

### Current BI dashboard runtime

Implemented in:

- `plugins/bi-dashboard/src/front/dashboardData.ts`

Current behavior:

- Calls `data.v1.query.run` first for each dashboard query.
- Falls back to `/api/v1/files/records` if bridge fails and `dataRef.kind === "workspace-file"`.
- Browser fallback pages file records and aggregates client-side.
- `BSLPerspectiveViewer` renders a plain table from JSON rows.

## Architecture Decision

### Responsibility Split

| Data/source shape | Owner | API |
| --- | --- | --- |
| Plain workspace CSV/JSON/NDJSON | workspace file subsystem | `/api/v1/files/records` and future file data endpoints |
| Plain workspace Parquet/Arrow | workspace file subsystem | future `/api/v1/files/data`/`arrow`/`perspective` endpoints |
| DuckDB/SQLite file preview | workspace file subsystem | future preview endpoints |
| DuckDB/SQLite query execution | `data-bridge` adapter | `data.v1.query.run` |
| BSL semantic model query | `data-bridge` adapter | `data.v1.query.run` |
| Remote DB/warehouse query | `data-bridge` adapter | `data.v1.query.run` |
| Perspective-ready data | negotiated server operation | file API for local files, `data.v1.perspective.prepare` for query/adapters |
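
The table above can be read as a small routing function. The sketch below is one possible encoding, not the plan's implementation: the `routeFor` name, the `Operation` type, and the `"remote-db"` source kind are hypothetical, and the fallback for previewing non-file sources is an assumption the plan does not state.

```ts
type SourceKind = "workspace-file" | "duckdb-file" | "sqlite-file" | "semantic-model" | "remote-db"
type Operation = "preview" | "query" | "perspective"

interface Route {
  owner: "file-api" | "data-bridge"
  api: string
}

// Maps a source kind and operation to the owning subsystem, following the table.
function routeFor(kind: SourceKind, op: Operation): Route {
  const plainFile = kind === "workspace-file"
  const dbFile = kind === "duckdb-file" || kind === "sqlite-file"

  if (op === "preview") {
    if (plainFile) return { owner: "file-api", api: "/api/v1/files/records" }
    if (dbFile) return { owner: "file-api", api: "/api/v1/files/data" }
    // Assumption: non-file sources have no preview surface, so a preview is just a query.
    return { owner: "data-bridge", api: "data.v1.query.run" }
  }
  if (op === "perspective") {
    if (plainFile) return { owner: "file-api", api: "/api/v1/files/data" }
    return { owner: "data-bridge", api: "data.v1.perspective.prepare" }
  }
  // Query execution always goes through the bridge; for plain files the bridge
  // is expected to delegate to the shared file-record service.
  return { owner: "data-bridge", api: "data.v1.query.run" }
}
```

The point of the split shows up in the DuckDB rows: the same file is routed to the file API for preview and to the bridge for query execution.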

### Why not put everything in file APIs?

File APIs should answer: “read this workspace file as data.”

They should not become a general SQL/BSL/warehouse execution surface. Query execution needs adapter policy, credentials, trusted callers, timeouts, capabilities, and runtime tokens. That belongs in `data-bridge`.

### Why not put workspace-file reads only in `data-bridge`?

Because `/api/v1/files/records` already exists, is paginated, enforces file limits, uses the workspace adapter abstraction, and is useful outside BI dashboards. Duplicating CSV/JSON parsing in `data-bridge` creates inconsistent semantics and security drift.

## Target APIs

## 1. File Records API: keep and extract shared implementation

Keep existing HTTP endpoint:

```http
GET /api/v1/files/records?path=data.csv&offset=0&limit=100&q=engineer
```

Extract the core logic into a reusable server module that can be called by both HTTP routes and trusted server plugins.

Suggested package location:

```txt
packages/file-data/src/server/
  records.ts           // parser + paging + type inference, no Fastify
  workspaceRecords.ts  // workspace adapter integration
  formats/
    csv.ts
    json.ts
    ndjson.ts
  leases.ts            // read-only local materialization for DB files
```

The reusable implementation must live in a domain-neutral server-only package, **not** in `@hachej/boring-agent/server` and preferably not in workspace core unless the team explicitly decides that tabular workspace-file parsing is part of the workspace file subsystem. Recommended package: `@hachej/file-data` under `packages/file-data/`.

Rationale: `@hachej/boring-data-bridge` is a workspace server plugin; making it depend on the agent HTTP layer would couple reusable data adapters to the host app/agent package. Conversely, making `@hachej/boring-workspace` understand CSV/Arrow/Perspective risks violating the workspace package's domain-neutral boundary. A small server-only file-data package lets the agent route, workspace file APIs, and data-bridge share implementation without teaching workspace about BSL/SQL/Perspective.

Public/server exports from `@hachej/file-data/server`:

```ts
export async function readWorkspaceFileRecords(args: {
  workspace: WorkspaceFileAdapter
  path: string
  offset?: number
  limit?: number
  q?: string | null
  maxFileBytes?: number
  maxRowsScanned?: number
}): Promise<FileRecordsResult>

export function buildFileRecordsResult(args: {
  path: string
  content: string
  mtimeMs?: number
  offset: number
  limit: number
  q: string | null
}): FileRecordsResult
```

Then move both the HTTP route and data-bridge adapter to use this same implementation.

Acceptance:

- Existing `/api/v1/files/records` tests remain green.
- `data-bridge` no longer contains a separate CSV parser.
- BI dashboard fallback and bridge produce the same rows/columns for the same workspace file.
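
To make the "parser + paging, no Fastify" idea concrete, here is a minimal sketch of a framework-free core for the NDJSON case only. The names `parseNdjson` and `pageRows` are hypothetical. It assumes that `q` is a case-insensitive substring match over all cell values and that `total` counts rows after filtering; the real `/api/v1/files/records` semantics may differ, and byte/row-scan limits are left out.

```ts
interface PagedRows {
  rows: Record<string, unknown>[]
  total: number
  hasMore: boolean
  offset: number
  limit: number
}

// Parses NDJSON text into row objects, skipping blank lines.
function parseNdjson(content: string): Record<string, unknown>[] {
  return content
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Record<string, unknown>)
}

// Applies the optional text filter, then slices out one page.
function pageRows(
  all: Record<string, unknown>[],
  offset: number,
  limit: number,
  q: string | null,
): PagedRows {
  let matched = all
  if (q !== null && q !== "") {
    const needle = q.toLowerCase()
    // Assumption: a row matches when any of its values contains the needle.
    matched = all.filter((row) =>
      Object.keys(row).some((key) => String(row[key]).toLowerCase().indexOf(needle) !== -1),
    )
  }
  const rows = matched.slice(offset, offset + limit)
  return { rows, total: matched.length, hasMore: offset + rows.length < matched.length, offset, limit }
}
```

Because both functions take plain strings and arrays, the HTTP route and the bridge adapter could call them with content obtained through the workspace adapter, which is what keeps the two surfaces from drifting apart.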

## 2. File data negotiation API

Add a file-centric endpoint for tabular data transport negotiation.

Preferred shape:

```http
GET /api/v1/files/data?path=data.csv&representation=records&offset=0&limit=100
GET /api/v1/files/data?path=data.parquet&representation=arrow
GET /api/v1/files/data?path=data.csv&representation=perspective&transport.preferred=auto
```

Alternative: separate endpoints:

```http
GET /api/v1/files/records
GET /api/v1/files/arrow
GET /api/v1/files/perspective
```

Recommendation: use one negotiated endpoint once we add Arrow/Perspective, but keep `/records` as a stable compatibility endpoint.

Request:

```ts
type FileDataRepresentation = "records" | "arrow" | "perspective"
type DataTransport = "inline" | "artifact" | "websocket"
type PreferredDataTransport = "auto" | DataTransport
type PayloadFormat = "arrow" | "json"

interface FileDataRequest {
  path: string
  representation?: FileDataRepresentation
  transport?: {
    preferred?: PreferredDataTransport
    accepted?: DataTransport[] // never includes "auto"
    payloadFormat?: PayloadFormat
    maxInlineBytes?: number
  }
  offset?: number
  limit?: number
  q?: string
  table?: string // source-specific preview selector for DB-like files; no arbitrary SQL
  sheet?: string // future spreadsheet selector
}
```
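
The HTTP examples above use dotted keys such as `transport.preferred=auto` for the nested `transport` object. The sketch below shows one way a client might flatten a request into that query string. The `toFileDataUrl` helper and the trimmed `FileDataQuery` type are hypothetical, and sending `accepted` as a comma-separated list is an assumption, since the plan does not specify an array encoding.

```ts
type DataTransport = "inline" | "artifact" | "websocket"

// Trimmed copy of FileDataRequest (the future `sheet` selector is omitted).
interface FileDataQuery {
  path: string
  representation?: "records" | "arrow" | "perspective"
  transport?: {
    preferred?: "auto" | DataTransport
    accepted?: DataTransport[]
    payloadFormat?: "arrow" | "json"
    maxInlineBytes?: number
  }
  offset?: number
  limit?: number
  q?: string
  table?: string
}

// Flattens the request into the dotted query-string form used in the examples.
function toFileDataUrl(req: FileDataQuery): string {
  const pairs: Array<[string, string]> = [["path", req.path]]
  const add = (key: string, value: string | number | undefined): void => {
    if (value !== undefined) pairs.push([key, String(value)])
  }
  add("representation", req.representation)
  add("transport.preferred", req.transport?.preferred)
  // Assumption: the accepted list is sent comma-separated.
  add("transport.accepted", req.transport?.accepted?.join(","))
  add("transport.payloadFormat", req.transport?.payloadFormat)
  add("transport.maxInlineBytes", req.transport?.maxInlineBytes)
  add("offset", req.offset)
  add("limit", req.limit)
  add("q", req.q)
  add("table", req.table)
  const query = pairs
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&")
  return `/api/v1/files/data?${query}`
}
```

With `{ path: "data.csv", representation: "perspective", transport: { preferred: "auto" } }` this reproduces the third example URL in the "Preferred shape" block.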

Response variants:

```ts
type FileDataResponse =
  | FileRecordsResult
  | FileArrowResponse
  | FilePerspectiveResponse
```

Arrow response:

```ts
interface InlinePayloadDescriptor {
  bytes?: number
  data?: unknown
  base64?: string
}

interface ArtifactPayloadDescriptor {
  url: string
  contentType: string
  expiresAt?: string
  bytes?: number
}

interface FileArrowResponse {
  kind: "workspace-file.arrow"
  version: 1
  transport: "inline" | "artifact"
  payloadFormat: "arrow"
  contentType: "application/vnd.apache.arrow.file" | "application/vnd.apache.arrow.stream"
  inline?: InlinePayloadDescriptor
  artifact?: ArtifactPayloadDescriptor
  schema?: Array<{ name: string; type: string }>
  rowCount?: number
  source: { kind: "file"; path: string; fileFormat: string }
}
```

File Perspective response:

```ts
interface FilePerspectiveResponse {
  kind: "workspace-file.perspective"
  version: 1
  transport: DataTransport
  payloadFormat: PayloadFormat
  schema?: Array<{ name: string; type: string }>
  rowCount?: number
  source: { kind: "file"; path: string; fileFormat: string }
  inline?: InlinePayloadDescriptor
  artifact?: ArtifactPayloadDescriptor
  websocket?: { url: string; protocol: "perspective" | "data-bridge-arrow-delta"; sessionId: string }
  viewer?: PerspectiveViewerConfig
}
```
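
One detail worth noting when consuming `FileDataResponse`: as written, `FileRecordsResult` carries no top-level `kind` field, while the Arrow and Perspective variants do. A consumer can therefore narrow on the presence of `kind` first. The sketch below uses trimmed, hypothetical `*Like` types and a hypothetical `describeResponse` helper to show that narrowing.

```ts
// Trimmed stand-ins for the three response variants.
interface RecordsLike {
  rows: Record<string, unknown>[]
  columns: { name: string; type: string }[]
}
interface ArrowLike {
  kind: "workspace-file.arrow"
  transport: "inline" | "artifact"
}
interface PerspectiveLike {
  kind: "workspace-file.perspective"
  transport: "inline" | "artifact" | "websocket"
}
type FileDataResponseLike = RecordsLike | ArrowLike | PerspectiveLike

// Narrows the union: records have no `kind`, the other variants are tagged by it.
function describeResponse(res: FileDataResponseLike): string {
  if (!("kind" in res)) return `records:${res.rows.length}`
  switch (res.kind) {
    case "workspace-file.arrow":
      return `arrow:${res.transport}`
    case "workspace-file.perspective":
      return `perspective:${res.transport}`
  }
}
```

If the records variant later gains its own `kind` tag, the `in` check can be replaced by a plain `switch` over all three tags.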

## 3. Data bridge query API

Keep:

```ts
data.v1.query.run
```

Use it for:

- BSL semantic queries.
- Remote DB/warehouse queries.
- DuckDB/SQLite file-backed queries.
- Structured dashboard queries over queryable sources.

Do **not** use it as the primary implementation for plain CSV/JSON file previews. If a dashboard query points directly at a plain workspace file, the bridge may delegate to shared file-record services internally, but the canonical file data surface remains the file API.

Request direction:

```ts
type DataRef =
  | { kind: "workspace-file"; path: string; fileFormat?: "csv" | "json" | "ndjson" | "parquet" | "arrow"; limit?: number }
  | { kind: "duckdb-file"; path: string; table?: string }
  | { kind: "sqlite-file"; path: string; table?: string }
  | { kind: "semantic-model"; model: string }

interface DataBridgeQueryRunInput {
  source?: string
  query:
    | DataBridgeDashboardQuery
    | DataBridgeSemanticQuery
    | DataBridgeSqlQuery
}

interface DataBridgeDashboardQuery {
  language: "bsl-dashboard"
  model: string
  dataRef?: DataRef
  groupBy?: string[]
  measures?: string[]
  dimensions?: string[]
  filters?: DataBridgeFilterExpression[]
  orderBy?: Array<[field: string, direction: "asc" | "desc"]>
  limit?: number
}
```

Response remains:

```ts
interface DataBridgeTableResult {
  kind: "data-bridge.table"
  version: 1
  columns: DataBridgeColumn[]
  rows: Record<string, unknown>[]
  rowCount: number
  truncated?: boolean
  source?: string
}
```

Important: `data.v1.query.run` should generally return small/medium aggregated JSON table results for metrics/charts. Large raw datasets should not come through this path for visualization.

## 4. Data bridge Perspective prepare API

Add:

```ts
data.v1.perspective.prepare
```

This is the query/adapter equivalent of file perspective preparation. It handles BSL, DuckDB, SQL, remote DB, or any adapter-backed source.

Request:

```ts
type DataTransport = "inline" | "artifact" | "websocket"
type PreferredDataTransport = "auto" | DataTransport
type PayloadFormat = "arrow" | "json"

interface PerspectiveViewerConfig {
  plugin?: "Datagrid" | "Y Line" | "X Bar" | string
  columns?: string[]
  group_by?: string[]
  split_by?: string[]
  sort?: Array<[string, "asc" | "desc"]>
  filter?: unknown[][]
  aggregates?: Record<string, string>
}

interface DataBridgePerspectivePrepareInput {
  source?: string
  datasetId?: string
  query: DataBridgeQueryRunInput["query"]
  viewer?: PerspectiveViewerConfig
  transport?: {
    preferred?: PreferredDataTransport
    accepted: DataTransport[] // never includes "auto"
    payloadFormat?: PayloadFormat
    maxInlineBytes?: number
  }
}
```

Response:

```ts
interface DataBridgePerspectiveResult {
  kind: "data-bridge.perspective"
  version: 1
  transport: DataTransport
  payloadFormat: PayloadFormat
  schema?: DataBridgeColumn[]
  rowCount?: number
  source?: string
  viewer?: PerspectiveViewerConfig
  inline?: InlinePayloadDescriptor
  artifact?: ArtifactPayloadDescriptor
  websocket?: {
    url: string
    protocol: "perspective" | "data-bridge-arrow-delta"
    sessionId: string
  }
}
```

### Transport negotiation rule

The consumer advertises capability and preference; the server decides.

Consumer controls (fields of the request's `transport` object, plus viewer hints):

- `transport.preferred`
- `transport.accepted`
- `transport.payloadFormat`
- `transport.maxInlineBytes`
- viewer configuration hints

Server controls:

- actual transport
- whether inline is allowed
- maximum bytes/rows
- whether Arrow conversion is available
- artifact lifetime
- websocket availability
- security/auth/capability policy

Server must never honor `inline` blindly for large results.
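
The negotiation rule can be sketched as a pure function: the consumer's preference is honored only if it passes server policy, otherwise the server falls back to the first permitted transport. Everything here is illustrative. `selectTransport`, the `ServerTransportPolicy` shape, the fallback order, and the default of `["inline"]` when `accepted` is omitted are assumptions, not part of the plan.

```ts
type DataTransport = "inline" | "artifact" | "websocket"

interface TransportRequest {
  preferred?: "auto" | DataTransport
  accepted?: DataTransport[]
  maxInlineBytes?: number
}

// Hypothetical server-side policy inputs.
interface ServerTransportPolicy {
  maxInlineBytes: number
  artifactsAvailable: boolean
  websocketAvailable: boolean
}

// Returns the transport the server will use, or null when nothing acceptable is safe.
function selectTransport(
  req: TransportRequest,
  policy: ServerTransportPolicy,
  estimatedBytes: number,
): DataTransport | null {
  // Assumption: a request without an accepted list only accepts inline payloads.
  const accepted: DataTransport[] = req.accepted ?? ["inline"]
  // The consumer may lower the inline cap but can never raise the server's cap.
  const inlineCap =
    req.maxInlineBytes === undefined
      ? policy.maxInlineBytes
      : Math.min(policy.maxInlineBytes, req.maxInlineBytes)

  const permitted = (t: DataTransport): boolean => {
    if (accepted.indexOf(t) === -1) return false
    if (t === "inline") return estimatedBytes <= inlineCap
    if (t === "artifact") return policy.artifactsAvailable
    return policy.websocketAvailable
  }

  const preferred = req.preferred
  if (preferred !== undefined && preferred !== "auto" && permitted(preferred)) {
    return preferred
  }
  // Server-chosen fallback order (an assumption).
  const serverOrder: DataTransport[] = ["inline", "artifact", "websocket"]
  for (const t of serverOrder) {
    if (permitted(t)) return t
  }
  return null
}
```

A `null` result corresponds to rejecting the request; a preferred `inline` on an oversized result is downgraded to `artifact` only when the consumer listed `artifact` as accepted.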

## DuckDB Workspace Files

DuckDB files are physically workspace files but semantically queryable databases.

### Preview path: file API

Use file API for discovery and preview:

```http
GET /api/v1/files/duckdb/tables?path=analytics.duckdb
GET /api/v1/files/duckdb/records?path=analytics.duckdb&table=orders&offset=0&limit=100
GET /api/v1/files/data?path=analytics.duckdb&representation=records&table=orders
```

The file subsystem should enforce:

- path validation through workspace adapter
- read-only mode
- table name validation
- page limits
- no arbitrary SQL in preview endpoints
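
Two of the checks above, table name validation and page limits, can be sketched without touching DuckDB at all. The example validates the requested table against the list returned by table discovery (an allowlist, so no caller-supplied text is treated as SQL) and clamps paging values. `validatePreview` and the limit constants are hypothetical.

```ts
interface PreviewRequest {
  table: string
  offset?: number
  limit?: number
}

interface ValidatedPreview {
  table: string
  offset: number
  limit: number
}

const MAX_PREVIEW_LIMIT = 1000 // hypothetical server cap
const DEFAULT_PREVIEW_LIMIT = 100 // hypothetical default page size

// Coerces an optional number to a non-negative integer, using the fallback for bad input.
function toNonNegativeInt(value: number | undefined, fallback: number): number {
  if (value === undefined || !isFinite(value)) return fallback
  return Math.max(0, Math.floor(value))
}

// Rejects unknown tables and clamps paging parameters.
function validatePreview(req: PreviewRequest, knownTables: string[]): ValidatedPreview {
  // The table must exactly match a name found by table discovery.
  if (knownTables.indexOf(req.table) === -1) {
    throw new Error(`Unknown table: ${req.table}`)
  }
  const offset = toNonNegativeInt(req.offset, 0)
  const requested = toNonNegativeInt(req.limit, DEFAULT_PREVIEW_LIMIT)
  const limit = Math.min(MAX_PREVIEW_LIMIT, Math.max(1, requested))
  return { table: req.table, offset, limit }
}
```

Path validation and read-only opening still belong to the workspace adapter and the lease abstraction; this sketch only covers request-level checks.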

### Query path: data bridge

Use data bridge for SQL/semantic queries:

```ts
data.v1.query.run({
  source: "duckdb-file",
  query: {
    language: "sql",
    dialect: "duckdb",
    sql: "select role, count(*) as count from people group by role",
    dataRef: { kind: "duckdb-file", path: "analytics.duckdb" }
  }
})
```

Safer dashboard query shape:

```ts
data.v1.query.run({
  query: {
    language: "bsl-dashboard",
    model: "orders",
    groupBy: ["month"],
    measures: ["revenue"],
    dataRef: {
      kind: "duckdb-file",
      path: "analytics.duckdb",
      table: "orders"
    }
  }
})
```

DuckDB adapter requirements:

- Obtain the DB file through a workspace-owned `materializeWorkspaceFileReadOnly()`/`WorkspaceFileLease` abstraction rather than raw `workspaceRoot` paths.
- The lease resolves path through adapter policy, rejects symlink escapes, creates an immutable temp copy or read-only local path when needed, works for remote/sandbox workspaces, and is scoped to request/session lifetime.
- Open DuckDB against the lease in read-only mode and prevent WAL/sidecar writes to the workspace.
- Disable extension auto-install/loading unless explicitly allowed.
- Restrict filesystem/network functions.
- Run with timeout/cancellation.
- Enforce output row/byte limits.
- For dashboard structured queries, compile to parameterized/quoted DuckDB SQL.
- For direct SQL, require stronger capability such as `data:sql-query` and trusted caller class depending on policy.
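
The "compile to parameterized/quoted DuckDB SQL" requirement might look roughly like the sketch below. It is a simplified stand-in, not the bridge's compiler: the `StructuredQuery` shape, explicit measure definitions (`agg` plus `column`), and equality-only filters are hypothetical, whereas the plan's dashboard query refers to measures by name. Identifiers are double-quoted with embedded quotes doubled (standard SQL quoting), filter values become `?` parameters, and the limit is inlined only after being checked as a non-negative integer.

```ts
const ALLOWED_AGGS = ["sum", "count", "avg", "min", "max"]

interface MeasureSpec { name: string; agg: string; column: string }
interface EqualsFilter { column: string; value: string | number }

// Hypothetical structured query, already resolved against a table.
interface StructuredQuery {
  table: string
  groupBy: string[]
  measures: MeasureSpec[]
  filters: EqualsFilter[]
  limit: number
}

// Quotes an identifier by wrapping it in double quotes and doubling embedded ones.
function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"'
}

function compileStructuredQuery(q: StructuredQuery): { sql: string; params: Array<string | number> } {
  const dims = q.groupBy.map((name) => quoteIdent(name))
  const aggs = q.measures.map((m) => {
    // Aggregate names are checked against an allowlist, never interpolated blindly.
    if (ALLOWED_AGGS.indexOf(m.agg) === -1) throw new Error(`Unsupported aggregate: ${m.agg}`)
    return `${m.agg}(${quoteIdent(m.column)}) AS ${quoteIdent(m.name)}`
  })
  const selectList = dims.concat(aggs)
  if (selectList.length === 0) throw new Error("query selects nothing")
  if (!isFinite(q.limit) || q.limit < 0 || Math.floor(q.limit) !== q.limit) {
    throw new Error("limit must be a non-negative integer")
  }
  // Filter values travel as parameters; only quoted column names enter the SQL text.
  const params = q.filters.map((f) => f.value)
  const where =
    q.filters.length > 0
      ? " WHERE " + q.filters.map((f) => `${quoteIdent(f.column)} = ?`).join(" AND ")
      : ""
  const groupBy = dims.length > 0 ? " GROUP BY " + dims.join(", ") : ""
  const sql = `SELECT ${selectList.join(", ")} FROM ${quoteIdent(q.table)}${where}${groupBy} LIMIT ${q.limit}`
  return { sql, params }
}
```

A real compiler would also check column and table names against the discovered schema; quoting alone only prevents identifier text from being parsed as SQL syntax.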

## BI Dashboard Runtime Changes

### Metrics and charts

Use `data.v1.query.run` for non-file semantic/query sources.

For plain `workspace-file` dataRefs:

- Near-term: `data.v1.query.run` may be used, but the bridge must delegate to the shared file-record service so semantics match the file API.
- The existing browser-side `/files/records` aggregation fallback is temporary compatibility debt. Remove it in Phase 2 or guard it behind an explicit dev-only feature flag with a tracked removal criterion.

Avoid two independent parsers and avoid permanent dashboard-local query execution.

### Perspective panels

For `BSLPerspectiveViewer`:

1. If query `dataRef.kind === "workspace-file"`, call file data/perspective endpoint.
2. Else call `data.v1.perspective.prepare`.
3. Load the result into real Perspective:
   - inline JSON/Arrow for small results
   - artifact Arrow for larger static results
   - websocket once server-side replicated Perspective is available

Do not route large raw Perspective datasets through `data.v1.query.run` JSON rows.
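
Steps 1 and 2 above amount to a branch on `dataRef.kind`. The sketch below returns a plan object rather than performing I/O, so the decision can be tested on its own. `planPerspectiveLoad`, `PerspectivePlan`, and the trimmed `DashboardQueryLike` type are hypothetical names; the URL follows the `/api/v1/files/data` shape proposed earlier in this plan.

```ts
type DataTransport = "inline" | "artifact" | "websocket"

// Trimmed stand-in for a dashboard query.
interface DashboardQueryLike {
  language: string
  model: string
  dataRef?: { kind: string; path?: string }
}

type PerspectivePlan =
  | { via: "file-api"; url: string }
  | {
      via: "data-bridge"
      operation: "data.v1.perspective.prepare"
      input: {
        query: DashboardQueryLike
        transport: { preferred: "auto"; accepted: DataTransport[] }
      }
    }

// Chooses the preparation path for a Perspective panel without performing any I/O.
function planPerspectiveLoad(query: DashboardQueryLike, accepted: DataTransport[]): PerspectivePlan {
  const ref = query.dataRef
  if (ref !== undefined && ref.kind === "workspace-file" && ref.path !== undefined) {
    const path = encodeURIComponent(ref.path)
    return {
      via: "file-api",
      url: `/api/v1/files/data?path=${path}&representation=perspective&transport.preferred=auto`,
    }
  }
  // Everything else (BSL, DuckDB, SQL, remote DB) goes through the bridge.
  return {
    via: "data-bridge",
    operation: "data.v1.perspective.prepare",
    input: { query, transport: { preferred: "auto", accepted } },
  }
}
```

The viewer would then execute the plan and load the response according to its `transport` field, which is where the inline, artifact, and websocket cases of step 3 diverge.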

### Eval/E2E stitching

Add a true end-to-end workflow test:

1. Run the BI dashboard agent eval.
2. Locate generated `*.dashboard.json` in the eval workspace.
3. Start playground against that workspace.
4. Open the generated dashboard in browser.
5. Assert:
   - dashboard title visible
   - at least one metric has non-placeholder value
   - at least one chart/table has rows
   - no “No live data source configured yet” placeholder
   - if Perspective component exists, it loads through Perspective prepare path
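
The assertions in step 5 could be kept independent of the browser driver by checking a plain snapshot object that the test extracts from the page. The sketch below assumes such a snapshot; `DashboardSnapshot`, `findRenderFailures`, and the set of strings treated as empty metric values are hypothetical.

```ts
// Hypothetical data a browser test would extract from the rendered dashboard.
interface DashboardSnapshot {
  titleVisible: boolean
  metricValues: string[] // text of each metric tile
  rowCounts: number[] // rows rendered per chart/table
  pageText: string
  hasPerspectivePanel: boolean
  perspectivePrepared: boolean // true if the panel loaded via the prepare path
}

const PLACEHOLDER_TEXT = "No live data source configured yet"
// Assumption: these strings mean a metric has no real value.
const EMPTY_METRICS = ["", "-", "N/A"]

// Returns a list of failed checks; an empty list means the render passed.
function findRenderFailures(s: DashboardSnapshot): string[] {
  const failures: string[] = []
  if (!s.titleVisible) failures.push("dashboard title not visible")
  const hasRealMetric = s.metricValues.some((v) => EMPTY_METRICS.indexOf(v.trim()) === -1)
  if (!hasRealMetric) failures.push("no metric with a non-placeholder value")
  if (!s.rowCounts.some((n) => n > 0)) failures.push("no chart/table has rows")
  if (s.pageText.indexOf(PLACEHOLDER_TEXT) !== -1) failures.push("placeholder text present")
  if (s.hasPerspectivePanel && !s.perspectivePrepared) {
    failures.push("Perspective panel did not load through the prepare path")
  }
  return failures
}
```

Returning all failures at once, rather than stopping at the first, makes a failing CI run easier to diagnose.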

## Implementation Phases

### Phase 0 — Document and freeze architecture

- Add this plan.
- Update existing data-bridge and dashboard plans to reference this split.
- Mark current `data-bridge` workspace-file parser as temporary.

Acceptance:

- Reviewers agree on file API vs data bridge ownership.

### Phase 1 — Extract file records core

- Move file-record parsing/paging/type inference into reusable server module.
- Keep existing `/api/v1/files/records` behavior unchanged.
- Add unit tests around extracted module.
- Add contract tests proving `/api/v1/files/records` and `data-bridge` delegated workspace-file reads return compatible rows/columns/limits.

Acceptance:

- Existing file route tests pass unchanged.
- Extracted module can be imported without Fastify.

### Phase 2 — Make `data-bridge` delegate workspace-file reads

- Replace local CSV/JSON/NDJSON parser in `plugins/data-bridge` with shared file-record service.
- Preserve path/security semantics via workspace adapter, not raw `workspaceRoot` reads.
- Keep traversal/symlink tests or adapt them to workspace adapter semantics.

Acceptance:

- `@hachej/boring-data-bridge` has no custom CSV parser.
- Bridge and `/files/records` agree on rows/columns/limits.
- BI dashboard no longer has a permanent browser aggregation fallback; if retained temporarily, it is feature-flagged and documented for removal.

### Phase 3 — Add file data negotiation endpoint

- Add `/api/v1/files/data` or explicit `/files/arrow` + `/files/perspective` endpoint.
- Initially support `representation=records` and small `representation=perspective` with `transport.preferred=inline` and `payloadFormat=json`.
- Add Arrow support behind capability/dependency detection.

Acceptance:

- File endpoint can return records and a Perspective inline descriptor.
- Large inline requests are rejected or downgraded to artifact when artifact support exists.

### Phase 4 — Add `data.v1.perspective.prepare`

- Add shared contract to `@hachej/boring-data-bridge`.
- Implement inline JSON first for small results.
- Add Arrow artifact once artifact storage exists.
- Add websocket later.

Acceptance:

- `BSLPerspectiveViewer` no longer calls `data.v1.query.run` for large/raw Perspective loads.
- Server chooses actual transport from accepted transports.

### Phase 5 — DuckDB file support

- Add file API table discovery and paginated preview for `.duckdb`.
- Add `duckdb-file` data bridge adapter for query execution.
- Add `WorkspaceFileLease`/read-only materialization support for local and non-local workspace adapters.
- Add direct SQL gating policy.
- Add structured dashboard query compiler for DuckDB tables.

Acceptance:

- A workspace `.duckdb` file can be previewed without SQL.
- A dashboard can query a DuckDB table through `data.v1.query.run`.
- Direct SQL requires explicit capability and respects timeouts/output limits.

### Phase 6 — True generated dashboard render E2E

- Extend eval harness or add a script that runs agent eval then browser render against generated output.
- Assert real data rendering, not just valid JSON.

Acceptance:

- CI can prove: agent creates dashboard → file exists → dashboard opens → data renders.

## Security Requirements

- File APIs must use workspace adapter path validation, not raw filesystem reads.
- Symlink traversal must remain impossible for local filesystem workspaces.
- Direct SQL must be gated separately from structured dashboard queries.
- DuckDB/SQLite workspace files must be accessed through read-only leases/materialized snapshots, never raw unchecked paths.
- Direct BSL Python query strings remain trusted-only with `data:bsl-query-string`.
- Inline transports must enforce byte/row limits server-side.
- Websocket transports must bind to workspace/session auth and expire.
- Artifacts must have content type, no-sniff headers, expiry, and workspace/session authorization.
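
For the artifact requirement, a minimal sketch of the response-header side is shown below. It enforces expiry and sets an explicit content type plus `X-Content-Type-Options: nosniff`. The `artifactHeaders` helper, the required `expiresAt`, and the `Cache-Control` value are assumptions, and workspace/session authorization is deliberately out of scope here.

```ts
// Trimmed artifact metadata; expiresAt is required in this sketch.
interface ArtifactMeta {
  contentType: string
  expiresAt: string // ISO 8601 timestamp
  bytes?: number
}

// Returns response headers for a live artifact, or null when it has expired
// (or its expiry cannot be parsed), so the caller can refuse to serve it.
function artifactHeaders(meta: ArtifactMeta, nowMs: number): Record<string, string> | null {
  const expiresMs = Date.parse(meta.expiresAt)
  if (isNaN(expiresMs) || expiresMs <= nowMs) return null
  const headers: Record<string, string> = {
    "Content-Type": meta.contentType,
    // Stops browsers from guessing a different type for the payload.
    "X-Content-Type-Options": "nosniff",
    // Assumption: artifacts are per-session and should not be cached by shared caches.
    "Cache-Control": "private, no-store",
  }
  if (meta.bytes !== undefined) headers["Content-Length"] = String(meta.bytes)
  return headers
}
```

Treating an unparseable expiry as expired keeps the failure mode closed: a malformed descriptor results in a refusal rather than an artifact that never expires.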

## Open Questions

1. Should `/api/v1/files/data` replace `/api/v1/files/records` long term, or should `/records` stay as the simple stable API?
2. Where should Arrow conversion live: `@hachej/boring-agent`, `@hachej/boring-data-bridge`, or a dedicated optional package?
3. What artifact store should be used for Arrow artifacts in local/dev/serverless modes?
4. Should DuckDB direct SQL be available to browser callers with a constrained read-only policy, or only runtime/server callers?
5. Should `WorkspaceFileLease` live in `@hachej/file-data/server` or as a minimal workspace adapter capability consumed by that package?
6. Should Perspective websocket mode live in `data-bridge`, the workspace server, or a dedicated `perspective-bridge` plugin?

## Recommended Immediate Next Step

Do **Phase 1 + Phase 2** before adding more dashboard features:

- Extract the existing file-record implementation.
- Make `data-bridge` delegate workspace-file reads to it.
- Keep `data.v1.query.run` for current dashboard metrics/charts.

This removes duplicated parsers and gives a safe foundation for Arrow/Perspective transport negotiation.