@jterrats/open-orchestra 1.0.11 → 1.0.13
- package/CHANGELOG.md +56 -0
- package/dist/command-manifest.js +1 -1
- package/dist/command-manifest.js.map +1 -1
- package/dist/runtime-claude-native-bridge.d.ts +2 -0
- package/dist/runtime-claude-native-bridge.js +59 -0
- package/dist/runtime-claude-native-bridge.js.map +1 -0
- package/dist/runtime-execution.js +3 -0
- package/dist/runtime-execution.js.map +1 -1
- package/dist/runtime-parent-action-capacity.d.ts +10 -0
- package/dist/runtime-parent-action-capacity.js +110 -0
- package/dist/runtime-parent-action-capacity.js.map +1 -0
- package/dist/runtime-parent-action-dispatch.d.ts +1 -7
- package/dist/runtime-parent-action-dispatch.js +84 -90
- package/dist/runtime-parent-action-dispatch.js.map +1 -1
- package/dist/runtime-parent-action-eligibility.d.ts +10 -1
- package/dist/runtime-parent-action-eligibility.js +15 -2
- package/dist/runtime-parent-action-eligibility.js.map +1 -1
- package/dist/runtime-parent-action-fallback.d.ts +11 -0
- package/dist/runtime-parent-action-fallback.js +12 -0
- package/dist/runtime-parent-action-fallback.js.map +1 -0
- package/dist/task-graph-commands.js +18 -14
- package/dist/task-graph-commands.js.map +1 -1
- package/dist/types/runtime.d.ts +8 -0
- package/dist/workflow-run-commands.js +17 -9
- package/dist/workflow-run-commands.js.map +1 -1
- package/dist/workflow-task-service.js +27 -1
- package/dist/workflow-task-service.js.map +1 -1
- package/docs/audio-video-transcription-skill.md +441 -45
- package/docs/autonomous-workflow.md +38 -0
- package/docs/backlog/web-code-editor-lsp-spike.md +289 -0
- package/docs/context-vault.md +80 -0
- package/docs/runtime-adapters.md +11 -0
- package/docs/site-manifest.json +1 -0
- package/package.json +1 -1
|
@@ -1,58 +1,454 @@
|
|
|
1
|
-
# Audio/Video Transcription Skill
|
|
1
|
+
# Audio/Video Transcription Skill Spike
|
|
2
2
|
|
|
3
|
-
|
|
4
|
-
`
|
|
5
|
-
|
|
6
|
-
|
|
3
|
+
Task: `GH-367-TRANSCRIPTION-SKILL-SPIKE`
|
|
4
|
+
Issue: `GH-367` - Audio and video transcription evidence skill
|
|
5
|
+
Status: proposed architecture spike
|
|
6
|
+
Lead role: Architect
|
|
7
|
+
Review roles covered: Security/Privacy, QA, Developer
|
|
7
8
|
|
|
8
|
-
##
|
|
9
|
+
## Goal
|
|
9
10
|
|
|
10
|
-
-
|
|
11
|
-
|
|
12
|
-
|
|
13
|
-
- External transcription providers require explicit policy opt-in before media
|
|
14
|
-
or raw transcript text leaves the workspace.
|
|
15
|
-
- Transcript artifacts must redact secrets, tokens, credentials, configured PII,
|
|
16
|
-
and regulated data markers before persistence.
|
|
17
|
-
- Regulated contexts should record consent and retention notes before release.
|
|
11
|
+
Define a local-first transcription skill that turns workflow-local audio and
|
|
12
|
+
video artifacts into searchable, reviewable evidence without uploading media or
|
|
13
|
+
raw transcript text by default.
|
|
18
14
|
|
|
19
|
-
|
|
15
|
+
The skill should support interviews, demos, sprint reviews, QA recordings,
|
|
16
|
+
support sessions, discovery calls, and voice notes as task evidence. The first
|
|
17
|
+
release should be an on-demand CLI/service workflow, not a real-time meeting
|
|
18
|
+
bot.
|
|
20
19
|
|
|
21
|
-
|
|
20
|
+
## Acceptance Criteria From GH-367
|
|
22
21
|
|
|
23
|
-
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
-
|
|
27
|
-
-
|
|
28
|
-
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
-
|
|
32
|
-
|
|
33
|
-
|
|
22
|
+
- The skill activates on demand from transcription, audio, video, demo
|
|
23
|
+
recording, sprint review, interview, QA evidence, support call, or discovery
|
|
24
|
+
session signals.
|
|
25
|
+
- Local/offline transcription is the default path when available.
|
|
26
|
+
- External provider transcription requires explicit policy opt-in.
|
|
27
|
+
- Provenance records source file path or artifact id, hash, duration, language,
|
|
28
|
+
engine/provider/model, timestamp, actor, task id, and consent/retention notes
|
|
29
|
+
when provided.
|
|
30
|
+
- Outputs include human-readable Markdown and structured JSON; VTT/SRT are
|
|
31
|
+
available when timestamps are reliable.
|
|
32
|
+
- The skill extracts speakers, timestamps, decisions, risks, action items,
|
|
33
|
+
acceptance-criteria candidates, defects, and lesson-learned candidates.
|
|
34
|
+
- Sensitive values are redacted before persistence.
|
|
35
|
+
- QA evidence can reference transcripts and timestamp ranges.
|
|
36
|
+
- Failure modes are explicit and non-destructive.
|
|
37
|
+
- Tests cover activation, policy gating, redaction, provenance, output formats,
|
|
38
|
+
artifact path safety, and degraded local/offline behavior.
|
|
39
|
+
- Documentation explains local-first setup, privacy defaults, provider opt-in,
|
|
40
|
+
retention, and evidence usage.
|
|
34
41
|
|
|
35
|
-
|
|
36
|
-
SRT are appropriate only when timestamp confidence is good enough for reviewers
|
|
37
|
-
to navigate the media.
|
|
42
|
+
## Architecture Decision
|
|
38
43
|
|
|
39
|
-
|
|
44
|
+
Status: proposed
|
|
40
45
|
|
|
41
|
-
|
|
42
|
-
|
|
43
|
-
|
|
44
|
-
|
|
46
|
+
Decision: build a provider-neutral transcription pipeline with a local engine as
|
|
47
|
+
the default adapter, an external-provider adapter family behind explicit policy
|
|
48
|
+
gates, and a single evidence artifact contract shared by Markdown, JSON, VTT,
|
|
49
|
+
and SRT exporters.
|
|
45
50
|
|
|
46
|
-
|
|
51
|
+
Rationale:
|
|
52
|
+
|
|
53
|
+
- The product need is governed evidence, not generic media processing.
|
|
54
|
+
- Local-first execution minimizes default privacy and compliance risk.
|
|
55
|
+
- Provider-neutral interfaces let the product add `whisper.cpp`,
|
|
56
|
+
`faster-whisper`, OpenAI, Google Cloud Speech-to-Text, or future adapters
|
|
57
|
+
without changing the evidence contract.
|
|
58
|
+
- A single provenance envelope makes transcripts auditable across local and
|
|
59
|
+
external execution.
|
|
60
|
+
- Redaction must happen before persisted summaries, handoffs, and evidence
|
|
61
|
+
links are written.
|
|
62
|
+
|
|
63
|
+
Alternatives considered:
|
|
64
|
+
|
|
65
|
+
| Option | Benefits | Costs / Risks | Decision |
|
|
66
|
+
| --- | --- | --- | --- |
|
|
67
|
+
| Local-only skill | Strong privacy posture, works offline after setup, lower vendor risk | Hardware variability, slower on large files, local engine install friction | Use as default path |
|
|
68
|
+
| External-only provider | Better managed scaling and language/diarization options | Media leaves workspace, policy/compliance burden, cost controls required | Reject as default |
|
|
69
|
+
| Provider-specific OpenAI or Google command | Fast MVP if one provider is already approved | Locks product contract to one vendor shape | Reject for core architecture |
|
|
70
|
+
| Manual transcript attachment only | Minimal implementation | Does not satisfy evidence, provenance, or redaction requirements | Reject |
|
|
71
|
+
|
|
72
|
+
## Scope
|
|
73
|
+
|
|
74
|
+
In scope for the first implementation:
|
|
75
|
+
|
|
76
|
+
- Skill activation metadata and on-demand command/service contract.
|
|
77
|
+
- Workflow-local source validation for files and evidence artifact references.
|
|
78
|
+
- Audio extraction and media metadata probing through a bounded media adapter.
|
|
79
|
+
- Local transcription adapter contract with one supported local engine.
|
|
80
|
+
- Explicit opt-in policy contract for external providers.
|
|
81
|
+
- Transcript normalization, redaction, provenance, and artifact writing.
|
|
82
|
+
- Markdown and JSON transcript evidence output.
|
|
83
|
+
- VTT/SRT export when segment timestamps are available.
|
|
84
|
+
- QA evidence mapping from acceptance criteria to timestamp ranges.
|
|
85
|
+
|
|
86
|
+
Out of scope:
|
|
87
|
+
|
|
88
|
+
- Real-time meeting attendance or live call capture.
|
|
89
|
+
- Automatic external uploads.
|
|
90
|
+
- Medical, legal, or financial interpretation as final advice.
|
|
91
|
+
- Long-term media archive management beyond retention metadata.
|
|
92
|
+
- Speaker identification as identity proof. Diarization is a best-effort label.
|
|
93
|
+
|
|
94
|
+
## Proposed Component Boundaries
|
|
95
|
+
|
|
96
|
+
```text
|
|
97
|
+
Task / CLI / API request
|
|
98
|
+
-> Skill activation planner
|
|
99
|
+
-> Transcription request validator
|
|
100
|
+
-> Media probe and extraction adapter
|
|
101
|
+
-> Policy gate
|
|
102
|
+
-> Engine adapter
|
|
103
|
+
-> Transcript normalizer
|
|
104
|
+
-> Redaction service
|
|
105
|
+
-> Finding extractor
|
|
106
|
+
-> Evidence artifact writer
|
|
107
|
+
```
|
|
108
|
+
|
|
109
|
+
Recommended modules for implementation:
|
|
110
|
+
|
|
111
|
+
| Boundary | Responsibility | Notes |
|
|
112
|
+
| --- | --- | --- |
|
|
113
|
+
| `transcription-request` domain | Narrow request, policy, output, and failure types | No provider-specific fields in public APIs except through typed adapter config |
|
|
114
|
+
| `transcription-service` | Orchestrates validation, media probing, engine execution, redaction, extraction, and artifact writing | No shell string interpolation; adapters receive args arrays |
|
|
115
|
+
| `media-probe-adapter` | Reads duration, codec, stream info, and extracts audio when needed | Wrap `ffprobe`/`ffmpeg` with `spawn`/`execFile` args arrays |
|
|
116
|
+
| `transcription-engine-adapter` | Runs local or external provider and returns normalized segments | Provider-specific parsing stays here |
|
|
117
|
+
| `transcript-redaction-service` | Applies secret, credential, configured PII, and regulated marker redaction | Must fail closed when redaction cannot run |
|
|
118
|
+
| `transcript-evidence-writer` | Writes Markdown, JSON, VTT, SRT, and evidence metadata | Atomic writes; no raw unredacted transcript persistence |
|
|
119
|
+
| `workflow-evidence-integration` | Links transcript artifact ids to task evidence and acceptance criteria | Keeps QA evidence references stable |
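The fail-closed note on `transcript-redaction-service` can be illustrated with a small sketch. Everything below is an assumption for illustration: the rule set, the `[REDACTED:type]` placeholder format, and the names `redactTranscript` and `redactOrFail` are hypothetical and not part of the package. The point is only that counts are collected per type and that a thrown redaction error never yields text for the writer.

```typescript
// Hypothetical redaction rules; real rule sets would be configured per workspace.
type RedactionType = "secret" | "email" | "phone";

interface RedactionResult {
  text: string;
  countsByType: Record<RedactionType, number>;
}

const RULES: { type: RedactionType; pattern: RegExp }[] = [
  // Illustrative token prefixes only; not an exhaustive secret detector.
  { type: "secret", pattern: /\b(?:sk|ghp|xoxb)[-_][A-Za-z0-9_-]{16,}\b/g },
  { type: "email", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { type: "phone", pattern: /\+?\d[\d\s().-]{8,}\d/g },
];

// Replace every match with a typed placeholder and count matches per type.
function redactTranscript(text: string): RedactionResult {
  const countsByType: Record<RedactionType, number> = { secret: 0, email: 0, phone: 0 };
  let redacted = text;
  for (const rule of RULES) {
    redacted = redacted.replace(rule.pattern, () => {
      countsByType[rule.type] += 1;
      return `[REDACTED:${rule.type}]`;
    });
  }
  return { text: redacted, countsByType };
}

type RedactionOutcome =
  | { ok: true; result: RedactionResult }
  | { ok: false; failure: "redaction_failed" };

// Fail closed: if the redactor throws, no transcript text is returned at all.
function redactOrFail(
  text: string,
  redact: (input: string) => RedactionResult = redactTranscript,
): RedactionOutcome {
  try {
    return { ok: true, result: redact(text) };
  } catch (_err) {
    return { ok: false, failure: "redaction_failed" };
  }
}
```

A writer built on this would accept only the `ok: true` branch, so a redaction failure cannot silently persist raw text.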
|
|
120
|
+
|
|
121
|
+
## Local-First Stack Options
|
|
122
|
+
|
|
123
|
+
Use `ffmpeg`/`ffprobe` as the media utility layer when present. The
|
|
124
|
+
implementation should detect missing binaries and return a degraded evidence
|
|
125
|
+
result rather than requiring installation during command execution.
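A minimal sketch of the probe step, assuming Node's `child_process.execFile` and the standard `ffprobe` flags `-v error`, `-print_format json`, `-show_format`, `-show_streams`, and `-i`. The `ProbeOutcome` type and the function names are hypothetical. It shows the two properties described above: arguments travel as an array (no shell string), and a missing binary produces a degraded result instead of an exception.

```typescript
import { execFile } from "node:child_process";

// Hypothetical outcome type for the media probe step.
type ProbeOutcome =
  | { kind: "ok"; rawJson: string }
  | { kind: "degraded"; reason: "ffprobe_missing" | "probe_failed"; guidance: string };

// Build the argument array; the media path is a single element and is never
// concatenated into a shell command string.
function buildFfprobeArgs(mediaPath: string): string[] {
  return ["-v", "error", "-print_format", "json", "-show_format", "-show_streams", "-i", mediaPath];
}

// Run ffprobe and map failures to a degraded result.
function probeMedia(mediaPath: string, binary: string = "ffprobe"): Promise<ProbeOutcome> {
  return new Promise((resolve) => {
    execFile(binary, buildFfprobeArgs(mediaPath), (error, stdout) => {
      if (!error) {
        resolve({ kind: "ok", rawJson: String(stdout) });
        return;
      }
      // ENOENT means the binary itself could not be found.
      const missing = (error as { code?: unknown }).code === "ENOENT";
      resolve({
        kind: "degraded",
        reason: missing ? "ffprobe_missing" : "probe_failed",
        guidance: missing
          ? "Install ffmpeg/ffprobe and re-run; the source file was not modified."
          : "Probe failed; check the codec or convert the file before retrying.",
      });
    });
  });
}
```

The guidance strings are placeholders; the source only requires that setup guidance is returned and that the source file is not mutated.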
|
|
126
|
+
|
|
127
|
+
| Stack | Fit | Strengths | Constraints | Recommended use |
|
|
128
|
+
| --- | --- | --- | --- | --- |
|
|
129
|
+
| `whisper.cpp` | Local CPU/GPU transcription | Portable C/C++ runtime, offline operation, simple deployment model | Model download and hardware-dependent speed; diarization is not the core strength | Default MVP local engine candidate |
|
|
130
|
+
| `faster-whisper` | Local or self-hosted accelerated Whisper inference | CTranslate2-backed performance, useful with GPU or optimized CPU | Python/runtime packaging and model cache management | Phase 2 local high-throughput adapter |
|
|
131
|
+
| OS-native speech APIs | Local-ish desktop support where available | Low setup on specific platforms | Platform variance, privacy terms differ, not CI-friendly | Defer unless a product platform needs it |
|
|
132
|
+
| Self-hosted transcription service | Team-managed private endpoint | Centralized resource control, can use GPUs | Becomes infrastructure and access-control work | Later enterprise option |
|
|
133
|
+
|
|
134
|
+
Local setup principles:
|
|
135
|
+
|
|
136
|
+
- Never auto-download models without explicit user action or config.
|
|
137
|
+
- Store model cache locations in typed configuration.
|
|
138
|
+
- Record model name, model path or digest when available, and runtime version.
|
|
139
|
+
- Enforce default size and duration limits before invoking engines.
|
|
140
|
+
- Keep temp audio files inside a workflow-owned temp directory and delete them
|
|
141
|
+
after successful artifact generation unless retention policy says otherwise.
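The size and duration limit in the list above could be enforced with a preflight check like the following sketch. The type names, field names, and failure codes are assumptions; the source only states that limits are checked before any engine runs and that the configured limit is recorded.

```typescript
// Hypothetical limit and probe shapes for illustration.
interface MediaLimits {
  maxBytes: number;
  maxDurationMs: number;
}

interface MediaProbe {
  sizeBytes: number;
  durationMs: number;
}

type PreflightResult =
  | { ok: true }
  | { ok: false; failure: "oversized_file" | "duration_exceeded"; limit: number; actual: number };

// Check limits before any engine is invoked; the failure carries the
// configured limit so it can be recorded in the evidence report.
function preflightMedia(probe: MediaProbe, limits: MediaLimits): PreflightResult {
  if (probe.sizeBytes > limits.maxBytes) {
    return { ok: false, failure: "oversized_file", limit: limits.maxBytes, actual: probe.sizeBytes };
  }
  if (probe.durationMs > limits.maxDurationMs) {
    return { ok: false, failure: "duration_exceeded", limit: limits.maxDurationMs, actual: probe.durationMs };
  }
  return { ok: true };
}
```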
|
|
142
|
+
|
|
143
|
+
## External Provider Opt-In
|
|
144
|
+
|
|
145
|
+
External transcription is allowed only when all of these are true:
|
|
146
|
+
|
|
147
|
+
- The task or workspace policy explicitly enables the provider.
|
|
148
|
+
- The source artifact is classified as allowed for external processing.
|
|
149
|
+
- Required consent and retention notes are present for regulated or user
|
|
150
|
+
research recordings.
|
|
151
|
+
- Secrets and obvious credentials are screened before upload where technically
|
|
152
|
+
possible.
|
|
153
|
+
- Cost, size, duration, and rate-limit budgets are configured.
|
|
154
|
+
- The evidence artifact records provider, model, endpoint family, request id or
|
|
155
|
+
equivalent correlation id, timestamp, actor, and policy id.
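The conditions above combine with a logical AND, which the following sketch makes explicit. The request shape, the field names, and the reason codes are hypothetical. Pre-upload secret screening and provenance recording are left out here because they depend on the media and on the provider response.

```typescript
// Consent values follow the statuses listed in this document; the rest is assumed.
type ConsentStatus = "not_required" | "provided" | "missing" | "unknown";

interface ExternalProviderRequest {
  policyId: string;
  providerEnabledByPolicy: boolean;
  sourceAllowedForExternalProcessing: boolean;
  consentRequired: boolean;
  consentStatus: ConsentStatus;
  retentionNotePresent: boolean;
  budgetsConfigured: boolean;
}

interface PolicyDecision {
  allowed: boolean;
  policyId: string;
  reasons: string[];
}

// Every condition must hold; each failing condition is reported so the
// evidence artifact can record why external execution was blocked.
function evaluateExternalProviderPolicy(req: ExternalProviderRequest): PolicyDecision {
  const reasons: string[] = [];
  if (!req.providerEnabledByPolicy) reasons.push("provider_not_opted_in");
  if (!req.sourceAllowedForExternalProcessing) reasons.push("source_not_allowed_for_external");
  if (req.consentRequired && (req.consentStatus !== "provided" || !req.retentionNotePresent)) {
    reasons.push("consent_or_retention_missing");
  }
  if (!req.budgetsConfigured) reasons.push("budgets_not_configured");
  return { allowed: reasons.length === 0, policyId: req.policyId, reasons };
}
```

Note that the presence of provider credentials is deliberately not an input: permission comes from policy alone.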
|
|
156
|
+
|
|
157
|
+
Provider candidates:
|
|
158
|
+
|
|
159
|
+
| Provider family | Fit | Required controls |
|
|
160
|
+
| --- | --- | --- |
|
|
161
|
+
| OpenAI Audio transcription | Optional hosted provider for approved workspaces | Server-side API key only, HTTPS, request/correlation id logging, pinned model config, no client-side key exposure |
|
|
162
|
+
| Google Cloud Speech-to-Text | Optional hosted provider for enterprise Google Cloud tenants | IAM, regional endpoint review, audit logging, quota/budget controls, encryption and retention review |
|
|
163
|
+
| Other providers | Future adapters | Same policy gate and provenance contract before enablement |
|
|
164
|
+
|
|
165
|
+
External-provider adapters must not be referenced from product logic directly.
|
|
166
|
+
The service should receive a `TranscriptionEngineAdapter` selected by policy and
|
|
167
|
+
configuration.
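One possible shape for that selection step is sketched below. The document names `TranscriptionEngineAdapter` but does not define it, so the interface members, `selectAdapter`, and the result codes are all assumptions. The sketch follows two rules stated in this document: external execution is blocked without opt-in, and a missing local engine degrades rather than switching to a hosted provider.

```typescript
// Hypothetical adapter contract for illustration.
interface NormalizedSegment {
  startMs: number;
  endMs: number;
  text: string;
}

interface TranscriptionEngineAdapter {
  readonly id: string;
  readonly executionMode: "local" | "external";
  isAvailable(): boolean;
  transcribe(audioPath: string): Promise<NormalizedSegment[]>;
}

type AdapterSelection =
  | { kind: "selected"; adapter: TranscriptionEngineAdapter }
  | { kind: "degraded"; reason: "local_engine_missing" }
  | { kind: "blocked"; reason: "provider_not_opted_in" | "provider_unavailable" };

// Select only within the requested execution mode; never cross from local
// to external as an automatic fallback.
function selectAdapter(
  adapters: TranscriptionEngineAdapter[],
  requestedMode: "local" | "external",
  externalProviderAllowed: boolean,
): AdapterSelection {
  if (requestedMode === "external" && !externalProviderAllowed) {
    return { kind: "blocked", reason: "provider_not_opted_in" };
  }
  const match = adapters.filter((a) => a.executionMode === requestedMode && a.isAvailable())[0];
  if (match) return { kind: "selected", adapter: match };
  return requestedMode === "local"
    ? { kind: "degraded", reason: "local_engine_missing" }
    : { kind: "blocked", reason: "provider_unavailable" };
}
```

Product logic would depend only on the returned adapter, so provider-specific code stays behind the interface.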
|
|
168
|
+
|
|
169
|
+
## Privacy, Security, And Compliance Controls
|
|
170
|
+
|
|
171
|
+
Data classification:
|
|
172
|
+
|
|
173
|
+
| Data | Default classification | Handling |
|
|
174
|
+
| --- | --- | --- |
|
|
175
|
+
| Source audio/video | Sensitive evidence | Workflow-local only by default; external provider requires opt-in |
|
|
176
|
+
| Raw transcript before redaction | Sensitive derived data | In memory or temp-only; do not persist unless explicitly configured for debugging in a secure path |
|
|
177
|
+
| Redacted transcript | Workflow evidence | Persist as Markdown/JSON under evidence output path |
|
|
178
|
+
| Metadata/provenance | Audit data | Persist with artifact; avoid leaking local absolute paths in user-facing output when not needed |
|
|
179
|
+
| Consent/retention notes | Compliance metadata | Required for regulated/user research contexts before release |
|
|
180
|
+
|
|
181
|
+
Security requirements:
|
|
182
|
+
|
|
183
|
+
- Validate paths are inside the workspace or approved evidence directories.
|
|
184
|
+
- Reject traversal, symlinks that resolve outside allowed roots, unreadable
|
|
185
|
+
files, and unsupported artifact references.
|
|
186
|
+
- Use `spawn` or `execFile` with args arrays for `ffmpeg`, `ffprobe`, local
|
|
187
|
+
engines, and helper commands.
|
|
188
|
+
- Validate external URLs use `https://` and configured allowlisted hosts.
|
|
189
|
+
- Load provider secrets only from server-side environment or secret manager
|
|
190
|
+
integration; never write them to transcript artifacts or logs.
|
|
191
|
+
- Redact before persistence and before summary generation.
|
|
192
|
+
- Fail closed on redaction failure, provider policy denial, missing consent
|
|
193
|
+
where required, or unsupported file path.
|
|
194
|
+
- Cap file size, duration, segment count, output size, and provider cost.
|
|
195
|
+
- Store failure reports without copying raw transcript or media bytes.
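The path rule in the list above can be sketched with Node's `path` module. This is a lexical check only and the function name is hypothetical; a real implementation would also need to resolve symlinks first (for example with `fs.realpathSync`) so that a link inside the workspace cannot point outside it.

```typescript
import * as path from "node:path";

// Lexical containment check: true only when the candidate resolves to a
// location strictly inside one of the allowed roots. Symlinks are NOT
// resolved here; do that before calling this function.
function isInsideAllowedRoots(candidatePath: string, allowedRoots: string[]): boolean {
  const resolved = path.resolve(candidatePath);
  return allowedRoots.some((root) => {
    const rel = path.relative(path.resolve(root), resolved);
    if (rel === "" || rel === "..") return false; // the root itself or its parent
    if (rel.indexOf(".." + path.sep) === 0) return false; // escapes via traversal
    return !path.isAbsolute(rel); // different drive or root on some platforms
  });
}
```

Comparing with `path.relative` instead of a string prefix avoids treating `/ws-other` as if it were inside `/ws`.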
|
|
196
|
+
|
|
197
|
+
Compliance requirements:
|
|
198
|
+
|
|
199
|
+
- Record consent status as one of `not_required`, `provided`, `missing`, or
|
|
200
|
+
`unknown`.
|
|
201
|
+
- Record retention class and delete/review date when provided.
|
|
202
|
+
- Mark regulated domain signals: health, finance, legal, government id,
|
|
203
|
+
children/minors, biometric voice data, and customer support PII.
|
|
204
|
+
- Require human review before using transcript-derived medical, legal,
|
|
205
|
+
financial, hiring, or support-account decisions as final determinations.
|
|
206
|
+
- Keep tenant policy explicit; do not infer permission from provider credentials
|
|
207
|
+
alone.
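As a rough illustration, the consent statuses above map naturally to a string union, and release gating becomes a small pure function. The metadata shape, `canApproveRelease`, and the reason codes are assumptions; treating `unknown` like `missing` in regulated contexts is also an assumption, chosen as the conservative reading.

```typescript
type ConsentStatus = "not_required" | "provided" | "missing" | "unknown";

// Hypothetical compliance metadata attached to a transcript artifact.
interface ComplianceMetadata {
  consentStatus: ConsentStatus;
  regulatedSignals: string[]; // e.g. "health", "finance", "biometric_voice"
  retentionClass?: string;
}

interface ReleaseDecision {
  approved: boolean;
  reason?: string;
}

// Block release when a regulated context lacks consent, unless the product
// owner has explicitly accepted the risk.
function canApproveRelease(meta: ComplianceMetadata, poRiskAccepted: boolean = false): ReleaseDecision {
  const regulated = meta.regulatedSignals.length > 0;
  const consentOk = meta.consentStatus === "provided" || meta.consentStatus === "not_required";
  if (!regulated || consentOk) return { approved: true };
  if (poRiskAccepted) return { approved: true, reason: "po_risk_accepted" };
  return { approved: false, reason: "consent_missing_in_regulated_context" };
}
```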
|
|
208
|
+
|
|
209
|
+
## Evidence Artifact Contract
|
|
210
|
+
|
|
211
|
+
Each transcript run should write one artifact set with a stable id:
|
|
212
|
+
|
|
213
|
+
```text
|
|
214
|
+
.agent-workflow/evidence/transcripts/<task-id>/<artifact-id>.md
|
|
215
|
+
.agent-workflow/evidence/transcripts/<task-id>/<artifact-id>.json
|
|
216
|
+
.agent-workflow/evidence/transcripts/<task-id>/<artifact-id>.vtt
|
|
217
|
+
.agent-workflow/evidence/transcripts/<task-id>/<artifact-id>.srt
|
|
218
|
+
```
|
|
219
|
+
|
|
220
|
+
VTT/SRT files are optional and should be omitted when timestamp confidence is
|
|
221
|
+
too low.
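A sketch of how subtitle gating might look for WebVTT output. The `Segment` shape mirrors the `segments` entries of the JSON contract; the `"medium"` and `"low"` confidence values and the function names are assumptions, since the document only shows `"high"`. Speaker labels use the standard WebVTT voice tag.

```typescript
// Hypothetical segment shape, mirroring the JSON contract's segment fields.
interface Segment {
  startMs: number;
  endMs: number;
  speaker?: string;
  text: string;
}

type TimestampConfidence = "high" | "medium" | "low";

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as a WebVTT timestamp (HH:MM:SS.mmm).
function toVttTimestamp(ms: number): string {
  const hours = Math.floor(ms / 3600000);
  const minutes = Math.floor((ms % 3600000) / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)}.${pad(millis, 3)}`;
}

// Return WebVTT text, or null when subtitles should be omitted.
function renderVtt(segments: Segment[], confidence: TimestampConfidence): string | null {
  if (confidence === "low" || segments.length === 0) return null;
  const cues = segments.map((s) => {
    const voice = s.speaker ? `<v ${s.speaker}>` : "";
    return `${toVttTimestamp(s.startMs)} --> ${toVttTimestamp(s.endMs)}\n${voice}${s.text}`;
  });
  return `WEBVTT\n\n${cues.join("\n\n")}\n`;
}
```

Returning `null` rather than an empty file lets the writer skip the `.vtt` artifact entirely and record a warning in `quality.warnings` instead.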
|
|
222
|
+
|
|
223
|
+
Minimum JSON shape:
|
|
224
|
+
|
|
225
|
+
```json
|
|
226
|
+
{
|
|
227
|
+
"schemaVersion": 1,
|
|
228
|
+
"taskId": "GH-367-TRANSCRIPTION-SKILL-SPIKE",
|
|
229
|
+
"artifactId": "transcript_20260522_000001",
|
|
230
|
+
"source": {
|
|
231
|
+
"kind": "workflow-file",
|
|
232
|
+
"path": ".agent-workflow/evidence/demo.mp4",
|
|
233
|
+
"sha256": "hex",
|
|
234
|
+
"durationMs": 120000,
|
|
235
|
+
"mimeType": "video/mp4",
|
|
236
|
+
"language": "en"
|
|
237
|
+
},
|
|
238
|
+
"engine": {
|
|
239
|
+
"executionMode": "local",
|
|
240
|
+
"provider": "whisper.cpp",
|
|
241
|
+
"model": "base.en",
|
|
242
|
+
"version": "detected-version"
|
|
243
|
+
},
|
|
244
|
+
"policy": {
|
|
245
|
+
"policyId": "workspace-default",
|
|
246
|
+
"externalProviderAllowed": false,
|
|
247
|
+
"redactionPolicyId": "default",
|
|
248
|
+
"consentStatus": "provided",
|
|
249
|
+
"retentionClass": "task-evidence"
|
|
250
|
+
},
|
|
251
|
+
"provenance": {
|
|
252
|
+
"actor": "qa",
|
|
253
|
+
"generatedAt": "2026-05-22T00:00:00.000Z",
|
|
254
|
+
"command": "orchestra transcript run",
|
|
255
|
+
"requestId": "local-run-id"
|
|
256
|
+
},
|
|
257
|
+
"segments": [
|
|
258
|
+
{
|
|
259
|
+
"startMs": 1000,
|
|
260
|
+
"endMs": 3500,
|
|
261
|
+
"speaker": "speaker_1",
|
|
262
|
+
"text": "Redacted transcript segment.",
|
|
263
|
+
"confidence": 0.92
|
|
264
|
+
}
|
|
265
|
+
],
|
|
266
|
+
"findings": {
|
|
267
|
+
"decisions": [],
|
|
268
|
+
"risks": [],
|
|
269
|
+
"actionItems": [],
|
|
270
|
+
"acceptanceCriteriaCandidates": [],
|
|
271
|
+
"defects": [],
|
|
272
|
+
"lessonCandidates": [],
|
|
273
|
+
"unresolvedQuestions": []
|
|
274
|
+
},
|
|
275
|
+
"redactions": {
|
|
276
|
+
"applied": true,
|
|
277
|
+
"countsByType": {
|
|
278
|
+
"secret": 0,
|
|
279
|
+
"email": 2,
|
|
280
|
+
"phone": 1,
|
|
281
|
+
"regulatedMarker": 0
|
|
282
|
+
}
|
|
283
|
+
},
|
|
284
|
+
"quality": {
|
|
285
|
+
"isPartial": false,
|
|
286
|
+
"timestampConfidence": "high",
|
|
287
|
+
"warnings": []
|
|
288
|
+
}
|
|
289
|
+
}
|
|
290
|
+
```
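For consumers of this JSON, a partial TypeScript view of the shape may help. The sketch below covers only a subset of the fields shown above (policy, provenance, and findings are omitted for brevity), and `checkArtifact` is a hypothetical consistency check, not a schema validator shipped by the package.

```typescript
interface TranscriptSegment {
  startMs: number;
  endMs: number;
  speaker?: string;
  text: string;
  confidence?: number;
}

// Partial view of the artifact; field names follow the JSON example.
interface TranscriptArtifact {
  schemaVersion: 1;
  taskId: string;
  artifactId: string;
  source: { kind: string; path: string; sha256: string; durationMs: number; mimeType: string; language: string };
  engine: { executionMode: string; provider: string; model: string; version: string };
  segments: TranscriptSegment[];
  redactions: { applied: boolean; countsByType: Record<string, number> };
  quality: { isPartial: boolean; timestampConfidence: string; warnings: string[] };
}

// Return a list of human-readable problems; an empty list means the
// artifact passed these basic checks.
function checkArtifact(artifact: TranscriptArtifact): string[] {
  const problems: string[] = [];
  if (!artifact.redactions.applied) problems.push("redaction was not applied");
  let previousStart = 0;
  artifact.segments.forEach((segment, index) => {
    if (segment.startMs > segment.endMs) problems.push(`segment ${index}: start after end`);
    if (segment.startMs < previousStart) problems.push(`segment ${index}: out of order`);
    if (segment.endMs > artifact.source.durationMs) problems.push(`segment ${index}: ends after media duration`);
    previousStart = segment.startMs;
  });
  return problems;
}
```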
|
|
291
|
+
|
|
292
|
+
Markdown report sections:
|
|
293
|
+
|
|
294
|
+
- Summary and provenance.
|
|
295
|
+
- Source, policy, consent, retention, and redaction status.
|
|
296
|
+
- Acceptance-criteria mapping table with timestamp ranges.
|
|
297
|
+
- Decisions, risks, defects, action items, lesson candidates, and unresolved
|
|
298
|
+
questions.
|
|
299
|
+
- Gaps and degraded-mode warnings.
|
|
300
|
+
|
|
301
|
+
## Failure Model
|
|
302
|
+
|
|
303
|
+
All failures must be non-destructive and evidence-friendly.
|
|
304
|
+
|
|
305
|
+
| Failure | Expected behavior |
|
|
306
|
+
| --- | --- |
|
|
307
|
+
| Missing `ffmpeg`/`ffprobe` | Return degraded result with setup guidance; do not mutate source |
|
|
308
|
+
| Missing local engine | Return policy-safe degraded evidence; do not fall back to external provider automatically |
|
|
309
|
+
| Unsupported codec | Record media probe failure and suggested conversion path |
|
|
310
|
+
| Oversized file or duration | Stop before engine invocation and record configured limit |
|
|
311
|
+
| Path outside allowed roots | Block as security failure |
|
|
312
|
+
| Provider not opted in | Block external execution and record policy id |
|
|
313
|
+
| Provider timeout/rate limit | Record partial/deferred result with retry-safe metadata |
|
|
314
|
+
| Redaction failure | Fail closed; do not persist transcript text |
|
|
315
|
+
| Low timestamp confidence | Write Markdown/JSON with warning; omit VTT/SRT |
|
|
316
|
+
| Partial transcript | Mark `quality.isPartial=true` and require QA review before release |
|
|
317
|
+
| Missing consent in regulated context | Block release approval or require Product Owner (PO) risk acceptance |
|
|
318
|
+
|
|
319
|
+
## QA Evidence And Test Strategy
|
|
320
|
+
|
|
321
|
+
QA must map every GH-367 acceptance criterion to an observable check. The first
|
|
322
|
+
implementation should prefer deterministic fixtures: tiny generated WAV/video
|
|
323
|
+
fixtures, mocked engine adapters, and provider contract fakes.
|
|
324
|
+
|
|
325
|
+
| Acceptance area | Test type | Fixture/setup | Expected evidence |
|
|
326
|
+
| --- | --- | --- | --- |
|
|
327
|
+
| Skill activation | Unit/CLI | Task text with transcription/audio/video signals and unrelated control task | Skill selected only for matching signals |
|
|
328
|
+
| Local default | Unit/service | Local engine available; no provider opt-in | Adapter selection uses local engine |
|
|
329
|
+
| Provider opt-in | Unit/contract | Provider configured with policy denied/allowed variants | Denied blocks; allowed records provider provenance |
|
|
330
|
+
| Provenance | Unit/snapshot | Mock media metadata and engine result | JSON includes source hash, duration, language, model, actor, task id, timestamp |
|
|
331
|
+
| Markdown/JSON output | Unit/golden file | Mock segments and findings | Markdown and JSON match schema; no unredacted secret |
|
|
332
|
+
| VTT/SRT output | Unit/golden file | Timestamped and untimestamped segment variants | Subtitles emitted only with adequate timestamps |
|
|
333
|
+
| Finding extraction | Unit | Transcript with decisions, risks, action items, defects | Structured findings populate expected arrays |
|
|
334
|
+
| Redaction | Unit/security | Secrets, tokens, emails, phones, regulated markers | Redacted before persistence; counts recorded |
|
|
335
|
+
| Path safety | Unit/security | Traversal, symlink escape, valid evidence path | Unsafe paths blocked |
|
|
336
|
+
| Missing tools | Unit/integration | `ffmpeg` or engine unavailable | Degraded evidence result, no provider fallback |
|
|
337
|
+
| Oversized media | Unit | File size/duration over limits | Engine not invoked; clear failure |
|
|
338
|
+
| QA timestamp references | Integration | Transcript artifact linked to AC table | Evidence can reference timestamp ranges |
|
|
339
|
+
|
|
340
|
+
Recommended commands once implemented:
|
|
47
341
|
|
|
48
342
|
```bash
|
|
49
|
-
|
|
50
|
-
|
|
51
|
-
|
|
52
|
-
--owner qa \
|
|
53
|
-
--paths ".agent-workflow/evidence/demo.mp4,docs/evidence/transcripts" \
|
|
54
|
-
--acceptance "recording transcript maps acceptance criteria to timestamps" \
|
|
55
|
-
--risks "privacy,security,governance,release"
|
|
56
|
-
|
|
57
|
-
orchestra skills plan --task QA-TRANSCRIPT-001
|
|
343
|
+
node --test test/transcription-*.test.js
|
|
344
|
+
npm run build
|
|
345
|
+
npm run precommit
|
|
58
346
|
```
|
|
347
|
+
|
|
348
|
+
Manual QA for the first release:
|
|
349
|
+
|
|
350
|
+
- Run a small local audio fixture through the command.
|
|
351
|
+
- Verify generated Markdown and JSON artifact paths.
|
|
352
|
+
- Verify no raw unredacted transcript is persisted when fixture contains
|
|
353
|
+
secret-like and PII-like text.
|
|
354
|
+
- Verify provider-denied policy cannot upload.
|
|
355
|
+
- Verify provider-allowed policy records the opt-in decision and correlation
|
|
356
|
+
metadata.
|
|
357
|
+
|
|
358
|
+
## Developer Implementation Slices
|
|
359
|
+
|
|
360
|
+
1. `GH-367A` - Skill activation and request contract
|
|
361
|
+
- Add skill manifest/catalog wiring if needed.
|
|
362
|
+
- Define request, policy, output, segment, finding, and failure types.
|
|
363
|
+
- Tests: activation signals, narrow public types, invalid request handling.
|
|
364
|
+
|
|
365
|
+
2. `GH-367B` - Media validation and local probe adapter
|
|
366
|
+
- Validate workspace/evidence paths.
|
|
367
|
+
- Add `ffprobe` metadata probe and `ffmpeg` audio extraction wrapper using
|
|
368
|
+
args arrays.
|
|
369
|
+
- Tests: path traversal, symlink escape, missing binary, unsupported codec,
|
|
370
|
+
size/duration limits.
|
|
371
|
+
|
|
372
|
+
3. `GH-367C` - Local engine adapter MVP
|
|
373
|
+
- Add one local engine adapter, preferably `whisper.cpp`, behind the generic
|
|
374
|
+
`TranscriptionEngineAdapter`.
|
|
375
|
+
- Normalize output into segments.
|
|
376
|
+
- Tests: adapter command construction, missing engine, partial transcript,
|
|
377
|
+
timestamp confidence.
|
|
378
|
+
|
|
379
|
+
4. `GH-367D` - Evidence writer and provenance artifacts
|
|
380
|
+
- Write Markdown/JSON and optional VTT/SRT from normalized transcript.
|
|
381
|
+
- Add atomic artifact writes and evidence registration.
|
|
382
|
+
- Tests: schema, golden Markdown, subtitle gating, artifact id stability.
|
|
383
|
+
|
|
384
|
+
5. `GH-367E` - Redaction and privacy policy gates
|
|
385
|
+
- Add redaction pipeline before persistence.
|
|
386
|
+
- Add provider opt-in policy and consent/retention checks.
|
|
387
|
+
- Tests: secrets, configured PII, regulated markers, redaction failure,
|
|
388
|
+
missing consent, provider denial.
|
|
389
|
+
|
|
390
|
+
6. `GH-367F` - External provider adapter behind opt-in
|
|
391
|
+
- Add first hosted provider adapter only after policy gates are tested.
|
|
392
|
+
- Record request/correlation id, provider/model, region/endpoint family when
|
|
393
|
+
available, and cost/duration guardrail metadata.
|
|
394
|
+
- Tests: mocked provider success, timeout, rate limit, malformed response,
|
|
395
|
+
no secret leakage.
|
|
396
|
+
|
|
397
|
+
7. `GH-367G` - QA evidence mapping and docs
|
|
398
|
+
- Add acceptance criteria timestamp mapping support.
|
|
399
|
+
- Document setup, local engine install expectations, provider opt-in,
|
|
400
|
+
retention, and troubleshooting.
|
|
401
|
+
- Tests: evidence report links transcript timestamp ranges to AC rows.
|
|
402
|
+
|
|
403
|
+
## Risks And Mitigations
|
|
404
|
+
|
|
405
|
+
| Risk | Severity | Mitigation |
|
|
406
|
+
| --- | --- | --- |
|
|
407
|
+
| Secret or PII leakage from recordings | High | Local default, explicit external opt-in, redaction before persistence, security tests |
|
|
408
|
+
| Raw transcript persistence before redaction | High | In-memory/temp-only handling; fail closed on redaction errors |
|
|
409
|
+
| Provider credentials imply accidental opt-in | High | Separate credentials from policy; require explicit provider policy id |
|
|
410
|
+
| Transcript hallucination or speaker misattribution | Medium | Mark confidence/quality, require human review, avoid identity claims |
|
|
411
|
+
| Large media files exhaust local resources | Medium | Size/duration/segment limits and preflight probing |
|
|
412
|
+
| Local engine availability differs by machine/CI | Medium | Degraded failure model and mocked deterministic tests |
|
|
413
|
+
| Codec variance causes flaky tests | Medium | Tiny generated fixtures and mocked media probe for most tests |
|
|
414
|
+
| Regulated retention/consent varies by tenant | High | Required metadata fields and release blocking for missing regulated consent |
|
|
415
|
+
| External provider terms or model behavior changes | Medium | Adapter isolation, pinned model config, provider-specific docs, eval fixtures |
|
|
416
|
+
|
|
417
|
+
## Open Questions
|
|
418
|
+
|
|
419
|
+
- Where should transcript artifacts live long term: `.agent-workflow/evidence`
|
|
420
|
+
only, or a configurable workspace evidence root?
|
|
421
|
+
- Should raw transcripts ever be retained for debugging, and if so under what
|
|
422
|
+
encrypted local policy?
|
|
423
|
+
- Which local engine should be the MVP implementation target for supported
|
|
424
|
+
platforms?
|
|
425
|
+
- Which provider, if any, has an approved tenant policy for the first hosted
|
|
426
|
+
adapter?
|
|
427
|
+
- Should diarization be included in MVP or treated as a best-effort optional
|
|
428
|
+
post-processor?
|
|
429
|
+
|
|
430
|
+
## Recommended Next Stories
|
|
431
|
+
|
|
432
|
+
- `GH-367A-SKILL-CONTRACT`: Define transcription request/output/policy types and
|
|
433
|
+
skill activation behavior.
|
|
434
|
+
- `GH-367B-MEDIA-PREFLIGHT`: Implement workflow-local media validation,
|
|
435
|
+
`ffprobe` metadata extraction, and safe `ffmpeg` audio extraction.
|
|
436
|
+
- `GH-367C-LOCAL-ENGINE`: Add the first local transcription adapter with
|
|
437
|
+
degraded-mode behavior.
|
|
438
|
+
- `GH-367D-EVIDENCE-ARTIFACTS`: Generate Markdown/JSON/VTT/SRT transcript
|
|
439
|
+
artifacts with provenance.
|
|
440
|
+
- `GH-367E-PRIVACY-REDACTION`: Implement redaction, consent, retention, and
|
|
441
|
+
provider policy gates.
|
|
442
|
+
- `GH-367F-PROVIDER-ADAPTER`: Add one external provider adapter behind explicit
|
|
443
|
+
opt-in and mocked contract tests.
|
|
444
|
+
- `GH-367G-QA-EVIDENCE`: Add AC-to-timestamp evidence mapping and QA report
|
|
445
|
+
workflow.
|
|
446
|
+
|
|
447
|
+
## Reference Notes
|
|
448
|
+
|
|
449
|
+
- `whisper.cpp` is an active local C/C++ port of Whisper suitable for offline
|
|
450
|
+
transcription experiments.
|
|
451
|
+
- `faster-whisper` uses CTranslate2 and is a candidate for faster local or
|
|
452
|
+
self-hosted inference.
|
|
453
|
+
- OpenAI and Google Cloud Speech-to-Text are viable hosted transcription
|
|
454
|
+
adapter families only behind explicit workspace policy opt-in.
|
|
@@ -25,6 +25,44 @@ The run state, gate artifacts, handoffs, evidence, reviews, decisions, and
|
|
|
25
25
|
clarifications are persisted under `.agent-workflow/` so the delivery story can
|
|
26
26
|
be audited after the fact.
|
|
27
27
|
|
|
28
|
+
## Task-Scoped Roles
|
|
29
|
+
|
|
30
|
+
Tasks can declare the roles that are required or optional for the workflow. When
|
|
31
|
+
a task is created with only an owner role, Open Orchestra treats that owner as
|
|
32
|
+
the implicit required role instead of falling back to the default delivery
|
|
33
|
+
workflow.
|
|
34
|
+
|
|
35
|
+
```bash
|
|
36
|
+
orchestra task add --id BUG-001 --title "Fix CLI bug" --owner developer
|
|
37
|
+
orchestra workflow run --task BUG-001 --gates phase
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
Use explicit roles when the task needs a broader lifecycle:
|
|
41
|
+
|
|
42
|
+
```bash
|
|
43
|
+
orchestra task add --id STORY-001 --title "Ship user-facing workflow" \
|
|
44
|
+
--owner product_owner --required-roles product_owner,architect,developer,qa,release_manager
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
This keeps small developer or QA tasks scoped while still allowing full
|
|
48
|
+
PO-to-release delivery for stories that need it.
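The role-resolution rule described in this section can be pictured with a small sketch. This is not the package's implementation: `TaskDeclaration` and `resolveRequiredRoles` are hypothetical names used only to illustrate the documented behavior that an owner-only task gets the owner as its single required role.

```typescript
// Hypothetical task shape for illustration.
interface TaskDeclaration {
  id: string;
  owner: string;
  requiredRoles?: string[];
}

// Explicit required roles win; otherwise the owner is the implicit
// required role instead of a default full delivery workflow.
function resolveRequiredRoles(task: TaskDeclaration): string[] {
  if (task.requiredRoles && task.requiredRoles.length > 0) {
    return task.requiredRoles.slice();
  }
  return [task.owner];
}
```

With the two commands above, `BUG-001` would resolve to `developer` only, while `STORY-001` keeps its five explicit roles.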
|
|
49
|
+
|
|
50
|
+
## Workspace Isolation
|
|
51
|
+
|
|
52
|
+
Workflow commands that operate on run state support `--target-dir` so callers
|
|
53
|
+
can launch work from another directory without writing `.agent-workflow/` state
|
|
54
|
+
to the wrong repo.
|
|
55
|
+
|
|
56
|
+
```bash
|
|
57
|
+
orchestra workflow run --task STORY-001 --target-dir /path/to/project
|
|
58
|
+
orchestra workflow list --target-dir /path/to/project
|
|
59
|
+
orchestra workflow gate-approve --run <run-id> --target-dir /path/to/project
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
Use `--target-dir` for temporary E2E projects, editor integrations, local web
|
|
63
|
+
console actions, and any parent process that coordinates more than one
|
|
64
|
+
workspace.
|
|
65
|
+
|
|
28
66
|
## Phase Graph
|
|
29
67
|
|
|
30
68
|
```
|