@mevdragon/vidfarm-devcli 0.2.1 → 0.2.3
- package/.env.example +6 -39
- package/GETTING_STARTED.developers.md +87 -0
- package/README.md +94 -238
- package/SKILL.developer.md +430 -104
- package/dist/src/account-pages.js +1 -1
- package/dist/src/app.js +93 -5
- package/dist/src/cli.js +456 -8
- package/dist/src/config.js +3 -2
- package/dist/src/context.js +30 -11
- package/dist/src/db.js +2 -57
- package/dist/src/dev-app.js +0 -1
- package/dist/src/index.js +4 -2
- package/dist/src/lib/template-paths.js +21 -0
- package/dist/src/runtime.js +3 -1
- package/dist/src/services/auth.js +4 -4
- package/dist/src/services/job-logs.js +186 -0
- package/dist/src/services/jobs.js +3 -2
- package/dist/src/services/providers.js +14 -6
- package/dist/src/services/storage.js +85 -2
- package/dist/src/services/template-sources.js +29 -3
- package/dist/templates/template_0000/src/lib/images.js +46 -86
- package/dist/templates/template_0000/src/template.js +277 -53
- package/package.json +5 -6
- package/templates/template_0000/README.md +8 -52
- package/templates/template_0000/SKILL.md +35 -3
- package/templates/template_0000/package.json +3 -6
- package/templates/template_0000/src/lib/images.js +46 -86
- package/templates/template_0000/src/lib/images.ts +55 -98
- package/templates/template_0000/src/template-dna.js +9 -0
- package/templates/template_0000/src/template.js +523 -199
- package/templates/template_0000/src/template.ts +356 -61
- package/templates/template_0000/template.config.json +7 -12
- package/AWS_REMOTION_HANDOFF.md +0 -311
- package/PLATFORM_SPEC.md +0 -1039
- package/SKILL.director.md +0 -599
- package/dist/infra/cdk/bin/vidfarm-prod.js +0 -59
- package/dist/infra/cdk/lib/vidfarm-prod-stack.js +0 -212
- package/templates/template_0000/package-lock.json +0 -5505
- package/templates/template_0000/scripts/create-site.mjs +0 -27
- package/templates/template_0000/scripts/render-cloud.mjs +0 -72
package/PLATFORM_SPEC.md
DELETED
|
@@ -1,1039 +0,0 @@
|
|
|
1
|
-
# Vidfarm Platform Spec
|
|
2
|
-
|
|
3
|
-
This document defines the initial platform architecture for Vidfarm running on a single Dockerized EC2 host.
|
|
4
|
-
|
|
5
|
-
The goal is to let in-house developers build new video production templates quickly while the platform centrally owns auth, billing, job orchestration, customer state, and deployment.
|
|
6
|
-
|
|
7
|
-
For template code distribution in v1.1, the platform should support GitHub-backed in-house template sources, but production activation must still be manual, admin-approved, and pinned to a specific commit SHA.
|
|
8
|
-
|
|
9
|
-
## Goals
|
|
10
|
-
|
|
11
|
-
1. Support multiple video production patterns behind one consistent API.
|
|
12
|
-
2. Treat every operation as an async job that immediately returns `job_id`.
|
|
13
|
-
3. Let template developers write normal TypeScript/Node code with normal npm dependencies.
|
|
14
|
-
4. Keep the first production deployment simple enough to run on one EC2 Docker host.
|
|
15
|
-
5. Preserve a clean path to later split heavy workloads into isolated workers or separate services.
|
|
16
|
-
|
|
17
|
-
## Non-Goals
|
|
18
|
-
|
|
19
|
-
1. Public marketplace for third-party user-submitted templates.
|
|
20
|
-
2. Multi-region or scale-to-zero serverless deployment in v1.
|
|
21
|
-
3. Full microservice isolation for every template.
|
|
22
|
-
4. Perfect cost attribution in v1. The first target is safe, conservative billing that protects gross margin.
|
|
23
|
-
|
|
24
|
-
## Core Decision
|
|
25
|
-
|
|
26
|
-
Vidfarm should run as one shared platform container on EC2.
|
|
27
|
-
|
|
28
|
-
Templates should be packaged as internal code modules loaded by that platform, not as separately deployed HTTP services by default.
|
|
29
|
-
Platform runtime code should live under `src/*`, while template implementation code should live outside the platform tree under `templates/<template-folder>/*`.
|
|
30
|
-
|
|
31
|
-
This gives us:
|
|
32
|
-
|
|
33
|
-
1. One auth and billing boundary.
|
|
34
|
-
2. One job table and one queueing system.
|
|
35
|
-
3. One deployment artifact for the normal case.
|
|
36
|
-
4. Full npm freedom for template developers.
|
|
37
|
-
5. A cleaner developer experience than forcing every template to become its own service.
|
|
38
|
-
|
|
39
|
-
Templates may still opt into isolated execution later when they have special requirements such as native binaries, unusually high memory use, or independent scaling needs.
|
|
40
|
-
|
|
41
|
-
## Template Source Of Truth
|
|
42
|
-
|
|
43
|
-
Template code may live in GitHub, but the platform should not auto-pull floating branch heads into production.
|
|
44
|
-
|
|
45
|
-
Approved v1.1 release model:
|
|
46
|
-
|
|
47
|
-
1. developer keeps template code in GitHub
|
|
48
|
-
2. the default import branch is `production`
|
|
49
|
-
3. admin manually reviews the repo and chooses when to import
|
|
50
|
-
4. admin publishes the approved Remotion site bundle if the template needs Remotion
|
|
51
|
-
5. platform resolves the chosen branch head to a commit SHA
|
|
52
|
-
6. platform builds and certifies that exact commit
|
|
53
|
-
7. platform activates a pinned release record for that commit
|
|
54
|
-
8. admin rebuilds and redeploys the production Docker image with the approved release set
|
|
55
|
-
|
|
56
|
-
Live platform state should therefore point to:
|
|
57
|
-
|
|
58
|
-
- repo URL
|
|
59
|
-
- branch name
|
|
60
|
-
- exact commit SHA
|
|
61
|
-
- certification result
|
|
62
|
-
- active/inactive release state
|
|
63
|
-
|
|
64
|
-
It should not point only to a floating branch name.
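A pinned release record along these lines can be sketched as a small type plus an activation guard. The field names, the `canActivate` helper, and the two-value certification result are assumptions made for illustration, not the platform's actual schema. The check also assumes SHA-1 object names (40 hexadecimal characters).

```typescript
// Hypothetical release record; field names are illustrative only.
interface PinnedRelease {
  repoUrl: string;
  branch: string;
  commitSha: string;
  certification: "passed" | "failed";
  state: "active" | "inactive";
}

// A full SHA-1 Git object name is 40 lowercase hexadecimal characters.
const FULL_SHA = /^[0-9a-f]{40}$/;

// A release may be activated only when it is pinned to an exact commit
// (not a branch name) and that commit passed certification.
function canActivate(release: PinnedRelease): boolean {
  return FULL_SHA.test(release.commitSha) && release.certification === "passed";
}
```

Rejecting anything that is not a full SHA keeps a floating branch name such as `production` from ever being stored in the commit field.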
|
|
65
|
-
|
|
66
|
-
Release authority should be centralized:
|
|
67
|
-
|
|
68
|
-
- developers can push source code to GitHub
|
|
69
|
-
- developers cannot directly publish to shared Remotion AWS
|
|
70
|
-
- developers cannot directly promote templates into production Docker
|
|
71
|
-
- shared AWS publish, activation, and production deployment are admin-only steps
|
|
72
|
-
|
|
73
|
-
## Initial Runtime Choice
|
|
74
|
-
|
|
75
|
-
### Production Runtime
|
|
76
|
-
|
|
77
|
-
- Node.js 22
|
|
78
|
-
- TypeScript
|
|
79
|
-
- Hono for the HTTP API
|
|
80
|
-
- SQLite for v1 job state and queue state
|
|
81
|
-
- S3 for customer files and generated artifacts
|
|
82
|
-
- Remotion Lambda for final render workloads where appropriate
|
|
83
|
-
|
|
84
|
-
### Why Hono
|
|
85
|
-
|
|
86
|
-
Hono is a good fit for the control plane:
|
|
87
|
-
|
|
88
|
-
1. Small and fast.
|
|
89
|
-
2. Strong middleware model.
|
|
90
|
-
3. Good TypeScript ergonomics.
|
|
91
|
-
4. Easy to keep the HTTP layer thin while most complexity lives in jobs and template execution.
|
|
92
|
-
|
|
93
|
-
### Why Not Bun for v1 Runtime
|
|
94
|
-
|
|
95
|
-
Bun is not forbidden, but it should not be the production baseline for v1.
|
|
96
|
-
|
|
97
|
-
Reasons:
|
|
98
|
-
|
|
99
|
-
1. The platform already assumes Node-oriented Docker execution.
|
|
100
|
-
2. AI SDK compatibility and native package behavior are more predictable on Node.
|
|
101
|
-
3. Remotion and adjacent tooling are safer on the Node compatibility path.
|
|
102
|
-
4. The hard part of this system is orchestration correctness, not JavaScript runtime speed.
|
|
103
|
-
|
|
104
|
-
If desired, Bun can be evaluated later as a local development runner or for specific internal tools.
|
|
105
|
-
|
|
106
|
-
## Supported Production Patterns
|
|
107
|
-
|
|
108
|
-
The platform must support all of the following under one model:
|
|
109
|
-
|
|
110
|
-
1. Pure AI multi-stage production.
|
|
111
|
-
2. Remotion render pipelines.
|
|
112
|
-
3. Hybrid research plus render pipelines.
|
|
113
|
-
4. Animated storytelling pipelines.
|
|
114
|
-
|
|
115
|
-
The common abstraction is:
|
|
116
|
-
|
|
117
|
-
1. Customer hits a template endpoint.
|
|
118
|
-
2. Platform validates auth and input.
|
|
119
|
-
3. Platform creates an async job.
|
|
120
|
-
4. Worker executes the requested operation.
|
|
121
|
-
5. Customer polls job state or receives a webhook.
|
|
122
|
-
|
|
123
|
-
## Architectural Overview
|
|
124
|
-
|
|
125
|
-
```txt
|
|
126
|
-
Client
|
|
127
|
-
-> Hono API
|
|
128
|
-
-> Auth / Billing / Template Registry / Job Creation
|
|
129
|
-
-> SQLite (jobs, logs, queue, rate-limit state, customer metadata pointers)
|
|
130
|
-
-> Worker Loop in same container
|
|
131
|
-
-> External providers (OpenAI, Gemini, OpenRouter, Perplexity, etc.)
|
|
132
|
-
-> S3 (workspace files, stage artifacts, final outputs)
|
|
133
|
-
-> Remotion Lambda (when render pipeline requires it)
|
|
134
|
-
```
|
|
135
|
-
|
|
136
|
-
Initial deployment is one process image with two logical responsibilities:
|
|
137
|
-
|
|
138
|
-
1. API server
|
|
139
|
-
2. Background worker / dispatcher
|
|
140
|
-
|
|
141
|
-
These can run in the same container in v1. If needed later, they can be split into separate process types using the same codebase.
|
|
142
|
-
|
|
143
|
-
## API Principles
|
|
144
|
-
|
|
145
|
-
All template operations are async-first.
|
|
146
|
-
|
|
147
|
-
Even if a task could finish quickly, the platform should still prefer job creation so the customer sees one consistent model.
|
|
148
|
-
|
|
149
|
-
### Base Path
|
|
150
|
-
|
|
151
|
-
```txt
|
|
152
|
-
/templates/:templateId/*
|
|
153
|
-
```
|
|
154
|
-
|
|
155
|
-
### Core Endpoints
|
|
156
|
-
|
|
157
|
-
```txt
|
|
158
|
-
GET /templates/:templateId
|
|
159
|
-
GET /templates/:templateId/about/*
|
|
160
|
-
GET /templates/:templateId/skill
|
|
161
|
-
POST /templates/:templateId/config
|
|
162
|
-
POST /templates/:templateId/operations/:operationName
|
|
163
|
-
GET /templates/:templateId/jobs
|
|
164
|
-
GET /templates/:templateId/jobs/:jobId
|
|
165
|
-
GET /templates/:templateId/jobs/:jobId/logs
|
|
166
|
-
POST /templates/:templateId/jobs/:jobId/cancel
|
|
167
|
-
```
|
|
168
|
-
|
|
169
|
-
### Request Headers
|
|
170
|
-
|
|
171
|
-
```txt
|
|
172
|
-
vidfarm-user-id: string
|
|
173
|
-
vidfarm-api-key: string
|
|
174
|
-
```
|
|
175
|
-
|
|
176
|
-
### Job Creation Request Shape
|
|
177
|
-
|
|
178
|
-
```json
|
|
179
|
-
{
|
|
180
|
-
"tracer": "client-generated-string",
|
|
181
|
-
"payload": {}
|
|
182
|
-
}
|
|
183
|
-
```
|
|
184
|
-
|
|
185
|
-
### Job Creation Response Shape
|
|
186
|
-
|
|
187
|
-
```json
|
|
188
|
-
{
|
|
189
|
-
"job_id": "job_xxx",
|
|
190
|
-
"tracer": "client-generated-string",
|
|
191
|
-
"status": "queued"
|
|
192
|
-
}
|
|
193
|
-
```
|
|
194
|
-
|
|
195
|
-
### Template Metadata Response Shape
|
|
196
|
-
|
|
197
|
-
```json
|
|
198
|
-
{
|
|
199
|
-
"id": "4c7a7e1a-7f35-4f30-9f86-9c8a63c7f2db",
|
|
200
|
-
"slug_id": "template_0000",
|
|
201
|
-
"version": "1.0.0",
|
|
202
|
-
"title": "Template 0000",
|
|
203
|
-
"description": "Short-form slideshow pipeline",
|
|
204
|
-
"viral_dna": "Fast TikTok slideshow hooks with mobile-native pacing.",
|
|
205
|
-
"preview_media": [
|
|
206
|
-
"https://api.example.com/templates/4c7a7e1a-7f35-4f30-9f86-9c8a63c7f2db/about/preview-01.jpg"
|
|
207
|
-
],
|
|
208
|
-
"link_to_original": "https://www.tiktok.com/@example/video/1234567890",
|
|
209
|
-
"skill_url": "https://api.example.com/templates/4c7a7e1a-7f35-4f30-9f86-9c8a63c7f2db/skill",
|
|
210
|
-
"operations": [
|
|
211
|
-
{
|
|
212
|
-
"name": "create_slideshow",
|
|
213
|
-
"description": "Generate slideshow frames.",
|
|
214
|
-
"providerHint": "openrouter"
|
|
215
|
-
}
|
|
216
|
-
]
|
|
217
|
-
}
|
|
218
|
-
```
|
|
219
|
-
|
|
220
|
-
`GET /templates/:templateId/about/*` should expose template metadata assets such as preview images or videos. In storage, these assets should use the stable logical prefix `templates/:templateId/about/*`, whether the backing store is local disk or S3.
|
|
221
|
-
|
|
222
|
-
Template definitions should expose `preview_media` as absolute HTTPS URLs. When the backing store is S3, those entries should point directly at the S3 object for assets stored under `templates/:templateId/about/*`.
|
|
223
|
-
|
|
224
|
-
### Job List Filtering
|
|
225
|
-
|
|
226
|
-
`GET /templates/:templateId/jobs` should support:
|
|
227
|
-
|
|
228
|
-
- `tracer`
|
|
229
|
-
- `start_time`
|
|
230
|
-
- `end_time`
|
|
231
|
-
- `limit`
|
|
232
|
-
|
|
233
|
-
`GET /templates/:templateId/jobs/:jobId/logs` should use the same time window language:
|
|
234
|
-
|
|
235
|
-
- `start_time`
|
|
236
|
-
- `end_time`
|
|
237
|
-
- `limit`
|
|
238
|
-
|
|
239
|
-
`logs_from` is deprecated and should not be used in the standard.
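As a rough client-side illustration, the filters above could be turned into a request path as follows. `buildJobsPath` and the `JobListFilter` type are hypothetical helpers. The spec does not state a timestamp format, so ISO 8601 strings are assumed here.

```typescript
// Hypothetical filter shape mirroring the query parameters listed above.
interface JobListFilter {
  tracer?: string;
  start_time?: string; // assumed ISO 8601; the spec does not fix a format
  end_time?: string;
  limit?: number;
}

// Builds the path and query string for GET /templates/:templateId/jobs.
function buildJobsPath(templateId: string, filter: JobListFilter = {}): string {
  const pairs: string[] = [];
  if (filter.tracer !== undefined) pairs.push(`tracer=${encodeURIComponent(filter.tracer)}`);
  if (filter.start_time !== undefined) pairs.push(`start_time=${encodeURIComponent(filter.start_time)}`);
  if (filter.end_time !== undefined) pairs.push(`end_time=${encodeURIComponent(filter.end_time)}`);
  if (filter.limit !== undefined) pairs.push(`limit=${filter.limit}`);
  const base = `/templates/${encodeURIComponent(templateId)}/jobs`;
  return pairs.length > 0 ? `${base}?${pairs.join("&")}` : base;
}
```

The same helper shape would apply to the `/logs` endpoint, minus the `tracer` parameter.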
|
|
240
|
-
|
|
241
|
-
`GET /templates/:templateId` is the template "about" response and must also return:
|
|
242
|
-
|
|
243
|
-
- `slug_id: string`
|
|
244
|
-
- `title: string`
|
|
245
|
-
- `viral_dna: string`
|
|
246
|
-
- `preview_media: string[]`
|
|
247
|
-
- `link_to_original: string`
|
|
248
|
-
|
|
249
|
-
In the template definition standard, this response `title` and `description` should be sourced from `template.about.title` and `template.about.description`, not top-level template fields.
|
|
250
|
-
|
|
251
|
-
## Template Model
|
|
252
|
-
|
|
253
|
-
Templates are normal TypeScript packages with unrestricted internal structure and should live under a repo-level `templates/` directory, for example `templates/template_0000/*`.
|
|
254
|
-
|
|
255
|
-
They may:
|
|
256
|
-
|
|
257
|
-
1. Import npm libraries.
|
|
258
|
-
2. Define helper modules.
|
|
259
|
-
3. Bundle prompt files.
|
|
260
|
-
4. Include Remotion compositions.
|
|
261
|
-
5. Call provider SDKs.
|
|
262
|
-
6. Run arbitrary internal orchestration logic.
|
|
263
|
-
|
|
264
|
-
The framework should not force templates into a single-file callback model.
|
|
265
|
-
|
|
266
|
-
### Template Contract
|
|
267
|
-
|
|
268
|
-
The external API surface should be defined as operations, not just raw stage names.
|
|
269
|
-
|
|
270
|
-
Suggested shape:
|
|
271
|
-
|
|
272
|
-
```ts
|
|
273
|
-
export const myTemplate = defineTemplate({
|
|
274
|
-
id: "123e4567-e89b-42d3-a456-426614174000",
|
|
275
|
-
slugId: "ugc_voiceover_v1",
|
|
276
|
-
version: "1.0.0",
|
|
277
|
-
about: {
|
|
278
|
-
title: "UGC Voiceover V1",
|
|
279
|
-
description: "Short-form UGC voiceover pipeline",
|
|
280
|
-
viral_dna: "Fast hooks, native pacing, and repeatable creator-style framing.",
|
|
281
|
-
preview_media: ["https://cdn.example.com/templates/ugc-voiceover-v1/about/preview-01.mp4"],
|
|
282
|
-
link_to_original: "https://www.tiktok.com/@example/video/1234567890"
|
|
283
|
-
},
|
|
284
|
-
configSchema: z.object({
|
|
285
|
-
defaultProvider: z.enum(["openai", "gemini", "openrouter", "perplexity"]).default("openai")
|
|
286
|
-
}),
|
|
287
|
-
|
|
288
|
-
operations: {
|
|
289
|
-
scaffold: {
|
|
290
|
-
description: "Generate a script scaffold.",
|
|
291
|
-
inputSchema: z.object({
|
|
292
|
-
topic: z.string()
|
|
293
|
-
}),
|
|
294
|
-
workflow: "scaffoldWorkflow",
|
|
295
|
-
providerHint: "openai",
|
|
296
|
-
webhookSupport: true
|
|
297
|
-
},
|
|
298
|
-
render: {
|
|
299
|
-
description: "Submit final render work.",
|
|
300
|
-
inputSchema: z.object({
|
|
301
|
-
storyboardId: z.string()
|
|
302
|
-
}),
|
|
303
|
-
workflow: "renderWorkflow",
|
|
304
|
-
webhookSupport: true
|
|
305
|
-
}
|
|
306
|
-
},
|
|
307
|
-
|
|
308
|
-
jobs: {
|
|
309
|
-
async scaffoldWorkflow(ctx, input) {
|
|
310
|
-
return {
|
|
311
|
-
progress: 1,
|
|
312
|
-
output: {}
|
|
313
|
-
};
|
|
314
|
-
},
|
|
315
|
-
async renderWorkflow(ctx, input) {
|
|
316
|
-
return {
|
|
317
|
-
progress: 1,
|
|
318
|
-
output: {}
|
|
319
|
-
};
|
|
320
|
-
}
|
|
321
|
-
}
|
|
322
|
-
});
|
|
323
|
-
```
|
|
324
|
-
|
|
325
|
-
### Why This Contract
|
|
326
|
-
|
|
327
|
-
This separates:
|
|
328
|
-
|
|
329
|
-
1. Public API entrypoints.
|
|
330
|
-
2. Internal workflow implementation.
|
|
331
|
-
3. Template metadata and validation.
|
|
332
|
-
|
|
333
|
-
It gives template developers full control over their workflow logic while keeping the platform contract stable.
|
|
334
|
-
|
|
335
|
-
## Template Execution Context
|
|
336
|
-
|
|
337
|
-
Each operation or job should receive a framework-owned context object.
|
|
338
|
-
|
|
339
|
-
Suggested capabilities:
|
|
340
|
-
|
|
341
|
-
```ts
|
|
342
|
-
interface TemplateJobContext {
|
|
343
|
-
env: "development" | "production";
|
|
344
|
-
customer: CustomerContext;
|
|
345
|
-
templateConfig: Record<string, unknown>;
|
|
346
|
-
logger: {
|
|
347
|
-
debug(message: string, metadata?: Record<string, unknown>): void;
|
|
348
|
-
info(message: string, metadata?: Record<string, unknown>): void;
|
|
349
|
-
warn(message: string, metadata?: Record<string, unknown>): void;
|
|
350
|
-
error(message: string, metadata?: Record<string, unknown>): void;
|
|
351
|
-
progress(progress: number, message: string, metadata?: Record<string, unknown>): void;
|
|
352
|
-
};
|
|
353
|
-
jobs: {
|
|
354
|
-
enqueueChild(input: {
|
|
355
|
-
operationName: string;
|
|
356
|
-
workflowName: string;
|
|
357
|
-
payload: Record<string, unknown>;
|
|
358
|
-
providerHint?: ProviderType;
|
|
359
|
-
}): Promise<{ jobId: string }>;
|
|
360
|
-
};
|
|
361
|
-
storage: {
|
|
362
|
-
putJson(key: string, value: unknown): Promise<{ key: string; url: string | null }>;
|
|
363
|
-
putText(key: string, value: string, contentType?: string): Promise<{ key: string; url: string | null }>;
|
|
364
|
-
putBuffer(
|
|
365
|
-
key: string,
|
|
366
|
-
value: Uint8Array,
|
|
367
|
-
options?: { contentType?: string; kind?: string; metadata?: Record<string, unknown> }
|
|
368
|
-
): Promise<{ key: string; url: string | null }>;
|
|
369
|
-
getPublicUrl(key: string): string | null;
|
|
370
|
-
};
|
|
371
|
-
billing: {
|
|
372
|
-
record(input: {
|
|
373
|
-
type: "ai_generation" | "render" | "storage_write" | "cpu_estimate";
|
|
374
|
-
costUsd: number;
|
|
375
|
-
chargeUsd?: number;
|
|
376
|
-
metadata?: Record<string, unknown>;
|
|
377
|
-
}): Promise<void>;
|
|
378
|
-
};
|
|
379
|
-
providers: {
|
|
380
|
-
generateText(input: {
|
|
381
|
-
provider: ProviderType;
|
|
382
|
-
model: string;
|
|
383
|
-
prompt: string;
|
|
384
|
-
temperature?: number;
|
|
385
|
-
}): Promise<{ text: string }>;
|
|
386
|
-
generateImage(input: {
|
|
387
|
-
provider: ProviderType;
|
|
388
|
-
model: string;
|
|
389
|
-
prompt: string;
|
|
390
|
-
size?: string;
|
|
391
|
-
}): Promise<{ bytes: Uint8Array; contentType: string; revisedPrompt: string | null }>;
|
|
392
|
-
analyzeImageLayout(input: {
|
|
393
|
-
provider: ProviderType;
|
|
394
|
-
model: string;
|
|
395
|
-
imageUrl: string;
|
|
396
|
-
overlayText: string;
|
|
397
|
-
}): Promise<{
|
|
398
|
-
zone: "top" | "center" | "bottom";
|
|
399
|
-
align: "left" | "center" | "right";
|
|
400
|
-
maxWidthPercent: number;
|
|
401
|
-
justification: string;
|
|
402
|
-
}>;
|
|
403
|
-
};
|
|
404
|
-
remotion: {
|
|
405
|
-
render(input: {
|
|
406
|
-
compositionId: string;
|
|
407
|
-
serveUrl?: string;
|
|
408
|
-
entryPoint?: string;
|
|
409
|
-
outputKey?: string;
|
|
410
|
-
inputProps: Record<string, unknown>;
|
|
411
|
-
}): Promise<{ renderId: string; outputUrl: string | null; metadata: Record<string, unknown> }>;
|
|
412
|
-
};
|
|
413
|
-
}
|
|
414
|
-
```
|
|
415
|
-
|
|
416
|
-
Framework-owned context capabilities should include:
|
|
417
|
-
|
|
418
|
-
1. Resolving customer AI keys safely.
|
|
419
|
-
2. Writing artifacts through a stable storage prefix.
|
|
420
|
-
3. Enqueuing child jobs.
|
|
421
|
-
4. Emitting logs and progress.
|
|
422
|
-
5. Recording billable events.
|
|
423
|
-
6. Submitting downstream renders through a Remotion adapter.
|
|
424
|
-
7. Calling provider adapters through centralized rate-limit enforcement.
|
|
425
|
-
|
|
426
|
-
## Environment Behavior
|
|
427
|
-
|
|
428
|
-
The platform must clearly distinguish development from production.
|
|
429
|
-
|
|
430
|
-
### Development
|
|
431
|
-
|
|
432
|
-
Developer-owned API keys from local `.env` are allowed.
|
|
433
|
-
|
|
434
|
-
This is for:
|
|
435
|
-
|
|
436
|
-
1. Local testing.
|
|
437
|
-
2. Template development.
|
|
438
|
-
3. Dry-running internal workflows before deployment.
|
|
439
|
-
|
|
440
|
-
Template authors should also be able to run the same certification harness locally through the developer CLI before admin review.
|
|
441
|
-
|
|
442
|
-
### Production
|
|
443
|
-
|
|
444
|
-
The platform must use customer-owned provider keys stored in the customer profile when the template requests external AI inference on behalf of that customer.
|
|
445
|
-
|
|
446
|
-
Platform-controlled keys may still exist for:
|
|
447
|
-
|
|
448
|
-
1. Platform-level fallback behavior.
|
|
449
|
-
2. Internal moderation or diagnostics.
|
|
450
|
-
3. Emergency operations.
|
|
451
|
-
|
|
452
|
-
But customer-billed workloads should default to customer-owned keys when available.
|
|
453
|
-
|
|
454
|
-
Admin-only template source import and activation are allowed in production. Developer template changes should not go live until an admin explicitly imports and activates a pinned release.
|
|
455
|
-
|
|
456
|
-
## Developer And Admin Auth Model
|
|
457
|
-
|
|
458
|
-
The platform’s API auth remains customer-style `vidfarm-user-id` plus `vidfarm-api-key`, but the platform must also support an admin allowlist for template-source management endpoints.
|
|
459
|
-
|
|
460
|
-
Minimum v1.1 rule:
|
|
461
|
-
|
|
462
|
-
1. all normal template execution uses standard platform auth
|
|
463
|
-
2. template source registration, import, and activation endpoints are admin-only
|
|
464
|
-
3. admin authorization may begin as an allowlist of trusted emails
|
|
465
|
-
|
|
466
|
-
This keeps v1 simple while still distinguishing runtime customers from internal release operators.
|
|
467
|
-
|
|
468
|
-
## Customer Profile Model
|
|
469
|
-
|
|
470
|
-
Each customer profile should support:
|
|
471
|
-
|
|
472
|
-
1. Multiple provider API keys.
|
|
473
|
-
2. Multiple keys per provider.
|
|
474
|
-
3. Workspace file storage references.
|
|
475
|
-
4. Webhook destinations.
|
|
476
|
-
5. Billing preferences and limits.
|
|
477
|
-
|
|
478
|
-
Suggested provider key record:
|
|
479
|
-
|
|
480
|
-
```ts
|
|
481
|
-
interface CustomerProviderKey {
|
|
482
|
-
id: string;
|
|
483
|
-
provider: "openai" | "gemini" | "openrouter" | "perplexity";
|
|
484
|
-
encryptedSecret: string;
|
|
485
|
-
label?: string;
|
|
486
|
-
status: "active" | "paused" | "rate_limited" | "invalid";
|
|
487
|
-
lastUsedAt?: string;
|
|
488
|
-
cooldownUntil?: string;
|
|
489
|
-
}
|
|
490
|
-
```
|
|
491
|
-
|
|
492
|
-
Customers may store multiple keys for the same provider. The platform should treat those keys as a small pooled resource that jobs must acquire before making outbound AI requests.
|
|
493
|
-
|
|
494
|
-
## Queueing and Async Jobs
|
|
495
|
-
|
|
496
|
-
The platform is async-native.
|
|
497
|
-
|
|
498
|
-
Every operation should create a job record and return immediately.
|
|
499
|
-
|
|
500
|
-
### Job State
|
|
501
|
-
|
|
502
|
-
Suggested states:
|
|
503
|
-
|
|
504
|
-
```txt
|
|
505
|
-
queued
|
|
506
|
-
running
|
|
507
|
-
waiting_for_child
|
|
508
|
-
waiting_for_human
|
|
509
|
-
succeeded
|
|
510
|
-
failed
|
|
511
|
-
cancelled
|
|
512
|
-
```
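For clients that poll job state, the suggested states map naturally onto a string union. The sketch below treats `succeeded`, `failed`, and `cancelled` as terminal. That split is an assumption made here, since the spec lists the states without classifying them.

```typescript
// The seven suggested job states as a string union.
type JobStatus =
  | "queued"
  | "running"
  | "waiting_for_child"
  | "waiting_for_human"
  | "succeeded"
  | "failed"
  | "cancelled";

// Assumed terminal states: once reached, a client can stop polling.
const TERMINAL_STATES: ReadonlySet<JobStatus> = new Set<JobStatus>([
  "succeeded",
  "failed",
  "cancelled",
]);

function isTerminal(status: JobStatus): boolean {
  return TERMINAL_STATES.has(status);
}
```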
|
|
513
|
-
|
|
514
|
-
### Job Data
|
|
515
|
-
|
|
516
|
-
Each job record should track:
|
|
517
|
-
|
|
518
|
-
1. `job_id`
|
|
519
|
-
2. `template_id`
|
|
520
|
-
3. `operation_name`
|
|
521
|
-
4. `tracer`
|
|
522
|
-
5. `status`
|
|
523
|
-
6. `payload`
|
|
524
|
-
7. `result`
|
|
525
|
-
8. `error`
|
|
526
|
-
9. `progress`
|
|
527
|
-
10. `webhook_url`
|
|
528
|
-
11. `parent_job_id`
|
|
529
|
-
12. `customer_id`
|
|
530
|
-
13. `reservation_id` or billing reference
|
|
531
|
-
14. timestamps
|
|
532
|
-
|
|
533
|
-
### Logs
|
|
534
|
-
|
|
535
|
-
Logs must be stored as structured job events, not just raw text.
|
|
536
|
-
|
|
537
|
-
Each event should support:
|
|
538
|
-
|
|
539
|
-
1. timestamp
|
|
540
|
-
2. level
|
|
541
|
-
3. message
|
|
542
|
-
4. machine-readable metadata
|
|
543
|
-
5. progress update
|
|
544
|
-
6. artifact references
|
|
545
|
-
|
|
546
|
-
This lets the client render a live job timeline later.
|
|
547
|
-
|
|
548
|
-
## SQLite-Backed AI Key Queue
|
|
549
|
-
|
|
550
|
-
SQLite is not only the v1 job store. It is also the coordination layer for customer AI API key usage.
|
|
551
|
-
|
|
552
|
-
The intended model is:
|
|
553
|
-
|
|
554
|
-
1. A job becomes runnable.
|
|
555
|
-
2. The worker identifies which provider and model the next step requires.
|
|
556
|
-
3. The worker attempts to lease one eligible customer key from SQLite.
|
|
557
|
-
4. If a lease is granted, the worker performs the outbound API call.
|
|
558
|
-
5. The worker records usage, updates cooldown state if needed, and releases the lease.
|
|
559
|
-
|
|
560
|
-
This gives the platform a lightweight queue for AI key access without needing Redis, SQS, or a separate lock service.
|
|
561
|
-
|
|
562
|
-
### Why SQLite Is Acceptable in v1
|
|
563
|
-
|
|
564
|
-
This is a reasonable design if all of the following remain true:
|
|
565
|
-
|
|
566
|
-
1. One EC2 host is the active source of truth.
|
|
567
|
-
2. The platform runs a moderate number of worker loops.
|
|
568
|
-
3. SQLite is configured in WAL mode.
|
|
569
|
-
4. Lease acquisition is done transactionally.
|
|
570
|
-
5. Jobs are retry-safe and can be rescheduled when no key is available.
|
|
571
|
-
|
|
572
|
-
### Core Idea
|
|
573
|
-
|
|
574
|
-
The AI key queue is represented by a combination of:
|
|
575
|
-
|
|
576
|
-
1. Customer provider key records.
|
|
577
|
-
2. Active key lease records.
|
|
578
|
-
3. Key usage and error events.
|
|
579
|
-
4. Cooldown timestamps after rate-limit responses.
|
|
580
|
-
|
|
581
|
-
There is no separate message broker for API key access. Eligibility is derived from database state at lease time.
|
|
582
|
-
|
|
583
|
-
### Suggested Tables
|
|
584
|
-
|
|
585
|
-
```sql
|
|
586
|
-
create table customer_provider_keys (
|
|
587
|
-
id text primary key,
|
|
588
|
-
customer_id text not null,
|
|
589
|
-
provider text not null,
|
|
590
|
-
label text,
|
|
591
|
-
encrypted_secret text not null,
|
|
592
|
-
status text not null,
|
|
593
|
-
weight integer not null default 1,
|
|
594
|
-
last_used_at text,
|
|
595
|
-
cooldown_until text,
|
|
596
|
-
disabled_reason text,
|
|
597
|
-
created_at text not null,
|
|
598
|
-
updated_at text not null
|
|
599
|
-
);
|
|
600
|
-
|
|
601
|
-
create table provider_key_leases (
|
|
602
|
-
key_id text primary key,
|
|
603
|
-
lease_token text not null,
|
|
604
|
-
worker_id text not null,
|
|
605
|
-
job_id text not null,
|
|
606
|
-
leased_at text not null,
|
|
607
|
-
expires_at text not null
|
|
608
|
-
);
|
|
609
|
-
|
|
610
|
-
create table provider_key_usage_events (
|
|
611
|
-
id text primary key,
|
|
612
|
-
key_id text not null,
|
|
613
|
-
job_id text not null,
|
|
614
|
-
provider text not null,
|
|
615
|
-
model text,
|
|
616
|
-
event_type text not null,
|
|
617
|
-
input_tokens integer,
|
|
618
|
-
output_tokens integer,
|
|
619
|
-
cost_usd real,
|
|
620
|
-
created_at text not null
|
|
621
|
-
);
|
|
622
|
-
```
|
|
623
|
-
|
|
624
|
-
Optional model capability table:
|
|
625
|
-
|
|
626
|
-
```sql
|
|
627
|
-
create table provider_key_capabilities (
|
|
628
|
-
key_id text not null,
|
|
629
|
-
model text not null,
|
|
630
|
-
primary key (key_id, model)
|
|
631
|
-
);
|
|
632
|
-
```
|
|
633
|
-
|
|
634
|
-
### Lease Acquisition
|
|
635
|
-
|
|
636
|
-
Workers must acquire a key lease before making any outbound provider request on behalf of a customer.
|
|
637
|
-
|
|
638
|
-
Lease acquisition should happen inside a transaction using `BEGIN IMMEDIATE`.
|
|
639
|
-
|
|
640
|
-
The query should exclude keys that are:
|
|
641
|
-
|
|
642
|
-
1. Not active.
|
|
643
|
-
2. In cooldown.
|
|
644
|
-
3. Already leased and whose lease has not expired.
|
|
645
|
-
4. Incompatible with the requested provider or model.
|
|
646
|
-
|
|
647
|
-
Preferred selection order in v1:
|
|
648
|
-
|
|
649
|
-
1. Least recently used eligible key.
|
|
650
|
-
2. Higher weight first when weights differ.
|
|
651
|
-
|
|
652
|
-
Illustrative flow:
|
|
653
|
-
|
|
654
|
-
```txt
|
|
655
|
-
BEGIN IMMEDIATE
|
|
656
|
-
1. Select one eligible key
|
|
657
|
-
2. Insert active lease row
|
|
658
|
-
3. Commit
|
|
659
|
-
```
|
|
660
|
-
|
|
661
|
-
Illustrative query shape:
|
|
662
|
-
|
|
663
|
-
```sql
|
|
664
|
-
select k.id
|
|
665
|
-
from customer_provider_keys k
|
|
666
|
-
left join provider_key_leases l
|
|
667
|
-
on l.key_id = k.id
|
|
668
|
-
and l.expires_at > datetime('now')
|
|
669
|
-
where k.customer_id = ?
|
|
670
|
-
and k.provider = ?
|
|
671
|
-
and k.status = 'active'
|
|
672
|
-
and (k.cooldown_until is null or k.cooldown_until <= datetime('now'))
|
|
673
|
-
and l.key_id is null
|
|
674
|
-
order by k.last_used_at asc nulls first, k.weight desc
|
|
675
|
-
limit 1;
|
|
676
|
-
```
|
|
677
|
-
|
|
678
|
-
If a key is found, the worker inserts a lease row with a short expiry such as 30 to 90 seconds.
|
|
679
|
-
|
|
680
|
-
If no key is found, the worker must not busy-loop. It should reschedule the job for a future run.
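The selection rule in the query above can be mirrored in memory for unit testing. This is a sketch only: `selectEligibleKey` and its types are hypothetical, timestamps are assumed to be UTC ISO 8601 strings in a single format (so string comparison matches time order), and the real implementation would run inside the `BEGIN IMMEDIATE` transaction rather than over arrays.

```typescript
// Hypothetical in-memory mirrors of the two tables used by the query.
interface ProviderKey {
  id: string;
  customerId: string;
  provider: string;
  status: string;
  weight: number;
  lastUsedAt: string | null;
  cooldownUntil: string | null;
}

interface Lease {
  keyId: string;
  expiresAt: string;
}

// Returns the key the query would pick, or null when none is eligible.
function selectEligibleKey(
  keys: ProviderKey[],
  leases: Lease[],
  customerId: string,
  provider: string,
  now: string
): ProviderKey | null {
  // Only unexpired leases block a key.
  const leased = new Set(leases.filter((l) => l.expiresAt > now).map((l) => l.keyId));
  const eligible = keys.filter(
    (k) =>
      k.customerId === customerId &&
      k.provider === provider &&
      k.status === "active" &&
      (k.cooldownUntil === null || k.cooldownUntil <= now) &&
      !leased.has(k.id)
  );
  // Least recently used first (never-used keys first), then higher weight.
  eligible.sort((a, b) => {
    if (a.lastUsedAt !== b.lastUsedAt) {
      if (a.lastUsedAt === null) return -1;
      if (b.lastUsedAt === null) return 1;
      return a.lastUsedAt < b.lastUsedAt ? -1 : 1;
    }
    return b.weight - a.weight;
  });
  return eligible[0] ?? null;
}
```

One detail worth checking in a real implementation: because `key_id` is the primary key of `provider_key_leases`, an expired row for the same key would have to be deleted or replaced before a new lease row can be inserted.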
|
|
681
|
-
|
|
682
|
-
### Lease Semantics
|
|
683
|
-
|
|
684
|
-
Lease rows should contain:
|
|
685
|
-
|
|
686
|
-
1. `key_id`
|
|
687
|
-
2. `lease_token`
|
|
688
|
-
3. `worker_id`
|
|
689
|
-
4. `job_id`
|
|
690
|
-
5. `leased_at`
|
|
691
|
-
6. `expires_at`
|
|
692
|
-
|
|
693
|
-
The lease token should be required for release or extension so one worker cannot accidentally release another worker's lease.
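The token check can be illustrated with a small in-memory table. `LeaseTable` and its method names are hypothetical. A SQLite-backed version would express the same guard in the `where` clause of the delete or update statement.

```typescript
// Hypothetical lease row matching the fields listed above.
interface LeaseRow {
  keyId: string;
  leaseToken: string;
  workerId: string;
  jobId: string;
  leasedAt: string;
  expiresAt: string;
}

class LeaseTable {
  private rows = new Map<string, LeaseRow>();

  // Stores a lease, replacing any previous row for the same key.
  acquire(row: LeaseRow): void {
    this.rows.set(row.keyId, row);
  }

  // Releases the lease only when the caller presents the matching token.
  release(keyId: string, leaseToken: string): boolean {
    const row = this.rows.get(keyId);
    if (row === undefined || row.leaseToken !== leaseToken) return false;
    this.rows.delete(keyId);
    return true;
  }

  // Extends the expiry under the same token rule.
  extend(keyId: string, leaseToken: string, newExpiresAt: string): boolean {
    const row = this.rows.get(keyId);
    if (row === undefined || row.leaseToken !== leaseToken) return false;
    row.expiresAt = newExpiresAt;
    return true;
  }
}
```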
|
|
694
|
-
|
|
695
|
-
### Lease Expiry and Recovery
|
|
696
|
-
|
|
697
|
-
If a worker crashes, its lease should naturally expire and the key should become eligible again.
|
|
698
|
-
|
|
699
|
-
For long-running requests, the platform may optionally support lease extension heartbeats. This is useful when the provider call or downstream processing can exceed the default lease duration.
|
|
700
|
-
|
|
701
|
-
### Success Path
|
|
702
|
-
|
|
703
|
-
After a successful provider call, the worker should:
|
|
704
|
-
|
|
705
|
-
1. Record a usage event.
|
|
706
|
-
2. Update `last_used_at`.
|
|
707
|
-
3. Clear any temporary rate-limit status when appropriate.
|
|
708
|
-
4. Release the lease.
|
|
709
|
-
|
|
710
|
-
### Rate-Limit Path
|
|
711
|
-
|
|
712
|
-
If the provider returns a rate-limit response, the worker should:
|
|
713
|
-
|
|
714
|
-
1. Record a `rate_limit` usage event.
|
|
715
|
-
2. Put the key into cooldown by setting `cooldown_until`.
|
|
716
|
-
3. Release the lease.
|
|
717
|
-
4. Reschedule the job.
|
|
718
|
-
|
|
719
|
-
Cooldown duration may initially be determined by:
|
|
720
|
-
|
|
721
|
-
1. Provider response headers if available.
|
|
722
|
-
2. Provider-specific backoff policy.
|
|
723
|
-
3. Conservative defaults when headers are absent.
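One possible way to combine these sources is to read the standard HTTP `Retry-After` header (either a number of seconds or an HTTP date) and fall back to a default. The `cooldownUntil` helper and the 60-second default are assumptions for illustration. Provider-specific backoff policy is left out of this sketch.

```typescript
// Computes when a rate-limited key becomes eligible again.
// retryAfter is the raw Retry-After header value, or null when absent.
function cooldownUntil(retryAfter: string | null, now: Date, defaultSeconds = 60): Date {
  if (retryAfter !== null) {
    const trimmed = retryAfter.trim();
    // Form 1: delay in whole seconds, e.g. "30".
    if (/^\d+$/.test(trimmed)) {
      return new Date(now.getTime() + Number(trimmed) * 1000);
    }
    // Form 2: an absolute date; only trusted when it lies in the future.
    const parsed = Date.parse(trimmed);
    if (!Number.isNaN(parsed) && parsed > now.getTime()) {
      return new Date(parsed);
    }
  }
  // Header missing or unusable: conservative default.
  return new Date(now.getTime() + defaultSeconds * 1000);
}
```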
|
|
724
|
-
|
|
725
|
-
### Auth Failure Path
|
|
726
|
-
|
|
727
|
-
If the provider reports invalid credentials, the worker should:
|
|
728
|
-
|
|
729
|
-
1. Record an `auth_error` usage event.
|
|
730
|
-
2. Mark the key `invalid`.
|
|
731
|
-
3. Release the lease.
|
|
732
|
-
4. Retry with another key if one exists.
|
|
733
|
-
5. Fail the job clearly if no valid key remains.
|
|
734
|
-
|
|
735
|
-
### Scheduler Behavior
|
|
736
|
-
|
|
737
|
-
The scheduler should treat jobs as runnable only when both are true:
|
|
738
|
-
|
|
739
|
-
1. The job itself is ready to run.
|
|
740
|
-
2. A compatible provider key can likely be leased now or soon.
|
|
741
|
-
|
|
742
|
-
Recommended loop:
|
|
743
|
-
|
|
744
|
-
1. Fetch queued jobs ordered by `run_after`.
|
|
745
|
-
2. Attempt key lease acquisition for the next provider-dependent step.
|
|
746
|
-
3. If lease succeeds, run the step and mark job `running`.
|
|
747
|
-
4. If lease fails, move `run_after` forward instead of spinning.
|
|
748
|
-
5. Retry later.
|
|
749
|
-
|
|
750
|
-
This is what makes the AI key queue effectively a SQLite-backed coordination system rather than a separate infrastructure dependency.
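
The recommended loop can be illustrated with an in-memory sketch of one scheduler pass. It stands in for the SQLite queries: the `Job` shape, the `tryLease` callback, and the 5-second deferral are assumptions, and a real implementation would run the lease attempt inside a database transaction.

```ts
// Hypothetical in-memory model of one scheduler pass; names are illustrative.
interface Job {
  id: string;
  status: "queued" | "running";
  runAfter: number; // epoch milliseconds, mirrors `run_after`
}

// Returns a leased key id, or null when no compatible key is available.
type TryLease = (job: Job) => string | null;

export function schedulerTick(
  jobs: Job[],
  tryLease: TryLease,
  now: number,
  deferMs = 5_000, // assumed deferral when no key can be leased
): string[] {
  const started: string[] = [];
  // 1. Fetch queued jobs that are due, ordered by run_after.
  const due = jobs
    .filter((job) => job.status === "queued" && job.runAfter <= now)
    .sort((a, b) => a.runAfter - b.runAfter);
  for (const job of due) {
    // 2. Attempt key lease acquisition for the next provider-dependent step.
    const keyId = tryLease(job);
    if (keyId !== null) {
      // 3. Lease succeeded: mark the job running (the step itself runs elsewhere).
      job.status = "running";
      started.push(job.id);
    } else {
      // 4. Lease failed: push run_after forward instead of spinning.
      job.runAfter = now + deferMs;
    }
  }
  // 5. Deferred jobs are picked up again by a later tick.
  return started;
}
```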
|
|
751
|
-
|
|
752
|
-
### Observability
|
|
753
|
-
|
|
754
|
-
The platform should emit logs and metrics for:
|
|
755
|
-
|
|
756
|
-
1. Lease acquisition success rate.
|
|
757
|
-
2. Lease wait time.
|
|
758
|
-
3. Key cooldown frequency by provider.
|
|
759
|
-
4. Key invalidation frequency.
|
|
760
|
-
5. Job deferrals caused by unavailable keys.
|
|
761
|
-
|
|
762
|
-
These signals will tell us when SQLite remains sufficient and when key coordination needs a stronger backend.
|
|
763
|
-
|
|
764
|
-
## Rate Limiting and Provider Routing
|
|
765
|
-
|
|
766
|
-
Customer keys are finite and not interchangeable: each key has its own rate limits and failure state.
|
|
767
|
-
|
|
768
|
-
The platform must route AI calls through a provider layer that understands:
|
|
769
|
-
|
|
770
|
-
1. Provider type.
|
|
771
|
-
2. Model name.
|
|
772
|
-
3. Key-level rate limits.
|
|
773
|
-
4. Backoff behavior.
|
|
774
|
-
5. Retry policy.
|
|
775
|
-
6. Temporary key disablement after provider errors.
|
|
776
|
-
|
|
777
|
-
### Initial Approach
|
|
778
|
-
|
|
779
|
-
Use SQLite-backed leasing for queue and key selection in v1.
|
|
780
|
-
|
|
781
|
-
This is acceptable if:
|
|
782
|
-
|
|
783
|
-
1. One EC2 host is the source of truth.
|
|
784
|
-
2. SQLite is configured in WAL mode.
|
|
785
|
-
3. Concurrency expectations remain moderate.
|
|
786
|
-
4. Jobs are idempotent enough to survive retries.
|
|
787
|
-
|
|
788
|
-
### Future Upgrade Path
|
|
789
|
-
|
|
790
|
-
If platform concurrency or reliability needs outgrow SQLite, move the job and rate-limit state to Postgres before scaling horizontally to many workers.
|
|
791
|
-
|
|
792
|
-
## Billing Model
|
|
793
|
-
|
|
794
|
-
The platform owns billing enforcement, but templates should emit billing events through framework APIs.
|
|
795
|
-
|
|
796
|
-
Templates should not implement their own pricing logic.
|
|
797
|
-
|
|
798
|
-
### Billing Principle
|
|
799
|
-
|
|
800
|
-
Bill conservatively enough that customer charges always cover the platform's cloud costs.
|
|
801
|
-
|
|
802
|
-
Current target:
|
|
803
|
-
|
|
804
|
-
```txt
|
|
805
|
-
customer_charge_usd ~= platform_cost_usd * 2
|
|
806
|
-
```
|
|
807
|
-
|
|
808
|
-
This is a margin buffer, not a final finance system.
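
As a worked instance of the target above, the sketch below applies the 2x multiplier and rounds up to whole cents. Rounding up is an assumption chosen to stay on the conservative side, and the function name is hypothetical.

```ts
// Hypothetical helper; the 2x multiplier comes from the target above,
// rounding up to whole cents is an assumption.
export function customerChargeUsd(platformCostUsd: number, multiplier = 2): number {
  if (!Number.isFinite(platformCostUsd) || platformCostUsd < 0) {
    throw new RangeError("platformCostUsd must be a non-negative finite number");
  }
  // Work in integer micro-dollars first to avoid floating-point drift,
  // then round up to the next whole cent.
  const microUsd = Math.round(platformCostUsd * multiplier * 1_000_000);
  const cents = Math.ceil(microUsd / 10_000);
  return cents / 100;
}
```

For example, an estimated platform cost of 0.024 USD becomes 0.048 USD after the multiplier and is charged as 0.05 USD.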
|
|
809
|
-
|
|
810
|
-
### Billing Event Types
|
|
811
|
-
|
|
812
|
-
The framework should support at least:
|
|
813
|
-
|
|
814
|
-
1. External AI token usage.
|
|
815
|
-
2. Remotion render usage.
|
|
816
|
-
3. EC2 / CPU / memory approximation.
|
|
817
|
-
4. Storage writes.
|
|
818
|
-
5. Data egress or expensive file processing when relevant.
|
|
819
|
-
|
|
820
|
-
### Billing API for Templates
|
|
821
|
-
|
|
822
|
-
Templates should call framework helpers like:
|
|
823
|
-
|
|
824
|
-
```ts
|
|
825
|
-
await ctx.billing.record({
|
|
826
|
-
type: "ai_generation",
|
|
827
|
-
provider: "openai",
|
|
828
|
-
model: "gpt-4.1",
|
|
829
|
-
estimatedCostUsd: 0.024,
|
|
830
|
-
metadata: {},
|
|
831
|
-
});
|
|
832
|
-
```
|
|
833
|
-
|
|
834
|
-
The framework should translate these events into customer-facing charges.
|
|
835
|
-
|
|
836
|
-
## Webhooks
|
|
837
|
-
|
|
838
|
-
Every job may include an optional webhook destination.
|
|
839
|
-
|
|
840
|
-
The platform should emit webhook events for:
|
|
841
|
-
|
|
842
|
-
1. `job.queued`
|
|
843
|
-
2. `job.running`
|
|
844
|
-
3. `job.progress`
|
|
845
|
-
4. `job.succeeded`
|
|
846
|
-
5. `job.failed`
|
|
847
|
-
6. `job.cancelled`
|
|
848
|
-
|
|
849
|
-
Webhook delivery should be:
|
|
850
|
-
|
|
851
|
-
1. Signed
|
|
852
|
-
2. Retried with backoff
|
|
853
|
-
3. Persisted as delivery attempts
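
The signing and retry requirements could look like the following sketch, which uses Node's built-in `node:crypto` module. The HMAC-SHA256 scheme, the `timestamp.body` payload layout, and the backoff constants are assumptions, since the spec does not fix a signature format.

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: hex-encoded HMAC-SHA256 over "<timestamp>.<raw body>".
export function signWebhook(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Receiver-side check using a constant-time comparison.
export function verifyWebhook(
  secret: string,
  timestamp: number,
  body: string,
  signature: string,
): boolean {
  const expected = Buffer.from(signWebhook(secret, timestamp, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Exponential backoff with a cap; attempt 0 is the first retry.
export function retryDelayMs(attempt: number, baseMs = 1_000, maxMs = 60_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Including the timestamp in the signed payload lets receivers reject replayed deliveries. Each attempt would still be persisted separately, as the list above requires.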
|
|
854
|
-
|
|
855
|
-
## File and Artifact Storage
|
|
856
|
-
|
|
857
|
-
S3 is the system of record for customer-uploaded files and large generated outputs.
|
|
858
|
-
|
|
859
|
-
### Customer Workspace Convention
|
|
860
|
-
|
|
861
|
-
Suggested logical prefix:
|
|
862
|
-
|
|
863
|
-
```txt
|
|
864
|
-
s3://bucket/customers/:customerId/workspace/...
|
|
865
|
-
```
|
|
866
|
-
|
|
867
|
-
### Job Artifact Convention
|
|
868
|
-
|
|
869
|
-
Suggested logical prefix:
|
|
870
|
-
|
|
871
|
-
```txt
|
|
872
|
-
s3://bucket/templates/:templateId/users/:userId/jobs/:jobId/...
|
|
873
|
-
```
|
|
874
|
-
|
|
875
|
-
Artifacts may include:
|
|
876
|
-
|
|
877
|
-
1. Prompt snapshots
|
|
878
|
-
2. Storyboards
|
|
879
|
-
3. Preview images
|
|
880
|
-
4. Audio assets
|
|
881
|
-
5. Subtitle files
|
|
882
|
-
6. Render manifests
|
|
883
|
-
7. Final video outputs
|
|
884
|
-
|
|
885
|
-
This prefix is for template-generated outputs and intermediate artifacts. Keep it stable across storage backends so local development mirrors production object layout.
|
|
886
|
-
|
|
887
|
-
### Template Metadata Convention
|
|
888
|
-
|
|
889
|
-
Suggested logical prefix:
|
|
890
|
-
|
|
891
|
-
```txt
|
|
892
|
-
s3://bucket/templates/:templateId/about/...
|
|
893
|
-
```
|
|
894
|
-
|
|
895
|
-
This prefix is for framework-owned template metadata assets such as preview images, preview videos, and other public "about" media referenced by `GET /templates/:templateId`.
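
The three prefix conventions can be centralized in small key builders so that local development and production share one layout. This is a sketch only: the function names are hypothetical, and the segment validation is an added assumption intended to keep identifiers from escaping their prefix.

```ts
// Hypothetical key builders for the three logical prefixes (bucket name omitted).
function segment(value: string, label: string): string {
  // Reject empty values, path separators, and dot segments.
  if (value === "" || value === "." || value === ".." || value.includes("/")) {
    throw new Error(`invalid ${label}: ${JSON.stringify(value)}`);
  }
  return value;
}

function relativePath(path: string): string {
  // Validate every segment of the caller-supplied relative path.
  return path.split("/").map((part) => segment(part, "path segment")).join("/");
}

export function customerWorkspaceKey(customerId: string, path: string): string {
  return `customers/${segment(customerId, "customerId")}/workspace/${relativePath(path)}`;
}

export function jobArtifactKey(
  templateId: string,
  userId: string,
  jobId: string,
  path: string,
): string {
  return [
    "templates", segment(templateId, "templateId"),
    "users", segment(userId, "userId"),
    "jobs", segment(jobId, "jobId"),
    relativePath(path),
  ].join("/");
}

export function templateAboutKey(templateId: string, path: string): string {
  return `templates/${segment(templateId, "templateId")}/about/${relativePath(path)}`;
}
```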
|
|
896
|
-
|
|
897
|
-
## Remotion Integration
|
|
898
|
-
|
|
899
|
-
Remotion should be treated as a specialized downstream execution path, not the center of the platform.
|
|
900
|
-
|
|
901
|
-
Templates can invoke Remotion through a framework adapter.
|
|
902
|
-
|
|
903
|
-
Suggested flow:
|
|
904
|
-
|
|
905
|
-
1. Template job prepares structured render input.
|
|
906
|
-
2. Template writes required assets through framework storage.
|
|
907
|
-
3. Template calls `ctx.remotion.render(...)`.
|
|
908
|
-
4. The adapter renders locally or via Lambda depending on environment config.
|
|
909
|
-
5. Final artifact is attached back to the parent job result.
|
|
910
|
-
|
|
911
|
-
This keeps Remotion as one implementation detail among several, rather than forcing the platform to be Remotion-first.
|
|
912
|
-
|
|
913
|
-
## Isolation Policy
|
|
914
|
-
|
|
915
|
-
Default mode is shared in-process execution inside the main platform runtime.
|
|
916
|
-
|
|
917
|
-
### Shared Execution Is Correct By Default
|
|
918
|
-
|
|
919
|
-
Use shared execution when the template:
|
|
920
|
-
|
|
921
|
-
1. Uses standard Node dependencies.
|
|
922
|
-
2. Fits within normal memory and CPU budgets.
|
|
923
|
-
3. Does not need a custom OS image.
|
|
924
|
-
4. Can safely coexist with other templates.
|
|
925
|
-
|
|
926
|
-
### Isolated Execution Is an Escape Hatch
|
|
927
|
-
|
|
928
|
-
Allow a template to declare isolated execution later if it needs:
|
|
929
|
-
|
|
930
|
-
1. Heavy FFmpeg or native binary workloads.
|
|
931
|
-
2. Custom Chromium or system library requirements.
|
|
932
|
-
3. A stricter security or resource boundary.
|
|
933
|
-
4. Independent scaling or scheduling.
|
|
934
|
-
|
|
935
|
-
Even then, the main platform should still own:
|
|
936
|
-
|
|
937
|
-
1. Auth
|
|
938
|
-
2. Billing
|
|
939
|
-
3. Job creation
|
|
940
|
-
4. Customer state
|
|
941
|
-
5. Webhook delivery
|
|
942
|
-
|
|
943
|
-
## Suggested Repository Shape
|
|
944
|
-
|
|
945
|
-
```txt
|
|
946
|
-
/src
|
|
947
|
-
/templates/template_0000
|
|
948
|
-
/templates/template_0001
|
|
949
|
-
/AWS_REMOTION_HANDOFF.md
|
|
950
|
-
/PLATFORM_SPEC.md
|
|
951
|
-
/SKILL.director.md
|
|
952
|
-
/SKILL.developer.md
|
|
953
|
-
```
|
|
954
|
-
|
|
955
|
-
## Suggested Internal Components
|
|
956
|
-
|
|
957
|
-
### Platform API
|
|
958
|
-
|
|
959
|
-
Responsibilities:
|
|
960
|
-
|
|
961
|
-
1. Request validation
|
|
962
|
-
2. Auth
|
|
963
|
-
3. Template lookup
|
|
964
|
-
4. Config updates
|
|
965
|
-
5. Job creation
|
|
966
|
-
6. Job status reads
|
|
967
|
-
7. Webhook registration
|
|
968
|
-
|
|
969
|
-
### Worker / Dispatcher
|
|
970
|
-
|
|
971
|
-
Responsibilities:
|
|
972
|
-
|
|
973
|
-
1. Pull queued jobs
|
|
974
|
-
2. Acquire provider key leases
|
|
975
|
-
3. Execute template jobs
|
|
976
|
-
4. Persist logs and artifacts
|
|
977
|
-
5. Update billing
|
|
978
|
-
6. Deliver completion webhooks
|
|
979
|
-
|
|
980
|
-
### Template Registry
|
|
981
|
-
|
|
982
|
-
Responsibilities:
|
|
983
|
-
|
|
984
|
-
1. Register approved in-house templates
|
|
985
|
-
2. Expose metadata
|
|
986
|
-
3. Resolve operations and jobs
|
|
987
|
-
4. Enforce version compatibility
|
|
988
|
-
|
|
989
|
-
## Security Notes
|
|
990
|
-
|
|
991
|
-
1. Customer provider keys must be encrypted at rest.
|
|
992
|
-
2. API keys must be hash-verified, not stored in plaintext.
|
|
993
|
-
3. Template code is trusted internal code in v1, not untrusted tenant code.
|
|
994
|
-
4. Webhook signatures must be mandatory.
|
|
995
|
-
5. Customer file access must always be scoped by customer identity.
|
|
996
|
-
6. Template source imports must pin an exact commit SHA before activation.
|
|
997
|
-
7. Templates must include a `SKILL.md` file so customer AI agents have a framework-owned usage contract.
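
Note 2 (hash-verified API keys) might be implemented along these lines with `node:crypto`. This sketch assumes API keys are long random strings, which is what makes a plain SHA-256 digest a reasonable stored form. Low-entropy secrets such as passwords would need a slow, salted hash instead. All function names are hypothetical.

```ts
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Digest of the presented key; only this digest is ever stored.
export function hashApiKey(apiKey: string): Buffer {
  return createHash("sha256").update(apiKey, "utf8").digest();
}

// Issue a new key: return the plaintext once, persist only the hash.
export function generateApiKey(): { apiKey: string; storedHash: string } {
  const apiKey = randomBytes(32).toString("base64url");
  return { apiKey, storedHash: hashApiKey(apiKey).toString("hex") };
}

// Compare the digest of the presented key to the stored digest in constant time.
export function verifyApiKey(presented: string, storedHashHex: string): boolean {
  const stored = Buffer.from(storedHashHex, "hex");
  const candidate = hashApiKey(presented);
  return stored.length === candidate.length && timingSafeEqual(stored, candidate);
}
```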
|
|
998
|
-
|
|
999
|
-
## Operational Notes for v1
|
|
1000
|
-
|
|
1001
|
-
1. Run one Docker image on one EC2 host.
|
|
1002
|
-
2. Keep API and worker in the same deployable unit initially.
|
|
1003
|
-
3. Use SQLite in WAL mode.
|
|
1004
|
-
4. Back up SQLite and treat it as transitional infrastructure.
|
|
1005
|
-
5. Store all large artifacts in S3, not on container disk.
|
|
1006
|
-
6. Assume template code is centrally reviewed before deployment.
|
|
1007
|
-
7. Not every template requires Remotion. Remotion validation should only apply to templates that actually call the Remotion adapter.
|
|
1008
|
-
|
|
1009
|
-
## Template Certification Minimum
|
|
1010
|
-
|
|
1011
|
-
A template must not be activatable unless all of the following pass:
|
|
1012
|
-
|
|
1013
|
-
1. template metadata contract is valid
|
|
1014
|
-
2. operation-to-workflow references are valid
|
|
1015
|
-
3. a template-local `SKILL.md` file exists
|
|
1016
|
-
4. every operation defines a smoke-test payload
|
|
1017
|
-
5. the smoke-test harness passes
|
|
1018
|
-
6. the release is associated with a pinned commit SHA
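
The activation gate above can be expressed as a checker that returns every failing requirement. This is a minimal sketch: the input shape is hypothetical, and treating a pinned commit as a full 40-character hexadecimal Git SHA is an assumption.

```ts
// Hypothetical input describing the results of each certification check.
interface CertificationInput {
  metadataValid: boolean;
  workflowRefsValid: boolean;
  hasSkillMd: boolean;
  operations: { name: string; smokeTestPayload?: unknown }[];
  smokeTestsPassed: boolean;
  commitSha: string | null;
}

// Returns a list of failed requirements; an empty list means activatable.
export function certificationFailures(input: CertificationInput): string[] {
  const failures: string[] = [];
  if (!input.metadataValid) failures.push("template metadata contract is invalid");
  if (!input.workflowRefsValid) failures.push("operation-to-workflow references are invalid");
  if (!input.hasSkillMd) failures.push("template-local SKILL.md is missing");
  for (const op of input.operations) {
    if (op.smokeTestPayload === undefined) {
      failures.push(`operation ${op.name} has no smoke-test payload`);
    }
  }
  if (!input.smokeTestsPassed) failures.push("smoke-test harness did not pass");
  // Assumption: a pinned commit is a full 40-character hexadecimal SHA.
  if (input.commitSha === null || !/^[0-9a-f]{40}$/.test(input.commitSha)) {
    failures.push("release is not associated with a pinned commit SHA");
  }
  return failures;
}
```

Returning all failures at once, rather than stopping at the first, gives template authors a complete list to fix in one pass.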
|
|
1019
|
-
|
|
1020
|
-
## Known Limits of v1
|
|
1021
|
-
|
|
1022
|
-
1. SQLite is a reasonable starting point but not the final queueing backend for large-scale concurrency.
|
|
1023
|
-
2. Shared in-process template execution is simpler operationally but weaker as a hard isolation boundary.
|
|
1024
|
-
3. EC2 cost attribution will begin as approximation plus provider-cost tracking, not perfect real-time infrastructure metering.
|
|
1025
|
-
|
|
1026
|
-
## Recommendation Summary
|
|
1027
|
-
|
|
1028
|
-
The recommended initial Vidfarm platform is:
|
|
1029
|
-
|
|
1030
|
-
1. One Node.js 22 Docker container on EC2.
|
|
1031
|
-
2. Hono as the HTTP API layer.
|
|
1032
|
-
3. SQLite as the initial jobs and rate-limit store.
|
|
1033
|
-
4. S3 for workspace and artifact storage.
|
|
1034
|
-
5. Remotion Lambda as a downstream rendering path.
|
|
1035
|
-
6. Templates implemented as normal internal TypeScript packages with full npm access.
|
|
1036
|
-
7. Public API defined as async operations that enqueue jobs.
|
|
1037
|
-
8. Optional isolated template execution added later only when justified by concrete workload needs.
|
|
1038
|
-
|
|
1039
|
-
This is the simplest architecture that matches the product goals without prematurely turning each template into a separate service.
|