hyperframes 0.6.0-alpha.2 → 0.6.0-alpha.5

@@ -1,6 +1,6 @@
  ---
  name: hyperframes
- description: Create video compositions, animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions in HyperFrames HTML. Use when asked to build any HTML-based video content, add captions or subtitles synced to audio, generate text-to-speech narration, create audio-reactive animation (beat sync, glow, pulse driven by music), add animated text highlighting (marker sweeps, hand-drawn circles, burst lines, scribble, sketchout), or add transitions between scenes (crossfades, wipes, reveals, shader transitions). Covers composition authoring, timing, media, and the full video production workflow. For CLI commands (init, lint, preview, render, transcribe, tts) see the hyperframes-cli skill.
+ description: Create video compositions, animations, title cards, overlays, captions, voiceovers, audio-reactive visuals, and scene transitions in HyperFrames HTML. Use when asked to build any HTML-based video content, add captions or subtitles synced to audio, generate text-to-speech narration, create audio-reactive animation (beat sync, glow, pulse driven by music), add animated text highlighting (marker sweeps, hand-drawn circles, burst lines, scribble, sketchout), or add transitions between scenes (crossfades, wipes, reveals, shader transitions). Covers composition authoring, timing, media, and the full video production workflow. For dev-loop CLI commands (init, lint, inspect, preview, render) see the hyperframes-cli skill; for asset preprocessing commands (tts, transcribe, remove-background) see the hyperframes-media skill.
  ---
 
  # HyperFrames
@@ -467,7 +467,6 @@ Skip on small edits (fixing a color, adjusting one duration). Run on new composi
  ## References (loaded on demand)
 
  - **[references/captions.md](references/captions.md)** — Captions, subtitles, lyrics, karaoke synced to audio. Tone-adaptive style detection, per-word styling, text overflow prevention, caption exit guarantees, word grouping. Read when adding any text synced to audio timing.
- - **[references/tts.md](references/tts.md)** — Text-to-speech with Kokoro-82M. Voice selection, speed tuning, TTS+captions workflow. Read when generating narration or voiceover.
  - **[references/audio-reactive.md](references/audio-reactive.md)** — Audio-reactive animation: map frequency bands and amplitude to GSAP properties. Read when visuals should respond to music, voice, or sound.
  - **[references/css-patterns.md](references/css-patterns.md)** — CSS+GSAP marker highlighting: highlight, circle, burst, scribble, sketchout. Deterministic, fully seekable. Read when adding visual emphasis to text.
  - **[references/video-composition.md](references/video-composition.md)** — Video-medium rules: density, color presence, scale, frame composition, design.md as brand not layout. **Always read** — these override web instincts.
@@ -481,7 +480,7 @@ Skip on small edits (fixing a color, adjusting one duration). Run on new composi
  - **[house-style.md](house-style.md)** — Default motion, sizing, and color palettes when no design.md is specified.
  - **[patterns.md](patterns.md)** — PiP, title cards, slide show patterns.
  - **[data-in-motion.md](data-in-motion.md)** — Data, stats, and infographic patterns.
- - **[references/transcript-guide.md](references/transcript-guide.md)** — Transcription commands, whisper models, external APIs, troubleshooting.
+ - **[references/transcript-guide.md](references/transcript-guide.md)** — Caption-side transcript handling: input formats, mandatory quality check, cleaning JS, OpenAI/Groq API fallback, "if no transcript exists" flow. (For the `transcribe` CLI invocation, model selection rules, and the `.en` gotcha, see the `hyperframes-media` skill.)
  - **[references/dynamic-techniques.md](references/dynamic-techniques.md)** — Dynamic caption animation techniques (karaoke, clip-path, slam, scatter, elastic, 3D).
 
  - **[references/transitions.md](references/transitions.md)** — Scene transitions: crossfades, wipes, reveals, shader transitions. Energy/mood selection, CSS vs WebGL guidance. **Always read for multi-scene compositions** — scenes without transitions feel like jump cuts.
@@ -30,6 +30,79 @@ tl.to(
  tl.to("#pip-frame", { left: 40, duration: 0.6 }, 30);
  ```
 
+ ## Text Behind Subject (transparent webm overlay)
+
+ Put a headline _behind_ a presenter so their silhouette occludes the text. Requires a transparent cutout produced by `npx hyperframes remove-background presenter.mp4 -o presenter.webm`.
+
+ Three layers, plus one critical rule:
+
+ ```html
+ <!-- z=1 base — full opaque mp4 (lobby + presenter), always visible -->
+ <video
+   id="cf-base"
+   data-start="0"
+   data-duration="6"
+   data-media-start="0"
+   data-track-index="0"
+   src="presenter.mp4"
+   muted
+   playsinline
+ ></video>
+
+ <!-- z=2 headline — visible the whole time -->
+ <h1
+   id="cf-headline"
+   style="position:absolute;top:50%;left:50%;
+   transform:translate(-50%,-50%); z-index:2; font-size:220px; font-weight:900;
+   color:#fff; text-shadow:0 6px 32px rgba(0,0,0,.55); clip-path:inset(0 0 100% 0);"
+ >
+   MAKE IT IN HYPERFRAMES
+ </h1>
+
+ <!-- z=3 cutout — same source, alpha around presenter, hidden until the cut -->
+ <!-- WRAPPER has the opacity, NOT the video itself (see rule below). -->
+ <div class="cutout-wrap" style="position:absolute;inset:0;z-index:3;opacity:0">
+   <video
+     id="cf-cutout"
+     data-start="0"
+     data-duration="6"
+     data-media-start="0"
+     data-track-index="1"
+     src="presenter.webm"
+     muted
+     playsinline
+   ></video>
+ </div>
+ ```
+
+ ```js
+ const tl = gsap.timeline({ paused: true });
+ const CUT = 3.3;
+
+ // Reveal headline early
+ tl.to("#cf-headline", { clipPath: "inset(0 0 0% 0)", duration: 0.6, ease: "expo.out" }, 0.25);
+
+ // At the cut, flip the cutout wrapper visible — the presenter's silhouette
+ // punches through the headline.
+ tl.set(".cutout-wrap", { opacity: 1 }, CUT);
+
+ // Sentinel: extend timeline to the composition's full duration so the
+ // renderer doesn't bail past the last meaningful tween.
+ tl.set({}, {}, 6);
+
+ window.__timelines["cover-flip"] = tl;
+ ```
+
+ **Why a wrapper div, not opacity on the video itself?**
+
+ The framework forces `opacity: 1` on any element with `data-start`/`data-duration` while it's "active" — that's how it manages clip lifecycles. A CSS `opacity: 0` on the video element is silently overwritten. Wrap the video in a div with no `data-*` attributes; the wrapper is owned by your CSS/GSAP.
+
+ **Why both videos at `data-start="0"`?**
+
+ So both decode in sync from t=0. Late-mounting the cutout (`data-start=3.3`) makes Chrome do a seek + decoder warm-up at mount, which can land a frame off the base mp4 — visible as a one-frame jitter at the cut.
+
+ **Color match:** `remove-background` defaults to `--quality balanced` (crf 18) which keeps the cutout's RGB nearly identical to the source mp4 — minimal edge halo or color shift when overlaid. Use `--quality best` (crf 12) for hero shots; only drop to `--quality fast` (crf 30) when the cutout sits over a _different_ background and the size matters.
+
  ## Title Card with Fade
 
  ```html
@@ -1,24 +1,6 @@
  # Transcript Guide
 
- ## How Transcripts Are Generated
-
- `hyperframes transcribe` handles both transcription and format conversion:
-
- ```bash
- # Transcribe audio/video (uses whisper.cpp locally, no API key needed)
- npx hyperframes transcribe audio.mp3
-
- # Use a larger model for better accuracy
- npx hyperframes transcribe audio.mp3 --model medium.en
-
- # Filter to English only (skips non-English speech)
- npx hyperframes transcribe audio.mp3 --language en
-
- # Import an existing transcript from another tool
- npx hyperframes transcribe captions.srt
- npx hyperframes transcribe captions.vtt
- npx hyperframes transcribe openai-response.json
- ```
+ For the `transcribe` CLI invocation, the `.en`-translates-non-English rule, and whisper model selection, see the `hyperframes-media` skill. This file covers what to do with the resulting transcript when authoring captions: input formats, mandatory quality checks, cleaning code, external-API fallbacks.
 
  ## Supported Input Formats
 
@@ -34,32 +16,6 @@ The CLI auto-detects and normalizes these formats:
 
  **Word-level timestamps produce better captions.** SRT/VTT give phrase-level timing, which works but can't do per-word animation effects.
 
- ## Whisper Model Guide
-
- The default model (`small.en`) balances accuracy and speed. For better results, use a larger model:
-
- | Model      | Size   | Speed    | Accuracy  | When to use                           |
- | ---------- | ------ | -------- | --------- | ------------------------------------- |
- | `tiny`     | 75 MB  | Fastest  | Low       | Quick previews, testing pipeline      |
- | `base`     | 142 MB | Fast     | Fair      | Short clips, clear audio              |
- | `small`    | 466 MB | Moderate | Good      | **Default** — good for most content   |
- | `medium`   | 1.5 GB | Slow     | Very good | Important content, noisy audio, music |
- | `large-v3` | 3.1 GB | Slowest  | Best      | Production quality                    |
-
- **Only add `.en` suffix when the user explicitly says the audio is English.** `.en` models are slightly more accurate for English but will TRANSLATE non-English audio instead of transcribing it.
-
- **Critical: `.en` models translate non-English audio into English** — they don't transcribe it. If the audio might not be English, always use a model without the `.en` suffix and pass `--language` to specify the source language. If you're unsure of the language, use `small` (not `small.en`) without `--language` — whisper will auto-detect.
-
- ```bash
- # Spanish audio
- npx hyperframes transcribe audio.mp3 --model small --language es
-
- # Unknown language — let whisper auto-detect
- npx hyperframes transcribe audio.mp3 --model small
- ```
-
- **Music and vocals over instrumentation**: `small.en` will misidentify lyrics — use `medium.en` as the minimum, or import lyrics manually. Even `medium.en` struggles with heavily produced tracks; for music videos, providing known lyrics as an SRT/VTT and importing with `hyperframes transcribe lyrics.srt` will always beat automated transcription.
-
  ## Transcript Quality Check (Mandatory)
 
  After every transcription, **read the transcript and check for quality issues before proceeding.** Bad transcripts produce nonsensical captions. Never skip this step.
@@ -1,6 +1,6 @@
  ---
  name: hyperframes-cli
- description: HyperFrames CLI tool — hyperframes init, lint, inspect, preview, render, transcribe, tts, remove-background, doctor, browser, info, upgrade, compositions, docs, benchmark. Use when scaffolding a project, linting, validating, inspecting visual layout in compositions, previewing in the studio, rendering to video, transcribing audio, generating TTS, removing the background from an avatar video for transparent overlays, or troubleshooting the HyperFrames environment.
+ description: HyperFrames CLI dev loop `npx hyperframes` for scaffolding (init), validation (lint, inspect), preview, render, and environment troubleshooting (doctor, browser, info, upgrade). Use when running any of these commands or troubleshooting the HyperFrames build/render environment. For asset preprocessing commands (`tts`, `transcribe`, `remove-background`), invoke the `hyperframes-media` skill instead.
  ---
 
  # HyperFrames CLI
@@ -120,37 +120,9 @@ npx hyperframes render --docker # byte-identical
 
  **Parametrized renders:** the composition declares its variables on the `<html>` root with **`data-composition-variables`** — a JSON **array of declarations** (`{id, type, label, default}` per entry) that defines the schema. Scripts inside read the resolved values via `window.__hyperframes.getVariables()`. The CLI **`--variables '{"title":"Q4 Report"}'`** is a JSON **object keyed by id** that overrides those declared defaults for one render; missing keys fall through, so the same composition runs unchanged in dev preview and in production. (Sub-comp hosts can also override per-instance with **`data-variable-values`** — same object shape, scoped to one mount of the sub-composition. See the `hyperframes` skill for the full pattern.)
 
- ## Transcription
+ ## Asset Preprocessing
 
- ```bash
- npx hyperframes transcribe audio.mp3
- npx hyperframes transcribe video.mp4 --model medium.en --language en
- npx hyperframes transcribe subtitles.srt # import existing
- npx hyperframes transcribe subtitles.vtt
- npx hyperframes transcribe openai-response.json
- ```
-
- ## Text-to-Speech
-
- ```bash
- npx hyperframes tts "Text here" --voice af_nova --output narration.wav
- npx hyperframes tts script.txt --voice bf_emma
- npx hyperframes tts --list # show all voices
- ```
-
- ## Background Removal (transparent video)
-
- Remove the background from a video or image so it can be used as a transparent overlay in a composition (e.g. an avatar floating on a background).
-
- ```bash
- npx hyperframes remove-background avatar.mp4 -o transparent.webm # default: VP9 alpha WebM
- npx hyperframes remove-background avatar.mp4 -o transparent.mov # ProRes 4444 for editing
- npx hyperframes remove-background portrait.jpg -o cutout.png # single-image cutout
- npx hyperframes remove-background avatar.mp4 -o transparent.webm --device cpu
- npx hyperframes remove-background --info # detected providers
- ```
-
- Uses `u2net_human_seg` (MIT). First run downloads ~168 MB of weights to `~/.cache/hyperframes/background-removal/models/` and reuses them after. Drop the resulting `.webm` into a composition with `<video src="transparent.webm" autoplay muted loop>` — Chrome decodes the alpha natively.
+ `npx hyperframes tts`, `transcribe`, and `remove-background` produce assets (narration audio, word-level transcripts, transparent video) that get dropped into a composition. Each downloads its own model on first run. For voice selection, whisper model rules (the `.en`-translates-non-English gotcha), output format choice (VP9 alpha WebM vs ProRes), and the TTS → transcribe → captions chain, invoke the `hyperframes-media` skill.
 
  ## Troubleshooting
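
Editorial note on the "Parametrized renders" context paragraph in the hyperframes-cli hunk: the rule it describes (declared defaults, overridden per render by an object keyed by id, with missing keys falling through) can be illustrated with a minimal sketch. This is not the HyperFrames implementation. `resolveVariables` is a hypothetical helper, the `type` values `"string"` and `"color"` are assumed for illustration, and ignoring override keys that have no declaration is also an assumption.

```javascript
// Hypothetical helper, NOT part of HyperFrames: merges declared defaults with
// per-render overrides. Keys missing from `overrides` fall through to the default.
// Assumption: override keys with no matching declaration are ignored.
function resolveVariables(declarations, overrides = {}) {
  const resolved = {};
  for (const { id, default: fallback } of declarations) {
    const hasOverride = Object.prototype.hasOwnProperty.call(overrides, id);
    resolved[id] = hasOverride ? overrides[id] : fallback;
  }
  return resolved;
}

// Shape of a data-composition-variables value: an array of declarations.
// The `type` values here are illustrative assumptions.
const declarations = [
  { id: "title", type: "string", label: "Title", default: "Untitled" },
  { id: "accent", type: "color", label: "Accent color", default: "#ff5500" },
];

// Shape of a --variables value: an object keyed by declaration id.
const overrides = { title: "Q4 Report" };

const resolved = resolveVariables(declarations, overrides);
console.log(resolved); // title is overridden; accent keeps its declared default
```

With no overrides at all, the same call returns every declared default, which matches the paragraph's point that one composition can run unchanged in dev preview and in production.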