npm - varg.ai-sdk - Versions diffs - 0.1.0 - Mend

varg.ai-sdk 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (48)

package/.claude/settings.local.json +7 -0
package/.env.example +24 -0
package/CLAUDE.md +118 -0
package/README.md +231 -0
package/SKILLS.md +157 -0
package/STRUCTURE.md +92 -0
package/TEST_RESULTS.md +122 -0
package/action/captions/SKILL.md +170 -0
package/action/captions/index.ts +227 -0
package/action/edit/SKILL.md +235 -0
package/action/edit/index.ts +493 -0
package/action/image/SKILL.md +140 -0
package/action/image/index.ts +112 -0
package/action/sync/SKILL.md +136 -0
package/action/sync/index.ts +187 -0
package/action/transcribe/SKILL.md +179 -0
package/action/transcribe/index.ts +227 -0
package/action/video/SKILL.md +116 -0
package/action/video/index.ts +135 -0
package/action/voice/SKILL.md +125 -0
package/action/voice/index.ts +201 -0
package/biome.json +33 -0
package/index.ts +38 -0
package/lib/README.md +144 -0
package/lib/ai-sdk/fal.ts +106 -0
package/lib/ai-sdk/replicate.ts +107 -0
package/lib/elevenlabs.ts +382 -0
package/lib/fal.ts +478 -0
package/lib/ffmpeg.ts +467 -0
package/lib/fireworks.ts +235 -0
package/lib/groq.ts +246 -0
package/lib/higgsfield.ts +176 -0
package/lib/remotion/SKILL.md +823 -0
package/lib/remotion/cli.ts +115 -0
package/lib/remotion/functions.ts +283 -0
package/lib/remotion/index.ts +19 -0
package/lib/remotion/templates.ts +73 -0
package/lib/replicate.ts +304 -0
package/output.txt +1 -0
package/package.json +35 -0
package/pipeline/cookbooks/SKILL.md +285 -0
package/pipeline/cookbooks/remotion-video.md +585 -0
package/pipeline/cookbooks/round-video-character.md +337 -0
package/pipeline/cookbooks/talking-character.md +59 -0
package/test-import.ts +7 -0
package/test-services.ts +97 -0
package/tsconfig.json +29 -0
package/utilities/s3.ts +147 -0

package/.claude/settings.local.json ADDED

@@ -0,0 +1,7 @@
+{
+  "permissions": {
+    "allow": ["Bash(mkdir:*)"],
+    "deny": [],
+    "ask": []
+  }
+}

package/.env.example ADDED

@@ -0,0 +1,24 @@
+# fal.ai api key
+FAL_API_KEY=fal_xxx
+# higgsfield credentials
+HIGGSFIELD_API_KEY=hf_xxx
+HIGGSFIELD_SECRET=secret_xxx
+# elevenlabs api key
+ELEVENLABS_API_KEY=el_xxx
+# groq api key (ultra-fast whisper transcription)
+GROQ_API_KEY=gsk_xxx
+# fireworks api key (word-level transcription with timestamps)
+FIREWORKS_API_KEY=fw_xxx
+# cloudflare r2 / s3 storage
+CLOUDFLARE_R2_API_URL=https://xxx.r2.cloudflarestorage.com
+CLOUDFLARE_ACCESS_KEY_ID=xxx
+CLOUDFLARE_ACCESS_SECRET=xxx
+CLOUDFLARE_R2_BUCKET=m
+# replicate (optional)
+REPLICATE_API_TOKEN=r8_xxx
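Bun loads `.env` automatically (as CLAUDE.md notes), so a script can check up front that the keys it needs are set and no longer hold the `xxx` placeholders from this template. A minimal sketch; the helper name `missingEnvKeys` and the placeholder heuristic are assumptions, not part of the package:

```typescript
// Keys from .env.example. REPLICATE_API_TOKEN is marked optional there,
// so it is left out of the required list.
const REQUIRED_KEYS = [
  "FAL_API_KEY",
  "HIGGSFIELD_API_KEY",
  "HIGGSFIELD_SECRET",
  "ELEVENLABS_API_KEY",
  "GROQ_API_KEY",
  "FIREWORKS_API_KEY",
  "CLOUDFLARE_R2_API_URL",
  "CLOUDFLARE_ACCESS_KEY_ID",
  "CLOUDFLARE_ACCESS_SECRET",
  "CLOUDFLARE_R2_BUCKET",
] as const;

// Hypothetical helper: returns keys that are unset, blank, or still contain
// the "xxx" placeholder used throughout .env.example (a heuristic, not a rule).
export function missingEnvKeys(
  env: Record<string, string | undefined>,
  keys: readonly string[] = REQUIRED_KEYS,
): string[] {
  return keys.filter((key) => {
    const value = env[key]?.trim();
    return !value || value.includes("xxx");
  });
}

// Example: check only the keys a given script actually needs.
const missing = missingEnvKeys(process.env, ["FAL_API_KEY"]);
if (missing.length > 0) {
  console.warn(`[env] missing or placeholder keys: ${missing.join(", ")}`);
}
```

Checking only the keys a command uses (for example, only `FAL_API_KEY` before a fal call) avoids failing on providers that are not involved.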

package/CLAUDE.md ADDED

@@ -0,0 +1,118 @@
+---
+description: Use Bun instead of Node.js, npm, pnpm, or vite. Use existing tools via bash commands.
+globs: "*.ts, *.tsx, *.html, *.css, *.js, *.jsx, package.json"
+alwaysApply: false
+---
+Default to using Bun instead of Node.js.
+## Working with this SDK
+- **Use existing tools**: Always use the built-in CLI tools via bash commands (e.g., `bun run lib/fal.ts`, `bun run lib/elevenlabs.ts`)
+- **Don't write custom scripts**: Avoid creating new TypeScript/JavaScript scripts. Use the existing lib/ tools directly
+- **Media folders**: Store input files in `media/` folder, outputs go to `output/` folder
+- **Local file support**: Tools like `lib/fal.ts` support local file paths (e.g., `media/image.png`) in addition to URLs
+- Use `bun <file>` instead of `node <file>` or `ts-node <file>`
+- Use `bun test` instead of `jest` or `vitest`
+- Use `bun build <file.html|file.ts|file.css>` instead of `webpack` or `esbuild`
+- Use `bun install` instead of `npm install` or `yarn install` or `pnpm install`
+- Use `bun run <script>` instead of `npm run <script>` or `yarn run <script>` or `pnpm run <script>`
+- Bun automatically loads .env, so don't use dotenv.
+## APIs
+- `Bun.serve()` supports WebSockets, HTTPS, and routes. Don't use `express`.
+- `bun:sqlite` for SQLite. Don't use `better-sqlite3`.
+- `Bun.redis` for Redis. Don't use `ioredis`.
+- `Bun.sql` for Postgres. Don't use `pg` or `postgres.js`.
+- `WebSocket` is built-in. Don't use `ws`.
+- Prefer `Bun.file` over `node:fs`'s readFile/writeFile
+- Bun.$`ls` instead of execa.
+## Testing
+Use `bun test` to run tests.
+```ts#index.test.ts
+import { test, expect } from "bun:test";
+test("hello world", () => {
+  expect(1).toBe(1);
+});
+```
+## Frontend
+Use HTML imports with `Bun.serve()`. Don't use `vite`. HTML imports fully support React, CSS, Tailwind.
+Server:
+```ts#index.ts
+import index from "./index.html"
+Bun.serve({
+  routes: {
+    "/": index,
+    "/api/users/:id": {
+      GET: (req) => {
+        return new Response(JSON.stringify({ id: req.params.id }));
+      },
+    },
+  },
+  // optional websocket support
+  websocket: {
+    open: (ws) => {
+      ws.send("Hello, world!");
+    },
+    message: (ws, message) => {
+      ws.send(message);
+    },
+    close: (ws) => {
+      // handle close
+    }
+  },
+  development: {
+    hmr: true,
+    console: true,
+  }
+})
+```
+HTML files can import .tsx, .jsx or .js files directly and Bun's bundler will transpile & bundle automatically. `<link>` tags can point to stylesheets and Bun's CSS bundler will bundle.
+```html#index.html
+<html>
+  <body>
+    <h1>Hello, world!</h1>
+    <script type="module" src="./frontend.tsx"></script>
+  </body>
+</html>
+```
+With the following `frontend.tsx`:
+```tsx#frontend.tsx
+import React from "react";
+// import .css files directly and it works
+import './index.css';
+import { createRoot } from "react-dom/client";
+const root = createRoot(document.body);
+export default function Frontend() {
+  return <h1>Hello, world!</h1>;
+}
+root.render(<Frontend />);
+```
+Then, run index.ts
+```sh
+bun --hot ./index.ts
+```
+For more information, read the Bun API docs in `node_modules/bun-types/docs/**.md`.

package/README.md ADDED

@@ -0,0 +1,231 @@
+# varg.ai sdk
+video generation and editing tools sdk
+## folder structure
+```
+sdk/
+│
+├── media/              # working directory for media files (images, videos, audio)
+├── output/             # generated output files
+│
+├── utilities/
+│
+├── lib/
+│   ├── pymovie/
+│   ├── opencv/
+│   ├── fal/
+│   ├── higgsfield/
+│   ├── ffmpeg/
+│   ├── remotion/
+│   ├── remotion.dev/
+│   └── motion.dev/
+│
+├── service/
+│   ├── image/          # image generation + SKILL.md
+│   ├── video/          # video generation + SKILL.md
+│   ├── voice/          # voice synthesis + SKILL.md
+│   ├── sync/           # lipsync + SKILL.md
+│   ├── captions/       # video captions + SKILL.md
+│   ├── edit/           # video editing + SKILL.md
+│   └── transcribe/     # audio transcription + SKILL.md
+│
+└── pipeline/
+    └── cookbooks/
+```
+## installation
+```bash
+bun install
+```
+set environment variables in `.env`:
+```bash
+FAL_API_KEY=fal_xxx
+HIGGSFIELD_API_KEY=hf_xxx
+HIGGSFIELD_SECRET=secret_xxx
+REPLICATE_API_TOKEN=r8_xxx
+ELEVENLABS_API_KEY=el_xxx
+GROQ_API_KEY=gsk_xxx
+FIREWORKS_API_KEY=fw_xxx
+CLOUDFLARE_R2_API_URL=https://xxx.r2.cloudflarestorage.com
+CLOUDFLARE_ACCESS_KEY_ID=xxx
+CLOUDFLARE_ACCESS_SECRET=xxx
+CLOUDFLARE_R2_BUCKET=m
+```
+## usage
+### as cli
+```bash
+# generate image with ai-sdk (recommended)
+bun run lib/ai-sdk/fal.ts generate_image "a beautiful sunset" "fal-ai/flux/dev" "16:9"
+# generate image with fal client (advanced features)
+bun run lib/fal.ts generate_image "a beautiful sunset"
+# generate video from image (supports local files)
+bun run lib/fal.ts image_to_video "person talking" media/image.jpg 5
+bun run lib/fal.ts image_to_video "person talking" https://example.com/image.jpg 5
+# generate soul character
+bun run lib/higgsfield.ts generate_soul "professional headshot"
+# generate video with replicate
+bun run lib/replicate.ts minimax "person walking on beach"
+# generate voice with elevenlabs
+bun run lib/elevenlabs.ts tts "hello world" rachel output.mp3
+# transcribe audio to text/subtitles
+bun run service/transcribe media/audio.mp3 groq
+bun run service/transcribe media/audio.mp3 fireworks output.srt
+bun run lib/fireworks.ts media/audio.mp3 output.srt
+# edit video with ffmpeg
+bun run lib/ffmpeg.ts concat output.mp4 video1.mp4 video2.mp4
+# lipsync video with audio
+bun run service/sync overlay video.mp4 audio.mp3 synced.mp4
+# upload file to s3
+bun run utilities/s3.ts upload ./video.mp4 videos/output.mp4
+```
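The `concat` example above joins clips end to end. One standard ffmpeg approach is the concat demuxer: a text file with one `file 'path'` line per clip, read with `-f concat` and stream-copied with `-c copy` when all clips share codecs and parameters. A sketch of that approach; whether `lib/ffmpeg.ts` works this way is an assumption, and `buildConcatJob` and `runConcat` are hypothetical names:

```typescript
import { writeFileSync } from "node:fs";
import { spawnSync } from "node:child_process";

// The concat demuxer reads single-quoted paths; a literal ' inside a path
// is written as '\'' (close quote, escaped quote, reopen quote).
function quoteForConcat(path: string): string {
  return `'${path.replace(/'/g, "'\\''")}'`;
}

export interface ConcatJob {
  list: string; // contents of the list file
  listPath: string;
  args: string[]; // ffmpeg arguments
}

// Hypothetical helper: build the list file and ffmpeg args for `inputs`.
export function buildConcatJob(output: string, inputs: string[], listPath = "concat-list.txt"): ConcatJob {
  const list = inputs.map((p) => `file ${quoteForConcat(p)}`).join("\n") + "\n";
  // -safe 0 permits absolute paths; -c copy skips re-encoding.
  const args = ["-y", "-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", output];
  return { list, listPath, args };
}

// Hypothetical runner: writes the list file, then invokes ffmpeg (must be on PATH).
export function runConcat(output: string, inputs: string[]): number | null {
  const job = buildConcatJob(output, inputs);
  writeFileSync(job.listPath, job.list);
  return spawnSync("ffmpeg", job.args, { stdio: "inherit" }).status;
}
```

Stream copy is fast but only works when the clips match; clips from different generators usually need re-encoding instead of `-c copy`.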
+### as library
+```typescript
+import { generateImage, imageToVideo } from "varg.ai-sdk"
+import { uploadFromUrl } from "varg.ai-sdk"
+// generate image
+const img = await generateImage({
+  prompt: "a beautiful sunset",
+  model: "fal-ai/flux-pro/v1.1",
+})
+// animate it
+const video = await imageToVideo({
+  prompt: "camera pan across scene",
+  imageUrl: img.data.images[0].url,
+  duration: 5,
+})
+// upload to s3
+const url = await uploadFromUrl(
+  video.data.video.url,
+  "videos/sunset.mp4"
+)
+console.log(`uploaded: ${url}`)
+```
+## modules
+### lib
+core libraries for video/audio/ai processing:
+- **ai-sdk/fal**: fal.ai using vercel ai sdk (recommended for images)
+- **ai-sdk/replicate**: replicate.com using vercel ai sdk
+- **fal**: fal.ai using direct client (for video & advanced features, supports local file uploads)
+- **higgsfield**: soul character generation
+- **replicate**: replicate.com api (minimax, kling, luma, flux)
+- **elevenlabs**: text-to-speech and voice generation
+- **groq**: ultra-fast whisper transcription (audio to text)
+- **fireworks**: word-level audio transcription with timestamps (srt/vtt)
+- **ffmpeg**: video editing operations (concat, trim, resize, etc.)
+- **remotion**: programmatic video creation with react
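The transcription modules above can write SRT subtitles. SRT is plain text: each cue is a running number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the caption text, and a blank line. A minimal sketch of formatting timed segments as SRT; the `Segment` shape (start and end in seconds) is an assumption, not the actual output type of `lib/groq.ts` or `lib/fireworks.ts`:

```typescript
// Assumed shape: start/end in seconds.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as an SRT timestamp, e.g. 3.5 -> "00:00:03,500".
function srtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Number cues from 1 and separate them with blank lines.
export function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${srtTime(seg.start)} --> ${srtTime(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```

Word-level timestamps can be fed through the same function after grouping words into short phrases, which keeps each caption readable.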
+### media folder
+- **media/**: working directory for storing input media files (images, videos, audio)
+- **output/**: directory for generated/processed output files
+- use `media/` for source files, `output/` for results
+- fal.ts supports local file paths from `media/` folder
+### service
+high-level services combining multiple libs. each service includes a SKILL.md for claude code agent skills:
+- **image**: image generation (fal + higgsfield)
+- **video**: video generation from image/text
+- **voice**: voice generation with multiple providers (elevenlabs)
+- **transcribe**: audio transcription with groq whisper or fireworks (srt support)
+- **sync**: lipsync workflows (wav2lip, audio overlay)
+- **captions**: auto-generate and overlay subtitles on videos
+- **edit**: video editing workflows (resize, trim, concat, social media prep)
+### utilities
+- **s3**: cloudflare r2 / s3 storage operations
+### pipeline
+- **cookbooks**: step-by-step recipes for complex workflows (includes talking-character SKILL.md)
+## key learnings
+### remotion batch rendering with variations
+when creating multiple video variations (e.g., 15 videos with different images):
+**❌ don't do this:**
+```bash
+# overwriting files causes caching issues
+for i in {1..15}; do
+  cp woman-$i-before.jpg lib/remotion/public/before.jpg  # overwrites!
+  cp woman-$i-after.jpg lib/remotion/public/after.jpg    # overwrites!
+  bun run lib/remotion/index.ts render root.tsx MyVideo output-$i.mp4
+done
+# result: all videos show the same woman (the last one)
+```
+**✅ do this instead:**
+```typescript
+import React from "react";
+import { AbsoluteFill, Composition, Img, registerRoot, staticFile } from "remotion";
+// 1. use unique filenames for each variation
+// lib/remotion/public/woman-01-before.jpg, woman-02-before.jpg, etc.
+// 2. pass variation id as prop
+type Props = { variationId?: string };
+const MyComp: React.FC<Props> = ({ variationId = "01" }) => {
+  const beforeImg = staticFile(`woman-${variationId}-before.jpg`);
+  const afterImg = staticFile(`woman-${variationId}-after.jpg`);
+  return (
+    <AbsoluteFill>
+      <Img src={beforeImg} />
+      <Img src={afterImg} />
+    </AbsoluteFill>
+  );
+};
+// 3. register multiple compositions with unique props
+registerRoot(() => (
+  <>
+    {Array.from({ length: 15 }, (_, i) => {
+      const variationId = String(i + 1).padStart(2, "0");
+      return (
+        <Composition
+          key={variationId}
+          id={`MyVideo-${variationId}`}
+          component={MyComp}
+          defaultProps={{ variationId }}
+          // example timing and size; use your composition's real values
+          durationInFrames={150}
+          fps={30}
+          width={1080}
+          height={1920}
+        />
+      );
+    })}
+  </>
+));
+// 4. render each composition from the shell:
+//   bun run lib/remotion/index.ts render root.tsx MyVideo-01 output-01.mp4
+//   bun run lib/remotion/index.ts render root.tsx MyVideo-02 output-02.mp4
+```
+**why this matters:**
+- remotion's `staticFile()` caches based on filename
+- overwriting files between renders causes all videos to use the last cached version
+- unique filenames + props ensure each render uses correct assets
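The unique-filename approach can be driven from one list of variation IDs, so asset names, composition IDs and output paths never drift apart. A sketch, assuming the `woman-XX-before.jpg` naming and the `render root.tsx <id> <output>` command from the example above; `planVariations` and `renderAll` are hypothetical helpers:

```typescript
import { spawnSync } from "node:child_process";

interface Variation {
  id: string; // zero-padded, e.g. "01"
  beforeImage: string; // file name under lib/remotion/public/
  afterImage: string;
  compositionId: string;
  output: string;
}

// Build every name from the same padded id so they stay consistent.
export function planVariations(count: number, prefix = "MyVideo"): Variation[] {
  return Array.from({ length: count }, (_, i) => {
    const id = String(i + 1).padStart(2, "0");
    return {
      id,
      beforeImage: `woman-${id}-before.jpg`,
      afterImage: `woman-${id}-after.jpg`,
      compositionId: `${prefix}-${id}`,
      output: `output-${id}.mp4`,
    };
  });
}

// Render one composition at a time with the CLI invocation shown above.
export function renderAll(variations: Variation[]): void {
  for (const v of variations) {
    const result = spawnSync(
      "bun",
      ["run", "lib/remotion/index.ts", "render", "root.tsx", v.compositionId, v.output],
      { stdio: "inherit" },
    );
    if (result.status !== 0) {
      throw new Error(`render failed for ${v.compositionId}`);
    }
  }
}
```

Rendering sequentially keeps memory use predictable; the same plan can also be used to verify that every `beforeImage`/`afterImage` exists before starting.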
+### fal.ai nsfw content filtering
+fal.ai automatically filters content that may be nsfw:
+**symptoms:**
+- image generation succeeds but returns empty file (~7.6KB)
+- no error message
+- happens with certain clothing/body descriptions
+**solution:**
+- be explicit about modest, full-coverage clothing:
+  - ✅ "long sleeve athletic top and full length leggings"
+  - ❌ "athletic wear" (vague, may trigger filter)
+- add "professional", "modest", "appropriate" to prompts
+- always check file sizes after batch generation (< 10KB = filtered)
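The file-size check above is easy to automate after a batch run. A sketch that flags images under the 10KB threshold mentioned here; the extension list and the helper names `flagLikelyFiltered` and `scanForFiltered` are assumptions:

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// README heuristic: filtered outputs come back around 7.6KB, so < 10KB is suspect.
const FILTERED_THRESHOLD_BYTES = 10 * 1024;

interface FileEntry {
  path: string;
  size: number; // bytes
}

// Pure check: entries smaller than the threshold are likely filtered.
export function flagLikelyFiltered(entries: FileEntry[], threshold = FILTERED_THRESHOLD_BYTES): FileEntry[] {
  return entries.filter((entry) => entry.size < threshold);
}

// Non-recursive scan of one directory for image files, then apply the check.
export function scanForFiltered(dir: string, extensions = [".png", ".jpg", ".jpeg", ".webp"]): FileEntry[] {
  const entries = readdirSync(dir)
    .filter((name) => extensions.some((ext) => name.toLowerCase().endsWith(ext)))
    .map((name) => {
      const path = join(dir, name);
      return { path, size: statSync(path).size };
    });
  return flagLikelyFiltered(entries);
}
```

For example, `scanForFiltered("output")` after a batch run lists the files worth regenerating with a more explicit prompt.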

package/SKILLS.md ADDED

@@ -0,0 +1,157 @@
+# agent skills
+this sdk includes claude code agent skills for each service. each skill is co-located with its service code.
+## available skills
+### service skills
+located in `service/<name>/SKILL.md`:
+1. **image-generation** (`service/image/`)
+   - generate ai images using fal (flux models) or higgsfield soul characters
+   - cli: `bun run service/image fal|soul <prompt> [options]`
+2. **video-generation** (`service/video/`)
+   - generate videos from images (local or url) or text prompts using fal.ai
+   - supports local image files - automatically uploads to fal storage
+   - cli: `bun run service/video from_image|from_text <args>`
+3. **voice-synthesis** (`service/voice/`)
+   - generate realistic text-to-speech audio using elevenlabs
+   - cli: `bun run service/voice generate|elevenlabs <text> [options]`
+3b. **music-generation** (`lib/elevenlabs.ts`)
+   - generate music from text prompts using elevenlabs
+   - generate sound effects from descriptions
+   - cli: `bun run lib/elevenlabs.ts music|sfx <prompt> [options]`
+4. **video-lipsync** (`service/sync/`)
+   - sync video with audio using wav2lip or simple overlay
+   - cli: `bun run service/sync sync|wav2lip|overlay <args>`
+5. **video-captions** (`service/captions/`)
+   - add auto-generated or custom subtitles to videos
+   - cli: `bun run service/captions <videoPath> [options]`
+6. **video-editing** (`service/edit/`)
+   - edit videos with ffmpeg (resize, trim, concat, social media prep)
+   - cli: `bun run service/edit social|montage|trim|resize|merge_audio <args>`
+7. **audio-transcription** (`service/transcribe/`)
+   - transcribe audio to text or subtitles using groq/fireworks
+   - cli: `bun run service/transcribe <audioUrl> <provider> [outputPath]`
+### utility skills
+8. **telegram-send** (external: `/Users/aleks/Github/Badaboom1995/rumble-b2c`)
+   - send videos to telegram users/channels as round videos
+   - automatically converts to 512x512 square format for telegram
+   - cli: `cd /Users/aleks/Github/Badaboom1995/rumble-b2c && bun run scripts/telegram-send-video.ts <videoPath> <@username>`
+   - example: `cd /Users/aleks/Github/Badaboom1995/rumble-b2c && bun run scripts/telegram-send-video.ts /path/to/video.mp4 @caffeinum`
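The telegram-send skill converts videos to a 512x512 square before sending. That conversion can be expressed as one ffmpeg filter chain: center-crop to the shorter side, then scale. A sketch; `squareVideoArgs` and `toSquare` are hypothetical helpers, and the external script may do this differently:

```typescript
import { spawnSync } from "node:child_process";

// Center-crop to the shorter side, then scale to size x size.
// crop=w:h centers by default; the single quotes are ffmpeg filtergraph
// quoting that keeps the comma inside min() from splitting the chain.
export function squareVideoArgs(input: string, output: string, size = 512): string[] {
  const filter = `crop='min(iw,ih)':'min(iw,ih)',scale=${size}:${size}`;
  return ["-y", "-i", input, "-vf", filter, "-c:v", "libx264", "-c:a", "aac", output];
}

// Hypothetical runner; args go straight to ffmpeg without a shell,
// so the filter string needs no extra shell escaping.
export function toSquare(input: string, output: string, size = 512): number | null {
  return spawnSync("ffmpeg", squareVideoArgs(input, output, size), { stdio: "inherit" }).status;
}
```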
+### pipeline skills
+located in `pipeline/cookbooks/SKILL.md`:
+9. **talking-character-pipeline** (`pipeline/cookbooks/`)
+   - complete workflow to create talking character videos
+   - combines: character generation → voiceover → animation → lipsync → captions → social prep
+10. **round-video-character** (`pipeline/cookbooks/round-video-character.md`)
+   - create realistic round selfie videos for telegram using nano banana pro + wan 2.5
+   - workflow: generate selfie first frame (person in setting) → voiceover → wan 2.5 video
+   - uses: `bun run lib/fal.ts`, `bun run lib/replicate.ts`, `bun run lib/elevenlabs.ts`
+   - input: text script + profile photo
+   - output: extreme close-up selfie video with authentic camera shake, lighting, and audio
+## structure
+each skill follows this pattern:
+```
+service/<name>/
+├── index.ts      # service implementation
+└── SKILL.md      # claude code agent skill
+```
+## how skills work
+skills are **model-invoked** - claude autonomously decides when to use them based on your request and the skill's description.
+**example:**
+- you say: "create a talking character video"
+- claude reads `talking-character-pipeline` skill
+- claude executes the workflow using the pipeline steps
+## using skills
+### in claude code
+skills are automatically discovered when you're in the sdk directory:
+```
+user: create an image of a sunset
+claude: [uses image-generation skill]
+        bun run service/image fal "beautiful sunset over mountains"
+```
+### manually
+you can also run services directly:
+```bash
+# generate image
+bun run service/image fal "sunset over mountains" true
+# generate video from that image
+bun run service/video from_image "camera pan" https://image-url.jpg 5 true
+# add voice
+bun run service/voice elevenlabs "this is a beautiful sunset" rachel true
+# sync with video
+bun run service/sync wav2lip https://video-url.mp4 https://audio-url.mp3
+```
+## skill features
+each skill includes:
+- **name**: unique skill identifier
+- **description**: when claude should use this skill
+- **allowed-tools**: restricted to Read, Bash for safety
+- **usage examples**: cli and programmatic examples
+- **when to use**: specific use cases
+- **tips**: best practices
+- **environment variables**: required api keys
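The name, description and allowed-tools fields live in a frontmatter block at the top of each SKILL.md. For tooling that lists skills and their allowed tools, a deliberately minimal reader for flat `key: value` frontmatter is enough. This sketch assumes no nested YAML, lists or multi-line values; `readSkillMeta` is a hypothetical name, and a real YAML parser is safer:

```typescript
export interface SkillMeta {
  name?: string;
  description?: string;
  allowedTools: string[];
}

// Reads flat "key: value" pairs between the opening and closing "---" lines.
// Not a YAML parser: nested values, lists, and multi-line strings are ignored.
export function readSkillMeta(markdown: string): SkillMeta {
  const lines = markdown.split(/\r?\n/);
  const meta: SkillMeta = { allowedTools: [] };
  if (lines[0]?.trim() !== "---") return meta;
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim() === "---") break; // end of frontmatter
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim();
    const value = line.slice(colon + 1).trim();
    if (key === "name") meta.name = value;
    else if (key === "description") meta.description = value;
    else if (key === "allowed-tools") {
      meta.allowedTools = value.split(",").map((tool) => tool.trim()).filter(Boolean);
    }
  }
  return meta;
}
```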
+## benefits
+- **discoverability**: claude knows all available services
+- **context**: skills provide usage examples and best practices
+- **safety**: `allowed-tools` limits to read-only and bash execution
+- **documentation**: skills serve as living documentation
+## skill reference
+| skill | service | primary use case |
+|-------|---------|------------------|
+| image-generation | image | create ai images, character headshots |
+| video-generation | video | animate images, generate video clips |
+| voice-synthesis | voice | text-to-speech, voiceovers |
+| music-generation | elevenlabs | generate music, create sound effects |
+| video-lipsync | sync | sync audio with video, talking characters |
+| video-captions | captions | add subtitles, accessibility |
+| video-editing | edit | resize, trim, social media optimization |
+| audio-transcription | transcribe | speech-to-text, subtitle generation |
+| telegram-send | external | send videos to telegram as round videos |
+| talking-character-pipeline | pipeline | end-to-end talking character videos |
+| round-video-character | pipeline | telegram round selfie videos with wan 2.5 |
+## see also
+- [README.md](README.md) - sdk overview and installation
+- [STRUCTURE.md](STRUCTURE.md) - detailed module organization
+- [pipeline/cookbooks/talking-character.md](pipeline/cookbooks/talking-character.md) - talking character workflow
+- [pipeline/cookbooks/round-video-character.md](pipeline/cookbooks/round-video-character.md) - telegram round selfie video cookbook

package/STRUCTURE.md ADDED

@@ -0,0 +1,92 @@
+# sdk structure
+## lib/ - two fal implementations
+### lib/ai-sdk/fal.ts
+uses `@ai-sdk/fal` with vercel ai sdk
+**when to use:**
+- standard image generation
+- need consistent api across providers
+- want automatic image format handling
+- prefer typed aspect ratios
+**commands:**
+```bash
+bun run lib/ai-sdk/fal.ts generate_image <prompt> [model] [aspectRatio]
+```
+### lib/fal.ts
+uses `@fal-ai/client` directly
+**when to use:**
+- video generation (image-to-video, text-to-video)
+- advanced fal features
+- need queue/streaming updates
+- custom api parameters
+**commands:**
+```bash
+bun run lib/fal.ts generate_image <prompt> [model] [imageSize]
+bun run lib/fal.ts image_to_video <prompt> <imageUrl> [duration]
+bun run lib/fal.ts text_to_video <prompt> [duration]
+```
+### lib/higgsfield.ts
+uses `@higgsfield/client` for soul character generation
+**commands:**
+```bash
+bun run lib/higgsfield.ts generate_soul <prompt> [customReferenceId]
+bun run lib/higgsfield.ts create_character <name> <imageUrl1> [imageUrl2...]
+bun run lib/higgsfield.ts list_styles
+```
+## service/ - high-level wrappers
+### service/image.ts
+combines fal + higgsfield for image generation
+```bash
+bun run service/image.ts fal <prompt> [model] [upload]
+bun run service/image.ts soul <prompt> [customReferenceId] [upload]
+```
+### service/video.ts
+video generation with optional s3 upload
+```bash
+bun run service/video.ts from_image <prompt> <imageUrl> [duration] [upload]
+bun run service/video.ts from_text <prompt> [duration] [upload]
+```
+## utilities/
+### utilities/s3.ts
+cloudflare r2 / s3 storage operations
+```bash
+bun run utilities/s3.ts upload <filePath> <objectKey>
+bun run utilities/s3.ts upload_from_url <url> <objectKey>
+bun run utilities/s3.ts presigned_url <objectKey> [expiresIn]
+```
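These commands talk to Cloudflare R2 through its S3-compatible API. With `@aws-sdk/client-s3`, that typically means pointing the client at the R2 endpoint with `region: "auto"`. A sketch mapping the `.env` names onto the client's config shape; whether `utilities/s3.ts` builds its client this way is an assumption, and `r2ClientConfig` is hypothetical:

```typescript
// Plain object with the fields S3Client accepts, kept dependency-free
// so it can be checked without the AWS SDK installed.
interface R2ClientConfig {
  region: string;
  endpoint: string;
  credentials: { accessKeyId: string; secretAccessKey: string };
}

export function r2ClientConfig(env: Record<string, string | undefined>): R2ClientConfig {
  const endpoint = env.CLOUDFLARE_R2_API_URL;
  const accessKeyId = env.CLOUDFLARE_ACCESS_KEY_ID;
  const secretAccessKey = env.CLOUDFLARE_ACCESS_SECRET;
  if (!endpoint || !accessKeyId || !secretAccessKey) {
    throw new Error("[s3] missing CLOUDFLARE_R2_API_URL, CLOUDFLARE_ACCESS_KEY_ID or CLOUDFLARE_ACCESS_SECRET");
  }
  // Cloudflare documents "auto" as the region value for R2's S3 API.
  return { region: "auto", endpoint, credentials: { accessKeyId, secretAccessKey } };
}
```

With the SDK installed, this would be passed as `new S3Client(r2ClientConfig(process.env))`, with `CLOUDFLARE_R2_BUCKET` as the `Bucket` on each command.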
+## pipeline/cookbooks/
+markdown guides for complex workflows
+- `talking-character.md`: create talking character videos
+## dependencies
+- `@ai-sdk/fal` - vercel ai sdk fal provider
+- `@fal-ai/client` - official fal client
+- `@higgsfield/client` - higgsfield api client
+- `@aws-sdk/client-s3` - s3 storage
+- `ai` - vercel ai sdk core
+## key decisions
+1. **two fal implementations** - ai-sdk for simplicity, client for power
+2. **all scripts are cli + library** - can be run directly or imported
+3. **consistent logging** - `[module] message` format
+4. **auto image opening** - ai-sdk version opens images automatically
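Decisions 2 and 3 suggest a common shape for each module: exported functions for library use, a small argv dispatcher that runs only when the file is executed directly, and `[module]`-prefixed logging. A sketch under those assumptions; the command table and the placeholder `generateImage` body are illustrative, not the package's actual code:

```typescript
// "[module] message" logging, per decision 3.
const MODULE = "image";
const log = (message: string) => console.log(`[${MODULE}] ${message}`);

// Library entry point (placeholder body; a real module would call a provider).
export async function generateImage(prompt: string): Promise<string> {
  return `image for "${prompt}"`;
}

type Handler = (args: string[]) => Promise<string>;

// CLI subcommands map onto the same exported functions. A Map avoids
// accidentally matching inherited keys like "constructor".
const commands = new Map<string, Handler>([
  [
    "generate_image",
    async ([prompt]) => {
      if (!prompt) throw new Error("usage: generate_image <prompt>");
      return generateImage(prompt);
    },
  ],
]);

export async function runCli(argv: string[]): Promise<string> {
  const [command, ...rest] = argv;
  const handler = command ? commands.get(command) : undefined;
  if (!handler) {
    throw new Error(`unknown command "${command ?? ""}"; expected: ${Array.from(commands.keys()).join(", ")}`);
  }
  const result = await handler(rest);
  log(result);
  return result;
}

// In Bun, import.meta.main is true only for the entry file, so this line
// would run the CLI on `bun run` but not on import:
// if (import.meta.main) await runCli(process.argv.slice(2));
```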