@vargai/sdk 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (67) hide show
  1. package/.env.example +24 -0
  2. package/CLAUDE.md +118 -0
  3. package/HIGGSFIELD_REWRITE_SUMMARY.md +300 -0
  4. package/README.md +231 -0
  5. package/SKILLS.md +157 -0
  6. package/STRUCTURE.md +92 -0
  7. package/TEST_RESULTS.md +122 -0
  8. package/action/captions/SKILL.md +170 -0
  9. package/action/captions/index.ts +169 -0
  10. package/action/edit/SKILL.md +235 -0
  11. package/action/edit/index.ts +437 -0
  12. package/action/image/SKILL.md +140 -0
  13. package/action/image/index.ts +105 -0
  14. package/action/sync/SKILL.md +136 -0
  15. package/action/sync/index.ts +145 -0
  16. package/action/transcribe/SKILL.md +179 -0
  17. package/action/transcribe/index.ts +210 -0
  18. package/action/video/SKILL.md +116 -0
  19. package/action/video/index.ts +125 -0
  20. package/action/voice/SKILL.md +125 -0
  21. package/action/voice/index.ts +136 -0
  22. package/biome.json +33 -0
  23. package/bun.lock +842 -0
  24. package/cli/commands/find.ts +58 -0
  25. package/cli/commands/help.ts +70 -0
  26. package/cli/commands/list.ts +49 -0
  27. package/cli/commands/run.ts +237 -0
  28. package/cli/commands/which.ts +66 -0
  29. package/cli/discover.ts +66 -0
  30. package/cli/index.ts +33 -0
  31. package/cli/runner.ts +65 -0
  32. package/cli/types.ts +49 -0
  33. package/cli/ui.ts +185 -0
  34. package/index.ts +75 -0
  35. package/lib/README.md +144 -0
  36. package/lib/ai-sdk/fal.ts +106 -0
  37. package/lib/ai-sdk/replicate.ts +107 -0
  38. package/lib/elevenlabs.ts +382 -0
  39. package/lib/fal.ts +467 -0
  40. package/lib/ffmpeg.ts +467 -0
  41. package/lib/fireworks.ts +235 -0
  42. package/lib/groq.ts +246 -0
  43. package/lib/higgsfield/MIGRATION.md +308 -0
  44. package/lib/higgsfield/README.md +273 -0
  45. package/lib/higgsfield/example.ts +228 -0
  46. package/lib/higgsfield/index.ts +241 -0
  47. package/lib/higgsfield/soul.ts +262 -0
  48. package/lib/higgsfield.ts +176 -0
  49. package/lib/remotion/SKILL.md +823 -0
  50. package/lib/remotion/cli.ts +115 -0
  51. package/lib/remotion/functions.ts +283 -0
  52. package/lib/remotion/index.ts +19 -0
  53. package/lib/remotion/templates.ts +73 -0
  54. package/lib/replicate.ts +304 -0
  55. package/output.txt +1 -0
  56. package/package.json +42 -0
  57. package/pipeline/cookbooks/SKILL.md +285 -0
  58. package/pipeline/cookbooks/remotion-video.md +585 -0
  59. package/pipeline/cookbooks/round-video-character.md +337 -0
  60. package/pipeline/cookbooks/talking-character.md +59 -0
  61. package/scripts/produce-menopause-campaign.sh +202 -0
  62. package/service/music/SKILL.md +229 -0
  63. package/service/music/index.ts +296 -0
  64. package/test-import.ts +7 -0
  65. package/test-services.ts +97 -0
  66. package/tsconfig.json +29 -0
  67. package/utilities/s3.ts +147 -0
@@ -0,0 +1,140 @@
1
+ ---
2
+ name: image-generation
3
+ description: generate ai images using fal (flux models) or higgsfield soul characters. use when user wants to create images, headshots, character portraits, or needs image generation with specific models.
4
+ allowed-tools: Read, Bash
5
+ ---
6
+
7
+ # image generation
8
+
9
+ generate ai images using multiple providers with automatic s3 upload support.
10
+
11
+ ## providers
12
+
13
+ ### fal (flux models)
14
+ - high quality image generation
15
+ - supports flux-pro, flux-dev, and other flux models
16
+ - configurable model selection
17
+ - automatic image opening on generation
18
+
19
+ ### higgsfield soul
20
+ - character headshot generation
21
+ - consistent character style
22
+ - professional portrait quality
23
+ - custom style references
24
+
25
+ ## usage
26
+
27
+ ### generate with fal
28
+ ```bash
29
+ bun run service/image.ts fal "a beautiful sunset over mountains" [model] [upload]
30
+ ```
31
+
32
+ **parameters:**
33
+ - `prompt` (required): text description of the image
34
+ - `model` (optional): fal model to use (default: flux-pro)
35
+ - `upload` (optional): "true" to upload to s3
36
+
37
+ **example:**
38
+ ```bash
39
+ bun run service/image.ts fal "professional headshot, studio lighting" flux-pro true
40
+ ```
41
+
42
+ ### generate with soul
43
+ ```bash
44
+ bun run service/image.ts soul "friendly person smiling" [styleId] [upload]
45
+ ```
46
+
47
+ **parameters:**
48
+ - `prompt` (required): character description
49
+ - `styleId` (optional): custom higgsfield style reference
50
+ - `upload` (optional): "true" to upload to s3
51
+
52
+ **example:**
53
+ ```bash
54
+ bun run service/image.ts soul "professional business woman" "" true
55
+ ```
56
+
57
+ ## as library
58
+
59
+ ```typescript
60
+ import { generateWithFal, generateWithSoul } from "./service/image"
61
+
62
+ // fal generation
63
+ const falResult = await generateWithFal("sunset over ocean", {
64
+ model: "fal-ai/flux-pro/v1.1",
65
+ upload: true
66
+ })
67
+ console.log(falResult.imageUrl)
68
+ console.log(falResult.uploaded) // s3 url if upload=true
69
+
70
+ // soul generation
71
+ const soulResult = await generateWithSoul("friendly character", {
72
+ upload: true
73
+ })
74
+ console.log(soulResult.imageUrl)
75
+ ```
76
+
77
+ ## output
78
+
79
+ returns `ImageGenerationResult`:
80
+ ```typescript
81
+ {
82
+ imageUrl: string, // direct image url
83
+ uploaded?: string // s3 url if upload requested
84
+ }
85
+ ```
86
+
87
+ ## when to use
88
+
89
+ use this skill when:
90
+ - generating images from text descriptions
91
+ - creating character headshots or portraits
92
+ - need consistent character style (use soul)
93
+ - need high quality photorealistic images (use fal)
94
+ - preparing images for video generation pipeline
95
+
96
+ ## nsfw filtering and content moderation
97
+
98
+ fal.ai has content safety filters that may flag images as nsfw:
99
+
100
+ **common triggers:**
101
+ - prompts mentioning "athletic wear", "fitted sportswear", "gym clothes"
102
+ - certain body descriptions even when clothed
103
+ - prompts that could be interpreted as revealing clothing
104
+
105
+ **symptoms:**
106
+ - image generation returns but the file is unusably small (often ~7.6KB)
107
+ - no error message, just an unusable file
108
+ - happens inconsistently across similar prompts
109
+
110
+ **solutions:**
111
+ - specify modest, full-coverage clothing explicitly:
112
+ - ✅ "long sleeve athletic top and full length leggings"
113
+ - ✅ "fully covered in modest workout attire"
114
+ - ❌ "athletic wear" (too vague, may trigger filter)
115
+ - ❌ "fitted sportswear" (may trigger filter)
116
+ - add "professional", "modest", "appropriate" to descriptions
117
+ - if multiple images in batch get flagged, adjust prompts to be more explicit about coverage
118
+ - always check output file sizes - tiny files (< 10KB) indicate nsfw filtering
119
+
120
+ **example:**
121
+ ```bash
122
+ # ❌ may get flagged as nsfw
123
+ bun run service/image.ts fal "woman in athletic wear"
124
+
125
+ # ✅ less likely to trigger filter
126
+ bun run service/image.ts fal "woman wearing long sleeve athletic top and full length leggings"
127
+ ```
128
+
129
+ ## environment variables
130
+
131
+ required:
132
+ - `FAL_API_KEY` - for fal image generation
133
+ - `HIGGSFIELD_API_KEY` - for soul character generation
134
+ - `HIGGSFIELD_SECRET` - for higgsfield authentication
135
+
136
+ optional (for s3 upload):
137
+ - `CLOUDFLARE_R2_API_URL`
138
+ - `CLOUDFLARE_ACCESS_KEY_ID`
139
+ - `CLOUDFLARE_ACCESS_SECRET`
140
+ - `CLOUDFLARE_R2_BUCKET`
@@ -0,0 +1,105 @@
1
+ #!/usr/bin/env bun
2
+ /**
3
+ * image generation service combining fal and higgsfield
4
+ * usage: bun run service/image.ts <command> <args>
5
+ */
6
+
7
+ import type { ActionMeta } from "../../cli/types";
8
+ import { generateImage } from "../../lib/fal";
9
+ import { generateSoul } from "../../lib/higgsfield";
10
+ import { uploadFromUrl } from "../../utilities/s3";
11
+
12
/**
 * action metadata consumed by the cli runner (see ../../cli/types).
 * declares the input schema (prompt + optional size preset) and the
 * entry point that delegates to generateWithFal.
 */
export const meta: ActionMeta = {
  name: "image",
  type: "action",
  description: "generate image from text",
  inputType: "text",
  outputType: "image",
  schema: {
    input: {
      type: "object",
      required: ["prompt"],
      properties: {
        prompt: { type: "string", description: "what to generate" },
        // named aspect-ratio presets for the generated image.
        size: {
          type: "string",
          enum: [
            "square_hd",
            "landscape_4_3",
            "portrait_4_3",
            "landscape_16_9",
          ],
          default: "landscape_4_3",
          description: "image size/aspect",
        },
      },
    },
    output: { type: "string", format: "file-path", description: "image path" },
  },
  // entry point invoked by the cli runner with already-parsed options.
  async run(options) {
    const { prompt, size } = options as { prompt: string; size?: string };
    // NOTE(review): `size` is forwarded as the `model` option of
    // generateWithFal, but the schema values ("landscape_4_3", ...) are
    // aspect presets, not fal model ids — confirm the intended mapping
    // against lib/fal's generateImage.
    return generateWithFal(prompt, { model: size });
  },
};
44
+
45
/** result of a single image generation. */
export interface ImageGenerationResult {
  // direct, provider-hosted url of the generated image.
  imageUrl: string;
  // s3/r2 url when the caller requested upload; otherwise undefined.
  uploaded?: string;
}
49
+
50
+ export async function generateWithFal(
51
+ prompt: string,
52
+ options: { model?: string; upload?: boolean } = {},
53
+ ): Promise<ImageGenerationResult> {
54
+ console.log("[service/image] generating with fal");
55
+
56
+ const result = await generateImage({ prompt, model: options.model });
57
+
58
+ const imageUrl = result.data?.images?.[0]?.url;
59
+ if (!imageUrl) {
60
+ throw new Error("no image url in result");
61
+ }
62
+
63
+ let uploaded: string | undefined;
64
+ if (options.upload) {
65
+ const timestamp = Date.now();
66
+ const objectKey = `images/fal/${timestamp}.png`;
67
+ uploaded = await uploadFromUrl(imageUrl, objectKey);
68
+ console.log(`[service/image] uploaded to ${uploaded}`);
69
+ }
70
+
71
+ return { imageUrl, uploaded };
72
+ }
73
+
74
+ export async function generateWithSoul(
75
+ prompt: string,
76
+ options: { styleId?: string; upload?: boolean } = {},
77
+ ): Promise<ImageGenerationResult> {
78
+ console.log("[service/image] generating with higgsfield soul");
79
+
80
+ const result = await generateSoul({
81
+ prompt,
82
+ styleId: options.styleId,
83
+ });
84
+
85
+ const imageUrl = result.jobs?.[0]?.results?.raw?.url;
86
+ if (!imageUrl) {
87
+ throw new Error("no image url in result");
88
+ }
89
+
90
+ let uploaded: string | undefined;
91
+ if (options.upload) {
92
+ const timestamp = Date.now();
93
+ const objectKey = `images/soul/${timestamp}.png`;
94
+ uploaded = await uploadFromUrl(imageUrl, objectKey);
95
+ console.log(`[service/image] uploaded to ${uploaded}`);
96
+ }
97
+
98
+ return { imageUrl, uploaded };
99
+ }
100
+
101
// cli
// when executed directly (bun sets import.meta.main), hand argument
// parsing and dispatch to the shared cli runner.
if (import.meta.main) {
  const { runCli } = await import("../../cli/runner");
  runCli(meta);
}
@@ -0,0 +1,136 @@
1
+ ---
2
+ name: video-lipsync
3
+ description: sync video with audio using wav2lip ai model or simple audio overlay. use when creating talking videos, matching lip movements to audio, or combining video with voiceovers.
4
+ allowed-tools: Read, Bash
5
+ ---
6
+
7
+ # video lipsync
8
+
9
+ sync video with audio using ai-powered lipsync or simple overlay.
10
+
11
+ ## methods
12
+
13
+ ### wav2lip (ai-powered)
14
+ - uses replicate wav2lip model
15
+ - matches lip movements to audio
16
+ - works with url inputs
17
+ - processing time: 30-60 seconds
18
+ - best for: talking character videos
19
+
20
+ ### overlay (simple)
21
+ - adds audio track to video using ffmpeg
22
+ - no lip movement matching
23
+ - works with local files
24
+ - processing time: instant
25
+ - best for: background music, voiceovers
26
+
27
+ ## usage
28
+
29
+ ### sync with method selection
30
+ ```bash
31
+ bun run service/sync.ts sync <videoUrl> <audioUrl> [method] [output]
32
+ ```
33
+
34
+ **parameters:**
35
+ - `videoUrl` (required): video file path or url
36
+ - `audioUrl` (required): audio file path or url
37
+ - `method` (optional): "wav2lip" or "overlay" (default: overlay)
38
+ - `output` (optional): output path (default: output-synced.mp4)
39
+
40
+ **example:**
41
+ ```bash
42
+ bun run service/sync.ts sync video.mp4 audio.mp3 overlay output.mp4
43
+ ```
44
+
45
+ ### wav2lip direct
46
+ ```bash
47
+ bun run service/sync.ts wav2lip <videoUrl> <audioUrl>
48
+ ```
49
+
50
+ **example:**
51
+ ```bash
52
+ bun run service/sync.ts wav2lip https://example.com/character.mp4 https://example.com/voice.mp3
53
+ ```
54
+
55
+ ### overlay direct
56
+ ```bash
57
+ bun run service/sync.ts overlay <videoPath> <audioPath> [output]
58
+ ```
59
+
60
+ **example:**
61
+ ```bash
62
+ bun run service/sync.ts overlay character.mp4 narration.mp3 final.mp4
63
+ ```
64
+
65
+ ## as library
66
+
67
+ ```typescript
68
+ import { lipsync, lipsyncWav2Lip, lipsyncOverlay } from "./service/sync"
69
+
70
+ // flexible sync
71
+ const result = await lipsync({
72
+ videoUrl: "video.mp4",
73
+ audioUrl: "audio.mp3",
74
+ method: "wav2lip",
75
+ output: "synced.mp4"
76
+ })
77
+
78
+ // wav2lip specific
79
+ const lipsynced = await lipsyncWav2Lip({
80
+ videoUrl: "https://example.com/video.mp4",
81
+ audioUrl: "https://example.com/audio.mp3"
82
+ })
83
+
84
+ // overlay specific
85
+ const overlayed = await lipsyncOverlay(
86
+ "video.mp4",
87
+ "audio.mp3",
88
+ "output.mp4"
89
+ )
90
+ ```
91
+
92
+ ## when to use each method
93
+
94
+ ### use wav2lip when:
95
+ - creating talking character videos
96
+ - lip movements must match speech
97
+ - have urls for video and audio
98
+ - quality is more important than speed
99
+
100
+ ### use overlay when:
101
+ - adding background music
102
+ - audio doesn't require lip sync
103
+ - working with local files
104
+ - need instant processing
105
+
106
+ ## typical workflow
107
+
108
+ 1. generate character image (image service)
109
+ 2. animate character (video service)
110
+ 3. generate voiceover (voice service)
111
+ 4. sync with wav2lip (this service)
112
+ 5. add captions (captions service)
113
+
114
+ ## tips
115
+
116
+ **for wav2lip:**
117
+ - use close-up character shots for best results
118
+ - ensure audio is clear and well-paced
119
+ - video should show face clearly
120
+ - works best with 5-10 second clips
121
+
122
+ **for overlay:**
123
+ - match audio length to video length
124
+ - ffmpeg will loop short audio or trim long audio
125
+ - preserves original video quality
126
+
127
+ ## environment variables
128
+
129
+ required (for wav2lip):
130
+ - `REPLICATE_API_TOKEN` - for wav2lip model
131
+
132
+ the overlay method needs no api keys, but ffmpeg must be installed locally
133
+
134
+ ## error handling
135
+
136
+ if the "synclabs" method is selected, the service logs a warning and falls back to the overlay method (synclabs is not yet implemented). wav2lip errors are not caught by the service and are raised to the caller.
@@ -0,0 +1,145 @@
1
+ #!/usr/bin/env bun
2
+
3
+ /**
4
+ * lipsync service - combines video with audio using various methods
5
+ * supports wav2lip, synclabs, and simple audio overlay
6
+ */
7
+
8
+ import type { ActionMeta } from "../../cli/types";
9
+ import { addAudio } from "../../lib/ffmpeg";
10
+ import { runModel } from "../../lib/replicate";
11
+
12
/**
 * action metadata consumed by the cli runner (see ../../cli/types).
 * declares the video+audio input schema and delegates to lipsync().
 */
export const meta: ActionMeta = {
  name: "sync",
  type: "action",
  description: "sync audio to video (lipsync)",
  inputType: "video+audio",
  outputType: "video",
  schema: {
    input: {
      type: "object",
      required: ["video", "audio"],
      properties: {
        video: {
          type: "string",
          format: "file-path",
          description: "input video file or url",
        },
        audio: {
          type: "string",
          format: "file-path",
          description: "audio file or url to sync",
        },
        // NOTE(review): enum omits "synclabs" even though LipsyncOptions
        // declares it — intentional while synclabs is unimplemented?
        method: {
          type: "string",
          enum: ["wav2lip", "overlay"],
          default: "overlay",
          description: "sync method (wav2lip requires urls)",
        },
        output: {
          type: "string",
          format: "file-path",
          description: "output video path",
        },
      },
    },
    output: { type: "string", format: "file-path", description: "video path" },
  },
  // entry point invoked by the cli runner with already-parsed options.
  async run(options) {
    const { video, audio, method, output } = options as {
      video: string;
      audio: string;
      method?: "wav2lip" | "overlay";
      output?: string;
    };
    return lipsync({ videoUrl: video, audioUrl: audio, method, output });
  },
};
58
+
59
// types
/** options accepted by lipsync(). */
export interface LipsyncOptions {
  // input video file path or url.
  videoUrl: string;
  // audio file path or url to place on the video.
  audioUrl: string;
  // sync strategy; "synclabs" currently logs a warning and falls back to "overlay".
  method?: "wav2lip" | "synclabs" | "overlay";
  // output video path, used by the overlay path (default "output-synced.mp4").
  output?: string;
}

/** inputs for the replicate wav2lip model (urls required per the schema note). */
export interface Wav2LipOptions {
  videoUrl: string;
  audioUrl: string;
}
71
+
72
+ // core functions
73
+ export async function lipsync(options: LipsyncOptions) {
74
+ const { videoUrl, audioUrl, method = "overlay", output } = options;
75
+
76
+ if (!videoUrl || !audioUrl) {
77
+ throw new Error("videoUrl and audioUrl are required");
78
+ }
79
+
80
+ console.log(`[sync] syncing video with audio using ${method}...`);
81
+
82
+ switch (method) {
83
+ case "wav2lip":
84
+ return await lipsyncWav2Lip({ videoUrl, audioUrl });
85
+
86
+ case "synclabs":
87
+ console.log(
88
+ `[sync] synclabs not yet implemented, falling back to overlay`,
89
+ );
90
+ return await lipsyncOverlay(videoUrl, audioUrl, output);
91
+
92
+ case "overlay":
93
+ return await lipsyncOverlay(videoUrl, audioUrl, output);
94
+
95
+ default:
96
+ throw new Error(`unknown lipsync method: ${method}`);
97
+ }
98
+ }
99
+
100
+ export async function lipsyncWav2Lip(options: Wav2LipOptions) {
101
+ const { videoUrl, audioUrl } = options;
102
+
103
+ console.log(`[sync] using wav2lip model...`);
104
+
105
+ try {
106
+ const output = await runModel("devxpy/cog-wav2lip", {
107
+ face: videoUrl,
108
+ audio: audioUrl,
109
+ });
110
+
111
+ console.log(`[sync] wav2lip completed`);
112
+ return output;
113
+ } catch (error) {
114
+ console.error(`[sync] wav2lip error:`, error);
115
+ throw error;
116
+ }
117
+ }
118
+
119
+ export async function lipsyncOverlay(
120
+ videoPath: string,
121
+ audioPath: string,
122
+ output: string = "output-synced.mp4",
123
+ ) {
124
+ console.log(`[sync] overlaying audio on video...`);
125
+
126
+ try {
127
+ const result = await addAudio({
128
+ videoPath,
129
+ audioPath,
130
+ output,
131
+ });
132
+
133
+ console.log(`[sync] overlay completed`);
134
+ return result;
135
+ } catch (error) {
136
+ console.error(`[sync] overlay error:`, error);
137
+ throw error;
138
+ }
139
+ }
140
+
141
// cli
// when executed directly (bun sets import.meta.main), hand argument
// parsing and dispatch to the shared cli runner.
if (import.meta.main) {
  const { runCli } = await import("../../cli/runner");
  runCli(meta);
}
@@ -0,0 +1,179 @@
1
+ ---
2
+ name: audio-transcription
3
+ description: transcribe audio to text or subtitles using groq whisper or fireworks with srt/vtt support. use when converting speech to text, generating subtitles, or need word-level timestamps for captions.
4
+ allowed-tools: Read, Bash
5
+ ---
6
+
7
+ # audio transcription
8
+
9
+ convert audio to text or subtitle files using ai transcription.
10
+
11
+ ## providers
12
+
13
+ ### groq (ultra-fast)
14
+ - uses whisper-large-v3
15
+ - fastest transcription (~5-10 seconds)
16
+ - plain text output
17
+ - sentence-level timing
18
+ - best for: quick transcripts, text extraction
19
+
20
+ ### fireworks (word-level)
21
+ - uses whisper-v3
22
+ - word-level timestamps
23
+ - outputs srt or vtt format
24
+ - precise subtitle timing
25
+ - best for: captions, subtitles, timed transcripts
26
+
27
+ ## usage
28
+
29
+ ### basic transcription
30
+ ```bash
31
+ bun run service/transcribe.ts <audioUrl> <provider> [outputPath]
32
+ ```
33
+
34
+ **example:**
35
+ ```bash
36
+ bun run service/transcribe.ts media/audio.mp3 groq
37
+ bun run service/transcribe.ts media/audio.mp3 fireworks output.srt
38
+ ```
39
+
40
+ ### with output format
41
+ ```bash
42
+ bun run lib/fireworks.ts <audioPath> <outputPath>
43
+ ```
44
+
45
+ **example:**
46
+ ```bash
47
+ bun run lib/fireworks.ts media/audio.mp3 output.srt
48
+ ```
49
+
50
+ ## as library
51
+
52
+ ```typescript
53
+ import { transcribe } from "./service/transcribe"
54
+
55
+ // groq transcription
56
+ const groqResult = await transcribe({
57
+ audioUrl: "media/audio.mp3",
58
+ provider: "groq",
59
+ outputFormat: "text"
60
+ })
61
+ console.log(groqResult.text)
62
+
63
+ // fireworks with srt
64
+ const fireworksResult = await transcribe({
65
+ audioUrl: "media/audio.mp3",
66
+ provider: "fireworks",
67
+ outputFormat: "srt",
68
+ outputPath: "subtitles.srt"
69
+ })
70
+ console.log(fireworksResult.text)
71
+ console.log(fireworksResult.outputPath) // subtitles.srt
72
+ ```
73
+
74
+ ## output formats
75
+
76
+ ### text (groq default)
77
+ ```
78
+ This is the transcribed text from the audio file.
79
+ All words in plain text format.
80
+ ```
81
+
82
+ ### srt (subtitle format)
83
+ ```
84
+ 1
85
+ 00:00:00,000 --> 00:00:02,500
86
+ This is the first subtitle
87
+
88
+ 2
89
+ 00:00:02,500 --> 00:00:05,000
90
+ This is the second subtitle
91
+ ```
92
+
93
+ ### vtt (web video text tracks)
94
+ ```
95
+ WEBVTT
96
+
97
+ 00:00:00.000 --> 00:00:02.500
98
+ This is the first subtitle
99
+
100
+ 00:00:02.500 --> 00:00:05.000
101
+ This is the second subtitle
102
+ ```
103
+
104
+ ## when to use
105
+
106
+ use this skill when:
107
+ - converting speech to text
108
+ - generating subtitles for videos
109
+ - creating accessible content
110
+ - need word-level timing for captions
111
+ - extracting dialogue from media
112
+ - preparing transcripts for analysis
113
+
114
+ ## provider comparison
115
+
116
+ | feature | groq | fireworks |
117
+ |---------|------|-----------|
118
+ | speed | ultra-fast (5-10s) | moderate (15-30s) |
119
+ | output | plain text | srt/vtt with timestamps |
120
+ | timing | sentence-level | word-level |
121
+ | use case | quick transcripts | precise subtitles |
122
+
123
+ ## typical workflows
124
+
125
+ ### for captions
126
+ 1. record or generate audio (voice service)
127
+ 2. transcribe with fireworks (this service)
128
+ 3. add captions to video (captions service)
129
+
130
+ ### for transcripts
131
+ 1. extract audio from video
132
+ 2. transcribe with groq (this service)
133
+ 3. use text for analysis or documentation
134
+
135
+ ## tips
136
+
137
+ **provider selection:**
138
+ - use **groq** when you just need the text fast
139
+ - use **fireworks** when you need subtitle files
140
+ - use **fireworks** for captions on social media videos
141
+
142
+ **audio quality:**
143
+ - clear audio transcribes more accurately
144
+ - reduce background noise when possible
145
+ - supports mp3, wav, m4a, and most audio formats
146
+
147
+ **timing accuracy:**
148
+ - fireworks provides word-level timestamps
149
+ - perfect for lip-sync verification
150
+ - great for precise subtitle placement
151
+
152
+ ## integration with other services
153
+
154
+ perfect companion for:
155
+ - **captions service** - auto-generate video subtitles
156
+ - **voice service** - transcribe generated speech
157
+ - **sync service** - verify audio timing
158
+
159
+ ## environment variables
160
+
161
+ required:
162
+ - `GROQ_API_KEY` - for groq provider
163
+ - `FIREWORKS_API_KEY` - for fireworks provider
164
+
165
+ ## processing time
166
+
167
+ - **groq**: 5-10 seconds (any audio length)
168
+ - **fireworks**: 15-30 seconds (depending on audio length)
169
+
170
+ ## supported formats
171
+
172
+ input audio:
173
+ - mp3, wav, m4a, ogg, flac
174
+ - video files (extracts audio automatically)
175
+
176
+ output formats:
177
+ - text (plain text)
178
+ - srt (subtitles)
179
+ - vtt (web video text tracks)