@framers/agentos-skills-registry 0.11.0 → 0.12.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@framers/agentos-skills-registry",
- "version": "0.11.0",
+ "version": "0.12.0",
  "files": [
  "dist",
  "registry",
@@ -1,12 +1,12 @@
  ---
  name: image-gen
- version: '1.0.0'
- description: Generate images from text prompts using AI image generation APIs like DALL-E, Stable Diffusion, or Midjourney.
+ version: '2.0.0'
+ description: Generate, edit, upscale, and variate images using the AgentOS multi-provider image pipeline with automatic fallback.
  author: Wunderland
  namespace: wunderland
  category: creative
- tags: [image-generation, ai-art, dall-e, stable-diffusion, creative, visual]
- requires_secrets: [openai.api_key]
+ tags: [image-generation, ai-art, dall-e, stable-diffusion, flux, replicate, stability, fal, creative, visual]
+ requires_secrets: []
  requires_tools: [generate_image]
  metadata:
  agentos:
@@ -15,36 +15,89 @@ metadata:
  homepage: https://platform.openai.com/docs/guides/images
  ---

- # AI Image Generation
+ # AI Image Generation Workflow

- Use the `generate_image` tool to create images from text descriptions. Two providers are supported:
- - **DALL-E 3** (OpenAI) — requires `OPENAI_API_KEY`
- - **Stability AI** (SDXL) — requires `STABILITY_API_KEY`
+ Use this skill when the user wants to create, edit, upscale, or generate variations of images. AgentOS provides four high-level APIs that route to any configured provider, with automatic fallback when multiple providers have credentials set.
+
+ ## The Four High-Level APIs
+
+ 1. **`generateImage()`** — Create new images from text prompts.
+ 2. **`editImage()`** — Transform existing images via img2img, inpainting, or outpainting.
+ 3. **`upscaleImage()`** — Increase resolution (2x or 4x super-resolution).
+ 4. **`variateImage()`** — Generate visual variations of an existing image.

  If the `generate_image` tool is not loaded, enable it with `extensions_enable image-generation`.

- Craft detailed, effective prompts that translate the user's creative vision into high-quality generated images.
+ ## Provider Selection Guide
+
+ Choose the provider based on the user's priority:
+
+ | Priority | Provider | Env Var | Best For |
+ |----------|----------|---------|----------|
+ | Quality | **OpenAI** (GPT-Image-1, DALL-E 3) | `OPENAI_API_KEY` | Highest fidelity, prompt adherence, text-in-image |
+ | Control | **Stability AI** (SDXL, SD3, Ultra) | `STABILITY_API_KEY` | Negative prompts, style presets, cfg/steps tuning |
+ | Speed | **BFL / Flux** (Flux Pro 1.1) | `BFL_API_KEY` | Fast generation with strong quality |
+ | Speed | **Fal** (Flux Dev) | `FAL_API_KEY` | Serverless Flux inference, low latency |
+ | Variety | **Replicate** (Flux, SDXL, community models) | `REPLICATE_API_TOKEN` | Access to thousands of community models |
+ | Cost | **OpenRouter** (routes to cheapest) | `OPENROUTER_API_KEY` | Provider-agnostic routing, best price |
+ | Privacy | **Local SD** (A1111 / ComfyUI) | `STABLE_DIFFUSION_LOCAL_BASE_URL` | Fully offline, no data leaves the machine |
+
+ When multiple providers are configured, AgentOS wraps them in a **FallbackImageProxy** — if the primary provider fails (rate limit, outage, etc.), the request automatically retries on the next available provider in priority order.
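As a sketch of the fallback behavior described above (not the actual `FallbackImageProxy` implementation — its real signatures are not shown in this document), a provider chain can be modeled as a list of async functions tried in priority order:

```typescript
// Illustrative sketch only; the real AgentOS FallbackImageProxy API may differ.
// A provider is modeled as a function that returns an image URL or throws
// (rate limit, outage, invalid request, ...).
type ImageProvider = (prompt: string) => Promise<string>;

// Try each provider in priority order; return the first successful result.
async function generateWithFallback(
  providers: ImageProvider[],
  prompt: string,
): Promise<string> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      return await provider(prompt);
    } catch (err) {
      lastError = err; // primary failed — fall through to the next provider
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```

Note that, as the Constraints section says, this chain only activates on failure; it never merges results from multiple providers.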
+
+ ## Operation Decision Tree
+
+ Use this to pick the right API for the user's request:
+
+ - **"Generate / create / draw / imagine"** -> `generateImage()`
+ - **"Edit / change / modify / transform"** -> `editImage()` with `mode: 'img2img'`
+ - **"Remove / fill in / fix this area"** -> `editImage()` with `mode: 'inpaint'` + mask
+ - **"Extend / expand the borders"** -> `editImage()` with `mode: 'outpaint'`
+ - **"Make it higher resolution / sharper"** -> `upscaleImage()` with `scale: 2` or `4`
+ - **"Show me variations / alternatives"** -> `variateImage()` with `n: 3-4`
+
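The decision tree above can be sketched as a small keyword router. The operation names mirror the four high-level APIs; the keyword lists and the function itself are illustrative, not part of the documented AgentOS surface:

```typescript
// Route a user request to one of the four image operations.
// Order matters: more specific intents (upscale, variate) are checked
// before the generic "create"/"generate" verbs.
type ImageOp = "generate" | "edit" | "upscale" | "variate";

const OP_KEYWORDS: [ImageOp, string[]][] = [
  ["upscale", ["upscale", "higher resolution", "sharper"]],
  ["variate", ["variation", "alternatives"]],
  ["edit", ["edit", "change", "modify", "transform", "remove", "extend", "expand"]],
  ["generate", ["generate", "create", "draw", "imagine"]],
];

function pickOperation(request: string): ImageOp {
  const lower = request.toLowerCase();
  for (const [op, keywords] of OP_KEYWORDS) {
    if (keywords.some((k) => lower.includes(k))) return op;
  }
  return "generate"; // default: treat unrecognized phrasing as a new generation
}
```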
+ ## Prompt Engineering Tips
+
+ A strong image prompt has five components:
+
+ 1. **Subject** — What is in the image. Be specific: "a red panda sitting on a mossy branch," not "an animal."
+ 2. **Style** — Artistic approach: photorealistic, watercolor, pixel art, oil painting, vector illustration, cinematic, anime.
+ 3. **Composition** — Camera angle and framing: close-up portrait, wide establishing shot, overhead flat lay, isometric.
+ 4. **Lighting and Color** — Mood through light: golden hour, dramatic side-lighting, neon glow, muted earth tones, high contrast.
+ 5. **Atmosphere** — Emotional tone: serene, ominous, whimsical, nostalgic, futuristic.
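The five components can be assembled front-to-back, subject first, as a tiny prompt builder. The field names here are illustrative assumptions, not an AgentOS API:

```typescript
// Build a comma-separated prompt from the five components, subject first —
// models weight earlier tokens more heavily, so the subject is front-loaded.
interface PromptParts {
  subject: string;        // required, most important
  style?: string;
  composition?: string;
  lighting?: string;      // lighting and color
  atmosphere?: string;
}

function buildPrompt(p: PromptParts): string {
  return [p.subject, p.style, p.composition, p.lighting, p.atmosphere]
    .filter((part): part is string => Boolean(part))
    .join(", ");
}
```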

- When generating images, help the user refine their prompt for best results. A good image prompt includes: subject description, style (photorealistic, illustration, watercolor, etc.), composition (close-up, wide shot, overhead), lighting (natural, dramatic, soft), color palette, and mood/atmosphere. Offer prompt suggestions when the user's description is vague or underspecified.
+ Additional tips:
+ - Front-load the most important elements. Models weight earlier tokens more heavily.
+ - Use negative prompts (Stability, Local SD) to exclude unwanted elements: "text, watermark, blurry."
+ - For text-in-image, OpenAI GPT-Image-1 is the most reliable. Other models struggle with legible text.
+ - Request `quality: 'hd'` for DALL-E 3 when detail matters (doubles cost).
+ - For consistent characters across multiple images, describe the character in detail each time, or use img2img with a reference.

- Support different image sizes and aspect ratios based on the API capabilities (1024x1024, 1792x1024, 1024x1792 for DALL-E 3). For iterative refinement, maintain context from previous generations so the user can say "make it more vibrant" or "change the background to a beach." Save generated images to the filesystem when the user requests it, with descriptive filenames.
+ ## Sizes and Aspect Ratios

- When the user requests variations or edits of existing images, use the appropriate API endpoints (variations, inpainting) when available. For batch generation, create multiple variations with slightly different prompts to give the user options. Always inform the user of the model and settings used for each generation.
+ | Provider | Supported Sizes | Aspect Ratio Support |
+ |----------|-----------------|----------------------|
+ | OpenAI | 1024x1024, 1792x1024, 1024x1792 | Via size selection |
+ | Stability | Flexible | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, etc. |
+ | Replicate/Flux | Flexible | `aspectRatio` parameter |
+ | Local SD | Any (multiples of 64) | Via `width`/`height` |
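Since Local SD backends accept only dimensions that are multiples of 64, a requested size has to be snapped to that grid before the call. A minimal helper for that (an illustrative utility, not part of the AgentOS API):

```typescript
// Round a requested dimension down to the nearest multiple of 64, the grid
// A1111/ComfyUI backends expect, with a floor of one grid step.
function snapToSdGrid(pixels: number, step = 64): number {
  return Math.max(step, Math.floor(pixels / step) * step);
}
```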

  ## Examples

- - "Generate an image of a cozy cabin in the mountains at sunset, watercolor style"
- - "Create a professional logo for a coffee shop called 'Bean There'"
- - "Make the previous image more dramatic with storm clouds"
- - "Generate 3 variations of a cyberpunk cityscape at night"
- - "Create a 16:9 landscape of a serene Japanese garden in spring"
+ - "Generate a photorealistic image of a cozy cabin in the mountains at sunset."
+ - "Create a professional logo for a coffee shop called 'Bean There' — vector illustration style, clean lines."
+ - "Edit this photo: make the sky more dramatic with storm clouds." (img2img)
+ - "Remove the person from the background of this product photo." (inpaint + mask)
+ - "Upscale this thumbnail to 4x resolution for print."
+ - "Show me 3 variations of this hero image with different color palettes."
+ - "Generate a 16:9 cinematic landscape of a neon-lit Tokyo street at night in the rain."

  ## Constraints

  - Image generation costs API credits per request; inform the user of approximate costs when possible.
- - Content policy restrictions apply: no realistic faces of real people, no violent/explicit content.
- - DALL-E 3 does not support exact image editing or inpainting; describe the full desired output.
+ - Content policy restrictions apply per provider: no realistic faces of real people, no violent/explicit content.
+ - DALL-E 3 does not support native inpainting; use GPT-Image-1 or Stability for mask-based editing.
+ - Upscaling is not supported by OpenAI or OpenRouter — use Stability, Replicate, or Local SD.
  - Generated images may not perfectly match the prompt; iterative refinement is expected.
- - Maximum prompt length varies by model (DALL-E 3: 4,000 characters).
- - Image quality and style depend on the model version and generation parameters.
- - Generated images should not be represented as photographs or real events.
+ - Maximum prompt length varies by model (DALL-E 3: 4,000 chars; Stability: 2,000 chars).
+ - Local SD requires a running A1111 or ComfyUI instance with the API enabled.
+ - The fallback chain only activates when the primary provider fails; it does not merge results from multiple providers.