@ai-agent-tools/picgen 0.1.0-alpha.0

@@ -0,0 +1,105 @@
1
+ # PicGen Alpha Proposal
2
+
3
+ PicGen is a lightweight image generation connector for AI agents. It lets users generate images from the current agent context through their own providers, API keys, and quota.
4
+
5
+ ## Scope
6
+
7
+ Alpha focuses only on image generation:
8
+
9
+ - OpenAI-compatible `/v1/images/generations`
10
+ - Gemini image API
11
+ - Gemini reference-image generation
12
+ - local CLI
13
+ - local Codex skill
14
+ - provider lifecycle management
15
+ - dry-run planning before paid generation
16
+
17
+ Out of scope for Alpha:
18
+
19
+ - video generation
20
+ - audio generation
21
+ - GUI configuration
22
+ - provider marketplace
23
+ - full Codex plugin packaging
24
+ - real image editing and variations
25
+
26
+ ## Configuration Model
27
+
28
+ PicGen uses five layers:
29
+
30
+ - `provider`: where requests go, including official or third-party channels
31
+ - `mode`: model preference such as fast, balanced, or premium
32
+ - `preset`: usage defaults such as poster, product shot, social cover
33
+ - `routing`: default provider, fallback providers, and default mode
34
+ - `capability`: whether a provider supports text-to-image, reference-image, or future workflows
35
+
36
+ Users should not need to provide model, resolution, aspect ratio, or quality on every request. Setup and presets hold those choices.
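One way to picture these layers is as a typed config object. The sketch below is illustrative only: the field names (`api_key_env`, `fallback_providers`, `aspect_ratio`, and so on), the protocol labels, the provider name `openai_official`, and the model names are assumptions, not PicGen's actual schema. Capability is modeled here as a per-provider list.

```typescript
// Hypothetical config shape for illustration; not PicGen's real schema.
type Capability = "text-to-image" | "reference-image";

interface ProviderConfig {
  protocol: "openai-compatible" | "gemini"; // assumed protocol labels
  base_url: string; // host-only, without /v1 or /v1beta
  api_key_env: string; // assumed: name of the env var holding the key
  enabled: boolean;
  test_model?: string;
  capabilities?: Capability[];
}

interface PicGenConfig {
  providers: { [name: string]: ProviderConfig };
  modes: { [name: string]: { model: string } }; // assumed: a mode maps to a model
  presets: { [name: string]: { aspect_ratio: string; count: number } };
  routing: {
    default_provider: string;
    fallback_providers: string[];
    default_mode: string;
  };
}

const exampleConfig: PicGenConfig = {
  providers: {
    gemini_official: {
      protocol: "gemini",
      base_url: "https://generativelanguage.googleapis.com",
      api_key_env: "GEMINI_API_KEY",
      enabled: true,
      capabilities: ["text-to-image", "reference-image"],
    },
    openai_official: {
      protocol: "openai-compatible",
      base_url: "https://api.openai.com",
      api_key_env: "OPENAI_API_KEY",
      enabled: true,
      capabilities: ["text-to-image"],
    },
  },
  modes: {
    fast: { model: "example-fast-model" }, // placeholder model names
    balanced: { model: "example-balanced-model" },
    premium: { model: "example-premium-model" },
  },
  presets: {
    poster: { aspect_ratio: "3:4", count: 1 },
    "social-cover": { aspect_ratio: "16:9", count: 1 },
  },
  routing: {
    default_provider: "gemini_official",
    fallback_providers: ["openai_official"],
    default_mode: "balanced",
  },
};

// Report routing entries that point at providers or modes that do not exist.
function findRoutingProblems(config: PicGenConfig): string[] {
  const has = (obj: object, key: string) =>
    Object.prototype.hasOwnProperty.call(obj, key);
  const problems: string[] = [];
  const routing = config.routing;
  for (const name of [routing.default_provider, ...routing.fallback_providers]) {
    if (!has(config.providers, name)) problems.push(`unknown provider: ${name}`);
  }
  if (!has(config.modes, routing.default_mode)) {
    problems.push(`unknown mode: ${routing.default_mode}`);
  }
  return problems;
}
```

With a shape like this, a request only needs a prompt; everything else resolves from `routing`, the chosen mode, and the chosen preset.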
37
+
38
+ ## Provider Lifecycle
39
+
40
+ Providers can be managed at any time after initial setup, and each step can be repeated:
41
+
42
+ ```text
43
+ add -> test -> enable/disable -> edit -> remove
44
+ ```
45
+
46
+ Disabled providers remain in config but are skipped by automatic routing.
47
+
48
+ `picgen setup` is a repeatable guided entry point. It should help users quick-add common providers, choose the default provider, choose a default generation preference, test providers, and add advanced custom providers without requiring them to understand resolution, aspect ratio, quality, or protocol details.
49
+
50
+ Provider `base_url` values are host-only. Users should not include `/v1` or `/v1beta`; protocol adapters append those paths internally.
51
+ Providers may optionally define `test_model` for health checks. This avoids hard-coding short-lived model names while still allowing lightweight connectivity tests.
52
+ Providers define `capabilities` so routing can skip unsupported providers. Old configs infer capabilities from protocol defaults.
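The host-only rule can be sketched as follows. This is a minimal illustration, not PicGen's adapter code: the function names and protocol labels are hypothetical, and the Gemini path follows the `/v1beta/models/<model>:generateContent` convention of the public Gemini REST API, which should be checked against the provider's documentation.

```typescript
type Protocol = "openai-compatible" | "gemini"; // assumed protocol labels

// Accept only scheme + host (+ optional port); reject any path, query, or fragment.
function assertHostOnly(baseUrl: string): string {
  const url = new URL(baseUrl);
  const hasPath = url.pathname !== "/" && url.pathname !== "";
  if (hasPath || url.search !== "" || url.hash !== "") {
    throw new Error(
      `Provider host URL must be host-only, got "${baseUrl}". ` +
        "Do not include /v1, /v1beta, or endpoint paths."
    );
  }
  return url.origin; // origin never carries a trailing slash
}

// Append the protocol-specific path to a host-only base URL.
function buildEndpoint(protocol: Protocol, baseUrl: string, model: string): string {
  const origin = assertHostOnly(baseUrl);
  if (protocol === "gemini") {
    return `${origin}/v1beta/models/${encodeURIComponent(model)}:generateContent`;
  }
  return `${origin}/v1/images/generations`;
}
```

Rejecting a pasted path early, rather than silently trimming it, keeps a mistyped `base_url` from producing a doubled `/v1/v1/...` request later.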
53
+
54
+ ## Agent Invocation Policy
55
+
56
+ PicGen should be visible to agents, but should not silently spend quota.
57
+
58
+ - Explicit image generation request: call PicGen directly.
59
+ - Strong visual-output intent: ask for confirmation first.
60
+ - Weak visual discussion: suggest PicGen, do not call.
61
+
62
+ Use `picgen create --dry-run` to show the planned provider, model, preset, aspect ratio, quantity, and prompt before generation.
63
+ Manual CLI generation asks for confirmation before contacting a provider. `--yes` skips that confirmation for explicit user-driven calls.
64
+
65
+ Reference images are passed with repeated `--reference <path>` flags. Alpha supports reference images through Gemini `generateContent` by sending local files as inline image parts. OpenAI-compatible reference-image editing should be implemented as a separate adapter later; the `/v1/images/generations` adapter must not silently ignore reference images.
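As a rough sketch of what "inline image parts" could look like, the function below builds a Gemini `generateContent` request body from a prompt and already-encoded reference images. The type and function names are hypothetical, the camelCase field names (`inlineData`, `mimeType`, `responseModalities`) follow the Gemini REST JSON convention as I understand it, and reading and base64-encoding the local files is assumed to happen elsewhere.

```typescript
// A reference image that has already been read from disk and base64-encoded.
interface ReferenceImage {
  mimeType: string;
  base64: string;
}

type GeminiRequestPart =
  | { text: string }
  | { inlineData: { mimeType: string; data: string } };

interface GeminiRequest {
  contents: { role: "user"; parts: GeminiRequestPart[] }[];
  generationConfig: { responseModalities: string[] };
}

// Build a request body with one inline part per reference image plus the prompt.
function buildGeminiRequest(prompt: string, references: ReferenceImage[]): GeminiRequest {
  const parts: GeminiRequestPart[] = [];
  for (const ref of references) {
    parts.push({ inlineData: { mimeType: ref.mimeType, data: ref.base64 } });
  }
  // Placing the prompt after the images is a choice of this sketch.
  parts.push({ text: prompt });
  return {
    contents: [{ role: "user", parts }],
    // Ask for image-only output to keep the response compact.
    generationConfig: { responseModalities: ["IMAGE"] },
  };
}
```

With no references the same function yields a plain text-to-image request, so one adapter path can serve both capabilities.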
66
+
67
+ ## Alpha Commands
68
+
69
+ ```bash
70
+ picgen setup
71
+ picgen doctor --json
72
+ picgen create --dry-run "A key visual for a product launch event"
73
+ picgen create --yes "A key visual for a product launch event"
74
+ picgen create --dry-run --provider gemini_official --reference ./reference.png "Generate a poster based on the reference image"
75
+ picgen provider list
76
+ picgen provider add
77
+ picgen provider test <name>
78
+ picgen provider prefer <name>
79
+ picgen provider enable <name>
80
+ picgen provider disable <name>
81
+ picgen provider remove <name>
82
+ picgen mode prefer <name>
83
+ picgen preset prefer <name>
84
+ ```
85
+
86
+ ## Current Status
87
+
88
+ The repository currently implements:
89
+
90
+ - TypeScript CLI skeleton
91
+ - default config and schema validation
92
+ - interactive provider add/edit flow
93
+ - provider enable/disable/remove/list
94
+ - provider test network checks
95
+ - provider/mode/preset preference commands
96
+ - doctor JSON output
97
+ - dry-run generation planning
98
+ - local output asset and metadata writing
99
+ - OpenAI-compatible image generation call
100
+ - Gemini generateContent image generation call
101
+ - Gemini reference-image generation call
102
+ - provider response redaction for generated image data and Gemini thought signatures
103
+ - routing tests
104
+
105
+ Keychain-backed API key storage and full plugin packaging are not implemented yet.
@@ -0,0 +1,159 @@
1
+ # PicGen Alpha Release Checklist
2
+
3
+ This checklist is for the first internal or friend-and-colleague trial of PicGen.
4
+
5
+ ## Install
6
+
7
+ ```bash
8
+ npm install -g @ai-agent-tools/picgen
9
+ picgen --help
10
+ picgen quickstart
11
+ ```
12
+
13
+ Node.js 20 or newer is required.
14
+
15
+ ## Agent Prompt
16
+
17
+ Send this to Codex, Trae, Claude Code, or a similar coding agent:
18
+
19
+ ```text
20
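+ Please install and try @ai-agent-tools/picgen: install it globally with npm install -g @ai-agent-tools/picgen, run picgen setup to configure it, then preview with a dry-run first before confirming generation of a test image. If I want to use a reference image, use --reference <image path>.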
21
+ ```
22
+
23
+ ## First Run
24
+
25
+ 1. Run setup:
26
+
27
+ ```bash
28
+ picgen setup
29
+ ```
30
+
31
+ 2. Use quick-add unless you already know the provider protocol details.
32
+
33
+ 3. Provider host URLs should be host-only:
34
+
35
+ ```text
36
+ https://www.pandai.vip
37
+ https://api.openai.com
38
+ https://generativelanguage.googleapis.com
39
+ ```
40
+
41
+ Do not include `/v1` or `/v1beta`.
42
+
43
+ 4. Set API keys in the shell or a local `.env` file:
44
+
45
+ ```bash
46
+ cp .env.example .env
47
+ ```
48
+
49
+ ```text
50
+ OPENAI_API_KEY=...
51
+ GEMINI_API_KEY=...
52
+ ```
53
+
54
+ 5. Check configuration:
55
+
56
+ ```bash
57
+ picgen doctor --json
58
+ ```
59
+
60
+ ## Safe Preview
61
+
62
+ Always start with dry-run:
63
+
64
+ ```bash
65
+ picgen create --dry-run "A minimalist, tech-style product poster"
66
+ ```
67
+
68
+ Dry-run does not call providers and does not spend quota.
69
+
70
+ ## Real Generation
71
+
72
+ After the preview looks right:
73
+
74
+ ```bash
75
+ picgen create "A minimalist, tech-style product poster"
76
+ ```
77
+
78
+ The CLI asks for confirmation before calling the provider. Use `--yes` only when you intentionally want to skip the prompt:
79
+
80
+ ```bash
81
+ picgen create --yes "A minimalist, tech-style product poster"
82
+ ```
83
+
84
+ ## Reference Image Trial
85
+
86
+ Reference images are supported through Gemini providers in Alpha:
87
+
88
+ ```bash
89
+ picgen create --dry-run --reference ./reference.png "Generate a brand poster based on the reference image"
90
+ picgen create --yes --reference ./reference.png "Generate a brand poster based on the reference image"
91
+ ```
92
+
93
+ If the default provider does not support reference images, PicGen routes to a capable fallback provider. If the user explicitly selects an unsupported provider, PicGen fails clearly instead of ignoring the reference image.
94
+
95
+ ## Expected Output
96
+
97
+ Generated images are saved locally under `outputs/picgen` by default. CLI output includes:
98
+
99
+ - `output_dir`
100
+ - `metadata_path`
101
+ - image path
102
+ - MIME type
103
+ - width and height when PicGen can read them
104
+
105
+ Provider image payloads and Gemini thought signatures are redacted from metadata.
106
+
107
+ ## Current Alpha Limits
108
+
109
+ - OpenAI-compatible `/v1/images/generations` supports text-to-image only.
110
+ - OpenAI reference images need a future `/v1/images/edits` adapter.
111
+ - Gemini may return PNG even when a preset says jpeg or webp; PicGen does not transcode output formats yet.
112
+ - API keys are read from environment variables or `.env`; keychain storage is not implemented.
113
+ - Full Codex plugin packaging is not implemented yet. Use the bundled skill instructions or CLI directly.
114
+ - Multi-reference limits are not model-specific yet.
115
+
116
+ ## Troubleshooting
117
+
118
+ `Missing API key environment variable`
119
+
120
+ Set the environment variable named in the error, or put it in `.env` in the current working directory.
121
+
122
+ `Provider host URL`
123
+
124
+ Use only the host. Do not add `/v1`, `/v1beta`, or endpoint paths.
125
+
126
+ `Provider "... " does not support reference-image`
127
+
128
+ Use a Gemini provider or remove `--reference`.
129
+
130
+ `Provider check failed`
131
+
132
+ Run:
133
+
134
+ ```bash
135
+ picgen provider test <provider-name> --json
136
+ ```
137
+
138
+ Check `base_url`, API key, model name, and provider availability.
139
+
140
+ `No enabled provider can satisfy...`
141
+
142
+ Run `picgen provider list`, enable a provider, add a fallback provider, or adjust the selected mode/model.
143
+
144
+ ## Release Gate
145
+
146
+ Before publishing:
147
+
148
+ ```bash
149
+ npm run typecheck
150
+ npm test
151
+ npm run build
152
+ npm pack --dry-run
153
+ ```
154
+
155
+ Publish when ready:
156
+
157
+ ```bash
158
+ npm publish --otp <code>
159
+ ```
@@ -0,0 +1,245 @@
1
+ # PicGen Agent Invocation Contract
2
+
3
+ This document defines how agents should call PicGen on behalf of users. It is the canonical behavior contract for Codex, Trae, Claude Code, and other agent integrations.
4
+
5
+ ## Purpose
6
+
7
+ PicGen lets non-technical users generate images directly inside an agent workflow without copying prompts into external image platforms. The agent understands the user's intent and context; PicGen handles routing, provider calls, local asset storage, and normalized results.
8
+
9
+ ## Responsibilities
10
+
11
+ The agent is responsible for:
12
+
13
+ - Deciding whether PicGen should be used.
14
+ - Turning the conversation context into a concise image prompt.
15
+ - Choosing an appropriate preset such as `poster`, `product-shot`, or `social-cover`.
16
+ - Passing user-selected local reference images when the user asks to continue from, edit from, or use an existing image.
17
+ - Running a dry-run before agent-initiated generation.
18
+ - Showing a user-friendly generation preview.
19
+ - Calling real generation only after confirmation, unless the user explicitly asked to skip confirmation.
20
+ - Showing local image previews or paths after generation.
21
+ - Loading generated images only when the user asks for analysis, editing, continuation, or comparison.
22
+
23
+ The PicGen CLI is responsible for:
24
+
25
+ - Loading user preferences and config.
26
+ - Resolving provider, model, mode, preset, and output settings.
27
+ - Matching the request to provider capabilities such as `text-to-image` and `reference-image`.
28
+ - Producing dry-run plans without calling providers.
29
+ - Calling providers for real generation.
30
+ - Downloading, decoding, and saving generated images as local files.
31
+ - Normalizing provider-specific response formats.
32
+ - Printing compact results to stdout.
33
+ - Writing diagnostics to metadata files without storing large image payloads.
34
+
35
+ ## Intent Levels
36
+
37
+ ### Explicit Generation Intent
38
+
39
+ Use PicGen when the user explicitly asks to generate, create, make, render, or produce an image, or when the user names PicGen.
40
+
41
+ Examples:
42
+
43
+ ```text
44
+ Use PicGen to generate a launch poster.
45
+ Create a social cover based on the current plan.
46
+ Generate two product shots from the description above.
47
+ ```
48
+
49
+ The default agent flow is dry-run, confirmation, then real generation.
50
+
51
+ ### Strong Visual Output Intent
52
+
53
+ When the user clearly wants a visual output but has not explicitly asked the agent to generate now, ask for confirmation before entering the PicGen workflow.
54
+
55
+ Example:
56
+
57
+ ```text
58
+ I can generate a poster preview from this concept. Would you like me to do that now?
59
+ ```
60
+
61
+ ### Weak Visual Discussion Intent
62
+
63
+ When the user is only discussing visual direction, mood, layout, or brand style, do not call PicGen. Suggest generation only if helpful.
64
+
65
+ ## Dry-run and Confirmation
66
+
67
+ Agent-initiated generation must run a dry-run first because real generation may spend user quota and send the prompt to a third-party provider.
68
+
69
+ Command pattern:
70
+
71
+ ```bash
72
+ picgen create --dry-run --preset poster "<prompt>"
73
+ ```
74
+
75
+ Do not expose the term `dry-run` to non-technical users by default. Present it as a generation preview or confirmation step.
76
+
77
+ The preview should summarize:
78
+
79
+ - Intended use or preset.
80
+ - Provider or channel.
81
+ - Model, when useful.
82
+ - Number of images.
83
+ - Aspect ratio.
84
+ - Local output behavior.
85
+
86
+ If the user explicitly says "generate directly", "do not ask", or equivalent, the agent may skip the user-facing confirmation step. The agent should still construct a plan internally.
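The intent levels and the confirmation rule can be combined into one small decision function. This is only a sketch of the policy as described here; `IntentLevel`, `Step`, and `planSteps` are hypothetical names, not part of PicGen.

```typescript
type IntentLevel = "explicit" | "strong" | "weak";
type Step = "suggest" | "ask-to-start" | "dry-run" | "confirm" | "generate";

// Map an intent level to the ordered steps an agent should take.
// skipConfirmation is true only when the user explicitly said
// "generate directly", "do not ask", or equivalent.
function planSteps(intent: IntentLevel, skipConfirmation: boolean): Step[] {
  if (intent === "weak") {
    return ["suggest"]; // discuss only; never call PicGen
  }
  const steps: Step[] = [];
  if (intent === "strong") {
    steps.push("ask-to-start"); // confirm before entering the workflow
  }
  steps.push("dry-run"); // the plan is always constructed, even if not shown
  if (!skipConfirmation) {
    steps.push("confirm");
  }
  steps.push("generate");
  return steps;
}
```

Note that `dry-run` stays in the list even when confirmation is skipped, matching the rule that the agent still constructs a plan internally.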
87
+
88
+ ## Preferences and One-off Overrides
89
+
90
+ Long-term preferences come from setup and config:
91
+
92
+ - Default provider.
93
+ - Fallback providers.
94
+ - Default mode.
95
+ - Default preset.
96
+
97
+ `picgen create` flags are one-off overrides and must not change config:
98
+
99
+ ```bash
100
+ picgen create --provider gemini_official "<prompt>"
101
+ picgen create --model gemini-3-pro-image-preview "<prompt>"
102
+ picgen create --preset poster "<prompt>"
103
+ picgen create --mode premium "<prompt>"
104
+ picgen create --reference ./reference.png "<prompt>"
105
+ ```
106
+
107
+ Only explicit preference commands should change config:
108
+
109
+ ```bash
110
+ picgen provider prefer gemini_official
111
+ picgen mode prefer premium
112
+ picgen preset prefer social-cover
113
+ ```
114
+
115
+ If the user says "use Gemini this time", use a one-off override. If the user says "use Gemini by default from now on", update preferences.
116
+
117
+ ## Setup Simplicity
118
+
119
+ PicGen setup should minimize questions for non-technical users.
120
+
121
+ Initial setup should focus on:
122
+
123
+ - Preferred provider or channel.
124
+ - Whether the required API key environment variable is available.
125
+ - Default generation mode: fast, balanced, or premium (high quality).
126
+
127
+ Initial setup should not require users to understand resolution, aspect ratio, quality, image count, response format, or protocol details. Presets and routing defaults should handle those choices.
128
+
129
+ Provider `base_url` values should be host-only. Users should not include `/v1` or `/v1beta`; PicGen appends protocol-specific paths internally.
130
+
131
+ Provider health checks may use a lightweight `test_model`. Gemini provider tests should use a text-only `generateContent` request so health checks validate connectivity without triggering image generation.
132
+
133
+ Providers should expose capabilities. At minimum:
134
+
135
+ - `text-to-image`: can generate from a text prompt.
136
+ - `reference-image`: can use one or more local images as generation references.
137
+
138
+ If capabilities are omitted from older configs, PicGen should infer defaults from the protocol. Gemini supports both `text-to-image` and `reference-image`; OpenAI-compatible `/v1/images/generations` supports `text-to-image` only.
139
+
140
+ Routing should skip providers that do not support the capability required by the request. If the user explicitly selects an unsupported provider, PicGen should fail clearly instead of silently ignoring the unsupported input.
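A minimal sketch of capability-aware routing, assuming a flat list of providers in priority order (default first, then fallbacks). The `Provider` shape, the protocol labels, the provider name `openai_official`, and the function names are hypothetical; the error strings loosely mirror the messages listed in the release checklist. Whether an explicit override may select a disabled provider is not specified, so this sketch checks only capability in that case.

```typescript
type Capability = "text-to-image" | "reference-image";
type Protocol = "openai-compatible" | "gemini"; // assumed protocol labels

interface Provider {
  name: string;
  protocol: Protocol;
  enabled: boolean;
  capabilities?: Capability[]; // may be missing in older configs
}

// Defaults inferred from the protocol when a config omits capabilities.
const PROTOCOL_DEFAULTS: { [P in Protocol]: Capability[] } = {
  gemini: ["text-to-image", "reference-image"],
  "openai-compatible": ["text-to-image"],
};

function capabilitiesOf(provider: Provider): Capability[] {
  return provider.capabilities ?? PROTOCOL_DEFAULTS[provider.protocol];
}

function supports(provider: Provider, required: Capability): boolean {
  return capabilitiesOf(provider).indexOf(required) !== -1;
}

// candidates: providers in routing order (default provider first, then fallbacks).
function selectProvider(
  candidates: Provider[],
  required: Capability,
  explicitName?: string
): Provider {
  if (explicitName !== undefined) {
    for (const provider of candidates) {
      if (provider.name !== explicitName) continue;
      if (!supports(provider, required)) {
        // Fail clearly instead of silently dropping the unsupported input.
        throw new Error(`Provider "${explicitName}" does not support ${required}`);
      }
      return provider;
    }
    throw new Error(`Unknown provider "${explicitName}"`);
  }
  for (const provider of candidates) {
    if (provider.enabled && supports(provider, required)) return provider;
  }
  throw new Error(`No enabled provider can satisfy ${required}`);
}
```

Under these assumptions, a `--reference` request whose default provider is OpenAI-compatible falls through to the first enabled Gemini provider, while naming the OpenAI-compatible provider explicitly raises an error.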
141
+
142
+ ## Reference Images
143
+
144
+ Agents may pass local reference images when the user explicitly asks to use an existing image, continue from a generated image, create a variant, or use a visual reference.
145
+
146
+ Command pattern:
147
+
148
+ ```bash
149
+ picgen create --dry-run --provider gemini_official --reference ./reference.png --preset poster "<prompt>"
150
+ picgen create --provider gemini_official --reference ./reference.png --preset poster "<prompt>"
151
+ ```
152
+
153
+ `--reference` may be repeated for multiple local images.
154
+
155
+ Dry-run output should include only reference image paths, MIME types, and byte sizes. It must not print or expose image base64.
156
+
157
+ Alpha supports reference images through the Gemini adapter. If the selected provider uses the OpenAI-compatible `/v1/images/generations` adapter, agents should switch to a Gemini provider for that run or explain that OpenAI-compatible reference-image support is not implemented yet.
158
+
159
+ ## Output Asset Contract
160
+
161
+ PicGen must normalize provider responses into local image files.
162
+
163
+ Provider responses may include:
164
+
165
+ - Remote image URLs.
166
+ - Base64 image data.
167
+ - Inline image bytes.
168
+ - File references.
169
+ - Temporary download URLs.
170
+
171
+ PicGen should download, decode, or copy those outputs into local files and return local paths.
172
+
173
+ Default stdout should stay compact:
174
+
175
+ ```json
176
+ {
177
+ "ok": true,
178
+ "output_dir": "/path/to/output",
179
+ "images": [
180
+ {
181
+ "path": "/path/to/image-1.png",
182
+ "mime_type": "image/png",
183
+ "width": 1024,
184
+ "height": 1024,
185
+ "metadata_path": "/path/to/metadata.json"
186
+ }
187
+ ],
188
+ "metadata_path": "/path/to/metadata.json"
189
+ }
190
+ ```
191
+
192
+ Do not print base64, binary image data, or full provider responses to stdout. Store detailed responses in metadata files.
193
+
194
+ Metadata must redact large provider-only fields such as generated image base64 payloads and Gemini thought signatures. Metadata is for diagnostics; agents should not display provider responses to users unless they are debugging an explicit failure.
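One possible way to implement this redaction is a recursive walk that replaces string values under known heavy keys with a short placeholder. The key list below is an assumption based on common response shapes (`b64_json` in OpenAI-style image responses, `inlineData.data` and `thoughtSignature` in Gemini-style parts); PicGen's real redaction rules may differ.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Keys whose string values are replaced. Only strings are redacted, so an
// OpenAI-style top-level "data" array is still traversed, not dropped.
const REDACTED_KEYS = ["b64_json", "data", "thoughtSignature"];

// Return a deep copy of a provider response with heavy fields replaced.
function redact(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (typeof child === "string" && REDACTED_KEYS.indexOf(key) !== -1) {
        out[key] = `[redacted ${child.length} chars]`;
      } else {
        out[key] = redact(child);
      }
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Keeping the original length in the placeholder preserves a useful diagnostic signal without storing the payload itself.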
195
+
196
+ When PicGen can read the generated image dimensions, stdout and metadata should include `width` and `height` for each image. Agents should prefer these fields over reading image files just to check size or aspect ratio.
197
+
198
+ ## Provider-specific Generation Behavior
199
+
200
+ Gemini image generation should request image-only responses with:
201
+
202
+ ```json
203
+ {
204
+ "generationConfig": {
205
+ "responseModalities": ["IMAGE"]
206
+ }
207
+ }
208
+ ```
209
+
210
+ This keeps responses compact and avoids returning unnecessary text. Gemini provider health checks should not use image-only generation; they should remain text-only connectivity checks.
211
+
212
+ Gemini may return internal thought parts or thought signatures. PicGen should not expose these to users. If thought images are present, PicGen should save only non-thought output images as generation results.
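The filtering rule above could look roughly like this. The response types are a trimmed, assumed view of a Gemini `generateContent` response (in particular the `thought` flag on a part), and `extractOutputImages` is a hypothetical helper name.

```typescript
// Trimmed, assumed view of the response; real responses carry more fields.
interface GeminiPart {
  text?: string;
  thought?: boolean;
  thoughtSignature?: string;
  inlineData?: { mimeType: string; data: string };
}

interface GeminiResponse {
  candidates?: { content?: { parts?: GeminiPart[] } }[];
}

interface OutputImage {
  mimeType: string;
  base64: string;
}

// Collect image parts that are real outputs, skipping thought parts entirely.
function extractOutputImages(response: GeminiResponse): OutputImage[] {
  const images: OutputImage[] = [];
  for (const candidate of response.candidates ?? []) {
    for (const part of candidate.content?.parts ?? []) {
      if (part.thought === true) continue; // internal reasoning, never saved
      if (part.inlineData) {
        images.push({
          mimeType: part.inlineData.mimeType,
          base64: part.inlineData.data,
        });
      }
    }
  }
  return images;
}
```

Thought signatures never appear in the returned objects because only `mimeType` and the image data are copied out.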
213
+
214
+ ## Display and On-demand Loading
215
+
216
+ After generation, the agent should show image previews or local paths only.
217
+
218
+ The agent must not automatically read, attach, analyze, or resend generated images after generation. Load generated images only when the user asks to inspect, edit, continue from, or compare them.
219
+
220
+ When loading is needed, load only the specific referenced image or images, not the whole output directory.
221
+
222
+ This is an internal efficiency rule. Do not explain token or context management to ordinary users unless they ask.
223
+
224
+ ## Error Handling
225
+
226
+ Agents should provide actionable next steps and must not pretend generation succeeded.
227
+
228
+ Common cases:
229
+
230
+ - No provider is configured: guide the user to run `picgen setup`.
231
+ - API key is missing: name the required environment variable.
232
+ - Provider is disabled: suggest enabling it or using a one-off provider override.
233
+ - Unknown preset or mode: suggest available choices or the default.
234
+ - Unsupported model: suggest editing the provider or using another provider.
235
+ - Provider call failed: show a brief error and point to metadata or error logs.
236
+
237
+ After a paid provider call fails, do not silently retry with another paid provider unless the user has confirmed fallback behavior.
238
+
239
+ ## Privacy and Quota
240
+
241
+ Do not send the full conversation context to providers by default. Compress context into the minimal visual prompt needed for the generation.
242
+
243
+ Do not silently spend user quota. Agent-initiated real generation requires a preview and confirmation by default.
244
+
245
+ Users may explicitly request direct generation. Future config may control whether agents are allowed to skip confirmation.