thebird 1.2.4 → 1.2.5
- package/CLAUDE.md +116 -0
- package/package.json +1 -1
package/CLAUDE.md
ADDED
# thebird Development Notes

## Architecture Overview

**thebird** is an Anthropic SDK adapter that translates messages and tool calls from Anthropic format to other LLM providers (Gemini and OpenAI-compatible APIs). It acts as a drop-in bridge: you write Anthropic-format code, and thebird routes it to the configured provider.

### Message Translation

Anthropic format:
```js
[{ role: 'user', content: [
  { type: 'text', text: '...' },
  { type: 'image', source: { type: 'base64', media_type: 'image/png', data: '...' } }
] }]
```

These blocks are translated to each provider's native format:
- **Gemini**: `parts: [{ text: '...' }, { inlineData: { mimeType: '...', data: '...' } }]`
- **OpenAI**: `content: [{ type: 'text', text: '...' }, { type: 'image_url', image_url: { url: '...' } }]`

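A minimal sketch of the per-block mapping, assuming plain message objects shaped like the example above. The names `normalizeBlocks`, `toGeminiParts`, `toGeminiContents`, and `toOpenAIContent` are hypothetical (the real logic lives in `lib/convert.js` and may differ), and only `base64` image sources are handled. Gemini calls the assistant role `model`, so the sketch also remaps roles.

```js
// Hypothetical helpers sketching the translation; not thebird's actual API.

// Anthropic accepts a plain string as shorthand for a single text block.
function normalizeBlocks(content) {
  return typeof content === 'string' ? [{ type: 'text', text: content }] : content;
}

function toGeminiParts(content) {
  return normalizeBlocks(content).map((block) => {
    if (block.type === 'text') return { text: block.text };
    if (block.type === 'image' && block.source.type === 'base64') {
      return { inlineData: { mimeType: block.source.media_type, data: block.source.data } };
    }
    throw new Error(`toGeminiParts: unsupported block ${block.type}/${block.source?.type}`);
  });
}

// Gemini uses 'model' where Anthropic uses 'assistant'.
function toGeminiContents(messages) {
  return messages.map((m) => ({
    role: m.role === 'assistant' ? 'model' : 'user',
    parts: toGeminiParts(m.content),
  }));
}

function toOpenAIContent(content) {
  return normalizeBlocks(content).map((block) => {
    if (block.type === 'text') return { type: 'text', text: block.text };
    if (block.type === 'image' && block.source.type === 'base64') {
      // OpenAI-compatible APIs accept inline images as data URLs.
      const url = `data:${block.source.media_type};base64,${block.source.data}`;
      return { type: 'image_url', image_url: { url } };
    }
    throw new Error(`toOpenAIContent: unsupported block ${block.type}/${block.source?.type}`);
  });
}
```

Text blocks map one-to-one; the image block keeps its bytes and MIME type and only changes its wrapper.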
### Tool Calling

Tool definitions in Anthropic schema are converted to each provider's native format, and the provider's tool-call responses are normalized back to Anthropic format.

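A sketch of the round trip, assuming Anthropic tools of the form `{ name, description, input_schema }` and OpenAI-style `tool_calls` in responses. The helper names are hypothetical. `input_schema` is passed through unchanged. Gemini's `parameters` field accepts only a subset of the OpenAPI schema format, so a real adapter may also need to drop keywords the provider rejects.

```js
// Hypothetical helpers illustrating the tool round trip; not thebird's actual API.
function toOpenAITools(tools) {
  return tools.map((t) => ({
    type: 'function',
    function: { name: t.name, description: t.description, parameters: t.input_schema },
  }));
}

// This sketch places all declarations in a single Gemini tool entry.
function toGeminiTools(tools) {
  return [{
    functionDeclarations: tools.map((t) => ({
      name: t.name,
      description: t.description,
      parameters: t.input_schema,
    })),
  }];
}

// Normalize OpenAI-style tool_calls back into Anthropic tool_use blocks.
function fromOpenAIToolCalls(toolCalls) {
  return toolCalls.map((call) => {
    let input;
    try {
      // OpenAI returns arguments as a JSON string; Anthropic expects an object.
      input = JSON.parse(call.function.arguments || '{}');
    } catch (err) {
      throw new Error(`Invalid JSON arguments for tool ${call.function.name}: ${err.message}`);
    }
    return { type: 'tool_use', id: call.id, name: call.function.name, input };
  });
}
```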
Streaming events (all events are Anthropic-compatible):
- `text-delta`, `tool-use-start`, `tool-use-delta`, `message-start`, `message-stop`

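Streamed tool input arrives in fragments, so a consumer has to buffer it until the call is complete. The sketch below takes its event type names from the list above. The payload fields (`text`, `id`, `name`, `partialJson`) and the `collectStream` name are hypothetical, so check `index.d.ts` for the real shapes. For brevity it consumes a synchronous array; a live stream would use `for await` instead.

```js
// Assumed (hypothetical) event payloads:
//   { type: 'message-start' }
//   { type: 'text-delta', text }
//   { type: 'tool-use-start', id, name }
//   { type: 'tool-use-delta', id, partialJson }
//   { type: 'message-stop' }
function collectStream(events) {
  const content = [];
  const toolJson = new Map(); // tool id -> accumulated JSON text
  let done = false;
  for (const ev of events) {
    switch (ev.type) {
      case 'message-start':
        break;
      case 'text-delta': {
        // Merge consecutive text deltas into one text block.
        const last = content[content.length - 1];
        if (last && last.type === 'text') last.text += ev.text;
        else content.push({ type: 'text', text: ev.text });
        break;
      }
      case 'tool-use-start':
        content.push({ type: 'tool_use', id: ev.id, name: ev.name, input: {} });
        toolJson.set(ev.id, '');
        break;
      case 'tool-use-delta':
        if (!toolJson.has(ev.id)) throw new Error(`tool-use-delta for unknown tool id ${ev.id}`);
        toolJson.set(ev.id, toolJson.get(ev.id) + ev.partialJson);
        break;
      case 'message-stop':
        done = true;
        break;
      default:
        throw new Error(`Unexpected stream event type: ${ev.type}`);
    }
  }
  // Tool input is only valid JSON once every fragment has arrived.
  for (const block of content) {
    if (block.type === 'tool_use') {
      const json = toolJson.get(block.id);
      block.input = json ? JSON.parse(json) : {};
    }
  }
  return { content, done };
}
```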
### Routing (Multi-Provider)

`createRouter()` picks a provider and model for each request based on:
1. `taskType` (e.g., 'think', 'background', 'longContext')
2. Token count compared against `longContextThreshold`

Routes are defined as `provider,model` strings in config.

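A minimal sketch of the selection logic. It assumes a hypothetical config with a `routes` map keyed by task type (plus a `default` entry) and a numeric `longContextThreshold`. The precedence is also an assumption, based on the order of the list above: an explicit `taskType` route first, then the token threshold, then the default. `createRouter()` may resolve these cases differently.

```js
// Hypothetical sketch of createRouter()-style selection; not thebird's actual implementation.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Routes are "provider,model" strings; everything after the first comma is the model.
function parseRoute(route) {
  const comma = route.indexOf(',');
  const provider = comma === -1 ? '' : route.slice(0, comma).trim();
  const model = comma === -1 ? '' : route.slice(comma + 1).trim();
  if (!provider || !model) {
    throw new Error(`Invalid route "${route}": expected "provider,model"`);
  }
  return { provider, model };
}

function pickRoute(config, { taskType, tokenCount = 0 } = {}) {
  const { routes, longContextThreshold } = config;
  // 1. An explicit taskType with a configured route wins.
  if (taskType && has(routes, taskType)) return parseRoute(routes[taskType]);
  // 2. Otherwise, large requests go to the long-context route if one exists.
  if (tokenCount > longContextThreshold && has(routes, 'longContext')) {
    return parseRoute(routes.longContext);
  }
  // 3. Fall back to the default route, failing loudly if there is none.
  if (!has(routes, 'default')) {
    throw new Error(`No route for taskType "${taskType}" and no default route configured`);
  }
  return parseRoute(routes.default);
}
```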
### Transformers

Some providers need field adjustments:
- `deepseek`: strips `cache_control`, `repetition_penalty`
- `groq`: removes `top_k`
- `reasoning`: moves `reasoning_content` to `_reasoning`

Transformers are applied automatically while the request is built.

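The three transformers can be sketched as pure functions over the request body, assuming an OpenAI-style body with a `messages` array. The `transformers` table and `applyTransformers` are hypothetical. The sketch also assumes `cache_control` may appear at any depth (for example on content blocks), so the DeepSeek transformer strips keys recursively.

```js
// Hypothetical transformer table mirroring the list above; not thebird's actual code.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Recursively drop the given keys from objects nested in arrays and objects.
function omitKeysDeep(value, keys) {
  if (Array.isArray(value)) return value.map((v) => omitKeysDeep(v, keys));
  if (value && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value)
        .filter(([k]) => !keys.includes(k))
        .map(([k, v]) => [k, omitKeysDeep(v, keys)]),
    );
  }
  return value;
}

const transformers = {
  deepseek: (body) => omitKeysDeep(body, ['cache_control', 'repetition_penalty']),
  // Only the top-level sampling parameter is removed.
  groq: ({ top_k, ...rest }) => rest,
  reasoning: (body) => {
    if (!Array.isArray(body.messages)) return body;
    const messages = body.messages.map(({ reasoning_content, ...msg }) =>
      reasoning_content === undefined ? msg : { ...msg, _reasoning: reasoning_content });
    return { ...body, messages };
  },
};

// Apply the named transformers in order, failing loudly on an unknown name.
function applyTransformers(body, names) {
  return names.reduce((acc, name) => {
    if (!has(transformers, name)) throw new Error(`Unknown transformer "${name}"`);
    return transformers[name](acc);
  }, body);
}
```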
## gembird: Image Generation via Browser

**gembird** generates 4-view product images (front, back, left side, right side) using Gemini's web UI.

### Why Browser Automation?

The Gemini API free tier has zero quota for image generation, while the web UI works without an API quota. Tradeoff: it is slower than the API and depends on the UI staying stable, but no quota is needed.

### Workflow

1. Connect to Chrome on `localhost:9222` over CDP with Playwright
2. Navigate to gemini.google.com
3. For each view:
   - Type a prompt asking for that view
   - Poll for a new `<img alt="AI generated">` (120 s timeout)
   - Extract it via canvas: `canvas.getContext('2d').drawImage(img, 0, 0)`, then `canvas.toDataURL('image/png')`
   - POST the base64 data to a local HTTP save server
4. Save the 4 PNGs to the output directory

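The polling step amounts to a retry loop with a deadline. A minimal sketch, assuming a hypothetical `pollFor` helper whose `check` callback returns a truthy value (such as the new image's `src`) once the image exists. The 120 s default mirrors the timeout above; the 1 s interval is an arbitrary choice. The commented Playwright wiring at the end is an assumed usage, not gembird's code.

```js
// Hypothetical polling helper; gembird's actual loop may differ.
// check() should return a truthy value when ready, else a falsy value.
async function pollFor(check, { timeoutMs = 120000, intervalMs = 1000, label = 'condition' } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      // Fail with context instead of hanging or returning nothing.
      throw new Error(`Timed out after ${timeoutMs} ms waiting for ${label}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Possible use with Playwright (assumed wiring; `page` comes from a browser
// obtained via chromium.connectOverCDP('http://localhost:9222')):
//   const imgs = page.locator('img[alt*="AI generated"]');
//   const before = await imgs.count();
//   // ...type and submit the prompt for this view...
//   const src = await pollFor(
//     async () => ((await imgs.count()) > before ? imgs.last().getAttribute('src') : null),
//     { label: 'new AI generated image' },
//   );
```

Counting images before submitting the prompt lets the loop tell a new image apart from ones generated for earlier views.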
### CLI

```bash
node index.js "prompt"
node index.js --image ref.png "prompt"
node index.js --output ./dir "prompt"
```

Arguments are parsed in `index.js`, lines 144-172.

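A dependency-free sketch of parsing the three invocations above. `parseCliArgs` is hypothetical and may not match the real parsing in `index.js`. It treats the first bare argument as the prompt and rejects unknown flags and missing values instead of ignoring them, in line with the project's no-silent-failures rule.

```js
// Hypothetical parser for `[--image <path>] [--output <dir>] "prompt"`; not gembird's actual code.
function parseCliArgs(argv) {
  const opts = { image: undefined, output: undefined, prompt: undefined };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--image' || arg === '--output') {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith('--')) {
        throw new Error(`${arg} requires a value`);
      }
      opts[arg.slice(2)] = value; // 'image' or 'output'
      i++; // skip the consumed value
    } else if (arg.startsWith('--')) {
      throw new Error(`Unknown flag: ${arg}`);
    } else if (opts.prompt === undefined) {
      opts.prompt = arg;
    } else {
      throw new Error(`Unexpected extra argument: ${arg}`);
    }
  }
  if (!opts.prompt) throw new Error('Missing prompt');
  return opts;
}

// Usage: const { prompt, image, output } = parseCliArgs(process.argv.slice(2));
```

Node.js 18.3+ also ships `util.parseArgs`, which could replace the hand-written loop.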
### Observability

- Gemini errors are logged to the Chrome console
- The 120 s timeout is conservative; real generation typically takes about 30-60 s
- If extraction fails, check the `img[alt*="AI generated"]` selector

## Development Constraints

- Max 200 lines per file (split before hitting limit)
- No comments
- No test files
- No hardcoded values
- Errors throw with context (no silent failures)
- Messages must stay Anthropic-compatible (other code depends on this contract)
- Tool schemas must translate cleanly to all providers

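One way to satisfy "errors throw with context" is an error subclass that carries the provider, model, and HTTP status, and chains the underlying failure through the standard `cause` option. `ProviderError` and `assertOk` are hypothetical names; `lib/errors.js` may structure this differently.

```js
// Hypothetical error type and check; lib/errors.js may differ.
class ProviderError extends Error {
  constructor(message, { provider, model, status, cause } = {}) {
    // Prefix the message with "[provider/model]" so logs show where it failed.
    const where = provider ? `[${provider}${model ? `/${model}` : ''}] ` : '';
    super(`${where}${message}`, { cause });
    this.name = 'ProviderError';
    this.provider = provider;
    this.model = model;
    this.status = status;
  }
}

// Throw with provider/model/status context instead of passing a failed response along.
function assertOk(response, context = {}) {
  if (response.status >= 200 && response.status < 300) return response;
  throw new ProviderError(`HTTP ${response.status}`, { ...context, status: response.status });
}
```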
## Testing

There are no test files. Validation is done through the examples:
- `examples/basic-chat.js`: Single-turn Anthropic format → Gemini
- `examples/streaming.js`: Streaming events
- `examples/tool-use.js`: Tool calling and tool result handling
- `examples/vision.js`: Image blocks (base64, URL, inline)
- `examples/multi-turn.js`: Multi-turn chat with context

Run the examples against the real Gemini API to validate message translation.

## Known Issues & Workarounds

- Gemini API doesn't support `tool_choice: 'required'`; it is treated as `'auto'`
- Tool naming conventions differ between some models; check the provider docs
- Streaming response parsing varies by provider; see `lib/providers/` for details
- OAuth tokens expire, so gembird uses the browser session instead of capturing tokens

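A sketch of the `tool_choice` mapping described in the first known issue, assuming OpenAI-style string values and Gemini's `functionCallingConfig.mode` field. `toGeminiToolConfig` is hypothetical.

```js
// Hypothetical mapping mirroring the documented behavior; not thebird's actual code.
const GEMINI_MODES = new Map([
  ['auto', 'AUTO'],
  ['required', 'AUTO'], // documented workaround: 'required' is treated as 'auto'
  ['none', 'NONE'],
]);

function toGeminiToolConfig(toolChoice = 'auto') {
  const mode = GEMINI_MODES.get(toolChoice);
  if (!mode) throw new Error(`Unsupported tool_choice for Gemini: ${JSON.stringify(toolChoice)}`);
  return { functionCallingConfig: { mode } };
}
```

Gemini's function-calling config also defines an `ANY` mode that forces a function call. Whether mapping `'required'` to it would suit thebird is worth checking against the current Gemini API documentation.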
## Files

- `lib/convert.js`: Message/tool translation logic
- `lib/client.js`: Provider client factory
- `lib/errors.js`: Error handling and retry logic
- `lib/providers/`: Provider-specific streaming implementations
- `index.js`: Main entry point, streaming and generation wrappers
- `index.d.ts`: TypeScript type definitions
- `examples/`: Working examples using Anthropic SDK format

package/package.json
CHANGED