thebird 1.2.4 → 1.2.5

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/CLAUDE.md +116 -0
  2. package/package.json +1 -1
package/CLAUDE.md ADDED
@@ -0,0 +1,116 @@
+ # thebird Development Notes
+
+ ## Architecture Overview
+
+ **thebird** is an Anthropic SDK adapter that translates message format and tool calls to multiple LLM providers (Gemini, OpenAI-compatible APIs). It's a drop-in bridge — you write Anthropic-format code, thebird routes to any provider.
+
+ ### Message Translation
+
+ Anthropic format:
+ ```js
+ [{ role: 'user', content: [
+   { type: 'text', text: '...' },
+   { type: 'image', source: { type: 'base64', media_type: 'image/png', data: '...' } }
+ ] }]
+ ```
+
+ Translates to provider-native format:
+ - **Gemini**: `parts: [{ text: '...' }, { inlineData: { mimeType: '...', data: '...' } }]`
+ - **OpenAI**: `content: [{ type: 'text', text: '...' }, { type: 'image_url', image_url: { url: '...' } }]`
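The mapping above can be illustrated with a minimal sketch. The function names `toGeminiParts` and `toOpenAIContent` are hypothetical and are not taken from thebird's `lib/convert.js`; the sketch only handles text and base64 image blocks, and it assumes the OpenAI `url` is filled with a base64 data URL.

```javascript
// Hypothetical helpers for illustration; thebird's real converter may differ.

// Convert Anthropic content blocks into Gemini `parts`.
function toGeminiParts(content) {
  return content.map((block) => {
    if (block.type === 'text') return { text: block.text };
    if (block.type === 'image' && block.source.type === 'base64') {
      return { inlineData: { mimeType: block.source.media_type, data: block.source.data } };
    }
    throw new Error(`Unsupported content block type: ${block.type}`);
  });
}

// Convert Anthropic content blocks into OpenAI chat `content` entries.
function toOpenAIContent(content) {
  return content.map((block) => {
    if (block.type === 'text') return { type: 'text', text: block.text };
    if (block.type === 'image' && block.source.type === 'base64') {
      // Assumption: the image is passed inline as a base64 data URL.
      const url = `data:${block.source.media_type};base64,${block.source.data}`;
      return { type: 'image_url', image_url: { url } };
    }
    throw new Error(`Unsupported content block type: ${block.type}`);
  });
}

const message = {
  role: 'user',
  content: [
    { type: 'text', text: 'Describe this image' },
    { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'iVBORw0KGgo=' } },
  ],
};

const geminiParts = toGeminiParts(message.content);
const openaiContent = toOpenAIContent(message.content);
console.log(JSON.stringify(geminiParts));
console.log(JSON.stringify(openaiContent));
```

Unknown block types throw instead of being dropped, which matches the "no silent failures" constraint listed later in the file.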
+
+ ### Tool Calling
+
+ Anthropic tool schema → provider native → normalized response back to Anthropic format.
+
+ Streaming events (all events are Anthropic-compatible):
+ - `text-delta`, `tool-use-start`, `tool-use-delta`, `message-start`, `message-stop`
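As a rough sketch of the schema round trip, the helpers below map an Anthropic tool definition to the OpenAI and Gemini tool shapes and turn an OpenAI-style tool call back into an Anthropic `tool_use` block. The helper names are hypothetical, and passing `input_schema` through unchanged is a simplification: a real adapter may need to prune JSON Schema keywords that a provider rejects.

```javascript
// Hypothetical helpers for illustration; not thebird's actual exports.

// Anthropic tool -> OpenAI "function" tool.
function toOpenAITool(tool) {
  return {
    type: 'function',
    function: { name: tool.name, description: tool.description, parameters: tool.input_schema },
  };
}

// Anthropic tools -> Gemini `tools` array with a single functionDeclarations group.
function toGeminiTools(tools) {
  return [{
    functionDeclarations: tools.map((tool) => ({
      name: tool.name,
      description: tool.description,
      parameters: tool.input_schema,
    })),
  }];
}

// OpenAI tool call -> Anthropic `tool_use` content block.
function fromOpenAIToolCall(call) {
  return {
    type: 'tool_use',
    id: call.id,
    name: call.function.name,
    input: JSON.parse(call.function.arguments), // OpenAI sends arguments as a JSON string
  };
}

const weatherTool = {
  name: 'get_weather',
  description: 'Look up the current weather for a city',
  input_schema: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
};

const openaiTool = toOpenAITool(weatherTool);
const geminiTools = toGeminiTools([weatherTool]);
const toolUse = fromOpenAIToolCall({
  id: 'call_1',
  type: 'function',
  function: { name: 'get_weather', arguments: '{"city":"Oslo"}' },
});
console.log(JSON.stringify(toolUse));
```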
+
+ ### Routing (Multi-Provider)
+
+ `createRouter()` picks provider+model per request based on:
+ 1. `taskType` (e.g., 'think', 'background', 'longContext')
+ 2. Token count vs `longContextThreshold`
+
+ Routes are defined as `provider,model` strings in config.
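A minimal sketch of this selection logic follows. It is not `createRouter()` itself: the config shape (`routes`, `default`), the function names, the model names, and the choice to let the token threshold take precedence over `taskType` are all assumptions made for illustration.

```javascript
// Hypothetical routing sketch; the real createRouter() may be structured differently.

// Split a "provider,model" string into its two parts.
function parseRoute(route) {
  const idx = route.indexOf(',');
  if (idx <= 0 || idx === route.length - 1) {
    throw new Error(`Invalid route "${route}": expected "provider,model"`);
  }
  return { provider: route.slice(0, idx).trim(), model: route.slice(idx + 1).trim() };
}

// Pick a route for one request.
function pickRoute(config, { taskType, tokenCount = 0 } = {}) {
  // Assumption: exceeding the threshold overrides the task type.
  if (config.routes.longContext && tokenCount > config.longContextThreshold) {
    return parseRoute(config.routes.longContext);
  }
  const route = config.routes[taskType] ?? config.routes.default;
  if (!route) {
    throw new Error(`No route for taskType "${taskType}" and no default route configured`);
  }
  return parseRoute(route);
}

// Illustrative config; provider and model names are placeholders.
const routerConfig = {
  longContextThreshold: 60000,
  routes: {
    default: 'gemini,fast-model',
    think: 'openai,reasoning-model',
    background: 'groq,small-model',
    longContext: 'gemini,long-model',
  },
};

const thinkRoute = pickRoute(routerConfig, { taskType: 'think', tokenCount: 1200 });
const longRoute = pickRoute(routerConfig, { taskType: 'think', tokenCount: 90000 });
const fallbackRoute = pickRoute(routerConfig, { taskType: 'unlisted' });
console.log(thinkRoute, longRoute, fallbackRoute);
```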
+
+ ### Transformers
+
+ Some providers need field adjustments:
+ - `deepseek`: strips `cache_control`, `repetition_penalty`
+ - `groq`: removes `top_k`
+ - `reasoning`: moves `reasoning_content` to `_reasoning`
+
+ Applied automatically during request building.
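The field adjustments listed above could be modeled as small pure functions keyed by name. This is a sketch under assumptions: the `transformers` table, `applyTransformers`, and the decision to strip `cache_control` recursively (because Anthropic attaches it to nested content blocks) are illustrative and not confirmed by the source.

```javascript
// Hypothetical transformer table; names mirror the list above, bodies are assumptions.

// Recursively remove every occurrence of `key` from a JSON-like value.
function dropKeyDeep(value, key) {
  if (Array.isArray(value)) return value.map((item) => dropKeyDeep(item, key));
  if (value === null || typeof value !== 'object') return value;
  const out = {};
  for (const [k, v] of Object.entries(value)) {
    if (k !== key) out[k] = dropKeyDeep(v, key);
  }
  return out;
}

const transformers = {
  // Strip nested cache_control markers and the top-level repetition_penalty field.
  deepseek: (req) => {
    const { repetition_penalty, ...rest } = dropKeyDeep(req, 'cache_control');
    return rest;
  },
  // Remove the top-level top_k sampling parameter.
  groq: ({ top_k, ...rest }) => rest,
  // Move reasoning_content to _reasoning when present.
  reasoning: (obj) => {
    if (!('reasoning_content' in obj)) return obj;
    const { reasoning_content, ...rest } = obj;
    return { ...rest, _reasoning: reasoning_content };
  },
};

// Apply the named transformers in order without mutating the input.
function applyTransformers(request, names) {
  return names.reduce((req, name) => {
    const transform = transformers[name];
    if (!transform) throw new Error(`Unknown transformer: ${name}`);
    return transform(req);
  }, request);
}

const original = {
  model: 'placeholder-model',
  top_k: 40,
  repetition_penalty: 1.1,
  messages: [{ role: 'user', content: [{ type: 'text', text: 'hi', cache_control: { type: 'ephemeral' } }] }],
};

const forDeepseek = applyTransformers(original, ['deepseek']);
const forGroq = applyTransformers(original, ['groq']);
const withReasoning = applyTransformers({ role: 'assistant', reasoning_content: 'step 1' }, ['reasoning']);
console.log(JSON.stringify(forDeepseek));
```

A recursive strip would also remove a tool parameter that happens to be named `cache_control`, so a production version would likely target only the known locations.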
+
+ ## gembird — Image Generation via Browser
+
+ **gembird** generates 4-view product images (front, back, left-side, right-side) using Gemini's web UI.
+
+ ### Why Browser Automation?
+
+ Gemini API free tier has 0 quota for image generation. Web UI works without limits. Tradeoff: slower than API, depends on UI stability, but no quota needed.
+
+ ### Workflow
+
+ 1. Playwright CDP connection to Chrome on `localhost:9222`
+ 2. Navigate to gemini.google.com
+ 3. For each view:
+    - Type prompt asking for that view
+    - Poll for new `<img alt="AI generated">` (120s timeout)
+    - Extract via canvas: `canvas.drawImage(img) → canvas.toDataURL('image/png')`
+    - POST base64 to local HTTP save server
+ 4. Save 4 PNGs to output dir
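The polling in step 3 can be sketched as a generic helper that retries a check until it returns a truthy value or a deadline passes. `pollUntil` and its option names are hypothetical; the demo below uses a fake counter instead of a browser so it runs without Playwright.

```javascript
// Hypothetical polling helper; gembird's actual loop may differ.

// Call `check` repeatedly until it returns a truthy value or `timeoutMs` elapses.
async function pollUntil(check, { timeoutMs = 120000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs}ms waiting for condition`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Demo: a fake "image count" that grows on every check.
let fakeImageCount = 0;
const baseline = 0;
pollUntil(() => {
  fakeImageCount += 1;
  return fakeImageCount > baseline + 2 ? fakeImageCount : null;
}, { timeoutMs: 1000, intervalMs: 1 }).then((count) => {
  console.log(`new image detected, count=${count}`);
});
```

With Playwright, the check would plausibly compare `await page.locator('img[alt*="AI generated"]').count()` against the count recorded before the prompt was sent, on a page obtained through `chromium.connectOverCDP('http://localhost:9222')`.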
+
+ ### CLI
+
+ ```bash
+ node index.js "prompt"
+ node index.js --image ref.png "prompt"
+ node index.js --output ./dir "prompt"
+ ```
+
+ Arguments parsed in index.js lines 144-172.
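The diff does not include the parsing code itself, so the following is only a guess at how the three invocations above could be handled. `parseArgs` and the shape of its return value are hypothetical.

```javascript
// Hypothetical argument parser for the CLI forms shown above.
function parseArgs(argv) {
  const options = { image: null, output: null, prompt: null };
  const positional = [];
  for (let i = 0; i < argv.length; i += 1) {
    const arg = argv[i];
    if (arg === '--image' || arg === '--output') {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith('--')) {
        throw new Error(`Missing value for ${arg}`);
      }
      options[arg.slice(2)] = value; // '--image' -> options.image
      i += 1; // skip the consumed value
    } else if (arg.startsWith('--')) {
      throw new Error(`Unknown flag: ${arg}`);
    } else {
      positional.push(arg);
    }
  }
  if (positional.length !== 1) {
    throw new Error(`Expected exactly one prompt argument, got ${positional.length}`);
  }
  options.prompt = positional[0];
  return options;
}

const parsed = parseArgs(['--image', 'ref.png', '--output', './dir', 'a red sneaker']);
console.log(parsed);
```

In a real entry point the input would be `process.argv.slice(2)`.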
+
+ ### Observability
+
+ - Chrome console logs Gemini errors
+ - 120s timeout is conservative; real generation ~30-60s
+ - If extraction fails, check `img[alt*="AI generated"]` selector
+
+ ## Development Constraints
+
+ - Max 200 lines per file (split before hitting limit)
+ - No comments
+ - No test files
+ - No hardcoded values
+ - Errors throw with context (no silent failures)
+ - Messages must stay Anthropic-compatible (other code depends on this contract)
+ - Tool schemas must translate cleanly to all providers
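One way to read the "errors throw with context" constraint is an error type that carries structured details alongside the message. The class and function names below are hypothetical and are not taken from `lib/errors.js`; the `cause` option requires Node 16.9 or later.

```javascript
// Hypothetical error type; thebird's lib/errors.js may look different.
class ProviderError extends Error {
  constructor(message, context = {}, options = {}) {
    super(message, options); // options.cause preserves the underlying error
    this.name = 'ProviderError';
    this.context = context;
  }
}

// Parse tool-call arguments, rethrowing failures with enough context to debug them.
function parseToolArguments(provider, toolName, raw) {
  try {
    return JSON.parse(raw);
  } catch (cause) {
    throw new ProviderError(
      `Failed to parse arguments for tool "${toolName}"`,
      { provider, toolName, raw },
      { cause },
    );
  }
}

let caught = null;
try {
  parseToolArguments('openai', 'get_weather', '{"city": ');
} catch (err) {
  caught = err;
  console.error(err.message, err.context);
}
```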
+
+ ## Testing
+
+ No test files. Validation via:
+ - `examples/basic-chat.js`: Single-turn Anthropic format → Gemini
+ - `examples/streaming.js`: Streaming events
+ - `examples/tool-use.js`: Tool calling and tool result handling
+ - `examples/vision.js`: Image blocks (base64, URL, inline)
+ - `examples/multi-turn.js`: Multi-turn chat with context
+
+ Run examples against real Gemini API to validate message translation.
+
+ ## Known Issues & Workarounds
+
+ - Gemini API doesn't support `tool_choice: 'required'` — treated as `'auto'`
+ - Some models have different tool naming conventions — check provider docs
+ - Streaming response parsing varies by provider — see lib/providers/ for details
+ - OAuth tokens expire — gembird uses browser session instead of capturing tokens
+
+ ## Files
+
+ - `lib/convert.js`: Message/tool translation logic
+ - `lib/client.js`: Provider client factory
+ - `lib/errors.js`: Error handling and retry logic
+ - `lib/providers/`: Provider-specific streaming implementations
+ - `index.js`: Main entry point, streaming and generation wrappers
+ - `index.d.ts`: TypeScript type definitions
+ - `examples/`: Working examples using Anthropic SDK format
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "thebird",
- "version": "1.2.4",
+ "version": "1.2.5",
  "description": "Anthropic SDK to Gemini streaming bridge — drop-in proxy that translates Anthropic message format and tool calls to Google Gemini",
  "main": "index.js",
  "types": "index.d.ts",