thebird 1.2.4 → 1.2.6

Files changed (2)
  1. package/CLAUDE.md +121 -0
  2. package/package.json +1 -1
package/CLAUDE.md ADDED
@@ -0,0 +1,121 @@
# thebird Development Notes

## Architecture Overview

**thebird** is an Anthropic SDK adapter that translates message format and tool calls to multiple LLM providers (Gemini and OpenAI-compatible APIs). It is a drop-in bridge: you write Anthropic-format code, and thebird routes it to a supported provider.

### Message Translation

Anthropic format:

```js
[{ role: 'user', content: [
  { type: 'text', text: '...' },
  { type: 'image', source: { type: 'base64', media_type: 'image/png', data: '...' } }
] }]
```

This is translated to each provider's native format:
- **Gemini**: `parts: [{ text: '...' }, { inlineData: { mimeType: '...', data: '...' } }]`
- **OpenAI**: `content: [{ type: 'text', text: '...' }, { type: 'image_url', image_url: { url: '...' } }]`

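The block mapping above can be sketched as a few small converters. This is a minimal illustration, not the code in `lib/convert.js`: the names `toGeminiParts`, `toOpenAIContent`, and `toGeminiContents` are hypothetical, only `text` and base64 `image` blocks are handled, and the OpenAI side assumes base64 images are sent as `data:` URLs, which the OpenAI Chat Completions API accepts.

```js
// Hypothetical converters illustrating the block mapping; not thebird's actual API.

// Anthropic content blocks -> Gemini `parts`
function toGeminiParts(blocks) {
  return blocks.map((block) => {
    if (block.type === 'text') return { text: block.text };
    if (block.type === 'image' && block.source.type === 'base64') {
      return { inlineData: { mimeType: block.source.media_type, data: block.source.data } };
    }
    throw new Error(`Unsupported block for Gemini: ${block.type}`);
  });
}

// Anthropic content blocks -> OpenAI chat `content`
function toOpenAIContent(blocks) {
  return blocks.map((block) => {
    if (block.type === 'text') return { type: 'text', text: block.text };
    if (block.type === 'image' && block.source.type === 'base64') {
      // Assumption: base64 images are passed as data URLs
      const url = `data:${block.source.media_type};base64,${block.source.data}`;
      return { type: 'image_url', image_url: { url } };
    }
    throw new Error(`Unsupported block for OpenAI: ${block.type}`);
  });
}

// Whole messages -> Gemini `contents`; Gemini names the assistant role "model"
function toGeminiContents(messages) {
  return messages.map((m) => ({
    role: m.role === 'assistant' ? 'model' : 'user',
    parts: toGeminiParts(typeof m.content === 'string' ? [{ type: 'text', text: m.content }] : m.content),
  }));
}

const content = [
  { type: 'text', text: 'Describe this image' },
  { type: 'image', source: { type: 'base64', media_type: 'image/png', data: 'iVBORw0KGgo=' } },
];

console.log(JSON.stringify(toGeminiParts(content)));
console.log(JSON.stringify(toOpenAIContent(content)));
```

The role remapping in `toGeminiContents` is needed because the Gemini API accepts only `user` and `model` as conversation roles.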
### Tool Calling

Tool definitions are translated from the Anthropic tool schema to each provider's native format, and the provider's response is normalized back to Anthropic format.

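A sketch of both directions, assuming the standard public request shapes: Anthropic tools carry `name`, `description`, and `input_schema` (JSON Schema); OpenAI-compatible APIs expect `{ type: 'function', function: { name, description, parameters } }`; Gemini expects a `functionDeclarations` array. The helper names are hypothetical, and a real conversion may also need to drop JSON Schema keywords a provider rejects. The last helper turns a Gemini `functionCall` part into an Anthropic `tool_use` block; the synthesized `id` format is an assumption, since Gemini may not supply one.

```js
// Hypothetical helpers; shapes follow the public Anthropic, OpenAI, and Gemini formats.

function toOpenAITools(tools) {
  return tools.map((t) => ({
    type: 'function',
    function: { name: t.name, description: t.description, parameters: t.input_schema },
  }));
}

function toGeminiTools(tools) {
  // Gemini groups all function declarations inside one tool object
  return [{
    functionDeclarations: tools.map((t) => ({
      name: t.name,
      description: t.description,
      parameters: t.input_schema,
    })),
  }];
}

// Reverse direction: Gemini functionCall part -> Anthropic tool_use block
function fromGeminiFunctionCall(part, index) {
  const { name, args, id } = part.functionCall;
  return {
    type: 'tool_use',
    id: id ?? `toolu_${index}`, // assumption: synthesize an id when none is provided
    name,
    input: args ?? {},
  };
}

const tools = [{
  name: 'get_weather',
  description: 'Get the current weather for a city',
  input_schema: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
}];

console.log(JSON.stringify(toOpenAITools(tools)));
console.log(JSON.stringify(toGeminiTools(tools)));
console.log(JSON.stringify(fromGeminiFunctionCall({ functionCall: { name: 'get_weather', args: { city: 'Paris' } } }, 0)));
```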
Streaming events (all events are Anthropic-compatible):
- `text-delta`, `tool-use-start`, `tool-use-delta`, `message-start`, `message-stop`

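A consumer typically folds these events back into one assistant message. The event names below come from the list above, but the payload fields are assumptions because they are not documented here: `text` on `text-delta`, `id` and `name` on `tool-use-start`, and a partial-JSON string `partial` on `tool-use-delta`. Tool input fragments are buffered and parsed only once the tool block is complete.

```js
// Sketch only: event names from the notes, payload field names are hypothetical.
function accumulate(events) {
  const content = [];
  let current = null; // tool_use block whose JSON input is still streaming
  let jsonBuffer = '';

  // Finish the pending tool block by parsing its buffered JSON input
  const flushTool = () => {
    if (!current) return;
    current.input = jsonBuffer ? JSON.parse(jsonBuffer) : {};
    content.push(current);
    current = null;
    jsonBuffer = '';
  };

  for (const ev of events) {
    switch (ev.type) {
      case 'message-start':
        break;
      case 'text-delta': {
        flushTool();
        const last = content[content.length - 1];
        if (last && last.type === 'text') last.text += ev.text;
        else content.push({ type: 'text', text: ev.text });
        break;
      }
      case 'tool-use-start':
        flushTool();
        current = { type: 'tool_use', id: ev.id, name: ev.name, input: {} };
        break;
      case 'tool-use-delta':
        jsonBuffer += ev.partial;
        break;
      case 'message-stop':
        flushTool();
        break;
      default:
        throw new Error(`Unknown stream event: ${ev.type}`);
    }
  }
  flushTool(); // tolerate a stream that ends without message-stop
  return { role: 'assistant', content };
}

const message = accumulate([
  { type: 'message-start' },
  { type: 'text-delta', text: 'Checking ' },
  { type: 'text-delta', text: 'the weather.' },
  { type: 'tool-use-start', id: 'toolu_1', name: 'get_weather' },
  { type: 'tool-use-delta', partial: '{"city":' },
  { type: 'tool-use-delta', partial: '"Paris"}' },
  { type: 'message-stop' },
]);
console.log(JSON.stringify(message));
```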
### Routing (Multi-Provider)

`createRouter()` picks a provider and model for each request based on:
1. `taskType` (e.g., `'think'`, `'background'`, `'longContext'`)
2. Token count vs. `longContextThreshold`

Routes are defined as `provider,model` strings in config.

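A minimal sketch of that selection logic. The config shape (`routes` keyed by task type plus a `default` entry), the `pickRoute` name, the ~4-characters-per-token estimate, the precedence (long context first, then task type, then default), and the model names are all assumptions for illustration, not the real `createRouter()` options.

```js
// Hypothetical router sketch; config shape, precedence, and model names are assumptions.

// Parse a "provider,model" route string
function parseRoute(route) {
  const [provider, model] = route.split(',').map((s) => s.trim());
  if (!provider || !model) throw new Error(`Invalid route "${route}", expected "provider,model"`);
  return { provider, model };
}

// Rough heuristic (~4 characters per token), not a real tokenizer
function estimateTokens(messages) {
  const chars = messages.reduce((n, m) => n + JSON.stringify(m.content).length, 0);
  return Math.ceil(chars / 4);
}

function pickRoute(config, { taskType, messages }) {
  const { routes, longContextThreshold } = config;
  if (routes.longContext && estimateTokens(messages) > longContextThreshold) {
    return parseRoute(routes.longContext);
  }
  if (taskType && routes[taskType]) return parseRoute(routes[taskType]);
  if (!routes.default) throw new Error('No default route configured');
  return parseRoute(routes.default);
}

// Placeholder config values
const config = {
  longContextThreshold: 60000,
  routes: {
    default: 'gemini,gemini-2.5-flash',
    think: 'deepseek,deepseek-reasoner',
    longContext: 'gemini,gemini-2.5-pro',
  },
};

console.log(pickRoute(config, { taskType: 'think', messages: [{ role: 'user', content: 'hi' }] }));
```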
### Transformers

Some providers need field adjustments:
- `deepseek`: strips `cache_control` and `repetition_penalty`
- `groq`: removes `top_k`
- `reasoning`: moves `reasoning_content` to `_reasoning`

Transformers are applied automatically during request building.

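One way to express these adjustments is a table of pure functions keyed by transformer name. The names mirror the list above, but the function shapes, the `applyTransformers` helper, and the placement of `cache_control` on message content blocks are assumptions for illustration. Each transformer returns a new object so the caller's request is never mutated.

```js
// Hypothetical transformer table mirroring the list above; not thebird's actual code.

// Shallow copy of obj without the given keys
function omit(obj, keys) {
  const out = { ...obj };
  for (const k of keys) delete out[k];
  return out;
}

// Assumption: cache_control appears on content blocks inside messages
function stripCacheControl(messages = []) {
  return messages.map((m) =>
    Array.isArray(m.content)
      ? { ...m, content: m.content.map((b) => omit(b, ['cache_control'])) }
      : m
  );
}

// Move reasoning_content to _reasoning on one message
function moveReasoning(m) {
  if (!('reasoning_content' in m)) return m;
  const { reasoning_content, ...rest } = m;
  return { ...rest, _reasoning: reasoning_content };
}

const transformers = {
  deepseek: (body) => ({
    ...omit(body, ['cache_control', 'repetition_penalty']),
    messages: stripCacheControl(body.messages),
  }),
  groq: (body) => omit(body, ['top_k']),
  reasoning: (body) => ({ ...body, messages: (body.messages ?? []).map(moveReasoning) }),
};

function applyTransformers(body, names) {
  return names.reduce((acc, name) => {
    const fn = transformers[name];
    if (!fn) throw new Error(`Unknown transformer: ${name}`);
    return fn(acc);
  }, body);
}

const body = {
  model: 'deepseek-chat',
  repetition_penalty: 1.1,
  messages: [{ role: 'user', content: [{ type: 'text', text: 'hi', cache_control: { type: 'ephemeral' } }] }],
};
console.log(JSON.stringify(applyTransformers(body, ['deepseek'])));
```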
## gembird: Image Generation via a Hybrid Approach

**gembird** generates 4-view product images (front, back, left side, right side) using a hybrid of Gemini's web UI and an HTTP API.

### Workflow

1. **Auth via Browser**: Playwright connects over CDP to Chrome on `localhost:9222` and navigates to gemini.google.com, where the user is already logged in.
2. **Session Capture**: Network requests are intercepted to extract the session `{ cookies, xsrf, fsid, template }`. This is a one-time capture, cached in `.gemini-session.json`.
3. **HTTP Generation**: For each view, the prompt is POSTed to the Cloud Code Assist API with the captured session, and the response is streamed.
4. **Parse Response**: Image URLs are extracted from the streaming response with a regex.
5. **Download**: Each PNG is downloaded from `lh3.googleusercontent.com` with the session cookies and saved to disk.

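Step 4 could look roughly like the sketch below. The regex and the assumption that URLs may arrive JSON-escaped in the stream (`\/`, `\u003d`, `\u0026`) are guesses about the response format, not gembird's actual pattern; `extractImageUrls` is a hypothetical name.

```js
// Sketch of step 4; the escape sequences handled here are assumptions about the stream.
function extractImageUrls(streamText) {
  // Undo common JSON escapes so URLs appear in plain form
  const unescaped = streamText
    .replace(/\\u003d/g, '=')
    .replace(/\\u0026/g, '&')
    .replace(/\\\//g, '/');
  // Stop at whitespace, quotes, backslashes, or a closing bracket
  const pattern = /https:\/\/lh3\.googleusercontent\.com\/[^\s"'\\\]]+/g;
  // Deduplicate while keeping first-seen order
  return [...new Set(unescaped.match(pattern) ?? [])];
}

const chunk = '[["rc",null,"https:\\/\\/lh3.googleusercontent.com\\/gg\\/abc123\\u003ds1024"],"https://lh3.googleusercontent.com/gg/abc123=s1024"]';
console.log(extractImageUrls(chunk));
```

Deduplicating matters because a streamed response can repeat the same URL across chunks, and each view should be downloaded once.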
### Why Hybrid?

The Gemini API free tier has zero quota for image generation. The browser provides free authentication, and the HTTP API (Cloud Code Assist) generates faster than DOM polling plus canvas extraction. The hybrid therefore combines free auth, fast generation, and no quota limits.

### Performance

- Browser connection: one-time, 30s
- HTTP generation per image: ~30-60s (Gemini's generation time, not polling overhead)
- Download per image: ~5s
- Total: 4 images in ~2-3 minutes (vs. ~8 minutes with browser polling + canvas extraction)

### CLI

```bash
node index.js "prompt"
node index.js --image ref.png "prompt"
node index.js --output ./dir "prompt"
```

Arguments are parsed in `index.js`, lines 88-115.

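The parsing code itself is not reproduced in these notes; a hypothetical equivalent for the three forms above is sketched here. It assumes `--image` and `--output` each take exactly one value and that the single remaining positional argument is the prompt. Defaults such as the output directory are left to the caller rather than hardcoded.

```js
// Hypothetical parser for the three CLI forms above; not gembird's actual code.
function parseArgs(argv) {
  const opts = { image: undefined, output: undefined, prompt: undefined };
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--image' || arg === '--output') {
      const value = argv[i + 1];
      if (value === undefined || value.startsWith('--')) {
        throw new Error(`${arg} requires a value`);
      }
      opts[arg.slice(2)] = value; // "image" or "output"
      i++; // skip the consumed value
    } else {
      positional.push(arg);
    }
  }
  if (positional.length !== 1) {
    throw new Error(`Expected exactly one prompt, got ${positional.length}`);
  }
  opts.prompt = positional[0];
  return opts;
}

// In a real entry point this would be parseArgs(process.argv.slice(2))
console.log(parseArgs(['--image', 'ref.png', 'a red sneaker']));
```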
### Observability

- Session cached in `.gemini-session.json` (expires after 1 hour)
- HTTP response streamed and parsed for image URLs
- Download errors logged with context
- Progress logged per view: `[1/4] front view...`

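The 1-hour expiry can be sketched as a pure freshness check. This assumes the cached object stores a `capturedAt` timestamp in epoch milliseconds next to the session fields (the field name is hypothetical), and it leaves out the actual reading and writing of `.gemini-session.json`. Injecting the TTL and the clock keeps the value configurable and makes expiry easy to exercise.

```js
// Sketch: decide whether a cached session can be reused. `capturedAt` is an assumed field.
function isSessionFresh(cached, ttlMs, now = Date.now()) {
  if (!cached || typeof cached.capturedAt !== 'number') return false;
  return now - cached.capturedAt < ttlMs;
}

// Stamp a freshly captured session before caching it
function withTimestamp(session, now = Date.now()) {
  return { ...session, capturedAt: now };
}

// Usage with an injected clock and the 1-hour TTL from the notes
const ttlMs = 60 * 60 * 1000;
const cached = withTimestamp({ cookies: '...', xsrf: '...', fsid: '...', template: '...' }, 0);
console.log(isSessionFresh(cached, ttlMs, 30 * 60 * 1000)); // true: 30 minutes old
console.log(isSessionFresh(cached, ttlMs, 2 * 60 * 60 * 1000)); // false: 2 hours old, recapture
```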
## Development Constraints

- Max 200 lines per file (split before hitting the limit)
- No comments
- No test files
- No hardcoded values
- Errors throw with context (no silent failures)
- Messages must stay Anthropic-compatible (other code depends on this contract)
- Tool schemas must translate cleanly to all providers

## Testing

There are no test files. Validation is done via the examples:
- `examples/basic-chat.js`: Single-turn Anthropic format → Gemini
- `examples/streaming.js`: Streaming events
- `examples/tool-use.js`: Tool calling and tool result handling
- `examples/vision.js`: Image blocks (base64, URL, inline)
- `examples/multi-turn.js`: Multi-turn chat with context

Run the examples against the real Gemini API to validate message translation.

## Known Issues & Workarounds

- Gemini API doesn't support `tool_choice: 'required'`; it is treated as `'auto'`
- Some models have different tool naming conventions; check the provider docs
- Streaming response parsing varies by provider; see `lib/providers/` for details
- OAuth tokens expire, so gembird uses the browser session instead of capturing tokens

## Files

- `lib/convert.js`: Message/tool translation logic
- `lib/client.js`: Provider client factory
- `lib/errors.js`: Error handling and retry logic
- `lib/providers/`: Provider-specific streaming implementations
- `index.js`: Main entry point, streaming and generation wrappers
- `index.d.ts`: TypeScript type definitions
- `examples/`: Working examples using Anthropic SDK format

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "thebird",
- "version": "1.2.4",
+ "version": "1.2.6",
  "description": "Anthropic SDK to Gemini streaming bridge — drop-in proxy that translates Anthropic message format and tool calls to Google Gemini",
  "main": "index.js",
  "types": "index.d.ts",