@office-ai/aioncli-core 0.24.0 → 0.24.2

@@ -0,0 +1,595 @@
1
+ # AWS Bedrock Integration Implementation Plan
2
+
3
+ ## Overview
4
+
5
+ Add AWS Bedrock support to aioncli using the unified Converse API, with priority
6
+ support for Anthropic Claude model series and multi-region deployment.
7
+
8
+ ## Requirements Confirmation
9
+
10
+ - **Region Support**: Multi-region (globally available)
11
+ - **Priority Models**: Anthropic Claude (3.5/3.7 Sonnet)
12
+ - **Implementation Scope**: Complete implementation (text generation, tool
13
+ calling, streaming responses, token counting, embedding)
14
+
15
+ ## Supported Model List
16
+
17
+ **Phase 1: Anthropic Claude model series only**
18
+
19
+ ### Cross-Region Models (Cross-Region Inference Profiles, Recommended)
20
+
21
+ These models use the `global.` prefix and can be called from any supported
21
+ source region, with requests routed across regions for availability and fault tolerance.
23
+
24
+ - `global.anthropic.claude-opus-4-5-20251101-v1:0` - Claude Opus 4.5 (most
25
+ powerful)
26
+ - `global.anthropic.claude-sonnet-4-5-20250929-v1:0` - Claude Sonnet 4.5
27
+ (recommended, default)
28
+ - `global.anthropic.claude-sonnet-4-20250514-v1:0` - Claude Sonnet 4
29
+ - `global.anthropic.claude-haiku-4-5-20251001-v1:0` - Claude Haiku 4.5 (fastest)
30
+
31
+ ### Regional Models
32
+
33
+ These models are only available in specific regions and provide backward
34
+ compatibility with older Claude versions.
35
+
36
+ - `anthropic.claude-3-5-sonnet-20241022-v2:0` - Claude 3.5 Sonnet v2
37
+ - `anthropic.claude-3-5-sonnet-20240620-v1:0` - Claude 3.5 Sonnet v1
38
+ - `anthropic.claude-3-opus-20240229-v1:0` - Claude 3 Opus
39
+ - `anthropic.claude-3-haiku-20240307-v1:0` - Claude 3 Haiku
40
+
41
+ **Future Phases**: Extend support to Amazon Titan, Meta Llama, and Mistral
42
+ models as needed.
43
+
44
+ ## Core Architecture Design
45
+
46
+ ### 1. ContentGenerator Implementation
47
+
48
+ Create `BedrockContentGenerator` class implementing the `ContentGenerator`
49
+ interface:
50
+
51
+ ```typescript
52
+ // packages/core/src/core/bedrockContentGenerator.ts
53
+ export class BedrockContentGenerator implements ContentGenerator {
54
+ private client: BedrockRuntimeClient;
55
+ private model: string;
56
+ private region: string;
57
+
58
+ async generateContent(
59
+ request,
60
+ userPromptId,
61
+ ): Promise<GenerateContentResponse>;
62
+ async generateContentStream(
63
+ request,
64
+ userPromptId,
65
+ ): AsyncGenerator<GenerateContentResponse>;
66
+ async countTokens(request): Promise<CountTokensResponse>;
67
+ async embedContent(request): Promise<EmbedContentResponse>;
68
+ }
69
+ ```
70
+
71
+ ### 2. API Format Conversion
72
+
73
+ #### Gemini → Bedrock Converse Request Format
74
+
75
+ **Message Conversion**:
76
+
77
+ - Gemini:
78
+ `{role: 'user'|'model', parts: [{text}|{functionCall}|{functionResponse}]}`
79
+ - Bedrock:
80
+ `{role: 'user'|'assistant', content: [{text}|{toolUse}|{toolResult}]}`
81
+
82
+ **Tool Definition Conversion**:
83
+
84
+ - Gemini: `functionDeclarations` with `parameters` (JSON Schema)
85
+ - Bedrock: `toolSpec` with `inputSchema.json` (JSON Schema)
86
+
87
+ **System Instruction**:
88
+
89
+ - Gemini: `systemInstruction` field
90
+ - Bedrock: `system` array `[{text: '...'}]`
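The request mappings above can be sketched as a few pure functions. This is a minimal illustration, not the package's actual implementation: the types are simplified local stand-ins for the Gemini and Bedrock shapes, and the helper names (`toBedrockMessages`, `toBedrockTools`, `toBedrockSystem`) and the `call_<name>` ID fallback are assumptions made for the example.

```typescript
// Simplified local stand-ins for the shapes listed above (not the real SDK types).
type GeminiPart =
  | { text: string }
  | { functionCall: { id?: string; name: string; args: Record<string, unknown> } }
  | { functionResponse: { id?: string; name: string; response: Record<string, unknown> } };

interface GeminiContent {
  role: 'user' | 'model';
  parts: GeminiPart[];
}

type BedrockBlock =
  | { text: string }
  | { toolUse: { toolUseId: string; name: string; input: Record<string, unknown> } }
  | { toolResult: { toolUseId: string; content: Array<{ json: Record<string, unknown> }> } };

interface BedrockMessage {
  role: 'user' | 'assistant';
  content: BedrockBlock[];
}

interface GeminiFunctionDeclaration {
  name: string;
  description?: string;
  parameters?: Record<string, unknown>;
}

// Convert one Gemini part into the matching Bedrock content block.
function toBedrockBlock(part: GeminiPart): BedrockBlock {
  if ('text' in part) return { text: part.text };
  if ('functionCall' in part) {
    const { id, name, args } = part.functionCall;
    // Hypothetical fallback: synthesize an ID when none was supplied.
    return { toolUse: { toolUseId: id ?? `call_${name}`, name, input: args } };
  }
  const { id, name, response } = part.functionResponse;
  return { toolResult: { toolUseId: id ?? `call_${name}`, content: [{ json: response }] } };
}

// Map Gemini roles ('model') to Bedrock roles ('assistant') and convert every part.
function toBedrockMessages(contents: GeminiContent[]): BedrockMessage[] {
  return contents.map((c): BedrockMessage => ({
    role: c.role === 'model' ? 'assistant' : 'user',
    content: c.parts.map(toBedrockBlock),
  }));
}

// Wrap each function declaration in a toolSpec with inputSchema.json.
function toBedrockTools(decls: GeminiFunctionDeclaration[]) {
  return decls.map((d) => ({
    toolSpec: {
      name: d.name,
      description: d.description,
      // Assumption: default to an empty object schema when no parameters are declared.
      inputSchema: { json: d.parameters ?? { type: 'object', properties: {} } },
    },
  }));
}

// The system instruction becomes a one-element array of text blocks.
function toBedrockSystem(systemInstruction?: string): Array<{ text: string }> {
  return systemInstruction ? [{ text: systemInstruction }] : [];
}
```

Keeping these as pure functions makes them easy to cover in the format conversion unit tests without mocking the AWS client.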
91
+
92
+ #### Bedrock Converse → Gemini Response Format
93
+
94
+ **Text Content**:
95
+
96
+ - Bedrock: `{content: [{text: '...'}]}`
97
+ - Gemini: `{parts: [{text: '...'}]}`
98
+
99
+ **Tool Calls**:
100
+
101
+ - Bedrock: `{content: [{toolUse: {toolUseId, name, input}}]}`
102
+ - Gemini: `{parts: [{functionCall: {id, name, args}}]}`
103
+
104
+ **Finish Reason Mapping**:
105
+
106
+ - `end_turn` → `STOP`
107
+ - `max_tokens` → `MAX_TOKENS`
108
+ - `stop_sequence` → `STOP`
109
+ - `tool_use` → `STOP`
110
+ - `content_filtered` → `SAFETY`
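The response direction can be sketched the same way. The example below is an illustration under assumptions: the types are simplified stand-ins, the function names are hypothetical, and falling back to `OTHER` for unlisted stop reasons is a choice made for this sketch rather than something the plan specifies.

```typescript
// Simplified stand-ins for the response shapes listed above.
type BedrockBlock =
  | { text: string }
  | { toolUse: { toolUseId: string; name: string; input: Record<string, unknown> } };

type GeminiPart =
  | { text: string }
  | { functionCall: { id: string; name: string; args: Record<string, unknown> } };

type GeminiFinishReason = 'STOP' | 'MAX_TOKENS' | 'SAFETY' | 'OTHER';

// The finish reason table from the plan, as a lookup map.
const FINISH_REASONS = new Map<string, GeminiFinishReason>([
  ['end_turn', 'STOP'],
  ['max_tokens', 'MAX_TOKENS'],
  ['stop_sequence', 'STOP'],
  ['tool_use', 'STOP'],
  ['content_filtered', 'SAFETY'],
]);

// Assumption: stop reasons not in the table map to 'OTHER'.
function mapStopReason(stopReason: string | undefined): GeminiFinishReason {
  const mapped = stopReason !== undefined ? FINISH_REASONS.get(stopReason) : undefined;
  return mapped ?? 'OTHER';
}

// Convert Bedrock content blocks into Gemini parts, carrying toolUseId over as the call ID.
function toGeminiParts(content: BedrockBlock[]): GeminiPart[] {
  return content.map((block): GeminiPart =>
    'text' in block
      ? { text: block.text }
      : {
          functionCall: {
            id: block.toolUse.toolUseId,
            name: block.toolUse.name,
            args: block.toolUse.input,
          },
        },
  );
}
```

Because `tool_use` maps to `STOP`, callers would detect a pending tool call from the presence of a `functionCall` part rather than from the finish reason.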
111
+
112
+ ### 3. Streaming Response Handling
113
+
114
+ Use `ConverseStreamCommand` to process event streams:
115
+
116
+ ```typescript
117
+ async *streamGenerator(stream) {
118
+ const toolCalls = new Map(); // Accumulate tool calls
119
+
120
+ for await (const event of stream) {
121
+ if (event.contentBlockStart) {
122
+ // Start new content block
123
+ }
124
+ if (event.contentBlockDelta) {
125
+ // Accumulate text/tool input
126
+ if (event.contentBlockDelta.delta?.text) {
127
+ yield convertTextDelta(event);
128
+ }
129
+ if (event.contentBlockDelta.delta?.toolUse) {
130
+ accumulateToolCall(event);
131
+ }
132
+ }
133
+ if (event.contentBlockStop) {
134
+ // Complete tool call, emit full result
135
+ yield finalizeToolCall(event);
136
+ }
137
+ if (event.metadata) {
138
+ // Token usage information
139
+ yield convertUsageMetadata(event);
140
+ }
141
+ }
142
+ }
143
+ ```
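The accumulation step deserves a closer look, because tool input arrives as JSON string fragments that are only parseable once the block stops. The sketch below assumes simplified event types and uses a synchronous generator over an in-memory list so it can run without the AWS SDK; the real stream is asynchronous, and `collectParts` is a hypothetical name.

```typescript
// Simplified stand-ins for the ConverseStream events used in the loop above.
type StreamEvent =
  | { contentBlockStart: { contentBlockIndex: number; start: { toolUse?: { toolUseId: string; name: string } } } }
  | { contentBlockDelta: { contentBlockIndex: number; delta: { text?: string; toolUse?: { input: string } } } }
  | { contentBlockStop: { contentBlockIndex: number } };

interface FunctionCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

type Emitted = { text: string } | { functionCall: FunctionCall };

function* collectParts(events: Iterable<StreamEvent>): Generator<Emitted> {
  // Tool calls in progress, keyed by content block index.
  const pending = new Map<number, { id: string; name: string; json: string }>();

  for (const event of events) {
    if ('contentBlockStart' in event) {
      const { contentBlockIndex, start } = event.contentBlockStart;
      if (start.toolUse) {
        pending.set(contentBlockIndex, { id: start.toolUse.toolUseId, name: start.toolUse.name, json: '' });
      }
    } else if ('contentBlockDelta' in event) {
      const { contentBlockIndex, delta } = event.contentBlockDelta;
      // Text can be emitted immediately.
      if (delta.text) yield { text: delta.text };
      // Tool input is buffered until the block is complete.
      const call = pending.get(contentBlockIndex);
      if (call && delta.toolUse) call.json += delta.toolUse.input;
    } else {
      const index = event.contentBlockStop.contentBlockIndex;
      const call = pending.get(index);
      // Only tool blocks produce a result on stop; text blocks were already emitted.
      if (call) {
        pending.delete(index);
        const args = call.json ? (JSON.parse(call.json) as Record<string, unknown>) : {};
        yield { functionCall: { id: call.id, name: call.name, args } };
      }
    }
  }
}
```

Keying the buffer by `contentBlockIndex` keeps interleaved blocks separate, and checking the map on `contentBlockStop` avoids emitting an empty tool call when a plain text block ends.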
144
+
145
+ ### 4. Tool Call Handling
146
+
147
+ **Multi-turn Conversation Support**:
148
+
149
+ 1. User message → Model returns toolUse
150
+ 2. Convert to Gemini functionCall → CLI executes tool
151
+ 3. Tool result converts to toolResult → Send back to Bedrock
152
+ 4. Bedrock returns final response
153
+
154
+ **ID Matching**:
155
+
156
+ - Bedrock `toolUseId` ↔ Gemini `functionCall.id`
157
+ - Ensure tool call and response IDs are consistent
158
+
159
+ ## Key File Modifications
160
+
161
+ ### New Files
162
+
163
+ 1. **`/packages/core/src/core/bedrockContentGenerator.ts`** (~1800 lines)
164
+ - BedrockContentGenerator class implementation
165
+ - Format conversion methods
166
+ - Streaming response handling
167
+ - Error handling and retry logic
168
+
169
+ 2. **`/packages/core/src/core/bedrockContentGenerator.test.ts`** (~600 lines)
170
+ - Mock AWS SDK client
171
+ - Format conversion tests
172
+ - Tool calling tests
173
+ - Streaming response tests
174
+
175
+ ### Modified Files
176
+
177
+ 3. **`/packages/core/src/core/contentGenerator.ts`**
178
+ - Add `AuthType.USE_BEDROCK = 'bedrock'`
179
+ - Update `ContentGeneratorConfig` type to add `awsRegion?: string`
180
+ - Add Bedrock routing in `createContentGenerator()` factory function
181
+
182
+ 4. **`/packages/core/src/config/config.ts`**
183
+ - Add AWS environment variable detection in `createContentGeneratorConfig()`
184
+ - Read `AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
185
+
186
+ 5. **`/packages/core/src/config/models.ts`**
187
+ - Add Bedrock model constants and validation
188
+ - Define
189
+ `DEFAULT_BEDROCK_MODEL = 'global.anthropic.claude-sonnet-4-5-20250929-v1:0'`
190
+
191
+ 6. **`/packages/core/package.json`**
192
+ - Add dependencies:
193
+ - `"@aws-sdk/client-bedrock-runtime": "^3.700.0"`
194
+ - `"@aws-sdk/credential-providers": "^3.700.0"` (for AWS Profile
195
+ authentication)
196
+
197
+ ## Implementation Details
198
+
199
+ ### AWS Authentication
200
+
201
+ **Simplest Approach: Fully rely on AWS SDK default credential chain**
202
+
203
+ ```typescript
204
+ import { BedrockRuntimeClient } from '@aws-sdk/client-bedrock-runtime';
205
+
206
+ // SDK automatically finds credentials, in priority order:
207
+ // 1. Environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN
208
+ // 2. AWS_PROFILE specified profile (from ~/.aws/credentials)
209
+ // 3. Default [default] profile (from ~/.aws/credentials)
210
+ const client = new BedrockRuntimeClient({
211
+ region: process.env.AWS_REGION || 'us-east-1',
212
+ });
213
+ ```
214
+
215
+ **User Configuration Examples**:
216
+
217
+ **Method 1: Environment Variables (suitable for temporary use or CI/CD)**
218
+
219
+ ```bash
220
+ export AWS_REGION="us-east-1"
221
+ export AWS_ACCESS_KEY_ID="AKIA..."
222
+ export AWS_SECRET_ACCESS_KEY="..."
223
+ npm run start
224
+ ```
225
+
226
+ **Method 2: AWS Profile (recommended, supports multi-account switching)**
227
+
228
+ ```bash
229
+ # ~/.aws/credentials file content:
230
+ [default]
231
+ aws_access_key_id = AKIA...
232
+ aws_secret_access_key = ...
233
+
234
+ [enterprise-ai]
235
+ aws_access_key_id = AKIA...
236
+ aws_secret_access_key = ...
237
+
238
+ # Use default profile
239
+ export AWS_REGION="us-east-1"
240
+ npm run start
241
+
242
+ # Use enterprise-ai profile
243
+ export AWS_REGION="ap-southeast-1"
244
+ export AWS_PROFILE="enterprise-ai"
245
+ npm run start
246
+ ```
247
+
248
+ **Advantages**:
249
+
250
+ - Minimal code (~5 lines)
251
+ - AWS SDK automatically handles all authentication logic
252
+ - Supports all AWS standard authentication methods (env vars, profiles, IAM
253
+ roles, etc.)
254
+ - Users don't need to learn new configuration methods
255
+
256
+ ### Token Counting Implementation
257
+
258
+ **Background**:
259
+
260
+ - Bedrock responses include precise `usage.inputTokens/outputTokens/totalTokens`
261
+ - `countTokens` method is mainly used for media content estimation and error
262
+ logging
263
+ - It doesn't need to be particularly precise; a simple estimate is sufficient
264
+
265
+ **Implementation**:
266
+
267
+ ```typescript
268
+ async countTokens(request: CountTokensParameters): Promise<CountTokensResponse> {
269
+ // Extract all text content
270
+ const text = request.contents
271
+ .flatMap(c => c.parts ?? []) // parts may be undefined on a content entry
272
+ .filter(p => 'text' in p)
273
+ .map(p => p.text)
274
+ .join('');
275
+
276
+ // Simple estimation: 1 token ≈ 4 characters (suitable for English and code)
277
+ // Actual tokens for Claude models will differ slightly, but sufficient for estimation
278
+ const totalTokens = Math.ceil(text.length / 4);
279
+
280
+ return { totalTokens };
281
+ }
282
+ ```
283
+
284
+ **Note**: This method is for estimation only; actual token usage is based on the
285
+ usage in API responses.
286
+
287
+ ### Embedding Support
288
+
289
+ **Phase 1: Not Implemented**:
290
+
291
+ - Claude models don't provide embedding capability
292
+ - `embedContent` method is not actually used in the CLI
293
+ - Simply return "not supported" error
294
+
295
+ ```typescript
296
+ async embedContent(request: EmbedContentParameters): Promise<EmbedContentResponse> {
297
+ throw new Error(
298
+ 'Embedding is not supported for Claude models on Bedrock. ' +
299
+ 'Consider using Amazon Titan Embed models in future versions.'
300
+ );
301
+ }
302
+ ```
303
+
304
+ **Future Extension**: If embedding support is needed, Amazon Titan Embed models
305
+ can be added (using InvokeModel API).
306
+
307
+ ### Error Handling
308
+
309
+ **Throttling Retry** (ThrottlingException):
310
+
311
+ ```typescript
312
+ private async sendWithRetry(command, maxRetries = 3) {
313
+ for (let attempt = 0; attempt < maxRetries; attempt++) {
314
+ try {
315
+ return await this.client.send(command);
316
+ } catch (error) {
317
+ if (error.name === 'ThrottlingException' && attempt < maxRetries - 1) {
318
+ await sleep(Math.pow(2, attempt) * 1000); // Exponential backoff
319
+ continue;
320
+ }
321
+ if (error.name === 'ValidationException') {
322
+ throw new Error(`Bedrock validation error: ${error.message}`);
323
+ }
324
+ throw error;
325
+ }
326
+ }
327
+ }
328
+ ```
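The delay schedule used by the retry loop can be factored into small pure functions, which makes it easy to cap and to test. The sketch below is an assumption-laden variation rather than the plan's code: the 30-second cap and the optional "full jitter" randomization are additions not present in the original loop, and both function names are hypothetical.

```typescript
// Exponential delay matching the loop above (base * 2^attempt), with an added cap.
// The 30-second cap is an assumption of this sketch.
function exponentialDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Optional "full jitter": pick a random delay in [0, delayMs) so that many
// clients throttled at the same moment do not all retry at the same moment.
function withFullJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}
```

Inside `sendWithRetry`, the sleep call could then become `await sleep(withFullJitter(exponentialDelayMs(attempt)))`, assuming a `sleep` helper like the one the loop already relies on.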
329
+
330
+ ### JSON Mode Support
331
+
332
+ Bedrock doesn't support native JSON mode; use tool calling to simulate:
333
+
334
+ ```typescript
335
+ if (request.config?.responseJsonSchema) {
336
+ const jsonTool = {
337
+ toolSpec: {
338
+ name: 'respond_in_schema',
339
+ description: 'Response in JSON schema',
340
+ inputSchema: { json: request.config.responseJsonSchema },
341
+ },
342
+ };
343
+
344
+ // Force use of this tool
345
+ const command = new ConverseCommand({
346
+ modelId: this.model,
347
+ messages,
348
+ toolConfig: {
349
+ tools: [jsonTool],
350
+ toolChoice: { tool: { name: 'respond_in_schema' } },
351
+ },
352
+ });
353
+ }
354
+ ```
355
+
356
+ ## Configuration Examples
357
+
358
+ ### Environment Variable Configuration
359
+
360
+ ```bash
361
+ # Use AWS access keys
362
+ export AWS_REGION="us-east-1"
363
+ export AWS_ACCESS_KEY_ID="AKIA..."
364
+ export AWS_SECRET_ACCESS_KEY="..."
365
+
366
+ # Or use AWS Profile
367
+ export AWS_REGION="ap-southeast-1"
368
+ export AWS_PROFILE="my-profile"
369
+
370
+ # Start CLI
371
+ npm run start
372
+ ```
373
+
374
+ ### Model Selection
375
+
376
+ ```bash
377
+ # Use default cross-region Claude Sonnet 4.5 model
378
+ npm run start
379
+
380
+ # Specify specific cross-region model
381
+ npm run start -- --model global.anthropic.claude-opus-4-5-20251101-v1:0
382
+
383
+ # Use regional model
384
+ npm run start -- --model anthropic.claude-3-5-sonnet-20241022-v2:0
385
+
386
+ # Use Titan (planned for a future phase; Phase 1 supports Claude models only)
387
+ npm run start -- --model amazon.titan-text-premier-v1:0
388
+ ```
389
+
390
+ ### View Available Models
391
+
392
+ ```bash
393
+ # List all Anthropic models in current region
394
+ aws bedrock list-foundation-models \
395
+ --region "$AWS_REGION" \
396
+ --by-provider Anthropic
397
+ ```
398
+
399
+ ## Testing Strategy
400
+
401
+ ### Unit Tests (vitest)
402
+
403
+ Mock AWS SDK client:
404
+
405
+ ```typescript
406
+ vi.mock('@aws-sdk/client-bedrock-runtime', () => ({
407
+ BedrockRuntimeClient: vi.fn(),
408
+ ConverseCommand: vi.fn(),
409
+ ConverseStreamCommand: vi.fn(),
410
+ }));
411
+
412
+ const mockClient = {
413
+ send: vi.fn(),
414
+ };
415
+
416
+ BedrockRuntimeClient.mockImplementation(() => mockClient);
417
+ ```
418
+
419
+ Test Coverage:
420
+
421
+ - ✅ Request format conversion (Gemini → Bedrock)
422
+ - ✅ Response format conversion (Bedrock → Gemini)
423
+ - ✅ Tool definition conversion
424
+ - ✅ Tool call/response conversion
425
+ - ✅ Streaming response accumulation
426
+ - ✅ Finish reason mapping
427
+ - ✅ Error handling (throttling, validation errors)
428
+ - ✅ Token counting estimation
429
+ - ✅ Embedding calls (verify that the "not supported" error is thrown)
430
+
431
+ ### Integration Tests
432
+
433
+ Requires real AWS credentials:
434
+
435
+ ```bash
436
+ # Set test credentials
437
+ export AWS_REGION="us-east-1"
438
+ export AWS_ACCESS_KEY_ID="..."
439
+ export AWS_SECRET_ACCESS_KEY="..."
440
+
441
+ # Run integration tests
442
+ npm run test:integration:bedrock
443
+ ```
444
+
445
+ Test Scenarios:
446
+
447
+ - Single-turn conversations
448
+ - Multi-turn conversations
449
+ - Tool calling (read files, execute commands)
450
+ - Streaming responses
451
+ - Different Claude models in Phase 1 (other families such as Titan and Llama once supported)
452
+
453
+ ## Validation Plan
454
+
455
+ ### End-to-End Testing
456
+
457
+ 1. **Basic Conversation**:
458
+
459
+ ```bash
460
+ npm run start
461
+ > Hello, please introduce yourself
462
+ # Verify: Normal response returned
463
+ ```
464
+
465
+ 2. **Tool Calling**:
466
+
467
+ ```bash
468
+ > Read the README.md file in the current directory
469
+ # Verify: Calls ReadFileTool, returns file content
470
+ ```
471
+
472
+ 3. **Multi-turn Conversation**:
473
+
474
+ ```bash
475
+ > Create a file named test.txt with content "Hello Bedrock"
476
+ # Verify: Calls WriteFileTool, confirms creation success
477
+ > Now read this file
478
+ # Verify: Calls ReadFileTool, returns correct content
479
+ ```
480
+
481
+ 4. **Streaming Response**:
482
+
483
+ ```bash
484
+ > Write a poem about cloud computing
485
+ # Verify: Text appears incrementally as chunks arrive, smooth experience
486
+ ```
487
+
488
+ 5. **Cross-Region Testing**:
489
+
490
+ ```bash
491
+ # Test Asia-Pacific region
492
+ export AWS_REGION="ap-southeast-1"
493
+ npm run start
494
+
495
+ # Test European region
496
+ export AWS_REGION="eu-west-1"
497
+ npm run start
498
+ ```
499
+
500
+ ### Performance Validation
501
+
502
+ - Response latency < 2 seconds (non-streaming)
503
+ - Streaming first byte latency < 500ms
504
+ - Token counting error < 10%
505
+ - Throttling retry success rate > 95%
506
+
507
+ ## Potential Challenges and Solutions
508
+
509
+ ### 1. Stricter Bedrock Throttling
510
+
511
+ **Issue**: Claude models on Bedrock can have strict default rate limits (e.g., 10 req/min; actual quotas vary by model, region, and account)
512
+
513
+ **Solutions**:
514
+
515
+ - Implement exponential backoff retry
516
+ - Provide friendly error messages suggesting users request quota increases
517
+ - Support multi-region failover (if multiple regions configured)
518
+
519
+ ### 2. Significant Tool Call Format Differences
520
+
521
+ **Issue**: Bedrock's toolUse/toolResult format differs significantly from Gemini
522
+
523
+ **Solutions**:
524
+
525
+ - Reference OpenAIContentGenerator's tool conversion logic
526
+ - Establish complete ID mapping mechanism
527
+ - Add detailed logging for debugging
528
+
529
+ ### 3. Different Schema Requirements for Different Models
530
+
531
+ **Issue**: Claude and Llama have different levels of JSON Schema support
532
+
533
+ **Solutions**:
534
+
535
+ - Implement schema sanitization function to remove unsupported fields
536
+ - Perform compatibility testing for each model family
537
+ - Document limitations of each model in documentation
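A schema sanitization function along these lines could be a small recursive filter. The sketch below is illustrative only: the deny-list contents are placeholders (the keywords a given model actually rejects would have to come from the compatibility testing mentioned above), and `sanitizeSchema` is a hypothetical name.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Placeholder deny-list: which keywords are really unsupported is an assumption here.
const UNSUPPORTED_KEYS: ReadonlySet<string> = new Set(['$schema', 'default', 'examples']);

function sanitizeSchema(
  schema: JsonValue,
  unsupported: ReadonlySet<string> = UNSUPPORTED_KEYS,
): JsonValue {
  if (Array.isArray(schema)) {
    return schema.map((item) => sanitizeSchema(item, unsupported));
  }
  if (schema === null || typeof schema !== 'object') {
    return schema;
  }

  const out: { [key: string]: JsonValue } = {};
  for (const [key, value] of Object.entries(schema)) {
    if (key === 'properties' && value !== null && typeof value === 'object' && !Array.isArray(value)) {
      // Property names are user data, not schema keywords: keep every name
      // and sanitize only the sub-schema attached to it.
      const props: { [key: string]: JsonValue } = {};
      for (const [name, sub] of Object.entries(value)) {
        props[name] = sanitizeSchema(sub, unsupported);
      }
      out[key] = props;
      continue;
    }
    if (unsupported.has(key)) continue; // drop the unsupported keyword
    out[key] = sanitizeSchema(value, unsupported);
  }
  return out;
}
```

Passing the deny-list as a parameter would let each model family supply its own set, which fits the per-family compatibility testing listed above.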
538
+
539
+ ### 4. Embedding Only Supports Titan
540
+
541
+ **Issue**: Claude and Llama models don't provide embeddings
542
+
543
+ **Solutions**:
544
+
545
+ - Detect model type in `embedContent()`
546
+ - Return clear error message if not a Titan embedding model
547
+ - Suggest users switch to `amazon.titan-embed-text-v2:0`
548
+
549
+ ## Implementation Priority
550
+
551
+ ### Phase 1 (Core Functionality)
552
+
553
+ - ✅ BedrockContentGenerator basic structure
554
+ - ✅ Text generation (generateContent)
555
+ - ✅ Streaming responses (generateContentStream)
556
+ - ✅ Basic error handling
557
+ - ✅ Claude model support
558
+
559
+ ### Phase 2 (Tool Support)
560
+
561
+ - ✅ Tool definition conversion
562
+ - ✅ Tool calling and response handling
563
+ - ✅ Multi-turn conversation support
564
+ - ✅ Complete unit tests
565
+
566
+ ### Phase 3 (Enhanced Features)
567
+
568
+ - ✅ Token counting implementation (simple estimation)
569
+ - ✅ Throttling retry optimization
570
+ - ✅ Single region configuration (via AWS_REGION environment variable)
571
+
572
+ ### Phase 4 (Production Ready)
573
+
574
+ - ⏳ Integration tests (basic tests complete, can be extended)
575
+ - ✅ Documentation improvements (authentication docs added)
576
+ - ✅ Performance optimization (retry logic integrated)
577
+ - ✅ User-friendly error messages (enhanceError provides friendly error messages)
578
+
579
+ ## Estimated Effort
580
+
581
+ **Phase 1 (Claude support only)**:
582
+
583
+ - Core implementation: ~1500 lines of code (BedrockContentGenerator)
584
+ - Test code: ~500 lines of code (unit tests + format conversion tests)
585
+ - Configuration changes: ~100 lines of code (AuthType, factory, models config)
586
+ - Documentation updates: ~300 lines of documentation
587
+
588
+ Total: Approximately 2400 lines (about 2100 lines of code plus 300 lines of documentation)
589
+
590
+ **Simplifications**:
591
+
592
+ - ❌ No support for Titan/Llama/Mistral models
593
+ - ❌ No Embedding functionality
594
+ - ✅ Token counting uses simple estimation
595
+ - ✅ Only implement environment variable and AWS Profile authentication