ak-gemini 2.0.1 → 2.0.3

package/GUIDE.md ADDED
@@ -0,0 +1,1314 @@
1
+ # ak-gemini — Integration Guide
2
+
3
+ > A practical guide for rapidly adding AI capabilities to any Node.js codebase using `ak-gemini`.
4
+ > Covers every class, common patterns, best practices, and observability hooks.
5
+
6
+ ```sh
7
+ npm install ak-gemini
8
+ ```
9
+
10
+ **Requirements**: Node.js 18+, a `GEMINI_API_KEY` env var (or Vertex AI credentials).
11
+
12
+ ---
13
+
14
+ ## Table of Contents
15
+
16
+ 1. [Core Concepts](#core-concepts)
17
+ 2. [Authentication](#authentication)
18
+ 3. [Class Selection Guide](#class-selection-guide)
19
+ 4. [Message — Stateless AI Calls](#message--stateless-ai-calls)
20
+ 5. [Chat — Multi-Turn Conversations](#chat--multi-turn-conversations)
21
+ 6. [Transformer — Structured JSON Transformation](#transformer--structured-json-transformation)
22
+ 7. [ToolAgent — Agent with Custom Tools](#toolagent--agent-with-custom-tools)
23
+ 8. [CodeAgent — Agent That Writes and Runs Code](#codeagent--agent-that-writes-and-runs-code)
24
+ 9. [RagAgent — Document & Data Q&A](#ragagent--document--data-qa)
25
+ 10. [Embedding — Vector Embeddings](#embedding--vector-embeddings)
26
+ 11. [Google Search Grounding](#google-search-grounding)
27
+ 12. [Context Caching](#context-caching)
28
+ 13. [Observability & Usage Tracking](#observability--usage-tracking)
29
+ 14. [Thinking Configuration](#thinking-configuration)
30
+ 15. [Error Handling & Retries](#error-handling--retries)
31
+ 16. [Performance Tips](#performance-tips)
32
+ 17. [Common Integration Patterns](#common-integration-patterns)
33
+ 18. [Quick Reference](#quick-reference)
34
+
35
+ ---
36
+
37
+ ## Core Concepts
38
+
39
+ Every class in ak-gemini extends `BaseGemini`, which handles:
40
+
41
+ - **Authentication** — Gemini API key or Vertex AI service account
42
+ - **Chat sessions** — Managed conversation state with the model
43
+ - **Token tracking** — Input/output token counts after every call
44
+ - **Cost estimation** — Dollar estimates before sending
45
+ - **Few-shot seeding** — Inject example pairs to guide the model
46
+ - **Thinking config** — Control the model's internal reasoning budget
47
+ - **Safety settings** — Harassment and dangerous content filters (relaxed by default)
48
+
49
+ ```javascript
50
+ import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent } from 'ak-gemini';
51
+ // or
52
+ import AI from 'ak-gemini';
53
+ const t = new AI.Transformer({ ... });
54
+ ```
55
+
56
+ The default model is `gemini-2.5-flash`. Override with `modelName`:
57
+
58
+ ```javascript
59
+ new Chat({ modelName: 'gemini-2.5-pro' });
60
+ ```
61
+
62
+ ---
63
+
64
+ ## Authentication
65
+
66
+ ### Gemini API (default)
67
+
68
+ ```javascript
69
+ // Option 1: Environment variable (recommended)
70
+ // Set GEMINI_API_KEY in your .env or shell
71
+ new Chat();
72
+
73
+ // Option 2: Explicit key
74
+ new Chat({ apiKey: 'your-key' });
75
+ ```
76
+
77
+ ### Vertex AI
78
+
79
+ ```javascript
80
+ new Chat({
81
+ vertexai: true,
82
+ project: 'my-gcp-project', // or GOOGLE_CLOUD_PROJECT env var
83
+ location: 'us-central1', // or GOOGLE_CLOUD_LOCATION env var
84
+ labels: { app: 'myapp', env: 'prod' } // billing labels (Vertex AI only)
85
+ });
86
+ ```
87
+
88
+ Vertex AI uses Application Default Credentials. Run `gcloud auth application-default login` locally, or use a service account in production.
89
+
90
+ ---
91
+
92
+ ## Class Selection Guide
93
+
94
+ | I want to... | Use | Method |
95
+ |---|---|---|
96
+ | Get a one-off AI response (no history) | `Message` | `send()` |
97
+ | Have a back-and-forth conversation | `Chat` | `send()` |
98
+ | Transform JSON with examples + validation | `Transformer` | `send()` |
99
+ | Give the AI tools to call (APIs, DB, etc.) | `ToolAgent` | `chat()` / `stream()` |
100
+ | Let the AI write and run JavaScript | `CodeAgent` | `chat()` / `stream()` |
101
+ | Q&A over documents, files, or data | `RagAgent` | `chat()` / `stream()` |
102
+ | Generate vector embeddings | `Embedding` | `embed()` / `embedBatch()` |
103
+
104
+ **Rule of thumb**: Start with `Message` for the simplest integration. Move to `Chat` if you need history. Use `Transformer` when you need structured JSON output with validation. Use agents when the AI needs to take action.
105
+
106
+ ---
107
+
108
+ ## Message — Stateless AI Calls
109
+
110
+ The simplest class. Each `send()` call is independent — no conversation history is maintained. Ideal for classification, extraction, summarization, and any fire-and-forget AI call.
111
+
112
+ ```javascript
113
+ import { Message } from 'ak-gemini';
114
+
115
+ const msg = new Message({
116
+ systemPrompt: 'You are a sentiment classifier. Respond with: positive, negative, or neutral.'
117
+ });
118
+
119
+ const result = await msg.send('I love this product!');
120
+ console.log(result.text); // "positive"
121
+ console.log(result.usage); // { promptTokens, responseTokens, totalTokens, ... }
122
+ ```
123
+
124
+ ### Structured Output (JSON)
125
+
126
+ Force the model to return valid JSON matching a schema:
127
+
128
+ ```javascript
129
+ const extractor = new Message({
130
+ systemPrompt: 'Extract structured data from the input text.',
131
+ responseMimeType: 'application/json',
132
+ responseSchema: {
133
+ type: 'object',
134
+ properties: {
135
+ people: { type: 'array', items: { type: 'string' } },
136
+ places: { type: 'array', items: { type: 'string' } },
137
+ sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] }
138
+ },
139
+ required: ['people', 'places', 'sentiment']
140
+ }
141
+ });
142
+
143
+ const result = await extractor.send('Alice and Bob visited Paris. They had a wonderful time.');
144
+ console.log(result.data);
145
+ // { people: ['Alice', 'Bob'], places: ['Paris'], sentiment: 'positive' }
146
+ ```
147
+
148
+ Key difference from `Chat`: `result.data` contains the parsed JSON object, while `result.text` contains the raw JSON string.
149
+
150
+ ### When to Use Message
151
+
152
+ - Classification, tagging, or labeling
153
+ - Entity extraction
154
+ - Summarization
155
+ - Any call where previous context doesn't matter
156
+ - High-throughput pipelines where you process items independently
157
+
158
+ ---
159
+
160
+ ## Chat — Multi-Turn Conversations
161
+
162
+ Maintains conversation history across calls. The model remembers what was said earlier.
163
+
164
+ ```javascript
165
+ import { Chat } from 'ak-gemini';
166
+
167
+ const chat = new Chat({
168
+ systemPrompt: 'You are a helpful coding assistant.'
169
+ });
170
+
171
+ const r1 = await chat.send('What is a closure in JavaScript?');
172
+ console.log(r1.text);
173
+
174
+ const r2 = await chat.send('Can you give me an example?');
175
+ // The model remembers the closure topic from r1
176
+ console.log(r2.text);
177
+ ```
178
+
179
+ ### History Management
180
+
181
+ ```javascript
182
+ // Get conversation history
183
+ const history = chat.getHistory();
184
+
185
+ // Clear and start fresh (preserves system prompt)
186
+ await chat.clearHistory();
187
+ ```
188
+
189
+ ### When to Use Chat
190
+
191
+ - Interactive assistants and chatbots
192
+ - Multi-step reasoning where later questions depend on earlier answers
193
+ - Tutoring or coaching interactions
194
+ - Any scenario where context carries across messages
195
+
196
+ ---
197
+
198
+ ## Transformer — Structured JSON Transformation
199
+
200
+ The power tool for data pipelines. Show it examples of input → output mappings, then send new inputs. Includes validation, retry, and AI-powered error correction.
201
+
202
+ ```javascript
203
+ import { Transformer } from 'ak-gemini';
204
+
205
+ const t = new Transformer({
206
+ systemPrompt: 'Transform user profiles into marketing segments.',
207
+ sourceKey: 'INPUT', // key for input data in examples
208
+ targetKey: 'OUTPUT', // key for output data in examples
209
+ maxRetries: 3, // retry on validation failure
210
+ retryDelay: 1000, // ms between retries
211
+ });
212
+
213
+ // Seed with examples
214
+ await t.seed([
215
+ {
216
+ INPUT: { age: 25, spending: 'high', interests: ['tech', 'gaming'] },
217
+ OUTPUT: { segment: 'young-affluent-tech', confidence: 0.9, tags: ['early-adopter'] }
218
+ },
219
+ {
220
+ INPUT: { age: 55, spending: 'medium', interests: ['gardening', 'cooking'] },
221
+ OUTPUT: { segment: 'mature-lifestyle', confidence: 0.85, tags: ['home-focused'] }
222
+ }
223
+ ]);
224
+
225
+ // Transform new data
226
+ const result = await t.send({ age: 30, spending: 'low', interests: ['books', 'hiking'] });
227
+ // result → { segment: '...', confidence: ..., tags: [...] }
228
+ ```
229
+
230
+ ### Validation
231
+
232
+ Pass an async validator as the third argument to `send()`. If it throws, the Transformer retries with the error message fed back to the model:
233
+
234
+ ```javascript
235
+ const result = await t.send(
236
+ { age: 30, spending: 'low' },
237
+ {}, // options
238
+ async (output) => {
239
+ if (!output.segment) throw new Error('Missing segment field');
240
+ if (output.confidence < 0 || output.confidence > 1) {
241
+ throw new Error('Confidence must be between 0 and 1');
242
+ }
243
+ return output; // return the validated (or modified) output
244
+ }
245
+ );
246
+ ```
247
+
248
+ Or set a global validator in the constructor:
249
+
250
+ ```javascript
251
+ const t = new Transformer({
252
+ asyncValidator: async (output) => {
253
+ if (!output.id) throw new Error('Missing id');
254
+ return output;
255
+ }
256
+ });
257
+ ```
258
+
259
+ ### Self-Healing with `rebuild()`
260
+
261
+ When downstream code fails, feed the error back to the AI:
262
+
263
+ ```javascript
264
+ try {
265
+ await processPayload(result);
266
+ } catch (err) {
267
+ const fixed = await t.rebuild(result, err.message);
268
+ await processPayload(fixed); // try again with AI-corrected payload
269
+ }
270
+ ```
271
+
272
+ ### Loading Examples from a File
273
+
274
+ ```javascript
275
+ const t = new Transformer({
276
+ examplesFile: './training-data.json'
277
+ // JSON array of { INPUT: ..., OUTPUT: ... } objects
278
+ });
279
+ await t.seed(); // loads from file automatically
280
+ ```
281
+
282
+ ### Stateless Sends
283
+
284
+ Send without affecting the conversation history (useful for parallel processing):
285
+
286
+ ```javascript
287
+ const result = await t.send(payload, { stateless: true });
288
+ ```
289
+
290
+ ### When to Use Transformer
291
+
292
+ - ETL pipelines — transform data between formats
293
+ - API response normalization
294
+ - Content enrichment (add tags, categories, scores)
295
+ - Any structured data transformation where you can provide examples
296
+ - Batch processing with validation guarantees
297
+
298
+ ---
299
+
300
+ ## ToolAgent — Agent with Custom Tools
301
+
302
+ Give the model tools (functions) it can call. You define what tools exist and how to execute them. The agent handles the conversation loop — sending messages, receiving tool calls, executing them, feeding results back, until the model produces a final text answer.
303
+
304
+ ```javascript
305
+ import { ToolAgent } from 'ak-gemini';
306
+
307
+ const agent = new ToolAgent({
308
+ systemPrompt: 'You are a database assistant.',
309
+ tools: [
310
+ {
311
+ name: 'query_db',
312
+ description: 'Execute a read-only SQL query against the users database',
313
+ parametersJsonSchema: {
314
+ type: 'object',
315
+ properties: {
316
+ sql: { type: 'string', description: 'The SQL query to execute' }
317
+ },
318
+ required: ['sql']
319
+ }
320
+ },
321
+ {
322
+ name: 'send_email',
323
+ description: 'Send an email notification',
324
+ parametersJsonSchema: {
325
+ type: 'object',
326
+ properties: {
327
+ to: { type: 'string' },
328
+ subject: { type: 'string' },
329
+ body: { type: 'string' }
330
+ },
331
+ required: ['to', 'subject', 'body']
332
+ }
333
+ }
334
+ ],
335
+ toolExecutor: async (toolName, args) => {
336
+ switch (toolName) {
337
+ case 'query_db':
338
+ return await db.query(args.sql);
339
+ case 'send_email':
340
+ await mailer.send(args);
341
+ return { sent: true };
342
+ }
343
+ },
344
+ maxToolRounds: 10 // safety limit on tool-use loop iterations
345
+ });
346
+
347
+ const result = await agent.chat('How many users signed up this week? Email the count to admin@co.com');
348
+ console.log(result.text); // "There were 47 new signups this week. I've sent the email."
349
+ console.log(result.toolCalls); // [{ name: 'query_db', args: {...}, result: [...] }, { name: 'send_email', ... }]
350
+ ```
351
+
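The loop described above — send message, receive tool call, execute, feed the result back, repeat until a text answer — can be sketched in isolation. This is a conceptual illustration with a stubbed model, not the library's actual internals; `runAgentLoop`, `stubModel`, and the message shapes here are invented for the sketch:

```javascript
// Conceptual sketch of the agent loop. The `model` function stands in for
// the real Gemini API call; names and shapes are illustrative only.
async function runAgentLoop(model, toolExecutor, userMessage, maxToolRounds = 10) {
  const messages = [{ role: 'user', text: userMessage }];
  const toolCalls = [];

  for (let round = 0; round < maxToolRounds; round++) {
    const reply = await model(messages); // { text } or { functionCall: { name, args } }
    if (!reply.functionCall) {
      return { text: reply.text, toolCalls }; // final text answer — loop ends
    }
    const { name, args } = reply.functionCall;
    const result = await toolExecutor(name, args); // run the tool locally
    toolCalls.push({ name, args, result });
    messages.push({ role: 'tool', name, result }); // feed result back to the model
  }
  return { text: '(max tool rounds reached)', toolCalls };
}

// Stubbed model: requests one tool call, then produces a final answer.
const stubModel = async (messages) =>
  messages.some((m) => m.role === 'tool')
    ? { text: 'There are 2 users.' }
    : { functionCall: { name: 'count_users', args: {} } };

const result = await runAgentLoop(stubModel, async () => 2, 'How many users?');
console.log(result.text);      // "There are 2 users."
console.log(result.toolCalls); // [{ name: 'count_users', args: {}, result: 2 }]
```

The `maxToolRounds` guard is why the option exists: without it, a model that keeps requesting tools would loop forever.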
352
+ ### Streaming
353
+
354
+ Stream the agent's output in real time — useful for showing progress in a UI:
355
+
356
+ ```javascript
357
+ for await (const event of agent.stream('Find the top 5 users by spend')) {
358
+ switch (event.type) {
359
+ case 'text': process.stdout.write(event.text); break;
360
+ case 'tool_call': console.log(`\nCalling ${event.toolName}...`); break;
361
+ case 'tool_result': console.log(`Result:`, event.result); break;
362
+ case 'done': console.log('\nUsage:', event.usage); break;
363
+ }
364
+ }
365
+ ```
366
+
367
+ ### Execution Gating
368
+
369
+ Control which tool calls are allowed at runtime:
370
+
371
+ ```javascript
372
+ const agent = new ToolAgent({
373
+ tools: [...],
374
+ toolExecutor: myExecutor,
375
+ onBeforeExecution: async (toolName, args) => {
376
+ if (toolName === 'delete_user') {
377
+ console.log('Blocked dangerous tool call');
378
+ return false; // deny execution
379
+ }
380
+ return true; // allow
381
+ },
382
+ onToolCall: (toolName, args) => {
383
+ // Notification callback — fires on every tool call (logging, metrics, etc.)
384
+ metrics.increment(`tool_call.${toolName}`);
385
+ }
386
+ });
387
+ ```
388
+
389
+ ### Stopping an Agent
390
+
391
+ Cancel mid-execution from a callback or externally:
392
+
393
+ ```javascript
394
+ // From a callback
395
+ onBeforeExecution: async (toolName, args) => {
396
+ if (shouldStop) {
397
+ agent.stop(); // stop after this round
398
+ return false;
399
+ }
400
+ return true;
401
+ }
402
+
403
+ // Externally (e.g., user cancel button, timeout)
404
+ setTimeout(() => agent.stop(), 60_000);
405
+ const result = await agent.chat('Do some work');
406
+ // result includes warning: "Agent was stopped"
407
+ ```
408
+
409
+ ### When to Use ToolAgent
410
+
411
+ - AI that needs to call APIs, query databases, or interact with external systems
412
+ - Workflow automation — the AI orchestrates a sequence of operations
413
+ - Research assistants that fetch and synthesize data from multiple sources
414
+ - Any scenario where you want the model to decide *which* tools to use and *when*
415
+
416
+ ---
417
+
418
+ ## CodeAgent — Agent That Writes and Runs Code
419
+
420
+ Instead of calling tools one by one, the model writes complete JavaScript scripts and executes them in a child process. This is powerful for tasks that require complex logic, file manipulation, or multi-step computation.
421
+
422
+ ```javascript
423
+ import { CodeAgent } from 'ak-gemini';
424
+
425
+ const agent = new CodeAgent({
426
+ workingDirectory: '/path/to/project',
427
+ importantFiles: ['package.json', 'src/config.js'], // injected into system prompt
428
+ timeout: 30_000, // per-execution timeout
429
+ maxRounds: 10, // max code execution cycles
430
+ keepArtifacts: true, // keep script files on disk after execution
431
+ });
432
+
433
+ const result = await agent.chat('Find all files larger than 1MB and list them sorted by size');
434
+ console.log(result.text); // Agent's summary
435
+ console.log(result.codeExecutions); // [{ code, output, stderr, exitCode, purpose }]
436
+ ```
437
+
438
+ ### How It Works
439
+
440
+ 1. On `init()`, the agent scans the working directory and gathers codebase context (file tree, package.json, key files, importantFiles)
441
+ 2. This context is injected into the system prompt so the model understands the project
442
+ 3. The model writes JavaScript using an internal `execute_code` tool
443
+ 4. Code is saved to a `.mjs` file and run in a Node.js child process that inherits `process.env`
444
+ 5. stdout/stderr feeds back to the model
445
+ 6. The model decides if more work is needed (up to `maxRounds` cycles)
446
+
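Step 4 of the cycle above can be sketched in isolation. The following is a minimal illustration of the write-and-run mechanism — save generated code to a `.mjs` file, run it in a Node.js child process that inherits `process.env`, capture stdout/stderr — not the library's actual internals:

```javascript
import { writeFile, rm } from 'node:fs/promises';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const execFileAsync = promisify(execFile);

// A script the model might have produced.
const generatedCode = `console.log(JSON.stringify({ cwd: process.cwd() }));`;

// Save it as a .mjs file and run it in a child process.
const scriptPath = join(tmpdir(), 'agent-example.mjs');
await writeFile(scriptPath, generatedCode, 'utf-8');
const { stdout, stderr } = await execFileAsync('node', [scriptPath], {
  env: process.env, // child inherits the parent's environment
  timeout: 30_000,  // mirrors the per-execution timeout option
});
await rm(scriptPath); // delete the artifact (keepArtifacts: false behavior)

console.log(JSON.parse(stdout).cwd); // the child's working directory
```

Because the child inherits the environment, generated scripts can use the same API keys and config as the parent process — which is also why the `onBeforeExecution` gate matters.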
447
+ ### Streaming
448
+
449
+ ```javascript
450
+ for await (const event of agent.stream('Refactor the auth module to use async/await')) {
451
+ switch (event.type) {
452
+ case 'text': process.stdout.write(event.text); break;
453
+ case 'code': console.log('\n--- Executing code ---'); break;
454
+ case 'output': console.log(event.stdout); break;
455
+ case 'done': console.log('\nDone!', event.usage); break;
456
+ }
457
+ }
458
+ ```
459
+
460
+ ### Execution Gating & Notifications
461
+
462
+ ```javascript
463
+ const agent = new CodeAgent({
464
+ workingDirectory: '.',
465
+ onBeforeExecution: async (code) => {
466
+ // Review code before it runs
467
+ if (code.includes('rm -rf')) return false; // deny
468
+ return true;
469
+ },
470
+ onCodeExecution: (code, output) => {
471
+ // Log every execution for audit
472
+ logger.info({ code: code.slice(0, 200), exitCode: output.exitCode });
473
+ }
474
+ });
475
+ ```
476
+
477
+ ### Retrieving Scripts
478
+
479
+ Get all scripts the agent wrote across all interactions:
480
+
481
+ ```javascript
482
+ const scripts = agent.dump();
483
+ // [{ fileName: 'agent-read-config.mjs', purpose: 'read-config', script: '...', filePath: '/path/...' }]
484
+ ```
485
+
486
+ ### When to Use CodeAgent
487
+
488
+ - File system operations — reading, writing, transforming files
489
+ - Data analysis — processing CSV, JSON, or log files
490
+ - Codebase exploration — finding patterns, counting occurrences, generating reports
491
+ - Prototyping — quickly testing ideas by having the AI write and run code
492
+ - Any task where the AI needs more flexibility than predefined tools provide
493
+
494
+ ---
495
+
496
+ ## RagAgent — Document & Data Q&A
497
+
498
+ Load documents and data into the model's context for grounded Q&A. Supports three input types that can be used together:
499
+
500
+ | Input Type | Option | What It Does |
501
+ |---|---|---|
502
+ | **Remote files** | `remoteFiles` | Uploaded via Google Files API — for PDFs, images, audio, video |
503
+ | **Local files** | `localFiles` | Read from disk as UTF-8 text — for md, json, csv, yaml, txt |
504
+ | **Local data** | `localData` | In-memory objects serialized as JSON |
505
+
506
+ ```javascript
507
+ import { RagAgent } from 'ak-gemini';
508
+
509
+ const agent = new RagAgent({
510
+ // Text files read directly from disk (fast, no upload)
511
+ localFiles: ['./docs/api-reference.md', './docs/architecture.md'],
512
+
513
+ // In-memory data
514
+ localData: [
515
+ { name: 'users', data: await db.query('SELECT * FROM users LIMIT 100') },
516
+ { name: 'config', data: JSON.parse(await fs.readFile('./config.json', 'utf-8')) },
517
+ ],
518
+
519
+ // Binary/media files uploaded via Files API
520
+ remoteFiles: ['./diagrams/architecture.png', './reports/q4.pdf'],
521
+ });
522
+
523
+ const result = await agent.chat('What authentication method does the API use?');
524
+ console.log(result.text); // Grounded answer citing the api-reference.md
525
+ ```
526
+
527
+ ### Dynamic Context
528
+
529
+ Add more context after initialization (each call triggers a reinitialization):
530
+
531
+ ```javascript
532
+ await agent.addLocalFiles(['./new-doc.md']);
533
+ await agent.addLocalData([{ name: 'metrics', data: { uptime: 99.9 } }]);
534
+ await agent.addRemoteFiles(['./new-chart.png']);
535
+ ```
536
+
537
+ ### Inspecting Context
538
+
539
+ ```javascript
540
+ const ctx = agent.getContext();
541
+ // {
542
+ // remoteFiles: [{ name, displayName, mimeType, sizeBytes, uri, originalPath }],
543
+ // localFiles: [{ name, path, size }],
544
+ // localData: [{ name, type }]
545
+ // }
546
+ ```
547
+
548
+ ### Streaming
549
+
550
+ ```javascript
551
+ for await (const event of agent.stream('Summarize the architecture document')) {
552
+ if (event.type === 'text') process.stdout.write(event.text);
553
+ if (event.type === 'done') console.log('\nUsage:', event.usage);
554
+ }
555
+ ```
556
+
557
+ ### When to Use RagAgent
558
+
559
+ - Documentation Q&A — let users ask questions about your docs
560
+ - Data exploration — load database results or CSV exports and ask questions
561
+ - Code review — load source files and ask about patterns, bugs, or architecture
562
+ - Report analysis — load PDF reports and extract insights
563
+ - Any scenario where the AI needs to answer questions grounded in specific data
564
+
565
+ ### Choosing Input Types
566
+
567
+ | Data | Use |
568
+ |---|---|
569
+ | Plain text files (md, txt, json, csv, yaml) | `localFiles` — fastest, no API upload |
570
+ | In-memory objects, DB results, API responses | `localData` — serialized as JSON |
571
+ | PDFs, images, audio, video | `remoteFiles` — uploaded via Files API |
572
+
573
+ Prefer `localFiles` and `localData` when possible — they skip the upload step and initialize faster.
574
+
575
+ ---
576
+
577
+ ## Embedding — Vector Embeddings
578
+
579
+ Generate vector embeddings for similarity search, clustering, classification, and deduplication. The `Embedding` class uses Google's text embedding models and provides a simple API for single and batch operations.
580
+
581
+ ```javascript
582
+ import { Embedding } from 'ak-gemini';
583
+
584
+ const embedder = new Embedding({
585
+ modelName: 'gemini-embedding-001', // default
586
+ });
587
+ ```
588
+
589
+ ### Basic Embedding
590
+
591
+ ```javascript
592
+ const result = await embedder.embed('The quick brown fox jumps over the lazy dog');
593
+ console.log(result.values); // [0.012, -0.034, 0.056, ...] — 768 dimensions by default
594
+ console.log(result.values.length); // 768
595
+ ```
596
+
597
+ ### Batch Embedding
598
+
599
+ Embed multiple texts in a single API call for efficiency:
600
+
601
+ ```javascript
602
+ const texts = [
603
+ 'Machine learning fundamentals',
604
+ 'Deep neural networks',
605
+ 'How to bake sourdough bread',
606
+ ];
607
+
608
+ const results = await embedder.embedBatch(texts);
609
+ // results[0].values, results[1].values, results[2].values
610
+ ```
611
+
612
+ ### Task Types
613
+
614
+ Task types optimize embeddings for specific use cases:
615
+
616
+ ```javascript
617
+ // For documents being indexed
618
+ const docEmbedder = new Embedding({
619
+ taskType: 'RETRIEVAL_DOCUMENT',
620
+ title: 'API Reference' // title only applies to RETRIEVAL_DOCUMENT
621
+ });
622
+
623
+ // For search queries against those documents
624
+ const queryEmbedder = new Embedding({
625
+ taskType: 'RETRIEVAL_QUERY'
626
+ });
627
+
628
+ // Other task types
629
+ new Embedding({ taskType: 'SEMANTIC_SIMILARITY' });
630
+ new Embedding({ taskType: 'CLUSTERING' });
631
+ new Embedding({ taskType: 'CLASSIFICATION' });
632
+ ```
633
+
634
+ **Best practice**: Use `RETRIEVAL_DOCUMENT` when embedding content to store, and `RETRIEVAL_QUERY` when embedding the user's search query.
635
+
636
+ ### Output Dimensionality
637
+
638
+ Reduce embedding dimensions to save storage space (trade-off with accuracy):
639
+
640
+ ```javascript
641
+ // Constructor-level
642
+ const embedder = new Embedding({ outputDimensionality: 256 });
643
+
644
+ // Per-call override
645
+ const result = await embedder.embed('Hello', { outputDimensionality: 128 });
646
+ console.log(result.values.length); // 128
647
+ ```
648
+
649
+ Supported by `gemini-embedding-001` (not `text-embedding-001`).
650
+
651
+ ### Cosine Similarity
652
+
653
+ Compare two embeddings without an API call:
654
+
655
+ ```javascript
656
+ const [a, b] = await Promise.all([
657
+ embedder.embed('cats are great pets'),
658
+ embedder.embed('dogs are wonderful companions'),
659
+ ]);
660
+
661
+ const score = embedder.similarity(a.values, b.values);
662
+ // score ≈ 0.85 (semantically similar)
663
+ ```
664
+
665
+ Returns a value between -1 (opposite) and 1 (identical). Typical thresholds:
666
+ - `> 0.8` — very similar
667
+ - `0.5–0.8` — somewhat related
668
+ - `< 0.5` — different topics
669
+
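The score is standard cosine similarity — the dot product of the two vectors divided by the product of their magnitudes. If you need to compute it yourself over stored vectors (e.g., inside a database query layer), a minimal reference implementation:

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns a value in [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vector dimensions must match');
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0]));  // 1 — identical direction
console.log(cosineSimilarity([1, 0], [0, 1]));  // 0 — orthogonal
console.log(cosineSimilarity([1, 0], [-1, 0])); // -1 — opposite
```

Note that cosine similarity compares direction only, ignoring magnitude — which is why it is the standard metric for comparing embeddings.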
670
+ ### Integration Pattern: Semantic Search
671
+
672
+ ```javascript
673
+ // Index phase
674
+ const documents = ['doc1 text...', 'doc2 text...', 'doc3 text...'];
675
+ const docEmbedder = new Embedding({ taskType: 'RETRIEVAL_DOCUMENT' });
676
+ const docVectors = await docEmbedder.embedBatch(documents);
677
+
678
+ // Search phase
679
+ const queryEmbedder = new Embedding({ taskType: 'RETRIEVAL_QUERY' });
680
+ const queryVector = await queryEmbedder.embed('how do I authenticate?');
681
+
682
+ // Find best match
683
+ const scores = docVectors.map((doc, i) => ({
684
+ index: i,
685
+ score: queryEmbedder.similarity(queryVector.values, doc.values)
686
+ }));
687
+ scores.sort((a, b) => b.score - a.score);
688
+ console.log('Best match:', documents[scores[0].index]);
689
+ ```
690
+
691
+ ### When to Use Embedding
692
+
693
+ - Semantic search — find documents similar to a query
694
+ - Deduplication — detect near-duplicate content
695
+ - Clustering — group similar items together
696
+ - Classification — compare against known category embeddings
697
+ - Recommendation — find items similar to user preferences
698
+
699
+ ---
700
+
701
+ ## Google Search Grounding
702
+
703
+ Ground model responses in real-time Google Search results. Available on **all classes** via `enableGrounding` — not just Transformer.
704
+
705
+ **Warning**: Google Search grounding costs approximately **$35 per 1,000 queries**. Use selectively.
706
+
707
+ ### Basic Usage
708
+
709
+ ```javascript
710
+ import { Chat } from 'ak-gemini';
711
+
712
+ const chat = new Chat({
713
+ enableGrounding: true
714
+ });
715
+
716
+ const result = await chat.send('What happened in tech news today?');
717
+ console.log(result.text); // Response grounded in current search results
718
+ ```
719
+
720
+ ### Grounding Metadata
721
+
722
+ When grounding is enabled, `getLastUsage()` includes source attribution:
723
+
724
+ ```javascript
725
+ const usage = chat.getLastUsage();
726
+
727
+ if (usage.groundingMetadata) {
728
+ // Search queries the model executed
729
+ console.log('Queries:', usage.groundingMetadata.webSearchQueries);
730
+
731
+ // Source citations
732
+ for (const chunk of usage.groundingMetadata.groundingChunks || []) {
733
+ if (chunk.web) {
734
+ console.log(`Source: ${chunk.web.title} — ${chunk.web.uri}`);
735
+ }
736
+ }
737
+ }
738
+ ```
739
+
740
+ ### Grounding Configuration
741
+
742
+ ```javascript
743
+ const chat = new Chat({
744
+ enableGrounding: true,
745
+ groundingConfig: {
746
+ // Exclude specific domains
747
+ excludeDomains: ['reddit.com', 'twitter.com'],
748
+
749
+ // Filter by time range (Gemini API only)
750
+ timeRangeFilter: {
751
+ startTime: '2025-01-01T00:00:00Z',
752
+ endTime: '2025-12-31T23:59:59Z'
753
+ }
754
+ }
755
+ });
756
+ ```
757
+
758
+ ### Grounding with ToolAgent
759
+
760
+ Grounding works alongside user-defined tools — both are merged into the tools array automatically:
761
+
762
+ ```javascript
763
+ const agent = new ToolAgent({
764
+ enableGrounding: true,
765
+ tools: [
766
+ { name: 'save_result', description: 'Save a research result', parametersJsonSchema: { type: 'object', properties: { title: { type: 'string' }, summary: { type: 'string' } }, required: ['title', 'summary'] } }
767
+ ],
768
+ toolExecutor: async (name, args) => {
769
+ if (name === 'save_result') return await db.insert(args);
770
+ }
771
+ });
772
+
773
+ // The agent can search the web AND call your tools
774
+ const result = await agent.chat('Research the latest AI safety developments and save the key findings');
775
+ ```
776
+
777
+ ### Per-Message Grounding Toggle (Transformer)
778
+
779
+ Transformer supports toggling grounding per-message without rebuilding the instance:
780
+
781
+ ```javascript
782
+ const t = new Transformer({ enableGrounding: false });
783
+
784
+ // Enable grounding for just this call
785
+ const result = await t.send(payload, { enableGrounding: true });
786
+
787
+ // Back to no grounding for subsequent calls
788
+ ```
789
+
790
+ ### When to Use Grounding
791
+
792
+ - Questions about current events, recent news, or real-time data
793
+ - Fact-checking or verification tasks
794
+ - Research assistants that need up-to-date information
795
+ - Any scenario where the model's training data cutoff is a limitation
796
+
797
+ ---
798
+
799
+ ## Context Caching
800
+
801
+ Cache system prompts, documents, or tool definitions to reduce costs when making many API calls with the same large context. Cached tokens are billed at a reduced rate.
802
+
803
+ ### When Context Caching Helps
804
+
805
+ - **Large system prompts** reused across many calls
806
+ - **RagAgent** with the same document set serving many queries
807
+ - **ToolAgent** with many tool definitions
808
+ - Any scenario with high token count in repeated context
809
+
810
+ ### Create and Use a Cache
811
+
812
+ ```javascript
813
+ import { Chat } from 'ak-gemini';
814
+
815
+ const chat = new Chat({
816
+ systemPrompt: veryLongSystemPrompt // e.g., 10,000+ tokens
817
+ });
818
+
819
+ // Create a cache (auto-uses this instance's model and systemPrompt)
820
+ const cache = await chat.createCache({
821
+ ttl: '3600s', // 1 hour
822
+ displayName: 'my-app-system-prompt'
823
+ });
824
+
825
+ console.log(cache.name); // Server-generated resource name
826
+ console.log(cache.expireTime); // When it expires
827
+
828
+ // Attach the cache to this instance
829
+ await chat.useCache(cache.name);
830
+
831
+ // All subsequent calls use cached tokens at reduced cost
832
+ const r1 = await chat.send('Hello');
833
+ const r2 = await chat.send('Tell me more');
834
+ ```
835
+
836
+ ### Cache Management
837
+
838
+ ```javascript
839
+ // List all caches
840
+ const caches = await chat.listCaches();
841
+
842
+ // Get cache details
843
+ const info = await chat.getCache(cache.name);
844
+ console.log(info.usageMetadata?.totalTokenCount);
845
+
846
+ // Extend TTL
847
+ await chat.updateCache(cache.name, { ttl: '7200s' });
848
+
849
+ // Delete when done
850
+ await chat.deleteCache(cache.name);
851
+ ```
852
+
853
+ ### Cache with Constructor
854
+
855
+ If you already have a cache name, pass it directly:
856
+
857
+ ```javascript
858
+ const chat = new Chat({
859
+ cachedContent: 'projects/my-project/locations/us-central1/cachedContents/abc123'
860
+ });
861
+ ```
862
+
863
+ ### What Can Be Cached
864
+
865
+ The `createCache()` config accepts:
866
+
867
+ | Field | Description |
868
+ |---|---|
869
+ | `systemInstruction` | System prompt (auto-populated from instance if not provided) |
870
+ | `contents` | Content messages to cache |
871
+ | `tools` | Tool declarations to cache |
872
+ | `toolConfig` | Tool configuration to cache |
873
+ | `ttl` | Time-to-live (e.g., `'3600s'`) |
874
+ | `displayName` | Human-readable label |
875
+
876
+ ### Cost Savings
877
+
878
+ Context caching reduces input token costs for cached content. The exact savings depend on the model — check [Google's pricing page](https://ai.google.dev/pricing) for current rates. The trade-off is the cache storage cost and the minimum cache size requirement.
879
+
880
+ **Rule of thumb**: Caching pays off when you make many calls with the same large context (system prompt + documents) within the cache TTL.
881
+
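The rule of thumb above can be made concrete with back-of-envelope arithmetic. Every number below is hypothetical — check the pricing page for real rates — but the structure of the calculation holds: caching pays off once the per-call savings on cached input tokens exceed the cache's storage cost over its TTL.

```javascript
// All prices hypothetical — for illustration only, not real Gemini pricing.
const contextTokens = 50_000;    // cached system prompt + documents
const inputPricePerM = 0.15;     // $/1M input tokens (uncached)
const cachedPricePerM = 0.0375;  // $/1M input tokens (cached; assumes 75% off)
const storagePerMPerHour = 1.0;  // $/1M tokens per hour of cache storage
const ttlHours = 1;

const savingsPerCall = (contextTokens / 1e6) * (inputPricePerM - cachedPricePerM);
const storageCost = (contextTokens / 1e6) * storagePerMPerHour * ttlHours;
const breakEvenCalls = Math.ceil(storageCost / savingsPerCall);

console.log(breakEvenCalls); // 9 — calls needed within the TTL to break even
```

Under these assumed numbers, fewer than nine calls per hour against the same 50K-token context means caching costs more than it saves; above that, savings scale linearly with call volume.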
882
+ ---
883
+
884
+ ## Observability & Usage Tracking
885
+
886
+ Every class provides consistent observability hooks.
887
+
888
+ ### Token Usage
889
+
890
+ After every API call, get detailed token counts:
891
+
892
+ ```javascript
893
+ const usage = instance.getLastUsage();
894
+ // {
895
+ // promptTokens: 1250, // input tokens (cumulative across retries)
896
+ // responseTokens: 340, // output tokens (cumulative across retries)
897
+ // totalTokens: 1590, // total (cumulative)
898
+ // attempts: 1, // 1 = first try, 2+ = retries needed
899
+ // modelVersion: 'gemini-2.5-flash-001', // actual model that responded
900
+ // requestedModel: 'gemini-2.5-flash', // model you requested
901
+ // timestamp: 1710000000000
902
+ // }
903
+ ```
904
+
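A small helper turns that usage record into an after-the-fact dollar figure. The per-million-token rates here are hypothetical placeholders, not published pricing:

```javascript
// Approximate dollar cost of the last call from its usage record.
// Rates are hypothetical placeholders, expressed per 1M tokens.
function usageToCost(usage, rates = { input: 0.15, output: 0.60 }) {
  return (usage.promptTokens / 1e6) * rates.input
    + (usage.responseTokens / 1e6) * rates.output;
}

// const dollars = usageToCost(instance.getLastUsage());
```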
905
+ ### Cost Estimation
906
+
907
+ Estimate cost *before* sending:
908
+
909
+ ```javascript
910
+ const estimate = await instance.estimateCost('What is the meaning of life?');
911
+ // {
912
+ // inputTokens: 8,
913
+ // model: 'gemini-2.5-flash',
914
+ // pricing: { input: 0.15, output: 0.60 }, // per million tokens
915
+ // estimatedInputCost: 0.0000012,
916
+ // note: 'Output cost depends on response length'
917
+ // }
918
+ ```
919
+
920
+ Or just get the token count:
921
+
922
+ ```javascript
923
+ const { inputTokens } = await instance.estimate('some payload');
924
+ ```
925
+
926
+ ### Logging
927
+
928
+ All classes use [pino](https://github.com/pinojs/pino) for structured logging. Control the level:
929
+
930
+ ```javascript
931
+ // Per-instance
932
+ new Chat({ logLevel: 'debug' });
933
+
934
+ // Via environment
935
+ //   LOG_LEVEL=debug node app.js
936
+
937
+ // Via NODE_ENV (dev → debug, test → warn, prod → info)
938
+ ```
939
+
940
+ ### Agent Callbacks
941
+
942
+ ToolAgent and CodeAgent provide execution callbacks for building audit trails, metrics, and approval flows:
943
+
944
+ ```javascript
945
+ // ToolAgent
946
+ new ToolAgent({
947
+ onToolCall: (toolName, args) => {
948
+ // Fires on every tool call — use for logging, metrics
949
+ logger.info({ event: 'tool_call', tool: toolName, args });
950
+ },
951
+ onBeforeExecution: async (toolName, args) => {
952
+ // Fires before execution — return false to deny
953
+ // Use for approval flows, safety checks, rate limiting
954
+ return !blocklist.includes(toolName);
955
+ }
956
+ });
957
+
958
+ // CodeAgent
959
+ new CodeAgent({
960
+ onCodeExecution: (code, output) => {
961
+ // Fires after every code execution
962
+ logger.info({ event: 'code_exec', exitCode: output.exitCode, lines: code.split('\n').length });
963
+ },
964
+ onBeforeExecution: async (code) => {
965
+ // Review code before execution
966
+ if (code.includes('process.exit')) return false;
967
+ return true;
968
+ }
969
+ });
970
+ ```
971
+
972
+ ### Billing Labels (Vertex AI)
973
+
974
+ Tag API calls for cost attribution:
975
+
976
+ ```javascript
977
+ // Constructor-level (applies to all calls)
978
+ new Transformer({
979
+ vertexai: true,
980
+ project: 'my-project',
981
+ labels: { app: 'etl-pipeline', env: 'prod', team: 'data' }
982
+ });
983
+
984
+ // Per-message override
985
+ await transformer.send(payload, { labels: { job_id: 'abc123' } });
986
+ ```
987
+
988
+ ---
989
+
990
+ ## Thinking Configuration
991
+
992
+ Models like `gemini-2.5-flash` and `gemini-2.5-pro` support thinking — internal reasoning before answering. Control the budget:
993
+
994
+ ```javascript
995
+ // Disable thinking (default — fastest, cheapest)
996
+ new Chat({ thinkingConfig: { thinkingBudget: 0 } });
997
+
998
+ // Automatic thinking budget (model decides)
999
+ new Chat({ thinkingConfig: { thinkingBudget: -1 } });
1000
+
1001
+ // Fixed budget (in tokens)
1002
+ new Chat({ thinkingConfig: { thinkingBudget: 2048 } });
1003
+
1004
+ // Use ThinkingLevel enum
1005
+ import { ThinkingLevel } from 'ak-gemini';
1006
+ new Chat({ thinkingConfig: { thinkingLevel: ThinkingLevel.LOW } });
1007
+ ```
1008
+
1009
+ **When to enable thinking**: Complex reasoning, math, multi-step logic, code generation. **When to disable**: Simple classification, extraction, or chat where speed matters.
1010
+
1011
+ ---
1012
+
1013
+ ## Error Handling & Retries
1014
+
1015
+ ### Transformer Retries
1016
+
1017
+ The Transformer has built-in retry with exponential backoff when validation fails:
1018
+
1019
+ ```javascript
1020
+ const t = new Transformer({
1021
+ maxRetries: 3, // default: 3
1022
+ retryDelay: 1000 // default: 1000ms, doubles each retry
1023
+ });
1024
+ ```
1025
+
1026
+ Each retry feeds the validation error back to the model, giving it a chance to self-correct. The `usage` object reports cumulative tokens across all attempts:
1027
+
1028
+ ```javascript
1029
+ const result = await t.send(payload, {}, validator);
1030
+ const usage = t.getLastUsage();
1031
+ console.log(usage.attempts); // 2 = needed one retry
1032
+ ```
1033
+
1034
+ ### Rate Limiting (429 Errors)
1035
+
1036
+ The Gemini API returns 429 when rate limited. ak-gemini does not auto-retry 429s — handle them in your application layer:
1037
+
1038
+ ```javascript
1039
+ async function sendWithBackoff(instance, payload, maxRetries = 3) {
1040
+ for (let i = 0; i < maxRetries; i++) {
1041
+ try {
1042
+ return await instance.send(payload);
1043
+ } catch (err) {
1044
+ if (err.status === 429 && i < maxRetries - 1) {
1045
+ await new Promise(r => setTimeout(r, 2 ** i * 1000));
1046
+ continue;
1047
+ }
1048
+ throw err;
1049
+ }
1050
+ }
1051
+ }
1052
+ ```
1053
+
1054
+ ### CodeAgent Failure Limits
1055
+
1056
+ CodeAgent tracks consecutive failed executions. After `maxRetries` (default: 3) consecutive failures, the model summarizes what went wrong and asks for guidance:
1057
+
1058
+ ```javascript
1059
+ new CodeAgent({
1060
+ maxRetries: 5, // allow more failures before stopping
1061
+ });
1062
+ ```
1063
+
1064
+ ---
1065
+
1066
+ ## Performance Tips
1067
+
1068
+ ### Reuse Instances
1069
+
1070
+ Each instance maintains a chat session. Creating a new instance per request re-sends the system prompt each time, wasting tokens. Reuse instances when possible:
1071
+
1072
+ ```javascript
1073
+ // Bad — creates a new session every call
1074
+ app.post('/classify', async (req, res) => {
1075
+ const msg = new Message({ systemPrompt: '...' }); // new instance every request!
1076
+ const result = await msg.send(req.body.text);
1077
+ res.json(result);
1078
+ });
1079
+
1080
+ // Good — reuse the instance
1081
+ const classifier = new Message({ systemPrompt: '...' });
1082
+ app.post('/classify', async (req, res) => {
1083
+ const result = await classifier.send(req.body.text);
1084
+ res.json(result);
1085
+ });
1086
+ ```
1087
+
1088
+ ### Choose the Right Model
1089
+
1090
+ | Model | Speed | Cost | Best For |
1091
+ |---|---|---|---|
1092
+ | `gemini-2.0-flash-lite` | Fastest | Cheapest | Classification, extraction, simple tasks |
1093
+ | `gemini-2.0-flash` | Fast | Low | General purpose, good quality |
1094
+ | `gemini-2.5-flash` | Medium | Low | Best balance of speed and quality |
1095
+ | `gemini-2.5-pro` | Slow | High | Complex reasoning, code, analysis |
1096
+
1097
+ ### Use `Message` for Stateless Workloads
1098
+
1099
+ `Message` uses `generateContent()` under the hood — no chat session overhead. For pipelines processing thousands of items independently, `Message` is the right choice.
1100
+
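Independent items fan out well with a concurrency limiter. This helper is a generic sketch (the default pool size of 5 is arbitrary), with `worker` standing in for any async call such as `(item) => message.send(item)`:

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// Results keep their input order.
async function mapWithConcurrency(items, worker, limit = 5) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const i = next++; // safe: single-threaded, no await between read and increment
      results[i] = await worker(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, run));
  return results;
}

// const labels = await mapWithConcurrency(records, (r) => classifier.send(r.text));
```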
1101
+ ### Use `localFiles` / `localData` over `remoteFiles`
1102
+
1103
+ For text-based content, `localFiles` and `localData` skip the Files API upload entirely. They're faster to initialize and don't require network calls for the file upload step.
1104
+
1105
+ ### Disable Thinking for Simple Tasks
1106
+
1107
+ Thinking tokens cost money and add latency. For classification, extraction, or simple formatting tasks, keep `thinkingBudget: 0` (the default).
1108
+
1109
+ ---
1110
+
1111
+ ## Common Integration Patterns
1112
+
1113
+ ### Pattern: API Endpoint Classifier
1114
+
1115
+ ```javascript
1116
+ import { Message } from 'ak-gemini';
1117
+
1118
+ const classifier = new Message({
1119
+ modelName: 'gemini-2.0-flash-lite', // fast + cheap
1120
+ systemPrompt: 'Classify support tickets. Respond with exactly one of: billing, technical, account, other.',
1121
+ });
1122
+
1123
+ app.post('/api/classify-ticket', async (req, res) => {
1124
+ const result = await classifier.send(req.body.text);
1125
+ res.json({ category: result.text.trim().toLowerCase() });
1126
+ });
1127
+ ```
1128
+
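Even with a strict prompt, classifiers occasionally return stray punctuation or an off-list label. A defensive parse (the category set mirrors the prompt above) keeps the endpoint's contract tight:

```javascript
// Normalize the model's reply and fall back to 'other' for anything off-list.
const CATEGORIES = new Set(['billing', 'technical', 'account', 'other']);

function parseCategory(text) {
  const label = text.trim().toLowerCase().replace(/[."'!]+$/, '');
  return CATEGORIES.has(label) ? label : 'other';
}

// res.json({ category: parseCategory(result.text) });
```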
1129
+ ### Pattern: ETL Pipeline with Validation
1130
+
1131
+ ```javascript
1132
+ import { Transformer } from 'ak-gemini';
1133
+
1134
+ const normalizer = new Transformer({
1135
+ sourceKey: 'RAW',
1136
+ targetKey: 'NORMALIZED',
1137
+ maxRetries: 3,
1138
+ asyncValidator: async (output) => {
1139
+ if (!output.email?.includes('@')) throw new Error('Invalid email');
1140
+ if (!output.name?.trim()) throw new Error('Name is required');
1141
+ return output;
1142
+ }
1143
+ });
1144
+
1145
+ await normalizer.seed([
1146
+ { RAW: { nm: 'alice', mail: 'alice@co.com' }, NORMALIZED: { name: 'Alice', email: 'alice@co.com' } },
1147
+ ]);
1148
+
1149
+ for (const record of rawRecords) {
1150
+ const clean = await normalizer.send(record, { stateless: true });
1151
+ await db.insert('users', clean);
1152
+ }
1153
+ ```
1154
+
1155
+ ### Pattern: Conversational Assistant with Tools
1156
+
1157
+ ```javascript
1158
+ import { ToolAgent } from 'ak-gemini';
1159
+
1160
+ const assistant = new ToolAgent({
1161
+ systemPrompt: `You are a customer support agent for Acme Corp.
1162
+ You can look up orders and issue refunds. Always confirm before issuing refunds.`,
1163
+ tools: [
1164
+ {
1165
+ name: 'lookup_order',
1166
+ description: 'Look up an order by ID or customer email',
1167
+ parametersJsonSchema: {
1168
+ type: 'object',
1169
+ properties: {
1170
+ order_id: { type: 'string' },
1171
+ email: { type: 'string' }
1172
+ }
1173
+ }
1174
+ },
1175
+ {
1176
+ name: 'issue_refund',
1177
+ description: 'Issue a refund for an order',
1178
+ parametersJsonSchema: {
1179
+ type: 'object',
1180
+ properties: {
1181
+ order_id: { type: 'string' },
1182
+ amount: { type: 'number' },
1183
+ reason: { type: 'string' }
1184
+ },
1185
+ required: ['order_id', 'amount', 'reason']
1186
+ }
1187
+ }
1188
+ ],
1189
+ toolExecutor: async (toolName, args) => {
1190
+ if (toolName === 'lookup_order') return await orderService.lookup(args);
1191
+ if (toolName === 'issue_refund') return await orderService.refund(args);
+ throw new Error(`Unknown tool: ${toolName}`); // surface unknown tools instead of returning undefined
1192
+ },
1193
+ onBeforeExecution: async (toolName, args) => {
1194
+ // Only allow refunds under $100 without human approval
1195
+ if (toolName === 'issue_refund' && args.amount > 100) {
1196
+ return false;
1197
+ }
1198
+ return true;
1199
+ }
1200
+ });
1201
+
1202
+ // In a chat endpoint
1203
+ const result = await assistant.chat(userMessage);
1204
+ ```
1205
+
1206
+ ### Pattern: Document Q&A Service
1207
+
1208
+ ```javascript
1209
+ import { RagAgent } from 'ak-gemini';
1210
+
1211
+ const docs = new RagAgent({
1212
+ localFiles: [
1213
+ './docs/getting-started.md',
1214
+ './docs/api-reference.md',
1215
+ './docs/faq.md',
1216
+ ],
1217
+ systemPrompt: 'You are a documentation assistant. Answer questions based on the docs. If the answer is not in the docs, say so.',
1218
+ });
1219
+
1220
+ app.post('/api/ask', async (req, res) => {
1221
+ const result = await docs.chat(req.body.question);
1222
+ res.json({ answer: result.text, usage: result.usage });
1223
+ });
1224
+ ```
1225
+
1226
+ ### Pattern: Data-Grounded Analysis
1227
+
1228
+ ```javascript
1229
+ import { RagAgent } from 'ak-gemini';
1230
+
1231
+ const analyst = new RagAgent({
1232
+ modelName: 'gemini-2.5-pro', // use a smarter model for analysis
1233
+ localData: [
1234
+ { name: 'sales_q4', data: await db.query('SELECT * FROM sales WHERE quarter = 4') },
1235
+ { name: 'targets', data: await db.query('SELECT * FROM quarterly_targets') },
1236
+ ],
1237
+ systemPrompt: 'You are a business analyst. Analyze the provided data and answer questions with specific numbers.',
1238
+ });
1239
+
1240
+ const result = await analyst.chat('Which regions missed their Q4 targets? By how much?');
1241
+ ```
1242
+
1243
+ ### Pattern: Few-Shot Any Class
1244
+
1245
+ Every class supports `seed()` for few-shot learning — not just Transformer:
1246
+
1247
+ ```javascript
1248
+ import { Chat } from 'ak-gemini';
1249
+
1250
+ const chat = new Chat({ systemPrompt: 'You are a SQL expert.' });
1251
+ await chat.seed([
1252
+ { PROMPT: 'Get all users', ANSWER: 'SELECT * FROM users;' },
1253
+ { PROMPT: 'Count orders by status', ANSWER: 'SELECT status, COUNT(*) FROM orders GROUP BY status;' },
1254
+ ]);
1255
+
1256
+ const result = await chat.send('Find users who signed up in the last 7 days');
1257
+ // Model follows the SQL-only response pattern from the examples
1258
+ ```
1259
+
1260
+ ---
1261
+
1262
+ ## Quick Reference
1263
+
1264
+ ### Imports
1265
+
1266
+ ```javascript
1267
+ // Named exports
1268
+ import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent, Embedding, BaseGemini, log } from 'ak-gemini';
1269
+ import { extractJSON, attemptJSONRecovery } from 'ak-gemini';
1270
+ import { ThinkingLevel, HarmCategory, HarmBlockThreshold } from 'ak-gemini';
1271
+
1272
+ // Default export (namespace)
1273
+ import AI from 'ak-gemini';
1274
+
1275
+ // CommonJS
1276
+ const { Transformer, Chat, Embedding } = require('ak-gemini');
1277
+ ```
1278
+
1279
+ ### Constructor Options (All Classes)
1280
+
1281
+ | Option | Type | Default |
1282
+ |---|---|---|
1283
+ | `modelName` | string | `'gemini-2.5-flash'` |
1284
+ | `systemPrompt` | string \| null \| false | varies by class |
1285
+ | `apiKey` | string | `GEMINI_API_KEY` env var |
1286
+ | `vertexai` | boolean | `false` |
1287
+ | `project` | string | `GOOGLE_CLOUD_PROJECT` env var |
1288
+ | `location` | string | `'global'` |
1289
+ | `chatConfig` | object | `{ temperature: 0.7, topP: 0.95, topK: 64 }` |
1290
+ | `thinkingConfig` | object | `{ thinkingBudget: 0 }` |
1291
+ | `maxOutputTokens` | number \| null | `50000` |
1292
+ | `logLevel` | string | based on `NODE_ENV` |
1293
+ | `labels` | object | `{}` (Vertex AI only) |
1294
+ | `enableGrounding` | boolean | `false` |
1295
+ | `groundingConfig` | object | `{}` |
1296
+ | `cachedContent` | string | `null` |
1297
+
1298
+ ### Methods Available on All Classes
1299
+
1300
+ | Method | Returns | Description |
1301
+ |---|---|---|
1302
+ | `init(force?)` | `Promise<void>` | Initialize chat session |
1303
+ | `seed(examples, opts?)` | `Promise<Array>` | Add few-shot examples |
1304
+ | `getHistory()` | `Array` | Get conversation history |
1305
+ | `clearHistory()` | `Promise<void>` | Clear conversation history |
1306
+ | `getLastUsage()` | `UsageData \| null` | Token usage from last call |
1307
+ | `estimate(payload)` | `Promise<{ inputTokens }>` | Estimate input tokens |
1308
+ | `estimateCost(payload)` | `Promise<object>` | Estimate cost in dollars |
1309
+ | `createCache(config?)` | `Promise<CachedContentInfo>` | Create a context cache |
1310
+ | `getCache(name)` | `Promise<CachedContentInfo>` | Get cache details |
1311
+ | `listCaches()` | `Promise<any>` | List all caches |
1312
+ | `updateCache(name, config?)` | `Promise<CachedContentInfo>` | Update cache TTL |
1313
+ | `deleteCache(name)` | `Promise<void>` | Delete a cache |
1314
+ | `useCache(name)` | `Promise<void>` | Attach a cache to this instance |