ak-gemini 2.0.2 → 2.0.4

package/GUIDE.md CHANGED
@@ -22,12 +22,15 @@ npm install ak-gemini
22
22
  7. [ToolAgent — Agent with Custom Tools](#toolagent--agent-with-custom-tools)
23
23
  8. [CodeAgent — Agent That Writes and Runs Code](#codeagent--agent-that-writes-and-runs-code)
24
24
  9. [RagAgent — Document & Data Q&A](#ragagent--document--data-qa)
25
- 10. [Observability & Usage Tracking](#observability--usage-tracking)
26
- 11. [Thinking Configuration](#thinking-configuration)
27
- 12. [Error Handling & Retries](#error-handling--retries)
28
- 13. [Performance Tips](#performance-tips)
29
- 14. [Common Integration Patterns](#common-integration-patterns)
30
- 15. [Quick Reference](#quick-reference)
25
+ 10. [Embedding — Vector Embeddings](#embedding--vector-embeddings)
26
+ 11. [Google Search Grounding](#google-search-grounding)
27
+ 12. [Context Caching](#context-caching)
28
+ 13. [Observability & Usage Tracking](#observability--usage-tracking)
29
+ 14. [Thinking Configuration](#thinking-configuration)
30
+ 15. [Error Handling & Retries](#error-handling--retries)
31
+ 16. [Performance Tips](#performance-tips)
32
+ 17. [Common Integration Patterns](#common-integration-patterns)
33
+ 18. [Quick Reference](#quick-reference)
31
34
 
32
35
  ---
33
36
 
@@ -96,6 +99,7 @@ Vertex AI uses Application Default Credentials. Run `gcloud auth application-def
96
99
  | Give the AI tools to call (APIs, DB, etc.) | `ToolAgent` | `chat()` / `stream()` |
97
100
  | Let the AI write and run JavaScript | `CodeAgent` | `chat()` / `stream()` |
98
101
  | Q&A over documents, files, or data | `RagAgent` | `chat()` / `stream()` |
102
+ | Generate vector embeddings | `Embedding` | `embed()` / `embedBatch()` |
99
103
 
100
104
  **Rule of thumb**: Start with `Message` for the simplest integration. Move to `Chat` if you need history. Use `Transformer` when you need structured JSON output with validation. Use agents when the AI needs to take action.
101
105
 
@@ -570,6 +574,313 @@ Prefer `localFiles` and `localData` when possible — they skip the upload step
570
574
 
571
575
  ---
572
576
 
577
+ ## Embedding — Vector Embeddings
578
+
579
+ Generate vector embeddings for similarity search, clustering, classification, and deduplication. The `Embedding` class uses Google's text embedding models and provides a simple API for single and batch operations.
580
+
581
+ ```javascript
582
+ import { Embedding } from 'ak-gemini';
583
+
584
+ const embedder = new Embedding({
585
+ modelName: 'gemini-embedding-001', // default
586
+ });
587
+ ```
588
+
589
+ ### Basic Embedding
590
+
591
+ ```javascript
592
+ const result = await embedder.embed('The quick brown fox jumps over the lazy dog');
593
+ console.log(result.values); // [0.012, -0.034, 0.056, ...]
593
+ console.log(result.values.length); // depends on the model and outputDimensionality (gemini-embedding-001 natively returns 3072)
595
+ ```
596
+
597
+ ### Batch Embedding
598
+
599
+ Embed multiple texts in a single API call for efficiency:
600
+
601
+ ```javascript
602
+ const texts = [
603
+ 'Machine learning fundamentals',
604
+ 'Deep neural networks',
605
+ 'How to bake sourdough bread',
606
+ ];
607
+
608
+ const results = await embedder.embedBatch(texts);
609
+ // results[0].values, results[1].values, results[2].values
610
+ ```
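For very large inputs, a single `embedBatch()` call may exceed whatever per-request limit the underlying API enforces. The following is a minimal sketch, assuming a hypothetical limit of 100 texts per call (check the current API limits) and an `embedder` object that exposes `embedBatch()` as shown above. `chunk` and `embedInChunks` are illustrative helpers, not part of ak-gemini.

```javascript
// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Embed texts group by group, preserving input order.
// `size = 100` is an assumed limit, not a documented ak-gemini value.
async function embedInChunks(embedder, texts, size = 100) {
  const results = [];
  for (const group of chunk(texts, size)) {
    results.push(...(await embedder.embedBatch(group)));
  }
  return results;
}
```

Processing groups sequentially keeps request volume predictable; running them concurrently would be faster but more likely to hit rate limits.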
611
+
612
+ ### Task Types
613
+
614
+ Task types optimize embeddings for specific use cases:
615
+
616
+ ```javascript
617
+ // For documents being indexed
618
+ const docEmbedder = new Embedding({
619
+ taskType: 'RETRIEVAL_DOCUMENT',
620
+ title: 'API Reference' // title only applies to RETRIEVAL_DOCUMENT
621
+ });
622
+
623
+ // For search queries against those documents
624
+ const queryEmbedder = new Embedding({
625
+ taskType: 'RETRIEVAL_QUERY'
626
+ });
627
+
628
+ // Other task types
629
+ new Embedding({ taskType: 'SEMANTIC_SIMILARITY' });
630
+ new Embedding({ taskType: 'CLUSTERING' });
631
+ new Embedding({ taskType: 'CLASSIFICATION' });
632
+ ```
633
+
634
+ **Best practice**: Use `RETRIEVAL_DOCUMENT` when embedding content to store, and `RETRIEVAL_QUERY` when embedding the user's search query.
635
+
636
+ ### Output Dimensionality
637
+
638
+ Reduce embedding dimensions to save storage space (trade-off with accuracy):
639
+
640
+ ```javascript
641
+ // Constructor-level
642
+ const embedder = new Embedding({ outputDimensionality: 256 });
643
+
644
+ // Per-call override
645
+ const result = await embedder.embed('Hello', { outputDimensionality: 128 });
646
+ console.log(result.values.length); // 128
647
+ ```
648
+
649
+ Supported by `gemini-embedding-001` (not `text-embedding-001`).
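Reduced-dimension vectors are not guaranteed to be unit length, and Google's embedding documentation recommends re-normalizing them before comparing by dot product. Below is a minimal sketch of L2 normalization in plain JavaScript. `l2Normalize` is an illustrative helper, not an ak-gemini API, and whether your vectors need it is worth verifying by checking their norm first.

```javascript
// Scale a vector so its Euclidean (L2) norm is 1.
// A zero vector has no direction, so it is returned as an unchanged copy.
function l2Normalize(values) {
  let sumOfSquares = 0;
  for (const v of values) sumOfSquares += v * v;
  const norm = Math.sqrt(sumOfSquares);
  if (norm === 0) return values.slice();
  return values.map((v) => v / norm);
}

const unit = l2Normalize([3, 4]);
console.log(unit); // [ 0.6, 0.8 ]
```

Cosine similarity already divides by both norms, so this step matters mainly when a vector store ranks by raw dot product.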
650
+
651
+ ### Cosine Similarity
652
+
653
+ Compare two embeddings without an API call:
654
+
655
+ ```javascript
656
+ const [a, b] = await Promise.all([
657
+ embedder.embed('cats are great pets'),
658
+ embedder.embed('dogs are wonderful companions'),
659
+ ]);
660
+
661
+ const score = embedder.similarity(a.values, b.values);
662
+ // score ≈ 0.85 (semantically similar)
663
+ ```
664
+
665
+ Returns a value between -1 (opposite) and 1 (identical). Typical thresholds:
666
+ - `> 0.8` — very similar
667
+ - `0.5–0.8` — somewhat related
668
+ - `< 0.5` — different topics
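For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows the standard formula in plain JavaScript together with the thresholds listed above. It illustrates the math only; the library's own `similarity()` implementation may differ in details such as input validation, and `cosineSimilarity` and `describeScore` are hypothetical names.

```javascript
// cos(a, b) = (a . b) / (|a| * |b|)
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Similarity is undefined for a zero vector; 0 is returned here by convention.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Map a score onto the rough thresholds from this guide.
function describeScore(score) {
  if (score > 0.8) return 'very similar';
  if (score >= 0.5) return 'somewhat related';
  return 'different topics';
}

console.log(cosineSimilarity([1, 0], [0, 1])); // 0 (orthogonal vectors)
console.log(describeScore(0.85)); // very similar
```

The thresholds are starting points; scores vary by model and task type, so it is worth calibrating them against a small labeled sample.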
669
+
670
+ ### Integration Pattern: Semantic Search
671
+
672
+ ```javascript
673
+ // Index phase
674
+ const documents = ['doc1 text...', 'doc2 text...', 'doc3 text...'];
675
+ const docEmbedder = new Embedding({ taskType: 'RETRIEVAL_DOCUMENT' });
676
+ const docVectors = await docEmbedder.embedBatch(documents);
677
+
678
+ // Search phase
679
+ const queryEmbedder = new Embedding({ taskType: 'RETRIEVAL_QUERY' });
680
+ const queryVector = await queryEmbedder.embed('how do I authenticate?');
681
+
682
+ // Find best match
683
+ const scores = docVectors.map((doc, i) => ({
684
+ index: i,
685
+ score: queryEmbedder.similarity(queryVector.values, doc.values)
686
+ }));
687
+ scores.sort((a, b) => b.score - a.score);
688
+ console.log('Best match:', documents[scores[0].index]);
689
+ ```
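The ranking step above can be generalized to return the top `k` matches and drop weak ones. This is a sketch under stated assumptions: vectors are objects with a `values` array, as returned by `embed()` and `embedBatch()` in this guide, and `topMatches` plus its `k` and `minScore` parameters are hypothetical, not part of ak-gemini. It computes cosine similarity locally so it runs without an API call.

```javascript
// Plain cosine similarity; returns 0 if either vector has zero length.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return up to `k` documents scoring at least `minScore`, best first.
function topMatches(queryValues, docVectors, k = 3, minScore = 0) {
  return docVectors
    .map((doc, index) => ({ index, score: cosine(queryValues, doc.values) }))
    .filter((match) => match.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

With the variables from the example above, `topMatches(queryVector.values, docVectors, 5, 0.5)` would return at most five results and none below the "somewhat related" threshold. A linear scan like this is fine for a few thousand documents; larger collections usually call for a vector index.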
690
+
691
+ ### When to Use Embedding
692
+
693
+ - Semantic search — find documents similar to a query
694
+ - Deduplication — detect near-duplicate content
695
+ - Clustering — group similar items together
696
+ - Classification — compare against known category embeddings
697
+ - Recommendation — find items similar to user preferences
698
+
699
+ ---
700
+
701
+ ## Google Search Grounding
702
+
703
+ Ground model responses in real-time Google Search results. Available on **all classes** via `enableGrounding` — not just Transformer.
704
+
705
+ **Warning**: Google Search grounding costs approximately **$35 per 1,000 queries**. Use selectively.
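As a rough budgeting aid, the approximate rate above can be turned into a per-workload estimate. This sketch assumes the roughly $35 per 1,000 queries figure quoted in this guide and that each grounded call counts as one billable query, which may not match actual billing. `estimateGroundingCost` is a hypothetical helper.

```javascript
// Estimated grounding cost in USD.
// ratePer1000 defaults to the approximate figure quoted in this guide.
function estimateGroundingCost(queryCount, ratePer1000 = 35) {
  if (queryCount < 0) throw new RangeError('queryCount must not be negative');
  return (queryCount / 1000) * ratePer1000;
}

console.log(estimateGroundingCost(250)); // 8.75
```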
706
+
707
+ ### Basic Usage
708
+
709
+ ```javascript
710
+ import { Chat } from 'ak-gemini';
711
+
712
+ const chat = new Chat({
713
+ enableGrounding: true
714
+ });
715
+
716
+ const result = await chat.send('What happened in tech news today?');
717
+ console.log(result.text); // Response grounded in current search results
718
+ ```
719
+
720
+ ### Grounding Metadata
721
+
722
+ When grounding is enabled, `getLastUsage()` includes source attribution:
723
+
724
+ ```javascript
725
+ const usage = chat.getLastUsage();
726
+
727
+ if (usage.groundingMetadata) {
728
+ // Search queries the model executed
729
+ console.log('Queries:', usage.groundingMetadata.webSearchQueries);
730
+
731
+ // Source citations
732
+ for (const chunk of usage.groundingMetadata.groundingChunks || []) {
733
+ if (chunk.web) {
734
+ console.log(`Source: ${chunk.web.title} — ${chunk.web.uri}`);
735
+ }
736
+ }
737
+ }
738
+ ```
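The same source may appear in several grounding chunks, so it can help to deduplicate citations before displaying them. A minimal sketch, assuming the metadata shape used above (`groundingChunks[].web.title` and `.uri`); `extractSources` is an illustrative helper, not an ak-gemini API.

```javascript
// Collect unique web sources from grounding metadata, keyed by URI.
// Chunks without a `web.uri` are skipped; a missing title falls back to the URI.
function extractSources(groundingMetadata) {
  const seen = new Set();
  const sources = [];
  for (const chunk of groundingMetadata?.groundingChunks ?? []) {
    const uri = chunk.web?.uri;
    if (!uri || seen.has(uri)) continue;
    seen.add(uri);
    sources.push({ title: chunk.web.title ?? uri, uri });
  }
  return sources;
}
```

Calling `extractSources(chat.getLastUsage()?.groundingMetadata)` returns an empty array when grounding was not used, so callers do not need a separate null check.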
739
+
740
+ ### Grounding Configuration
741
+
742
+ ```javascript
743
+ const chat = new Chat({
744
+ enableGrounding: true,
745
+ groundingConfig: {
746
+ // Exclude specific domains
747
+ excludeDomains: ['reddit.com', 'twitter.com'],
748
+
749
+ // Filter by time range (Gemini API only)
750
+ timeRangeFilter: {
751
+ startTime: '2025-01-01T00:00:00Z',
752
+ endTime: '2025-12-31T23:59:59Z'
753
+ }
754
+ }
755
+ });
756
+ ```
757
+
758
+ ### Grounding with ToolAgent
759
+
760
+ Grounding works alongside user-defined tools — both are merged into the tools array automatically:
761
+
762
+ ```javascript
763
+ const agent = new ToolAgent({
764
+ enableGrounding: true,
765
+ tools: [
766
+ { name: 'save_result', description: 'Save a research result', parametersJsonSchema: { type: 'object', properties: { title: { type: 'string' }, summary: { type: 'string' } }, required: ['title', 'summary'] } }
767
+ ],
768
+ toolExecutor: async (name, args) => {
769
+ if (name === 'save_result') return await db.insert(args);
770
+ }
771
+ });
772
+
773
+ // The agent can search the web AND call your tools
774
+ const result = await agent.chat('Research the latest AI safety developments and save the key findings');
775
+ ```
776
+
777
+ ### Per-Message Grounding Toggle (Transformer)
778
+
779
+ Transformer supports toggling grounding per-message without rebuilding the instance:
780
+
781
+ ```javascript
782
+ const t = new Transformer({ enableGrounding: false });
783
+
784
+ // Enable grounding for just this call
785
+ const result = await t.send(payload, { enableGrounding: true });
786
+
787
+ // Back to no grounding for subsequent calls
788
+ ```
789
+
790
+ ### When to Use Grounding
791
+
792
+ - Questions about current events, recent news, or real-time data
793
+ - Fact-checking or verification tasks
794
+ - Research assistants that need up-to-date information
795
+ - Any scenario where the model's training data cutoff is a limitation
796
+
797
+ ---
798
+
799
+ ## Context Caching
800
+
801
+ Cache system prompts, documents, or tool definitions to reduce costs when making many API calls with the same large context. Cached tokens are billed at a reduced rate.
802
+
803
+ ### When Context Caching Helps
804
+
805
+ - **Large system prompts** reused across many calls
806
+ - **RagAgent** with the same document set serving many queries
807
+ - **ToolAgent** with many tool definitions
808
+ - Any scenario with high token count in repeated context
809
+
810
+ ### Create and Use a Cache
811
+
812
+ ```javascript
813
+ import { Chat } from 'ak-gemini';
814
+
815
+ const chat = new Chat({
816
+ systemPrompt: veryLongSystemPrompt // e.g., 10,000+ tokens
817
+ });
818
+
819
+ // Create a cache (auto-uses this instance's model and systemPrompt)
820
+ const cache = await chat.createCache({
821
+ ttl: '3600s', // 1 hour
822
+ displayName: 'my-app-system-prompt'
823
+ });
824
+
825
+ console.log(cache.name); // Server-generated resource name
826
+ console.log(cache.expireTime); // When it expires
827
+
828
+ // Attach the cache to this instance
829
+ await chat.useCache(cache.name);
830
+
831
+ // All subsequent calls use cached tokens at reduced cost
832
+ const r1 = await chat.send('Hello');
833
+ const r2 = await chat.send('Tell me more');
834
+ ```
835
+
836
+ ### Cache Management
837
+
838
+ ```javascript
839
+ // List all caches
840
+ const caches = await chat.listCaches();
841
+
842
+ // Get cache details
843
+ const info = await chat.getCache(cache.name);
844
+ console.log(info.usageMetadata?.totalTokenCount);
845
+
846
+ // Extend TTL
847
+ await chat.updateCache(cache.name, { ttl: '7200s' });
848
+
849
+ // Delete when done
850
+ await chat.deleteCache(cache.name);
851
+ ```
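The `ttl` values in these examples are strings of whole seconds with an `s` suffix. A small helper can make longer durations easier to read. This is a sketch: `toTtl` is a hypothetical function, and it assumes the seconds-string format shown above is the only one needed.

```javascript
// Build a TTL string such as '3600s' from days, hours, minutes, and seconds.
function toTtl({ days = 0, hours = 0, minutes = 0, seconds = 0 } = {}) {
  const total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
  if (!Number.isInteger(total) || total <= 0) {
    throw new RangeError('TTL must be a positive whole number of seconds');
  }
  return `${total}s`;
}

console.log(toTtl({ hours: 1 })); // 3600s
console.log(toTtl({ hours: 1, minutes: 30 })); // 5400s
```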
852
+
853
+ ### Cache with Constructor
854
+
855
+ If you already have a cache name, pass it directly:
856
+
857
+ ```javascript
858
+ const chat = new Chat({
859
+ cachedContent: 'projects/my-project/locations/us-central1/cachedContents/abc123'
860
+ });
861
+ ```
862
+
863
+ ### What Can Be Cached
864
+
865
+ The `createCache()` config accepts:
866
+
867
+ | Field | Description |
868
+ |---|---|
869
+ | `systemInstruction` | System prompt (auto-populated from instance if not provided) |
870
+ | `contents` | Content messages to cache |
871
+ | `tools` | Tool declarations to cache |
872
+ | `toolConfig` | Tool configuration to cache |
873
+ | `ttl` | Time-to-live (e.g., `'3600s'`) |
874
+ | `displayName` | Human-readable label |
875
+
876
+ ### Cost Savings
877
+
878
+ Context caching reduces input token costs for cached content. The exact savings depend on the model — check [Google's pricing page](https://ai.google.dev/pricing) for current rates. The trade-off is the cache storage cost and the minimum cache size requirement.
879
+
880
+ **Rule of thumb**: Caching pays off when you make many calls with the same large context (system prompt + documents) within the cache TTL.
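The rule of thumb can be checked with simple arithmetic. The sketch below uses a simplified cost model that is an assumption, not Google's billing formula: one full-price pass to create the cache, a reduced rate for each later read, and an hourly storage charge. Every rate in the example is a placeholder chosen for easy arithmetic; substitute current values from the pricing page. Tokens that are not cached cost the same either way, so they are left out.

```javascript
// All rates are USD per 1 million tokens. Simplified, assumed cost model.
function estimateCachingSavings({
  cachedTokens,
  calls,
  ttlHours,
  inputRate,
  cachedRate,
  storageRatePerHour,
}) {
  const millions = cachedTokens / 1e6;
  const withoutCache = calls * millions * inputRate;
  const withCache =
    millions * inputRate + // assumed: one full-price pass to create the cache
    calls * millions * cachedRate + // each call reads the context at the cached rate
    millions * storageRatePerHour * ttlHours; // storage for the cache lifetime
  return { withoutCache, withCache, savings: withoutCache - withCache };
}

// Hypothetical rates, not real prices.
const estimate = estimateCachingSavings({
  cachedTokens: 1000000,
  calls: 10,
  ttlHours: 1,
  inputRate: 1.0,
  cachedRate: 0.25,
  storageRatePerHour: 1.0,
});
console.log(estimate); // { withoutCache: 10, withCache: 4.5, savings: 5.5 }
```

With the same placeholder rates and only two calls, `savings` turns negative, which is the case where caching does not pay off.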
881
+
882
+ ---
883
+
573
884
  ## Observability & Usage Tracking
574
885
 
575
886
  Every class provides consistent observability hooks.
@@ -954,7 +1265,7 @@ const result = await chat.send('Find users who signed up in the last 7 days');
954
1265
 
955
1266
  ```javascript
956
1267
  // Named exports
957
- import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent, BaseGemini, log } from 'ak-gemini';
1268
+ import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent, Embedding, BaseGemini, log } from 'ak-gemini';
958
1269
  import { extractJSON, attemptJSONRecovery } from 'ak-gemini';
959
1270
  import { ThinkingLevel, HarmCategory, HarmBlockThreshold } from 'ak-gemini';
960
1271
 
@@ -962,7 +1273,7 @@ import { ThinkingLevel, HarmCategory, HarmBlockThreshold } from 'ak-gemini';
962
1273
  import AI from 'ak-gemini';
963
1274
 
964
1275
  // CommonJS
965
- const { Transformer, Chat } = require('ak-gemini');
1276
+ const { Transformer, Chat, Embedding } = require('ak-gemini');
966
1277
  ```
967
1278
 
968
1279
  ### Constructor Options (All Classes)
@@ -980,6 +1291,9 @@ const { Transformer, Chat } = require('ak-gemini');
980
1291
  | `maxOutputTokens` | number \| null | `50000` |
981
1292
  | `logLevel` | string | based on `NODE_ENV` |
982
1293
  | `labels` | object | `{}` (Vertex AI only) |
1294
+ | `enableGrounding` | boolean | `false` |
1295
+ | `groundingConfig` | object | `{}` |
1296
+ | `cachedContent` | string | `null` |
983
1297
 
984
1298
  ### Methods Available on All Classes
985
1299
 
@@ -992,3 +1306,9 @@ const { Transformer, Chat } = require('ak-gemini');
992
1306
  | `getLastUsage()` | `UsageData \| null` | Token usage from last call |
993
1307
  | `estimate(payload)` | `Promise<{ inputTokens }>` | Estimate input tokens |
994
1308
  | `estimateCost(payload)` | `Promise<object>` | Estimate cost in dollars |
1309
+ | `createCache(config?)` | `Promise<CachedContentInfo>` | Create a context cache |
1310
+ | `getCache(name)` | `Promise<CachedContentInfo>` | Get cache details |
1311
+ | `listCaches()` | `Promise<any>` | List all caches |
1312
+ | `updateCache(name, config?)` | `Promise<CachedContentInfo>` | Update cache TTL |
1313
+ | `deleteCache(name)` | `Promise<void>` | Delete a cache |
1314
+ | `useCache(name)` | `Promise<void>` | Attach a cache to this instance |
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # ak-gemini
2
2
 
3
- **Modular, type-safe wrapper for Google's Gemini AI.** Five class exports for different interaction patterns — JSON transformation, chat, stateless messages, tool-using agents, and code-writing agents — all sharing a common base.
3
+ **Modular, type-safe wrapper for Google's Gemini AI.** Seven class exports for different interaction patterns — JSON transformation, chat, stateless messages, tool-using agents, code-writing agents, document Q&A, and embeddings — all sharing a common base.
4
4
 
5
5
  ```sh
6
6
  npm install ak-gemini
@@ -17,7 +17,7 @@ export GEMINI_API_KEY=your-key
17
17
  ```
18
18
 
19
19
  ```javascript
20
- import { Transformer, Chat, Message, ToolAgent, CodeAgent } from 'ak-gemini';
20
+ import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent, Embedding } from 'ak-gemini';
21
21
  ```
22
22
 
23
23
  ---
@@ -176,6 +176,27 @@ for await (const event of agent.stream('Refactor the auth module')) {
176
176
  }
177
177
  ```
178
178
 
179
+ ### Embedding — Vector Embeddings
180
+
181
+ Generate vector embeddings for similarity search, clustering, and classification.
182
+
183
+ ```javascript
184
+ const embedder = new Embedding({
185
+ modelName: 'gemini-embedding-001', // default
186
+ taskType: 'RETRIEVAL_DOCUMENT'
187
+ });
188
+
189
+ // Single text
190
+ const result = await embedder.embed('Hello world');
191
+ console.log(result.values); // [0.012, -0.034, ...]
192
+
193
+ // Batch
194
+ const results = await embedder.embedBatch(['Hello', 'World']);
195
+
196
+ // Cosine similarity (pure math, no API call)
197
+ const score = embedder.similarity(results[0].values, results[1].values);
198
+ ```
199
+
179
200
  ---
180
201
 
181
202
  ## Stopping Agents
@@ -252,6 +273,43 @@ new Chat({
252
273
  });
253
274
  ```
254
275
 
276
+ ### Google Search Grounding
277
+
278
+ Ground responses in real-time web search results. Available on all classes.
279
+
280
+ ```javascript
281
+ const chat = new Chat({
282
+ enableGrounding: true,
283
+ groundingConfig: { excludeDomains: ['example.com'] }
284
+ });
285
+
286
+ const result = await chat.send('Who won the 2026 Super Bowl?');
287
+ const sources = result.usage?.groundingMetadata?.groundingChunks;
288
+ ```
289
+
290
+ **Warning**: Google Search grounding costs ~$35/1k queries.
291
+
292
+ ### Context Caching
293
+
294
+ Reduce costs by caching repeated system prompts, documents, or tool definitions.
295
+
296
+ ```javascript
297
+ const chat = new Chat({ systemPrompt: longSystemPrompt });
298
+
299
+ // Create a cache
300
+ const cache = await chat.createCache({
301
+ ttl: '3600s',
302
+ displayName: 'my-system-prompt-cache'
303
+ });
304
+
305
+ // Use the cache (subsequent calls use cached tokens at reduced cost)
306
+ await chat.useCache(cache.name);
307
+ const result = await chat.send('Hello!');
308
+
309
+ // Clean up
310
+ await chat.deleteCache(cache.name);
311
+ ```
312
+
255
313
  ### Billing Labels (Vertex AI)
256
314
 
257
315
  ```javascript
@@ -281,6 +339,9 @@ All classes accept `BaseGeminiOptions`:
281
339
  | `maxOutputTokens` | number | `50000` | Max tokens in response (`null` removes limit) |
282
340
  | `logLevel` | string | based on NODE_ENV | `'trace'`\|`'debug'`\|`'info'`\|`'warn'`\|`'error'`\|`'none'` |
283
341
  | `labels` | object | — | Billing labels (Vertex AI) |
342
+ | `enableGrounding` | boolean | `false` | Enable Google Search grounding |
343
+ | `groundingConfig` | object | — | Grounding config (excludeDomains, timeRangeFilter) |
344
+ | `cachedContent` | string | — | Cached content resource name |
284
345
 
285
346
  ### Transformer-Specific
286
347
 
@@ -293,8 +354,6 @@ All classes accept `BaseGeminiOptions`:
293
354
  | `retryDelay` | number | `1000` | Initial retry delay (ms) |
294
355
  | `responseSchema` | object | — | JSON schema for output validation |
295
356
  | `asyncValidator` | function | — | Global async validator |
296
- | `enableGrounding` | boolean | `false` | Enable Google Search grounding |
297
-
298
357
  ### ToolAgent-Specific
299
358
 
300
359
  | Option | Type | Default | Description |
@@ -322,21 +381,31 @@ All classes accept `BaseGeminiOptions`:
322
381
  | `responseSchema` | object | — | Schema for structured output |
323
382
  | `responseMimeType` | string | — | e.g. `'application/json'` |
324
383
 
384
+ ### Embedding-Specific
385
+
386
+ | Option | Type | Default | Description |
387
+ |--------|------|---------|-------------|
388
+ | `taskType` | string | — | `'RETRIEVAL_DOCUMENT'`, `'RETRIEVAL_QUERY'`, `'SEMANTIC_SIMILARITY'`, `'CLUSTERING'`, `'CLASSIFICATION'` |
389
+ | `title` | string | — | Document title (only with `RETRIEVAL_DOCUMENT`) |
390
+ | `outputDimensionality` | number | — | Output vector dimensions |
391
+ | `autoTruncate` | boolean | `true` | Auto-truncate long inputs |
392
+
325
393
  ---
326
394
 
327
395
  ## Exports
328
396
 
329
397
  ```javascript
330
398
  // Named exports
331
- import { Transformer, Chat, Message, ToolAgent, CodeAgent, BaseGemini, log } from 'ak-gemini';
399
+ import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent, Embedding, BaseGemini, log } from 'ak-gemini';
332
400
  import { extractJSON, attemptJSONRecovery } from 'ak-gemini';
333
401
 
334
402
  // Default export (namespace)
335
403
  import AI from 'ak-gemini';
336
404
  new AI.Transformer({ ... });
405
+ new AI.Embedding({ ... });
337
406
 
338
407
  // CommonJS
339
- const { Transformer, Chat } = require('ak-gemini');
408
+ const { Transformer, Chat, Embedding } = require('ak-gemini');
340
409
  ```
341
410
 
342
411
  ---