ak-gemini 2.0.1 → 2.0.2
- package/GUIDE.md +994 -0
- package/package.json +4 -2
- package/rag-agent.js +340 -0
package/GUIDE.md
ADDED
@@ -0,0 +1,994 @@
# ak-gemini — Integration Guide

> A practical guide for rapidly adding AI capabilities to any Node.js codebase using `ak-gemini`.
> Covers every class, common patterns, best practices, and observability hooks.

```sh
npm install ak-gemini
```

**Requirements**: Node.js 18+, a `GEMINI_API_KEY` env var (or Vertex AI credentials).

---

## Table of Contents

1. [Core Concepts](#core-concepts)
2. [Authentication](#authentication)
3. [Class Selection Guide](#class-selection-guide)
4. [Message — Stateless AI Calls](#message--stateless-ai-calls)
5. [Chat — Multi-Turn Conversations](#chat--multi-turn-conversations)
6. [Transformer — Structured JSON Transformation](#transformer--structured-json-transformation)
7. [ToolAgent — Agent with Custom Tools](#toolagent--agent-with-custom-tools)
8. [CodeAgent — Agent That Writes and Runs Code](#codeagent--agent-that-writes-and-runs-code)
9. [RagAgent — Document & Data Q&A](#ragagent--document--data-qa)
10. [Observability & Usage Tracking](#observability--usage-tracking)
11. [Thinking Configuration](#thinking-configuration)
12. [Error Handling & Retries](#error-handling--retries)
13. [Performance Tips](#performance-tips)
14. [Common Integration Patterns](#common-integration-patterns)
15. [Quick Reference](#quick-reference)

---

## Core Concepts

Every class in ak-gemini extends `BaseGemini`, which handles:

- **Authentication** — Gemini API key or Vertex AI service account
- **Chat sessions** — Managed conversation state with the model
- **Token tracking** — Input/output token counts after every call
- **Cost estimation** — Dollar estimates before sending
- **Few-shot seeding** — Inject example pairs to guide the model
- **Thinking config** — Control the model's internal reasoning budget
- **Safety settings** — Harassment and dangerous content filters (relaxed by default)

```javascript
import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent } from 'ak-gemini';
// or
import AI from 'ak-gemini';
const t = new AI.Transformer({ ... });
```

The default model is `gemini-2.5-flash`. Override with `modelName`:

```javascript
new Chat({ modelName: 'gemini-2.5-pro' });
```

---

## Authentication

### Gemini API (default)

```javascript
// Option 1: Environment variable (recommended)
// Set GEMINI_API_KEY in your .env or shell
new Chat();

// Option 2: Explicit key
new Chat({ apiKey: 'your-key' });
```

### Vertex AI

```javascript
new Chat({
  vertexai: true,
  project: 'my-gcp-project', // or GOOGLE_CLOUD_PROJECT env var
  location: 'us-central1',   // or GOOGLE_CLOUD_LOCATION env var
  labels: { app: 'myapp', env: 'prod' } // billing labels (Vertex AI only)
});
```

Vertex AI uses Application Default Credentials. Run `gcloud auth application-default login` locally, or use a service account in production.

---

## Class Selection Guide

| I want to... | Use | Method |
|---|---|---|
| Get a one-off AI response (no history) | `Message` | `send()` |
| Have a back-and-forth conversation | `Chat` | `send()` |
| Transform JSON with examples + validation | `Transformer` | `send()` |
| Give the AI tools to call (APIs, DB, etc.) | `ToolAgent` | `chat()` / `stream()` |
| Let the AI write and run JavaScript | `CodeAgent` | `chat()` / `stream()` |
| Q&A over documents, files, or data | `RagAgent` | `chat()` / `stream()` |

**Rule of thumb**: Start with `Message` for the simplest integration. Move to `Chat` if you need history. Use `Transformer` when you need structured JSON output with validation. Use agents when the AI needs to take action.

---

## Message — Stateless AI Calls

The simplest class. Each `send()` call is independent — no conversation history is maintained. Ideal for classification, extraction, summarization, and any fire-and-forget AI call.

```javascript
import { Message } from 'ak-gemini';

const msg = new Message({
  systemPrompt: 'You are a sentiment classifier. Respond with: positive, negative, or neutral.'
});

const result = await msg.send('I love this product!');
console.log(result.text);  // "positive"
console.log(result.usage); // { promptTokens, responseTokens, totalTokens, ... }
```

### Structured Output (JSON)

Force the model to return valid JSON matching a schema:

```javascript
const extractor = new Message({
  systemPrompt: 'Extract structured data from the input text.',
  responseMimeType: 'application/json',
  responseSchema: {
    type: 'object',
    properties: {
      people: { type: 'array', items: { type: 'string' } },
      places: { type: 'array', items: { type: 'string' } },
      sentiment: { type: 'string', enum: ['positive', 'negative', 'neutral'] }
    },
    required: ['people', 'places', 'sentiment']
  }
});

const result = await extractor.send('Alice and Bob visited Paris. They had a wonderful time.');
console.log(result.data);
// { people: ['Alice', 'Bob'], places: ['Paris'], sentiment: 'positive' }
```

Key difference from `Chat`: `result.data` contains the parsed JSON object, while `result.text` contains the raw string.

### When to Use Message

- Classification, tagging, or labeling
- Entity extraction
- Summarization
- Any call where previous context doesn't matter
- High-throughput pipelines where you process items independently
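
That last case benefits from a little scaffolding: share one instance and fan work out with bounded concurrency. A minimal sketch, assuming a stub in place of a real `Message` instance (swap in `new Message({...})` and its `send()` in production):

```javascript
// Sketch: classify independent items in parallel with a concurrency cap.
// `classifier` is a stub standing in for a shared Message instance.
const classifier = {
  async send(text) {
    // a real instance would call the model; this fake labels on a keyword
    return { text: text.includes('love') ? 'positive' : 'neutral' };
  }
};

async function classifyAll(items, concurrency = 4) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim an index before awaiting
      results[i] = (await classifier.send(items[i])).text;
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results; // same order as the input
}

classifyAll(['I love this!', 'It is fine.']).then(labels => {
  console.log(labels);
});
```

Bounding concurrency keeps you under the API's rate limits while still processing items in parallel, and results come back in input order.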

---

## Chat — Multi-Turn Conversations

Maintains conversation history across calls. The model remembers what was said earlier.

```javascript
import { Chat } from 'ak-gemini';

const chat = new Chat({
  systemPrompt: 'You are a helpful coding assistant.'
});

const r1 = await chat.send('What is a closure in JavaScript?');
console.log(r1.text);

const r2 = await chat.send('Can you give me an example?');
// The model remembers the closure topic from r1
console.log(r2.text);
```

### History Management

```javascript
// Get conversation history
const history = chat.getHistory();

// Clear and start fresh (preserves system prompt)
await chat.clearHistory();
```

### When to Use Chat

- Interactive assistants and chatbots
- Multi-step reasoning where later questions depend on earlier answers
- Tutoring or coaching interactions
- Any scenario where context carries across messages

---

## Transformer — Structured JSON Transformation

The power tool for data pipelines. Show it examples of input → output mappings, then send new inputs. Includes validation, retry, and AI-powered error correction.

```javascript
import { Transformer } from 'ak-gemini';

const t = new Transformer({
  systemPrompt: 'Transform user profiles into marketing segments.',
  sourceKey: 'INPUT',   // key for input data in examples
  targetKey: 'OUTPUT',  // key for output data in examples
  maxRetries: 3,        // retry on validation failure
  retryDelay: 1000      // ms between retries
});

// Seed with examples
await t.seed([
  {
    INPUT: { age: 25, spending: 'high', interests: ['tech', 'gaming'] },
    OUTPUT: { segment: 'young-affluent-tech', confidence: 0.9, tags: ['early-adopter'] }
  },
  {
    INPUT: { age: 55, spending: 'medium', interests: ['gardening', 'cooking'] },
    OUTPUT: { segment: 'mature-lifestyle', confidence: 0.85, tags: ['home-focused'] }
  }
]);

// Transform new data
const result = await t.send({ age: 30, spending: 'low', interests: ['books', 'hiking'] });
// result → { segment: '...', confidence: ..., tags: [...] }
```

### Validation

Pass an async validator as the third argument to `send()`. If it throws, the Transformer retries with the error message fed back to the model:

```javascript
const result = await t.send(
  { age: 30, spending: 'low' },
  {}, // options
  async (output) => {
    if (!output.segment) throw new Error('Missing segment field');
    if (output.confidence < 0 || output.confidence > 1) {
      throw new Error('Confidence must be between 0 and 1');
    }
    return output; // return the validated (or modified) output
  }
);
```

Or set a global validator in the constructor:

```javascript
const t = new Transformer({
  asyncValidator: async (output) => {
    if (!output.id) throw new Error('Missing id');
    return output;
  }
});
```

### Self-Healing with `rebuild()`

When downstream code fails, feed the error back to the AI:

```javascript
try {
  await processPayload(result);
} catch (err) {
  const fixed = await t.rebuild(result, err.message);
  await processPayload(fixed); // try again with AI-corrected payload
}
```
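
This pattern generalizes to a bounded repair loop. A sketch with stubs (`t.rebuild` and `processPayload` here are placeholders for illustration, not the library's behavior):

```javascript
// Sketch: retry downstream processing, asking the AI to repair the
// payload after each failure. `t.rebuild` is stubbed for illustration.
const t = {
  async rebuild(payload, errorMessage) {
    // a real Transformer would ask the model to fix the payload;
    // this fake just fills the field the "error" complains about
    return { ...payload, segment: 'unknown' };
  }
};

async function processPayload(payload) {
  if (!payload.segment) throw new Error('Missing segment field');
  return payload;
}

async function processWithRepair(payload, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await processPayload(payload);
    } catch (err) {
      if (i === attempts - 1) throw err; // give up after the last attempt
      payload = await t.rebuild(payload, err.message); // AI-corrected payload
    }
  }
}
```

Capping attempts matters: each `rebuild()` is another model call, so an unfixable payload should fail fast rather than loop.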

### Loading Examples from a File

```javascript
const t = new Transformer({
  examplesFile: './training-data.json'
  // JSON array of { INPUT: ..., OUTPUT: ... } objects
});
await t.seed(); // loads from file automatically
```

### Stateless Sends

Send without affecting the conversation history (useful for parallel processing):

```javascript
const result = await t.send(payload, { stateless: true });
```

### When to Use Transformer

- ETL pipelines — transform data between formats
- API response normalization
- Content enrichment (add tags, categories, scores)
- Any structured data transformation where you can provide examples
- Batch processing with validation guarantees

---

## ToolAgent — Agent with Custom Tools

Give the model tools (functions) it can call. You define what tools exist and how to execute them. The agent handles the conversation loop — sending messages, receiving tool calls, executing them, and feeding results back — until the model produces a final text answer.
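
That loop can be sketched as follows. This illustrates the mechanics, not the library's internals; `makeStubModel` fakes a model that requests one tool call and then answers:

```javascript
// Sketch of the agent loop ToolAgent manages for you. `model.send`
// is a stub: the first reply requests a tool, the second is final text.
function makeStubModel() {
  let turn = 0;
  return {
    async send(_input) {
      turn++;
      return turn === 1
        ? { toolCall: { name: 'query_db', args: { sql: 'SELECT 1' } } }
        : { text: 'There is 1 row.' };
    }
  };
}

async function runToolLoop(model, toolExecutor, userMessage, maxToolRounds = 10) {
  let reply = await model.send({ user: userMessage });
  for (let round = 0; round < maxToolRounds && reply.toolCall; round++) {
    const { name, args } = reply.toolCall;
    const result = await toolExecutor(name, args);    // execute the tool
    reply = await model.send({ toolResult: result }); // feed the result back
  }
  return reply.text; // final answer once no more tool calls arrive
}
```

`ToolAgent` runs this loop for you; `maxToolRounds` is the same kind of safety cap shown here.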

```javascript
import { ToolAgent } from 'ak-gemini';

const agent = new ToolAgent({
  systemPrompt: 'You are a database assistant.',
  tools: [
    {
      name: 'query_db',
      description: 'Execute a read-only SQL query against the users database',
      parametersJsonSchema: {
        type: 'object',
        properties: {
          sql: { type: 'string', description: 'The SQL query to execute' }
        },
        required: ['sql']
      }
    },
    {
      name: 'send_email',
      description: 'Send an email notification',
      parametersJsonSchema: {
        type: 'object',
        properties: {
          to: { type: 'string' },
          subject: { type: 'string' },
          body: { type: 'string' }
        },
        required: ['to', 'subject', 'body']
      }
    }
  ],
  toolExecutor: async (toolName, args) => {
    switch (toolName) {
      case 'query_db':
        return await db.query(args.sql);
      case 'send_email':
        await mailer.send(args);
        return { sent: true };
      default:
        throw new Error(`Unknown tool: ${toolName}`);
    }
  },
  maxToolRounds: 10 // safety limit on tool-use loop iterations
});

const result = await agent.chat('How many users signed up this week? Email the count to admin@co.com');
console.log(result.text);      // "There were 47 new signups this week. I've sent the email."
console.log(result.toolCalls); // [{ name: 'query_db', args: {...}, result: [...] }, { name: 'send_email', ... }]
```

### Streaming

Stream the agent's output in real time — useful for showing progress in a UI:

```javascript
for await (const event of agent.stream('Find the top 5 users by spend')) {
  switch (event.type) {
    case 'text': process.stdout.write(event.text); break;
    case 'tool_call': console.log(`\nCalling ${event.toolName}...`); break;
    case 'tool_result': console.log('Result:', event.result); break;
    case 'done': console.log('\nUsage:', event.usage); break;
  }
}
```

### Execution Gating

Control which tool calls are allowed at runtime:

```javascript
const agent = new ToolAgent({
  tools: [...],
  toolExecutor: myExecutor,
  onBeforeExecution: async (toolName, args) => {
    if (toolName === 'delete_user') {
      console.log('Blocked dangerous tool call');
      return false; // deny execution
    }
    return true; // allow
  },
  onToolCall: (toolName, args) => {
    // Notification callback — fires on every tool call (logging, metrics, etc.)
    metrics.increment(`tool_call.${toolName}`);
  }
});
```

### Stopping an Agent

Cancel mid-execution from a callback or externally:

```javascript
// From a callback
onBeforeExecution: async (toolName, args) => {
  if (shouldStop) {
    agent.stop(); // stop after this round
    return false;
  }
  return true;
}

// Externally (e.g., user cancel button, timeout)
setTimeout(() => agent.stop(), 60_000);
const result = await agent.chat('Do some work');
// result includes warning: "Agent was stopped"
```

### When to Use ToolAgent

- AI that needs to call APIs, query databases, or interact with external systems
- Workflow automation — the AI orchestrates a sequence of operations
- Research assistants that fetch and synthesize data from multiple sources
- Any scenario where you want the model to decide *which* tools to use and *when*

---

## CodeAgent — Agent That Writes and Runs Code

Instead of calling tools one by one, the model writes complete JavaScript scripts and executes them in a child process. This is powerful for tasks that require complex logic, file manipulation, or multi-step computation.

```javascript
import { CodeAgent } from 'ak-gemini';

const agent = new CodeAgent({
  workingDirectory: '/path/to/project',
  importantFiles: ['package.json', 'src/config.js'], // injected into system prompt
  timeout: 30_000,     // per-execution timeout
  maxRounds: 10,       // max code execution cycles
  keepArtifacts: true  // keep script files on disk after execution
});

const result = await agent.chat('Find all files larger than 1MB and list them sorted by size');
console.log(result.text);           // Agent's summary
console.log(result.codeExecutions); // [{ code, output, stderr, exitCode, purpose }]
```

### How It Works

1. On `init()`, the agent scans the working directory and gathers codebase context (file tree, package.json, key files, `importantFiles`)
2. This context is injected into the system prompt so the model understands the project
3. The model writes JavaScript using an internal `execute_code` tool
4. Code is saved to a `.mjs` file and run in a Node.js child process that inherits `process.env`
5. stdout/stderr feeds back to the model
6. The model decides if more work is needed (up to `maxRounds` cycles)
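
Steps 4–5 can be sketched in a few lines (simplified; the real agent also handles timeouts, artifact naming, and failure counting):

```javascript
import { writeFile, rm } from 'node:fs/promises';
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const run = promisify(execFile);

// Write model-generated code to a .mjs file and run it in a child
// process that inherits process.env, capturing stdout/stderr.
async function executeScript(code) {
  const file = join(tmpdir(), `agent-${Date.now()}.mjs`);
  await writeFile(file, code, 'utf-8');
  try {
    const { stdout, stderr } = await run('node', [file], { env: process.env });
    return { stdout, stderr, exitCode: 0 };
  } catch (err) {
    // non-zero exit: this is what gets fed back to the model
    return { stdout: err.stdout ?? '', stderr: err.stderr ?? '', exitCode: err.code ?? 1 };
  } finally {
    await rm(file, { force: true }); // drop this cleanup to keep artifacts
  }
}
```

Because the child inherits `process.env`, model-written scripts can read the same credentials and config as your app — convenient, but worth remembering when reviewing code in `onBeforeExecution`.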

### Streaming

```javascript
for await (const event of agent.stream('Refactor the auth module to use async/await')) {
  switch (event.type) {
    case 'text': process.stdout.write(event.text); break;
    case 'code': console.log('\n--- Executing code ---'); break;
    case 'output': console.log(event.stdout); break;
    case 'done': console.log('\nDone!', event.usage); break;
  }
}
```

### Execution Gating & Notifications

```javascript
const agent = new CodeAgent({
  workingDirectory: '.',
  onBeforeExecution: async (code) => {
    // Review code before it runs
    if (code.includes('rm -rf')) return false; // deny
    return true;
  },
  onCodeExecution: (code, output) => {
    // Log every execution for audit
    logger.info({ code: code.slice(0, 200), exitCode: output.exitCode });
  }
});
```

### Retrieving Scripts

Get all scripts the agent wrote across all interactions:

```javascript
const scripts = agent.dump();
// [{ fileName: 'agent-read-config.mjs', purpose: 'read-config', script: '...', filePath: '/path/...' }]
```

### When to Use CodeAgent

- File system operations — reading, writing, transforming files
- Data analysis — processing CSV, JSON, or log files
- Codebase exploration — finding patterns, counting occurrences, generating reports
- Prototyping — quickly testing ideas by having the AI write and run code
- Any task where the AI needs more flexibility than predefined tools provide

---

## RagAgent — Document & Data Q&A

Load documents and data into the model's context for grounded Q&A. Supports three input types that can be used together:

| Input Type | Option | What It Does |
|---|---|---|
| **Remote files** | `remoteFiles` | Uploaded via Google Files API — for PDFs, images, audio, video |
| **Local files** | `localFiles` | Read from disk as UTF-8 text — for md, json, csv, yaml, txt |
| **Local data** | `localData` | In-memory objects serialized as JSON |

```javascript
import { RagAgent } from 'ak-gemini';

const agent = new RagAgent({
  // Text files read directly from disk (fast, no upload)
  localFiles: ['./docs/api-reference.md', './docs/architecture.md'],

  // In-memory data
  localData: [
    { name: 'users', data: await db.query('SELECT * FROM users LIMIT 100') },
    { name: 'config', data: JSON.parse(await fs.readFile('./config.json', 'utf-8')) }
  ],

  // Binary/media files uploaded via Files API
  remoteFiles: ['./diagrams/architecture.png', './reports/q4.pdf']
});

const result = await agent.chat('What authentication method does the API use?');
console.log(result.text); // Grounded answer citing api-reference.md
```

### Dynamic Context

Add more context after initialization (each call triggers a reinit):

```javascript
await agent.addLocalFiles(['./new-doc.md']);
await agent.addLocalData([{ name: 'metrics', data: { uptime: 99.9 } }]);
await agent.addRemoteFiles(['./new-chart.png']);
```

### Inspecting Context

```javascript
const ctx = agent.getContext();
// {
//   remoteFiles: [{ name, displayName, mimeType, sizeBytes, uri, originalPath }],
//   localFiles: [{ name, path, size }],
//   localData: [{ name, type }]
// }
```

### Streaming

```javascript
for await (const event of agent.stream('Summarize the architecture document')) {
  if (event.type === 'text') process.stdout.write(event.text);
  if (event.type === 'done') console.log('\nUsage:', event.usage);
}
```

### When to Use RagAgent

- Documentation Q&A — let users ask questions about your docs
- Data exploration — load database results or CSV exports and ask questions
- Code review — load source files and ask about patterns, bugs, or architecture
- Report analysis — load PDF reports and extract insights
- Any scenario where the AI needs to answer questions grounded in specific data

### Choosing Input Types

| Data | Use |
|---|---|
| Plain text files (md, txt, json, csv, yaml) | `localFiles` — fastest, no API upload |
| In-memory objects, DB results, API responses | `localData` — serialized as JSON |
| PDFs, images, audio, video | `remoteFiles` — uploaded via Files API |

Prefer `localFiles` and `localData` when possible — they skip the upload step and initialize faster.

---

## Observability & Usage Tracking

Every class provides consistent observability hooks.

### Token Usage

After every API call, get detailed token counts:

```javascript
const usage = instance.getLastUsage();
// {
//   promptTokens: 1250,   // input tokens (cumulative across retries)
//   responseTokens: 340,  // output tokens (cumulative across retries)
//   totalTokens: 1590,    // total (cumulative)
//   attempts: 1,          // 1 = first try, 2+ = retries needed
//   modelVersion: 'gemini-2.5-flash-001', // actual model that responded
//   requestedModel: 'gemini-2.5-flash',   // model you requested
//   timestamp: 1710000000000
// }
```

### Cost Estimation

Estimate cost *before* sending:

```javascript
const estimate = await instance.estimateCost('What is the meaning of life?');
// {
//   inputTokens: 8,
//   model: 'gemini-2.5-flash',
//   pricing: { input: 0.15, output: 0.60 }, // per million tokens
//   estimatedInputCost: 0.0000012,
//   note: 'Output cost depends on response length'
// }
```

Or just get the token count:

```javascript
const { inputTokens } = await instance.estimate('some payload');
```
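
One way to use these estimates is a pre-flight budget gate: estimate, compare, and skip the call if it would cost too much. A sketch with a stub standing in for a real instance (the `estimateCost` math here is made up for illustration):

```javascript
// Sketch: skip a send when the estimated input cost exceeds a budget.
// `instance` is a stub; in real code it is any ak-gemini class instance.
const instance = {
  async estimateCost(payload) {
    // fake: ~4 characters per token at $0.15 per 1M input tokens
    const inputTokens = Math.ceil(payload.length / 4);
    return { inputTokens, estimatedInputCost: (inputTokens / 1e6) * 0.15 };
  },
  async send(payload) {
    return { text: `echo: ${payload}` };
  }
};

async function sendWithinBudget(payload, maxInputCost) {
  const { estimatedInputCost } = await instance.estimateCost(payload);
  if (estimatedInputCost > maxInputCost) {
    return { skipped: true, estimatedInputCost }; // caller decides what to do
  }
  return await instance.send(payload);
}
```

Note that `estimateCost` itself makes a token-counting call, so gate only payloads large enough to be worth the extra round trip.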
|
|
614
|
+
|
|
615
|
+
### Logging
|
|
616
|
+
|
|
617
|
+
All classes use [pino](https://github.com/pinojs/pino) for structured logging. Control the level:
|
|
618
|
+
|
|
619
|
+
```javascript
|
|
620
|
+
// Per-instance
|
|
621
|
+
new Chat({ logLevel: 'debug' });
|
|
622
|
+
|
|
623
|
+
// Via environment
|
|
624
|
+
LOG_LEVEL=debug node app.js
|
|
625
|
+
|
|
626
|
+
// Via NODE_ENV (dev → debug, test → warn, prod → info)
|
|
627
|
+
```
|
|
628
|
+
|
|
629
|
+
### Agent Callbacks
|
|
630
|
+
|
|
631
|
+
ToolAgent and CodeAgent provide execution callbacks for building audit trails, metrics, and approval flows:
|
|
632
|
+
|
|
633
|
+
```javascript
|
|
634
|
+
// ToolAgent
|
|
635
|
+
new ToolAgent({
|
|
636
|
+
onToolCall: (toolName, args) => {
|
|
637
|
+
// Fires on every tool call — use for logging, metrics
|
|
638
|
+
logger.info({ event: 'tool_call', tool: toolName, args });
|
|
639
|
+
},
|
|
640
|
+
onBeforeExecution: async (toolName, args) => {
|
|
641
|
+
// Fires before execution — return false to deny
|
|
642
|
+
// Use for approval flows, safety checks, rate limiting
|
|
643
|
+
return !blocklist.includes(toolName);
|
|
644
|
+
}
|
|
645
|
+
});
|
|
646
|
+
|
|
647
|
+
// CodeAgent
|
|
648
|
+
new CodeAgent({
|
|
649
|
+
onCodeExecution: (code, output) => {
|
|
650
|
+
// Fires after every code execution
|
|
651
|
+
logger.info({ event: 'code_exec', exitCode: output.exitCode, lines: code.split('\n').length });
|
|
652
|
+
},
|
|
653
|
+
onBeforeExecution: async (code) => {
|
|
654
|
+
// Review code before execution
|
|
655
|
+
if (code.includes('process.exit')) return false;
|
|
656
|
+
return true;
|
|
657
|
+
}
|
|
658
|
+
});
|
|
659
|
+
```
|
|
660
|
+
|
|
661
|
+
### Billing Labels (Vertex AI)
|
|
662
|
+
|
|
663
|
+
Tag API calls for cost attribution:
|
|
664
|
+
|
|
665
|
+
```javascript
|
|
666
|
+
// Constructor-level (applies to all calls)
|
|
667
|
+
new Transformer({
|
|
668
|
+
vertexai: true,
|
|
669
|
+
project: 'my-project',
|
|
670
|
+
labels: { app: 'etl-pipeline', env: 'prod', team: 'data' }
|
|
671
|
+
});
|
|
672
|
+
|
|
673
|
+
// Per-message override
|
|
674
|
+
await transformer.send(payload, { labels: { job_id: 'abc123' } });
|
|
675
|
+
```
|
|
676
|
+
|
|
677
|
+
---
|
|
678
|
+
|
|
679
|
+
## Thinking Configuration
|
|
680
|
+
|
|
681
|
+
Models like `gemini-2.5-flash` and `gemini-2.5-pro` support thinking — internal reasoning before answering. Control the budget:
|
|
682
|
+
|
|
683
|
+
```javascript
|
|
684
|
+
// Disable thinking (default — fastest, cheapest)
|
|
685
|
+
new Chat({ thinkingConfig: { thinkingBudget: 0 } });
|
|
686
|
+
|
|
687
|
+
// Automatic thinking budget (model decides)
|
|
688
|
+
new Chat({ thinkingConfig: { thinkingBudget: -1 } });
|
|
689
|
+
|
|
690
|
+
// Fixed budget (in tokens)
|
|
691
|
+
new Chat({ thinkingConfig: { thinkingBudget: 2048 } });
|
|
692
|
+
|
|
693
|
+
// Use ThinkingLevel enum
|
|
694
|
+
import { ThinkingLevel } from 'ak-gemini';
|
|
695
|
+
new Chat({ thinkingConfig: { thinkingLevel: ThinkingLevel.LOW } });
|
|
696
|
+
```
|
|
697
|
+
|
|
698
|
+
**When to enable thinking**: Complex reasoning, math, multi-step logic, code generation. **When to disable**: Simple classification, extraction, or chat where speed matters.
|
|
699
|
+
|
|
700
|
+
---
|
|
701
|
+
|
|
702
|
+
## Error Handling & Retries
|
|
703
|
+
|
|
704
|
+
### Transformer Retries
|
|
705
|
+
|
|
706
|
+
The Transformer has built-in retry with exponential backoff when validation fails:
|
|
707
|
+
|
|
708
|
+
```javascript
|
|
709
|
+
const t = new Transformer({
|
|
710
|
+
maxRetries: 3, // default: 3
|
|
711
|
+
retryDelay: 1000 // default: 1000ms, doubles each retry
|
|
712
|
+
});
|
|
713
|
+
```
|
|
714
|
+
|
|
715
|
+
Each retry feeds the validation error back to the model, giving it a chance to self-correct. The `usage` object reports cumulative tokens across all attempts:
|
|
716
|
+
|
|
717
|
+
```javascript
|
|
718
|
+
const result = await t.send(payload, {}, validator);
|
|
719
|
+
const usage = t.getLastUsage();
|
|
720
|
+
console.log(usage.attempts); // 2 = needed one retry
|
|
721
|
+
```
|
|
722
|
+
|
|
723
|
+
### Rate Limiting (429 Errors)
|
|
724
|
+
|
|
725
|
+
The Gemini API returns 429 when rate limited. ak-gemini does not auto-retry 429s — handle them in your application layer:
|
|
726
|
+
|
|
727
|
+
```javascript
|
|
728
|
+
async function sendWithBackoff(instance, payload, maxRetries = 3) {
|
|
729
|
+
for (let i = 0; i < maxRetries; i++) {
|
|
730
|
+
try {
|
|
731
|
+
return await instance.send(payload);
|
|
732
|
+
} catch (err) {
|
|
733
|
+
if (err.status === 429 && i < maxRetries - 1) {
|
|
734
|
+
await new Promise(r => setTimeout(r, 2 ** i * 1000));
|
|
735
|
+
continue;
|
|
736
|
+
}
|
|
737
|
+
throw err;
|
|
738
|
+
}
|
|
739
|
+
}
|
|
740
|
+
}
|
|
741
|
+
```
|
|
742
|
+
|
|
743
|
+
### CodeAgent Failure Limits
|
|
744
|
+
|
|
745
|
+
CodeAgent tracks consecutive failed executions. After `maxRetries` (default: 3) consecutive failures, the model summarizes what went wrong and asks for guidance:
|
|
746
|
+
|
|
747
|
+
```javascript
|
|
748
|
+
new CodeAgent({
|
|
749
|
+
maxRetries: 5, // allow more failures before stopping
|
|
750
|
+
});
|
|
751
|
+
```
|
|
752
|
+
|
|
753
|
+
---
|
|
754
|
+
|
|
755
|
+
## Performance Tips

### Reuse Instances

Each instance maintains a chat session. Creating a new instance for every request wastes the system prompt tokens. Reuse instances when possible:

```javascript
// Bad — creates a new session every call
app.post('/classify', async (req, res) => {
  const msg = new Message({ systemPrompt: '...' }); // new instance every request!
  const result = await msg.send(req.body.text);
  res.json(result);
});

// Good — reuse the instance
const classifier = new Message({ systemPrompt: '...' });
app.post('/classify', async (req, res) => {
  const result = await classifier.send(req.body.text);
  res.json(result);
});
```

### Choose the Right Model

| Model | Speed | Cost | Best For |
|---|---|---|---|
| `gemini-2.0-flash-lite` | Fastest | Cheapest | Classification, extraction, simple tasks |
| `gemini-2.0-flash` | Fast | Low | General purpose, good quality |
| `gemini-2.5-flash` | Medium | Low | Best balance of speed and quality |
| `gemini-2.5-pro` | Slow | High | Complex reasoning, code, analysis |

### Use `Message` for Stateless Workloads

`Message` uses `generateContent()` under the hood — no chat session overhead. For pipelines processing thousands of items independently, `Message` is the right choice.

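For high-volume pipelines, a small concurrency cap keeps you under rate limits while still parallelizing. A minimal sketch: `mapWithConcurrency` is a generic helper (not part of ak-gemini), and the `summarizer` instance in the usage comment is a hypothetical `Message` configured as shown above.

```javascript
// Process many independent items with at most `limit` requests in flight.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim the next index, then await its work
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// Usage (assumes a reused `summarizer` Message instance):
// const summaries = await mapWithConcurrency(docs, 5, (d) => summarizer.send(d));
```

Results come back in input order regardless of completion order, which keeps downstream joins simple.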
### Use `localFiles` / `localData` over `remoteFiles`

For text-based content, `localFiles` and `localData` skip the Files API upload entirely. They're faster to initialize and don't require network calls for the file upload step.

### Disable Thinking for Simple Tasks

Thinking tokens cost money and add latency. For classification, extraction, or simple formatting tasks, keep `thinkingBudget: 0` (the default).

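When one service mixes simple and complex calls, give each instance its own budget. A sketch using the `thinkingConfig` shape from the constructor-options reference; the `1024` budget is an arbitrary illustration, not a recommended value.

```javascript
import { Message } from 'ak-gemini';

// Cheap extraction: no thinking tokens (the default).
const extractor = new Message({
  modelName: 'gemini-2.0-flash-lite',
  thinkingConfig: { thinkingBudget: 0 },
});

// Heavier analysis: allow a thinking budget (1024 is illustrative).
const analyst = new Message({
  modelName: 'gemini-2.5-pro',
  thinkingConfig: { thinkingBudget: 1024 },
});
```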
---

## Common Integration Patterns

### Pattern: API Endpoint Classifier

```javascript
import { Message } from 'ak-gemini';

const classifier = new Message({
  modelName: 'gemini-2.0-flash-lite', // fast + cheap
  systemPrompt: 'Classify support tickets. Respond with exactly one of: billing, technical, account, other.',
});

app.post('/api/classify-ticket', async (req, res) => {
  const result = await classifier.send(req.body.text);
  res.json({ category: result.text.trim().toLowerCase() });
});
```

### Pattern: ETL Pipeline with Validation

```javascript
import { Transformer } from 'ak-gemini';

const normalizer = new Transformer({
  sourceKey: 'RAW',
  targetKey: 'NORMALIZED',
  maxRetries: 3,
  asyncValidator: async (output) => {
    if (!output.email?.includes('@')) throw new Error('Invalid email');
    if (!output.name?.trim()) throw new Error('Name is required');
    return output;
  }
});

await normalizer.seed([
  { RAW: { nm: 'alice', mail: 'alice@co.com' }, NORMALIZED: { name: 'Alice', email: 'alice@co.com' } },
]);

for (const record of rawRecords) {
  const clean = await normalizer.send(record, { stateless: true });
  await db.insert('users', clean);
}
```

### Pattern: Conversational Assistant with Tools

```javascript
import { ToolAgent } from 'ak-gemini';

const assistant = new ToolAgent({
  systemPrompt: `You are a customer support agent for Acme Corp.
You can look up orders and issue refunds. Always confirm before issuing refunds.`,
  tools: [
    {
      name: 'lookup_order',
      description: 'Look up an order by ID or customer email',
      parametersJsonSchema: {
        type: 'object',
        properties: {
          order_id: { type: 'string' },
          email: { type: 'string' }
        }
      }
    },
    {
      name: 'issue_refund',
      description: 'Issue a refund for an order',
      parametersJsonSchema: {
        type: 'object',
        properties: {
          order_id: { type: 'string' },
          amount: { type: 'number' },
          reason: { type: 'string' }
        },
        required: ['order_id', 'amount', 'reason']
      }
    }
  ],
  toolExecutor: async (toolName, args) => {
    if (toolName === 'lookup_order') return await orderService.lookup(args);
    if (toolName === 'issue_refund') return await orderService.refund(args);
  },
  onBeforeExecution: async (toolName, args) => {
    // Only allow refunds under $100 without human approval
    if (toolName === 'issue_refund' && args.amount > 100) {
      return false;
    }
    return true;
  }
});

// In a chat endpoint
const result = await assistant.chat(userMessage);
```

### Pattern: Document Q&A Service

```javascript
import { RagAgent } from 'ak-gemini';

const docs = new RagAgent({
  localFiles: [
    './docs/getting-started.md',
    './docs/api-reference.md',
    './docs/faq.md',
  ],
  systemPrompt: 'You are a documentation assistant. Answer questions based on the docs. If the answer is not in the docs, say so.',
});

app.post('/api/ask', async (req, res) => {
  const result = await docs.chat(req.body.question);
  res.json({ answer: result.text, usage: result.usage });
});
```

### Pattern: Data-Grounded Analysis

```javascript
import { RagAgent } from 'ak-gemini';

const analyst = new RagAgent({
  modelName: 'gemini-2.5-pro', // use a smarter model for analysis
  localData: [
    { name: 'sales_q4', data: await db.query('SELECT * FROM sales WHERE quarter = 4') },
    { name: 'targets', data: await db.query('SELECT * FROM quarterly_targets') },
  ],
  systemPrompt: 'You are a business analyst. Analyze the provided data and answer questions with specific numbers.',
});

const result = await analyst.chat('Which regions missed their Q4 targets? By how much?');
```

### Pattern: Few-Shot Any Class

Every class supports `seed()` for few-shot learning — not just Transformer:

```javascript
import { Chat } from 'ak-gemini';

const chat = new Chat({ systemPrompt: 'You are a SQL expert.' });
await chat.seed([
  { PROMPT: 'Get all users', ANSWER: 'SELECT * FROM users;' },
  { PROMPT: 'Count orders by status', ANSWER: 'SELECT status, COUNT(*) FROM orders GROUP BY status;' },
]);

const result = await chat.send('Find users who signed up in the last 7 days');
// Model follows the SQL-only response pattern from the examples
```

---

## Quick Reference

### Imports

```javascript
// Named exports
import { Transformer, Chat, Message, ToolAgent, CodeAgent, RagAgent, BaseGemini, log } from 'ak-gemini';
import { extractJSON, attemptJSONRecovery } from 'ak-gemini';
import { ThinkingLevel, HarmCategory, HarmBlockThreshold } from 'ak-gemini';

// Default export (namespace)
import AI from 'ak-gemini';

// CommonJS
const { Transformer, Chat } = require('ak-gemini');
```

### Constructor Options (All Classes)

| Option | Type | Default |
|---|---|---|
| `modelName` | string | `'gemini-2.5-flash'` |
| `systemPrompt` | string \| null \| false | varies by class |
| `apiKey` | string | `GEMINI_API_KEY` env var |
| `vertexai` | boolean | `false` |
| `project` | string | `GOOGLE_CLOUD_PROJECT` env var |
| `location` | string | `'global'` |
| `chatConfig` | object | `{ temperature: 0.7, topP: 0.95, topK: 64 }` |
| `thinkingConfig` | object | `{ thinkingBudget: 0 }` |
| `maxOutputTokens` | number \| null | `50000` |
| `logLevel` | string | based on `NODE_ENV` |
| `labels` | object | `{}` (Vertex AI only) |

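Options can be combined freely on any class. A sketch wiring several values straight from the table above; the system prompt text is an arbitrary illustration.

```javascript
import { Chat } from 'ak-gemini';

// Explicitly setting the documented defaults alongside a custom prompt.
const chat = new Chat({
  modelName: 'gemini-2.5-flash',
  systemPrompt: 'You are a concise assistant.',
  chatConfig: { temperature: 0.7, topP: 0.95, topK: 64 },
  thinkingConfig: { thinkingBudget: 0 },
  maxOutputTokens: 50000,
});
```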
### Methods Available on All Classes

| Method | Returns | Description |
|---|---|---|
| `init(force?)` | `Promise<void>` | Initialize chat session |
| `seed(examples, opts?)` | `Promise<Array>` | Add few-shot examples |
| `getHistory()` | `Array` | Get conversation history |
| `clearHistory()` | `Promise<void>` | Clear conversation history |
| `getLastUsage()` | `UsageData \| null` | Token usage from last call |
| `estimate(payload)` | `Promise<{ inputTokens }>` | Estimate input tokens |
| `estimateCost(payload)` | `Promise<object>` | Estimate cost in dollars |
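Since `estimate()` is documented to return `{ inputTokens }`, a pre-flight budget check is easy to add. A sketch; the helper name and the 30,000-token budget are illustrative, not part of ak-gemini.

```javascript
// Refuse to send oversized payloads instead of paying for them.
// `instance` is any ak-gemini class instance exposing estimate() and send().
async function sendIfUnderBudget(instance, payload, maxInputTokens = 30_000) {
  const { inputTokens } = await instance.estimate(payload);
  if (inputTokens > maxInputTokens) {
    throw new Error(`Input too large: ${inputTokens} tokens (budget: ${maxInputTokens})`);
  }
  return instance.send(payload);
}
```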
package/package.json
CHANGED

```diff
@@ -2,7 +2,7 @@
   "name": "ak-gemini",
   "author": "ak@mixpanel.com",
   "description": "AK's Generative AI Helper for doing... everything",
-  "version": "2.0.1",
+  "version": "2.0.2",
   "main": "index.js",
   "files": [
     "index.js",
@@ -13,9 +13,11 @@
     "message.js",
     "tool-agent.js",
     "code-agent.js",
+    "rag-agent.js",
     "json-helpers.js",
     "types.d.ts",
-    "logger.js"
+    "logger.js",
+    "GUIDE.md"
   ],
   "types": "types.d.ts",
   "exports": {
```
package/rag-agent.js
ADDED

````javascript
/**
 * @fileoverview RagAgent class — AI agent for document & data Q&A.
 *
 * NOTE: This is not true RAG (no vector embeddings, chunking, or similarity
 * search). It uses long-context injection — all content is placed directly
 * into the model's context window. Named "RagAgent" because it serves the
 * same purpose in spirit: grounding AI responses in user-provided data.
 *
 * Supports three input types:
 * - remoteFiles: uploaded via Google Files API (PDFs, images, audio, video)
 * - localFiles: read from disk as text (md, json, csv, yaml, txt)
 * - localData: in-memory objects serialized as JSON
 */

import { resolve, basename, extname } from 'node:path';
import { readFile } from 'node:fs/promises';
import BaseGemini from './base.js';
import log from './logger.js';

/** @type {Record<string, string>} */
const MIME_TYPES = {
  // Text
  '.txt': 'text/plain', '.md': 'text/plain', '.csv': 'text/csv',
  '.html': 'text/html', '.htm': 'text/html', '.xml': 'text/xml',
  '.json': 'application/json', '.js': 'text/javascript', '.mjs': 'text/javascript',
  '.ts': 'text/plain', '.css': 'text/css', '.yaml': 'text/plain', '.yml': 'text/plain',
  '.py': 'text/x-python', '.rb': 'text/plain', '.sh': 'text/plain',
  // Documents
  '.pdf': 'application/pdf',
  '.doc': 'application/msword',
  '.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  // Images
  '.png': 'image/png', '.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
  '.gif': 'image/gif', '.webp': 'image/webp', '.svg': 'image/svg+xml',
  // Audio
  '.mp3': 'audio/mpeg', '.wav': 'audio/wav', '.ogg': 'audio/ogg',
  '.flac': 'audio/flac', '.aac': 'audio/aac',
  // Video
  '.mp4': 'video/mp4', '.webm': 'video/webm', '.avi': 'video/x-msvideo',
  '.mov': 'video/quicktime', '.mkv': 'video/x-matroska',
};

/**
 * @typedef {import('./types').RagAgentOptions} RagAgentOptions
 * @typedef {import('./types').RagResponse} RagResponse
 * @typedef {import('./types').RagStreamEvent} RagStreamEvent
 * @typedef {import('./types').LocalDataEntry} LocalDataEntry
 */

const DEFAULT_SYSTEM_PROMPT =
  'You are a helpful AI assistant. Answer questions based on the provided documents and data. ' +
  'When referencing information, mention which document or data source it comes from.';

const FILE_POLL_INTERVAL_MS = 2000;
const FILE_POLL_TIMEOUT_MS = 60_000;

/**
 * AI agent that answers questions grounded in user-provided documents and data.
 * Supports three input types:
 * - `remoteFiles` — uploaded via Google Files API (PDFs, images, audio, video)
 * - `localFiles` — read from disk as text (md, json, csv, yaml, txt)
 * - `localData` — in-memory objects serialized as JSON
 *
 * @example
 * ```javascript
 * import { RagAgent } from 'ak-gemini';
 *
 * const agent = new RagAgent({
 *   remoteFiles: ['./report.pdf', './diagram.png'],
 *   localFiles: ['./docs/api.md', './config.yaml'],
 *   localData: [
 *     { name: 'users', data: [{ id: 1, name: 'Alice' }] },
 *   ],
 * });
 *
 * const result = await agent.chat('What does the API doc say about auth?');
 * console.log(result.text);
 *
 * // Streaming
 * for await (const event of agent.stream('Summarize the report')) {
 *   if (event.type === 'text') process.stdout.write(event.text);
 * }
 * ```
 */
class RagAgent extends BaseGemini {
  /**
   * @param {RagAgentOptions} [options={}]
   */
  constructor(options = {}) {
    if (options.systemPrompt === undefined) {
      options = { ...options, systemPrompt: DEFAULT_SYSTEM_PROMPT };
    }

    super(options);

    this.remoteFiles = options.remoteFiles || [];
    this.localFiles = options.localFiles || [];
    this.localData = options.localData || [];
    this._uploadedRemoteFiles = [];
    this._localFileContents = [];
    this._initialized = false;

    const total = this.remoteFiles.length + this.localFiles.length + this.localData.length;
    log.debug(`RagAgent created with ${total} context sources`);
  }

  // ── Initialization ───────────────────────────────────────────────────────

  /**
   * Uploads remote files, reads local files, and seeds all context into the chat.
   * @param {boolean} [force=false]
   * @returns {Promise<void>}
   */
  async init(force = false) {
    if (this._initialized && !force) return;

    // 1. Upload remote files via Files API
    this._uploadedRemoteFiles = [];
    for (const filePath of this.remoteFiles) {
      const resolvedPath = resolve(filePath);
      log.debug(`Uploading remote file: ${resolvedPath}`);

      const ext = extname(resolvedPath).toLowerCase();
      const mimeType = MIME_TYPES[ext] || 'application/octet-stream';

      const uploaded = await this.genAIClient.files.upload({
        file: resolvedPath,
        config: { displayName: basename(resolvedPath), mimeType }
      });

      await this._waitForFileActive(uploaded);

      this._uploadedRemoteFiles.push({
        ...uploaded,
        originalPath: resolvedPath
      });

      log.debug(`File uploaded: ${uploaded.displayName} (${uploaded.mimeType})`);
    }

    // 2. Read local files from disk
    this._localFileContents = [];
    for (const filePath of this.localFiles) {
      const resolvedPath = resolve(filePath);
      log.debug(`Reading local file: ${resolvedPath}`);

      const content = await readFile(resolvedPath, 'utf-8');
      this._localFileContents.push({
        name: basename(resolvedPath),
        content,
        path: resolvedPath
      });

      log.debug(`Local file read: ${basename(resolvedPath)} (${content.length} chars)`);
    }

    // 3. Set system instruction and create chat session
    this.chatConfig.systemInstruction = /** @type {string} */ (this.systemPrompt);
    await super.init(force);

    // 4. Build unified context parts and seed into chat history
    /** @type {Array<Object>} */
    const parts = [];

    // Remote file references
    for (const f of this._uploadedRemoteFiles) {
      parts.push({ fileData: { fileUri: f.uri, mimeType: f.mimeType } });
    }

    // Local file contents
    for (const lf of this._localFileContents) {
      parts.push({ text: `--- File: ${lf.name} ---\n${lf.content}` });
    }

    // Local data entries
    for (const ld of this.localData) {
      const serialized = typeof ld.data === 'string' ? ld.data : JSON.stringify(ld.data, null, 2);
      parts.push({ text: `--- Data: ${ld.name} ---\n${serialized}` });
    }

    if (parts.length > 0) {
      parts.push({ text: 'Here are the documents and data to analyze.' });

      const history = [
        { role: 'user', parts },
        { role: 'model', parts: [{ text: 'I have reviewed all the provided documents and data. I am ready to answer your questions about them.' }] }
      ];

      this.chatSession = this._createChatSession(history);
    }

    this._initialized = true;
    log.debug(`RagAgent initialized with ${this._uploadedRemoteFiles.length} remote files, ${this._localFileContents.length} local files, ${this.localData.length} data entries`);
  }

  // ── Non-Streaming Chat ───────────────────────────────────────────────────

  /**
   * Send a message and get a complete response grounded in the loaded context.
   *
   * @param {string} message - The user's question
   * @param {Object} [opts={}] - Per-message options
   * @param {Record<string, string>} [opts.labels] - Per-message billing labels
   * @returns {Promise<RagResponse>}
   */
  async chat(message, opts = {}) {
    if (!this._initialized) await this.init();

    const response = await this.chatSession.sendMessage({ message });

    this._captureMetadata(response);

    this._cumulativeUsage = {
      promptTokens: this.lastResponseMetadata.promptTokens,
      responseTokens: this.lastResponseMetadata.responseTokens,
      totalTokens: this.lastResponseMetadata.totalTokens,
      attempts: 1
    };

    return {
      text: response.text || '',
      usage: this.getLastUsage()
    };
  }

  // ── Streaming ────────────────────────────────────────────────────────────

  /**
   * Send a message and stream the response as events.
   *
   * @param {string} message - The user's question
   * @param {Object} [opts={}] - Per-message options
   * @yields {RagStreamEvent}
   */
  async *stream(message, opts = {}) {
    if (!this._initialized) await this.init();

    let fullText = '';
    const streamResponse = await this.chatSession.sendMessageStream({ message });

    for await (const chunk of streamResponse) {
      if (chunk.candidates?.[0]?.content?.parts?.[0]?.text) {
        const text = chunk.candidates[0].content.parts[0].text;
        fullText += text;
        yield { type: 'text', text };
      }
    }

    yield {
      type: 'done',
      fullText,
      usage: this.getLastUsage()
    };
  }

  // ── Context Management ──────────────────────────────────────────────────

  /**
   * Add remote files (uploaded via Files API). Triggers reinitialize.
   * @param {string[]} paths
   * @returns {Promise<void>}
   */
  async addRemoteFiles(paths) {
    this.remoteFiles.push(...paths);
    await this.init(true);
  }

  /**
   * Add local text files (read from disk). Triggers reinitialize.
   * @param {string[]} paths
   * @returns {Promise<void>}
   */
  async addLocalFiles(paths) {
    this.localFiles.push(...paths);
    await this.init(true);
  }

  /**
   * Add in-memory data entries. Triggers reinitialize.
   * @param {LocalDataEntry[]} entries
   * @returns {Promise<void>}
   */
  async addLocalData(entries) {
    this.localData.push(...entries);
    await this.init(true);
  }

  /**
   * Returns metadata about all context sources.
   * @returns {{ remoteFiles: Array<Object>, localFiles: Array<Object>, localData: Array<Object> }}
   */
  getContext() {
    return {
      remoteFiles: this._uploadedRemoteFiles.map(f => ({
        name: f.name,
        displayName: f.displayName,
        mimeType: f.mimeType,
        sizeBytes: f.sizeBytes,
        uri: f.uri,
        originalPath: f.originalPath
      })),
      localFiles: this._localFileContents.map(lf => ({
        name: lf.name,
        path: lf.path,
        size: lf.content.length
      })),
      localData: this.localData.map(ld => ({
        name: ld.name,
        type: typeof ld.data === 'object' && ld.data !== null
          ? (Array.isArray(ld.data) ? 'array' : 'object')
          : typeof ld.data
      }))
    };
  }

  // ── Private Helpers ──────────────────────────────────────────────────────

  /**
   * Polls until an uploaded file reaches ACTIVE state.
   * @param {Object} file - The uploaded file object
   * @returns {Promise<void>}
   * @private
   */
  async _waitForFileActive(file) {
    if (file.state === 'ACTIVE') return;

    const start = Date.now();
    while (Date.now() - start < FILE_POLL_TIMEOUT_MS) {
      const updated = await this.genAIClient.files.get({ name: file.name });
      if (updated.state === 'ACTIVE') return;
      if (updated.state === 'FAILED') {
        throw new Error(`File processing failed: ${file.displayName || file.name}`);
      }
      await new Promise(r => setTimeout(r, FILE_POLL_INTERVAL_MS));
    }
    throw new Error(`File processing timed out after ${FILE_POLL_TIMEOUT_MS / 1000}s: ${file.displayName || file.name}`);
  }
}

export default RagAgent;
````