PyPI - agent-scaffold-cli - Versions diffs - 0.1.1__py3-none-any.whl - Mend

agent-scaffold-cli 0.1.1__py3-none-any.whl


Files changed (66)

agent_scaffold/_bundled_deployments/docs/cross-cutting/observability.md ADDED

@@ -0,0 +1,259 @@
+# Cross-cutting: Observability
+**Concern:** Trace every LLM call, tool invocation, and agent step so you can debug, optimize, and audit agent behavior.
+**Library:** Langfuse (self-hosted, MIT)
+**Lives in:** Inline below (formerly `common/python/agent_common/observability/` and `common/typescript/src/observability/`)
+## What it provides
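+- **Singleton client** -- `get_langfuse()` (Py) creates the Langfuse client on first call and reuses it across the app. `createLangfuseClient()` (TS) currently only stores the configuration (see Reference Implementation).
+- **Trace decorator** -- `@traced("name")` (Py) wraps any function in a Langfuse trace and span with automatic error capture. `traced("name", fn)` (TS) wraps an async function and records timing and status in a local span object; the reference implementation does not yet send that span to Langfuse.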
+- **Async support** -- The Python decorator auto-detects sync vs async functions. The TS version is async-native.
+- **Error propagation** -- Exceptions are recorded on the span (level=ERROR, status_message) and re-raised. Tracing never swallows errors.
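Review note: the decorator behavior listed above (sync/async detection, error capture, re-raise) can be sketched without a Langfuse server. This is a minimal illustration, not the packaged implementation: `RECORDED_SPANS` is a hypothetical in-memory sink standing in for the Langfuse client, and the field names only mirror the ones the bullets mention.

```python
import functools
import inspect
from typing import Any, Callable, Dict, List, Optional

# Hypothetical in-memory sink standing in for the Langfuse client.
RECORDED_SPANS: List[Dict[str, Any]] = []


def traced(name: Optional[str] = None) -> Callable:
    """Record one span per call for sync or async functions; always re-raise."""

    def decorator(fn: Callable) -> Callable:
        span_name = name or fn.__name__

        def record(**fields: Any) -> None:
            RECORDED_SPANS.append({"name": span_name, **fields})

        @functools.wraps(fn)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = await fn(*args, **kwargs)
            except Exception as exc:
                record(level="ERROR", status_message=str(exc))
                raise  # the caller still sees the original exception
            record(level="DEFAULT", output=str(result)[:500])
            return result

        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                record(level="ERROR", status_message=str(exc))
                raise
            record(level="DEFAULT", output=str(result)[:500])
            return result

        # Choose the wrapper that matches the decorated function.
        return async_wrapper if inspect.iscoroutinefunction(fn) else sync_wrapper

    return decorator


@traced("add")
def add(a: int, b: int) -> int:
    return a + b


@traced()
async def fail() -> None:
    raise ValueError("boom")
```

Calling `add(2, 3)` appends a span with `level="DEFAULT"`; awaiting `fail()` appends an `ERROR` span named after the function and still raises `ValueError`.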
+## How to use
+### Python
+```python
+from agent_common.observability import get_langfuse, traced
+# Initialize (typically in app lifespan)
+langfuse = get_langfuse(
+    public_key="pk-lf-local",
+    secret_key="sk-lf-local",
+    host="http://localhost:3000",
+)
+# Trace a function
+@traced("answer_question")
+async def answer_question(question: str) -> str:
+    # LLM call, tool use, etc.
+    return result
+```
+### TypeScript
+```typescript
+import { createLangfuseClient, traced } from "@agent-deployments/common";
+// Initialize
+createLangfuseClient({
+  publicKey: "pk-lf-local",
+  secretKey: "sk-lf-local",
+  host: "http://localhost:3000",
+});
+// Trace a function
+const answer = await traced("answer_question", async () => {
+  // LLM call, tool use, etc.
+  return result;
+});
+```
+### Nesting spans
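+Create child spans on a trace for granular visibility. `@traced` opens its own trace and does not expose it, so stacking it on this function would record a second, separate trace; the example manages the trace directly instead:
+```python
+async def rag_pipeline(question: str) -> str:
+    client = get_langfuse()
+    trace = client.trace(name="rag_pipeline")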
+    # Child span for retrieval
+    retrieval_span = trace.span(name="retrieve_chunks")
+    chunks = await retrieve(question)
+    retrieval_span.end(output=f"{len(chunks)} chunks")
+    # Child span for generation
+    gen_span = trace.span(name="generate_answer")
+    answer = await generate(question, chunks)
+    gen_span.end(output=answer[:200])
+    return answer
+```
+## Tests
+Test that the observability fixtures work with mocked Langfuse (Py). Test traced() wrapper behavior for both success and error paths (TS).
+## Configuration via env
+| Var | Default | Effect |
+|-----|---------|--------|
+| `LANGFUSE_PUBLIC_KEY` | `pk-lf-local` | Project public key for the Langfuse API |
+| `LANGFUSE_SECRET_KEY` | `sk-lf-local` | Project secret key for the Langfuse API |
+| `LANGFUSE_HOST` | `http://localhost:3000` | Langfuse server URL |
+These are set in each prototype's `.env.example` and validated at boot via `settings.py` / `config.ts`.
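Review note: the doc says these variables are validated at boot but does not show how. A minimal sketch of one way to do it with the standard library follows; `LangfuseSettings` and `load_langfuse_settings` are hypothetical names, and the real `settings.py` may use a different mechanism entirely.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class LangfuseSettings:  # hypothetical name, not taken from the package
    public_key: str
    secret_key: str
    host: str


def load_langfuse_settings(env: Mapping[str, str] = os.environ) -> LangfuseSettings:
    """Read the three variables, apply the documented defaults, then validate."""
    settings = LangfuseSettings(
        public_key=env.get("LANGFUSE_PUBLIC_KEY", "pk-lf-local"),
        secret_key=env.get("LANGFUSE_SECRET_KEY", "sk-lf-local"),
        host=env.get("LANGFUSE_HOST", "http://localhost:3000"),
    )
    if not settings.public_key or not settings.secret_key:
        raise ValueError("Langfuse keys must not be empty")
    parsed = urlparse(settings.host)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"LANGFUSE_HOST is not a valid URL: {settings.host!r}")
    return settings
```

Failing at startup on a malformed host keeps a typo in `.env` from surfacing later as silently missing traces.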
+## Viewing traces
+With `docker compose up`, Langfuse is available at `http://localhost:3000`:
+- Default login: `admin@local.dev` / `admin`
+- Project: `default` (auto-created via init env vars in `docker-compose.base.yml`)
+- Each request generates a trace with spans for agent steps, tool calls, and LLM invocations
+## Swapping to LangSmith
+For teams already using LangChain/LangGraph heavily, LangSmith is a viable alternative:
+1. Replace `langfuse` dependency with `langsmith`
+2. Replace `get_langfuse()` / `@traced` with LangSmith's `@traceable` decorator
+3. Set `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` in env
+4. Remove Langfuse services from `docker-compose.yml`
+This is a **multi-file swap** (common module + env config + docker-compose).
+## Reference Implementation
+<details>
+<summary>Python — <code>langfuse.py</code></summary>
+```python
+"""Langfuse client singleton and trace decorator."""
+import asyncio
+import functools
+from typing import Any, Callable
+from langfuse import Langfuse
+_client: Langfuse | None = None
+def get_langfuse(
+    *,
+    public_key: str | None = None,
+    secret_key: str | None = None,
+    host: str = "http://localhost:3000",
+) -> Langfuse:
+    """Get or create the Langfuse singleton client."""
+    global _client
+    if _client is None:
+        _client = Langfuse(
+            public_key=public_key,
+            secret_key=secret_key,
+            host=host,
+        )
+    return _client
+def traced(
+    name: str | None = None,
+    *,
+    metadata: dict[str, Any] | None = None,
+) -> Callable:
+    """Decorator that wraps a function in a Langfuse trace span."""
+    def decorator(fn: Callable) -> Callable:
+        span_name = name or fn.__name__
+        @functools.wraps(fn)
+        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
+            client = get_langfuse()
+            trace = client.trace(name=span_name, metadata=metadata or {})
+            span = trace.span(name=span_name)
+            try:
+                result = await fn(*args, **kwargs)
+                span.end(output=str(result)[:500])
+                return result
+            except Exception as exc:
+                span.end(level="ERROR", status_message=str(exc))
+                raise
+        @functools.wraps(fn)
+        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
+            client = get_langfuse()
+            trace = client.trace(name=span_name, metadata=metadata or {})
+            span = trace.span(name=span_name)
+            try:
+                result = fn(*args, **kwargs)
+                span.end(output=str(result)[:500])
+                return result
+            except Exception as exc:
+                span.end(level="ERROR", status_message=str(exc))
+                raise
+        if asyncio.iscoroutinefunction(fn):
+            return async_wrapper
+        return sync_wrapper
+    return decorator
+```
+</details>
+<details>
+<summary>TypeScript — <code>langfuse.ts</code></summary>
+```typescript
+/**
+ * Langfuse client wrapper and trace utilities.
+ *
+ * Note: This is a lightweight wrapper. The actual Langfuse SDK should be
+ * installed in each prototype that needs it. This module provides the
+ * configuration shape and a traced() helper pattern.
+ */
+export interface LangfuseConfig {
+  publicKey: string;
+  secretKey: string;
+  host?: string;
+}
+interface TraceSpan {
+  name: string;
+  startTime: number;
+  endTime?: number;
+  metadata?: Record<string, unknown>;
+  status?: "ok" | "error";
+  error?: string;
+}
+let _config: LangfuseConfig | null = null;
+/**
+ * Initialize the Langfuse client configuration.
+ */
+export function createLangfuseClient(config: LangfuseConfig): LangfuseConfig {
+  _config = config;
+  return _config;
+}
+/**
+ * Decorator-style wrapper that traces a function execution.
+ *
+ * Usage:
+ *   const result = await traced("my-operation", async () => {
+ *     return doSomething();
+ *   });
+ */
+export async function traced<T>(
+  name: string,
+  fn: () => Promise<T>,
+  metadata?: Record<string, unknown>,
+): Promise<T> {
+  const span: TraceSpan = {
+    name,
+    startTime: Date.now(),
+    metadata,
+  };
+  try {
+    const result = await fn();
+    span.endTime = Date.now();
+    span.status = "ok";
+    return result;
+  } catch (error) {
+    span.endTime = Date.now();
+    span.status = "error";
+    span.error = error instanceof Error ? error.message : String(error);
+    throw error;
+  }
+}
+```
+</details>

agent_scaffold/_bundled_deployments/docs/cross-cutting/rate-limiting.md ADDED

@@ -0,0 +1,171 @@
+# Cross-cutting: Rate Limiting
+**Concern:** Protect agent endpoints from abuse with per-user and per-IP request throttling.
+**Library:** `slowapi` (Py) / custom fixed-window middleware (TS)
+**Lives in:** Inline below (formerly `common/python/agent_common/ratelimit/` and `common/typescript/src/ratelimit/`)
+## What it provides
+- **Python:** `build_limiter(redis_url, default_limit)` returns a configured `slowapi.Limiter` instance backed by Redis. Integrates with FastAPI via `app.state.limiter` and the `@limiter.limit()` decorator.
+- **TypeScript:** `buildRateLimiter(config)` returns a function `(key: string) => RateLimitResult` that checks a fixed-window counter (the window starts at a key's first request and resets once it expires). Currently in-memory; swap to Redis for distributed deployments.
+## How to use
+### Python (FastAPI + slowapi)
+```python
+from fastapi import FastAPI, Request
+from agent_common.ratelimit import build_limiter
+from slowapi import _rate_limit_exceeded_handler
+from slowapi.errors import RateLimitExceeded
+limiter = build_limiter(redis_url="redis://localhost:6379", default_limit="60/minute")
+app = FastAPI()
+app.state.limiter = limiter
+app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)
+@app.post("/query")
+@limiter.limit("30/minute")  # Override default for this endpoint
+async def query(request: Request):
+    ...
+```
+The key function defaults to `get_remote_address` (client IP). For per-user limiting, pass a custom key function that extracts the user ID from the JWT.
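Review note: a per-user key function could look like the sketch below. It assumes, hypothetically, that an authentication middleware has already verified the JWT and stored the subject on `request.state.user_id`; the function never parses the raw token itself. `user_or_ip_key` is an illustrative name, and the `SimpleNamespace` objects are stand-ins for real request objects.

```python
from types import SimpleNamespace
from typing import Any


def user_or_ip_key(request: Any) -> str:
    """Prefer an authenticated user ID; fall back to the client IP.

    Assumption: upstream auth middleware sets request.state.user_id after
    verifying the token. Nothing here validates credentials.
    """
    user_id = getattr(getattr(request, "state", None), "user_id", None)
    if user_id:
        return f"user:{user_id}"
    client = getattr(request, "client", None)
    host = getattr(client, "host", None) or "127.0.0.1"
    return f"ip:{host}"


# Stand-ins for request objects, for illustration only.
authed = SimpleNamespace(
    state=SimpleNamespace(user_id="u-42"),
    client=SimpleNamespace(host="10.0.0.7"),
)
anonymous = SimpleNamespace(
    state=SimpleNamespace(),
    client=SimpleNamespace(host="10.0.0.7"),
)
```

Prefixing the key with `user:` or `ip:` keeps the two namespaces from colliding. Note that `build_limiter` as shown in the reference implementation hardcodes `get_remote_address`, so it would need a `key_func` parameter to accept a function like this.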
+### TypeScript (Hono)
+```typescript
+import { buildRateLimiter } from "@agent-deployments/common";
+const checkLimit = buildRateLimiter({
+  redisUrl: "redis://localhost:6379", // required by RateLimitConfig; unused by the in-memory implementation
+  maxRequests: 60,
+  windowSeconds: 60,
+});
+app.use("*", async (c, next) => {
+  const key = c.req.header("x-user-id") ?? c.req.header("x-forwarded-for") ?? "anon";
+  const result = checkLimit(key);
+  if (!result.allowed) {
+    return c.json({ error: "Rate limit exceeded" }, 429);
+  }
+  c.header("X-RateLimit-Remaining", String(result.remaining));
+  await next();
+});
+```
+## Configuration via env
+| Var | Default | Effect |
+|-----|---------|--------|
+| `REDIS_URL` | `redis://localhost:6379` | Redis instance for rate limit counters (Py) |
+| `default_limit` (argument to `build_limiter`, not an env var) | `60/minute` | Global default; override per-endpoint |
+## Suggested limits for agent endpoints
+| Endpoint type | Suggested limit | Rationale |
+|--------------|----------------|-----------|
+| `/query` (LLM call) | 10-30/minute | LLM calls are expensive and slow |
+| `/documents` (ingest) | 5/minute | Ingestion triggers chunking + embedding |
+| `/health` | Unlimited | Monitoring probes |
+## Tests
+Test limiter creation with Redis URL (Py). Test window behavior, allow/deny, and reset (TS).
+## Production considerations
+- The Python implementation is **production-ready** -- slowapi + Redis handles distributed rate limiting across multiple app instances.
+- The TypeScript implementation is **in-memory** -- fine for single-instance dev, but must be swapped to a Redis-backed store (e.g., `hono-rate-limiter` with `ioredis`) for multi-instance production.
+- Add `Retry-After` and `X-RateLimit-*` headers so clients can back off gracefully.
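Review note: the last bullet asks for `Retry-After` and `X-RateLimit-*` headers without showing how to derive them. The sketch below is one possible approach, written in Python for consistency with the other notes; it uses a sliding-window log (a per-key deque of timestamps), which is a different technique from the counter in the TypeScript reference implementation. `SlidingWindowLimiter` and `rate_limit_headers` are hypothetical names, and the state is in-memory, so the same single-instance caveat applies.

```python
import math
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict


@dataclass
class RateLimitResult:
    allowed: bool
    remaining: int
    retry_after: int  # whole seconds until a slot frees up; 0 when allowed


class SlidingWindowLimiter:
    """In-memory sliding-window log: one timestamp per accepted request."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = {}

    def check(self, key: str) -> RateLimitResult:
        now = self._clock()
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            # The oldest accepted request determines when a slot frees up.
            wait = math.ceil(hits[0] + self.window_seconds - now)
            return RateLimitResult(False, 0, max(1, wait))
        hits.append(now)
        return RateLimitResult(True, self.max_requests - len(hits), 0)


def rate_limit_headers(result: RateLimitResult, limit: int) -> Dict[str, str]:
    """Build response headers; Retry-After is only sent on a denied request."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(result.remaining),
    }
    if not result.allowed:
        headers["Retry-After"] = str(result.retry_after)
    return headers
```

Unlike the reference counter, denied requests are not recorded here, so a client that keeps retrying does not extend its own lockout.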
+## Reference Implementation
+<details>
+<summary>Python — <code>slowapi_setup.py</code></summary>
+```python
+"""Rate limiter setup using slowapi + Redis."""
+from slowapi import Limiter
+from slowapi.util import get_remote_address
+def build_limiter(
+    redis_url: str = "redis://localhost:6379",
+    *,
+    default_limit: str = "60/minute",
+) -> Limiter:
+    """Build a configured slowapi Limiter backed by Redis."""
+    return Limiter(
+        key_func=get_remote_address,
+        default_limits=[default_limit],
+        storage_uri=redis_url,
+    )
+```
+</details>
+<details>
+<summary>TypeScript — <code>ratelimit.ts</code></summary>
+```typescript
+/**
+ * Rate limiting utilities for Hono-based prototypes.
+ */
+export interface RateLimitConfig {
+  /** Redis URL for distributed rate limiting */
+  redisUrl: string;
+  /** Max requests per window */
+  maxRequests: number;
+  /** Window size in seconds */
+  windowSeconds: number;
+}
+export interface RateLimitResult {
+  allowed: boolean;
+  remaining: number;
+  resetAt: number;
+}
+/**
+ * Build a rate limiter function.
+ *
+ * Returns a function that checks whether a given key (e.g., user ID or IP)
+ * is within its rate limit. Uses a simple in-memory fixed window for now;
+ * Redis-backed implementation should be added per prototype.
+ */
+export function buildRateLimiter(config: RateLimitConfig) {
+  const windows = new Map<string, { count: number; resetAt: number }>();
+  return (key: string): RateLimitResult => {
+    const now = Date.now();
+    const entry = windows.get(key);
+    if (!entry || now >= entry.resetAt) {
+      windows.set(key, {
+        count: 1,
+        resetAt: now + config.windowSeconds * 1000,
+      });
+      return {
+        allowed: true,
+        remaining: config.maxRequests - 1,
+        resetAt: now + config.windowSeconds * 1000,
+      };
+    }
+    entry.count++;
+    const allowed = entry.count <= config.maxRequests;
+    return {
+      allowed,
+      remaining: Math.max(0, config.maxRequests - entry.count),
+      resetAt: entry.resetAt,
+    };
+  };
+}
+```
+</details>

agent_scaffold/_bundled_deployments/docs/cross-cutting/testing-strategy.md ADDED

@@ -0,0 +1,261 @@
+# Cross-cutting: Testing Strategy
+**Concern:** Three-tier test strategy that validates agent behavior without flaky LLM-dependent suites blocking CI.
+**Library:** `pytest` + `deepeval` (Py) / `vitest` (TS)
+**Lives in:** Inline below (formerly `common/python/agent_common/testing/` and `common/typescript/src/testing/`)
+## The three tiers
+```
+┌──────────────────────────────────────────────┐
+│  Tier 3: Eval (golden datasets)              │  main branch only, real LLM
+│  Faithfulness, relevancy, correctness        │
+├──────────────────────────────────────────────┤
+│  Tier 2: Integration (real LLM)              │  main branch only, ANTHROPIC_API_KEY
+│  End-to-end agent flow, actual model calls   │
+├──────────────────────────────────────────────┤
+│  Tier 1: Unit (mocked LLM)                   │  every PR, fast, deterministic
+│  Schema validation, tool logic, API routes   │
+└──────────────────────────────────────────────┘
+```
+| Tier | Runs on | LLM | Speed | What it validates |
+|------|---------|-----|-------|-------------------|
+| Unit | Every PR | Mocked | < 10s | Schemas, tool functions, route handlers, chunking logic |
+| Integration | Main only | Real | 30-60s | Full agent pipeline with actual model calls |
+| Eval | Main only | Real | 1-5min | Quality metrics on golden datasets |
+## Directory layout
+Every prototype follows this structure:
+```
+# Python
+tests/
+├── __init__.py
+├── unit/
+│   ├── __init__.py
+│   ├── test_api.py         # Route handler tests
+│   ├── test_schemas.py     # Request/response validation
+│   └── test_tools.py       # Tool function logic
+├── integration/
+│   └── __init__.py
+└── evals/
+    └── __init__.py
+# TypeScript
+tests/
+├── unit/
+│   ├── api.test.ts
+│   ├── schemas.test.ts
+│   └── tools.test.ts
+```
+## Shared test fixtures
+Shared fixtures provide a mock LLM response and a mock client so unit tests never hit a real model (see Reference Implementation below for the full source):
+### Python
+```python
+import asyncio
+from agent_common.testing import mock_llm_response, mock_llm_client
+# Single mock response
+response = mock_llm_response("The answer is 42", model="claude-sonnet-4-6")
+assert response.choices[0].message.content == "The answer is 42"
+# Mock client that cycles through predefined responses.
+# `await` needs a coroutine, so the check is wrapped in an async function.
+async def check_mock_client() -> None:
+    client = mock_llm_client(["Response 1", "Response 2"])
+    result = await client.chat.completions.create()
+    assert result.choices[0].message.content == "Response 1"
+asyncio.run(check_mock_client())
+```
+### TypeScript
+```typescript
+import { expect } from "vitest";
+import { mockLlmResponse, mockLlmClient } from "@agent-deployments/common";
+const response = mockLlmResponse("The answer is 42");
+expect(response.choices[0].message.content).toBe("The answer is 42");
+const client = mockLlmClient(["Response 1", "Response 2"]);
+const result = await client.chat.completions.create();
+expect(result.choices[0].message.content).toBe("Response 1");
+```
+## Running tests
+```bash
+# Unit tests (every PR)
+make test-unit PROTOTYPE=docs-rag-qa TRACK=python
+# Integration tests (needs ANTHROPIC_API_KEY)
+make test-integration PROTOTYPE=docs-rag-qa TRACK=python
+# Eval suite (needs ANTHROPIC_API_KEY)
+make eval PROTOTYPE=docs-rag-qa TRACK=python
+# All tests
+make test PROTOTYPE=docs-rag-qa TRACK=python
+```
+## CI behavior
+Defined in `.github/workflows/ci.yml`:
+- **On PR:** Unit tests run. Integration and eval are skipped (no API key, saves cost).
+- **On main:** Unit + integration + eval all run with `ANTHROPIC_API_KEY` from GitHub Secrets.
+- **Exit code 5** (no tests collected) is treated as success via `|| test $? -eq 5`, handling prototypes where integration/eval tests haven't been written yet.
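Review note: the exit-code handling in the last bullet can also be expressed in Python, for example in a helper script that wraps pytest. This is a sketch under that assumption; `normalize_pytest_exit` and `run_tier` are hypothetical names and are not part of the workflow file.

```python
import subprocess
import sys

# pytest exits with 5 when it collects no tests.
PYTEST_NO_TESTS_COLLECTED = 5


def normalize_pytest_exit(code: int) -> int:
    """Map 'no tests collected' to success; pass every other code through."""
    return 0 if code == PYTEST_NO_TESTS_COLLECTED else code


def run_tier(test_dir: str) -> int:
    """Run pytest on one tier directory (e.g. tests/integration)."""
    completed = subprocess.run([sys.executable, "-m", "pytest", test_dir])
    return normalize_pytest_exit(completed.returncode)
```

Real failures (exit code 1) and usage errors (exit code 4) still propagate, so an empty tier passes while a broken one does not.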
+## Eval datasets
+Each prototype includes `eval/dataset.jsonl` with golden input/output pairs:
+```jsonl
+{"input": "What is MCP?", "expected_output": "MCP is the Model Context Protocol...", "metadata": {}}
+```
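Review note: a loader for this format might look like the following sketch. It assumes only the three keys shown in the sample line, with `metadata` optional; `EvalCase` and `parse_eval_cases` are hypothetical names rather than part of the package.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class EvalCase:  # hypothetical container for one golden pair
    input: str
    expected_output: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_eval_cases(lines: Iterable[str]) -> List[EvalCase]:
    """Parse JSONL lines into EvalCase objects, skipping blank lines."""
    cases: List[EvalCase] = []
    for line_no, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        record = json.loads(raw)
        missing = {"input", "expected_output"} - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        cases.append(
            EvalCase(
                input=record["input"],
                expected_output=record["expected_output"],
                metadata=record.get("metadata") or {},
            )
        )
    return cases
```

Because the function takes any iterable of strings, it can be fed an open file handle directly, for example `parse_eval_cases(open("eval/dataset.jsonl", encoding="utf-8"))`.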
+## Security testing (Promptfoo)
+Each prototype includes `eval/promptfoo.yaml` for red-team scans:
+```yaml
+redteam:
+  plugins:
+    - prompt-injection
+    - jailbreak
+    - pii
+```
+Run via `make security PROTOTYPE=<name>`. Runs on main branch in CI.
+## Tests
+Validate that mock fixtures produce correct response shapes and that the mock client cycles through predefined responses.
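Review note: a test for the cycling behavior could be sketched as below. To stay self-contained it uses `make_cycling_client`, a condensed, hypothetical stand-in that follows the same call path and modulo rule as `mock_llm_client`; a real test would import the shared fixture instead.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List


def make_cycling_client(responses: List[str]) -> SimpleNamespace:
    """Condensed stand-in for mock_llm_client (illustration only)."""
    calls = 0

    async def create(**kwargs: Any) -> SimpleNamespace:
        nonlocal calls
        content = responses[calls % len(responses)]
        calls += 1
        message = SimpleNamespace(role="assistant", content=content, tool_calls=[])
        choice = SimpleNamespace(message=message, finish_reason="stop")
        return SimpleNamespace(choices=[choice])

    completions = SimpleNamespace(create=create)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


async def collect_contents(client: SimpleNamespace, n: int) -> List[str]:
    """Call the client n times and return the message contents in order."""
    contents = []
    for _ in range(n):
        response = await client.chat.completions.create()
        contents.append(response.choices[0].message.content)
    return contents


def test_client_cycles_and_wraps() -> None:
    client = make_cycling_client(["Response 1", "Response 2"])
    contents = asyncio.run(collect_contents(client, 3))
    # The third call wraps around to the first predefined response.
    assert contents == ["Response 1", "Response 2", "Response 1"]
```

Asking for one more response than was configured is what exercises the wrap-around; two calls alone would pass even if the client raised on exhaustion.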
+## Reference Implementation
+<details>
+<summary>Python — <code>fixtures.py</code></summary>
+```python
+"""Shared pytest fixtures and test utilities for agent-deployments prototypes."""
+from typing import Any
+from unittest.mock import AsyncMock, MagicMock
+def mock_llm_response(content: str = "Hello from mock LLM", **kwargs: Any) -> MagicMock:
+    """Create a mock LLM response object."""
+    message = MagicMock()
+    message.content = content
+    message.role = "assistant"
+    message.tool_calls = kwargs.get("tool_calls", [])
+    choice = MagicMock()
+    choice.message = message
+    choice.finish_reason = kwargs.get("finish_reason", "stop")
+    response = MagicMock()
+    response.choices = [choice]
+    response.model = kwargs.get("model", "mock-model")
+    response.usage = MagicMock(
+        prompt_tokens=kwargs.get("prompt_tokens", 10),
+        completion_tokens=kwargs.get("completion_tokens", 20),
+        total_tokens=kwargs.get("total_tokens", 30),
+    )
+    return response
+def mock_llm_client(responses: list[str] | None = None) -> AsyncMock:
+    """Create a mock async LLM client that returns predefined responses."""
+    _responses = responses or ["Mock response"]
+    _call_count = 0
+    async def _create(**kwargs: Any) -> MagicMock:
+        nonlocal _call_count
+        content = _responses[_call_count % len(_responses)]
+        _call_count += 1
+        return mock_llm_response(content, **kwargs)
+    client = AsyncMock()
+    client.chat.completions.create = _create
+    return client
+```
+</details>
+<details>
+<summary>TypeScript — <code>fixtures.ts</code></summary>
+```typescript
+/**
+ * Shared test utilities for agent-deployments prototypes.
+ */
+export interface MockLlmResponse {
+  choices: Array<{
+    message: { role: string; content: string; tool_calls?: unknown[] };
+    finish_reason: string;
+  }>;
+  model: string;
+  usage: {
+    prompt_tokens: number;
+    completion_tokens: number;
+    total_tokens: number;
+  };
+}
+/**
+ * Create a mock LLM response object.
+ */
+export function mockLlmResponse(
+  content = "Hello from mock LLM",
+  options: {
+    model?: string;
+    finishReason?: string;
+    toolCalls?: unknown[];
+  } = {},
+): MockLlmResponse {
+  return {
+    choices: [
+      {
+        message: {
+          role: "assistant",
+          content,
+          tool_calls: options.toolCalls ?? [],
+        },
+        finish_reason: options.finishReason ?? "stop",
+      },
+    ],
+    model: options.model ?? "mock-model",
+    usage: {
+      prompt_tokens: 10,
+      completion_tokens: 20,
+      total_tokens: 30,
+    },
+  };
+}
+/**
+ * Create a mock LLM client that returns predefined responses.
+ */
+export function mockLlmClient(responses: string[] = ["Mock response"]) {
+  let callCount = 0;
+  return {
+    chat: {
+      completions: {
+        create: async (): Promise<MockLlmResponse> => {
+          const content = responses[callCount % responses.length] ?? "";
+          callCount++;
+          return mockLlmResponse(content);
+        },
+      },
+    },
+  };
+}
+```
+</details>

agent_scaffold/_bundled_deployments/docs/frameworks/README.md ADDED

@@ -0,0 +1,22 @@
+# Frameworks
+Agent frameworks used in this repo. Each file answers: **"How do I implement the pattern?"**
+| Framework | Language | Best for | Used in |
+|-----------|----------|----------|---------|
+| [LangGraph](langgraph.md) | Python | Stateful graphs, multi-step, multi-agent | research-assistant, code-review, memory, hierarchical |
+| [Pydantic AI](pydantic-ai.md) | Python | Single agents, typed tools, simple ReAct | customer-support, docs-rag-qa, research-assistant |
+| [CrewAI](crewai.md) | Python | Multi-agent crews | ops-crew |
+| [Mastra](mastra.md) | TypeScript | Workflows, memory, multi-agent | Not yet used (documented as TS option) |
+| [Vercel AI SDK](vercel-ai-sdk.md) | TypeScript | Lightweight agents, streaming | All TS tracks |
+## How to pick a framework
+**Python track:**
+- Simple agent with tools → **Pydantic AI** (least boilerplate)
+- Complex state, multi-step, checkpointing → **LangGraph** (best state management)
+- Team of collaborating agents → **CrewAI** (purpose-built for crews)
+**TypeScript track:**
+- Most use cases → **Vercel AI SDK** (lightweight, production-proven)
+- Need workflows, memory, or multi-agent → **Mastra** (batteries included)