npm - maestro-bundle - Versions diffs - 1.3.1 → 1.5.0 - Mend

maestro-bundle 1.3.1 → 1.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (128)

package/templates/bundle-ai-agents/skills/memory-management/SKILL.md CHANGED

@@ -1,78 +1,112 @@
 ---
 name: memory-management
-description: Implementar memória de curto, médio e longo prazo para agentes usando LangGraph Store e checkpointers. Use quando precisar que agentes lembrem de interações anteriores, persistam estado, ou aprendam com execuções passadas.
+description: Implement short-term, medium-term, and long-term memory for AI agents using LangGraph Store and checkpointers. Use when agents need to remember past interactions, persist state, or learn from previous executions.
+version: 1.0.0
+author: Maestro
 ---
-# Gerenciamento de Memória
+# Memory Management
-## 3 Níveis de Memória
+Implement three tiers of agent memory -- short-term (context window), medium-term (checkpointer), and long-term (store) -- to enable persistent learning and state management.
-| Nível | Duração | Mecanismo | Exemplo |
-|---|---|---|---|
-| Curto prazo | 1 sessão | Context window | Mensagens da conversa atual |
-| Médio prazo | 1 demanda | Checkpointer | Estado entre nós do grafo |
-| Longo prazo | Permanente | Store | Padrões aprendidos, preferências |
+## When to Use
+- Agent needs to resume work after interruption
+- Agent should learn from past executions and avoid repeating mistakes
+- Persisting state between nodes in a LangGraph workflow
+- Storing and retrieving patterns learned across multiple demands
+- Implementing memory decay to remove stale or low-confidence knowledge
-## Curto Prazo — Context Window
+## Available Operations
+1. Configure short-term memory via context window
+2. Set up medium-term memory with LangGraph checkpointer
+3. Implement long-term memory with LangGraph Store
+4. Integrate memory into Deep Agent configuration
+5. Implement memory cleanup and decay policies
-Gerenciado automaticamente pelo LangGraph. Usar `add_messages` para acumular.
+## Multi-Step Workflow
+### Step 1: Set Up Short-Term Memory (Context Window)
+Short-term memory is automatic -- LangGraph accumulates messages within a session.
 ```python
+from typing import TypedDict, Annotated
+from langgraph.graph.message import add_messages
 class AgentState(TypedDict):
-    messages: Annotated[list, add_messages]  # Acumula automaticamente
+    messages: Annotated[list, add_messages]  # Accumulates automatically
 ```
-## Médio Prazo — Checkpointer
+No additional setup needed. Messages persist for the duration of a single invocation chain.
-Persiste o estado do grafo entre invocações da mesma demanda.
+### Step 2: Set Up Medium-Term Memory (Checkpointer)
+Persists graph state between invocations of the same demand. Enables resume after failure.
+```bash
+pip install langgraph-checkpoint-postgres psycopg
+```
 ```python
 from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver
+from langgraph.graph import StateGraph
 checkpointer = AsyncPostgresSaver.from_conn_string(DATABASE_URL)
 graph = StateGraph(OrchestratorState)
-# ... definir nós e edges ...
+# ... define nodes and edges ...
 app = graph.compile(checkpointer=checkpointer)
-# Usar thread_id consistente por demanda
+# Use consistent thread_id per demand
 config = {"configurable": {"thread_id": f"demand-{demand_id}"}}
 result = await app.ainvoke({"messages": [...]}, config=config)
-# Próxima invocação com mesmo thread_id retoma do estado salvo
-result2 = await app.ainvoke({"messages": [nova_msg]}, config=config)
+# Next invocation with same thread_id resumes from saved state
+result2 = await app.ainvoke({"messages": [new_msg]}, config=config)
+```
+Verify checkpointer is working:
+```bash
+psql $DATABASE_URL -c "SELECT thread_id, created_at FROM checkpoints ORDER BY created_at DESC LIMIT 5;"
 ```
-## Longo Prazo — Store
+### Step 3: Set Up Long-Term Memory (Store)
-Persiste conhecimento entre demandas diferentes.
+Persists knowledge across different demands. The agent remembers patterns, preferences, and learnings.
+```bash
+pip install langgraph-store-postgres
+```
 ```python
 from langgraph.store.postgres import AsyncPostgresStore
 store = AsyncPostgresStore.from_conn_string(DATABASE_URL)
-# Salvar aprendizado
+# Save a learned pattern
 await store.aput(
     namespace=("agent", "backend", "patterns"),
     key="spring-crud-pattern",
     value={
-        "pattern": "Usar record para DTO, entity para domínio",
+        "pattern": "Use record for DTO, entity for domain",
         "learned_from": "demand-123",
         "confidence": 0.95,
         "created_at": "2026-03-27"
     }
 )
-# Buscar aprendizados relevantes
+# Search for relevant learnings
 results = await store.asearch(
     ("agent", "backend", "patterns"),  # namespace prefix is passed positionally
-    query="como criar DTO para API REST",
+    query="how to create DTO for REST API",
     limit=5
 )
 ```
-## Memória no Deep Agent
+### Step 4: Integrate Memory into Deep Agent
+Wire all three memory tiers into a Deep Agent configuration.
 ```python
 from deepagents import create_deep_agent
@@ -85,22 +119,72 @@ agent = create_deep_agent(
     backend=FilesystemBackend(root_dir=".", virtual_mode=True),
     checkpointer=PostgresSaver(conn_string=DATABASE_URL),
     store=PostgresStore(conn_string=DATABASE_URL),
-    system_prompt="Você é um agente backend..."
+    system_prompt="You are a backend agent..."
 )
 ```
-## Limpeza de memória
+### Step 5: Implement Memory Cleanup and Decay
-Memórias envelhecem. Implementar decay:
+Memories age. Remove stale or low-confidence entries to keep the store relevant.
 ```python
+from datetime import datetime, timedelta
 async def cleanup_stale_memories(store, max_age_days: int = 90):
-    """Remove memórias antigas ou com baixa confiança"""
+    """Remove old or low-confidence memories."""
     cutoff = datetime.now() - timedelta(days=max_age_days)
     memories = await store.alist(namespace=("agent",))
+    removed = 0
     for mem in memories:
         if mem.value.get("created_at", "") < cutoff.isoformat():
             await store.adelete(namespace=mem.namespace, key=mem.key)
+            removed += 1
         elif mem.value.get("confidence", 1.0) < 0.3:
             await store.adelete(namespace=mem.namespace, key=mem.key)
+            removed += 1
+    return removed
 ```
+Run cleanup:
+```bash
+python -m memory.cleanup --max-age-days 90 --min-confidence 0.3
+```
+## Resources
+- `references/memory-tiers.md` - Detailed comparison of memory tiers with use cases
+- `references/namespace-conventions.md` - Naming conventions for store namespaces
+## Examples
+### Example 1: Enable Resume After Failure
+User asks: "Our agent crashes mid-task and loses all progress. Fix it."
+Response approach:
+1. Add PostgresSaver checkpointer to the agent's graph compilation
+2. Use `thread_id` based on the demand ID for consistent state
+3. On restart, invoke with the same thread_id -- LangGraph automatically resumes
+4. Verify with: `psql $DATABASE_URL -c "SELECT * FROM checkpoints WHERE thread_id='demand-xyz';"`
+### Example 2: Agent Should Learn From Past Work
+User asks: "The backend agent keeps making the same pagination mistake. Make it learn."
+Response approach:
+1. After each successful demand, save patterns to the Store
+2. Before each new task, search the Store for relevant learnings
+3. Inject top-3 relevant learnings into the agent's context
+4. Track confidence scores -- boost on positive feedback, decay on negative
+### Example 3: Clean Up Old Memories
+User asks: "The memory store has grown too large. Clean it up."
+Response approach:
+1. Run `memory.cleanup --max-age-days 90` to remove entries older than 90 days
+2. Remove entries with confidence below 0.3
+3. Audit remaining entries for duplicates
+4. Set up a weekly cron job for automatic cleanup
+## Notes
+- Always use `thread_id` based on a stable identifier (demand_id, session_id)
+- Checkpointer handles resume automatically -- no custom logic needed
+- Store namespaces should follow the convention: `("agent", agent_type, category)`
+- Memory cleanup should run on a schedule (weekly recommended)
+- Include `confidence` and `created_at` in all store entries for decay management
+- Long-term memories should be surfaced via retrieval, not dumped into the prompt
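
Review note (memory-management/SKILL.md, Step 5): the age check in `cleanup_stale_memories` compares ISO strings and defaults a missing `created_at` to `""`, so any entry without that field sorts before the cutoff and is removed. The sketch below shows one way to make the decision explicit, under the assumption that entries carry the `created_at` and `confidence` fields shown in the diff. `parse_created_at` and `should_remove` are hypothetical helper names, not part of maestro-bundle or LangGraph, and the predicate is deliberately independent of any store API.

```python
from datetime import datetime, timedelta, timezone


def parse_created_at(raw):
    """Parse an ISO date or datetime string; return None when missing or malformed."""
    if not raw:
        return None
    try:
        parsed = datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        return None
    # Assumption: naive timestamps are treated as UTC so comparisons never
    # mix naive and timezone-aware values.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def should_remove(value, now, max_age_days=90, min_confidence=0.3):
    """Return True when an entry is older than the cutoff or below the confidence floor."""
    created = parse_created_at(value.get("created_at"))
    cutoff = now - timedelta(days=max_age_days)
    # Entries with no usable timestamp are judged on confidence alone.
    if created is not None and created < cutoff:
        return True
    return value.get("confidence", 1.0) < min_confidence
```

A cleanup loop could call `should_remove(mem.value, datetime.now(timezone.utc))` for each entry and delete only when it returns `True`. Keeping entries that lack a timestamp is a design choice here; a stricter policy could remove them instead.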

package/templates/bundle-ai-agents/skills/memory-management/references/memory-tiers.md ADDED

@@ -0,0 +1,41 @@
+# Memory Tiers Reference
+## Comparison
+| Tier | Duration | Mechanism | Storage | Cost | Use Case |
+|---|---|---|---|---|---|
+| Short-term | 1 session | Context window | In-memory | Free | Current conversation messages |
+| Medium-term | 1 demand | Checkpointer | PostgreSQL | Low | Graph state between invocations |
+| Long-term | Permanent | Store | PostgreSQL | Low | Patterns, preferences, learnings |
+## Short-Term Memory
+- **What**: Messages in the current invocation chain.
+- **How**: `Annotated[list, add_messages]` in state schema.
+- **Limit**: Bounded by context window size.
+- **When it resets**: End of the invocation chain.
+## Medium-Term Memory
+- **What**: Full graph state (all state fields) at each node execution.
+- **How**: `graph.compile(checkpointer=PostgresSaver(...))`.
+- **Limit**: Bounded by database storage.
+- **When it resets**: When the demand is completed or explicitly cleared.
+- **Key feature**: Enables resume after crash.
+## Long-Term Memory
+- **What**: Extracted patterns, preferences, and learnings.
+- **How**: `store.aput(namespace=..., key=..., value=...)`.
+- **Limit**: Bounded by database storage and relevance decay.
+- **When it resets**: Only when explicitly cleaned up.
+- **Key feature**: Enables cross-demand learning.
+## Decision Guide
+| Question | Answer | Use |
+|---|---|---|
+| Does the agent need to remember within this conversation? | Yes | Short-term |
+| Does the agent need to resume after a crash? | Yes | Medium-term (checkpointer) |
+| Should the agent learn from past demands? | Yes | Long-term (store) |
+| Does the agent need to share knowledge with other agents? | Yes | Long-term (store with shared namespace) |
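
Review note (memory-tiers.md, Decision Guide): the table above maps four yes/no questions to tiers. A minimal sketch of the same mapping as a function follows. `select_memory_tiers` and its keyword names are hypothetical and exist only to illustrate the table; the returned labels are taken from the "Use" column.

```python
def select_memory_tiers(
    *,
    within_conversation=False,
    resume_after_crash=False,
    learn_across_demands=False,
    share_with_agents=False,
):
    """Return the memory tiers implied by the Decision Guide answers."""
    tiers = []
    if within_conversation:
        tiers.append("Short-term")
    if resume_after_crash:
        tiers.append("Medium-term (checkpointer)")
    if learn_across_demands:
        tiers.append("Long-term (store)")
    if share_with_agents:
        tiers.append("Long-term (store with shared namespace)")
    return tiers
```

The tiers are not mutually exclusive: an agent that must both resume after a crash and learn across demands would get the checkpointer and the store together.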

package/templates/bundle-ai-agents/skills/memory-management/references/namespace-conventions.md ADDED

@@ -0,0 +1,41 @@
+# Namespace Conventions Reference
+## Store Namespace Structure
+```
+("agent", agent_type, category)
+```
+## Standard Namespaces
+| Namespace | Purpose | Example Key |
+|---|---|---|
+| `("agent", "backend", "patterns")` | Code patterns learned by backend agent | `"fastapi-crud-pattern"` |
+| `("agent", "frontend", "patterns")` | UI patterns learned by frontend agent | `"react-form-pattern"` |
+| `("agent", "backend", "errors")` | Common errors and their fixes | `"alembic-migration-conflict"` |
+| `("agent", "backend", "preferences")` | Team preferences for code style | `"prefer-dataclass-over-dict"` |
+| `("project", "decisions")` | Architectural decisions | `"chose-fastapi-over-flask"` |
+| `("project", "standards")` | Project-wide coding standards | `"naming-conventions"` |
+## Value Schema
+Every store entry should include these fields:
+```python
+{
+    "pattern": str,        # The actual knowledge
+    "learned_from": str,   # Which demand/task this came from
+    "confidence": float,   # 0.0 to 1.0 (boost on positive feedback, decay on negative)
+    "created_at": str,     # ISO timestamp
+    "updated_at": str,     # ISO timestamp (updated on reinforcement)
+    "usage_count": int,    # How many times this memory was retrieved
+}
+```
+## Confidence Management
+- **Initial**: 0.7 (new learning, not yet validated)
+- **Reinforced**: +0.1 per positive use (max 1.0)
+- **Contradicted**: -0.2 per negative feedback (min 0.0)
+- **Cleanup threshold**: < 0.3 (remove on next cleanup run)
+- **Time decay**: -0.05 per month without usage
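
Review note (namespace-conventions.md, Confidence Management): the five rules above can be read as small pure functions over a store entry's value dict. The sketch below uses the numbers from the list (0.7 initial, +0.1, -0.2, -0.05 per month, 0.3 threshold). The function names, the rounding to two decimals, and passing `months_unused` as a precomputed integer are assumptions made for illustration; the reference does not say how months are measured.

```python
INITIAL_CONFIDENCE = 0.7
REINFORCE_STEP = 0.1
CONTRADICT_STEP = 0.2
MONTHLY_DECAY = 0.05
CLEANUP_THRESHOLD = 0.3


def _clamp(score):
    """Keep a score in [0.0, 1.0]; round to avoid float noise such as 0.7999999."""
    return round(max(0.0, min(1.0, score)), 2)


def reinforce(value):
    """Positive use: raise confidence by 0.1 (max 1.0) and count the retrieval."""
    updated = dict(value)
    updated["confidence"] = _clamp(value.get("confidence", INITIAL_CONFIDENCE) + REINFORCE_STEP)
    updated["usage_count"] = value.get("usage_count", 0) + 1
    return updated


def contradict(value):
    """Negative feedback: lower confidence by 0.2 (min 0.0)."""
    updated = dict(value)
    updated["confidence"] = _clamp(value.get("confidence", INITIAL_CONFIDENCE) - CONTRADICT_STEP)
    return updated


def apply_time_decay(value, months_unused):
    """Lower confidence by 0.05 for each whole month without usage."""
    updated = dict(value)
    decayed = value.get("confidence", INITIAL_CONFIDENCE) - MONTHLY_DECAY * months_unused
    updated["confidence"] = _clamp(decayed)
    return updated


def needs_cleanup(value):
    """True when the entry has fallen below the cleanup threshold."""
    return value.get("confidence", INITIAL_CONFIDENCE) < CLEANUP_THRESHOLD
```

Each function returns a new dict rather than mutating its input, so the caller decides when to write the result back to the store and when to refresh `updated_at`.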

package/templates/bundle-ai-agents/skills/prompt-engineering/SKILL.md CHANGED

@@ -1,66 +1,158 @@
 ---
 name: prompt-engineering
-description: Criar e otimizar system prompts para agentes seguindo melhores práticas de context engineering. Use quando precisar escrever prompts, melhorar prompts existentes, ou criar instruções para agentes.
+description: Create and optimize system prompts for AI agents following context engineering best practices. Use when writing prompts, improving existing prompts, or creating agent instructions.
+version: 1.0.0
+author: Maestro
 ---
-# Prompt Engineering para Agentes
+# Prompt Engineering
-## Estrutura de System Prompt
+Craft effective system prompts for AI agents using structured templates, best practices, and iterative refinement.
+## When to Use
+- Writing a new system prompt for an agent
+- Improving an underperforming agent's instructions
+- Creating role-specific prompts for multi-agent systems
+- Reviewing prompts for anti-patterns and clarity issues
+- Optimizing prompts to reduce token usage without losing quality
+## Available Operations
+1. Write a structured system prompt from scratch
+2. Audit an existing prompt for anti-patterns
+3. Refine a prompt based on agent evaluation results
+4. Create few-shot examples for a prompt
+5. Optimize prompt token count
+## Multi-Step Workflow
+### Step 1: Define the Prompt Structure
+Every agent system prompt should follow this 6-part structure:
 ```
-1. IDENTIDADE — Quem o agente é
-2. OBJETIVO — O que ele deve alcançar
-3. FERRAMENTAS — O que tem disponível
-4. REGRAS — Limites inegociáveis
-5. FORMATO — Como estruturar a saída
-6. EXEMPLOS — Demonstrações concretas
+1. IDENTITY   -- Who the agent is
+2. OBJECTIVE  -- What it must achieve
+3. TOOLS      -- What it has available
+4. RULES      -- Non-negotiable constraints
+5. FORMAT     -- How to structure output
+6. EXAMPLES   -- Concrete demonstrations
 ```
-## Template
+### Step 2: Write the System Prompt
+Use the template below, filling in each section with specific details.
 ```python
 SYSTEM_PROMPT = """
-## Identidade
-Você é {role}, especializado em {especialidade}.
-## Objetivo
-Sua missão é {objetivo_principal}. Você trabalha dentro do Maestro,
-uma plataforma de governança de desenvolvimento.
-## Ferramentas disponíveis
-{lista_de_tools_com_descrição}
-## Regras
-1. Sempre seguir o bundle {bundle_name} para padrões de código
-2. Todo commit deve referenciar a task: {task_id}
-3. Trabalhar apenas na worktree designada: {worktree_path}
-4. Reportar progresso a cada etapa significativa
-5. Solicitar human review para operações destrutivas
-## Formato de resposta
-- Para código: blocos com linguagem especificada
-- Para decisões: justificar com "porquê"
-- Para erros: incluir contexto e sugestão de fix
-## Exemplo
-Task: "Criar endpoint GET /api/v1/demands"
-Ação: Criar controller, use case, repository seguindo Clean Architecture
+## Identity
+You are {role}, specialized in {specialty}.
+## Objective
+Your mission is {primary_objective}. You work within Maestro,
+a development governance platform.
+## Available Tools
+{list_of_tools_with_descriptions}
+## Rules
+1. Always follow the {bundle_name} bundle for code standards
+2. Every commit must reference the task: {task_id}
+3. Work only in the designated worktree: {worktree_path}
+4. Report progress at every significant step
+5. Request human review for destructive operations
+## Response Format
+- For code: use fenced code blocks with language specified
+- For decisions: justify with "why"
+- For errors: include context and suggested fix
+## Example
+Task: "Create endpoint GET /api/v1/demands"
+Action: Create controller, use case, repository following Clean Architecture
 Branch: feature/backend-{task_id}
 """
 ```
-## Boas práticas
+### Step 3: Apply Best Practices
+Review the prompt against these rules:
+1. **Be specific** -- "Create a REST API with FastAPI" > "Create an API"
+2. **Explain why** -- "Use Value Objects because they enforce validation at construction" > "Use Value Objects"
+3. **Give examples** -- One good example is worth 10 lines of instruction
+4. **Avoid negatives** -- "Keep functions under 20 lines" > "Don't write long functions"
+5. **Prioritize** -- Put the most important rules first (models pay more attention to early content)
+6. **Test** -- Run the prompt with real scenarios before deploying
+### Step 4: Check for Anti-Patterns
+Audit the prompt for these common problems:
+```bash
+# Count lines containing NEVER/ALWAYS (grep -c counts matching lines, not occurrences; too many weaken their impact)
+grep -c -i "never\|always" prompt.md
+# Check prompt length (over 5000 words causes focus loss)
+wc -w prompt.md
+```
+Anti-patterns to fix:
+- Excessive NEVER/ALWAYS (loses emphasis when overused)
+- Contradictory instructions (e.g., "be concise" + "explain everything in detail")
+- Prompts over 5000 words (agent loses focus on key instructions)
+- Rules without justification (agent cannot reason about when to flex)
+### Step 5: Test and Iterate
+Run the prompt through evaluation scenarios to measure effectiveness.
+```bash
+python -m evals.run_prompt_eval --prompt prompts/agent_backend.md --scenarios evals/prompt_scenarios.json
+```
+Compare scores across prompt versions:
+```bash
+python -m evals.compare_prompts --v1 prompts/v1.md --v2 prompts/v2.md --scenarios evals/prompt_scenarios.json
+```
+## Resources
+- `references/prompt-templates.md` - Ready-to-use prompt templates for common agent roles
+- `references/anti-patterns.md` - Detailed anti-pattern catalog with fix examples
+## Examples
+### Example 1: Write a Backend Agent Prompt
+User asks: "Create a system prompt for our backend agent that builds FastAPI endpoints."
+Response approach:
+1. Set identity to "Backend Engineer Agent, specialized in FastAPI and Clean Architecture"
+2. Define objective: "Build production-ready REST API endpoints following bundle standards"
+3. List tools: file operations, git commands, test runner, linter
+4. Set rules: follow clean-architecture skill, enforce test coverage > 80%, use typed DTOs
+5. Add format section for code blocks and error reporting
+6. Include a concrete example of building a CRUD endpoint
-1. **Seja específico** — "Crie uma API REST com FastAPI" > "Crie uma API"
-2. **Explique o porquê** — "Usar Value Objects porque garantem validação no construtor" > "Usar Value Objects"
-3. **Dê exemplos** — Um bom exemplo vale mais que 10 linhas de instrução
-4. **Evite negativos** — "Mantenha funções com até 20 linhas" > "Não crie funções longas"
-5. **Priorize** — Coloque as regras mais importantes primeiro
-6. **Teste** — Rode o prompt com cenários reais antes de deployar
+### Example 2: Fix an Underperforming Prompt
+User asks: "Our agent keeps ignoring the coding standards. Fix the prompt."
+Response approach:
+1. Read the current prompt and check for vague instructions
+2. Look for missing justifications on rules (agent doesn't understand importance)
+3. Move coding standards rules higher in the prompt (priority by position)
+4. Add a concrete example showing compliant vs non-compliant code
+5. Add a "Common Mistakes" section with specific things to avoid
-## Anti-patterns
+### Example 3: Reduce Prompt Token Count
+User asks: "The system prompt is too long, cut it down without losing quality."
+Response approach:
+1. Count current tokens with `tiktoken`
+2. Move detailed reference material to skill files loaded on-demand
+3. Replace verbose explanations with concise bullet points
+4. Keep examples (they're high-value) but trim redundant ones
+5. Target: system prompt under 2000 tokens, details in skills
-- NEVER/ALWAYS em excesso (perde a força)
-- Instruções contraditórias
-- Prompts com 5000+ palavras (o agente se perde)
-- Regras sem justificativa (o agente não sabe quando flexibilizar)
+## Notes
+- System prompts should stay under 2000 tokens; move details to on-demand skills
+- Test every prompt change with at least 5 real-world scenarios
+- Version your prompts (v1, v2, etc.) and track performance across versions
+- The first 500 tokens of a prompt get the most attention from the model
+- Few-shot examples are the single most effective prompting technique
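
Review note (prompt-engineering/SKILL.md, Step 2): the `SYSTEM_PROMPT` template relies on `{placeholder}` fields such as `{role}`, `{specialty}`, and `{task_id}`. A minimal sketch of filling such a template while refusing to render when a field is missing is shown below. `placeholders` and `render_prompt` are hypothetical helpers, not part of the package; only the standard library `string.Formatter` is used.

```python
from string import Formatter


def placeholders(template):
    """Return the set of named {fields} that appear in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template, **values):
    """Fill the template, raising KeyError that lists every missing field at once."""
    missing = sorted(placeholders(template) - set(values))
    if missing:
        raise KeyError(f"missing prompt fields: {', '.join(missing)}")
    return template.format(**values)
```

Checking up front reports all missing fields together, whereas a bare `str.format` call stops at the first one.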

package/templates/bundle-ai-agents/skills/prompt-engineering/references/anti-patterns.md ADDED

@@ -0,0 +1,59 @@
+# Prompt Anti-Patterns Reference
+## 1. NEVER/ALWAYS Overuse
+**Problem**: Using NEVER and ALWAYS too frequently dilutes their impact.
+**Bad**:
+```
+NEVER use var. ALWAYS use const. NEVER use any. ALWAYS type everything.
+NEVER skip tests. ALWAYS write docs. NEVER commit without review.
+```
+**Good**:
+```
+Use const by default, let when reassignment is needed.
+Type all function parameters and return values.
+Critical: NEVER commit secrets or credentials to the repository.
+```
+**Fix**: Reserve NEVER/ALWAYS for truly critical rules (security, data integrity). Use softer language for preferences.
+## 2. Contradictory Instructions
+**Problem**: Instructions that conflict cause unpredictable behavior.
+**Bad**:
+```
+Be concise in your responses.
+Explain every decision in detail with full justification.
+```
+**Good**:
+```
+Be concise by default. When making architectural decisions, explain the reasoning.
+```
+**Fix**: Qualify when each instruction applies.
+## 3. Excessive Length (> 5000 words)
+**Problem**: The agent loses focus on key instructions when the prompt is too long.
+**Fix**: Move reference material to skill files. Keep the system prompt under 2000 tokens. Load details on-demand.
+## 4. Rules Without Justification
+**Problem**: Without knowing why, the agent cannot reason about edge cases.
+**Bad**: "Use Value Objects for all domain primitives."
+**Good**: "Use Value Objects for domain primitives because they enforce validation at construction time and make the domain model self-documenting."
+## 5. Vague Instructions
+**Problem**: Ambiguity leads to inconsistent agent behavior.
+**Bad**: "Write good code."
+**Good**: "Write code that follows Clean Architecture: separate controllers, use cases, and repositories. Keep functions under 20 lines. Include type hints on all function signatures."
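
Review note (anti-patterns.md, sections 1 and 3): two of the anti-patterns above are mechanically checkable. The sketch below counts individual NEVER/ALWAYS occurrences and whitespace-separated words, then flags a prompt that exceeds a limit. The 5000-word limit comes from the reference; `max_absolutes=3` is an arbitrary assumption for illustration, since the reference gives no number, and `audit_prompt` is a hypothetical name.

```python
import re

# Whole-word, case-insensitive match for the two absolute keywords.
ABSOLUTE_WORDS = re.compile(r"\b(?:never|always)\b", re.IGNORECASE)


def audit_prompt(text, max_words=5000, max_absolutes=3):
    """Count absolute keywords and words, and list any limits the prompt exceeds."""
    absolutes = len(ABSOLUTE_WORDS.findall(text))
    words = len(text.split())
    findings = []
    if absolutes > max_absolutes:
        findings.append(f"NEVER/ALWAYS used {absolutes} times (limit {max_absolutes})")
    if words > max_words:
        findings.append(f"prompt has {words} words (limit {max_words})")
    return {"absolutes": absolutes, "words": words, "findings": findings}
```

Contradictory, unjustified, or vague instructions (sections 2, 4, and 5) need a human or model reviewer; a word count cannot detect them.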

package/templates/bundle-ai-agents/skills/prompt-engineering/references/prompt-templates.md ADDED

@@ -0,0 +1,75 @@
+# Prompt Templates Reference
+## Backend Agent Template
+```
+## Identity
+You are a Backend Engineer Agent, specialized in building REST APIs with FastAPI and Clean Architecture.
+## Objective
+Build production-ready API endpoints that follow the project's bundle standards, including proper error handling, validation, pagination, and test coverage.
+## Tools
+- File read/write operations
+- Git commands (commit, branch, push)
+- pytest for running tests
+- ruff for linting
+## Rules
+1. Follow Clean Architecture: controller -> use case -> repository
+2. Every endpoint must have input validation with Pydantic models
+3. Test coverage must be >= 80% for new code
+4. Use typed DTOs for all API responses
+5. Handle errors with standardized ErrorResponse format
+## Response Format
+- Code in fenced blocks with language specified
+- Explain architectural decisions with "why"
+- Report test results after implementation
+```
+## Frontend Agent Template
+```
+## Identity
+You are a Frontend Engineer Agent, specialized in React with TypeScript.
+## Objective
+Build responsive, accessible UI components following the project's design system and bundle standards.
+## Tools
+- File read/write operations
+- npm/pnpm commands
+- Jest/Vitest for testing
+- ESLint for linting
+## Rules
+1. Use functional components with hooks
+2. All props must be typed with TypeScript interfaces
+3. Components must be accessible (ARIA labels, keyboard navigation)
+4. Write unit tests for business logic, integration tests for user flows
+5. Use the project's design tokens for spacing, colors, typography
+```
+## DevOps Agent Template
+```
+## Identity
+You are a DevOps Engineer Agent, specialized in Docker, CI/CD, and infrastructure.
+## Objective
+Containerize applications, configure CI/CD pipelines, and manage deployment infrastructure.
+## Tools
+- Docker CLI (build, push, compose)
+- Git commands
+- kubectl for Kubernetes
+- Terraform for infrastructure
+## Rules
+1. All Dockerfiles must use multi-stage builds
+2. Never run containers as root
+3. Include health checks in all service containers
+4. Secrets must come from environment variables, never hardcoded
+5. CI pipelines must include lint, test, build, and security scan stages
+```