npm - @shakudo/kaji-setup-external - Versions diffs - 1.0.0 - Mend

@shakudo/kaji-setup-external 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (411)

package/assets/skills/context-optimization/examples/llm-as-judge-skills/agents/evaluator-agent/evaluator-agent.md ADDED

@@ -0,0 +1,177 @@
+# Evaluator Agent
+## Purpose
+The Evaluator Agent assesses the quality of LLM-generated responses using configurable evaluation criteria. It implements the LLM-as-a-Judge pattern with support for both direct scoring and pairwise comparison.
+## Agent Definition
+```typescript
+import { ToolLoopAgent } from "ai";
+import { anthropic } from "@ai-sdk/anthropic";
+import { evaluationTools } from "../tools";
+export const evaluatorAgent = new ToolLoopAgent({
+  name: "evaluator",
+  model: anthropic("claude-sonnet-4-20250514"),
+  instructions: `You are an expert evaluator of LLM-generated content.
+Your role is to:
+1. Assess response quality against specific criteria
+2. Provide structured scores with justifications
+3. Identify specific issues and strengths
+4. Compare responses when asked for pairwise evaluation
+Evaluation Guidelines:
+- Be objective and consistent in your assessments
+- Ground evaluations in specific evidence from the response
+- Consider the context and requirements of the original task
+- Avoid position bias - evaluate content not placement
+- Do not favor verbose responses unless verbosity adds value
+Always provide:
+- Numerical scores for each criterion
+- Specific examples supporting your assessment
+- Actionable feedback for improvement`,
+  tools: {
+    directScore: evaluationTools.directScore,
+    pairwiseCompare: evaluationTools.pairwiseCompare,
+    extractCriteria: evaluationTools.extractCriteria,
+    generateRubric: evaluationTools.generateRubric
+  }
+});
+```
+## Capabilities
+### Direct Scoring
+Evaluate a single response against defined criteria and rubric.
+**Input:**
+- Response to evaluate
+- Original prompt/context
+- Evaluation criteria
+- Scoring rubric
+**Output:**
+- Score per criterion (1-5)
+- Overall score
+- Detailed justification
+- Identified issues and strengths
+### Pairwise Comparison
+Compare two responses and select the better one.
+**Input:**
+- Response A
+- Response B
+- Original prompt/context
+- Comparison criteria
+**Output:**
+- Winner selection (A, B, or Tie)
+- Confidence score
+- Comparative analysis
+- Specific differentiators
+### Criteria Extraction
+Automatically extract evaluation criteria from a task description.
+**Input:**
+- Task description
+- Domain context
+- Quality expectations
+**Output:**
+- List of relevant criteria
+- Criterion descriptions
+- Suggested weights
+### Rubric Generation
+Generate a scoring rubric for specific criteria.
+**Input:**
+- Criterion name
+- Quality dimensions
+- Scale (default 1-5)
+**Output:**
+- Rubric with score descriptions
+- Examples for each level
+- Edge case guidance
+## Configuration
+```typescript
+interface EvaluatorConfig {
+  // Scoring configuration
+  scoringMode: "direct" | "pairwise";
+  useChainOfThought: boolean;
+  nShotExamples: number;
+  // Bias mitigation
+  swapPositionsForPairwise: boolean;
+  normalizeForLength: boolean;
+  // Output configuration
+  includeJustification: boolean;
+  includeExamples: boolean;
+  outputFormat: "structured" | "prose";
+}
+const defaultConfig: EvaluatorConfig = {
+  scoringMode: "direct",
+  useChainOfThought: true,
+  nShotExamples: 2,
+  swapPositionsForPairwise: true,
+  normalizeForLength: false,
+  includeJustification: true,
+  includeExamples: true,
+  outputFormat: "structured"
+};
+```
+## Usage Example
+```typescript
+import { evaluatorAgent } from "./agents/evaluator-agent";
+// Direct scoring
+const evaluation = await evaluatorAgent.generate({
+  prompt: `Evaluate the following response:
+Original Question: "Explain quantum entanglement to a high school student"
+Response: "${generatedResponse}"
+Criteria:
+1. Accuracy - Scientific correctness
+2. Clarity - Understandable for target audience
+3. Engagement - Interesting and memorable
+4. Completeness - Covers key concepts
+Provide scores and detailed feedback.`
+});
+// Pairwise comparison
+const comparison = await evaluatorAgent.generate({
+  prompt: `Compare these two responses to the same question.
+Question: "What are the benefits of exercise?"
+Response A: "${responseA}"
+Response B: "${responseB}"
+Which response is better? Explain your reasoning.`
+});
+```
+## Integration Points
+- **Content Generation Pipeline**: Evaluate outputs before delivery
+- **Model Comparison**: Compare responses from different models
+- **Quality Monitoring**: Track response quality over time
+- **Fine-tuning Data**: Generate preference data for RLHF
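
Note on `swapPositionsForPairwise` in `evaluator-agent.md`: the file declares the flag but does not show how the two orderings are reconciled. The sketch below is one common approach, not the package's implementation: judge both orderings, map the swapped verdict back to the original labels, and keep a winner only when the two passes agree. `Verdict`, `PairwiseJudge`, `unswap`, `compareWithSwap`, and the two toy judges are hypothetical names introduced for illustration. A real judge would be an asynchronous LLM call; the sketch is synchronous to keep it short.

```typescript
// Hypothetical sketch: none of these names come from the package.
type Verdict = "A" | "B" | "Tie";

// A judge sees two responses in a fixed order and names the better slot.
type PairwiseJudge = (first: string, second: string) => Verdict;

// Map a verdict given on swapped inputs back to the original labels.
function unswap(verdict: Verdict): Verdict {
  if (verdict === "A") return "B";
  if (verdict === "B") return "A";
  return "Tie";
}

// Judge both orderings; keep the winner only when both passes agree.
function compareWithSwap(
  judge: PairwiseJudge,
  responseA: string,
  responseB: string
): Verdict {
  const forward = judge(responseA, responseB);
  const backward = unswap(judge(responseB, responseA));
  return forward === backward ? forward : "Tie";
}

// Toy judges for illustration only.
// This one always prefers whatever sits in the first slot (pure position bias).
const positionBiased: PairwiseJudge = () => "A";

// This one prefers the longer text regardless of slot.
const prefersLonger: PairwiseJudge = (first, second) =>
  first.length === second.length ? "Tie" : first.length > second.length ? "A" : "B";
```

With `positionBiased`, the two passes disagree and the result collapses to `"Tie"`, which is the point of the swap: a preference that follows the slot rather than the content does not survive it.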

package/assets/skills/context-optimization/examples/llm-as-judge-skills/agents/index.md ADDED

@@ -0,0 +1,114 @@
+# Agents Index
+Agents are reusable AI components with defined capabilities, tools, and instructions.
+## Available Agents
+### Evaluator Agent
+**Path**: `agents/evaluator-agent/evaluator-agent.md`
+**Purpose**: Assess the quality of LLM-generated responses
+**Capabilities**:
+- Direct scoring against rubrics
+- Pairwise comparison of responses
+- Criteria extraction from task descriptions
+- Rubric generation for evaluation
+**Tools Used**:
+- `directScore`
+- `pairwiseCompare`
+- `extractCriteria`
+- `generateRubric`
+**Best For**:
+- Quality gates in content pipelines
+- Model comparison studies
+- RLHF preference data generation
+- Output validation before delivery
+---
+### Research Agent
+**Path**: `agents/research-agent/research-agent.md`
+**Purpose**: Gather, verify, and synthesize information from multiple sources
+**Capabilities**:
+- Web search and result analysis
+- URL content extraction
+- Claim extraction and verification
+- Research synthesis
+**Tools Used**:
+- `webSearch`
+- `readUrl`
+- `extractClaims`
+- `verifyClaim`
+- `synthesize`
+**Best For**:
+- Knowledge base building
+- Fact checking
+- Market research
+- Technical documentation
+---
+### Orchestrator Agent
+**Path**: `agents/orchestrator-agent/orchestrator-agent.md`
+**Purpose**: Coordinate multi-agent workflows for complex tasks
+**Capabilities**:
+- Task decomposition and assignment
+- Parallel task execution
+- Result synthesis
+- Error handling and recovery
+**Tools Used**:
+- `delegateToAgent`
+- `parallelExecution`
+- `waitForCompletion`
+- `synthesizeResults`
+- `handleError`
+**Best For**:
+- Complex multi-step tasks
+- Cross-capability workflows
+- Quality-assured pipelines
+- Long-running operations
+## Agent Interaction Patterns
+### Sequential Pipeline
+```
+Input → Agent A → Agent B → Agent C → Output
+```
+Use when each step depends on the previous.
+### Parallel Fan-Out
+```
+        ┌→ Agent A ─┐
+Input ──┼→ Agent B ──┼→ Synthesis → Output
+        └→ Agent C ─┘
+```
+Use for independent subtasks that can run concurrently.
+### Iterative Refinement
+```
+Input → Agent → Evaluator ─┬→ Output (if pass)
+                           └→ Agent (if fail, with feedback)
+```
+Use for quality-critical outputs.
+## Adding New Agents
+1. Create agent directory: `agents/<agent-name>/`
+2. Create main file: `agents/<agent-name>/<agent-name>.md`
+3. Define:
+   - Purpose and role
+   - System instructions
+   - Tool assignments
+   - Configuration options
+   - Usage examples
+4. Update this index
+5. Register with orchestrator if applicable
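
Note on the Iterative Refinement pattern in `agents/index.md`: the diagram shows a failed evaluation feeding back into the agent, but the file gives no loop or stopping rule. The sketch below is a minimal reading of that diagram, assuming a bounded number of attempts. `Evaluation`, `Generator`, `Evaluator`, `RefinementResult`, `refineUntilPass`, and the `maxAttempts` cap are hypothetical and do not appear in the package. Real agents would be asynchronous; synchronous functions are used here for brevity.

```typescript
// Hypothetical sketch: types and names are illustrative, not from the package.
interface Evaluation {
  pass: boolean;
  feedback: string;
}

// Produces a draft; receives the evaluator's feedback on later attempts.
type Generator = (input: string, feedback?: string) => string;
type Evaluator = (draft: string) => Evaluation;

interface RefinementResult {
  output: string;
  attempts: number;
  passed: boolean;
}

// Generate, evaluate, and retry with feedback until the draft passes
// or the attempt budget is exhausted.
function refineUntilPass(
  input: string,
  generate: Generator,
  evaluate: Evaluator,
  maxAttempts: number
): RefinementResult {
  if (maxAttempts < 1) {
    throw new RangeError("maxAttempts must be at least 1");
  }
  let feedback: string | undefined;
  let draft = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    draft = generate(input, feedback);
    const evaluation = evaluate(draft);
    if (evaluation.pass) {
      return { output: draft, attempts: attempt, passed: true };
    }
    feedback = evaluation.feedback;
  }
  // Budget exhausted: return the last draft and flag that it did not pass.
  return { output: draft, attempts: maxAttempts, passed: false };
}
```

Capping the attempts is a design choice rather than something the diagram requires; without it, an evaluator that never passes a draft would loop indefinitely.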

package/assets/skills/context-optimization/examples/llm-as-judge-skills/agents/orchestrator-agent/orchestrator-agent.md ADDED

@@ -0,0 +1,205 @@
+# Orchestrator Agent
+## Purpose
+The Orchestrator Agent manages complex workflows by delegating tasks to specialized agents, coordinating their outputs, and ensuring coherent end-to-end execution. It serves as the primary interface for multi-agent operations.
+## Agent Definition
+```typescript
+import { ToolLoopAgent } from "ai";
+import { anthropic } from "@ai-sdk/anthropic";
+import { orchestrationTools } from "../tools";
+export const orchestratorAgent = new ToolLoopAgent({
+  name: "orchestrator",
+  model: anthropic("claude-sonnet-4-20250514"),
+  instructions: `You are a workflow orchestration expert.
+Your role is to:
+1. Analyze complex tasks and break them into subtasks
+2. Assign subtasks to appropriate specialized agents
+3. Coordinate agent outputs and handle dependencies
+4. Synthesize results into coherent final outputs
+5. Handle errors and retries gracefully
+Orchestration Principles:
+- Decompose tasks by capability requirements
+- Parallelize independent operations when possible
+- Maintain context continuity across agent handoffs
+- Validate intermediate outputs before proceeding
+- Provide clear status updates during long operations
+Available Agents:
+- evaluator: Assesses quality of LLM outputs
+- researcher: Gathers and synthesizes information
+- writer: Generates and refines content
+- analyst: Performs data analysis and insights
+When delegating:
+- Provide complete context the agent needs
+- Specify expected output format
+- Set clear success criteria`,
+  tools: {
+    delegateToAgent: orchestrationTools.delegateToAgent,
+    parallelExecution: orchestrationTools.parallelExecution,
+    waitForCompletion: orchestrationTools.waitForCompletion,
+    synthesizeResults: orchestrationTools.synthesizeResults,
+    handleError: orchestrationTools.handleError
+  }
+});
+```
+## Capabilities
+### Task Delegation
+Route a task to a specialized agent.
+**Input:**
+- Agent name
+- Task description
+- Context/dependencies
+- Expected output format
+**Output:**
+- Agent response
+- Execution metadata
+- Status
+### Parallel Execution
+Execute multiple independent tasks simultaneously.
+**Input:**
+- List of (agent, task) pairs
+- Timeout configuration
+**Output:**
+- Results array
+- Completion status per task
+- Any errors encountered
+### Result Synthesis
+Combine outputs from multiple agents into a coherent result.
+**Input:**
+- Agent outputs
+- Synthesis instructions
+- Target format
+**Output:**
+- Synthesized result
+- Source attribution
+- Confidence assessment
+### Error Handling
+Manage failures and implement retry logic.
+**Input:**
+- Failed task
+- Error details
+- Retry policy
+**Output:**
+- Retry result or
+- Graceful degradation or
+- Error escalation
+## Configuration
+```typescript
+interface OrchestratorConfig {
+  // Execution settings
+  maxParallelTasks: number;
+  defaultTimeout: number; // ms
+  retryPolicy: RetryPolicy;
+  // Quality settings
+  validateIntermediateOutputs: boolean;
+  evaluateBeforeDelivery: boolean;
+  // Reporting
+  enableProgressUpdates: boolean;
+  updateFrequency: number; // ms
+}
+interface RetryPolicy {
+  maxRetries: number;
+  backoffMultiplier: number;
+  retryableErrors: string[];
+}
+const defaultConfig: OrchestratorConfig = {
+  maxParallelTasks: 5,
+  defaultTimeout: 60000,
+  retryPolicy: {
+    maxRetries: 3,
+    backoffMultiplier: 2,
+    retryableErrors: ["RATE_LIMIT", "TIMEOUT", "TEMPORARY_ERROR"]
+  },
+  validateIntermediateOutputs: true,
+  evaluateBeforeDelivery: false,
+  enableProgressUpdates: true,
+  updateFrequency: 5000
+};
+```
+## Usage Example
+```typescript
+import { orchestratorAgent } from "./agents/orchestrator-agent";
+const result = await orchestratorAgent.generate({
+  prompt: `Complete the following research and analysis task:
+1. Research current best practices for LLM evaluation
+2. Analyze the trade-offs between different evaluation methods
+3. Generate a recommendation report
+4. Evaluate the quality of the report
+Ensure the final output is comprehensive but accessible to technical leaders.`
+});
+```
+## Orchestration Patterns
+### Sequential Pipeline
+```mermaid
+graph LR
+    A[Task] --> B[Research Agent]
+    B --> C[Analyst Agent]
+    C --> D[Writer Agent]
+    D --> E[Evaluator Agent]
+    E --> F[Final Output]
+```
+### Parallel with Aggregation
+```mermaid
+graph TD
+    A[Task] --> B[Parallel Dispatch]
+    B --> C[Agent 1]
+    B --> D[Agent 2]
+    B --> E[Agent 3]
+    C --> F[Aggregation]
+    D --> F
+    E --> F
+    F --> G[Synthesis]
+```
+### Iterative Refinement
+```mermaid
+graph TD
+    A[Draft] --> B[Evaluator]
+    B --> C{Score OK?}
+    C -->|No| D[Refine]
+    D --> A
+    C -->|Yes| E[Final Output]
+```
+## Integration Points
+- **API Gateway**: Primary entry point for complex requests
+- **Job Queue**: Handle long-running orchestrated tasks
+- **Monitoring**: Track multi-agent execution metrics
+- **Audit Log**: Record all delegations and decisions
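
Note on `RetryPolicy` in `orchestrator-agent.md`: the interface names `maxRetries`, `backoffMultiplier`, and `retryableErrors`, but the file does not say how they combine or what the initial delay is. The sketch below assumes plain exponential backoff. The `RetryPolicy` shape is copied from the diff; `nextRetryDelay` and the `baseDelayMs` parameter are hypothetical additions.

```typescript
// RetryPolicy mirrors the interface in the diff; everything else is hypothetical.
interface RetryPolicy {
  maxRetries: number;
  backoffMultiplier: number;
  retryableErrors: string[];
}

// Returns the wait in milliseconds before retry number `retry` (1-based),
// or null when the task should not be retried.
function nextRetryDelay(
  policy: RetryPolicy,
  errorCode: string,
  retry: number,
  baseDelayMs: number // assumption: the package does not define a base delay
): number | null {
  if (retry < 1 || retry > policy.maxRetries) return null;
  const retryable = policy.retryableErrors.some((code) => code === errorCode);
  if (!retryable) return null;
  // Exponential backoff: base, base * m, base * m^2, ...
  return baseDelayMs * policy.backoffMultiplier ** (retry - 1);
}

// Values taken from defaultConfig.retryPolicy in the diff.
const examplePolicy: RetryPolicy = {
  maxRetries: 3,
  backoffMultiplier: 2,
  retryableErrors: ["RATE_LIMIT", "TIMEOUT", "TEMPORARY_ERROR"]
};
```

With the default policy and an assumed base of 1000 ms, a `TIMEOUT` would wait 1000, 2000, and 4000 ms across its three retries. A fourth retry, or any error code outside `retryableErrors`, returns `null`, which a caller could treat as the cue for graceful degradation or escalation.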

package/assets/skills/context-optimization/examples/llm-as-judge-skills/agents/research-agent/research-agent.md ADDED

@@ -0,0 +1,183 @@
+# Research Agent
+## Purpose
+The Research Agent gathers, synthesizes, and summarizes information from multiple sources to answer complex research questions. It implements a multi-step research workflow with source verification and citation tracking.
+## Agent Definition
+```typescript
+import { ToolLoopAgent } from "ai";
+import { openai } from "@ai-sdk/openai";
+import { researchTools } from "../tools";
+export const researchAgent = new ToolLoopAgent({
+  name: "researcher",
+  model: openai("gpt-4o"),
+  instructions: `You are an expert research analyst.
+Your role is to:
+1. Break down complex research questions into searchable queries
+2. Gather information from multiple sources
+3. Verify and cross-reference claims
+4. Synthesize findings into coherent summaries
+5. Provide proper citations for all claims
+Research Methodology:
+- Start with broad searches to understand the landscape
+- Narrow down to specific sources for detailed information
+- Always verify facts from multiple sources when possible
+- Distinguish between facts, claims, and opinions
+- Note the recency and authority of sources
+Quality Standards:
+- Never fabricate information or sources
+- Clearly indicate when information is uncertain
+- Provide direct quotes when precision matters
+- Include source URLs/references for verification`,
+  tools: {
+    webSearch: researchTools.webSearch,
+    readUrl: researchTools.readUrl,
+    extractClaims: researchTools.extractClaims,
+    verifyClaim: researchTools.verifyClaim,
+    synthesize: researchTools.synthesize
+  }
+});
+```
+## Capabilities
+### Web Search
+Search the web for relevant information.
+**Input:**
+- Search query
+- Optional filters (date, source type)
+**Output:**
+- List of relevant results
+- Snippets and URLs
+- Source metadata
+### URL Reading
+Extract content from a specific URL.
+**Input:**
+- URL to read
+- Content type (article, paper, documentation)
+**Output:**
+- Extracted text content
+- Key sections identified
+- Publication metadata
+### Claim Extraction
+Identify distinct claims from a source.
+**Input:**
+- Source text
+- Claim types to extract
+**Output:**
+- List of claims
+- Confidence level
+- Supporting context
+### Claim Verification
+Cross-reference a claim against other sources.
+**Input:**
+- Claim to verify
+- Original source
+**Output:**
+- Verification status
+- Supporting/contradicting sources
+- Confidence assessment
+### Synthesis
+Combine findings into a coherent summary.
+**Input:**
+- Research findings
+- Target format
+- Key questions to answer
+**Output:**
+- Synthesized summary
+- Key insights
+- Source citations
+## Configuration
+```typescript
+interface ResearchConfig {
+  // Search configuration
+  maxSearchResults: number;
+  preferredSources: string[];
+  excludedDomains: string[];
+  // Verification settings
+  minSourcesForVerification: number;
+  requireRecentSources: boolean;
+  maxSourceAge: "1month" | "6months" | "1year" | "any";
+  // Output configuration
+  citationStyle: "inline" | "footnote" | "endnote";
+  summaryLength: "brief" | "standard" | "comprehensive";
+  includeSourceQuality: boolean;
+}
+const defaultConfig: ResearchConfig = {
+  maxSearchResults: 10,
+  preferredSources: [],
+  excludedDomains: [],
+  minSourcesForVerification: 2,
+  requireRecentSources: false,
+  maxSourceAge: "any",
+  citationStyle: "inline",
+  summaryLength: "standard",
+  includeSourceQuality: true
+};
+```
+## Usage Example
+```typescript
+import { researchAgent } from "./agents/research-agent";
+const research = await researchAgent.generate({
+  prompt: `Research the current state of LLM evaluation methods.
+I need to understand:
+1. What are the main approaches to evaluating LLM outputs?
+2. What are the limitations of human evaluation?
+3. How reliable are LLM-based evaluators compared to humans?
+4. What are best practices for implementing LLM-as-a-Judge?
+Provide a comprehensive summary with citations.`
+});
+```
+## Research Workflow
+```mermaid
+graph TD
+    A[Research Question] --> B[Query Decomposition]
+    B --> C[Initial Search]
+    C --> D[Source Selection]
+    D --> E[Deep Reading]
+    E --> F[Claim Extraction]
+    F --> G[Cross-Verification]
+    G --> H[Synthesis]
+    H --> I[Final Report]
+```
+## Integration Points
+- **Knowledge Base Building**: Populate internal knowledge stores
+- **Fact Checking**: Verify claims in generated content
+- **Market Research**: Gather competitive intelligence
+- **Technical Documentation**: Research implementation approaches
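
Note on `minSourcesForVerification` in `research-agent.md`: the Claim Verification capability returns a verification status along with supporting and contradicting sources, but the file does not define the status values or the rule that produces them. The sketch below shows one possible rule under stated assumptions. The three status names, `ClaimEvidence`, `unique`, and `classifyClaim` are all hypothetical; only the `minSourcesForVerification` setting comes from the diff.

```typescript
// Hypothetical sketch: the status names and the decision rule are assumptions.
type VerificationStatus = "verified" | "disputed" | "unverified";

interface ClaimEvidence {
  supporting: string[]; // URLs of sources that back the claim
  contradicting: string[]; // URLs of sources that dispute it
}

// Drop repeated URLs so the same source is not counted twice.
function unique(urls: string[]): string[] {
  return urls.filter((url, index) => urls.indexOf(url) === index);
}

// Any contradicting source marks the claim as disputed; otherwise the claim
// is verified once enough distinct supporting sources are found.
function classifyClaim(
  evidence: ClaimEvidence,
  minSourcesForVerification: number
): VerificationStatus {
  if (unique(evidence.contradicting).length > 0) return "disputed";
  return unique(evidence.supporting).length >= minSourcesForVerification
    ? "verified"
    : "unverified";
}
```

Treating a single contradicting source as decisive is deliberately conservative. A real implementation might instead weigh source authority and recency, which the agent's instructions ask it to note.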

package/assets/skills/context-optimization/examples/llm-as-judge-skills/env.example ADDED

@@ -0,0 +1,6 @@
+# OpenAI Configuration
+OPENAI_API_KEY=your_openai_api_key_here
+OPENAI_MODEL=gpt-4o
+# Optional: Anthropic for alternative models
+# ANTHROPIC_API_KEY=your_anthropic_api_key_here

package/assets/skills/context-optimization/examples/llm-as-judge-skills/eslint.config.js ADDED

@@ -0,0 +1,18 @@
+import eslint from '@eslint/js';
+import tseslint from 'typescript-eslint';
+export default tseslint.config(
+  eslint.configs.recommended,
+  ...tseslint.configs.recommended,
+  {
+    ignores: ['dist/', 'node_modules/', 'coverage/']
+  },
+  {
+    rules: {
+      '@typescript-eslint/no-unused-vars': ['error', { argsIgnorePattern: '^_' }],
+      '@typescript-eslint/explicit-function-return-type': 'off',
+      '@typescript-eslint/no-explicit-any': 'warn'
+    }
+  }
+);