@musashishao/agent-kit 1.8.1 → 1.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (92) hide show
  1. package/.agent/agents/ai-architect.md +39 -0
  2. package/.agent/agents/cloud-engineer.md +39 -0
  3. package/.agent/agents/game-asset-curator.md +317 -0
  4. package/.agent/agents/game-developer.md +190 -89
  5. package/.agent/agents/game-narrative-designer.md +310 -0
  6. package/.agent/agents/game-qa-agent.md +441 -0
  7. package/.agent/agents/marketing-specialist.md +41 -0
  8. package/.agent/agents/penetration-tester.md +15 -1
  9. package/.agent/rules/CODEX.md +26 -2
  10. package/.agent/rules/GEMINI.md +7 -5
  11. package/.agent/rules/REFERENCE.md +92 -2
  12. package/.agent/scripts/ak_cli.py +1 -1
  13. package/.agent/scripts/localize_workflows.py +54 -0
  14. package/.agent/scripts/memory_manager.py +24 -1
  15. package/.agent/skills/3d-web-experience/SKILL.md +386 -0
  16. package/.agent/skills/DEPENDENCIES.md +54 -0
  17. package/.agent/skills/ab-test-setup/SKILL.md +77 -0
  18. package/.agent/skills/active-directory-attacks/SKILL.md +59 -0
  19. package/.agent/skills/agent-evaluation/SKILL.md +430 -0
  20. package/.agent/skills/agent-memory-systems/SKILL.md +426 -0
  21. package/.agent/skills/agent-tool-builder/SKILL.md +139 -0
  22. package/.agent/skills/ai-agents-architect/SKILL.md +115 -0
  23. package/.agent/skills/ai-product/SKILL.md +86 -0
  24. package/.agent/skills/ai-wrapper-product/SKILL.md +90 -0
  25. package/.agent/skills/analytics-tracking/SKILL.md +88 -0
  26. package/.agent/skills/api-fuzzing-bug-bounty/SKILL.md +66 -0
  27. package/.agent/skills/app-store-optimization/SKILL.md +66 -0
  28. package/.agent/skills/autonomous-agent-patterns/SKILL.md +414 -0
  29. package/.agent/skills/aws-penetration-testing/SKILL.md +50 -0
  30. package/.agent/skills/aws-serverless/SKILL.md +327 -0
  31. package/.agent/skills/azure-functions/SKILL.md +340 -0
  32. package/.agent/skills/broken-authentication/SKILL.md +53 -0
  33. package/.agent/skills/browser-automation/SKILL.md +408 -0
  34. package/.agent/skills/browser-extension-builder/SKILL.md +422 -0
  35. package/.agent/skills/bullmq-specialist/SKILL.md +424 -0
  36. package/.agent/skills/bun-development/SKILL.md +386 -0
  37. package/.agent/skills/burp-suite-testing/SKILL.md +60 -0
  38. package/.agent/skills/clerk-auth/SKILL.md +432 -0
  39. package/.agent/skills/cloud-penetration-testing/SKILL.md +51 -0
  40. package/.agent/skills/copywriting/SKILL.md +66 -0
  41. package/.agent/skills/crewai/SKILL.md +470 -0
  42. package/.agent/skills/discord-bot-architect/SKILL.md +447 -0
  43. package/.agent/skills/email-sequence/SKILL.md +73 -0
  44. package/.agent/skills/ethical-hacking-methodology/SKILL.md +67 -0
  45. package/.agent/skills/firebase/SKILL.md +377 -0
  46. package/.agent/skills/game-development/godot-expert/SKILL.md +462 -0
  47. package/.agent/skills/game-development/npc-ai-integration/SKILL.md +110 -0
  48. package/.agent/skills/game-development/procedural-generation/SKILL.md +168 -0
  49. package/.agent/skills/game-development/unity-integration/SKILL.md +358 -0
  50. package/.agent/skills/game-development/webgpu-shading/SKILL.md +209 -0
  51. package/.agent/skills/gcp-cloud-run/SKILL.md +358 -0
  52. package/.agent/skills/graphql/SKILL.md +492 -0
  53. package/.agent/skills/idor-testing/SKILL.md +64 -0
  54. package/.agent/skills/inngest/SKILL.md +128 -0
  55. package/.agent/skills/langfuse/SKILL.md +415 -0
  56. package/.agent/skills/langgraph/SKILL.md +360 -0
  57. package/.agent/skills/launch-strategy/SKILL.md +68 -0
  58. package/.agent/skills/linux-privilege-escalation/SKILL.md +62 -0
  59. package/.agent/skills/llm-app-patterns/SKILL.md +367 -0
  60. package/.agent/skills/marketing-ideas/SKILL.md +66 -0
  61. package/.agent/skills/metasploit-framework/SKILL.md +60 -0
  62. package/.agent/skills/micro-saas-launcher/SKILL.md +93 -0
  63. package/.agent/skills/neon-postgres/SKILL.md +339 -0
  64. package/.agent/skills/paid-ads/SKILL.md +64 -0
  65. package/.agent/skills/supabase-integration/SKILL.md +411 -0
  66. package/.agent/workflows/ai-agent.md +36 -0
  67. package/.agent/workflows/autofix.md +1 -0
  68. package/.agent/workflows/brainstorm.md +1 -0
  69. package/.agent/workflows/context.md +1 -0
  70. package/.agent/workflows/create.md +1 -0
  71. package/.agent/workflows/dashboard.md +1 -0
  72. package/.agent/workflows/debug.md +1 -0
  73. package/.agent/workflows/deploy.md +1 -0
  74. package/.agent/workflows/enhance.md +1 -0
  75. package/.agent/workflows/game-prototype.md +154 -0
  76. package/.agent/workflows/marketing.md +37 -0
  77. package/.agent/workflows/next.md +1 -0
  78. package/.agent/workflows/orchestrate.md +1 -0
  79. package/.agent/workflows/pentest.md +37 -0
  80. package/.agent/workflows/plan.md +1 -0
  81. package/.agent/workflows/preview.md +2 -1
  82. package/.agent/workflows/quality.md +1 -0
  83. package/.agent/workflows/saas.md +36 -0
  84. package/.agent/workflows/spec.md +1 -0
  85. package/.agent/workflows/status.md +1 -0
  86. package/.agent/workflows/test.md +1 -0
  87. package/.agent/workflows/ui-ux-pro-max.md +1 -0
  88. package/README.md +52 -24
  89. package/bin/cli.js +68 -3
  90. package/docs/CHANGELOG_AI_INFRA.md +30 -0
  91. package/docs/MIGRATION_GUIDE_V1.9.md +55 -0
  92. package/package.json +1 -1
@@ -0,0 +1,415 @@
1
+ ---
2
+ name: langfuse
3
+ description: "Langfuse LLM observability and tracing. Track, debug, and analyze LLM applications with detailed traces, user feedback, and cost monitoring. Integrates with OpenAI, LangChain, and custom LLM calls."
4
+ version: "1.0.0"
5
+ source: "antigravity-awesome-skills (adapted)"
6
+ ---
7
+
8
+ # 📊 Langfuse
9
+
10
+ **Role**: LLM Observability Expert
11
+
12
+ You are an expert in LLM observability using Langfuse. You understand that production LLM applications need visibility into performance, costs, and quality. You instrument applications properly, track user feedback, and use data to improve prompts.
13
+
14
+ ---
15
+
16
+ ## When to Use This Skill
17
+
18
+ - Setting up LLM observability for production
19
+ - Debugging LLM application issues
20
+ - Tracking costs and performance metrics
21
+ - Collecting user feedback for improvement
22
+ - A/B testing prompts in production
23
+
24
+ ---
25
+
26
+ ## Capabilities
27
+
28
+ - `langfuse`
29
+ - `llm-tracing`
30
+ - `llm-observability`
31
+ - `prompt-management`
32
+ - `user-feedback`
33
+ - `cost-tracking`
34
+ - `evaluation`
35
+
36
+ ---
37
+
38
+ ## Requirements
39
+
40
+ ```bash
41
+ pip install langfuse
42
+ ```
43
+
44
+ ```bash
45
+ # Environment variables
46
+ LANGFUSE_PUBLIC_KEY="pk-..."
47
+ LANGFUSE_SECRET_KEY="sk-..."
48
+ LANGFUSE_HOST="https://cloud.langfuse.com" # or self-hosted
49
+ ```
50
+
51
+ ---
52
+
53
+ ## 1. Core Concepts
54
+
55
+ ### Tracing Hierarchy
56
+
57
+ ```
58
+ ┌─────────────────────────────────────────────────────────────┐
59
+ │ TRACE │
60
+ │ (One user request / conversation turn) │
61
+ │ │
62
+ │ ┌─────────────────────────────────────────────────────┐ │
63
+ │ │ SPAN │ │
64
+ │ │ (A logical operation within the trace) │ │
65
+ │ │ │ │
66
+ │ │ ┌──────────────┐ ┌──────────────┐ │ │
67
+ │ │ │ GENERATION │ │ GENERATION │ │ │
68
+ │ │ │ (LLM call) │ │ (LLM call) │ │ │
69
+ │ │ └──────────────┘ └──────────────┘ │ │
70
+ │ └─────────────────────────────────────────────────────┘ │
71
+ │ │
72
+ │ ┌─────────────────────────────────────────────────────┐ │
73
+ │ │ SCORE │ │
74
+ │ │ (User feedback, evaluation result) │ │
75
+ │ └─────────────────────────────────────────────────────┘ │
76
+ └─────────────────────────────────────────────────────────────┘
77
+ ```
78
+
79
+ | Component | Description |
80
+ |-----------|-------------|
81
+ | **Trace** | Top-level container for a user request |
82
+ | **Span** | Logical operation within a trace |
83
+ | **Generation** | A single LLM call with input/output |
84
+ | **Score** | Feedback or evaluation attached to trace |
85
+
86
+ ---
87
+
88
+ ## 2. Patterns
89
+
90
+ ### 2.1 Basic Tracing Setup
91
+
92
+ Manual instrumentation for any LLM.
93
+
94
+ ```python
95
+ from langfuse import Langfuse
96
+ import openai
97
+
98
+ # Initialize client
99
+ langfuse = Langfuse(
100
+ public_key="pk-...",
101
+ secret_key="sk-...",
102
+ host="https://cloud.langfuse.com"
103
+ )
104
+
105
+ def chat_with_tracing(user_message: str, user_id: str, session_id: str):
106
+ # Create a trace for this request
107
+ trace = langfuse.trace(
108
+ name="chat-completion",
109
+ user_id=user_id,
110
+ session_id=session_id,
111
+ metadata={"feature": "customer-support"},
112
+ tags=["production", "v2"]
113
+ )
114
+
115
+ # Log the generation (LLM call)
116
+ generation = trace.generation(
117
+ name="gpt-4o-response",
118
+ model="gpt-4o",
119
+ model_parameters={"temperature": 0.7},
120
+ input={"messages": [{"role": "user", "content": user_message}]},
121
+ metadata={"attempt": 1}
122
+ )
123
+
124
+ # Make actual LLM call
125
+ response = openai.chat.completions.create(
126
+ model="gpt-4o",
127
+ messages=[{"role": "user", "content": user_message}]
128
+ )
129
+
130
+ # Complete the generation with output
131
+ generation.end(
132
+ output=response.choices[0].message.content,
133
+ usage={
134
+ "input": response.usage.prompt_tokens,
135
+ "output": response.usage.completion_tokens
136
+ }
137
+ )
138
+
139
+ return response.choices[0].message.content, trace.id
140
+
141
+ # Score the trace based on user feedback
142
+ def record_feedback(trace_id: str, is_helpful: bool):
143
+ langfuse.score(
144
+ trace_id=trace_id,
145
+ name="user-feedback",
146
+ value=1 if is_helpful else 0,
147
+ comment="User clicked helpful" if is_helpful else "User clicked not helpful"
148
+ )
149
+
150
+ # IMPORTANT: Flush before exit (especially in serverless)
151
+ langfuse.flush()
152
+ ```
153
+
154
+ ### 2.2 OpenAI Integration (Drop-in)
155
+
156
+ Automatic tracing with OpenAI SDK.
157
+
158
+ ```python
159
+ from langfuse.openai import openai # Drop-in replacement
160
+
161
+ # All calls automatically traced!
162
+ response = openai.chat.completions.create(
163
+ model="gpt-4o",
164
+ messages=[{"role": "user", "content": "Hello"}],
165
+
166
+ # Langfuse-specific parameters
167
+ name="greeting",
168
+ session_id="session-123",
169
+ user_id="user-456",
170
+ tags=["test"],
171
+ metadata={"feature": "chat"}
172
+ )
173
+
174
+ # Works with streaming
175
+ stream = openai.chat.completions.create(
176
+ model="gpt-4o",
177
+ messages=[{"role": "user", "content": "Tell me a story"}],
178
+ stream=True,
179
+ name="story-generation"
180
+ )
181
+
182
+ for chunk in stream:
183
+ print(chunk.choices[0].delta.content, end="")
184
+
185
+ # Works with async
186
+ from langfuse.openai import AsyncOpenAI
187
+
188
+ async_client = AsyncOpenAI()
189
+
190
+ async def main():
191
+ response = await async_client.chat.completions.create(
192
+ model="gpt-4o",
193
+ messages=[{"role": "user", "content": "Hello"}],
194
+ name="async-greeting"
195
+ )
196
+ ```
197
+
198
+ ### 2.3 LangChain Integration
199
+
200
+ ```python
201
+ from langchain_openai import ChatOpenAI
202
+ from langchain_core.prompts import ChatPromptTemplate
203
+ from langfuse.callback import CallbackHandler
204
+
205
+ # Create Langfuse callback handler
206
+ langfuse_handler = CallbackHandler(
207
+ public_key="pk-...",
208
+ secret_key="sk-...",
209
+ host="https://cloud.langfuse.com",
210
+ session_id="session-123",
211
+ user_id="user-456"
212
+ )
213
+
214
+ # Use with any LangChain component
215
+ llm = ChatOpenAI(model="gpt-4o")
216
+
217
+ prompt = ChatPromptTemplate.from_messages([
218
+ ("system", "You are a helpful assistant."),
219
+ ("user", "{input}")
220
+ ])
221
+
222
+ chain = prompt | llm
223
+
224
+ # Pass handler to invoke
225
+ response = chain.invoke(
226
+ {"input": "Hello"},
227
+ config={"callbacks": [langfuse_handler]}
228
+ )
229
+
230
+ # Works with agents, retrievers, etc.
231
+ from langchain.agents import create_openai_tools_agent, AgentExecutor
232
+
233
+ agent = create_openai_tools_agent(llm, tools, prompt)
234
+ agent_executor = AgentExecutor(agent=agent, tools=tools)
235
+
236
+ result = agent_executor.invoke(
237
+ {"input": "What's the weather?"},
238
+ config={"callbacks": [langfuse_handler]}
239
+ )
240
+ ```
241
+
242
+ ### 2.4 Decorator Pattern
243
+
244
+ Clean tracing with decorators.
245
+
246
+ ```python
247
+ from langfuse.decorators import observe, langfuse_context
248
+
249
+ @observe() # Automatically creates trace
250
+ def process_request(user_input: str):
251
+ # Add metadata to current trace
252
+ langfuse_context.update_current_trace(
253
+ user_id="user-123",
254
+ tags=["production"]
255
+ )
256
+
257
+ # Nested spans created automatically
258
+ result = analyze(user_input)
259
+ response = generate_response(result)
260
+
261
+ return response
262
+
263
+ @observe() # Creates child span
264
+ def analyze(text: str):
265
+ # Analysis logic
266
+ return {"sentiment": "positive"}
267
+
268
+ @observe() # Creates child span
269
+ def generate_response(analysis: dict):
270
+ # Use LLM
271
+ response = openai.chat.completions.create(...)
272
+ return response.choices[0].message.content
273
+
274
+ # Evaluate and score
275
+ @observe()
276
+ def evaluate_response(response: str, expected: str):
277
+ score = calculate_similarity(response, expected)
278
+
279
+ langfuse_context.score_current_trace(
280
+ name="accuracy",
281
+ value=score
282
+ )
283
+
284
+ return score
285
+ ```
286
+
287
+ ### 2.5 Prompt Management
288
+
289
+ Version and A/B test prompts in production.
290
+
291
+ ```python
292
+ from langfuse import Langfuse
293
+
294
+ langfuse = Langfuse()
295
+
296
+ # Fetch prompt from Langfuse (versioned)
297
+ prompt = langfuse.get_prompt("customer-support-v2")
298
+
299
+ # Use the prompt
300
+ messages = prompt.compile(
301
+ customer_name="John",
302
+ issue="billing"
303
+ )
304
+
305
+ # Prompt is automatically linked to trace
306
+ response = openai.chat.completions.create(
307
+ model="gpt-4o",
308
+ messages=messages,
309
+ langfuse_prompt=prompt # Links prompt to generation
310
+ )
311
+
312
+ # In Langfuse UI:
313
+ # - See which prompt version was used
314
+ # - Compare performance across versions
315
+ # - A/B test prompts
316
+ ```
317
+
318
+ ---
319
+
320
+ ## 3. Metrics to Track
321
+
322
+ ### Dashboard Metrics
323
+
324
+ | Metric | Description | Target |
325
+ |--------|-------------|--------|
326
+ | **Latency P50** | Median response time | < 2s |
327
+ | **Latency P99** | 99th percentile | < 10s |
328
+ | **Token Usage** | Avg tokens per request | Monitor |
329
+ | **Cost per Request** | $ per API call | Optimize |
330
+ | **Error Rate** | % failed requests | < 1% |
331
+ | **User Satisfaction** | Feedback score | > 80% |
332
+
333
+ ### Custom Scores
334
+
335
+ ```python
336
+ # Numeric scores
337
+ langfuse.score(trace_id=trace_id, name="accuracy", value=0.95)
338
+ langfuse.score(trace_id=trace_id, name="relevance", value=0.88)
339
+
340
+ # Categorical scores
341
+ langfuse.score(trace_id=trace_id, name="quality", value="good")
342
+
343
+ # Boolean scores
344
+ langfuse.score(trace_id=trace_id, name="hallucination", value=0)
345
+ ```
346
+
347
+ ---
348
+
349
+ ## 4. Anti-Patterns
350
+
351
+ ### ❌ Not Flushing in Serverless
352
+
353
+ ```python
354
+ # WRONG: Traces lost in serverless
355
+ def handler(event, context):
356
+ response = call_llm()
357
+ return response # Function ends, traces not sent!
358
+
359
+ # CORRECT: Always flush
360
+ def handler(event, context):
361
+ response = call_llm()
362
+ langfuse.flush() # Send traces before exit
363
+ return response
364
+ ```
365
+
366
+ ### ❌ Tracing Everything
367
+
368
+ ```python
369
+ # WRONG: Trace every internal function
370
+ @observe()
371
+ def helper_function(): # Noise!
372
+ pass
373
+
374
+ # CORRECT: Trace meaningful operations
375
+ @observe()
376
+ def process_user_request(): # Meaningful
377
+ pass
378
+ ```
379
+
380
+ ### ❌ No User/Session IDs
381
+
382
+ ```python
383
+ # WRONG: Anonymous traces
384
+ trace = langfuse.trace(name="chat") # Can't group!
385
+
386
+ # CORRECT: Include identifiers
387
+ trace = langfuse.trace(
388
+ name="chat",
389
+ user_id="user-123", # Group by user
390
+ session_id="session-456" # Group by session
391
+ )
392
+ ```
393
+
394
+ ---
395
+
396
+ ## 5. Production Checklist
397
+
398
+ | Check | Status |
399
+ |-------|--------|
400
+ | ✅ Traces have user_id and session_id | |
401
+ | ✅ All LLM calls are generations | |
402
+ | ✅ Flush called in serverless | |
403
+ | ✅ User feedback collected | |
404
+ | ✅ Costs monitored | |
405
+ | ✅ Error handling doesn't break traces | |
406
+ | ✅ PII redacted from traces | |
407
+
408
+ ---
409
+
410
+ ## Related Skills
411
+
412
+ - `llm-app-patterns` - LLM architecture patterns
413
+ - `observability-patterns` - General observability
414
+ - `opentelemetry-expert` - OpenTelemetry tracing
415
+ - `langgraph` - Agent workflow framework