@hustle-together/api-dev-tools 1.6.0 → 1.7.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +26 -1
- package/commands/api-research.md +77 -0
- package/hooks/enforce-external-research.py +328 -0
- package/hooks/track-tool-use.py +70 -2
- package/package.json +1 -1
- package/templates/api-dev-state.json +3 -1
- package/templates/settings.json +12 -0
package/README.md
CHANGED

@@ -26,8 +26,9 @@ Five powerful slash commands for Claude Code:
 - **`/api-status [endpoint]`** - Track implementation progress and phase completion

 ### Enforcement Hooks
+Six Python hooks that provide **real programmatic guarantees**:

+- **`enforce-external-research.py`** - (v1.7.0) Detects external API questions and requires research before answering
 - **`enforce-research.py`** - Blocks API code writing until research is complete
 - **`enforce-interview.py`** - Verifies user questions were actually asked (prevents self-answering)
 - **`verify-implementation.py`** - Checks implementation matches interview requirements

@@ -445,6 +446,30 @@ The workflow now includes automatic detection of common implementation gaps:

 **Fix:** `verify-implementation.py` warns when test files check env vars that don't match interview requirements.

+### Gap 6: Training Data Reliance (v1.7.0+)
+**Problem:** AI answers questions about external APIs from potentially outdated training data instead of researching first.
+
+**Example:**
+- User asks: "What providers does Vercel AI Gateway support?"
+- AI answers from memory: "Groq not in gateway" (WRONG!)
+- Reality: Groq has 4 models in the gateway (Llama variants)
+
+**Fix:** New `UserPromptSubmit` hook (`enforce-external-research.py`) that:
+1. Detects questions about external APIs/SDKs using pattern matching
+2. Injects context requiring research before answering
+3. Works for ANY API (Brandfetch, Stripe, Twilio, etc.) - not just specific ones
+4. Auto-allows WebSearch and Context7 without permission prompts
+
+```
+USER: "What providers does Brandfetch API support?"
+  ↓
+HOOK: Detects "Brandfetch", "API", "providers"
+  ↓
+INJECTS: "RESEARCH REQUIRED: Use Context7/WebSearch before answering"
+  ↓
+CLAUDE: Researches first → Gives accurate answer
+```
+
 ## 🔧 Requirements

 - **Node.js** 14.0.0 or higher
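For reference, the context block the new hook prints to stdout (and which Claude Code injects ahead of the user's prompt) follows the template defined in `enforce-external-research.py`, shown in full later in this diff. A trimmed excerpt of what Claude actually sees is below; the detected terms and confidence level shown here are illustrative and vary with the prompt:

```
<user-prompt-submit-hook>
RESEARCH REQUIRED - CRITICAL CONFIDENCE
Detected: does brandfetch api support, api, providers

MANDATORY BEFORE ANSWERING:
1. USE CONTEXT7 FIRST: ...
2. USE WEBSEARCH (2-3 SEARCHES MINIMUM): ...
3. NEVER TRUST TRAINING DATA: ...
4. CITE YOUR SOURCES: ...

RESEARCH FIRST. ANSWER SECOND.
</user-prompt-submit-hook>
```
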
package/commands/api-research.md
CHANGED

@@ -259,6 +259,83 @@ With thorough research:
 - ✅ Robust implementation
 - ✅ Better documentation

+---
+
+## Research-First Schema Design (MANDATORY)
+
+### The Anti-Pattern: Schema-First Development
+
+**NEVER DO THIS:**
+- ❌ Define interfaces based on assumptions before researching
+- ❌ Rely on training data for API capabilities
+- ❌ Say "I think it supports..." without verification
+- ❌ Build schemas from memory instead of documentation
+
+**Real Example of Failure:**
+- User asked: "What providers does Vercel AI Gateway support?"
+- AI answered from memory: "Groq not in gateway"
+- Reality: Groq has 4 models in the gateway (Llama variants)
+- Root cause: No research was done before answering
+
+### The Correct Pattern: Research-First
+
+**ALWAYS DO THIS:**
+
+**Step 1: Research the Source of Truth**
+- Use Context7 (`mcp__context7__resolve-library-id` + `get-library-docs`) for SDK docs
+- Use WebSearch for official provider documentation
+- Query APIs directly when possible (don't assume)
+- Check GitHub repositories for current implementation
+
+**Step 2: Build Schema FROM Research**
+- Interface fields emerge from discovered capabilities
+- Every field has a source (docs, SDK types, API response)
+- Don't guess - verify each capability
+- Document where each field came from
+
+**Step 3: Verify with Actual Calls**
+- Test capabilities before marking them supported
+- Investigate skipped tests - they're bugs, not features
+- No "should work" - prove it works
+- All tests must pass, not be skipped
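To make Steps 1-3 concrete, a schema assembled this way can carry its provenance next to every field. The sketch below is illustrative only: the `ProviderCapability` type, its field names, the model id, and the URL are hypothetical placeholders invented for this example, not part of the package.

```python
from dataclasses import dataclass, field

@dataclass
class ProviderCapability:
    """One record per capability discovered during research (illustrative sketch)."""
    provider: str                      # confirmed by researching the gateway's provider list
    model_ids: list[str] = field(default_factory=list)  # copied from provider docs, not guessed
    supports_streaming: bool = False   # only set True after an actual verification call
    source_url: str = ""               # where this capability is documented (cite it)

# Every field above exists because research surfaced it; nothing comes from memory.
groq = ProviderCapability(
    provider="groq",
    model_ids=["llama-3.1-8b-instant"],                   # hypothetical example value
    supports_streaming=True,
    source_url="https://example.com/gateway/providers",   # placeholder URL
)
```
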
+
+### Mandatory Checklist Before Answering ANY External API Question
+
+Before responding to questions about APIs, SDKs, or external services:
+
+```
+[ ] Did I use Context7 to get current documentation?
+[ ] Did I use WebSearch for official docs?
+[ ] Did I verify the information is current (not training data)?
+[ ] Am I stating facts from research, not memory?
+[ ] Have I cited my sources?
+```
+
+### Research Query Tracking
+
+All research is now tracked in `.claude/api-dev-state.json`:
+
+```json
+{
+  "research_queries": [
+    {
+      "timestamp": "2025-12-07T...",
+      "tool": "WebSearch",
+      "query": "Vercel AI Gateway Groq providers",
+      "terms": ["vercel", "gateway", "groq", "providers"]
+    },
+    {
+      "timestamp": "2025-12-07T...",
+      "tool": "mcp__context7__get-library-docs",
+      "library": "@ai-sdk/gateway",
+      "terms": ["@ai-sdk/gateway"]
+    }
+  ]
+}
+```
+
+This allows verification that specific topics were actually researched before answering.
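A verification pass over that state file can be as simple as checking that a question's key terms appear in at least one tracked query. The helper below is a sketch of that idea, grounded in the JSON structure shown above; it is not a hook that ships with this package.

```python
import json
from pathlib import Path

def was_researched(topic_terms: list[str],
                   state_path: str = ".claude/api-dev-state.json") -> bool:
    """Return True if every topic term appears in some tracked research query (illustrative)."""
    state = json.loads(Path(state_path).read_text())
    seen = set()
    for entry in state.get("research_queries", []):
        seen.update(term.lower() for term in entry.get("terms", []))
        seen.update(entry.get("query", "").lower().split())
    return all(term.lower() in seen for term in topic_terms)

# With the example state above: was_researched(["vercel", "gateway", "groq"]) -> True
```
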
+
 <claude-commands-template>
 ## Research Guidelines

package/hooks/enforce-external-research.py
ADDED

@@ -0,0 +1,328 @@
#!/usr/bin/env python3
"""
Hook: UserPromptSubmit
Purpose: ALWAYS enforce research before answering technical questions

This hook runs BEFORE Claude processes the user's prompt. It aggressively
detects ANY technical question and requires comprehensive research using
BOTH Context7 AND multiple WebSearches before answering.

Philosophy: "ALWAYS research. Training data is NEVER trustworthy for technical info."

The hook triggers on:
- ANY mention of APIs, SDKs, libraries, packages, frameworks
- ANY technical "how to" or capability questions
- ANY code-related questions (functions, methods, parameters, types)
- ANY questions about tools, services, or platforms
- ANY request for implementation, editing, or changes

Returns:
- Prints context to stdout (injected into conversation)
- Exit 0 to allow the prompt to proceed
"""
import json
import sys
import re
from pathlib import Path
from datetime import datetime

# State file is in .claude/ directory (sibling to hooks/)
STATE_FILE = Path(__file__).parent.parent / "api-dev-state.json"

# ============================================================================
# AGGRESSIVE DETECTION PATTERNS
# ============================================================================

# Technical terms that ALWAYS trigger research
TECHNICAL_TERMS = [
    # Code/Development
    r"\b(?:function|method|class|interface|type|schema|model)\b",
    r"\b(?:parameter|argument|option|config|setting|property)\b",
    r"\b(?:import|export|require|module|package|library|dependency)\b",
    r"\b(?:api|sdk|framework|runtime|engine|platform)\b",
    r"\b(?:endpoint|route|url|path|request|response|header)\b",
    r"\b(?:database|query|table|collection|document|record)\b",
    r"\b(?:authentication|authorization|token|key|secret|credential)\b",
    r"\b(?:error|exception|bug|issue|problem|fix)\b",
    r"\b(?:test|spec|coverage|mock|stub|fixture)\b",
    r"\b(?:deploy|build|compile|bundle|publish|release)\b",
    r"\b(?:install|setup|configure|initialize|migrate)\b",
    r"\b(?:provider|service|client|server|handler|middleware)\b",
    r"\b(?:stream|async|await|promise|callback|event)\b",
    r"\b(?:component|widget|element|view|layout|template)\b",
    r"\b(?:state|store|reducer|action|context|hook)\b",
    r"\b(?:validate|parse|serialize|transform|convert)\b",

    # Package patterns
    r"@[\w-]+/[\w-]+",  # @scope/package
    r"\b[\w-]+-(?:sdk|api|js|ts|py|go|rs)\b",  # something-sdk, something-api

    # Version patterns
    r"\bv?\d+\.\d+(?:\.\d+)?(?:-[\w.]+)?\b",  # v1.2.3, 2.0.0-beta

    # File patterns
    r"\b[\w-]+\.(?:ts|js|tsx|jsx|py|go|rs|json|yaml|yml|toml|env)\b",
]

# Question patterns that indicate asking about functionality
QUESTION_PATTERNS = [
    # Direct questions
    r"\b(?:what|which|where|when|why|how)\b",
    r"\b(?:can|could|would|should|will|does|do|is|are)\b.*\?",

    # Requests
    r"\b(?:show|tell|explain|describe|list|find|get|give)\b",
    r"\b(?:help|need|want|looking for|trying to)\b",

    # Actions
    r"\b(?:create|make|build|add|implement|write|generate)\b",
    r"\b(?:update|change|modify|edit|fix|refactor|improve)\b",
    r"\b(?:delete|remove|drop|clear|reset)\b",
    r"\b(?:connect|integrate|link|sync|merge)\b",
    r"\b(?:debug|trace|log|monitor|track)\b",

    # Comparisons
    r"\b(?:difference|compare|versus|vs|between|or)\b",
    r"\b(?:better|best|recommended|preferred|alternative)\b",
]

# Phrases that ALWAYS require research (no exceptions)
ALWAYS_RESEARCH_PHRASES = [
    r"how (?:to|do|does|can|should|would)",
    r"what (?:is|are|does|can|should)",
    r"(?:does|can|will|should) .+ (?:support|have|handle|work|do)",
    r"(?:list|show|get|find) (?:all|available|supported)",
    r"example (?:of|for|using|with|code)",
    r"(?:implement|add|create|build|write|generate) .+",
    r"(?:update|change|modify|edit|fix) .+",
    r"(?:configure|setup|install|deploy) .+",
    r"(?:error|issue|problem|bug|not working)",
    r"(?:api|sdk|library|package|module|framework)",
    r"(?:documentation|docs|reference|guide)",
]

# Exclusion patterns - things that DON'T need research
EXCLUDE_PATTERNS = [
    r"^(?:hi|hello|hey|thanks|thank you|ok|okay|yes|no|sure)[\s!?.]*$",
    r"^(?:good morning|good afternoon|good evening|goodbye|bye)[\s!?.]*$",
    r"^(?:please|sorry|excuse me)[\s!?.]*$",
    r"^(?:\d+[\s+\-*/]\d+|calculate|math).*$",  # Simple math
]

# ============================================================================
# DETECTION LOGIC
# ============================================================================

def is_excluded(prompt: str) -> bool:
    """Check if prompt is a simple greeting or non-technical."""
    prompt_clean = prompt.strip().lower()

    # Very short prompts that are just greetings
    if len(prompt_clean) < 20:
        for pattern in EXCLUDE_PATTERNS:
            if re.match(pattern, prompt_clean, re.IGNORECASE):
                return True

    return False


def detect_technical_question(prompt: str) -> dict:
    """
    Aggressively detect if the prompt is technical and requires research.

    Returns:
        {
            "detected": bool,
            "terms": list of detected terms,
            "patterns_matched": list of pattern types,
            "confidence": "critical" | "high" | "medium" | "low" | "none"
        }
    """
    if is_excluded(prompt):
        return {
            "detected": False,
            "terms": [],
            "patterns_matched": [],
            "confidence": "none",
        }

    prompt_lower = prompt.lower()
    detected_terms = []
    patterns_matched = []

    # Check for ALWAYS_RESEARCH_PHRASES first (highest priority)
    for pattern in ALWAYS_RESEARCH_PHRASES:
        if re.search(pattern, prompt_lower, re.IGNORECASE):
            patterns_matched.append("always_research")
            # Extract the matched phrase
            match = re.search(pattern, prompt_lower, re.IGNORECASE)
            if match:
                detected_terms.append(match.group(0)[:50])

    # Check technical terms
    for pattern in TECHNICAL_TERMS:
        matches = re.findall(pattern, prompt_lower, re.IGNORECASE)
        if matches:
            detected_terms.extend(matches[:3])  # Limit per pattern
            patterns_matched.append("technical_term")

    # Check question patterns
    for pattern in QUESTION_PATTERNS:
        if re.search(pattern, prompt_lower, re.IGNORECASE):
            patterns_matched.append("question_pattern")
            break

    # Deduplicate
    detected_terms = list(dict.fromkeys(detected_terms))[:10]
    patterns_matched = list(set(patterns_matched))

    # Determine confidence - MUCH more aggressive
    if "always_research" in patterns_matched:
        confidence = "critical"
    elif "technical_term" in patterns_matched and "question_pattern" in patterns_matched:
        confidence = "high"
    elif "technical_term" in patterns_matched:
        confidence = "high"  # Technical terms alone = high
    elif "question_pattern" in patterns_matched and len(prompt) > 30:
        confidence = "medium"  # Questions longer than 30 chars
    elif len(prompt) > 50:
        confidence = "low"  # Longer prompts default to low (still triggers)
    else:
        confidence = "none"

    # AGGRESSIVE: Trigger on anything except "none"
    detected = confidence != "none"

    return {
        "detected": detected,
        "terms": detected_terms,
        "patterns_matched": patterns_matched,
        "confidence": confidence,
    }


def check_active_workflow() -> bool:
    """Check if there's an active API development workflow."""
    if not STATE_FILE.exists():
        return False

    try:
        state = json.loads(STATE_FILE.read_text())
        phases = state.get("phases", {})

        for phase_key, phase_data in phases.items():
            if isinstance(phase_data, dict):
                status = phase_data.get("status", "")
                if status in ["in_progress", "pending", "complete"]:
                    # If ANY phase has been touched, we're in a workflow
                    return True

        return False
    except (json.JSONDecodeError, Exception):
        return False


def log_detection(prompt: str, detection: dict, injected: bool) -> None:
    """Log this detection for debugging/auditing."""
    try:
        if STATE_FILE.exists():
            state = json.loads(STATE_FILE.read_text())
        else:
            state = {"prompt_detections": []}

        if "prompt_detections" not in state:
            state["prompt_detections"] = []

        state["prompt_detections"].append({
            "timestamp": datetime.now().isoformat(),
            "prompt_preview": prompt[:100] + "..." if len(prompt) > 100 else prompt,
            "detection": detection,
            "injected": injected,
        })

        # Keep only last 50 detections
        state["prompt_detections"] = state["prompt_detections"][-50:]

        STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
        STATE_FILE.write_text(json.dumps(state, indent=2))
    except Exception:
        pass  # Don't fail the hook on logging errors


# ============================================================================
# MAIN
# ============================================================================

def main():
    # Read hook input from stdin
    try:
        input_data = json.load(sys.stdin)
    except json.JSONDecodeError:
        sys.exit(0)

    prompt = input_data.get("prompt", "")

    if not prompt or len(prompt.strip()) < 5:
        sys.exit(0)

    # Check if in active workflow mode
    active_workflow = check_active_workflow()

    # Detect technical questions
    detection = detect_technical_question(prompt)

    # In active workflow, ALWAYS inject (even for low confidence)
    if active_workflow and detection["confidence"] != "none":
        detection["detected"] = True

    # Log all detections
    log_detection(prompt, detection, detection["detected"])

    # Inject context if detected
    if detection["detected"]:
        terms_str = ", ".join(detection["terms"][:5]) if detection["terms"] else "technical question"
        confidence = detection["confidence"]

        # Build the injection message
        injection = f"""
<user-prompt-submit-hook>
RESEARCH REQUIRED - {confidence.upper()} CONFIDENCE
Detected: {terms_str}
{"MODE: Active API Development Workflow - STRICT ENFORCEMENT" if active_workflow else ""}

MANDATORY BEFORE ANSWERING:

1. USE CONTEXT7 FIRST:
   - Call mcp__context7__resolve-library-id to find the library
   - Call mcp__context7__get-library-docs to get CURRENT documentation
   - This gives you the ACTUAL source of truth

2. USE WEBSEARCH (2-3 SEARCHES MINIMUM):
   - Search for official documentation
   - Search with different phrasings to get comprehensive coverage
   - Search for recent updates, changes, or known issues
   - Example searches:
     * "[topic] official documentation"
     * "[topic] API reference guide"
     * "[topic] latest updates 2024 2025"

3. NEVER TRUST TRAINING DATA:
   - Training data can be months or years outdated
   - APIs change constantly
   - Features get added, deprecated, or modified
   - Parameter names and types change

4. CITE YOUR SOURCES:
   - After researching, mention where the information came from
   - Include links when available

RESEARCH FIRST. ANSWER SECOND.
</user-prompt-submit-hook>
"""
        print(injection)

    sys.exit(0)


if __name__ == "__main__":
    main()
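The hook can be exercised outside Claude Code by feeding it the same JSON it receives on stdin. The snippet below is a minimal smoke test written against the source above, assuming the file is installed at `.claude/hooks/enforce-external-research.py`:

```python
import json
import subprocess

payload = json.dumps({"prompt": "What providers does the Brandfetch API support?"})
result = subprocess.run(
    ["python3", ".claude/hooks/enforce-external-research.py"],
    input=payload,
    capture_output=True,
    text=True,
)

# A technical prompt should produce the research banner on stdout...
assert "<user-prompt-submit-hook>" in result.stdout
# ...and the hook never blocks the prompt; it only injects context.
assert result.returncode == 0
```
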
package/hooks/track-tool-use.py
CHANGED

@@ -147,6 +147,29 @@ def main():
     # Add to sources list
     sources.append(source_entry)

+    # Also add to research_queries for prompt verification
+    research_queries = state.setdefault("research_queries", [])
+    query_entry = {
+        "timestamp": timestamp,
+        "tool": tool_name,
+    }
+
+    # Extract query/term based on tool type
+    if tool_name == "WebSearch":
+        query_entry["query"] = tool_input.get("query", "")
+        query_entry["terms"] = extract_terms(tool_input.get("query", ""))
+    elif tool_name == "WebFetch":
+        query_entry["url"] = tool_input.get("url", "")
+        query_entry["terms"] = extract_terms_from_url(tool_input.get("url", ""))
+    elif "context7" in tool_name.lower():
+        query_entry["library"] = tool_input.get("libraryName", tool_input.get("libraryId", ""))
+        query_entry["terms"] = [tool_input.get("libraryName", "").lower()]
+
+    research_queries.append(query_entry)
+
+    # Keep only last 50 queries
+    state["research_queries"] = research_queries[-50:]
+
     # Update last activity timestamp
     research["last_activity"] = timestamp
     research["source_count"] = len(sources)

@@ -190,7 +213,7 @@ def main():
 def create_initial_state():
     """Create initial state structure"""
     return {
-        "version": "1.
+        "version": "1.1.0",
         "created_at": datetime.now().isoformat(),
         "phases": {
             "scope": {"status": "not_started"},

@@ -209,7 +232,9 @@ def create_initial_state():
             "schema_matches_docs": False,
             "tests_cover_params": False,
             "all_tests_passing": False
-        }
+        },
+        "research_queries": [],
+        "prompt_detections": []
     }


@@ -225,5 +250,48 @@ def sanitize_input(tool_input):
     return sanitized

+def extract_terms(query: str) -> list:
+    """Extract searchable terms from a query string."""
+    import re
+    # Remove common words and extract meaningful terms
+    stop_words = {"the", "a", "an", "is", "are", "was", "were", "be", "been",
+                  "how", "to", "do", "does", "what", "which", "for", "and", "or",
+                  "in", "on", "at", "with", "from", "this", "that", "it"}
+
+    # Extract words
+    words = re.findall(r'\b[\w@/-]+\b', query.lower())
+
+    # Filter and return
+    terms = [w for w in words if w not in stop_words and len(w) > 2]
+    return terms[:10]  # Limit to 10 terms
+
+
+def extract_terms_from_url(url: str) -> list:
+    """Extract meaningful terms from a URL."""
+    import re
+    from urllib.parse import urlparse
+
+    try:
+        parsed = urlparse(url)
+        # Get domain parts and path parts
+        domain_parts = parsed.netloc.replace("www.", "").split(".")
+        path_parts = [p for p in parsed.path.split("/") if p]
+
+        # Combine and filter
+        all_parts = domain_parts + path_parts
+        terms = []
+        for part in all_parts:
+            # Split by common separators
+            sub_parts = re.split(r'[-_.]', part.lower())
+            terms.extend(sub_parts)
+
+        # Filter short/common terms
+        stop_terms = {"com", "org", "io", "dev", "api", "docs", "www", "http", "https"}
+        terms = [t for t in terms if t not in stop_terms and len(t) > 2]
+        return terms[:10]
+    except Exception:
+        return []
+
+
 if __name__ == "__main__":
     main()
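Tracing the two helpers above by hand, a typical search query and docs URL reduce to the following terms. This is a quick REPL check after pasting the helpers into a session; the expected values follow directly from the stop-word and length filters shown in the diff:

```python
extract_terms("How does the Vercel AI Gateway support Groq providers?")
# -> ['vercel', 'gateway', 'support', 'groq', 'providers']
#    ("how", "does", "the" are stop words; "ai" is dropped for being under 3 characters)

extract_terms_from_url("https://vercel.com/docs/ai-gateway/providers")
# -> ['vercel', 'gateway', 'providers']
#    ("com" and "docs" are stop terms; "ai" is again too short)
```
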
package/package.json
CHANGED
package/templates/settings.json
CHANGED

@@ -4,6 +4,8 @@
       "WebSearch",
       "WebFetch",
       "mcp__context7",
+      "mcp__context7__resolve-library-id",
+      "mcp__context7__get-library-docs",
       "mcp__github",
       "Bash(claude mcp:*)",
       "Bash(pnpm test:*)",

@@ -14,6 +16,16 @@
     ]
   },
   "hooks": {
+    "UserPromptSubmit": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/enforce-external-research.py"
+          }
+        ]
+      }
+    ],
     "PreToolUse": [
       {
         "matcher": "Write|Edit",