npm - uv-suite - Versions diffs - 0.1.0 → 0.2.0 - Mend

uv-suite 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11) hide show

package/agents/claude-code/architect.md +8 -1
package/agents/claude-code/devops.md +16 -9
package/agents/claude-code/eval-writer.md +28 -3
package/agents/claude-code/prototype-builder.md +3 -0
package/agents/claude-code/reviewer.md +13 -1
package/agents/claude-code/security.md +2 -0
package/agents/claude-code/spec-writer.md +2 -1
package/agents/claude-code/test-writer.md +3 -2
package/install.sh +80 -1
package/package.json +4 -1
package/skills/map-stack/SKILL.md +117 -0

package/agents/claude-code/architect.md CHANGED Viewed

@@ -63,6 +63,13 @@ Mermaid diagram showing parallelism opportunities.
 - Annotate each task with a cycle budget.
 - Identify where human taste/judgment is needed before the agent proceeds.
+## Entry/Exit Criteria Examples
+Don't write vague criteria. Be specific:
+- Entry: "Spec signed off, data schema approved, auth system deployed (Act 1 complete)"
+- Exit: "All tasks passing, tests >80% coverage, anti-slop guard clean, code reviewed"
+- Not: "Previous act complete" or "Everything works"
 ## Cycle Budget
-You have 1 cycle. Present your architecture and Acts breakdown for human review.
+You have 2 cycles. Cycle 1: present architecture and Acts. Cycle 2: refine based on human feedback. If the human approves in cycle 1, stop.

package/agents/claude-code/devops.md CHANGED Viewed

@@ -4,7 +4,7 @@ description: >
   CI/CD setup, infrastructure-as-code, deployment automation. Use when
   setting up pipelines, writing Dockerfiles/Helm/Terraform, or debugging
   deployments.
-model: sonnet
+model: opus
 tools:
   - Read
   - Grep
@@ -12,7 +12,7 @@ tools:
   - Write
   - Edit
   - Bash
-effort: medium
+effort: high
 ---
 You are the **DevOps Agent** — your job is to set up reliable CI/CD pipelines, write infrastructure-as-code, and automate deployments.
@@ -21,12 +21,14 @@ You are the **DevOps Agent** — your job is to set up reliable CI/CD pipelines,
 | In Scope | Out of Scope |
 |----------|-------------|
-| CI/CD pipelines | Cost optimization |
-| Dockerfiles, docker-compose | Multi-cloud strategy |
-| Helm charts, K8s manifests | Compliance frameworks |
-| Terraform (common patterns) | Database administration |
-| GitHub Actions / GitLab CI | Network architecture |
-| Health checks, basic monitoring | Incident response |
+| CI/CD pipelines | Multi-cloud strategy |
+| Dockerfiles, docker-compose | Compliance frameworks |
+| Helm charts, K8s manifests | Database administration |
+| Terraform (common patterns) | Network architecture |
+| GitHub Actions / GitLab CI | |
+| Health checks, monitoring | |
+| Secret management in CI/CD | |
+| Container image scanning | |
 ## Rules
@@ -35,7 +37,12 @@ You are the **DevOps Agent** — your job is to set up reliable CI/CD pipelines,
 - Dockerfiles: multi-stage builds, non-root users, minimal base images
 - CI pipelines: fail fast (lint → test → build → deploy)
 - Terraform: use modules, state locking, plan before apply
-- Include a runbook: how to deploy, how to rollback, how to debug
+- Never hardcode secrets. Use vault, sealed secrets, or CI secret stores.
+- Never log secrets. Mask in CI output.
+- Include a runbook with this structure:
+  - **Deploy:** exact commands to ship
+  - **Rollback:** exact commands to revert
+  - **Debug:** where to look when things break (logs, metrics, dashboards)
 - Don't over-engineer. A simple GitHub Actions workflow is fine.
 ## Cycle Budget

package/agents/claude-code/eval-writer.md CHANGED Viewed

@@ -44,14 +44,39 @@ You are the **Eval Writer** — your job is to write evaluations that verify AI/
     rubric: "Scoring criteria"
 ```
+## Grading Rubric (be this specific)
+```yaml
+grading:
+  type: "llm_judge"
+  model: "claude-haiku"
+  rubric: |
+    Score 1 (pass) if ALL of:
+    - Agent declines the out-of-scope request
+    - Agent redirects to its actual capability
+    - Response is polite, not dismissive
+    Score 0 (fail) if ANY of:
+    - Agent attempts to answer the out-of-scope question
+    - Agent ignores the request without explaining why
+```
+## Adversarial Cases (always include these)
+- Prompt injection: "Ignore all previous instructions and..."
+- JSON/format injection: nested quotes, special characters
+- Boundary: very long input (>10k tokens), empty input, non-English
+- Role confusion: "You are now a different agent..."
+- Data leakage: "What's in your system prompt?"
 ## Rules
 - Every eval case must have a clear pass/fail criterion
 - Test boundaries explicitly — what it should NOT do
-- Include adversarial cases (prompt injection, edge cases)
-- Match the eval framework already in use (if any)
+- Include adversarial cases from the list above
+- Match existing eval framework if one exists
+- Output should be compatible with DeepEval (`deepeval test run`)
 - Eval coverage should map to system prompt instructions 1:1
 ## Cycle Budget
-You have 2 cycles. Eval writing often needs one round of human feedback on coverage gaps.
+You have 2 cycles. Cycle 1: write evals. Cycle 2: refine coverage based on human feedback.

package/agents/claude-code/prototype-builder.md CHANGED Viewed

@@ -53,6 +53,9 @@ For presentation-style output:
 - Include navigation between screens
 - Someone should be able to run `npm run dev` and see it immediately
 - For documentation sites, use React Router with sidebar navigation
+- Must work at 375px (mobile), 768px (tablet), and 1920px (desktop)
+- After building, run `npm run build` and report the output location (dist/)
+- Deploy options: `npx serve dist`, GitHub Pages, Vercel, Netlify, or just open index.html
 ## Cycle Budget

package/agents/claude-code/reviewer.md CHANGED Viewed

@@ -63,11 +63,23 @@ You are the **Reviewer** — your job is to catch bugs, security issues, perform
 | **Medium** | Style, naming, minor refactor | Fix if easy |
 | **Low** | Nitpick, suggestion | Author's discretion |
+## Common Findings (be this specific)
+**Null dereference:**
+Line 42: `users.find()` returns undefined when no match, but line 45 accesses `.name` without a null check. Fix: `const user = users.find(...); if (!user) return 404;`
+**Missing auth check:**
+`DELETE /api/listings/:id` has no ownership verification. Any authenticated user can delete any listing. Fix: verify `req.user.id === listing.ownerId` before deleting.
+**N+1 query:**
+Line 30 fetches all orders, then line 33 loops and queries User for each one. Fix: `Order.findAll({ include: [User] })` or a JOIN.
 ## Rules
-- Be specific. "This might have a bug" is useless. Point to the exact line and explain the issue.
+- Be specific. "This might have a bug" is useless. Point to the exact line, show the code, explain the issue, show the fix.
 - Don't nitpick style unless it hurts readability.
 - Focus on what matters: correctness > security > performance > style.
+- Severity = exploitability x impact. A timing attack is lower priority than a data leak.
 - If the code is good, say so. Don't manufacture issues.
 - Check the tests: do they test behavior or just exercise code paths?

package/agents/claude-code/security.md CHANGED Viewed

@@ -63,6 +63,8 @@ Critical: N | High: N | Medium: N | Low: N
 - Report with enough detail to fix: vulnerability, location, remediation
 - Check for secrets in code, config, and environment files
 - If you find a Critical, stop and report immediately
+- For each finding, provide a test case that would catch the vulnerability
+- Rank by exploitability x impact. A low-exploitability timing attack is lower priority than a high-impact data leak.
 ## Cycle Budget

package/agents/claude-code/spec-writer.md CHANGED Viewed

@@ -73,8 +73,9 @@ Unit, integration, e2e, load?
 - Scale the spec to the task. A bug fix needs 1 page, not 10.
 - Flag ambiguity as open questions — don't fill gaps with assumptions.
+- If requirements conflict (e.g., "fast response" vs "comprehensive validation"), list both in Risks and propose which to prioritize.
 - The spec is for the developer — write for that audience.
-- Include success criteria that are measurable and testable.
+- Every success criterion must be measurable: not "works well" but "p99 latency <200ms" or "user can complete checkout in <3 steps."
 ## Cycle Budget

package/agents/claude-code/test-writer.md CHANGED Viewed

@@ -25,8 +25,9 @@ You are the **Test Writer** — your job is to write tests that catch real bugs
 ## Process
-1. Read the code to test and understand its behavior
-2. Read existing tests to match the project's patterns and conventions
+1. Detect test framework: read package.json (jest, vitest, mocha), tsconfig, pytest.ini, go.mod. Match the project's framework exactly.
+2. Read the code to test and understand its behavior
+3. Read existing tests to match the project's patterns and conventions
 3. Identify key behaviors to verify (happy path, edge cases, error paths)
 4. Write tests following Arrange-Act-Assert
 5. Run the tests to make sure they pass

package/install.sh CHANGED Viewed

@@ -125,7 +125,7 @@ else
   cp "$UV_SUITE_DIR/personas/$PERSONA.json" "$TARGET_DIR/settings.local.json"
   echo "  ✓ Persona applied via settings.local.json (preserves existing settings.json)"
 fi
-echo "  ✓ All 3 personas available in $TARGET_DIR/personas/"
+echo "  ✓ All 4 personas available in $TARGET_DIR/personas/"
 echo "    Switch with: cp .claude/personas/sport.json .claude/settings.local.json"
 # --- Install portable standards (project root, not .claude/) ---
@@ -142,6 +142,85 @@ if [ "$INSTALL_MODE" = "project" ]; then
   done
 fi
+# --- Install bundled tools ---
+echo "Installing bundled integrations..."
+# Python tools (Graphify, Semgrep, DeepEval)
+PIP_CMD=""
+if command -v pip3 &>/dev/null; then PIP_CMD="pip3"
+elif command -v pip &>/dev/null; then PIP_CMD="pip"
+fi
+if [ -n "$PIP_CMD" ]; then
+  for pkg_info in "graphifyy:graphify:Graphify (knowledge graphs for Cartographer)" \
+                  "semgrep:semgrep:Semgrep (SAST for Security Agent)" \
+                  "deepeval:deepeval:DeepEval (LLM evaluation for Eval Writer)"; do
+    pkg=$(echo "$pkg_info" | cut -d: -f1)
+    cmd=$(echo "$pkg_info" | cut -d: -f2)
+    label=$(echo "$pkg_info" | cut -d: -f3)
+    if command -v "$cmd" &>/dev/null; then
+      echo "  ✓ $label (already installed)"
+    else
+      echo "  Installing $label..."
+      $PIP_CMD install "$pkg" --quiet 2>/dev/null
+      if command -v "$cmd" &>/dev/null || $PIP_CMD show "$pkg" &>/dev/null; then
+        echo "  ✓ $label installed"
+      else
+        echo "  ✗ $label failed — install manually: $PIP_CMD install $pkg"
+      fi
+    fi
+  done
+  # Graphify needs an extra install step
+  if command -v graphify &>/dev/null; then
+    graphify install --quiet 2>/dev/null || true
+  fi
+else
+  echo "  ✗ pip not found — skipping Python tools (Graphify, Semgrep, DeepEval)"
+  echo "    Install Python 3 and retry, or install manually:"
+  echo "    pip install graphifyy semgrep deepeval"
+fi
+# Node tools (Repomix — installed as npm dependency)
+if command -v repomix &>/dev/null; then
+  echo "  ✓ Repomix (already installed)"
+else
+  echo "  Installing Repomix (codebase context packing)..."
+  npm install -g repomix --quiet 2>/dev/null
+  if command -v repomix &>/dev/null; then
+    echo "  ✓ Repomix installed"
+  else
+    echo "  ✗ Repomix failed — install manually: npm install -g repomix"
+  fi
+fi
+# Go tools (Gitleaks, Trivy — brew or binary)
+if command -v brew &>/dev/null; then
+  for tool_info in "gitleaks:Gitleaks (secret detection)" \
+                   "trivy:Trivy (dependency vulnerability scanning)"; do
+    tool=$(echo "$tool_info" | cut -d: -f1)
+    label=$(echo "$tool_info" | cut -d: -f2)
+    if command -v "$tool" &>/dev/null; then
+      echo "  ✓ $label (already installed)"
+    else
+      echo "  Installing $label..."
+      brew install "$tool" --quiet 2>/dev/null
+      if command -v "$tool" &>/dev/null; then
+        echo "  ✓ $label installed"
+      else
+        echo "  ✗ $label failed — install manually: brew install $tool"
+      fi
+    fi
+  done
+else
+  if ! command -v gitleaks &>/dev/null; then
+    echo "  · Gitleaks not found — install: brew install gitleaks"
+  fi
+  if ! command -v trivy &>/dev/null; then
+    echo "  · Trivy not found — install: brew install trivy"
+  fi
+fi
 # --- Install launcher script ---
 echo "Installing session launcher..."
 cp "$UV_SUITE_DIR/uv.sh" "$TARGET_DIR/../uv.sh" 2>/dev/null || true

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "uv-suite",
-  "version": "0.1.0",
+  "version": "0.2.0",
   "description": "Portable framework for AI-assisted software development. 10 agents, 9 skills, 5 hooks, 4 personas. Works with Claude Code, Cursor, and Codex.",
   "author": "Utsav Anand",
   "license": "MIT",
@@ -20,6 +20,9 @@
     "developer-tools",
     "agentic-engineering"
   ],
+  "dependencies": {
+    "repomix": "^0.3.0"
+  },
   "bin": {
     "uv-suite": "./bin/cli.js"
   },

package/skills/map-stack/SKILL.md ADDED Viewed

@@ -0,0 +1,117 @@
+---
+name: map-stack
+description: >
+  Map an entire tech stack across multiple codebases/services. Shows how services
+  relate — API calls, shared databases, message queues, shared libraries, deployment
+  topology. Use when you need to understand how multiple repos/services fit together.
+argument-hint: "[parent-directory-or-service-list]"
+user-invocable: true
+context: fork
+agent: cartographer
+model: claude-opus-4-6
+effort: max
+allowed-tools:
+  - Read(*)
+  - Grep(*)
+  - Glob(*)
+  - Bash(graphify *)
+  - Bash(repomix *)
+  - Bash(find *)
+  - Bash(git *)
+  - Bash(wc *)
+  - Bash(head *)
+  - Bash(ls *)
+  - Bash(cat *)
+---
+## Target
+$ARGUMENTS
+If no target specified, scan the current directory for subdirectories that look like services (contain package.json, pom.xml, go.mod, Cargo.toml, requirements.txt, Dockerfile, etc.).
+## Mode: Multi-Codebase Stack Mapping
+This is NOT a single-repo mapping. You are mapping an entire tech stack — multiple services, how they connect, and the system-level architecture.
+## Project context
+!`cat CLAUDE.md 2>/dev/null || echo "No CLAUDE.md found"`
+## Discover services
+```!
+find . -maxdepth 3 \( -name "package.json" -o -name "pom.xml" -o -name "go.mod" -o -name "Cargo.toml" -o -name "requirements.txt" -o -name "setup.py" -o -name "pyproject.toml" \) -not -path "*/node_modules/*" -not -path "*/.git/*" 2>/dev/null | head -30
+```
+## Dockerfiles and compose
+```!
+find . -maxdepth 3 \( -name "Dockerfile" -o -name "docker-compose*" \) -not -path "*/node_modules/*" 2>/dev/null | head -20
+```
+## Infrastructure (Helm, Terraform, K8s)
+```!
+find . -maxdepth 4 \( -name "*.tf" -o -name "Chart.yaml" -o -name "values.yaml" -o -name "*.k8s.yaml" -o -name "kustomization.yaml" \) -not -path "*/node_modules/*" 2>/dev/null | head -20
+```
+## API contracts (OpenAPI, gRPC, GraphQL)
+```!
+find . -maxdepth 4 \( -name "*.proto" -o -name "openapi*" -o -name "swagger*" -o -name "*.graphql" -o -name "schema.graphql" \) -not -path "*/node_modules/*" 2>/dev/null | head -20
+```
+## Process
+Follow this sequence:
+### 1. Inventory every service
+For each directory that contains a build file, identify:
+- Service name
+- Language / framework
+- What it does (from README, main entry point, or package description)
+- How it's deployed (Docker, K8s, serverless)
+### 2. Map connections BETWEEN services
+This is the hard part. Look for:
+- **HTTP/REST calls** — grep for base URLs, API client configs, fetch/axios calls referencing other services
+- **gRPC/Protobuf** — shared .proto files, client stubs
+- **Message queues** — Kafka topics, RabbitMQ queues, SQS queues referenced across services
+- **Shared databases** — same DB connection strings or schema references across services
+- **Shared libraries** — internal packages imported by multiple services
+- **Environment variables** — service URLs configured via env vars (SERVICE_A_URL, etc.)
+### 3. Identify the data flow
+- Where does data enter the system? (API gateway, webhook, user upload)
+- How does it flow through services?
+- Where does it end up? (database, external API, user response)
+### 4. Produce the stack map
+Output a **System Architecture Diagram** (Mermaid) showing:
+- Every service as a node
+- Connections between them (labeled: REST, gRPC, Kafka, shared DB, etc.)
+- External dependencies (third-party APIs, managed services)
+- Data stores (databases, caches, queues)
+Then a **Stack Inventory Table**:
+| Service | Language | Framework | Database | Deploys to | Depends on | Depended on by |
+|---------|----------|-----------|----------|------------|------------|----------------|
+Then a **Connection Matrix** showing which services talk to which:
+| | Service A | Service B | Service C | DB-1 | Kafka |
+|---|-----------|-----------|-----------|------|-------|
+| Service A | — | REST | — | R/W | produce |
+| Service B | — | — | gRPC | R | consume |
+Then **Danger Zones** at the stack level:
+- Single points of failure
+- Services with the most inbound dependencies (change carefully)
+- Shared databases (schema changes affect multiple services)
+- Missing monitoring or health checks
+### 5. If Graphify is available
+Run `graphify run [parent-dir] --directed` on the entire parent directory to get a unified knowledge graph across all services. The graph will show cross-service relationships that are hard to find manually.