@jamie-tam/forge 6.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (213)

package/skills/quality-security-audit/references/owasp-checks.md ADDED

@@ -0,0 +1,178 @@
+# OWASP Top 10 — checks and exploit scenarios
+Phase 1 of the security audit: systematically check the code against each OWASP Top 10 category (the category names below follow the 2021 edition). Each category lists what to check for, and most include a realistic exploit scenario.
+**Core rule:** If you cannot describe how an attacker would exploit a finding, it is not a real finding. Every finding at MEDIUM or above MUST include a realistic exploit scenario.
+---
+## A01: Broken Access Control
+**Check for:**
+- Missing authorization checks on endpoints
+- Privilege escalation paths (user accessing admin resources)
+- Insecure direct object references (IDOR) -- can user A access user B's data by changing an ID?
+- Missing function-level access control
+- CORS misconfiguration allowing unauthorized origins
+- Directory traversal in file operations
+**Exploit scenario example:**
+```
+FINDING: IDOR in GET /api/users/:id
+EXPLOIT: Authenticated user changes :id parameter to another user's ID.
+  curl -H "Authorization: Bearer <user-a-token>" https://<host>/api/users/<user-b-id>
+  Response: 200 OK with user B's profile data including email, phone, address.
+IMPACT: Any authenticated user can read any other user's personal data.
+SEVERITY: HIGH
+FIX: Verify requesting user's ID matches :id parameter, or user has admin role.
+```
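The FIX line above can be sketched as a small ownership check. This is a minimal illustration, not part of the package: `Requester`, `canAccessUser`, and `statusForUserRead` are hypothetical names, and a real handler would take the requester from the verified token rather than from request parameters.

```typescript
// Hypothetical shapes for illustration; real projects will have their own.
type Role = 'user' | 'admin';

interface Requester {
  id: string;
  role: Role;
}

// Returns true only when the requester owns the record or is an admin.
function canAccessUser(requester: Requester, targetUserId: string): boolean {
  return requester.role === 'admin' || requester.id === targetUserId;
}

// Maps the decision to the HTTP status a GET /api/users/:id handler could return.
function statusForUserRead(requester: Requester, targetUserId: string): 200 | 403 {
  return canAccessUser(requester, targetUserId) ? 200 : 403;
}
```

With this check in place, the request from the exploit scenario (user A asking for user B's ID) would receive 403 instead of user B's profile.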
+## A02: Cryptographic Failures
+**Check for:**
+- Passwords stored in plaintext or weak hash (MD5, SHA1)
+- Sensitive data transmitted without TLS
+- Hardcoded encryption keys or IVs
+- Weak random number generation for tokens/sessions
+- PII stored without encryption at rest
+- Deprecated cryptographic algorithms
+**Exploit scenario example:**
+```
+FINDING: Password hashed with MD5 in user registration
+EXPLOIT: Attacker obtains database dump. Unsalted MD5 hashes are reversed in seconds using
+  rainbow tables. All user passwords compromised.
+IMPACT: Full account takeover for all users.
+SEVERITY: CRITICAL
+FIX: Use bcrypt/scrypt/argon2id with appropriate cost factor.
+```
+## A03: Injection
+**Check for:**
+- SQL injection (string concatenation in queries)
+- NoSQL injection (unsanitized input in MongoDB queries)
+- Command injection (user input in exec/spawn/system calls)
+- LDAP injection
+- Template injection (user input rendered in server-side templates)
+- Header injection (user input in HTTP headers)
+**Exploit scenario example:**
+```
+FINDING: SQL injection in search endpoint
+CODE: db.query(`SELECT * FROM products WHERE name LIKE '%${req.query.search}%'`)
+EXPLOIT: Attacker sends: /search?search=' UNION SELECT username,password FROM users--
+  Returns all usernames and password hashes once the UNION column count matches the products table.
+IMPACT: Full database read access including credentials.
+SEVERITY: CRITICAL
+FIX: Use parameterized query: db.query('SELECT * FROM products WHERE name LIKE $1', [`%${search}%`])
+```
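The parameterized fix above can be sketched as a helper that keeps user input out of the SQL text entirely. This is an illustration under assumptions: `ParameterizedQuery`, `escapeLike`, and `buildProductSearch` are hypothetical names, and the `$1` placeholder style assumes a PostgreSQL-style driver, where backslash is the default LIKE escape character.

```typescript
// Hypothetical query shape: SQL text plus a separate values array,
// as accepted by drivers that support $1-style placeholders.
interface ParameterizedQuery {
  text: string;
  values: string[];
}

// Escapes LIKE wildcards (and the escape character itself) so input matches literally.
function escapeLike(input: string): string {
  return input.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// The SQL text is a constant; the search term only ever appears in `values`.
function buildProductSearch(search: string): ParameterizedQuery {
  return {
    text: 'SELECT * FROM products WHERE name LIKE $1',
    values: [`%${escapeLike(search)}%`],
  };
}
```

The payload from the exploit scenario ends up as an ordinary string value, so the driver never parses it as SQL. Escaping `%` and `_` is a separate, smaller concern: it stops users from widening the match, not from injecting.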
+## A04: Insecure Design
+**Check for:**
+- Missing rate limiting on authentication endpoints
+- No account lockout after failed attempts
+- Business logic flaws (negative quantities, race conditions in payments)
+- Missing input validation on business rules
+- Lack of defense in depth
+**Exploit scenario example:**
+```
+FINDING: No rate limiting on POST /api/auth/login
+EXPLOIT: Attacker runs brute-force attack with common password list.
+  At 100 requests/second, 10000 common passwords tested in 100 seconds.
+  No lockout, no CAPTCHA, no delay.
+IMPACT: Account takeover for users with weak passwords.
+SEVERITY: HIGH
+FIX: Add rate limiting (5 attempts per minute per IP), account lockout after 10 failures,
+  progressive delays, CAPTCHA after 3 failures.
+```
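The "5 attempts per minute per IP" part of the fix can be sketched as a fixed-window counter. This is a minimal in-memory illustration, not a production design: `LoginRateLimiter` is a hypothetical name, and a multi-instance deployment would need shared storage.

```typescript
// In-memory fixed-window limiter keyed by client IP (illustrative only).
class LoginRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly maxAttempts: number;
  private readonly windowMs: number;

  constructor(maxAttempts = 5, windowMs = 60_000) {
    this.maxAttempts = maxAttempts;
    this.windowMs = windowMs;
  }

  // Returns true if the attempt is allowed, false if it should be rejected.
  // `now` is injectable so the behaviour can be tested without waiting.
  allow(ip: string, now: number = Date.now()): boolean {
    const current = this.windows.get(ip);
    if (!current || now - current.start >= this.windowMs) {
      // First attempt from this IP, or the previous window has expired.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= this.maxAttempts;
  }
}
```

At 5 attempts per minute, the 10000-password list from the scenario would take over 33 hours from a single IP instead of 100 seconds. Account lockout, progressive delays, and CAPTCHA are not covered by this sketch.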
+## A05: Security Misconfiguration
+**Check for:**
+- Debug mode enabled in production
+- Default credentials in configuration
+- Unnecessary features enabled (directory listing, stack traces)
+- Missing security headers (CSP, HSTS, X-Frame-Options)
+- Overly permissive CORS
+- Verbose error messages exposing internal details
+## A06: Vulnerable and Outdated Components
+**Check for:**
+- Known CVEs in dependencies (npm audit, pip-audit, cargo audit)
+- Outdated packages with known vulnerabilities
+- Abandoned/unmaintained packages
+- Packages with very few maintainers (bus factor risk)
+**Run dependency audit:**
+```bash
+# Node.js
+npm audit
+# or: npx better-npm-audit audit
+# Python
+pip-audit
+# or: safety check
+# Go
+govulncheck ./...
+# Rust
+cargo audit
+```
+## A07: Identification and Authentication Failures
+**Check for:**
+- Weak password requirements
+- Session tokens in URLs
+- Session fixation vulnerabilities
+- Missing session invalidation on logout/password change
+- JWT without expiration
+- JWT secret hardcoded or weak
+## A08: Software and Data Integrity Failures
+**Check for:**
+- Unsigned updates or deployments
+- Untrusted CI/CD pipeline modifications
+- Deserialization of untrusted data
+- Missing integrity checks on critical data
+## A09: Security Logging and Monitoring Failures
+**Check for:**
+- Missing audit logs for authentication events
+- Missing logs for authorization failures
+- No alerting on suspicious patterns
+- Sensitive data in log output (passwords, tokens, PII)
+- Log injection vulnerabilities
+## A10: Server-Side Request Forgery (SSRF)
+**Check for:**
+- User-controlled URLs in server-side requests
+- Missing URL allowlist validation
+- Internal network access via crafted URLs
+- Cloud metadata endpoint access (169.254.169.254)
+**Exploit scenario example:**
+```
+FINDING: SSRF in image proxy endpoint
+CODE: const image = await fetch(req.query.url);
+EXPLOIT: Attacker sends: /proxy?url=http://169.254.169.254/latest/meta-data/iam/security-credentials/
+  Server fetches AWS credentials and returns them to attacker.
+IMPACT: Access to AWS resources with the instance role's permissions via stolen IAM credentials.
+SEVERITY: CRITICAL
+FIX: Validate URL against allowlist. Block private IP ranges. Block metadata endpoints.
+  Use a URL parsing library; do not rely on string matching.
+```
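The allowlist part of the fix can be sketched with the standard `URL` parser. This is an illustration under assumptions: `ALLOWED_HOSTS` and `isAllowedProxyUrl` are hypothetical names, and the hosts listed are placeholders.

```typescript
// Hypothetical allowlist; the real list would come from configuration.
const ALLOWED_HOSTS = new Set(['images.example.com', 'cdn.example.com']);

// Returns true only for https URLs whose parsed hostname is on the allowlist.
function isAllowedProxyUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'https:') return false;
  // Reject embedded credentials such as https://allowed-host@evil-host/
  if (url.username !== '' || url.password !== '') return false;
  return ALLOWED_HOSTS.has(url.hostname);
}
```

Comparing the parsed `hostname` rather than searching the raw string is what defeats tricks like `https://images.example.com@169.254.169.254/`. This sketch does not cover redirects or allowlisted names that resolve to private addresses; those still need handling at fetch time.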

package/skills/quality-test-execution/SKILL.md ADDED

@@ -0,0 +1,435 @@
+---
+name: quality-test-execution
+description: "Use when a test plan exists and tests need to run — executes all test types specified by the plan."
+---
+# Test Execution
+## Overview
+Execute every test defined in the test plan. Map each scenario to pass/fail. Generate a results report with full traceability back to requirements. No skipping.
+**Core principle:** The test plan says what to test. This skill executes it ALL. No exceptions, no shortcuts, no "we'll test that later."
+**Announce at start:** "I'm using the quality-test-execution skill to execute the full test plan."
+## When to Use
+- After `quality-test-plan` has produced a test plan document
+- During `/feature` and `/greenfield` commands at the test execution phase
+- Before deployment (quality gate requirement)
+**Not for:**
+- Writing tests during implementation (that is build-tdd)
+- Generating test plans (that is quality-test-plan)
+- Ad-hoc testing (just run the tests directly)
+## Prerequisites
+Before executing, the following must exist:
+1. **Test plan** -- `.forge/work/{type}/{name}/test-plan.md` (the plan to execute)
+2. **Implementation code** -- All code from the build-tdd phase
+3. **Test infrastructure** -- Test runners, databases, Playwright, etc. configured
+4. **All test files written** -- Tests from the plan must exist as actual code
+If any are missing, stop. Do not execute a partial test suite.
+## The Execution Process
+### Step 1: Read the Test Plan
+Parse `.forge/work/{type}/{name}/test-plan.md` and extract:
+- All test IDs (UT-001, IT-001, E2E-001, SMOKE-001, LOAD-001, CONTRACT-001)
+- Their expected locations (file paths)
+- Their traceability (which requirements they cover)
+- Pass/fail thresholds (coverage %, response time, error rate)
+**Verify all test files exist.** If a test from the plan has no corresponding test file, flag it immediately.
+```
+MISSING TEST FILES:
+- IT-003: tests/integration/auth/login.test.ts (file not found)
+- E2E-002: tests/e2e/auth/login.spec.ts (file not found)
+Cannot proceed. Write missing tests first.
+```
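The parsing and file check in Step 1 could look roughly like the following. It is a sketch, not the skill's actual implementation: `extractTestIds` and `findMissingTestFiles` are hypothetical helpers, and the three-digit ID format is inferred from the examples above.

```typescript
// Matches the ID formats listed above, e.g. UT-001 or CONTRACT-001.
const TEST_ID_PATTERN = /\b(?:UT|IT|E2E|SMOKE|LOAD|CONTRACT)-\d{3}\b/g;

// Returns the unique test IDs found in the plan text, in order of first appearance.
function extractTestIds(planMarkdown: string): string[] {
  const matches = planMarkdown.match(TEST_ID_PATTERN) ?? [];
  return Array.from(new Set(matches));
}

// Given a map of planned test ID -> expected file path and the set of paths
// that actually exist, lists the entries that block execution.
function findMissingTestFiles(
  planned: Record<string, string>,
  existingPaths: Set<string>,
): string[] {
  return Object.entries(planned)
    .filter(([, path]) => !existingPaths.has(path))
    .map(([id, path]) => `${id}: ${path} (file not found)`);
}
```

A non-empty result from `findMissingTestFiles` corresponds to the "Cannot proceed" report shown above.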
+### Step 1.5: Codex Mode Check
+Now that the test plan is parsed and files are verified, run the Codex consent flow from `protocols/codex.md`. The selected mode applies for the rest of this skill's invocation.
+- **Takeover:** Dispatch Codex with the test plan to execute tests and produce the results report. Claude reviews test quality and coverage.
+- **Verify** or **Skip / Codex unavailable:** Proceed with the steps below. The Codex Verify note in Step 9 will dispatch Codex to review test quality (Verify only).
+### Step 2: External Service Availability Check
+Before executing any tests, determine which external services are available for real testing.
+1. **Extract** all external services from the architecture's API contract (`architecture/api-contract.md` — look for external service sections)
+2. **Attempt automated connectivity** — hit health endpoints or make minimal requests to each service
+3. **If check succeeds** — service confirmed available, record the endpoint
+4. **If check fails or is ambiguous** — ask the user: "Is [service] running right now? If yes, confirm the endpoint."
+5. **Record the availability matrix** — this feeds into all subsequent test steps
+```markdown
+## External Service Availability
+| Service | Endpoint | Status | Verified |
+|---|---|---|---|
+| Payment API | api.stripe.com/v1/charges | Available ✓ | Real request returned 200 |
+| Email Service | smtp.sendgrid.net | Not available | Connection refused |
+```
+**Downstream enforcement based on availability:**
+| Service Status | Unit Tests | Integration Tests | E2E Tests |
+|---|---|---|---|
+| **Confirmed available** | Mocks OK | SHOULD use real, mocks require documented justification | **MUST use real service** |
+| **Not available** | Mocks OK | Mocks OK, flagged | **BLOCKED** — not faked |
+**Note:** This step is skipped during `/hotfix` execution, consistent with existing gate exemptions (smoke tests only).
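The enforcement table above can be read as a small lookup function. The type and policy names below are hypothetical labels chosen for this sketch; only the decisions themselves come from the table.

```typescript
type ServiceStatus = 'available' | 'unavailable';
type TestLevel = 'unit' | 'integration' | 'e2e';
type MockPolicy =
  | 'mocks-ok'         // mocks allowed without comment
  | 'real-preferred'   // SHOULD use real; mocks need documented justification
  | 'real-required'    // MUST use the real service
  | 'mocks-ok-flagged' // mocks allowed, suite flagged
  | 'blocked';         // test cannot run and must not be faked

// Encodes the "Downstream enforcement" table row by row.
function mockPolicy(status: ServiceStatus, level: TestLevel): MockPolicy {
  if (level === 'unit') return 'mocks-ok';
  if (status === 'available') {
    return level === 'e2e' ? 'real-required' : 'real-preferred';
  }
  return level === 'e2e' ? 'blocked' : 'mocks-ok-flagged';
}
```

For the availability matrix shown earlier, E2E tests touching the Email Service would come back as `blocked`, while E2E tests touching the Payment API would be `real-required`.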
+### Step 3: Execute Unit Tests
+```bash
+# Run unit tests with coverage
+npm test -- --coverage
+# or: pytest --cov=src --cov-report=term-missing
+# or: go test ./... -coverprofile=coverage.out
+```
+**Record:**
+- Total tests: passed / failed / skipped
+- Coverage percentage (overall and per-file)
+- Any skipped tests (MUST be justified)
+- Failed test details with error messages
+**Coverage check:**
+```
+Coverage: 87% overall
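+  src/auth/service.ts: 95%  <-- BELOW CRITICAL-PATH THRESHOLD (100%)
+  src/auth/middleware.ts: 100%
+  src/auth/validation.ts: 82%  <-- BELOW CRITICAL-PATH THRESHOLD (100%)
+  src/users/repository.ts: 73%  <-- BELOW THRESHOLD (80%)
+FAIL: src/users/repository.ts below 80% threshold; src/auth/service.ts and src/auth/validation.ts below 100% critical-path threshold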
+```
+**Rules:**
+- 80% minimum coverage, overall and per file
+- 100% for critical paths (auth, payments, data validation)
+- Zero skipped tests without documented justification
+- Zero failed tests (all must pass)
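The coverage rules can be sketched as a check over per-file numbers. This is an illustration under assumptions: `coverageFailures` is a hypothetical helper, treating a file as critical by path prefix is a simplification, and the prefixes listed are placeholders for whatever the project considers auth, payments, or validation code.

```typescript
interface FileCoverage {
  file: string;
  percent: number;
}

// Hypothetical path prefixes for critical code; adjust to the project layout.
const CRITICAL_PREFIXES = ['src/auth/', 'src/payments/'];

// Returns one message per violated threshold; an empty array means the check passes.
function coverageFailures(files: FileCoverage[], overall: number): string[] {
  const failures: string[] = [];
  if (overall < 80) failures.push(`overall: ${overall}% below 80% threshold`);
  for (const { file, percent } of files) {
    const critical = CRITICAL_PREFIXES.some((prefix) => file.startsWith(prefix));
    const required = critical ? 100 : 80;
    if (percent < required) {
      failures.push(`${file}: ${percent}% below ${required}% threshold`);
    }
  }
  return failures;
}
```

Applied to the numbers in the coverage example, this flags `src/users/repository.ts` against the 80% threshold and, because of the 100% rule for critical paths, also flags the two `src/auth/` files that are below 100%.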
+### Step 4: Execute Integration Tests
+```bash
+# Run integration tests (typically against test database)
+npm run test:integration
+# or: pytest tests/integration/
+# or: go test ./tests/integration/...
+```
+**Pre-execution setup:**
+- Verify test database is running and accessible (use real database, not mocks)
+- Run migrations on test database
+- Seed test data if required by the plan
+- For external services confirmed available in Step 2: use real service (mocks require documented justification)
+- For external services not available: verify stubs/mocks are configured before running the suite (suite is flagged as incomplete, but must still execute)
+**Record:**
+- Total tests: passed / failed / skipped
+- API contract compliance (response shapes match)
+- Database constraint validation results
+- Failed test details with full error output
+**Rules:**
+- Use real database, not mocks
+- Each test cleans up its own data
+- Validate against API contract shapes exactly
+- All integration tests from the plan must execute
+### Step 5: Execute E2E Tests (Playwright)
+Dispatch the **e2e-runner** subagent for E2E test execution. The e2e-runner has Bash access, specializes in Playwright, and frees your context from long test output.
+```bash
+# Run Playwright E2E tests
+npx playwright test
+# or: npx playwright test tests/e2e/auth/
+```
+**Pre-execution setup:**
+- Verify application is running (start if needed)
+- Verify database is seeded with E2E test data
+- Configure Playwright browsers (chromium minimum, cross-browser if specified)
+- Set base URL and auth credentials for test environment
+**Execute each E2E scenario from the plan:**
+```typescript
+// Example: Execute E2E-001 from test plan
+import { test, expect } from '@playwright/test';
+test('E2E-001: User registration flow', async ({ page }) => {
+  // Steps from test plan
+  await page.goto('/register');
+  await page.fill('[name="email"]', 'e2e-test@example.com');
+  await page.fill('[name="password"]', 'ValidPass123');
+  await page.fill('[name="name"]', 'E2E Test User');
+  await page.click('button[type="submit"]');
+  // Expected results from test plan
+  await expect(page).toHaveURL('/dashboard');
+  await expect(page.locator('.welcome-message')).toContainText('E2E Test User');
+});
+```
+**Record:**
+- Total scenarios: passed / failed / skipped
+- Screenshots captured per step (for visual verification)
+- Video recordings of failed scenarios
+- Browser(s) tested against
+- Timing information per scenario
+**Rules:**
+- Every E2E scenario from the plan must execute
+- Capture screenshots at key steps
+- Record video for failed tests
+- Test both happy path AND error scenarios from the plan
+- No "flaky test" excuses -- if a test flakes, fix it
+### Step 6: Execute Smoke Tests
+```bash
+# Run smoke tests (typically lightweight HTTP checks)
+npm run test:smoke
+# or: curl-based health checks
+```
+**Execute each smoke check from the plan:**
+```bash
+# SMOKE-001: Application starts and responds
+curl -f http://localhost:3000/health --max-time 5
+# Expected: 200 OK
+# SMOKE-002: Authentication works
+curl -f -X POST http://localhost:3000/api/auth/login \
+  -H "Content-Type: application/json" \
+  -d '{"email":"test@example.com","password":"testpass"}' \
+  --max-time 10
+# Expected: 200 + token in response
+# SMOKE-003: Core API responds (assumes $TOKEN holds the token from the SMOKE-002 response)
+curl -f -H "Authorization: Bearer $TOKEN" \
+  http://localhost:3000/api/users \
+  --max-time 10
+# Expected: 200
+```
+**Record:**
+- Each check: pass / fail
+- Response time for each check
+- Any timeout or connection failures
+**Rules:**
+- All smoke checks must pass
+- Timeout thresholds from the plan are hard limits
+- Smoke tests must complete within 60 seconds total
+### Step 7: Execute Load/Stress Tests
+**Only execute if specified in the test plan.** Check the project profile -- not all projects need load testing.
+```bash
+# Run load tests with k6
+k6 run tests/load/registration-load.js
+# Or with artillery
+npx artillery run tests/load/registration.yml
+```
+**Execute each load scenario from the plan:**
+```javascript
+// k6 example for LOAD-001
+import http from 'k6/http';
+import { check } from 'k6';
+export const options = {
+  scenarios: {
+    normal_load: {
+      executor: 'constant-vus',
+      vus: 50,
+      duration: '5m',
+    },
+    peak_load: {
+      executor: 'constant-vus',
+      vus: 200,
+      duration: '2m',
+      startTime: '5m',
+    },
+  },
+  thresholds: {
+    http_req_duration: ['p(95)<500', 'p(99)<2000'],
+    http_req_failed: ['rate<0.01'],
+  },
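+};
+// k6 requires a default function; the target URL here is an assumption,
+// use the endpoint named in the test plan.
+export default function () {
+  const res = http.get('http://localhost:3000/health');
+  check(res, { 'status is 200': (r) => r.status === 200 });
+}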
+```
+**Record:**
+- Requests per second achieved
+- Response time percentiles (p50, p95, p99)
+- Error rate under each load level
+- Breaking point (if stress test included)
+- Threshold pass/fail status
+**Rules:**
+- Use thresholds from the test plan (not arbitrary values)
+- Run against a clean environment (no other traffic)
+- Report actual numbers, not just pass/fail
+- If thresholds fail, this is a blocking issue
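To report actual numbers alongside pass/fail, the recorded durations can be summarized as below. This is an illustrative sketch only: `percentile` and `summarizeLoad` are hypothetical helpers using the nearest-rank method, the thresholds are copied from the k6 example above, and k6 computes its own percentiles, which may differ slightly.

```typescript
// Nearest-rank percentile over response times in milliseconds.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

interface LoadReport {
  p50: number;
  p95: number;
  p99: number;
  errorRate: number;
  passed: boolean;
}

// Thresholds mirror the k6 example: p(95)<500, p(99)<2000, error rate < 1%.
function summarizeLoad(durationsMs: number[], failedRequests: number): LoadReport {
  const p95 = percentile(durationsMs, 95);
  const p99 = percentile(durationsMs, 99);
  const errorRate = failedRequests / durationsMs.length;
  return {
    p50: percentile(durationsMs, 50),
    p95,
    p99,
    errorRate,
    passed: p95 < 500 && p99 < 2000 && errorRate < 0.01,
  };
}
```

In a real run the thresholds should come from the test plan rather than being hard-coded, as the rules above require.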
+### Step 8: Execute API Contract Tests
+```bash
+# Run contract tests
+npm run test:contract
+# or: pytest tests/contract/
+```
+**Verify each contract from the plan:**
+```typescript
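+// CONTRACT-001: POST /api/users request/response shape
+// Assumes supertest provides `request` and a JSON-schema matcher (e.g. jest-json-schema) provides `toMatchSchema`;
+// `app`, `validUserPayload`, and `userCreatedSchema` are project-specific.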
+test('POST /api/users matches API contract', async () => {
+  const response = await request(app)
+    .post('/api/users')
+    .send(validUserPayload)
+    .expect(201);
+  // Validate response shape matches api-contract.md exactly
+  expect(response.body).toMatchSchema(userCreatedSchema);
+});
+```
+**Record:**
+- Each contract: valid / violated
+- Schema mismatches with details
+- Missing fields, extra fields, wrong types
+**Rules:**
+- Validate exact shapes (not just "has some fields")
+- Test all status codes defined in the contract
+- Any contract violation is a blocking issue
+### Step 9: Generate Results Report
+Map every test result back to the plan and produce the report.
+**Codex Verify:** Before presenting the report, check the mode recorded at Step 1.5. If **Verify** was selected, dispatch Codex to review for tests that pass but test the wrong thing, misleading coverage, and mock-only blind spots. If **Takeover** was selected, skip this step (Codex already ran). If **Skip**, do nothing. Do NOT re-run the consent flow. See **Codex Integration** section below for full details.
+**Output to:** `.forge/work/{type}/{name}/test-results.md`
+Required sections:
+1. **Metadata** — feature, date, duration, link to test plan
+2. **Summary table** — each test type: Total / Passed / Failed / Skipped / Status
+3. **Coverage report** — per-file and overall, against thresholds
+4. **Traceability matrix** — each test ID mapped to requirement and result
+5. **Failed tests** — error, screenshot/video path, root requirement traced
+6. **Load test results** — p50/p95/p99, error rate, threshold status (if applicable)
+7. **Quality gate status** — each gate criterion, pass/fail
+8. **External service verification** — from Step 2 availability matrix: service, verified status, real tests run
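The traceability matrix (section 4 of the report) could be rendered along these lines. The column names, the `TestResult` shape, and the `traceabilityTable` helper are assumptions made for this sketch; the source only requires that each test ID maps to a requirement and a result.

```typescript
interface TestResult {
  id: string;          // e.g. UT-001
  requirement: string; // requirement ID taken from the test plan
  status: 'passed' | 'failed' | 'skipped';
}

// Renders one Markdown table row per test result.
function traceabilityTable(results: TestResult[]): string {
  const header = ['| Test ID | Requirement | Result |', '|---|---|---|'];
  const rows = results.map((r) => `| ${r.id} | ${r.requirement} | ${r.status} |`);
  return header.concat(rows).join('\n');
}
```

The output can be pasted directly into `test-results.md`, whose tables use the same pipe syntax.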
+## Enforcement Rules
+### No Skipping
+Every test in the plan MUST execute. If a test cannot execute:
+1. **Infrastructure missing** -- Set it up. Do not skip.
+2. **Test is flaky** -- Fix it. Do not skip.
+3. **Test takes too long** -- Optimize it. Do not skip.
+4. **External service confirmed available** -- E2E tests MUST use real service. Integration tests SHOULD use real service. Do not mock what is available.
+5. **External service not available** -- E2E tests that depend on the service are BLOCKED (not faked, not worked around with pre-existing data). Integration tests may use stubs. Flag the suite as INCOMPLETE.
+6. **E2E tests MUST NOT fall back to cached, pre-seeded, or pre-existing data** as a substitute for real service responses. If a real service is slow or times out, the test FAILS — it does not degrade to fake data.
+The ONLY acceptable skip is a test that the user explicitly requests to defer, with documented justification.
+### No Partial Execution
+Do not execute only unit tests and declare success. All test types from the plan must run:
+- Unit tests
+- Integration tests
+- E2E tests
+- Smoke tests
+- Load tests (if in plan)
+- Contract tests (if in plan)
+### Failure Handling
+| Failures | Action |
+|----------|--------|
+| 0 failures | PASS -- proceed to deployment gate |
+| 1-3 failures | FAIL -- report details, return to build-tdd to fix |
+| 4+ failures | FAIL -- report details, may need architecture review |
+| Infrastructure failure | BLOCKED -- fix infrastructure, re-run entire suite |
+| External service unavailable | INCOMPLETE -- can proceed to PR with flag visible, cannot deploy to production without resolution |
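The failure-handling table can be sketched as a classifier. The table does not say which row wins when several apply, so the precedence below (infrastructure first, then test failures, then service availability) is an assumption, as are the `RunSummary` and `classifyRun` names.

```typescript
type Outcome = 'PASS' | 'FAIL' | 'BLOCKED' | 'INCOMPLETE';

interface RunSummary {
  failures: number;
  infrastructureFailure: boolean;
  externalServiceUnavailable: boolean;
}

// Maps a run summary to the outcome and action from the failure-handling table.
function classifyRun(run: RunSummary): { outcome: Outcome; action: string } {
  if (run.infrastructureFailure) {
    return { outcome: 'BLOCKED', action: 'fix infrastructure, re-run entire suite' };
  }
  if (run.failures >= 4) {
    return { outcome: 'FAIL', action: 'report details, may need architecture review' };
  }
  if (run.failures >= 1) {
    return { outcome: 'FAIL', action: 'report details, return to build-tdd to fix' };
  }
  if (run.externalServiceUnavailable) {
    return { outcome: 'INCOMPLETE', action: 'can proceed to PR with flag visible, cannot deploy to production' };
  }
  return { outcome: 'PASS', action: 'proceed to deployment gate' };
}
```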
+## Quality Enforcement
+| Do NOT | DO |
+|---|---|
+| Run only unit tests | Execute ALL test types from the plan |
+| Ignore flaky tests | Fix them — flaky = bug |
+| Skip load tests | Run if in the plan |
+| Proceed with failures | Zero failures = quality gate |
+| Skip E2E screenshots/video | Capture at every key step |
+| Test against production data | Use isolated test environment |
+| Declare success without traceability | Map every result to the plan |
+## I/O Contract
+| Field | Value |
+|---|---|
+| **Requires** | Test plan (`.forge/work/{type}/{name}/test-plan.md`) + implementation code (from `build-tdd`) + architecture artifacts (`.forge/work/{type}/{name}/architecture/api-contract.md` for external service verification) |
+| **Produces** | `.forge/work/{type}/{name}/test-results.md` |
+| **Feeds into** | `deliver-deploy` (via quality gate -- zero failures required) |
+| **Updates manifest** | `artifacts.test-results: test-results.md`, `phases.quality.test-execution: { status: complete, gate-passed: true }` |
+## Codex Integration
+**Modes:** Verify or Takeover | **Protocol:** `protocols/codex.md`
+- **Verify:** Claude executes tests and reports, Codex reviews test quality.
+- **Takeover:** Codex executes tests and reports, Claude reviews results.
+**When:** After test execution, when reviewing results.
+**Context to pass:**
+- Path to `test-results.md` or test output
+- Path to `test-plan.md`
+- Path to test source files
+**What Codex reviews:**
+- Tests that pass but test the wrong thing (assertions on mocks, not real behavior)
+- Coverage numbers that are misleading (high line coverage, low branch coverage)
+- Integration boundaries tested only via mocks
+**Prompt focus:** "Review these test results and test source code. Identify tests that pass but don't verify real behavior — especially mock-based tests at integration boundaries. Flag misleading coverage metrics. Which passing tests would still pass if the feature were completely broken?"
+**Presentation:** Codex findings presented as "Test Quality Review" alongside pass/fail results.
+---
+## Integration
+**Called by:**
+- `/feature` command (test execution phase)
+- `/greenfield` command (test execution phase)
+**Pairs with:**
+- `quality-test-plan` (provides the plan to execute)
+- `build-tdd` (provides the tests and implementation)
+- `deliver-deploy` (test results are a quality gate for deployment)
+- `quality-code-review` (test failures may trigger re-review)