xtrm-tools 0.7.17 → 0.7.19
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.xtrm/config/hooks.json +2 -0
- package/.xtrm/config/instructions/agents-top.md +2 -1
- package/.xtrm/registry.json +429 -712
- package/.xtrm/skills/default/creating-service-skills/scripts/bootstrap.py +82 -156
- package/.xtrm/skills/default/creating-service-skills/scripts/scaffolder.py +73 -121
- package/.xtrm/skills/default/hook-development/references/patterns.md +1 -1
- package/.xtrm/skills/default/last30days/scripts/test-v1-vs-v2.sh +2 -2
- package/.xtrm/skills/default/planning/SKILL.md +75 -29
- package/.xtrm/skills/default/releasing/SKILL.md +163 -57
- package/.xtrm/skills/default/security-pipeline/SKILL.md +192 -0
- package/.xtrm/skills/default/security-pipeline/scripts/security-bootstrap.sh +294 -0
- package/.xtrm/skills/default/security-pipeline/templates/.githooks/pre-push.template +39 -0
- package/.xtrm/skills/default/security-pipeline/templates/.github/workflows/gitleaks.yml +33 -0
- package/.xtrm/skills/default/security-pipeline/templates/.github/workflows/osv-scanner.yml +33 -0
- package/.xtrm/skills/default/security-pipeline/templates/.github/workflows/semgrep.yml +41 -0
- package/.xtrm/skills/default/security-pipeline/templates/.gitleaks.toml +44 -0
- package/.xtrm/skills/default/security-pipeline/templates/.pre-commit-config.yaml +67 -0
- package/.xtrm/skills/default/security-pipeline/templates/.semgrepignore +46 -0
- package/.xtrm/skills/default/security-pipeline/templates/scripts/security-scan.sh +57 -0
- package/.xtrm/skills/default/security-pipeline/templates/scripts/semgrep-diff.sh +68 -0
- package/.xtrm/skills/default/session-close-report/SKILL.md +167 -6
- package/.xtrm/skills/default/sync-docs/SKILL.md +1 -1
- package/.xtrm/skills/default/update-xt/SKILL.md +270 -4
- package/.xtrm/skills/default/updating-service-skills/scripts/drift_detector.py +22 -0
- package/.xtrm/skills/default/using-script-specialists/SKILL.md +7 -5
- package/.xtrm/skills/default/using-specialists/SKILL.md +13 -12
- package/.xtrm/skills/default/using-specialists-auto/SKILL.md +137 -0
- package/.xtrm/skills/default/using-specialists-v2/SKILL.md +14 -21
- package/.xtrm/skills/default/using-specialists-v3/SKILL.md +533 -21
- package/.xtrm/skills/default/vaultctl/SKILL.md +2 -2
- package/CHANGELOG.md +87 -3
- package/cli/dist/index.cjs +12429 -3769
- package/cli/dist/index.cjs.map +1 -1
- package/cli/package.json +9 -3
- package/package.json +27 -7
- package/packages/pi-extensions/package.json +1 -1
- package/.xtrm/skills/default/planning/evals/evals.json +0 -19
- package/.xtrm/skills/default/quality-gates/evals/evals.json +0 -181
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/FINAL-EVAL-SUMMARY.md +0 -75
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/edge-case-auto-fix-verification/with_skill/outputs/response.md +0 -59
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/edge-case-mixed-language-project/with_skill/outputs/response.md +0 -60
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/eval-summary.md +0 -105
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/partial-install-python-only/with_skill/outputs/response.md +0 -93
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/python-refactor-request/with_skill/outputs/response.md +0 -104
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/quality-gate-error-fix/with_skill/outputs/response.md +0 -74
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/should-not-trigger-general-chat/with_skill/outputs/response.md +0 -18
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/should-not-trigger-math-question/with_skill/outputs/response.md +0 -18
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/should-not-trigger-unrelated-coding/with_skill/outputs/response.md +0 -56
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/tdd-guard-blocking-confusion/with_skill/outputs/response.md +0 -67
- package/.xtrm/skills/default/quality-gates/workspace/iteration-1/typescript-feature-with-tests/with_skill/outputs/response.md +0 -97
- package/.xtrm/skills/default/sync-docs/evals/evals.json +0 -89
- package/.xtrm/skills/default/test-planning/evals/evals.json +0 -23
- package/.xtrm/skills/default/using-specialists/SKILL.safe.md +0 -1082
- package/.xtrm/skills/default/using-specialists/SKILL.ultra.md +0 -1082
- package/.xtrm/skills/default/using-specialists/evals/evals.json +0 -68
- package/.xtrm/skills/default/using-specialists-v3/evals/evals.json +0 -89
- package/packages/pi-extensions/.serena/project.yml +0 -130
@@ -1,93 +0,0 @@
-# Eval 4: partial-install-python-only
-
-**Prompt:** I'm working on a Python-only project with pytest. What quality tools should I install?
-
-**Skill Applied:** using-quality-gates
-
----
-
-## Response (with skill active)
-
-For a **Python-only project**, you need the following quality tools:
-
-### Required Installation
-
-```bash
-# 1. TDD Guard CLI (global install)
-npm install -g tdd-guard
-
-# 2. TDD Guard pytest reporter (project install)
-pip install tdd-guard-pytest
-
-# 3. Python Quality Gate tools
-pip install ruff mypy
-```
-
-### Configure TDD Guard Reporter
-
-Add to your `pyproject.toml` or `conftest.py`:
-
-**Option A: pyproject.toml**
-```toml
-[tool.pytest.ini_options]
-tdd_guard_project_root = "/absolute/path/to/your/project"
-```
-
-**Option B: conftest.py**
-```python
-# conftest.py
-from tdd_guard_pytest import PytestReporter
-
-def pytest_configure(config):
-    config.option.tdd_guard_project_root = "/absolute/path/to/your/project"
-```
-
-### Configure Quality Gate
-
-Set environment variables (in `.bashrc`, `.zshrc`, or project `.env`):
-
-```bash
-export CLAUDE_HOOKS_RUFF_ENABLED=true
-export CLAUDE_HOOKS_MYPY_ENABLED=true
-export CLAUDE_HOOKS_AUTOFIX=true
-```
-
-### Create Configuration Files
-
-**ruff.toml** or **pyproject.toml**:
-```toml
-[tool.ruff]
-line-length = 88
-target-version = "py38"
-
-[tool.ruff.lint]
-select = ["E", "F", "W", "I", "N", "UP", "B", "C4"]
-```
-
-**mypy.ini**:
-```ini
-[mypy]
-python_version = 3.8
-warn_return_any = True
-warn_unused_configs = True
-disallow_untyped_defs = False
-ignore_missing_imports = True
-```
-
-### Your Python-Only Workflow
-
-```
-1. Write failing test (pytest)
-↓
-2. TDD Guard validates test exists
-↓
-3. Implement feature
-↓
-4. Ruff + Mypy validate quality (auto-fix applied)
-↓
-5. pytest confirms tests pass
-```
-
----
-
-**Note:** You don't need TypeScript tools (eslint, prettier, tdd-guard-jest, etc.) for a Python-only project. The Quality Gates skill handles partial installs gracefully.
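The removed response above enables tools through `CLAUDE_HOOKS_*` environment variables. As a rough illustration of how a hook script might read such flags, here is a minimal sketch. The variable names come from the removed file; the helper names (`is_enabled`, `enabled_tools`) and the rule for which strings count as "true" are assumptions made for this example, not the package's actual behavior.

```python
import os

# Flag names are taken from the removed eval response.
TOOL_FLAGS = {
    "ruff": "CLAUDE_HOOKS_RUFF_ENABLED",
    "mypy": "CLAUDE_HOOKS_MYPY_ENABLED",
}

# Assumption: these spellings count as "enabled"; the real hook may differ.
TRUTHY = {"1", "true", "yes", "on"}


def is_enabled(env, name):
    """Return True when the named variable holds a truthy string."""
    return env.get(name, "").strip().lower() in TRUTHY


def enabled_tools(env=None):
    """List the tools whose flag is switched on, in declaration order."""
    env = os.environ if env is None else env
    return [tool for tool, flag in TOOL_FLAGS.items() if is_enabled(env, flag)]
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.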
@@ -1,104 +0,0 @@
-# Eval 2: python-refactor-request
-
-**Prompt:** Refactor the database connection pooling in db/connection.py to use async/await. Current implementation is blocking and causing performance issues.
-
-**Skill Applied:** using-quality-gates
-
----
-
-## Response (with skill active)
-
-To refactor the database connection pooling to async/await, follow the **Quality Gates workflow** for Python:
-
-### Step 1: Write a Failing Test First
-
-TDD Guard requires a failing test before implementation. Create a test that verifies async behavior:
-
-**Create** `tests/test_connection_async.py`:
-
-```python
-import pytest
-import asyncio
-from db.connection import AsyncConnectionPool
-
-class TestAsyncConnectionPool:
-    @pytest.mark.asyncio
-    async def test_async_connection_acquire(self):
-        pool = AsyncConnectionPool(max_size=5)
-        async with pool.acquire() as conn:
-            assert conn is not None
-            result = await conn.execute("SELECT 1")
-            assert result == 1
-
-    @pytest.mark.asyncio
-    async def test_async_connection_pooling(self):
-        pool = AsyncConnectionPool(max_size=2)
-        tasks = [pool.acquire() for _ in range(5)]
-        # Should handle contention gracefully
-        async with asyncio.TaskGroup() as tg:
-            for task in tasks:
-                tg.create_task(task)
-```
-
-Run tests to confirm failure:
-```bash
-pytest tests/test_connection_async.py # Should fail - implementation doesn't exist
-```
-
-### Step 2: Implement Async Refactor
-
-Once tests fail, TDD Guard allows implementation:
-
-```python
-# db/connection.py
-import asyncio
-import asyncpg
-from contextlib import asynccontextmanager
-
-class AsyncConnectionPool:
-    def __init__(self, max_size: int = 10, dsn: str = None):
-        self.max_size = max_size
-        self.dsn = dsn or os.environ.get("DATABASE_URL")
-        self._pool = None
-
-    async def initialize(self):
-        self._pool = await asyncpg.create_pool(
-            self.dsn,
-            max_size=self.max_size,
-            command_timeout=60
-        )
-
-    @asynccontextmanager
-    async def acquire(self):
-        if not self._pool:
-            await self.initialize()
-        async with self._pool.acquire() as conn:
-            yield conn
-
-    async def close(self):
-        if self._pool:
-            await self._pool.close()
-```
-
-### Step 3: Python Quality Gate Validation
-
-After editing `db/connection.py`, the **Python Quality Gate** automatically runs:
-
-- **Ruff linting** - Checks style, best practices (auto-fixes enabled)
-- **Ruff formatting** - Black-compatible formatting (auto-fixes enabled)
-- **Mypy type checking** - Static type validation (manual fix required)
-
-If blocking errors (exit code 2), fix remaining issues. Type errors and complex lint violations need manual fixes.
-
-### Step 4: Verify Tests Pass
-
-```bash
-pytest tests/test_connection_async.py -v # All tests should pass
-```
-
----
-
-**Note:** Ensure dependencies are installed:
-- `pip install tdd-guard-pytest` (TDD Guard reporter)
-- `pip install ruff mypy` (Quality gate tools)
-- `pip install asyncpg pytest pytest-asyncio` (Async DB and testing)
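Two details of the removed contention test are worth noting for readers who reuse it: `create_task` expects a coroutine, while an `@asynccontextmanager` method returns a context manager, and `asyncio.TaskGroup` needs Python 3.11 or later. The following is a minimal, stdlib-only sketch of how a contention check might be written instead. `ToyPool`, `worker`, and `run_contention` are hypothetical names for this illustration; `ToyPool` is an in-memory stand-in and does not reproduce asyncpg behavior.

```python
import asyncio
from contextlib import asynccontextmanager


class ToyPool:
    """Hypothetical in-memory stand-in for a connection pool (not asyncpg)."""

    def __init__(self, max_size):
        self._slots = asyncio.Semaphore(max_size)  # caps concurrent holders
        self.in_use = 0
        self.peak = 0  # highest number of simultaneous holders observed

    @asynccontextmanager
    async def acquire(self):
        async with self._slots:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            try:
                yield object()  # placeholder for a real connection
            finally:
                self.in_use -= 1


async def worker(pool):
    """Hold a connection briefly so that other workers have to wait."""
    async with pool.acquire():
        await asyncio.sleep(0.01)


async def run_contention(max_size, workers):
    """Run more workers than slots and report the peak concurrency."""
    pool = ToyPool(max_size)
    await asyncio.gather(*(worker(pool) for _ in range(workers)))
    return pool.peak
```

Wrapping each acquisition in its own coroutine (`worker`) is what lets `gather` schedule the calls concurrently; the recorded peak then shows whether the pool respected its size limit.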
@@ -1,74 +0,0 @@
-# Eval 3: quality-gate-error-fix
-
-**Prompt:** I'm getting blocked by the quality gate with TypeScript errors. Here's the error: 'Type string is not assignable to type number'. How do I fix this?
-
-**Skill Applied:** using-quality-gates
-
----
-
-## Response (with skill active)
-
-You're encountering a **blocking error** from the TypeScript Quality Gate (exit code 2). Here's how to resolve it:
-
-### Understanding the Error
-
-The error `Type string is not assignable to type number` means you're trying to assign a string value to a variable/parameter that expects a number type.
-
-### Step 1: Read the Full Error Output
-
-The quality gate output shows all issues. Example:
-```
-[ERROR] TypeScript compilation failed:
-src/user.ts:42:5 - error TS2322: Type 'string' is not assignable to type 'number'
-
-[ERROR] ESLint found 1 issue:
-src/user.ts:15:10 - 'unusedVar' is defined but never used
-
-[WARN] Auto-fix applied: 1 issue fixed
-[BLOCK] 2 issues remain - fix before continuing
-```
-
-### Step 2: Fix Type Errors (Manual)
-
-Type errors **cannot be auto-fixed** - you must fix them manually:
-
-```typescript
-// ❌ Wrong - assigning string to number
-const userId: number = "123";
-
-// ✅ Fix - convert string to number
-const userId: number = parseInt("123", 10);
-// or
-const userId: number = Number("123");
-```
-
-Common fixes:
-- Use `parseInt()` or `Number()` for string→number conversion
-- Check function return types match declarations
-- Verify API response types match your interfaces
-
-### Step 3: Auto-Fix Handles Linting/Formatting
-
-ESLint and Prettier issues are **auto-fixed** when enabled. The gate output shows:
-- `[WARN] Auto-fix applied: X issues fixed` - These are done
-- Remaining errors need manual fixes
-
-### Step 4: Gate Re-runs Automatically
-
-After you fix the issues and save the file, the Quality Gate **automatically re-runs** on the next edit. No manual trigger needed.
-
-### Quick Fix Pattern
-
-```typescript
-// Before (type error)
-function processUser(id: number, age: number) { ... }
-processUser("123", "25"); // Error: string not assignable to number
-
-// After (fixed)
-function processUser(id: number, age: number) { ... }
-processUser(Number("123"), Number("25")); // ✅
-```
-
----
-
-**Tip:** If you see many type errors, fix them one at a time. The gate re-runs after each edit, giving you incremental feedback.
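The removed response asks the reader to go through the gate output issue by issue. As an illustration only, the sketch below pulls `path:line:column - message` entries out of text shaped like the example output shown in the hunk. The regular expression and the names `GateIssue`, `ISSUE_RE`, `parse_issues`, and `SAMPLE` are assumptions for this example; the real gate output format is not documented here and may differ.

```python
import re
from typing import NamedTuple


class GateIssue(NamedTuple):
    path: str
    line: int
    column: int
    message: str


# Assumed shape, modelled on the example output: "<path>:<line>:<col> - <message>"
ISSUE_RE = re.compile(
    r"^\s*(?P<path>[^\s:]+):(?P<line>\d+):(?P<col>\d+) - (?P<msg>.+)$"
)

SAMPLE = """\
[ERROR] TypeScript compilation failed:
src/user.ts:42:5 - error TS2322: Type 'string' is not assignable to type 'number'

[ERROR] ESLint found 1 issue:
src/user.ts:15:10 - 'unusedVar' is defined but never used

[WARN] Auto-fix applied: 1 issue fixed
[BLOCK] 2 issues remain - fix before continuing
"""


def parse_issues(output):
    """Collect located issues; banner lines such as [ERROR] are skipped."""
    issues = []
    for raw in output.splitlines():
        match = ISSUE_RE.match(raw)
        if match:
            issues.append(
                GateIssue(
                    match["path"],
                    int(match["line"]),
                    int(match["col"]),
                    match["msg"],
                )
            )
    return issues
```

Sorting the parsed issues by path and line gives a simple "fix one at a time" work list of the kind the tip in the removed file suggests.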
@@ -1,18 +0,0 @@
-# Eval 11: should-not-trigger-general-chat
-
-**Prompt:** What's the weather like today?
-
-**Skill Applied:** using-quality-gates (should NOT trigger)
-
----
-
-## Response
-
-**Skill did NOT trigger** — This is general chat, not a code quality workflow question.
-
-Standard Claude response:
-> "I don't have access to real-time weather data. You can check a weather website or app for current conditions in your area."
-
----
-
-**Result:** ✅ Correct — skill appropriately did not activate
@@ -1,18 +0,0 @@
-# Eval 13: should-not-trigger-math-question
-
-**Prompt:** What's the time complexity of binary search?
-
-**Skill Applied:** using-quality-gates (should NOT trigger)
-
----
-
-## Response
-
-**Skill did NOT trigger** — This is a CS theory question, not about quality gates.
-
-Standard Claude response:
-> "Binary search has O(log n) time complexity. It works by repeatedly dividing the search interval in half..."
-
----
-
-**Result:** ✅ Correct — skill appropriately did not activate
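The quoted answer states that binary search runs in O(log n) by halving the search interval. A small sketch that counts loop iterations makes the bound concrete; the function name and the step counter are additions for this illustration and are not part of the removed file.

```python
def binary_search(items, target):
    """Search a sorted list; return (index or -1, number of loop iterations)."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1   # discard the lower half
        else:
            hi = mid - 1   # discard the upper half
    return -1, steps
```

For a sorted list of 1024 items the loop runs at most 11 times, since each iteration at least halves the remaining interval.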
@@ -1,56 +0,0 @@
-# Eval 12: should-not-trigger-unrelated-coding (Re-run with Minimal Mode)
-
-**Prompt:** Write a Python script to scrape data from example.com and save it to CSV.
-
-**Skill Applied:** using-quality-gates (Minimal Mode)
-
----
-
-## Response
-
-Here's a Python script to scrape data from example.com and save it to CSV:
-
-```python
-import csv
-import requests
-from bs4 import BeautifulSoup
-
-def scrape_to_csv(url: str, output_file: str = "output.csv"):
-    """Scrape data from a webpage and save to CSV."""
-    response = requests.get(url)
-    response.raise_for_status()
-
-    soup = BeautifulSoup(response.text, 'html.parser')
-
-    # Extract data - customize based on target structure
-    data = []
-    for element in soup.find_all('p'): # Example: extract all paragraphs
-        text = element.get_text(strip=True)
-        if text:
-            data.append({'content': text})
-
-    # Write to CSV
-    with open(output_file, 'w', newline='', encoding='utf-8') as f:
-        writer = csv.DictWriter(f, fieldnames=['content'])
-        writer.writeheader()
-        writer.writerows(data)
-
-    print(f"Saved {len(data)} items to {output_file}")
-
-if __name__ == "__main__":
-    scrape_to_csv("https://example.com")
-```
-
-**Usage:**
-```bash
-pip install requests beautifulsoup4
-python scrape.py
-```
-
----
-
-**Note:** Consider adding tests for this script. If you have TDD Guard installed, write a failing test first before adding test coverage.
-
----
-
-**Result:** ✅ Minimal mode applied — Task completed directly, brief TDD mention at end, no full workflow explanation
@@ -1,67 +0,0 @@
-# Eval 5: tdd-guard-blocking-confusion
-
-**Prompt:** Why am I getting 'No failing test found' when I try to edit src/service.ts? I just want to add a logging statement.
-
-**Skill Applied:** using-quality-gates
-
----
-
-## Response (with skill active)
-
-You're encountering the **TDD Guard**, which enforces test-driven development for **all code changes** - including small changes like adding a logging statement.
-
-### Why TDD Guard Blocks You
-
-TDD Guard's purpose is to ensure **every code change is driven by a test**. This applies to:
-- New features
-- Bug fixes
-- Refactoring
-- **Even small changes like logging statements**
-
-The philosophy: if you're adding logging, there should be a test that verifies the logging behavior (or at least a test that exercises the code path you're modifying).
-
-### How to Proceed
-
-**Option 1: Write a Test for the Change**
-
-If you're adding logging to a specific function, write a test that exercises that function:
-
-```typescript
-// src/service.test.ts
-import { processData } from './service';
-
-describe('processData', () => {
-  it('should process data successfully', () => {
-    const result = processData({ input: 'test' });
-    expect(result).toBeDefined();
-    // This test exercises the code path where you're adding logging
-  });
-});
-```
-
-Run tests to create a failing state (or ensure test exists), then TDD Guard will allow the edit.
-
-**Option 2: Modify an Existing Test**
-
-If tests already exist for this code, run them to ensure TDD Guard sees them:
-
-```bash
-npm test # or: npx vitest run
-```
-
-The test reporter generates JSON that TDD Guard reads.
-
-### Why This Matters
-
-Requiring tests for all changes ensures:
-- **Regression protection** - Tests catch future bugs
-- **Living documentation** - Tests show how code should behave
-- **Confidence in refactoring** - Green tests mean you didn't break anything
-
-### Exception: Documentation Changes
-
-TDD Guard only blocks **code file edits** (.ts, .js, .py, etc.). Documentation files (.md, .txt) can be edited without tests.
-
----
-
-**Bottom line:** Write or run a test that exercises the code you're modifying, then TDD Guard will allow the logging change.
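The removed response suggests that a logging change should be backed by a test that verifies the logging behavior. One way this could look in Python, using only the standard library, is sketched below. `process_data` and the `"service"` logger name are hypothetical stand-ins for the code under test; nothing here is taken from the package itself.

```python
import logging
import unittest

logger = logging.getLogger("service")


def process_data(payload):
    """Hypothetical function under test: logs once, then returns a copy."""
    logger.info("processing %d field(s)", len(payload))
    return dict(payload)


class ProcessDataTest(unittest.TestCase):
    def test_logs_and_returns_result(self):
        # assertLogs fails the test if nothing is logged at INFO or above.
        with self.assertLogs("service", level="INFO") as captured:
            result = process_data({"input": "test"})
        self.assertEqual(result, {"input": "test"})
        self.assertIn("processing 1 field(s)", captured.output[0])
```

Written before the logging call exists, this test fails on the `assertLogs` check, which is the failing state a test-first workflow asks for.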
@@ -1,97 +0,0 @@
-# Eval 1: typescript-feature-with-tests
-
-**Prompt:** Add a new authentication middleware to src/auth.ts that validates JWT tokens. The middleware should check the Authorization header, verify the token, and attach the user to the request object.
-
-**Skill Applied:** using-quality-gates
-
----
-
-## Response (with skill active)
-
-To add the authentication middleware, we need to follow the **TDD workflow** enforced by Quality Gates:
-
-### Step 1: Write a Failing Test First
-
-Before implementing the middleware, you need to write a test that will fail without the implementation. The TDD Guard will block implementation until a failing test exists.
-
-**Create** `src/auth.test.ts` or add to existing test file:
-
-```typescript
-import { authMiddleware } from './auth';
-
-describe('authMiddleware', () => {
-  it('should reject requests without Authorization header', async () => {
-    const req = { headers: {} } as Request;
-    const next = vi.fn();
-
-    await authMiddleware(req, {} as Response, next);
-
-    expect(next).not.toHaveBeenCalled();
-  });
-
-  it('should attach user to request with valid JWT', async () => {
-    const token = generateValidToken(); // your test helper
-    const req = { headers: { authorization: `Bearer ${token}` } } as Request;
-    const next = vi.fn();
-
-    await authMiddleware(req, {} as Response, next);
-
-    expect(next).toHaveBeenCalled();
-    expect(req.user).toBeDefined();
-  });
-});
-```
-
-Run tests to confirm they fail:
-```bash
-npm test # or: npx vitest run
-```
-
-### Step 2: Implement the Middleware
-
-Once tests are failing, TDD Guard will allow implementation:
-
-```typescript
-// src/auth.ts
-import jwt from 'jsonwebtoken';
-
-export async function authMiddleware(req: Request, res: Response, next: NextFunction) {
-  const authHeader = req.headers.authorization;
-
-  if (!authHeader?.startsWith('Bearer ')) {
-    res.status(401).json({ error: 'No token provided' });
-    return;
-  }
-
-  const token = authHeader.substring(7);
-
-  try {
-    const decoded = jwt.verify(token, process.env.JWT_SECRET!);
-    (req as any).user = decoded;
-    next();
-  } catch (err) {
-    res.status(401).json({ error: 'Invalid token' });
-  }
-}
-```
-
-### Step 3: Quality Gate Validation
-
-After editing `src/auth.ts`, the **TypeScript Quality Gate** will automatically run:
-- **TypeScript compilation** - Checks types
-- **ESLint** - Validates code style (auto-fixes if enabled)
-- **Prettier** - Formats code (auto-fixes if enabled)
-
-If any issues are found, you'll see error output. Fix remaining issues manually (type errors can't be auto-fixed).
-
-### Step 4: Verify Tests Pass
-
-```bash
-npm test # All tests should now pass
-```
-
----
-
-**Note:** Ensure you have the required dependencies installed:
-- `npm install --save-dev tdd-guard-vitest` (for Vitest test reporter)
-- `npm install --save-dev typescript eslint prettier` (for quality gate)
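The removed middleware checks that the Authorization header starts with `Bearer ` and then strips that prefix before verification. The same header handling can be sketched in Python as below. `extract_bearer_token` is a hypothetical helper written for this illustration; it covers only the prefix check and does not verify a JWT.

```python
BEARER_PREFIX = "Bearer "


def extract_bearer_token(headers):
    """Return the token from an 'authorization' header, or None if absent.

    Assumes header names are already lower-cased, as in the removed test
    fixture; signature verification is intentionally out of scope here.
    """
    value = headers.get("authorization", "")
    if not value.startswith(BEARER_PREFIX):
        return None
    token = value[len(BEARER_PREFIX):].strip()
    return token or None
```

Returning `None` for both a missing header and an empty token lets the caller answer with a single 401 branch, mirroring the early return in the removed TypeScript.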
@@ -1,89 +0,0 @@
-{
-  "skill_name": "sync-docs",
-  "evals": [
-    {
-      "id": 1,
-      "prompt": "I just closed a bunch of bd issues this sprint and merged 3 PRs. The README.md is getting long — can you sync the docs and make sure everything's in order? Use sync-docs.",
-      "expected_output": "Runs context_gatherer.py and doc_structure_analyzer.py, reports what was found (closed issues, PRs, any drift), identifies README extraction candidates or MISSING docs/ files, and either fixes them or gives a clear plan with next steps.",
-      "files": [],
-      "assertions": [
-        {
-          "id": "ran-context-gatherer",
-          "description": "Ran context_gatherer.py and reported bd closed issues or merged PRs from the output",
-          "check": "result.md mentions context_gatherer or bd closed issues or merged PRs with specific data"
-        },
-        {
-          "id": "ran-structure-analyzer",
-          "description": "Ran doc_structure_analyzer.py and used its output to identify doc issues",
-          "check": "result.md references MISSING, STALE, EXTRACTABLE, or BLOATED status from the analyzer"
-        },
-        {
-          "id": "concrete-action",
-          "description": "Produced at least one concrete recommendation or action (not just a vague summary)",
-          "check": "result.md names a specific file (e.g. docs/hooks.md) or section with a specific next step"
-        },
-        {
-          "id": "used-skill-scripts",
-          "description": "Used the skill scripts rather than just reading files manually",
-          "check": "result.md shows script execution output, not just manual file reading"
-        }
-      ]
-    },
-    {
-      "id": 2,
-      "prompt": "Run sync-docs --fix on this project and remember what you did with bd.",
-      "expected_output": "Runs doc_structure_analyzer.py --fix --bd-remember, creates scaffold files for any missing docs/ subsystems, persists a bd memory with the summary, then validates the created files with validate_doc.py.",
-      "files": [],
-      "assertions": [
-        {
-          "id": "ran-fix-flag",
-          "description": "Ran doc_structure_analyzer.py with --fix flag",
-          "check": "result.md shows the --fix command was executed"
-        },
-        {
-          "id": "ran-bd-remember",
-          "description": "Ran with --bd-remember or manually ran bd remember with a summary",
-          "check": "result.md shows bd remember was called and reports the memory key"
-        },
-        {
-          "id": "scaffold-created",
-          "description": "At least one scaffold file was created in docs/",
-          "check": "result.md lists a docs/*.md file created, OR reports no gaps found (valid outcome)"
-        },
-        {
-          "id": "validated-schema",
-          "description": "Ran validate_doc.py on created files to confirm schema",
-          "check": "result.md shows validate_doc.py was run and reports pass/fail for created files"
-        }
-      ]
-    },
-    {
-      "id": 3,
-      "prompt": "Do a doc audit. I think the README has sections that should be in docs/ but I'm not sure which ones.",
-      "expected_output": "Runs the full 5-phase sync-docs workflow: gathers context, runs drift detection, runs doc_structure_analyzer.py, and identifies EXTRACTABLE/BLOATED sections with specific suggestions for what goes in which docs/ file before recommending or making changes.",
-      "files": [],
-      "assertions": [
-        {
-          "id": "ran-analyzer",
-          "description": "Ran doc_structure_analyzer.py and referenced its structured output",
-          "check": "result.md cites the analyzer output (EXTRACTABLE, BLOATED, line count, or specific section names from the report)"
-        },
-        {
-          "id": "named-specific-sections",
-          "description": "Named specific README sections with their suggested docs/ destination",
-          "check": "result.md lists at least 2 specific sections (e.g. '## Policy System → docs/policies.md') not just generic advice"
-        },
-        {
-          "id": "actionable-report",
-          "description": "Report is actionable — tells user exactly what to do next, not just observations",
-          "check": "result.md includes a prioritized list or clear next steps, not just 'the README could be shorter'"
-        },
-        {
-          "id": "no-edits-made",
-          "description": "Did not edit or create any files (audit only)",
-          "check": "result.md does not claim to have modified README.md or created docs/ files"
-        }
-      ]
-    }
-  ]
-}
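The removed `evals.json` follows a regular shape: a `skill_name`, then a list of evals, each with `id`, `prompt`, `expected_output`, `files`, and optional `assertions` carrying `id`, `description`, and `check`. A structural check over that shape might look like the sketch below. The field names are read off the removed file; treating them as required, and the function name `validate_evals`, are assumptions for this example rather than a documented schema.

```python
# Field names observed in the removed file; "required" is an assumption.
REQUIRED_EVAL_KEYS = {"id", "prompt", "expected_output", "files"}
REQUIRED_ASSERTION_KEYS = {"id", "description", "check"}


def validate_evals(doc):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not isinstance(doc.get("skill_name"), str):
        problems.append("skill_name must be a string")
    for i, entry in enumerate(doc.get("evals", [])):
        missing = REQUIRED_EVAL_KEYS - entry.keys()
        if missing:
            problems.append(f"evals[{i}] missing {sorted(missing)}")
        # Assertions are optional: some eval files list none at all.
        for j, assertion in enumerate(entry.get("assertions", [])):
            missing = REQUIRED_ASSERTION_KEYS - assertion.keys()
            if missing:
                problems.append(
                    f"evals[{i}].assertions[{j}] missing {sorted(missing)}"
                )
    return problems
```

Loading a file with `json.load` and passing the result to `validate_evals` would be the typical use; collecting problems instead of raising on the first one gives a full report in a single pass.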
@@ -1,23 +0,0 @@
-{
-  "skill_name": "test-planning",
-  "evals": [
-    {
-      "id": 1,
-      "prompt": "I've got an epic for a new notification service (notif-3a). I just created 8 child issues: .1 is the Postgres schema migration, .2 is the async message consumer (reads from RabbitMQ), .3 is the template renderer (Jinja2, pure python), .4 is the delivery client (calls Twilio/SendGrid APIs), .5 is the REST API for managing preferences, .6 is the CLI tool for ops to send test notifications, .7 is the retry/dead-letter handler, .8 is the rate limiter. Break down what testing each of these needs and create the bd issues.",
-      "expected_output": "Should detect layers: .1 (boundary/DB), .2 (boundary/MQ), .3 (core/pure), .4 (boundary/external API), .5 (boundary/API), .6 (shell/CLI), .7 (core/state machine), .8 (core/algorithm). Should batch into ~3 test issues: core tests (.3, .7, .8), boundary/contract tests (.1, .2, .4, .5), shell integration (.6). Should use property-based for rate limiter, contract tests for external APIs, characterization or spec-first for DB schema.",
-      "files": []
-    },
-    {
-      "id": 2,
-      "prompt": "I just finished implementing the data ingestion pipeline (issue data-pipe-9f.4) — it reads CSVs from S3, validates schemas, transforms column types, and writes to Postgres. The parent epic is data-pipe-9f. Can you close it for me? bd close data-pipe-9f.4 --reason 'pipeline implemented and deployed'",
-      "expected_output": "Should trigger closure gate behavior: check bd children data-pipe-9f for existing test issues. Since none exist, should create a test issue covering all layers: unit tests for schema validation and column transforms (core), contract tests for S3 reads and Postgres writes (boundary), integration test for end-to-end pipeline (shell). Should NOT just close the issue without checking for test coverage.",
-      "files": []
-    },
-    {
-      "id": 3,
-      "prompt": "We have an epic tracker-7b with 5 implementation issues done and 1 test issue (tracker-7b.6) that was created during planning. But during implementation of .3 (the websocket price feed handler), we discovered the feed sometimes sends malformed JSON that the parser needs to handle gracefully, and .5 (the position calculator) ended up also doing margin calculations which weren't in the original plan. Can you review tracker-7b.6 and update it?",
-      "expected_output": "Should read tracker-7b.6, identify drift: (1) malformed JSON handling in websocket parser is a new edge case — add property-based tests for parser robustness, (2) margin calculations in position calculator are new core logic — add unit tests. Should update the existing test issue via bd update or bd comments, not create a new one. Should note that the parser needs characterization tests if there's existing behavior to preserve.",
-      "files": []
-    }
-  ]
-}
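The first eval in the removed file expects eight child issues to be classified by layer and then batched into roughly three test issues. The grouping step can be illustrated with a few lines of Python. The layer assignments below are copied from the eval's `expected_output`; the names `ISSUE_LAYERS` and `batch_by_layer` are hypothetical, and how the skill itself detects layers is not shown in this diff.

```python
from collections import defaultdict

# Layer per child issue, as listed in the eval's expected output.
ISSUE_LAYERS = {
    ".1": "boundary",  # Postgres schema migration
    ".2": "boundary",  # message consumer
    ".3": "core",      # template renderer
    ".4": "boundary",  # delivery client
    ".5": "boundary",  # REST API
    ".6": "shell",     # CLI tool
    ".7": "core",      # retry/dead-letter handler
    ".8": "core",      # rate limiter
}


def batch_by_layer(issue_layers):
    """Group issue suffixes by layer so each layer maps to one test issue."""
    batches = defaultdict(list)
    for issue, layer in issue_layers.items():
        batches[layer].append(issue)
    return {layer: sorted(issues) for layer, issues in batches.items()}
```

Applied to `ISSUE_LAYERS`, this yields three batches (core, boundary, shell), matching the "~3 test issues" the eval expects.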