npm - @kodrunhq/opencode-autopilot - Versions diffs - 1.3.0 → 1.5.0 - Mend

@kodrunhq/opencode-autopilot 1.3.0 → 1.5.0


Files changed (50)

package/assets/commands/brainstorm.md +7 -0
package/assets/commands/stocktake.md +7 -0
package/assets/commands/tdd.md +7 -0
package/assets/commands/update-docs.md +7 -0
package/assets/commands/write-plan.md +7 -0
package/assets/skills/brainstorming/SKILL.md +295 -0
package/assets/skills/code-review/SKILL.md +241 -0
package/assets/skills/e2e-testing/SKILL.md +266 -0
package/assets/skills/git-worktrees/SKILL.md +296 -0
package/assets/skills/go-patterns/SKILL.md +240 -0
package/assets/skills/plan-executing/SKILL.md +258 -0
package/assets/skills/plan-writing/SKILL.md +278 -0
package/assets/skills/python-patterns/SKILL.md +255 -0
package/assets/skills/rust-patterns/SKILL.md +293 -0
package/assets/skills/strategic-compaction/SKILL.md +217 -0
package/assets/skills/systematic-debugging/SKILL.md +299 -0
package/assets/skills/tdd-workflow/SKILL.md +311 -0
package/assets/skills/typescript-patterns/SKILL.md +278 -0
package/assets/skills/verification/SKILL.md +240 -0
package/package.json +1 -1
package/src/index.ts +72 -1
package/src/observability/context-monitor.ts +102 -0
package/src/observability/event-emitter.ts +136 -0
package/src/observability/event-handlers.ts +322 -0
package/src/observability/event-store.ts +226 -0
package/src/observability/index.ts +53 -0
package/src/observability/log-reader.ts +152 -0
package/src/observability/log-writer.ts +93 -0
package/src/observability/mock/mock-provider.ts +72 -0
package/src/observability/mock/types.ts +31 -0
package/src/observability/retention.ts +57 -0
package/src/observability/schemas.ts +83 -0
package/src/observability/session-logger.ts +63 -0
package/src/observability/summary-generator.ts +209 -0
package/src/observability/token-tracker.ts +97 -0
package/src/observability/types.ts +24 -0
package/src/orchestrator/skill-injection.ts +38 -0
package/src/review/sanitize.ts +1 -1
package/src/skills/adaptive-injector.ts +122 -0
package/src/skills/dependency-resolver.ts +88 -0
package/src/skills/linter.ts +113 -0
package/src/skills/loader.ts +88 -0
package/src/templates/skill-template.ts +4 -0
package/src/tools/create-skill.ts +12 -0
package/src/tools/logs.ts +178 -0
package/src/tools/mock-fallback.ts +100 -0
package/src/tools/pipeline-report.ts +148 -0
package/src/tools/session-stats.ts +185 -0
package/src/tools/stocktake.ts +170 -0
package/src/tools/update-docs.ts +116 -0

package/assets/skills/plan-writing/SKILL.md ADDED

@@ -0,0 +1,278 @@
+---
+name: plan-writing
+description: Methodology for decomposing features into bite-sized implementation tasks with file paths, dependencies, and verification criteria
+stacks: []
+requires: []
+---
+# Plan Writing
+A systematic methodology for breaking down features, refactors, and bug fixes into bite-sized implementation tasks. Each task has exact file paths, clear actions, verification commands, and dependency ordering. Plans are the bridge between "what we want" and "what we build."
+## When to Use
+- **New feature implementation** — any feature touching more than 3 files needs a plan
+- **Refactoring existing code** — without a plan, refactors sprawl and break things
+- **Multi-step bug fixes** — when the fix spans multiple files or modules
+- **Any work that will take more than 60 minutes** — break it into trackable tasks
+- **Work that others need to review** — a plan makes the approach reviewable before code is written
+- **Work you might not finish in one session** — a plan lets you (or someone else) resume cleanly
+A plan is not overhead; it is the work. Writing the plan forces you to think through the approach, identify dependencies, and surface problems before you write any code. As a rule of thumb, the time spent planning is recovered several times over during implementation.
+## The Plan Writing Process
+### Step 1: Define the Goal
+State what must be TRUE when this work is complete. Goals are outcome-shaped, not task-shaped.
+**Good goals:**
+- "Users can log in with email and password, receiving a JWT on success and a clear error on failure"
+- "The review engine filters agents by detected project stack, loading only relevant agents"
+- "All API endpoints validate input with Zod schemas and return structured error responses"
+**Bad goals:**
+- "Build the auth system" (too vague — what does "build" mean?)
+- "Refactor the code" (refactor what, to achieve what outcome?)
+- "Fix the bug" (which bug? what is the expected behavior?)
+**Process:**
+1. Write the goal as a single sentence starting with a noun ("Users can...", "The system...", "All endpoints...")
+2. Include the observable behavior (what a user or developer would see)
+3. Include the key constraint or quality attribute (performance, security, correctness)
+4. If you cannot state the goal in one sentence, you have multiple goals — write multiple plans
+### Step 2: List Required Artifacts
+For each goal, list the concrete files that must exist or be modified. Use exact file paths.
+**Process:**
+1. List every source file that must be created or modified
+2. List every test file that must be created or modified
+3. List every configuration file affected (schemas, migrations, config)
+4. List every type/interface file needed
+5. Use exact paths relative to the project root: `src/auth/login.ts`, not "the login module"
+**Example:**
+```
+Goal: Users can log in with email and password
+Artifacts:
+- src/auth/login.ts          (new — login endpoint handler)
+- src/auth/login.test.ts     (new — tests for login)
+- src/auth/token.ts          (new — JWT creation and verification)
+- src/auth/token.test.ts     (new — tests for token utilities)
+- src/types/auth.ts          (new — LoginRequest, LoginResponse types)
+- src/middleware/auth.ts      (modify — add JWT verification middleware)
+- src/index.ts               (modify — register login route)
+```
+**Why file paths matter:** Vague artifact descriptions ("create the auth module") leave too much ambiguity. Exact file paths make the scope visible, reviewable, and trackable. If you cannot name the file, you do not understand the implementation well enough to plan it.
+### Step 3: Map Dependencies
+For each artifact, identify what must exist before it can be built.
+**Process:**
+1. For each file, ask: "What does this file import or depend on?"
+2. Draw arrows from dependencies to dependents
+3. Files with no dependencies are starting points
+4. Files that everything depends on are critical path items
+**Example:**
+```
+src/types/auth.ts           → depends on: nothing (pure types)
+src/auth/token.ts           → depends on: src/types/auth.ts
+src/auth/login.ts           → depends on: src/types/auth.ts, src/auth/token.ts
+src/middleware/auth.ts       → depends on: src/auth/token.ts
+src/auth/token.test.ts      → depends on: src/auth/token.ts
+src/auth/login.test.ts      → depends on: src/auth/login.ts
+src/index.ts                → depends on: src/auth/login.ts, src/middleware/auth.ts
+```
+**Dependency rules:**
+- Types and interfaces have no dependencies (they go first)
+- Utility functions depend on types but not on business logic
+- Business logic depends on types and utilities
+- Tests depend on the code they test
+- Wiring/registration depends on everything it wires together
+### Step 4: Group into Tasks
+Each task is a unit of work that can be completed, verified, and committed independently.
+**Task sizing rules:**
+- **1-3 files per task** — enough to make progress, small enough to verify
+- **15-60 minutes of work** — less than 15 means combine with another task, more than 60 means split
+- **Single concern** — one task should not mix unrelated changes
+- **Independently verifiable** — each task has a way to prove it works
+**Each task must have:**
+1. **Name** — action-oriented verb phrase ("Create auth types and token utilities")
+2. **Files** — exact file paths created or modified
+3. **Action** — specific instructions for what to implement
+4. **Verification** — command or check that proves the task is done
+5. **Done criteria** — measurable statement of completeness
+**Example task:**
+```
+Task 1: Create auth types and token utilities
+Files: src/types/auth.ts, src/auth/token.ts, src/auth/token.test.ts
+Action: Define LoginRequest (email: string, password: string) and
+        LoginResponse (token: string, expiresAt: number) types.
+        Implement createToken(userId) and verifyToken(token) using jose.
+        Write tests for both functions including expired token and invalid token cases.
+Verification: bun test src/auth/token.test.ts
+Done: Token creation and verification work with test coverage for happy path and error cases.
+```
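The five required fields and the sizing rules can also be checked mechanically. The following is a minimal Python sketch, not part of the package: `PlanTask` and `sizing_problems` are hypothetical names, and the `estimate_minutes` field is an assumption, since the skill only states the 15-60 minute range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanTask:
    """One plan task: the five required fields plus a time estimate."""

    name: str
    files: tuple[str, ...]
    action: str
    verification: str
    done: str
    estimate_minutes: int  # assumption: the planner records an estimate


def sizing_problems(task: PlanTask) -> list[str]:
    """Return sizing-rule violations; an empty list means right-sized."""
    problems: list[str] = []
    if not 1 <= len(task.files) <= 3:
        problems.append(f"{len(task.files)} files (expected 1-3)")
    if task.estimate_minutes < 15:
        problems.append("under 15 minutes: combine with a related task")
    elif task.estimate_minutes > 60:
        problems.append("over 60 minutes: split the task")
    if not task.verification.strip():
        problems.append("missing verification command")
    return problems


task = PlanTask(
    name="Create auth types and token utilities",
    files=("src/types/auth.ts", "src/auth/token.ts", "src/auth/token.test.ts"),
    action="Define the auth types; implement createToken and verifyToken",
    verification="bun test src/auth/token.test.ts",
    done="Token creation and verification pass happy-path and error-case tests",
    estimate_minutes=30,
)
print(sizing_problems(task))  # prints []
```

A check like this could run before a plan is reviewed, so that oversized or unverifiable tasks are caught while they are still cheap to split.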
+### Step 5: Assign Waves
+Group tasks into dependency waves for execution ordering.
+**Process:**
+1. **Wave 1** — tasks with no dependencies on other tasks (can run in parallel)
+2. **Wave 2** — tasks that depend only on Wave 1 tasks
+3. **Wave 3** — tasks that depend on Wave 1 or Wave 2 tasks
+4. Continue until all tasks are assigned
+**Principles:**
+- More waves of smaller tasks is better than fewer waves of larger tasks
+- Tasks in the same wave can theoretically run in parallel
+- Each wave should leave the codebase in a working state
+- The final wave typically handles wiring, integration, and end-to-end verification
+**Example:**
+```
+Wave 1 (no dependencies):
+  Task 1: Create auth types and token utilities
+  Task 2: Create password hashing utilities
+Wave 2 (depends on Wave 1):
+  Task 3: Create login endpoint handler
+  Task 4: Create auth middleware
+Wave 3 (depends on Wave 2):
+  Task 5: Wire login route and middleware into app
+  Task 6: Add end-to-end login test
+```
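Wave assignment is a layered topological sort, so it can be computed from the dependency map. This is a hedged sketch using Python's standard `graphlib` module; `assign_waves` is a hypothetical helper, and the dependency edges below are assumptions chosen to reproduce the grouping in the example above.

```python
from graphlib import TopologicalSorter

# Task -> tasks it depends on (assumed edges for the login example).
dependencies: dict[str, set[str]] = {
    "Task 1": set(),
    "Task 2": set(),
    "Task 3": {"Task 1", "Task 2"},
    "Task 4": {"Task 1"},
    "Task 5": {"Task 3", "Task 4"},
    "Task 6": {"Task 3", "Task 4"},
}


def assign_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: list[list[str]] = []
    while sorter.is_active():
        # Every task whose dependencies have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(assign_waves(dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Each call to `get_ready()` returns exactly the tasks that could run in parallel at that point, which matches the definition of a wave.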
+### Step 6: Add Verification
+Every task needs a verification command. The plan as a whole needs an end-to-end verification step.
+**Per-task verification:**
+- A test command: `bun test src/auth/token.test.ts`
+- A build check: `bunx tsc --noEmit`
+- A lint check: `bun run lint`
+- A runtime check: "Start the server and POST to /login with valid credentials"
+**Plan-level verification:**
+- Run the full test suite: `bun test`
+- Run the linter: `bun run lint`
+- Verify the goal: "A user can log in with email/password and receive a JWT"
+- Check for regressions: "All previously passing tests still pass"
+## Task Sizing Guide
+### Too Small (Less Than 15 Minutes)
+**Symptoms:** "Create the User type" (one file, one type, 5 minutes)
+**Fix:** Combine with a related task. Types + the first function that uses them is a natural grouping.
+### Right Size (15-60 Minutes)
+**Symptoms:** Touches 1-3 files. Single concern. Clear done criteria. You can explain the task in one sentence.
+**Examples:**
+- "Create auth types and token utilities with tests" (3 files, 30 min)
+- "Add input validation to all API endpoints" (3 files, 45 min)
+- "Implement the review agent selection logic with stack gating" (2 files, 60 min)
+### Too Large (More Than 60 Minutes)
+**Symptoms:** Touches 5+ files. Multiple concerns mixed together. Done criteria are vague. You need sub-steps to explain it.
+**Fix:** Split by one of these dimensions:
+- **By file:** Types in one task, implementation in another, tests in a third
+- **By concern:** Validation in one task, business logic in another
+- **By layer:** Data access first, business logic second, wiring third
+- **By feature slice:** User creation first, user login second (vertical slices over horizontal layers)
+## Anti-Pattern Catalog
+### Anti-Pattern: Vague Tasks
+**What goes wrong:** "Set up the database" — what tables? What columns? What constraints? What migrations? The implementer has to make all the decisions that should have been made during planning.
+**Instead:** "Add User and Project models to schema.prisma with UUID primary keys, email unique constraint on User, and a one-to-many relation from User to Project."
+### Anti-Pattern: No File Paths
+**What goes wrong:** "Create the auth module" — which files? What directory structure? What naming convention? The implementer makes different choices than the planner intended.
+**Instead:** "Create `src/auth/login.ts` with a POST handler accepting `{ email: string, password: string }` and returning `{ token: string }`."
+### Anti-Pattern: Horizontal Layers
+**What goes wrong:** "Create all models, then all APIs, then all UIs." This means nothing works end-to-end until the last layer is done. Integration issues are discovered late.
+**Instead:** Vertical slices — "User feature (model + API + test), then Product feature (model + API + test)." Each slice delivers a working feature.
+### Anti-Pattern: Missing Verification
+**What goes wrong:** Tasks without a way to prove they are done. The implementer finishes the code and says "looks good" — but nothing was verified.
+**Instead:** Every task has a verification command. If you cannot write a verification step, the task is not well-defined enough.
+### Anti-Pattern: No Dependencies Mapped
+**What goes wrong:** The implementer starts Task 3 and discovers it depends on something from Task 5. They either hack around it or rearrange on the fly, losing time and introducing bugs.
+**Instead:** Map dependencies explicitly in Step 3. If Task 3 depends on Task 5, reorder them.
+### Anti-Pattern: Plan as Documentation
+**What goes wrong:** The plan is written after the code, as documentation of what was built. This defeats the purpose — the plan should guide the implementation, not describe it.
+**Instead:** Write the plan before writing any code. Review the plan (are the tasks right-sized? dependencies correct? verification clear?) before implementing.
+## Integration with Our Tools
+- **`oc_orchestrate`** — Execute the plan automatically. The orchestrator reads the plan and dispatches tasks to implementation agents
+- **`oc_plan`** — Track task completion status as implementation progresses
+- **plan-executing skill** — Use the companion skill for the execution methodology (how to work through the plan task by task)
+- **`oc_review`** — After writing the plan, review it for completeness before implementation begins
+## Failure Modes
+### Plan Too Large
+**Symptom:** More than 5-6 tasks in a single plan, or estimated total time exceeds 4 hours.
+**Fix:** Split into multiple plans of 2-4 tasks each. Each plan should deliver a working increment. Plan A provides the foundation, Plan B builds on it.
+### Circular Dependencies
+**Symptom:** Task A depends on Task B, which depends on Task A. The dependency graph has a cycle.
+**Fix:** The cycle means the tasks are not properly separated. Extract the shared dependency into its own task (usually types or interfaces). Both Task A and Task B depend on the new task instead of each other.
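A cycle can be detected before execution starts rather than discovered halfway through. The sketch below uses `graphlib.CycleError` from the Python standard library; `find_cycle` and the task names are hypothetical, and the `fixed` graph illustrates the extraction described above.

```python
from __future__ import annotations

from graphlib import CycleError, TopologicalSorter


def find_cycle(deps: dict[str, set[str]]) -> list[str] | None:
    """Return one dependency cycle as a list of task names, or None."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as error:
        # args[1] holds the cycle; its first and last entries are the same node.
        return list(error.args[1])
    return None


# Task A and Task B depend on each other.
cyclic = {"Task A": {"Task B"}, "Task B": {"Task A"}}

# After extracting the shared dependency into its own task.
fixed = {
    "Shared types": set(),
    "Task A": {"Shared types"},
    "Task B": {"Shared types"},
}

print(find_cycle(cyclic))
print(find_cycle(fixed))  # prints None
```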
+### Tasks Keep Growing
+**Symptom:** "This task was supposed to be 30 minutes but it has been 2 hours." Implementation reveals more work than planned.
+**Fix:** You are combining concerns. Stop, re-plan the remaining work. Split the current task into smaller tasks. The sunk time is gone — do not let it cascade into more wasted time.
+### Verification Cannot Be Automated
+**Symptom:** The verification step is "manually check that it works" — no test command, no build check, nothing automated.
+**Fix:** If you truly cannot automate verification, write a manual verification checklist with specific steps ("Open the browser, navigate to /login, enter email and password, verify token appears in response"). But first, ask: can this be a test? Usually it can.
+### Scope Creep During Planning
+**Symptom:** The plan keeps growing as you discover more work. What started as 3 tasks is now 12.
+**Fix:** Separate "must have for the goal" from "nice to have." The plan delivers the goal — everything else goes into a follow-up plan. A plan that does one thing well is better than a plan that does five things partially.

package/assets/skills/python-patterns/SKILL.md ADDED

@@ -0,0 +1,255 @@
+---
+name: python-patterns
+description: Pythonic patterns covering type hints, error handling, async, testing with pytest, and project organization
+stacks:
+  - python
+requires: []
+---
+# Python Patterns
+Pythonic patterns for writing clean, typed, and testable Python code. Covers type hints, error handling, async programming, testing with pytest, project organization, and common anti-patterns. Apply these when writing, reviewing, or refactoring Python code.
+## 1. Type Hints
+**DO:** Use type hints on all function signatures and module-level variables for clarity and static analysis.
+- Annotate all function parameters and return types:
+  ```python
+  def fetch_user(user_id: str) -> User | None:
+      ...
+  ```
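+- Use `from __future__ import annotations` at the top of every module for forward references and for PEP 604 union syntax (`X | None`) in annotations on Python 3.7-3.9; runtime uses of `X | Y`, such as `isinstance` checks or type aliases, still require Python 3.10+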
+- Use `TypedDict` for structured dictionaries that cross boundaries:
+  ```python
+  from typing import TypedDict
+  class UserResponse(TypedDict):
+      id: str
+      name: str
+      email: str
+      is_active: bool
+  ```
+- Use `@dataclass` for value objects with automatic `__init__`, `__eq__`, and `__repr__`:
+  ```python
+  from dataclasses import dataclass
+  @dataclass(frozen=True)
+  class Coordinate:
+      lat: float
+      lon: float
+  ```
+- Use `Protocol` for structural subtyping (duck typing with type safety):
+  ```python
+  from typing import Protocol
+  class Readable(Protocol):
+      def read(self, n: int = -1) -> bytes: ...
+  def process(source: Readable) -> bytes:
+      return source.read()
+  # Any object with a compatible read() method satisfies Readable
+  ```
+- Use `Literal` for constrained string/int values:
+  ```python
+  from typing import Literal
+  def set_log_level(level: Literal["debug", "info", "warning", "error"]) -> None:
+      ...
+  ```
+**DON'T:**
+- Use `Any` without justification -- prefer `object` for truly unknown types or narrow with `isinstance`
+- Use `dict` when `TypedDict` or a dataclass better describes the structure
+- Use `Optional[X]` -- prefer `X | None` (clearer, shorter)
+- Omit return type annotations -- even `-> None` is valuable documentation
+- Use `Union[X, Y]` -- prefer `X | Y` (PEP 604 syntax)
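The advice to prefer `object` over `Any` can be illustrated with a small sketch. `describe` is a hypothetical function: it accepts any value, but a type checker only allows type-specific operations after an `isinstance` check has narrowed the type.

```python
def describe(value: object) -> str:
    """Accept any value, narrowing with isinstance before using it."""
    if isinstance(value, str):
        # Narrowed to str: len() is allowed here.
        return f"text of length {len(value)}"
    if isinstance(value, (int, float)):
        # Narrowed to int | float: numeric formatting is allowed here.
        return f"number {value:.1f}"
    return f"unsupported {type(value).__name__}"


print(describe("abc"))  # prints text of length 3
print(describe(2))      # prints number 2.0
print(describe(None))   # prints unsupported NoneType
```

With `Any`, a call such as `len(value)` would pass type checking even for an integer and fail only at runtime.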
+## 2. Error Handling
+**DO:** Use specific exceptions and context managers for clean resource management.
+- Raise specific exceptions with descriptive messages:
+  ```python
+  raise ValueError(f"age must be positive, got {age}")
+  ```
+- Create a domain exception hierarchy:
+  ```python
+  class AppError(Exception):
+      """Base for all application errors."""
+  class AuthError(AppError):
+      """Authentication or authorization failure."""
+  class NotFoundError(AppError):
+      """Requested resource does not exist."""
+  ```
+- Use context managers for resource cleanup:
+  ```python
+  with open(path, "r") as f:
+      data = json.load(f)
+  ```
+- Chain exceptions to preserve the original cause:
+  ```python
+  try:
+      result = parse_config(raw)
+  except json.JSONDecodeError as e:
+      raise ConfigError(f"invalid config at {path}") from e
+  ```
+- Use `logging.exception()` in catch blocks to capture the full traceback:
+  ```python
+  except DatabaseError:
+      logger.exception("Failed to connect to database")
+      raise
+  ```
+**DON'T:**
+- Use bare `except:` -- it catches `KeyboardInterrupt` and `SystemExit`. Use `except Exception:` at minimum
+- Catch and silently swallow: `except Exception: pass` is almost always a bug
+- Use exceptions for control flow -- check conditions with `if` before potentially failing operations
+- Raise `Exception("something")` -- use specific types
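+- Raise a new exception inside an `except` block without `from` -- explicit chaining marks the original as the direct cause; a bare `raise` after logging is fine because it re-raises the original exception with its traceback intact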
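Exception chaining is easiest to see by inspecting `__cause__`. In this minimal sketch, `ConfigError` and `parse_config` are stand-ins defined only for the example.

```python
import json


class ConfigError(Exception):
    """Hypothetical domain error for this sketch."""


def parse_config(raw: str) -> dict:
    """Parse JSON config text, wrapping parse failures in ConfigError."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # `from e` stores the original exception on __cause__.
        raise ConfigError("invalid config") from e


try:
    parse_config("{not json")
except ConfigError as error:
    print(type(error.__cause__).__name__)  # prints JSONDecodeError
```

Callers catch one domain-level exception type, while the traceback still shows the underlying parse error and where it occurred.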
+## 3. Async Patterns
+**DO:** Use `async`/`await` for I/O-bound operations and structured concurrency for parallelism.
+- Use `async`/`await` for network, file, and database operations:
+  ```python
+  async def fetch_user(session: aiohttp.ClientSession, user_id: str) -> User:
+      async with session.get(f"/users/{user_id}") as resp:
+          data = await resp.json()
+          return User(**data)
+  ```
+- Use `asyncio.gather()` for concurrent independent tasks:
+  ```python
+  users, orders = await asyncio.gather(
+      fetch_users(session),
+      fetch_orders(session),
+  )
+  ```
+- Use `asyncio.TaskGroup` (Python 3.11+) for structured concurrency with automatic cancellation:
+  ```python
+  async with asyncio.TaskGroup() as tg:
+      task1 = tg.create_task(fetch_users())
+      task2 = tg.create_task(fetch_orders())
+  # Both tasks complete or all are cancelled on first failure
+  ```
+- Use `async with` for async context managers (database connections, HTTP sessions)
+- Use `asyncio.Semaphore` for rate limiting concurrent operations:
+  ```python
+  sem = asyncio.Semaphore(10)
+  async def limited_fetch(url: str) -> bytes:
+      async with sem:
+          return await fetch(url)
+  ```
+**DON'T:**
+- Mix sync and async in the same module -- pick one paradigm
+- Use `asyncio.run()` inside an already-running event loop
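+- Block the event loop with CPU-bound work -- offload it to a `ProcessPoolExecutor` via `loop.run_in_executor()`; `asyncio.to_thread()` suits blocking I/O calls but, on standard CPython builds, is limited by the GIL for CPU-bound code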
+- Use `time.sleep()` in async code -- use `await asyncio.sleep()`
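+- Create tasks without keeping a reference and awaiting them -- the event loop holds only weak references, so an orphan task can be garbage-collected before it finishes and its exception may go unnoticed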
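The semaphore and task-handling advice can be combined in one runnable sketch. `run_limited` is a hypothetical helper, and `asyncio.sleep` stands in for real I/O; the function keeps a reference to every task, awaits them all, and reports the peak number of jobs that ran concurrently.

```python
import asyncio


async def run_limited(count: int, limit: int) -> int:
    """Run `count` fake jobs, at most `limit` at once; return peak concurrency."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def job() -> None:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for real I/O
            active -= 1

    # Keep references to the tasks and await them all; none is orphaned.
    tasks = [asyncio.create_task(job()) for _ in range(count)]
    await asyncio.gather(*tasks)
    return peak


print(asyncio.run(run_limited(count=10, limit=3)))  # prints 3
```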
+## 4. Testing with pytest
+**DO:** Write focused tests using pytest fixtures, parametrize, and clear assertion patterns.
+- Use `@pytest.fixture` for test setup and dependency injection:
+  ```python
+  @pytest.fixture
+  def db_connection():
+      conn = create_test_db()
+      yield conn
+      conn.close()
+  def test_insert_user(db_connection):
+      db_connection.execute("INSERT INTO users ...")
+      assert db_connection.query("SELECT count(*) FROM users") == 1
+  ```
+- Parametrize tests for multiple inputs:
+  ```python
+  @pytest.mark.parametrize("text,expected", [
+      ("hello", "HELLO"),
+      ("", ""),
+      ("Hello World", "HELLO WORLD"),
+  ])
+  def test_uppercase(text: str, expected: str):
+      assert uppercase(text) == expected
+  ```
+- Use `conftest.py` for shared fixtures across test modules
+- Use `pytest.raises` for exception testing:
+  ```python
+  with pytest.raises(ValueError, match="must be positive"):
+      validate_age(-1)
+  ```
+- Use `tmp_path` fixture for temporary file tests:
+  ```python
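+  import json
+  from pathlib import Path
+  def test_write_config(tmp_path: Path):
+      config_file = tmp_path / "config.json"
+      write_config(config_file, {"key": "value"})
+      # Compare parsed JSON so the test does not depend on whitespace or key order
+      assert json.loads(config_file.read_text()) == {"key": "value"}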
+  ```
+- Use `monkeypatch` for mocking environment variables and module attributes
+**DON'T:**
+- Use `unittest.TestCase` in new code -- pytest's function-based tests are simpler and more powerful
+- Create test fixtures with complex inheritance hierarchies
+- Test implementation details -- test behavior through the public API
+- Use `mock.patch` on internal functions -- inject dependencies instead
+- Write tests that depend on execution order
+## 5. Project Organization
+**DO:** Use modern Python project structure with `pyproject.toml` and the `src/` layout.
+- Standard project structure:
+  ```
+  project/
+    pyproject.toml
+    src/
+      mypackage/
+        __init__.py
+        models/
+        services/
+        api/
+    tests/
+      conftest.py
+      test_models.py
+      test_services.py
+  ```
+- Use `pyproject.toml` as the single source for project metadata, dependencies, and tool configuration
+- Use `__init__.py` for public API exports only -- keep them minimal:
+  ```python
+  # src/mypackage/__init__.py
+  from mypackage.client import Client
+  from mypackage.errors import AppError
+  __all__ = ["Client", "AppError"]
+  ```
+- Separate concerns: `models/` for data structures, `services/` for business logic, `api/` for HTTP layer
+- Use `pydantic` for data validation at system boundaries (API input, config files, external data)
+- Pin dependencies with lock files (`uv.lock`, `poetry.lock`, `requirements.txt` with hashes)
+**DON'T:**
+- Use `setup.py` for new projects -- `pyproject.toml` is the standard
+- Put everything in `__init__.py` -- it becomes a maintenance burden
+- Import from `tests/` in production code
+- Use relative imports across package boundaries -- absolute imports are clearer
+## 6. Anti-Pattern Catalog
+**Anti-Pattern: Mutable Default Arguments**
+`def f(items=[])` shares the same list across all calls. The default is evaluated once at function definition, not per call. Instead: `def f(items: list[str] | None = None): items = items if items is not None else []`
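The difference is easy to observe by calling both variants twice. `append_shared` and `append_fresh` are hypothetical names used only for this sketch.

```python
from __future__ import annotations


def append_shared(item: str, items: list[str] = []) -> list[str]:
    """Anti-pattern: the default list is created once and reused."""
    items.append(item)
    return items


def append_fresh(item: str, items: list[str] | None = None) -> list[str]:
    """Fix: create a new list on each call when none is passed."""
    items = items if items is not None else []
    items.append(item)
    return items


append_shared("a")
print(append_shared("b"))  # prints ['a', 'b']

append_fresh("a")
print(append_fresh("b"))  # prints ['b']
```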
+**Anti-Pattern: Bare Except**
+`except:` catches everything including `KeyboardInterrupt`, `SystemExit`, and `GeneratorExit`. This prevents Ctrl+C from working and hides real bugs. Instead: `except Exception:` for broad catching, or specific exception types.
+**Anti-Pattern: Star Imports**
+`from module import *` pollutes the namespace, makes it impossible to trace where names come from, and breaks static analysis. Instead: import explicitly: `from module import ClassName, function_name`
+**Anti-Pattern: God Class**
+A single class with 20+ methods handling validation, database access, formatting, and business logic. Instead: split into focused classes with single responsibilities. Use composition over inheritance.
+**Anti-Pattern: String Formatting for SQL**
+`f"SELECT * FROM users WHERE id = '{user_id}'"` is a SQL injection vulnerability. Instead: use parameterized queries: `cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))` (the placeholder style depends on the driver: `?` for `sqlite3`, `%s` for `psycopg`)
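A self-contained sketch with the standard library's `sqlite3` module shows the parameterized form neutralizing an injection attempt. The table, the row, and `find_user` are made up for the example.

```python
from __future__ import annotations

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("u1", "Ada"))


def find_user(conn: sqlite3.Connection, user_id: str) -> str | None:
    """Look up a user name; the driver binds user_id as data, not SQL."""
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


print(find_user(conn, "u1"))             # prints Ada
print(find_user(conn, "u1' OR '1'='1"))  # prints None
```

The second call returns nothing because the whole string, quotes included, is compared against the `id` column instead of being parsed as SQL.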
+**Anti-Pattern: Nested Try/Except**
+Three levels of `try/except` blocks handling different errors. Instead: use early returns, separate the operations into functions, or use a result type pattern.