npm - litclaude-ai - Versions diffs - 0.2.2 - Mend

litclaude-ai 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (156)

package/plugins/litclaude/skills/programming/references/python/data-processing.md ADDED

@@ -0,0 +1,133 @@
+# Data Processing — Polars + DuckDB
+## The rule
+NEVER pandas. Polars (with numpy) plus DuckDB. Pandas is 10-50x slower, has weaker types, and the modern Python data ecosystem has moved on.
+## Quick decision tree
+| Operation | Use | Why |
+|---|---|---|
+| `.csv` / `.parquet` / `.json` direct query | DuckDB | Zero memory load, SQL ergonomics |
+| `.duckdb` file | DuckDB | Native format |
+| Filter (any size) | Polars | 128x faster than DuckDB for filtering |
+| Sort | Polars | 12x faster |
+| Multi-table join | DuckDB | 3x faster, more join types |
+| Heavy GROUP BY aggregation | DuckDB | 4x faster on large datasets |
+| Window function | Polars | 3-5x faster |
+| Pivot / melt / string ops | Polars | 2x faster |
+| Larger than RAM | Polars streaming or DuckDB out-of-core | Both process data that does not fit in memory |
+| Mixed pipeline | Hybrid (zero-copy via Arrow) | Use each tool's strengths |
+For the deep version (per-operation benchmarks, OOM strategies, full execution templates), load the **`data-scientist`** skill - it lives in this same skill set and is the source of truth for performance numbers.
+## Standard imports
+```python
+import numpy as np
+import polars as pl
+import duckdb
+```
+## DuckDB direct file query (zero memory load)
+```python
+result = duckdb.sql("""
+    SELECT category, SUM(amount) AS total
+    FROM 'data.csv'
+    WHERE date >= '2026-01-01'
+    GROUP BY category
+    ORDER BY total DESC
+""").pl()  # zero-copy → Polars DataFrame
+```
+`.pl()` returns Polars; `.df()` would return pandas - never use `.df()`.
+## Polars lazy pipeline
+```python
+result = (
+    pl.scan_csv("data.csv")              # lazy, no read yet
+    .filter(pl.col("amount") > 1000)
+    .filter(pl.col("status") == "active")
+    .sort("amount", descending=True)
+    .head(100)
+    .collect()                            # execute optimised plan
+)
+```
+Prefer `scan_*` over `read_*` for files. For frames already in memory, call `.lazy()` first and `.collect()` at the end. Either way, Polars optimises the whole query plan before executing it (predicate pushdown, projection pushdown, common subexpression elimination).
+## Streaming for OOM data
+```python
+result = (
+    pl.scan_csv("huge.csv")
+    .filter(pl.col("active"))
+    .group_by("category")
+    .agg([
+        pl.len().alias("count"),
+        pl.sum("amount").alias("total"),
+    ])
+    .collect(engine="streaming")  # older Polars releases: .collect(streaming=True)
+)
+```
+## Hybrid pipeline (most realistic shape)
+```python
+# Phase 1: DuckDB for the join (3x faster)
+joined = duckdb.sql("""
+    SELECT o.*, c.region, p.category
+    FROM 'orders.parquet' o
+    JOIN 'customers.parquet' c ON o.customer_id = c.id
+    JOIN 'products.parquet'  p ON o.product_id  = p.id
+""").pl()
+# Phase 2: Polars for filtering and transformation (128x + 2x faster)
+processed = (
+    joined
+    .filter(pl.col("amount") > 100)
+    .with_columns([
+        (pl.col("amount") * 1.1).alias("amount_with_tax"),
+    ])
+)
+# Phase 3: DuckDB for final aggregation (4x faster) - register Polars frame by name
+duckdb.register("processed", processed)
+final = duckdb.sql("""
+    SELECT region, category, SUM(amount_with_tax) AS revenue
+    FROM processed
+    GROUP BY region, category
+    ORDER BY revenue DESC
+""").pl()
+```
+## Type safety with Polars
+Polars supports schema overrides at read time, and `.cast()` for explicit conversion. Avoid implicit coercion in hot paths.
+```python
+schema = {"id": pl.Int64, "amount": pl.Float64, "date": pl.Date}
+df = pl.read_csv("data.csv", schema_overrides=schema)
+```
+Polars ships inline type annotations (the package includes a `py.typed` marker), so basedpyright type-checks Polars code with no extra stubs to install.
+## Things you might miss from pandas (and how to do them in Polars)
+| pandas | polars |
+|---|---|
+| `df.iloc[5]` | `df.row(5)` (plain tuple; `df.row(5, named=True)` for a dict) or `df[5]` (single-row frame) |
+| `df.loc[df["x"] > 5]` | `df.filter(pl.col("x") > 5)` |
+| `df["x"].apply(fn)` | `df["x"].map_elements(fn)` (slow path) or use native expressions |
+| `df.merge(...)` | `df.join(other, on="key")` |
+| `df.groupby(...).agg(...)` | `df.group_by(...).agg(...)` |
+| `pd.read_csv(...).dtypes` | `pl.read_csv(...).schema` |
+| `df.to_dict("records")` | `df.to_dicts()` |
+## Sources
+- Polars docs: <https://docs.pola.rs>
+- DuckDB Python API: <https://duckdb.org/docs/api/python/overview>
+- Cross-reference - this skill set's `data-scientist` skill (load it for the deep version)

package/plugins/litclaude/skills/programming/references/python/error-handling.md ADDED

@@ -0,0 +1,218 @@
+# Error Handling
+Typed errors, exhaustive matching, union returns, and resource safety.
+---
+## Typed errors — no bare strings
+Error types carry structured data. Pattern matching works. Callers know exactly what can go wrong.
+```python
+from dataclasses import dataclass
+from typing import NewType
+UserId = NewType("UserId", int)
+@dataclass(frozen=True, slots=True)
+class UserNotFoundError(Exception):
+    user_id: UserId
+    def __str__(self) -> str:  # REQUIRED — see note below
+        return f"user {self.user_id} not found"
+@dataclass(frozen=True, slots=True)
+class PermissionDeniedError(Exception):
+    user_id: UserId
+    required_role: str
+    def __str__(self) -> str:
+        return f"user {self.user_id} needs role {self.required_role}"
+```
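+**`__str__` is mandatory** on dataclass exceptions. `@dataclass` replaces `Exception.__init__`, so `self.args` stays `()` when the exception is built with keyword arguments (as in the examples here). Without `__str__`, `str(e)` returns an empty string, and log lines and monitoring events lose their message. One caveat with `frozen=True`: any attribute assignment raises `FrozenInstanceError`. That includes `e.add_note(...)` and library code that assigns attributes such as `__traceback__` to an exception passing through it. Drop `frozen=True` on exception classes that must go through such code.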
+```python
+# BAD
+raise ValueError("user not found")
+raise ValueError("permission denied")
+# GOOD
+raise UserNotFoundError(user_id=uid)
+raise PermissionDeniedError(user_id=uid, required_role="admin")
+```
+---
+## Union returns — expected failures without exceptions
+For failures that are **expected** (not found, validation error, permission denied), return a union instead of raising. Exceptions are for **unexpected** failures (network down, OOM, corrupted data).
+### Define the outcome types
+```python
+@dataclass(frozen=True, slots=True)
+class User:
+    id: UserId
+    name: str
+@dataclass(frozen=True, slots=True)
+class UserNotFound:
+    id: UserId
+@dataclass(frozen=True, slots=True)
+class PermissionDenied:
+    id: UserId
+    reason: str
+type GetUserResult = User | UserNotFound | PermissionDenied
+```
+### Handle exhaustively
+```python
+from typing import assert_never
+def handle_result(result: GetUserResult) -> str:
+    match result:
+        case User(name=name):
+            return f"Found: {name}"
+        case UserNotFound(id=uid):
+            return f"No user with id {uid}"
+        case PermissionDenied(reason=reason):
+            return f"Denied: {reason}"
+        case _ as unreachable:
+            assert_never(unreachable)
+```
+`assert_never` in the default case: if you add a new variant to `GetUserResult` without handling it here, the type checker errors. No silent fall-through.
+### When to use which
+**The heuristic**: caller is 1-2 levels away and MUST handle it → union return. Error should propagate up many layers to a boundary → exception.
+| Scenario | Pattern | Why |
+|---|---|---|
+| Repository → service (caller handles it) | Union return (`User \| UserNotFound`) | Caller is right there, must handle both |
+| Validation at boundary (parsing input) | Exception (typed, with fields) | Propagates up to HTTP/CLI handler |
+| Infrastructure failure (network, OOM) | Exception | Can't handle locally, must propagate |
+| Service → service (deep internal) | Exception (typed) | Union boilerplate across many layers is worse than exceptions |
+| HTTP handler → response | Catch exceptions, convert to response | Boundary code catches and translates |
+**Practical tradeoff**: union returns are safest (type checker forces handling) but create boilerplate when every caller in a chain must `match`. If the error would just propagate through 3+ layers unchanged, use a typed exception instead.
+---
+## Exhaustive match — every match needs a default
+Every `match` statement ends with `case _: assert_never(x)`. No exceptions.
+```python
+from enum import StrEnum
+from typing import assert_never
+class Status(StrEnum):
+    PENDING = "pending"
+    ACTIVE = "active"
+    DELETED = "deleted"
+def describe(status: Status) -> str:
+    match status:
+        case Status.PENDING:
+            return "waiting"
+        case Status.ACTIVE:
+            return "live"
+        case Status.DELETED:
+            return "gone"
+        case _ as unreachable:
+            assert_never(unreachable)
+```
+Add a new enum member? The type checker tells you every `match` that needs updating.
+---
+## Context managers — resource safety
+If it has `.close()`, `.shutdown()`, `.disconnect()`, or `.release()`, wrap it in `with`.
+```python
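+# BAD
+f = open("data.txt")
+data = f.read()
+f.close()  # skipped if read() raises, or simply forgotten: the handle leaks
+# GOOD
+with open("data.txt") as f:
+    data = f.read()
+# ALSO GOOD for whole-file reads: pathlib opens and closes the file for you
+from pathlib import Path
+data = Path("data.txt").read_text()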
+```
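Some objects have `.close()` but do not implement the context-manager protocol themselves. For those, the standard library's `contextlib.closing` adapts them to `with`. A minimal sketch; `LegacyClient` is a hypothetical class standing in for such an object.
```python
from contextlib import closing
class LegacyClient:
    """Hypothetical resource with close() but no __enter__/__exit__."""
    def __init__(self) -> None:
        self.closed = False
    def fetch(self) -> str:
        if self.closed:
            raise RuntimeError("client is closed")
        return "payload"
    def close(self) -> None:
        self.closed = True
client = LegacyClient()
with closing(client) as c:
    data = c.fetch()
# closing() called client.close() on exit; it would also do so if fetch() raised
```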
+### Async resources
+```python
+import httpx
+async def fetch_users() -> list[User]:
+    async with httpx.AsyncClient() as client:
+        response = await client.get("https://api.example.com/users")
+        response.raise_for_status()
+        return [User(**u) for u in response.json()]
+```
+### Custom context manager
+```python
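+import asyncio
+from contextlib import asynccontextmanager
+from collections.abc import AsyncIterator
+import asyncpg  # example driver; any async driver with connect()/close() fits
+@asynccontextmanager
+async def managed_connection(url: str) -> AsyncIterator[asyncpg.Connection]:
+    conn = await asyncpg.connect(url)
+    try:
+        yield conn
+    finally:
+        await conn.close()
+async def main() -> None:
+    # `async with` is only valid inside an `async def`
+    async with managed_connection("postgres://...") as conn:
+        await conn.execute("SELECT 1")
+    # conn is closed here, guaranteed
+asyncio.run(main())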
+```
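To see the `finally` guarantee without a database, here is a sketch with a hypothetical `FakeConnection` in place of a real driver. The connection is closed both when the body finishes normally and when it raises. In the second case the error still propagates to the caller.
```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager
class FakeConnection:
    """Hypothetical stand-in for a driver connection (not a real library API)."""
    def __init__(self) -> None:
        self.closed = False
        self.queries: list[str] = []
    async def execute(self, query: str) -> None:
        self.queries.append(query)
    async def close(self) -> None:
        self.closed = True
@asynccontextmanager
async def managed_fake_connection(conn: FakeConnection) -> AsyncIterator[FakeConnection]:
    try:
        yield conn
    finally:
        await conn.close()  # runs on normal exit and when the body raises
async def main() -> tuple[FakeConnection, FakeConnection]:
    ok = FakeConnection()
    async with managed_fake_connection(ok) as conn:
        await conn.execute("SELECT 1")
    failing = FakeConnection()
    try:
        async with managed_fake_connection(failing):
            raise ValueError("query failed")
    except ValueError:
        pass  # the error propagated, and the connection was closed anyway
    return ok, failing
ok_conn, failed_conn = asyncio.run(main())
```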
+---
+## Exception hierarchy — when you do raise
+Keep exception hierarchies shallow and specific.
+```python
+class AppError(Exception):
+    """Base for all application errors."""
+@dataclass(frozen=True, slots=True)
+class NotFoundError(AppError):
+    entity: str
+    id: int
+    def __str__(self) -> str:
+        return f"{self.entity} {self.id} not found"
+@dataclass(frozen=True, slots=True)
+class ConflictError(AppError):
+    entity: str
+    field: str
+    value: str
+    def __str__(self) -> str:
+        return f"{self.entity}.{self.field} = {self.value!r} already exists"
+```
+Callers catch `AppError` at the boundary, or specific subtypes where they can do something useful.
+---
+## Sources
+- Python docs: [typing — assert_never](https://docs.python.org/3/library/typing.html#typing.assert_never)
+- Python docs: [contextlib](https://docs.python.org/3/library/contextlib.html)
+- Python docs: [match statement](https://docs.python.org/3/reference/compound_stmts.html#the-match-statement)

package/plugins/litclaude/skills/programming/references/python/fastapi-stack.md ADDED

@@ -0,0 +1,316 @@
+# FastAPI + SQLAlchemy 2.x async + Postgres + Pydantic v2
+The canonical web API stack. Async end-to-end, type-safe end-to-end, OpenAPI-generated end-to-end.
+## Project layout
+```
+myapi/
+├── pyproject.toml
+├── alembic.ini
+├── migrations/
+│   └── env.py
+├── src/
+│   └── myapi/
+│       ├── __init__.py
+│       ├── main.py            # FastAPI app + lifespan
+│       ├── config.py          # pydantic-settings
+│       ├── db.py              # engine, session factory, dependency
+│       ├── models.py          # SQLAlchemy declarative models
+│       ├── schemas.py         # Pydantic request/response models
+│       └── routers/
+│           ├── __init__.py
+│           └── users.py
+└── tests/
+    ├── conftest.py
+    └── test_users.py
+```
+## Dependencies
+```bash
+uv add fastapi 'sqlalchemy[asyncio]>=2.0' asyncpg 'pydantic[email]>=2' pydantic-settings 'uvicorn[standard]' orjson
+uv add --dev httpx pytest alembic
+```
+`orjson` is mandatory: set `default_response_class=ORJSONResponse` on the FastAPI app. Pydantic-typed responses bypass it (Pydantic v2's `model_dump_json` is already Rust-backed); raw `dict` / `list` returns are accelerated. For SSE / NDJSON streams, call `orjson.dumps(...)` per chunk inside `StreamingResponse`. See `orjson-stack.md` for the decision tree, flag reference, and benchmarks.
+## Configuration (`config.py`)
+```python
+from functools import lru_cache
+from pydantic import Field, PostgresDsn
+from pydantic_settings import BaseSettings, SettingsConfigDict
+class Settings(BaseSettings):
+    model_config = SettingsConfigDict(env_file=".env", env_prefix="MYAPI_")
+    database_url: PostgresDsn
+    debug: bool = False
+    cors_origins: list[str] = Field(default_factory=list)
+@lru_cache
+def get_settings() -> Settings:
+    return Settings()  # type: ignore[call-arg]  # pydantic populates from env
+```
+`get_settings()` calls `Settings()` with no arguments. At runtime, pydantic-settings fills `database_url` from the `MYAPI_DATABASE_URL` environment variable (or `.env`). Static type checkers only see a required constructor argument, and that false positive is what the `type: ignore[call-arg]` silences. `@lru_cache` makes every caller share one instance. In tests, set the environment variable or construct `Settings(_env_file=".env")` directly; in production it reads from the environment.
+## Database (`db.py`)
+```python
+from collections.abc import AsyncIterator
+from typing import Annotated
+from fastapi import Depends
+from sqlalchemy.ext.asyncio import (
+    AsyncEngine,
+    AsyncSession,
+    async_sessionmaker,
+    create_async_engine,
+)
+from myapi.config import get_settings
+def make_engine() -> AsyncEngine:
+    settings = get_settings()
+    return create_async_engine(
+        str(settings.database_url),
+        echo=settings.debug,
+        pool_pre_ping=True,
+    )
+_engine = make_engine()
+_SessionFactory = async_sessionmaker(_engine, expire_on_commit=False)
+async def get_session() -> AsyncIterator[AsyncSession]:
+    async with _SessionFactory() as session:
+        yield session
+SessionDep = Annotated[AsyncSession, Depends(get_session)]
+```
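+`expire_on_commit=False` is essential for FastAPI. Otherwise, attribute access after commit triggers an implicit refresh, and that implicit I/O fails under async with `MissingGreenlet`. `database_url` must use the `postgresql+asyncpg://` scheme so that `create_async_engine` loads the asyncpg driver.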
+## Models (`models.py`)
+```python
+from datetime import datetime
+from sqlalchemy import DateTime, String, func
+from sqlalchemy.orm import (
+    DeclarativeBase,
+    Mapped,
+    MappedAsDataclass,
+    mapped_column,
+)
+class Base(MappedAsDataclass, DeclarativeBase):
+    pass
+class User(Base):
+    __tablename__ = "users"
+    id: Mapped[int] = mapped_column(primary_key=True, init=False)
+    email: Mapped[str] = mapped_column(String(255), unique=True, index=True)
+    name: Mapped[str] = mapped_column(String(100))
+    created_at: Mapped[datetime] = mapped_column(
+        DateTime(timezone=True),
+        server_default=func.now(),
+        init=False,
+    )
+```
+`MappedAsDataclass` makes `User(email=..., name=...)` work as a real dataclass constructor. `init=False` excludes the auto-generated columns (`id`, `created_at`) from `__init__`.
+## Schemas (`schemas.py`)
+```python
+from datetime import datetime
+from pydantic import BaseModel, ConfigDict, EmailStr
+class UserCreate(BaseModel):
+    email: EmailStr
+    name: str
+class UserRead(BaseModel):
+    model_config = ConfigDict(from_attributes=True)  # SQLAlchemy → Pydantic
+    id: int
+    email: EmailStr
+    name: str
+    created_at: datetime
+```
+Always have a separate `*Create` (input) and `*Read` (output) model. Never expose your ORM model as the API model.
+## Routers (`routers/users.py`)
+```python
+from fastapi import APIRouter, HTTPException, status
+from sqlalchemy import select
+from myapi.db import SessionDep
+from myapi.models import User
+from myapi.schemas import UserCreate, UserRead
+router = APIRouter(prefix="/users", tags=["users"])
+@router.post("", response_model=UserRead, status_code=status.HTTP_201_CREATED)
+async def create_user(payload: UserCreate, session: SessionDep) -> User:
+    user = User(email=payload.email, name=payload.name)
+    session.add(user)
+    await session.commit()
+    await session.refresh(user)
+    return user
+@router.get("/{user_id}", response_model=UserRead)
+async def get_user(user_id: int, session: SessionDep) -> User:
+    result = await session.execute(select(User).where(User.id == user_id))
+    user = result.scalar_one_or_none()
+    if user is None:
+        raise HTTPException(status.HTTP_404_NOT_FOUND, "User not found")
+    return user
+@router.get("", response_model=list[UserRead])
+async def list_users(session: SessionDep, limit: int = 100) -> list[User]:
+    result = await session.execute(select(User).limit(limit))
+    return list(result.scalars().all())
+```
+## Application (`main.py`)
+```python
+from contextlib import asynccontextmanager
+from collections.abc import AsyncIterator
+from fastapi import FastAPI
+from fastapi.responses import ORJSONResponse
+from myapi.config import get_settings
+from myapi.routers import users
+@asynccontextmanager
+async def lifespan(_: FastAPI) -> AsyncIterator[None]:
+    # Startup: warm up engine pool, run migrations check, etc.
+    yield
+    # Shutdown: close engine
+    from myapi.db import _engine
+    await _engine.dispose()
+def create_app() -> FastAPI:
+    settings = get_settings()
+    app = FastAPI(
+        title="My API",
+        debug=settings.debug,
+        lifespan=lifespan,
+        default_response_class=ORJSONResponse,  # orjson for raw dict / list returns
+    )
+    app.include_router(users.router)
+    return app
+app = create_app()
+```
+Run with:
+```bash
+uv run uvicorn myapi.main:app --host 0.0.0.0 --port 8000 --reload
+```
+## Migrations (Alembic + async)
+```bash
+uv run alembic init -t async migrations
+```
+In `migrations/env.py` replace the `target_metadata` line:
+```python
+from myapi.models import Base
+target_metadata = Base.metadata
+```
+Set `sqlalchemy.url` in `alembic.ini` to your async URL or override via `env.py`:
+```python
+from myapi.config import get_settings
+config.set_main_option("sqlalchemy.url", str(get_settings().database_url))
+```
+Generate and apply:
+```bash
+uv run alembic revision --autogenerate -m "create users"
+uv run alembic upgrade head
+```
+## Tests (`tests/test_users.py`)
+```python
+import pytest
+from httpx import ASGITransport, AsyncClient
+from myapi.main import app
+@pytest.mark.anyio
+async def test_create_and_get_user() -> None:
+    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://test") as client:
+        create_response = await client.post(
+            "/users",
+            json={"email": "alice@example.com", "name": "Alice"},
+        )
+        assert create_response.status_code == 201
+        user_id = create_response.json()["id"]
+        get_response = await client.get(f"/users/{user_id}")
+        assert get_response.status_code == 200
+        assert get_response.json()["email"] == "alice@example.com"
+```
+For database-backed tests, run a Postgres container in CI (`testcontainers-python` or `docker-compose`) and apply migrations against a test schema. SQLite-as-test-db breaks once you use Postgres-specific types (`JSONB`, `tsvector`, arrays).
+## Common pitfalls
+| Pitfall | Fix |
+|---|---|
+| `MissingGreenlet` exception when accessing attributes or relationships after commit | `expire_on_commit=False` on the session factory keeps loaded columns readable. Relationships still lazy-load, so eager-load them with `selectinload()` or await `obj.awaitable_attrs.<name>` via the `AsyncAttrs` mixin |
+| Connection pool exhausted under load | Set `pool_size`, `max_overflow` in `create_async_engine` |
+| Pydantic v1 syntax (`from pydantic import ...; class X(BaseModel): class Config: orm_mode = True`) | v2 uses `model_config = ConfigDict(from_attributes=True)` |
+| Returning ORM objects without `response_model` | FastAPI converts ORM objects by validating them into the declared `response_model`, which needs `from_attributes=True`. Declare it explicitly so serialisation works and OpenAPI is correct |
+| `result.scalars().all()` returns `Sequence`, not `list` | Wrap with `list(...)` to satisfy a `-> list[User]` annotation under strict typing |
+| `func.now()` returning naive datetime | Use `DateTime(timezone=True)` (Postgres `timestamptz`) so values come back timezone-aware; for a Python-side default use `datetime.now(UTC)` |
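A rough way to budget the pool settings from the table. Each worker process builds its own engine (the module-level `_engine` in `db.py`). A SQLAlchemy queue pool opens at most `pool_size + max_overflow` connections (defaults 5 and 10), so the worst case is that sum times the worker count. This is a sketch: Postgres's own limit is its `max_connections` setting (100 by default), and `reserved` is a hypothetical safety margin for admin sessions and migrations.
```python
def max_pool_connections(workers: int, pool_size: int = 5, max_overflow: int = 10) -> int:
    """Worst-case connections opened by `workers` processes, each with its own pool."""
    return workers * (pool_size + max_overflow)
def fits_postgres(
    workers: int,
    pool_size: int = 5,
    max_overflow: int = 10,
    max_connections: int = 100,
    reserved: int = 10,  # hypothetical margin for psql, migrations, monitoring
) -> bool:
    """True if the worst case still leaves `reserved` server slots free."""
    return max_pool_connections(workers, pool_size, max_overflow) <= max_connections - reserved
```
For example, 4 workers on the defaults can open 60 connections and fit, while 8 workers can open 120 and do not. Lowering `max_overflow` to 5 brings 8 workers down to 80.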
+## Sources
+- FastAPI: <https://fastapi.tiangolo.com>
+- SQLAlchemy 2.x async: <https://docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html>
+- SQLAlchemy MappedAsDataclass: <https://docs.sqlalchemy.org/en/20/orm/dataclasses.html>
+- asyncpg: <https://magicstack.github.io/asyncpg/current/>
+- Pydantic v2 migration: <https://docs.pydantic.dev/latest/migration/>
+- Alembic async: <https://alembic.sqlalchemy.org/en/latest/cookbook.html#using-asyncio-with-alembic>