PyPI - mlx-stack - Versions diffs - 0.1.0__tar.gz - Mend

mlx-stack 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (220)

mlx_stack-0.1.0/.factory/init.sh ADDED

@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+set -euo pipefail
+# Verify Python version
+python_version=$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")
+required="3.13"
+if [ "$(printf '%s\n' "$required" "$python_version" | sort -V | head -n1)" != "$required" ]; then
+    echo "ERROR: Python >= 3.13 required (found $python_version)"
+    exit 1
+fi
+# Install dependencies if pyproject.toml exists
+if [ -f pyproject.toml ]; then
+    uv sync
+fi
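The `sort -V` comparison in `init.sh` works because version sort orders `3.9` before `3.13`, which a plain string comparison would not. For reference, here is a minimal Python sketch of the same gate using tuple comparison on `sys.version_info`. `meets_minimum` is a hypothetical helper and is not shipped in the package.

```python
import sys

REQUIRED = (3, 13)  # minimum (major, minor), mirroring required="3.13" in init.sh


def meets_minimum(version: tuple[int, ...], required: tuple[int, int] = REQUIRED) -> bool:
    """Return True if the (major, minor, ...) version is at least `required`.

    Tuples compare element-wise, so (3, 9) < (3, 13) holds, whereas the
    string comparison "3.9" > "3.13" would give the wrong answer.
    """
    return tuple(version[:2]) >= required


# Usage: gate on the running interpreter's (major, minor).
running_ok = meets_minimum(sys.version_info[:2])
```

Calling `meets_minimum(sys.version_info[:2])` at startup gives the same pass/fail result as the shell check, without depending on `sort -V`.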

mlx_stack-0.1.0/.factory/library/architecture.md ADDED

@@ -0,0 +1,92 @@
+# Architecture
+Architectural decisions, patterns discovered, and conventions.
+**What belongs here:** Architecture decisions, module patterns, code conventions.
+---
+## Project Structure
+- `src/mlx_stack/` — main package (src layout)
+- `src/mlx_stack/cli/` — Click CLI package
+  - `cli/__init__.py` — package init
+  - `cli/main.py` — CLI entry point with Click command group
+  - `cli/profile.py` — `mlx-stack profile` command
+  - `cli/config.py` — `mlx-stack config` commands
+  - `cli/init.py` — `mlx-stack init` command (stack + LiteLLM config generation)
+  - `cli/recommend.py` — `mlx-stack recommend` command
+  - `cli/models.py` — `mlx-stack models` command (local model listing + catalog browsing)
+- `src/mlx_stack/core/` — shared business logic modules
+  - `core/hardware.py` — hardware detection (Apple Silicon profiling)
+  - `core/config.py` — configuration management (YAML-based)
+  - `core/catalog.py` — model catalog system (query API over YAML entries)
+  - `core/deps.py` — dependency management (auto-installing uv tools)
+  - `core/paths.py` — path utilities (`~/.mlx-stack/` and friends)
+  - `core/scoring.py` — recommendation scoring engine (intent-weighted composite scoring)
+  - `core/litellm_gen.py` — LiteLLM proxy config generation (model_list, router_settings, fallbacks)
+  - `core/stack_init.py` — stack initialization logic (port allocation, vllm_flags, overwrite protection)
+  - `core/models.py` — local model scanning, catalog listing, size formatting
+- `src/mlx_stack/data/` — static data files
+  - `data/catalog/` — shipped YAML catalog files (15 models)
+- `src/mlx_stack/utils/` — utility modules
+- `tests/` — pytest tests
+- `tests/fixtures/` — mock data (profiles, catalogs, etc.)
+## Conventions
+- Click for CLI, Rich for terminal output
+- PyYAML for all YAML operations
+- httpx for HTTP requests (async not needed — use sync client)
+- psutil for process management
+- All state lives in `~/.mlx-stack/` (configurable via `model-dir` for models)
+- Tests use `tmp_path` pytest fixture — NEVER touch real `~/.mlx-stack/`
+- External commands (sysctl, system_profiler, subprocess) are always mocked in unit tests
+- Click eager options (`--help`, `--version`) may exit before the group callback runs, so callback-based setup hooks should not be relied on for those code paths
+- Note: The config module currently sends success output to stderr. Future features should use stdout for successful output and stderr only for errors/warnings.
+## Key Design Decisions
+- One vllm-mlx process per model (ADR-003)
+- vllm-mlx and litellm managed as pinned uv tools, auto-installed on first use
+- Catalog schema: no int6, disk_size_gb per quant source, min_mlx_lm_version top-level, verified_on in separate data/verification.yaml
+- 2 intents for MVP: balanced, agent-fleet (architecture supports more)
+- Default memory budget: 40% of total unified memory
+- Recommendation/init budget behavior: budget filtering is per-model eligibility (`model.memory_gb <= budget`); the combined memory of selected tiers can exceed the budget
+## Ops Layer (Milestone 5)
+### New Modules
+- `core/log_rotation.py` — Copytruncate-based log rotation (copy → gzip → truncate)
+- `core/log_viewer.py` — Log viewing/following/listing logic
+- `core/watchdog.py` — Health polling loop, auto-restart, flap detection, daemon mode
+- `core/launchd.py` — Plist generation/loading/unloading via plistlib + launchctl
+- `cli/logs.py` — `mlx-stack logs` command
+- `cli/watch.py` — `mlx-stack watch` command
+- `cli/install.py` — `mlx-stack install` / `mlx-stack uninstall` commands
+### Key Integration Points
+- `process.py:start_service` — Log file open mode changed from "w" to "a" for rotation compatibility
+- `core/config.py` — 2 new keys: log-max-size-mb (int, default 50), log-max-files (int, default 5)
+- `process.py:acquire_lock` — Watchdog uses per-restart lock, not held during polling
+- `paths.py` — Watchdog PID at get_pids_dir()/watchdog.pid
+- `stack_status.py:run_status` — Used by watchdog for health polling
+- `process.py:start_service` / `stop_service` — Used by watchdog for restart
+- `cli/main.py` — 3 new commands registered: logs (Diagnostics), watch (Lifecycle), install/uninstall (Lifecycle)
+### Log Rotation Strategy
+- Copytruncate: copy log to archive, gzip compress, truncate original in-place
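+- Service FDs remain valid (same inode); because log files are opened in append mode (O_APPEND), writes after truncation land at the new end of file (offset 0) rather than at the old offset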
+- Naming: service.log.1.gz (most recent) → service.log.N.gz (oldest)
+- Archives shifted up before new rotation
+- No cooperation needed from child processes (vllm-mlx, litellm)
+### Log Follow Caveat
+- `core/log_viewer.py:follow_log` detects truncation when `current_size < position`.
+- Edge case: if the file is truncated and refilled to at least the previous byte length before the next poll (`current_size >= position`), truncation is not detected; the stream resumes from the old offset and misses the rewritten lines before it.
+### Watchdog Architecture
+- Single foreground loop (or daemonized with --daemon)
+- Polls get_service_status for all services each interval
+- Restart trigger: crashed state only (PID file exists, process dead)
+- NOT restarted: stopped (no PID file), healthy, degraded
+- Flap detection: rolling window of restart timestamps per service
+- Lock: acquire_lock only during actual restart, released immediately
+- Log rotation: triggered as side-effect of each poll cycle
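The copytruncate strategy described above (shift archives, gzip-copy the live log, truncate in place) can be sketched roughly as follows. This is an illustration under assumptions, not the code in `core/log_rotation.py`. The function names are hypothetical, and `max_size_bytes` and `max_files` stand in for the `log-max-size-mb` and `log-max-files` config keys.

```python
import gzip
import os
import shutil
from pathlib import Path


def archive_path(log_path: Path, index: int) -> Path:
    """service.log -> service.log.<index>.gz (index 1 = most recent)."""
    return log_path.with_name(f"{log_path.name}.{index}.gz")


def rotate_if_needed(log_path: Path, max_size_bytes: int, max_files: int) -> bool:
    """Copytruncate rotation sketch. Returns True if a rotation happened."""
    if not log_path.exists() or log_path.stat().st_size <= max_size_bytes:
        return False

    # 1. Drop the oldest archive, then shift the rest up: .N-1 -> .N, ..., .1 -> .2
    archive_path(log_path, max_files).unlink(missing_ok=True)
    for i in range(max_files - 1, 0, -1):
        src = archive_path(log_path, i)
        if src.exists():
            src.rename(archive_path(log_path, i + 1))

    # 2. Copy the live log into a new gzip archive (.1.gz = most recent).
    with open(log_path, "rb") as src_f, gzip.open(archive_path(log_path, 1), "wb") as dst_f:
        shutil.copyfileobj(src_f, dst_f)

    # 3. Truncate in place: the inode is unchanged, so writers' FDs stay valid.
    os.truncate(log_path, 0)
    return True
```

One inherent copytruncate trade-off: bytes a service writes between the copy and the truncate are lost. Truncating in place, rather than renaming, is what lets `vllm-mlx` and `litellm` keep writing through their existing file descriptors without any cooperation.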

mlx_stack-0.1.0/.factory/library/environment.md ADDED

@@ -0,0 +1,23 @@
+# Environment
+Environment variables, external dependencies, and setup notes.
+**What belongs here:** Required env vars, external API keys/services, dependency quirks, platform-specific notes.
+**What does NOT belong here:** Service ports/commands (use `.factory/services.yaml`).
+---
+## Machine
+- Apple MacBook Pro M5 Max, 128 GB unified memory, 18 CPU cores, 40 GPU cores
+- macOS 26.x
+- Python 3.14.3 (targeting 3.13+ compatibility)
+## Tools
+- uv 0.10.12 (package manager)
+- vllm-mlx v0.2.6 (installed as uv tool at ~/.local/bin/vllm-mlx)
+- litellm (installed as uv tool at ~/.local/bin/litellm)
+- For robust `uv tool list` parsing, set `NO_COLOR=1` when invoking uv to avoid ANSI escape sequences in output
+## External Dependencies
+- HuggingFace Hub (for model downloads; optional HF_TOKEN for authenticated requests and higher rate limits)
+- OpenRouter API (optional, for cloud fallback — key stored in ~/.mlx-stack/config.yaml)
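Following the `NO_COLOR=1` note above, here is a minimal sketch of invoking `uv tool list` from Python with color disabled. The helper names are hypothetical. Only the `NO_COLOR` variable and the `uv tool list` command come from these notes.

```python
from __future__ import annotations

import os
import subprocess


def uv_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the environment (or `base`) and force NO_COLOR=1 so uv emits no ANSI escapes."""
    env = dict(os.environ if base is None else base)
    env["NO_COLOR"] = "1"
    return env


def uv_tool_list() -> str:
    """Run `uv tool list` with colors disabled and return its stdout."""
    result = subprocess.run(
        ["uv", "tool", "list"],
        capture_output=True,
        text=True,
        check=True,
        env=uv_env(),
    )
    return result.stdout
```

Copying the environment rather than passing `{"NO_COLOR": "1"}` alone keeps `PATH` and `HOME` intact, which `uv` needs to locate its tool directory.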

mlx_stack-0.1.0/.factory/library/user-testing.md ADDED

@@ -0,0 +1,80 @@
+# User Testing
+Testing surface, required testing skills/tools, resource cost classification per surface.
+**What belongs here:** How to test the user-facing surface, tools needed, concurrency limits.
+---
+## Validation Surface
+**Surface:** CLI commands executed in terminal
+**Tool:** Direct shell command execution (subprocess or Click CliRunner)
+**Required tools:**
+- Python 3.13+ with uv
+- vllm-mlx v0.2.6 (installed as uv tool)
+- litellm (installed as uv tool)
+- curl (for HTTP endpoint verification)
+**Setup needed for validation:**
+- A downloaded model (small, e.g., qwen3.5-0.8b int4) for lifecycle testing
+- `mlx-stack init --accept-defaults` to generate configs
+- No browser or GUI tools needed
+**Gaps:**
+- Full integration testing of `up`/`down`/`status` requires downloaded models and sufficient memory
+- Benchmark validation requires a running model server
+- Tool-call benchmark requires a model that supports tool calling
+- Foundation milestone user-testing run (2026-03-24) observed placeholder CLI surfaces for `models --catalog`, `up`, and `bench`; related catalog/dependency assertions were blocked until those commands are implemented.
+## Validation Concurrency
+**Machine:** M5 Max 128GB, 18 cores, ~97GB free at baseline
+**CLI surface:** Lightweight Python process execution (~100-200MB per validator)
+**Max concurrent validators:** 5
+**Rationale:** Each validator runs a CLI command (Python process ~200MB). 5 concurrent = ~1GB. Even with model servers running during lifecycle tests (~10-20GB per model), the machine has ample headroom. Using 70% of available headroom: 67.9GB available * 0.7 = 47.5GB budget. Each lifecycle validator with a model server: ~12GB worst case. Max concurrent lifecycle validators: 3 (47.5 / 12 ≈ 3.96, rounded down). For non-lifecycle tests: 5.
+## Flow Validator Guidance: CLI
+- Use only terminal-based validation commands (`uv run mlx-stack ...`) and shell inspection commands.
+- Enforce isolation with a unique `MLX_STACK_HOME` per validator (example: `/tmp/mlx-stack-user-testing/<group-id>`). Never reuse another validator's home.
+- Do not read from or write to real `~/.mlx-stack/`; keep all generated files under each validator's assigned `MLX_STACK_HOME`.
+- Keep evidence in the assigned mission evidence directory only.
+- Stay within assigned assertion scope and avoid commands that mutate global/shared system state.
+## Recommendation milestone run notes (2026-03-24)
+- `recommend` is currently display-only and does **not** persist `profile.json` when auto-detecting hardware.
+- `models --catalog` currently does not expose filter flags for family/tag/capability on the CLI surface.
+- `pull` and `bench` remain placeholder commands in this build, which blocks benchmark-save recommendation validation flows.
+- For validator fixture scripting, prefer `uv run python` over system `python3` so project dependencies (e.g., PyYAML) are available.
+## Lifecycle milestone rerun notes (2026-03-24)
+- In isolated lifecycle rerun flow `r2-g1-fixes`, macOS denied `psutil.net_connections(kind='inet')` with `AccessDenied`; port conflict output fell back to `PID 0 (<unknown>)` even though preflight conflict skipping worked. Treat owner-resolution checks as potentially permission-sensitive on this host.
+## Tooling milestone run notes (2026-03-24)
+- Tooling rerun round 4 confirms `bench qwen3-8b` now passes tool-calling validation (`✓ Valid tool call — round-trip: 5.89s`), resolving VAL-BENCH-008.
+- Catalog repository availability has drifted: `qwen3.5-*` int4 repos referenced in catalog returned `RepositoryNotFound` during live pull testing. `gemma3-*`, `deepseek-r1-8b`, and `qwen3-8b` int4 repos were reachable.
+- The current Hugging Face CLI package installs `hf` (not `huggingface-cli`). For live pull validation, a local wrapper script (`/tmp/huggingface-cli -> hf`) was used so `mlx-stack pull` subprocess invocation could execute.
+- Tooling rerun (round 2) confirms pull progress is now user-visible with incremental percent updates (`0% ... 100%`) and temp bench-instance flows now start successfully (`bench <model-id>` and `bench --save` pass, including non-conflicting temp-port binding evidence).
+- Remaining tooling gaps after tooling rerun round 2 were: (1) network-error pull still surfaced long upstream traceback output before the concise error summary, and (2) tool-calling benchmark still reported `No tool calls in response` for `qwen3-8b`.
+- Tooling rerun round 3 confirmed network-error pull output is now traceback-free for users (VAL-PULL-008 passed); tool-calling benchmark still fails for `qwen3-8b` with `No tool calls in response` (VAL-BENCH-008).
+## Misc-cross-area milestone run notes (2026-03-24)
+- User-testing flow `r1-g1-cross-flows` validated `VAL-CROSS-001`, `VAL-CROSS-012`, and `VAL-CROSS-013` as passing on the real CLI surface in isolated `MLX_STACK_HOME` mode.
+- `VAL-CROSS-007` remained blocked in this environment because host port `5000` was already occupied by a non-mlx-stack service; `up` correctly reported a conflict and skipped LiteLLM at that port.
+- A workaround run with `litellm-port 5001` confirmed the same config-propagation/startup behavior when a free port is used.
+- Rerun flow `r2-g4-cross-port5050` (after contract update to port `5050`) passed `VAL-CROSS-007`: `up` served LiteLLM on `127.0.0.1:5050` and `/v1/models` returned HTTP 200 while `4000` stayed inactive.
+- Setup finding: the host `litellm` uv tool runtime was missing proxy dependencies (`websockets`, `backoff`, `fastapi`, etc.). Installing proxy extras (`litellm[proxy]`) in that tool environment unblocked LiteLLM startup for user-testing flows.
+## Ops milestone run notes (2026-04-01)
+- On this host, `user-testing-flow-validator` subagent runs intermittently exited early with `insufficient permission to proceed ... Re-run with --skip-permissions-unsafe`. Workaround was to continue validation with isolated direct CLI/test execution while preserving evidence artifacts.
+- Repo-level pytest defaults include quiet output, so assertion-level mapping is hard to prove from `-q` logs. For per-assertion evidence, use:
+  - `uv run pytest <files> -o addopts='' -vv`
+  which emits test names and pass lines suitable for assertion mapping in synthesis.
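The concurrency rationale reduces to a small calculation: take 70% of available memory as a budget, divide by the worst-case footprint per validator, and round down. The sketch below uses the figures from the notes: 67.9 GB available, ~12 GB per lifecycle validator, and ~0.2 GB per CLI-only validator. `max_concurrent` is a hypothetical helper. The cap of 5 for CLI-only validators is taken from the notes rather than derived from memory.

```python
from __future__ import annotations

import math


def max_concurrent(
    available_gb: float,
    per_validator_gb: float,
    headroom_fraction: float = 0.7,
    cap: int | None = None,
) -> int:
    """Validators that fit in `headroom_fraction` of available memory, rounded down."""
    budget_gb = available_gb * headroom_fraction
    count = math.floor(budget_gb / per_validator_gb)
    return count if cap is None else min(count, cap)


LIFECYCLE = max_concurrent(67.9, 12.0)       # 47.53 / 12 ≈ 3.96 -> 3
CLI_ONLY = max_concurrent(67.9, 0.2, cap=5)  # memory alone allows ~237; capped at 5
```

Rounding down matters here. Rounding 3.96 up to 4 would put the worst-case lifecycle load above the 70% budget.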

mlx_stack-0.1.0/.factory/services.yaml ADDED

@@ -0,0 +1,9 @@
+commands:
+  install: uv sync
+  test: uv run pytest -x -q --tb=short
+  typecheck: uv run python -m pyright
+  lint: uv run ruff check src/ tests/
+  format: uv run ruff format src/ tests/
+  coverage: uv run pytest --cov=src/mlx_stack --cov-report=term-missing
+services: {}

mlx_stack-0.1.0/.factory/settings.json ADDED

@@ -0,0 +1,5 @@
+{
+  "enabledPlugins": {
+    "core@factory-plugins": true
+  }
+}

mlx_stack-0.1.0/.factory/skills/cli-feature/SKILL.md ADDED

@@ -0,0 +1,350 @@
+# CLI Feature Worker
+You are a CLI feature worker for **mlx-stack**, a Python CLI tool that manages local LLM infrastructure on Apple Silicon. You implement features end-to-end: failing tests first, then production code, then verification.
+---
+## Project Structure
+```
+src/mlx_stack/
+├── __init__.py          # Package version
+├── cli/
+│   ├── __init__.py      # Click group (main entry point)
+│   └── <command>.py     # One file per CLI command
+├── core/
+│   ├── __init__.py
+│   ├── hardware.py      # Hardware detection
+│   ├── catalog.py       # Model catalog loading/querying
+│   ├── config.py        # Config persistence (~/.mlx-stack/config.yaml)
+│   ├── scoring.py       # Recommendation scoring engine
+│   ├── process.py       # Process management (up/down/status)
+│   ├── deps.py          # Dependency management (uv tool install)
+│   └── models.py        # Model download/inventory
+├── data/
+│   ├── catalog/         # YAML catalog entries (one per model)
+│   └── verification.yaml
+└── utils/
+    └── display.py       # Rich output helpers
+tests/
+├── conftest.py          # Shared fixtures (tmp_path, mock hardware, etc.)
+├── unit/
+│   ├── test_<module>.py # Unit tests for core/ modules
+│   └── test_cli_<cmd>.py # CLI command tests via CliRunner
+└── integration/         # Real system tests (marked, optional)
+```
+**Package:** `mlx_stack` | **CLI entry point:** `mlx-stack` | **Config dir:** `~/.mlx-stack/`
+---
+## Technology Stack
+- **Python 3.13+** with full type annotations
+- **Click** — CLI framework, command groups
+- **Rich** — All terminal output (tables, panels, progress bars, styled text)
+- **httpx** — HTTP client for health checks and API calls
+- **psutil** — Process monitoring
+- **PyYAML** — Config and catalog file handling
+- **pytest + pytest-cov** — Testing (80%+ coverage on `core/`)
+- **uv** — Package manager (`uv run` for all commands)
+- **Pyright** — Static type checking
+---
+## Workflow: TDD Feature Implementation
+Follow this exact sequence for every feature. Do not skip steps.
+### Step 1: Understand the Feature
+1. Read the task description fully. Identify which CLI command(s) and core module(s) are involved.
+2. Check existing code — read the relevant files in `src/mlx_stack/cli/` and `src/mlx_stack/core/` to understand current state.
+3. Identify the public API: what Click commands, function signatures, and data structures are needed.
+### Step 2: Write Failing Tests First
+Write tests BEFORE any production code. Tests define the contract.
+**CLI command tests** (in `tests/unit/test_cli_<command>.py`):
+```python
+from click.testing import CliRunner
+from mlx_stack.cli import cli  # the main Click group
+def test_<command>_basic(tmp_path, monkeypatch):
+    """<Command> produces expected output for valid input."""
+    # Redirect config dir to tmp_path to avoid touching real ~/.mlx-stack/
+    monkeypatch.setenv("MLX_STACK_HOME", str(tmp_path))
+    runner = CliRunner()
+    result = runner.invoke(cli, ["<command>", "<args>"])
+    assert result.exit_code == 0
+    assert "<expected output fragment>" in result.output
+def test_<command>_error_case(tmp_path, monkeypatch):
+    """<Command> shows user-friendly error, no stack trace."""
+    monkeypatch.setenv("MLX_STACK_HOME", str(tmp_path))
+    runner = CliRunner()
+    result = runner.invoke(cli, ["<command>", "--bad-flag"])
+    assert result.exit_code != 0
+    assert "Traceback" not in result.output
+```
+**Core module tests** (in `tests/unit/test_<module>.py`):
+```python
+import pytest
+from mlx_stack.core.<module> import <function_under_test>
+def test_<function>_happy_path(tmp_path):
+    """<Function> returns expected result for valid input."""
+    result = <function_under_test>(valid_input, config_dir=tmp_path)
+    assert result == expected
+def test_<function>_edge_case(tmp_path):
+    """<Function> handles <edge case> gracefully."""
+    result = <function_under_test>(edge_input, config_dir=tmp_path)
+    assert result == expected_edge
+def test_<function>_invalid_input():
+    """<Function> raises ValueError for invalid input."""
+    with pytest.raises(ValueError, match="<expected message>"):
+        <function_under_test>(invalid_input)
+```
+**Critical test rules:**
+- **NEVER** read from or write to the real `~/.mlx-stack/` directory. Always use `tmp_path` or `monkeypatch.setenv("MLX_STACK_HOME", str(tmp_path))`.
+- **ALWAYS** mock external system calls:
+  - `subprocess.run` / `subprocess.Popen` for sysctl, system_profiler, vllm-mlx, litellm
+  - `httpx.Client` / `httpx.AsyncClient` for health checks and API calls
+  - `psutil.Process` for process monitoring
+  - `shutil.disk_usage` for disk space checks
+- Use `pytest.fixture` for reusable test state (mock hardware profiles, sample catalog entries, tmp config dirs).
+- Test both success and error paths. Error paths must never show Python stack traces — only user-friendly Rich-formatted messages.
+### Step 3: Run Tests — Confirm They Fail
+```bash
+uv run pytest tests/unit/test_<relevant_files>.py -v
+```
+All new tests MUST fail at this point (ImportError or AssertionError). This confirms the tests are actually testing something. If a test passes before implementation, the test is wrong — fix it.
+### Step 4: Implement the Feature
+Now write the minimum production code to make all tests pass.
+**CLI command file** (`src/mlx_stack/cli/<command>.py`):
+```python
+"""mlx-stack <command> — <one-line description>."""
+import click
+from rich.console import Console
+from rich.table import Table
+from mlx_stack.core.<module> import <core_function>
+console = Console()  # stdout: successful output
+err_console = Console(stderr=True)  # stderr: errors and warnings
+@click.command()
+@click.option("--flag", help="Description.")
+@click.pass_context
+def <command>(ctx: click.Context, flag: str | None) -> None:
+    """<Docstring shown in --help>."""
+    try:
+        result = <core_function>(...)
+        # Use Rich for ALL output
+        table = Table(title="...")
+        table.add_column(...)
+        console.print(table)
+    except <ExpectedError> as e:
+        err_console.print(f"[red]Error:[/red] {e}")
+        raise SystemExit(1) from e
+```
+**Core module** (`src/mlx_stack/core/<module>.py`):
+```python
+"""<Module description>."""
+from __future__ import annotations
+from dataclasses import dataclass
+from pathlib import Path
+# ... typed, documented, no bare exceptions
+def <function>(input: InputType, *, config_dir: Path | None = None) -> OutputType:
+    """<Docstring with Args/Returns/Raises>."""
+    ...
+```
+**Implementation rules:**
+- Full type annotations on every function signature. Use `from __future__ import annotations`.
+- Docstrings on all public functions and classes.
+- The config directory must default to `~/.mlx-stack/` but be overridable via `MLX_STACK_HOME` env var or function parameter — this is how tests isolate themselves.
+- Use `dataclass` or `TypedDict` for structured data, never raw dicts for domain objects.
+- Rich for ALL terminal output — no bare `print()` calls.
+- Handle errors with specific exception types. Catch at the CLI layer and display with Rich. Never let stack traces reach the user.
+- Register new commands in `src/mlx_stack/cli/__init__.py`:
+  ```python
+  from mlx_stack.cli.<command> import <command>
+  cli.add_command(<command>)
+  ```
+### Step 5: Run Tests — Confirm They Pass
+```bash
+uv run pytest tests/unit/ -v --tb=short
+```
+All tests must pass. Fix any failures before proceeding. Do not move on with failing tests.
+### Step 6: Run Full Test Suite + Type Checking
+```bash
+# Full test suite with coverage
+uv run pytest tests/ -v --cov=mlx_stack --cov-report=term-missing
+# Type checking
+uv run pyright src/mlx_stack/
+```
+**Targets:**
+- All tests pass
+- Coverage on `src/mlx_stack/core/` ≥ 80%
+- Zero Pyright errors (warnings acceptable if justified)
+Fix any issues before proceeding.
+### Step 7: Manual Verification
+Run the actual CLI command and visually verify the output is correct and well-formatted.
+```bash
+# For safe commands (profile, config, models, recommend):
+uv run mlx-stack <command> <args>
+# For commands that start processes (up, bench):
+uv run mlx-stack <command> --dry-run
+```
+Check:
+- Output uses Rich formatting (colors, tables, panels) — not plain text
+- Help text is accurate: `uv run mlx-stack <command> --help`
+- Error cases show friendly messages, not tracebacks
+- Exit codes are correct (0 for success, non-zero for errors)
+If manual verification reveals issues, fix them and re-run tests.
+---
+## Mocking Patterns Reference
+### Mock Hardware Detection (sysctl / system_profiler)
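+```python
+import subprocess
+import pytest
+@pytest.fixture
+def mock_m4_pro(monkeypatch):
+    """Mock an M4 Pro with 48GB unified memory."""
+    def mock_sysctl(cmd, **kwargs):
+        # subprocess.run usually receives an argv list; join it into a hashable string key
+        key = " ".join(cmd) if isinstance(cmd, (list, tuple)) else cmd
+        responses = {
+            "sysctl -n machdep.cpu.brand_string": "Apple M4 Pro",
+            "sysctl -n hw.memsize": "51539607552",  # 48 GiB (48 * 1024**3 bytes)
+        }
+        return subprocess.CompletedProcess(cmd, 0, stdout=responses.get(key, ""))
+    monkeypatch.setattr("subprocess.run", mock_sysctl)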
+    # Also mock system_profiler for GPU core count
+    ...
+```
+### Mock Subprocess for Process Management
+```python
+import subprocess
+from unittest.mock import MagicMock
+import pytest
+@pytest.fixture
+def mock_processes(monkeypatch, tmp_path):
+    """Mock vllm-mlx and litellm process spawning."""
+    pids = iter([1001, 1002, 1003])
+    def mock_popen(cmd, **kwargs):
+        mock = MagicMock()
+        mock.pid = next(pids)
+        mock.poll.return_value = None  # process is running
+        return mock
+    monkeypatch.setattr("subprocess.Popen", mock_popen)
+```
+### Mock HTTP Health Checks
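+```python
+import httpx
+import pytest
+@pytest.fixture
+def mock_health_ok(monkeypatch):
+    """Mock healthy HTTP responses from model servers."""
+    def mock_get(self, url, **kwargs):
+        # Attach a request so response.raise_for_status() works on the mock
+        return httpx.Response(200, json={"status": "ok"}, request=httpx.Request("GET", url))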
+    monkeypatch.setattr("httpx.Client.get", mock_get)
+```
+### Isolated Config Directory
+```python
+@pytest.fixture
+def mlx_home(tmp_path, monkeypatch):
+    """Redirect MLX_STACK_HOME to a temp directory."""
+    home = tmp_path / ".mlx-stack"
+    home.mkdir()
+    monkeypatch.setenv("MLX_STACK_HOME", str(home))
+    return home
+```
+---
+## Handoff Requirements
+When your implementation is complete, report the following:
+### Tests Added
+List every new test file and the test functions within it:
+```
+tests/unit/test_cli_profile.py
+  - test_profile_detects_hardware
+  - test_profile_writes_profile_json
+  - test_profile_rejects_non_apple_silicon
+  - test_profile_unknown_chip_estimates_bandwidth
+tests/unit/test_hardware.py
+  - test_detect_m4_pro
+  - test_detect_unknown_m_chip
+  - test_detect_intel_raises
+  - test_bandwidth_lookup_known_chips
+  - test_bandwidth_estimation_formula
+```
+### Commands Run (with output summary)
+```
+$ uv run pytest tests/unit/ -v --cov=mlx_stack --cov-report=term-missing
+  → 23 passed, 0 failed, core/hardware.py: 94% coverage
+$ uv run pyright src/mlx_stack/
+  → 0 errors, 0 warnings
+$ uv run mlx-stack profile
+  → Rich table output showing M5 Max, 128GB, 40 GPU cores, 546 GB/s bandwidth
+```
+### Files Created or Modified
+List every file touched with a one-line description of the change:
+```
+src/mlx_stack/core/hardware.py — NEW: hardware detection module (detect_hardware, estimate_bandwidth)
+src/mlx_stack/cli/profile.py — NEW: profile command implementation
+src/mlx_stack/cli/__init__.py — MODIFIED: registered profile command
+tests/unit/test_hardware.py — NEW: 5 unit tests for hardware detection
+tests/unit/test_cli_profile.py — NEW: 4 CLI tests via CliRunner
+tests/conftest.py — MODIFIED: added mock_m4_pro and mlx_home fixtures
+```
+### Discovered Issues
+Note anything that came up during implementation that the next worker or orchestrator should know about:
+```
+- system_profiler XML parsing is slow (~2s); consider caching in profile.json
+- psutil not detecting vllm-mlx by name; may need PID-file-based tracking instead
+- (none) — clean implementation, no blockers
+```
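The isolation rule in Step 4 (default to `~/.mlx-stack/`, overridable by `MLX_STACK_HOME` or a function parameter) could be resolved by a single helper. This is a minimal sketch. The name `resolve_home` is hypothetical, and giving the explicit parameter precedence over the environment variable is an assumption; the skill text does not state an order.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_home(config_dir: Path | None = None) -> Path:
    """Resolve the mlx-stack state directory.

    Assumed precedence (hypothetical): explicit parameter, then the
    MLX_STACK_HOME environment variable, then ~/.mlx-stack.
    """
    if config_dir is not None:
        return Path(config_dir)
    env_home = os.environ.get("MLX_STACK_HOME")
    if env_home:
        return Path(env_home).expanduser()
    return Path.home() / ".mlx-stack"
```

If every core function routes its state paths through a helper like this, the `mlx_home` fixture alone is enough to keep tests away from the real `~/.mlx-stack/`.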

mlx_stack-0.1.0/.factory/validation/foundation/scrutiny/reviews/configuration-management.json ADDED

@@ -0,0 +1,33 @@
+{
+  "featureId": "configuration-management",
+  "reviewedAt": "2026-03-24T00:50:25Z",
+  "commitId": "725b662",
+  "transcriptSkeletonReviewed": true,
+  "diffReviewed": true,
+  "status": "pass",
+  "codeReview": {
+    "summary": "The feature implementation in commit 725b662 covers the required config module and CLI surface (set/get/list/reset), including key validation, typed parsing, defaults, persistence, masking, corrupt/empty file handling, and reset confirmation behavior. I found one non-blocking UX/scripting issue around output streams.",
+    "issues": [
+      {
+        "file": "src/mlx_stack/cli/config.py",
+        "line": 24,
+        "severity": "non_blocking",
+        "description": "The module-level console is configured as `Console(stderr=True)` and is used for successful `config set/get/reset` output (e.g., `console.print(display)` at line 72), so `mlx-stack config get ...` writes value output to stderr. This is inconsistent with `config list` (stdout via `out = Console()` at line 95) and makes stdout capture in shell scripts less ergonomic."
+      }
+    ]
+  },
+  "sharedStateObservations": [
+    {
+      "area": "conventions",
+      "observation": "Shared guidance documents are out of sync with the actual CLI module layout, which can mislead workers during feature implementation.",
+      "evidence": "Mission AGENTS.md says commands live under `commands/` (lines 16-17), and `.factory/library/architecture.md` says `src/mlx_stack/cli.py` + `src/mlx_stack/commands/` (lines 11-12), while the real codebase uses `src/mlx_stack/cli/` modules (see `src/mlx_stack/cli/main.py`, `src/mlx_stack/cli/config.py`, and directory listing of `src/mlx_stack`)."
+    },
+    {
+      "area": "skills",
+      "observation": "Feature metadata requires `cli-feature`, but worker handoff reports that skill was unavailable at runtime; this indicates a registry/discoverability gap between mission metadata and executable skills.",
+      "evidence": "Feature entry has `skillName: cli-feature` in mission `features.json`; handoff JSON reports deviation 'cli-feature skill was not found in available skills' and suggests ensuring it is available (`handoffs/2026-03-24T00-41-41-697Z__configuration-management__86a87c3a-0806-4220-9886-13cfa289ee9a.json`, lines 142-148)."
+    }
+  ],
+  "addressesFailureFrom": null,
+  "summary": "Reviewed configuration-management for foundation using feature metadata, handoff, commit diff, transcript skeleton, and skill spec. Implementation is functionally complete for the requested behavior, with one non-blocking stderr/stdout consistency issue and two shared-state documentation/skill-availability observations."
+}
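The stdout/stderr issue flagged in this review comes down to routing values and errors to different streams. The following is a minimal sketch of the intended split. `config_get` is hypothetical and uses stdlib `print` for brevity. Project code would instead route through two Rich consoles, `Console()` for results and `Console(stderr=True)` for errors.

```python
from __future__ import annotations

import sys
from typing import TextIO


def config_get(
    values: dict[str, str],
    key: str,
    *,
    out: TextIO | None = None,
    err: TextIO | None = None,
) -> int:
    """Print a config value to stdout; print errors to stderr. Returns an exit code."""
    out = out if out is not None else sys.stdout
    err = err if err is not None else sys.stderr
    if key not in values:
        print(f"Error: unknown config key '{key}'", file=err)
        return 1
    # Only the bare value goes to stdout, so shell capture gets exactly the value.
    print(values[key], file=out)
    return 0
```

With this split, `value=$(mlx-stack config get <key>)` captures only the value, and error messages still reach the terminal.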

mlx_stack-0.1.0/.factory/validation/foundation/scrutiny/reviews/dependency-management.json ADDED

@@ -0,0 +1,50 @@
+{
+  "featureId": "dependency-management",
+  "reviewedAt": "2026-03-24T00:50:27Z",
+  "commitId": "e607176",
+  "transcriptSkeletonReviewed": true,
+  "diffReviewed": true,
+  "status": "fail",
+  "codeReview": {
+    "summary": "The feature introduces the expected module and broad unit coverage, but two implementation defects break core dependency detection behavior on real `uv tool list` output and prevent correct detection of installed `vllm-mlx`.",
+    "issues": [
+      {
+        "file": "src/mlx_stack/core/deps.py",
+        "line": 34,
+        "severity": "blocking",
+        "description": "Incorrect binary mapping: `vllm-mlx` is mapped to `vllm`, but the installed uv tool executable is `vllm-mlx`. This causes false `installed=False` results and can trigger unnecessary installs/post-install failures. Evidence: `which vllm-mlx` resolves to `/Users/weae1504/.local/bin/vllm-mlx`, while `which vllm` is absent."
+      },
+      {
+        "file": "src/mlx_stack/core/deps.py",
+        "line": 122,
+        "severity": "blocking",
+        "description": "Version parsing assumes plain lines like `<tool> v<version>`, but actual captured `uv tool list` output includes ANSI escape sequences and executable bullet lines. As a result, `_get_installed_version` returns `None` for installed tools and version-mismatch warnings are skipped. Evidence: captured stdout lines include `\\u001b[1mlitellm v1.82.6\\u001b[0m`, and `check_dependency('litellm')` returned `installed_version=None`."
+      },
+      {
+        "file": "tests/unit/test_deps.py",
+        "line": 71,
+        "severity": "non_blocking",
+        "description": "Tests encode incorrect assumptions about real environment behavior (`vllm` binary name and unformatted `uv tool list` output), so they do not catch the above production defects."
+      }
+    ]
+  },
+  "sharedStateObservations": [
+    {
+      "area": "skills",
+      "observation": "Feature metadata requires `cli-feature`, and the repo contains `.factory/skills/cli-feature/SKILL.md`, but the worker could not invoke that skill in runtime.",
+      "evidence": "Transcript skeleton for worker session `b001445d-222c-4684-934e-b1b39be237ad` shows `Tool: Skill {\"skill\":\"cli-feature\"}` followed by `Error: Skill \"cli-feature\" not found`; skill file exists at `/Users/weae1504/Projects/mlx-stack/.factory/skills/cli-feature/SKILL.md`."
+    },
+    {
+      "area": "conventions",
+      "observation": "Mission AGENTS conventions describe a `commands/` CLI layout, but the repository and skill documentation use `cli/`. This mismatch can mislead workers about file placement and registration patterns.",
+      "evidence": "Mission AGENTS.md lines 16-17 specify `src/mlx_stack/` with `commands/` and one command per module in `commands/`; actual repo uses `src/mlx_stack/cli/` and SKILL.md project structure also documents `cli/`."
+    },
+    {
+      "area": "knowledge",
+      "observation": "Shared state does not document `uv tool list` output quirks (ANSI formatting and executable sub-lines), which directly affects reliable parser implementation.",
+      "evidence": "Captured `subprocess.run([uv, 'tool', 'list'], capture_output=True, text=True)` output begins with `\\u001b[1mlitellm v1.82.6\\u001b[0m` then `- litellm` style lines; current parser expects plain `tool version` text."
+    }
+  ],
+  "addressesFailureFrom": null,
+  "summary": "Review failed: dependency detection has two blocking runtime issues (wrong `vllm-mlx` binary lookup and fragile `uv tool list` parsing), so required behavior is not reliable despite passing unit tests."
+}
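Both blocking issues in this review concern parsing. The sketch below shows a more tolerant `uv tool list` parser that strips ANSI SGR escape sequences and skips executable lines. It also includes a tool-to-binary map that uses `vllm-mlx` as the executable name. All names here are hypothetical. The output shape (`<tool> v<version>` headers followed by `- <executable>` lines) is taken from the evidence quoted in the review.

```python
from __future__ import annotations

import re

# Matches ANSI SGR sequences such as "\x1b[1m" (bold) and "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

# Tool name -> executable name; per the review, vllm-mlx installs `vllm-mlx`, not `vllm`.
TOOL_BINARIES = {"vllm-mlx": "vllm-mlx", "litellm": "litellm"}


def parse_uv_tool_list(output: str) -> dict[str, str]:
    """Map tool name -> version string from `uv tool list` stdout.

    Header lines look like `litellm v1.82.6` (possibly wrapped in ANSI codes);
    executable lines look like `- litellm` and are skipped.
    """
    versions: dict[str, str] = {}
    for raw in output.splitlines():
        line = ANSI_ESCAPE.sub("", raw).strip()
        if not line or line.startswith("-"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[1].startswith("v"):
            versions[parts[0]] = parts[1][1:]  # drop the leading "v"
    return versions
```

Stripping escapes defensively keeps the parser correct even when `NO_COLOR=1` is not set. Unit tests should feed it the captured real-world output, not idealized plain text.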