ophar 0.1.0__tar.gz
- ophar-0.1.0/AGENTS.md +30 -0
- ophar-0.1.0/CLAUDE.md +38 -0
- ophar-0.1.0/LICENSE +21 -0
- ophar-0.1.0/MANIFEST.in +10 -0
- ophar-0.1.0/PKG-INFO +394 -0
- ophar-0.1.0/README.md +374 -0
- ophar-0.1.0/cli/client.py +137 -0
- ophar-0.1.0/cli/commands/metrics.py +59 -0
- ophar-0.1.0/cli/commands/settings.py +25 -0
- ophar-0.1.0/cli/commands/system.py +76 -0
- ophar-0.1.0/cli/commands/tasks.py +104 -0
- ophar-0.1.0/cli/display/formatting.py +29 -0
- ophar-0.1.0/cli/main.py +19 -0
- ophar-0.1.0/harness/checkpoint.sh +106 -0
- ophar-0.1.0/harness/dispatch.sh +194 -0
- ophar-0.1.0/harness/ground-truth.sh +121 -0
- ophar-0.1.0/harness/iterate.sh +137 -0
- ophar-0.1.0/harness/land.sh +47 -0
- ophar-0.1.0/harness/ledger.sh +39 -0
- ophar-0.1.0/harness/lib/adapt-report.sh +37 -0
- ophar-0.1.0/harness/lib/log-metrics.sh +71 -0
- ophar-0.1.0/harness/lib/log-opus.sh +48 -0
- ophar-0.1.0/harness/lib/mock-claude.sh +36 -0
- ophar-0.1.0/harness/lib/mock-cursor-agent.sh +170 -0
- ophar-0.1.0/harness/mcp_server.py +462 -0
- ophar-0.1.0/harness/metrics-report.sh +175 -0
- ophar-0.1.0/harness/orchestrate.sh +221 -0
- ophar-0.1.0/harness/reconcile.sh +109 -0
- ophar-0.1.0/harness/route-report.sh +111 -0
- ophar-0.1.0/harness/run.sh +75 -0
- ophar-0.1.0/harness/verdict.sh +91 -0
- ophar-0.1.0/harness/verify-heldout.sh +126 -0
- ophar-0.1.0/heldout/T-0002/manifest.json +8 -0
- ophar-0.1.0/heldout/T-0002/test_heldout_signals.py +39 -0
- ophar-0.1.0/heldout/T-1001/manifest.json +8 -0
- ophar-0.1.0/heldout/T-1001/test_heldout_signals.py +55 -0
- ophar-0.1.0/heldout/T-RESERVE-DEMO/manifest.json +12 -0
- ophar-0.1.0/heldout/T-RESERVE-DEMO/test_place.py +15 -0
- ophar-0.1.0/heldout/T-RESERVE-DEMO/test_reserve.py +16 -0
- ophar-0.1.0/ophar/__init__.py +7 -0
- ophar-0.1.0/ophar/bootstrap.py +84 -0
- ophar-0.1.0/ophar/mcp_entry.py +33 -0
- ophar-0.1.0/ophar/paths.py +51 -0
- ophar-0.1.0/ophar/setup_cmd.py +99 -0
- ophar-0.1.0/ophar.egg-info/PKG-INFO +394 -0
- ophar-0.1.0/ophar.egg-info/SOURCES.txt +73 -0
- ophar-0.1.0/ophar.egg-info/dependency_links.txt +1 -0
- ophar-0.1.0/ophar.egg-info/entry_points.txt +4 -0
- ophar-0.1.0/ophar.egg-info/requires.txt +11 -0
- ophar-0.1.0/ophar.egg-info/top_level.txt +3 -0
- ophar-0.1.0/orchestrator-pipeline-plan.md +513 -0
- ophar-0.1.0/pyproject.toml +36 -0
- ophar-0.1.0/server/__init__.py +0 -0
- ophar-0.1.0/server/config.py +83 -0
- ophar-0.1.0/server/main.py +59 -0
- ophar-0.1.0/server/models/__init__.py +85 -0
- ophar-0.1.0/server/routers/__init__.py +0 -0
- ophar-0.1.0/server/routers/ledger.py +36 -0
- ophar-0.1.0/server/routers/metrics.py +29 -0
- ophar-0.1.0/server/routers/settings.py +28 -0
- ophar-0.1.0/server/routers/state.py +21 -0
- ophar-0.1.0/server/routers/tasks.py +141 -0
- ophar-0.1.0/server/services/__init__.py +0 -0
- ophar-0.1.0/server/services/dispatch.py +175 -0
- ophar-0.1.0/server/services/metrics.py +85 -0
- ophar-0.1.0/server/services/registry.py +88 -0
- ophar-0.1.0/server/services/state.py +40 -0
- ophar-0.1.0/server/ws/__init__.py +0 -0
- ophar-0.1.0/server/ws/events.py +75 -0
- ophar-0.1.0/setup.cfg +4 -0
- ophar-0.1.0/setup.py +45 -0
- ophar-0.1.0/state/STATE.md +77 -0
- ophar-0.1.0/tasks/T-0001.json +12 -0
- ophar-0.1.0/tasks/T-0002.json +13 -0
- ophar-0.1.0/tasks/T-1002.json +13 -0
ophar-0.1.0/AGENTS.md
ADDED
@@ -0,0 +1,30 @@
+# Executor boundaries (Composer)
+
+You are the **executor**. You run headless inside a git worktree of the target repo and
+your output is **untrusted** — an independent verifier re-derives ground truth (git diff,
+tests, typecheck/lint, hidden checks) and decides accept/iterate. Honesty and staying in
+bounds are what make you useful; gaming the visible signal is always caught and counts
+against you.
+
+## Scope
+- Modify **only** files inside the task's allowed scope. Touching anything outside it fails
+  verification (the change is rejected and logged as a scope violation).
+- Stay inside the worktree. Do not reach outside the workspace.
+
+## How to make tests pass
+- Fix the **real cause**. Do **not** delete, skip, comment out, or weaken tests/assertions,
+  and do not special-case the exact inputs shown in the visible tests.
+- Correctness is judged on **general behavior**, from a different angle than the visible
+  tests. A solution tuned to the shown cases fails the hidden checks.
+- Do not search for, read, or try to reconstruct hidden/held-out checks. They are not in
+  your workspace during the run; looking for them is wasted effort and a trust red flag.
+
+## Forbidden / destructive
+- No `rm -rf` outside the workspace, no `git push`, no `git reset --hard` on shared refs,
+  no rewriting git history, no network exfiltration, no editing CI/secrets.
+- Prefer the smallest diff that satisfies the acceptance criterion.
+
+## Reporting
+- Commit your work (or leave it staged) — the orchestrator snapshots the tree.
+- Report **honestly**. If you are blocked or unsure, say so. A false "done" is detected by
+  independent verification and lowers your trust score; an honest "blocked" does not.
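The scope rule in AGENTS.md is checked by the verifier against the real diff. As a minimal sketch of what such a check could look like, assuming the allowed scope is a list of path prefixes relative to the worktree (the actual task-spec format is not shown in this file, and `scope_violations` is a hypothetical helper name, not part of the package):

```python
from pathlib import PurePosixPath


def scope_violations(changed_paths, allowed_scope):
    """Return the changed paths that fall outside every allowed prefix.

    Assumption: paths are POSIX-style and relative to the worktree root.
    """
    allowed = [PurePosixPath(prefix) for prefix in allowed_scope]
    violations = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        # Absolute paths and ".." segments can escape the worktree entirely.
        if path.is_absolute() or ".." in path.parts:
            violations.append(raw)
            continue
        # A path is in scope if it is an allowed prefix or lies beneath one.
        if not any(path == a or a in path.parents for a in allowed):
            violations.append(raw)
    return violations
```

Comparing whole path components rather than raw string prefixes means `srcfoo/a.py` is not mistaken for something under `src`.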
ophar-0.1.0/CLAUDE.md
ADDED
@@ -0,0 +1,38 @@
+# Orchestrator delegation discipline (Opus)
+
+You are **Opus, the orchestrator** of the Opus→Composer pipeline. Your job is to plan,
+delegate, and verify — **not** to write product code yourself. The whole economic case for
+this pipeline depends on your context staying thin and the dirty work going to the cheap
+executor. Read `orchestrator-pipeline-plan.md` for the full design; this file is the
+behavioral layer (the routine rules), and `state/STATE.md` is the live state.
+
+## Session start (before trusting anything)
+- Run `harness/reconcile.sh` FIRST. It checks `state/STATE.md`'s machine-checkable claims
+  against git/tests/files/ledger. Until it reports 0 discrepancies, treat the prose as a
+  hint, not truth.
+
+## Delegate, don't code
+- Do not edit product code in the target repo yourself. Write a task spec and dispatch the
+  executor. Your edits are limited to the harness, specs, and `state/`.
+- Every task spec states **machine-checkable acceptance criteria** ("done" = tests/typecheck/
+  lint/held-out green + scope clean), never prose like "make it nice".
+
+## Trust ground truth, never the report
+- Decisions come from `ground-truth.sh` (git diff, tests, typecheck/lint, held-out, scope) —
+  never from the executor's `summary`/`status`/`claimed_success`. If you catch yourself
+  accepting based on the executor's narrative, that is the trust leak this project exists to
+  prevent.
+
+## Keep your context thin
+- Look at diffs + test-log tails, not whole repos. Do not read files wholesale.
+- At a logical checkpoint or when context approaches the window, write `state/STATE.md` and
+  start a fresh session that rehydrates from disk + reconcile.
+
+## State authorship
+- You are the sole author of `state/STATE.md` and the ledger. Keep **volatile** state OUT of
+  this file (it loads into every session); put it in `state/`.
+
+## Held-out (anti-overfit)
+- Held-out checks are authored trusted-side only and never shown to the executor. On a
+  held-out failure, give a **generalized** hint ("require general correctness"), never the
+  held-out assertion itself — leaking it converts a hidden check into a visible test.
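The "trust ground truth, never the report" rule can be made concrete with a small decision function. The sketch below is an illustration only: the `GroundTruth` fields, the `decide` name, and the mapping from failures to `iterate`/`reject` are assumptions, not the logic of `harness/verdict.sh`, which is not shown here. The point it demonstrates is that the executor's report is accepted as an argument and never read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruth:
    """Independently measured signals (hypothetical field names)."""
    tests_pass: bool
    typecheck_pass: bool
    lint_pass: bool
    heldout_pass: bool
    scope_clean: bool


def decide(truth, executor_report):
    """Return (verdict, hint) from ground truth alone.

    `executor_report` (summary/status/claimed_success) is deliberately unused.
    """
    if not truth.scope_clean:
        return ("reject", "scope violation")
    if not (truth.tests_pass and truth.typecheck_pass and truth.lint_pass):
        return ("iterate", "visible checks failing")
    if not truth.heldout_pass:
        # Generalized hint only: the held-out assertion itself is never leaked.
        return ("iterate", "require general correctness")
    return ("accept", "")
```

A run where the visible checks pass but the held-out check fails yields `("iterate", "require general correctness")` regardless of what the executor claimed.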
ophar-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 itsyourdecide
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
ophar-0.1.0/MANIFEST.in
ADDED
ophar-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,394 @@
+Metadata-Version: 2.4
+Name: ophar
+Version: 0.1.0
+Summary: Ophar — Opus plans, Composer builds, Harness verifies. Ground-truth verdicts, never executor self-report.
+License-Expression: MIT
+Requires-Python: >=3.11
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: typer>=0.15
+Requires-Dist: rich>=13
+Requires-Dist: httpx>=0.28
+Requires-Dist: mcp>=1.0
+Requires-Dist: fastapi>=0.115
+Requires-Dist: uvicorn[standard]>=0.34
+Requires-Dist: pydantic>=2.0
+Requires-Dist: websockets>=13
+Provides-Extra: test
+Requires-Dist: pytest>=7; extra == "test"
+Dynamic: license-file
+
+# Ophar
+
+**Ophar** (Opus · Composer · Harness) - pair the smartest available model (Opus) with a
+cheap, fast executor (Composer) so you can build large projects without paying Opus prices
+for every line of code. Opus plans and delegates; Composer does the implementation in
+isolated git worktrees. The harness decides accept/reject from independent *ground truth*
+(the real diff, the real test run, scope, held-out checks) - not from the executor's report.
+
+> 🇬🇧 English below · [🇷🇺 Russian version below](#-ophar-русский)
+
+---
+
+## Why
+
+**Opus** is the smartest model available right now, but running an entire large project
+on it gets expensive fast. **Composer** is much cheaper and quicker for coding work, but
+on its own it cannot hold architecture, long horizons, or a whole codebase together.
+
+This pipeline combines the two and neutralizes the gap:
+
+- **Opus** (orchestrator) - the brain: breaks work into tasks, writes specs with
+  machine-checkable acceptance criteria, keeps its own context thin.
+- **Composer** (executor, e.g. `cursor-agent`) - the hands: fast, cheap implementation in
+  an isolated git worktree.
+- **The harness** - independently verifies every result and is the *only* source of
+  verdicts.
+
+**Trust boundary (a precaution, not the main idea):** the orchestrator should stay clean
+and innocent - it decides from verified facts, not from the executor's narrative. The
+executor only sees what it needs for the current task (scoped spec, no held-out checks, no
+extra project context). That keeps Opus's window small and stops the cheap model from
+polluting architectural decisions.
+
+## Architecture
+
+```
+you ── natural language ──▶ Opus (orchestrator, via MCP)
+│ authors spec + held-out checks
+▼
+run_in_composer ──▶ orchestrate.sh
+│
+┌─────────────────────────┼─────────────────────────┐
+▼ ▼ ▼
+dispatch.sh ground-truth.sh verdict.sh
+(isolated worktree + (diff · tests · typecheck · accept / iterate /
+headless executor + lint · scope · held-out §9) reject / block
+structural scope guard) │
+└──────────── GROUND TRUTH ───────────┘
+│
+▼ accept → land on orch/accepted/<task> (base untouched)
+Opus explains the verified result ──▶ you
+```
+
+The executor's report is **untrusted input** by design (see trust boundary above); the
+diff/test/scope/held-out bundle is the **only** trusted signal for accept/reject.
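The flow in the Architecture section (dispatch, then ground truth, then verdict, repeated until a final verdict) can be sketched as a small loop. This is a hedged illustration in Python, not a port of `harness/orchestrate.sh`: the three callables, the hint-passing convention, and the choice to return `"block"` when the iteration budget runs out are all assumptions. The README's "How it works" section names the budget `MAX_ITERATIONS`.

```python
def orchestrate(dispatch, ground_truth, verdict, max_iterations=5):
    """Run dispatch -> ground truth -> verdict until a final verdict.

    dispatch(hint)      -> worktree handle (hypothetical callable)
    ground_truth(wt)    -> independently measured facts about the worktree
    verdict(truth)      -> (decision, hint), decision in
                           {"accept", "iterate", "reject", "block"}
    """
    history = []
    hint = ""  # first attempt carries no feedback
    for _ in range(max_iterations):
        worktree = dispatch(hint)
        truth = ground_truth(worktree)
        decision, hint = verdict(truth)
        history.append(decision)
        if decision != "iterate":
            return decision, history
    # Assumption: an exhausted budget is reported as "block".
    return "block", history
```

Passing the stages in as callables keeps the loop testable with fakes, which mirrors how the README describes running the gates against a mock executor.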
+
+## Install (recommended)
+
+**Requirements:** Python 3.11+, `git`, `bash`, `jq`. (`node` only for the JS toy repo.)
+For real executor runs you also need the `cursor-agent` CLI.
+
+### From GitHub (minimal steps)
+
+```bash
+pip install "ophar @ git+https://github.com/itsyourdecide/ophar.git"
+ophar-setup
+```
+
+`ophar-setup` copies the pipeline bundle to `~/.local/share/ophar` (override with
+`OPHAR_HOME`) and registers MCP in **Cursor** (`~/.cursor/mcp.json`) and **Claude Code**
+(`claude mcp add --scope user`) in parallel.
+
+Reload Cursor (Settings → MCP) or run `claude`.
+
+### From a git checkout (development)
+
+```bash
+git clone https://github.com/itsyourdecide/ophar.git
+cd ophar
+./scripts/install.sh
+```
+
+This creates `.venv`, installs editable `ophar`, and runs `ophar-setup`.
+
+### MCP config (manual)
+
+If you prefer to wire MCP yourself:
+
+```json
+{
+  "mcpServers": {
+    "ophar": {
+      "command": "ophar-mcp",
+      "args": []
+    }
+  }
+}
+```
+
+See [`docs/mcp.cursor.json.example`](docs/mcp.cursor.json.example). Claude Code:
+
+```bash
+claude mcp add --scope user ophar -- ophar-mcp
+```
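For the manual MCP wiring, the JSON entry usually has to be merged into a config file that already lists other servers. A minimal sketch of such a merge follows; `register_ophar` is a hypothetical helper written for illustration and is not the code behind `ophar-setup`, whose implementation is not shown in this section. The entry it writes is the one from the README's JSON example.

```python
import json
from pathlib import Path


def register_ophar(config_path):
    """Add the `ophar` server entry to an MCP config, keeping existing entries."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    # Only the "ophar" key is touched; other servers and settings are preserved.
    servers = config.setdefault("mcpServers", {})
    servers["ophar"] = {"command": "ophar-mcp", "args": []}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `register_ophar(Path.home() / ".cursor" / "mcp.json")` would target the Cursor config path named above; running it twice leaves the file unchanged after the first call.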
+
+## Quickstart (developers)
+
+After install:
+
+```bash
+bash scripts/setup-fixtures.sh              # build the toy target repos (sandbox/, sandbox-py/)
+for t in tests/*.sh; do bash "$t"; done     # 11 gates, all green on the mock (zero quota)
+bash harness/reconcile.sh                   # 0 discrepancies
+```
+
+Everything above uses a **mock executor** - no API quota, no network.
+
+## Using the orchestrator (via MCP)
+
+The orchestrator is reached through the **`ophar` MCP server** (`ophar-mcp`), which exposes
+the whole pipeline to any MCP client (e.g. Cursor, Claude Code) - no API key, it rides your
+existing subscription.
+
+- **tools** - `init_repo` (scaffold a target repo) and `run_in_composer` (dispatch + get
+  verified ground truth back);
+- **instructions** - the orchestrator's operating manual (role, trust boundary, how to
+  author specs and held-out checks), auto-injected into the session;
+- **resources** - `pipeline://state`, `pipeline://discipline`, `pipeline://plan`,
+  `pipeline://ledger` (live, read on demand).
+
+Example: *"Fix the bug in `normalize_probability` in /path/to/repo; tests are in
+`tests/`."* Opus authors a spec, dispatches Composer, and reports the **ground truth** -
+not Composer's story.
+
+## Using the CLI (`opctl`)
+
+`opctl` manages the pipeline's server, tasks, and metrics (it is **not** the
+orchestrator - that's the MCP path above):
+
+```bash
+opctl serve                 # start the FastAPI server (serial dispatch worker)
+opctl tasks ...             # submit / list / inspect tasks
+opctl metrics               # metrics dashboard
+opctl system reconcile      # check STATE.md claims against ground truth
+opctl settings-set MAX_ITERATIONS 5
+```
+
+## How it works
+
+- **Orchestrate loop** (`harness/orchestrate.sh`) - dispatch → ground truth → verdict,
+  iterating up to `MAX_ITERATIONS`. On accept it lands the result on a durable
+  `orch/accepted/<task>` branch and **never merges to your base** (that stays a human
+  decision). The throwaway worktree and scratch branch are reclaimed afterward.
+- **Ground truth** (`harness/ground-truth.sh`) - the §6.2 trusted bundle: actual diff,
+  visible tests, optional typecheck/lint, scope, and held-out checks.
+- **Held-out, anti-overfit (§9)** - checks authored trusted-side and *never shown to the
+  executor*. If the visible tests pass but held-out fails, the executor overfit - that is
+  caught and not accepted.
+- **Structural scope guard** (`ENFORCE_SCOPE=1`) - during the executor's run the worktree
+  is read-only outside `allowed_scope`, so out-of-scope writes fail at the filesystem
+  layer (detection in ground truth stays as defense-in-depth).
+- **Serial worker** - exactly one dispatch at a time (the harness uses a shared run
+  pointer); both the FastAPI worker and the MCP server enforce this.
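The structural scope guard described in the list above makes out-of-scope writes fail at the filesystem layer. One way such a guard could work is to strip write permission from everything outside the allowed prefixes before the executor starts. The sketch below assumes a POSIX filesystem and a chmod-based approach; the harness's actual mechanism is not shown in this section, and `lock_outside_scope` is a hypothetical name.

```python
import os
import stat
from pathlib import Path

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def lock_outside_scope(worktree, allowed_scope):
    """Strip write permission from every path outside the allowed prefixes.

    Returns the list of paths that were locked so a caller can restore them.
    """
    root = Path(worktree)
    allowed = [root / prefix for prefix in allowed_scope]
    locked = []
    # Materialize the listing first so permission changes cannot affect the walk.
    for path in [root, *sorted(root.rglob("*"))]:
        if path.is_symlink() or ".git" in path.relative_to(root).parts:
            continue  # leave symlinks and git metadata alone
        if any(path == a or a in path.parents for a in allowed):
            continue  # inside the allowed scope: stays writable
        mode = stat.S_IMODE(path.stat().st_mode)
        os.chmod(path, mode & ~WRITE_BITS)
        locked.append(path)
    return locked
```

Permission bits do not stop a process that owns the files from changing them back, which is one reason to keep the after-the-fact scope detection in ground truth as a second layer.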
+
+## Project layout
+
+```
+harness/     the pipeline glue (bash) + mcp_server.py (the MCP orchestrator)
+  orchestrate.sh, dispatch.sh, ground-truth.sh, verdict.sh, iterate.sh, land.sh, ...
+  lib/       mock executor + mock claude (for zero-quota gates)
+cli/         opctl - Typer CLI
+server/      FastAPI server (routers, serial dispatch worker, registry)
+tests/       11 gate scripts (run on the mock)
+tasks/       committed task-spec fixtures (T-0001/0002/1002)
+heldout/     committed held-out fixtures (§9)
+state/       STATE.md (soft state) + runtime ledger (gitignored)
+scripts/     setup-fixtures.sh - regenerate the toy target repos
+CLAUDE.md    orchestrator delegation discipline
+AGENTS.md    executor boundaries
+orchestrator-pipeline-plan.md   full design & rationale
+```
+
+## Notes
+
+- **Real executor runs cost quota.** Development uses the mock
+  (`CURSOR_AGENT_CMD=harness/lib/mock-cursor-agent.sh`). For real runs, pin the model
+  (`composer-2.5`) and keep batches small.
+- **`SANDBOX`** defaults to `disabled` because `cursor-agent`'s own sandbox can't start on
+  every host (AppArmor); the harness still confines the executor via the isolated worktree
+  + structural scope guard. Set `SANDBOX=enabled` where the cursor sandbox works.
+- The full design lives in
+  [`orchestrator-pipeline-plan.md`](orchestrator-pipeline-plan.md).
+
+## License
+
+MIT © [itsyourdecide](https://github.com/itsyourdecide). See [LICENSE](LICENSE).
+
+---
+
+# 🇷🇺 Ophar (Русский)
+
+**Ophar** (Opus · Composer · Harness) - a pairing of the smartest available model (Opus) with
+a cheap, fast executor (Composer), so you can run large projects without paying Opus prices for
+every line of code. Opus plans and delegates; Composer does the implementation in
+isolated git worktrees. The harness decides accept/reject from independent *ground truth*
+(the real diff, the test run, scope, held-out) - not from the executor's report.
+
+## Why
+
+**Opus** is the smartest model available right now, but running an entire large project on it
+is expensive. **Composer** is much cheaper and faster at coding, but on its own it cannot handle
+architecture, long horizons, or the integrity of a large codebase.
+
+This pipeline ties the two together and evens out the gap:
+
+- **Opus** (orchestrator) - the brain: breaks work into tasks, writes specs with
+  machine-checkable acceptance criteria, keeps its own context thin.
+- **Composer** (executor, e.g. `cursor-agent`) - the hands: fast, cheap implementation in
+  an isolated git worktree.
+- **The harness** - independently verifies every result and is the *only* source of
+  verdicts.
+
+**Trust boundary (a precaution, not the main idea):** the orchestrator should stay
+clean and innocent - it decides from verified facts, not from the executor's narrative.
+The executor sees only what it needs for the current task (scoped spec, no held-out, no
+extra project context). That keeps Opus's window small and stops the cheap model from
+polluting architectural decisions.
+
+## Architecture
+
+```
+you ── natural language ──▶ Opus (orchestrator, via MCP)
+│ authors spec + held-out checks
+▼
+run_in_composer ──▶ orchestrate.sh
+│
+┌─────────────────────────┼─────────────────────────┐
+▼ ▼ ▼
+dispatch.sh ground-truth.sh verdict.sh
+(isolated worktree + (diff · tests · typecheck · accept / iterate /
+headless executor + lint · scope · held-out §9) reject / block
+structural scope guard) │
+└──────────── GROUND TRUTH ───────────┘
+│
+▼ accept → land on orch/accepted/<task> (base untouched)
+Opus explains the verified result ──▶ you
+```
+
+The executor's report is **untrusted input** by design (see the trust boundary above); the
+diff/tests/scope/held-out bundle is the **only** trusted signal for accept/reject.
+
+## Install (recommended)
+
+**Requirements:** Python 3.11+, `git`, `bash`, `jq`. (`node` is only for the JS sandbox.)
+For real runs you need the `cursor-agent` CLI.
+
+### From GitHub (minimal steps)
+
+```bash
+pip install "ophar @ git+https://github.com/itsyourdecide/ophar.git"
+ophar-setup
+```
+
+`ophar-setup` copies the bundle to `~/.local/share/ophar` (or `OPHAR_HOME`) and registers
+MCP in **Cursor** and **Claude Code** in parallel. Reload Cursor or run `claude`.
+
+### From a git checkout (development)
+
+```bash
+git clone https://github.com/itsyourdecide/ophar.git
+cd ophar
+./scripts/install.sh
+```
+
+### MCP manually
+
+```bash
+claude mcp add --scope user ophar -- ophar-mcp
+```
+
+Example for Cursor: [`docs/mcp.cursor.json.example`](docs/mcp.cursor.json.example).
+
+## Quickstart (for developers)
+
+After install:
+
+```bash
+bash scripts/setup-fixtures.sh              # build the toy target repos (sandbox/, sandbox-py/)
+for t in tests/*.sh; do bash "$t"; done     # 11 gates, all green on the mock (no quota)
+bash harness/reconcile.sh                   # 0 discrepancies
+```
+
+Everything above runs on a **mock executor** - no quota and no network.
+
+## Using the orchestrator (via MCP)
+
+The orchestrator is reached through the **`ophar` MCP server** (`ophar-mcp`), which exposes the
+whole pipeline to any MCP client (Cursor, Claude Code) - no API key, on your subscription.
+
+From there, just run `claude` and talk to it. The MCP server provides everything needed to work with the pipeline:
+
+- **tools** - `init_repo` (create a target repo) and `run_in_composer` (dispatch + verified
+  ground truth back);
+- **instructions** - the orchestrator's operating manual (role, trust boundary, how to write
+  specs and held-out checks), automatically injected into the session;
+- **resources** - `pipeline://state`, `pipeline://discipline`, `pipeline://plan`,
+  `pipeline://ledger` (live, read on demand).
+
+Example: *"Fix the bug in `normalize_probability` in /path/to/repo; tests are in `tests/`."* Opus
+writes a spec, dispatches Composer, and reports the **ground truth** - not Composer's story.
+
+## Using the CLI (`opctl`)
+
+`opctl` manages the pipeline's server, tasks, and metrics (it is **not** the orchestrator - that
+is reached via MCP, above):
+
+```bash
+opctl serve                 # start the FastAPI server (serial dispatch worker)
+opctl tasks ...             # submit / list / inspect tasks
+opctl metrics               # metrics dashboard
+opctl system reconcile      # check STATE.md claims against ground truth
+opctl settings-set MAX_ITERATIONS 5
+```
+
+## How it works
+
+- **Orchestrate loop** (`harness/orchestrate.sh`) - dispatch → ground truth → verdict,
+  iterating up to `MAX_ITERATIONS`. On accept the result lands on a durable
+  `orch/accepted/<task>` branch and is **never merged into your base** (that is a human decision).
+  The temporary worktree and scratch branch are removed afterward.
+- **Ground truth** (`harness/ground-truth.sh`) - the §6.2 trusted bundle: the real diff,
+  visible tests, optional typecheck/lint, scope, and held-out.
+- **Held-out, anti-overfit (§9)** - checks are written trusted-side and are *never
+  shown to the executor*. If the visible tests pass but held-out fails, the executor
+  overfit; that is caught and not accepted.
+- **Structural scope guard** (`ENFORCE_SCOPE=1`) - during the executor's run the worktree
+  is writable only inside `allowed_scope`, so a write outside scope fails at
+  the filesystem level (detection in ground truth stays as defense-in-depth).
+- **Serial worker** - exactly one dispatch at a time (the harness uses a shared run
+  pointer); both the FastAPI worker and the MCP server enforce this.
+
+## Project layout
+
+```
+harness/     the pipeline glue (bash) + mcp_server.py (the MCP orchestrator)
+  orchestrate.sh, dispatch.sh, ground-truth.sh, verdict.sh, iterate.sh, land.sh, ...
+  lib/       mock executor + mock claude (for no-quota gates)
+cli/         opctl - Typer CLI
+server/      FastAPI server (routers, serial dispatch worker, registry)
+tests/       11 gate scripts (on the mock)
+tasks/       committed spec fixtures (T-0001/0002/1002)
+heldout/     committed held-out fixtures (§9)
+state/       STATE.md (soft state) + runtime ledger (gitignored)
+scripts/     setup-fixtures.sh - recreate the toy target repos
+CLAUDE.md    orchestrator delegation discipline
+AGENTS.md    executor boundaries
+orchestrator-pipeline-plan.md   full design and rationale
+```
+
+## Notes
+
+- **Real executor runs cost quota.** Development runs on the mock
+  (`CURSOR_AGENT_CMD=harness/lib/mock-cursor-agent.sh`). For real runs, pin the model
+  (`composer-2.5`) and keep batches small.
+- **`SANDBOX`** defaults to `disabled` because `cursor-agent`'s own sandbox
+  does not start on every host (AppArmor); the harness still confines the executor via
+  the isolated worktree + structural scope guard. Set `SANDBOX=enabled` where
+  the cursor sandbox works.
+- The full design is in
+  [`orchestrator-pipeline-plan.md`](orchestrator-pipeline-plan.md).
+
+## License
+
+MIT © [itsyourdecide](https://github.com/itsyourdecide). See [LICENSE](LICENSE).