challenge-plans 0.1.0__tar.gz

@@ -0,0 +1,18 @@
+ name: ci
+ on:
+   push:
+     branches: [main]
+   pull_request:
+ jobs:
+   test:
+     runs-on: ubuntu-latest
+     strategy:
+       matrix:
+         python-version: ["3.10", "3.12"]
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v5
+         with:
+           python-version: ${{ matrix.python-version }}
+       - run: pip install -e ".[dev]"
+       - run: pytest -q
@@ -0,0 +1,33 @@
+ name: release
+
+ # Publish to PyPI via Trusted Publishing (OIDC); no long-lived token is stored.
+ # Trigger: push a version tag, e.g. `git tag v0.1.0 && git push origin v0.1.0`.
+ on:
+   push:
+     tags: ["v*"]
+
+ jobs:
+   publish:
+     name: Build & publish to PyPI
+     runs-on: ubuntu-latest
+     environment:
+       name: pypi
+       url: https://pypi.org/p/challenge-plans
+     permissions:
+       id-token: write # required for PyPI Trusted Publishing (OIDC)
+     steps:
+       - uses: actions/checkout@v4
+       - uses: actions/setup-python@v5
+         with:
+           python-version: "3.12"
+       - name: Run tests (never publish a red build)
+         run: |
+           python -m pip install --upgrade pip
+           python -m pip install -e ".[dev]"
+           pytest -q
+       - name: Build sdist + wheel
+         run: |
+           python -m pip install --upgrade build
+           python -m build
+       - name: Publish to PyPI
+         uses: pypa/gh-action-pypi-publish@release/v1
@@ -0,0 +1,9 @@
+ __pycache__/
+ *.py[cod]
+ *.egg-info/
+ .pytest_cache/
+ .venv/
+ venv/
+ dist/
+ build/
+ .DS_Store
@@ -0,0 +1,24 @@
1
+ # Contributing
2
+
3
+ Thanks for your interest in challenge-plans.
4
+
5
+ ## Dev setup
6
+
7
+ ```bash
8
+ pip install -e ".[dev]" # installs the package + pytest
9
+ pytest # pythonpath/testpaths are preconfigured
10
+ challenge-plans doctor # check your backend CLIs are logged in
11
+ ```
12
+
13
+ Requires Python ≥ 3.10. The only runtime dependency is PyYAML.
14
+
15
+ ## Ground rules
16
+
17
+ - **Tests pin behavior.** Every non-trivial change should keep `pytest` green; add a test for new behavior. The suite encodes the invariants established across the project's adversarial-review rounds — don't loosen them without a reason.
18
+ - **Dogfood it.** This is an adversarial-review tool; we review our own plans and diffs with it. Running `challenge-plans run <your-change>.diff --type diff` before opening a PR is encouraged.
19
+ - **No silent simplification of guarantees.** The integrity, verification, and false-consensus guards (see README) are load-bearing; if you change one, say so explicitly in the PR.
20
+ - **Keep it backend-neutral.** The tool must work with at least one subscription CLI and degrade gracefully (advisory) when only one model family is available.
21
+
22
+ ## Reporting issues
23
+
24
+ Open a GitHub issue with the command you ran, the JSON/Markdown output it printed (redact anything sensitive), and what you expected.
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or Derivative
95
+ Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and do
117
+ not modify the License. You may add Your own attribution notices
118
+ within Derivative Works that You distribute, alongside or as an
119
+ addendum to the NOTICE text of the Work, provided that such
120
+ additional attribution notices cannot be construed as modifying
121
+ the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright 2026 challenge-plans contributors
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
@@ -0,0 +1,178 @@
1
+ Metadata-Version: 2.4
2
+ Name: challenge-plans
3
+ Version: 0.1.0
4
+ Summary: Multi-agent adversarial review & deliberation for plans/specs on subscription CLIs (reduce rework before execution)
5
+ Project-URL: Homepage, https://github.com/hiadrianchen/challenge-plans
6
+ Project-URL: Repository, https://github.com/hiadrianchen/challenge-plans
7
+ License-Expression: Apache-2.0
8
+ License-File: LICENSE
9
+ Classifier: Development Status :: 4 - Beta
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: Programming Language :: Python :: 3
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Topic :: Software Development :: Quality Assurance
14
+ Requires-Python: >=3.10
15
+ Requires-Dist: pyyaml>=6
16
+ Provides-Extra: dev
17
+ Requires-Dist: pytest>=8; extra == 'dev'
18
+ Description-Content-Type: text/markdown
19
+
20
+ # challenge-plans
21
+
22
+ ![Python](https://img.shields.io/badge/python-%E2%89%A53.10-blue.svg)
23
+ [![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)
24
+ [![CI](https://img.shields.io/github/actions/workflow/status/hiadrianchen/challenge-plans/ci.yml?branch=main)](https://github.com/hiadrianchen/challenge-plans/actions)
25
+
26
+ > Chinese documentation: [README-zh.md](README-zh.md)
27
+
28
+ **Adversarially review your plan or spec before you execute it — across the coding CLIs you already have logged in. No API keys.**
29
+
30
+ `challenge-plans` orchestrates the subscription AI coding CLIs already on your machine (Claude Code, Codex, …) to cross-examine a plan/spec and surface the flaws that cause rework downstream — and to vote across options when you're unsure. It also reviews a raw `git diff` as a lightweight **code review** pass, and drops in as an **agent skill**. It runs on your existing subscriptions, so there are no per-token API charges. It slots into the superpowers plan lifecycle: `writing-plans → challenge-plans → executing-plans`.
31
+
32
+ ```text
33
+ $ challenge-plans run plan.md --type spec --profile standard --sink markdown
34
+
35
+ # challenge-plans · challenge · verdict: request_changes
36
+ - panel: expected 3 / collected 3 · complete ✓
37
+ - diversity: 2 families
38
+ - verified: 3 high/critical reviewed by Verifier (✓ verified, may hard-gate; ? unverified, advisory)
39
+ - surviving objections: 4
40
+
41
+ - [high✓] sensitive data sent to a third-party LLM with no privacy boundary @L42-43 (security_or_privacy_boundary, by claude:scope-boundary)
42
+ - [high✓] "schema-aligned" claimed but there's no contract test @L12-30 (integration_contract_gap, by gpt:correctness)
43
+ - [high✓] no measurable acceptance threshold @L1 (contract_violation, by preflight)
44
+ - [medium?] missing_fields vs null semantics left undefined @L32-34 (ambiguity_to_wrong_implementation)
45
+ ```
46
+
47
+ ## Why challenge-plans
48
+
49
+ - 🔑 **No API keys, no per-token charges** — it drives the subscription CLIs you're already logged into (Claude Code, Codex). Bring at least one.
50
+ - 🧪 **Evidence beats headcount** — a minority objection with a reproduction can override a majority vote; correctness is not decided by voting.
51
+ - 🤝 **Cross-family verification** — an objection only earns hard-gate authority (`✓`) when an *independent model family* reproduces it with concrete, line-anchored evidence. Single-model claims stay advisory.
52
+ - 🛡️ **Guards against 7 known multi-agent failure modes**: vote loss, option anchoring, premature hand-off, majority-over-minority, single-round complacency, false consensus, false convergence. Each was hit (and fixed) while building this tool with its own adversarial process.
53
+ - 🌍 **Reads in your language** — the codebase is English, but `--lang zh` (or `ja`, `de`, `fr`, …) makes every reviewer write its findings in your language while JSON keys and line anchors stay machine-stable. One flag, no separate build — see [Output in your language](#output-in-your-language).
54
+
55
+ ## Quickstart
56
+
57
+ Requires Python ≥ 3.10 (PyYAML installs automatically). Bring at least one logged-in coding CLI — **Claude Code** (`claude`) or **OpenAI Codex** (`codex`); two different vendors unlock cross-family verification.
58
+
59
+ ```bash
60
+ git clone https://github.com/hiadrianchen/challenge-plans && cd challenge-plans
61
+ pip install -e . # exposes the `challenge-plans` command
62
+ challenge-plans doctor # which backend CLIs are logged in
63
+ challenge-plans run examples/spec-sample.md --type spec --sink markdown # see a verdict on the bundled sample
64
+ ```
65
+
66
+ Alternatively, hand the repo to your coding agent with a prompt such as *"Install and set up challenge-plans from this repo, then run `challenge-plans doctor`"*, and it will perform the steps above. To use it **as an agent skill**, drop [SKILL.md](SKILL.md) where your agent discovers skills.
67
+
68
+ ## Use
69
+
70
+ ```bash
71
+ challenge-plans doctor # which backends are ready
72
+ challenge-plans run path/to/spec.md --type spec --profile standard --sink markdown # harden a plan/spec
73
+ challenge-plans run change.diff --type diff --sink markdown # review a git diff
74
+ challenge-plans weigh path/to/options.yaml --profile standard --sink markdown # vote across options
75
+ challenge-plans run path/to/spec.md --enforce # CI gate: non-approve exits non-zero
76
+ challenge-plans run path/to/spec.md --type spec --sink markdown --lang zh # findings written in Chinese
77
+ # not pip-installed? prefix with: PYTHONPATH=src python3 -m challenge_plans.cli ...
78
+ ```
79
+
80
+ Ready-to-run samples live in [`examples/`](examples/) (`spec-sample.md`, `options.yaml`). The sample `options.yaml` looks like this:
81
+ ```yaml
82
+ question: Refactor auth with approach A or B?
83
+ options:
84
+ - id: A
85
+ text: One-shot rewrite — concentrated risk, clean result
86
+ - id: B
87
+ text: Incremental migration — slower, every step reversible
88
+ ```
89
+
90
+ - `--profile fast|standard|deep`, `--sink stdout|markdown`, `--enforce` (with `--enforce`, non-approve verdicts exit non-zero; without it, runs are advisory and exit 0).
91
+ - `--lang <code>` writes the human-readable output in your language (default `en`) — see [below](#output-in-your-language).
92
+ - `[sev✓]` = cross-family verified, may hard-gate; `[sev?]` = unverified, advisory only.
93
+ - **Artifact types:** `--type spec` and `--type diff` are supported; `plan` / `decision` are reserved (rubric pending).
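
One way to use `--enforce` as a gate from a script is sketched below in Python. Only the `run` subcommand and the `--type spec` and `--enforce` flags come from the usage above; the helper name `plan_gate_passes` and its injectable `runner` argument are hypothetical, added so the wrapper can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable, List


def plan_gate_passes(
    spec_path: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when `challenge-plans run ... --enforce` exits with status 0.

    Hypothetical helper, not shipped with challenge-plans. `runner` defaults to
    subprocess.run and can be swapped for a fake in tests.
    """
    cmd: List[str] = ["challenge-plans", "run", spec_path, "--type", "spec", "--enforce"]
    # With --enforce, a non-approve verdict makes the process exit non-zero.
    result = runner(cmd, check=False)
    return result.returncode == 0
```

A CI step could then end with `sys.exit(0 if plan_gate_passes("path/to/spec.md") else 1)`, which mirrors what the CLI's own exit status already provides.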
94
+
95
+ The bundled [SKILL.md](SKILL.md) routes **review/QA** of a plan/spec to `run` automatically; option-voting is the `weigh` subcommand.
96
+
97
+ ### Output in your language
98
+
99
+ challenge-plans ships an English codebase, but the reviewers can answer in **any** language — just add `--lang`:
100
+
101
+ ```bash
102
+ challenge-plans run plan.md --type spec --lang zh # objections, evidence, reproductions in Chinese
103
+ challenge-plans weigh options.yaml --lang ja # deliberation reasons in Japanese
104
+ ```
105
+
106
+ `--lang` only switches the **human-readable prose** (steelman, titles, evidence, reproductions, vote reasons). JSON keys, enum values, and `L12-15` line anchors stay verbatim, so parsing, dedup, and CI gates are unaffected. It's equivalent to exporting `CHALLENGE_PLANS_LANG` once. There's no separate translated build to maintain — the same English source localizes on demand.
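
To illustrate why stable markers and line anchors matter, here is a minimal Python sketch that parses one finding line of the Markdown output. It assumes the layout shown in the sample run near the top of this README (`- [severity✓|?] title @Lstart-end (category, by source)`); the regular expression and the `Finding` type are illustrative guesses based on that sample alone, not a documented format. Because the title is matched as free text, it would not matter which `--lang` the title was written in.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the sample output only (an assumption, not a spec).
FINDING = re.compile(
    r"^- \[(?P<severity>[a-z]+)(?P<mark>[✓?])\] "
    r"(?P<title>.*) @L(?P<start>\d+)(?:-(?P<end>\d+))? "
    r"\((?P<category>[a-z_]+)(?:, by (?P<source>[^)]+))?\)$"
)


class Finding(NamedTuple):
    severity: str
    verified: bool          # True for the cross-family verified marker
    title: str              # human-readable, may be localized
    start_line: int
    end_line: int
    category: str
    source: Optional[str]   # e.g. "claude:scope-boundary", if present


def parse_finding(line: str) -> Optional[Finding]:
    """Parse one finding line; return None for lines that are not findings."""
    m = FINDING.match(line.strip())
    if m is None:
        return None
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start  # "@L1" means a single line
    return Finding(
        m["severity"], m["mark"] == "✓", m["title"],
        start, end, m["category"], m["source"],
    )
```

For anything beyond a quick script, the JSON output mentioned above is likely the more robust thing to parse.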
107
+
108
+ **As an agent skill:** your agent just passes `--lang <your-language>` and the whole cross-review comes back localized. The bundled [SKILL.md](SKILL.md) documents the flag so the calling agent can set it from the user's language automatically.
109
+
110
+ ## Two modes
111
+
112
+ challenge-plans isn't one feature — it's two modes on one engine. The **calling agent routes by intent**; the user never has to pick:
113
+
114
+ | | **challenge** (adversarial) | **weigh-options** (deliberation) |
115
+ |---|---|---|
116
+ | When | You have a **drafted** plan/spec to poke holes in / harden | You have **several options / a pile of to-dos** and aren't sure which |
117
+ | Routing signal | a single drafted artifact + "review / find flaws / can this execute" | multiple candidates + "which one / rank these / is it worth it" |
118
+ | Aggregation | **Evidence survival** — a minority can be right, **no majority vote** | **Weighted majority + exposed dissent** — only genuine trade-offs get voted on |
119
+ | Output | 6-state verdict + surviving objections + reproductions / counter-evidence | ranked options + vote tally + strongest dissent |
120
+
121
+ **The agent picks the mode — it isn't dumped on the user:** it reads the intent and routes "review a drafted artifact" to adversarial mode and "choose among options" to deliberation, with deterministic routing signals defining the boundary. During deliberation, if an option is flagged with a **mechanically verifiable blocker**, the recommendation is **downgraded to `discuss` and you're asked to verify it in challenge mode** rather than adopting it outright — so a vote can never outweigh a falsifiable minority objection.
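
The downgrade rule described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Option` type, the `recommend` function, and the `adopt` label are hypothetical; only the `discuss` outcome and the hand-off to challenge mode come from the text. The sketch checks the top-ranked option only and does not handle ties.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Option:
    option_id: str
    weighted_votes: float
    # Mechanically verifiable objections raised against this option.
    blockers: List[str] = field(default_factory=list)


def recommend(options: List[Option]) -> Dict[str, object]:
    """Pick the top-voted option, but never adopt one with an open blocker."""
    winner = max(options, key=lambda o: o.weighted_votes)
    if winner.blockers:
        # A vote must not outweigh a falsifiable objection: downgrade and
        # ask for verification in challenge mode instead of adopting.
        return {
            "action": "discuss",
            "option": winner.option_id,
            "verify_in": "challenge",
            "blockers": winner.blockers,
        }
    return {"action": "adopt", "option": winner.option_id}
```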
122
+
123
+ ## How it works
124
+
125
+ **Adversarial mode** (reduce-rework loop):
126
+ ```
127
+ drafted artifact + bounded context
128
+ → multiple persona/CLI challengers each steelman → find flaws (bound to specific text, no hedging)
129
+ → Verifier (cross-family) produces a minimal reproduction / contradicting source line
130
+ → dedup by canonical key + evidence-survival
131
+ → single verdict pipeline → 6-state verdict + panel-integrity check
132
+ → (--deep: multi-round to two-condition convergence)
133
+ ```
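
The dedup step in the pipeline above might look something like the Python sketch below. It assumes a canonical key made of the objection category plus its exact line anchor, which matches the "exact-anchor only" boundary noted under Status; the `Objection` type and the rule of preferring the copy that carries a reproduction are illustrative assumptions, not the tool's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Objection:
    category: str
    anchor: Tuple[int, int]        # (start_line, end_line) in the artifact
    severity: str
    reproduction: Optional[str]    # evidence from the Verifier, if any


def canonical_key(o: Objection) -> Tuple[str, Tuple[int, int]]:
    """Exact-anchor key: same category on the same lines is the same concern."""
    return (o.category, o.anchor)


def dedup(objections: List[Objection]) -> List[Objection]:
    """Merge duplicates, keeping the copy that has a reproduction attached."""
    merged: Dict[Tuple[str, Tuple[int, int]], Objection] = {}
    for o in objections:
        key = canonical_key(o)
        kept = merged.get(key)
        if kept is None or (kept.reproduction is None and o.reproduction is not None):
            merged[key] = o
    return list(merged.values())
```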
134
+
135
+ **Deliberation mode** — the methodology is a strict three-phase flow. The `weigh` CLI implements phase ③ (it votes on the options you hand it); phases ①② are the calling agent's responsibility before invoking it — **no shortcuts**:
136
+ ```
137
+ ① align (agent) share full background with every voter first — the question, constraints, known facts — don't pre-supply options
138
+ ② collect (agent) each voter independently, unseen by the others and not fed the orchestrator's preferences, generates candidates → dedup/cluster into an option pool
139
+ ③ vote `challenge-plans weigh` votes on that option pool (model_family-weighted to block false consensus) → ranking + tally + dissent
140
+ hands back to a human only on a tie / missing votes; otherwise closes the loop and returns a result
141
+ ```
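
As a rough illustration of `model_family`-weighted voting, the Python sketch below caps how much weight one family can give a single option, and reports the raw and weighted tallies side by side. The cap value of 1.0, the `Vote` tuple layout, and the `tally` function are assumptions made for this example; the real weighting scheme may differ.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Vote = Tuple[str, str, str]  # (voter_id, model_family, option_id)


def tally(
    votes: Iterable[Vote], family_cap: float = 1.0
) -> Tuple[Dict[str, int], Dict[str, float], bool]:
    """Return (raw counts, family-capped weights, single_family_warning)."""
    votes = list(votes)
    raw = Counter(option for _, _, option in votes)

    per_family: DefaultDict[str, Counter] = defaultdict(Counter)
    for _, family, option in votes:
        per_family[family][option] += 1

    weighted: DefaultDict[str, float] = defaultdict(float)
    for counts in per_family.values():
        for option, n in counts.items():
            # Several personas of one family cannot add more than the cap.
            weighted[option] += min(n, family_cap)

    single_family = len(per_family) < 2
    return dict(raw), dict(weighted), single_family
```

With three `claude` personas voting A and one `gpt` voter choosing B, the raw tally is 3 to 1 but the weighted tally is tied, which is the kind of case that gets handed back to a human.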
142
+
143
+ ## What it guards against — 7 multi-agent failure modes
144
+
145
+ These traps are ones a naive multi-agent setup almost always falls into — and ones **we hit ourselves while building this tool with its own adversarial process**. Each guard is built into the design, and the design is dogfooded:
146
+
147
+ 1. **Vote/finding loss** — a challenger is truncated/timed-out/unparseable and the system silently aggregates a partial panel. **Guard:** machine-readable capture + per-voter integrity self-check; missing votes never approve or declare a majority.
148
+ 2. **Option anchoring** — the orchestrator only offers its own pre-picked options, so agents merely ratify the framing. **Guard:** deliberation always diverges (generate first, vote second); voters aren't fed the orchestrator's preferences.
149
+ 3. **Premature hand-off** — the orchestrator bounces the open decision back to the human mid-way instead of finishing the vote. **Guard:** close the loop and return a result; hand back only on a tie / missing votes.
150
+ 4. **Majority over minority** — out-voting a minority that has a reproducible blocker. **Guard:** two modes with split aggregation + the escape gate; adversarial mode bans voting and lets evidence beat headcount.
151
+ 5. **Single-round complacency** — one pass declared sufficient. **Guard:** `--deep` multi-round to convergence + adversarial review of the code itself before shipping.
152
+ 6. **False consensus** — same-model personas counted as independent votes, so one model's bias gets cloned into a "majority". **Guard:** per-`model_family` weight cap, raw/weighted both shown, single-family warning.
153
+ 7. **False convergence** — declaring "done" when no *new* objection appeared but an old blocker is still open. **Guard:** two-condition convergence (new_surviving == 0 **and** unresolved_required == 0).
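
The two-condition convergence check in item 7 is small enough to write out. In this Python sketch the counter names `new_surviving` and `unresolved_required` come from the text above, while the two function names and the per-round history format are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def has_converged(new_surviving: int, unresolved_required: int) -> bool:
    """Converged only if nothing new survived AND no required fix is open."""
    return new_surviving == 0 and unresolved_required == 0


def first_converged_round(history: Iterable[Tuple[int, int]]) -> Optional[int]:
    """Return the 1-based round at which both conditions first hold, if any.

    `history` yields (new_surviving, unresolved_required) per round.
    """
    for round_no, (new_surviving, unresolved_required) in enumerate(history, start=1):
        if has_converged(new_surviving, unresolved_required):
            return round_no
    return None
```

A round with no new objections but one open blocker, such as `(0, 1)`, is the false-convergence case: it does not count as done.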
154
+
155
+ ## Backends
156
+
157
+ challenge-plans drives whatever subscription coding CLI you already have logged in — e.g. **Claude Code** (`claude`) or **OpenAI Codex** (`codex`). You don't need any specific one. With **two different vendors** it can cross-verify findings; with one, results stay advisory. No API keys, and no per-token API charges from this tool (`doctor` checks the CLIs are logged in, not your billing; usage still counts against your normal subscription limits).
158
+
159
+ ## Status
160
+
161
+ **v1 — usable.** Both modes work end-to-end, validated against a real spec and pinned by a pytest suite, hardened across multiple cross-agent adversarial-review rounds.
162
+
163
+ Known boundaries (also reflected in the run output): concern dedup is exact-anchor only; no idle-timeout (wall-clock only); deliberation blockers are flagged, not yet auto-verified by the Verifier; the open-decision divergence phase is the calling agent's job; `manual_paste`/Gemini adapters are follow-ups.
164
+
165
+ ## Testing
166
+
167
+ ```bash
168
+ pip install -e ".[dev]" && pytest # pythonpath/testpaths preconfigured
169
+ ```
170
+ The suite pins every invariant established across the adversarial-review rounds.
171
+
172
+ ## Contributing
173
+
174
+ Issues and PRs welcome — see [CONTRIBUTING.md](CONTRIBUTING.md). The project is dogfooded: reviewing your own change with `challenge-plans run <change>.diff --type diff` before opening a PR is encouraged.
175
+
176
+ ## License
177
+
178
+ [Apache-2.0](LICENSE).
@@ -0,0 +1,159 @@
1
+ # challenge-plans (Chinese)
2
+
3
+ ![Python](https://img.shields.io/badge/python-%E2%89%A53.10-blue.svg)
4
+ [![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)
5
+ [![CI](https://img.shields.io/github/actions/workflow/status/hiadrianchen/challenge-plans/ci.yml?branch=main)](https://github.com/hiadrianchen/challenge-plans/actions)
6
+
7
+ > English: [README.md](README.md)
8
+
9
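+ **Before you execute a plan/spec, put it through a multi-agent adversarial review using the coding CLIs you are already logged in to. No API keys required.**
10
+
11
+ `challenge-plans` orchestrates the subscription coding CLIs already on your machine (Claude Code, Codex, …) to cross-examine a plan/spec and dig up, ahead of time, the pitfalls that would cause rework later; when you are unsure, it can also vote across multiple options. It can also review a `git diff` as a lightweight **code review** pass, and it plugs in as an **agent skill**. It runs on your existing subscriptions, so there are no per-token API charges. It slots into the superpowers plan lifecycle: `writing-plans → challenge-plans → executing-plans`.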
12
+
13
+ ```text
14
+ $ challenge-plans run plan.md --type spec --profile standard --sink markdown
15
+
16
+ # challenge-plans · challenge · verdict: request_changes
17
+ - panel: expected 3 / collected 3 · complete ✓
18
+ - diversity: 2 families
19
+ - verified: 3 high/critical reviewed by Verifier (✓ verified, may hard-gate; ? unverified, advisory)
20
+ - surviving objections: 4
21
+
22
+ - [high✓] sensitive data sent to a third-party LLM with no privacy boundary @L42-43 (security_or_privacy_boundary, by claude:scope-boundary)
23
+ - [high✓] "schema-aligned" claimed but there's no contract test @L12-30 (integration_contract_gap, by gpt:correctness)
24
+ - [high✓] no measurable acceptance threshold @L1 (contract_violation, by preflight)
25
+ - [medium?] missing_fields vs null semantics left undefined @L32-34 (ambiguity_to_wrong_implementation)
26
+ ```
27
+
28
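+ ## Why challenge-plans
29
+
30
+ - 🔑 **No API keys, no usage-based billing**: it drives the subscription CLIs you are already logged in to (Claude Code, Codex); one is enough.
31
+ - 🧪 **Evidence beats headcount**: a minority objection backed by reproducible counter-evidence can override a majority vote; correctness is not decided by voting.
32
+ - 🤝 **Cross-family verification**: an objection earns hard-gate authority (`✓`) only after **another, independent model family** reproduces it with concrete, line-anchored evidence; single-model claims count as advisory only.
33
+ - 🛡️ **Built-in guards against 7 multi-agent orchestration failure modes**: vote loss, option anchoring, premature hand-off, majority over minority, single-round complacency, false consensus, false convergence. Each one was actually hit, and fixed, while developing this tool with its own adversarial process.
34
+ - 🌍 **Output in your language**: the source code is in English, but `--lang zh` (or `ja`, `de`, `fr`, …) makes every reviewer write its conclusions in your language, while JSON keys and line anchors stay machine-stable. One flag does it, with no separately maintained translation; see [Output in your language](#output-in-your-language).
35
+
36
+ ## Quickstart
37
+
38
+ Requires Python ≥ 3.10 (PyYAML is installed automatically). You need at least one logged-in coding CLI, either **Claude Code** (`claude`) or **OpenAI Codex** (`codex`); two different vendors unlock cross-family verification.
39
+
40
+ ```bash
41
+ git clone https://github.com/hiadrianchen/challenge-plans && cd challenge-plans
42
+ pip install -e . # exposes the `challenge-plans` command
43
+ challenge-plans doctor # see which backend CLIs are logged in
44
+ challenge-plans run examples/spec-sample.md --type spec --sink markdown # produce a verdict on the bundled sample
45
+ ```
46
+
47
+ You can also hand the repo to your coding agent with a prompt such as *"Install and set up challenge-plans from this repo, then run `challenge-plans doctor`"*, and it will do the above for you. **As an agent skill:** put [SKILL.md](SKILL.md) in the directory where your agent discovers skills.
48
+
49
+ ## Use
50
+
51
+ ```bash
52
+ challenge-plans doctor # which backends are ready
53
+ challenge-plans run path/to/spec.md --type spec --profile standard --sink markdown # harden a plan/spec
54
+ challenge-plans run change.diff --type diff --sink markdown # review a git diff
55
+ challenge-plans weigh path/to/options.yaml --profile standard --sink markdown # vote across multiple options
56
+ challenge-plans run path/to/spec.md --enforce # CI gate: non-approve exits non-zero
57
+ challenge-plans run path/to/spec.md --type spec --sink markdown --lang zh # objections/evidence output in Chinese
58
+ # when not pip-installed, prefix with: PYTHONPATH=src python3 -m challenge_plans.cli ...
59
+ ```
60
+
61
+ [`examples/`](examples/) contains ready-to-run samples (`spec-sample.md`, `options.yaml`). The sample `options.yaml`:
62
+ ```yaml
63
+ question: Refactor auth with approach A or B?
64
+ options:
65
+   - id: A
66
+     text: One-shot rewrite (concentrated risk, but clean)
67
+   - id: B
68
+     text: Incremental migration (slow, but every step can be rolled back)
69
+ ```
70
+
71
+ - `--profile fast|standard|deep`, `--sink stdout|markdown`, `--enforce` (non-approve verdicts exit non-zero; by default runs are advisory and exit 0).
72
+ - `--lang <code>` writes the human-readable output in your language (default `en`); see [below](#output-in-your-language).
73
+ - `[sev✓]` = cross-family verified, may hard-gate; `[sev?]` = unverified, advisory only.
74
+ - **Artifact types:** `--type spec` and `--type diff` are both available; `plan` / `decision` are reserved (rubric pending).
75
+
76
+ The bundled [SKILL.md](SKILL.md) automatically routes "review/QA a plan/spec" to `run`; voting goes through the `weigh` subcommand.
77
+
78
+ ### Output in your language
79
+
80
+ The challenge-plans source code is in English, but reviewers can answer in **any** language; just add `--lang`: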
81
+
82
+ ```bash
83
+ challenge-plans run plan.md --type spec --lang zh # objections, evidence, reproductions in Chinese
84
+ challenge-plans weigh options.yaml --lang ja # deliberation reasons in Japanese
85
+ ```
86
+
87
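+ `--lang` switches only the **human-readable text** (steelman, titles, evidence, reproductions, vote reasons). JSON keys, enum values, and `L12-15` line anchors stay as they are, so parsing, dedup, and CI gates are all unaffected. It is equivalent to setting `CHALLENGE_PLANS_LANG` once. There is no separate translated version to maintain; the same English source is localized on demand.
88
+
89
+ **As an agent skill:** the agent only needs to pass `--lang <user-language>` and the whole cross-review comes back in that language. The bundled [SKILL.md](SKILL.md) documents this flag so the calling agent can set it automatically from the user's language.
90
+
91
+ ## Two modes
92
+
93
+ challenge-plans is not a single feature but two modes on one engine. **The calling agent routes automatically by intent**; the user does not need to choose manually:
94
+
95
+ | | **Adversarial mode (challenge)** | **Deliberation mode (weigh-options)** |
96
+ |---|---|---|
97
+ | When | You have a **drafted** plan/spec to pick apart / harden | You have **several options / a pile of to-dos** and are unsure which to choose |
98
+ | Routing signal | a single drafted artifact + "review this / find the holes / can this be executed" | multiple candidates + "which one / rank these / is it worth doing" |
99
+ | Aggregation | **Evidence survival**: a minority can be right, **majority voting is banned** | **Weighted majority + exposed dissent**: only pure preference trade-offs are voted on |
100
+ | Output | 6-state verdict + surviving objections + reproduced counter-evidence | ranked options + vote tally + strongest dissent |
101
+
102
+ **The agent picks the mode; it is not pushed onto the user**: the agent reads the intent and routes "review a drafted artifact" to adversarial mode and "choose among options" to deliberation mode, with the boundary decided by deterministic routing signals. During deliberation, if an option is flagged with a **mechanically verifiable blocker**, the recommendation is **downgraded to `discuss` and you are prompted to verify it in challenge mode** rather than adopting it outright, so a vote can never suppress a falsifiable minority objection.
103
+
104
+ ## How it works
105
+
106
+ **Adversarial mode** (reduce-rework loop):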
107
+ ```
108
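+ drafted artifact + bounded context
109
+ → multiple persona/CLI challengers each steelman → find flaws (bound to specific text, no hedging)
110
+ → Verifier (cross-family) produces a minimal reproducible counter-example / contradicting source line
111
+ → dedup by canonical key + evidence-survival judgment
112
+ → single verdict pipeline produces the verdict (6 states) + panel-integrity check
113
+ → (--deep: multi-round until two-condition convergence)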
114
+ ```
115
+
116
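+ **Deliberation mode**: the methodology is a standard three-phase flow. The `weigh` CLI implements phase ③ (voting on the options you give it); phases ①② are the calling agent's responsibility before the call, and **shortcuts are forbidden**:
117
+ ```
118
+ ① align background (agent) first give every voter full background (the open question / constraints / known information), without presetting options
119
+ ② collect opinions (agent) each voter generates candidates independently, unseen by the others and not fed the orchestrator's preferences → dedup/cluster into an option pool
120
+ ③ organize the vote `challenge-plans weigh` votes on that option pool (model_family-weighted to prevent false consensus) → ranking + tally + dissent
121
+ hand back to a human only on a tie / missing votes; otherwise close the loop and return the result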
122
+ ```
123
+
124
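+ ## The 7 multi-agent failure modes it guards against by design
125
+
126
+ These are pitfalls that naive multi-agent orchestration almost always falls into, and ones that **we actually hit ourselves while developing this tool with the adversarial process**; each is blocked by a mechanism in the design (a moat that came out of dogfooding):
127
+
128
+ 1. **Vote loss**: a challenger's output is truncated, times out, or fails to parse, and the system silently aggregates an incomplete panel. **Guard:** machine-readable capture + per-voter integrity self-check; with missing votes it never returns approve or declares a majority.
129
+ 2. **Option anchoring**: the orchestrator only puts forward its own pre-selected options for everyone to vote on. **Guard:** deliberation always diverges first and votes second, and voters are not fed the orchestrator's preferences.
130
+ 3. **Premature hand-off**: the orchestrator throws the open decision back to a human midway instead of completing the vote. **Guard:** close the loop and return a result; hand back to a human only on a tie / missing votes.
131
+ 4. **Majority over minority**: a majority vote dismisses a real problem raised by a minority that has a reproducible blocker. **Guard:** two separately governed modes + a chained escape gate; adversarial mode bans voting, and evidence outweighs vote counts.
132
+ 5. **Single-round complacency**: one adversarial round is declared enough. **Guard:** `--deep` multi-round to convergence + a further adversarial pass at the code level before code is written.
133
+ 6. **False consensus**: votes from multiple personas of the same model are counted as independent votes. **Guard:** weight capped per model_family, raw/weighted both shown, single-family warning.
134
+ 7. **False convergence**: convergence is declared when a round produced no new objections but an old blocker is still open. **Guard:** two-condition convergence (new_surviving==0 and unresolved_required==0).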
135
+
136
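+ ## Backends
137
+
138
+ challenge-plans drives whichever subscription coding CLI you are logged in to, such as **Claude Code** (`claude`) or **OpenAI Codex** (`codex`). **It is not tied to any particular vendor.** With **two different vendors** it can cross-verify across families; with only one, conclusions stay advisory. No API keys, and this tool incurs no per-token API charges (`doctor` only checks login, not billing; usage still counts against your normal subscription quota).
139
+
140
+ ## Status
141
+
142
+ **v1: usable.** Both modes run end-to-end, have been validated against a real spec, have their invariants pinned by a pytest suite, and have been hardened through multiple rounds of cross-agent adversarial review.
143
+
144
+ Known boundaries (also noted in the output): concern dedup is exact-anchor only; no idle-timeout (wall-clock is used); deliberation blockers are currently only flagged, not yet automatically passed to the Verifier for checking; the divergence phase of open decisions is the calling agent's responsibility; `manual_paste`/Gemini adapters are follow-ups.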
145
+
146
+ ## Testing
147
+
148
+ ```bash
149
+ pip install -e ".[dev]" && pytest # pythonpath/testpaths are preconfigured
150
+ ```
151
+ The test suite pins all the invariants established across the successive adversarial reviews.
152
+
153
+ ## Contributing
154
+
155
+ Issues and PRs are welcome; see [CONTRIBUTING.md](CONTRIBUTING.md). This project is dogfooded: before opening a PR, review your own change with `challenge-plans run <change>.diff --type diff`.
156
+
157
+ ## License
158
+
159
+ [Apache-2.0](LICENSE).