open-reflection-protocol 0.3.0__py3-none-any.whl

@@ -0,0 +1,262 @@
1
+ Metadata-Version: 2.4
2
+ Name: open-reflection-protocol
3
+ Version: 0.3.0
4
+ Summary: Turn agent failures into regression tests, reusable lessons, and measurable improvements
5
+ Project-URL: Home, https://github.com/Fujo930/ORP
6
+ Author: ORP Contributors
7
+ License: MIT
8
+ Keywords: agent,ai,observability,opentelemetry,reflection
9
+ Classifier: Development Status :: 3 - Alpha
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Requires-Python: >=3.10
17
+ Requires-Dist: pydantic>=2.0
18
+ Description-Content-Type: text/markdown
19
+
20
+ # Open Reflection Protocol (ORP)
21
+
22
+ > Turn agent failures into regression tests, reusable lessons, and measurable improvements.
23
+
24
+ **Tracing tells you what your agent did. ORP turns what happened into a tested lesson.**
25
+
26
+ ---
27
+
28
+ ## Demo: 30 Seconds
29
+
30
+ A coding agent fixes an auth bug but misses the anonymous user path, so the test suite stops at 34 of 35 passing.
31
+
32
+ ```bash
33
+ # 1. Wrap your agent with ORP
34
+ orp wrap -- python my_agent.py
35
+
36
+ # 2. ORP captures the failure, challenges unproven claims,
37
+ # and compiles a Lesson + regression Eval
38
+ orp learn latest
39
+
40
+ # 3. Same agent retrieves the Lesson via MCP, applies it
41
+ # -> All 35 tests pass this time
42
+ orp mcp-server
43
+
44
+ # 4. Before/after comparison
45
+ orp diff exp_before exp_after
46
+ ```
47
+
48
+ **Before:**
49
+ ```
50
+ Task success: FAILED (34/35 tests)
51
+ Claims: 1 unproven
52
+ ```
53
+
54
+ **After:**
55
+ ```
56
+ Task success: PASSED (35/35 tests)
57
+ Claims: 0 unproven
58
+ ```
59
+
60
+ That's the loop. One mistake, one lesson, one measurable improvement.
61
+
62
+ ---
63
+
64
+ ## What ORP Does
65
+
66
+ ORP is an **open experience layer for AI agents**, built on OpenTelemetry. It converts agent traces into three executable artifacts:
67
+
68
+ | Artifact | What | Example |
69
+ |----------|------|---------|
70
+ | **Lesson** | Retrievable, scoped experience | "Test anonymous, authenticated, and forbidden paths" |
71
+ | **Eval** | Regression test reproducing the failure | `pytest tests/test_anonymous_access.py` |
72
+ | **Guardrail** | Preventative rule | "Before modifying auth, run full test suite" |
73
+
74
+ Each Lesson goes through a lifecycle:
75
+
76
+ ```
+ candidate -> active -> under_review -> deprecated -> rejected
+                |
+       (only active lessons
+        are retrievable)
81
+ ```
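
The lifecycle can be read as a small state machine. The sketch below is illustrative only: the `Status` enum, the `ALLOWED` table, and the helper names are hypothetical, and the strictly linear transitions are an assumption taken from the left-to-right order of the diagram, not from the package source.

```python
from enum import Enum


class Status(str, Enum):
    """Hypothetical stand-in for the lifecycle states named in the diagram."""
    CANDIDATE = "candidate"
    ACTIVE = "active"
    UNDER_REVIEW = "under_review"
    DEPRECATED = "deprecated"
    REJECTED = "rejected"


# Assumption: each state may only advance to the next state in the diagram.
ALLOWED = {
    Status.CANDIDATE: {Status.ACTIVE},
    Status.ACTIVE: {Status.UNDER_REVIEW},
    Status.UNDER_REVIEW: {Status.DEPRECATED},
    Status.DEPRECATED: {Status.REJECTED},
    Status.REJECTED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def retrievable(lessons: dict[str, Status]) -> list[str]:
    """Return the ids of lessons that are in the active state."""
    return sorted(lid for lid, status in lessons.items() if status is Status.ACTIVE)
```

Keeping the retrieval filter separate from the transition table means a lesson that leaves `active` stops being served without any extra bookkeeping.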
82
+
83
+ ---
84
+
85
+ ## Key Concepts
86
+
87
+ - **Evidence-first**: ORP distinguishes observed facts (tool output, test results) from agent claims (diagnoses, confidence statements). Claims are never automatically treated as ground truth.
88
+ - **Executable experience**: Lessons compile to runnable evals and guardrails, not just text.
89
+ - **Outcome-based value**: Lesson quality is determined by whether it actually improves results, measured through effect evaluation.
90
+ - **Built on OpenTelemetry**: ORP extends existing trace infrastructure instead of replacing it.
91
+ - **Private by default**: All data stays local and is de-identified by default; no prompt or tool output is uploaded.
92
+
93
+ ---
94
+
95
+ ## Install
96
+
97
+ ```bash
98
+ pip install open-reflection-protocol
99
+ ```
100
+
101
+ Requires Python 3.10+.
102
+
103
+ ---
104
+
105
+ ## Quick Start
106
+
107
+ ### 1. Wrap any agent command
108
+
109
+ ```bash
110
+ orp wrap -- python my_agent.py --run-task
111
+ ```
112
+
113
+ ORP automatically captures stdout, exit codes, test results, git diff, and OpenTelemetry spans.
114
+
115
+ ### 2. Learn from the run
116
+
117
+ ```bash
118
+ orp learn latest
119
+ ```
120
+
121
+ This generates:
122
+ - A **diagnosis** of what went wrong
123
+ - **Challenged claims** (unsupported agent statements)
124
+ - A **Lesson** candidate
125
+ - A **regression Eval**
126
+
127
+ ### 3. View results
128
+
129
+ ```bash
130
+ orp inspect latest
131
+ orp report --open # HTML report
132
+ orp diff exp_before exp_after
133
+ ```
134
+
135
+ ### 4. Deliver lessons to future runs
136
+
137
+ ```bash
138
+ # Start the MCP Lesson server
139
+ orp mcp-server --transport stdio
140
+
141
+ # Compatible agents can now use these MCP tools:
142
+ # orp_retrieve_lessons(task, limit=3)
143
+ # orp_acknowledge_lesson(lesson_id)
144
+ # orp_report_outcome(lesson_id, outcome, evidence_refs)
145
+ ```
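
MCP tool invocations travel as JSON-RPC 2.0 requests whose method is `tools/call`. The sketch below only builds such a request for `orp_retrieve_lessons`; the argument names `task` and `limit` come from the listing above, while the helper `tool_call` and the example task string are hypothetical. It does not start or contact a server.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Assumed argument values; only the argument names appear in the README.
request = tool_call("orp_retrieve_lessons", {"task": "fix auth bug", "limit": 3})
```

In practice an MCP client library would produce this message; the sketch is only meant to show what an agent sends when it asks for lessons.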
146
+
147
+ ---
148
+
149
+ ## Run the Demo
150
+
151
+ ```bash
152
+ git clone https://github.com/Fujo930/ORP
153
+ cd ORP
154
+ uv run python demo/orp_demo.py
155
+ ```
156
+
157
+ Output:
158
+
159
+ ```
160
+ Run 1: Agent misses anonymous user path -> FAILED
161
+ ORP analyzes the failure -> challenges 1 unproven claim
162
+ ORP compiles Lesson + Eval
163
+ MCP delivers Lesson to Agent
164
+ Run 2: Agent applies Lesson -> PASSED
165
+
166
+ Before: 34/35 tests, 1 unproven claim
167
+ After: 35/35 tests, 0 unproven claims
168
+ Estimated effect: 0.5
169
+ ```
170
+
171
+ ## Experimental Results
172
+
+ **10 failure tasks, 5 trials each under both conditions (control and +ORP), 100 total runs.**
+
+ | Metric | Control (no ORP) | +ORP | Improvement |
+ |--------|:-:|:-:|:-:|
+ | Task success rate | 14% | 100% | **+86 pp** |
+ | Repeat failure rate | high | 0% | **100% reduction** |
+ | Lesson application | — | 100% | — |
+ | Eval validity | — | 85% | — |
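
The success-rate row compares 14% with 100%. The gap can be stated as an absolute difference in percentage points or as a ratio, and the two read very differently. The sketch below works through both. It also checks the run count, assuming the 10 tasks and 5 trials were repeated for two conditions (control and +ORP), which is an inference from the totals and is not stated in the README.

```python
def improvement(control: float, treated: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, treated/control ratio)."""
    absolute_pp = (treated - control) * 100
    ratio = treated / control
    return absolute_pp, ratio


absolute_pp, ratio = improvement(control=0.14, treated=1.00)

# Assumption: two conditions, each running every task for every trial.
tasks, trials, conditions = 10, 5, 2
total_runs = tasks * trials * conditions

print(f"{absolute_pp:.0f} pp gain, {ratio:.1f}x the control rate, {total_runs} runs")
```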
181
+
182
+ ```
183
+ Go/No-Go: >>> GO — 4/4 checks passed
184
+ ```
185
+
186
+ Run it yourself: `uv run python exps/runner.py`
187
+
188
+ ---
189
+
190
+ ## CLI Reference
191
+
192
+ ```text
+ orp wrap -- python agent.py    Wrap an agent process with ORP
+ orp inspect [id]               Inspect an experience (default: latest)
+ orp learn [id]                 Generate lessons from an experience
+ orp replay <id>                Counterfactual replay
+ orp lessons list               List lessons
+ orp lessons validate <id>      Validate lesson integrity
+ orp lessons conflicts          Auto-detect conflicting lessons
+ orp lessons rollback <id>      Rollback a lesson
+ orp lessons deliver <id>       Deliver a lesson
+ orp effects evaluate <id>      Evaluate lesson effect
+ orp training candidates        List training candidates
+ orp training export            Export approved training data
+ orp mcp-server                 Start MCP lesson server
+ orp report --open              Generate HTML report
+ orp diff <id1> <id2>           Compare two experiences
+ orp export [id]                Export as JSON
209
+ ```
210
+
211
+ ---
212
+
213
+ ## Architecture
214
+
215
+ ```text
+ Agent / Existing Trace
+         |
+         v
+ Trace Adapters (OTel / OpenAI / LangGraph / Generic JSON)
+         |
+         v
+ Experience Builder -> Evidence Verifier
+                    -> Reflection Analyzer (diagnosis + challenger)
+                    -> Counterfactual Replayer
+         |
+         v
+ Experience Compiler
+   +----+----+------+
+   |         |      |
+ Lesson    Eval  Guardrail
+   |         |      |
+   +---- Delivery Router (MCP Server / Prompt / Policy / Runtime Hook)
+              |
+              v
+   Effect Evaluator + Rollback
236
+ ```
237
+
238
+ ---
239
+
240
+ ## For Contributors
241
+
242
+ Tests (58 total):
243
+
244
+ ```bash
245
+ uv run pytest -q
246
+ # 58 passed in 0.68s
247
+ ```
248
+
249
+ Key design documents in this repo:
250
+
251
+ | File | What |
252
+ |------|------|
253
+ | `ROADMAP.md` | Project roadmap and strategy |
254
+ | `SPEC.md` | Protocol specification v0.3 |
255
+ | `ARCHITECTURE.md` | Implementation architecture |
256
+ | `demo/orp_demo.py` | Standalone demo |
257
+
258
+ ---
259
+
260
+ ## License
261
+
262
+ MIT
@@ -0,0 +1,29 @@
1
+ orp/__init__.py,sha256=ElxT4yZk6CJLGCxfPGKAKF_bgsO3FQXBABL5AkSyZcw,2228
2
+ orp/capture.py,sha256=cq_N52iNeio7X-8AQJjcqm1czvMZCuRJ1Ft3PfWM2fw,5255
3
+ orp/cli.py,sha256=0HZm3KbsQ17ouT0TM1txjNmJZt2TjxUeKOvmhnd6w4E,12927
4
+ orp/compiler.py,sha256=JNvcc_tISJmsVKNyt6HzUWuD-edYc6bMfVcJVyE86wc,4818
5
+ orp/conflicts.py,sha256=k_VhbFccO2SgfeNOT3GzSJS6ONmlZn9H8y-DuRODj30,2471
6
+ orp/delivery.py,sha256=H3pKf9MsCOqb5kbQO032pXawfkueMNqk9g9k-6IOLfw,4432
7
+ orp/effects.py,sha256=i8BAQjSJPvi07qVDhEb909fe-tTOS0rRw4FfQwdTfII,4367
8
+ orp/evidence.py,sha256=hjpfujrQ9lhIQhWFH6YGjBHMcefuYXRnrlc3plimVxY,3085
9
+ orp/experience.py,sha256=Y0N3uYBQ5kGciM_GZ8V6l17RcutpFvlMeJJIKZHoi1s,4141
10
+ orp/export.py,sha256=o5JN4fNC1rp2lsJKFdxwSwZfkV83bhlDItk_oojX54Q,1919
11
+ orp/lessons.py,sha256=qWilxA48vDXctdDZwl-hyQZ6O26H2IUX5N0tHSqHbKE,3603
12
+ orp/mcp_server.py,sha256=pNifTZOOV5XKTCTmrLktogQJSuqW9_jFt4ShMBPFUfU,7158
13
+ orp/reflect.py,sha256=2peSjF-Dch2N_uV9Rd7uxDTbM_T7GPPnXvUWCDQRPR8,3862
14
+ orp/replay.py,sha256=PBHX-TZ6UglRy0UP9u_gZk4CFvezBvdTU64IM2EIiRg,4210
15
+ orp/rollback.py,sha256=mcVfsQmL51hJ5AVNJ4grdcFrAAA2ttlbed8JUfY5FtM,2926
16
+ orp/schema.py,sha256=DouLTExAVUItRcY_t8hptGyIar0BxKvFnouuvOHWkAs,12698
17
+ orp/storage.py,sha256=IiU3Oe5TIZyPwFaRXhXeZWlLK453ETh-Vd3g5in-va8,19310
18
+ orp/training.py,sha256=jCcJ-qfOpDrLrKsaUMqiKZKuj9uvKhT1E71K7aC5Fn8,3815
19
+ orp/viewer.py,sha256=OLpTNZandwVrSoT4FJkLg7rwfAu3_3Xtwi9iXOtJgrg,5062
20
+ orp/adapters/__init__.py,sha256=k8GarXirpNGZS7b-pEWvRxCVzWS_OKDUgOyUNruD0I0,285
21
+ orp/adapters/generic_json.py,sha256=p-v8mlOUMdeBqjU4Dw5-LrnwoSUd4snDjNSsoysajvQ,697
22
+ orp/adapters/langgraph.py,sha256=J6Rxsw4u1v_EwdieMFrtNuZNhYn4yVJdMuj0CqnA42A,829
23
+ orp/adapters/openai_agents.py,sha256=NmZ6nT4vwQ2UNLLVYwDsCW367IuT3ECP3kSBYJW_A5E,1044
24
+ orp/adapters/otel.py,sha256=lIEwqG7Ve0CathNxLX_BASxUu2dIzZQsW03nec5C-xQ,1929
25
+ orp/examples/failing_coding_agent.py,sha256=Cw6Wjewl-8nWskiXEvAbU28olkLeOk4Ax4Wq25i6FdA,1311
26
+ open_reflection_protocol-0.3.0.dist-info/METADATA,sha256=91IDOlDccp-BFFS72dHBXOcpT-N5MvUvgwBXA04bp5E,6700
27
+ open_reflection_protocol-0.3.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
28
+ open_reflection_protocol-0.3.0.dist-info/entry_points.txt,sha256=lwGrp8bM18-BFu2L3vHWCWQrmodWoWASHdDppgvnbtw,37
29
+ open_reflection_protocol-0.3.0.dist-info/RECORD,,
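
Each RECORD row above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. The sketch below shows how such a row can be recomputed to check an unpacked wheel. The helper name `record_entry` and the file name `empty.py` are hypothetical.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel RECORD row for the given file path and contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 without the trailing '=' padding.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# An empty file always hashes to the same well-known value.
empty_row = record_entry("empty.py", b"")
```

Comparing a recomputed row with the one listed in RECORD is a quick way to confirm that a file in the wheel was not altered after packaging.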
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.30.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ orp = orp.cli:main
orp/__init__.py ADDED
@@ -0,0 +1,66 @@
+ """ORP: Open Reflection Protocol
+
+ Turn agent failures into regression tests, reusable lessons, and measurable improvements.
+
+ Core API:
+     from orp import Experience
+     with Experience(goal="fix bug") as exp:
+         result = agent.run()
+         exp.set_outcome(result)  # status string, e.g. "failed"
+
+     from orp import autolog
+     autolog()  # auto-capture all agent runs (experimental)
+ """
+
+ from orp.schema import (
+     ExperienceRecord, Lesson, EvalArtifact, LessonStatus,
+     TimelineEvent, EventKind, LessonDelivery, LessonEvaluation,
+     LessonRollback, TrainingCandidate, Outcome,
+ )
+ from orp.storage import ORPStorage
+ from orp.experience import ExperienceBuilder, Redactor, EvidenceLinker
+ from orp.capture import capture_trace_context
+ from orp.lessons import LessonStore
+ from orp.compiler import ExperienceCompiler
+ from orp.reflect import ReflectionAnalyzer, Challenger
+ from orp.replay import CounterfactualReplayer
+ from orp.delivery import DeliveryRouter
+ from orp.conflicts import ConflictDefender
+ from orp.effects import EffectEvaluator
+ from orp.rollback import RollbackManager
+ from orp.training import TrainingPipeline
+ from orp.mcp_server import MCPServer
+ from orp.export import ExportEngine
+ from orp.viewer import HTMLReporter
+
+
+ def autolog():
+     """Enable automatic capture (experimental)"""
+     import warnings
+     warnings.warn("autolog() is experimental; use `orp wrap -- python agent.py` instead")
41
+
42
+
+ class Experience:
+     """Experience context manager: captures an agent run as an ORP experience."""
+     def __init__(self, goal: str = ""):
+         self.goal = goal
+         self._ctx = None      # generator-based context manager
+         self._capture = None  # CaptureContext yielded by that manager
+         self._record = None
+
+     def __enter__(self):
+         self._ctx = capture_trace_context(self.goal)
+         self._capture = self._ctx.__enter__()
+         return self._capture
+
+     def __exit__(self, *args):
+         if self._ctx:
+             self._ctx.__exit__(*args)
+             # get_events() is defined on the yielded CaptureContext, not on the manager.
+             events = self._capture.get_events()
+             if events:
+                 builder = ExperienceBuilder()
+                 self._record = builder.from_events(events, goal=self.goal)
+                 storage = ORPStorage()
+                 storage.save_experience(self._record)
+
+     @property
+     def experience_id(self) -> str:
+         return self._record.experience_id if self._record else ""
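
`Experience` drives a generator-based context manager by hand. In that pattern the manager object and the value produced by `__enter__()` are two different objects, so an accessor such as `get_events()` has to be called on the yielded value. The sketch below illustrates the distinction with hypothetical names (`recording`, `Recorder`) and does not import anything from `orp`.

```python
from contextlib import contextmanager


@contextmanager
def recording(label: str):
    """Hypothetical stand-in for a generator-based capture scope."""
    events: list[str] = []

    class Recorder:
        def add(self, message: str) -> None:
            events.append(f"{label}: {message}")

        def get_events(self) -> list[str]:
            return events.copy()

    yield Recorder()


manager = recording("demo")        # the context manager object
recorder = manager.__enter__()     # the value a `with ... as` clause would bind
recorder.add("step 1")
manager.__exit__(None, None, None)

manager_has_accessor = hasattr(manager, "get_events")
recorder_has_accessor = hasattr(recorder, "get_events")
collected = recorder.get_events()
```

A wrapper class that calls `__enter__()` and `__exit__()` itself therefore needs to keep a reference to both objects.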
@@ -0,0 +1,6 @@
+ """Trace adapters: convert heterogeneous trace formats into ExperienceRecord."""
+
+ from orp.adapters.generic_json import GenericJSONAdapter
+ from orp.adapters.otel import OTelAdapter
+ from orp.adapters.openai_agents import OpenAIAgentsAdapter
+ from orp.adapters.langgraph import LangGraphAdapter
@@ -0,0 +1,24 @@
+ """Generic JSON Adapter: import from an arbitrary JSON trace."""
+
+ from typing import Any, Optional
+
+ from orp.schema import ExperienceRecord
+ from orp.experience import ExperienceBuilder
+
+
+ class GenericJSONAdapter:
+     """Adapter for generic JSON traces."""
+
+     def __init__(self):
+         self._builder = ExperienceBuilder()
+
+     def parse(self, data: dict[str, Any],
+               agent_id: str = "unknown",
+               goal: str = "") -> ExperienceRecord:
+         return self._builder.from_trace(data, agent_id=agent_id, goal=goal)
+
+     def parse_file(self, path: str) -> ExperienceRecord:
+         import json
+         with open(path, encoding="utf-8") as f:
+             data = json.load(f)
+         return self.parse(data)
@@ -0,0 +1,24 @@
+ """LangGraph Adapter"""
+
+ from typing import Any, Optional
+
+ from orp.schema import ExperienceRecord, TimelineEvent
+ from orp.experience import ExperienceBuilder
+
+
+ class LangGraphAdapter:
+     """Adapter for LangGraph traces."""
+
+     def parse(self, state_snapshots: list[dict[str, Any]],
+               agent_id: str = "langgraph-agent",
+               goal: str = "") -> ExperienceRecord:
+         builder = ExperienceBuilder()
+         events = []
+         # One timeline event per state snapshot, labelled by node name.
+         for i, snapshot in enumerate(state_snapshots):
+             node = snapshot.get("node", f"step_{i}")
+             events.append(TimelineEvent(
+                 kind=snapshot.get("kind", "action"),
+                 content=f"Node {node}: {snapshot.get('keys', '')}",
+                 source="agent",
+             ))
+         return builder.from_events(events, goal=goal, agent_id=agent_id)
@@ -0,0 +1,27 @@
+ """OpenAI Agents SDK Adapter"""
+
+ from typing import Any, Optional
+
+ from orp.schema import ExperienceRecord, TimelineEvent
+ from orp.experience import ExperienceBuilder
+
+
+ class OpenAIAgentsAdapter:
+     """Adapter for OpenAI Agents SDK traces."""
+
+     def parse(self, trace_data: dict[str, Any],
+               agent_id: str = "openai-agent",
+               goal: str = "") -> ExperienceRecord:
+         builder = ExperienceBuilder()
+         events = []
+         # OpenAI Agents SDK trace structure: trace -> runs -> steps
+         runs = trace_data.get("runs", [trace_data])
+         for run in runs:
+             for step in run.get("steps", []):
+                 # str() guards against non-string outputs (e.g. dicts) before truncating.
+                 events.append(TimelineEvent(
+                     kind=step.get("type", "action"),
+                     content=str(step.get("output", step.get("input", step)))[:500],
+                     source="agent",
+                     evidence_refs=[f"otel:{step.get('span_id', '')}"] if step.get("span_id") else [],
+                 ))
+         return builder.from_events(events, goal=goal, agent_id=agent_id)
orp/adapters/otel.py ADDED
@@ -0,0 +1,52 @@
+ """OpenTelemetry Adapter: import from OTel GenAI traces."""
+
+ from typing import Any, Optional
+
+ from orp.schema import ExperienceRecord, TimelineEvent, EventKind
+ from orp.experience import ExperienceBuilder
+
+
+ class OTelAdapter:
+     """Adapter for OpenTelemetry GenAI traces.
+
+     Parses trace/span data that follows the OTel GenAI semantic conventions.
+     """
+
+     @staticmethod
+     def _attrs(span: dict[str, Any]) -> dict[str, Any]:
+         """Return span attributes as a dict.
+
+         OTLP/JSON encodes attributes as a list of {"key", "value"} objects,
+         so that form is flattened here; a plain dict is passed through.
+         """
+         raw = span.get("attributes") or {}
+         if isinstance(raw, dict):
+             return raw
+         return {
+             item.get("key", ""): next(iter(item.get("value", {}).values()), None)
+             for item in raw
+         }
+
+     def parse(self, spans: list[dict[str, Any]],
+               agent_id: str = "unknown",
+               goal: str = "") -> ExperienceRecord:
+         builder = ExperienceBuilder()
+         events = []
+         for span in spans:
+             attrs = self._attrs(span)
+             events.append(TimelineEvent(
+                 kind=self._map_kind(span),
+                 content=span.get("name", attrs.get("gen_ai.request.model", "")),
+                 source=attrs.get("gen_ai.system", "agent"),
+             ))
+         if not events:
+             events.append(TimelineEvent(kind="observation", content="Empty OTel trace"))
+         return builder.from_events(events, goal=goal, agent_id=agent_id)
+
+     def _map_kind(self, span: dict[str, Any]) -> str:
+         attrs = self._attrs(span)
+         if "gen_ai.request.model" in attrs:
+             return "action"
+         if "gen_ai.evaluation.result" in attrs:
+             return "feedback"
+         if "exception" in span or "error" in span:
+             return "observation"
+         return "action"
+
+     def from_otel_json(self, path: str) -> list[ExperienceRecord]:
+         import json
+         with open(path, encoding="utf-8") as f:
+             data = json.load(f)
+         records = []
+         # OTLP/JSON nesting: resourceSpans -> scopeSpans -> spans
+         for resource_span in data.get("resourceSpans", []):
+             for scope_span in resource_span.get("scopeSpans", []):
+                 spans = scope_span.get("spans", [])
+                 if spans:
+                     records.append(self.parse(spans))
+         return records
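
`from_otel_json` walks the OTLP/JSON nesting `resourceSpans -> scopeSpans -> spans`. In that encoding, span attributes are a list of `{"key", "value"}` objects and not a mapping. The sketch below shows the shape with a made-up payload. The helper names `iter_spans` and `attrs_to_dict`, the span name, and the model name `example-model` are all hypothetical.

```python
from typing import Any, Iterator


def attrs_to_dict(attributes: Any) -> dict[str, Any]:
    """Flatten OTLP/JSON attributes into a plain dict; pass dicts through."""
    if isinstance(attributes, dict):
        return attributes
    flat: dict[str, Any] = {}
    for item in attributes or []:
        value = item.get("value", {})
        # Each value object carries one typed field, e.g. stringValue or intValue.
        flat[item["key"]] = next(iter(value.values()), None)
    return flat


def iter_spans(export: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield every span in an OTLP/JSON export, in document order."""
    for resource_span in export.get("resourceSpans", []):
        for scope_span in resource_span.get("scopeSpans", []):
            yield from scope_span.get("spans", [])


# Made-up payload for illustration only.
sample = {
    "resourceSpans": [{
        "scopeSpans": [{
            "spans": [{
                "name": "chat",
                "attributes": [
                    {"key": "gen_ai.request.model", "value": {"stringValue": "example-model"}},
                ],
            }],
        }],
    }],
}

summary = [(span["name"], attrs_to_dict(span.get("attributes"))) for span in iter_spans(sample)]
```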
orp/capture.py ADDED
@@ -0,0 +1,162 @@
+ """Capture layer: collects process, tool, test, and OTel data."""
+
+ import os
+ import subprocess
+ import sys
+ import tempfile
+ import time
+ from contextlib import contextmanager
+ from datetime import datetime, timezone
+ from pathlib import Path
+ from typing import Any, Optional
+
+ from orp.schema import TimelineEvent, EventKind
+
+
+ def _now_iso() -> str:
+     return datetime.now(timezone.utc).isoformat()
18
+
19
+
+ def capture_command(
+     command: list[str],
+     workdir: Optional[str] = None,
+     timeout: int = 300,
+ ) -> dict[str, Any]:
+     """Run a command and capture its output, exit code, and duration."""
+     start = time.time()
+     try:
+         result = subprocess.run(
+             command,
+             capture_output=True,
+             text=True,
+             cwd=workdir or os.getcwd(),
+             timeout=timeout,
+         )
+         duration = time.time() - start
+         return {
+             "command": " ".join(command),
+             "exit_code": result.returncode,
+             "stdout": result.stdout,
+             "stderr": result.stderr,
+             "duration": round(duration, 2),
+             "success": result.returncode == 0,
+             "timed_out": False,
+         }
+     except subprocess.TimeoutExpired as e:
+         # On timeout the partial output may be bytes or None, even with text=True.
+         def _text(value) -> str:
+             if isinstance(value, bytes):
+                 return value.decode(errors="replace")
+             return value or ""
+
+         return {
+             "command": " ".join(command),
+             "exit_code": -1,
+             "stdout": _text(e.stdout),
+             "stderr": _text(e.stderr),
+             "duration": timeout,
+             "success": False,
+             "timed_out": True,
+         }
55
+
56
+
+ def capture_git_diff(workdir: Optional[str] = None) -> str:
+     """Capture the git diff of the working directory."""
+     try:
+         cwd = workdir or os.getcwd()
+         result = subprocess.run(
+             ["git", "diff"],
+             capture_output=True, text=True, cwd=cwd, timeout=30,
+         )
+         return result.stdout
+     except (subprocess.TimeoutExpired, FileNotFoundError):
+         return ""
+
+
+ def capture_git_status(workdir: Optional[str] = None) -> str:
+     """Capture the git status."""
+     try:
+         cwd = workdir or os.getcwd()
+         result = subprocess.run(
+             ["git", "status", "--short"],
+             capture_output=True, text=True, cwd=cwd, timeout=30,
+         )
+         return result.stdout
+     except (subprocess.TimeoutExpired, FileNotFoundError):
+         return ""
81
+
82
+
+ def capture_pytest_result(workdir: Optional[str] = None) -> dict[str, Any]:
+     """Run pytest and capture the result."""
+     import re
+     try:
+         cwd = workdir or os.getcwd()
+         result = subprocess.run(
+             [sys.executable, "-m", "pytest", "-q", "--tb=short"],
+             capture_output=True, text=True, cwd=cwd, timeout=120,
+         )
+         output = result.stdout + result.stderr
+         # The exit code is the reliable pass signal: a failing summary such as
+         # "1 failed, 34 passed" also contains the word "passed".
+         passed = result.returncode == 0
+         # Summary counts look like "34 passed" and "1 failed"; use the last match.
+         passed_matches = re.findall(r"(\d+) passed", output)
+         failed_matches = re.findall(r"(\d+) failed", output)
+         passed_count = int(passed_matches[-1]) if passed_matches else 0
+         failed_count = int(failed_matches[-1]) if failed_matches else 0
+         return {
+             "exit_code": result.returncode,
+             "summary": result.stdout.strip().split("\n")[-1] if result.stdout else "",
+             "passed": passed,
+             "passed_count": passed_count,
+             "failed_count": failed_count,
+             "output": output,
+         }
+     except (subprocess.TimeoutExpired, FileNotFoundError):
+         return {"exit_code": -1, "passed": False, "error": "could not run pytest"}
119
+
120
+
+ @contextmanager
+ def capture_trace_context(goal: str):
+     """Context manager that creates a scope with a basic trace.
+
+     Usage:
+         with capture_trace_context("fix login bug") as ctx:
+             result = agent.run()
+             ctx.set_outcome(result)
+     """
+     events: list[TimelineEvent] = []
+     outcome = {"status": "unknown"}
+     start = time.time()
+
+     class CaptureContext:
+         def add_event(self, kind: str, content: str, source: str = "agent",
+                       evidence_refs: Optional[list[str]] = None):
+             events.append(TimelineEvent(
+                 kind=kind,
+                 content=content,
+                 source=source,
+                 evidence_refs=evidence_refs or [],
+             ))
+
+         def set_outcome(self, status: str, signals: Optional[dict[str, Any]] = None):
+             nonlocal outcome
+             outcome = {"status": status, "objective_signals": [signals] if signals else []}
+
+         def get_events(self) -> list[TimelineEvent]:
+             return events.copy()
+
+         def get_duration(self) -> float:
+             return time.time() - start
+
+     ctx = CaptureContext()
+     try:
+         yield ctx
+         ctx.add_event("outcome", f"Completed in {time.time()-start:.1f}s", source="system")
+     except Exception as e:
+         # The error is recorded as an event and not re-raised, so a plain
+         # `with capture_trace_context(...)` block suppresses it.
+         ctx.set_outcome("failed", {"error": str(e)})
+         ctx.add_event("observation", f"Error: {e}", source="system")
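
The capture helpers above all follow the same pattern: run a subprocess with a timeout, then fold either the normal result or the timeout into one plain dictionary. The standalone sketch below exercises that pattern against the current Python interpreter. The function name `run_captured` and the two sample commands are hypothetical and not part of `orp`.

```python
import subprocess
import sys
import time


def run_captured(command: list[str], timeout: float = 30.0) -> dict:
    """Run a command and report exit code, stdout, and whether it timed out."""
    start = time.monotonic()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
        return {
            "exit_code": proc.returncode,
            "stdout": proc.stdout,
            "timed_out": False,
            "duration": round(time.monotonic() - start, 2),
        }
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return {
            "exit_code": -1,
            "stdout": "",
            "timed_out": True,
            "duration": round(time.monotonic() - start, 2),
        }


quick = run_captured([sys.executable, "-c", "print('hello')"])
slow = run_captured([sys.executable, "-c", "import time; time.sleep(5)"], timeout=0.5)
```

Returning the same keys on both paths lets callers branch on `timed_out` without special-casing exceptions.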