harness-evolver 2.2.0 → 2.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/evolve/SKILL.md +69 -5
package/package.json
CHANGED
package/skills/evolve/SKILL.md
CHANGED
@@ -52,21 +52,85 @@ LS_PROJECT=$(langsmith-cli --json projects list --name-pattern "harness-evolver*
 
 If `LS_PROJECT` is empty, langsmith-cli is not available or no projects exist — skip to step 2.
 
-**Step 2: Gather traces from the discovered project**
+**Step 2: Gather raw traces from the discovered project**
 
 ```bash
 if [ -n "$LS_PROJECT" ]; then
-  langsmith-cli --json runs list --project "$LS_PROJECT" --
-  langsmith-cli --json runs list --project "$LS_PROJECT" --fields id,name,inputs,outputs,latency_ms,total_tokens --limit 20 > .harness-evolver/langsmith_runs.json 2>/dev/null || echo "[]" > .harness-evolver/langsmith_runs.json
+  langsmith-cli --json runs list --project "$LS_PROJECT" --recent --fields id,name,inputs,outputs,error,total_tokens --limit 30 > /tmp/langsmith_raw.json 2>/dev/null || echo "[]" > /tmp/langsmith_raw.json
   langsmith-cli --json runs stats --project "$LS_PROJECT" > .harness-evolver/langsmith_stats.json 2>/dev/null || echo "{}" > .harness-evolver/langsmith_stats.json
   echo "$LS_PROJECT" > .harness-evolver/langsmith_project.txt
 else
-  echo "[]" >
+  echo "[]" > /tmp/langsmith_raw.json
   echo "{}" > .harness-evolver/langsmith_stats.json
 fi
 ```
 
-
+**Step 3: Process raw LangSmith data into a readable format for proposers**
+
+The raw langsmith data has LangChain-serialized messages that are hard to read. Process it into a clean summary:
+
+```bash
+python3 -c "
+import json, sys
+
+raw = json.load(open('/tmp/langsmith_raw.json'))
+if not raw:
+    json.dump([], open('.harness-evolver/langsmith_runs.json', 'w'))
+    sys.exit(0)
+
+clean = []
+for r in raw:
+    entry = {'name': r.get('name', '?'), 'tokens': r.get('total_tokens', 0), 'error': r.get('error')}
+
+    # Extract readable prompt from LangChain serialized inputs
+    inputs = r.get('inputs', {})
+    if isinstance(inputs, dict) and 'messages' in inputs:
+        msgs = inputs['messages']
+        for msg_group in (msgs if isinstance(msgs, list) else [msgs]):
+            for msg in (msg_group if isinstance(msg_group, list) else [msg_group]):
+                if isinstance(msg, dict):
+                    kwargs = msg.get('kwargs', msg)
+                    content = kwargs.get('content', '')
+                    msg_type = msg.get('id', ['','','',''])[3] if isinstance(msg.get('id'), list) else 'unknown'
+                    if 'Human' in str(msg_type) or 'user' in str(msg_type).lower():
+                        entry['user_message'] = str(content)[:300]
+                    elif 'System' in str(msg_type):
+                        entry['system_prompt_preview'] = str(content)[:200]
+
+    # Extract readable output
+    outputs = r.get('outputs', {})
+    if isinstance(outputs, dict) and 'generations' in outputs:
+        gens = outputs['generations']
+        if gens and isinstance(gens, list) and gens[0]:
+            gen = gens[0][0] if isinstance(gens[0], list) else gens[0]
+            if isinstance(gen, dict):
+                msg = gen.get('message', gen)
+                if isinstance(msg, dict):
+                    kwargs = msg.get('kwargs', msg)
+                    entry['llm_response'] = str(kwargs.get('content', ''))[:300]
+
+    clean.append(entry)
+
+json.dump(clean, open('.harness-evolver/langsmith_runs.json', 'w'), indent=2, ensure_ascii=False)
+print(f'Processed {len(clean)} LangSmith runs into readable format')
+" 2>/dev/null || echo "[]" > .harness-evolver/langsmith_runs.json
+```
+
+The resulting `langsmith_runs.json` has clean, readable entries:
+```json
+[
+  {
+    "name": "ChatGoogleGenerativeAI",
+    "tokens": 1332,
+    "error": null,
+    "user_message": "Analise este texto: Bom dia pessoal...",
+    "system_prompt_preview": "Você é um moderador de conteúdo...",
+    "llm_response": "{\"categories\": [\"safe\"], \"severity\": \"safe\"...}"
+  }
+]
+```
+
+These files are included in the proposer's `<files_to_read>` so it has readable trace data for diagnosis.
 
 ### 2. Propose (3 parallel candidates)
 
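The Step 3 extraction logic added in this release can be exercised standalone. Below is a minimal sketch of the same LangChain-unwrapping approach as a reusable function, run on an in-memory sample record instead of `/tmp/langsmith_raw.json`; the sample run data and the `summarize` helper name are hypothetical, made up here for illustration, and mirror the serialized shapes the script handles (nested `messages` lists with `id`/`kwargs`, and `generations` wrapping the response message):

```python
import json

# Hypothetical sample run, shaped like a LangChain-serialized LangSmith record
# (structure mirrors what the SKILL.md Step 3 script expects; values are made up).
run = {
    'name': 'ChatGoogleGenerativeAI',
    'total_tokens': 1332,
    'error': None,
    'inputs': {'messages': [[
        {'id': ['langchain', 'schema', 'messages', 'SystemMessage'],
         'kwargs': {'content': 'You are a content moderator.'}},
        {'id': ['langchain', 'schema', 'messages', 'HumanMessage'],
         'kwargs': {'content': 'Analyze this text: good morning everyone'}},
    ]]},
    'outputs': {'generations': [[
        {'message': {'kwargs': {'content': '{"categories": ["safe"]}'}}},
    ]]},
}

def summarize(r):
    """Flatten one serialized run into the readable entry shape Step 3 emits."""
    entry = {'name': r.get('name', '?'),
             'tokens': r.get('total_tokens', 0),
             'error': r.get('error')}

    # Message type lives in the last element of the serialized 'id' path,
    # e.g. ['langchain', 'schema', 'messages', 'HumanMessage'].
    msgs = r.get('inputs', {}).get('messages', [])
    for group in (msgs if isinstance(msgs, list) else [msgs]):
        for msg in (group if isinstance(group, list) else [group]):
            if not isinstance(msg, dict):
                continue
            content = msg.get('kwargs', msg).get('content', '')
            ident = msg.get('id')
            msg_type = ident[3] if isinstance(ident, list) and len(ident) > 3 else 'unknown'
            if 'Human' in str(msg_type):
                entry['user_message'] = str(content)[:300]
            elif 'System' in str(msg_type):
                entry['system_prompt_preview'] = str(content)[:200]

    # The LLM response hides inside outputs.generations[0][0].message.kwargs.content.
    gens = r.get('outputs', {}).get('generations', [])
    if gens and isinstance(gens, list) and gens[0]:
        gen = gens[0][0] if isinstance(gens[0], list) else gens[0]
        if isinstance(gen, dict):
            msg = gen.get('message', gen)
            if isinstance(msg, dict):
                entry['llm_response'] = str(msg.get('kwargs', msg).get('content', ''))[:300]
    return entry

print(json.dumps(summarize(run), indent=2))
```

The defensive `isinstance` checks at every level are the point of the design: LangSmith records vary by client version, so the summarizer degrades to a partial entry rather than raising on an unexpected shape.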