@lcv-ideas-software/cross-review 4.0.0
- package/CHANGELOG.md +2568 -0
- package/LICENSE +201 -0
- package/NOTICE +26 -0
- package/README.md +208 -0
- package/SECURITY.md +52 -0
- package/dist/scripts/api-streaming-smoke.d.ts +1 -0
- package/dist/scripts/api-streaming-smoke.js +78 -0
- package/dist/scripts/api-streaming-smoke.js.map +1 -0
- package/dist/scripts/runtime-default-smoke.d.ts +1 -0
- package/dist/scripts/runtime-default-smoke.js +88 -0
- package/dist/scripts/runtime-default-smoke.js.map +1 -0
- package/dist/scripts/runtime-smoke.d.ts +1 -0
- package/dist/scripts/runtime-smoke.js +148 -0
- package/dist/scripts/runtime-smoke.js.map +1 -0
- package/dist/scripts/smoke.d.ts +1 -0
- package/dist/scripts/smoke.js +6156 -0
- package/dist/scripts/smoke.js.map +1 -0
- package/dist/src/core/cache-manifest.d.ts +22 -0
- package/dist/src/core/cache-manifest.js +133 -0
- package/dist/src/core/cache-manifest.js.map +1 -0
- package/dist/src/core/caller-tokens.d.ts +32 -0
- package/dist/src/core/caller-tokens.js +240 -0
- package/dist/src/core/caller-tokens.js.map +1 -0
- package/dist/src/core/config.d.ts +9 -0
- package/dist/src/core/config.js +643 -0
- package/dist/src/core/config.js.map +1 -0
- package/dist/src/core/convergence.d.ts +5 -0
- package/dist/src/core/convergence.js +186 -0
- package/dist/src/core/convergence.js.map +1 -0
- package/dist/src/core/cost.d.ts +59 -0
- package/dist/src/core/cost.js +359 -0
- package/dist/src/core/cost.js.map +1 -0
- package/dist/src/core/file-config.d.ts +316 -0
- package/dist/src/core/file-config.js +490 -0
- package/dist/src/core/file-config.js.map +1 -0
- package/dist/src/core/orchestrator.d.ts +199 -0
- package/dist/src/core/orchestrator.js +3430 -0
- package/dist/src/core/orchestrator.js.map +1 -0
- package/dist/src/core/prompt-parts.d.ts +58 -0
- package/dist/src/core/prompt-parts.js +122 -0
- package/dist/src/core/prompt-parts.js.map +1 -0
- package/dist/src/core/relator-lottery.d.ts +23 -0
- package/dist/src/core/relator-lottery.js +112 -0
- package/dist/src/core/relator-lottery.js.map +1 -0
- package/dist/src/core/reports.d.ts +2 -0
- package/dist/src/core/reports.js +82 -0
- package/dist/src/core/reports.js.map +1 -0
- package/dist/src/core/session-store.d.ts +149 -0
- package/dist/src/core/session-store.js +1923 -0
- package/dist/src/core/session-store.js.map +1 -0
- package/dist/src/core/status.d.ts +61 -0
- package/dist/src/core/status.js +249 -0
- package/dist/src/core/status.js.map +1 -0
- package/dist/src/core/timeouts.d.ts +2 -0
- package/dist/src/core/timeouts.js +3 -0
- package/dist/src/core/timeouts.js.map +1 -0
- package/dist/src/core/types.d.ts +604 -0
- package/dist/src/core/types.js +36 -0
- package/dist/src/core/types.js.map +1 -0
- package/dist/src/dashboard/server.d.ts +2 -0
- package/dist/src/dashboard/server.js +339 -0
- package/dist/src/dashboard/server.js.map +1 -0
- package/dist/src/mcp/server.d.ts +54 -0
- package/dist/src/mcp/server.js +1584 -0
- package/dist/src/mcp/server.js.map +1 -0
- package/dist/src/observability/logger.d.ts +9 -0
- package/dist/src/observability/logger.js +24 -0
- package/dist/src/observability/logger.js.map +1 -0
- package/dist/src/peers/anthropic.d.ts +14 -0
- package/dist/src/peers/anthropic.js +290 -0
- package/dist/src/peers/anthropic.js.map +1 -0
- package/dist/src/peers/base.d.ts +72 -0
- package/dist/src/peers/base.js +416 -0
- package/dist/src/peers/base.js.map +1 -0
- package/dist/src/peers/deepseek.d.ts +12 -0
- package/dist/src/peers/deepseek.js +246 -0
- package/dist/src/peers/deepseek.js.map +1 -0
- package/dist/src/peers/errors.d.ts +2 -0
- package/dist/src/peers/errors.js +185 -0
- package/dist/src/peers/errors.js.map +1 -0
- package/dist/src/peers/gemini.d.ts +13 -0
- package/dist/src/peers/gemini.js +215 -0
- package/dist/src/peers/gemini.js.map +1 -0
- package/dist/src/peers/grok.d.ts +17 -0
- package/dist/src/peers/grok.js +346 -0
- package/dist/src/peers/grok.js.map +1 -0
- package/dist/src/peers/model-selection.d.ts +4 -0
- package/dist/src/peers/model-selection.js +260 -0
- package/dist/src/peers/model-selection.js.map +1 -0
- package/dist/src/peers/openai.d.ts +14 -0
- package/dist/src/peers/openai.js +299 -0
- package/dist/src/peers/openai.js.map +1 -0
- package/dist/src/peers/perplexity.d.ts +18 -0
- package/dist/src/peers/perplexity.js +375 -0
- package/dist/src/peers/perplexity.js.map +1 -0
- package/dist/src/peers/registry.d.ts +3 -0
- package/dist/src/peers/registry.js +77 -0
- package/dist/src/peers/registry.js.map +1 -0
- package/dist/src/peers/retry.d.ts +2 -0
- package/dist/src/peers/retry.js +36 -0
- package/dist/src/peers/retry.js.map +1 -0
- package/dist/src/peers/stub.d.ts +13 -0
- package/dist/src/peers/stub.js +344 -0
- package/dist/src/peers/stub.js.map +1 -0
- package/dist/src/peers/text.d.ts +18 -0
- package/dist/src/peers/text.js +39 -0
- package/dist/src/peers/text.js.map +1 -0
- package/dist/src/security/redact.d.ts +2 -0
- package/dist/src/security/redact.js +128 -0
- package/dist/src/security/redact.js.map +1 -0
- package/docs/api-keys.md +34 -0
- package/docs/architecture.md +118 -0
- package/docs/caching.md +135 -0
- package/docs/costs.md +40 -0
- package/docs/evidence-preflight.md +88 -0
- package/docs/github-security-baseline.md +32 -0
- package/docs/model-selection.md +105 -0
- package/docs/reports/cross-review-v2-api-capability-smoke-2026-04-30.md +354 -0
- package/docs/reports/cross-review-v2-format-recovery-findings-2026-04-28.md +223 -0
- package/docs/reports/cross-review-v2-official-provider-docs-refresh-2026-05-05.md +60 -0
- package/docs/reports/cross-review-v2-token-streaming-smoke-2026-04-30.md +119 -0
- package/package.json +88 -0

@@ -0,0 +1,354 @@

# Cross Review v2 - API Capability Smoke

Date: 2026-04-30, America/Sao_Paulo
Runtime under test: local `cross-review-v2` source, package version `2.1.1`

## Purpose

This report records a real provider capability smoke test for the API-first
runtime. It is intended to support release review without exposing API keys,
raw secrets, full prompts or full provider responses.

The test used the same Windows environment variable strategy as the MCP host
configuration. The keys were detected in the current process environment and in
the Windows User and Machine environment scopes. No key value was printed or
written to this report.

## Official Documentation Checked

- OpenAI latest model: `https://platform.openai.com/docs/guides/latest-model`
  (redirects to `https://developers.openai.com/api/docs/guides/latest-model`)
- OpenAI reasoning: `https://platform.openai.com/docs/guides/reasoning`
- Anthropic adaptive thinking: `https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking`
- Anthropic effort: `https://platform.claude.com/docs/en/build-with-claude/effort`
- Google Gemini thinking: `https://ai.google.dev/gemini-api/docs/thinking`
- DeepSeek quick start: `https://api-docs.deepseek.com/`
- DeepSeek thinking mode: `https://api-docs.deepseek.com/guides/thinking_mode`
- DeepSeek multi-round conversation: `https://api-docs.deepseek.com/guides/multi_round_chat`

Relevant current documentation observations:

- OpenAI's latest-model page documents GPT-5.5, recommends updating the model
  slug to `gpt-5.5`, recommends the Responses API for reasoning, tool-calling
  and multi-turn use cases, and documents `xhigh` for the hardest asynchronous
  agentic tasks and security/code review workloads.
- Anthropic's official Claude API Docs on `platform.claude.com` document
  `claude-opus-4-7` with adaptive thinking and `output_config.effort`,
  including `xhigh` for advanced coding and complex agentic work.
- Gemini documents `thinkingConfig` for Gemini API calls; the real model
  metadata for `gemini-3.1-pro-preview` reports `thinking: true`.
- DeepSeek documents `deepseek-v4-pro`, `thinking.type=enabled`,
  `reasoning_effort=high|max`, JavaScript OpenAI-client examples with
  top-level `thinking` and `reasoning_effort`, the 2026-07-24 deprecation of
  `deepseek-chat` and `deepseek-reasoner`, and stateless multi-round chat
  behavior.

## Model Exclusion Rationale

- `deepseek-chat` and `deepseek-reasoner` are excluded because DeepSeek marks
  both names for deprecation on 2026-07-24 and currently maps both to
  `deepseek-v4-flash` for compatibility.
- `claude-haiku-4-5` is excluded because the cross-review role requires the
  advanced Opus/Sonnet adaptive-thinking line. Anthropic documents adaptive
  thinking support for Opus 4.7, Opus 4.6 and Sonnet 4.6; Haiku is not in the
  active advanced priority set for this peer-review role.
- `gemini-3-pro-preview` is excluded because the user's key exposes the newer
  `gemini-3.1-pro-preview` model with thinking support. The runtime should use
  the highest visible advanced thinking model and avoid older intermediate
previews when a newer advanced model is available.
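
The exclusion rules above, combined with the no-silent-downgrade rule listed under Release Implications, amount to a priority-list selector. The following TypeScript sketch is illustrative only and is not the package's `model-selection` module: `PRIORITY`, `EXCLUDED`, `isExcluded`, `selectModel` and the `"verified"` confidence value are hypothetical names. Only the model ids and `confidence=unknown` come from this report.

```typescript
// Hypothetical types; the real runtime's shapes may differ.
type Confidence = "verified" | "unknown";

interface Selection {
  model: string;
  confidence: Confidence;
}

// Advanced thinking-capable models per provider, most preferred first (illustrative).
const PRIORITY: Record<string, string[]> = {
  anthropic: ["claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6"],
  gemini: ["gemini-3.1-pro-preview"],
  deepseek: ["deepseek-v4-pro"],
};

// Model ids that must never be selected for the peer-review role.
const EXCLUDED = ["claude-haiku-4-5", "gemini-3-pro-preview", "deepseek-chat", "deepseek-reasoner"];

function isExcluded(id: string): boolean {
  // Also catch dated snapshots such as claude-haiku-4-5-20251001.
  return EXCLUDED.some((base) => id === base || id.startsWith(`${base}-`));
}

function selectModel(provider: string, visibleIds: string[]): Selection {
  const priority = PRIORITY[provider] ?? [];
  const visible = new Set(visibleIds.filter((id) => !isExcluded(id)));
  for (const candidate of priority) {
    if (visible.has(candidate)) return { model: candidate, confidence: "verified" };
  }
  // Nothing advanced is visible: keep the documented advanced default rather
  // than downgrading to a weaker model the catalog happened to return.
  const fallback = priority[0];
  if (fallback === undefined) throw new Error(`no priority list for provider ${provider}`);
  return { model: fallback, confidence: "unknown" };
}
```

Called with only `claude-haiku-4-5-20251001` visible, `selectModel("anthropic", ...)` returns `claude-opus-4-7` with `confidence: "unknown"`, which matches the behavior the Regression Coverage list attributes to `scripts/smoke.ts`.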

## Redacted Model API Excerpts

The following snippets are reduced to non-secret model metadata needed for
release review. They are not full API-key-bearing responses.

```json
{
  "openai": {
    "models_endpoint_count": 126,
    "relevant_model_ids": [
      "gpt-5.5",
      "gpt-5.4",
      "gpt-5.2",
      "gpt-5.1-codex-max",
      "gpt-5.1-codex",
      "gpt-5.1",
      "gpt-5-pro",
      "gpt-5"
    ],
    "selected": "gpt-5.5",
    "reported_model": "gpt-5.5-2026-04-23"
  },
  "anthropic": {
    "models_endpoint_count": 9,
    "relevant_model_ids": [
      "claude-opus-4-7",
      "claude-opus-4-6",
      "claude-sonnet-4-6",
      "claude-opus-4-5-20251101",
      "claude-haiku-4-5-20251001"
    ],
    "selected": "claude-opus-4-7",
    "reported_model": "claude-opus-4-7"
  },
  "gemini": {
    "models_endpoint_count": 55,
    "selected_model_metadata": {
      "id": "gemini-3.1-pro-preview",
      "displayName": "Gemini 3.1 Pro Preview",
      "inputTokenLimit": 1048576,
      "outputTokenLimit": 65536,
      "supportedActions": [
        "generateContent",
        "countTokens",
        "createCachedContent",
        "batchGenerateContent"
      ],
      "thinking": true
    },
    "reported_model": "gemini-3.1-pro-preview"
  },
  "deepseek": {
    "models_endpoint_count": 2,
    "model_ids": ["deepseek-v4-flash", "deepseek-v4-pro"],
    "selected": "deepseek-v4-pro",
    "reported_model": "deepseek-v4-pro"
  }
}
```

## Real API Results

All four provider capability checks succeeded.

### OpenAI

- Configured model: `gpt-5.5`
- API model catalog count visible to the key: `126`
- Selected model: `gpt-5.5`
- Reported model from real Responses API call: `gpt-5.5-2026-04-23`
- Capability tested: Responses API, `reasoning.effort=xhigh`
- Reasoning tokens observed: `15`
- Output preview: `OK_OPENAI`
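
The OpenAI check can be pictured as a single Responses API request. The sketch below only builds the JSON body and reads a reasoning-token count from a `usage` object; it sends nothing. The field names follow OpenAI's published Responses API, but the helper names, the prompt text and the small token budget are assumptions, since this report does not include the actual request.

```typescript
// Only the fields this smoke exercised; the Responses API accepts many more.
interface ResponsesRequest {
  model: string;
  input: string;
  reasoning: { effort: string };
  max_output_tokens: number;
}

// Minimal slice of the Responses API `usage` object.
interface ResponsesUsage {
  output_tokens?: number;
  output_tokens_details?: { reasoning_tokens?: number };
}

function buildResponsesRequest(prompt: string, maxOutputTokens: number): ResponsesRequest {
  return {
    model: "gpt-5.5",
    input: prompt,
    reasoning: { effort: "xhigh" },
    // A small smoke budget; production would use CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS.
    max_output_tokens: maxOutputTokens,
  };
}

// Returns 0 when the provider omits the breakdown.
function reasoningTokens(usage: ResponsesUsage | undefined): number {
  return usage?.output_tokens_details?.reasoning_tokens ?? 0;
}
```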

### Anthropic

- Configured model: `claude-opus-4-7`
- API model catalog count visible to the key: `9`
- Relevant advanced models observed: `claude-opus-4-7`,
  `claude-opus-4-6`, `claude-sonnet-4-6`
- Selected model: `claude-opus-4-7`
- Capability tested: Messages API, `thinking.type=adaptive`,
  `thinking.display=omitted`, `output_config.effort=max`
- Additional capability test: `output_config.effort=xhigh`
- Reported model from both real calls: `claude-opus-4-7`
- Stop reason: `end_turn`
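
The Anthropic parameters listed above can be assembled into a Messages API body as follows. This is a payload-only sketch: the field paths `thinking.type`, `thinking.display` and `output_config.effort` are taken from this report, `max_tokens` and the message shape follow the public Messages API, and the helper names and token budget are hypothetical.

```typescript
// Only the fields this smoke exercised; values mirror the report.
interface MessagesRequest {
  model: string;
  max_tokens: number;
  thinking: { type: "adaptive"; display: "omitted" };
  output_config: { effort: "max" | "xhigh" };
  messages: { role: "user" | "assistant"; content: string }[];
}

function buildMessagesRequest(prompt: string, effort: "max" | "xhigh", maxTokens: number): MessagesRequest {
  return {
    model: "claude-opus-4-7",
    max_tokens: maxTokens,
    thinking: { type: "adaptive", display: "omitted" },
    output_config: { effort },
    messages: [{ role: "user", content: prompt }],
  };
}

// Both real calls in this smoke ended with stop_reason "end_turn".
function stoppedNormally(stopReason: string | null): boolean {
  return stopReason === "end_turn";
}
```

The same builder covers both calls in the smoke: `effort` is `"max"` for the main check and `"xhigh"` for the additional one.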

### Google Gemini

- Configured model: `gemini-3.1-pro-preview`
- API model catalog count visible to the key: `55`
- Selected model: `gemini-3.1-pro-preview`
- Selected model metadata:
  - `inputTokenLimit`: `1048576`
  - `outputTokenLimit`: `65536`
  - `supportedActions`: `generateContent`, `countTokens`,
    `createCachedContent`, `batchGenerateContent`
  - `thinking`: `true`
- Capability tested: `generateContent` with `thinkingConfig.thinkingLevel=HIGH`
- Thoughts token count observed: `115`
- Output preview: `OK_GEMINI`
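
For Gemini, the capability above corresponds to a `generateContent` call whose `generationConfig` carries a `thinkingConfig`. The sketch builds that body and the REST URL, and reads `usageMetadata.thoughtsTokenCount`. It assumes the public `v1beta` REST endpoint and passes `thinkingLevel` as `HIGH` because that is how this report records it. Helper names are hypothetical, and the API key, normally supplied as a request header, is deliberately left out.

```typescript
interface GenerateContentRequest {
  contents: { role: "user"; parts: { text: string }[] }[];
  generationConfig: {
    maxOutputTokens: number;
    thinkingConfig: { thinkingLevel: string };
  };
}

// Minimal slice of the response's usageMetadata.
interface UsageMetadata {
  thoughtsTokenCount?: number;
}

function generateContentUrl(model: string): string {
  return `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
}

function buildGenerateContentRequest(prompt: string, maxOutputTokens: number): GenerateContentRequest {
  return {
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      maxOutputTokens,
      thinkingConfig: { thinkingLevel: "HIGH" },
    },
  };
}

// Returns 0 when the response reports no thinking tokens.
function thoughtsTokens(usage: UsageMetadata | undefined): number {
  return usage?.thoughtsTokenCount ?? 0;
}
```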

### DeepSeek

- Configured model: `deepseek-v4-pro`
- API model catalog count visible to the key: `2`
- Models visible to the key: `deepseek-v4-flash`, `deepseek-v4-pro`
- Selected model: `deepseek-v4-pro`
- Capability tested: OpenAI-compatible chat completions with
  `thinking.type=enabled` and `reasoning_effort=max`
- Reasoning tokens observed: `29`
- Output preview: `OK_DEEPSEEK`
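
The DeepSeek check uses the OpenAI-compatible chat completions shape with two extra top-level fields, and the multi-round guide cited above describes the API as stateless. The sketch builds such a body and shows the caller resending the whole history on each turn. `thinking` and `reasoning_effort` come from this report; reading `completion_tokens_details.reasoning_tokens` assumes DeepSeek follows the OpenAI-compatible usage shape; all helper names are hypothetical.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  thinking: { type: "enabled" };
  reasoning_effort: "high" | "max";
  max_tokens: number;
}

function buildChatRequest(history: ChatMessage[], maxTokens: number): ChatCompletionRequest {
  return {
    model: "deepseek-v4-pro",
    messages: history,
    thinking: { type: "enabled" },
    reasoning_effort: "max",
    max_tokens: maxTokens,
  };
}

// The server keeps no conversation state, so each round appends to a local
// history (without mutating it) and the caller sends all of it again.
function nextRound(history: ChatMessage[], assistantReply: string, followUp: string): ChatMessage[] {
  return [...history, { role: "assistant", content: assistantReply }, { role: "user", content: followUp }];
}

function reasoningTokensFromUsage(usage?: { completion_tokens_details?: { reasoning_tokens?: number } }): number {
  return usage?.completion_tokens_details?.reasoning_tokens ?? 0;
}
```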

## Local Runtime Evidence

`npm test` passed after the latest change set. The test command includes:

1. `npm run build`
2. `npm run smoke`
3. `npm run runtime-smoke`

Runtime smoke reported this server identity:

```json
{
  "name": "cross-review-v2",
  "publisher": "LCV Ideas & Software",
  "version": "2.1.1",
  "release_date": "2026-04-30",
  "transport": "stdio",
  "api_only": true,
  "cli_execution": false,
  "stable_release": true,
  "max_output_tokens": 20000,
  "stub": true,
  "retry_timeout_ms": 1800000
}
```

The package dry run reported:

```json
{
  "id": "@lcv-ideas-software/cross-review-v2@2.1.1",
  "name": "@lcv-ideas-software/cross-review-v2",
  "version": "2.1.1",
  "filename": "lcv-ideas-software-cross-review-v2-2.1.1.tgz",
  "entryCount": 91,
  "bundled": []
}
```

The dry-run file list includes `dist/`, `docs/`, `README.md`, `LICENSE`,
`NOTICE`, `SECURITY.md`, `CHANGELOG.md`, `package.json`, and this report. It
does not include `data/`, `.env`, logs, session files or API keys.
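
The packaging claim above lends itself to a mechanical check over the dry-run path list. A minimal sketch, assuming the list is available as an array of relative paths: the deny patterns are an illustrative reading of "`data/`, `.env`, logs, session files", not rules taken from the package, and API keys embedded inside otherwise allowed files cannot be detected by path alone.

```typescript
// Illustrative deny-list over relative paths from the package dry run.
const FORBIDDEN_PATHS: RegExp[] = [
  /^data\//, // local runtime data, including persisted sessions
  /(^|\/)\.env(\.|$)/, // .env and .env.* files anywhere in the tree
  /\.log$/, // log files
  /(^|\/)sessions\//, // session directories outside data/
];

// Returns every path that matches at least one forbidden pattern.
function forbiddenEntries(paths: string[]): string[] {
  return paths.filter((path) => FORBIDDEN_PATHS.some((pattern) => pattern.test(path)));
}
```

An empty result for the 91-entry list is the property this section asserts; `dist/src/core/session-store.js` is not flagged because the pattern requires a `sessions/` directory segment.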

Full dry-run path list:

```text
CHANGELOG.md
LICENSE
NOTICE
README.md
SECURITY.md
dist/scripts/runtime-smoke.d.ts
dist/scripts/runtime-smoke.js
dist/scripts/runtime-smoke.js.map
dist/scripts/smoke.d.ts
dist/scripts/smoke.js
dist/scripts/smoke.js.map
dist/src/core/config.d.ts
dist/src/core/config.js
dist/src/core/config.js.map
dist/src/core/convergence.d.ts
dist/src/core/convergence.js
dist/src/core/convergence.js.map
dist/src/core/cost.d.ts
dist/src/core/cost.js
dist/src/core/cost.js.map
dist/src/core/orchestrator.d.ts
dist/src/core/orchestrator.js
dist/src/core/orchestrator.js.map
dist/src/core/reports.d.ts
dist/src/core/reports.js
dist/src/core/reports.js.map
dist/src/core/session-store.d.ts
dist/src/core/session-store.js
dist/src/core/session-store.js.map
dist/src/core/status.d.ts
dist/src/core/status.js
dist/src/core/status.js.map
dist/src/core/timeouts.d.ts
dist/src/core/timeouts.js
dist/src/core/timeouts.js.map
dist/src/core/types.d.ts
dist/src/core/types.js
dist/src/core/types.js.map
dist/src/dashboard/server.d.ts
dist/src/dashboard/server.js
dist/src/dashboard/server.js.map
dist/src/mcp/server.d.ts
dist/src/mcp/server.js
dist/src/mcp/server.js.map
dist/src/observability/logger.d.ts
dist/src/observability/logger.js
dist/src/observability/logger.js.map
dist/src/peers/anthropic.d.ts
dist/src/peers/anthropic.js
dist/src/peers/anthropic.js.map
dist/src/peers/base.d.ts
dist/src/peers/base.js
dist/src/peers/base.js.map
dist/src/peers/deepseek.d.ts
dist/src/peers/deepseek.js
dist/src/peers/deepseek.js.map
dist/src/peers/errors.d.ts
dist/src/peers/errors.js
dist/src/peers/errors.js.map
dist/src/peers/gemini.d.ts
dist/src/peers/gemini.js
dist/src/peers/gemini.js.map
dist/src/peers/model-selection.d.ts
dist/src/peers/model-selection.js
dist/src/peers/model-selection.js.map
dist/src/peers/openai.d.ts
dist/src/peers/openai.js
dist/src/peers/openai.js.map
dist/src/peers/registry.d.ts
dist/src/peers/registry.js
dist/src/peers/registry.js.map
dist/src/peers/retry.d.ts
dist/src/peers/retry.js
dist/src/peers/retry.js.map
dist/src/peers/stub.d.ts
dist/src/peers/stub.js
dist/src/peers/stub.js.map
dist/src/peers/text.d.ts
dist/src/peers/text.js
dist/src/peers/text.js.map
dist/src/security/redact.d.ts
dist/src/security/redact.js
dist/src/security/redact.js.map
docs/api-keys.md
docs/architecture.md
docs/costs.md
docs/github-security-baseline.md
docs/model-selection.md
docs/reports/cross-review-v2-api-capability-smoke-2026-04-30.md
docs/reports/cross-review-v2-format-recovery-findings-2026-04-28.md
package.json
```

## Regression Coverage Added

- `scripts/smoke.ts` verifies `CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS` accepts
  positive integers and falls back to `20000` for invalid values.
- `scripts/smoke.ts` verifies all four adapters use the configured output
  token value rather than hard-coded limits.
- `scripts/smoke.ts` verifies Anthropic, Gemini and DeepSeek thinking markers
  are present in adapter source.
- `scripts/smoke.ts` verifies active priority lists do not contain
  `claude-haiku-4-5`, `gemini-3-pro-preview`, `deepseek-chat` or
  `deepseek-reasoner`.
- `scripts/smoke.ts` verifies a provider returning only
  `claude-haiku-4-5-20251001` does not cause a silent weak-model downgrade; the
  selected model remains `claude-opus-4-7` with `confidence=unknown`.
- `scripts/smoke.ts` verifies malformed, mismatched, overlapping, repeated,
  CRLF and long private-key markers do not reintroduce the previous redaction
  complexity issue.
- `scripts/smoke.ts` verifies empty peer output triggers the full decision retry
  and records `decision_retry_succeeded`.
- `scripts/smoke.ts` verifies moderation recovery, model fallback, budget
preflight, cooperative cancellation, runtime events and metrics.
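
The first regression item describes a parse-with-fallback rule for `CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS`. A minimal sketch of such a parser follows. It takes the raw string (for example the value of `process.env.CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS`) as a parameter, and its whitespace trimming and strict digits-only policy are assumptions, since this report does not say how values like `1e4` or `" 4096 "` are treated.

```typescript
const DEFAULT_MAX_OUTPUT_TOKENS = 20000;

// Returns a positive integer token budget, or the default for anything invalid.
function parseMaxOutputTokens(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_MAX_OUTPUT_TOKENS;
  const trimmed = raw.trim();
  // Digits only: rejects "", "-5", "12.5", "1e4" and "abc".
  if (!/^\d+$/.test(trimmed)) return DEFAULT_MAX_OUTPUT_TOKENS;
  const value = Number(trimmed);
  // Rejects "0" and values too large to represent exactly.
  return Number.isSafeInteger(value) && value > 0 ? value : DEFAULT_MAX_OUTPUT_TOKENS;
}
```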

## Pre-Commit Identity

This evidence report is intentionally pre-commit. The candidate is based on
current `main` HEAD `b7ae98836dfe8c461d72b406e5ab30712705d765` plus the local
working-tree changes described above. The `release_date` constant is
intentionally set to `2026-04-30`, the planned ship date for package `2.1.1`.
The final commit SHA and GitHub release tag will be produced only after
cross-review approval and the commit/push workflow.

## Release Implications

- The current model defaults are reachable with the user's keys.
- The runtime should continue to prefer advanced thinking-capable models only.
- `claude-haiku-4-5`, `gemini-3-pro-preview`, `deepseek-chat` and
  `deepseek-reasoner` must stay out of active priority lists.
- If a provider model API returns candidates but none match the advanced
  priority list, the runtime must keep the documented advanced fallback rather
  than silently downgrading to a weaker returned candidate.
- `CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS=20000` remains the configured production
  output budget; this smoke used a smaller request budget to avoid unnecessary
test output while proving the parameters are accepted.

@@ -0,0 +1,223 @@

# Cross Review v2 - Format Recovery Findings

Date: 2026-04-28, America/Sao_Paulo
Runtime: pre-stable API-first runtime 2.0.0-alpha.2

## Context

This report records real operational issues found in the API-first cross-review
runtime that later became `cross-review-v2`, observed while it was reviewing the
published Maestro Editorial AI v0.3.11 release.

The reviewed release itself was published successfully:

- Repository: `LCV-Ideas-Software/maestro-app`
- Commit: `ec37513`
- Release: `v0.3.11`
- Release URL: `https://github.com/LCV-Ideas-Software/maestro-app/releases/tag/v0.3.11`
- Release asset: `maestro-editorial-ai-v0.3.11-windows-x64-portable.zip`
- Asset SHA-256: `f97947b1a7ea74ae8d652d64fbbb0b9146fe5a2d6b60bf176aaa0346d66f6b62`
- CI, Release, CodeQL, and Code Quality runs: success
- Open code scanning alerts: 0
- Open Dependabot alerts: 0

## Sessions

- `b560d4fb-640e-46cf-9ff3-26218cdfdddf`
- `16d55e54-4b8c-4153-8451-818c3fc37625`
- `41e5d453-84ed-45a3-9c6d-c70c31a9d9f9`

Relevant persisted files live under:

- `data/sessions/<session-id>/meta.json`
- `data/sessions/<session-id>/agent-runs/*.json`
- `data/sessions/b560d4fb-640e-46cf-9ff3-26218cdfdddf/evidence/`

## Findings

### 1. READY content can be classified as NEEDS_EVIDENCE when `summary` exceeds 800 chars

Several peers returned semantically clear `READY` decisions, but the structured
parser rejected the response because `summary` was longer than 800 characters.
The round then recorded the peer as `NEEDS_EVIDENCE`.

Observed examples:

- `b560d4fb-640e-46cf-9ff3-26218cdfdddf`
- `41e5d453-84ed-45a3-9c6d-c70c31a9d9f9`

The parser warning shape was:

```text
summary: Too big: expected string to have <=800 characters
```

Impact:

- A peer can agree with the decision but still block convergence.
- The operator must inspect raw artifacts to distinguish a true disagreement
  from a formatting failure.
- This increases false-negative convergence results.

Recommended fix:

- Treat an overlong summary as a recoverable format violation, not as
  substantive `NEEDS_EVIDENCE`.
- Normalize server-side by truncating `summary` to the schema limit while
  preserving the full raw text in the artifact.
- Add a parser warning such as `summary_truncated`, but keep `status=READY`
when the status is otherwise parseable and valid.
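
The recommended normalization can be sketched as a pure function over an already-parsed peer result. Assumptions: the `PeerResult` shape and the `normalizeSummary` name are hypothetical, the 800-character limit is the schema limit quoted above, and the caller is expected to keep the untruncated raw text in the artifact separately.

```typescript
type Status = "READY" | "NOT_READY" | "NEEDS_EVIDENCE";

interface PeerResult {
  status: Status;
  summary: string;
  warnings: string[];
}

// Schema limit, measured like JavaScript string length (UTF-16 code units).
const SUMMARY_LIMIT = 800;

function normalizeSummary(result: PeerResult): PeerResult {
  if (result.summary.length <= SUMMARY_LIMIT) return result;
  return {
    ...result, // status is kept as parsed, so READY stays READY
    summary: result.summary.slice(0, SUMMARY_LIMIT),
    warnings: [...result.warnings, "summary_truncated"],
  };
}
```

Because the status is copied untouched, an overlong summary can no longer turn a clear `READY` into `NEEDS_EVIDENCE`; the warning leaves an audit trail of the truncation.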

### 2. Recovery rounds can silently narrow quorum scope

After full-peer rounds returned `READY` from `codex`, `gemini`, and `deepseek`,
an isolated recovery call was sent only to `claude`. Claude returned `READY`,
and the runtime marked that round as converged because `expected_peers=["claude"]`.

Observed session:

- `41e5d453-84ed-45a3-9c6d-c70c31a9d9f9`

Impact:

- The tool can report `converged=true` for the recovery round while the session
  no longer represents a single strict quadrilateral round.
- The correct human interpretation is "all peers reached READY across the
  original round plus a format-recovery round", not "the latest full quorum
  round converged".

Recommended fix:

- Add a first-class "format recovery" mode that retries only failed-format
  peers but preserves the original quorum scope.
- Convergence should distinguish:
  - `latest_round_converged`
  - `session_quorum_converged`
  - `recovery_converged`
- The public response should not collapse a recovery-only quorum into ordinary
strict unanimity.
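
One way to compute the three flags while keeping the original quorum is sketched below. It assumes each peer's session-level status is its most recent status from any round and that the quorum is passed in from the first full round; both rules, and every name other than the three flags and `quorum_peers`, are assumptions rather than the runtime's documented behavior.

```typescript
type Status = "READY" | "NOT_READY" | "NEEDS_EVIDENCE";

interface Round {
  expectedPeers: string[]; // peers asked in this round (a recovery may ask only one)
  results: Record<string, Status>;
  isRecovery: boolean;
}

interface Convergence {
  latest_round_converged: boolean;
  session_quorum_converged: boolean;
  recovery_converged: boolean;
  quorum_peers: string[];
}

function computeConvergence(quorumPeers: string[], rounds: Round[]): Convergence {
  const latest = rounds[rounds.length - 1];
  if (latest === undefined) throw new Error("no rounds recorded");
  const latestConverged = latest.expectedPeers.every((peer) => latest.results[peer] === "READY");

  // Session view: each peer's most recent status from any round.
  const lastStatus = new Map<string, Status>();
  for (const round of rounds) {
    for (const peer of Object.keys(round.results)) {
      const status = round.results[peer];
      if (status !== undefined) lastStatus.set(peer, status);
    }
  }
  const sessionConverged = quorumPeers.every((peer) => lastStatus.get(peer) === "READY");

  return {
    latest_round_converged: latestConverged,
    session_quorum_converged: sessionConverged,
    // Signals that agreement was completed by a recovery call, not one strict round.
    recovery_converged: latest.isRecovery && latestConverged,
    quorum_peers: quorumPeers.slice(),
  };
}
```

For the Maestro v0.3.11 case in this report, all three flags become true after the Claude recovery call, and `recovery_converged` is what keeps the result from being read as ordinary strict unanimity.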

### 3. Minimal prompts can cause peers to review the schema instead of the decision

In session `16d55e54-4b8c-4153-8451-818c3fc37625`, the draft included a JSON
schema example with placeholders like `READY|NOT_READY`. DeepSeek interpreted
the template itself as the artifact under review and returned `NOT_READY`
because it saw no concrete decision.

Impact:

- Attempts to reduce prompt size for parser compliance can create semantic
  ambiguity.
- The model may correctly reject the prompt, but for the wrong target.

Recommended fix:

- The runtime should inject an unambiguous response contract internally instead
  of requiring the caller to include a schema template in `draft`.
- Use a separate transport-level response schema or provider-native structured
  output where available.
- If a schema example must be included, wrap it in a clearly labeled
`RESPONSE_FORMAT_INSTRUCTIONS` block and keep the reviewed artifact separate.
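
A prompt builder that follows the last recommendation might look like this. Only the `RESPONSE_FORMAT_INSTRUCTIONS` label comes from the report; `ARTIFACT_UNDER_REVIEW`, the tag-style delimiters, the instruction wording and `buildReviewPrompt` are hypothetical.

```typescript
// Keeps the reviewed artifact and the response contract in separate, labeled blocks.
function buildReviewPrompt(artifact: string, responseFormat: string): string {
  return [
    "Review only the content inside ARTIFACT_UNDER_REVIEW.",
    "RESPONSE_FORMAT_INSTRUCTIONS describes how to reply; it is not under review.",
    "",
    "<ARTIFACT_UNDER_REVIEW>",
    artifact,
    "</ARTIFACT_UNDER_REVIEW>",
    "",
    "<RESPONSE_FORMAT_INSTRUCTIONS>",
    responseFormat,
    "</RESPONSE_FORMAT_INSTRUCTIONS>",
  ].join("\n");
}
```

Placing the artifact first and the format block last, each with explicit open and close markers, gives a model no reason to treat a `READY|NOT_READY` placeholder as the decision being reviewed.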

### 4. The runtime needs automated per-peer format retries

The operator had to manually create shorter prompts and isolated calls to
recover from parser failures.

Impact:

- Manual recovery is slow and easy to misinterpret.
- It can distort convergence scope, as described in finding 2.

Recommended fix:

- Add an automatic retry path when parsing fails but raw text includes a
  recognizable status.
- Retry only the affected peer with a compact reformat instruction and the
  original evidence.
- Preserve the original peer set in session convergence computation.
- Cap retries per peer to avoid runaway cost.
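
The recommended retry path can be sketched as a per-peer wrapper around a provider call. Assumptions: `PeerCall` stands in for whatever function performs the real HTTP request, status detection is reduced to a regular expression over the raw text, and the reformat instruction wording is hypothetical. Because the wrapper handles one peer at a time, the session quorum is untouched, and `maxRetries` bounds the extra cost.

```typescript
type Status = "READY" | "NOT_READY" | "NEEDS_EVIDENCE";

// Stand-in for a real provider call that resolves to the peer's raw text.
type PeerCall = (prompt: string) => Promise<string>;

interface RetryOutcome {
  status: Status | null; // null: still no parseable status after all attempts
  attempts: number;
}

const STATUS_PATTERN = /"status"\s*:\s*"(READY|NOT_READY|NEEDS_EVIDENCE)"/;

function extractStatus(raw: string): Status | null {
  const match = STATUS_PATTERN.exec(raw);
  return match ? (match[1] as Status) : null;
}

async function callWithFormatRetry(call: PeerCall, prompt: string, maxRetries: number): Promise<RetryOutcome> {
  let nextPrompt = prompt;
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const status = extractStatus(await call(nextPrompt));
    if (status !== null) return { status, attempts: attempt };
    // Compact reformat instruction, followed by the original prompt and evidence.
    nextPrompt = `Your previous reply had no parseable status. Answer again in the required JSON format only.\n\n${prompt}`;
  }
  return { status: null, attempts: maxRetries + 1 };
}
```

With `maxRetries` set to `1`, this matches the single automatic recovery attempt described in the Implementation Update of this report.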

### 5. Raw status extraction should be separated from structured payload validation

The current behavior appears to conflate:

- status detection (`READY`, `NOT_READY`, `NEEDS_EVIDENCE`)
- structured object validation
- convergence eligibility

Impact:

- A valid status can be hidden by a non-critical structured validation issue.

Recommended fix:

- Parse status first.
- Validate structured fields second.
- Classify format defects by severity:
  - fatal: no recognizable status, invalid JSON with no recoverable status
  - recoverable: overlong summary, too many follow-ups, missing optional fields
  - warning: extra fields, markdown fence around JSON
- Allow recoverable defects to become `READY_WITH_WARNINGS` internally, while
still counting as READY for convergence if the status is unambiguous.
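
The severity split can be sketched as a small classifier that runs after status detection. The defect codes, the `classify` name and the choice to treat unknown codes as recoverable are assumptions; the three severities, their examples and `READY_WITH_WARNINGS` come from the finding above.

```typescript
type Status = "READY" | "NOT_READY" | "NEEDS_EVIDENCE";
type Severity = "fatal" | "recoverable" | "warning";

interface Defect {
  code: string;
  severity: Severity;
}

interface Classified {
  label: Status | "READY_WITH_WARNINGS" | "FORMAT_FAILURE";
  countsAsReady: boolean; // convergence eligibility
  defects: Defect[];
}

// Hypothetical defect codes mapped to the severities listed in the report.
const SEVERITY: Record<string, Severity> = {
  no_status: "fatal",
  invalid_json_no_status: "fatal",
  summary_too_long: "recoverable",
  too_many_follow_ups: "recoverable",
  missing_optional_field: "recoverable",
  extra_fields: "warning",
  markdown_fence: "warning",
};

function classify(status: Status | null, defectCodes: string[]): Classified {
  const defects = defectCodes.map((code): Defect => ({ code, severity: SEVERITY[code] ?? "recoverable" }));
  // Status first: without an unambiguous status nothing else can rescue the reply.
  if (status === null || defects.some((d) => d.severity === "fatal")) {
    return { label: "FORMAT_FAILURE", countsAsReady: false, defects };
  }
  const recoverable = defects.some((d) => d.severity === "recoverable");
  const label: Classified["label"] = status === "READY" && recoverable ? "READY_WITH_WARNINGS" : status;
  return { label, countsAsReady: status === "READY", defects };
}
```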

## Suggested Acceptance Tests

1. Peer returns valid JSON with `status=READY` and a 1,500-character summary.
   Expected: status counts as READY; summary is truncated or moved to raw text;
   parser warning is recorded.

2. Four-peer round where three peers parse READY and one peer has an overlong
   summary but a raw status of READY.
   Expected: session convergence can become true without manual intervention
   after automatic recovery or normalization.

3. Recovery call for one peer after a four-peer round.
   Expected: session-level quorum remains the original four peers; the response
   explicitly reports that the latest call was a format recovery.

4. Draft includes a response schema example and a separate artifact.
   Expected: peers review the artifact, not the schema placeholder.

5. Peer returns markdown-fenced JSON.
   Expected: parser extracts JSON and records a warning instead of rejecting.

## Operator Interpretation for Maestro v0.3.11

For the Maestro v0.3.11 review, the substantive result was favorable:

- Codex: READY
- Gemini: READY
- DeepSeek: READY
- Claude: READY after isolated format-recovery prompt

However, because Claude's READY was obtained in an isolated recovery call, the
runtime should not present this as a normal single-round quadrilateral
convergence. It should present it as recovered unanimity with explicit scope
and audit trail.

## Implementation Update

Implemented locally after this report:

- Overlong `summary`, `evidence_sources`, `caller_requests` and `follow_ups`
  fields are now normalized server-side when the peer status is unambiguous.
- Parser warnings now preserve the recovery reason in the audit trail instead
  of converting the peer to a false `NEEDS_EVIDENCE`.
- Markdown-fenced JSON and tagged JSON are extracted with explicit parser
  warnings.
- Invalid JSON with an unambiguous `"status": "..."` key is recovered as a
  status-only structured result.
- Responses with no parseable status now trigger one automatic per-peer format
  recovery attempt before the round is judged blocked.
- Recovery calls that cover only a subset of peers now preserve the prior
  expected quorum and expose `latest_round_converged`,
  `session_quorum_converged`, `recovery_converged` and `quorum_peers`.
- `statusInstruction()` now tells peers not to review the response-format
instructions as the artifact under review.
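The single automatic per-peer format-recovery attempt listed above might be structured like the following minimal sketch. `reviewWithFormatRecovery`, the injected `callPeer` and `parseStatus` functions, and the recovery prompt wording are all hypothetical; the audit trail is reduced to a list of strings.

```typescript
// Hypothetical sketch: at most one automatic format-recovery attempt per peer.
type Status = "READY" | "NOT_READY" | "NEEDS_EVIDENCE"; // NOT_READY is an assumed value

interface PeerOutcome {
  status: Status | null; // null: still unparseable after the single retry
  recovered: boolean; // true when the status came from the recovery call
  auditTrail: string[]; // recovery reasons preserved for the operator
}

// Assumed wording; the real recovery prompt is not reproduced here.
const RECOVERY_PROMPT =
  'Your previous reply had no parseable status. Reply with JSON only, e.g. {"status": "READY"}.';

async function reviewWithFormatRecovery(
  callPeer: (prompt: string) => Promise<string>, // injected transport, hypothetical
  parseStatus: (raw: string) => Status | null, // e.g. a tolerant parser
  prompt: string,
): Promise<PeerOutcome> {
  const auditTrail: string[] = [];
  const first = parseStatus(await callPeer(prompt));
  if (first !== null) {
    return { status: first, recovered: false, auditTrail };
  }

  // Exactly one retry: the round is judged blocked for this peer only if it also fails.
  auditTrail.push("no parseable status; sent format-recovery prompt");
  const second = parseStatus(await callPeer(RECOVERY_PROMPT));
  if (second === null) {
    auditTrail.push("format recovery failed; peer blocks the round");
  }
  return { status: second, recovered: second !== null, auditTrail };
}
```

Capping recovery at one call bounds cost and latency per peer, while the `recovered` flag is what lets session-level reporting mark the result as recovered rather than first-pass.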

Validated with:

- `npm run typecheck`
- `npm run smoke`
- `npm run build`
- `npm run lint`

# cross-review-v2 Official Provider Docs Refresh — 2026-05-05

Scope: official documentation check for the five cross-review-v2 peers before
the v2.16.0 protocol repair release.

## Sources Checked

- OpenAI — GPT-5.5 latest-model guide:
https://developers.openai.com/api/docs/guides/latest-model
- OpenAI — model catalog:
https://developers.openai.com/api/docs/models
- OpenAI — Responses API reasoning fields:
https://developers.openai.com/api/reference/resources/responses
- Anthropic — Claude model overview:
https://platform.claude.com/docs/en/about-claude/models/overview
- Anthropic — extended/adaptive thinking:
https://platform.claude.com/docs/en/build-with-claude/extended-thinking
- Google — Gemini models:
https://ai.google.dev/gemini-api/docs/models
- Google — Gemini thinking:
https://ai.google.dev/gemini-api/docs/thinking
- DeepSeek — API changelog:
https://api-docs.deepseek.com/updates
- DeepSeek — reasoning model guide:
https://api-docs.deepseek.com/guides/reasoning_model
- xAI — Grok reasoning:
https://docs.x.ai/developers/model-capabilities/text/reasoning
- xAI — Grok multi-agent:
https://docs.x.ai/developers/model-capabilities/text/multi-agent
- xAI — models and pricing / aliases:
https://docs.x.ai/developers/models

## Findings Applied

- OpenAI: `gpt-5.5` remains the correct top Codex/OpenAI priority. Responses
API reasoning effort through `xhigh` is still compatible with the adapter.
- Anthropic: `claude-opus-4-7` remains the strongest Claude default for complex
reasoning and agentic coding. The adapter's adaptive-thinking path remains
aligned with current docs.
- Gemini: `gemini-3.1-pro-preview` remains the correct advanced Gemini priority.
`gemini-3-pro-preview` is deprecated/shut down and remains excluded.
- DeepSeek: `deepseek-v4-pro` and `deepseek-v4-flash` are the current V4 API
models. Legacy `deepseek-chat` and `deepseek-reasoner` are scheduled for
discontinuation on 2026-07-24 and remain excluded from priority fallbacks.
- Grok: `GROK_API_KEY` is canonical in this project. The xAI model catalog
currently recommends `grok-4.3` for general Chat API use, while the reasoning
docs identify `grok-4.20-multi-agent` as the only Grok model that accepts
explicit `reasoning.effort`. Automatic-reasoning models such as
`grok-4-latest`, `grok-4.3`, `grok-4.20`, and `grok-4.20-reasoning` must omit
that field. The priority list preserves operator choice through
`CROSS_REVIEW_GROK_MODEL` and keeps the explicit multi-agent model first for
cross-review runs that require agent-count control.
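The Grok rule above (send `reasoning.effort` only to `grok-4.20-multi-agent`, omit it for automatic-reasoning models) reduces to a small allow-list check. This is a sketch under assumptions: the helper name `grokReasoningParams` is hypothetical, and effort values are passed through unvalidated because the accepted set is defined by the xAI docs, not here.

```typescript
// Hypothetical helper: attach reasoning.effort only for models documented to accept it.
// Per the finding above, grok-4.20-multi-agent is the only such Grok model; automatic-
// reasoning models (grok-4-latest, grok-4.3, grok-4.20, grok-4.20-reasoning) must omit it.
const EXPLICIT_EFFORT_MODELS: readonly string[] = ["grok-4.20-multi-agent"];

interface ReasoningParams {
  reasoning?: { effort: string }; // effort values are passed through, not validated here
}

function grokReasoningParams(model: string, effort?: string): ReasoningParams {
  if (effort === undefined || EXPLICIT_EFFORT_MODELS.indexOf(model) === -1) {
    return {}; // omit the field entirely rather than sending null or an empty object
  }
  return { reasoning: { effort } };
}
```

A caller would spread the result into whatever request body it builds, e.g. `{ model, ...grokReasoningParams(model, effort) }`, so the field is simply absent for automatic-reasoning models.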

## Code/Docs Changes

- Updated `src/peers/model-selection.ts` Grok priority list and docs URL.
- Clarified Grok model/effort behavior in `src/peers/grok.ts`,
`src/core/config.ts`, `README.md`, and `docs/model-selection.md`.
- Added smoke coverage so the official-doc-backed priority list keeps current
model IDs and excludes known deprecated/weak IDs.
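A smoke-style check in the spirit of this coverage could assert that each priority list is non-empty, never contains the deprecated IDs named in the findings, and keeps the multi-agent Grok model first, while `CROSS_REVIEW_GROK_MODEL` still overrides the choice. The trimmed lists, `resolveGrokModel`, and `checkPriorityLists` are hypothetical and do not reproduce `src/peers/model-selection.ts`; the environment is passed in as a plain record to keep the sketch self-contained.

```typescript
// Hypothetical, trimmed priority lists: only the head IDs named in the findings.
// The real lists in src/peers/model-selection.ts may contain more entries.
const PRIORITY: Record<string, readonly string[]> = {
  openai: ["gpt-5.5"],
  anthropic: ["claude-opus-4-7"],
  gemini: ["gemini-3.1-pro-preview"],
  deepseek: ["deepseek-v4-pro", "deepseek-v4-flash"],
  grok: ["grok-4.20-multi-agent", "grok-4.3"],
};

// Deprecated or scheduled-for-removal IDs called out in the findings.
const EXCLUDED: readonly string[] = ["gemini-3-pro-preview", "deepseek-chat", "deepseek-reasoner"];

// Operator override first (CROSS_REVIEW_GROK_MODEL), then the head of the priority list.
function resolveGrokModel(env: Record<string, string | undefined>): string {
  const override = env["CROSS_REVIEW_GROK_MODEL"]?.trim();
  return override ? override : PRIORITY["grok"][0];
}

// Smoke-style invariant check; returns a list of violations (empty means pass).
function checkPriorityLists(lists: Record<string, readonly string[]>): string[] {
  const problems: string[] = [];
  for (const provider of Object.keys(lists)) {
    const ids = lists[provider];
    if (ids.length === 0) problems.push(`${provider}: empty priority list`);
    for (const id of ids) {
      if (EXCLUDED.indexOf(id) !== -1) problems.push(`${provider}: excluded id ${id}`);
    }
  }
  if (lists["grok"]?.[0] !== "grok-4.20-multi-agent") {
    problems.push("grok: explicit multi-agent model must stay first");
  }
  return problems;
}
```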