@lcv-ideas-software/cross-review 4.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (122)
  1. package/CHANGELOG.md +2568 -0
  2. package/LICENSE +201 -0
  3. package/NOTICE +26 -0
  4. package/README.md +208 -0
  5. package/SECURITY.md +52 -0
  6. package/dist/scripts/api-streaming-smoke.d.ts +1 -0
  7. package/dist/scripts/api-streaming-smoke.js +78 -0
  8. package/dist/scripts/api-streaming-smoke.js.map +1 -0
  9. package/dist/scripts/runtime-default-smoke.d.ts +1 -0
  10. package/dist/scripts/runtime-default-smoke.js +88 -0
  11. package/dist/scripts/runtime-default-smoke.js.map +1 -0
  12. package/dist/scripts/runtime-smoke.d.ts +1 -0
  13. package/dist/scripts/runtime-smoke.js +148 -0
  14. package/dist/scripts/runtime-smoke.js.map +1 -0
  15. package/dist/scripts/smoke.d.ts +1 -0
  16. package/dist/scripts/smoke.js +6156 -0
  17. package/dist/scripts/smoke.js.map +1 -0
  18. package/dist/src/core/cache-manifest.d.ts +22 -0
  19. package/dist/src/core/cache-manifest.js +133 -0
  20. package/dist/src/core/cache-manifest.js.map +1 -0
  21. package/dist/src/core/caller-tokens.d.ts +32 -0
  22. package/dist/src/core/caller-tokens.js +240 -0
  23. package/dist/src/core/caller-tokens.js.map +1 -0
  24. package/dist/src/core/config.d.ts +9 -0
  25. package/dist/src/core/config.js +643 -0
  26. package/dist/src/core/config.js.map +1 -0
  27. package/dist/src/core/convergence.d.ts +5 -0
  28. package/dist/src/core/convergence.js +186 -0
  29. package/dist/src/core/convergence.js.map +1 -0
  30. package/dist/src/core/cost.d.ts +59 -0
  31. package/dist/src/core/cost.js +359 -0
  32. package/dist/src/core/cost.js.map +1 -0
  33. package/dist/src/core/file-config.d.ts +316 -0
  34. package/dist/src/core/file-config.js +490 -0
  35. package/dist/src/core/file-config.js.map +1 -0
  36. package/dist/src/core/orchestrator.d.ts +199 -0
  37. package/dist/src/core/orchestrator.js +3430 -0
  38. package/dist/src/core/orchestrator.js.map +1 -0
  39. package/dist/src/core/prompt-parts.d.ts +58 -0
  40. package/dist/src/core/prompt-parts.js +122 -0
  41. package/dist/src/core/prompt-parts.js.map +1 -0
  42. package/dist/src/core/relator-lottery.d.ts +23 -0
  43. package/dist/src/core/relator-lottery.js +112 -0
  44. package/dist/src/core/relator-lottery.js.map +1 -0
  45. package/dist/src/core/reports.d.ts +2 -0
  46. package/dist/src/core/reports.js +82 -0
  47. package/dist/src/core/reports.js.map +1 -0
  48. package/dist/src/core/session-store.d.ts +149 -0
  49. package/dist/src/core/session-store.js +1923 -0
  50. package/dist/src/core/session-store.js.map +1 -0
  51. package/dist/src/core/status.d.ts +61 -0
  52. package/dist/src/core/status.js +249 -0
  53. package/dist/src/core/status.js.map +1 -0
  54. package/dist/src/core/timeouts.d.ts +2 -0
  55. package/dist/src/core/timeouts.js +3 -0
  56. package/dist/src/core/timeouts.js.map +1 -0
  57. package/dist/src/core/types.d.ts +604 -0
  58. package/dist/src/core/types.js +36 -0
  59. package/dist/src/core/types.js.map +1 -0
  60. package/dist/src/dashboard/server.d.ts +2 -0
  61. package/dist/src/dashboard/server.js +339 -0
  62. package/dist/src/dashboard/server.js.map +1 -0
  63. package/dist/src/mcp/server.d.ts +54 -0
  64. package/dist/src/mcp/server.js +1584 -0
  65. package/dist/src/mcp/server.js.map +1 -0
  66. package/dist/src/observability/logger.d.ts +9 -0
  67. package/dist/src/observability/logger.js +24 -0
  68. package/dist/src/observability/logger.js.map +1 -0
  69. package/dist/src/peers/anthropic.d.ts +14 -0
  70. package/dist/src/peers/anthropic.js +290 -0
  71. package/dist/src/peers/anthropic.js.map +1 -0
  72. package/dist/src/peers/base.d.ts +72 -0
  73. package/dist/src/peers/base.js +416 -0
  74. package/dist/src/peers/base.js.map +1 -0
  75. package/dist/src/peers/deepseek.d.ts +12 -0
  76. package/dist/src/peers/deepseek.js +246 -0
  77. package/dist/src/peers/deepseek.js.map +1 -0
  78. package/dist/src/peers/errors.d.ts +2 -0
  79. package/dist/src/peers/errors.js +185 -0
  80. package/dist/src/peers/errors.js.map +1 -0
  81. package/dist/src/peers/gemini.d.ts +13 -0
  82. package/dist/src/peers/gemini.js +215 -0
  83. package/dist/src/peers/gemini.js.map +1 -0
  84. package/dist/src/peers/grok.d.ts +17 -0
  85. package/dist/src/peers/grok.js +346 -0
  86. package/dist/src/peers/grok.js.map +1 -0
  87. package/dist/src/peers/model-selection.d.ts +4 -0
  88. package/dist/src/peers/model-selection.js +260 -0
  89. package/dist/src/peers/model-selection.js.map +1 -0
  90. package/dist/src/peers/openai.d.ts +14 -0
  91. package/dist/src/peers/openai.js +299 -0
  92. package/dist/src/peers/openai.js.map +1 -0
  93. package/dist/src/peers/perplexity.d.ts +18 -0
  94. package/dist/src/peers/perplexity.js +375 -0
  95. package/dist/src/peers/perplexity.js.map +1 -0
  96. package/dist/src/peers/registry.d.ts +3 -0
  97. package/dist/src/peers/registry.js +77 -0
  98. package/dist/src/peers/registry.js.map +1 -0
  99. package/dist/src/peers/retry.d.ts +2 -0
  100. package/dist/src/peers/retry.js +36 -0
  101. package/dist/src/peers/retry.js.map +1 -0
  102. package/dist/src/peers/stub.d.ts +13 -0
  103. package/dist/src/peers/stub.js +344 -0
  104. package/dist/src/peers/stub.js.map +1 -0
  105. package/dist/src/peers/text.d.ts +18 -0
  106. package/dist/src/peers/text.js +39 -0
  107. package/dist/src/peers/text.js.map +1 -0
  108. package/dist/src/security/redact.d.ts +2 -0
  109. package/dist/src/security/redact.js +128 -0
  110. package/dist/src/security/redact.js.map +1 -0
  111. package/docs/api-keys.md +34 -0
  112. package/docs/architecture.md +118 -0
  113. package/docs/caching.md +135 -0
  114. package/docs/costs.md +40 -0
  115. package/docs/evidence-preflight.md +88 -0
  116. package/docs/github-security-baseline.md +32 -0
  117. package/docs/model-selection.md +105 -0
  118. package/docs/reports/cross-review-v2-api-capability-smoke-2026-04-30.md +354 -0
  119. package/docs/reports/cross-review-v2-format-recovery-findings-2026-04-28.md +223 -0
  120. package/docs/reports/cross-review-v2-official-provider-docs-refresh-2026-05-05.md +60 -0
  121. package/docs/reports/cross-review-v2-token-streaming-smoke-2026-04-30.md +119 -0
  122. package/package.json +88 -0
@@ -0,0 +1,354 @@
1
+ # Cross Review v2 - API Capability Smoke
2
+
3
+ Date: 2026-04-30, America/Sao_Paulo
4
+ Runtime under test: local `cross-review-v2` source, package version `2.1.1`
5
+
6
+ ## Purpose
7
+
8
+ This report records a real provider capability smoke test for the API-first
9
+ runtime. It is intended to support release review without exposing API keys,
10
+ raw secrets, full prompts or full provider responses.
11
+
12
+ The test used the same Windows environment variable strategy as the MCP host
13
+ configuration. The keys were detected in the current process, User environment
14
+ and Machine environment. No key value was printed or written to this report.
15
+
16
+ ## Official Documentation Checked
17
+
18
+ - OpenAI latest model: `https://platform.openai.com/docs/guides/latest-model`
19
+ (redirects to `https://developers.openai.com/api/docs/guides/latest-model`)
20
+ - OpenAI reasoning:
21
+ `https://platform.openai.com/docs/guides/reasoning`
22
+ - Anthropic adaptive thinking: `https://platform.claude.com/docs/en/build-with-claude/adaptive-thinking`
23
+ - Anthropic effort: `https://platform.claude.com/docs/en/build-with-claude/effort`
24
+ - Google Gemini thinking: `https://ai.google.dev/gemini-api/docs/thinking`
25
+ - DeepSeek quick start: `https://api-docs.deepseek.com/`
26
+ - DeepSeek thinking mode: `https://api-docs.deepseek.com/guides/thinking_mode`
27
+ - DeepSeek multi-round conversation: `https://api-docs.deepseek.com/guides/multi_round_chat`
28
+
29
+ Relevant current documentation observations:
30
+
31
+ - OpenAI documents GPT-5.5 on its latest-model page, recommends updating the
32
+ model slug to `gpt-5.5`, recommends the Responses API for reasoning,
33
+ tool-calling and multi-turn use cases, and documents `xhigh` for the hardest
34
+ asynchronous agentic tasks and security/code review workloads.
35
+ - Anthropic's official Claude API Docs on `platform.claude.com` document
36
+ `claude-opus-4-7` with adaptive thinking and `output_config.effort`,
37
+ including `xhigh` for advanced coding and complex agentic work.
38
+ - Gemini documents `thinkingConfig` for Gemini API calls; the real model
39
+ metadata for `gemini-3.1-pro-preview` reports `thinking: true`.
40
+ - DeepSeek documents `deepseek-v4-pro`, `thinking.type=enabled`,
41
+ `reasoning_effort=high|max`, JavaScript OpenAI-client examples with
42
+ top-level `thinking` and `reasoning_effort`, the 2026-07-24 deprecation of
43
+ `deepseek-chat` and `deepseek-reasoner`, and stateless multi-round chat
44
+ behavior.
45
+
46
+ ## Model Exclusion Rationale
47
+
48
+ - `deepseek-chat` and `deepseek-reasoner` are excluded because DeepSeek has
49
+ scheduled both names for deprecation on 2026-07-24 and treats them as
50
+ compatibility names for `deepseek-v4-flash`.
51
+ - `claude-haiku-4-5` is excluded because the cross-review role requires the
52
+ advanced Opus/Sonnet adaptive-thinking line. Anthropic documents adaptive
53
+ thinking support for Opus 4.7, Opus 4.6 and Sonnet 4.6; Haiku is not in the
54
+ active advanced priority set for this peer-review role.
55
+ - `gemini-3-pro-preview` is excluded because the user's key exposes the newer
56
+ `gemini-3.1-pro-preview` model with thinking support. The runtime should use
57
+ the highest visible advanced thinking model and avoid older intermediate
58
+ previews when a newer advanced model is available.
59
+
60
+ ## Redacted Model API Excerpts
61
+
62
+ The following snippets are reduced to non-secret model metadata needed for
63
+ release review. They are not full API-key-bearing responses.
64
+
65
+ ```json
66
+ {
67
+ "openai": {
68
+ "models_endpoint_count": 126,
69
+ "relevant_model_ids": [
70
+ "gpt-5.5",
71
+ "gpt-5.4",
72
+ "gpt-5.2",
73
+ "gpt-5.1-codex-max",
74
+ "gpt-5.1-codex",
75
+ "gpt-5.1",
76
+ "gpt-5-pro",
77
+ "gpt-5"
78
+ ],
79
+ "selected": "gpt-5.5",
80
+ "reported_model": "gpt-5.5-2026-04-23"
81
+ },
82
+ "anthropic": {
83
+ "models_endpoint_count": 9,
84
+ "relevant_model_ids": [
85
+ "claude-opus-4-7",
86
+ "claude-opus-4-6",
87
+ "claude-sonnet-4-6",
88
+ "claude-opus-4-5-20251101",
89
+ "claude-haiku-4-5-20251001"
90
+ ],
91
+ "selected": "claude-opus-4-7",
92
+ "reported_model": "claude-opus-4-7"
93
+ },
94
+ "gemini": {
95
+ "models_endpoint_count": 55,
96
+ "selected_model_metadata": {
97
+ "id": "gemini-3.1-pro-preview",
98
+ "displayName": "Gemini 3.1 Pro Preview",
99
+ "inputTokenLimit": 1048576,
100
+ "outputTokenLimit": 65536,
101
+ "supportedActions": [
102
+ "generateContent",
103
+ "countTokens",
104
+ "createCachedContent",
105
+ "batchGenerateContent"
106
+ ],
107
+ "thinking": true
108
+ },
109
+ "reported_model": "gemini-3.1-pro-preview"
110
+ },
111
+ "deepseek": {
112
+ "models_endpoint_count": 2,
113
+ "model_ids": ["deepseek-v4-flash", "deepseek-v4-pro"],
114
+ "selected": "deepseek-v4-pro",
115
+ "reported_model": "deepseek-v4-pro"
116
+ }
117
+ }
118
+ ```
119
+
120
+ ## Real API Results
121
+
122
+ All four provider capability checks succeeded.
123
+
124
+ ### OpenAI
125
+
126
+ - Configured model: `gpt-5.5`
127
+ - API model catalog count visible to the key: `126`
128
+ - Selected model: `gpt-5.5`
129
+ - Reported model from real Responses API call: `gpt-5.5-2026-04-23`
130
+ - Capability tested: Responses API, `reasoning.effort=xhigh`
131
+ - Reasoning tokens observed: `15`
132
+ - Output preview: `OK_OPENAI`
133
+
134
+ ### Anthropic
135
+
136
+ - Configured model: `claude-opus-4-7`
137
+ - API model catalog count visible to the key: `9`
138
+ - Relevant advanced models observed: `claude-opus-4-7`,
139
+ `claude-opus-4-6`, `claude-sonnet-4-6`
140
+ - Selected model: `claude-opus-4-7`
141
+ - Capability tested: Messages API, `thinking.type=adaptive`,
142
+ `thinking.display=omitted`, `output_config.effort=max`
143
+ - Additional capability test: `output_config.effort=xhigh`
144
+ - Reported model from both real calls: `claude-opus-4-7`
145
+ - Stop reason: `end_turn`
146
+
147
+ ### Google Gemini
148
+
149
+ - Configured model: `gemini-3.1-pro-preview`
150
+ - API model catalog count visible to the key: `55`
151
+ - Selected model: `gemini-3.1-pro-preview`
152
+ - Selected model metadata:
153
+ - `inputTokenLimit`: `1048576`
154
+ - `outputTokenLimit`: `65536`
155
+ - `supportedActions`: `generateContent`, `countTokens`,
156
+ `createCachedContent`, `batchGenerateContent`
157
+ - `thinking`: `true`
158
+ - Capability tested: `generateContent` with `thinkingConfig.thinkingLevel=HIGH`
159
+ - Thoughts token count observed: `115`
160
+ - Output preview: `OK_GEMINI`
161
+
162
+ ### DeepSeek
163
+
164
+ - Configured model: `deepseek-v4-pro`
165
+ - API model catalog count visible to the key: `2`
166
+ - Models visible to the key: `deepseek-v4-flash`, `deepseek-v4-pro`
167
+ - Selected model: `deepseek-v4-pro`
168
+ - Capability tested: OpenAI-compatible chat completions with
169
+ `thinking.type=enabled` and `reasoning_effort=max`
170
+ - Reasoning tokens observed: `29`
171
+ - Output preview: `OK_DEEPSEEK`
172
+
173
+ ## Local Runtime Evidence
174
+
175
+ `npm test` passed after the latest change set. The test command includes:
176
+
177
+ 1. `npm run build`
178
+ 2. `npm run smoke`
179
+ 3. `npm run runtime-smoke`
180
+
181
+ Runtime smoke reported this server identity:
182
+
183
+ ```json
184
+ {
185
+ "name": "cross-review-v2",
186
+ "publisher": "LCV Ideas & Software",
187
+ "version": "2.1.1",
188
+ "release_date": "2026-04-30",
189
+ "transport": "stdio",
190
+ "api_only": true,
191
+ "cli_execution": false,
192
+ "stable_release": true,
193
+ "max_output_tokens": 20000,
194
+ "stub": true,
195
+ "retry_timeout_ms": 1800000
196
+ }
197
+ ```
198
+
199
+ The package dry run reported:
200
+
201
+ ```json
202
+ {
203
+ "id": "@lcv-ideas-software/cross-review-v2@2.1.1",
204
+ "name": "@lcv-ideas-software/cross-review-v2",
205
+ "version": "2.1.1",
206
+ "filename": "lcv-ideas-software-cross-review-v2-2.1.1.tgz",
207
+ "entryCount": 91,
208
+ "bundled": []
209
+ }
210
+ ```
211
+
212
+ The dry-run file list includes `dist/`, `docs/`, `README.md`, `LICENSE`,
213
+ `NOTICE`, `SECURITY.md`, `CHANGELOG.md`, `package.json`, and this report. It
214
+ does not include `data/`, `.env`, logs, session files or API keys.
215
+
216
+ Full dry-run path list:
217
+
218
+ ```text
219
+ CHANGELOG.md
220
+ LICENSE
221
+ NOTICE
222
+ README.md
223
+ SECURITY.md
224
+ dist/scripts/runtime-smoke.d.ts
225
+ dist/scripts/runtime-smoke.js
226
+ dist/scripts/runtime-smoke.js.map
227
+ dist/scripts/smoke.d.ts
228
+ dist/scripts/smoke.js
229
+ dist/scripts/smoke.js.map
230
+ dist/src/core/config.d.ts
231
+ dist/src/core/config.js
232
+ dist/src/core/config.js.map
233
+ dist/src/core/convergence.d.ts
234
+ dist/src/core/convergence.js
235
+ dist/src/core/convergence.js.map
236
+ dist/src/core/cost.d.ts
237
+ dist/src/core/cost.js
238
+ dist/src/core/cost.js.map
239
+ dist/src/core/orchestrator.d.ts
240
+ dist/src/core/orchestrator.js
241
+ dist/src/core/orchestrator.js.map
242
+ dist/src/core/reports.d.ts
243
+ dist/src/core/reports.js
244
+ dist/src/core/reports.js.map
245
+ dist/src/core/session-store.d.ts
246
+ dist/src/core/session-store.js
247
+ dist/src/core/session-store.js.map
248
+ dist/src/core/status.d.ts
249
+ dist/src/core/status.js
250
+ dist/src/core/status.js.map
251
+ dist/src/core/timeouts.d.ts
252
+ dist/src/core/timeouts.js
253
+ dist/src/core/timeouts.js.map
254
+ dist/src/core/types.d.ts
255
+ dist/src/core/types.js
256
+ dist/src/core/types.js.map
257
+ dist/src/dashboard/server.d.ts
258
+ dist/src/dashboard/server.js
259
+ dist/src/dashboard/server.js.map
260
+ dist/src/mcp/server.d.ts
261
+ dist/src/mcp/server.js
262
+ dist/src/mcp/server.js.map
263
+ dist/src/observability/logger.d.ts
264
+ dist/src/observability/logger.js
265
+ dist/src/observability/logger.js.map
266
+ dist/src/peers/anthropic.d.ts
267
+ dist/src/peers/anthropic.js
268
+ dist/src/peers/anthropic.js.map
269
+ dist/src/peers/base.d.ts
270
+ dist/src/peers/base.js
271
+ dist/src/peers/base.js.map
272
+ dist/src/peers/deepseek.d.ts
273
+ dist/src/peers/deepseek.js
274
+ dist/src/peers/deepseek.js.map
275
+ dist/src/peers/errors.d.ts
276
+ dist/src/peers/errors.js
277
+ dist/src/peers/errors.js.map
278
+ dist/src/peers/gemini.d.ts
279
+ dist/src/peers/gemini.js
280
+ dist/src/peers/gemini.js.map
281
+ dist/src/peers/model-selection.d.ts
282
+ dist/src/peers/model-selection.js
283
+ dist/src/peers/model-selection.js.map
284
+ dist/src/peers/openai.d.ts
285
+ dist/src/peers/openai.js
286
+ dist/src/peers/openai.js.map
287
+ dist/src/peers/registry.d.ts
288
+ dist/src/peers/registry.js
289
+ dist/src/peers/registry.js.map
290
+ dist/src/peers/retry.d.ts
291
+ dist/src/peers/retry.js
292
+ dist/src/peers/retry.js.map
293
+ dist/src/peers/stub.d.ts
294
+ dist/src/peers/stub.js
295
+ dist/src/peers/stub.js.map
296
+ dist/src/peers/text.d.ts
297
+ dist/src/peers/text.js
298
+ dist/src/peers/text.js.map
299
+ dist/src/security/redact.d.ts
300
+ dist/src/security/redact.js
301
+ dist/src/security/redact.js.map
302
+ docs/api-keys.md
303
+ docs/architecture.md
304
+ docs/costs.md
305
+ docs/github-security-baseline.md
306
+ docs/model-selection.md
307
+ docs/reports/cross-review-v2-api-capability-smoke-2026-04-30.md
308
+ docs/reports/cross-review-v2-format-recovery-findings-2026-04-28.md
309
+ package.json
310
+ ```
311
+
312
+ ## Regression Coverage Added
313
+
314
+ - `scripts/smoke.ts` verifies `CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS` accepts
315
+ positive integers and falls back to `20000` for invalid values.
316
+ - `scripts/smoke.ts` verifies all four adapters use the configured output
317
+ token value rather than hard-coded limits.
318
+ - `scripts/smoke.ts` verifies Anthropic, Gemini and DeepSeek thinking markers
319
+ are present in adapter source.
320
+ - `scripts/smoke.ts` verifies active priority lists do not contain
321
+ `claude-haiku-4-5`, `gemini-3-pro-preview`, `deepseek-chat` or
322
+ `deepseek-reasoner`.
323
+ - `scripts/smoke.ts` verifies a provider returning only
324
+ `claude-haiku-4-5-20251001` does not cause a silent weak-model downgrade; the
325
+ selected model remains `claude-opus-4-7` with `confidence=unknown`.
326
+ - `scripts/smoke.ts` verifies malformed, mismatched, overlapping, repeated,
327
+ CRLF and long private-key markers do not reintroduce the previous redaction
328
+ complexity issue.
329
+ - `scripts/smoke.ts` verifies empty peer output triggers the full decision retry
330
+ and records `decision_retry_succeeded`.
331
+ - `scripts/smoke.ts` verifies moderation recovery, model fallback, budget
332
+ preflight, cooperative cancellation, runtime events and metrics.
333
+
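The first regression item above (positive integers accepted, fallback to `20000` otherwise) can be sketched as a small pure function. This is a minimal illustration, not the package's implementation: the name `parseMaxOutputTokens`, the digits-only rule, and the treatment of an unset variable as invalid are all assumptions.

```typescript
// Hypothetical helper for illustration; the real parsing in the package may differ.
const DEFAULT_MAX_OUTPUT_TOKENS = 20000;

function parseMaxOutputTokens(raw: string | undefined): number {
  // Assumption: an unset variable is treated like an invalid value.
  if (raw === undefined) return DEFAULT_MAX_OUTPUT_TOKENS;
  const trimmed = raw.trim();
  // Assumption: only plain decimal digits count, so "-5", "12.5" and "1e3" are rejected.
  if (!/^\d+$/.test(trimmed)) return DEFAULT_MAX_OUTPUT_TOKENS;
  const value = Number(trimmed);
  // Zero and values beyond the safe integer range also fall back to the default.
  return Number.isSafeInteger(value) && value > 0 ? value : DEFAULT_MAX_OUTPUT_TOKENS;
}
```

A caller would pass the raw value of `CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS` and hand the result to every adapter, which keeps the limit in one place instead of hard-coding it per provider.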
334
+ ## Pre-Commit Identity
335
+
336
+ This evidence report is intentionally pre-commit. The candidate is based on
337
+ current `main` HEAD `b7ae98836dfe8c461d72b406e5ab30712705d765` plus the local
338
+ working-tree changes described above. The `release_date` constant is
339
+ intentionally set to `2026-04-30`, the planned ship date for package `2.1.1`.
340
+ The final commit SHA and GitHub release tag will be produced only after
341
+ cross-review approval and the commit/push workflow.
342
+
343
+ ## Release Implications
344
+
345
+ - The current model defaults are reachable with the user's keys.
346
+ - The runtime should continue to prefer advanced thinking-capable models only.
347
+ - `claude-haiku-4-5`, `gemini-3-pro-preview`, `deepseek-chat` and
348
+ `deepseek-reasoner` must stay out of active priority lists.
349
+ - If a provider model API returns candidates but none match the advanced
350
+ priority list, the runtime must keep the documented advanced fallback rather
351
+ than silently downgrading to a weaker returned candidate.
352
+ - `CROSS_REVIEW_V2_MAX_OUTPUT_TOKENS=20000` remains the configured production
353
+ output budget; this smoke used a smaller request budget to avoid unnecessary
354
+ test output while proving the parameters are accepted.
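The no-silent-downgrade rule in the release implications above can be sketched as follows. This is only an illustration under stated assumptions: `selectModel`, the `ModelSelection` shape and the `"verified"` confidence label are hypothetical, while `confidence=unknown` and the model IDs come from the report.

```typescript
// Illustrative sketch; not the package's model-selection code.
type Confidence = "verified" | "unknown"; // "verified" is a hypothetical label

interface ModelSelection {
  model: string;
  confidence: Confidence;
}

function selectModel(priority: readonly string[], visible: readonly string[]): ModelSelection {
  const [fallback] = priority;
  if (fallback === undefined) {
    throw new Error("priority list must not be empty");
  }
  const visibleSet = new Set(visible);
  // Prefer the first advanced model that the provider's model API actually lists.
  for (const candidate of priority) {
    if (visibleSet.has(candidate)) {
      return { model: candidate, confidence: "verified" };
    }
  }
  // Nothing on the advanced list is visible: keep the documented top priority
  // instead of downgrading to a weaker model the provider happened to return.
  return { model: fallback, confidence: "unknown" };
}
```

With a priority list of `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-6` and a catalog that exposes only `claude-haiku-4-5-20251001`, this sketch returns `claude-opus-4-7` with `confidence` set to `unknown`, which matches the behavior the smoke test is described as checking.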
@@ -0,0 +1,223 @@
1
+ # Cross Review v2 - Format Recovery Findings
2
+
3
+ Date: 2026-04-28, America/Sao_Paulo
4
+ Runtime: pre-stable API-first runtime 2.0.0-alpha.2
5
+
6
+ ## Context
7
+
8
+ This report records real operational issues found in the API-first cross-review runtime
9
+ that later became `cross-review-v2`, observed while reviewing the published Maestro Editorial AI v0.3.11
10
+ release.
11
+
12
+ The reviewed release itself was published successfully:
13
+
14
+ - Repository: `LCV-Ideas-Software/maestro-app`
15
+ - Commit: `ec37513`
16
+ - Release: `v0.3.11`
17
+ - Release URL: `https://github.com/LCV-Ideas-Software/maestro-app/releases/tag/v0.3.11`
18
+ - Release asset: `maestro-editorial-ai-v0.3.11-windows-x64-portable.zip`
19
+ - Asset SHA-256: `f97947b1a7ea74ae8d652d64fbbb0b9146fe5a2d6b60bf176aaa0346d66f6b62`
20
+ - CI, Release, CodeQL, and Code Quality runs: success
21
+ - Open code scanning alerts: 0
22
+ - Open Dependabot alerts: 0
23
+
24
+ ## Sessions
25
+
26
+ - `b560d4fb-640e-46cf-9ff3-26218cdfdddf`
27
+ - `16d55e54-4b8c-4153-8451-818c3fc37625`
28
+ - `41e5d453-84ed-45a3-9c6d-c70c31a9d9f9`
29
+
30
+ Relevant persisted files live under:
31
+
32
+ - `data/sessions/<session-id>/meta.json`
33
+ - `data/sessions/<session-id>/agent-runs/*.json`
34
+ - `data/sessions/b560d4fb-640e-46cf-9ff3-26218cdfdddf/evidence/`
35
+
36
+ ## Findings
37
+
38
+ ### 1. READY content can be classified as NEEDS_EVIDENCE when `summary` exceeds 800 chars
39
+
40
+ Several peers returned semantically clear `READY` decisions, but the structured
41
+ parser rejected the response because `summary` was longer than 800 characters.
42
+ The round then recorded the peer as `NEEDS_EVIDENCE`.
43
+
44
+ Observed examples:
45
+
46
+ - `b560d4fb-640e-46cf-9ff3-26218cdfdddf`
47
+ - `41e5d453-84ed-45a3-9c6d-c70c31a9d9f9`
48
+
49
+ The parser warning shape was:
50
+
51
+ ```text
52
+ summary: Too big: expected string to have <=800 characters
53
+ ```
54
+
55
+ Impact:
56
+
57
+ - A peer can agree with the decision but still block convergence.
58
+ - The operator must inspect raw artifacts to distinguish a true disagreement
59
+ from a formatting failure.
60
+ - This increases false-negative convergence results.
61
+
62
+ Recommended fix:
63
+
64
+ - Treat overlong summary as a recoverable format violation, not as substantive
65
+ `NEEDS_EVIDENCE`.
66
+ - Normalize server-side by truncating `summary` to the schema limit while
67
+ preserving the full raw text in the artifact.
68
+ - Add a parser warning such as `summary_truncated`, but keep `status=READY`
69
+ when the status is otherwise parseable and valid.
70
+
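The recommended fix for finding 1 can be sketched as a normalization step that runs before schema validation. This is a minimal sketch, assuming a hypothetical `normalizeSummary` helper and result shape; only the 800-character limit and the `summary_truncated` warning name come from the report.

```typescript
// Illustrative sketch; the helper name and result shape are assumptions.
const SUMMARY_LIMIT = 800;

interface NormalizedSummary {
  summary: string;    // value that satisfies the schema limit
  rawSummary: string; // full original text, kept for the artifact
  warnings: string[];
}

function normalizeSummary(raw: string, limit: number = SUMMARY_LIMIT): NormalizedSummary {
  if (raw.length <= limit) {
    return { summary: raw, rawSummary: raw, warnings: [] };
  }
  // Truncate instead of rejecting, and record why in the audit trail.
  // Note: slice() counts UTF-16 code units, so a surrogate pair at the cut may be split.
  return {
    summary: raw.slice(0, limit),
    rawSummary: raw,
    warnings: ["summary_truncated"],
  };
}
```

The peer's `status` is untouched by this step, so a `READY` response with a long summary stays `READY` and only gains a warning.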
71
+ ### 2. Recovery rounds can silently narrow quorum scope
72
+
73
+ After full-peer rounds produced `codex`, `gemini`, and `deepseek` as `READY`,
74
+ an isolated recovery call was sent only to `claude`. Claude returned `READY`,
75
+ and the runtime marked that round as converged because `expected_peers=["claude"]`.
76
+
77
+ Observed session:
78
+
79
+ - `41e5d453-84ed-45a3-9c6d-c70c31a9d9f9`
80
+
81
+ Impact:
82
+
83
+ - The tool can report `converged=true` for the recovery round while the session
84
+ no longer represents a single strict quadrilateral round.
85
+ - The correct human interpretation is "all peers reached READY across the
86
+ original round plus a format-recovery round", not "the latest full quorum
87
+ round converged".
88
+
89
+ Recommended fix:
90
+
91
+ - Add a first-class "format recovery" mode that retries only failed-format
92
+ peers but preserves the original quorum scope.
93
+ - Convergence should distinguish:
94
+ - `latest_round_converged`
95
+ - `session_quorum_converged`
96
+ - `recovery_converged`
97
+ - The public response should not collapse a recovery-only quorum into ordinary
98
+ strict unanimity.
99
+
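One way to keep the three convergence signals from finding 2 apart is to compute them from the full round history rather than from the latest call alone. The sketch below is an assumption-laden illustration: the `Round` shape, the `isRecovery` flag and the exact meaning given to each flag are hypothetical, and only the flag names come from the report.

```typescript
// Illustrative sketch; field semantics are assumptions, not the package's code.
type PeerStatus = "READY" | "NOT_READY" | "NEEDS_EVIDENCE";

interface Round {
  // Status returned by each peer that was actually called in this round.
  statuses: Record<string, PeerStatus>;
  // True when the round only retried peers whose earlier output failed format checks.
  isRecovery: boolean;
}

interface ConvergenceView {
  latest_round_converged: boolean;
  session_quorum_converged: boolean;
  recovery_converged: boolean;
  quorum_peers: string[];
}

function computeConvergence(quorumPeers: readonly string[], rounds: readonly Round[]): ConvergenceView {
  const view: ConvergenceView = {
    latest_round_converged: false,
    session_quorum_converged: false,
    recovery_converged: false,
    quorum_peers: [...quorumPeers],
  };
  const latest = rounds[rounds.length - 1];
  if (latest === undefined) return view;

  // Most recent status per peer across the whole session.
  const lastStatus = new Map<string, PeerStatus>();
  for (const round of rounds) {
    for (const [peer, status] of Object.entries(round.statuses)) {
      lastStatus.set(peer, status);
    }
  }

  const latestStatuses = Object.values(latest.statuses);
  view.latest_round_converged =
    latestStatuses.length > 0 && latestStatuses.every((s) => s === "READY");
  // The session converges only when every peer of the original quorum ended on READY.
  view.session_quorum_converged =
    quorumPeers.length > 0 && quorumPeers.every((p) => lastStatus.get(p) === "READY");
  view.recovery_converged = latest.isRecovery && view.latest_round_converged;
  return view;
}
```

In the observed session, a four-peer round followed by a Claude-only recovery round would yield all three flags as true while `quorum_peers` still lists the original four peers, so the response cannot be mistaken for a single strict round.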
100
+ ### 3. Minimal prompts can cause peers to review the schema instead of the decision
101
+
102
+ In session `16d55e54-4b8c-4153-8451-818c3fc37625`, the draft included a JSON
103
+ schema example with placeholders like `READY|NOT_READY`. DeepSeek interpreted
104
+ the template itself as the artifact under review and returned `NOT_READY`
105
+ because it saw no concrete decision.
106
+
107
+ Impact:
108
+
109
+ - Attempts to reduce prompt size for parser compliance can create semantic
110
+ ambiguity.
111
+ - The model may correctly reject the prompt, but for the wrong target.
112
+
113
+ Recommended fix:
114
+
115
+ - The runtime should inject a non-ambiguous response contract internally instead of
116
+ requiring the caller to include a schema template in `draft`.
117
+ - Use a separate transport-level response schema or provider-native structured
118
+ output where available.
119
+ - If a schema example must be included, wrap it in a clearly labeled
120
+ `RESPONSE_FORMAT_INSTRUCTIONS` block and keep the reviewed artifact separate.
121
+
122
+ ### 4. The runtime needs automated per-peer format retries
123
+
124
+ The operator had to manually create shorter prompts and isolated calls to
125
+ recover from parser failures.
126
+
127
+ Impact:
128
+
129
+ - Manual recovery is slow and easy to misinterpret.
130
+ - It can distort convergence scope, as described above.
131
+
132
+ Recommended fix:
133
+
134
+ - Add an automatic retry path when parsing fails but raw text includes a
135
+ recognizable status.
136
+ - Retry only the affected peer with a compact reformat instruction and the
137
+ original evidence.
138
+ - Preserve the original peer set in session convergence computation.
139
+ - Cap retries per peer to avoid runaway cost.
140
+
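The retry policy recommended for finding 4 can be sketched as a planning step that decides which peers to re-ask and leaves the quorum untouched. This is a minimal sketch under assumptions: `planFormatRecovery`, the `PeerAttempt` fields and the default cap of one retry are hypothetical names and values chosen for illustration.

```typescript
// Illustrative sketch; names and the default cap are assumptions.
interface PeerAttempt {
  peer: string;
  parsedOk: boolean;     // structured parse succeeded
  rawHasStatus: boolean; // raw text contains a recognizable status
  retriesUsed: number;   // format retries already spent on this peer
}

interface RecoveryPlan {
  retryPeers: string[];  // peers that get a compact reformat instruction
  quorumPeers: string[]; // original peer set, unchanged by recovery
}

function planFormatRecovery(attempts: readonly PeerAttempt[], maxRetriesPerPeer: number = 1): RecoveryPlan {
  const retryPeers = attempts
    // Retry only peers whose parse failed although a status is recognizable,
    // and only while they are under the per-peer cap.
    .filter((a) => !a.parsedOk && a.rawHasStatus && a.retriesUsed < maxRetriesPerPeer)
    .map((a) => a.peer);
  return { retryPeers, quorumPeers: attempts.map((a) => a.peer) };
}
```

Keeping `quorumPeers` separate from `retryPeers` is the design point: the retry can target a single peer without shrinking the set that session convergence is computed over.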
141
+ ### 5. Raw status extraction should be separated from structured payload validation
142
+
143
+ The current behavior appears to conflate:
144
+
145
+ - status detection (`READY`, `NOT_READY`, `NEEDS_EVIDENCE`)
146
+ - structured object validation
147
+ - convergence eligibility
148
+
149
+ Impact:
150
+
151
+ - A valid status can be hidden by a non-critical structured validation issue.
152
+
153
+ Recommended fix:
154
+
155
+ - Parse status first.
156
+ - Validate structured fields second.
157
+ - Classify format defects by severity:
158
+ - fatal: no recognizable status, invalid JSON with no recoverable status
159
+ - recoverable: overlong summary, too many follow-ups, missing optional fields
160
+ - warning: extra fields, markdown fence around JSON
161
+ - Allow recoverable defects to become `READY_WITH_WARNINGS` internally, while
162
+ still counting as READY for convergence if the status is unambiguous.
163
+
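The "status first, structure second" ordering from finding 5 can be sketched as below. This is an illustration only: `parsePeerResponse`, the warning names `markdown_fence_removed` and `invalid_json`, and the regular expressions are assumptions, not the package's parser.

```typescript
// Illustrative sketch; names, warnings and patterns are assumptions.
type PeerStatus = "READY" | "NOT_READY" | "NEEDS_EVIDENCE";

interface ParsedPeerResponse {
  status: PeerStatus | null; // null means no recognizable status (a fatal defect)
  payload: unknown;          // parsed JSON, or null when parsing failed
  warnings: string[];
}

// Built at runtime so the pattern does not contain a literal code fence.
const TICKS = "`".repeat(3);
const FENCED_BLOCK = new RegExp(TICKS + "(?:json)?\\s*([\\s\\S]*?)" + TICKS);
const STATUS_KEY = /"status"\s*:\s*"(READY|NOT_READY|NEEDS_EVIDENCE)"/;

function parsePeerResponse(raw: string): ParsedPeerResponse {
  const warnings: string[] = [];

  // Step 1: detect the status on the raw text, independent of JSON validity.
  const statusMatch = STATUS_KEY.exec(raw);
  const status = statusMatch ? (statusMatch[1] as PeerStatus) : null;

  // Step 2: validate structure; a markdown fence is a warning, not a rejection.
  let body = raw.trim();
  const fenced = FENCED_BLOCK.exec(body);
  if (fenced) {
    body = (fenced[1] ?? "").trim();
    warnings.push("markdown_fence_removed");
  }

  let payload: unknown = null;
  try {
    payload = JSON.parse(body);
  } catch {
    // Recoverable when a status was found; fatal only if status is null as well.
    warnings.push("invalid_json");
  }
  return { status, payload, warnings };
}
```

Because the status is read before JSON parsing, a truncated or fenced payload still yields a usable `READY` or `NOT_READY`, and convergence logic can decide separately how much weight to give the warnings.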
164
+ ## Suggested Acceptance Tests
165
+
166
+ 1. Peer returns valid JSON with `status=READY` and a 1,500-character summary.
167
+ Expected: status counts as READY; summary is truncated or moved to raw text;
168
+ parser warning is recorded.
169
+
170
+ 2. Four-peer round where three peers parse READY and one peer has overlong
171
+ summary but raw status READY.
172
+ Expected: session convergence can become true without manual intervention
173
+ after automatic recovery or normalization.
174
+
175
+ 3. Recovery call for one peer after a four-peer round.
176
+ Expected: session-level quorum remains the original four peers; the response
177
+ explicitly reports that the latest call was a format recovery.
178
+
179
+ 4. Draft includes a response schema example and a separate artifact.
180
+ Expected: peers review the artifact, not the schema placeholder.
181
+
182
+ 5. Peer returns markdown-fenced JSON.
183
+ Expected: parser extracts JSON and records a warning instead of rejecting.
184
+
185
+ ## Operator Interpretation for Maestro v0.3.11
186
+
187
+ For the Maestro v0.3.11 review, the substantive result was favorable:
188
+
189
+ - Codex: READY
190
+ - Gemini: READY
191
+ - DeepSeek: READY
192
+ - Claude: READY after isolated format-recovery prompt
193
+
194
+ However, because Claude's READY was obtained in an isolated recovery call, the
195
+ runtime should not present this as a normal single-round quadrilateral convergence.
196
+ It should present it as recovered unanimity with explicit scope and audit trail.
197
+
198
+ ## Implementation Update
199
+
200
+ Implemented locally after this report:
201
+
202
+ - Overlong `summary`, `evidence_sources`, `caller_requests` and `follow_ups`
203
+ fields are now normalized server-side when the peer status is unambiguous.
204
+ - Parser warnings now preserve the recovery reason in the audit trail instead
205
+ of converting the peer to a false `NEEDS_EVIDENCE`.
206
+ - Markdown-fenced JSON and tagged JSON are extracted with explicit parser
207
+ warnings.
208
+ - Invalid JSON with an unambiguous `"status": "..."` key is recovered as a
209
+ status-only structured result.
210
+ - Responses with no parseable status now trigger one automatic per-peer format
211
+ recovery attempt before the round is judged blocked.
212
+ - Recovery calls that cover only a subset of peers now preserve the prior
213
+ expected quorum and expose `latest_round_converged`,
214
+ `session_quorum_converged`, `recovery_converged` and `quorum_peers`.
215
+ - `statusInstruction()` now tells peers not to review the response-format
216
+ instructions as the artifact under review.
217
+
218
+ Validated with:
219
+
220
+ - `npm run typecheck`
221
+ - `npm run smoke`
222
+ - `npm run build`
223
+ - `npm run lint`
@@ -0,0 +1,60 @@
1
+ # cross-review-v2 Official Provider Docs Refresh — 2026-05-05
2
+
3
+ Scope: official documentation check for the five cross-review-v2 peers before
4
+ the v2.16.0 protocol repair release.
5
+
6
+ ## Sources Checked
7
+
8
+ - OpenAI — GPT-5.5 latest-model guide:
9
+ https://developers.openai.com/api/docs/guides/latest-model
10
+ - OpenAI — model catalog:
11
+ https://developers.openai.com/api/docs/models
12
+ - OpenAI — Responses API reasoning fields:
13
+ https://developers.openai.com/api/reference/resources/responses
14
+ - Anthropic — Claude model overview:
15
+ https://platform.claude.com/docs/en/about-claude/models/overview
16
+ - Anthropic — extended/adaptive thinking:
17
+ https://platform.claude.com/docs/en/build-with-claude/extended-thinking
18
+ - Google — Gemini models:
19
+ https://ai.google.dev/gemini-api/docs/models
20
+ - Google — Gemini thinking:
21
+ https://ai.google.dev/gemini-api/docs/thinking
22
+ - DeepSeek — API changelog:
23
+ https://api-docs.deepseek.com/updates
24
+ - DeepSeek — reasoning model guide:
25
+ https://api-docs.deepseek.com/guides/reasoning_model
26
+ - xAI — Grok reasoning:
27
+ https://docs.x.ai/developers/model-capabilities/text/reasoning
28
+ - xAI — Grok multi-agent:
29
+ https://docs.x.ai/developers/model-capabilities/text/multi-agent
30
+ - xAI — models and pricing / aliases:
31
+ https://docs.x.ai/developers/models
32
+
33
+ ## Findings Applied
34
+
35
+ - OpenAI: `gpt-5.5` remains the correct top Codex/OpenAI priority. Responses
36
+ API reasoning effort through `xhigh` is still compatible with the adapter.
37
+ - Anthropic: `claude-opus-4-7` remains the strongest Claude default for complex
38
+ reasoning and agentic coding. The adapter's adaptive-thinking path remains
39
+ aligned with current docs.
40
+ - Gemini: `gemini-3.1-pro-preview` remains the correct advanced Gemini priority.
41
+ `gemini-3-pro-preview` is deprecated/shut down and remains excluded.
42
+ - DeepSeek: `deepseek-v4-pro` and `deepseek-v4-flash` are the current V4 API
43
+ models. Legacy `deepseek-chat` and `deepseek-reasoner` are scheduled for
44
+ discontinuation on 2026-07-24 and remain excluded from priority fallbacks.
45
+ - Grok: `GROK_API_KEY` is canonical in this project. The xAI model catalog
46
+ currently recommends `grok-4.3` for general Chat API use, while the reasoning
47
+ docs identify `grok-4.20-multi-agent` as the only Grok model that accepts
48
+ explicit `reasoning.effort`. Automatic-reasoning models such as
49
+ `grok-4-latest`, `grok-4.3`, `grok-4.20`, and `grok-4.20-reasoning` must omit
50
+ that field. The priority list preserves operator choice through
51
+ `CROSS_REVIEW_GROK_MODEL` and keeps the explicit multi-agent model first for
52
+ cross-review runs that require agent-count control.
53
+
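The Grok finding above (only `grok-4.20-multi-agent` accepts an explicit `reasoning.effort`, while automatic-reasoning models must omit it) can be sketched as a conditional request builder. This is a hedged illustration: `buildGrokRequest` and the `GrokRequestSketch` shape are hypothetical and deliberately minimal, and they are not the xAI request schema or the adapter's actual code.

```typescript
// Illustrative sketch; the request shape is a simplified assumption.
const EXPLICIT_EFFORT_MODELS: ReadonlySet<string> = new Set(["grok-4.20-multi-agent"]);

interface GrokRequestSketch {
  model: string;
  reasoning?: { effort: string };
}

function buildGrokRequest(model: string, effort: string): GrokRequestSketch {
  const request: GrokRequestSketch = { model };
  // Attach the field only for models documented as accepting it;
  // automatic-reasoning models get a request without the key at all.
  if (EXPLICIT_EFFORT_MODELS.has(model)) {
    request.reasoning = { effort };
  }
  return request;
}
```

Omitting the key entirely, rather than sending an empty or null value, reflects the report's wording that those models "must omit that field".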
54
+ ## Code/Docs Changes
55
+
56
+ - Updated `src/peers/model-selection.ts` Grok priority list and docs URL.
57
+ - Clarified Grok model/effort behavior in `src/peers/grok.ts`,
58
+ `src/core/config.ts`, `README.md`, and `docs/model-selection.md`.
59
+ - Added smoke coverage so the official-doc-backed priority list keeps current
60
+ model IDs and excludes known deprecated/weak IDs.