academic-army 0.1.0

Files changed (68)
  1. package/.editorconfig +9 -0
  2. package/.github/workflows/publish.yml +44 -0
  3. package/.prettierrc.json +3 -0
  4. package/LICENSE +21 -0
  5. package/README.md +172 -0
  6. package/README.zh-CN.md +172 -0
  7. package/agent-forge.yaml +83 -0
  8. package/eslint.config.js +28 -0
  9. package/install_mcp.py +85 -0
  10. package/mcp-server/__main__.py +33 -0
  11. package/mcp-server/deepresearch/__init__.py +3 -0
  12. package/mcp-server/deepresearch/tools.py +33 -0
  13. package/mcp-server/requirements.txt +4 -0
  14. package/metaskills/README.md +131 -0
  15. package/metaskills/README.zh-CN.md +131 -0
  16. package/metaskills/academic-army-architect/METASKILL.md +91 -0
  17. package/metaskills/academic-army-architect/envolve.sh +9 -0
  18. package/metaskills/academic-army-coding-plan/ENVOLVETASK.md +1 -0
  19. package/metaskills/academic-army-coding-plan/METASKILL.md +118 -0
  20. package/metaskills/academic-army-coding-plan/envolve.sh +9 -0
  21. package/metaskills/academic-army-coding-style/METASKILL.md +292 -0
  22. package/metaskills/academic-army-experiment-plan/ENVOLVETASK.md +1 -0
  23. package/metaskills/academic-army-experiment-plan/METASKILL.md +82 -0
  24. package/metaskills/academic-army-experiment-plan/envolve.sh +9 -0
  25. package/metaskills/academic-army-repo-scaffold/ENVOLVETASK.md +1 -0
  26. package/metaskills/academic-army-repo-scaffold/METASKILL.md +223 -0
  27. package/metaskills/academic-army-repo-scaffold/envolve.sh +9 -0
  28. package/package.json +35 -0
  29. package/runs/develop-skill.sh +17 -0
  30. package/runs/develop.sh +16 -0
  31. package/skills/academic-army-architect/SKILL.md +336 -0
  32. package/skills/academic-army-architect/agents/openai.yaml +11 -0
  33. package/skills/academic-army-architect/references/blueprint-schema.md +345 -0
  34. package/skills/academic-army-coding-plan/SKILL.md +491 -0
  35. package/skills/academic-army-coding-plan/agents/openai.yaml +11 -0
  36. package/skills/academic-army-coding-style/SKILL.md +915 -0
  37. package/skills/academic-army-coding-style/agents/openai.yaml +11 -0
  38. package/skills/academic-army-experiment-plan/SKILL.md +517 -0
  39. package/skills/academic-army-experiment-plan/agents/openai.yaml +11 -0
  40. package/skills/academic-army-repo-scaffold/SKILL.md +756 -0
  41. package/skills/academic-army-repo-scaffold/agents/openai.yaml +10 -0
  42. package/src/README.md +79 -0
  43. package/src/README.zh-CN.md +79 -0
  44. package/src/cli.ts +55 -0
  45. package/src/developing/README.md +146 -0
  46. package/src/developing/README.zh-CN.md +146 -0
  47. package/src/developing/agents/developer.ts +40 -0
  48. package/src/developing/agents/factory.ts +11 -0
  49. package/src/developing/agents/index.ts +8 -0
  50. package/src/developing/agents/manager.ts +74 -0
  51. package/src/developing/agents/prompts.ts +12 -0
  52. package/src/developing/agents/reviewer.ts +44 -0
  53. package/src/developing/agents/trajectory-optimizer.ts +70 -0
  54. package/src/developing/agents/types.ts +41 -0
  55. package/src/developing/index.ts +2 -0
  56. package/src/developing/pipeline.ts +306 -0
  57. package/src/developing/pipelineskill.ts +169 -0
  58. package/src/evolve-skill/README.md +116 -0
  59. package/src/evolve-skill/README.zh-CN.md +116 -0
  60. package/src/evolve-skill/agents/evaluator.ts +28 -0
  61. package/src/evolve-skill/agents/factory.ts +11 -0
  62. package/src/evolve-skill/agents/index.ts +4 -0
  63. package/src/evolve-skill/agents/modifier.ts +27 -0
  64. package/src/evolve-skill/agents/runner.ts +19 -0
  65. package/src/evolve-skill/index.ts +1 -0
  66. package/src/evolve-skill/pipeline.ts +140 -0
  67. package/src/pipeline.ts +65 -0
  68. package/tsconfig.json +22 -0
@@ -0,0 +1,915 @@
1
+ ---
2
+ name: academic-army-coding-style
3
+ description: >-
4
+ Maintain clean, local, low-coupling code trajectories in existing Academic
5
+ Army research repositories. Use when Codex writes or edits code, refactors
6
+ modules, implements features, harnesses, tests, methods, baselines, metrics,
7
+ result exports, or framework docs. This skill does not initialize template
8
+ repositories or generate full project scaffolds from empty directories.
9
+ ---
10
+
11
+ # Academic Army Coding Style
12
+
13
+ ## Mission
14
+
15
+ Produce clear, direct, maintainable code inside an existing repository. The
16
+ upstream user task decides what to implement; this skill decides how to keep the
17
+ implementation readable, local, low-coupling, and consistent with the current
18
+ framework.
19
+
20
+ Do not use this skill to create a repository template from scratch. Template
21
+ initialization belongs to the dedicated initialization skill. This skill may add
22
+ files, modules, subfolders, tests, harness support, or documentation only when
23
+ the current task or current repository framework needs them.
24
+
25
+ ## Operating Boundary
26
+
27
+ Use the user-specified repository root as the boundary for project work. Do not
28
+ create, modify, or reference project files outside that root unless the user
29
+ explicitly asks.
30
+
31
+ Respect the current repository structure, language ecosystem, naming style,
32
+ test layout, and framework documents. Improve them locally when they make the
33
+ current feature hard to read, test, or change, but do not redesign the whole
34
+ repository just because upstream plans describe future systems.
35
+
36
+ Ignore unrelated drafts, logs, old outputs, historical runs, and nearby files
37
+ unless the user makes them part of the task.
38
+
39
+ Keep the fixed experiment directories when they already exist:
40
+
41
+ - `data/`: input datasets, pointers, traces, manifests, fixtures, or sample data
42
+ - `output/`: program-run outputs and intermediate artifacts
43
+ - `results/`: experiment result records and curated artifacts
44
+ - `harness/`: harness code, harness contracts, configs, schemas, samples, and
45
+ support files
46
+
47
+ Do not force a fixed test directory. Tests follow the repository's existing
48
+ layout, project configuration, initialization docs, or adjacent test style.
49
+
50
+ ## Task Classification
51
+
52
+ Before editing, classify the task:
53
+
54
+ - **Feature or code implementation**: implement the smallest clear code path
55
+ that satisfies the requested behavior.
56
+ - **Refactor or structure cleanup**: move, split, merge, rename, or delete code
57
+ only to improve the current change's locality, readability, or testability.
58
+ - **Harness work**: keep harness code under the relevant `harness/` subfolder;
59
+ make the harness objective, inputs, metrics, raw artifacts, and run loop
60
+ explicit.
61
+ - **Test work**: place tests in the existing test system's natural location;
62
+ keep each test focused on one behavior with small fixtures or toy inputs.
63
+ - **Method, baseline, metric, or export work**: keep the change close to that
64
+ extension point and update registration, docs, exports, and tests only when
65
+ those surfaces are explicitly in scope.
66
+ - **Validation-only pass**: run the exact requested focused validation before
67
+ editing. If it passes, make no source, test, docs, dependency, export, or TODO
68
+ changes except removing generated cache artifacts. If it fails, inspect the
69
+ failure first and make only the smallest local fix to the accepted contract
70
+ named by the failing test or source path.
71
+ Before running, verify that every explicitly requested test target exists; a
72
+ missing target is a validation blocker to report, not permission to create a
73
+ new test, drop that target from the command, or run a narrower substitute.
74
+ Existing dirty or untracked files that are part of the accepted in-progress
75
+ surface are not a reason to reset or clean the repository; validate the
76
+ current tree as given and remove only cache artifacts created by the run. If
77
+ no cache or bytecode artifacts are present after the run, report that finding
78
+ instead of treating cleanup as a required edit.
79
+ - **Framework/documentation sync**: update `FRAMEWORK.md` and
80
+ `FRAMEWORK.zh-CN.md` when module boundaries, extension points, harness/test
81
+ organization, artifact schemas, or repository responsibilities change.
82
+ - **Repository integrity repair**: restore, revert, or clean only the explicitly
83
+ requested tracked files or artifacts from the trusted source named by the
84
+ user. Do not use planning documents, old reports, or memory to recreate
85
+ missing files, and do not advance the repository with new implementation,
86
+ tests, harnesses, dependencies, generated artifacts, or TODO direction unless
87
+ the user separately requests that work.
88
+ - **Docs-only scaffold or handoff repair**: describe the current repository
89
+ exactly. Do not backfill missing code, tests, loaders, schemas, dependencies,
90
+ entrypoints, results, TODO status, or execution claims.
91
+ - **Trajectory or TODO maintenance**: record only accepted, verified work, and record a
92
+ next task only when a next task was requested or already exists as part of
93
+ the active workflow. Do not invent implementation work from broad plans after
94
+ a restore, cleanup, or docs-only repair.
95
+
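The validation-only rule above (a missing test target is a blocker to report, not something to work around) can be sketched as a pre-run check. This is a minimal illustration, not part of the skill: `missing_test_targets` is a hypothetical helper, and the pytest-style `path::test_name` target format is an assumption.

```python
import tempfile
from pathlib import Path


def missing_test_targets(repo_root: Path, targets: list[str]) -> list[str]:
    """Return the requested targets whose file part does not exist under repo_root."""
    missing = []
    for target in targets:
        # Assumed pytest-style node IDs such as "tests/test_x.py::test_name";
        # only the file part before "::" is checked here.
        file_part = target.split("::", 1)[0]
        if not (repo_root / file_part).exists():
            missing.append(target)
    return missing


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tests").mkdir()
    (root / "tests" / "test_loader.py").write_text("def test_ok():\n    assert True\n")
    requested = ["tests/test_loader.py::test_ok", "tests/test_missing.py"]
    blockers = missing_test_targets(root, requested)
```

If `blockers` is non-empty, the pass would stop and report those targets rather than create them or run a narrower command.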
96
+ If the task is broad, choose a local implementation slice that can be reviewed.
97
+ If the next useful work would require real data, algorithmic evidence, harness
98
+ runs, metrics, experiments, baselines, or generated results beyond the user's
99
+ request, pause or mark the trajectory finished instead of inventing another
100
+ static task.
101
+
102
+ ## Pre-Edit Inventory
103
+
104
+ For existing repositories, establish a small task-relevant inventory:
105
+
106
+ - target repository root and version-control root
107
+ - files and directories relevant to the task
108
+ - files expected to be created, modified, deleted, and left untouched
109
+ - current test layout and harness layout when either is relevant
110
+ - whether task-relevant files are present, untracked, tracked, or absent
111
+ - for record-backed helpers, the accepted constructor fields, constructor
112
+ defaults, validation owner, existing `record_id` prefix and identity payload,
113
+ and whether package-level exports are in scope
114
+
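One possible way to gather the tracked, untracked, and deleted part of this inventory is to parse `git status --porcelain` output. The sketch below is illustrative only: `classify_status_lines` and its bucket names are hypothetical, and it assumes the version 1 porcelain layout (two status characters, a space, then the path).

```python
from collections import defaultdict


def classify_status_lines(porcelain_output: str) -> dict[str, list[str]]:
    """Group paths from `git status --porcelain` (v1) output into coarse buckets."""
    inventory: dict[str, list[str]] = defaultdict(list)
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        code, path = line[:2], line[3:]
        if code == "??":
            inventory["untracked"].append(path)
        elif "D" in code:
            inventory["deleted"].append(path)
        else:
            # Everything else (modified, added, renamed, ...) lands here;
            # rename entries keep their raw "old -> new" text.
            inventory["changed"].append(path)
    return dict(inventory)


# Sample text standing in for real command output.
sample = " M src/metrics.py\n?? tests/test_new_helper.py\n D harness/config.yaml\n"
inventory = classify_status_lines(sample)
```

For an integrity-repair task, the same parse could be run before and after the restore to confirm that the requested status entries are gone.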
115
+ Treat a suddenly empty or partially missing tree as an integrity blocker. Do not
116
+ reconstruct missing code from memory, reports, or plans unless the user requests
117
+ restoration from a trusted source.
118
+
119
+ For integrity-repair tasks, inventory the exact status entries before editing,
120
+ restore only those requested paths from the trusted source, and verify that the
121
+ same status entries are gone afterward. A successful restore is complete when
122
+ the requested version-control state is reached; it is not a signal to begin the
123
+ next implementation slice.
124
+
125
+ ## Implementation Style
126
+
127
+ Prefer code that is short, direct, and easy to read in execution order. The data
128
+ flow should be visible: where inputs come from, how they are transformed, where
129
+ they go, and what is returned or written.
130
+
131
+ Use names from the paper blueprint, experiment plan, coding plan, user request,
132
+ and current code semantics. Keep one concept's spelling consistent across code,
133
+ config, harnesses, tests, artifacts, prompts, and docs.
134
+
135
+ Keep responsibilities single:
136
+
137
+ - one file should mainly carry one interface, adapter family, metric family,
138
+ harness entry/support area, data-processing step, export shape, or test group
139
+ - split files that mix unrelated change reasons or abstraction levels
140
+ - merge or simplify files that only add thin wrappers, pure forwarding, or
141
+ extra jumps between files
142
+ - avoid `utils`, `misc`, mega-runners, and all-in-one modules unless they are
143
+ already narrow and stable
144
+
145
+ Prefer inline or local helpers when logic is used once and remains readable.
146
+ Extract helpers, adapters, registries, factories, contexts, or interfaces only
147
+ when they provide real reuse, isolate a stable boundary, preserve an invariant,
148
+ reduce caller code, or make tests simpler.
149
+
150
+ Do not add abstractions for imagined future cases. If a simple implementation
151
+ clearly satisfies the current task, keep it simple.
152
+
153
+ Reduce global state, hidden path assumptions, implicit side effects, long
154
+ calling chains, repeated registration points, and heavy configuration for simple
155
+ experiments.
156
+
157
+ When an interface forces every caller to pass excessive parameters, consider a
158
+ small explicit context or config object. Do not turn that into a framework when
159
+ plain values remain clearer.
160
+
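As a rough illustration of the "small explicit context" idea above, a frozen dataclass can bundle values that every caller would otherwise pass separately. `RunContext`, its fields, and `describe_run` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    """Hypothetical bundle of values that each caller was passing one by one."""

    seed: int
    split: str
    output_dir: str


def describe_run(method_name: str, context: RunContext) -> str:
    """Use the context explicitly; nothing is read from global state."""
    return f"{method_name} seed={context.seed} split={context.split} -> {context.output_dir}"


summary = describe_run("baseline", RunContext(seed=0, split="val", output_dir="output/baseline"))
```

If a function only ever needs one or two of these values, passing them as plain arguments likely remains the clearer choice.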
161
+ ## Change Locality
162
+
163
+ Before writing code, identify the natural owner of the change:
164
+
165
+ - a method change should mainly touch method code and necessary comparison or
166
+ registration surfaces
167
+ - a baseline change should mainly touch baseline code and focused tests;
168
+ comparison, registration, docs, package exports, and harness surfaces change
169
+ only when explicitly requested
170
+ - a metric change should mainly touch metric definition, computation, export,
171
+ and tests
172
+ - a public package export change should mainly touch the package entrypoint or
173
+ existing export module plus a focused export-surface test; keep `__all__`
174
+ exactly aligned with the intended public names and do not export helpers or
175
+ adjacent internal types unless the task asks for them
176
+ - a harness change should mainly touch the relevant `harness/<name>/` area plus
177
+ necessary shared interfaces
178
+ - a result-artifact change should mainly touch artifact schema/export logic and
179
+ tests
180
+ - a loader or manifest change should mainly touch that input layer and tests
181
+
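The export-surface item above asks for `__all__` to match the intended public names exactly. A focused check might look like the following sketch; the stand-in module, `normalize_record`, and `check_export_surface` are all hypothetical, and a real test would import the actual package entrypoint instead.

```python
import types

# Stand-in for a package entrypoint so the sketch is self-contained.
package = types.ModuleType("example_package")
package.normalize_record = lambda mapping: dict(mapping)
package._internal_helper = lambda: None
package.__all__ = ["normalize_record"]

EXPECTED_PUBLIC_NAMES = {"normalize_record"}


def check_export_surface(module: types.ModuleType, expected: set[str]) -> None:
    """Fail if __all__ has duplicates, extra names, missing names, or undefined names."""
    exported = list(module.__all__)
    assert len(exported) == len(set(exported)), "duplicate names in __all__"
    assert set(exported) == expected, "public surface drifted from the intended names"
    for name in exported:
        assert hasattr(module, name), f"{name} is listed in __all__ but not defined"


check_export_surface(package, EXPECTED_PUBLIC_NAMES)
```

Comparing against an explicit expected set means an accidentally exported helper fails the test instead of silently widening the public surface.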
182
+ If one feature requires unrelated edits across many areas, treat that as a
183
+ framework-boundary risk. Do the smallest local refactor that brings related
184
+ code together, or report the coupling if a safe local refactor is outside the
185
+ current task.
186
+
187
+ Keep code that changes together close. Keep unrelated reasons to change in
188
+ separate modules. Public/shared layers should contain only stable capabilities
189
+ needed by multiple users; special cases should stay near their use sites.
190
+
191
+ ## Harness And Test Discipline
192
+
193
+ Harnesses serve paper goals, performance comparison, method screening, module
194
+ optimization, and experiment evaluation. Tests serve functional correctness,
195
+ interfaces, data formats, config parsing, metrics, export behavior, and basic
196
+ module interaction.
197
+
198
+ Keep harness and test responsibilities separate:
199
+
200
+ - harnesses should expose stable entry semantics, input protocols, metric names,
201
+ raw artifact records, seeds, splits, config snapshots, and parseable outputs
202
+ - tests should use small fixtures, toy inputs, and clear pass/fail expectations
203
+ - each test function should keep one named behavioral responsibility; when a
204
+ task adds a focused check, do not move or append unrelated assertions into
205
+ that test just because they use the same fixture or module
206
+ - no-mutation tests should pass the mutable sequence, mapping, or object that
207
+ they later inspect into the implementation; do not pass a tuple, copy, or
208
+ derived proxy and then assert that the untouched original stayed unchanged
209
+ - when the no-mutation contract is about source record objects rather than
210
+ container ordering, it is enough to inspect those exact objects before and
211
+ after the call; do not force a mutable container unless the container itself
212
+ is part of the mutation contract
213
+ - preserve existing test responsibility boundaries when editing a test file:
214
+ export-surface assertions belong in export tests, invalid-state assertions
215
+ belong in invalid-state tests, and identity or schema assertions belong in
216
+ their own clearly named tests
217
+ - harness code should not become functional test code
218
+ - test code should not become paper-performance evaluation
219
+ - plotting and paper table generation should consume raw or metric artifacts
220
+ outside the core harness logic
221
+
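The no-mutation guidance in the list above can be sketched as follows. `sort_by_score` is a hypothetical implementation under test; the point is that the test passes the same mutable list it later inspects, rather than a tuple or copy.

```python
import copy


def sort_by_score(records: list[dict]) -> list[dict]:
    """Hypothetical implementation under test: returns a new sorted list."""
    return sorted(records, key=lambda record: record["score"])


def test_sort_by_score_does_not_mutate_input() -> None:
    records = [{"id": "b", "score": 2}, {"id": "a", "score": 1}]
    snapshot = copy.deepcopy(records)  # independent copy used only for comparison

    # Pass the very list that is inspected afterwards.
    sort_by_score(records)

    assert records == snapshot


test_sort_by_score_does_not_mutate_input()
```

The test keeps a single responsibility: ordering of the returned list would belong in its own clearly named test.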
222
+ When a harness grows, split support modules inside that harness's own folder
223
+ before pushing special logic into a shared layer. When tests grow, split them in
224
+ the existing test system's style.
225
+
226
+ ## Framework Documents
227
+
228
+ Maintain `FRAMEWORK.md` and `FRAMEWORK.zh-CN.md` when the task changes the
229
+ repository's framework surface. The English document uses English; the Chinese
230
+ document uses Chinese and may keep code identifiers, method names, metric names,
231
+ harness names, and config keys in English.
232
+
233
+ Framework docs should describe current reality, not template initialization and
234
+ not aspirational implementation status. Include only capabilities that exist on
235
+ disk and distinguish:
236
+
237
+ - implemented source code
238
+ - reserved or documented extension points
239
+ - authored but unrun tests
240
+ - runnable harnesses versus README-only harness specs
241
+ - raw artifact schemas versus generated result artifacts
242
+ - installed or declared dependencies versus planned dependencies
243
+
244
+ Useful framework docs explain where future local changes should happen:
245
+
246
+ - stable boundaries and extension points
247
+ - change map from feature type to module/harness/test/export area
248
+ - harness purposes, metrics, and raw artifacts
249
+ - test organization actually used by the repository
250
+ - raw-first export approach and downstream analysis boundary
251
+ - framework risks where future changes cannot yet stay local
252
+
253
+ Do not put skill internals, tool mechanics, sandbox notes, or generation process
254
+ inside framework docs.
255
+
256
+ For README-style documentation syncs, update the requested overview, package,
257
+ and test documents as a consistent current-state map. Name implemented modules,
258
+ public entrypoints, helper contracts, and focused test files exactly as they
259
+ exist, and keep capability lists aligned across root, package, and test READMEs.
260
+ Before editing, read each requested README-style file and classify it as current,
261
+ stale, or internally inconsistent. Edit only the files that are stale or
262
+ inconsistent. If a requested doc already describes the accepted state, leave it
263
+ unchanged and say it was verified current; do not rewrite it for cosmetic
264
+ parallelism. If all requested docs are current, report a no-op docs sync rather
265
+ than creating formatting churn.
266
+ If the task text says the docs are stale but the live files already contain the
267
+ accepted helper names, metric names, test coverage, and narrowed absence clauses,
268
+ trust the current files over the stale premise. Report the exact surfaces that
269
+ were verified current and make no edit.
270
+ When a newly accepted helper and its focused test exist on disk but are missing
271
+ from current-surface lists, package-helper lists, layout tables, or test
272
+ summaries, treat that as a real stale-doc reason. Update each requested doc only
273
+ at the surfaces needed to make the current-state map complete and parallel with
274
+ accepted sibling helpers.
275
+ When an accepted surface changes from one implementation to multiple sibling
276
+ implementations, update singular wording in prose, package summaries, layout
277
+ tables, and test summaries together. A row that still says "the scheduler" or
278
+ "the helper" after documenting two accepted siblings is stale even when the
279
+ module filename is already listed correctly.
280
+ Also scan negative or absence statements in the requested docs, such as "no
281
+ metric tests are present" or "no helper exists"; update those when the accepted
282
+ files now exist, while preserving narrower exclusions such as no metric formulas,
283
+ aggregation logic, result exporters, harnesses, or experiments.
284
+ If an absence statement names a broad capability category that now has one
285
+ accepted bounded exception, rewrite the category instead of leaving a
286
+ contradiction. For example, after accepting a minimum baseline scheduler, do not
287
+ keep "no scheduling methods exist"; say no proposed RefABR algorithms, robust
288
+ MPC/BOLA/learned policies, full scheduling infrastructure, harnesses,
289
+ experiments, or paper outputs exist beyond that accepted baseline.
290
+ When a helper name overlaps with planned runtime work, such as metrics or method
291
+ manifests, keep or add explicit absence clauses for the adjacent runtime
292
+ capabilities so readers do not infer formulas, selection logic, runners, exports,
293
+ experiments, or paper outputs.
294
+ When documenting a bounded helper, state what it accepts, what it returns, and
295
+ which owner performs deeper validation, but do not convert that description into
296
+ claims about loaders, registries, fixtures, harnesses, metrics, runnable
297
+ experiments, dependencies, generated outputs, or paper results.
298
+ For metric-record helpers, describe only mapping-to-`MetricRecord`
299
+ normalization and focused validation. Do not let the word metric imply metric
300
+ formula implementation, computation, aggregation, result export, harness
301
+ execution, or paper-result generation.
302
+ For bounded metric-computation helper docs, describe only the accepted formula
303
+ that exists, its raw input record type, returned `MetricRecord` fields, and
304
+ focused validation. Narrow stale "no metrics" or "no metric computation"
305
+ clauses to exclude additional metric formulas, QoE weighting, aggregation
306
+ frameworks, result exporters, harnesses, experiments, and paper outputs without
307
+ contradicting the accepted in-memory computation helper. If the helper returns
308
+ provenance such as source record IDs, describe it as provenance from existing
309
+ source records, not as generated artifacts or result export. Do not call one
310
+ accepted formula the Metric Computation Layer, metric registry, experiment
311
+ metric suite, or paper-result metric implementation.
312
+ When multiple bounded metric-computation helpers are accepted in the same
313
+ module, name each formula helper in implemented-surface lists, package
314
+ summaries, layout rows, and test summaries. Keep absence clauses limited to
315
+ metric formulas and runtime surfaces beyond those accepted helpers.
316
+ For repeated metric-helper documentation syncs, use the newly accepted helper
317
+ as a checklist key across every requested README-style file: helper function
318
+ name, emitted `metric_name`, package/module summary, layout row, test summary,
319
+ and local absence clause. Do not stop after updating the first helper list; a
320
+ stale test row or absence sentence is still a docs-sync defect.
321
+ Also update the category label when the accepted helpers broaden the module's
322
+ meaning. If a non-rate helper such as mean latency joins rate helpers, replace
323
+ stale wording like "rate metric computation", "rate computation path", or
324
+ "deadline-rate helpers" with neutral "metric computation" wording or an exact
325
+ helper list; adding the new helper name while leaving a rate-only descriptor is
326
+ an incomplete docs sync.
327
+ When documenting focused coverage for numeric mean helpers, mirror the actual
328
+ test cases and metric semantics. Use zero-latency, zero-FPS, zero-quality,
329
+ all-zero, mixed, and single-frame wording only when those exact numeric cases
330
+ are covered; do not turn numeric zero cases into "none" wording, which belongs
331
+ to boolean/count rate cases, and only when tests assert no flagged frames.
332
+ In bilingual docs, check both the implemented-helper sentence and the local
333
+ absence sentence for stale singular exceptions. Phrases like "except the
334
+ accepted deadline-hit-rate helper" become misleading after a sibling helper is
335
+ accepted; rewrite them to name the current accepted helpers or to say only
336
+ additional formulas and runtime metric infrastructure remain absent. When a
337
+ third or later sibling helper is accepted, also replace stale "both helpers",
338
+ "two helpers", or "deadline-rate helpers" wording with an exact list or a
339
+ neutral plural description that includes the new helper.
340
+ For frozen-method-manifest helpers, describe only mapping-to-`FrozenMethodManifest`
341
+ normalization and focused validation. Do not let method, freeze, or manifest
342
+ wording imply method adapters, candidate selection, method-freeze selection
343
+ logic, harness runners, CLI entrypoints, package exports, experiments, or paper
344
+ outputs.
345
+ For bounded lifecycle projection helper docs, describe only
346
+ `simulate_reference_lifecycle_state(...)` as selected-reference `MediaObject`
347
+ plus explicit timing inputs projected into `ReferenceLifecycleState`, with
348
+ focused validation for that projection path. Do not call this helper a
349
+ normalizer, do not describe stable-ID coverage as a normalization path, and in
350
+ Chinese docs prefer `投影路径` over `归一化路径` for this helper. Do not let
351
+ lifecycle wording imply full simulators, transport models, event processors,
352
+ component profilers, dataset or trace loaders, metrics, harness runners,
353
+ experiments, or paper outputs.
354
+ For bounded raw-record mapping helper docs, describe only
355
+ `record_to_mapping(...)` as accepted-record-to-JSON-compatible-mapping
356
+ conversion in memory. State that it preserves accepted field names and each
357
+ source record's existing `record_id`; do not claim stable-ID mutation or
358
+ identity-bearing-field coverage unless the focused tests actually mutate those
359
+ fields. If a doc has broad absence wording such as "exports are not
360
+ implemented", narrow it after accepting this helper to file-based/result
361
+ exporters, JSONL/CSV writers, artifact schemas or directories, harnesses,
362
+ experiments, and paper outputs. Do not imply package-level exports unless the
363
+ package entrypoint exports the helper.
364
+ For bounded method scheduling interface docs, describe only the `MethodScheduler`
365
+ and `run_scheduler(...)` call boundary over existing records plus its focused
366
+ tests. Do not let scheduler, method, or interface wording imply concrete
367
+ scheduling methods, baseline policies, ABR algorithms, candidate selection,
368
+ utility models, adapters, harness runners, CLI entrypoints, package exports,
369
+ experiments, or paper outputs.
370
+ For bounded baseline scheduler docs, describe only the accepted simple baseline
371
+ surface that exists on disk, its input records, configuration keys, returned
372
+ record type, and focused tests. Do not let baseline wording imply proposed
373
+ RefABR methods, BOLA/MPC/learned policies, candidate generation, utility-model
374
+ research, harness runners, metric computation, result export, experiments, or
375
+ paper outputs unless those files and validations actually exist.
376
+ For cumulative helper docs, update prose lists, layout tables, package summaries,
377
+ and test summaries together with parallel wording for all accepted sibling
378
+ helpers. Do not imply that helper modules are package-level exports unless the
379
+ package entrypoint actually exports them.
380
+ For cumulative baseline docs, name each accepted simple baseline in every
381
+ current-surface description that characterizes the baseline module or its tests;
382
+ do not leave old singular descriptions in layout rows or package/test summaries.
383
+ If the user names an exact docs-only file set or excludes TODO/status files, do
384
+ not update TODO, handoff, source, test, dependency, export, or generated-artifact
385
+ surfaces as part of the documentation sync. Leave TODO or trajectory recording
386
+ to the separate TODO-maintenance step.
387
+
388
+ ## Content, State, And References
389
+
390
+ Keep content and references distinct in names and data flow. A variable named
391
+ like a path, ID, URL, handle, or reference should not contain already-read
392
+ content; a variable named like content should not contain an external location.
393
+
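A small sketch of the naming rule above: the `_path` name holds only a location and the `_text` name holds only what was read from it. The file name `run.yaml` and the helper `load_config_text` are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def load_config_text(config_path: Path) -> str:
    """Take a reference (a path) and return content (the text read from it)."""
    return config_path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "run.yaml"  # reference: where the content lives
    config_path.write_text("seed: 0\n", encoding="utf-8")
    config_text = load_config_text(config_path)  # content: what was read
```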
394
+ Only stable cross-boundary information belongs in shared models. Temporary,
395
+ single-run, display-only, orchestration-only, or unstable intermediate values
396
+ should stay local.
397
+
398
+ Assign one owner for writing, saving, exporting, or returning an artifact. Avoid
399
+ multiple layers claiming responsibility for the same output.
400
+
401
+ For shared record, schema, or metadata surfaces, treat identity as an explicit
402
+ contract. Prefer caller-provided stable IDs and required identity fields over
403
+ derived hashes, generated suffixes, timestamps, random values, or implicit
404
+ serialization choices. Add derived `record_id`, cache keys, or fingerprint
405
+ helpers only when the user or existing framework asks for derived identity.
406
+ When derived identity is required, document the identity-bearing fields in code
407
+ structure rather than prose alone, include every required field that changes the
408
+ record's semantic identity, and keep non-identity payload separate from the key.
409
+ Tests for identity-owner changes must mutate each identity-bearing field that
410
+ matters and prove the identifier changes. Add non-identity no-change checks only
411
+ when the task changes identity code, the user requests that distinction, or
412
+ existing tests already define it; otherwise cover non-identity payload through
413
+ pass-through or validation tests.
414
+ Before writing identity tests around an existing record, inspect the accepted
415
+ `record_id` construction or equivalent key contract and treat that as the source
416
+ of truth. A helper, loader, adapter, or normalizer task must not add fields to an
417
+ accepted record identity merely because a new helper test mutated those fields;
418
+ if a validation-only field is not part of the accepted key, cover it through
419
+ presence, pass-through, or delegated invalid-state tests instead. Do not infer a
420
+ record-id prefix or identity-bearing fields from the class name, helper name, or
421
+ domain wording; use the current implementation as the accepted contract unless
422
+ the user explicitly asks to revise that contract.
423
+
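For the case where derived identity is actually required, the sketch below shows one way to keep identity-bearing fields in code structure and to test that each one changes the identifier. `TraceRecord`, its fields, the `trace-` prefix, and the SHA-256 key are hypothetical; an existing repository's accepted `record_id` construction would take precedence.

```python
import hashlib
from dataclasses import dataclass, replace

# Identity-bearing fields are declared in code, not only in prose.
IDENTITY_FIELDS = ("dataset", "trace_name", "segment_index")


@dataclass(frozen=True)
class TraceRecord:
    dataset: str
    trace_name: str
    segment_index: int
    note: str = ""  # non-identity payload, kept out of the key

    @property
    def record_id(self) -> str:
        key = "|".join(f"{name}={getattr(self, name)!r}" for name in IDENTITY_FIELDS)
        return "trace-" + hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


# Mutate each identity-bearing field in turn, keeping the payload valid.
base = TraceRecord(dataset="d1", trace_name="t1", segment_index=0)
mutations = {"dataset": "d2", "trace_name": "t2", "segment_index": 1}
changed_ids = {
    name: replace(base, **{name: value}).record_id for name, value in mutations.items()
}
```

A focused identity test would then assert that every entry in `changed_ids` differs from `base.record_id`.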
424
+ For bounded in-memory normalization helpers, preserve the accepted field names
425
+ and keep the helper as a thin adapter into the owning record or schema type.
426
+ Validate presence and shape only as far as the helper owns them, then delegate
427
+ domain validation, defaults, and stable identity to the record or schema that
428
+ already owns those contracts. Do not turn a mapping normalizer into file I/O, a
429
+ dataset registry, fixture discovery, trace loading, or a broader ingestion
430
+ layer unless the task explicitly asks for that expansion.
431
+
432
+ When adding a sibling normalizer, mirror the established local helper shape
433
+ before inventing a new pattern: required-field constant, optional-field
434
+ handling, mapping-type rejection, missing-key rejection, payload construction,
435
+ and constructor delegation. Focus tests on the helper's owned contract: valid
436
+ normalization, non-mapping rejection when required, missing required fields,
437
+ optional field pass-through, delegated invalid values, and stable identity
438
+ through the normalization path. When the user names accepted identity-bearing
439
+ fields, use that list and the current record implementation as the test
440
+ boundary; do not expand stable-ID coverage to optional non-identity fields just
441
+ because they are accepted payload. Do not use a normalizer test to redefine
442
+ which fields belong to the underlying record's identity; mutate fields already
443
+ owned by that record's accepted identity contract. If a normalizer exposes both
444
+ identity-bearing fields and validation-only fields, keep those assertions in
445
+ separate clearly named tests so reviewer feedback can distinguish adapter
446
+ behavior from record-contract changes. Keep mutated identity payloads valid so a
447
+ stable-ID test reaches the identity path rather than a domain validator.
448
+
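One way to keep identity assertions and pass-through assertions in separately named tests is sketched below. Everything here is assumed for illustration: `MediaChunk`, its `chunk-` prefix, the choice of `stream` and `index` as identity-bearing fields, and `normalize_media_chunk` are hypothetical, and a real test would read the prefix and identity fields from the accepted record implementation.

```python
import hashlib
from collections.abc import Mapping
from dataclasses import dataclass


# Hypothetical record whose accepted identity is (stream, index) only.
@dataclass(frozen=True)
class MediaChunk:
    stream: str
    index: int
    note: str = ""  # accepted payload, not part of the identity here

    @property
    def record_id(self) -> str:
        key = f"{self.stream}:{self.index}".encode("utf-8")
        return "chunk-" + hashlib.sha256(key).hexdigest()[:12]


_REQUIRED_FIELDS = ("stream", "index")
_OPTIONAL_FIELDS = ("note",)


def normalize_media_chunk(raw: Mapping) -> MediaChunk:
    if not isinstance(raw, Mapping):
        raise TypeError("raw chunk must be a mapping")
    missing = [name for name in _REQUIRED_FIELDS if name not in raw]
    if missing:
        raise KeyError(f"missing required fields: {missing}")
    payload = {name: raw[name] for name in _REQUIRED_FIELDS}
    # Optional fields are passed only when present, so the record's own
    # constructor defaults apply otherwise.
    payload.update({n: raw[n] for n in _OPTIONAL_FIELDS if n in raw})
    return MediaChunk(**payload)


def test_identity_field_change_changes_record_id() -> None:
    base = normalize_media_chunk({"stream": "a", "index": 1})
    moved = normalize_media_chunk({"stream": "a", "index": 2})
    assert base.record_id != moved.record_id


def test_optional_note_passes_through() -> None:
    chunk = normalize_media_chunk({"stream": "a", "index": 1, "note": "x"})
    assert chunk.note == "x"
```

The second test makes no claim about `record_id`, so a reviewer can tell that it covers adapter behavior rather than a change to the record's identity contract.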
449
+ For optional fields in normalizers, either pass them only when the input
450
+ contains them or mirror the owning record's constructor defaults exactly. Do not
451
+ invent aliases, alternate defaults, package-level exports, documentation
452
+ updates, or broader API surfaces as part of a normalizer source/test task unless
453
+ the user explicitly asks. If a normalizer test reveals that the accepted record
454
+ identity or validation contract may be wrong, report that as a separate
455
+ contract question; do not change the record class, schema, or existing record
456
+ tests inside the bounded normalizer task.
457
+
458
+ For frozen-method-manifest normalizers, keep the implementation to record
459
+ construction and focused validation only. The presence of `FrozenMethodManifest`
460
+ does not authorize method adapters, candidate selection, method-freeze selection
461
+ logic, harness runners, CLI entrypoints, package-level exports, file I/O, or
462
+ experiment artifacts.
463
+
464
+ For bounded in-memory metric computation helpers, implement only the named
465
+ formula over accepted raw record objects. Validate only helper-owned inputs such
466
+ as non-empty source sequences and aggregation-key presence, compute the direct
467
+ numerator, denominator, ratio, and provenance required by the task, and then
468
+ delegate metric-field validation and stable `record_id` generation to
469
+ `MetricRecord`. Do not add metric registries, QoE weighting, aggregation
470
+ frameworks, confidence statistics, file I/O, JSONL/CSV writers, result
471
+ exporters, artifact directories, package-level exports, docs, dependencies,
472
+ loaders, harness runners, CLI, experiments, or paper-output behavior unless the
473
+ task explicitly asks. Tests should cover all relevant boundary count cases,
474
+ empty input, invalid aggregation keys, source `record_id` provenance, no source
475
+ mutation, and `record_id` behavior according to the accepted `MetricRecord`
476
+ identity fields rather than inferred provenance fields.
477
+ For numeric metric helpers, boundary-value coverage must be literal. A test
478
+ named zero-latency, zero-FPS, all-zero, all-hit, all-miss, all-dropped,
479
+ none-dropped, or single-frame must use fixture values and assertions that
480
+ actually exercise that boundary; a mixed-valued case with a boundary word in its
481
+ name is not boundary coverage.
482
+ When adding a sibling formula to an existing bounded metric-computation module,
483
+ share a small private counting/build helper only if it makes both public
484
+ functions shorter and keeps the formula-specific fields obvious at each
485
+ entrypoint. Do not turn that helper into a registry, dispatcher, framework, or
486
+ configuration layer.
487
+ If a previously formula-specific private helper becomes shared by a broader
488
+ metric family, rename that helper in the same change so its name matches the
489
+ new responsibility. For example, a helper named for deadline rates should not
490
+ also compute dropped-frame rates; use a neutral operation/data name such as
491
+ `_compute_frame_rate` instead of adding another abstraction layer.
492
+ For sibling arithmetic-mean helpers over `FrameOutcome` fields, avoid one
493
+ private helper per field when those helpers duplicate validation and
494
+ `MetricRecord` construction. Either keep each public helper direct when it stays
495
+ short, or use one neutral helper such as `_compute_frame_mean` that accepts the
496
+ metric name, unit, direction, and value extractor while leaving formula-specific
497
+ fields visible at the public entrypoint.
498
+
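A sketch of the shared-helper guidance for sibling rate formulas follows. The `FrameOutcome` and `MetricRecord` classes below are simplified stand-ins with assumed fields; the real records, including how `MetricRecord` validates fields and derives its `record_id`, are owned by the repository and are not reproduced here.

```python
from collections.abc import Callable, Sequence
from dataclasses import dataclass


# Simplified stand-ins; field names are assumptions for this sketch.
@dataclass(frozen=True)
class FrameOutcome:
    record_id: str
    missed_deadline: bool
    dropped: bool


@dataclass(frozen=True)
class MetricRecord:
    metric_name: str
    numerator: int
    denominator: int
    value: float
    source_record_ids: tuple[str, ...]


def _compute_frame_rate(
    metric_name: str,
    frames: Sequence[FrameOutcome],
    counts: Callable[[FrameOutcome], bool],
) -> MetricRecord:
    """Neutral counting helper shared by the sibling rate formulas."""
    if not frames:
        raise ValueError("frames must be non-empty")
    numerator = sum(1 for frame in frames if counts(frame))
    denominator = len(frames)
    return MetricRecord(
        metric_name=metric_name,
        numerator=numerator,
        denominator=denominator,
        value=numerator / denominator,
        source_record_ids=tuple(frame.record_id for frame in frames),
    )


def compute_deadline_miss_rate(frames: Sequence[FrameOutcome]) -> MetricRecord:
    # The formula-specific field stays visible at the public entrypoint.
    return _compute_frame_rate(
        "deadline_miss_rate", frames, lambda f: f.missed_deadline
    )


def compute_dropped_frame_rate(frames: Sequence[FrameOutcome]) -> MetricRecord:
    return _compute_frame_rate("dropped_frame_rate", frames, lambda f: f.dropped)
```

A literal all-dropped boundary test for this sketch would build a fixture in which every frame has `dropped=True` and assert that the numerator equals the denominator and the value is `1.0`; a fixture with even one non-dropped frame would be a mixed case, whatever the test is called.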
499
+ For bounded lifecycle projection helpers, keep the implementation to an
500
+ in-memory projection from an already selected `MediaObject` and explicit timing
501
+ inputs into `ReferenceLifecycleState`. Reject non-reference media objects,
502
+ require only the timing keys needed for the requested lifecycle state, derive
503
+ `useful`, `stale`, and `expired` only from provided state, timestamp, and
504
+ deadline values, and delegate timestamp ordering, lifecycle legality, defaults,
505
+ and stable `record_id` generation to `ReferenceLifecycleState`. Do not add
506
+ external clocks, file I/O, trace loading, transport models, event processors,
507
+ component profilers, full simulators, harness runners, exports, docs, or
508
+ generated artifacts unless the task explicitly asks for them.
509
+
510
+ For bounded in-memory record-to-mapping helpers, keep the helper as a raw record
511
+ surface adapter, not a result exporter. Support only the accepted record classes
512
+ named by the task, preserve dataclass field names exactly, include the existing
513
+ `record_id`, and convert JSON-hostile containers such as tuples, lists, sets,
514
+ and nested mappings into JSON-friendly in-memory values without mutating the
515
+ source record. Prefer explicit `dataclasses.fields(record)` plus `getattr(...)`
516
+ over `dataclasses.asdict(...)` when accepted records may contain arbitrary
517
+ `Mapping` implementations; `asdict(...)` can fail on valid non-`dict` mappings
518
+ before custom conversion runs. Convert `collections.abc.Mapping` values into
519
+ plain `dict` values recursively. Do not add file I/O, JSONL/CSV writers, result
520
+ artifact directories, schema layers beyond the returned mapping shape,
521
+ package-level exports, docs, dependencies, loaders, metrics, harnesses, CLI, or
522
+ paper-output behavior unless explicitly requested.
523
+
524
+ For bounded method scheduling interfaces, implement only the requested call
525
+ boundary over existing record types. A first scheduler contract may define a
526
+ small protocol, callable type, or minimal runner that invokes a supplied
527
+ scheduler and rejects non-`ScheduleDecision` returns, but it must not introduce
528
+ concrete scheduling policies, baseline behavior, ABR algorithms, candidate
529
+ selection, utility models, method-freeze logic, harness runners, CLI entrypoints,
530
+ package exports, file I/O, or experiment artifacts. Tests should use toy
531
+ in-memory records, prove the scheduler receives candidates and configuration as
532
+ given, assert the returned `ScheduleDecision` is the exact object produced by
533
+ the scheduler, and cover invalid-return rejection only when a runtime runner is
534
+ part of the task.
535
+
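A first scheduler contract of the kind described above might look like the following sketch. `Candidate`, the `ScheduleDecision` field, the `Scheduler` protocol name, and `run_scheduler` are all assumptions made for illustration; only the idea of a minimal runner that rejects non-`ScheduleDecision` returns is taken from the text.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from typing import Any, Protocol


# Hypothetical toy records; the real types already exist in the repository.
@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    bitrate_kbps: int


@dataclass(frozen=True)
class ScheduleDecision:
    selected_id: str


class Scheduler(Protocol):
    def __call__(
        self, candidates: Sequence[Candidate], config: Mapping[str, Any]
    ) -> ScheduleDecision: ...


def run_scheduler(
    scheduler: Scheduler,
    candidates: Sequence[Candidate],
    config: Mapping[str, Any],
) -> ScheduleDecision:
    """Invoke the supplied scheduler and check only the return type."""
    decision = scheduler(candidates, config)
    if not isinstance(decision, ScheduleDecision):
        raise TypeError("scheduler must return a ScheduleDecision")
    return decision
```

A test for this boundary can pass a toy scheduler that records its arguments and returns a prebuilt decision, then assert with `is` that the runner handed back that exact object and forwarded the candidates and configuration unchanged.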
536
+ For minimum simple baseline schedulers, implement only the named baseline policy
537
+ and keep it as an in-memory method implementation behind the accepted scheduling
538
+ boundary. Use existing record fields directly, make ordering and default
539
+ configuration behavior deterministic, avoid mutating candidate sequences or
540
+ configuration mappings, and return the existing decision record without adding
541
+ candidate generators, utility models, metrics, harness runners, file I/O,
542
+ exports, or external state. Tests should invoke the baseline through the method
543
+ runner when one exists, use toy records, cover the policy's owned behavior and
544
+ decision invariants, pass mutable candidates/configuration directly when
545
+ asserting no mutation, and keep algorithm-comparison or experiment assertions
546
+ out of the baseline unit tests.
547
+
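As an illustration of deterministic ordering and default-configuration behavior, the sketch below implements an invented policy: pick the highest bitrate at or under a cap, falling back to the lowest bitrate when nothing fits. The policy, the `max_bitrate_kbps` key, its default, and the record shapes are hypothetical and are not the package's named baseline.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from typing import Any


# Hypothetical toy records for this sketch.
@dataclass(frozen=True)
class Candidate:
    candidate_id: str
    bitrate_kbps: int


@dataclass(frozen=True)
class ScheduleDecision:
    selected_id: str


_DEFAULT_MAX_BITRATE_KBPS = 1000  # assumed default, not from the source


def capped_bitrate_baseline(
    candidates: Sequence[Candidate], config: Mapping[str, Any]
) -> ScheduleDecision:
    if not candidates:
        raise ValueError("candidates must be non-empty")
    # Reading with .get() leaves the caller's configuration untouched.
    cap = config.get("max_bitrate_kbps", _DEFAULT_MAX_BITRATE_KBPS)
    eligible = [c for c in candidates if c.bitrate_kbps <= cap]
    # Ties are broken by candidate_id so the result never depends on
    # the order in which candidates were supplied.
    if eligible:
        chosen = max(eligible, key=lambda c: (c.bitrate_kbps, c.candidate_id))
    else:
        chosen = min(candidates, key=lambda c: (c.bitrate_kbps, c.candidate_id))
    return ScheduleDecision(selected_id=chosen.candidate_id)
```

Because the function only reads its inputs, a no-mutation test can pass a mutable list and dict directly and compare them against snapshots taken before the call.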
548
+ ## Prompts, Text, And Comments
549
+
550
+ Prompt strings, embedded task text, and user-facing messages should be short,
551
+ direct, and task-oriented. Distinguish input references from direct content and
552
+ generation responsibility from save/write responsibility.
553
+
554
+ Use comments only for non-obvious constraints, sources, or decisions. Prefer
555
+ clearer names and structure over comments that explain avoidable complexity. Do
556
+ not write style rules, generation history, or tool-process notes into comments.
557
+
558
+ ## Open-Source Reuse
559
+
560
+ When implementing a mature existing capability, first decide whether reuse is
561
+ legal, appropriate, and lower maintenance than rewriting.
562
+
563
+ Reuse preference:
564
+
565
+ 1. stable dependency with compatible license
566
+ 2. adapter around a stable external API
567
+ 3. small copied or ported snippet when license permits
568
+ 4. own implementation
569
+
570
+ Before copying or adapting code, check license compatibility. Do not copy code
571
+ without a clear compatible license. For copied, ported, or adapted code, keep
572
+ source attribution in the relevant code comment and maintain `THIRD_PARTY.md`
573
+ or an equivalent notice file when multiple external snippets are used. Include
574
+ project, URL, file/module, license, version/commit when available, reuse mode,
575
+ and main local changes.
576
+
577
+ Use `academic_army_mcp_tools.deepresearch` when the task depends on unfamiliar
578
+ language conventions, framework organization, harness/test practices,
579
+ open-source reuse, or external code choices. Do not run it for narrow local
580
+ edits in an already-established stack unless the user asks or reuse decisions
581
+ are involved.
582
+
583
+ ## Trajectory Selection
584
+
585
+ A good trajectory is one bounded next edit that follows from accepted,
586
+ verified repository state. It should not become a wishlist from the upstream
587
+ plans.
588
+
589
+ After an accepted change:
590
+
591
+ - re-read the changed files and any root framework docs that may be stale
592
+ - record only what is present and accepted
593
+ - choose a docs-only sync only when a specific existing framework or package doc
594
+ is known to be stale, incomplete, or contradicted by the accepted change
595
+ - if a proposed docs-only sync is mostly already current, narrow it to the exact
596
+ stale document or close it as verified-current instead of editing unrelated
597
+ README-style files
598
+ - choose a focused source/test/harness/export task only when the user or current
599
+ workflow explicitly asks for continued implementation
600
+ - after an accepted docs-only sync, update TODO or handoff notes to close that
601
+ task without inventing another implementation step; clearing the selected next
602
+ task is appropriate when no bounded follow-up has been requested
603
+ - after repository integrity repair, restore-only cleanup, or docs-only
604
+ scaffold repair, stop at the verified baseline unless the user explicitly
605
+ requested a follow-up implementation trajectory
606
+ - after an accepted validation-only pass with no fixes, record the command,
607
+ pass/fail count, and cache cleanup or no-cache-artifact finding only; leave
608
+ the next task neutral unless the user or active workflow already selected a
609
+ bounded follow-up
610
+ - when recording validation-only results, preserve the exact reported result
611
+ line, including skipped count when present, and distinguish "no fixes needed"
612
+ from "files changed"; cache cleanup is not a source/test/docs change.
613
+ - do not treat a green whole-surface validation pass as a new accepted feature
614
+ surface. It confirms the current contracts; it does not create a reason to
615
+ schedule docs syncs, package exports, harness work, metric expansion, or
616
+ additional implementation.
617
+ - pause or set `FINISHED` when the accepted change completes the requested
618
+ static surface and remaining work requires new implementation authority,
619
+ execution, datasets, metrics, algorithms, harness runs, or paper results
620
+
621
+ For TODO or trajectory files:
622
+
623
+ - make the next task executable as one bounded repository edit only when a next
624
+ task is required by the user or by an existing active trajectory
625
+ - name the exact stale file or documented mismatch that motivates a docs-only
626
+ next task; do not add a generic documentation-sync task after every accepted
627
+ source or test change
628
+ - after an accepted source/test change, set a README-style sync as the next task
629
+ only when current README-style files are part of the active repository surface
630
+ and now omit or contradict the newly accepted symbol, helper, test, or
631
+ boundary; name the exact docs and repeat the exclusions that prevent future
632
+ capabilities from being implied
633
+ - after accepting a new bounded metric-computation helper, explicitly check the
634
+ README-style metric-computation entries, package summary, layout row, test
635
+ summary, and metric absence clause before leaving the next task neutral. If
636
+ any still omit the helper or describe only older sibling formulas, queue a
637
+ docs-only sync that names the exact stale docs and keeps runtime metric
638
+ frameworks, QoE weighting, exports, harnesses, experiments, and paper outputs
639
+ excluded.
640
+ - before queuing a multi-file README-style sync, do a small read-only scan of
641
+ the likely README files and record the specific stale sentence, row, or
642
+ absence clause that needs work. If only one requested doc is stale, make the
643
+ next task a one-doc correction or say the other docs are already current,
644
+ instead of carrying a template four-file sync forward.
645
+ - when a TODO or trajectory update selects a README-style sync after a metric
646
+ helper, name the newly accepted helper/metric and the exact README-style files
647
+ or surfaces verified stale. Do not set a generic "README sync" next task from
648
+ the assumption that docs must be stale; if no live docs scan was done, leave
649
+ the next task neutral or make the next step an explicit read-only docs scan.
650
+ - if a queued README-style sync later proves to be fully current, record it as a
651
+ verified no-op and clear or keep the next task neutral; do not queue another
652
+ docs sync for the same helper unless a new stale sentence is found in the live
653
+ files.
654
+ - when recording an accepted docs-only sync, distinguish docs that were changed
655
+ from docs that were only read back and verified current; do not imply all
656
+ requested docs were modified if only one needed a formatting or consistency
657
+ adjustment
658
+ - if the accepted task was only a restore, cleanup, revert, or docs-only repair,
659
+ record the accepted baseline and mark the trajectory finished or waiting for
660
+ user direction instead of promoting code, test, harness, or experiment work
661
+ - after an accepted README-style sync for a metric helper, do not select the
662
+ next metric helper from the coding plan merely because the docs are now
663
+ current. Leave the next task neutral unless task selection has explicitly
664
+ chosen that next implementation slice.
665
+ - for TODO-only maintenance after an accepted docs-only sync, update only the
666
+ TODO or handoff file, read it back, and report that no tests were run because
667
+ no executable code changed
668
+ - for TODO-only maintenance after an accepted validation-only pass, update only
669
+ the TODO or handoff file; copy the accepted command, pass/fail count, no-fix
670
+ status, and cache cleanup or no-cache finding from the developer report; state
671
+ that no tests were run for the TODO-only step; and leave the next task neutral
672
+ unless an existing active trajectory already selected a bounded follow-up
673
+ - do not convert a validation-only pass count into a completed feature count;
674
+ it confirms the current accepted surface across the named tests but does not
675
+ accept additional formulas, exports, harnesses, experiment execution, or paper
676
+ outputs
677
+ - when no next task has been explicitly selected after docs-only acceptance, use
678
+ a neutral waiting state such as "no next developer task is selected; run task
679
+ selection before more work" instead of promoting the next source, harness,
680
+ metric, experiment, or paper-output task
681
+ - include explicit exclusions when broad verbs like load, run, export,
682
+ validate, or normalize could be misread as runtime work
683
+ - do not mark source contracts, loaders, runnable harnesses, metrics, exports,
684
+ experiments, or paper results complete unless they exist and were verified
685
+ - do not promote code/test work from a docs-only scaffold or restored scaffold
686
+ unless explicitly requested
687
+ - do not resurrect old review defects or historical plans that are not present
688
+ in the current repository
689
+
690
+ ## Review Guidance
691
+
692
+ When reviewing code, lead with defects that harm readability, locality, naming,
693
+ state ownership, interface clarity, harness/test separation, artifact shape, or
694
+ framework consistency.
695
+
696
+ Prefer review suggestions that delete, inline, move to the use site, rename,
697
+ align ordering, split responsibilities, clarify ownership, or reduce caller
698
+ burden. Do not default to adding wrappers, registries, config layers, factories,
699
+ or defensive branches unless they solve the concrete defect.
700
+
701
+ If code is already direct and local, avoid suggesting extra abstraction for
702
+ style alone.
703
+
704
+ For bounded normalizer reviews, first verify the change did not alter the
705
+ owning record/schema identity contract, package exports, docs, dependencies,
706
+ loaders, registries, fixtures, harnesses, simulators, metrics, or generated
707
+ artifacts unless those edits were explicitly in scope. A test that mutates a
708
+ non-identity optional field and expects `record_id` to change is a test defect,
709
+ not permission to expand the accepted record identity. Conversely, do not
710
+ require non-identity no-change assertions in a bounded normalizer review unless
711
+ they were requested or identity implementation changed. Also check that any
712
+ record-id prefix assertion matches the accepted record implementation rather
713
+ than an inferred helper or class name.
714
+
715
+ For lifecycle projection reviews, verify the helper only projects selected
716
+ reference media plus explicit timing inputs into `ReferenceLifecycleState`.
717
+ Non-reference rejection, per-state missing-timing checks, deterministic
718
+ `useful`/`stale`/`expired` flag derivation, delegated timestamp validation, and
719
+ stable identity through the accepted lifecycle record contract should be
720
+ covered. Treat external clocks, runtime simulators, transport or event
721
+ subsystems, component profilers, file I/O, exports, docs, metrics, harness
722
+ runners, or experiment claims as scope defects unless the user requested them.
723
+
724
+ For bounded record-to-mapping reviews, verify the helper supports only the
725
+ accepted record classes, includes the existing `record_id`, preserves field
726
+ names, converts tuple-like values to lists, preserves nested mappings as plain
727
+ dicts, rejects unsupported inputs, and does not mutate source records. Check at
728
+ least one accepted record with a non-`dict` `Mapping` field, such as a
729
+ `MappingProxyType`, so the implementation does not rely on `dataclasses.asdict`
730
+ behavior that fails before custom JSON-friendly conversion can run. Treat file
731
+ writers, JSONL/CSV exporters, artifact directories, new schemas, package exports,
732
+ docs, dependencies, loaders, metrics, harnesses, CLI, or experiment claims as
733
+ scope defects unless the user requested them.
734
+
735
+ For bounded metric-computation reviews, verify the helper implements only the
736
+ named formula, reads accepted raw record fields directly, rejects empty inputs
737
+ and invalid aggregation keys clearly, includes source record IDs as provenance,
738
+ does not mutate source records, and returns a valid `MetricRecord` while
739
+ delegating metric validation and identity to that record type. Treat extra
740
+ metric formulas, QoE weighting, aggregation frameworks, file/result exporters,
741
+ schemas, package exports, docs, dependencies, loaders, harness runners, CLI,
742
+ experiments, or paper-output claims as scope defects unless the user requested
743
+ them.
744
+ Review named boundary tests against their fixture values and expected metric
745
+ fields. If a zero, all, none, or single-frame case is actually mixed data,
746
+ request the smallest focused test correction before accepting the trajectory.
747
+ Also check private shared helper names after sibling formula additions. Passing
748
+ tests are not enough if a shared helper's name still describes only the older
749
+ formula family; require a neutral private name without changing the public API
750
+ or behavior.
751
+ For sibling mean-metric reviews, check whether the change added parallel private
752
+ helpers that differ only by source field and unit. If so, ask for the smallest
753
+ cleanup: one neutral mean helper with explicit unit/value extraction, or direct
754
+ public functions if that is clearer than another abstraction.
755
+
756
+ For minimum baseline scheduler reviews, verify the implementation stays behind
757
+ the accepted method boundary, uses only in-memory record fields, has
758
+ deterministic ordering and missing-configuration behavior, does not mutate
759
+ inputs, and returns a valid decision record. Treat added ABR algorithms,
760
+ candidate generation, utility models, metric formulas, result exports, harness
761
+ runners, package exports, docs, file I/O, or experiment claims as scope defects
762
+ unless the user explicitly requested them.
763
+ When reviewing no-mutation coverage, verify the test asserts on the same
764
+ mutable input object that was passed to the implementation; tests that pass an
765
+ immutable copy or converted container while checking the original are false
766
+ positives.
767
+
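The false-positive pattern can be made concrete with a deliberately buggy toy function. The names below are hypothetical and exist only to contrast the two checks.

```python
def pick_smallest(candidates: list[int]) -> int:
    # Deliberately buggy for the demonstration: sorts the caller's list
    # in place instead of working on a copy.
    candidates.sort()
    return candidates[0]


def false_positive_check() -> bool:
    original = [3, 1, 2]
    # A copy is passed, so the original can never change and this check
    # passes even though the implementation mutates its input.
    pick_smallest(list(original))
    return original == [3, 1, 2]


def real_check() -> bool:
    original = [3, 1, 2]
    snapshot = list(original)
    # The same mutable object is passed and then inspected.
    pick_smallest(original)
    return original == snapshot
```

Here `false_positive_check()` reports success while `real_check()` fails, which is the difference a reviewer should look for when a no-mutation test is green.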
768
+ For README-style docs-sync reviews, cross-check every newly documented symbol or
769
+ test against the document's absence clauses. A stale "no methods", "no tests",
770
+ "no metrics", "no exports", or similar broad exclusion that contradicts an
771
+ accepted bounded surface is a documentation defect even if the new module entry
772
+ was added correctly.
773
+ Also scan table rows and summary sentences for stale singular wording after a
774
+ surface becomes cumulative; a docs sync is incomplete if one section says both
775
+ accepted siblings exist while another still describes the same module or test as
776
+ a single helper, scheduler, baseline, or validation path.
777
+ For lifecycle projection docs, also scan English and Chinese wording for
778
+ normalizer/projection confusion. A test summary that says stable IDs are covered
779
+ through a normalization path for `simulate_reference_lifecycle_state(...)` is
780
+ stale; it should say projection path and keep full-simulator/runtime claims out.
781
+ For raw-record mapping docs, verify test summaries say the helper preserves the
782
+ existing `record_id` from source records unless tests actually mutate
783
+ identity-bearing fields. Also narrow stale "no exports" clauses so they exclude
784
+ file-based/result exporters and artifact surfaces without contradicting an
785
+ accepted in-memory mapping helper.
786
+ For bounded metric-computation docs, verify docs name the exact formula helper
787
+ and focused test file, then narrow broad "no metrics" clauses to "no additional
788
+ metric formulas, QoE weighting, aggregation frameworks, runnable harnesses,
789
+ experiments, or paper outputs" without implying the planned metric layer exists.
790
+ When there are multiple accepted formula helpers in one module, review plural
791
+ wording and cumulative lists the same way as baseline docs; stale singular
792
+ "the metric helper" wording can mislead even when every filename is listed.
793
+ After a third or later helper is accepted, treat stale "both", "two", or
794
+ formula-family-only descriptions as review defects even if the module and test
795
+ filenames are already correct.
796
+ For cumulative README-style metric syncs, sample every repeated surface named
797
+ in the task, not just the first occurrence: implemented-surface summary,
798
+ package summary, layout row, test row, and absence clause. If any surface stops
799
+ at the previous helper or omits the new `metric_name`, request the smallest
800
+ docs-only correction in the requested file set.
801
+ When a non-rate metric helper is added to a module that previously contained
802
+ only rate helpers, review the surrounding descriptor as well as the helper list.
803
+ Reject "rate metric computation" or "rate computation path" wording for a mixed
804
+ rate-and-mean surface even when the new helper and metric name are listed.
805
+ For metric docs-sync reviews, compare coverage wording against the focused test
806
+ file. Treat "none FPS", "none latency", "none quality", or "none score" as
807
+ misleading for mean helpers when the tests actually cover all-zero, mixed, and
808
+ single-frame numeric inputs.
809
+ If the developer reports a docs-only no-op, review the live requested files for
810
+ the named helper list, metric names, test summary, and absence clauses before
811
+ requesting edits. Accept the no-op when those surfaces are already current.
812
+
813
+ For validation-only reviews, verify the developer ran the exact requested
814
+ command from the repository root, reported the pass/fail count, made no source,
815
+ test, docs, dependency, export, or TODO changes when the suite passed, and
816
+ removed only generated cache/bytecode artifacts or reported that none remained
817
+ after the run. Do not request cleanup of pre-existing dirty or untracked
818
+ accepted files, and do not ask for new docs, exports, harnesses, or follow-up
819
+ implementation solely because validation passed.
820
+ Also compare the reported command against the requested test-target list. A
821
+ green validation run that omits a requested file, silently substitutes a smaller
822
+ suite, or creates a missing test target during a validation-only task is not an
823
+ acceptable validation trajectory.
824
+
825
+ ## Readability Audit
826
+
827
+ After edits, perform a quick static audit:
828
+
829
+ - names match real meaning and data shape
830
+ - data flow is direct and ordered naturally
831
+ - functions, files, and modules have clear responsibilities
832
+ - abstractions reduce real complexity rather than add jumps
833
+ - no avoidable global state, hidden paths, long call chains, or repeated
834
+ registration points were added
835
+ - the change stayed local to the natural owner
836
+ - harness and test responsibilities remain separate
837
+ - artifact schemas, exporters, docs, and tests agree when any of them changed
838
+ - framework docs are updated or confirmed current
839
+ - external code has compatible license and attribution when reused
840
+ - no generated cache/build/test/output/result artifacts were left behind unless
841
+ explicitly requested
842
+
843
+ For docs-only tasks, audit that docs describe only current repository reality
844
+ and do not imply code, tests, loaders, runnable harnesses, metrics, exports,
845
+ experiments, generated artifacts, or TODO status that do not exist.
846
+
847
+ ## Static Validation
848
+
849
+ Use static validation appropriate to the task and existing stack. Do not run
850
+ installs, harnesses, experiments, or full pipelines through this skill unless
851
+ the user or active coding workflow explicitly authorizes that execution.
852
+
853
+ For explicit validation-only tasks, run the requested command from the
854
+ repository root before making any edits. Treat a green run as the desired
855
+ outcome, not as permission to tidy nearby code, refresh docs, update TODO, add
856
+ exports, or broaden coverage. If the run fails, change only the accepted
857
+ source/test surface needed to satisfy the existing contract, then rerun the
858
+ same focused command and clean generated cache artifacts.
859
+
860
+ For source changes, useful static validation may include syntax checks,
861
+ importability checks, schema-surface checks, public export checks, or collection
862
+ shape checks that do not load real data, run harnesses, or write results.
863
+
864
+ For tests, inspect that fixtures exist, parametrized argument names match test
865
+ function signatures, helpers refer to existing symbols, and invalid cases reach
866
+ the intended validator. For no-mutation tests, check that the object inspected
867
+ after the call is the same mutable object given to the code path under test.
868
+ If the contract is about immutable source records rather than container order or
869
+ mapping mutation, check that the same source record objects are inspected after
870
+ the call; a tuple containing those records is acceptable when the container is
871
+ not itself under test.
872
+
873
+ When a task adds or changes focused unit tests and the repository already has a
874
+ lightweight test runner configured, prefer running the smallest relevant test
875
+ target if that does not require dependency installation, real datasets,
876
+ harnesses, experiments, or generated result artifacts. If the focused tests
877
+ cannot be run in the current task, say so and make the next trajectory
878
+ validation-only before adding more implementation.
879
+
880
+ Before running Python tests, choose a command form that avoids in-repository
881
+ generated artifacts where the project permits it, such as disabling bytecode
882
+ and pytest's cache provider for focused validation. For `src/` layouts, prefer
883
+ an explicit environment path or existing editable install over changing project
884
+ metadata just to satisfy imports. After any validation run, check for generated
885
+ cache or bytecode directories created inside the repository and remove only
886
+ those generated artifacts before handoff.
887
+
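One command form that avoids in-repository artifacts is sketched below as a pair of small builders. The function names and the example paths are hypothetical; the `-B` interpreter flag, the `PYTHONDONTWRITEBYTECODE` variable, and pytest's `-p no:cacheprovider` option are existing mechanisms, and `PYTHONPATH` is used here as one possible explicit environment path for a `src/` layout.

```python
import os
import sys
from collections.abc import Mapping, Sequence
from typing import Optional


def build_focused_pytest_command(targets: Sequence[str]) -> list[str]:
    """Build a pytest invocation that skips bytecode and the cache dir."""
    return [
        sys.executable,
        "-B",  # do not write .pyc files
        "-m",
        "pytest",
        "-p",
        "no:cacheprovider",  # do not create .pytest_cache
        *targets,
    ]


def build_validation_env(
    base_env: Mapping[str, str], src_dir: Optional[str] = None
) -> dict[str, str]:
    """Copy the environment and add only what focused validation needs."""
    env = dict(base_env)
    env["PYTHONDONTWRITEBYTECODE"] = "1"
    if src_dir is not None:
        existing = env.get("PYTHONPATH")
        env["PYTHONPATH"] = (
            src_dir if not existing else os.pathsep.join([src_dir, existing])
        )
    return env
```

The two results can be handed to `subprocess.run(command, env=env, cwd=repo_root)` with a target such as `tests/test_metrics.py` (an illustrative path). A scan for `__pycache__` and `.pytest_cache` directories after the run is still worthwhile, since plugins or imported tools may create artifacts of their own.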
888
+ For docs-only syncs, re-read the docs and check referenced paths exist, the docs
889
+ match current repository state, and the diff contains only the requested docs.
890
+ If the diff only touches formatting, verify that the formatting change removes a
891
+ real inconsistency with sibling docs; otherwise leave the file unchanged and
892
+ report it as already current.
893
+ Resolve every documented path from the repository root exactly as written. If a
894
+ file lives outside the repository, label it as an external or parent input and
895
+ use the correct relative path from the documented context, not a bare filename
896
+ that implies a repo-root file. Do not list external planning inputs in a
897
+ repository layout table unless the table explicitly distinguishes them from
898
+ files inside the target repo. When docs list implemented symbols, files, tests,
899
+ or artifacts, derive the list and counts from the current source tree instead
900
+ of memory or planning documents.
901
+
902
+ For cleanup, verify exact deleted paths are absent and no exports, tests, docs,
903
+ or generated artifacts still reference them.
904
+
905
+ ## Final Response
906
+
907
+ Keep the final response concise:
908
+
909
+ - changed repository-relative paths
910
+ - behavior or contract covered
911
+ - relevant static validation performed
912
+ - caveats only when they affect the user's next action
913
+
914
+ Do not paste full files unless requested. Do not explain skill internals or tool
915
+ mechanics.