@laitszkin/apollo-toolkit 3.1.8 → 3.2.0

Files changed (37)
  1. package/AGENTS.md +1 -0
  2. package/CHANGELOG.md +8 -0
  3. package/README.md +1 -0
  4. package/analyse-app-logs/scripts/__pycache__/filter_logs_by_time.cpython-312.pyc +0 -0
  5. package/analyse-app-logs/scripts/__pycache__/log_cli_utils.cpython-312.pyc +0 -0
  6. package/analyse-app-logs/scripts/__pycache__/search_logs.cpython-312.pyc +0 -0
  7. package/docs-to-voice/scripts/__pycache__/docs_to_voice.cpython-312.pyc +0 -0
  8. package/generate-spec/scripts/__pycache__/create-specscpython-312.pyc +0 -0
  9. package/iterative-code-performance/LICENSE +21 -0
  10. package/iterative-code-performance/README.md +34 -0
  11. package/iterative-code-performance/SKILL.md +116 -0
  12. package/iterative-code-performance/agents/openai.yaml +4 -0
  13. package/iterative-code-performance/references/algorithmic-complexity.md +58 -0
  14. package/iterative-code-performance/references/allocation-and-hot-loops.md +53 -0
  15. package/iterative-code-performance/references/caching-and-memoization.md +64 -0
  16. package/iterative-code-performance/references/concurrency-and-pipelines.md +61 -0
  17. package/iterative-code-performance/references/coupled-hot-path-strategy.md +78 -0
  18. package/iterative-code-performance/references/io-batching-and-queries.md +55 -0
  19. package/iterative-code-performance/references/iteration-gates.md +133 -0
  20. package/iterative-code-performance/references/job-selection.md +92 -0
  21. package/iterative-code-performance/references/measurement-and-benchmarking.md +78 -0
  22. package/iterative-code-performance/references/module-coverage.md +133 -0
  23. package/iterative-code-performance/references/repository-scan.md +69 -0
  24. package/iterative-code-quality/README.md +1 -0
  25. package/iterative-code-quality/SKILL.md +1 -0
  26. package/iterative-code-quality/agents/openai.yaml +1 -1
  27. package/iterative-code-quality/references/coupled-core-file-strategy.md +2 -2
  28. package/iterative-code-quality/references/iteration-gates.md +5 -2
  29. package/iterative-code-quality/references/job-selection.md +4 -1
  30. package/iterative-code-quality/references/testing-strategy.md +3 -1
  31. package/katex/scripts/__pycache__/render_katex.cpython-312.pyc +0 -0
  32. package/open-github-issue/scripts/__pycache__/open_github_issue.cpython-312.pyc +0 -0
  33. package/package.json +1 -1
  34. package/read-github-issue/scripts/__pycache__/find_issues.cpython-312.pyc +0 -0
  35. package/read-github-issue/scripts/__pycache__/read_issue.cpython-312.pyc +0 -0
  36. package/resolve-review-comments/scripts/__pycache__/review_threads.cpython-312.pyc +0 -0
  37. package/text-to-short-video/scripts/__pycache__/enforce_video_aspect_ratio.cpython-312.pyc +0 -0
@@ -0,0 +1,133 @@
1
+ # Performance Iteration Gates And Stopping Criteria
2
+
3
+ ## Pass discipline
4
+
5
+ Each iteration must have:
6
+
7
+ - a selected module or bounded module cluster,
8
+ - a concrete performance target,
9
+ - an explicit record of which performance job lenses were checked during the deep read,
10
+ - a bounded file/symbol scope,
11
+ - one or more selected execution directions,
12
+ - baseline evidence or a reason measurement is unavailable,
13
+ - a confidence assessment covering the agent's own ability to complete the optimization, the task's inherent difficulty, objective guardrail strength, benchmark quality, and rollback or repair paths,
14
+ - expected behavior-preserving outcome,
15
+ - validation plan,
16
+ - rollback point if evidence contradicts the change.
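
A minimal sketch of how one iteration's record could be captured so that missing items are easy to spot. The `IterationRecord` class, its field names, and the sample values are illustrative assumptions that mirror the checklist above; they are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class IterationRecord:
    """One bounded performance iteration (hypothetical structure)."""

    module: str                 # selected module or bounded module cluster
    target: str                 # concrete performance target
    lenses_checked: list        # job lenses checked during the deep read
    scope: list                 # bounded file/symbol scope
    directions: list            # selected execution directions
    baseline: str               # evidence, or why measurement is unavailable
    confidence_notes: str       # capability, difficulty, guardrails, rollback
    expected_outcome: str       # expected behavior-preserving outcome
    validation_plan: list       # commands or checks to run
    rollback_point: str         # where to return if evidence contradicts the change

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Sample values are made up for illustration only.
record = IterationRecord(
    module="reports/export",
    target="reduce export query count",
    lenses_checked=["measurement", "io-batching"],
    scope=["reports/export.py::build_rows"],
    directions=["io-batching"],
    baseline="query-count check: one query per row",
    confidence_notes="characterization tests exist; change is local",
    expected_outcome="identical export output",
    validation_plan=["unit tests", "query-count check"],
    rollback_point="",  # deliberately left empty to show the check
)
missing = record.missing_fields()
```

Here `missing` names the one field that was left blank, which is the kind of gap the pass-discipline checklist is meant to catch before the iteration starts.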
17
+
18
+ An iteration is not "one work type", and it also does not need to include every direction every time. Within the selected scope, choose the subset of directions that has the best current evidence and leverage: measurement, complexity, IO, caching, allocation, concurrency, and/or staged unlock work.
19
+
20
+ Confidence is not a synonym for "easy". Assess whether the agent has enough understanding, skill, workload context, tests, benchmarks, validation commands, and recovery path to complete the optimization safely. A hard task can still be high-confidence when strong guardrails, characterization coverage, and clear rollback let the agent repair mistakes by making the guarded behavior green again.
21
+
22
+ Avoid starting a broad second iteration before validating the first, but do not stop after a validated iteration if known actionable performance issues remain anywhere in the in-scope codebase.
23
+
24
+ Do not stop after a validated iteration if any in-scope module remains unvisited in the module coverage ledger.
25
+
26
+ ## Validation cadence
27
+
28
+ Run validation from narrow to broad:
29
+
30
+ 1. Formatter or type check for touched files when available.
31
+ 2. Unit tests for touched helpers and modules.
32
+ 3. Benchmarks, profiler runs, query-count checks, or operation-count checks for the optimized path when available.
33
+ 4. Integration tests for affected chains.
34
+ 5. Broader suite or build once multiple passes interact.
35
+
36
+ If validation fails:
37
+
38
+ - determine whether the failure is pre-existing, stale test expectation, flaky benchmark, test isolation issue, or real product bug,
39
+ - fix the true owner,
40
+ - keep regression coverage for real defects,
41
+ - do not mask failures by weakening assertions or widening benchmark budgets without evidence.
42
+
43
+ If validation passes and the performance plus correctness guardrails meaningfully cover the changed behavior, do not keep a known bottleneck in place purely because of subjective confidence concerns. Reassess whether the agent has enough capability and objective support to proceed; if yes, continue, and if no, choose the smallest measurement, benchmark, or unlock step that would make the next optimization credible.
44
+
45
+ The final stopping condition also requires the relevant guarded test surface to be green; a partially red repository is not a completed optimization outcome.
46
+
47
+ ## Re-scan after each iteration
48
+
49
+ Inspect the full known performance backlog for:
50
+
51
+ - modules that are still unvisited or only shallowly read,
52
+ - modules that were read but not yet checked against every available performance job lens,
53
+ - new repeated work after moved or extracted concepts,
54
+ - remaining N+1 calls, serial round trips, or excessive query shapes,
55
+ - caches that are stale, unbounded, or unnecessary,
56
+ - hot loops that still allocate avoidable objects,
57
+ - concurrency changes that need backpressure or max-in-flight proof,
58
+ - logs, metrics, traces, or benchmarks that now describe stale names or paths,
59
+ - documentation or `AGENTS.md` drift.
60
+
61
+ Then choose the next execution directions with these priorities:
62
+
63
+ 1. strongest bottleneck evidence,
64
+ 2. largest user-visible or high-frequency impact,
65
+ 3. highest combined confidence from agent capability, workload understanding, correctness guardrails, benchmark quality, and recovery path,
66
+ 4. strongest leverage for later deeper optimization,
67
+ 5. lowest business-risk path toward broader system improvement.
68
+
69
+ Use `references/job-selection.md` to convert those priorities into a concrete next-job choice.
70
+
71
+ ## Stage-gate after each iteration
72
+
73
+ After every validated iteration, run a deliberate full-codebase decision pass:
74
+
75
+ 1. Re-scan the repository and refresh the known performance backlog.
76
+ 2. Refresh the module coverage ledger and identify unvisited in-scope modules.
77
+ 3. Ask whether any known in-scope actionable bottleneck still remains.
78
+ 4. If yes, decide whether it should be addressed in the very next iteration or whether measurement or unlock work is needed first.
79
+ 5. If the obstacle is a large, coupled, or central hot path, do not stop there; switch to staged unlock work and continue.
80
+ 6. Only declare the repository iteration-complete when the re-scan shows no remaining actionable in-scope bottleneck and no unvisited in-scope module except items that are explicitly deferred or excluded under the allowed stop categories.
81
+
82
+ This stage-gate is mandatory. A validated local optimization does not by itself mean the repository is done.
83
+
84
+ ## Continue when
85
+
86
+ Repeat the cycle when:
87
+
88
+ - any known in-scope actionable performance issue remains unresolved,
89
+ - any in-scope module remains unvisited,
90
+ - measured slow paths remain,
91
+ - clear algorithmic waste remains,
92
+ - avoidable IO, query, or external-service round trips remain,
93
+ - unsafe or low-value caches need removal or replacement,
94
+ - allocation churn or hot-loop repeated work remains in a material path,
95
+ - concurrency, backpressure, or batching gaps remain,
96
+ - benchmarks or profiling are missing for a high-risk optimization that is otherwise actionable.
97
+
98
+ Do not produce a final completion report while any item in this section is true. Continue with the next bounded iteration instead.
99
+
100
+ ## Stop when
101
+
102
+ Stop only when there are no unresolved known in-scope actionable performance issues. Any remaining candidates must be explicitly classified as one of:
103
+
104
+ - low-value micro-optimization,
105
+ - speculative without concrete evidence,
106
+ - production-measurement-only with no safe local proxy,
107
+ - public contract migrations,
108
+ - macro-architecture changes,
109
+ - product behavior changes needing user approval,
110
+ - blocked by unavailable credentials, unstable external systems, or missing documentation,
111
+ - untestable with the current repository tooling and too risky to change safely.
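
One way to make this classification mechanical is sketched below. The short category keys abbreviate the list above, and the `actionable_gaps` helper and candidate names are hypothetical.

```python
from typing import Dict, List, Optional

# Abbreviated keys for the allowed stop categories listed above (assumed names).
STOP_CATEGORIES = frozenset({
    "low-value-micro-optimization",
    "speculative",
    "production-measurement-only",
    "public-contract-migration",
    "macro-architecture-change",
    "product-behavior-change",
    "blocked",
    "untestable-and-risky",
})


def actionable_gaps(remaining: Dict[str, Optional[str]]) -> List[str]:
    """Return candidates whose category is missing or not an allowed stop category."""
    return sorted(
        name for name, category in remaining.items()
        if category not in STOP_CATEGORIES
    )


remaining = {
    "inline tiny helper": "low-value-micro-optimization",
    "report query fan-out": None,       # never classified
    "parser rewrite": "feels hard",     # not an allowed category
}
gaps = actionable_gaps(remaining)
```

Anything returned by `actionable_gaps` is still an actionable gap under the rule above, so a non-empty result means the loop continues.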
112
+
113
+ If a remaining candidate cannot be placed in one of these categories, it is still an actionable gap and the agent must continue iterating rather than complete the task.
114
+
115
+ If an in-scope module has not received a performance deep-read iteration, it is still an actionable coverage gap even when the already-read modules look fast.
116
+
117
+ ## Completion evidence
118
+
119
+ The final report should make the stopping point auditable:
120
+
121
+ - passes completed,
122
+ - execution directions selected per iteration,
123
+ - module or module cluster covered per iteration,
124
+ - performance job lenses checked per iteration,
125
+ - final module coverage ledger,
126
+ - stage-gate verdict after each full-codebase re-scan,
127
+ - validation commands and outcomes,
128
+ - confirmation that the guarded test surface is green after the optimization,
129
+ - speedup, throughput, latency, CPU, memory, allocation, query-count, or complexity evidence,
130
+ - behavior-preservation evidence,
131
+ - docs and constraints sync status,
132
+ - proof that the latest scan found no known actionable in-scope performance issues,
133
+ - deferred items with reason and required approval, dependency, production data, or safety constraint.
@@ -0,0 +1,92 @@
1
+ # Performance Job Selection Guide
2
+
3
+ ## Purpose
4
+
5
+ Help the agent choose the next execution direction after each full-codebase performance re-scan.
6
+
7
+ These are job-selection rules for Step 2 of the main skill loop. They are not workflow steps.
8
+
9
+ The goal is not to force one permanent order. The goal is to choose the next job that most safely improves the selected module or module cluster and unlocks later work.
10
+
11
+ Before choosing, the agent should first scan the selected module through every available performance job lens. Job selection happens after that scan; it is not a substitute for that scan.
12
+
13
+ ## Available jobs
14
+
15
+ - measurement / benchmarking
16
+ - algorithmic complexity / repeated-work cleanup
17
+ - IO batching / query shaping
18
+ - caching / memoization
19
+ - allocation / hot-loop cleanup
20
+ - concurrency / pipeline tuning
21
+ - staged unlock work
22
+
23
+ ## Choose `measurement / benchmarking` when
24
+
25
+ - the slow path is suspected but not proven,
26
+ - multiple candidate bottlenecks compete and prioritization would otherwise be guesswork,
27
+ - an optimization could regress correctness or user-visible latency without a baseline,
28
+ - production symptoms exist but local reproduction or profiling is missing.
29
+
30
+ ## Choose `algorithmic complexity / repeated-work cleanup` when
31
+
32
+ - nested loops, repeated scans, repeated sorts, or repeated filters dominate the path,
33
+ - the same derived value is recomputed for every item or request,
34
+ - a more appropriate data structure would reduce asymptotic or constant-factor cost,
35
+ - the complexity change can be proven without changing business behavior.
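
As a small illustration of a data-structure change that reduces repeated scans without changing behavior, consider the sketch below. The function names are hypothetical, and it assumes the items are hashable.

```python
def common_items_repeated_scan(a, b):
    """Scan list b once per item of a: O(len(a) * len(b))."""
    return [x for x in a if x in b]


def common_items_set_lookup(a, b):
    """Build a set from b once, then do constant-time lookups: O(len(a) + len(b))."""
    b_set = set(b)
    return [x for x in a if x in b_set]


a = [1, 2, 3, 4, 2]
b = [2, 4, 6]
slow = common_items_repeated_scan(a, b)
fast = common_items_set_lookup(a, b)
```

Both versions keep the order and duplicates of `a`, so comparing their outputs on representative inputs doubles as a behavior-preservation check.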
36
+
37
+ ## Choose `IO batching / query shaping` when
38
+
39
+ - N+1 database, network, filesystem, or external API calls exist,
40
+ - serial round trips can be safely batched or preloaded,
41
+ - query predicates, projections, pagination, or indexes are mismatched to real access patterns,
42
+ - external calls lack timeout, retry, or result-shape handling that affects throughput.
43
+
44
+ ## Choose `caching / memoization` when
45
+
46
+ - the same expensive pure or owner-scoped computation repeats,
47
+ - cache ownership, invalidation, size bounds, and failure behavior are clear,
48
+ - stale data cannot violate business rules,
49
+ - a simpler repeated-work or data-structure fix is insufficient.
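
A minimal sketch of a bounded memoization cache with an explicit invalidation hook, using the standard library's `functools.lru_cache`. The `normalize` function and the `maxsize` value are illustrative assumptions.

```python
from functools import lru_cache


@lru_cache(maxsize=256)  # explicit size bound instead of an unbounded cache
def normalize(text: str) -> str:
    """Pure function: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


first = normalize("Hello   World")
second = normalize("Hello   World")   # served from the cache
info = normalize.cache_info()         # hit/miss counters for guardrail assertions

normalize.cache_clear()               # explicit invalidation point
after_clear = normalize.cache_info()
```

Because the function is pure, stale entries cannot violate business rules; an owner-scoped computation would additionally need a clear owner that calls `cache_clear()` when its inputs change.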
50
+
51
+ ## Choose `allocation / hot-loop cleanup` when
52
+
53
+ - tight loops allocate avoidable objects, strings, buffers, regexes, or closures,
54
+ - repeated serialization, parsing, cloning, copying, or formatting appears in a hot path,
55
+ - memory pressure or garbage collection is part of the observed slowdown,
56
+ - changes can preserve readability or are justified by strong hot-path evidence.
57
+
58
+ ## Choose `concurrency / pipeline tuning` when
59
+
60
+ - safe independent work is unnecessarily serial,
61
+ - current parallelism is unbounded or overloads downstream resources,
62
+ - queues, workers, streams, or async tasks lack backpressure,
63
+ - batching or bounded concurrency would improve throughput without changing ordering guarantees.
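
Bounded concurrency that keeps result order can be sketched with `asyncio.Semaphore` and `asyncio.gather`. The `bounded_map` helper, the `work` coroutine, and the limit of 3 are hypothetical choices made for the demonstration.

```python
import asyncio


async def bounded_map(func, items, *, max_in_flight):
    """Apply an async func to items with at most max_in_flight running at once.

    asyncio.gather returns results in input order, so ordering is preserved.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run_one(item):
        async with semaphore:
            return await func(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def _demo():
    in_flight = 0
    peak = 0

    async def work(x):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)   # record the max-in-flight actually observed
        await asyncio.sleep(0)        # yield control, standing in for real IO
        in_flight -= 1
        return x * 2

    results = await bounded_map(work, range(10), max_in_flight=3)
    return results, peak


results, peak = asyncio.run(_demo())
```

Recording `peak` gives a deterministic max-in-flight proof that can be asserted in a test rather than inferred from timings.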
64
+
65
+ ## Choose `staged unlock work` when
66
+
67
+ - the file feels too central or too coupled for direct optimization,
68
+ - no safe full hot-path rewrite exists yet, but a preparatory step does,
69
+ - you can reduce risk through measurement hooks, seam extraction, characterization tests, type extraction, data-shape clarification, or side-effect isolation,
70
+ - the best next move is to make a future optimization cheaper rather than solve the whole area now.
71
+
72
+ ## Tie-breakers
73
+
74
+ If multiple jobs are plausible, prefer the one that:
75
+
76
+ 1. addresses the highest-evidence bottleneck,
77
+ 2. improves the most user-visible or high-frequency path,
78
+ 3. increases safety for the next iteration,
79
+ 4. removes the strongest blocker to a deeper future optimization,
80
+ 5. helps an unvisited module reach performance deep-read coverage,
81
+ 6. matches the agent's self-assessed ability to understand, execute, benchmark, and repair the change under current evidence,
82
+ 7. preserves behavior with the clearest available guardrails.
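
If the tie-breakers are read as a strict priority order (an interpretation, not something the guide states), they can be sketched as a lexicographic comparison. The `JobCandidate` fields and the 0 to 3 scores are hypothetical.

```python
from typing import NamedTuple


class JobCandidate(NamedTuple):
    name: str
    evidence: int               # 1. highest-evidence bottleneck
    impact: int                 # 2. user-visible or high-frequency path
    next_iteration_safety: int  # 3. safety gained for the next iteration
    unblocks_future: int        # 4. removes a blocker to deeper optimization
    covers_unvisited: int       # 5. helps an unvisited module reach coverage
    agent_fit: int              # 6. agent's self-assessed ability
    guardrail_clarity: int      # 7. clearest behavior-preserving guardrails


def pick_job(candidates):
    """Pick the candidate whose scores win in tie-breaker priority order."""
    # Tuples compare element by element, so earlier criteria dominate later ones.
    return max(candidates, key=lambda c: c[1:])


candidates = [
    JobCandidate("caching", 2, 1, 1, 0, 0, 3, 3),
    JobCandidate("io-batching", 2, 3, 1, 1, 0, 2, 2),
    JobCandidate("measurement", 1, 3, 3, 3, 3, 3, 3),
]
chosen = pick_job(candidates)
```

Here `caching` and `io-batching` tie on evidence, so the second criterion (impact) decides; `measurement` loses on the first criterion despite higher later scores.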
83
+
84
+ ## Hard rule
85
+
86
+ If performance evidence is too weak, `measurement / benchmarking` should usually win before code changes.
87
+
88
+ If a high-risk hot path lacks enough correctness guardrails, benchmark or characterization guardrail work should usually win before a deeper optimization.
89
+
90
+ If the area is difficult but the agent can explain the workload, behavior, affected contracts, rollback path, and available tests or benchmarks clearly, do not downgrade confidence just because the optimization is non-trivial. Strong guardrails mean accidental breakage should be repaired by returning the test and benchmark surface to green, not avoided by leaving an actionable bottleneck in place.
91
+
92
+ If any in-scope module remains unvisited, choose jobs that help the next highest-evidence or easiest useful unvisited module become deeply read, improved, or validated-clear before spending another round on already-familiar areas.
@@ -0,0 +1,78 @@
1
+ # Measurement And Benchmarking
2
+
3
+ ## Principle
4
+
5
+ Optimize from evidence. A performance change should have at least one of:
6
+
7
+ - production latency, throughput, CPU, memory, queue, or query evidence,
8
+ - profiler output,
9
+ - repeatable benchmark baseline,
10
+ - test runtime profile,
11
+ - log or trace timings,
12
+ - clear algorithmic complexity proof tied to a plausible workload.
13
+
14
+ If none exists, measurement is usually the next job.
15
+
16
+ Measurement also informs confidence, but it is not the only input. The agent must assess its own ability to understand and complete the optimization, then combine that self-assessment with task difficulty, benchmark quality, correctness tests, rollback options, and repair paths. Strong tests and repeatable benchmarks should make difficult changes more actionable because failures can be diagnosed and driven back to green.
17
+
18
+ ## Baseline rules
19
+
20
+ Before changing a hot path, record:
21
+
22
+ - command, scenario, fixture, data size, seed, and environment,
23
+ - current timing, throughput, allocation, query count, memory, or operation count,
24
+ - variance or repeated-run notes when possible,
25
+ - correctness oracle used with the benchmark,
26
+ - reason if the path can only be measured in production.
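
A minimal sketch of recording such a baseline with the standard library. The `measure` helper, the `workload` function, and the recorded keys are illustrative assumptions; a real baseline would use the repository's own scenario and fixtures.

```python
import statistics
import time


def measure(fn, *, repeats=5):
    """Time fn() several times and summarize the samples (seconds)."""
    samples = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        samples.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples) if repeats > 1 else 0.0,
        "result": result,  # kept so a correctness oracle can check it
    }


def workload():
    data = list(range(10_000))  # fixed, documented data size
    return sum(x * x for x in data)


baseline = {
    "scenario": "sum of squares over a fixed 10_000-item list",
    "data_size": 10_000,
    **measure(workload),
}
```

Keeping the computed `result` next to the timings lets the same run serve as both the measurement and the correctness oracle for the after comparison.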
27
+
28
+ Do not compare a cold-cache baseline against a warm-cache post-change result unless that is the intended user-visible scenario.
29
+
30
+ ## Benchmark selection
31
+
32
+ Use the cheapest reliable benchmark that proves the risk:
33
+
34
+ - unit microbenchmarks for pure hot helpers,
35
+ - integration benchmarks for query, serialization, or multi-module orchestration paths,
36
+ - command or request benchmarks for user-visible entrypoints,
37
+ - load tests only when concurrency, backpressure, or throughput is the actual risk,
38
+ - profiler snapshots when the bottleneck location is unknown.
39
+
40
+ Avoid expensive load tests when a deterministic benchmark or integration test proves the same performance issue.
41
+
42
+ ## Before/after comparisons
43
+
44
+ A useful comparison names:
45
+
46
+ - baseline command and result,
47
+ - after command and result,
48
+ - data shape and scale,
49
+ - correctness validation,
50
+ - variance or caveat,
51
+ - whether the improvement is latency, throughput, CPU, memory, allocation, query count, or algorithmic complexity.
52
+
53
+ If exact numbers are unstable, report operation counts, query counts, asymptotic complexity, or profiler rank change instead of pretending precision exists.
54
+
55
+ ## Guardrail design
56
+
57
+ Performance guardrails should fail on meaningful regressions, not noise.
58
+
59
+ Prefer:
60
+
61
+ - deterministic operation or query counts,
62
+ - bounded latency budgets with generous margins only when the environment is stable,
63
+ - regression tests for duplicate work removal,
64
+ - benchmark scripts documented for local use,
65
+ - assertions on cache invalidation behavior,
66
+ - profiler notes for manual verification when automated thresholds are unreliable.
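
A deterministic round-trip count is one guardrail that does not depend on timing. The sketch below uses a hypothetical `CountingStore` test double and `load_names` function to show the idea.

```python
class CountingStore:
    """Fake data store that counts round trips (hypothetical test double)."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def fetch_many(self, ids):
        self.calls += 1  # one round trip per call, regardless of batch size
        return {i: self.rows[i] for i in ids}


def load_names(store, ids):
    """Load all names with one batched call instead of one call per id."""
    rows = store.fetch_many(list(ids))
    return [rows[i] for i in ids]


store = CountingStore({1: "a", 2: "b", 3: "c"})
names = load_names(store, [3, 1, 2])
round_trips = store.calls
```

A test asserting `round_trips == 1` fails only if the N+1 pattern returns, so it stays stable on noisy CI machines.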
67
+
68
+ Do not add flaky timing thresholds to CI when the repository has no stable benchmark environment.
69
+
70
+ ## Production-only measurement
71
+
72
+ When a bottleneck requires production data or credentials:
73
+
74
+ - capture the best local proxy evidence available,
75
+ - document the missing data source,
76
+ - avoid speculative rewrites that cannot be validated,
77
+ - add safe instrumentation or benchmark hooks if approved and useful,
78
+ - classify the remaining item as production-measurement-only if no safe local action remains.
@@ -0,0 +1,133 @@
1
+ # Module Coverage And Performance Deep-Read Iterations
2
+
3
+ ## Purpose
4
+
5
+ Prevent the agent from repeatedly optimizing only familiar hot paths while untouched modules remain unexamined.
6
+
7
+ Use this reference in Step 1 to build the module inventory and in Step 2 to choose which module or module cluster receives the next performance deep-read iteration.
8
+
9
+ Deep-read here does not mean generic reading. It means scanning the module through each available performance job lens so the agent can identify whether measurement, algorithmic complexity, repeated-work removal, IO batching, caching, allocation cleanup, concurrency work, or staged unlock work is justified.
10
+
11
+ ## Module inventory
12
+
13
+ List every meaningful in-scope module before completion. A module may be:
14
+
15
+ - a package, app, service, route group, command group, worker, or library,
16
+ - a domain folder with a clear responsibility,
17
+ - a runtime entrypoint plus its owned helpers,
18
+ - a persistence/query, external-integration, queue, cache, or reporting subsystem,
19
+ - a testable subsystem with stable callers and contracts.
20
+
21
+ Record each module with:
22
+
23
+ - module name and path roots,
24
+ - primary responsibility,
25
+ - entrypoints and public interfaces,
26
+ - key callers and callees,
27
+ - expected workload shape and frequency,
28
+ - tests, benchmarks, and performance guardrails,
29
+ - logs, metrics, traces, or profiling surfaces,
30
+ - persistence, network, filesystem, or external API contracts,
31
+ - risk level and estimated ease,
32
+ - current coverage status.
33
+
34
+ Exclude generated, vendored, lock, build-output, snapshot, fixture-only, or explicitly out-of-scope areas only with evidence.
35
+
36
+ ## Coverage ledger statuses
37
+
38
+ Use simple statuses so stopping conditions are auditable:
39
+
40
+ - `unvisited`: inventoried but not deeply read yet.
41
+ - `deep-read`: callers, callees, tests, logs, benchmarks, contracts, workload shape, core files, and all available performance job lenses were inspected with enough context to judge performance.
42
+ - `optimized`: at least one behavior-safe performance improvement landed for this module.
43
+ - `validated-clear`: deep read found no actionable in-scope performance issue worth changing now.
44
+ - `deferred`: an issue exists but is blocked, unsafe, speculative, approval-dependent, production-measurement-only, or requires macro-architecture/product scope.
45
+ - `excluded`: not human-maintained source or outside the user's requested scope.
46
+
47
+ Completion is not allowed while any in-scope module remains `unvisited`.
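
The statuses and the completion rule can be sketched as a small ledger check. The `Coverage` enum mirrors the status names above; the helper names and module paths are hypothetical.

```python
from enum import Enum


class Coverage(Enum):
    UNVISITED = "unvisited"
    DEEP_READ = "deep-read"
    OPTIMIZED = "optimized"
    VALIDATED_CLEAR = "validated-clear"
    DEFERRED = "deferred"
    EXCLUDED = "excluded"


def unvisited_modules(ledger):
    """Return the names of modules still marked unvisited, sorted for stable reports."""
    return sorted(
        name for name, status in ledger.items() if status is Coverage.UNVISITED
    )


def completion_allowed(ledger):
    """Completion is blocked while any module in the ledger is unvisited."""
    return not unvisited_modules(ledger)


ledger = {
    "api/routes": Coverage.OPTIMIZED,
    "workers/export": Coverage.UNVISITED,
    "vendor/": Coverage.EXCLUDED,
    "billing/queries": Coverage.VALIDATED_CLEAR,
}
pending = unvisited_modules(ledger)
done = completion_allowed(ledger)
```

This only encodes the `unvisited` rule; whether a `deep-read` module still needs an outcome such as `optimized` or `validated-clear` is a separate check.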
48
+
49
+ ## Easy-first and evidence-first ordering
50
+
51
+ Start with the easiest useful modules when that reduces risk:
52
+
53
+ - small surface area,
54
+ - clear ownership,
55
+ - local tests, cheap benchmarks, or profiling hooks,
56
+ - limited side effects,
57
+ - low public API or persistence risk,
58
+ - likely to clarify workload shape, tests, benchmarks, caching seams, batching seams, or data structures used by harder modules.
59
+
60
+ Prefer measured high-impact bottlenecks when they exist, even if they are not the easiest module.
61
+
62
+ Do not confuse easy-first with low-value micro-optimization. The chosen module should either resolve real performance issues or create context/guardrails that make later hot paths safer.
63
+
64
+ ## Deep-read requirements
65
+
66
+ A module iteration is not deep-read until the agent inspects:
67
+
68
+ - module entrypoints and public interfaces,
69
+ - internal core files and responsibility boundaries,
70
+ - key callers and downstream callees,
71
+ - workload size, frequency, and data-shape assumptions,
72
+ - tests, fixtures, mocks, benchmark commands, and validation commands,
73
+ - logs, metrics, tracing, profiler hooks, and error messages,
74
+ - configuration, persistence, query, cache, concurrency, and external-service contracts when relevant,
75
+ - known TODOs, comments, or docs that describe performance behavior.
76
+
77
+ The agent must also inspect the module through each available performance job lens:
78
+
79
+ - `measurement / benchmarking`: is there enough baseline evidence, or is measurement the next unlock?
80
+ - `algorithmic complexity / repeated work`: are there avoidable scans, sorts, conversions, or duplicated computations?
81
+ - `IO batching / queries`: are there N+1 calls, excessive round trips, poor query shapes, or serial external work?
82
+ - `caching / memoization`: would caching help, and are ownership plus invalidation safe?
83
+ - `allocation / hot loops`: are tight loops creating avoidable objects, strings, parsing, or serialization?
84
+ - `concurrency / pipelines`: is work too serial, too parallel, unbounded, or missing backpressure?
85
+ - `staged unlock work`: if the module is too coupled for direct optimization, what is the next smaller unlock step?
86
+
87
+ Do not mark a module `validated-clear` from a shallow file skim.
88
+ Do not mark a module `validated-clear` until every available performance job lens has been checked and classified as one of: actionable now, measure-first, unlock-first, deferred, excluded, or no meaningful issue found.
89
+
90
+ ## Choosing the next module
91
+
92
+ After every iteration:
93
+
94
+ 1. Re-scan the module ledger.
95
+ 2. Prefer an `unvisited` module unless a just-touched module must be stabilized before moving on.
96
+ 3. Choose the highest-evidence hot module, or the easiest useful `unvisited` module that can be deeply read and improved or validated now.
97
+ 4. Scan that module through every available performance job lens before deciding what "this round" means.
98
+ 5. If the next module is high-risk and under-guarded, choose benchmark or characterization guardrails first.
99
+ 6. If the next module is too coupled for direct optimization, choose staged unlock work rather than skipping it.
100
+ 7. Return to the full-codebase scan after validation and update the ledger.
101
+
102
+ Revisiting a familiar module is valid only when:
103
+
104
+ - it blocks safe deep reading of an unvisited module,
105
+ - a previous optimization created follow-up risk that must be stabilized,
106
+ - validation exposed a real defect, stale benchmark, or stale contract,
107
+ - cross-module optimization requires touching it together with the next module.
108
+
109
+ ## Module cluster iterations
110
+
111
+ One iteration may cover a small cluster of modules when they share one hot path or invariant, such as:
112
+
113
+ - a command and its parser,
114
+ - a route and its service,
115
+ - a domain module and its query adapter,
116
+ - an integration wrapper and its retry or batching helper,
117
+ - a worker and its queue processor.
118
+
119
+ Keep clusters bounded. Do not use clustering to claim full-repository coverage without deep context.
120
+
121
+ ## Stage-gate questions
122
+
123
+ At the end of each iteration, answer:
124
+
125
+ - Which module or module cluster was deeply read?
126
+ - Which performance job lenses were checked, and which jobs were selected and why?
127
+ - What bottleneck was fixed, or why is the module validated-clear?
128
+ - Which guardrails prove behavior was preserved?
129
+ - What baseline and after evidence exists?
130
+ - Which modules remain `unvisited`?
131
+ - Which module is the next highest-evidence or easiest useful target?
132
+
133
+ If any in-scope module remains `unvisited`, the correct action is to return to Step 1, not to finish.
@@ -0,0 +1,69 @@
1
+ # Repository Performance Scan And Backlog Selection
2
+
3
+ ## Purpose
4
+
5
+ Build a factual performance map before changing code, then choose the highest-value optimizations while tracking module-by-module performance deep-read coverage.
6
+
7
+ ## Required scan
8
+
9
+ - Read `AGENTS.md`, `README*`, project docs, manifests, task runners, CI configs, benchmark setup, profiler setup, and test setup.
10
+ - List entrypoints: CLI commands, servers, workers, jobs, frontend routes, scripts, libraries, public packages, and scheduled tasks.
11
+ - Identify core domain modules, persistence/query boundaries, external integrations, serialization/parsing paths, logging utilities, queues, caches, and test helpers.
12
+ - Create a module inventory and coverage ledger using `references/module-coverage.md`.
13
+ - For each module, scan through the available performance job lenses instead of treating scan as generic code reading.
14
+ - Inspect current git state before editing so unrelated user changes are not overwritten.
15
+ - Identify generated, vendored, lock, snapshot, build-output, fixture, compiled, and minified files; exclude them unless they are human-maintained source.
16
+
17
+ ## Performance backlog signals
18
+
19
+ Prioritize files or functions with:
20
+
21
+ - measured slow requests, commands, jobs, tests, startup, builds, or user-visible interactions,
22
+ - high fan-in, high loop counts, high request frequency, or repeated invocation in long-running workers,
23
+ - avoidable nested loops, repeated scans, repeated sorting, repeated parsing, or repeated conversions,
24
+ - N+1 database, network, filesystem, or external API calls,
25
+ - unbounded concurrency, serial work that can be safely batched, or pipelines without backpressure,
26
+ - repeated serialization/deserialization or large intermediate objects,
27
+ - allocation churn, excessive cloning/copying, or memory-pressure paths,
28
+ - caches with missing invalidation, excessive retention, or low hit value,
29
+ - logs, metrics, traces, or benchmarks that hide where time is spent.
30
+
31
+ ## Evidence to capture
32
+
33
+ For each candidate, record:
34
+
35
+ - file path and symbol name,
36
+ - owning module or module cluster,
37
+ - job lens that exposed the issue,
38
+ - performance evidence: benchmark, trace, profiler output, log timing, production symptom, or complexity analysis,
39
+ - expected speed, throughput, allocation, IO, or complexity improvement,
40
+ - correctness risks and behavior invariants,
41
+ - tests, benchmarks, or validations needed to prove safety,
42
+ - reason to defer if the candidate requires product, architecture, operational, or production-data approval.
43
+
44
+ ## Exclusion rules
45
+
46
+ Do not optimize:
47
+
48
+ - third-party, generated, compiled, or minified artifacts,
49
+ - snapshots where churn would hide signal,
50
+ - code the user marked as actively edited elsewhere,
51
+ - public schema/API names or data contracts that require migration planning,
52
+ - cold paths where the optimization makes code harder to maintain without evidence of value,
53
+ - areas that cannot be validated and are not causing a clear performance risk.
54
+
55
+ ## Backlog scoring
56
+
57
+ Prefer a small set of high-confidence improvements over an exhaustive sweep.
58
+
59
+ Score each candidate by:
60
+
61
+ 1. **Impact**: latency, throughput, CPU, memory, IO, user criticality, and call frequency.
62
+ 2. **Evidence**: measurement quality or clear complexity proof.
63
+ 3. **Correctness confidence**: ability to preserve business behavior.
64
+ 4. **Validation**: ability to benchmark, test, or otherwise prove equivalence.
65
+ 5. **Blast radius**: number of modules, public contracts, persistence paths, and operational assumptions affected.
66
+
67
+ Start with high-impact, high-evidence, low-blast-radius items. Escalate broad changes only when smaller passes cannot resolve the root performance problem.
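
The guide does not define a scoring formula, so the sketch below is only one possible reading: each criterion gets a hypothetical 1 to 5 rating, the four positive criteria are summed, and blast radius is subtracted. The `Candidate` class and the sample ratings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    impact: int                  # 1 (low) to 5 (high)
    evidence: int
    correctness_confidence: int
    validation: int
    blast_radius: int            # higher means more modules/contracts affected

    def score(self) -> int:
        """Assumed formula: reward the four positive criteria, penalize blast radius."""
        return (
            self.impact
            + self.evidence
            + self.correctness_confidence
            + self.validation
            - self.blast_radius
        )


def rank(candidates):
    """Order candidates from highest to lowest score."""
    return sorted(candidates, key=lambda c: c.score(), reverse=True)


backlog = [
    Candidate("rewrite scheduler core", 5, 3, 2, 2, 5),          # score 7
    Candidate("batch report queries", 4, 5, 4, 4, 1),            # score 16
    Candidate("hoist parsing out of request loop", 3, 4, 5, 4, 1),  # score 15
]
ordered = [c.name for c in rank(backlog)]
```

With these ratings the broad scheduler rewrite ranks last, which matches the advice to start with high-impact, high-evidence, low-blast-radius items.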
68
+
69
+ Do not finish from backlog scoring alone. Completion also requires the module coverage ledger to show that every in-scope module has been deeply read and either improved, validated-clear, deferred, or excluded with evidence.
@@ -16,6 +16,7 @@ Improve an existing repository through a strict three-step loop of full-codebase
16
16
  - Repairs stale or missing logs and adds tests for important observability contracts.
17
17
  - Adds high-value unit, property-based, integration, or E2E tests based on risk.
18
18
  - Does not require pre-existing tests before every refactor; for high-risk under-guarded areas, it treats test addition as the next unlock direction.
19
+ - Requires confidence decisions to combine the agent's self-assessed ability, task complexity, guardrail strength, rollback or repair paths, and whether a strong test suite can safely drive broken refactors back to green.
19
20
  - Uses those tests and other guardrails to justify more aggressive refactors, instead of leaving known issues in place for subjective confidence reasons.
20
21
  - Re-scans the full repository after every iteration and picks the next highest-confidence, highest-leverage directions.
21
22
  - Uses small safe refactors to prepare the ground for larger later refactors, progressing gradually from outside to inside.
@@ -52,6 +52,7 @@ For this skill, `macro architecture` means the system's top-level runtime shape
52
52
  - Choose jobs only after the latest full-codebase scan. Jobs are optional execution directions, not ordered workflow steps.
53
53
  - Treat module scanning and job choice as one linked activity: inspect the selected module through every available job lens before deciding which jobs actually land in this round.
54
54
  - Select the smallest set of jobs that can safely improve the currently selected module or module cluster under current guardrails.
55
+ - Before choosing or deferring a refactor, explicitly assess refactor confidence as a combination of the agent's own ability to understand and complete the task, the objective safety net from tests and other guardrails, the clarity of rollback or repair paths, and the task's inherent difficulty. Do not treat difficulty alone as low confidence; when strong tests guard the behavior, use them to support bolder changes because failures can be driven back to green.
55
56
  - Prefer easy-first module ordering: start from low-risk, high-confidence modules when doing so builds context, tests, naming clarity, or seams that make harder modules safer later.
56
57
  - Do not keep revisiting familiar modules while other in-scope modules remain unvisited unless the familiar module blocks the next unvisited module's safe deep read.
57
58
  - Prefer smaller, high-confidence refactors that reduce risk and prepare the ground for deeper later cleanup.
@@ -1,4 +1,4 @@
1
1
  interface:
2
2
  display_name: "Iterative Code Quality"
3
3
  short_description: "Refactor names, functions, modules, logs, and tests in repeated behavior-safe passes"
4
- default_prompt: "Use $iterative-code-quality as a strict three-step loop. Step 1: scan the full repository, refresh the actionable quality backlog, and maintain a module inventory plus coverage ledger. Step 2: choose this round's module or bounded module cluster, scan it through every available job lens, and only then decide which jobs actually land now; start from the easiest useful unvisited modules, jobs are selectable directions rather than workflow steps, and if a high-risk area is weakly guarded add the missing tests or other guardrails instead of stopping. If a file is too coupled or too central for direct cleanup, switch to staged unlock work and keep progressing. After validation, run a full-codebase stage-gate; if any known in-scope actionable issue remains or any in-scope module has not received a job-oriented deep-read iteration, go back to Step 1 immediately. Step 3: only when the latest full-codebase scan is clear and every in-scope module is deeply read through the available-job lenses, run $align-project-documents and $maintain-project-constraints to synchronize docs and AGENTS.md. Preserve intended business behavior and the system's macro architecture, keep the guarded test surface green, and do not write a completion report while actionable gaps or unvisited modules still exist."
4
+ default_prompt: "Use $iterative-code-quality as a strict three-step loop. Step 1: scan the full repository, refresh the actionable quality backlog, and maintain a module inventory plus coverage ledger. Step 2: choose this round's module or bounded module cluster, scan it through every available job lens, and only then decide which jobs actually land now; start from the easiest useful unvisited modules, jobs are selectable directions rather than workflow steps, and assess refactor confidence from your own ability to understand and complete the change, task complexity, guardrail strength, rollback or repair paths, and whether tests can drive accidental breakage back to green. If a high-risk area is weakly guarded add the missing tests or other guardrails instead of stopping; if strong tests guard the behavior, use them to justify bolder safe refactors rather than avoiding the work. If a file is too coupled or too central for direct cleanup, switch to staged unlock work and keep progressing. After validation, run a full-codebase stage-gate; if any known in-scope actionable issue remains or any in-scope module has not received a job-oriented deep-read iteration, go back to Step 1 immediately. Step 3: only when the latest full-codebase scan is clear and every in-scope module is deeply read through the available-job lenses, run $align-project-documents and $maintain-project-constraints to synchronize docs and AGENTS.md. Preserve intended business behavior and the system's macro architecture, keep the guarded test surface green, and do not write a completion report while actionable gaps or unvisited modules still exist."
@@ -52,14 +52,14 @@ Avoid these anti-patterns:
52
52
 
53
53
  Prefer the next step that maximizes:
54
54
 
55
- 1. confidence under existing tests or quickly addable tests,
55
+ 1. combined confidence from the agent's own ability, code understanding, existing tests, or quickly addable tests,
56
56
  2. leverage for future deeper cleanup,
57
57
  3. reduction in coupling or cognitive load,
58
58
  4. low risk to current business behavior.
59
59
 
60
60
  If two steps are both safe, choose the one that makes the next iteration easier.
61
61
 
62
- If the file is high-risk and under-tested, prefer adding the smallest useful characterization tests before attempting deeper structural edits.
62
+ If the file is high-risk and under-tested, prefer adding the smallest useful characterization tests before attempting deeper structural edits. If the file is high-risk but well-guarded, do not stop only because the change is difficult; use the guardrails to validate the agent's work and repair any accidental breakage.
63
63
 
64
64
  ## Completion rule for coupled files
65
65
 
@@ -9,12 +9,15 @@ Each iteration must have:
9
9
  - an explicit record of which job lenses were checked during the deep read,
10
10
  - a bounded file/symbol scope,
11
11
  - one or more selected execution directions,
12
+ - a confidence assessment covering the agent's own ability to complete the refactor, the task's inherent difficulty, objective guardrail strength, and rollback or repair paths,
12
13
  - expected behavior-neutral outcome,
13
14
  - validation plan,
14
15
  - rollback point if evidence contradicts the change.
15
16
 
16
17
  An iteration is not "one work type", and it also does not need to include every direction every time. Within the selected scope, choose the subset of directions that has the best current confidence and leverage: naming, simplification, module boundaries, logging, and/or tests.
17
18
 
19
+ Confidence is not a synonym for "easy". Assess whether the agent has enough understanding, skill, local context, tests, validation commands, and recovery path to complete the refactor safely. A hard task can still be high-confidence when strong tests, characterization coverage, and clear rollback let the agent repair mistakes by making the guarded behavior green again.
20
+
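Reviewer note: one way to read the added paragraph is as a per-iteration record in which difficulty is noted but never lowers confidence on its own. The sketch below is a simplification under that reading. The `ConfidenceAssessment` type, its field names, and the all-or-nothing verdict are hypothetical and are not defined by the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceAssessment:
    """Hypothetical record of the factors named in the added paragraph."""
    understands_behavior: bool         # agent can explain what the code does
    can_complete_change: bool          # self-assessed skill and local context
    guardrails_cover_behavior: bool    # tests or characterization coverage
    has_rollback_or_repair_path: bool  # clear way back to green
    difficulty: str                    # "easy" or "hard"; recorded only


def is_high_confidence(a: ConfidenceAssessment) -> bool:
    # Difficulty is deliberately absent from the verdict in this sketch:
    # a hard task stays high-confidence when ability, guardrails, and a
    # recovery path are all present.
    return all([
        a.understands_behavior,
        a.can_complete_change,
        a.guardrails_cover_behavior,
        a.has_rollback_or_repair_path,
    ])


hard_but_guarded = ConfidenceAssessment(True, True, True, True, "hard")
easy_but_unguarded = ConfidenceAssessment(True, True, False, True, "easy")
```

In this reading, `hard_but_guarded` is high-confidence while `easy_but_unguarded` is not, which is the distinction the paragraph draws between "confident" and "easy".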
18
21
  Avoid starting a broad second iteration before validating the first, but do not stop after a validated iteration if known actionable quality issues remain anywhere in the in-scope codebase.
19
22
 
20
23
  Do not stop after a validated iteration if any in-scope module remains unvisited in the module coverage ledger.
@@ -35,7 +38,7 @@ If validation fails:
35
38
  - keep regression coverage for real defects,
36
39
  - do not mask failures by weakening assertions.
37
40
 
38
- If validation passes and the guardrails meaningfully cover the changed behavior, do not keep a known quality issue in place purely because of subjective confidence concerns.
41
+ If validation passes and the guardrails meaningfully cover the changed behavior, do not keep a known quality issue in place purely because of subjective confidence concerns. Reassess whether the agent has enough capability and objective support to proceed; if yes, continue, and if no, choose the smallest guardrail or unlock step that would make the next refactor credible.
39
42
 
40
43
  The final stopping condition also requires the relevant guarded test surface to be green; a partially red repository is not a completed refactor outcome.
41
44
 
@@ -54,7 +57,7 @@ Inspect the full known quality backlog for:
54
57
 
55
58
  Then choose the next execution directions with these priorities:
56
59
 
57
- 1. highest confidence under current guardrails,
60
+ 1. highest combined confidence from agent capability, code understanding, guardrails, and recovery path,
58
61
  2. strongest leverage for later deeper cleanup,
59
62
  3. lowest business-risk path toward broader system improvement.
60
63
 
@@ -66,10 +66,13 @@ If multiple jobs are plausible, prefer the one that:
66
66
  2. reduces cognitive load fastest,
67
67
  3. removes the strongest blocker to a deeper future refactor,
68
68
  4. helps an unvisited module reach deep-read coverage,
69
- 5. preserves behavior with the clearest available guardrails.
69
+ 5. matches the agent's self-assessed ability to understand, execute, and repair the change under current evidence,
70
+ 6. preserves behavior with the clearest available guardrails.
70
71
 
71
72
  ## Hard rule
72
73
 
73
74
  If a high-risk area lacks enough guardrails, `test addition` or another guardrail-building job should usually win before a deeper structural refactor.
74
75
 
76
+ If the area is difficult but the agent can explain the behavior, affected contracts, rollback path, and available tests clearly, do not downgrade confidence just because the refactor is non-trivial. Strong guardrails mean accidental breakage should be repaired by returning the test suite to green, not avoided by leaving an actionable quality issue in place.
77
+
75
78
  If any in-scope module remains unvisited, choose jobs that help the next easiest useful unvisited module become deeply read, improved, or validated-clear before spending another round on already-familiar areas.
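Reviewer note: the hard rule and its new companion paragraph amount to a small decision procedure. The function below is a sketch of that reading, not the skill's actual logic. Only the `test addition` job name appears in the source; the function name, its parameters, and the other two return labels are hypothetical.

```python
def next_job(high_risk: bool, guardrails_sufficient: bool,
             can_explain_change: bool) -> str:
    """Pick the next execution direction for one module (illustrative only)."""
    # Hard rule: a weakly guarded high-risk area gets guardrails first.
    if high_risk and not guardrails_sufficient:
        return "test addition"
    # If the agent can explain the behavior, affected contracts, rollback
    # path, and available tests, difficulty alone does not block the work.
    if can_explain_change:
        return "structural refactor"
    # Otherwise take the smallest step that makes the next refactor credible.
    return "unlock step"
```

For example, `next_job(True, True, True)` selects the refactor even though the area is high-risk, because strong guardrails allow accidental breakage to be repaired by returning the suite to green.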
@@ -8,6 +8,8 @@ For every non-trivial pass, ask what could regress silently if the cleanup were
8
8
 
9
9
  Use the resulting guardrails aggressively: when tests or equivalent verification can prove behavior preservation, they should unlock bolder refactors rather than merely justify small cosmetic edits.
10
10
 
11
+ Confidence decisions must include the agent's own ability to understand and complete the refactor, not only the apparent difficulty of the code. Strong tests, characterization coverage, narrow rollback points, and clear validation commands are objective support: if the refactor breaks behavior, the agent should use the failing guardrails to repair the true owner and return the suite to green instead of treating the task as impossible.
12
+
11
13
  Do not require pre-existing tests before every refactor. Instead:
12
14
 
13
15
  - if existing guardrails are already sufficient, proceed;
@@ -89,5 +91,5 @@ Consider:
89
91
  - Preserve failing seeds or examples from property-based tests.
90
92
  - Do not weaken existing tests to fit the refactor.
91
93
  - If old tests asserted implementation details, rewrite them around stable behavior while preserving the business invariant.
92
- - Once stable guardrails exist, do not refuse a maintainability-improving refactor purely because confidence feels lower than ideal; let the guardrails decide.
94
+ - Once stable guardrails exist, do not refuse a maintainability-improving refactor purely because confidence feels lower than ideal; combine self-assessed ability with the objective safety net and let the guardrails decide.
93
95
  - If stable guardrails do not yet exist for a high-risk area, create them as the next execution direction instead of treating the refactor as blocked forever.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@laitszkin/apollo-toolkit",
3
- "version": "3.1.8",
3
+ "version": "3.2.0",
4
4
  "description": "Apollo Toolkit npm installer for managed skill copying across Codex, OpenClaw, and Trae.",
5
5
  "license": "MIT",
6
6
  "author": "LaiTszKin",