@laitszkin/apollo-toolkit 3.12.1 → 3.13.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (67)
  1. package/AGENTS.md +38 -107
  2. package/CHANGELOG.md +28 -0
  3. package/CLAUDE.md +38 -0
  4. package/README.md +9 -16
  5. package/analyse-app-logs/scripts/__pycache__/filter_logs_by_time.cpython-312.pyc +0 -0
  6. package/analyse-app-logs/scripts/__pycache__/log_cli_utils.cpython-312.pyc +0 -0
  7. package/analyse-app-logs/scripts/__pycache__/search_logs.cpython-312.pyc +0 -0
  8. package/archive-specs/SKILL.md +0 -6
  9. package/commit-and-push/SKILL.md +3 -9
  10. package/docs-to-voice/scripts/__pycache__/docs_to_voice.cpython-312.pyc +0 -0
  11. package/generate-spec/SKILL.md +30 -11
  12. package/generate-spec/references/definition.md +12 -0
  13. package/generate-spec/scripts/__pycache__/create-specscpython-312.pyc +0 -0
  14. package/init-project-html/SKILL.md +18 -22
  15. package/init-project-html/references/definition.md +12 -0
  16. package/katex/scripts/__pycache__/render_katex.cpython-312.pyc +0 -0
  17. package/maintain-project-constraints/SKILL.md +11 -19
  18. package/merge-changes-from-local-branches/SKILL.md +11 -24
  19. package/open-github-issue/scripts/__pycache__/open_github_issue.cpython-312.pyc +0 -0
  20. package/optimise-skill/SKILL.md +10 -2
  21. package/optimise-skill/references/example_skill.md +10 -2
  22. package/package.json +1 -1
  23. package/read-github-issue/scripts/__pycache__/find_issues.cpython-312.pyc +0 -0
  24. package/read-github-issue/scripts/__pycache__/read_issue.cpython-312.pyc +0 -0
  25. package/resolve-review-comments/scripts/__pycache__/review_threads.cpython-312.pyc +0 -0
  26. package/solve-issues-found-during-review/SKILL.md +1 -1
  27. package/systematic-debug/SKILL.md +11 -38
  28. package/test-case-strategy/SKILL.md +10 -37
  29. package/text-to-short-video/scripts/__pycache__/enforce_video_aspect_ratio.cpython-312.pyc +0 -0
  30. package/update-project-html/SKILL.md +19 -24
  31. package/update-project-html/references/definition.md +12 -0
  32. package/version-release/SKILL.md +16 -37
  33. package/iterative-code-performance/LICENSE +0 -21
  34. package/iterative-code-performance/README.md +0 -34
  35. package/iterative-code-performance/SKILL.md +0 -116
  36. package/iterative-code-performance/agents/openai.yaml +0 -4
  37. package/iterative-code-performance/references/algorithmic-complexity.md +0 -58
  38. package/iterative-code-performance/references/allocation-and-hot-loops.md +0 -53
  39. package/iterative-code-performance/references/caching-and-memoization.md +0 -64
  40. package/iterative-code-performance/references/concurrency-and-pipelines.md +0 -61
  41. package/iterative-code-performance/references/coupled-hot-path-strategy.md +0 -78
  42. package/iterative-code-performance/references/io-batching-and-queries.md +0 -55
  43. package/iterative-code-performance/references/iteration-gates.md +0 -133
  44. package/iterative-code-performance/references/job-selection.md +0 -92
  45. package/iterative-code-performance/references/measurement-and-benchmarking.md +0 -78
  46. package/iterative-code-performance/references/module-coverage.md +0 -133
  47. package/iterative-code-performance/references/repository-scan.md +0 -69
  48. package/iterative-code-quality/LICENSE +0 -21
  49. package/iterative-code-quality/README.md +0 -45
  50. package/iterative-code-quality/SKILL.md +0 -112
  51. package/iterative-code-quality/agents/openai.yaml +0 -4
  52. package/iterative-code-quality/references/coupled-core-file-strategy.md +0 -73
  53. package/iterative-code-quality/references/iteration-gates.md +0 -127
  54. package/iterative-code-quality/references/job-selection.md +0 -78
  55. package/iterative-code-quality/references/logging-alignment.md +0 -67
  56. package/iterative-code-quality/references/module-boundaries.md +0 -83
  57. package/iterative-code-quality/references/module-coverage.md +0 -126
  58. package/iterative-code-quality/references/naming-and-simplification.md +0 -73
  59. package/iterative-code-quality/references/repository-scan.md +0 -65
  60. package/iterative-code-quality/references/testing-strategy.md +0 -95
  61. package/merge-conflict-resolver/SKILL.md +0 -46
  62. package/merge-conflict-resolver/agents/openai.yaml +0 -5
  63. package/spec-to-project-html/SKILL.md +0 -42
  64. package/spec-to-project-html/agents/openai.yaml +0 -11
  65. package/spec-to-project-html/references/TEMPLATE_SPEC.md +0 -113
  66. package/submission-readiness-check/SKILL.md +0 -39
  67. package/submission-readiness-check/agents/openai.yaml +0 -4
@@ -1,58 +0,0 @@
1
- # Algorithmic Complexity And Repeated Work
2
-
3
- ## Signals
4
-
5
- Look for:
6
-
7
- - nested loops over growing inputs,
8
- - repeated full scans to answer point lookups,
9
- - repeated sorting when one sort or heap would do,
10
- - filtering or mapping the same collection in many branches,
11
- - recomputing derived values inside loops,
12
- - repeated parsing, validation, normalization, or conversion of identical inputs,
13
- - linear membership checks where sets or maps fit the domain,
14
- - duplicated business-rule computation across callers.
15
-
16
- ## Safe optimization moves
17
-
18
- - Precompute lookup maps or sets at the smallest correct ownership boundary.
19
- - Move invariant computations out of loops.
20
- - Replace repeated scans with grouped data structures.
21
- - Sort once and reuse the ordering when the ordering contract is stable.
22
- - Convert repeated validation or normalization into a named helper with tests.
23
- - Preserve stable ordering when callers rely on it.
24
- - Keep data structures local unless shared ownership and invalidation are clear.
25
-
26
- ## Complexity evidence
27
-
28
- Record:
29
-
30
- - complexity before and after the change,
31
- - input sizes where the improvement matters,
32
- - any ordering, deduplication, or equality semantics,
33
- - memory tradeoff,
34
- - correctness guardrails.
35
-
36
- Do not claim a complexity improvement when the change only moves cost to another equally hot path.
37
-
38
- ## Tradeoffs
39
-
40
- Prefer readability-preserving optimizations first. More complex data structures are justified when:
41
-
42
- - workload size is large enough,
43
- - call frequency is high enough,
44
- - the old implementation is measurably slow or asymptotically unsafe,
45
- - tests or invariants prove equivalence.
46
-
47
- Avoid clever micro-optimizations when the path is cold or the complexity cost is not material.
48
-
49
- ## Correctness checklist
50
-
51
- Before and after complexity changes, verify:
52
-
53
- - duplicates are handled the same way,
54
- - stable ordering remains stable when required,
55
- - null, empty, malformed, and boundary inputs behave the same,
56
- - floating point, currency, timestamp, and locale semantics are unchanged,
57
- - errors and side effects occur in the same order when order matters,
58
- - public API and persistence contracts remain stable.
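To illustrate the "replace repeated scans with grouped data structures" move from the removed reference above, here is a minimal sketch. It is not part of the package; the function names and the `users` / `orders` record shapes are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def attach_orders_slow(users, orders):
    # O(len(users) * len(orders)): rescans every order for every user.
    return {u["id"]: [o for o in orders if o["user_id"] == u["id"]] for u in users}


def attach_orders_fast(users, orders):
    # O(len(users) + len(orders)): group once, then answer point lookups.
    by_user = defaultdict(list)
    for order in orders:
        # Appending keeps the original relative order inside each group.
        by_user[order["user_id"]].append(order)
    # .get() avoids creating empty entries for users without orders.
    return {u["id"]: by_user.get(u["id"], []) for u in users}
```

Both versions return the same mapping, including empty lists for users without orders and the same per-user ordering, which is the kind of equivalence the correctness checklist asks for. The grouped version trades extra memory (one dictionary of lists) for fewer scans.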
@@ -1,53 +0,0 @@
1
- # Allocation And Hot-Loop Cleanup
2
-
3
- ## Signals
4
-
5
- Look for hot paths that:
6
-
7
- - allocate new arrays, maps, regexes, formatters, buffers, or closures repeatedly,
8
- - clone, copy, or stringify large objects unnecessarily,
9
- - parse the same payload or date repeatedly,
10
- - build strings through repeated concatenation in loops,
11
- - perform expensive logging or serialization even when logs are disabled,
12
- - retain large intermediate objects longer than needed,
13
- - create short-lived promises or tasks in very tight loops.
14
-
15
- ## Safe cleanup moves
16
-
17
- - Hoist invariant allocations out of loops.
18
- - Reuse existing parsed or normalized values.
19
- - Stream or chunk large data when repository patterns support it.
20
- - Avoid building debug payloads unless the log level or trace is enabled.
21
- - Replace repeated string concatenation with the project's idiomatic builder or join pattern.
22
- - Keep object reuse local and simple; avoid shared mutable pooling unless the codebase already uses it safely.
23
-
24
- ## Memory tradeoffs
25
-
26
- Some optimizations reduce CPU by using more memory. Record:
27
-
28
- - maximum expected input size,
29
- - retained memory lifetime,
30
- - whether data is tenant/user/request scoped,
31
- - cleanup or eviction behavior,
32
- - effect on tail latency or garbage collection.
33
-
34
- Do not retain large data globally without a clear lifecycle.
35
-
36
- ## Readability threshold
37
-
38
- Allocation cleanup may make code less direct. Accept that tradeoff only when:
39
-
40
- - profiler or benchmark evidence shows the path is hot,
41
- - the new code remains understandable,
42
- - comments or helper names explain non-obvious constraints,
43
- - tests preserve the behavior being optimized.
44
-
45
- ## Validation
46
-
47
- Use:
48
-
49
- - allocation or memory benchmarks where available,
50
- - profiler allocation samples,
51
- - operation-count tests for avoidable repeated work,
52
- - regression tests for empty, large, duplicate, and malformed inputs,
53
- - end-to-end command/request checks when the hot loop affects public output.
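Two of the cleanup moves listed in the removed reference above, hoisting an invariant allocation and skipping debug payloads when the log level is off, could look roughly like this. This is a sketch under assumptions: the logger name, the `key=value` line format, and `parse_lines` are all hypothetical.

```python
import logging
import re

logger = logging.getLogger("hot_loop_sketch")  # hypothetical logger name

# Compiled once at import time instead of being looked up for every line.
_KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def parse_lines(lines):
    parsed = []
    for line in lines:
        fields = dict(_KEY_VALUE.findall(line))
        # Build the debug payload only when DEBUG logging is actually enabled.
        if logger.isEnabledFor(logging.DEBUG):
            logger.debug("parsed %d fields: %r", len(fields), sorted(fields))
        parsed.append(fields)
    return parsed
```

Python's `re` module already caches recently compiled patterns, so the saving from hoisting is modest here. The guard around `sorted(fields)` is the more typical win, because the argument would otherwise be computed even when the message is discarded.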
@@ -1,64 +0,0 @@
1
- # Caching And Memoization
2
-
3
- ## Use caching only when
4
-
5
- - the same expensive computation or read repeats,
6
- - the cached value has a clear owner,
7
- - invalidation or expiration rules are explicit,
8
- - stale data cannot violate business rules,
9
- - memory growth is bounded,
10
- - errors and partial results are handled deliberately,
11
- - benchmark or profiling evidence shows caching is worthwhile.
12
-
13
- ## Prefer simpler fixes first
14
-
15
- Before adding a cache, ask:
16
-
17
- - Can repeated work be removed by computing once in the caller?
18
- - Can the data structure be reshaped locally?
19
- - Can the query or API call fetch all needed data in one pass?
20
- - Can a pure helper accept precomputed inputs?
21
-
22
- If yes, prefer the simpler change.
23
-
24
- ## Cache design checklist
25
-
26
- Define:
27
-
28
- - key shape and equality semantics,
29
- - value ownership,
30
- - lifecycle and invalidation trigger,
31
- - maximum size or eviction policy,
32
- - concurrency behavior,
33
- - error caching policy,
34
- - observability for hit, miss, eviction, or stale detection when useful,
35
- - tests or benchmarks proving both correctness and value.
36
-
37
- ## Memoization boundaries
38
-
39
- Good boundaries:
40
-
41
- - within one request, command, job, or transaction,
42
- - inside a pure helper for repeated identical inputs,
43
- - around immutable configuration or static metadata with reload semantics,
44
- - inside an owner module that already controls lifecycle.
45
-
46
- Risky boundaries:
47
-
48
- - global mutable caches with no invalidation,
49
- - cross-tenant or cross-user caches,
50
- - caches keyed by partial authorization context,
51
- - caches around time-sensitive, money, inventory, permission, or security decisions,
52
- - caches that hide external-service failures.
53
-
54
- ## Cache removal
55
-
56
- Removing or narrowing a stale cache is a performance improvement when it:
57
-
58
- - prevents memory leaks,
59
- - avoids stale correctness risk,
60
- - reduces invalidation overhead,
61
- - improves tail latency by removing lock contention,
62
- - simplifies a path where hit rate is low.
63
-
64
- Validate both speed and correctness after removal.
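The "good boundaries" described in the removed reference above (memoization within one request, bounded size, deliberate error handling) can be sketched as a small request-scoped cache. The class name, the `loader` callable, and the `max_entries` default are assumptions made for this example, not something the package defines.

```python
class RequestScopedCache:
    """Memoizes one loader for the lifetime of a single request or job."""

    def __init__(self, loader, max_entries=1024):
        self._loader = loader
        self._max_entries = max_entries
        self._values = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._values:
            self.hits += 1
            return self._values[key]
        self.misses += 1
        value = self._loader(key)  # exceptions propagate and are never cached
        if len(self._values) < self._max_entries:
            self._values[key] = value  # bounded: stop storing once full
        return value
```

Creating one instance per request and discarding it afterwards keeps ownership and lifecycle obvious, and the `hits` / `misses` counters give a simple way to check whether the cache earns its place. A real eviction policy (for example LRU) would be a separate design decision.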
@@ -1,61 +0,0 @@
1
- # Concurrency And Pipeline Tuning
2
-
3
- ## Signals
4
-
5
- Look for:
6
-
7
- - independent work performed serially on a critical path,
8
- - unbounded `Promise.all`, goroutines, tasks, threads, workers, or queue dispatch,
9
- - missing backpressure between producer and consumer,
10
- - retries that amplify overload,
11
- - locks, mutexes, transactions, or global caches causing contention,
12
- - long-running jobs that hold resources while waiting on slow IO,
13
- - pipeline stages with mismatched throughput.
14
-
15
- ## Safe concurrency moves
16
-
17
- - Add bounded concurrency with existing project primitives.
18
- - Batch independent work while preserving ordering where required.
19
- - Push slow side effects behind existing queues only when delivery semantics remain the same.
20
- - Add cancellation, timeout, or backpressure only when consistent with current contracts.
21
- - Reduce lock scope or shared mutable state when the owner is clear.
22
- - Avoid parallelizing code that depends on side-effect ordering.
23
-
24
- ## Backpressure and limits
25
-
26
- Define:
27
-
28
- - maximum concurrency,
29
- - queue size or batch size,
30
- - retry and timeout behavior,
31
- - downstream rate limits,
32
- - partial-failure handling,
33
- - ordering guarantees,
34
- - cancellation behavior.
35
-
36
- Do not introduce unbounded work in the name of throughput.
37
-
38
- ## Correctness risks
39
-
40
- Concurrency changes can alter:
41
-
42
- - result ordering,
43
- - duplicate handling,
44
- - idempotency,
45
- - transaction boundaries,
46
- - retry timing,
47
- - log order,
48
- - shared test state,
49
- - resource cleanup.
50
-
51
- Add tests or controlled integration runs for the specific risk.
52
-
53
- ## Validation
54
-
55
- Prefer:
56
-
57
- - deterministic unit tests for limiters and schedulers,
58
- - integration tests with fake slow dependencies,
59
- - operation-count or max-in-flight assertions,
60
- - load tests only when throughput or backpressure is the core claim,
61
- - logs or metrics that prove bounded behavior without leaking sensitive data.
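As an illustration of "bounded concurrency" with a max-in-flight limit, as described in the removed reference above, here is a minimal `asyncio` sketch. `run_bounded` and its parameters are hypothetical names introduced for this example.

```python
import asyncio


async def run_bounded(items, worker, max_in_flight):
    """Run worker(item) for every item with at most max_in_flight running at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in input order, regardless of completion order.
    return await asyncio.gather(*(guarded(item) for item in items))
```

This bounds how many workers execute concurrently and preserves result ordering, but it still creates one pending task per item up front. For very large inputs a queue with a fixed pool of workers would also bound memory. A test with a fake slow dependency can record the peak number of concurrent calls and assert that it never exceeds the limit.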
@@ -1,78 +0,0 @@
1
- # Staged Strategy For Large Coupled Hot Paths
2
-
3
- ## Purpose
4
-
5
- Teach the agent how to keep making progress when a performance-sensitive file feels too central, too coupled, or too risky to optimize directly.
6
-
7
- The correct response is usually not "stop". The correct response is "find the next unlock step".
8
-
9
- ## Core rule
10
-
11
- A large coupled hot path is a **decomposition and measurement signal**, not a **completion blocker**.
12
-
13
- If a safe, behavior-preserving unlock step exists under current guardrails, take that step now instead of deferring the whole area.
14
-
15
- If measurement or correctness guardrails are too weak for direct optimization, strengthening them is itself the next unlock step.
16
-
17
- ## First questions to ask
18
-
19
- When a hot path feels untouchable, ask:
20
-
21
- - Which part is actually slow, and how do we know?
22
- - Which work is pure computation, IO, allocation, synchronization, or side effect?
23
- - Which workload shape, input size, or data distribution matters?
24
- - Which behavior can be locked down with characterization tests?
25
- - Which benchmark or profiler hook would isolate the bottleneck?
26
- - Which dependency seam can be introduced without changing behavior?
27
- - Which optimization would most reduce the cost of the next optimization?
28
-
29
- ## Typical unlock sequence
30
-
31
- Pick one or more of these, in the order justified by current evidence:
32
-
33
- 1. Add timing, profiling, benchmark, or characterization guardrails around current behavior.
34
- 2. Extract pure transformations or workload-shape calculations.
35
- 3. Name and isolate expensive repeated work.
36
- 4. Separate query/IO preparation from pure decision logic.
37
- 5. Introduce a narrow batching, lookup, or data-structure seam.
38
- 6. Isolate cache ownership and invalidation decisions.
39
- 7. Separate concurrency control from per-item business logic.
40
- 8. Re-scan and decide whether a deeper optimization is now safer.
41
-
42
- ## What not to do
43
-
44
- Avoid these anti-patterns:
45
-
46
- - declaring the area blocked just because it is important,
47
- - attempting a full hot-path rewrite before guardrails exist,
48
- - adding a cache because measurement is missing,
49
- - removing correctness checks to make the benchmark faster,
50
- - parallelizing side effects without proving ordering and idempotency,
51
- - escalating ordinary internal optimization into a fake macro-architecture concern,
52
- - mixing unlock work with unrelated style churn.
53
-
54
- ## Choosing the next step
55
-
56
- Prefer the next step that maximizes:
57
-
58
- 1. combined confidence from the agent's own ability, workload understanding, existing tests, benchmarks, or quickly addable guardrails,
59
- 2. quality of performance evidence,
60
- 3. leverage for future deeper optimization,
61
- 4. reduction in repeated work, IO, allocation, or contention,
62
- 5. low risk to current business behavior.
63
-
64
- If two steps are both safe, choose the one that makes the next iteration easier to measure or validate.
65
-
66
- If the file is high-risk and under-tested, prefer adding the smallest useful characterization or benchmark guardrails before attempting deeper edits. If the file is high-risk but well-guarded, do not stop only because the change is difficult; use the guardrails to validate the agent's work and repair any accidental breakage.
67
-
68
- ## Completion rule for coupled hot paths
69
-
70
- Do not ask "Can I solve the whole file now?"
71
-
72
- Ask:
73
-
74
- - "Can I measure this path more accurately in the next iteration?"
75
- - "Can I make this path meaningfully cheaper in the next iteration?"
76
- - "Can I reduce repeated work, IO, allocation, contention, or cache risk right now?"
77
-
78
- If the answer is yes, continue iterating.
@@ -1,55 +0,0 @@
1
- # IO Batching And Query Optimization
2
-
3
- ## IO signals
4
-
5
- Look for:
6
-
7
- - N+1 database, network, filesystem, or external API calls,
8
- - repeated reads of the same file, config, secret, or remote resource,
9
- - chatty persistence writes that can be safely combined,
10
- - missing projection or over-fetching large records,
11
- - queries that load broad datasets then filter in memory,
12
- - serial external calls that are independent and can be bounded or batched,
13
- - retry loops that multiply downstream load.
14
-
15
- ## Safe batching moves
16
-
17
- - Batch by the natural owner boundary and preserve per-item error reporting.
18
- - Use existing repository query builders, clients, and retry conventions.
19
- - Fetch only required fields when the API supports projection.
20
- - Push filtering, pagination, aggregation, or joins to the datastore only when semantics match.
21
- - Add bounded concurrency instead of unbounded fan-out.
22
- - Preserve idempotency, ordering guarantees, and partial-failure behavior.
23
-
24
- ## Query shaping
25
-
26
- Before changing a query, identify:
27
-
28
- - source of truth and owner module,
29
- - expected row or document counts,
30
- - indexes or access patterns already documented,
31
- - transaction and consistency requirements,
32
- - pagination and sorting contract,
33
- - caller expectations around missing or duplicate records.
34
-
35
- Do not add indexes, migrations, or datastore-level changes unless the user scope and repository conventions support them.
36
-
37
- ## External services
38
-
39
- For network and API paths:
40
-
41
- - respect provider rate limits and retry guidance,
42
- - avoid batching that exceeds request-size limits,
43
- - preserve timeout and cancellation behavior,
44
- - keep request IDs and correlation fields in logs,
45
- - use mocks, fakes, or recorded fixtures for tests when real services are not under test.
46
-
47
- ## Validation
48
-
49
- Prefer guardrails that verify:
50
-
51
- - query or request count decreases,
52
- - result set and ordering are unchanged,
53
- - partial failures are reported the same way,
54
- - retries and timeouts still follow existing policy,
55
- - batching preserves per-item authorization and validation.
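The N+1 signal and the batching move from the removed reference above can be sketched with the standard-library `sqlite3` module. The `users` table, its columns, and `load_names_batched` are assumptions for illustration only.

```python
import sqlite3


def load_names_batched(conn, user_ids):
    """Fetch names for many ids in one query instead of one query per id."""
    ids = list(dict.fromkeys(user_ids))  # de-duplicate, keep first-seen order
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids
    ).fetchall()
    # Missing ids are simply absent, so callers can report them per item.
    return dict(rows)
```

Only the placeholder list is interpolated into the SQL text; the ids themselves stay bound parameters. SQLite limits the number of bound parameters per statement, so a large id list would need to be split into chunks, in line with the advice above about request-size limits.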
@@ -1,133 +0,0 @@
1
- # Performance Iteration Gates And Stopping Criteria
2
-
3
- ## Pass discipline
4
-
5
- Each iteration must have:
6
-
7
- - a selected module or bounded module cluster,
8
- - a concrete performance target,
9
- - an explicit record of which performance job lenses were checked during the deep read,
10
- - a bounded file/symbol scope,
11
- - one or more selected execution directions,
12
- - baseline evidence or a reason measurement is unavailable,
13
- - a confidence assessment covering the agent's own ability to complete the optimization, the task's inherent difficulty, objective guardrail strength, benchmark quality, and rollback or repair paths,
14
- - expected behavior-preserving outcome,
15
- - validation plan,
16
- - rollback point if evidence contradicts the change.
17
-
18
- An iteration is not "one work type", and it also does not need to include every direction every time. Within the selected scope, choose the subset of directions that has the best current evidence and leverage: measurement, complexity, IO, caching, allocation, concurrency, and/or staged unlock work.
19
-
20
- Confidence is not a synonym for "easy". Assess whether the agent has enough understanding, skill, workload context, tests, benchmarks, validation commands, and recovery path to complete the optimization safely. A hard task can still be high-confidence when strong guardrails, characterization coverage, and clear rollback let the agent repair mistakes by making the guarded behavior green again.
21
-
22
- Avoid starting a broad second iteration before validating the first, but do not stop after a validated iteration if known actionable performance issues remain anywhere in the in-scope codebase.
23
-
24
- Do not stop after a validated iteration if any in-scope module remains unvisited in the module coverage ledger.
25
-
26
- ## Validation cadence
27
-
28
- Run validation from narrow to broad:
29
-
30
- 1. Formatter or type check for touched files when available.
31
- 2. Unit tests for touched helpers and modules.
32
- 3. Benchmarks, profiler runs, query-count checks, or operation-count checks for the optimized path when available.
33
- 4. Integration tests for affected chains.
34
- 5. Broader suite or build once multiple passes interact.
35
-
36
- If validation fails:
37
-
38
- - determine whether the failure is pre-existing, stale test expectation, flaky benchmark, test isolation issue, or real product bug,
39
- - fix the true owner,
40
- - keep regression coverage for real defects,
41
- - do not mask failures by weakening assertions or widening benchmark budgets without evidence.
42
-
43
- If validation passes and the performance plus correctness guardrails meaningfully cover the changed behavior, do not keep a known bottleneck in place purely because of subjective confidence concerns. Reassess whether the agent has enough capability and objective support to proceed; if yes, continue, and if no, choose the smallest measurement, benchmark, or unlock step that would make the next optimization credible.
44
-
45
- The final stopping condition also requires the relevant guarded test surface to be green; a partially red repository is not a completed optimization outcome.
46
-
47
- ## Re-scan after each iteration
48
-
49
- Inspect the full known performance backlog for:
50
-
51
- - modules that are still unvisited or only shallowly read,
52
- - modules that were read but not yet checked against every available performance job lens,
53
- - new repeated work after moved or extracted concepts,
54
- - remaining N+1 calls, serial round trips, or excessive query shapes,
55
- - caches that are stale, unbounded, or unnecessary,
56
- - hot loops that still allocate avoidable objects,
57
- - concurrency changes that need backpressure or max-in-flight proof,
58
- - logs, metrics, traces, or benchmarks that now describe stale names or paths,
59
- - documentation or `AGENTS.md/CLAUDE.md` drift.
60
-
61
- Then choose the next execution directions with these priorities:
62
-
63
- 1. strongest bottleneck evidence,
64
- 2. largest user-visible or high-frequency impact,
65
- 3. highest combined confidence from agent capability, workload understanding, correctness guardrails, benchmark quality, and recovery path,
66
- 4. strongest leverage for later deeper optimization,
67
- 5. lowest business-risk path toward broader system improvement.
68
-
69
- Use `references/job-selection.md` to convert those priorities into a concrete next-job choice.
70
-
71
- ## Stage-gate after each iteration
72
-
73
- After every validated iteration, run a deliberate full-codebase decision pass:
74
-
75
- 1. Re-scan the repository and refresh the known performance backlog.
76
- 2. Refresh the module coverage ledger and identify unvisited in-scope modules.
77
- 3. Ask whether any known in-scope actionable bottleneck still remains.
78
- 4. If yes, decide whether it should be addressed in the very next iteration or whether measurement or unlock work is needed first.
79
- 5. If the obstacle is a large, coupled, or central hot path, do not stop there; switch to staged unlock work and continue.
80
- 6. Only declare the repository iteration-complete when the re-scan shows no remaining actionable in-scope bottleneck and no unvisited in-scope module except items that are explicitly deferred or excluded under the allowed stop categories.
81
-
82
- This stage-gate is mandatory. A validated local optimization does not by itself mean the repository is done.
83
-
84
- ## Continue when
85
-
86
- Repeat the cycle when:
87
-
88
- - any known in-scope actionable performance issue remains unresolved,
89
- - any in-scope module remains unvisited,
90
- - measured slow paths remain,
91
- - clear algorithmic waste remains,
92
- - avoidable IO, query, or external-service round trips remain,
93
- - unsafe or low-value caches need removal or replacement,
94
- - allocation churn or hot-loop repeated work remains in a material path,
95
- - concurrency, backpressure, or batching gaps remain,
96
- - benchmarks or profiling are missing for a high-risk optimization that is otherwise actionable.
97
-
98
- Do not produce a final completion report while any item in this section is true. Continue with the next bounded iteration instead.
99
-
100
- ## Stop when
101
-
102
- Stop only when there are no unresolved known in-scope actionable performance issues. Any remaining candidates must be explicitly classified as one of:
103
-
104
- - low-value micro-optimization,
105
- - speculative without concrete evidence,
106
- - production-measurement-only with no safe local proxy,
107
- - public contract migrations,
108
- - macro-architecture changes,
109
- - product behavior changes needing user approval,
110
- - blocked by unavailable credentials, unstable external systems, or missing documentation,
111
- - untestable with the current repository tooling and too risky to change safely.
112
-
113
- If a remaining candidate cannot be placed in one of these categories, it is still an actionable gap and the agent must continue iterating rather than complete the task.
114
-
115
- If an in-scope module has not received a performance deep-read iteration, it is still an actionable coverage gap even when the already-read modules look fast.
116
-
117
- ## Completion evidence
118
-
119
- The final report should make the stopping point auditable:
120
-
121
- - passes completed,
122
- - execution directions selected per iteration,
123
- - module or module cluster covered per iteration,
124
- - performance job lenses checked per iteration,
125
- - final module coverage ledger,
126
- - stage-gate verdict after each full-codebase re-scan,
127
- - validation commands and outcomes,
128
- - confirmation that the guarded test surface is green after the optimization,
129
- - speedup, throughput, latency, CPU, memory, allocation, query-count, or complexity evidence,
130
- - behavior-preservation evidence,
131
- - docs and constraints sync status,
132
- - proof that the latest scan found no known actionable in-scope performance issues,
133
- - deferred items with reason and required approval, dependency, production data, or safety constraint.
@@ -1,92 +0,0 @@
1
- # Performance Job Selection Guide
2
-
3
- ## Purpose
4
-
5
- Help the agent choose the next execution direction after each full-codebase performance re-scan.
6
-
7
- These are job-selection rules for Step 2 of the main skill loop. They are not workflow steps.
8
-
9
- The goal is not to force one permanent order. The goal is to choose the next job that most safely improves the selected module or module cluster and unlocks later work.
10
-
11
- Before choosing, the agent should first scan the selected module through every available performance job lens. Job selection happens after that scan; it is not a substitute for that scan.
12
-
13
- ## Available jobs
14
-
15
- - measurement / benchmarking
16
- - algorithmic complexity / repeated-work cleanup
17
- - IO batching / query shaping
18
- - caching / memoization
19
- - allocation / hot-loop cleanup
20
- - concurrency / pipeline tuning
21
- - staged unlock work
22
-
23
- ## Choose `measurement / benchmarking` when
24
-
25
- - the slow path is suspected but not proven,
26
- - multiple candidate bottlenecks compete and prioritization would otherwise be guesswork,
27
- - an optimization could regress correctness or user-visible latency without a baseline,
28
- - production symptoms exist but local reproduction or profiling is missing.
29
-
30
- ## Choose `algorithmic complexity / repeated-work cleanup` when
31
-
32
- - nested loops, repeated scans, repeated sorts, or repeated filters dominate the path,
33
- - the same derived value is recomputed for every item or request,
34
- - a more appropriate data structure would reduce asymptotic or constant-factor cost,
35
- - the complexity change can be proven without changing business behavior.
36
-
37
- ## Choose `IO batching / query shaping` when
38
-
39
- - N+1 database, network, filesystem, or external API calls exist,
40
- - serial round trips can be safely batched or preloaded,
41
- - query predicates, projections, pagination, or indexes are mismatched to real access patterns,
42
- - external calls lack timeout, retry, or result-shape handling that affects throughput.
43
-
44
- ## Choose `caching / memoization` when
45
-
46
- - the same expensive pure or owner-scoped computation repeats,
47
- - cache ownership, invalidation, size bounds, and failure behavior are clear,
48
- - stale data cannot violate business rules,
49
- - a simpler repeated-work or data-structure fix is insufficient.
50
-
51
- ## Choose `allocation / hot-loop cleanup` when
52
-
53
- - tight loops allocate avoidable objects, strings, buffers, regexes, or closures,
54
- - repeated serialization, parsing, cloning, copying, or formatting appears in a hot path,
55
- - memory pressure or garbage collection is part of the observed slowdown,
56
- - changes can preserve readability or are justified by strong hot-path evidence.
57
-
58
- ## Choose `concurrency / pipeline tuning` when
59
-
60
- - safe independent work is unnecessarily serial,
61
- - current parallelism is unbounded or overloads downstream resources,
62
- - queues, workers, streams, or async tasks lack backpressure,
63
- - batching or bounded concurrency would improve throughput without changing ordering guarantees.
64
-
65
- ## Choose `staged unlock work` when
66
-
67
- - the file feels too central or too coupled for direct optimization,
68
- - no safe full hot-path rewrite exists yet, but a preparatory step does,
69
- - you can reduce risk through measurement hooks, seam extraction, characterization tests, type extraction, data-shape clarification, or side-effect isolation,
70
- - the best next move is to make a future optimization cheaper rather than solve the whole area now.
71
-
72
- ## Tie-breakers
73
-
74
- If multiple jobs are plausible, prefer the one that:
75
-
76
- 1. addresses the highest-evidence bottleneck,
77
- 2. improves the most user-visible or high-frequency path,
78
- 3. increases safety for the next iteration,
79
- 4. removes the strongest blocker to a deeper future optimization,
80
- 5. helps an unvisited module reach performance deep-read coverage,
81
- 6. matches the agent's self-assessed ability to understand, execute, benchmark, and repair the change under current evidence,
82
- 7. preserves behavior with the clearest available guardrails.
83
-
84
- ## Hard rule
85
-
86
- If performance evidence is too weak, `measurement / benchmarking` should usually win before code changes.
87
-
88
- If a high-risk hot path lacks enough correctness guardrails, benchmark or characterization guardrail work should usually win before a deeper optimization.
89
-
90
- If the area is difficult but the agent can explain the workload, behavior, affected contracts, rollback path, and available tests or benchmarks clearly, do not downgrade confidence just because the optimization is non-trivial. Strong guardrails mean accidental breakage should be repaired by returning the test and benchmark surface to green, not avoided by leaving an actionable bottleneck in place.
91
-
92
- If any in-scope module remains unvisited, choose jobs that help the next highest-evidence or easiest useful unvisited module become deeply read, improved, or validated-clear before spending another round on already-familiar areas.
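
Note on the removed `concurrency / pipeline tuning` criteria above: a minimal Python sketch of "bounded concurrency ... without changing ordering guarantees". It is not part of the package. The helper name `map_bounded` and the default of 4 workers are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_bounded(fn: Callable[[T], R], items: Iterable[T], max_workers: int = 4) -> List[R]:
    """Apply fn to items on at most max_workers threads; results keep input order.

    Hypothetical helper: the name and default bound are illustrative only.
    """
    if max_workers < 1:
        raise ValueError("max_workers must be at least 1")
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which call happens to finish first.
        return list(pool.map(fn, items))
```

This bounds parallelism only. `Executor.map` submits every item up front, so a very large or unbounded input would still need separate backpressure (for example, chunking the input) before this pattern is safe.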
@@ -1,78 +0,0 @@
1
- # Measurement And Benchmarking
2
-
3
- ## Principle
4
-
5
- Optimize from evidence. A performance change should have at least one of:
6
-
7
- - production latency, throughput, CPU, memory, queue, or query evidence,
8
- - profiler output,
9
- - repeatable benchmark baseline,
10
- - test runtime profile,
11
- - log or trace timings,
12
- - clear algorithmic complexity proof tied to a plausible workload.
13
-
14
- If none exists, measurement is usually the next job.
15
-
16
- Measurement also informs confidence, but it is not the only input. The agent must assess its own ability to understand and complete the optimization, then combine that self-assessment with task difficulty, benchmark quality, correctness tests, rollback options, and repair paths. Strong tests and repeatable benchmarks should make difficult changes more actionable because failures can be diagnosed and driven back to green.
17
-
18
- ## Baseline rules
19
-
20
- Before changing a hot path, record:
21
-
22
- - command, scenario, fixture, data size, seed, and environment,
23
- - current timing, throughput, allocation, query count, memory, or operation count,
24
- - variance or repeated-run notes when possible,
25
- - correctness oracle used with the benchmark,
26
- - reason if the path can only be measured in production.
27
-
28
- Do not compare a cold-cache baseline with a warm-cache after result unless that is the intended user-visible scenario.
29
-
30
- ## Benchmark selection
31
-
32
- Use the cheapest reliable benchmark that proves the risk:
33
-
34
- - unit microbenchmarks for pure hot helpers,
35
- - integration benchmarks for query, serialization, or multi-module orchestration paths,
36
- - command or request benchmarks for user-visible entrypoints,
37
- - load tests only when concurrency, backpressure, or throughput is the actual risk,
38
- - profiler snapshots when the bottleneck location is unknown.
39
-
40
- Avoid expensive load tests when a deterministic benchmark or integration test proves the same performance issue.
41
-
42
- ## Before/after comparisons
43
-
44
- A useful comparison names:
45
-
46
- - baseline command and result,
47
- - after command and result,
48
- - data shape and scale,
49
- - correctness validation,
50
- - variance or caveat,
51
- - whether the improvement is latency, throughput, CPU, memory, allocation, query count, or algorithmic complexity.
52
-
53
- If exact numbers are unstable, report operation counts, query counts, asymptotic complexity, or profiler rank change instead of pretending precision exists.
54
-
55
- ## Guardrail design
56
-
57
- Performance guardrails should fail on meaningful regressions, not noise.
58
-
59
- Prefer:
60
-
61
- - deterministic operation or query counts,
62
- - bounded latency budgets with generous margins only when the environment is stable,
63
- - regression tests for duplicate work removal,
64
- - benchmark scripts documented for local use,
65
- - assertions on cache invalidation behavior,
66
- - profiler notes for manual verification when automated thresholds are unreliable.
67
-
68
- Do not add flaky timing thresholds to CI when the repository has no stable benchmark environment.
69
-
70
- ## Production-only measurement
71
-
72
- When a bottleneck requires production data or credentials:
73
-
74
- - capture the best local proxy evidence available,
75
- - document the missing data source,
76
- - avoid speculative rewrites that cannot be validated,
77
- - add safe instrumentation or benchmark hooks if approved and useful,
78
- - classify the remaining item as production-measurement-only if no safe local action remains.
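
Note on the removed `Guardrail design` section above: one way to read "deterministic operation or query counts" is a test that counts round trips instead of timing them. The sketch below is an assumption-laden illustration, not code from the package. `CountingStore`, `load_names_serial`, `load_names_batched`, and `check_guardrail` are all hypothetical names, and the in-memory store stands in for a real database.

```python
from typing import Dict, Iterable, List


class CountingStore:
    """Hypothetical in-memory stand-in for a database that counts round trips."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = rows
        self.query_count = 0

    def get_one(self, key: int) -> str:
        self.query_count += 1  # one round trip per key
        return self._rows[key]

    def get_many(self, keys: Iterable[int]) -> Dict[int, str]:
        self.query_count += 1  # one round trip for the whole batch
        return {k: self._rows[k] for k in keys}


def load_names_serial(store: CountingStore, keys: List[int]) -> List[str]:
    """Baseline: one lookup per key."""
    return [store.get_one(k) for k in keys]


def load_names_batched(store: CountingStore, keys: List[int]) -> List[str]:
    """After: a single batched lookup, returned in the original key order."""
    found = store.get_many(keys)
    return [found[k] for k in keys]


def check_guardrail() -> None:
    rows = {i: f"user-{i}" for i in range(50)}
    keys = list(rows)

    baseline = CountingStore(rows)
    expected = load_names_serial(baseline, keys)

    after = CountingStore(rows)
    actual = load_names_batched(after, keys)

    assert actual == expected          # correctness oracle: same output
    assert baseline.query_count == 50  # baseline: one query per key
    assert after.query_count == 1      # guardrail: exact count, no timing noise
```

Because the assertion is on a count rather than a duration, it behaves the same on a laptop and in CI, which is the property the removed text asks for when no stable benchmark environment exists.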