codeforge-dev 1.5.7 → 1.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.devcontainer/.env +2 -1
- package/.devcontainer/CHANGELOG.md +55 -9
- package/.devcontainer/CLAUDE.md +65 -15
- package/.devcontainer/README.md +67 -6
- package/.devcontainer/config/keybindings.json +5 -0
- package/.devcontainer/config/main-system-prompt.md +63 -2
- package/.devcontainer/config/settings.json +25 -6
- package/.devcontainer/devcontainer.json +23 -7
- package/.devcontainer/features/README.md +21 -7
- package/.devcontainer/features/ccburn/README.md +60 -0
- package/.devcontainer/features/ccburn/devcontainer-feature.json +38 -0
- package/.devcontainer/features/ccburn/install.sh +174 -0
- package/.devcontainer/features/ccstatusline/README.md +22 -21
- package/.devcontainer/features/ccstatusline/devcontainer-feature.json +1 -1
- package/.devcontainer/features/ccstatusline/install.sh +48 -16
- package/.devcontainer/features/claude-code/config/settings.json +60 -24
- package/.devcontainer/features/mcp-qdrant/devcontainer-feature.json +1 -1
- package/.devcontainer/features/mcp-reasoner/devcontainer-feature.json +1 -1
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/__pycache__/format-on-stop.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/format-on-stop.py +21 -6
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/__pycache__/lint-file.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/lint-file.py +7 -10
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/REVIEW-RUBRIC.md +440 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/architect.md +190 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/bash-exec.md +173 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/claude-guide.md +155 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/dependency-analyst.md +248 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/doc-writer.md +233 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/explorer.md +235 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/generalist.md +125 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/git-archaeologist.md +242 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/migrator.md +195 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/perf-profiler.md +265 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/refactorer.md +209 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/researcher.md +195 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/security-auditor.md +289 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/spec-writer.md +284 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/statusline-config.md +188 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/test-writer.md +245 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/hooks/hooks.json +12 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/guard-readonly-bash.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/redirect-builtin-agents.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/skill-suggester.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/syntax-validator.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-no-regression.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-tests-pass.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/guard-readonly-bash.py +611 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/redirect-builtin-agents.py +83 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/skill-suggester.py +85 -2
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/syntax-validator.py +9 -4
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-no-regression.py +221 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-tests-pass.py +176 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/SKILL.md +599 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/references/sdk-typescript-reference.md +954 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/SKILL.md +276 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/advanced-commands.md +332 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/investigation-playbooks.md +319 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/SKILL.md +341 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/interpreting-results.md +235 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/tool-commands.md +395 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/SKILL.md +344 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/safe-transformations.md +247 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/smell-catalog.md +332 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/SKILL.md +277 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/owasp-patterns.md +269 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/secrets-patterns.md +253 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/SKILL.md +288 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/criteria-patterns.md +245 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/ears-templates.md +239 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/__pycache__/guard-protected.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/guard-protected.py +40 -39
- package/.devcontainer/scripts/setup-aliases.sh +10 -20
- package/.devcontainer/scripts/setup-config.sh +2 -0
- package/.devcontainer/scripts/setup-plugins.sh +38 -46
- package/.devcontainer/scripts/setup-projects.sh +175 -0
- package/.devcontainer/scripts/setup-symlink-claude.sh +36 -0
- package/.devcontainer/scripts/setup-update-claude.sh +11 -8
- package/.devcontainer/scripts/setup.sh +4 -2
- package/package.json +1 -1
- package/.devcontainer/scripts/setup-irie-claude.sh +0 -32
|
@@ -0,0 +1,235 @@
|
|
|
1
|
+
# Interpreting Profiler Output
|
|
2
|
+
|
|
3
|
+
How to read and analyze output from profiling tools, with annotated examples.
|
|
4
|
+
|
|
5
|
+
## Contents
|
|
6
|
+
|
|
7
|
+
- [Reading cProfile Output](#reading-cprofile-output)
|
|
8
|
+
- [Reading Flamegraphs](#reading-flamegraphs)
|
|
9
|
+
- [Reading memory_profiler Output](#reading-memory_profiler-output)
|
|
10
|
+
- [Reading line_profiler Output](#reading-line_profiler-output)
|
|
11
|
+
- [Reading `time` Output](#reading-time-output)
|
|
12
|
+
- [Benchmark Result Analysis](#benchmark-result-analysis)
|
|
13
|
+
- [Common Pitfalls](#common-pitfalls)
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## Reading cProfile Output
|
|
18
|
+
|
|
19
|
+
### Annotated Example
|
|
20
|
+
|
|
21
|
+
```
|
|
22
|
+
2847 function calls (2832 primitive calls) in 1.234 seconds
|
|
23
|
+
|
|
24
|
+
Ordered by: cumulative time
|
|
25
|
+
|
|
26
|
+
ncalls tottime percall cumtime percall filename:lineno(function)
|
|
27
|
+
1 0.000 0.000 1.234 1.234 app.py:1(<module>)
|
|
28
|
+
1 0.002 0.002 1.230 1.230 app.py:45(process_data)
|
|
29
|
+
100 0.850 0.009 1.100 0.011 app.py:60(parse_record)
|
|
30
|
+
100 0.200 0.002 0.200 0.002 app.py:80(validate_fields)
|
|
31
|
+
100 0.050 0.001 0.050 0.001 app.py:95(compute_hash)
|
|
32
|
+
1000 0.030 0.000 0.030 0.000 {built-in method builtins.len}
|
|
33
|
+
500 0.025 0.000 0.025 0.000 {method 'split' of 'str' objects}
|
|
34
|
+
```
|
|
35
|
+
|
|
36
|
+
**Column meanings:**
|
|
37
|
+
- `ncalls`: Number of calls. `100/50` means 100 total calls, 50 non-recursive.
|
|
38
|
+
- `tottime`: Time spent **in this function only** (excluding subfunctions). This is where the CPU actually spent its cycles.
|
|
39
|
+
- `percall` (first): `tottime / ncalls`. Average time per call in this function body.
|
|
40
|
+
- `cumtime`: Time spent in this function **and all functions it calls**. This is the total "cost" of calling this function.
|
|
41
|
+
- `percall` (second): `cumtime / ncalls`. Average total cost per call.
|
|
42
|
+
|
|
43
|
+
**How to read this example:**
|
|
44
|
+
1. `process_data` takes 1.23s cumulative but only 0.002s in its own body → it's a coordinator, not the bottleneck.
|
|
45
|
+
2. `parse_record` takes 0.85s in its own body (`tottime`) and 1.1s cumulative → it's the hot function. The 0.25s difference (1.1 - 0.85) is spent in its subfunctions.
|
|
46
|
+
3. `validate_fields` takes 0.2s → secondary target for optimization.
|
|
47
|
+
4. Built-in functions (`len`, `split`) are fast — don't optimize these.
|
|
48
|
+
|
|
49
|
+
**Action:** Focus on `parse_record`. It's called 100 times, spending 8.5ms per call in its own body. Can you cache results, reduce calls, or use a faster parsing library?
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Reading Flamegraphs
|
|
54
|
+
|
|
55
|
+
### Anatomy of a Flamegraph
|
|
56
|
+
|
|
57
|
+
A flamegraph is a visualization where:
|
|
58
|
+
- **X-axis** = stack frames sorted alphabetically (NOT time). Width = proportion of total samples.
|
|
59
|
+
- **Y-axis** = call stack depth. Bottom = entry point, top = leaf function.
|
|
60
|
+
- **Color** = typically random warm colors (no semantic meaning by default).
|
|
61
|
+
|
|
62
|
+
### What to Look For
|
|
63
|
+
|
|
64
|
+
**Wide bars at the top (plateaus):**
|
|
65
|
+
These are leaf functions where the CPU actually spends time. A wide bar at the top of the graph means this function is consuming a large portion of CPU time directly.
|
|
66
|
+
|
|
67
|
+
```
|
|
68
|
+
Example: A wide "json.loads" bar at the top means JSON parsing is the bottleneck.
|
|
69
|
+
Action: Reduce the number of parse calls, use a faster JSON library (orjson, ujson),
|
|
70
|
+
or change the data format.
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
**Wide bars at the bottom:**
|
|
74
|
+
These are entry points that lead to expensive call trees. The function itself may be cheap, but its children are expensive.
|
|
75
|
+
|
|
76
|
+
```
|
|
77
|
+
Example: A wide "handle_request" bar at the bottom that narrows into many children
|
|
78
|
+
means request handling is expensive collectively, but no single child dominates.
|
|
79
|
+
Action: Look for the widest children and optimize those first.
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
**Towers (deep narrow stacks):**
|
|
83
|
+
Deep but narrow stacks are recursive calls or deeply nested abstractions. They're not usually bottlenecks unless they're also wide.
|
|
84
|
+
|
|
85
|
+
**Missing frames:**
|
|
86
|
+
If the flamegraph shows `[unknown]` or gaps, the profiler couldn't resolve the frame. This happens with:
|
|
87
|
+
- JIT-compiled code (Node.js, Java) — use `--perf-basic-prof` for Node
|
|
88
|
+
- Native extensions — use `py-spy --native` for Python
|
|
89
|
+
- Optimized code with frame pointers stripped — compile with `-fno-omit-frame-pointer`
|
|
90
|
+
|
|
91
|
+
---
|
|
92
|
+
|
|
93
|
+
## Reading memory_profiler Output
|
|
94
|
+
|
|
95
|
+
### Annotated Example
|
|
96
|
+
|
|
97
|
+
```
|
|
98
|
+
Line # Mem usage Increment Occurrences Line Contents
|
|
99
|
+
============================================================
|
|
100
|
+
10 50.2 MiB 50.2 MiB 1 @profile
|
|
101
|
+
11 def process():
|
|
102
|
+
12 550.2 MiB 500.0 MiB 1 data = load_csv("big.csv")
|
|
103
|
+
13 750.2 MiB 200.0 MiB 1 expanded = expand_rows(data)
|
|
104
|
+
14 780.5 MiB 30.3 MiB 1 result = aggregate(expanded)
|
|
105
|
+
15 280.5 MiB -500.0 MiB 1 del data
|
|
106
|
+
16 180.5 MiB -100.0 MiB 1 del expanded
|
|
107
|
+
17 180.5 MiB 0.0 MiB 1 return result
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
**How to read:**
|
|
111
|
+
- `Mem usage`: Total memory of the process at this line.
|
|
112
|
+
- `Increment`: Change in memory from the previous line.
|
|
113
|
+
- Line 12: Loading the CSV adds 500 MiB. This is the peak driver.
|
|
114
|
+
- Line 13: Expanding rows adds another 200 MiB. Peak memory is 780 MiB.
|
|
115
|
+
- Lines 15-16: Deleting intermediate data reclaims 600 MiB.
|
|
116
|
+
- **Peak memory: 780.5 MiB** (at line 14).
|
|
117
|
+
|
|
118
|
+
**Action:** If 780 MiB is too much:
|
|
119
|
+
1. Process the CSV in chunks instead of loading all at once.
|
|
120
|
+
2. Stream `expand_rows` as a generator instead of materializing the full list.
|
|
121
|
+
3. If `data` is only needed for expansion, delete it before aggregation (already done here).
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
## Reading line_profiler Output
|
|
126
|
+
|
|
127
|
+
### Annotated Example
|
|
128
|
+
|
|
129
|
+
```
|
|
130
|
+
Timer unit: 1e-06 s
|
|
131
|
+
|
|
132
|
+
Total time: 2.5 s
|
|
133
|
+
File: parser.py
|
|
134
|
+
Function: parse_records at line 15
|
|
135
|
+
|
|
136
|
+
Line # Hits Time Per Hit % Time Line Contents
|
|
137
|
+
==============================================================
|
|
138
|
+
15 def parse_records(raw_data):
|
|
139
|
+
16 1 5.0 5.0 0.0 results = []
|
|
140
|
+
17 1000 2500.0 2.5 0.1 for line in raw_data:
|
|
141
|
+
18 1000 1200000.0 1200.0 48.0 parsed = json.loads(line)
|
|
142
|
+
19 1000 800000.0 800.0 32.0 validated = validate(parsed)
|
|
143
|
+
20 950 450000.0 473.7 18.0 results.append(transform(validated))
|
|
144
|
+
21 50 47500.0 950.0 1.9 log_invalid(parsed)
|
|
145
|
+
22 1 2.0 2.0 0.0 return results
|
|
146
|
+
```
|
|
147
|
+
|
|
148
|
+
**How to read:**
|
|
149
|
+
- `Hits`: How many times the line executed.
|
|
150
|
+
- `Time`: Total time spent on this line (microseconds).
|
|
151
|
+
- `Per Hit`: Average time per execution.
|
|
152
|
+
- `% Time`: Percentage of total function time.
|
|
153
|
+
|
|
154
|
+
**Analysis:**
|
|
155
|
+
- Line 18 (`json.loads`): 48% of time. 1.2ms per call × 1000 calls = 1.2s.
|
|
156
|
+
- Line 19 (`validate`): 32% of time. 0.8ms per call.
|
|
157
|
+
- Line 20 (`transform`): 18% of time. 0.47ms per call, but only 950 hits (50 were invalid).
|
|
158
|
+
|
|
159
|
+
**Action:** `json.loads` is the primary target. Options:
|
|
160
|
+
1. Use `orjson.loads` (3-10x faster than `json.loads`).
|
|
161
|
+
2. If the JSON structure is known, use a streaming parser.
|
|
162
|
+
3. If data is coming from a controlled source, consider a faster format (msgpack).
|
|
163
|
+
|
|
164
|
+
---
|
|
165
|
+
|
|
166
|
+
## Reading `time` Output
|
|
167
|
+
|
|
168
|
+
```bash
|
|
169
|
+
$ /usr/bin/time -v python script.py
|
|
170
|
+
|
|
171
|
+
Command being timed: "python script.py"
|
|
172
|
+
User time (seconds): 3.45 ← CPU time in user space
|
|
173
|
+
System time (seconds): 0.12 ← CPU time in kernel
|
|
174
|
+
Elapsed (wall clock) time: 8.23 ← actual time elapsed
|
|
175
|
+
Maximum resident set size (kbytes): 524288 ← peak memory (512 MB)
|
|
176
|
+
Major (requiring I/O) page faults: 0
|
|
177
|
+
Minor (reclaiming a frame) page faults: 131072
|
|
178
|
+
Voluntary context switches: 1523 ← waiting for I/O
|
|
179
|
+
Involuntary context switches: 45 ← preempted by scheduler
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
**Interpretation patterns:**
|
|
183
|
+
|
|
184
|
+
| Condition | Meaning | Action |
|
|
185
|
+
|-----------|---------|--------|
|
|
186
|
+
| wall >> user + sys | I/O bound | Profile I/O: network calls, disk reads, sleep/wait |
|
|
187
|
+
| user >> sys | CPU bound (computation) | Profile CPU: use cProfile or py-spy |
|
|
188
|
+
| sys >> user | Kernel bound (syscalls) | Profile syscalls: use strace |
|
|
189
|
+
| high voluntary ctx switches | Lots of I/O waiting | Batch I/O, use async, reduce round-trips |
|
|
190
|
+
| high involuntary ctx switches | CPU contention | Reduce thread count, check other processes |
|
|
191
|
+
| high max RSS | Memory hungry | Profile memory: use memory_profiler or scalene |
|
|
192
|
+
|
|
193
|
+
**This example:** Wall time (8.23s) >> user + sys (3.57s) → the script is I/O bound, spending 4.66s waiting for something. Investigate network calls, database queries, or file I/O.
|
|
194
|
+
|
|
195
|
+
---
|
|
196
|
+
|
|
197
|
+
## Benchmark Result Analysis
|
|
198
|
+
|
|
199
|
+
### hyperfine Output
|
|
200
|
+
|
|
201
|
+
```
|
|
202
|
+
Benchmark 1: python v1.py
|
|
203
|
+
Time (mean ± σ): 1.234 s ± 0.056 s [User: 1.180 s, System: 0.045 s]
|
|
204
|
+
Range (min … max): 1.156 s … 1.345 s 10 runs
|
|
205
|
+
|
|
206
|
+
Benchmark 2: python v2.py
|
|
207
|
+
Time (mean ± σ): 0.876 s ± 0.034 s [User: 0.830 s, System: 0.040 s]
|
|
208
|
+
Range (min … max): 0.823 s … 0.934 s 10 runs
|
|
209
|
+
|
|
210
|
+
Summary
|
|
211
|
+
python v2.py ran
|
|
212
|
+
1.41 ± 0.08 times faster than python v1.py
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
**How to evaluate:**
|
|
216
|
+
1. **Is the difference significant?** The 1.41x speedup is outside the standard deviation range, so yes.
|
|
217
|
+
2. **Is the variance acceptable?** σ = 0.056s for v1, 0.034s for v2. Both are <5% of the mean — good.
|
|
218
|
+
3. **Is the improvement meaningful?** 1.234s → 0.876s = 0.358s saved. For a batch job running once: marginal. For a request handler running 1000x/sec: substantial.
|
|
219
|
+
|
|
220
|
+
### Statistical Significance Rules of Thumb
|
|
221
|
+
|
|
222
|
+
- **Difference > 2σ**: Likely real (p < 0.05 roughly).
|
|
223
|
+
- **Difference < 1σ**: Probably noise. Don't ship it.
|
|
224
|
+
- **Coefficient of variation (σ/mean) > 10%**: Your benchmark is noisy. Increase runs, reduce background load, or pin CPU frequency.
|
|
225
|
+
- **Outliers in range**: If min and max are far apart, investigate. Was there a GC pause? A background process?
|
|
226
|
+
|
|
227
|
+
---
|
|
228
|
+
|
|
229
|
+
## Common Pitfalls
|
|
230
|
+
|
|
231
|
+
1. **Profiling optimized code with debug flags**: Debug builds disable optimizations. Profile release/production builds.
|
|
232
|
+
2. **Profiling on a loaded machine**: Other processes compete for CPU. Use isolated environments for benchmarks.
|
|
233
|
+
3. **Ignoring warmup**: JIT compilers (Node.js V8, PyPy) are slow on first run. Always warm up.
|
|
234
|
+
4. **Optimizing by percentage**: A 50% improvement on a 2ms function saves 1ms. A 5% improvement on a 10s function saves 500ms. Optimize by absolute time, not percentage.
|
|
235
|
+
5. **Micro-benchmarking in isolation**: A function that's fast alone may be slow under real load (cache eviction, memory pressure, GC pauses). Benchmark in realistic conditions.
|
|
@@ -0,0 +1,395 @@
|
|
|
1
|
+
# Performance Profiling: Tool Command Reference
|
|
2
|
+
|
|
3
|
+
Full command reference for Python, JavaScript, and system profiling tools.
|
|
4
|
+
|
|
5
|
+
## Contents
|
|
6
|
+
|
|
7
|
+
- [Python Profiling Tools](#python-profiling-tools)
|
|
8
|
+
- [cProfile](#cprofile)
|
|
9
|
+
- [py-spy](#py-spy)
|
|
10
|
+
- [scalene](#scalene)
|
|
11
|
+
- [memory_profiler](#memory_profiler)
|
|
12
|
+
- [line_profiler](#line_profiler)
|
|
13
|
+
- [pytest-benchmark](#pytest-benchmark)
|
|
14
|
+
- [JavaScript / Node.js Profiling Tools](#javascript--nodejs-profiling-tools)
|
|
15
|
+
- [V8 Built-in Profiler](#v8-built-in-profiler)
|
|
16
|
+
- [clinic.js](#clinicjs)
|
|
17
|
+
- [Chrome DevTools (Node.js)](#chrome-devtools-nodejs)
|
|
18
|
+
- [Lighthouse](#lighthouse)
|
|
19
|
+
- [System Profiling Tools](#system-profiling-tools)
|
|
20
|
+
- [time](#time)
|
|
21
|
+
- [htop / top](#htop--top)
|
|
22
|
+
- [iostat](#iostat)
|
|
23
|
+
- [perf (Linux)](#perf-linux)
|
|
24
|
+
- [strace (Linux)](#strace-linux)
|
|
25
|
+
- [Benchmarking Tools](#benchmarking-tools)
|
|
26
|
+
- [hyperfine](#hyperfine)
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## Python Profiling Tools
|
|
31
|
+
|
|
32
|
+
### cProfile
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
# Profile a script, sorted by cumulative time
|
|
36
|
+
python -m cProfile -s cumtime script.py
|
|
37
|
+
|
|
38
|
+
# Sort options: calls, cumulative, filename, line, module, name, nfl, pcalls,
|
|
39
|
+
# stdname, time, tottime
|
|
40
|
+
python -m cProfile -s tottime script.py # sort by time spent in function itself
|
|
41
|
+
python -m cProfile -s calls script.py # sort by call count
|
|
42
|
+
|
|
43
|
+
# Save profile data for later analysis
|
|
44
|
+
python -m cProfile -o output.prof script.py
|
|
45
|
+
|
|
46
|
+
# Analyze saved profile
|
|
47
|
+
python -c "
|
|
48
|
+
import pstats
|
|
49
|
+
p = pstats.Stats('output.prof')
|
|
50
|
+
p.strip_dirs()
|
|
51
|
+
p.sort_stats('cumulative')
|
|
52
|
+
p.print_stats(30) # top 30 functions
|
|
53
|
+
p.print_callers('function_name') # who calls this function
|
|
54
|
+
p.print_callees('function_name') # what does this function call
|
|
55
|
+
"
|
|
56
|
+
|
|
57
|
+
# Profile a specific function in code
|
|
58
|
+
import cProfile
|
|
59
|
+
profiler = cProfile.Profile()
|
|
60
|
+
profiler.enable()
|
|
61
|
+
result = my_function()
|
|
62
|
+
profiler.disable()
|
|
63
|
+
profiler.print_stats(sort='cumulative')
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### py-spy
|
|
67
|
+
|
|
68
|
+
```bash
|
|
69
|
+
# Install
|
|
70
|
+
pip install py-spy
|
|
71
|
+
|
|
72
|
+
# Record a flamegraph (SVG output)
|
|
73
|
+
py-spy record -o flamegraph.svg -- python script.py
|
|
74
|
+
|
|
75
|
+
# Record with specific rate (samples/second, default 100)
|
|
76
|
+
py-spy record --rate 200 -o flamegraph.svg -- python script.py
|
|
77
|
+
|
|
78
|
+
# Record native (C extension) frames too
|
|
79
|
+
py-spy record --native -o flamegraph.svg -- python script.py
|
|
80
|
+
|
|
81
|
+
# Attach to a running process
|
|
82
|
+
py-spy record -o flamegraph.svg --pid 12345
|
|
83
|
+
|
|
84
|
+
# Record for a specific duration (seconds)
|
|
85
|
+
py-spy record --duration 30 -o flamegraph.svg --pid 12345
|
|
86
|
+
|
|
87
|
+
# Top-like live view
|
|
88
|
+
py-spy top -- python script.py
|
|
89
|
+
py-spy top --pid 12345
|
|
90
|
+
|
|
91
|
+
# Dump current stack traces (one-shot)
|
|
92
|
+
py-spy dump --pid 12345
|
|
93
|
+
|
|
94
|
+
# Output formats
|
|
95
|
+
py-spy record -f speedscope -o profile.json -- python script.py # speedscope
|
|
96
|
+
py-spy record -f raw -o profile.txt -- python script.py # raw text
|
|
97
|
+
|
|
98
|
+
# Profile subprocesses too
|
|
99
|
+
py-spy record --subprocesses -o flamegraph.svg -- python script.py
|
|
100
|
+
|
|
101
|
+
# Annotate samples with thread ids/names (does not filter)
|
|
102
|
+
py-spy record --threads -o flamegraph.svg -- python script.py
|
|
103
|
+
```
|
|
104
|
+
|
|
105
|
+
### scalene
|
|
106
|
+
|
|
107
|
+
```bash
|
|
108
|
+
# Install
|
|
109
|
+
pip install scalene
|
|
110
|
+
|
|
111
|
+
# Basic profile
|
|
112
|
+
scalene script.py
|
|
113
|
+
|
|
114
|
+
# CPU only
|
|
115
|
+
scalene --cpu script.py
|
|
116
|
+
|
|
117
|
+
# Memory only
|
|
118
|
+
scalene --memory script.py
|
|
119
|
+
|
|
120
|
+
# Reduced profile (only functions with significant time)
|
|
121
|
+
scalene --reduced-profile script.py
|
|
122
|
+
|
|
123
|
+
# Profile specific files only
|
|
124
|
+
scalene --profile-only mymodule.py script.py
|
|
125
|
+
|
|
126
|
+
# Output formats
|
|
127
|
+
scalene --json --outfile profile.json script.py
|
|
128
|
+
scalene --html --outfile profile.html script.py
|
|
129
|
+
|
|
130
|
+
# Programmatic usage
|
|
131
|
+
from scalene import scalene_profiler
|
|
132
|
+
scalene_profiler.start()
|
|
133
|
+
# ... code to profile ...
|
|
134
|
+
scalene_profiler.stop()
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
### memory_profiler
|
|
138
|
+
|
|
139
|
+
```bash
|
|
140
|
+
# Install
|
|
141
|
+
pip install memory_profiler
|
|
142
|
+
|
|
143
|
+
# Profile a script (requires @profile decorators in code)
|
|
144
|
+
python -m memory_profiler script.py
|
|
145
|
+
|
|
146
|
+
# Time-based memory usage plot (outputs to mprofile_*.dat)
|
|
147
|
+
mprof run script.py
|
|
148
|
+
mprof plot # opens matplotlib plot
|
|
149
|
+
mprof plot -o memory.png # save to file
|
|
150
|
+
|
|
151
|
+
# Include memory of child processes in the time-based plot
|
|
152
|
+
mprof run --include-children script.py
|
|
153
|
+
```
|
|
154
|
+
|
|
155
|
+
### line_profiler
|
|
156
|
+
|
|
157
|
+
```bash
|
|
158
|
+
# Install
|
|
159
|
+
pip install line_profiler
|
|
160
|
+
|
|
161
|
+
# Profile (requires @profile decorators in code)
|
|
162
|
+
kernprof -l script.py # generates script.py.lprof
|
|
163
|
+
python -m line_profiler script.py.lprof # view results
|
|
164
|
+
|
|
165
|
+
# Or combined
|
|
166
|
+
kernprof -l -v script.py # run and view immediately
|
|
167
|
+
```
|
|
168
|
+
|
|
169
|
+
### pytest-benchmark
|
|
170
|
+
|
|
171
|
+
```bash
|
|
172
|
+
# Install
|
|
173
|
+
pip install pytest-benchmark
|
|
174
|
+
|
|
175
|
+
# Run benchmarks only
|
|
176
|
+
pytest --benchmark-only
|
|
177
|
+
|
|
178
|
+
# Sort by mean time
|
|
179
|
+
pytest --benchmark-sort=mean
|
|
180
|
+
|
|
181
|
+
# Other sort options: min, max, stddev, name, fullname, rounds
|
|
182
|
+
pytest --benchmark-sort=stddev
|
|
183
|
+
|
|
184
|
+
# Save baseline
|
|
185
|
+
pytest --benchmark-save=baseline
|
|
186
|
+
|
|
187
|
+
# Compare against saved baseline
|
|
188
|
+
pytest --benchmark-compare=0001_baseline
|
|
189
|
+
|
|
190
|
+
# Minimum rounds and warmup
|
|
191
|
+
pytest --benchmark-min-rounds=20 --benchmark-warmup=on
|
|
192
|
+
|
|
193
|
+
# Disable GC during benchmarks (more stable results)
|
|
194
|
+
pytest --benchmark-disable-gc
|
|
195
|
+
|
|
196
|
+
# Output formats
|
|
197
|
+
pytest --benchmark-json=results.json
|
|
198
|
+
pytest --benchmark-histogram=output # generates output.svg
|
|
199
|
+
```
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
## JavaScript / Node.js Profiling Tools
|
|
204
|
+
|
|
205
|
+
### V8 Built-in Profiler
|
|
206
|
+
|
|
207
|
+
```bash
|
|
208
|
+
# Generate V8 profile log
|
|
209
|
+
node --prof app.js
|
|
210
|
+
|
|
211
|
+
# Process the log
|
|
212
|
+
node --prof-process isolate-*.log > profile.txt
|
|
213
|
+
|
|
214
|
+
# CPU profiling with V8 inspector
|
|
215
|
+
node --cpu-prof app.js
|
|
216
|
+
# Generates CPU.*.cpuprofile — open in Chrome DevTools
|
|
217
|
+
|
|
218
|
+
# Sampled heap allocation profile (not a full heap snapshot)
|
|
219
|
+
node --heap-prof app.js
|
|
220
|
+
# Generates Heap.*.heapprofile
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
### clinic.js
|
|
224
|
+
|
|
225
|
+
```bash
|
|
226
|
+
# Install
|
|
227
|
+
npm install -g clinic
|
|
228
|
+
|
|
229
|
+
# Doctor: overall health (event loop delays, GC, active handles)
|
|
230
|
+
clinic doctor -- node app.js
|
|
231
|
+
|
|
232
|
+
# Flame: CPU flamegraph
|
|
233
|
+
clinic flame -- node app.js
|
|
234
|
+
|
|
235
|
+
# Bubbleprof: async flow visualization
|
|
236
|
+
clinic bubbleprof -- node app.js
|
|
237
|
+
|
|
238
|
+
# HeapProfiler: memory allocation tracking
|
|
239
|
+
clinic heapprofiler -- node app.js
|
|
240
|
+
|
|
241
|
+
# Combine with autocannon for load testing
|
|
242
|
+
clinic doctor --autocannon [ /api/endpoint ] -- node app.js
|
|
243
|
+
clinic flame --autocannon [ -m POST /api/data ] -- node app.js
|
|
244
|
+
```
|
|
245
|
+
|
|
246
|
+
### Chrome DevTools (Node.js)
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
# Start with inspector (attach when ready)
|
|
250
|
+
node --inspect app.js
|
|
251
|
+
|
|
252
|
+
# Start with inspector and break on first line
|
|
253
|
+
node --inspect-brk app.js
|
|
254
|
+
|
|
255
|
+
# Custom port
|
|
256
|
+
node --inspect=0.0.0.0:9229 app.js
|
|
257
|
+
|
|
258
|
+
# Then open chrome://inspect in Chrome and click "inspect"
|
|
259
|
+
```
|
|
260
|
+
|
|
261
|
+
### Lighthouse
|
|
262
|
+
|
|
263
|
+
```bash
|
|
264
|
+
# Install
|
|
265
|
+
npm install -g lighthouse
|
|
266
|
+
|
|
267
|
+
# Basic audit
|
|
268
|
+
lighthouse https://example.com
|
|
269
|
+
|
|
270
|
+
# Output formats
|
|
271
|
+
lighthouse https://example.com --output json --output-path report.json
|
|
272
|
+
lighthouse https://example.com --output html --output-path report.html
|
|
273
|
+
|
|
274
|
+
# Specific categories
|
|
275
|
+
lighthouse https://example.com --only-categories=performance
|
|
276
|
+
|
|
277
|
+
# Mobile vs Desktop
|
|
278
|
+
lighthouse https://example.com --preset=desktop
|
|
279
|
+
lighthouse https://example.com --preset=perf # mobile (default)
|
|
280
|
+
|
|
281
|
+
# Headless Chrome flags
|
|
282
|
+
lighthouse https://example.com --chrome-flags="--headless --no-sandbox"
|
|
283
|
+
```
|
|
284
|
+
|
|
285
|
+
---
|
|
286
|
+
|
|
287
|
+
## System Profiling Tools
|
|
288
|
+
|
|
289
|
+
### time
|
|
290
|
+
|
|
291
|
+
```bash
|
|
292
|
+
# Basic timing
|
|
293
|
+
time python script.py
|
|
294
|
+
|
|
295
|
+
# GNU time with more details (note: use \time or /usr/bin/time, not the shell builtin)
|
|
296
|
+
/usr/bin/time -v python script.py
|
|
297
|
+
# Outputs: wall clock, user CPU, system CPU, max RSS, page faults, context switches
|
|
298
|
+
```
|
|
299
|
+
|
|
300
|
+
### htop / top
|
|
301
|
+
|
|
302
|
+
```bash
|
|
303
|
+
# Interactive process monitor
|
|
304
|
+
htop
|
|
305
|
+
|
|
306
|
+
# Monitor specific PID
|
|
307
|
+
htop -p 12345
|
|
308
|
+
|
|
309
|
+
# Sort by memory
|
|
310
|
+
htop --sort-key=PERCENT_MEM
|
|
311
|
+
|
|
312
|
+
# Non-interactive (for scripting)
|
|
313
|
+
top -b -n 1 -p 12345
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
### iostat
|
|
317
|
+
|
|
318
|
+
```bash
|
|
319
|
+
# Disk I/O statistics, refresh every 1 second
|
|
320
|
+
iostat -x 1
|
|
321
|
+
|
|
322
|
+
# Specific device
|
|
323
|
+
iostat -x -d sda 1
|
|
324
|
+
|
|
325
|
+
# Key columns: r/s (reads/sec), w/s (writes/sec), %util (device utilization)
|
|
326
|
+
```
|
|
327
|
+
|
|
328
|
+
### perf (Linux)
|
|
329
|
+
|
|
330
|
+
```bash
|
|
331
|
+
# Count hardware events
|
|
332
|
+
perf stat python script.py
|
|
333
|
+
# Reports: cycles, instructions, cache misses, branch misses
|
|
334
|
+
|
|
335
|
+
# Specific events
|
|
336
|
+
perf stat -e cache-misses,cache-references python script.py
|
|
337
|
+
|
|
338
|
+
# Record for flamegraph
|
|
339
|
+
perf record -g python script.py
|
|
340
|
+
perf script > perf.data.txt
|
|
341
|
+
|
|
342
|
+
# Generate flamegraph from perf data
|
|
343
|
+
# (requires FlameGraph tools: https://github.com/brendangregg/FlameGraph)
|
|
344
|
+
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
|
|
345
|
+
```
|
|
346
|
+
|
|
347
|
+
### strace (Linux)
|
|
348
|
+
|
|
349
|
+
```bash
|
|
350
|
+
# Summary of syscall time
|
|
351
|
+
strace -c python script.py
|
|
352
|
+
|
|
353
|
+
# Trace specific syscall categories
|
|
354
|
+
strace -e trace=network python script.py # network calls
|
|
355
|
+
strace -e trace=file python script.py # file operations
|
|
356
|
+
strace -e trace=memory python script.py # memory operations
|
|
357
|
+
|
|
358
|
+
# Trace with timestamps
|
|
359
|
+
strace -t python script.py # HH:MM:SS
|
|
360
|
+
strace -T python script.py # time spent in each syscall
|
|
361
|
+
```
|
|
362
|
+
|
|
363
|
+
---
|
|
364
|
+
|
|
365
|
+
## Benchmarking Tools
|
|
366
|
+
|
|
367
|
+
### hyperfine
|
|
368
|
+
|
|
369
|
+
```bash
|
|
370
|
+
# Install: cargo install hyperfine, or brew install hyperfine
|
|
371
|
+
|
|
372
|
+
# Basic benchmark
|
|
373
|
+
hyperfine 'python script.py'
|
|
374
|
+
|
|
375
|
+
# With warmup runs
|
|
376
|
+
hyperfine --warmup 3 'python script.py'
|
|
377
|
+
|
|
378
|
+
# Compare two commands
|
|
379
|
+
hyperfine --warmup 3 'python v1.py' 'python v2.py'
|
|
380
|
+
|
|
381
|
+
# Parameter sweep
|
|
382
|
+
hyperfine --warmup 3 -P threads 1 8 'python script.py --threads {threads}'
|
|
383
|
+
|
|
384
|
+
# Parameter list
|
|
385
|
+
hyperfine --warmup 3 -L algo bubble,merge,quick 'python sort.py --algo {algo}'
|
|
386
|
+
|
|
387
|
+
# Minimum runs
|
|
388
|
+
hyperfine --min-runs 20 'python script.py'
|
|
389
|
+
|
|
390
|
+
# Setup and cleanup commands
|
|
391
|
+
hyperfine --setup 'python generate_data.py' --cleanup 'rm data.tmp' 'python script.py'
|
|
392
|
+
|
|
393
|
+
# Export results
|
|
394
|
+
hyperfine --warmup 3 --export-json results.json --export-markdown results.md 'python script.py'
|
|
395
|
+
```
|