codeforge-dev 1.5.7 → 1.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.devcontainer/.env +2 -1
- package/.devcontainer/CHANGELOG.md +55 -9
- package/.devcontainer/CLAUDE.md +65 -15
- package/.devcontainer/README.md +67 -6
- package/.devcontainer/config/keybindings.json +5 -0
- package/.devcontainer/config/main-system-prompt.md +63 -2
- package/.devcontainer/config/settings.json +25 -6
- package/.devcontainer/devcontainer.json +23 -7
- package/.devcontainer/features/README.md +21 -7
- package/.devcontainer/features/ccburn/README.md +60 -0
- package/.devcontainer/features/ccburn/devcontainer-feature.json +38 -0
- package/.devcontainer/features/ccburn/install.sh +174 -0
- package/.devcontainer/features/ccstatusline/README.md +22 -21
- package/.devcontainer/features/ccstatusline/devcontainer-feature.json +1 -1
- package/.devcontainer/features/ccstatusline/install.sh +48 -16
- package/.devcontainer/features/claude-code/config/settings.json +60 -24
- package/.devcontainer/features/mcp-qdrant/devcontainer-feature.json +1 -1
- package/.devcontainer/features/mcp-reasoner/devcontainer-feature.json +1 -1
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/__pycache__/format-on-stop.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-formatter/scripts/format-on-stop.py +21 -6
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/__pycache__/lint-file.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/auto-linter/scripts/lint-file.py +7 -10
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/REVIEW-RUBRIC.md +440 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/architect.md +190 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/bash-exec.md +173 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/claude-guide.md +155 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/dependency-analyst.md +248 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/doc-writer.md +233 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/explorer.md +235 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/generalist.md +125 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/git-archaeologist.md +242 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/migrator.md +195 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/perf-profiler.md +265 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/refactorer.md +209 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/researcher.md +195 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/security-auditor.md +289 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/spec-writer.md +284 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/statusline-config.md +188 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/agents/test-writer.md +245 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/hooks/hooks.json +12 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/guard-readonly-bash.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/redirect-builtin-agents.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/skill-suggester.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/syntax-validator.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-no-regression.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/__pycache__/verify-tests-pass.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/guard-readonly-bash.py +611 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/redirect-builtin-agents.py +83 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/skill-suggester.py +85 -2
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/syntax-validator.py +9 -4
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-no-regression.py +221 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/scripts/verify-tests-pass.py +176 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/SKILL.md +599 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/claude-agent-sdk/references/sdk-typescript-reference.md +954 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/SKILL.md +276 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/advanced-commands.md +332 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/git-forensics/references/investigation-playbooks.md +319 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/SKILL.md +341 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/interpreting-results.md +235 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/performance-profiling/references/tool-commands.md +395 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/SKILL.md +344 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/safe-transformations.md +247 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/refactoring-patterns/references/smell-catalog.md +332 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/SKILL.md +277 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/owasp-patterns.md +269 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/security-checklist/references/secrets-patterns.md +253 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/SKILL.md +288 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/criteria-patterns.md +245 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/code-directive/skills/specification-writing/references/ears-templates.md +239 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/__pycache__/guard-protected.cpython-314.pyc +0 -0
- package/.devcontainer/plugins/devs-marketplace/plugins/protected-files-guard/scripts/guard-protected.py +40 -39
- package/.devcontainer/scripts/setup-aliases.sh +10 -20
- package/.devcontainer/scripts/setup-config.sh +2 -0
- package/.devcontainer/scripts/setup-plugins.sh +38 -46
- package/.devcontainer/scripts/setup-projects.sh +175 -0
- package/.devcontainer/scripts/setup-symlink-claude.sh +36 -0
- package/.devcontainer/scripts/setup-update-claude.sh +11 -8
- package/.devcontainer/scripts/setup.sh +4 -2
- package/package.json +1 -1
- package/.devcontainer/scripts/setup-irie-claude.sh +0 -32
@@ -0,0 +1,319 @@
# Git Investigation Playbooks

Step-by-step playbooks for common git forensic investigations.

## Contents

- [Playbook 1: Finding When a Bug Was Introduced](#playbook-1-finding-when-a-bug-was-introduced)
- [Playbook 2: Finding Deleted Code](#playbook-2-finding-deleted-code)
- [Playbook 3: Tracing a Line's History](#playbook-3-tracing-a-lines-history)
- [Playbook 4: Recovering Lost Work](#playbook-4-recovering-lost-work)
- [Playbook 5: Identifying Hot Spots and Code Ownership](#playbook-5-identifying-hot-spots-and-code-ownership)
- [Playbook 6: Understanding a Complex Merge Conflict](#playbook-6-understanding-a-complex-merge-conflict)

---

## Playbook 1: Finding When a Bug Was Introduced

**Scenario:** A feature that used to work is now broken. You need to find the exact commit that introduced the regression.

### Step 1: Establish the boundary

```bash
# Find a known-good commit (e.g., the last release tag)
git tag --list "v*" --sort=-version:refname | head -5

# Verify the good commit is actually good
git stash            # save current work
git checkout v2.1.0
# run the failing test or reproduce the bug manually
# if it passes, this is your good commit
```

### Step 2: Start bisect

```bash
git bisect start
git bisect bad HEAD      # current commit is bad
git bisect good v2.1.0   # last known good commit
```

### Step 3: Test each checkout

Git will check out a commit roughly in the middle. Test it:

```bash
# Option A: Manual testing
pytest tests/test_feature.py -x
# If it passes: git bisect good
# If it fails:  git bisect bad

# Option B: Automated
git bisect run pytest tests/test_feature.py -x
```

### Step 4: Analyze the result

```bash
# Git outputs the first bad commit
# Example: abc1234 is the first bad commit

# Examine the commit
git show abc1234
git show abc1234 --stat

# Understand the context
git log --oneline abc1234~5..abc1234
```

### Step 5: Clean up

```bash
git bisect reset
git stash pop   # restore your work
```

---

## Playbook 2: Finding Deleted Code

**Scenario:** A function or class that used to exist has been deleted. You need to find when it was removed and why.

### Step 1: Search for when the code was last present

```bash
# Find commits that changed the number of occurrences of the string
git log -S "def calculate_tax" --oneline

# Output shows commits where the function was added or removed
# git log lists newest first: the FIRST commit shown removed it; the LAST added it
```

### Step 2: Examine the removal commit

```bash
# Show the commit that removed the function
git show abc1234

# See the full file at the commit BEFORE removal
git show abc1234^:path/to/file.py

# See the diff to understand what replaced it
git diff abc1234^..abc1234 -- path/to/file.py
```

### Step 3: Find the file if it was renamed or moved

```bash
# If the file itself was deleted, find when
git log --diff-filter=D --summary | grep "path/to/file.py"

# If the function moved to another file, search with -G
git log -G "def calculate_tax" --oneline --all
```

### Step 4: Recover the deleted code

```bash
# Get the file content from before the deletion
git show abc1234^:path/to/file.py > recovered_file.py

# Or extract just the function (manual extraction)
git show abc1234^:path/to/file.py | grep -A 50 "def calculate_tax"
```

---

## Playbook 3: Tracing a Line's History

**Scenario:** You need to understand why a specific line of code exists -- who wrote it, when, and in what context.

### Step 1: Initial blame

```bash
# Blame with whitespace and move detection
git blame -w -M -C path/to/file.py

# Focus on the specific line range
git blame -w -M -C -L 42,42 path/to/file.py
```

### Step 2: Go deeper if the blame shows a bulk change

If blame points to a formatting or refactoring commit:

```bash
# Blame at the commit BEFORE the bulk change
git blame -w -M -C -L 42,42 abc1234^ -- path/to/file.py

# Or use .git-blame-ignore-revs to skip bulk commits automatically
git blame --ignore-revs-file .git-blame-ignore-revs path/to/file.py
```

### Step 3: Read the full commit context

```bash
# See the full commit that introduced the line
git show def5678

# See the PR/issue if the commit message references one
# e.g., "Fix #123" or "Closes #456"
git log --format="%H %s" | grep "#123"
```

### Step 4: See all changes to this line over time

```bash
# Log of all commits that touched this line range
git log -L 42,42:path/to/file.py

# This shows the line's evolution across commits, including the diff at each step
```

---

## Playbook 4: Recovering Lost Work

**Scenario:** You accidentally ran `git reset --hard`, deleted a branch, or lost commits.

### Step 1: Don't panic -- check the reflog

```bash
# See recent HEAD movements
git reflog

# Look for the commit you lost
# Example output:
# abc1234 HEAD@{0}: reset: moving to HEAD~3     ← the reset that lost your work
# def5678 HEAD@{1}: commit: add user validation ← your lost commit
# 789abcd HEAD@{2}: commit: fix login bug       ← another lost commit
```

### Step 2: Verify the lost commit

```bash
# Check the commit contents
git show def5678
git show def5678 --stat
```

### Step 3: Recover

```bash
# Option A: Create a branch at the lost commit (safest)
git branch recovery def5678

# Option B: Cherry-pick the commit onto your current branch
git cherry-pick def5678

# Option C: Reset to the lost commit (if you want to restore the full state)
git reset --hard def5678
```

### Step 4: If reflog doesn't help

```bash
# Find unreachable objects (last resort)
git fsck --unreachable --no-reflogs

# This lists unreachable commits, blobs, and trees
# Look for "unreachable commit" entries
# Examine them with git show
```

---

## Playbook 5: Identifying Hot Spots and Code Ownership

**Scenario:** You're new to a codebase and need to understand which files change most and who knows them best.

### Step 1: Find frequently changed files

```bash
# Most changed files in the last 6 months
git log --since="6 months ago" --pretty=format: --name-only | sort | uniq -c | sort -rn | head -20
```

### Step 2: Find who knows each area

```bash
# Top contributors overall
git shortlog -sn --no-merges --since="6 months ago"

# Top contributors for a specific directory
git shortlog -sn --no-merges -- src/auth/

# Who last touched each file in a directory
for f in src/auth/*.py; do
  echo "$f: $(git log -1 --format='%an (%ar)' -- "$f")"
done
```

### Step 3: Find coupling (files that change together)

```bash
# Start from the most frequently changed files
git log --pretty=format: --name-only | sort | uniq -c | sort -rn | head -30

# Look for pairs: if file A and file B always change together,
# they may have hidden coupling that should be made explicit
```
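
The pipeline above only ranks individual files by change frequency; surfacing actual pairs takes a little scripting. A minimal sketch -- the `co_change_pairs` helper is this document's illustration, not a git feature -- fed by the output of `git log --pretty=format: --name-only`, where commits are separated by blank lines:

```python
from collections import Counter
from itertools import combinations

def co_change_pairs(name_only_log: str, top: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Count file pairs that appear in the same commit."""
    pairs: Counter[tuple[str, str]] = Counter()
    commit: set[str] = set()
    for line in name_only_log.splitlines() + [""]:
        if line.strip():
            commit.add(line.strip())
        elif commit:
            # a blank line ends a commit: count every unordered pair it touched
            pairs.update(combinations(sorted(commit), 2))
            commit = set()
    return pairs.most_common(top)

# Example with a fake two-commit log:
log = "a.py\nb.py\n\na.py\nb.py\nc.py\n"
print(co_change_pairs(log)[0])  # (('a.py', 'b.py'), 2)
```

Pipe real history in with `git log --pretty=format: --name-only | python pairs.py` and look at the pairs whose count approaches the files' individual change counts.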

### Step 4: Identify aging code

```bash
# Files not modified in over a year
git log --diff-filter=M --since="1 year ago" --pretty=format: --name-only | sort -u > recent.txt
git ls-files | sort > all.txt
comm -23 all.txt recent.txt | head -30
rm recent.txt all.txt
```

---

## Playbook 6: Understanding a Complex Merge Conflict

**Scenario:** You have a merge conflict and need to understand the history of both sides before resolving.

### Step 1: See what both branches changed

```bash
# Changes on your branch since diverging from main
git log --oneline main..HEAD

# Changes on main since your branch diverged
git log --oneline HEAD..main

# The common ancestor
git merge-base HEAD main
```

### Step 2: Understand the conflicting file's history

```bash
# History on your branch
git log --oneline main..HEAD -- path/to/conflicted/file.py

# History on main
git log --oneline HEAD..main -- path/to/conflicted/file.py
```

### Step 3: See the three-way diff

```bash
# During a merge conflict, git stores three versions in the index:
# :1: = common ancestor (base)
# :2: = your version (ours)
# :3: = their version (theirs)

git show :1:path/to/file.py > base.py
git show :2:path/to/file.py > ours.py
git show :3:path/to/file.py > theirs.py

# Compare
diff3 ours.py base.py theirs.py
```

### Step 4: Resolve with context

Understanding both sides' intent (from the commit messages in Step 1) helps you resolve the conflict correctly rather than just picking one side.

@@ -0,0 +1,341 @@
---
name: performance-profiling
description: >-
  This skill should be used when the user asks to "profile this code",
  "find the bottleneck", "optimize performance", "measure execution time",
  "check memory usage", "create a flamegraph", "benchmark this function",
  "find memory leaks", "reduce latency", "run a performance test",
  or discusses profiling tools, flamegraphs, benchmarking methodology,
  cProfile, py-spy, scalene, Chrome DevTools performance,
  memory profiling, or hot path analysis.
version: 0.1.0
---

# Performance Profiling

## Mental Model

Performance work follows one rule: **measure first, optimize second**. Bottlenecks are almost never where you think they are. Developers consistently misjudge performance by 10-100x -- the "obviously slow" nested loop is often fast, while the "simple" database query is the real bottleneck.

The profiling workflow is:

1. **Establish a baseline** -- measure current performance with a reproducible benchmark
2. **Profile** -- identify where time and memory are actually spent
3. **Hypothesize** -- form a specific theory about the bottleneck
4. **Optimize** -- make one targeted change
5. **Measure again** -- verify the optimization actually helped
6. **Compare** -- did the change improve the baseline? By how much?

Without this discipline, you'll waste time optimizing code that doesn't matter, introduce complexity without measurable benefit, and have no proof that your changes helped.

**Amdahl's Law** sets the ceiling: if a function consumes 5% of total runtime, making it infinitely fast saves only 5%. Focus on the biggest bars in the profile first.
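
The ceiling is easy to compute directly. A minimal sketch (the function name is this document's illustration):

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of total runtime is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Speeding up a function that is 5% of runtime, even by 1000x,
# yields barely any overall gain:
print(round(amdahl_speedup(0.05, 1000.0), 3))  # 1.053
# Speeding up a 60% hot path by just 3x is far more effective:
print(round(amdahl_speedup(0.60, 3.0), 3))     # 1.667
```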

---

## Python Profiling

### cProfile (built-in, deterministic)

cProfile instruments every function call. It shows call count, cumulative time, and per-call time:

```bash
# Profile a script
python -m cProfile -s cumtime myapp.py

# Profile and save to a file for analysis
python -m cProfile -o profile.prof myapp.py

# Analyze the saved profile
python -c "
import pstats
p = pstats.Stats('profile.prof')
p.sort_stats('cumulative')
p.print_stats(20)  # top 20 functions
"
```

**Tradeoff:** cProfile adds ~30% overhead and measures wall-clock time. It's deterministic (traces every call), so it catches everything but distorts timing for very fast functions.

### py-spy (sampling, near-zero overhead)

py-spy samples the call stack without modifying the target process. It can attach to running processes:

```bash
# Record a flamegraph (SVG)
py-spy record -o flamegraph.svg -- python myapp.py

# Attach to a running process
py-spy record -o flamegraph.svg --pid 12345

# Top-like live view
py-spy top --pid 12345

# Profile for a specific duration
py-spy record --duration 30 -o flamegraph.svg --pid 12345
```

**Tradeoff:** Sampling misses very short functions but has near-zero overhead. Ideal for production profiling.

### scalene (CPU + memory + GPU)

Scalene profiles CPU time, memory allocation, and memory usage simultaneously. It distinguishes Python time from native (C) time:

```bash
# Profile a script
scalene myapp.py

# Profile with specific options
scalene --cpu --memory --reduced-profile myapp.py

# Profile a specific region (in code)
# from scalene import scalene_profiler
# scalene_profiler.start()
# ... code to profile ...
# scalene_profiler.stop()
```

### memory_profiler (line-by-line memory)

```python
from memory_profiler import profile
import pandas as pd

@profile
def process_data() -> pd.DataFrame:
    data = pd.read_csv("large.csv")            # +500 MiB
    filtered = data[data["active"]]            # +200 MiB
    result = filtered.groupby("region").sum()  # +50 MiB
    del data, filtered                         # -700 MiB
    return result
```

```bash
python -m memory_profiler myapp.py
```

### line_profiler (line-by-line CPU)

```python
# Decorate functions to profile
@profile
def expensive_function():
    result = []                    # 0.0%
    for item in large_list:        # 2.1%
        parsed = parse(item)       # 45.3%  <-- hot line
        if validate(parsed):       # 12.7%
            result.append(parsed)  # 0.4%
    return result
```

```bash
kernprof -l -v myapp.py
```

> **Deep dive:** See `references/tool-commands.md` for the full command reference per language and tool.

---

## JavaScript / Node.js Profiling

### V8 Profiler (`--prof`)

Node's built-in V8 profiler generates a log that can be processed into a human-readable report:

```bash
# Generate a V8 profile log
node --prof app.js

# Process the log into readable output
node --prof-process isolate-*.log > processed.txt
```

### clinic.js

A suite of profiling tools for Node.js:

```bash
# Install
npm install -g clinic

# Doctor: overall health check (event loop, GC, I/O)
clinic doctor -- node app.js

# Flame: flamegraph
clinic flame -- node app.js

# Bubbleprof: async flow visualization
clinic bubbleprof -- node app.js
```

### Chrome DevTools

For both browser and Node.js profiling:

```bash
# Start Node with the inspector
node --inspect app.js

# Or break on the first line
node --inspect-brk app.js
```

Then open `chrome://inspect` in Chrome:
- **Performance tab:** Record a profile; see the flamechart, call tree, and bottom-up views
- **Memory tab:** Take heap snapshots, record allocation timelines, detect leaks

### Lighthouse (Web Performance)

```bash
# CLI audit
npx lighthouse https://example.com --output json --output html

# Key metrics: FCP, LCP, TTI, TBT, CLS
# Target: Performance score > 90
```

---

## System Profiling

When the bottleneck isn't in your code but in the system:

```bash
# Wall-clock time, user CPU, system CPU
time python myapp.py

# Process-level resource usage (live)
htop             # interactive process viewer
htop -p 12345    # monitor a specific PID

# I/O statistics
iostat -x 1      # disk I/O per device, every 1 second

# CPU performance counters (Linux)
perf stat python myapp.py
# Counts: cycles, instructions, cache misses, branch misses

# System call tracing
strace -c python myapp.py       # summary of syscall time
strace -e trace=network ./app   # only network syscalls
```

**Interpreting `time` output:**
- **real** > **user** + **sys** → I/O bound (waiting for disk, network, or sleep)
- **user** >> **sys** → CPU bound in userspace (computation)
- **sys** >> **user** → CPU bound in the kernel (many syscalls, context switches)
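
These heuristics can be sketched as a few lines of code. The thresholds below (1.5x, 2x) are illustrative assumptions, not fixed rules:

```python
def classify_time_output(real: float, user: float, sys: float) -> str:
    """Rough classification of `time` output, per the heuristics above.

    All values in seconds; the thresholds are illustrative.
    """
    if real > (user + sys) * 1.5:
        return "I/O bound"
    if sys > user * 2:
        return "kernel/syscall bound"
    return "CPU bound (userspace)"

print(classify_time_output(real=10.0, user=1.0, sys=0.5))  # I/O bound
print(classify_time_output(real=2.1, user=2.0, sys=0.05))  # CPU bound (userspace)
```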

---

## Benchmarking Methodology

Benchmarks must be reproducible, statistically sound, and isolated from noise.

### CLI Benchmarking with hyperfine

```bash
# Basic benchmark with warmup
hyperfine --warmup 3 'python myapp.py'

# Compare two implementations
hyperfine --warmup 3 'python v1.py' 'python v2.py'

# With parameter sweeps
hyperfine --warmup 3 -P size 100,1000,10000 'python bench.py --size {size}'

# Export results
hyperfine --warmup 3 --export-json results.json 'python myapp.py'
```

hyperfine automatically detects outliers, calculates mean/median/stddev, and warns about statistical issues.

### Python Benchmarking with pytest-benchmark

```python
# The `benchmark` fixture is injected by pytest-benchmark -- no import needed
def test_sort_performance(benchmark) -> None:
    data = list(range(10000, 0, -1))
    result = benchmark(sorted, data)
    assert result == list(range(1, 10001))


def test_json_parse_performance(benchmark) -> None:
    """Benchmark with setup to exclude data preparation from timing."""
    import json
    payload = json.dumps({"users": [{"id": i, "name": f"user_{i}"} for i in range(1000)]})
    result = benchmark(json.loads, payload)
    assert len(result["users"]) == 1000
```

```bash
pytest --benchmark-only --benchmark-sort=mean
pytest --benchmark-compare        # compare against a saved baseline
pytest --benchmark-save=baseline  # save current results
```

### Benchmarking Rules

1. **Warmup runs** -- JIT compilers, caches, and OS page faults all affect the first run. Always include warmup.
2. **Multiple iterations** -- A single measurement is noise. Run at least 10 iterations and report mean, median, and stddev.
3. **Isolate variables** -- Change one thing at a time. Benchmark before and after each optimization.
4. **Control the environment** -- Close other applications, disable turbo boost for CPU benchmarks, use consistent hardware.
5. **Statistical significance** -- If the difference is less than 2x the standard deviation, it's probably noise.
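
Rule 5 can be applied mechanically to two sets of measurements. A rule-of-thumb sketch (not a proper t-test; the helper name is illustrative):

```python
from statistics import mean, stdev

def is_significant(baseline: list[float], candidate: list[float]) -> bool:
    """Treat a difference smaller than 2x the larger stddev as noise (rule 5)."""
    spread = max(stdev(baseline), stdev(candidate))
    return abs(mean(baseline) - mean(candidate)) > 2 * spread

old = [10.1, 10.3, 9.9, 10.2, 10.0]  # seconds per run
new = [8.0, 8.2, 7.9, 8.1, 8.0]
print(is_significant(old, new))  # True
```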

---

## Interpreting Results

### Reading Flamegraphs

Flamegraphs aggregate sampled call stacks. The x-axis is **not** time -- stack frames are sorted alphabetically. Width represents the proportion of total samples.

- **Wide bars at the top** = functions that consume a lot of CPU directly
- **Wide bars at the bottom** = functions that call expensive children
- **Plateaus** (flat tops) = functions where time is spent in the function itself, not its children
- **Look for:** the widest bars at the top of the graph -- these are your hot paths

### Identifying Hot Paths

A hot path is the sequence of function calls that consumes the most cumulative time:

1. Sort by cumulative time (`cumtime` in cProfile)
2. Find the top-level function with the highest cumulative time
3. Follow its callees -- which child function consumes the most?
4. Repeat until you reach a leaf function

The hot path tells you where optimization effort will have the most impact.
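
The walk above can be done interactively with the stdlib `pstats` module. A self-contained sketch -- the profiled expression is a stand-in for a real workload:

```python
import cProfile
import pstats

# Create a small profile so the example runs anywhere
cProfile.run("sorted(range(100000), key=str)", "profile.prof")

p = pstats.Stats("profile.prof")
p.sort_stats("cumulative")
p.print_stats(5)           # steps 1-2: top functions by cumulative time
p.print_callees("sorted")  # step 3: what does the top function spend time in?
```

Repeat `print_callees` on the biggest child until you hit a leaf -- that chain is the hot path.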

### Memory Leak Patterns

Signs of a memory leak:
- Memory usage grows linearly with time/requests
- `gc.collect()` doesn't reclaim memory
- Heap snapshots show growing object counts for a specific type

Common causes:
- **Unbounded caches** -- dictionaries that grow forever. Fix: use `functools.lru_cache(maxsize=N)` or TTL-based caching.
- **Event listener accumulation** -- listeners added but never removed. Fix: use weak references or explicit cleanup.
- **Circular references with `__del__`** -- before Python 3.4, the GC couldn't collect cycles with finalizers; even now, `__del__` in a cycle makes destruction order unpredictable. Fix: use `weakref` to break the cycle.
- **Global state accumulation** -- appending to module-level lists. Fix: scope the collection to the request/session lifecycle.
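
The unbounded-cache fix is worth seeing side by side. A sketch (the function names are illustrative; `n * n` stands in for an expensive computation):

```python
from functools import lru_cache

# Leaky version: the module-level dict grows forever
_cache: dict[int, int] = {}

def slow_square_leaky(n: int) -> int:
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]

# Bounded version: lru_cache evicts least-recently-used entries
@lru_cache(maxsize=1024)
def slow_square(n: int) -> int:
    return n * n

for i in range(5000):
    slow_square(i)
print(slow_square.cache_info().currsize)  # 1024 -- memory use is capped
```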

> **Deep dive:** See `references/interpreting-results.md` for annotated examples of profiler output and how to read them.

---

## Ambiguity Policy

These defaults apply when the user does not specify a preference. State the assumption when making a choice:

- **Profiler choice:** Default to py-spy for Python (low overhead, flamegraph output), clinic.js for Node.js, and Chrome DevTools for the browser. Use cProfile when the user needs exact call counts.
- **Benchmark iterations:** Default to at least 10 iterations with 3 warmup runs. Increase for sub-millisecond operations.
- **Metric focus:** Default to wall-clock time. Switch to CPU time when I/O is deliberately excluded. Switch to memory when the user mentions "memory", "leak", or "OOM".
- **Optimization scope:** Optimize only the identified hot path. Do not refactor surrounding code for "consistency" unless it's part of the hot path.
- **Baseline requirement:** Always establish a baseline measurement before optimizing. Refuse to optimize without one -- "it feels slow" is not a baseline.
- **Reporting:** Report absolute numbers (ms, MB) alongside relative improvements (%). A 50% improvement from 2ms to 1ms matters less than a 10% improvement from 10s to 9s.

---

## Reference Files

| File | Contents |
|------|----------|
| `references/tool-commands.md` | Full command reference for Python, JavaScript, and system profiling tools with all flags and options |
| `references/interpreting-results.md` | How to read profiler output: annotated cProfile tables, flamegraph walkthroughs, memory timeline interpretation, and benchmark result analysis |