RubyGems - stud-finder - Versions diffs - 0.1.0 - Mend

stud-finder 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (27)

checksums.yaml +7 -0
data/CHANGELOG.md +17 -0
data/LICENSE +21 -0
data/PRODUCT.md +172 -0
data/README.md +176 -0
data/VISION.md +151 -0
data/bin/stud-finder +6 -0
data/lib/stud-finder.rb +3 -0
data/lib/stud_finder/churn.rb +111 -0
data/lib/stud_finder/cli.rb +771 -0
data/lib/stud_finder/complexity.rb +104 -0
data/lib/stud_finder/coverage/cobertura.rb +59 -0
data/lib/stud_finder/coverage/detector.rb +26 -0
data/lib/stud_finder/coverage/lcov.rb +93 -0
data/lib/stud_finder/coverage/resultset.rb +103 -0
data/lib/stud_finder/diff.rb +53 -0
data/lib/stud_finder/edges.rb +113 -0
data/lib/stud_finder/fan_in.rb +243 -0
data/lib/stud_finder/file_collector.rb +152 -0
data/lib/stud_finder/js_complexity.rb +203 -0
data/lib/stud_finder/js_fan_in.rb +121 -0
data/lib/stud_finder/normalizer.rb +38 -0
data/lib/stud_finder/scorer.rb +171 -0
data/lib/stud_finder/temporal_coupling.rb +104 -0
data/lib/stud_finder/version.rb +5 -0
data/lib/stud_finder.rb +14 -0
metadata +199 -0

checksums.yaml ADDED

@@ -0,0 +1,7 @@
+---
+SHA256:
+  metadata.gz: 187e021ca929ec91327d22576d8e834b085dfcddcf68915c14ada124b1b99d5d
+  data.tar.gz: 840844ff17652ab19c769a52f8bc5a3162de63bb4d47827664f4e9ca58433520
+SHA512:
+  metadata.gz: 43b47e35c51c450555ef66ae9f879979242fe8e5cb24119de98ea1b13f12b3befefe8d5f28a4443b35ead8691138eff6214fd6a36a5a5a851f0c4f64a3affae0
+  data.tar.gz: '049bd829d40c7f63417c251875670df9f6f1544b53d71d484adb5d0c129b64fdeb6074a6b4fe2fcc79c3cff3972bdd5dfa44827a36ac37d574dd2684f2c66fc9'
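
The digests in `checksums.yaml` above can be used to check a downloaded archive. The following is a minimal sketch, assuming the YAML layout shown in the diff; the helper names `expected_sha256` and `sha256_matches?` are hypothetical and are not part of stud-finder or RubyGems.

```ruby
require 'digest'
require 'yaml'

# Hypothetical helper: look up the expected SHA256 hex digest for one archive
# (e.g. 'data.tar.gz') in a checksums.yaml-style document.
def expected_sha256(checksums_yaml, archive_name)
  YAML.safe_load(checksums_yaml).fetch('SHA256').fetch(archive_name).to_s
end

# Hypothetical helper: true when the file on disk hashes to the expected digest.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.strip.downcase
end
```

A call such as `sha256_matches?('data.tar.gz', expected_sha256(File.read('checksums.yaml'), 'data.tar.gz'))` would then report whether the unpacked archive matches the published digest, assuming both files sit in the current directory.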

data/CHANGELOG.md ADDED

@@ -0,0 +1,17 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [0.1.0] - 2026-07-03
+### Added
+- Initial RubyGems release of `stud-finder`, a CLI that ranks files by five risk signals: fan-in (blast radius), fan-out, cyclomatic complexity, git churn, and test coverage.
+- Temporal coupling analysis for identifying files that change together.
+- Diff mode for scoring only files changed in a pull request while preserving repo-relative scores.
+- Ruby and JavaScript/TypeScript support.
+- Table, JSON, CSV, and Markdown output formats.
+- Lexical constant resolution for Ruby fan-in analysis, including the Task 1 fix from PR #33.

data/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2026 Fernando Baz
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.

data/PRODUCT.md ADDED

@@ -0,0 +1,172 @@
+# stud-finder
+**Find the files that will hurt you before they do.**
+---
+## The Problem
+Every codebase has load-bearing walls. Files that dozens of other files depend on. Files that change every sprint. Files whose complexity means one wrong edit cascades into a day of debugging.
+Most teams discover these files the hard way — after the incident.
+stud-finder surfaces them before you touch them.
+---
+## What It Does
+stud-finder analyzes a codebase and produces a ranked list of every file, scored by structural risk. Run it before a sprint, before a refactor, before a code review. Know which files deserve extra attention before anyone writes a line.
+```
+$ stud-finder ./my-rails-app
+FILE                              SCORE  LABEL   FAN_IN  COMPLEXITY  CHURN_COMMITS  CHURN_LINES  CHURN_PCT  COVERAGE
+app/models/user.rb                0.91   trunk   0.97    0.42        0.88           0.91         0.89       0.14
+app/services/payment_service.rb   0.84   trunk   0.78    0.91        0.71           0.68         0.69       0.22
+app/controllers/orders_controller 0.73   branch  0.61    0.65        0.74           0.77         0.75       0.31
+...
+```
+Three labels, one decision framework:
+- **Trunk** — load-bearing. Change with care. High review bar.
+- **Branch** — meaningful coupling. Worth a second look.
+- **Leaf** — isolated. Lower risk. Move fast here.
+---
+## The Five Signals
+Each file is scored on up to five independently measured signals, each grounded in decades of software engineering research. M7 introduced scored `fan_out`, so the composite now considers both incoming blast radius and outgoing coupling burden.
+### 1. Fan-in — Blast Radius
+*"How many files depend on this one?"*
+Rooted in Robert Martin's **afferent coupling (Ca)** metric (1994) and graph theory in-degree analysis. A fan_in of 60 means 60 other files depend on this one and may break when it breaks. The Stable Dependencies Principle says: high-coupling files must be treated as infrastructure.
+stud-finder builds the dependency graph via static analysis — Zeitwerk constant mapping for Rails, falling back to AST scanning. No runtime instrumentation required.
+**Weight: 25% of total score**
+### 2. Fan-out — Coupling Burden
+*"How many files does this one depend on?"*
+Rooted in Robert Martin's **efferent coupling (Ce)** metric. A high `fan_out` file has more direct dependencies to understand, coordinate, and mock in tests. In M7, fan-out moved from an informational column into the scored model.
+**Weight: 10% of total score**
+### 3. Complexity — Cognitive Load
+*"How hard is this file to reason about?"*
+Cyclomatic complexity, measured as the **maximum across any single method** in the file. A file with one function of complexity 12 is riskier than a file with ten functions of complexity 3 each — the hardest function determines how deep you have to go.
+Computed via RuboCop's static analysis engine. No manual annotation.
+**Weight: 25% of total score**
+### 4. Churn — Change Velocity
+*"How often is this file being touched, and how much?"*
+A composite signal: 50% commit frequency + 50% lines changed, both percentile-ranked across the full codebase. A file touched in 40 commits but only for small fixes is different from a file touched in 40 commits with major rewrites each time.
+Computed from git history over a configurable window (default: 180 days). Language-agnostic.
+**Weight: 25% of total score**
+### 5. Coverage — Safety Net
+*"If this file breaks, will tests catch it?"*
+Low coverage on a high-risk file is compounded danger — no blast-radius detection, no complexity safety net, no test catch. Coverage is measured as an inverse (0% coverage = maximum penalty), and files absent from the coverage report are handled via coverage fallback rather than penalized falsely.
+Supports Cobertura XML (RSpec + SimpleCov), LCOV (Jest, lcov), and SimpleCov JSON resultsets. Auto-detected by file extension.
+**Weight: 15% of total score** (optional — runs as 4-factor model when no coverage report provided)
+---
+## The Score
+Each signal is percentile-ranked across the full codebase — so scores are always relative to the project itself, not an external benchmark. A file at the 90th percentile of fan_in has more incoming dependencies than 90% of its peers.
+The composite score (0.0–1.0) weights the signals and produces the ranked output. Classification thresholds are configurable.
+**4-factor formula (no coverage):**
+```
+score = 0.2941 × fan_in_pct + 0.1176 × fan_out_pct + 0.2941 × complexity_pct + 0.2941 × churn_pct
+```
+**5-factor formula (with coverage):**
+```
+score = 0.25 × fan_in_pct + 0.10 × fan_out_pct + 0.25 × complexity_pct + 0.25 × churn_pct + 0.15 × (1 − coverage)
+```
+---
+## Use Cases
+**Pre-sprint risk assessment** — before planning, run stud-finder against the files your team is about to touch. Trunk files get more review time budgeted.
+**Refactor prioritization** — you have ten candidates for cleanup. stud-finder tells you which ones have the highest blast radius if the refactor goes wrong.
+**Onboarding** — a new engineer is joining the team. Here's the trunk map. These are the files you ask about before changing.
+**PR review triage** — reviewer bandwidth is finite. Direct it at the files that matter.
+**Architecture health monitoring** — run stud-finder weekly. Watch if trunk is growing or shrinking. Trunk growth is a coupling smell.
+---
+## Technical Foundation
+- **Language:** Ruby gem, zero runtime instrumentation
+- **Static analysis:** RuboCop (complexity), Zeitwerk + custom AST (fan_in), git log (churn)
+- **Coverage formats:** Cobertura XML, LCOV, SimpleCov JSON — auto-detected
+- **Output formats:** table (default), JSON, CSV, Markdown
+- **Configuration:** CLI flags for weights, thresholds, excludes, churn window
+- **Requires:** Ruby, RuboCop, git. Nothing else for Ruby analysis.
+---
+## Roadmap
+**M1–M3 — Complete**
+Initial composite score (Ruby + JS/TS). `--diff-base` / `--only` filter for per-PR output. Per-PR CircleCI integration — stud-finder runs on every PR, posts ranked artifact and PR comment. Non-blocking.
+**M4 — Complete: fan-out, instability, `stud-finder edges`**
+Fan-out (efferent coupling) and instability (`fan_out / (fan_in + fan_out)`) added to every row in the core output. New `stud-finder edges FILE` subcommand emits the actual dependency edge list for a specific file — dependents and dependencies, both sorted by risk score. Shifts the output from "this file scores high" to "here are the specific files in the blast radius."
+**M5 — Sentry integration**
+Connect to the Sentry REST API. Parse production stack traces, aggregate error frequency by source file. A runtime signal: not structural approximation but observed failure in production. `--sentry-token`, `--sentry-org`, `--sentry-project` flags. Percentile-ranked and added to the composite score.
+**M6 — Temporal coupling**
+Co-change frequency from git history: file pairs that change together more often than expected by chance. Surfaces hidden coupling that static analysis cannot see — implicit contracts, shared state, callback side effects. Observed behavior, not structural approximation.
+**Pinned — Producer-consumer dependency mapping**
+Explicitly surfacing which components consume data produced by other components, flagging pairs with high temporal coupling but low static coupling as candidates for explicit contract documentation.
+**M7 — Complete: scored fan-out + rankings**
+Scored fan-out introduced as the fifth risk signal with a 10% default weight.
+**M7 follow-up — Merge-to-staging S3 timeline (lowest priority)**
+Full stud-finder run on each merge to the mainline branch → JSON → S3, keyed by timestamp + commit SHA. Durable risk-over-time feed for trend analysis.
+**Future — Toward a validated risk estimator**
+Calibrated weights back-tested against bug history. Historical bug density as a direct input metric. Change-scope awareness (per-PR risk = file-risk × change-magnitude × change-type). Test quality beyond line coverage. See `VISION.md` for the full analysis.
+---
+## Why stud-finder?
+In construction, a stud finder locates the load-bearing structure inside a wall before you drill. You don't guess — you know exactly where the structure is.
+Same principle. Before you refactor, before you sprint, before you review — know where the load-bearing code is.
+---
+*Built by Artífice. Ruby gem. Open to collaboration.*
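
The two score formulas in PRODUCT.md above are related by a simple re-normalization: dropping the 15% coverage weight and dividing the remaining weights by 0.85 gives the 4-factor coefficients (0.25 / 0.85 ≈ 0.2941, 0.10 / 0.85 ≈ 0.1176). The sketch below illustrates that arithmetic and the 50/50 churn composite. It is not the gem's `scorer.rb`; the helper names are hypothetical, and inputs are assumed to be percentiles and coverage fractions in the range 0.0 to 1.0.

```ruby
# Default 5-factor weights as listed in PRODUCT.md.
DEFAULT_WEIGHTS = {
  fan_in: 0.25, fan_out: 0.10, complexity: 0.25, churn: 0.25, coverage: 0.15
}.freeze

# Hypothetical helper: drop the coverage weight and rescale the rest to sum to 1.0.
def four_factor_weights(weights = DEFAULT_WEIGHTS)
  kept = weights.reject { |name, _| name == :coverage }
  total = kept.values.sum
  kept.transform_values { |w| w / total }
end

# Hypothetical helper: churn as 50% commit-frequency percentile + 50% lines-changed percentile.
def churn_pct(commits_pct, lines_pct)
  0.5 * commits_pct + 0.5 * lines_pct
end

# Hypothetical helper: composite score. `coverage` is line coverage (0.0..1.0),
# or nil when no coverage report was supplied (4-factor mode).
def composite_score(fan_in_pct:, fan_out_pct:, complexity_pct:, churn_pct:, coverage: nil)
  signals = {
    fan_in: fan_in_pct, fan_out: fan_out_pct,
    complexity: complexity_pct, churn: churn_pct
  }
  if coverage.nil?
    weights = four_factor_weights
  else
    weights = DEFAULT_WEIGHTS
    signals[:coverage] = 1.0 - coverage # coverage is scored as an inverse
  end
  signals.sum { |name, value| weights.fetch(name) * value }
end
```

Deriving the 4-factor weights from the 5-factor ones, rather than hard-coding both sets, keeps the two formulas consistent if custom weights are supplied; whether the gem itself does this is an assumption.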

data/README.md ADDED

@@ -0,0 +1,176 @@
+# stud-finder
+**Find the files that will hurt you before they do.**
+A code risk scoring CLI for Ruby and JavaScript/TypeScript codebases. Ranks every file by structural risk so you know where to put your senior review effort, your refactoring time, and your test coverage — *before* the incident.
+```
+$ bundle exec bin/stud-finder ./my-rails-app
+RANK  LANGUAGE  FILE                              SCORE  CLASS   FAN_IN  FAN_OUT  COMPLEXITY  CHURN_COMMITS  MAX_COUPLING  COUPLING_PARTNERS  COVERAGE
+1     ruby      app/models/proficiency.rb         0.91   trunk   223     4        85          11             0.62          3                  0.99
+2     ruby      app/services/payment_service.rb   0.84   trunk   78      12       91          42             0.71          5                  0.22
+3     ruby      app/controllers/orders_controller 0.73   branch  61      9        65          74             0.48          2                  0.31
+4     js        src/components/Dashboard.tsx      0.68   branch  44      18       56          18             0.00          0                  —
+...
+```
+---
+## Install
+After the gem is published:
+```bash
+gem install stud-finder
+```
+Or add it to your Gemfile:
+```ruby
+gem 'stud-finder'
+```
+Then run `bundle install`.
+For edge or unreleased changes, install from git:
+```ruby
+gem 'stud-finder', git: 'https://github.com/bazfer/stud-finder'
+```
+Or clone and run directly.
+**Requirements:** Ruby >= 3.2. For JavaScript support, install `dependency-cruiser` and `eslint` in the target project (`npm install -D dependency-cruiser eslint`).
+---
+## Usage
+The path is positional. Everything else is optional flags.
+```bash
+bundle exec bin/stud-finder PATH [options]
+```
+### Common runs
+```bash
+# Basic: rank every file in the project
+bundle exec bin/stud-finder ./my-rails-app
+# CSV output for spreadsheet review
+bundle exec bin/stud-finder ./my-rails-app --output csv > risk.csv
+# Top 50 highest-risk files, markdown for a PR comment
+bundle exec bin/stud-finder ./my-rails-app --top 50 --output markdown
+# With coverage signals (5-factor scoring)
+bundle exec bin/stud-finder ./my-rails-app \
+  --ruby-coverage ./coverage/resultset.json \
+  --js-coverage ./coverage/lcov.info
+```
+---
+## The Five Signals
+Each file is scored on up to five independently measured signals. See [PRODUCT.md](PRODUCT.md) for the full theory and weighting math.
+| Signal | What it measures | Weight |
+|--------|------------------|--------|
+| **fan_in** | How many other files depend on this one (blast radius) | 25% |
+| **fan_out** | How many other files this one depends on (its own coupling burden) | 10% |
+| **complexity** | Cyclomatic complexity of the hardest method in the file | 25% |
+| **churn** | Commit frequency + line volume over a 180-day window | 25% |
+| **coverage** | Inverse of line coverage (lower coverage = higher risk) | 15% |
+When coverage isn't available, the remaining four signals (fan_in, fan_out, complexity, churn) re-normalize to 100% automatically (4-factor mode).
+### Informational columns (not scored)
+These ride alongside the score to give reviewers extra context, but do not contribute to it:
+- **instability** / **instability_pct** — `fan_out / (fan_in + fan_out)`, and its percentile rank across the repo. High instability = depends on a lot while little depends on it.
+- **max_coupling** / **max_coupling_partner** / **coupling_partners** / **coupling_pct** — temporal coupling from git history: the strongest co-change ratio with any partner file, the path of that strongest partner, how many partners cross the threshold, and the percentile rank of `max_coupling`. The analysis produces co-change pairs; each file's row keeps the strongest pair's ratio (`max_coupling`), that partner's path (`max_coupling_partner`), and the count of pairs (`coupling_partners`). On ties the strongest partner is chosen deterministically: highest coupling, then highest co-change count, then alphabetical path; `max_coupling_partner` is an empty string when a file has no qualifying partners. Computed once over the full file set in the main scan (one extra `git log` pass), so cross-language co-change is captured. Same thresholds as the `edges` subcommand (`--coupling-threshold`, `--coupling-min-commits`).
+Files are classified into three labels based on their **fan_in percentile** (not the total score):
+- **trunk** — fan_in in the top 15% (default `trunk_threshold: 85`). Load-bearing. High review bar, change with care.
+- **branch** — fan_in between the 50th and 85th percentile (default `branch_threshold: 50`). Meaningful coupling.
+- **leaf** — everything below the 50th percentile. Isolated. Move fast here.
+The total score still drives the ranking. The class label is a separate coupling-based signal.
+---
+## Language Support
+**Ruby:**
+- fan_in via Zeitwerk constant mapping (Rails-aware), AST fallback
+- complexity via RuboCop
+- coverage: SimpleCov resultset JSON, Cobertura XML
+**JavaScript / TypeScript (.js, .jsx, .ts, .tsx):**
+- fan_in via `dependency-cruiser` (must be installed in the target project)
+- complexity via `eslint` (`--rule '{"complexity":["error",0]}'`)
+- coverage: LCOV (`.info` format)
+Each language gets its own ranking section in the output — Ruby and JS are not pooled.
+---
+## Flag Reference
+| Flag | Description |
+|------|-------------|
+| `--output FORMAT` | `table` (default), `json`, `markdown`, `csv` |
+| `--ruby-coverage PATH` | Ruby coverage report (SimpleCov `.json` or Cobertura `.xml`) |
+| `--js-coverage PATH` | JavaScript coverage report (LCOV `.info`) |
+| `--coverage PATH` | Deprecated alias for `--ruby-coverage` |
+| `--js-timeout N` | dependency-cruiser timeout in seconds (default: 60) |
+| `--churn-days N` | Commit lookback window in days (default: 180) |
+| `--weights WEIGHTS` | Custom weights as fractions, e.g. `fan_in:0.25,fan_out:0.10,complexity:0.25,churn:0.25,coverage:0.15`. Defaults shown. All five keys are required. |
+| `--trunk-threshold N` | fan_in percentile cutoff for trunk classification (default: 85) |
+| `--branch-threshold N` | fan_in percentile cutoff for branch classification (default: 50) |
+| `--exclude PATTERN` | Exclude glob pattern (repeatable). `spec/` and `test/` excluded by default. |
+| `--top N` | Emit only the top N results |
+| `--diff-base REF` | Score the whole repo but emit only the files changed on `HEAD` vs the merge-base with `REF` (e.g. `origin/staging`). Ranks and scores stay relative to the full repo. Ideal for per-PR runs. |
+| `--only PATHS` | Emit only these comma-separated repo-relative paths. Like `--diff-base` but with an explicit list instead of a git diff. Mutually exclusive with `--diff-base`. |
+| `--min-files N` | Advisory minimum file count to trust percentiles (default: 20) |
+| `--verbose` | Print suppressed per-file warnings to stderr |
+| `--version`, `--help` | Self-explanatory |
+---
+## Output Formats
+- `table` — human-readable, aligned columns
+- `csv` — spreadsheet-friendly, pipe to a file
+- `json` — machine-readable with `meta`, `warnings`, `ruby`, `javascript` sections
+- `markdown` — drop directly into a PR comment or issue
+---
+## What It's For
+Run it:
+- Before a sprint, to see what the team is about to touch
+- Before a major refactor, to identify the load-bearing walls
+- Before a code review, to know which PRs deserve extra scrutiny
+- On every PR in CI, as a risk-tagged diff context
+Don't run it as a gate — risk isn't a binary blocker. Run it as input to human judgment.
+---
+## Documentation
+- **[PRODUCT.md](PRODUCT.md)** — theory, formulas, and the research behind each signal
+- **[VISION.md](VISION.md)** — project vision and positioning
+---
+## License
+MIT. See [LICENSE](LICENSE).
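
The README above classifies files by fan_in percentile rather than by total score. A minimal sketch of that rule follows. It assumes a percentile is the share of files with a strictly smaller fan_in and that the thresholds are inclusive; both are illustrative choices, and `percentile_ranks` and `classify` are hypothetical names, not the gem's API.

```ruby
# Hypothetical helper: percentile rank (0..100) per file, defined here as the
# percentage of files in the repo with a strictly smaller value.
def percentile_ranks(values_by_file)
  all = values_by_file.values
  values_by_file.transform_values do |value|
    below = all.count { |other| other < value }
    100.0 * below / all.size
  end
end

# Hypothetical helper: map a fan_in percentile to a label using the README's
# default thresholds (trunk: 85, branch: 50).
def classify(fan_in_percentile, trunk_threshold: 85, branch_threshold: 50)
  return :trunk if fan_in_percentile >= trunk_threshold
  return :branch if fan_in_percentile >= branch_threshold

  :leaf
end
```

With ten files whose fan_in counts are 0 through 9, the file with fan_in 9 ranks at the 90th percentile and is labeled trunk, the file with fan_in 5 sits at the 50th and is labeled branch, and everything below that is a leaf.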

data/VISION.md ADDED

@@ -0,0 +1,151 @@
+# stud-finder — Vision & Roadmap to Risk Estimator
+## What stud-finder is today
+A **triage and orientation tool**. It surfaces which files deserve attention before you touch them, based on four structural signals: fan-in (blast radius), complexity (cognitive load), churn (change velocity), and coverage (safety net). Scores are percentile-ranked across the full codebase — so the output is always relative to the project itself.
+Legitimate uses today:
+- Bootstrapping orientation in an unfamiliar codebase — fast
+- Prioritizing which modules get a formal stabilization review first
+- Making implicit architectural risk explicit ("everyone knows employee.rb is risky" — now there's a number)
+- Directing reviewer bandwidth at the files that matter
+---
+## The honest limits of the current model
+**1. Coupling ≠ correctness.**
+Fan-in measures blast radius, not bug probability. High fan-in files (`employee.rb`, `objective_template.rb`) tend to get the most attention, the most tests, the most experienced eyes. They can be riskier to change, but they may also be the best-maintained files in the codebase. High structural score does not mean high bug rate.
+**2. The weights are invented.**
+`fan_in: 0.35, complexity: 0.25, churn: 0.25, coverage: 0.15` — these were chosen on first principles. They haven't been back-tested against actual bug outcomes. Without fitting, the score is a ranking, not an estimate.
+**3. Bugs live at interfaces, not in files.**
+The dominant class of production bug — traced across 30 real incidents — is not "a single high-risk file was wrong." It's an implicit contract between a producer and a consumer that was never explicitly defined, then violated by a refactor or a lifecycle change. No individual file scores high; the interface between them is broken. File-level scoring misses this entire class.
+**4. File-risk ≠ change-risk.**
+A one-line comment edit to `employee.rb` is not the same risk as a 400-line refactor. The current score is on the file, not on the change. Without change-scope awareness, the score can't distinguish.
+**5. Coverage measures execution, not correctness.**
+Line coverage tells you which lines run during tests. It doesn't tell you whether the tests assert the invariants that matter. Files with 90%+ coverage have produced serious production bugs because the broken invariant was never tested.
+**6. No runtime signal.**
+Static analysis is backward-looking about structure. Where code is actually failing in production right now is a stronger signal than where it looks risky structurally.
+---
+## Why similar tools don't fully exist
+CodeClimate measures complexity + churn. Structure101 measures coupling. Danger and CodeOwners operate on change shape. Nobody has combined all of these into a single validated risk score for review prioritization — because:
+- Signal-to-noise at the file level is low without calibration against outcomes
+- Teams that need precision use formal methods (property-based testing, TLA+, invariant documentation) on specific risky subsystems — not file-level rankings
+- The actuators (review depth, staging gate) are hard to connect to file topology alone
+This is not a reason not to build it. It's a reason to be honest about what validation is required before calling it an estimator.
+---
+## The path to a genuine risk estimator
+### Already built (M1–M3)
+- 4-signal composite score (fan_in, complexity, churn, coverage)
+- `--diff-base` / `--only` filter — per-PR output scoped to touched files, full-repo scoring preserved
+- Per-PR CircleCI job — stud-finder runs on every PR, posts markdown + CSV artifact and PR comment
+### M4 — Fan-out, instability, and `stud-finder edges`
+Retain the dependency graph that fan_in already builds internally (and discards). Two deliverables:
+**Fan-out + instability in the core output (every row):**
+- `fan_out` — raw count of files this file depends on (efferent coupling)
+- `instability` — Robert Martin's metric: `fan_out / (fan_in + fan_out)`. Bounded [0, 1]. A file with fan_in=100, fan_out=10 → instability=0.09 (stable). A file with fan_in=2, fan_out=50 → instability=0.96 (fragile consumer).
+Fan-out captures a different failure mode than fan-in. Several production bugs in the Covalent 2026 incident set were fan-out failures: a consumer depended on a fragile implicit contract, not on a high-blast-radius file. Fan-in alone would not have flagged them.
+Instability is not yet added to the composite score — first validate it against known fan-out bugs (CO-21367 is the reference case), then calibrate the weight.
+**`stud-finder edges FILE [PATH]` subcommand:**
+Drill-down for a specific file. Emits:
+- Dependents — files that depend on this file (incoming edges), sorted by score desc
+- Dependencies — files this file depends on (outgoing edges), sorted by score desc
+### M5 — Sentry integration
+Query the Sentry REST API for production issues, parse stack trace frames, aggregate by source file. `sentry_events[file]` = distinct production errors that touched this file. Percentile-ranked as a scored signal.
+This is the only runtime signal in the stack — not "this file looks risky structurally" but "this file is actually in the stack when things break in production." Stronger than any static approximation.
+CLI: `--sentry-token`, `--sentry-org`, `--sentry-project`. Main implementation challenge: path normalization (Sentry frame paths → repo-relative paths).
+### M6 — Temporal coupling
+Co-change frequency from git history: file pairs that change together in the same commit more often than expected by chance. This captures hidden coupling that static analysis cannot see — implicit contracts, shared state, callback side effects that always require coordinated edits.
+This is the most empirically defensible structural metric in the roadmap — observed behavior in real production git history, not a theoretical approximation. Files that always change together have a hidden dependency. If that dependency is not explicit, it's a risk.
+### Pinned — Producer-consumer dependency mapping
+Explicitly mapping what data each component consumes and who produces it. The interface between a data producer (a clone/publish flow, an import pipeline) and a data consumer (a blueprint serializer, a dashboard query) is where the hardest-to-detect bugs live. These interfaces are not visible to fan-in analysis because they operate through data shape (a join table schema, a JSON payload structure) rather than constant references.
+This is a design artifact — an explicit contract document — not just a metric. The tooling question: can stud-finder surface candidate producer-consumer pairs (files with high temporal coupling but low static coupling) and flag them for explicit contract documentation?
+### What's still missing to reach "validated estimator"
+These five items are the gap between "plausible ranking" and "calibrated risk estimator":
+**1. Calibrated weights (back-tested against bug history).**
+Run stud-finder against git history at the introducing commit of each known production bug. What were the scores of the introducing files? Fit the weights so the score would have ranked those files highly before the bug surfaced. The seed data for this already exists: a 57-ticket CSV of true production issues with introducing PRs traced. This is a supervised fitting problem — the labels (bug/no-bug) and the features (fan_in, complexity, churn) are both available.
+**2. Historical bug density as a direct input metric.**
+Count production bugs introduced per file over a trailing window (e.g., 2 years). This is the strongest single signal available — it is the outcome variable itself, used as a predictor. Files that score high structurally AND have prior bug density are high-confidence risk. Files that score high structurally but have zero prior bugs may be well-maintained trunks. Combined with structural signals, this dramatically improves precision.
+**3. Change-scope awareness (delta-risk = file-risk × change-magnitude × change-type).**
+File-risk × lines changed is a trivially computable multiplier already partially available in the per-PR job. Change-type (touching a public interface vs. an internal method vs. a query) is harder — static analysis cannot distinguish reliably. LLM-based semantic classification of the diff is the natural extension here: classify the change type, feed it into a per-PR risk score that is change-specific, not file-specific.
+**4. Test quality beyond line coverage.**
+Mutation score — does killing a line cause a test failure? — is expensive to compute but dramatically more signal-rich than line coverage. Even assertion density (assertions per tested line) would improve on raw coverage. The goal: distinguish "this file has 90% coverage with meaningful assertions" from "this file has 90% coverage with happy-path-only tests that missed the broken invariant."
+**5. Runtime signals (Sentry / error rates per file).**
+Where code is actually failing in production is a stronger signal than where it looks structurally risky. A file with moderate structural score but high Sentry event rate is more dangerous than a high-score file that has never produced an error. Sentry API integration — mapping error events back to source files — would make stud-finder empirical at the risk-in-production level, not just structural.
+---
+## What each milestone unlocks
+| Milestone | What it enables |
+|-----------|----------------|
+| M4 `stud-finder edges` | Actionable blast-radius view per file; fan-out as a new signal class |
+| M6 temporal coupling | Empirical hidden coupling detection; surfaces implicit contracts |
+| Pinned producer-consumer | Framework for explicit interface contracts; feeds stabilization review docs |
+| Calibrated weights | Score becomes an estimate, not just a ranking |
+| Historical bug density | Strongest predictor; validates structural signals against outcomes |
+| Change-scope awareness | Per-PR risk score, not per-file; connects to actual review decisions |
+| Test quality | Coverage signal becomes meaningful rather than gameable |
+| Runtime signals | Ground truth: where code is failing right now |
+**Immediately actionable (data already in hand):** calibrated weights + historical bug density. The 57-ticket CSV with introducing PRs is the training set.
+**Near-term with existing infrastructure:** change-scope awareness (lines diff is in the per-PR job; LLM classification via Covy is a natural extension).
+**Longer-term:** mutation score, Sentry integration.
+---
+## The ceiling without validation
+Without calibrated weights and historical bug density (items 1 and 2 above), stud-finder identifies the right *neighborhoods* to be suspicious of but cannot say how suspicious to be about a specific change. It remains a triage tool — valuable, but not an estimator.
+With items 1 and 2, stud-finder becomes a validated estimator: the score has a known relationship to bug probability, not just a plausible structural theory. With item 3 (change-scope), it becomes actionable at the PR level — where the review and staging gate decisions actually happen.
+---
+## Updated milestone roadmap
+| Milestone | Status | Description |
+|-----------|--------|-------------|
+| M1 | Done | 4-signal composite score, Ruby |
+| M2 | Done | JS/TS support, `--diff-base` / `--only` filter |
+| M3 | Done (PR open) | Per-PR CircleCI job — artifact + PR comment |
+| M4 | Next | Fan-out + instability in core output; `stud-finder edges FILE` subcommand |
+| M5 | Queued | Sentry integration — runtime error frequency as a scored signal |
+| M6 | Queued | Temporal coupling — co-change frequency from git history |
+| Pinned | Queued | Producer-consumer dependency mapping |
+| M7 | Lowest prio | Merge-to-staging S3 timeline producer |
+| Future | Backlog | Calibrated weights, historical bug density, change-scope LLM classification, mutation score |
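
The instability metric described in VISION.md above, `fan_out / (fan_in + fan_out)`, can be checked against the two worked examples in the M4 section. This is a sketch only: the method name is hypothetical, and returning 0.0 for a file with no edges at all is an assumption, since the source does not say how that case is handled.

```ruby
# Hypothetical helper: Robert Martin's instability, bounded to 0.0..1.0.
def instability(fan_in:, fan_out:)
  total = fan_in + fan_out
  return 0.0 if total.zero? # assumption: an isolated file is treated as stable

  fan_out.to_f / total
end

stable  = instability(fan_in: 100, fan_out: 10) # 10 / 110
fragile = instability(fan_in: 2, fan_out: 50)   # 50 / 52
puts format('stable=%.2f fragile=%.2f', stable, fragile)
```

Rounded to two decimals, the results reproduce the 0.09 and 0.96 figures quoted in the document.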

data/bin/stud-finder ADDED

@@ -0,0 +1,6 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+require_relative '../lib/stud_finder/cli'
+exit StudFinder::CLI.start(ARGV)

data/lib/stud-finder.rb ADDED

@@ -0,0 +1,3 @@
+# frozen_string_literal: true
+require 'stud_finder'