sniff-cli 1.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,184 @@
Metadata-Version: 2.4
Name: sniff-cli
Version: 1.0.0
Summary: A terminal-native AI-likelihood detection engine for Git repositories.
Home-page: https://github.com/alphaonelabs/sniff
Author: AlphaOneLabs
Author-email: hello@alphaonelabs.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: typer>=0.12.3
Requires-Dist: rich>=13.7.1
Requires-Dist: questionary>=2.0.1
Requires-Dist: gitpython>=3.1.43
Requires-Dist: nltk>=3.8.1
Requires-Dist: torch>=2.2.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: sentence-transformers>=2.7.0
Requires-Dist: plotille>=5.0.0
Requires-Dist: pyfiglet>=1.0.2
Requires-Dist: anthropic
Requires-Dist: python-dotenv
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Sniff 🐕
**Offline AI Contribution Detection Engine for Git Repositories**

Sniff is a terminal-native AI detection system designed to analyze Git repositories and estimate the likelihood that commits or code contributions were generated or heavily assisted by AI tools.

It combines deterministic structural analysis with a local Large Language Model (GPT-2 via HuggingFace) to provide explainable AI-likelihood scoring, all within a beautiful, interactive terminal interface. **100% offline. Zero cloud APIs. Your code never leaves your machine.**

---

## 1. Problem Statement

**AI-Generated Code Transparency & Governance in Modern Development**

With the rapid rise of AI coding assistants such as GitHub Copilot and ChatGPT, developers are increasingly committing AI-generated code without fully understanding it.

This creates several risks:
- Technical debt accumulation
- Security vulnerabilities
- Loss of code ownership accountability
- Academic integrity violations
- Reduced code quality over time

Currently, Git platforms provide no structured transparency layer to detect or analyze AI-assisted contributions.

### Target Users
- DevSecOps Teams
- Enterprise Engineering Managers
- Academic Institutions
- Open Source Maintainers
- Security Auditors

### Existing Gaps
- No repository-level AI usage analytics
- No explainable AI-likelihood scoring for commits
- No structured governance tools for AI contribution transparency

---

## 2. Root Cause Analysis

AI-generated code often exhibits:
- Highly structured, formal commit messages with low linguistic entropy
- Boilerplate-heavy code patterns with repetitive variable naming
- Large bursts of code additions in physically impossible time windows
- Consistent function scaffolding (docstrings, uniform indentation, predictable naming)

Existing approaches rely on simple keyword matching (fragile) or fully black-box cloud APIs (non-transparent). Sniff aims to be an offline, explainable alternative.

---

## 3. Solution: Tri-Engine ML Architecture

Sniff uses a **three-signal hybrid detection pipeline** to compute a final probabilistic AI-likelihood score for every commit.

### Engine 1: Text Perplexity (NLP)
- Uses a local **HuggingFace GPT-2** model to calculate the log-probability perplexity of commit messages.
- LLMs tend to produce statistically smooth, predictable text (low perplexity), while human writing is chaotic and bursty (high perplexity).
- **Flag:** Perplexity < 30 → Score: 0.9
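
Perplexity itself is simple arithmetic over token log-probabilities. A minimal sketch of that arithmetic (the log-probs below are invented for illustration; Sniff's actual implementation obtains them from the local GPT-2 model):

```python
import math

def perplexity(token_log_probs):
    """exp of the negative mean log-probability over the token sequence."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A model that assigns every token probability 0.5 yields perplexity 2.0;
# less predictable text (lower token probabilities) scores higher.
predictable = [math.log(0.5)] * 8   # smooth, "AI-like" text
bursty = [math.log(0.05)] * 8       # chaotic, "human-like" text
assert perplexity(predictable) < perplexity(bursty)
```

The "Perplexity < 30" flag above then reduces to a single threshold comparison on this value.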

### Engine 2: Code AST Entropy (Structural)
- Parses code additions into a **Python Abstract Syntax Tree (AST)** to analyze structural complexity.
- Detects AI signatures: high docstring density, low lexical entropy, and uniform scaffold patterns.
- Falls back to raw diff heuristics for non-Python code (React/JS/Go), detecting patterns like `useState` + `useEffect` bursts.
- **Flag:** Low variable-uniqueness ratio → Score: 0.3–0.4
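
Docstring density, one of the signals named above, can be measured with the standard-library `ast` module alone. An illustrative sketch (not Sniff's actual code):

```python
import ast

def docstring_density(source: str) -> float:
    """Fraction of function/class definitions that carry a docstring."""
    defs = [node for node in ast.walk(ast.parse(source))
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
    if not defs:
        return 0.0
    return sum(1 for d in defs if ast.get_docstring(d)) / len(defs)

# One of the two definitions below is documented → density 0.5.
snippet = 'def a():\n    """Adds."""\n    return 1\n\ndef b():\n    return 2\n'
assert docstring_density(snippet) == 0.5
```

Unusually high density on a large diff is what the engine treats as an AI signature.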

### Engine 3: Behavioral Velocity (Metadata)
- Cross-references **lines of code added per minute** by parsing GitPython commit timestamps.
- Flags physically impossible typing speeds (> 50 LPM).
- **Flag:** Velocity > 50 LPM → instant +0.4 boost to the final score.
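
The velocity check reduces to dividing added lines by the gap between consecutive commit timestamps. A sketch under the same 50 LPM threshold (helper names are ours, not Sniff's):

```python
from datetime import datetime, timedelta

LPM_THRESHOLD = 50  # lines per minute considered physically implausible

def lines_per_minute(lines_added: int, prev_commit: datetime, commit: datetime) -> float:
    """LOC/min between two consecutive commits by the same author."""
    minutes = max((commit - prev_commit).total_seconds() / 60.0, 1e-6)
    return lines_added / minutes

t0 = datetime(2024, 5, 1, 12, 0)
velocity = lines_per_minute(600, t0, t0 + timedelta(minutes=5))  # 120 LPM
assert velocity > LPM_THRESHOLD
```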

### Score Aggregation
```
final_score = (text × 0.4) + (code × 0.4) + (velocity × 0.2) + amplification_boost
```
- Results in a deterministic, explainable AI-likelihood band: **Likely Human / Mixed / Likely AI-Assisted**
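
The weighted blend above can be sketched directly. The band cutoffs below are illustrative assumptions, since the README does not state them:

```python
def final_score(text: float, code: float, velocity: float, boost: float = 0.0) -> float:
    """Weighted blend of the three engine scores plus any amplification boost, clamped to [0, 1]."""
    return min(max(0.4 * text + 0.4 * code + 0.2 * velocity + boost, 0.0), 1.0)

def likelihood_band(score: float) -> str:
    # Cutoffs assumed for illustration; Sniff's real thresholds are not documented here.
    if score < 0.35:
        return "Likely Human"
    if score < 0.65:
        return "Mixed"
    return "Likely AI-Assisted"

# Low-perplexity message (0.9) + high velocity boost (0.4) dominates the blend.
assert abs(final_score(0.9, 0.4, 0.0, boost=0.4) - 0.92) < 1e-9
```

Clamping keeps a stacked boost from pushing the score past 1.0.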

---

## 4. System Architecture

```
User → [sniff interactive] → Theme Selector → Repository Connect
     → Git Graph Extraction (GitPython)
     → Text Perplexity Engine (GPT-2 local)
     → AST Code Entropy Engine (Python ast)
     → Velocity Behavioral Engine (timestamps)
     → Score Aggregation → Rich Dashboard + Plotille Charts
```

Sniff is **stateless** and requires no external database. All analysis runs in-memory.

---

## 5. Tech Stack

| Layer | Technology |
|---|---|
| Language | Python 3.9+ |
| CLI Framework | Typer |
| UI & Layout | Rich (Tables/Panels) |
| ASCII Charts | Plotille |
| ASCII Typography | PyFiglet |
| Git Data | GitPython |
| NLP Model | HuggingFace Transformers (GPT-2) |
| Code Parsing | Python `ast` |
| ML Backend | PyTorch |

---

## 6. Installation

```bash
git clone https://github.com/mrgear111/sniff.git
cd sniff
python -m venv venv
source venv/bin/activate
pip install -e .
```

---

## 7. Usage

### Interactive REPL
```bash
sniff interactive
```

| Command | Description |
|---|---|
| `cd <path or url>` | Switch repo. Pasted GitHub URLs auto-clone to a local cache |
| `scan [count]` | Analyze the N most recent commits. Default: 10 |
| `stats [count]` | View the contributor AI leaderboard. Default: 50 |
| `theme` | Switch the syntax color theme (Dark / Light / Colorblind) |
| `clear` | Clear the terminal |
| `exit` | Quit the session |

### Headless / CI Mode
```bash
sniff scan --path /path/to/repo --json
sniff stats --path /path/to/repo --json
```

---

## 8. Disclaimer

Sniff relies on statistical ML models and behavioral heuristics. It is a powerful auditing signal, not a definitive legal claim of AI generation. Results should always be reviewed by a human auditor before action is taken.
@@ -0,0 +1,4 @@
[egg_info]
tag_build =
tag_date = 0

@@ -0,0 +1,45 @@
from setuptools import setup, find_packages
from pathlib import Path

# Read the contents of the README file
this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text(encoding="utf-8")

setup(
    name="sniff-cli",
    version="1.0.0",
    author="AlphaOneLabs",
    author_email="hello@alphaonelabs.com",
    description="A terminal-native AI-likelihood detection engine for Git repositories.",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/alphaonelabs/sniff",
    packages=find_packages(),
    install_requires=[
        "typer>=0.12.3",
        "rich>=13.7.1",
        "questionary>=2.0.1",
        "gitpython>=3.1.43",
        "nltk>=3.8.1",
        "torch>=2.2.0",
        "transformers>=4.40.0",
        "numpy>=1.26.0",
        "sentence-transformers>=2.7.0",
        "plotille>=5.0.0",
        "pyfiglet>=1.0.2",
        "anthropic",
        "python-dotenv",
    ],
    entry_points={
        "console_scripts": [
            "sniff=sniff_cli.main:main",
        ],
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
        "Environment :: Console",
    ],
    python_requires=">=3.9",
)
File without changes
File without changes
@@ -0,0 +1,195 @@
"""
Author Style Baseline Engine
─────────────────────────────
Builds a statistical fingerprint of each developer's historical coding style
from their first N commits, then flags deviations in new commits.

Signals tracked:
- avg_function_length: average lines per function
- avg_commit_size: average lines changed per commit
- comment_ratio: fraction of lines that are comments
- naming_style: dominant convention (snake_case, camelCase, etc.)
- avg_line_length: mean line length
- line_length_variance: how "regular" their line lengths are (low = AI)
"""

import re
import statistics
from collections import defaultdict, Counter


def _mean(lst):
    return statistics.mean(lst) if lst else 0.0

def _stdev(lst):
    return statistics.stdev(lst) if len(lst) >= 2 else 0.0

def _comment_ratio(diff: str) -> float:
    lines = [l.strip() for l in diff.split('\n') if l.strip()]
    if not lines:
        return 0.0
    comments = sum(1 for l in lines if l.startswith(('#', '//', '/*', '*', '<!--')))
    return comments / len(lines)

def _avg_line_length(diff: str) -> float:
    lines = [l for l in diff.split('\n') if l.strip()]
    return _mean([len(l) for l in lines]) if lines else 0.0

def _line_length_variance(diff: str) -> float:
    """Low variance = suspiciously regular = AI-like."""
    lines = [len(l) for l in diff.split('\n') if l.strip()]
    return _stdev(lines) if lines else 0.0

def _detect_naming_style(diff: str) -> str:
    """Returns the dominant naming style: 'snake_case', 'camelCase', or 'mixed'."""
    identifiers = re.findall(r'\b([a-zA-Z_][a-zA-Z0-9_]{2,})\b', diff)
    snake = sum(1 for i in identifiers if '_' in i and i.islower())
    camel = sum(1 for i in identifiers if re.search(r'[a-z][A-Z]', i))
    if snake > camel * 1.5:
        return 'snake_case'
    elif camel > snake * 1.5:
        return 'camelCase'
    return 'mixed'

def _extract_function_lengths(diff: str) -> list:
    """Rough estimation of function sizes from definition keywords."""
    lengths = []
    current_fn_lines = 0
    in_function = False
    for line in diff.split('\n'):
        stripped = line.strip()
        if re.match(r'^(def |function |async function |const \w+ = \(|class )', stripped):
            if in_function and current_fn_lines > 0:
                lengths.append(current_fn_lines)
            in_function = True
            current_fn_lines = 1
        elif in_function:
            current_fn_lines += 1
    if in_function and current_fn_lines > 0:
        lengths.append(current_fn_lines)
    return lengths


class AuthorBaseline:
    """Per-author style profile built from historical commits."""

    def __init__(self):
        self.profiles = {}  # author -> style dict

    def build_profiles(self, commits, get_commit_diff_fn):
        """
        Build a baseline from the OLDEST commits (which pre-date AI assistance).
        Uses the oldest half of each author's commits to establish the baseline,
        then flags the newest commits as deviations.
        """
        # Collect data per author (oldest commits first)
        author_data = defaultdict(list)
        for commit in reversed(commits):
            author = commit.author.name
            diff = get_commit_diff_fn(commit)
            if not diff.strip():
                continue
            lines = [l for l in diff.split('\n') if l.strip()]
            author_data[author].append({
                'diff': diff,
                'lines': len(lines),
                'comment_ratio': _comment_ratio(diff),
                'avg_line_length': _avg_line_length(diff),
                'line_length_variance': _line_length_variance(diff),
                'fn_lengths': _extract_function_lengths(diff),
                'naming_style': _detect_naming_style(diff),
            })

        for author, entries in author_data.items():
            # Use the oldest half as the baseline; the newest half is the subject of analysis
            baseline_entries = entries[:max(1, len(entries) // 2)]

            commit_sizes = [e['lines'] for e in baseline_entries]
            comment_ratios = [e['comment_ratio'] for e in baseline_entries]
            line_lengths = [e['avg_line_length'] for e in baseline_entries]
            variances = [e['line_length_variance'] for e in baseline_entries]
            fn_lengths = []
            for e in baseline_entries:
                fn_lengths.extend(e['fn_lengths'])
            naming_styles = [e['naming_style'] for e in baseline_entries]
            dominant_naming = Counter(naming_styles).most_common(1)[0][0] if naming_styles else 'mixed'

            self.profiles[author] = {
                'avg_commit_size': _mean(commit_sizes),
                'commit_size_stdev': _stdev(commit_sizes),
                'avg_comment_ratio': _mean(comment_ratios),
                'avg_line_length': _mean(line_lengths),
                'avg_line_variance': _mean(variances),
                'avg_fn_length': _mean(fn_lengths),
                'dominant_naming': dominant_naming,
                'total_baseline_commits': len(baseline_entries),
            }

    def analyze_deviation(self, author: str, diff: str) -> dict:
        """
        Compare a single commit's style against the author's baseline.
        Returns a score 0-1 where a high score = strong deviation from the baseline.
        """
        if author not in self.profiles or not diff.strip():
            return {"score": 0.0, "reason": "No baseline available"}

        profile = self.profiles[author]
        # Need at least 5 baseline commits for a reliable comparison
        if profile['total_baseline_commits'] < 5:
            return {"score": 0.0, "reason": "Insufficient baseline history"}

        score = 0.0
        reasons = []

        lines = [l for l in diff.split('\n') if l.strip()]
        commit_size = len(lines)
        comment_ratio = _comment_ratio(diff)
        line_variance = _line_length_variance(diff)
        naming = _detect_naming_style(diff)
        fn_lengths = _extract_function_lengths(diff)
        avg_fn = _mean(fn_lengths) if fn_lengths else 0.0

        baseline_size = profile['avg_commit_size']
        baseline_stdev = profile['commit_size_stdev']

        # ── 1. Commit size deviation ────────────────────────────────────────
        if baseline_size > 0 and baseline_stdev > 0:
            z_score = abs(commit_size - baseline_size) / (baseline_stdev + 1)
            if z_score > 2.0:
                score += 0.4
                reasons.append(f"Commit size extreme outlier (z={z_score:.1f}σ from author baseline of {baseline_size:.0f} lines)")
            elif z_score > 1.2:
                score += 0.2
                reasons.append(f"Unusual commit size (z={z_score:.1f}σ from author baseline)")

        # ── 2. Comment ratio deviation ──────────────────────────────────────
        baseline_cr = profile['avg_comment_ratio']
        cr_diff = comment_ratio - baseline_cr
        if cr_diff > 0.10 and commit_size > 15:
            score += 0.25
            reasons.append(f"Comment density spike (+{cr_diff*100:.0f}% above author baseline)")

        # ── 3. Line length regularity (AI writes unnaturally consistent lines) ─
        baseline_var = profile['avg_line_variance']
        if baseline_var > 4 and line_variance < (baseline_var * 0.5) and commit_size > 20:
            score += 0.3
            reasons.append(f"Abnormally regular line lengths (variance {line_variance:.1f} vs author baseline {baseline_var:.1f})")

        # ── 4. Naming style shift ────────────────────────────────────────────
        if naming != profile['dominant_naming'] and naming != 'mixed' and commit_size > 15:
            score += 0.2
            reasons.append(f"Naming style shift ({profile['dominant_naming']} → {naming})")

        # ── 5. Function length anomaly ───────────────────────────────────────
        baseline_fn = profile['avg_fn_length']
        if baseline_fn > 0 and avg_fn > 0:
            fn_ratio = avg_fn / baseline_fn
            if fn_ratio > 1.8:
                score += 0.25
                reasons.append(f"Functions {fn_ratio:.1f}x longer than author historical average")

        return {
            "score": min(score, 1.0),
            "reason": "; ".join(reasons) if reasons else "Style consistent with author baseline"
        }
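
The commit-size check (signal 1 above) is easy to sanity-check in isolation. A standalone sketch that mirrors the dampened z-score arithmetic, with made-up baseline numbers:

```python
def commit_size_signal(commit_size: int, baseline_mean: float, baseline_stdev: float):
    """Mirror of signal 1: dampened z-score of commit size vs the author baseline."""
    z = abs(commit_size - baseline_mean) / (baseline_stdev + 1)  # +1 damps tiny stdevs
    if z > 2.0:
        return 0.4, z   # extreme outlier
    if z > 1.2:
        return 0.2, z   # unusual but not extreme
    return 0.0, z       # within the author's normal range

# A 300-line commit against a 40 ± 25 line baseline: z = 260 / 26 = 10σ → full boost.
boost, z = commit_size_signal(300, baseline_mean=40.0, baseline_stdev=25.0)
assert (boost, z) == (0.4, 10.0)
```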