diffsniff-gatekeeper 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

diffsniff_gatekeeper-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,162 @@
+Metadata-Version: 2.4
+Name: diffsniff-gatekeeper
+Version: 0.1.0
+Summary: A local git gatekeeper that blocks AI-generated slop in PR descriptions.
+Author-email: Kaushal <tiwarikaushal2012@gmail.com>
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+Requires-Dist: litellm>=1.0.0
+Requires-Dist: scikit-learn>=1.0.0
+Requires-Dist: numpy>=1.20.0
+# DiffSniff
+**Stop AI-generated filler from sneaking into your Git history.**
+DiffSniff is a local, terminal-based gatekeeper that cross-references your staged Git changes (or your unstaged changes, if nothing is staged) against your Pull Request description. If the description reads as generic filler that doesn't reflect the code you've written, DiffSniff exits with a non-zero status, which a Git hook or CI step can use to block the commit.
+---
+## Why DiffSniff?
+Traditional AI detectors focus on writing style, which makes them easy to bypass. You can simply ask a model to "write like a human."
+DiffSniff takes a different approach.
+Instead of analyzing how text is written, it verifies whether the claims in your PR description actually match the code changes in your diff.
+### How It Works
+#### Local Heuristics
+DiffSniff performs lightweight local analysis to identify low-effort copy-paste descriptions by measuring:
+* Vocabulary overlap
+* Structural variance
+* Content specificity
+This catches obvious mismatches instantly without requiring an API call.
+#### Adversarial Q&A
+DiffSniff then:
+1. Feeds your Git diff to an LLM.
+2. Generates highly specific technical questions about the code changes.
+3. Checks whether your PR description answers those questions.
+If your PR claims one thing while the code does another, DiffSniff flags it.
+---
+## Installation
+Install directly from PyPI:
+```bash
+pip install diffsniff-gatekeeper
+```
+---
+## Configuration
+### Bring Your Own Model
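+DiffSniff uses LiteLLM under the hood, so the `model` setting accepts any LiteLLM model string. Note that 0.1.0 only runs the LLM layer when `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY` is set and otherwise falls back to the local heuristics. Providers include: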
+* Gemini
+* OpenAI
+* Anthropic
+* Local models
+### Example: Gemini
+Export your API key:
+```bash
+export GEMINI_API_KEY="your-google-ai-key"
+```
+### Optional Configuration
+Create a `config.json` file in your working directory to customize behavior:
+```json
+{
+  "model": "gemini/gemma-4-31b-it",
+  "slop_threshold": 55,
+  "num_questions": 3
+}
+```
+#### Configuration Options
+| Option           | Description                                 |
+| ---------------- | ------------------------------------------- |
+| `model`          | LLM used for adversarial questioning        |
+| `slop_threshold` | Maximum penalty score (0-100) that can pass |
+| `num_questions`  | Number of generated code-specific questions |
+### Using OpenAI
+To switch providers:
+```json
+{
+  "model": "gpt-4o"
+}
+```
+Then export your API key:
+```bash
+export OPENAI_API_KEY="your-openai-key"
+```
+---
+## Usage
+1. Stage your code changes:
+```bash
+git add .
+```
+2. Write a draft PR description in a Markdown file.
+3. Run DiffSniff:
+```bash
+diffsniff pr_draft.md
+```
+---
+## Results
+### ✅ Pass
+Your PR description accurately reflects the code changes and you're ready to push.
+### ❌ Fail
+Your description doesn't sufficiently explain what the code actually does.
+Rewrite the documentation and try again.
+---
+## Philosophy
+DiffSniff doesn't care whether a human or an AI wrote your PR description.
+It cares whether the description is *true*.
+If your documentation accurately explains the code, it passes.
+If it's generic filler disconnected from reality, it fails.

diffsniff_gatekeeper-0.1.0/README.md ADDED

@@ -0,0 +1,148 @@
+# DiffSniff
+**Stop AI-generated filler from sneaking into your Git history.**
+DiffSniff is a local, terminal-based gatekeeper that cross-references your staged Git changes (or your unstaged changes, if nothing is staged) against your Pull Request description. If the description reads as generic filler that doesn't reflect the code you've written, DiffSniff exits with a non-zero status, which a Git hook or CI step can use to block the commit.
+---
+## Why DiffSniff?
+Traditional AI detectors focus on writing style, which makes them easy to bypass. You can simply ask a model to "write like a human."
+DiffSniff takes a different approach.
+Instead of analyzing how text is written, it verifies whether the claims in your PR description actually match the code changes in your diff.
+### How It Works
+#### Local Heuristics
+DiffSniff performs lightweight local analysis to identify low-effort copy-paste descriptions by measuring:
+* Vocabulary overlap
+* Structural variance
+* Content specificity
+This catches obvious mismatches instantly without requiring an API call.
+#### Adversarial Q&A
+DiffSniff then:
+1. Feeds your Git diff to an LLM.
+2. Generates highly specific technical questions about the code changes.
+3. Checks whether your PR description answers those questions.
+If your PR claims one thing while the code does another, DiffSniff flags it.
+---
+## Installation
+Install directly from PyPI:
+```bash
+pip install diffsniff-gatekeeper
+```
+---
+## Configuration
+### Bring Your Own Model
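+DiffSniff uses LiteLLM under the hood, so the `model` setting accepts any LiteLLM model string. Note that 0.1.0 only runs the LLM layer when `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY` is set and otherwise falls back to the local heuristics. Providers include: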
+* Gemini
+* OpenAI
+* Anthropic
+* Local models
+### Example: Gemini
+Export your API key:
+```bash
+export GEMINI_API_KEY="your-google-ai-key"
+```
+### Optional Configuration
+Create a `config.json` file in your working directory to customize behavior:
+```json
+{
+  "model": "gemini/gemma-4-31b-it",
+  "slop_threshold": 55,
+  "num_questions": 3
+}
+```
+#### Configuration Options
+| Option           | Description                                 |
+| ---------------- | ------------------------------------------- |
+| `model`          | LLM used for adversarial questioning        |
+| `slop_threshold` | Maximum penalty score (0-100) that can pass |
+| `num_questions`  | Number of generated code-specific questions |
+### Using OpenAI
+To switch providers:
+```json
+{
+  "model": "gpt-4o"
+}
+```
+Then export your API key:
+```bash
+export OPENAI_API_KEY="your-openai-key"
+```
+---
+## Usage
+1. Stage your code changes:
+```bash
+git add .
+```
+2. Write a draft PR description in a Markdown file.
+3. Run DiffSniff:
+```bash
+diffsniff pr_draft.md
+```
+---
+## Results
+### ✅ Pass
+Your PR description accurately reflects the code changes and you're ready to push.
+### ❌ Fail
+Your description doesn't sufficiently explain what the code actually does.
+Rewrite the documentation and try again.
+---
+## Philosophy
+DiffSniff doesn't care whether a human or an AI wrote your PR description.
+It cares whether the description is *true*.
+If your documentation accurately explains the code, it passes.
+If it's generic filler disconnected from reality, it fails.
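
One way to turn the pass/fail result into an actual commit gate is a Git hook. The sketch below is an assumption, not something shipped in 0.1.0: it supposes a Python `pre-commit` hook and a draft file named `pr_draft.md` (the name used in the Usage section), and it relies only on the exit status that `cli.py` sets (`0` on pass, `1` on fail). `run_gate` and `main` are hypothetical helper names.

```python
import subprocess
import sys
from typing import List


def run_gate(command: List[str]) -> bool:
    """Run a gate command and report whether it passed (exit status 0)."""
    try:
        result = subprocess.run(command)
    except FileNotFoundError:
        # The command is not installed or not on PATH: treat it as a failure.
        print(f"gate command not found: {command[0]}", file=sys.stderr)
        return False
    return result.returncode == 0


def main() -> int:
    """Hook entry point: returns the exit status Git should see."""
    # Assumed invocation, taken from the Usage section of the README.
    return 0 if run_gate(["diffsniff", "pr_draft.md"]) else 1
```

Saved as `.git/hooks/pre-commit`, marked executable, and finished with `sys.exit(main())`, a script like this would make Git abort the commit whenever the gate exits non-zero.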

diffsniff_gatekeeper-0.1.0/pyproject.toml ADDED

@@ -0,0 +1,28 @@
+[build-system]
+requires = ["setuptools>=61.0"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "diffsniff-gatekeeper"  # Change this to something unique if PyPI rejects it!
+version = "0.1.0"
+authors = [
+  { name="Kaushal", email="tiwarikaushal2012@gmail.com" }
+]
+description = "A local git gatekeeper that blocks AI-generated slop in PR descriptions."
+readme = "README.md"
+requires-python = ">=3.9"
+classifiers = [
+    "Programming Language :: Python :: 3",
+    "License :: OSI Approved :: MIT License",
+    "Operating System :: OS Independent",
+]
+dependencies = [
+    "litellm>=1.0.0",
+    "scikit-learn>=1.0.0",
+    "numpy>=1.20.0"
+]
+[project.scripts]
+# This line is the CLI magic. It maps the terminal command "diffsniff"
+# to the main() function inside your src/diffsniff/cli.py file.
+diffsniff = "diffsniff.cli:main"
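
The `[project.scripts]` value is a `module:attribute` reference. As a rough illustration of how such a reference is resolved, here is a minimal sketch; `resolve_entry_point` is a hypothetical helper and not part of setuptools or DiffSniff, and the installer-generated wrapper script does the equivalent in its own way. A standard-library target stands in for `diffsniff.cli:main` so the sketch runs without the package installed.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # The attribute part may be dotted (e.g. "Class.method"); walk each piece.
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj


# Stand-in spec: with DiffSniff installed, "diffsniff.cli:main" resolves the same way.
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```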

diffsniff_gatekeeper-0.1.0/setup.cfg ADDED

@@ -0,0 +1,4 @@
+[egg_info]
+tag_build =
+tag_date = 0

diffsniff_gatekeeper-0.1.0/src/diffsniff/analyzers/__init__.py ADDED

File without changes

diffsniff_gatekeeper-0.1.0/src/diffsniff/analyzers/adversarial.py ADDED

@@ -0,0 +1,122 @@
+import json
+import os
+from typing import Dict, Any, Set
+from litellm import completion
+from diffsniff.analyzers.base import BaseAnalyzer
+class AdversarialQAExpert(BaseAnalyzer):
+    """
+    Layer 2: The LLM Interrogator.
+    Generates a configurable number of technical questions based ONLY on the code diff,
+    then checks if the developer's written prose actually answers them.
+    """
+    def __init__(self, default_num_questions: int = 3):
+        self.default_num_questions = default_num_questions
+    @property
+    def name(self) -> str:
+        return "Adversarial Q&A Engine"
+    def analyze(self, pr_text: str, code_tokens: Set[str], raw_diff: str = "") -> Dict[str, Any]:
+        # 1. Graceful Fallback: If no API key is found, skip the LLM check so the CLI works offline.
+        if not os.environ.get("GEMINI_API_KEY") and not os.environ.get("OPENROUTER_API_KEY") and not os.environ.get("OPENAI_API_KEY"):
+            return {
+                "score_penalty": 0,
+                "metrics": {"status": "Offline Mode. API Key missing. Skipping Adversarial layer."}
+            }
+        if not raw_diff.strip():
+            return {"score_penalty": 0, "metrics": {"status": "No diff provided to interrogate."}}
+        # 2. Load model & question count configurations from config if they exist
+        model = "gemini/gemma-4-31b-it" # Default
+        num_questions = self.default_num_questions
+        if os.path.exists("config.json"):
+            try:
+                with open("config.json", "r") as f:
+                    config_data = json.load(f)
+                    model = config_data.get("model", model)
+                    num_questions = config_data.get("num_questions", num_questions)
+            except Exception:
+                pass # Fallback to default if json is malformed
+        try:
+            # PHASE 1: Code Interrogation (Diff -> Dynamic Questions)
+            q_prompt = (
+                f"[GIT DIFF]\n{raw_diff}\n\n"
+                f"Act as a strict Senior Software Engineer. Read the git diff above and generate EXACTLY {num_questions} "
+                f"highly specific technical questions a developer must be able to answer if they actually wrote this code. "
+                f"Do not include pleasantries, formatting, or intro text. Just output the {num_questions} questions on separate lines."
+            )
+            q_response = completion(
+                model=model,
+                messages=[{"role": "user", "content": q_prompt}],
+                temperature=0.1
+            )
+            questions = q_response.choices[0].message.content
+            # PHASE 2: Cross-Examination (Questions + Prose -> Dynamic JSON Verdict)
+            eval_prompt = (
+                f"[QUESTIONS TO ANSWER]\n{questions}\n\n"
+                f"[PROPOSED PR DESCRIPTION]\n{pr_text}\n\n"
+                f"Determine if the PR Description factually answers each of the questions above based ONLY on the provided text. "
+                f"Also, identify any major claims in the description that are NOT verified by the questions/diff context. "
+                f"Output strictly in JSON format with this exact structure: \n"
+                '{\n'
+                '  "assessments": [\n'
+                '    { "question_number": 1, "answered": true/false },\n'
+                '    { "question_number": 2, "answered": true/false }\n'
+                '  ],\n'
+                '  "unverified_claims": ["list", "fluff", "here"]\n'
+                '}'
+            )
+            e_response = completion(
+                model=model,
+                messages=[{"role": "user", "content": eval_prompt}],
+                response_format={"type": "json_object"},
+                temperature=0.0
+            )
+            raw_content = e_response.choices[0].message.content
+            # Defensive parser: Extract strictly between first { and last }
+            start_idx = raw_content.find('{')
+            end_idx = raw_content.rfind('}')
+            if start_idx != -1 and end_idx != -1 and end_idx > start_idx:
+                json_str = raw_content[start_idx:end_idx + 1]
+            else:
+                json_str = raw_content
+            eval_data = json.loads(json_str)
+            # ⚖️ Dynamic Penalty Engine
+            assessments = eval_data.get("assessments", [])
+            answered_count = sum(1 for item in assessments if item.get("answered") is True)
+            # Avoid division by zero issues
+            effective_num_qs = len(assessments) if len(assessments) > 0 else num_questions
+            # Proportional scaling (questions account for up to 60 points of penalty)
+            penalty_per_unanswered = 60 / effective_num_qs
+            penalty = (effective_num_qs - answered_count) * penalty_per_unanswered
+            # Flat 40-point penalty if the description contains any unverified claims
+            unverified = eval_data.get("unverified_claims", [])
+            if len(unverified) > 0:
+                penalty += 40
+            return {
+                "score_penalty": min(int(penalty), 100),
+                "metrics": {
+                    "questions_generated": questions,
+                    "evaluation_matrix": eval_data
+                }
+            }
+        except Exception as e:
+            return {"score_penalty": 0, "metrics": {"error": f"LLM Generation Failure: {str(e)}"}}
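
The penalty arithmetic above can be checked by hand. The following is a minimal sketch that recomputes it from an already-parsed verdict, assuming the JSON shape requested in the evaluation prompt; `adversarial_penalty` is a hypothetical standalone function and the sample verdict is made up for illustration.

```python
from typing import Any, Dict, List


def adversarial_penalty(
    assessments: List[Dict[str, Any]],
    unverified_claims: List[str],
    num_questions: int = 3,
) -> int:
    """Recompute the Layer 2 penalty from an already-parsed LLM verdict."""
    answered = sum(1 for item in assessments if item.get("answered") is True)
    # Fall back to the configured question count if the verdict lists none.
    effective = len(assessments) if assessments else num_questions
    # Unanswered questions share a 60-point budget equally.
    penalty = (effective - answered) * (60 / effective)
    # Any unverified claim adds a flat 40 points, regardless of how many there are.
    if unverified_claims:
        penalty += 40
    return min(int(penalty), 100)


# Illustrative verdict: one of three questions answered, one unverified claim.
verdict = {
    "assessments": [
        {"question_number": 1, "answered": True},
        {"question_number": 2, "answered": False},
        {"question_number": 3, "answered": False},
    ],
    "unverified_claims": ["improves performance"],
}
print(adversarial_penalty(verdict["assessments"], verdict["unverified_claims"]))
```

With three questions each unanswered one costs 20 points, so this verdict scores 2 × 20 + 40 = 80.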

diffsniff_gatekeeper-0.1.0/src/diffsniff/analyzers/base.py ADDED

@@ -0,0 +1,20 @@
+# Base class for all analyzers. May change in the future.
+from abc import ABC, abstractmethod
+from typing import Any, Dict, Set
+class BaseAnalyzer(ABC):
+    @property
+    @abstractmethod
+    def name(self) -> str:
+        """The name of the analyzer."""
+        pass
+    @abstractmethod
+    def analyze(self, pr_text: str, code_tokens: Set[str], raw_diff: str = "") -> Dict[str, Any]:
+        """
+        Takes the written text and the physical code changes,
+        and returns a dictionary with a `score_penalty` and its `metrics`.
+        """
+        pass
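
To show what the interface asks of a subclass, here is a minimal sketch. It redefines a stand-in `BaseAnalyzer` with the same shape so the block runs on its own; `MinLengthAnalyzer` and its `min_words` parameter are hypothetical and do not exist in the package. The result shape (`score_penalty` plus `metrics`) follows what the two shipped analyzers return.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Set


class BaseAnalyzer(ABC):
    """Stand-in with the same shape as diffsniff.analyzers.base.BaseAnalyzer."""

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @abstractmethod
    def analyze(self, pr_text: str, code_tokens: Set[str], raw_diff: str = "") -> Dict[str, Any]:
        ...


class MinLengthAnalyzer(BaseAnalyzer):
    """Hypothetical analyzer: penalizes descriptions shorter than `min_words`."""

    def __init__(self, min_words: int = 20):
        self.min_words = min_words

    @property
    def name(self) -> str:
        return "Minimum Length Check"

    def analyze(self, pr_text: str, code_tokens: Set[str], raw_diff: str = "") -> Dict[str, Any]:
        word_count = len(pr_text.split())
        # Same result shape as the shipped analyzers: a 0-100 penalty plus metrics.
        penalty = 100 if word_count < self.min_words else 0
        return {"score_penalty": penalty, "metrics": {"word_count": word_count}}


result = MinLengthAnalyzer(min_words=5).analyze("Fixes bug.", set())
print(result)
```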

diffsniff_gatekeeper-0.1.0/src/diffsniff/analyzers/ml_expert.py ADDED

@@ -0,0 +1,127 @@
+import math
+import re
+from typing import Dict, Any, Set
+from diffsniff.analyzers.base import BaseAnalyzer
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics.pairwise import cosine_similarity
+class SemanticsMLExpert(BaseAnalyzer):
+    """
+    Layer 1: The ML & Statistical Expert.
+    Isolates, sanitizes, and normalizes code telemetry and description prose
+    to compute deterministic context alignment scores.
+    """
+    @property
+    def name(self) -> str:
+        return "ML & Statistical Expert"
+    def clean_and_explode_tokens(self, token_set: Set[str]) -> str:
+        """Splits structural camelCase and snake_case tokens into plain words."""
+        exploded_words = []
+        for token in token_set:
+            camel_split = re.sub(r'([a-z0-9])([A-Z])', r'\1 \2', token)
+            clean_string = re.sub(r'[_.-]+', ' ', camel_split)
+            words = re.findall(r'\b[a-zA-Z]{3,}\b', clean_string.lower())
+            exploded_words.extend(words)
+        return " ".join(exploded_words)
+    def extract_diff_features(self, raw_diff: str) -> Dict[str, Any]:
+        """Parses raw patch structural telemetry to track physical code changes."""
+        if not raw_diff.strip():
+            return {"lines_added": 0, "lines_removed": 0, "entropy_factor": 0.0, "structural_keywords_count": 0}
+        lines = raw_diff.splitlines()
+        added_lines = [l for l in lines if l.startswith('+') and not l.startswith('+++')]
+        removed_lines = [l for l in lines if l.startswith('-') and not l.startswith('---')]
+        structural_keywords = re.compile(r'\b(def|class|import|return|async|await|try|except|function|const|let)\b')
+        keyword_hits = sum(len(structural_keywords.findall(line)) for line in added_lines)
+        files_changed = len([l for l in lines if l.startswith('+++ b/')])
+        total_churn = len(added_lines) + len(removed_lines)
+        entropy = round(files_changed / total_churn, 4) if total_churn > 0 else 0.0
+        return {
+            "lines_added": len(added_lines),
+            "lines_removed": len(removed_lines),
+            "entropy_factor": entropy,
+            "structural_keywords_count": keyword_hits
+        }
+    def sanitize_prose(self, text: str) -> str:
+        """
+        CRITICAL CLOSING LOOP: Strips out markdown code blocks (```...```)
+        and raw inline backticks to prevent developers from spoofing token alignment.
+        """
+        # Remove multiline code blocks completely
+        no_code_blocks = re.sub(r'```[\s\S]*?```', ' ', text)
+        # Remove inline backticks, symbols, and formatting structures
+        return re.sub(r'[*#`_\[\]\-]+', ' ', no_code_blocks.lower())
+    def compute_lexical_diversity(self, words: list) -> float:
+        """Computes Type-Token Ratio (TTR) to measure vocabulary variation."""
+        if not words:
+            return 0.0
+        return round(len(set(words)) / len(words), 4)
+    def calculate_burstiness_variance(self, text: str) -> float:
+        """Calculates standard deviation (sigma) of sentence lengths to trace sentence uniformity."""
+        sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
+        if len(sentences) <= 1:
+            return 0.0
+        lengths = [len(re.findall(r'\b[a-zA-Z]+\b', s)) for s in sentences]
+        mean_length = sum(lengths) / len(lengths)
+        variance = sum((x - mean_length) ** 2 for x in lengths) / len(lengths)
+        return round(math.sqrt(variance), 4)
+    def analyze(self, pr_text: str, code_tokens: set, raw_diff: str = "") -> Dict[str, Any]:
+        """Orchestrates layer evaluation parameters into a unified penalty score."""
+        # Sanitize prose text before extraction loops
+        clean_prose = self.sanitize_prose(pr_text)
+        words = re.findall(r'\b[a-zA-Z]+\b', clean_prose)
+        total_words = len(words)
+        diff_telemetry = self.extract_diff_features(raw_diff)
+        ttr_score = self.compute_lexical_diversity(words)
+        burstiness_sigma = self.calculate_burstiness_variance(clean_prose)
+        # Calculate TF-IDF Cosine Similarity on normalized text structures
+        normalized_code_string = self.clean_and_explode_tokens(code_tokens)
+        semantic_distance = 1.0
+        if normalized_code_string.strip() and clean_prose.strip():
+            try:
+                vectorizer = TfidfVectorizer(stop_words='english')
+                tfidf_matrix = vectorizer.fit_transform([normalized_code_string, clean_prose])
+                similarity = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])[0][0]
+                semantic_distance = round(float(1.0 - similarity), 4)
+            except Exception:
+                pass
+        total_churn_lines = diff_telemetry["lines_added"] + diff_telemetry["lines_removed"]
+        volatility_ratio = round(total_words / total_churn_lines, 2) if total_churn_lines > 0 else float(total_words)
+        # ⚖️ Penalty Weight Matrix
+        base_penalty = semantic_distance * 60
+        if ttr_score < 0.40:
+            base_penalty += 15
+        if burstiness_sigma < 2.0:
+            base_penalty += 15
+        if volatility_ratio > 40.0 and diff_telemetry["structural_keywords_count"] == 0:
+            base_penalty += 20
+        final_ml_penalty = min(max(int(base_penalty), 0), 100)
+        return {
+            "score_penalty": final_ml_penalty,
+            "metrics": {
+                "semantic_distance": semantic_distance,
+                "lexical_diversity_ttr": ttr_score,
+                "sentence_burstiness_sigma": burstiness_sigma,
+                "volatility_ratio": volatility_ratio,
+                "diff_telemetry": diff_telemetry
+            }
+        }
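
Two of the style metrics above are simple enough to verify by hand. The sketch below reimplements them as standalone functions using only the standard library; the function names and the sample text are illustrative, and the formulas follow `compute_lexical_diversity` and `calculate_burstiness_variance`.

```python
import math
import re
from typing import List


def type_token_ratio(words: List[str]) -> float:
    """Unique words divided by total words; 0.0 for empty input."""
    return round(len(set(words)) / len(words), 4) if words else 0.0


def sentence_length_sigma(text: str) -> float:
    """Population standard deviation of per-sentence word counts."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) <= 1:
        return 0.0
    lengths = [len(re.findall(r"\b[a-zA-Z]+\b", s)) for s in sentences]
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return round(math.sqrt(variance), 4)


# Illustrative text: a 2-word sentence followed by a 9-word sentence.
text = "Short one. This sentence is quite a bit longer than that."
words = re.findall(r"\b[a-zA-Z]+\b", text.lower())
print(type_token_ratio(words))
print(sentence_length_sigma(text))
```

For this text the sentence lengths are 2 and 9, giving a mean of 5.5 and a sigma of 3.5. In the penalty matrix, a TTR below 0.40 or a sigma below 2.0 each adds 15 points.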

diffsniff_gatekeeper-0.1.0/src/diffsniff/cli.py ADDED

@@ -0,0 +1,107 @@
+import sys
+import json
+import os
+from typing import Set
+from diffsniff.git_engine import GitEngine
+from diffsniff.analyzers.ml_expert import SemanticsMLExpert
+from diffsniff.analyzers.adversarial import AdversarialQAExpert
+def load_config() -> dict:
+    """Loads operational thresholds and parameter overrides from config.json."""
+    default_config = {
+        "model": "gemini/gemma-4-31b-it",
+        "temperature": 0.0,
+        "slop_threshold": 55,
+        "num_questions": 3
+    }
+    if os.path.exists("config.json"):
+        try:
+            with open("config.json", "r") as f:
+                user_config = json.load(f)
+                default_config.update(user_config)
+        except Exception:
+            print("warning: config.json is malformed. Using default internal parameters.")
+    return default_config
+def main():
+    if len(sys.argv) < 2:
+        print("Usage: diffsniff <path_to_pr_description.md>")
+        sys.exit(1)
+    pr_file_path = sys.argv[1]
+    if not os.path.exists(pr_file_path):
+        print(f"error: Target file not found: {pr_file_path}")
+        sys.exit(1)
+    with open(pr_file_path, 'r', encoding='utf-8') as f:
+        pr_text = f.read()
+    print("diffsniff: extracting local repository telemetry...")
+    # 1. Gather Telemetry
+    raw_diff = GitEngine.get_live_diff()
+    code_tokens = GitEngine.extract_code_tokens(raw_diff)
+    if not raw_diff.strip():
+        print("warning: no staged or unstaged changes found. the diff-based checks have nothing to compare against.")
+    # 2. Load configurations
+    config = load_config()
+    threshold = config.get("slop_threshold", 55)
+    # 3. Initialize Experts
+    ml_expert = SemanticsMLExpert()
+    adv_expert = AdversarialQAExpert(default_num_questions=config.get("num_questions", 3))
+    print("diffsniff: running lexical statistical matching...")
+    ml_results = ml_expert.analyze(pr_text, code_tokens, raw_diff)
+    ml_penalty = ml_results.get("score_penalty", 0)
+    print("diffsniff: verifying structural change alignment...")
+    adv_results = adv_expert.analyze(pr_text, code_tokens, raw_diff)
+    adv_penalty = adv_results.get("score_penalty", 0)
+    # 4. Layer 3: Executive Judge Decision (Weighted Integration)
+    api_key_active = any(os.environ.get(k) for k in ["GEMINI_API_KEY", "OPENROUTER_API_KEY", "OPENAI_API_KEY"])
+    if not api_key_active:
+        print("info: offline fallback active. evaluating strictly using local lexical heuristics.")
+        total_penalty = ml_penalty
+    else:
+        total_penalty = int((ml_penalty * 0.40) + (adv_penalty * 0.60))
+    # 5. Render Terminal Diagnostic Report
+    print("\n--- DiffSniff Analysis Summary ---")
+    print(f"  Lexical Deviation:  {ml_penalty}/100")
+    if api_key_active:
+        print(f"  Context Mismatch:   {adv_penalty}/100")
+    else:
+        print("  Context Mismatch:   [OFFLINE]")
+    print(f"  Evaluated Score:    {total_penalty}/100  (limit: {threshold})")
+    print("----------------------------------")
+    if total_penalty > threshold:
+        print("\nFAIL: PR description lacks sufficient correlation with actual codebase changes.")
+        eval_matrix = adv_results.get("metrics", {}).get("evaluation_matrix", {})
+        if eval_matrix:
+            unverified_claims = eval_matrix.get("unverified_claims", [])
+            if unverified_claims:
+                print("\nUnverified assertions detected in description:")
+                for claim in unverified_claims:
+                    print(f"  - {claim}")
+            qs = adv_results.get("metrics", {}).get("questions_generated", "")
+            if qs:
+                print("\nEnsure the description clearly addresses these technical points:")
+                for q in qs.strip().splitlines():
+                    if q.strip():
+                        print(f"  * {q}")
+        print("\nAction: Revise the summary to reflect the physical code changes.")
+        sys.exit(1)
+    else:
+        print("\nPASS: PR description verified successfully.")
+        sys.exit(0)
+if __name__ == "__main__":
+    main()
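
The final decision in `main()` reduces to a weighted sum and one comparison. As a minimal sketch, with `combined_penalty` and `verdict` as hypothetical names for logic that the CLI performs inline:

```python
def combined_penalty(ml_penalty: int, adv_penalty: int, api_key_active: bool) -> int:
    """Blend the two layer penalties the way main() does."""
    if not api_key_active:
        # Offline: only the local heuristics contribute.
        return ml_penalty
    return int((ml_penalty * 0.40) + (adv_penalty * 0.60))


def verdict(total_penalty: int, threshold: int = 55) -> str:
    """A run fails only when the penalty is strictly above the threshold."""
    return "FAIL" if total_penalty > threshold else "PASS"


# Illustrative inputs: the same penalties scored online and offline.
for ml, adv, online in [(50, 80, True), (50, 80, False)]:
    total = combined_penalty(ml, adv, online)
    print(ml, adv, online, total, verdict(total))
```

Because higher scores are worse, `slop_threshold` acts as a ceiling: the same description can fail online (0.4 × 50 + 0.6 × 80 = 68) yet pass offline, where only the local penalty of 50 counts.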

diffsniff_gatekeeper-0.1.0/src/diffsniff/git_engine.py ADDED

@@ -0,0 +1,66 @@
+import subprocess
+import re
+import sys
+class GitEngine:
+    """
+    Handles local shell execution to extract live telemetry from the git tree
+    without relying on external remote connections or pre-existing pushed commits.
+    """
+    @staticmethod
+    def get_live_diff() -> str:
+        """
+        Gathers raw text patch changes from the local staging area or active workspace.
+        """
+        try:
+            # 1. Try to read staged changes first
+            diff_bytes = subprocess.check_output(
+                ['git', 'diff', '--cached'],
+                stderr=subprocess.DEVNULL
+            )
+            diff_text = diff_bytes.decode('utf-8', errors='ignore')
+            # 2. Fall back to unstaged changes if staging is empty
+            if not diff_text.strip():
+                diff_bytes = subprocess.check_output(
+                    ['git', 'diff'],
+                    stderr=subprocess.DEVNULL
+                )
+                diff_text = diff_bytes.decode('utf-8', errors='ignore')
+            return diff_text
+        except subprocess.CalledProcessError:
+            print("❌ Telemetry Error: This command must be executed within a valid Git repository.")
+            sys.exit(1)
+        except FileNotFoundError:
+            print("❌ Dependency Error: 'git' CLI execution binary was not found on your system PATH.")
+            sys.exit(1)
+    @staticmethod
+    def extract_code_tokens(raw_diff: str) -> set:
+        """
+        Scans added lines inside the diff to isolate explicit programming identifiers
+        (filenames, function properties, variable references).
+        """
+        tokens = set()
+        if not raw_diff.strip():
+            return tokens
+        for line in raw_diff.splitlines():
+            # Target the `+++ b/<path>` header lines that name each changed file
+            if line.startswith('+++ b/'):
+                filename = line.split('/')[-1].strip()
+                tokens.add(filename.lower())
+                continue
+            # Target literal additions, stripping out lines that are purely deletions
+            if line.startswith('+') and not line.startswith('+++'):
+                # Extract alphanumerics starting with letters/underscores
+                found = re.findall(r'\b[a-zA-Z_][a-zA-Z0-9_]*\b', line)
+                for item in found:
+                    if len(item) > 3:  # Drop noisy shorthands (i, j, x, db, ok)
+                        tokens.add(item.lower())
+        return tokens
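
To make the extraction rules concrete, here is a minimal standalone sketch that applies them to a small hand-written diff. `extract_tokens` is a hypothetical function name that mirrors the logic of `extract_code_tokens`, and the sample diff is invented for illustration.

```python
import re
from typing import Set


def extract_tokens(raw_diff: str) -> Set[str]:
    """Collect lowercase filenames and identifiers from a diff's added lines."""
    tokens: Set[str] = set()
    for line in raw_diff.splitlines():
        if line.startswith("+++ b/"):
            # File header: keep only the final path component.
            tokens.add(line.split("/")[-1].strip().lower())
        elif line.startswith("+") and not line.startswith("+++"):
            for item in re.findall(r"\b[a-zA-Z_][a-zA-Z0-9_]*\b", line):
                if len(item) > 3:  # identifiers of 3 characters or fewer are dropped
                    tokens.add(item.lower())
    return tokens


# Invented diff: one removed line, two added lines, one context line.
sample_diff = """\
--- a/src/app.py
+++ b/src/app.py
@@ -1,2 +1,3 @@
 import os
-def old_name(): pass
+def load_config(path):
+    return os.environ.get(path)
"""
print(sorted(extract_tokens(sample_diff)))
```

Removed and context lines contribute nothing, and short names such as `def`, `os`, and `get` fall under the length cutoff, so only the filename and the longer identifiers survive.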

diffsniff_gatekeeper-0.1.0/src/diffsniff_gatekeeper.egg-info/PKG-INFO ADDED

@@ -0,0 +1,162 @@
+Metadata-Version: 2.4
+Name: diffsniff-gatekeeper
+Version: 0.1.0
+Summary: A local git gatekeeper that blocks AI-generated slop in PR descriptions.
+Author-email: Kaushal <tiwarikaushal2012@gmail.com>
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.9
+Description-Content-Type: text/markdown
+Requires-Dist: litellm>=1.0.0
+Requires-Dist: scikit-learn>=1.0.0
+Requires-Dist: numpy>=1.20.0
+# DiffSniff
+**Stop AI-generated filler from sneaking into your Git history.**
+DiffSniff is a local, terminal-based gatekeeper that cross-references your staged Git changes (or your unstaged changes, if nothing is staged) against your Pull Request description. If the description reads as generic filler that doesn't reflect the code you've written, DiffSniff exits with a non-zero status, which a Git hook or CI step can use to block the commit.
+---
+## Why DiffSniff?
+Traditional AI detectors focus on writing style, which makes them easy to bypass. You can simply ask a model to "write like a human."
+DiffSniff takes a different approach.
+Instead of analyzing how text is written, it verifies whether the claims in your PR description actually match the code changes in your diff.
+### How It Works
+#### Local Heuristics
+DiffSniff performs lightweight local analysis to identify low-effort copy-paste descriptions by measuring:
+* Vocabulary overlap
+* Structural variance
+* Content specificity
+This catches obvious mismatches instantly without requiring an API call.
+#### Adversarial Q&A
+DiffSniff then:
+1. Feeds your Git diff to an LLM.
+2. Generates highly specific technical questions about the code changes.
+3. Checks whether your PR description answers those questions.
+If your PR claims one thing while the code does another, DiffSniff flags it.
+---
+## Installation
+Install directly from PyPI:
+```bash
+pip install diffsniff-gatekeeper
+```
+---
+## Configuration
+### Bring Your Own Model
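+DiffSniff uses LiteLLM under the hood, so the `model` setting accepts any LiteLLM model string. Note that 0.1.0 only runs the LLM layer when `GEMINI_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY` is set and otherwise falls back to the local heuristics. Providers include: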
+* Gemini
+* OpenAI
+* Anthropic
+* Local models
+### Example: Gemini
+Export your API key:
+```bash
+export GEMINI_API_KEY="your-google-ai-key"
+```
+### Optional Configuration
+Create a `config.json` file in your working directory to customize behavior:
+```json
+{
+  "model": "gemini/gemma-4-31b-it",
+  "slop_threshold": 55,
+  "num_questions": 3
+}
+```
+#### Configuration Options
+| Option           | Description                                 |
+| ---------------- | ------------------------------------------- |
+| `model`          | LLM used for adversarial questioning        |
+| `slop_threshold` | Maximum penalty score (0-100) that can pass |
+| `num_questions`  | Number of generated code-specific questions |
+### Using OpenAI
+To switch providers:
+```json
+{
+  "model": "gpt-4o"
+}
+```
+Then export your API key:
+```bash
+export OPENAI_API_KEY="your-openai-key"
+```
+---
+## Usage
+1. Stage your code changes:
+```bash
+git add .
+```
+2. Write a draft PR description in a Markdown file.
+3. Run DiffSniff:
+```bash
+diffsniff pr_draft.md
+```
+---
+## Results
+### ✅ Pass
+Your PR description accurately reflects the code changes and you're ready to push.
+### ❌ Fail
+Your description doesn't sufficiently explain what the code actually does.
+Rewrite the documentation and try again.
+---
+## Philosophy
+DiffSniff doesn't care whether a human or an AI wrote your PR description.
+It cares whether the description is *true*.
+If your documentation accurately explains the code, it passes.
+If it's generic filler disconnected from reality, it fails.

diffsniff_gatekeeper-0.1.0/src/diffsniff_gatekeeper.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,14 @@
+README.md
+pyproject.toml
+src/diffsniff/cli.py
+src/diffsniff/git_engine.py
+src/diffsniff/analyzers/__init__.py
+src/diffsniff/analyzers/adversarial.py
+src/diffsniff/analyzers/base.py
+src/diffsniff/analyzers/ml_expert.py
+src/diffsniff_gatekeeper.egg-info/PKG-INFO
+src/diffsniff_gatekeeper.egg-info/SOURCES.txt
+src/diffsniff_gatekeeper.egg-info/dependency_links.txt
+src/diffsniff_gatekeeper.egg-info/entry_points.txt
+src/diffsniff_gatekeeper.egg-info/requires.txt
+src/diffsniff_gatekeeper.egg-info/top_level.txt

diffsniff_gatekeeper-0.1.0/src/diffsniff_gatekeeper.egg-info/dependency_links.txt ADDED

@@ -0,0 +1 @@
+

diffsniff_gatekeeper-0.1.0/src/diffsniff_gatekeeper.egg-info/entry_points.txt ADDED

@@ -0,0 +1,2 @@
+[console_scripts]
+diffsniff = diffsniff.cli:main

diffsniff_gatekeeper-0.1.0/src/diffsniff_gatekeeper.egg-info/requires.txt ADDED

@@ -0,0 +1,3 @@
+litellm>=1.0.0
+scikit-learn>=1.0.0
+numpy>=1.20.0

diffsniff_gatekeeper-0.1.0/src/diffsniff_gatekeeper.egg-info/top_level.txt ADDED

@@ -0,0 +1 @@
+diffsniff