echo-guard 0.1.0__tar.gz

@@ -0,0 +1,3 @@
1
+ .venv/
2
+ venv/
3
+ .DS_Store
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Jeremy Wizenfeld
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,244 @@
1
+ Metadata-Version: 2.4
2
+ Name: echo-guard
3
+ Version: 0.1.0
4
+ Summary: Semantic linting CLI that detects codebase redundancy created by AI coding agents
5
+ License-Expression: MIT
6
+ License-File: LICENSE
7
+ Requires-Python: >=3.10
8
+ Requires-Dist: datasketch>=1.6.0
9
+ Requires-Dist: duckdb>=0.9.0
10
+ Requires-Dist: numpy>=1.24.0
11
+ Requires-Dist: pyyaml>=6.0
12
+ Requires-Dist: rich>=13.0.0
13
+ Requires-Dist: scikit-learn>=1.3.0
14
+ Requires-Dist: tree-sitter>=0.22.0
15
+ Requires-Dist: typer>=0.9.0
16
+ Requires-Dist: watchdog>=3.0.0
17
+ Provides-Extra: all
18
+ Requires-Dist: tree-sitter-c>=0.21.0; extra == 'all'
19
+ Requires-Dist: tree-sitter-cpp>=0.21.0; extra == 'all'
20
+ Requires-Dist: tree-sitter-go>=0.21.0; extra == 'all'
21
+ Requires-Dist: tree-sitter-java>=0.21.0; extra == 'all'
22
+ Requires-Dist: tree-sitter-javascript>=0.21.0; extra == 'all'
23
+ Requires-Dist: tree-sitter-python>=0.21.0; extra == 'all'
24
+ Requires-Dist: tree-sitter-ruby>=0.21.0; extra == 'all'
25
+ Requires-Dist: tree-sitter-rust>=0.21.0; extra == 'all'
26
+ Requires-Dist: tree-sitter-typescript>=0.21.0; extra == 'all'
27
+ Provides-Extra: dev
28
+ Requires-Dist: pre-commit>=3.0.0; extra == 'dev'
29
+ Requires-Dist: pytest>=7.0.0; extra == 'dev'
30
+ Provides-Extra: languages
31
+ Requires-Dist: tree-sitter-c>=0.21.0; extra == 'languages'
32
+ Requires-Dist: tree-sitter-cpp>=0.21.0; extra == 'languages'
33
+ Requires-Dist: tree-sitter-go>=0.21.0; extra == 'languages'
34
+ Requires-Dist: tree-sitter-java>=0.21.0; extra == 'languages'
35
+ Requires-Dist: tree-sitter-javascript>=0.21.0; extra == 'languages'
36
+ Requires-Dist: tree-sitter-python>=0.21.0; extra == 'languages'
37
+ Requires-Dist: tree-sitter-ruby>=0.21.0; extra == 'languages'
38
+ Requires-Dist: tree-sitter-rust>=0.21.0; extra == 'languages'
39
+ Requires-Dist: tree-sitter-typescript>=0.21.0; extra == 'languages'
40
+ Provides-Extra: mcp
41
+ Requires-Dist: mcp[cli]>=1.0.0; extra == 'mcp'
42
+ Description-Content-Type: text/markdown
43
+
44
+ # Echo Guard
45
+
46
+ **Semantic linting CLI that detects codebase redundancy created by AI coding agents.**
47
+
48
+ AI agents (Claude Code, Cursor, Copilot) write what's asked without knowing what already exists elsewhere in the repo. Echo Guard catches these duplicates — functionally identical code scattered across modules — before they become legacy debt.
49
+
50
+ Works across 9 languages. Detects cross-language redundancy. Zero cloud dependency — everything runs locally.
51
+
52
+ ## Installation
53
+
54
+ ```bash
55
+ pip install echo-guard
56
+ ```
57
+
58
+ For all 9 language grammars (recommended):
59
+
60
+ ```bash
61
+ pip install "echo-guard[languages]"
62
+ ```
63
+
64
+ Without `[languages]`, only Python is available.
65
+
66
+ ## Quick Start
67
+
68
+ ```bash
69
+ # Interactive setup — auto-detects languages, configures, indexes, and scans
70
+ echo-guard setup
71
+
72
+ # Or do it manually:
73
+ echo-guard index # index your codebase (~6s for a medium repo)
74
+ echo-guard scan # scan for redundancies (HIGH + MEDIUM shown)
75
+ echo-guard scan -v # include LOW-severity findings too
76
+ ```
77
+
78
+ ## How It Works
79
+
80
+ Echo Guard uses a 4-stage pipeline:
81
+
82
+ | Stage | What it does | Complexity |
83
+ | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
84
+ | **1. AST fingerprinting** | Tree-sitter parses every function, normalizes the AST (strips names, comments, literals), and hashes it. Exact structural clones are caught instantly. | O(n) |
85
+ | **2. Signature filtering** | Extracts metadata (param count, return type, call count) to eliminate 90%+ of candidates before heavy computation. | O(n) |
86
+ | **3. LSH + TF-IDF** | Locality Sensitive Hashing groups similar code vectors into buckets. TF-IDF with subword tokenization runs cosine similarity only on bucket neighbors. Catches semantic duplicates — even across languages. | O(n\*k) |
87
+ | **4. Intent filtering** | Domain-noun extraction, antonym detection, UI wrapper recognition, per-service boilerplate exclusion, and cross-language threshold gating. Removes false positives without losing signal. | O(n) |
88
+
89
+ The index is stored locally in DuckDB (`.echo-guard/index.duckdb`). Nothing leaves your machine.
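Stage 1 of the table above can be illustrated with a minimal sketch. It assumes Python's standard `ast` module as a stand-in for Tree-sitter, and the names `fingerprint` and `_Normalizer` are hypothetical, not part of Echo Guard. The idea is only to show how stripping names and literals lets two renamed copies of a function hash to the same value.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers and literal values with fixed placeholders."""

    def visit_FunctionDef(self, node):
        node.name = "_"           # function name is irrelevant to structure
        self.generic_visit(node)  # normalize arguments, body, decorators
        return node

    def visit_arg(self, node):
        node.arg = "_"            # parameter name
        node.annotation = None    # drop type annotations in this sketch
        return node

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=None), node)


def fingerprint(source: str) -> str:
    """Hash the normalized AST; comments are already absent from the AST."""
    tree = _Normalizer().visit(ast.parse(source))
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()
```

Under these assumptions, `def add(x, y): return x + y` and `def plus(left, right): return left + right` produce the same fingerprint, while a function that subtracts instead of adding does not. Attribute names such as `.strip` are kept in this sketch, so method calls still distinguish functions.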
90
+
91
+ ## Supported Languages
92
+
93
+ | Language | Extensions |
94
+ | ---------- | ----------------------------- |
95
+ | Python | `.py` |
96
+ | JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` |
97
+ | TypeScript | `.ts`, `.tsx` |
98
+ | Go | `.go` |
99
+ | Rust | `.rs` |
100
+ | Java | `.java` |
101
+ | Ruby | `.rb` |
102
+ | C | `.c`, `.h` |
103
+ | C++ | `.cpp`, `.cc`, `.cxx`, `.hpp` |
104
+
105
+ Cross-language detection works: a Python `validate_email()` will match a Go `ValidateEmail()` or a JS `validateEmail()`.
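One way such names could line up across naming conventions is subword tokenization, which the pipeline table mentions for the TF-IDF stage. The sketch below is an assumption about how identifiers might be split, not Echo Guard's actual tokenizer; `subword_tokens` is a hypothetical helper.

```python
import re

# An acronym run (e.g. "HTTP"), a capitalized or lowercase word, or a digit run.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def subword_tokens(identifier: str) -> list[str]:
    """Split snake_case, camelCase, and PascalCase names into lowercase words."""
    return [match.group(0).lower() for match in _WORD.finditer(identifier)]


print(subword_tokens("validate_email"))  # ['validate', 'email']
print(subword_tokens("ValidateEmail"))   # ['validate', 'email']
print(subword_tokens("validateEmail"))   # ['validate', 'email']
```

All three spellings reduce to the same token list, so a similarity measure computed over tokens treats them as the same name regardless of language convention.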
106
+
107
+ ## CLI Reference
108
+
109
+ | Command | Description |
110
+ | --------------------------- | ------------------------------------------------------------------------ |
111
+ | `echo-guard setup` | Interactive setup wizard — auto-detects repo, configures, indexes, scans |
112
+ | `echo-guard scan` | Scan for redundant code (HIGH + MEDIUM by default) |
113
+ | `echo-guard scan -v` | Include LOW-severity findings |
114
+ | `echo-guard index` | Index all functions in the repo |
115
+ | `echo-guard check FILES...` | Check specific files against the index (fast path for pre-commit) |
116
+ | `echo-guard watch` | Watch repo and auto-check on file save |
117
+ | `echo-guard health` | Compute codebase health score (0-100) |
118
+ | `echo-guard stats` | Show index statistics |
119
+ | `echo-guard install-hook` | Install git pre-commit hook |
120
+ | `echo-guard init` | Create default `.echoguard.yml` |
121
+ | `echo-guard languages` | List supported languages |
122
+ | `echo-guard clear-index` | Clear the index |
123
+
124
+ ### Key options
125
+
126
+ - `-t, --threshold FLOAT` — Similarity threshold 0.0-1.0 (default: 0.50)
127
+ - `-o, --output FORMAT` — Output format: `rich` (default), `json`, `compact`
128
+ - `-v, --verbose` — Include LOW-severity findings (hidden by default)
129
+ - `-d, --diff` — Show side-by-side diff for each match
130
+ - `--no-graph` — Disable dependency graph routing
131
+
132
+ ## Severity Levels
133
+
134
+ | Level | Similarity | What it means | Default behavior |
135
+ | ---------- | ---------- | ----------------------------------------------------------------------------------- | ----------------- |
136
+ | **HIGH** | 95-100% | Near-exact clones. Copy-pasted code with minimal changes. | Shown, fails CI |
137
+ | **MEDIUM** | 80-94% | Strong semantic match. Same logic, different variable names or minor restructuring. | Shown |
138
+ | **LOW** | 50-79% | Structural similarity. May be intentional patterns or real duplication. | Hidden (use `-v`) |
139
+
140
+ The default report shows only HIGH + MEDIUM.
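The severity bands in the table can be sketched as a simple mapping. This is an illustration only: `classify` and `visible` are hypothetical names, and the handling of scores that fall between the listed percentages (for example 94.5%) is an assumption.

```python
from typing import Optional


def classify(similarity: float) -> Optional[str]:
    """Map a similarity score in [0.0, 1.0] to a severity band from the table."""
    if similarity >= 0.95:
        return "HIGH"
    if similarity >= 0.80:
        return "MEDIUM"
    if similarity >= 0.50:
        return "LOW"
    return None  # below the default 0.50 threshold: not reported


def visible(similarities: list[float], verbose: bool = False) -> list[tuple[float, str]]:
    """Keep HIGH and MEDIUM by default; include LOW only when verbose is set."""
    shown = {"HIGH", "MEDIUM", "LOW"} if verbose else {"HIGH", "MEDIUM"}
    labelled = [(score, classify(score)) for score in similarities]
    return [(score, level) for score, level in labelled if level in shown]
```

With these assumptions, `visible([0.97, 0.85, 0.60])` keeps the first two findings, and passing `verbose=True` (the analogue of `-v`) also keeps the third.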
141
+
142
+ ## Configuration
143
+
144
+ Create `.echoguard.yml` in your repo root (or run `echo-guard setup`):
145
+
146
+ ```yaml
147
+ threshold: 0.50
148
+ min_function_lines: 3
149
+ max_function_lines: 500
150
+ languages:
151
+ - python
152
+ - javascript
153
+ - typescript
154
+ fail_on: high # high, medium, low, none
155
+ enable_dep_graph: true
156
+
157
+ # Monorepo service boundaries (auto-detected if not set)
158
+ # service_boundaries:
159
+ # - services/worker
160
+ # - services/dashboard
161
+ ```
162
+
163
+ ### `.echoguardignore`
164
+
165
+ Gitignore-style file to exclude paths from scanning:
166
+
167
+ ```gitignore
168
+ docs/
169
+ tests/
170
+ *_generated.py
171
+ vendor/
172
+ ```
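To make the effect of the patterns above concrete, here is a deliberately simplified matcher. It is not full gitignore semantics (no negation, anchoring, or `**`), and `is_ignored` is a hypothetical helper rather than Echo Guard's implementation.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Return True if a repo-relative POSIX path matches any simplified pattern."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything beneath a directory of that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            # File pattern: glob-match against the file name only.
            return True
    return False


patterns = ["docs/", "tests/", "*_generated.py", "vendor/"]
print(is_ignored("tests/test_api.py", patterns))      # True
print(is_ignored("src/models_generated.py", patterns))  # True
print(is_ignored("src/app.py", patterns))             # False
```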
173
+
174
+ ### Recommended configs
175
+
176
+ **Small project** (< 500 functions):
177
+
178
+ ```yaml
179
+ threshold: 0.50
180
+ fail_on: high
181
+ enable_dep_graph: false
182
+ ```
183
+
184
+ **Large monorepo** (3K+ functions):
185
+
186
+ ```yaml
187
+ threshold: 0.60
188
+ min_function_lines: 4
189
+ fail_on: high
190
+ enable_dep_graph: true
191
+ service_boundaries:
192
+ - services/api
193
+ - services/worker
194
+ ```
195
+
196
+ **First-time setup** (advisory mode):
197
+
198
+ ```yaml
199
+ threshold: 0.50
200
+ fail_on: none
201
+ ```
202
+
203
+ ## CI Integration
204
+
205
+ Echo Guard exits non-zero when any finding meets or exceeds the configured `fail_on` severity.
206
+
207
+ ```yaml
208
+ # GitHub Actions
209
+ - name: Check for redundant code
210
+ run: |
211
+ pip install "echo-guard[languages]"
212
+ echo-guard index
213
+ echo-guard scan
214
+ ```
215
+
216
+ ```bash
217
+ # Pre-commit hook (installed automatically)
218
+ echo-guard install-hook
219
+ ```
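The exit rule stated at the top of this section can be sketched as follows. This assumes a finding fails the run when its severity is at or above `fail_on`, which matches the severity table (HIGH fails CI under the default `fail_on: high`). The function name and the exit value of 1 are illustrative assumptions.

```python
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}


def exit_code(finding_severities: list[str], fail_on: str = "high") -> int:
    """Return 1 if any finding is at or above `fail_on`, otherwise 0."""
    if fail_on == "none":
        return 0  # advisory mode: report findings but never fail the run
    bar = SEVERITY_RANK[fail_on]
    return int(any(SEVERITY_RANK[s.lower()] >= bar for s in finding_severities))
```

Under this rule, a scan that reports only MEDIUM and LOW findings passes with `fail_on: high` and fails with `fail_on: medium`.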
220
+
221
+ ## MCP Server (Claude Code Integration)
222
+
223
+ Echo Guard includes an MCP server so Claude Code can check for existing code before writing new functions:
224
+
225
+ ```bash
226
+ # Add to Claude Code
227
+ claude mcp add echo-guard -- python -m echo_guard.mcp_server
228
+ ```
229
+
230
+ | Tool | Description |
231
+ | ----------------------- | --------------------------------------------------------------------- |
232
+ | `check_before_write` | Pass proposed code, get back existing matches with import suggestions |
233
+ | `search_functions` | Search the index by function name, keyword, or language |
234
+ | `suggest_refactor` | Get full context for consolidating two redundant functions |
235
+ | `get_index_stats` | Index statistics and dependency graph info |
236
+ | `get_codebase_clusters` | View how the codebase is organized by domain |
237
+
238
+ ## Privacy
239
+
240
+ Echo Guard runs entirely on your machine. No code, metrics, or telemetry are sent anywhere. The index (`.echo-guard/index.duckdb`) contains function metadata from your repo — add it to your `.gitignore`.
241
+
242
+ ## License
243
+
244
+ [MIT](LICENSE)
@@ -0,0 +1,201 @@
1
+ # Echo Guard
2
+
3
+ **Semantic linting CLI that detects codebase redundancy created by AI coding agents.**
4
+
5
+ AI agents (Claude Code, Cursor, Copilot) write what's asked without knowing what already exists elsewhere in the repo. Echo Guard catches these duplicates — functionally identical code scattered across modules — before they become legacy debt.
6
+
7
+ Works across 9 languages. Detects cross-language redundancy. Zero cloud dependency — everything runs locally.
8
+
9
+ ## Installation
10
+
11
+ ```bash
12
+ pip install echo-guard
13
+ ```
14
+
15
+ For all 9 language grammars (recommended):
16
+
17
+ ```bash
18
+ pip install "echo-guard[languages]"
19
+ ```
20
+
21
+ Without `[languages]`, only Python is available.
22
+
23
+ ## Quick Start
24
+
25
+ ```bash
26
+ # Interactive setup — auto-detects languages, configures, indexes, and scans
27
+ echo-guard setup
28
+
29
+ # Or do it manually:
30
+ echo-guard index # index your codebase (~6s for a medium repo)
31
+ echo-guard scan # scan for redundancies (HIGH + MEDIUM shown)
32
+ echo-guard scan -v # include LOW-severity findings too
33
+ ```
34
+
35
+ ## How It Works
36
+
37
+ Echo Guard uses a 4-stage pipeline:
38
+
39
+ | Stage | What it does | Complexity |
40
+ | -------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- |
41
+ | **1. AST fingerprinting** | Tree-sitter parses every function, normalizes the AST (strips names, comments, literals), and hashes it. Exact structural clones are caught instantly. | O(n) |
42
+ | **2. Signature filtering** | Extracts metadata (param count, return type, call count) to eliminate 90%+ of candidates before heavy computation. | O(n) |
43
+ | **3. LSH + TF-IDF** | Locality Sensitive Hashing groups similar code vectors into buckets. TF-IDF with subword tokenization runs cosine similarity only on bucket neighbors. Catches semantic duplicates — even across languages. | O(n\*k) |
44
+ | **4. Intent filtering** | Domain-noun extraction, antonym detection, UI wrapper recognition, per-service boilerplate exclusion, and cross-language threshold gating. Removes false positives without losing signal. | O(n) |
45
+
46
+ The index is stored locally in DuckDB (`.echo-guard/index.duckdb`). Nothing leaves your machine.
47
+
48
+ ## Supported Languages
49
+
50
+ | Language | Extensions |
51
+ | ---------- | ----------------------------- |
52
+ | Python | `.py` |
53
+ | JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` |
54
+ | TypeScript | `.ts`, `.tsx` |
55
+ | Go | `.go` |
56
+ | Rust | `.rs` |
57
+ | Java | `.java` |
58
+ | Ruby | `.rb` |
59
+ | C | `.c`, `.h` |
60
+ | C++ | `.cpp`, `.cc`, `.cxx`, `.hpp` |
61
+
62
+ Cross-language detection works: a Python `validate_email()` will match a Go `ValidateEmail()` or a JS `validateEmail()`.
63
+
64
+ ## CLI Reference
65
+
66
+ | Command | Description |
67
+ | --------------------------- | ------------------------------------------------------------------------ |
68
+ | `echo-guard setup` | Interactive setup wizard — auto-detects repo, configures, indexes, scans |
69
+ | `echo-guard scan` | Scan for redundant code (HIGH + MEDIUM by default) |
70
+ | `echo-guard scan -v` | Include LOW-severity findings |
71
+ | `echo-guard index` | Index all functions in the repo |
72
+ | `echo-guard check FILES...` | Check specific files against the index (fast path for pre-commit) |
73
+ | `echo-guard watch` | Watch repo and auto-check on file save |
74
+ | `echo-guard health` | Compute codebase health score (0-100) |
75
+ | `echo-guard stats` | Show index statistics |
76
+ | `echo-guard install-hook` | Install git pre-commit hook |
77
+ | `echo-guard init` | Create default `.echoguard.yml` |
78
+ | `echo-guard languages` | List supported languages |
79
+ | `echo-guard clear-index` | Clear the index |
80
+
81
+ ### Key options
82
+
83
+ - `-t, --threshold FLOAT` — Similarity threshold 0.0-1.0 (default: 0.50)
84
+ - `-o, --output FORMAT` — Output format: `rich` (default), `json`, `compact`
85
+ - `-v, --verbose` — Include LOW-severity findings (hidden by default)
86
+ - `-d, --diff` — Show side-by-side diff for each match
87
+ - `--no-graph` — Disable dependency graph routing
88
+
89
+ ## Severity Levels
90
+
91
+ | Level | Similarity | What it means | Default behavior |
92
+ | ---------- | ---------- | ----------------------------------------------------------------------------------- | ----------------- |
93
+ | **HIGH** | 95-100% | Near-exact clones. Copy-pasted code with minimal changes. | Shown, fails CI |
94
+ | **MEDIUM** | 80-94% | Strong semantic match. Same logic, different variable names or minor restructuring. | Shown |
95
+ | **LOW** | 50-79% | Structural similarity. May be intentional patterns or real duplication. | Hidden (use `-v`) |
96
+
97
+ The default report shows only HIGH + MEDIUM.
98
+
99
+ ## Configuration
100
+
101
+ Create `.echoguard.yml` in your repo root (or run `echo-guard setup`):
102
+
103
+ ```yaml
104
+ threshold: 0.50
105
+ min_function_lines: 3
106
+ max_function_lines: 500
107
+ languages:
108
+ - python
109
+ - javascript
110
+ - typescript
111
+ fail_on: high # high, medium, low, none
112
+ enable_dep_graph: true
113
+
114
+ # Monorepo service boundaries (auto-detected if not set)
115
+ # service_boundaries:
116
+ # - services/worker
117
+ # - services/dashboard
118
+ ```
119
+
120
+ ### `.echoguardignore`
121
+
122
+ Gitignore-style file to exclude paths from scanning:
123
+
124
+ ```gitignore
125
+ docs/
126
+ tests/
127
+ *_generated.py
128
+ vendor/
129
+ ```
130
+
131
+ ### Recommended configs
132
+
133
+ **Small project** (< 500 functions):
134
+
135
+ ```yaml
136
+ threshold: 0.50
137
+ fail_on: high
138
+ enable_dep_graph: false
139
+ ```
140
+
141
+ **Large monorepo** (3K+ functions):
142
+
143
+ ```yaml
144
+ threshold: 0.60
145
+ min_function_lines: 4
146
+ fail_on: high
147
+ enable_dep_graph: true
148
+ service_boundaries:
149
+ - services/api
150
+ - services/worker
151
+ ```
152
+
153
+ **First-time setup** (advisory mode):
154
+
155
+ ```yaml
156
+ threshold: 0.50
157
+ fail_on: none
158
+ ```
159
+
160
+ ## CI Integration
161
+
162
+ Echo Guard exits non-zero when any finding meets or exceeds the configured `fail_on` severity.
163
+
164
+ ```yaml
165
+ # GitHub Actions
166
+ - name: Check for redundant code
167
+ run: |
168
+ pip install "echo-guard[languages]"
169
+ echo-guard index
170
+ echo-guard scan
171
+ ```
172
+
173
+ ```bash
174
+ # Pre-commit hook (installed automatically)
175
+ echo-guard install-hook
176
+ ```
177
+
178
+ ## MCP Server (Claude Code Integration)
179
+
180
+ Echo Guard includes an MCP server so Claude Code can check for existing code before writing new functions:
181
+
182
+ ```bash
183
+ # Add to Claude Code
184
+ claude mcp add echo-guard -- python -m echo_guard.mcp_server
185
+ ```
186
+
187
+ | Tool | Description |
188
+ | ----------------------- | --------------------------------------------------------------------- |
189
+ | `check_before_write` | Pass proposed code, get back existing matches with import suggestions |
190
+ | `search_functions` | Search the index by function name, keyword, or language |
191
+ | `suggest_refactor` | Get full context for consolidating two redundant functions |
192
+ | `get_index_stats` | Index statistics and dependency graph info |
193
+ | `get_codebase_clusters` | View how the codebase is organized by domain |
194
+
195
+ ## Privacy
196
+
197
+ Echo Guard runs entirely on your machine. No code, metrics, or telemetry are sent anywhere. The index (`.echo-guard/index.duckdb`) contains function metadata from your repo — add it to your `.gitignore`.
198
+
199
+ ## License
200
+
201
+ [MIT](LICENSE)
@@ -0,0 +1,6 @@
1
+ """Echo Guard - Semantic linting for AI-generated code redundancy.
2
+
3
+ Supports: Python, JavaScript, TypeScript, Go, Rust, Java, Ruby, C, C++
4
+ """
5
+
6
+ __version__ = "0.1.0"