codeclone 1.3.0__tar.gz → 1.4.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (77)
  1. codeclone-1.4.1/PKG-INFO +388 -0
  2. codeclone-1.4.1/README.md +347 -0
  3. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_cli_args.py +22 -3
  4. codeclone-1.4.1/codeclone/_cli_meta.py +93 -0
  5. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_cli_paths.py +15 -3
  6. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_cli_summary.py +16 -2
  7. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_html_escape.py +3 -4
  8. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_html_snippets.py +11 -4
  9. codeclone-1.4.1/codeclone/_report_blocks.py +94 -0
  10. codeclone-1.4.1/codeclone/_report_explain.py +251 -0
  11. codeclone-1.4.1/codeclone/_report_explain_contract.py +48 -0
  12. codeclone-1.4.1/codeclone/_report_serialize.py +418 -0
  13. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_report_types.py +4 -0
  14. codeclone-1.4.1/codeclone/baseline.py +648 -0
  15. codeclone-1.4.1/codeclone/cache.py +836 -0
  16. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/cli.py +290 -258
  17. codeclone-1.4.1/codeclone/contracts.py +64 -0
  18. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/errors.py +1 -1
  19. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/extractor.py +22 -3
  20. codeclone-1.4.1/codeclone/html_report.py +789 -0
  21. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/report.py +5 -0
  22. codeclone-1.4.1/codeclone/templates.py +3103 -0
  23. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/ui_messages.py +121 -55
  24. codeclone-1.4.1/codeclone.egg-info/PKG-INFO +388 -0
  25. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/SOURCES.txt +6 -0
  26. {codeclone-1.3.0 → codeclone-1.4.1}/pyproject.toml +1 -1
  27. codeclone-1.4.1/tests/test_baseline.py +876 -0
  28. codeclone-1.4.1/tests/test_cache.py +875 -0
  29. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_inprocess.py +891 -190
  30. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_unit.py +117 -10
  31. codeclone-1.4.1/tests/test_detector_golden.py +61 -0
  32. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_extractor.py +86 -2
  33. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_html_report.py +428 -42
  34. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_report.py +574 -30
  35. codeclone-1.4.1/tests/test_report_explain.py +229 -0
  36. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_security.py +2 -0
  37. codeclone-1.3.0/PKG-INFO +0 -402
  38. codeclone-1.3.0/README.md +0 -361
  39. codeclone-1.3.0/codeclone/_cli_meta.py +0 -43
  40. codeclone-1.3.0/codeclone/_report_serialize.py +0 -160
  41. codeclone-1.3.0/codeclone/baseline.py +0 -245
  42. codeclone-1.3.0/codeclone/cache.py +0 -285
  43. codeclone-1.3.0/codeclone/html_report.py +0 -434
  44. codeclone-1.3.0/codeclone/templates.py +0 -2457
  45. codeclone-1.3.0/codeclone.egg-info/PKG-INFO +0 -402
  46. codeclone-1.3.0/tests/test_baseline.py +0 -296
  47. codeclone-1.3.0/tests/test_cache.py +0 -437
  48. {codeclone-1.3.0 → codeclone-1.4.1}/LICENSE +0 -0
  49. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/__init__.py +0 -0
  50. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_report_grouping.py +0 -0
  51. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_report_segments.py +0 -0
  52. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/blockhash.py +0 -0
  53. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/blocks.py +0 -0
  54. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/cfg.py +0 -0
  55. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/cfg_model.py +0 -0
  56. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/fingerprint.py +0 -0
  57. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/meta_markers.py +0 -0
  58. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/normalize.py +0 -0
  59. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/py.typed +0 -0
  60. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/scanner.py +0 -0
  61. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/dependency_links.txt +0 -0
  62. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/entry_points.txt +0 -0
  63. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/requires.txt +0 -0
  64. {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/top_level.txt +0 -0
  65. {codeclone-1.3.0 → codeclone-1.4.1}/setup.cfg +0 -0
  66. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_blockhash.py +0 -0
  67. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_blocks.py +0 -0
  68. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cfg.py +0 -0
  69. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cfg_model.py +0 -0
  70. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_main_guard.py +0 -0
  71. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_main_guard_runpy.py +0 -0
  72. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_smoke.py +0 -0
  73. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_fingerprint.py +0 -0
  74. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_init.py +0 -0
  75. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_normalize.py +0 -0
  76. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_scanner_extra.py +0 -0
  77. {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_segments.py +0 -0
@@ -0,0 +1,388 @@
+ Metadata-Version: 2.4
+ Name: codeclone
+ Version: 1.4.1
+ Summary: AST and CFG-based code clone detector for Python focused on architectural duplication
+ Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
+ Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
+ License: MIT
+ Project-URL: Homepage, https://github.com/orenlab/codeclone
+ Project-URL: Repository, https://github.com/orenlab/codeclone
+ Project-URL: Issues, https://github.com/orenlab/codeclone/issues
+ Project-URL: Changelog, https://github.com/orenlab/codeclone/releases
+ Project-URL: Documentation, https://github.com/orenlab/codeclone/tree/main/docs
+ Keywords: python,ast,cfg,code-clone,duplication,static-analysis,architecture,control-flow,ci
+ Classifier: Development Status :: 5 - Production/Stable
+ Classifier: Intended Audience :: Developers
+ Classifier: Topic :: Software Development :: Quality Assurance
+ Classifier: Topic :: Software Development :: Testing
+ Classifier: Typing :: Typed
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Programming Language :: Python :: 3.12
+ Classifier: Programming Language :: Python :: 3.13
+ Classifier: Programming Language :: Python :: 3.14
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: pygments>=2.19.2
+ Requires-Dist: rich>=14.3.2
+ Provides-Extra: dev
+ Requires-Dist: pytest>=9.0.0; extra == "dev"
+ Requires-Dist: pytest-cov>=6.1.0; extra == "dev"
+ Requires-Dist: build>=1.2.0; extra == "dev"
+ Requires-Dist: twine>=5.0.0; extra == "dev"
+ Requires-Dist: mypy>=1.19.1; extra == "dev"
+ Requires-Dist: ruff>=0.15.0; extra == "dev"
+ Requires-Dist: pre-commit>=4.5.1; extra == "dev"
+ Dynamic: license-file
+
+ # CodeClone
+
+ [![PyPI](https://img.shields.io/pypi/v/codeclone.svg?style=flat-square)](https://pypi.org/project/codeclone/)
+ [![Downloads](https://img.shields.io/pypi/dm/codeclone.svg?style=flat-square)](https://pypi.org/project/codeclone/)
+ [![tests](https://github.com/orenlab/codeclone/actions/workflows/tests.yml/badge.svg?branch=main&style=flat-square)](https://github.com/orenlab/codeclone/actions/workflows/tests.yml)
+ [![Python](https://img.shields.io/pypi/pyversions/codeclone.svg?style=flat-square)](https://pypi.org/project/codeclone/)
+ ![CI First](https://img.shields.io/badge/CI-first-green?style=flat-square)
+ ![Baseline](https://img.shields.io/badge/baseline-versioned-green?style=flat-square)
+ [![License](https://img.shields.io/pypi/l/codeclone.svg?style=flat-square)](LICENSE)
+
+ **CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+ It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
+
+ ---
+
+ ## Why CodeClone
+
+ CodeClone focuses on **architectural duplication**, not text similarity. It detects structural patterns through:
+
+ - **Normalized AST analysis** — robust to renaming, formatting, and minor refactors
+ - **Control Flow Graphs** — captures execution logic, not just syntax
+ - **Strict, explainable matching** — clear signals, not fuzzy heuristics
+
+ Unlike token-based tools, CodeClone compares **structure and control flow**, making it ideal for finding:
+
+ - Repeated service/orchestration patterns
+ - Duplicated guard/validation blocks
+ - Copy-pasted handler logic across modules
+ - Recurring internal segments in large functions
+
+ ---
+
+ ## Core Capabilities
+
+ **Three Detection Levels:**
+
+ 1. **Function clones (CFG fingerprint)**
+    Strong structural signal for cross-layer duplication
+
+ 2. **Block clones (statement windows)**
+    Detects repeated local logic patterns
+
+ 3. **Segment clones (report-only)**
+    Internal function repetition for explainability; not used for baseline gating
+
+ **CI-Ready Features:**
+
+ - Deterministic output with stable ordering
+ - Reproducible artifacts for audit trails
+ - Baseline-driven gating to prevent new duplication
+ - Fast incremental analysis with intelligent caching
+
+ ---
+
+ ## Installation
+
+ ```bash
+ pip install codeclone
+ ```
+
+ **Requirements:** Python 3.10+
+
+ ---
+
+ ## Quick Start
+
+ ### Basic Analysis
+
+ ```bash
+ # Analyze current directory
+ codeclone .
+
+ # Check version
+ codeclone --version
+ ```
+
+ ### Generate Reports
+
+ ```bash
+ codeclone . \
+   --html .cache/codeclone/report.html \
+   --json .cache/codeclone/report.json \
+   --text .cache/codeclone/report.txt
+ ```
+
+ ### CI Integration
+
+ ```bash
+ # 1. Generate baseline once (commit to repo)
+ codeclone . --update-baseline
+
+ # 2. Add to CI pipeline
+ codeclone . --ci
+ ```
+
+ The `--ci` preset is equivalent to `--fail-on-new --no-color --quiet`.
+
+ ---
+
+ ## Baseline Workflow
+
+ Baselines capture the **current state of duplication** in your codebase. Once committed, they serve as the reference
+ point for CI checks.
+
+ **Key points (contract-level):**
+
+ - Baseline file is versioned (`codeclone.baseline.json`) and used to classify clones as **NEW** vs **KNOWN**.
+ - Compatibility is gated by `schema_version`, `fingerprint_version`, and `python_tag`.
+ - Baseline trust is gated by `meta.generator.name` (`codeclone`) and integrity (`payload_sha256`).
+ - In CI preset (`--ci`), an untrusted baseline is a contract error (exit `2`).
+
+ Full contract details: [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
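
The NEW vs KNOWN classification in the key points above can be pictured as a set comparison between the clone groups found in the current run and those recorded in the baseline. The following is a minimal sketch under stated assumptions, not CodeClone's implementation: the diff does not show the on-disk baseline layout, so `split_new_known` and the plain string keys (`fp-a`, `fp-b`, ...) are hypothetical stand-ins for whatever group identifiers the tool stores.

```python
from typing import Iterable


def split_new_known(
    current_keys: Iterable[str], baseline_keys: Iterable[str]
) -> dict[str, list[str]]:
    """Split clone-group keys into NEW and KNOWN relative to a baseline.

    Both inputs are hypothetical plain iterables of group keys; the real
    baseline file format is not assumed here.
    """
    known_set = set(baseline_keys)
    # Sort so the result is deterministic regardless of input order.
    current = sorted(set(current_keys))
    return {
        "new": [key for key in current if key not in known_set],
        "known": [key for key in current if key in known_set],
    }


result = split_new_known(["fp-b", "fp-a", "fp-c"], ["fp-a", "fp-z"])
print(result)
```

Groups that appear only in the baseline (`fp-z` here) are ignored by this sketch; it only decides how to label what the current run found.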
+
+ ---
+
+ ## Exit Codes
+
+ CodeClone uses a deterministic exit code contract:
+
+ | Code | Meaning |
+ |------|-----------------------------------------------------------------------------|
+ | `0` | Success — run completed without gating failures |
+ | `2` | Contract error — baseline missing/untrusted, invalid output extensions, incompatible versions, unreadable source files in CI/gating |
+ | `3` | Gating failure — new clones detected or threshold exceeded |
+ | `5` | Internal error — unexpected exception |
+
+ **Priority:** Contract errors (`2`) override gating failures (`3`) when both occur.
+
+ Full contract details: [`docs/book/03-contracts-exit-codes.md`](docs/book/03-contracts-exit-codes.md)
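
The precedence rule stated above (a contract error wins over a gating failure) is easy to get wrong in wrapper scripts, so a small sketch may help. The numeric values come from the table; the names `ExitCode` and `resolve_exit_code` are hypothetical and are not taken from the package source.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Exit codes as listed in the README table (names are illustrative)."""

    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


def resolve_exit_code(contract_error: bool, gating_failure: bool) -> ExitCode:
    """Pick one exit code when several conditions hold at once."""
    if contract_error:
        # Contract errors (2) override gating failures (3).
        return ExitCode.CONTRACT_ERROR
    if gating_failure:
        return ExitCode.GATING_FAILURE
    return ExitCode.SUCCESS


print(int(resolve_exit_code(contract_error=True, gating_failure=True)))
```

Internal errors (`5`) are left out of `resolve_exit_code` because, per the table, they correspond to unexpected exceptions rather than a condition the run can evaluate.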
+
+ **Debug Support:**
+
+ ```bash
+ # Show detailed error information
+ codeclone . --debug
+
+ # Or via environment variable
+ CODECLONE_DEBUG=1 codeclone .
+ ```
+
+ ---
+
+ ## Reports
+
+ ### Supported Formats
+
+ - **HTML** (`--html`) — Interactive web report with filtering
+ - **JSON** (`--json`) — Machine-readable structured data
+ - **Text** (`--text`) — Plain text summary
+
+ ### Report Schema (JSON v1.1)
+
+ The JSON report uses a compact deterministic layout:
+
+ - Top-level: `meta`, `files`, `groups`, `groups_split`, `group_item_layout`
+ - Optional top-level: `facts`
+ - `groups_split` provides explicit **NEW / KNOWN** separation per section
+ - `meta.groups_counts` provides deterministic per-section aggregates
+ - `meta` follows a shared canonical contract across HTML/JSON/TXT
+
+ Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
+
+ **Minimal shape (v1.1):**
+
+ ```json
+ {
+   "meta": {
+     "report_schema_version": "1.1",
+     "codeclone_version": "1.4.0",
+     "python_version": "3.13",
+     "python_tag": "cp313",
+     "baseline_path": "/path/to/codeclone.baseline.json",
+     "baseline_fingerprint_version": "1",
+     "baseline_schema_version": "1.0",
+     "baseline_python_tag": "cp313",
+     "baseline_generator_name": "codeclone",
+     "baseline_generator_version": "1.4.0",
+     "baseline_payload_sha256": "<sha256>",
+     "baseline_payload_sha256_verified": true,
+     "baseline_loaded": true,
+     "baseline_status": "ok",
+     "cache_path": "/path/to/.cache/codeclone/cache.json",
+     "cache_used": true,
+     "cache_status": "ok",
+     "cache_schema_version": "1.2",
+     "files_skipped_source_io": 0,
+     "groups_counts": {
+       "functions": {
+         "total": 0,
+         "new": 0,
+         "known": 0
+       },
+       "blocks": {
+         "total": 0,
+         "new": 0,
+         "known": 0
+       },
+       "segments": {
+         "total": 0,
+         "new": 0,
+         "known": 0
+       }
+     }
+   },
+   "files": [],
+   "groups": {
+     "functions": {},
+     "blocks": {},
+     "segments": {}
+   },
+   "groups_split": {
+     "functions": {
+       "new": [],
+       "known": []
+     },
+     "blocks": {
+       "new": [],
+       "known": []
+     },
+     "segments": {
+       "new": [],
+       "known": []
+     }
+   },
+   "group_item_layout": {
+     "functions": [
+       "file_i",
+       "qualname",
+       "start",
+       "end",
+       "loc",
+       "stmt_count",
+       "fingerprint",
+       "loc_bucket"
+     ],
+     "blocks": [
+       "file_i",
+       "qualname",
+       "start",
+       "end",
+       "size"
+     ],
+     "segments": [
+       "file_i",
+       "qualname",
+       "start",
+       "end",
+       "size",
+       "segment_hash",
+       "segment_sig"
+     ]
+   },
+   "facts": {
+     "blocks": {}
+   }
+ }
+ ```
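
A consumer of the JSON report will usually start from `meta.groups_counts`, since it carries the per-section aggregates described above. The snippet below is a small sketch of reading those counts; only the key names (`meta`, `groups_counts`, `new`, and the three section names) come from the README, while the helper `new_counts` and the non-zero sample numbers are invented for illustration.

```python
import json

# Trimmed-down sample report; the counts are made-up illustrative values.
REPORT_JSON = """
{
  "meta": {
    "report_schema_version": "1.1",
    "groups_counts": {
      "functions": {"total": 3, "new": 1, "known": 2},
      "blocks": {"total": 2, "new": 0, "known": 2},
      "segments": {"total": 1, "new": 1, "known": 0}
    }
  }
}
"""


def new_counts(report: dict) -> dict[str, int]:
    """Return the number of NEW groups per section, in sorted section order."""
    counts = report["meta"]["groups_counts"]
    return {section: counts[section]["new"] for section in sorted(counts)}


report = json.loads(REPORT_JSON)
summary = new_counts(report)
print(summary)
```

In a real pipeline the string would be replaced by the contents of the file passed to `--json`.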
+
+ ---
+
+ ## Cache
+
+ Cache is an optimization layer only and is never a source of truth.
+
+ - Default path: `<root>/.cache/codeclone/cache.json`
+ - Schema version: **v1.2**
+ - Invalid or oversized cache is ignored with warning and rebuilt (fail-open)
+
+ Full contract details: [`docs/book/07-cache.md`](docs/book/07-cache.md)
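
"Fail-open" here means that a broken cache never fails the run: the cache is discarded with a warning and the analysis proceeds as if no cache existed. A minimal sketch of that behaviour follows. It is an assumption-laden illustration, not the code in `codeclone/cache.py`: the function name `load_cache_fail_open`, the `MAX_CACHE_BYTES` limit, and the use of the `warnings` module are all hypothetical.

```python
import json
import tempfile
import warnings
from pathlib import Path

MAX_CACHE_BYTES = 50 * 1024 * 1024  # hypothetical size limit


def load_cache_fail_open(path: Path, max_bytes: int = MAX_CACHE_BYTES) -> dict:
    """Return cached data, or an empty dict if the cache is unusable."""
    try:
        if path.stat().st_size > max_bytes:
            warnings.warn(f"cache is oversized, ignoring: {path}")
            return {}
        data = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {}  # no cache yet: nothing to warn about
    except (OSError, ValueError) as exc:
        # ValueError covers both invalid JSON and invalid UTF-8.
        warnings.warn(f"cache is unreadable, ignoring: {path} ({exc})")
        return {}
    if not isinstance(data, dict):
        warnings.warn(f"cache has unexpected shape, ignoring: {path}")
        return {}
    return data


# Usage: a corrupt cache yields an empty result plus one warning.
with tempfile.TemporaryDirectory() as tmp:
    bad_cache = Path(tmp) / "cache.json"
    bad_cache.write_text("{not json", encoding="utf-8")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        recovered = load_cache_fail_open(bad_cache)
    missing = load_cache_fail_open(Path(tmp) / "absent.json")
```

Returning an empty dict in every failure path keeps the caller simple: it rebuilds whatever is missing, which matches the statement that the cache is never a source of truth.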
+
+ ---
+
+ ## Pre-commit Integration
+
+ ```yaml
+ repos:
+   - repo: local
+     hooks:
+       - id: codeclone
+         name: CodeClone
+         entry: codeclone
+         language: system
+         pass_filenames: false
+         args: [ ".", "--ci" ]
+         types: [ python ]
+ ```
+
+ ---
+
+ ## What CodeClone Is (and Is Not)
+
+ ### CodeClone Is
+
+ - A structural clone detector for Python
+ - A CI guard against new duplication
+ - A deterministic analysis tool with auditable outputs
+
+ ### CodeClone Is Not
+
+ - A linter or code formatter
+ - A semantic equivalence prover
+ - A runtime execution analyzer
+
+ ---
+
+ ## How It Works
+
+ **High-level Pipeline:**
+
+ 1. **Parse** — Python source → AST
+ 2. **Normalize** — AST → canonical structure
+ 3. **CFG Construction** — per-function control flow graph
+ 4. **Fingerprinting** — stable hash computation
+ 5. **Grouping** — function/block/segment clone groups
+ 6. **Determinism** — stable ordering for reproducibility
+ 7. **Baseline Comparison** — new vs known clones (when requested)
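
To make the parse, normalize, and fingerprint steps above concrete, here is a deliberately simplified sketch built on the standard `ast` and `hashlib` modules. It is not CodeClone's algorithm: it skips CFG construction entirely and hashes a normalized AST dump instead, and the names `_Normalizer` and `fingerprint` are hypothetical. It only illustrates why renaming variables or changing literal values need not change a structural fingerprint.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers and literal values with placeholders."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = "_"
        node.annotation = None
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.AST:
        # Keep only the literal's type, not its value.
        placeholder = ast.Constant(value=type(node.value).__name__)
        return ast.copy_location(placeholder, node)


def fingerprint(func_source: str) -> str:
    """Hash the normalized AST of a single top-level function."""
    func = ast.parse(func_source).body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    func.name = "_"
    normalized = _Normalizer().visit(func)
    # ast.dump omits line/column attributes by default.
    return hashlib.sha256(ast.dump(normalized).encode("utf-8")).hexdigest()


a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
b = "def add_up(items):\n    acc = 1\n    for it in items:\n        acc += it\n    return acc\n"
c = "def total(xs):\n    s = 0\n    while xs:\n        s += xs.pop()\n    return s\n"

print(fingerprint(a) == fingerprint(b))  # same structure, different names
print(fingerprint(a) == fingerprint(c))  # different control flow
```

Functions `a` and `b` differ only in identifiers and a literal value, so they hash identically, while `c` uses a `while` loop and hashes differently. Grouping then amounts to collecting functions that share a fingerprint.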
+
+ Learn more:
+
+ - Architecture: [`docs/architecture.md`](docs/architecture.md)
+ - CFG semantics: [`docs/cfg.md`](docs/cfg.md)
+
+ ---
+
+ ## Documentation Map
+
+ Use this map to pick the right level of detail:
+
+ - **Contract book (canonical contracts/specs):** [`docs/book/`](docs/book/)
+   - Start here: [`docs/book/00-intro.md`](docs/book/00-intro.md)
+   - Exit codes and precedence: [`docs/book/03-contracts-exit-codes.md`](docs/book/03-contracts-exit-codes.md)
+   - Baseline contract (schema/trust/integrity): [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
+   - Cache contract (schema/integrity/fail-open): [`docs/book/07-cache.md`](docs/book/07-cache.md)
+   - Report contract (schema v1.1 + NEW/KNOWN split): [`docs/book/08-report.md`](docs/book/08-report.md)
+   - CLI behavior: [`docs/book/09-cli.md`](docs/book/09-cli.md)
+   - HTML rendering: [`docs/book/10-html-render.md`](docs/book/10-html-render.md)
+   - Determinism policy: [`docs/book/12-determinism.md`](docs/book/12-determinism.md)
+   - Compatibility/versioning rules: [
+     `docs/book/14-compatibility-and-versioning.md`](docs/book/14-compatibility-and-versioning.md)
+ - **Deep dives:**
+   - Architecture narrative: [`docs/architecture.md`](docs/architecture.md)
+   - CFG semantics: [`docs/cfg.md`](docs/cfg.md)
+
+ ## Links
+
+ - **Issues:** <https://github.com/orenlab/codeclone/issues>
+ - **PyPI:** <https://pypi.org/project/codeclone/>