codeclone 1.0.0__py3-none-any.whl → 1.2.0__py3-none-any.whl
- codeclone/__init__.py +16 -0
- codeclone/baseline.py +21 -9
- codeclone/blockhash.py +10 -1
- codeclone/blocks.py +26 -16
- codeclone/cache.py +20 -6
- codeclone/cfg.py +338 -0
- codeclone/cli.py +357 -93
- codeclone/extractor.py +92 -32
- codeclone/fingerprint.py +11 -1
- codeclone/html_report.py +936 -0
- codeclone/normalize.py +73 -26
- codeclone/report.py +29 -13
- codeclone/scanner.py +24 -4
- codeclone-1.2.0.dist-info/METADATA +264 -0
- codeclone-1.2.0.dist-info/RECORD +19 -0
- {codeclone-1.0.0.dist-info → codeclone-1.2.0.dist-info}/WHEEL +1 -1
- codeclone-1.0.0.dist-info/METADATA +0 -211
- codeclone-1.0.0.dist-info/RECORD +0 -17
- {codeclone-1.0.0.dist-info → codeclone-1.2.0.dist-info}/entry_points.txt +0 -0
- {codeclone-1.0.0.dist-info → codeclone-1.2.0.dist-info}/licenses/LICENSE +0 -0
- {codeclone-1.0.0.dist-info → codeclone-1.2.0.dist-info}/top_level.txt +0 -0
codeclone-1.0.0.dist-info/METADATA
DELETED
@@ -1,211 +0,0 @@
-Metadata-Version: 2.4
-Name: codeclone
-Version: 1.0.0
-Summary: AST-based code clone detector for Python focused on architectural duplication
-Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
-Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
-License: MIT
-Project-URL: Homepage, https://github.com/orenlab/codeclone
-Project-URL: Repository, https://github.com/orenlab/codeclone
-Project-URL: Issues, https://github.com/orenlab/codeclone/issues
-Project-URL: Changelog, https://github.com/orenlab/codeclone/releases
-Keywords: python,ast,code-clone,duplication,static-analysis,ci,architecture
-Classifier: Development Status :: 5 - Production/Stable
-Classifier: Intended Audience :: Developers
-Classifier: Topic :: Software Development :: Quality Assurance
-Classifier: Topic :: Software Development :: Code Generators
-Classifier: Topic :: Software Development :: Testing
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.10
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Classifier: Programming Language :: Python :: 3.13
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.10
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Provides-Extra: dev
-Requires-Dist: pytest>=9.0.0; extra == "dev"
-Requires-Dist: build>=1.2.0; extra == "dev"
-Requires-Dist: twine>=5.0.0; extra == "dev"
-Dynamic: license-file
-
-# CodeClone
-
-**CodeClone** is an AST-based code clone detector for Python, focused on **architectural duplication**, not simple
-copy-paste.
-
-It is designed to help teams:
-
-- discover structural and logical code duplication,
-- understand architectural hotspots,
-- and prevent *new* duplication from entering the codebase via CI.
-
-Unlike token- or text-based tools, CodeClone works on **normalized Python AST**, which makes it robust against renaming,
-formatting, and minor refactoring.
-
----
-
-## Why CodeClone?
-
-Most existing tools detect *textual* duplication.
-CodeClone detects **structural and block-level duplication** that usually indicates missing abstractions or
-architectural drift.
-
-Typical use cases:
-
-- duplicated service logic across layers (API ↔ application),
-- repeated validation or guard blocks,
-- copy-pasted request/handler flows,
-- duplicated orchestration logic in routers, handlers, or services.
-
----
-
-## Features
-
-### Function-level clone detection (Type-2)
-
-- Detects functions and methods with identical structure.
-- Robust to:
-- variable renaming,
-- constant changes,
-- formatting differences.
-- Ideal for spotting architectural duplication between layers.
-
-### Block-level clone detection (Type-3-lite)
-
-- Detects repeated **statement blocks** inside larger functions.
-- Targets:
-- validation blocks,
-- guard clauses,
-- repeated orchestration logic.
-- Carefully filtered to avoid noise:
-- no overlapping windows,
-- no clones inside the same function,
-- no `__init__` noise.
-
-### Low-noise by design
-
-- AST normalization instead of token matching.
-- Size and statement-count thresholds.
-- Conservative defaults tuned for real-world Python projects.
-
-### CI-friendly baseline mode
-
-- Establish a baseline of existing clones.
-- Fail CI **only when new clones are introduced**.
-- Safe for legacy codebases.
-
----
-
-## Installation
-
-```bash
-pip install codeclone
-```
-
-Python 3.10+ is required.
-
-⸻
-
-Quick Start
-
-Run on a project:
-
-```bash
-codeclone .
-```
-
-This will:
-
-* scan Python files,
-* detect function-level and block-level clones,
-* print a summary to stdout.
-
-Generate reports:
-
-```bash
-codeclone . \
---json-out .cache/codeclone/report.json \
---text-out .cache/codeclone/report.txt
-```
-
-⸻
-
-Baseline Workflow (Recommended)
-
-1. Create a baseline
-
-Run once on your current codebase:
-
-```bash
-codeclone . --update-baseline
-```
-
-This creates a file:
-
-```bash
-.codeclone-baseline.json
-```
-
-Commit this file to the repository.
-
-⸻
-
-2. Use in CI
-
-In CI, run:
-
-```bash
-codeclone . --fail-on-new
-```
-
-Behavior:
-
-* ✅ existing clones are allowed,
-* ❌ build fails if new function or block clones appear,
-* ✅ refactoring that removes duplication is always allowed.
-
-This enables gradual improvement without breaking existing development flow.
-
-⸻
-
-What CodeClone Is (and Is Not)
-
-CodeClone is
-
-* an architectural analysis tool,
-* a duplication radar,
-* a CI guard against copy-paste.
-
-CodeClone is not
-
-* a linter,
-* a formatter,
-* a replacement for SonarQube or static analyzers,
-* a semantic equivalence prover.
-
-It intentionally focuses on high-signal duplication.
-
-⸻
-
-How It Works (High Level)
-
-* Parses Python source into AST.
-* Normalizes:
-- variable names,
-- constants,
-- attributes,
-- docstrings and annotations.
-* Computes stable structural fingerprints.
-* Detects:
-- identical function structures,
-- repeated statement blocks across functions.
-* Applies filters to suppress noise.
-
-⸻
-
-License
-
-MIT License
codeclone-1.0.0.dist-info/RECORD
DELETED
@@ -1,17 +0,0 @@
-codeclone/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
-codeclone/baseline.py,sha256=1848Ugh4l3czOUIgXN68oPx_tu2nzbnnZQUixD1OXXA,1367
-codeclone/blockhash.py,sha256=QewX6jSCc7Q2tDsBXicFGzaMzKPb2S6unMEZEvwuwDs,414
-codeclone/blocks.py,sha256=6nXELQsH2OKl7ScNyLQiR7rMY2jfnsnGTt5_yXbwh3Y,1533
-codeclone/cache.py,sha256=kiqfj5V3evW3hyhKMVqW7EFUiN9AO4mntFPUzfXAjsA,1156
-codeclone/cli.py,sha256=uggGIVDw2QLQKKh5BsZYb2XGpe0ysEsQKx-7JDcepXA,4526
-codeclone/extractor.py,sha256=ubMfYfM87F1apEmzBnnv9W4daY7Gv2nQHthiNoeTTno,2884
-codeclone/fingerprint.py,sha256=pSucv648MGe6LwNczxJBbQjnAOcpHgkXSokaHwGr5zw,364
-codeclone/normalize.py,sha256=hG__ZqJCtUMVIv7c_a9PHfzSVGDbrAIOH5JYXnLfuOk,2930
-codeclone/report.py,sha256=Ptgne99-nsyvAGyJL3SsPNH9fQLorV2mVf--a0KXfxE,1639
-codeclone/scanner.py,sha256=_xomEXvx1mLhMVRiMXW-gkBUV_9Z3GixFV5nK0Pqeq4,831
-codeclone-1.0.0.dist-info/licenses/LICENSE,sha256=ndXAbunvN-jCQjgCaoBOF5AN4FcRlAa0R7hK1lWDuBU,1073
-codeclone-1.0.0.dist-info/METADATA,sha256=F9-0TneuHJuI2USqeIgW0Ayrue4KVTL-5Um69vaN3-I,4993
-codeclone-1.0.0.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-codeclone-1.0.0.dist-info/entry_points.txt,sha256=_MI9DVTLOmv3OlxpyogdOmMAduiLVIdHeOlZ_KBsrIY,49
-codeclone-1.0.0.dist-info/top_level.txt,sha256=4tQa_d-4Zle-qV9KmNDkWq0WHYgZsW9vdaeF30rNntg,10
-codeclone-1.0.0.dist-info/RECORD,,
File without changes
File without changes
File without changes