codeclone 1.3.0__tar.gz → 1.4.1__tar.gz
- codeclone-1.4.1/PKG-INFO +388 -0
- codeclone-1.4.1/README.md +347 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_cli_args.py +22 -3
- codeclone-1.4.1/codeclone/_cli_meta.py +93 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_cli_paths.py +15 -3
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_cli_summary.py +16 -2
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_html_escape.py +3 -4
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_html_snippets.py +11 -4
- codeclone-1.4.1/codeclone/_report_blocks.py +94 -0
- codeclone-1.4.1/codeclone/_report_explain.py +251 -0
- codeclone-1.4.1/codeclone/_report_explain_contract.py +48 -0
- codeclone-1.4.1/codeclone/_report_serialize.py +418 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_report_types.py +4 -0
- codeclone-1.4.1/codeclone/baseline.py +648 -0
- codeclone-1.4.1/codeclone/cache.py +836 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/cli.py +290 -258
- codeclone-1.4.1/codeclone/contracts.py +64 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/errors.py +1 -1
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/extractor.py +22 -3
- codeclone-1.4.1/codeclone/html_report.py +789 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/report.py +5 -0
- codeclone-1.4.1/codeclone/templates.py +3103 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/ui_messages.py +121 -55
- codeclone-1.4.1/codeclone.egg-info/PKG-INFO +388 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/SOURCES.txt +6 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/pyproject.toml +1 -1
- codeclone-1.4.1/tests/test_baseline.py +876 -0
- codeclone-1.4.1/tests/test_cache.py +875 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_inprocess.py +891 -190
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_unit.py +117 -10
- codeclone-1.4.1/tests/test_detector_golden.py +61 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_extractor.py +86 -2
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_html_report.py +428 -42
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_report.py +574 -30
- codeclone-1.4.1/tests/test_report_explain.py +229 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_security.py +2 -0
- codeclone-1.3.0/PKG-INFO +0 -402
- codeclone-1.3.0/README.md +0 -361
- codeclone-1.3.0/codeclone/_cli_meta.py +0 -43
- codeclone-1.3.0/codeclone/_report_serialize.py +0 -160
- codeclone-1.3.0/codeclone/baseline.py +0 -245
- codeclone-1.3.0/codeclone/cache.py +0 -285
- codeclone-1.3.0/codeclone/html_report.py +0 -434
- codeclone-1.3.0/codeclone/templates.py +0 -2457
- codeclone-1.3.0/codeclone.egg-info/PKG-INFO +0 -402
- codeclone-1.3.0/tests/test_baseline.py +0 -296
- codeclone-1.3.0/tests/test_cache.py +0 -437
- {codeclone-1.3.0 → codeclone-1.4.1}/LICENSE +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/__init__.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_report_grouping.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/_report_segments.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/blockhash.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/blocks.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/cfg.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/cfg_model.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/fingerprint.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/meta_markers.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/normalize.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/py.typed +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone/scanner.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/dependency_links.txt +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/entry_points.txt +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/requires.txt +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/codeclone.egg-info/top_level.txt +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/setup.cfg +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_blockhash.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_blocks.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cfg.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cfg_model.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_main_guard.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_main_guard_runpy.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_cli_smoke.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_fingerprint.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_init.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_normalize.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_scanner_extra.py +0 -0
- {codeclone-1.3.0 → codeclone-1.4.1}/tests/test_segments.py +0 -0
codeclone-1.4.1/PKG-INFO
ADDED
@@ -0,0 +1,388 @@
+Metadata-Version: 2.4
+Name: codeclone
+Version: 1.4.1
+Summary: AST and CFG-based code clone detector for Python focused on architectural duplication
+Author-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
+Maintainer-email: Den Rozhnovskiy <pytelemonbot@mail.ru>
+License: MIT
+Project-URL: Homepage, https://github.com/orenlab/codeclone
+Project-URL: Repository, https://github.com/orenlab/codeclone
+Project-URL: Issues, https://github.com/orenlab/codeclone/issues
+Project-URL: Changelog, https://github.com/orenlab/codeclone/releases
+Project-URL: Documentation, https://github.com/orenlab/codeclone/tree/main/docs
+Keywords: python,ast,cfg,code-clone,duplication,static-analysis,architecture,control-flow,ci
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Intended Audience :: Developers
+Classifier: Topic :: Software Development :: Quality Assurance
+Classifier: Topic :: Software Development :: Testing
+Classifier: Typing :: Typed
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Programming Language :: Python :: 3.13
+Classifier: Programming Language :: Python :: 3.14
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: pygments>=2.19.2
+Requires-Dist: rich>=14.3.2
+Provides-Extra: dev
+Requires-Dist: pytest>=9.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=6.1.0; extra == "dev"
+Requires-Dist: build>=1.2.0; extra == "dev"
+Requires-Dist: twine>=5.0.0; extra == "dev"
+Requires-Dist: mypy>=1.19.1; extra == "dev"
+Requires-Dist: ruff>=0.15.0; extra == "dev"
+Requires-Dist: pre-commit>=4.5.1; extra == "dev"
+Dynamic: license-file
+
+# CodeClone
+
+[](https://pypi.org/project/codeclone/)
+[](https://pypi.org/project/codeclone/)
+[](https://github.com/orenlab/codeclone/actions/workflows/tests.yml)
+[](https://pypi.org/project/codeclone/)
+
+
+[](LICENSE)
+
+**CodeClone** is a Python code clone detector based on **normalized AST and Control Flow Graphs (CFG)**.
+It discovers architectural duplication and prevents new copy-paste from entering your codebase via CI.
+
+---
+
+## Why CodeClone
+
+CodeClone focuses on **architectural duplication**, not text similarity. It detects structural patterns through:
+
+- **Normalized AST analysis** — robust to renaming, formatting, and minor refactors
+- **Control Flow Graphs** — captures execution logic, not just syntax
+- **Strict, explainable matching** — clear signals, not fuzzy heuristics
+
+Unlike token-based tools, CodeClone compares **structure and control flow**, making it ideal for finding:
+
+- Repeated service/orchestration patterns
+- Duplicated guard/validation blocks
+- Copy-pasted handler logic across modules
+- Recurring internal segments in large functions
+
+---
+
+## Core Capabilities
+
+**Three Detection Levels:**
+
+1. **Function clones (CFG fingerprint)**
+   Strong structural signal for cross-layer duplication
+
+2. **Block clones (statement windows)**
+   Detects repeated local logic patterns
+
+3. **Segment clones (report-only)**
+   Internal function repetition for explainability; not used for baseline gating
+
+**CI-Ready Features:**
+
+- Deterministic output with stable ordering
+- Reproducible artifacts for audit trails
+- Baseline-driven gating to prevent new duplication
+- Fast incremental analysis with intelligent caching
+
+---
+
+## Installation
+
+```bash
+pip install codeclone
+```
+
+**Requirements:** Python 3.10+
+
+---
+
+## Quick Start
+
+### Basic Analysis
+
+```bash
+# Analyze current directory
+codeclone .
+
+# Check version
+codeclone --version
+```
+
+### Generate Reports
+
+```bash
+codeclone . \
+  --html .cache/codeclone/report.html \
+  --json .cache/codeclone/report.json \
+  --text .cache/codeclone/report.txt
+```
+
+### CI Integration
+
+```bash
+# 1. Generate baseline once (commit to repo)
+codeclone . --update-baseline
+
+# 2. Add to CI pipeline
+codeclone . --ci
+```
+
+The `--ci` preset is equivalent to `--fail-on-new --no-color --quiet`.
+
+---
+
+## Baseline Workflow
+
+Baselines capture the **current state of duplication** in your codebase. Once committed, they serve as the reference
+point for CI checks.
+
+**Key points (contract-level):**
+
+- Baseline file is versioned (`codeclone.baseline.json`) and used to classify clones as **NEW** vs **KNOWN**.
+- Compatibility is gated by `schema_version`, `fingerprint_version`, and `python_tag`.
+- Baseline trust is gated by `meta.generator.name` (`codeclone`) and integrity (`payload_sha256`).
+- In CI preset (`--ci`), an untrusted baseline is a contract error (exit `2`).
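
Annotation: the NEW vs KNOWN classification in the key points can be pictured as a set comparison against the baseline. The following is a minimal sketch under stated assumptions: each clone group is identified by a string key and the baseline reduces to a set of such keys. The helper `split_new_known` and the key strings are hypothetical and are not taken from CodeClone's code.

```python
def split_new_known(current_keys, baseline_keys):
    """Partition clone-group keys into NEW and KNOWN against a baseline.

    Hypothetical helper: the key format and the function name are
    assumptions made for illustration, not CodeClone's actual API.
    """
    baseline = set(baseline_keys)
    current = set(current_keys)
    # Sort both lists so the result does not depend on input order.
    new = sorted(current - baseline)
    known = sorted(current & baseline)
    return new, known


new, known = split_new_known(
    current_keys=["fp_a|20-49", "fp_b|0-19", "fp_c|0-19"],
    baseline_keys=["fp_b|0-19", "fp_z|0-19"],
)
```

Keys that exist only in the baseline (`fp_z|0-19` here) are dropped from both lists, since they describe duplication that is no longer present.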
+
+Full contract details: [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
+
+---
+
+## Exit Codes
+
+CodeClone uses a deterministic exit code contract:
+
+| Code | Meaning                                                                     |
+|------|-----------------------------------------------------------------------------|
+| `0`  | Success — run completed without gating failures                             |
+| `2`  | Contract error — baseline missing/untrusted, invalid output extensions, incompatible versions, unreadable source files in CI/gating |
+| `3`  | Gating failure — new clones detected or threshold exceeded                  |
+| `5`  | Internal error — unexpected exception                                       |
+
+**Priority:** Contract errors (`2`) override gating failures (`3`) when both occur.
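
Annotation: the priority rule can be expressed in a few lines. This is a sketch only; the numeric values come from the table above, while the enum member names and `resolve_exit_code` are hypothetical and do not describe CodeClone's internals.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    # Values are taken from the exit code table; the names are illustrative.
    SUCCESS = 0
    CONTRACT_ERROR = 2
    GATING_FAILURE = 3
    INTERNAL_ERROR = 5


def resolve_exit_code(contract_error: bool, gating_failure: bool) -> ExitCode:
    """Apply the documented priority: contract errors override gating failures."""
    if contract_error:
        return ExitCode.CONTRACT_ERROR
    if gating_failure:
        return ExitCode.GATING_FAILURE
    return ExitCode.SUCCESS
```

Checking the contract condition first is what makes a run with both problems report `2` rather than `3`.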
+
+Full contract details: [`docs/book/03-contracts-exit-codes.md`](docs/book/03-contracts-exit-codes.md)
+
+**Debug Support:**
+
+```bash
+# Show detailed error information
+codeclone . --debug
+
+# Or via environment variable
+CODECLONE_DEBUG=1 codeclone .
+```
+
+---
+
+## Reports
+
+### Supported Formats
+
+- **HTML** (`--html`) — Interactive web report with filtering
+- **JSON** (`--json`) — Machine-readable structured data
+- **Text** (`--text`) — Plain text summary
+
+### Report Schema (JSON v1.1)
+
+The JSON report uses a compact deterministic layout:
+
+- Top-level: `meta`, `files`, `groups`, `groups_split`, `group_item_layout`
+- Optional top-level: `facts`
+- `groups_split` provides explicit **NEW / KNOWN** separation per section
+- `meta.groups_counts` provides deterministic per-section aggregates
+- `meta` follows a shared canonical contract across HTML/JSON/TXT
+
+Canonical report contract: [`docs/book/08-report.md`](docs/book/08-report.md)
+
+**Minimal shape (v1.1):**
+
+```json
+{
+  "meta": {
+    "report_schema_version": "1.1",
+    "codeclone_version": "1.4.0",
+    "python_version": "3.13",
+    "python_tag": "cp313",
+    "baseline_path": "/path/to/codeclone.baseline.json",
+    "baseline_fingerprint_version": "1",
+    "baseline_schema_version": "1.0",
+    "baseline_python_tag": "cp313",
+    "baseline_generator_name": "codeclone",
+    "baseline_generator_version": "1.4.0",
+    "baseline_payload_sha256": "<sha256>",
+    "baseline_payload_sha256_verified": true,
+    "baseline_loaded": true,
+    "baseline_status": "ok",
+    "cache_path": "/path/to/.cache/codeclone/cache.json",
+    "cache_used": true,
+    "cache_status": "ok",
+    "cache_schema_version": "1.2",
+    "files_skipped_source_io": 0,
+    "groups_counts": {
+      "functions": {
+        "total": 0,
+        "new": 0,
+        "known": 0
+      },
+      "blocks": {
+        "total": 0,
+        "new": 0,
+        "known": 0
+      },
+      "segments": {
+        "total": 0,
+        "new": 0,
+        "known": 0
+      }
+    }
+  },
+  "files": [],
+  "groups": {
+    "functions": {},
+    "blocks": {},
+    "segments": {}
+  },
+  "groups_split": {
+    "functions": {
+      "new": [],
+      "known": []
+    },
+    "blocks": {
+      "new": [],
+      "known": []
+    },
+    "segments": {
+      "new": [],
+      "known": []
+    }
+  },
+  "group_item_layout": {
+    "functions": [
+      "file_i",
+      "qualname",
+      "start",
+      "end",
+      "loc",
+      "stmt_count",
+      "fingerprint",
+      "loc_bucket"
+    ],
+    "blocks": [
+      "file_i",
+      "qualname",
+      "start",
+      "end",
+      "size"
+    ],
+    "segments": [
+      "file_i",
+      "qualname",
+      "start",
+      "end",
+      "size",
+      "segment_hash",
+      "segment_sig"
+    ]
+  },
+  "facts": {
+    "blocks": {}
+  }
+}
+```
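
Annotation: a consumer of this report would probably want named fields rather than positional rows. The sketch below rests on assumptions the README does not confirm: that each entry under `groups.<section>` maps a group key to a list of rows, that each row is a list aligned with `group_item_layout.<section>`, and that `file_i` indexes into the top-level `files` list. The helper `decode_group_items` and the sample data are hypothetical.

```python
def decode_group_items(report: dict, section: str) -> dict:
    """Turn positional group items into dicts keyed by the section's layout.

    Assumptions (not confirmed by the README): rows are lists aligned with
    `group_item_layout`, and `file_i` is an index into `files`.
    """
    layout = report["group_item_layout"][section]
    files = report["files"]
    decoded = {}
    for group_key, rows in report["groups"][section].items():
        items = []
        for row in rows:
            item = dict(zip(layout, row))
            # Resolve the file index to a path for readability.
            item["file"] = files[item["file_i"]]
            items.append(item)
        decoded[group_key] = items
    return decoded


# Made-up sample data in the assumed shape.
sample = {
    "files": ["pkg/a.py", "pkg/b.py"],
    "group_item_layout": {"blocks": ["file_i", "qualname", "start", "end", "size"]},
    "groups": {
        "blocks": {"hash1": [[0, "pkg.a:f", 10, 14, 4], [1, "pkg.b:g", 3, 7, 4]]}
    },
}
decoded = decode_group_items(sample, "blocks")
```

The canonical row format is defined in `docs/book/08-report.md`; check it before relying on this shape.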
+
+---
+
+## Cache
+
+Cache is an optimization layer only and is never a source of truth.
+
+- Default path: `<root>/.cache/codeclone/cache.json`
+- Schema version: **v1.2**
+- Invalid or oversized cache is ignored with warning and rebuilt (fail-open)
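
Annotation: fail-open loading means a bad cache degrades to a full analysis instead of aborting the run. A minimal sketch of that behavior follows. The `"version"` key, the size limit, and the function name `load_cache_fail_open` are assumptions for illustration and are not taken from CodeClone's cache format.

```python
import json
import warnings
from pathlib import Path

MAX_CACHE_BYTES = 50 * 1024 * 1024  # hypothetical limit, not CodeClone's real value


def load_cache_fail_open(path, expected_version="1.2", max_bytes=MAX_CACHE_BYTES):
    """Return the cache payload, or None when the cache must be rebuilt.

    Hypothetical sketch: the "version" key and the size limit are assumptions.
    """
    cache_file = Path(path)
    try:
        if cache_file.stat().st_size > max_bytes:
            raise ValueError("cache file is oversized")
        data = json.loads(cache_file.read_text(encoding="utf-8"))
        if not isinstance(data, dict) or data.get("version") != expected_version:
            raise ValueError("cache schema mismatch")
    except FileNotFoundError:
        return None  # no cache yet: nothing to warn about
    except (OSError, ValueError) as exc:
        # Fail-open: warn and fall back to a full analysis instead of aborting.
        warnings.warn(f"ignoring cache {cache_file}: {exc}")
        return None
    return data
```

A caller treats `None` as "analyze everything and write a fresh cache", which keeps the cache from ever becoming a source of truth.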
+
+Full contract details: [`docs/book/07-cache.md`](docs/book/07-cache.md)
+
+---
+
+## Pre-commit Integration
+
+```yaml
+repos:
+  - repo: local
+    hooks:
+      - id: codeclone
+        name: CodeClone
+        entry: codeclone
+        language: system
+        pass_filenames: false
+        args: [ ".", "--ci" ]
+        types: [ python ]
+```
+
+---
+
+## What CodeClone Is (and Is Not)
+
+### CodeClone Is
+
+- A structural clone detector for Python
+- A CI guard against new duplication
+- A deterministic analysis tool with auditable outputs
+
+### CodeClone Is Not
+
+- A linter or code formatter
+- A semantic equivalence prover
+- A runtime execution analyzer
+
+---
+
+## How It Works
+
+**High-level Pipeline:**
+
+1. **Parse** — Python source → AST
+2. **Normalize** — AST → canonical structure
+3. **CFG Construction** — per-function control flow graph
+4. **Fingerprinting** — stable hash computation
+5. **Grouping** — function/block/segment clone groups
+6. **Determinism** — stable ordering for reproducibility
+7. **Baseline Comparison** — new vs known clones (when requested)
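
Annotation: the parse, normalize, and fingerprint steps can be illustrated with the standard library alone. This sketch is deliberately simplified and is not CodeClone's algorithm: it skips CFG construction entirely and hashes a normalized AST dump directly. The class `_Normalizer`, the placeholder names, and `function_fingerprints` are hypothetical.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers with placeholders so renaming does not change the hash."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_VAR_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_ARG_"
        node.annotation = None
        return node

    def visit_FunctionDef(self, node):
        node.name = "_FN_"
        self.generic_visit(node)
        return node


def function_fingerprints(source: str) -> dict:
    """Map each top-level function name to a hash of its normalized AST."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            name = node.name  # remember the real name before normalizing
            normalized = _Normalizer().visit(node)
            dumped = ast.dump(normalized, include_attributes=False)
            result[name] = hashlib.sha256(dumped.encode("utf-8")).hexdigest()
    return result


fps = function_fingerprints(
    "def total(items):\n"
    "    acc = 0\n"
    "    for item in items:\n"
    "        acc += item\n"
    "    return acc\n"
    "\n"
    "def summed(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
    "\n"
    "def first(values):\n"
    "    return values[0]\n"
)
```

Here `total` and `summed` differ only in identifier names and so share a fingerprint, while `first` has a different structure and hashes differently. Grouping then amounts to collecting functions by equal fingerprint.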
+
+Learn more:
+
+- Architecture: [`docs/architecture.md`](docs/architecture.md)
+- CFG semantics: [`docs/cfg.md`](docs/cfg.md)
+
+---
+
+## Documentation Map
+
+Use this map to pick the right level of detail:
+
+- **Contract book (canonical contracts/specs):** [`docs/book/`](docs/book/)
+  - Start here: [`docs/book/00-intro.md`](docs/book/00-intro.md)
+  - Exit codes and precedence: [`docs/book/03-contracts-exit-codes.md`](docs/book/03-contracts-exit-codes.md)
+  - Baseline contract (schema/trust/integrity): [`docs/book/06-baseline.md`](docs/book/06-baseline.md)
+  - Cache contract (schema/integrity/fail-open): [`docs/book/07-cache.md`](docs/book/07-cache.md)
+  - Report contract (schema v1.1 + NEW/KNOWN split): [`docs/book/08-report.md`](docs/book/08-report.md)
+  - CLI behavior: [`docs/book/09-cli.md`](docs/book/09-cli.md)
+  - HTML rendering: [`docs/book/10-html-render.md`](docs/book/10-html-render.md)
+  - Determinism policy: [`docs/book/12-determinism.md`](docs/book/12-determinism.md)
+  - Compatibility/versioning rules: [
+    `docs/book/14-compatibility-and-versioning.md`](docs/book/14-compatibility-and-versioning.md)
+- **Deep dives:**
+  - Architecture narrative: [`docs/architecture.md`](docs/architecture.md)
+  - CFG semantics: [`docs/cfg.md`](docs/cfg.md)
+
+## Links
+
+- **Issues:** <https://github.com/orenlab/codeclone/issues>
+- **PyPI:** <https://pypi.org/project/codeclone/>