jscpd-rs 0.1.0
- package/CHANGELOG.md +69 -0
- package/Cargo.lock +1323 -0
- package/Cargo.toml +54 -0
- package/LICENSE +21 -0
- package/README.md +372 -0
- package/docs/api-parity.md +49 -0
- package/docs/cloning-plan.md +281 -0
- package/docs/compat-baseline.md +535 -0
- package/docs/format-porting.md +86 -0
- package/docs/junior-task-template.md +62 -0
- package/docs/junior-workflow.md +87 -0
- package/docs/migrating-from-jscpd.md +193 -0
- package/docs/npm-release.md +116 -0
- package/docs/public-benchmark-suite.md +81 -0
- package/docs/release-checklist.md +200 -0
- package/docs/release-decisions.md +103 -0
- package/docs/release-readiness.md +51 -0
- package/docs/upstream-bugs.md +501 -0
- package/docs/upstream-issue-drafts.md +393 -0
- package/docs/user-guide.md +309 -0
- package/examples/dump_oxc_tokens.rs +112 -0
- package/examples/library_api.rs +42 -0
- package/npm/bin/jscpd-rs.js +6 -0
- package/npm/bin/jscpd-server.js +6 -0
- package/npm/lib/run-binary.js +68 -0
- package/npm/scripts/postinstall.js +50 -0
- package/package.json +53 -0
- package/skills/dry-refactoring/SKILL.md +63 -0
- package/skills/jscpd/SKILL.md +85 -0
- package/src/app.rs +512 -0
- package/src/bin/jscpd-server.rs +429 -0
- package/src/blame.rs +130 -0
- package/src/cli/config.rs +543 -0
- package/src/cli/parsing.rs +301 -0
- package/src/cli/tests.rs +543 -0
- package/src/cli.rs +671 -0
- package/src/detector/matching/secondary.rs +387 -0
- package/src/detector/matching.rs +274 -0
- package/src/detector/model.rs +190 -0
- package/src/detector/prepare.rs +71 -0
- package/src/detector/skip_local.rs +40 -0
- package/src/detector/statistics.rs +138 -0
- package/src/detector/store.rs +96 -0
- package/src/detector/tests.rs +238 -0
- package/src/detector.rs +265 -0
- package/src/files/discovery.rs +508 -0
- package/src/files/gitignore.rs +203 -0
- package/src/files/paths.rs +68 -0
- package/src/files/shebang.rs +106 -0
- package/src/files/tests.rs +523 -0
- package/src/files.rs +25 -0
- package/src/formats.rs +570 -0
- package/src/lib.rs +433 -0
- package/src/main.rs +26 -0
- package/src/report/ai.rs +125 -0
- package/src/report/badge.rs +238 -0
- package/src/report/console.rs +180 -0
- package/src/report/console_common.rs +37 -0
- package/src/report/console_full.rs +139 -0
- package/src/report/csv.rs +65 -0
- package/src/report/escape.rs +8 -0
- package/src/report/file_output.rs +28 -0
- package/src/report/html/assets.rs +47 -0
- package/src/report/html.rs +336 -0
- package/src/report/json.rs +119 -0
- package/src/report/markdown.rs +125 -0
- package/src/report/sarif.rs +302 -0
- package/src/report/silent.rs +22 -0
- package/src/report/source.rs +38 -0
- package/src/report/summary.rs +50 -0
- package/src/report/test_support.rs +133 -0
- package/src/report/threshold.rs +76 -0
- package/src/report/xcode.rs +90 -0
- package/src/report/xml.rs +119 -0
- package/src/report.rs +250 -0
- package/src/server/mcp.rs +942 -0
- package/src/server.rs +1081 -0
- package/src/tokenizer/apex.rs +97 -0
- package/src/tokenizer/blocks.rs +532 -0
- package/src/tokenizer/embedded.rs +106 -0
- package/src/tokenizer/generic.rs +511 -0
- package/src/tokenizer/hash.rs +27 -0
- package/src/tokenizer/ignore.rs +33 -0
- package/src/tokenizer/line_index.rs +33 -0
- package/src/tokenizer/markdown.rs +289 -0
- package/src/tokenizer/markup_attrs.rs +289 -0
- package/src/tokenizer/oxc/fallback.rs +275 -0
- package/src/tokenizer/oxc/jsx.rs +168 -0
- package/src/tokenizer/oxc/kind.rs +177 -0
- package/src/tokenizer/oxc/lexical.rs +67 -0
- package/src/tokenizer/oxc.rs +659 -0
- package/src/tokenizer/scan.rs +88 -0
- package/src/tokenizer/tap.rs +150 -0
- package/src/tokenizer/tests.rs +915 -0
- package/src/tokenizer.rs +328 -0
- package/src/verbose.rs +195 -0
@@ -0,0 +1,535 @@

# Compatibility Baseline

Baseline date: 2026-05-31.

Latest full release gate:
`FULL=1 PUBLIC=1 scripts/release-gate.sh`
passed on 2026-05-31 at code commit `8c3da0e` as part of
`scripts/prepublish-check.sh`.

Latest public release gate:
`PUBLIC=1 PUBLIC_RUNS=3 scripts/release-gate.sh`
passed on 2026-05-31 at code commit `8c3da0e` as part of
`scripts/prepublish-check.sh`.

Default gate:

```bash
STRICT=coverage scripts/compat-matrix.sh
```

Coverage means every upstream duplicated line must be covered by the Rust report
for the same file, and Rust must not report fewer clones. Exact clone starts,
formats, fragment boundaries, source totals, line totals, and pair ordering are
diagnostic only because Rust may find a wider or split equivalent range while
compatibility is converging.
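
The coverage rule can be sketched as a small check over two reports. This is only an illustration under stated assumptions: the `Fragment` and `ClonePair` types and both function names are hypothetical and are not taken from `scripts/compat-matrix.sh` or the crate's API. Line ranges are assumed to be inclusive.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical model of one duplicated fragment: a file plus an
/// inclusive line range.
#[derive(Debug, Clone)]
pub struct Fragment {
    pub file: String,
    pub start_line: usize,
    pub end_line: usize,
}

/// Hypothetical model of one reported clone: two matching fragments.
#[derive(Debug, Clone)]
pub struct ClonePair {
    pub first: Fragment,
    pub second: Fragment,
}

/// Collects the set of duplicated lines per file across all clones.
fn duplicated_lines(clones: &[ClonePair]) -> HashMap<&str, BTreeSet<usize>> {
    let mut map: HashMap<&str, BTreeSet<usize>> = HashMap::new();
    for clone in clones {
        for frag in [&clone.first, &clone.second] {
            map.entry(frag.file.as_str())
                .or_default()
                .extend(frag.start_line..=frag.end_line);
        }
    }
    map
}

/// Returns every (file, line) that upstream marks as duplicated but the
/// Rust report does not, sorted for stable output.
pub fn uncovered_lines(upstream: &[ClonePair], rust: &[ClonePair]) -> Vec<(String, usize)> {
    let rust_lines = duplicated_lines(rust);
    let mut missing = Vec::new();
    for (file, lines) in duplicated_lines(upstream) {
        for line in lines {
            let covered = rust_lines.get(file).map_or(false, |set| set.contains(&line));
            if !covered {
                missing.push((file.to_string(), line));
            }
        }
    }
    missing.sort();
    missing
}

/// Coverage passes when Rust reports at least as many clones as upstream
/// and leaves no upstream duplicated line uncovered.
pub fn coverage_passes(upstream: &[ClonePair], rust: &[ClonePair]) -> bool {
    rust.len() >= upstream.len() && uncovered_lines(upstream, rust).is_empty()
}
```

Under this sketch a wider Rust range still passes, because only line coverage and the clone count are compared. Exact starts and fragment boundaries are never inspected.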

Reporter gate:

```bash
scripts/compat-reporters.sh
```

This smoke check runs Rust and upstream with
`json,csv,markdown,xml,sarif,badge,html`, verifies the expected report files,
parses JSON/SARIF payloads, checks stable artifact contracts, and compares the
root JSON report with the default coverage rule. Stable artifact checks include
CSV/Markdown line and clone summary columns, the upstream Markdown heading
prefix, exact XML output for the fixture, SARIF structure with normalized
paths, badge title/aria text, HTML report text and clone summaries, and
equality between each HTML JSON payload and its root JSON report. The aggregate
release gate also runs this reporter check against a no-duplicates JavaScript
fixture so empty JSON/CSV/Markdown/XML/SARIF/badge/HTML reports stay covered.

CLI gate:

```bash
scripts/compat-cli.sh
```

This smoke check compares Rust and upstream exit codes plus stable terminal
contracts for `--help`, `--version`, `--list`, `--debug`, `--exitCode`,
`--threshold`, invalid `--mode`, bare `--config`, `--store`, `--store-path`,
bare optional string flag crashes, `--formats-exts`, `--formats-names`,
malformed `--formats-exts`/`--formats-names` mappings, `--ignore-pattern`,
`--ignoreCase`, unknown reporters, explicit `time`
reporter fallback, terminal footer/tips, `xcode`, `ai`, `consoleFull`, and
`--verbose`.
The debug checks include cwd `.gitignore` expansion in the printed `ignore`
option and user-order preservation for explicit `--format` lists.

Config gate:

```bash
scripts/compat-config.sh
```

This smoke check runs both implementations from real `.jscpd.json` and
`package.json#jscpd` configs, including relative `path`, config `output`,
`silent`, JSON reporter setup, `exitCode`, and order-sensitive `formatsExts`
object mappings. It also verifies explicit `--config` files outside `cwd`,
`formatsNames` mappings for extensionless filenames,
`reportersOptions.badge` path/subject/status/color overrides, debug
option-surface preservation for `config`, `cache`, `listeners`, and
`tokensToSkip`, upstream-coerced string numeric config values for `minLines`,
`maxLines`, and `threshold`, and checks that
malformed `package.json` files emit a warning and do not prevent detection from
continuing. Malformed `.jscpd.json` files are checked separately: both
implementations fail before detection with an upstream-style `SyntaxError`
printed to stdout. Symlinked explicit config files are also checked so
`config`, relative `path`, and relative `ignore` resolution follow the symlink
location rather than the real target path.

Blame gate:

```bash
scripts/compat-blame.sh
```

This smoke check creates a temporary Git repository, commits a duplicated pair,
runs both implementations with `--blame --reporters json`, verifies that both
JSON reports include matching blame data on both duplicate fragments, and then
compares the reports with the default coverage rule.

Server gate:

```bash
scripts/compat-server.sh
```

This smoke check compares the native `jscpd-server` binary with upstream
`apps/jscpd-server`. It verifies exact server `--help` output, invalid or bare
`--port` handling, bare common optional flag error shapes, missing-store warning
fallback, bare and explicit `--host` startup output, rejection of main-CLI-only
options that the upstream server does not accept, and config-only
`workingDirectory` semantics. It then starts both servers on local ports and
checks the root API info,
`/api/health`, `/api/stats`, JSON and urlencoded `/api/check`,
empty/missing/non-string field validation, large and special-character
snippets, JSON content-type headers, JSON syntax errors, upstream-style JSON
404 responses for missing routes and wrong API methods, MCP
initialize/session handling, `tools/list`,
`resources/list`, `get_statistics`, `check_duplication` with `recheck`,
`check_current_directory`, `jscpd://statistics`, repeated snippet isolation,
and `GET /mcp` method rejection. It also checks upstream-style MCP UUID-v4
session IDs, `Content-Type` rejection,
`DELETE /mcp` and `OPTIONS /mcp` JSON 404 responses, plus JSON-RPC
single-request and multi-request batch handling. Stable MCP SDK-shaped
responses for `initialize`, `tools/list`, `resources/list`, and batch
list/resource requests are compared exactly against upstream, with only the
package version normalized.
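
One of the checks above concerns UUID-v4 session IDs. A minimal sketch of what a shape check for such an ID could look like follows. The function name `is_uuid_v4` is hypothetical, and whether `scripts/compat-server.sh` validates IDs this way is an assumption. The layout itself (version nibble `4`, variant nibble `8`, `9`, `a`, or `b`) follows the standard UUID text format.

```rust
/// Returns true if `id` has the canonical textual shape of a version-4 UUID:
/// `xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx`, where `y` is 8, 9, a, or b.
/// Hypothetical helper for illustration; not part of the jscpd-rs API.
pub fn is_uuid_v4(id: &str) -> bool {
    let bytes = id.as_bytes();
    if bytes.len() != 36 {
        return false;
    }
    for (i, &b) in bytes.iter().enumerate() {
        let ok = match i {
            // Hyphens separate the five groups.
            8 | 13 | 18 | 23 => b == b'-',
            // First character of the third group carries the version.
            14 => b == b'4',
            // First character of the fourth group carries the variant.
            19 => matches!(b.to_ascii_lowercase(), b'8' | b'9' | b'a' | b'b'),
            // Everything else is a hex digit.
            _ => b.is_ascii_hexdigit(),
        };
        if !ok {
            return false;
        }
    }
    true
}
```

A comparison harness could apply a predicate like this to the session ID each server returns, instead of comparing the IDs themselves, since random IDs never match across runs.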

Package/install gate:

```bash
scripts/package-check.sh
```

This release-surface check verifies the crate package file list, rejects
accidental publication of the upstream `jscpd/` submodule, `target/`,
`node_modules`, and internal scripts, runs `cargo package --locked`, installs
the `jscpd` and `jscpd-server` binaries into a temporary Cargo root with
`cargo install --bins`, and checks the installed binaries' versions and the CLI
binary's upstream-compatible command name.
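
The file-list rejection step can be pictured as a filter over package paths. The sketch below is hypothetical: the names `FORBIDDEN_TOP_LEVEL`, `is_forbidden`, and `forbidden_entries` are invented for illustration, and treating a top-level `scripts/` directory as the "internal scripts" location is an assumption based on the script paths quoted in this document.

```rust
/// Top-level directories that should never ship in the crate package.
/// `scripts` standing in for "internal scripts" is an assumption.
const FORBIDDEN_TOP_LEVEL: [&str; 3] = ["jscpd", "target", "scripts"];

/// Returns true if a package-relative path (using `/` separators) lies in a
/// forbidden top-level directory or under any `node_modules` directory.
pub fn is_forbidden(path: &str) -> bool {
    let parts: Vec<&str> = path.split('/').collect();
    // Only directories count, so a path with a single component is a
    // top-level file and has no top-level directory.
    let top_level_dir = if parts.len() > 1 { parts[0] } else { "" };
    FORBIDDEN_TOP_LEVEL.contains(&top_level_dir) || parts.contains(&"node_modules")
}

/// Filters a package file list down to the entries that must be rejected,
/// preserving input order.
pub fn forbidden_entries<'a>(files: &[&'a str]) -> Vec<&'a str> {
    files
        .iter()
        .copied()
        .filter(|path| is_forbidden(path))
        .collect()
}
```

Matching on path components rather than substrings keeps legitimate entries such as `npm/scripts/postinstall.js` out of the rejected set.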

Native API smoke tests are covered by the Rust test suite. They verify the
path-based detector API, in-memory source API, upstream singular
`detectClonesAndStatistic` spelling, default options, supported format registry,
and default/custom format lookup helpers.

Upstream CI fixture gate:

```bash
scripts/compat-upstream-ci.sh
```

This mirrors upstream's CI smoke command, `jscpd ./fixtures`, with the upstream
defaults that matter for detection (`minTokens=50`, `minLines=5`,
`maxSize=100kb`). It uses the coverage-first comparison, so Rust may report
additional clones but must cover every upstream duplicated line.
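
For orientation, the three defaults named above can be written down as data, together with a small size parser. This is a sketch only: `DetectionDefaults`, `UPSTREAM_CI_DEFAULTS`, and `parse_size` are hypothetical names, and reading `kb` as 1024 bytes is an assumption about how the size string is interpreted, not something this document states.

```rust
/// Hypothetical container for the detection defaults quoted in the text.
pub struct DetectionDefaults {
    pub min_tokens: usize,
    pub min_lines: usize,
    pub max_size: &'static str,
}

/// Values copied from the paragraph above.
pub const UPSTREAM_CI_DEFAULTS: DetectionDefaults = DetectionDefaults {
    min_tokens: 50,
    min_lines: 5,
    max_size: "100kb",
};

/// Parses sizes such as "100kb", "2MB", "512b", or "512" into bytes.
/// Assumes binary multiples (1 kb = 1024 bytes); returns None on bad input.
pub fn parse_size(text: &str) -> Option<u64> {
    let lower = text.trim().to_ascii_lowercase();
    // Check the two-letter suffixes before the bare "b" suffix.
    let (digits, multiplier) = if let Some(n) = lower.strip_suffix("kb") {
        (n, 1024)
    } else if let Some(n) = lower.strip_suffix("mb") {
        (n, 1024 * 1024)
    } else if let Some(n) = lower.strip_suffix('b') {
        (n, 1)
    } else {
        (lower.as_str(), 1)
    };
    digits.trim().parse::<u64>().ok().map(|n| n * multiplier)
}
```

Under the 1024-byte assumption, `parse_size(UPSTREAM_CI_DEFAULTS.max_size)` yields 102400 bytes as the per-file size ceiling.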

Aggregate gate:

```bash
scripts/release-gate.sh
FULL=1 scripts/release-gate.sh
PUBLIC=1 scripts/release-gate.sh
```

The default run covers formatting, unit tests, shell syntax, package/install
verification, and fast CLI/config/reporter/blame/server compatibility checks.
`FULL=1` also runs the full coverage-first compatibility matrix. `PUBLIC=1`
runs the project-owned public benchmark suite with coverage compatibility
enabled, using `PUBLIC_CASES`, `PUBLIC_RUNS`, `PUBLIC_CHECK_COMPAT`, and
`PUBLIC_MIN_SPEEDUP` to override its defaults.
`FULL=1 PUBLIC=1 scripts/release-gate.sh` is required before publication.
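
How the two switches compose can be sketched as a function from flags to the list of stages. Everything here is illustrative: the function names and stage labels are hypothetical, the real gate is a shell script, and treating only the value `1` as "enabled" is an assumption.

```rust
/// Interprets an environment-style switch; only the value "1" enables it.
/// (Assumption: the real script may accept other truthy values.)
pub fn flag_enabled(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Lists the stages a release-gate run would cover for the given switches,
/// using labels paraphrased from the paragraph above.
pub fn planned_stages(full: bool, public: bool) -> Vec<&'static str> {
    let mut stages = vec![
        "formatting",
        "unit tests",
        "shell syntax",
        "package/install verification",
        "fast CLI/config/reporter/blame/server compatibility checks",
    ];
    if full {
        stages.push("full coverage-first compatibility matrix");
    }
    if public {
        stages.push("public benchmark suite");
    }
    stages
}

/// Publication requires a run with both switches enabled.
pub fn sufficient_for_publication(full: bool, public: bool) -> bool {
    full && public
}
```

In a real wrapper the flags would come from something like `std::env::var("FULL").ok().as_deref()`. The sketch keeps them as plain arguments so the composition stays easy to test.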

Release candidate gate:

```bash
scripts/release-candidate.sh
```

This is the pre-publication gate: it runs
`cargo clippy --all-targets -- -D warnings`, the default release gate, the full
compatibility matrix with `STRICT=coverage`, and the public benchmark/coverage
suite with three timing runs on the default public cases.
The GitHub Actions workflow exposes the same path through the
`release_candidate` manual dispatch input.

CI gate:

```bash
.github/workflows/release-gate.yml
```

The GitHub Actions workflow checks out the upstream submodule, installs Rust
and Node, restores Cargo/pnpm/upstream-build caches, and runs the default
release gate on pushes and pull requests. The gate prints per-step timings so
CI regressions are visible in logs. Default push/PR CI uses the already-built
Cargo target for the npm package smoke; release-candidate and prepublish gates
still run the cold npm source-build path. Manual workflow dispatch exposes
`full`, `public`, `release_candidate`, and `public_runs` inputs for the
pre-release full matrix, public benchmark, and release-candidate gates.

Latest local prepublish check: `scripts/prepublish-check.sh` passed on
2026-05-31 at code commit `8c3da0e`, covering
`cargo clippy --all-targets -- -D warnings`, the default release gate, the full
coverage matrix, the public benchmark/coverage suite, package/install
verification, crate/tag availability checks, npm package/name/npx verification,
and `cargo publish --dry-run --locked`.

Documentation-only updates after `8c3da0e` may reuse the release-candidate
evidence if they do not change code, scripts, package metadata, or benchmark
configuration. Rerun `RUN_RELEASE_CANDIDATE=0 scripts/prepublish-check.sh`
after documentation edits so package/dry-run evidence matches the exact package
contents being tagged.

Latest GitHub Actions default release-gate check:
`push` passed on 2026-05-31 at code commit `8c3da0e`:
https://github.com/vv-bogdanov/jscpd-rs/actions/runs/26710762680

Recorded public benchmark baseline:

| Case | Commit | Format | Rust avg | Upstream avg | Speedup | Compat |
| --- | --- | --- | ---: | ---: | ---: | --- |
| `react` | `f0dfee3` | `javascript` | 0.199097s | 10.079214s | 50.62x | pass |
| `next` | `2bbb67b9` | `typescript` | 0.262433s | 14.715736s | 56.07x | pass |
| `prometheus` | `a0524ee` | `go` | 0.085239s | 4.642435s | 54.46x | pass |
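
The Speedup column is consistent with dividing the upstream average by the Rust average. A minimal sketch of that arithmetic follows. The function names are hypothetical, and reading `PUBLIC_MIN_SPEEDUP` as a lower bound on this ratio is an assumption.

```rust
/// Speedup as the ratio of upstream time to Rust time (both in seconds).
pub fn speedup(rust_avg_secs: f64, upstream_avg_secs: f64) -> f64 {
    upstream_avg_secs / rust_avg_secs
}

/// Checks a measured pair against a minimum required speedup.
/// (Assumption: this mirrors how a minimum-speedup threshold would be applied.)
pub fn meets_min_speedup(rust_avg_secs: f64, upstream_avg_secs: f64, min_speedup: f64) -> bool {
    speedup(rust_avg_secs, upstream_avg_secs) >= min_speedup
}

/// Formats a speedup the way the table shows it, e.g. "50.62x".
pub fn format_speedup(value: f64) -> String {
    format!("{:.2}x", value)
}
```

With the `react` row, `format_speedup(speedup(0.199097, 10.079214))` gives `50.62x`, matching the recorded value. The other two rows reproduce the same way.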

## Current Matrix

| Target | Format | Gate | Notes |
| --- | --- | --- | --- |
| `jscpd/fixtures` | `javascript` | pass | exact summary parity |
| `jscpd/fixtures` | `typescript` | pass | exact summary parity |
| `jscpd/fixtures/javascript` | `json` | pass | exact clone and line summary parity |
| `jscpd/fixtures` | auto, upstream CI defaults | pass | 422/422 upstream fragments line-covered; Rust reports a few extra generic/SFC ranges |
| `jscpd/fixtures/custom` | auto + `--formats-exts c:ccc,cc1` | pass | exact clone and line summary parity |
| `jscpd/fixtures/ignore` | auto | pass | clone-summary gate; inline `style` attributes produce upstream-compatible CSS source buckets; ignored blocks produce 0 clones |
| `jscpd/fixtures/ignore-pattern` | auto + `--ignore-pattern` | pass | exact clone and line summary parity |
| `jscpd/fixtures/ignore-case` | auto | pass | clone-summary gate; no clones without `--ignoreCase` |
| `jscpd/fixtures/ignore-case` | auto + `--ignoreCase` | pass | clone-summary gate; 1 clone with case folding |
| `jscpd/fixtures/one-file/one-file.js` | auto | pass | exact summary parity for intra-file clones |
| `jscpd/fixtures/folder1` + `jscpd/fixtures/folder2` | auto | pass | exact clone and line summary parity without `--skipLocal` |
| `jscpd/fixtures/folder1` + `jscpd/fixtures/folder2` | auto + `--skipLocal` | pass | exact clone and line summary parity with local clones skipped |
| `jscpd/fixtures/mixed-formats` | auto | pass | upstream JS-in-HTML clone line-covered; Rust reports a wider cross-file JS range |
| `jscpd/fixtures/shebang` | auto | pass | exact clone and line summary parity for extensionless bash/python shebang files |
| `jscpd/fixtures/javascript` | `javascript` / `strict` | pass | exact clone and line summary parity; token totals differ |
| `jscpd/fixtures` | `typescript` / `strict` | pass | exact clone and line summary parity; token totals differ |
| `jscpd/fixtures/javascript` | `javascript` / `weak` | pass | clone and line summary parity; token totals differ slightly |
| `jscpd/fixtures` | `jsx` | pass | exact clone and line summary parity; token totals differ slightly |
| `jscpd/fixtures` | `tsx` | pass | exact clone and line summary parity; token totals differ slightly |
| `jscpd/fixtures/markdown` | `markdown` | pass | exact clone/start and duplicated-line parity; source line and token totals differ |
| `jscpd/fixtures` | `vue` | pass | exact upstream fragment/start coverage; Rust still reports duplicate extra script/template clones |
| `jscpd/fixtures` | `svelte` | pass | 6/6 upstream fragments line-covered; exact start differs for wider css range |
| `jscpd/fixtures` | `astro` | pass | exact upstream fragment/start coverage; Rust still reports duplicate extra embedded clones |
| `jscpd/fixtures/pug` | `pug` | pass | exact clone and line summary parity; upstream overextended `style.` range is mirrored |
| `jscpd/fixtures/haml` | `haml` | pass | exact clone and line summary parity; upstream overextended silent-comment range is mirrored |
| `jscpd/fixtures/css` | `css` | pass | exact clone coverage; token totals differ |
| `jscpd/fixtures/css` | `less` | pass | exact clone and line summary parity |
| `jscpd/fixtures/css` | `scss` | pass | exact clone and line summary parity |
| `jscpd/fixtures/python` | `python` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/go` | `go` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/ruby` | `ruby` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/php` | `php` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/yaml` | `yaml` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/sql` | `sql` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/toml` | `toml` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/shell` | `bash` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/swift` | `swift` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/powershell` | `powershell` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/lua` | `lua` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/haskell` | `haskell` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/haskell-literate` | `haskell` | pass | exact clone and line summary parity |
| `jscpd/fixtures/clojure` | `clojure` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/sass` | `sass` | pass | 6/6 upstream fragments line-covered |
| `jscpd/fixtures/stylus` | `stylus` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/rust` | `rust` | pass | exact summary parity; 76/76 upstream fragments line-covered |
| `jscpd/fixtures/dart` | `dart` | pass | exact summary parity; 4/4 upstream fragments line-covered |
| `jscpd/fixtures/solidity` | `solidity` | pass | 4/4 upstream fragments line-covered; Rust reports one extra clone |
| `jscpd/fixtures/perl` | `perl` | pass | exact summary parity; 8/8 upstream fragments line-covered |
| `jscpd/fixtures/commonlisp` | `lisp` | pass | exact clone and line summary parity |
| `jscpd/fixtures/mllike` | `ocaml` | pass | exact clone and line summary parity |
| `jscpd/fixtures/mllike` | `fsharp` | pass | exact clone and line summary parity |
| `jscpd/fixtures/objective-c` | `objectivec` | pass | exact clone and line summary parity |
| `jscpd/fixtures/clike` | `c` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/z80` | `c` | pass | exact clone and line summary parity |
| `jscpd/fixtures/clike` | `cpp` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/clike` | `c-header` | pass | exact clone and line summary parity |
| `jscpd/fixtures/clike` | `cpp-header` | pass | exact clone and line summary parity |
| `jscpd/fixtures/clike` | `java` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/clike` | `csharp` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/clike` | `kotlin` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/clike` | `scala` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/groovy` | `groovy` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/actionscript` | `actionscript` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/awk` | `awk` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/basic` | `basic` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/coffeescript` | `coffeescript` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/crystal` | `crystal` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/d` | `d` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/elm` | `elm` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/erlang` | `erlang` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/fortran` | `fortran` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/gdscript` | `gdscript` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/graphql` | `graphql` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/julia` | `julia` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/protobuf` | `protobuf` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/ada` | `ada` | pass | exact summary parity; 6/6 upstream fragments line-covered |
| `jscpd/fixtures/apex` | `apex` | pass | exact summary parity; includes embedded SOQL as `sql` |
| `jscpd/fixtures/haxe` | `haxe` | pass | exact summary parity; 8/8 upstream fragments line-covered |
| `jscpd/fixtures/r` | `r` | pass | exact summary parity; 4/4 upstream fragments line-covered |
| `jscpd/fixtures/csv` | `csv` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/diff` | `diff` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/cmake` | `cmake` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/hcl` | `hcl` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/gitignore` | `ignore` | pass | exact clone and line summary parity |
| `jscpd/fixtures/json5` | `json5` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/latex` | `latex` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/puppet` | `puppet` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/qsharp` | `qsharp` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/racket` | `racket` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/sas` | `sas` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/scheme` | `scheme` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/vhdl` | `vhdl` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/xquery` | `xquery` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/verilog` | `verilog` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/wgsl` | `wgsl` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/zig` | `zig` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/tcl` | `tcl` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/turtle` | `turtle` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/twig` | `twig` | pass | exact upstream fragment/start and line summary parity; token totals differ slightly |
| `jscpd/fixtures/properties` | `properties` | pass | exact clone and line summary parity |
| `jscpd/fixtures/properties` | `ini` | pass | exact clone and line summary parity |
| `jscpd/fixtures/xml` | `markup` | pass | 6/6 upstream fragments line-covered; Rust skips empty XML/XSD inputs |
| `jscpd/fixtures/htmlmixed` | `markup` | pass | exact clone and line summary parity; upstream also reports embedded script/style sources |
| `jscpd/fixtures/htmlembedded` | `aspnet` | pass | 9/10 upstream fragments line-covered; one documented upstream range overextends through an inserted email block |
| `jscpd/fixtures/vb` | `vbnet` | pass | exact clone and line summary parity |
| `jscpd/fixtures/text` | `txt` | pass | exact clone and line summary parity |
| `jscpd/fixtures/robotframework` | `robotframework` | pass | 4/4 upstream fragments line-covered; upstream reports final newline as one-past-content |
| `jscpd/fixtures/tap` | `tap` | pass | exact clone and line summary parity for embedded YAML diagnostics |
| `jscpd/fixtures/textile` | `textile` | pass | exact clone summary parity |
| `jscpd/fixtures/antlr4` | `antlr4` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/apl` | `apl` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/bicep` | `bicep` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/brainfuck` | `brainfuck` | pass | 8/8 upstream fragments line-covered |
| `jscpd/fixtures/cfml` | `cfml` | pass | exact clone and line summary parity |
| `jscpd/fixtures/cfscript` | `cfscript` | pass | exact clone and line summary parity |
| `jscpd/fixtures/dot` | `dot` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/eiffel` | `eiffel` | pass | exact clone and line summary parity |
| `jscpd/fixtures/gettext` | `gettext` | pass | 2/2 upstream fragments line-covered; Rust reports extra covered ranges |
| `jscpd/fixtures/gherkin` | `gherkin` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/handlebars` | `handlebars` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/idris` | `idris` | pass | 4/4 upstream fragments line-covered |
| `jscpd/fixtures/lilypond` | `lilypond` | pass | 6/6 upstream fragments line-covered |
| `jscpd/fixtures/livescript` | `livescript` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/linker-script` | `linker-script` | pass | exact clone and line summary parity |
| `jscpd/fixtures/llvm` | `llvm` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/log` | `log` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/nsis` | `nsis` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/openqasm` | `openqasm` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/oz` | `oz` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/pascal` | `pascal` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/idl` | `prolog` | pass | exact clone and line summary parity |
| `jscpd/fixtures/plsql` | `plsql` | pass | exact clone and line summary parity |
| `jscpd/fixtures/plant-uml` | `plant-uml` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/powerquery` | `powerquery` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/purescript` | `purescript` | pass | exact clone and line summary parity |
| `jscpd/fixtures/q` | `q` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/rescript` | `rescript` | pass | exact clone and line summary parity |
| `jscpd/fixtures/smalltalk` | `smalltalk` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/smarty` | `smarty` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/soy` | `soy` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/sparql` | `sparql` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/tt2` | `tt2` | pass | exact clone and line summary parity |
| `jscpd/fixtures/unrealscript` | `unrealscript` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/velocity` | `velocity` | pass | 2/2 upstream fragments line-covered |
| `jscpd/fixtures/mathematica` | `wolfram` | pass | exact clone and line summary parity |
| `jscpd/packages` | `javascript` | pass | no clones in either implementation |
| `jscpd/packages` | `typescript` | pass | 66/66 upstream fragments line-covered |
| Private app fixture | `javascript` | pass | 154/154 upstream fragments line-covered; one exact pair differs in generated `.next` chunks |
| Private app fixture | `typescript` | pass | 408/408 upstream fragments line-covered |
| Private app fixture | `tsx` | pass | 14/14 upstream fragments line-covered; Rust currently reports extra findings |
|
|
373
|
+
|
|
374
|
+
## Known Deltas

- JS/TS/JSX/TSX use native Rust/Oxc tokenization, so token totals can differ
  from Prism while fragment coverage remains green.
- Long-tail formats are now discoverable through the upstream-synchronized
  registry, but most use generic tokenization and do not carry parity claims.
- Markdown extracts YAML front matter and fenced code blocks into embedded
  format maps. YAML quoted scalars are kept whole and fenced gap whitespace is
  preserved enough for exact upstream Markdown clone/start and duplicated-line
  parity, while source line and token totals still differ.
- Vue, Svelte, and Astro now split embedded template/script/style/frontmatter
  regions into format maps. CSS-like style blocks skip internal whitespace
  tokens so Vue SCSS starts align with upstream, while other embedded generic
  block maps still preserve internal whitespace where it is needed for
  coverage. Their fixtures are line-covered, with remaining wider ranges from
  generic markup/style tokenization.
- Plain `markup` now extracts top-level `<script>` and `<style>` blocks into
  embedded JavaScript/TypeScript/CSS-like maps. This covers upstream mixed HTML
  fixture clones, though Rust may report a wider equivalent embedded range.
- Pug and HAML mirror Prism's multiline block behavior for fixture parity:
  `pug` keeps non-`script` dot blocks as one token, and `haml` keeps silent
  comment blocks as one token. The overextended upstream report ranges remain
  listed in `docs/upstream-bugs.md`.
- Non-native generic formats use coarse whitespace tokenization; weak mode
  strips best-effort common comment spans, including `#`, `//`, `/* */`,
  `<!-- -->`, SQL-style `--`, and Lisp/INI-style `;` comments where those
  prefixes are comments in the upstream Prism grammar.
- CSS-like generic formats split common punctuation so practical stylesheet
  clones meet upstream token thresholds without carrying a full Prism port.
- Code-like and Prism-like generic formats split common punctuation and
  operator runs so practical language fixtures meet upstream default token
  thresholds without carrying a full Prism port. This includes long-tail
  fixture formats such as YAML, INI, markup, HAML, DOT, CSV, CMake, Clojure,
  CoffeeScript, Q#, SPARQL, and Robot Framework.
- Properties uses the same generic punctuation/operator split so dotted keys
  and assignments reach upstream clone thresholds without a dedicated lexer.
- Several upstream fixture directories are gated through upstream aliases:
  `gitignore` as `ignore`, `mathematica` as `wolfram`, `idl` as `prolog`, and
  `z80` as `c`.
- ASP.NET uses the code-like generic splitter and is gated with a narrow
  documented upstream range exception for `file2.aspx:18-43`, where upstream
  reports through an inserted email field block that is not present in the
  paired source.
- Apex extracts bracketed SOQL regions into an embedded `sql` map to match
  upstream's multi-format Apex reports.
- `--mode strict` now preserves Prism-style `empty` and `new_line` whitespace
  tokens in the native JS/TS/Oxc path and the generic tokenizer. The
  JavaScript fixture has exact strict-mode summary parity.
- Extensionless names such as `Makefile` and `Dockerfile` require
  `--formats-names`, matching upstream behavior.
- Custom extension and filename mappings are supported through
  `--formats-exts`/`formatsExts` and `--formats-names`/`formatsNames`.
- Relative `ignore`/`--ignore` patterns are normalized against each configured
  scan root and the current working directory, matching upstream behavior for
  absolute scan paths outside `cwd`.
- `--noSymlinks` skips symlink scan roots as well as symlinks found during tree
  walking, matching upstream's pre-glob path filtering.
- File discovery respects the current working directory `.gitignore`, scan-root
  `.gitignore` files, `.git/info/exclude`, and the global Git excludes file
  from `git config --global core.excludesFile`.
- `--max-size`/`maxSize` follows upstream `bytes.parse` semantics, including
  decimal `kb` through `pb` values, `parseInt` fallback for non-matching
  suffixes such as `1k`, and zero-file behavior for invalid limits.
- CLI `--min-lines`, `--min-tokens`, and `--max-lines` accept upstream-style
  `parseInt` numeric prefixes, so values such as `20.9` are treated as `20`;
  missing optional values are accepted like Commander `[number]` options.
- Bare optional values for `--threshold`, `--exitCode`, `--max-size`,
  `--pattern`, `--store`, and `--store-path` follow the local upstream runtime
  behavior where upstream continues instead of failing during CLI parsing.
- Bare optional values for `--ignore`, `--ignore-pattern`, `--reporters`,
  `--mode`, `--format`, `--formats-exts`, `--formats-names`, and file-writing
  `--output` paths now mirror upstream's Commander runtime TypeError shape
  instead of failing during CLI parsing, including the different
  `fs.mkdirSync` and `path.join` error strings used by different file
  reporters.
- Malformed CLI `--formats-exts`/`--formats-names` entries without `:` now
  preserve upstream's visible `Cannot read properties of undefined` TypeError
  instead of silently ignoring the entry.
- CLI `--threshold` follows JavaScript `Number(...)` parsing for values such as
  `0x10` and `nope`, matching upstream threshold reporter behavior.
- CLI/config `exitCode` keeps the raw Node-like value until clones are found.
  Integer strings such as `0x10` exit with the matching code, while invalid,
  fractional, or bare boolean values emit the same Node-style error after
  reports are written.
- Config `minLines`, `maxLines`, and `threshold` accept string numeric values
  that upstream coerces at runtime, including JavaScript-style threshold strings
  such as `0x10`. Config `minTokens` remains intentionally strict because
  upstream's string value path can corrupt token-window indexing and crash in
  detection.
- Invalid `--mode` values fail after CLI parsing with the upstream-style
  `Error: Mode ... does not supported yet.` message printed to stdout.
- If discovery, size, or line filters leave no files to detect, reporters are
  not run, matching upstream's `InFilesDetector` early return. Silent mode
  stays quiet; non-silent mode only prints the terminal footer.
- `skipLocal` follows the upstream configured-root validator: clones are skipped
  only when both fragments are inside the same input path.
- The upstream workflow option surface for `blame`, `store`, `storePath`,
  `cache`, `executionId`, `noTips`, `listeners`, and `tokensToSkip` is parsed
  from CLI/config where applicable. The default `executionId` is generated as a
  UTC RFC3339 timestamp, matching the upstream workflow shape. `--blame`
  populates clone fragment blame data from native `git blame -w` output when
  available.
- `cache`, config `listeners`, and `tokensToSkip` are intentionally treated as
  option-surface compatibility only for now: the upstream CLI/reference code
  defines or merges these fields, but does not consume them in the detection,
  tokenizer, reporter, or store runtime.
- `--store <name>` currently follows the upstream missing-store fallback shape
  in both CLI and server entrypoints: it warns that the store package is not
  installed and continues with in-memory detection. Dynamic loading of external
  store packages remains an implementation gap.
- `--debug` is a dry run like upstream: it prints JS-style option fields and
  discovered files, then exits before clone detection and reporter execution.
- Explicit `--config` paths are resolved lexically like Node `path.resolve()`,
  without canonicalizing symlinks, so config-relative options use the visible
  config path's directory.
- `--list` follows the upstream output shape: a `Supported formats:` header
  followed by comma-separated formats.
- Non-silent runs print clone progress for non-`ai` reporters, then reporter
  output, then a `time:` footer. Tips are printed by default and suppressed by
  `--noTips`; the Rust footer keeps only the AI refactoring tip and omits the
  upstream promotional/support lines.
- Reporter normalization mirrors upstream append behavior: explicit `silent`
  or `threshold` reporters are not deduplicated when `--silent` or
  `--threshold` appends the same reporter.
- `--verbose` prints upstream-style format-filter skip messages and detector
  events for `START_DETECTION`, `CLONE_FOUND`, and `CLONE_SKIPPED`.
- Unknown reporter names emit the upstream-style install warning. Dynamic
  loading of external reporter packages is not implemented yet.
- `reportersOptions.badge` supports the upstream-style `subject`, `status`,
  `color`, and `path` overrides for the built-in badge reporter.
- Known upstream bug candidates are tracked in `docs/upstream-bugs.md`.
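
The `parseInt` numeric-prefix behavior listed above for `--min-lines`, `--min-tokens`, and `--max-lines` can be sketched roughly as follows. This is a minimal illustration, not the crate's actual parser: `parse_int_prefix` is a hypothetical name, the sketch is decimal-only, and it returns `None` where JavaScript would yield `NaN`.

```rust
/// Hypothetical helper: reads an optional sign and the leading run of decimal
/// digits, ignoring anything after them, similar to JavaScript `parseInt`.
/// Returns `None` when no digit is found (JavaScript would produce `NaN`).
fn parse_int_prefix(text: &str) -> Option<i64> {
    // Leading whitespace is skipped before the number starts.
    let rest = text.trim_start();

    // An optional single sign may precede the digits.
    let (negative, rest) = match rest.strip_prefix('-') {
        Some(tail) => (true, tail),
        None => (false, rest.strip_prefix('+').unwrap_or(rest)),
    };

    let mut value: i64 = 0;
    let mut seen_digit = false;
    for ch in rest.chars() {
        // Stop at the first non-digit, e.g. the `.` in `20.9`.
        let Some(digit) = ch.to_digit(10) else { break };
        value = value.saturating_mul(10).saturating_add(i64::from(digit));
        seen_digit = true;
    }

    if !seen_digit {
        return None;
    }
    Some(if negative { -value } else { value })
}
```

Under these assumptions, `parse_int_prefix("20.9")` yields `Some(20)` and `parse_int_prefix("nope")` yields `None`. Hex prefixes such as `0x10` are deliberately left out, because the list above only ties them to the `Number(...)` paths for `--threshold` and `exitCode`.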
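
The `skipLocal` rule above (skip a clone only when both fragments sit inside the same input path) could look like the sketch below. The function name and signature are illustrative assumptions and are not taken from `src/detector/skip_local.rs`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical check: true when one configured scan root contains both
/// fragment paths, which is the case where `skipLocal` would drop the clone.
fn is_local_clone(roots: &[PathBuf], left: &Path, right: &Path) -> bool {
    // `Path::starts_with` compares whole path components, so a root of `src`
    // does not match a sibling directory named `srcs`.
    roots
        .iter()
        .any(|root| left.starts_with(root) && right.starts_with(root))
}
```

With roots `src` and `tests`, a pair such as `src/a.rs` and `src/b/c.rs` counts as local, while `src/a.rs` and `tests/a.rs` is a cross-root clone and would still be reported.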
## Benchmark Sanity

Recent local sanity checks:

| Target | Format | Rust avg | Upstream avg | Approx speedup |
| --- | --- | ---: | ---: | ---: |
| Private app fixture | `tsx` | `0.0358s` | `0.568s` | `16x` |
| `jscpd/packages` | `typescript` | `0.0143s` | `0.831s` | `58x` |

Latest public benchmark suite checks, using repositories cloned outside the
project tree:

| Target | Commit | Format | Rust avg | Upstream avg | Approx speedup |
| --- | --- | --- | ---: | ---: | ---: |
| `facebook/react` | `f0dfee3` | `javascript` | `0.199097s` | `10.079214s` | `50.62x` |
| `vercel/next.js` | `2bbb67b9` | `typescript` | `0.262433s` | `14.715736s` | `56.07x` |
| `prometheus/prometheus` | `a0524ee` | `go` | `0.085239s` | `4.642435s` | `54.46x` |

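
The speedup column in the tables above is consistent with dividing the upstream average by the Rust average. A small sketch of that arithmetic follows; the function names are illustrative and are not part of the benchmark scripts.

```rust
/// Ratio of upstream average time to Rust average time, both in seconds.
fn speedup(rust_avg_secs: f64, upstream_avg_secs: f64) -> f64 {
    upstream_avg_secs / rust_avg_secs
}

/// Formats the ratio with two decimals and an `x` suffix, as in the
/// public-suite table.
fn format_speedup(rust_avg_secs: f64, upstream_avg_secs: f64) -> String {
    format!("{:.2}x", speedup(rust_avg_secs, upstream_avg_secs))
}
```

For the `facebook/react` row, `format_speedup(0.199097, 10.079214)` gives `50.62x`, matching the table.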
## Additional Mode Checks

```bash
DETECTION_MODE=strict FORMAT=javascript MIN_TOKENS=20 MIN_LINES=3 MAX_SIZE=1mb \
STRICT=coverage scripts/compat.sh jscpd/fixtures/javascript
```

The default matrix also includes strict JavaScript/TypeScript and weak
JavaScript mode checks so mode regressions are gated directly. Strict mode uses
the same coverage-first release rule; token totals remain diagnostic because
the native token stream may split whitespace differently from Prism while still
covering every upstream duplicated line.
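
One way to read the coverage-first rule is that each upstream fragment counts as line-covered when every one of its lines falls inside some Rust-reported range for the same file. The sketch below encodes that reading. It is an assumption about what `scripts/compat.sh` checks, and the `Fragment` type and `covered_fragments` function are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical report entry: a file plus an inclusive, 1-based line range.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fragment {
    file: String,
    start_line: u32,
    end_line: u32,
}

/// Counts upstream fragments whose every line is inside at least one
/// Rust-reported range for the same file.
fn covered_fragments(upstream: &[Fragment], rust: &[Fragment]) -> usize {
    // Group the Rust ranges by file so each upstream fragment only scans
    // ranges that can possibly cover it.
    let mut by_file: HashMap<&str, Vec<(u32, u32)>> = HashMap::new();
    for fragment in rust {
        by_file
            .entry(fragment.file.as_str())
            .or_default()
            .push((fragment.start_line, fragment.end_line));
    }

    upstream
        .iter()
        .filter(|fragment| {
            let Some(ranges) = by_file.get(fragment.file.as_str()) else {
                return false;
            };
            // Wider or overlapping Rust ranges still count as coverage.
            (fragment.start_line..=fragment.end_line)
                .all(|line| ranges.iter().any(|&(start, end)| start <= line && line <= end))
        })
        .count()
}
```

A result equal to `upstream.len()` would correspond to a row such as `2/2 upstream fragments line-covered`. Extra Rust findings do not lower the count, which fits the note that token totals and additional findings are diagnostic only.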
@@ -0,0 +1,86 @@
# Format Porting Guide

The first-release policy is coverage-first for hot JS/TS formats and smoke-only
for long-tail generic formats. A format can find more than upstream while
compatibility converges, but release-compatible formats must not miss upstream
duplicate fragments on their fixtures.

## Status Levels

- `generic`: format is recognized through the upstream-synchronized registry and
  uses coarse whitespace tokenization.
- `native-smoke`: Rust has format-specific logic and local smoke tests, but no
  upstream coverage claim.
- `coverage`: `MODE=compat scripts/check-format.sh <format> <target>` passes
  with `STRICT=coverage`.
- `release`: docs and tests make the support level explicit, and the format is
  included in the release matrix.

## Files To Know

- `src/formats.rs`: generated format and extension registry. Do not edit by
  hand; run `node scripts/sync-formats.mjs` after upstream tokenizer changes.
- `src/tokenizer.rs`: tokenizer entrypoint and format dispatch. Native, generic,
  embedded-block, hashing, ignore, and position helpers live under
  `src/tokenizer/`.
- `src/files.rs`: discovery entrypoint and format filtering. Gitignore, shebang,
  and path-order helpers live under `src/files/`.
- `src/detector.rs`: clone detection; do not change for ordinary format work.
- `scripts/check-format.sh`: one-format smoke/compat checks.
- `scripts/compat.sh`: Rust vs upstream report comparison.
- `docs/compat-baseline.md`: current compatibility claims and known deltas.

## Minimal Format Task

1. Confirm the format is present:

   ```bash
   cargo run --quiet -- --list | rg '(^|, )<format>(,|$)'
   ```

2. Add or reuse a tiny target directory for the format.

3. Run smoke mode:

   ```bash
   scripts/check-format.sh <format> <target>
   ```

4. If the task claims upstream coverage, run compat mode:

   ```bash
   MODE=compat scripts/check-format.sh <format> <target>
   ```

5. Add focused tests near the code being changed.

6. Update docs only when the support level changes.

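
The presence check in step 1 can also be expressed in a Rust test helper. The sketch below assumes only the documented `--list` shape (a `Supported formats:` header followed by comma-separated formats). `list_contains` is a hypothetical name, and whether the header shares a line with the formats is not assumed either way.

```rust
/// Hypothetical helper: true when `format` appears as a whole entry in the
/// text printed by `--list`.
fn list_contains(list_output: &str, format: &str) -> bool {
    list_output
        .lines()
        .map(|line| {
            // Drop the header if it is on this line; keep everything else.
            let line = line.trim();
            line.strip_prefix("Supported formats:").unwrap_or(line)
        })
        .flat_map(|line| line.split(','))
        // Compare whole entries so `ts` does not match `tsx`.
        .any(|entry| entry.trim() == format)
}
```

Like the `rg` pattern, this matches complete entries only, so a query for `ts` does not succeed merely because `tsx` is listed.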
## Native Tokenizer Task

Native tokenizers should be added only when generic tokenization is too noisy or
misses practical clones. Prefer maintained Rust crates where available. If a
custom scanner is needed, keep it small and format-specific.

Expected shape:

- One small scanner/helper for the format.
- Unit tests for token slices, comments, weak mode, and at least one duplicate
  detection path.
- No detector changes unless there is a proven cross-format contract issue.
- No JavaScript runtime fallback.

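
As a rough picture of the expected shape, the sketch below pairs one small helper with focused unit tests for comments and weak mode. It is not the crate's tokenizer: `tokenize_generic` is a hypothetical function that splits on whitespace and, in weak mode, drops `#` line comments with no awareness of string literals.

```rust
/// Hypothetical coarse tokenizer: whitespace-separated tokens, with `#` line
/// comments removed when `weak` is true.
fn tokenize_generic(source: &str, weak: bool) -> Vec<&str> {
    source
        .lines()
        .map(|line| match (weak, line.find('#')) {
            // Weak mode: keep only the text before the comment marker.
            (true, Some(at)) => &line[..at],
            _ => line,
        })
        .flat_map(|line| line.split_whitespace())
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn weak_mode_strips_hash_comments() {
        let source = "key = 1 # note\n# full line\nother = 2";
        assert_eq!(
            tokenize_generic(source, true),
            ["key", "=", "1", "other", "=", "2"]
        );
    }

    #[test]
    fn default_mode_keeps_comment_tokens() {
        assert_eq!(tokenize_generic("a # b", false), ["a", "#", "b"]);
    }
}
```

A real format helper would also need a duplicate-detection test and would have to return the crate's own token type, neither of which is shown here.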
## Junior-Safe Format Tasks

- Add a smoke test for a format already handled by generic tokenization.
- Add one comment-style test and no production code.
- Add one small production helper by copying an existing tokenizer pattern.
- Run `scripts/check-format.sh` and report exact `sources`/`clones` output.

## Main-Agent-Only Decisions

- Promoting a format to `coverage` or `release`.
- Adding dependencies.
- Changing detector contracts.
- Changing compatibility gate semantics.
- Editing generated registry logic.