asmdiff 0.1.0__py3-none-any.whl

@@ -0,0 +1,743 @@
1
+ Metadata-Version: 2.4
2
+ Name: asmdiff
3
+ Version: 0.1.0
4
+ Summary: Compare per-function assembly between paired C implementations
5
+ Project-URL: Homepage, https://github.com/rt-rtos/asmdiff
6
+ Project-URL: Repository, https://github.com/rt-rtos/asmdiff
7
+ Author: Rasmus Tikkanen
8
+ License-Expression: MIT
9
+ License-File: LICENSE
10
+ Keywords: assembly,clang,codegen,compiler,disassembly,gcc
11
+ Classifier: Environment :: Console
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Topic :: Software Development :: Compilers
15
+ Classifier: Topic :: Software Development :: Debuggers
16
+ Requires-Python: >=3.8
17
+ Description-Content-Type: text/markdown
18
+
19
+ # asmdiff
20
+ ## per-function assembly comparison for paired C implementations
21
+
22
+ > asmdiff is a command-line tool for comparing the generated assembly of individual C functions across implementations, compiler flags, compiler versions, and source revisions. It is intended for investigating compiler code generation rather than benchmarking runtime performance.
23
+
24
+ `asmdiff.py` answers one question fast: **when I rewrite a C construct, what
25
+ does the compiler actually emit - before and after?** It compiles a small
26
+ harness file across a matrix of compilers, extracts each variant function's
27
+ assembly, and prints side-by-side listings plus a summary of instruction
28
+ counts, outbound calls, and loop spans.
29
+
30
+ Compilers and flags are configured per project through named targets in an
31
+ `asmdiff.toml` file — the tool itself has no project-specific defaults and
32
+ parses any GNU-as ELF assembly.
33
+
34
+ Its motivating use case: checking whether an expression that used to constant-fold
35
+ (e.g. `x * exp2f(5)` → one multiply) turns into a library call (e.g.
36
+ `ldexpf(x, 5)` → `jmp ldexpf@PLT`) after a "cleanup". That distinction is
37
+ invisible in source review and decisive on hot paths.
38
+
39
+ ## Quick start
40
+
41
+ Write a harness with your two versions as `old_*` / `new_*` function pairs:
42
+
43
+ ```c
44
+ /* myharness.c */
45
+ #include <math.h>
46
+ float old_scale(float x) { return x * exp2f(-5); }
47
+ float new_scale(float x) { return ldexpf(x, -5); }
48
+ ```
49
+
50
+ Run:
51
+
52
+ ```
53
+ $ asmdiff.py myharness.c
54
+ ```
55
+
56
+ Output (gcc section shown, one per compiler):
57
+
58
+ ```
59
+ == gcc -O3 ... ==
60
+
61
+ old_scale | new_scale
62
+ ---------------------------------------------+---------------------------------------------
63
+ endbr64 | endbr64
64
+ mulss .LC0(%rip), %xmm0 | movl $-5, %edi
65
+ ret | jmp ldexpf@PLT
66
+
67
+ function role insns calls loop spans
68
+ old_scale baseline 3 - -
69
+ new_scale candidate 3 ldexpf -
70
+ ```
71
+
72
+ Read the `calls` column first: `-` means the construct lowered to inline
73
+ instructions; a symbol name means a libcall. The side-by-side asm above it is
74
+ the evidence.
75
+
76
+ The `loop spans` column reports `label:N` for every local label that some
77
+ instruction branches back to, where N counts the instructions from the label up
78
+ to and including the last backward branch targeting it. Whole-function `insns`
79
+ also counts one-time setup and writeback code, so a change that hoists work out
80
+ of a loop can raise `insns` while shrinking the span, the part that repeats. Which
81
+ span is your hot loop, and how often it runs, only the listing and your source can tell you.
82
+
83
+ A worked example is included — `asmdiff_example.c` reproduces the
84
+ exp2f/ldexpf analysis for both constant and runtime shift amounts:
85
+
86
+ ```
87
+ $ asmdiff.py asmdiff_example.c
88
+ ```
89
+
90
+ ## Command reference
91
+
92
+ ```
93
+ asmdiff.py SOURCE.c [SOURCE2.c] [--pair OLD:NEW]... [--across FUNC]...
94
+ [--cc 'CC FLAGS']... [--target NAME]... [--config PATH]
95
+ [-- EXTRA_FLAGS...]
96
+ ```
97
+
98
+ | Option | Meaning |
99
+ |---|---|
100
+ | `SOURCE.c` | C file to compile — a purpose-built harness or a real project source. A second file may be given: with `--across` to compare a function, without it for a whole-file A/B summary. |
101
+ | `--pair OLD:NEW` | Compare two *different* functions within one compilation. Repeatable. Default: every `old_X` is auto-paired with its `new_X`; with no pairs at all, the whole-file summary is printed instead. |
102
+ | `--across FUNC` | Compare the *same* function across two compilations (see below). Repeatable. Mutually exclusive with `--pair`. |
103
+ | `--cc 'CC FLAGS'` | One compiler invocation, command and flags in a single quoted string. Repeatable to build a matrix. |
104
+ | `--target NAME` | A named target from the config file, resolved to a `--cc` entry. Repeatable; appended to the matrix after `--cc` entries. |
105
+ | `--config PATH` | Config file to use. Default search: `asmdiff.toml` next to `SOURCE.c`, then in the current directory, then `~/.config/`. First hit wins. |
106
+ | `-- FLAGS...` | Everything after a bare `--` is appended to *every* compiler invocation. |
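The default search order for `--config` can be sketched as below. `find_config` is a hypothetical name, not asmdiff's own function, and the sketch assumes the file under `~/.config/` is called `asmdiff.toml` directly (the table only names the directory). `cwd` and `home` are injectable so the order can be tested.

```python
from pathlib import Path
from typing import Optional


def find_config(source: str, cwd: Optional[Path] = None,
                home: Optional[Path] = None) -> Optional[Path]:
    """Return the first asmdiff.toml found, or None (hypothetical helper)."""
    cwd = cwd if cwd is not None else Path.cwd()
    home = home if home is not None else Path.home()
    candidates = [
        Path(source).resolve().parent / "asmdiff.toml",  # next to SOURCE.c
        cwd / "asmdiff.toml",                            # current directory
        home / ".config" / "asmdiff.toml",               # assumed name under ~/.config/
    ]
    for path in candidates:
        if path.is_file():
            return path  # first hit wins
    return None
```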
107
+
108
+ With no `--cc` and no `--target`, the config file's top-level
109
+ `default` target(s) are used; without a config file, plain `gcc -O3` and
110
+ `clang -O3`. The harness advice below applies here too: compile with the flags your
111
+ project ships with, saved as a target.
112
+
113
+ Examples:
114
+
115
+ ```bash
116
+ # Explicit pairs, default compilers
117
+ asmdiff.py h.c --pair biquad_v1:biquad_v2 --pair svf_v1:svf_v2
118
+
119
+ # Cross-compilers: quote command and flags together
120
+ asmdiff.py h.c --cc 'xtensa-esp32s3-elf-gcc -O2 -mlongcalls' \
121
+ --cc 'riscv32-esp-elf-gcc -O2'
122
+
123
+ # Try a flag variant across the whole default matrix
124
+ asmdiff.py h.c -- -fno-math-errno
125
+ ```
126
+
127
+ Compilers missing from `PATH` are skipped with a warning; the run fails only
128
+ if none are usable. Exit status is non-zero only for operational failures
129
+ (a compile error, with the compiler's stderr shown; an unknown `--pair` name;
130
+ or no usable compiler). Differing assembly is the expected result, never an error.
131
+
132
+ ## Config file: named targets
133
+
134
+ Retyping a cross-compiler path and ten flags per run is the enemy of actually
135
+ looking at assembly. A TOML config (stdlib `tomllib`, Python ≥ 3.11) names
136
+ each compiler+flags combination once:
137
+
138
+ ```toml
139
+ # asmdiff.toml — next to your harnesses, in CWD, or in ~/.config/
140
+ default = "s3-amy" # target(s) used when no --cc/--target is given
141
+
142
+ [s3-amy] # production-like ESP32-S3 codegen
143
+ cc = "$HOME/.espressif/tools/xtensa-esp-elf/esp-*/xtensa-esp-elf/bin/xtensa-esp32s3-elf-gcc"
144
+ flags = [
145
+ "-O2", "-DAMY_USE_FIXEDPOINT", "-DNDEBUG",
146
+ "-Wno-strict-aliasing", "-mlongcalls",
147
+ "-I$HOME/project/components/amy/src",
148
+ ]
149
+
150
+ [host-fixed] # fixed-point build on host gcc
151
+ cc = "gcc"
152
+ flags = ["-O2", "-DAMY_USE_FIXEDPOINT", "-I$HOME/amy/src"]
153
+ ```
154
+
155
+ `cc` values expand `~` and `$VARS` and may be glob patterns, so a config
156
+ survives toolchain upgrades (`esp-14` → `esp-15`) without editing. A
157
+ pattern matching several installed toolchains resolves to the highest
158
+ version-sorted one — numerically, so `esp-15` beats `esp-9` — and the
159
+ choice is printed to stderr; the `==` header in the output always shows
160
+ the fully resolved command that actually ran. No match is an error. Pin
161
+ the exact directory instead when reproducibility matters more than
162
+ convenience. Flags expand `$VARS` only (no globbing).
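Here is a minimal sketch of that `cc` resolution rule: expand `~` and `$VARS`, glob, and choose the numerically highest match. `resolve_cc` and `natural_key` are hypothetical names. The real tool's version sort may differ in its details, for example in how it breaks ties.

```python
import glob
import os
import re
import sys


def natural_key(path: str) -> list:
    # re.split with one capture group alternates text (even index) and digit
    # runs (odd index), so types always line up and "esp-15" > "esp-9".
    return [int(tok) if i % 2 else tok
            for i, tok in enumerate(re.split(r"(\d+)", path))]


def resolve_cc(pattern: str) -> str:
    """Expand ~ and $VARS, then glob; return the highest version-sorted match."""
    expanded = os.path.expandvars(os.path.expanduser(pattern))
    if not any(ch in expanded for ch in "*?["):
        return expanded  # plain command such as "gcc": left for a PATH lookup
    matches = glob.glob(expanded)
    if not matches:
        raise FileNotFoundError(f"no compiler matches {pattern!r}")
    best = max(matches, key=natural_key)
    if len(matches) > 1:
        print(f"{pattern} -> {best}", file=sys.stderr)  # report the choice
    return best
```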
163
+
164
+ A target is exactly a saved `--cc` entry — nothing else changes. Useful
165
+ shapes:
166
+
167
+ ```bash
168
+ asmdiff.py h.c # config default target(s)
169
+ asmdiff.py h.c --target s3-amy --target host-fixed # two-target matrix
170
+ asmdiff.py h.c --across f --target s3-amy --cc 'gcc -O2' # mix freely
171
+ ```
172
+
173
+ A config placed next to your harness files travels with them: any invocation
174
+ naming a source in that directory finds it, from any CWD. `default` may be a
175
+ single name or a list (a whole default matrix). The
176
+ included `asmdiff.example.toml` is a starting point. If a flag or include
177
+ path must vary per machine, that's what per-machine config files are for —
178
+ nothing lives in the tool.
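A target is a saved `--cc` entry, so the conversion can be sketched in a few lines. The sketch works on an already-parsed config dict. `target_to_cc` and `default_targets` are hypothetical helpers, and they skip the glob step for `cc`. Joining the parts with `shlex.quote` is an assumption about how a `--cc` string is assembled.

```python
import os
import shlex
from typing import List


def default_targets(config: dict) -> List[str]:
    """`default` may be a single target name or a list of names."""
    value = config.get("default", [])
    return [value] if isinstance(value, str) else list(value)


def target_to_cc(config: dict, name: str) -> str:
    """Render one [name] table as the equivalent --cc string (hypothetical)."""
    table = config[name]
    cc = os.path.expandvars(os.path.expanduser(table["cc"]))  # glob step omitted
    flags = [os.path.expandvars(f) for f in table.get("flags", [])]  # $VARS only
    return " ".join(shlex.quote(part) for part in [cc, *flags])


# Loading the file itself needs Python >= 3.11:
#   import tomllib
#   with open("asmdiff.toml", "rb") as fh:
#       config = tomllib.load(fh)
#   ccs = [target_to_cc(config, t) for t in default_targets(config)]
```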
179
+
180
+ ## Whole-file summary
181
+
182
+ With no `--pair`, no `--across`, and no `old_*`/`new_*` functions to
183
+ auto-pair, the tool prints what it parsed instead of erroring: every
184
+ function's counts plus a file total. With two files, one block per file:
185
+
186
+ ```
187
+ $ asmdiff.py old/delay.c new/delay.c
188
+
189
+ == xtensa-esp32s3-elf-gcc -O2 ... ==
190
+
191
+ -- old/delay.c --
192
+
193
+ function insns calls loop spans
194
+ stereo_reverb 437 - .L108:327
195
+ ...
196
+ TOTAL (13 functions) 956 malloc_caps, free, ... -
197
+
198
+ -- new/delay.c --
199
+ ...
200
+ TOTAL (13 functions) 1028 malloc_caps, free, ... -
201
+ ```
202
+
203
+ The TOTAL row is a coarse sanity check — did this refactor move the file's
204
+ weight, did a call appear that shouldn't have? It sums parsed function
205
+ bodies only (no literal pools, data, or alignment), so it is not a size
206
+ measurement, and per-function rows are where the real information is.
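The TOTAL row comes down to a sum of per-function instruction counts plus the union of called symbols. The sketch below shows that arithmetic. `total_row` is a hypothetical name, and the per-function numbers are made up for illustration rather than taken from `delay.c`.

```python
from typing import Dict, List, Tuple


def total_row(rows: Dict[str, Tuple[int, List[str]]]) -> Tuple[int, int, List[str]]:
    """Return (function count, summed insns, calls in first-seen order)."""
    total = sum(insns for insns, _ in rows.values())
    seen: List[str] = []
    for _, calls in rows.values():
        for sym in calls:
            if sym not in seen:
                seen.append(sym)
    return len(rows), total, seen


# Hypothetical per-function (insns, calls) values.
example = {"init": (40, ["malloc_caps"]), "run": (120, []), "deinit": (12, ["free"])}
print(total_row(example))  # (3, 172, ['malloc_caps', 'free'])
```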
207
+
208
+ ## Comparing the same function across two builds (`--across`)
209
+
210
+ `--pair` needs both variants to coexist in one compilation. Real changes
211
+ usually don't look like that: the "old" and "new" versions are the same
212
+ function under different flags, defines, or file revisions. `--across FUNC`
213
+ covers both shapes:
214
+
215
+ **One file, two (or more) `--cc` entries** — flag/define variants. The first
216
+ entry is the baseline; each later entry is compared against it:
217
+
218
+ ```bash
219
+ # Did dropping fixed-point change the biquad's codegen?
220
+ asmdiff.py src/filters.c --across dsps_biquad_f32_ansi \
221
+ --cc 'gcc -O3 -DMY_FIXED_CONFIG' --cc 'gcc -O3'
222
+
223
+ # gcc vs clang on the same function
224
+ asmdiff.py src/filters.c --across dsps_biquad_f32_ansi \
225
+ --cc 'gcc -O3' --cc 'clang -O3'
226
+ ```
227
+
228
+ ```bash
229
+ # Size vs. speed optimisation: -Os against -O2
230
+ asmdiff.py src/filters.c --across dsps_biquad_f32_ansi \
231
+ --cc 'gcc -Os' --cc 'gcc -O2'
232
+ ```
233
+
234
+ ```
235
+ cc#1: gcc -Os
236
+ cc#2: gcc -O2
237
+
238
+ == cc#1 vs cc#2 ==
239
+
240
+ dsps_biquad_f32_ansi [cc#1] | dsps_biquad_f32_ansi [cc#2]
241
+ ---------------------------------------------+---------------------------------------------
242
+ endbr64 | endbr64
243
+ movl (%r8), %r11d | movdqu (%r8), %xmm0
244
+ movl 8(%r8), %r10d | pushq %r13
245
+ pushq %r15 | pushq %r12
246
+ xorl %r9d, %r9d | pshufd $255, %xmm0, %xmm1
247
+ pushq %r14 | pushq %rbp
248
+ movl 4(%r8), %r15d | movd %xmm1, %ebp
249
+ pushq %r13 | movdqa %xmm0, %xmm1
250
+ movl 12(%r8), %r13d | pushq %rbx
251
+ pushq %r12 | punpckhdq %xmm0, %xmm1
252
+ movl %edx, %r12d | movd %xmm1, %r10d
253
+ pushq %rbp | pshufd $85, %xmm0, %xmm1
254
+ movq %rsi, %rbp | testl %edx, %edx
255
+ pushq %rbx | jle .L24
256
+ movq %rdi, %rbx | movslq %edx, %rdx
257
+ .L27: | movd %xmm1, %r12d
258
+ cmpl %r9d, %r12d | movd %xmm0, %r11d
259
+ jle .L30 | movq %rsi, %r9
260
+ movl (%rbx,%r9,4), %r14d | leaq (%rdi,%rdx,4), %rbx
261
+ movl (%rcx), %edi | movq %rdi, %rsi
262
+ movl %r14d, %esi | jmp .L25
263
+ call SMULR6 | .L26:
264
+ movl 4(%rcx), %edi | movl %eax, %r10d
265
+ movl %r11d, %esi | movl %edi, %r11d
266
+ movl %eax, %edx | .L25:
267
+ call SMULR6 | movl 4(%rcx), %eax
268
+ movl 8(%rcx), %edi | movl (%rsi), %edi
269
+ movl %r15d, %esi | addl $1024, %r12d
270
+ movl %r11d, %r15d | addl $1024, %ebp
271
+ addl %eax, %edx | sarl $11, %r12d
272
+ movl %r14d, %r11d | sarl $11, %ebp
273
+ call SMULR6 | leal 1024(%rax), %edx
274
+ movl 12(%rcx), %edi | leal 1024(%r11), %eax
275
+ movl %r10d, %esi | sarl $11, %eax
276
+ addl %eax, %edx | sarl $11, %edx
277
+ call SMULR6 | leal 1024(%rdi), %r13d
278
+ movl 16(%rcx), %edi | imull %eax, %edx
279
+ movl %r13d, %esi | movl (%rcx), %eax
280
+ movl %r10d, %r13d | sarl $11, %r13d
281
+ subl %eax, %edx | addl $1024, %eax
282
+ call SMULR6 | sarl $11, %eax
283
+ movl %eax, %esi | addl $1, %edx
284
+ movl %edx, %eax | imull %r13d, %eax
285
+ subl %esi, %eax | sarl %edx
286
+ movl %eax, 0(%rbp,%r9,4) | addl $1, %eax
287
+ movl %eax, %r10d | sarl %eax
288
+ incq %r9 | addl %eax, %edx
289
+ jmp .L27 | movl 8(%rcx), %eax
290
+ .L30: | addl $1024, %eax
291
+ popq %rbx | sarl $11, %eax
292
+ movl %r15d, 4(%r8) | imull %r12d, %eax
293
+ xorl %eax, %eax | leal 1024(%r10), %r12d
294
+ popq %rbp | sarl $11, %r12d
295
+ popq %r12 | addl $1, %eax
296
+ movl %r13d, 12(%r8) | sarl %eax
297
+ movl %r11d, (%r8) | addl %edx, %eax
298
+ popq %r13 | movl 12(%rcx), %edx
299
+ movl %r10d, 8(%r8) | addl $1024, %edx
300
+ popq %r14 | sarl $11, %edx
301
+ popq %r15 | imull %r12d, %edx
302
+ ret | movl %r11d, %r12d
303
+ | addl $1, %edx
304
+ | sarl %edx
305
+ | subl %edx, %eax
306
+ | movl 16(%rcx), %edx
307
+ | addl $1024, %edx
308
+ | sarl $11, %edx
309
+ | imull %ebp, %edx
310
+ | movl %r10d, %ebp
311
+ | addl $1, %edx
312
+ | addq $4, %rsi
313
+ | addq $4, %r9
314
+ | sarl %edx
315
+ | subl %edx, %eax
316
+ | movl %eax, -4(%r9)
317
+ | cmpq %rsi, %rbx
318
+ | jne .L26
319
+ | movd %eax, %xmm1
320
+ | movd %r10d, %xmm2
321
+ | movd %edi, %xmm0
322
+ | movd %r11d, %xmm3
323
+ | punpckldq %xmm2, %xmm1
324
+ | punpckldq %xmm3, %xmm0
325
+ | punpcklqdq %xmm1, %xmm0
326
+ | .L24:
327
+ | popq %rbx
328
+ | xorl %eax, %eax
329
+ | popq %rbp
330
+ | movups %xmm0, (%r8)
331
+ | popq %r12
332
+ | popq %r13
333
+ | ret
334
+
335
+ function role insns calls loop spans
336
+ dsps_biquad_f32_ansi [cc#1] baseline 59 SMULR6 .L27:32
337
+ dsps_biquad_f32_ansi [cc#2] candidate 89 - .L26:54
338
+ ```
339
+
340
+ (The columns describe, they don't rank: here `-O2` is bigger by both counts,
341
+ 89 vs 59 instructions overall and 54 vs 32 in the loop span, and only the listing shows why: `SMULR6` is inlined into the loop
342
+ body, with vector setup around it. Whether that trade is good is your call.)
343
+
344
+ The output prints a legend mapping `cc#N` tags to the full compiler
345
+ invocations, then one section per baseline/candidate pairing. Runnable
346
+ against the bundled example file:
347
+
348
+ ```
349
+ $ asmdiff.py asmdiff_example.c --across new_rt --cc 'gcc -O0' --cc 'gcc -O3'
350
+
351
+ cc#1: gcc -O0
352
+ cc#2: gcc -O3
353
+
354
+ == cc#1 vs cc#2 ==
355
+
356
+ new_rt [cc#1] | new_rt [cc#2]
357
+ ---------------------------------------------+---------------------------------------------
358
+ endbr64 | endbr64
359
+ pushq %rbp | jmp ldexpf@PLT
360
+ movq %rsp, %rbp |
361
+ subq $16, %rsp |
362
+ movss %xmm0, -4(%rbp) |
363
+ movl %edi, -8(%rbp) |
364
+ movl -8(%rbp), %edx |
365
+ movl -4(%rbp), %eax |
366
+ movl %edx, %edi |
367
+ movd %eax, %xmm0 |
368
+ call ldexpf@PLT |
369
+ leave |
370
+ ret |
371
+
372
+ function role insns calls loop spans
373
+ new_rt [cc#1] baseline 13 ldexpf -
374
+ new_rt [cc#2] candidate 2 ldexpf -
375
+ ```
376
+
377
+ **Two files** — before/after versions of a source file (e.g. from a git
378
+ worktree, a branch checkout, or a patched copy). Each compiler in the matrix
379
+ gets its own section:
380
+
381
+ ```bash
382
+ git worktree add ../baseline main
383
+ asmdiff.py ../baseline/src/filters.c src/filters.c \
384
+ --across dsps_biquad_f32_ansi
385
+ ```
386
+
387
+ Here the tags in the output are the two file paths (shortened to their
388
+ distinct suffix) instead of `cc#N` — the worked example in the next section
389
+ shows a full result of this shape.
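The worked example in the next section tags its files `amy-baseline/log2_exp2.c` and `amy/log2_exp2.c`. That fits a rule like "first differing directory plus file name". The sketch below implements that guess for POSIX paths. `short_tags` is hypothetical, and asmdiff's real shortening may behave differently.

```python
import os
from typing import List


def short_tags(paths: List[str]) -> List[str]:
    """Tag each path with its first differing directory plus its file name."""
    parts = [os.path.abspath(p).split(os.sep) for p in paths]
    i = 0
    # Walk forward past the leading components every path shares.
    while all(len(p) > i for p in parts) and len({p[i] for p in parts}) == 1:
        i += 1
    tags = []
    for p in parts:
        if i >= len(p) - 1:
            tags.append(p[-1])              # paths differ only in the file name
        else:
            tags.append(f"{p[i]}/{p[-1]}")  # e.g. amy-baseline/log2_exp2.c
    return tags
```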
390
+
391
+ Because C quote-includes (`#include "amy.h"`) resolve relative to the
392
+ including file first, each tree picks up **its own** headers automatically —
393
+ so a change made in a header (a macro, a typedef) is compared by pointing
394
+ `--across` at any `.c` file that uses it, without touching that `.c` file.
395
+
396
+ ## Worked example: exp2f vs ldexpf in shorepine/AMY sources
397
+
398
+ Suppose the proposal is to change AMY's float-mode shift macros in
399
+ `src/amy_fixedpoint.h` from `(s) * exp2f(b)` to `ldexpf((s), (b))`. No
400
+ harness needed — compare the real functions the macros expand into:
401
+
402
+ ```bash
403
+ # 1. A pristine baseline tree (any ref works)
404
+ git worktree add ../amy-baseline HEAD
405
+
406
+ # 2. The macros in question only exist in the float build, so enable it in
407
+ # BOTH trees: comment out `#define AMY_USE_FIXEDPOINT` in src/amy.h
408
+ # (it is hardcoded there).
409
+
410
+ # 3. In the working tree only, apply the candidate change in
411
+ # src/amy_fixedpoint.h:
412
+ # #define SHIFTR(s, b) ldexpf((s), -(b))
413
+ # #define SHIFTL(s, b) ldexpf((s), (b))
414
+
415
+ # 4. Compare real functions containing both kinds of shift site:
416
+ asmdiff.py ../amy-baseline/src/log2_exp2.c src/log2_exp2.c \
417
+ --across exp2_lut --across log2_lut --cc 'gcc -O3 -Wall'
418
+
419
+ # 5. Clean up
420
+ git worktree remove ../amy-baseline
421
+ ```
422
+
423
+ `src/log2_exp2.c` is a good probe because it contains both site kinds:
424
+ `exp2_lut` shifts by a **runtime** amount, `log2_lut` by **constants**.
425
+ The summary makes the trade-off immediate:
426
+
427
+ ```
428
+ function role insns calls
429
+ exp2_lut [amy-baseline/log2_exp2.c] baseline 65 exp2f
430
+ exp2_lut [amy/log2_exp2.c] candidate 59 ldexpf
431
+ log2_lut [amy-baseline/log2_exp2.c] baseline 58 -
432
+ log2_lut [amy/log2_exp2.c] candidate 64 ldexpf
433
+ ```
434
+
435
+ The runtime site improves (a leaner libcall replaces `exp2f` + multiply),
436
+ but the constant site regresses: baseline `log2_lut` had **no** calls —
437
+ `exp2f(±1)` folds to a multiply — while the candidate now pays a `ldexpf`
438
+ libcall inside its normalisation loop. Any other `.c` file whose hot
439
+ functions use the macros (`filters.c`, `oscillators.c`, `delay.c`) can be
440
+ probed the same way.
441
+
442
+ ## How it works
443
+
444
+ 1. Each compiler runs with `-S` to emit assembly text.
445
+ 2. Function bodies are sliced out between the function's label and its
446
+ `.size` directive (or the next function label). CFI/section/alignment
447
+ directives, comments, and compiler bracketing labels are stripped;
448
+ instructions and meaningful local labels (loop targets) are kept.
449
+ 3. Instruction counts and outbound calls come from a mnemonic scan covering
450
+ x86 (`call`, `jmp` tail calls), ARM (`bl`, `blx`), RISC-V (`call`,
451
+ `tail`, `jal`), and Xtensa (`call0/4/8/12`, `callx*`, `j`). Local-label
452
+ branches and register-indirect x86 jumps are not counted as calls.
453
+ 4. Loop spans come from label references alone — no mnemonic tables, no
454
+ control-flow analysis. The next section walks through it.
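Step 2's slicing rule can be sketched as follows. This is not asmdiff's `extract_functions()`. Beyond the text, it assumes function names are taken from `.type NAME, @function` (or `%function`) directives, so that a data label such as a lookup table is not mistaken for a function. It also returns raw lines and leaves cleanup to a separate pass.

```python
import re
from typing import Dict, List, Optional

TYPE_FUNC = re.compile(r"^\s*\.type\s+([\w.$]+)\s*,\s*[@%]function")
FUNC_LABEL = re.compile(r"^([A-Za-z_$][\w.$]*):")  # column-0 label, not a .L label
SIZE_DIR = re.compile(r"^\s*\.size\s+([\w.$]+)\s*,")


def extract_bodies(asm: str) -> Dict[str, List[str]]:
    lines = asm.splitlines()
    funcs = {m.group(1) for m in map(TYPE_FUNC.match, lines) if m}
    bodies: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        label = FUNC_LABEL.match(line)
        if label and label.group(1) in funcs:
            current = label.group(1)  # a new function label also ends the previous body
            bodies[current] = []
            continue
        size = SIZE_DIR.match(line)
        if size and size.group(1) == current:
            current = None  # ".size NAME, .-NAME" closes the body
            continue
        if current is not None:
            bodies[current].append(line)
    return bodies
```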
455
+
456
+ ### How a span is found
457
+
458
+ The parser sees only the cleaned `-S` text of one function: instructions
459
+ and local labels, as line positions rather than addresses. Two passes:
460
+
461
+ 1. Record the position of every local label line (`.L2:`).
462
+ 2. Scan each instruction's operands for label-shaped tokens (`.L…`). A
463
+ token counts only if that label exists **inside this function body**.
464
+ That one rule filters out literal-pool references — `mulss .LC0(%rip)`,
465
+ `l32r a8, .LC44` — because `.LC*` labels are emitted in data sections
466
+ outside the body and are never in the label map.
467
+
468
+ An instruction that references a label *above* itself is a backward
469
+ branch, whatever its mnemonic (`jne`, `bne`, `bnez.n`, `jnz` — the tool
470
+ never needs to know). The span runs from the label to the last such
471
+ branch, inclusive:
472
+
473
+ ```
474
+ .L2: ─┐
475
+ addl $1, %eax │
476
+ cmpl $8, %eax │ span ".L2:3"
477
+ jne .L2 ─┘ backward reference
478
+ ret outside the span
479
+ ```
480
+
481
+ Several back-edges to one label (a `continue` plus the loop bottom) merge
482
+ into that label's single span. Nested labels report separately — the
483
+ outer span simply contains the inner one. Forward references (loop exits
484
+ like `jle .L24`) are ignored.
485
+
486
+ The one arch-specific case is Xtensa zero-overhead loops, where the
487
+ hardware — not a branch — repeats the body, and the `loop` instruction
488
+ names its *end* label, forward:
489
+
490
+ ```
491
+ loopgtz a3, .L5 runs once; not part of the span
492
+ addi.n a2, a2, 1 ─┐
493
+ s32i.n a2, a4, 0 ─┘ span ".L5:2"
494
+ .L5:
495
+ retw.n
496
+ ```
497
+
498
+ That is the entire mechanism. There is no CFG, no trip count, and no
499
+ notion of "the" loop: a backward `goto` produces a span exactly like a
500
+ `for` loop, and an unrolled loop's span is the unrolled body. The column
501
+ states where the compiler laid out a repeatable region — nothing more.
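The two passes and the Xtensa zero-overhead-loop case fit in a short Python sketch. It assumes a cleaned function body with one instruction or label per line, as in the listings above. It treats `.L`-prefixed names as local labels and recognises the Xtensa `loop`, `loopgtz` and `loopnez` mnemonics. `loop_spans` is a hypothetical name, not asmdiff's `analyze()`.

```python
import re
from typing import Dict, List

LABEL_DEF = re.compile(r"^(\.L\w+):")
LABEL_REF = re.compile(r"\.L\w+")
XTENSA_LOOP = re.compile(r"^loop(?:gtz|nez)?\s")  # names its END label, forward


def loop_spans(body: List[str]) -> Dict[str, int]:
    lines = [l.strip() for l in body]
    # Pass 1: position of every local label defined inside this body.
    labels = {m.group(1): i for i, m in enumerate(map(LABEL_DEF.match, lines)) if m}

    def insns(lo: int, hi: int) -> int:
        # Count instruction lines in lines[lo..hi]; label lines are not counted.
        return sum(1 for l in lines[lo:hi + 1] if not LABEL_DEF.match(l))

    last_back: Dict[str, int] = {}
    loop_body_start: Dict[str, int] = {}
    # Pass 2: label-shaped operand tokens that name a label in this body.
    for i, line in enumerate(lines):
        if LABEL_DEF.match(line):
            continue
        for ref in LABEL_REF.findall(line):
            if ref not in labels:
                continue  # e.g. .LC0 literal pool: defined outside the body
            if labels[ref] < i:
                last_back[ref] = i  # backward reference; later ones extend the span
            elif XTENSA_LOOP.match(line):
                loop_body_start[ref] = i + 1  # body ends just before the label
    spans = {lab: insns(labels[lab], end) for lab, end in last_back.items()}
    for lab, start in loop_body_start.items():
        spans[lab] = insns(start, labels[lab] - 1)
    return spans
```

Formatting the result as `" ".join(f"{k}:{v}" for k, v in spans.items())` gives text in the `label:N` shape of the summary column.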
502
+
503
+ No verdicts are printed. The tool reports facts; whether a libcall on that
504
+ path — or an instruction inside a span rather than outside it — matters is
505
+ your judgment.
506
+
507
+ ## Writing good harnesses
508
+
509
+ - Give variants **runtime arguments** for anything that is runtime in the
510
+ real code, and **literals** for anything that is compile-time constant
511
+ there. The fold-vs-libcall answer depends on exactly this.
512
+ - Keep functions non-`static` so the compiler must emit them standalone.
513
+ - Compile at the **flags your project ships with** — a construct that folds
514
+ at `-O3 -ffast-math` may not fold at plain `-O3`. Encode them once as a
515
+ config target and make it the `default`.
516
+ - Beware of over-synthetic harnesses: a function whose whole body is the
517
+ construct can tail-call (`jmp f`) where real surrounding code would
518
+ `call f` and continue. Same libcall either way, but instruction counts
519
+ read differently.
520
+
521
+ ## Porting to another project
522
+
523
+ The tool is a single Python 3 file that imports only the standard library
524
+ and contains no project-specific constants. To port:
525
+
526
+ 1. Copy this directory (or just `asmdiff.py`).
527
+ 2. Write an `asmdiff.toml` for the new project's toolchain and flags
528
+ (start from `asmdiff.example.toml`) and drop it next to your
529
+ harnesses, in your working directory, or in `~/.config/`.
530
+ 3. Run the self-tests: `python3 test_asmdiff.py -v` (no compiler needed).
531
+
532
+ ## Limitations
533
+
534
+ - Parses **GNU-as ELF** assembly (`gcc`, `clang`, and GNU cross-compilers
535
+ targeting ELF). macOS Mach-O asm (`_name` labels, no `.size`) is not
536
+ supported — on a Mac, compare inside a Linux container or with a
537
+ cross-toolchain.
538
+ - Call detection is a mnemonic heuristic. Register-indirect calls through a
539
+ loaded address (other than x86 `jmp *reg`) can be reported as a call to
540
+ the register's name (e.g. Xtensa `callx8 a10`), which errs toward
541
+ visibility rather than silence.
542
+ - Columns truncate long instruction lines to keep pairs aligned; when a
543
+ line matters, widen it via the `width` parameter of `side_by_side()` or
544
+ read the raw `-S` output by hand.
545
+ - Loop spans are layout facts, not loop analysis. Label numbers are
546
+ compiler-assigned, so a baseline's `.L27` and a candidate's `.L26` may
547
+ or may not be "the same" loop — match them through the listing, not by
548
+ name. Unrolled or versioned loops (common at `-O3`) appear as several
549
+ spans or as one large span; the tool reports what it sees and does not
550
+ reassemble them into a source-level loop.
552
+
553
+ ---
554
+ ## pretend FAQ
555
+ ### Why not just run objdump by hand?
556
+
557
+ The single `asmdiff.py` invocation in step 4 of the worked example replaces a manual workflow with real
558
+ friction at every step. Walking through it end to end on a single,
559
+ one-sided example — did `x * exp2f(-5)` fold to a multiply, or did
560
+ `ldexpf(x, n)` become a libcall — shows where the effort goes.
561
+
562
+ **1. Compile to an object, remembering every project flag by hand.**
563
+
564
+ ```bash
565
+ gcc -O3 -Wall -Wno-strict-aliasing -Wextra -Wno-unused-parameter \
566
+ -Wpointer-arith -Wno-float-conversion -Wno-missing-declarations \
567
+ -DAMY_WAVETABLE -Isrc -c src/log2_exp2.c -o /tmp/candidate.o
568
+ ```
569
+
570
+ Drop a define (say `-DAMY_WAVETABLE`) and the file may well compile anyway; the build
571
+ just quietly takes a different codegen path, and the comparison you're
572
+ about to make is invalid without telling you so. Repeat this for the
573
+ baseline tree with its own `-I`, and again for every extra compiler you
574
+ want in the matrix.
575
+
576
+ **2. Disassemble the function out of the object.**
577
+
578
+ ```bash
579
+ objdump -dr --no-show-raw-insn /tmp/candidate.o
580
+ ```
581
+
582
+ For a libcall site (`ldexpf(x, n)` with a runtime `n`), the real output is:
583
+
584
+ ```
585
+ 0000000000000000 <g>:
586
+ 0: endbr64
587
+ 4: jmp 9 <g+0x9>
588
+ 5: R_X86_64_PLT32 ldexpf-0x4
589
+ ```
590
+
591
+ The call target isn't in the instruction: `jmp 9 <g+0x9>` is just the
592
+ unrelocated zero displacement, which objdump shows as the next address. The actual symbol, `ldexpf`, only
593
+ shows up in the relocation line underneath, and you have to know to
594
+ cross-reference it by hand. Compare that to `gcc -S`, which prints the symbol
595
+ inline because `-S` stops before assembly, so the operand is still a name rather than a displacement plus a relocation:
596
+
597
+ ```
598
+ g:
599
+ endbr64
600
+ jmp ldexpf@PLT
601
+ ```
602
+
603
+ That's why asmdiff compiles with `-S` instead of going through `objdump` on
604
+ an object file: the thing you're looking for (is this a libcall, and to
605
+ what) is already text, not a relocation entry you have to decode.
606
+
607
+ **3. Strip the noise objdump adds that `-S` doesn't.** Every instruction
608
+ line carries a leading address and (unless `--no-show-raw-insn` is passed)
609
+ raw opcode bytes; there's a `file format elf64-x86-64` banner, a
610
+ `Disassembly of section .text:` header, and an address-annotated function
611
+ label instead of a bare one. None of it is informative for a codegen diff,
612
+ all of it has to be deleted by hand before two functions are readable
613
+ side by side — and it has to be deleted from **every** file in the
614
+ comparison, four of them for the two-function/two-tree case above.
615
+
616
+ **4. Diff the cleaned pair.** `diff -y --width=100 old.txt new.txt` aligns
617
+ by content match, not position — once the two versions diverge even
618
+ slightly it starts pairing unrelated lines, and it has no header row to
619
+ label which side is which. `asmdiff` prints its own aligned columns
620
+ (`side_by_side()`) with the two function names as headers, and never loses
621
+ the pairing because it doesn't try to align by content — it just walks
622
+ both lists in lockstep.
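Lockstep pairing is easy to sketch with `itertools.zip_longest`. Row i always holds instruction i from each side, with blank padding when one side is shorter, and no content matching is attempted. `lockstep_columns` is a hypothetical stand-in for `side_by_side()`, and its layout (column width, truncation) is an assumption.

```python
from itertools import zip_longest
from typing import List


def lockstep_columns(left: List[str], right: List[str],
                     lname: str, rname: str, width: int = 45) -> str:
    def cell(text: str) -> str:
        # Truncate, then pad, so the separator stays in one column.
        return text.strip()[:width].ljust(width)

    rows = [f"{cell(lname)} | {rname}",
            "-" * (width + 1) + "+" + "-" * (width + 1)]
    for a, b in zip_longest(left, right, fillvalue=""):
        rows.append(f"{cell(a)} | {b.strip()[:width]}".rstrip())
    return "\n".join(rows)
```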
623
+
624
+ **5. Count instructions and classify calls by hand.** Grep for `call`/`jmp`
625
+ in the cleaned text, then manually exclude the ones that are really local
626
+ branches (`jmp 4011a0 <exp2_lut+0x40>`) rather than calls to another
627
+ symbol — the exact distinction `CALL_RE` in `asmdiff.py` encodes once so
628
+ you don't re-derive it per function. Then hand-build a table from four
629
+ separate counts.
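The call classification that step 5 does by hand can be sketched as a mnemonic lookup plus two exclusions. It follows the mnemonic list under "How it works" but is not asmdiff's `CALL_RE`. `call_target`, `calls_in` and `CALL_MNEMONICS` are hypothetical. Taking the last comma-separated operand as the target (to cover RISC-V `jal ra, foo`) is an assumption. Register operands such as Xtensa `callx8 a10` come back as-is, matching the limitation noted earlier.

```python
from typing import List, Optional

CALL_MNEMONICS = {
    "call", "callq", "jmp",                     # x86 (jmp = tail call)
    "bl", "blx",                                # ARM
    "tail", "jal",                              # RISC-V ("call" shared with x86)
    "call0", "call4", "call8", "call12",        # Xtensa
    "callx0", "callx4", "callx8", "callx12", "j",
}


def call_target(insn: str) -> Optional[str]:
    """Return the called symbol, or None for non-calls and local branches."""
    parts = insn.strip().split(None, 1)
    if len(parts) != 2 or parts[0] not in CALL_MNEMONICS:
        return None
    mnemonic, operands = parts
    target = operands.split(",")[-1].strip()
    if target.startswith(".L"):
        return None  # local-label branch inside the function
    if mnemonic == "jmp" and target.startswith("*"):
        return None  # x86 register-indirect jump (e.g. a jump table)
    return target.split("@")[0]  # ldexpf@PLT -> ldexpf


def calls_in(body: List[str]) -> List[str]:
    out: List[str] = []
    for line in body:
        sym = call_target(line)
        if sym is not None and sym not in out:
            out.append(sym)
    return out
```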
630
+
631
+ **6. Do all of the above again per compiler.** asmdiff's default matrix is
632
+ gcc *and* clang; by hand that's every step above, twice.
633
+
634
+ For the full worked example — two functions, two trees, one compiler —
635
+ the manual version is roughly: 2 compiles (with hand-retyped flags) → 4
636
+ `objdump`/relocation-lookup passes → noise-stripped by hand on 4 files →
637
+ 2 `diff -y` runs that don't survive drift → manual instruction counts and
638
+ call classification on 4 files → a hand-assembled summary table. The
639
+ `asmdiff.py` version is the one command already shown above. Neither
640
+ workflow can skip understanding *why* the two functions differ — that part
641
+ is still your judgment — but everything upstream of that judgment, where a
642
+ dropped flag or a misread relocation silently invalidates the comparison,
643
+ is what the tool removes.
644
+
645
+ ### Why not just run gcc -S by hand?
646
+
647
+ `-S` output sidesteps the relocation-decoding problem above — call targets
648
+ are already symbolic text, no PLT stub to resolve. That removes step 2 of
649
+ the objdump workflow. It does not remove the rest.
650
+
651
+ **1. Compile to text instead of an object** — same flags, same risk of a
652
+ silently dropped one:
653
+
654
+ ```bash
655
+ gcc -O3 -Wall -Wno-strict-aliasing -Wextra -Wno-unused-parameter \
656
+ -Wpointer-arith -Wno-float-conversion -Wno-missing-declarations \
657
+ -DAMY_WAVETABLE -Isrc -S src/log2_exp2.c -o /tmp/log2_exp2.s
658
+ ```
659
+
660
+ **2. Find where the function starts and ends in the `.s` file.** The real
661
+ output for `exp2_lut` in the AMY repository (current build, `AMY_USE_FIXEDPOINT`
662
+ on):
663
+
664
+ ```
665
+ exp2_lut:
666
+ .LFB71:
667
+ .cfi_startproc
668
+ endbr64
669
+ movl %edi, %edx
670
+ leaq 2+exp2_fxpt_lutable(%rip), %rcx
671
+ ...
672
+ ret
673
+ .cfi_endproc
674
+ .LFE71:
675
+ .size exp2_lut, .-exp2_lut
676
+ ```
677
+
678
+ There's no `objdump`-style address column to strip, but you still have to
679
+ find the boundary by hand: the function starts at a column-0 label
680
+ (`exp2_lut:`, not `.LFB71:`, which is a bracketing label, not the function),
681
+ and ends at its `.size` directive. ELF-targeting gcc and clang both emit one; on a
682
+ toolchain that doesn't, you'd fall back to "next function label", which is
683
+ exactly the two-case rule `extract_functions()` implements once, instead of
684
+ leaving you to re-derive it per file.
685
+
686
+ **3. Strip compiler furniture — but not indiscriminately.** `.cfi_*`,
687
+ `.LFB`/`.LFE` bracket labels, and `.p2align` carry no information. A local
688
+ `.L`-numbered label sometimes does, though, and you can't tell which
689
+ without reading the body. `log2_lut` in the same file:
690
+
691
+ ```
692
+ log2_lut:
693
+ .LFB70:
694
+ .cfi_startproc
695
+ endbr64
696
+ xorl %eax, %eax
697
+ cmpl $8388607, %edi
698
+ jg .L9
699
+ .p2align 4,,10
700
+ .p2align 3
701
+ .L3:
702
+ addl %edi, %edi
703
+ subl $1, %eax
704
+ cmpl $8388607, %edi
705
+ jle .L3
706
+ cmpl $16777215, %edi
707
+ jle .L11
708
+ .p2align 4,,10
709
+ .p2align 3
710
+ .L5:
711
+ sarl %edi
712
+ addl $1, %eax
713
+ .L9:
714
+ cmpl $16777215, %edi
715
+ jg .L5
716
+ .L11:
717
+ ...
718
+ ```
719
+
720
+ `.L3`, `.L5`, `.L9`, `.L11` are live branch targets: `jle .L3` and `jg .L5`
721
+ close loops, while `jg .L9` and `jle .L11` jump forward. A quick-and-dirty cleanup pass like `grep -v '^[[:space:]]*\.'` (strip
722
+ every line whose first non-blank character is a dot) deletes those labels along with the
723
+ `.p2align` noise sitting right next to them, and now the function has
724
+ dangling jumps to labels that no longer exist — silently wrong, not an
725
+ error. The correct rule is "drop this specific set of directives and this
726
+ specific set of *bracketing* labels, keep everything else" — which is a
727
+ narrower, easier-to-get-wrong rule than it looks, and it's what `NOISE`
728
+ and `NOISE_LABEL` encode once in `asmdiff.py` instead of per file.
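The "drop a fixed set, keep everything else" rule can be sketched with two small regexes. The directive list and the bracketing-label prefixes below (gcc's `.LFB`/`.LFE`/`.LVL`, clang's `.Lfunc_begin`/`.Lfunc_end`) are assumptions for illustration, not the contents of asmdiff's `NOISE` and `NOISE_LABEL`. Every other `.L` label survives, so `.L3:` stays while `.LFB70:` goes. `clean_body` is a hypothetical name.

```python
import re
from typing import List

NOISE_DIRECTIVE = re.compile(
    r"^\.(?:cfi_\w+|p2align|balign|align|type|size|globl|section|text|file|ident|loc)\b")
BRACKET_LABEL = re.compile(r"^\.(?:LFB|LFE|LVL|Lfunc_begin|Lfunc_end)\w*:")


def clean_body(lines: List[str]) -> List[str]:
    kept: List[str] = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # blank line or whole-line comment
        if NOISE_DIRECTIVE.match(text) or BRACKET_LABEL.match(text):
            continue  # compiler furniture with no codegen information
        kept.append(text)  # instructions and ordinary .L labels survive
    return kept
```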
729
+
730
+ **4. Everything downstream is unchanged from the objdump case:** pair the
731
+ two cleaned functions up for reading, count instructions, classify
732
+ `call`/`jmp` lines as libcalls vs. local branches, repeat per function,
733
+ per file, per compiler, and assemble a summary table by hand.
734
+
735
+ So `-S` over `objdump` buys back exactly one step — the call target is
736
+ already a name, not a relocation to look up — and leaves the rest of the
737
+ manual pipeline (locate, strip correctly, pair, count, classify, tally,
738
+ multiplied by every function/tree/compiler in the matrix) in place. That
739
+ remaining pipeline is `extract_functions()`, `analyze()`,
740
+ `side_by_side()`, and `summary_table()` in `asmdiff.py` — written once,
741
+ instead of re-derived by hand every time someone wants to answer "did this
742
+ still fold?"
743
+
@@ -0,0 +1,6 @@
1
+ asmdiff.py,sha256=iqpMyirSUYH2hWfYZwGnmxlYFTh8lNL-EceTV6S84dY,22885
2
+ asmdiff-0.1.0.dist-info/METADATA,sha256=ovEO3ftLKesG4zJ0xG0S94Krnt4k2O1XzR5Gpr7dVaU,32701
3
+ asmdiff-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
4
+ asmdiff-0.1.0.dist-info/entry_points.txt,sha256=bVeIyjLm3KgWALbgRSeQo16pZ89GXZZ4sFTSWJUMdTI,41
5
+ asmdiff-0.1.0.dist-info/licenses/LICENSE,sha256=Y8JW9wzNeVnfyw62X3qC9cuwZCv_U67rNSF9QUE0HJs,1072
6
+ asmdiff-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.30.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ asmdiff = asmdiff:main
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Rasmus Tikkanen
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
asmdiff.py ADDED
@@ -0,0 +1,575 @@
1
+ #!/usr/bin/env python3
2
+ """Compare per-function assembly between paired C implementations.
3
+
4
+ Compiles a harness C file across a matrix of compilers, extracts each
5
+ variant function's assembly from the -S output, and prints side-by-side
6
+ listings plus a summary of instruction counts, outbound calls, and loop
7
+ spans (instructions between a local label and its last backward branch).
8
+ Automates fold-vs-libcall analysis when evaluating micro-optimisations
9
+ (e.g. "does this still compile to one instruction, or is it a libcall?").
10
+
11
+ Usage:
12
+ tools/asmdiff/asmdiff.py SOURCE.c [SOURCE2.c] [--pair OLD:NEW]...
13
+ [--across FUNC]... [--cc 'CC FLAGS']...
14
+ [--target NAME]... [--config PATH]
15
+ [-- EXTRA_FLAGS...]
16
+
17
+ Three modes:
18
+ --pair OLD:NEW compares two different functions within one compilation
19
+ (with no --pair, old_X/new_X names auto-pair).
20
+ --across FUNC compares the SAME function across two compilations:
21
+ either one file under two --cc entries (flag/define
22
+ variants), or two source files (before/after versions)
23
+ under each compiler in the matrix.
24
+ (neither) whole-file summary: per-function counts plus a file
25
+ total, for one file or side by side for two.
26
+
27
+ Compilers come from --cc strings, from named targets in an asmdiff.toml
28
+ config file (--target NAME), from the config's `default` entry, or —
29
+ failing all of those — plain `gcc -O3` and `clang -O3`.
30
+ Flags after a bare `--` are appended to every compiler invocation.
31
+ """
32
+ import argparse
33
+ import glob
34
+ import hashlib
35
+ import os
36
+ import re
37
+ import shlex
38
+ import shutil
39
+ import subprocess
40
+ import sys
41
+ import tempfile
42
+ from itertools import zip_longest
43
+ from pathlib import Path
44
+
45
+ try:
46
+ import tomllib # Python >= 3.11; config files are optional without it
47
+ except ModuleNotFoundError:
48
+ tomllib = None
49
+
50
+ DEFAULT_COMPILERS = ["gcc", "clang"]
51
+ FALLBACK_FLAGS = "-O3"
52
+ CONFIG_NAME = "asmdiff.toml"
53
+
54
+ # A label at column 0 that is not a local (.L*) label starts a function.
55
+ FUNC_LABEL = re.compile(r"^([A-Za-z_][\w$.]*):")
56
+ # Assembler directives that carry no information worth reading.
57
+ NOISE = re.compile(
58
+ r"^\s*\.(cfi_|p2align|align\b|loc\b|file\b|text\b|globl\b|global\b|"
59
+ r"type\b|section\b|ident\b|weak\b|hidden\b|addrsig|build_version)"
60
+ )
61
+ # Compiler-generated bracketing labels that add nothing (.LFB0:, .Lfunc_end0:).
62
+ NOISE_LABEL = re.compile(r"^\.(LFB|LFE|Lfunc_begin|Lfunc_end)\d*:")
63
+
64
+
65
+ def extract_functions(asm_text):
66
+ """Map function name -> cleaned asm lines from compiler -S output.
67
+
68
+ A function body runs from its column-0 label to the matching .size
69
+ directive (gcc and clang both emit one on ELF) or the next function
70
+ label. Comment lines, CFI/section/alignment directives, and
71
+ compiler bracketing labels are dropped; instructions and meaningful
72
+ local labels (loop targets) are kept, whitespace-stripped.
73
+ """
74
+ funcs = {}
75
+ current = None
76
+ for raw in asm_text.splitlines():
77
+ m = FUNC_LABEL.match(raw)
78
+ if m:
79
+ current = m.group(1)
80
+ funcs[current] = []
81
+ continue
82
+ if current is None:
83
+ continue
84
+ if re.match(r"^\s*\.size\b", raw):
85
+ current = None
86
+ continue
87
+ line = raw.strip()
88
+ if not line or line.startswith(("#", "//")):
89
+ continue
90
+ if NOISE.match(line) or NOISE_LABEL.match(line):
91
+ continue
92
+ funcs[current].append(line)
93
+ return funcs
94
+
95
+
96
+ # Direct-call / tail-call mnemonics across x86 (call, jmp), ARM (bl, blx),
97
+ # RISC-V (call, tail, jal), and Xtensa (call0/4/8/12, callx*, j). Longest
98
+ # alternatives first so e.g. "callx8" is not consumed as "call". The
99
+ # symbol must be the sole operand (optionally @PLT-suffixed); multi-operand
100
+ # forms like "jal ra, exp2f" are skipped rather than reported as a call to "ra".
101
+ CALL_RE = re.compile(
102
+ r"^(?:callx\d+|call\d*|callq|jalr|jal|jmp|blx|bl|tail|j)\s+"
103
+ r"([A-Za-z_][\w$.]*)(?:@[\w.]+)?\s*(?:[#;].*)?$"
104
+ )
105
+
106
+
107
+ def analyze(lines):
108
+ """Return (instruction_count, called_symbols) for cleaned asm lines.
109
+
110
+ A call is a call/tail-call mnemonic whose sole operand looks like a
111
+ symbol name. Local labels (.L*) and %-registers never match, so
112
+ branches inside the function are counted as instructions, not calls.
113
+ """
114
+ insns = 0
115
+ calls = []
116
+ for line in lines:
117
+ if line.endswith(":"):
118
+ continue
119
+ insns += 1
120
+ m = CALL_RE.match(line)
121
+ if m:
122
+ sym = m.group(1)
123
+ if sym not in calls:
124
+ calls.append(sym)
125
+ return insns, calls
126
+
127
+
128
+ # A local-label operand (branch target, zero-overhead loop end). Literal
129
+ # pool labels (.LC0) also match, but they are emitted outside function
130
+ # bodies, so they never appear in the label map built from a body.
131
+ LABEL_REF = re.compile(r"\.L[\w$.]+")
132
+
133
+
134
+ def loop_spans(lines):
135
+ """Return [(label, insns)] spans for cleaned asm lines.
136
+
137
+ A span is the run of instructions from a local label to the last
138
+ instruction that references it from below — a backward branch, which
139
+ is what a compiled loop looks like on every target the tool parses.
140
+ Xtensa zero-overhead loops (loop/loopnez/loopgt) reference their END
141
+ label instead; there the span is the instructions the loop encloses.
142
+ Spans are reported in order of appearance, one per label; nested
143
+ labels yield nested spans. The count states how many instructions
144
+ lie in the span — nothing about trip count or hotness, which the
145
+ reader must judge from the source.
146
+ """
147
+ label_at = {ln[:-1]: i for i, ln in enumerate(lines)
148
+ if ln.endswith(":")}
149
+ spans = {}
150
+ for i, ln in enumerate(lines):
151
+ if ln.endswith(":"):
152
+ continue
153
+ mnem = ln.split(None, 1)[0]
154
+ for ref in LABEL_REF.findall(ln):
155
+ if ref not in label_at:
156
+ continue
157
+ j = label_at[ref]
158
+ if j < i: # label above: backward branch
159
+ lo, hi = j, i
160
+ elif mnem.startswith("loop"): # Xtensa: end label below
161
+ lo, hi = i + 1, j - 1
162
+ else:
163
+ continue
164
+ if ref in spans: # several edges to one label
165
+ lo = min(lo, spans[ref][0])
166
+ hi = max(hi, spans[ref][1])
167
+ spans[ref] = (lo, hi)
168
+ result = []
169
+ for ref, (lo, hi) in sorted(spans.items(), key=lambda kv: kv[1]):
170
+ insns = sum(1 for ln in lines[lo:hi + 1] if not ln.endswith(":"))
171
+ result.append((ref, insns))
172
+ return result
173
+
174
+
175
+ def auto_pairs(names):
176
+ """Pair old_X with new_X for every X present in both."""
177
+ names = list(names)
178
+ return [(n, "new_" + n[4:]) for n in names
179
+ if n.startswith("old_") and "new_" + n[4:] in names]
180
+
181
+
182
+ def find_config(explicit, sources):
183
+ """Locate the config file; first hit wins, no merging.
184
+
185
+ Order: --config PATH, then asmdiff.toml next to the first source
186
+ file (a harness directory can carry its own targets), then the
187
+ current directory, then ~/.config/asmdiff.toml.
188
+ """
189
+ if explicit:
190
+ path = Path(explicit)
191
+ if not path.is_file():
192
+ sys.exit(f"error: config file not found: {explicit}")
193
+ return path
194
+ for candidate in (Path(sources[0]).resolve().parent / CONFIG_NAME,
195
+ Path.cwd() / CONFIG_NAME,
196
+ Path.home() / ".config" / CONFIG_NAME):
197
+ if candidate.is_file():
198
+ return candidate
199
+ return None
200
+
201
+
202
+ def load_config(path):
203
+ """Parse a TOML config: one [table] per target, optional top-level
204
+ `default` naming the target(s) to run when no --cc/--target is given."""
205
+ if tomllib is None:
206
+ sys.exit(f"error: {path} exists but this Python has no tomllib "
207
+ "(config files need Python >= 3.11)")
208
+ try:
209
+ with open(path, "rb") as fh:
210
+ return tomllib.load(fh)
211
+ except tomllib.TOMLDecodeError as exc:
212
+ sys.exit(f"error: {path}: {exc}")
213
+
214
+
215
+ def _version_key(path):
216
+ """Sort key that orders embedded numbers numerically, so
217
+ esp-15.2.0 ranks above esp-9.1.0 (lexical order would not)."""
218
+ return [(0, int(tok)) if tok.isdigit() else (1, tok)
219
+ for tok in re.split(r"(\d+)", path)]
220
+
221
+
222
+ def resolve_cc(cc, name):
223
+ """Expand ~, $VARS, and glob patterns in a target's cc value.
224
+
225
+ A pattern like .../xtensa-esp-elf/esp-*/bin/...-gcc keeps the config
226
+ toolchain-version agnostic. If it matches several installed
227
+ toolchains the highest version-sorted one is used, and the choice is
228
+ printed so it is never silent; no match is an error.
229
+ """
230
+ expanded = os.path.expandvars(os.path.expanduser(cc))
231
+ if not any(ch in expanded for ch in "*?["):
232
+ return expanded
233
+ matches = sorted(glob.glob(expanded), key=_version_key)
234
+ if not matches:
235
+ sys.exit(f"error: target [{name}]: cc pattern matched nothing: "
236
+ + expanded)
237
+ if len(matches) > 1:
238
+ print(f"target [{name}]: cc pattern matched {len(matches)} "
239
+ f"toolchains, using {matches[-1]}", file=sys.stderr)
240
+ return matches[-1]
241
+
242
+
243
+ def target_command(config, name, config_path):
244
+ """Resolve a named [target] table to one 'CC FLAGS' matrix entry."""
245
+ entry = (config or {}).get(name)
246
+ if not isinstance(entry, dict):
247
+ known = sorted(k for k, v in (config or {}).items()
248
+ if isinstance(v, dict))
249
+ sys.exit(f"error: no [{name}] target in "
250
+ f"{config_path or 'any config file'}"
251
+ + ("; targets: " + ", ".join(known) if known
252
+ else "; no targets defined"))
253
+ cc = entry.get("cc")
254
+ if not isinstance(cc, str):
255
+ sys.exit(f'error: target [{name}] needs cc = "compiler"')
256
+ flags = entry.get("flags", [])
257
+ if not isinstance(flags, list) or not all(isinstance(f, str) for f in flags):
258
+ sys.exit(f"error: target [{name}]: flags must be an array of strings")
259
+ flags = [os.path.expandvars(f) for f in flags]
260
+ return shlex.join([resolve_cc(cc, name), *flags])
261
+
262
+
263
+ def build_matrix(cc_args, target_args, config, config_path):
264
+ """Resolve the compiler matrix.
265
+
266
+ --cc strings verbatim, then --target entries, in that order. With
267
+ neither, the config's `default` (a target name or list of names);
268
+ with no config or no default, plain gcc/clang at -O3.
269
+ """
270
+ entries = list(cc_args)
271
+ entries += [target_command(config, name, config_path)
272
+ for name in target_args]
273
+ if entries:
274
+ return entries
275
+ default = (config or {}).get("default")
276
+ if default:
277
+ names = [default] if isinstance(default, str) else list(default)
278
+ return [target_command(config, name, config_path) for name in names]
279
+ return [f"{cc} {FALLBACK_FLAGS}" for cc in DEFAULT_COMPILERS]
280
+
281
+
282
+ def asm_output_name(cc_cmd, harness):
283
+ """Filesystem-safe .s name for one (compiler, source) compilation.
284
+
285
+ The readable slug of a compiler command can exceed NAME_MAX when the
286
+ command embeds absolute toolchain/include paths; long slugs are
287
+ truncated and kept unique with a short hash of the full command.
288
+ """
289
+ tag = re.sub(r"\W+", "_", cc_cmd)
290
+ if len(tag) > 64:
291
+ tag = tag[:53] + "_" + hashlib.sha1(cc_cmd.encode()).hexdigest()[:10]
292
+ return tag + "_" + Path(harness).stem + ".s"
293
+
294
+
295
+ def compile_to_asm(cc_cmd, extra_flags, harness, out_dir):
296
+ """Run one compiler to -S; return the asm text.
297
+
298
+ Returns None (with a warning) if the compiler is not on PATH.
299
+ Exits with the compiler's stderr on a compile failure.
300
+ """
301
+ argv = shlex.split(cc_cmd)
302
+ if shutil.which(argv[0]) is None:
303
+ print(f"warning: {argv[0]} not found on PATH, skipping",
304
+ file=sys.stderr)
305
+ return None
306
+ out_s = Path(out_dir) / asm_output_name(cc_cmd, harness)
307
+ cmd = argv + list(extra_flags) + ["-S", "-o", str(out_s), str(harness)]
308
+ proc = subprocess.run(cmd, capture_output=True, text=True)
309
+ if proc.returncode != 0:
310
+ sys.exit(f"error: compile failed: {' '.join(cmd)}\n{proc.stderr}")
311
+ return out_s.read_text()
312
+
313
+
314
+ def side_by_side(left, right, ltitle, rtitle, width=44):
315
+ """Two-column view of a pair's asm lines."""
316
+ rows = [f"{ltitle:<{width}} | {rtitle}",
317
+ f"{'-' * width}-+-{'-' * width}"]
318
+ for l, r in zip_longest(left, right, fillvalue=""):
319
+ l = l.expandtabs(8)[:width]
320
+ r = r.expandtabs(8)[:width]
321
+ rows.append(f"{l:<{width}} | {r}")
322
+ return "\n".join(rows)
323
+
324
+
325
+ def render_table(rows):
326
+ """Column-aligned text for a list of equal-length string tuples."""
327
+ widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
328
+ return "\n".join(
329
+ " ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
330
+ for row in rows)
331
+
332
+
333
+ def format_spans(spans):
334
+ return " ".join(f"{label}:{n}" for label, n in spans) or "-"
335
+
336
+
337
+ def summary_table(pairs, funcs):
338
+ """Instruction counts, outbound calls, and loop spans per pair member."""
339
+ rows = [("function", "role", "insns", "calls", "loop spans")]
340
+ for old, new in pairs:
341
+ for name, role in ((old, "baseline"), (new, "candidate")):
342
+ insns, calls = analyze(funcs[name])
343
+ rows.append((name, role, str(insns),
344
+ ", ".join(calls) or "-",
345
+ format_spans(loop_spans(funcs[name]))))
346
+ return render_table(rows)
347
+
348
+
349
+ def file_summary_table(funcs):
350
+ """Per-function counts plus a whole-file total row.
351
+
352
+ The total sums instruction counts over every function parsed from
353
+ the -S output and unions their outbound calls — a coarse A/B sanity
354
+ check, not a code-size measurement (literal pools, data, and
355
+ alignment are not included).
356
+ """
357
+ rows = [("function", "insns", "calls", "loop spans")]
358
+ total_insns, all_calls = 0, []
359
+ for name, lines in funcs.items():
360
+ insns, calls = analyze(lines)
361
+ total_insns += insns
362
+ for sym in calls:
363
+ if sym not in all_calls:
364
+ all_calls.append(sym)
365
+ rows.append((name, str(insns), ", ".join(calls) or "-",
366
+ format_spans(loop_spans(lines))))
367
+ rows.append((f"TOTAL ({len(funcs)} functions)", str(total_insns),
368
+ ", ".join(all_calls) or "-", "-"))
369
+ return render_table(rows)
370
+
371
+
372
+ def file_tags(a, b):
373
+ """Shortest distinct labels for two source paths in across-mode output."""
374
+ pa, pb = Path(a), Path(b)
375
+ if pa.name != pb.name:
376
+ return pa.name, pb.name
377
+ ta = f"{pa.parent.name}/{pa.name}"
378
+ tb = f"{pb.parent.name}/{pb.name}"
379
+ if ta != tb:
380
+ return ta, tb
381
+ return str(a), str(b)
382
+
383
+
384
+ def report_across(fn_names, left_funcs, right_funcs, left_tag, right_tag):
385
+ """Side-by-side + summary for the same functions from two compilations."""
386
+ missing = sorted({f for f in fn_names
387
+ if f not in left_funcs or f not in right_funcs})
388
+ if missing:
389
+ sys.exit("error: function(s) not in asm: " + ", ".join(missing)
390
+ + f"; {left_tag} has: " + (", ".join(left_funcs) or "none")
391
+ + f"; {right_tag} has: " + (", ".join(right_funcs) or "none"))
392
+ decorated, pairs = {}, []
393
+ for f in fn_names:
394
+ lt, rt = f"{f} [{left_tag}]", f"{f} [{right_tag}]"
395
+ decorated[lt], decorated[rt] = left_funcs[f], right_funcs[f]
396
+ pairs.append((lt, rt))
397
+ for lt, rt in pairs:
398
+ print(side_by_side(decorated[lt], decorated[rt], lt, rt))
399
+ print()
400
+ print(summary_table(pairs, decorated))
401
+
402
+
403
+ def run_across(sources, matrix, fn_names, extra_flags, tmp):
404
+ """--across mode: same function, two compilations.
405
+
406
+ Two source files: compare fileA's FUNC vs fileB's FUNC under each
407
+ compiler in the matrix. One source file: compare FUNC between the
408
+ first --cc entry (baseline) and each subsequent entry.
409
+ """
410
+ if len(sources) == 2:
411
+ ran_any = False
412
+ for cc_cmd in matrix:
413
+ sides = []
414
+ for src in sources:
415
+ asm = compile_to_asm(cc_cmd, extra_flags, src, tmp)
416
+ if asm is None:
417
+ break
418
+ sides.append(extract_functions(asm))
419
+ if len(sides) < 2:
420
+ continue
421
+ ran_any = True
422
+ print(f"\n== {cc_cmd} ==\n")
423
+ report_across(fn_names, sides[0], sides[1],
424
+ *file_tags(sources[0], sources[1]))
425
+ if not ran_any:
426
+ sys.exit("error: no usable compiler in the matrix")
427
+ return 0
428
+
429
+ usable = []
430
+ for idx, cc_cmd in enumerate(matrix, start=1):
431
+ asm = compile_to_asm(cc_cmd, extra_flags, sources[0], tmp)
432
+ if asm is not None:
433
+ usable.append((f"cc#{idx}", cc_cmd, extract_functions(asm)))
434
+ if len(usable) < 2:
435
+ sys.exit("error: --across needs at least two usable compilers "
436
+ "in the matrix")
437
+ print()
438
+ for tag, cc_cmd, _ in usable:
439
+ print(f"{tag}: {cc_cmd}")
440
+ base_tag, _, base_funcs = usable[0]
441
+ for tag, _, funcs in usable[1:]:
442
+ print(f"\n== {base_tag} vs {tag} ==\n")
443
+ report_across(fn_names, base_funcs, funcs, base_tag, tag)
444
+ return 0
445
+
446
+
447
+ def run_summary(sources, matrix, extra_flags, tmp):
448
+ """No pairs to compare: whole-file summary, one block per file."""
449
+ tags = (file_tags(*sources) if len(sources) == 2
450
+ else [Path(sources[0]).name])
451
+ ran_any = False
452
+ for cc_cmd in matrix:
453
+ sections = []
454
+ for src in sources:
455
+ asm = compile_to_asm(cc_cmd, extra_flags, src, tmp)
456
+ if asm is None:
457
+ break
458
+ sections.append(extract_functions(asm))
459
+ if len(sections) < len(sources):
460
+ continue
461
+ ran_any = True
462
+ print(f"\n== {cc_cmd} ==")
463
+ for tag, funcs in zip(tags, sections):
464
+ if len(sections) > 1:
465
+ print(f"\n-- {tag} --")
466
+ print()
467
+ print(file_summary_table(funcs) if funcs
468
+ else "(no functions found)")
469
+ if not ran_any:
470
+ sys.exit("error: no usable compiler in the matrix")
471
+ return 0
472
+
473
+
474
+ def run_pairs(source, matrix, pair_specs, extra_flags, tmp):
475
+ """--pair mode: two different functions within one compilation.
476
+
477
+ With no --pair and no old_X/new_X functions to auto-pair, falls
478
+ back to the whole-file summary for this compilation.
479
+ """
480
+ ran_any = False
481
+ for cc_cmd in matrix:
482
+ asm = compile_to_asm(cc_cmd, extra_flags, source, tmp)
483
+ if asm is None:
484
+ continue
485
+ ran_any = True
486
+ funcs = extract_functions(asm)
487
+ pairs = ([tuple(p.split(":", 1)) for p in pair_specs]
488
+ or auto_pairs(funcs))
489
+ if not pairs:
490
+ print(f"\n== {cc_cmd} ==\n")
491
+ print(file_summary_table(funcs) if funcs
492
+ else "(no functions found)")
493
+ continue
494
+ missing = sorted({n for p in pairs for n in p if n not in funcs})
495
+ if missing:
496
+ sys.exit("error: function(s) not in asm: "
497
+ + ", ".join(missing)
498
+ + "; functions seen: " + ", ".join(funcs))
499
+ print(f"\n== {cc_cmd} ==\n")
500
+ for old, new in pairs:
501
+ print(side_by_side(funcs[old], funcs[new], old, new))
502
+ print()
503
+ print(summary_table(pairs, funcs))
504
+ if not ran_any:
505
+ sys.exit("error: no usable compiler in the matrix")
506
+ return 0
507
+
508
+
509
+ def main(argv=None):
510
+ argv = list(sys.argv[1:] if argv is None else argv)
511
+ extra_flags = []
512
+ if "--" in argv:
513
+ cut = argv.index("--")
514
+ argv, extra_flags = argv[:cut], argv[cut + 1:]
515
+
516
+ parser = argparse.ArgumentParser(
517
+ description=(__doc__ or "").partition("\n")[0],
518
+ epilog="Flags after a bare -- are appended to every compiler "
519
+ "invocation, e.g.: asmdiff.py h.c -- -fno-math-errno")
520
+ parser.add_argument("sources", nargs="+", metavar="SOURCE.c",
521
+ help="C file to compile; give two files with "
522
+ "--across to compare versions of a function")
523
+ parser.add_argument("--pair", action="append", default=[],
524
+ metavar="OLD:NEW",
525
+ help="compare two functions within one compilation "
526
+ "(repeatable); default: auto-pair old_X/new_X")
527
+ parser.add_argument("--across", action="append", default=[],
528
+ metavar="FUNC",
529
+ help="compare the same function across two "
530
+ "compilations (repeatable): one file + two "
531
+ "--cc entries, or two files")
532
+ parser.add_argument("--cc", action="append", default=[],
533
+ metavar="'CC FLAGS'",
534
+ help="compiler and flags as one string (repeatable); "
535
+ "default: config default target, else gcc and "
536
+ "clang at " + FALLBACK_FLAGS)
537
+ parser.add_argument("--target", action="append", default=[],
538
+ metavar="NAME",
539
+ help="named [table] from the config file, resolved "
540
+ "to a --cc entry (repeatable; appended to the "
541
+ "matrix after --cc entries)")
542
+ parser.add_argument("--config", metavar="PATH",
543
+ help=f"config file; default search: {CONFIG_NAME} "
544
+ "next to SOURCE.c, in the current directory, "
545
+ "then in ~/.config/")
546
+ args = parser.parse_args(argv)
547
+
548
+ if len(args.sources) > 2:
549
+ parser.error("at most two source files may be given")
550
+ if args.across and args.pair:
551
+ parser.error("--across and --pair are mutually exclusive")
552
+ if len(args.sources) == 2 and args.pair:
553
+ parser.error("--pair compares within one file; "
554
+ "use --across FUNC for two files")
555
+ config_path = find_config(args.config, args.sources)
556
+ config = load_config(config_path) if config_path else None
557
+ matrix = build_matrix(args.cc, args.target, config, config_path)
558
+ if args.across and len(args.sources) == 1 and len(matrix) < 2:
559
+ parser.error("--across on one file needs at least two compilers (--cc or --target entries)")
560
+ for spec in args.pair:
561
+ if ":" not in spec:
562
+ parser.error(f"--pair expects OLD:NEW, got {spec!r}")
563
+
564
+ with tempfile.TemporaryDirectory(prefix="asmdiff") as tmp:
565
+ if args.across:
566
+ return run_across(args.sources, matrix, args.across,
567
+ extra_flags, tmp)
568
+ if len(args.sources) == 2:
569
+ return run_summary(args.sources, matrix, extra_flags, tmp)
570
+ return run_pairs(args.sources[0], matrix, args.pair,
571
+ extra_flags, tmp)
572
+
573
+
574
+ if __name__ == "__main__":
575
+ sys.exit(main())