asmdiff 0.1.0__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- asmdiff-0.1.0.dist-info/METADATA +743 -0
- asmdiff-0.1.0.dist-info/RECORD +6 -0
- asmdiff-0.1.0.dist-info/WHEEL +4 -0
- asmdiff-0.1.0.dist-info/entry_points.txt +2 -0
- asmdiff-0.1.0.dist-info/licenses/LICENSE +21 -0
- asmdiff.py +575 -0
@@ -0,0 +1,743 @@
Metadata-Version: 2.4
Name: asmdiff
Version: 0.1.0
Summary: Compare per-function assembly between paired C implementations
Project-URL: Homepage, https://github.com/rt-rtos/asmdiff
Project-URL: Repository, https://github.com/rt-rtos/asmdiff
Author: Rasmus Tikkanen
License-Expression: MIT
License-File: LICENSE
Keywords: assembly,clang,codegen,compiler,disassembly,gcc
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Compilers
Classifier: Topic :: Software Development :: Debuggers
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# asmdiff
## per-function assembly comparison for paired C implementations

> asmdiff is a command-line tool for comparing the generated assembly of individual C functions across implementations, compiler flags, compiler versions, and source revisions. It is intended for investigating compiler code generation rather than benchmarking runtime performance.

`asmdiff.py` answers one question fast: **when I rewrite a C construct, what
does the compiler actually emit - before and after?** It compiles a small
harness file across a matrix of compilers, extracts each variant function's
assembly, and prints side-by-side listings plus a summary of instruction
counts, outbound calls, and loop spans.

Compilers and flags are configured per project through named targets in an
`asmdiff.toml` file — the tool itself has no project-specific defaults and
parses any GNU-as ELF assembly.

Its home use case: checking whether an expression that used to constant-fold
(e.g. `x * exp2f(5)` → one multiply) turns into a library call (e.g.
`ldexpf(x, 5)` → `jmp ldexpf@PLT`) after a "cleanup". That distinction is
invisible in source review and decisive on hot paths.
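
Part of why such a "cleanup" passes review is that both spellings compute the same value. A minimal Python check of that arithmetic is sketched here. It is an illustration only: the function names are made up for this example, and Python says nothing about what a C compiler emits, which is the question asmdiff answers.

```python
import math

def scale_by_multiply(x: float, shift: int) -> float:
    """x * 2**shift: the shape that can constant-fold when shift is a literal."""
    return x * 2.0 ** shift

def scale_by_ldexp(x: float, shift: int) -> float:
    """ldexp(x, shift): same value, but in C it may stay a library call."""
    return math.ldexp(x, shift)

# Both spellings agree on the value; only the generated code can differ.
print(scale_by_multiply(3.0, -5), scale_by_ldexp(3.0, -5))
```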

## Quick start

Write a harness with your two versions as `old_*` / `new_*` function pairs:

```c
/* myharness.c */
#include <math.h>
float old_scale(float x) { return x * exp2f(-5); }
float new_scale(float x) { return ldexpf(x, -5); }
```

Run:

```
$ asmdiff.py myharness.c
```

Output (only the gcc section is shown; there is one section per compiler):

```
== gcc -O3 ... ==

old_scale                                    | new_scale
---------------------------------------------+---------------------------------------------
endbr64                                      | endbr64
mulss .LC0(%rip), %xmm0                      | movl $-5, %edi
ret                                          | jmp ldexpf@PLT

function   role       insns  calls   loop spans
old_scale  baseline       3  -       -
new_scale  candidate      3  ldexpf  -
```

Read the `calls` column first: `-` means the construct lowered to inline
instructions; a symbol name means a libcall. The side-by-side asm above it is
the evidence.

The `loop spans` column reports `label:N` for every local label that some
instruction branches back to: N is the number of instructions from the label
through the last backward branch that targets it. The whole-function `insns`
count charges a loop-hoisting change for its one-time setup/writeback code;
the span count covers only the part that repeats. Which span is your hot
loop, and how often it runs, is something the listing and your source know,
not the tool.

A worked example is included — `asmdiff_example.c` reproduces the
exp2f/ldexpf analysis for both constant and runtime shift amounts:

```
$ asmdiff.py asmdiff_example.c
```
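
The `old_*` / `new_*` naming convention used by the harness can be matched up mechanically. The sketch below shows one way to do that in Python. It is an assumption-laden illustration, not the tool's code: `auto_pairs` is a hypothetical name, and how the real tool treats an `old_X` that has no `new_X` is not stated here (the sketch simply skips it).

```python
def auto_pairs(function_names):
    """Pair every old_X with its new_X, in the order the old_X names appear."""
    known = set(function_names)
    pairs = []
    for name in function_names:
        if not name.startswith("old_"):
            continue
        candidate = "new_" + name[len("old_"):]
        if candidate in known:
            pairs.append((name, candidate))  # (baseline, candidate)
    return pairs

# `helper` has no prefix and `old_only` has no partner, so only one pair results.
print(auto_pairs(["old_scale", "new_scale", "helper", "old_only"]))
```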

## Command reference

```
asmdiff.py SOURCE.c [SOURCE2.c] [--pair OLD:NEW]... [--across FUNC]...
           [--cc 'CC FLAGS']... [--target NAME]... [--config PATH]
           [-- EXTRA_FLAGS...]
```

| Option | Meaning |
|---|---|
| `SOURCE.c` | C file to compile — a purpose-built harness or a real project source. A second file may be given: with `--across` to compare a function, without it for a whole-file A/B summary. |
| `--pair OLD:NEW` | Compare two *different* functions within one compilation. Repeatable. Default: every `old_X` is auto-paired with its `new_X`; with no pairs at all, the whole-file summary is printed instead. |
| `--across FUNC` | Compare the *same* function across two compilations (see below). Repeatable. Mutually exclusive with `--pair`. |
| `--cc 'CC FLAGS'` | One compiler invocation, command and flags in a single quoted string. Repeatable to build a matrix. |
| `--target NAME` | A named target from the config file, resolved to a `--cc` entry. Repeatable; appended to the matrix after `--cc` entries. |
| `--config PATH` | Config file to use. Default search: `asmdiff.toml` next to `SOURCE.c`, then in the current directory, then `~/.config/`. First hit wins. |
| `-- EXTRA_FLAGS...` | Everything after a bare `--` is appended to *every* compiler invocation. |

With no `--cc` and no `--target`, the config file's top-level
`default` target(s) are used; without a config file, the matrix is plain
`gcc -O3` and `clang -O3`. The tool's own advice applies: compile with the
flags your project ships with, and put them in a target.
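
The default search order for the config file (the `--config` row in the table) amounts to "first existing candidate wins". A minimal Python sketch of that order follows. The function name `find_config` and its `cwd`/`home` parameters are hypothetical, and the sketch assumes the file under `~/.config/` is also named `asmdiff.toml`; the tool's real lookup may differ in details.

```python
from pathlib import Path

def find_config(source, cwd=None, home=None, name="asmdiff.toml"):
    """Return the first existing config in the documented order, or None."""
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()
    candidates = [
        Path(source).resolve().parent / name,  # next to SOURCE.c
        cwd / name,                            # current directory
        home / ".config" / name,               # ~/.config/ (assumed file name)
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate                   # first hit wins
    return None
```

Passing `cwd` and `home` explicitly keeps the lookup easy to exercise against a temporary directory tree.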

Examples:

```bash
# Explicit pairs, default compilers
asmdiff.py h.c --pair biquad_v1:biquad_v2 --pair svf_v1:svf_v2

# Cross-compilers: quote command and flags together
asmdiff.py h.c --cc 'xtensa-esp32s3-elf-gcc -O2 -mlongcalls' \
               --cc 'riscv32-esp-elf-gcc -O2'

# Try a flag variant across the whole default matrix
asmdiff.py h.c -- -fno-math-errno
```

Compilers missing from `PATH` are skipped with a warning; the run fails only
if none are usable. Exit status is non-zero only for operational failures:
a compile error (the compiler's stderr is shown), an unknown `--pair` name,
or no usable compiler. Differing assembly is the expected result, never an
error.
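
The skip-with-a-warning behaviour for compilers that are not on `PATH` can be sketched with the standard library alone. This is a hedged illustration, not the tool's implementation: `usable_invocations` is a hypothetical name and the warning text is invented.

```python
import shlex
import shutil
import sys

def usable_invocations(cc_strings):
    """Split each 'CC FLAGS' string; keep those whose compiler can be found."""
    usable = []
    for entry in cc_strings:
        argv = shlex.split(entry)              # honours quoting inside the string
        if not argv or shutil.which(argv[0]) is None:
            print(f"warning: skipping {entry!r}: compiler not found", file=sys.stderr)
            continue
        usable.append(argv)
    return usable

matrix = usable_invocations(["gcc -O3", "clang -O3"])
# An empty matrix is the "no usable compiler" failure case described above.
print(len(matrix), "usable compiler invocation(s)")
```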

## Config file: named targets

Retyping a cross-compiler path and ten flags per run is the enemy of actually
looking at assembly. A TOML config (parsed with the standard-library
`tomllib`, which requires Python ≥ 3.11) names each compiler+flags
combination once:

```toml
# asmdiff.toml — next to your harnesses, in CWD, or in ~/.config/
default = "s3-amy"      # target(s) used when no --cc/--target is given

[s3-amy]                # production-like ESP32-S3 codegen
cc = "$HOME/.espressif/tools/xtensa-esp-elf/esp-*/xtensa-esp-elf/bin/xtensa-esp32s3-elf-gcc"
flags = [
  "-O2", "-DAMY_USE_FIXEDPOINT", "-DNDEBUG",
  "-Wno-strict-aliasing", "-mlongcalls",
  "-I$HOME/project/components/amy/src",
]

[host-fixed]            # same defines on host gcc
cc = "gcc"
flags = ["-O2", "-DAMY_USE_FIXEDPOINT", "-I$HOME/amy/src"]
```

`cc` values expand `~` and `$VARS` and may be glob patterns, so a config
survives toolchain upgrades (`esp-14` → `esp-15`) without editing. A
pattern matching several installed toolchains resolves to the highest
version-sorted one — numerically, so `esp-15` beats `esp-9` — and the
choice is printed to stderr; the `==` header in the output always shows
the fully resolved command that actually ran. No match is an error. Pin
the exact directory instead when reproducibility matters more than
convenience. Flags expand `$VARS` only (no globbing).
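
The `cc` resolution rule (expand, glob, pick the numerically highest match) can be sketched as below. This is a minimal illustration under assumptions: `version_key` and `resolve_cc` are hypothetical names, and passing a plain command such as `gcc` through untouched is a guess at sensible behaviour rather than something the text above specifies.

```python
import glob
import os
import re

def version_key(path):
    """Split a path into text and integer chunks so 'esp-15' sorts above 'esp-9'."""
    return [(1, int(chunk)) if chunk.isdecimal() else (0, chunk)
            for chunk in re.split(r"(\d+)", path) if chunk]

def resolve_cc(pattern):
    """Expand ~ and $VARS, then resolve a glob to its highest-version match."""
    expanded = os.path.expandvars(os.path.expanduser(pattern))
    if not any(ch in expanded for ch in "*?["):
        return expanded                        # plain command or path: no globbing
    matches = glob.glob(expanded)
    if not matches:
        raise FileNotFoundError(f"no compiler matches {pattern!r}")
    return max(matches, key=version_key)

# A plain string sort would wrongly rank 'esp-9' above 'esp-15'.
print(max(["/t/esp-9/gcc", "/t/esp-15/gcc"], key=version_key))
```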

A target is exactly a saved `--cc` entry — nothing else changes. Useful
shapes:

```bash
asmdiff.py h.c                                            # config default target(s)
asmdiff.py h.c --target s3-amy --target host-fixed        # two-target matrix
asmdiff.py h.c --across f --target s3-amy --cc 'gcc -O2'  # mix freely
```

A config placed next to your harness files travels with them: any invocation
naming a source in that directory finds it, from any CWD. `default` may be a
single name or a list (a whole default matrix). The
included `asmdiff.example.toml` is a starting point. If a flag or include
path must vary per machine, that's what per-machine config files are for —
nothing lives in the tool.
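
How a `default` key that is either one name or a list turns into compiler invocations can be sketched as follows. The names `default_argvs` and `load_config` are hypothetical, and the sketch assumes each target table carries a `cc` string and an optional `flags` list, as in the example config; it skips the `~`/`$VARS`/glob expansion described earlier.

```python
def default_argvs(config):
    """Turn the `default` key of a parsed config into one argv list per target."""
    default = config.get("default", [])
    names = [default] if isinstance(default, str) else list(default)
    argvs = []
    for name in names:
        target = config[name]   # KeyError if `default` names a missing target
        argvs.append([target["cc"], *target.get("flags", [])])
    return argvs

def load_config(path):
    """Parse a TOML file with the standard-library parser (Python 3.11+)."""
    import tomllib              # imported lazily: the module is absent before 3.11
    with open(path, "rb") as handle:
        return tomllib.load(handle)

example = {
    "default": ["s3-amy", "host-fixed"],
    "s3-amy": {"cc": "xtensa-esp32s3-elf-gcc", "flags": ["-O2", "-mlongcalls"]},
    "host-fixed": {"cc": "gcc", "flags": ["-O2"]},
}
print(default_argvs(example))
```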

## Whole-file summary

With no `--pair`, no `--across`, and no `old_*`/`new_*` functions to
auto-pair, the tool prints what it parsed instead of erroring: every
function's counts plus a file total. With two files, one block per file:

```
$ asmdiff.py old/delay.c new/delay.c

== xtensa-esp32s3-elf-gcc -O2 ... ==

-- old/delay.c --

function              insns  calls                   loop spans
stereo_reverb           437  -                       .L108:327
...
TOTAL (13 functions)    956  malloc_caps, free, ...  -

-- new/delay.c --
...
TOTAL (13 functions)   1028  malloc_caps, free, ...  -
```

The TOTAL row is a coarse sanity check — did this refactor move the file's
weight, did a call appear that shouldn't have? It sums parsed function
bodies only (no literal pools, data, or alignment), so it is not a size
measurement, and per-function rows are where the real information is.
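
As a rough model of what the TOTAL row aggregates, the sketch below sums per-function instruction counts and merges call targets. It is an assumption-heavy illustration: `total_row` is a hypothetical name, the input row shape `(name, insns, calls)` is invented for the example, and the first-seen ordering of merged calls is a guess.

```python
def total_row(rows):
    """Sum instruction counts and merge call targets in first-seen order."""
    total_insns = sum(insns for _name, insns, _calls in rows)
    merged_calls = []
    for _name, _insns, calls in rows:
        for symbol in calls:
            if symbol not in merged_calls:
                merged_calls.append(symbol)
    return (f"TOTAL ({len(rows)} functions)", total_insns, merged_calls)

rows = [("init", 12, ["malloc_caps"]), ("run", 40, []), ("done", 7, ["free"])]
print(total_row(rows))
```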

## Comparing the same function across two builds (`--across`)

`--pair` needs both variants to coexist in one compilation. Real changes
usually don't look like that: the "old" and "new" versions are the same
function under different flags, defines, or file revisions. `--across FUNC`
covers both shapes:

**One file, two (or more) `--cc` entries** — flag/define variants. The first
entry is the baseline; each later entry is compared against it:

```bash
# Did dropping fixed-point change the biquad's codegen?
asmdiff.py src/filters.c --across dsps_biquad_f32_ansi \
    --cc 'gcc -O3 -DMY_FIXED_CONFIG' --cc 'gcc -O3'

# gcc vs clang on the same function
asmdiff.py src/filters.c --across dsps_biquad_f32_ansi \
    --cc 'gcc -O3' --cc 'clang -O3'
```

```bash
# Size vs. performance optimization
asmdiff.py src/filters.c --across dsps_biquad_f32_ansi \
    --cc 'gcc -Os' --cc 'gcc -O2'
```

```
cc#1: gcc -Os
cc#2: gcc -O3

== cc#1 vs cc#2 ==

dsps_biquad_f32_ansi [cc#1]                  | dsps_biquad_f32_ansi [cc#2]
---------------------------------------------+---------------------------------------------
endbr64                                      | endbr64
movl (%r8), %r11d                            | movdqu (%r8), %xmm0
movl 8(%r8), %r10d                           | pushq %r13
pushq %r15                                   | pushq %r12
xorl %r9d, %r9d                              | pshufd $255, %xmm0, %xmm1
pushq %r14                                   | pushq %rbp
movl 4(%r8), %r15d                           | movd %xmm1, %ebp
pushq %r13                                   | movdqa %xmm0, %xmm1
movl 12(%r8), %r13d                          | pushq %rbx
pushq %r12                                   | punpckhdq %xmm0, %xmm1
movl %edx, %r12d                             | movd %xmm1, %r10d
pushq %rbp                                   | pshufd $85, %xmm0, %xmm1
movq %rsi, %rbp                              | testl %edx, %edx
pushq %rbx                                   | jle .L24
movq %rdi, %rbx                              | movslq %edx, %rdx
.L27:                                        | movd %xmm1, %r12d
cmpl %r9d, %r12d                             | movd %xmm0, %r11d
jle .L30                                     | movq %rsi, %r9
movl (%rbx,%r9,4), %r14d                     | leaq (%rdi,%rdx,4), %rbx
movl (%rcx), %edi                            | movq %rdi, %rsi
movl %r14d, %esi                             | jmp .L25
call SMULR6                                  | .L26:
movl 4(%rcx), %edi                           | movl %eax, %r10d
movl %r11d, %esi                             | movl %edi, %r11d
movl %eax, %edx                              | .L25:
call SMULR6                                  | movl 4(%rcx), %eax
movl 8(%rcx), %edi                           | movl (%rsi), %edi
movl %r15d, %esi                             | addl $1024, %r12d
movl %r11d, %r15d                            | addl $1024, %ebp
addl %eax, %edx                              | sarl $11, %r12d
movl %r14d, %r11d                            | sarl $11, %ebp
call SMULR6                                  | leal 1024(%rax), %edx
movl 12(%rcx), %edi                          | leal 1024(%r11), %eax
movl %r10d, %esi                             | sarl $11, %eax
addl %eax, %edx                              | sarl $11, %edx
call SMULR6                                  | leal 1024(%rdi), %r13d
movl 16(%rcx), %edi                          | imull %eax, %edx
movl %r13d, %esi                             | movl (%rcx), %eax
movl %r10d, %r13d                            | sarl $11, %r13d
subl %eax, %edx                              | addl $1024, %eax
call SMULR6                                  | sarl $11, %eax
movl %eax, %esi                              | addl $1, %edx
movl %edx, %eax                              | imull %r13d, %eax
subl %esi, %eax                              | sarl %edx
movl %eax, 0(%rbp,%r9,4)                     | addl $1, %eax
movl %eax, %r10d                             | sarl %eax
incq %r9                                     | addl %eax, %edx
jmp .L27                                     | movl 8(%rcx), %eax
.L30:                                        | addl $1024, %eax
popq %rbx                                    | sarl $11, %eax
movl %r15d, 4(%r8)                           | imull %r12d, %eax
xorl %eax, %eax                              | leal 1024(%r10), %r12d
popq %rbp                                    | sarl $11, %r12d
popq %r12                                    | addl $1, %eax
movl %r13d, 12(%r8)                          | sarl %eax
movl %r11d, (%r8)                            | addl %edx, %eax
popq %r13                                    | movl 12(%rcx), %edx
movl %r10d, 8(%r8)                           | addl $1024, %edx
popq %r14                                    | sarl $11, %edx
popq %r15                                    | imull %r12d, %edx
ret                                          | movl %r11d, %r12d
                                             | addl $1, %edx
                                             | sarl %edx
                                             | subl %edx, %eax
                                             | movl 16(%rcx), %edx
                                             | addl $1024, %edx
                                             | sarl $11, %edx
                                             | imull %ebp, %edx
                                             | movl %r10d, %ebp
                                             | addl $1, %edx
                                             | addq $4, %rsi
                                             | addq $4, %r9
                                             | sarl %edx
                                             | subl %edx, %eax
                                             | movl %eax, -4(%r9)
                                             | cmpq %rsi, %rbx
                                             | jne .L26
                                             | movd %eax, %xmm1
                                             | movd %r10d, %xmm2
                                             | movd %edi, %xmm0
                                             | movd %r11d, %xmm3
                                             | punpckldq %xmm2, %xmm1
                                             | punpckldq %xmm3, %xmm0
                                             | punpcklqdq %xmm1, %xmm0
                                             | .L24:
                                             | popq %rbx
                                             | xorl %eax, %eax
                                             | popq %rbp
                                             | movups %xmm0, (%r8)
                                             | popq %r12
                                             | popq %r13
                                             | ret

function                     role       insns  calls   loop spans
dsps_biquad_f32_ansi [cc#1]  baseline      59  SMULR6  .L27:32
dsps_biquad_f32_ansi [cc#2]  candidate     89  -       .L26:54
```

(The columns describe, they don't rank: here `-O2` is bigger by every
count, and only the listing shows why — `SMULR6` inlined into the loop
body, vector setup around it. Whether that trade is good is your call.)

The output prints a legend mapping `cc#N` tags to the full compiler
invocations, then one section per baseline/candidate pairing. Runnable
against the bundled example file:

```
$ asmdiff.py asmdiff_example.c --across new_rt --cc 'gcc -O0' --cc 'gcc -O3'

cc#1: gcc -O0
cc#2: gcc -O3

== cc#1 vs cc#2 ==

new_rt [cc#1]                                | new_rt [cc#2]
---------------------------------------------+---------------------------------------------
endbr64                                      | endbr64
pushq %rbp                                   | jmp ldexpf@PLT
movq %rsp, %rbp                              |
subq $16, %rsp                               |
movss %xmm0, -4(%rbp)                        |
movl %edi, -8(%rbp)                          |
movl -8(%rbp), %edx                          |
movl -4(%rbp), %eax                          |
movl %edx, %edi                              |
movd %eax, %xmm0                             |
call ldexpf@PLT                              |
leave                                        |
ret                                          |

function       role       insns  calls   loop spans
new_rt [cc#1]  baseline      13  ldexpf  -
new_rt [cc#2]  candidate      2  ldexpf  -
```

**Two files** — before/after versions of a source file (e.g. from a git
worktree, a branch checkout, or a patched copy). Each compiler in the matrix
gets its own section:

```bash
git worktree add ../baseline main
asmdiff.py ../baseline/src/filters.c src/filters.c \
    --across dsps_biquad_f32_ansi
```

Here the tags in the output are the two file paths (shortened to their
distinct suffix) instead of `cc#N` — the worked example in the next section
shows a full result of this shape.

Because C quote-includes (`#include "amy.h"`) resolve relative to the
including file first, each tree picks up **its own** headers automatically —
so a change made in a header (a macro, a typedef) is compared by pointing
`--across` at any `.c` file that uses it, without touching that `.c` file.

## Worked example: exp2f vs ldexpf in shorepine/AMY sources

Suppose the proposal is to change AMY's float-mode shift macros in
`src/amy_fixedpoint.h` from `(s) * exp2f(b)` to `ldexpf((s), (b))`. No
harness needed — compare the real functions the macros expand into:

```bash
# 1. A pristine baseline tree (any ref works)
git worktree add ../amy-baseline HEAD

# 2. The macros in question only exist in the float build, so enable it in
#    BOTH trees: comment out `#define AMY_USE_FIXEDPOINT` in src/amy.h
#    (it is hardcoded there).

# 3. In the working tree only, apply the candidate change in
#    src/amy_fixedpoint.h:
#      #define SHIFTR(s, b) ldexpf((s), -(b))
#      #define SHIFTL(s, b) ldexpf((s), (b))

# 4. Compare real functions containing both kinds of shift site:
asmdiff.py ../amy-baseline/src/log2_exp2.c src/log2_exp2.c \
    --across exp2_lut --across log2_lut --cc 'gcc -O3 -Wall'

# 5. Clean up
git worktree remove ../amy-baseline
```

`src/log2_exp2.c` is a good probe because it contains both site kinds:
`exp2_lut` shifts by a **runtime** amount, `log2_lut` by **constants**.
The summary makes the trade-off immediate:

```
function                             role       insns  calls
exp2_lut [amy-baseline/log2_exp2.c]  baseline      65  exp2f
exp2_lut [amy/log2_exp2.c]           candidate     59  ldexpf
log2_lut [amy-baseline/log2_exp2.c]  baseline      58  -
log2_lut [amy/log2_exp2.c]           candidate     64  ldexpf
```

The runtime site improves (a leaner libcall replaces `exp2f` + multiply),
but the constant site regresses: baseline `log2_lut` had **no** calls —
`exp2f(±1)` folds to a multiply — while the candidate now pays a `ldexpf`
libcall inside its normalisation loop. Any other `.c` file whose hot
functions use the macros (`filters.c`, `oscillators.c`, `delay.c`) can be
probed the same way.

## How it works

1. Each compiler runs with `-S` to emit assembly text.
2. Function bodies are sliced out between the function's label and its
   `.size` directive (or the next function label). CFI/section/alignment
   directives, comments, and compiler bracketing labels are stripped;
   instructions and meaningful local labels (loop targets) are kept.
3. Instruction counts and outbound calls come from a mnemonic scan covering
   x86 (`call`, `jmp` tail calls), ARM (`bl`, `blx`), RISC-V (`call`,
   `tail`, `jal`), and Xtensa (`call0/4/8/12`, `callx*`, `j`). Local-label
   branches and register-indirect x86 jumps are not counted as calls.
4. Loop spans come from label references alone — no mnemonic tables, no
   control-flow analysis. The next section walks through it.
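
Step 2 in the list above, slicing one function body out of `-S` text, can be sketched in Python as below. This is a simplified illustration, not the code in `asmdiff.py`: `slice_function`, `count_insns`, and `BRACKET_LABEL` are hypothetical names, only full-line `#` comments are dropped, and only `.LFB`/`.LFE` are treated as bracketing labels.

```python
import re

BRACKET_LABEL = re.compile(r"\.L(FB|FE)\d+:$")   # assumed bracketing-label shapes

def slice_function(asm_text, name):
    """Return the kept lines (instructions and local labels) of one function."""
    size_directive = re.compile(r"\.size\s+" + re.escape(name) + r"\s*,")
    body, inside = [], False
    for raw in asm_text.splitlines():
        line = raw.strip()
        if not inside:
            inside = line == f"{name}:"          # wait for the function's own label
            continue
        if size_directive.match(line):
            break                                # normal end of the function
        if not line or line.startswith("#"):
            continue                             # blank line or full-line comment
        if line.endswith(":"):
            if not line.startswith(".L"):
                break                            # the next function's label
            if not BRACKET_LABEL.match(line):
                body.append(line)                # meaningful local label
            continue
        if line.startswith("."):
            continue                             # .cfi_*, .p2align, .section, ...
        body.append(line)                        # an instruction
    return body

def count_insns(body):
    """Instruction count: every kept line that is not a label."""
    return sum(1 for line in body if not line.endswith(":"))

SAMPLE = """\
g:
.LFB0:
\t.cfi_startproc
\tendbr64
\tjmp\tldexpf@PLT
\t.cfi_endproc
.LFE0:
\t.size\tg, .-g
"""
print(slice_function(SAMPLE, "g"), count_insns(slice_function(SAMPLE, "g")))
```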

### How a span is found

The parser sees only the cleaned `-S` text of one function: instructions
and local labels, as line positions rather than addresses. Two passes:

1. Record the position of every local label line (`.L2:`).
2. Scan each instruction's operands for label-shaped tokens (`.L…`). A
   token counts only if that label exists **inside this function body**.
   That one rule filters out literal-pool references — `mulss .LC0(%rip)`,
   `l32r a8, .LC44` — because `.LC*` labels are emitted in data sections
   outside the body and are never in the label map.

An instruction that references a label *above* itself is a backward
branch, whatever its mnemonic (`jne`, `bne`, `bnez.n`, `jnz` — the tool
never needs to know). The span runs from the label to the last such
branch, inclusive:

```
.L2:               ─┐
  addl $1, %eax     │
  cmpl $8, %eax     │  span ".L2:3"
  jne .L2          ─┘  backward reference
  ret                  outside the span
```

Several back-edges to one label (a `continue` plus the loop bottom) merge
into that label's single span. Nested labels report separately — the
outer span simply contains the inner one. Forward references (loop exits
like `jle .L24`) are ignored.

The one arch-specific case is Xtensa zero-overhead loops, where the
hardware — not a branch — repeats the body, and the `loop` instruction
names its *end* label, forward:

```
  loopgt a3, .L5       runs once; not part of the span
  addi.n a2, a2, 1  ─┐
  s32i.n a2, a4, 0  ─┘ span ".L5:2"
.L5:
  retw.n
```

That is the entire mechanism. There is no CFG, no trip count, and no
notion of "the" loop: a backward `goto` produces a span exactly like a
`for` loop, and an unrolled loop's span is the unrolled body. The column
states where the compiler laid out a repeatable region — nothing more.
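
The two passes and the Xtensa special case can be sketched in a few lines of Python. This is an illustration written from the description above, not the tool's source: `loop_spans`, `count_insns`, and `LABEL_TOKEN` are hypothetical names, the input is assumed to be an already-cleaned body (one instruction or `label:` per entry), and treating any mnemonic starting with `loop` as a zero-overhead loop is an assumption.

```python
import re

LABEL_TOKEN = re.compile(r"\.L\w+")   # label-shaped operand tokens: .L2, .LC0, ...

def count_insns(lines):
    """Count instruction lines, ignoring label lines."""
    return sum(1 for line in lines if not line.endswith(":"))

def loop_spans(body):
    """Map each repeated-region label to its span length in instructions."""
    # Pass 1: line position of every local label inside this body.
    label_pos = {line[:-1]: i for i, line in enumerate(body) if line.endswith(":")}
    spans = {}
    # Pass 2: look for label references in each instruction's operands.
    for i, line in enumerate(body):
        if line.endswith(":"):
            continue
        parts = line.split(None, 1)
        operands = parts[1] if len(parts) > 1 else ""
        for label in LABEL_TOKEN.findall(operands):
            pos = label_pos.get(label)
            if pos is None:
                continue              # not in this body, e.g. a .LC* literal pool
            if pos < i:
                # Backward reference: label through this instruction, inclusive.
                # A later back-edge to the same label overwrites the entry.
                spans[label] = count_insns(body[pos + 1:i + 1])
            elif parts[0].startswith("loop"):
                # Assumed Xtensa rule: the loop instruction names its end label,
                # so the span is what lies between the two.
                spans[label] = count_insns(body[i + 1:pos])
    return spans

x86 = [".L2:", "addl $1, %eax", "cmpl $8, %eax", "jne .L2", "ret"]
print(", ".join(f"{label}:{n}" for label, n in loop_spans(x86).items()))
```

Forward branches fall through both conditions and are ignored, which matches the "loop exits are ignored" rule without any mnemonic table.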

No verdicts are printed. The tool reports facts; whether a libcall on that
path — or an instruction inside a span rather than outside it — matters is
your judgment.

## Writing good harnesses

- Give variants **runtime arguments** for anything that is runtime in the
  real code, and **literals** for anything that is compile-time constant
  there. The fold-vs-libcall answer depends on exactly this.
- Keep functions non-`static` so the compiler must emit them standalone.
- Compile at the **flags your project ships with** — a construct that folds
  at `-O3 -ffast-math` may not fold at plain `-O3`. Encode them once as a
  config target and make it the `default`.
- Beware of over-synthetic harnesses: a function whose whole body is the
  construct can tail-call (`jmp f`) where real surrounding code would
  `call f` and continue. Same libcall either way, but instruction counts
  read differently.

## Porting to another project

The tool is a single Python 3 file that imports only the standard library
and contains no project-specific constants. To port:

1. Copy this directory (or just `asmdiff.py`).
2. Write an `asmdiff.toml` for the new project's toolchain and flags
   (start from `asmdiff.example.toml`) and drop it next to your
   harnesses, in your working directory, or in `~/.config/`.
3. Run the self-tests: `python3 test_asmdiff.py -v` (no compiler needed).

## Limitations

- Parses **GNU-as ELF** assembly (`gcc`, `clang`, and GNU cross-compilers
  targeting ELF). macOS Mach-O asm (`_name` labels, no `.size`) is not
  supported — on a Mac, compare inside a Linux container or with a
  cross-toolchain.
- Call detection is a mnemonic heuristic. Register-indirect calls through a
  loaded address (other than x86 `jmp *reg`) can be reported as a call to
  the register's name (e.g. Xtensa `callx8 a10`), which errs toward
  visibility rather than silence.
- Columns truncate long instruction lines to keep pairs aligned; when a
  line matters, widen it via the `width` parameter of `side_by_side()` or
  read the raw `-S` output by hand.
- Loop spans are layout facts, not loop analysis. Label numbers are
  compiler-assigned, so a baseline's `.L27` and a candidate's `.L26` may
  or may not be "the same" loop — match them through the listing, not by
  name. Unrolled or versioned loops (common at `-O3`) appear as several
  spans or as one large span; the tool reports what it sees and does not
  reassemble them into a source-level loop.
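
The mnemonic heuristic behind the `calls` column, including the register-name quirk noted in the list above, can be approximated as follows. This sketch is not the pattern used in `asmdiff.py`: `CALL_MNEMONIC` and `outbound_calls` are hypothetical names, the mnemonic set is taken from the architectures listed under "How it works", and taking the last operand as the target is a simplification.

```python
import re

# Assumed mnemonic set: x86 call/jmp, ARM bl/blx, RISC-V call/tail/jal,
# Xtensa call0/4/8/12, callx0/4/8/12 and j.
CALL_MNEMONIC = re.compile(r"(call(x?(0|4|8|12))?|jmp|bl|blx|tail|jal|j)$")

def outbound_calls(body):
    """Return distinct call targets in first-seen order."""
    calls = []
    for line in body:
        parts = line.split()
        if len(parts) < 2 or not CALL_MNEMONIC.match(parts[0]):
            continue
        target = parts[-1]                     # `jal ra, f` names the callee last
        if target.startswith(".L"):
            continue                           # local-label branch, not a call
        if parts[0] == "jmp" and target.startswith("*"):
            continue                           # x86 register-indirect jump
        symbol = target.split("@", 1)[0]       # drop a suffix such as @PLT
        if symbol not in calls:
            calls.append(symbol)
    return calls

print(outbound_calls(["endbr64", "movl $-5, %edi", "jmp ldexpf@PLT"]))
```

A register-indirect call such as `callx8 a10` comes out as `a10`, which is the "errs toward visibility" behaviour rather than a bug in the sketch.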

---
## pretend FAQ
### Why not just run objdump by hand?

The two commands above (steps 4–5) replace a manual workflow with real
friction at every step. Walking through it end to end on a single,
one-sided example — did `x * exp2f(-5)` fold to a multiply, or did
`ldexpf(x, n)` become a libcall — shows where the effort goes.

**1. Compile to an object, remembering every project flag by hand.**

```bash
gcc -O3 -Wall -Wno-strict-aliasing -Wextra -Wno-unused-parameter \
    -Wpointer-arith -Wno-float-conversion -Wno-missing-declarations \
    -DAMY_WAVETABLE -Isrc -c src/log2_exp2.c -o /tmp/candidate.o
```

Drop one flag that affects code generation (a `-D` define, or the `-O`
level) and nothing errors: the build just quietly takes a different codegen
path, and the comparison you're about to make is invalid without telling you
so. Repeat this for the baseline tree with its own `-I`, and again for every
extra compiler you want in the matrix.

**2. Disassemble the function out of the object.**

```bash
objdump -dr --no-show-raw-insn -M no-aliases /tmp/candidate.o
```

For a libcall site (`ldexpf(x, n)` with a runtime `n`), the real output is:

```
0000000000000000 <g>:
   0:   endbr64
   4:   jmp 9 <g+0x9>
            5: R_X86_64_PLT32 ldexpf-0x4
```

The call target isn't in the instruction: `jmp 9 <g+0x9>` shows only the
unrelocated placeholder displacement, which `objdump` renders as the address
of the next instruction. The actual symbol, `ldexpf`, only shows up in the
relocation line underneath, and you have to know to cross-reference it by
hand. Compare that to `gcc -S`, which prints the symbol inline because the
output is still assembler source and no relocation has been generated yet:

```
g:
    endbr64
    jmp ldexpf@PLT
```

That's why asmdiff compiles with `-S` instead of running `objdump` on an
object file: the thing you're looking for (is this a libcall, and to what)
is already text, not a relocation entry you have to decode.

**3. Strip the noise objdump adds that `-S` doesn't.** Every instruction
line carries a leading address and (unless `--no-show-raw-insn` is passed)
raw opcode bytes; there's a `file format elf64-x86-64` banner, a
`Disassembly of section .text:` header, and an address-annotated function
label instead of a bare one. None of it is informative for a codegen diff,
all of it has to be deleted by hand before two functions are readable
side by side — and it has to be deleted from **every** file in the
comparison, four of them for the two-function/two-tree case above.

**4. Diff the cleaned pair.** `diff -y --width=100 old.txt new.txt` aligns
by content match, not position — once the two versions diverge even
slightly it starts pairing unrelated lines, and it has no header row to
label which side is which. `asmdiff` prints its own aligned columns
(`side_by_side()`) with the two function names as headers, and never loses
the pairing because it doesn't try to align by content — it just walks
both lists in lockstep.
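
Walking two listings in lockstep is simple enough to sketch. The function below is a stand-in written for this explanation, not the tool's `side_by_side()`: the name `side_by_side_sketch`, the exact separators, and the default width of 45 (taken from the separator row in the sample output) are assumptions.

```python
from itertools import zip_longest

def side_by_side_sketch(left_name, left, right_name, right, width=45):
    """Pair two listings row by row; truncate long lines to keep columns aligned."""
    def cell(text):
        return text[:width].ljust(width)

    rows = [f"{cell(left_name)}| {right_name[:width]}",
            "-" * width + "+" + "-" * width]
    # No content matching: row i of one side always sits next to row i of the other.
    for a, b in zip_longest(left, right, fillvalue=""):
        rows.append(f"{cell(a)}| {b[:width]}".rstrip())
    return rows

old = ["endbr64", "mulss .LC0(%rip), %xmm0", "ret"]
new = ["endbr64", "movl $-5, %edi", "jmp ldexpf@PLT"]
print("\n".join(side_by_side_sketch("old_scale", old, "new_scale", new)))
```

Because the pairing is purely positional, the shorter side is padded with empty cells instead of being re-aligned, which is the trade-off the paragraph above describes.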

**5. Count instructions and classify calls by hand.** Grep for `call`/`jmp`
in the cleaned text, then manually exclude the ones that are really local
branches (`jmp 4011a0 <exp2_lut+0x40>`) rather than calls to another
symbol — the exact distinction `CALL_RE` in `asmdiff.py` encodes once so
you don't re-derive it per function. Then hand-build a table from four
separate counts.

**6. Do all of the above again per compiler.** asmdiff's default matrix is
gcc *and* clang; by hand that's every step above, twice.

For the full worked example — two functions, two trees, one compiler —
the manual version is roughly: 2 compiles (with hand-retyped flags) → 4
`objdump`/relocation-lookup passes → noise-stripped by hand on 4 files →
2 `diff -y` runs that don't survive drift → manual instruction counts and
call classification on 4 files → a hand-assembled summary table. The
`asmdiff.py` version is the one command already shown above. Neither
workflow can skip understanding *why* the two functions differ — that part
is still your judgment — but everything upstream of that judgment, where a
dropped flag or a misread relocation silently invalidates the comparison,
is what the tool removes.

### Why not just run gcc -S by hand?

`-S` output sidesteps the relocation-decoding problem above: call targets
are already symbolic text, with no PLT stub to resolve. That removes step 2
of the objdump workflow. It does not remove the rest.

**1. Compile to text instead of an object**, with the same flags and the
same risk of silently dropping one:

```bash
gcc -O3 -Wall -Wno-strict-aliasing -Wextra -Wno-unused-parameter \
    -Wpointer-arith -Wno-float-conversion -Wno-missing-declarations \
    -DAMY_WAVETABLE -Isrc -S src/log2_exp2.c -o /tmp/log2_exp2.s
```

**2. Find where the function starts and ends in the `.s` file.** The real
output for `exp2_lut` in this repo (current build, `AMY_USE_FIXEDPOINT`
on):

```
exp2_lut:
.LFB71:
    .cfi_startproc
    endbr64
    movl %edi, %edx
    leaq 2+exp2_fxpt_lutable(%rip), %rcx
    ...
    ret
    .cfi_endproc
.LFE71:
    .size exp2_lut, .-exp2_lut
```

There's no `objdump`-style address column to strip, but you still have to
find the boundary by hand: the function starts at a column-0 label
(`exp2_lut:`, not `.LFB71:`, which is a bracketing label rather than the
function) and ends at its `.size` directive. gcc and clang both emit
`.size` on ELF targets; where it is absent, you'd fall back to "next
function label", which is exactly the two-case rule `extract_functions()`
implements once instead of you re-deriving it per file.
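
The two-case boundary rule can be sketched in a few lines. This is a simplified illustration rather than the tool's `extract_functions()`: it keeps every body line and strips no noise, and the names `FUNC_START`, `SIZE_END`, `split_functions`, and the sample assembly are assumptions.

```python
import re

FUNC_START = re.compile(r"^([A-Za-z_][\w$.]*):")   # column-0, non-".L" label
SIZE_END = re.compile(r"^\s*\.size\b")


def split_functions(asm_text):
    """Map function name -> raw body lines, using the two-case end rule."""
    funcs, current = {}, None
    for raw in asm_text.splitlines():
        start = FUNC_START.match(raw)
        if start:
            # Case 2: a new function label also ends any open body.
            current = start.group(1)
            funcs[current] = []
        elif current is not None and SIZE_END.match(raw):
            # Case 1: the .size directive closes the body.
            current = None
        elif current is not None:
            funcs[current].append(raw.strip())
    return funcs


# Hypothetical input: "first" has no .size, "second" has one.
sample = """\
first:
.LFB0:
\tmovl %edi, %eax
\tret
second:
\txorl %eax, %eax
\tret
\t.size second, .-second
\t.globl third
"""
print(split_functions(sample))
```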

**3. Strip compiler furniture, but not indiscriminately.** `.cfi_*`,
`.LFB`/`.LFE` bracket labels, and `.p2align` carry no information. A local
`.L`-numbered label sometimes does, though, and you can't tell which
without reading the body. `log2_lut` in the same file:

```
log2_lut:
.LFB70:
    .cfi_startproc
    endbr64
    xorl %eax, %eax
    cmpl $8388607, %edi
    jg .L9
    .p2align 4,,10
    .p2align 3
.L3:
    addl %edi, %edi
    subl $1, %eax
    cmpl $8388607, %edi
    jle .L3
    cmpl $16777215, %edi
    jle .L11
    .p2align 4,,10
    .p2align 3
.L5:
    sarl %edi
    addl $1, %eax
.L9:
    cmpl $16777215, %edi
    jg .L5
.L11:
    ...
```

`.L3`, `.L5`, `.L9`, `.L11` are live branch targets: `jg .L9` and `jle .L3`
jump to two of them. A quick-and-dirty cleanup pass like
`grep -v '^[[:space:]]*\.'` (strip every line whose first non-blank
character is a dot) deletes those labels along with the `.p2align` noise
sitting right next to them, and now the function has dangling jumps to
labels that no longer exist: silently wrong, not an error. The correct rule
is "drop this specific set of directives and this specific set of
*bracketing* labels, keep everything else". That is a narrower,
easier-to-get-wrong rule than it looks, and it's what `NOISE` and
`NOISE_LABEL` encode once in `asmdiff.py` instead of per file.
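
The difference between the two cleanup rules can be made concrete with a small sketch. The patterns below are deliberately reduced subsets chosen for illustration, not the `NOISE`/`NOISE_LABEL` definitions from `asmdiff.py`, and the helper names (`strip_naive`, `strip_targeted`, `dangling_targets`) and the sample body are assumptions.

```python
import re

# Illustrative subsets of the "specific set" rule (assumptions for this
# sketch; the patterns in asmdiff.py list more directives and labels).
DIRECTIVE_NOISE = re.compile(r"^\s*\.(cfi_|p2align)")
BRACKET_LABEL = re.compile(r"^\.(LFB|LFE)\d*:")


def strip_naive(lines):
    """Drop every line whose first non-blank character is a dot."""
    return [ln.strip() for ln in lines if not ln.lstrip().startswith(".")]


def strip_targeted(lines):
    """Drop only known-noise directives and bracketing labels."""
    return [ln.strip() for ln in lines
            if not DIRECTIVE_NOISE.match(ln) and not BRACKET_LABEL.match(ln)]


def dangling_targets(lines):
    """Local labels that are referenced but no longer defined."""
    defined = {ln[:-1] for ln in lines if ln.endswith(":")}
    used = {ref for ln in lines if not ln.endswith(":")
            for ref in re.findall(r"\.L\w+", ln)}
    return sorted(used - defined)


body = [".LFB70:", "\t.cfi_startproc", "\tjg .L9", "\t.p2align 3", ".L3:",
        "\taddl %edi, %edi", "\tjle .L3", ".L9:", "\tret"]
print(dangling_targets(strip_naive(body)))     # branch targets were deleted
print(dangling_targets(strip_targeted(body)))  # branch targets survive
```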

**4. Everything downstream is unchanged from the objdump case:** pair the
two cleaned functions up for reading, count instructions, classify
`call`/`jmp` lines as libcalls vs. local branches, repeat per function,
per file, per compiler, and assemble a summary table by hand.

So `-S` over `objdump` buys back exactly one step (the call target is
already a name, not a relocation to look up) and leaves the rest of the
manual pipeline (locate, strip correctly, pair, count, classify, tally,
multiplied by every function/tree/compiler in the matrix) in place. That
remaining pipeline is `extract_functions()`, `analyze()`,
`side_by_side()`, and `summary_table()` in `asmdiff.py`: written once,
instead of re-derived by hand every time someone wants to answer "did this
still fold?"

|
@@ -0,0 +1,6 @@
|
asmdiff.py,sha256=iqpMyirSUYH2hWfYZwGnmxlYFTh8lNL-EceTV6S84dY,22885
asmdiff-0.1.0.dist-info/METADATA,sha256=ovEO3ftLKesG4zJ0xG0S94Krnt4k2O1XzR5Gpr7dVaU,32701
asmdiff-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
asmdiff-0.1.0.dist-info/entry_points.txt,sha256=bVeIyjLm3KgWALbgRSeQo16pZ89GXZZ4sFTSWJUMdTI,41
asmdiff-0.1.0.dist-info/licenses/LICENSE,sha256=Y8JW9wzNeVnfyw62X3qC9cuwZCv_U67rNSF9QUE0HJs,1072
asmdiff-0.1.0.dist-info/RECORD,,
|
@@ -0,0 +1,21 @@
|
MIT License

Copyright (c) 2026 Rasmus Tikkanen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
asmdiff.py
ADDED
|
@@ -0,0 +1,575 @@
|
#!/usr/bin/env python3
"""Compare per-function assembly between paired C implementations.

Compiles a harness C file across a matrix of compilers, extracts each
variant function's assembly from the -S output, and prints side-by-side
listings plus a summary of instruction counts, outbound calls, and loop
spans (instructions between a local label and its last backward branch).
Automates fold-vs-libcall analysis when evaluating micro-optimisations
(e.g. "does this still compile to one instruction, or is it a libcall?").

Usage:
    tools/asmdiff/asmdiff.py SOURCE.c [SOURCE2.c] [--pair OLD:NEW]...
                             [--across FUNC]... [--cc 'CC FLAGS']...
                             [--target NAME]... [--config PATH]
                             [-- EXTRA_FLAGS...]

Three modes:
  --pair OLD:NEW   compares two different functions within one compilation
                   (with no --pair, old_X/new_X names auto-pair).
  --across FUNC    compares the SAME function across two compilations:
                   either one file under two --cc entries (flag/define
                   variants), or two source files (before/after versions)
                   under each compiler in the matrix.
  (neither)        whole-file summary: per-function counts plus a file
                   total, for one file or side by side for two.

Compilers come from --cc strings, from named targets in an asmdiff.toml
config file (--target NAME), from the config's `default` entry, or,
failing all of those, plain `gcc -O3` and `clang -O3`.
Flags after a bare `--` are appended to every compiler invocation.
"""
import argparse
import glob
import hashlib
import os
import re
import shlex
import shutil
import subprocess
import sys
import tempfile
from itertools import zip_longest
from pathlib import Path

try:
    import tomllib  # Python >= 3.11; config files are optional without it
except ModuleNotFoundError:
    tomllib = None

DEFAULT_COMPILERS = ["gcc", "clang"]
FALLBACK_FLAGS = "-O3"
CONFIG_NAME = "asmdiff.toml"

# A label at column 0 that is not a local (.L*) label starts a function.
FUNC_LABEL = re.compile(r"^([A-Za-z_][\w$.]*):")
# Assembler directives that carry no information worth reading.
NOISE = re.compile(
    r"^\s*\.(cfi_|p2align|align\b|loc\b|file\b|text\b|globl\b|global\b|"
    r"type\b|section\b|ident\b|weak\b|hidden\b|addrsig|build_version)"
)
# Compiler-generated bracketing labels that add nothing (.LFB0:, .Lfunc_end0:).
NOISE_LABEL = re.compile(r"^\.(LFB|LFE|Lfunc_begin|Lfunc_end)\d*:")


def extract_functions(asm_text):
    """Map function name -> cleaned asm lines from compiler -S output.

    A function body runs from its column-0 label to the matching .size
    directive (gcc and clang both emit one on ELF) or the next function
    label. Comment lines, CFI/section/alignment directives, and
    compiler bracketing labels are dropped; instructions and meaningful
    local labels (loop targets) are kept, whitespace-stripped.
    """
    funcs = {}
    current = None
    for raw in asm_text.splitlines():
        m = FUNC_LABEL.match(raw)
        if m:
            current = m.group(1)
            funcs[current] = []
            continue
        if current is None:
            continue
        if re.match(r"^\s*\.size\b", raw):
            current = None
            continue
        line = raw.strip()
        if not line or line.startswith(("#", "//")):
            continue
        if NOISE.match(line) or NOISE_LABEL.match(line):
            continue
        funcs[current].append(line)
    return funcs


# Direct-call / tail-call mnemonics across x86 (call, jmp), ARM (bl, blx),
# RISC-V (call, tail, jal), and Xtensa (call0/4/8/12, callx*, j). Longest
# alternatives first so e.g. "callx8" is not consumed as "call". The
# symbol must be the sole operand (optionally @PLT-suffixed), so
# multi-operand forms like "jal ra, exp2f" are skipped rather than
# reporting the register as a callee.
CALL_RE = re.compile(
    r"^(?:callx\d+|call\d*|callq|jalr|jal|jmp|blx|bl|tail|j)\s+"
    r"([A-Za-z_][\w$.]*)(?:@[\w.]+)?\s*(?:[#;].*)?$"
)


def analyze(lines):
    """Return (instruction_count, called_symbols) for cleaned asm lines.

    A call is a call/tail-call mnemonic whose sole operand looks like a
    symbol name. Local labels (.L*) and %-registers never match, so
    branches inside the function are not counted.
    """
    insns = 0
    calls = []
    for line in lines:
        if line.endswith(":"):
            continue
        insns += 1
        m = CALL_RE.match(line)
        if m:
            sym = m.group(1)
            if sym not in calls:
                calls.append(sym)
    return insns, calls


# A local-label operand (branch target, zero-overhead loop end). Literal
# pool labels (.LC0) also match, but they are emitted outside function
# bodies, so they never appear in the label map built from a body.
LABEL_REF = re.compile(r"\.L[\w$.]+")


def loop_spans(lines):
    """Return [(label, insns)] spans for cleaned asm lines.

    A span is the run of instructions from a local label to the last
    instruction that references it from below: a backward branch, which
    is what a compiled loop looks like on every target the tool parses.
    Xtensa zero-overhead loops (loop/loopnez/loopgt) reference their END
    label instead; there the span is the instructions the loop encloses.
    Spans are reported in order of appearance, one per label; nested
    labels yield nested spans. The count states how many instructions
    lie in the span and says nothing about trip count or hotness, which
    the reader must judge from the source.
    """
    label_at = {ln[:-1]: i for i, ln in enumerate(lines)
                if ln.endswith(":")}
    spans = {}
    for i, ln in enumerate(lines):
        if ln.endswith(":"):
            continue
        mnem = ln.split(None, 1)[0]
        for ref in LABEL_REF.findall(ln):
            if ref not in label_at:
                continue
            j = label_at[ref]
            if j < i:                       # label above: backward branch
                lo, hi = j, i
            elif mnem.startswith("loop"):   # Xtensa: end label below
                lo, hi = i + 1, j - 1
            else:
                continue
            if ref in spans:                # several edges to one label
                lo = min(lo, spans[ref][0])
                hi = max(hi, spans[ref][1])
            spans[ref] = (lo, hi)
    result = []
    for ref, (lo, hi) in sorted(spans.items(), key=lambda kv: kv[1]):
        insns = sum(1 for ln in lines[lo:hi + 1] if not ln.endswith(":"))
        result.append((ref, insns))
    return result


def auto_pairs(names):
    """Pair old_X with new_X for every X present in both."""
    names = list(names)
    return [(n, "new_" + n[4:]) for n in names
            if n.startswith("old_") and "new_" + n[4:] in names]


def find_config(explicit, sources):
    """Locate the config file; first hit wins, no merging.

    Order: --config PATH, then asmdiff.toml next to the first source
    file (a harness directory can carry its own targets), then the
    current directory, then ~/.config/asmdiff.toml.
    """
    if explicit:
        path = Path(explicit)
        if not path.is_file():
            sys.exit(f"error: config file not found: {explicit}")
        return path
    for candidate in (Path(sources[0]).resolve().parent / CONFIG_NAME,
                      Path.cwd() / CONFIG_NAME,
                      Path.home() / ".config" / CONFIG_NAME):
        if candidate.is_file():
            return candidate
    return None


def load_config(path):
    """Parse a TOML config: one [table] per target, optional top-level
    `default` naming the target(s) to run when no --cc/--target is given."""
    if tomllib is None:
        sys.exit(f"error: {path} exists but this Python has no tomllib "
                 "(config files need Python >= 3.11)")
    try:
        with open(path, "rb") as fh:
            return tomllib.load(fh)
    except tomllib.TOMLDecodeError as exc:
        sys.exit(f"error: {path}: {exc}")


def _version_key(path):
    """Sort key that orders embedded numbers numerically, so
    esp-15.2.0 ranks above esp-9.1.0 (lexical order would not)."""
    return [(0, int(tok)) if tok.isdigit() else (1, tok)
            for tok in re.split(r"(\d+)", path)]


def resolve_cc(cc, name):
    """Expand ~, $VARS, and glob patterns in a target's cc value.

    A pattern like .../xtensa-esp-elf/esp-*/bin/...-gcc keeps the config
    toolchain-version agnostic. If it matches several installed
    toolchains the highest version-sorted one is used, and the choice is
    printed so it is never silent; no match is an error.
    """
    expanded = os.path.expandvars(os.path.expanduser(cc))
    if not any(ch in expanded for ch in "*?["):
        return expanded
    matches = sorted(glob.glob(expanded), key=_version_key)
    if not matches:
        sys.exit(f"error: target [{name}]: cc pattern matched nothing: "
                 + expanded)
    if len(matches) > 1:
        print(f"target [{name}]: cc pattern matched {len(matches)} "
              f"toolchains, using {matches[-1]}", file=sys.stderr)
    return matches[-1]


def target_command(config, name, config_path):
    """Resolve a named [target] table to one 'CC FLAGS' matrix entry."""
    entry = (config or {}).get(name)
    if not isinstance(entry, dict):
        known = sorted(k for k, v in (config or {}).items()
                       if isinstance(v, dict))
        sys.exit(f"error: no [{name}] target in "
                 f"{config_path or 'any config file'}"
                 + ("; targets: " + ", ".join(known) if known
                    else "; no targets defined"))
    cc = entry.get("cc")
    if not isinstance(cc, str):
        sys.exit(f'error: target [{name}] needs cc = "compiler"')
    flags = entry.get("flags", [])
    if isinstance(flags, str) or not all(isinstance(f, str) for f in flags):
        sys.exit(f"error: target [{name}]: flags must be an array of strings")
    flags = [os.path.expandvars(f) for f in flags]
    return shlex.join([resolve_cc(cc, name), *flags])


def build_matrix(cc_args, target_args, config, config_path):
    """Resolve the compiler matrix.

    --cc strings verbatim, then --target entries, in that order. With
    neither, the config's `default` (a target name or list of names);
    with no config or no default, plain gcc/clang at -O3.
    """
    entries = list(cc_args)
    entries += [target_command(config, name, config_path)
                for name in target_args]
    if entries:
        return entries
    default = (config or {}).get("default")
    if default:
        names = [default] if isinstance(default, str) else list(default)
        return [target_command(config, name, config_path) for name in names]
    return [f"{cc} {FALLBACK_FLAGS}" for cc in DEFAULT_COMPILERS]


def asm_output_name(cc_cmd, harness):
    """Filesystem-safe .s name for one (compiler, source) compilation.

    The readable slug of a compiler command can exceed NAME_MAX when the
    command embeds absolute toolchain/include paths; long slugs are
    truncated and kept unique with a short hash of the full command.
    """
    tag = re.sub(r"\W+", "_", cc_cmd)
    if len(tag) > 64:
        tag = tag[:53] + "_" + hashlib.sha1(cc_cmd.encode()).hexdigest()[:10]
    return tag + "_" + Path(harness).stem + ".s"


def compile_to_asm(cc_cmd, extra_flags, harness, out_dir):
    """Run one compiler to -S; return the asm text.

    Returns None (with a warning) if the compiler is not on PATH.
    Exits with the compiler's stderr on a compile failure.
    """
    argv = shlex.split(cc_cmd)
    if shutil.which(argv[0]) is None:
        print(f"warning: {argv[0]} not found on PATH, skipping",
              file=sys.stderr)
        return None
    out_s = Path(out_dir) / asm_output_name(cc_cmd, harness)
    cmd = argv + list(extra_flags) + ["-S", "-o", str(out_s), str(harness)]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        sys.exit(f"error: compile failed: {' '.join(cmd)}\n{proc.stderr}")
    return out_s.read_text()


def side_by_side(left, right, ltitle, rtitle, width=44):
    """Two-column view of a pair's asm lines."""
    rows = [f"{ltitle:<{width}} | {rtitle}",
            f"{'-' * width}-+-{'-' * width}"]
    for l, r in zip_longest(left, right, fillvalue=""):
        l = l.expandtabs(8)[:width]
        r = r.expandtabs(8)[:width]
        rows.append(f"{l:<{width}} | {r}")
    return "\n".join(rows)


def render_table(rows):
    """Column-aligned text for a list of equal-length string tuples."""
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows)


def format_spans(spans):
    return " ".join(f"{label}:{n}" for label, n in spans) or "-"


def summary_table(pairs, funcs):
    """Instruction counts, outbound calls, and loop spans per pair member."""
    rows = [("function", "role", "insns", "calls", "loop spans")]
    for old, new in pairs:
        for name, role in ((old, "baseline"), (new, "candidate")):
            insns, calls = analyze(funcs[name])
            rows.append((name, role, str(insns),
                         ", ".join(calls) or "-",
                         format_spans(loop_spans(funcs[name]))))
    return render_table(rows)


def file_summary_table(funcs):
    """Per-function counts plus a whole-file total row.

    The total sums instruction counts over every function parsed from
    the -S output and unions their outbound calls. It is a coarse A/B
    sanity check, not a code-size measurement (literal pools, data, and
    alignment are not included).
    """
    rows = [("function", "insns", "calls", "loop spans")]
    total_insns, all_calls = 0, []
    for name, lines in funcs.items():
        insns, calls = analyze(lines)
        total_insns += insns
        for sym in calls:
            if sym not in all_calls:
                all_calls.append(sym)
        rows.append((name, str(insns), ", ".join(calls) or "-",
                     format_spans(loop_spans(lines))))
    rows.append((f"TOTAL ({len(funcs)} functions)", str(total_insns),
                 ", ".join(all_calls) or "-", "-"))
    return render_table(rows)


def file_tags(a, b):
    """Shortest distinct labels for two source paths in across-mode output."""
    pa, pb = Path(a), Path(b)
    if pa.name != pb.name:
        return pa.name, pb.name
    ta = f"{pa.parent.name}/{pa.name}"
    tb = f"{pb.parent.name}/{pb.name}"
    if ta != tb:
        return ta, tb
    return str(a), str(b)


def report_across(fn_names, left_funcs, right_funcs, left_tag, right_tag):
    """Side-by-side + summary for the same functions from two compilations."""
    missing = sorted({f for f in fn_names
                      if f not in left_funcs or f not in right_funcs})
    if missing:
        sys.exit("error: function(s) not in asm: " + ", ".join(missing)
                 + f"; {left_tag} has: " + (", ".join(left_funcs) or "none")
                 + f"; {right_tag} has: " + (", ".join(right_funcs) or "none"))
    decorated, pairs = {}, []
    for f in fn_names:
        lt, rt = f"{f} [{left_tag}]", f"{f} [{right_tag}]"
        decorated[lt], decorated[rt] = left_funcs[f], right_funcs[f]
        pairs.append((lt, rt))
    for lt, rt in pairs:
        print(side_by_side(decorated[lt], decorated[rt], lt, rt))
        print()
    print(summary_table(pairs, decorated))


def run_across(sources, matrix, fn_names, extra_flags, tmp):
    """--across mode: same function, two compilations.

    Two source files: compare fileA's FUNC vs fileB's FUNC under each
    compiler in the matrix. One source file: compare FUNC between the
    first --cc entry (baseline) and each subsequent entry.
    """
    if len(sources) == 2:
        ran_any = False
        for cc_cmd in matrix:
            sides = []
            for src in sources:
                asm = compile_to_asm(cc_cmd, extra_flags, src, tmp)
                if asm is None:
                    break
                sides.append(extract_functions(asm))
            if len(sides) < 2:
                continue
            ran_any = True
            print(f"\n== {cc_cmd} ==\n")
            report_across(fn_names, sides[0], sides[1],
                          *file_tags(sources[0], sources[1]))
        if not ran_any:
            sys.exit("error: no usable compiler in the matrix")
        return 0

    usable = []
    for idx, cc_cmd in enumerate(matrix, start=1):
        asm = compile_to_asm(cc_cmd, extra_flags, sources[0], tmp)
        if asm is not None:
            usable.append((f"cc#{idx}", cc_cmd, extract_functions(asm)))
    if len(usable) < 2:
        sys.exit("error: --across needs at least two usable compilers "
                 "in the matrix")
    print()
    for tag, cc_cmd, _ in usable:
        print(f"{tag}: {cc_cmd}")
    base_tag, _, base_funcs = usable[0]
    for tag, _, funcs in usable[1:]:
        print(f"\n== {base_tag} vs {tag} ==\n")
        report_across(fn_names, base_funcs, funcs, base_tag, tag)
    return 0


def run_summary(sources, matrix, extra_flags, tmp):
    """No pairs to compare: whole-file summary, one block per file."""
    tags = (file_tags(*sources) if len(sources) == 2
            else [Path(sources[0]).name])
    ran_any = False
    for cc_cmd in matrix:
        sections = []
        for src in sources:
            asm = compile_to_asm(cc_cmd, extra_flags, src, tmp)
            if asm is None:
                break
            sections.append(extract_functions(asm))
        if len(sections) < len(sources):
            continue
        ran_any = True
        print(f"\n== {cc_cmd} ==")
        for tag, funcs in zip(tags, sections):
            if len(sections) > 1:
                print(f"\n-- {tag} --")
            print()
            print(file_summary_table(funcs) if funcs
                  else "(no functions found)")
    if not ran_any:
        sys.exit("error: no usable compiler in the matrix")
    return 0


def run_pairs(source, matrix, pair_specs, extra_flags, tmp):
    """--pair mode: two different functions within one compilation.

    With no --pair and no old_X/new_X functions to auto-pair, falls
    back to the whole-file summary for this compilation.
    """
    ran_any = False
    for cc_cmd in matrix:
        asm = compile_to_asm(cc_cmd, extra_flags, source, tmp)
        if asm is None:
            continue
        ran_any = True
        funcs = extract_functions(asm)
        pairs = ([tuple(p.split(":", 1)) for p in pair_specs]
                 or auto_pairs(funcs))
        if not pairs:
            print(f"\n== {cc_cmd} ==\n")
            print(file_summary_table(funcs) if funcs
                  else "(no functions found)")
            continue
        missing = sorted({n for p in pairs for n in p if n not in funcs})
        if missing:
            sys.exit("error: function(s) not in asm: "
                     + ", ".join(missing)
                     + "; functions seen: " + ", ".join(funcs))
        print(f"\n== {cc_cmd} ==\n")
        for old, new in pairs:
            print(side_by_side(funcs[old], funcs[new], old, new))
            print()
        print(summary_table(pairs, funcs))
    if not ran_any:
        sys.exit("error: no usable compiler in the matrix")
    return 0


def main(argv=None):
    argv = list(sys.argv[1:] if argv is None else argv)
    extra_flags = []
    if "--" in argv:
        cut = argv.index("--")
        argv, extra_flags = argv[:cut], argv[cut + 1:]

    parser = argparse.ArgumentParser(
        description=(__doc__ or "").partition("\n")[0],
        epilog="Flags after a bare -- are appended to every compiler "
               "invocation, e.g.: asmdiff.py h.c -- -fno-math-errno")
    parser.add_argument("sources", nargs="+", metavar="SOURCE.c",
                        help="C file to compile; give two files with "
                             "--across to compare versions of a function")
    parser.add_argument("--pair", action="append", default=[],
                        metavar="OLD:NEW",
                        help="compare two functions within one compilation "
                             "(repeatable); default: auto-pair old_X/new_X")
    parser.add_argument("--across", action="append", default=[],
                        metavar="FUNC",
                        help="compare the same function across two "
                             "compilations (repeatable): one file + two "
                             "--cc entries, or two files")
    parser.add_argument("--cc", action="append", default=[],
                        metavar="'CC FLAGS'",
                        help="compiler and flags as one string (repeatable); "
                             "default: config default target, else gcc and "
                             "clang at " + FALLBACK_FLAGS)
    parser.add_argument("--target", action="append", default=[],
                        metavar="NAME",
                        help="named [table] from the config file, resolved "
                             "to a --cc entry (repeatable; appended to the "
                             "matrix after --cc entries)")
    parser.add_argument("--config", metavar="PATH",
                        help=f"config file; default search: {CONFIG_NAME} "
                             "next to SOURCE.c, in the current directory, "
                             "then in ~/.config/")
    args = parser.parse_args(argv)

    if len(args.sources) > 2:
        parser.error("at most two source files may be given")
    if args.across and args.pair:
        parser.error("--across and --pair are mutually exclusive")
    if len(args.sources) == 2 and args.pair:
        parser.error("--pair compares within one file; "
                     "use --across FUNC for two files")
    config_path = find_config(args.config, args.sources)
    config = load_config(config_path) if config_path else None
    matrix = build_matrix(args.cc, args.target, config, config_path)
    if args.across and len(args.sources) == 1 and len(matrix) < 2:
        parser.error("--across on one file needs at least two --cc entries")
    for spec in args.pair:
        if ":" not in spec:
            parser.error(f"--pair expects OLD:NEW, got {spec!r}")

    with tempfile.TemporaryDirectory(prefix="asmdiff") as tmp:
        if args.across:
            return run_across(args.sources, matrix, args.across,
                              extra_flags, tmp)
        if len(args.sources) == 2:
            return run_summary(args.sources, matrix, extra_flags, tmp)
        return run_pairs(args.sources[0], matrix, args.pair,
                         extra_flags, tmp)


if __name__ == "__main__":
    sys.exit(main())