fluent-plugin-perf-tools 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +15 -0
- data/.rubocop.yml +26 -0
- data/.ruby-version +1 -0
- data/CHANGELOG.md +5 -0
- data/CODE_OF_CONDUCT.md +84 -0
- data/Gemfile +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +43 -0
- data/Rakefile +17 -0
- data/bin/console +15 -0
- data/bin/setup +8 -0
- data/fluent-plugin-perf-tools.gemspec +48 -0
- data/lib/fluent/plugin/in_perf_tools.rb +42 -0
- data/lib/fluent/plugin/perf_tools/cachestat.rb +65 -0
- data/lib/fluent/plugin/perf_tools/command.rb +30 -0
- data/lib/fluent/plugin/perf_tools/version.rb +9 -0
- data/lib/fluent/plugin/perf_tools.rb +11 -0
- data/perf-tools/LICENSE +339 -0
- data/perf-tools/README.md +205 -0
- data/perf-tools/bin/bitesize +1 -0
- data/perf-tools/bin/cachestat +1 -0
- data/perf-tools/bin/execsnoop +1 -0
- data/perf-tools/bin/funccount +1 -0
- data/perf-tools/bin/funcgraph +1 -0
- data/perf-tools/bin/funcslower +1 -0
- data/perf-tools/bin/functrace +1 -0
- data/perf-tools/bin/iolatency +1 -0
- data/perf-tools/bin/iosnoop +1 -0
- data/perf-tools/bin/killsnoop +1 -0
- data/perf-tools/bin/kprobe +1 -0
- data/perf-tools/bin/opensnoop +1 -0
- data/perf-tools/bin/perf-stat-hist +1 -0
- data/perf-tools/bin/reset-ftrace +1 -0
- data/perf-tools/bin/syscount +1 -0
- data/perf-tools/bin/tcpretrans +1 -0
- data/perf-tools/bin/tpoint +1 -0
- data/perf-tools/bin/uprobe +1 -0
- data/perf-tools/deprecated/README.md +1 -0
- data/perf-tools/deprecated/execsnoop-proc +150 -0
- data/perf-tools/deprecated/execsnoop-proc.8 +80 -0
- data/perf-tools/deprecated/execsnoop-proc_example.txt +46 -0
- data/perf-tools/disk/bitesize +175 -0
- data/perf-tools/examples/bitesize_example.txt +63 -0
- data/perf-tools/examples/cachestat_example.txt +58 -0
- data/perf-tools/examples/execsnoop_example.txt +153 -0
- data/perf-tools/examples/funccount_example.txt +126 -0
- data/perf-tools/examples/funcgraph_example.txt +2178 -0
- data/perf-tools/examples/funcslower_example.txt +110 -0
- data/perf-tools/examples/functrace_example.txt +341 -0
- data/perf-tools/examples/iolatency_example.txt +350 -0
- data/perf-tools/examples/iosnoop_example.txt +302 -0
- data/perf-tools/examples/killsnoop_example.txt +62 -0
- data/perf-tools/examples/kprobe_example.txt +379 -0
- data/perf-tools/examples/opensnoop_example.txt +47 -0
- data/perf-tools/examples/perf-stat-hist_example.txt +149 -0
- data/perf-tools/examples/reset-ftrace_example.txt +88 -0
- data/perf-tools/examples/syscount_example.txt +297 -0
- data/perf-tools/examples/tcpretrans_example.txt +93 -0
- data/perf-tools/examples/tpoint_example.txt +210 -0
- data/perf-tools/examples/uprobe_example.txt +321 -0
- data/perf-tools/execsnoop +292 -0
- data/perf-tools/fs/cachestat +167 -0
- data/perf-tools/images/perf-tools_2016.png +0 -0
- data/perf-tools/iolatency +296 -0
- data/perf-tools/iosnoop +296 -0
- data/perf-tools/kernel/funccount +146 -0
- data/perf-tools/kernel/funcgraph +259 -0
- data/perf-tools/kernel/funcslower +248 -0
- data/perf-tools/kernel/functrace +192 -0
- data/perf-tools/kernel/kprobe +270 -0
- data/perf-tools/killsnoop +263 -0
- data/perf-tools/man/man8/bitesize.8 +70 -0
- data/perf-tools/man/man8/cachestat.8 +111 -0
- data/perf-tools/man/man8/execsnoop.8 +104 -0
- data/perf-tools/man/man8/funccount.8 +76 -0
- data/perf-tools/man/man8/funcgraph.8 +166 -0
- data/perf-tools/man/man8/funcslower.8 +129 -0
- data/perf-tools/man/man8/functrace.8 +123 -0
- data/perf-tools/man/man8/iolatency.8 +116 -0
- data/perf-tools/man/man8/iosnoop.8 +169 -0
- data/perf-tools/man/man8/killsnoop.8 +100 -0
- data/perf-tools/man/man8/kprobe.8 +162 -0
- data/perf-tools/man/man8/opensnoop.8 +113 -0
- data/perf-tools/man/man8/perf-stat-hist.8 +111 -0
- data/perf-tools/man/man8/reset-ftrace.8 +49 -0
- data/perf-tools/man/man8/syscount.8 +96 -0
- data/perf-tools/man/man8/tcpretrans.8 +93 -0
- data/perf-tools/man/man8/tpoint.8 +140 -0
- data/perf-tools/man/man8/uprobe.8 +168 -0
- data/perf-tools/misc/perf-stat-hist +223 -0
- data/perf-tools/net/tcpretrans +311 -0
- data/perf-tools/opensnoop +280 -0
- data/perf-tools/syscount +192 -0
- data/perf-tools/system/tpoint +232 -0
- data/perf-tools/tools/reset-ftrace +123 -0
- data/perf-tools/user/uprobe +390 -0
- metadata +349 -0
data/perf-tools/examples/reset-ftrace_example.txt
@@ -0,0 +1,88 @@

Demonstrations of reset-ftrace, the Linux ftrace tool.


You will probably never need this tool. If you kill -9 an ftrace-based tool,
leaving the kernel in a tracing enabled state, you could try using this tool
to reset ftrace and disable tracing. Make sure no other ftrace sessions are
in use on your system, or it will kill those.

Here's an example:

# ./opensnoop
Tracing open()s. Ctrl-C to end.
ERROR: ftrace may be in use by PID 2197 /var/tmp/.ftrace-lock

I tried to run opensnoop, but there's a lock file for PID 2197. Checking if it
exists:

# ps -fp 2197
UID        PID  PPID  C STIME TTY          TIME CMD
#

No.

I also know that no one is using ftrace on this system. So I'll use reset-ftrace
to clean up this lock file and ftrace state:

# ./reset-ftrace
ERROR: ftrace lock (/var/tmp/.ftrace-lock) exists. It shows ftrace may be in use by PID 2197.
Double check to see if that PID is still active. If not, consider using -f to force a reset. Exiting.

... except it's complaining about the lock file too. I'm already sure that this
PID doesn't exist, so I'll add the -f option:

# ./reset-ftrace -f
Reseting ftrace state...

current_tracer, before:
     1  nop
current_tracer, after:
     1  nop

set_ftrace_filter, before:
     1  #### all functions enabled ####
set_ftrace_filter, after:
     1  #### all functions enabled ####

set_ftrace_pid, before:
     1  no pid
set_ftrace_pid, after:
     1  no pid

kprobe_events, before:
kprobe_events, after:

Done.

The output shows what has been reset, including the before and after state of
these files.

Now I can run tracing tools again, such as iosnoop:

# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM         PID    TYPE DEV      BLOCK        BYTES     LATms
supervise    1689   W    202,1    17039664     4096       0.58
supervise    1689   W    202,1    17039672     4096       0.47
supervise    1694   W    202,1    17039744     4096       0.98
supervise    1694   W    202,1    17039752     4096       0.74
supervise    1684   W    202,1    17039760     4096       0.63
[...]

Fixed.

Note that reset-ftrace currently only resets a few methods of enabling
tracing, such as set_ftrace_filter and kprobe_events. Static tracepoints could
be enabled individually, and this script currently doesn't find and disable
those.
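If you need to go further, the same state can be inspected and reset by hand
via the tracefs files that reset-ftrace manages. A minimal sketch (assuming
tracefs is mounted at the usual /sys/kernel/debug/tracing; the exact set of
files varies by kernel version):

# cd /sys/kernel/debug/tracing
# echo nop > current_tracer      # restore the default (no) tracer
# echo > set_ftrace_filter       # clear any function filter (trace all)
# echo > kprobe_events           # remove dynamic kprobes (fails if still in use)
# echo > trace                   # empty the trace buffer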

Use -h to print the USAGE message:

# ./reset-ftrace -h
USAGE: reset-ftrace [-fhq]
       -f              # force: delete ftrace lock file
       -q              # quiet: reset, but say nothing
       -h              # this usage message
  eg,
       reset-ftrace    # disable active ftrace session

data/perf-tools/examples/syscount_example.txt
@@ -0,0 +1,297 @@

Demonstrations of syscount, the Linux perf_events version.


The first mode I use is "-c", where it behaves like "strace -c", but for the
entire system (all processes) and with much lower overhead:

# ./syscount -c
Tracing... Ctrl-C to end.
^Csleep: Interrupt
SYSCALL              COUNT
accept                   1
getsockopt               1
setsid                   1
chdir                    2
getcwd                   2
getpeername              2
getsockname              2
setgid                   2
setgroups                2
setpgid                  2
setuid                   2
getpgrp                  4
getpid                   4
rename                   4
setitimer                4
setrlimit                4
setsockopt               4
statfs                   4
set_tid_address          5
readlink                 6
set_robust_list          6
nanosleep                7
newuname                 7
faccessat                8
futex                   10
clock_gettime           16
newlstat                20
pipe                    20
epoll_wait              24
getrlimit               25
socket                  27
connect                 29
exit_group              30
getppid                 31
dup2                    34
wait4                   51
fcntl                   58
getegid                 72
getgid                  72
getuid                  72
geteuid                 75
perf_event_open        100
munmap                 121
gettimeofday           216
access                 266
ioctl                  340
poll                   348
sendto                 374
mprotect               414
brk                    597
rt_sigaction           632
recvfrom               664
lseek                  749
newfstatat            2922
openat                2925
newfstat              3229
newstat               4334
open                  4534
fchdir                5845
getdents              5854
read                  7673
close                 7728
select                9633
rt_sigprocmask       19886
write                34581

While tracing, the write() syscall was executed 34,581 times.

This mode uses "perf stat" to count the syscalls:* tracepoints in-kernel.
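Under the hood this is roughly equivalent to counting every syscall entry
tracepoint with perf stat directly. A minimal sketch (assuming a kernel that
exposes the syscalls:* tracepoints; counts stay in-kernel until printed):

# perf stat -e 'syscalls:sys_enter_*' -a sleep 5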

You can add a duration (-d) and limit the number shown (-t):

# ./syscount -cd 5 -t 10
Tracing for 5 seconds. Top 10 only...
SYSCALL              COUNT
gettimeofday          1009
write                 3583
read                  8174
openat               21550
newfstat             21558
open                 21824
fchdir               43098
getdents             43106
close                43694
newfstatat          110936

While tracing for 5 seconds, the newfstatat() syscall was executed 110,936
times.


Without the -c, syscount shows syscalls by process name:

# ./syscount -d 5 -t 10
Tracing for 5 seconds. Top 10 only...
[ perf record: Woken up 66 times to write data ]
[ perf record: Captured and wrote 16.513 MB perf.data (~721455 samples) ]
COMM                 COUNT
stat                   450
perl                   537
catalina.sh           1700
postgres              2094
run                   2362
:6946                 4764
ps                    5961
sshd                 45796
find                 61039

So processes named "find" called 61,039 syscalls during the 5 seconds of
tracing.

Note that this mode writes a perf.data file. This is higher overhead for a
few reasons:

- all data is passed from kernel to user space, which eats CPU for the memory
  copy. Note that it is buffered in an efficient way by perf_events, which
  wakes up and context switches only a small number of times: 66 in this case,
  to hand 16 Mbytes of trace data to user space.
- data is post-processed in user space, eating more CPU.
- data is stored on the file system in the perf.data file, consuming available
  storage.

This will be improved in future kernels, but it is difficult to improve much
further on existing ones. For example, using a pipe to "perf script" instead
of writing perf.data can have issues with feedback loops, where perf traces
itself; this syscount version goes to lengths to avoid tracing its own perf,
and the trip via perf.data is necessary right now with existing functionality
in older kernels.
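The underlying mechanism in this mode is a system-wide perf record of the raw
syscall entry tracepoint, post-processed to count by process name. A minimal
sketch of the same idea (assuming the raw_syscalls:sys_enter tracepoint; the
awk field position is an approximation of perf script's default output, whose
first field is the process name):

# perf record -e raw_syscalls:sys_enter -a -- sleep 5
# perf script | awk '{ print $1 }' | sort | uniq -c | sort -n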

Running without options shows syscalls by process name until Ctrl-C:

# ./syscount
Tracing... Ctrl-C to end.
^C[ perf record: Woken up 39 times to write data ]
[ perf record: Captured and wrote 9.644 MB perf.data (~421335 samples) ]
COMM                 COUNT
apache2                  8
apacheLogParser         13
platformservice         16
snmpd                   16
ntpd                    21
multilog                66
supervise               84
dirname                102
echo                   102
svstat                 108
cut                    111
bash                   113
grep                   132
xargs                  132
redis-server           190
sed                    192
setuidgid              294
stat                   450
perl                   537
catalina.sh           1275
postgres              1736
run                   2352
:7396                 4527
ps                    5925
sshd                 20154
find                 28700

Note again it is writing a perf.data file to do this.


The -v option adds process IDs:

# ./syscount -v
Tracing... Ctrl-C to end.
^C[ perf record: Woken up 48 times to write data ]
[ perf record: Captured and wrote 12.114 MB perf.data (~529276 samples) ]
PID    COMM                 COUNT
3599   apacheLogParser          3
7977   xargs                    3
7982   supervise                3
7993   xargs                    3
3575   apache2                  4
1311   ntpd                     6
3135   postgres                 6
3600   apacheLogParser          6
3210   platformservice          8
6503   sshd                     9
7978   :7978                    9
7994   run                      9
7968   :7968                   11
7984   run                     11
1451   snmpd                   16
3040   svscan                  17
3066   postgres                17
3133   postgres                24
3134   postgres                24
3136   postgres                24
3061   multilog                29
3055   supervise               30
7979   bash                    31
7977   echo                    34
7981   dirname                 34
7993   echo                    34
7968   svstat                  36
7984   svstat                  36
7975   cut                     37
7991   cut                     37
9857   bash                    37
7967   :7967                   40
7983   run                     40
7972   :7972                   41
7976   xargs                   41
7988   run                     41
7992   xargs                   41
7969   :7969                   42
7976   :7976                   42
7985   run                     42
7992   run                     42
7973   :7973                   43
7974   :7974                   43
7989   run                     43
7990   run                     43
7973   grep                    44
7989   grep                    44
7975   :7975                   45
7991   run                     45
7970   :7970                   51
7986   run                     51
7981   catalina.sh             52
7974   sed                     64
7990   sed                     64
3455   postgres                66
7971   :7971                   66
7987   run                     66
7966   :7966                   96
7966   setuidgid               98
3064   redis-server           110
7970   stat                   150
7986   stat                   150
7969   perl                   179
7985   perl                   179
7982   run                    341
7966   catalina.sh            373
7980   postgres               432
7972   ps                    1971
7988   ps                    1983
9832   sshd                 37511
7979   find                 51040

Once you've found a process ID of interest, you can use "-c" and "-p PID" to
show syscall names. This also switches to "perf stat" mode for in-kernel
counts, and lower overhead:

# ./syscount -cp 7979
Tracing PID 7979... Ctrl-C to end.
^CSYSCALL              COUNT
brk                     10
newfstat              2171
open                  2171
newfstatat            2175
openat                2175
close                 4346
fchdir                4346
getdents              4351
write                25482

So the most frequent syscall by PID 7979 was write().
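In "-c -p PID" mode the counting again stays in-kernel via perf stat, now
scoped to a single process. A minimal sketch of the direct equivalent (PID
7979 taken from the example above; runs until Ctrl-C):

# perf stat -e 'syscalls:sys_enter_*' -p 7979
^C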

Use -h to print the USAGE message:

# ./syscount -h
USAGE: syscount [-chv] [-t top] {-p PID|-d seconds|command}
       syscount            # count by process name
       -c                  # show counts by syscall name
       -h                  # this usage message
       -v                  # verbose: shows PID
       -p PID              # trace this PID only
       -d seconds          # duration of trace
       -t num              # show top number only
       command             # run and trace this command
  eg,
       syscount            # syscalls by process name
       syscount -c         # syscalls by syscall name
       syscount -d 5       # trace for 5 seconds
       syscount -cp 923    # syscall names for PID 923
       syscount -c ls      # syscall names for "ls"

See the man page and example file for more info.

data/perf-tools/examples/tcpretrans_example.txt
@@ -0,0 +1,93 @@

Demonstrations of tcpretrans, the Linux ftrace version.


Tracing TCP retransmits on a busy server:

# ./tcpretrans
TIME     PID    LADDR:LPORT          -- RADDR:RPORT          STATE
05:16:44 3375   10.150.18.225:53874  R> 10.105.152.3:6001    ESTABLISHED
05:16:44 3375   10.150.18.225:53874  R> 10.105.152.3:6001    ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:54 4028   10.150.18.225:6002   R> 10.150.30.249:1710   ESTABLISHED
05:16:55 0      10.150.18.225:47115  R> 10.71.171.158:6001   ESTABLISHED
05:16:58 0      10.150.18.225:44388  R> 10.103.130.120:6001  ESTABLISHED
05:16:58 0      10.150.18.225:44388  R> 10.103.130.120:6001  ESTABLISHED
05:16:58 0      10.150.18.225:44388  R> 10.103.130.120:6001  ESTABLISHED
05:16:59 0      10.150.18.225:56086  R> 10.150.32.107:6001   ESTABLISHED
05:16:59 0      10.150.18.225:56086  R> 10.150.32.107:6001   ESTABLISHED
^C
Ending tracing...

This shows TCP retransmits by dynamically tracing the kernel function that does
the retransmit. This is a low overhead approach.
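The same dynamic tracing can be done by hand with ftrace's kprobe interface,
which is essentially what this tool automates. A sketch (the probed function,
tcp_retransmit_skb here, varies by kernel version, and the tool additionally
decodes addresses and state from the socket):

# echo 'p:myretrans tcp_retransmit_skb' >> /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretrans/enable
# cat /sys/kernel/debug/tracing/trace_pipe
...
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/myretrans/enable
# echo '-:myretrans' >> /sys/kernel/debug/tracing/kprobe_events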

The PID may or may not make sense: it's showing the PID that was on-CPU;
however, retransmits are often timer-based, where it's the kernel that is
on-CPU.

The STATE column shows the TCP state for the socket performing the retransmit.
The "--" column is the packet type: "R>" for retransmit.


Kernel stack traces can be included with -s, which may show the type of
retransmit:

# ./tcpretrans -s
TIME     PID    LADDR:LPORT          -- RADDR:RPORT          STATE
06:21:10 19516  10.144.107.151:22    R> 10.13.106.251:32167  ESTABLISHED
 => tcp_fastretrans_alert
 => tcp_ack
 => tcp_rcv_established
 => tcp_v4_do_rcv
 => tcp_v4_rcv
 => ip_local_deliver_finish
 => ip_local_deliver
 => ip_rcv_finish
 => ip_rcv
 => __netif_receive_skb
 => netif_receive_skb
 => handle_incoming_queue
 => xennet_poll
 => net_rx_action
 => __do_softirq
 => call_softirq
 => do_softirq
 => irq_exit
 => xen_evtchn_do_upcall
 => xen_do_hypervisor_callback

This looks like a fast retransmit (inclusion of tcp_fastretrans_alert(), and
being based on receiving an ACK, rather than a timer).


The -l option will include TCP tail loss probe events (TLP; see
http://lwn.net/Articles/542642/). Eg:

# ./tcpretrans -l
TIME     PID    LADDR:LPORT          -- RADDR:RPORT          STATE
21:56:06 0      10.100.155.200:22    R> 10.10.237.72:18554   LAST_ACK
21:56:08 0      10.100.155.200:22    R> 10.10.237.72:18554   LAST_ACK
21:56:10 16452  10.100.155.200:22    R> 10.10.237.72:18554   LAST_ACK
21:56:10 0      10.100.155.200:22    L> 10.10.237.72:46408   LAST_ACK
21:56:10 0      10.100.155.200:22    R> 10.10.237.72:46408   LAST_ACK
21:56:12 0      10.100.155.200:22    R> 10.10.237.72:46408   LAST_ACK
21:56:13 0      10.100.155.200:22    R> 10.10.237.72:46408   LAST_ACK
^C
Ending tracing...

Look for "L>" in the type column ("--") for TLP events.


Use -h to print the USAGE message:

# ./tcpretrans -h
USAGE: tcpretrans [-hs]
       -h           # help message
       -s           # print stack traces
  eg,
       tcpretrans   # trace TCP retransmits

data/perf-tools/examples/tpoint_example.txt
@@ -0,0 +1,210 @@

Demonstrations of tpoint, the Linux ftrace version.


Let's trace block:block_rq_issue, to see block device (disk) I/O requests:

# ./tpoint block:block_rq_issue
Tracing block:block_rq_issue. Ctrl-C to end.
   supervise-1692  [001] d... 7269912.982162: block_rq_issue: 202,1 W 0 () 17039656 + 8 [supervise]
   supervise-1696  [000] d... 7269912.982243: block_rq_issue: 202,1 W 0 () 12862264 + 8 [supervise]
       cksum-12994 [000] d... 7269913.317924: block_rq_issue: 202,1 R 0 () 9357056 + 72 [cksum]
       cksum-12994 [000] d... 7269913.319013: block_rq_issue: 202,1 R 0 () 2977536 + 144 [cksum]
       cksum-12994 [000] d... 7269913.320217: block_rq_issue: 202,1 R 0 () 2986240 + 216 [cksum]
       cksum-12994 [000] d... 7269913.321677: block_rq_issue: 202,1 R 0 () 620344 + 56 [cksum]
       cksum-12994 [001] d... 7269913.329309: block_rq_issue: 202,1 R 0 () 9107912 + 88 [cksum]
       cksum-12994 [001] d... 7269913.340133: block_rq_issue: 202,1 R 0 () 3147008 + 248 [cksum]
       cksum-12994 [001] d... 7269913.354551: block_rq_issue: 202,1 R 0 () 11583488 + 256 [cksum]
       cksum-12994 [001] d... 7269913.379904: block_rq_issue: 202,1 R 0 () 11583744 + 256 [cksum]
[...]
^C
Ending tracing...

Great, that was easy!
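What tpoint automates here is the tracefs tracepoint interface, so the same
trace can be done by hand. A sketch (assuming tracefs is mounted at the usual
/sys/kernel/debug/tracing):

# cd /sys/kernel/debug/tracing
# echo 1 > events/block/block_rq_issue/enable
# cat trace_pipe
...
^C
# echo 0 > events/block/block_rq_issue/enable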

perf_events can do this as well, and is better in many ways, including a more
efficient buffering strategy, and multi-user access. It's not that easy to do
this one-liner in perf_events, however. An equivalent for recent kernels is:

perf record --no-buffer -e block:block_rq_issue -a -o - | PAGER=cat stdbuf -oL perf script -i -

On older kernels, use -D instead of --no-buffer. Even better is to set the
buffer page size to a sufficient grouping (using -m), to minimize overheads,
at the expense of liveliness of updates. Note that stack traces (-g) don't
work on my systems with this perf one-liner; however, they do work with
tpoint -s.


Column headings can be printed using -H:

# ./tpoint -H block:block_rq_issue
Tracing block:block_rq_issue. Ctrl-C to end.
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0   #P:2
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
   supervise-1697  [000] d... 7270545.340856: block_rq_issue: 202,1 W 0 () 12862464 + 8 [supervise]
   supervise-1697  [000] d... 7270545.341256: block_rq_issue: 202,1 W 0 () 12862472 + 8 [supervise]
   supervise-1690  [000] d... 7270545.342363: block_rq_issue: 202,1 W 0 () 17040368 + 8 [supervise]
[...]

These headings are also documented in the Linux kernel source under
Documentation/trace/ftrace.txt.


How about stack traces for those block_rq_issue events? Adding -s:

# ./tpoint -s block:block_rq_issue
Tracing block:block_rq_issue. Ctrl-C to end.
   supervise-1691  [000] d... 7269511.079179: block_rq_issue: 202,1 W 0 () 17040232 + 8 [supervise]
   supervise-1691  [000] d... 7269511.079188: <stack trace>
 => blk_peek_request
 => do_blkif_request
 => __blk_run_queue
 => queue_unplugged
 => blk_flush_plug_list
 => blk_finish_plug
 => ext4_writepages
 => do_writepages
 => __filemap_fdatawrite_range
 => filemap_flush
 => ext4_alloc_da_blocks
 => ext4_rename
 => vfs_rename
 => SYSC_renameat2
 => SyS_renameat2
 => SyS_rename
 => system_call_fastpath
       cksum-7428  [000] d... 7269511.331778: block_rq_issue: 202,1 R 0 () 9006848 + 208 [cksum]
       cksum-7428  [000] d... 7269511.331784: <stack trace>
 => blk_peek_request
 => do_blkif_request
 => __blk_run_queue
 => queue_unplugged
 => blk_flush_plug_list
 => blk_finish_plug
 => __do_page_cache_readahead
 => ondemand_readahead
 => page_cache_async_readahead
 => generic_file_read_iter
 => new_sync_read
 => vfs_read
 => SyS_read
 => system_call_fastpath
       cksum-7428  [000] d... 7269511.332631: block_rq_issue: 202,1 R 0 () 620992 + 200 [cksum]
       cksum-7428  [000] d... 7269511.332639: <stack trace>
 => blk_peek_request
 => do_blkif_request
 => __blk_run_queue
 => queue_unplugged
 => blk_flush_plug_list
 => blk_finish_plug
 => __do_page_cache_readahead
 => ondemand_readahead
 => page_cache_sync_readahead
 => generic_file_read_iter
 => new_sync_read
 => vfs_read
 => SyS_read
 => system_call_fastpath
^C
Ending tracing...

Easy. Now I can read the ancestry to understand what actually led to issuing
a block device (disk) I/O.
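With the raw tracefs interface, per-event kernel stacks come from the
stacktrace trace option, which is what -s toggles. A sketch (same assumed
tracefs mount point as earlier):

# cd /sys/kernel/debug/tracing
# echo 1 > options/stacktrace
# echo 1 > events/block/block_rq_issue/enable
# cat trace_pipe
...
# echo 0 > options/stacktrace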


Here's insertion onto the block I/O queue (better matches processes):

# ./tpoint -s block:block_rq_insert
Tracing block:block_rq_insert. Ctrl-C to end.
       cksum-11908 [000] d... 7269834.882517: block_rq_insert: 202,1 R 0 () 736304 + 256 [cksum]
       cksum-11908 [000] d... 7269834.882528: <stack trace>
 => __elv_add_request
 => blk_flush_plug_list
 => blk_finish_plug
 => __do_page_cache_readahead
 => ondemand_readahead
 => page_cache_sync_readahead
 => generic_file_read_iter
 => new_sync_read
 => vfs_read
 => SyS_read
 => system_call_fastpath
[...]


You can also add tracepoint filters. To see what variables you can use, use -v:

# ./tpoint -v block:block_rq_issue
name: block_rq_issue
ID: 942
format:
    field:unsigned short common_type;           offset:0;  size:2;  signed:0;
    field:unsigned char common_flags;           offset:2;  size:1;  signed:0;
    field:unsigned char common_preempt_count;   offset:3;  size:1;  signed:0;
    field:int common_pid;                       offset:4;  size:4;  signed:1;

    field:dev_t dev;                            offset:8;  size:4;  signed:0;
    field:sector_t sector;                      offset:16; size:8;  signed:0;
    field:unsigned int nr_sector;               offset:24; size:4;  signed:0;
    field:unsigned int bytes;                   offset:28; size:4;  signed:0;
    field:char rwbs[8];                         offset:32; size:8;  signed:1;
    field:char comm[16];                        offset:40; size:16; signed:1;
    field:__data_loc char[] cmd;                offset:56; size:4;  signed:1;

print fmt: "%d,%d %s %u (%s) %llu + %u [%s]", ((unsigned int) ((REC->dev) >> 20)), ((unsigned int) ((REC->dev) & ((1U << 20) - 1))), REC->rwbs, REC->bytes, __get_str(cmd), (unsigned long long)REC->sector, REC->nr_sector, REC->comm


Now I'll add a filter to check that the rwbs field (I/O type) includes an "R",
making it a read:

# ./tpoint -s block:block_rq_insert 'rwbs ~ "*R*"'
       cksum-11908 [000] d... 7269839.919098: block_rq_insert: 202,1 R 0 () 736560 + 136 [cksum]
       cksum-11908 [000] d... 7269839.919107: <stack trace>
 => __elv_add_request
 => blk_flush_plug_list
 => blk_finish_plug
 => __do_page_cache_readahead
 => ondemand_readahead
 => page_cache_async_readahead
 => generic_file_read_iter
 => new_sync_read
 => vfs_read
 => SyS_read
 => system_call_fastpath
[...]
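The filter string is passed through to the tracepoint's tracefs filter file,
so the same predicate works directly (a sketch, run from
/sys/kernel/debug/tracing; "~" is ftrace's glob-style string match):

# echo 'rwbs ~ "*R*"' > events/block/block_rq_insert/filter
# echo 1 > events/block/block_rq_insert/enable
# cat trace_pipe
...
# echo 0 > events/block/block_rq_insert/filter   # writing 0 clears the filter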


Use -h to print the USAGE message:

# ./tpoint -h
USAGE: tpoint [-hHsv] [-d secs] [-p PID] [-L TID] tracepoint [filter]
       tpoint -l
       -d seconds      # trace duration, and use buffers
       -p PID          # PID to match on events
       -L TID          # thread id to match on events
       -v              # view format file (don't trace)
       -H              # include column headers
       -l              # list all tracepoints
       -s              # show kernel stack traces
       -h              # this usage message

Note that these examples may need modification to match your kernel
version's function names and platform's register usage.
  eg,
       tpoint -l | grep open             # find tracepoints containing "open"
       tpoint syscalls:sys_enter_open    # trace open() syscall entry
       tpoint block:block_rq_issue       # trace block I/O issue
       tpoint -s block:block_rq_issue    # show kernel stacks

See the man page and example file for more info.