fluent-plugin-perf-tools 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +15 -0
- data/.rubocop.yml +26 -0
- data/.ruby-version +1 -0
- data/CHANGELOG.md +5 -0
- data/CODE_OF_CONDUCT.md +84 -0
- data/Gemfile +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +43 -0
- data/Rakefile +17 -0
- data/bin/console +15 -0
- data/bin/setup +8 -0
- data/fluent-plugin-perf-tools.gemspec +48 -0
- data/lib/fluent/plugin/in_perf_tools.rb +42 -0
- data/lib/fluent/plugin/perf_tools/cachestat.rb +65 -0
- data/lib/fluent/plugin/perf_tools/command.rb +30 -0
- data/lib/fluent/plugin/perf_tools/version.rb +9 -0
- data/lib/fluent/plugin/perf_tools.rb +11 -0
- data/perf-tools/LICENSE +339 -0
- data/perf-tools/README.md +205 -0
- data/perf-tools/bin/bitesize +1 -0
- data/perf-tools/bin/cachestat +1 -0
- data/perf-tools/bin/execsnoop +1 -0
- data/perf-tools/bin/funccount +1 -0
- data/perf-tools/bin/funcgraph +1 -0
- data/perf-tools/bin/funcslower +1 -0
- data/perf-tools/bin/functrace +1 -0
- data/perf-tools/bin/iolatency +1 -0
- data/perf-tools/bin/iosnoop +1 -0
- data/perf-tools/bin/killsnoop +1 -0
- data/perf-tools/bin/kprobe +1 -0
- data/perf-tools/bin/opensnoop +1 -0
- data/perf-tools/bin/perf-stat-hist +1 -0
- data/perf-tools/bin/reset-ftrace +1 -0
- data/perf-tools/bin/syscount +1 -0
- data/perf-tools/bin/tcpretrans +1 -0
- data/perf-tools/bin/tpoint +1 -0
- data/perf-tools/bin/uprobe +1 -0
- data/perf-tools/deprecated/README.md +1 -0
- data/perf-tools/deprecated/execsnoop-proc +150 -0
- data/perf-tools/deprecated/execsnoop-proc.8 +80 -0
- data/perf-tools/deprecated/execsnoop-proc_example.txt +46 -0
- data/perf-tools/disk/bitesize +175 -0
- data/perf-tools/examples/bitesize_example.txt +63 -0
- data/perf-tools/examples/cachestat_example.txt +58 -0
- data/perf-tools/examples/execsnoop_example.txt +153 -0
- data/perf-tools/examples/funccount_example.txt +126 -0
- data/perf-tools/examples/funcgraph_example.txt +2178 -0
- data/perf-tools/examples/funcslower_example.txt +110 -0
- data/perf-tools/examples/functrace_example.txt +341 -0
- data/perf-tools/examples/iolatency_example.txt +350 -0
- data/perf-tools/examples/iosnoop_example.txt +302 -0
- data/perf-tools/examples/killsnoop_example.txt +62 -0
- data/perf-tools/examples/kprobe_example.txt +379 -0
- data/perf-tools/examples/opensnoop_example.txt +47 -0
- data/perf-tools/examples/perf-stat-hist_example.txt +149 -0
- data/perf-tools/examples/reset-ftrace_example.txt +88 -0
- data/perf-tools/examples/syscount_example.txt +297 -0
- data/perf-tools/examples/tcpretrans_example.txt +93 -0
- data/perf-tools/examples/tpoint_example.txt +210 -0
- data/perf-tools/examples/uprobe_example.txt +321 -0
- data/perf-tools/execsnoop +292 -0
- data/perf-tools/fs/cachestat +167 -0
- data/perf-tools/images/perf-tools_2016.png +0 -0
- data/perf-tools/iolatency +296 -0
- data/perf-tools/iosnoop +296 -0
- data/perf-tools/kernel/funccount +146 -0
- data/perf-tools/kernel/funcgraph +259 -0
- data/perf-tools/kernel/funcslower +248 -0
- data/perf-tools/kernel/functrace +192 -0
- data/perf-tools/kernel/kprobe +270 -0
- data/perf-tools/killsnoop +263 -0
- data/perf-tools/man/man8/bitesize.8 +70 -0
- data/perf-tools/man/man8/cachestat.8 +111 -0
- data/perf-tools/man/man8/execsnoop.8 +104 -0
- data/perf-tools/man/man8/funccount.8 +76 -0
- data/perf-tools/man/man8/funcgraph.8 +166 -0
- data/perf-tools/man/man8/funcslower.8 +129 -0
- data/perf-tools/man/man8/functrace.8 +123 -0
- data/perf-tools/man/man8/iolatency.8 +116 -0
- data/perf-tools/man/man8/iosnoop.8 +169 -0
- data/perf-tools/man/man8/killsnoop.8 +100 -0
- data/perf-tools/man/man8/kprobe.8 +162 -0
- data/perf-tools/man/man8/opensnoop.8 +113 -0
- data/perf-tools/man/man8/perf-stat-hist.8 +111 -0
- data/perf-tools/man/man8/reset-ftrace.8 +49 -0
- data/perf-tools/man/man8/syscount.8 +96 -0
- data/perf-tools/man/man8/tcpretrans.8 +93 -0
- data/perf-tools/man/man8/tpoint.8 +140 -0
- data/perf-tools/man/man8/uprobe.8 +168 -0
- data/perf-tools/misc/perf-stat-hist +223 -0
- data/perf-tools/net/tcpretrans +311 -0
- data/perf-tools/opensnoop +280 -0
- data/perf-tools/syscount +192 -0
- data/perf-tools/system/tpoint +232 -0
- data/perf-tools/tools/reset-ftrace +123 -0
- data/perf-tools/user/uprobe +390 -0
- metadata +349 -0
data/perf-tools/examples/iolatency_example.txt
@@ -0,0 +1,350 @@
Demonstrations of iolatency, the Linux ftrace version.


Here's a busy system doing over 4k disk IOPS:

# ./iolatency
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 4381     |######################################|
       1 -> 2       : 9        |#                                     |
       2 -> 4       : 5        |#                                     |
       4 -> 8       : 0        |                                      |
       8 -> 16      : 1        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 4053     |######################################|
       1 -> 2       : 18       |#                                     |
       2 -> 4       : 9        |#                                     |
       4 -> 8       : 2        |#                                     |
       8 -> 16      : 1        |#                                     |
      16 -> 32      : 1        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 4658     |######################################|
       1 -> 2       : 9        |#                                     |
       2 -> 4       : 2        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 4298     |######################################|
       1 -> 2       : 17       |#                                     |
       2 -> 4       : 10       |#                                     |
       4 -> 8       : 1        |#                                     |
       8 -> 16      : 1        |#                                     |
^C
Ending tracing...

Disk I/O latency is usually between 0 and 1 milliseconds, as this system uses
SSDs. There are occasional outliers, up to the 16->32 ms range.

Identifying outliers like these is difficult from iostat(1) alone, which at
the same time reported:

# iostat 1
[...]
avg-cpu: %user %nice %system %iowait %steal %idle
0.53 0.00 1.05 46.84 0.53 51.05

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 0.00 0.00 28.00 0.00 112.00 8.00 0.02 0.71 0.00 0.71 0.29 0.80
xvdb 0.00 0.00 2134.00 0.00 18768.00 0.00 17.59 0.51 0.24 0.24 0.00 0.23 50.00
xvdc 0.00 0.00 2088.00 0.00 18504.00 0.00 17.72 0.47 0.22 0.22 0.00 0.22 46.40
md0 0.00 0.00 4222.00 0.00 37256.00 0.00 17.65 0.00 0.00 0.00 0.00 0.00 0.00

I/O latency ("await") averages 0.24 and 0.22 ms for our busy disks, but this
output doesn't show that it is occasionally much higher.

To get more information on these I/O, try the iosnoop(8) tool.


The -Q option includes the block I/O queued time, by tracing based on
block_rq_insert instead of block_rq_issue:

# ./iolatency -Q
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1913     |######################################|
       1 -> 2       : 438      |#########                             |
       2 -> 4       : 100      |##                                    |
       4 -> 8       : 145      |###                                   |
       8 -> 16      : 43       |#                                     |
      16 -> 32      : 43       |#                                     |
      32 -> 64      : 1        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 2360     |######################################|
       1 -> 2       : 132      |###                                   |
       2 -> 4       : 72       |##                                    |
       4 -> 8       : 14       |#                                     |
       8 -> 16      : 1        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 2138     |######################################|
       1 -> 2       : 496      |#########                             |
       2 -> 4       : 81       |##                                    |
       4 -> 8       : 40       |#                                     |
       8 -> 16      : 1        |#                                     |
      16 -> 32      : 2        |#                                     |
^C
Ending tracing...

I use this along with the default mode to identify problems of load (queueing)
vs problems of the device, which is shown by default.
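
For example, a back-to-back comparison (a sketch; the interval and count
arguments are from the USAGE message at the end of this file) could be:

# ./iolatency 10 1        # one 10 second summary: device time only
# ./iolatency -Q 10 1     # one 10 second summary: includes queued time

If the -Q histograms are much heavier in the higher ranges than the default
ones, the extra time is being spent on the queue, not on the device.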


Here's a more interesting system. This is doing a mixed read/write workload,
and has a pretty awful latency distribution:

# ./iolatency 5 3
Tracing block I/O. Output every 5 seconds.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 2809     |######################################|
       1 -> 2       : 32       |#                                     |
       2 -> 4       : 14       |#                                     |
       4 -> 8       : 6        |#                                     |
       8 -> 16      : 7        |#                                     |
      16 -> 32      : 14       |#                                     |
      32 -> 64      : 39       |#                                     |
      64 -> 128     : 1556     |######################                |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 3027     |######################################|
       1 -> 2       : 19       |#                                     |
       2 -> 4       : 6        |#                                     |
       4 -> 8       : 5        |#                                     |
       8 -> 16      : 3        |#                                     |
      16 -> 32      : 7        |#                                     |
      32 -> 64      : 14       |#                                     |
      64 -> 128     : 540      |#######                               |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 2939     |######################################|
       1 -> 2       : 25       |#                                     |
       2 -> 4       : 15       |#                                     |
       4 -> 8       : 2        |#                                     |
       8 -> 16      : 3        |#                                     |
      16 -> 32      : 7        |#                                     |
      32 -> 64      : 17       |#                                     |
      64 -> 128     : 936      |#############                         |

Ending tracing...

It's multi-modal, with most I/O taking 0 to 1 milliseconds, then many between
64 and 128 milliseconds. This is how it looks in iostat:

# iostat -x 1

avg-cpu: %user %nice %system %iowait %steal %idle
0.52 0.00 12.37 32.99 0.00 54.12

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvdap1 0.00 12.00 0.00 156.00 0.00 19968.00 256.00 52.17 184.38 0.00 184.38 2.33 36.40
xvdb 0.00 0.00 298.00 0.00 2732.00 0.00 18.34 0.04 0.12 0.12 0.00 0.11 3.20
xvdc 0.00 0.00 297.00 0.00 2712.00 0.00 18.26 0.08 0.27 0.27 0.00 0.24 7.20
md0 0.00 0.00 595.00 0.00 5444.00 0.00 18.30 0.00 0.00 0.00 0.00 0.00 0.00

Fortunately, it turns out that the high latency is to xvdap1, which is for files
from a low priority application (processing and writing log files). A high
priority application is reading from the other disks, xvdb and xvdc.
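
The "202,1" style string that -d expects is the device's major and minor number
pair. One way to find it (a sketch; /dev/xvda1 is assumed as the device node
name here) is from the device node or from /proc/partitions:

# ls -l /dev/xvda1            # major, minor appear where the file size would
# grep xvd /proc/partitions   # columns are: major minor #blocks name

On this system that corresponds to 202,1 for xvdap1 and 202,16 for xvdb.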

Examining xvdap1 only:

# ./iolatency -d 202,1 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 38       |##                                    |
       1 -> 2       : 18       |#                                     |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 0        |                                      |
       8 -> 16      : 5        |#                                     |
      16 -> 32      : 11       |#                                     |
      32 -> 64      : 26       |##                                    |
      64 -> 128     : 894      |######################################|

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 75       |###                                   |
       1 -> 2       : 11       |#                                     |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 4        |#                                     |
       8 -> 16      : 4        |#                                     |
      16 -> 32      : 7        |#                                     |
      32 -> 64      : 13       |#                                     |
      64 -> 128     : 1141     |######################################|

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 61       |########                              |
       1 -> 2       : 21       |###                                   |
       2 -> 4       : 5        |#                                     |
       4 -> 8       : 1        |#                                     |
       8 -> 16      : 5        |#                                     |
      16 -> 32      : 7        |#                                     |
      32 -> 64      : 19       |###                                   |
      64 -> 128     : 324      |######################################|
     128 -> 256     : 7        |#                                     |
     256 -> 512     : 26       |####                                  |
^C
Ending tracing...

And now xvdb:

# ./iolatency -d 202,16 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1427     |######################################|
       1 -> 2       : 5        |#                                     |
       2 -> 4       : 3        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1409     |######################################|
       1 -> 2       : 6        |#                                     |
       2 -> 4       : 1        |#                                     |
       4 -> 8       : 1        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1478     |######################################|
       1 -> 2       : 6        |#                                     |
       2 -> 4       : 5        |#                                     |
       4 -> 8       : 0        |                                      |
       8 -> 16      : 2        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1437     |######################################|
       1 -> 2       : 5        |#                                     |
       2 -> 4       : 7        |#                                     |
       4 -> 8       : 0        |                                      |
       8 -> 16      : 1        |#                                     |
[...]

While that's much better, it is reaching the 8 - 16 millisecond range,
and these are SSDs with a light workload (~1500 IOPS).

I already know from iosnoop(8) analysis the reason for these high latency
outliers: they are queued behind writes. However, these writes are to a
different disk -- somewhere in this virtualized guest (Xen) there may be a
shared I/O queue.

One way to explore this is to reduce the queue length for the low priority disk,
so that it is less likely to pollute any shared queue. (There are other ways to
investigate and fix this too.) Here I reduce the disk queue length from its
default of 128 to 4:

# echo 4 > /sys/block/xvda1/queue/nr_requests
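
The new value can be read back from the same sysfs file (a quick check; writing
back 128, the default noted above, would restore the original setting):

# cat /sys/block/xvda1/queue/nr_requests
4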

The overall distribution looks much better:

# ./iolatency 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 3005     |######################################|
       1 -> 2       : 19       |#                                     |
       2 -> 4       : 9        |#                                     |
       4 -> 8       : 45       |#                                     |
       8 -> 16      : 859      |###########                           |
      16 -> 32      : 16       |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 2959     |######################################|
       1 -> 2       : 43       |#                                     |
       2 -> 4       : 16       |#                                     |
       4 -> 8       : 39       |#                                     |
       8 -> 16      : 1009     |#############                         |
      16 -> 32      : 76       |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 3031     |######################################|
       1 -> 2       : 27       |#                                     |
       2 -> 4       : 9        |#                                     |
       4 -> 8       : 24       |#                                     |
       8 -> 16      : 422      |######                                |
      16 -> 32      : 5        |#                                     |
^C
Ending tracing...

Latency only reaching 32 ms.

Our important disk didn't appear to change much -- maybe a slight improvement
to the outliers:

# ./iolatency -d 202,16 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1449     |######################################|
       1 -> 2       : 6        |#                                     |
       2 -> 4       : 5        |#                                     |
       4 -> 8       : 1        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1519     |######################################|
       1 -> 2       : 12       |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1466     |######################################|
       1 -> 2       : 2        |#                                     |
       2 -> 4       : 3        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 1460     |######################################|
       1 -> 2       : 4        |#                                     |
       2 -> 4       : 7        |#                                     |
[...]

And here's the other disk after the queue length change:

# ./iolatency -d 202,1 5
Tracing block I/O. Output every 5 seconds. Ctrl-C to end.

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 85       |###                                   |
       1 -> 2       : 12       |#                                     |
       2 -> 4       : 21       |#                                     |
       4 -> 8       : 76       |##                                    |
       8 -> 16      : 1539     |######################################|
      16 -> 32      : 10       |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 123      |##################                    |
       1 -> 2       : 8        |##                                    |
       2 -> 4       : 6        |#                                     |
       4 -> 8       : 17       |###                                   |
       8 -> 16      : 270      |######################################|
      16 -> 32      : 2        |#                                     |

  >=(ms) .. <(ms)   : I/O      |Distribution                          |
       0 -> 1       : 91       |###                                   |
       1 -> 2       : 23       |#                                     |
       2 -> 4       : 8        |#                                     |
       4 -> 8       : 71       |###                                   |
       8 -> 16      : 1223     |######################################|
      16 -> 32      : 12       |#                                     |
^C
Ending tracing...

Much better looking distribution.


Use -h to print the USAGE message:

# ./iolatency -h
USAGE: iolatency [-hQT] [-d device] [-i iotype] [interval [count]]
                 -d device       # device string (eg, "202,1)
                 -i iotype       # match type (eg, '*R*' for all reads)
                 -Q              # use queue insert as start time
                 -T              # timestamp on output
                 -h              # this usage message
                 interval        # summary interval, seconds (default 1)
                 count           # number of summaries
  eg,
       iolatency                 # summarize latency every second
       iolatency -Q              # include block I/O queue time
       iolatency 5 2             # 2 x 5 second summaries
       iolatency -i '*R*'        # trace reads
       iolatency -d 202,1        # trace device 202,1 only

See the man page and example file for more info.
data/perf-tools/examples/iosnoop_example.txt
@@ -0,0 +1,302 @@
Demonstrations of iosnoop, the Linux ftrace version.


Here's Linux 3.16, tracing tar archiving a filesystem:

# ./iosnoop
Tracing block I/O... Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 1809 W 202,1 17039968 4096 1.32
supervise 1809 W 202,1 17039976 4096 1.30
tar 14794 RM 202,1 8457608 4096 7.53
tar 14794 RM 202,1 8470336 4096 14.90
tar 14794 RM 202,1 8470368 4096 0.27
tar 14794 RM 202,1 8470784 4096 7.74
tar 14794 RM 202,1 8470360 4096 0.25
tar 14794 RM 202,1 8469968 4096 0.24
tar 14794 RM 202,1 8470240 4096 0.24
tar 14794 RM 202,1 8470392 4096 0.23
tar 14794 RM 202,1 8470544 4096 5.96
tar 14794 RM 202,1 8470552 4096 0.27
tar 14794 RM 202,1 8470384 4096 0.24
[...]

The "tar" I/O looks like it is slightly random (based on BLOCK) and 4 Kbytes
in size (BYTES). One returned in 14.9 milliseconds, but the rest were fast,
so fast (0.24 ms) some may be returning from some level of cache (disk or
controller).

The "RM" TYPE means Read of Metadata. The start of the trace shows a
couple of Writes by supervise PID 1809.


Here's a deliberate random I/O workload:

# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
randread 9182 R 202,32 30835224 8192 0.18
randread 9182 R 202,32 21466088 8192 0.15
randread 9182 R 202,32 13529496 8192 0.16
randread 9182 R 202,16 21250648 8192 0.18
randread 9182 R 202,16 1536776 32768 0.30
randread 9182 R 202,32 17157560 24576 0.23
randread 9182 R 202,32 21313320 8192 0.16
randread 9182 R 202,32 862184 8192 0.18
randread 9182 R 202,16 25496872 8192 0.21
randread 9182 R 202,32 31471768 8192 0.18
randread 9182 R 202,16 27571336 8192 0.20
randread 9182 R 202,16 30783448 8192 0.16
randread 9182 R 202,16 21435224 8192 1.28
randread 9182 R 202,16 970616 8192 0.15
randread 9182 R 202,32 13855608 8192 0.16
randread 9182 R 202,32 17549960 8192 0.15
randread 9182 R 202,32 30938232 8192 0.14
[...]

Note the changing offsets. The resulting latencies are very good in this case,
because the storage devices are flash memory-based solid state disks (SSDs).
For rotational disks, I'd expect these latencies to be roughly 10 ms.


Here's an idle Linux 3.2 system:

# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 3055 W 202,1 12852496 4096 0.64
supervise 3055 W 202,1 12852504 4096 1.32
supervise 3055 W 202,1 12852800 4096 0.55
supervise 3055 W 202,1 12852808 4096 0.52
jbd2/xvda1-212 212 WS 202,1 1066720 45056 41.52
jbd2/xvda1-212 212 WS 202,1 1066808 12288 41.52
jbd2/xvda1-212 212 WS 202,1 1066832 4096 32.37
supervise 3055 W 202,1 12852800 4096 14.28
supervise 3055 W 202,1 12855920 4096 14.07
supervise 3055 W 202,1 12855960 4096 0.67
supervise 3055 W 202,1 12858208 4096 1.00
flush:1-409 409 W 202,1 12939640 12288 18.00
[...]

This shows supervise doing various writes from PID 3055. The highest latency
was from jbd2/xvda1-212, the journaling block device driver, doing
synchronous writes (TYPE = WS).


Options can be added to show the start time (-s) and end time (-t):

# ./iosnoop -ts
Tracing block I/O. Ctrl-C to end.
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62
5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42
5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48
5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43
5982800.308849 5982800.309452 supervise 1810 W 202,1 12862464 4096 0.60
5982800.308856 5982800.309470 supervise 1806 W 202,1 17039632 4096 0.61
5982800.309206 5982800.309740 supervise 1806 W 202,1 17039640 4096 0.53
5982800.309211 5982800.309805 supervise 1810 W 202,1 12862472 4096 0.59
5982800.309332 5982800.309953 supervise 1812 W 202,1 17039648 4096 0.62
5982800.309676 5982800.310283 supervise 1812 W 202,1 17039656 4096 0.61
[...]

This is useful when gathering I/O event data for post-processing.
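
For example, since LATms is the last column, a quick post-processing pass (a
sketch using awk; the header lines evaluate as non-numeric and are skipped by
the comparison) can pull out just the slow I/O:

# ./iosnoop -ts 10 > out
# awk '$NF > 10' out        # print I/O that took longer than 10 ms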


Now for matching on a single PID:

# ./iosnoop -p 1805
Tracing block I/O issued by PID 1805. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
supervise 1805 W 202,1 17039648 4096 0.68
supervise 1805 W 202,1 17039672 4096 0.60
supervise 1805 W 202,1 17040040 4096 0.62
supervise 1805 W 202,1 17040056 4096 0.47
supervise 1805 W 202,1 17040624 4096 0.49
supervise 1805 W 202,1 17040632 4096 0.44
^C
Ending tracing...

This option works by using an in-kernel filter for that PID on I/O issue. There
is also a "-n" option to match on process names; however, that currently does
so in user space, and so is less efficient.

I would say that this will generally identify the origin process, but there will
be an error margin. Depending on the file system, block I/O queueing, and I/O
subsystem, this could miss events that aren't issued in this PID context but are
related to this PID (eg, triggering a read readahead on the completion of
previous I/O. Again, whether this happens is up to the file system and storage
subsystem). You can try the -Q option for more reliable process identification.
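
The in-kernel PID match is the kind of filter ftrace supports directly on a
tracepoint. As a rough sketch of the mechanism (not necessarily the exact
expression this script builds), a filter can be set and cleared by hand,
assuming tracefs is mounted at /sys/kernel/debug/tracing:

# echo 'common_pid == 1805' > /sys/kernel/debug/tracing/events/block/block_rq_issue/filter
# cat /sys/kernel/debug/tracing/events/block/block_rq_issue/filter
# echo 0 > /sys/kernel/debug/tracing/events/block/block_rq_issue/filter    # clear it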


The -Q option begins tracing on block I/O queue insert, instead of issue.
Here's before and after, while dd(1) writes a large file:

# ./iosnoop
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
dd 26983 WS 202,16 4064416 45056 16.70
dd 26983 WS 202,16 4064504 45056 16.72
dd 26983 WS 202,16 4064592 45056 16.74
dd 26983 WS 202,16 4064680 45056 16.75
cat 27031 WS 202,16 4064768 45056 16.56
cat 27031 WS 202,16 4064856 45056 16.46
cat 27031 WS 202,16 4064944 45056 16.40
gawk 27030 WS 202,16 4065032 45056 0.88
gawk 27030 WS 202,16 4065120 45056 1.01
gawk 27030 WS 202,16 4065208 45056 16.15
gawk 27030 WS 202,16 4065296 45056 16.16
gawk 27030 WS 202,16 4065384 45056 16.16
[...]

The output here shows the block I/O time from issue to completion (LATms),
which is largely representative of the device.

The process names and PIDs identify dd, cat, and gawk. By default iosnoop shows
who is on-CPU at time of block I/O issue, but these may not be the processes
that originated the I/O. In this case (having debugged it), the reason is that
processes such as cat and gawk are making hypervisor calls (this is a Xen
guest instance), eg, for memory operations, and during hypervisor processing a
queue of pending work is checked and dispatched. So cat and gawk were on-CPU
when the block device I/O was issued, but they didn't originate it.

Now the -Q option is used:

# ./iosnoop -Q
Tracing block I/O. Ctrl-C to end.
COMM PID TYPE DEV BLOCK BYTES LATms
kjournald 1217 WS 202,16 6132200 45056 141.12
kjournald 1217 WS 202,16 6132288 45056 141.10
kjournald 1217 WS 202,16 6132376 45056 141.10
kjournald 1217 WS 202,16 6132464 45056 141.11
kjournald 1217 WS 202,16 6132552 40960 141.11
dd 27718 WS 202,16 6132624 4096 0.18
flush:16-1279 1279 W 202,16 6132632 20480 0.52
flush:16-1279 1279 W 202,16 5940856 4096 0.50
flush:16-1279 1279 W 202,16 5949056 4096 0.52
flush:16-1279 1279 W 202,16 5957256 4096 0.54
flush:16-1279 1279 W 202,16 5965456 4096 0.56
flush:16-1279 1279 W 202,16 5973656 4096 0.58
flush:16-1279 1279 W 202,16 5981856 4096 0.60
flush:16-1279 1279 W 202,16 5990056 4096 0.63
[...]

This uses the block_rq_insert tracepoint as the starting point of I/O, instead
of block_rq_issue. This makes the following differences to columns and options:

- COMM: more likely to show the originating process.
- PID: more likely to show the originating process.
- LATms: shows the I/O time, including time spent on the block I/O queue.
- STARTs (not shown above): shows the time of queue insert, not I/O issue.
- -p PID: more likely to match the originating process.
- -n name: more likely to match the originating process.

The reason that this ftrace-based iosnoop does not just instrument both insert
and issue tracepoints is one of overhead. Even with buffering, iosnoop can
have difficulty under high load.
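
Both are static kernel tracepoints, and can be listed under the block event
directory (a quick check; this assumes tracefs is mounted at
/sys/kernel/debug/tracing):

# ls /sys/kernel/debug/tracing/events/block/ | grep block_rq

block_rq_complete, the completion side of these events, lives in the same
directory.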


If I want to capture events for post-processing, I use the duration mode, which
not only lets me set the duration, but also uses buffering, which reduces the
overheads of tracing.

Capturing 5 seconds, with both start timestamps (-s) and end timestamps (-t):

# time ./iosnoop -ts 5 > out

real 0m5.566s
user 0m0.336s
sys 0m0.140s
# wc out
27010 243072 2619744 out

This server is doing over 5,000 disk IOPS. Even with buffering, this did
consume a measurable amount of CPU to capture: 0.48 seconds of CPU time in
total. Note that the run took 5.57 seconds: this is 5 seconds for the capture,
followed by the CPU time for iosnoop to fetch and process the buffer.

Now tracing for 30 seconds:

# time ./iosnoop -ts 30 > out

real 0m31.207s
user 0m0.884s
sys 0m0.472s
# wc out
64259 578313 6232898 out

Since it's the same server and workload, this should have over 150k events,
but only has 64k. The tracing buffer has overflowed, and events have been
dropped. If I really must capture this many events, I can either increase
the trace buffer size (it's the bufsize_kb setting in the script), or use
a different tracer (perf_events, SystemTap, ktap, etc.). If the IOPS rate is low
(eg, less than 5k), then unbuffered (no duration), despite the higher overheads,
may be sufficient, and will keep capturing events until Ctrl-C.
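
As a quick check of the buffer sizing (a sketch; the variable name is whatever
the script version in use defines), the setting can be located in the script,
and the kernel's current per-CPU trace buffer size is visible in tracefs:

# grep -n bufsize_kb ./iosnoop
# cat /sys/kernel/debug/tracing/buffer_size_kb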


Here's an example of digging into the sequence of I/O to explain an outlier.
My randread program on an SSD server (which is an AWS EC2 instance) usually
experiences about 0.15 ms I/O latency, but there are some outliers as high as
20 milliseconds. Here's an excerpt:

# ./iosnoop -ts > out
# more out
Tracing block I/O. Ctrl-C to end.
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
6037559.121523 6037559.121685 randread 22341 R 202,32 29295416 8192 0.16
6037559.121719 6037559.121874 randread 22341 R 202,16 27515304 8192 0.16
[...]
6037595.999508 6037596.000051 supervise 1692 W 202,1 12862968 4096 0.54
6037595.999513 6037596.000144 supervise 1687 W 202,1 17040160 4096 0.63
6037595.999634 6037596.000309 supervise 1693 W 202,1 17040168 4096 0.68
6037595.999937 6037596.000440 supervise 1693 W 202,1 17040176 4096 0.50
6037596.000579 6037596.001192 supervise 1689 W 202,1 17040184 4096 0.61
6037596.000826 6037596.001360 supervise 1689 W 202,1 17040192 4096 0.53
6037595.998302 6037596.018133 randread 22341 R 202,32 954168 8192 20.03
6037595.998303 6037596.018150 randread 22341 R 202,32 954200 8192 20.05
6037596.018182 6037596.018347 randread 22341 R 202,32 18836600 8192 0.16
[...]

It's important to sort on the I/O completion time (ENDs). In this case it's
already in the correct order.
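
When the output isn't already ordered that way, a simple approach (a sketch;
with -ts the completion time is the second column, and the few header lines
sort harmlessly to the top as non-numeric) is:

# sort -n -k 2,2 out | less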

So my 20 ms reads happened after a large group of supervise writes were
completed (I truncated dozens of supervise write lines to keep this example
short). Other latency outliers in this output file showed the same sequence:
slow reads after a batch of writes.

Note the I/O request timestamp (STARTs), which shows that these 20 ms reads were
issued before the supervise writes -- so they had been sitting on a queue. I've
debugged this type of issue many times before, but this one is different: those
writes were to a different device (202,1), so I would have assumed they would be
on different queues, and wouldn't interfere with each other. Somewhere in this
system (Xen guest) it looks like there is a shared queue. (Having just
discovered this using iosnoop, I can't yet tell you which queue, but I'd hope
that after identifying it there would be a way to tune its queueing behavior,
so that we can eliminate or reduce the severity of these outliers.)


Use -h to print the USAGE message:

# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name]
               [duration]
               -d device       # device string (eg, "202,1)
               -i iotype       # match type (eg, '*R*' for all reads)
               -n name         # process name to match on I/O issue
               -p PID          # PID to match on I/O issue
               -Q              # use queue insert as start time
               -s              # include start time of I/O (s)
               -t              # include completion time of I/O (s)
               -h              # this usage message
               duration        # duration seconds, and use buffers
  eg,
       iosnoop                 # watch block I/O live (unbuffered)
       iosnoop 1               # trace 1 sec (buffered)
       iosnoop -Q              # include queueing time in LATms
       iosnoop -ts             # include start and end timestamps
       iosnoop -i '*R*'        # trace reads
       iosnoop -p 91           # show I/O issued when PID 91 is on-CPU
       iosnoop -Qp 91          # show I/O queued by PID 91, queue time

See the man page and example file for more info.