fluent-plugin-perf-tools 0.1.0
- checksums.yaml +7 -0
- data/.gitignore +15 -0
- data/.rubocop.yml +26 -0
- data/.ruby-version +1 -0
- data/CHANGELOG.md +5 -0
- data/CODE_OF_CONDUCT.md +84 -0
- data/Gemfile +5 -0
- data/LICENSE.txt +21 -0
- data/README.md +43 -0
- data/Rakefile +17 -0
- data/bin/console +15 -0
- data/bin/setup +8 -0
- data/fluent-plugin-perf-tools.gemspec +48 -0
- data/lib/fluent/plugin/in_perf_tools.rb +42 -0
- data/lib/fluent/plugin/perf_tools/cachestat.rb +65 -0
- data/lib/fluent/plugin/perf_tools/command.rb +30 -0
- data/lib/fluent/plugin/perf_tools/version.rb +9 -0
- data/lib/fluent/plugin/perf_tools.rb +11 -0
- data/perf-tools/LICENSE +339 -0
- data/perf-tools/README.md +205 -0
- data/perf-tools/bin/bitesize +1 -0
- data/perf-tools/bin/cachestat +1 -0
- data/perf-tools/bin/execsnoop +1 -0
- data/perf-tools/bin/funccount +1 -0
- data/perf-tools/bin/funcgraph +1 -0
- data/perf-tools/bin/funcslower +1 -0
- data/perf-tools/bin/functrace +1 -0
- data/perf-tools/bin/iolatency +1 -0
- data/perf-tools/bin/iosnoop +1 -0
- data/perf-tools/bin/killsnoop +1 -0
- data/perf-tools/bin/kprobe +1 -0
- data/perf-tools/bin/opensnoop +1 -0
- data/perf-tools/bin/perf-stat-hist +1 -0
- data/perf-tools/bin/reset-ftrace +1 -0
- data/perf-tools/bin/syscount +1 -0
- data/perf-tools/bin/tcpretrans +1 -0
- data/perf-tools/bin/tpoint +1 -0
- data/perf-tools/bin/uprobe +1 -0
- data/perf-tools/deprecated/README.md +1 -0
- data/perf-tools/deprecated/execsnoop-proc +150 -0
- data/perf-tools/deprecated/execsnoop-proc.8 +80 -0
- data/perf-tools/deprecated/execsnoop-proc_example.txt +46 -0
- data/perf-tools/disk/bitesize +175 -0
- data/perf-tools/examples/bitesize_example.txt +63 -0
- data/perf-tools/examples/cachestat_example.txt +58 -0
- data/perf-tools/examples/execsnoop_example.txt +153 -0
- data/perf-tools/examples/funccount_example.txt +126 -0
- data/perf-tools/examples/funcgraph_example.txt +2178 -0
- data/perf-tools/examples/funcslower_example.txt +110 -0
- data/perf-tools/examples/functrace_example.txt +341 -0
- data/perf-tools/examples/iolatency_example.txt +350 -0
- data/perf-tools/examples/iosnoop_example.txt +302 -0
- data/perf-tools/examples/killsnoop_example.txt +62 -0
- data/perf-tools/examples/kprobe_example.txt +379 -0
- data/perf-tools/examples/opensnoop_example.txt +47 -0
- data/perf-tools/examples/perf-stat-hist_example.txt +149 -0
- data/perf-tools/examples/reset-ftrace_example.txt +88 -0
- data/perf-tools/examples/syscount_example.txt +297 -0
- data/perf-tools/examples/tcpretrans_example.txt +93 -0
- data/perf-tools/examples/tpoint_example.txt +210 -0
- data/perf-tools/examples/uprobe_example.txt +321 -0
- data/perf-tools/execsnoop +292 -0
- data/perf-tools/fs/cachestat +167 -0
- data/perf-tools/images/perf-tools_2016.png +0 -0
- data/perf-tools/iolatency +296 -0
- data/perf-tools/iosnoop +296 -0
- data/perf-tools/kernel/funccount +146 -0
- data/perf-tools/kernel/funcgraph +259 -0
- data/perf-tools/kernel/funcslower +248 -0
- data/perf-tools/kernel/functrace +192 -0
- data/perf-tools/kernel/kprobe +270 -0
- data/perf-tools/killsnoop +263 -0
- data/perf-tools/man/man8/bitesize.8 +70 -0
- data/perf-tools/man/man8/cachestat.8 +111 -0
- data/perf-tools/man/man8/execsnoop.8 +104 -0
- data/perf-tools/man/man8/funccount.8 +76 -0
- data/perf-tools/man/man8/funcgraph.8 +166 -0
- data/perf-tools/man/man8/funcslower.8 +129 -0
- data/perf-tools/man/man8/functrace.8 +123 -0
- data/perf-tools/man/man8/iolatency.8 +116 -0
- data/perf-tools/man/man8/iosnoop.8 +169 -0
- data/perf-tools/man/man8/killsnoop.8 +100 -0
- data/perf-tools/man/man8/kprobe.8 +162 -0
- data/perf-tools/man/man8/opensnoop.8 +113 -0
- data/perf-tools/man/man8/perf-stat-hist.8 +111 -0
- data/perf-tools/man/man8/reset-ftrace.8 +49 -0
- data/perf-tools/man/man8/syscount.8 +96 -0
- data/perf-tools/man/man8/tcpretrans.8 +93 -0
- data/perf-tools/man/man8/tpoint.8 +140 -0
- data/perf-tools/man/man8/uprobe.8 +168 -0
- data/perf-tools/misc/perf-stat-hist +223 -0
- data/perf-tools/net/tcpretrans +311 -0
- data/perf-tools/opensnoop +280 -0
- data/perf-tools/syscount +192 -0
- data/perf-tools/system/tpoint +232 -0
- data/perf-tools/tools/reset-ftrace +123 -0
- data/perf-tools/user/uprobe +390 -0
- metadata +349 -0
data/perf-tools/examples/execsnoop_example.txt
@@ -0,0 +1,153 @@
Demonstrations of execsnoop, the Linux ftrace version.


Here's execsnoop showing what's really executed by "man ls":

# ./execsnoop
Tracing exec()s. Ctrl-C to end.
   PID   PPID ARGS
 22898  22004 man ls
 22905  22898 preconv -e UTF-8
 22908  22898 pager -s
 22907  22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
 22906  22898 tbl
 22911  22910 locale charmap
 22912  22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
 22913  22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
 22914  22912 grotty

Many commands. This is particularly useful for understanding application
startup.
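This version of execsnoop is built on ftrace: it instruments the exec() path
and streams events from the kernel's trace_pipe (note the "cat -v trace_pipe"
child visible in the -t output below). For orientation, here is a minimal
sketch of the raw kprobe_events interface that this kind of tool can be built
on; run it as root, and treat the probe name "myexec" and the do_execve symbol
as illustrative assumptions rather than what execsnoop itself uses:

cd /sys/kernel/debug/tracing
echo 'p:myexec do_execve' >> kprobe_events   # create a dynamic kprobe event
echo 1 > events/kprobes/myexec/enable        # enable it
cat trace_pipe                               # stream probe hits live; Ctrl-C to stop
echo 0 > events/kprobes/myexec/enable        # disable it
echo '-:myexec' >> kprobe_events             # remove the probe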

Another use for execsnoop is identifying short-lived processes. Eg, with the -t
option to see timestamps:

# ./execsnoop -t
Tracing exec()s. Ctrl-C to end.
TIMEs              PID   PPID ARGS
7419756.154031    8185   8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...]
7419756.154131    8186   8184 cat -v trace_pipe
7419756.245264    8188   1698 ./run
7419756.245691    8189   1696 ./run
7419756.246212    8187   1689 ./run
7419756.278993    8190   1693 ./run
7419756.278996    8191   1692 ./run
7419756.288430    8192   1695 ./run
7419756.290115    8193   1691 ./run
7419756.292406    8194   1699 ./run
7419756.293986    8195   1690 ./run
7419756.294149    8196   1686 ./run
7419756.296527    8197   1687 ./run
7419756.296973    8198   1697 ./run
7419756.298356    8200   1685 ./run
7419756.298683    8199   1688 ./run
7419757.269883    8201   1696 ./run
[...]

So we're running many "run" commands every second. The PPID is included, so I
can debug this further (they are "supervise" processes).

Short-lived processes can consume CPU and not be visible from top(1), and can
be the source of hidden performance issues.


Here's another example: I noticed CPU usage was high in top(1), but couldn't
see the responsible process:

$ top
top - 00:04:32 up 78 days, 15:41,  3 users,  load average: 0.85, 0.29, 0.14
Tasks: 123 total,   1 running, 121 sleeping,   0 stopped,   1 zombie
Cpu(s): 15.7%us, 34.9%sy,  0.0%ni, 49.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
Mem:   7629464k total,  7537216k used,    92248k free,  1376492k buffers
Swap:        0k total,        0k used,        0k free,  5432356k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7225 bgregg-t  20   0 29480 6196 2128 S    3  0.1   0:02.64 ec2rotatelogs
    1 root      20   0 24320 2256 1340 S    0  0.0   0:01.23 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   1:19.61 ksoftirqd/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/0:0
    5 root      20   0     0    0    0 S    0  0.0   0:00.01 kworker/u:0
    6 root      RT   0     0    0    0 S    0  0.0   0:16.00 migration/0
    7 root      RT   0     0    0    0 S    0  0.0   0:17.29 watchdog/0
    8 root      RT   0     0    0    0 S    0  0.0   0:15.85 migration/1
    9 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/1:0
[...]

See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this
is a two CPU server, so that's equivalent to one full CPU), but this CPU usage
isn't visible from the process listing.

vmstat agreed, showing the same average CPU usage statistics:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0  92816 1376476 5432188    0    0     0     3    2    1  0  1 99  0
 1  0      0  92676 1376484 5432264    0    0     0    24 6573 6130 12 38 49  0
 1  0      0  91964 1376484 5432272    0    0     0     0 6529 6097 16 35 49  0
 1  0      0  92692 1376484 5432272    0    0     0     0 6192 5775 17 35 49  0
 1  0      0  92692 1376484 5432272    0    0     0     0 6554 6121 14 36 50  0
 1  0      0  91940 1376484 5432272    0    0     0    12 6546 6101 13 38 49  0
 1  0      0  92560 1376484 5432272    0    0     0     0 6201 5769 15 35 49  0
 1  0      0  92676 1376484 5432272    0    0     0     0 6524 6123 17 34 49  0
 1  0      0  91932 1376484 5432272    0    0     0     0 6546 6107 10 40 49  0
 1  0      0  92832 1376484 5432272    0    0     0     0 6057 5710 13 38 49  0
 1  0      0  92248 1376484 5432272    0    0    84    28 6592 6183 16 36 48  1
 1  0      0  91504 1376492 5432348    0    0     0    12 6540 6098 18 33 49  1
[...]

So this could be caused by short-lived processes, who vanish before they are
seen by top(1). Do I have my execsnoop handy? Yes:

# ~/perf-tools/bin/execsnoop
Tracing exec()s. Ctrl-C to end.
   PID   PPID ARGS
 10239  10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...]
 10240  10238 cat -v trace_pipe
 10242   7225 sh [?]
 10243  10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt
 10245   7225 sh [?]
 10246  10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt
 10248   7225 sh [?]
 10249  10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt
 10251   7225 sh [?]
 10252  10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt
 10254   7225 sh [?]
 10255  10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt
 10257   7225 sh [?]
 10258  10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt
 10260   7225 sh [?]
 10261  10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt
 10263   7225 sh [?]
 10264  10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt
 10266   7225 sh [?]
 10267  10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt
[...]

The output scrolled quickly, showing that many shell and lsof processes were
being launched. If you check the PID and PPID columns carefully, you can see that
these are ultimately all from PID 7225. We saw that earlier in the top output:
ec2rotatelogs, at 3% CPU. I now know the culprit.

I should have used "-t" to show the timestamps with this example.


Run -h to print the USAGE message:

# ./execsnoop -h
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
                 -d seconds      # trace duration, and use buffers
                 -a argc         # max args to show (default 8)
                 -r              # include re-execs
                 -t              # include time (seconds)
                 -h              # this usage message
                 name            # process name to match (REs allowed)
  eg,
       execsnoop                 # watch exec()s live (unbuffered)
       execsnoop -d 1            # trace 1 sec (buffered)
       execsnoop grep            # trace process names containing grep
       execsnoop 'log$'          # filenames ending in "log"
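
The options above can be combined. For example, an illustrative invocation
(not from the original example file) that buffers a 10 second timestamped
trace of exec()s whose process name matches "lsof":

# ./execsnoop -t -d 10 lsof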

See the man page and example file for more info.
data/perf-tools/examples/funccount_example.txt
@@ -0,0 +1,126 @@
Demonstrations of funccount, the Linux ftrace version.


Tracing all kernel functions that start with "bio_" (which would be block
interface functions), and counting how many times they were executed until
Ctrl-C is hit:

# ./funccount 'bio_*'
Tracing "bio_*"... Ctrl-C to end.
^C
FUNC                              COUNT
bio_attempt_back_merge               26
bio_get_nr_vecs                     361
bio_alloc                           536
bio_alloc_bioset                    536
bio_endio                           536
bio_free                            536
bio_fs_destructor                   536
bio_init                            536
bio_integrity_enabled               536
bio_put                             729
bio_add_page                       1004

Note that these counts are performed in-kernel context, using the ftrace
function profiler, which means this is a (relatively) low overhead technique.
Test yourself to quantify overhead.
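
For orientation, here is a minimal sketch of the ftrace function profiler
interface that a count like this maps onto, assuming debugfs is mounted at
/sys/kernel/debug and the kernel was built with CONFIG_FUNCTION_PROFILER
(funccount wraps this kind of sequence, adding option handling and cleanup):

cd /sys/kernel/debug/tracing
echo 'bio_*' > set_ftrace_filter        # profile only the matching functions
echo 1 > function_profile_enabled       # start counting, per CPU
sleep 10                                # ... let the workload run ...
echo 0 > function_profile_enabled       # stop counting
cat trace_stat/function*                # per-CPU hit counts for the filtered functions
echo > set_ftrace_filter                # clear the filter when done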

As was demonstrated here, wildcards can be used. Individual functions can also
be specified. For example, all of the following are valid arguments:

    bio_init
    bio_*
    *init
    *bio*

A "*" within a string (eg, "bio*init") is not supported.

The full list of what can be traced is in:
/sys/kernel/debug/tracing/available_filter_functions, which can be grep'd to
check what is there. Note that grep uses regular expressions, whereas
funccount uses globbing for wildcards.
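
For example, to see what a wildcard would cover before tracing (an
illustrative check, separate from funccount itself):

grep -c '^bio_' /sys/kernel/debug/tracing/available_filter_functions      # how many "bio_" functions are traceable
grep '^bio_init$' /sys/kernel/debug/tracing/available_filter_functions    # confirm one specific function is present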

Counting all "tcp_" kernel functions, and printing a summary every one second:

# ./funccount -i 1 -t 5 'tcp_*'
Tracing "tcp_*". Top 5 only... Ctrl-C to end.

FUNC                              COUNT
tcp_cleanup_rbuf                    386
tcp_service_net_dma                 386
tcp_established_options             549
tcp_v4_md5_lookup                   560
tcp_v4_md5_do_lookup                890

FUNC                              COUNT
tcp_service_net_dma                 498
tcp_cleanup_rbuf                    499
tcp_established_options             664
tcp_v4_md5_lookup                   672
tcp_v4_md5_do_lookup               1071

[...]

Neat.


Tracing all "ext4*" kernel functions for 10 seconds, and printing the top 25:

# ./funccount -t 25 -d 10 'ext4*'
Tracing "ext4*" for 10 seconds. Top 25 only...

FUNC                              COUNT
ext4_inode_bitmap                   840
ext4_meta_trans_blocks              840
ext4_ext_drop_refs                  843
ext4_find_entry                     845
ext4_discard_preallocations       1008
ext4_free_inodes_count            1120
ext4_group_desc_csum              1120
ext4_group_desc_csum_set          1120
ext4_getblk                       1128
ext4_es_free_extent               1328
ext4_map_blocks                   1471
ext4_es_lookup_extent             1751
ext4_mb_check_limits              1873
ext4_es_lru_add                   2031
ext4_data_block_valid             2312
ext4_journal_check_start          3080
ext4_mark_inode_dirty             5320
ext4_get_inode_flags              5955
ext4_get_inode_loc                5955
ext4_mark_iloc_dirty              5955
ext4_reserve_inode_write          5955
ext4_inode_table                  7076
ext4_get_group_desc               8476
ext4_has_inline_data              9492
ext4_inode_touch_time_cmp        38980

Ending tracing...

So ext4_inode_touch_time_cmp() was called the most frequently, at 38,980 times.
This may be normal, this may not. The purpose of this tool is to give you one
view of how one or many kernel functions are executed. Previously I had little
idea what ext4 was doing internally. Now I know the top 25 functions, and their
rate, and can begin researching them from the source code.
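
For example, a first step could be locating the hottest function in the kernel
source tree (an illustrative command, assuming a kernel source checkout; the
ext4 code lives under fs/ext4):

grep -rn ext4_inode_touch_time_cmp fs/ext4/    # find its definition and callers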

Use -h to print the USAGE message:

# ./funccount -h
USAGE: funccount [-hT] [-i secs] [-d secs] [-t top] funcstring
                 -d seconds      # total duration of trace
                 -h              # this usage message
                 -i seconds      # interval summary
                 -t top          # show top num entries only
                 -T              # include timestamp (for -i)
  eg,
       funccount 'vfs*'            # trace all funcs that match "vfs*"
       funccount -d 5 'tcp*'       # trace "tcp*" funcs for 5 seconds
       funccount -t 10 'ext3*'     # show top 10 "ext3*" funcs
       funccount -i 1 'ext3*'      # summary every 1 second
       funccount -i 1 -d 5 'ext3*' # 5 x 1 second summaries

See the man page and example file for more info.
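
As with execsnoop, the options combine. For example, an illustrative
invocation (not from the original example file) that prints ten timestamped
one-second summaries of "vfs*" calls:

# ./funccount -T -i 1 -d 10 'vfs*'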