fluent-plugin-perf-tools 0.1.0

Files changed (98)
  1. checksums.yaml +7 -0
  2. data/.gitignore +15 -0
  3. data/.rubocop.yml +26 -0
  4. data/.ruby-version +1 -0
  5. data/CHANGELOG.md +5 -0
  6. data/CODE_OF_CONDUCT.md +84 -0
  7. data/Gemfile +5 -0
  8. data/LICENSE.txt +21 -0
  9. data/README.md +43 -0
  10. data/Rakefile +17 -0
  11. data/bin/console +15 -0
  12. data/bin/setup +8 -0
  13. data/fluent-plugin-perf-tools.gemspec +48 -0
  14. data/lib/fluent/plugin/in_perf_tools.rb +42 -0
  15. data/lib/fluent/plugin/perf_tools/cachestat.rb +65 -0
  16. data/lib/fluent/plugin/perf_tools/command.rb +30 -0
  17. data/lib/fluent/plugin/perf_tools/version.rb +9 -0
  18. data/lib/fluent/plugin/perf_tools.rb +11 -0
  19. data/perf-tools/LICENSE +339 -0
  20. data/perf-tools/README.md +205 -0
  21. data/perf-tools/bin/bitesize +1 -0
  22. data/perf-tools/bin/cachestat +1 -0
  23. data/perf-tools/bin/execsnoop +1 -0
  24. data/perf-tools/bin/funccount +1 -0
  25. data/perf-tools/bin/funcgraph +1 -0
  26. data/perf-tools/bin/funcslower +1 -0
  27. data/perf-tools/bin/functrace +1 -0
  28. data/perf-tools/bin/iolatency +1 -0
  29. data/perf-tools/bin/iosnoop +1 -0
  30. data/perf-tools/bin/killsnoop +1 -0
  31. data/perf-tools/bin/kprobe +1 -0
  32. data/perf-tools/bin/opensnoop +1 -0
  33. data/perf-tools/bin/perf-stat-hist +1 -0
  34. data/perf-tools/bin/reset-ftrace +1 -0
  35. data/perf-tools/bin/syscount +1 -0
  36. data/perf-tools/bin/tcpretrans +1 -0
  37. data/perf-tools/bin/tpoint +1 -0
  38. data/perf-tools/bin/uprobe +1 -0
  39. data/perf-tools/deprecated/README.md +1 -0
  40. data/perf-tools/deprecated/execsnoop-proc +150 -0
  41. data/perf-tools/deprecated/execsnoop-proc.8 +80 -0
  42. data/perf-tools/deprecated/execsnoop-proc_example.txt +46 -0
  43. data/perf-tools/disk/bitesize +175 -0
  44. data/perf-tools/examples/bitesize_example.txt +63 -0
  45. data/perf-tools/examples/cachestat_example.txt +58 -0
  46. data/perf-tools/examples/execsnoop_example.txt +153 -0
  47. data/perf-tools/examples/funccount_example.txt +126 -0
  48. data/perf-tools/examples/funcgraph_example.txt +2178 -0
  49. data/perf-tools/examples/funcslower_example.txt +110 -0
  50. data/perf-tools/examples/functrace_example.txt +341 -0
  51. data/perf-tools/examples/iolatency_example.txt +350 -0
  52. data/perf-tools/examples/iosnoop_example.txt +302 -0
  53. data/perf-tools/examples/killsnoop_example.txt +62 -0
  54. data/perf-tools/examples/kprobe_example.txt +379 -0
  55. data/perf-tools/examples/opensnoop_example.txt +47 -0
  56. data/perf-tools/examples/perf-stat-hist_example.txt +149 -0
  57. data/perf-tools/examples/reset-ftrace_example.txt +88 -0
  58. data/perf-tools/examples/syscount_example.txt +297 -0
  59. data/perf-tools/examples/tcpretrans_example.txt +93 -0
  60. data/perf-tools/examples/tpoint_example.txt +210 -0
  61. data/perf-tools/examples/uprobe_example.txt +321 -0
  62. data/perf-tools/execsnoop +292 -0
  63. data/perf-tools/fs/cachestat +167 -0
  64. data/perf-tools/images/perf-tools_2016.png +0 -0
  65. data/perf-tools/iolatency +296 -0
  66. data/perf-tools/iosnoop +296 -0
  67. data/perf-tools/kernel/funccount +146 -0
  68. data/perf-tools/kernel/funcgraph +259 -0
  69. data/perf-tools/kernel/funcslower +248 -0
  70. data/perf-tools/kernel/functrace +192 -0
  71. data/perf-tools/kernel/kprobe +270 -0
  72. data/perf-tools/killsnoop +263 -0
  73. data/perf-tools/man/man8/bitesize.8 +70 -0
  74. data/perf-tools/man/man8/cachestat.8 +111 -0
  75. data/perf-tools/man/man8/execsnoop.8 +104 -0
  76. data/perf-tools/man/man8/funccount.8 +76 -0
  77. data/perf-tools/man/man8/funcgraph.8 +166 -0
  78. data/perf-tools/man/man8/funcslower.8 +129 -0
  79. data/perf-tools/man/man8/functrace.8 +123 -0
  80. data/perf-tools/man/man8/iolatency.8 +116 -0
  81. data/perf-tools/man/man8/iosnoop.8 +169 -0
  82. data/perf-tools/man/man8/killsnoop.8 +100 -0
  83. data/perf-tools/man/man8/kprobe.8 +162 -0
  84. data/perf-tools/man/man8/opensnoop.8 +113 -0
  85. data/perf-tools/man/man8/perf-stat-hist.8 +111 -0
  86. data/perf-tools/man/man8/reset-ftrace.8 +49 -0
  87. data/perf-tools/man/man8/syscount.8 +96 -0
  88. data/perf-tools/man/man8/tcpretrans.8 +93 -0
  89. data/perf-tools/man/man8/tpoint.8 +140 -0
  90. data/perf-tools/man/man8/uprobe.8 +168 -0
  91. data/perf-tools/misc/perf-stat-hist +223 -0
  92. data/perf-tools/net/tcpretrans +311 -0
  93. data/perf-tools/opensnoop +280 -0
  94. data/perf-tools/syscount +192 -0
  95. data/perf-tools/system/tpoint +232 -0
  96. data/perf-tools/tools/reset-ftrace +123 -0
  97. data/perf-tools/user/uprobe +390 -0
  98. metadata +349 -0
data/perf-tools/examples/execsnoop_example.txt
@@ -0,0 +1,153 @@
Demonstrations of execsnoop, the Linux ftrace version.


Here's execsnoop showing what's really executed by "man ls":

# ./execsnoop
Tracing exec()s. Ctrl-C to end.
   PID   PPID ARGS
 22898  22004 man ls
 22905  22898 preconv -e UTF-8
 22908  22898 pager -s
 22907  22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
 22906  22898 tbl
 22911  22910 locale charmap
 22912  22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
 22913  22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
 22914  22912 grotty

Many commands. This is particularly useful for understanding application
startup.

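The ftrace version of execsnoop works by registering a dynamic kernel probe (kprobe) on the exec entry point through the tracing debugfs interface, then streaming events from trace_pipe (which is why a "cat -v trace_pipe" shows up in its own output below). Here is a rough Ruby sketch of just that mechanism; the probe name execsnoop_demo is arbitrary, and it assumes debugfs is mounted at /sys/kernel/debug, root privileges, and a kernel that exposes do_execve (other kernels use different symbol names):

  # Minimal sketch of the mechanism behind execsnoop: a kprobe on the kernel
  # exec entry point, read back through trace_pipe.
  TRACING = "/sys/kernel/debug/tracing"

  # Register and enable a kprobe arbitrarily named "execsnoop_demo".
  File.write("#{TRACING}/kprobe_events", "p:execsnoop_demo do_execve\n", mode: "a")
  File.write("#{TRACING}/events/kprobes/execsnoop_demo/enable", "1")

  begin
    # trace_pipe blocks until events arrive; each line carries the task name,
    # PID, CPU, timestamp, and probe name.
    File.open("#{TRACING}/trace_pipe") do |pipe|
      pipe.each_line { |line| puts line }
    end
  rescue Interrupt
    # Ctrl-C ends tracing, as with the real tool.
  ensure
    File.write("#{TRACING}/events/kprobes/execsnoop_demo/enable", "0")
    File.write("#{TRACING}/kprobe_events", "-:execsnoop_demo\n", mode: "a")
  end

The real execsnoop also extracts the process arguments and parent PID, copes with different kernel symbol names, and cleans up more carefully; this sketch only shows the tracing plumbing.
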
Another use for execsnoop is identifying short-lived processes. Eg, with the -t
option to see timestamps:

# ./execsnoop -t
Tracing exec()s. Ctrl-C to end.
TIMEs            PID   PPID ARGS
7419756.154031  8185   8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...]
7419756.154131  8186   8184 cat -v trace_pipe
7419756.245264  8188   1698 ./run
7419756.245691  8189   1696 ./run
7419756.246212  8187   1689 ./run
7419756.278993  8190   1693 ./run
7419756.278996  8191   1692 ./run
7419756.288430  8192   1695 ./run
7419756.290115  8193   1691 ./run
7419756.292406  8194   1699 ./run
7419756.293986  8195   1690 ./run
7419756.294149  8196   1686 ./run
7419756.296527  8197   1687 ./run
7419756.296973  8198   1697 ./run
7419756.298356  8200   1685 ./run
7419756.298683  8199   1688 ./run
7419757.269883  8201   1696 ./run
[...]

So we're running many "run" commands every second. The PPID is included, so I
can debug this further (they are "supervise" processes).

Short-lived processes can consume CPU and not be visible from top(1), and can
be the source of hidden performance issues.


Here's another example: I noticed CPU usage was high in top(1), but couldn't
see the responsible process:

$ top
top - 00:04:32 up 78 days, 15:41,  3 users,  load average: 0.85, 0.29, 0.14
Tasks: 123 total,   1 running, 121 sleeping,   0 stopped,   1 zombie
Cpu(s): 15.7%us, 34.9%sy,  0.0%ni, 49.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
Mem:   7629464k total,  7537216k used,    92248k free,  1376492k buffers
Swap:        0k total,        0k used,        0k free,  5432356k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7225 bgregg-t  20   0 29480 6196 2128 S    3  0.1   0:02.64 ec2rotatelogs
    1 root      20   0 24320 2256 1340 S    0  0.0   0:01.23 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   1:19.61 ksoftirqd/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/0:0
    5 root      20   0     0    0    0 S    0  0.0   0:00.01 kworker/u:0
    6 root      RT   0     0    0    0 S    0  0.0   0:16.00 migration/0
    7 root      RT   0     0    0    0 S    0  0.0   0:17.29 watchdog/0
    8 root      RT   0     0    0    0 S    0  0.0   0:15.85 migration/1
    9 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/1:0
[...]

See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this
is a two CPU server, so that's equivalent to one full CPU), but this CPU usage
isn't visible from the process listing.

vmstat agreed, showing the same average CPU usage statistics:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0      0  92816 1376476 5432188    0    0     0     3    2    1  0  1 99  0
 1  0      0  92676 1376484 5432264    0    0     0    24 6573 6130 12 38 49  0
 1  0      0  91964 1376484 5432272    0    0     0     0 6529 6097 16 35 49  0
 1  0      0  92692 1376484 5432272    0    0     0     0 6192 5775 17 35 49  0
 1  0      0  92692 1376484 5432272    0    0     0     0 6554 6121 14 36 50  0
 1  0      0  91940 1376484 5432272    0    0     0    12 6546 6101 13 38 49  0
 1  0      0  92560 1376484 5432272    0    0     0     0 6201 5769 15 35 49  0
 1  0      0  92676 1376484 5432272    0    0     0     0 6524 6123 17 34 49  0
 1  0      0  91932 1376484 5432272    0    0     0     0 6546 6107 10 40 49  0
 1  0      0  92832 1376484 5432272    0    0     0     0 6057 5710 13 38 49  0
 1  0      0  92248 1376484 5432272    0    0    84    28 6592 6183 16 36 48  1
 1  0      0  91504 1376492 5432348    0    0     0    12 6540 6098 18 33 49  1
[...]

So this could be caused by short-lived processes, which vanish before they are
seen by top(1). Do I have my execsnoop handy? Yes:

# ~/perf-tools/bin/execsnoop
Tracing exec()s. Ctrl-C to end.
   PID   PPID ARGS
 10239  10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...]
 10240  10238 cat -v trace_pipe
 10242   7225 sh [?]
 10243  10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt
 10245   7225 sh [?]
 10246  10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt
 10248   7225 sh [?]
 10249  10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt
 10251   7225 sh [?]
 10252  10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt
 10254   7225 sh [?]
 10255  10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt
 10257   7225 sh [?]
 10258  10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt
 10260   7225 sh [?]
 10261  10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt
 10263   7225 sh [?]
 10264  10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt
 10266   7225 sh [?]
 10267  10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt
[...]

The output scrolled quickly, showing that many shell and lsof processes were
being launched. If you check the PID and PPID columns carefully, you can see that
these are ultimately all from PID 7225. We saw that earlier in the top output:
ec2rotatelogs, at 3% CPU. I now know the culprit.

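Since the trace only shows PIDs, a quick way to double-check what a parent PID is, while it is still running (as the long-lived PID 7225 was), is to resolve it through /proc. A small, hypothetical Ruby helper that walks a process's parent chain and prints each ancestor:

  # Hypothetical helper: resolve a PID's ancestry via /proc. Only works for
  # processes that are still alive, which is fine for a long-lived parent.
  def print_ancestry(pid)
    while pid > 1
      comm = File.read("/proc/#{pid}/comm").strip
      puts format("%6d %s", pid, comm)
      # In /proc/<pid>/stat, the field after the parenthesised command name
      # and the state character is the parent PID.
      stat = File.read("/proc/#{pid}/stat")
      pid  = stat[(stat.rindex(")") + 1)..-1].split[1].to_i
    end
  end

  print_ancestry(7225)   # the common parent seen in the PPID column above
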
I should have used "-t" to show the timestamps with this example.


Run -h to print the USAGE message:

# ./execsnoop -h
USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
                 -d seconds      # trace duration, and use buffers
                 -a argc         # max args to show (default 8)
                 -r              # include re-execs
                 -t              # include time (seconds)
                 -h              # this usage message
                 name            # process name to match (REs allowed)
  eg,
       execsnoop             # watch exec()s live (unbuffered)
       execsnoop -d 1        # trace 1 sec (buffered)
       execsnoop grep        # trace process names containing grep
       execsnoop 'log$'      # process names ending in "log"

See the man page and example file for more info.
data/perf-tools/examples/funccount_example.txt
@@ -0,0 +1,126 @@
Demonstrations of funccount, the Linux ftrace version.


Tracing all kernel functions that start with "bio_" (which would be block
interface functions), and counting how many times they were executed until
Ctrl-C is hit:

# ./funccount 'bio_*'
Tracing "bio_*"... Ctrl-C to end.
^C
FUNC                              COUNT
bio_attempt_back_merge               26
bio_get_nr_vecs                     361
bio_alloc                           536
bio_alloc_bioset                    536
bio_endio                           536
bio_free                            536
bio_fs_destructor                   536
bio_init                            536
bio_integrity_enabled               536
bio_put                             729
bio_add_page                       1004

Note that these counts are performed in kernel context, using the ftrace
function profiler, which means this is a (relatively) low overhead technique.
Test it yourself to quantify the overhead.


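The in-kernel counting is the ftrace function profiler: a filter restricts which functions are profiled, one switch turns profiling on, and per-CPU hit counts are read back from trace_stat/. A rough Ruby sketch of that flow, assuming the usual /sys/kernel/debug/tracing mount point, root privileges, and a kernel built with the function profiler (the real funccount adds interval and duration handling, top-N trimming, and careful cleanup):

  TRACING = "/sys/kernel/debug/tracing"

  File.write("#{TRACING}/set_ftrace_filter", "bio_*")      # profile only matching functions
  File.write("#{TRACING}/function_profile_enabled", "1")   # start counting in the kernel
  sleep 10                                                 # hits accumulate in kernel memory
  File.write("#{TRACING}/function_profile_enabled", "0")

  # Sum the per-CPU profile files (trace_stat/function0, function1, ...).
  counts = Hash.new(0)
  Dir.glob("#{TRACING}/trace_stat/function*").each do |file|
    File.readlines(file).drop(2).each do |line|            # skip the two header lines
      name, hits = line.split
      counts[name] += hits.to_i
    end
  end

  counts.sort_by { |_, c| c }.each { |name, c| puts format("%-32s %8d", name, c) }
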
As was demonstrated here, wildcards can be used. Individual functions can also
be specified. For example, all of the following are valid arguments:

bio_init
bio_*
*init
*bio*

A "*" within a string (eg, "bio*init") is not supported.

The full list of what can be traced is in
/sys/kernel/debug/tracing/available_filter_functions, which can be grep'd to
check what is there. Note that grep uses regular expressions, whereas
funccount uses globbing for wildcards.


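Because the pattern is a glob, it is easy to preview what a given wildcard would match before tracing. A small, hypothetical Ruby snippet that applies shell-style matching to available_filter_functions (bio_* is just a default pattern, and the usual mount point is assumed):

  # Preview which kernel functions a funccount-style glob would match.
  pattern = ARGV.fetch(0, "bio_*")
  path    = "/sys/kernel/debug/tracing/available_filter_functions"

  matches = File.foreach(path)
                .map { |line| line.split.first }           # drop "[module]" annotations
                .select { |fn| File.fnmatch(pattern, fn) }

  puts matches
  puts "#{matches.size} functions match #{pattern.inspect}"

File.fnmatch uses glob semantics, which roughly mirrors how the kernel expands these wildcards, so the count gives a reasonable estimate of how many functions funccount will instrument.
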
Counting all "tcp_" kernel functions, and printing a top-5 summary every second:

# ./funccount -i 1 -t 5 'tcp_*'
Tracing "tcp_*". Top 5 only... Ctrl-C to end.

FUNC                              COUNT
tcp_cleanup_rbuf                    386
tcp_service_net_dma                 386
tcp_established_options             549
tcp_v4_md5_lookup                   560
tcp_v4_md5_do_lookup                890

FUNC                              COUNT
tcp_service_net_dma                 498
tcp_cleanup_rbuf                    499
tcp_established_options             664
tcp_v4_md5_lookup                   672
tcp_v4_md5_do_lookup               1071

[...]

Neat.


Tracing all "ext4*" kernel functions for 10 seconds, and printing the top 25:

# ./funccount -t 25 -d 10 'ext4*'
Tracing "ext4*" for 10 seconds. Top 25 only...

FUNC                              COUNT
ext4_inode_bitmap                   840
ext4_meta_trans_blocks              840
ext4_ext_drop_refs                  843
ext4_find_entry                     845
ext4_discard_preallocations        1008
ext4_free_inodes_count             1120
ext4_group_desc_csum               1120
ext4_group_desc_csum_set           1120
ext4_getblk                        1128
ext4_es_free_extent                1328
ext4_map_blocks                    1471
ext4_es_lookup_extent              1751
ext4_mb_check_limits               1873
ext4_es_lru_add                    2031
ext4_data_block_valid              2312
ext4_journal_check_start           3080
ext4_mark_inode_dirty              5320
ext4_get_inode_flags               5955
ext4_get_inode_loc                 5955
ext4_mark_iloc_dirty               5955
ext4_reserve_inode_write           5955
ext4_inode_table                   7076
ext4_get_group_desc                8476
ext4_has_inline_data               9492
ext4_inode_touch_time_cmp         38980

Ending tracing...

So ext4_inode_touch_time_cmp() was called the most frequently, at 38,980 times.
This may or may not be normal. The purpose of this tool is to give you one
view of how often one or many kernel functions are executed. Previously I had
little idea what ext4 was doing internally. Now I know the top 25 functions and
their rates, and can begin researching them from the source code.


Use -h to print the USAGE message:

# ./funccount -h
USAGE: funccount [-hT] [-i secs] [-d secs] [-t top] funcstring
                 -d seconds      # total duration of trace
                 -h              # this usage message
                 -i seconds      # interval summary
                 -t top          # show top num entries only
                 -T              # include timestamp (for -i)
  eg,
       funccount 'vfs*'            # trace all funcs that match "vfs*"
       funccount -d 5 'tcp*'       # trace "tcp*" funcs for 5 seconds
       funccount -t 10 'ext3*'     # show top 10 "ext3*" funcs
       funccount -i 1 'ext3*'      # summary every 1 second
       funccount -i 1 -d 5 'ext3*' # 5 x 1 second summaries

See the man page and example file for more info.