fluent-plugin-perf-tools 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (98)
  1. checksums.yaml +7 -0
  2. data/.gitignore +15 -0
  3. data/.rubocop.yml +26 -0
  4. data/.ruby-version +1 -0
  5. data/CHANGELOG.md +5 -0
  6. data/CODE_OF_CONDUCT.md +84 -0
  7. data/Gemfile +5 -0
  8. data/LICENSE.txt +21 -0
  9. data/README.md +43 -0
  10. data/Rakefile +17 -0
  11. data/bin/console +15 -0
  12. data/bin/setup +8 -0
  13. data/fluent-plugin-perf-tools.gemspec +48 -0
  14. data/lib/fluent/plugin/in_perf_tools.rb +42 -0
  15. data/lib/fluent/plugin/perf_tools/cachestat.rb +65 -0
  16. data/lib/fluent/plugin/perf_tools/command.rb +30 -0
  17. data/lib/fluent/plugin/perf_tools/version.rb +9 -0
  18. data/lib/fluent/plugin/perf_tools.rb +11 -0
  19. data/perf-tools/LICENSE +339 -0
  20. data/perf-tools/README.md +205 -0
  21. data/perf-tools/bin/bitesize +1 -0
  22. data/perf-tools/bin/cachestat +1 -0
  23. data/perf-tools/bin/execsnoop +1 -0
  24. data/perf-tools/bin/funccount +1 -0
  25. data/perf-tools/bin/funcgraph +1 -0
  26. data/perf-tools/bin/funcslower +1 -0
  27. data/perf-tools/bin/functrace +1 -0
  28. data/perf-tools/bin/iolatency +1 -0
  29. data/perf-tools/bin/iosnoop +1 -0
  30. data/perf-tools/bin/killsnoop +1 -0
  31. data/perf-tools/bin/kprobe +1 -0
  32. data/perf-tools/bin/opensnoop +1 -0
  33. data/perf-tools/bin/perf-stat-hist +1 -0
  34. data/perf-tools/bin/reset-ftrace +1 -0
  35. data/perf-tools/bin/syscount +1 -0
  36. data/perf-tools/bin/tcpretrans +1 -0
  37. data/perf-tools/bin/tpoint +1 -0
  38. data/perf-tools/bin/uprobe +1 -0
  39. data/perf-tools/deprecated/README.md +1 -0
  40. data/perf-tools/deprecated/execsnoop-proc +150 -0
  41. data/perf-tools/deprecated/execsnoop-proc.8 +80 -0
  42. data/perf-tools/deprecated/execsnoop-proc_example.txt +46 -0
  43. data/perf-tools/disk/bitesize +175 -0
  44. data/perf-tools/examples/bitesize_example.txt +63 -0
  45. data/perf-tools/examples/cachestat_example.txt +58 -0
  46. data/perf-tools/examples/execsnoop_example.txt +153 -0
  47. data/perf-tools/examples/funccount_example.txt +126 -0
  48. data/perf-tools/examples/funcgraph_example.txt +2178 -0
  49. data/perf-tools/examples/funcslower_example.txt +110 -0
  50. data/perf-tools/examples/functrace_example.txt +341 -0
  51. data/perf-tools/examples/iolatency_example.txt +350 -0
  52. data/perf-tools/examples/iosnoop_example.txt +302 -0
  53. data/perf-tools/examples/killsnoop_example.txt +62 -0
  54. data/perf-tools/examples/kprobe_example.txt +379 -0
  55. data/perf-tools/examples/opensnoop_example.txt +47 -0
  56. data/perf-tools/examples/perf-stat-hist_example.txt +149 -0
  57. data/perf-tools/examples/reset-ftrace_example.txt +88 -0
  58. data/perf-tools/examples/syscount_example.txt +297 -0
  59. data/perf-tools/examples/tcpretrans_example.txt +93 -0
  60. data/perf-tools/examples/tpoint_example.txt +210 -0
  61. data/perf-tools/examples/uprobe_example.txt +321 -0
  62. data/perf-tools/execsnoop +292 -0
  63. data/perf-tools/fs/cachestat +167 -0
  64. data/perf-tools/images/perf-tools_2016.png +0 -0
  65. data/perf-tools/iolatency +296 -0
  66. data/perf-tools/iosnoop +296 -0
  67. data/perf-tools/kernel/funccount +146 -0
  68. data/perf-tools/kernel/funcgraph +259 -0
  69. data/perf-tools/kernel/funcslower +248 -0
  70. data/perf-tools/kernel/functrace +192 -0
  71. data/perf-tools/kernel/kprobe +270 -0
  72. data/perf-tools/killsnoop +263 -0
  73. data/perf-tools/man/man8/bitesize.8 +70 -0
  74. data/perf-tools/man/man8/cachestat.8 +111 -0
  75. data/perf-tools/man/man8/execsnoop.8 +104 -0
  76. data/perf-tools/man/man8/funccount.8 +76 -0
  77. data/perf-tools/man/man8/funcgraph.8 +166 -0
  78. data/perf-tools/man/man8/funcslower.8 +129 -0
  79. data/perf-tools/man/man8/functrace.8 +123 -0
  80. data/perf-tools/man/man8/iolatency.8 +116 -0
  81. data/perf-tools/man/man8/iosnoop.8 +169 -0
  82. data/perf-tools/man/man8/killsnoop.8 +100 -0
  83. data/perf-tools/man/man8/kprobe.8 +162 -0
  84. data/perf-tools/man/man8/opensnoop.8 +113 -0
  85. data/perf-tools/man/man8/perf-stat-hist.8 +111 -0
  86. data/perf-tools/man/man8/reset-ftrace.8 +49 -0
  87. data/perf-tools/man/man8/syscount.8 +96 -0
  88. data/perf-tools/man/man8/tcpretrans.8 +93 -0
  89. data/perf-tools/man/man8/tpoint.8 +140 -0
  90. data/perf-tools/man/man8/uprobe.8 +168 -0
  91. data/perf-tools/misc/perf-stat-hist +223 -0
  92. data/perf-tools/net/tcpretrans +311 -0
  93. data/perf-tools/opensnoop +280 -0
  94. data/perf-tools/syscount +192 -0
  95. data/perf-tools/system/tpoint +232 -0
  96. data/perf-tools/tools/reset-ftrace +123 -0
  97. data/perf-tools/user/uprobe +390 -0
  98. metadata +349 -0
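
Of these 98 files, only a handful are Ruby; the rest vendor Brendan Gregg's perf-tools collection. The input plugin (data/lib/fluent/plugin/in_perf_tools.rb) presumably works by running one of the vendored scripts and emitting each output line as a Fluentd event. A minimal sketch of that wrapping pattern in plain Ruby — the method name and interface below are illustrative assumptions, not the gem's actual API:

  require "open3"

  # Illustrative sketch: run a perf-tools script and hand each line of its
  # combined stdout/stderr to the caller, the way an input plugin would
  # turn lines into events. Most perf-tools scripts need root privileges.
  def stream_perf_tool(command)
    Open3.popen2e(command) do |_stdin, output, wait_thr|
      output.each_line { |line| yield line.chomp }
      wait_thr.value # reap the child process
    end
  end

  # e.g. stream_perf_tool("perf-tools/fs/cachestat 1") { |line| puts line }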
data/perf-tools/examples/execsnoop_example.txt
@@ -0,0 +1,153 @@
+Demonstrations of execsnoop, the Linux ftrace version.
+
+
+Here's execsnoop showing what's really executed by "man ls":
+
+# ./execsnoop
+Tracing exec()s. Ctrl-C to end.
+   PID   PPID ARGS
+ 22898  22004 man ls
+ 22905  22898 preconv -e UTF-8
+ 22908  22898 pager -s
+ 22907  22898 nroff -mandoc -rLL=164n -rLT=164n -Tutf8
+ 22906  22898 tbl
+ 22911  22910 locale charmap
+ 22912  22907 groff -mtty-char -Tutf8 -mandoc -rLL=164n -rLT=164n
+ 22913  22912 troff -mtty-char -mandoc -rLL=164n -rLT=164n -Tutf8
+ 22914  22912 grotty
+
+Many commands. This is particularly useful for understanding application
+startup.
+
+
+Another use for execsnoop is identifying short-lived processes. Eg, with the -t
+option to see timestamps:
+
+# ./execsnoop -t
+Tracing exec()s. Ctrl-C to end.
+TIMEs             PID   PPID ARGS
+7419756.154031   8185   8181 mawk -W interactive -v o=1 -v opt_name=0 -v name= [...]
+7419756.154131   8186   8184 cat -v trace_pipe
+7419756.245264   8188   1698 ./run
+7419756.245691   8189   1696 ./run
+7419756.246212   8187   1689 ./run
+7419756.278993   8190   1693 ./run
+7419756.278996   8191   1692 ./run
+7419756.288430   8192   1695 ./run
+7419756.290115   8193   1691 ./run
+7419756.292406   8194   1699 ./run
+7419756.293986   8195   1690 ./run
+7419756.294149   8196   1686 ./run
+7419756.296527   8197   1687 ./run
+7419756.296973   8198   1697 ./run
+7419756.298356   8200   1685 ./run
+7419756.298683   8199   1688 ./run
+7419757.269883   8201   1696 ./run
+[...]
+
+So we're running many "run" commands every second. The PPID is included, so I
+can debug this further (they are "supervise" processes).
+
+Short-lived processes can consume CPU and not be visible from top(1), and can
+be the source of hidden performance issues.
+
+
+Here's another example: I noticed CPU usage was high in top(1), but couldn't
+see the responsible process:
+
+$ top
+top - 00:04:32 up 78 days, 15:41,  3 users,  load average: 0.85, 0.29, 0.14
+Tasks: 123 total,   1 running, 121 sleeping,   0 stopped,   1 zombie
+Cpu(s): 15.7%us, 34.9%sy,  0.0%ni, 49.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
+Mem:   7629464k total,  7537216k used,    92248k free,  1376492k buffers
+Swap:        0k total,        0k used,        0k free,  5432356k cached
+
+  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
+ 7225 bgregg-t  20   0 29480 6196 2128 S    3  0.1   0:02.64 ec2rotatelogs
+    1 root      20   0 24320 2256 1340 S    0  0.0   0:01.23 init
+    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
+    3 root      20   0     0    0    0 S    0  0.0   1:19.61 ksoftirqd/0
+    4 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/0:0
+    5 root      20   0     0    0    0 S    0  0.0   0:00.01 kworker/u:0
+    6 root      RT   0     0    0    0 S    0  0.0   0:16.00 migration/0
+    7 root      RT   0     0    0    0 S    0  0.0   0:17.29 watchdog/0
+    8 root      RT   0     0    0    0 S    0  0.0   0:15.85 migration/1
+    9 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/1:0
+[...]
+
+See the line starting with "Cpu(s):". So there's about 50% CPU utilized (this
+is a two CPU server, so that's equivalent to one full CPU), but this CPU usage
+isn't visible from the process listing.
+
+vmstat agreed, showing the same average CPU usage statistics:
+
+# vmstat 1
+procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
+ r  b   swpd   free    buff   cache   si   so    bi    bo   in   cs us sy id wa
+ 2  0      0  92816 1376476 5432188    0    0     0     3    2    1  0  1 99  0
+ 1  0      0  92676 1376484 5432264    0    0     0    24 6573 6130 12 38 49  0
+ 1  0      0  91964 1376484 5432272    0    0     0     0 6529 6097 16 35 49  0
+ 1  0      0  92692 1376484 5432272    0    0     0     0 6192 5775 17 35 49  0
+ 1  0      0  92692 1376484 5432272    0    0     0     0 6554 6121 14 36 50  0
+ 1  0      0  91940 1376484 5432272    0    0     0    12 6546 6101 13 38 49  0
+ 1  0      0  92560 1376484 5432272    0    0     0     0 6201 5769 15 35 49  0
+ 1  0      0  92676 1376484 5432272    0    0     0     0 6524 6123 17 34 49  0
+ 1  0      0  91932 1376484 5432272    0    0     0     0 6546 6107 10 40 49  0
+ 1  0      0  92832 1376484 5432272    0    0     0     0 6057 5710 13 38 49  0
+ 1  0      0  92248 1376484 5432272    0    0    84    28 6592 6183 16 36 48  1
+ 1  0      0  91504 1376492 5432348    0    0     0    12 6540 6098 18 33 49  1
+[...]
+
+So this could be caused by short-lived processes, which vanish before they are
+seen by top(1). Do I have my execsnoop handy? Yes:
+
+# ~/perf-tools/bin/execsnoop
+Tracing exec()s. Ctrl-C to end.
+   PID   PPID ARGS
+ 10239  10229 gawk -v o=0 -v opt_name=0 -v name= -v opt_duration=0 [...]
+ 10240  10238 cat -v trace_pipe
+ 10242   7225 sh [?]
+ 10243  10242 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.201201.3122.txt
+ 10245   7225 sh [?]
+ 10246  10245 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.202201.3122.txt
+ 10248   7225 sh [?]
+ 10249  10248 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.203201.3122.txt
+ 10251   7225 sh [?]
+ 10252  10251 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.204201.3122.txt
+ 10254   7225 sh [?]
+ 10255  10254 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.205201.3122.txt
+ 10257   7225 sh [?]
+ 10258  10257 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.210201.3122.txt
+ 10260   7225 sh [?]
+ 10261  10260 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.211201.3122.txt
+ 10263   7225 sh [?]
+ 10264  10263 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.212201.3122.txt
+ 10266   7225 sh [?]
+ 10267  10266 /usr/sbin/lsof -X /logs/tomcat/cores/threaddump.20141215.213201.3122.txt
+[...]
+
+The output scrolled quickly, showing that many shell and lsof processes were
+being launched. If you check the PID and PPID columns carefully, you can see that
+these are ultimately all from PID 7225. We saw that earlier in the top output:
+ec2rotatelogs, at 3% CPU. I now know the culprit.
+
+I should have used "-t" to show the timestamps with this example.
+
+
+Run -h to print the USAGE message:
+
+# ./execsnoop -h
+USAGE: execsnoop [-hrt] [-a argc] [-d secs] [name]
+                 -d seconds      # trace duration, and use buffers
+                 -a argc         # max args to show (default 8)
+                 -r              # include re-execs
+                 -t              # include time (seconds)
+                 -h              # this usage message
+                 name            # process name to match (REs allowed)
+  eg,
+      execsnoop             # watch exec()s live (unbuffered)
+      execsnoop -d 1        # trace 1 sec (buffered)
+      execsnoop grep        # trace process names containing grep
+      execsnoop 'log$'      # process names ending in "log"
+
+See the man page and example file for more info.
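
Because the default output is three fixed columns (PID, PPID, then the argument string), it is easy to post-process. A hypothetical Ruby parser for the format shown above (the helper name is made up for illustration):

  # Hypothetical parser for execsnoop's default "PID PPID ARGS" columns.
  # Splitting into at most three fields keeps the command line intact.
  def parse_execsnoop_line(line)
    pid, ppid, args = line.strip.split(/\s+/, 3)
    return nil unless pid =~ /\A\d+\z/ # skip the banner and header lines
    { pid: Integer(pid), ppid: Integer(ppid), args: args }
  end

  p parse_execsnoop_line("22905  22898 preconv -e UTF-8")
  # => {:pid=>22905, :ppid=>22898, :args=>"preconv -e UTF-8"}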
data/perf-tools/examples/funccount_example.txt
@@ -0,0 +1,126 @@
+Demonstrations of funccount, the Linux ftrace version.
+
+
+Tracing all kernel functions that start with "bio_" (which would be block
+interface functions), and counting how many times they were executed until
+Ctrl-C is hit:
+
+# ./funccount 'bio_*'
+Tracing "bio_*"... Ctrl-C to end.
+^C
+FUNC                              COUNT
+bio_attempt_back_merge               26
+bio_get_nr_vecs                     361
+bio_alloc                           536
+bio_alloc_bioset                    536
+bio_endio                           536
+bio_free                            536
+bio_fs_destructor                   536
+bio_init                            536
+bio_integrity_enabled               536
+bio_put                             729
+bio_add_page                       1004
+
+Note that these counts are performed in kernel context, using the ftrace
+function profiler, which means this is a (relatively) low overhead technique.
+Test yourself to quantify overhead.
+
+
+As was demonstrated here, wildcards can be used. Individual functions can also
+be specified. For example, all of the following are valid arguments:
+
+  bio_init
+  bio_*
+  *init
+  *bio*
+
+A "*" within a string (eg, "bio*init") is not supported.
+
+The full list of what can be traced is in
+/sys/kernel/debug/tracing/available_filter_functions, which can be grep'd to
+check what is there. Note that grep uses regular expressions, whereas
+funccount uses globbing for wildcards.
+
+
+Counting all "tcp_" kernel functions, and printing a summary every second:
+
+# ./funccount -i 1 -t 5 'tcp_*'
+Tracing "tcp_*". Top 5 only... Ctrl-C to end.
+
+FUNC                              COUNT
+tcp_cleanup_rbuf                    386
+tcp_service_net_dma                 386
+tcp_established_options             549
+tcp_v4_md5_lookup                   560
+tcp_v4_md5_do_lookup                890
+
+FUNC                              COUNT
+tcp_service_net_dma                 498
+tcp_cleanup_rbuf                    499
+tcp_established_options             664
+tcp_v4_md5_lookup                   672
+tcp_v4_md5_do_lookup               1071
+
+[...]
+
+Neat.
+
+
+Tracing all "ext4*" kernel functions for 10 seconds, and printing the top 25:
+
+# ./funccount -t 25 -d 10 'ext4*'
+Tracing "ext4*" for 10 seconds. Top 25 only...
+
+FUNC                              COUNT
+ext4_inode_bitmap                   840
+ext4_meta_trans_blocks              840
+ext4_ext_drop_refs                  843
+ext4_find_entry                     845
+ext4_discard_preallocations        1008
+ext4_free_inodes_count             1120
+ext4_group_desc_csum               1120
+ext4_group_desc_csum_set           1120
+ext4_getblk                        1128
+ext4_es_free_extent                1328
+ext4_map_blocks                    1471
+ext4_es_lookup_extent              1751
+ext4_mb_check_limits               1873
+ext4_es_lru_add                    2031
+ext4_data_block_valid              2312
+ext4_journal_check_start           3080
+ext4_mark_inode_dirty              5320
+ext4_get_inode_flags               5955
+ext4_get_inode_loc                 5955
+ext4_mark_iloc_dirty               5955
+ext4_reserve_inode_write           5955
+ext4_inode_table                   7076
+ext4_get_group_desc                8476
+ext4_has_inline_data               9492
+ext4_inode_touch_time_cmp         38980
+
+Ending tracing...
+
+So ext4_inode_touch_time_cmp() was called the most frequently, at 38,980 times.
+This may be normal, or it may not. The purpose of this tool is to give you one
+view of how one or many kernel functions are executed. Previously I had little
+idea what ext4 was doing internally. Now I know the top 25 functions and their
+rates, and can begin researching them from the source code.
+
+
+Use -h to print the USAGE message:
+
+# ./funccount -h
+USAGE: funccount [-hT] [-i secs] [-d secs] [-t top] funcstring
+                 -d seconds      # total duration of trace
+                 -h              # this usage message
+                 -i seconds      # interval summary
+                 -t top          # show top num entries only
+                 -T              # include timestamp (for -i)
+  eg,
+      funccount 'vfs*'             # trace all funcs that match "vfs*"
+      funccount -d 5 'tcp*'        # trace "tcp*" funcs for 5 seconds
+      funccount -t 10 'ext3*'      # show top 10 "ext3*" funcs
+      funccount -i 1 'ext3*'       # summary every 1 second
+      funccount -i 1 -d 5 'ext3*'  # 5 x 1 second summaries
+
+See the man page and example file for more info.
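
The FUNC/COUNT tables are equally easy to consume programmatically. As a hedged sketch in plain Ruby (the function name is made up), converting one summary into per-second call rates given the -d duration, with sample data taken from the ext4 run above:

  # Turn funccount's "FUNC COUNT" rows into per-second call rates.
  # duration is the -d value used for the trace (10 seconds above).
  def call_rates(rows, duration)
    rows.each_with_object({}) do |line, rates|
      func, count = line.split
      next unless count =~ /\A\d+\z/ # skip the header row
      rates[func] = count.to_i / duration.to_f
    end
  end

  sample = ["FUNC                COUNT", "ext4_inode_touch_time_cmp    38980"]
  p call_rates(sample, 10) # => {"ext4_inode_touch_time_cmp"=>3898.0}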