ruby-prof 2.0.2 → 2.0.3

data/docs/architecture.md CHANGED
@@ -1,122 +1,304 @@
1
- # Architecture
2
-
3
- ## Overview
4
-
5
- ruby-prof is a C extension that uses Ruby's [TracePoint](https://docs.ruby-lang.org/en/master/TracePoint.html) API to intercept method calls and returns. Every time a method is entered or exited, ruby-prof records timing and (optionally) allocation data. This tracing approach means ruby-prof captures every method invocation, giving exact call counts and complete call graphs.
6
-
7
- The diagram below shows the main classes that make up ruby-prof:
8
-
9
- ```mermaid
10
- classDiagram
11
- Profile "1" *-- "1" Measurer
12
- Profile "1" *-- "*" Thread
13
- Thread "1" *-- "1" Stack
14
- Thread "1" *-- "*" MethodInfo
15
- Thread "1" *-- "1" CallTree
16
- Stack "1" o-- "*" Frame
17
- Frame --> CallTree
18
- CallTree "1" *-- "1" Measurement
19
- CallTree --> MethodInfo : target
20
- MethodInfo "1" *-- "1" CallTrees
21
- MethodInfo "1" *-- "1" Measurement
22
- MethodInfo "1" *-- "*" Allocation
23
- CallTrees o-- "*" CallTree
24
-
25
- class Profile {
26
- +threads: Hash
27
- +measurer: Measurer
28
- }
29
- class Measurer {
30
- +mode: MeasurerMode
31
- +track_allocations: boolean
32
- +multiplier: double
33
- +measure: function pointer
34
- }
35
- class Thread {
36
- +methods: Hash
37
- +stack: Stack
38
- +callTree: CallTree
39
- }
40
- class Stack {
41
- +frames: Array
42
- }
43
- class Frame {
44
- +callTree: CallTree
45
- }
46
- class CallTree {
47
- +parent: CallTree
48
- +children: Hash
49
- +target: MethodInfo
50
- +measurement: Measurement
51
- }
52
- class MethodInfo {
53
- +allocations: Hash
54
- +callTrees: CallTrees
55
- +measurement: Measurement
56
- }
57
- class Measurement {
58
- +total_time: double
59
- +self_time: double
60
- +wait_time: double
61
- +called: integer
62
- }
63
- class Allocation {
64
- +count: integer
65
- +source_file: string
66
- +source_line: int
67
- +klass: VALUE
68
- }
69
- class CallTrees {
70
- +callTrees: Array
71
- }
72
- ```
73
-
74
- ## Profile
75
-
76
- Profile is the top-level object returned by a profiling run:
77
-
78
- ```ruby
79
- profile = RubyProf::Profile.profile do
80
- ...
81
- end
82
- ```
83
-
84
- A Profile owns a Measurer that determines what is being measured, and a collection of Threads representing each thread (or fiber) that was active during profiling.
85
-
86
- ## Measurer and Measurement
87
-
88
- The **Measurer** controls what ruby-prof measures. It holds a function pointer that is called on every method entry and exit to take a measurement. The three modes are:
89
-
90
- - **Wall time** — elapsed real time
91
- - **Process time** — CPU time consumed by the process (excludes time spent in sleep or I/O)
92
- - **Allocations** — number of objects allocated
93
-
94
- Each CallTree and MethodInfo holds a **Measurement** that accumulates the results: total time, self time (excluding children), wait time (time spent waiting on other threads), and call count.
95
-
96
- ## Thread
97
-
98
- Each Thread tracks the methods called on that thread and owns the root of a call tree. It also maintains an internal Stack of Frames used during profiling to track the current call depth.
99
-
100
- **Stack** and **Frame** are transient — they exist only while profiling is active. A Frame records timing data for a single method invocation on the stack, including start time and time spent in child calls. When a method returns, its Frame is popped and the accumulated timing is transferred to the corresponding CallTree node.
101
-
102
- ## CallTree and MethodInfo
103
-
104
- These two classes are central to ruby-prof and represent two different views of the same profiling data:
105
-
106
- - **CallTree** records the calling structure — which method called which, forming a graph. Each node has a parent, children, and a reference to its target MethodInfo. A method that is called from two different call sites will have two separate CallTree nodes, each with its own Measurement. Recursive methods create cycles in the graph.
107
-
108
- - **MethodInfo** represents a single method regardless of where it was called from. It aggregates data across all call sites. Each MethodInfo holds a CallTrees collection that links back to every CallTree node that invoked that method, providing both caller and callee information.
109
-
110
- This separation is what allows ruby-prof to generate both call graph reports (which show calling relationships) and flat reports (which show per-method totals).
111
-
112
- ## Allocation
113
-
114
- When allocation tracking is enabled, each MethodInfo records the objects it allocated. An Allocation tracks the class of object created, the source location, and the count.
115
-
116
- ## Memory Management
117
-
118
- The Profile object is responsible for managing the memory of its child objects, which are C structures. When a Profile is garbage collected, it recursively frees all its objects. In the class diagram, composition relationships (filled diamond) indicate ownership — a Profile frees its Threads, Threads free their CallTrees and MethodInfo instances, and so on.
119
-
120
- ruby-prof keeps a Profile alive as long as there are live references to any of its MethodInfo or CallTree objects. This is done via Ruby's GC mark phase: CallTree instances mark their associated MethodInfo, and MethodInfo instances mark their owning Profile.
121
-
122
- Starting with version 1.5, it is possible to create Thread, CallTree and MethodInfo instances from Ruby (this was added to support testing). These Ruby-created objects are owned by Ruby's garbage collector rather than the C extension. An internal ownership flag on each instance tracks who is responsible for freeing it.
1
+ # Architecture
2
+
3
+ ## Overview
4
+
5
+ ruby-prof is a C extension that uses Ruby's [TracePoint](https://docs.ruby-lang.org/en/master/TracePoint.html) API to intercept method calls and returns. Every time a method is entered or exited, ruby-prof records timing and (optionally) allocation data. This tracing approach means ruby-prof captures every method invocation, giving exact call counts and complete call graphs.
6
+
7
+ The diagram below shows the main classes that make up ruby-prof:
8
+
9
+ ```mermaid
10
+ classDiagram
11
+ Profile "1" *-- "1" Measurer
12
+ Profile "1" *-- "*" Thread
13
+ Thread "1" *-- "1" Stack
14
+ Thread "1" *-- "*" MethodInfo
15
+ Thread "1" *-- "1" CallTree
16
+ Stack "1" o-- "*" Frame
17
+ Frame --> CallTree
18
+ CallTree "1" *-- "1" Measurement
19
+ CallTree --> MethodInfo : target
20
+ MethodInfo "1" *-- "1" CallTrees
21
+ MethodInfo "1" *-- "1" Measurement
22
+ MethodInfo "1" *-- "*" Allocation
23
+ CallTrees o-- "*" CallTree
24
+
25
+ class Profile {
26
+ +threads: Hash
27
+ +measurer: Measurer
28
+ }
29
+ class Measurer {
30
+ +mode: MeasurerMode
31
+ +track_allocations: boolean
32
+ +multiplier: double
33
+ +measure: function pointer
34
+ }
35
+ class Thread {
36
+ +methods: Hash
37
+ +stack: Stack
38
+ +callTree: CallTree
39
+ }
40
+ class Stack {
41
+ +frames: Array
42
+ }
43
+ class Frame {
44
+ +callTree: CallTree
45
+ }
46
+ class CallTree {
47
+ +parent: CallTree
48
+ +children: Hash
49
+ +target: MethodInfo
50
+ +measurement: Measurement
51
+ }
52
+ class MethodInfo {
53
+ +allocations: Hash
54
+ +callTrees: CallTrees
55
+ +measurement: Measurement
56
+ }
57
+ class Measurement {
58
+ +total_time: double
59
+ +self_time: double
60
+ +wait_time: double
61
+ +called: integer
62
+ }
63
+ class Allocation {
64
+ +count: integer
65
+ +source_file: string
66
+ +source_line: int
67
+ +klass: VALUE
68
+ }
69
+ class CallTrees {
70
+ +callTrees: Array
71
+ }
72
+ ```
73
+
74
+ ## Profile
75
+
76
+ Profile is the top-level object returned by a profiling run:
77
+
78
+ ```ruby
79
+ profile = RubyProf::Profile.profile do
80
+ ...
81
+ end
82
+ ```
83
+
84
+ A Profile owns a Measurer that determines what is being measured, and a collection of Threads representing each thread (or fiber) that was active during profiling.
85
+
86
+ ## Measurer and Measurement
87
+
88
+ The **Measurer** controls what ruby-prof measures. It holds a function pointer that is called on every method entry and exit to take a measurement. The three modes are:
89
+
90
+ - **Wall time** — elapsed real time
91
+ - **Process time** — CPU time consumed by the process (excludes time spent in sleep or I/O)
92
+ - **Allocations** — number of objects allocated
93
+
94
+ Each CallTree and MethodInfo holds a **Measurement** that accumulates the results: total time, self time (excluding children), wait time (time spent waiting on other threads), and call count.
95
+
96
+ ## Thread
97
+
98
+ Each Thread tracks the methods called on that thread and owns the root of a call tree. It also maintains an internal Stack of Frames used during profiling to track the current call depth.
99
+
100
+ **Stack** and **Frame** are transient — they exist only while profiling is active. A Frame records timing data for a single method invocation on the stack, including start time and time spent in child calls. When a method returns, its Frame is popped and the accumulated timing is transferred to the corresponding CallTree node.
101
+
102
+ ## CallTree and MethodInfo
103
+
104
+ These two classes are central to ruby-prof and represent two different views of the same profiling data:
105
+
106
+ - **CallTree** records the calling structure — which method called which, forming a tree. Each node has a parent, children, and a reference to its target MethodInfo. A method that is called from two different call sites will have two separate CallTree nodes, each with its own Measurement. Recursive methods are handled by creating a chain of CallTree nodes (see [Recursion](#recursion) below).
107
+
108
+ - **MethodInfo** represents a single method regardless of where it was called from. It aggregates data across all call sites. Each MethodInfo holds a CallTrees collection that links back to every CallTree node that invoked that method, providing both caller and callee information.
109
+
110
+ This separation is what allows ruby-prof to generate both call graph reports (which show calling relationships) and flat reports (which show per-method totals).
111
+
112
+ ## Building the Call Tree
113
+
114
+ This section describes how the call tree is constructed during profiling.
115
+
116
+ Consider profiling this code:
117
+
118
+ ```ruby
119
+ def process
120
+ validate
121
+ save
122
+ end
123
+
124
+ def save
125
+ validate
126
+ write
127
+ end
128
+ ```
129
+
130
+ The resulting CallTree looks like:
131
+
132
+ ```mermaid
133
+ graph TD
134
+ A{{"[global]"}} -->|child| B{{"process"}}
135
+ B -->|child| C{{"validate"}}
136
+ B -->|child| D{{"save"}}
137
+ D -->|child| E{{"validate"}}
138
+ D -->|child| F{{"write"}}
139
+
140
+ B -.->|parent| A
141
+ C -.->|parent| B
142
+ D -.->|parent| B
143
+ E -.->|parent| D
144
+ F -.->|parent| D
145
+ ```
146
+
147
+ Notice that `validate` appears as two separate CallTree nodes — one under `process` and one under `save` — because it was called from two different call sites. Each has its own parent and its own Measurement. Both nodes reference the same `validate` MethodInfo, which aggregates the data across both call sites.
148
+
149
+ The following diagram shows both views together. CallTree nodes (hexagons) reference their target MethodInfo (rectangles) via dashed arrows:
150
+
151
+ ```mermaid
152
+ graph TD
153
+ classDef calltree fill:#E8F4FD,stroke:#2E86C1
154
+ classDef methodinfo fill:#FADBD8,stroke:#E74C3C
155
+
156
+ CT1{{"[global]"}}:::calltree --> CT2{{"process"}}:::calltree
157
+ CT2 --> CT3{{"validate"}}:::calltree
158
+ CT2 --> CT4{{"save"}}:::calltree
159
+ CT4 --> CT5{{"validate"}}:::calltree
160
+ CT4 --> CT6{{"write"}}:::calltree
161
+
162
+ CT1 -.->|target| M1["[global]"]:::methodinfo
163
+ CT2 -.->|target| M2["process"]:::methodinfo
164
+ CT3 -.->|target| M3["validate"]:::methodinfo
165
+ CT4 -.->|target| M4["save"]:::methodinfo
166
+ CT5 -.->|target| M3
167
+ CT6 -.->|target| M5["write"]:::methodinfo
168
+
169
+ M3 -.->|call_trees| CT3
170
+ M3 -.->|call_trees| CT5
171
+ ```
172
+
173
+ Both `validate` CallTree nodes point to the same `validate` MethodInfo via `target`. The MethodInfo points back to its CallTree nodes via `call_trees` — a flat array of every CallTree node that invoked this method. From this array, `callers` and `callees` are derived: `callers` walks each node's parent, and `callees` walks each node's children. Both are aggregated by method to produce a single entry per caller or callee method.
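The derivation of `callers` and `callees` from the flat `call_trees` array can be sketched in plain Ruby. This is a minimal model, not the ruby-prof implementation: `NodeSketch`, `add_child`, `callers_of`, `callees_of`, and `build_example` are hypothetical names, and the sketch aggregates only method names, whereas the real classes presumably aggregate measurements as well.

```ruby
# Hypothetical stand-in for a CallTree node; not the ruby-prof class.
NodeSketch = Struct.new(:method_name, :parent, :children)

# Create a child node under parent and register it in the parent's children.
def add_child(parent, method_name)
  child = NodeSketch.new(method_name, parent, [])
  parent.children << child
  child
end

# Callers: walk each node's parent, then keep one entry per method.
def callers_of(call_trees)
  call_trees.map { |node| node.parent&.method_name }.compact.uniq
end

# Callees: walk each node's children, then keep one entry per method.
def callees_of(call_trees)
  call_trees.flat_map { |node| node.children.map(&:method_name) }.uniq
end

# Build the process/validate/save/write tree from the example above and
# return the call_trees arrays for two of the methods.
def build_example
  root = NodeSketch.new("[global]", nil, [])
  process = add_child(root, "process")
  validate_a = add_child(process, "validate")
  save = add_child(process, "save")
  validate_b = add_child(save, "validate")
  add_child(save, "write")
  { validate: [validate_a, validate_b], save: [save] }
end
```

In this model, `callers_of(build_example[:validate])` yields `["process", "save"]`, one entry per calling method even though there are two CallTree nodes.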
174
+
175
+ ### Parents
176
+
177
+ Each CallTree node has exactly one parent, set at creation. When a method call event fires, the profiler determines the parent from the current frame on the stack:
178
+
179
+ ```c
180
+ parent_call_tree = frame->call_tree;
181
+ ```
182
+
183
+ The parent is the CallTree node that was active (top of the stack) when this method was called. The root CallTree node for each thread has no parent.
184
+
185
+ ### Children
186
+
187
+ A CallTree node's children are stored in a hash table keyed by method. The profiler looks up the method in the current parent's children. If a child already exists for that method, the existing CallTree node is **reused** and its `called` count increments. Otherwise a new node is created:
188
+
189
+ ```c
190
+ call_tree = call_tree_table_lookup(parent_call_tree->children, method->key);
191
+
192
+ if (!call_tree)
193
+ {
194
+ call_tree = prof_call_tree_create(method, parent_call_tree, ...);
195
+ prof_call_tree_add_child(parent_call_tree, call_tree);
196
+ }
197
+ ```
198
+
199
+ This means each parent has **one** child CallTree per method. For example, if `foo` calls `bar` ten times, there is a single `bar` CallTree node under `foo` with `called: 10`.
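The lookup-or-create step can be modeled in Ruby to make the reuse behavior concrete. This is a sketch under stated assumptions: `ChildReuseSketch` and its `enter` method are hypothetical, a plain Hash keyed by method name stands in for the C hash table, and incrementing `called` on entry is a simplification of what the text describes.

```ruby
# Hypothetical model of a CallTree node; not the ruby-prof class.
class ChildReuseSketch
  attr_reader :method_name, :parent, :children
  attr_accessor :called

  def initialize(method_name, parent)
    @method_name = method_name
    @parent = parent
    @children = {} # keyed by method name, one child per method
    @called = 0
  end

  # Look up the callee among this node's children; create it on a miss.
  # Either way, the (possibly reused) child records one more call.
  def enter(callee_name)
    child = (@children[callee_name] ||= ChildReuseSketch.new(callee_name, self))
    child.called += 1
    child
  end
end
```

Calling `enter("bar")` ten times on a `foo` node leaves a single `bar` child with `called` equal to 10, matching the `foo`/`bar` example above.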
200
+
201
+ ## Allocation
202
+
203
+ When allocation tracking is enabled, each MethodInfo records the objects it allocated. An Allocation tracks the class of object created, the source location, and the count.
204
+
205
+ ## Memory Management
206
+
207
+ The Profile object is responsible for managing the memory of its child objects, which are C structures. When a Profile is garbage collected, it recursively frees all its objects. In the class diagram, composition relationships (filled diamond) indicate ownership — a Profile frees its Threads, Threads free their CallTrees and MethodInfo instances, and so on.
208
+
209
+ ruby-prof keeps a Profile alive as long as there are live references to any of its MethodInfo or CallTree objects. This is done via Ruby's GC mark phase: CallTree instances mark their associated MethodInfo, and MethodInfo instances mark their owning Profile.
210
+
211
+ Starting with version 1.5, it is possible to create Thread, CallTree and MethodInfo instances from Ruby (this was added to support testing). These Ruby-created objects are owned by Ruby's garbage collector rather than the C extension. An internal ownership flag on each instance tracks who is responsible for freeing it.
212
+
213
+ ## Recursion
214
+
215
+ The call tree handles recursion naturally — each recursive call has a different parent, so new nodes are created at each level just like any other method call. The only special handling is in timing calculation, where care is needed to avoid double-counting.
216
+
217
+ ### How Recursive Calls Create New Nodes
218
+
219
+ Consider a simple recursive method:
220
+
221
+ ```ruby
222
+ def simple(n)
223
+ sleep(1)
224
+ return if n == 0
225
+ simple(n - 1)
226
+ end
227
+
228
+ simple(2)
229
+ ```
230
+
231
+ Each recursive call to `simple` has a different parent CallTree node, so the lookup in the parent's children misses and a new node is created at each level:
232
+
233
+ ```mermaid
234
+ graph TD
235
+ classDef calltree fill:#E8F4FD,stroke:#2E86C1
236
+ classDef methodinfo fill:#FADBD8,stroke:#E74C3C
237
+
238
+ A{{"[global]"}}:::calltree --> B{{"simple"}}:::calltree
239
+ B --> C{{"sleep"}}:::calltree
240
+ B --> D{{"simple"}}:::calltree
241
+ D --> E{{"sleep"}}:::calltree
242
+ D --> F{{"simple"}}:::calltree
243
+ F --> G{{"sleep"}}:::calltree
244
+
245
+ B -.-> A
246
+ C -.-> B
247
+ D -.-> B
248
+ E -.-> D
249
+ F -.-> D
250
+ G -.-> F
251
+
252
+ B -.-> M["simple"]:::methodinfo
253
+ D -.-> M
254
+ F -.-> M
255
+ ```
256
+
257
+ The CallTree is always acyclic — each recursive call creates a new node at a deeper level. However, there is a single `simple` MethodInfo (red rectangle at the bottom of the diagram), and each CallTree node points to it.
258
+
259
+ ### The Visits Counter
260
+
261
+ Both CallTree and MethodInfo have a `visits` field that tracks how many times that node or method is currently on the stack. This counter is incremented on method entry and decremented on method exit:
262
+
263
+ ```c
264
+ // Method entry (prof_frame_push):
265
+ call_tree->visits++;
266
+ if (call_tree->method->visits > 0)
267
+ call_tree->method->recursive = true;
268
+ call_tree->method->visits++;
269
+
270
+ // Method exit (prof_frame_pop):
271
+ call_tree->visits--;
272
+ call_tree->method->visits--;
273
+ ```
274
+
275
+ The MethodInfo `visits` counter serves two purposes:
276
+
277
+ 1. Detecting recursion — if `method->visits > 0` when a method is entered, the method is currently an ancestor of itself in the call stack and is marked recursive.
278
+
279
+ 2. Correct total_time accounting: total time is only added to a Measurement when the corresponding `visits` count is back down to 1 at method exit, meaning the returning frame is the outermost invocation still on the stack:
280
+
281
+ ```c
282
+ // Only accumulate total_time at the outermost visit
283
+ if (call_tree->visits == 1)
284
+ call_tree->measurement->total_time += total_time;
285
+
286
+ if (call_tree->method->visits == 1)
287
+ call_tree->method->measurement->total_time += total_time;
288
+ ```
289
+
290
+ Without this guard, total time would be double-counted. Consider `simple(2)` with 1-second sleeps. The outermost call takes ~3 seconds total, the middle call ~2 seconds, and the innermost ~1 second. Naively summing all three would give 6 seconds, but the actual elapsed time is only 3 seconds. By only recording total_time at the outermost visit, the MethodInfo correctly reports 3 seconds.
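The arithmetic in this example can be checked with a small Ruby simulation. It is a sketch, not the profiler's code: `MethodSketch`, `simulate`, and `run_simple` are hypothetical names, a counter in a Hash stands in for the wall clock, and each call advances that clock by 1 instead of sleeping.

```ruby
# Hypothetical model of a MethodInfo: a visits counter plus two totals,
# one accumulated with the outermost-visit guard and one without it.
MethodSketch = Struct.new(:visits, :guarded_total, :naive_total)

# Simulate simple(n): one "second" of own time, then recurse until n == 0.
def simulate(n, method, clock)
  method.visits += 1                  # method entry
  start = clock[:now]
  clock[:now] += 1                    # stands in for sleep(1)
  simulate(n - 1, method, clock) unless n == 0
  elapsed = clock[:now] - start
  method.naive_total += elapsed       # sums every level: double-counts
  method.guarded_total += elapsed if method.visits == 1 # outermost only
  method.visits -= 1                  # method exit
end

def run_simple(n)
  method = MethodSketch.new(0, 0, 0)
  simulate(n, method, { now: 0 })
  method
end
```

For `run_simple(2)` the naive total is 6 and the guarded total is 3, the same figures as in the paragraph above.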
291
+
292
+ ### Recursion at the MethodInfo Level
293
+
294
+ At the MethodInfo level, recursive methods create cycles. A recursive `simple` method has itself as both a caller and a callee:
295
+
296
+ ```mermaid
297
+ graph TD
298
+ classDef methodinfo fill:#FADBD8,stroke:#E74C3C
299
+ A["[global]"]:::methodinfo -->|"calls"| B["simple"]:::methodinfo
300
+ B -->|"calls"| C["sleep"]:::methodinfo
301
+ B -->|"calls"| B
302
+ ```
303
+
304
+ This is why MethodInfo has a `recursive?` flag — printers that operate on MethodInfo (such as the graph printer) need to be aware of these cycles. However, the underlying CallTree structure is always a tree with no structural cycles.
data/docs/reports.md CHANGED
@@ -41,6 +41,7 @@ The first parameter is any writable IO object such as STDOUT or a file. All prin
41
41
  | `max_percent` | `100` | Maximum %self time for a method to be included (0–100). |
42
42
  | `filter_by` | `:self_time` | Which time metric to use when applying `min_percent` and `max_percent`. |
43
43
  | `sort_method` | varies | How to sort methods. Values: `:total_time`, `:self_time`, `:wait_time`, `:children_time`. |
44
+ | `max_depth` | `nil` | Maximum call tree depth to display. When set, printers that walk the call tree stop descending beyond this depth. Applies to `FlameGraphPrinter`, `CallStackPrinter`, and `CallInfoPrinter`. |
44
45
 
45
46
  ## Report Types
46
47
 
@@ -1,36 +1,39 @@
1
- module RubyProf
2
- # The call info visitor class does a depth-first traversal across a
3
- # list of call infos. At each call_tree node, the visitor executes
4
- # the block provided in the #visit method. The block is passed two
5
- # parameters, the event and the call_tree instance. Event will be
6
- # either :enter or :exit.
7
- #
8
- # visitor = RubyProf::CallTreeVisitor.new(result.threads.first.call_tree)
9
- #
10
- # method_names = Array.new
11
- #
12
- # visitor.visit do |call_tree, event|
13
- # method_names << call_tree.target.full_name if event == :enter
14
- # end
15
- #
16
- # puts method_names
17
- class CallTreeVisitor
18
- def initialize(call_tree)
19
- @call_tree = call_tree
20
- end
21
-
22
- def visit(&block)
23
- visit_call_tree(@call_tree, &block)
24
- end
25
-
26
- private
27
-
28
- def visit_call_tree(call_tree, &block)
29
- yield call_tree, :enter
30
- call_tree.children.each do |child|
31
- visit_call_tree(child, &block)
32
- end
33
- yield call_tree, :exit
34
- end
35
- end
36
- end
1
+ module RubyProf
2
+ # The call tree visitor class does a depth-first traversal of a call
3
+ # tree. At each call_tree node, the visitor executes
4
+ # the block provided in the #visit method. The block is passed two
5
+ # parameters, the call_tree instance and the event. Event will be
6
+ # either :enter or :exit.
7
+ #
8
+ # visitor = RubyProf::CallTreeVisitor.new(result.threads.first.call_tree)
9
+ #
10
+ # method_names = Array.new
11
+ #
12
+ # visitor.visit do |call_tree, event|
13
+ # method_names << call_tree.target.full_name if event == :enter
14
+ # end
15
+ #
16
+ # puts method_names
17
+ class CallTreeVisitor
18
+ def initialize(call_tree, max_depth: nil)
19
+ @call_tree = call_tree
20
+ @max_depth = max_depth
21
+ end
22
+
23
+ def visit(&block)
24
+ visit_call_tree(@call_tree, 0, &block)
25
+ end
26
+
27
+ private
28
+
29
+ def visit_call_tree(call_tree, depth, &block)
30
+ yield call_tree, :enter
31
+ if @max_depth.nil? || depth < @max_depth
32
+ call_tree.children.each do |child|
33
+ visit_call_tree(child, depth + 1, &block)
34
+ end
35
+ end
36
+ yield call_tree, :exit
37
+ end
38
+ end
39
+ end
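The effect of the new `max_depth:` keyword can be seen without installing ruby-prof by running the same traversal logic over a stand-in tree. In this sketch, `Node`, `DepthLimitedVisitor`, `entered_names`, and `TREE` are hypothetical names; the visitor body mirrors the 2.0.3 `CallTreeVisitor` shown above, and `Node` only provides the `children` method the visitor relies on.

```ruby
# Hypothetical stand-in for RubyProf::CallTree: the visitor only needs #children.
Node = Struct.new(:name, :children)

# Same traversal logic as the 2.0.3 CallTreeVisitor, reproduced here so the
# sketch runs on its own.
class DepthLimitedVisitor
  def initialize(call_tree, max_depth: nil)
    @call_tree = call_tree
    @max_depth = max_depth
  end

  def visit(&block)
    visit_call_tree(@call_tree, 0, &block)
  end

  private

  def visit_call_tree(call_tree, depth, &block)
    yield call_tree, :enter
    # Descend only while the current depth is below the limit (nil = no limit).
    if @max_depth.nil? || depth < @max_depth
      call_tree.children.each { |child| visit_call_tree(child, depth + 1, &block) }
    end
    yield call_tree, :exit
  end
end

# Collect node names in :enter order for a given depth limit.
def entered_names(root, max_depth: nil)
  names = []
  DepthLimitedVisitor.new(root, max_depth: max_depth).visit do |node, event|
    names << node.name if event == :enter
  end
  names
end

TREE = Node.new("[global]", [
  Node.new("process", [
    Node.new("validate", []),
    Node.new("save", [Node.new("validate", []), Node.new("write", [])])
  ])
])
```

The root is at depth 0, so `entered_names(TREE, max_depth: 1)` visits only `[global]` and `process`, while `max_depth: 0` visits the root alone. Every visited node still receives both its `:enter` and `:exit` events.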