memdump 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: bb8daaffee80155c4227920118ed3793e3afe4ab
4
- data.tar.gz: b57f30d96b9e35d8519d800cbca852ef6995d965
3
+ metadata.gz: deaf03849e0949a5cf0f6150ea598b02e55411cb
4
+ data.tar.gz: e8d53128488b1d0c83392b0b56eed940327d5a0a
5
5
  SHA512:
6
- metadata.gz: 84b9700c70b49d35b7c7910e1a047d4f1fd9aba7161888090bc3190e6e983a835b9417363376c3eabe648763eb537e57d89726440f9b8ac022b0920be60bd2fd
7
- data.tar.gz: 206a0adbfd4c9f3b8d10f21c0ad07f911384977f79dee4a22e01af9c3710ab6e430e6f4ade72a37655e24eef2137025cec050dce232d2dacb0db75d1ae8f86e3
6
+ metadata.gz: bf78d4e885d83b66e47f8f642b90ba74117a3b7c5a6963ce602f0dbbbe5eab1c19ce2bce1a54f01290f787926a2170520b39e20661ce23ef9f4ee08d1fc2ee68
7
+ data.tar.gz: bbb79a73ec1e0dc13e42040380c12168ebdabc2e9c91d9801421aa025245c1eb3e6070b79f886642d275504ce74749afdd7bdc17c5ccd22d867db101164134b7
data/Gemfile CHANGED
@@ -1,4 +1,6 @@
1
1
  source 'https://rubygems.org'
2
2
 
3
+ gem 'rbtrace', platform: 'mri'
4
+
3
5
  # Specify your gem's dependencies in memdump.gemspec
4
6
  gemspec
data/README.md CHANGED
@@ -86,10 +86,10 @@ Allocation tracing is enabled with
86
86
 
87
87
  ~~~ ruby
88
88
  require 'objspace'
89
- ObjectSpace.trace_objects_allocation_start
89
+ ObjectSpace.trace_object_allocations_start
90
90
  ~~~
91
91
 
92
- ## Analyzing the dump
92
+ ## Basic analysis
93
93
 
94
94
  The first thing you will probably want to do is to run the replace-class command
95
95
  on the dump. It replaces the class attribute, which in the original dump is the
@@ -105,13 +105,122 @@ count by class. For memory leaks, the **diff** command allows you to output the
105
105
  part of the graph that involves new objects (removing the
106
106
  "old-and-not-referred-to-by-new")
107
107
 
108
+ Beyond this, analyzing the dump is best done through the interactive mode:
109
+
110
+ ```
111
+ memdump interactive /tmp/mydump
112
+ ```
113
+
114
+ will get you a pry shell in the context of the loaded MemoryDump object. Use
115
+ the MemoryDump API to filter out what you need. If you're dealing with big dumps,
116
+ it is usually a good idea to save them regularly with `#save`.
117
+
118
+ One useful call to make at the beginning is `#common_cleanup`. It collapses the
119
+ common collections (Array, Set, Hash) as well as internal bookkeeping objects
120
+ (ICLASS, …). I usually run this, save the result and re-load the result (which
121
+ is usually significantly smaller).
122
+
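The collapse operation can be pictured on a toy graph, independent of the memdump API: removing a node rewires its parents straight to its children, so reachability is preserved. A minimal sketch:

```ruby
require 'set'

# graph: address => set of referenced addresses
graph = {
  'a' => Set['h'],        # a -> h -> {b, c}; 'h' stands in for a
  'h' => Set['b', 'c'],   # Hash we want collapsed out of the picture
  'b' => Set[],
  'c' => Set[],
}

# Collapse +victim+: every parent of the victim inherits its
# children, then the victim itself is dropped from the graph.
def collapse(graph, victim)
  children = graph.delete(victim)
  graph.each_value do |refs|
    refs.merge(children) if refs.delete?(victim)
  end
  graph
end

collapse(graph, 'h')
# 'a' now points straight at 'b' and 'c'
```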
123
+ After that, the usual process is to find out which non-standard classes are
124
+ unexpectedly present in high numbers using `stats`, extract the objects from
125
+ these classes with `dump = objects_of_class('classname')` and the subgraph that
126
+ keeps them alive with `roots_of(dump)`.
127
+
128
+ ```
129
+ # Get the subgraph of all objects whose class name matches /Plan/ and export
130
+ # it to GML to process with Gephi (see below)
131
+ parent_dump = roots_of(objects_of_class(/Plan/))
132
+ parent_dump.to_gml('plan-subgraph.gml')
133
+ ```
134
+
135
+ Once you start filtering dumps, don't forget to simplify your life by `cd`'ing
136
+ in the context of the newly filtered dumps.
137
+
108
138
  Beyond that, I usually go back and forth between the memory dump and
109
- [gephi](http://gephi.org), a graph analysis application. the **gml** command
110
- allows to convert the memory dump into a graph format that gephi can import.
111
- From there, use gephi's layouting and filtering algorithms to get an idea of the
112
- most likely objects. Then, you can "massage" the dump using the **root_of**,
113
- **subgraph_of** and **remove-node** commands to narrow the dump to its most useful
114
- parts.
139
+ [gephi](http://gephi.org), a graph analysis application. `to_gml` lets you
140
+ convert the memory dump into a graph format that gephi can import. From there,
141
+ use gephi's layout and filtering algorithms to get an idea of the shape of
142
+ the dump. Note that you first need to get the graph below a few tens of
143
+ thousands of objects before gephi can handle it.
144
+
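The GML format itself is plain text and simple to emit by hand. This is a toy writer for illustration (not memdump's own `to_gml` implementation):

```ruby
require 'stringio'

# A tiny object graph: address => label, plus edge pairs
nodes = { '0x1' => 'MyClass', '0x2' => 'String' }
edges = [['0x1', '0x2']]

io = StringIO.new
io.puts 'graph'
io.puts '['
nodes.each do |address, label|
  io.puts '  node'
  io.puts '  ['
  io.puts "    id #{address}"
  io.puts "    label \"#{label}\""
  io.puts '  ]'
end
edges.each do |source, target|
  io.puts '  edge'
  io.puts '  ['
  io.puts "    source #{source}"
  io.puts "    target #{target}"
  io.puts '  ]'
end
io.puts ']'
puts io.string
```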
145
+ ## Dump diffs
146
+
147
+ One powerful way to find out where memory is leaked is to look at objects that
148
+ got allocated and find the interface between the long-term objects and these
149
+ objects. memdump supports this by computing diffs.
150
+
151
+ If you mean to use dump diffs you **MUST** enable allocation tracing. Not doing
152
+ so will make the diffs inaccurate, as memdump will not be able to recognize that some
153
+ object addresses have been reused after a garbage collection pass.
154
+
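The underlying idea can be sketched with plain address sets on two tiny hand-written dumps (stdlib only, no memdump API): anything in the after dump whose address is absent from the before dump is treated as newly allocated.

```ruby
require 'set'
require 'json'

# Two minimal hand-written heap dumps, one JSON record per line
before = <<~JSON.each_line.map { |l| JSON.parse(l) }
  {"address": "0x1", "type": "OBJECT"}
  {"address": "0x2", "type": "STRING"}
JSON
after = <<~JSON.each_line.map { |l| JSON.parse(l) }
  {"address": "0x1", "type": "OBJECT"}
  {"address": "0x2", "type": "STRING"}
  {"address": "0x3", "type": "ARRAY"}
JSON

# The diff is the set of records whose address only appears in `after`
before_addresses = before.map { |r| r['address'] }.to_set
new_objects = after.reject { |r| before_addresses.include?(r['address']) }
# new_objects now holds only the record for 0x3
```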
155
+ Let's assume that we have two dumps, "before.json" and "after.json". Start an
156
+ interactive shell loading `before`:
157
+
158
+ ```
159
+ memdump interactive before.json
160
+ ```
161
+
162
+ Then, in the shell, let's load the after dump:
163
+
164
+ ```
165
+ > after = MemDump::JSONDump.load('after.json')
166
+ ```
167
+
168
+ The set of objects that are in `after` but not in `before` is given by `#diff`:
169
+
170
+ ```
171
+ d = diff(after)
172
+ ```
173
+
174
+ We'll also add a special marker to the records in `d` so that we can easily colorize
175
+ them differently in Gephi.
176
+
177
+ ```
178
+ d = d.map { |r| r['in_after'] = 1; r }
179
+ ```
180
+
181
+ ## Case 1: few new objects are linked to the old ones
182
+
183
+ One possibility is that there are only a few objects in the diff that are kept
184
+ alive from `before`. These objects in turn keep alive a lot more objects (which
185
+ cause the noticeable memory leak). What's interesting in this case is to
186
+ visualize the interface, that is, that set of objects.
187
+
188
+ In memdump, one computes it with the `interface_with` method, which computes the
189
+ interface between the receiver and the argument. The receiver must contain the
190
+ edges between itself and the argument, which means in our case that we must use
191
+ `after`.
192
+
193
+ ```
194
+ self_border, diff_border = after.interface_with(d)
195
+ ```
196
+
197
+ In addition to computing the border, it computes the count of objects that are
198
+ kept alive by each object in `diff_border`. Each record in `diff_border` has an
199
+ attribute called `keepalive_count` that counts the number of nodes in `after`
200
+ that are reachable from (i.e. kept alive by) it. It is usually a good idea to
201
+ visualize the distribution of `keepalive_count` to see whether there are indeed
202
+ only a few nodes, and whether some are keeping a lot more objects alive than
203
+ others. Note that cycles that involve more than one "border node" will be
204
+ counted multiple times (so the sum of `keepalive_count` will be higher than
205
+ `d.size`).
206
+
207
+ ```
208
+ diff_border.size # is this much smaller than d.size ?
209
+ diff_border.each_record.map { |r| r['keepalive_count'] }.sort.reverse # are there some high counts at the top ?
210
+ ```
211
+
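The `keepalive_count` described above is essentially a reachability count. On a toy hash-of-arrays graph it can be sketched as a depth-first traversal (a simplified model, not memdump's implementation):

```ruby
require 'set'

graph = {
  'root' => %w[a b],
  'a'    => %w[c],
  'b'    => %w[c],
  'c'    => [],
}

# Count every node reachable from +start+, including itself.
# A shared node like 'c' is counted once per traversal, which is
# why summing counts over several border nodes can exceed the
# total number of new objects.
def keepalive_count(graph, start)
  seen = Set.new
  stack = [start]
  while (address = stack.pop)
    next unless seen.add?(address)
    stack.concat(graph.fetch(address, []))
  end
  seen.size
end

keepalive_count(graph, 'root') # counts root, a, b and c
```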
212
+ From there, one needs to do a bunch of back-and-forth between memdump and Gephi.
213
+ What I usually do is start by dumping the whole subgraph that contains the border
214
+ and visualize. If I can't make any sense of it, I isolate the high-count elements
215
+ in the border and visualize the related subgraph:
216
+
217
+ ```
218
+ full_subgraph = after.roots_of(diff_border)
219
+ full_subgraph.to_gml 'full.gml'
220
+ filtered_border = diff_border.find_all { |r| r['keepalive_count'] > 1000 }
221
+ filtered_subgraph = after.roots_of(filtered_border)
222
+ filtered_subgraph.to_gml 'filtered.gml'
223
+ ```
115
224
 
116
225
  ## Contributing
117
226
 
@@ -1,2 +1,3 @@
1
+ #! /usr/bin/env ruby
1
2
  require 'memdump/cli'
2
3
  MemDump::CLI.start(ARGV)
@@ -1,5 +1,22 @@
1
+ require 'rgl/adjacency'
2
+ require 'rgl/dijkstra'
3
+ require 'rgl/traversal'
4
+
1
5
  require "memdump/version"
6
+ require 'memdump/json_dump'
7
+ require 'memdump/memory_dump'
8
+
9
+ require 'memdump/cleanup_references'
10
+ require 'memdump/common_ancestor'
11
+ require 'memdump/convert_to_gml'
12
+ require 'memdump/out_degree'
13
+ require 'memdump/remove_node'
14
+ require 'memdump/replace_class_address_by_name'
15
+ require 'memdump/root_of'
16
+ require 'memdump/subgraph_of'
2
17
 
3
- module Memdump
4
- # Your code goes here...
18
+ module MemDump
19
+ def self.pry(dump)
20
+ binding.pry
21
+ end
5
22
  end
@@ -1,6 +1,6 @@
1
1
  require 'thor'
2
2
  require 'pathname'
3
- require 'memdump/json_dump'
3
+ require 'memdump'
4
4
 
5
5
  module MemDump
6
6
  class CLI < Thor
@@ -17,17 +17,14 @@ module MemDump
17
17
 
18
18
  desc 'diff SOURCE TARGET OUTPUT', 'generate a memory dump that contains the objects in TARGET not in SOURCE, and all their parents'
19
19
  def diff(source, target, output)
20
- require 'memdump/diff'
21
-
22
- STDOUT.sync = true
23
- from = MemDump::JSONDump.new(Pathname.new(source))
24
- to = MemDump::JSONDump.new(Pathname.new(target))
25
- records = MemDump.diff(from, to)
26
- File.open(output, 'w') do |io|
27
- records.each do |r|
28
- io.puts JSON.dump(r)
29
- end
30
- end
20
+ from = MemDump::JSONDump.load(source)
21
+ to = MemDump::JSONDump.load(target)
22
+ diff = from.diff(to)
23
+ STDOUT.sync = true
24
+ puts "#{diff.size} nodes are in target but not in source"
25
+ diff = to.roots_of(diff)
26
+ puts "#{diff.size} nodes in final dump"
27
+ diff.save(output)
31
28
  end
32
29
 
33
30
  desc 'gml DUMP GML', 'converts a memory dump into a graph in the GML format (for processing by e.g. gephi)'
@@ -82,13 +79,9 @@ module MemDump
82
79
  if output_path then Pathname.new(output_path)
83
80
  else dump_path
84
81
  end
85
- dump = MemDump::JSONDump.new(dump_path)
86
- result = MemDump.replace_class_address_by_name(dump, add_reference_to_class: options[:add_ref])
87
- output_path.open('w') do |io|
88
- result.each do |r|
89
- io.puts JSON.dump(r)
90
- end
91
- end
82
+ dump = MemDump::JSONDump.load(dump_path)
83
+ dump = dump.replace_class_id_by_class_name(add_reference_to_class: options[:add_ref])
84
+ dump.save(output_path)
92
85
  end
93
86
 
94
87
  desc 'cleanup-refs DUMP OUTPUT', "removes references to deleted objects"
@@ -121,13 +114,39 @@ module MemDump
121
114
  def stats(dump)
122
115
  require 'pp'
123
116
  require 'memdump/stats'
124
- dump = MemDump::JSONDump.new(Pathname.new(dump))
125
- unknown, by_type = MemDump.stats(dump)
117
+ dump = MemDump::JSONDump.load(dump)
118
+ unknown, by_type = dump.stats
126
119
  puts "#{unknown} objects without a known type"
127
120
  by_type.sort_by { |n, v| v }.reverse.each do |n, v|
128
121
  puts "#{n}: #{v}"
129
122
  end
130
123
  end
124
+
125
+ desc 'out_degree DUMP', 'display the direct count of objects held by each object in the dump'
126
+ option "min", desc: "hide the objects whose degree is lower than this",
127
+ type: :numeric
128
+ def out_degree(dump)
129
+ dump = MemDump::JSONDump.new(Pathname.new(dump))
130
+ min = options[:min] || 0
131
+ sorted = dump.each_record.sort_by { |r| (r['references'] || Array.new).size }
132
+ sorted.each do |r|
133
+ size = (r['references'] || Array.new).size
134
+ next if size < min
135
+ puts "#{size} #{r}"
136
+ end
137
+ end
138
+
139
+ desc 'interactive DUMP', 'loads a dump file and spawn a pry shell'
140
+ option :load, desc: 'load the whole dump in memory', type: :boolean, default: true
141
+ def interactive(dump)
142
+ require 'memdump'
143
+ require 'pry'
144
+ dump = MemDump::JSONDump.new(Pathname.new(dump))
145
+ if options[:load]
146
+ dump = dump.load
147
+ end
148
+ dump.pry
149
+ end
131
150
  end
132
151
  end
133
152
 
@@ -0,0 +1,44 @@
1
+ module MemDump
2
+ def self.common_ancestors(dump, class_name, threshold: 0.1)
3
+ selected_records = Hash.new
4
+ remaining_records = Array.new
5
+ dump.each_record do |r|
6
+ if class_name === r['class']
7
+ selected_records[r['address']] = r
8
+ else
9
+ remaining_records << r
10
+ end
11
+ end
12
+
25
+
26
+ count = 0
27
+ while count != selected_records.size
28
+ count = selected_records.size
29
+ remaining_records.delete_if do |r|
30
+ references = r['references']
31
+ if references && references.any? { |a| selected_records.has_key?(a) }
32
+ address = (r['address'] || r['root'])
33
+ selected_records[address] = r
34
+ end
35
+ end
36
+ end
37
+
38
+ selected_records.values.reverse.each do |r|
39
+ if refs = r['references']
40
+ refs.delete_if { |a| !selected_records.has_key?(a) }
41
+ end
42
+ end
43
+ end
44
+ end
@@ -1,47 +1,37 @@
1
- require 'set'
2
-
3
1
  module MemDump
4
2
  def self.convert_to_gml(dump, io)
5
- nodes = dump.each_record.map do |row|
6
- if row['class_address'] # transformed with replace_class_address_by_name
7
- name = row['class']
8
- else
9
- name = row['struct'] || row['root'] || row['type']
10
- end
11
-
12
- address = row['address'] || row['root']
13
- refs = Hash.new
14
- if row_refs = row['references']
15
- row_refs.each { |r| refs[r] = nil }
16
- end
17
-
18
- [address, refs, name]
19
- end
20
-
21
3
  io.puts "graph"
22
4
  io.puts "["
23
- known_addresses = Set.new
24
- nodes.each do |address, refs, name|
25
- known_addresses << address
5
+
6
+ edges = []
7
+ dump.each_record do |row|
8
+ address = row['address']
9
+
26
10
  io.puts " node"
27
11
  io.puts " ["
28
12
  io.puts " id #{address}"
29
- io.puts " label \"#{name}\""
13
+ row.each do |key, value|
14
+ if value.respond_to?(:to_str)
15
+ io.puts " #{key} \"#{value}\""
16
+ elsif value.kind_of?(Numeric)
17
+ io.puts " #{key} #{value}"
18
+ end
19
+ end
30
20
  io.puts " ]"
31
- end
32
21
 
33
- nodes.each do |address, refs, _|
34
- refs.each do |ref_address, ref_label|
35
- io.puts " edge"
36
- io.puts " ["
37
- io.puts " source #{address}"
38
- io.puts " target #{ref_address}"
39
- if ref_label
40
- io.puts " label \"#{ref_label}\""
41
- end
42
- io.puts " ]"
22
+ row['references'].each do |ref_address|
23
+ edges << address << ref_address
43
24
  end
44
25
  end
26
+
27
+ edges.each_slice(2) do |address, ref_address|
28
+ io.puts " edge"
29
+ io.puts " ["
30
+ io.puts " source #{address}"
31
+ io.puts " target #{ref_address}"
32
+ io.puts " ]"
33
+ end
34
+
45
35
  io.puts "]"
46
36
  end
47
37
  end
@@ -1,22 +1,65 @@
1
+ require 'pathname'
+ require 'set'
1
2
  require 'json'
2
3
  module MemDump
3
4
  class JSONDump
5
+ def self.load(filename)
6
+ new(filename).load
7
+ end
8
+
4
9
  def initialize(filename)
5
- @filename = filename
10
+ @filename = Pathname(filename)
6
11
  end
7
12
 
8
13
  def each_record
9
14
  return enum_for(__method__) if !block_given?
10
15
 
11
- if @cached_entries
12
- @cached_entries.each(&proc)
13
- else
14
- @filename.open do |f|
15
- f.each_line do |line|
16
- yield JSON.parse(line)
16
+ @filename.open do |f|
17
+ f.each_line do |line|
18
+ r = JSON.parse(line)
19
+ r['address'] ||= r['root']
20
+ r['references'] ||= Set.new
21
+ yield r
22
+ end
23
+ end
24
+ end
25
+
26
+ def load
27
+ address_to_record = Hash.new
28
+ generations = Hash.new
29
+ each_record do |r|
30
+ if !(address = r['address'])
31
+ raise "no address in #{r}"
32
+ end
33
+ r = r.dup
34
+
35
+ if generation = r['generation']
36
+ generations[address] = r['address'] = "#{address}:#{generation}"
37
+ end
38
+ r['references'] = r['references'].to_set
39
+ address_to_record[r['address']] = r
40
+ end
41
+
42
+ if !generations.empty?
43
+ address_to_record.each_value do |r|
44
+ if class_address = r['class']
45
+ r['class'] = generations.fetch(class_address, class_address)
46
+ end
47
+ if class_address = r['class_address']
48
+ r['class_address'] = generations.fetch(class_address, class_address)
17
49
  end
50
+
51
+ refs = Set.new
52
+ r['references'].each do |ref_address|
53
+ refs << generations.fetch(ref_address, ref_address)
54
+ end
55
+ r['references'] = refs
18
56
  end
19
57
  end
58
+ MemoryDump.new(address_to_record)
59
+ end
60
+
61
+ def inspect
62
+ to_s
20
63
  end
21
64
  end
22
65
  end
@@ -0,0 +1,662 @@
1
+ module MemDump
2
+ class MemoryDump
3
+ attr_reader :address_to_record
4
+
5
+ def initialize(address_to_record)
6
+ @address_to_record = address_to_record
7
+ @forward_graph = nil
8
+ @backward_graph = nil
9
+ end
10
+
11
+ def include?(address)
12
+ address_to_record.has_key?(address)
13
+ end
14
+
15
+ def each_record(&block)
16
+ address_to_record.each_value(&block)
17
+ end
18
+
19
+ def addresses
20
+ address_to_record.keys
21
+ end
22
+
23
+ def size
24
+ address_to_record.size
25
+ end
26
+
27
+ def find_by_address(address)
28
+ address_to_record[address]
29
+ end
30
+
31
+ def inspect
32
+ to_s
33
+ end
34
+
35
+ def save(io_or_path)
36
+ if io_or_path.respond_to?(:open)
37
+ io_or_path.open 'w' do |io|
38
+ save(io)
39
+ end
40
+ else
41
+ each_record do |r|
42
+ io_or_path.puts JSON.dump(r)
43
+ end
44
+ end
45
+ end
46
+
47
+ # Filter the records
48
+ #
49
+ # @yieldparam record a record
50
+ # @yieldreturn [Object] the record object that should be included in the
51
+ # returned dump
52
+ # @return [MemoryDump]
53
+ def find_all
54
+ return enum_for(__method__) if !block_given?
55
+
56
+ address_to_record = Hash.new
57
+ each_record do |r|
58
+ if yield(r)
59
+ address_to_record[r['address']] = r
60
+ end
61
+ end
62
+ MemoryDump.new(address_to_record)
63
+ end
64
+
65
+ # Map the records
66
+ #
67
+ # @yieldparam record a record
68
+ # @yieldreturn [Object] the record object that should be included in the
69
+ # returned dump
70
+ # @return [MemoryDump]
71
+ def map
72
+ return enum_for(__method__) if !block_given?
73
+
74
+ address_to_record = Hash.new
75
+ each_record do |r|
76
+ address_to_record[r['address']] = yield(r.dup).to_hash
77
+ end
78
+ MemoryDump.new(address_to_record)
79
+ end
80
+
81
+ # Filter the entries, removing those for which the block returns falsy
82
+ #
83
+ # @yieldparam record a record
84
+ # @yieldreturn [nil,Object] either a record object, or falsy to remove
85
+ # this record in the returned dump
86
+ # @return [MemoryDump]
87
+ def find_and_map
88
+ return enum_for(__method__) if !block_given?
89
+
90
+ address_to_record = Hash.new
91
+ each_record do |r|
92
+ if result = yield(r.dup)
93
+ address_to_record[r['address']] = result.to_hash
94
+ end
95
+ end
96
+ MemoryDump.new(address_to_record)
97
+ end
98
+
99
+ # Return the records of a given type
100
+ #
101
+ # @param [String] name the type
102
+ # @return [MemoryDump] the matching records
103
+ #
104
+ # @example return all ICLASS (singleton) records
105
+ # objects_of_class("ICLASS")
106
+ def objects_of_type(name)
107
+ find_all { |r| name === r['type'] }
108
+ end
109
+
110
+ # Return the records of a given class
111
+ #
112
+ # @param [String] name the class
113
+ # @return [MemoryDump] the matching entries
114
+ #
115
+ # @example return all string records
116
+ # objects_of_class("String")
117
+ def objects_of_class(name)
118
+ find_all { |r| name === r['class'] }
119
+ end
120
+
121
+ # Return the entries that refer to the entries in the dump
122
+ #
123
+ # @param [MemoryDump] the set of entries whose parents we're looking for
124
+ # @param [Integer] min only return the entries in self that refer to
125
+ # more than this much entries in 'dump'
126
+ # @param [Boolean] exclude_dump exclude the entries that are already in
127
+ # 'dump'
128
+ # @return [(MemoryDump,Hash)] the parent entries, and a mapping from
129
+ # records in the parent entries to the count of entries in 'dump' they
130
+ # refer to
131
+ def parents_of(dump, min: 0, exclude_dump: false)
132
+ children = dump.addresses.to_set
133
+ counts = Hash.new
134
+ filtered = find_all do |r|
135
+ next if exclude_dump && children.include?(r['address'])
136
+
137
+ count = r['references'].count { |r| children.include?(r) }
138
+ if count > min
139
+ counts[r] = count
140
+ true
141
+ end
142
+ end
143
+ return filtered, counts
144
+ end
145
+
146
+ # Remove entries from this dump, keeping the transitivity in the
147
+ # remaining graph
148
+ #
149
+ # @param [MemoryDump] entries entries to remove
150
+ #
151
+ # @example remove all entries that are of type HASH
152
+ # collapse(objects_of_type('HASH'))
153
+ def collapse(entries)
154
+ collapsed_entries = Hash.new
155
+ entries.each_record do |r|
156
+ collapsed_entries[r['address']] = r['references'].dup
157
+ end
158
+
159
+
160
+ # Remove references in-between the entries to collapse
161
+ already_expanded = Hash.new { |h, k| h[k] = Set[k] }
162
+ begin
163
+ changed_entries = Hash.new
164
+ collapsed_entries.each do |address, references|
165
+ sets = references.classify { |ref_address| collapsed_entries.has_key?(ref_address) }
166
+ updated_references = sets[false] || Set.new
167
+ if to_collapse = sets[true]
168
+ to_collapse.each do |ref_address|
169
+ next if already_expanded[address].include?(ref_address)
170
+ updated_references.merge(collapsed_entries[ref_address])
171
+ end
172
+ already_expanded[address].merge(to_collapse)
173
+ changed_entries[address] = updated_references
174
+ end
175
+ end
176
+ puts "#{changed_entries.size} changed entries"
177
+ collapsed_entries.merge!(changed_entries)
178
+ end while !changed_entries.empty?
179
+
180
+ find_and_map do |record|
181
+ next if collapsed_entries.has_key?(record['address'])
182
+
183
+ sets = record['references'].classify do |ref_address|
184
+ collapsed_entries.has_key?(ref_address)
185
+ end
186
+ updated_references = sets[false] || Set.new
187
+ if to_collapse = sets[true]
188
+ to_collapse.each do |ref_address|
189
+ updated_references.merge(collapsed_entries[ref_address])
190
+ end
191
+ record = record.dup
192
+ record['references'] = updated_references
193
+ end
194
+ record
195
+ end
196
+ end
197
+
198
+ # Remove entries from the dump, and all references to them
199
+ #
200
+ # @param [MemoryDump] the set of entries to remove, as e.g. returned by
201
+ # {#objects_of_class}
202
+ # @return [MemoryDump] the filtered dump
203
+ def without(entries)
204
+ find_and_map do |record|
205
+ next if entries.include?(record['address'])
206
+ record_refs = record['references']
207
+ references = record_refs.find_all { |r| !entries.include?(r) }
208
+ if references.size != record_refs.size
209
+ record = record.dup
210
+ record['references'] = references.to_set
211
+ end
212
+ record
213
+ end
214
+ end
215
+
216
+ # Write the dump to a GML file that can loaded by Gephi
217
+ #
218
+ # @param [Pathname,String,IO] the path or the IO stream into which we should
219
+ # dump
220
+ def to_gml(io_or_path)
221
+ if io_or_path.kind_of?(IO)
222
+ MemDump.convert_to_gml(self, io_or_path)
223
+ else
224
+ Pathname(io_or_path).open 'w' do |io|
225
+ to_gml(io)
226
+ end
227
+ end
228
+ nil
229
+ end
230
+
231
+ # Save the dump
232
+ def save(io_or_path)
233
+ if io_or_path.kind_of?(IO)
234
+ each_record do |r|
235
+ r = r.dup
236
+ r['address'] = r['address'].gsub(/:\d+$/, '')
237
+ if r['class_address']
238
+ r['class_address'] = r['class_address'].gsub(/:\d+$/, '')
239
+ elsif r['address']
240
+ r['address'] = r['address'].gsub(/:\d+$/, '')
241
+ end
242
+ r['references'] = r['references'].map { |ref_addr| ref_addr.gsub(/:\d+$/, '') }
243
+ io_or_path.puts JSON.dump(r)
244
+ end
245
+ nil
246
+ else
247
+ Pathname(io_or_path).open 'w' do |io|
248
+ save(io)
249
+ end
250
+ end
251
+ end
252
+
253
+ COMMON_COLLAPSE_TYPES = %w{IMEMO HASH ARRAY}
254
+ COMMON_COLLAPSE_CLASSES = %w{Set RubyVM::Env}
255
+
256
+ # Perform common initial cleanup
257
+ #
258
+ # It basically removes common classes that usually make a dump analysis
259
+ # more complicated without providing more information
260
+ #
261
+ # Namely, it collapses internal Ruby node types ROOT and IMEMO, as well
262
+ # as common collection classes {COMMON_COLLAPSE_CLASSES}.
263
+ #
264
+ # One usually analyses a cleaned-up dump before getting into the full
265
+ # dump
266
+ #
267
+ # @return [MemDump] the filtered dump
268
+ def common_cleanup
269
+ without_weakrefs = remove(objects_of_class 'WeakRef')
270
+ to_collapse = without_weakrefs.find_all do |r|
271
+ COMMON_COLLAPSE_CLASSES.include?(r['class']) ||
272
+ COMMON_COLLAPSE_TYPES.include?(r['type']) ||
273
+ r['method'] == 'dump_all'
274
+ end
275
+ without_weakrefs.collapse(to_collapse)
276
+ end
277
+
278
+ # Remove entries in the reference for which we can't find an object with
279
+ # the matching address
280
+ #
281
+ # @return [(MemoryDump,Set)] the filtered dump and the set of missing addresses found
282
+ def remove_invalid_references
283
+ addresses = self.addresses.to_set
284
+ missing = Set.new
285
+ result = map do |r|
286
+ common = (addresses & r['references'])
287
+ if common.size != r['references'].size
288
+ missing.merge(r['references'] - common)
289
+ end
290
+ r = r.dup
291
+ r['references'] = common
292
+ r
293
+ end
294
+ return result, missing
295
+ end
296
+
297
+ # Return the graph of object that keeps objects in dump alive
298
+ #
299
+ # It contains only the shortest paths from the roots to the objects in
300
+ # dump
301
+ #
302
+ # @param [MemoryDump] dump
303
+ # @return [MemoryDump]
304
+ def roots_of(dump, root_dump: nil)
305
+ if root_dump && root_dump.empty?
306
+ raise ArgumentError, "no roots provided"
307
+ end
308
+
309
+ root_addresses =
310
+ if root_dump then root_dump.addresses
311
+ else
312
+ ['ALL_ROOTS']
313
+ end
314
+
315
+ ensure_graphs_computed
316
+
317
+ result_nodes = Set.new
318
+ dump_addresses = dump.addresses
319
+ root_addresses.each do |root_address|
320
+ visitor = RGL::DijkstraVisitor.new(@forward_graph)
321
+ dijkstra = RGL::DijkstraAlgorithm.new(@forward_graph, Hash.new(1), visitor)
322
+ dijkstra.find_shortest_paths(root_address)
323
+ path_builder = RGL::PathBuilder.new(root_address, visitor.parents_map)
324
+
325
+ dump_addresses.each_with_index do |record_address, record_i|
326
+ if path = path_builder.path(record_address)
327
+ result_nodes.merge(path)
328
+ end
329
+ end
330
+ end
331
+
332
+ find_and_map do |record|
333
+ address = record['address']
334
+ next if !result_nodes.include?(address)
335
+
336
+ # Prefer records in 'dump' to allow for annotations in the
337
+ # source
338
+ record = dump.find_by_address(address) || record
339
+ record = record.dup
340
+ record['references'] = result_nodes & record['references']
341
+ record
342
+ end
343
+ end
344
+
345
+ def minimum_spanning_tree(root_dump)
346
+ if root_dump.size != 1
347
+ raise ArgumentError, "there should be exactly one root"
348
+ end
349
+ root_address, _ = root_dump.address_to_record.first
350
+ if !(root = address_to_record[root_address])
351
+ raise ArgumentError, "no record with address #{root_address} in self"
352
+ end
353
+
354
+ ensure_graphs_computed
355
+
356
+ mst = @forward_graph.minimum_spanning_tree(root)
357
+ map = Hash.new
358
+ mst.each_vertex do |address|
359
+ record = address_to_record[address].dup
360
+ record['references'] = record['references'].dup
361
+ record['references'].delete_if { |ref_address| !mst.has_vertex?(ref_address) }
362
+ map[record['address']] = record
+ end
363
+ MemoryDump.new(map)
364
+ end
365
+
366
+ # @api private
367
+ #
368
+ # Ensure that @forward_graph and @backward_graph are computed
369
+ def ensure_graphs_computed
370
+ if !@forward_graph
371
+ @forward_graph, @backward_graph = compute_graphs
372
+ end
373
+ end
374
+
375
+ # @api private
376
+ #
377
+ # Force recomputation of the graph representation of the dump the next
378
+ # time it is needed
379
+ def clear_graph
380
+ @forward_graph = nil
381
+ @backward_graph = nil
382
+ end
383
+
384
+ # @api private
385
+ #
386
+ # Create two RGL::DirectedAdjacencyGraph, for the forward and backward edges of the graph
387
+ def compute_graphs
388
+ forward_graph = RGL::DirectedAdjacencyGraph.new
389
+ forward_graph.add_vertex 'ALL_ROOTS'
390
+ address_to_record.each do |address, record|
391
+ forward_graph.add_vertex(address)
392
+
393
+ if record['type'] == 'ROOT'
394
+ forward_graph.add_edge('ALL_ROOTS', address)
395
+ end
396
+ record['references'].each do |ref_address|
397
+ forward_graph.add_edge(address, ref_address)
398
+ end
399
+ end
400
+
401
+ backward_graph = RGL::DirectedAdjacencyGraph.new
402
+ forward_graph.each_edge do |u, v|
403
+ backward_graph.add_edge(v, u)
404
+ end
405
+ return forward_graph, backward_graph
406
+ end
407
+
408
+ def depth_first_visit(root, &block)
409
+ ensure_graphs_computed
410
+ @forward_graph.depth_first_visit(root, &block)
411
+ end
412
+
413
+ # Validate that all reference entries have a matching dump entry
414
+ #
415
+ # @raise [RuntimeError] if references have been found
416
+ def validate_references
417
+ addresses = self.addresses.to_set
418
+ each_record do |r|
419
+ common = addresses & r['references']
420
+ if common.size != r['references'].size
421
+ missing = r['references'] - common
422
+ raise "#{r} references #{missing.to_a.sort.join(", ")} which do not exist"
423
+ end
424
+ end
425
+ nil
426
+ end
427
+
428
+ # Get a random sample of the records
429
+ #
430
+ # The sampling is random, so the returned set might be bigger or smaller
431
+ # than expected. Do not use on small sets.
432
+ #
433
+ # @param [Float] the ratio of selected samples vs. total samples (0.1
434
+ # will select approximately 10% of the samples)
435
+ def sample(ratio)
436
+ result = Hash.new
437
+ each_record do |record|
438
+ if rand <= ratio
439
+ result[record['address']] = record
440
+ end
441
+ end
442
+ MemoryDump.new(result)
443
+ end
444
+
445
+ # @api private
446
+ #
447
+ # Return the set of record addresses that are the addresses of roots in
448
+ # the live graph
449
+ #
450
+ # @return [Set<String>]
451
+ def root_addresses
452
+ roots = self.addresses.to_set.dup
453
+ each_record do |r|
454
+ roots.subtract(r['references'])
455
+ end
456
+ roots
457
+ end
458
+
459
+ # Returns the set of roots
460
+ def roots(with_keepalive_count: false)
461
+ result = Hash.new
462
+ self.root_addresses.each do |addr|
463
+ record = find_by_address(addr)
464
+ if with_keepalive_count
465
+ record = record.dup
466
+ count = 0
467
+ depth_first_visit(addr) { count += 1 }
468
+ record['keepalive_count'] = count
469
+ end
470
+ result[addr] = record
471
+ end
472
+ MemoryDump.new(result)
473
+ end
474
+
475
+ def add_children(roots, with_keepalive_count: false)
476
+ result = Hash.new
477
+ roots.each_record do |root_record|
478
+ result[root_record['address']] = root_record
479
+
480
+ root_record['references'].each do |addr|
481
+ ref_record = find_by_address(addr)
482
+ next if !ref_record
483
+
484
+ if with_keepalive_count
485
+ ref_record = ref_record.dup
486
+ count = 0
487
+ depth_first_visit(addr) { count += 1 }
488
+ ref_record['keepalive_count'] = count
489
+ end
490
+ result[addr] = ref_record
491
+ end
492
+ end
493
+ MemoryDump.new(result)
494
+ end
495
+
496
+ def dup
497
+ find_all { true }
498
+ end
499
+
500
+ # Simply remove the given objects
501
+ def remove(objects)
502
+ removed_addresses = objects.addresses.to_set
503
+ return dup if removed_addresses.empty?
504
+
505
+ find_and_map do |r|
506
+ if !removed_addresses.include?(r['address'])
507
+ references = r['references'].dup
508
+ references.delete_if { |a| removed_addresses.include?(a) }
509
+ r['references'] = references
510
+ r
511
+ end
512
+ end
513
+ end
514
+
515
+ # Remove all components that are smaller than the given number of nodes
516
+ #
517
+ # It really looks only at the number of nodes reachable from a root
518
+ # (i.e. won't notice if two smaller-than-threshold roots have nodes in
519
+ # common)
520
+ def remove_small_components(max_size: 1)
521
+ roots = self.addresses.to_set.dup
522
+ leaves = Set.new
523
+ each_record do |r|
524
+ refs = r['references']
525
+ if refs.empty?
526
+ leaves << r['address']
527
+ else
528
+ roots.subtract(r['references'])
529
+ end
530
+ end
531
+
532
+ to_remove = Set.new
533
+ roots.each do |root_address|
534
+ component = Set[]
535
+ queue = Set[root_address]
536
+ while !queue.empty? && (component.size <= max_size)
537
+ address = queue.first
538
+ queue.delete(address)
539
+ next if component.include?(address)
540
+ component << address
541
+ queue.merge(address_to_record[address]['references'])
542
+ end
543
+
544
+ if component.size <= max_size
545
+ to_remove.merge(component)
546
+ end
547
+ end
548
+
549
+ without(find_all { |r| to_remove.include?(r['address']) })
550
+ end
551
+
552
+ def stats
553
+ unknown_class = 0
554
+ by_class = Hash.new(0)
555
+ each_record do |r|
556
+ if klass = (r['class'] || r['type'] || r['root'])
557
+ by_class[klass] += 1
558
+ else
559
+ unknown_class += 1
560
+ end
561
+ end
562
+ return unknown_class, by_class
563
+ end
564
+
565
+ # Compute the set of records that are not in self but are in to
566
+ #
567
+ # @param [MemoryDump] to
568
+ # @return [MemoryDump]
569
+ def diff(to)
570
+ diff = Hash.new
571
+ to.each_record do |r|
572
+ address = r['address']
573
+ if !@address_to_record.include?(address)
574
+ diff[address] = r
575
+ end
576
+ end
577
+ MemoryDump.new(diff)
578
+ end
579
+
580
+ # Compute the interface between self and the other dump, that is the
581
+ # elements of self that have a child in dump, and the elements of dump
582
+ # that have a parent in self
583
+ def interface_with(dump)
584
+ self_border = Hash.new
585
+ dump_border = Hash.new
586
+ each_record do |r|
587
+ next if dump.find_by_address(r['address'])
588
+
589
+ refs_in_dump = r['references'].map do |addr|
590
+ dump.find_by_address(addr)
591
+ end.compact
592
+
593
+ if !refs_in_dump.empty?
594
+ self_border[r['address']] = r
595
+ refs_in_dump.each do |child|
596
+ dump_border[child['address']] = child.dup
597
+ end
598
+ end
599
+ end
600
+
601
+ self_border = MemoryDump.new(self_border)
602
+ dump_border = MemoryDump.new(dump_border)
603
+
604
+ dump.update_keepalive_count(dump_border)
605
+ return self_border, dump_border
606
+ end
607
+
608
+ # Replace all objects in dump by a single "group" object
609
+ def group(name, dump, attributes = Hash.new)
610
+ group_addresses = Set.new
611
+ group_references = Set.new
612
+ dump.each_record do |r|
613
+ group_addresses << r['address']
614
+ group_references.merge(r['references'])
615
+ end
616
+ group_record = attributes.dup
617
+ group_record['address'] = name
618
+ group_record['references'] = group_references - group_addresses
619
+
620
+ updated = Hash[name => group_record]
621
+ each_record do |record|
622
+ next if group_addresses.include?(record['address'])
623
+
624
+ updated_record = record.dup
625
+ updated_record['references'] -= group_addresses
626
+ if updated_record['references'].size != record['references'].size
627
+ updated_record['references'] << name
628
+ end
629
+
630
+ if group_addresses.include?(updated_record['class_address'])
631
+ updated_record['class_address'] = name
632
+ end
633
+ if group_addresses.include?(updated_record['class'])
634
+ updated_record['class'] = name
635
+ end
636
+
637
+ updated[updated_record['address']] = updated_record
638
+ end
639
+
640
+ MemoryDump.new(updated)
641
+ end
642
+
643
+ def update_keepalive_count(dump)
644
+ ensure_graphs_computed
645
+ dump.each_record do |record|
646
+ count = 0
647
+ dump.depth_first_visit(record['address']) { |obj| count += 1 }
648
+ record['keepalive_count'] = count
649
+ record
650
+ end
651
+ end
652
+
653
+ def replace_class_id_by_class_name(add_reference_to_class: false)
654
+ MemDump.replace_class_address_by_name(self, add_reference_to_class: add_reference_to_class)
655
+ end
656
+
657
+ def to_s
658
+ "#<MemoryDump size=#{size}>"
659
+ end
660
+ end
661
+ end
662
+
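Both `roots(with_keepalive_count: true)` and `update_keepalive_count` above count every object reachable from an address with a depth-first visit. A minimal, self-contained sketch of that counting over a plain address-to-references Hash (the data and the `keepalive_count` helper name are illustrative, not the gem's API):

``` ruby
require 'set'

# Toy object graph: address => addresses it references
REFERENCES = {
  'a' => ['b', 'c'],
  'b' => ['c'],
  'c' => [],
  'd' => []
}

# Count every object reachable from root, visiting each address once,
# mirroring the depth_first_visit-based keepalive_count in the diff above
def keepalive_count(references, root)
  seen = Set.new
  stack = [root]
  until stack.empty?
    addr = stack.pop
    next if seen.include?(addr)
    seen << addr
    stack.concat(references.fetch(addr, []))
  end
  seen.size
end

puts keepalive_count(REFERENCES, 'a') # 3: a, b and c
puts keepalive_count(REFERENCES, 'd') # 1: d alone
```

The `seen` set is what keeps the visit linear even when objects reference each other in cycles.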
@@ -0,0 +1,7 @@
1
+ module MemDump
2
+ def self.out_degree(dump)
3
+ dump.each_record.sort_by { |r| (r['references'] || Array.new).size }
4
+ end
5
+ end
6
+
7
+
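The new `out_degree` helper simply orders records by how many references they hold, treating a missing `references` key as zero. The same sort over plain record Hashes (toy data):

``` ruby
records = [
  { 'address' => 'a', 'references' => ['b', 'c'] },
  { 'address' => 'b' },                            # no references key at all
  { 'address' => 'c', 'references' => ['b'] }
]

# Sort ascending by out-degree; records that reference the most objects
# end up last, which makes heavy referrers easy to spot
sorted = records.sort_by { |r| (r['references'] || Array.new).size }
puts sorted.map { |r| r['address'] }.inspect # ["b", "c", "a"]
```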
@@ -2,23 +2,40 @@ module MemDump
2
2
  # Replace the address in the 'class' attribute by the class name
3
3
  def self.replace_class_address_by_name(dump, add_reference_to_class: false)
4
4
  class_names = Hash.new
5
+ iclasses = Hash.new
5
6
  dump.each_record do |row|
6
7
  if row['type'] == 'CLASS' || row['type'] == 'MODULE'
7
8
  class_names[row['address']] = row['name']
9
+ elsif row['type'] == 'ICLASS' || row['type'] == "IMEMO"
10
+ iclasses[row['address']] = row
8
11
  end
9
12
  end
10
13
 
11
- dump.each_record.map do |r|
14
+ iclass_size = 0
15
+ while !iclasses.empty? && (iclass_size != iclasses.size)
16
+ iclass_size = iclasses.size
17
+ iclasses.delete_if do |_, r|
18
+ if (klass = r['class']) && (class_name = class_names[klass])
19
+ class_names[r['address']] = "I(#{class_name})"
20
+ r['class'] = class_name
21
+ r['class_address'] = klass
22
+ if add_reference_to_class
23
+ (r['references'] ||= Set.new) << klass
24
+ end
25
+ true
26
+ end
27
+ end
28
+ end
29
+
30
+ dump.map do |r|
12
31
  if klass = r['class']
32
+ r = r.dup
13
33
  r['class'] = class_names[klass] || klass
14
34
  r['class_address'] = klass
15
35
  if add_reference_to_class
16
- (r['references'] ||= Array.new) << klass
36
+ (r['references'] ||= Set.new) << klass
17
37
  end
18
38
  end
19
- if r['type'] == 'ICLASS'
20
- r['class'] = "I(#{r['class']})"
21
- end
22
39
  r
23
40
  end
24
41
  end
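The new ICLASS handling resolves class names with a fixed-point loop: each pass names the ICLASSes whose class address is already known, so chains of ICLASSes pointing at other ICLASSes resolve over successive passes. A standalone sketch of that loop (toy addresses, simplified records):

``` ruby
class_names = { '0x1' => 'String' }
iclasses = {
  '0x2' => { 'address' => '0x2', 'class' => '0x1' },
  '0x3' => { 'address' => '0x3', 'class' => '0x2' } # chained ICLASS
}

# Repeat until a pass makes no progress: '0x3' can only be named once
# '0x2' has been resolved in an earlier pass
previous_size = -1
until iclasses.empty? || iclasses.size == previous_size
  previous_size = iclasses.size
  iclasses.delete_if do |_, record|
    if (name = class_names[record['class']])
      class_names[record['address']] = "I(#{name})"
      true
    end
  end
end

puts class_names['0x2'] # I(String)
puts class_names['0x3'] # I(I(String))
```

The size check is what terminates the loop when some ICLASSes can never be resolved, instead of spinning forever.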
@@ -1,3 +1,3 @@
1
1
  module Memdump
2
- VERSION = "0.1.0"
2
+ VERSION = "0.2.0"
3
3
  end
@@ -20,7 +20,8 @@ Gem::Specification.new do |spec|
20
20
  spec.require_paths = ["lib"]
21
21
 
22
22
  spec.add_dependency 'thor'
23
- spec.add_dependency 'rbtrace'
23
+ spec.add_dependency 'rgl'
24
+ spec.add_dependency 'pry'
24
25
  spec.add_development_dependency "bundler", "~> 1.11"
25
26
  spec.add_development_dependency "rake", "~> 10.0"
26
27
  spec.add_development_dependency "minitest", "~> 5.0"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: memdump
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Sylvain Joyeux
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2016-04-25 00:00:00.000000000 Z
11
+ date: 2018-02-03 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: thor
@@ -25,7 +25,21 @@ dependencies:
25
25
  - !ruby/object:Gem::Version
26
26
  version: '0'
27
27
  - !ruby/object:Gem::Dependency
28
- name: rbtrace
28
+ name: rgl
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - ">="
32
+ - !ruby/object:Gem::Version
33
+ version: '0'
34
+ type: :runtime
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - ">="
39
+ - !ruby/object:Gem::Version
40
+ version: '0'
41
+ - !ruby/object:Gem::Dependency
42
+ name: pry
29
43
  requirement: !ruby/object:Gem::Requirement
30
44
  requirements:
31
45
  - - ">="
@@ -98,13 +112,14 @@ files:
98
112
  - lib/memdump.rb
99
113
  - lib/memdump/cleanup_references.rb
100
114
  - lib/memdump/cli.rb
115
+ - lib/memdump/common_ancestor.rb
101
116
  - lib/memdump/convert_to_gml.rb
102
- - lib/memdump/diff.rb
103
117
  - lib/memdump/json_dump.rb
118
+ - lib/memdump/memory_dump.rb
119
+ - lib/memdump/out_degree.rb
104
120
  - lib/memdump/remove_node.rb
105
121
  - lib/memdump/replace_class_address_by_name.rb
106
122
  - lib/memdump/root_of.rb
107
- - lib/memdump/stats.rb
108
123
  - lib/memdump/subgraph_of.rb
109
124
  - lib/memdump/version.rb
110
125
  - memdump.gemspec
@@ -128,9 +143,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
128
143
  version: '0'
129
144
  requirements: []
130
145
  rubyforge_project:
131
- rubygems_version: 2.2.3
146
+ rubygems_version: 2.5.1
132
147
  signing_key:
133
148
  specification_version: 4
134
149
  summary: Tools to manipulate Ruby 2.1+ memory dumps
135
150
  test_files: []
136
- has_rdoc:
@@ -1,44 +0,0 @@
1
- require 'set'
2
-
3
- module MemDump
4
- def self.diff(from, to)
5
- from_objects = Set.new
6
- from.each_record { |r| from_objects << (r['address'] || r['root']) }
7
- puts "#{from_objects.size} objects found in source dump"
8
-
9
- selected_records = Hash.new
10
- remaining_records = Array.new
11
- to.each_record do |r|
12
- address = (r['address'] || r['root'])
13
- if !from_objects.include?(address)
14
- selected_records[address] = r
15
- r['only_in_target'] = 1
16
- else
17
- remaining_records << r
18
- end
19
- end
20
-
21
- total = remaining_records.size + selected_records.size
22
- count = 0
23
- while selected_records.size != count
24
- count = selected_records.size
25
- puts "#{count}/#{total} records selected so far"
26
- remaining_records.delete_if do |r|
27
- address = (r['address'] || r['root'])
28
- references = r['references']
29
-
30
- if references && references.any? { |r| selected_records.has_key?(r) }
31
- selected_records[address] = r
32
- end
33
- end
34
- end
35
- puts "#{count}/#{total} records selected"
36
-
37
- selected_records.each_value do |r|
38
- if references = r['references']
39
- references.delete_if { |a| !selected_records.has_key?(a) }
40
- end
41
- end
42
- selected_records.each_value
43
- end
44
- end
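The removed `MemDump.diff` seeded its selection with objects present only in the target dump, then repeatedly pulled in records referencing an already-selected one until the set stopped growing. That propagation step in isolation, over toy data:

``` ruby
require 'set'

from_addresses = Set['a', 'b']               # objects in the "before" dump
to_records = [
  { 'address' => 'a',  'references' => ['n1'] },
  { 'address' => 'b',  'references' => [] },
  { 'address' => 'n1', 'references' => ['n2'] },
  { 'address' => 'n2', 'references' => [] }
]

# Seed with the objects that did not exist in the source dump
selected = to_records.reject { |r| from_addresses.include?(r['address']) }
                     .map { |r| [r['address'], r] }.to_h
remaining = to_records.select { |r| from_addresses.include?(r['address']) }

# Pull in old objects that keep a new object alive, to a fixed point
count = -1
while selected.size != count
  count = selected.size
  remaining.delete_if do |r|
    if r['references'].any? { |a| selected.key?(a) }
      selected[r['address']] = r
    end
  end
end

puts selected.keys.sort.inspect # ["a", "n1", "n2"]
```

Here `a` is old but referenced the new `n1`, so it joins the selection; `b` references nothing new and is discarded.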
@@ -1,15 +0,0 @@
1
- module MemDump
2
- def self.stats(memdump)
3
- unknown_class = 0
4
- by_class = Hash.new(0)
5
- memdump.each_record do |r|
6
- if klass = (r['class'] || r['type'] || r['root'])
7
- by_class[klass] += 1
8
- else
9
- unknown_class += 1
10
- end
11
- end
12
- return unknown_class, by_class
13
- end
14
- end
15
-
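The removed `stats.rb` survives as `MemoryDump#stats` above. Its grouping logic in isolation (toy records): count records per class, falling back to `type`, then `root`, and tally separately anything that has none of the three.

``` ruby
records = [
  { 'address' => '0x1', 'class' => 'String' },
  { 'address' => '0x2', 'class' => 'String' },
  { 'address' => '0x3', 'type' => 'DATA' },
  { 'address' => '0x4' }
]

# Hash.new(0) makes the first increment for each class start from zero
unknown_class = 0
by_class = Hash.new(0)
records.each do |r|
  if (klass = r['class'] || r['type'] || r['root'])
    by_class[klass] += 1
  else
    unknown_class += 1
  end
end

puts by_class.inspect # {"String"=>2, "DATA"=>1}
puts unknown_class    # 1
```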