memdump 0.1.0 → 0.2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: bb8daaffee80155c4227920118ed3793e3afe4ab
4
- data.tar.gz: b57f30d96b9e35d8519d800cbca852ef6995d965
3
+ metadata.gz: deaf03849e0949a5cf0f6150ea598b02e55411cb
4
+ data.tar.gz: e8d53128488b1d0c83392b0b56eed940327d5a0a
5
5
  SHA512:
6
- metadata.gz: 84b9700c70b49d35b7c7910e1a047d4f1fd9aba7161888090bc3190e6e983a835b9417363376c3eabe648763eb537e57d89726440f9b8ac022b0920be60bd2fd
7
- data.tar.gz: 206a0adbfd4c9f3b8d10f21c0ad07f911384977f79dee4a22e01af9c3710ab6e430e6f4ade72a37655e24eef2137025cec050dce232d2dacb0db75d1ae8f86e3
6
+ metadata.gz: bf78d4e885d83b66e47f8f642b90ba74117a3b7c5a6963ce602f0dbbbe5eab1c19ce2bce1a54f01290f787926a2170520b39e20661ce23ef9f4ee08d1fc2ee68
7
+ data.tar.gz: bbb79a73ec1e0dc13e42040380c12168ebdabc2e9c91d9801421aa025245c1eb3e6070b79f886642d275504ce74749afdd7bdc17c5ccd22d867db101164134b7
data/Gemfile CHANGED
@@ -1,4 +1,6 @@
1
1
  source 'https://rubygems.org'
2
2
 
3
+ gem 'rbtrace', platform: 'mri'
4
+
3
5
  # Specify your gem's dependencies in memdump.gemspec
4
6
  gemspec
data/README.md CHANGED
@@ -86,10 +86,10 @@ Allocation tracing is enabled with
86
86
 
87
87
  ~~~ ruby
88
88
  require 'objspace'
89
- ObjectSpace.trace_objects_allocation_start
89
+ ObjectSpace.trace_object_allocations_start
90
90
  ~~~
91
91
 
92
- ## Analyzing the dump
92
+ ## Basic analysis
93
93
 
94
94
  The first thing you will probably want to do is to run the replace-class command
95
95
  on the dump. It replaces the class attribute, which in the original dump is the
@@ -105,13 +105,122 @@ count by class. For memory leaks, the **diff** command allows you to output the
105
105
  part of the graph that involves new objects (removing the
106
106
  "old-and-not-referred-to-by-new")
107
107
 
108
+ Beyond this, analyzing the dump is best done in interactive mode:
109
+
110
+ ```
111
+ memdump interactive /tmp/mydump
112
+ ```
113
+
114
+ will get you a pry shell in the context of the loaded MemoryDump object. Use
115
+ the MemoryDump API to filter the dump down to what you need. If you're dealing with
116
+ big dumps, it is usually a good idea to save intermediate results regularly with `#save`.
117
+
118
+ One useful call to make at the beginning is `#common_cleanup`. It collapses the
119
+ common collections (Array, Set, Hash) as well as internal bookkeeping objects
120
+ (ICLASS, …). I usually run this, save the result and re-load it (which
121
+ is usually significantly smaller).
122
+
123
+ After that, the usual process is to find out which non-standard classes are
124
+ unexpectedly present in high numbers using `stats`, extract the objects from
125
+ these classes with `dump = objects_of_class('classname')` and the subgraph that
126
+ keeps them alive with `roots_of(dump)`.
127
+
128
+ ```
129
+ # Get the subgraph of all objects whose class name matches /Plan/ and export
130
+ # it to GML to process with Gephi (see below)
131
+ parent_dump, _ = roots_of(objects_of_class(/Plan/))
132
+ parent_dump.to_gml('plan-subgraph.gml')
133
+ ```
134
+
135
+ Once you start filtering dumps, don't forget to simplify your life by `cd`'ing
136
+ into the context of the newly filtered dumps.
137
+
108
138
  Beyond that, I usually go back and forth between the memory dump and
109
- [gephi](http://gephi.org), a graph analysis application. the **gml** command
110
- allows to convert the memory dump into a graph format that gephi can import.
111
- From there, use gephi's layouting and filtering algorithms to get an idea of the
112
- most likely objects. Then, you can "massage" the dump using the **root_of**,
113
- **subgraph_of** and **remove-node** commands to narrow the dump to its most useful
114
- parts.
139
+ [gephi](http://gephi.org), a graph analysis application. `to_gml` allows you to
140
+ convert the memory dump into a graph format that Gephi can import. From there,
141
+ use Gephi's layout and filtering algorithms to get an idea of the shape of
142
+ the dump. Note that you first need to get the graph below a few tens of thousands
143
+ of objects before Gephi can handle it.
144
+
145
+ ## Dump diffs
146
+
147
+ One powerful way to find out where memory is leaked is to look at the objects
148
+ allocated between two dumps, and to find the interface between the long-term
149
+ objects and these new objects. memdump supports this by computing diffs.
150
+
151
+ If you mean to use dump diffs, you **MUST** enable allocation tracing. Not doing
152
+ so will make the diffs inaccurate, as memdump will not be able to recognize that some
153
+ object addresses have been reused after a garbage collection.
154
+
155
+ Let's assume that we have "before.json" and "after.json" dumps. Start an interactive
156
+ shell loading `before`.
157
+
158
+ ```
159
+ memdump interactive before.json
160
+ ```
161
+
162
+ Then, in the shell, let's load the after dump
163
+
164
+ ```
165
+ > after = MemDump::JSONDump.load('after.json')
166
+ ```
167
+
168
+ The set of objects that are in `after` but not in `before` is given by `#diff`
169
+
170
+ ```
171
+ d = diff(after)
172
+ ```
173
+
174
+ We'll also add a special marker to the records in `d` so that we can easily colorize
175
+ them differently in Gephi.
176
+
177
+ ```
178
+ d = d.map { |r| r['in_after'] = 1; r }
179
+ ```
180
+
181
+ ## Case 1: few new objects are linked to the old ones
182
+
183
+ One possibility is that there are only a few objects in the diff that are kept
184
+ alive from `before`. These objects in turn keep alive a lot more objects (which
185
+ cause the noticeable memory leak). What's interesting in this case is to
186
+ visualize the interface, that is, that set of objects.
187
+
188
+ In memdump, one computes it with the `interface_with` method, which computes the
189
+ interface between the receiver and the argument. The receiver must contain the
190
+ edges between itself and the argument, which means in our case that we must use
191
+ `after`.
192
+
193
+ ```
194
+ self_border, diff_border = after.interface_with(d)
195
+ ```
196
+
197
+ In addition to computing the border, it computes the count of objects that are
198
+ kept alive by each object in `diff_border`. Each record in `diff_border` has an
199
+ attribute called `keepalive_count` that counts the number of nodes in `after`
200
+ that are reachable from (i.e. kept alive by) it. It is usually a good idea to
201
+ visualize the distribution of `keepalive_count` to see whether there are indeed
202
+ only a few nodes, and whether some are keeping a lot more objects alive than
203
+ others. Note that cycles that involve more than one "border node" will be
204
+ counted multiple times (so the sum of `keepalive_count` will be higher than
205
+ `d.size`).
206
+
207
+ ```
208
+ diff_border.size # is this much smaller than d.size ?
209
+ diff_border.each_record.map { |r| r['keepalive_count'] }.sort.reverse # are there some high counts at the top ?
210
+ ```
211
+
212
+ From there, one needs to do a bunch of back-and-forth between memdump and Gephi.
213
+ What I usually do is start by dumping the whole subgraph that contains the border
214
+ and visualize it. If I can't make any sense of it, I isolate the high-count elements
215
+ in the border and visualize the related subgraph.
216
+
217
+ ```
218
+ full_subgraph = after.roots_of(diff_border)
219
+ full_subgraph.to_gml 'full.gml'
220
+ filtered_border = diff_border.find_all { |r| r['keepalive_count'] > 1000 }
221
+ filtered_subgraph = after.roots_of(filtered_border)
222
+ filtered_subgraph.to_gml 'filtered.gml'
223
+ ```
115
224
 
116
225
  ## Contributing
117
226
 
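As a side note on the allocation-tracing requirement discussed in the README above, this is a minimal sketch of producing a heap dump that memdump can load, using only Ruby's stdlib `objspace` extension (the `leaked` array is an invented stand-in for whatever objects you suspect of leaking):

```ruby
require 'objspace'
require 'json'

# Enable allocation tracing first, so that records carry file/line/generation
# information -- memdump's diff support relies on the generation field
ObjectSpace.trace_object_allocations_start

# Invented workload: allocate some objects while tracing is active
leaked = Array.new(100) { |i| "payload-#{i}" }

# Dump the whole heap as one JSON record per line (the format memdump reads)
dump = ObjectSpace.dump_all(output: :string)

# Each line is a self-contained JSON object describing one heap slot
record = JSON.parse(dump.each_line.first)
```

Writing `dump` to a file then gives you something to pass to `memdump interactive`.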
@@ -1,2 +1,3 @@
1
+ #! /usr/bin/env ruby
1
2
  require 'memdump/cli'
2
3
  MemDump::CLI.start(ARGV)
@@ -1,5 +1,22 @@
1
+ require 'rgl/adjacency'
2
+ require 'rgl/dijkstra'
3
+ require 'rgl/traversal'
4
+
1
5
  require "memdump/version"
6
+ require 'memdump/json_dump'
7
+ require 'memdump/memory_dump'
8
+
9
+ require 'memdump/cleanup_references'
10
+ require 'memdump/common_ancestor'
11
+ require 'memdump/convert_to_gml'
12
+ require 'memdump/out_degree'
13
+ require 'memdump/remove_node'
14
+ require 'memdump/replace_class_address_by_name'
15
+ require 'memdump/root_of'
16
+ require 'memdump/subgraph_of'
2
17
 
3
- module Memdump
4
- # Your code goes here...
18
+ module MemDump
19
+ def self.pry(dump)
20
+ binding.pry
21
+ end
5
22
  end
@@ -1,6 +1,6 @@
1
1
  require 'thor'
2
2
  require 'pathname'
3
- require 'memdump/json_dump'
3
+ require 'memdump'
4
4
 
5
5
  module MemDump
6
6
  class CLI < Thor
@@ -17,17 +17,14 @@ module MemDump
17
17
 
18
18
  desc 'diff SOURCE TARGET OUTPUT', 'generate a memory dump that contains the objects in TARGET not in SOURCE, and all their parents'
19
19
  def diff(source, target, output)
20
- require 'memdump/diff'
21
-
22
- STDOUT.sync = true
23
- from = MemDump::JSONDump.new(Pathname.new(source))
24
- to = MemDump::JSONDump.new(Pathname.new(target))
25
- records = MemDump.diff(from, to)
26
- File.open(output, 'w') do |io|
27
- records.each do |r|
28
- io.puts JSON.dump(r)
29
- end
30
- end
20
+ from = MemDump::JSONDump.load(source)
21
+ to = MemDump::JSONDump.load(target)
22
+ diff = from.diff(to)
23
+ STDOUT.sync
24
+ puts "#{diff.size} nodes are in target but not in source"
25
+ diff = to.roots_of(diff)
26
+ puts "#{diff.size} nodes in final dump"
27
+ diff.save(output)
31
28
  end
32
29
 
33
30
  desc 'gml DUMP GML', 'converts a memory dump into a graph in the GML format (for processing by e.g. gephi)'
@@ -82,13 +79,9 @@ module MemDump
82
79
  if output_path then Pathname.new(output_path)
83
80
  else dump_path
84
81
  end
85
- dump = MemDump::JSONDump.new(dump_path)
86
- result = MemDump.replace_class_address_by_name(dump, add_reference_to_class: options[:add_ref])
87
- output_path.open('w') do |io|
88
- result.each do |r|
89
- io.puts JSON.dump(r)
90
- end
91
- end
82
+ dump = MemDump::JSONDump.load(dump_path)
83
+ dump = dump.replace_class_id_by_class_name(add_reference_to_class: options[:add_ref])
84
+ dump.save(output_path)
92
85
  end
93
86
 
94
87
  desc 'cleanup-refs DUMP OUTPUT', "removes references to deleted objects"
@@ -121,13 +114,39 @@ module MemDump
121
114
  def stats(dump)
122
115
  require 'pp'
123
116
  require 'memdump/stats'
124
- dump = MemDump::JSONDump.new(Pathname.new(dump))
125
- unknown, by_type = MemDump.stats(dump)
117
+ dump = MemDump::JSONDump.load(dump)
118
+ unknown, by_type = dump.stats
126
119
  puts "#{unknown} objects without a known type"
127
120
  by_type.sort_by { |n, v| v }.reverse.each do |n, v|
128
121
  puts "#{n}: #{v}"
129
122
  end
130
123
  end
124
+
125
+ desc 'out_degree DUMP', 'display the direct count of objects held by each object in the dump'
126
+ option "min", desc: "hide the objects whose degree is lower than this",
127
+ type: :numeric
128
+ def out_degree(dump)
129
+ dump = MemDump::JSONDump.new(Pathname.new(dump))
130
+ min = options[:min] || 0
131
+ sorted = dump.each_record.sort_by { |r| (r['references'] || Array.new).size }
132
+ sorted.each do |r|
133
+ size = (r['references'] || Array.new).size
134
+ break if size > min
135
+ puts "#{size} #{r}"
136
+ end
137
+ end
138
+
139
+ desc 'interactive DUMP', 'loads a dump file and spawn a pry shell'
140
+ option :load, desc: 'load the whole dump in memory', type: :boolean, default: true
141
+ def interactive(dump)
142
+ require 'memdump'
143
+ require 'pry'
144
+ dump = MemDump::JSONDump.new(Pathname.new(dump))
145
+ if options[:load]
146
+ dump = dump.load
147
+ end
148
+ dump.pry
149
+ end
131
150
  end
132
151
  end
133
152
 
@@ -0,0 +1,44 @@
1
+ module MemDump
2
+ def self.common_ancestors(dump, class_name, threshold: 0.1)
3
+ selected_records = Hash.new
4
+ remaining_records = Array.new
5
+ dump.each_record do |r|
6
+ if class_name === r['class']
7
+ selected_records[r['address']] = r
8
+ else
9
+ remaining_records << r
10
+ end
11
+ end
12
+
13
+ remaining_records = Array.new
14
+ selected_records = Hash.new
15
+ selected_root = root_address
16
+ dump.each_record do |r|
17
+ address = (r['address'] || r['root'])
18
+ if selected_root == address
19
+ selected_records[address] = r
20
+ selected_root = nil;
21
+ else
22
+ remaining_records << r
23
+ end
24
+ end
25
+
26
+ count = 0
27
+ while count != selected_records.size
28
+ count = selected_records.size
29
+ remaining_records.delete_if do |r|
30
+ references = r['references']
31
+ if references && references.any? { |a| selected_records.has_key?(a) }
32
+ address = (r['address'] || r['root'])
33
+ selected_records[address] = r
34
+ end
35
+ end
36
+ end
37
+
38
+ selected_records.values.reverse.each do |r|
39
+ if refs = r['references']
40
+ refs.delete_if { |a| !selected_records.has_key?(a) }
41
+ end
42
+ end
43
+ end
44
+ end
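As an aside, the fixed-point loop in the code above (repeatedly absorbing any record that references an already-selected address, until nothing changes) can be illustrated standalone. The record layout mimics the dump format; the graph data is invented:

```ruby
require 'set'

# Records keyed by address; 'references' lists child addresses (invented data:
# a -> b -> target, and c is unrelated)
records = {
  'a'      => { 'address' => 'a',      'references' => ['b'] },
  'b'      => { 'address' => 'b',      'references' => ['target'] },
  'c'      => { 'address' => 'c',      'references' => [] },
  'target' => { 'address' => 'target', 'references' => [] }
}

# Start from the record of interest, then repeatedly pull in every record that
# references an already-selected address, until a pass selects nothing new
selected = { 'target' => records['target'] }
remaining = records.values.reject { |r| selected.key?(r['address']) }
count = -1
while count != selected.size
  count = selected.size
  remaining.delete_if do |r|
    if r['references'].any? { |a| selected.key?(a) }
      selected[r['address']] = r
    end
  end
end
```

After the loop, `selected` holds the transitive set of ancestors (`a`, `b`) plus the target, and `remaining` holds only the unrelated record `c`.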
@@ -1,47 +1,37 @@
1
- require 'set'
2
-
3
1
  module MemDump
4
2
  def self.convert_to_gml(dump, io)
5
- nodes = dump.each_record.map do |row|
6
- if row['class_address'] # transformed with replace_class_address_by_name
7
- name = row['class']
8
- else
9
- name = row['struct'] || row['root'] || row['type']
10
- end
11
-
12
- address = row['address'] || row['root']
13
- refs = Hash.new
14
- if row_refs = row['references']
15
- row_refs.each { |r| refs[r] = nil }
16
- end
17
-
18
- [address, refs, name]
19
- end
20
-
21
3
  io.puts "graph"
22
4
  io.puts "["
23
- known_addresses = Set.new
24
- nodes.each do |address, refs, name|
25
- known_addresses << address
5
+
6
+ edges = []
7
+ dump.each_record do |row|
8
+ address = row['address']
9
+
26
10
  io.puts " node"
27
11
  io.puts " ["
28
12
  io.puts " id #{address}"
29
- io.puts " label \"#{name}\""
13
+ row.each do |key, value|
14
+ if value.respond_to?(:to_str)
15
+ io.puts " #{key} \"#{value}\""
16
+ elsif value.kind_of?(Numeric)
17
+ io.puts " #{key} #{value}"
18
+ end
19
+ end
30
20
  io.puts " ]"
31
- end
32
21
 
33
- nodes.each do |address, refs, _|
34
- refs.each do |ref_address, ref_label|
35
- io.puts " edge"
36
- io.puts " ["
37
- io.puts " source #{address}"
38
- io.puts " target #{ref_address}"
39
- if ref_label
40
- io.puts " label \"#{ref_label}\""
41
- end
42
- io.puts " ]"
22
+ row['references'].each do |ref_address|
23
+ edges << address << ref_address
43
24
  end
44
25
  end
26
+
27
+ edges.each_slice(2) do |address, ref_address|
28
+ io.puts " edge"
29
+ io.puts " ["
30
+ io.puts " source #{address}"
31
+ io.puts " target #{ref_address}"
32
+ io.puts " ]"
33
+ end
34
+
45
35
  io.puts "]"
46
36
  end
47
37
  end
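For readers unfamiliar with GML, this is roughly the node/edge shape the converter above emits; a hand-rolled sketch over two invented records (not memdump's actual output):

```ruby
require 'stringio'

# Two invented dump records; addresses and field values are made up
records = [
  { 'address' => '0x1', 'class' => 'Foo',    'memsize' => 40, 'references' => ['0x2'] },
  { 'address' => '0x2', 'class' => 'String', 'memsize' => 40, 'references' => [] }
]

io = StringIO.new
io.puts 'graph'
io.puts '['
# One node per record; string fields quoted, numeric fields bare, mirroring
# the key/value handling in the converter above
records.each do |r|
  io.puts '  node'
  io.puts '  ['
  io.puts "    id #{r['address']}"
  r.each do |k, v|
    next if k == 'references'
    io.puts(v.is_a?(Numeric) ? "    #{k} #{v}" : "    #{k} \"#{v}\"")
  end
  io.puts '  ]'
end
# Then one edge per reference
records.each do |r|
  r['references'].each do |ref|
    io.puts '  edge'
    io.puts '  ['
    io.puts "    source #{r['address']}"
    io.puts "    target #{ref}"
    io.puts '  ]'
  end
end
io.puts ']'
gml = io.string
```

Gephi imports this directly via File → Open.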
@@ -1,22 +1,65 @@
1
+ require 'pathname'
1
2
  require 'json'
2
3
  module MemDump
3
4
  class JSONDump
5
+ def self.load(filename)
6
+ new(filename).load
7
+ end
8
+
4
9
  def initialize(filename)
5
- @filename = filename
10
+ @filename = Pathname(filename)
6
11
  end
7
12
 
8
13
  def each_record
9
14
  return enum_for(__method__) if !block_given?
10
15
 
11
- if @cached_entries
12
- @cached_entries.each(&proc)
13
- else
14
- @filename.open do |f|
15
- f.each_line do |line|
16
- yield JSON.parse(line)
16
+ @filename.open do |f|
17
+ f.each_line do |line|
18
+ r = JSON.parse(line)
19
+ r['address'] ||= r['root']
20
+ r['references'] ||= Set.new
21
+ yield r
22
+ end
23
+ end
24
+ end
25
+
26
+ def load
27
+ address_to_record = Hash.new
28
+ generations = Hash.new
29
+ each_record do |r|
30
+ if !(address = r['address'])
31
+ raise "no address in #{r}"
32
+ end
33
+ r = r.dup
34
+
35
+ if generation = r['generation']
36
+ generations[address] = r['address'] = "#{address}:#{generation}"
37
+ end
38
+ r['references'] = r['references'].to_set
39
+ address_to_record[r['address']] = r
40
+ end
41
+
42
+ if !generations.empty?
43
+ address_to_record.each_value do |r|
44
+ if class_address = r['class']
45
+ r['class'] = generations.fetch(class_address, class_address)
46
+ end
47
+ if class_address = r['class_address']
48
+ r['class_address'] = generations.fetch(class_address, class_address)
17
49
  end
50
+
51
+ refs = Set.new
52
+ r['references'].each do |ref_address|
53
+ refs << generations.fetch(ref_address, ref_address)
54
+ end
55
+ r['references'] = refs
18
56
  end
19
57
  end
58
+ MemoryDump.new(address_to_record)
59
+ end
60
+
61
+ def inspect
62
+ to_s
20
63
  end
21
64
  end
22
65
  end
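The generation handling in `#load` above is what makes address reuse after garbage collection safe to diff: traced addresses get a `:generation` suffix, and references are rewritten through the same mapping. A standalone sketch of that rewrite on two invented records:

```ruby
require 'set'
require 'json'

# Two invented heap-dump lines; the second object was allocated while
# allocation tracing was on, so it carries a generation number
lines = [
  '{"address":"0x1","references":["0x2"]}',
  '{"address":"0x2","generation":42,"references":[]}'
]

generations = {}
records = lines.map do |line|
  r = JSON.parse(line)
  r['references'] ||= []
  # Suffix traced addresses with their generation, remembering old -> new
  if r['generation']
    generations[r['address']] = r['address'] = "#{r['address']}:#{r['generation']}"
  end
  r
end

# Rewrite references through the mapping so edges point at the new names
records.each do |r|
  r['references'] = r['references'].map { |a| generations.fetch(a, a) }.to_set
end
```

Two objects that reused the same slot in different generations thus get distinct identities in the loaded dump.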
@@ -0,0 +1,662 @@
1
+ module MemDump
2
+ class MemoryDump
3
+ attr_reader :address_to_record
4
+
5
+ def initialize(address_to_record)
6
+ @address_to_record = address_to_record
7
+ @forward_graph = nil
8
+ @backward_graph = nil
9
+ end
10
+
11
+ def include?(address)
12
+ address_to_record.has_key?(address)
13
+ end
14
+
15
+ def each_record(&block)
16
+ address_to_record.each_value(&block)
17
+ end
18
+
19
+ def addresses
20
+ address_to_record.keys
21
+ end
22
+
23
+ def size
24
+ address_to_record.size
25
+ end
26
+
27
+ def find_by_address(address)
28
+ address_to_record[address]
29
+ end
30
+
31
+ def inspect
32
+ to_s
33
+ end
34
+
35
+ def save(io_or_path)
36
+ if io_or_path.respond_to?(:open)
37
+ io_or_path.open 'w' do |io|
38
+ save(io)
39
+ end
40
+ else
41
+ each_record do |r|
42
+ io_or_path.puts JSON.dump(r)
43
+ end
44
+ end
45
+ end
46
+
47
+ # Filter the records
48
+ #
49
+ # @yieldparam record a record
50
+ # @yieldreturn [Object] the record object that should be included in the
51
+ # returned dump
52
+ # @return [MemoryDump]
53
+ def find_all
54
+ return enum_for(__method__) if !block_given?
55
+
56
+ address_to_record = Hash.new
57
+ each_record do |r|
58
+ if yield(r)
59
+ address_to_record[r['address']] = r
60
+ end
61
+ end
62
+ MemoryDump.new(address_to_record)
63
+ end
64
+
65
+ # Map the records
66
+ #
67
+ # @yieldparam record a record
68
+ # @yieldreturn [Object] the record object that should be included in the
69
+ # returned dump
70
+ # @return [MemoryDump]
71
+ def map
72
+ return enum_for(__method__) if !block_given?
73
+
74
+ address_to_record = Hash.new
75
+ each_record do |r|
76
+ address_to_record[r['address']] = yield(r.dup).to_hash
77
+ end
78
+ MemoryDump.new(address_to_record)
79
+ end
80
+
81
+ # Filter the entries, removing those for which the block returns falsy
82
+ #
83
+ # @yieldparam record a record
84
+ # @yieldreturn [nil,Object] either a record object, or falsy to remove
85
+ # this record in the returned dump
86
+ # @return [MemoryDump]
87
+ def find_and_map
88
+ return enum_for(__method__) if !block_given?
89
+
90
+ address_to_record = Hash.new
91
+ each_record do |r|
92
+ if result = yield(r.dup)
93
+ address_to_record[r['address']] = result.to_hash
94
+ end
95
+ end
96
+ MemoryDump.new(address_to_record)
97
+ end
98
+
99
+ # Return the records of a given type
100
+ #
101
+ # @param [String] name the type
102
+ # @return [MemoryDump] the matching records
103
+ #
104
+ # @example return all ICLASS (singleton) records
105
+ # objects_of_class("ICLASS")
106
+ def objects_of_type(name)
107
+ find_all { |r| name === r['type'] }
108
+ end
109
+
110
+ # Return the records of a given class
111
+ #
112
+ # @param [String] name the class
113
+ # @return [MemoryDump] the matching entries
114
+ #
115
+ # @example return all string records
116
+ # objects_of_class("String")
117
+ def objects_of_class(name)
118
+ find_all { |r| name === r['class'] }
119
+ end
120
+
121
+ # Return the entries that refer to the entries in the dump
122
+ #
123
+ # @param [MemoryDump] the set of entries whose parents we're looking for
124
+ # @param [Integer] min only return the entries in self that refer to
125
+ # more than this much entries in 'dump'
126
+ # @param [Boolean] exclude_dump exclude the entries that are already in
127
+ # 'dump'
128
+ # @return [(MemoryDump,Hash)] the parent entries, and a mapping from
129
+ # records in the parent entries to the count of entries in 'dump' they
130
+ # refer to
131
+ def parents_of(dump, min: 0, exclude_dump: false)
132
+ children = dump.addresses.to_set
133
+ counts = Hash.new
134
+ filtered = find_all do |r|
135
+ next if exclude_dump && children.include?(r['address'])
136
+
137
+ count = r['references'].count { |r| children.include?(r) }
138
+ if count > min
139
+ counts[r] = count
140
+ true
141
+ end
142
+ end
143
+ return filtered, counts
144
+ end
145
+
146
+ # Remove entries from this dump, keeping the transitivity in the
147
+ # remaining graph
148
+ #
149
+ # @param [MemoryDump] entries entries to remove
150
+ #
151
+ # @example remove all entries that are of type HASH
152
+ # collapse(objects_of_type('HASH'))
153
+ def collapse(entries)
154
+ collapsed_entries = Hash.new
155
+ entries.each_record do |r|
156
+ collapsed_entries[r['address']] = r['references'].dup
157
+ end
158
+
159
+
160
+ # Remove references in-between the entries to collapse
161
+ already_expanded = Hash.new { |h, k| h[k] = Set[k] }
162
+ begin
163
+ changed_entries = Hash.new
164
+ collapsed_entries.each do |address, references|
165
+ sets = references.classify { |ref_address| collapsed_entries.has_key?(ref_address) }
166
+ updated_references = sets[false] || Set.new
167
+ if to_collapse = sets[true]
168
+ to_collapse.each do |ref_address|
169
+ next if already_expanded[address].include?(ref_address)
170
+ updated_references.merge(collapsed_entries[ref_address])
171
+ end
172
+ already_expanded[address].merge(to_collapse)
173
+ changed_entries[address] = updated_references
174
+ end
175
+ end
176
+ puts "#{changed_entries.size} changed entries"
177
+ collapsed_entries.merge!(changed_entries)
178
+ end while !changed_entries.empty?
179
+
180
+ find_and_map do |record|
181
+ next if collapsed_entries.has_key?(record['address'])
182
+
183
+ sets = record['references'].classify do |ref_address|
184
+ collapsed_entries.has_key?(ref_address)
185
+ end
186
+ updated_references = sets[false] || Set.new
187
+ if to_collapse = sets[true]
188
+ to_collapse.each do |ref_address|
189
+ updated_references.merge(collapsed_entries[ref_address])
190
+ end
191
+ record = record.dup
192
+ record['references'] = updated_references
193
+ end
194
+ record
195
+ end
196
+ end
197
+
198
+ # Remove entries from the dump, and all references to them
199
+ #
200
+ # @param [MemoryDump] the set of entries to remove, as e.g. returned by
201
+ # {#objects_of_class}
202
+ # @return [MemoryDump] the filtered dump
203
+ def without(entries)
204
+ find_and_map do |record|
205
+ next if entries.include?(record['address'])
206
+ record_refs = record['references']
207
+ references = record_refs.find_all { |r| !entries.include?(r) }
208
+ if references.size != record_refs.size
209
+ record = record.dup
210
+ record['references'] = references.to_set
211
+ end
212
+ record
213
+ end
214
+ end
215
+
216
+ # Write the dump to a GML file that can loaded by Gephi
217
+ #
218
+ # @param [Pathname,String,IO] the path or the IO stream into which we should
219
+ # dump
220
+ def to_gml(io_or_path)
221
+ if io_or_path.kind_of?(IO)
222
+ MemDump.convert_to_gml(self, io_or_path)
223
+ else
224
+ Pathname(io_or_path).open 'w' do |io|
225
+ to_gml(io)
226
+ end
227
+ end
228
+ nil
229
+ end
230
+
231
+ # Save the dump
232
+ def save(io_or_path)
233
+ if io_or_path.kind_of?(IO)
234
+ each_record do |r|
235
+ r = r.dup
236
+ r['address'] = r['address'].gsub(/:\d+$/, '')
237
+ if r['class_address']
238
+ r['class_address'] = r['class_address'].gsub(/:\d+$/, '')
239
+ elsif r['address']
240
+ r['address'] = r['address'].gsub(/:\d+$/, '')
241
+ end
242
+ r['references'] = r['references'].map { |ref_addr| ref_addr.gsub(/:\d+$/, '') }
243
+ io_or_path.puts JSON.dump(r)
244
+ end
245
+ nil
246
+ else
247
+ Pathname(io_or_path).open 'w' do |io|
248
+ save(io)
249
+ end
250
+ end
251
+ end
252
+
253
+ COMMON_COLLAPSE_TYPES = %w{IMEMO HASH ARRAY}
254
+ COMMON_COLLAPSE_CLASSES = %w{Set RubyVM::Env}
255
+
256
+ # Perform common initial cleanup
257
+ #
258
+ # It basically removes common classes that usually make a dump analysis
259
+ # more complicated without providing more information
260
+ #
261
+ # Namely, it collapses internal Ruby node types ROOT and IMEMO, as well
262
+ # as common collection classes {COMMON_COLLAPSE_CLASSES}.
263
+ #
264
+ # One usually analyses a cleaned-up dump before getting into the full
265
+ # dump
266
+ #
267
+ # @return [MemDump] the filtered dump
268
+ def common_cleanup
269
+ without_weakrefs = remove(objects_of_class 'WeakRef')
270
+ to_collapse = without_weakrefs.find_all do |r|
271
+ COMMON_COLLAPSE_CLASSES.include?(r['class']) ||
272
+ COMMON_COLLAPSE_TYPES.include?(r['type']) ||
273
+ r['method'] == 'dump_all'
274
+ end
275
+ without_weakrefs.collapse(to_collapse)
276
+ end
277
+
278
+ # Remove entries in the reference for which we can't find an object with
279
+ # the matching address
280
+ #
281
+ # @return [(MemoryDump,Set)] the filtered dump and the set of missing addresses found
282
+ def remove_invalid_references
283
+ addresses = self.addresses.to_set
284
+ missing = Set.new
285
+ result = map do |r|
286
+ common = (addresses & r['references'])
287
+ if common.size != r['references'].size
288
+ missing.merge(r['references'] - common)
289
+ end
290
+ r = r.dup
291
+ r['references'] = common
292
+ r
293
+ end
294
+ return result, missing
295
+ end
296
+
297
+ # Return the graph of object that keeps objects in dump alive
298
+ #
299
+ # It contains only the shortest paths from the roots to the objects in
300
+ # dump
301
+ #
302
+ # @param [MemoryDump] dump
303
+ # @return [MemoryDump]
304
+ def roots_of(dump, root_dump: nil)
305
+ if root_dump && root_dump.empty?
306
+ raise ArgumentError, "no roots provided"
307
+ end
308
+
309
+ root_addresses =
310
+ if root_dump then root_dump.addresses
311
+ else
312
+ ['ALL_ROOTS']
313
+ end
314
+
315
+ ensure_graphs_computed
316
+
317
+ result_nodes = Set.new
318
+ dump_addresses = dump.addresses
319
+ root_addresses.each do |root_address|
320
+ visitor = RGL::DijkstraVisitor.new(@forward_graph)
321
+ dijkstra = RGL::DijkstraAlgorithm.new(@forward_graph, Hash.new(1), visitor)
322
+ dijkstra.find_shortest_paths(root_address)
323
+ path_builder = RGL::PathBuilder.new(root_address, visitor.parents_map)
324
+
325
+ dump_addresses.each_with_index do |record_address, record_i|
326
+ if path = path_builder.path(record_address)
327
+ result_nodes.merge(path)
328
+ end
329
+ end
330
+ end
331
+
332
+ find_and_map do |record|
333
+ address = record['address']
334
+ next if !result_nodes.include?(address)
335
+
336
+ # Prefer records in 'dump' to allow for annotations in the
337
+ # source
338
+ record = dump.find_by_address(address) || record
339
+ record = record.dup
340
+ record['references'] = result_nodes & record['references']
341
+ record
342
+ end
343
+ end
344
+
345
+ def minimum_spanning_tree(root_dump)
346
+ if root_dump.size != 1
347
+ raise ArgumentError, "there should be exactly one root"
348
+ end
349
+ root_address, _ = root_dump.address_to_record.first
350
+ if !(root = address_to_record[root_address])
351
+ raise ArgumentError, "no record with address #{root_address} in self"
352
+ end
353
+
354
+ ensure_graphs_computed
355
+
356
+ mst = @forward_graph.minimum_spanning_tree(root)
357
+ map = Hash.new
358
+ mst.each_vertex do |record|
359
+ record = record.dup
360
+ record['references'] = record['references'].dup
361
+ record['references'].delete_if { |ref_address| !mst.has_vertex?(ref_address) }
362
+ end
363
+ MemoryDump.new(map)
364
+ end
365
+
366
+ # @api private
367
+ #
368
+ # Ensure that @forward_graph and @backward_graph are computed
369
+ def ensure_graphs_computed
370
+ if !@forward_graph
371
+ @forward_graph, @backward_graph = compute_graphs
372
+ end
373
+ end
374
+
375
+ # @api private
376
+ #
377
+ # Force recomputation of the graph representation of the dump the next
378
+ # time it is needed
379
+ def clear_graph
380
+ @forward_graph = nil
381
+ @backward_graph = nil
382
+ end
383
+
384
+ # @api private
385
+ #
386
+ # Create two RGL::DirectedAdjacencyGraph, for the forward and backward edges of the graph
387
+ def compute_graphs
388
+ forward_graph = RGL::DirectedAdjacencyGraph.new
389
+ forward_graph.add_vertex 'ALL_ROOTS'
390
+ address_to_record.each do |address, record|
391
+ forward_graph.add_vertex(address)
392
+
393
+ if record['type'] == 'ROOT'
394
+ forward_graph.add_edge('ALL_ROOTS', address)
395
+ end
396
+ record['references'].each do |ref_address|
397
+ forward_graph.add_edge(address, ref_address)
398
+ end
399
+ end
400
+
401
+ backward_graph = RGL::DirectedAdjacencyGraph.new
402
+ forward_graph.each_edge do |u, v|
403
+ backward_graph.add_edge(v, u)
404
+ end
405
+ return forward_graph, backward_graph
406
+ end
407
+
408
+ def depth_first_visit(root, &block)
409
+ ensure_graphs_computed
410
+ @forward_graph.depth_first_visit(root, &block)
411
+ end
412
+
413
+ # Validate that all reference entries have a matching dump entry
414
+ #
415
+ # @raise [RuntimeError] if references have been found
416
+ def validate_references
417
+ addresses = self.addresses.to_set
418
+ each_record do |r|
419
+ common = addresses & r['references']
420
+ if common.size != r['references'].size
421
+ missing = r['references'] - common
422
+ raise "#{r} references #{missing.to_a.sort.join(", ")} which do not exist"
423
+ end
424
+ end
425
+ nil
426
+ end
427
+
428
+ # Get a random sample of the records
429
+ #
430
+ # The sampling is random, so the returned set might be bigger or smaller
431
+ # than expected. Do not use on small sets.
432
+ #
433
+ # @param [Float] the ratio of selected samples vs. total samples (0.1
434
+ # will select approximately 10% of the samples)
435
+ def sample(ratio)
436
+ result = Hash.new
437
+ each_record do |record|
438
+ if rand <= ratio
439
+ result[record['address']] = record
440
+ end
441
+ end
442
+ MemoryDump.new(result)
443
+ end
444
+
445
+ # @api private
446
+ #
447
+ # Return the set of record addresses that are the addresses of roots in
448
+ # the live graph
449
+ #
450
+ # @return [Set<String>]
+ def root_addresses
+ roots = self.addresses.to_set.dup
+ each_record do |r|
+ roots.subtract(r['references'])
+ end
+ roots
+ end
+
+ # Returns the set of roots
+ def roots(with_keepalive_count: false)
+ result = Hash.new
+ self.root_addresses.each do |addr|
+ record = find_by_address(addr)
+ if with_keepalive_count
+ record = record.dup
+ count = 0
+ depth_first_visit(addr) { count += 1 }
+ record['keepalive_count'] = count
+ end
+ result[addr] = record
+ end
+ MemoryDump.new(result)
+ end
+
+ def add_children(roots, with_keepalive_count: false)
+ result = Hash.new
+ roots.each_record do |root_record|
+ result[root_record['address']] = root_record
+
+ root_record['references'].each do |addr|
+ ref_record = find_by_address(addr)
+ next if !ref_record
+
+ if with_keepalive_count
+ ref_record = ref_record.dup
+ count = 0
+ depth_first_visit(addr) { count += 1 }
+ ref_record['keepalive_count'] = count
+ end
+ result[addr] = ref_record
+ end
+ end
+ MemoryDump.new(result)
+ end
+
+ def dup
+ find_all { true }
+ end
+
+ # Simply remove the given objects
+ def remove(objects)
+ removed_addresses = objects.addresses.to_set
+ return dup if removed_addresses.empty?
+
+ find_and_map do |r|
+ if !removed_addresses.include?(r['address'])
+ references = r['references'].dup
+ references.delete_if { |a| removed_addresses.include?(a) }
+ r['references'] = references
+ r
+ end
+ end
+ end
+
+ # Remove all components that are smaller than the given number of nodes
+ #
+ # It really looks only at the number of nodes reachable from a root
+ # (i.e. won't notice if two smaller-than-threshold roots have nodes in
+ # common)
+ def remove_small_components(max_size: 1)
+ roots = self.addresses.to_set.dup
+ leaves = Set.new
+ each_record do |r|
+ refs = r['references']
+ if refs.empty?
+ leaves << r['address']
+ else
+ roots.subtract(r['references'])
+ end
+ end
+
+ to_remove = Set.new
+ roots.each do |root_address|
+ component = Set[]
+ queue = Set[root_address]
+ while !queue.empty? && (component.size <= max_size)
+ address = queue.first
+ queue.delete(address)
+ next if component.include?(address)
+ component << address
+ queue.merge(address_to_record[address]['references'])
+ end
+
+ if component.size <= max_size
+ to_remove.merge(component)
+ end
+ end
+
+ without(find_all { |r| to_remove.include?(r['address']) })
+ end
+
+ def stats
+ unknown_class = 0
+ by_class = Hash.new(0)
+ each_record do |r|
+ if klass = (r['class'] || r['type'] || r['root'])
+ by_class[klass] += 1
+ else
+ unknown_class += 1
+ end
+ end
+ return unknown_class, by_class
+ end
+
+ # Compute the set of records that are not in self but are in to
+ #
+ # @param [MemoryDump]
+ # @return [MemoryDump]
+ def diff(to)
+ diff = Hash.new
+ to.each_record do |r|
+ address = r['address']
+ if !@address_to_record.include?(address)
+ diff[address] = r
+ end
+ end
+ MemoryDump.new(diff)
+ end
+
+ # Compute the interface between self and the other dump, that is the
+ # elements of self that have a child in dump, and the elements of dump
+ # that have a parent in self
+ def interface_with(dump)
+ self_border = Hash.new
+ dump_border = Hash.new
+ each_record do |r|
+ next if dump.find_by_address(r['address'])
+
+ refs_in_dump = r['references'].map do |addr|
+ dump.find_by_address(addr)
+ end.compact
+
+ if !refs_in_dump.empty?
+ self_border[r['address']] = r
+ refs_in_dump.each do |child|
+ dump_border[child['address']] = child.dup
+ end
+ end
+ end
+
+ self_border = MemoryDump.new(self_border)
+ dump_border = MemoryDump.new(dump_border)
+
+ dump.update_keepalive_count(dump_border)
+ return self_border, dump_border
+ end
+
+ # Replace all objects in dump by a single "group" object
+ def group(name, dump, attributes = Hash.new)
+ group_addresses = Set.new
+ group_references = Set.new
+ dump.each_record do |r|
+ group_addresses << r['address']
+ group_references.merge(r['references'])
+ end
+ group_record = attributes.dup
+ group_record['address'] = name
+ group_record['references'] = group_references - group_addresses
+
+ updated = Hash[name => group_record]
+ each_record do |record|
+ next if group_addresses.include?(record['address'])
+
+ updated_record = record.dup
+ updated_record['references'] -= group_addresses
+ if updated_record['references'].size != record['references'].size
+ updated_record['references'] << name
+ end
+
+ if group_addresses.include?(updated_record['class_address'])
+ updated_record['class_address'] = name
+ end
+ if group_addresses.include?(updated_record['class'])
+ updated_record['class'] = name
+ end
+
+ updated[updated_record['address']] = updated_record
+ end
+
+ MemoryDump.new(updated)
+ end
+
+ def update_keepalive_count(dump)
+ ensure_graphs_computed
+ dump.each_record do |record|
+ count = 0
+ dump.depth_first_visit(record['address']) { |obj| count += 1 }
+ record['keepalive_count'] = count
+ record
+ end
+ end
+
+ def replace_class_id_by_class_name(add_reference_to_class: false)
+ MemDump.replace_class_address_by_name(self, add_reference_to_class: add_reference_to_class)
+ end
+
+ def to_s
+ "#<MemoryDump size=#{size}>"
+ end
+ end
+ end
+
@@ -0,0 +1,7 @@
+ module MemDump
+ def self.out_degree(dump)
+ records = dump.each_record.sort_by { |r| (r['references'] || Array.new).size }
+ end
+ end
+
+
@@ -2,23 +2,40 @@ module MemDump
  # Replace the address in the 'class' attribute by the class name
  def self.replace_class_address_by_name(dump, add_reference_to_class: false)
  class_names = Hash.new
+ iclasses = Hash.new
  dump.each_record do |row|
  if row['type'] == 'CLASS' || row['type'] == 'MODULE'
  class_names[row['address']] = row['name']
+ elsif row['type'] == 'ICLASS' || row['type'] == "IMEMO"
+ iclasses[row['address']] = row
  end
  end
 
- dump.each_record.map do |r|
+ iclass_size = 0
+ while !iclasses.empty? && (iclass_size != iclasses.size)
+ iclass_size = iclasses.size
+ iclasses.delete_if do |_, r|
+ if (klass = r['class']) && (class_name = class_names[klass])
+ class_names[r['address']] = "I(#{class_name})"
+ r['class'] = class_name
+ r['class_address'] = klass
+ if add_reference_to_class
+ (r['references'] ||= Set.new) << klass
+ end
+ true
+ end
+ end
+ end
+
+ dump.map do |r|
  if klass = r['class']
+ r = r.dup
  r['class'] = class_names[klass] || klass
  r['class_address'] = klass
  if add_reference_to_class
- (r['references'] ||= Array.new) << klass
+ (r['references'] ||= Set.new) << klass
  end
  end
- if r['type'] == 'ICLASS'
- r['class'] = "I(#{r['class']})"
- end
  r
  end
  end
@@ -1,3 +1,3 @@
  module Memdump
- VERSION = "0.1.0"
+ VERSION = "0.2.0"
  end
@@ -20,7 +20,8 @@ Gem::Specification.new do |spec|
  spec.require_paths = ["lib"]
 
  spec.add_dependency 'thor'
- spec.add_dependency 'rbtrace'
+ spec.add_dependency 'rgl'
+ spec.add_dependency 'pry'
  spec.add_development_dependency "bundler", "~> 1.11"
  spec.add_development_dependency "rake", "~> 10.0"
  spec.add_development_dependency "minitest", "~> 5.0"
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: memdump
  version: !ruby/object:Gem::Version
- version: 0.1.0
+ version: 0.2.0
  platform: ruby
  authors:
  - Sylvain Joyeux
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2016-04-25 00:00:00.000000000 Z
+ date: 2018-02-03 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: thor
@@ -25,7 +25,21 @@ dependencies:
  - !ruby/object:Gem::Version
  version: '0'
  - !ruby/object:Gem::Dependency
- name: rbtrace
+ name: rgl
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :runtime
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ - !ruby/object:Gem::Dependency
+ name: pry
  requirement: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
@@ -98,13 +112,14 @@ files:
  - lib/memdump.rb
  - lib/memdump/cleanup_references.rb
  - lib/memdump/cli.rb
+ - lib/memdump/common_ancestor.rb
  - lib/memdump/convert_to_gml.rb
- - lib/memdump/diff.rb
  - lib/memdump/json_dump.rb
+ - lib/memdump/memory_dump.rb
+ - lib/memdump/out_degree.rb
  - lib/memdump/remove_node.rb
  - lib/memdump/replace_class_address_by_name.rb
  - lib/memdump/root_of.rb
- - lib/memdump/stats.rb
  - lib/memdump/subgraph_of.rb
  - lib/memdump/version.rb
  - memdump.gemspec
@@ -128,9 +143,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  version: '0'
  requirements: []
  rubyforge_project:
- rubygems_version: 2.2.3
+ rubygems_version: 2.5.1
  signing_key:
  specification_version: 4
  summary: Tools to manipulate Ruby 2.1+ memory dumps
  test_files: []
- has_rdoc:
@@ -1,44 +0,0 @@
- require 'set'
-
- module MemDump
- def self.diff(from, to)
- from_objects = Set.new
- from.each_record { |r| from_objects << (r['address'] || r['root']) }
- puts "#{from_objects.size} objects found in source dump"
-
- selected_records = Hash.new
- remaining_records = Array.new
- to.each_record do |r|
- address = (r['address'] || r['root'])
- if !from_objects.include?(address)
- selected_records[address] = r
- r['only_in_target'] = 1
- else
- remaining_records << r
- end
- end
-
- total = remaining_records.size + selected_records.size
- count = 0
- while selected_records.size != count
- count = selected_records.size
- puts "#{count}/#{total} records selected so far"
- remaining_records.delete_if do |r|
- address = (r['address'] || r['root'])
- references = r['references']
-
- if references && references.any? { |r| selected_records.has_key?(r) }
- selected_records[address] = r
- end
- end
- end
- puts "#{count}/#{total} records selected"
-
- selected_records.each_value do |r|
- if references = r['references']
- references.delete_if { |a| !selected_records.has_key?(a) }
- end
- end
- selected_records.each_value
- end
- end
@@ -1,15 +0,0 @@
1
- module MemDump
2
- def self.stats(memdump)
3
- unknown_class = 0
4
- by_class = Hash.new(0)
5
- memdump.each_record do |r|
6
- if klass = (r['class'] || r['type'] || r['root'])
7
- by_class[klass] += 1
8
- else
9
- unknown_class += 1
10
- end
11
- end
12
- return unknown_class, by_class
13
- end
14
- end
15
-
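The new `MemoryDump#root_addresses` and `#diff` methods added in this release reduce to simple set operations over the address graph. A minimal sketch of that logic, using plain hashes as stand-ins for the gem's record store (`root_addresses` and `dump_diff` here are illustrative reimplementations, not the gem's own API):

```ruby
require 'set'

# Stand-in for a MemoryDump: a hash mapping address => record, where each
# record carries its 'address' and the list of addresses it references.
def root_addresses(records)
  # Start from every address, then drop any address some record references;
  # what remains is not reachable from anywhere, i.e. the roots.
  roots = records.keys.to_set
  records.each_value { |r| roots.subtract(r['references']) }
  roots
end

def dump_diff(from, to)
  # Records present in `to` but absent from `from`, keyed by address
  to.reject { |address, _| from.key?(address) }
end

from = {
  'a' => { 'address' => 'a', 'references' => ['b'] },
  'b' => { 'address' => 'b', 'references' => [] }
}
to = from.merge('c' => { 'address' => 'c', 'references' => ['b'] })

p root_addresses(from).to_a  # => ["a"]  ('b' is referenced by 'a')
p dump_diff(from, to).keys   # => ["c"]  (only 'c' is new in the target dump)
```

Taking two dumps before and after the suspected leak and diffing them this way is the usual workflow: the records unique to the second dump are the candidates for leaked objects.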