Skip to content

Implement Report#merge for efficiently combining many reports #110

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 54 additions & 33 deletions lib/stackprof/report.rb
Original file line number Diff line number Diff line change
Expand Up @@ -358,49 +358,70 @@ def print_file(filter, f = STDOUT)
end

def +(other)
raise ArgumentError, "cannot combine #{other.class}" unless self.class == other.class
raise ArgumentError, "cannot combine #{modeline} with #{other.modeline}" unless modeline == other.modeline
raise ArgumentError, "cannot combine v#{version} with v#{other.version}" unless version == other.version

f1, f2 = normalized_frames, other.normalized_frames
frames = (f1.keys + f2.keys).uniq.inject(Hash.new) do |hash, id|
if f1[id].nil?
hash[id] = f2[id]
elsif f2[id]
hash[id] = f1[id]
hash[id][:total_samples] += f2[id][:total_samples]
hash[id][:samples] += f2[id][:samples]
if f2[id][:edges]
edges = hash[id][:edges] ||= {}
f2[id][:edges].each do |edge, weight|
edges[edge] ||= 0
edges[edge] += weight
merge(other)
end

def merge(*others)
other_class = others.find {|o| o.class != self.class}
if other_class
raise ArgumentError, "cannot combine #{self.class} with #{other_class}"
end

other_modeline = others.find {|o| o.modeline != modeline}
if other_modeline
raise ArgumentError, "cannot combine #{modeline} with #{other_modeline}"
end

other_version = others.find {|o| o.version != version}
if other_version
raise ArgumentError, "cannot combine v#{version} with v#{other_version}"
end

all = [self] + others
merged = {}

all.each do |report|
report.normalized_frames.each do |id, frame|
if !merged[id]
merged[id] = frame
next
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the early eject here. 👍

end

merged_frame = merged[id]

merged_frame[:total_samples] += frame[:total_samples]
merged_frame[:samples] += frame[:samples]

if frame[:edges]
merged_edges = (merged_frame[:edges] ||= {})
frame[:edges].each do |edge, weight|
merged_edges[edge] ||= 0
merged_edges[edge] += weight
end
end
if f2[id][:lines]
lines = hash[id][:lines] ||= {}
f2[id][:lines].each do |line, weight|
lines[line] = add_lines(lines[line], weight)

if frame[:lines]
merged_lines = (merged_frame[:lines] ||= {})
frame[:lines].each do |line, weight|
merged_lines[line] = add_lines(merged_lines[line], weight)
end
end
else
hash[id] = f1[id]
end
hash
end

d1, d2 = data, other.data
data = {
all_data = all.map(&:data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like you could save a few iterations of this data by just calculating the samples, gc_samples, and missed_samples in a single loop:

# untested code... you have been warned!
new_samples, new_gc_samples, new_missed_samples = all.inject([0,0,0]) do |result, data|
                                                    result[0] += data[:samples]
                                                    result[1] += data[:gc_samples]
                                                    result[2] += data[:missed_samples]
                                                    result
                                                  end

new_data = {
  # ...
  samples: new_samples,
  gc_samples: new_gc_samples,
  missed_samples: new_missed_samples
}

However, I realize it looks a bit uglier this way...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's worth optimizing this much since it's O(number of reports) whereas the main loop is O(frames) which should usually be many orders of magnitude larger.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, also that. Just something I noticed when looking through the code that stuck out to me, but definitely agree that it probably isn't the big focal point of the optimization.

I think it can be adjusted at a second pass if it turns out to be beneficial, but I think what you have is more readable and fine to keep as is.


new_data = {
version: version,
mode: d1[:mode],
interval: d1[:interval],
samples: d1[:samples] + d2[:samples],
gc_samples: d1[:gc_samples] + d2[:gc_samples],
missed_samples: d1[:missed_samples] + d2[:missed_samples],
frames: frames
mode: data[:mode],
interval: data[:interval],
samples: all_data.map {|d| d[:samples]}.inject(:+),
gc_samples: all_data.map {|d| d[:gc_samples]}.inject(:+),
missed_samples: all_data.map {|d| d[:missed_samples]}.inject(:+),
frames: merged
}

self.class.new(data)
self.class.new(new_data)
end

private
Expand Down
90 changes: 89 additions & 1 deletion test/test_report.rb
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
require 'stackprof'
require 'minitest/autorun'

class ReportDumpTest < MiniTest::Test
class ReportTest < MiniTest::Test
require 'stringio'

def test_dump_to_stdout
Expand All @@ -26,8 +26,96 @@ def test_dump_to_file
assert_dump data, f.string
end

def test_merge
data = [
StackProf.run(mode: :cpu, raw: true) do
foo
end,
StackProf.run(mode: :cpu, raw: true) do
foo
bar
end,
StackProf.run(mode: :cpu, raw: true) do
foo
bar
baz
end
]
expectations = {
foo: {
total_samples: 3,
samples: 3,
total_lines: 1,
},
bar: {
total_samples: 4,
samples: 2,
total_lines: 2,
has_boz_edge: true
},
baz: {
total_samples: 2,
samples: 1,
total_lines: 2,
has_boz_edge: true,
},
boz: {
total_samples: 3,
samples: 3,
total_lines: 1,
},
}

reports = data.map {|d| StackProf::Report.new(d)}
combined = reports[0].merge(*reports[1..-1])

frames = expectations.keys.inject(Hash.new) do |hash, key|
hash[key] = find_method_frame(combined, key)
hash
end

expectations.each do |key, expect|
frame = frames[key]
assert_equal expect[:total_samples], frame[:total_samples], key
assert_equal expect[:samples], frame[:samples], key

assert_equal expect[:total_lines], frame[:lines].length, key
assert_includes frame[:lines], frame[:line] + 1, key
assert_equal [expect[:samples], expect[:samples]], frame[:lines][frame[:line] + 1], key

if expect[:has_boz_edge]
assert_equal ({frames[:boz][:hash] => expect[:samples]}), frame[:edges]
end
end

end

private

def find_method_frame(profile, name)
profile.frames.values.find do |frame|
frame[:name] == "ReportTest##{name}"
end
end

def foo
StackProf.sample
end

def bar
StackProf.sample
boz
end

def baz
StackProf.sample
boz
end

def boz
StackProf.sample
end

def assert_dump(expected, marshal_data)
assert_equal expected, Marshal.load(marshal_data)
end
Expand Down