
Reading Flame Graphs
Flame graphs are a visualization tool for profiling software. They provide a graphical representation of your program’s execution, making it easier to understand the runtime complexities involved. Let’s start with an example:
- The root caller is the `app` function. `app` calls the `init`, `handleRequest`, and `terminate` functions. `handleRequest` calls both `authenticateUser` and `processData`. `processData` calls `foo`, which in turn calls `bar` (see the code sketch below).
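For reference, a minimal Python sketch of a program with this call structure could look like the following (the function bodies are placeholders for real work):

```python
def init():
    pass  # placeholder for startup work

def terminate():
    pass  # placeholder for shutdown work

def authenticateUser():
    pass  # placeholder for authentication work

def bar():
    pass  # leaf function: only does its own work

def foo():
    bar()

def processData():
    foo()

def handleRequest():
    authenticateUser()
    processData()

def app():
    init()
    handleRequest()
    terminate()

app()
```

Profiling this program would produce a flame graph with `app` at the root and one block per call path beneath it.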
Aggregated function calls
Function calls are aggregated: if a function is called multiple times, the time spent across all calls is merged into a single block for that function.
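For example, consider a program along these lines; this is a minimal Python sketch in which the `main` wrapper and the `sleep` calls are hypothetical stand-ins for real code:

```python
import time

def bar():
    time.sleep(0.1)  # stand-in for real work

def foo():
    # Each call to foo calls bar twice
    bar()
    bar()

def main():
    # foo is called twice here, so bar runs 4 times in total
    foo()
    foo()

main()
```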
Here, the `foo` function is called twice and the `bar` function is called 4 times, but the time spent is aggregated into a single block for each function in the resulting flame graph.
Self-costs
In the previous example, we could see the global cost of each function call quite clearly. However, it can be tricky to find out how much time was spent within the function itself.
- Implicit self-costs: for a function that doesn't call any other functions, the time spent in the function itself is the whole width of its rectangle.
- Self-costs: for a function that does call other functions, the self-cost is visible as the space within its block that is not occupied by its children (see the sketch below).
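To make this concrete, here is a hypothetical Python sketch (made-up names) in which `parent` has both a visible self-cost and child calls:

```python
import time

def child():
    # Each call appears as its own block nested under parent
    time.sleep(0.2)

def parent():
    # Work done directly in parent's body contributes to its self-cost:
    # the part of parent's block not covered by the child blocks.
    total = sum(range(5_000_000))
    child()
    child()
    return total

parent()
```

In the resulting flame graph, `parent`'s block is wider than its two `child` blocks combined, and the leftover width is its self-cost.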
Self-costs in interpreted languages
In Python or Node.js, the self-cost of a function is not only the time spent in the function body itself but also the time spent by the interpreter running it. This means that a function will always have a self-cost, even if the function does nothing. It also results in the self-cost of `main` being much bigger than before.
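As a hypothetical illustration (the `does_nothing` and `main` names are made up), consider profiling a Python program like this:

```python
def does_nothing():
    # The body is empty, yet the interpreter still spends time
    # creating and tearing down the call frame, so the block for
    # does_nothing keeps a non-zero self-cost.
    pass

def main():
    # Driving the loop is interpreter work attributed to main,
    # which inflates main's own self-cost.
    for _ in range(1_000_000):
        does_nothing()

main()
```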

Viewing Flame Graphs
On the pull request page, you can access the flame graphs of a benchmark by expanding it.
- Base: flame graph of the benchmark base run
- Head: flame graph of the benchmark run from the latest commit of the pull request
- Diff: difference between the head and the base flame graphs
Inspector
Hover any bar to open the span details. This panel shows you what the function is, where it comes from, and how its time is spent.
- Metadata: Function name, source file, and code origin.
- Self time: Time spent in the function body only, excluding child calls.
- Total time: Time spent in the function including all its children.
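For example (with illustrative numbers), if a span's total time is 500 ms and its children account for 420 ms of it, its self time is the remaining 80 ms.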
Color modes
By Origin
Colors spans by code origin: User, Library, System, or Unknown. Useful for separating your code from dependencies and the kernel.
Differential
Compares Base vs Head and colors spans by change: slower, faster, added, or removed. Ideal for scanning regressions and wins after a commit.
- Slower: The span is slower than in the base run.
- Faster: The span is faster than in the base run.
- Added: The span has been added in the head run.
- Removed: The span has been removed in the head run.
By Bottleneck
Colors each span by the dominant bound on its self time: instruction-bound, cache-bound, memory-bound, or system-bound. A fast way to see what is blocking work.
By Function
Colors spans by function symbol so identical functions share a color, no matter where they are called. Helps spot hot functions across call sites.
System Calls toggle
When enabled, kernel and low-level runtime contributions are included. When off, the focus stays on application and library code.
Function list
Upon expanding a flame graph, you can access the function list and dive into the details of each span.
Next Steps
Setup Benchmarks with CPU Simulation
Learn how to enable CPU Simulation to generate flame graphs
Setup Benchmarks with the Walltime instrument
Learn how to enable Walltime instrumentation to generate flame graphs
Performance Regression Detection
Set up automated checks to catch performance issues early
Benchmark Creation Guides
Create comprehensive benchmarks for your codebase