SamplingProfiler is a performance profiling tool for Delphi, from Delphi 5 through 32-bit Delphi XE4. Its purpose is to help locate bottlenecks, even in final, optimized code running at full speed.
News, Tips and posts about SamplingProfiler
Online Mini-Guide – Support, Bugs & Suggestions
Though it may be able to profile applications compiled by many other compilers, the focus is (currently) solely on Delphi applications.
What is a sampling profiler?
There are basically two kinds of profiling tools: instrumenting profilers (source or binary) and sampling profilers. Instrumenting profilers work by altering an application’s code or binary, adding calls to functions that count how many times each procedure was called and how much time was spent inside it.
This approach allows an exhaustive analysis of which code called which code, and of how much time was spent in each procedure. However, it typically incurs a significant execution speed and memory penalty. That penalty can only be avoided by investing time and insight to limit instrumentation to a subset of an application’s functions, which makes instrumenting profilers more suitable when you already know where the issue is (see GpProfile for a free instrumenting Delphi profiler).
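As a rough illustration of the instrumenting approach described above, here is a minimal sketch in Python (chosen only for brevity; it does not reflect how GpProfile or any Delphi profiler is implemented). The names `instrumented`, `stats` and `add` are hypothetical. Note that the bookkeeping runs on every single call, which is where the overhead on small, frequently called procedures comes from.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical bookkeeping table: procedure name -> [call count, total seconds]
stats = defaultdict(lambda: [0, 0.0])

def instrumented(func):
    """Wrap func so that every call is counted and timed."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # This bookkeeping is the instrumentation overhead,
            # and it is paid on every call.
            entry = stats[func.__name__]
            entry[0] += 1
            entry[1] += time.perf_counter() - start
    return wrapper

@instrumented
def add(a, b):
    # A tiny procedure: the wrapper likely costs more than the work itself.
    return a + b

total = 0
for i in range(1000):
    total = add(total, i)

print(stats["add"][0], "calls,", stats["add"][1], "seconds measured")
```

The call count is exact, but the measured time includes part of the wrapper's own cost, so timings for a procedure as small as `add` should be read with caution.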
Sampling profilers, on the other hand, do not require instrumentation. They proceed by statistical analysis, periodically looking at which code is currently being executed by the profiled application.
This statistical nature has three consequences: not all code may be seen by the profiler (only code that takes time to execute), profiling information may vary statistically between executions, and context information for bottlenecks is typically more limited.
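The sampling idea can be sketched with a small simulation. This is an illustrative model only, not SamplingProfiler's implementation: instead of inspecting a live process, it samples a made-up execution timeline at a fixed period. The function `sample_timeline`, the procedure names and the durations are all hypothetical.

```python
from collections import Counter

def sample_timeline(timeline, period_us):
    """Count which procedure is 'executing' at each sampling instant.

    timeline: list of (procedure name, duration in microseconds),
              in execution order (made-up data for this sketch).
    period_us: sampling period in microseconds.
    """
    hits = Counter()
    total = sum(duration for _, duration in timeline)
    t = period_us // 2      # first sample mid-period, then one per period
    idx = 0                 # index of the segment currently under the cursor
    seg_start = 0           # start time of that segment
    while t < total:
        # Advance to the segment that contains instant t.
        while seg_start + timeline[idx][1] <= t:
            seg_start += timeline[idx][1]
            idx += 1
        hits[timeline[idx][0]] += 1
        t += period_us
    return hits

# 100 ms of hypothetical execution, sampled once per millisecond.
timeline = [
    ("ParseInput", 3000),
    ("Compute", 90000),
    ("Log", 200),        # shorter than the sampling period
    ("Compute", 6800),
]
hits = sample_timeline(timeline, period_us=1000)
for name, count in hits.most_common():
    print(name, count)
```

In this run `Compute` receives 97 of the 100 samples and `ParseInput` receives 3, while `Log` is never seen because it finishes between two sampling instants. That matches the point made above: only code that takes time to execute shows up, and what shows up is proportional to the time it takes.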
By focusing on what code is actually taking execution time, and by being less intrusive, sampling profilers can be used to pinpoint actual bottlenecks in production code, a feat instrumenting profilers aren’t capable of. They also provide bottleneck information down to the code line, and can point to issues that aren’t in your explicit code (such as calling convention overhead, initialization and cleanup of local values, etc.).
With a little practice, a single profiling run using the production executable is usually enough to gather the information needed to identify bottlenecks and focus optimizations where they will truly matter.
Why should I use a sampling profiler?
Using a sampling profiler has several benefits:
- it will not affect the execution speed significantly, neither through its own execution time nor by disturbing the CPU instruction or data caches with instrumenting code (i.e. you get a measure of actual performance, as if no profiler were running)
- it is immune to the heisenbug of instrumenting profilers, which disproportionately inflate the execution time of small procedures invoked in tight loops or from many contexts in an application’s code (instrumenting profilers often attempt to subtract their overhead from their timings, but on modern, super-scalar, pipelined processors with multi-level caches, this approach is never correct, even statistically speaking).
- it is able to measure the time spent in other OS components or DLLs (like the video driver, OpenGL, etc.), not just the time spent in your application
- profiling latencies won’t hide your application’s latencies (hard disk accesses, network accesses, video driver waits…), which can be particularly significant if your application makes asynchronous accesses.
- it can pinpoint bottlenecks at the code-line level (not just procedure level), for the entire application.
- it can be used to profile over long periods of time, like a full batch run of computations or a complete game level; you can literally have an application being profiled for days
- because it is lightweight, you can profile multiple applications simultaneously (like a client and a server running on the same development machine)
RealTime Monitor
From version 1.7 onward, SamplingProfiler includes a small HTTP server which can be used for real-time monitoring of the profiled application. The monitor provides code hot-spot information in real time, in HTML or XML form.
This feature can help diagnose infrequent usage spikes or near-freezes (like infinite loops). It can also be used for monitoring long-running processes executing on other machines.