(gprof.info) Implementation

 
 Implementation of Profiling
 ===========================
 
    Profiling works by changing how every function in your program is
 compiled so that when it is called, it will stash away some information
 about where it was called from.  From this, the profiler can figure out
 what function called it, and can count how many times it was called.
 This change is made by the compiler when your program is compiled with
 the `-pg' option, which causes every function to call `mcount' (or
 `_mcount', or `__mcount', depending on the OS and compiler) as one of
 its first operations.
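`gcc -pg' inserts the entry call automatically.  The sketch below does by hand what that instrumentation achieves for call counting: the function's first operation is a call to a hook.  The hook `profile_enter', the helper `calls_to', and the name-keyed table are hypothetical stand-ins; the real `mcount' identifies functions by address, not by name.

```c
#include <string.h>

/* Hypothetical per-function call counters, keyed by function name. */
#define MAX_FUNCS 16

struct call_count {
    const char *name;
    unsigned long calls;
};

static struct call_count counts[MAX_FUNCS];
static int n_counts;

/* Hypothetical entry hook standing in for the mcount call that -pg
   places among each function's first operations. */
static void profile_enter(const char *func)
{
    for (int i = 0; i < n_counts; i++) {
        if (strcmp(counts[i].name, func) == 0) {
            counts[i].calls++;
            return;
        }
    }
    if (n_counts < MAX_FUNCS) {     /* first call: add a new entry */
        counts[n_counts].name = func;
        counts[n_counts].calls = 1;
        n_counts++;
    }
}

/* Look up how many times a function was entered. */
static unsigned long calls_to(const char *func)
{
    for (int i = 0; i < n_counts; i++)
        if (strcmp(counts[i].name, func) == 0)
            return counts[i].calls;
    return 0;
}

/* Instrumented by hand: the hook runs before the function body. */
static int square(int x)
{
    profile_enter(__func__);
    return x * x;
}
```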
 
    The `mcount' routine, included in the profiling library, is
 responsible for recording in an in-memory call graph table both its
 parent routine (the child) and its parent's parent (the caller).  This
 is typically done by examining the stack frame to find both the address
 of the child and the return address in the original parent.  Since this
 is a very machine-dependent operation, `mcount' itself is typically a
 short assembly-language stub routine that extracts the required
 information and then calls `__mcount_internal' (a normal C function)
 with two arguments, `frompc' and `selfpc'.  `__mcount_internal' is
 responsible for maintaining the in-memory call graph, which records
 `frompc', `selfpc', and the number of times each of these call arcs was
 traversed.
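The bookkeeping `__mcount_internal' performs can be sketched as a table of (`frompc', `selfpc', count) arcs.  This is a minimal sketch assuming a fixed-size table with linear search, chosen for readability rather than speed; `record_arc', `arc_count', and `MAX_ARCS' are hypothetical names, not the profiling library's internals.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARCS 64   /* hypothetical fixed capacity */

struct arc {
    uintptr_t frompc;     /* call site in the parent */
    uintptr_t selfpc;     /* address of the child */
    unsigned long count;  /* times this arc was traversed */
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Find the (frompc, selfpc) arc and bump its count, adding it if new. */
static void record_arc(uintptr_t frompc, uintptr_t selfpc)
{
    for (size_t i = 0; i < n_arcs; i++) {
        if (arcs[i].frompc == frompc && arcs[i].selfpc == selfpc) {
            arcs[i].count++;
            return;
        }
    }
    if (n_arcs < MAX_ARCS) {
        arcs[n_arcs].frompc = frompc;
        arcs[n_arcs].selfpc = selfpc;
        arcs[n_arcs].count = 1;
        n_arcs++;
    }
    /* When the table is full, this sketch silently drops the arc. */
}

/* Return how often a given arc was traversed (0 if never seen). */
static unsigned long arc_count(uintptr_t frompc, uintptr_t selfpc)
{
    for (size_t i = 0; i < n_arcs; i++)
        if (arcs[i].frompc == frompc && arcs[i].selfpc == selfpc)
            return arcs[i].count;
    return 0;
}
```

Two calls from different call sites in the same parent produce two distinct arcs, because their `frompc' values differ.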
 
    GCC version 2 provides a built-in function
 (`__builtin_return_address'), which allows a generic `mcount' function
 to extract the required information from the stack frame.  However, on
 some architectures, most notably the SPARC, using this builtin can be
 very computationally expensive, and an assembly language version of
 `mcount' is used for performance reasons.
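The sketch below shows how `__builtin_return_address' recovers a caller's address in plain C.  It assumes GCC or Clang.  To keep it simple, the hypothetical `PROFILE_ARC' macro expands inside the child itself, so level 0 already gives the return address into the parent (`frompc'); a real `mcount' runs one call deeper and has to look one frame further up.  `note_arc', `calls_to', and `child' are illustrative names only.

```c
#include <stdint.h>

/* Assumes GCC or Clang, which provide __builtin_return_address. */
#define MAX_ARCS 32

struct arc {
    uintptr_t frompc;     /* return address into the parent */
    uintptr_t selfpc;     /* address of the child */
    unsigned long count;
};

static struct arc arcs[MAX_ARCS];
static int n_arcs;

static void note_arc(uintptr_t frompc, uintptr_t selfpc)
{
    for (int i = 0; i < n_arcs; i++) {
        if (arcs[i].frompc == frompc && arcs[i].selfpc == selfpc) {
            arcs[i].count++;
            return;
        }
    }
    if (n_arcs < MAX_ARCS) {
        arcs[n_arcs] = (struct arc){ frompc, selfpc, 1 };
        n_arcs++;
    }
}

/* Hypothetical hook, expanded at the top of the child.  Level 0 is the
   child's own return address, which points into the parent. */
#define PROFILE_ARC(self) \
    note_arc((uintptr_t)__builtin_return_address(0), (uintptr_t)(self))

/* noinline keeps child a real call with its own return address. */
static __attribute__((noinline)) int child(int x)
{
    PROFILE_ARC(child);
    return x + 1;
}

/* Sum the counts of all arcs that lead into selfpc. */
static unsigned long calls_to(uintptr_t selfpc)
{
    unsigned long total = 0;
    for (int i = 0; i < n_arcs; i++)
        if (arcs[i].selfpc == selfpc)
            total += arcs[i].count;
    return total;
}
```

Each distinct call site of `child' yields its own arc; calls made repeatedly from one call site accumulate in a single arc.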
 
    Number-of-calls information for library routines is collected by
 using a special version of the C library.  The routines in it are the
 same as in the usual C library, but they are compiled with `-pg'.  If
 you link your program with `gcc ... -pg', it automatically uses the
 profiling version of the library.
 
    Profiling also involves watching your program as it runs, and
 keeping a histogram of where the program counter happens to be every
 now and then.  Typically the program counter is looked at around 100
 times per second of run time, but the exact frequency may vary from
 system to system.
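Because each sample stands for one sampling interval of run time, a sample count converts to estimated time by dividing by the sampling rate.  This is a minimal sketch of that arithmetic; `samples_to_seconds' is a hypothetical helper, and the rate is a parameter because, as noted above, it varies from system to system.

```c
/* Assumes a fixed sampling rate of hz samples per second of run time. */
static double samples_to_seconds(unsigned long samples, unsigned int hz)
{
    /* Each sample accounts for one full interval of 1/hz seconds. */
    return (double)samples / (double)hz;
}
```

At the typical 100 samples per second, a function that collects 250 samples is credited with about 2.5 seconds, and the smallest amount of time that can be attributed to anything is one 10 ms interval.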
 
    This is done in one of two ways.  Most UNIX-like operating systems
 provide a `profil()' system call, which registers a memory array with
 the kernel, along with a scale factor that determines how the program's
 address space maps into the array.  Typical scaling values cause every
 2 to 8 bytes of address space to map into a single array slot.  On
 every tick of the system clock (assuming the profiled program is
 running), the value of the program counter is examined and the
 corresponding slot in the memory array is incremented.  Since this is
 done in the kernel, which had to interrupt the process anyway to handle
 the clock interrupt, very little additional system overhead is required.
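The per-tick work the kernel does can be sketched as mapping the program counter to a histogram slot and incrementing it.  This is a simplified sketch under stated assumptions: `profil()' expresses the mapping as a fixed-point scale factor, whereas this sketch uses a plain bytes-per-slot value for readability, and `struct histogram' and `hist_sample' are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical histogram covering [lowpc, lowpc + nslots * bytes_per_slot). */
struct histogram {
    uintptr_t lowpc;          /* lowest address covered */
    size_t bytes_per_slot;    /* e.g. 2 to 8, as described above */
    size_t nslots;
    unsigned short *slots;    /* one counter per slot */
};

/* Credit one clock tick to the slot containing pc.  Samples that fall
   outside the covered address range are dropped. */
static void hist_sample(struct histogram *h, uintptr_t pc)
{
    if (pc < h->lowpc)
        return;
    size_t i = (pc - h->lowpc) / h->bytes_per_slot;
    if (i < h->nslots)
        h->slots[i]++;
}
```

With 4 bytes per slot, program counter values 0x1000 through 0x1003 all land in slot 0, so one slot cannot distinguish between the instructions inside it.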
 
    However, some operating systems, most notably Linux 2.0 (and
 earlier), do not provide a `profil()' system call.  On such a system,
 arrangements are made for the kernel to periodically deliver a signal
 to the process (typically via `setitimer()'), which then performs the
 same operation of examining the program counter and incrementing a slot
 in the memory array.  Since this method requires a signal to be
 delivered to user space every time a sample is taken, it incurs
 considerably more overhead than kernel-based profiling.  The added delay
 in delivering the signal also makes this method less accurate.
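The signal-based approach can be sketched with the POSIX `setitimer()' and `sigaction()' calls.  `ITIMER_PROF' counts down while the process runs (user plus system time) and delivers `SIGPROF' on expiry.  Assumptions: the 10 ms interval is chosen to match the typical 100 samples per second, and `start_sampling', `stop_sampling', and `ticks' are hypothetical names.  Reading the interrupted program counter requires machine-dependent signal context, so this sketch only counts ticks.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks;

/* Runs in user space on every SIGPROF.  A real profiler would fetch the
   interrupted program counter here and bump a histogram slot. */
static void on_sigprof(int sig)
{
    (void)sig;
    ticks++;
}

/* Request SIGPROF every 10 ms of process run time.  Returns 0 on success. */
static int start_sampling(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = 10000;    /* 10 ms between samples */
    tv.it_value = tv.it_interval;      /* first sample after 10 ms */
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer by setting a zero interval and value. */
static int stop_sampling(void)
{
    struct itimerval off;
    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_PROF, &off, NULL);
}
```

Every tick costs a full signal delivery and handler return, which is where the extra overhead described above comes from.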
 
    A special startup routine allocates memory for the histogram and
 either calls `profil()' or sets up a clock signal handler.  This
 routine (`monstartup') can be invoked in several ways.  On Linux
 systems, a special profiling startup file `gcrt0.o', which invokes
 `monstartup' before `main', is used instead of the default `crt0.o'.
 Use of this special startup file is one of the effects of using `gcc
 ... -pg' to link.  On SPARC systems, no special startup files are used.
 Rather, the `mcount' routine, when it is invoked for the first time
 (typically when `main' is called), calls `monstartup'.
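The SPARC-style arrangement, where the first `mcount' call triggers startup, is a lazy-initialization pattern.  In this minimal sketch, `prof_hook' stands in for `mcount' and `prof_startup' for `monstartup'; both names, the slot count, and the globals are hypothetical, and the sketch is not thread-safe.

```c
#include <stdlib.h>

static unsigned short *histogram;  /* allocated on first use */
static size_t hist_slots;
static int startup_runs;           /* how many times startup ran */

/* Stand-in for monstartup: allocate the zeroed histogram.  A real
   startup routine would then call profil() or install a clock signal
   handler. */
static int prof_startup(size_t nslots)
{
    histogram = calloc(nslots, sizeof *histogram);
    if (histogram == NULL)
        return -1;
    hist_slots = nslots;
    startup_runs++;
    return 0;
}

/* Stand-in for mcount: the first call performs startup, and later
   calls go straight to recording. */
static void prof_hook(void)
{
    static int started;
    if (!started) {
        started = 1;
        prof_startup(1024);   /* on failure, histogram stays NULL */
    }
    /* ... record the call arc here ... */
}
```

The design avoids any special startup file, at the cost of a flag check on every call to the hook.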
 
    If the compiler's `-a' option was used, basic-block counting is also
 enabled.  Each object file is then compiled with a static array of
 counts, initially zero.  In the executable code, every time a new
 basic-block begins (for example, at each branch of an `if' statement),
 an extra instruction is inserted to increment the corresponding count in
 the array.  At compile time, a paired array is also constructed that
 records the starting address of each basic-block.  Taken together, the two
 arrays record the starting address of every basic-block, along with the
 number of times it was executed.
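The paired arrays can be illustrated by instrumenting a small function by hand.  This is a sketch only: the compiler records each block's starting address, while this version pairs each count with a descriptive label, and `abs_val', `bb_counts', and `bb_labels' are hypothetical names.

```c
#include <stdio.h>

enum { BB_ENTRY, BB_NEGATIVE, BB_NONNEGATIVE, BB_COUNT };

/* Static array of counts, initially zero, one per basic block. */
static unsigned long bb_counts[BB_COUNT];

/* Paired array identifying each block; labels stand in for addresses. */
static const char *const bb_labels[BB_COUNT] = {
    "abs_val:entry", "abs_val:negative", "abs_val:nonnegative"
};

static int abs_val(int x)
{
    bb_counts[BB_ENTRY]++;          /* inserted at block start */
    if (x < 0) {
        bb_counts[BB_NEGATIVE]++;   /* inserted at block start */
        return -x;
    }
    bb_counts[BB_NONNEGATIVE]++;    /* inserted at block start */
    return x;
}

/* Print each block's label alongside its execution count. */
static void dump_bb_counts(FILE *out)
{
    for (int i = 0; i < BB_COUNT; i++)
        fprintf(out, "%s %lu\n", bb_labels[i], bb_counts[i]);
}
```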
 
    The profiling library also includes a function (`mcleanup') which is
 typically registered using `atexit()' to be called as the program
 exits, and is responsible for writing the file `gmon.out'.  Profiling
 is turned off, various headers are output, and the histogram is
 written, followed by the call-graph arcs and the basic-block counts.
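The exit-time write can be sketched with the standard `atexit()' function.  Assumptions: `write_profile' emits a made-up text layout that is NOT the real `gmon.out' format, though it follows the same order (header, histogram, then arcs).  `prof_cleanup' plays the role of `mcleanup', and the file name `profile.txt' and the sample data are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the in-memory profile tables. */
static unsigned short hist[4] = { 5, 0, 2, 1 };

struct arc {
    unsigned long frompc, selfpc, count;
};
static const struct arc arcs[] = { { 0x1010, 0x2000, 3 } };

/* Write header, histogram, then call-graph arcs.
   Returns 0 on success, -1 if the stream reported an error. */
static int write_profile(FILE *out)
{
    fprintf(out, "profile v1\n");
    for (size_t i = 0; i < sizeof hist / sizeof hist[0]; i++)
        fprintf(out, "hist %zu %u\n", i, (unsigned)hist[i]);
    for (size_t i = 0; i < sizeof arcs / sizeof arcs[0]; i++)
        fprintf(out, "arc %#lx %#lx %lu\n",
                arcs[i].frompc, arcs[i].selfpc, arcs[i].count);
    return ferror(out) ? -1 : 0;
}

/* Plays the role of mcleanup: runs once, as the program exits.  A real
   implementation turns sampling off first so the write is not profiled. */
static void prof_cleanup(void)
{
    FILE *f = fopen("profile.txt", "w");   /* hypothetical file name */
    if (f == NULL)
        return;
    write_profile(f);
    fclose(f);
}

/* Call early, for example from a startup routine, to arm the writer. */
static int prof_register_cleanup(void)
{
    return atexit(prof_cleanup);
}
```

Registering through `atexit()' means the file is written on a normal return from `main' or a call to `exit()', but not if the program is killed by a signal.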
 
    The output from `gprof' gives no indication of parts of your program
 that are limited by I/O or swapping bandwidth.  This is because samples
 of the program counter are taken at fixed intervals of the program's
 run time.  Therefore, the time measurements in `gprof' output say
 nothing about time that your program was not running.  For example, a
 part of the program that creates so much data that it cannot all fit in
 physical memory at once may run very slowly due to thrashing, but
 `gprof' will say it uses little time.  On the other hand, sampling by
 run time has the advantage that the amount of load due to other users
 won't directly affect the output you get.
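The gap between run time and elapsed time is easy to observe directly.  This sketch compares CPU time from `clock()' with wall-clock time from `clock_gettime(CLOCK_MONOTONIC, ...)' across a `nanosleep()', which stands in for time spent blocked on I/O or paging.  It assumes a POSIX system, and `measure_sleep' is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Elapsed wall-clock seconds from a monotonic clock. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU seconds this process has consumed so far. */
static double cpu_seconds(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Sleep for ms milliseconds and report the wall and CPU time that
   elapsed meanwhile. */
static void measure_sleep(long ms, double *wall, double *cpu)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    double w0 = wall_seconds();
    double c0 = cpu_seconds();
    nanosleep(&req, NULL);
    *wall = wall_seconds() - w0;
    *cpu = cpu_seconds() - c0;
}
```

Sleeping for 200 ms advances the wall clock by at least 200 ms while consuming almost no CPU time.  A sampler driven by run time likewise takes almost no samples during such a wait, which is why `gprof' cannot attribute it to any part of the program.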
 