DOC HOME SITE MAP MAN PAGES GNU INFO SEARCH PRINT BOOK
 

(gprof.info) Internals

Info Catalog (gprof.info) File Format (gprof.info) Details (gprof.info) Debugging
 
 `gprof''s Internal Operation
 ============================
 
    Like most programs, `gprof' begins by processing its options.
 During this stage, it may building its symspec list
 (`sym_ids.c:sym_id_add'), if options are specified which use symspecs.
 `gprof' maintains a single linked list of symspecs, which will
 eventually get turned into 12 symbol tables, organized into six
 include/exclude pairs - one pair each for the flat profile
 (INCL_FLAT/EXCL_FLAT), the call graph arcs (INCL_ARCS/EXCL_ARCS),
 printing in the call graph (INCL_GRAPH/EXCL_GRAPH), timing propagation
 in the call graph (INCL_TIME/EXCL_TIME), the annotated source listing
 (INCL_ANNO/EXCL_ANNO), and the execution count listing
 (INCL_EXEC/EXCL_EXEC).
 
    After option processing, `gprof' finishes building the symspec list
 by adding all the symspecs in `default_excluded_list' to the exclude
 lists EXCL_TIME and EXCL_GRAPH, and if line-by-line profiling is
 specified, EXCL_FLAT as well.  These default excludes are not added to
 EXCL_ANNO, EXCL_ARCS, and EXCL_EXEC.
 
    Next, the BFD library is called to open the object file, verify that
 it is an object file, and read its symbol table (`core.c:core_init'),
 using `bfd_canonicalize_symtab' after mallocing an appropriately sized
 array of symbols.  At this point, function mappings are read (if the
 `--file-ordering' option has been specified), and the core text space
 is read into memory (if the `-c' option was given).
 
    `gprof''s own symbol table, an array of Sym structures, is now built.
 This is done in one of two ways, by one of two routines, depending on
 whether line-by-line profiling (`-l' option) has been enabled.  For
 normal profiling, the BFD canonical symbol table is scanned.  For
 line-by-line profiling, every text space address is examined, and a new
 symbol table entry gets created every time the line number changes.  In
 either case, two passes are made through the symbol table - one to
 count the size of the symbol table required, and the other to actually
 read the symbols.  In between the two passes, a single array of type
 `Sym' is created of the appropriate length.  Finally,
 `symtab.c:symtab_finalize' is called to sort the symbol table and
 remove duplicate entries (entries with the same memory address).
 
    The symbol table must be a contiguous array for two reasons.  First,
 the `qsort' library function (which sorts an array) will be used to
 sort the symbol table.  Also, the symbol lookup routine
 (`symtab.c:sym_lookup'), which finds symbols based on memory address,
 uses a binary search algorithm which requires the symbol table to be a
 sorted array.  Function symbols are indicated with an `is_func' flag.
 Line number symbols have no special flags set.  Additionally, a symbol
 can have an `is_static' flag to indicate that it is a local symbol.
 
    With the symbol table read, the symspecs can now be translated into
 Syms (`sym_ids.c:sym_id_parse').  Remember that a single symspec can
 match multiple symbols.  An array of symbol tables (`syms') is created,
 each entry of which is a symbol table of Syms to be included or
 excluded from a particular listing.  The master symbol table and the
 symspecs are examined by nested loops, and every symbol that matches a
 symspec is inserted into the appropriate syms table.  This is done
 twice, once to count the size of each required symbol table, and again
 to build the tables, which have been malloced between passes.  From now
 on, to determine whether a symbol is on an include or exclude symspec
 list, `gprof' simply uses its standard symbol lookup routine on the
 appropriate table in the `syms' array.
 
    Now the profile data file(s) themselves are read
 (`gmon_io.c:gmon_out_read'), first by checking for a new-style
 `gmon.out' header, then assuming this is an old-style BSD `gmon.out' if
 the magic number test failed.
 
    New-style histogram records are read by `hist.c:hist_read_rec'.  For
 the first histogram record, allocate a memory array to hold all the
 bins, and read them in.  When multiple profile data files (or files
 with multiple histogram records) are read, the starting address, ending
 address, number of bins and sampling rate must match between the
 various histograms, or a fatal error will result.  If everything
 matches, just sum the additional histograms into the existing in-memory
 array.
 
    As each call graph record is read (`call_graph.c:cg_read_rec'), the
 parent and child addresses are matched to symbol table entries, and a
 call graph arc is created by `cg_arcs.c:arc_add', unless the arc fails
 a symspec check against INCL_ARCS/EXCL_ARCS.  As each arc is added, a
 linked list is maintained of the parent's child arcs, and of the child's
 parent arcs.  Both the child's call count and the arc's call count are
 incremented by the record's call count.
 
    Basic-block records are read (`basic_blocks.c:bb_read_rec'), but
 only if line-by-line profiling has been selected.  Each basic-block
 address is matched to a corresponding line symbol in the symbol table,
 and an entry made in the symbol's bb_addr and bb_calls arrays.  Again,
 if multiple basic-block records are present for the same address, the
 call counts are cumulative.
 
    A gmon.sum file is dumped, if requested (`gmon_io.c:gmon_out_write').
 
    If histograms were present in the data files, assign them to symbols
 (`hist.c:hist_assign_samples') by iterating over all the sample bins
 and assigning them to symbols.  Since the symbol table is sorted in
 order of ascending memory addresses, we can simple follow along in the
 symbol table as we make our pass over the sample bins.  This step
 includes a symspec check against INCL_FLAT/EXCL_FLAT.  Depending on the
 histogram scale factor, a sample bin may span multiple symbols, in
 which case a fraction of the sample count is allocated to each symbol,
 proportional to the degree of overlap.  This effect is rare for normal
 profiling, but overlaps are more common during line-by-line profiling,
 and can cause each of two adjacent lines to be credited with half a
 hit, for example.
 
    If call graph data is present, `cg_arcs.c:cg_assemble' is called.
 First, if `-c' was specified, a machine-dependent routine (`find_call')
 scans through each symbol's machine code, looking for subroutine call
 instructions, and adding them to the call graph with a zero call count.
 A topological sort is performed by depth-first numbering all the
 symbols (`cg_dfn.c:cg_dfn'), so that children are always numbered less
 than their parents, then making a array of pointers into the symbol
 table and sorting it into numerical order, which is reverse topological
 order (children appear before parents).  Cycles are also detected at
 this point, all members of which are assigned the same topological
 number.  Two passes are now made through this sorted array of symbol
 pointers.  The first pass, from end to beginning (parents to children),
 computes the fraction of child time to propagate to each parent and a
 print flag.  The print flag reflects symspec handling of
 INCL_GRAPH/EXCL_GRAPH, with a parent's include or exclude (print or no
 print) property being propagated to its children, unless they
 themselves explicitly appear in INCL_GRAPH or EXCL_GRAPH.  A second
 pass, from beginning to end (children to parents) actually propagates
 the timings along the call graph, subject to a check against
 INCL_TIME/EXCL_TIME.  With the print flag, fractions, and timings now
 stored in the symbol structures, the topological sort array is now
 discarded, and a new array of pointers is assembled, this time sorted
 by propagated time.
 
    Finally, print the various outputs the user requested, which is now
 fairly straightforward.  The call graph (`cg_print.c:cg_print') and
 flat profile (`hist.c:hist_print') are regurgitations of values already
 computed.  The annotated source listing
 (`basic_blocks.c:print_annotated_source') uses basic-block information,
 if present, to label each line of code with call counts, otherwise only
 the function call counts are presented.
 
    The function ordering code is marginally well documented in the
 source code itself (`cg_print.c').  Basically, the functions with the
 most use and the most parents are placed first, followed by other
 functions with the most use, followed by lower use functions, followed
 by unused functions at the end.
 
Info Catalog (gprof.info) File Format (gprof.info) Details (gprof.info) Debugging
automatically generated byinfo2html