C and C++ compilation system

Link editing

This topic examines the link editing process in detail. It starts with the default arrangement, and with the basics of linking your program with the standard libraries supplied by the C and C++ compilation system. It also details the implementation of the dynamic linking mechanism, and looks at some coding guidelines and maintenance tips for shared library development.

NOTE: Because this topic tries to cover the widest possible audience, it may provide more background than many users will need to link their programs with C and C++ language libraries. If you are interested only in the how-to, and are comfortable with a purely formal presentation that scants motivation and background alike, you may want to skip to the quick-reference guide in the last subsection.

Link editing refers to the process in which a symbol referenced in one module of your program is connected with its definition in another -- more concretely, the process by which the symbol printf in our sample source file hello.c is connected with its definition in the standard C library. Whichever link editing model you choose, static or dynamic, the link editor will search each module of your program, including any libraries you have used, for definitions of undefined external symbols in the other modules. If it does not find a definition for a symbol, the link editor will report an error by default, and fail to create an executable program. Multiply defined symbols are treated differently, however, under each approach. For details, see ``Handling multiply defined symbols''. The principal difference between static and dynamic linking lies in what happens after this search is completed.

Static linking

Under static linking, copies of the archive library object files that satisfy still unresolved external references in your program are incorporated in your executable at link time. External references in your program are connected with their definitions -- assigned addresses in memory -- when the executable is created.

Dynamic linking

Under dynamic linking, the contents of a shared object are mapped into the virtual address space of your process at run time. External references in your program are connected with their definitions when the program is executed.

You might prefer dynamic to static linking for the following reasons:

Dynamically linked programs save disk storage and system process memory by sharing library code at run time.
Dynamically linked code can be fixed or enhanced without having to relink applications that depend on it.

Default arrangement

The default cc command line

   $ cc file1.c file2.c file3.c

creates object files corresponding to each of your source files, and links them with each other to create an executable program. These object files are called relocatable object files because they contain references to symbols that have not yet been connected with their definitions -- have not yet been assigned addresses in memory.

This command line arranges for the standard C library functions that you have called in your program to be linked with your executable automatically. By default, the linker looks for these functions in the file libc.so.

NOTE: The standard C library, libc.so, is not a pure shared object library, as it contains both run-time loadable and statically linkable functions. If you use one of these static functions in your program, the code for the function will be statically bound to your executable at link time.

The standard C library contains the system calls described in Section 2 and the C language functions described in Section 3, Subsections 3C and 3S.

NOTE: See ``Libraries and header files'' for details.

The C++ Standard Library is contained within the dynamic library libC.so and the archive library libC.a. Its contents are documented in the 3C++std manual pages (see Intro(3C++std)) and the 3C++ manual pages (see Intro(3C++)).

Now let's look at the formal basis for this arrangement:

By convention, shared objects, or dynamically linked libraries, are designated by the prefix lib and the suffix .so; archives, or statically linked libraries, are designated by the prefix lib and the suffix .a. The exception is libc.so, as explained in the previous NOTE.
These conventions are recognized, in turn, by the -l option to the cc and CC commands. The command line
```
$ cc file1.c file2.c file3.c -lx
```
directs the link editor to search the shared object libx.so or the archive library libx.a. The cc command automatically passes -lc to the link editor. The CC command automatically passes -lC -lc to the link editor.
By default, the link editor chooses the shared object implementation of a library, libx.so, in preference to the archive library implementation, libx.a, in the same directory.
By default, the link editor searches for libraries in the standard places on your system, /usr/ccs/lib and /usr/lib, in that order. The standard libraries supplied by the compilation system normally are kept in /usr/ccs/lib.

Therefore, the default cc command line will direct the link editor to search /usr/ccs/lib/libc.so rather than its archive library counterpart.

In ``Creating and linking with archive libraries'' we'll show you how to link your program with the archive version of libc to avoid the dynamic linking default. Of course, you can link your program with libraries that perform other tasks as well. Finally, you can create your own shared objects and archive libraries.

Under the default arrangement the cc command creates and then links relocatable object files to generate an executable program, then arranges for the executable to be linked with the shared C library at run time. If you are satisfied with this arrangement, you need make no other provision for link editing on the cc command line.

Linking with standard libraries

A shared object is a single object file that contains the code for every function in a given library. When you call a function in that library, and dynamically link your program with it, the entire contents of the shared object are mapped into the virtual address space of your process at run time.

Archive libraries are configured differently. Each function, or small group of related functions (typically, the related functions that you will sometimes find on the same manual page), is stored in its own object file. These object files are then collected in archives that are searched by the link editor when you specify the necessary options on the cc command line. The link editor makes available to your program only the object files in these archives that contain a function you have called in your program.

Turning off dynamic linking

As noted, libc.a is the archive version of the standard C library. The cc command will direct the link editor to search libc.a if you turn off the dynamic linking default with the -dn option:

   $ cc -dn file1.c file2.c file3.c

Copies of the object files in libc.a that resolve still unresolved external references in your program will be incorporated in your executable at link time.

Linking with other standard libraries

If you need to point the link editor to standard libraries that are not searched automatically, you specify the -l option explicitly on the cc command line. As shown previously, -lx directs the link editor to search the shared object libx.so or the archive library libx.a. So if your program calls the function sin, for example, in the standard math library libm, the command

   $ cc file1.c file2.c file3.c -lm

will direct the link editor to search for /usr/ccs/lib/libm.so, and if it does not find it, /usr/ccs/lib/libm.a, to satisfy references to sin in your program. Because the compilation system does not supply a shared object version of libm, the above command will direct the link editor to search libm.a unless you have installed a shared object version of libm in the standard place. Note that because the dynamic linking default was not turned off with the -dn option, the above command will direct the link editor to search libc.so rather than libc.a. You would use the same command with the -dn option to link your program statically with libm.a and libc.a. The contents of libm are described in ``Math library (libm)''.

NOTE: Because the link editor searches an archive library only to resolve undefined external references it has previously seen, the placement of the -l option on the cc command line is important. The command

   $ cc -dn file1.c -lm file2.c file3.c

will direct the link editor to search libm.a only for definitions that satisfy still unresolved external references in file1.c. As a rule, then, it is best to put -l at the end of the command line.

Creating and linking with archive libraries

This topic describes the basic mechanisms by which archives and shared objects are built. The idea is to give you some sense of where these libraries come from, as a basis for understanding how they are implemented and linked with your programs.

The following commands

   $ cc -c function1.c function2.c function3.c
   $ ar -r libfoo.a function1.o function2.o function3.o

will create an archive library, libfoo.a, that consists of the named object files.

NOTE: See ar(1) for details of usage.

When you use the -l option to link your program with libfoo.a

   $ cc -Ldir file1.c file2.c file3.c -lfoo

the link editor will incorporate in your executable only the object files in this archive that contain a function you have called in your program. Note, again, that because the dynamic linking default was not turned off with the -dn option, the above command will direct the link editor to search libc.so as well as libfoo.a.

Creating and linking with C shared object libraries

Create a shared object library by specifying the -G option to the link editor:

   $ cc -G -o libfoo.so -K PIC function1.c function2.c function3.c

That command will create the shared object libfoo.so consisting of the object code for the functions contained in the named files.

NOTE: See ``Implementation'' for details of compiler option -K PIC.

When you use the -l option to link your program with libfoo.so

   $ cc -Ldir file1.c file2.c file3.c -lfoo

the link editor will record in your executable the name of the shared object and a small amount of bookkeeping information for use by the system at run time. Another component of the system -- the dynamic linker -- does the actual linking.

NOTE: Because shared object code is not copied into your executable object file at link time, a dynamically linked executable normally will use less disk space than a statically linked executable. For the same reason, shared object code can be changed without breaking executables that depend on it. Even if the shared C library were enhanced in the future, you would not have to relink programs that depended on it as long as the enhancements were compatible with your code. The dynamic linker would simply use the definitions in the new version of the library to resolve external references in your executables at run time. See ``Checking for run-time compatibility'' for more information.

Naming your shared object

You can specify the name of the shared object that you want to create with the -o option, as in the previous examples. If you omit -o, the shared object is called a.out. The following command, for example, will create a shared object called a.out:

   $ cc -G function1.o function2.o function3.o

You can then rename the shared object:

   $ mv a.out libfoo.so

As noted, you use the lib prefix and the .so suffix because they are conventions recognized by -l, just as are lib and .a for archive libraries. So while it is legitimate to create a shared object that does not follow the naming convention, and to link it with your program

   $ cc -G -o sharedob function1.o function2.o function3.o
   $ cc file1.c file2.c file3.c /path/sharedob

we recommend against it. Not only will you have to enter a path name on the cc command line every time you use sharedob in a program, that path name will be hard-coded in your executables.

The command line

   $ cc -Ldir file1.c file2.c file3.c -lfoo

directs the link editor to record in your executable the name of the shared object with which it is to be linked at run time.

NOTE: cc records the name of the shared object, not its path name.

When you use the -l option to link your program with a shared object library, not only must the link editor be told which directory to search for that library, so must the dynamic linker (unless the directory is the standard place, which the dynamic linker searches by default). See ``Specifying directories to be searched by the dynamic linker'' for more information about pointing the dynamic linker to those directories. However, as long as the path name of a shared object is not hard-coded in your executable, you can move the shared object to a different directory without breaking your program. You should avoid using path names of shared objects on the cc command line. Those path names will be hard-coded in your executable. They won't be if you use -l.

Linking a shared object with another library

Finally, the cc -G command will not only create a shared object, it will accept a shared object or archive library as input. When you create libfoo.so, you can link it with a library you have already created such as libsharedob.so:

   $ cc -G -o libfoo.so -Ldir function1.o function2.o \
      function3.o -lsharedob

That command will arrange for libsharedob.so to be linked with libfoo.so when, at run time, libfoo.so is linked with your program. It will also arrange for ld to search libsharedob.so for unresolved symbols when you link a program with libfoo.so. Note that here you will have to point the dynamic linker to the directories in which both libfoo.so and libsharedob.so are stored.

In order to link your program with libfoo.so, you will have to point the link editor to the directories in which libfoo.so and libsharedob.so are stored. In the following discussions, libsharedob.so will be referred to as the needed library.

Specifying directories to be searched by the link editor

In the previous section you created the archive library libfoo.a and the shared objects libsharedob.so and libfoo.so. For this example, all three of these libraries are stored in the directory /home/mylibs, and the executable is being created in a different directory. This example reflects the way most programmers organize their work on the UNIX® operating system.

In order to link your program with either of these libraries, the link editor must access the /home/mylibs directory. Specify the directory's path name with the -L option:

   $ cc -L/home/mylibs file1.c file2.c file3.c -lfoo

The -L option directs the link editor to search for the libraries named with -l and the needed libraries first in the specified directory, then in the standard places. In this example, having found the directory /home/mylibs, the link editor will search libfoo.so rather than libfoo.a. As shown earlier, when the link editor encounters otherwise identically named shared object and archive libraries in the same directory, it searches the library with the .so suffix by default. For the same reason, it will search libc.so here rather than libc.a. Note that you must specify -L if you want the link editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.

To direct the link editor to search libfoo.a, you can turn off the dynamic linking default:

   $ cc -dn -L/home/mylibs file1.c file2.c file3.c -lfoo

Under -dn, the link editor will not accept shared objects as input. Here, then, it will search libfoo.a rather than libfoo.so, and libc.a rather than libc.so. Note that libsharedob.so will not be searched because libfoo.a is an archive library.

To link your program statically with libfoo.a and dynamically with libc.so, you can do either of two things. First, you can move libfoo.a to a different directory -- /home/archives, for example -- then specify /home/archives with the -L option:

   $ cc -L/home/archives -L/home/mylibs file1.c file2.c \
      file3.c -lfoo

As long as the link editor encounters the /home/archives directory before it encounters the /home/mylibs directory, it will search libfoo.a rather than libfoo.so. When otherwise identically named .so and .a libraries exist in different directories, the link editor will search the first one it finds. The same thing is true, by the way, for identically named libraries of either type. If you have different versions of libfoo.a in your directories, the link editor will search the first one it finds.

A better alternative might be to leave libfoo.a where you had it in the first place and use the -Bstatic and -Bdynamic options to turn dynamic linking off and on. The following command will link your program statically with libfoo.a and dynamically with libc.so:

   $ cc -L/home/mylibs file1.c file2.c file3.c -Bstatic \
      -lfoo -Bdynamic

When you specify -Bstatic, the link editor will not accept a shared object as input until you specify -Bdynamic. You can use these options as toggles as often as needed on the cc command line:

   $ cc -L/home/mylibs file1.c file2.c -Bstatic -lfoo \
      file3.c -Bdynamic -lsharedob

That command will direct the link editor to search the following libraries:

libfoo.a to resolve still unresolved external references in file1.c and file2.c;
libsharedob.so to resolve still unresolved external references in all three files and in libfoo.a;
libc.so to resolve still unresolved external references in all three files and the preceding libraries.

Files, including libraries, are searched for definitions in the order they are listed on the cc command line. The standard C library is always searched after the libraries named with -l and before the needed libraries.

You can add to the list of directories to be searched by the link editor by using the environment variable LD_LIBRARY_PATH. LD_LIBRARY_PATH must be a list of colon-separated directory names. An optional second list is separated from the first by a semicolon:

   $ LD_LIBRARY_PATH="dir:dir;dir:dir" export LD_LIBRARY_PATH

The directories specified before the semicolon are searched, in order, before the directories specified with -L; the directories specified after the semicolon are searched, in order, after the directories specified with -L. Note that you can use LD_LIBRARY_PATH in place of -L altogether. In that case the link editor will search for libraries named with -l and the needed libraries first in the directories specified before the semicolon, next in the directories specified after the semicolon, and last in the standard places. You should use absolute path names when you set this environment variable.

NOTE: LD_LIBRARY_PATH is also used by the dynamic linker. If LD_LIBRARY_PATH exists in your environment, the dynamic linker will search the directories named in it for shared objects to be linked with your program at execution. In using LD_LIBRARY_PATH with the link editor or the dynamic linker, then, you should keep in mind that any directories you give to one you are also giving to the other.

Specifying directories to be searched by the dynamic linker

When you use the -l option, you must point the dynamic linker to the directories of the shared objects that are to be linked with your program at execution. The environment variable LD_RUN_PATH lets you do that at link time. To set LD_RUN_PATH, list the absolute path names of the directories you want searched in the order you want them searched. Separate path names with a colon, as shown in the following example:

   $ LD_RUN_PATH=/home/mylibs1:/home/mylibs2 export LD_RUN_PATH

The command

   $ cc -o prog -L/home/mylibs file1.c file2.c file3.c -lfoo

will direct the dynamic linker to search for libfoo.so in /home/mylibs1 then /home/mylibs2 when you execute your program:

   $ prog

The dynamic linker searches the standard place by default, after searching the directories you have assigned to LD_RUN_PATH (/home/mylibs1 and /home/mylibs2). Note that as far as the dynamic linker is concerned, the standard place for libraries is /usr/lib. Any executable versions of libraries supplied by the compilation system are kept in /usr/lib.

The environment variable LD_LIBRARY_PATH lets you do the same thing at run time. Suppose you have moved libfoo.so to /home/sharedobs. It is too late to add /home/sharedobs to LD_RUN_PATH, at least without link editing your program again. You can, however, assign the new directory to LD_LIBRARY_PATH, as follows:

   $ LD_LIBRARY_PATH=/home/sharedobs export LD_LIBRARY_PATH

Now when you execute your program

   $ prog

the dynamic linker will search for libfoo.so first in /home/mylibs1 then in /home/mylibs2 and, not finding it in either directory, in /home/sharedobs. The directories assigned to LD_RUN_PATH are searched before the directories assigned to LD_LIBRARY_PATH. The important point is that because the path name of libfoo.so is not hard-coded in prog, you can direct the dynamic linker to search a different directory when you execute your program. You can move a shared object without breaking your application.

You can set LD_LIBRARY_PATH without first having set LD_RUN_PATH. The main difference between them is that once you have used LD_RUN_PATH for an application, the dynamic linker will search the specified directories every time the application is executed (unless you have relinked the application in a different environment). In contrast, you can assign different directories to LD_LIBRARY_PATH each time you execute the application. LD_LIBRARY_PATH directs the dynamic linker to search the assigned directories before it searches the standard place. Directories, including those in the optional second list, are searched in the order listed.

NOTE: For security, the dynamic linker ignores LD_LIBRARY_PATH for set-user and set-group ID programs and for privileged processes. It does, however, search LD_RUN_PATH directories and /usr/lib.

Implementation

The following summarizes the basic implementation of the static and dynamic linking mechanisms:

When you use an archive library function, a copy of the object file that contains the function is incorporated in your executable at link time. External references to the function are assigned virtual addresses when the executable is created.
When you use a shared library function, the entire contents of the library are mapped into the virtual address space of your process at run time. External references to the function are assigned virtual addresses when you execute the program. The link editor records in your executable only the name of the shared object and a small amount of bookkeeping information for use by the dynamic linker at run time.

There are one or two cases in which you might not want to use dynamic linking. Because shared object code is not copied into your executable object file at link time, a dynamically linked executable normally will use less disk space than a statically linked executable. If your program calls only a few small library functions, however, the bookkeeping information to be used by the dynamic linker may take up more space in your executable than the code for those functions. You can use the size command to determine the difference. See size(1) for more information.

In a similar way, using a shared object may occasionally add to the memory requirements of a process. Although a shared object's text is shared by all processes that use it, its writable data typically is not. See ``Guidelines for building shared objects'' for details. Every process that uses a shared object usually gets a private copy of its entire data segment, regardless of how much of the data is needed. If an application uses only a small portion of a shared library's text and data, executing the application might require more memory with a shared object than without one. For example, it would waste memory to use the standard C shared object library to access only strcmp. Although sharing strcmp saves space on your disk and memory on the system, the memory cost to your process of having a private copy of the C library's data segment would make the archive version of strcmp the more appropriate choice.

Now let's consider dynamic linking in a bit more detail. First, each process that uses a shared object references a single copy of its code in memory. That means that when other users on your system call a function in a shared object library, the entire contents of that library are mapped into the virtual address space of their processes as well. If they have called the same function as you, external references to the function in their programs will, in all likelihood, be assigned different virtual addresses. Because the function may be loaded at a different virtual address for each process that uses it, the system cannot calculate absolute addresses in memory until run time.

Second, the memory management scheme underlying dynamic linking shares memory among processes at the granularity of a page. Memory pages can be shared as long as they are not modified at run time. If a process writes to a shared page while relocating a reference to a shared object, it gets a private copy of that page and loses the benefits of code sharing without affecting other users of the page.

Third, to create programs that require the least possible amount of page modification at run time, the compiler generates position-independent code under the -K PIC option. Whereas executable code normally must be tied to a fixed address in memory, position-independent code can be loaded anywhere in the address space of a process. Because the code is not tied to specific addresses, it will execute correctly -- without page modification -- at a different address in each process that uses it. As we have indicated, you should specify -K PIC when you create a shared object:

   $ cc -K PIC -G -o libfoo.so function1.c function2.c \
      function3.c

Relocatable references in your object code will be moved from its text segment to tables in the data segment. See ``Object files'' for details.

Handling multiply defined symbols

Multiply defined symbols -- except for different-sized initialized data objects -- are not reported as errors under dynamic linking. The link editor will not report an error for multiple definitions of a function or a same-sized data object when each such definition resides within a different shared object or within a dynamically linked executable and different shared objects. The dynamic linker will use the definition in whichever object occurs first on the cc command line. You can, however, specify -Bsymbolic when you create a shared object

   $ cc -K PIC -G -Bsymbolic -o libfoo.so function1.c \
      function2.c function3.c

to ensure that the dynamic linker will use the shared object's definition of one of its own symbols, rather than a definition of the same symbol in an executable or another library.

In contrast, multiply defined symbols are generally reported as errors under static linking. We say ``generally'' because definitions of so-called weak symbols can be hidden from the link editor by a definition of a global symbol. If a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error.

To illustrate this, let's look at our own implementation of the standard C library. This library provides services that users are allowed to redefine and replace. At the same time, however, ANSI C defines standard services that must be present on the system and cannot be replaced in a strictly conforming program. fread, for example, is an ANSI C library function; the system function read is not. So a conforming program may redefine read and still use fread in a predictable way.

The problem with this is that read underlies the fread implementation in the standard C library. A program that redefines read could ``confuse'' the fread implementation. To guard against this, ANSI C states that an implementation cannot use a name that is not reserved to it. Therefore _read -- note the leading underscore -- is used to implement fread in the standard C library.

Now suppose that a program you have written calls read. If your program is going to work, the definition for read has to exist in the C library. It is identical to the definition for _read and is contained in the same object file.

Suppose further that another program you have written redefines read, and that this same program calls fread. Because you get our definitions of both _read and read when you use fread, we would expect the link editor to report the multiply defined symbol read as an error, and fail to create an executable program. To prevent that, we used the #pragma directive in our source code for the library as follows:

   #pragma weak read = _read

Because our read is defined as a weak symbol, your own definition of read will override the definition in the standard C library. You can use the #pragma directive in the same way in your own library code.

There's a second use for weak symbols that you ought to know about:

   #pragma weak read

tells the link editor not to complain if it does not find a definition for the weak symbol read. References to the symbol use the symbol value if defined, 0 otherwise. The link editor does not extract archive members to resolve undefined weak symbols. The mechanism is intended to be used primarily with functions. Although it will work for most data objects, it should not be used with uninitialized global data (``common'' symbols) or with shared library data objects that are exported to executables.