KCRASH PART 2 : VIRTUAL MEMORY SYSTEM



VM Architecture Overview

Our discussion of the Virtual Memory (VM) System will include such things as user and kernel virtual address space, file and memory mapping, and paging and swapping. As in the previous section, there are new macros to load from Appendix C as well as many diagrams which depict the structures we discuss.

The loadmacs file for this section will require that you obtain the following macros which you can find in Appendix C2.

anon.k
anon_map.k
pageall.k
procvm.k
segall.k
seg_ops.k
seglst.k
segvn_data.k
swapinfo.k
vm.k

Invoke kcrash on a crash dump :

# kcrash crash.mmdd sym.mmdd
S> rg panicregs
S> < /crash/macros/loadmacs

Now that we have a dump to analyze, we need a process to dissect. In our example, we use the current active process, denoted by "*practive", but you may choose any process; run the 'ps' macro to find the address of a different one. Once you have the address of the process, give it as an argument to the "procvm" macro, which shows the portions of the process structure that are related to VM.

Figure 1 (struct proc)

  S> procvm *practive 
  *p_seguslo    D1282228      /* process's segu slot address */
  *p_segu       F2206000      /* pointer to seguser structure */
  *p_as         D1741880      /* pointer to as structure */
  *p_trace      00000000      /* pointer to /proc vnode */
  *p_exec       D14377F8      /* pointer to a.out vnode */
  p_pri         00000041      /* scheduling priority */
  p_usize       0002          /* size of ublocks (*4096 bytes) */
u.u_psargs = "sleep 60 "
The process structure points to the address space structure, the seguser structure and the vnode structure as well as many others not related to VM. We will discuss the elements above in more detail in the subsection labeled User Virtual Address Space below. Right now, we just want to preview the structure of the VM system.

Since the memory address space of each process is defined by the elements of the as structure, we can review its relevant members by passing the *p_as value given by procvm to the as macro.

Figure 2 (struct as)

  S> as D1741880 
  as [D1741880]: keepcnt 000000 segs D172B560 seglast D172CBE0 sz C000 rss B
    hat: pts D1726960 ptlast D1726960 pdtp 00000000 cr3 00000000 ref -780881472
The segs entry is a pointer to a sorted, circular, doubly linked list of segment structures, ordered by ascending base virtual address (s_base). The pointer seglast points to a segment in this list; in SVR4-derived kernels it is a hint to the segment most recently looked up, so it need not be the highest-addressed segment (compare the examples later in this section). The total number of bytes used by the process is given by sz and the amount of memory claimed by the process is given in the rss field. Both sz and rss are reported in hex. The 'hat' members pts, ptlast and pdtp point into the HAT layer of the virtual memory system.
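
Because sz and rss are printed in hex, it is easy to misread them. The following is a minimal sketch, not part of kcrash, that pulls both fields out of the 'as' line from Figure 2 and converts them to decimal. The helper name parse_as_line and the 4096-byte page size are assumptions made for illustration.

```python
import re

PAGE_SIZE = 4096  # assumed 4 KB pages, as used elsewhere in this section

def parse_as_line(line):
    """Return (sz, rss) as integers from one line of 'as' macro output.

    Both fields are printed in hex by the macro.
    """
    sz = int(re.search(r"\bsz ([0-9A-Fa-f]+)", line).group(1), 16)
    rss = int(re.search(r"\brss ([0-9A-Fa-f]+)", line).group(1), 16)
    return sz, rss

line = "as [D1741880]: keepcnt 000000 segs D172B560 seglast D172CBE0 sz C000 rss B"
sz, rss = parse_as_line(line)
print("sz  =", sz, "bytes =", sz // PAGE_SIZE, "pages")
print("rss =", rss)
```

For the process in Figure 2, sz C000 is 49152 bytes, or 12 pages of 4096 bytes, and rss B is 11 decimal.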

The hardware address translation (HAT) layer of the VM system manages the address translation hardware, treating it as a cache that is driven by system calls and exception handlers at a much higher level in the VM system. We use the "hat: pts" value from the as structure as an argument to the hatpt macro.

Figure 3 (struct hatpt)

  S> hatpt D1163640
  D1163640: forw D153E460 back D153E460 next D153E460 prev D14A4320
   pde 7D5007 pdtep E0200080 as D153AF00 aec 1A2 locks 2 pgtp D153E280
   mcp[00000000 00000000 C2648500 C31B4400 C31B4480 C31B4500 C399BF00 C2648F80]
   mcp[C31B4380 C2648F00 C2648080 C2648C00 C1FEB080 C1FEB300 C1FEB780 C1FEB800]
...
  002 mapping  hat_mcpp  +offset hat_epmc  +offset    pte
     C2648500 C2648000 C2648028 C07D511C C07D5100 00000000
  003 mapping  hat_mcpp  +offset hat_epmc  +offset    pte
     C31B4400 C31B4000 C31B4020 C07D519F C07D5180 00639025
...
The forw, back, next and prev entries are all pointers to related hatpt structures. The 'pde' entry is the page directory entry (PDE) for the page table, pdtep is the page directory table (PDT) entry pointer, as is the pointer back to the containing address space structure, aec is the active entry count, and locks is the number of locked PTEs. The mcp entries form the mapping chunk pointer array. The lines that follow show, for each mapping, the hat_mcpp (HAT_MCPP) values, which are pointers to the page table chunks for the 31 mapping chunks in each page, and the hat_epmc (HAT_EPMC) values, which are pointers to the entries per mapping chunk. Finally, the pte value is the page table entry.
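
The pde and pte values in Figure 3 can be split into a page frame number and flag bits. The sketch below assumes the standard i386 entry layout with 4 KB pages (frame number in bits 12 and up, flags in the low bits); the document does not state this layout, and decode_entry is a hypothetical helper, not a kcrash macro.

```python
# Low flag bits of an i386 page directory / page table entry (assumed layout).
FLAG_BITS = [
    (0x001, "present"),
    (0x002, "writable"),
    (0x004, "user"),
    (0x020, "accessed"),
    (0x040, "dirty"),
]

def decode_entry(entry):
    """Split an entry into (page frame number, list of flag names that are set)."""
    frame = entry >> 12
    flags = [name for bit, name in FLAG_BITS if entry & bit]
    return frame, flags

for label, value in (("pde", 0x7D5007), ("pte", 0x00639025)):
    frame, flags = decode_entry(value)
    print(label, "frame", format(frame, "X"), "flags", ",".join(flags))
```

Under that assumption, the pde 7D5007 refers to page frame 7D5 and is present, writable and user accessible, while the pte 00639025 refers to page frame 639 and is present, user accessible and accessed.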

Not only does the address space of each process point us to the HAT layer, it also gives us pointers to the segments of the process. We pass to the segall macro the address given in the 'segs' element of as.

Figure 4 (struct seg)

  S> segall D172B560 
  s_lock     @D172B560      /* lock to prevent races */
  *s_base     08046000      /* base virtual address of segment */
  s_size      00002000      /* size in bytes of this segment */
  *s_as       D1741880      /* pointer back to the containing address space */
  *s_next     D15C7F20      /* pointer to next seg in this address space */
  *s_prev     D172CBE0      /* pointer to prev seg in this address space */
  *s_ops      D01ADDD4      /* pointer to segment operations structure */
  *s_data     D17780B4      /* pointer to segvn_data */
The segall macro only shows one segment of a process at a time. In the User Virtual Address Space discussion, we will use a different macro to display all the segments of a process and give a detailed account of process address space and segments. For now, we want to just use some of the pointers in the seg structure to preview other VM related segment structures.

Some of the segment operations are segvn_fault, segvn_dup, segvn_checkprot, and segvn_getvp. Actually, there are 17 segment operations which can be seen in the seg_ops structure. To see these, we use a generated macro called seg_ops and give it the s_ops address above as an argument:

Figure 5 (struct seg_ops)

  S>  seg_ops D01ADDD4
  *dup                  D0068AC0      /* duplicate the segment */
  *unmap                D0068CD0      /* used to unmap the segment */
  *free                 D0069250      /*unmaps and deletes all resources used*/
  *fault                D0069CB0      /* used by page fault routines */
  *faulta               D006A180      /* used by pre-fetch pages */
  *unload               D006A260      /* free hats associated with pages */
  *setprot              D006A350      /* set page protections */
  *checkprot            D006A670      /* display page protections */
  *kluster              D006A7F0      /* used by vm for pre-fetch */
  *swapout              D006A930      /* used to swap out pages */
  *sync                 D006AB00      /*write changed pages to map/swap file*/
  *incore               D006AD80      /* are pages in physical mem? */
  *lockop               D006AF70      /* lock used for segment pages */
  *getprot              D006A6F0      /* give page protections */
  *getoffset            D006A780      /* s_base */
  *gettype              D006A7A0      /* give page type */
  *getvp                D006A7C0      /* return vnode pointer */
All of the entries in seg_ops are pointers to segment operations. For example, *fault is used to handle a page fault which can be done by the segvn_fault routine:
  S> di D0069CB0
  segvn_fault:  55                            pushl  %ebp
Using the kcrash command 'di' on any of the seg_ops addresses displays the routine that would be called, which makes the seg_ops structure a valuable reference.

Within the segment structure, the other element relevant to our discussion on VM is the segvn_data structure. It is important at this time because it shows whether this segment of the process has any anonymous pages associated with it. We pass the *s_data value from segall as an argument.

Figure 6 (struct segvn_data)

  S> segvn_data D17780B4
  lock       @D17780B4      /* lock on segment pages */
  pageprot    00            /* true if per page protections present */
  prot        0F            /* current segment prot if pageprot==0 */
  maxprot     0F            /* max segment protections*/
  type        02            /* type of sharing done */
  *vp         00000000      /* vnode that segment is mapped to */
  offset      00000000      /* starting offset of vnode for mapping */
  anon_index  00000000      /* starting index into anon_map anon array */
  *amp        D171D2D8      /* pointer to anon_map */
  *vpage      00000000      /* per-page information, if needed */
  *cred       D1256C00      /* pointer to credential structure */
  swresv      00002000      /* amount of swap reserved for this segment */
For now, we are only interested in the *amp address because it leads us to the next VM structure, which has to do with anonymous pages. However, we will return to this structure when we dissect segments, at which time we will also discuss how vnodes relate to segments.

Unlike shared memory pages and stack pages, anonymous pages have no named file storage. Anonymous pages are associated with the swap device. There is an anon structure for each swap page on the system.

First we view the anon_map structure which uses the *amp pointer found in the segvn_data structure.

Figure 7 (struct anon_map)

  S>  anon_map D171D2D8  
  refcnt      00000001      /*reference count on this structure */ 
  size        00002000      /* size in bytes mapped by the anon array */      
  **anon      D16F7A10      /* pointer to an array of anon * pointers */
  swresv      00000000      /* swap space reserved for this anon_map */
  mutex      @D171D2E8      /* Multiprocessing lock for segment manipulation */
As explained in the comments, the size field is the size in bytes of the anonymous array and the swresv shows the amount of swap space reserved for this particular anon_map. For now, we want the pointer to the anon to use as an argument to the anon macro. Although one would think that we would use *D16F7A10 we do not.

Figure 8 (struct anon)

  S> anon D16F7A10
  an_refcnt             00000000      /* reference count */
  un_*an_page           D14CF820      /* union of page and anon */
  *an_bap               00000000      /* pointer to real anon */
  an_flag               0001          /* an_flag values */
  an_use                0000          /* used for debuggin */
So, the "un_*an_page" pointer is a union of two structures. The header file defines the union in this manner: (/usr/include/vm/anon.h)
  union {
         struct  page *an_page;  /* ``hint'' to the real page */
         struct  anon *an_next;  /* free list pointer */
        } un;
The an_flag values are defined in /usr/include/vm/anon.h as are the an_use values. Remember that the original pointer was **anon. So, we must run anon twice in order to get the pointer to the page structure:
  S> anon D14CF820
  an_refcnt             00000001
  un_*an_page           D10560E8
  *an_bap               00000000
  an_flag               0000
  an_use                0000
Now, we can use the "un_*an_page" value as an argument to page. This is the last structure in the VM system that we will discuss.

Figure 9 (struct page)

  S> page D10560E8
  page [D10560E8]: MOD  REF
  nio 0000, keepcnt 000000, vnode D14B4A04, offset 00582000
  next D10560E8, prev D10560E8, vpnext D11B8064, vpprev D102C4A4
  mapping C4F7051C, lckcnt 00000000, cowcnt 00000000
In order of appearance, the page macro shows:
  page [D10560E8]: MOD  REF       /* the bits that are on */
  nio 0000                        /* number of outstanding io reqs needed */
  keepcnt 000000                  /* number of page `keeps' */
  vnode D14B4A04                  /* logical vnode this page is from */
  offset 00582000                 /* offset into vnode for this page */
  next D10560E8                   /* next page in free/intrans lists */
  prev D10560E8                   /* prev page in free/intrans lists */
  vpnext D11B8064                 /* next page in vnode list */
  vpprev D102C4A4                 /* prev page in vnode list */
  mapping C4F7051C                /* page mappings from phat struct */ 
  lckcnt 00000000                 /* number of locks on page data */
  cowcnt 00000000                 /* number of copy on write locks */ 
This page has the modify and reference bits set. See /usr/include/vm/page.h 'struct page' for all the bits that can be set as well as for information on the phat structure which defines the page mappings. There is more information on page mappings in /usr/include/vm/vm_hat.h. In the subsection titled _Paging and Swapping_, we will discuss the anon, page and vnode structures at length.

Now we are ready to get down to the nitty gritty details of the virtual memory system. We will start with the user address space.


Figure 10: User Address Space (HAT)

User Virtual Address Space

As depicted above, in the user address space, executables are mapped starting with their text segment at virtual address 0x08048000. Following the text segment are the data segment and then the bss segment, if it exists. Each segment ends on a page boundary. The user stack starts at virtual address 0x08047000 and grows downward. Finally, shared objects (or libraries) and mapped objects are mapped starting at virtual address 0x80000000 and grow upward.

We can see all the segments of a given process and see how they are mapped and shared using kcrash. For the following example, I created an executable which includes several dynamic libraries but was also linked with two static libraries. After we look at this mixed executable, we will compare it to the completely dynamic executable.

Remember that static libraries are included in the size of the executable while dynamic libraries are allocated at runtime so we will only see dynamic libraries in the truss output.

The truss command is a valuable tool here because it lets us put a name to the different segments of the process. For example, running "truss -o stat pops" gives us the following information:

  execve("pops", 0x08047CA8, 0x08047CB0)  argc = 1
  open("/dev/zero", O_RDONLY, 01001076274)        = 3
  mmap(0x00000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x8003C000
...
  open("/usr/lib/libXm.so.1.2", O_RDONLY, 01001073564) = 5
  read(5, "7F E L F010101\0\0\0\0\0".., 308)      = 308
  mmap(0x00000000, 1473456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x8003E000
  mmap(0x8003E000, 1347808, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 5, 0) =
  0x8003E000
  mmap(0x80188000, 117724, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED
  ,5, 1347584) = 0x80188000
  mprotect(0x801A5000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
...
In kcrash, we can use the mmap addresses to determine which segments are associated with each library. The mappings shown by kcrash should match those of the truss output.

First, we run the 'ps' macro to obtain the address of the process we are interested in. The name of the executable in this case is "pops":

  S>  ps 
  ADDRESS  PID   PPID  UID   FLAGS    K U R WCHAN    ST  COMMAND
  D1396400 06053 00001 00000 00102010 - - - D02F34CC SLEEP /usr/lib/saf/ttymon 
  D13EFE00 06052 06042 00103 00502010 - - - D02F34CC SLEEP ./pops
  D13EEE00 06042 06041 00103 00102010 - - - D13EEE00 SLEEP -ksh
...
Then we run the 'procvm' macro to get the address of 'as' as explained in the overview:
  S>  procvm D13EFE00 
    *p_seguslo    D10EF968
    *p_segu       F0490000
    *p_as         D12D34E0
    *p_trace      00000000
    *p_exec       D1253E98
    p_pri         0000004E
    p_usize       0002
  u.u_psargs = "./pops "
Next we use the 'as' macro to obtain the address of the segment list:
  S>  as D12D34E0 
  as [D12D34E0]: keepcnt 000000 segs D14FCF80 seglast D13EFBE0 sz 30E000 rss 1CA
    hat: pts D14F98C0 ptlast D14F9500 pdtp 00000000 cr3 D01C88B0 ref 0
Finally, as promised, we get to see all the segments of the process by using the macro "seglst" and giving it the address in "segs" above.
  S> seglst D14FCF80 
  ADDRESS    DATA     BASE   NPGS MAP PROT  VNODE    OFFSET  ANON_MAP  SWPRESV
  D14FCF80 D14FC920 08041000 0007  02  0F  00000000 FFFFB000 D1531000  00007000
  D14FCF60 D14FC8FC 08048000 0055  02  0D  D1253E98 00000000 00000000  00000000
  D13EF900 D13EFD68 0807F000 0005  02  0F  D1253E98 00036000 D13EB188  00005000
  D1337920 D1337CFC 08084000 0048  02  0F  00000000 00000000 D1531150  00030000
  D13EFBC0 D1322290 80000000 0056  02  0D  D122B4E8 00000000 00000000  00000000
  D13EFBA0 D132226C 80038000 0002  02  0F  D122B4E8 00038000 D14FD3B8  00002000
  D13EFB80 D1322248 8003A000 0001  02  0F  00000000 00000000 D15310A8  00001000
  D152F040 D1322320 8003C000 0001  02  0B  00000000 00000000 D13DBAD8  00001000
  D13EFBE0 D13222B4 8003E000 0330  02  0D  D1241518 00000000 00000000  00000000
  D152F0E0 D13223D4 80188000 0029  02  0F  D1241518 00149000 D13DB8E0  0001D000
  D152F020 D13222FC 801A5000 0001  02  0F  00000000 00000000 D15310E0  00001000
  D152F100 D13EBE00 801A7000 0095  02  0D  D124CCD8 00000000 00000000  00000000
  D1337800 D14FC9B0 80206000 0003  02  0F  D124CCD8 0005E000 D1531038  00003000
  D152F140 D13EBE48 8020A000 0049  02  0D  D1236CA8 00000000 00000000  00000000
  D152F160 D13EBE6C 8023B000 0005  02  0F  D1236CA8 00030000 D15311C0  00005000
  D152F120 D13EBE24 80240000 0004  02  0F  00000000 00000000 D15311F8  00004000
  D152F1A0 D13EBEB4 80245000 0062  02  0D  D1240A28 00000000 00000000  00000000
  D152F000 D13222D8 80283000 0004  02  0F  D1240A28 0003D000 D1531188  00004000
  D152F180 D13EBE90 80287000 0001  02  0F  00000000 00000000 00000000  00001000
  D152F080 D1322368 80289000 0016  02  0D  D123D148 00000000 00000000  00000000
  D152F060 D1322344 80299000 0001  02  0F  D123D148 0000F000 D1531118  00001000
  D152F0C0 D13223B0 8029A000 0002  02  0F  00000000 00000000 00000000  00002000
  D152F1C0 D13EBED8 8029D000 0004  02  0D  D1221308 00000000 00000000  00000000
  D152F0A0 D132238C 802A1000 0001  02  0F  D1221308 00003000 D1531230  00001000
The ADDRESS column is the address of the individual segment structure. DATA is a pointer to the segvn_data structure. BASE is the base virtual address of the segment, which matches the address given by truss. NPGS gives the number of pages in the segment, in decimal. If the NPGS values of all segments are added together and multiplied by 4096, the result should equal the 'sz' value from the as macro. MAP and PROT are the mapping type and protections, which we will discuss next. Then come the vnode of the segment, the offset, the pointer to the anon_map and, lastly, the amount of swap that has been reserved for the segment.
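
The relationship between the NPGS column and the 'sz' value can be checked by hand or with a few lines of code. This sketch copies the NPGS values from the seglst output above (read as decimal) and compares their total against sz 30E000 from the as macro; the variable names are illustrative only.

```python
PAGE_SIZE = 4096

# NPGS column of the seglst output for ./pops, top to bottom, in decimal.
npgs = [7, 55, 5, 48, 56, 2, 1, 1, 330, 29, 1, 95,
        3, 49, 5, 4, 62, 4, 1, 16, 1, 2, 4, 1]

sz_from_as = 0x30E000  # 'sz' reported by "as D12D34E0"

total_pages = sum(npgs)
total_bytes = total_pages * PAGE_SIZE
print(total_pages, "pages =", format(total_bytes, "X"), "bytes")
print("matches sz:", total_bytes == sz_from_as)
```

The 24 segments add up to 782 pages, which is 30E000 bytes, so the segment list accounts for the whole address space size.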

The first segment is the user stack. We can see this by the base address 08041000. The user stack is using 7 pages. A "MAP" value of 02 means that the segment is MAP_PRIVATE and the 0F indicates that the segment has user, executable, write and read (UEWR) permissions. There is also swap associated with this segment.

See the mman.h include file for all the mappings and protections. The mapping will either be shared or private.
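
The PROT and MAP columns are easier to read once decoded. The following sketch uses the protection bit values given in Table 1 of this section and the usual mman.h mapping values (1 for MAP_SHARED, 2 for MAP_PRIVATE); prot_string and map_name are hypothetical helper names, not kcrash macros.

```python
# Protection bits as listed in Table 1, ordered to print as "UEWR".
PROT_BITS = [(0x8, "U"), (0x4, "E"), (0x2, "W"), (0x1, "R")]
MAP_TYPES = {0x1: "MAP_SHARED", 0x2: "MAP_PRIVATE"}

def prot_string(prot):
    """Render a PROT byte in the UEWR notation, with '-' for each cleared bit."""
    return "".join(letter if prot & bit else "-" for bit, letter in PROT_BITS)

def map_name(map_type):
    """Name the mapping type shown in the MAP column."""
    return MAP_TYPES.get(map_type, "unknown")

for prot in (0x0F, 0x0D, 0x0B):
    print(format(prot, "02X"), prot_string(prot))
print("02", map_name(0x02))
```

This reproduces the notation used in the text: 0F is UEWR, 0D is UE-R and 0B is U-WR.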

The next segment should be the text segment of the executable. We should see that this segment is shareable. The base address is 08048000, which is correct. There are 55 pages, the mapping is 02 or MAP_PRIVATE and the permissions are 0D or UE-R. No swap has been reserved. All of these add up to the fact that this text segment is shareable. If there were swap associated with the segment, it would not be shareable.

Following the text segment should be the data segment. We would not expect the data segment to be shareable. We find that the segment starting at base 0807F000 is MAP_PRIVATE, has UEWR permissions and has swap reserved. So, this must be the data segment.

The last segment of the executable should be the bss, if it exists. Since we have a segment starting at 08084000, it must be the bss. It has 48 pages, is mapped MAP_PRIVATE, has UEWR permissions and has swap associated with it. It is not shared.

The library starting at 80000000, which is not listed in the truss output shown, is the text segment of libc.so.1. It is followed by its data and bss segments. At 8003C000 is the run time link editor (rtld), which has permissions of 0B or U-WR. These four segments will always be mapped the same way.

Now, the next segment should be the text segment of libXm.so.1.2 which is the first library listed in the truss output. Looking at the truss information, we would expect it to have a beginning address of 0x8003E000. It does. Further analysis shows us that the segment is shared as we would expect and the two segments following appear to be the associated data and bss segments.
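
The correspondence between the truss output and the seglst output can be verified arithmetically: each mmap length, rounded up to whole pages, should give the NPGS value of the matching segment, and each segment should end where the next one begins. The sketch below uses only the libXm.so.1.2 numbers shown above; the helper name pages is an assumption for illustration.

```python
PAGE_SIZE = 4096

def pages(length):
    """Number of whole pages needed to hold 'length' bytes (rounds up)."""
    return -(-length // PAGE_SIZE)

# Values taken from the truss output for libXm.so.1.2.
text_base, text_len = 0x8003E000, 1347808
data_base, data_len = 0x80188000, 117724
bss_base = 0x801A5000  # address passed to mprotect()

text_pages = pages(text_len)
data_pages = pages(data_len)
print("text:", text_pages, "pages, ends at", format(text_base + text_pages * PAGE_SIZE, "X"))
print("data:", data_pages, "pages, ends at", format(data_base + data_pages * PAGE_SIZE, "X"))
```

The text mapping rounds up to 330 pages and the data mapping to 29 pages, matching the NPGS values at 8003E000 and 80188000, and each computed end address equals the base of the following segment.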

By again referring to the truss output, we would expect the next text segment to begin at 0x801A7000 and so on.

The mixed executable (the one linked with the two static libraries) has a total size of 30E000 or 782 pages (approx 3.2M). Now let us compare it to the completely dynamically linked executable.

Again, starting with 'truss -o dyn popd':

  execve("popd", 0x08047CA8, 0x08047CB0)  argc = 1
  open("/dev/zero", O_RDONLY, 01001076274)        = 3
  mmap(0x00000000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x8003C000
...
  open("/usr/lib/libXm.so.1.2", O_RDONLY, 01001073564) = 5
  read(5, "7F E L F010101\0\0\0\0\0".., 308)      = 308
  mmap(0x00000000, 1473456, PROT_READ, MAP_PRIVATE, 3, 0) = 0x8003E000
  mmap(0x8003E000, 1347808, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 5, 0) = 
  0x8003E000
  mmap(0x80188000, 117724, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED
  ,5, 1347584) = 0x80188000
  mprotect(0x801A5000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
...
And get all the necessary kcrash information:
  S>  ps 
  ADDRESS  PID   PPID  UID   FLAGS    K U R WCHAN    ST  COMMAND
  D14BEE00 08881 08880 00000 00102010 - - 0          ONPROC kcrash -k /dev/mem 
  D146A800 08880 08838 00000 00102010 - - - D146A800 SLEEP ksh
  D15C6800 08879 08808 00103 00502010 - - - D02F34CC SLEEP popd

  S> procvm D15C6800
    *p_seguslo    D10EDB54
    *p_segu       F0A54000
    *p_as         D15CDDE0
    *p_trace      00000000
    *p_exec       D12161D8
    p_pri         0000004E
    p_usize       0002
  u.u_psargs = "popd "
  S> as D15CDDE0
  as [D15CDDE0]: keepcnt 000000 segs D1551F60 seglast D15406E0 sz 2D3000 rss 24E
    hat: pts D159C320 ptlast D159C320 pdtp 00000000 cr3 D15CD5C0 ref -782443648

  S> seglst D1551F60
  ADDRESS    DATA     BASE   NPGS MAP PROT  VNODE    OFFSET  ANON_MAP  SWPRESV
  D1551F60 D1550A48 08042000 0006  02  0F  00000000 FFFFC000 D15529C0  00006000
  D15B1BC0 D15C0C24 08048000 0005  02  0D  D12161D8 00000000 00000000  00000000
  D15407C0 D1552224 0804D000 0001  02  0F  D12161D8 00004000 D1552B80  00001000
  D15406E0 D1552520 0804E000 0049  02  0F  00000000 00000000 D1551268  00031000
  D15406A0 D15524D8 80000000 0056  02  0D  D12363E8 00000000 00000000  00000000
  D15407E0 D1552248 80038000 0002  02  0F  D12363E8 00038000 D15C7348  00002000
  D137F2C0 D15602B4 8003A000 0001  02  0F  00000000 00000000 D1551000  00001000
  D1551C80 D15522FC 8003C000 0001  02  0B  00000000 00000000 D1551230  00001000
  D1540700 D1552544 8003E000 0330  02  0D  D121F8C8 00000000 00000000  00000000
  D137F220 D1560200 80188000 0029  02  0F  D121F8C8 00149000 D1579838  0001D000
  D15C0AE0 D15C7BB0 801A5000 0001  02  0F  00000000 00000000 D15B12A0  00001000
  D15C0920 D15C0DB0 801A7000 0062  02  0D  D1226628 00000000 00000000  00000000
  D155F0C0 D153DAFC 801E5000 0004  02  0F  D1226628 0003D000 D15513B8  00004000
  D1542CC0 D1542824 801E9000 0001  02  0F  00000000 00000000 D15C71C0  00001000
  D15B19A0 D15B1FB0 801EB000 0095  02  0D  D1251738 00000000 00000000  00000000
  D1551D60 D1551800 8024A000 0003  02  0F  D1251738 0005E000 D1551118  00003000
  D15C58A0 D15C6BB0 8024E000 0016  02  0D  D122BDA8 00000000 00000000  00000000
  D15C5880 D15C6B8C 8025E000 0001  02  0F  D122BDA8 0000F000 D1552AA0  00001000
  D15C0900 D15C0D8C 8025F000 0002  02  0F  00000000 00000000 D159DB48  00002000
  D1551C00 D155226C 80262000 0049  02  0D  D1247328 00000000 00000000  00000000
  D1551D80 D1551824 80293000 0005  02  0F  D1247328 00030000 D155B0A8  00005000
  D15C0B80 D15C6A6C 80298000 0004  02  0F  00000000 00000000 D15513F0  00004000

We can first see a difference by comparing the 'sz' value given in the output from the as macro: 2D3000, or 723 pages, are used by the second executable versus 782 pages for the first. The text segment of the dynamic executable (base address 08048000) uses only 5 pages versus 55, which is a savings of over 200000 bytes per execution.
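
The size comparison is simple arithmetic on the figures already shown. As a worked check, the sketch below converts both 'sz' values to pages and computes the text segment savings; the variable names are illustrative.

```python
PAGE_SIZE = 4096

mixed_sz = 0x30E000    # 'sz' for ./pops (dynamic plus two static libraries)
dynamic_sz = 0x2D3000  # 'sz' for popd (completely dynamic)

mixed_pages = mixed_sz // PAGE_SIZE
dynamic_pages = dynamic_sz // PAGE_SIZE

# Text segment at base 08048000: 55 pages in pops, 5 pages in popd.
text_savings = (55 - 5) * PAGE_SIZE

print("mixed:", mixed_pages, "pages; dynamic:", dynamic_pages, "pages")
print("text segment savings:", text_savings, "bytes")
```

The text segment alone shrinks by 50 pages, or 204800 bytes, and the whole address space shrinks by 59 pages.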

File and Memory Mapping

Access Privileges

There are two levels of access privileges when a virtual memory object is mapped into the virtual address space of a process. The first is access to the file itself, which is file system dependent and is initially established by open(2), i.e. the read, write and execute permissions. The second is the protection applied to the mapped pages.

Access privileges or protections to the mapped pages are chosen by "or-ing" together the following bits. Note that a write will not succeed unless PROT_WRITE has been set. If PROT_NONE has been set, then no access will be allowed.


  PROT_READ       0x1             /* pages can be read */
  PROT_WRITE      0x2             /* pages can be written */
  PROT_EXEC       0x4             /* pages can be executed */
  PROT_USER       0x8             /* pages are user accessible */
  PROT_ALL        (PROT_READ | PROT_WRITE | PROT_EXEC | PROT_USER)
  PROT_NONE       0x0             /* pages cannot be accessed */

Table 1 - File Access Privileges


Mapping Type

There are only two mapping types. Only one may be specified. MAP_SHARED allows changes to the virtual memory object while MAP_PRIVATE creates a private copy of the memory object (copy-on-write) and does not change the underlying virtual memory object.

A file with read access permissions may be specified as MAP_PRIVATE with PROT_WRITE, but write access permissions are necessary to declare an object as MAP_SHARED with PROT_WRITE.

Mappings are retained across a fork (see fork(2)).

Copy on Write

If a file is mapped using MAP_PRIVATE, a write to the mapped pages by either the parent or the child process causes a "copy on write fault" to be processed by the kernel. Only at that time is a copy of the original page created. This saves time, as the kernel copies only the pages that have actually been modified.
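
The effect of a private mapping can be observed from user space on any system that supports it. The following is a minimal sketch, not taken from the system described here, that uses Python's mmap module with ACCESS_COPY (a private, copy-on-write mapping) on a temporary file opened read-only. The function name and the file contents are made up for the example.

```python
import mmap
import os
import tempfile

def private_write_leaves_file_unchanged():
    """Write through a private mapping and return (mapped bytes, bytes on disk)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        os.close(fd)
        # Read access to the file is enough for a private writable mapping.
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
                m[0:8] = b"modified"      # triggers a private copy of the page
                in_memory = bytes(m[0:8])
        with open(path, "rb") as f:
            on_disk = f.read()
        return in_memory, on_disk
    finally:
        os.remove(path)

in_memory, on_disk = private_write_leaves_file_unchanged()
print("mapping sees:", in_memory)
print("file holds:  ", on_disk)
```

The mapping sees the modified bytes while the file on disk keeps its original contents, which is the behaviour described above for MAP_PRIVATE.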

Kernel Virtual Address Space

Note: This section describes the kernel address space for the 1.3 and 1.4 operating system versions. Versions 1.3.1 and 1.3.2 were implementation/hardware dependent and have been superseded by version 1.4.


kpioseg - This segment is used for physical, usually disk, i/o and allows a physical i/o buffer to be passed to a driver strategy routine to perform direct i/o to an address space. Currently, this is only used by the vxfs driver.

kpseg - This segment is used by the kernel to map physical addresses to virtual addresses.

kpseg2 - This was used to support up to 764MB of memory when we mapped virtual memory to physical memory 1-to-1. It is not used in 1.4.

ktextseg - This segment maps kernel text, data and bss.

kvseg - or sptmap. This is the segment used for dynamic kernel memory allocation, i.e. kmem_alloc() or sptalloc(). For more information on sptmap, please refer to the following documents: Kernel Tunable Parameters; SYSSEGSZ definition and usage.

segkmap - or kvsegmap. The segment used to implement the I/O page cache. The I/O page cache is used by the file system code.

segu - This is the user-block segment.

Figure 11.1.3 Kernel Virtual Address Space for 1.3

To see the kernel address space, first use:
  S> as kas
  as [D034B87C]: keepcnt 000000 segs D0349CB8 seglast D10DD800 sz 0 rss 627
  hat: pts D111F1E0 ptlast D111F1E0 pdtp 00000000 cr3 00000000 ref -787642496
Then, you will use the segn macro and give it the address of segs above:
  S> segn D0349CB8
    addr     base      end      size      as      data   physical
  D0349CB8 C0000000 CFFFFFFF 10000000 D034B87C C0000000 0000000
  D0312F0C D0010000 D03C24E7   3B24E8 D034B87C D0010000 0010000
  D0393EF4 D1000000 D2FFFFFF  2000000 D034B87C D1000000 0439000
  D02A9114 D5000000 D53FFFFF   400000 D034B87C D5000000
  D10DD800 D5400000 D5BFFFFF   800000 D034B87C D10D8FC0 274D000
  D039393C E0400000 F03FFFFF 10000000 D034B87C E0400000
  D10DD820 F0400000 FF7FFFFF  F400000 D034B87C D10D8F80
To find the names of the segments, we can use 'dl addr' like this:
  S> dl D0349CB8
  kpseg:  00000000 C0000000 10000000 D034B87C  ............|.4.
If we do this for each kas address segment, we get a table like:

              addr     base      end    
  kpseg:    D0349CB8 C0000000 CFFFFFFF 
  ktextseg: D0312F0C D0010000 D03991D7
  kvseg:    D0393EF4 D1000000 D2FFFFFF 
  kpioseg:  D02A9114 D5000000 D53FFFFF 
  segkmap:  D10DD800 D5400000 D5BFFFFF 
  kpseg2:   D039393C E0400000 F03FFFFF 
  segu:     D10DD820 F0400000 FF7FFFFF 

Table 2 - Kernel Address Space 1.3
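
A table like this makes it possible to say which kernel segment a given address belongs to. The sketch below hard-codes the ranges from Table 2 and looks up a few addresses that appear earlier in this section. It assumes those earlier dumps use the 1.3 layout, which the text does not state, and find_segment is a hypothetical helper.

```python
# (name, base, end) rows copied from Table 2 - Kernel Address Space 1.3.
KAS_13 = [
    ("kpseg",    0xC0000000, 0xCFFFFFFF),
    ("ktextseg", 0xD0010000, 0xD03991D7),
    ("kvseg",    0xD1000000, 0xD2FFFFFF),
    ("kpioseg",  0xD5000000, 0xD53FFFFF),
    ("segkmap",  0xD5400000, 0xD5BFFFFF),
    ("kpseg2",   0xE0400000, 0xF03FFFFF),
    ("segu",     0xF0400000, 0xFF7FFFFF),
]

def find_segment(addr, table=KAS_13):
    """Return the name of the kernel segment containing addr, or None."""
    for name, base, end in table:
        if base <= addr <= end:
            return name
    return None

print(find_segment(0xD0069CB0))  # address of segvn_fault in Figure 5
print(find_segment(0xD172B560))  # a seg structure from Figure 4
print(find_segment(0xF2206000))  # *p_segu from Figure 1
print(find_segment(0x08048000))  # a user address, outside the kernel segments
```

Under that assumption, the segvn_fault routine falls in ktextseg, the dynamically allocated seg structure falls in kvseg, and the seguser pointer falls in segu, while a user text address matches no kernel segment.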


Figure 11.1.4 Kernel Virtual Address Space for 1.4

For 1.4, the table will look like this:

              addr     base      end    
  kpseg:    C13C64E0 C0000000 C0FFFFFF 
  ktextseg: C137F670 C1010000 C14A0E37
  kpioseg:  C132F5A4 CA000000 CAFFFFFF
  segkmap:  D52BBC00 CB000000 CBFFFFFF
  segu:     D52BBC20 CC000000 D4FFFFFF
  kvseg:    C1433000 D5000000 FEBFFFFF

Table 3 - Kernel Address Space 1.4


Paging and Swapping

There are several whitepapers on our web page that discuss paging, swapping and memory usage so here, I would like to cover some macros which should be helpful in analyzing those areas.

The first is widely used: kmeminfo.

Figure 12 (struct kmeminfo)


  S> kmeminfo
     km_mem[0]    000C4000         /*small KMEM request index*/
     km_mem[1]    003F4000         /*large KMEM request index*/
     km_mem[2]    00000000         /*outsize KMEM request index*/
     km_alloc[0]  00098E20         /*amount of small KMEM allocated*/
     km_alloc[1]  00305400         /*amount of large KMEM allocated*/
     km_alloc[2]  0029D000         /*amount of outsize KMEM allocated*/
     km_fail[0]   00000000         /*number of small KMEM failures*/
     km_fail[1]   00000000         /*number of large KMEM failures*/
     km_fail[2]   00000000         /*number of outsize KMEM failures*/
The kmeminfo struct has three groups of three fields. The meanings are: km_mem contains the memory (in bytes) which is currently in an allocation pool, km_alloc is how much of that pool is currently allocated and km_fail indicates how many allocation requests from that pool have been rejected.

The fields in slot 0 (zero), apply to the KMEM_SMALL pool (under 256 byte allocations), slot 1 applies to KMEM_LARGE (up to 16384 bytes) and slot 2 is for KMEM_OSIZE (anything larger than 16384). For oversize allocations, there is no pool. Requests are rounded up to a page boundary and the page allocator is called directly. This means km_mem[2] must always be zero. If not, the kmeminfo struct must be corrupt.
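
The rules above translate directly into a small sanity check. This sketch takes the values from Figure 12, reports how much of each pool is allocated, and flags the corruption case where km_mem[2] is nonzero; the function name pool_report is an assumption for illustration.

```python
POOLS = ["KMEM_SMALL", "KMEM_LARGE", "KMEM_OSIZE"]

# Values copied from the kmeminfo output in Figure 12 (all hex).
km_mem = [0x000C4000, 0x003F4000, 0x00000000]
km_alloc = [0x00098E20, 0x00305400, 0x0029D000]
km_fail = [0x00000000, 0x00000000, 0x00000000]

def pool_report(km_mem, km_alloc, km_fail):
    """Return (utilization percentages for the two pools, list of warnings)."""
    warnings = []
    if km_mem[2] != 0:
        warnings.append("km_mem[2] is nonzero: kmeminfo may be corrupt")
    for i, name in enumerate(POOLS):
        if km_fail[i]:
            warnings.append("%s: %d failed requests" % (name, km_fail[i]))
    # Oversize requests bypass the pools, so only slots 0 and 1 have a ratio.
    util = [100.0 * km_alloc[i] / km_mem[i] for i in (0, 1)]
    return util, warnings

util, warnings = pool_report(km_mem, km_alloc, km_fail)
for name, pct in zip(POOLS, util):
    print("%s: %.1f%% of pool allocated" % (name, pct))
print("warnings:", warnings)
```

For this dump both pools are roughly three quarters allocated, no requests have failed, and km_mem[2] is zero as expected.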

The swapinfo structure is useful for viewing how much swap has been configured on the system, finding the actual names of the swap device(s), and determining how much swap is being used.

Figure 13 (struct swapinfo)


  S> swapinfo
  vnode *si_vp         D12BE604       /* vnode for this swap device */
  vnode *si_svp        D12BE704       /* svnode for this swap device */
  uint    si_soff      00000000       /* starting offset (bytes) of file */
  uint    si_eoff      02FFD000       /* ending offset (bytes) of file */
  anon *si_anon        D12D0000       /* pointer to anon array */
  anon *si_eanon       D12FFFC0       /* pointer to end of anon array */
  anon *si_free        D12D65D0       /* anon free list for this vp */
  int     si_allocs    0000000030     /* # of conseq. allocs from this area */
  swapinfo *si_next    D1331340       /* next swap area */
  short   si_flags     0000           /* deletion flags */
  ulong   si_npgs      0000012285     /* number of pages of swap space */
  ulong   si_nfpgs     0000010747     /* number of free pages of swap space */
  char    *si_pname    /dev/swap
The si_vp pointer allows one to use the vnode macro to obtain more information about the swap device, and the anon pointers allow one to view the anon information for swap. See the /usr/include/sys/swap.h file for the flag definitions listed under ste_flags. The number of pages is given in decimal, in 4K pages. The macro prints all configured swap devices; in this case there is a second swap device (note the non-null si_next), but its information is not included here.
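
Swap usage for a device follows from the page counts in Figure 13. As a worked check, the sketch below computes the pages in use and confirms that the byte offsets agree with si_npgs; the variable names mirror the structure fields, and the 4096-byte page size comes from the "4K pages" statement above.

```python
PAGE_SIZE = 4096

# Values copied from the swapinfo output in Figure 13.
si_soff = 0x00000000   # starting offset in bytes (hex in the output)
si_eoff = 0x02FFD000   # ending offset in bytes (hex in the output)
si_npgs = 12285        # total pages (decimal in the output)
si_nfpgs = 10747       # free pages (decimal in the output)

used_pages = si_npgs - si_nfpgs
used_bytes = used_pages * PAGE_SIZE
percent_used = 100.0 * used_pages / si_npgs
area_pages = (si_eoff - si_soff) // PAGE_SIZE

print("used: %d pages (%d bytes), %.1f%% of /dev/swap" % (used_pages, used_bytes, percent_used))
print("offsets cover %d pages" % area_pages)
```

For /dev/swap this gives 1538 pages in use, and the span from si_soff to si_eoff covers exactly the 12285 pages reported in si_npgs.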

Related macros minfo, mpinfo, sysinfo and vminfo are documented in the Alphabetic Index of Macros.