(gawk.info) Extract Program

Info Catalog (gawk.info) History Sorting (gawk.info) Miscellaneous Programs (gawk.info) Simple Sed
 
 Extracting Programs from Texinfo Source Files
 ---------------------------------------------
 
    The nodes *Note A Library of `awk' Functions: Library Functions, and
 *Note Practical `awk' Programs: Sample Programs, are the top-level
 nodes for a large number of `awk' programs.  If you wish to experiment
 with these programs, it is tedious to have to type them in by hand.
 Here we present a program that can extract parts of a Texinfo input
 file into separate files.
 
    This Info file is written in Texinfo, the GNU project's document
 formatting language.  A single Texinfo source file can be used to
 produce both printed and on-line documentation.  The Texinfo language
 is described fully, starting with *Note Introduction: (texi)Top.
 
    For our purposes, it is enough to know three things about Texinfo
 input files.
 
    * The "at" symbol, `@', is special in Texinfo, much like `\' in C or
      `awk'.  Literal `@' symbols are represented in Texinfo source
      files as `@@'.
 
    * Comments start with either `@c' or `@comment'.  The file
      extraction program will work by using special comments that start
      at the beginning of a line.
 
    * Example text that should not be split across a page boundary is
      bracketed between lines containing `@group' and `@end group'
      commands.
 
    The following program, `extract.awk', reads through a Texinfo source
 file, and does two things, based on the special comments.  Upon seeing
 `@c system ...', it runs a command, by extracting the command text from
 the control line and passing it on to the `system' function (*note
 Built-in Functions for Input/Output: I/O Functions.).  Upon seeing `@c
 file FILENAME', each subsequent line is sent to the file FILENAME,
 until `@c endfile' is encountered.  The rules in `extract.awk' will
 match either `@c' or `@comment' by letting the `omment' part be
 optional.  Lines containing `@group' and `@end group' are simply
 removed.  `extract.awk' uses the `join' library function (*note Merging
 an Array Into a String: Join Function.).
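 
    As a small illustration of the optional `omment' part, the following
 sketch tests a few sample lines against the same kind of pattern.  The
 function name `is_file_directive' and the sample lines are made up for
 this illustration; they are not part of `extract.awk'.

```awk
# Hypothetical helper, not part of extract.awk: returns 1 if the
# line starts a `file' directive, written as either @c or @comment.
function is_file_directive(line)
{
    return (line ~ /^@c(omment)?[ \t]+file/)
}

BEGIN {
    print is_file_directive("@c file examples/messages.awk")
    print is_file_directive("@comment file examples/messages.awk")
    print is_file_directive("@code{file} is not a directive")
    print is_file_directive("  @c file indented.awk")
}
```

 Run with no input, this prints `1', `1', `0', and `0', one per line.
 The last case fails because the pattern is anchored at the beginning
 of the line.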
 
    The example programs in the on-line Texinfo source for `Effective
 AWK Programming' (`gawk.texi') have all been bracketed inside `file'
 and `endfile' lines.  The `gawk' distribution uses a copy of
 `extract.awk' to extract the sample programs and install many of them
 in a standard directory, where `gawk' can find them.  The Texinfo file
 looks something like this:
 
      ...
      This program has a @code{BEGIN} block,
      which prints a nice message:
      
      @example
      @c file examples/messages.awk
      BEGIN @{ print "Don't panic!" @}
      @c endfile
      @end example
      
      It also prints some final advice:
      
      @example
      @c file examples/messages.awk
      END @{ print "Always avoid bored archeologists!" @}
      @c endfile
      @end example
      ...
 
    `extract.awk' begins by setting `IGNORECASE' to one, so that mixed
 upper-case and lower-case letters in the directives won't matter.
 
    The first rule handles calling `system', checking that a command was
 given (`NF' is at least three), and also checking that the command
 exited with a zero exit status, signifying OK.
 
      # extract.awk --- extract files and run programs
      #                 from texinfo files
      # Arnold Robbins, arnold@gnu.org, Public Domain, May 1993
      
      BEGIN    { IGNORECASE = 1 }
      
      /^@c(omment)?[ \t]+system/    \
      {
          if (NF < 3) {
              e = (FILENAME ":" FNR)
              e = (e  ": badly formed `system' line")
              print e > "/dev/stderr"
              next
          }
          $1 = ""
          $2 = ""
          stat = system($0)
          if (stat != 0) {
              e = (FILENAME ":" FNR)
              e = (e ": warning: system returned " stat)
              print e > "/dev/stderr"
          }
      }
 
 The variable `e' is used so that the rule fits nicely on the screen.
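 
    The effect of clearing `$1' and `$2' can be seen in isolation.
 Assigning to a field makes `awk' rebuild `$0' from the fields, joined
 by `OFS', so the command text keeps two leading spaces, and runs of
 whitespace inside it collapse to single spaces.  The helper name
 `command_text' below is hypothetical and is not part of `extract.awk'.

```awk
# Hypothetical helper, not part of extract.awk: returns the text
# that a `system' directive line would hand to system().
# Note that it overwrites $0 and the fields as a side effect.
function command_text(line)
{
    $0 = line       # resplit the line into fields
    $1 = ""         # blank out "@c" or "@comment"
    $2 = ""         # blank out "system"
    return $0       # $0 has been rebuilt from the fields
}

BEGIN {
    cmd = command_text("@c system mkdir   examples")
    printf("[%s]\n", cmd)
}
```

 With the default `FS' and `OFS', this prints `[  mkdir examples]'.
 The leading blanks are harmless when the text is run as a shell
 command.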
 
    The second rule handles moving data into files.  It verifies that a
 file name was given in the directive.  If the file named is not the
 current file, then the current file is closed.  Keeping the current
 file open until a new file name is encountered lets several `file'
 sections for the same file be written with the `>' redirection, as
 described below.
 
    The `for' loop does the work.  It reads lines using `getline' (*note
 Explicit Input with `getline': Getline.).  For an unexpected end of
 file, it calls the `unexpected_eof' function.  If the line is an
 "endfile" line, then it breaks out of the loop.  If the line is an
 `@group' or `@end group' line, then it ignores it, and goes on to the
 next line.  (These Texinfo control lines keep blocks of code together
 on one page; unfortunately, TeX isn't always smart enough to do things
 exactly right, and we have to give it some advice.)
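 
    The order of the tests inside the loop can be sketched as a small
 classifier.  The function name `classify' and its return strings are
 assumptions made for this illustration; `extract.awk' itself uses
 `break' and `continue' instead of returning a label.

```awk
# Hypothetical helper, not part of extract.awk: labels a line the
# way the loop treats it.
function classify(line)
{
    if (line ~ /^@c(omment)?[ \t]+endfile/)
        return "endfile"    # the loop would break here
    if (line ~ /^@(end[ \t]+)?group/)
        return "group"      # the loop would skip this line
    return "text"           # the loop would write this line out
}

BEGIN {
    print classify("@c endfile")
    print classify("@end group")
    print classify("@group")
    print classify("END @{ print 1 @}")
}
```

 This prints `endfile', `group', `group', and `text', one per line.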
 
    Most of the work is in the following few lines.  If the line has no
 `@' symbols, it can be printed directly.  Otherwise, each leading `@'
 must be stripped off.
 
    To remove the `@' symbols, the line is split into separate elements
 of the array `a', using the `split' function (*note Built-in Functions
 for String Manipulation: String Functions.).  Each element of `a' that
 is empty indicates two successive `@' symbols in the original line.
 For each two empty elements (`@@' in the original file), we have to add
 back in a single `@' symbol.
 
    When the processing of the array is finished, `join' is called with
 the value of `SUBSEP' as the separator.  `join' treats that value
 specially and rejoins the pieces into a single line with nothing
 between them.  That line is then printed to the output file.
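 
    The `@' handling can be tried on its own with a sketch like the one
 below.  The function name `strip_at' is hypothetical.  The `join'
 function shown here is a stand-in written for this illustration, on
 the assumption that the library version treats `SUBSEP' the same way.

```awk
# Stand-in for the library function: joins array[start..end].
# A separator equal to SUBSEP means "no separator at all".
function join(array, start, end, sep,    result, i)
{
    if (sep == "")
        sep = " "
    else if (sep == SUBSEP)
        sep = ""
    result = array[start]
    for (i = start + 1; i <= end; i++)
        result = result sep array[i]
    return result
}

# Hypothetical helper: undo Texinfo's @ quoting on one line.
function strip_at(line,    a, n, i)
{
    if (index(line, "@") == 0)
        return line             # nothing to do
    n = split(line, a, "@")
    for (i = 2; i <= n; i++) {
        if (a[i] == "") {       # was an @@
            a[i] = "@"
            if (a[i+1] == "")
                i++
        }
    }
    return join(a, 1, n, SUBSEP)
}

BEGIN {
    print strip_at("BEGIN @{ print \"Don't panic!\" @}")
    print strip_at("mail arnold@@gnu.org")
}
```

 The first line comes out as `BEGIN { print "Don't panic!" }' and the
 second as `mail arnold@gnu.org'.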
 
      /^@c(omment)?[ \t]+file/    \
      {
          if (NF != 3) {
              e = (FILENAME ":" FNR ": badly formed `file' line")
              print e > "/dev/stderr"
              next
          }
          if ($3 != curfile) {
              if (curfile != "")
                  close(curfile)
              curfile = $3
          }
      
          for (;;) {
              if ((getline line) <= 0)
                  unexpected_eof()
              if (line ~ /^@c(omment)?[ \t]+endfile/)
                  break
              else if (line ~ /^@(end[ \t]+)?group/)
                  continue
              if (index(line, "@") == 0) {
                  print line > curfile
                  continue
              }
              n = split(line, a, "@")
              # if a[1] == "", means leading @,
              # don't add one back in.
              for (i = 2; i <= n; i++) {
                  if (a[i] == "") { # was an @@
                      a[i] = "@"
                      if (a[i+1] == "")
                          i++
                  }
              }
              print join(a, 1, n, SUBSEP) > curfile
          }
      }
 
    An important thing to note is the use of the `>' redirection.
 Output done with `>' only opens the file once; it stays open and
 subsequent output is appended to the file (*note Redirecting Output of
 `print' and `printf': Redirection.).  This allows us to easily mix
 program text and explanatory prose for the same sample source file (as
 has been done here!) without any hassle.  The file is only closed when
 a new data file name is encountered, or at the end of the input file.
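 
    A minimal sketch of this behavior follows.  The scratch file name
 `/tmp/extract-demo.txt' and the function name `count_lines' are
 assumptions made only for the illustration.

```awk
# Hypothetical helper: count the lines in a file.
function count_lines(file,    line, n)
{
    n = 0
    while ((getline line < file) > 0)
        n++
    close(file)
    return n
}

BEGIN {
    out = "/tmp/extract-demo.txt"   # hypothetical scratch file
    print "first line" > out        # opens (and truncates) the file
    print "second line" > out       # file is already open: appended
    close(out)                      # flush and close before reading
    print count_lines(out)
}
```

 This prints `2': the second `print' did not truncate the file again,
 because the file was still open from the first one.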
 
    Finally, the function `unexpected_eof' prints an appropriate error
 message and then exits.
 
    The `END' rule handles the final cleanup, closing the open file.
 
      function unexpected_eof()
      {
          printf("%s:%d: unexpected EOF or error\n", \
              FILENAME, FNR) > "/dev/stderr"
          exit 1
      }
      
      END {
          if (curfile != "")
              close(curfile)
      }
 