(gawk.info) Nextfile Function

Info Catalog (gawk.info) Portability Notes (gawk.info) Library Functions (gawk.info) Assert Function
 
 Implementing `nextfile' as a Function
 =====================================
 
    The `nextfile' statement, presented in *Note The `nextfile'
 Statement: Nextfile Statement, is a `gawk'-specific extension.  It is
 not available in other implementations of `awk'.  This section shows
 two versions of a `nextfile' function that you can use to simulate
 `gawk''s `nextfile' statement if you cannot use `gawk'.
 
    Here is a first attempt at writing a `nextfile' function.
 
      # nextfile --- skip remaining records in current file
      
      # this should be read in before the "main" awk program
      
      function nextfile()    { _abandon_ = FILENAME; next }
      
      _abandon_ == FILENAME  { next }
 
    This file should be included before the main program, because it
 supplies a rule that must be executed first.  This rule compares the
 current data file's name (which is always in the `FILENAME' variable)
 to a private variable named `_abandon_'.  If the file name matches,
 then the action part of the rule executes a `next' statement, to go on
 to the next record.  (The use of `_' in the variable name is a
 convention.  It is discussed more fully in *Note Naming Library
 Function Global Variables: Library Names.)
 
    The use of the `next' statement effectively creates a loop that reads
 all the records from the current data file.  Eventually, the end of the
 file is reached, and a new data file is opened, changing the value of
 `FILENAME'.  Once this happens, the comparison of `_abandon_' to
 `FILENAME' fails, and execution continues with the first rule of the
 "real" program.
 
    The `nextfile' function itself simply sets the value of `_abandon_'
 and then executes a `next' statement to start the loop going.(1)
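
    As a rough illustration of how this first version behaves, the
 sketch below combines the library code with a one-rule main program
 that prints the first record of each data file and then calls
 `nextfile'.  The `BEGIN' rule, the two scratch file names under
 `/tmp', and the `shown' counter are assumptions made only so that the
 demo runs without command-line arguments; they are not part of the
 library file.  It also assumes an `awk' that accepts `next' inside a
 function body.

```awk
# nextfile_demo.awk --- hypothetical demo of the first nextfile() version

function nextfile()    { _abandon_ = FILENAME; next }

BEGIN {
    # Assumption: two scratch data files with made-up names, created
    # here and queued as input so no command-line arguments are needed.
    a = "/tmp/nextfile_demo_a.txt"
    b = "/tmp/nextfile_demo_b.txt"
    print "a1\na2\na3" > a;  close(a)
    print "b1\nb2"     > b;  close(b)
    ARGV[1] = a;  ARGV[2] = b;  ARGC = 3
}

# Library rule: must appear before the rules of the main program.
_abandon_ == FILENAME  { next }

# Main program: report the first record of each file, skip the rest.
{
    print FILENAME ": " $0
    shown++              # demo-only counter of records reported
    nextfile()
}
```

    Run as `awk -f nextfile_demo.awk', this should print one line per
 file (the records `a1' and `b1', each prefixed by its file name),
 because every later record is consumed by the library rule.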
 
    This initial version has a subtle problem.  What happens if the same
 data file is listed _twice_ on the command line, one right after the
 other, or even with just a variable assignment between the two
 occurrences of the file name?
 
    In such a case, this code will skip right through the file a second
 time, even though it should stop when it gets to the end of the first
 occurrence.  Here is a second version of `nextfile' that remedies this
 problem.
 
      # nextfile --- skip remaining records in current file
      # correctly handle successive occurrences of the same file
      # Arnold Robbins, arnold@gnu.org, Public Domain
      # May, 1993
      
      # this should be read in before the "main" awk program
      
      function nextfile()   { _abandon_ = FILENAME; next }
      
      _abandon_ == FILENAME {
            if (FNR == 1)
                _abandon_ = ""
            else
                next
      }
 
    The `nextfile' function has not changed.  It sets `_abandon_' equal
 to the current file name and then executes a `next' statement.  The
 `next' statement reads the next record and increments `FNR', so `FNR'
 is guaranteed to have a value of at least two.  However, if `nextfile'
 is called for the last record in the file, then `awk' will close the
 current data file and move on to the next one.  Upon doing so,
 `FILENAME' will be set to the name of the new file, and `FNR' will be
 reset to one.  If this next file is the same as the previous one,
 `_abandon_' will still be equal to `FILENAME'.  However, `FNR' will be
 equal to one, telling us that this is a new occurrence of the file, and
 not the one we were reading when the `nextfile' function was executed.
 In that case, `_abandon_' is reset to the empty string, so that further
 executions of this rule will fail (until the next time that `nextfile'
 is called).
 
    If `FNR' is not one, then we are still in the original data file,
 and the program executes a `next' statement to skip through it.
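
    The case of one file named twice in a row can be tried out with a
 sketch like the following.  The scratch file name, the `BEGIN' rule
 that queues it twice, and the `pass' counter are assumptions for the
 demo only.  With the `FNR == 1' test in place, the main rule should
 fire once per occurrence of the file; with a library rule that only
 executes `next', the second occurrence would be skipped entirely.

```awk
# nextfile_dup.awk --- hypothetical demo: same data file listed twice

function nextfile()   { _abandon_ = FILENAME; next }

BEGIN {
    f = "/tmp/nextfile_demo_dup.txt"        # assumption: scratch file
    print "r1\nr2\nr3" > f;  close(f)
    ARGV[1] = f;  ARGV[2] = f;  ARGC = 3    # same file, twice in a row
}

# Library rule: FNR == 1 means a new occurrence of the file has begun.
_abandon_ == FILENAME {
    if (FNR == 1)
        _abandon_ = ""
    else
        next
}

# Main program: report the first record of each occurrence.
{
    pass++               # demo-only counter of occurrences seen
    print "pass " pass ": " $0
    nextfile()
}
```

    Run as `awk -f nextfile_dup.awk', this should print `pass 1: r1'
 followed by `pass 2: r1'.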
 
    A natural question to ask at this point is: "Given that the
 functionality of `nextfile' can be provided with a library file, why is
 it built into `gawk'?"  The question matters, because adding features
 for little reason leads to larger, slower programs that are harder to
 maintain.
 
    The answer is that building `nextfile' into `gawk' provides
 significant gains in efficiency.  If the `nextfile' function is executed
 at the beginning of a large data file, `awk' still has to scan the
 entire file, splitting it up into records, just to skip over it.  The
 built-in `nextfile' can simply close the file immediately and proceed
 to the next one, saving a lot of time.  This is particularly important
 in `awk', since `awk' programs are generally I/O bound (i.e., they
 spend most of their time doing input and output, instead of performing
 computations).
 
    ---------- Footnotes ----------
 
    (1) Some implementations of `awk' do not allow you to execute `next'
 from within a function body. Some other work-around will be necessary
 if you use such a version.
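
    One possible work-around, offered only as a sketch, is to let the
 function set the flag and have the calling rule execute `next' itself.
 The function name `mark_abandon', the scratch files, and the `shown'
 counter are hypothetical and exist only for this demo.

```awk
# Sets the flag only; the calling rule must execute `next' itself.
function mark_abandon()   { _abandon_ = FILENAME }   # hypothetical name

BEGIN {
    # Assumption: scratch data files with made-up names.
    a = "/tmp/nextfile_demo_a.txt"
    b = "/tmp/nextfile_demo_b.txt"
    print "a1\na2\na3" > a;  close(a)
    print "b1\nb2"     > b;  close(b)
    ARGV[1] = a;  ARGV[2] = b;  ARGC = 3
}

# Library rule, unchanged from the second version.
_abandon_ == FILENAME {
    if (FNR == 1)
        _abandon_ = ""
    else
        next
}

# Main program: report the first record of each file.
{
    print FILENAME ": " $0
    shown++              # demo-only counter of records reported
    mark_abandon()
    next                 # executed at rule level, not inside a function
}
```

    The cost of this approach is that every caller has to remember the
 trailing `next'; if it is omitted, the remaining rules still run for
 the current record.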
 