
(gawk.info) Filetrans Function

 
 Noting Data File Boundaries
 ===========================
 
    The `BEGIN' and `END' rules are each executed exactly once, at the
 beginning and end, respectively, of your `awk' program (see the section
 "The `BEGIN' and `END' Special Patterns").  We (the `gawk' authors) once
 had a user who mistakenly thought that the `BEGIN' rule was executed at
 the beginning of each data file and the `END' rule was executed at the
 end of each data file.  When informed that this was not the case, the
 user requested that we add new special patterns to `gawk', named
 `BEGIN_FILE' and `END_FILE', that would have the desired behavior.  He
 even supplied us the code to do so.
 
    However, after a little thought, I came up with the following
 library program.  It arranges to call two user-supplied functions,
 `beginfile' and `endfile', at the beginning and end of each data file.
 Besides solving the problem in only nine(!) lines of code, it does so
 _portably_; this will work with any implementation of `awk'.
 
      # transfile.awk
      #
      # Give the user a hook for filename transitions
      #
      # The user must supply functions beginfile() and endfile()
      # that each take the name of the file being started or
      # finished, respectively.
      #
      # Arnold Robbins, arnold@gnu.org, January 1992
      # Public Domain
      
      FILENAME != _oldfilename \
      {
          if (_oldfilename != "")
              endfile(_oldfilename)
          _oldfilename = FILENAME
          beginfile(FILENAME)
      }
      
      END   { endfile(FILENAME) }
 
    This file must be loaded before the user's "main" program, so that
 the rule it supplies will be executed first.
 
    This rule relies on `awk''s `FILENAME' variable, which automatically
 changes for each new data file.  The current file name is saved in a
 private variable, `_oldfilename'.  If `FILENAME' does not equal
 `_oldfilename', then a new data file is being processed, and it is
 necessary to call `endfile' for the old file.  Since `endfile' should
 only be called if a file has been processed, the program first checks
 to make sure that `_oldfilename' is not the null string.  The program
 then assigns the current file name to `_oldfilename', and calls
 `beginfile' for the file.  Since, like all `awk' variables,
 `_oldfilename' will be initialized to the null string, this rule
 executes correctly even for the first data file.
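 
    As an illustration of the functions the user must supply, here is a
 minimal sketch of a "main" program.  The file name `main.awk', the
 variable `nlines', and the output format are assumptions made for this
 example; they are not part of the library.

```awk
# main.awk --- hypothetical main program, loaded after transfile.awk
#
# Prints a banner at the start of each data file and a line count
# at the end of each one.

function beginfile(name)
{
    nlines = 0                      # reset the per-file counter
    printf "== begin %s ==\n", name
}

function endfile(name)
{
    printf "== end %s (%d lines) ==\n", name, nlines
}

# Runs for every record.  Because the library rule comes first,
# beginfile() has already reset nlines when a new file starts.
{ nlines++ }
```

    It could be run as `awk -f transfile.awk -f main.awk file1 file2',
 where `file1' and `file2' stand for any data files.  Note that the
 library rule only fires when a record is read, so neither function is
 called for an empty data file.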
 
    The program also supplies an `END' rule, to do the final processing
 for the last file.  Since this `END' rule comes before any `END' rules
 supplied in the "main" program, `endfile' will be called first.  Once
 again the value of multiple `BEGIN' and `END' rules should be clear.
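 
    The ordering that this relies on can be seen in isolation with the
 following small sketch (the messages are made up for the example).
 `awk' runs multiple `END' rules in the order in which they appear in
 the program text, so a rule from a file loaded earlier runs first.

```awk
# Stand-in for the END rule supplied by the library file
END { print "library END rule" }

# Stand-in for an END rule in the user's main program
END { print "main program END rule" }
```

    Run with no input data (for example, with `/dev/null' as the only
 data file), this prints the two messages in the order shown.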
 
    This version has the same problem as the first version of `nextfile'
 (see the section "Implementing `nextfile' as a Function").  If the same
 data file occurs twice in a row on the command line, then `endfile' and
 `beginfile' will not be executed at the end of the first pass and at the
 beginning of the second pass.  The following version solves the problem
 by testing `FNR', which is reset to one at the start of each data file.
 
      # ftrans.awk --- handle data file transitions
      #
      # user supplies beginfile() and endfile() functions
      #
      # Arnold Robbins, arnold@gnu.org, November 1992
      # Public Domain
      
      FNR == 1 {
          if (_filename_ != "")
              endfile(_filename_)
          _filename_ = FILENAME
          beginfile(FILENAME)
      }
      
      END  { endfile(_filename_) }
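 
    One way to observe the difference between the two versions is a main
 program that merely counts calls to `beginfile'.  The following is a
 sketch; the file name `count.awk' and the variable `nfiles' are
 hypothetical names chosen for this example.

```awk
# count.awk --- hypothetical main program; load after the library file
#
# Counts how many times the library announces a new data file.

function beginfile(name)
{
    nfiles++
}

function endfile(name)
{
    # nothing to do at the end of a file in this example
}

END { printf "%d file(s) processed\n", nfiles }
```

    Given a non-empty file `data', running `awk -f ftrans.awk -f
 count.awk data data' should report two files, since `FNR == 1' is true
 at the start of each pass.  With `transfile.awk' in place of
 `ftrans.awk' it should report only one, because `FILENAME' does not
 change between the two passes.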
 
    In the section "Counting Things" (the `wc' program), you will see how
 this library function can be used, and how it simplifies writing the main
 program.
 