(gawk) Egrep Program

 Searching for Regular Expressions in Files
 ------------------------------------------
 
    The `egrep' utility searches files for patterns.  It uses regular
 expressions that are almost identical to those available in `awk'
 (see Regular Expression Constants).  It is used
 this way:
 
      egrep [ OPTIONS ] 'PATTERN' FILES ...
 
    The PATTERN is a regexp.  In typical usage, the regexp is quoted to
 prevent the shell from expanding any of the special characters as file
 name wildcards.  Normally, `egrep' prints the lines that matched.  If
 multiple file names are provided on the command line, each output line
 is preceded by the name of the file and a colon.
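
    As a quick illustration of that output format, the shell session
 below uses `grep -E', the POSIX spelling of `egrep', on two small
 files.  It is only a sketch: it assumes a POSIX shell with `mktemp',
 and the file names are invented for the example.

```shell
# Two small sample files in a scratch directory (names are invented).
dir=$(mktemp -d)
printf 'apple\nbanana\n' > "$dir/a.txt"
printf 'cherry\napricot\n' > "$dir/b.txt"

# One file: only the matching lines are printed.
one=$(cd "$dir" && grep -E '^ap' a.txt)
# Two files: each matching line is preceded by the file name and a colon.
two=$(cd "$dir" && grep -E '^ap' a.txt b.txt)

printf '%s\n' "$one"    # apple
printf '%s\n' "$two"    # a.txt:apple
                        # b.txt:apricot
rm -rf "$dir"
```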
 
    The options are:
 
 `-c'
      Print out a count of the lines that matched the pattern, instead
      of the lines themselves.
 
 `-s'
      Be silent.  No output is produced, and the exit value indicates
      whether or not the pattern was matched.
 
 `-v'
      Invert the sense of the test. `egrep' prints the lines that do
      _not_ match the pattern, and exits successfully if the pattern was
      not matched.
 
 `-i'
      Ignore case distinctions in both the pattern and the input data.
 
 `-l'
      Only print the names of the files that matched, not the lines that
      matched.
 
 `-e PATTERN'
      Use PATTERN as the regexp to match.  The purpose of the `-e'
      option is to allow patterns that start with a `-'.
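
    The options can be tried the same way.  The sketch below again uses
 `grep -E' as a stand-in for `egrep', with invented sample files.  One
 difference is worth flagging: in POSIX `grep', the "silent, exit
 status only" behavior described above for `-s' is spelled `-q', while
 `-s' only suppresses error messages about nonexistent or unreadable
 files.

```shell
# Sample data for trying the options (file names are invented).
dir=$(mktemp -d)
printf 'Error: disk full\nok\nerror: retry\n' > "$dir/log1"
printf 'all ok\n' > "$dir/log2"

# -c: count matching lines; matching is case-sensitive by default.
count=$(cd "$dir" && grep -E -c 'error' log1)          # 1
# -i: ignore case, so "Error" matches as well.
count_i=$(cd "$dir" && grep -E -i -c 'error' log1)     # 2
# -v: print the lines that do NOT match.
inverted=$(cd "$dir" && grep -E -v 'error' log1)       # Error: disk full, ok
# -l: print only the names of files that contain a match.
names=$(cd "$dir" && grep -E -l 'ok' log1 log2)        # log1, log2
# -e: protect a pattern that begins with "-".
dash=$(printf '%s\n' '-v flag' 'plain' | grep -E -e '-v')   # -v flag
# Exit status only: POSIX grep spells this -q.
(cd "$dir" && grep -E -q 'disk' log1); status=$?       # 0

rm -rf "$dir"
```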
 
    This version uses the `getopt' library function (see Processing
 Command Line Options) and the file transition library program (see
 Noting Data File Boundaries).
 
    The program begins with a descriptive comment and then a `BEGIN'
 rule that processes the command line arguments with `getopt'.  The `-i'
 (ignore case) option is particularly easy with `gawk'; we just use the
 `IGNORECASE' built-in variable (see Built-in Variables).
 
      # egrep.awk --- simulate egrep in awk
      # Arnold Robbins, arnold@gnu.org, Public Domain
      # May 1993
      
      # Options:
      #    -c    count of lines
      #    -s    silent - use exit value
      #    -v    invert test, success if no match
      #    -i    ignore case
      #    -l    print filenames only
      #    -e    argument is pattern
      
      BEGIN {
          while ((c = getopt(ARGC, ARGV, "ce:svil")) != -1) {
              if (c == "c")
                  count_only++
              else if (c == "s")
                  no_print++
              else if (c == "v")
                  invert++
              else if (c == "i")
                  IGNORECASE = 1
              else if (c == "l")
                  filenames_only++
              else if (c == "e")
                  pattern = Optarg
              else
                  usage()
          }
 
    Next comes the code that handles the `egrep'-specific behavior.  If
 no pattern was supplied with `-e', the first non-option argument on the
 command line is used.  The `awk' command line arguments up to, but not
 including, `ARGV[Optind]' are cleared, so that `awk' won't try to
 process them as files.  If no files were specified, the standard input
 is used, and if multiple files were specified, we make sure to note
 this so that the file names can precede the matched lines in the
 output.
 
    The last two lines are commented out, since they are not needed in
 `gawk'.  They should be uncommented if you have to use another version
 of `awk'.
 
          if (pattern == "")
              pattern = ARGV[Optind++]
      
          for (i = 1; i < Optind; i++)
              ARGV[i] = ""
          if (Optind >= ARGC) {
              ARGV[1] = "-"
              ARGC = 2
          } else if (ARGC - Optind > 1)
              do_filenames++
      
      #    if (IGNORECASE)
      #        pattern = tolower(pattern)
      }
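
    The argument shuffling above can be tried on its own with any POSIX
 `awk'.  The sketch below is a simplification, not the original
 program: it leaves out `getopt' (so `Optind' simply starts at 1 and the
 first argument is always the pattern), and `mini_egrep' is a
 hypothetical name.  It relies on two standard `awk' behaviors: empty
 `ARGV' elements are skipped, and a file name of `-' means standard
 input.

```shell
# Hypothetical helper: a cut-down egrep with no option parsing, so the
# first command-line argument is always the pattern (Optind starts at 1).
mini_egrep() {
    awk '
    BEGIN {
        Optind = 1
        pattern = ARGV[Optind++]      # first non-option argument
        for (i = 1; i < Optind; i++)  # hide consumed arguments from awk
            ARGV[i] = ""
        if (Optind >= ARGC) {         # no files named: read standard input
            ARGV[1] = "-"
            ARGC = 2
        } else if (ARGC - Optind > 1) # several files: prefix file names
            do_filenames++
    }
    $0 ~ pattern {
        if (do_filenames)
            print FILENAME ":" $0
        else
            print
    }' "$@"
}

dir=$(mktemp -d)
printf 'apple\nbanana\n' > "$dir/a"
printf 'apricot\n' > "$dir/b"
from_stdin=$(printf 'apple\nbanana\n' | mini_egrep '^ap')   # apple
one_file=$(cd "$dir" && mini_egrep 'an' a)                 # banana
two_files=$(cd "$dir" && mini_egrep '^ap' a b)             # a:apple, b:apricot
rm -rf "$dir"
```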
 
    The next set of lines should be uncommented if you are not using
 `gawk'.  This rule translates all the characters in the input line into
 lower-case if the `-i' option was specified.  The rule is commented out
 since it is not necessary with `gawk'.
 
      #{
      #    if (IGNORECASE)
      #        $0 = tolower($0)
      #}
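
    A self-contained version of the same idea, for an `awk' without
 `IGNORECASE', might look like the sketch below (`icase_egrep' is a
 hypothetical name).  It differs from the commented-out rule in one
 respect: because that rule assigns to `$0', a non-`gawk' run with `-i'
 would print matching lines in lowercase; here only a copy of each line
 is folded, so the original text is printed.

```shell
# Hypothetical helper: case-insensitive matching for awks without IGNORECASE.
icase_egrep() {
    pat=$1
    shift
    awk -v pattern="$pat" '
    BEGIN { pattern = tolower(pattern) }  # fold the pattern once
    tolower($0) ~ pattern                 # no action: print the original $0
    ' "$@"
}

matched=$(printf 'Hello World\ngoodbye\n' | icase_egrep 'WORLD')   # Hello World
```

 Note that `-v' assignments process backslash escapes, so a pattern
 such as `a\.b' needs its backslash doubled when passed this way.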
 
    The `beginfile' function is called by the rule in `ftrans.awk' when
 each new file is processed.  In this case, it is very simple; all it
 does is initialize a variable `fcount' to zero. `fcount' tracks how
 many lines in the current file matched the pattern.
 
      function beginfile(junk)
      {
          fcount = 0
      }
 
    The `endfile' function is called after each file has been processed.
 It produces output only when the user wants a count of the number of
 lines that matched.  `no_print' will be true only if the exit status is
 desired, and `count_only' will be true if line counts are desired, so
 `egrep' prints line counts only if printing and counting are both
 enabled.  The output format must be adjusted depending upon the number
 of files to be processed.  Finally, `fcount' is added to `total', so
 that we know how many lines altogether matched the pattern.
 
      function endfile(file)
      {
          if (! no_print && count_only)
              if (do_filenames)
                  print file ":" fcount
              else
                  print fcount
      
          total += fcount
      }
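
    The calls to `beginfile' and `endfile' come from `ftrans.awk'.  To
 try the per-file counting without that library, the sketch below
 inlines a minimal transition rule in the same spirit (compare
 `FILENAME' with the name seen on the previous record); it is an
 illustration, not the library's text, and `count_matches' is a
 hypothetical name.  Because it is driven by records, a file with no
 lines never triggers either function.

```shell
# Hypothetical helper: per-file match counts in the style of `egrep -c'.
count_matches() {
    pat=$1
    shift
    awk -v pattern="$pat" '
    function beginfile(file) { fcount = 0 }
    function endfile(file) {
        print file ":" fcount
        total += fcount
    }
    FILENAME != _oldfilename {        # first record of a new file
        if (_oldfilename != "")
            endfile(_oldfilename)     # finish the previous file
        _oldfilename = FILENAME
        beginfile(FILENAME)
    }
    $0 ~ pattern { fcount++ }
    END {
        if (_oldfilename != "")
            endfile(_oldfilename)     # finish the last file
        print "total:" total
    }' "$@"
}

dir=$(mktemp -d)
printf 'apple\nbanana\n' > "$dir/a"
printf 'apricot\ncherry\navocado\n' > "$dir/b"
counts=$(cd "$dir" && count_matches '^a' a b)   # a:1, b:2, total:3
rm -rf "$dir"
```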
 
    This rule does most of the work of matching lines.  The variable
 `matches' will be true if the line matched the pattern.  If the user
 wants lines that did not match, the sense of `matches' is inverted
 using the `!' operator.  `fcount' is incremented by the value of
 `matches', which will be either one or zero, depending upon a
 successful or unsuccessful match.  If `matches' is false, the `next'
 statement just moves on to the next record.
 
    There are several optimizations for performance in the following few
 lines of code. If the user only wants exit status (`no_print' is true),
 and we don't have to count lines, then it is enough to know that one
 line in this file matched, and we can skip on to the next file with
 `nextfile'.  Along similar lines, if we are only printing file names,
 and we don't need to count lines, we can print the file name, and then
 skip to the next file with `nextfile'.
 
    Finally, each line is printed, with a leading filename and colon if
 necessary.
 
      {
          matches = ($0 ~ pattern)
          if (invert)
              matches = ! matches
      
          fcount += matches    # 1 or 0
      
          if (! matches)
              next
      
          if (no_print && ! count_only)
              nextfile
      
          if (filenames_only && ! count_only) {
              print FILENAME
              nextfile
          }
      
          if (do_filenames && ! count_only)
              print FILENAME ":" $0
          else if (! count_only)
              print
      }
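
    The numeric trick in this rule, where a comparison yields 1 or 0 and
 `!' flips it, can be seen in isolation in the following sketch.  It
 keeps only the matching, inversion, and counting steps, takes the `-v'
 setting as a second argument, and omits the `nextfile' shortcuts;
 `select_lines' is a hypothetical name.

```shell
# Hypothetical helper: the core of the matching rule, with -v as an argument.
select_lines() {
    awk -v pattern="$1" -v invert="$2" '
    {
        matches = ($0 ~ pattern)   # 1 if the line matches, 0 if not
        if (invert + 0)
            matches = ! matches    # -v: flip the sense of the test
        fcount += matches          # adds 1 or 0
        if (! matches)
            next                   # line not selected: skip it
        print
    }
    END { print "count=" (fcount + 0) }'
}

kept=$(printf 'apple\nbanana\ncherry\n' | select_lines 'an' 0)      # banana, count=1
flipped=$(printf 'apple\nbanana\ncherry\n' | select_lines 'an' 1)   # apple, cherry, count=2
```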
 
    The `END' rule takes care of producing the correct exit status.  If
 there were no matches, the exit status is one; otherwise, it is zero.
 
      END    \
      {
          if (total == 0)
              exit 1
          exit 0
      }
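
    Because the exit status follows this convention, the program
 composes with shell `if' statements just as `egrep' does.  The sketch
 below reduces the program to its counting and `END' logic (`any_match'
 is a hypothetical name) and shows the status as the shell sees it.

```shell
# Hypothetical helper: exit status 0 if some line matched, 1 otherwise.
any_match() {
    awk -v pattern="$1" '
    $0 ~ pattern { total++ }
    END {
        if (total == 0)
            exit 1
        exit 0
    }'
}

found=0
printf 'alpha\nbeta\n' | any_match 'bet' || found=$?       # stays 0
missing=0
printf 'alpha\nbeta\n' | any_match 'gamma' || missing=$?   # becomes 1

if printf 'alpha\nbeta\n' | any_match '^al'; then
    echo "matched"
fi
```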
 
    The `usage' function prints a usage message in case of invalid
 options and then exits.
 
      function usage(    e)
      {
          e = "Usage: egrep [-csvil] [-e pat] [files ...]"
          print e > "/dev/stderr"
          exit 1
      }
 
    The variable `e' is used so that the function fits nicely on the
 printed page.
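
    `gawk' treats the name `/dev/stderr' specially, but other `awk'
 implementations generally rely on the operating system to provide that
 file.  Where portability matters, a common alternative is to pipe the
 message through `cat 1>&2', as in the sketch below; `usage_demo' is a
 hypothetical wrapper used only to run `usage' from a `BEGIN' rule.

```shell
# Hypothetical demo: the usage() function with a portable route to stderr.
usage_demo() {
    awk '
    function usage(    e)
    {
        e = "Usage: egrep [-csvil] [-e pat] [files ...]"
        print e | "cat 1>&2"   # cat copies the message to standard error
        close("cat 1>&2")      # wait for cat before exiting
        exit 1
    }
    BEGIN { usage() }'
}

# Capture only standard error; standard output is discarded.
status=0
err=$(usage_demo 2>&1 >/dev/null) || status=$?
```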
 
    Just a note on programming style. You may have noticed that the `END'
 rule uses backslash continuation, with the open brace on a line by
 itself.  This is so that it more closely resembles the way functions
 are written.  Many of the examples use this style.  You can decide for
 yourself whether you like writing your `BEGIN' and `END' rules this way
 or not.
 