
(gawk) Getopt Function

 
 Processing Command Line Options
 ===============================
 
    Most utilities on POSIX compatible systems take options or
 "switches" on the command line that can be used to change the way a
 program behaves.  `awk' is an example of such a program (*note Command
 Line Options: Options.).  Often, options take "arguments", data that
 the program needs to correctly obey the command line option.  For
 example, `awk''s `-F' option requires a string to use as the field
 separator.  The first occurrence on the command line of either `--' or a
 string that does not begin with `-' ends the options.
 
    Most Unix systems provide a C function named `getopt' for processing
 command line arguments.  The programmer provides a string describing
 the one-letter options.  If an option requires an argument, its letter
 is followed in the string by a colon.  `getopt' is also passed the count
 and values of the command line arguments, and is called in a loop.
 `getopt' processes the command line arguments for option letters.  Each
 time around the loop, it returns a single character representing the
 next option letter that it found, or `?' if it found an invalid option.
 When it returns -1, there are no options left on the command line.
 
    When using `getopt', options that do not take arguments can be
 grouped together.  Furthermore, options that take arguments require
 that the argument be present.  The argument can immediately follow the
 option letter, or it can be a separate command line argument.
 
    Given a hypothetical program that takes three command line options,
 `-a', `-b', and `-c', and `-b' requires an argument, all of the
 following are valid ways of invoking the program:
 
      prog -a -b foo -c data1 data2 data3
      prog -ac -bfoo -- data1 data2 data3
      prog -acbfoo data1 data2 data3
 
    Notice that when the argument is grouped with its option, the rest of
 the command line argument is considered to be the option's argument.
 In the above example, `-acbfoo' indicates that all of the `-a', `-b',
 and `-c' options were supplied, and that `foo' is the argument to the
 `-b' option.
 
    `getopt' provides four external variables that the programmer can
 use.
 
 `optind'
      The index in the argument value array (`argv') where the first
      non-option command line argument can be found.
 
 `optarg'
      The string value of the argument to an option.
 
 `opterr'
      Usually `getopt' prints an error message when it finds an invalid
      option.  Setting `opterr' to zero disables this feature.  (An
      application might wish to print its own error message.)
 
 `optopt'
      The letter representing the command line option.  While not
      usually documented, most versions supply this variable.
 
    The following C fragment shows how `getopt' might process command
 line arguments for `awk'.
 
      int
      main(int argc, char *argv[])
      {
          ...
          /* print our own message */
          opterr = 0;
          while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
              switch (c) {
              case 'f':    /* file */
                  ...
                  break;
              case 'F':    /* field separator */
                  ...
                  break;
              case 'v':    /* variable assignment */
                  ...
                  break;
              case 'W':    /* extension */
                  ...
                  break;
              case '?':
              default:
                  usage();
                  break;
              }
          }
          ...
      }
 
    As a side point, `gawk' actually uses the GNU `getopt_long' function
 to process both normal and GNU-style long options (*note Command Line
 Options: Options.).
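
    For comparison, here is a minimal sketch of a `getopt_long' loop.
 It is not taken from the `gawk' sources: the option table is only
 illustrative, and the helper name `find_field_separator' is invented
 for this example.  `getopt_long' is a GNU extension declared in
 `<getopt.h>', so the sketch assumes a system that provides it.

```c
#include <stdio.h>    /* NULL */
#include <getopt.h>   /* getopt_long, struct option, optarg, optind */

/* Illustrative table pairing long option names with short letters. */
static const struct option long_opts[] = {
    { "file",            required_argument, NULL, 'f' },
    { "field-separator", required_argument, NULL, 'F' },
    { "assign",          required_argument, NULL, 'v' },
    { NULL, 0, NULL, 0 }   /* terminator required by getopt_long */
};

/*
 * Hypothetical helper: return the field separator requested on the
 * command line (via -F or --field-separator), or NULL if none was given.
 */
static const char *
find_field_separator(int argc, char *argv[])
{
    const char *fs = NULL;
    int ch;

    optind = 1;   /* restart the scan; assumes any earlier scan ran to completion */
    while ((ch = getopt_long(argc, argv, "f:F:v:", long_opts, NULL)) != -1) {
        if (ch == 'F')
            fs = optarg;   /* same code path for the short and long spelling */
    }
    return fs;
}
```

    Because each long name maps to the same letter as its short form,
 the `switch' or `if' that handles the options does not need to know
 which spelling the user typed.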
 
    The abstraction provided by `getopt' is very useful, and would be
 quite handy in `awk' programs as well.  Here is an `awk' version of
 `getopt'.  This function highlights one of the greatest weaknesses in
 `awk', which is that it is very poor at manipulating single characters.
 Repeated calls to `substr' are necessary for accessing individual
 characters (*note Built-in Functions for String Manipulation: String
 Functions.).
 
    The discussion walks through the code a bit at a time.
 
      # getopt --- do C library getopt(3) function in awk
      #
      # arnold@gnu.org
      # Public domain
      #
      # Initial version: March, 1991
      # Revised: May, 1993
      
      # External variables:
      #    Optind -- index of ARGV for first non-option argument
      #    Optarg -- string value of argument to current option
      #    Opterr -- if non-zero, print our own diagnostic
      #    Optopt -- current option letter
      
      # Returns
      #    -1     at end of options
      #    ?      for unrecognized option
      #    <c>    a character representing the current option
      
      # Private Data
      #    _opti  index in multi-flag option, e.g., -abc
 
    The function starts out with some documentation: who wrote the code,
 and when it was revised, followed by a list of the global variables it
 uses, what the return values are and what they mean, and any global
 variables that are "private" to this library function.  Such
 documentation is essential for any program, and particularly for
 library functions.
 
      function getopt(argc, argv, options,    optl, thisopt, i)
      {
          optl = length(options)
          if (optl == 0)        # no options given
              return -1
      
          if (argv[Optind] == "--") {  # all done
              Optind++
              _opti = 0
              return -1
          } else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) {
              _opti = 0
              return -1
          }
 
    The function first checks that it was indeed called with a string of
 options (the `options' parameter).  If `options' has a zero length,
 `getopt' immediately returns -1.
 
    The next thing to check for is the end of the options.  A `--' ends
 the command line options, as does any command line argument that does
 not begin with a `-'.  `Optind' is used to step through the array of
 command line arguments; it retains its value across calls to `getopt',
 since it is a global variable.
 
    The regexp used, `/^-[^: \t\n\f\r\v\b]/', is perhaps a bit of
 overkill; it checks for a `-' followed by anything that is not
 whitespace and not a colon.  If the current command line argument does
 not match this pattern, it is not an option, and it ends option
 processing.
 
          if (_opti == 0)
              _opti = 2
          thisopt = substr(argv[Optind], _opti, 1)
          Optopt = thisopt
          i = index(options, thisopt)
          if (i == 0) {
              if (Opterr)
                  printf("%c -- invalid option\n",
                                        thisopt) > "/dev/stderr"
              if (_opti >= length(argv[Optind])) {
                  Optind++
                  _opti = 0
              } else
                  _opti++
              return "?"
          }
 
    The `_opti' variable tracks the position in the current command line
 argument (`argv[Optind]').  In the case that multiple options were
 grouped together with one `-' (e.g., `-abx'), it is necessary to return
 them to the user one at a time.
 
    If `_opti' is equal to zero, it is set to two, the index in the
 string of the next character to look at (we skip the `-', which is at
 position one).  The variable `thisopt' holds the character, obtained
 with `substr'.  It is saved in `Optopt' for the main program to use.
 
    If `thisopt' is not in the `options' string, then it is an invalid
 option.  If `Opterr' is non-zero, `getopt' prints an error message on
 the standard error that is similar to the message from the C version of
 `getopt'.
 
    Since the option is invalid, it is necessary to skip it and move on
 to the next option character.  If `_opti' is greater than or equal to
 the length of the current command line argument, then it is necessary
 to move on to the next one, so `Optind' is incremented and `_opti' is
 reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
 incremented.
 
    In any case, since the option was invalid, `getopt' returns `?'.
 The main program can examine `Optopt' if it needs to know what the
 invalid option letter actually was.
 
          if (substr(options, i + 1, 1) == ":") {
              # get option argument
              if (length(substr(argv[Optind], _opti + 1)) > 0)
                  Optarg = substr(argv[Optind], _opti + 1)
              else
                  Optarg = argv[++Optind]
              _opti = 0
          } else
              Optarg = ""
 
    If the option requires an argument, the option letter is followed by
 a colon in the `options' string.  If there are remaining characters in
 the current command line argument (`argv[Optind]'), then the rest of
 that string is assigned to `Optarg'.  Otherwise, the next command line
 argument is used (`-xFOO' vs. `-x FOO'). In either case, `_opti' is
 reset to zero, since there are no more characters left to examine in
 the current command line argument.
 
          if (_opti == 0 || _opti >= length(argv[Optind])) {
              Optind++
              _opti = 0
          } else
              _opti++
          return thisopt
      }
 
    Finally, if `_opti' is either zero or greater than or equal to the
 length of the current command line argument, it means this element in
 `argv' is through being processed, so `Optind' is incremented to point
 to the next element in `argv'.  If neither condition is true, then only
 `_opti' is incremented, so that the next option letter can be processed
 on the next call to `getopt'.
 
      BEGIN {
          Opterr = 1    # default is to diagnose
          Optind = 1    # skip ARGV[0]
      
          # test program
          if (_getopt_test) {
              while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
                  printf("c = <%c>, optarg = <%s>\n",
                                             _go_c, Optarg)
              printf("non-option arguments:\n")
              for (; Optind < ARGC; Optind++)
                  printf("\tARGV[%d] = <%s>\n",
                                          Optind, ARGV[Optind])
          }
      }
 
    The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
 `Opterr' is set to one, since the default behavior is for `getopt' to
 print a diagnostic message upon seeing an invalid option.  `Optind' is
 set to one, since there's no reason to look at the program name, which
 is in `ARGV[0]'.
 
    The rest of the `BEGIN' rule is a simple test program.  Here is the
 result of two sample runs of the test program.
 
      $ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
      -| c = <a>, optarg = <>
      -| c = <c>, optarg = <>
      -| c = <b>, optarg = <ARG>
      -| non-option arguments:
      -|         ARGV[3] = <bax>
      -|         ARGV[4] = <-x>
      
      $ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
      -| c = <a>, optarg = <>
      error--> x -- invalid option
      -| c = <?>, optarg = <>
      -| non-option arguments:
      -|         ARGV[4] = <xyz>
      -|         ARGV[5] = <abc>
 
    The first `--' terminates the arguments to `awk', so that `awk' does
 not try to interpret `-a' and the arguments after it as its own options.
 
    Several of the sample programs presented in *Note Practical `awk'
 Programs: Sample Programs, use `getopt' to process their arguments.
 