(gawk) Getopt Function
Processing Command Line Options
===============================
Most utilities on POSIX-compatible systems take options or
"switches" on the command line that can be used to change the way a
program behaves. `awk' is an example of such a program (see Command
Line Options). Often, options take "arguments": data that the
program needs to correctly obey the command line option. For
example, `awk''s `-F' option requires a string to use as the field
separator. The first occurrence on the command line of either `--' or a
string that does not begin with `-' ends the options.
Most Unix systems provide a C function named `getopt' for processing
command line arguments. The programmer provides a string describing
the one-letter options. If an option requires an argument, it is
followed in the string by a colon. `getopt' is also passed the count
and values of the command line arguments, and is called in a loop.
`getopt' processes the command line arguments for option letters. Each
time around the loop, it returns a single character representing the
next option letter that it found, or `?' if it found an invalid option.
When it returns -1, there are no options left on the command line.
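The following C function is a minimal sketch of that calling convention. It drives the system `getopt' in a loop and records what it finds. The option string `ab:c' (three options, where `-b' takes an argument), the `struct parsed' record, and the name `parse_args' are hypothetical. The code assumes `argv' is scanned only once per process, because C libraries differ in how a second scan must be restarted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>   /* getopt(), optarg, optind, opterr */

/* Hypothetical record of what the scan found. */
struct parsed {
    int a, c;            /* 1 if -a / -c was seen */
    const char *b_arg;   /* argument given to -b, or NULL */
    int bad;             /* number of '?' results */
    int first_operand;   /* optind after the loop: first non-option */
};

/* Call getopt() until it returns -1. Assumes a single scan of argv,
   so optind still holds its initial value of 1. */
static struct parsed parse_args(int argc, char *argv[])
{
    struct parsed p = { 0, 0, NULL, 0, 0 };
    int c;

    opterr = 0;                              /* we report errors ourselves */
    while ((c = getopt(argc, argv, "ab:c")) != -1) {
        switch (c) {
        case 'a': p.a = 1;          break;
        case 'b': p.b_arg = optarg; break;   /* -bfoo or -b foo */
        case 'c': p.c = 1;          break;
        default:  p.bad++;          break;   /* '?': bad letter or missing argument */
        }
    }
    p.first_operand = optind;
    return p;
}
```

Given the words `prog -acbfoo data1 data2', the scan sees `-a' and `-c', takes `foo' as the argument of `-b', and leaves `optind' at 2, the index of `data1'.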
When using `getopt', options that do not take arguments can be
grouped together. Furthermore, options that take arguments require
that the argument be present. The argument can immediately follow the
option letter, or it can be a separate command line argument.
Given a hypothetical program that takes three command line options,
`-a', `-b', and `-c', and `-b' requires an argument, all of the
following are valid ways of invoking the program:
prog -a -b foo -c data1 data2 data3
prog -ac -bfoo -- data1 data2 data3
prog -acbfoo data1 data2 data3
Notice that when the argument is grouped with its option, the rest of
the command line argument is considered to be the option's argument.
In the above example, `-acbfoo' indicates that all of the `-a', `-b',
and `-c' options were supplied, and that `foo' is the argument to the
`-b' option.
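The rule that the rest of a grouped word becomes the option's argument can be sketched in C for a single word. `split_group' is a hypothetical helper, not part of any library. It examines one word only, does no error reporting, and copies unknown letters through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Expand one grouped option word such as "-acbfoo". The option letters
   are copied into `letters'. When a letter that takes an argument
   (followed by ':' in `optstring') has text after it, that text is
   returned as its argument. NULL means no argument was embedded, so
   the argument (if required) is the next word on the command line. */
static const char *split_group(const char *word, const char *optstring,
                               char *letters, size_t cap)
{
    size_t n = 0;
    const char *p;

    for (p = word + 1; *p != '\0' && n + 1 < cap; p++) {
        const char *spec = (*p != ':') ? strchr(optstring, *p) : NULL;

        letters[n++] = *p;
        if (spec != NULL && spec[1] == ':') {     /* takes an argument */
            letters[n] = '\0';
            return p[1] != '\0' ? p + 1 : NULL;   /* rest of the word */
        }
    }
    letters[n] = '\0';
    return NULL;
}
```

For `-acbfoo' with the option string `ab:c', the letters are `acb' and the returned argument is `foo'. For `-acb' nothing is embedded, so `-b' would take the following word.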
`getopt' provides four external variables that the programmer can
use.
`optind'
The index in the argument value array (`argv') where the first
non-option command line argument can be found.
`optarg'
The string value of the argument to an option.
`opterr'
Usually `getopt' prints an error message when it finds an invalid
option. Setting `opterr' to zero disables this feature. (An
application might wish to print its own error message.)
`optopt'
The letter representing the command line option. While not
usually documented, most versions supply this variable.
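This short C sketch shows `opterr' and `optopt' working together. `getopt''s own message is turned off, and the program prints its own message, using `optopt' to name the offending letter. The function name, the option string `ab:c', and the message format are illustrative assumptions. The sketch also assumes a C library that, like glibc, sets `optopt' whenever `getopt' returns `?'.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>   /* getopt(), opterr, optopt */

/* Scan argv once with getopt's diagnostics disabled and print our own.
   Returns how many times getopt returned '?' (an unknown letter or a
   missing argument); the last offending letter goes into *last_bad. */
static int report_invalid(int argc, char *argv[], int *last_bad)
{
    int c, bad = 0;

    opterr = 0;                      /* silence getopt's own message */
    *last_bad = 0;
    while ((c = getopt(argc, argv, "ab:c")) != -1) {
        if (c == '?') {
            fprintf(stderr, "%c -- invalid option\n", optopt);
            *last_bad = optopt;      /* the letter that caused the error */
            bad++;
        }
    }
    return bad;
}
```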
The following C fragment shows how `getopt' might process command
line arguments for `awk'.
int
main(int argc, char *argv[])
{
    ...
    /* print our own message */
    opterr = 0;
    while ((c = getopt(argc, argv, "v:f:F:W:")) != -1) {
        switch (c) {
        case 'f':    /* file */
            ...
            break;
        case 'F':    /* field separator */
            ...
            break;
        case 'v':    /* variable assignment */
            ...
            break;
        case 'W':    /* extension */
            ...
            break;
        case '?':
        default:
            usage();
            break;
        }
    }
    ...
}
As a side point, `gawk' actually uses the GNU `getopt_long' function
to process both normal and GNU-style long options (see Command Line
Options).
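For comparison, here is a minimal sketch of `getopt_long' accepting both a short and a long spelling of two options. The long names echo `gawk''s `--field-separator' and `--file'. The option table itself is an assumption for illustration and is not `gawk''s actual table. `find_field_sep' is a hypothetical name, and the code assumes a single scan of `argv'.

```c
#define _GNU_SOURCE
#include <getopt.h>   /* getopt_long(), struct option */
#include <stddef.h>
#include <unistd.h>   /* optarg, opterr */

/* Returns the last field separator given with -F SEP, -FSEP,
   --field-separator SEP or --field-separator=SEP, or NULL if none. */
static const char *find_field_sep(int argc, char *argv[])
{
    static const struct option longopts[] = {
        { "field-separator", required_argument, NULL, 'F' },
        { "file",            required_argument, NULL, 'f' },
        { NULL, 0, NULL, 0 }                  /* end of table */
    };
    const char *fs = NULL;
    int c;

    opterr = 0;
    while ((c = getopt_long(argc, argv, "F:f:", longopts, NULL)) != -1) {
        if (c == 'F')
            fs = optarg;
        /* 'f' and '?' are ignored in this sketch */
    }
    return fs;
}
```

Because each long option's `val' field is set to the matching short letter, the loop body handles both spellings with a single `case'.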
The abstraction provided by `getopt' is very useful, and would be
quite handy in `awk' programs as well. Here is an `awk' version of
`getopt'. This function highlights one of the greatest weaknesses in
`awk', which is that it is very poor at manipulating single characters.
Repeated calls to `substr' are necessary for accessing individual
characters (see Built-in Functions for String Manipulation).
The discussion walks through the code a bit at a time.
# getopt --- do C library getopt(3) function in awk
#
# arnold@gnu.org
# Public domain
#
# Initial version: March, 1991
# Revised: May, 1993

# External variables:
#    Optind -- index of ARGV for first non-option argument
#    Optarg -- string value of argument to current option
#    Opterr -- if non-zero, print our own diagnostic
#    Optopt -- current option letter

# Returns
#    -1     at end of options
#    ?      for unrecognized option
#    <c>    a character representing the current option

# Private Data
#    _opti  index in multi-flag option, e.g., -abc
The function starts out with some documentation: who wrote the code,
and when it was revised, followed by a list of the global variables it
uses, what the return values are and what they mean, and any global
variables that are "private" to this library function. Such
documentation is essential for any program, and particularly for
library functions.
function getopt(argc, argv, options,    optl, thisopt, i)
{
    optl = length(options)
    if (optl == 0)        # no options given
        return -1

    if (argv[Optind] == "--") {  # all done
        Optind++
        _opti = 0
        return -1
    } else if (argv[Optind] !~ /^-[^: \t\n\f\r\v\b]/) {
        _opti = 0
        return -1
    }
The function first checks that it was indeed called with a string of
options (the `options' parameter). If `options' has a zero length,
`getopt' immediately returns -1.
The next thing to check for is the end of the options. A `--' ends
the command line options, as does any command line argument that does
not begin with a `-'. `Optind' is used to step through the array of
command line arguments; it retains its value across calls to `getopt',
since it is a global variable.
The regexp used, `/^-[^: \t\n\f\r\v\b]/', is perhaps a bit of
overkill; it checks for a `-' followed by anything that is not
whitespace and not a colon. If the current command line argument does
not match this pattern, it is not an option, and it ends option
processing.
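The two end-of-options tests can be restated in C as a sketch. `classify_arg' and its return codes are hypothetical names. The excluded character set is copied from the `awk' bracket expression.

```c
#include <string.h>

/* Hypothetical classification of one command line word. */
enum arg_kind { ARG_OPTION, ARG_DOUBLE_DASH, ARG_OPERAND };

/* "--" ends the options and is consumed (the awk code increments
   Optind). Any word that is not '-' followed by a character outside
   ": \t\n\f\r\v\b" also ends them, but is left in place as the first
   non-option argument. */
static enum arg_kind classify_arg(const char *arg)
{
    if (strcmp(arg, "--") == 0)
        return ARG_DOUBLE_DASH;
    if (arg[0] == '-' && arg[1] != '\0'
        && strchr(": \t\n\f\r\v\b", arg[1]) == NULL)
        return ARG_OPTION;
    return ARG_OPERAND;
}
```

Note that a lone `-' is classified as a non-option. By convention, many programs use it to mean standard input.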
    if (_opti == 0)
        _opti = 2
    thisopt = substr(argv[Optind], _opti, 1)
    Optopt = thisopt
    i = index(options, thisopt)
    if (i == 0) {
        if (Opterr)
            printf("%c -- invalid option\n",
                                  thisopt) > "/dev/stderr"
        if (_opti >= length(argv[Optind])) {
            Optind++
            _opti = 0
        } else
            _opti++
        return "?"
    }
The `_opti' variable tracks the position in the current command line
argument (`argv[Optind]'). When multiple options are grouped together
with one `-' (e.g., `-abx'), they must be returned to the user one at
a time.
If `_opti' is equal to zero, it is set to two, the index in the
string of the next character to look at (we skip the `-', which is at
position one). The variable `thisopt' holds the character, obtained
with `substr'. It is saved in `Optopt' for the main program to use.
If `thisopt' is not in the `options' string, then it is an invalid
option. If `Opterr' is non-zero, `getopt' prints an error message on
the standard error that is similar to the message from the C version of
`getopt'.
Since the option is invalid, it is necessary to skip it and move on
to the next option character. If `_opti' is greater than or equal to
the length of the current command line argument, then it is necessary
to move on to the next one, so `Optind' is incremented and `_opti' is
reset to zero. Otherwise, `Optind' is left alone and `_opti' is merely
incremented.
In any case, since the option was invalid, `getopt' returns `?'.
The main program can examine `Optopt' if it needs to know what the
invalid option letter actually was.
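The lookup and the skip over an invalid letter can be mirrored in C. This is a sketch under stated assumptions. `struct gstate' and `next_letter' are hypothetical names, the `awk' globals become struct fields, and the 1-based position `_opti' is kept so that the arithmetic matches the `awk' code line for line.

```c
#include <string.h>

/* Hypothetical C mirror of the awk state: Optind, _opti (1-based
   position in argv[Optind]; 0 means "start of a new word"), Optopt. */
struct gstate {
    int Optind;
    int opti;
    int Optopt;
};

/* Fetch the current option letter. If it is not in `options', step
   past it as the awk code does and return '?'; otherwise return the
   letter (argument handling and the final advance are left out).
   Assumes argv[st->Optind] is an option word such as "-abx". */
static int next_letter(struct gstate *st, char *const argv[],
                       const char *options)
{
    const char *arg = argv[st->Optind];
    int len = (int) strlen(arg);
    int c;

    if (st->opti == 0)
        st->opti = 2;                          /* position 1 is the '-' */
    c = (unsigned char) arg[st->opti - 1];     /* substr(arg, _opti, 1) */
    st->Optopt = c;
    if (strchr(options, c) == NULL) {          /* index(...) == 0 */
        if (st->opti >= len) {                 /* last letter of the word */
            st->Optind++;
            st->opti = 0;
        } else {
            st->opti++;                        /* same word, next letter */
        }
        return '?';
    }
    return c;
}
```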
    if (substr(options, i + 1, 1) == ":") {
        # get option argument
        if (length(substr(argv[Optind], _opti + 1)) > 0)
            Optarg = substr(argv[Optind], _opti + 1)
        else
            Optarg = argv[++Optind]
        _opti = 0
    } else
        Optarg = ""
If the option requires an argument, the option letter is followed by
a colon in the `options' string. If there are remaining characters in
the current command line argument (`argv[Optind]'), then the rest of
that string is assigned to `Optarg'. Otherwise, the next command line
argument is used (`-xFOO' vs. `-x FOO'). In either case, `_opti' is
reset to zero, since there are no more characters left to examine in
the current command line argument.
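In C, the two ways of supplying an argument reduce to a pointer into the current word or the next element of `argv'. `option_argument' is a hypothetical helper that mirrors the `awk' code above. One difference is worth noting: at the end of the list, `awk' yields an empty string, while this sketch returns `argv[argc]', which C guarantees is a null pointer.

```c
/* `pos' is the 1-based position (the awk _opti) of an option letter in
   argv[*ind] that takes an argument. If characters follow the letter
   (-xFOO), they are the argument. Otherwise the next word is (-x FOO),
   and *ind is advanced, like argv[++Optind]. The caller must ensure
   *ind < argc on entry. */
static const char *option_argument(char *const argv[], int *ind, int pos)
{
    const char *rest = argv[*ind] + pos;   /* substr(argv[Optind], _opti + 1) */

    if (*rest != '\0')                     /* length(...) > 0 */
        return rest;
    return argv[++*ind];                   /* next word, or NULL at argv[argc] */
}
```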
    if (_opti == 0 || _opti >= length(argv[Optind])) {
        Optind++
        _opti = 0
    } else
        _opti++
    return thisopt
}
Finally, if `_opti' is either zero or greater than or equal to the
length of the current command line argument, this element of `argv' has
been fully processed, so `Optind' is incremented to point to the next
element in `argv'. If neither condition is true, then only `_opti' is
incremented, so that the next option letter can be processed on the
next call to `getopt'.
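Putting the pieces together, here is one way the complete `awk' function might be transliterated into C, for comparison with the C library version. It is a sketch, not `getopt(3)'. The global names mirror the `awk' library's, and `awk_getopt' is a hypothetical name. The explicit `Optind >= argc' test stands in for `awk''s behavior of returning an empty string for a missing `ARGV' element.

```c
#include <stdio.h>
#include <string.h>

int Optind = 1;            /* next element of argv to examine */
const char *Optarg = "";   /* argument of the current option */
int Opterr = 1;            /* non-zero: print our own diagnostic */
int Optopt;                /* current option letter */
static int opti;           /* awk _opti: 1-based position in argv[Optind] */

int awk_getopt(int argc, char *const argv[], const char *options)
{
    const char *arg, *spec;
    int len, thisopt;

    if (options[0] == '\0')                 /* no options given */
        return -1;
    if (Optind >= argc) {                   /* awk would see "" here */
        opti = 0;
        return -1;
    }
    arg = argv[Optind];
    if (strcmp(arg, "--") == 0) {           /* all done */
        Optind++;
        opti = 0;
        return -1;
    }
    if (arg[0] != '-' || arg[1] == '\0'
        || strchr(": \t\n\f\r\v\b", arg[1]) != NULL) {
        opti = 0;                           /* not an option word */
        return -1;
    }

    len = (int) strlen(arg);
    if (opti == 0)
        opti = 2;                           /* skip the leading '-' */
    thisopt = (unsigned char) arg[opti - 1];
    Optopt = thisopt;
    spec = strchr(options, thisopt);        /* index(options, thisopt) */
    if (spec == NULL) {
        if (Opterr)
            fprintf(stderr, "%c -- invalid option\n", thisopt);
        if (opti >= len) {
            Optind++;
            opti = 0;
        } else
            opti++;
        return '?';
    }

    if (spec[1] == ':') {                   /* option takes an argument */
        if (arg[opti] != '\0')
            Optarg = arg + opti;            /* -xFOO */
        else
            Optarg = (++Optind < argc) ? argv[Optind] : "";  /* -x FOO */
        opti = 0;
    } else
        Optarg = "";

    if (opti == 0 || opti >= len) {         /* word finished */
        Optind++;
        opti = 0;
    } else
        opti++;
    return thisopt;
}
```

Called with `ab:cd' on the words of the first sample run shown later in this section (`awk', `-a', `-cbARG', `bax', `-x'), it returns `a', `c', and `b' (with `Optarg' set to `ARG'), then returns -1 with `Optind' equal to 3.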
BEGIN {
    Opterr = 1    # default is to diagnose
    Optind = 1    # skip ARGV[0]

    # test program
    if (_getopt_test) {
        while ((_go_c = getopt(ARGC, ARGV, "ab:cd")) != -1)
            printf("c = <%c>, optarg = <%s>\n",
                                       _go_c, Optarg)
        printf("non-option arguments:\n")
        for (; Optind < ARGC; Optind++)
            printf("\tARGV[%d] = <%s>\n",
                                    Optind, ARGV[Optind])
    }
}
The `BEGIN' rule initializes both `Opterr' and `Optind' to one.
`Opterr' is set to one, since the default behavior is for `getopt' to
print a diagnostic message upon seeing an invalid option. `Optind' is
set to one, since there's no reason to look at the program name, which
is in `ARGV[0]'.
The rest of the `BEGIN' rule is a simple test program. Here is the
result of two sample runs of the test program.
$ awk -f getopt.awk -v _getopt_test=1 -- -a -cbARG bax -x
-| c = <a>, optarg = <>
-| c = <c>, optarg = <>
-| c = <b>, optarg = <ARG>
-| non-option arguments:
-| ARGV[3] = <bax>
-| ARGV[4] = <-x>
$ awk -f getopt.awk -v _getopt_test=1 -- -a -x -- xyz abc
-| c = <a>, optarg = <>
error--> x -- invalid option
-| c = <?>, optarg = <>
-| non-option arguments:
-| ARGV[4] = <xyz>
-| ARGV[5] = <abc>
The first `--' terminates the arguments to `awk', so that it does
not try to interpret the `-a' etc. as its own options.
Several of the sample programs presented in Practical `awk' Programs
use `getopt' to process their arguments.