(gawk.info) Regexp Field Splitting
Using Regular Expressions to Separate Fields
--------------------------------------------
The previous node discussed the use of single characters or simple
strings as the value of `FS'. More generally, the value of `FS' may be
a string containing any regular expression. In this case, each match
in the record for the regular expression separates fields. For
example, the assignment:
FS = ", \t"
makes every area of an input line that consists of a comma followed by a
space and a tab into a field separator. (`\t' is an "escape sequence"
that stands for a tab; see Escape Sequences, for the complete list
of similar escape sequences.)
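As a quick check of this assignment, the following sketch feeds `awk' one
line whose fields are joined by a comma, a space, and a tab. The sample
words and the shell variable `second' are assumptions made up for
illustration; they do not come from the manual:

```shell
# Hypothetical sample record: three words joined by comma + space + tab.
# printf expands \t to a real tab before awk sees the line.
second=$(printf 'alpha, \tbeta, \tgamma\n' |
  awk 'BEGIN { FS = ", \t" } { print $2 }')
echo "$second"    # prints: beta
```

A comma on its own, or a comma followed only by a space, does not match
the separator and so would remain part of the field.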
For a less trivial example of a regular expression, suppose you want
single spaces to separate fields the way single commas were used above.
You can set `FS' to `"[ ]"' (left bracket, space, right bracket). This
regular expression matches a single space and nothing else (see
Regular Expressions).
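A small sketch of the effect, assuming a made-up input line with two
adjacent spaces (the variable name `count' is likewise only
illustrative). Because each space is a separator in its own right, an
empty field appears between the two spaces:

```shell
# With FS = "[ ]" every single space separates fields, so the two
# adjacent spaces in "a  b" enclose an empty field.
count=$(printf 'a  b\n' | awk 'BEGIN { FS = "[ ]" } { print NF }')
echo "$count"    # prints: 3   (fields: "a", "", "b")
```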
There is an important difference between the two cases of `FS = " "'
(a single space) and `FS = "[ \t\n]+"' (left bracket, space, backslash,
"t", backslash, "n", right bracket, which is a regular expression
matching one or more spaces, tabs, or newlines). For both values of
`FS', fields are separated by runs of spaces, tabs and/or newlines.
However, when the value of `FS' is `" "', `awk' first strips
leading and trailing whitespace from the record, and then decides where
the fields are.
For example, the following pipeline prints `b':
$ echo ' a b c d ' | awk '{ print $2 }'
-| b
However, this pipeline prints `a' (note the extra spaces around each
letter):
$ echo ' a  b  c  d ' | awk 'BEGIN { FS = "[ \t]+" }
> { print $2 }'
-| a
In this case, the first field is "null", or empty.
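Counting fields makes the difference easier to see. The sketch below is
an illustration only: the sample record and the variable names `line',
`nf_default', and `nf_regexp' are assumptions, not part of the manual.
It runs the same record through both settings of `FS':

```shell
line=' a  b  c  d '   # hypothetical record with leading and trailing spaces

# Default FS = " ": leading and trailing whitespace is stripped first.
nf_default=$(printf '%s\n' "$line" | awk '{ print NF }')

# Regexp FS: the leading and trailing runs of spaces each act as a
# separator, so an empty field appears at both ends of the record.
nf_regexp=$(printf '%s\n' "$line" |
  awk 'BEGIN { FS = "[ \t\n]+" } { print NF }')

echo "$nf_default $nf_regexp"    # prints: 4 6
```

The two extra fields in the second count are the empty first and last
fields.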
The stripping of leading and trailing whitespace also comes into
play whenever `$0' is recomputed. For instance, study this pipeline:
$ echo '   a b c d' | awk '{ print; $2 = $2; print }'
-|    a b c d
-| a b c d
The first `print' statement prints the record as it was read, with
leading whitespace intact. The assignment to `$2' rebuilds `$0' by
concatenating `$1' through `$NF' together, separated by the value of
`OFS'. Since the leading whitespace was ignored when finding `$1', it
is not part of the new `$0'. Finally, the last `print' statement
prints the new `$0'.
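Setting `OFS' to something other than a space makes the rebuilding
visible. In this sketch the value `"-"' and the variable name `rebuilt'
are arbitrary choices for illustration:

```shell
# Assigning to $2 forces awk to rebuild $0 from $1..$NF joined by OFS.
# The leading spaces were never part of $1, so they disappear.
rebuilt=$(printf '   a b c d\n' |
  awk 'BEGIN { OFS = "-" } { $2 = $2; print }')
echo "$rebuilt"    # prints: a-b-c-d
```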