 

(gawk.info) Regexp Field Splitting

 
 Using Regular Expressions to Separate Fields
 --------------------------------------------
 
    The previous node discussed the use of single characters or simple
 strings as the value of `FS'.  More generally, the value of `FS' may be
 a string containing any regular expression.  In this case, each match
 in the record for the regular expression separates fields.  For
 example, the assignment:
 
      FS = ", \t"
 
 makes every area of an input line that consists of a comma followed by a
 space and a tab into a field separator.  (`\t' is an "escape sequence"
 that stands for a tab; see Escape Sequences, for the complete list
 of similar escape sequences.)
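
    As a minimal sketch of this assignment in use (the input lines are
 hypothetical sample data, not from the manual), the following shell
 commands show that only the full three-character sequence separates
 fields, while a comma on its own remains part of a field:

```shell
# Hypothetical sample data; printf expands \t into a real tab.
second=$(printf 'a, \tb, \tc\n' | awk 'BEGIN { FS = ", \t" } { print $2 }')
echo "$second"    # prints: b

# A comma not followed by a space and a tab is ordinary field text.
first=$(printf 'a,b, \tc\n' | awk 'BEGIN { FS = ", \t" } { print $1 }')
echo "$first"     # prints: a,b
```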
 
    For a less trivial example of a regular expression, suppose you want
 single spaces to separate fields the way single commas were used above.
 You can set `FS' to `"[ ]"' (left bracket, space, right bracket).  This
 regular expression matches a single space and nothing else (see
 Regular Expressions).
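
    A short sketch, using a made-up input line with two spaces between
 the letters, illustrates the effect: with `FS' set to `"[ ]"' each
 space is a separator of its own, so an empty field appears between
 them, whereas the default `FS' treats the run of spaces as one
 separator.

```shell
# Two spaces between the letters: each single space separates fields,
# so an empty field sits between "a" and "b".
nf_single=$(echo 'a  b' | awk 'BEGIN { FS = "[ ]" } { print NF }')
echo "$nf_single"    # prints: 3

# With the default FS, the run of spaces counts as one separator.
nf_default=$(echo 'a  b' | awk '{ print NF }')
echo "$nf_default"   # prints: 2
```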
 
    There is an important difference between the two cases of `FS = " "'
 (a single space) and `FS = "[ \t\n]+"' (left bracket, space, backslash,
 "t", backslash, "n", right bracket, which is a regular expression
 matching one or more spaces, tabs, or newlines).  For both values of
 `FS', fields are separated by runs of spaces, tabs and/or newlines.
 However, when the value of `FS' is `" "', `awk' will first strip
 leading and trailing whitespace from the record, and then decide where
 the fields are.
 
    For example, the following pipeline prints `b':
 
      $ echo ' a b c d ' | awk '{ print $2 }'
      -| b
 
 However, this pipeline prints `a' (note the extra spaces around each
 letter):
 
      $ echo ' a  b  c  d ' | awk 'BEGIN { FS = "[ \t]+" }
      >                                  { print $2 }'
      -| a
 
 In this case, the first field is "null", or empty.
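
    Counting fields is another way to see the difference.  The sketch
 below (the variable names are only illustrative) runs the same record
 through both settings; with the regexp, the leading and trailing
 spaces each produce an empty field.

```shell
# Default FS: leading and trailing whitespace is stripped first.
default_nf=$(echo ' a b c d ' | awk '{ print NF }')
echo "$default_nf"   # prints: 4

# Regexp FS: the leading and trailing runs of spaces act as separators,
# so an empty field appears at each end of the record.
regexp_nf=$(echo ' a b c d ' | awk 'BEGIN { FS = "[ \t\n]+" } { print NF }')
echo "$regexp_nf"    # prints: 6
```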
 
    The stripping of leading and trailing whitespace also comes into
 play whenever `$0' is recomputed.  For instance, study this pipeline:
 
      $ echo '   a b c d' | awk '{ print; $2 = $2; print }'
      -|    a b c d
      -| a b c d
 
 The first `print' statement prints the record as it was read, with
 leading whitespace intact.  The assignment to `$2' rebuilds `$0' by
 concatenating `$1' through `$NF' together, separated by the value of
 `OFS'.  Since the leading whitespace was ignored when finding `$1', it
 is not part of the new `$0'.  Finally, the last `print' statement
 prints the new `$0'.
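
    The role of `OFS' in the rebuilt record is easier to see when it is
 set to something other than a space.  In this sketch, the choice of
 `"-"' is arbitrary and only for illustration:

```shell
# Assigning to a field forces $0 to be rebuilt from $1..$NF joined by OFS;
# the leading whitespace of the original record is not part of any field.
rebuilt=$(echo '   a b c d' | awk 'BEGIN { OFS = "-" } { $2 = $2; print }')
echo "$rebuilt"      # prints: a-b-c-d
```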
 