Additional Regexp Operators Only in `gawk'
==========================================
GNU software that deals with regular expressions provides a number of
additional regexp operators. These operators, described in this
section, are specific to `gawk'; they are not available in other
`awk' implementations.
Most of the additional operators are for dealing with word matching.
For our purposes, a "word" is a sequence of one or more letters, digits,
or underscores (`_').
`\w'
This operator matches any word-constituent character, i.e., any
letter, digit, or underscore. Think of it as shorthand for
`[[:alnum:]_]'.
`\W'
This operator matches any character that is not word-constituent.
Think of it as shorthand for `[^[:alnum:]_]'.
`\<'
This operator matches the empty string at the beginning of a word.
For example, `/\<away/' matches `away', but not `stowaway'.
`\>'
This operator matches the empty string at the end of a word. For
example, `/stow\>/' matches `stow', but not `stowaway'.
`\y'
This operator matches the empty string at either the beginning or
the end of a word (the word boundar*y*). For example, `/\yballs?\y/'
matches either `ball' or `balls' as a separate word.
`\B'
This operator matches the empty string within a word. In other
words, `\B' matches the empty string that occurs between two
word-constituent characters. For example, `/\Brat\B/' matches
`crate', but it does not match `dirty rat'. `\B' is essentially
the opposite of `\y'.
There are two other operators that work on buffers. In Emacs, a
"buffer" is, naturally, an Emacs buffer. For other programs, the
regexp library routines that `gawk' uses consider the entire string to
be matched as the buffer.
For `awk', since `^' and `$' always work in terms of the beginning
and end of strings, these operators don't add any new capabilities.
They are provided for compatibility with other GNU software.
`\`'
This operator matches the empty string at the beginning of the
buffer.
`\''
This operator matches the empty string at the end of the buffer.
In other GNU software, the word boundary operator is `\b'. However,
that conflicts with the `awk' language's definition of `\b' as
backspace, so `gawk' uses a different letter.
An alternative method would have been to require two backslashes in
the GNU operators, but this was deemed to be too confusing, and the
current method of using `\y' for the GNU `\b' appears to be the lesser
of two evils.
The various command-line options (see Command Line Options) control
how `gawk' interprets characters in regexps.
No options
In the default case, `gawk' provides all the facilities of POSIX
regexps (see Regular Expression Operators) and the GNU regexp
operators described above. However, interval expressions are not
supported.
`--posix'
Only POSIX regexps are supported; the GNU operators are not special
(e.g., `\w' matches a literal `w'). Interval expressions are
allowed.
`--traditional'
Traditional Unix `awk' regexps are matched. The GNU operators are
not special, interval expressions are not available, and neither
are the POSIX character classes (`[[:alnum:]]' and so on).
Characters described by octal and hexadecimal escape sequences are
treated literally, even if they represent regexp metacharacters.
`--re-interval'
Allow interval expressions in regexps, even if `--traditional' has
been provided.