awk is especially useful for producing reports that summarize and format information. Suppose you want to produce a report from the file countries in which the continents are listed alphabetically, and the countries on each continent are listed after in decreasing order of population:
Africa:
        Sudan          19
        Algeria        18

Asia:
        China         866
        India         637
        USSR          262

Australia:
        Australia      14

North America:
        USA           219
        Canada         24

South America:
        Brazil        116
        Argentina      26

As with many data processing tasks, it is much easier to produce this report in several stages. First, create a list of continent-country-population triples, in which each field is separated by a colon. This can be done with the following program triples, which uses an array pop indexed by subscripts of the form continent:country to store the population of a given country. The print statement in the END section of the program creates the list of continent-country-population triples that are piped to the sort routine.
BEGIN { FS = "\t" }
      { pop[$4 ":" $1] += $3 }
END   { for (cc in pop)
            print cc ":" pop[cc] | "sort -t: +0 -1 +2nr" }
The arguments for sort deserve special mention. The -t: argument tells sort to use : as its field separator. The +0 -1 arguments make the first field the primary sort key. In general, +i -j makes fields i+1, i+2, ..., j the sort key. If -j is omitted, the fields from i+1 to the end of the record are used. The +2nr argument makes the third field, numerically decreasing, the secondary sort key (n is for numeric, r for reverse order).
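The +i -j notation is the historical form of the sort key options, and some current versions of sort no longer accept it. As a hedged sketch, the same ordering can be requested with the POSIX -k option, whose field numbers start at 1 rather than 0: -k1,1 corresponds to +0 -1, and -k3,3nr to +2nr. The variable names triples and sorted are illustrative only, the input lines are triples of the kind the program generates for this section's example (deliberately shuffled), and LC_ALL=C is an added assumption that makes the ordering independent of the locale.

```shell
# Triples of the form continent:country:population, in shuffled order.
triples='Asia:India:637
North America:Canada:24
Africa:Algeria:18
Asia:USSR:262
South America:Argentina:26
Africa:Sudan:19
Asia:China:866
Australia:Australia:14
South America:Brazil:116
North America:USA:219'

# -t:       use ':' as the field separator
# -k1,1     primary key: field 1 only (the continent)
# -k3,3nr   secondary key: field 3, numeric (n), reversed (r)
sorted=$(printf '%s\n' "$triples" | LC_ALL=C sort -t: -k1,1 -k3,3nr)
printf '%s\n' "$sorted"
```

If this form is preferred, the string passed to sort in triples would read sort -t: -k1,1 -k3,3nr.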
Invoked on the file countries, this program produces as output
Africa:Sudan:19
Africa:Algeria:18
Asia:China:866
Asia:India:637
Asia:USSR:262
Australia:Australia:14
North America:USA:219
North America:Canada:24
South America:Brazil:116
South America:Argentina:26
This output is in the right order but the wrong format. To transform the output into the desired form, run it through a second awk program format:
BEGIN { FS = ":" }
{ if ($1 != prev) {
      print "\n" $1 ":"
      prev = $1
  }
  printf "\t%-10s %6d\n", $2, $3
}
This is a control-break program
that prints only the first occurrence of a continent name
and formats the country-population lines
associated with that continent in the desired manner.
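The control break can be watched in isolation by feeding a few already sorted triples to the program. The following is only an illustrative sketch: holding the program in a shell variable named format and capturing the result in report are conveniences of the demonstration, not part of the original example.

```shell
# The format program from the text, held in a shell variable for the demo.
format='BEGIN { FS = ":" }
{ if ($1 != prev) {        # continent changed: this is the control break
      print "\n" $1 ":"
      prev = $1
  }
  printf "\t%-10s %6d\n", $2, $3
}'

# Three sorted triples from the example: two for Africa, one for Asia.
report=$(printf '%s\n' 'Africa:Sudan:19' 'Africa:Algeria:18' 'Asia:China:866' |
         awk "$format")
printf '%s\n' "$report"
```

Each continent heading is printed once, when the first field differs from the saved value prev; every input line, including the one that triggers the heading, then yields one indented country line.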
The command line
$ awk -f triples countries | awk -f format
gives the desired report. As this example suggests, complex data transformation and formatting tasks can often be reduced to a few simple awk commands and sorts.
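For experimentation, the whole pipeline can be assembled in a scratch directory as sketched below. Several things here are assumptions rather than part of the original example: the countries file is rebuilt from the populations and continents in the report, with a placeholder 0 in its second field (which triples never reads); the sort call uses the POSIX -k key options in place of +0 -1 +2nr; and LC_ALL=C is added so that the order does not depend on the locale.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# countries: country, placeholder second field, population, continent,
# separated by tabs. printf reuses the format for each group of three arguments.
printf '%s\t0\t%s\t%s\n' \
    USSR 262 Asia  Canada 24 'North America'  China 866 Asia \
    USA 219 'North America'  Brazil 116 'South America' \
    Australia 14 Australia  India 637 Asia \
    Argentina 26 'South America'  Sudan 19 Africa  Algeria 18 Africa > countries

# triples: build continent:country:population lines and pipe them to sort.
cat > triples <<'EOF'
BEGIN { FS = "\t" }
      { pop[$4 ":" $1] += $3 }
END   { for (cc in pop)
            print cc ":" pop[cc] | "LC_ALL=C sort -t: -k1,1 -k3,3nr" }
EOF

# format: the control-break program that lays out the report.
cat > format <<'EOF'
BEGIN { FS = ":" }
{ if ($1 != prev) {
      print "\n" $1 ":"
      prev = $1
  }
  printf "\t%-10s %6d\n", $2, $3
}
EOF

report=$(awk -f triples countries | awk -f format)
printf '%s\n' "$report"
```

Under these assumptions the output should list the five continents alphabetically, each followed by its countries in decreasing order of population.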