String collation

ANSI C provides two functions for locale-dependent string comparison. strcoll is analogous to strcmp except that the two strings are compared according to the LC_COLLATE category of the current locale. (See the strcoll(3C) manual page and strcmp on the string(3C) manual page.) Conceptually, collation occurs in two passes to obtain an appropriate ordering of accented characters, two-character sequences that should be treated as one (the Spanish character ch, for example), and single characters that should be treated as two (the sharp s in German, for instance).

Because this comparison is not necessarily as inexpensive as strcmp, the strxfrm function is provided to transform a string into another form. Any two transformed strings can be passed to strcmp to get an ordering identical to what strcoll would have returned for the two original strings. You are responsible for keeping track of the strings in both their transformed and printable forms. Generally, you should use strxfrm when a string will be compared a number of times.

The following example uses qsort(3C) and strcoll(3C) to sort lines in a text file:

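#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <unistd.h>	/* declares gettxt on UnixWare; see gettxt(3C) */

#define ELEMENTS 1000	/* maximum number of lines; illustrative value */
#define WIDTH 256	/* maximum line length; illustrative value */

char table[ELEMENTS][WIDTH];

/* qsort passes pointers to table rows; hand them to strcoll as strings */
static int collate(const void *a, const void *b)
{
	return strcoll((const char *)a, (const char *)b);
}

int main(int argc, char **argv)
{
	FILE *fp;
	int nel, i;

	setlocale(LC_ALL, "");

	if ((fp = fopen(argv[1], "r")) == NULL) {
		fprintf(stderr, gettxt("progmsgs:2", "Can't open %s\n"), argv[1]);
		exit(2);
	}
	for (nel = 0; nel < ELEMENTS && fgets(table[nel], WIDTH, fp); ++nel)
		;
	if (nel >= ELEMENTS) {
		fprintf(stderr, gettxt("progmsgs:3", "File too large\n"));
		exit(3);
	}
	qsort(table, nel, WIDTH, collate);
	for (i = 0; i < nel; ++i)
		fputs(table[i], stdout);
	return 0;
}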

The next example does the same thing with a function that uses strxfrm:

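#include <stdlib.h>
#include <string.h>

int compare(const char *s1, const char *s2)
{
	char *tmp;
	int result;
	/* a zero-length call returns the size needed, less the null byte */
	size_t n1 = strxfrm(NULL, s1, 0) + 1;
	size_t n2 = strxfrm(NULL, s2, 0) + 1;

	if ((tmp = malloc(n1 + n2)) == NULL)
		return strcmp(s1, s2);

	/* the second transformed string goes directly after the first */
	(void)strxfrm(tmp, s1, n1);
	(void)strxfrm(tmp + n1, s2, n2);

	result = strcmp(tmp, tmp + n1);
	free(tmp);
	return result;
}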

Assuming malloc succeeds, the return value of compare(s1, s2) has the same sign as the return value of strcoll(s1, s2). Because compare transforms both strings on every call, it is unlikely to be faster than strcoll itself; in practice it would be better to transform each string once and hold onto the result for subsequent comparisons. See the strcoll(3C) and strxfrm(3C) manual pages.
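The idea of holding onto the transformed strings can be sketched as follows. This is a minimal illustration under stated assumptions, not one of the original examples: the struct keyed type and the keyed_cmp and sort_collated functions are hypothetical names, and the sketch assumes that all strings and their transformed keys fit in memory at once.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a printable string paired with its strxfrm form. */
struct keyed {
    const char *text;   /* printable form, owned by the caller */
    char *key;          /* transformed form, allocated here */
};

/* The keys are already transformed, so plain strcmp gives collation order. */
static int keyed_cmp(const void *a, const void *b)
{
    const struct keyed *ka = a;
    const struct keyed *kb = b;
    return strcmp(ka->key, kb->key);
}

/*
 * Reorder strs[0..n-1] by the current LC_COLLATE order, transforming
 * each string exactly once. Returns 0 on success, or -1 if memory runs
 * out, in which case strs is left unchanged.
 */
int sort_collated(const char **strs, size_t n)
{
    struct keyed *tab;
    size_t i, len, made = 0;    /* made = number of keys allocated so far */
    int rc = -1;

    if (n == 0)
        return 0;
    if ((tab = malloc(n * sizeof *tab)) == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        len = strxfrm(NULL, strs[i], 0) + 1;
        if ((tab[i].key = malloc(len)) == NULL)
            goto done;
        made++;
        (void)strxfrm(tab[i].key, strs[i], len);
        tab[i].text = strs[i];
    }
    qsort(tab, n, sizeof *tab, keyed_cmp);
    for (i = 0; i < n; i++)
        strs[i] = tab[i].text;
    rc = 0;
done:
    for (i = 0; i < made; i++)
        free(tab[i].key);
    free(tab);
    return rc;
}
```

A caller would first select the locale with setlocale(LC_ALL, "") and then call sort_collated on an array of string pointers. Each string is transformed once rather than on every comparison, at the cost of one extra allocation per string.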

© 2004 The SCO Group, Inc. All rights reserved.
UnixWare 7 Release 7.1.4 - 27 April 2004