File and device input/output

Reading and writing files

The functions read and write perform I/O on files. For both, the first argument is a file descriptor, the second is a buffer in the user program that the data comes from or goes to, and the third is the number of bytes to transfer. Each call returns the number of bytes actually transferred. The related functions readv and writev take an array of buffers in place of a single buffer. These calls look like:

n = read(fildes, buffer, count);

n = write(fildes, buffer, count);

n = readv(fildes, iov, iovcnt);

n = writev(fildes, iov, iovcnt);

Up to count bytes are transferred between the file denoted by fildes and the byte array pointed to by buffer. The returned value n is the number of bytes actually transferred.

For writing, the returned value is the number of bytes actually written; it is generally an error if this fails to equal the number of bytes requested. In the write case, n is the same as count except under exceptional conditions, such as I/O errors or end of physical medium on special files; in a read, however, n may without error be less than count.
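Because write may occasionally transfer fewer bytes than requested, a caller that must get every byte out can retry with the remainder. The following is a minimal sketch of that idea; write_all is a hypothetical helper name, not a library function, and treating EINTR as retryable is an assumption about the desired behavior.

```c
#include <errno.h>
#include <unistd.h>

/* write_all: hypothetical helper, not part of any library.
   Calls write repeatedly until count bytes have been written.
   Returns 0 on success, -1 on error. */
int write_all(int fildes, const char *buffer, size_t count)
{
   while (count > 0) {
      ssize_t n = write(fildes, buffer, count);

      if (n < 0) {
         if (errno == EINTR)
            continue;      /* interrupted by a signal; try again */
         return -1;        /* genuine error */
      }
      buffer += n;         /* skip the bytes already written */
      count  -= (size_t)n;
   }
   return 0;
}
```

A caller would use write_all(1, buf, n) in place of write(1, buf, n) and test the result against zero.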

writev performs the same action as write, but gathers the output data from the iovcnt buffers specified by the members of the iov array: iov[0], iov[1], ..., iov[iovcnt-1]. The value of iovcnt is valid only if it is greater than 0 and less than or equal to {IOV_MAX}.

Each iovec entry specifies the base address and length of an area in memory from which data should be written. writev always writes a complete area before proceeding to the next.
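As an illustration of gathering output, the sketch below sends two separate strings with one writev call. The function name write_record and the header/body split are assumptions made for the example; only writev and struct iovec come from the system.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* write_record: hypothetical helper that writes a header followed by
   a body using a single writev call instead of two writes. */
ssize_t write_record(int fildes, const char *header, const char *body)
{
   struct iovec iov[2];

   iov[0].iov_base = (void *)header;   /* written first */
   iov[0].iov_len  = strlen(header);
   iov[1].iov_base = (void *)body;     /* written after iov[0] is complete */
   iov[1].iov_len  = strlen(body);

   return writev(fildes, iov, 2);      /* total bytes written, or -1 */
}
```

The return value is the total across both areas, so it should be compared with the sum of the two lengths.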

For reading, the number of bytes returned may be less than the number requested, because fewer than count bytes remained to be read. If the file-offset is so near the end of the file that reading count characters would cause reading beyond the end, only sufficient bytes are transferred to reach the end of the file. Also, typewriter-like terminals never return more than one line of input. (When the file is a terminal, read normally reads only up to the next new-line, which is generally less than what was requested.)

readv performs the same action as read, but places the input data into iovcnt buffers specified by the members of the iov array: iov[0], iov[1], ..., iov[iovcnt-1].

Each iovec entry specifies the base address and length of an area in memory where data should be placed. readv always fills one buffer completely before proceeding to the next.
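The scatter behavior can be sketched as follows: a fixed-size header area is filled first, and whatever follows lands in the body area. The name read_record and the four-byte HDRSIZE are assumptions chosen for the example.

```c
#include <sys/uio.h>
#include <unistd.h>

#define  HDRSIZE  4   /* assumed header length, for illustration only */

/* read_record: hypothetical helper that reads a fixed-size header
   and a variable-size body with one readv call. */
ssize_t read_record(int fildes, char *header, char *body, size_t bodysize)
{
   struct iovec iov[2];

   iov[0].iov_base = header;     /* filled completely first */
   iov[0].iov_len  = HDRSIZE;
   iov[1].iov_base = body;       /* receives whatever follows */
   iov[1].iov_len  = bodysize;

   return readv(fildes, iov, 2); /* total bytes read, 0 at EOF, or -1 */
}
```

As with read, the total returned may be smaller than HDRSIZE plus bodysize, so the caller should check it before trusting the contents of body.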

When a read call returns with n equal to zero, the end of the file has been reached. For disk files this occurs when the file-offset equals the current size of the file. It is possible to generate an end-of-file from a terminal by use of an escape sequence that depends on the device used. The function read returns 0 to signify end-of-file, and returns -1 to signify an error.
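The difference between the two return values matters once the loop ends: zero means the data ran out, while -1 means something went wrong. A small sketch, using the hypothetical function name count_bytes, shows one way to keep them apart.

```c
#include <unistd.h>

/* count_bytes: hypothetical helper that reads fildes until end-of-file.
   Returns the total number of bytes read, or -1 if read reported an error. */
long count_bytes(int fildes)
{
   char    buf[512];
   long    total = 0;
   ssize_t n;

   while ((n = read(fildes, buf, sizeof buf)) > 0)
      total += n;

   /* here n is 0 (end-of-file) or -1 (error) */
   return (n < 0) ? -1 : total;
}
```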

The number of bytes to be read or written is quite arbitrary. The two most common values are 1, which means one character at a time ("unbuffered"), and 512, which corresponds to a physical block size on many peripheral devices. The latter size is the more efficient, but even character-at-a-time I/O is not overly expensive. Bytes written affect only those parts of a file implied by the position of the file-offset and the count; no other part of the file is changed. If the last byte lies beyond the end of the file, the file grows as needed.

A simple program using the read and write functions to copy its input to its output can copy anything, since the input and output can be redirected to any file or device.

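#include <stdlib.h>
#include <unistd.h>

#define  BUFSIZE  512

int main(void)   /* copy input to output */
{
   char    buf[BUFSIZE];
   ssize_t n;

   while ((n = read(0, buf, BUFSIZE)) > 0)
      if (write(1, buf, n) != n)
         exit(1);            /* short write or write error */
   exit(n < 0 ? 1 : 0);      /* n < 0 means read failed */
}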

If the file size is not a multiple of BUFSIZE, some read will return a smaller number of bytes to be written by write; the next call to read after that will return zero, indicating end-of-file.

To see how read and write can be used to construct higher level functions like getchar and putchar, here is an example of getchar which does unbuffered input:

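#include <stdio.h>     /* for EOF */
#include <unistd.h>

#undef   getchar       /* in case stdio.h defines it as a macro */
#define  CMASK   0377  /* for making char's > 0 */

int getchar(void) /* unbuffered single character input */
{
   char c;

   return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}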

The variable c must be declared char, because read accepts a character pointer. The character returned must be masked with 0377 to ensure that it is positive; otherwise, sign extension may make it negative.
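The effect of the mask can be checked in isolation. On a machine where char is signed, a byte with all eight bits set is promoted to a negative int, which a caller could confuse with EOF (a negative value, commonly -1). The function name mask_char below is hypothetical and exists only to demonstrate the expression used in getchar.

```c
/* mask_char: hypothetical helper showing the masking step on its own.
   The result is always in the range 0 to 255, whether or not
   plain char is signed on this machine. */
int mask_char(char c)
{
   return c & 0377;   /* keep only the low eight bits */
}
```

With this mask, the byte 0377 comes back as 255 rather than as a negative value.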

The second version of getchar does input in big chunks, and hands out the characters one at a time.

#include <stdio.h>     /* for EOF */
#include <unistd.h>

#undef   getchar       /* in case stdio.h defines it as a macro */
#define  CMASK   0377  /* for making char's > 0 */
#define  BUFSIZE  512

int getchar(void)  /* buffered version */
{
   static char    buf[BUFSIZE];
   static char    *bufp = buf;
   static int     n = 0;

   if (n == 0)  {   /* buffer is empty */
      n = read(0, buf, BUFSIZE);
      bufp = buf;
   }
   return((--n >= 0) ? *bufp++ & CMASK : EOF);
}