Berkeley DB Reference Guide:
Berkeley DB Replication


Building replicated applications

The simplest way to build a replicated Berkeley DB application is to first build (and debug!) the transactional version of the same application. Then, add a thin replication layer to the application. All highly available applications use the following additional four Berkeley DB methods: DB_ENV->rep_elect, DB_ENV->rep_process_message, DB_ENV->rep_start and DB_ENV->set_rep_transport and may also use the configuration method DB_ENV->set_rep_limit:

The DB_ENV->set_rep_transport method configures the replication system's communications infrastructure.
The DB_ENV->rep_start method configures (or reconfigures) an existing database environment to be a replication master or client.
The DB_ENV->rep_process_message method is used to process incoming messages from other environments in the replication group. For clients, it is responsible for accepting log records and updating the local databases based on messages from the master. For both the master and the clients, it is responsible for handling administrative functions (for example, the protocol for dealing with lost messages), and permitting new clients to join an active replication group. This method should only be called after the environment has been configured as a replication master or client via DB_ENV->rep_start.
The DB_ENV->rep_elect method causes the replication group to elect a new master; it is called whenever contact with the master is lost and the application wants the remaining sites to select a new master.
The DB_ENV->set_rep_limit method imposes an upper bound on the amount of data that will be sent in response to a single call to DB_ENV->rep_process_message. During client recovery, that is, when a replica site is trying to synchronize with the master, clients may ask the master for a large number of log records. If it is going to harm an application for the master message loop to remain busy for an extended period transmitting records to the replica, then the application will want to use DB_ENV->set_rep_limit to limit the amount of data the master will send before relinquishing control and accepting other messages.

To add replication to a Berkeley DB application, application initialization must be changed and the application's communications infrastructure must be written. The application initialization changes are relatively simple, but the communications infrastructure code can be complex.

For implementation reasons, all replicated databases must reside in the data directories set from DB_ENV->set_data_dir or in the default environment home directory. If your databases reside in the default environment home directory, they must be in the home directory itself, not subdirectories below the environment home. Care must be taken in applications using relative pathnames and changing working directories after opening the environment. In such applications the replication initialization code may not be able to locate the databases, and applications that change their working directories may need to use absolute pathnames.

During application initialization, the application performs three additional tasks: first, it must specify the DB_INIT_REP flag when opening its database environment; second, it must provide Berkeley DB information about its communications infrastructure; and third, it must start the Berkeley DB replication system. Generally, a replicated application will do normal Berkeley DB recovery and configuration, exactly like any other transactional application. Then, once the database environment has been opened, it will call the DB_ENV->set_rep_transport method to configure Berkeley DB for replication, and then will call the DB_ENV->rep_start method to join or create the replication group.

When calling DB_ENV->rep_start at application startup, the application has two choices: specifically configure the master for the replication group, or, alternatively, configure all group members as clients and then call an election, letting the clients select the master from among themselves. Either is correct, and the choice is entirely up to the application. The result of calling DB_ENV->rep_start is usually the discovery of a master, or the declaration of the local environment as the master. If a master has not been discovered after a reasonable amount of time, the application should call DB_ENV->rep_elect to call for an election.

Consider the case of multiple processes or multiple environment handles that modify databases in the replicated environment. All modifications must be done on the master environment. The first process to join or create the master environment must call both the DB_ENV->set_rep_transport method and the DB_ENV->rep_start method. Subsequent replication processes must at least call the DB_ENV->set_rep_transport method. Those processes may call the DB_ENV->rep_start method (as long as they use the same master or client argument). If multiple processes are modifying the master environment there must be a unified communication infrastructure such that messages arriving at clients have a single master ID. Additionally the application must be structured so that all incoming messages are able to be processed by a single DB_ENV handle.

Note that not all processes running in replicated environments need to call DB_ENV->set_rep_transport or DB_ENV->rep_start. Read-only processes running in a master environment do not need to be configured for replication in any way. Processes running in a client environment are read-only by definition, and so do not need to be configured for replication either (although, in the case of clients that may become masters, it is usually simplest to configure for replication on process startup rather than trying to reconfigure when the client becomes a master). Obviously, at least one thread of control on each client must be configured for replication as messages must be passed between the master and the client.

For implementation reasons, all incoming replication messages must be processed using the same DB_ENV handle. It is not required that a single thread of control process all messages, only that all threads of control processing messages use the same handle.

No additional calls are required to shut down a database environment participating in a replication group. The application should shut down the environment in the usual manner, by calling the DB_ENV->close method.


Copyright (c) 1996-2005 Sleepycat Software, Inc. - All rights reserved.