Berkeley DB Reference Guide: Building replicated applications

Building replicated applications

The simplest way to build a replicated Berkeley DB application is to first build (and debug!) the transactional version of the same application. Then, add a thin replication layer: application initialization must be changed and the application's communication infrastructure must be added.

The application initialization changes are relatively simple. Replication Manager provides a communication infrastructure, but in order to use the Base replication API you must provide your own.

For implementation reasons, all replicated databases must reside in the data directories set from DB_ENV->set_data_dir (or in the default environment home directory, if not using DB_ENV->set_data_dir), rather than in a subdirectory below the specified directory. Care must be taken in applications using relative pathnames and changing working directories after opening the environment. In such applications the replication initialization code may not be able to locate the databases, and applications that change their working directories may need to use absolute pathnames.
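The pathname rules above can be checked before a database is opened. The following is a minimal sketch, assuming POSIX-style pathnames; the helper names `is_bare_db_name` and `is_absolute_home` are hypothetical and are not part of Berkeley DB.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if name has no directory component, so
 * the database would live directly in the environment home (or in a
 * configured data directory) rather than in a subdirectory below it.
 * Returns 0 otherwise.
 */
int
is_bare_db_name(const char *name)
{
	if (name == NULL || name[0] == '\0')
		return (0);
	if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
		return (0);
	return (strchr(name, '/') == NULL && strchr(name, '\\') == NULL);
}

/*
 * Hypothetical helper: returns 1 if home is an absolute POSIX pathname.
 * An absolute environment home keeps working after the process changes
 * its working directory.
 */
int
is_absolute_home(const char *home)
{
	return (home != NULL && home[0] == '/');
}
```

An application might reject a database name such as "sub/accounts.db" with the first helper, and use the second to decide whether a configured home directory needs to be converted to an absolute pathname before the environment is opened.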

During application initialization, the application performs three additional tasks. First, it must specify the DB_INIT_REP flag when opening its database environment; a Replication Manager application must also specify the DB_THREAD flag. Second, it must provide Berkeley DB with information about its communications infrastructure. Third, it must start the Berkeley DB replication system. Beyond these steps, a replicated application generally performs normal Berkeley DB recovery and configuration, exactly like any other transactional application.

Replication Manager applications configure the built-in communications infrastructure by calling the DB_ENV->repmgr_set_local_site method once and the DB_ENV->repmgr_add_remote_site method zero or more times. Once the environment has been opened, the application starts the replication system by calling the DB_ENV->repmgr_start method.

If using the Base replication API, the application calls the DB_ENV->rep_set_transport method to configure the entry point to its own communications infrastructure, and then it calls the DB_ENV->rep_start method to join or create the replication group.

When starting the replication system, an application has two choices: it may choose the group master site explicitly, or alternatively it may configure all group members as clients and then call for an election, letting the clients select the master from among themselves. Either is correct, and the choice is entirely up to the application.

For an application that uses the Base replication API, the result of calling DB_ENV->rep_start is usually the discovery of a master, or the declaration of the local environment as the master. If a master has not been discovered after a reasonable amount of time, the application should call DB_ENV->rep_elect to call for an election.

Replication Manager applications have these same two choices, but they configure their start-up behavior simply by setting the flags parameter to the DB_ENV->repmgr_start method.

Consider the case of multiple processes or multiple environment handles that modify databases in the replicated environment. All modifications must be done on the master environment. The first process to join or create the master environment must call both the DB_ENV->rep_set_transport method and the DB_ENV->rep_start method. Subsequent replication processes must at least call the DB_ENV->rep_set_transport method. Those processes may also call the DB_ENV->rep_start method (as long as they use the same master or client argument). If multiple processes are modifying the master environment, there must be a unified communication infrastructure such that messages arriving at clients have a single master ID. Additionally, the application must be structured so that all incoming messages can be processed by a single DB_ENV handle.

Note that not all processes running in replicated environments need to call DB_ENV->rep_set_transport or DB_ENV->rep_start. Read-only processes running in a master environment do not need to be configured for replication in any way. Processes running in a client environment are read-only by definition, and so do not need to be configured for replication either (although, in the case of clients that may become masters, it is usually simplest to configure for replication on process startup rather than trying to reconfigure when the client becomes a master). Obviously, at least one thread of control on each client must be configured for replication as messages must be passed between the master and the client.

Any site in a replication group may have its own private transactional databases in the environment as well. A site may create a local database by specifying the DB_TXN_NOT_DURABLE flag to the DB->set_flags method. The application must never create a private database with the same name as a database replicated across the entire environment, as data corruption can result.

For implementation reasons, all incoming replication messages must be processed using the same DB_ENV handle. It is not required that a single thread of control process all messages, only that all threads of control processing messages use the same handle.

No additional calls are required to shut down a database environment participating in a replication group. The application should shut down the environment in the usual manner, by calling the DB_ENV->close method. For Replication Manager applications, this also terminates all network connections and background processing threads.