[2023-03] Mechanics of the database merge

This article provides details about the mechanics of Relevant’s migration to a single database. It is intended for Relevant project teams and health center IT administrators. See the main article, Merging the Staging and Production Databases, for context.

Behind the scenes, here’s how the database merge process will work. The following steps will be performed by Relevant, with advance notice to the health center’s Relevant project team. We estimate that the entire process will take less than one hour.

Step 1: Merge the databases

First, we perform the following steps:

Take a backup of relevant_production (herein “The Old Production Backup”)
Rename relevant_etl.public to relevant_etl.custom
Rename the relevant_etl database to relevant_production
Load the contents of The Old Production Backup into relevant_production
Rename relevant_production.public to relevant_production.rdm

At this point, we have a nice new relevant_production database, where:

rdm holds the previous contents of relevant_production’s public schema
custom holds the previous contents of relevant_etl’s public schema
Any other schemas that previously existed in relevant_etl are here, unchanged
Any other schemas that previously existed in relevant_production are unchanged

Step 2: Move tables into their new EHR-based schema

Now, we want to move some tables from “custom” into EHR-specific schemas. Doing this is a little trickier than it seems at first, because we don’t have a foolproof way of knowing where each table that’s currently in “custom” comes from. Here’s how we do it:

First, we configure the EHR of all data sources, by clicking into the “Data Source” configuration screens within the Relevant web application. Note: this is done for all data sources, whether or not the data source is currently used by an acquisition plan or not.
Then, we invoke some code to move any tables that are referenced in an acquisition plan that has run at least once. To identify the name of the schema to move it into, we use the EHR of the data source associated with the acquisition plan. Note: if a table name happens to be referenced in more than one acquisition plan, we’ll move it according to the data source specified in the acquisition plan which has run most recently.

After these changes are made, the nightly data pipeline will run according to its normal schedule.