Data Pipeline Overview

This article references the single-database warehouse structure, which rolled out in spring and summer 2023. If you wish to understand the previous dual-database (staging and production) structure, read more about the transition.

Relevant’s data pipeline is the nightly process that extracts data from the EHR (and potentially other sources) and then runs a variety of cleanup and transformation steps to prepare that data for accurate reporting. The following table describes the main steps of the data pipeline, the order in which they run, and what happens if they fail; simplified code sketches of these behaviors appear after the notes below.

| Stage¹ | What it does | When it runs | If a failure occurs² | Frequent causes of failure |
| --- | --- | --- | --- | --- |
| Data Acquisition | Copies data from source database(s) into Relevant’s data warehouse | As specified in an Acquisition Plan | Later pipeline stages are canceled. | See this support article for details and recommended fixes. |
| Transformers | Runs cleanup and transformation steps in the custom schema | After Data Acquisition completes | Later pipeline stages are canceled, as well as any dependent Transformers. | Check for SQL errors³ |
| Data Elements | Builds standard concepts in the rdm schema | After Transformers complete | Later pipeline stages are canceled, as well as any dependent Data Elements. | Check for SQL errors. In addition, check the data and mapping of source tables. |
| Populations | Groups patients into standard and custom populations | After Data Elements complete | Risk Models are canceled. Other pipeline stages are not affected. | Same as Data Elements |
| Care Gaps | Flags patients who satisfy various conditions | After Populations complete | Other pipeline stages are not affected. | Same as Data Elements |
| Risk Models | Calculates patient risk scores for various risk models | After Care Gaps complete | Other pipeline stages are not affected. | Same as Data Elements |
| Quality Measures | Calculates quality measure compliance⁴ | After Risk Models complete | Other pipeline stages are not affected. | Same as Data Elements |
| Custom Alerts | Runs SQL and emails users if a condition is met | After Quality Measures complete⁵ | Other pipeline stages are not affected. | Check for SQL errors. In addition, ensure that the Alert has at least one recipient configured. |
  1. Within each pipeline stage, tasks run in parallel, in batches of up to five. For example, during the Care Gaps stage, up to five Care Gaps will be calculated simultaneously. Transformers are run in an order that respects any dependencies that may exist between them; this order is calculated automatically by inspecting the SQL of each Transformer. The same is true for Data Elements. (The batching and ordering behavior is sketched in code after these notes.)

  2. When a pipeline task fails, the remaining tasks within that pipeline stage will still run, except where noted. For example: if a Care Gap fails, all other Care Gaps will still be run. If a Transformer fails, other Transformers will still run, except for any Transformers that depend on the failed Transformer. (This skip-on-failure behavior is included in the sketch below.)

  3. SQL syntax errors are checked when the pipeline task is saved. However, SQL runtime errors may still occur, for example: `PG::UndefinedColumn: ERROR: column "started_on" does not exist`. (A short demonstration of the difference follows these notes.)

  4. After calculating quality measure compliance at the patient level, aggregate compliance statistics are also calculated; these aggregate results are stored in the `fact_measure_results` table in the rdm schema. (An example query against this table is sketched below.)

  5. Custom Alerts which are configured with a “When to run” of “After pipeline finishes” will not run if Transformers or Data Elements fail. This is because if one of these early pipeline stages fails, you will likely receive an email about that, and we don’t want to inundate you with emails when something goes wrong. (A sketch of this gating rule appears below.)
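
The sketches below are minimal illustrations, not Relevant’s actual implementation. First, the stage order and cancellation behavior from the table above: each stage lists what gets canceled when it fails. All function and variable names are hypothetical.

```python
# Simplified model of the pipeline stages from the table above. The first
# three stages cancel everything downstream on failure; a Populations
# failure cancels only Risk Models; later stages cancel nothing.
STAGES = [
    ("Data Acquisition", "cancel_all_later"),
    ("Transformers", "cancel_all_later"),
    ("Data Elements", "cancel_all_later"),
    ("Populations", ["Risk Models"]),
    ("Care Gaps", []),
    ("Risk Models", []),
    ("Quality Measures", []),
    ("Custom Alerts", []),
]

def run_pipeline(run_stage):
    """run_stage is a hypothetical callable that raises if the stage fails."""
    names = [name for name, _ in STAGES]
    canceled = set()
    for i, (stage, on_failure) in enumerate(STAGES):
        if stage in canceled:
            continue  # canceled by an earlier failure
        try:
            run_stage(stage)
        except Exception:
            if on_failure == "cancel_all_later":
                canceled.update(names[i + 1:])
            else:
                canceled.update(on_failure)
    return canceled
```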
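
Notes 1 and 2 describe how tasks run inside a stage: dependency order, up to five at a time, and dependents of a failed task skipped. Here is one way to get that behavior with Python’s standard library, assuming the dependency graph has already been derived from each task’s SQL (task names and the `run` callable are hypothetical):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_stage_tasks(deps, run):
    """deps maps each task to the set of tasks it depends on."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    failed = set()   # tasks that raised, plus tasks skipped because of them
    futures = {}
    with ThreadPoolExecutor(max_workers=5) as pool:  # batches of up to five
        while ts.is_active():
            for task in ts.get_ready():
                if failed & set(deps.get(task, ())):
                    failed.add(task)  # an upstream task failed: skip this one
                    ts.done(task)     # ...but mark it done to reach its dependents
                else:
                    futures[pool.submit(run, task)] = task
            if not futures:
                continue
            finished, _ = wait(list(futures), return_when=FIRST_COMPLETED)
            for fut in finished:
                task = futures.pop(fut)
                if fut.exception() is not None:
                    failed.add(task)  # its dependents will be skipped above
                ts.done(task)         # unblock downstream tasks either way
    return failed
```

For example, with `deps = {"b": {"a"}, "c": {"b"}}`, a failure in `a` returns `{"a", "b", "c"}`: `b` and `c` are skipped without ever running.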
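
Note 3’s distinction is that a query can parse cleanly (passing the save-time syntax check) yet still fail at run time. A minimal demonstration, assuming a reachable Postgres database, the psycopg2 driver, and a made-up table name:

```python
import psycopg2

# "SELECT started_on FROM visits" parses cleanly, so a save-time syntax check
# passes; execution still fails if the (hypothetical) visits table has no
# started_on column.
conn = psycopg2.connect("dbname=warehouse")  # hypothetical connection string
try:
    with conn, conn.cursor() as cur:
        cur.execute("SELECT started_on FROM visits")
except psycopg2.errors.UndefinedColumn as exc:
    print(f"runtime error: {exc}")  # column "started_on" does not exist
finally:
    conn.close()
```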
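
To inspect the aggregate results mentioned in note 4, a query along these lines could work. Only the table name comes from this article; the connection string is hypothetical, and `SELECT *` is used because the column layout is not documented here.

```python
import psycopg2

# Hypothetical DSN; rdm.fact_measure_results is named in note 4 above.
with psycopg2.connect("dbname=warehouse") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM rdm.fact_measure_results LIMIT 10")
        for row in cur.fetchall():
            print(row)
```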
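
Finally, the gating rule from note 5, as a sketch (the function and field names are hypothetical; the stage names and the “After pipeline finishes” setting are from this article):

```python
EARLY_STAGES = {"Transformers", "Data Elements"}

def should_send_alert(when_to_run, failed_stages):
    """Suppress end-of-pipeline alerts when an early stage already failed."""
    if when_to_run == "After pipeline finishes":
        return not (EARLY_STAGES & failed_stages)
    return True

# A failed Transformer already generates its own email, so this alert is skipped:
assert should_send_alert("After pipeline finishes", {"Transformers"}) is False
```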