Data Pipeline Overview

This article references the single-database warehouse structure, which rolled out in spring and summer 2023. If you wish to understand the previous dual-database (staging and production) structure, read more about the transition.

Relevant’s data pipeline is the nightly process that extracts data from the EHR (and potentially other sources) and then runs a variety of cleanup and transformation steps to prepare that data for accurate reporting. The following table describes the main steps of the data pipeline, the order in which they run, and what happens if they fail; simplified code sketches of these behaviors appear after the notes below.

| Stage¹ | What it does | When it runs | If a failure occurs² | Frequent causes of failure |
| --- | --- | --- | --- | --- |
| Data Acquisition | Copies data from source database(s) into Relevant’s data warehouse | As specified in an Acquisition Plan | Later pipeline stages are canceled. | See this support article for details and recommended fixes. |
| Transformers | Runs cleanup and transformation steps in the custom schema | After Data Acquisition completes | Later pipeline stages are canceled, as well as any dependent Transformers. | Check for SQL errors³ |
| Data Elements | Builds standard concepts in the rdm schema | After Transformers complete | Later pipeline stages are canceled, as well as any dependent Data Elements. | Check for SQL errors. In addition, check the data and mapping of source tables. |
| Populations | Groups patients into standard and custom populations | After Data Elements complete | Risk Models are canceled. Other pipeline stages are not affected. | Same as Data Elements |
| Care Gaps | Flags patients who satisfy various conditions | After Populations complete | Other pipeline stages are not affected. | Same as Data Elements |
| Risk Models | Calculates patient risk scores for various risk models | After Care Gaps complete | Other pipeline stages are not affected. | Same as Data Elements |
| Quality Measures | Calculates quality measure compliance⁴ | After Risk Models complete | Other pipeline stages are not affected. | Same as Data Elements |
| Custom Alerts | Runs SQL and emails users if a condition is met | After Quality Measures complete⁵ | Other pipeline stages are not affected. | Check for SQL errors. In addition, ensure that the Alert has at least one recipient configured. |
  1. Within each pipeline stage, tasks run in parallel, in batches of up to five. For example, during the Care Gaps stage, up to five Care Gaps will be calculated simultaneously. Transformers are run in an order that respects any dependencies that may exist between them; this order is calculated automatically by inspecting the SQL of each Transformer. The same is true for Data Elements. (The batching and ordering behavior is sketched in code after these notes.)

  2. When a pipeline task fails, the remaining tasks within that pipeline stage will still run, except where noted. For example: if a Care Gap fails, all other Care Gaps will still be run. If a Transformer fails, other Transformers will still run, except for any Transformers that depend on the failed Transformer. (This skip-on-failure behavior is included in the sketch below.)

  3. SQL syntax errors are checked when the pipeline task is saved. However, SQL runtime errors may still occur, for example: `PG::UndefinedColumn: ERROR: column "started_on" does not exist`. (A short demonstration of the difference follows these notes.)

  4. After calculating quality measure compliance at the patient level, aggregate compliance statistics are also calculated; these aggregate results are stored in the `fact_measure_results` table in the rdm schema. (An example query against this table is sketched below.)

  5. Custom Alerts which are configured with a “When to run” of “After pipeline finishes” will not run if Transformers or Data Elements fail. This is because if one of these early pipeline stages fails, you will likely receive an email about that, and we don’t want to inundate you with emails when something goes wrong. (A sketch of this gating rule appears below.)
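
The sketches below are minimal illustrations, not Relevant’s actual implementation. First, the stage order and cancellation behavior from the table above: each stage lists what gets canceled when it fails. All function and variable names are hypothetical.

```python
# Simplified model of the pipeline stages from the table above. The first
# three stages cancel everything downstream on failure; a Populations
# failure cancels only Risk Models; later stages cancel nothing.
STAGES = [
    ("Data Acquisition", "cancel_all_later"),
    ("Transformers", "cancel_all_later"),
    ("Data Elements", "cancel_all_later"),
    ("Populations", ["Risk Models"]),
    ("Care Gaps", []),
    ("Risk Models", []),
    ("Quality Measures", []),
    ("Custom Alerts", []),
]

def run_pipeline(run_stage):
    """run_stage is a hypothetical callable that raises if the stage fails."""
    names = [name for name, _ in STAGES]
    canceled = set()
    for i, (stage, on_failure) in enumerate(STAGES):
        if stage in canceled:
            continue  # canceled by an earlier failure
        try:
            run_stage(stage)
        except Exception:
            if on_failure == "cancel_all_later":
                canceled.update(names[i + 1:])
            else:
                canceled.update(on_failure)
    return canceled
```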
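
Notes 1 and 2 describe how tasks run inside a stage: dependency order, up to five at a time, and dependents of a failed task skipped. Here is one way to get that behavior with Python’s standard library, assuming the dependency graph has already been derived from each task’s SQL (task names and the `run` callable are hypothetical):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

def run_stage_tasks(deps, run):
    """deps maps each task to the set of tasks it depends on."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    failed = set()   # tasks that raised, plus tasks skipped because of them
    futures = {}
    with ThreadPoolExecutor(max_workers=5) as pool:  # batches of up to five
        while ts.is_active():
            for task in ts.get_ready():
                if failed & set(deps.get(task, ())):
                    failed.add(task)  # an upstream task failed: skip this one
                    ts.done(task)     # ...but mark it done to reach its dependents
                else:
                    futures[pool.submit(run, task)] = task
            if not futures:
                continue
            finished, _ = wait(list(futures), return_when=FIRST_COMPLETED)
            for fut in finished:
                task = futures.pop(fut)
                if fut.exception() is not None:
                    failed.add(task)  # its dependents will be skipped above
                ts.done(task)         # unblock downstream tasks either way
    return failed
```

For example, with `deps = {"b": {"a"}, "c": {"b"}}`, a failure in `a` returns `{"a", "b", "c"}`: `b` and `c` are skipped without ever running.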
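
Note 3’s distinction is that a query can parse cleanly (passing the save-time syntax check) yet still fail at run time. A minimal demonstration, assuming a reachable Postgres database, the psycopg2 driver, and a made-up table name:

```python
import psycopg2

# "SELECT started_on FROM visits" parses cleanly, so a save-time syntax check
# passes; execution still fails if the (hypothetical) visits table has no
# started_on column.
conn = psycopg2.connect("dbname=warehouse")  # hypothetical connection string
try:
    with conn, conn.cursor() as cur:
        cur.execute("SELECT started_on FROM visits")
except psycopg2.errors.UndefinedColumn as exc:
    print(f"runtime error: {exc}")  # column "started_on" does not exist
finally:
    conn.close()
```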
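
To inspect the aggregate results mentioned in note 4, a query along these lines could work. Only the table name comes from this article; the connection string is hypothetical, and `SELECT *` is used because the column layout is not documented here.

```python
import psycopg2

# Hypothetical DSN; rdm.fact_measure_results is named in note 4 above.
with psycopg2.connect("dbname=warehouse") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM rdm.fact_measure_results LIMIT 10")
        for row in cur.fetchall():
            print(row)
```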
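
Finally, the gating rule from note 5, as a sketch (the function and field names are hypothetical; the stage names and the “After pipeline finishes” setting are from this article):

```python
EARLY_STAGES = {"Transformers", "Data Elements"}

def should_send_alert(when_to_run, failed_stages):
    """Suppress end-of-pipeline alerts when an early stage already failed."""
    if when_to_run == "After pipeline finishes":
        return not (EARLY_STAGES & failed_stages)
    return True

# A failed Transformer already generates its own email, so this alert is skipped:
assert should_send_alert("After pipeline finishes", {"Transformers"}) is False
```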