Speeding up the Data Pipeline

If the Data Pipeline takes too long to complete, the steps in this article can help you speed it up.

First, find out which step is slow. Go to Data Pipeline > Monitor Pipeline and, in the row labeled “Main Pipeline”, click “View Details”.

If Data Acquisition is slow

If a few tables take far longer to acquire than others, consider importing these slow tables less frequently. To do so, configure a separate Acquisition Plan which runs once per week for these tables, and remove them from the main daily Acquisition Plan.

If the entire data acquisition is slow, the details and troubleshooting steps will depend on the configuration of your acquisition. If your health center hosts the source database and the Data Acquisition Agent, check for the following:

  • Performance issues (such as CPU or RAM spikes) for the machines that host the source database or the Data Acquisition Agent;
  • A slow connection between the machines that host the source database and the Data Acquisition Agent;
  • A slow connection from the machine that hosts the Data Acquisition Agent to the Relevant application, which lives on Relevant’s infrastructure in Google Cloud.

The health center’s IT department, or the IT vendor that manages the source database, will need to troubleshoot these issues. We recommend testing network latency between the three endpoints listed above (the source database machine, the Data Acquisition Agent machine, and the Relevant application); reducing the number of running processes on the Data Acquisition Agent machine and the source database machine; or increasing the CPU and RAM for both machines.
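As a starting point for the latency test, the sketch below times TCP connections from whichever machine it runs on to another endpoint. It is only an illustration: the function name, host names, and ports are hypothetical and not part of Relevant, and your IT team may prefer tools such as ping or traceroute.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, attempts=5, timeout=3.0):
    """Return the median TCP connect time in milliseconds.

    Returns None if every attempt failed (host unreachable, port closed,
    or the connection timed out).
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # Open and immediately close a connection; only the handshake is timed.
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000)
        except OSError:
            continue
    return statistics.median(samples) if samples else None


# Example calls, run from the Data Acquisition Agent machine.
# Both host names and ports are placeholders; substitute your own.
#   tcp_connect_latency_ms("source-db.example.internal", 1433)
#   tcp_connect_latency_ms("your-instance.example.org", 443)
```

Running the same check at different times of day, and comparing the two hops against each other, can help show whether the slow link is inside the health center's network or on the path to Relevant.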

If all Quality Measures are slow

If all Quality Measures take a long time to run, consider reducing the number of measurement periods that are computed. By default, measures run the last 3 years’ worth of measurement periods.

To speed up all Measures, set up Fast Runs as described in Configuring Measure Runs. Fast Runs compute fewer measurement periods on most days, while the full set still runs on selected days (we recommend one day per week, generally Sunday) to update older measurement periods.
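To make the trade-off concrete, here is a minimal sketch of the scheduling idea behind Fast Runs. It is not Relevant's configuration format or implementation; the function name, the fast_count parameter, and the period labels are all assumptions made for illustration.

```python
from datetime import date

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def periods_to_run(today, periods, fast_count, full_run_weekday=SUNDAY):
    """Pick which measurement periods to compute today.

    periods is ordered oldest to newest. On the full-run weekday every
    period is computed; on other days only the newest fast_count are.
    """
    if fast_count <= 0:
        raise ValueError("fast_count must be at least 1")
    if today.weekday() == full_run_weekday:
        return list(periods)
    return list(periods[-fast_count:])


# Hypothetical example: 36 monthly periods covering 3 years.
all_periods = [f"period-{n:02d}" for n in range(1, 37)]
weekday_run = periods_to_run(date(2024, 1, 8), all_periods, fast_count=3)  # a Monday
sunday_run = periods_to_run(date(2024, 1, 7), all_periods, fast_count=3)   # a Sunday
```

Under these assumed numbers, a week computes 36 + 6 × 3 = 54 period runs instead of 7 × 36 = 252, while older periods are still refreshed once a week.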

If individual Transformers, Data Elements, Care Gaps, or Quality Measures are slow

If one or a few Transformers, Data Elements, Care Gaps, or Quality Measures take a long time to run, the SQL behind them may be inefficient. Check that the columns used in each join are indexed, and look for other refactors that would increase speed or decrease memory usage.
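One way to check whether a join can use an index is to ask the database for its query plan before and after adding the index. The sketch below shows the idea with Python's built-in sqlite3 module as a stand-in. The tables, columns, and index name are hypothetical, and the database your SQL actually runs against will have its own EXPLAIN syntax and output.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]


# Hypothetical tables standing in for the inputs of a slow Transformer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (id INTEGER PRIMARY KEY, patient_id INTEGER, visit_date TEXT);
""")

JOIN_SQL = """
    SELECT p.name, v.visit_date
    FROM patients p
    JOIN visits v ON v.patient_id = p.id
    WHERE p.id = 42
"""

# Plan with no index on the join column visits.patient_id.
plan_before = query_plan(conn, JOIN_SQL)

# Index the join column, then ask for the plan again.
conn.execute("CREATE INDEX idx_visits_patient_id ON visits (patient_id)")
plan_after = query_plan(conn, JOIN_SQL)

print("before:", plan_before)
print("after: ", plan_after)
```

If the index name appears in the second plan but not the first, the join is now using it. The same before-and-after comparison applies to any join column in the SQL you are tuning.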

This article lists only some of the possible interventions for a slow Data Pipeline. For additional support on speeding up the Data Pipeline, contact us at support@relevant.healthcare.