How to make your data pipelines more reliable with precision and recall

Barr Moses & Ryan Kearns

Image courtesy of Edvin Richardson on Pexels

Data pipelines can break for a million different reasons, but how can we ensure that this “data downtime” is identified and addressed in real time? Sometimes, all it takes is some SQL, a Jupyter Notebook, and a bit of machine learning.

In this article series, we walk through how you can create your own data observability monitors from scratch, mapping to the five key pillars of data health. Part I can be found here, and Part II can be found here.

Part III of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training…


Using metadata to unlock context and take your data quality monitors to the next level

Barr Moses & Ryan Kearns

Image courtesy of Lucas Pezeta on Pexels.

In this article series, we walk through how you can create your own data observability monitors from scratch, mapping to five key pillars of data health. Part 1 can be found here.

Part 2 of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training, Managing Data Downtime: Applying Observability to Your Data Pipelines, the industry’s first-ever course on data observability. The associated exercises are available here, and the adapted code shown in this article is available here.

As the world’s appetite for data increases, robust data pipelines are all the more imperative…


How to build your own data monitors to identify freshness and distribution anomalies in your data pipelines

In this article series, we walk through how you can create your own data observability monitors from scratch, mapping to five key pillars of data health. Part 1 of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training, Managing Data Downtime: Applying Observability to Your Data Pipelines, the industry’s first-ever course on data observability. The associated exercises are available here, and the adapted code shown in this article is available here.

From null values and duplicate rows to modeling errors and schema changes, data can break for many reasons.

Ryan Kearns

Stanford Philosophy and Computer Science, c/o ’22. Engineer & Data Scientist at Monte Carlo.
