Level Up Coding

Data Observability: Building Your Own Data Quality Monitors Using SQL

Ryan Kearns
Published in Level Up Coding
10 min read · Jan 28, 2021

Data Observability in practice

$ sqlite3 EXOPLANETS.db
sqlite> PRAGMA TABLE_INFO(EXOPLANETS);
0 | _id | TEXT | 0 | | 0
1 | distance | REAL | 0 | | 0
2 | g | REAL | 0 | | 0
3 | orbital_period | REAL | 0 | | 0
4 | avg_temp | REAL | 0 | | 0
5 | date_added | TEXT | 0 | | 0
sqlite> SELECT * FROM EXOPLANETS LIMIT 5;

Freshness

Plotting rows_added vs. date_added
Plotting days_since_last_update vs. date_added
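The two quantities named in these captions can be derived from `date_added` alone. The following is a minimal sketch rather than the article's original query: it assumes `date_added` holds ISO-8601 date strings and an SQLite build with window functions (3.25 or later), and it runs against an in-memory copy of the `EXOPLANETS` schema filled with made-up rows. The names `FRESHNESS_SQL`, `sample_dates`, and `freshness_rows` are hypothetical.

```python
import sqlite3

# Schema taken from the PRAGMA output above; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EXOPLANETS (
        _id TEXT, distance REAL, g REAL,
        orbital_period REAL, avg_temp REAL, date_added TEXT
    )
    """
)
sample_dates = ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-06", "2020-01-07"]
conn.executemany(
    "INSERT INTO EXOPLANETS (_id, date_added) VALUES (?, ?)",
    [(f"planet-{i}", day) for i, day in enumerate(sample_dates)],
)

# One row per day that received data, plus the gap in days since the
# previous such day. The first day has no predecessor, so its gap is NULL.
FRESHNESS_SQL = """
WITH updates AS (
    SELECT DATE(date_added) AS day, COUNT(*) AS rows_added
    FROM EXOPLANETS
    GROUP BY DATE(date_added)
)
SELECT
    day,
    rows_added,
    JULIANDAY(day) - JULIANDAY(LAG(day) OVER (ORDER BY day)) AS days_since_last_update
FROM updates
ORDER BY day
"""

freshness_rows = conn.execute(FRESHNESS_SQL).fetchall()
for day, rows_added, gap in freshness_rows:
    print(day, rows_added, gap)
```

Days with no new rows do not appear in the result at all, which is why the gap column, rather than `rows_added`, is the more useful signal for freshness.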
Freshness detections!
Freshness detections
Note the two undetected outages: these gaps must have lasted fewer than 3 days.
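A detector of this kind can be sketched as a filter on the gap between consecutive update days. This is an illustration under stated assumptions, not the article's exact query: the 3-day cutoff is inferred from the caption above, the strict `>` comparison is a guess, and `GAP_SQL`, `find_freshness_gaps`, `max_gap_days`, and the sample dates are all hypothetical. It also assumes SQLite 3.25 or later for `LAG`.

```python
import sqlite3

# Report every update day whose gap since the previous update day exceeds
# the threshold bound to the single `?` parameter.
GAP_SQL = """
WITH updates AS (
    SELECT DISTINCT DATE(date_added) AS day
    FROM EXOPLANETS
),
gaps AS (
    SELECT
        day,
        JULIANDAY(day) - JULIANDAY(LAG(day) OVER (ORDER BY day)) AS days_since_last_update
    FROM updates
)
SELECT day, days_since_last_update
FROM gaps
WHERE days_since_last_update > ?
ORDER BY day
"""


def find_freshness_gaps(conn, max_gap_days=3):
    """Return (day, gap_in_days) for each gap longer than max_gap_days."""
    return conn.execute(GAP_SQL, (max_gap_days,)).fetchall()


# Made-up data: consecutive update days are 2, 5, and 1 days apart.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EXOPLANETS (_id TEXT, date_added TEXT)")
sample_dates = ["2020-02-01", "2020-02-03", "2020-02-08", "2020-02-09"]
conn.executemany(
    "INSERT INTO EXOPLANETS (_id, date_added) VALUES (?, ?)",
    [(f"planet-{i}", day) for i, day in enumerate(sample_dates)],
)

detections = find_freshness_gaps(conn)
print(detections)
```

With the default threshold, only the 5-day gap is reported and the 2-day gap slips through, which mirrors the undetected outages in the caption. Lowering `max_gap_days` catches shorter outages at the cost of more alerts.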

Distribution

Our first distribution anomalies.
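One simple way to look for distribution anomalies in SQL is to track, per day, the share of rows in which a field is NULL and to flag days where that share is unusually high. The sketch below is an assumption about what such a monitor could look like, not a reproduction of the article's query: the choice of `avg_temp`, the 0.9 cutoff, the sample rows, and the names `NULL_RATE_SQL` and `find_null_rate_anomalies` are all hypothetical.

```python
import sqlite3

# Per-day NULL rate of avg_temp, keeping only days above the threshold
# bound to the single `?` parameter.
NULL_RATE_SQL = """
WITH daily AS (
    SELECT
        DATE(date_added) AS day,
        CAST(SUM(CASE WHEN avg_temp IS NULL THEN 1 ELSE 0 END) AS REAL)
            / COUNT(*) AS null_rate
    FROM EXOPLANETS
    GROUP BY DATE(date_added)
)
SELECT day, null_rate
FROM daily
WHERE null_rate > ?
ORDER BY day
"""


def find_null_rate_anomalies(conn, max_null_rate=0.9):
    """Return (day, null_rate) for days whose avg_temp NULL rate is too high."""
    return conn.execute(NULL_RATE_SQL, (max_null_rate,)).fetchall()


# Made-up data: NULL rates of 0.0, 0.5, and 1.0 on three consecutive days.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EXOPLANETS (_id TEXT, avg_temp REAL, date_added TEXT)")
sample_rows = [
    ("planet-0", 250.0, "2020-03-01"),
    ("planet-1", 300.0, "2020-03-01"),
    ("planet-2", None, "2020-03-02"),
    ("planet-3", 280.0, "2020-03-02"),
    ("planet-4", None, "2020-03-03"),
    ("planet-5", None, "2020-03-03"),
]
conn.executemany(
    "INSERT INTO EXOPLANETS (_id, avg_temp, date_added) VALUES (?, ?, ?)",
    sample_rows,
)

anomalies = find_null_rate_anomalies(conn)
print(anomalies)
```

A fixed cutoff is the bluntest possible rule. It works when a field is normally well populated, and it would need tuning per column in a real table.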

What’s next?

Written by Ryan Kearns

Data Scientist at Monte Carlo. Previously Stanford Phil. & CS; Stanford Open Virtual Assistant Lab. I cannot pass a Turing test, would you like to play chess?
