Level Up Coding

Coding tutorials and news. The developer homepage gitconnected.com && skilled.dev && levelup.dev

Follow publication

Data Observability: Building Your Own Anomaly Detectors Using SQL

Ryan Kearns
Level Up Coding
Published in
12 min readFeb 11, 2021

--

Image courtesy of Lucas Pezeta on Pexels.

Our Data Environment

$ sqlite3 EXOPLANETS.db
sqlite> PRAGMA TABLE_INFO(EXOPLANETS);
0 | _id | TEXT | 0 | | 0
1 | distance | REAL | 0 | | 0
2 | g | REAL | 0 | | 0
3 | orbital_period | REAL | 0 | | 0
4 | avg_temp | REAL | 0 | | 0
5 | date_added | TEXT | 0 | | 0
sqlite> SELECT DATE_ADDED FROM EXOPLANETS ORDER BY DATE_ADDED DESC LIMIT 1;2020–07–18
sqlite> SELECT DATE_ADDED FROM EXOPLANETS_EXTENDED ORDER BY DATE_ADDED ASC LIMIT 1;
2020–01–01
sqlite> SELECT DATE_ADDED FROM EXOPLANETS_EXTENDED ORDER BY DATE_ADDED DESC LIMIT 1;
2020–09–06

Visualizing Schema Changes

sqlite> PRAGMA TABLE_INFO(EXOPLANETS_EXTENDED);
0 | _ID | VARCHAR(16777216) | 1 | | 0
1 | DISTANCE | FLOAT | 0 | | 0
2 | G | FLOAT | 0 | | 0
3 | ORBITAL_PERIOD | FLOAT | 0 | | 0
4 | AVG_TEMP | FLOAT | 0 | | 0
5 | DATE_ADDED | TIMESTAMP_NTZ(6) | 1 | | 0
6 | ECCENTRICITY | FLOAT | 0 | | 0
7 | ATMOSPHERE | VARCHAR(16777216) | 0 | | 0
sqlite> SELECT
...> DATE_ADDED,
...> ECCENTRICITY,
...> ATMOSPHERE
...> FROM
...> EXOPLANETS_EXTENDED
...> ORDER BY
...> DATE_ADDED ASC
...> LIMIT 10;
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
2020–01–01 | |
sqlite> PRAGMA TABLE_INFO(EXOPLANETS_COLUMNS);
0 | DATE | TEXT | 0 | | 0
1 | COLUMNS | TEXT | 0 | | 0
sqlite> SELECT * FROM EXOPLANETS_COLUMNS ORDER BY DATE ASC LIMIT 1;
2020–01–01 | [
(0, ‘_id’, ‘TEXT’, 0, None, 0),
(1, ‘distance’, ‘REAL’, 0, None, 0),
(2, ‘g’, ‘REAL’, 0, None, 0),
(3, ‘orbital_period’, ‘REAL’, 0, None, 0),
(4, ‘avg_temp’, ‘REAL’, 0, None, 0),
(5, ‘date_added’, ‘TEXT’, 0, None, 0)
]
sqlite> SELECT * FROM EXOPLANETS_COLUMNS ORDER BY DATE DESC LIMIT 1;
2020–09–06 | [
(0, ‘_id’, ‘TEXT’, 0, None, 0),
(1, ‘distance’, ‘REAL’, 0, None, 0),
(2, ‘g’, ‘REAL’, 0, None, 0),
(3, ‘orbital_period’, ‘REAL’, 0, None, 0),
(4, ‘avg_temp’, ‘REAL’, 0, None, 0),
(5, ‘date_added’, ‘TEXT’, 0, None, 0),
(6, ‘eccentricity’, ‘REAL’, 0, None, 0),
(7, ‘atmosphere’, ‘TEXT’, 0, None, 0)
]
DATE:         2020–07–19
NEW_COLUMNS: [
(0, ‘_id’, ‘TEXT’, 0, None, 0),
(1, ‘distance’, ‘REAL’, 0, None, 0),
(2, ‘g’, ‘REAL’, 0, None, 0),
(3, ‘orbital_period’, ‘REAL’, 0, None, 0),
(4, ‘avg_temp’, ‘REAL’, 0, None, 0),
(5, ‘date_added’, ‘TEXT’, 0, None, 0),
(6, ‘eccentricity’, ‘REAL’, 0, None, 0),
(7, ‘atmosphere’, ‘TEXT’, 0, None, 0)
]
PAST_COLUMNS: [
(0, ‘_id’, ‘TEXT’, 0, None, 0),
(1, ‘distance’, ‘REAL’, 0, None, 0),
(2, ‘g’, ‘REAL’, 0, None, 0),
(3, ‘orbital_period’, ‘REAL’, 0, None, 0),
(4, ‘avg_temp’, ‘REAL’, 0, None, 0),
(5, ‘date_added’, ‘TEXT’, 0, None, 0)
]

Visualizing Lineage

Introducing: HABITABLES

sqlite> PRAGMA TABLE_INFO(HABITABLES);
0 | _id | TEXT | 0 | | 0
1 | perihelion | REAL | 0 | | 0
2 | aphelion | REAL | 0 | | 0
3 | atmosphere | TEXT | 0 | | 0
4 | habitability | REAL | 0 | | 0
5 | min_temp | REAL | 0 | | 0
6 | max_temp | REAL | 0 | | 0
7 | date_added | TEXT | 0 | | 0
sqlite> SELECT * FROM HABITABLES LIMIT 5;

Investigating an Anomaly

$ sqlite3 EXOPLANETS.db < queries/lineage/habitability-zero-rate-detector.sql
DATE_ADDED | HABITABILITY_ZERO_RATE | PREV_HABITABILITY_ZERO_RATE
2020–07–19 | 0.369047619047619 | 0.0

What’s next?

--

--

Written by Ryan Kearns

Data Scientist at Monte Carlo. Previously Stanford Phil. & CS; Stanford Open Virtual Assistant Lab. I cannot pass a Turing test, would you like to play chess?

No responses yet

Write a response