data-infra / field-guide v1.0 · 12 lessons · ~3h · IC5 / staff

Data infrastructure, at system-design depth.

A field guide to the data stack at the level of an IC5 / staff system-design interview. Storage internals, streaming semantics, lakehouse formats, and the operational craft — built around interactive simulators of every concept that's usually drawn on a whiteboard.


12 lessons · 40+ live simulations · ~3h total · 1 mock interview at the end

the data stack · top → bottom

01 source

Where data is born.

App backends, mobile clients, IoT sensors, third-party APIs. Every event has a creator.

Postgres · iOS SDK · Stripe

02 log

The append-only spine.

An ordered, durable, partitioned log. Decouples producers from consumers. The cleanest abstraction in this whole stack.

Kafka · Kinesis · Pub/Sub
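
To make the log abstraction concrete, here is a minimal in-memory sketch of the idea: appends are ordered within a partition, and each consumer group tracks its own read offset, so producers never need to know who is reading. The class name `PartitionedLog`, its methods, and the CRC32 partitioner are hypothetical choices for illustration, not the API of Kafka, Kinesis, or Pub/Sub, and nothing here is durable.

```python
from collections import defaultdict
from zlib import crc32


class PartitionedLog:
    """Toy append-only log with consumer-owned offsets (illustrative only)."""

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions = [[] for _ in range(num_partitions)]
        # offsets[group][partition] is the next index that group will read.
        self.offsets = defaultdict(lambda: defaultdict(int))

    def append(self, key: str, value: str) -> tuple[int, int]:
        # The same key always hashes to the same partition,
        # so ordering is preserved per key, not globally.
        partition = crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def poll(self, group: str, partition: int, max_records: int = 10) -> list:
        # Reading never mutates the log; it only advances this group's offset.
        start = self.offsets[group][partition]
        records = self.partitions[partition][start:start + max_records]
        self.offsets[group][partition] = start + len(records)
        return records
```

Because offsets belong to the reader rather than the log, a hypothetical "billing" group and a "fraud" group can each read the same partition from the beginning without affecting one another.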

03 process

Where shape changes.

Stream jobs filter, enrich, window, aggregate. Batch jobs do the same, just on bounded data.

Flink · Spark · dbt
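
As a rough illustration of "window, aggregate", the sketch below counts events per key in fixed-size (tumbling) windows over a bounded list, which is the batch version of the operation. The function name `tumbling_window_counts` and the `(epoch_seconds, key)` event shape are assumptions made for this example; a streaming engine would additionally have to decide when a window is complete on unbounded input, which this sketch does not attempt.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds=60):
    """Count events per (window_start, key).

    events: iterable of (epoch_seconds, key) pairs (assumed shape).
    """
    counts = Counter()
    for ts, key in events:
        # Align the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)
```

With 60-second windows, events at seconds 0 and 59 land in the window starting at 0, while an event at second 60 opens the next window.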

04 store

Bytes that survive.

Object storage holds the raw data. Open table formats give it ACID. Indexes give it speed.

S3 · Iceberg · Parquet

05 serve

Sub-second answers.

OLAP engines for BI, vector stores for ML, key-value stores for online features.

Snowflake · Trino · DynamoDB

06 consume

The whole point.

Dashboards, ML features, billing, fraud, the recommender. The stack only matters because of this row.

Looker · Feature store · API