Streaming systems compute over unbounded input. There is no "end of file." The consumer must produce useful answers continuously, with bounded latency, while new data keeps arriving — and arriving out-of-order, and arriving late, and sometimes never arriving at all.
The whole field is engineered around two clocks that disagree: event time (when the thing happened in the real world, encoded in the event) and processing time (when your system saw it). They diverge constantly. A user's phone goes offline at 14:32, queues 200 events, comes back online at 14:47 and uploads them all at once. Event time of those events: 14:32. Processing time: 14:47. Your stream job has to choose which one to use, and the choice is almost always event time.
Kafka: the canonical log
Five concepts fit on the back of a napkin:
- Topic — a named append-only log.
orders,page_views. - Partition — a topic is split into N partitions; each partition is an independent ordered log. Ordering is per-partition, not per-topic.
- Producer — writes records to a topic. Picks a partition by hashing a key, or round-robin if no key.
- Consumer group — a set of consumer processes that share the work of reading a topic. Kafka assigns each partition to exactly one consumer in the group. Add consumers, work spreads. Up to the partition count, after which extra consumers idle.
- Offset — each consumer group remembers where it last read in each partition. Read is non-destructive; rewinding to an old offset replays history.
user_id=42 always land in the same partition, so they're ordered. Two events for different users may interleave.Event time vs processing time
Imagine a "count of events per minute" job. At 14:35 (processing time) you see an event with event-time 14:32. Does it count toward the 14:32 bucket (event time) or the 14:35 bucket (processing time)? Almost always you want event time — that's what business users mean. But event time is hard, because you don't know when the event-time minute is "complete." An event for 14:32 might still arrive at 14:50.
This is what watermarks exist to solve. A watermark is the system's promise: "I think I've seen all events with event-time ≤ T." When the watermark passes 14:33, the engine emits the result for the 14:32 minute and assumes it's done. Late data that arrives after the watermark is either dropped, sent to a dead-letter, or triggers a "late update" that revises the result.
The watermark, watched
Window types
| Window | Shape | Use for |
|---|---|---|
| Tumbling | Fixed, non-overlapping (every minute, every hour) | "events per minute" |
| Hopping (Sliding) | Fixed, overlapping (every 30s, sized 5min) | moving averages, smooth dashboards |
| Session | Variable, gap-based (closes after T seconds of inactivity) | user sessions, IoT bursts |
| Global | One window forever; uses custom triggers | running totals with manual flush |
Delivery semantics: at-most-once, at-least-once, exactly-once
Every streaming system makes a promise about what happens when something fails. The three levels, in ascending complexity:
- At-most-once. Each message is delivered zero or one times. If the consumer crashes after reading but before processing, the message is lost. The consumer advances its offset before processing. Simplest, but data loss is possible. Acceptable only for non-critical metrics (e.g. approximate page-view counters where loss is tolerable).
- At-least-once. Each message is delivered one or more times. The consumer commits its offset only after successfully processing. On crash and restart, unacked messages are redelivered. No data loss, but duplicates are possible. The consumer must be idempotent (able to safely process the same message twice). This is the default for most Kafka consumers and most streaming frameworks out of the box.
- Exactly-once. Each message is processed exactly once, with no duplicates and no loss. Hard to achieve in a distributed system. Kafka provides the building blocks: (a) idempotent producers (sequence numbers prevent broker-side duplicate writes), (b) transactional producers (atomic multi-partition writes), (c) read-committed consumers (only see committed transactions). Together these form the "exactly-once semantics" (EOS) API — but both producer and consumer must participate. If the sink (e.g. a database) doesn't support transactional writes, you need a different approach (e.g. upsert with a deduplication key).
Flink vs Spark Structured Streaming
Two engines dominate production streaming in 2026. Both are mature and capable; the right choice depends on your latency target, team skillset, and existing infrastructure:
| Apache Flink | Spark Structured Streaming | |
|---|---|---|
| Processing model | True streaming (record-by-record, pipeline-based) | Micro-batch (processes small batches at a configured trigger interval) |
| Latency | Sub-second (milliseconds possible) | Seconds (micro-batch interval; configurable but typically 1-30s) |
| State management | First-class: RocksDB-backed, incremental checkpoints, queryable state | Managed via Spark's structured state store; less flexible for complex stateful ops |
| Exactly-once | Native EOS via checkpoints + 2-phase commit to sinks | EOS via WAL + idempotent sinks; depends on sink support |
| Event time / watermarks | First-class, per-operator watermarks, flexible late-data handling | Watermark support in Structured Streaming; less expressive than Flink for complex patterns |
| Ecosystem fit | Strong for dedicated streaming infrastructure; Confluent Cloud, Kinesis, CDC | Natural for teams already on Spark/Databricks; unified batch+streaming codebase |
| Sweet spot | Low-latency streaming, complex stateful logic, fraud detection, CEP | Spark shops wanting near-real-time without a separate streaming cluster |
Quick check
Key takeaways
- Always use event time for business aggregations. Processing time is non-deterministic — the same event produces different results depending on when it arrived. Event time is what the business means; use it, even though it's harder.
- Watermarks are a latency/completeness trade-off knob. Tight watermark = low latency, more dropped late data. Loose watermark = less dropped data, higher latency on all results. Set to p99 of observed event lateness for your workload.
- At-least-once + idempotent sink is usually the right call. True end-to-end exactly-once requires every component to participate; at-least-once with a deduplicating upsert sink achieves the same observable result at lower complexity.
- Flink for sub-second and complex state; Spark Structured Streaming for Spark shops. Don't over-engineer: the micro-batch model handles most analytical streaming needs. Reach for Flink when latency or state complexity genuinely demands it.
- Kafka partition count is the parallelism ceiling. You cannot have more active consumers (in a group) than partitions. Size partitions at design time; increasing them later breaks key ordering.