Apache Kafka is a distributed event streaming platform built for moving massive amounts of data between systems in real time. Think of it as a high-throughput, fault-tolerant highway for data events. When a client project needs to process thousands of events per second (order placements, user actions, sensor readings, webhook payloads), Kafka is the infrastructure that makes it possible without losing a single event. I bring Kafka into projects where the data volume or real-time requirements exceed what a simple REST API and database can handle. It is not a tool for every project, but when you need it, nothing else fills the role.
LinkedIn had a data problem in 2010. The company was generating billions of events per day (page views, profile updates, connection requests, job postings) and needed to move that data between dozens of internal systems in real time. Existing message brokers like RabbitMQ and ActiveMQ could not keep up with the throughput requirements. Jay Kreps, Neha Narkhede, and Jun Rao, three LinkedIn engineers, built Kafka internally to solve this specific problem. They named it after Franz Kafka, the author, because it was "a system optimized for writing", though Kreps later admitted the name was mostly because it sounded cool. LinkedIn open-sourced Kafka in early 2011, and it became an Apache top-level project in 2012. Kreps, Narkhede, and Rao left LinkedIn in 2014 to found Confluent, a company dedicated to commercializing and developing Kafka. By 2020, over 80% of Fortune 100 companies were running Kafka in production. The New York Times uses it to publish articles. Uber uses it to match riders with drivers. Netflix uses it to process billions of events per day for its recommendation engine. Kafka went from an internal LinkedIn tool to the backbone of modern data infrastructure in under a decade.
Kafka's architecture is fundamentally different from traditional message brokers. A traditional broker delivers a message to a consumer and then deletes it. Kafka writes messages to an append-only log on disk and retains them for a configurable period (days, weeks, or indefinitely). This means multiple consumers can read the same data independently, at their own pace, without interfering with each other. Your analytics pipeline can process yesterday's events while your real-time dashboard processes events from this second. If a consumer crashes, it simply picks up where it left off. This log-based design also makes Kafka extraordinarily fast. Because writes are sequential (appended to the end of a file), Kafka achieves disk write speeds that approach the theoretical maximum of the underlying hardware. A single Kafka broker can handle hundreds of thousands of messages per second. For client projects that need event-driven architecture (an e-commerce platform that updates inventory, sends confirmation emails, triggers analytics, and notifies warehouses all from a single "order placed" event), Kafka ensures every downstream system gets the data reliably without any of them needing to know about the others.
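To make the log-plus-offsets idea concrete, here is a minimal Python sketch. This is not the Kafka client API (the `Log` and `Consumer` classes are illustrative stand-ins); it just shows the core mechanic: records are appended and retained, and each consumer tracks its own read position, so two consumers can read the same data at different paces and a restarted consumer resumes where it left off.

```python
class Log:
    """Append-only log: records are appended, never deleted on read."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset, max_records=10):
        # Reading does not remove anything; any consumer can re-read.
        return self.records[offset:offset + max_records]


class Consumer:
    """Each consumer keeps its own offset; a restart resumes from it."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # "commit" the new position
        return batch


log = Log()
for event in ["order placed", "payment received", "order shipped"]:
    log.append(event)

dashboard = Consumer(log)  # real-time consumer, reads everything at once
analytics = Consumer(log)  # slow batch consumer, reads one record at a time

print(dashboard.poll())               # all three events
print(analytics.poll(max_records=1))  # just "order placed" -- its own pace
```

In real Kafka the same roles are played by partitioned topic logs on the brokers and committed offsets per consumer group, but the contract is the one shown here: the log is the source of truth, and consumers are just cursors into it.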
Building a system that processes thousands of events per second?