Book Chat: Designing Data-Intensive Applications

I start most of these book chats by describing the book and end by describing the kind of reader I think would benefit from it. This time I’m going to describe the reader first.

If you write software that interacts with another computer, you should read this. To me, the title is somewhat misleading: it emphasizes the data, but the “intensity” comes from the size of the data, which implies a distributed system. As a side benefit, you also get an amazing survey of different database implementations. My only complaint is that the first part of the book starts at the very foundations of data-intensive systems (e.g., data locality and how data is organized on disk), which matters if you are building your own database system but is less applicable to the average reader.

There are sections on a vast range of topics. The book covers both SQL-style ACID databases and BASE-style NoSQL databases, and it even gets into things like graph databases that don’t fit neatly into either box. Topics range from locking and commit strategies, to the levels of consistency a database can offer and what they really mean, to distributed consensus, replication, and streaming.

Most of the text is technology-agnostic, but it references specific implementations where they demonstrate a concept. It is also deeply rooted in academic research, with well-referenced footnotes to satisfy any further curiosity you may have about a topic. Read cover to cover, it should be fairly accessible to a relative beginner, since it introduces concepts without requiring much prior knowledge. I don’t think a beginner could jump into a chapter in the middle and follow along, but given the complexity of the subject, that isn’t unreasonable.

Even if you aren’t building a database, deeply understanding the tradeoffs of the database you are using will make your application more correct. Many concurrent failure scenarios are difficult to reproduce in tests, so understanding the system at a logical level is the only way to attempt to handle every failure case. I do think all software engineers working on the web would benefit from the material here. It won’t make your day-to-day work much better, but it will help keep you out of the really bad places where a system fails intermittently.