Book Chat: Site Reliability Engineering

Site Reliability Engineering is about the practices and processes Google uses internally to run its infrastructure and services. It lays out a series of principles and practices for running that sort of highly available distributed system. Some of the practices are obvious, like having a good plan for what to do during an incident; others are more complex, like how to design a system to be resilient to cascading failures.
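
One of the techniques the book recommends against cascading failures is having clients retry with randomized exponential backoff, so a struggling backend isn't buried under synchronized retries. Below is a minimal Python sketch. The backoff_delays helper is hypothetical, and the 0.1 s base delay and 10 s cap are made-up values, not taken from the book. The "full jitter" variant shown is one common choice among several.

```python
import random

def backoff_delays(attempts, base=0.1, cap=10.0, rng=None):
    """Return one retry delay, in seconds, per attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)]
    ("full jitter"), so clients that failed at the same moment spread
    their retries out instead of hitting the backend in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # exponential growth, capped
        delays.append(rng.uniform(0, ceiling))   # randomize within the window
    return delays

# Seeded so the sequence is reproducible; the values themselves are random.
print(backoff_delays(6, rng=random.Random(7)))
```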

For those unaware of the Site Reliability Engineering (SRE) team at Google, it is a hybrid operations and software engineering team. It isn’t responsible for the functionality of a system. It is responsible for ensuring that the service meets its uptime requirements. Not every service gets a corresponding SRE team, just those with higher business value and reliability needs. Google brings in people with an uncommon blend of skills and gives them this unique mission, which puts them in a position to solve reliability problems systematically.

The book describes a framework for discussing and measuring the risks of changing a software system. Most incidents are the direct result of a change to the system. The authors argue that this makes it necessary to put the team responsible for the system’s reliability into the flow of releases. That team also needs the ability to influence the rate of change of the underlying service. This lets them feed information back to the engineers building the system in a structured way. The ability to ‘return the pager’ gives the SRE team leverage over an engineering team that a normal operations team doesn’t have.
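
The book's main tool here is the error budget. If the service-level objective (SLO) is, say, 99.9% of requests succeeding, the remaining 0.1% is a budget that releases are allowed to spend. Once it is gone, releases are temporarily halted until reliability recovers. Below is a minimal sketch of that arithmetic. The function names are hypothetical and the traffic numbers are made up.

```python
def error_budget(slo, total_requests):
    """Failed requests the SLO tolerates over the measurement window."""
    return (1.0 - slo) * total_requests

def can_release(slo, total_requests, failed_requests):
    """Hypothetical release gate: keep shipping while budget remains."""
    return failed_requests < error_budget(slo, total_requests)

# A 99.9% SLO over one million requests allows roughly 1,000 failures.
print(round(error_budget(0.999, 1_000_000)))   # 1000
print(can_release(0.999, 1_000_000, 250))      # True: budget left to spend
print(can_release(0.999, 1_000_000, 1_500))    # False: budget exhausted
```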

The limits on the SRE team’s operational burden are a strong cultural point. The team members are engineers, and they need to leverage their software engineering skills to automate their jobs. That way the number of SREs scales with the complexity of the service, not its size. The amount of manual work the team does is capped. There is also a process for rebooting a team that has gotten too deep into manual work. Together, these build a strong shared understanding of what a successful team looks like. The cultural aspect of rebuilding a team is more important than the technical aspect. Each of these people knows how to do the right thing; their priorities have just become warped over time.

As someone on the engineering side, I found significant portions of the book that aren’t immediately relevant to what I do. In reading it I may have learned more than I ever really wanted to know about load balancing and distributed consensus protocols. But for me, the sections on effective incident response, postmortems, and culture more than make up for it.

The SRE discipline is an interesting hybrid of software engineering and software operations, and it is the only real way to handle the complexity of software systems going forward. The book stresses repeatedly that it takes a special breed to see how to build systems that enable automating this sort of work. I can see that in the operations staff I’ve interacted with over the years. A lot of them had a strong “take a ticket, do a ticket” mentality. They gave no thought to making the tickets self-service or to removing the need to perform the task at all. It’s a lot like bringing back the distinction between systems programming and application programming. One kind of engineer was capable of working at the lower level of the stack and building the pieces that other users could build on.

Overall I enjoyed the book. It made the case that operations teams shouldn’t be that different from engineering teams in terms of the culture that makes them effective. The book really covers good software practices from the perspective of that lower level of the operational stack. Then again, I’m a sucker for the kind of software book that has 5 appendices and 12 pages of bibliography.

Book Chat: Refactoring

Refactoring sets out to describe what refactoring is and why you should refactor code. It also sets out to catalog the different refactorings that can be applied to an object-oriented codebase. This isn’t the first appearance of the idea of refactoring, but the book was its big coming-out party in 1999. It is an audacious goal, since cataloging all of anything is daunting. I’m not an authority on refactoring by any means, but the catalog certainly captured all of the basic refactorings I’ve used over the years. It even includes refactoring to the template method design pattern, though it doesn’t include anything like refactoring to the decorator pattern. It seems odd to include refactoring to one design pattern but not to several others.
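
For anyone who hasn't met it, the template method pattern keeps the fixed outline of an algorithm in a base class and lets subclasses supply the steps that vary. The book's Form Template Method refactoring gets there from two subclasses with near-duplicate methods. Below is a minimal Python sketch; the book's own examples are in Java. The class names are hypothetical and only loosely echo the book's rental-statement example.

```python
from abc import ABC, abstractmethod

class Statement(ABC):
    """render() is the template method: it fixes the order of the steps."""

    def render(self, customer, rentals):
        lines = [self.header(customer)]
        lines += [self.line(title, amount) for title, amount in rentals]
        lines.append(self.footer(sum(amount for _, amount in rentals)))
        return "\n".join(lines)

    # The varying steps, which each subclass must supply.
    @abstractmethod
    def header(self, customer): ...

    @abstractmethod
    def line(self, title, amount): ...

    @abstractmethod
    def footer(self, total): ...

class TextStatement(Statement):
    def header(self, customer):
        return f"Rental record for {customer}"

    def line(self, title, amount):
        return f"\t{title}\t{amount}"

    def footer(self, total):
        return f"Amount owed is {total}"

class HtmlStatement(Statement):
    def header(self, customer):
        return f"<h1>Rentals for <em>{customer}</em></h1>"

    def line(self, title, amount):
        return f"{title}: {amount}<br>"

    def footer(self, total):
        return f"<p>You owe <em>{total}</em></p>"
```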

The descriptions of the “what” and “why” of refactoring are excellent and concise. The catalog is ~250 pages of examples and UML diagrams of each refactoring technique, and showing every single one feels like overkill. In general, the author shows both directions of a refactoring, e.g., extract method and inline method, which can be rather overwhelming. A newer book like Working Effectively With Legacy Code seems more useful in its presentation of actual refactoring techniques. It prioritizes where we want to go rather than exhaustively describing each individual modification. Honestly, I think Refactoring predates widely available automated refactoring tools, and the internet in 1999 wasn’t as full of help on these sorts of topics. So the book needed to be more specific, since it was the only source of help.
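
To make the two directions concrete, here is a minimal Python sketch of extract method with hypothetical function names. The loop that computes a total gets pulled out into its own named function. Inline method is simply the reverse: it folds calculate_outstanding back into its caller.

```python
# Before: one function both computes the total and formats the message.
def owing_summary_before(orders):
    outstanding = 0
    for amount in orders:
        outstanding += amount
    return f"amount owed: {outstanding}"

# After extract method: the calculation gets a name of its own.
def calculate_outstanding(orders):
    outstanding = 0
    for amount in orders:
        outstanding += amount
    return outstanding

def owing_summary(orders):
    return f"amount owed: {calculate_outstanding(orders)}"

# Inline method goes the other way: replacing the call to calculate_outstanding
# with its body gets you back to owing_summary_before.
```

Either way the behavior is unchanged, which is the point of a refactoring.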

It’s an interesting historical piece, but not an especially useful tool for improving your craft.

Book Chat: Beyond Legacy Code

Beyond Legacy Code is a description of nine practices to help improve the value of software. The author directed it not just at developers or engineers, but also at development or IT managers, product managers, project managers, and software customers. That’s a broad array of people who are coming to a problem with a wide set of goals and preconceptions. Eight of the nine practices are pretty normal and obvious items for most software engineers. One however was novel to me: implement the design last.

The basic idea is pretty straightforward: do a bottom-up build of components and then compose them into larger and larger units, allowing the design of the larger pieces to emerge from that. Since you already have all these well-written and tested units, you can compose them together safely in ways that you understand. It keeps you from reaching for a more complex design pattern that you may not need, because you are still working through all of the smaller pieces. I see it as the red-green-refactor mantra applied at the macro level.
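
Here is a tiny illustration of building from the bottom up, with hypothetical function names. Each small unit is written and checked on its own first. The larger unit, a word-frequency report, is just a composition of pieces that already work. This is my sketch of the idea, not an example from the book.

```python
import re
from collections import Counter

# Small units, each written and checked on its own first.
def tokenize(text):
    """Lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def count_words(words):
    return Counter(words)

def top_n(counts, n):
    return counts.most_common(n)

assert tokenize("The cat, the hat.") == ["the", "cat", "the", "hat"]
assert top_n(count_words(["a", "b", "a"]), 1) == [("a", 2)]

# The larger unit emerges as a composition of pieces we already trust.
def word_report(text, n=3):
    return top_n(count_words(tokenize(text)), n)
```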

I had often tried to accomplish something similar by starting at the top and stubbing out smaller pieces as I went. This didn’t always work out, since the interface for a stubbed-out piece might not give it the information it needed to do its work. I have also seen this produce odd pieces that don’t really make sense outside the context of what I was working on, so I ended up with fewer reusable components. Overall, though, it worked fairly well as a way to decompose the problem in the initial pass.

Since reading this book, I’ve tried its bottom-up buildout a couple of times. It seems to have taken me significantly longer to do the work, but I think the reusability of the resulting design is better. I feel that with more practice I should be able to be at least as productive as before. I haven’t had to come back to any of the code I wrote this way for maintenance. So I don’t have any data yet on whether it delivers on the maintainability it promises.

I don’t think the book delivers for the full audience it was intended for, but it feels well directed at software engineers, considering the principles and guidelines we use. I don’t know what a large portion of the audience would get from reading it other than familiarity with the terms, so they could communicate better. I don’t see how it would cause a project manager to reconsider the schedule, or an IT manager to handle a project differently. Maybe I can’t take their point of view well enough. I saw large portions of the suggested practices as ‘normal’, but for people in those roles they might help articulate the value of unit testing. I don’t know of a better modern book for those on the management side of software who lack a software background but still want something technical. Classics like Peopleware or The Mythical Man-Month still show most of what you need to do to run a software team from a strictly management perspective, and this doesn’t supplant those. Looking at the reviews on Amazon, though, my concern that this isn’t what non-developers want seems to be unfounded. The consistent praise it is garnering there makes me curious what non-developers I know would think if they read it.

Long Term Open Source Success

I was having a discussion with some of my coworkers at our engineering book club about the first few chapters of Clean Architecture. We were discussing whether anyone had worked at a place that took a really long-term view of the architecture of a system. Most people didn’t think they had. I thought I might have once. It was hard to say whether the company’s leadership had that view or whether it just got lucky with some key engineers making it happen. Unfortunately, the company ran into business issues when the very large competitors in the industry picked off its largest customers. From there I posed a different question to the group: is a difference in architecture between open source and commercial software one of the reasons for open source’s long-term success?

There was a palpable pause; the idea immediately made sense to everyone, but nobody was sure how to confirm or deny it. We all agreed that open source, at least initially, is built by software devotees for personal reasons. Whether it is for personal use, to learn something, or to prove out an idea, people take time out of their day, build a piece of software, and put it out there for the world to see. It’s finished when it’s finished. If you aren’t enjoying working on it, it’s no big loss to just set it aside and do something else. There was some discussion of the open-source-with-paid-support model and whether it had more to do with the development of large chunks of open source software. There was also a discussion of the resurgence in popularity of Postgres because it is feature-packed and solid. We wondered whether its continued flexibility comes from an underlying difference in architectural quality, given that it doesn’t have corporate backing like other common relational databases.

From an ideological point of view, the idea that software succeeds in the long term because of better architecture is greatly pleasing. I would love to see data bearing that idea out, but I don’t know how you would get access to a large enough selection of equivalent projects. Something like The Architecture of Open Source Applications tries to make it easier to understand the big picture of some successful, long-lived open source applications. But you would need a matched set of closed source applications to compare and contrast against.

Building a great architecture requires taking time to deeply understand the problem. Sometimes you truly need to “build one to throw away,” with all the cost that represents. The pressure for commercial success and unbridled growth puts short-term thinking at the forefront, and often prevents throwing it away.

Open source is a thousand developers each solving a problem they ran into on their own and putting the solution out there. When you hear about a solution, you go take a look at it. If its API doesn’t work for you, the project doesn’t gain a new follower. Without followers there are no contributors, so the project eventually withers and dies. If the API does work for others and solves their problems, the project grows. You can look at it as a distributed genetic algorithm doing API-first development. You rarely hear about the ideas that don’t catch on; you hear about the things where people felt they gained value.

One of my favorite definitions of software architecture is the set of decisions that are hard to change. If an open source project decides to spend the next year changing its underlying central abstractions, that’s a strictly technical decision. It doesn’t need to build a business justification or fight to get it on the schedule. It doesn’t even necessarily need a consensus that it’s a good idea; someone who felt strongly enough that it was the proper solution could just fork the project.

At work we’ve got a central architectural abstraction that is, in my opinion, not right. I could build a PR on my own time to change it, but that change would then ripple out to a dozen other teams as they start using the new abstraction. I could help them adopt it, but I would need a strong consensus from those teams to get the PR in. I think that consensus exists in engineering, but the product schedule doesn’t leave time for this sort of change. It’s hard to quantify the drag this abstraction is causing, which makes it difficult to argue for that time. Suppose you could use a library built by people who are quality-obsessed or one built by people who are trying to hit a schedule. Which would you choose?

Book Chat: Elastic Leadership

I recently ran into a situation at work that I wasn’t sure how to resolve; the specifics aren’t important to this post. I ended up rereading several books, looking for some kernel of knowledge that would give me additional guidance on what to do. I started with Peopleware, moved on to Managing Humans, and finally ended up on Elastic Leadership. There I found something to help with my problem.

The “something” was a description of how people are influenced that felt like it applied to my problem. It helped me break down my feelings in a way I could describe to others. The description uses two axes: the type of influence (personal, social, and environmental) and ability versus motivation. Crossing them yields six zones of influence. For example, personal-ability is influence through skills you have, while environmental-motivation is structural incentives, like giving public recognition for certain kinds of behavior. Looking at the problem from the perspective of each zone helped me articulate it and arrive at a course of action.

There are other useful constructs in the book as well. There is an alternative to the “Forming-Storming-Norming-Performing” model of group development. This alternative model has three stages: surviving, learning, and self-organizing. It is used to describe how a manager’s behavior should differ in each stage of the team’s development. When the team is in the surviving phase, the manager’s goal is to get it to the learning phase. Once in the learning phase, the goal is to maximize learning and help the team gain the confidence to self-organize. I identified with this model, since it emphasizes that the role of a manager for a team in trouble is vastly different from that for a team in a good place.

Overall it’s an interesting read, but a lot of it is what I would describe as management advice rather than leadership advice, in the sense that you need to be in a position of structural power to use much of it. Even so, understanding the management perspective can help you understand the situations going on around you, as it did for me.

Book Chat: The Master Algorithm

The Master Algorithm describes the pros and cons of different machine learning techniques and the author’s quest to unify them into a single algorithm that can tackle any kind of problem. It has sections on five major kinds of learning algorithms: nearest neighbor, naive Bayes, decision trees, support vector machines (SVMs), and neural networks. It then covers Alchemy, the author’s attempt to unify multiple disparate styles of learning algorithm into a single overarching implementation. Overall the book succeeds as a popular-science description of the current state of machine learning techniques, but it didn’t satisfy my needs as someone closer to the software.

The author declared an intention to keep the amount of math to a minimum. In practice, that meant describing mathematical concepts in prose, which didn’t work as well for me as a few formulas probably would have. I wanted either enough math that I felt I fully understood what was going on, or no math at all, so I didn’t feel like I had a mediocre partial understanding.
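
As an example of the kind of formula I mean, naive Bayes, one of the algorithms the book covers, fits in one line of math. Assume the features x_1 through x_n are conditionally independent given the class c. Then Bayes' rule reduces the classifier to picking the class that maximizes the prior times the per-feature likelihoods. The evidence term P(x_1, ..., x_n) is the same for every class, so it drops out:

```latex
\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```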

The descriptions of the individual learning algorithms made sense on a surface read. Once I stopped to really understand the differences, they were less clear; I think the lack of hard formulas impeded my understanding. One section seemed to describe a unification of several different algorithms. It turned out to be a metaphor for early attempts to unify learning algorithms where the math never worked. Including this effort, and the way it was described, ended up confusing me.
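
For comparison, the simplest of the five, nearest neighbor, takes very little code: a new point gets the label of the closest labeled example. Below is a minimal 1-nearest-neighbor sketch. The Euclidean distance, the function name, and the toy data are my assumptions, not the book's.

```python
import math

def nearest_neighbor(train, query):
    """1-NN: return the label of the training point closest to query.

    train is a list of (point, label) pairs; points are equal-length tuples,
    compared by Euclidean distance.
    """
    _point, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

train = [((0.0, 0.0), "blue"), ((1.0, 1.0), "blue"), ((5.0, 5.0), "red")]
print(nearest_neighbor(train, (4.0, 4.5)))  # red
```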

The discussion of the real, but incomplete, unification scheme in Alchemy was interesting. The implication that you would still need an advanced degree in machine learning to use it says to me there is more to do. If machine learning is truly going to change the world the means of training models needs to be opened up to at least the average software engineer, if not business users.

I feel like the author’s particular perspective put him too close to the problem to really write a more popular science style book. I think he could have written the book I was looking for with more math and a more practical software aspect. From a technical prerequisite perspective this could be an excellent text for people who are trying to come at machine learning from a non-technical background.

Book Chat: Becoming A Technical Leader

I had added this to my list of books to read a while back. Then I saw a copy getting passed from person to person in the office, and it jumped to the top. I can’t think of a specific anecdote from the book that you need to know, but I do think that if you are involved in making software you should read it. It presents a sort of zen of software management, where you ensure that those doing the work have the space and resources to solve the problem. It’s expressed in a lot of short chapters that mostly center on anecdotes from the author’s career as a consultant.

It talks about how to help spur innovation and motivate others. It includes ideas I had seen my manager put into practice about how to craft and express a vision. I specifically remembered my manager’s discussion of the vision, since it was really effective and highly motivating. The template for that discussion came straight from the book. When I first read the template I was skeptical. Maybe that was because I half remembered it being used and it was tickling the back of my mind. Or maybe it just doesn’t seem good on paper. Having seen the results, I am convinced.

There was a chapter about power conversion which seems to me the central idea. If you have power in one realm you can use it to express power in another realm by figuring out how to leverage your strengths. This seemed like the one specific point that I personally need to learn more about. I haven’t figured out how to do it yet, but I keep circling back to this particular idea as something I need to figure out.

It strikes me as a lot of Theory Y management, although it never says so. Since this is about leadership the act, not management the title, it is applicable both to engineers who want to stay on the technical track and to those who are interested in the management side. If you’ve got a strong grasp of the technical portion of the job and want to better understand how to expand your influence, check this out.

Book Chat: The Psychology of Computer Programming

The Psychology of Computer Programming by Gerald Weinberg describes the how and why of computer programming in the abstract. It covers topics like when and where to leave comments, how the choice of programming language influences the program that eventually gets written, and how to go about hiring programmers. I read the silver anniversary edition. It added annotations about how things had changed between the original 1971 release and 1997, when the silver anniversary edition was written.

The book starts out by talking about reading programs, with some example code written in PL/1. The example reads fine even if you know nothing about PL/1; it goes through several variants of the same program, dissecting the pros and cons of each implementation. Modern programming has mostly left behind tight limits on memory usage and program size. Still, similar tradeoffs apply to things like GC pressure or context switching.

Each chapter closes with a series of introspective questions about the topic for programmers and managers, mainly about how it could be applied to your day-to-day activities. After a chapter on programming as a social activity, it asks the manager, “In setting your own working goals, what part is set by what is passed down from above and what part is set by what comes up from below? Are you satisfied with this arrangement, or would you like to alter it in some ways?” It asks the programmer, “What part do you play in setting the goals of your team? What part would you like to play? What part would you like others to play?” These two sets of questions suggest a conversation between perspectives and helped me understand the perspective of management better.

The section on time-sharing versus batch systems did not age well, since neither system is used anymore. It was still interesting to see a breakdown of the pros and cons of the two systems and how they affect workplace culture. It provided a case study of a company that switched from a batch system to a time-sharing system, which broke down the informal communication network among the developers. Under the batch system they would congregate where results were returned, since nobody knew exactly when their results would come back. Once they switched to time sharing, everyone stayed in their offices and there was little communication or teamwork among the programmers.

There was no single takeaway from this book where I would recommend that you should read this to achieve a particular end. Overall it was an interesting read from a conceptual perspective, but I don’t think it’s applicable to the average programmer. There is more value I think on the management side, but since that isn’t what I do it’s harder for me to judge.

Book Chat: Programming in Scala

I had previously mentioned Programming in Scala when discussing Scala for the Impatient. I said that Scala for the Impatient was written as a reaction to the ~800-page bulk of Programming in Scala, for people who wanted just enough to get started. Having now read Programming in Scala, I think the criticism of its length is fair. The first half of the book is massive overkill for anyone with experience in a C-derived language or any other object-oriented language. Some sections were even marked as optional reading if you were familiar with Java, because the behavior being described was similar. In comparison, the second half of the book was a wonderful experience even for an experienced programmer, since it has in-depth explanations of all of the advanced language features.

Some of the things I learned were simple. For instance, regular expressions can be used as extractors, which is a straightforward idea. Or that Predef is implicitly imported everywhere. Or that the -> arrow operator is actually defined through an implicit conversion in Predef and isn’t an explicit part of the language.

Other sections were more complex. The rules for how for expressions get translated into calls like map, flatMap, and withFilter were similar to what I had figured out. But the rules for how filters and value definitions within the expression are handled added a lot of clarity to what I had learned by doing. I learned how you can use type bounds to work with variance annotations. The authors also discussed the transform method on futures that will be available in Scala 2.12, which has me excited to get to that upgrade.
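
For reference, here is the core of that translation for a for-yield. A guard becomes withFilter, every generator except the last becomes flatMap, and the last generator becomes map. So `for (x <- xs if p(x); y <- f(x)) yield g(x, y)` becomes `xs.withFilter(x => p(x)).flatMap(x => f(x).map(y => g(x, y)))`. Python has no equivalent desugaring, but the same shape can be mimicked with a hypothetical flat_map helper. This is an analogy, not Scala code:

```python
def flat_map(f, xs):
    """Apply f to each element and concatenate the resulting iterables."""
    return [y for x in xs for y in f(x)]

def is_even(x):      # plays the role of the guard `if p(x)`
    return x % 2 == 0

def below(x):        # plays the role of the second generator `y <- f(x)`
    return range(x)

def pair(x, y):      # plays the role of the `yield g(x, y)` expression
    return (x, y)

xs = [1, 2, 3, 4]

# Comprehension form, like: for (x <- xs if p(x); y <- f(x)) yield g(x, y)
nested = [pair(x, y) for x in xs if is_even(x) for y in below(x)]

# Explicit form, like: xs.withFilter(p).flatMap(x => f(x).map(y => g(x, y)))
explicit = flat_map(lambda x: list(map(lambda y: pair(x, y), below(x))),
                    filter(is_even, xs))

assert nested == explicit
```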

Even after considerable study, I’m not sure I understand some of the other things covered. I understand the syntax for refinement types, but I don’t think I understand their value, even after the in-depth example using currencies. There was also an in-depth discussion of how the designers arrived at CanBuildFrom in the collections package. CanBuildFrom lets common operations be implemented once for many collections while still returning a collection of the same type rather than some supertype. It makes sense in the abstract, but I don’t think I could implement a similar pattern without copying it directly out of the book.

Despite the book’s heft, there were a couple of topics I would have liked to know more about. I was hoping for a discussion of the reflection capabilities provided by manifests, type tags, and class tags. Since they are library pieces rather than integral parts of the language, they weren’t covered. There were some oblique references to how bytecode gets generated from various Scala structures. I was hoping for more insight into how to design interfaces that are less susceptible to breaking changes at the bytecode level even when the Scala side looks fine.

Overall it’s a good read, and not as long to read as you would expect from a book this size. It’s divided into small sections, so you can easily sit down, read a page or two, and make progress over time.

Book Chat: How To Solve It

How To Solve It isn’t a programming book. It’s not exactly a math book either, but you will find yourself doing geometry while reading it. It isn’t a book on logic, but it is all about structured thought processes. I would describe it as a manual for teaching a systematic approach to problem solving, using geometry and a series of examples. When you see a problem and understand how to solve it, thoughts whiz through your head without your really contemplating how you knew. The book tries to lay out all of those thoughts. It’s a fast read, assuming you know the geometry used in the examples.

The problem-solving process is broken into four basic steps: understanding the problem, devising a plan, carrying out the plan, and looking back. At first it seems obvious, but that’s the thing about a structured approach: you need to cover everything and be exhaustive about it. For example, to understand the problem you identify the unknown, identify the data, identify what you want to accomplish, try to draw a picture, introduce suitable notation, and figure out how to determine success. If you wanted to know whether to buy milk at the store, this sort of formal process would be overkill. But a more complex problem might call for it, like figuring out what’s causing a memory leak or setting up a cache invalidation strategy. There it might be valuable to structure your thoughts this way.

I haven’t had a chance to apply it to a real problem yet. I did use some of the teaching suggestions (how to guide the pupil to solve their own problems) with one of the junior engineers I mentor, and it seemed productive. I got him to answer his own question; however, not enough time has passed to see whether it improves his problem-solving abilities in the future.

Overall the book was an interesting experience to read and seems practically applicable to the real world.