Book Chat: Effective DevOps

Effective DevOps is about the culture of the DevOps movement. The technical practices that today coincide with DevOps are the result of the cultural practices, not the cause. The cause is an underlying culture that is safe and respectful to those in it. That culture truly empowers the team to try things that improve the way work is done, and those experiments lead to the technical practices associated with DevOps. Overall, the book is written more from a management perspective than an individual contributor perspective. It is centered around the four pillars of effective DevOps: Collaboration, Affinity, Tools, and Scaling.

Collaboration covers the normal sort of mentoring and workflow information that would be familiar to most agile or lean practitioners. The Affinity pillar, though, builds on top of Collaboration with the idea that it takes time and work to forge a group of individuals into a team, and it explores what is required to build those bonds. These two pillars lead into the Scaling pillar nicely since, while you can eliminate waste and automate things, at the end of the day the biggest scaling maneuver is hiring. Hiring renews the importance of Collaboration and Affinity, since as you bring new people into the system you must fully integrate them.

The section on the Tools pillar is written in a tool-agnostic fashion, describing categories of tools commonly used in DevOps. That makes it much more interesting than books tied to a particular set of technologies, since it focuses on the concepts rather than the implementation.

Overall it’s an interesting read. The focus on the social aspects makes it less useful in my day-to-day activities, but the longer I do this job, the more I think that the technical aspect is essentially table stakes and everything else is where long-term growth comes from.


Burnout

Burnout is a common topic in our industry. I’m thinking about it right now because I got burnt out fairly badly recently and as a result stopped blogging for a while. It broke the commitment device I had formed by posting weekly. I think I’ve recovered and want to take this opportunity to discuss what happened and how I think I could have avoided burning myself out.

My team got split in two: some people left to form a new team with a new mission, and some stayed to continue the existing work. This meant there was roughly the same work to do and fewer hands to do it. Also, when this happened, we started reporting up to a different executive. All of this change together was a bit of a shock to the team; our overall output took a hit from the loss of people, but our productivity stayed good. We lost our leader and most of the other senior engineers. The other remaining senior engineer moved to management, as he had been hoping to.

This left me and two junior engineers to commit code day to day. Filling the gaps was a slow process, although we got an experienced front end engineer relatively quickly to complement my skill set. Still, the majority of the work we had to do was on the backend, and the new management was applying schedule pressure. We also had an influx of QA automation resources to the team, which we sorely needed to build out our suite of API and UI automation. This build-out of the test suite did, however, bring to light a number of edge cases in the API that hadn’t been accounted for and needed to be cleaned up. I felt the influx of bugs and the schedule pressure as a weight resting mostly on me. I tried to take on too much, and left my newly promoted boss to handle the new executive.

Retrospectively, I should have pushed back sooner and taken a more active role in dealing with the new executive. It’s not that my new boss was doing poorly; he was definitely doing better than I had the first time I was put into that situation. It was just that being thrust into that situation for the first time isn’t easy for anyone.

I ended up reading The Truth About Burnout to try to get a better grip on what was happening to me. It suggested that the path forward was to take more direct control over what is happening; essentially, that the cause of burnout was a lack of control, not the situation itself. This is an interesting idea, but in the situations where I have experienced burnout, it wasn’t a lack of attainable control; it was the lack of any mechanism to take control and fix the situation that did the most damage.

It’s a weird sort of mental knot: the inability to fix the problem is the real problem, not the initial problem itself. On one hand it feels like victim blaming – you are unhappy because you aren’t fixing your own problem. On the other hand, it’s a much more powerful statement about what you can do.

Book Chat: Site Reliability Engineering

Site Reliability Engineering is about the practices and processes Google uses internally to run its infrastructure and services. It lays out a series of principles and practices for running highly available distributed systems of that sort. Some of the practices are obvious, like having a good plan for what to do during an incident; some are more complex, like how to design a system to be resilient to cascading failures.

For those unaware of the Site Reliability Engineering (SRE) team at Google, it is a hybrid operations and software engineering team that isn’t responsible for the functionality of a system but is responsible for ensuring that the service meets its uptime requirements. Not all services get a corresponding SRE team, just those with higher business value and reliability needs. By bringing in individuals with an uncommon blend of skills and giving them this unique mission, Google positions them to solve reliability problems in a systematic way.

The book describes a framework for discussing and measuring the risks of changing a software system. Most incidents are the direct result of a change to the system. The authors argue that this necessitates putting the team responsible for the reliability of the system into the flow of releases and giving them the ability to influence the rate of change of the underlying service. That allows them to feed information back to the engineers building the system in a structured way. The ability to ‘return the pager’ gives the SRE team leverage that a normal operations team doesn’t have when dealing with an engineering team.
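The book calls this framework an error budget: the gap between the availability target (the SLO) and 100% is the amount of unreliability the team is allowed to spend on change. Below is a minimal sketch of the arithmetic, assuming a simple time-based availability SLO over a 30-day window and a policy of freezing releases once the budget is spent. The function names and the freeze rule are my own illustration, not code from the book; real policies are usually request-based and more nuanced.

```python
"""Minimal error-budget sketch (hypothetical helper names, not from the book)."""


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after subtracting downtime already observed in the window."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


def releases_allowed(slo: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """Assumed policy: keep shipping while budget remains, freeze once it is spent."""
    return budget_remaining(slo, downtime_minutes, window_days) > 0


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999), 1))
    print(releases_allowed(0.999, downtime_minutes=10))
    print(releases_allowed(0.999, downtime_minutes=50))
```

With 10 minutes of downtime already spent, releases continue; at 50 minutes the roughly 43-minute budget is exhausted, which is the point where the SRE team has grounds to slow the rate of change.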

The limits on operational burden for the SRE team are a strong cultural point. The team members are engineers, and they need to leverage their software engineering skills to automate their jobs so that the number of SREs scales with the complexity of the service, not its size. Capping the amount of manual work the team engages in, and having a process for rebooting a team that has gotten too deep into manual work, builds a strong understanding of what a successful team looks like. The cultural aspect of rebuilding a team is more important than the technical aspect, since each of these people knows how to do the right thing but their priorities have gotten warped over time.

As someone on the engineering side, there are significant portions of the book that aren’t immediately relevant to what I do. In reading this I may have learned more than I ever really wanted to know about load balancing or distributed consensus protocols. But the sections on effective incident response, post mortems, and culture more than make up for it for me.

The SRE discipline is an interesting hybrid of software engineering and software operations, and it is the only real way to handle the complexities of software systems going forward. The book stressed repeatedly that it takes a special breed to see how to build the systems that enable automating this sort of work. I can see that in the operations staff I’ve interacted with over the years. A lot of them had a strong “take a ticket, do a ticket” mentality, with no thought given to how to make the tickets self-service or remove the need to perform the task at all. It’s a lot like bringing back the distinction between systems programming and application programming, where one kind of engineer was capable of working at the lower level of the stack and building the pieces other users could build on top of.

Overall I enjoyed the book. It brings together the idea that operations teams shouldn’t be that different from engineering teams in terms of the culture that makes them effective. The book really covers good software practices from the perspective of that lower level of the operational stack. Then again, I’m a sucker for the kind of software book that has 5 appendices and 12 pages of bibliography.

Book Chat: Elastic Leadership

I recently ran into a situation at work that I wasn’t sure how to resolve; the specifics of the situation aren’t important to this post. I ended up rereading several books looking for some kernel of knowledge that would give me additional guidance on what to do. I started with Peopleware, moved on to Managing Humans, and finally ended up on Elastic Leadership. There I found something to help with my problem.

The “something” was a description of how people are influenced that felt like it applied to my problem and helped me break down my feelings in a way that I could describe to others. The influence description has two axes: the type of influence (personal, social, and environmental), and ability versus motivation. Crossing the two gives six zones of influence. For example, personal-ability is influence through skills you have, while environmental-motivation is structural incentives like giving public recognition for different kinds of behaviors. Looking at the problem from the perspective of each zone helped me articulate my problem and arrive at a course of action.

There are other useful constructs in the book as well. It offers an alternative to the “Forming-Storming-Norming-Performing” model of group development. This alternative model has three stages: surviving, learning, and self-organizing. It is used to describe how a manager’s behavior should differ across the stages of the team’s development. When you are in the surviving phase, the manager’s goal is to get the team to the learning phase. Once in the learning phase, the goal is to maximize learning and enable the team to gain the confidence to self-organize. I identified with this model, since it emphasizes that the role of a manager for a team in trouble is vastly different from the role for a team in a good place.

Overall it’s an interesting read, but a lot of it is what I would describe as management advice rather than leadership advice, in the sense that you need to be in a position of structural power to use a lot of it. Even so, understanding the management perspective can help you understand the situations going on around you, as it did for me.

Strike Teams Part 2

This is a follow up to the original strike team post I did a while back. I’m writing this as the strike team is wrapping up. It’s been an interesting experience. We hit most of our broad goals but did it in a significantly different way than anticipated.

The first big change came early on, when we were looking into the proposed changes to an existing piece of software that was going to consume what we were building. We realized that accommodating it was probably weeks of work by itself. The initial assumption had been that minimal work was needed, but apparently that particular solution was untenable in the long term to the people who supported the consuming software. This ended with them pulling support from the strike team and becoming uninvolved. At the time, the scope and resourcing change was challenging, since it threw everything into an uncertain state. I think it ended up being a good thing; we were able to focus on building out the desired functionality without spending a lot of time making changes to one consumer of the service. It does make the follow-up more complex, since we need to make sure that work gets done; otherwise the whole system will end up in a more complex state.

There was one portion of what we were going to deliver that had a complex deployment step. Other changes had to go either before or after this particular piece, so we tried to move that deployment as early in the project plan as possible. The intention was to deliver it at the end of the first sprint. We encountered some significant scope changes that unfortunately all had to go before that step. This pushed the actual deployment out to the middle of the third sprint. At that point we had about ten tickets’ worth of work sitting in the feature branch, just waiting for the eventual opportunity to merge the whole thing back to master, with nearly 5000 lines of changes. We ended up performing a series of messy merges bringing changes from master back to the feature branch. The final merge from the feature branch back to master was ugly, since we wanted to preserve some but not all of the history of that branch. In retrospect we should have seen the merge issues coming and been more proactive about doing smaller, more frequent merges, but lesson learned for later.

Every team member lost some time dealing with issues brought to them from the team they were on loan from. We had anticipated this during the first sprint but didn’t expect it to continue through the remainder of the strike team. I personally got sucked into some production oddities, and after the first sprint without me, my team missed my presence in code review to the point that they got me back to reviewing basically everything. Both of the other members of the team got pulled back to their original teams for issues that came up. We didn’t fight any of these requests, figuring that since they seemed like critical issues for those teams, having their people available was the best overall use of time, even if it hurt our ability to get things done on the strike team.

The team as a whole never really jelled to any significant degree. Everyone kept working the way their home team did, which created some minor conflicts. One engineer’s team has a very lax overall structure, and he would sort of disappear for days and then show up with large amounts of excellent code, but it was frustrating when he went off the grid without forewarning. The other came from a team with experience in a different technical stack and vacillated between asking too few questions and spending too much time stuck, and asking too many questions. The three of us were also in three different locations, which made it difficult to get on the same page and work together.

Overall the strike team model worked for this, since having representatives from some of the teams that were going to consume this service working on it made sure we didn’t run into any issues from misunderstanding the domain. There were problems with setting up a new team that we should have attacked proactively, since any new team needs to get organized and form its own norms at the start. The transient nature of the strike team prohibits a lot of identity building, which, in my opinion, is key to building a good team. Based on this experience, I think the strike team model can handle software projects that cross multiple team boundaries, but there may still be better ways out there to be found.

Book Chat: The Psychology of Computer Programming

The Psychology of Computer Programming by Gerald Weinberg describes the how and why of computer programming in the abstract. It covers topics like when and where to leave comments, how the choice of programming language influences the eventual program written, or how to go about hiring programmers. I read the silver anniversary edition, which added some annotations about how things had changed between the original 1971 release and 1997, when the silver anniversary edition was written.

The book starts out by talking about reading programs, with some example code written in PL/1. The example reads fine even if you know nothing about PL/1; it goes through several variants of the same program, dissecting the pros and cons of each implementation. While modern programming is mostly free of tight limits on memory usage and program size, similar pros and cons could be applied to things like GC pressure or context switching.

Each chapter closes with a series of introspective questions about the topic for programmers and managers, mainly about how it could be applied to your day-to-day activities. After a chapter on programming as a social activity, it asks the manager, “In setting your own working goals, what part is set by what is passed down from above and what part is set by what comes up from below? Are you satisfied with this arrangement, or would you like to alter it in some ways?” It asks the programmer, by contrast, “What part do you play in setting the goals of your team? What part would you like to play? What part would you like others to play?” These two sets of questions suggest a conversation between perspectives and helped me understand the management perspective better.

The section on time-sharing systems vs. batch systems did not age well, since neither system is used anymore. It was still interesting to see a breakdown of the pros and cons of the two systems and how each affects the culture of the workplace. It provided a case study of a company that switched from a batch system to a time-sharing system, which resulted in a breakdown of the informal communication system between the developers. Under the batch system they would congregate where the results were returned, since it wasn’t certain when the results would be back. Once they switched to a time-sharing system, everyone spent time in their own office and there was little communication and teamwork among the programmers.

There was no single takeaway from this book that would make me recommend reading it to achieve a particular end. Overall it was an interesting read from a conceptual perspective, but I don’t think it’s applicable to the average programmer. I think there is more value on the management side, but since that isn’t what I do, it’s harder for me to judge.

Imposter Syndrome Meetup

I was at a local meetup about imposter syndrome this week and it made me remember how far I have come in my own career. The speaker talked about his journey and the times he felt like an imposter, even though he had the sorts of experiences that would make most engineers jealous. I want to talk through my own background and about how I managed to come to grips with my own professional insecurities. Hopefully this will inspire others to have more confidence in themselves.

I remember at my first job how little I felt like I was learning and how little my coworkers around me seemed to know. When I went to interview for my next job a couple of years later, I was very nervous that I was behind the curve, and I wasn’t sure that I entirely understood how real-world software engineering was meant to work. When I got the job, I told myself that I had gotten lucky because the interview was devised by a bunch of alumni from my college, so it covered the kinds of questions I had seen in school.

At that job the imposter syndrome kicked in immediately. I was afraid that they had certain expectations of me based on my experience level, while I felt like I hadn’t progressed past what I had learned in school. I thought that I was behind the curve on version control practices, and I hadn’t gained any exposure to real domain modeling or object-oriented programming. I knew these skills were going to be important at this job and assumed they would expect me to know them when I walked in the door. There was all of the domain knowledge that goes with a new job as well, and at this company it was literal rocket science, so I couldn’t really slouch on that aspect either. The first few months were definitely rough; I had a couple of days where I spent all day fighting with basic ideas and couldn’t get anything to work, which made me feel like I didn’t deserve to be there. It eventually got better as I gained experience with the domain and the technology. I gained confidence after running down a couple of very gnarly bugs and getting praised for a creative solution to an awkward problem. Ultimately, though, my anxiety was misplaced. It turned out that my managers had never expected me to walk in the door with these skills; they had picked me because they were happy with what I knew already.

Sadly the company ran into hard times and I got laid off, but this paradoxically resulted in a big confidence boost. When I got back to my desk after hearing the bad news from my boss, my phone was ringing with a call from a former coworker who wanted to schedule an interview with his new company. All that time I had thought that I was barely getting by, he’d thought I was doing fantastic work. In the two weeks between then and my last day at that job, I got another offer as well, which further boosted my confidence.

I took my former colleague’s offer; I’m sorry to say it ended up not being a great culture fit for me. But by now, my increased confidence meant I was more willing to take a chance and make a move, looking for something that would be more of a challenge. I took a position in the same domain as an expert brought in to help salvage a failing software project. This job was a good fit on paper, since I had experience in both the domain and the technical stack. I was confident going in and was initially given a lot of latitude to do what needed to be done, which was great technical experience. But it had me doing a lot more management-style activities than I wanted to do, an area where I also felt I didn’t really have the right skills and experience. Then, after I got the software stabilized and found an order-of-magnitude performance improvement, the reward was to bog down the entire project in a mountain of process. So, while I’d gained confidence in my own abilities and standing when it came to technical issues, I was circling back to feeling like an imposter in this new process/management role.

My discomfort resulted in me moving to a small startup to help anchor their development team. The environment was very unstructured and goals changed week to week. I was immediately being asked to give expert opinions on technologies I had never worked with before. The situation was stressful because I felt like I wasn’t qualified to give these opinions, but it wasn’t clear whether they had anyone more qualified. On one hand I was faking it in that I didn’t know a lot of what I was talking about, but on the other hand I took the initiative to learn a lot about these new technologies. Really, I learned how to learn about technologies. The few technology decisions I made during my time there all seemed to work out fine, but I don’t know how that compares to having made other choices and I wasn’t there long enough to see the long term outcomes. Even now, I still find myself downplaying the difficulty of the work I did, still feeling like I was just a pretender.

My next job was at a large tech company, and it was an eye-opening experience. This was the first ‘normal’ web application I had worked on since my first job, and I was worried that I was out of practice. Since I had so many more years of experience than the last time I worked on web applications, I assumed the expectations for me would be higher than I could meet. I was worried that I would show up, not know how to do anything, and be summarily fired. This turned out not to be the case, but the impression I had going in limited my ability to accomplish anything. My assumption that I wouldn’t be able to contribute right away meant I stayed quiet about areas where I could have made improvements to benefit the company; I let mediocre practices I witnessed linger way too long before trying to change them.

Despite the good work I did there, my inability to change the culture and other improvable development practices really hurt my confidence about what I could achieve in this environment. This, combined with my lack of knowledge around building web applications, pushed me to do anything and everything I could to try and grow more. I put a concerted effort into getting out into the local development community to try and find a broader sense of inspiration. This was the time period when I started writing this blog as well. I started attending a number of local meetups and listening to various podcasts. Talking to so many new people who shared my struggles helped me understand that others don’t know some magical trick that I don’t. And it made me realize that learning how to learn was one of the most important things I had achieved. For me, moving on from imposter syndrome has been about accepting that I don’t know everything I wish I did on a topic, but neither does anyone else; it’s all about our willingness and ability to learn and improve.

This all culminates in my current position, where I changed to a tech stack I had never used at all before. My specific experiences weren’t immediately relevant to this new technology stack, but I did bring a lot of thoughts on unit testing, domain modeling, and other good technical practices. Since this was my fourth stack in 12 years as a professional, I had a fair idea of how to pick up a new stack and leverage what I did know to learn new things. There are still lots of things I don’t know, but I managed to learn enough to ask reasonable questions and to apply the concepts from other stacks. At times I am still concerned that I don’t know enough about certain topics, but I have become fearless about asking questions and unafraid of looking uninformed. This question asking seems to have given one of the junior engineers on my team the confidence to ask questions in pull requests when he doesn’t understand what’s going on. That sort of safe space within the team is the environment I want to be in, and accepting my own lack of knowledge on some fronts has empowered those around me to find a better way for themselves.

Strike Teams

At work there has been a new practice of starting up strike teams for different projects. The idea is that for projects that require expertise not found on any individual team, you pull in a person or two from multiple different teams to get all of the correct skills on a single team. That’s the pitch of the strike team model, but the hidden downside is that it breaks up the cohesion of the teams that people are pulled from and the created team may not be together long enough to create new cohesion. This post is mostly going to be a chronicle of the issues that I encountered starting up a strike team and what we did to try to resolve the problems.

The first problem was coordinating the different teams to figure out who was going to be involved. In the case of the strike team I was forming, the two teams contributing resources both wanted to know who the other team was going to contribute before making their own decisions. They also wanted a fixed end date for the project before deciding who to contribute. We ended up resolving this by setting a fixed end date for whoever they contributed to the team and getting the two of them to discuss the situation amongst themselves.

The second problem was aligning the start date of the strike team. The three teams contributing resources all had different sprint schedules, which made coordinating a ‘start’ date for the strike team difficult. Those who were on teams about to start another sprint wanted the strike team to start now, whereas those who had already made other commitments weren’t available. We ended up doing a rolling start: as each team finished its sprint, its people rolled onto the strike team, and the team ramped up as people became available. We did some preparatory work to get everyone up to speed on the goals and challenges of the team, so they were aware of what was going on whenever they were able to join.

The third big problem is more specific to our particular organization than to the strike team model. As part of setting up the strike team, we needed to schedule things like standups and retros for the team. Since the team is split across both coasts, the hours available for scheduling these meetings are limited. The conference rooms are also pretty much all booked, because every other team beat us to scheduling. We ended up asking IT to rearrange some other non-recurring meetings and managed to get a consistent slot for the standups. The retro slot was more complicated, but by not being sticklers for strict weekly delineations we managed to find times that worked and keep the retros roughly evenly spaced.

So with all the overhead sorted out, we finally get to move on to the real work, which will be a nice change of pace from the administrative side.

Book Chat: Pair Programming Illuminated

My team has been doing more pair programming recently so I picked up a copy of Pair Programming Illuminated. I had never done a significant amount of pair programming before and while I felt I understood the basics, I was hoping to ramp up on some of the nuances of the practice.

It covers why you should be pair programming, how to convince management to let you pair program, the physical environment for local pairing, and common social constructs around different kinds of pairs. All of this is useful information, to varying degrees. Since the book was written in 2003, some of the specifics of the physical environment section didn’t age well – most obviously, the advice to use 17” monitors. Both of the evangelizing sections seemed to cover the same ground, and neither seemed to be written in a way that would convince someone who isn’t already open to the concept. There were lots of references to studies, and some personal anecdotes, but none of it stuck in a way that felt like it would change someone’s mind.

The social aspects were interesting; however, most of the section felt obvious. If you have two introverts working together, then they need to work differently than two extroverts working together. A lot of the time the tips were common sense, and it didn’t seem necessary to write them down in the book. I would have liked to see more discussion of getting someone to vocalize more clearly what they’re thinking about.

I feel like I’m better equipped to do pair programming because of having read this, but I also feel like a long blog post would have been just as good a resource and much more focused. I don’t know what else I would have wanted to fill out the rest of the book.

Changing Stacks

I had an odd realization recently – the jobs that I changed stacks for seemed to be better than the jobs where I already knew the stack. I’m not sure if it’s a common experience or something that’s unique to my experiences.

I’ve got a few theories about why this might be the case for me. First, the jobs where I changed stacks appeared more engaging to me because there was more to learn. Second, the jobs where I changed stacks felt better because they used different hiring practices, looking for underlying talent and skills as opposed to specific stack-related experience. Third, relating to the second point, the diverse points of view brought together because so many colleagues were also changing stacks create a better workplace. Fourth, the people who are willing to change stacks are the kind of people who are more open to learning going forward. Lastly, the jobs where I changed stacks happened to be better by pure chance, given the small sample size. None of these theories is particularly provable. Several of them could be true and working together. They could all be false and it could be something completely different.

From talking to some of my coworkers, I know that they all came into their current jobs without particular experience in the Scala/Play Framework/MongoDB stack as well, although most of them came from a much more similar Java stack rather than the C# stack I came from most recently, or any of the other stacks I worked with in the past. They mentioned that we have issues recruiting in our DC office because of the pay differentials for cleared work in the area; there are lots of places you can go and get a comfortable government contracting job and not really stretch yourself and grow. There has also been some discussion about how the stack affects our ability to recruit, since it’s not as common as other stacks, and some candidates have expressed reservations since the stack isn’t really growing in popularity. There was a lot of information in the most recent Stack Overflow Developer Survey that corroborates the idea that Scala is shrinking, but it says it is shrinking as a relative share of all votes. In absolute terms the change is less clear.

I guess I’ve enjoyed the stack changes I’ve made. Only once did I set out with the intention of changing stacks as part of a job change. That time I wasn’t looking to go anywhere in particular, but I was looking to move away from ColdFusion, since it didn’t really mesh with how I like to do development. I don’t intend to make another change any time soon, but when I do, I’ll definitely consider another stack change, because it opens up a wider variety of options and seems to have worked out for me in the past.