Burnout

Burnout is a common topic in our industry. I’m thinking about it right now because I burned out fairly badly recently and, as a result, stopped blogging for a while. That broke the commitment device I had formed by posting weekly. I think I’ve recovered, and I want to take this opportunity to discuss what happened and how I could have avoided burning myself out.

My team was split in two: some people left to form a new team with a new mission, and some stayed on the existing team to continue the existing work. That meant roughly the same amount of work with fewer hands to do it. At the same time, we started reporting up to a different executive. All of this change together was a bit of a shock to the team; our overall output took a hit from the loss of people, but our productivity stayed good. We lost our leader and most of the other senior engineers, and the other remaining senior engineer moved to management, as he had been hoping to.

This left me and two junior engineers committing code day to day. Rebuilding was a slow process, though we did get an experienced front-end engineer relatively quickly to complement my skill set. Still, the majority of the work was on the backend, and the new management was putting on schedule pressure. We also had an influx of QA automation resources to the team, which we sorely needed to build out our suite of API and UI automation. That buildout did, however, bring to light a number of edge cases in the API that hadn’t been accounted for and needed to be cleaned up. I felt the influx of bugs and the schedule pressure as a weight mostly on myself. I tried to take on too much, and let my newly promoted boss handle the new executive.

Retrospectively, I should have pushed back sooner and taken a more active role in dealing with the new executive. It’s not that my new boss was doing poorly; he was definitely doing better than I had the first time I was put into that situation. It’s just that being thrust into that situation for the first time isn’t easy for anyone.

I ended up reading The Truth About Burnout to try to get a better grip on what was happening to me. It suggested that the path forward was to take more direct control of what is happening, essentially that the cause of burnout is a lack of control rather than the situation itself. This is an interesting idea, but in the situations where I have experienced burnout, what did the most damage wasn’t a lack of attainable control; it was the lack of any mechanism to take control and fix the situation.

It’s a weird sort of mental knot: the inability to fix the problem is the real problem, not the initial problem itself. On one hand it feels like victim blaming – you are unhappy because you aren’t fixing your own problem. On the other hand it’s a much more powerful statement about what you can do.


Book Chat: Site Reliability Engineering

Site Reliability Engineering is about the practices and processes Google uses internally to run its infrastructure and services. It espouses a set of principles and practices for running those sorts of highly available distributed systems. Some of the practices are obvious, like having a good plan for what to do during an incident; some are more complex, like how to design a system to be resilient to cascading failures.

For those unaware of the Site Reliability Engineering (SRE) team at Google, it is a hybrid operations-software engineering team that isn’t responsible for the functionality of a system but is responsible for ensuring that the service meets its uptime requirements. Not all services get a corresponding SRE team, just those with higher business value and reliability needs. By bringing in individuals with an uncommon blend of skills and giving them this specific mission, Google positions them to solve reliability problems in a systematic way.

The book describes a framework for discussing and measuring the risks of changing a software system. Most incidents are the direct result of a change to the system. The authors argue that necessitates putting the team that is responsible for the reliability of the system into the flow of releases and giving them the ability to influence the rate of change of the underlying service. That allows them to flow information back to the engineers building the system in a structured way. The ability to ‘return the pager’ gives the SRE team leverage that a normal operations team doesn’t have when dealing with an engineering team.
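The concrete form that framework takes in the book is the error budget. As a rough sketch of the arithmetic (my own illustration, not code from the book): a 99.9% availability objective over a 30-day window leaves roughly 43 minutes that can be “spent” on incidents and risky releases.

def errorBudgetMinutes(sloAvailability: Double, periodMinutes: Double): Double =
  (1.0 - sloAvailability) * periodMinutes

val monthlyBudget = errorBudgetMinutes(0.999, 30 * 24 * 60) // ≈ 43.2 minutes
// Once the budget for the period is spent, the release rate slows until reliability recovers.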

The limits on the operational burden of the SRE team are a strong cultural point. The team is made up of engineers, and they are expected to use their software engineering skills to automate their jobs so that the number of SREs scales with the complexity of the service, not its size. The cap on manual work, together with a defined process for rebooting a team that has sunk too deep into it, builds a strong understanding of what a successful team looks like. The cultural aspect of rebuilding a team is more important than the technical aspect, since each of those people knows how to do the right thing but their priorities have gotten warped over time.

As someone on the engineering side, there are significant portions of the book that aren’t immediately relevant to what I do. In reading this I may have learned more than I ever really wanted to know about load balancing or distributed consensus protocols. But the sections on effective incident response, post mortems, and culture more than make up for it for me.

The SRE discipline is an interesting hybrid of software engineering and software operations, and it is the only real way to handle the complexities of software systems going forward. The book stressed repeatedly that it takes a special breed to see how to build the systems that enable automation of this sort of work. I can see that in the operations staff I’ve interacted with over the years. A lot of them had a strong “take a ticket, do a ticket” mentality, with no thought as to how to make the tickets self-service or remove the need to perform the task at all. It’s a lot like bringing back the distinction between systems programming and application programming, where there was one kind of engineer who was capable of working at that lower level of the stack and building the pieces other users could work with on top of.

Overall I enjoyed the book. It brought together the idea that operations teams shouldn’t be that different from engineering teams in terms of the culture that makes them effective. The book really covers good software practices through the lens of that lower level of the stack. Then again, I’m a sucker for the kind of software book that has 5 appendices and 12 pages of bibliography.

Functional Programming Katas

Based on my success with the F# koans, I went looking for similar exercises covering the functional side of Scala. I’ve found a couple so far, which I’ve completed with varying levels of success.

First was the Learn FP repo, a GitHub repo you check out that has some code to fill in to make the tests pass. The exercises ask you to provide the implementations of various type classes for different types. There were some links to other articles about the topics, but otherwise it was just code. The first part of this was fairly straightforward; I had some trouble with State and Writer but otherwise persevered until I hit the wall at Free. I ended up breaking down and looking up the completed solutions provided in a different branch, only to find that IntelliJ indicated the correct solution doesn’t compile (it turns out the IntelliJ Scala plugin has an entire Scala compiler in it, and it’s rough around the edges). That frustrated me for a while, but I eventually managed to power through, and thankfully the rest of the exercises didn’t suffer from the same problem.
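To give a flavor of the exercises, implementing a type class instance by hand looks something like this (a minimal sketch of my own, not the repo’s actual code):

// A hand-rolled Functor type class and an instance for Option
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

object OptionFunctor extends Functor[Option] {
  def map[A, B](fa: Option[A])(f: A => B): Option[B] = fa match {
    case Some(a) => Some(f(a))
    case None    => None
  }
}

OptionFunctor.map(Option(2))(_ + 1) // Some(3)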

Next was the Cats tutorial. I had done some of the other exercises there when first learning Scala, and that had been pretty helpful. This one has a neat interactive website to run the code you fill in, though that makes it harder to experiment with the code. It seemed like a reasonable place to start, since it covers a lot of the major type classes in Cats. It has you look at sample code and fill in what it would evaluate to. It was good, but I had two issues with it. First, some sections have multiple blanks to fill in, and it evaluates them all as a group without any feedback about which one you got wrong. Second, it’s a lot of looking at other code and describing what it does, with no writing of code in this style yourself. Overall it helped me feel more comfortable with some of the terminology, but didn’t produce that “ah ha” moment I was looking for regarding the bigger picture.
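For a sense of the format, the exercises have you predict what expressions like these evaluate to (my own examples using Cats syntax, not ones lifted from the tutorial):

import cats.implicits._

Option(1) |+| Option(2)      // Some(3), via the Semigroup instance for Option[Int]
Option(1) |+| None           // Some(1)
List(1, 2, 3).foldMap(_ * 2) // 12, mapping each element and combining with the Int Monoid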

Then I went to the Functional Structures Refactoring Kata, which is an application to be refactored into a more functional style, with samples in multiple languages. The authors provide a ‘solution’ repo with refactored code to compare against. The issue I had with this exercise is that, other than going to look at the solution, there isn’t a real way to tell when you’re done. Even then, some of the ways they factored their solution are opinion-based. While seeing that opinion is interesting, they don’t really explain the why behind their decisions.

The last tutorial I tried was the Functional Programming in Scala exercises. It’s from the same people as the Cats tutorial above and is based on the exercises in the book Functional Programming in Scala. I managed to get about halfway through it without having read the book. There is some prose in between the exercises, but it doesn’t adequately explain all of the concepts. Once I’ve read the book, I will come back and do the rest of the exercises.

Overall I would strongly recommend the Learn FP repo and recommend the Cats tutorial. I would pass on the Functional Structures Refactoring Kata. I’ll hold judgment on the Functional Programming in Scala exercises until I can try them alongside the book. While these were largely good starts, I still haven’t had that conceptual breakthrough I’m looking for on how to use all of these pieces in practice.

HTTP/2 Multiplexing and Bundling Frontend Assets

This post is the result of two somewhat related ideas coming to mind together to form a question: how will the adoption of HTTP/2, with its support for multiplexing, impact the practice of bundling front-end assets?

The first idea is that HTTP/2 enables multiplexing: multiple HTTP requests sharing a single TCP connection. This removes the overhead of completing a TCP handshake for each request. It also works around some of the slow-start issues with TCP connections, so the connection needs to ramp up once instead of for every subsequent HTTP request.

The second idea is bundling front-end assets: taking multiple smaller JavaScript or CSS files and packaging them into one big file of that type. When working in the codebase, you want multiple smaller files to make the code easier to organize and understand. However, the network didn’t cope well with moving lots of tiny files around, due to the issues discussed above. This resulted in the practice of combining assets into larger files before serving them to end users. But this fights with HTTP caching semantics, since the large files change more often, which invalidates caches more often.

So the two ideas clearly impact each other, but since HTTP/2 adoption is still early, the outcomes aren’t necessarily sorted out yet. Some brief internet research yields two different opinions on possible outcomes. The first is a suggestion to avoid bundling, except where it makes sense from a compression perspective. The second is to do semantic bundling: since there is still some overhead per resource, though much less than before, you bundle resources that are likely to change together to maximize cache hits.

So there isn’t a strict best practice yet, and since adoption of HTTP/2 is still in the 20-25% range, there’s a while to go before you need to change your asset delivery plan. If you’re reworking your front-end resource pipeline, it might be an interesting consideration anyway, given that you will probably want to significantly change your bundling strategy in the 2020 timeframe. This little bit of knowledge doesn’t have much practical purpose today, but it will come up in the future.

Book Chat: Refactoring

Refactoring sets out to describe what refactoring is, why you should refactor code, and to catalog the different refactorings that can be done to an object-oriented codebase. This isn’t the first instance of the idea of refactoring, but the book was its big coming-out party in 1999. It is an audacious goal, in that the effort to catalog all of anything can be daunting. While I’m not an authority on refactoring by any means, it certainly captured all of the basic refactorings I’ve used over the years. It even includes refactoring to the template method design pattern, though it doesn’t cover something like refactoring to the decorator pattern. It seems odd to include refactoring to one design pattern but not to several others.

The description of the “what” and “why” of refactoring is excellent and concise. The catalog is roughly 250 pages of examples and UML diagrams of each refactoring technique; showing every single refactoring feels like overkill. In general, the author shows both directions of a refactor, e.g., extract method and inline method, which can be rather overwhelming. A newer volume like Working Effectively With Legacy Code seems more useful in its presentation of actual refactoring techniques, in that it prioritizes where we wish to go rather than exhaustively describing each individual modification. Honestly, since Refactoring predates automated refactoring tools, and the internet in 1999 wasn’t as full of help on these sorts of topics, I think the book needed to be that specific because it was the only source of help.
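To make the “both directions” point concrete, extract method and inline method are mirror images of each other; a trivial sketch of my own (not an example from the book):

object InvoiceExample {
  // Before: def invoiceTotal(qty: Int, unitPrice: Double) = qty * unitPrice * 1.08

  // After "extract method": the tax calculation gets a name of its own
  def invoiceTotal(qty: Int, unitPrice: Double): Double = withTax(qty * unitPrice)

  private def withTax(amount: Double): Double = amount * 1.08

  // "Inline method" is simply the reverse transformation
}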

It’s an interesting historical piece, but not an especially useful tool for improving your craft today.

Akka From A Beginner’s Perspective

I wandered into a new part of the codebase at work recently that contains a number of Akka actors. I was aware of both the actor concept and the library, but had never worked with either.

Actors are a way to encapsulate state away from other threads, so that if you want to change that state you have to send a message to the actor. If you’ve ever done any work with an event loop, it’s similar to that, but generalized to any sort of data rather than just events. The idea is that each actor provides a mailbox where you can leave a message; the actor processes the message, and whatever happens to the actor’s state happens on its own thread. This means the messages go to the actor’s thread, rather than the data being fetched from the actor and brought back to the caller’s thread. The big advantage of this is that there isn’t any need for locking, since no mutable state is shared. The downside to this message-passing style is that the default message flow is one way. Some typical code using an actor would look like

actor ! message

This would send the message to the actor. The actor itself can be pretty simple such as

import akka.actor.{Actor, ActorRef}

class ActorExample extends Actor {
  def receive = {
    case SampleMessage => println("got message")
  }
}

That receives the message and runs the listed code if it is of some expected type (in this case, SampleMessage). This is good for data sinks, but actors can be composed too.

class ForwardingActor(destination: ActorRef) extends Actor {
  def receive = {
    case SampleMessage(content) =>
      println(s"got message $content")
      destination ! SomeOtherMessage(content)
  }
}

This actor logs the contained data and passes the data along inside a different message wrapper. This is interesting but requires you to define the destination when creating the actor. Akka provides a syntax for finding out the actor that sent you a message too.

class ReplyingActor extends Actor {
  def receive = {
    case SampleMessage(content) =>
      sender() ! Reply(content) // The () is optional, but using it for clarity here
  }
}

This simply sends back the same content inside a new message envelope. There is one small gotcha in this code – if you close over sender() in asynchronous code (for example, inside a Future callback), it can refer to a different sender by the time it runs, so a different pattern is recommended for your receive method.

class ReplyingActor extends Actor {
  def receive = {
    case SampleMessage(content) =>
      processSampleMessage(sender(), content)
  }

  private def processSampleMessage(sender: ActorRef, content: String) = {
    sender ! Reply(content)
  }
}

This figures out the sending actor before doing any processing, to be sure you don’t end up closing over the wrong actor as you chain more complex pieces together. The other interesting thing about this example is that the type of sender is ActorRef, not Actor. An ActorRef is a handle that wraps the actor, keeping track of where it runs and how to get a message into its mailbox for you. This allows you to do things like have two actors interact even though they are being scheduled independently. This all seems pretty straightforward if you send a message from one actor to another, but if you send a message from something that isn’t an actor, what does sender() do and how does that work?

The answer is that the reply is generally discarded (it ends up in Akka’s dead letters), unless the call was made with an ‘ask’ such as

val result = actor ? message

This captures the result as a Future[Any], which at least returns something you can inspect, even if the type isn’t that useful. Akka currently provides typed actors to try to work around that pain, and those are intended to be replaced by Akka Typed, which isn’t quite ready for production as of this writing.
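As a rough sketch of what that looks like in practice (assuming the message types from the earlier snippets are plain case classes), you can at least narrow the Future[Any] with mapTo:

import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.concurrent.duration._

// Hypothetical message types matching the snippets above
case class SampleMessage(content: String)
case class Reply(content: String)

def askForReply(actor: ActorRef): Future[Reply] = {
  implicit val timeout: Timeout = Timeout(5.seconds) // ask requires a timeout
  (actor ? SampleMessage("hello")).mapTo[Reply]      // narrow Future[Any] to Future[Reply]
}

If the actor doesn’t reply within the timeout, the Future fails with an AskTimeoutException instead.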

That’s all the Akka I picked up delving into this new portion of the codebase. I didn’t need to get into supervision or schedulers, but if building a new application from scratch I’m sure those concepts would come up.

Book Chat: Beyond Legacy Code

Beyond Legacy Code is a description of nine practices to help improve the value of software. The author directed it not just at developers or engineers, but also at development or IT managers, product managers, project managers, and software customers. That’s a broad array of people coming to the problem with a wide set of goals and preconceptions. Eight of the nine practices are pretty normal and obvious items for most software engineers. One, however, was novel to me: implement the design last.

The basic idea is pretty straightforward – do a sort of bottom-up build of components and then compose them into larger and larger units, allowing the design of the larger pieces to emerge from that. Since you already have all these well-written and tested units, you can compose them together safely in ways that you understand. It keeps you from reaching for a more complex design pattern that you may not need because you are still working through all of the smaller pieces. I see it as the red-green-refactor mantra at the macro level.
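As a toy illustration of the idea (entirely my own hypothetical example, not one from the book), you build and test the small pieces first, and the larger structure is just their composition:

// Small, independently testable units built first
def parseLine(line: String): Option[(String, Int)] =
  line.split(",") match {
    case Array(name, count) if count.trim.nonEmpty && count.trim.forall(_.isDigit) =>
      Some(name.trim -> count.trim.toInt)
    case _ => None
  }

def tally(entries: Seq[(String, Int)]): Map[String, Int] =
  entries.groupBy(_._1).map { case (name, pairs) => name -> pairs.map(_._2).sum }

// The larger "design" emerges by composing pieces that already exist and are already tested
def report(lines: Seq[String]): Map[String, Int] =
  tally(lines.flatMap(parseLine))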

I had often tried to accomplish something similar by starting at the top and stubbing out smaller pieces as I went. This didn’t always work out, since the interface for the piece I stubbed out might not have the information it needed to do its work. I have also seen this end up with odd pieces that don’t really make sense outside the context of what I was working on, leaving me with fewer reusable components afterwards. Overall, though, it worked fairly well as a way to decompose the problem in an initial pass.

Since reading this book, I’ve tried the bottom-up buildout a couple of times. It seems to have taken me significantly longer to do the work, but I think the overall reusability of the resulting design is better. I feel that with more practice I should be able to be at least as productive as before. I haven’t had to come back to any of the code I wrote this way for maintenance, so I don’t have any data yet on whether it delivers the maintainability it promises.

I don’t think the book delivers for the full audience it’s aimed at, but it is well directed at software engineers, considering the principles and guidelines we use. I don’t know what a large portion of the intended audience would get from reading it other than a familiarity with the terms, so they could communicate better. I don’t see how it would cause a project manager to reconsider a schedule, or an IT manager to deal with a project differently. Maybe I can’t take their point of view well enough, since I saw large portions of the suggested practices as ‘normal’, but for those roles the book would help articulate the value of things like unit testing. I don’t know of a better modern book for those on the management side of software without a software background that is still technical. Classics like Peopleware or The Mythical Man-Month still show most of what you need to do to run a software team from a strictly management perspective, and this doesn’t supplant those. Looking at the reviews on Amazon, though, my concern that this isn’t what non-developers want seems to be unfounded. The consistent praise it garners there makes me curious to see what other non-developers I know would think if they read it.

2017 Year in Review

I had a couple of goals from last year for the blog.

  • Keep up the weekly post cadence
  • Continued increases in readership
  • Write a longform piece and see what that looks like in this format

I kept the weekly post cadence, but a couple of times I did feel like I was running out of material. Readership was up from 440 views last year to 587 this year. The page getting the most views was the homepage, which almost tripled from last year with 133 views vs. 48. I’m not sure what that means; it could be people coming back to see what’s new, or visitors looking at the homepage after whichever article they searched for. Otherwise the most visited page was last year’s Anonymous type XML serialization post, with 94 views. I appreciate that it seems to have garnered a pretty regular stream of visitors since I first posted it; to me, that means it has lasting value. As for longform, I don’t think I accomplished that, though the posts I wrote this year hit the 500-750 word mark pretty consistently compared to previous years.

For 2018 I’ve still got last year’s goals for the blog to continue on. I also want to better understand what articles people are reading. In order to do this I want to try overhauling the landing page, so that it gets me that information. I could achieve this same end by putting a ‘read more’ tag into each post, but I don’t like those when I read content so I would feel hypocritical using them on my own. I am considering a static landing page to see what that does, but I don’t know what I would want to put on it.

In terms of personal technical goals going into next year, I want to carve out time to learn some Haskell or Cats and those efforts seem like they would create interesting posts. I know we’ve got a project coming up at work that will mean switching from Reactive Mongo to the official Mongo Scala driver and upgrading to Play 2.5, which will probably be occupying a good bit of my time early in the year. I’m not sure what other projects I would be doing at work. Other technologies I use at work that I want to get further into are RxScala and the Docker and Mesos infrastructure we build on top of. That’s a lot of ground to cover, and I know I would need to find a project to do with any of those to really dig in.

2017 had plenty of surprises in my programming experiences, and I don’t expect to be able to predict what 2018 will hold. I’ve got a direction, not a destination.

Goals For Senior Engineers

It’s annual review time around my office. Looking back at my personal goals for the last year and trying to come up with some for next year is a good opportunity for reflection. First up are the ones from last year, summarized.

  1. Make creating a new microservice simpler
  2. Lead a major project with a company wide impact
  3. Get the team into a better place regarding technical debt
  4. Mentor some Junior Engineers on the team
  5. Improve on call documentation/processes

I didn’t succeed at the first one, largely because I never really got a chance to try. It’s unfortunate because it would be great for the company in the long term. However, it’s an engineering-centric efficiency improvement for which the benefits are difficult to quantify, so it was never urgent. Not getting this done was disappointing for me personally. I had identified a way to help lots of engineering teams through an idea that’s obvious in hindsight, but I wasn’t able to find the time to get it implemented.

The second one was a pretty big success. The design I had put together got built out and the resulting architecture worked out great. Despite running into some issues with some of the prerequisites to operationalizing it we were able to get the whole thing into production ahead of schedule. On top of that we beat our load, response time, and resource usage targets handily. Overall I’m amazingly pleased with how this one came out.

The third one is an odd one, since by the metrics I set out for myself we nailed it, but by other metrics you could use to measure it we failed. I led some process changes to help us stop accruing technical debt, and start making more time to deal with the debt. This resulted in us closing an amazing number of tickets, three times the target my boss said was ‘too ambitious’ when I set it. He had suggested tracking the number of open technical debt tickets, but I stuck to the number of tickets dealt with since I felt as people saw these tickets got resolved they would be more inclined to file more of them. I was right on that, so while our metrics got worse, our position really got better.

I spent significant time mentoring our two new grads and two other new hires to help them get up to speed, not just on Scala but also on our systems and software engineering in general. The plan was to set up pair programming time and accomplish the mentoring during those sessions. Since I’m on the East coast and a morning person, and the new grads are on the West coast and not morning people, scheduling was a bit complex at first. Eventually we found a rhythm of me pairing with each of them an hour a week. This ended up helping them a bunch. They got solutions to their problems, and I taught them a lot of little things like IDE shortcuts and how to use problem-solving resources like Symbol Hound (a search engine that will search for special characters).

The last goal was meant to resolve a problem that had existed from the time I joined the team. Then, the on call rotation was two people trading off with no backup. The reason was that there wasn’t any documentation for how to deal with some of the ‘done’ systems the team supported, i.e., the systems that are feature complete and don’t have any active development on them. Nobody learned anything about them since they never touched them, but it also meant that the new engineers, myself included, couldn’t really go on call since if the system broke they didn’t have any idea what to do. I ended up doing a deep dive into each of the systems and producing docs about how they worked/logged so that if you got an error you had somewhere to start and an idea of what to do next. We opened up the pager rotation and everyone on the team gets the opportunity to be primary. It’s unclear if the documentation is any good because the systems haven’t had any problems.

This all brings me to goals for next year. There is some reorganization going on, so I may end up on a new team. With that unknown right now, I’m not sure how I’ll figure out my goals for the coming year, which compounds the issues I have with this topic every year. We do scrum, so the team is supposed to be working on tickets selected by the product owner; while you can influence them, you don’t have a ton of control over what you build. You can pick goals around how you work, like the mentoring and technical debt goals I had last year, which apply to whatever you end up doing. The more project-based goals were based on large work items I knew were coming down the pipeline and probably wouldn’t be moved.

If I stay on my current team, I have a good idea of what sort of work I’ll be doing and some process items to try to get to. If I move to a new team, I’m not sure what exactly I’ll be doing. I also don’t know whether the team I end up on will be finalized before I need to submit my official goals. I could write some goals around community involvement, or doing internal training/presentations. Maybe finding a way to automate some new metrics would be worth doing. I’ve always had difficulty finding good individual goals, since I see what we do as a team effort.

Individual goals are difficult to align with the team-based metrics generally used when doing scrum. This creates a sort of tension between individual goals that the team isn’t aware of and the team metrics and goals, which end up secondary to everyone’s individual performance. I see the goals for Senior Engineers as needing to be more team-focused than those of lower level engineers, since they are more big-picture oriented. If you are a Senior Engineer, you’ve reached the point where you are trusted to tackle issues independently, which should create space to focus not just on merely doing, but on doing better.

Long Term Open Source Success

I was having a discussion with some of my coworkers at our engineering book club, talking about the first few chapters of Clean Architecture. We were discussing whether anyone had worked at a place that took a really long-term view of the architecture of a system. Most people didn’t think they had. I thought I might have once, but it was hard to say whether the company’s leadership had that view or whether they got lucky with some key engineers just making it happen. Unfortunately the company ran into business issues, with the very large competitors in the industry picking off its largest customers. From there I posited a different question to the group: is the architectural difference between open source software and commercial software one of the reasons for its long-term success?

There was a palpable pause; the idea at once made sense to everyone, but nobody was sure how to confirm or refute it. We all agreed open source, at least initially, is built by software devotees for personal reasons. Whether it is for personal use, to learn something, or to prove out an idea, people take time out of their day to build a piece of software and put it out there for the world to see. It’s finished when it’s finished, and if you aren’t enjoying working on it then it’s no big loss to just set it aside and do something else. There was some discussion around the open-source-with-paid-support model and whether that had more to do with the development of large chunks of open source software. There was also discussion of the resurgence in popularity of Postgres because it is feature-packed and solid, and whether its continued flexibility comes from an underlying difference in architectural quality, given that it doesn’t have corporate backing like other common relational databases.

From an ideological point of view, the idea that software succeeds in the long term because of better architecture is greatly pleasing. I would love to see data bearing that idea out, but I don’t know how you would get access to an appropriately large selection of equivalent projects. Something like The Architecture of Open Source Applications tries to make it easier to understand the big picture of some successful and long-lived open source applications, but you would need a matched set of closed-source applications to compare and contrast against.

Building a great architecture requires taking time to deeply understand the problem. Sometimes you truly need to “build one to throw away,” knowing all the cost that represents. The pressure of commercial success and unbridled growth puts short term thinking into the forefront, and often prevents throwing it away.

Open source is a thousand developers trying to fix problems they run into alone and putting the results out there. When you hear about a solution, you go take a look at it. If its API doesn’t work for you, the project doesn’t gain a new follower; without followers there are no contributors, and the project eventually withers and dies. If the API does work for others and it solves problems, the project grows. You can look at it as a distributed genetic algorithm doing API-first development. You rarely hear about the ideas that don’t catch on; you hear about the things where people felt they gained value.

One of my favorite definitions of software architecture is making the decisions that are hard to change. If an open source project decides they are just going to change their underlying central abstractions for the next year, that’s a strictly technical decision. They don’t need to build a business justification, or fight to get it on the schedule. They don’t even necessarily need a consensus that it’s a good idea; anyone who feels strongly enough that it’s the proper solution could just fork the project.

At work we’ve got a central architectural abstraction that is, in my opinion, not right. I could go build a PR on my own time to change it, but that change would then ripple out to a dozen other teams as they start using the new abstraction. I could help them adopt it, but I would need a strong consensus from those teams to get the PR in. Even though I think that consensus exists in engineering, the product schedule doesn’t leave time for this sort of change, and it’s hard to quantify the drag the current abstraction is causing, which makes the case difficult to make. If you could use a library built by people who are quality-obsessed or one built by people trying to hit a schedule, which would you choose?