Strike Teams Part 2

This is a follow-up to the original strike team post I did a while back. I’m writing this as the strike team is wrapping up. It’s been an interesting experience: we hit most of our broad goals, but in a significantly different way than anticipated.

The first big change came early on, when we were looking into the proposed changes to an existing piece of software that was going to be a consumer of what we were building. We realized that accommodating it was probably weeks of work by itself. The initial assumption had been that only minimal work was needed, but that particular solution turned out to be untenable to the people who would support the consuming software in the long term. They ended up pulling their support from the strike team and becoming uninvolved. At the time, the scope and resourcing change was challenging since it threw everything into an uncertain state. I think it ended up being a good thing; we were able to focus on building out the desired functionality without spending a lot of time making changes to one consumer of the service. It does make the follow-up more complex, since we need to make sure that work still gets done; otherwise the whole system will end up in a more complex state.

There was one portion of what we were going to deliver that had a complex deployment step. There were other changes that had to go either before or after this particular piece, so we tried to move that deployment as far forward on the project plan as possible. The intention was to deliver it at the end of the first sprint. We then encountered some significant scope changes that unfortunately all had to go before that step, which pushed the actual deployment out until the middle of the third sprint. At that point we had about ten tickets’ worth of completed work sitting in a feature branch, waiting for the eventual opportunity to merge the whole thing back to master with nearly 5000 lines of changes. We ended up performing a series of messy merges bringing changes from master back to the feature branch, and the final merge from the feature branch back to master was ugly since we wanted to preserve some but not all of the history of that branch. In retrospect we should have seen the merge issues coming and been more proactive about doing smaller merges, but lesson learned for later.

Every team member lost some time to dealing with issues brought to them by the team they were on loan from. We had anticipated this during the first sprint but didn’t expect it to continue through the remainder of the strike team. I personally got sucked into some production oddities, and after the first sprint without me my team missed my presence in code review enough that they got me back to reviewing basically everything. Both of the other members got pulled back to their original teams for issues that came up. We didn’t fight any of these needs, figuring that since they seemed like critical issues for those teams, having their people available was the best overall use of time, even if it hurt our ability to get things done on the strike team.

The team as a whole never really jelled to any significant degree. Everyone kept working the way their home team did, which created some minor conflicts. One engineer came from a team with a very lax structure; he would sort of disappear for days and then show up with large amounts of excellent code, but it was frustrating to have him essentially go off the grid without forewarning. The other came from a team working in a different technical stack and vacillated between not asking enough questions, and therefore spending too much time stuck, and asking too many. The three of us were also in three different locations, which made it difficult to really get on the same page and work together.

Overall the strike team model worked here, since having representatives from some of the teams that were going to consume this service working on it made sure we didn’t run into any issues with a misunderstanding of the domain. There were problems in setting up a new team that we should have attacked proactively, since any new team needs to get organized and form its own norms. The transient nature of a strike team also prevents a lot of identity building, which, in my opinion, is key to building a good team. Based on this experience I think the strike team model can deal with software projects that cross multiple team boundaries, but there may still be better approaches out there to be found.


Type Classes

I have been trying to understand type classes for a while and having a hard time figuring out exactly what they mean or how to effectively use them. The idea that they are a means to extend the behavior of a class in a consistent manner makes sense in an abstract way. But I’m not sure how exactly they accomplish that goal, or what a type class is or isn’t. A coworker had offhandedly referred to a type class as “a trait with a type parameter,” but it feels like there has to be more to it than that. This post is a sort of journal of my efforts to figure out exactly what a type class is.

The Wikipedia page wasn’t that helpful to me since it defines type classes in terms of other unfamiliar terms. It had one useful tidbit of information from my perspective: in Scala, type classes are implemented with implicits. Following some more links I ended up at the Cats documentation, which has a bunch of example type classes and code using them that I found useful. This got me thinking about whether I had seen anything with a signature that looked like it might be using a type class. I remembered the sum method on List, which had stuck in my mind because it was unclear how it knew how to sum the items and what types it would be legal to sum.

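For reference, the relevant signature looks roughly like this in the standard library (paraphrased, so treat the exact shape as approximate):

// Paraphrased from the Scala collections library: sum is only available
// when an implicit Numeric instance exists for the element type.
def sum[B >: A](implicit num: Numeric[B]): B = foldLeft(num.zero)(num.plus)
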
This definitely looks like a type class given our definition so far. We have an implicit argument that is a trait with a type parameter. It is being used to extend numeric types to give them orderings and perform arithmetic operations, but it also signals that the type is numeric. The type class is also being used to constrain what types the sum method is available for, since if the implicit is not available it won’t compile. This constraint also plays nicely with the context bound syntax.

So we’ve got an example and some rules, but that’s not really a definition. I went looking for more examples in the Cats codebase, since that is full of type classes. Each of the individual type classes in Cats definitely follows the pattern of a trait with a type parameter. I think the missing piece for my understanding lay in what the methods on the type class are. The methods all seem to take at least one argument of the type parameter, so that appears to be a reasonable constraint on the functions.
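
For instance, Semigroup in Cats boils down to this shape (simplified here to just the core method):

// Simplified: a trait with a type parameter whose methods each take
// at least one argument of that type.
trait Semigroup[A] {
  def combine(x: A, y: A): A
}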

Type classes are different from implicit classes since you can constrain type signatures with them, but both let you add new functionality to an existing type. The implicit lookup process imposes some constraints on an implicit class, such as only taking one non-implicit argument, which the methods on a type class can bypass. You could write an implicit class to expose the type class in a more fluent way, like this:

// An implicit class exposing the Numeric type class more fluently.
// The context bound [T: Numeric] requires an implicit Numeric[T] in scope.
implicit class NumericWrapper[T: Numeric](x: T) {
  def plus(b: T): T = {
    val n = implicitly[Numeric[T]] // summon the Numeric instance for T
    n.plus(x, b)
  }
}
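
With that wrapper in scope, 2.plus(3) compiles by converting the Int through NumericWrapper and summoning the implicit Numeric[Int], and evaluates to 5.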

Type classes seem like they would be more useful in a language without multiple inheritance. Since in Scala I can have a type implementing multiple traits that already have implementation associated with them, just mixing in more code seems like an easier way around the extension problems. I found this proposal for adding type classes to C#, which seems very cool and in line with the sorts of powerful abstractions they’ve been trying to add to the language. Seeing a different syntax for using type classes without implicits being involved helped me understand what they really are.

Going back to my initial goal of figuring out what a type class is and how it works, I think I’ve figured out both. It is a generic type that adds functionality to an underlying type but defers the implementation of that functionality. You then write something that expects the type class and brings the implementation of the functionality and the data together. In Scala this is accomplished using type bounds to specify the type class and implicit parameters to pass the type class into the implementation. I’m still not sure when I would want to write my own type class as opposed to using other polymorphic concepts, but I’m now confident about using existing type classes, and even breaking out some of the Cats-based ones as opposed to just the inbuilt ones.
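
To tie it together, here is a minimal hand-rolled sketch of my own. The Show name mirrors the Cats type class of the same idea, but these instances and the describe method are purely illustrative:

// The type class itself: a trait with a type parameter that defers the
// implementation of show to per-type instances.
trait Show[A] {
  def show(a: A): String
}

object Show {
  // Implementations for specific types, provided as implicit values.
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = s"Int($a)"
  }
  implicit val stringShow: Show[String] = new Show[String] {
    def show(a: String): String = s"String($a)"
  }
}

// The context bound [A: Show] constrains describe to types with a Show
// instance and passes that instance in as an implicit parameter.
def describe[A: Show](a: A): String = implicitly[Show[A]].show(a)

describe(42)      // "Int(42)"
describe("hello") // "String(hello)"
// describe(3.14) would not compile: no implicit Show[Double] in scope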

Media Diet

Last week I wrote about how I really tackled my imposter syndrome by reaching out into the wider community. It helped me feel like I was making progress outside of whatever was going on at work. I wanted to share the resources I use to find new ideas and keep up my continuous learning.

Blogs

Podcasts


This may seem like a lot of stuff, but most of the podcasts publish once a week, and blogs are generally less frequent than that. I generally try to get to a meetup or two a week on top of this. The whole diet helps me feel more informed and in touch with a software community outside work.

Imposter Syndrome Meetup

I was at a local meetup about imposter syndrome this week and it made me remember how far I have come in my own career. The speaker talked about his journey and the times he felt like an imposter, even though he had the sorts of experiences that would make most engineers jealous. I want to talk through my own background and about how I managed to come to grips with my own professional insecurities. Hopefully this will inspire others to have more confidence in themselves.

I remember at my first job how little I felt like I was learning and how little it seemed my coworkers around me knew. When I went to interview for my next job a couple of years later I was highly nervous that I was behind the curve, and I wasn’t sure that I entirely understood how real-world software engineering was meant to work. When I got the job, I told myself that I got lucky that the interview was devised by a bunch of alumni from my college, so it covered the kinds of questions I had seen in school.

At that job the imposter syndrome kicked in immediately. I was afraid that they had certain expectations of me based on my experience level, whereas I felt like I hadn’t progressed past what I had learned in school. I thought that I was behind the curve on version control practices, and I hadn’t gained any exposure to real domain modeling or object oriented programming. I knew these skills were going to be important at this job and assumed they were things they expected me to know when I walked in the door. There was all of the domain information that goes with a new job as well, and at this company it was literal rocket science, so I couldn’t slouch on that aspect either. The first few months were definitely rough; I had a couple of days where I spent all day fighting with basic ideas and couldn’t get anything to work, which made me feel like I didn’t deserve to be there. It eventually got better as I gained experience with the domain and the technology. I gained confidence after running down a couple of very gnarly bugs and getting praised for a creative solution to an awkward problem. Ultimately though, my anxiety was misplaced. It turned out that my managers never had expected me to walk in the door with these skills; they had picked me because they were happy with what I knew already.

Sadly the company ran into hard times and I got laid off, but this paradoxically resulted in a big confidence boost. When I got back to my desk from hearing the bad news from my boss, my phone was already ringing with a former coworker wanting to schedule an interview at his new company. All that time I had thought I was barely getting by; he’d thought I was doing fantastic work. In the two weeks between then and my last day at that job I got another offer as well, which helped boost my confidence further.

I took my former colleague’s offer; I’m sorry to say it ended up not being a great culture fit for me. But by then the increase in confidence meant I was more willing to take a chance and make a move, looking for something that would be more of a challenge. I took a position in the same domain as an expert brought in to help salvage a failing software project. This job was a good fit for me on paper, since I had experience with both the domain and the technical stack. I was confident going in and was initially given a lot of latitude to do what needed to be done, which was great technical experience. But it had me doing a lot more management-style activities than I wanted, an area where I also felt I didn’t have the right skills and experience. Then, after I got the software stabilized and found an order-of-magnitude performance improvement, the reward was to see the entire project bogged down in a mountain of process. So, while I’d gained confidence in my own abilities and standing when it came to technical issues, I was circling back to feeling like an imposter in this new process/management role.

My discomfort resulted in a move to a small startup to help anchor their development team. The environment was very unstructured and goals changed week to week. I was immediately being asked to give expert opinions on technologies I had never worked with before. The situation was stressful because I felt like I wasn’t qualified to give these opinions, but it wasn’t clear that they had anyone more qualified. On one hand I was faking it, in that I didn’t know a lot of what I was talking about; on the other hand I took the initiative to learn a lot about these new technologies. Really, I learned how to learn about technologies. The few technology decisions I made during my time there all seemed to work out fine, but I don’t know how that compares to the choices I didn’t make, and I wasn’t there long enough to see the long-term outcomes. Even now, I still find myself downplaying the difficulty of the work I did, still feeling like I was just a pretender.

My next job was at a large tech company, and it was an eye-opening experience. This was the first ‘normal’ web application I had worked on since my first job, and I was worried that I was out of practice. Since I had so many more years of experience than the last time I worked on web applications, I assumed the expectations for me would be higher than I could meet. I was worried that I would show up, not know how to do anything, and be summarily fired. This turned out not to be the case, but the impression I had going in limited my ability to accomplish anything. My assumption that I wouldn’t be able to contribute right away meant I stayed quiet about areas where I could have made improvements to benefit the company; I let mediocre practices I witnessed linger way too long before trying to change them.

Despite the good work I did there, my inability to change the culture or improve the development practices really hurt my confidence about what I could achieve in that environment. This, combined with my lack of knowledge around building web applications, pushed me to do anything and everything I could to try and grow. I put a concerted effort into getting out into the local development community to find a broader sense of inspiration. This was also the period when I started writing this blog. I started attending a number of local meetups and listening to various podcasts. Talking to so many new people who shared my struggles helped me understand that others don’t know some magical trick that I don’t. And it made me realize that learning how to learn was one of the most important things I had achieved. For me, moving on from imposter syndrome has been about accepting that I don’t know everything I wish I did on a topic, but neither does anyone else; it’s all about our willingness and ability to learn and improve.

This all culminates in my current position, where I changed tech stacks to one I had never used before. My specific experience wasn’t immediately relevant to the new stack, but I did bring a lot of thoughts on unit testing, domain modeling, and other good technical practices. Since this was my fourth stack in 12 years as a professional, I had a fair idea of how to pick up a new stack and leverage what I did know to learn new things. There are still lots of things I don’t know, but I know enough to ask reasonable questions and to apply the concepts from other stacks. I am still at points concerned that I don’t know enough about certain topics, but I have become fearless about asking questions and unafraid of looking uninformed. That question asking seems to have given one of the junior engineers on my team the confidence to ask questions in pull requests when he doesn’t understand what’s going on. That sort of safe space amongst the team is the environment I want to be in, and accepting my own lack of knowledge on some fronts has empowered those around me to find a better way for themselves.

Strike Teams

At work there has been a new practice of starting up strike teams for different projects. The idea is that for projects requiring expertise not found on any individual team, you pull in a person or two from multiple teams to get all of the right skills onto a single team. That’s the pitch of the strike team model; the hidden downside is that it breaks up the cohesion of the teams people are pulled from, and the new team may not be together long enough to build cohesion of its own. This post is mostly a chronicle of the issues I encountered starting up a strike team and what we did to try to resolve them.

The first problem was coordinating the various teams to figure out who was going to be involved. In the case of the strike team I was forming, the two teams contributing resources both wanted to know who the other team was going to contribute before making their own decisions. They also wanted a fixed end date for the project before deciding who to contribute. We ended up resolving this by fixing an end date for whoever they contributed and getting the two teams to discuss the situation amongst themselves.

The second problem was aligning the start date of the strike team. The three teams contributing resources all run their sprints on different schedules, which made coordinating a ‘start’ date for the strike team difficult. Those on teams about to start another sprint wanted the strike team to start immediately, whereas those with existing commitments weren’t available yet. We ended up doing a rolling start: as each team finished its sprint, its people rolled onto the strike team, and the team ramped up as people became available. We did some preparatory work to get everyone up to speed on the goals and challenges of the team so they knew what was going on whenever they were able to join.

The third big problem was more specific to our particular organization than to the strike team model. As part of setting up the strike team we needed to schedule things like standups and retros. Since the team was split across both coasts, the hours available for these meetings were limited, and the conference rooms were pretty much all booked because every other team had beaten us to scheduling. We ended up asking IT to rearrange some other non-recurring meetings and managed to get a consistent slot for the standups. The retro slot was more complicated, but by not being a stickler for strict week boundaries we found roughly spaced times that worked.

So, with all the overhead sorted out, we finally get to move on to the real work, which will be a nice change of pace from the administrative side of things.

Seven More Languages in Seven Weeks

Seven More Languages in Seven Weeks is a continuation of the idea started in Seven Languages in Seven Weeks that by looking at other languages you can expand your understanding of concepts in software engineering. While you may never write production code in any of these languages, looking at the ideas that are available may influence the way you think about problems and provide better idioms for solving them.

This installment brings chapters on Lua, Factor, Elixir, Elm, Julia, MiniKanren, and Idris. Each of these languages is out on the forefront of some part of software engineering. Lua is a scripting language with excellent syntax for expressing data as code. Factor is a stack-based programming language with interesting function composition capabilities. Elixir brings a Ruby-like syntax to the Erlang VM. Elm is reactive functional programming that compiles to JavaScript. Julia is a technical computing language with a more user-friendly feel and good parallelism primitives. MiniKanren is a logic programming language and constraint solver built on top of Clojure. Idris is a Haskell descendant bringing in the power of dependent types to provide provably correct functional code.

Overall it was an interesting survey of the variety of programming languages. Some I had done a bit with before (Lua, Elixir), some I had heard of (Elm, Julia, and Idris), and some I hadn’t even heard of (Factor and MiniKanren). Each chapter is broken into three ‘days’ indicating a logical chunk of the book to tackle at once, and each day ends with a series of exercises to help make sure you understand what’s being presented.

Since these languages are out on the edge of the world in programming terms, they are evolving fairly quickly. This ended up biting the Elm example code particularly hard, since large portions of it had been deprecated in the releases since publication and didn’t work on the current runtime. Compared to the lineup from the original book (Clojure, Haskell, Io, Prolog, Scala, Erlang, and Ruby) you’ve got a much broader variety of languages in the sequel, but nothing with the popularity of Ruby or the legacy install base of Erlang. Since this was written in 2014, none of these languages have had a massive breakout in terms of popularity and adoption, though they do seem to do well on lists of languages people want to work with.

Overall it’s an interesting take on where things could be going. I don’t think most of the languages covered have significant mainstream appeal right now, though two of them seem more ready for primetime than the others. Julia definitely has a niche where it could be successful, and I feel like the environment is ripe for something like Elm to surge in popularity since frontend technology seems to be going through constant revisions.

Bug Bash

We had a big all-hands off-site meeting. In the run-up to it, there were two days that most teams had left out of their planning process, so we ended up running a company-wide bug bash to close out all of the existing bugs, even those pesky minor bugs that are never really a priority. It felt like over those two days my team closed out an entire three-week sprint’s worth of bugs, by points. After the fact I went back and counted and found that the feeling was right. Some of the bugs had previously been pointed and some were quick estimates I put on them after the fact, but it was a slew of 1-3 point tickets that all got closed out.

Dealing with ~30 tickets in two days was a furious endeavour for the team. It was really satisfying to deal with all of those tickets that had built up. I know they had been weighing on my mind in the sense that the list just seemed to keep growing and I wasn’t sure what to do about it.

This brought me to another interesting question: why were we so much more effective during the bug bash than in a normal sprint? Are we underestimating the stories relative to the bugs? That seems the most obvious answer. Is the low end of the point spectrum too compressed, so that the difference between a two-point task and a three-point task isn’t sufficiently granular? I spoke with some of the other members of my team about it, and there was some speculation that because we each took tickets related to what we already knew, we simply got more done, but that shouldn’t account for all of the difference. It’s also possible that since all of the pieces were strongly independent, we had less communication lag.

Maybe I should just be happy that all of those bugs got dealt with. But, I’d really love to find a way to bring that efficiency to our normal processes. If anyone has any ideas, please share in the comments.

Java Containers on Mesos

I recently ran into an interesting issue with an application running in a container. It would fire off a bunch of parallel web requests (~50) and sometimes would receive the results but not process them in a timely manner, despite our application performance monitoring saying that CPU usage during the request stayed very low. After a ton of investigation, I found a few very important facts that contradicted some assumptions I had made about how containers and the JVM interact.

  1. We had been running the containers in Marathon with a very low CPU allocation (0.5) since they didn’t regularly do much computation. This isn’t a hard cap on the container’s resource usage; instead Mesos uses it to decide which physical host should run the container, and it influences the scheduler of the host machine. More information on this is available in this blog post.
  2. The number of processors the runtime reports is the number of processors the host node has; it doesn’t have anything to do with the CPU allocation made to the container. This impacts all sorts of under-the-hood optimizations the runtime makes, including thread pool sizes and JIT resources allocated (see the sketch after this list). Check out this presentation for more information on this topic.
  3. Mesos can be configured with different isolation modes that control how the system behaves when containers begin to contest for resources. In my case this was configured to let me pull against future CPU allocation up to a certain point.
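
A quick way to see the second point for yourself is to ask the runtime from inside the container; this is just a minimal sketch, and the number it prints is whatever your host happens to have:

// Run inside the container: on a JVM without container awareness this reports
// the host's core count, regardless of the 0.5 CPU share granted by Marathon.
object CpuCheck extends App {
  println(s"availableProcessors = ${Runtime.getRuntime.availableProcessors()}")
  // Defaults like scala.concurrent.ExecutionContext.global size their thread
  // pools from this number, so they end up sized for the host, not the container.
}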

This all resulted in the service firing off all of the web requests on independent threads, which burned through the CPU allocation for the current time period and the next, so when the results came back there was nothing left to process them with. We immediately changed the code to only fire off a maximum number of requests at a time. In the longer term we’re going to change how we define the number of threads, but since that change has a larger impact it got deferred until we could measure it more carefully.
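
A minimal sketch of the immediate fix, with hypothetical names and an arbitrary pool size of 8 rather than the real service code; the idea is just to cap how many requests are in flight at once:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedRequests {
  // Hypothetical stand-in for the real web call.
  def fetch(url: String)(implicit ec: ExecutionContext): Future[String] =
    Future(s"response from $url")

  def main(args: Array[String]): Unit = {
    // At most 8 requests execute concurrently, no matter how many cores
    // availableProcessors claims the host has.
    implicit val ec: ExecutionContext =
      ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

    val urls = (1 to 50).map(i => s"http://example.internal/item/$i")
    val results = Future.sequence(urls.map(fetch))
    println(Await.result(results, 30.seconds).size) // 50
  }
}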

Snipe Hunt

Recently I got pulled into a project to help push a mostly finished feature through a “final QA round” before release. I felt that this wouldn’t require much of my time, but as you can imagine, things didn’t quite go as expected. The QA round found about a dozen errors in the new feature, which I eventually divided into two classifications: requirements SNAFUs and code quality issues.

The requirements SNAFUs were the sorts of problems where the original programmer built what was explicitly asked for, but QA took the one-of-everything approach, trying all sorts of cases that weren’t specified at all. These sorts of problems eat up time but aren’t that difficult to fix. The code quality issues are much more pernicious.

Digging into the code itself I quickly found an interesting fact. There were two fields, currentPlanId and activePlan, that were being mutated in various portions of the application, generally together. There wasn’t any clear distinction between the active plan and the current plan in the code, and at one point currentPlanId was being set to the id from the active plan, sort of implying they were the same thing but poorly named. There were other places where one or both of them would mutate, and I went about tracing what caused the two to diverge or converge.

On initial page load the two would be different, with the active plan being blank; then, when an item was selected in the dropdown, the two could converge, depending on what was selected. I went looking for the tests covering this behavior to see if they would clarify the intended scenarios, and turned up none. At this point I let others know of my findings, and that while the problem seemed minor, there was a bigger quality problem under the hood of the system.

The first code change I made was a relatively minor one affecting when a particular button should show up; I added a special case and another test case started behaving. So far so good. Then I started tweaking the functions that were setting currentPlanId and activePlan. By this point I had figured out that current was a chronological state and active was a UI state, but it still wasn’t immediately clear how the system was deciding which plan was current. That obscured information seemed to be intertwined with the cause of a lot of the remaining broken cases.

I followed the web service calls back through various layers of microservices to where I knew the information had to be coming from and made an intriguing discovery: the way the frontend was deciding which plan was current was incorrectly based on the timing of two different web service calls. As I dug around trying to find the right way to set all of this information, it became clear that the initial architecture was missing a layer to coordinate the requests at initial page load.
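
For illustration only, a sketch of the kind of coordination layer that was missing; the names and types here are made up (and our frontend isn’t Scala), but the point is to derive both pieces of state from the two responses together instead of letting whichever call lands last win:

import scala.concurrent.{ExecutionContext, Future}

case class Plan(id: String, startDate: Long)

// Combine both service calls before deciding anything, instead of racing them.
def loadPlanState(fetchPlans: Future[Seq[Plan]],
                  fetchActivePlanId: Future[Option[String]])
                 (implicit ec: ExecutionContext): Future[(Option[Plan], Option[Plan])] =
  for {
    plans    <- fetchPlans
    activeId <- fetchActivePlanId
  } yield {
    val current = plans.sortBy(_.startDate).lastOption           // chronological state
    val active  = activeId.flatMap(id => plans.find(_.id == id)) // UI state
    (current, active)
  }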

That got everything trending in the right direction. I still want to find some time to work through some additional unit tests and leave the code in a good state rather than just a better state.

Book Chat: The Pragmatic Programmer

For a long time this had been on my list of books to buy and read, with a “note to self” to check whether there was already a copy somewhere on my bookshelf before buying one. It felt like a book I had read at some point years ago but didn’t really remember anymore; even the woodworking plane on the cover felt familiar. It felt like it was full of the kind of ideas about creating software that you love when you encounter them but that are disappointingly sparse in practice. Despite being from the year 2000, it still contains a wealth of great advice on the craft of creating software.

Since it is about the craft of software, not any specific technologies, tools, or styles, it has aged much better than other books. That timeless quality makes the book like a great piece of hardwood furniture: it may wear a little, but it develops a patina that says these are the ideas that really matter. There is an entire chapter devoted to mastering the basic tools of the trade: your editors and debuggers, as well as the suite of command line tools available to help with basic automation tasks. While we’ve developed a number of specialized tools to do a lot of these tasks, it is valuable to remember that you don’t need to break out a really big tool to accomplish a small but valuable task.

It’s all about the fundamentals, and mastering these sorts of skills will transfer across domains and technical stacks. It was popular enough that it spawned an entire series of books – The Pragmatic Bookshelf – and while I have only written about one of them, I have read a few more and they’ve all been informative.

About two-thirds of the way through the book I realized that I had indeed read it before – I had borrowed a copy of it from a coworker at my second job. He had recommended it to me as a source he had learned a lot from. I remember having enjoyed it a lot but not really appreciating the timeless quality. Probably since that would have been around 2007, it wouldn’t have seemed as old, especially since things seemed to be moving less quickly then. Maybe I just feel that way since I didn’t know enough of the old stuff to see it changing.

If you haven’t read it, go do it.
