Strike Teams

At work there has been a new practice of starting up strike teams for different projects. The idea is that for projects that require expertise not found on any individual team, you pull in a person or two from multiple different teams to get all of the correct skills on a single team. That’s the pitch of the strike team model, but the hidden downside is that it breaks up the cohesion of the teams that people are pulled from and the created team may not be together long enough to create new cohesion. This post is mostly going to be a chronicle of the issues that I encountered starting up a strike team and what we did to try to resolve the problems.

The first problem was coordinating the various different teams to figure out who was going to be involved. In the case of the strike team I was forming, the two teams contributing resources both wanted to know who the other team was going to contribute before making their own decisions. They also wanted to have a fixed end date to the project before deciding who to contribute. We ended up resolving this issue with a fixed date for whoever they contributed to the team and getting the two of them to discuss the situation amongst themselves.

The second problem was aligning the start date of the strike team. The three teams contributing resources all have different schedules to their individual sprints so it makes coordinating a ‘start’ date for the strike team difficult.  Those who were on teams about to start another sprint wanted the strike team to start now, whereas those with other commitments made weren’t available. We ended up doing a rolling start where as each team finished their sprint they rolled onto the team and the team ramps up as people become available. We did some preparatory work to get everyone up to speed on the goals and challenges of the team so they were aware of what’s going on whenever they were able to join.

The third big problem is more specific to our particular organization, not to the strike team model. As part of setting up the strike team we needed to schedule things like standups and retros for the team. Since the team is split across both coasts, the hours for scheduling these meetings are limited. The conference rooms are also pretty much all booked because every other team beat us to scheduling. We ended up asking IT to rearrange some other non-recurring meetings and managed to get a consistent slot for the standups. The retro slot was more complicated but we managed to get meetings roughly spaced; by not being a stickler for strict week deliminations we managed to find times that worked.

So with all the overhead sorted out we finally get to move on to the real work of the situation which will be a nice change of pace from the administrative aspect.

Advertisements

Book Chat: Pair Programming Illuminated

My team has been doing more pair programming recently so I picked up a copy of Pair Programming Illuminated. I had never done a significant amount of pair programming before and while I felt I understood the basics, I was hoping to ramp up on some of the nuances of the practice.

It covers why you should be pair programming, convincing management that you should be able to pair program, the physical environment for local pairing, and common social constructs around different kinds of pairs. All of this is useful information, to varying degrees. Since the book was written in 2003, some of the specifics of the physical environment section didn’t age well – advising the use of 17” monitors most obviously. Both of the evangelizing sections seemed to cover the same ground, and did not seem to be written in a way to try and convince someone who is not already open to the concept. Neither section seemed to be written to the person who isn’t already in favor of doing pair programming. There were lots of references to studies, and some personal anecdotes, but none of it stuck in a way that felt like it would change someone’s mind.

The social aspects were interesting, however most of the section was stuff that felt obvious. If you have two introverts working together then they need to work differently than if you have two extroverts working together. A lot of the time the tips were common sense, and didn’t seem like it was necessary to write it down in the book. I would have liked to see more discussion of getting someone to vocalize more and clearly what they’re thinking about.

I feel like I’m better equipped to do pair programming because of having read this, but I also feel like a long blog post would have been just as good a resource and much more focused. I don’t know what else I would have wanted to fill out the rest of the book.

Changing Stacks

I had an odd realization recently – the jobs that I changed stacks for seemed to be better than the jobs where I already knew the stack. I’m not sure if it’s a common experience or something that’s unique to my experiences.

I’ve got a few theories about why this might be the case for me. First, the jobs where I changed stacks appeared more engaging to me because there was more to learn. Second, the jobs where I changed stacks felt better because they used different hiring practices, where they looked for underlying talent and skills as opposed to specific stack-related experiences. Third, relating to the second point, the diverse points of view brought together because so many colleagues were also changing stacks creates a better workplace. Fourth, the people that are willing to change stacks are the kind of people who are more open to learning going forward. Lastly, the jobs where I changed stacks happened to be better by pure chance due to small sample sizes. None of these theories is particularly provable. Several of them could be true and working together. They could all be false and it could be something completely different.

I know that from talking to some of my coworkers that they all came into their current jobs without particular experience in the Scala/Play Framework/MongoDB stack as well, although most of them came from a much more similar Java stack rather than the C# stack I came from most recently, or any of the other stacks I worked with in the past. They mentioned that we have issues recruiting in our DC office because of the pay differentials for cleared work in the area;  there are lots of places you can go and get a comfortable government contracting job and not really stretch yourself and grow. There has also been some discussion about how the stack effects our ability to recruit since it’s not as common as other stacks, and some candidates have expressed reservations since the stack isn’t really growing in popularity. There was a lot of information in the most recent Stack Overflow Developer Survey that corroborates the idea that the Scala is shrinking, but it says it is shrinking in relative of share of all votes. In absolute terms the change is less clear.

I guess that I’ve enjoyed the stack changes I’ve made. Only once did I set out with an intention of changing stacks as part of a job change. Then I wasn’t looking to go to anywhere particular, but I was looking to move away from ColdFusion since it didn’t really mesh with how I like to do development. I don’t have any intention to make another change any time soon, but I’ll definitely consider another stack change when I do just because it opens up a wider variety of options and seems to have worked out for me in the past.

Scoring Intern Puzzles

I recently got to score coding puzzle solutions submitted by a crop of students who wanted to be interns for my company. Overall, I scored six puzzles and gave two passing grades. These were applications generally submitted by rising Juniors and Seniors, so I had fairly modest expectations about the overall quality of the solutions they would be presenting. I wanted to see some basic object oriented design, usage of the proper data structures, and correctness in a couple edge cases. The issues with the failures varied, but the most egregious and common issue was a failure in understanding object orientation. The typical solution was a java file, with one class generally nothing but static methods and everything public.

This led me to two immediate thoughts: are these students horribly unprepared or am I expecting too much? I went and asked one of the engineers who had scored another batch of solutions about what he got. His batch was distinctly better, one even had unit tests, but there were still some significant duds. The conversation attracted two recent grads who took a look at some of the solutions, and also thought they weren’t that good; since they were closer experience-wise, I thought that their expectations would be more reasonable. I know that they are getting a computer science education not a software engineering education, and I had talked about this difference before, but they need some ability in computer programming to demonstrate their computer science skills.

I tried to think back to when I was in school and what I was like at that point. I know I didn’t have the best practices then, but I feel like I was ahead of where they were. I wish I still had some of the code I had written then to look back at and compare. I don’t think I was great at that point in my programming journey. But, I feel like I would have at least put together a class or two as part of the solution, even if they weren’t really needed,  just to show I could decompose the problem.

I suspect that since internships have become a significant need that students are pushing out applications and doing this puzzle is a hurdle to them doing yet another application but they don’t recognize how to optimize their solution to show off their skills. Each solution had some positive aspects, but were missing most of the aspects I wanted to see. To me this meant that the skills were being taught, since they had all picked up some of them, just not enough of them yet.

Productivity and Maintainability

As I had alluded to previously, the project I’m currently on has some very aggressive and fixed deadlines. The project got scoped down as much as possible but there have still been some trade-offs for productivity over maintainability. Some of it happened as a team decision – we discussed ways to compress the schedule and found some shortcuts to produce the needed functionality faster. But as new requirements came up in the process those shortcuts ended up causing much more work overall.

We knew that the shortcuts would cause more total work but we thought that the additional work could be deferred until after the initial deadline. The change of requirements meant that some of the automated testing that had been deferred was an issue. The initial manual testing plus the retest after the changes probably wasn’t much less than the effort to do the automated testing the first time. This specific shortcut wasn’t responsible for much of the time we shaved off the schedule, but when you’re doing similar things all over, the total cut was significant.

The key part of this tradeoff is that it assumed we would go back after the initial deadline and fill in the bits like we intended to instead of moving to something else (which is appearing more and more likely). Product management has a request for an additional feature in another existing application. This request has a similarly tight deadline, but is much smaller in scope (15 points) than the initial request(~120 points for the scoped down version). Picking up this request would mean that we end up further from the decision to defer portions of this and we’d be less likely to be able to deal with the technical debt in a timely manner that we accrued to the do the first request.

Since I’m still new at this job I’m not sure if this behavior is regular or since both of these items are related to the same event if it’s just a one-time thing. If this is a one-off problem then we can survive more or less regardless of how well we deal with it. If this is going to be a regular occurrence without an opportunity to resolve the problem, I feel like I’d need to take a different tactic in the future about how to work with the problem of short term needs.

This feels like the overall challenge of engineering: there are problems that need to be solved in a timeboxed way. We need to make good enough solutions that fit all of the parameters, be they time or cost. We can be productive and leave behind a disaster in our wake but succeed at business objectives, or we can build an amazing throne to software architecture but have the project fail for business reasons. The balance between the two is the where this gets really hard. If you’ve already been cutting corners day to day and the request to push the needle gets made, you have no resources left. The lack of an ability to really quantify the technical debt of an application in a systematic way makes it hard to project where we are on the continuum to others who aren’t working on the system daily.

Without this information, program-level decisions are hard to make and you end up with awkward mandates from the top that aren’t based on the realistic situation on the ground. Schedule pressure causes technical debt, testing coverage mandates cause tests that are brittle to just ratchet up the coverage, mandating technology and architectural decisions results in applications that just don’t work right.

Automation and the Definition of Done

I recently listened to this episode of Hanselminutes about including test automation in the definition of done. This reminded me of acceptance test driven development (ATDD) where you define the acceptance criteria as a test which you build software to fix. These are both ways to try to roll the creation of high level automated tests into the normal software creation workflow. I’ve always been a fan of doing more further up the test pyramid but never had significant success with it.

The issue I ran into the time I attempted to do ATDD was that the people who were writing the stories weren’t able to define the tests. We tried using gherkin syntax but kept getting into ambiguous states where terms were being used inconsistently or the intention was missing in the tests. I think if we had continued to do it past the ~3 months we tried it, we would have gotten our terminology consistent and gotten more experience at it.

At a different job we used gherkin syntax written by the test team; they enjoyed it but produced some significantly detailed tests that were difficult to keep up-to-date with changing requirements. The suite was eventually scrapped as they were making it hard to change the system due to the number of test cases that needed to be updated and the way that they were (in my opinion) overly specific.

I don’t think either of these experience say that the idea isn’t the right thing to do, just that the execution is difficult. At my current job we’ve been trying to backfill some API and UI level tests. The intention is to eventually roll this into the normal workflow;  I’m not sure how that transition will go, but gaining experience with the related technologies ahead of time seems like a good place to get started. We’ve been writing the tests using the Robot Framework for the API tests and Selenium for the UI ones. The two efforts are being led by different portions of the organization, and so far the UI tests seem to be having more success but neither has provided much value yet as they have been filling in over portions of the application that are more stable. Neither effort is far enough along to help regress the application more efficiently yet, but the gains will hopefully be significant.

Like a lot of changes in a software engineering the change is more social than technical. We need to integrate the change into the workflow, learn to use new tools and frameworks, or unify terminology. The question of which to do keeps coming up but the execution has been lacking in several attempts I’ve seen. I don’t think it’s harder than other social changes I’ve seen adopted, such as an agile transition, but I do think it’s easier to give up on since it isn’t necessary to have high level automated testing.

Including automation in the definition of done is a different way to describe the problem of keeping the automation up to date. I think saying it’s a whole team responsibility to ensure the automation gets created and is kept up to date makes it harder to back out later. The issue of getting over the hump of the missing automation is still there, but backfilling as you make changes to an application rather than all upfront may work well.

Theories of Technical Debt

There are a couple of different major causes of technical debt even on the best run projects.

  1. Schedule pressure
  2. Changed requirements
  3. Lack of understanding of the domain

You can choose to take on debt strategically to accommodate an aggressive schedule, you can accumulate debt from having things change, or you can collect debt from doing what appeared to be the right thing but turned out not to be once you learned more about the underlying situation. There are plenty of other reasons that technical debt can be acquired, but most of those can be avoided. Changing requirements can’t really be avoided; things can change, that’s the nature of life. Understanding of the domain is a tricky issue, since you can spend more time upfront to better understand the domain but you will still uncover new aspects as you go.

Schedule pressure is the most demanding of the three. You can consciously say that technical debt should be taken on by doing something the expedient way. You can also have implicit schedule pressure that pervades the decision making process. This sort of pervasive pressure causes people to value different things. If leadership discusses the schedule day in, day out, but doesn’t mention quality, it ends up lacking.

Technical debt is fundamentally a lack of quality; not in the defect sense but in the lack of craftsmanship sense. All of those 5,000 line classes got written by other engineers who were doing the best they could within the constraints of the environment at the time. But some engineers look at that and don’t want to touch it afraid of what it is and how hard it is to change. Some other engineers look at it and see a mountain to be climbed, or a wilderness to be civilized. The problem code is something to be taken and broken to your will. Each kind of engineer has a place in the development lifecycle.

If a company needs to hit a product window and is only full of the kind of engineers who see debt as a challenge to be dealt with they might not be able to make that tradeoff. If you only have the engineers who are concerned with maximum velocity but leave behind chaos in their wake you will have trouble as the codebase matures. Every engineer seems to have a range on the spectrum where they are most comfortable. Where in that range they land day to day seems to be controlled by the environment around them.

If you are surrounded by one side or the other you might lean towards that side of the range. The messages management sends out about the quality of software relative to the messages they send out about schedule of the project is another important factor. If they keep hammering home to get more done, people will find corners they think can get cut. What those corners are will differ between people, but they will take on debt in various corners of the codebase. This sort of debt is really insidious since it was never consciously decided on. If you decide that you will defer good error messages, or avoid building out an abstraction now, but you do it explicitly because of the schedule is the right business choice, then since the team discussed and decided to do it, they know as a whole that’s not the best technical solution but is the best business solution, whereas if someone just does it everyone else may not even be aware of the other options.

Quality Software

What makes a piece of quality software? Low current defect count is an obvious part of quality software. An intuitive and powerful interface is another obvious part; the interface should be easy for a beginner to start with, yet rich and expressive for an experienced user. These are both important to quality software from a product perspective, but quality on the engineering side is more complex.

On the engineering side, the most important part of quality software is that those who need to make changes can do so confidently. This is a multifaceted goal, with lots of components.

quality-software

You can accomplish each of these subgoals through multiple paths of action. You can get a well-tested application by doing TDD, ATDD, BDD, ad-hoc unit testing, or with enough time, just plain old manual testing. Organizations have all picked specific tools or practices to try to accomplish a particular intention and enshrined that particular practice as the right way to do things. The techniques give you practices but it is hard to say if you are achieving what they are designed to do without quantifying them somehow.

Most of these aspects are difficult to quantify if you’re trying to measure “quality.” Code coverage, while a quantifiable metric, isn’t the important part; if you cover 10,000 lines of models but skip 150 lines of serious business logic, the metric will say you’re doing great but you missed the important parts. The important part of code coverage is knowing that you got the important parts of the codebase. There is some interesting research going on into quantifying the readability of source code, but it acknowledges that the humans they used to score the samples did not regularly agree on what was readable. You can use Cyclomatic Complexity as a proxy against modularity, which can tell you that you successfully broke the problem into smaller pieces, but it doesn’t tell you anything about your ability to replace one without impacting others. Easy-to-build and easy-to-start-with can be quantified to a certain degree; if you give a new developer the codebase and related documentation, how long does it take them to get it running on their local machine. The problem with that metric is you don’t often bring in new developers to a project and new developers aren’t all created equal. I don’t think I’ve ever seen someone actively collect that metric in minutes and hours.

The more time I spend in this industry the more it seems most activity is designed to ensure that the system is functionally correct, since that is the first level of success. This second level of creating truly quality software seems to be underappreciated because it is hard to quantify and unless you spend your days with the code base you can’t appreciate the difference. I think this difference is what can make some tech companies unstoppable juggernauts in the face of competition and everything else. I suspect that a lot of the companies that are putting out new and innovative ideas on the edge of building higher quality systems have found ways to quantify the presence of quality, not just the lack of quality signified by downtime and defect counts.

Propinquity

Propinquity can be described as the rather obvious idea that you interact more with people near you. Not just physical nearness – it also encompases mental nearness like shared beliefs, common experiences, or a shared ubiquitous language. This comes up in software development in a number of different ways. Propinquity is teams that are more efficient because they understand each other and the domain at a deeper level. It is a team that intimately understands each other and the domain. It has negative expressions too. You get software ecosystems that become insular and don’t integrate new ideas. It is even sexism and racism. It is an unconscious bias that influences the way you think about things.

Recognizing propinquity can help you overcome your negative biases, build stronger teams, create solutions to your hardest problems from outside the box. Team culture is propinquity – it can be good forming bonds between coworkers, it can be bad when the team looks for people like themselves and ends up rejecting the best candidate because they didn’t have a strong enough resemblance. This can also be seen as a kind of implicit bias; when you start to think of the discussion of diversity in the workplace in terms of propinquity, it starts to mean something different.

I recently heard an interesting discussion with Leslie Miley of Twitter on a podcast (starting about 12 minutes in) about how the lack of diversity at Twitter could be negatively impacting their ability to solve problems. His thesis is that if you hire everyone with very similar backgrounds that they will all look at the same problem the same way and come up with the same solutions. They would then miss the same things, and get stumped by the same problems. This is why you often get great solutions from people outside of the problem, because they are looking at the problem from a different perspective.

I don’t have much insight into how other professions interview people, but in software it seems like the interview is more to determine if you can do the job than if the applicant is the person you actually want to be working with. But the way we go about determining if you can do the job is to ask questions that you don’t truly need to be able to answer to do the job, but implicitly reflect schools of thought on how the job should be done. If you ask a question about red-black trees you are looking for someone who had a computer science education; you aren’t expecting anyone to build a red-black tree but you are using that as a proxy for having been exposed to certain ideas. If instead you ask questions about TDD or design patterns it says you are the kind of people who care about those ideas instead and have less of an interest in computer science fundamentals and optimizations.

I went on an interview previously where they asked lots of low-level questions. The job had nothing particular to do with those skills, but they felt that those skills were what differentiated good programmers from bad ones. Their understanding of what makes a good programmer is that you can twiddle bits and will micro-optimize solutions. The interviews we gave at a place I used to work were focused on design patterns, which  was the language everyone there shared. At that job we didn’t have the low-level optimization skills to find a particular optimization that would have been a big win for an ongoing performance problem.

This ties back into the post I had on evolution of the javascript ecosystem. The ecosystem has propinquity in terms of javascript language itself, but has diversity in that most of the developers came from different stacks with different backgrounds. This level of creativity is what can be achieved if you balance propinquity and diversity against each other well.

Propinquity looks like culture superficially, but it really is a double-edged sword: a team that intuitively understands each other, but blocks out other perspectives that could be valuable to solve problems better. The values espoused by your organization are all about creating a culture and a shared experience to build the positive aspects of propinquity, but the negative aspects can appear too if not kept in mind. The different aspects of what an engineer looks like can’t be too rigid, otherwise you reject great candidates for lack of a degree, or interview for markers that imply competence but aren’t competence themselves. Conversely diversity is more than ethnic diversity – it’s diversity of thought. You need people who think in design patterns and people who think functionally and people who buck the trends of what you are doing right now.

Ideally you could get people who can do all of those things. That’s obviously not a realistic way to do recruitment. You want people who fill in your weak areas, ideally creating something like this.

overlapping

It represents the whole organization and how each individual’s competencies overlap but are each valuable. Each individual is close enough to the others to have some propinquity, but the whole group embodies the diversity you would want to get different viewpoints on problems. Conceptually with smaller units you can also look at this as a radar chart, if you map out where your team is currently strong and where it is currently weak, you see where you need help and where you don’t need as much, but it’s still hard to find the right way to find that person when they look different, and may not be in your normal talent pipelines.

Even just considering these sorts of issues points you in the right direction. If you consider that the person you are looking for is different than those you have, you might be willing to ask different questions or judge the responses differently.

Fix vs Replace

I was thinking about when to fix a piece of software versus replace it as part of our normal software lifecycle, prompted by a discussion at work of a particular piece of software. The software in question works great, but nobody is confident that they are successful when changes do need to be made. Part of the lack of confidence is because changes are only made once or twice a year, so getting it set up and running locally and testing it is a chore, plus it turns into a complex process with many chances of failure. The other part is it runs on top of another library that nobody is really familiar with; the library isn’t used anywhere else in our stack, so nobody develops any familiarity with it. This particular piece of software is also critical to expanding our business, as it’s the primary way new client data is loaded into the system.

If we wanted to fix it, we could take the existing solid software and enhance it to make the setup and testing easier, or we could clean up the usage of the underlying library so it’s more intuitive. However, the decision was instead made to replace it with something totally new. I wasn’t involved in the decision, but became involved with the original piece of software after the fact to make a change while the replacement is still being developed. It seems like this software could be rehabilitated at first glance, but clearly someone else thought otherwise.

I know in general I’m biased towards fixing existing software. I’ve spent most of my career working on brownfield applications and building oddly shaped pegs to fit back into the oddly shaped holes of those applications. I think that I’ve done this because I enjoy it; building everything from scratch is almost too easy since there are so many fewer constraints involved. I get a different sort of satisfaction from it. I know I’m not the only one who has this particular tendency; the folks at Corgibytes are specializing in this sort of work. I’ve even been nostalgic for a codebase I’ve worked on – not the application but the codebase itself.

I feel like most organizations are biased towards replacing software because it lets you just say the entire thing is bad and try again instead of having to pick a particular thing to do or change. You don’t have to agree on what’s wrong with it, or get into details of what to change. This flexibility of scope leads to the quid pro quo rewrite where a piece of software is replaced with a new version that also contains a major new feature; this concept was introduced to me by Re-Engineering Legacy Software. Re-Engineering Legacy Software describes this as a bargaining tool to enable you to gain acceptance of a plan to do the rewrite, but I’ve always seen the business bring up the rewrite with the idea that they’re unhappy with the team’s ability to change a piece of software and this would clean up the underlying causes of the problems.

That’s the big problem: the current software has problems due to something. And unless you deal with whatever that “something” is, the new software probably is not going to be significantly better than the old. It may not have had the chance to become crufty yet, so it seems better when it’s new, but given a few years goes back to the same sorts of problems you had the last time. You need normal software processes that enable you to create and maintain quality software even as requirements change.

You need feedback into your initial processes of what caused the cruftiness to accumulate. This can be seen as a form of double-loop learning, where the feedback of what happened impacts how you see the world and thereby influences your decision-making process, not just the decisions themselves. If you are accumulating cruft because you put schedule pressure on the initial development resulting in a less modular design, the feedback to the decision-making process would be different than if you are accumulating cruft because the requirements changed radically. To make true long-term improvements, that’s the step you need to take, which sometimes might lead you to fix, and sometimes might lead you to replace.