Effective DevOps is about the culture of the DevOps movement. The technical practices that coincide with DevOps today are the result of that culture, not its cause. The cause is an underlying culture that is safe and respectful to those in it, one that truly empowers the team to try things to improve the way work is done and so leads to the technical practices associated with DevOps. The book is written more from a management perspective than an individual contributor perspective, and is centered on the four pillars of effective DevOps: Collaboration, Affinity, Tools, and Scaling.
The Collaboration pillar covers the normal sort of mentoring and workflow material that would be familiar to most agile or lean practitioners. The Affinity pillar builds on top of Collaboration with the idea that it takes time and work to forge a group of individuals into a team, and explores what it takes to build those bonds. These two pillars lead nicely into the Scaling pillar since, while you can eliminate waste and automate things, at the end of the day the biggest scaling maneuver is hiring. Hiring in turn renews the importance of Collaboration and Affinity, since as you bring new people into the system you must fully integrate them.
The section on the Tools pillar is written in a tool-agnostic fashion, describing the categories of tools commonly used in DevOps. That makes it more durable than books tied to a particular set of technologies, since it is focused on the concepts rather than the implementations.
Overall it’s an interesting read. The focus on the social aspects of what’s going on makes it less useful in my day-to-day activities, but the longer I do this job the more I think that the technical aspect is essentially table stakes and everything else is where the long-term growth comes from.
This is a follow-up to the original strike team post I did a while back. I’m writing this as the strike team is wrapping up. It’s been an interesting experience: we hit most of our broad goals, but in a significantly different way than anticipated.
The first big change came early on, when we looked into the proposed changes to an existing piece of software that was going to be a consumer of what we were building. We realized that accommodating it was probably weeks of work by itself. The initial assumption had been that minimal work was needed, but that solution turned out to be untenable to the people who would support the consuming software in the long term. They ended up pulling their support from the strike team and becoming uninvolved. At the time, the scope and resourcing change was challenging since it threw everything into an uncertain state, but I think it ended up being a good thing; we were able to focus on building out the desired functionality without spending a lot of time making changes to one consumer of the service. It does make the follow-up work more complicated, since we need to make sure that work still gets done; otherwise the whole system will end up in a worse state.
There was one portion of what we were going to deliver that had a complex deployment step. Other changes had to go either before or after this particular piece, so we tried to move that deployment as far forward in the project plan as possible, intending to deliver it at the end of the first sprint. Unfortunately, we encountered some significant scope changes that all had to go before that step, which pushed the actual deployment out until the middle of the third sprint. At that point we had about ten tickets’ worth of completed work sitting in a feature branch, waiting for the eventual opportunity to merge the whole thing back to master with nearly 5,000 lines of changes. We ended up performing a series of messy merges bringing changes from master back into the feature branch. The final merge from the feature branch back to master was ugly, since we wanted to preserve some but not all of the history of that branch. In retrospect we should have seen the merge issues coming and been more proactive about doing small merges early and often, but lesson learned for later.
Every team member lost some time to issues brought to them by the team they were on loan from. We had anticipated this during the first sprint but didn’t expect it to continue through the remainder of the strike team. I personally got pulled into some production oddities, and after the first sprint without me my team missed my presence in code review enough that they got me back to reviewing basically everything. Both of the other members got pulled back to their original teams for issues as well. We didn’t fight any of these requests, figuring that since they seemed like critical issues for those teams, having their people available was the best overall use of time, even if it hurt our ability to get things done on the strike team.
The team as a whole never really jelled to any significant degree. Everyone kept working the way their home team did, which created some minor conflicts. One engineer came from a team with a very lax structure; he would disappear for days and then show up with large amounts of excellent code, but it was frustrating to have him go off the grid without forewarning. Another came from a team with a different technical stack and vacillated between not asking enough questions, spending too much time stuck, and asking too many. The three of us were also in three different locations, which made it difficult to get on the same page and work together.
Overall the strike team model worked here: having representatives from some of the teams that were going to consume this service work on it ensured we didn’t run into any issues from a misunderstanding of the domain. There were the usual problems of standing up a new team, which we should have attacked proactively, since any new team needs to get organized and form its own norms. The transient nature of a strike team prevents a lot of identity building, which in my opinion is key to building a good team. Based on this experience I think the strike team model can handle software projects that cross multiple team boundaries, but there may still be better ways out there to be found.
Growing Object-Oriented Software, Guided by Tests is an early text on TDD. Since it was published in 2010, the code samples are fairly dated, but the essence of TDD still comes through. You need to look past some of the specific listings, since their choice of libraries (JUnit, jMock, and something called WindowLicker that I had never heard of) seems to have fallen out of favor. Instead, focus on the listings where they show all of the steps and how their code evolved while building out each individual item. It’s as if you are pair programming with the book: you see the thought process and the intermediate steps that would never show up in a commit history, much like this old post on refactoring but with the code intermixed.
This would have been mind-blowing stuff to me in 2010; however, the march of time seems to have moved three of the five parts of the book into ‘correct but commonly known’ territory. The last two parts cover what people are still having trouble with when doing TDD.
Part 4 of the book really spoke to me. It is an anti-pattern listing, describing ways they had seen TDD go off the rails and options for dealing with each issue. Some of the anti-patterns were architectural, like singletons; some were specific technical ideas, like patterns for creating test data; and some were more social, like how to write tests so they are more readable or produce better failure messages.
Part 5 covers some advanced topics, like how to write tests for threaded or asynchronous code. I haven’t had a chance to try the strategies they show, but they do look better than the ways I had coped with these problems in the past. There is also an excellent appendix on how to write a Hamcrest matcher, which was more difficult the first time I had to do it than it looks.
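The pattern is less daunting once you’ve seen it laid out. Here is a minimal, self-contained sketch: the real base class is `org.hamcrest.TypeSafeMatcher`, so the `SimpleMatcher` trait below is a stand-in that mirrors the methods you override (`matchesSafely`, `describeTo`, and a mismatch description), and `DivisibleBy` is an invented example matcher.

```scala
// Self-contained sketch of Hamcrest's custom-matcher pattern. The real base
// class is org.hamcrest.TypeSafeMatcher; this stand-in mirrors the methods
// you override so the shape of the work is visible.
trait SimpleMatcher[T] {
  def matchesSafely(item: T): Boolean                   // the actual check
  def describeTo: String                                // what a match looks like
  def describeMismatch(item: T): String = s"was $item"  // why this item failed
}

// A hypothetical matcher: asserts a number is divisible by a given divisor.
class DivisibleBy(divisor: Int) extends SimpleMatcher[Int] {
  def matchesSafely(item: Int): Boolean = item % divisor == 0
  def describeTo: String = s"a number divisible by $divisor"
  override def describeMismatch(item: Int): String =
    s"$item left remainder ${item % divisor}"
}

// assertThat-style helper that turns a mismatch into a readable message.
def check[T](item: T, matcher: SimpleMatcher[T]): String =
  if (matcher.matchesSafely(item)) "OK"
  else s"Expected: ${matcher.describeTo} but: ${matcher.describeMismatch(item)}"
```

The readable failure message is the whole point of writing a matcher instead of a plain boolean assertion: the test output tells you both what was expected and how this particular value missed.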
Overall, if you are doing TDD and running into issues, part 4 of this book could easily help you immediately, and parts 1 through 3 are still a great introduction to the topic if you aren’t already familiar. I didn’t have a good book to recommend on TDD before, and while this one isn’t amazing in all respects, I would recommend it to someone looking to get started with the ideas.
The team I joined at my new job had been doing about 50 points per sprint with four devs and the manager. After some additional staff changes, when everything smoothed out again we were doing about 80 points per sprint, still with four devs and the manager. The new team did not seem better than the old team, but during the turbulence of the staff changes we changed how we prepared work and got better organized, which enabled us to do more.
The change was a different way of collecting the relevant dependencies in the tickets. For example, instead of a vague reference to an endpoint that needed to be called, we gave an explicit URL plus links to the Swagger definition for the endpoint, or a listing of the explicit endpoints to be created and the related models. We also linked together all of the related tickets to make it easier to juggle which services were ready to release or where tickets depended on each other. It doesn’t seem like a significant change, but it eliminated a bunch of the little delays where something wasn’t clear and you had to hunt down an answer.
The other half of the anecdote is that, due to scheduling conflicts, we missed some of the preparation time for one sprint and went ahead anyway; we dropped back down to about 50 points completed. Our manager didn’t seem to get any credit for us going from 50 to 80, but certainly caught flak for the sprint where we fell back to 50.
Maybe this anecdote will inspire you to experiment with that little change that didn’t seem worth trying because it wasn’t clear how it would really help. Lining up a little change is easier than doing something big.
This is somewhat outside of the normal material I write about, but this idea came to mind and I wanted to take an opportunity to explore it further. The context of this thought was that a mentoring program is being launched at work. Why is a manager both responsible for guiding the day-to-day execution of a project and for development of staff beneath them?
The goals of the project and the goals of personnel development are often at odds. Any specific experience a person needs or wants is likely something they don’t yet have much (if any) experience with, which means they won’t be as efficient or as good at that particular skill. Taking care of your people gains you flexibility and goodwill in the long term, but it has short-term costs.
In Scrum, the product owner is separated from the scrum master, dividing the “what” from the “how” to prevent a conflict of interest. Similarly, splitting staff development from the technical management of the project could prevent a similar conflict of interest around an individual. Maybe a strong explicit mentorship program would mitigate this conflict, but unless each person gets a mentor who is both capable of and interested in staff development, it wouldn’t help much. I have seen an explicit mentorship program where serving as a mentor was informally required for promotion to a higher level, which resulted in people becoming mentors just to check the box for their own benefit.
By setting up part of management to be incentivized toward staff development and technical excellence, rather than completing projects or shipping features, you can create an environment that allows those closest to the system under development to do their best work without pressure for immediate results. This reminds me of the idea of slack in queueing theory, where running a system below full utilization makes work flow through it faster. Once you build up the staff appropriately and get everyone cross-trained, the overall outcome is better. Think of it as an optimization problem where you may have reached a local maximum: getting to a higher peak would be better, but the cost of the valley between the peaks has to be paid.
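The queueing intuition can be made concrete with the textbook M/M/1 result: if a team completes work at rate μ and work arrives at rate λ, the mean time an item spends in the system is 1/(μ − λ). A small sketch (the rates here are invented purely for illustration):

```scala
// M/M/1 queue: mean time an item spends in the system is 1 / (mu - lambda).
// As utilization (lambda / mu) approaches 100%, time in system grows without
// bound -- the quantitative argument for deliberately keeping some slack.
def timeInSystem(mu: Double, lambda: Double): Double = 1.0 / (mu - lambda)

val mu = 10.0 // hypothetical: tasks a team can finish per week
for (utilization <- Seq(0.5, 0.8, 0.9, 0.99)) {
  val lambda = mu * utilization // arrival rate at that utilization level
  println(f"${utilization * 100}%3.0f%% utilized -> ${timeInSystem(mu, lambda)}%6.2f weeks per task")
}
```

Running it shows the hockey-stick: each step toward full utilization costs more latency than the last, which is why a team loaded to 99% delivers any individual item far more slowly than one with slack.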
You could theoretically have the manager of the developers on a team be involved with the work of a different team, but it would then be hard to see what their own developers needed in order to develop them over the longer term. Spending all day with one group of developers working on the technical problems you face doesn’t give you the insight to see what a completely separate group of developers is struggling with. While looking at a problem with a fresh perspective can reveal different solutions, it can also miss important details about the problem itself and backtrack into already-trodden territory.
Maybe this is just my perspective based on the places I’ve worked, where the scrum master has been more of a process leader and impediment resolver than a technical coach or project manager. In my experience, development managers have spent a large portion of their time working with the product owner to factor stories in a more completable fashion and to derive technical requirements from business requirements. It always felt like the development manager spent most of their time on the urgent but not really important things, ignoring the important but deferrable things because they were hard.
To answer the original question of why a manager is responsible both for guiding the day-to-day execution of a project and for developing the staff beneath them: it seems to be because splitting the responsibility differently wouldn’t give management the day-to-day visibility needed to provide useful developmental guidance. Coaching people to perform a role better requires spending enough time watching them perform it, so you can’t really be engaged in other technical activities day to day. Even with that insight, staff development is exceedingly hard to measure, which makes it hard to create goals and metrics around the activity. If you’ve had a different experience with how these roles are broken down, post a comment.
“Write the code you want and then make it compile” was a thought expressed on library design while I was at the NE Scala Symposium. It is a different way to describe the TDD maxim of letting the usage in tests guide the design. It is very much influenced by the extremely flexible syntax rules and DSL creation abilities in Scala. One of the talks, Can a DSL be Human? by Katrin Shechtman, took a song’s lyrics and produced a DSL that would compile them.
Since you can make any arbitrary set of semantics compile, there is no reason you can’t have the code you want for your application. There may be an underlying library layer that isn’t the prettiest code, or that is significantly verbose, but you can always make it work. Segregating the complexity to one portion of the code base means that most of the business logic stays clean and that the related errors can be handled in a structured, centralized fashion.
Taking the time to do all of this for a little utility probably isn’t worth it, but the more widely a library is used, the more valuable this becomes. If you’ve got a library that will be used by hundreds of developers, it is worth really refining the interface until it matches how they think.
Building software that works is the easy part; building an intuitive interface and the comprehensive documentation that lets others understand what the library can do for them is the hard part. I’m going to take this to heart with some upcoming changes to a library at work.
This still doesn’t cover the aspect of deciding what you want, since there are different ways to express the same idea. A function, a symbolic operator, or a full DSL can all express the same functionality. You can model the domain in multiple ways: case classes, enums, or a sealed trait. You can declare a trait, a free function, or an implicit class. Deciding on the right way to express all of this is the dividing line between a working library and a good library.
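To make that concrete, here is a small sketch (the `Volume` domain, the `plus` method, and the `|+|` operator are all invented for illustration) of the same operation exposed three ways: as a free function, as a named extension method, and as a symbolic operator.

```scala
// One hypothetical domain modeled as a sealed trait plus a case class.
sealed trait Volume { def litres: Double }
final case class Litres(litres: Double) extends Volume

// 1. A free function.
def combine(a: Volume, b: Volume): Volume = Litres(a.litres + b.litres)

// 2 & 3. The same operation as a named method and as a symbolic operator,
// bolted onto Volume with an implicit class.
implicit class VolumeOps(private val a: Volume) {
  def plus(b: Volume): Volume = combine(a, b) // reads naturally in prose
  def |+|(b: Volume): Volume = combine(a, b)  // terser, but must be learned
}
```

All three compile to the same behavior; choosing among them is exactly the “decide what you want” problem, and the answer depends on who reads the call site: `combine(a, b)`, `a plus b`, or `a |+| b`.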
I’ve been involved in a set of architecture discussions recently, and they had me thinking about the role of architecture in an agile development team. The specific discussion was mostly about which services owned what data and which services merely used that data. My team owned all of the services in question and felt that the data should be centralized, with the edge pieces querying the central store. Some other interested parties felt that the data should be distributed across the edge pieces, which would coordinate among themselves, with a central proxy for outside services to query through. Both options had pros and cons.
I had proposed some changes to the initial centralization plan to account for issues raised by the other parties. Then something unexpected happened: they said that’s great, but you need to do it our way anyway. I was stunned at first. The decision didn’t change the way they would use the resulting system, so why was it theirs to make? Sure, they had higher titles, but that isn’t a license to make architectural decisions universally. I hadn’t been in a situation like this before and wasn’t quite sure how to convince everyone involved that my proposal was the best option for the organization as a whole.
Fortunately the exchange was over email, so I took a while to regroup and collect some opinions about what to do. I asked a few different people and got two specific pieces of advice. First, I should set up a meeting with all of the stakeholders, so everyone would remember that there were more people involved. Second, I should prepare a full written description of the various options in advance. The goal of the meeting was to get a concentrated block of attention from the other parties rather than a series of ad hoc email exchanges. The written description was to make the full state of each option clear, rather than presenting a series of deltas from the original plan. Bringing all of the stakeholders together meant that those opposed for reasonable but team-specific reasons would hear the other stakeholders’ perspectives.
All of this went down over two sprints in which we had set aside time to sort out the architecture for our next big project. Defining these tasks was tricky: the first sprint had tickets for figuring out all of the requirements and speccing several different options. The requirements weren’t put together by product, since this was not an end-user-facing feature. The second sprint was for gaining consensus on the options, which brings us back to the anecdote above.
The two pieces of advice came together: once we had their concentrated attention and a complete description of the plan, they came around to our centralization plan, with modifications.
I previously mentioned the new service we were spinning up and the discussion of the overhead involved. Having gotten the initial version of the service out into production, I feel like I have some answers now. The overhead wasn’t that bad, but it could have been lower.
The repo was easy, as expected. The tool for setting up the CI jobs was quite helpful, although we didn’t know about a lot of the configuration options available to us; we initially configured the options we were familiar with and found ourselves going back to make a couple of tweaks later. The code generators worked out great and saved a ton of time getting started.
The environment configuration didn’t work out as well. The idea was that the new service would pick up defaults for essentially all of its needed configuration, reducing the time we would spend figuring it out ourselves. This worked reasonably well in the development environment. In the integration environment we ran into problems because the default configuration was missing some required elements; we had no port mappings set up, so nothing could talk to our container, and we burned a couple of hours sorting that out. When we went to the preproduction environment, we found its port mapping settings were different from the lower environments and needed to be set up differently. There we burned even more time, since the service isn’t exposed externally and we had to figure out how to troubleshoot the problem differently.
In the end I still think spinning up the new service on this short timeframe was the right thing to do; we would have had to learn all of this eventually when building a new service. Doing it on a tight timeline was unfortunate, but getting the services factored correctly was worth it.
At work we started spinning up a new service, and some interested parties expressed concerns about the overhead required to get a new service into production. Among the specific concerns: a repo needed to be created, CI builds set up, configuration done for the various environments, databases created in those environments, and so on. Having never deployed a new service at this job, I wasn’t sure whether the concerns were overblown.
We’ve got a platform set up to help speed the creation of new microservices. It can spin up CI jobs and simplify environment configuration. Creating the repo should be a couple of clicks: create it, assign the right permissions, then set up the hook for the CI process. I’m optimistic that this process should make things easy, but only two people on the team were part of spinning up a new service the last time the team did it, and neither was much involved with this infrastructure.
The project all of this is for needs to be put together and running in production in the next month, so the overhead can’t eat much of the actual schedule. The first version of the service isn’t complicated and can probably be cranked out in a week or so of heads-down work. It needs a little bit of front-end work, but it has been descoped as far as possible to keep the timeline manageable. We’ve got a couple of code generators to help bootstrap a lot of the service’s infrastructure; we’ve even got a custom Yeoman generator to help bootstrap the UI code.
I’m curious whether the concerns were memories of spinning up new services in the world that predates our current platform infrastructure, or a reasonable understanding of the complexity involved. Either way, it raises the question of how much overhead spinning up a new service should carry. As little as possible seems reasonable, but the effort required to automate the setup, relative to how often you create new services, makes that less reasonable than it first appears. I think it needs to be easy enough that you create a new service whenever it makes logical sense in the architecture, without the overhead entering into the equation.
I recently listened to this episode of Hanselminutes about including test automation in the definition of done. It reminded me of acceptance-test-driven development (ATDD), where you define the acceptance criteria as tests and then build software to make them pass. Both are ways to roll the creation of high-level automated tests into the normal software creation workflow. I’ve always been a fan of doing more further up the test pyramid, but I have never had significant success with it.
The issue I ran into the time I attempted ATDD was that the people writing the stories weren’t able to define the tests. We tried using Gherkin syntax but kept ending up in ambiguous states where terms were used inconsistently or the intent of the test was lost. I think if we had continued past the roughly three months we tried it, we would have made our terminology consistent and gotten better at it.
At a different job the test team wrote Gherkin; they enjoyed it but produced some extremely detailed tests that were difficult to keep up to date with changing requirements. The suite was eventually scrapped because it made the system hard to change, due to the number of test cases that needed updating and the way they were (in my opinion) overly specific.
I don’t think either of these experiences says the idea is wrong, just that the execution is difficult. At my current job we’ve been trying to backfill some API- and UI-level tests. The intention is to eventually roll this into the normal workflow; I’m not sure how that transition will go, but gaining experience with the related technologies ahead of time seems like a good place to start. We’ve been writing the API tests with Robot Framework and the UI tests with Selenium. The two efforts are led by different parts of the organization; so far the UI tests seem to be having more success, but neither has provided much value yet, since both have been filling in coverage over the more stable portions of the application. Neither effort is far enough along to help regress the application more efficiently, but the gains will hopefully be significant.
Like a lot of changes in software engineering, this one is more social than technical. We need to integrate the change into the workflow, learn new tools and frameworks, and unify terminology. The question of whether to do it keeps coming up, but the execution has been lacking in the several attempts I’ve seen. I don’t think it’s harder than other social changes I’ve seen adopted, such as an agile transition, but it is easier to give up on, since high-level automated testing isn’t strictly necessary.
Including automation in the definition of done is another way to describe the problem of keeping the automation up to date. Saying it is a whole-team responsibility to ensure the automation gets created and maintained makes it harder to back out later. The hump of the missing automation is still there, but backfilling as you change the application, rather than all up front, may work well.