Recently, I was trying to describe the relationship of my team to a particular piece of software within our company. I ended up settling on the term “caretaking.” We don’t have ownership of this particular piece of software in the traditional sense, wherein we are responsible for the content of the software and whether it is correct. Yet, we were in the terrible position of being the first people called whenever it ran into problems.
The software itself was in weird but decent shape; a lot of old battle-tested code that could benefit from some modernization and better automated testing (it had a couple of high-level integration tests but they were finicky and would only run in one particular QA environment). It is in that zombie state where we don’t make any changes to it since it meets the requirements, but nobody would look at it and say that’s some nice code. Nobody on the team had any particular experience with this code, but someone who had been on the team years ago had worked on it before joining this team. The code just sort of followed along with the person, and nobody ever really pushed back on it, since it didn’t run into problems. This particular code is even slated to be replaced, as a new team – we’ll call them “Triangle” – is centralizing all of responsibilities in this application’s domain into a couple of different services. We were caretaking – being nominally responsible for now but not paying too much attention to it.
This was all fine, right up until a serious security bug was found in this code. Our security team was rightfully saying this needed to get resolved ASAP. We wanted Triangle to just replace it now since that was going to happen anyway and we were already booked on other time-critical activities. Triangle wasn’t inclined to get pulled into this and commit to something on a deadline, which is a completely fair reaction on their part.
We took a look at our options for trying to resolve the problem and realized we had three. The first was a short-term fix, which would prevent the security exploit but would likely cause other problems in the medium term. The second was to gut the code and rebuild it without the exploit in place; this was pretty quickly rejected because it seemed wasteful to gut it for this then replace it again in the short term, especially given that the other team was planning to replace it in a different tech stack than the current one. The third was a nuanced tweak to the existing behavior that looked like it would fix the security issues but it was unclear as to if it would have any other negative side effects. We decided to go with the third approach since the first approach would likely end up coming back and costing us more time in the long term.
We decided at this point that we were also going to actively take ownership of the piece of code in the medium term. We didn’t gut the existing solution, but I applied some iterative refactoring to the situation to safely create the seams we needed to test the software to our satisfaction. The existing code was one big method, about 1500 lines. I started by extracting out some methods from that, then converting those methods into a couple of classes. I did it this way so that the available tools handled most of the refactoring so that I could be confident in the correctness of the refactored code, even with the limited integration tests available. We felt that this little bit of unit testing would enable us to make the change we needed to with confidence that we didn’t break any of the broader test cases.
We ended up fixing the problem and one little corner of the codebase. Applying some TLC to this portion of the codebase was rewarding, since every time we had looked at it before when it did something odd it was a negative experience everyone wanted to avoid. Overall, I only spent about a day cleaning it out and making the basic workflow conceptually clean. The software got much better and it just took the will to make the changes. The effort was lower than what would have been expected, to try and fix this and it built a good bit of confidence if anything else in this area. It was a rewarding little win.
The pride of ownership and fully tackling a problem to make the system better, rather than continually avoiding, has lots of benefits. You actually make things better, which can be motivating to continue to tackle more problems. Making something a little better, then a little better, then a little better again, and all of a sudden you’ve changed the tenor of the system. In this instance, tackling the problem head-on proved to be easier than we thought. We are still caretaking a bunch of other code, a good bit of which is slated to be replaced as well, but if push comes to shove we’ll hopefully go ahead take full ownership of that code too.