Book Chat: Growing Object-Oriented Software Guided By Tests

Growing Object-Oriented Software Guided By Tests is an early text on TDD. Since it was published in 2010, the code samples are fairly dated, but the essence of TDD is there. You need to look past some of the specific listings, since their choice of libraries (JUnit, jMock, and something called WindowLicker that I had never heard of) seems to have fallen out of favor. Instead, focus on the listings where they show all of the steps and how their code evolved as they built out each individual item. It’s as if you are pair programming with the book: you see the thought process and the intermediate steps that would never show up in a commit history, sort of like this old post on refactoring but with the code intermixed.

This would have been mind blowing stuff to me in 2010. However, the march of time seems to have moved three of the five parts of the book into ‘correct but commonly known’ territory. The last two parts cover what people are still having trouble with when doing TDD.

Part 4 of the book really spoke to me. It is an anti-pattern listing describing ways they had seen TDD go off the rails and options for how to deal with each of those issues. Some of the anti-patterns were architectural, like singletons; some were specific technical ideas, like patterns for making test data; and some were more social, covering how to write the tests to make them more readable or create better failure messages.

Part 5 covers some advanced topics like how to write tests for threads or asynchronous code. I haven’t had a chance to try the strategies they are showing, but they do look better than the ways I had coped with these problems in the past. There is also an awesome appendix on how to write a Hamcrest matcher, which, when I’ve had to do it in the past, was more difficult to do the first time than it looks.

Overall, if you are doing TDD and are running into issues, checking out part 4 of this book could easily help you immediately. Reading parts 1 through 3 is still a great introduction to the topic if you aren’t already familiar with it. I didn’t have a book on TDD I could recommend before, and while this one isn’t amazing in all respects, I would recommend it to someone looking to get started with the ideas.

Session Fun

I’ve been working on getting a new session infrastructure set up for the web application I’m working on. We ended up going with a stateful session stored in MongoDB along with some endpoints to query the session with. This design has a couple of nice aspects: all of the logic about whether a session is active can live inside of one specific service, and a session can be terminated if needed.

Building a session infrastructure is a fairly common activity, but we are building a session that is used by multiple services and we can’t roll all of them out simultaneously. So we’ve been building a setup that can process both the new session and the old session as a way to have a backwards compatible intermediate step. This is creating some interesting flow issues. There are two specific issues I wanted to discuss: (1) how to maintain the activity of the new session during backwards compatibility and (2) processing of identity federation.

Maintaining the new session from applications that have not been updated is nuanced. Since, by rule, you haven’t made changes to the applications that haven’t been updated, you can’t add any calls. In our case there was a call made from all of the application frontends to a specific service, so we are using that to piggyback keeping the session alive. We looked into a couple of other options but didn’t find anything easy. We considered rolling out a heartbeat to the various application frontends but that would require an extra round of updates, testing, and deploys for code that was likely to all be ripped out when we were done.

The federation flow is extra complex because a federation in the new scheme is not that different from some of the session passing semantics under the old session scheme. This ends up mixing together the case where there is just a federation occurring and the case where the new session has timed out and the old session is still valid. This creates an awkward compromise; you’d like to be able to say that if the new session has timed out the entire session is expired, but if you can’t tell the difference between the two cases that’s not possible. This means that the new session expiration can’t be any longer than the old session expiration. But it also solved the problem with maintaining the new session while in applications that haven’t been updated yet.

The one problem solved the other problem which was a nice little win.

Continuation Passing Style

I have been doing some work with a library that creates guards around various web endpoints. They have different kinds of authentication and authorization rules, but are all written in a continuation passing style. The idea of the continuation passing style is that you give some construct a function to ‘continue’ execution with once it does something. If you’ve ever written an event handler, that was a continuation. The usages all look somewhat like

secureActionAsync(parseAs[ModelType]) { (userInfo, model) => user code goes here }

There was some discussion around whether we wanted to do it like that or with a more traditional control flow like

secureActionAsync(parseAs[ModelType]) match {
    case Allowed(userInfo, model) => user code goes here then common post action code
    case Unallowed => common error handling code
}

The obvious issue with the second code sample is the need to call the error handling code and post action code by hand. This creates an opportunity to fail to do so or to do so incorrectly. The extra routine calls also distract from the specific user code that is the point of the method.  
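As a rough sketch of why the first form keeps the routine calls out of the user code, here is one way a guard in this style could be put together. Everything in it is made up for illustration (the `Guards` object, `UserInfo`, `Response`, the `user:<name>` token format, and passing the parser as a plain function); it is not the API of the library I was working with.

```scala
import scala.concurrent.{ExecutionContext, Future}

object Guards {
  // Hypothetical types standing in for the real user and result types.
  final case class UserInfo(name: String)
  final case class Response(status: Int, body: String)

  // Hypothetical authentication check: a token of the form "user:<name>" is accepted.
  private def authenticate(token: String): Option[UserInfo] =
    if (token.startsWith("user:")) Some(UserInfo(token.drop(5))) else None

  // Common post action code, applied to every successful result in one place.
  private def postAction(response: Response): Response =
    response.copy(body = response.body + " [audited]")

  // The guard owns the control flow; the caller only supplies the continuation.
  def secureActionAsync[A](token: String, parse: String => Option[A], rawBody: String)(
      action: (UserInfo, A) => Future[Response]
  )(implicit ec: ExecutionContext): Future[Response] =
    (authenticate(token), parse(rawBody)) match {
      // Common error handling code lives here, not in each caller.
      case (None, _)                 => Future.successful(Response(401, "unauthorized"))
      case (_, None)                 => Future.successful(Response(400, "bad request"))
      case (Some(user), Some(model)) => action(user, model).map(postAction)
    }
}
```

The caller never sees the error handling or the post action, so there is nothing to forget.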

There were some additional concerns about the testability and debuggability of the code in the continuation passing style. The debugging side does have some complexity, but it isn’t any more difficult to work through than a normal map call, which is already common in the codebase. The testability aspect is somewhat more complex though. The method can be overridden with a version that always calls the action, but it still needs to run the common post action code. The common post action code may or may not need to be mocked. If it doesn’t need to be mocked this solution works great. If the post action code needs to be mocked, then putting together a method that sets up the mocks can simplify that issue.
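To make the "version that always calls the action" idea concrete, here is a minimal sketch. The `Guard` trait, `RealGuard`, and `AlwaysAllowGuard` are hypothetical names with a simplified signature, assumed only for this example; the real library looks different.

```scala
import scala.concurrent.{ExecutionContext, Future}

object GuardTesting {
  final case class UserInfo(name: String)

  // Hypothetical guard interface; the real library's signature will differ.
  trait Guard {
    // Common post action code shared by production and test guards.
    protected def postAction(result: String): String = result.trim

    def secureActionAsync(token: String)(action: UserInfo => Future[String])(
        implicit ec: ExecutionContext): Future[String]
  }

  // Production-like guard: rejects unknown tokens before the action ever runs.
  object RealGuard extends Guard {
    def secureActionAsync(token: String)(action: UserInfo => Future[String])(
        implicit ec: ExecutionContext): Future[String] =
      if (token == "secret") action(UserInfo("real-user")).map(postAction)
      else Future.failed(new SecurityException("not allowed"))
  }

  // Test guard: always calls the action with a fixed user, but keeps the post action.
  final class AlwaysAllowGuard(user: UserInfo) extends Guard {
    def secureActionAsync(token: String)(action: UserInfo => Future[String])(
        implicit ec: ExecutionContext): Future[String] =
      action(user).map(postAction)
  }
}
```

A test can then hand the code under test an `AlwaysAllowGuard` and assert on the action’s result without setting up any credentials.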

This usage of a continuation helps collect the cross cutting concern and keeps it all in one place. You could wrap up this concern other ways, notably with something like AspectJ. The issue with doing it with something like AspectJ is that it is much less accessible than the continuation passing style. AspectJ for this problem is like using a bazooka on a fly: it can solve it, but the level of complexity introduced isn’t worth it.

Being a Wizard

A somewhat obscure question got asked in a chat channel at work that I knew the answer to, which helped out some other engineer. The question wasn’t anything that abnormal – it was about a weird error message coming from an internal library. Searching through the library’s code wasn’t immediately helpful since the unique part of the error message didn’t appear in the code. The reason I knew the answer wasn’t because it was easy, but because I had spent an hour investigating it the day before.

Sometimes when you see someone have an apparently impressive insight, that doesn’t necessarily mean they are better than you, they may just have had an experience which makes the answer obvious to them. This applies to all sorts of other technical activities. During the Hackathon I did a similar thing. One of the other devs on the team was integrating the portion of the code I was working on and having trouble. It was immediately obvious to me why, because I had put in the time earlier to figure it out the hard way. Your mind is a powerful pattern matching system. It immediately recognizes this:

 

[Image: happycat]

Or this:

[Image: ftc]

 

If you think back to when you first started learning calculus, the terminology and symbols were complicated and foreign, but after a while you gained a certain familiarity with them and eventually they became second nature.

You may go to work and make some business web app in one particular technology stack, but there are all sorts of concepts that go with it that aren’t the business or the tech stack. You’re synthesizing things like design patterns, test driven development, RESTful web services, algorithms, or just the HTTP stack and everything that goes with that. These are all the transferable skills that can help you “cast a spell” and jump past a problem.

When I sat down to learn Scala, it wasn’t that big a task since most of the language features had equivalents I was familiar with in other languages. That let me skip forward to the nuances of those implementations and the few language features I was less familiar with. Getting experience with those ideas in the abstract let me appear as a wizard going forward, since I had jumped ahead on the learning curve and looked the part. Some of the common feelings of impostor syndrome are the worry of being found out, like another wizard.

[Image: wizard_behind_the_curtain]

Planning Anecdote

The team I joined at my new job had been doing about 50 points per sprint with 4 devs and the manager when I joined. There were some additional staff changes, but when everything smoothed out again we ended up doing about 80 points per sprint with 4 devs and the manager. The new team did not seem better than the old team, but during the turbulence of the staff changes we changed how we prepared work to be done and managed to get better organized, which enabled us to do more.

The change was a different way of collecting the relevant dependencies in the tickets. For example, instead of giving a vague reference to an endpoint that needed to be called, we made sure to give an explicit URL, plus links to the Swagger definition for the endpoint or a listing of explicit endpoints to be created and the related models. We also linked together all of the tickets that were related to make it easier to juggle which services were ready to release or where we had dependencies between tickets. It doesn’t seem like a significant change, but it resolved a bunch of the little dependencies where something wasn’t clear and you needed to hunt down an answer.

The other half of the anecdote is that due to scheduling conflicts we missed out on time to do some of the preparation for a sprint and ended up going ahead without as much preparation; we dropped back down to about 50 points completed. Our manager didn’t seem to have gotten any credit for us having gone from 50 to 80 but certainly caught flak for the sprint where we went back to 50.

Maybe this anecdote will inspire you to experiment with that little change that didn’t seem to be worth trying since it wasn’t clear how it would really help. Lining up a little change is easier than doing something big.

Hackathon

At work we recently held a hackathon where everyone who was interested in participating had 24 hours to build whatever they thought would be interesting and useful. We had 21 teams across 4 offices with ~60 participants total. I had never done a hackathon before and this seemed interesting, so I registered for it without a particular idea in mind. As the start rolled around I was planning to put together a tool to simplify how we were generating configurations for containers in Marathon. However, at the kickoff pizza dinner I heard another developer saying he had a plan to solve our issues concerning the lack of available conference rooms. Every afternoon there would be an hour or two where there were no meeting rooms left available, and while we’ve got more space coming it will be a while until it is available. Having been a victim of this problem before, I asked if they could use another hand on their team and was graciously invited on board.

The key insight they had was to use the logs from the wireless network to figure out who was in what office each day. Once we had an idea of who was in what office that day we could cross reference that with their calendar appointments and see what rooms were booked and didn’t need to be. There was some concern about false positives, i.e., we didn’t want to have the system saying that you weren’t in the office by 10 so release your room at 11 while you were stuck in traffic. So we built a HipChat integration to check with you about it.
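The cross referencing step boils down to a filter. Here is a minimal sketch, assuming the wireless logs have already been reduced to a set of user names and the calendar to a list of bookings; `Booking` and `releaseCandidates` are made-up names, and our actual hackathon code is not shown here.

```scala
object RoomRelease {
  // Hypothetical shape for a calendar entry; the real calendar format is not shown here.
  final case class Booking(room: String, organizer: String, startHour: Int)

  // Bookings whose organizer has not been seen on the office wireless network today.
  // These are only candidates: the organizer should still be asked before anything is released.
  def releaseCandidates(bookings: Seq[Booking], present: Set[String]): Seq[Booking] =
    bookings.filterNot(b => present.contains(b.organizer))
}
```

Asking the organizer before releasing anything is what guards against the stuck-in-traffic case.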

The three of us started Thursday night at about 6 with a general divide and conquer along the three major components: data mining/parsing, calendar matching and decision, and the HipChat integration. I mostly worked on the HipChat portion. Since the bot had to reach out to specific people of its own volition, as opposed to responding to people or messaging a fixed channel, our needs were different than what most of the prebuilt HipChat integrations are doing. I ended up doing an XMPP integration using Smack. The biggest challenge in getting this working in the context of a web service was that I needed to keep the connection to HipChat open longer than the API implied it needed to be. I found this out when my initial attempt to send a message and then close the connection failed because the message hadn’t finished going through when we closed the connection on our end. After spending several hours working through that I called it a night at about 1:30 a.m. and headed home to catch some sleep.

Getting back the next morning at about 7:30, I found one lone developer in my office who had been there working on his project all night. He had been working on porting a feature from the web app to the Android app, because when he used the app he wanted that feature. I spent the first part of the morning working on getting the response from HipChat hooked up and found another interesting problem. I wasn’t able to respond to myself as the bot for whatever reason. So if the bot was using my credentials to send messages it wouldn’t see my response to it. I suspect it was because HipChat was being clever and not sending the message as a message but as some sort of history, but I never was able to confirm. At 8:30 the dev who had been working on the matching stuff for our project got in and started processing live data for that day; our app immediately started spitting out rooms we thought didn’t need to be booked. I went and did a little scouting at about 10:30 to confirm the situation and the matching seemed right.

We ran into a credentials snag on getting an account with the rights to unreserve other people’s meetings. So we didn’t have a full demo but the example meetings we had identified painted a pretty picture of how well it could work and the number of rooms it could free up.

When demo time rolled around we all got together to show off what we built. There was a bunch of interesting stuff put together. There was a set of living visualizations of service dependencies built by parsing URLs from the system configuration data. There was a port of one of our mobile apps to Apple Watch. There were two different teams that built Alexa integrations for different portions of our products. Several teams built features for various mobile apps. One team set up a version of Netflix’s Chaos Monkey in the load test environment, including a hacked Amazon Dash button that would kill a server in that environment at the push of a button. Another team built a deploy football in the vein of the nuclear football, complete with keys and switches and a little screen to display progress. Two tech writers twisted arms and got someone to build a HipChat integration to look up acronyms from a glossary they had put together on the wiki.

Overall I had a blast but ended up pretty exhausted from the ordeal. Some prizes will be given out on Monday. I’m not sure of the exact criteria for them, but I wasn’t competing at all; I was enjoying the latitude to do what I thought was best. There will be one more prize given out at the company-wide all hands, where everyone gets to vote on the most impactful project after we get a chance to see how everything turns out in real usage.

Future[Unit] Harmful

I’m not the first person to have seen this behavior, but I saw it bite another engineer recently, so it’s clearly not well enough known. The Unit type in Scala is special: there is effectively an implicit conversion from every type to Unit (the compiler calls it value discarding). If you are interested in the specifics check out this post; I’m not going to get into how it works or the edge cases. This means that you can write something like

val doubleWrapped: Future[Unit] = Future.successful(Future.successful(true))

and that compiles. It breaks the type safety that we have come to expect. More intriguingly if you were to do

val innerFailure: Future[Unit] = Future.successful(Future.failed(new RuntimeException))
innerFailure.value

what would you expect the result to be?

If you said

Some(Success(()))

you would be right, but that would probably not be what you were looking for. If you had a map where you needed a flatMap you would end up with compiling code that silently swallows all error conditions. If you have an appropriate set of unit tests you will notice this problem, but this can easily happen in code that looks simple enough that it doesn’t seem to need any unit tests.
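Here is a small sketch of the map versus flatMap case. The `load` and `save` methods are made up for the example; the only thing that matters is that `save` returns a failed future.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.Try

object SwallowedFailure {
  // Hypothetical operations: loading succeeds, saving always fails.
  def load(): Future[Int] = Future.successful(42)
  def save(n: Int): Future[Unit] = Future.failed(new RuntimeException(s"could not save $n"))

  // Bug: the inner Future[Unit] returned by save is discarded to fit the declared Future[Unit].
  def withMap()(implicit ec: ExecutionContext): Future[Unit] = load().map(n => save(n))

  // Fix: flatMap sequences the two futures, so the failure propagates.
  def withFlatMap()(implicit ec: ExecutionContext): Future[Unit] = load().flatMap(n => save(n))

  // Helper that blocks until the future completes and returns its outcome.
  def outcome(f: Future[Unit]): Try[Unit] = Await.ready(f, 2.seconds).value.get
}
```

Both methods have the same signature and both compile, but only `withFlatMap` ever reports that the save failed.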

The compiler flag -Ywarn-value-discard (spelled -Wvalue-discard from Scala 2.13 on) should give you some protection, but you need to turn it on explicitly and it may produce a fair bit of noise in an existing large codebase. So keep an eye out for this issue, and be forewarned.

Write the Code You Want

“Write the code you want and then make it compile” was a thought expressed on library design while I was at the NE Scala Symposium. It is a different way to describe the TDD maxim of letting the usage in tests guide the design. It is very much influenced by the extremely flexible syntax rules and DSL creation abilities in Scala. One of the talks, Can a DSL be Human? by Katrin Shechtman, took a song’s lyrics and produced a DSL that would compile them.

Since you can make any set of arbitrary semantics compile, there is no reason you can’t have the code you want for your application. There is an underlying library layer that may not be the prettiest code, or may be significantly verbose but you can always make it work. Segregating the complexity to one portion of the code base means that most of the business logic is set up in a clean fashion and that the related errors can be handled in a structured and centralized fashion.

Taking the time to do all of this for a little utility probably isn’t worth it, but the more often a library is used the more valuable this becomes. If you’ve got a library that will be used by hundreds of people, really refining the interface so it matches how they think makes it far more user friendly.

Building software that works is the easy part. Building an intuitive interface and all of the comprehensive documentation so others can understand what a library can do for them is the hard part. I’m going to take this to heart with some changes coming up with a library at work.

This still doesn’t even cover the aspect of deciding what you want. There are different ways you can express the same idea. A function, a symbolic operator, or a DSL can all express the same functionality. You can express the domain in multiple ways: case classes, enums, or a sealed trait. You can declare a trait, a free function, or an implicit class. Deciding on the right way to express all of this is the dividing line between a working library and a good library.
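As a small illustration of those choices, here is the same addition expressed three ways. The `Money` type and the `dollars`/`cents`/`and` vocabulary are invented for this sketch and don’t come from any particular library.

```scala
object Expressing {
  final case class Money(cents: Long) {
    // 2. A symbolic operator defined on the type itself.
    def +(other: Money): Money = add(this, other)
    // 3. A DSL word that reads like a sentence.
    def and(other: Money): Money = add(this, other)
  }

  // 1. A plain free function.
  def add(a: Money, b: Money): Money = Money(a.cents + b.cents)

  // Supporting syntax for the DSL form, added to Int through an implicit class.
  implicit class MoneySyntax(private val n: Int) extends AnyVal {
    def dollars: Money = Money(n * 100L)
    def cents: Money = Money(n.toLong)
  }
}
```

With `import Expressing._` in scope, `add(Money(500), Money(25))`, `Money(500) + Money(25)`, and `5.dollars and 25.cents` all produce the same value; which one is right depends on who will be reading the calling code.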

Category Theory Intro

While I was at the NE Scala Symposium there was a lot of discussion of the finer points of functional programming. There was a lot of discussion of Category Theory and Monads; while I’d like to say I understood everything that was going on I’m not going to claim I did. I had seen discussion of the topics before but never really got it, or why it was valuable. I got an understanding of the basic concepts for the first time and wanted to try and write it out with the beginner in mind, since the biggest issue is the terminology and wikipedia assumes you’ve got a significant education in abstract math. As someone who isn’t an expert on this I’m going to simplify and probably end up misusing some terms, but hopefully it will get the basic ideas through in a way that you understand enough of the terminology to read more advanced works, or to understand why you don’t need to.

Starting from the theoretical aspect, you’ve got a category that is made up of three different aspects: objects, arrows, and the means to compose multiple arrows in a way that is associative and has an identity. The arrows represent transitions between objects. That’s it.

There is a special class of categories called monoids, which are categories with only one object. At first this seems kind of pointless, but depending on how you define that object they become interesting. For example, if you define the object as the set of integers, and the arrows as individual numbers, then the composition of numbers becomes an operator like addition. Addition is associative so that’s not an issue, and adding 0 is an identity. It is a somewhat odd way to define addition, but it ensures you get a couple of different properties out of the system.

But what good is this monoid structure? It is the mathematical underpinning of why you can fold over a collection and why MapReduce is provably correct.
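A minimal sketch of that idea, using a hand-rolled `Monoid` trait rather than the one from any library. Associativity is what makes it safe to fold chunks separately and combine the partial results afterwards, which is the shape of a MapReduce job.

```scala
object MonoidSketch {
  // A monoid: an associative combine plus an identity element.
  trait Monoid[A] {
    def empty: A
    def combine(x: A, y: A): A
  }

  // Integer addition, with 0 as the identity.
  val intAddition: Monoid[Int] = new Monoid[Int] {
    val empty = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // Fold a whole collection in one pass.
  def foldAll[A](xs: Seq[A])(m: Monoid[A]): A = xs.foldLeft(m.empty)(m.combine)

  // Fold fixed-size chunks independently, then combine the partial results.
  def foldChunked[A](xs: Seq[A], chunkSize: Int)(m: Monoid[A]): A =
    foldAll(xs.grouped(chunkSize).map(chunk => foldAll(chunk)(m)).toSeq)(m)
}
```

Both folds give the same answer for any chunk size, so the chunks could just as well be handed to different machines.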

A monad is a transformation within a category (strictly speaking an endofunctor, a mapping from the category back into itself, rather than a single arrow) that comes with two additional properties: there is an identity available and combining is associative. Where have we seen that before? It’s the same shape as a monoid! Adding 3 to an integer gives the flavor of it, although strictly that is the monoid from above rather than a monad. What’s the big deal? The integer example is straightforward if dubiously valuable; with something more complex it might make more sense. So imagine you have some structure, with the ability to transform into another structure that is similar, has an identity and is associative. So what is that? It’s flatMap.
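The identity and associativity properties can be checked directly against flatMap. This sketch uses Option from the standard library; `parse` and `half` are made-up steps, and checking a handful of inputs is an illustration, not a proof.

```scala
object FlatMapLaws {
  // Two hypothetical steps that may fail to produce a value.
  def parse(s: String): Option[Int] = scala.util.Try(s.trim.toInt).toOption
  def half(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None

  // Identity: wrapping a value and then flatMapping is the same as calling the function directly.
  def leftIdentity(s: String): Boolean = Some(s).flatMap(parse) == parse(s)

  // Identity on the other side: flatMapping with the wrapper changes nothing.
  def rightIdentity(m: Option[Int]): Boolean = m.flatMap(Some(_)) == m

  // Associativity: how the steps are grouped does not change the result.
  def associativity(m: Option[String]): Boolean =
    m.flatMap(parse).flatMap(half) == m.flatMap(s => parse(s).flatMap(half))
}
```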

I’m going to switch to the other end and come at this from the programming side now for a minute. We’ve got these structures that have values and can be converted into other similar structures. That sounds kind of like a program to me. If you’ve got a List[String] with some strings in it and you convert that into another List[String], you could have taken a list of URLs and returned the contents of each URL, or you could have taken a list of file names and read the contents of the files. This tells us a couple of things: string typed things interoperate with all sorts of stuff, and the act of the transformation can be abstracted from the data of the transformation. The first part embodies the idea that if a functional program compiles, it’s probably right, since functional programs will define more specific data types that ensure you are putting the pieces together correctly. The second represents the separation of data from the transformations of data or from the actual business logic of the application. This is the opposite of object oriented programming where code and data are coupled together in an object. So we’ve separated the code and the data, and without objects you’ve essentially got free functions that eventually get composed together into larger and larger computational pieces. No reason those transformations can’t be set up to be associative and have identities. So now you have a program represented as a series of monad transformations, easy.

There is some space between the programming perspective and the math perspective. But a lot of it is just constructing categories in specific ways so the transformations and compositions are what you want, so I’m going to leave that alone, but I would like to discuss some of why this matters. The separation of the transformation and the data also allows you to separate the definition of the transformation from the execution of the transformation. That makes your code inherently testable, as well as imbuing it with the ability to apply large classes of optimizations. Those optimizations are part of what makes functional programming parallel processing friendly and highly performant.
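A tiny sketch of separating the definition of a transformation from its execution, using nothing but plain functions. The step names are made up for the example.

```scala
object Pipeline {
  // Each step is a plain function; nothing runs when the steps are defined or composed.
  val trim: String => String = _.trim
  val lower: String => String = _.toLowerCase
  val wordCount: String => Int = s => if (s.isEmpty) 0 else s.split("\\s+").length

  // Definition: composing the steps builds a new function, still without executing anything.
  val countWords: String => Int = trim.andThen(lower).andThen(wordCount)

  // Execution: the caller decides when and how to run it, here sequentially over a list.
  def run(inputs: List[String]): List[Int] = inputs.map(countWords)
}
```

Each step can be tested on its own, and because `countWords` is just a value, `run` could be swapped for something that spreads the inputs across threads or machines without touching the definition.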

So while you probably don’t care that the code has these underlying mathematical properties, you do care about what those properties can give your program. They make it easy to test and easy to make fast. You don’t really need to understand monads and category theory at a theoretical level to take advantage of all of this and what it means for your program. Libraries like Cats are full of category theory terminology, which makes it harder to find what you want without understanding that terminology. Calling something Combinable rather than Semigroup would make it more accessible, but would make it harder to understand why it works.

Traveling Stories

Traveling has always made me introspective. When I was going to the West coast regularly for work, my wife knew she would get a rambling email from me that I wrote while in the plane. Some of them were sappy, some were crazy, some were a disjointed mess. But they were all those thoughts that came out when I was stuck there with nothing but those thoughts to keep me company.

Sure I’d read, listen to a podcast, or watch something but eventually the mind would wander to that place it wanted to go. Like a very slow form of meditation. Once my mind emptied of thoughts of the day or about where I was going, I achieved this zenlike state where answers to questions would just unfold like an origami crane.

The worst part is that the answers were always fleeting. There for a moment, gone the next without the chance to fully understand the epiphany. A deep insight into the universe that you know was there, but never get to appreciate.

This post started with one of those epiphanies. I was on the train on the way to New York for the NE Scala Symposium when there was this moment of clarity about a communication struggle I had been having at work. For a moment, I had a vision of the creative action plan I had been looking for, then a PA announcement came on and the thought was gone. I hadn’t recorded the thought in any way, but I know it was there right outside of Trenton.

On the ride home I managed to rekindle some of the thought, but it wasn’t the same deep insight that I had originally had. Initially, my work team had been proposing to adjust an existing framework to enable a new usage. Once we rephrased it as “replacing” the entire framework with a brand new system, everyone was immediately on board. I think the reasoning is twofold: it means we can roll out the change in smaller increments, and we can go back without as much effort. It’s the same work and the same expected end state, but the reception was significantly different.