Session Fun

I’ve been working on getting a new session infrastructure set up for the web application I’m working on. We ended up going with a stateful session stored in mongoDB along with some endpoints to query the session with. This design has a couple of nice aspects – all of the logic about if a session is active or not can live inside of one specific service and a session can be terminated if needed.

Building a session infrastructure is a fairly common activity, but we are building a session that is used by multiple services and we can’t roll all of them out simultaneously. So we’ve been building a setup that can process both the new session and the old session as a way to have a backwards compatible intermediate step. This is creating some interesting flow issues. There are two specific issues I wanted to discuss: (1) how to maintain the activity of the new session during backwards compatibility and (2) processing of identity federation.

Maintaining the new session from applications that have not been updated is nuanced. Since, by rule, you haven’t made changes to the applications that haven’t been updated, you can’t add any calls. In our case there was a call made from all of the application frontends to a specific service, so we are using that to piggyback keeping the session alive. We looked into a couple of other options but didn’t find anything easy. We considered rolling out a heartbeat to the various application frontends but that would require an extra round of updates, testing, and deploys for code that was likely to all be ripped out when we were done.

The federation flow is extra complex because a federation in the new scheme is not that different from some of the session passing semantics under the old session scheme. This ends up mixing together the case where there is just a federation occurring and the case where the new session has timed out and the old session is still valid. This creates an awkward compromise; you’d like to be able to say that if the new session has timed out the entire session is expired, but if you can’t tell the difference between the two cases that’s not possible. This means that the new session expiration can’t be any longer than the old session expiration. But it also solved the problem with maintaining the new session while in applications that haven’t been updated yet.

The one problem solved the other problem which was a nice little win.

Advertisements

Book Chat: AWS in Action

Amazon Web Services in Action is a great introduction to the basics of AWS. It mostly discusses IaaS services, but touches on some of the PaaS services too. It also covers a good portion of what’s in the Architecting on AWS course if you were considering that. I was hoping for coverage of AWS Lambda, API Gateway and EC2 Container Service but they weren’t included. There were references to when to use AWS Cloudfront but it was never introduced like the other services were.

It’s a very quick read, even though it weighs in about 400 pages. There are lots of detailed examples including specific information about if the example was covered under the free tier of services and how to be sure to roll back everything. Lots of screenshots of consoles and the CLI interface and plenty of code samples of using the various SDKs, mostly in node.

If you are already familiar with AWS this probably isn’t the right book for you. It doesn’t cover the architecture aspect in depth, just simple examples of how to combine the various services. For my taste it doesn’t sufficiently cover how to decide when to use a PaaS solution vs rolling your own at the IaaS level. There is one little chart in the portion covering Elastic Beanstalk about the benefits of it vs other less managed options.

Overall it wasn’t the book I was looking for. But it is the sort of thing that can be helpful to lots of people who want to try and understand how to apply their existing architectural knowledge to the AWS platform.

Continuation Passing Style

I have been doing some work with a library that creates guards around various web endpoints. They have different kinds of authentication and authorization rules, but are all written in a continuation passing style. The idea of the continuation passing style is that you give some construct a function to ‘continue’ execution with once it does something. If you’ve ever written an event handler, that was a continuation. The usages all look somewhat like

secureActionAsync(parseAs[ModelType]) { (userInfo, model) => user code goes here }

There was some discussion around whether we wanted to do it like that or with a more traditional control flow like

secureActionAsync(parseAs[ModelType]) match {
    case Allowed(userInfo, model) => user code goes here then common post action code
    case Unallowed => common error handling code
}

The obvious issue with the second code sample is the need to call the error handling code and post action code by hand. This creates an opportunity to fail to do so or to do so incorrectly. The extra routine calls also distract from the specific user code that is the point of the method.  

There were some additional concerns about the testability and debuggability of the code in the continuation passing style. The debugging side does have some complexity but it isn’t any more difficult to work through than a normal map call which is already common in the codebase. The testability aspect is somewhat more complex though. The method can be overwritten with a version that always calls the action, but it still needs to do the common post action code. The common post action code may or may not need to be mocked. If it doesn’t need to be mocked this solution works great. If the post action code needs to be mocked then putting together a method that will set up the mocks can simplify that issue.

This usage of a continuation helps collect the cross cutting concern and keeps it all on one place. You could wrap up this concern other ways, notably with something like like AspectJ. The issue with doing it with something like AspectJ is that it is much less accessible than the continuation passing style. AspectJ for this problem is like using a bazooka on a fly, it can solve it but the level of complexity introduced isn’t worth it.

Being a Wizard

A somewhat obscure question got asked in a chat channel at work that I knew the answer to, which helped out some other engineer. The question wasn’t anything that abnormal – it was about a weird error message coming from an internal library. Searching through the library’s code wasn’t immediately helpful since the unique part of the error message didn’t appear in the code. The reason I knew the answer wasn’t because it was easy, but because I had spent an hour investigating it the day before.

Sometimes when you see someone have an apparently impressive insight, that doesn’t necessarily mean they are better than you, they may just have had an experience which makes the answer obvious to them. This applies to all sorts of other technical activities. During the Hackathon I did a similar thing. One of the other devs on the team was integrating the portion of the code I was working on and having trouble. It was immediately obvious to me why, because I had put in the time earlier to figure it out the hard way. Your mind is a powerful pattern matching system. It immediately recognizes this:

 

happycatOr thisftc

 

If you think back to when you first started learning calculus, the terminology and symbols of it were complicated and foreign, but after a while you gained a certain familiarity with them and after a while they became second nature.

You may go to work and make some business web app in one particular technology stack, but there are all sorts of concepts that go with it that aren’t the business or the tech stack. You’re synthesizing things like design patterns, test driven development, RESTful web services, algorithms, or just the HTTP stack and everything that goes with that. These are all the transferable skills that can help you “cast a spell” and jump past a problem.

When I sat down to learn Scala, it wasn’t that big a task since most of the language features had equivalents I was familiar with in other languages. That let me skip forward to the nuances of those implementations and the few language features I was less familiar with. Getting experience with those ideas in the abstract let me appear as a wizard going forward since I jumped ahead on the learning curve and look the wizard. Some of the common feelings of impostor syndrome are the worry to be found out like another wizard.

wizard_behind_the_curtain

Book Chat: Zero Bugs and Program Faster

Zero Bugs and Program Faster by Kate Thompson is a book that’s hard to describe. None of it is a really novel way of looking at creating software but it’s all of those things that you would expect to describe when you think about how to do programming well. It’s a breezy and fun read that is divided into enough small sections that you can read it in however much time you have available.

The book is structured in two parts. The first part is a series of short vignettes about programming. Some of which are more direct, like the chapter on ACID; some are more abstract, like the chapter entitled “The Many Sides of the Elephant.” I appreciated the dual chapters of “Do It Now” and “Do It Later” that are about how you can’t always do it now but you shouldn’t always defer it either. None of it was a mind shattering revelation but it was all solid advice about programming.

The second part is extracts from various programs to demonstrate a lot of different ideas. The code samples in the second part were generally significantly older, mostly in assembly or C. The low level nature of the examples made it more difficult for me to appreciate. Seeing Altair assembly from the 70s that’s notable for being clever and concise won’t help me build a better web service today.

If you are the sort of person who is reading lots of programming books, you will appreciate the book, however you may not get much from it. If you aren’t the kind of person who reads lots of programming books some of the more oblique points may be obscured. I don’t have anything bad to say about it but don’t know who I would recommend the book to.

Planning Anecdote

The team I joined at my new job had been doing about 50 points per sprint with 4 devs and the manager when I joined. There were some additional staff changes, but when everything smoothed out again we ended up doing about 80 points per sprint with 4 devs and the manager. The new team did not seem better than the old team, but during the turbulence of the staff changes we changed how we prepared work to be done and managed to get better organized, which enables us to do more.

The change was a different way of collecting the relevant dependencies in the tickets. For example, instead of giving a vague reference to an endpoint that needed to be called, we made sure to give an explicit url, plus links to the swagger definition for the endpoint or a listing of explicit endpoints to be created and the related models. We also linked together all of the tickets that were related to make it easier to juggle which services were ready to release or where we had dependencies between tickets. It doesn’t seem like a significant change, but it resolved a bunch of the little dependencies where something wasn’t clear and you needed to hunt down an answer.

The other half of the anecdote is that due to scheduling conflicts we missed out on time to do some of the preparation for a sprint and ended up going ahead without as much preparation; we dropped back down to about 50 points completed. Our manager didn’t seem to have gotten any credit for us having gone from 50 to 80 but certainly caught flack for the sprint where we went back to 50.

Maybe this anecdote will inspire you to experiment with that little change that didn’t seem to be worth trying since it wasn’t clear how it would really help. Lining up a little change is easier than doing something big.

Hackathon

At work we recently held a hackathon where everyone who was interested in participating had 24 hours to build whatever they thought would be interesting and useful. We had 21 teams across 4 offices with ~60 participants total. I had never done a hackathon before and this seemed interesting so I registered for it without a particular idea in mind. As the start rolled around I was planning to put together a tool to simplify how we were generating configurations for containers in marathon. However, at the kickoff pizza dinner I heard another developer saying he had a plan to solve our issues concerning the lack of available conference rooms. Every afternoon there would be an hour or two where there were no meeting rooms left available, and while we’ve got more space coming it will be a while until it is available. Having been a victim of this problem before I asked if they could use another hand on their team and was graciously invited onboard.

The key insight they had was to use the logs from the wireless network to figure out who was in what office each day. Once we had an idea of who was in what office that day we could cross reference that with their calendar appointments and see what rooms were booked and didn’t need to be. There was some concern about false positives, i.e.,  we didn’t want to have the system saying that you weren’t in the office by 10 so release your room at 11 while you were stuck in traffic. So we built a hipchat integration to check with you about it.

The three of us started Thursday night at about 6 with a general divide and conquer along the three major components: data mining/parsing, calendar matching and decision, and the hipchat integration. I mostly worked on the hipchat portion. Since the bot had to reach out to specific people on it’s own volition as opposed to responding to people or messaging a fixed channel, our needs were different than what most of the prebuilt hipchat integrations are doing. I ended up doing an XMPP integration using Smack. The biggest challenge in getting this working in the context of a web service was that I needed to keep the connection to hipchat open longer than the API implied it needed to be. I found this out when my initial attempt to send a message and then close the connection failed because the message didn’t finish going through but we had closed the connection on our end. After spending several hours working through that I called it a night at about 1:30 a.m. and headed home to catch some sleep.

Getting back the next morning at about 7:30, in my office there was one lone developer who had been there working on his project all night. He had been working on porting a feature from the web app to the android app, because when he used the app he wanted that feature. I spent the first part of the morning working on getting the response from hipchat hooked up and found another interesting problem. I wasn’t able to respond to myself as the bot for whatever reason. So if the bot was using my credentials to send messages it wouldn’t see my response to it. I suspect it was because hipchat was being clever and not sending the message as a message but some sort of history, but I never was able to confirm. At 8:30 the dev who had been working on the matching stuff for our project got in and started processing live data for that day; our app immediately started spitting out rooms we thought didn’t need to be booked. I went and did a little scouting at about 10:30 to confirm the situation and matching seemed right.

We ran into a credentials snag on getting an account with the rights to unreserve other people’s meetings. So we didn’t have a full demo but the example meetings we had identified painted a pretty picture of how well it could work and the number of rooms it could free up.

When demo time rolled around we all got together to show off what we built. There was a bunch of interesting stuff put together. There was a set of living visualizations of service dependencies built by parsing urls from the system configuration data. There was a port of one of our mobile apps to Apple Watch. There were two different teams that built Alexa integrations for different portions of our products. Several teams built features for various mobile apps. One team set up a version of Netflix’s Chaos Monkey in the load test environment, including a hacked Amazon Dash button that would kill a server in that environment at the push of a button. Another team built a deploy football in the vein of the nuclear football complete with keys and switches and a little screen to display progress. Two tech writers twisted arms and got someone to build a hipchat integration to look up acronyms from a glossary they had put together on the wiki.

Overall I had a blast but ended up pretty exhausted from the ordeal. Some prizes will be given out on Monday. I’m not sure of the exact criteria for them but I wasn’t competing at all – I was enjoying the latitude to do what I thought was best. There will be one more prize given out at the company-wide all hands wherein everyone gets to vote on as the most impactful project after we get a chance to see how everything  turns out in real usage.

Both Project Management and Coaching?

This is somewhat outside of the normal material I write about, but this idea came to mind and I wanted to take an opportunity to explore it further. The context of this thought was that a mentoring program is being launched at work. Why is a manager both responsible for guiding the day-to-day execution of a project and for development of staff beneath them?
The goals of the project and the goals of personnel development are often at odds. Any specific experience a person needs or wants to get is likely something they don’t have much (if any) experience with yet, which means they won’t be as efficient or as good at that particular skill. While management taking care of your people gains you flexibility and goodwill in the long term, it has short term costs.
The product owner is separated from the scrum master, separating the “what” from the “how” in order to prevent a conflict of interest. Similarly, splitting the staff development aspect from the technical management of the project seems like it could prevent a similar conflict of interest about an individual. Maybe a stronger explicit mentorship program would handle mitigate this conflict of interest, but unless you give out a mentor who is both capable of and interested in working on staff development it wouldn’t help much. I have seen an explicit mentorship program where serving as a mentor was informally required for promotion to higher level, which resulted in people becoming mentors explicitly to check the box for their own personal benefit.
By setting up part of management to be incentivized to do staff development and achieve technical excellence, rather than completing projects or shipping features, you can create an environment that allows those closest to the system under development to do their best without the pressure to achieve explicit results. This reminds me of the idea of slack in queueing theory, where putting less work into the system means more work will come out. Once you build up staff appropriately and get everyone cross-trained, the overall outcome becomes better. Think of it as an optimization problem where you may have achieved a local maximum; getting to a higher peak would be better but the cost of valley between the peaks needs to be paid.
You could theoretically have the manager of the developers on a team be involved with the work of a different team, but it would be hard to see what those developers needed in order to help develop them in the longer term. Spending all day with a group of other developers working on the technical problems you face doesn’t really give you the insights necessary to see what a completely separate group of developers is struggling with. If you look at a problem with a new perspective, you can easily see different solutions, you could also miss important details about the problem itself and backtrack to already tread territory.
Maybe this is just my perspective based on the places I’ve worked, where the scrum master has been more of a process leader and impediment resolver than technical coach or project management. In my experience, development managers have spent a large portion of time working with the product owner to help factor stories in a more completable fashion and to derive the technical requirements from the business requirements. It always felt like the development manager spent more of their time working on those urgent but not really important things, and ignoring the important but delayable things because they were hard.
To answer the original question “Why is a manager both responsible for guiding the day-to-day execution of a project and for development of staff beneath them?” it seems to be because splitting up the responsibility differently won’t give management the day-to-day visibility into how to effectively provide useful developmental guidance. The ability to coach people on how to perform their role better requires that you spend enough time seeing them perform the role, so you can’t really be engaged in other technical activities on a day-to-day basis. Even if they had the insight to do so, it is exceedingly hard to measure staff development, which would make it hard to create goals and metrics around the activity. If you’ve had a different experience with how these roles have been broken down post a comment.

Future[Unit] Harmful

I’m not the first person to have seen this behavior but I saw it bite another engineer recently so it’s clearly not well enough known. The Unit type in Scala is a special type, there is effectively an implicit conversion from every type to Unit. If you are interested in the specifics check out this post, I’m not going to get into the specifics of how it works or the edge cases. This means that you can write the something like

val doubleWrapped: Future[Unit] = Future.successful(Future.successful(true))

and that compiles. It breaks the type safety that we have come to expect. More intriguingly if you were to do

val innerFailure: Future[Unit] = Future.successful(Future.failed(new RuntimeException))
innerFailure.value

what would you expect the result to be?

If you said

Some(Success(()))

you would be right, but that would probably not be what you were looking for. If you had a map where you needed a flatMap you would end up with compiling code that silently swallows all error conditions. If you have an appropriate set of unit tests you will notice this problem, but you can have that code that looks simple enough that it doesn’t need any unit tests.

The compiler flag -Ywarn-value-discard should give you some protection, but you need to turn it on explicitly and it may produce a fair bit of news in an existing large codebase. So keep an eye out for this issue, and be forewarned.

Generators for Infrastructure

I listened to a recent episode of .NET Rocks about CI/CD pipelines and the guest had built a yeoman generator to build out the pipeline. It is specific to the Microsoft offerings of Visual Studio Team Services and Team Foundation Services, but it is an interesting idea. There isn’t a reason you couldn’t build something similar on top of Travis, Circle, or Semaphore. I had touched on some of the other generators at work previously, but those were bootstrapping service and application code, not infrastructure. But that’s the advantage of infrastructure as code – you get all of the toolchain available to work with it.

At work we have code to build out the CI/CD pipelines but it is more of a desired state thing than a generator. It’s all built on top of a local Jenkins install, and regularly recreates all of the job definitions to prevent any changes made by hand from living very long. At previous jobs the automation for building different services and libraries were all a little different or built in a brittle template/multipart style that didn’t allow for any changes in a safe way. Our tool has been great for keeping all of the microservice jobs the same, templated in a way that makes upgrading them in groups easy enough, and even fairly testable.

The testing aspect is part of what’s most interesting, we boot up a Jenkins instance inside a container and generate the jobs into it, with some different configuration for notifications and where to put artifacts generated to blackhole all of it. The deployment jobs work using the same version running to the dev cluster so it should all be idempotent. The biggest issue I’ve seen so far is that it runs a ‘representative sample’ of jobs as part of testing; that hasn’t  always been as representative as we would want it to be especially as we add new jobs. We’ve got roughly 300 repos but only maybe 150-200 use managed builds like this. The good news is they break much less often than the unmanaged ones, but the bad news is when they break it’s usually a large swath of them that go down all at once.

AWS CodeStar was recently announced as a hosted version of this sort of tool. It creates repos, scaffolds a new application, can deal with permissions, and carries through to the provisioning of hosting as well. It has options for multiple languages and hosting options. So it seems like it is simplifying a lot of the initial setup of a service.

I’m not sure where the tipping point on the value of setting up your own infrastructure would be, but it seems like at least 100 repos depending on the complexity of each build. This sort of automation will become more important as microservices deployment continues to get bigger. If you haven’t done any sort of automation like this it’s worth looking into depending on the scope of the project.