Ran into a funny bug and I thought that others would see the humor with it. We have multiple QA environments set up: QA1.domain.com, QA2.domain.com, etc. Each environment is running a slightly different version of the web application, some of which are configured slightly differently. One of the least used environments, QA9, got a new version installed and immediately when anyone attempted to log in, they got redirected back to the login screen. This environment is used for testing hotfixes going out to a production environment for a client who is very picky about taking new versions and was willing to pay for the privilege. This deployment had two changes in it. One was a change (done by my team) to the login code and the other was a feature change in code that clearly wasn’t being executed.
Obviously, at this point we got a call to go investigate what happened. Simultaneously QA9 was rolled back to the last known good version. Our change was in several other QA environments already without incident so the change seemed like it should have been good. We started looking into whether our change depended on some other change that wasn’t in that single environment. We also started considering if there was some configuration difference in that environment that could have caused the problem. There didn’t seem to be any dependency issues, and we had tested the particular configuration case that was represented by that environment for the feature. The system didn’t record the login attempts as failed or otherwise, or even log any errors – for extra fun in investigating.
By the time we finished this exercise, the roll back had finished and things got weird. The problem didn’t go away, even though the code went back. This got us thinking about things that could persist across versions. Cookies, something funky in the database, etc. We cleared cookies and then the problem magically went away. So we logged in to start looking around at the cookies and didn’t find anything out of the ordinary.
A little later one of the QA Engineers mentioned that immediately after they had cleared their cookies they could log in, but then later they tried to log in again and couldn’t. This immediately had our interest piqued. The code that had supposedly caused the problem wasn’t there anymore, the cookies had been cleared, yet it broke again. Things had gotten more weird. We investigated the cookies on that browser; there was one session cookie with the QA9.domain.com domain set and another session cookie with the .domain.com domain set. That was unexpected.
We went and checked out the session cookie code that was deployed to that environment – nothing about setting the .domain.com version of the cookie. The code didn’t even look like it would support setting that cookie at all. We spent a while thinking about how this could happen. We tried to get that code to set that cookie and couldn’t. We started asking the QA Engineer what he had done between being able to log in and being unable to log in. He listed out a couple of different things; the notable one was that he logged into a different QA environment.
We go check out the session cookie code that was running in that QA environment and find a whole bunch of new code there about changing the domain. BINGO! The other environments had started setting .domain.com cookies which were conflicting with the cookie for this environment. We reached out to the team that made the change and they rolled it back in those environments. This immediately lead to an even funnier problem in that now every other environment had the same problem. But once everyone cleared their cookies it all got back to normal for now.
This little bit of irony felt like it needed to be shared. Breaking one system with a stray cookie from another environment made for an interesting day. In the comments share any funny bugs you’ve run into.