Credential stuffing is an attack where you take previously breached username/password combinations and reuse them against other sites with the hope that the user had an account there and that they used the same credentials. This is why users are recommended not to reuse the same password for multiple sites. On the application side, defensive options are generally varieties of MFA, but that is a significant burden to the user of your average web application. You can react to the attack by rate-limiting logins from the same IP and blacklisting IPs that attempt too many logins. This is a reactive approach to someone attempting to break into the system. Reactive defensive positions are great, but proactive ones are even better.
Discussing this with some coworkers over lunch we came to the idea of getting the credentials and doing it to ourselves. Find the users who had credentials breached and proactively prompt them to change their passwords. This was an interesting theory over lunch but turning it into a practice ran into some difficulties. The first was the legal status of the breached credentials. Could we pull these off the internet and use them to do something? Have I been Pwned? supplies lists of only passwords hashed in different ways, which are still in a dubious legal state. Even if you get past the legal barrier involved, how would you react to an email telling you that your password has been compromised even if it wasn’t the people sending the email who lost your password? It doesn’t seem like it would be a positive reaction. Forcing a password reset on the next login could work assuming the user was going to log in again soon, but for our particular use case we can’t make this assumptions.
This seems like a weakness to the entire username/password scheme of protection. Things like password managers exist to help cover part of the problem. A password manager requires the user to take a more active part in their own security. There has been ongoing discussions with replacing passwords with biometric identifiers or hardware devices like Yubikey. Yubikey and the like might be ready to augment security but don’t have general adoption yet. The opt-in nature of these measures means that they’re more likely to be adopted by those who are already not sharing passwords among sites. The self credential stuffing could be used to provide additional knowledge and security measures to those who are less aware of the problems or even just figure out how exposed your user base is to the problem. However, it doesn’t really seem to secure the application better, it might help the ecosystem as the whole.
I start most of these book chats by describing the book and end by describing the reader who it seems like it would benefit. This time I’m going to describe the reader first.
If you write software that interacts with another computer you should read this. To me, the title is somewhat misleading since it focuses on the data aspect, however the intensity comes from the size of the data and implies a distributed system. You also get an amazing survey of different database implementations as a side benefit. The only complaint that I have is that the first part of the book starts very much at the beginning of data-intensive systems (e.g., data locality and how it is organized on disk), which is important if you are building your own database system but isn’t as applicable to the average reader.
There are sections on a vast number of different topics. It covers both SQL-style ACID databases as well as BASE-style NOSQL databases. It even gets into things like graph databases that don’t fit neatly in either box. The authors cover topics as varied as locking and commit strategies, the levels of consistency available in a database and what they really mean, distributed consensus, replication, and streaming.
The majority of the text is written in a technology-agnostic way but it will reference specific implementations that demonstrate a concept. There is also a deep academic rooting with a well-referenced selection of footnotes to satisfy any further curiosities you may have on the topics. It seems like it should be fairly accessible as a whole text to a relative beginner since it introduces concepts in a way that doesn’t require much prior knowledge. I don’t think a beginner could jump into a chapter in the middle and be able to follow along, but given the complexity of the topic I don’t think that’s an unreasonable thing.
Even if you aren’t building a database, deeply understanding the tradeoffs of the database you are using will make your application more correct. The difficulty of testing into a lot of the concurrent failure scenarios makes understanding the system at a logical level the only way to attempt to handle all failure cases. I do think all software engineers working on the web would benefit from the material here. It won’t make your day-to-day much better but it will help keep you out of the really bad places where the system is intermittently failing.
Following up on the discussion of the OWASP dependency check, my team now has the means to scan a dependency manifest against a list of known vulnerable ones. The question now in front of us is whether this should this be a step in the build pipeline that breaks the build?
There are some immediate pros and cons that come to mind. On the pro side, you will find out if you are about to ship something that has a known vulnerability. On the con side, there is the problem of dealing with false positives, and builds breaking that used to work if the world around you changes. Evaluating the impact of the pro is relatively obvious: you gain information about the system to act proactively before something becomes a problem. The cons get a little more complex. You need the ability to feed information back to the tool telling it that is has made a mistake to reduce future false positives. This is easy enough with something like a dot file but it makes configuring and using the tool more complex.
For the other con imagine the situation where you have a pull request that has passed CI and is ready to merge. Once merged it kicks off a series of processes to build and deploy the code, and then the build of the merged code fails because a new vulnerability became known to the system between when the pull request was built and when it was merged. This should be a rare occurrence, but when it does happen it will be even more unfamiliar to those who see the outcome.
We’ve been toying with an in-between response where we run it in a pull request and break the build for critical security vulnerabilities that were strongly predicted to match the dependency and otherwise just push back informative status to the pull request. This feels like a the best of both worlds sort of system where we get some build breakage on the worst of the worst and some freedom with not having to set up a set of false positives on all of our repos. Once we’ve got it all hooked up and had time to bake I’ll report back and see if it worked the way we hope it will.