Organizing Code

I’ve been arguing with myself about the proper way to split up some code that has related concerns. The code in question relates to fetching secrets and doing encryption. The domains are clearly related, but the libraries aren’t necessarily coupled. The encryption library needs secrets, but secrets are simple enough to pass across in an unstructured fashion.

As I mentioned before, we are integrating Vault into our stack. We are planning on using Vault to store secrets, and we are also going to be using its Transit engine to do Envelope Encryption. The work to set up the Envelope Encryption requires a real relationship between the encryption code and Vault.

There are a couple of options for how to structure all of this. There are also questions of binary compatibility with the existing artifacts, but that's bigger than this post. The obvious components are configuring and authenticating the connection to Vault, the code to fetch and manage secrets, the API for consuming secrets, and the code to do encryption. I'm going to end up with three or four binaries: encryption, secrets, the secret API, and maybe a separate Vault client.

[Diagram: organizingCode]

That would be the obvious solution, but the question of what the Vault client exposes is complex, given that the APIs used by the encryption and secrets code are very different. It could expose a fairly general API that is essentially for making REST calls and leave parsing the responses to the two libraries, which isn't ideal. Alternatively, the Vault client could be a toolkit for building a client rather than a full client. That would allow the security concerns to be encapsulated in the toolkit while letting each library build its own query components.

Since the authentication portion of the toolkit would get exposed through the public APIs of the encryption and secret libraries, that feels like a messy API to me and I'd like to do better. It seems like there should be an API where the authentication concerns are entirely wrapped up in the client toolkit. I could use configuration options to avoid exposing any actual types, but that just hides the problem behind a bunch of strings and makes the options less self-documenting.
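
To make that option concrete, here is a rough sketch of the shape I have in mind. Every name in it is hypothetical, and the actual HTTP and login calls are stubbed out, so treat it as an illustration of the layering rather than a real Vault client.

from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class VaultToolkit:
    """Owns configuration and authentication; callers never see the token."""
    address: str
    _token: str = ""

    def authenticate(self) -> None:
        # A real implementation would perform an AppRole/Kubernetes/etc. login
        # against Vault and keep the resulting token private; stubbed here.
        self._token = "fake-token-for-illustration"

    def request(self, method: str, path: str,
                payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        # A real implementation would make an authenticated HTTP call against
        # the Vault REST API and return the parsed JSON; parsing the payload
        # is left to the libraries built on top of the toolkit.
        return {"method": method, "path": path, "authenticated": bool(self._token)}


class SecretsClient:
    """The secrets library builds its own queries on top of the toolkit."""
    def __init__(self, toolkit: VaultToolkit) -> None:
        self._toolkit = toolkit

    def get_secret(self, name: str) -> Dict[str, Any]:
        return self._toolkit.request("GET", f"secret/data/{name}")


class TransitClient:
    """The encryption library does the same, against the transit engine."""
    def __init__(self, toolkit: VaultToolkit) -> None:
        self._toolkit = toolkit

    def encrypt(self, key: str, plaintext_b64: str) -> Dict[str, Any]:
        return self._toolkit.request("POST", f"transit/encrypt/{key}",
                                     {"plaintext": plaintext_b64})


toolkit = VaultToolkit(address="https://vault.example.com:8200")
toolkit.authenticate()
secret = SecretsClient(toolkit).get_secret("db-password")
ciphertext = TransitClient(toolkit).encrypt("app-key", "aGVsbG8=")

The appeal of this shape is that the messy parts, configuration and authentication, live in exactly one place, while each library still owns the queries and response parsing that are specific to its domain.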

Like most design concerns, there isn't a single right answer; there are multiple concerns at odds with each other. In this case it's code duplication versus encapsulation versus discoverable APIs. Code duplication and encapsulation are going to win out over discoverability, since the configuration should be set once and then rarely change, whereas the other concerns help contain the long term maintenance costs of a library that will likely be used for a good while to come.


Book Chat: Extreme Programming Explained

Extreme Programming (XP) is an alternative software development methodology that falls under the agile umbrella. It's a competitor to scrum, but it is more focused on the developer experience, less prescriptive about specific organizational practices, and more prescriptive about technical practices. I was familiar with the concepts of XP and recently picked up the second edition of Extreme Programming Explained. The new edition refines some of the technical practices around deployment, since tools now exist for even more rapid deployment than what was initially conceived.

The build time practice is interesting: the idea is that a continuous integration build/test cycle should take about ten minutes. While you could make the build faster than that, keeping it around ten minutes creates a decent mental break, enough time to get a cup of coffee or get up and stretch. If it's slower than that, there is a tendency to move on to a different task, and you can lose context on both the old task and the new one. It matches my experience; although I hadn't been able to articulate the solution, I had seen the problem.

The overall methodology seems solid; however, it doesn't market itself to the whole business the way scrum does, which seems to have impacted the adoption of the methodology as a whole. The practices suggested are all pretty straightforward:

  • colocate the team,
  • construct a team with all necessary skills on the team,
  • have visible progress indicators,
  • work when you can really concentrate on it,
  • pair program,
  • user stories,
  • a weekly cycle,
  • a larger quarterly cycle,
  • slack,
  • the above build time practice,
  • continuous integration,
  • test first programming, and
  • incremental design.

Most modern software teams would be in favor of most, if not all, of these practices. Some of the practices are outside the control of the team and would need significant management support, but most are things the team can control.

I don't think the differences between this and other agile project management methodologies are that significant. The biggest difference from scrum that I can see is that scrum has fixed reflection periods, whereas XP has continuous reflection with impromptu kaizen events. That difference would let an XP adoption differentiate itself from all of the scrum implementations out there that were started but never finished. I don't think the book adds much to my understanding of software engineering; however, it's an excellent selection of software engineering practices. If you're looking for a different perspective on agile methodologies, this would be an interesting read.

BadSSL.com

I ran across badssl.com recently and needed to share. The basic idea of the site is that it hosts a number of subdomains with all sorts of variants of SSL certificates. The example certificates cover the whole range of things that can go wrong with a certificate, including expiration, self-signed certificates, revoked certificates, and certificates for the wrong host. It also checks the strength of the cryptography being used and serves certificates using multiple different kinds of encryption to test against. This is all so you can see that your browser is securing you properly.

There is a more interesting use case, however. The associated GitHub repo has instructions for booting up the site locally inside a Docker container, so you can run it as part of your automated test suite and exercise all sorts of networking code outside of a browser. Hosting a separate copy of the site in a container keeps your integration tests from reaching out to the public internet for resources. Having integration tests depend on public resources on the internet isn't a good practice for a number of reasons: the round-trip time, the dependency on someone else's infrastructure for your processes, and just being inconsiderate of someone else's resources. This container lets you avoid all of the work of defining what certificates are needed, generating the various certificates, and installing all of them.
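
As a flavor of what those tests can look like, here is a minimal Python sketch that checks certificate validation against a few of the public badssl.com hosts; in an actual suite you would point these hostnames at the local container instead of the public site, and the helper name here is just illustrative.

import socket
import ssl

def cert_is_valid(host, port=443):
    """Return True if a default-configured TLS client accepts the host's certificate."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True
    except ssl.SSLError:
        return False

assert cert_is_valid("badssl.com")                  # valid certificate
assert not cert_is_valid("expired.badssl.com")      # expired certificate
assert not cert_is_valid("self-signed.badssl.com")  # self-signed certificate
assert not cert_is_valid("wrong.host.badssl.com")   # hostname mismatch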

The test case we used the certificates for didn’t turn up any bugs, but it did make us confident in the implementation. This confidence helped us move along more quickly and be sure we were appropriately securing the connections.

Book Chat: Working Effectively With Unit Tests

Working Effectively With Unit Tests is a discussion not of when to unit test or how to unit test, but of how to know when you've done it well. It works backwards from the idea that tests should use Descriptive And Meaningful Phrases (DAMP), as opposed to the traditional software mnemonic Don't Repeat Yourself (DRY). By allowing some duplication in tests and focusing on the clear intention of what is to be accomplished, you get tests that are easier to read and that are more focused on the object under test rather than on its collaborators.
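
The book's examples are in Java, but here is a small, hypothetical Python illustration of the trade-off: rather than hiding the setup in a shared fixture (the DRY instinct), each test repeats just enough setup that it reads as a complete statement of intent.

from dataclasses import dataclass

@dataclass
class Item:
    name: str
    price: float

@dataclass
class Order:
    items: list
    shipping: float

    def total(self):
        return sum(item.price for item in self.items) + self.shipping

# A DRY style would push the Order construction into a shared setUp/fixture,
# so you have to go read the fixture to know what is actually being tested.
# The DAMP style accepts a little duplication so each test stands on its own:

def test_total_includes_shipping():
    order = Order(items=[Item("book", 10.00)], shipping=2.50)
    assert order.total() == 12.50

def test_total_of_empty_order_is_zero():
    order = Order(items=[], shipping=0.00)
    assert order.total() == 0.00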

The style being described forces out a lot of the elaborate mock setups common in most first attempts at unit testing. That is a good goal; however, like most resources, I feel it falls short of describing how to actually get rid of these problems in real applications, as opposed to the toy applications in books and articles. Still, the ideas it provides do work towards those ends admirably. To me, the ideas presented seem to drive towards a more functional style of programming: methods gained more arguments, which made them more flexible, and the objects they lived on were less prone to carrying around extraneous state. The book didn't discuss this in functional programming terms, but it sort of implied that as a goal around the edges.

Compared to some of the other books on unit testing I've read, this felt more concise, and it was definitely less focused on a specific testing framework. It feels written for someone who has been doing unit testing for a while and either has not been getting value from the activity or has been having maintainability problems with tests. For those audiences it seems like a good perspective for working their way out of those problems. For people new to unit testing, it may be a little too broad in what you should do and not prescriptive enough.

Encryption Future

To a working programmer, encryption doesn't seem like it changes much. AES and RSA public key cryptography have been fairly consistent fixtures for a while. Key size recommendations have held up to the projections on computing power, so the overall landscape of implementation hasn't had much movement. There has been a big emphasis on deciding to encrypt web traffic and lots of other things, but no real changes in the underlying technology.

The unveiling of a 72-qubit quantum computer and some of the work I've been doing on encryption at my job have had me thinking about the future of encryption. The jump from 17 qubits in 2017 to 72 already this year makes me think we're getting close to an inflection point where quantum computing goes from a toy to a realistic threat to existing crypto systems.

Lattice-based cryptography is the leading contender for quantum-resistant cryptography. The math behind it is based on the same math that describes the arrangement of atoms in a crystal, but instead of happening in three-dimensional space it happens in an arbitrarily high dimension. I don't understand the math behind this in three dimensions, let alone higher dimensions. However, I do appreciate that the hard problem at its core is based on a familiar concept, the way RSA is built on the difficulty of factoring integers. Understanding the idea helps me trust that the underlying math makes sense, even if I don't understand the math itself.

Looking into this, I stumbled onto a different idea that was much more radical. Homomorphic encryption is the idea that you can do work on encrypted values such that the encryption distributes over the operations. So essentially

 

Encrypted(a) + Encrypted(b) = Encrypted(a+b)

However, this works for arbitrary operations, not just addition. Practically, this is overkill for any normal application; however, if the party with the data and the party with the algorithm are unwilling to trust each other, you could use this to send the data to the algorithm securely and have it processed. While this seems like an amazing technology from a security and privacy perspective, there is a downside: it currently takes ~13 ms per logical gate to process. So even something simple like adding two integers would take seconds to complete. You won't be able to encrypt your data and hand it to a foreign neural network anytime soon.
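
To get a feel for what a homomorphic property looks like, here is a toy Python sketch using textbook RSA, which happens to be homomorphic under multiplication rather than the addition in the identity above. It is deliberately tiny and insecure; real schemes like Paillier or fully homomorphic encryption are far more involved.

# Toy, insecure textbook RSA, just to show a homomorphic property:
# multiplying two ciphertexts yields the ciphertext of the product.
p, q = 61, 53            # tiny primes, illustration only
n = p * q                # 3233
e = 17                   # public exponent
d = 2753                 # private exponent (e * d = 1 mod phi(n))

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 6
combined = (encrypt(a) * encrypt(b)) % n
assert decrypt(combined) == a * b    # Encrypted(a) * Encrypted(b) decrypts to a * b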

Realistically, nobody is going to implement this themselves. There will be academic applications for now, and eventually something will emerge from NIST's post-quantum cryptography program that everyone agrees seems right. Once there is agreement on a secure standard, the existing cryptography providers will start to add whatever that is to their packages, and application developers will just need to update, make new keys, and re-encrypt the world.

Book Chat: Perspectives on Data Science for Software Engineering

Perspectives on Data Science for Software Engineering is a collection of short research papers on using the tools of data science to do research into software engineering. It isn't about the concepts of data science for software engineers, as I thought it would be when I initially picked it up. That mismatch had me put it down the first time I tried to read it, but when I came back around to it I found myself interested not in the data science aspect but in the software engineering research aspect.

While none of the individual papers was something I read and immediately knew how to apply to my own practice, the overall package helped me feel positive about progress in software engineering. Outside of language design, it sometimes feels like most of the software engineering learning we've done, going as far back as the 1970s and 1980s, hasn't been applied in practice. I think part of the gap is that the research is disconnected from the way software is built in the wild. The research is either hyper-specific (e.g., focusing on a particular kind of software in a single language) or it defines problems but not solutions (e.g., the work on code quality metrics). The research isn't wrong, but it's missing a step about how to apply the work to what you're doing.

The only piece in here that I saw and felt had an immediate connection to what I was doing was the piece on bug clustering. That showed that the more bugs a file had the more likely it was to have more bugs in future iterations. This seems like it may lend some credence to the idea of rewriting a piece of code that has quality problems to effectively blank the slate and start over again.

Overall the book was intellectually stimulating, but it has no real practical use for what I do, or for what I imagine the average software developer does. If your role straddles the practical and academic worlds, then it may have more value to you.

Vault

Recently I've been working on rolling out a Vault implementation at work and migrating all of our existing secrets over to it. Vault is a tool designed to secure secret data and control access to it. It also offers a variety of ways to handle dynamic secrets for things like database credentials. The dynamic database credentials are an interesting security feature: any particular set of database credentials can be shut off at any point if compromised, and they are effectively rotated each time a new instance starts up. Vault can also act as a certificate authority. This is all built on top of a configurable set of storage backends and HA clustering setups.

One of the most interesting things is the unsealing process. The system starts sealed, with all of the secrets inaccessible. The unseal process requires a threshold of key fragments to be provided to unseal the vault. This is an implementation of Shamir's Secret Sharing, which is a cool concept. In the enterprise version, it also provides an auto-unsealing mechanism built on top of AWS Key Management Service.
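
Shamir's scheme is simple enough to sketch: the secret becomes the constant term of a random polynomial, each share is a point on that polynomial, and any threshold-sized subset of shares recovers the constant term by interpolation. The toy Python below shows the idea (it is not how Vault implements it, not production code, and relies on Python 3.8+ for the modular inverse via pow).

import random

PRIME = 2**127 - 1   # a large prime; all arithmetic is done modulo this

def make_shares(secret, k, n):
    """Split `secret` into n shares such that any k of them can reconstruct it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coefficients = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    def evaluate(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coefficients)) % PRIME
    return [(x, evaluate(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -xj) % PRIME
                denominator = (denominator * (xi - xj)) % PRIME
        secret = (secret + yi * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret

shares = make_shares(secret=123456789, k=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any three of the five shares are enough
assert reconstruct(shares[2:]) == 123456789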

The REST API is pretty good and most major languages have a third party client available already. The third party clients have different levels of compatibility with all of the features of the system; since it is a plugin based system they don’t necessarily support everything. Sadly, the UI also doesn’t support all of the features, which makes doing some basic testing about how the system works more painful.
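
As an example of how approachable the REST API is, here is a hedged sketch of reading a key/value secret with plain Python and requests; the address, token, mount point, and secret path are all placeholders, and it assumes the KV version 2 secrets engine, which nests the payload under data.data.

import requests

VAULT_ADDR = "https://vault.example.com:8200"   # placeholder address
VAULT_TOKEN = "s.placeholder-token"             # placeholder token

# Read a secret from a KV v2 engine mounted at "secret/" (path is illustrative).
response = requests.get(
    f"{VAULT_ADDR}/v1/secret/data/myapp/database",
    headers={"X-Vault-Token": VAULT_TOKEN},
    timeout=5,
)
response.raise_for_status()
secret = response.json()["data"]["data"]   # KV v2 wraps the key/value pairs in data.data
print(secret.get("username"))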

Vault seems like a very good tool chest for dealing with secrets, but I would like a more opinionated system about how to do all of this. I can build my own system on top of it, but I would like integrated support for creating a key of some type and storing it securely. Similarly, its scheme for providing transit encryption requires a lot of work on my side if I want to use it. Despite these areas for improvement, I'm still excited to get it integrated into our systems.

Where are all the consultants?

I spend a lot of time engaging with programming related content, online and in person. Most of this content is created by people who describe themselves as “consultants” of some variety. Up until recently I had never worked with any consultants of this variety anywhere. I had wondered, where are all the consultants? Recently at work the floodgates opened and a huge wave of consultants appeared to help a couple of teams hit their objectives. I’m talking 40-50 consultants against an existing total engineering team of ~250.

Watching this from the outside was interesting, since it seemed like our people spent a lot of time trying to get the consultants up to speed on everything that was going on. This was exacerbated by the consultants not getting access to everything a normal employee would, most notably the wiki; large quantities of the documentation that would normally just be linked to a new employee had to be exported, and therefore couldn't easily be contributed back to either. There were also timezone issues, since many of the consultants were in Eastern Europe, which left them a limited window to interact with anyone on the US east coast and no reasonable time to interact with those on the US west coast. The remote-only contractor presence was interesting given our unwillingness to start full time employees as remote. Overall, the teams that picked up the consultants seemed to eventually get around the obstacles and get the consultants contributing.

All of this was of idle curiosity about how the rest of the organization was run until the team I was on was slated to pick up oversight of two new consultants. Fortunately, by the time we got there, most of the immediate logistical problems had been solved, and the majority of the basic onboarding documentation had been extracted from the wiki and put into a Google Drive the consultants were able to see. We also had the advantage of picking up US-based consultants, so time zones weren't an issue. Both consultants are very sharp and experienced in the kinds of technologies we use. But we have them for three months to start with, so we take on the whole onboarding overhead with only three months to get the return on investment that comes from it.

This raises three questions in my mind. First, when the consultants are done, how much more will we have gotten done than if we had just done the work ourselves? Second, isn't the whole process just going to repeat itself with the next big set of deliverables for engineering? Third, is the content I see being generated by consultants really their reaction to other companies that have already gotten themselves into trouble?

The first question seems like it should come out net positive, at least for the consultants my team has, but I think part of that is because the kinks in the system were worked out by the teams that went first. The second question is more intriguing. The initial need for consultants came from a failure to grow engineering organically, so the resources we put into finding and vetting consultants weren't being put into finding and vetting employees. While we may have gotten more engineering work done in the short term, HR and management resources were spread thinner on long term recruiting. Even though the consultants were doing great work, it feels like our longer term ambitions may have been sacrificed to meet present obligations.

The third question is much broader. If the advice being poured out onto the internet and delivered at conference talks is the result of consultants looking at lots of organizations that are already dysfunctional, then it may be biased toward getting from bad to passable rather than aiming for great. It strikes me as being like trying to form a psychological theory using just a prison population, because that's who the psychologist happens to treat every day. Since having this thought I haven't been able to spot any common architectural or management mantras that are clearly thought up based on these sorts of situations. Maybe Tolstoy was right after all: happy families are all alike; every unhappy family is unhappy in its own way.

Book Chat: The Architecture of Open Source Applications Volume 2

The Architecture of Open Source Applications Volume 2 has writeups describing the internal structure and evolution of nearly two dozen different open source projects, ranging from tools to web servers to web services. This is different from volume one, which didn't include any web service-like software, the kind of thing I build day to day. It is interesting to see the differences between what I'm doing and how something like MediaWiki powers Wikipedia.

Since each section has a different author, the book doesn't have a consistent feel or even a consistent organization to the sections on each application. It does, however, give space for some sections to spend a lot of time discussing a project's past to explain how it evolved into its current shape. Looked at purely as a finished product, some choices don't make sense, but the space to explore the history shows that each individual choice was a reasonable response to the challenges at the time. The history of MediaWiki is very important to its current architecture, whereas something like SQLAlchemy (a Python ORM) has evolved more around how it adds new modules to support different databases and their specific idiosyncrasies.

I found the lessons learned that are provided with some of the projects to be the best part of the book. They described the experience of working with codebases over the truly long term. Most codebases I work on are a couple of years old while most of these were over 10 years old as of the writing of the book, and are more than 15 years old now. Seeing an application evolve over longer time periods can truly help validate architectural decisions.

Overall I found it an interesting read, but it treads a fine line between giving you enough context on an application to understand its architecture and giving you so much context that the majority of the section is about the "what" of the application. I felt a lot of the chapters dealt too much with the "what." Some of the systems are also very niche, where it's not clear how the architecture choices would apply to designing other things in the future, because nobody would really start a new application in that style. If you have an interest in any of the applications listed, check out the site and read the section there, and buy a copy to support their endeavours if you find it interesting.

Book Chat: Learn You a Haskell for Great Good

Haskell was the white whale of functional programming in my mind, something that is the definitive form of functional programming but with such a steep learning curve that it put off all but the most determined students. I had been recommended Learn You a Haskell for Great Good a while ago but kept putting it off because of the intimidating nature of the material. I eventually had a big block of time where I was going to be home and didn’t have many responsibilities so I figured this would be a great opportunity to take a crack at it.

I sat down with it expecting it to be as mentally taxing as Functional Programming in Scala was; however, having already put in the work reading that and Scala with Cats, I was way ahead of the curve. While the Haskell syntax isn't exactly friendly to beginners, I understood most of the concepts: type classes, monads, monoids, comprehensions, recursion, higher order functions, etc. My expectation of the difficulty of the language was unfounded. Conceptually it works cleanly; however, coming from a C-style language background, the syntax is off-putting. On top of the basic syntax issues, the operators in heavy use give it an aura of inscrutability, especially since they are so difficult to search for. I did find this PDF that names most of them, which helped me look for additional resources about some of them.

The book explained some of the oddities around the stranger pieces of Haskell I had seen before. Specifically, the Monad type class not also being Applicative: it's a historical quirk, since monads were introduced first and they didn't want to break backwards compatibility. The other fact I had not fully appreciated is that Haskell dates from 1990, which excuses a lot of the decisions about things like function names with letters elided for brevity.

The other differentiating fact about the book is that it tries to bring some humor, rather than being a strictly dry treatment of the material. The humor made me feel a stronger connection with the author and material. A stupid pun as a section header worked for me and provided a little bit of mental break that helped me keep my overall focus while reading it.