Organizing Code

I’ve been arguing with myself about the proper way to split up some code that has related concerns. The code in question relates to fetching secrets and doing encryption. The domains are clearly related, but the libraries aren’t necessarily coupled. The encryption library needs secrets, but secrets are simple enough to pass across in an unstructured fashion.

As I mentioned before, we are integrating Vault into our stack. We are planning on using Vault to store secrets. We are also going to be using their Transit Encryption Engine to do Envelope Encryption. The work to set up the Envelope Encryption requires a real relationship between the encryption code and Vault.

There are a couple of options for how to structure all of this. There are also questions of binary compatibility with the existing artifacts, but that’s bigger than this post. The obvious components are configuring and authenticating the connection to Vault, the code to fetch and manage secrets, the API for consuming secrets, and the code to do encryption. I’m going to end up with three or four binaries: encryption, secrets, the secret API, and maybe a separate Vault client.

That would be the obvious solution, but the question of what the Vault client exposes is complex, given that the APIs used by the encryption and secrets libraries are very different. It could expose a fairly general API that is essentially for making REST calls and leaves parsing the responses to the two libraries, which isn’t ideal. Alternatively, the Vault client could be a toolkit for building a client rather than a full client. That would allow the security concerns to be encapsulated in the toolkit, but allow each library to build its own query components.

Since the authentication portion of the toolkit would get exposed through the public APIs of the encryption and secret libraries, that feels like a messy API to me and I’d like to do better. It seems like there should be an API where the authentication concerns are entirely wrapped up in the client toolkit. I could use configuration options to avoid exposing any actual types, but that just hides the problem behind a bunch of strings and makes the options less self-documenting.

Like most design concerns there isn’t a single right answer; there are multiple concerns at odds with each other, in this case code duplication vs. encapsulation vs. discoverable APIs. Here, code duplication and encapsulation are going to win out over discoverable APIs, since the configuration should be set once and then rarely changed, while the other two concerns contain the long-term maintenance costs of a library that will likely be in use for a good while to come.

Book Chat: The Architecture of Open Source Applications Volume 2

The Architecture of Open Source Applications Volume 2 has writeups describing the internal structure and evolution of nearly two dozen different open source projects, ranging from tools to web servers to web services. This is different from volume one, which didn’t include any web service-like software, the kind of thing I build day to day. It is interesting to see the differences between what I’m doing and how something like MediaWiki powers Wikipedia.

Since each section has a different author, the book doesn’t have a consistent feel to it, or even a consistent organization to the sections on each application. It does, however, give space for some sections to spend a lot of time discussing the past of the project to explain how it evolved to the current situation. Looked at as a finished product, some choices don’t make sense, but the space to explore the history shows that each individual choice was a reasonable response to the challenges being engaged with at the time. The history of MediaWiki is very important to its current architecture, whereas something like SQLAlchemy (a Python ORM) has evolved more around how it adds new modules to enable different databases and their specific idiosyncrasies.

I found the lessons learned that accompany some of the projects to be the best part of the book. They describe the experience of working with codebases over the truly long term. Most codebases I work on are a couple of years old, while most of these were over 10 years old as of the writing of the book, and are more than 15 years old now. Seeing an application evolve over longer time periods can truly help validate architectural decisions.

Overall I found it an interesting read, but it treads a fine line between giving you enough context on the application to understand the architecture and giving you so much context that the majority of the section is about the “what” of the application. I felt a lot of the chapters dealt too much with the “what”. Some of the systems are also very niche, where it’s not clear how the architecture choices would apply to designing other things in the future, because nobody would really start a new application in that style. If you have an interest in any of the applications listed, check out the site and read the section there, and buy a copy to support their endeavours if you find it interesting.

Book Chat: Scala With Cats

Scala with Cats is a free ebook put together to help introduce the Cats library and its programming style to developers. It is targeted at Scala developers with about a year of experience with the language, though if you have been using the language in a very Java-like way you may not be prepared for this book. That caveat aside, it brings an accessible introduction to the library and its style of programming.

I can’t go back and read this before having read Functional Programming in Scala, but it seems like either order would work fine. They both talk about the same basic concepts around purely functional programming, coming at it from two different perspectives: Scala with Cats is about how the category theory-inspired structures in the library can be used to solve problems, whereas Functional Programming in Scala leads you toward those same category theory-inspired structures but gets you to find the patterns yourself.

I really appreciated the last set of exercises in Scala with Cats, where it has you take a fully concrete class and convert it into more and more generic structures. First, by adding some type classes to become generic over the specific types. Then, by abstracting over the intermediate data structure and converting it into its own type class. Finally, by abstracting over the data structure even further, replacing it with another type class.

I think this style of programming has some definite pros. The idea behind the vocabulary is good, even if the terms chosen obscure some of the intent. The extensive usage of type classes adds an additional layer of polymorphism that lets a library author abstract over portions of the implementation to make it future-proof. The Scala implementation of type classes makes this feel awkward at points, since the imports around implicit instances make it less obvious what is happening. I feel like I need to spend some time with a real application written in this style to see what the negatives are when working with it. I can see the issues with learning to work in this style, but I’m uncertain about what the negatives are once you’ve gotten used to it.

Type Aliases in Scala

I had an interesting conversation recently with one of the junior engineers on my team about when to use type aliases. Normally when I get asked for advice I’ve thought about the topic, or at least have a rule of thumb I use for myself. Here all I could manage to express was that I don’t use type aliases, but not for any particular reason. I felt I should do better than that and promised to come back with some better advice.

Having thought it through a little, here’s the guidance I gave. You can use type aliases to do type refinement, constraining an existing type. So you could constrain an integer to only positive integers. Instead of assuming that some arbitrary integer is positive, or checking it in multiple places, you can push that check to the edge of your logic. This gives you better compile-time checks that your logic is correct and that error conditions have been handled.
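A plain type alias in Scala doesn’t enforce such a constraint on its own, so a minimal sketch of the idea pairs the alias with a smart constructor at the edge of the logic (all names here are hypothetical; libraries like refined provide real compile-time refinement):

```scala
object PositiveInts {
  // The alias documents intent; the smart constructor enforces the
  // constraint once, at the boundary, instead of at every call site.
  type PositiveInt = Int

  def positiveInt(i: Int): Option[PositiveInt] =
    if (i > 0) Some(i) else None
}
```

Downstream code that accepts a `PositiveInt` can then assume positivity rather than re-checking it.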

They can also be used to attach a name to a complex type. So instead of having an

Either[List[Error], Validated[BusinessObject]]

being repeated through the codebase, you can name it something more meaningful for the use case. This also allows hiding some of the complexity of a given type. If, for example, you had a function type that is itself multiply nested, like

String => (Int, Boolean) => Foo[T] => Boolean

it can wrap all that up into a meaningful name.
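A small sketch of both uses, with hypothetical stand-ins for Error, Validated, and BusinessObject:

```scala
object Aliases {
  case class BusinessObject(id: Int)

  // Name the repeated complex type once instead of spelling it out everywhere.
  type ValidationResult = Either[List[String], BusinessObject]

  // Wrap a multiply nested function type in a meaningful name.
  type Matcher[T] = String => (Int, Boolean) => T => Boolean

  def validate(id: Int): ValidationResult =
    if (id > 0) Right(BusinessObject(id)) else Left(List("id must be positive"))
}
```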

None of this is really a good rule for a beginner, but it feels like it wraps up the two major use cases I was able to find. I ended up going back to the engineer who prompted the question with “use type aliases when it makes things clearer and is used consistently.” Neither of us was really happy with that idea. There are clearly more use cases that make sense, but we weren’t able to articulate them. We’re both going to try it in some code and come back around to the discussion later to see where that gets us.

Scala Varargs and Partial Functions

I ran into a piece of code recently that looked like

foo(bar,
  { case item: AType => … },
  { case item: AnotherType => … },
  { case item: YetAnotherType => … }
  // 10 more cases removed for simplicity
)

I was immediately intrigued because that was a very odd construction and I was confused why someone would write a function accepting this many different partial functions and what they were up to. I went to look at the signature and found the below.

def foo(bar: ADTRoot, conditions: PartialFunction[ADTRoot, Map[String, Any]]*): Map[String, Any]

It was using the partial functions to pick items from the algebraic data type (ADT) and merge them into the map. More interestingly, it used the ability of a partial function to report whether it can operate on the type that bar happened to be. Overall it was an interesting combination of language features to create a unique solution.

Part of it is that the ADT was missing some abstractions that should have been there to make this sort of work easier, but even then we would have had three cases, not a dozen. I’m not sure if this pattern is a generalizable solution, or even desirable if it is, but it got me thinking about creative ways to combine language features provided by Scala.
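The body of foo wasn’t shown, but a minimal sketch consistent with the described behavior (a hypothetical ADT, merging only the partial functions that are defined at bar) might look like:

```scala
object PartialMerge {
  sealed trait ADTRoot
  case class Named(name: String) extends ADTRoot
  case class Counted(n: Int) extends ADTRoot

  // Each partial function contributes a Map only if it is defined at bar.
  def foo(bar: ADTRoot,
          conditions: PartialFunction[ADTRoot, Map[String, Any]]*): Map[String, Any] =
    conditions
      .collect { case pf if pf.isDefinedAt(bar) => pf(bar) }
      .foldLeft(Map.empty[String, Any])(_ ++ _)

  val merged: Map[String, Any] = foo(
    Named("a"),
    { case Named(n)   => Map("name" -> n) },
    { case Counted(c) => Map("count" -> c) } // not defined at Named, so skipped
  )
}
```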

Book Chat: Functional Programming in Scala

I had been meaning to get a copy of this for a while, then I saw one of the authors, Rúnar Bjarnason, at NEScala 2017 giving a talk on adjunctions. Before seeing this talk I had been trying to wrap my head around a lot of the category theory underpinning functional programming, and I thought I had been making progress. Seeing the talk made me recognize two facts. First, there was a long way for me to go. Second, there were a lot of other people who also only sort of got it and were all there working at understanding the material. At the associated unconference he gave a second talk which was much more accessible than the linked one. Sadly there is no recording, but I started to really feel like I got it. The other attendees I talked with at the conference all spoke about Functional Programming in Scala in an awe-inspired tone, describing how it helped them really get functional programming and the associated category theory.

The book is accessible to someone with minimal background in this, so I came in somewhat overqualified for the first part but settled in nicely for the remaining three parts. It’s not a textbook, but it does come with a variety of exercises and an associated repo with stubs for the questions and answers to the exercises. There is also a companion pdf with chapter notes and hints about how to approach some of the exercises that can help you get moving in the right direction if stuck.

Doing all of the exercises while reading the book is time consuming. Sometimes I would read about half a page and then spend more than an hour on the associated exercises. It was mentally stimulating regardless of the time I committed, but it was draining. Some of the exercises have even been converted to a web-based format, more like unit testing, at Scala Exercises.

I made sure I finished the book before going back to NEScala this year. Rúnar was there again, and gave more or less the same category theory talk as the year before, but this time around I got most of what was going on in the first half of the talk. In fact, I was so pleased with myself when I realized how much of the talk I was successfully following that I missed a key point in the middle. I ended up talking with one of the organizers, who said he encourages Rúnar to give this same talk every year since it is so helpful in giving everyone an understanding of the theoretical underpinnings of why all this works.

This book finally got me to understand the underlying ideas of how this works, by having me build the infrastructure for principled functional programming myself. It leaned into the complexity and worked through it, whereas other books (like Functional Programming in Java) tried to avoid the complexity and focus on the “what”, not the “why”. This was the single best thing I did to learn this material.

Functional Programming Katas

Based upon my success with the F# koans I went looking for some more covering the functional side of Scala programming. I’ve found a couple so far, which I’ve completed with varying levels of success.

First was the Learn FP repo, a GitHub repo to check out with code to fill in to make the tests pass. The exercises ask you to provide the implementations of various type classes for different types. There were links to other articles about the topics, but otherwise it was just code. The first part was fairly straightforward; I had some trouble with State and Writer but otherwise persevered until I hit a wall at Free. I ended up breaking down and looking at the completed solutions provided in a different branch, only to find that IntelliJ indicates the correct solution doesn’t compile (it turns out the IntelliJ Scala plugin has an entire Scala compiler in it, and it’s rough around the edges). That frustrated me for a while, but I eventually managed to power through, and thankfully the rest of the exercises didn’t suffer from the same problem.

Next was the Cats tutorial. I had done some of the other exercises on that site when first learning Scala, and they had been pretty helpful. It has a neat interactive website to run the code you fill in, though that makes it harder to experiment further with the code. It seemed like a reasonable place to cover the major type classes in Cats: it has you look at sample code and fill in what it would evaluate to. It was good, but I had two issues with it. First, some sections have multiple blanks to fill in, and it evaluates them all as a group without any feedback on which one you got wrong. Second, it’s a lot of looking at other code and describing what it does, with no writing of code in this style yourself. Overall it helped me feel more comfortable with some of the terminology, but it didn’t produce the “aha” moment I was looking for regarding the bigger picture.

Then I went to the Functional Structures Refactoring Kata, an application to be refactored into a more functional style, with samples in multiple languages. The authors provide a ‘solution’ repo with refactored code to compare against. The issue I had with this exercise is that, other than going to look at the solution, there isn’t a real way to tell when you’re done. Even then, some of the ways they factored their solution are opinion-based. While seeing that opinion is interesting, they don’t really explain the why of their decisions.

The last tutorial I tried was the Functional Programming in Scala exercises. It’s from the same people as the Cats tutorial above and is based on the exercises in the book Functional Programming in Scala. I managed to get about halfway through it without having read the book. While there is some prose in between exercises, it doesn’t adequately explain all of the concepts. While I’m reading the book I will come back to this and do the rest of the exercises.

Overall I would strongly recommend the Learn FP repo, and recommend the Cats tutorial. I would pass on the Functional Structures Refactoring Kata. I’ll hold judgment on the Functional Programming in Scala exercises until I can try them with the book. While these were largely good starts, I still haven’t had the conceptual breakthrough I’m looking for on how to use all of these pieces in practice.

Book Chat: Refactoring

Refactoring sets out to describe what refactoring is, why you should refactor code, and to catalog the different refactorings that can be done to an object-oriented codebase. This isn’t the first appearance of the idea of refactoring, but it was the idea’s big coming-out party in 1999. It is an audacious goal, in that the effort to catalog all of anything can be daunting. While I’m not an authority on refactoring by any means, the book certainly captured all of the basic refactorings I’ve used over the years. It even includes refactoring to the template method design pattern, though it doesn’t cover something like refactoring to the decorator pattern. It seems odd to have included refactoring to one design pattern but not to several others.

The descriptions of the “what” and “why” of refactoring are excellent and concise. The catalog is ~250 pages of examples and UML diagrams of each refactoring technique; showing every refactoring in that much detail feels like overkill. In general, the author shows both directions of a refactor, e.g., extract method and inline method, which can be rather overwhelming. A newer volume like Working Effectively With Legacy Code seems more useful in its presentation of actual refactoring techniques, in that it prioritizes where we wish to go rather than exhaustively describing each individual modification. Honestly, I think that since Refactoring predates automated refactoring tools, and the internet in 1999 wasn’t as full of help on these sorts of topics, the book needed to be that specific because it was the only source of help.

It’s an interesting historical piece, but not an actively useful thing to try to improve your craft.

Akka From A Beginner’s Perspective

I wandered into a new part of the codebase at work recently that contains a number of Akka actors. I was aware of both the actor concept and the library, but had never worked with either.

Actors are a way to encapsulate state away from threads, so that if you want to make a change to the state you need to send a message to that actor. If you’ve ever done any work with an event loop, it’s similar to that, but generalized to any sort of data, not just events. The idea is that each actor provides a mailbox where you can leave a message; the actor then processes the message, and whatever happens to the actor’s state happens on its thread. This means the messages go to the actor’s thread, rather than the data being fetched from the actor and brought back to the caller’s thread. The big advantage is that there isn’t any need for locking, since no mutable state is shared. The downside to this message-passing style is that the default message flow is one-way. Some typical code using an actor would look like

actor ! message

This would send the message to the actor. The actor itself can be pretty simple such as

class ActorExample extends Actor {
  def receive = {
    case SampleMessage => println("got message")
  }
}

That receives the message and runs the listed code if the message is of the expected type (in this case, SampleMessage). This is good for data sinks, but actors can be composed too.

class ForwardingActor(destination: ActorRef) extends Actor {
  def receive = {
    case SampleMessage(content) =>
      println(s"got message $content")
      destination ! SomeOtherMessage(content)
  }
}

This actor logs the contained data and passes it along inside a different message wrapper. This is interesting, but requires you to define the destination when creating the actor. Akka also provides a way to find out which actor sent you the current message.

class ReplyingActor extends Actor {
  def receive = {
    case SampleMessage(content) =>
      sender() ! Reply(content) // The () is optional but used for clarity here
  }
}

This simply sends the same content back inside a new message envelope. There is one small gotcha in this code: if you close over sender() itself, it can have unintended consequences, since sender() refers to the sender of the message currently being processed, which may have changed by the time a closure runs. A different pattern is therefore recommended for your receive method.

class ReplyingActor extends Actor {
  def receive = {
    case SampleMessage(content) =>
      processSampleMessage(sender(), content)
  }
  private def processSampleMessage(sender: ActorRef, content: String) = {
    sender ! Reply(content)
  }
}

This resolves the sending actor before doing any processing, so you don’t end up closing over the wrong actor as you chain more complex pieces together. The other interesting thing about this example is that the type of sender is ActorRef, not Actor. An ActorRef is a handle that wraps the actor and keeps track of where it runs and how to get a message into its mailbox. This lets two actors interact even though they are scheduled independently. This all seems straightforward when one actor sends a message to another, but if you send a message from something that isn’t an actor, what does sender() do, and how does that work?

The answer is that the reply is generally discarded, unless the call was made with an ‘ask’, such as

val result = actor ? message

This captures the result from the actor as a Future[Any], which at least returns the result so you can inspect it, even if the type isn’t that useful. Akka currently provides typed actors to work around that pain, which are intended to be replaced by Akka Typed, which isn’t quite ready for production as of this writing.
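A minimal sketch of the ask pattern, assuming the classic Akka actor API (akka.pattern.ask needs an implicit Timeout in scope; the actor and message here are hypothetical):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

class EchoActor extends Actor {
  def receive = {
    case msg => sender() ! msg // reply to the temporary actor behind the ask
  }
}

object AskExample {
  def roundTrip(): Any = {
    val system = ActorSystem("demo")
    val echo = system.actorOf(Props[EchoActor], "echo")
    implicit val timeout: Timeout = Timeout(3.seconds)

    val future = echo ? "ping"                  // Future[Any]
    val reply = Await.result(future, 3.seconds) // blocking is fine in a demo
    system.terminate()
    reply
  }
}
```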

That’s all the Akka I picked up delving into this new portion of the codebase. I didn’t need to get into supervision or schedulers, but if building a new application from scratch I’m sure those concepts would come up.

Type Classes

I have been trying to understand type classes for a while and having a hard time figuring out exactly what they mean or how to use them effectively. The idea that they are a means to extend the behavior of a class in a consistent manner makes sense in an abstract way. But I’m not sure how exactly they accomplish that goal, or what a type class is or isn’t. A coworker had offhandedly referred to a type class as “a trait with a type parameter,” but it feels like there has to be more to it than that. This post is a sort of journal of my efforts to figure out exactly what a type class is.

The Wikipedia page wasn’t that helpful to me, since it defines type classes in terms of other unknown terms. It had one useful tidbit from my perspective: in Scala, type classes are implemented with implicits. Following some more links, I ended up at the Cats documentation, which has a bunch of example type classes and code using them that I found useful. This got me thinking about whether I had seen a signature that looked like it might be using a type class. I remembered the sum method on List, which had stuck in my mind because it was unclear how it knew how to sum the items and which types it would be legal to sum.
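From memory, the signature on List in the Scala 2 standard library looks roughly like the comment below (the exact defining trait is an assumption on my part):

```scala
object SumExample {
  // The signature of sum is roughly:
  //   def sum[B >: A](implicit num: Numeric[B]): B
  // Numeric[Int] is found implicitly, so this compiles and evaluates:
  val total = List(1, 2, 3).sum

  // Without a Numeric instance for the element type, this would not compile:
  // List("a", "b").sum
}
```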

This definitely looks like a type class given our definition so far. We have an implicit argument that is a trait with a type parameter. It is being used to extend numeric types to give them orderings and perform arithmetic operations, but it also signals that the type is numeric. The type class is also being used to constrain what types the sum method is available for, since if the implicit is not available it won’t compile. This constraint also plays nicely with the context bound syntax.
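The context bound syntax is sugar for an implicit parameter; a small sketch of both spellings (names are my own):

```scala
object Bounds {
  // Explicit implicit parameter:
  def total1[T](xs: List[T])(implicit num: Numeric[T]): T =
    xs.foldLeft(num.zero)(num.plus)

  // Equivalent context bound form, [T: Numeric]:
  def total2[T: Numeric](xs: List[T]): T = {
    val num = implicitly[Numeric[T]]
    xs.foldLeft(num.zero)(num.plus)
  }
}
```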

So we’ve got an example and some rules, but that’s not really a definition. I went looking for some more examples in the Cats codebase, since it is full of type classes. Each of the individual type classes in Cats definitely follows the pattern of a trait with a type parameter. I think the missing piece for my understanding lay in what the methods on the type class are. The methods all seem to take at least one argument of the type parameter, so that appears to be a reasonable constraint on the functions.

Type classes are different from an implicit class, since you can constrain type signatures with them, but both let you add new functionality to an existing type. The implicit lookup process imposes some constraints on an implicit class, such as taking only one non-implicit argument, which type class methods can bypass. You could write an implicit class to expose the type class in a more fluent way, like

implicit class NumericWrapper[T: Numeric](x: T) {
  def plus(b: T): T = {
    val n = implicitly[Numeric[T]]
    n.plus(x, b)
  }
}
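With a wrapper like that in scope (restated here so the snippet stands alone), call sites get fluent method syntax while the type class still does the work:

```scala
object FluentNumeric {
  implicit class NumericWrapper[T: Numeric](x: T) {
    def plus(b: T): T = implicitly[Numeric[T]].plus(x, b)
  }

  // Int has no plus method, so the implicit class converts 2,
  // and Numeric[Int] performs the addition.
  val five = 2.plus(3)
}
```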

Type classes seem like they would be more useful in a language without multiple inheritance. Since in Scala I can have a type implementing multiple traits that already have implementation associated with them, just mixing in more code seems like an easier way around the extension problems. I found this proposal for adding type classes to C#, which seems very cool and in line with the sorts of powerful abstractions they’ve been trying to add to the language. Seeing a different syntax for using type classes without implicits being involved helped me understand what they really are.

Going back to my initial goal of figuring out what a type class is and how it works, I think I’ve figured out both. It is a generic type that adds functionality to that type but defers the implementation of that functionality. Then you specify something that expects the type class and brings the implementation of the functionality and the data together. In Scala this is accomplished using type bounds to specify the type class and implicit parameters to pass the type class into the implementation. I’m still not sure when I would want to write my own type class as opposed to using other polymorphic concepts, but I’m now confident about using existing type classes, and even breaking out some of the Cats-based ones as opposed to just the inbuilt ones.