Book Chat: How To Solve It

How To Solve It, by George Pólya, isn’t a programming book. It’s not exactly a math book either, but you will find yourself doing geometry while reading it. It isn’t a book on logic, but it is all about structured thought processes. I would describe it as a manual for teaching others a systematic approach to problem solving, using geometry and a series of examples. It tries to lay out all of the thoughts that whiz through your head when you see a problem and immediately understand how to solve it, without ever contemplating how you knew. It’s a fast read, assuming you know the geometry he uses in the examples.

The problem solving process is broken into four basic steps: understanding the problem, devising a plan, carrying out the plan, and looking back. At first this seems obvious, but that’s the thing about a structured approach: you need to cover everything exhaustively. For example, to understand the problem you identify the unknown, identify the data, identify what you want to accomplish, try to draw a picture, introduce suitable notation, and figure out how to determine success. If you want to decide whether to buy milk at the store, this sort of formal process is overkill, but if you are struggling with a more complex problem, like tracking down a memory leak or designing a cache invalidation strategy, it can be valuable to structure your thoughts this way.

I haven’t had a chance to apply it to a real problem yet. I did use some of the teaching suggestions – how to guide a pupil to solve their own problems – with one of the junior engineers I mentor, and it seemed productive. I got him to answer his own question; however, not enough time has passed to see whether it improves his problem solving abilities in the future.

Overall, the book was an interesting read and seems practically applicable to the real world.

Flow and Learning

Since I recently changed jobs, I’ve been learning a ton of new things. Moving from Windows/C#/Visual Studio/ASP.Net/SQL Server to OS X/Scala/IntelliJ/Play/MongoDB is a lot of change. Normally you’d have some sort of strong point to leverage, but for me it is all new. I’ve always had an introspective personality, and the introspective part of me looks at this and thinks I’m in over my head. The logical portion of my brain says the learning curve is expected and I’m working my way up it, but it is difficult to reconcile the two perspectives.

[Figure: the flow channel, showing how the balance of challenge and skill produces boredom on one side and frustration on the other]

I found the diagram above, which describes how challenge and skill interact and can cause boredom or frustration. The context of the diagram was how games try to stay in the flow channel, and the player’s response on either side of it. The more immediately relevant version was the one describing the experience of starting a new job.

[Figure: the same flow diagram, annotated with the experience of starting a new job]

This diagram matches my experience, especially since I took a bigger hit to my applied skill level due to the radical change in technical stack. That puts me further into the frustration section than in similar job changes before, and causes the sort of anxiety I had been grappling with.

I’m making sure to keep an eye on my stress levels, and everyone at work understands the challenges involved, since they went through most of the same ones at some point. I changed some of my reading habits: less dense material, more fiction. I changed other habits too; since I cut my commute down significantly, I’ve been trying to use that time wisely and get some additional exercise to deal with the stress in a positive fashion.

By rationalizing what is happening, I hope to assuage my own insecurities. The mental load of those insecurities can take attention away from learning and doing your best, making them a self-fulfilling prophecy. I hope this account of the feelings I’ve been encountering helps others recognize that these feelings aren’t abnormal, but also that you can’t let the negative ones control your mind.

Book Chat: Thinking, Fast and Slow

Thinking, Fast and Slow by Daniel Kahneman is a psychological profile of the two systems of human thought – a topic on which he is one of the pioneering researchers, and for which he shared a Nobel Prize. System 1 is the fast, intuitive system that allows you to make snap judgments. System 2 is the slow, methodical system that allows you to really dig into the data and do math. The book is mostly about the weaknesses of the two systems: how System 1 can be tricked, and how System 2 gets tired. The material is presented as a series of experiments, mostly paired with an anecdote about how the idea came up, and it tries to outline which problems engage which system and why. None of this material is directly applicable to computer programming; however, understanding human cognition is useful for understanding how you think about problems, and hopefully for recognizing when you should engage System 2 and when taking the quick answer from System 1 is acceptable.

One of the most interesting experiments to me was one that compared how patients rated their pain over time during a procedure with how they rated the experience as a whole afterwards. The goal was to find the relationship between the two. The peak level of pain suffered was an obvious component of the whole, and the experience towards the end of the procedure also contributed strongly, but the total duration of the procedure did not impact the overall rating. This implies that the mind doesn’t remember experiences based on their duration (this became known as the peak-end rule). Kahneman suggests that, therefore, if you were structuring an experience that involves discomfort, extending the session to make the last part nicer would improve the memory of the whole experience. I’m not sure if this would apply to pulling off bandaids or not.

There was another anecdote that caught my attention, in which Kahneman described his time in the Israeli Defense Force working out how to assign incoming recruits to the various branches. The previous method had been a free-form interview relying on the judgment of the interviewers, which had not been very successful. He designed a new system to try to remove the interviewers’ individual biases from the process. The system he put together had six categories, with a fixed set of factual questions for each category. Based on the answers to those questions, each category was assigned a rating between one and five, and the ratings were summed to produce an overall evaluation. This produced a significant increase in successful placements over the old system. He also suggests that this approach could be applied to hiring. I’m not certain that would be a good fit, since his system sorted recruits into different types of units rather than picking one person from many.
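
The mechanics of that scoring system are simple enough to sketch in code. This is my own illustration, not anything from the book: the category names and ratings below are invented, and only the fixed six categories, the one-to-five scale, and the final sum come from Kahneman’s description.

```scala
// A sketch of structured scoring: six fixed categories, each rated
// 1-5 from factual questions, summed into one evaluation.
object StructuredScoring extends App {
  final case class CategoryRating(category: String, rating: Int) {
    require(rating >= 1 && rating <= 5, "each category is rated 1 to 5")
  }

  def overallEvaluation(ratings: Seq[CategoryRating]): Int =
    ratings.map(_.rating).sum

  // Hypothetical categories and ratings for one recruit.
  val recruit = Seq(
    CategoryRating("punctuality", 4),
    CategoryRating("sociability", 3),
    CategoryRating("responsibility", 5),
    CategoryRating("diligence", 4),
    CategoryRating("independence", 2),
    CategoryRating("composure", 3)
  )

  println(s"overall evaluation: ${overallEvaluation(recruit)}") // 21 of 30
}
```

The design point is that no interviewer judgment enters after the per-category ratings are assigned; the combination step is purely mechanical, which is exactly what removes the individual biases.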

Overall, there are thirty-some of these little experiments, drawn together from Kahneman’s 40+ years of research, and they are quite entertaining. Despite the somewhat dense material, the book reads easily and is quite accessible without any prior exposure to these topics. If this sort of thing interests you, he also references other books that discuss some of the specific ideas in more detail.

Software Ownership vs Caretaking

Recently, I was trying to describe the relationship of my team to a particular piece of software within our company. I ended up settling on the term “caretaking.” We didn’t have ownership of this particular piece of software in the traditional sense, wherein we would be responsible for its content and its correctness. Yet we were in the terrible position of being the first people called whenever it ran into problems.

The software itself was in weird but decent shape: a lot of old, battle-tested code that could benefit from some modernization and better automated testing (it had a couple of high-level integration tests, but they were finicky and would only run in one particular QA environment). It was in that zombie state where we didn’t make any changes to it since it met the requirements, but nobody would look at it and say it was nice code. Nobody currently on the team had any particular experience with it; a former member of the team had worked on it before joining, and the code just sort of followed along with that person. Nobody ever really pushed back, since it didn’t run into problems. The code is even slated to be replaced, as a new team – we’ll call them “Triangle” – is centralizing all of the responsibilities in this application’s domain into a couple of different services. We were caretaking: nominally responsible for it, but not paying too much attention to it.

This was all fine, right up until a serious security bug was found in this code. Our security team was rightfully insisting that it be resolved ASAP. We wanted Triangle to just replace it now, since that was going to happen anyway and we were already booked on other time-critical work. Triangle wasn’t inclined to get pulled in and commit to something on a deadline, which is a completely fair reaction on their part.

We took a look at our options and realized we had three. The first was a short-term fix, which would prevent the security exploit but would likely cause other problems in the medium term. The second was to gut the code and rebuild it without the exploit; this was quickly rejected because it seemed wasteful to gut it now and then replace it again shortly after, especially given that Triangle was planning to rebuild it in a different tech stack. The third was a nuanced tweak to the existing behavior that looked like it would fix the security issue, though it was unclear whether it would have other negative side effects. We went with the third approach, since the first would likely come back and cost us more time in the long term.

At this point we also decided that we were going to actively take ownership of the code for the medium term. We didn’t gut the existing solution; instead, I applied some iterative refactoring to safely create the seams we needed to test the software to our satisfaction. The existing code was one big method, about 1500 lines. I started by extracting some methods from it, then converting those methods into a couple of classes. I worked this way so that the available tools handled most of the mechanical refactoring, letting me be confident in the correctness of the refactored code even with the limited integration tests available. That little bit of unit testing let us make the change we needed with confidence that we hadn’t broken any of the broader cases.
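
As a minimal sketch of what that buys you – with an invented domain and rule, not the actual code – the move is to pull a cohesive chunk out of the giant method so a unit test can reach it directly:

```scala
object OrderProcessing {
  final case class Order(subtotal: BigDecimal, itemCount: Int)

  // Before (sketch): a rule buried inside one huge method, reachable
  // only through the full, finicky integration flow.
  def processOrder(order: Order): BigDecimal = {
    // ...hundreds of lines of unrelated work elided...
    val discount =
      if (order.itemCount >= 10) order.subtotal * BigDecimal("0.10")
      else BigDecimal(0)
    // ...hundreds more lines elided...
    order.subtotal - discount
  }

  // After: the same rule extracted behind a seam, so a unit test can
  // call discountFor directly without any integration environment.
  def discountFor(order: Order): BigDecimal =
    if (order.itemCount >= 10) order.subtotal * BigDecimal("0.10")
    else BigDecimal(0)

  def processOrderRefactored(order: Order): BigDecimal =
    order.subtotal - discountFor(order)
}
```

Leaning on the IDE’s automated extract-method and extract-class refactorings, rather than retyping code by hand, is what keeps the behavior provably unchanged while the unit tests don’t yet exist.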

We ended up fixing the problem, and one little corner of the codebase along with it. Applying some TLC to this code was rewarding, since every previous look at it – always prompted by it doing something odd – had been a negative experience everyone wanted to avoid. Overall, I only spent about a day cleaning it up and making the basic workflow conceptually clean. The software got much better, and all it took was the will to make the changes. The effort was lower than expected, and it built a good bit of confidence for anything else we might do in this area. It was a rewarding little win.

The pride of ownership that comes from fully tackling a problem to make the system better, rather than continually avoiding it, has lots of benefits. You actually make things better, which is motivating and encourages you to tackle more problems. Make something a little better, then a little better, then a little better again, and all of a sudden you’ve changed the tenor of the system. In this instance, tackling the problem head-on proved easier than we thought. We are still caretaking a bunch of other code, a good bit of which is slated to be replaced as well, but if push comes to shove we’ll hopefully go ahead and take full ownership of that code too.

Hiring Randomness

I ran across this article on companies’ interest in interviewing different archetypes of programmers, e.g., the “academic programmer” or the “enterprise programmer.” I had two big takeaways. First, all of the companies were different, so no archetype appealed to everyone. Second, the archetype that received the most attention wasn’t one described in terms of technical abilities, but in terms of the applicant’s interest in product development. The typical software engineer only hears interview feedback about themselves in relation to an individual employer, rather than hearing how different companies consider the same set of people, so this was an interesting discussion to read.

The first insight, that different companies want different things, matches my intuition from years of moving around the industry. Each company looks for a different mix of skills and weighs technical versus soft skills differently. It’s really no different than the idea that different people look for different things in employers. A job posting describes the technical and soft skills the role wants, which filters the incoming candidates to people who think they match. So even on the inside, as an interviewer, you never see the candidates who self-selected out of your hiring pool based on the description of what you said you wanted.

The second insight was more interesting: the archetype described as caring about building product more than building good software was the one that got the most positive response. This might make sense given the audience viewing the archetypes in this particular example: small startups that need product. The archetype seems great, but why is there so much more interest in it than in any of the others? I can see why most companies would want it, but it isn’t clear why there is so much less interest in some of the other archetypes, whom I feel I’d rather work with.

There are some other interesting thoughts buried in the grid of results. The difference between the “child prodigy” and the “strong junior” archetypes is interesting: they represent the same sort of talent with a different story, so why should opinions of the two differ significantly? Who is the company that rejected every profile but the enterprise programmer? Why would you take the “experienced and rusty” archetype over the “technical programmer” archetype? Taken together, it seems like there is more at play in these decisions than just the background of the person being considered. It also seems that each company rejected about half of the candidates.

This led me to reconsider some of the steps I’ve used in hiring in the past. The shape of the hiring funnel reinforces some of the built-in problems with attracting talent: the top is always too wide, pulling in candidates you can’t work with, and the resume screen tosses the wheat out with the chaff. The article’s advice to programmers seems to be to spend more time on each application and personalize it. That’s good advice for the individual, but it doesn’t resolve the issue on the company side: everyone seems to be throwing away talent that some other company would be eager to have. That half is up to us when we are on the hiring side of the table, to keep an open mind about backgrounds and find the talent that is interested.

Software Performance Economics

There is lots of information out there about how software performance impacts your ability to sell, but I haven’t seen much about the cost of building higher-performing software. What does it cost to go back and add a caching tier after the fact? What drives that cost? I would think it is related to how complex the usage of the thing being cached is, but I’d love to see hard data. Quantifying the relationship seems like an interesting area for research. Today’s post is mostly about the performance side of the equation; next week I’ll look at the scalability side.

There is plenty of discussion in the community about the economics of software construction as it relates to the cost of building quality in versus dealing with the fallout from low quality later. But there is little discussion of the economics of building a high-performance solution as opposed to a low-performance one. I think this is because low performance is perceived as a design defect, while low quality is perceived as an implementation defect.

It seems like there are a couple of factors at play. There is some fixed cost to building a feature, and the straightforward implementation has some baseline level of performance, say an n^2 algorithm in the number of customers in the system. When you have ten customers, n^2 is fine; with ten thousand you might have a problem; at a million it is looking ugly. If you started with n^2, the question becomes what it costs to get from n^2 to n log n or better. Premature optimization has a bad reputation, since it is assumed (for good reason) that a better-performing solution would cost more, either in initial cost or in long-term maintenance.
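
As a toy illustration of the gap in question (my own example, not from any particular system), consider finding duplicate customer emails. The first version is the n^2 approach that is perfectly fine at ten customers; the second sorts first so duplicates become adjacent, bringing the work down to n log n:

```scala
object DuplicateEmails {
  // n^2: compare every email against every later one.
  def duplicatesQuadratic(emails: Seq[String]): Seq[String] =
    emails.zipWithIndex.collect {
      case (email, i) if emails.drop(i + 1).contains(email) => email
    }.distinct

  // n log n: sorting makes duplicates adjacent, so one linear pass finds them.
  def duplicatesSorted(emails: Seq[String]): Seq[String] =
    emails.sorted
      .sliding(2)
      .collect { case Seq(a, b) if a == b => a }
      .toSeq
      .distinct
}
```

Both versions are a few lines here because duplicate detection is a well-researched problem; the economic question is what the same jump costs when your problem isn’t.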

If it takes four developers a month to build something that works at an n^2 performance level, would it take another week to get it to n log n, or another month? And if you wanted to go directly to n log n or better from the start, how would that change the results?

Imagine writing a sort implementation. Writing either a bubble sort or a quicksort is easy enough, since both are well known enough that you can just look them up by name. There are so many available options that – outside of school – most people will never write one. On the other hand, for a more complex sorting situation, there may not be so many published options at various levels of optimization. Maybe you need a sort that can take prior sorting actions into account; you’d end up with a selection sort that runs in n^2 time. I had to use one of these previously, and in the specific case I was working on the runtime was fine, since n was capped at 100, but in other circumstances it would be a severe bottleneck. If n were expected to be truly large, what could I have done? There are concurrent sorting options out there that can apply more resources to the problem, and if n were truly ridiculous there are ways to sort data that doesn’t fit into memory. But this example is still in a well-defined and well-researched problem domain.

The original solution to my specific case was a hideous optimization that approximated the correct answer. The approximation eventually caused trouble when an item that should have been included in the result wasn’t. Once we described the problem in terms of a sort instead, it afforded us the opportunity to rebuild the solution. We rebuilt it in significantly less time than the original took (two weeks versus four), since we ended up with a simpler algorithm doing the same thing. Most of the time during the rewrite was spent on research, figuring out how to frame the problem; during the initial implementation it had mostly been spent on testing and bug fixing. This is the only time I’ve been involved in this sort of rewrite at the application level, which I suspect makes it a significant outlier.

This post is unfortunately more questions than answers; I don’t have the data to draw good conclusions. But it is clear that building the most efficient implementation regardless of the scale of the business need isn’t the right choice. Understanding the situation is key to deciding what level of performance you need to hit. Not everything is as straightforward as sorting algorithms, but most interesting problems have some body of researched algorithms available.

Postscript for anyone interested in the specific problem I was working on: we were trying to figure out what order events should be done in, but some of them couldn’t start until after a specific time, rather than depending on some other set of events. We needed to pick the first thing to occur, then the second, and so on. Most sorts don’t work that way, but that’s how we ended up with a selection sort: it always selects the first element, then the second, so at each step we could compute the time at which the next item would start.
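
Here’s a small sketch of that scheduling idea, reconstructed from the description above rather than from the real code, with invented types and field names. Each pass selects whichever remaining event can start earliest given the current clock; since the clock only advances as events are chosen, a sort that reorders elements freely doesn’t fit:

```scala
object EventScheduler {
  // notBefore: the earliest time this event is allowed to start.
  final case class Event(name: String, duration: Long, notBefore: Long)

  // Selection-sort-style scheduling: O(n^2), as the post describes.
  def schedule(events: List[Event]): List[(Event, Long)] = {
    @annotation.tailrec
    def loop(remaining: List[Event], clock: Long,
             done: List[(Event, Long)]): List[(Event, Long)] =
      remaining match {
        case Nil => done.reverse
        case _ =>
          // Selection step: the event that can start soonest right now.
          val next = remaining.minBy(e => math.max(clock, e.notBefore))
          val start = math.max(clock, next.notBefore)
          loop(remaining.diff(List(next)), start + next.duration,
               (next, start) :: done)
      }
    loop(events, 0L, Nil)
  }
}
```

With n capped at 100, the quadratic selection loop is harmless; the question in the post is what you would reach for if it weren’t.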

The Ability to Change Things

I touched on this previously in Management Shouldn’t Be The Only Option, but I saw this post and it has me thinking about the topic again. The Codist regrets not having gone into management, for what look like two core reasons. The first is money: pretty straightforward, since managers appear to make more of it and have more career advancement options to get even more (and money has lots of advantages). The second reason – the ability to change things – is far more interesting to me.

“I could go on and on but the key is that you can’t make changes in how people do things in a technical sense unless you have the ability, the authority and the opportunity. Once you make that call and assuming you find the right places to grow, the sky is really the limit.” (emphasis mine) Why doesn’t a top-ranking developer have the ability, the authority, and the opportunity to change how things are done?

Complacency

My first car was a decent car when I got it. I had it for years, and I didn’t take very good care of it. Part of that was because I didn’t have a ton of money and wasn’t inclined to spend it on the car, but a lot of it was because I didn’t know much about taking care of one. I changed the oil, but I neglected a bunch of other work. This all came to a head when the air conditioner broke.

I could put a bunch of money into the car and fix the air conditioner, or I could live without it. It was early fall, so I figured I could wait until the next year to deal with it. The next year rolled around, and as spring warmed up I started figuring out ways to deal with the lack of AC. I’d park the car where it would be shady when I was leaving work. I’d crack the windows when I parked. I’d cruise around with the windows down. Summer came, and it wasn’t great, but I adapted. There were some unpleasant days stuck in highway traffic, but most of the time it was livable.

The second and third years went similarly. I had just gotten used to it. My wife wasn’t enthusiastic about going places in my car without AC in the summer, but we’d take hers and everything was fine. My friends didn’t know about it until the second year and they all thought I was crazy when they found out.

This relates to software development in that we all face points where we have to choose between continuing to do what we’re doing and switching to something new. You may have gotten used to your screwy build process or awkward internal library, but everyone who sees it from the outside thinks you’re crazy, just like my friends thought I was with the car. Sometimes leaving the broken AC alone is the right call, but often you need to fix it to truly improve your life.

I never did fix the AC in that car, and it had obvious costs at times: I showed up to a job interview all sweaty, and I ruined some groceries. But overall, the up-front savings may have outweighed the costs in my situation.

Assuming you understand all of the costs you are paying, you can try to make a reasonable decision. That’s easier said than done. Often these costs manifest as infrastructure failings and are hard to see, since they show up as minor friction in day-to-day activities rather than as obvious standalone issues. If you are used to setting up servers by hand, not having something like Chef doesn’t appear to be a problem. To find these kinds of costs, you sometimes need to talk to outsiders.

This can be interpreted as the blind spot from the Johari window: the thing that is obvious to most people but that you can’t see yourself. It can also be described as boiling a frog. If you put a frog in a pot of boiling water, it will jump out; if you put it in cool water and heat it slowly, it will stay and be boiled. You can grow into a situation you would never have tolerated from the start. Try to take this understanding and reevaluate your situation from an outsider’s perspective. It can help you see what you actually have and fix the underlying issues.

Conway’s Law Cause and Effect

Conway’s Law is the idea that software comes to represent the organization that created it. It is usually described with the organization shaping the design, as in “four developers build a compiler and you get a four-pass compiler.” But it could plausibly go the other way too: you have a three-tier architecture, and as you scale up, you end up with a team for each tier. Is either causal direction more true than the other?

The initial paper describing the concept seems to argue for the former, but the examples I’ve seen have been more the latter: the team grew after the software started, and the additional teams brought in to scale out development were each given responsibility for a chunk of the system. I went looking for data on this and couldn’t find any, so I’ll propose a thought experiment instead. You’ve got an existing system with three teams working on four parts. If you created a fourth team without giving them any particular part, how do you think it would go? I see frustration until responsibilities shift to make space for the new team. What about the same three teams and four parts, but now a fifth part is created and given to an existing team? That situation seems like it would work fine.

“Your org structure isn’t solving your problem. It’s an artifact of how it was solved before” – Adam Jacob (I’m assuming the one from Chef). This supports the idea that the system grew the organization around it, not that the system reflects a preexisting organizational structure. And if the system influences the organization to this degree, then as we architect software we wield outsized influence over the organization that will form around it.

This influence can’t fix every organizational structural problem, but it can be used to help guide the organization’s growth. I recognize there’s a bit of a cart-before-the-horse problem here: if your organization sets staffing budgets by team, you’re probably stuck with those and can’t easily reallocate staffing to match the structure you’d like the software to reflect. You can, however, structure responsibilities so that each team owns several smaller things rather than one big thing. That lets you grow a team, then split it, dividing its responsibilities between the resulting teams.

Checklist Manifesto

I recently read The Checklist Manifesto by Atul Gawande. The centerpiece of it was the case study of a checklist to prevent post-surgical issues. It was just some simple questions to be asked (1) before anesthesia, (2) before the first incision, and (3) before the patient leaves the operating room. Really simple things like what procedure is to be performed and the name of the patient.

You would think that since surgeons are highly trained specialists, and everyone is highly motivated to save patients, this simple list wouldn’t help much. But it prevented a third of post-surgical complications. Everyone knew these things should be done, but they were being overlooked or abbreviated. Putting the steps in writing also empowered the nurses, who had previously felt too intimidated to speak up when they saw something go wrong; now they had a standard to hold the surgeons to.

By now you’re thinking, “that’s interesting and all, but what does it have to do with software development?” Well, it made me think of the agile team agreement. A team agreement is a written statement of how the team will do the work ahead of it and what is expected of team members: in other words, a checklist. A user story must meet certain criteria before being taken on, and other criteria before being considered done; a definition of done might require, for example, that the code was reviewed, unit tests were added, and the change was deployed to a QA environment. Agreements keep you from skipping the steps that build in quality, the ones you know you should do but sometimes skip for simplicity or expediency. They also make it okay for the rest of the team to call out someone who isn’t meeting the standards you all agreed on at the outset. The places I’ve worked that used team agreements felt like they worked better than those that didn’t.
That’s a gut feeling; unfortunately, we don’t have hard quantitative measures of software quality to match the medical example. Qualitatively, I felt like we accomplished more with less and built better software. We definitely had a better time doing it, which was just as important.