jml's notebook

Thoughts from Jonathan M. Lange

2025-03-11

My weekly review

I’ve been using some version of Getting Things Done since around 2004. For better or worse, it’s a supporting pillar of my life.

One major part of this is the Weekly Review. The idea is that reality and systems drift apart during the busyness of the week, and it is essential to set aside dedicated time to bring them back in sync so that the systems can be trusted.

The broad outline is:

Collect all the disparate inputs in your life into one place
Go through them one at a time and put them into the system
Go through the system one thing at a time and make sure everything is still relevant and accurate
Reflect

I’m being deliberately vague about “the system”, because I don’t want to rehash the entirety of GTD, because I don’t want to be prescriptive, and because I think the weekly review generalises over a variety of systems. For me, the system is made up of a TODO list powered by OmniFocus, a calendar, and a text file in org format.

As of today, I have a personal weekly review and a completely separate one I do for work. This post is about my personal weekly review. Here’s its checklist:

Stay organized (weekly review)

Pray for wisdom & for God’s will to be done
Get everything into OmniFocus
- Review Apple Notes and extract to OmniFocus or org-roam
- Empty inbox portable folder
- Braindump active thoughts
- Empty in-tray at home
- Empty kitchen in-tray
- Clear Desktop
- Clear Downloads
- Close all the tabs
- Empty personal email inbox
- Empty starred messages from WhatsApp
- Review past calendar week for actions
- Review upcoming four calendar weeks for actions
- Review nanny’s weekly plans spreadsheet and put things on calendar if necessary
- Put kids’ calendar stuff on nanny’s weekly plan spreadsheet
- Review emails in @WAITING
- Make tasks for each email in @REPLY
Do OmniFocus review
Empty OmniFocus inbox
Review “Waiting For” lists
Write a snippet for previous week
Pray, thanking God & committing week & plans to him

As you can see, it’s a lot. It normally takes me about 60-90 minutes to get through it, but I hardly ever do it in one sitting.

The bulk of it is going through all of the sources of “stuff” that exist in my life. This means open browser tabs, scraps of paper, files on my desktop, and that sort of thing.

Over time, I’ve had to add items for Apple Notes (where I sometimes write things down as it’s the only thing to hand), and WhatsApp (because people send messages that require action when I’m not at all interested in doing them), and our nanny’s spreadsheet (because we couldn’t get calendar sharing to work for her).

I generally find the calendar review the most draining part, because there’s a lot there, and it’s easy to get distracted. It’s super useful though, as it helps me make sure I’ve booked leave, filled in appointments, etc.

For email, “emptying the inbox” means archiving mail I don’t care about, and adding items to OmniFocus for things that I need to follow up on, as well as tagging them. It generally does not mean actually doing the work, unless it is quick. Often I use my weekly review time to write those quick emails that I don’t want to write because they involve awkward human interaction. I’m gradually getting better at this.

Once everything is in the OmniFocus inbox, I go through each item one at a time and make sure it’s phrased as a clearly executable action, and that it’s in a project and tagged, so I can find it later. OmniFocus has great affordances for this, and gives you a subtle but satisfying nod when its inbox is emptied.

“Do OmniFocus Review” means using OmniFocus’s feature for reviewing each project one at a time. In GTD parlance, a project is a completable thing with multiple actions, like “Remortgage the house”. In OmniFocus parlance, a project can be that, but it can also be any group of actions, so my review goes through each completable project and also each ongoing responsibility (like “Run the household”, “Be a good friend”, or “Take care of myself physically”). Different projects have different review cadences. When I review a project, I first decide whether I even still want to do it, and then I look through the actions and check whether they are up-to-date, or whether anything is missing. If the project has no clear next actions, then it is considered stalled, and I try to come up with something to move it forward. If there are actions that don’t seem relevant any more, or that I know I’ll never bring myself to do, I remove them or mark them as “Someday/Maybe” (Vale, “learn Latin”, vale).

Then I review my “Waiting For” lists. This feels like a bit of a superpower, because it means I can follow up on what people have promised me, or that I can notice that someone has fulfilled a promise which then triggers the next step in whatever it is I am setting out to do. Often, I send a quick message or email there & then, which the recipient immediately acts on. I find this bit particularly satisfying.

Where I end up

Once the administrative sections are done, I’m in a state where I can trust that my calendar has all of my appointments and none of my non-appointments, that I know what I need to do ahead of any deadlines or appointments, that there’s nothing lurking in a conversation or email or piece of paper that’s going to bite me later, and that my TODO list has all the things I need to do on it. That is, I’ve brought the Galahad principle to bear on my organization system, so I can relax and trust it.

The snippet

The part of my review that I enjoy the most is writing my “weekly snippet”. This is a practice I picked up at Google and have found very useful in all sorts of contexts since then.

For this, I look at a list of completed tasks and projects in OmniFocus that ignores any repeated tasks like “Prepare school lunch for kids”. I call this my “Accomplished” perspective, since it gives me an indication of things I’ve actually accomplished. I jot these tasks down in an “Achievements” subheading in an org-mode file in Emacs which has a heading for each week, putting the tasks into a more human context. Here’s a sample:

Made sweet and sour chicken, and the kids liked it!
Got knives sharpened
Paid parking fine

I don’t do this every week, because sometimes I don’t want to and because I’m gradually learning that adhering to rigid, self-imposed rules isn’t doing me any favours in either outcomes or character. However, since the TODO list can be so overwhelming, and since I often feel unequal to the challenges of ordinary life, having a list of what I have actually done can be a comfort—and one based in reality!

In a professional context, listing out your achievements or accomplishments each week is super valuable for tracking how well you are doing your job, whether you are doing the things that you planned to do, and for making it easier to write up performance reviews and CVs. I highly recommend it.

My snippet also tends to talk about my feelings, books I’ve read or shows I’ve watched, people I’ve seen, relationship moods, vague thoughts about what I ought to be doing and why. This often stimulates creativity, further plans or ideas on what to do, repentance, or thankfulness, of which see more in the next section.

I used to have subheadings in my snippet: Achievements, Vs Plans, Plans, and Surprises. Nowadays I don’t bother, and just go with freeform text, with bullet points as I see fit.

The spiritual stuff

I don’t want to preach at you, but I also don’t want to hide that this is framed in the context of my faith in Christ, which is deeply important to me. Feel free to skip this (or any!) section.

I try to ground this exercise in the principles of the Lord’s Prayer, reminding myself of what really matters, and of God’s provision and sovereignty.

Having finished the review and especially the snippet, I often have a bit more perspective on what happened in the last week and what I intend to do in the week ahead. This gives me a natural opportunity to thank God, repent of sins, and ask his help, as thinking about the week as a whole often surfaces things that are less clear in the day-to-day.

I don’t want to paint a picture of my own saintliness. Some weeks this ends up being cursory or perfunctory, and some weeks I actively avoid God, like Adam in the garden. But this is the direction I want to travel in.

I don’t want to be normative about anything in this piece, but if you do a weekly review, I’d recommend starting by reflecting on what really matters, or at the least taking a few minutes away from the whirlwind of usual activity to get some perspective. If you have or want to have a gratitude practice, I bet you will find more and richer things to be thankful for if you do so in the context of a week.

Areas for improvement

The whole thing takes too long given how busy I am in my life, but if I don’t do it, I really do feel less productive and less confident in the following week. It also feels hard to justify when things are busy around the house. I only get a small amount of discretionary time in front of a computer on the weekends, and after an hour at it, it feels almost embarrassing to say that all I did was get my TODO list a bit more organized.

I currently use OmniFocus itself to remind me of each of the steps. If I don’t do the review by the end of the week (during a window that goes from Friday afternoon to Sunday evening), then the tasks of the review itself become overdue, get marked red, and show up as a numbered badge on my OmniFocus app. I don’t think this is good, because then I lose signal on actually time-sensitive tasks, and spend much of the week feeling berated by my own TODO list.

When reviewing the weeks ahead in the calendar, I often get distracted or go down rabbit holes. Sometimes they are useful, but I’d rather my weekly review was a breadth-first traversal.

Our kitchen in-tray is a mess, and that’s partly because I don’t really know how to bring these principles to bear in a shared environment.

Doing all of the administrative parts of this review leaves my mind feeling fragmented, as if I have done many things but cannot say what. Writing the snippet often helps me recover from this by bringing a sense of narrative and accomplishment, but I wish I could avoid the fragmentation in the first place. Or rather, the feeling makes me think that there’s something fundamental I could be doing differently.

In the past, I’ve had one review for both personal and professional life. I prefer this, as it’s all just me, and this is a thing I’m doing for me that benefits all those to whom I have a responsibility. However, good infosec means not storing an employer’s data on personal devices, and not giving employers access to details about my personal life. Yet another example of how a lack of trust imposes costs, even when that lack of trust is prudent and practical.

So what?

I wrote this because someone asked me to and I thought it would be fun. I don’t have an agenda to persuade anyone to do anything like this.

However, as a matter of general data consistency, if you have any sort of system you use to track your life, and your normal way of updating it is event-driven, and you want your system to be an accurate reflection of reality, then you need to have a regular poll of reality to make up for events that got dropped or for changes that didn’t generate events in the first place.

Put another way, things happen, and you aren’t always able to note them down in the moment, perhaps because you didn’t even know about them, so you need to check up.
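In software terms, this is event-driven updates paired with periodic reconciliation. Here is a minimal Python sketch of that idea under some loud assumptions: the `TodoSystem` class, its methods, and the sample items are all hypothetical, and “reality” is modelled as a set of open commitments you can enumerate, which is roughly what walking through every in-tray, inbox, and tab during a weekly review amounts to.

```python
from dataclasses import dataclass, field


@dataclass
class TodoSystem:
    """Hypothetical record of open commitments, updated mostly by events."""

    items: set[str] = field(default_factory=set)

    def on_event(self, item: str) -> None:
        # Event-driven path: capture a commitment the moment we notice it.
        self.items.add(item)

    def reconcile(self, reality: set[str]) -> tuple[set[str], set[str]]:
        """Periodic poll: compare against reality and repair any drift.

        Returns (added, removed) so the review can see what had slipped.
        """
        added = reality - self.items    # commitments whose event was dropped
        removed = self.items - reality  # changes that never generated an event
        self.items = set(reality)
        return added, removed


system = TodoSystem()
system.on_event("book dentist")
system.on_event("renew passport")

# During the week, a WhatsApp request never gets captured, and the passport
# gets renewed without anyone ticking it off.
reality = {"book dentist", "reply to WhatsApp request"}
added, removed = system.reconcile(reality)
print(added)    # {'reply to WhatsApp request'}
print(removed)  # {'renew passport'}
```

`reconcile` surfaces both kinds of drift described above: commitments whose capture event was dropped, and things that changed without generating an event at all.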

If I were going to advocate for any practice here, it would be the weekly snippet, because it’s an easy source of insight, and because you can basically do it in any way you want.

2023-08-27

Deciding who goes to church

I thought it would be interesting to write up how we decided who goes to church today. Even though it is such an ordinary, low-stakes event, I think it sheds light on how decisions and groups work.

I woke up this morning after a rough night of sleep. I currently have a condition that makes it difficult for me to sleep through the night. During my many wakeful periods, I had noticed that one of my children, call them K1, was coughing all the time.

I decided that they should not go to church. Even if they were feeling well enough, it would be irresponsible to risk spreading the illness further.

My partner, call her B, was also not feeling great yesterday. I guessed that she'd be the one to stay home with K1. I wanted to go to church and was feeling well enough, so the only remaining question in my mind was whether our second child, K2, should come with me or stay home. I thought he should come with me: although I wanted some time alone, B & K1 would have a more restful time if K2 were out. Also, church is good for K2, and I'm more than capable of taking him there.

I formed this plan in my head, but of course that's not how group decisions work. A group decision only works if everyone has the same understanding of what to do, and everyone knows that everyone has the same understanding of what to do. Sometimes this can be "Do what the boss says", but that's not how things work in our house. Instead, we have a conversation:

Me: I don’t think K1 should go to church today

B: Yeah, you’re right. That means one of us will need to stay home, though.

I know this. She knows I know this. But now we both know that we both know.

Me: Okay. How about you stay here with K1 and I’ll take K2 to church.

B: Are you sure?

Me: Yeah. I’d really like to go, and you and K1 will rest better without K2 here.

B: Okay

And so everything worked out okay. No worries, no drama, just boring functional communication.

But on the way to church, I thought that this isn't rational decision making. Or perhaps better to say it's not a systematic approach to decision making. We didn't consider all the options or trade them off against each other. What are all the options? Well, we could represent this as a truth table where I'm A, B is my partner, and the kids are K1 and K2. Staying home is F and going to church is T.

A	B	K1	K2
T	T	T	T
T	T	T	F
T	T	F	T
T	T	F	F
T	F	T	T
T	F	T	F
T	F	F	T
T	F	F	F
F	T	T	T
F	T	T	F
F	T	F	T
F	T	F	F
F	F	T	T
F	F	T	F
F	F	F	T
F	F	F	F

That's 16 options. If you were forced to map my mental approach to the truth table, it would be something like:

Set K1 = F, because K1 has to stay at home
Set B = F, because I think she should stay at home
Set A = T, because I want to go
Look at what’s left and make an actual decision.

Internally, it felt more like a branching tree than a truth table.

But we can lean on this truth table thing a bit longer.

You see, we can't leave the kids at home unsupervised, and we can't send them to church unsupervised either, and we don't want to send sick people to church. So there are three things we want to avoid:

home_alone     = not (K1 and K2) and (A and B)
church_alone   = (K1 or K2) and not (A or B)
sick_at_church = B or K1

Putting these onto the truth table looks like:

A	B	K1	K2	not home alone	not church alone	not sick at church	OK?
T	T	T	T	T	T	F	F
T	T	T	F	F	T	F	F
T	T	F	T	F	T	F	F
T	T	F	F	F	T	F	F
T	F	T	T	T	T	F	F
T	F	T	F	T	T	F	F
T	F	F	T	T	T	T	T
T	F	F	F	T	T	T	T
F	T	T	T	T	T	F	F
F	T	T	F	T	T	F	F
F	T	F	T	T	T	F	F
F	T	F	F	T	T	F	F
F	F	T	T	T	F	F	F
F	F	T	F	T	F	F	F
F	F	F	T	T	F	T	F
F	F	F	F	T	T	T	T

Honestly, just looking at this is exhausting. Extracting the gunk, we get three valid choices:

we all stay home
I go to church by myself
I go to church, taking K2 with me
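For anyone who would rather not fill in sixteen rows by hand, the table can be generated mechanically. This Python sketch reuses the three constraints written out above; the function names mirror that pseudocode, and the encoding is the same as in the table (True means going to church, False means staying home).

```python
from itertools import product

PEOPLE = ("A", "B", "K1", "K2")


def home_alone(A, B, K1, K2):
    # At least one child at home while both adults are at church.
    return not (K1 and K2) and (A and B)


def church_alone(A, B, K1, K2):
    # At least one child at church with neither adult there.
    return (K1 or K2) and not (A or B)


def sick_at_church(A, B, K1, K2):
    # B and K1 are the two people who are unwell.
    return B or K1


def valid_options():
    """Yield every assignment that avoids all three bad outcomes."""
    for row in product((True, False), repeat=len(PEOPLE)):
        if not (home_alone(*row) or church_alone(*row) or sick_at_church(*row)):
            yield dict(zip(PEOPLE, row))


for option in valid_options():
    print(option)
```

Because `itertools.product` varies the rightmost position fastest, rows are checked in the same order as the table, and the three assignments that survive correspond to the three choices listed above.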

This is not how I normally make decisions.

So what’s the point of all this?

One version of this story is that I had a minor problem, did the first thing that came into my head, and it worked out okay.

Another version of this story is that even making simple decisions among people who are strongly aligned in terms of values (church good; spreading disease bad; rest while sick good; unsupervised kids bad) involves establishing consensus and at minimum acknowledging choices not taken.

Or we could look at this and say that even for easy decisions, it takes quite a bit of effort to break them down into their component subdecisions and to articulate the relations between them. Perhaps this is a thing to do sparingly, or perhaps it is a trainable skill to be drilled to the point of mastery.

I don't think I've ever seen someone explicitly present a decision table in my professional career, although some design docs have had tables that come close. I have certainly been involved in woolly discussions that go around and around without resolution. Perhaps these would have gone better if someone had constructed a decision table.

Anyway, K2 & I had a great time at church, and K1 & B had a lovely quiet morning. That's probably enough.

See also Cost of decisions and Decisions.

P.S. By the time I got to writing this, I was being tickled by memories of reading some tech luminary advocating the use of "decision tables" in documenting designs or system architecture or something like that. Thanks to Justin Blank, who both first recommended Hillel Wayne's post Decision Table Patterns to me and helped me remember that he had done so. No promises, but I might follow up by applying that post to this decision, as an exercise.

2023-06-10

Ensuring quality enables throughput

Often in software development, caring about the quality of the things we are building is seen as an extra cost that must be justified by its potential benefits, such as improved user experience or lower maintenance.

From the engineering perspective, quality is frequently seen as an essential moral good. Sometimes we forget that the quality of the product and the quality of the code are two different things, although one influences the other.

Both of these are good, but I think they miss something important. Insisting on building high quality software is an effective means of /attention management/.

If a component or feature is low quality, then it is a source of distractions. Perhaps it causes bugs that need to be fixed. Perhaps its poor interface makes other work needlessly difficult. You can think of these bugs or difficulties as interruptions, like a child coming into the room to ask you to get something down from a shelf.

Whether or not you actually do anything about these interruptions is irrelevant, because deciding /not/ to do something still takes time and energy. In a corporate context, it can take quite a lot of time and energy, as people will keep wanting to talk about it.

Another way of saying this is that low quality software is unfinished. It still makes demands on your time and attention.

As you keep building buggy features, dodgy components and awkward interfaces, you add more and more sources of distractions. Your environment drowns in noise.

However, if you actually finish your work, if you produce high quality features and components, then you are free to move on to the next thing. Then, instead of you doing work for those components, they do work for you. Because they are trustworthy, you can build on them.

Thus, building quality in actually enables speed of execution. It increases cost, but it also disproportionately raises the ceiling on the adjacent possible.

2022-11-10

Waiting for Aragorn

I’ve gone on this rant to a couple of people now, so I figure I should write it up. This is more a set of feelings than an idea.

My general sympathies are roundhead, republican, and low church. I’m not particularly into the monarchy, aristocracy, or a “great men” theory of history.

However, I was exposed to an extreme amount of Tolkien at an impressionable age, and that has left an indelible mark.

One of the themes in The Lord of the Rings is the glory of a good king. People (“Men” in the book) can only be their best selves when they are serving under a king who serves them and is worthy of them. Then they, in their service, strive to be worthy of their king.

I do not think this is how a modern state should be run.

And yet, in the much lower stakes world of business and building software, I do find myself and others yearning for good leaders.

Over the years I’ve worked with blind leaders, power-hungry leaders, spiteful leaders, well-meaning but ineffectual leaders, absentee leaders, and ill-starred leaders. Each time, I’ve seen people who are pouring their time, energies, and creativity into their work end up frustrated and disappointed. It’s not quite “lions led by donkeys”, but it’s close.

I know the world is complicated and full of compromise, and I know that no perfect leader exists, and I very much believe that people often need to step up, self-organize and take responsibility.

All the same, I see around me a longing for a worthy leader. Sometimes I feel it too.

2022-07-31

Got a new laptop

Hello again. Posting here both to break the drought and to check that all of this works from my fancy new laptop. I got myself a MacBook Air with an M2 chip, and I quite like it. It feels like a significant improvement over my old MacBook Pro, which is my most regretted purchase. It’s also the first time I’ve bought an Air instead of a Pro.

There are three factors that tipped me over the line.

The first is the price tag. A top-of-the-line Pro costs almost twice what an Air costs. That is a tonne of money that I would rather spend on other things, not least my own manumission from wage slavery.

The second is that I don’t really get to use my personal laptop all that much. My weekday evenings are largely focused on getting the kids fed, bathed, and safely in bed and the house in some sort of reasonable state for the next day of chaos. Between church, family outings, and childminding duties, I also don’t get a tonne of time on the weekends.

For a while, this lack of time combined with the sheer difficulty of firing up a laptop these days (so many updates, so little battery life) meant that I just used my work computer for the occasional blog post. Perhaps I just don’t even need a laptop of my own!

However, work is introducing MDM, and while I trust our IT manager, I don’t want to do any personal stuff on a computer that has corporate spyware on it.

Anyway, all of this means that while I am getting a computer, it doesn’t have to be a workhorse or a beast, because I’m just not going to do that much with it.

The third factor is that I think cloud-based development environments are becoming much more practical and attractive. The article The End of Localhost more or less convinced me of this. As long as I can use Emacs running locally, I’m sure I can find a way to have fun doing remote dev.

Anyway, the new Air is lovely. It’s very snappy, the keyboard is great, and I love the colour. I wish it had more USB-C ports, but that’s about it.

2022-05-28

Play is how mammals learn

The best way to learn is to play.

In order to play, you need to feel safe. You need to have time. The consequences of failure must be real but not lasting. The consequences of success, likewise, must be real but short lived.

Play can happen by yourself or with others. If playing with others, you need to trust them, because you need to feel safe in order to play.

To play well, we must throw ourselves into the scenario or game. That requires focus, attention, and dedicated time.

There must be a sense of abundance, or at least the freedom to “misuse” valuable resources. Time might be short, or resources tight, but you need to grant yourself a cheeky sort of permission, a license to be naughty, in order to actually play.

2022-05-20

Too much steering, not enough pedalling

For a while now, I’ve been trying to think of a replacement for the phrase that begins “too many chiefs”, as that phrase is way too racially loaded to be useful in any conversation.

When casting around, a lot of people suggested “too many cooks spoil the broth” or its variants. This doesn’t quite get to the heart of the matter.

The original phrase exists as a short-hand to describe a dysfunction where there are too many people directing work and not enough people doing work. “Too many cooks” is more about how some projects are harmed by having too many people work on them. If you try hard, you can bend the phrase to be about conflicting directions—maybe one cook thinks it needs to be sweeter and the other more savoury—but that’s not what you want in an aphorism.

The best replacement I’ve managed to come up with is “too much steering, not enough pedalling”. Anyone who has ridden a bicycle knows that if you’re on a bike and you don’t pedal, you stop. If you’re not pedalling fast enough and you steer too much, you fall over. Or, you don’t fall over, and instead inefficiently zigzag your way to your destination.

I imagine a cartoon with four or five people in suits sitting on the handlebars, jostling for grip and arguing about which way to go, with a lone cyclist on the seat, pedalling away as fast as they can, exhausted and dripping with sweat. The caption reads “Why aren’t we getting anywhere?”

The answer, of course, is too much steering, not enough pedalling.

2021-12-09

Cost of cruft

Our main product at work has grown quite a few features over the years, and from time to time we the engineers point to an unmaintained feature and ask, “can we kill it please?”

The answer to the question is almost always equivocal, partly because it’s hard for us to quantify or even communicate the cost of maintaining a single feature.

Here, I just want to rattle off a few of the costs in my head:

explaining it to new team members
if it uses a particular technology that’s not widely used elsewhere, maintaining that technology and training people in that technology
it interacts with other features, making the product complexity roughly quadratic in the number of features
explaining why you aren’t going to fix its bugs
responding to production incidents involving it
apologising to customer support for temporarily disabling it in order to prevent more widespread damage
keeping the versions of libraries it uses up to date to prevent security issues
compile time / test run time cost, paid with every single change to the app
user confusion with a feature being present on one platform but not others
keeping the visuals in line with the latest branding style guidelines
localising it when expanding to new languages
remembering to delete & report on the user data in the feature for GDPR compliance
increased difficulty of making app-wide architectural changes
updating the feature when the underlying platform deprecates an API you were relying on
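One rough way to read the “roughly quadratic” item above, assuming that any two features might interact and that pairwise interactions dominate, is to count the pairs of features for a product with $n$ of them:

```latex
\text{pairs}(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\text{pairs}(10) = 45, \quad \text{pairs}(20) = 190
```

Doubling from 10 to 20 features roughly quadruples the number of pairs that could, in principle, need thinking about.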

A lot of these problems can be mitigated by having enough money, and spending that money on awesome platform and internal tools teams.

Most of these costs are small enough at the time of incurring the cost that one can never make a rational argument for deleting the feature, especially because deleting is never free, but always involves planning, comms, public relations, data export, etc.

I think it was Alex Gaynor who first told me that “subscription” is the only valid business model for software, because there’s an ongoing cost of keeping it running (apologies if I’m misrepresenting you, Alex). I kind of wish that ongoing cost were easier to quantify for individual features. I feel like we’re okay at estimating how much effort it will take to build something, but not how much it will take to maintain it. It would be great to have planning conversations where we could say something like, “it will take five engineers three months to build this, and then another 0.5 FTE for ongoing costs”.
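The hypothetical estimate in that last sentence can be turned into a back-of-the-envelope model. This Python sketch assumes cost is measured in engineer-months and that the ongoing 0.5 FTE stays constant over the feature’s life, which is a simplification; `lifetime_cost` is an illustrative name, not an established formula.

```python
def lifetime_cost(build_engineer_months: float,
                  ongoing_fte: float,
                  months_in_service: float) -> float:
    """Engineer-months spent on a feature: a one-off build plus ongoing upkeep."""
    return build_engineer_months + ongoing_fte * months_in_service


# The hypothetical from the text: five engineers for three months, then 0.5 FTE.
BUILD = 5 * 3  # 15 engineer-months

for years in (1, 3, 5):
    total = lifetime_cost(BUILD, 0.5, 12 * years)
    print(f"{years} year(s) in service: {total:g} engineer-months")
```

On those assumptions, ongoing maintenance matches the original 15 engineer-months of build effort after 30 months in service.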

2021-09-09

Turn the Ship Around / Drive mashup

I think a lot about Turn the Ship Around! by L. David Marquet. Out of all the books I’ve read on leadership or management, it’s the one that resonates the most strongly with me.

When I was a lot younger, I figured that as a manager, you wanted to make sure you had a bunch of smart people around you and then get out of their way. Any time you told them what to do, you were making a mistake, because you probably know less than them about their job, and because the act of telling someone what to do short circuits the bit of their brain that would actually think about the problem. By giving an instruction, you’ve reduced the net intelligence of your team.

I tried this a few times and it mostly ended in disaster.

Groups of people don’t naturally coordinate. It takes work to get a bunch of individual effort to cohere and add up, rather than cancel out.

Also, people sometimes do things badly, or do the wrong thing, or take too long to do something, or spend way too much time on something unimportant, or completely overlook important details, or don’t know what to do, or…

I responded to this failure of approach by changing my approach! I tried to become more directive, and give more instructions. I worked on giving timely constructive feedback (outside of the context of code reviews, which I’ve been doing for a long time).

This was definitely better, but it still wasn’t where I wanted to be. I felt that I was the bottleneck of the team, that I was putting a ceiling on the growth of the more senior members of the team, and only giving the junior members on-the-spot feedback without giving them a way to actually grow. Helping them to do a job better without actually helping them get better at their job, if that makes sense.

Anyway, at some point I read Turn the Ship Around! and it opened my eyes. Marquet had the same ambition as me—at some point he vowed never to give a direct order—but actually thought about what would be required to make it happen. And, hoo boy is it a tonne of work. Reading through the book you get the distinct impression that he worked his arse off the entire time he was on tour.

Anyway, the big insight is that for people to have control, they must also have competence and clarity. Control means making decisions, taking initiative, having autonomy. Competence means actually being good at your job, being able to demonstrate that you are good at your job, and continually learning. It is about mastery. Clarity means knowing the direction you’re going in and what’s expected of you. If you have enough clarity and you have some commitment to what you see, then you have a sense of purpose.

So Marquet thinks that people in an effective organization need:

Control
Competence
Clarity

In his book Drive (which I have not read), Dan Pink suggests that for people to be happy they need:

Autonomy
Mastery
Purpose

And it turns out that control maps pretty well to autonomy, competence maps pretty well to mastery, and clarity to purpose.

I think this is fascinating. Marquet was most definitely not operating from the assumption that “if I make the crew happy, the ship will run better”; he was trying to figure out how to get everyone to be leaders. Pink (to my limited understanding) wasn’t concerned with organizational success or efficiency, but with what makes individuals tick.

Put another way, my take on Turn the Ship Around! is that the whole thing is a framework for building an organisation around trust (which I think is underemphasised by almost everyone), and Drive is about individual happiness.

When I realised this, I had a bit of a “mind blown” moment. Is this a coincidence? Is it that one author influenced the other? Or is there some deeper connection between institutional trust and individual happiness?

2021-08-06

Thoughts from a code yellow

A long time ago I was working somewhere with a data pipeline with severe problems. Borrowing a term from Google, I declared a Code Yellow and wrote a doc with a plan for getting out of it.

Being relatively new to the org at the time, I also wrote down some core principles for getting out of the Code Yellow. Reading over them recently reminded me that they are actually relevant almost all of the time.

Quantify the problem

Pretty much everyone agreed there was a problem and that it was pretty bad, but we had no consistent quantification of the problem.

Without numbers, we couldn’t know if we were making progress. Quantifying the problem led to better decisions and more motivation.
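The post doesn’t record which numbers the team actually tracked. As a purely illustrative assumption, here is a Python sketch of one simple metric, the share of pipeline runs per ISO week whose output failed validation; the run-log format, the `weekly_failure_rate` name, and the sample runs are all made up for the example.

```python
from collections import defaultdict
from datetime import date


def weekly_failure_rate(runs):
    """Map each (ISO year, ISO week) to the fraction of runs that failed.

    `runs` is an iterable of (datetime.date, succeeded) pairs.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for day, succeeded in runs:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += 1
        if not succeeded:
            failures[(iso_year, iso_week)] += 1
    return {week: failures[week] / totals[week] for week in sorted(totals)}


# Made-up run log, purely for illustration.
runs = [
    (date(2021, 3, 1), False),
    (date(2021, 3, 2), False),
    (date(2021, 3, 3), True),
    (date(2021, 3, 8), True),
    (date(2021, 3, 9), False),
    (date(2021, 3, 10), True),
]
print(weekly_failure_rate(runs))
```

In this toy log the first week has two failures in three runs and the second has one in three; tracked week over week, a number like this is what tells you whether the fixes are working.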

Fix the leak, then fill the bucket

Whenever we encountered a problem in our production systems, in our code, or in the data, our first reaction had to be “how can we change the system so it is impossible for this to happen again”? Only then should we address the problem at hand.

(Note: don’t do this if you are paged for a system serving live user traffic.)

This was difficult, because:

There is always significant pressure to just fix the problem
Identifying deeper solutions only added to our already-long todo list

Nevertheless, we can’t just solve problems; we need to eliminate problem generators.

Close the loop

Whenever a person or a machine let us know about a problem with our systems, we should fix the problem, and then tell them we have fixed the problem, ideally in the same forum they raised it.

For example, if we got an alert, we should:

Reply to it saying we’re working on it (ideally silencing the alert for the duration)
When it’s resolved, mark it as resolved
If there’s follow-up work to be done, link to that from the alert itself

This was important, because our goal was to restore trust in our system, and being trustworthy ourselves is a key part of that. Pragmatically, it also reduced interruptions, questions, and hassling, because people knew where to look for status updates.

Learn from failure

When things go wrong, we don’t blame anyone, but instead see the failure as an opportunity to learn. Put another way, accidents and mistakes will always happen, and it is our responsibility to build a system that can tolerate them.

Specifically, when something goes wrong in our production systems, we had to write a post-mortem, and then review it within the team.

This is absolutely not about blame, but rather about making sure we are actually “fixing the leak”. Post-mortems will give us valuable insight into which issues are hurting us most, and how we can systematically address them.

We started by being over-enthusiastic in writing official post-mortem documents, and then backed off as we became more familiar with the process and could incorporate that kind of thinking into our day-to-day work.

Build learning in

When we learned a better way of doing something, we changed the system (either our production system or our development processes) so that the new, better way was the default. Adding checks to CI is a great example of this.

We should be extremely suspicious of advice like “be careful not to…”, “make sure you…”, “watch out for…”. Humans are bad at vigilance. Instead, let’s make machines that check things for us.
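As an illustration of turning “be careful not to…” into a machine check, here is a hypothetical Python sketch. The specific invariants (every record has a non-empty `id`, and ids are unique) are invented for the example, not taken from the original pipeline. The general shape is a function that lists violations plus a helper that returns the exit status a CI step would use, since a non-zero status is what fails the job.

```python
import sys


def find_violations(records):
    """Return human-readable problems instead of relying on people to spot them.

    Hypothetical invariants: every record has a non-empty "id", and ids are unique.
    """
    problems = []
    seen = set()
    for i, record in enumerate(records):
        rid = record.get("id")
        if not rid:
            problems.append(f"record {i}: missing id")
        elif rid in seen:
            problems.append(f"record {i}: duplicate id {rid!r}")
        else:
            seen.add(rid)
    return problems


def exit_code(records) -> int:
    """Report violations on stderr; return the status a CI step should exit with."""
    problems = find_violations(records)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


# In CI, something like `sys.exit(exit_code(load_records()))` would fail the build;
# `load_records` is a placeholder for however the data is actually read.
sample = [{"id": "a"}, {"id": "b"}, {"id": "a"}, {}]
status = exit_code(sample)  # reports two problems on stderr and returns 1
```

Once a rule like this runs automatically, nobody has to remember it, which is the point of building the learning in.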

I don’t think any of these principles are revolutionary or unique to me. I also don’t think that they are fundamental or complete or come anywhere near describing a system of thought. Instead, these were gaps between how I like to operate and where that particular team was at that particular time.

That said, I do find myself referring to them a lot. They sit in that big grey space between broad principles like “reliability” and more concrete processes.