Blameless post-mortems are critical to maintaining a positive culture of continuous improvement while also acknowledging room for growth.
Why it matters: The CrowdStrike incident affected millions globally, costing an estimated $1-2 billion in fixes. It underscores the critical need for learning from failures in our interconnected tech world.
What's a blameless post-mortem?
A structured analysis of an incident that focuses on systemic issues, not individual blame.
Aims to identify what happened, why, and how to prevent future occurrences. You stay focused on the actions, not the people.
Emphasizes learning and improvement over punishment.
Why it's crucial:
You’re intentional about maintaining open, honest communication about failures.
You foster a culture of continuous improvement. (Missed last week’s newsletter? Here’s your chance!)
You’re addressing root causes, not just symptoms. Fixing root causes is what builds long-term system resilience.
Large-scale incidents rarely, if ever, result from one person's mistake. They are a sign of a process issue.
Here's how to conduct effective blameless post-mortems:
Understand what happened: Gather all relevant data about the incident. Focus on facts, not blame. List the events action by action, with a timeline of how each one played out, all the way to full recovery.
Analyze the impact: Consider immediate and long-term consequences. Look at effects across various stakeholders if applicable.
Identify root causes: Dig deep beyond surface-level issues. Look for systemic problems.
Generate action items: Be specific and actionable. Assign clear owners and deadlines, along with a plan to follow up.
Share learnings: Communicate these insights widely. Use learnings to inform future practices. We learn best from our own mistakes, and we can also learn from others when they share theirs openly.
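If your team tracks post-mortems in code or a script rather than a doc, the steps above can be sketched as a small record. This is a minimal illustration, not a standard template: the class names, fields, and the sample incident are all hypothetical. It shows a timeline described by actions (not people), root causes, and action items with an owner and a deadline so follow-up is easy to check.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta


@dataclass
class TimelineEvent:
    at: datetime
    action: str  # what happened, described as an action rather than a person


@dataclass
class ActionItem:
    description: str
    owner: str  # a clear owner, ideally a team or role
    due: date   # a clear deadline
    done: bool = False


@dataclass
class PostMortem:
    title: str
    timeline: list[TimelineEvent] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)

    def ordered_timeline(self) -> list[TimelineEvent]:
        # Events are often collected out of order; sort them by timestamp.
        return sorted(self.timeline, key=lambda e: e.at)

    def time_to_recovery(self) -> timedelta:
        # Span from the first recorded event to the last (full recovery).
        events = self.ordered_timeline()
        if len(events) < 2:
            return timedelta(0)
        return events[-1].at - events[0].at

    def overdue(self, today: date) -> list[ActionItem]:
        # The follow-up plan: open items whose deadline has passed.
        return [a for a in self.action_items if not a.done and a.due < today]


# Hypothetical incident data, for illustration only.
pm = PostMortem(
    title="Config rollout outage",
    timeline=[
        TimelineEvent(datetime(2024, 8, 1, 9, 30), "Alerts fire for failed health checks"),
        TimelineEvent(datetime(2024, 8, 1, 9, 0), "Config change deployed"),
        TimelineEvent(datetime(2024, 8, 1, 10, 30), "Rollback complete, service fully recovered"),
    ],
    root_causes=["Config changes skip staged rollout"],
    action_items=[
        ActionItem("Add staged rollout for config changes", "release-team", date(2024, 8, 10)),
        ActionItem("Add config validation to CI", "platform-team", date(2024, 8, 30)),
    ],
)

for event in pm.ordered_timeline():
    print(event.at.isoformat(), event.action)
print("Time to recovery:", pm.time_to_recovery())
print("Overdue:", [a.description for a in pm.overdue(date(2024, 8, 15))])
```

Assigning owners to teams or roles rather than individuals is one way to keep the record blameless while still making accountability explicit.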
Bottom line: Blameless post-mortems transform crises into opportunities for systemic improvement. They're not about avoiding accountability, but creating an environment where honest analysis leads to real solutions.
Want more? Paid subscribers get access to:
A comprehensive, downloadable template for conducting effective blameless post-mortems
A walkthrough on how I would run this post-mortem for the CrowdStrike incident (with the information I have available)
SOMETHING EXTRA:
🧩 Good news: Your iPad can now run Windows XP. Now that emulators are supported on iPads, someone had a little fun.
📺 Not sure what exactly happened with CrowdStrike? I enjoyed Theo’s video recapping the issue.