Post Mortem: Kubernetes Node OOM

Production issues are never fun. They always seem to happen when you’re not at work, and the cause always seems to be silly. We recently had nodes in our production Kubernetes cluster run out of memory, although the affected nodes recovered quickly and without any noticeable interruption. In this story we will go over what happened in our cluster, what the impact was, and how we will avoid the issue in the future.

