Traditional monitoring tools like logs and metrics were necessary but not sufficient for debugging how and where systems failed in CI, which relies on multiple interconnected critical systems (e.g., GHE, Checkpoint, Cypress).
In this talk, Frank Chen shares how traces gave us a critical and compounding capability to better understand where, when, how, and why faults occur for our customers in CI. We share how shared tooling for high-dimensionality event traces (using SlackTrace and SpanEvents) significantly increased our velocity in diagnosing code in flight and debugging complex system interactions. We start with stories of early incidents that motivated further investment across Slack’s internal tooling teams and move on to stories about gains in performance and resiliency throughout our infrastructure.
Please register for o11ycon+hnycon first, then register for this workshop. Conference registration is required.
Michael is a Platform Engineer at Honeycomb.io. He has worked with various public and private cloud providers for the past 8 years. Originally deeply rooted in system administration, he has since gained a fondness for infrastructure as code and developer tooling. He has been using Kubernetes and Terraform together since 2017. In his spare time he is an avid PC gamer, enjoys cooking, and tinkers with mixed reality.