Chaos engineering is often either used as a buzzword to attract more VC funding or dreaded as a headache by operators and developers: “We create enough chaos just by releasing software!” But if we understand the role of Chaos Engineering in relation to Resilience Engineering and take a methodical approach to injecting system faults, our insights will be more actionable, and we will build confidence in overall resilience while impacting our teams less. In this talk I’ll cover Resilience & Reliability, Resilience & Chaos, focusing on the problem, minimizing your blast radius while maximizing insights, monitoring best practices for chaos tests, and building enough confidence to test an entire system outage.
Brian Wilcox will present "Scientific and Safe Chaos Engineering" at QCon Shanghai 2019 (QCon Global Software Development Conference).
About the speaker
A cross-discipline engineer and user enthusiast, Brian spends most of his time trying to combine the best principles of structural, software, and systems engineering into scalable, robust, and resilient systems. He has worked at LinkedIn for the last 5 years and has lately been focusing on Resilience Engineering and Chaos Validation tools.
For more talks on chaos engineering, visit the official QCon Shanghai 2019 website.