CodeKuduCodeKudu

Change failure rate

Change failure rate is the percentage of production changes that cause degraded service or need remediation; the hard part is agreeing what counts as a failure.

Definition

Change failure rate (CFR) is a DORA metric: the proportion of changes to production that result in failure—for example incidents, rollbacks, hotfixes, or user-visible defects tied to a release.

The intent is to balance speed with quality: if you deploy often but most changes break things, the system is unstable; if CFR is very low but deploys are rare, you may be trading risk for batch size.

How teams typically measure it

  • Numerator: changes linked to incidents, rollbacks, or Sev-1/2 tickets within an agreed window after deploy.
  • Denominator: total production changes in the same period.
  • Definitions differ: some teams count any rollback; others only customer-impacting events. Consistency matters more than the exact formula.

Common pitfalls

  • Undercounting by not tying incidents back to a specific change or by using a narrow definition of “failure.”
  • Blame framing: CFR is for system improvement, not for punishing individuals; fear distorts reporting.

Related terms

Browse other entries in the glossary.