Change failure rate

Change failure rate is the percentage of production changes that cause degraded service or need remediation; the hard part is agreeing on what counts as a failure.

How to use this glossary

These pages explain common engineering and delivery metrics in plain language. Definitions vary by company, toolchain, and industry; we highlight typical usage and caveats. Nothing here is legal, financial, or professional advice, and it is not a substitute for judgment in your own context.

Metrics can be misused for surveillance or stack ranking. We do not recommend using them that way. DORA performance bands from research are contextual; they are not targets for individuals or a basis for hiring decisions.

See the Engineering metrics glossary hub for all terms.

Definition

Change failure rate (CFR) is a DORA metric: the proportion of changes to production that result in a failure, for example an incident, rollback, hotfix, or user-visible defect tied to a release.

The intent is to balance speed with quality: if you deploy often but most changes break things, the system is unstable; if CFR is very low but deploys are rare, the low rate may come from shipping large, infrequent batches, which concentrates risk in each release rather than removing it.

How teams typically measure it

Numerator: changes linked to incidents, rollbacks, or Sev-1/2 tickets within an agreed window after deploy.
Denominator: total production changes in the same period.
Definitions differ: some teams count any rollback; others only customer-impacting events. Consistency matters more than the exact formula.
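
The numerator and denominator above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the `Deploy` and `FailureEvent` types, the `change_failure_rate` function, and the 7-day attribution window are all assumptions made for the example. It counts each change at most once, however many failure events are linked to it, and ignores events that fall outside the window or cannot be tied to a known change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deploy:
    deploy_id: str          # hypothetical identifier for a production change
    deployed_at: datetime


@dataclass
class FailureEvent:
    deploy_id: str          # the change this incident or rollback was attributed to
    occurred_at: datetime


def change_failure_rate(deploys, failures, window=timedelta(days=7)):
    """Return failed changes / total changes, or None if there were no deploys.

    The 7-day default window is an assumption; use whatever your team agreed on.
    """
    if not deploys:
        return None
    deployed_at = {d.deploy_id: d.deployed_at for d in deploys}
    failed_ids = set()
    for event in failures:
        start = deployed_at.get(event.deploy_id)
        if start is None:
            continue  # event not tied to a known change: not counted
        if timedelta(0) <= event.occurred_at - start <= window:
            failed_ids.add(event.deploy_id)  # a change counts once at most
    return len(failed_ids) / len(deploys)


# Made-up sample data: four deploys, one of which fails inside the window.
t0 = datetime(2024, 1, 1)
deploys = [Deploy(f"d{i}", t0 + timedelta(days=i)) for i in range(1, 5)]
failures = [
    FailureEvent("d2", t0 + timedelta(days=2, hours=3)),   # incident after d2
    FailureEvent("d2", t0 + timedelta(days=2, hours=5)),   # rollback of d2, same change
    FailureEvent("d3", t0 + timedelta(days=13)),           # 10 days after d3: outside window
    FailureEvent("unknown", t0 + timedelta(days=2)),       # cannot be linked to a change
]
cfr = change_failure_rate(deploys, failures)
print(f"CFR: {cfr:.0%}")
```

Widening the window or loosening what counts as a failure event changes the result, which is why a definition applied consistently over time is more useful than any single reading.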

Common pitfalls

Undercounting by not tying incidents back to a specific change or by using a narrow definition of “failure.”
Blame framing: CFR is for system improvement, not for punishing individuals; fear distorts reporting.

Related terms

Browse other entries in the glossary.
