PagerDuty's report, The State of Digital Operations, presents an aggregated view of data from PagerDuty's 16,000 customers and 700,000 users around the world.
That data is on a huge scale: 30 million events per day, filtered into one million alerts, 500,000 interruptions, and more than 55,000 critical incidents.
The report provides an analysis of data collected between January 2019 and April 2021.
Critical incidents rose 19% year-on-year, and the average incident cost US$126 in engineering time alone.
That increase added to the burden on technical teams. The additional and inconsistent working hours – on average, two extra hours a day – affected employee turnover. In particular, users experienced 9% more off-hour (6pm-10pm) interruptions and 7% more holiday and weekend interruptions.
And, the report states, "We found a statistically significant correlation: the more frequently users are involved in fixing problems off hours, the more likely they are to quit."
While organisations are managing to spread this load reasonably equitably, employees at small to medium businesses are more likely to be interrupted than their enterprise equivalents (46% vs 30% per month).
As a benchmark, PagerDuty considers two interruptions per month per user to be good, seven to be bad, and 19 to be a sign of burnout.
Not surprisingly, ChatOps adoption increased 22% during the period, compensating to some extent for the effects of remote working on peer collaboration by allowing engineers to drive the resolution process from a chat interface (eg, Slack and Microsoft Teams).
According to PagerDuty, organisations that use its system see worthwhile improvements over time in mean time to acknowledge (MTTA) and mean time to resolve (MTTR). However, the report notes that "operational maturity and digitally transforming businesses is a long-term investment that takes years, not months."
Interestingly, organisations using the PagerDuty mobile app had 40-50% faster MTTA than those with lower mobile adoption.
"Today, digital operations are a core business strategy. That means more complex systems, more rapid rates of change, and more pressure on the teams tasked with keeping those operations running smoothly," said PagerDuty chief product officer Sean Scott.
"Digital operations maturity is the difference between preventing an incident before it begins and losing
customers because it takes too long to remediate an issue. With this report, PagerDuty is sharing our unique insights on how the right practices can help organisations unburden their teams and bring order and intelligence to operations management,"