The curious connection between cloud repatriation and SRE operations

I have a penchant for philosophy. I’m about three classes away from graduating, and every few years I tell myself that one day I’ll finish. Thus, I know very well what both statistics and logic call a post hoc error, from which we derive the saying “correlation is not causation”. It is the logical error of assuming that because event Y followed event X, event Y must have been caused by event X. The most famous illustration of this error came from Bobby Henderson, who showed the absurdity of inferring causation from correlation with his chart showing that global warming has been caused by the decreasing number of pirates in the world. Yeah, that doesn’t make sense, but neither do a lot of graphs that people derive causality from. Just because two data points are mapped against each other doesn’t mean one caused the other. In many cases, it doesn’t even make sense to correlate the two. After all, pirates and global warming? Nobody takes this seriously. But it’s an important point to make as we delve into the question of the relationship between Site Reliability Engineering (SRE) operations and cloud repatriation.

To be clear, I’m not suggesting that adopting SRE practices results in cloud repatriation. But I am suggesting that there is a close and meaningful relationship between the two. It is no accident that Google, a cloud provider, created SRE as a practice. The model, mindset, and skills associated with SRE are integral to the successful operation of cloud infrastructure and services.

Cloud repatriation is real

Public cloud repatriation is itself a somewhat taboo subject in some circles. Consider the controversy raised by Andreessen Horowitz when it published “The Cost of Cloud, a Trillion Dollar Paradox” and suggested that companies were repatriating from the cloud and realizing significant savings in the process. Some would have you believe that this does not happen, but there is enough data and anecdotal evidence to indicate that yes, it does.

F5’s 2021 State of Application Strategy Report asked the market about public cloud repatriation. Only 13% had repatriated apps and 14% planned to do so. A year later, those figures had risen to 37% and 30%, respectively, taking the combined total from 27% to 67%, an increase of 40 percentage points. This is not an anomaly, as several credible analyst firms report similar results. Interestingly, the repatriation rate is not uniform across regions. Both APCJ and LATAM are much less likely to repatriate than EMEA and NA.
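The arithmetic behind those survey figures is easy to check, and it is worth separating a change in percentage points from a relative change. The sketch below is only an illustration: the four percentages are the ones quoted above, while the names, the 2022 label for "a year later", and the assumption that the two groups do not overlap (so that summing them is meaningful) are mine.

```python
# Survey shares quoted above, in percent of respondents.
# Only the four figures come from the report; the names are illustrative,
# and 2022 stands in for "a year later".
SHARES = {
    2021: {"repatriated": 13, "planning": 14},
    2022: {"repatriated": 37, "planning": 30},
}


def combined(year: int) -> int:
    """Percent of respondents who have repatriated or plan to.

    Assumes the two groups do not overlap.
    """
    return sum(SHARES[year].values())


def point_change(earlier: int, later: int) -> int:
    """Change in the combined share, in percentage points."""
    return combined(later) - combined(earlier)


def relative_change(earlier: int, later: int) -> float:
    """Change in the combined share, as a percent of the earlier value."""
    return point_change(earlier, later) / combined(earlier) * 100


if __name__ == "__main__":
    print("combined 2021:", combined(2021))
    print("combined 2022:", combined(2022))
    print("percentage-point change:", point_change(2021, 2022))
    print("relative change (%):", round(relative_change(2021, 2022)))
```

The two measures answer different questions: the first says how much of the market shifted, the second how fast the combined share grew relative to where it started.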

I maintain that companies are bringing applications back from the public cloud, and that the question is not “Are they?” but rather “How many workloads are they moving, and where are they going?” This is a question for the State of Application Strategy 2023 research.

For now, consider one possible catalyst for repatriation: SRE operations. Even if the rising cost of the cloud drives the desire to repatriate, why repatriate if you do not have the skills to operate as efficiently elsewhere, and therefore to benefit from a lower cost?

We posit that it is SRE operational practices and skills that enable enterprises to repatriate and to maintain the efficiencies and cost savings needed to justify the decision, whether they move these workloads to another public cloud, on-premises, or to the edge.

Dig into the data

At first glance, there is a strong correlation between the adoption and application of SRE practices and repatriation from the cloud. This suggests that organizations capable of operating in a cloud-like manner (that is, those that have adopted SRE practices) effectively take back their toys (apps) and return home (on-premises or elsewhere) because they can.

Specifically, only 4% of organizations that have not adopted SRE practices have brought applications back from the public cloud. No less than 73% of those who adopted SRE practices also brought back applications.
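One way to put a number on the gap between those two groups is to compare the rates directly. The sketch below is a minimal illustration, not part of the report: the two rates are the ones quoted above, and the function names are hypothetical. Neither measure says anything about causation; they only describe how different the two groups are.

```python
# Repatriation rates quoted above, as fractions of each group.
RATE_WITHOUT_SRE = 0.04  # organizations that have not adopted SRE practices
RATE_WITH_SRE = 0.73     # organizations that have adopted SRE practices


def rate_difference(group_rate: float, baseline_rate: float) -> float:
    """Absolute gap between two rates, as a fraction."""
    return group_rate - baseline_rate


def rate_ratio(group_rate: float, baseline_rate: float) -> float:
    """How many times more common the outcome is than in the baseline group."""
    if baseline_rate == 0:
        raise ValueError("baseline rate must be non-zero")
    return group_rate / baseline_rate


if __name__ == "__main__":
    gap = rate_difference(RATE_WITH_SRE, RATE_WITHOUT_SRE)
    ratio = rate_ratio(RATE_WITH_SRE, RATE_WITHOUT_SRE)
    print(f"gap: {gap * 100:.0f} percentage points")
    print(f"ratio: {ratio:.2f}x")
```

A large gap and a large ratio together describe a strong association between the two groups. Whether that association is meaningful is a separate argument.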

Of course, adopting practices does not necessarily mean applying practices. So we looked at how organizations actually operate applications, systems, and infrastructure. Specifically, we looked at the percentage of their operations that use SRE practices. Perhaps unsurprisingly, this generated similar results.

Of those operating 0% of their applications, systems, and infrastructure using SRE practices, 81% have not repatriated. Conversely, of those using SRE practices for 76% to 99% of applications, systems, and infrastructure operations, 54% have repatriated. The point where repatriation seems to start gaining momentum is when organizations use SRE practices to operate more than a quarter (25%) of their applications, systems, and infrastructure.

Remember I noted that APCJ and LATAM were much less likely to repatriate? They are also much less likely to use SRE practices to operate their applications, systems, and infrastructure. In fact, more than a quarter of LATAM (26%) and APCJ (29%) respondents were operating zero percent of their applications, systems, and infrastructure using SRE practices. In the EMEA region? It’s only 5%. And in NA, even lower at 2%.

Significant relationship or curious coincidence?

There seems to be an indisputable correlation between organizations that adopt SRE as an operational practice and public cloud repatriation rates. But is it a meaningful relationship or just a curious coincidence?

I will argue that this is a meaningful relationship.

The practices and skills associated with SRE are well suited to operating a large-scale cloud environment. As I said before, it’s no mistake that it was Google that created SRE and literally wrote the book about it. The value of the cloud lies in its operational model, which can significantly reduce the cost per transaction, whether measured by HTTP exchanges or client sessions. This enables cost-effective scaling of digital applications and services.

Using automation, along with practices that focus on significant incidents rather than non-disruptive events, makes it possible to cost-effectively scale the people (and therefore the expertise) responsible for maintaining a high level of availability and performance.

Adopting and using SRE practices enables organizations to effectively scale their operations, whether in the public cloud, on-premises, or at the edge. And what the data tells us is that organizations seem to be using the capability to do just that.

About Leah Albert
