#summit-info
2021-10-11
Chrystina Nguyen, Rhythmic Technologies15:10:54

Happy Monday to you all out there!….

🎉 3
👋 1
Brandon Baker (IT MGR - O'Reilly Auto Parts)17:10:11

Let me know if there's a better channel to throw this in, but I'm wondering how companies deal with error budgets when it comes to one service impacting another? Do both services have their error budget impacted? My educated guess is that the answer will be the impacted service should have been architected in such a way where this wouldn't happen, but I'm curious about the real-world answers 🙂

Ferrix Hovi - Principal Engineering Avocado - SOK (S Group)17:10:59

Don't turn it into a science. Both ends can usually do something, and in the simple case a little cooperation relieves both budgets.

Ferrix Hovi - Principal Engineering Avocado - SOK (S Group)17:10:00

It is not about accuracy or fairness, it is about the outcome of better stability.

Bryan Finster - Defense Unicorns (Speaker)18:10:00

Is there not product level budgeting?

Bryan Finster - Defense Unicorns (Speaker)18:10:41

What I'm hearing is that every service has a budget. That sounds nightmarish to track.

Brandon Baker (IT MGR - O'Reilly Auto Parts)18:10:35

Today we don't use the concept of error budget. I'm trying to better understand how others use it. I said "service" because that's how areas are broken down today (for us).

Bryan Finster - Defense Unicorns (Speaker)18:10:05

If my service depends on your service I promise you I will have zero trust in your stability. That's just my default position. So, I will work on backup behaviors if you aren't available. If no reasonable backup behavior is possible, I'll log the hell out of the error I'm receiving so we can identify your service is the cause so I can stay out of a war room at all costs.
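The "backup behavior plus loud logging" approach described above might look roughly like the sketch below. It is only an illustration: `call_with_fallback`, `DependencyError`, the logger name `checkout`, and the `dependency_name` label are all hypothetical names, not anything the participants said they use.

```python
import logging

# Hypothetical logger name for the consuming service.
logger = logging.getLogger("checkout")


class DependencyError(Exception):
    """Hypothetical error raised when an upstream call fails."""


def call_with_fallback(fetch, fallback, dependency_name):
    """Call fetch(); if it raises, log which dependency failed and
    return fallback() instead of propagating the error."""
    try:
        return fetch()
    except Exception as exc:
        # Name the dependency in the log line so the failure can be
        # attributed to the upstream service later.
        logger.error(
            "dependency=%s failed: %r; serving fallback",
            dependency_name,
            exc,
        )
        return fallback()
```

A caller would pass the real upstream call as `fetch` and something like a cached or default value as `fallback`. When no reasonable fallback exists, the same log line still records which dependency caused the error.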

Ferrix Hovi - Principal Engineering Avocado - SOK (S Group)18:10:05

Good addition. So, an unknown error that monitoring cannot attribute to either side is a shared issue. If the interoperability tests and monitoring are working, then it can be proven to be somebody else's problem. Even then, backup behaviours should be implemented, and unless the error budget bites that team until then, there is no promise of that ever happening. So a bilateral effect, where both budgets are hit, sounds more likely to turn transparency into action than a unilateral one.

Jerreck22:10:13

that's from a service-owner perspective and not the perspective of someone doing the budgeting, which I think is what brandon is asking about, though, right? @bbaker8

Ferrix Hovi - Principal Engineering Avocado - SOK (S Group)07:10:25

Error budgeting. That should be from the perspective of "shall I fix errors since we are over budget, or shall I implement new functionality and increase the risk of error?"

Ferrix Hovi - Principal Engineering Avocado - SOK (S Group)07:10:39

The business interest in being over or under the error budget should be whether to expect the number of features to go up or the number of support calls to go down. The service owners would typically be the highest authority with the subject matter understanding to set those budgets responsibly.
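The "fix errors or build features" question above can be written as a very small decision rule. This is a sketch of the idea only; the function name `next_priority` and the use of minutes as the unit are assumptions, and real policies are usually less binary.

```python
def next_priority(budget_minutes, consumed_minutes):
    """Return 'reliability' once the error budget is used up,
    otherwise 'features'. Both arguments are in minutes (assumed unit)."""
    if consumed_minutes >= budget_minutes:
        return "reliability"
    return "features"
```

With a budget of 43.2 minutes, 50 minutes of consumed budget would point the team at reliability work, while 10 minutes would leave room for new functionality.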

Brandon Baker (IT MGR - O'Reilly Auto Parts)13:10:11

@jerreck.moody - I'm thinking about both perspectives, but my viewpoint comes from more the service-owner perspective.

👍 1
Brandon Baker (IT MGR - O'Reilly Auto Parts)13:10:48

@bryan.finster486 - What you said makes sense.

Brandon Baker (IT MGR - O'Reilly Auto Parts)13:10:30

@ferrix - Thanks! The idea of an error budget is new to the company. There are some who have attended these conferences and read the books. But overall most of this is new ground for us. Right now it's more of a "we're at 99.9% SLA, we need to do better", but there isn't the idea of "we're at 100%, we have budget for error". ...hopefully that makes sense.
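To make the "we have budget for error" framing concrete, here is a rough sketch of the arithmetic behind a time-based error budget. The 30-day window and the function names are assumptions for illustration; the 99.9% figure is the one mentioned above.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Minutes of allowed unavailability for a given availability
    target over a window (30 days is an assumed default)."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_percent) / 100


def budget_remaining(slo_percent, bad_minutes, window_days=30):
    """Budget left after subtracting the minutes already spent
    on outages or errors in the same window."""
    return error_budget_minutes(slo_percent, window_days) - bad_minutes


# A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(99.9), 1))
```

Read this way, running at 100% against a 99.9% target means the whole budget of roughly 43 minutes is still unspent, rather than meaning there is nothing left to improve.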