Virginia Laurenzano NSA (Speaker)18:10:49

loving the actionable inclusion exercise ideas from this T-mobile talk

Phil Jochimsen (UW-Madison)18:10:20

Reminds me of the Crucial Conversations book

Phil Jochimsen (UW-Madison)18:10:22

Great story of improving the hiring process by removing biases, including having the candidates perform in a real-work experience.

💯 1
Denver Martin - Sr. Mgr Cloud Ops Infrastructure18:10:33

I try to find people who are different from the others but still have a DevOps mindset.

Ganga Narayanan18:10:40

Not being "culturally fit" as grounds for rejecting a candidate has seemed vague to me. This makes sense, thank you!

Denver Martin - Sr. Mgr Cloud Ops Infrastructure18:10:18

I like people that have unique journeys into IT too.


What is the common path?

Denver Martin - Sr. Mgr Cloud Ops Infrastructure18:10:05

Degree in CS or IT, work help desk, then into back office, then IT. I do hire those too, however I am interested in why they chose that path... I have an undergrad degree in Landscape Architecture and love to draw, sketch, etc... but CAD is what converted me to IT.

Denver Martin - Sr. Mgr Cloud Ops Infrastructure18:10:13

SRE - is our superstar team...

👍 1
Adam Shake - MediaMath18:10:00

That's awesome @denver.martin!!

Neil Kalinowski18:10:58

Is an SLI similar to a KPI?

Adam Shake - MediaMath18:10:10

They can be the same thing Neil

👍 1
Adam Shake - MediaMath18:10:33

The real difference is that an SLI supporting an SLO has to be CUSTOMER driven. KPIs can be business or leadership driven

thankyou 1
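
Editor's sketch of the SLI/SLO relationship described above: a minimal example, assuming a simple availability SLI (the share of requests a customer would consider successful) measured against an SLO target. The `Window` type, the function names, the request counts, and the 99.9% target are all hypothetical and not from the talk.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts for one measurement window (hypothetical data shape)."""
    good_requests: int   # requests the customer would consider successful
    total_requests: int  # all requests served in the window


def availability_sli(window: Window) -> float:
    """SLI: fraction of requests that were good from the customer's view."""
    if window.total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return window.good_requests / window.total_requests


def error_budget_remaining(window: Window, slo_target: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if window.total_requests == 0:
        return 1.0  # no traffic, so no budget has been spent
    allowed_bad = (1.0 - slo_target) * window.total_requests
    actual_bad = window.total_requests - window.good_requests
    return 1.0 - actual_bad / allowed_bad


# Assumed numbers for illustration only.
window = Window(good_requests=99_950, total_requests=100_000)
sli = availability_sli(window)
budget = error_budget_remaining(window, slo_target=0.999)
print(f"SLI={sli:.4f}, error budget remaining={budget:.0%}")
```

A business-driven KPI could be computed the same way; what makes this an SLI in the sense above is that "good" is defined by what the customer experiences.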
Neil Kalinowski18:10:59

Makes sense.

🎉 1
Ferrix Hovi - Head of DevOps - Siili18:10:55

@neil.kalinowski SLI is generally "how is it responding?". The other kinds of KPIs will look at "is it any good?" So a stable service does not have to be a good product but it is advisable.

👍 1
Ferrix Hovi - Head of DevOps - Siili18:10:30

Just like the four metrics from State of DevOps are general hygiene. If the service has crashed, the UX does not matter much. If the CFR and MTTR are both zero, it may still be the wrong product.
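
Editor's sketch of the two metrics named above, change failure rate (CFR) and mean time to restore (MTTR): a minimal example, assuming deployments are logged with a "caused a failure" flag and incidents with detection and restoration times. The record shapes, function names, and sample data are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical deployment log: (deploy id, caused a production failure?)
deployments = [("d1", False), ("d2", True), ("d3", False), ("d4", False)]

# Hypothetical incidents: (detected at, service restored at)
incidents = [
    (datetime(2020, 1, 6, 9, 0), datetime(2020, 1, 6, 9, 45)),
    (datetime(2020, 1, 7, 14, 0), datetime(2020, 1, 7, 14, 15)),
]


def change_failure_rate(deploys):
    """Share of deployments that led to a failure in production."""
    if not deploys:
        return 0.0
    failures = sum(1 for _, caused_failure in deploys if caused_failure)
    return failures / len(deploys)


def mean_time_to_restore(incs):
    """Average time from detection to restoration across incidents."""
    if not incs:
        return timedelta(0)
    total = sum((restored - detected for detected, restored in incs), timedelta(0))
    return total / len(incs)


cfr = change_failure_rate(deployments)
mttr = mean_time_to_restore(incidents)
print(f"CFR={cfr:.0%}, MTTR={mttr}")
```

Both functions return zero for an empty log, which matches the point made above: zero CFR and zero MTTR say nothing about whether the product is the right one.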

Lucas Melo (American Airlines Architect)18:10:35

Typical, just renaming titles does not work!

😲 1
💯 1
Adam Shake - MediaMath18:10:00

YES!!! That happens across so many things @lucas.demelo

💯 1
Adam Shake - MediaMath18:10:34 (Link from current Slide) Art of SLO Workshop!

Denver Martin - Sr. Mgr Cloud Ops Infrastructure18:10:43

We took the SRE folks and embedded them in the teams that had issues, and then they came back together to prioritize which teams were higher priority for client issues… then they swarmed the top issue constraints, then moved to the next problem… The SRE team always attends our BPM on the hunt for the next larger issues… and when they speak at the BPM everyone listens…

👍 2
Dave Stanke - Google [he/him]18:10:31

@denver.martin yes! That's an ideal engagement model.

Adam Shake - MediaMath18:10:48

That's describing exactly what I'm working to stand up at my company!

Dave Fugleberg18:10:33

@denver.martin BPM?

Denver Martin - Sr. Mgr Cloud Ops Infrastructure19:10:20

Blameless Post Mortem (granted in the beginning it was not very blameless, I had to lead a book club with other managers using Dekker books “Just Culture” and “Drift Into Failure” then it became more blameless)..

😆 1
Brian Gallop19:10:39

@denver.martin how long were/are your SREs embedded in the other teams?

Lucas Melo (American Airlines Architect)19:10:00

@denver.martin Sounds like the "Netflix SRE model", where the SRE Team swarms on larger issues and acts as consultants

Denver Martin - Sr. Mgr Cloud Ops Infrastructure19:10:48

It varies based on what is being observed and how the data is gathered and measured.. if there is good data then it can be short, 1 week, if it is harder where there are no measures then maybe 3 or 4 weeks…

👍 1
Adam Shake - MediaMath19:10:20

I think one of the keys is that you define, as much as possible, what that embedded engagement is before it starts.. so it doesn't end up permanent

➕ 1
Denver Martin - Sr. Mgr Cloud Ops Infrastructure19:10:26

we try to fix quickly and move ops issues to Dev quickly with $$$ driven tech debt data.

Adam Shake - MediaMath19:10:32

that doesn't mean you can't "re-up" after that period, but don't leave it open ended

Denver Martin - Sr. Mgr Cloud Ops Infrastructure19:10:50

Agreed, we work out a number of sprints for fixes based on how many handoffs are involved.. this way each handoff is dedicated to 1 sprint..

Mark Fuller19:10:21

@davidstanke532 How different are SLOs from NFRs? Seems like the SLOs are acceptance criteria for NFRs. And maybe an uncovering of NFRs you never knew you had.

Dave Stanke - Google [he/him]19:10:54

I've never heard of NFRs -- what is that?

Mark Fuller19:10:07

Non-Functional Requirements

Dave Stanke - Google [he/him]19:10:16

Oh, of course.

😆 1
Dave Stanke - Google [he/him]19:10:32

Well, I have two tweets to share with you about that. One sec...

Andrew Hughes - Manager, DevOps Service Delivery QA (TRIMEDX)19:10:15

@adam619 any chance you'd share that job description?

Adam Shake - MediaMath19:10:50

I think uncovering is more likely than them being the same thing. NFRs are work that needs to be done (possibly), whereas SLOs are a trending indicator of the customer-perceived reliability of a service

👍 1
Adam Shake - MediaMath19:10:08

Let me double check if I can share that, it's not public yet I don't think

Andrew Hughes - Manager, DevOps Service Delivery QA (TRIMEDX)19:10:13

We have a super immature devops culture/understanding at my org. I put this together (still very much a draft) to help develop shared language and expectations

Denver Martin - Sr. Mgr Cloud Ops Infrastructure19:10:29

we have to look at their path to see where they have used the tools and think about whether they are in the right mindset… very labor intensive to find SREs…

👍 2
Andrew Hughes - Manager, DevOps Service Delivery QA (TRIMEDX)19:10:02

"...I'm busy making the application non-functional" 😆

😂 1
Adam Shake - MediaMath19:10:16

Mindset is the KEY @denver.martin

Dave Stanke - Google [he/him]19:10:12

Okay, snark aside: yes, there's certainly overlap, as Adam describes, both are aspects of prioritizing work... work that may not be self-evident ("important user X wants a feature" is self-evident. "we need more reliability" isn't)

Dave Stanke - Google [he/him]19:10:40

Also +1 to "uncovering"... we sometimes say that we don't define our SLOs, we discover them.

Adam Shake - MediaMath19:10:50

Here is a slice of the JD Doc we have

Adam Shake - MediaMath19:10:56

Key Responsibilities
We are seeking a Site Reliability Engineer well versed in large-scale distributed systems. Someone who will own the reliability and performance of those systems ensuring that our customers have the benefit of highly available and extremely effective products. You will do this by creating a bridge between development and operations, applying your software engineering mindset to various topics inclusive of system administration, observability, reliability, and performance. You will utilize your deep experience to simplify processes through automation while developing production software to continuously improve reliability and performance.
We work with many languages and technologies critical to the success of our platform, including Golang, Scala, Clojure, and C++, as well as Chef, AWS, ScyllaDB, Kafka, Prometheus, Kubernetes and many more. We expect that you have experience with most of these and also a passion for becoming proficient with many more.
You will:
•	Use data from our observability stack and incident trends to prioritize reliability improvements
•	Provide architectural guidance on our critical customer facing services
•	Contribute to sprint development, executing on availability and performance topics within our product roadmap
•	Mentor and consult with product, development, and operations to drive reliability best practices
•	Work with Product Management and Engineering teams to answer priority concerns for reliability fixes
•	Define SLI/SLO/Error Budgets
•	Improve observability across all services
•	Participate in On-Call rotations shared with development teams
•	Automate deployment capabilities and implement auto healing philosophies
•	Collaborate with development teams on best practices, infrastructure setup, and planning activities with a focus on stability and performance

👍 3
Mark Fuller19:10:56

I like it. Thanks for the insight.

Denver Martin - Sr. Mgr Cloud Ops Infrastructure19:10:29

Great Talk! Amazing the work and turn around a good SRE team can do! Thanks.

Dave Stanke - Google [he/him]19:10:07

Thanks everyone for attending and chatting!

Vaishali Deshmukh, Team Lead - Database Applications, Edward Jones20:10:04

Is anyone having issues with pulling video for "attacking the fuzzy end of value streams" from the library?

Ryan Dobson - Motorola Solutions - RadioCentral20:10:21

Hi everyone - here for any questions or comments

Eduardo Rodrigues Semensati (Procter and Gamble)21:10:06

I assume the Portal is custom coded in-house?

Trevor Condon21:10:40

How did you move to 100% automated tests when it came to the radio hardware (on target)? Are there any tools/frameworks you recommend having gone through that journey?

👍 1
Ryan Dobson - Motorola Solutions - RadioCentral21:10:07

@condontrevor our coverage is for the cloud services and backend. The device itself does not have that level of coverage… yet…

👍 1
Eduardo Rodrigues Semensati (Procter and Gamble)21:10:16

Would you be able to share an architecture diagram showing the integrations to your APIs and systems to collect the data? After putting together our central offering, I now have teams asking for something similar, so I'm interested in seeing what others did instead of starting from scratch @jonathan.akers

Jonathan Akers21:10:42

@rodriguessemensati.e Yes for sure. We'll reach out to you...

Eduardo Rodrigues Semensati (Procter and Gamble)21:10:25

That's awesome! Glad to connect, just let me know how 🙂

Jonathan Akers21:10:54

Ryan and I will DM you

👍 1
Jess Meyer - IT Revolution (she/her)21:10:17

Thank you @ryan.dobson and @jonathan.akers!

👍 1
Jess Meyer - IT Revolution (she/her)21:10:33

Welcome @halfmoondad!

👍 1
John Ediger21:10:37

Hey bro! Hope you are well.

Bryan Finster - Walmart (Speaker)21:10:49

Is there a link to the GitHub repos?

Bryan Finster - Walmart (Speaker)21:10:51

Sorry, I meant the DXC playbooks. 😄

John Ediger21:10:44

Not yet @bryan.finster but our open-sourced Dojo training modules are at:

โค๏ธ 1
Ann Perry - IT Revolution21:10:28

omg - long day, sorry!!

Matt Cobby (NAB)21:10:48

But no-one ever got fired for choosing IBM SAFe!

Bryan Finster - Walmart (Speaker)21:10:50

I was just having a conversation about CAB and asking if the goal was value delivery or distribution of blame.

Bryan Finster - Walmart (Speaker)21:10:39

We shouldn't do things because they won't get us fired.

Chris Gallivan, FCA, Builder of JOY21:10:43

Actually - nothing even happens in a CAB. It’s a non event

Chris Gallivan, FCA, Builder of JOY21:10:13

How many times does something not make it through?

Bryan Finster - Walmart (Speaker)21:10:51

Depends on how people are feeling that day.

Bryan Finster - Walmart (Speaker)21:10:25

Only we are using ink made from francium dissolved in dragon tears.

😂 1
Ben Williams - Arvest Bank - Sr Data Pipeline Dev21:10:05

I would rather have a rubber stamp of technical people in CAB than an education session of IT people managers.

Bryan Finster - Walmart (Speaker)21:10:35

Even the technical people are unqualified. They lack the team's context, history of the application, etc.

Chris Gallivan, FCA, Builder of JOY21:10:39

Shouldn’t that be in the pyramid?

Bryan Finster - Walmart (Speaker)21:10:57

@halfmoondad how did you get consensus on how long it takes to go from idea to refined work?

👍 2
John Ediger21:10:21

This organization had all that tracked on a SharePoint site. Typically though, the facilitator asks questions about what usually happens and arrives at a reasonably accurate 'typical' time. It's good to reference actual recent use cases.

Bryan Finster - Walmart (Speaker)21:10:43

We struggle to get consensus from teams when we are facilitating once we move left of coding.

Bryan Finster - Walmart (Speaker)21:10:31

Pulling Jira card history would certainly help.

Eduardo Rodrigues Semensati (Procter and Gamble)21:10:12

haha, "we are so freaking agile" 😄

👍 1
Chris Gallivan, FCA, Builder of JOY21:10:38

How do you ensure the teams don’t ignore the improvements in favor of the day to day stuff?

John Ediger21:10:33

Not easy. Leadership support (up front). Champion on team, and the Product Owner to commit to a percentage of backlog items (on average) to be included in sprints.

John Ediger21:10:15

As leadership sees results we continue to reinforce the importance of this.

John Ediger21:10:21

Having coaches at the team level AND at the executive level to reinforce this is, we've found, also key.

Eduardo Rodrigues Semensati (Procter and Gamble)21:10:25

What is a good ongoing balance between new features and improvements? Considering improvements mostly = technical debt, is it right to assume the PO should reserve at least 20% of sprint capacity for the improvement piece?

John Ediger21:10:27

yes, 10-20% is a good rule of thumb - not necessarily every sprint but averaging out per quarter
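
Editor's sketch of the "averaging out per quarter" idea above: a minimal example, assuming each sprint is recorded as total capacity plus the points spent on improvement work. The function name, the six-sprint quarter, and the point values are hypothetical; only the 10-20% band comes from the message above.

```python
def improvement_share(sprints):
    """Share of total capacity spent on improvement work across all sprints."""
    total_capacity = sum(capacity for capacity, _ in sprints)
    if total_capacity == 0:
        return 0.0
    improvement_points = sum(points for _, points in sprints)
    return improvement_points / total_capacity


# Hypothetical quarter: (sprint capacity, points spent on improvements).
# Some sprints carry no improvement work at all; the average is what counts.
quarter = [(40, 0), (40, 10), (40, 4), (40, 8), (40, 0), (40, 6)]

share = improvement_share(quarter)
within_rule_of_thumb = 0.10 <= share <= 0.20
print(f"improvement share this quarter: {share:.1%}, in band: {within_rule_of_thumb}")
```

Checking the band per quarter rather than per sprint lets a team skip improvement work in a crunch sprint and make it up later.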

Sandeep Joshi23:10:50

TechDebt should be a first-class citizen in your backlog - not an optional element. The PO should work with the team to assign a priority and business value to tech debt items, same as features. So it gets prioritised accordingly.

🎯 1
Eduardo Rodrigues Semensati (Procter and Gamble)23:10:17

Good point, I agree. Now, how do you deal with it when the POs are more business oriented than technical? How do you ensure they don't let technical debt go on forever, since they always prioritize on business value? I have my thoughts around it, but would love to hear what you think 🙂

Sandeep Joshi23:10:49

In my experience, we seldom find POs who are more technical (except in the infraOps space). We often assume the PO is an independent island, but the PO must collaborate with the team and the respective SMEs to understand what the items in the backlog mean (if they don’t know). The first step is to bring transparency by creating a single backlog. One key learning for POs and the team is to understand that Work = Work. If the value/priority of the work is ranked higher (irrespective of the type of work), it should simply be done. On the question of when we need to balance: balance according to the rank/value (not based on type) 🙂

Eduardo Rodrigues Semensati (Procter and Gamble)21:10:40

okay, I guess you just answered with your slide and the 16%

Denee (de-NAY) Ferguson - Director, Technology - Capital One (Speaker)21:10:13

What strategies have proven successful in reducing % of work that is unplanned?

Bryan Finster - Walmart (Speaker)21:10:50

Yes, it's a tool, not a goal.

Ben Williams - Arvest Bank - Sr Data Pipeline Dev21:10:27

But I printed it out on the plotter....

Christopher S Donahue21:10:55

Well Done @halfmoondad

🙏 1
Chris Gallivan, FCA, Builder of JOY21:10:09

Great job @halfmoondad !

🙏 1