This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-10-07
Good morning, everyone! Looking forward to another fantastic day here at DevOps Enterprise Summit!
Reminder: Remember all those talks you attended the first two days of the Summit? Please submit your feedback for those! It’s so valuable for us and the speakers. And after all, feedback is a gift and sharing is caring! Enter your feedback for those talks here: https://events.itrevolution.com/virtual-agenda/ https://devopsenterprise.slack.com/files/UATE4LJ94/F02GHSEB604/feedback-does21us.png
Yes, I’m hoping you are having lots of mutually exothermic interactions!!! 🎉
I made the mistake of not putting all my TODOs (links, books) in my notes — I need to go thru all the Slack history to find them all! 🙂 I plan on writing some tools to make this a little easier, so anyone can do the same.
Learning from last year, I have a Google doc open the whole time where I type notes, etc.
:satellite_antenna: Kicking off our final day is none other than @michael_winslow, here to present Building Confidence in Your SRE Team :satellite_antenna:
Please welcome @michael_winslow, Senior Director of Development and Engineering at Comcast!
MICHAEL!!! buckle up everyone!
AND NEWS FLASH! Last night, it was announced that @michael_winslow was promoted to Distinguished Engineer!!!! 🎉 🎉
Congratulations, @michael_winslow! I remember us speaking together at the DevOps Summit.
Budget Analysis Comparison was my most recent beast! Comparing our Eng team's list of people + spend against what Finance was seeing. "and believe me, there were some discrepancies!" :rolling_on_the_floor_laughing:
(Go ahead and uninstall Microsoft Office. Except for Visio, of course. 😆 )
@michael_winslow you’re the most famous michael winslow in THIS community
Yes, we acquired SKY in 2018. I've visited their offices in the UK ... so awesome!
They do some great stuff. I have worked there on their SAP system, and the SAP team is quite jealous of other teams.
I did another talk for DOES last year where we talked about using the Dojo format to work together across the pond!
I shall have a look and watch it. It will be useful for our new distributed, asynchronous development teams.
@michael_winslow what did your organizational structure look like at xfinityMobile? Were you set up as a "flat organization structure" where you embed SREs in engineering teams, or more of a traditional top-down structure?
We were VERY LEAN. We called what we did DevOps. It was not as concrete as SRE.
I’m so delighted that we had @jpetoff and @cleng share the theory and practice of SRE at Google yesterday, so we can point it at people who say “we just operate the code”!!
@michael_winslow i love your curiosity and suspension of skepticism at the beginning with the team.
SREs are the class of humans that implement DevOps.... there are lots of tools too
If you don't have Culture, Automation, Measurement and Sharing in your SRE team, your code won't compile!!! :rolling_on_the_floor_laughing:
Devs: “who are you to tell us that we should define SLOs and error budgets for our code?!?” 😆 [go pound sand]
I saw PRODUCT at the top of the SRE pyramid... what is your take on that as a foundational need for success?
100% agree. Product thinking in SRE or any Infra/Ops function is so important
Agree as this is the next big evolution at Nationwide to really embrace Product orientation
Long thread on this yesterday. The problem is that there's a gap in product skills in infra/ops. You can either try to upskill, or parachute in. Both have issues.
"Automate away the 100s of routine tasks which create the fog that impedes our vision"
Brent doesn't have time to document!
Confluence history: “his documentation was actually only five steps!” 😆
Just like real developers don't use version control!
That Confluence page looks like an "Interface" now in OOP terms (nothing concrete, all abstract) 😆
this is a great pattern for increasing collective knowledge
Watch the documentation grow! Watch the conditions and branches being discovered!
anyone else find themselves in a Confluence jungle of documents?
i “strongly encourage” my team to use GitHub Markdown pages for documentation in the same repos where the code lives. version control and discoverability are real needs!
@michael_winslow - your Confluence page reminds me of my 2014 deployment nightmare 😞
When I was a junior engineer... "I still don't have access" was a mourning routine
Two things I took away from Brent: 1. I told our Ops folks that our Real Brent was not allowed to touch the keyboard 2. We made our Brent#2 put his thoughts into our Models (MBSE). When our Brent#2 left the company, thank God.
You can automate only when you know what to automate, so you need documentation
Great pattern! Note that the 2021 SODR found that documentation is a high performance capability! Also note that @jmrichardson1 and @emily356 discussed the importance of "if you think you understand something: write it down!" https://cloud.google.com/devops/state-of-devops/
To achieve this would be an absolute dream.
@steve773 @genek - in this quote, I see both fast and slow thinking on behalf of a product team. Thoughts?
@michael_winslow did you actually capture the numbers for toil vs engineering work?
I captured the amount of time saved per deployment and kept a running calculation for every time the deployments were run to the staging environment and above.
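The running calculation described above could be sketched roughly as follows. This is a hypothetical illustration, not Michael's actual tooling: the environment names, the `time_saved_minutes` helper, and the per-deployment figure are all assumptions.

```python
# Hypothetical sketch: running total of time saved by an automated deployment,
# counting only runs to staging and above. Environment names are assumptions.
ENV_ORDER = ["dev", "qa", "staging", "production"]
FIRST_COUNTED = ENV_ORDER.index("staging")


def time_saved_minutes(deploy_envs, minutes_saved_per_deploy):
    """Return the running total of minutes saved after each deployment.

    Raises ValueError (from list.index) for an environment not in ENV_ORDER.
    """
    total = 0
    running = []
    for env in deploy_envs:
        # Deployments below staging (dev, qa) do not count toward savings.
        if ENV_ORDER.index(env) >= FIRST_COUNTED:
            total += minutes_saved_per_deploy
        running.append(total)
    return running


if __name__ == "__main__":
    # 45 minutes saved per deployment is an illustrative figure, not from the talk.
    print(time_saved_minutes(["dev", "staging", "production", "qa", "production"], 45))
```

Keeping the whole running series, rather than only the final total, makes it easy to chart how the savings accumulate over time.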
(suddenly an important thing to do quickly, cloud credentials…)
@michael_winslow - got to ask a stupid question: is Toil an acronym or just hard work?
Neither - see the definition above: https://devopsenterprise.slack.com/archives/C015DQFEGMT/p1633615232263500
i’m loving this animation/slide progression. it tells a wonderful story!
TOIL reduction!!! love it
The existing team was already great at scheduling on-call through PagerDuty. The main change we made there was to collect MUCH MORE information from every incident in order to create engineering tasks.
Did you hear when @jpetoff said "You can't build a wall and expect Dev Teams not to throw code over it?" I love that soooooo much!!!
How do you measure toil? I have a sense of how much I experience (and it's prob high), but being able to measure it and show a reduction would seem to be beneficial. Is this what the DORA metrics measure?
Or do measurements need to be more specific (x time writing release notes)?
If you have a work tracking tool, use that. If you don’t, define it and ask your people. They’ll tell you.
We use Jira... fwiw. Almost everything is a ticket. Sometimes def of done is clear, sometimes less so. Sometimes abused, sometimes not.
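Following the "use your work tracking tool" advice, here is a rough sketch of turning exported tickets into a toil percentage. It assumes the tickets have already been exported (for example, from Jira) into plain records with a hypothetical `category` label and logged `hours`. It does not call any Jira API, and the category names are illustrative.

```python
from collections import defaultdict


def toil_share(tickets):
    """Return each category's share of logged hours.

    `tickets` is assumed to be a list of dicts exported from a work tracker,
    each with a hypothetical "category" label and an "hours" value.
    """
    hours = defaultdict(float)
    for ticket in tickets:
        hours[ticket["category"]] += ticket["hours"]
    total = sum(hours.values())
    if total == 0:
        return {}  # nothing logged yet, so no meaningful share
    return {category: h / total for category, h in hours.items()}


if __name__ == "__main__":
    sample = [
        {"category": "toil", "hours": 6},
        {"category": "engineering", "hours": 2},
        {"category": "toil", "hours": 2},
    ]
    print(toil_share(sample))
```

As the thread notes, the result is only as good as the labeling: if the definition of done or the categorization is fuzzy, the percentage will be too.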
Nice: dumping/passing off work on the SRE team is not a right; it needs to be earned.
“make it the easy and obvious choice” - Michael implementing product management mindset to SRE offerings
Extent to which devs are using certain standardized platforms increases likelihood of getting SRE support
@michael_winslow - my wife just said "I recognize that voice"... Me - yeah that's DJ Boo Boo Wife - YES!
The family that watches DOES talks together stays together
proposed: the essential distinction between SRE and traditional ops is the political power to refuse handover.
Product => Xfinity Stream on (Android, iOS, Roku, Web, Flex) + Chromecast, FireTC, etc
How did you manage to be selective about your workload in a corporate environment, @michael_winslow?
Maybe one talk title: "How not to attend meetings - Devs, how to find & ignore your productivity enemy :)" cc - @topo.pal
@michael_winslow did you institute a “hand back” to the team if a system/app gets too unstable?
Error Budgets have not taken hold yet at Nationwide. Fairly new SRE team.... leadership buy-in is always the challenge.
I've been trying so hard to get error budgets adopted right now. I'm having some success getting buy-in from the field support team, who have to deal with customer issues when they come in. If we can use an error budget to tell them that something is going wrong before the customer notices, they like that.
Also saw them become very popular with our engineers when they had started alerting on something bad (error counts) and were feeling the pain of that very noisy alerting while holding a pager.
Is there a different type of error budget or metric the field support team uses? That has been a problem I hit when I was trying to get them to use something I would eventually want as an error budget.
I'm trying to pick SLIs that are customer facing, so our account teams or module SMEs want to know if the query success rate is below our targets because it means something has happened in that environment that they can go fix to support their customer.
I'm kind of lucky that our technical account managers lean so far forward to support customers.
Haven't succeeded yet. Still preaching the good word and looking for converts.
Comcast is a customer, so it'd be real easy to get together and talk more if you are interested. I think I'd learn more from the conversation though.
“our products always exceeded their defined SLOs” — ah, yes, I loved how @cleng described some of the tough love here, and how SREs can leave the product at any time. A truly self-balancing system. 🙂
@michael_winslow - we have implemented error budget (for real) on teams that reach incident threshold (by quarter)
We focus on the outcomes for customers in phrasing the "whys" for the error budget (versus assigning blame/promoting risk aversion). Once a team exceeds the error budget, it will be on code freeze (duration is left up to the team). During this time, team is asked to come up with actions to de-risk new changesets (usually action items from post-mortem).
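For readers newer to error budgets, here is a minimal sketch of the common SLO-based calculation (allowed failures = (1 - SLO) × total requests), assuming a request success-rate SLI like the query success rate mentioned above. This is the generic SRE formulation, not the quarterly incident-threshold policy described in this thread; the function name, the string-valued SLO, and the freeze rule are illustrative assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class BudgetStatus:
    allowed_failures: Decimal
    remaining_fraction: Decimal  # 1 = untouched budget, <= 0 = exhausted
    freeze: bool


def error_budget_status(slo_target: str, total_requests: int, failed_requests: int) -> BudgetStatus:
    """Compute the error budget for one window using the standard SLO formulation."""
    # The SLO is passed as a string (e.g. "0.999") so Decimal avoids float rounding.
    slo = Decimal(slo_target)
    if not Decimal(0) < slo < Decimal(1):
        raise ValueError("slo_target must be strictly between 0 and 1, e.g. '0.999'")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed = (Decimal(1) - slo) * total_requests
    remaining = Decimal(1) - Decimal(failed_requests) / allowed
    # Hypothetical policy: once the budget is spent, the team pauses feature
    # changes and works on de-risking actions (e.g. post-mortem action items).
    return BudgetStatus(allowed, remaining, freeze=remaining <= 0)


if __name__ == "__main__":
    print(error_budget_status("0.999", 1_000_000, 250))
    print(error_budget_status("0.999", 1_000_000, 1_200))
```

Whether exhausting the budget triggers a freeze, and for how long, is a policy choice; as described above, some teams leave the duration up to the team.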
✨ Next, we're honored to welcome @farleys, who will share Putting the Ops in DevOps – An Infrastructure Story ✨
Very interesting. Thanks @michael_winslow! BTW we use Test/Failure analytics as opposed to an error budget so that we can enforce across all business units
Please welcome @farleys!!! (and one wonders what he did, either really great or really awful, that led to this new role!! 😆 😆)
Thank you @michael_winslow, this strongly resonates! Toil has been the talk of the town for us as well! Love how you stood up an empowered SRE team!
@michael_winslow Thank you very much for this fantastic presentation/discussion. Bravo!!!
Some very similar themes here to the fantastic talk from Capital One from @girija.rao @denee.ferguson and @jennifer.miles!
@michael_winslow Leadership was sold on error budgets. We have a monthly meeting where we go over the error budget, and an OKR to reduce the max errors that we were seeing. The SRE team helped us with analysis and with going to architecture.
(Reminder: add your title and org to your Slack name — it helps everyone! 🙂)
@michael_winslow Great talk! I'm going to be spinning up a new SRE org at Nike and your talk will be very helpful for our future
@michael_winslow great presentation. I've struggled with the concept of the "premium" offering; I have a bunch of "helpers" on my team. I'd love to hear more about how you drew the line.
@nasello.scott - Good luck on that, sounds like a lot of fun and rewarding
Nationwide: $46B revenue; #1 in numerous categories; 8400 technologists; $1.3B IT spend, 2700 applications
Stephen: 30 years with Nationwide, wow 🙂 (he was referring to his family working for Nationwide)
Infrastructure now supports 60 product teams — “this is a new construct for us, took us awhile to be able to determine this”
Definitely interested in the follow-up discussion about the help @michael_winslow asked for with error budgets. Love to join a discussion with , @jpetoff, @cleng and others...
@cleng will be hosting a birds of a feather session on SRE today!
I can share our experience with implementing error budgets for teams that reach incident threshold (quarterly)
This is where our conf value is and especially in-person all these hallway conversations - learning so much from our DevOps Enterprise community
“I drew a pink circle, not red as it’s in trouble, but just to focus our attention and infuse transformation into infrastructure” — @farleys
“I had a guilty feeling — I was so indexed on the software [dev] side of the house that I never paid much attention to infrastructure and ops” — @farleys 😆 A familiar journey!!!
@farleys Was that code for “I used to complain about Infrastructure and Ops all the time?”
We actually did financial benchmarking, using key vendors for those benchmarks in Infrastructure cost.... peer comparisons on financials. We were way off the mark at first look, so we needed to rationalize why.
speaking from the infrastructure/ops side of the house.... sometimes it's hard to understand how to apply agile to our world!!
We have started treating it that way in my firm - it's not an easy road as lots of folks don't quite get it
@nickeggleston we treated it that way....
“Product approach to I&O made sense: we found product orientation to make sense; we found that those teams that built and ran their own services to have better lead times, better employee engagement”
Big mystery: “given global nature of infrastructure, could we actually achieve value stream independence?” A GREAT QUESTION!
Service as a Product vs the other flavors of product orientation is always an interesting conversation. We try to treat them like Product teams with Tech Services Roadmaps etc., but in reality they support our real product teams that Build/Ship/Run real infrastructure and software.
this is so awesome @farleys - driving product thinking into Infra/Ops is a big thing at Target at the moment!
Hey Luke.....I used to work at Target 🙂
Long time ago but did work in City Center office downtown MN. Lived in Plymouth.
@farleys Budget is in the components' hands. We did sub-accounts for all of our components.
Very interested in this, as we organized infra into products earlier this year.
We talk a lot about how product can be defined, and I’ve personally settled on this: company by company, what matters is only that we’re aligned on what the products are FOR US.
Charge back for services is hard. My org, Toolbox@IBM, is in the charge back business to internal teams.
^^^^ THIS. I have worked in both worlds... I vastly prefer not having to deal with chargeback models
“You have to own all elements of products; you have to own unit costs, SLAs, lifecycle, and understand what drives unit costs.” Holy cow, I don’t recall this being taught in HP OpenView certification course @farleys 😆
Yes, stay tuned, Charlie, for the new roles we needed for product management.....
“driven by user empathy” for an infrastructure team! that’s awesome!
If you assume your users have other choices, and your job is to make something that those users love yet that works for your business, good things will happen.
FINSIM: Financial Simplification: created in concert with Finance.
Holy Moly - @farleys - you are articulating my current challenges and giving me so many great ideas on how to progress
44 product teams!!! insane...
That 44 is really just Platform products..... meaning we unitized and charge for them. Those roll into Product Suites, which are departmentalized.
This is the question of Team:Product cardinality. Default assumption of 1:1 is often not valid.
Absolutely true, Charlie.tg: one should not assume a 1:1 relationship of team to product. We have large products supported by multiple teams and small products where one team supports multiple products. When I say that, I mean products that are unitized and have their own backlog of work. We try to create product-team alignment that makes sense, but often a single team might own a couple of products and vice versa.
All columns on right are charged for, unitized. @farleys Is there an easy definition for “unitized?”
@farleys I don't see underlying things like circuits or routers/switches in play here... how are those costs accounted for?
Interesting @farleys, so you consider every vendor product also as a product, example Tomcat? referring to this picture
Was there something that you thought would fit the category of a service team at first but needed to change later?
IAM unitization — high res screenshot here from his slides • Ping Cloud Director: $0.59 / user • internal IAM: $269
The semantics of "Product" vs "Capability" are very interesting. 15 years ago at Best Buy I was a "Product Capability Manager" - the richness of that concept has only dawned on me in recent years. @nicole.forsythe is that title still used?
Sometimes, yes, on the specific product question, like Tomcat or Oracle. But not always unitized at that low level. You will see that the Database Product Suite has those low-level sub-products that we must support and help dev teams with from an infrastructure perspective.
My company has also been more focused on the development side for Agile and DevOps (and still is). How does this structure really help the development teams deliver faster/better?
what's the difference between tech delivery lead and tech delivery professional?
This comes directly from my mentors at https://www.humanizingwork.com/, but I often use the definition of Product Ownership as “Great product ownership is learning and communicating what to build (including who and why) in collaboration with a cross-functional team and other stakeholders in order to maximize the flow of value through that team.”
How does this relate to what Gartner is calling Digital Humanism (if you know)?
They help the software teams first by having product teams infuse self-service automation, helping software teams with monitoring infrastructure components, and providing feedback mechanisms for software teams.
I love all your focus on helping everyone learn thru this transition, @farleys — and your obvious standard for excellence and clarity!!! I told @mik how blown away I was by how you tackled this!
This is making me want to look at moving my insurance.... so many great thoughts and actions...
Great presentation. There is soooo much here that it could be unpacked into a multi-day conference.
Just surprised that Infrastructure engineers were not influenced by the DevOps movement over the years, or by how it could apply to them. Software teams were all over it at Nationwide. Thus the surprise to me when I took this new role.
You're not alone; I'm still hearing "we don't need DevOps for infra"… However, we are including them among the clients for our platform, probably even the first ones to work with to add self-service and automation to infra services, as they're the building blocks for all other products. If we don't get this velocity first, it will be difficult to serve the business products.
Totally agree. I have personally landed on this: infrastructure needs to infuse with Software Teams and also embrace DevOps for what they personally Build/Ship/Run in engineering. If infrastructure is not thinking about continuous flow, automation in all we do, and all forms of monitoring that both Software teams and Infrastructure product teams can leverage.... then how are we really helping the DevOps movement?
The slide pack has so much to take away - I probably need to play it back multiple times and unpack this. Blown away by the effort behind the project-to-product transformation. Awesome @farleys
A great book on internal costing/economics is https://www.amazon.com/Internal-Market-Economics-resource-governance-principles/dp/1892606313 - Dean has kind of been a "voice in the wilderness" on this
“I had passion for how we respond to product feedback: feedback doesn’t mean a meeting — people don’t feel heard.” So good. As @farleys suggests, no amount of meetings make people love a product!
Yes Continuous Improvement Culture and discipline in our backlogs
$25MM in identified savings that we’ll capture in the next couple of years
When do you shift from talking about savings in terms of "This is what we will save this month" to "This is what we will save in the next couple of years"? I try to sell my cost-saving efforts, but they haven't gained as much traction.
@farleys - Love the framework and the ideas behind the project-to-product transformation. Now I realize I was undermining "Platform as Product" in a complex enterprise. This is a real eye-opener. Perhaps open-source all these ideas, or make them an addition/reference implementation to @mik's book.
Working directly with Mik on next evolutions of Product adoption at Nationwide in how we associate value streams to our products
We are a Tasktop customer for integration of our DevOps tool chain. I am personally working with Mik on his new value stream tools and how they might help us with our Product Orientation and thinking as we try to map more at the enterprise level.
Amazing work on this @farleys! One of the best examples of product orientation for operations that I have come across in large enterprises!
Thanks Mik, and can't wait to get working with you on how we can partner in the evolution with Nationwide. Gave you a few kudos along the way today.
Container platform based on Rancher is a growth product — “we’re investing in it, expecting more customers; not focusing on savings.”
i love the distinction between “efficiency” products versus “growth” products. they should be treated differently (three horizons/zone to win).
@farleys this is terrific. I'm talking to multiple large organizations attempting this. Where did you find the product mgmt talent for the engineering offerings, primarily? Engineers upskilling, or product mgrs learning the platforms?
Not sure if you can read it.... but hot off the press this week on what our CTOs love about our transformation (CTO means CIO at Nationwide)
One question on "Vendor lock-in" in most enterprises: how did you deal with that, or how are you dealing with it? @farleys
“Previously, people don’t know what they’re paying for: mainframe? storage? Now they know: databases, compute, storage, mainframe MIPS, IDs.” Wow.
I can't find a link to this session in the "Library". Am I missing it? https://videos.itrevolution.com/?_ga=2.86338463.608380314.1633358218-1611712249.1623960948
Yes, Brian, IT Ops is recovered either thru Product Units, or some is still allocated across business units. The Unitized model is driving transparency and alignment in what is being consumed and is actually driving consumption reduction.
@farleys Wow, thanks for sharing so much about your transformation to a product-centric model... pure gold!!! Great job!
“I&O group now has top quarter Gallup engagement scores; people now know what’s expected of them; they’re accountable for plan/build/run and to delight your customers” ❤️ ❤️ ❤️
Vendor lock-in is real and needs regular work, especially CLOUD lock-in, but also platform lock-in, and even SaaS lock-in or big purchased App lock-in. We struggle with it.
@farleys - did you apply anything like Wardley Mapping to think through some of these challenges?
We did not, Scott, but will look at it..... thanks for the tip
@farleys Steep learning curve with Wardley Mapping for sure, but it can be helpful to identify duplication, inappropriate methods, technical debt, etc. Here's a good thread as an overview: https://twitter.com/swardley/status/971026844664332289?s=20
A pretty good talk from Simon Wardley: https://youtu.be/NnFeIt-uaEc
Loved the presentation, @farleys! Did you have a team of internal champions for training / Dojo/ consulting etc?
Your verticality in a provider determines your lock-in. Accenture has an amazing readout of services comparisons and lock-in potential in their "cloud canvas"
…I love that these teams eventually became sherpas into the cloud, @farleys
Yes we created a Product Champions Model and Guilds to support
“hopefully that’s an interesting infrastructure story.” ummm… yeah. SO MUCH!!!
One thing I could tell you after seeing @farleys' presentation: most enterprises will reboot their platform-as-product transformation work.
🌟 And now, we welcome @kboth_does @christina and @aalvare2, here to present "She’s Not Dead Yet, Jim": Vulnerability and Retrospectives in Emergency Medicine 🌟
@farleys what kinds of help are you looking for from the community?
@kboth_does, @aalvare2, and I are here for live discussions. Feel free to ask any questions!
Very proud to be a part of this community... we learn so much from each other. So humbled by your comments, and I have learned much from everyone at the conference this year.
Please welcome Dr. @aalvare2, Clinical Assistant Professor, Emergency Medicine, Stanford Medicine, who serves as Director for Well-Being and Human Potential, on implementing blamelessness in the ER. Co-presenting with @kboth_does and @christina
I just realized @genek’s books are organized by color. As a former Librarian, I both appreciate and shudder. 😜
That marvelous achievement was done by my boss, @mvk842 . I was so proud I had enough great books in all parts of the color spectrum!!! 😆
@nicole.forsythe - I know…the Dewey Decimal System weeps, but the visually pleasing outcome cannot be denied. 😬😂
I was a college librarian so Library of Congress FTW! I organize mine by my own conceptual linkages…but it’s definitely something I think about a lot.
Slides for current talk if anyone needs them! https://github.com/devopsenterprise/2021-virtual-us/blob/main/DOES21_Blameless.pdf
“what can we learn from the high tempo, very high consequence world of emergency departments”
For people who want some additional decoder rings, here's the resource list from the end of our presentation (with all the book citations we mention):
Chatter - Ethan Kross
Awakening Compassion at Work - Monica Worline, Jane Dutton
The Checklist Manifesto - Atul Gawande
Retrospectives for Humans - Courtney Eckhardt
The Centre for Compassion and Altruism Research and Education (CCARE, Stanford)
Dare to Lead - Brené Brown
Incident Metrics in SRE - Štěpán Davidovič
Behind Human Error - Woods, Dekker, Cook, et al.
Just Culture: Who are We Really Afraid of? - Steven Shorrock
Just Culture - Sidney Dekker
Tribute to Dr. McCoy - https://youtu.be/IAsaHZ-m3k0
The links didn't paste well:
• Chatter: https://www.amazon.com/Chatter-Voice-Head-Matters-Harness-ebook/dp/B087PL8YVQ
• Awaken Compassion: https://www.amazon.com/Chatter-Voice-Head-Matters-Harness-ebook/dp/B087PL8YVQ
• Checklist Manifesto: https://www.amazon.com/Chatter-Voice-Head-Matters-Harness-ebook/dp/B087PL8YVQ
• CCARE: http://ccare.stanford.edu/
• Dare to Lead: https://www.amazon.com/Dare-Lead-Brave-Conversations-Hearts-ebook/dp/B07CWGFPS7
• Incident Metrics: https://www.oreilly.com/library/view/incident-metrics-in/9781098103163/
• Behind Human Error: https://www.amazon.com/Behind-Human-Error-David-Woods-ebook/dp/B009KOE1W0
• Just Culture - Afraid: https://humanisticsystems.com/2016/11/24/just-culture-who-are-we-really-afraid-of%ef%bb%bf/
• Just Culture: https://www.amazon.com/Just-Culture-Restoring-Accountability-Organization-ebook/dp/B07H9G6L68
“I am the Director of Well-Being at Stanford emergency medicine. Clinically, I’m an emergency physician at Stanford. I serve as the associate residency program director focused on quality, patient safety, and process improvement. I also serve as the co-chair of the physician wellness forum at Stanford. By the way, you are my friends here, so please feel free to call me by my first name.” — @aalvare2
haha, yes, Al'ai is very down to earth
The presentation from Dr. Chris Strear, the Idealcast with Trent Green, the healthcare chapter of HVE by Dr. Spear, and the discussion of the new case study in DevOps Handbook v2 have me so excited for this talk. I've started reading books about improving healthcare; it is so interesting. So great to be able to learn things from them too!
focus on individual, team, and then environment — individual focus helps avoid finger-pointing.
bringing people back to the shared goal of saving peoples’ lives. yesssss. excellent!
Very interesting @kboth_does! Thanks for the references as well! Been looking forward to reading Chatter!
We also have a resources slide at the end of the slide deck, Chatter is on there
leaders set tone by sharing their own mistake first; “I was nervous when patient first arrived and went into cardiac arrest; my focus caused void in leadership” (!!)
one important technique promoted by dr. amy edmondson for promoting psychological safety
MrsK is amazing at doing this in her clinical skills retrospectives - fascinating to see the difference between education leaders and management
@nickeggleston No, if it's an immediate, in-the-moment post-procedure debrief, it's okay to be small. Everyone's stressed and tense; even sharing a small mistake can make a big difference.
"what's the 1% change I can do to better the outcome?"
focus on 1%: what is that small change that could change/deflect the outcome; especially for devastating events; that we can actually achieve
A mentor of mine recently shared the concept of moral foundations theory as a way to create the blameless environment that’s required for psychological safety. He recommended the book https://dividedwefall.com/2018/07/15/the-righteous-mind-moral-foundations-theory/#:~:text=In%20his%20groundbreaking%20book%2C%20%E2%80%9CThe,%2C%20Fairness%2FCheating%2C%20Loyalty%2F.
Interesting talk, excellent, @christina! I love this slide from @aalvare2 and his explanation around it:
inspire ownership: “Dr. Christina did a great job; how about we have her lead the rest of the debrief” (decision made before-hand)
We personally don’t because it would impact the psychological safety of the room, but we did debate it. (Not that we are in the same industry and not that you asked me.)
Generally not - there are not facilities to do that in the hallway 🙂 And it could undermine the safety of the conversation.
I find a lot of value in pattern-matching problems, but it seems like you are saying iterative learning is more important. That's a new way of thinking about things for me.
"The goal of debriefing is to unpack and create a shared mental model."
We use the term project post-mortem so frequently that we forget that the term comes from medicine! This is so awesome!
This is a good point. Noting that this talk uses the term "debriefs" and not "post-mortems" because of the very real correlation in emergency medicine. Curious your take @aalvare2.
After Action Review is a better term, in my opinion. It takes away the stigma from the word, “Debrief”
post mortems are fine for physicians, although again, it implies someone died. We use AAR for celebrations, too, when things have gone well and we want to highlight positive deviations in practice.
It’s interesting how every industry has its own verbiage. In Captain Marquet’s Turn the Ship Around, they got away from “debrief” as it was laden with meaning and tradition. I often prefer “retrospective”, but for a team inoculated against Scrum by bad experiences, that doesn’t work.
I like Learning Reviews. It does, however, also imply we have to learn something (we do… though in the heat of the moment, it may feel like gaslighting to those who’ve already learned something, i.e., they were directly involved in a medical harm, etc.).
AAR is agnostic to whether it was a positive or negative outcome. It’s just a review.
debriefs are the quick, immediate recaps; case review is the later, bigger discussion
there are so many parallels here to the technology world — multiple, different perspectives: nurses, doctors, social workers, …
Technology has so much to learn from other industries - instead of trying to reinvent stuff.
overriding goal: ensure shared mental model — we didn’t all see/feel the same thing; give people heads up beforehand, so they’re ready, vs. processing/surprised in meeting, feeling defensive (so much pre-work for the leader!)
The pre-meeting heads up makes a big difference in the level of psychological safety the team feels
In this space you do not waste time in meetings the way "Tech" does, because it is all about "Saving Lives". That is one thing we (Legacy Enterprises, or the old mindset) could learn from this.
a tough case, where patient died; lots of mixed emotions; after resuscitation; surgeon not happy; why? shared perspective, but I want to hear from you b/c no time to brief senior, I stood next to surgeon, to ensure he/she not alone; I never expected trauma surgeon to say “I screwed up”, because so different than norm. It was huge, b/c we had obviously created safe space for professionals to share mistakes.
"Compassion allows people to move past self blame and into a mode of curiosity - what can I do better?"
@aalvare2 referring to Mindfulness 🙂 Oh yes - not just in medicine but everywhere
how do you coach people on self-blame? I think people need to experience the negative emotions, like grief, but not feel shame; don’t try to cheer people up, b/c it sets up internal conflict; you’ll have many mistakes, and you’ll feel crappy, but this is where compassion is so important. Can’t do coulda/shoulda/woulda game. Be kind to yourself.
Also fits with the points that @jmrichardson1 and @emily356 made in their workshop yesterday about supportive connections 🪶 and just like me
"Just like me" - came up with @jmrichardson1 and @emily356 and now here
How does legal risk management feel about admissions that someone screwed up? Often they are very concerned about potential lawsuits.
great point. There’s a lot of work about talking about mistakes and even admitting it to patients and how this minimizes litigation.
Here’s someone at Stanford Law who leads this effort: https://law.stanford.edu/press/medical-errors-patients-want-doctors-hear/
high consequence ===> someone died, with all sorts of ramifications to individuals, teams, and orgs.
@aalvare2 - I'm interested, how did you get introduced to the DevOps / SRE community? What are you most surprised about when working with this community?
Al'ai and I met through a year-long compassion cultivation mindfulness program at Stanford, we also share that program in the resources slide.
Yes, I’m honored to be here, and grateful for the opportunity to work with @christina at Stanford Center for Compassion and Altruism Research and Education via the Applied Compassion Training.
In essence, this is a manifestation of our training… applying compassion to each of our workspaces.
I've always loved hearing from the physicians in our community such as Dr Richard Cook. Thank you for being here and for your presentation!
Compassion and movement away from blame is a form of empowerment - so people can shift the focus to how one can do better, 1% better
If you think of something the next day that you could've done better, do you find ways to discuss that with the team again?
debriefs can also be about celebrations: after an incredible resuscitation, let’s pause and celebrate: “this is what it feels like to save a life” (goosebumps too!)
"This is what it feels like to save a life." Like @emily356 said yesterday, focus on what's working well
@jmrichardson1 yesterday talked about the importance of recognizing the good stuff, too! Very similar thematically.
Well done @aalvare2 - I won't be surprised if you are now keynoting most of the tech conferences 🙂
@aalvare2 - I am going to say this to my kids every single day from now on! (In addition to my usual, “make good choices!” as they get out of the car. LOL. Asian mom style.)
We can also have a whole conversation about responses to self-blame
• do acknowledge the feelings without trying to cheer the person up or negate their feelings in the moment
• do know whether it's appropriate to respond with relatability or not
@aalvare2 small groups that crowdsource and then discuss subtopics on a theme
(I would love for our Incident Tech Managers to be like this.) @aalvare2 - Love your work
thank you. To be clear, this takes an entire team for after action reviews to work effectively. It takes training and practice.
IMHO, this is what our incident managers in Enterprise IT miss.
to leaders: in emergency medicine, people can die or have long-term complications; you can’t say that it’s ok. But also recognize that there’s only so much you can do: what are systems issues vs. what is inherent to the disease itself? Massive heart attack, not right: how can a top-flight medical institution prevent a heart attack? Maybe interruptions <--- in our control! Or maybe it’s because of the pandemic, when we knew so little about it <--- that is out of our control
All the book recommendations will be in the resources slide
designing for well-being: how do leaders use language to create learning and psych safe org
Language matters - lots of people forget this and it often causes issues in retrospectives
Love Checklist Manifesto - I give it to all of my team leads to help with documentation of work...
This is an incredibly valuable session and perspective, thank you
At Stanford: “peer review” ---> “case review”; it’s not about the peer who already feels shame; it’s about dissecting the case, understanding how we got to the outcome; and we protect physicians and frontline workers.
"Just like you, I've made many mistakes. Just like me, you'll make mistakes, too"
Before, would get pages and pages of procedures and not quick actionable checklists... this book helped them see the difference.
I have to do what @genek does - Holy Cow. So much in this talk
Hard to keep myself from smiling during the call, b/c the recording felt so magical when everything we've prepared for came together
Kurt: don’t use word “why” — use “what” or “help me understand”; “why” asks a judgement or agentive: who can we hang/blame for this?
Courtney Eckhardt's talk: https://www.usenix.org/conference/srecon19asia/presentation/eckhardt
There are direct lines to be drawn between our response to self-blame and the three components of occupational burnout: 1. increasingly frequent feelings of emotional and/or physical exhaustion 2. increasingly frequent feelings of inefficacy 3. increasingly frequent feelings of cynicism / disengagement
Applied Dekker --- amazing to hear and see in action...
did you do everything you could have done in the moment - yes
@kboth_does: avoiding Why questions seems similar to how @allspaw has moved incident inquiry away from Five Whys.
Also see Courtney's talk: https://www.usenix.org/conference/srecon19asia/presentation/eckhardt
I’m going to guess she quotes Chris Voss, Never Split the Difference? That’s where I first heard “why is an attack in every language”.
I'm not sure - would have to rewatch it, but yes, that's definitely part of her talk. She also covers lots of other language nuance.
Kurt on the danger of counterfactuals: cannot play would have / could have / should have; the goal is to understand how it happened;
“language is a powerful tool for shifting culture” < change your conversations, change your culture 🙂
M&Ms -> A&As, amazing and awesome! let's celebrate wins!
another reason why using the term “debriefs” alone does not capture the positives. After Action Review highlights positive deviants as well.
We brought “Save of the Month” during pandemic; Morbidity and Mortality rounds: in old days: get in front of room, humiliating; “Save of Month” only highlighted one person; “A&A: Amazing and Awesome” highlight positive variance for everyone!!!! (Profound lens to view these rituals)
“highlight positive variance”. let’s spotlight some wins!
DOES friends, I will not be surprised if we now see @aalvare2 keynoting on our tech conference circuit. Doc has got that, whatever it is.
@aalvare2 - It will be really good to listen to you and our influencer @allspaw together - This will be a great session.
I would also add that, at DevOpsDays NYC 2020, we had a wonderful Open Space discussion on Burnout that was attended by a trauma / ER doctor who wanted to learn about solutions to burnout. This was held the first week of March, just before lockdown. She attended DOD not because she wanted to learn more about DevOps but because of the common themes.
Dr. Shannon McNamara is gold. She’s one of our faculty for the High Performance Resuscitation Teams Summit we’re hosting in May 2022. https://ce.mayo.edu/emergency-medicine/content/2021-high-performance-resuscitation-teams-summit-postponing-may-2022-canceled#group-tabs-node-course-default1
Shannon is a friend of mine as well - indeed one of the deepest thinkers in her domain and doing lots to bridge across domains ❤️
How are those A&As and M&Ms put together in time? Do they happen at the same time, are they different events?
A&A over M&M reminds me of @esh saying “feed that which you wish to see grow”
I love the idea of changing the vernacular to shift culture - I would love to hear how others have not only changed the words but followed up with actions too. What did that journey look like? What challenges did you find and then solve? What successes did you have and how?
This from @christina seems like part of the answer: https://devopsenterprise.slack.com/archives/C015DQFEGMT/p1633619247001600
Here’s a recent Tweetorial we did at Stanford Emergency Medicine as we try to use simulation to practice running after action reviews. https://twitter.com/alvarezzzy/status/1442899032674746368?s=20
@aalvare2 So amazing; what role does “senior” signify? Thank you @christina and @kboth_does for introducing Dr. @aalvare2 to this community!
In medicine, there’s the attending (faculty), there’s the senior resident (the most advanced learner), the junior resident (interns and mid-level training), then the medical students at the earlier stages.
There’s also nursing, pharmacy, respiratory therapist, others plus in trauma, the surgeons and their levels as above.
Sometimes we hear about Fellows, are they faculty as well?
The components of building a culture of learning & psychological safety: language, rituals, and role creation.
Director of [Staff] Well Being: help train future emergency physicians; I created role for myself: training, support and case reviews; I still get palpitations when I get those emails of events Kudos to you @aalvare2 and to Stanford Medical!
With working in a fast paced environment, have you had problems where people wanted to just move on to the next patient instead of stopping to learn? Is there a way you have brought them along for them to get excited?
@aalvare2 what is your approach/plan to learning about developments in the areas of focus you have shared?
Can you please elaborate? You mean, how do we operationalize this? https://twitter.com/alvarezzzy/status/1442899032674746368?s=20
retention/recruiting: $500K to $2MM per doctor who leaves due to burnout; burnout leads to physician suicides: 400 physicians die of suicide per year. That’s what motivates my work. Med students have lower burnout than their peers outside the profession; then burnout climbs at every stage, until it is 2x that of their peers outside of medicine.
Great to hear more about Mental Health issues. We need to talk about this a lot more.
High achievers tend to be perfectionists, which can be dangerous.
We need to understand the need for self-compassion, self care, and recognize mental health challenges in ourselves and those we lead.
OMG on the burnout. Family members in their first years of being attending physicians are going to be sent this talk...STAT ❤️
aviation safety: so thoroughly studied by Dr. @ronwestrum who spoke yesterday
fascinated by the seemingly intrinsic tie between quality outcomes and staff well-being
@genek - THIS IS GOLD, PURE GOLD - Well done @aalvare2 @christina @kboth_does!
thank you. It’s definitely an inspired talk thanks to @genek and team. We wanted to highlight similarities in our work, our shared human experiences/common humanity, through learning in adverse outcomes. Kudos to @christina for leading this effort, and also to @kboth_does.
real talk: what are the architectures in tech orgs that condemn technology workers to burnout and their customers to horrendous outcomes? What are you as the leader willing (and responsible) to do to fix it?
Excellent closing about Vaccination and frontline workers - well done
From @aalvare2 “Please support initiatives that support the mental health of medical and frontline healthcare professionals; and thank you for getting vaccinated: you protect yourself and everyone else around you (and frontline workers)”
Yes, ask enough whys, architecture issues can be traced back to leadership decisions as well
Really excellent presentation on the importance of these retros and some good how to tips
what a session! really puts things into perspective! @christina @kboth_does @aalvare2 Thank you!
@christina and @aalvare2 met through Stanford Mindfulness Community <----- the importance of Connections.
Love the mindfulness - taking a moment to reflect is a gift we rarely give ourselves.
Now we need to include mindfulness meditation at the end of every session or during break hours
@aalvare2 can you expand on what you said at the end of the talk about pausing to consider the other's perspective when your gut reaction is to completely disagree with what the other person said?
Generous interpretation is important. Using open ended questions, appreciative inquiry… I’ve learned so much doing this because not everyone will have the same frame.
This was great...I really enjoyed hearing about specific behaviors that contribute to a learning organization. It's rooted in psychological safety & feeling comfortable learning from incidents on the job, being in an environment where the org can capitalize from the learning without emotional toil.
Thank you Dr. @aalvare2 for teaching us about all your amazing work, and improving the systems that support patients and healthcare workers at Stanford Medical. 🙏 And thank you to @christina and @kboth_does for introducing him to us. ❤️ ❤️ ❤️
I could use more guided meditation at the end of meetings.
I wonder how to handle the apparent conflict between psychological safety and cognitive dissonance... :thinking_face:
Thank you everyone! Such an honour to be here!
can we get that mantra in writing please.. May i be happy… may i be..
May I be happy.. May I be peaceful... May I be free from suffering
Dropping that in my discord.. Thank you for this moment!
@aalvare2 @christina @kboth_does this was an amazing and rewarding talk - thank you!!!
⭐ It's a privilege to now welcome @amandas from Microsoft who will be discussing Leadership and Remote Work ⭐
@christina leading a two-minute mental meditation to end that phenomenal talk. WELL DONE - THAT WAS AWESOME!
I am so so excited to introduce the next speaker, @amandas, CVP of Product for Developer Tools at Microsoft, who I’ve admired for years. I am so excited that I finally got to talk with her a couple of weeks ago, after I heard an interview of her on Scott Hanselman’s podcast!
Woohoo! @amandas in the house! So excited for this talk!
@mik and I have spent years marveling at the decades of achievements of the famous Developer Division at Microsoft. I can’t overstate how excited I was to finally talk with her.
Bootcamp refers to the astounding 2 week onboarding process in DevDiv — did I get that right, @amandas? (So delighted and honored that you’re here!)
Our Bootcamp is essentially the initiation class to our division. It's a 2 week course that new employees take within the first 3 months of their job in the Developer Division.
When was Bootcamp created? Was there a compelling event that caused it to be created?
"Don't want decisions to be made by the Highest-Paid Person in the Room" (HPPR) [want them made based on data]
The first week is an intro to Microsoft, the developer business, engineering practices, etc. The second week is a "Customer Driven Workshop". Everyone gets to write down their hypotheses and interview a few customers (real actual people who use our products or who we want to use our products) to test their hypotheses.
Hehe can we take the bootcamp even though we don't work for Microsoft? :rolling_on_the_floor_laughing:
Highest Paid Person's Opinion now we all work from home?
The Bootcamp was created in 2017, I believe. In part because we had TERRIBLE "net promoter scores" for new employees.
This is so amazing, @amandas! I’m sure others can resonate with this, or have lived it!!!
“we need our people to become the experts, and need to show that they can make decisions, that we trust them”
This statement also reminds me of Netflix's "No Rules Rules" book by Reed Hastings
‘Act in the company’s best interest’ (underpinned by: we trust you to determine this best interest). Reminds me as well of a talk with J Willink and Sacha Labourey from Cloudbees, roughly saying that if you want people to take leadership, you have to give it to them: even if as a leader you have 95% of the solution and someone comes with 80%, let them go with it, as the energy they’ll put into their own plan will exceed the difference
IBM has a program we call "Jumpstart" ~2 yrs long. New developers attend it. Various stages, design thinking, development, patents, public speaking, etc.
Two years for Jumpstart? How much time does the long version take?
“for any given segment of market we can compete in, they now have first-hand experiences of the people we’re trying to serve” — “senior leaders provide global context; teams operate independently with as much local context as possible”. So beautiful, @amandas!
yes to this. facilitation and meeting management needs to change when not everyone is in the same physical space.
Need to create pauses and poll the room for contributions -- so true! as the quick talkers often dominate the conversation
I laughed … because I can relate to this myself. : )
One of the many things I admired at Etsy when visiting @allspaw in 2013 was their huge investment in conference room AV, optimizing for remote participants. It was unreal. One of the best sound systems I had ever seen.
“…you have a lot less meetings” - Something that so many companies should learn…
Much more asynchronous communication and it needs to be snackable
Yes THIS! The "snackable" / bite size content still feels like a problem for our folks.
I’m marveling that @amandas is modeling the behavior of Admiral John Richardson (@jmrichardson1) and Captain @emily356 — Leaders pause to think about the systems that we work within — rethinking the use of meetings in remote only, moving from meetings in Old World to deliberate shift to async communications.
"being more intentional about bringing people together... and knowing what we want to accomplish" speaking my language!!!
Gather has been eye-opening for me, very much has the feeling of a team room and hallway conversations
How do you keep async conversations via messaging from becoming long-running sync conversations?
@amandas who do you look up to as leaders (companies/people) for the new model of fully asynchronous, distributed, remote-only ways of working?
Learning by Osmosis.. sounds like how we used to do onboarding pre covid
Is there a way I can help leaders want to have engineers talk to customers? I have asked every manager and director I have had but after 4.5 years, I still haven't talked to a customer.
(To explain my reaction, @amandas — when I was at Tripwire from 1997 to 2010, every customer interaction was so fraught with consequence. Sales/account reps orchestrated and controlled everything, and developers talking with one directly was… unthinkable!!! 😆)
(Story does end well: in 2005, I helped lead the UX initiative at Tripwire, to atone for all the terrible things we built, and made it common practice to talk directly with customers. Although it did lead to horrifying discoveries. 🙂)
I'm not sure who I look up to per se... Always open to recommendations! I have a ton of colleagues, peers, and folks on my team who are constantly teaching me. Monty Hammontree on my team has been absolutely pivotal in terms of bringing outside ideas into the team through reading books like the Culture Code, etc.
Do you blog about this or publish your consolidated learnings and references?
This is great! Always have those people who are bringing new ideas or questions in from outside our own (team/dept/company/social) echo chambers.
I do occasionally, Monty Hammontree, Travis Lowdermilk, Jessica Rich are all prolific writers that often post on LinkedIn and Medium about our process.
@amandas ... followed you, monty, travis on twitter... but haven't been able to find jessica. is she on there?
The notion of video conferencing fatigue is real: seeing in the GitHub study from Dr. @nicolefv last year how leaders are having to work longer, doing more coordination due to remote work, was startling.
Seeing a tendency for people to schedule video calls that could be an email because the video call is such low friction vs. scheduling a conference room, locations, etc.
We need to do a video conferencing about how video conference fatigue is real
In the new remote-only world, we have to be very intentional to discover and create experiences and learnings that just happen by osmosis for those who are in close proximity
I've had to re-think our "incremental planning" offsite and split remote time to a 3hr, 5hr, 2hr event over 10 days
I implemented an optional 30 minute meeting for my staff. I call it "Funday Fridays". The requirement is you have to have your camera on and we do not talk about work. I have been doing this for a year now and the staff have grown closer.
to make the 5 hr work, I did a 1 hr ask a question about how this is going to work 'pre-meeting'
I totally can relate to what @amandas just said: there was a remote meeting for our kids’ school, and after a long work day, it took SO MUCH WORK to look engaged, and finally turned off camera, and it was such a relief! I’ve never felt anything quite like that.
Its good for me to hear how burnout is also affecting others in meetings and its not just me. Thank you for this.
You are in A LOT of good company @dacahill7. 😄
Interesting to me how Gather feels very different from being on Zoom meetings... anyone else have the same sense?
I'm still struggling with Gather, in person I'm generally fine, but this was hard for me
A weird Gather experience I just had tho, was someone who had their mic unmuted, just walked past me and I got a "feedback" double video from his on-camera that really confused me
Gather has lots of room for improvement, yet I think the gamer world and community has a lot to teach us about creating and sustaining enriching distributed, remote experiences
@amandas You seem to have thought deeply about how people collaborate and work, and revisited the assumptions behind them. Do you allocate time to think about things like this? Where does this category of thinking fall in your roles/responsibilities?
Software is a team sport. At their core, our products are designed for teams to collaborate on, deploy together, take accountability for together in operations, and to continually improve. So, from my vantage point, studying how digital product creation teams function is core and essential to what our products need to do. I'm lucky that my job is very meta and I get to have my own team "eat our own dogfood".
Kudos to all the incredible decisions coming out of all this incredible and impressive cerebral work — it’s so admirable! 🙏 I suspect that these practices will be rapidly adopted!
I've found that trying to replicate every in-person meeting with an exact virtual replacement of the same length is like replacing a meat dish with a meat-like substitute. It's not going to be the same. 🙂 This is the opportunity to try something different. We've switched to shorter workshops, offline work, etc.; async coordination is so true!
Not all meetings have to be camera on, or at least not all the time for sure. Otherwise it feels like having multiple one-on-ones simultaneously
Realistically, in lots of remote meetings 90% of the attendees are not closely involved in the discussion and do other things in parallel, compared to face-to-face meetings. That is the realistic picture in my experience with remote meetings.
I actually feel like this is one of the most ideal parts of having these remote meetings. Because it allows for more parallel work.
Though it does make it necessary sometimes to say “Please pay close attention to this content”
I do not know about that, @christopher.rueber. That is the most context switching way of working and therefore not ideal IMO.
Not as productive as face-to-face meetings. Due to these remote channels, the number of meetings has increased manifold compared to pre-COVID times. Focus and productivity are impacted for those who work on complex projects, because every couple of minutes there are remote meetings scheduled, even for small things.
It really depends on how well the company adheres to busting Parkinson's Law. If you have meetings that are an hour, and always expand to that time, it’s nice when you can keep rolling things forward while listening to the important bits of the meeting. Most meetings have “high points” that I want to capture, but I don’t necessarily need to be part of the whole conversation to get there.
Ideally if meetings didn’t expand to fill the time allotted, then you really wouldn’t have people that feel the need to multitask. In my experience that is pretty rare.
Many people used to multitask on their laptops during in-person meetings as well -- except when the person who called the meeting insisted on people putting all the electronics away
Does this imply that the strategy session before was focused not only on what needed to happen but also on the how, and that the "leaner" version is much more about the why and the what?
“Podcasts by team members teach people, but also exposes the person behind the content, and their careers behind the story” <--- enables Connections. (Seems so important, given how many times this has come up in this conference… cc @jtf @abd3721)
Just going to say that during this non-camera-on-conference, I've baked 35 rhubarb pies (from own garden), made mulled wine essence for all of December, prepared christmas cookie dough for the freezer, made chili con carne and potato leek soup. YAY for consuming awesome knowledge without cam on. ❤️
I could basically host a watch party on pies at this time. The geek factor would be even higher if I'd made them with Raspberries.... 😉
Great point about the "Human" @amandas. In most enterprises, people leaders/managers do not even know their own team members: family, background, social/economic background, birthdays or anniversaries. They expect people just to clock the time and "Delivery", and of course meetings.
Influencing decisions has been difficult in the remote world. I think Amanda's point about data-oriented writing is key. Would love to learn more about that.
This is a great way of managing remote and creating a solid culture.
Satya's book "Hit Refresh" is a great read (cc - @amandas)
Satya Nadella on his role: shift resources and be the cultural ambassador
Speaking of bringing folks into the industry, it's great to see women leaders presenting at conferences. Great role models.
"Customer obsessed, like they are oxygen or sun light!" ❤️
I think it’s so exciting and so beautiful that DevDiv was on the vanguard of the Microsoft transformation, and the extent that it’s spread beyond DevDiv, @amandas
Role of manager - be a role model, be authentic, what are the values of our teams and how we live our values every day
“Everyone, including managers and leaders, needs to grow and needs mentors; so part of the role of the manager is to enable that for your teams. If you’re not coaching people and helping everyone on a learning journey, then you’re missing something very important.” — @amandas
for sure: the same boundaries between personal and work don’t exist anymore.
Nice. Some senior leaders at IBM are also talking about bringing your personal life/feelings to work.
that is one positive that has come out of all this — recognize the WHOLE person. not just the “work” person.
Great morning session on day 3 at DOES. Why does DOES matter? All these Slack messages are the evidence.
The extent to which so many orgs here are hiring developers is astounding: e.g., Vanguard has 500+ openings. Almost every talk has ended with “We’re hiring!”
A friend of mine wrote about the gift of "full human presence" when being a mentor: http://www-personal.umich.edu/~bcoppola/publications/42A.%20Human.pdf
“Reduce the friction required to get the work done”
I think we need to rethink what the minimal set of work needs to be. Just because you can solve a problem with software, doesn't necessarily mean you should, we need to focus on the Why
Also, on the "should we do it"? Good to Great has awesome insights into focusing on what makes your org special. If it's not what makes your org special, buy it!
Simon Sinek - Start With Why, Infinite Game, Leaders Eat Last - some of my absolute favorites
…and @amandas is connecting the dots from ^^^ to ensuring the leadership systems and other systems are in place to ensure that leads to achieving business outcomes. Awesome.
Amazing @amandas - Love your thoughts and ideas; A great reminder to all of us (rest of us) what to look for in our work environment, our mentors and human connections
IDC reported last week that we expect a >4M shortage of developers world wide by 2025.
I have worked remote only for 10 years and the things I miss the most are those impromptu breaks with other coworkers in the hall, in the lunch room, etc. 10 years ago that was especially hard but infra and tools have improved dramatically. Can that be reasonably replicated virtually now?
Gather looks like a very interesting approach to this.
I've found it really hard to replicate those side of desk conversations
Even in cubicles, hard to replicate. Team rooms by far my favorite. Gather has feeling of a team room.
Deep Work by Cal Newport describes the 'perfect office' where there is collaboration space but also offices with door space. I interviewed at a company that had nearly replicated this with sliding glass doors to close off cubicle sized spaces with a "conference" table and whiteboard walls/glass all around
I love team rooms -- I hated the cube farm and constant source of distraction those represented
I self-identify as an ambivert, but I need my 'deep work' alone time to recharge, effectively playing an extrovert at work.
Naturally introvert, but can play extrovert with enough focus and energy
the working on other things during a meeting is a rare luxury for me - we don't have any personal electronics at work and are mostly in the office
I've mastered the art of taking notes in a meeting I'm hosting on a page/email and insist on having the keyboard turned over to me in nearly every meeting...and have now started challenging others to learn how to listen & type
I’m still struggling to get my head around how ya’ll work without access to the internet. Not being able to easily google something just 🤯
@joe.waid - I'll add that to my list of "can I talk about this at a future event?"
@bryan.finster486 - who knew learning to type on a TYPEWRITER would continue to come in so handy?
I’m 6-6. Means I have large hands. Keyboards are too small for touch typing. I never learned. 😄
typing without looking is a real super power. I’m an oddball who learned Dvorak so I had no choice but to master touch typing. 😄
excuses. I'm 5-0 and literally had to get the smallest mouse ordered. Good news: no one uses my desk at work
Thank you thank you @amandas so much for teaching us about leadership, the role of the leader, and rethinking deeply held notions of what work should look like! Help she is looking for: dev productivity: to what degree are teams truly supported? time to first contribution! Would love stories of culture changes to enact that to happen!
…and that the next generation of people creating software is diverse and representative of humanity!!!! Thank you @amandas!!!!
Thank you @genek for bringing the best from industry to our summit. The DOES standard is now so high. In-person or virtual does not matter.
Great interview @amandas and @genek... gonna have to re-watch it multiple times
I feel inspired and refreshed after listening to that. Much needed booster shot (no pun intended!) :)
One thing I can say about inclusion is that, as a hard of hearing person myself, the live-captions on my browser in Zoom sessions, the chats, slack etc. have given me super powers I did not know I had!
Thank you for sharing so much great information @amandas!
So much awesomeness shared in that talk - thank you @amandas!!!!
I just heard @jeff.gallimore mention the wrap of the conference and felt the pang of anxiety that things will be ending far too soon
Reminder: The action has moved to the breakouts! Join the following channels to interact with speakers live while their talks air: #ask-the-speaker-track-1 #ask-the-speaker-track-2 #ask-the-speaker-track-3 #ask-the-speaker-track-4
@paul_littlefair and @lbmkrishna - really liked the presentation! Can I ask, what underlying systems have you wired up to Tasktop Viz at this point?
Ok, interesting...we're on the same kind of adoption journey (with help from @dominica herself!)...same two major tools, and are in the debating phase about other / smaller spots where work happens
The impact of your talks and contributions on folks like me and on other enterprises is tremendous
Reminder: The final plenary sessions are starting again in 5 minutes. Start making your way back to your browser. https://devopsenterprise.slack.com/files/UATE4LJ94/F01D34MC2KS/image.png
Last Chance! Grab your free digital copy of these two amazing books before copies run out. Agile Conversations: Change Your Conversations, Change Your Culture (sponsored by Split) Sooner Safer Happier: Antipatterns and Patterns for Business Agility (sponsored by LaunchDarkly) Get yours here! https://members.itrevolution.com/free-ebooks
⚡ Starting the wind-down this afternoon will be the dynamic duo of @shelby and @lizf presenting, Shifting Left on Production Excellence with Observability ⚡
All the pictures show Production on the right, but @lizf and @shelby's talk is "Shifting Left on Production Excellence with Observability"
<start flame war> yes, developers should be on-call for their own services. Yes, they should not be set up to fail because of it, though; they need coaching (SRE) help to run their services.
haha. Joined a little late and heard "Many things can break and we don't necessarily know what it is up front..." and my immediate thought was "Huh. Sounds like honeycomb". so... branding++ 😉
relevant: https://charity.wtf/2020/10/03/on-call-shouldnt-suck-a-guide-for-managers/
IMHO: In most complex Enterprises - folks do not know how many services are running in PRODUCTION; You get different data from different catalogues LOL
@lizf - @shelby does what you do 🙂 glass of water 🙂
Technical decisions are business decisions 🙂 Love That
I was wondering why my voice sounded so ragged but then I remembered that I was still recovering from a cold when we recorded this
I heard @damon say "Operation is the BUSINESS", I will add this to the list now by @shelby "Technical decisions are business decisions"
“our customers shop our stores, i am your colleague” ~ @bryan.finster486
operations is also the group closest to the actual customer
OMG. had to reboot laptop after it crashed! Thank you so much for presenting, @shelby @lizf!!!
@lizf - when you say event, is that a measured metric event or a tripped metric threshold?
we encourage you to think about SLOs as being per transaction as opposed to per time window
it is much more dangerous for your service to break on Black Friday if you're in retail than at 3am on a random Sunday in March.
not every time window is equal. thus count number of transactions succeeded vs failed
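A minimal Python sketch of the per-transaction (event-based) SLO idea above, assuming a hypothetical `Event` record with a `succeeded` flag and an illustrative 99.9% target. It counts good versus total transactions instead of good minutes, so a busy window like Black Friday automatically weighs more than a quiet 3am. The names and numbers are illustrative only, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # One user-facing transaction; a hypothetical shape for illustration.
    succeeded: bool


def event_slo_report(events: list[Event], target: float) -> dict:
    """Event-based SLI: the fraction of good transactions, not of good minutes."""
    total = len(events)
    good = sum(1 for e in events if e.succeeded)
    failures = total - good
    sli = good / total if total else 1.0
    # Error budget expressed in events: failures the target allows in this window.
    allowed_failures = (1 - target) * total
    if allowed_failures > 0:
        # 1.0 = untouched, 0.0 = exhausted, negative = overspent.
        budget_remaining = 1 - failures / allowed_failures
    else:
        budget_remaining = 1.0 if failures == 0 else 0.0
    return {"sli": sli, "failures": failures,
            "allowed_failures": allowed_failures,
            "budget_remaining": budget_remaining}
```

For example, 5 failures out of 10,000 transactions against a 99.9% target gives an SLI of 0.9995 and leaves about half of the 10-failure error budget unspent.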
That thought also brings to mind that if you have scalability you may want to use that same schedule to show you are ramped up already to support that more important window.
These 9's are crazy numbers - some architecture documents state infinite nines, but there are no tools to measure that or its impact.
I love this observation of mobile devices over cellular networks shaping SLOs, and how 99% is often enough, because mobile reliability is so poor. The famous 2014 Ben Treynor Sloss talk referenced this, too!
There are also some stats on the reliability targets for pacemakers that put "9"s approaches in sobering light.
Question @lizf - have you seen architects care about these 9's now in the observability world, or do they just stop at Visio diagrams?
I love @lizf's notion of monitoring SLO budgets: “RELEASE THE CHAOS MONKEYS! UNLEASH MORE DANGER!!!” 😆
I was shocked when I first read that Google intentionally injects failures to get back to the SLO if you are running over it. It makes sense once you pause and think about it, though.
making your service down 1 less nanosecond per year doesn't matter if your users in practice experience minutes or hours of downtime per year due to being out of cell signal
I think it’s the probability of any transaction failing/erroring – independent of # of events.
if you are not thinking of reliability as a critical functional requirement, then you are doing a poor job as an architect
Without proper engineering what remains is only "CHAOS" :rolling_on_the_floor_laughing:
@lizf observability tools monitoring the actual state of our systems:
release the chaos monkeys = learn about the dark corners of your system
hmm, some just release the monkeys and forget about learning. What to do?
I find it helpful to emphasize that it's an experiment and to reference the scientific method. what do you expect to happen? how are you going to measure it? (it's why observability is critical)
i love the “investment” perspective. you can over- and under-invest in this with real business consequences.
also, going way over your SLO target means users get used to that ultra-high reliability and start depending on your service for things it wasn't meant for
a new standard customer support response: “it was probably your wifi.” :rolling_on_the_floor_laughing:
I 💗 this > alert on error budget burndown rate, not every potential cause of alert
Prob of all cell phone not becoming available is very low and hence back end service avail must be very high
alerting on burndown rate is my #1 favorite thing about SLOs
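A hedged sketch of alerting on error budget burn rate rather than on every potential cause. Burn rate here is the observed error rate divided by the budgeted error rate (1 - target), so 1.0 means on pace to spend exactly the whole budget over the SLO window. The two-window check and the 14.4 default (spending 2% of a 30-day budget in one hour, since 0.02 × 720 h = 14.4) are assumptions modeled on the multiwindow burn-rate approach in the Google SRE Workbook; the function names are hypothetical, and both windows and threshold should be tuned per service.

```python
def burn_rate(good: int, total: int, target: float) -> float:
    """Observed error rate divided by the budgeted error rate (1 - target)."""
    if total == 0:
        return 0.0
    error_rate = (total - good) / total
    return error_rate / (1 - target)


def should_page(long_window: tuple[int, int], short_window: tuple[int, int],
                target: float, threshold: float = 14.4) -> bool:
    """Page only when both windows burn faster than the threshold.

    Each window is (good_events, total_events). The long window (e.g. 1 hour)
    keeps a brief blip from paging; the short window (e.g. 5 minutes) lets the
    alert clear soon after the problem stops.
    """
    return (burn_rate(*long_window, target) > threshold
            and burn_rate(*short_window, target) > threshold)
```

With a 99.9% target, a 2% error rate in both windows is a burn rate of about 20 and pages; if the short window has recovered to zero errors, it does not.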
(or worse, if the refresh is what causes the service to fall over. 😆
fun story once upon a time a change I worked on brought the entire South Korean mobile network down
because I disconnected all the clients with hanging GETs to http://google.com/android/services at once, and they all reconnected at once.
a hanging get takes no attention from a cell tower. a FIN and then new TCP handshake takes a ton of resources.
Mind literally blown. That’s one of the wildest things I’ve ever heard. Thank you!
and just let the client send a request to hanging GET at a random time, get a RST, and retry
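The reconnect storm in this story is a classic thundering-herd problem. A minimal sketch of the "reconnect at a random time" idea, assuming hypothetical helper names and an arbitrary 600-second spread: each client waits a uniformly random delay, which caps how many reconnects land in any one second. It models only the timing, not the TCP handshake cost on the towers.

```python
import random
from collections import Counter


def reconnect_delays(n_clients: int, spread_s: float, seed: int = 0) -> list[float]:
    """Give each client a uniformly random delay in [0, spread_s] before reconnecting."""
    rng = random.Random(seed)  # seeded only so the example is reproducible
    return [rng.uniform(0, spread_s) for _ in range(n_clients)]


def peak_per_second(delays: list[float]) -> int:
    """Bucket reconnect attempts by whole second and return the busiest bucket."""
    buckets = Counter(int(d) for d in delays)
    return max(buckets.values()) if buckets else 0
```

With 100,000 clients and no jitter, every reconnect lands in the same second; spread over 600 seconds, the busiest second stays in the low hundreds.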
it's like inrush current when servers are turned on. So even if the rack is running fine at steady state, right after a power outage they all try to boot and poof blown breaker - and we are dark
Very cool... or persist the TCP state so other cluster nodes can take over the already active connection and operational state.
I will not commit code without unit tests, and I will not commit without instrumentation either - by @lizf
instrumentation coverage is as important as test coverage
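A minimal sketch of treating instrumentation like test coverage: a hypothetical `instrumented` decorator that records one structured event per call (name, success or error type, duration). The in-memory `EVENTS` list is a stand-in assumption; a real setup would send these events to whatever telemetry backend the team uses.

```python
import functools
import time

# Stand-in sink for illustration; a real setup would ship events to a telemetry backend.
EVENTS: list[dict] = []


def instrumented(name: str):
    """Hypothetical decorator: emit one structured event per call of the wrapped function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            event = {"name": name, "start": time.time()}
            t0 = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                event["success"] = True
                return result
            except Exception as exc:
                event["success"] = False
                event["error"] = type(exc).__name__
                raise  # instrumentation observes failures, it does not swallow them
            finally:
                event["duration_ms"] = (time.perf_counter() - t0) * 1000
                EVENTS.append(event)
        return wrapper
    return decorator
```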
I really want to introduce a set of artifacts to be delivered before something goes live, like observability, documentation, change documentation, decommissioning
Honeycomb's PR template includes a question: "How are you going to observe this change as it goes out?"
OMG. Mind blown, @lizf https://devopsenterprise.slack.com/archives/C015DQFEGMT/p1633642461265500?thread_ts=1633642408.262000&cid=C015DQFEGMT
doesn't have to be a lot of overhead. just like, look where you're going
What bothers me a lot in this space is that SLOs remain just on paper; some people do not know how to measure them. What is involved in measuring them?
something something I maaay have built a solution for that having been dissatisfied with what was in the field
People default back to SLAs, because that's what middle managers are actually measured on
(psst http://honeycomb.io/slo now that I have tooted the horn of a couple of other players in the space as well)
also, the upcoming O'Reilly book Observability Engineering discusses event-based SLOs in detail https://www.oreilly.com/library/view/observability-engineering/9781492076438/
free copy of the book: https://info.honeycomb.io/observability-engineering-oreilly-book-preview (but you can also get it through Safari if your org has a membership)
This practice also seems like it will help improve confidence in stakeholders if I start my project with instrumentation.
it doesn't matter if your RTO/RPO is 20 min if you have 5 outages in a week 😉
so instead track your SLO for the past month or quarter to maximize user happiness
I think RTO/RPO is more for DR, and SLOs would be more in line for this item..
right. RTO/RPO/TTR are useful metrics for bounding how many outages you can have
but also if you have very few outages because you've made all the small stuff not be outages
would you look at the Constraint on telemetry and that should be first?
I see so you still need to focus on what is being measured and how before you can really look at telemetry and find the right constraints.
this is the link on that previous slide https://leaddev.com/monitoring-observability/observability-and-your-business
(thinking: “why can’t I take down entire cell phone network of a country? why does @lizf get to do all the fun things?” Sorry. Still recovering from that story!)
Importantly, do not rename Ops or Sys Admins as SRE; SRE seems like an altogether different skill set. OLD WAYS WON'T OPEN NEW DOORS
But also do retain and train, don't try to fire and replace with outside people. The skills are teachable
but if i am testing code in prod, then the dev has to be on call. what am i missing ?
@shelby explores where/how are we going to get all the SRE/Ops skills needed for what tens of developers will be writing?
how do we help new devs grow from "cloud-naive"? https://twitter.com/shelbyspees/status/1446160124637106177
👏:skin-tone-3: And now, to bring us to a close is none other than @allspaw, presenting: Learning Effectively From Incidents: The Messy Details 👏:skin-tone-3:
Brilliant as usual - both @lizf and @shelby - I am delighted that Liz you spoke at my "Continuous Learning Series" - BNZ Thank you
I loved your close and call to action, @shelby — and thanks for proposing this talk!!!
I talk about 10 Deploys a day often... helping my teams understand that is possible...
@allspaw giving us the BLUF (bottom line up front) 💯
this is the paper being referenced https://www.researchgate.net/publication/224753269_The_Messy_Details_Insights_From_the_Study_of_Technical_Work_in_Healthcare
@allspaw I loved this talk from this morning from @aalvare2 from Stanford Medical and @christina @kboth_does describing the realities and consequences of this. Was such a beautiful talk! https://videos.itrevolution.com/watch/625427503/
@allspaw let me guess… you fired that engineer, right? just a bad apple, right?
@jeff.gallimore Like you should and what is right 😄
Slides for this talk: https://github.com/devopsenterprise/2021-virtual-us/blob/main/DOES2021%20-%20Allspaw-Messy-Details.pdf
I love this presentation — @allspaw describes the performance improvement plan he put an engineer on after a disruption in a revenue generating service.
I wonder whether remembering the exact items and remembering where to find the details are the same type of learning?
Quoting me from earlier today: "Ops stories are cooler. Like the time I held all company data on my knee because someone had not installed the rack rails..."
I love this observation from @allspaw on what is required for an event to be remembered, which is nearly a prerequisite for the right learning to occur.
I agree. I think my sentiment was that stories about something exposed to customers failing makes a more dramatic story.
They can, but I think if you were to ask hands-on folks about non-customer-impacting incidents, they’ll have stories that are worth listening to!
I didn't say customer-impacting, but customer-facing and live systems. Near misses are good drama. Whatever happens in a pre-production environment doesn't have that tightrope, even if the stakes could be high in other ways.
Also, drama does not equal educational. It just naturally tends to circulate better.
Different people will also remember things differently even on the same event...
If there is someone wanting to do incident analysis or fill out a ticket before taking the steps to actually resolve the incident, how do you help them want to act more urgently? They are trying to make sure we go slow enough to be safe, but I don't know how to ever build their trust enough to get decision rights. (I just got called during these keynotes because of this problem.)
YES! Adjacent to this, I love the way this was articulated in your 2017 presentation “How Your Systems Keep Running Day After Day”. Different people have different assumptions and perceptions about how the system works. Capturing those as part of the post-mortem is always fascinating. https://youtu.be/xA5U85LSk0M
Big fan of your work @allspaw. Appreciate you taking the time to share your work with the community!
(actually, I would agree that Ops stories tend to be wilder and more dramatic than Dev stories, which are mostly about tabs vs spaces. 😆
I’ve a few impactful (positive) dev stories. My ops stories are hair raising.
Well... there is the story about writing the bootloader for an embedded platform where we had the wrong datasheet for the NAND chip...
... cutting the bootloader code in half... twice
Unless the dev story (about something like... not validating input?) results in the exfiltration of huge amounts of sensitive customer information...
The time I deleted the application for calculating tax in Argentina and it wasn’t discovered for a month.
If it's YAML, it will crash with tabs
The official writeup has to be in 4th order normal ITSM form. @allspaw
Wait, is this because Allspaw looked like Broderick when they were both young?
yeah people would always confuse them and he never recovered
@allspaw you are a hero of mine and the inspiration behind the postmortem writeups I've published at work
at my end of the keyboard - OMG Allspaw recognized my comment! Seriously, though. I'm still recommending folks watch your talk on incidents at DOES from 2018...can't find the link - had the diagrams about what we know (wee bit) and what we don't (massive)
@vmshook - This one is fantastic too if you haven’t seen it previously -> https://youtu.be/xA5U85LSk0M. (Great talk Tues btw!)
anemic:
a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network
https://github.com/JDHarris007/coe - This is the postmortem format I often point to. Really excited to hear where this goes. 🙂
I'm excited to hear that this was an innocuous change (typo fix or similar)
“we started a reboot fest. dividing up machines to reboot”
(this is a different story of the same incident I mentioned at the beginning, BTW)
Sad part is at most companies this would be a Resume generating event - instead of using it to learn from and get an ROI on this event...
This is why I cringe when clients tell me they have “less testing” for “minor changes”. You don’t know what a minor change is until AFTER you deploy!
John is in the Halloween spirit with these horror stories.
That slide transition was AWESOME! 😁
“a CSS issue forced the entire fleet of web servers to require a power cycle”
Solution: maximum recurse depth of 404 pages! 😄
loving how you say richest understanding, not most complete. b/c there is always more
There are a lot of PDFs that John has for you to read. 🙂
this is the value of studying the humanities and social sciences
I picked up a habit from Amazon to combat hindsight bias, which is to type out all of my thoughts in chat to make reconstruction way easier.
also super valuable for helping your teammates learn from you
I saw $x at $y which makes me think it's $z. Going to look at that next.
Well, there could have been a call to stop apache before the touch, then a call to start it again after.. It’s not really a fix, but it might have reduced the down time from ~70mins to maybe 15mins or something and then a real fix could roll out. It’s a hatchet toss
Was “1 hour and 10 minutes” the part of the story you’ll remember? The part of the story you want people to hear?
“I am pushing my CSS change, because the worst that could happen is that the text won’t be centered anymore” cc @justin124 😆
Ha. Earlier today I argued that you don't want to issue a security audit for a change of moving something 3 pixels north through means of CSS. 😄
but what Protocol could we use to read our HyperText?
One place I worked had a field for the change request "What is the worst possible outcome of this change" I always wrote "The heat death of the universe" it took about a dozen changes before someone pulled me up on it.
Tracks well with The Big Bang being the root cause. 😉
You try writing a valid test to see if you're going to end life as we know it.
I get the point of the question, but if no one really considers the impact. I certainly wouldn't have believed a small CSS change could take down hundreds/thousands of servers until I heard this story.
I spent 20+ hours on my last incident review. 13 Interviews with a total of 21 individuals. I got most of these things in it 😀. Certainly could do a lot more to craft it into a story though.
better writeups and review meetings have also encouraged more people to speak up during an incident when they have a hunch
I'm excited to see if the new VOID (Verica Open Incident Database) will help make it easier to learn from incidents across the community and make it more readable.
We’ll likely learn more about what orgs are willing to publish about their incidents.
I love that the Etsy Three Arm Sweater award was in the DevOps Handbook: https://codeascraft.com/2012/05/22/blameless-postmortems/
A year long how it started how it’s going! 🤯
https://twitter.com/rynchantress/status/806576489798004737?lang=en
A certain company has a "Major Incident Meeting" process... they want zero MIMs this year. Suggested solution: don't invite the meetings. 😄
I think the name of the process is ... unfortunate
Time for everyone to get on the same zoom like DOES Europe?
👏👏 thank for your enlightening us @allspaw -- I'm on a mission to get richer stories on post-mortem reports
Thanks for speaking! I don't think it's an exaggeration to say that I think of your talk during every incident I respond to. It's really helped me practice stepping back and looking for more than one root cause. Even this morning, there was an incident I avoided the down call for and I was able to come up with 4 contributing factors that would've helped us avoid the outage.
Excellent! A great exercise might be to ask others who responded to see what they come up with, without mentioning yours, and see if and how they overlap. 🙂
Thanks for the talk @allspaw - sorry no great stories except for my working culture is one of NOT sharing but providing the bare minimum of info
Loved the talk, @allspaw! I was reading a defense of detailed case studies as an important learning tool (vs dismissing it as an "anecdote") and was going to link here. but of course now that I am looking I can't find it.
This is the first that I've been to, but I really enjoyed it! So much value! Thank you!
My first DOES and I absolutely want to attend next year. Everyone was awesome.
Note that the Facebook outage this week brought back PTSD due to the "strange loop" similarities with our CSG outage that is in the DevOps Handbook v2 @allspaw @genek
Keep an eye on slack in the coming weeks and months so we can keep the conversations going!! Don't disengage!
Thank you ITREV team!!! @mvk842 @mollyc @annp @kearav @alex @annan @leahb!!!
Thank you @mvk842 for sharing @genek with us via the conference!
Gather might be my favorite newest addition to this virtual format. Overall really great conference yet again (4th year attendee). @genek @mvk842
Does it have to be over? Will be following and looking for many of you after today. Thanks to all. I just love this community
@mollyc and Team...YOU ROCK!!! Can't wait to see everyone in person...soon!❤️
@genek Do you have a high quality picture of the books behind you? Or maybe a list somewhere?
@allspaw I have a failure story where the documented post mortem never reflected the true activities/issues, only found the details from the individual stories people told 1-1
@dian.hansen this is incredibly common! I can count on one hand the organizations where the write-ups and verbal stories have a lot in common.
Experimenting with hybrid (physical + virtual at the same time) vs physical then virtual (or vice versa) would be super interesting for A/B testing
Yes, I’m LOVING the virtual format. As an introvert it got really tough as DOES got bigger, and the virtual format made it much easier to control interactions and still get to talk to people / meet a few new people.
Virtual and then physical would be pretty cool…meet/chat people online and then get a chance to talk in person.
Here’s what our plan is right now: to always have one virtual conference and at least one in-person conference (for sure DOES US, possibly another continent). Because we built our own conference platform (+/- of being founded by a software developer, we are very fussy about how things work), it would be a shame to just let it go the way of the Dodo. 😆 We also love that a virtual format makes DOES content and community so much more accessible both to attendees and to us as organizers - we’re able to invite speakers who would not normally have time to travel to a venue and deliver a talk at our conference. All of this is to say that DOES - Virtual isn’t going anywhere!
@genek Thanks! Can we keep Gather.town open for conference attendees for a few weeks?
Such a great group of people here. Loved all the 🥐 conversations. Now I need to rewatch half a dozen of the talks and take notes.
So many videos to watch (or re-watch).... so little time...
I’ll find a more complete picture later — sorting by color achieved by @mvk842. 😆
Very much appreciated the absence of velvet ropes, though it was also rather unnerving.
Reminder: Please submit your feedback for the talks you attended. It’s so valuable for us and the speakers. And after all, feedback is a gift and sharing is caring! Enter your feedback for those talks here: https://events.itrevolution.com/virtual-agenda/ https://devopsenterprise.slack.com/files/UATE4LJ94/F02GHSEB604/feedback-does21us.png
I would like to have 2 options - Remote and in-person maybe 2 different conferences ... thoughts
Thank you all! This was my first DOES conference and I enjoyed it immensely!
Thanks IT rev team for this edition (my fourth or fifth? But only one in person, not the easiest one to follow… in Beijing)
And hopefully, we may have some story for a future edition if things start rolling out at scale, in our traditional business, in a not moving very fast country… 😆
@allspaw Part of the problem, many times, is that Incident Resolution templates are too long and data-dependent (I've seen some with about 5-6 separate date and time fields)...one has little interest in documenting the details that really matter.
Where I worked, we asked people to fill in those items only if they mattered or were quick and easy to fill in. The story of what happened, how it was resolved, and anything else that can speed up response and resolution is required.
When there’s a template, what gets written is often what follows the template. As a result, people don’t read them because frankly…they’re not interesting, because templates != stories.
When the focus is on fixing, not much time is spent on describing. Rich descriptions pull people in. If the understanding of the incident isn’t very good, fixes that come from it aren’t likely to be any good either. 🙂
This is an amazing conference Thank You everyone that made it possible ❤️💌💚💙💜🤎🖤🤍🧡💛💗