This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
@sebastian.bertoncini https://videos.itrevolution.com/watch/549298333
Reminder: Get yourself to your seat in Chelsea for the opening remarks. We’re kicking off the final day of the DevOps Enterprise Summit in 15 minutes at 8:45am PDT! https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Reminder: We want to hear your stories from the Summit. What did you learn? Whom did you meet? What ideas are you taking back with you? What actions are you planning to take? Post in #summit-stories! https://devopsenterprise.slack.com/files/UATE4LJ94/F05UKU0HBTP/stories.png
Reminder: Remember all those talks you attended the first two days of the Summit? Please submit your feedback for those! It’s so valuable for us and the speakers. And after all, feedback is a gift and sharing is caring! Enter your feedback for those talks here: https://doeslasvegas2023.sched.com/ https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4P024W/feedback.png
🎉 Kicking us off this morning is: George Kraniotis, Director of Software Engineering and Erin Daugherty, Director of Product from Discover Financial Services.
Forget the program committee; I just wanna apply for the music team. 🔥
Product areas at Discover Financial Services: Card set up, manage my account, re-engage with my card, transfer balance, strategic partnership, credit actions, portfolio enablement
I don’t know how this talk ends, but the Sweet Tango Apple is the new best Apple in the world.
Another fine product of the University of Minnesota (who also developed the Honeycrisp).
That’s what I hear…but until I have more sample size data - I’ll still be in the honeycrisp camp.
What a duo! George: leading 170 technologists; Erin leading 17 product leaders
Card Posting and Transactions (post transactions, calculate interest and fees, among dozens of capabilities, spanning cloud and mainframes): 100+ engineers, 5700 batch jobs, 60+ components;
Last re-architecture of CPT? The year 2000. (23 years ago!!!)
I promise to post my Spotify list to this Slack soon!
A lot easier to add business logic in a big zoo than clean it up...
"We needed to stop treating CPT just like a backend ___". (what word did she use? In my head, I filled in "dumping ground", but I know that's not what she said)
"Even our most senior engineer didn't realize how reliant they were to things in CPT"
What we discovered: CPT was 13 domains and 65 capabilities. (There was no way one team could support all these domains because of cognitive load)
90 days of event storming to do the archaeology of 23 years of strata of changes (in the dumping ground)
First two domains: API Engagement, Delinquencies; the result was two engaged teams, now 8 dedicated and persistent product teams; and acknowledging the need for a shared platform team
When the business can name an engineer who helps solve their problems, THAT is a problem!
Ah. "Treat CPT as a product, not just as a backend technology"
I still have nightmares of people asking me “Do you have a P-code?”. Funding product teams for the win! :first_place_medal:
@erindaugherty - how do you think about “making business logic human readable and accessible to all”? I’d love to hear more about this!
🎉 Next up: Rosalind Radcliffe, IBM Fellow, CIO DevSecOps CTO at IBM
To state the obvious: I could say those things in my introduction because there's no way that Rosalind could. 😂 But I thought it was super necessary to set the context.
11 data centers, which they no longer own. 1K+ applications managed as pets. Your mess for less, and stays a mess. And now you need to bring it all back in!
"What skills mainframe shortage? I found them!" (She was explicitly not allowed to hire from any IBM clients. But if I understand correctly, some people chose to retire early and join Rosalind at IBM.)
It says a lot about the leader when people continue choosing to work with that leader. Go, @rradclif!
There's a whole other talk here on building great teams that she could tell. I want to hear it too.
"I have systems that have been written 50-60 years ago. I have systems that were written to support the moon launch that are still running". (!!)
And we were thinking the 23 year Discover CPT platform was potentially heritage!
"If they actually need to go into production, it will be read-only"
It's fascinating that all this innovative improvement work is happening only when these thousands of systems are being insourced — as opposed to the economics that lead to "your mess for less, which will stay a mess"
I have never heard someone say 8 9’s with authority and credibility before. It feels weird.
Quite the opposite. I believe her… because she IS “the Hammer”! 🙌
315.58 milliseconds per year of downtime is essentially 0. I agree, measured how?
Measured by systems availability and application availability with 0 minutes planned downtime. It helps to start with hardware designed for it but it also takes running a sysplex which allows you to move work dynamically without impact.
She is putting on a masterclass right now on how to humble brag about being the best of the best in leadership.
99.999999% (8 nines of availability) monthly downtime would be approximately 0.026 seconds.
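For anyone who wants to check the eight-nines numbers quoted in this thread, here is a minimal sketch of the arithmetic. It assumes a 365.25-day year and a month of one twelfth of that; the function name `downtime_seconds` is made up for illustration and does not come from the talk.

```python
# Allowed downtime for a given number of "nines" of availability.
# Assumption: a year is 365.25 days; a month is one twelfth of a year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_seconds(nines: int, period_seconds: float) -> float:
    """Return the downtime budget in seconds for the given period."""
    unavailability = 10 ** (-nines)  # 8 nines -> 0.00000001
    return period_seconds * unavailability


yearly = downtime_seconds(8, SECONDS_PER_YEAR)
monthly = downtime_seconds(8, SECONDS_PER_YEAR / 12)

print(f"{yearly * 1000:.2f} ms per year")
print(f"{monthly:.3f} s per month")
```

Under those assumptions the yearly budget comes out at roughly 315.58 ms and the monthly budget at roughly 0.026 s, which matches both figures posted here.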
Focus on Run vs. Change — I have to imagine there can be no ROI on rewriting 1000s of applications that have been running critical business processes for IBM for decades. Interesting technology change.
315.58 milliseconds of downtime per year. Gracious.
Would love to learn about how this was setup
This amount of (cultural) change for a company that still supports moon-landing services is inspiring
There is so much courage behind this talk @rradclif in leading such a change. It is not only knowing the tech, you got talents to follow you. Bravo! :right-facing_fist::skin-tone-2: :left-facing_fist::skin-tone-2: 😎
There was a talk at DOES last year about re-skilling and giving folks an opportunity to be welcomed into our open community.
Yes! A Lightning Talk by Alex McCleod. She is absolutely amazing.
I will pass along the message to her. She will be so touched that her talk was remembered. Her work with ReUP is so important.
Also very impressive her career and credentials as a woman and how rare that was when she started her career!!
Absolutely, I have no people management responsibilities but do have leadership responsibilities
🎉 Next up: Christof Leng, SRE Engagements Engineering Lead at Google
Mohawk, Bond music and awesome tech talks! You've got to love tech oriented conferences!!
My favorite @cleng quote: "you really don't understand a system until it's on fire... in production... with live customer traffic!"
Nothing like a failure in production to surface unacknowledged dependencies...
And some generals are demanding an explanation....and time to correct...
Drifting from economy of abundance to economy of scarcity... applies well to the reliability
People need to be able to point out the weaknesses and vulnerabilities in our systems.
Less that it's incentivized, more that it's not disincentivized - the incentive is then that by admitting the issue exists, others will help solve it and you will be able to learn from the process
“Where there is fear, you do not get honest figures” - Deming
How dare someone from Google highlight that something like DORA metrics aren’t the only thing to measure 😳
I imagine you folks may spend more time sharing the disclaimer than the metrics these days 😅
“The best way to understand a system is to watch it go up in flames with real user traffic.” 🔥 😳 :rolling_on_the_floor_laughing:
"When we look at history, heroes tend to have a short lifespan."
In a world that rewards fire fighting, no one wants to go into fire prevention…
If you reward firefighting you get an army of arsonists
Ops worst nightmare. You're in the middle of the night, working an outage, alone, not knowing what to do.. and you have no one to call.
your therapist and pizza/beer delivery peeps
Don’t run buckets of water into the fire faster, invent a fire truck.
“Automate yourself out of your current set of tasks every 18 months”
"You need to aggressively automate. Not for efficiency, but for consistency. Without doing this, you get more pets."
It would be nice to expand our community understanding of heroes in the system. I'm hearing it spoken about as always a bad thing but I don't think that's universally true. Sometimes a great outcome of a strong system is that its design inherently offers the space for heroes to emerge.
I like @genek ‘s concept of “scenius” and “interesting friends” instead. “Heroes” usually fly solo, which leads to dependence on an individual to solve problems rather than a community that changes the scene.
Heroes are celebrated because they protect and save us from a disaster... but we need to learn from the hero's journey to make the necessary and difficult changes so that the hero can retire
So if we talk about heroes, it should be more about the Justice League than Superman.
Being a hero should only happen under very special circumstances and not be a career path. Like production freezes, you sometimes need heroes, but you should never mistake them for the solution to your problem.
Agreed. And I don't mean hero as in the weekend worker... I mean the team member who emerges to solve incredibly vexing problems. Like the enlisted member who fixed the water pump on Adm Richardson's nuclear submarine. The culture on the sub allowed him the space to fix that pump on his own. It encouraged his behavior. we all surge when we have to but must take liberty when we can
@mreele perhaps the distinction is in the definition of a hero and designing systems that try to minimize the need for a hero. In the situation you described, there was a need for someone to step up, and absolutely agree that the culture provided an outlet for them to do that. But what happens next is of interest to SRE’s; how do we build systems that minimize the need for people to do extraordinary work to accomplish ordinary goals? It’s a balance, and unfortunately, tech has a lousy track record of exploiting extraordinary behaviors instead of systematic thinking about ways to meet that need.
But yes, we should build a culture that encourages people to solve hard problems and be bold in doing so, but then goes the next step to figure out ways that those bold steps are required less often for system dependencies.
Yep, I completely agree. I personally am having dissonance with the concept of heroes in the system as part of the system design and encouraging that exceptional performance yet needing an architectural resiliency that doesn't require heroes at all.
I’d love to hear more discussion about this distinction as a community. I think there’s enough ambiguity around the term hero that it warrants it. And this brief exchange showed me that I’m looking at it from one lens and there’s definitely more.
“People think automation is about efficiency. It's really about consistency.”
Was thinking the same thing as Christof described winging it, pushing out a config change without version control or review. 😂
Production freezes don’t solve the underlying problem. They just pause them, temporarily.
Yes! They're a mitigation, not a fix. Sometimes you need a mitigation, but you should never confuse it with a fix.
:thinking_face: If you add a $ to “Don’t deploy on Fridays”, does Charity still appear in the chat?
I should have said: When your code hits production should not be when it gets tested for the first time.
‘Don’t be a Hero. SRE heroes leave an influence and that creates a culture of over working’
The bigger your forest, the more lightning strikes. Don’t try to reduce that count, improve your response.
Everyone has a test environment, some people are lucky enough to have a separate production environment...
Anyone can build complex systems (even by accident) - try building simple systems
Hilarious. Christof always pushes the big red buttons, and deletes code that says "DO NOT TOUCH" to see what happens. 😂
Signs that my 3yo has strong SRE potential: (1) likes to push every red button (2) likes to break things to see what happens (3) gets overly excited around chaos (4) likes to nap during office hours
"Don't touch" technical debt is a booby trap for change.
Lines of code deleted is my favorite stat in a pull request
The person in the upper-left is the late Dan Kaminsky, famous for his work "saving the Internet" by coordinating a critical DNS fix. https://en.wikipedia.org/wiki/Dan_Kaminsky HD Moore, inventor of Metasploit: https://en.wikipedia.org/wiki/H._D._Moore
Your capacity is equal to the capacity of your constraint
The work and 🧠 power of this collective community is being applied to critical life or death situations! ❤️
Latency sensitivity is a great lens for prioritization and sequencing ⏱️
🎉 Next up: JD Black, Director of Digital Transformation at Northrop Grumman
Helluva call to action. I’ll reiterate my recommendation of (Re)coding America. The Cavalry isn’t coming. Become the Cavalry.
Love the passion and call to action I feel after Josh Corman’s talk! The Cavalry isn’t coming!!! Bravo sir!
Public policy and regulations aren't sexy. But it's often written in blood.
So true. Bad policy leads to many harms, including death, but those responsible rarely are held to account. Incentives...
Systems engineering: 50 pounds of software ---> 250 pounds of documentation. Systems engineering enabled massive successes, but they don't know what to make of DevOps practices.
This incredible effort brought Authority to Operate down from 9 months to 2 hours. 🎉
Once more, It is all about breaking silos between all stakeholders including system engineering team, not just Dev&Ops!
One success: they detected a late discovery of an interface compatibility; they were able to quickly fix. (I will need to watch this video again.). These are amazing examples of true integration of systems integration into the daily work of the rest of the value stream. Amazing.
Some of the things that JD is talking about, he wrote on weekends to prove these concepts out.
For anyone who missed the context, JD's team is focused on shooting down incoming nuclear missiles
If people want a slower overview of the pandemic related mapping and lessons: https://youtu.be/XrSVXbWGZHw
If you’d like the decade later (emotional) birthday keynote to the hacker conference where we were born: title: “And together we crossed the River” 1 hr starts here: https://youtu.be/Eh6b1H_-U20?t=2008
Hoping everyone can join our talk, “APIs, Optionality & The Science of Happy Accidents” up next in Chelsea! It may not sound like it, but it’s a pirate story… Arrrrr! 🏴‍☠️
Reminder: The breakout sessions are starting in 5 minutes. Start navigating your way to whichever session you’re attending. https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Pirates need wealthy merchant ships to plunder, and some legal cover to avoid the costs... :rolling_on_the_floor_laughing:
@internettitan @mattmclartybc would Miners be a better metaphor than Pirates?
There are so many metaphors! Farmers might even be better… sow the seeds, see which ones grow
We talked about miners/drillers/farmers with the idea of natural resources… pirates won for two reasons 1) we liked the idea of the “land of 1000 shovels” 2) pirates are more fun 🏴‍☠️
Interested in optionality, you might like Chris Matts' and Olav Maasen's https://www.infoq.com/articles/real-options-enhance-agility/principle.
I'm new to DevOps community can someone give me a quick explanation of pets to cattle? Thx!
https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
Missing from that post are what I think to be some of the key elements of the metaphor:
• Pets have names and their owners care about them as individuals
• They are expensive
• Cattle are generally interchangeable, not named, and the owners of the herd don’t care very much about individual members of the herd
• It should be easy to cull problem cattle from the herd
• It should be easy to expand the herd
In other words, you have a totally different approach to managing cattle than to managing pets
That's a quality trophy there @nathen.harvey and @amandalewis!
Culture is essential for people to thrive, which drives a thriving company
I’m sure @nathen.harvey said that point about the culture first 😉 . But there’s this person named John Shook… :thinking_face: https://sloanreview.mit.edu/article/how-to-change-a-culture-lessons-from-nummi/
Elite teams are back in 2023! Woot! ... wait, @nathen.harvey arguing against eliteness as a goal? Oh c'mon Nathen-with-an-E! 😋
Reminder: The breakout sessions are starting again in 5 minutes. Start navigating your way to whichever session you’re attending. https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Reminder: The final plenary sessions are starting again in 5 minutes. Start making your way back to Chelsea. https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
And anyone who knows English can be in charge of computers... and data... 😂
Damon's big surprise: he found more enthusiasm for AI/LLMs in the internal operations people communities vs. external product revenue generating side of business. Why? The opportunities to reduce margins (people, toil, coordination...). Until now, difficult to automate.
Interestingly, IIRC, Patrick Debois' work was driven and funded by Marketing (at least in short term)
If you hear "train a model", stop the meeting. Too hard! 😆
Operationalizing vs building seems more relevant than buy vs build
Would need an AI to be quick enough to capture this great material... so quick 🤯
A big shock: I was blown away when I heard that companies like Notion were using 3-4K token prompts — which was 50% of the token limit of the time.
The real differentiator is not technology, but the ability to innovate around it. (Agh, missed the quote.)
“Capacity to navigate” is an excellent focus for org performance!!!
@genek after @joshcorman’s talk this morning I feel like I should give you the shirt off my back.
After running 1hr workshops for two days, I highly recommend enabling constraints for making exercises productive :)
Wow. Depts participating! Internal audit! Risk! Compliance! Board member in charge of infosec!
Nice. • strategic work vs. "more work" • flow vs. headcount • missionaries vs. mercenaries • community vs. zero-sum
Great terms in this talk: “connecting org islands”
Another Marty Cagan name drop in the DevOps community. Nice.
🎉 Welcome, Moied Wahid, EVP, CTO Consumer Information Services, Housing, Verification Solutions, and Employer Services, Experian
100s of PB data. Access controls super critical. (All relevant to PII, I'm guessing.)
It's amazing to see the structure of this highly evolved MLOps stuff. Hundreds of models running in production, a pipeline for getting them into production, A/B testing in performance, challenger models, etc. Inputs are from data scientists in JupyterHub. Wild.
Product teams with cloud budgets - awesome FinOps practice.
🎉 Welcome, Sascha Schärich, DevOps Evangelist at Deutsche Telekom IT