This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
@sebastian.bertoncini https://videos.itrevolution.com/watch/549298333
Reminder: Get yourself to your seat in Chelsea for the opening remarks. We're kicking off the final day of the DevOps Enterprise Summit in 15 minutes at 8:45am PDT! https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Reminder: We want to hear your stories from the Summit. What did you learn? Whom did you meet? What ideas are you taking back with you? What actions are you planning to take? Post in #summit-stories! https://devopsenterprise.slack.com/files/UATE4LJ94/F05UKU0HBTP/stories.png
Reminder: Remember all those talks you attended the first two days of the Summit? Please submit your feedback for those! It's so valuable for us and the speakers. And after all, feedback is a gift and sharing is caring! Enter your feedback for those talks here: https://doeslasvegas2023.sched.com/ https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4P024W/feedback.png
Kicking us off this morning is: George Kraniotis, Director of Software Engineering and Erin Daugherty, Director of Product from Discover Financial Services.
Forget the program committee; I just wanna apply for the music team. 🔥
Product areas at Discover Financial Services: Card set up, manage my account, re-engage with my card, transfer balance, strategic partnership, credit actions, portfolio enablement
I don't know how this talk ends, but the Sweet Tango Apple is the new best Apple in the world.
Another fine product of the University of Minnesota (who also developed the Honeycrisp).
That's what I hear... but until I have a bigger sample size, I'll still be in the Honeycrisp camp.
What a duo! George: leading 170 technologists; Erin leading 17 product leaders
Card Posting and Transactions (post transactions, calculate interest and fees, among dozens of capabilities, spanning cloud and mainframes): 100+ engineers, 5700 batch jobs, 60+ components
Last re-architecture of CPT? The year 2000. (23 years ago!!!)
I promise to post my Spotify list to this Slack soon!
A lot easier to add business logic in a big zoo than clean it up...
"We needed to stop treating CPT just like a backend ___". (what word did she use? In my head, I filled in "dumping ground", but I know that's not what she said)
"Even our most senior engineer didn't realize how reliant they were to things in CPT"
90 days of event storming! That gives a real idea of complexity and scale 🤯
What we discovered: CPT was 13 domains and 65 capabilities. (There was no way one team could support all these domains because of cognitive load)
90 days of event storming to do the archaeology of 23 years of strata of changes (in the dumping ground)
Run from the monolith!
First two domains: API Engagement, Delinquencies; the result was two engaged teams, now 8 dedicated and persistent product teams, and an acknowledged need for a shared platform team
When the business can name an engineer who helps solve their problems, THAT is a problem!
Ah. "Treat CPT as a product, not just as a backend technology"
I still have nightmares of people asking me "Do you have a P-code?". Funding product teams for the win! :first_place_medal:
@erindaugherty - how do you think about "making business logic human readable and accessible to all"? I'd love to hear more about this!
Next up: Rosalind Radcliffe, IBM Fellow, CIO DevSecOps CTO at IBM
To state the obvious: I could say those things in my introduction because there's no way that Rosalind could. But I thought it was super necessary to set the context.
11 data centers, no longer owned. 1K+ applications managed as pets. Your mess for less, and it stays a mess. And now you need to bring it all back in!
"What skills mainframe shortage? I found them!" (She was explicitly not allowed to hire from any IBM clients. But if I understand correctly, some people chose to retire early and join Rosalind at IBM.)
It says a lot about the leader when people continue choosing to work with that leader. Go, @rradclif!
There's a whole other talk here on building great teams that she could tell. I want to hear it too.
"I have systems that have been written 50-60 years ago. I have systems that were written to support the moon launch that are still running". (!!)
And we were thinking the 23 year Discover CPT platform was potentially heritage!
"If they actually need to go into production, it will be read-only"
It's fascinating that all this innovative improvement work is happening only when these thousands of systems are being insourced, as opposed to the economics that lead to "your mess for less, which will stay a mess"
I have never heard someone say 8 9's with authority and credibility before. It feels weird.
Quite the opposite. I believe her... because she IS "the Hammer"!
315.58 milliseconds per year of downtime is essentially 0. I agree, measured how?
Measured by systems availability and application availability with 0 minutes planned downtime. It helps to start with hardware designed for it but it also takes running a sysplex which allows you to move work dynamically without impact.
The amount of production access tells you how mature your DevOps practice is.
She is putting on a masterclass right now on how to humble brag about being the best of the best in leadership.
At 99.999999% (8 nines of availability), monthly downtime would be approximately 0.026 seconds.
Focus on Run vs. Change - I have to imagine there can be no ROI on rewriting 1000s of applications that have been running critical business processes for IBM for decades. Interesting technology change.
315.58 milliseconds of downtime per year. Gracious.
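For anyone who wants to check the downtime figures quoted in this thread, here is a minimal sketch of the arithmetic. It assumes an average year of 365.25 days and a month of one twelfth of that; the function name `downtime_seconds` is made up for illustration and is not from the talk.

```python
# Assumption: an average (Julian) year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def downtime_seconds(nines: int, period_seconds: float) -> float:
    """Downtime allowed over a period at an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # 8 nines -> 0.00000001
    return period_seconds * unavailability


# 8 nines of availability, per year and per (average) month.
per_year = downtime_seconds(8, SECONDS_PER_YEAR)
per_month = downtime_seconds(8, SECONDS_PER_YEAR / 12)

print(f"{per_year * 1000:.2f} ms per year")  # 315.58 ms per year
print(f"{per_month:.3f} s per month")        # 0.026 s per month
```

Both numbers match the ones posted above; a different definition of "year" or "month" would shift the last digits slightly.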
Would love to learn about how this was set up
This amount of (cultural) change for a company that still supports moon-landing services is inspiring
There is so much courage behind this talk @rradclif in leading such a change. It is not only about knowing the tech; you got talented people to follow you. Bravo! :right-facing_fist::skin-tone-2: :left-facing_fist::skin-tone-2:
There was a talk at DOES last year about re-skilling and giving folks an opportunity to be welcomed into our open community.
Yes! A Lightning Talk by Alex McCleod. She is absolutely amazing.
I will pass along the message to her. She will be so touched that her talk was remembered. Her work with ReUP is so important.
Also very impressive: her career and credentials as a woman, and how rare that was when she started her career!!
Absolutely, I have no people management responsibilities but do have leadership responsibilities
Next up: Christof Leng, SRE Engagements Engineering Lead at Google
Mohawk, Bond music and awesome tech talks! You've got to love tech oriented conferences!!
My favorite @cleng quote: "you really don't understand a system until it's on fire... in production... with live customer traffic!"
Nothing like a failure in production to surface unacknowledged dependencies...
And some generals are demanding an explanation....and time to correct...
Unreliability CAN be taken for granted.
Drifting from economy of abundance to economy of scarcity... applies well to the reliability
People need to be able to point out the weaknesses and vulnerabilities in our systems.
Less that it's incentivized, more that it's not disincentivized - the incentive is then that by admitting the issue exists, others will help solve it and you will be able to learn from the process
"Where there is fear, you do not get honest figures" - Deming
How dare someone from Google highlight that something like DORA metrics aren't the only thing to measure
I imagine you folks may spend more time sharing the disclaimer than the metrics these days
"The best way to understand a system is to watch it go up in flames with real user traffic." 🔥 :rolling_on_the_floor_laughing:
"When we look at history, heroes tend to have a short lifespan."
In a world that rewards fire fighting, no one wants to go into fire prevention...
Heroes have a short life expectancy
If you reward firefighting you get an army of arsonists
Ops worst nightmare. You're in the middle of the night, working an outage, alone, not knowing what to do.. and you have no one to call.
your therapist and pizza/beer delivery peeps
Don't run buckets of water into the fire faster; invent a fire truck.
"Automate yourself out of your current set of tasks every 18 months"
"You need to aggressively automate. Not for efficiency, but for consistency. Without doing this, you get more pets."
It would be nice to expand our community understanding of heroes in the system. I'm hearing it spoken about as always a bad thing but I don't think that's universally true. Sometimes a great outcome of a strong system is that its design inherently offers the space for heroes to emerge.
I like @genek's concept of "scenius" and "interesting friends" instead. "Heroes" usually fly solo, which leads to dependence on an individual to solve problems rather than a community that changes the scene.
Heroes are celebrated because they protect and save us from a disaster... but we need to learn from the hero's journey to make the necessary and difficult changes so that the hero can retire
So if we talk about heroes, it should be more about the Justice League than Superman.
Being a hero should only happen under very special circumstances and not be a career path. Like production freezes, you sometimes need heroes, but you should never mistake them for the solution to your problem.
Agreed. And I don't mean hero as in the weekend worker... I mean the team member who emerges to solve incredibly vexing problems. Like the enlisted member who fixed the water pump on Adm Richardson's nuclear submarine. The culture on the sub allowed him the space to fix that pump on his own. It encouraged his behavior. We all surge when we have to but must take liberty when we can
@mreele perhaps the distinction is in the definition of a hero and designing systems that try to minimize the need for a hero. In the situation you described, there was a need for someone to step up, and I absolutely agree that the culture provided an outlet for them to do that. But what happens next is of interest to SREs: how do we build systems that minimize the need for people to do extraordinary work to accomplish ordinary goals? It's a balance, and unfortunately, tech has a lousy track record of exploiting extraordinary behaviors instead of thinking systematically about ways to meet that need.
But yes, we should build a culture that encourages people to solve hard problems and be bold in doing so, but then goes the next step to figure out ways that those bold steps are required less often for system dependencies.
Yep, I completely agree. I personally am having dissonance with the concept of heroes in the system as part of the system design and encouraging that exceptional performance yet needing an architectural resiliency that doesn't require heroes at all.
I'd love to hear more discussion about this distinction as a community. I think there's enough ambiguity around the term hero that it warrants it. And this brief exchange showed me that I'm looking at it from one lens and there's definitely more.
"People think automation is about efficiency. It's really about consistency."
Was thinking the same thing as Christof described winging it, pushing out config change without version control or review.
Production freezes don't solve the underlying problem. They just pause it, temporarily.
Yes! They're a mitigation, not a fix. Sometimes you need a mitigation, but you should never confuse it with a fix.
:thinking_face: If you add a $ to "Don't deploy on Fridays", does Charity still appear in the chat?
We all test in production. Some more safely than others.
I should have said: When your code hits production should not be when it gets tested for the first time.
"Don't be a Hero. SRE heroes leave an influence and that creates a culture of overworking"
The bigger your forest, the more lightning strikes. Don't try to reduce that count, improve your response.
Everyone has a test environment, some people are lucky enough to have a separate production environment...
Anyone can build complex systems (even by accident) - try building simple systems
Hilarious. Christof always pushes the big red buttons, and deletes code that says "DO NOT TOUCH" to see what happens.
Signs that my 3yo has strong SRE potential: (1) likes to push every red button (2) likes to break things to see what happens (3) gets overly excited around chaos (4) likes to nap during office hours
"Don't touch" technical debt is a booby trap for change.
Lines of code deleted is my favorite stat in a pull request
Next up: Josh Corman, Founder, I am The Cavalry (dot org)
The person in the upper-left is the late Dan Kaminsky, famous for "saving the Internet" by coordinating a critical DNS fix. https://en.wikipedia.org/wiki/Dan_Kaminsky HD Moore, inventor of Metasploit: https://en.wikipedia.org/wiki/H._D._Moore
Your capacity is equal to the capacity of your constraint
The work and 🧠 power of this collective community is being applied to critical life or death situations! ❤️
Latency sensitivity is a great lens for prioritization and sequencing ⏱️
Next up: JD Black, Director of Digital Transformation at Northrop Grumman
Helluva call to action. I'll reiterate my recommendation of (Re)coding America. The Cavalry isn't coming. Become the Cavalry.
Love the passion and call to action I feel after Josh Corman's talk! The Cavalry isn't coming!!! Bravo sir!
Public policy and regulations aren't sexy. But it's often written in blood.
So true. Bad policy leads to many harms, including death, but those responsible rarely are held to account. Incentives...
Systems engineering: 50 pounds of software ---> 250 pounds of documentation. Systems engineering enabled massive successes, but they don't know what to make of DevOps practices.
This incredible effort brought Authority to Operate down from 9 months to 2 hours.
Once more, It is all about breaking silos between all stakeholders including system engineering team, not just Dev&Ops!
One success: they made a late discovery of an interface compatibility issue and were able to fix it quickly. (I will need to watch this video again.) These are amazing examples of true integration of systems integration into the daily work of the rest of the value stream. Amazing.
Some of the things that JD is talking about, he wrote on weekends to prove these concepts out.
For anyone who missed the context, JD's team is focused on shooting down incoming nuclear missiles
If people want a slower overview of the pandemic related mapping and lessons: https://youtu.be/XrSVXbWGZHw
If you'd like the decade later (emotional) birthday keynote to the hacker conference where we were born: title: "And together we crossed the River" 1 hr starts here: https://youtu.be/Eh6b1H_-U20?t=2008
Hoping everyone can join our talk, "APIs, Optionality & The Science of Happy Accidents" up next in Chelsea! It may not sound like it, but it's a pirate story... Arrrrr! 🏴‍☠️
Reminder: The breakout sessions are starting in 5 minutes. Start navigating your way to whichever session you're attending. https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Pirates need wealthy merchant ships to plunder, and some legal cover to avoid the costs... :rolling_on_the_floor_laughing:
@internettitan @mattmclartybc would Miners be a better metaphor than Pirates?
There are so many metaphors! Farmers might even be better... sow the seeds, see which ones grow
We talked about miners/drillers/farmers with the idea of natural resources... pirates won for two reasons 1) we liked the idea of the "land of 1000 shovels" 2) pirates are more fun 🏴‍☠️
If you're interested in optionality, you might like Chris Matts' and Olav Maassen's https://www.infoq.com/articles/real-options-enhance-agility/principle.
I'm new to DevOps community can someone give me a quick explanation of pets to cattle? Thx!
https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/
Missing from that post are what I think to be some of the key elements of the metaphor:
• Pets have names and their owners care about them as individuals
• They are expensive
• Cattle are generally interchangeable, not named, and the owners of the herd don't care very much about individual members of the herd
• It should be easy to cull problem cattle from the herd
• It should be easy to expand the herd
In other words, you have a totally different approach to managing cattle than to managing pets
That's a quality trophy there @nathen.harvey and @amandalewis!
Shout from the rooftops!!
Culture is essential for people to thrive, which drives a thriving company
I'm sure @nathen.harvey said that point about the culture first. But there's this person named John Shook... :thinking_face: https://sloanreview.mit.edu/article/how-to-change-a-culture-lessons-from-nummi/
Elite teams are back in 2023! Woot! ... wait, @nathen.harvey arguing against eliteness as a goal? Oh c'mon Nathen-with-an-E!
Reminder: The breakout sessions are starting again in 5 minutes. Start navigating your way to whichever session you're attending. https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Reminder: The final plenary sessions are starting again in 5 minutes. Start making your way back to Chelsea. https://devopsenterprise.slack.com/files/UATE4LJ94/F05UG4ZGTLN/timer.png
Welcome, Damon Edwards, Senior Director, Product at PagerDuty
And anyone who knows English can be in charge of computers... and data...
Damon's big surprise: he found more enthusiasm for AI/LLMs among internal operations communities than on the external, revenue-generating product side of the business. Why? The opportunities to reduce margins (people, toil, coordination...). Until now, these were difficult to automate.
Interestingly, IIRC, Patrick Debois' work was driven and funded by Marketing (at least in short term)
If you hear "train a model", stop the meeting. Too hard!
Operationalizing vs building seems more relevant than buy vs build
Would need an AI to be quick enough to capture this great material... so quick 🤯
A big shock: I was blown away when I heard that companies like Notion were using 3-4K token prompts, which was 50% of the token limit at the time.
The real differentiator is not technology, but the ability to innovate around it. (Agh, missed the quote.)
"Capacity to navigate" is an excellent focus for org performance!!!
@genek after @joshcorman's talk this morning I feel like I should give you the shirt off my back.
After running 1hr workshops for two days, I highly recommend enabling constraints for making exercises productive :)
An AI support agent with a human voice?! 🤯
Wow. Depts participating! Internal audit! Risk! Compliance! Board member in charge of infosec!
Nice. • strategic work vs. "more work" • flow vs. headcount • missionaries vs. mercenaries • community vs. zero-sum
Great terms in this talk: "connecting org islands"
Another Marty Cagan name drop in the DevOps community. Nice.
Welcome, Moied Wahid, EVP, CTO Consumer Information Services, Housing, Verification Solutions, and Employer Services, Experian
"DevOps is table stakes now. ML is still super complex."
100s of PB data. Access controls super critical. (All relevant to PII, I'm guessing.)
It's amazing to see the structure of this highly evolved MLOps stuff. Hundreds of models running in production, pipeline for getting them into production, A/B testing in performance, challenger models, etc. Inputs come from data scientists in JupyterHub. Wild.
Casually fixing a bug in the open-source Spark code! 💪
Product teams with cloud budgets - awesome FinOps practice.
Welcome, Sascha Schärich, DevOps Evangelist at Deutsche Telekom IT