This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-10-05
Channels
- # ask-the-speaker-track-1 (316)
- # ask-the-speaker-track-2 (312)
- # ask-the-speaker-track-3 (283)
- # ask-the-speaker-track-4 (309)
- # bof-leadership-culture-learning (3)
- # bof-project-to-product (10)
- # bof-sec-audit-compliance-grc (2)
- # demos (9)
- # discussion-connect-february (1160)
- # faq (14)
- # games (135)
- # games-self-tracker (4)
- # gather (6)
- # happy-hour (50)
- # help (175)
- # hiring (25)
- # lean-coffee (8)
- # networking (26)
- # project-to-product (3)
- # summit-info (219)
- # xpo-adaptavist (5)
- # xpo-anchore-devsecops (12)
- # xpo-aqua-security-k8s (3)
- # xpo-basis-technologies (17)
- # xpo-blameless (4)
- # xpo-bmc-ami-devops (1)
- # xpo-broadcom (2)
- # xpo-cloudbees (5)
- # xpo-codelogic-code-mapping (8)
- # xpo-dynatrace (1)
- # xpo-everbridge (6)
- # xpo-gitlab-the-one-devops-platform (6)
- # xpo-granulate-continuous-optimization (15)
- # xpo-infosys-enterprise-agile-devops (18)
- # xpo-instana (5)
- # xpo-itrevolution (15)
- # xpo-launchdarkly (7)
- # xpo-logdna (3)
- # xpo-pagerduty (8)
- # xpo-planview-tasktop (12)
- # xpo-rollbar (3)
- # xpo-servicenow (4)
- # xpo-shoreline (11)
- # xpo-snyk (6)
- # xpo-sonatype (6)
- # xpo-split (10)
- # xpo-splunk_observability (3)
- # xpo-stackhawk (1)
- # xpo-synopsys-sig (1)
- # xpo-tricentis-continuous-testing (4)
- # xpo-weaveworks-the-gitops-pioneers (4)
My session, "DevOps for Pandemics," kicks off at 12:20 EDT. Hope to see you all there.
Great themes for sure. Please share more on breaking down the silos. How did you organize your product teams without letting the matrix structure get in the way?
I like that you speak of guests rather than customers. Has it always been this way, or did it involve a cultural shift, either recently or some time ago?
Does Vanguard have regularly scheduled Chaos testing that impacts the full enterprise or are they more configured for a smaller targeted group? For example do you stress just a subset of applications or shared services?
The majority are very targeted. We have small-scale chaos experiments happening in Vanguard's non-production environment every single day. Larger-scale chaos experiments happen less frequently because of the coordination effort required, but we're certainly tackling this a few times a year, with aspirations to get even better at this once we have fully activated our SRE operating model.
For the large-scale chaos testing, what metrics do you look to collect? For example, total apps that failed, degraded service, latency?
My apologies for the detailed questions, but I'm very fascinated by how you approach these types of tests.
We go in with a set of hypotheses for every test we run. Which metrics/SLOs we look at depend on the type of test we’ve crafted. It changes every time! But everything you’ve listed has been used at some point, yes.
Vanguard Team: Thanks for sharing! This was really great. I will definitely watch it again.
What specifically do the SRE Leads provide to the Product teams? For example do they just explain policy to provide clarity? Do they review monthly metrics with the teams or recommend best practices?
SRE Leads will consult on infrastructure/architecture decisions, ensure alert portfolios and SLIs/SLOs are being reviewed at the appropriate frequency, facilitate post-incident reviews across the product teams, and coordinate and conduct chaos and performance testing for the products in collaboration with the engineers on the teams. It's a role that combines hands-on technical work and facilitation.
In other words, the SRE Lead workload is a balance between chaos testing and production app delivery
How does Vanguard approach the organic process of tuning alerts? Is there a strategy you recommend? Do you use the number of false positives to drive improvement? Curious whether your approach differs from just having teams add the work to their backlog and address it as time and capacity permit.
I recommend quarterly alert portfolio reviews at a minimum, but I was once on a team that did alert reviews weekly! I always suggest that people use Cory Watson's CASE methodology for alert construction as a guide for tuning alerts, and actually calculate their signal-to-noise ratio and track it.
Now, that's quite a bit of overhead, so not every team is doing this today, but I'm hopeful that adding SREs into the mix will help to ensure ALL teams have the available bandwidth to prioritize this critical work!
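For anyone wanting to track that signal-to-noise ratio concretely, here's a rough stdlib-only sketch. This is my own illustration, not Vanguard's actual tooling: it assumes each fired alert is simply tallied as actionable or not during the portfolio review.

```python
# Rough sketch of alert signal-to-noise tracking (illustrative only,
# not Vanguard's actual method): tally actionable vs. non-actionable alerts.
def signal_to_noise(alerts):
    """alerts: list of dicts, each with an 'actionable' bool."""
    signal = sum(1 for a in alerts if a["actionable"])
    noise = len(alerts) - signal
    return signal / noise if noise else float("inf")

# One quarter's (made-up) alert history: a ratio below 1.0 suggests
# the portfolio is noisier than it is useful and needs tuning.
history = [{"actionable": True}, {"actionable": False}, {"actionable": False}]
ratio = signal_to_noise(history)  # → 0.5
```

Graphing that ratio per team per quarter is one cheap way to make the review cadence stick.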
Fascinating. This is the first I've heard of Cory Watson's CASE methodology. I will study up on that. Thank you!
@rdaitzman and I followed along in the #ask-the-speaker-plenary channel during the talk. I'm going through this channel now to catch up on any questions we may have missed that were directed here!
How long has Vanguard been using OpenTelemetry and how many applications are using it?
I don’t know the exact numbers, but it’s at least a hundred so far! We started using it about a year ago, evaluating it a year and a half ago. In combination with Honeycomb for visualization of traces, it has been a total game changer!
🎉 Let's get ready to welcome @smack from Wiley, presenting DevOps for Pandemics 🎉
It sounded like you were scaling to meet demand as the pandemic started. With Wiley being used in online learning, were there already major lessons in scaling from summer to school year that helped you be prepared for the need to scale?
Yes, most definitely. We have cyclical patterns which we prepare for all year round, most notably the back-to-school period in the fall.
Of course, we had that + a pandemic this past year which was something we had never seen before 🙂
Another great story of vast mobilizations due to COVID. Thank you @smack!
Hah! "Socially distance your applications." Really like that way of describing it
Can you share a sample of the transparency and observability dashboards?
Sure @alshah1 The Business Continuity Dashboard was a prime example of this. We shared metrics about system performance and business performance across the business.
It's easiest to see this in contrast to siloed tooling where teams only have access to their data. Network teams have access to network data, database teams have access to db data, etc. Our approach was to fundamentally change this and get out of these information silos
Thank you, @smack. Enjoyed your talk and really appreciated the emphasis that you placed on training and the importance of a learning culture.
📣 We now welcome the team from ING – @aurel-george.proorocu @mihai.roman2 @misupriest1 presenting How's Your Bank Working From Home: https://devopsenterprisesummitus2021.sched.com/editor/schedule?id=7b3f2357d56855b48a80220d6ea684f8#edit 📣
@smack sounds exactly like what we are trying to do on my team. What kind of tools did you use for building the business continuity dashboard and aggregating from other monitoring tools? My guess is something like Grafana.
@alshah1 For this we used Power BI but we also use Grafana for more of day-to-day operational information.
Thank you for sharing @smack , great session!!
I think we had one or two official "no meeting days"... but it wasn't a long-term thing.
We've seen a trend toward fewer formal meetings throughout the week, many more smaller, impromptu Slack "meetings" as needed, AND blocks of "flow time," blocked on the full teams' calendars during which no meetings are allowed. Sort of a hybrid approach.
NOAA/NESDIS has started doing this on Fridays... but now I find that everybody's M-T are so full that when we need a one-off meeting to discuss something, it ends up being on Friday because that's the only time everyone is free.
Too much work in progress, a chronic symptom for organizations to address.
Walking one on ones are a great tool, I love that.
Yes, this was something very appreciated by our engineers, because people were really missing social interaction and in some cases they were not very comfortable discussing sensitive subjects on Teams. In my case I also had some people who were hired at the beginning of the pandemic, so we never managed to see each other face to face before our "walking 1:1".
Good point about paying attention to "micro actions" with the camera on! As long as we can avoid Zoom fatigue!
I think it's also real due to so much context switching if we're going back to back. In which case, stop that, right?
Indeed it is! Maybe a 50-50 or whatever ratio works for camera on vs off? For example in 1:1's and workshops, camera on by invitation..
Unfortunately, Zoom fatigue is something we all have to learn to deal with, in the same way we have learned to work from home for long periods.
@ganga.narayanan, there's no success recipe for camera on vs. off. The context of the discussion/meeting decides most of the time for us. As human interaction is an important factor, I would say for 1:1s it should be on 😉
I have zero poker face, so most know if I turn mine off it's to hide that I'm vehemently disagreeing with the conversation and composing myself. lol
True true! Same with me - zero poker face! But we haven't forced people to turn their cameras on. A lot of people keep them off most of the time..
We struggle with this too. It is a common discussion point with both our team leaders and Scrum Masters. We gently encourage in subtle and fun ways, but don't push it hard.
How did ING deal with all the time zones in 6 countries when working a normal work day?
Working 9-5 is "old school"; flexibility remains the key. Having the right environment and the space will solve any time zone differences.
I don't know if I agree that "9-5" is old school. The only way I can maintain a proper work/home-life balance is to keep strong start/end times to my work-from-home day. I strongly set 9-5. When my day is over my laptop is shut and work is OVER. There is no checking e-mails after the kids are in bed or anything like that for me.
Will the "new normal" at ING result in returns to physical offices? To what degree?
Good one and not an easy one. We make sure to plan the meetings so that most of us can align.
Thank you all for joining our talk! Please let us know if you have any questions and we will reply asap.
@tom.wojtusik not so much. 2 days a week in the office and the rest at home. The teams are asked to align so that in those 2 days everyone is in the office

🌟 Coming back from the break, let's welcome @bryan.finster486 on How to Misuse and Abuse DORA Metrics 🌟
Hey, @bryan.finster486 — looking forward to hearing about weaponizing DORA metrics!!! 😆
Hey Brian, how you doing? Weaponizing depends on from how far away I can launch my metric...
Although the details for weaponization are there: a repeatable set, with clear results, that delivers a static response.
I don't understand that last comment. Product teams and pipelines over scaling frameworks. Why not both?
Release trains are designed for teams delivering together. We didn’t want that. It was slowing us down.
I am not sure that is an accurate understanding of "Continuously Deliver and Release on Demand".
@bryan.finster486 I expected more critique there for scalable frameworks 😉
You started with delivering one app: Jigsaw. That is not the same thing as delivering large integrated value at scale. What happens when you need the teams aligned to deliver value?
I have so much respect for the efforts inside USAF to drive DevSecOps — so I couldn’t help but read this: https://www.theregister.com/2021/09/03/usaf_chief_software_officer_quits_angry_post/ Hope that the mission goes on, @bryan.finster486!
We are driving the mission forward. I had dinner with him after this. The mission continues.
This was a really poignant post by Nic. And Bryan is right... the mission, or should we say missions, continue!
His open departure letter: https://www.linkedin.com/pulse/time-say-goodbye-nicolas-m-chaillan/
Coverage correlation with quality/effectiveness seems to rear its head time and time again. https://neverworkintheory.org/2021/09/24/coverage-is-not-strongly-correlated-with-test-suite-effectiveness.html Relatively recent article around this topic… not peer reviewed, but purely observational.
My takeaway is that the expectation needs to be a sentence. As a ... I want ... so that...
Instead of just a couple of words that are easier to interpret however I want.
I was working on making some better metrics yesterday. How do you improve the quality and quantity of metrics about how much time is spent on helping in different areas without logging minute by minute? Is there an interval that has worked well? I'm wanting to enable other teams to get better outcomes but trying to show both that it is where I'm spending time and actually proving improvements.
PS: @vmshook I’ve been meaning to email you about this: I’ve been dazzled by the MARFORCYBER people, who spent an entire episode discussing the article above. https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy8xZDQ5YWIyMC9wb2RjYXN0L3Jzcw/episode/OWEzZWYzMTYtY2E1OS00MTA2LTg1NzMtZDZlOTQxMTdhZDAz?hl=en&ved=2ahUKEwjx5PKd87PzAhU8IjQIHVhEAEwQieUEegQIGBAI&ep=6
“handed out six cases of books, handing it out to people, saying, ‘please read!’” 😆
This is an incredibly frustrating problem: "People don't read books." I read this book and could not get others to read it in the AF.
Can't lie.. i've become a @bryan.finster486 fan! 😄 always speaking great truths..
Can't ask for context with metrics? What kind of heresy is that? Apples to apples, not oranges to elephants.
And for the love of all that’s holy, READ PAST PAGE 19!!
@bryan.finster486 So for those fictitious dashboards (ahem), what was the (fictitious) context? In what way were they misrepresenting the situation?
Strange that MTTR and lead time were so out of whack.
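For anyone sanity-checking how those two numbers are usually derived, here's a minimal stdlib sketch over hypothetical incident and deployment records. The data shapes are my own illustration, not the (fictitious) dashboards from the talk:

```python
# Hypothetical sketch of computing MTTR and lead time for changes
# (illustrative data shapes, not the talk's actual dashboards).
from datetime import datetime, timedelta

def mttr(incidents):
    """incidents: list of (opened, restored) datetime pairs."""
    total = sum((restored - opened for opened, restored in incidents), timedelta())
    return total / len(incidents)

def mean_lead_time(changes):
    """changes: list of (committed, deployed) datetime pairs."""
    total = sum((deployed - committed for committed, deployed in changes), timedelta())
    return total / len(changes)

incidents = [
    (datetime(2021, 10, 1, 9, 0), datetime(2021, 10, 1, 11, 0)),  # 2h outage
    (datetime(2021, 10, 2, 9, 0), datetime(2021, 10, 2, 13, 0)),  # 4h outage
]
print(mttr(incidents))  # 3:00:00
```

If the two come out wildly inconsistent with each other, that's usually a data-collection problem (e.g., the card-moving issue mentioned later), not a performance signal.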
And then they wonder why they're not a high performing org. OBVIOUSLY they need new contractors...
Here's what I've been thinking: how many of the Five Ideals do the four metrics cover? Not a lot.
Is there a place to watch for when this is open sourced?
https://repo1.dso.mil/platform-one/big-bang/apps/sandbox/holocron
Yep, read that and you're done 🤣
This is a great talk! My organization has absolutely had a similar experience. We've learned that DORA metrics are great indicators but two "next level" measures are Value Stream Flow and Business Outcomes!
Love this - I've been screaming from the rooftops that no organization should ever be going faster for the sake of going faster. What is the downstream benefit to actual business outcomes!!
This reminds me of the quote: > I made up the term ‘object-oriented’, and I can tell you I didn’t have C++ in mind > -- Alan Kay, OOPSLA ’97 “This is not what we had in mind when we wrote Accelerate.” 😆
Oh, and DORA metrics showed up in GitLab S-1 filing!!! Wild!!!
It’s part of the lifecycle of ideas. I have equanimity about this. 🙂
Oof. Hopefully it only took a couple of minutes to wire that dashboard… 🙂
I say that because the DORA metrics have to be traceable to business outcomes in order to avoid them becoming proxy/vanity metrics.
What have you found effective to implement OKRs so they don't devolve into just a new form of arbitrary deadlines?
Jon Smart’s got it nailed. He helped kick them off in Platform 1
"flow metrics from teams moving cards" Ive had this conversation, our pipeline is based on when the install happened, time on keyboard, not when someone moved the JIRA ticket the following monday
I might have missed it, but did @bryan.finster486 mention that the throughput metrics are about size of change vs. speed of change?
What would your official training, instead of hobbyists, look like? For now, I'm at a job where being a hobbyist is the main way I am teaching myself 🙂
This talk is so profound! I feel like I will need to listen to it a few times. There are so many nuggets in each sentence
This is so timely for us. We're trying to figure out what we could/should measure to show our dojo adds value to the organisation. I'm happy to see I'm not the only one looking at typical DevOps metrics and wondering if focusing on them could become problematic.
A good portion of SSH is in the IT Rev library. There is a link in the reading room.
Does anyone have any recommendations for how to measure your Value Stream Flow.? Pipelines often have many tools which make measuring end to end flow challenging. Any recommended dashboarding or measurement tools and or techniques?
Thank you!!! Keep collecting stories, @bryan.finster486 — let’s figure out what to do with this next year!!! And keep up the great work at USAF!!!
@bryan.finster486 Thanks. Some slides are going to be stolen 😄
Thanks so much everyone. I’ll be at the bar later. 🙂
Bryan, great talk, but it is just a starting point. The devil is in the details! Is there a BOF for talking details?
We should do that. I agree. It’s just the beginning. I wanted to say so much more.
I vote for a BOF, Happy Hour and then After-Happy-Hour Happy Hour session @bryan.finster486. Just let me know when and where. 😃
✨And now, we're honored to have @dff here to present: Thinking Upstream About White House Cybersecurity Executive Order 14028 ✨
We (Tidelift) contributed comments for the NTIA requested comments, alongside a bunch of others https://www.ntia.doc.gov/other-publication/2021/comments-software-bill-materials-elements-and-considerations
Tidelift’s specific comments Re: Software Bill of Materials Elements and Considerations here: https://www.ntia.doc.gov/files/ntia/publications/tidelift_-_2021.06.16.pdf
I also appreciated Google’s comments in the NTIA process, which included this
Here’s a link to the 2021 Tidelift open source maintainer survey referenced in the talk: https://tidelift.com/subscription/the-tidelift-maintainer-survey
And here’s the companion survey of organizations that build with open source, also cited in the talk-- https://tidelift.com/subscription/2020-managed-open-source-survey
If you’re interested in following up on anything in the talk or finding out more about Tidelift, my contact info: <mailto:dff@tidelift.com|dff@tidelift.com>
Love this talk, @dff! One of my concerns with paying the maintainer is whether maintainers would intentionally add bugs (there was a great article on this regarding the U of M and open source software). I think we can trust the maintainers because there should be a number of them with checks and balances. Do you have any thoughts around this?
My perspective: it's all in how you set up the incentive system. For example, Tidelift doesn't pay a bounty per issue resolved; instead we pay maintainers who agree to work with us to ensure their software meets specific security, licensing, and maintenance standards.
Details on Tidelift coverage here https://tidelift.com/catalogs
TLDR: Broad coverage of language-level application development packages in JavaScript, Java, PHP, Ruby, Python, .NET and emerging coverage of Go, Rust
How scalable is managed open source? There are some orgs that use a very high number of open source dependencies
Yep, typically the organizations we’re working with are using tens of thousands of discrete open source dependencies. You can see the breadth of our current coverage at https://tidelift.com/catalogs
Does a donor specify which projects they want to support, or is it a "general pool"?
The more Tidelift’s paying subscribers use a particular open source package in their applications, the more income that partnered maintainer receives. Think kind of like Spotify artists paid based on song play counts.
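To make the Spotify analogy concrete, here's a toy sketch of a proportional split. This is purely illustrative; Tidelift's real payout formula isn't stated in this thread:

```python
# Toy proportional-payout split, like the Spotify analogy above
# (illustrative only, not Tidelift's actual formula).
def split_pool(pool_cents, usage_counts):
    """usage_counts: {package: times used across subscriber apps}."""
    total = sum(usage_counts.values())
    return {pkg: pool_cents * n // total for pkg, n in usage_counts.items()}

payouts = split_pool(100_000, {"left-pad": 3, "lodash": 7})
# → {'left-pad': 30000, 'lodash': 70000}
```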
Wow! What a big idea! Thank you @dff — I will definitely be watching your talk later this week! I was so much looking forward to your presentation! And thanks for everything you’ve done to advance important pieces of the software ecosystem, such as Akka, etc. Looking forward to great interactions ahead!
What does Tidelift do to enforce that critical vulnerabilities are addressed?
More on how Tidelift reviews security vulnerabilities: https://support.tidelift.com/hc/en-us/articles/4406293292948
Sure, but in that process, I see a maintainer can "Create an exception": the vulnerable release stays approved in your catalog.
There doesn’t seem to be a way to force a maintainer to fix a vulnerable transitive dependency? Or am I missing something?
That’s referring to the fact that as an organization using Tidelift, you can choose to override Tidelift’s maintainer-based guidance — in other words if you want to “force approve” a release where we’ve flagged a security issue.
For transitive dependencies, we work jointly with our maintainer network — sometimes a few packages need to be independently updated in coordination (say, one to fix the vuln itself and another to update a version-locked dependency on it)
🔆 Excited to now introduce @tbannon, here to talk about DevOps’ Missing Link: Data 🔆
What’s so difficult about reproducing errors?! This is so good, @tbannon 😆 😆
I know a myriad of teams who have this problem but are not spending time on it in a way that they'd regard it as a priority... So hard.
“I frown on using production data because… we’re not masking it. Less than 50% mask their production data” 😱
Hey, you could steal it from a competitor and mask it to your needs... with 1:2 odds 😄
The other option seems to be gatekeeping teams from any test data at all.
"We can't test because we ran out of fake social security numbers."... I am so near to shellshock right now.
My old favorite was “we don’t have good enough fake data, so we won’t test at all” from long ago.
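One cheap way out of the "ran out of fake SSNs" trap is deterministic masking: hash each real value into a stable fake one, so joins across test tables still line up. A hedged stdlib sketch of the idea (my own, not a recommendation of any specific masking product):

```python
# Deterministic SSN masking sketch (illustrative, not a vetted
# anonymization scheme): the same input + salt always yields the same
# fake value, so referential integrity across test tables survives.
import hashlib

def mask_ssn(ssn: str, salt: str = "test-env-1") -> str:
    digest = hashlib.sha256((salt + ssn).encode()).hexdigest()
    # Pull digits out of the hex digest; pad in the (unlikely) shortfall.
    digits = "".join(c for c in digest if c.isdigit())[:9].ljust(9, "0")
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
```

Caveat: salted hashing is one-way but not immune to brute force over the small SSN space, so masked data should still be handled as sensitive unless the salt is kept secret.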
What? It’s about correctness, testing and data. Doesn’t get much more interesting than that! 🙂
"We have full test automation" - "How do you manage test data" - "Hmmm. We always have problem with that"
Thanks @tbannon. A great collection of ground rules. A lot of people need this revelation.
Reminder: The plenary sessions are starting again in 5 minutes. Start making your way back to your browser and join us in #ask-the-speaker-plenary to interact live with the speakers and other attendees. https://devopsenterprise.slack.com/files/UATE4LJ94/F01D34MC2KS/image.png
Drat, not writing fast enough. That was: 1. Competence 2. ? 3. Accountability 4. Delegated Authority.
@andy744 all the action is happening in #ask-the-speaker-plenary now 🙂