This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-10-15
loving the actionable inclusion exercise ideas from this T-mobile talk
Great story of improving the hiring process by removing biases, including having the candidates perform in a real-work experience.
I try to find people who are different from the others but still have a DevOps mindset.
Not being "culturally fit" as grounds for rejecting a candidate has seemed vague to me. This makes sense, thank you!
I like people that have unique journeys into IT too.
Degree in CS or IT, work help desk, then into back office, then IT. I do hire those too, however I am interested in why they chose that path... I have an undergrad degree in Landscape Architecture and love to draw, sketch, etc... but CAD is what converted me to IT.
The real difference is that an SLI supporting an SLO has to be CUSTOMER driven. KPIs can be business or leadership driven
@neil.kalinowski SLI is generally "how is it responding?". The other kinds of KPIs will look at "is it any good?" So a stable service does not have to be a good product but it is advisable.
just like the four metrics from State of DevOps are general hygiene. If the service has crashed, the UX does not matter much. If the CFR and MTTR are both zero, it may still be the wrong product.
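To make the "general hygiene" point concrete, here is a minimal sketch of how CFR (change failure rate) and MTTR (mean time to restore) might be computed. The record layout (`failed` flag, list of restore durations) is an assumption for illustration, not something from the talk.

```python
from datetime import timedelta


def change_failure_rate(deployments):
    """Fraction of deployments that caused a failure (hypothetical 'failed' flag)."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["failed"])
    return failed / len(deployments)


def mean_time_to_restore(restore_times):
    """Average restore duration; zero when there were no incidents."""
    if not restore_times:
        return timedelta(0)
    return sum(restore_times, timedelta(0)) / len(restore_times)


# A service with no failed deployments and no incidents.
deployments = [{"id": 1, "failed": False}, {"id": 2, "failed": False}]
restore_times = []

cfr = change_failure_rate(deployments)
mttr = mean_time_to_restore(restore_times)
```

Both values come out as zero here, and neither input says anything about whether customers actually want the product, which is the point made above.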
http://bit.ly/artofslos (Link from current Slide) Art of SLO Workshop!
We took the SRE folks and embedded them in the teams that had issues, and then they came back together to prioritize which teams were higher priority for client issues… then they swarmed the top issue constraints, then moved to the next problem… The SRE team always attends our BPM on the hunt for the next larger issues… and when they speak at the BPM everyone listens…
Blameless Post Mortem (granted in the beginning it was not very blameless, I had to lead a book club with other managers using Dekker books “Just Culture” and “Drift Into Failure” then it became more blameless)..
@denver.martin Sounds like the "Netflix SRE model", where the SRE Team swarms on larger issues and acts as consultants
It varies based on what is being observed and how the data is gathered and measured.. if there is good data then it can be short, 1 week, if it is harder where there are no measures then maybe 3 or 4 weeks…
I think one of the keys is that you define, as much as possible, what that embedded engagement is before it starts.. so it doesn't end up permanent
we try to fix quickly and move ops issues to Dev quickly with $$$ driven tech debt data.
that doesn't mean you can't "re-up" after that period, but don't leave it open ended
Agreed, we work out a number of sprints for fixes based on how many handoffs are involved.. this way each handoff is dedicated to 1 sprint..
@davidstanke532 How different are SLOs from NFRs? Seems like the SLOs are acceptance criteria for NFRs. And maybe an uncovering of NFRs you never knew you had.
@adam619 any chance you'd share that job description?
was thinking the same thing
I think uncovering is more likely than them being the same thing. NFRs are work that needs to be done (possibly), whereas SLOs are a trending indicator of the customer perceived reliability of a service
Let me double check if I can share that, it's not public yet I don't think
Here's one: https://twitter.com/ahidalgosre/status/1252040324261740544
We have a super immature devops culture/understanding at my org. I put this together (still very much a draft) to help develop shared language and expectations
we have to look at their path to see where they have used the tools and think if they are in the right mindset … very labor intensive to find SREs…
And here's the other: https://twitter.com/davidstanke/status/1291011535171653634
"...I'm busy making the application non-functional" 😆
Okay, snark aside: yes, there's certainly overlap, as Adam describes, both are aspects of prioritizing work... work that may not be self-evident ("important user X wants a feature" is self-evident. "we need more reliability" isn't)
Also +1 to "uncovering"... we sometimes say that we don't define our SLOs, we discover them.
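As a rough illustration of an SLO as a "trending indicator" of customer-perceived reliability, the sketch below computes an SLI as good events over total events and the share of error budget left for a target. The function names and the example numbers are assumptions made for illustration only.

```python
def sli(good_events, total_events):
    """SLI as the ratio of good (customer-acceptable) events to all events."""
    if total_events == 0:
        return 1.0  # no traffic, nothing was unreliable
    return good_events / total_events


def error_budget_remaining(good_events, total_events, slo_target):
    """Share of the error budget still unspent (1.0 = untouched, below 0 = overspent)."""
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 0.0 if actual_bad else 1.0
    return 1 - actual_bad / allowed_bad


# Hypothetical month: 100,000 requests, 50 of them bad, 99.9% SLO target.
current_sli = sli(99_950, 100_000)
budget_left = error_budget_remaining(99_950, 100_000, 0.999)
```

With these made-up numbers about half of the budget is spent. Watching `budget_left` over time is what makes this a trend rather than a one-off requirement.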
Key Responsibilities
We are seeking a Site Reliability Engineer well versed in large-scale distributed systems. Someone who will own the reliability and performance of those systems ensuring that our customers have the benefit of highly available and extremely effective products. You will do this by creating a bridge between development and operations, applying your software engineering mindset to various topics inclusive of system administration, observability, reliability, and performance. You will utilize your deep experience to simplify processes through automation while developing production software to continuously improve reliability and performance.
We work with many languages and technologies critical to the success of our platform, including Golang, Scala, Clojure, and C++, as well as Chef, AWS, ScyllaDB, Kafka, Prometheus, Kubernetes and many more. We expect that you have experience with most of these and also a passion for becoming proficient with many more.
You will:
• Use data from our observability stack and incident trends to prioritize reliability improvements
• Provide architectural guidance on our critical customer facing services
• Contribute to sprint development, executing on availability and performance topics within our product roadmap
• Mentor and consult with product, development, and operations to drive reliability best practices
• Work with Product Management and Engineering teams to answer priority concerns for reliability fixes
• Define SLI/SLO/Error Budgets
• Improve observability across all services
• Participate in On-Call rotations shared with development teams
• Automate deployment capabilities and implement auto healing philosophies
• Collaborate with development teams on best practices, infrastructure setup, and planning activities with a focus on stability and performance
Great Talk! Amazing the work and turn around a good SRE team can do! Thanks.
Is anyone having issues with pulling video for "attacking the fuzzy end of value streams" from the library?
Hi everyone - here for any questions or comments
I assume the Portal is custom coded in-house?
How did you move to 100% automated tests when it came to the radio hardware (on target)? Are there any tools/frameworks you recommend having gone through that journey?
@condontrevor our coverage is for the cloud services and backend. The device itself does not have that level of coverage… yet…
would you be able to share an architecture diagram showing the integrations to your APIs and systems to collect the data? After putting together our central offering, I now have teams asking for something similar, so I'm interested in seeing what others did instead of starting from scratch @jonathan.akers
that's awesome! Glad to connect, just let me know how 🙂
Not yet @bryan.finster but our open-sourced Dojo training modules are at: https://dxc-technology.github.io/about-devops-dojo/
I was just having a conversation about CAB and asking if the goal was value delivery or distribution of blame.
Actually - nothing even happens in a CAB. It’s a non event
Rare..... pretty much a rubber stamp
Only we are using ink made from francium dissolved in dragon tears.
I would rather have a rubber stamp of technical people in CAB than an education session of IT people managers.
Even the technical people are unqualified. They lack the team's context, history of the application, etc.
@halfmoondad how did you get consensus on how long it takes to go from idea to refined work?
This organization had all that tracked on a Sharepoint site. Typically though, the facilitator would ask questions of what typically happens and arrives at a reasonably accurate 'typical' time. It's good to reference actual recent use cases.
We struggle to get consensus from teams when we are facilitating once we move left of coding.
waterscrumfall.... love it
How do you ensure the teams don’t ignore the improvements in favor of the day to day stuff?
Not easy. Leadership support (up front). Champion on team, and the Product Owner to commit to a percentage of backlog items (on average) to be included in sprints.
As leadership sees results we continue to reinforce the importance of this.
coaches at the team levels AND at the executive level to reinforce this, we've found is also key.
what is a good ongoing balance between new features and improvements? Considering improvements mostly = technical debt, is it right to assume the PO should reserve at least 20% of sprint capacity for the improvement piece?
yes, 10-20% is a good rule of thumb - not necessarily every sprint but averaging out per quarter
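A small sketch of the "averaging out per quarter" idea: individual sprints can carry zero improvement work as long as the quarter as a whole lands in the 10-20% band. The field names and point values are hypothetical.

```python
def improvement_share(sprints):
    """Improvement work as a fraction of all capacity across the given sprints."""
    total = sum(s["total_points"] for s in sprints)
    if total == 0:
        return 0.0
    improvement = sum(s["improvement_points"] for s in sprints)
    return improvement / total


# Hypothetical quarter of six sprints; two of them carry no improvement work.
quarter = [
    {"total_points": 40, "improvement_points": 0},
    {"total_points": 40, "improvement_points": 10},
    {"total_points": 40, "improvement_points": 4},
    {"total_points": 40, "improvement_points": 8},
    {"total_points": 40, "improvement_points": 0},
    {"total_points": 40, "improvement_points": 8},
]

share = improvement_share(quarter)
within_rule_of_thumb = 0.10 <= share <= 0.20
```

Measuring over the quarter instead of per sprint leaves room for a feature-heavy sprint without abandoning the overall commitment.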
TechDebt should be a first-class citizen in your backlog - not an optional element. PO should work with the team to put a priority and business value to tech debt elements same as features. So it gets prioritised accordingly.
good point I agree. Now, how do you deal with it when the POs are more business oriented than technical? How to ensure they don't let technical debt go on forever since they prioritize always on business value. I have my thoughts around it, but would love to hear what you think 🙂
In my experience, we seldom find POs who are more technical (except infraOps space). We often assume PO is an independent island - PO must collaborate with the team and the respective SMEs to understand what the items in the backlog mean (if they don’t know). The first step is to bring transparency by creating a single backlog. One key learning for POs and the team is to understand (Work = Work). If the value / priority of the work is ranked higher (irrespective of the type of work), it should simply be done. On the question of when we need to balance, balance according to the rank/value (not based on type) 🙂
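One way to picture "Work = Work" in a single backlog: every item gets a value score agreed with the team, and ordering looks only at that score, never at the type. The `Item` class, the scores, and the titles below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    kind: str   # "feature" or "tech_debt"; deliberately ignored when ranking
    value: int  # hypothetical value score agreed by the PO, team, and SMEs


def rank_backlog(items):
    """Order the single backlog by value, highest first, regardless of kind."""
    return sorted(items, key=lambda item: item.value, reverse=True)


backlog = [
    Item("New checkout flow", "feature", 8),
    Item("Replace flaky deploy script", "tech_debt", 9),
    Item("Marketing banner", "feature", 3),
    Item("Upgrade unsupported database", "tech_debt", 7),
]

ranked = rank_backlog(backlog)
top_titles = [item.title for item in ranked]
```

In this made-up backlog a tech debt item ends up on top simply because its score is highest, which is the behaviour described above.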
okay, I guess you just answered with your slide and the 16%
What strategies have proven successful in reducing % of work that is unplanned?