#ask-the-speaker-track-1
2021-05-18
Adrian (Tasktop)09:05:57

fantastic courage and commitment from everyone involved at TUI to bring about such a massive transformation!!

❤️ 1
Jeffrey Fredrick, Author-Agile Conversations09:05:21

fyi, main conversation currently in #ask-the-speaker-plenary

👍 2
Ann Perry - IT Revolution11:05:18

Coming up in a few minutes – @ben.connolly and @sabina.kambersalamanc

Ben Connolly, Head of Engineering, Vodafone11:05:25

Let's GOOOOOOO! 🙂 We're here for any questions or anything else folks.

👍 7
🙌 2
👋 1
Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)11:05:45

Yes, very excited to be here.

🎉 6
👋 1
Jon Smart [Sooner Safer Happier]11:05:49

"Becoming a product & value led company"

Jon Smart [Sooner Safer Happier]11:05:25

Micro [ongoing] transformations

Katharine Chajka (Tasktop)11:05:28

Curious to hear the key to unlock! Also curious about the role leadership at the top played vs bottom-up.

🙌 1
Jon Smart [Sooner Safer Happier]11:05:48

Interesting: quite a bit in Africa

1
Jon Smart [Sooner Safer Happier]11:05:25

Are the 'red' shaded countries where software engineering is, or Vodafone more broadly?

Ben Connolly, Head of Engineering, Vodafone11:05:28

that's vodafone's footprint generally. purple are partner markets. (mental note to add a legend in future)

👍 1
Jon Smart [Sooner Safer Happier]11:05:23

"Over 100 KPIs, with targets"

🤯 2
Ben Connolly, Head of Engineering, Vodafone12:05:57

and amazingly almost every one was hitting them. 🤔

🤯 2
Jon Smart [Sooner Safer Happier]12:05:53

Continuous improvement is part of the DNA

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:21

Yep, in every essence "it felt like that and continues to feel like that to date"

💯 1
Chris12:05:58

Insanity is to keep doing the same thing over and over again and expecting a different outcome... Unless you consider yourself to be already delivering perfection, there's always room for improvement. 🙂

😀 1
Jon Smart [Sooner Safer Happier]12:05:03

Culture, behaviour, as part of OKR adoption

Jon Smart [Sooner Safer Happier]12:05:20

"Mindset change, culture change"

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:34

We tried lots of things before the OKRs, but it was only after we started to trial OKRs that we started to feel the cultural change we were after:
• empowerment dial increasing
• alignment to a common purpose improving
• lowering the fear of failure, learning to embrace it and continuing to learn

🎉 1
Chris12:05:35

Interesting. Did you start with OKRs outside of any HR process (like "annual performance assessments")? I thought about pitching the idea to HR, but did not get much of a response.

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:23

Perdoo company has a talk that specifically advocates against running OKRs via HR in any way. If I find it, I will share it with you. It's not recommended

Chris12:05:40

Looks like this option may impose itself anyway, as the idea didn't find any echo with HR. Would be great for the paper Sabina! And thanks for the session!

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:18

As soon as you involve HR, they are going to want to tie this somehow to end of year reviews and compensation, which can be counter productive. There is a lot of debate around this out there in the community.

Chris12:05:45

Actually, one of the points I wanted to raise with them is the need to decouple bonuses from these assessments, as you end up with people focused only on the compensation question. I had a few other grievances with the system in place as well.

Chris12:05:08

Many thanks for the link!

Saket Kulkarni, Coach, Capgemini (he/him)12:05:27

It’s a method pioneered by Andy Grove at Intel. https://en.wikipedia.org/wiki/OKR

👍 2
Kristina Maria Manalo12:05:39

Do OKRs replace KPIs?

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:07

• KPIs are indicators of how we are doing.
• OKRs are our navigational system as we go towards our north star.

❤️ 2
Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:15

You can see it now in the talk. KPIs and OKRs

Jon Smart [Sooner Safer Happier]12:05:29

I like to think of KPIs as 'health indicators', e.g. heart rate. KRs are measures of progress, measures of movement, leading and lagging (e.g. I've run 5 miles of my 10 mile run aspirational outcome).

2
💡 1
Rmunwin12:05:42

Object and Key Result

👍 1
Chris Gallivan, Stellantis, Value Stream Architect12:05:47

is this a SAFe implementation? I heard program increment

Magda Niedźwiadek12:05:37

Increments are specific to Scrum (SAFe - just the implementation of SCRUM)

Ben Connolly, Head of Engineering, Vodafone12:05:56

Blended I'd say. 🙂 We use some aspects that help bring structure where otherwise we're still a little nebulous.

❤️ 1
Chris Gallivan, Stellantis, Value Stream Architect12:05:55

Am curious how SAFe and empowered teams blend. What sort of team coaching approaches do you use?

Magda Niedźwiadek12:05:49

Objective and Key Result

AlignedDev (Omnitech Engineer)12:05:13

are there some good book or blog recommendations about OKRs?

Jon Smart [Sooner Safer Happier]12:05:51

I'm doing a 15 min talk on OKRs on Thursday

🙌 4
AlignedDev (Omnitech Engineer)12:05:04

Are OKRs part of Sooner Safer Happier?

Jon Smart [Sooner Safer Happier]12:05:02

@logankd to answer your question, a focus on outcomes over output is one of the patterns in Sooner Safer Happier. OKRs are a great way to have a focus on outcomes (and the mindset that comes with OKRs)

Patrick Anderson - Tasktop - he/him12:05:45

Dr. Mik Kersten (Project to Product, Flow Framework) is also talking about OKRs and metrics this week (Wednesday, 19th May @ 12:05pm BST) https://doeseurope2021.sched.com/event/jEMq/okrs-devops-from-micromanagement-misery-to-finding-flow?iframe=no

👍 3
AlignedDev (Omnitech Engineer)13:05:02

That's good, I've read his book and am still searching for ways to use the ideas (I'm a developer, but share a lot of ideas with the team I work with)

👍 1
Vaidik Kapoor (Speaker) - Technology Consultant12:05:29

I have usually found engineering teams often have a hard time articulating good OKRs though. I prefer focussing on the communication aspect, trying to ensure that every last person on the team understands what we are trying to communicate.

👍 1
Tim B12:05:09

@sabina.kambersalamanc - what techniques did you use for multiple teams to deploy independently i.e. feature toggles etc?

👀 1
Ben Connolly, Head of Engineering, Vodafone12:05:47

hey @tim.bassett - yes, multiple techniques/tools. @robert.greville1 is giving a talk later today on exactly that!

👍 1
Robert Greville12:05:49

Commonly we’ve been using parent child flags where one team would look after the parent and lower level teams would maintain their children. We’ve tried to ensure logical separation of flags between environments and teams. Hopefully more on this in our talk at 14:50 on #ask-the-speaker-track-4

👍 2
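
To make the parent/child flag arrangement described above concrete, here is a minimal sketch in Python. It is not LaunchDarkly's SDK or Vodafone's implementation; the `Flag` class, the team names, and the flag keys are all hypothetical, and it only assumes that a child flag is considered on when it and every ancestor are on in the given environment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Flag:
    """A feature flag owned by one team, optionally gated by a parent flag."""
    key: str
    owner: str  # hypothetical team name, for illustration only
    # Per-environment state, e.g. {"dev": True, "prod": False}.
    enabled_in: Dict[str, bool] = field(default_factory=dict)
    parent: Optional["Flag"] = None

    def is_on(self, env: str) -> bool:
        """A flag is on only if it and every ancestor are on in `env`."""
        if not self.enabled_in.get(env, False):
            return False
        return self.parent is None or self.parent.is_on(env)


# One team looks after the parent, another maintains its child.
checkout = Flag("checkout-v2", owner="platform-team",
                enabled_in={"dev": True, "prod": False})
promo = Flag("checkout-v2.promo-banner", owner="offers-team",
             enabled_in={"dev": True, "prod": True}, parent=checkout)
```

With this shape the parent's owner can switch off a whole feature area in one environment without touching the children, which keeps the per-team and per-environment separation visible in the data.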
Tim B12:05:58

Thanks - I had that down as one of the ones to go to 🙂 Also interested in any branching strategy, or do the flags do away with the need etc.? Look forward to the talk

Robert Greville12:05:33

Flags have enabled us to move fully to TBD (Trunk Based Development). Prior to that, although most of our teams had moved to it, some were cherry picking release branches. LaunchDarkly enabled us to move towards this and really move away from several branches, with varying levels of code in them, in various environments - it really cleaned it all up.

Ffion Jones (Partner, PeopleNotTech)12:05:11

"The teams started to feel empowered, excited. The throughput started to skyrocket" The first step to team nirvana?

👍 1
Kimberley Wilson12:05:13

@ben.connolly @sabina.kambersalamanc Really enjoying your talk, particularly the emphasis that mindset shift and psychological safety lead to better OKRs. How did you transition to a culture of better psychological safety? Did you find that there were some areas still looking for security using command and control and, if so, how did you tackle this?

🙌 3
Ben Connolly, Head of Engineering, Vodafone12:05:13

Great question! This is one of my favourites as it really gets to the heart of the 'alternative' leadership values needed. Will do my best to summarise!

James Simon12:05:07

Very interested in this question, I find command and control to be so deeply embedded

Katharine Chajka (Tasktop)12:05:20

How much is bottom-up vs top-down with OKRs/KPIs?

☝️ 1
Ali Shahadat - Engineering at Wise12:05:27

Who defines the OKRs at Vodafone? Is it the teams themselves or are they set in a top down fashion?

👀 1
Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:41

While we were trialling, we were shaping them in Ben's leadership team. Now, as we come out of the trials, we are starting to work with the teams to shape them together. More on this at some point in the future

👍 2
Stijn Claes - Nike12:05:30

@sabina.kambersalamanc did you stick to company/tech wide OKRs or did you also ask teams to come up with what is relevant to them, on their level?

Stijn Claes - Nike12:05:57

We are trying to cascade OKRs to the teams because we see different teams struggling with different obstacles

Stijn Claes - Nike12:05:08

Happy to share thoughts on the learnings

Saket Kulkarni, Coach, Capgemini (he/him)12:05:36

It’s a big step to decouple a large monolithic release into many independent releases in terms of platform/software architecture. I’d love to hear a bit more about how this was done.

Siddharth, NatWest Group, DevOps CoE (he/him)12:05:43

@ben.connolly, @sabina.kambersalamanc impressive numbers. Not sure whether my question will be answered in upcoming slides, but how was it linked to business outcomes?

Ben Connolly, Head of Engineering, Vodafone12:05:20

Great question @siddharth.pareek - for these initial OKRs our real outcome was to achieve & demonstrate the business agility we're striving for. Our ability to deliver value faster has really seen a step change, which in turn has started to change behaviour around us (in order to better leverage that capability). Long way to go, but we're cooking!

Katharine Chajka (Tasktop)12:05:12

also, did you have a coach helping with "good OKRs" vs "bad OKRs" or just learn through practice?

Ffion Jones (Partner, PeopleNotTech)12:05:47

Psychological safety and removing fear of failure are critical to team success

👍 4
Chris Gallivan, Stellantis, Value Stream Architect12:05:52

what is your coach to team ratio?

👀 1
Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:00

I am the coach for the entire digital engineering, and our Scrum Masters act as coaches for our teams. I am not saying it is like that for the rest of the org, but it works for us.

Katharine Chajka (Tasktop)12:05:51

How frequently do you review them with teams/leadership?

Rmunwin12:05:51

Can you share your current KPIs?

Vaidik Kapoor (Speaker) - Technology Consultant12:05:08

@ben.connolly @sabina.kambersalamanc how do you deal with situations where you have set OKRs for a quarter or a year and now want to change them? As leadership, a common question that comes from teams is “but we had set those OKRs for a quarter, why should we change them now?“. Often teams struggle with a sudden need for change either driven from the management (change of business priorities due to forces out of control) or from the team (change in understanding of the problem and feasibility to provide effective solutions).

Bob Fischer12:05:21

Can you provide more details on inner sourcing?

👀 1
Bob Fischer12:05:35

What were you trying to achieve?

Ben Connolly, Head of Engineering, Vodafone12:05:38

Hey @bfischer yes, sure. Might be worth a call later though. Lots involved in that one!

Ben Connolly, Head of Engineering, Vodafone12:05:20

Main thing we're trying for is to share ownership, better leverage the scale of the team, and be able to work in a much more concurrent way, rather than always sequentially (and regularly forcing prioritisation decisions to be made)

Bob Fischer12:05:53

So, did you measure the number of teams who had accepted pull requests from anyone outside of the team?

👍 2
Bob Fischer12:05:05

That might show the health of inner sourcing.

🙌 2
Bob Fischer12:05:44

Ahh! Like it.

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:21

Our inner sourcing OKR, when we commenced (which was last quarter), was the following:
OKR: Every team must accept, approve and successfully deploy ONE PR from a different team into a service they own.
Overarching objective we are working towards across our quarters: We will be inner source capable across all services.
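
The inner sourcing key result above, together with the earlier question about counting teams that accepted outside pull requests, can be sketched as a small calculation. This is only an illustration under assumptions: the `PullRequest` record and its fields are hypothetical, and in practice the data would come from whatever the source control or delivery tooling exposes.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class PullRequest:
    author_team: str   # team the PR author belongs to (hypothetical field)
    owning_team: str   # team that owns the target service
    deployed: bool     # True once accepted, approved and successfully deployed


def teams_meeting_key_result(teams: Set[str], prs: List[PullRequest]) -> Set[str]:
    """Teams that deployed at least one PR authored by a different team."""
    met = {
        pr.owning_team
        for pr in prs
        if pr.deployed and pr.author_team != pr.owning_team
    }
    return met & teams


def key_result_progress(teams: Set[str], prs: List[PullRequest]) -> float:
    """Fraction of teams (0.0 to 1.0) that have met the key result."""
    if not teams:
        return 0.0
    return len(teams_meeting_key_result(teams, prs)) / len(teams)
```

PRs raised by a team against its own service, and PRs that were never deployed, do not count towards the result.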

Bob Fischer12:05:12

Perfect. Great information!

👍 1
Bob Fischer12:05:43

For some companies this is really controversial. This is a nice experiment.

Jon Smart [Sooner Safer Happier]12:05:54

"OKRs drive cultural change, not just process improvement"

🙌 1
Erik Sackman12:05:00

are you using specific tools to communicate and gather data on your OKRs?

👀 1
☝️ 2
Vaidik Kapoor (Speaker) - Technology Consultant12:05:24

we used to use something called 15five

Jon Smart [Sooner Safer Happier]12:05:53

Good question. @ben.connolly @sabina.kambersalamanc are you using a tool for OKR transparency?

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:21

No specific tool at the moment. We are looking at this for the future, when Vodafone is ready to scale on this. Right now, it is a combo of data from Azure DevOps, packed up in end of Sprint Reports that our Agile Lead and SMs run every 2 weeks. The reports are PowerPoint presentation style.

👍 2
Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:56

Everyone attends the end of sprint reviews, which transparently show the status of the OKRs. It is a community of technology, agile reps, POs, PMs etc...

👍 2
Ffion Jones (Partner, PeopleNotTech)12:05:09

Yes - psychological safety and agile servant leadership - you can empower the team and release yourself from all the pressure of leadership on your shoulders by building PS

❤️ 5
👍 1
Ben Connolly, Head of Engineering, Vodafone12:05:31

ignore that bit

5
😂 2
😁 1
🙊 2
Andy Farmer - Tasktop12:05:18

Great talk @ben.connolly and @sabina.kambersalamanc

❤️ 5
Jon Smart [Sooner Safer Happier]12:05:34

Thank you @ben.connolly and @sabina.kambersalamanc for sharing your journey, great talk!

❤️ 3
Ben Connolly, Head of Engineering, Vodafone12:05:40

Some awesome questions there! Thanks everyone! We're working through them. 🙂

Ben Grinnell - North Highland and DOES PC12:05:57

Great talk @ben.connolly and @sabina.kambersalamanc - thank you for sharing

❤️ 1
Ash Martin12:05:22

Great job @ben.connolly!

❤️ 1
Fokko V.12:05:46

What sources have you used to learn about OKRs @ben.connolly? And of course, thanks for sharing your great story!

Sabina Kamber Salamanca (Lead Agile Coach, Vodafone)12:05:14

Mainly: Perdoo and Felipe Castro materials. Also, lots of others, but the above two were key for us to start with

Fokko V.12:05:21

Makes sense, thanks a lot!

👍 1
Katharine Chajka (Tasktop)12:05:57

great talk thank you!

❤️ 1
👍 2
Ann Perry - IT Revolution12:05:28

Thank you, Ben and Sabina! Next up: @deepak!

👍 2
Alex Ryan Burnett12:05:35

Great talk, thanks @ben.connolly & @sabina.kambersalamanc 🙂

❤️ 2
Vlad Ukis12:05:34

Financial Guardrails as Code - How to do this?

4
Deepak RV(Contino)12:05:06

@vladyslav.ukis a variety of mechanisms:
• Built-in tags as part of any approved modules (e.g. Terraform modules, StackSets, template specs .. pick your poison)
• Provide a set of Finance modules that we expect product / app teams to run as part of their services
• A default set of Finance-based policies (AWS SCPs, Azure Policies) that we run on the product environments (AWS Accounts, GCP Projects, Azure Subs) as part of the vending lifecycle of these environments

🙏 2
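
As a rough illustration of the first mechanism above (built-in finance tags), here is a minimal Python sketch of a tag guardrail check. It does not call any cloud provider API; the required tag keys and the resource names are hypothetical, and a real setup would more likely enforce this through the policy and module mechanisms listed above.

```python
from typing import Dict, List

# Hypothetical finance tag keys; each organisation would define its own.
REQUIRED_TAGS = ("cost-centre", "product", "environment")


def missing_finance_tags(resource_tags: Dict[str, str]) -> List[str]:
    """Return required tag keys that are absent or blank, in REQUIRED_TAGS order."""
    return [k for k in REQUIRED_TAGS if not resource_tags.get(k, "").strip()]


def check_resources(resources: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each non-compliant resource name to the tags it is missing."""
    report = {}
    for name, tags in resources.items():
        missing = missing_finance_tags(tags)
        if missing:
            report[name] = missing
    return report
```

A check like this could run in a pipeline before an environment is vended, failing the run when the report is non-empty.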
Vlad Ukis12:05:28

Good points!

Vlad Ukis12:05:10

This is cool! Would be interested in how to integrate F&P deep into the product delivery process!

Deepak RV(Contino)12:05:44

@vladyslav.ukis, we've learnt to gamify the process and make Benefits and Cost visible to everyone .. just like you would do with Reliability metrics on Engineering screens. So as part of the Product team, we have a FinOps SME embedded, or Financial viability is part of the Well Architected review and Route to live.

👍 3
Vlad Ukis12:05:15

A FinOps SME could support how many dev teams (roughly)?

Deepak RV(Contino)12:05:01

Depends on the level of maturity of the teams and what outcomes you're trying to achieve. Starting off with one part-time SME to support under 6 teams is sufficient. As you start to move towards 12 - 15 teams, having a person full time is useful, as the cost savings and benefits end up paying for the role

Vlad Ukis12:05:49

thx! this is very useful!

Vlad Ukis12:05:49

The maturity model - is that by Contino?

Deepak RV(Contino)12:05:48

It's adopted based on the maturity model by the FinOps foundation, which is open source 🙂

Saket Kulkarni, Coach, Capgemini (he/him)12:05:08

What are some of the most established tools for FinOps/Financial Traces aside from the CSP tools? Is there anything you recommend to work with?

Deepak RV(Contino)12:05:47

Cloudability & Apptio has been a very mature one that I used. I do also like the Shared Cost allocation feature on Azure Cost Explorer

Andy Farmer - Tasktop12:05:10

Love the financial trace - really useful framework for getting costs nailed down for organisations focused on cost. What about organisations focused on value - is there a similar FinOps trace/framework for orgs focused on value rather than cost?

👀 1
Andy Farmer - Tasktop12:05:36

I guess the "Benefits Summary" part of the screenshot could help in that direction

💯 1
Deepak RV(Contino)12:05:46

@andy.farmer as part of establishing the trace, you start to tag and align the consumption against the actual business value. You can pluck out the value levers from the Cloud Benefits framework, and tag either your AWS accounts against it or a collection of applications.

👍 3
Andy Farmer - Tasktop12:05:08

Have you been able to get data in from PPM/timesheet tools as well? (where companies use those)

Deepak RV(Contino)12:05:31

we've focused our efforts more on outcomes as opposed to outputs. So we've not incl. timesheets or people time against it .. but I can see why this would be of interest to someone. Ironically, that's where integrating with something more value and flow focused like Tasktop would be a better way of looking at it 😉 .

Andy Farmer - Tasktop12:05:28

ha yes that's a fair point! 🙂

Andy Farmer - Tasktop12:05:39

My thinking is - including this financial traceability info automatically into the business results datasets that we work with... gives a nice realtime link.

Saket Kulkarni, Coach, Capgemini (he/him)12:05:46

I’m guessing the business loves the insight into IT spend. What are some of the responses you’ve had from the business about FinOps?

Deepak RV(Contino)12:05:17

They've loved it. They've felt part of the product development and decision making process as opposed to feeling like outsiders who don't have a role to play in the world of cloud. It's also helped Finance teams realise how they need to change the processes, especially if Cloud is going to be the norm for their organisation. So all in all, it's one of the best things we could have done.

👍 4
Vlad Ukis12:05:15

what was the starting point for the initiative? was it a crisis?

Deepak RV(Contino)13:05:59

Less of a crisis, more a question being asked by Finance teams: what role do we actually have to play in a cloud-first world?

Vlad Ukis13:05:11

I like their attitude!

Fokko V.12:05:59

Thanks a lot for the great talk! A lot of insights and learnings here 👍

💯 3
Vlad Ukis12:05:39

Learned a lot from the talk. Thank you very much, indeed, @deepak.ramchandani!

🙏 1
🙌 1
John Osborne13:05:15

Really looking forward to this next talk

Ann Perry - IT Revolution13:05:34

Coming back from the break, we welcome @christina_yakomin from Vanguard!

Steve Smith13:05:51

Hi @christina_yakomin 👋

Christina Yakomin13:05:03

Hi all! 👋

👋 2
Steve Smith13:05:21

Please don't have any outages, a lot of attendees will have money in Vanguard 😆

😂 2
😄 1
Saket Kulkarni, Coach, Capgemini (he/him)13:05:22

She did just mention “Chaos Engineering”. Don’t they specifically generate incidents on production to test resilience? 😁

Christina Yakomin13:05:36

At Vanguard, we don’t run our chaos tests that we expect to cause outages in production, for reasons that are probably obvious! We do however frequently run these kinds of tests in non-production, and have run tests with a limited blast radius off-hours in an isolated segment of our production environment (separate from where our client traffic is routed!)

👍 3
Vaidik Kapoor (Speaker) - Technology Consultant13:05:15

How is the dev/prod parity in the case of Vanguard?

Steve Smith13:05:13

@christina_yakomin Thanks for not running chaos tests in production 🙂

Steve Smith13:05:37

@simon.skelton and I will talk on Thursday about how John Lewis & Partners run Chaos Day tests in their one pre-prod env

🙌 2
Saket Kulkarni, Coach, Capgemini (he/him)13:05:05

Thanks for the clarification, @christina_yakomin. I hope it was clear my comment wasn’t entirely serious, but it’s certainly interesting to read how you have things set up at Vanguard. 😊

Christina Yakomin14:05:11

@kapoor.vaidik no non-prod environment will ever perfectly mimic production, so anyone who tells you they’ve achieved parity is lying (or confused)! But even though we know it’s not exact, we still get a lot of value out of our non-prod chaos experimentation. I talk a lot more about chaos testing at Vanguard in my presentation from SRECon last year, which you can find on YouTube by searching my name and SRECon

👍 1
1
John Osborne13:05:04

Hi @christina_yakomin curious - did you move your database to the public cloud or did you just move the apps and have them connect back to the datacenter?

Christina Yakomin13:05:26

We have a LOT of different applications at Vanguard that are in various stages of migration. Some have “fully migrated,” data and all, while others still leverage data on-premise. Our goal is to migrate almost all of the data securely to the public cloud (think ~90%)

Christina Yakomin13:05:58

Most of our apps running in the cloud at least perform reads against data in the cloud

Steve Smith13:05:25

Does that mean you're doing multi-writes, one to the cloud DB and one to the on-prem DB, for the same transaction?

Steve Smith13:05:44

Or doing writes to the on-prem DB and simply replicating to the cloud DB?

Steve Smith13:05:49

I assume it's AWS Aurora or similar

Dave Stanke - DORA.dev14:05:21

I’ve heard from some people that they had rude surprises when they moved applications to the cloud but kept data on-prem: major networking costs and bottlenecks between the tiers. Any similar experience here?

Christina Yakomin14:05:47

@davidstanke532 We definitely co-located the data in the cloud primarily to address concerns about performance due to latency between the app and the data. Leveraging redundant AWS Direct Connect does help us a bit here. I’m no expert on all of our data architectures, but I will say from experience that hybrid data architectures (partly on-prem, partly cloud) should be temporary. Relying on replication long-term is going to bite you eventually!

💯 3
Steve Smith14:05:06

Ah I see... very interesting, thanks Christina 👍

Steve Smith14:05:23

My favourite saying is "temporary is permanent", yes you have to be careful with temporary DB replication

Christina Yakomin14:05:43

@steve.smith exactly. I’ve seen replication far more than multi-write. Teams that have had the most success in migrating their apps to the cloud have already moved on from that replication stage, having built confidence in its functionality, and are running 100% cloud, app and data.

Steve Smith14:05:44

My experience has been multi-write was popular back in the day to replicate from one on-prem DB to another in code. Now it's on-prem to cloud and AWS etc. have solutions we can just plug in 😓

Dave Stanke - DORA.dev14:05:39

+1 — IMHO, managed cloud SQL offerings are some of the most useful cloud services. The flexibility w/r/t backups, replication, scale up/down are amazing. Yes, it’s a bit scary to transplant the beating heart of your service to another host but I have found it to be super liberating. Throw away your backup tapes! 😄

2
Steve Smith14:05:27

@christina_yakomin Have I missed a bit on operating model, or is it coming up? Do you have product teams on call out of hours for their microservices?

👀 1
Christina Yakomin14:05:25

Yes. Product teams are now on-call for their services. In the pre-microservices era, that prod support was much more centralized, and app teams rarely needed to support production. As part of the shift to microservices, the production support model shifted, too.

❤️ 2
Philipp Böschen, TUI, DevOps Coach, (he/him)14:05:08

How did the teams cope with suddenly being on call? Did you do anything to ease them into that role? It's pretty scary to suddenly be on call for something..

Christina Yakomin14:05:37

It was a gradual shift. On-prem legacy: centralized prod support Initial microservices: shared ownership of prod support Cloud-native apps: app teams primarily own their own prod support

❤️ 1
Steve Smith14:05:52

That's great to hear @christina_yakomin! So a similar journey to John Lewis & Partners

Steve Smith14:05:12

@simon.skelton and I are talking about it on Thurs, hopefully you can make it and let us know what you think

Christina Yakomin14:05:53

We still have central teams of experts to assist with prod support as needed, have invested a LOT into training, and now with our rollout of SRE roles, we are building a lot more operations expertise on our product teams

❤️ 1
Christian Rudolph (TUIGroup - Head Of DevOps Transformation)14:05:24

so you will have SRE roles in every product team?

Steve Smith14:05:02

That's the kind of approach at Equal Experts we recommend to our clients - build up the product teams, spin up enablement teams to assist teams at scale when necessary 👍

Christina Yakomin14:05:47

@christian.rudolph we will have SRE roles for every family of related products, and certain products requiring very high availability may have dedicated SREs as well. The introductory level of our internal SRE training program will be shared with ALL engineers who are part of a product’s on-call rotation, even if they don’t have an SRE job title

👍 1
Steve Smith14:05:08

I assume yes if you're looking for 99.9% availability, or maybe you've got an SRE on-call team for extreme high availability services say 99.99%?

Christina Yakomin14:05:07

At Vanguard it varies by service. For some critical trading platforms, we’ll aim for 99.99% or 99.999% while our marketing sites may not have the same criticality - more like 99.9%
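
To put the availability targets mentioned above in perspective, the allowed downtime for a given target is simple arithmetic. The sketch below assumes a 30-day window by default; the function name and the window length are illustrative choices, not something stated in the talk.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 30) -> float:
    """Minutes of downtime permitted in the period at the given availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for target in (99.9, 99.99, 99.999):
    # Over 30 days: roughly 43.2, 4.32 and 0.43 minutes respectively.
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes per 30 days")
```

Each extra nine cuts the allowance by a factor of ten, which is why the target is set per service rather than applied uniformly.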

Steve Smith14:05:20

A very sensible approach 👍

Steve Smith14:05:19

I ask because @simon.skelton has overseen John Lewis & Partners moving to on-call product teams, 99.9% is needed for http://johnlewis.com. It wasn't practical for one ops team to be on-call for 20+ product teams. A superhero SRE on-call team wouldn't have been cost effective for 99.9%, either. Retail companies rarely need true 99.99% availability, in my experience

Vaidik Kapoor (Speaker) - Technology Consultant14:05:02

@christina_yakomin maybe you covered this, but typically who is the first responder to an alert in your model? Is it the developers who are owners of the microservices? Or the SREs? Or both? How does collaboration / knowledge exchange happen while an incident is being dealt with?

Christina Yakomin14:05:05

Today, in most cases, it is the on-call engineer on the product team for the impacted product. SRE experts for each line of business and operations experts from centralized technology teams may be brought in to assist by our incident commanders as needed. We have an entire team of excellent incident commanders who ensure that communication is effective during an incident call

Vaidik Kapoor (Speaker) - Technology Consultant14:05:58

We don’t have incident commanders though

Vaidik Kapoor (Speaker) - Technology Consultant14:05:22

Do alerts auto-escalate to “defined SREs” for areas of the business if not resolved within an SLA?

Christina Yakomin14:05:08

To SREs, no, but we do have auto-escalation in place if incidents are not acknowledged. Usually it is up to the discretion of the responding engineers and the incident commanders to escalate further and pull in additional engineering resources for troubleshooting, driven by the severity of the incident in most cases.
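
Auto-escalation on unacknowledged pages, as described above, can be sketched as follows. This is a generic illustration rather than Vanguard's tooling: the `Page` record, the 10-minute timeout, and the escalation chain are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Page:
    opened_at: datetime
    acknowledged_at: Optional[datetime] = None


def escalation_level(page: Page, now: datetime, ack_timeout: timedelta) -> int:
    """Number of acknowledgement timeouts missed before the page was acknowledged (or until now)."""
    if ack_timeout <= timedelta(0):
        raise ValueError("ack_timeout must be positive")
    stop = now if page.acknowledged_at is None else min(page.acknowledged_at, now)
    elapsed = max(stop - page.opened_at, timedelta(0))
    return elapsed // ack_timeout  # timedelta // timedelta gives an int


def current_responder(page: Page, now: datetime, chain: List[str],
                      ack_timeout: timedelta = timedelta(minutes=10)) -> str:
    """Who holds the page at `now`; escalation stops at the end of the chain."""
    if not chain:
        raise ValueError("escalation chain must not be empty")
    return chain[min(escalation_level(page, now, ack_timeout), len(chain) - 1)]
```

Acknowledgement freezes the level, so further escalation after that point stays a human decision, in line with the discretion of responders and incident commanders mentioned above.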

Vaidik Kapoor (Speaker) - Technology Consultant14:05:42

That’s a classic where logs end up with 2 KB JSON objects making absolutely no sense

Philipp Böschen, TUI, DevOps Coach, (he/him)14:05:47

100% downtime is a much easier goal! 😎

😂 2
Saket Kulkarni, Coach, Capgemini (he/him)14:05:04

This is true. Although deceptively so if I look at how many people get pinched by their Cloud Service Provider bill because they somehow missed bringing down some part of their infra. 😂

Philipp Böschen, TUI, DevOps Coach, (he/him)14:05:46

It's scary what you find lying around in some cloud accounts indeed 👀

Philipp Böschen, TUI, DevOps Coach, (he/him)14:05:51

But more seriously, how did you arrive at your SLOs? Was there some period of measuring all interactions, or some hard requirements?

🙌 1
Christina Yakomin14:05:47

It is very hard to centralize this and provide standards. It really comes down to knowing your clients’ expectations. Our technical teams partner closely with business product owners to set these thresholds and re-evaluate them quarterly. Many teams are working to improve their overall availability, so they’re making their SLOs stricter each quarter as reliability increases. Eventually we want to see teams operating at their target SLO and using their error budgets effectively.

👋 1
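
For readers less familiar with error budgets, the idea above can be sketched with a request-based SLO. The function and the numbers are illustrative assumptions; they are not Vanguard's definitions or tooling.

```python
def error_budget_remaining(slo_pct: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; a negative value means it is overspent."""
    if not 0 <= slo_pct < 100:
        raise ValueError("slo_pct must be at least 0 and below 100")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The budget is the number of failures the SLO tolerates over the window.
    allowed_failures = total_requests * (1 - slo_pct / 100)
    return 1 - failed_requests / allowed_failures


# Example: a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures,
# so 250 failures leaves roughly three quarters of the budget.
remaining = error_budget_remaining(99.9, 1_000_000, 250)
```

Tightening the SLO each quarter shrinks `allowed_failures`, so the same number of failures consumes more of the budget.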
Philipp Böschen, TUI, DevOps Coach, (he/him)14:05:42

Ah gotcha, that makes a lot of sense!

Billy Hudson - ScholarPack - DevOps Engineer14:05:26

We’re trying to move away from alerting on causes to alerting on symptoms at my organisation and I believe the setting of SLIs and SLOs for performance metrics is key to this.

4
Vaidik Kapoor (Speaker) - Technology Consultant14:05:41

Nice point on making good monitoring dashboards.

Billy Hudson - ScholarPack - DevOps Engineer14:05:37

After doing my time building dynamic grafana dashboards I will certainly look into this, thanks!

🙌 1
Vaidik Kapoor (Speaker) - Technology Consultant14:05:09

the idea is to have best dashboard design practices built in the code

🙌 1
Ann Perry - IT Revolution14:05:26

Thank you, Christina! Coming up next, @matthew.pegge and @ilia.shakitko !

Simon Skelton - John Lewis, Platform and Ops Manager14:05:28

@christina_yakomin how do you share your Post Incident Reviews to ensure everyone learns from them? P.S. Thanks for a great presentation!

👀 1
Christina Yakomin14:05:46

We have internal “blogs” that I’ve leveraged in the past to share incident review write-ups. These allow likes and comments to encourage discussion and questioning. We recently added the ability to track “views” on these pages, as well. In my experience, my post-incident reviews are always my most-viewed and most-liked blog posts. Seems like the engineers at Vanguard are more interested in how I broke and later fixed something than they are in my new feature releases.😂

😂 1
✔️ 1
Simon Skelton - John Lewis, Platform and Ops Manager14:05:06

I agree, there's always lots to learn from things we break, and it's usually a very interesting set of circumstances that led to it. I guess we all just like detective and whodunit stories!! 🔎

Steve Smith14:05:33

@ben.conrad @mhyatt: Does HMRC still have an internal blog for sharing post-incident reviews? I remember something similar in 2017. Thanks

✔️ 2
Steve Smith14:05:38

@christina_yakomin I'd definitely recommend tracking view counts on post-incident reviews. Out of context, it's a vanity metric. In context, it's a cheap, effective proxy of organisational learning. Until John Allspaw thinks of something better, that is

2
Christina Yakomin14:05:18

100% agreed. I was very excited about the recent addition of this simple feature.

Ben Conrad - Delivery Person, HMRC14:05:51

At HMRC we publish all our PIRs and have a slack channel where we post links to them as well.

👍 2
Saket Kulkarni, Coach, Capgemini (he/him)14:05:36

Thank you @christina_yakomin for a great presentation!

Ilia Shakitko, Accenture Liquid Studio NL14:05:44

Thanks @annp. Time to Rock!

🙌 1
Jason Clark14:05:46

@christina_yakomin For your technical platforms, what does your support model look like? Do you have a separate production support team, or does Vanguard follow an SRE model where devs are also production support? Financial firms often have requirements of prod/dev segregation, so they maintain separate teams.

Christina Yakomin14:05:01

Great question. Most of our platforms have an engineering team dedicated to building and supporting them. That team would be on-call for the platform. In the event that a platform outage causes application outages, the incident calls can get very large very quickly. The incident commanders help platform engineers with internal stakeholder communication while they troubleshoot and triage.

Christina Yakomin14:05:06

Our non-prod and prod environments are very segregated, but the same product teams troubleshoot both. We have lots of controls around production changes, many of them automated, including separation of duties (person who wrote the code/config change can’t deploy it). In a pinch, there is a dedicated team with more privileged access - though they still are subject to many controls - to handle manual changes to remediate incidents.

Billy Hudson - ScholarPack - DevOps Engineer14:05:52

Thanks @christina_yakomin great work!

Vaidik Kapoor (Speaker) - Technology Consultant14:05:20

@christina_yakomin really nice talk. very relatable. thoroughly enjoyed it!

Steve Smith14:05:44

Best talk of the conference so far 👏

👍 1
❤️ 1
✔️ 1
Chris Gallivan, Stellantis, Value Stream Architect14:05:26

what was the mix of Accenture vs FedEx coaches?

Ilia Shakitko, Accenture Liquid Studio NL14:05:36

@chris.gallivan278 we started with most of the coaches coming from Accenture | SIQ, but the goal was to “train the trainer” ASAP.

Ilia Shakitko, Accenture Liquid Studio NL14:05:05

Current state is somewhere close to 50/50 at that particular program, @matthew.pegge correct me if I am wrong.

Matthew Pegge14:05:49

Yes, we are moving towards self-sustainability, closer to 50:50 now

Steve Smith14:05:00

"Stephen Smith published a safety check article". I don't remember doing that. Was it good? 🎉

Ilia Shakitko, Accenture Liquid Studio NL14:05:20

@steve.smith 🙂 Above is the reminder :)))

😆 1
Ilia Shakitko, Accenture Liquid Studio NL14:05:45

Depends on the team maturity and team coach, but in my experience it has never worked effectively to have more than 2 teams per coach.

Chris Gallivan, Stellantis, Value Stream Architect14:05:08

sounds familiar. many coaches say "1" when I ask this question. The best coaches usually say "1"

Steve Smith14:05:54

@ilia.shakitko I don't understand "extending traditional CD capabilities to support enterprise functions". I know CD pretty well. Where did you feel it needed extending?

Ilia Shakitko, Accenture Liquid Studio NL14:05:48

especially considering the knowledge reinforcement… teams are stepping back when there is a massive change they are undergoing, and we had to give them time to absorb it and make that step back. So it’s an illusion if you want to go like stairs, 2, 2, 2, 2,… not linear….

Ilia Shakitko, Accenture Liquid Studio NL14:05:38

@steve.smith what I meant is that, in my experience, a CI/CD process is always implemented to support the software part of it…

Ilia Shakitko, Accenture Liquid Studio NL14:05:24

But we’ve incorporated automation pieces and tools to have audit, compliance, change management, and CAB (yeah…) requirements into the pipeline.

Steve Smith14:05:15

I see... I think CD is much more than just automation; auditing/change management are always a part of it!

Ilia Shakitko, Accenture Liquid Studio NL14:05:48

I am not saying that it is never done, but in large solutions and legacy landscapes, in what I’ve seen teams are struggling to get out of only build&deploy automation activities.

Steve Smith14:05:50

Dave/Jez covered that in their 2010 book, and I've always seen governance/service management as the much bigger part of it. I'm sure you have to

Bryan Kemp14:05:59

I like controlling WIP. Were there metrics to guide you?

Ilia Shakitko, Accenture Liquid Studio NL14:05:27

@bryan.kemp good question, can’t really share a ready-to-use playbook here… I am using some of the insights from the Theory of Constraints + looking at the current team maturity state, and adjusting based on observing the flow. It also depends on where the bottleneck was initially found: the WIP limit there can be lowered first (while other parts can be left at a ratio of 1:1 to the number of team members)…. That really helps to expose issues and lets people shout out and start reaching out to other folks to see how they can help.

Ilia Shakitko, Accenture Liquid Studio NL14:05:46

I need to write shorter messages, sorry.

Ilia Shakitko, Accenture Liquid Studio NL14:05:56

If you look at the presentation after the talk, you’ll see that the initial slide has a delivery street mapped, with 3 main metrics (Delay, ProcessTime, and % Complete & Accurate). This is a good start, to answer your question better (w.r.t. metrics) @bryan.kemp
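
As a rough illustration of how those three per-stage metrics can be combined, the sketch below rolls them up into lead time, flow efficiency, and a rolled %C&A. The stage names and numbers are hypothetical, and the roll-up formulas follow common value stream mapping conventions rather than anything confirmed by the speakers.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Stage:
    name: str
    delay_hours: float            # waiting time before work starts in this stage
    process_hours: float          # hands-on time spent in this stage
    pct_complete_accurate: float  # share of work accepted downstream as-is, 0..1


def summarise(stages: list[Stage]) -> dict:
    """Roll per-stage Delay, ProcessTime and %C&A up to the whole stream."""
    process = sum(s.process_hours for s in stages)
    lead = process + sum(s.delay_hours for s in stages)
    return {
        "lead_time_hours": lead,
        "flow_efficiency": process / lead if lead else 0.0,
        # Chance that an item passes every stage without rework.
        "rolled_pct_ca": prod(s.pct_complete_accurate for s in stages),
    }


# Hypothetical two-stage stream.
summary = summarise([
    Stage("dev", delay_hours=16, process_hours=8, pct_complete_accurate=0.9),
    Stage("test", delay_hours=24, process_hours=8, pct_complete_accurate=0.8),
])
print(summary)
```

A stage with a long delay and a low %C&A is a natural candidate for the lowered WIP limit mentioned earlier in the thread.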

👍 1
Bryan Kemp14:05:01

@matthew.pegge can you provide more detail on replicating the value streams?

Lee Reid (Tasktop)14:05:05

Great talk, thank you!

👍 1
Bryan Kemp14:05:50

great talk!

👍 1
Ilia Shakitko, Accenture Liquid Studio NL14:05:28

Thanks for tuning in and questions!

AmrutaRaul14:05:00

@ilia.shakitko @matthew.pegge - Thank you for the awesome talk. How are you managing the flagging?

Matthew Pegge14:05:26

@bryan.kemp it's more about replicating the success we had with this particular team across the other ARTs and VSs. So selling the success and showing others what can be achieved, but hopefully shortcutting some of the challenges we had in the first case.

👍 1
Ilia Shakitko, Accenture Liquid Studio NL14:05:29

Do you mean what solution is used? Or whether it is working out well with Business? @amruta.raul

AmrutaRaul14:05:59

@ilia.shakitko - yes, the solution, and does Business enable the feature?

Lee Reid (Tasktop)14:05:38

I may have missed it, but how have you determined %C&A?

Ilia Shakitko, Accenture Liquid Studio NL14:05:19

@amruta.raul well, for a certain reason we had to go fast and use the simple feature API that is available out of the box in that particular product technology. But there is a variety of open-source and commercial options available. We had to go fast while (new) things are being evaluated and approved.

Ilia Shakitko, Accenture Liquid Studio NL14:05:16

Business got the flexibility and is actually using feature enablement. The “fear” of allowing teams to deploy to production is not yet totally gone, but we are getting there. It’s like driving a Tesla: for the first few turns you’re really scared, and then you enjoy the road.

Ilia Shakitko, Accenture Liquid Studio NL14:05:22

Well, maybe not the best comparison 🙂

🙌 1
👍 1
Ilia Shakitko, Accenture Liquid Studio NL14:05:59

@lee.reid we didn’t determine, we measured. Based on the ALM work rejections (tickets moved from test/approve back to dev) and interviews with those who participate in value delivery. For some of the stages, it’s the number of failed tests or the number of failed Release Candidates.
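
A minimal sketch of the measurement described above, counting a ticket as rejected if it ever moved from test or approve back to dev. The status names, ticket IDs, and data shape are assumptions for illustration; a real ALM export will look different.

```python
def pct_complete_accurate(histories: dict[str, list[str]],
                          rework_from: frozenset = frozenset({"test", "approve"}),
                          rework_to: str = "dev") -> float:
    """Share of tickets that never bounced from test/approve back to dev.

    histories maps a ticket ID to its ordered list of statuses.
    """
    if not histories:
        raise ValueError("need at least one ticket history")
    rejected = sum(
        # Look at each consecutive pair of statuses for a backwards move.
        any(a in rework_from and b == rework_to for a, b in zip(h, h[1:]))
        for h in histories.values()
    )
    return 1.0 - rejected / len(histories)


# Hypothetical ticket histories.
tickets = {
    "T-1": ["dev", "test", "approve", "done"],
    "T-2": ["dev", "test", "dev", "test", "approve", "done"],
    "T-3": ["dev", "test", "approve", "dev", "test", "approve", "done"],
    "T-4": ["dev", "test", "approve", "done"],
}
score = pct_complete_accurate(tickets)
print(score)
```

Counting each ticket once, however many times it bounced, keeps the figure a percentage of work items rather than of transitions.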

Lee Reid (Tasktop)14:05:28

ahh, thank you that makes sense