#ask-the-speaker-track-4
2021-10-07
Steve George [Weaveworks]13:10:26

Hi everyone! 👋

1
❤️ 1
Slackbot13:10:25

Reminder: The final day is starting now – opening remarks and then plenary talks! Join the conversation in #ask-the-speaker-plenary.

Molly Coyne (Sponsorship Director / ITREV)16:10:43

Let's welcome @bramley.maetsa and @guus.hutschemaekers for our next session's Q&A. Thank you to #xpo-servicenow!

👋 4
👏 1
Richard Hawes - ServiceNow DevOps16:10:01

Thank you Bramley for sharing!

❤️ 1
Guus Hutschemaekers - Plat4mation16:10:11

Good question @bramley.maetsa!

Tom Wojtusik (Tasktop)16:10:10

@bramley.maetsa very good point. DevOps has to be defined by an org to meet its particular needs.

thumbsup_all 1
👍 1
Dian Hansen16:10:39

"Products are compliance by design" - great!

2
Richard Hawes - ServiceNow DevOps16:10:23

If you’d like a copy of Bramley’s slides feel free to send an email to richard.hawes@servicenow.com

👍 2
❤️ 1
Joe Arrowood16:10:56

Please mute yourself

Steve Beal16:10:58

what's going on here?

Dian Hansen16:10:58

yes, we're hearing you talk

Sara Gramling16:10:59

seems to be in the video

Chris16:10:01

seems we have other people on air

Patrick S. Kelso16:10:01

Glad it's not just me.

Steve Beal16:10:40

was that part of the recording??

Molly Coyne (Sponsorship Director / ITREV)16:10:16

Thanks so much for your note, everyone! Our VendorDome participants were eager and live a bit early. 🙂

2
😅 2
Richard Hawes - ServiceNow DevOps16:10:53

Was not in recording. I can send a clean recording to anyone that needs it and it will be available in the Summit site after the event. richard.hawes@servicenow.com

👍 3
Molly Coyne (Sponsorship Director / ITREV)16:10:12

Please note that you can view this entire video (without the voices 😉) in our Video Library right after the session.

👍 1
Richard Hawes - ServiceNow DevOps16:10:37

Great minds think alike, Molly!

1
😉 1
Chris16:10:32

Then I was probably thinking right, but isn't that S. Labourey from CloudBees' voice?

1
Chris16:10:15

Thank you @bramley.maetsa

Guus Hutschemaekers - Plat4mation16:10:43

If anyone wants an analysis similar to the one Bramley mentioned for their ServiceNow environment and CMDB, drop me a message 🙂

❤️ 1
Molly Coyne (Sponsorship Director / ITREV)16:10:15

Thanks so much @bramley.maetsa @guus.hutschemaekers and #xpo-servicenow for your great presentation and roll-with-the-punches. Please be sure to visit ServiceNow's booth and #xpo-servicenow channel for follow up information and questions!

❤️ 1
Guus Hutschemaekers - Plat4mation16:10:25

thanks for hosting us 🙂

❤️ 1
Molly Coyne (Sponsorship Director / ITREV)16:10:31

A warm welcome to @jcampos as he joins us for Q&A! And thank you to #xpo-splunk_observability for supporting DOES US 2021!

👏 1
Justin Abrahms (eBay)16:10:32

"Latency is the new downtime. It's not CPU or memory being pegged". <- this matches my experience as well, especially without good backoff/retry mechanisms.

💯 3
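
A minimal sketch of the backoff/retry idea mentioned in the message above, assuming a hypothetical `call` that raises an exception on failure. The function name, parameters, and defaults are illustrative and do not come from the talk.

```python
import random
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Retry `call` with capped exponential backoff and full jitter.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap the exponential delay, then pick a random point below it
            # so that many clients do not retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
```

Without the jitter, clients that failed together tend to retry together, which can keep latency high on a service that is already struggling.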
Dave Mangot - DevOps transformation professional16:10:24

Will we see the 4 Golden Signals make an appearance? 😂

Justin Abrahms (eBay)17:10:17

Looking ahead at the slides, it appears not. This is a "what is open telemetry and why/how should someone use it" talk.

Ryan Taylor - Senior Geospatial Developer, GISinc17:10:08

Does telemetry also include capturing data about how someone is using an application, such as what capabilities they are using, or is it more geared towards resources, processes, event logs, etc?

Ryan Taylor - Senior Geospatial Developer, GISinc17:10:18

We have some home grown usage logging but standardizing on an open standard could be a worthwhile endeavor.

Justin Abrahms (eBay)17:10:26

I don't think it's intended for the "are customers using this particular feature" sorts of workflows, especially given I think it will do sampling at scale.

👍 1
Justin Abrahms (eBay)17:10:42

@jcampos What's your experience here?

Greg Leffler (Splunk)17:10:37

opentelemetry doesn’t natively require sampling - and some platforms (e.g. ours) do not sample (by default). there’s not yet an opentelemetry schema for user events, but some platforms let you integrate say, RUM data, with opentelemetry-native events (i.e., metrics, traces, logs)

Johnathan Campos17:10:12

@ryanewtaylor Absolutely, we use traces to gather this data. Applications are instrumented to tell us how users are using the features of the application.

👍 1
Patrick Schmidt17:10:29

Boy I wish the conference overall had more practical guidance like this session. There are so many 'airy formless' presentations recounting an experience an org had moving from A to B. Very little details on 'what/how/etc'

🎉 1
Justin Abrahms (eBay)17:10:11

Just curious.. any other particular talks you're looking for practical guidance on?

Mike Snyder - Speaker (Oteemo)20:10:01

i believe the intent is the programming is set up to stimulate the follow on conversations that may go deeper into the tech/approaches/etc. In a half hour, it is a tough balance

Peter K.17:10:04

Currently OpenTelemetry does not have Logging implemented in the Go library. Do you know the tentative timeline when it will be up for release (alpha or what-not?)

Johnathan Campos17:10:52

Hey @contact670 I am not 100% sure on this timeline. I would have to get back to you on this and confirm.

Peter K.17:10:36

No worries, it's a nice-to-have atm as we'd like to have everything self-contained in our Go codebase.

Greg Leffler (Splunk)17:10:53

the OpenTelemetry logging specification overall is still not final, and I believe the team is working on getting that nailed down before expanding language support for it.

Peter K.17:10:52

Awesome, btw congrats on getting it to 1.0! cc: https://0ver.org/

👏 1
Molly Coyne (Sponsorship Director / ITREV)17:10:20

Thank you again @jcampos! 🌟

👍 2
Dian Hansen17:10:36

brilliant practical walk through - thanks @jcampos!

🙏 1
Craig Cook - IBM17:10:44

Can the links from the last slide also be posted here?

1️⃣ 2
Johnathan Campos17:10:04

Just a sec. Let me gather those.

Johnathan Campos17:10:00

If anyone has any other questions, happy to answer. Please feel free to DM me. 🙂

Greg Leffler (Splunk)17:10:43

Finally, if you enjoyed Johnathan’s presentation, he also recorded a video showing end-to-end how to instrument a Java app with OpenTelemetry: https://www.youtube.com/watch?v=oKxHnkuA5zY

Molly Coyne (Sponsorship Director / ITREV)17:10:12

🌟A warm welcome to @hlynch and @stephen who will be moderating today's VendorDome Q&A between @sacha and @brianf. Thank you #xpo-cloudbees and #xpo-sonatype for hosting our final VendorDome of DOES US 2021! 🌟

🎉 1
1
upvotepartyparrot 1
Sacha17:10:06

Hi everybody!

👋 2
Hope17:10:06

What is top of mind for Developers who are facing challenges with the shift in responsibilities as it relates to security? What is working? What isn’t? Questions for Sacha or Brian?

Craig Larsen - he/him - Solution Design Group Mpls17:10:37

Haha … finally someone who knows that a developer can feel, well, like a mushroom being fed. 😄

😃 1
1
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear17:10:39

A challenge I see is that the incentives for developers are to build and implement new work, and rarely are they given the time to review that work with a mindset of security.

1
👍 3
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear17:10:24

If we integrate tools into our deployment pipelines that will put on the brakes for code that isn’t secure enough, we can get pushback around why we can’t get this delivered to production. How can we get buy in around the security tools running in the pipeline being as valuable as test automation in the pipeline?

2
👍 1
Dipesh Bhatia17:10:20

We did that. False positives broke us down. How did we take care of false positives? Go with the tool: critical, less critical

👍 1
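
One possible reading of "critical, less critical" is that only the most severe findings stop the pipeline while the rest are reported. A minimal sketch under that assumption follows; the `severity` field and the function names are hypothetical and not tied to any particular scanner.

```python
# Assumption: only findings the tool rates "critical" should block a build.
BLOCKING_SEVERITIES = {"critical"}


def triage(findings, blocking=BLOCKING_SEVERITIES):
    """Split scanner findings into build-blocking and report-only lists."""
    blockers = [f for f in findings if f["severity"].lower() in blocking]
    warnings = [f for f in findings if f["severity"].lower() not in blocking]
    return blockers, warnings


def gate(findings):
    """Return True when the build may proceed."""
    blockers, _ = triage(findings)
    return not blockers
```

Report-only findings can still be tracked elsewhere, so a false positive at a lower severity does not halt delivery.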
Hope17:10:35

What is the biggest change you have seen in your development role with devsecops good or bad? Changes on your wish list?

Joe Waid - Manager, Delivery Engineering - Columbia Sportswear17:10:52

the good is that security is getting considered much earlier than it used to be. It is getting integrated into the pipeline, and cloud vendors force you to think about identity and access from the beginning which is a big improvement over the past in the datacenter when security was thought about only after all the code is built and ready to go live.

🎉 2
Trac Bannon (Speaker)17:10:24

w00t - "security is not just about code security!!"

👏 4
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear17:10:48

Continuing education on secure coding practices remains a challenge. There is always a ton to learn in this industry and I don’t think we are doing as much as we should be to help developers grow their skills.

1
Trac Bannon (Speaker)17:10:22

We need to help the developers to understand security coding; right now, they are often security hobbyists.

3
👏 1
Trac Bannon (Speaker)17:10:42

Upskilling and advocacy are an imperative!

Craig Larsen - he/him - Solution Design Group Mpls17:10:09

Along with blocking code during deployment, there is finding code that has a security issue after it has been deployed (like a month later). I worked at a place where some tools were deployed and then wouldn’t be changed for months because they worked. But if they were deployed again they would have tripped security issues.

🙀 3
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear17:10:00

How do you think we balance the speed we get from the open source ecosystem we have with node, nuget, etc with these newer requirements to understand your dependencies and risk from them?

👂 1
👍 1
Gene Kim, ITREV, Program Chair17:10:48

What an illustrious cast of experts!!! @sacha @brianf @stephen @hlynch What is the most optimistic thing you’ve seen to help devs, ummm, rise up to the level that the outside threats demand? I’m utterly in awe of the carnage of the effects of Solarwinds, codecov — and how software supply chains and CI/CD have been put in the spotlight.

👂 1
👍 3
1
Gene Kim, ITREV, Program Chair17:10:11

(Not anything against our friends with Ph.D.s.. Dr. @stephen 😆 )

Sacha17:10:06

oh oh!!! I screwed up!!!

😰 1
😁 1
Dipesh Bhatia17:10:21

the other thing that we are dealing with in CI/CD is the time it takes for security tests. We are doing Veracode and Twistlock, it takes 1 - 1.5 hr to finish !!

😰 1
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear17:10:57

I had similar speed issues in a past life with checkmarx! Scans took over 24 hours on some code bases.

Gene Kim, ITREV, Program Chair17:10:35

“It’s hard to find optimism” — a typical quote from grizzled @brianf, creator of Maven Central. 😆 cc @topo.pal

😆 2
Christopher Pryce17:10:48

What can we do to make this area simpler for developers? The simpler, the more secure is usually my rule of thumb; am I thinking about this wrong?

👂 1
Brian Fox17:10:57

but i did find one at least @genek ;-)

😆 1
Gene Kim, ITREV, Program Chair17:10:47

“at least it’s not as bad as 2011, when people were just worried about GPL licenses and everything else was left to the Infosec team” — @brianf (did I get that right? and you call that optimism? 😆 )

2
Brian Fox17:10:29

I only have to avoid the agpl and we have security teams and firewalls for everything else (circa 2011)

❤️ 2
👋 1
Gene Kim, ITREV, Program Chair17:10:35

I’d love to hear from each of the panel: “what was your reaction when you read/heard about the scale of Solarwinds and codecov issues?”

👂 1
Brian Fox17:10:13

“It was inevitable?”

Laura Henry - American Airlines [she/her]17:10:37

How do you get developers to care about security vulnerabilities even if their application “isn’t critical to the operation” but is still exposed to the internet?

👂 1
👍 1
Gene Kim, ITREV, Program Chair17:10:43

“the easiest dependency you have is the one that isn’t in your codebase” — @stephen 😆

🎉 2
Christopher Pryce17:10:57

Is that a correlation to “The best network security tool is a pair of scissors” ?

😆 1
Gene Kim, ITREV, Program Chair18:10:30

“modern attacks are focused on supply chains, and YOUR developers, and in the secrets in the open source community.” — @brianf It really is awe inspiring to see the full impacts of this — quite literally showed how unimaginative I’ve been. 🙂

👏 1
Gene Kim, ITREV, Program Chair18:10:08

“Staging is an important environment, too” — @stephen

🎉 2
Gene Kim, ITREV, Program Chair18:10:10

Okay, I gotta share a surprise — how a CEO might blame a summer intern for a systemic development issue. Not our industry’s finest moment. 🙂

4
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear18:10:42

Great point on crypto making everything a valid target now!

👏 2
Trac Bannon (Speaker)18:10:53

Even simpler: give teams time to experiment with OWASP & BURP and the training apps.

1
Trac Bannon (Speaker)18:10:44

It is surprising how even those samples make you take a step back and realize the most frequent vulnerabilities are not occluded.

Topo Pal - Programming Committee Member18:10:52

Has anyone done a threat modeling of their supply chain?

1
1
Hope18:10:36

Zero trust, anyone? 😁

Anupriya Rath18:10:14

@prashant.darisi Hi! Here is the slack Channel where we will be addressing Audience Questions during our session starting at 2:50 PM ET.

upvotepartyparrot 1
Hope18:10:59

Any topics you’d like to ask Sacha or Brian about? Even if off our topic of security as we approach the end of the session?

Topo Pal - Programming Committee Member18:10:14

I was going to ask about the threat modeling

👂 3
Topo Pal - Programming Committee Member18:10:25

Supply chain pipeline threat modeling

Gene Kim, ITREV, Program Chair18:10:16

@topo.pal I’ve been dazzled by some of the innovations being discussed in the Deno framework, created by the node.js founder, putting in a new permission model to prevent dependencies from accessing system resources. https://deno.land/ And Richard Feldman is now working on Roc, which rethinks the relationship between the code and the runtime/VM, going way beyond what Deno is thinking. Allows different runtimes, different memory allocators, sandboxing, etc.

Stephen Magill [Sonatype]18:10:13

Thought this said “Richard Feynman” at first 😂

😂 2
Gene Kim, ITREV, Program Chair18:10:56

That Roc video is cool. Feldman has some notoriety because he wrote the book on Elm, and now he’s off taking what he learned to domains outside of the browser. He presents a very novel relationship between code, something he calls the environment, and the runtime. “Environment” is something between the code and runtime.

Gene Kim, ITREV, Program Chair18:10:27

Although I cringe every time Feldman says “virus” — when he really means “arbitrary code execution”

Christopher Pryce18:10:41

Supply chain vulnerability is the trending topic this year. Say we learn how to secure the Supply Chain, what is the next big security threat on the horizon?

👂 1
👍 1
Gene Kim, ITREV, Program Chair18:10:33

@topo.pal It seems like this eliminates a whole category of errors, such as dependencies being able to basically do anything they want.

❤️ 1
Sacha18:10:31

• Global survey of C-level executives performed by CloudBees
• Executives overwhelmingly claim their software supply chains are secure (95%) or very secure (55%) and 93% say they are prepared to deal with an issue such as ransomware or a cyberattack on their supply chain.
• Vulnerabilities
  ◦ More than two in five (45%) executives admit that initiatives to secure their software supply chains are halfway complete or less, and
  ◦ 64% say they are not sure who they would turn to first if their supply chain was attacked.
• 64% say it would take more than four days to fix the problem if they did experience an issue. For a Fortune 500 company, this could result in the loss of millions in revenue and create significant reputational harm.
• And, while 93% of executives say they routinely practice dealing with a supply chain production vulnerability, 58% say that if they experienced one they have no idea what their company would do.
• More than four in five (83%) C-suite executives say having security issues causes their developers to drop everything to review code, which in turn causes other business disruptions. By dealing with security issues, 82% of executives say they are losing time employees could be spending on innovation
• Almost all executives say container images are checked for high or critical vulnerabilities (95%) and their automation access keys are set to expire automatically (95%), while 92% say their company only accepts commits signed with a developer GPG key. Nine in ten C-suite executives say dependencies to trusted registries are limited at their organization (90%) and that administrative access to CI/CD tools is restricted (89%).

🎯 3
⬆️ 1
Gene Kim, ITREV, Program Chair18:10:04

@brianf is incapable of even hypothetical optimism — he’s like the grizzled veteran who has seen too much. 🙂

Topo Pal - Programming Committee Member18:10:11

Good discussion @stephen @brianf @sacha and @hlynch

🙏 2
😆 1
Gene Kim, ITREV, Program Chair18:10:08

Just because someone shows their teeth doesn’t mean they’re smiling. 🙂

4
Gene Kim, ITREV, Program Chair18:10:47

“To see the open source community mobilizing shows that the problem is recognized.” YES! OPTIMISTIC! Nice job, @sacha and @brianf!! 😆

👏 1
1
Sacha18:10:11

He did it! But, hey, he is right…

Gene Kim, ITREV, Program Chair18:10:19

@sacha You achieved something I haven’t been able to do for years, not for lack of trying! 🙂 Nice!

🙌 1
Sacha18:10:37

HE IS ON FIRE!!!

Virginia Laurenzano NSA/MARFORCYBER18:10:55

"things are not sh1tty" - is it bad this makes me glad?

😂 3
👌 1
Gene Kim, ITREV, Program Chair18:10:00

Optimism is fun! :)

😂 1
Craig Larsen - he/him - Solution Design Group Mpls18:10:42

@genek will this talk be available in the library???

👂 2
Alex Broderick-Forster, IT Revolution, Event Staff18:10:00

Which talk are you looking for @craig.larsen?

Alex Broderick-Forster, IT Revolution, Event Staff18:10:33

VendorDome or the talk from Steve George?

Alex Broderick-Forster, IT Revolution, Event Staff18:10:19

• Vendordome will be added tomorrow at the latest
• Tales From the Branches with Steve George is here: https://videos.itrevolution.com/watch/621617856/

👍 2
Craig Larsen - he/him - Solution Design Group Mpls22:10:40

@alex The Vendordome talk “VendorDome: What does it mean to be a developer in the era of cyber crime?“. Thanks, Alex!

Alex Broderick-Forster, IT Revolution, Event Staff22:10:47

@craig.larsen yes will be in tomorrow!

👍 1
Craig Larsen - he/him - Solution Design Group Mpls22:10:45

@joe.waid @vmshook @virginia.shook Answer in the thread. : )

Topo Pal - Programming Committee Member18:10:55

Every night I sleep being optimistic only to get up pessimistic

😂 4
Gene Kim, ITREV, Program Chair18:10:10

That was awesome, @brianf! For anyone observing my teasing of @brianf — I’m a huge fan of his work, and the stories behind Maven Central are epic. Here’s an amazing interview he did for The Manifest podcast:

👏 1
1
Joe Waid - Manager, Delivery Engineering - Columbia Sportswear18:10:33

I’ve really enjoyed this discussion, and not just because I asked so many of the questions! 🤣 Thank you @stephen @brianf @sacha @hlynch!!

🎉 1
1
1
Gene Kim, ITREV, Program Chair18:10:39

Great discussion, thank you, @sacha @stephen and @brianf!

1
3
1
👏 2
Brian Fox18:10:52

There’s one pending on the CTO confessions @genek that really dives into this supply chain topic

Sacha18:10:12

Thank you all for your participation, this was really fun to have so much traffic on Slack! GO GO GO!!!

🏃 1
1
1
Molly Coyne (Sponsorship Director / ITREV)18:10:20

🌟Welcome @steveg for our next session's Q&A. Thank you to our sponsor #xpo-Weaveworks! 🌟

Gene Kim, ITREV, Program Chair18:10:31

(I enjoy the interviews on The Manifest, despite the fact that most of them are the same story of “a bunch of developers try to build a package repository, and endless mayhem ensues.” Great fun if you enjoy mayhem. 🙂)

Gene Kim, ITREV, Program Chair18:10:01

Rarely are they at the scale of Maven Central…. 🙂

Steve George [Weaveworks]18:10:12

hello all, hope the talk was clear and useful! 👋

Denver Martin, Dir DevSecOps, he/him18:10:46

Have you also looked at adding some of this to Chat ops?

Christopher Rueber18:10:11

I’ve seen some minimal interaction with chat systems with gitops integrated. Essentially confirmation type dialogs.

Pedro Jordan18:10:02

Love this. How can non-tech people understand the change? Is there any description of it for them?

Steve George [Weaveworks]18:10:13

@mr.denver.martin no reason you couldn't do it for sure, and relates to my point about 'git being a place of collaboration'. We see people alerting back into slack etc - but haven't directly played with driving from chat yet

Steve George [Weaveworks]18:10:20

@pedro.jordan Thanks, there's material on the Weaveworks site that's designed for non-tech people. Also the CNCF working group does try and represent user interest: https://github.com/gitops-working-group/gitops-working-group

1
👏 2
Joel Roberts18:10:38

Is all documentation stored in git with gitops rather than in Confluence or other tools?

Steve George [Weaveworks]18:10:43

@jroberts We certainly believe that you should store "everything needed to operate the service" together as a single unit of operation. An obvious example of that is things like monitoring/alerting definitions and the Playbooks for the service.

Steve George [Weaveworks]18:10:48

That way everything is versioned together. In practice, enterprises have often made decisions on parts of this tooling - so that's why we try to be pragmatic about the adoption path

👍 1
1
Patrick S. Kelso18:10:56

phew, for a minute there I thought you were suggesting we stop documenting in Sharepoint. 😞

Steve George [Weaveworks]18:10:33

Hah hah - I've learnt to pick my battles in the tech industry!

Patrick S. Kelso18:10:24

I managed to get our architecture team to agree that Sharepoint was only for things that change less than once a year. So they can keep our architecture standards in sharepoint, but I don't need to put the reference architectures there because something will always change.

Ryan Taylor - Senior Geospatial Developer, GISinc18:10:05

Is gitops most applicable to 1 repo for all pieces of your app, or many separate repos? I.e., mono-repos vs individual repos. For instance, if you have a SPA + some web services + db, how do folks typically break up their repos with devops in mind?

Malcolm McAlpin18:10:29

Thank you!!

❤️ 1
Pedro Jordan18:10:46

really interesting , thanks for the talk !

❤️ 1
Joel Roberts18:10:50

Excellent presentation!

❤️ 1
Steve George [Weaveworks]18:10:02

@ryanewtaylor Yes people choose lots of different options here. I think the easiest thing is to think about it from an organisational perspective, so have the services together that a team will deploy. You can use branches or folders easily enough - so for me the "organisational view" has been the main lesson

👍 1
Ryan Taylor - Senior Geospatial Developer, GISinc18:10:49

Does that mean, that pieces that should move together, in an atomic way (say, client and api), should be organized together in one repo?

Steve George [Weaveworks]18:10:21

Yes, I think this is an important consideration. We want to make it easy to deploy an "atomic" change (as you said). If that means that logically they are together it's easier to have them in a single "configuration repository" together.

Steve George [Weaveworks]18:10:03

Commonly, you don't want to land up with 1:1 for every service. So I think about teams, and then clustering services that are logically deployed together.

👍 1
Steve George [Weaveworks]18:10:39

(I don't know why I said commonly there - you NEVER want to have 1 separate git repo for each service right 🙂 )

👍 1
Ryan Taylor - Senior Geospatial Developer, GISinc18:10:34

I have done both, one mega repo and we swung the other way, essentially 1:1 and need to find a better balance. 🙂

Steve George [Weaveworks]18:10:29

Ack! we run regular GitOps events @weaveworks, please come along one time and describe your situation - people are always interested to hear - and our team have done "all the options" so you would definitely get some interesting opinions ;-)

👍 1
Molly Coyne (Sponsorship Director / ITREV)18:10:24

Hello @prashant.darisi for our next session's Q&A. #xpo-everbridge is one of our great sponsors!

Prashant Darisi18:10:54

Thank you Molly, very happy to be here

Sanjeev Shrivastava18:10:36

Should we not focus on deferring the impact of change via canary/chaos or circuit breakers?

Prashant Darisi19:10:18

That will work Sanjeev and that should be part of the workflow process; the question always asked is "do we embrace risk, or do we mitigate risk"

👏 1
Prashant Darisi19:10:29

it should be an either-or discussion...we can do both

Prashant Darisi19:10:57

despite your best efforts, as you see customers are experiencing disruption....so need a strategy for both

Prashant Darisi19:10:28

oops...meant to say it should NOT be an either-or discussion...we can do both

Sanjeev Shrivastava19:10:27

Thank you Prashant, I agree it's both and embracing risk is key and fail safe fail fast could be one more aspect to it

Prashant Darisi19:10:16

Facebook and WhatsApp, great examples...now we know what happened (and how) despite the best efforts ...bad changes do slip through 🙂

jeff.gallimore (CTIO - Excella, he/him)19:10:46

automation and proactive remediation is so important to getting the important MTTR down. using a ticket-based approach is so problematic.

Sanjeev Shrivastava19:10:46

agree, the moment a ticket is created we have already landed in the reactive world -- understanding the baseline and deviation from the baseline is an indication

💯 1
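
A minimal sketch of the "deviation from baseline" idea in the message above, assuming a simple standard-deviation check over recent samples of a single metric. The function name and the threshold of 3.0 are illustrative assumptions, not something stated in the session.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, value, threshold=3.0):
    """Return True when `value` is more than `threshold` standard
    deviations away from the mean of the `baseline` samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold
```

A check like this can fire before anyone opens a ticket, which is the proactive case described in the thread.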
Sanjeev Shrivastava19:10:40

i joke to my team: have a dashboard which is blank, and if anything is bad it pops up with relationship mapping

jeff.gallimore (CTIO - Excella, he/him)19:10:46

have you found any effective techniques for calibrating the right point to start the “proactive remediation”? make sure you’re not too early or (especially) too late…

Prashant Darisi19:10:57

and what about signal enrichment...we know that as a human being, once we see 1 alert...we will also look at the other systems to see if we can find more...so, when we receive an alert from one system, why not proactively gather data from the other monitoring systems, log managers, ...and enrich the signal

Prashant Darisi19:10:34

It is a learning exercise...we have a customer that has done this over 2-3 months and NOW decided that there are about 28 'proactive' remediation steps that can run without APPROVAL

Sanjeev Shrivastava19:10:01

yes self heal and pre approved with bots/rpa

Sanjeev Shrivastava19:10:42

great presentation and insight

Prashant Darisi19:10:47

however, our system ALSO supports a seamless approval process...Jeff, you could be that approver "you might get a text or a phone call to say"...the recommendation is to execute these scripts, "press 1 to approve" or 2 to join the conference bridge...or 3 to deny the remediation

jeff.gallimore (CTIO - Excella, he/him)19:10:26

i imagine this approach might feed a regular retro/post-incident review to learn the conditions to move this to “automated approval” so i wouldn’t need to “approve” next time?

jeff.gallimore (CTIO - Excella, he/him)19:10:28

it sounds like this is addressing a couple of related issues: monitoring tool sprawl (signals coming from lots of directions) and cognitive overload — especially for ops staff.

Prashant Darisi19:10:26

Yes it is...folks need to know what is the difference between Sev 1 in their left hand vs. Sev 1 in the right...everything looks bad when there is overload

💯 1
jeff.gallimore (CTIO - Excella, he/him)19:10:32

but at least you know something is bad. what’s worse is you miss the signal that “something is bad” because of overload.

Gene Kim, ITREV, Program Chair19:10:53

‘escalating to “cross functional disciplines” to address major incidents, such as ransomware’ — even hearing about these scenarios is so stressful! 😆 Hearing these stories of POS ransomware was truly surreal and awe-inspiring.

Prashant Darisi19:10:47

Colonial Pipeline is a great example...Starts as a "Digital Critical Event"...affects supply chain...and a day later you cannot fill gas in your car....

Prashant Darisi19:10:08

company and external ramifications to a "Digital" critical event

Gene Kim, ITREV, Program Chair19:10:00

Totally — I was following the impacts of the Kaseya ransomware and its impacts on merchants. Truly amazing. Did you know whether it affected e-commerce at all, or was it just POS?

Prashant Darisi19:10:01

so Business Continuity, Supply Chain Resiliency plans had to get activated...of course the Security and IT teams did their part

Gene Kim, ITREV, Program Chair19:10:21

Thank you @prashant.darisi!

❤️ 1
Prashant Darisi19:10:22

not much for them on e-commerce...

Molly Coyne (Sponsorship Director / ITREV)19:10:25

👏🏻Welcome @ally.corsetti to field questions for our next session's Q&A. A big thank you to #xpo-aqua-security-k8s!👏🏻

🎉 1
Ally Corsetti19:10:36

Thanks for having us!

❤️ 2
Gene Kim, ITREV, Program Chair19:10:18

I’m dying to ask: I’m always so afraid to update my container images — because I’m afraid that everything will break. Am I the only one, and what advice would you give to people to keep their containers up to date? 🙏

💯 1
Gene Kim, ITREV, Program Chair19:10:51

cc @ally.corsetti Thank you!!!

Ally Corsetti19:10:43

@genek Great question! Let me follow up with our team and get back to you ASAP :)

Ally Corsetti20:10:22

@genek Images should be updated when there's a reason to do so. Either the new version adds capabilities, solves bugs or resolves security issues. When there's a good reason to update an image, like any piece of software, it should be tested in a secondary environment before being loaded into a production environment. If security is important, there will be enough planning, time and resources to provide the test and the processes around it, so it will become easier and less scary to update the images.

Gene Kim, ITREV, Program Chair20:10:30

HAHAHAHA. A wonderful piece of advice, which I’ll take… umm, next time around. 🙂 Going from Heroku cedar-14 image to cedar-18 was terrifying. Was motivation to do it better next time around, for sure!!!

Gene Kim, ITREV, Program Chair19:10:03

“afraid” ===> “update never until they deprecate the container image”. I’m ashamed to say that this happened to me in Heroku earlier this year. …err….I’m asking for a friend.

😂 2
Molly Coyne (Sponsorship Director / ITREV)19:10:30

🌟@andreas.grabner joins us for our next session's Q&A. Thank you so much #xpo-dynatrace !🌟

🎉 1
Andreas Grabner19:10:23

I am here 🙂 - let me know if you have questions. This is what you miss in case you don't watch my sessions

👋 1
🙌 1
Gene Kim, ITREV, Program Chair19:10:25

Good to see you again, @andreas.grabner!!!

Andreas Grabner19:10:40

Great to see you again @genek - hope soon in real life again

Gene Kim, ITREV, Program Chair19:10:07

Yes!! “DevOps Activist!” 😆

Andreas Grabner20:10:15

Well - the other title "I would like to be like Gene Kim" is too long for most title input fields 🙂

Gene Kim, ITREV, Program Chair20:10:47

Because you too want to type a lot. 🙂

Gene Kim, ITREV, Program Chair20:10:12

So much SRE focus this year at DevOps Enterprise!

❤️ 3
Joshua Barton20:10:03

Made it a great first year for me to be here :)

Gene Kim, ITREV, Program Chair20:10:33

@andreas.grabner Several stories of “CI/CD systems crashing under heavy dev load”, including the Vanguard Chaos Team simulating this, so they could finally figure out what was going wrong. I suspect these stories would resonate with you, even though which tool was never specified, except for “thread dump” — that means Jenkins, right? 😆

Andreas Grabner20:10:06

definitely exactly in my field. We at Dynatrace also still use a lot of Jenkins - and have just contributed back to OpenTelemetry for Jenkins - capturing distributed traces to better understand where your end-2-end pipelines are slow, failing, waiting ...

Gene Kim, ITREV, Program Chair20:10:32

Hahaha. I’ve become a huge fan of the JVM since falling in love with Clojure. Now I’m in awe of JVM, and pride myself that Java stack traces make sense to me now. 😆 Watching Brian Goetz talks is one of my fave things to do now.

🎉 1
Andreas Grabner20:10:52

maybe you have two browser tabs open? simultaneous watching? 🙂

Charlie Betz - Forrester Research - Principal Analyst, DevOps/ESM20:10:22

I wasn't expecting Gather to start sounding off 🙄

Charlie Betz - Forrester Research - Principal Analyst, DevOps/ESM20:10:21

We had bad leakage earlier today that everyone heard in breakout 4, multiple reports - thought this was more of same

Gene Kim, ITREV, Program Chair20:10:20

@char Not for me, buddy. (Gather?)

1
1
Charlie Betz - Forrester Research - Principal Analyst, DevOps/ESM20:10:10

that was it. we did have leakage earlier in another session.

Gene Kim, ITREV, Program Chair20:10:37

What Ops think of JVM:

👍 1
Justin Abrahms (eBay)20:10:54

So I'm clear.. keptn and tekton fit in a similar space?

Andreas Grabner20:10:31

Keptn and Tekton are a perfect fit! We are even on the same CDF eventing SIG to standardize how DevOps tools talk with each other

Justin Abrahms (eBay)20:10:14

hrm. So my group is doing tekton, which I believe is cd tooling. I would have expected that "cd tooling" is the thing that keptn would claim to do.

Justin Abrahms (eBay)20:10:21

I would have guessed it would be Keptn OR tekton, but you're suggesting it might be Keptn AND tekton?

Andreas Grabner20:10:42

Keptn is doing Data/SLO-Driven Orchestration of DevOps & SRE automation sequences. We are not yet another CD tool because you define your sequence, e.g: for auto-remediation in production - and Keptn is then orchestrating all the tools involved in restoring a system back to its healthy state. Most important is that Keptn uses SLO evaluation after any step to make data-driven decisions on what to do next
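
A minimal sketch of the pattern described above: run one remediation step, evaluate the SLOs, and only then decide whether to continue. This is not Keptn's API or configuration format. The names `Step`, `slo_met`, `run_remediation`, the `p95_latency_ms` metric, and the simulated system state are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # hypothetical stand-in for invoking a real tool


def slo_met(metrics: Dict[str, float], objectives: Dict[str, float]) -> bool:
    """Return True when every objective's metric is at or below its upper bound.

    A metric that is missing counts as a violation.
    """
    return all(
        metrics.get(name, float("inf")) <= limit
        for name, limit in objectives.items()
    )


def run_remediation(
    steps: List[Step],
    read_metrics: Callable[[], Dict[str, float]],
    objectives: Dict[str, float],
) -> Tuple[List[str], bool]:
    """Run steps in order, evaluating the SLOs after each one.

    Stops as soon as the SLOs are met. Returns the names of the steps
    that ran and whether the system ended up healthy.
    """
    executed: List[str] = []
    for step in steps:
        step.action()
        executed.append(step.name)
        if slo_met(read_metrics(), objectives):
            return executed, True
    return executed, False


# Simulated system: each step lowers the (made-up) p95 latency.
state = {"p95_latency_ms": 900.0}
steps = [
    Step("disable_feature_flag", lambda: state.update(p95_latency_ms=600.0)),
    Step("scale_up", lambda: state.update(p95_latency_ms=300.0)),
    Step("rollback", lambda: state.update(p95_latency_ms=200.0)),
]
executed, healthy = run_remediation(
    steps, lambda: dict(state), {"p95_latency_ms": 500.0}
)
print(executed, healthy)
```

In this simulation the third step never runs, because the SLO check after the second step already passes. A real setup would read metrics from a monitoring tool rather than from an in-memory dictionary.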

Andreas Grabner20:10:11

my next example #3 is actually talking about that remediation use case

Justin Abrahms (eBay)20:10:25

Gotcha. So it's a replacement for a jenkins pipeline that's used by ops to do non-deploy things, and adds layers of safety, etc.

Andreas Grabner20:10:43

haha - I even said it in the recording: "Keptn is not just another tool to automate delivery" 🙂

1
Gene Kim, ITREV, Program Chair20:10:57

What Dev thinks of JVM:

👍 1
Andreas Grabner20:10:36

In case you are interested in Keptn -> https://github.com/keptn/keptn

Justin Abrahms (eBay)20:10:44

This CLI command is 😍

Andreas Grabner20:10:27

and there is not just a CLI - there is also an API where you can trigger the remediation sequence through a simple webhook 🙂

Justin Abrahms (eBay)20:10:07

Very cool. Reminds me of a feature that one of the feature flagging vendors has which you can turn on/off a feature flag w/ a curl which is great for runbooks.

🔥 1
Andreas Grabner20:10:50

We have an integration with e.g: Unleash and are also working with LaunchDarkly

Gene Kim, ITREV, Program Chair20:10:30

@andreas.grabner I’m dying of curiosity: how in the world did you build up so many war stories with failing CI/CD servers? 🙂

😄 1
Andreas Grabner20:10:14

just what you do: Talking with many people and listening to what they are really struggling with

Craig Larsen - he/him - Solution Design Group Mpls20:10:42

@andreas.grabner I feel like this is a newbie question. Or maybe I missed this part. But doesn’t this mean you have to configure Keptn to tell it how to deploy an application (for example)? Gracious thanks! And thanks for your talk!

Craig Larsen - he/him - Solution Design Group Mpls20:10:24

So I’m kinda seeing this as the configuration you need to maintain, but that it helps with maintaining the automation.

Justin Abrahms (eBay)20:10:27

From my understanding, it's not just deployment. You can use keptn to drive ops workflows (like: turn off the high latency features b/c we have very high latency)

👍 1
Craig Larsen - he/him - Solution Design Group Mpls20:10:21

Yeah, I think I’m looking at this at too low a level.

Justin Abrahms (eBay)20:10:03

Here's what I'm taking away: If you want to build a magic runbook companion where you can kick off "make this better" workflows that auto-check your SLOs to ensure you didn't explode the world... you should do that w/ keptn.

Andreas Grabner20:10:46

So. Keptn itself doesn't do delivery, doesn't do testing, doesn't do monitoring. Keptn does what you shouldn't do: Build your own automation scripts that orchestrate your tools by invoking them through a proprietary API. Keptn takes away the pain of building your own tool integrations and orchestration - all centered around SLOs

Andreas Grabner20:10:58

"Friends don't let friends build their own automation. Friends suggest to first have a look at Keptn". Just sayin ...

👍 2
Andreas Grabner20:10:18

To continue the conversation on Tekton and Keptn. Or any other CD tool -> keep them and use them for what they are really great for -> but then leverage the automation that keptn provides as it might be easier with Keptn, e.g: SLO evaluation, remediation, ...

Gene Kim, ITREV, Program Chair20:10:37

Very clever to open source Keptn, @andreas.grabner. Thank you and catch you soon!

Andreas Grabner20:10:04

Thanks Gene. Great to get a "virtual" stage to talk about our work

👍 1
Malcolm McAlpin20:10:54

Thank you!! 8))

💯 1
Andreas Grabner20:10:52

If you have any further questions feel free to reach out. Either here - direct message or find me in the #xpo-dynatrace channel

👏 2
1
Slackbot21:10:29

Reminder: The final plenary sessions are starting in 5 minutes. Start making your way back to your browser and join us in #ask-the-speaker-plenary to interact live with the speakers and other attendees. https://devopsenterprise.slack.com/files/UATE4LJ94/F01D34MC2KS/image.png