This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-10-07
Channels
- # ask-the-speaker-track-1 (422)
- # ask-the-speaker-track-2 (356)
- # ask-the-speaker-track-3 (215)
- # ask-the-speaker-track-4 (278)
- # bof-arch-engineering-ops (2)
- # bof-leadership-culture-learning (12)
- # bof-sec-audit-compliance-grc (1)
- # bof-working-with-data (1)
- # demos (7)
- # discussion-main (1182)
- # games (73)
- # games-self-tracker (1)
- # gather (4)
- # happy-hour (38)
- # help (82)
- # hiring (14)
- # lean-coffee (8)
- # networking (20)
- # summit-info (101)
- # xpo-adaptavist (5)
- # xpo-anchore-devsecops (7)
- # xpo-aqua-security-k8s (2)
- # xpo-basis-technologies (2)
- # xpo-blameless (3)
- # xpo-bmc-ami-devops (1)
- # xpo-cloudbees (14)
- # xpo-codelogic-code-mapping (1)
- # xpo-dynatrace (1)
- # xpo-everbridge (2)
- # xpo-gitlab-the-one-devops-platform (1)
- # xpo-granulate-continuous-optimization (1)
- # xpo-instana (1)
- # xpo-itrevolution (9)
- # xpo-launchdarkly (1)
- # xpo-pagerduty (1)
- # xpo-planview-tasktop (3)
- # xpo-rollbar (1)
- # xpo-servicenow (2)
- # xpo-shoreline (2)
- # xpo-snyk (2)
- # xpo-sonatype (7)
- # xpo-split (1)
- # xpo-splunk_observability (8)
- # xpo-stackhawk (2)
- # xpo-synopsys-sig (1)
- # xpo-tricentis-continuous-testing (1)
- # xpo-weaveworks-the-gitops-pioneers (1)
Reminder: The final day is starting now – opening remarks and then plenary talks! Join the conversation in #ask-the-speaker-plenary.
✨Let's welcome @bramley.maetsa and @guus.hutschemaekers for our next session's Q&A. Thank you to #xpo-servicenow!✨
@bramley.maetsa very good point. DevOps has to be defined by an org to meet its particular needs.
If you’d like a copy of Bramley’s slides, feel free to send an email to richard.hawes@servicenow.com
Thanks so much for your note, everyone! Our VendorDome participants were eager and went live a bit early. 🙂
That was not in the recording. I can send a clean recording to anyone who needs it, and it will be available on the Summit site after the event: richard.hawes@servicenow.com
Please note that you can view this entire video (without the voices 😉) in our Video Library right after the session.
If anyone wants an analysis like the one Bramley mentioned on their own ServiceNow environment and CMDB, drop me a message 🙂
Thanks so much @bramley.maetsa @guus.hutschemaekers and #xpo-servicenow for your great presentation and for rolling with the punches. Please be sure to visit ServiceNow's booth and the #xpo-servicenow channel for follow-up information and questions!
A warm welcome to @jcampos as he joins us for Q&A! And thank you to #xpo-splunk_observability for supporting DOES US 2021!
"Latency is the new downtime. It's not CPU or memory being pegged". <- this matches my experience as well, especially without good backoff/retry mechanisms.
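The backoff/retry point above can be made concrete. A minimal sketch, assuming a call that raises `ConnectionError` on a transient failure; `backoff_delays` and `call_with_retry` are hypothetical helpers using capped exponential backoff with optional "full jitter", so that many clients retrying at once don't pile extra latency onto an already slow dependency.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(attempts: int, base: float = 0.1, cap: float = 5.0,
                   rng: Optional[Callable[[], float]] = None) -> List[float]:
    """One sleep duration (seconds) per retry: base * 2**n, capped at `cap`.

    If `rng` is given (e.g. random.random), apply "full jitter" by scaling
    each delay by a random factor in [0, 1).
    """
    delays = []
    for n in range(attempts):
        ceiling = min(cap, base * (2 ** n))
        delays.append(ceiling * rng() if rng else ceiling)
    return delays


def call_with_retry(fn: Callable[[], T], attempts: int = 4,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> T:
    """Call `fn`, retrying on ConnectionError (assumed to mean "transient")."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    delays = backoff_delays(attempts - 1, rng=rng)  # no sleep after the last try
    for n in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if n == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            sleep(delays[n])
    raise AssertionError("unreachable")
```

The cap keeps any single wait bounded; a real service would usually also bound the total time spent retrying and retry only errors it knows are safe to repeat.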
Will we see the 4 Golden Signals make an appearance? 😂
Looking ahead at the slides, it appears not. This is a "what is open telemetry and why/how should someone use it" talk.
Does telemetry also include capturing data about how someone is using an application, such as what capabilities they are using, or is it more geared towards resources, processes, event logs, etc.?
We have some homegrown usage logging, but standardizing on an open standard could be a worthwhile endeavor.
I don't think it's intended for the "are customers using this particular feature" sorts of workflows, especially given I think it will do sampling at scale.
OpenTelemetry doesn’t natively require sampling, and some platforms (e.g. ours) do not sample by default. There’s not yet an OpenTelemetry schema for user events, but some platforms let you integrate, say, RUM data with OpenTelemetry-native events (i.e., metrics, traces, logs).
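To illustrate the sampling point: a minimal, standalone sketch of deterministic ratio-based head sampling, in the spirit of OpenTelemetry's `TraceIdRatioBased` sampler, where the decision depends only on the trace ID so every service agrees on it. This is not the SDK's API; `should_sample` is a hypothetical helper, the comparison on the low 64 bits of the trace ID is an assumption of this sketch, and a ratio of 1.0 corresponds to "no sampling".

```python
import random

TRACE_ID_LOW_BITS = (1 << 64) - 1  # mask for the low 64 bits of a 128-bit trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep roughly `ratio` of traces, deciding purely from the trace ID.

    Because the decision is a pure function of the ID, every service that
    sees the same trace makes the same choice, so sampled traces stay complete.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within [0.0, 1.0]")
    bound = round(ratio * (TRACE_ID_LOW_BITS + 1))
    return (trace_id & TRACE_ID_LOW_BITS) < bound


def sampled_fraction(ratio: float, n: int = 10_000, seed: int = 42) -> float:
    """Fraction of `n` random 128-bit trace IDs kept at `ratio` (for illustration)."""
    rng = random.Random(seed)
    ids = [rng.getrandbits(128) for _ in range(n)]
    return sum(should_sample(t, ratio) for t in ids) / n
```

With `ratio=1.0` every trace is kept, which matches the "do not sample by default" behavior described above; for feature-usage analysis that needs every event, sampling at 1.0 (or recording usage outside the sampled path) avoids undercounting.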
@ryanewtaylor Absolutely, we use traces to gather this data. Applications are instrumented to tell us how users are using the features of the application.
Boy, I wish the conference overall had more practical guidance like this session. There are so many 'airy, formless' presentations recounting an experience an org had moving from A to B, with very few details on the what/how/etc.
Just curious... any other particular talks you're looking for practical guidance on?
I believe the intent is that the programming is set up to stimulate the follow-on conversations that may go deeper into the tech/approaches/etc. In a half hour, it is a tough balance.
Currently OpenTelemetry does not have logging implemented in the Go library. Do you know the tentative timeline for when it will be released (alpha or otherwise)?
Hey @contact670 I am not 100% sure on this timeline. I would have to get back to you on this and confirm.
No worries, it's a nice-to-have atm as we'd like to have everything self-contained in our Go codebase.
but also found https://github.com/open-telemetry/opentelemetry-go/pull/2010
the OpenTelemetry logging specification overall is still not final, and I believe the team is working on getting that nailed down before expanding language support for it.
• Specification
  ◦ https://github.com/open-telemetry/opentelemetry-specification
• OpenTelemetry Collector
  ◦ https://opentelemetry.io/docs/collector/configuration/
• Other
  ◦ CNCF Demo – https://www.youtube.com/watch?v=31poMDrZSug
  ◦ Tags Webinar - https://www.youtube.com/watch?v=XmUHhiVwCLA
  ◦ Blog Post - https://splk.it/3liBeLf
  ◦ https://opentelemetry.io/docs/workshop/resources/
  ◦ https://devstats.cncf.io/
  ◦ https://github.com/open-telemetry/community#special-interest-groups
  ◦ Gitter: open-telemetry/opentelemetry-service
If anyone has other questions, I'm happy to answer. Please feel free to DM me. 🙂
Finally, if you enjoyed Johnathan’s presentation, he also recorded a video showing end-to-end how to instrument a Java app with OpenTelemetry: https://www.youtube.com/watch?v=oKxHnkuA5zY
🌟A warm welcome to @hlynch and @stephen who will be moderating today's VendorDome Q&A between @sacha and @brianf. Thank you #xpo-cloudbees and #xpo-sonatype for hosting our final VendorDome of DOES US 2021! 🌟
What is top of mind for Developers who are facing challenges with the shift in responsibilities as it relates to security? What is working? What isn’t? Questions for Sacha or Brian?
Haha … finally someone who knows that a developer can feel, well, like a mushroom being fed. 😄
A challenge I see is that the incentives for developers are to build and implement new work, and rarely are they given the time to review that work with a mindset of security.
If we integrate tools into our deployment pipelines that put the brakes on code that isn’t secure enough, we can get pushback about why we can’t get this delivered to production. How can we get buy-in that security tools running in the pipeline are as valuable as test automation in the pipeline?
We did that, and false positives wore us down. How did we handle false positives? We went with the tool's severity levels: critical vs. less critical.
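The "go with the tool's severity levels" approach above can be sketched as a pipeline gate: block only on findings at or above a chosen severity, report the rest as warnings, and skip findings a team has already triaged as false positives. A minimal sketch; the `Finding` shape, the severity names, and the `suppressed` flag are assumptions, not any particular scanner's output format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed severity scale; real scanners each use their own labels.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str             # one of SEVERITY_RANK's keys
    suppressed: bool = False  # triaged as a false positive


def gate(findings: Iterable[Finding],
         block_at: str = "critical") -> Tuple[bool, List[Finding], List[Finding]]:
    """Return (passed, blocking, warnings) for one scan's findings."""
    threshold = SEVERITY_RANK[block_at]
    blocking: List[Finding] = []
    warnings: List[Finding] = []
    for finding in findings:
        if finding.suppressed:
            continue  # a reviewed false positive should not stop the build
        if SEVERITY_RANK[finding.severity] >= threshold:
            blocking.append(finding)
        else:
            warnings.append(finding)
    return (not blocking, blocking, warnings)
```

In a pipeline step, exiting non-zero when `passed` is false is what stops the deploy. Starting with `block_at="critical"` and tightening over time is one way to keep false positives from eroding trust in the gate.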
What is the biggest change you have seen in your development role with devsecops good or bad? Changes on your wish list?
the good is that security is getting considered much earlier than it used to be. It is getting integrated into the pipeline, and cloud vendors force you to think about identity and access from the beginning, which is a big improvement over the past in the datacenter, when security was thought about only after all the code was built and ready to go live.
Continuing education on secure coding practices remains a challenge. There is always a ton to learn in this industry and I don’t think we are doing as much as we should be to help developers grow their skills.
We need to help the developers to understand security coding; right now, they are often security hobbyists.
Along with blocking code during deployment, there is the problem of finding code that has a security issue after it has been deployed (like a month later). I worked at a place where some tools were deployed and then wouldn’t be changed for months because they worked. But if they had been redeployed, they would have tripped security issues.
How do you think we balance the speed we get from the open source ecosystem we have with Node, NuGet, etc. with these newer requirements to understand your dependencies and the risk from them?
What an illustrious cast of experts!!! @sacha @brianf @stephen @hlynch What is the most optimistic thing you’ve seen to help devs, ummm, rise up to the level that the outside threats demand? I’m utterly in awe of the carnage of the effects of SolarWinds and Codecov, and how software supply chains and CI/CD have been put in the spotlight.
(Not anything against our friends with Ph.D.s.. Dr. @stephen 😆 )
the other thing we are dealing with in CI/CD is the time it takes for security tests. We are running Veracode and Twistlock, and it takes 1-1.5 hr to finish!!
I had similar speed issues in a past life with Checkmarx! Scans took over 24 hours on some code bases.
“It’s hard to find optimism”, a typical quote from grizzled @brianf, creator of Maven Central. 😆 cc @topo.pal
What can we do to make this area simpler for developers? "The simpler, the more secure" is usually my rule of thumb; am I thinking about this wrong?
“at least it’s not as bad as 2011, when people were just worried about GPL licenses and everything else was left to the Infosec team” — @brianf (did I get that right? and you call that optimism? 😆 )
I only have to avoid the AGPL, and we have security teams and firewalls for everything else (circa 2011)
I’d love to hear from each of the panel: “what was your reaction when you read/heard about the scale of the SolarWinds and Codecov issues?”
How do you get developers to care about security vulnerabilities even if their application “isn’t critical to the operation” but is still exposed to the internet?
“the easiest dependency you have is the one that isn’t in your codebase” (@stephen) 😆
Is that a correlation to “The best network security tool is a pair of scissors” ?
“modern attacks are focused on supply chains, and YOUR developers, and in the secrets in the open source community” . — @brianf It really is awe inspiring to see the full impacts of this — quite literally showed how unimaginative I’ve been. 🙂
Okay, I gotta share a surprise — how a CEO might blame a summer intern for a systemic development issue. Not our industry’s finest moment. 🙂
Great point on crypto making everything a valid target now!
Even simpler: give teams time to experiment with OWASP & Burp and the training apps.
It is surprising how even those samples make you take a step back and realize the most frequent vulnerabilities are not occluded.
@prashant.darisi Hi! Here is the slack Channel where we will be addressing Audience Questions during our session starting at 2:50 PM ET.
Any topics you’d like to ask Sacha or Brian about? Even if off our topic of security as we approach the end of the session?
@topo.pal I’ve been dazzled by some of the innovations being discussed in the Deno runtime, created by the Node.js founder, which puts in a new permission model to prevent dependencies from accessing system resources. https://deno.land/ And Richard Feldman is now working on Roc, which rethinks the relationship between code and the runtime/VM, going way beyond what Deno is thinking. It allows different runtimes, different memory allocators, sandboxing, etc.
That Roc video is cool. Feldman has some notoriety because he wrote the book on Elm, and now he’s off taking what he learned to domains outside of the browser. And he presents a novel relationship between code, something he calls the environment, and the runtime. The “environment” is something between the code and the runtime.
Although I cringe every time Feldman says “virus” when he really means “arbitrary code execution”
Supply chain vulnerability is the trending topic this year. Say we learn how to secure the supply chain: what is the next big security threat on the horizon?
@topo.pal It seems like this eliminates a whole category of errors, such as dependencies being able to basically do anything they want.
• Global survey of C-level executives performed by CloudBees
• Executives overwhelmingly claim their software supply chains are secure (95%) or very secure (55%), and 93% say they are prepared to deal with an issue such as ransomware or a cyberattack on their supply chain.
• Vulnerabilities
  ◦ More than two in five (45%) executives admit that initiatives to secure their software supply chains are halfway complete or less, and
  ◦ 64% say they are not sure who they would turn to first if their supply chain was attacked.
• 64% say it would take more than four days to fix the problem if they did experience an issue. For a Fortune 500 company, this could result in the loss of millions in revenue and create significant reputational harm.
• And, while 93% of executives say they routinely practice dealing with a supply chain production vulnerability, 58% say that if they experienced one they have no idea what their company would do.
• More than four in five (83%) C-suite executives say having security issues causes their developers to drop everything to review code, which in turn causes other business disruptions. By dealing with security issues, 82% of executives say they are losing time employees could be spending on innovation.
• Almost all executives say container images are checked for high or critical vulnerabilities (95%) and their automation access keys are set to expire automatically (95%), while 92% say their company only accepts commits signed with a developer GPG key. Nine in ten C-suite executives say dependencies to trusted registries are limited at their organization (90%) and that administrative access to CI/CD tools is restricted (89%).
@brianf is incapable of even hypothetical optimism — he’s like the grizzled veteran who has seen too much. 🙂
“To see the open source community mobilizing shows that the problem is recognized.” YES! OPTIMISTIC! Nice job, @sacha and @brianf!! 😆
@sacha You achieved something I haven’t been able to do for years, not for lack of trying! 🙂 Nice!
@genek will this talk be available in the library???
• Vendordome will be added tomorrow at the latest
• Tales From the Branches with Steve George is here: https://videos.itrevolution.com/watch/621617856/
@alex The Vendordome talk “VendorDome: What does it mean to be a developer in the era of cyber crime?“. Thanks, Alex!
@joe.waid @vmshook @virginia.shook Answer in the thread. : )
That was awesome, @brianf! For anyone observing my teasing of @brianf — I’m a huge fan of his work, and the stories behind Maven Central are epic. Here’s an amazing interview he did for The Manifest podcast:
I’ve really enjoyed this discussion, and not just because I asked so many of the questions! :rolling_on_the_floor_laughing: Thank you @stephen @brianf @sacha @hlynch!!
Great discussion, thank you, @sacha @stephen and @brianf!
There’s one pending on the CTO confessions @genek that really dives into this supply chain topic
Thank you all for your participation, this was really fun to have so much traffic on Slack! GO GO GO!!!
🌟Welcome @steveg for our next session's Q&A. Thank you to our sponsor #xpo-Weaveworks! 🌟
(I enjoy the interviews on The Manifest, despite the fact that most of them are the same story of “a bunch of developers try to build a package repository, and endless mayhem ensues.” Great fun if you enjoy mayhem. 🙂)
Have you also looked at adding some of this to ChatOps?
I’ve seen some minimal interaction with chat systems with GitOps integrated. Essentially confirmation-type dialogs.
Love this! How can non-tech people understand the change? Is there any description of it for them?
@mr.denver.martin no reason you couldn't do it for sure, and relates to my point about 'git being a place of collaboration'. We see people alerting back into slack etc - but haven't directly played with driving from chat yet
@pedro.jordan Thanks, there's material on the Weaveworks site that's designed for non-tech people. Also the CNCF working group does try and represent user interest: https://github.com/gitops-working-group/gitops-working-group
Is all documentation stored in git with GitOps rather than in Confluence or other tools?
@jroberts We certainly believe that you should store "everything needed to operate the service" together as a single unit of operation. An obvious example of that is things like monitoring/alerting definitions and the playbooks for the service.
That way everything is versioned together. In practice, enterprises have often already made decisions on parts of this tooling, so that's why we try to be pragmatic about the adoption path
phew, for a minute there I thought you were suggesting we stop documenting in SharePoint. 😞
I managed to get our architecture team to agree that SharePoint was only for things that change less than once a year. So they can keep our architecture standards in SharePoint, but I don't need to put the reference architectures there because something will always change.
Is gitops most applicable to 1 repo for all pieces of your app, or many separate repos. I.e., mono-repos vs individual repos. For instance, if you have a SPA + some web services + db, how do folks typically break up their repos with devops in mind?
@ryanewtaylor Yes people choose lots of different options here. I think the easiest thing is to think about it from an organisational perspective, so have the services together that a team will deploy. You can use branches or folders easily enough - so for me the "organisational view" has been the main lesson
Does that mean that pieces that should move together in an atomic way (say, client and API) should be organized together in one repo?
Yes, I think this is an important consideration. We want to make it easy to deploy an "atomic" change (as you said). If that means that logically they are together it's easier to have them in a single "configuration repository" together.
Commonly, you don't want to land up with 1:1 for every service. So I think about teams, and then clustering services that are logically deployed together.
(I don't know why I said commonly there - you NEVER want to have 1 separate git repo for each service right 🙂 )
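The "organisational view" described above (one configuration repo per team, with folders for services that deploy together atomically) can be sketched as a small planning function. The `Service` fields and the `<team>-config` repo naming are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Service:
    name: str
    team: str          # the team that deploys and operates it
    deploy_group: str  # services that must change atomically share a group


def plan_config_repos(services: Iterable[Service]) -> Dict[str, Dict[str, List[str]]]:
    """Map each team's config repo to {folder (deploy group): [service names]}."""
    repos: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for svc in services:
        repo = f"{svc.team}-config"  # hypothetical naming convention
        repos[repo][svc.deploy_group].append(svc.name)
    # Convert to plain dicts with sorted names for stable output.
    return {repo: {group: sorted(names) for group, names in groups.items()}
            for repo, groups in repos.items()}
```

For the SPA + web services + db example, putting the SPA and its API in one `frontend` group means a client/API change lands as a single commit in one repo, while the database team's changes live in their own repo.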
I have done both: one mega repo, and then we swung the other way to essentially 1:1, and now need to find a better balance. 🙂
Ack! we run regular GitOps events @weaveworks, please come along one time and describe your situation - people are always interested to hear - and our team have done "all the options" so you would definitely get some interesting opinions ;-)
⚡Hello @prashant.darisi for our next session's Q&A. #xpo-everbridge is one of our great sponsors! ⚡
Should we not focus on deferring the impact of change via canary/chaos or circuit breakers?
That will work Sanjeev and that should be part of the workflow process; the question always asked is "do we embrace risk, or do we mitigate risk"
despite your best efforts, as you can see, customers are experiencing disruption... so you need a strategy for both
oops... meant to say it should NOT be an either-or discussion... we can do both
Thank you Prashant, I agree it's both, and embracing risk is key; fail safe, fail fast could be one more aspect to it
Facebook and WhatsApp, great examples...now we know what happened (and how) despite the best efforts ...bad changes do slip through 🙂
automation and proactive remediation is so important to getting the important MTTR down. using a ticket-based approach is so problematic.
Agree. The moment a ticket is created, we have already landed in the reactive world; understanding the baseline, and deviation from the baseline, is the indication.
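One simple way to express "baseline and deviation from baseline" is a z-score check against recent history. A minimal sketch, assuming a plain list of recent samples for one metric and a default threshold of 3 standard deviations; `deviates` is a hypothetical helper, and real systems typically also account for seasonality (time of day, day of week), which this ignores.

```python
from statistics import mean, pstdev
from typing import Sequence


def deviates(history: Sequence[float], value: float, threshold: float = 3.0) -> bool:
    """True if `value` is more than `threshold` standard deviations from the baseline."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline yet
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return value != mu  # perfectly flat baseline: any change is a deviation
    return abs(value - mu) / sigma > threshold
```

Feeding it a rolling window (for example a `collections.deque` with a fixed `maxlen`) lets the baseline follow gradual change while still flagging sudden jumps before anyone opens a ticket.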
I joke to my team that we should have a dashboard which is blank, and if anything is bad it pops up with a relationship mapping
have you found any effective techniques for calibrating the right point to start the “proactive remediation”? make sure you’re not too early or (especially) too late…
and what about signal enrichment... we know that as human beings, once we see 1 alert... we will also look at the other systems to see if we can find more... so, when we receive an alert from a system, why not proactively gather data from the other monitoring systems, log managers, ... and enrich the signal?
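The signal-enrichment idea above can be sketched as: when an alert fires, query each of the other monitoring sources for context and attach the results before a human sees it. A minimal sketch; the alert fields and the source functions (`recent_deploys`, `error_logs`) are hypothetical stand-ins for real monitoring, log, and deploy-history APIs.

```python
from typing import Callable, Dict, List

Alert = Dict[str, str]
ContextSource = Callable[[Alert], List[str]]


def enrich(alert: Alert, sources: Dict[str, ContextSource]) -> Dict[str, object]:
    """Return a copy of `alert` with a "context" entry gathered from every source."""
    context: Dict[str, List[str]] = {}
    for name, fetch in sources.items():
        try:
            context[name] = fetch(alert)
        except Exception as err:  # one broken source must not block the alert
            context[name] = [f"lookup failed: {err}"]
    return {**alert, "context": context}


# Hypothetical sources standing in for a deploy tracker and a log store.
def recent_deploys(alert: Alert) -> List[str]:
    return [f"{alert['service']} v42 deployed 10 minutes ago"]


def error_logs(alert: Alert) -> List[str]:
    raise RuntimeError("log store unreachable")
```

This queries sources one after another for clarity; a production version would more likely query them concurrently with per-source timeouts so enrichment never delays the page.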
It is a learning exercise... we have a customer that has done this over 2-3 months and NOW decided that there are about 28 'proactive' remediation steps that can run without APPROVAL
however, our systems ALSO support a seamless approval process... Jeff, you could be that approver: "you might get a text or a phone call to say"... the recommendation is to execute these scripts, "press 1 to approve" or 2 to join the conference bridge... or 3 to deny the remediation
i imagine this approach might feed a regular retro/post-incident review to learn the conditions to move this to “automated approval” so i wouldn’t need to “approve” next time?
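The approval flow described above (auto-run steps that have been vetted, otherwise ask an approver to press 1 to approve, 2 to join the bridge, or 3 to deny), plus the idea of promoting a step to automatic approval after a review, can be sketched like this. `run_remediation`, `promote_after_review`, the step names, and the `ask_approver` callback are all hypothetical; a real system would reach the approver by text or phone.

```python
from enum import Enum
from typing import Callable, Set


class Decision(Enum):
    APPROVE = "1"
    JOIN_BRIDGE = "2"
    DENY = "3"


def run_remediation(step: str, script: Callable[[], str], pre_approved: Set[str],
                    ask_approver: Callable[[str], str]) -> str:
    """Run `script` for `step` automatically or after a human decision."""
    if step in pre_approved:
        return f"auto: {script()}"
    decision = Decision(ask_approver(step))  # any other key raises ValueError
    if decision is Decision.APPROVE:
        return f"approved: {script()}"
    if decision is Decision.JOIN_BRIDGE:
        return "escalated: approver joined the conference bridge"
    return "denied: remediation not run"


def promote_after_review(step: str, pre_approved: Set[str]) -> None:
    """After a post-incident review agrees, let this step run without approval."""
    pre_approved.add(step)
```

Keeping the pre-approved set explicit (and changed only through review) is what lets the list grow gradually, as with the customer who ended up with 28 steps that run without approval.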
it sounds like this is addressing a couple of related issues: monitoring tool sprawl (signals coming from lots of directions) and cognitive overload — especially for ops staff.
Yes it is...folks need to know what is the difference between Sev 1 in their left hand vs. Sev 1 in the right...everything looks bad when there is overload
but at least you know something is bad. what’s worse is you miss the signal that “something is bad” because of overload.
‘escalating to “cross functional disciplines” to address major incidents, such as ransomware’: even hearing about these scenarios is so stressful! 😆 Hearing the stories of POS ransomware was truly surreal and awe-inspiring.
Colonial Pipeline is a great example... It started as a "Digital Critical Event"... affected the supply chain... and a day later you could not fill your car with gas....
Totally — I was following the impacts of the Kaseya ransomware and its impacts on merchants. Truly amazing. Did you know whether it affected e-commerce at all, or was it just POS?
so Business Continuity, Supply Chain Resiliency plans had to get activated...of course the Security and IT teams did their part
👏:skin-tone-2:Welcome @ally.corsetti to field questions for our next session's Q&A. A big thank you to #xpo-aqua-security-k8s!👏:skin-tone-2:
I’m dying to ask: I’m always so afraid to update my container images — because I’m afraid that everything will break. Am I the only one, and what advice would you give to people to keep their containers up to date? 🙏
@genek Great question! Let me follow up with our team and get back to you ASAP :)
@genek Images should be updated when there's a reason to do so: the new version adds capabilities, solves bugs, or resolves security issues. When there's a good reason to update an image, like any piece of software, it should be tested in a secondary environment before being loaded into a production environment. If security is important, there will be enough planning, time, and resources to provide the testing and the processes around it, so it will become easier and less scary to update the images.
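The advice above (update when there is a reason, and test in a secondary environment before production) can be written down as a small promotion rule. A minimal sketch; `plan_image_update`, the reason labels, and the `staging_passed` signal are assumptions standing in for whatever your pipeline actually records.

```python
from typing import Iterable

# Reasons to update, per the advice above: new capability, bug fix, security fix.
UPDATE_REASONS = {"capability", "bugfix", "security"}


def plan_image_update(current: str, candidate: str, reasons: Iterable[str],
                      staging_passed: bool) -> str:
    """Decide what to do with a candidate container image tag."""
    if candidate == current:
        return "no-op: already running the candidate"
    if not UPDATE_REASONS.intersection(reasons):
        return "skip: no reason to update"
    if not staging_passed:
        return "hold: test the candidate in staging first"
    return f"promote {candidate} to production"
```

Running a rule like this routinely, with security fixes as a standing reason, is one way to avoid the "update never until they deprecate the image" pattern, because every update goes through the same tested path.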
HAHAHAHA. A wonderful piece of advice, which I’ll take… umm, next time around. 🙂 Going from Heroku cedar-14 image to cedar-18 was terrifying. Was motivation to do it better next time around, for sure!!!
“afraid” ===> “update never until they deprecate the container image”. I’m ashamed to say that this happened to me in Heroku earlier this year. …err….I’m asking for a friend.
🌟@andreas.grabner joins us for our next session's Q&A. Thank you so much #xpo-dynatrace !🌟
I am here 🙂 - let me know if you have questions. This is what you miss in case you dont watch my sessions
Well - the other title "I would like to be like Gene Kim" is too long for most title input fields 🙂
@andreas.grabner Several stories of “CI/CD systems crashing under heavy dev load”, including the Vanguard Chaos Team simulating this, so they could finally figure out what was going wrong. I suspect these stories would resonate with you, even though which tool was never specified, except for “thread dump” — that means Jenkins, right? 😆
definitely exactly in my field. We at Dynatrace also still use a lot of Jenkins, and have just contributed back to OpenTelemetry for Jenkins: capturing distributed traces to better understand where your end-2-end pipelines are slow, failing, waiting ...
Hahaha. I’ve become a huge fan of the JVM since falling in love with Clojure. Now I’m in awe of JVM, and pride myself that Java stack traces make sense to me now. 😆 Watching Brian Goetz talks is one of my fave things to do now.
We had bad leakage earlier today that everyone heard in breakout 4, multiple reports - thought this was more of same
Motivated me to write this: https://sysadvent.blogspot.com/2019/12/day-9-in-defense-of-modern-day-jvm-java.html
Keptn and Tekton are a perfect fit! We are even on the same CDF eventing SIG to standardize how DevOps tools talk with each other
hrm. So my group is doing tekton, which I believe is cd tooling. I would have expected that "cd tooling" is the thing that keptn would claim to do.
I would have guessed it would be Keptn OR tekton, but you're suggesting it might be Keptn AND tekton?
Keptn is doing data/SLO-driven orchestration of DevOps & SRE automation sequences. We are not yet another CD tool, because you define your sequence, e.g. for auto-remediation in production, and Keptn then orchestrates all the tools involved in restoring a system back to its healthy state. Most important is that Keptn uses SLO evaluation after any step to make data-driven decisions on what to do next
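To make "SLO evaluation after any step" concrete, here is a minimal, hypothetical sketch of a scored quality gate: each SLI earns full credit if it meets its pass criterion and half credit if it only meets a warning criterion, and the total score maps to pass, warning, or fail, which then picks the next step. It is not Keptn's SLO file format or API; the `Objective` fields, score thresholds, and `NEXT_STEP` mapping are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Objective:
    sli: str
    pass_below: float  # full credit if the SLI value is below this
    warn_below: float  # half credit if below this (but not below pass_below)


def evaluate(slis: Dict[str, float], objectives: Sequence[Objective],
             pass_score: float = 90.0, warn_score: float = 75.0) -> str:
    """Score the SLIs against the objectives: "pass", "warning", or "fail"."""
    if not objectives:
        raise ValueError("at least one objective is required")
    points = 0.0
    for obj in objectives:
        value = slis.get(obj.sli)
        if value is None:
            continue  # a missing SLI earns no credit
        if value < obj.pass_below:
            points += 1.0
        elif value < obj.warn_below:
            points += 0.5
    score = 100.0 * points / len(objectives)
    if score >= pass_score:
        return "pass"
    if score >= warn_score:
        return "warning"
    return "fail"


# Hypothetical mapping from the evaluation result to the next step in a sequence.
NEXT_STEP = {"pass": "continue", "warning": "pause for review", "fail": "roll back"}
```

Treating a missing SLI as zero credit is a deliberately conservative choice: if monitoring data is absent after a remediation step, the gate should not report success.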
Gotcha. So it's a replacement for a jenkins pipeline that's used by ops to do non-deploy things, and adds layers of safety, etc.
haha - I even said it in the recording: "Keptn is not just another tool to automate delivery" 🙂
and there is not just a CLI - there is also an API where you can trigger the remediation sequence through a simple webhook 🙂
Very cool. Reminds me of a feature that one of the feature flagging vendors has, where you can turn a feature flag on/off with a curl, which is great for runbooks.
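The "flip a feature flag with a curl from a runbook" idea boils down to a single HTTP call. A minimal, self-contained sketch: the `/flags/<name>` endpoint and `{"enabled": ...}` payload are hypothetical (real feature-flag services define their own APIs), and a tiny local server stands in for the real service so the example runs on its own.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Tuple

FLAGS: Dict[str, bool] = {"high-latency-feature": True}  # stand-in flag store


class FlagHandler(BaseHTTPRequestHandler):
    """Hypothetical flag service: PUT /flags/<name> with {"enabled": bool}."""

    def do_PUT(self) -> None:
        name = self.path.rsplit("/", 1)[-1]
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        FLAGS[name] = bool(body.get("enabled"))
        self.send_response(204)  # success, no response body
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def set_flag(base_url: str, name: str, enabled: bool) -> int:
    """The runbook step: one HTTP PUT, returning the response status code."""
    req = urllib.request.Request(
        f"{base_url}/flags/{name}",
        data=json.dumps({"enabled": enabled}).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass env proxies
    with opener.open(req, timeout=5) as resp:
        return resp.status


def demo() -> Tuple[int, bool]:
    """Start the stand-in server on a free port, turn the flag off, shut down."""
    server = HTTPServer(("127.0.0.1", 0), FlagHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        status = set_flag(f"http://127.0.0.1:{server.server_port}",
                          "high-latency-feature", False)
    finally:
        server.shutdown()
        server.server_close()
    return status, FLAGS["high-latency-feature"]
```

The same request from a shell would be `curl -X PUT -H 'Content-Type: application/json' -d '{"enabled": false}' <flag-service-url>/flags/high-latency-feature`, which is why this kind of step fits so naturally into a runbook or a webhook-triggered remediation sequence.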
@andreas.grabner I’m dying of curiosity: how in the world did you build up so many war stories with failing CI/CD servers? 🙂
just what you do: talking with many people and listening to what they are really struggling with
@andreas.grabner I feel like this is a newbie question. Or maybe I missed this part. But doesn’t this mean you have to configure Keptn to tell it how to deploy an application (for example). Gracious thanks! And thanks for your talk!
So I’m kinda seeing this as the configuration you need to maintain, but that it helps with maintaining the automation.
From my understanding, it's not just deployment. You can use keptn to drive ops workflows (like: turn off the high latency features b/c we have very high latency)
Yeah, I think I’m looking at this at too low a level.
Here's what I'm taking away: If you want to build a magic runbook companion where you can kick off "make this better" workflows that auto-check your SLOs to ensure you didn't explode the world... you should do that w/ keptn.
So. Keptn itself doesn't do delivery, doesn't do testing, doesn't do monitoring. Keptn does what you shouldn't do yourself: build your own automation scripts that orchestrate your tools by invoking them through proprietary APIs. Keptn takes away the pain of building your own tool integrations and orchestration, all centered around SLOs
"Friends don't let friends build their own automation. Friends suggest first having a look at Keptn". Just sayin ...
To continue the conversation on Tekton and Keptn. Or any other CD tool -> keep them and use them for what they are really great for -> but then leverage the automation that keptn provides as it might be easier with Keptn, e.g: SLO evaluation, remediation, ...
Here's the video from Mike -> https://www.youtube.com/watch?v=6vd8rtcoV9k&list=PLqt2rd0eew1YFx9m8dBFSiGYSBcDuWG38&index=5&t=2s
Here's the story from Austrian Online Banking about how they do release validation -> https://medium.com/keptn/keptn-automates-release-readiness-validation-for-austrian-online-banking-software-eaaab7ad7856
Very clever to open source Keptn, @andreas.grabner. Thank you and catch you soon!
If you have any further questions, feel free to reach out, either here, by direct message, or find me in the #xpo-dynatrace channel
Reminder: The final plenary sessions are starting in 5 minutes. Start making your way back to your browser and join us in #ask-the-speaker-plenary to interact live with the speakers and other attendees. https://devopsenterprise.slack.com/files/UATE4LJ94/F01D34MC2KS/image.png