2020-10-15
Have been looking forward to this talk so much.
Super excited to be here for this talk and to listen to Alok Uniyal and Donald Patra on the HSBC Transformation Journey! 🙂
Welcome @donaldpatra and @alok_uniyal for our next session's Q&A! Thank you to #xpo-infosys!
What were some of those behavioral changes that you found yourself undertaking @donaldpatra?
Some of the key changes were around team management: moving away from the centralized model to a more empowered team model, with a lot more autonomy within the teams, which later led to teams defining their OKRs and operating as a community. So the behavior change was to give up the command-and-control, centralized model over time as the organization matured.
@donaldpatra Great insights! Do you see the pandemic experience influencing the transformation agenda?
Interesting observations during the pandemic: the measures around mean time to recover, release frequency, shift left, etc. have all shown a favorable trend. This obviously required a robust remote-working collaboration platform, which we already had ready.
Hey guys, I'm here to take any questions you have on the Humanizing DevOps Through Data talk!
Welcome @steven.boone for our next session's Q&A and thank you to #xpo-hcl-software-devops!
Anyone having audio trouble listening to @steven.boone’s talk?
@steven.boone while you’re talking about unplanned work: I noticed recently how synchronously most companies still operate, even after 6 months of lockdown. Working asynchronously instead should help teams accelerate and also operate more independently. Do you see this as a next step for large enterprises as well, for example to remove “waiting time”?
I think the key is that, regardless of whether teams are working synchronously or asynchronously, they are able to know who is working on what. I think the main benefit the data provides is that visibility. BRB, apologies, my kid is crying - "new normal"
I think if we really want to reduce "waiting time" we need to know a few things: • what are we waiting for? • where does it get held up?
If we know those things, we can start asking the important questions, and hopefully getting to the "how do we move it along"
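A rough sketch of how the two questions above ("what are we waiting for" and "where does it get held up") could be answered from data. The hand-off records, stage names, and function names are all made up for illustration; real data would come from whatever tracker the teams use.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hand-off records (not from the talk):
# (work item, stage it waited in, entered the queue, work actually started)
HANDOFFS = [
    ("A-1", "code review", "2020-10-01T09:00", "2020-10-01T15:00"),
    ("A-1", "security sign-off", "2020-10-02T09:00", "2020-10-05T09:00"),
    ("A-2", "code review", "2020-10-01T10:00", "2020-10-01T12:00"),
    ("A-2", "security sign-off", "2020-10-02T10:00", "2020-10-04T10:00"),
]


def waiting_hours_by_stage(handoffs):
    """Sum the hours that work items sat idle in each stage."""
    totals = defaultdict(float)
    for _item, stage, queued, started in handoffs:
        waited = datetime.fromisoformat(started) - datetime.fromisoformat(queued)
        totals[stage] += waited.total_seconds() / 3600
    return dict(totals)


def biggest_holdup(handoffs):
    """Return the stage with the most accumulated waiting time."""
    totals = waiting_hours_by_stage(handoffs)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    print(waiting_hours_by_stage(HANDOFFS))
    print("Held up most in:", biggest_holdup(HANDOFFS))
```

Summing per stage rather than per item keeps the focus on where work queues up, not on who owns it.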
yeah, that's true, and in particular when new large projects start, the "what are we waiting for" is sometimes harder to work out in detail when everyone is remote and not coming together in one room like we did before; that is probably where some of the struggle is coming from
Here is the ebook Steve is talking about: https://pages.services/hcltechsw.com/data-driven-devops-ebook/
Loved the 5-point presentation! Simple, concise and powerful! Thank you Steve!
Welcome @andreas.prins for our next session's Q&A and thank you to #xpo-digitalai-accelerates-software-delivery!
thank you, always a bit weird to tune in to your own talk 🙂
@andreas.prins, what's an LCM board?
what is the technical state of your apps, or components of the apps that need to be addressed. Having a clear view of this for all your teams is super helpful to see the bigger picture
we’ve learned that making this visual is helpful because architecture is often so abstract and not visible
Great idea! Love the observability - big picture/small details, all on the same page
@andreas.prins This also helps the team feel better about not automating themselves out of a job, it helps them feel like they are shaping their future job.
that is sometimes a real fear indeed. Until they discover that, once time is freed up to focus on real development, they feel 10x more motivated
yeah, I often got that question during 1:1s, and then after they had free time to work on the more fun projects, I tended to ask the question back: do they feel less needed or more needed now? They say more needed, because the new projects are important and they are not doing repetitive work now…
fun thing we learned: little side projects from motivated and engaged colleagues often deliver the greatest value for the future
How do you balance self-service with rules, checks and balances? For instance, if I give my teams the independence to create their own pipelines, how do I ensure that they've got Sonar scans enabled as a part of the CI process?
that is a really nice question, I’ve struggled with this while managing my teams, and so have many of our customers.
I’m now at a point where I do believe some mandatory components can be there, next to all the freedom that is, for example, tech specific. If you'd love to learn how our product supports this, feel free to ask in #xpo-digitalai-accelerates-software-delivery
some of my colleagues will be happy to give a demo and explain this fine-grained balance
same with cloud resources, i want to give them autonomy to create resources in a dev env but how do we know they are properly securing them?
we have learned we can't give developers that much latitude... we are building out a semi-automated pipeline that allows some variability, but comes with mandatory components...
yeah that is a good one, I believe more modern frameworks like OPA (Open Policy Agent) can be very useful for these types of resources, and are actually required
https://blog.xebialabs.com/2019/08/06/why-freeing-developers-requires-taking-control/
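To make the "freedom plus mandatory components" idea from this thread concrete, here is a minimal sketch of a check that a team-owned pipeline definition still contains the required stages. It is plain Python, not OPA or Rego, and not how any vendor product does it; the pipeline shape, the stage names, and the `MANDATORY_STAGES` set are assumptions for illustration.

```python
# Assumed org-wide rules; the stage names are hypothetical.
MANDATORY_STAGES = {"sonar-scan", "unit-tests"}


def missing_mandatory_stages(pipeline):
    """Return the mandatory stages a pipeline definition does not include."""
    present = {stage["name"] for stage in pipeline.get("stages", [])}
    return sorted(MANDATORY_STAGES - present)


def is_compliant(pipeline):
    """A pipeline is compliant when no mandatory stage is missing."""
    return not missing_mandatory_stages(pipeline)


if __name__ == "__main__":
    # Teams stay free to add their own stages around the required ones.
    team_pipeline = {
        "name": "payments-service",
        "stages": [
            {"name": "build"},
            {"name": "unit-tests"},
            {"name": "deploy-dev"},
        ],
    }
    print(is_compliant(team_pipeline))              # False
    print(missing_mandatory_stages(team_pipeline))  # ['sonar-scan']
```

A check like this could run as a gate before a pipeline is registered, so teams keep control of everything except the required pieces.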
if the deployment went successfully, what would you recommend searching for, and what data should we look for in the monitoring?
would that be where VSM comes in, looking for the bottleneck?
certainly, data is crucial in this; as a company we’re actively developing a lot of “lenses” to look into the process, understand the flow, etc.
value stream management, or value stream insights, are I believe the most valuable if you know your goals, and if the chain from goals to stories and tasks is nicely planned
agreed… great job breaking this down and providing a framework…
We also looked at hand-offs as waste and where we can find some key wins for automations…
I once proposed (at that time at ING bank, highly regulated) to turn it upside down and have a 100% pipeline from commit to deployment in prod. Many people could not even imagine that; however, even the CISO and other security departments changed their minds when they discovered what is possible. The funny fact: setting this goal resulted in zero handovers at the end of the journey
agreed, we had the same conversation with our InfoSec team wanting division of labor, showing that if we automate, then there are fewer hands that need to be watched, and the audit process is easier to set up..
another funny fact: it was ultimately the business owner of the application who started the releases to prod, knowing that with all the automated tests, quality gates, etc., if it did not break, it was of good enough quality (combined with monitoring). That was enough for the infosec team, since the “business” was involved that way
that is the best, so you would override and accept your own risk… lol
Great talk and advice @andreas.prins - thanks!
enjoy the day! it’s 10 in the evening in Europe… will answer more questions tomorrow morning my time 🙂
or just go to #xpo-digitalai-accelerates-software-delivery for all our colleagues
Hi. I am here to answer any questions about my Atlassian session: Crushing Incidents with DevOps Agility. Let me know if you have any comments or questions.
Welcome @dhenry2 for our next session's Q&A! And thank you to #xpo-atlassian!!
ITIL has a bad rep because it tends to come across as process heavy and not friendly to speed (but they have the same mission)
Maybe it's a company maturity (yeah I know) where they have always been different
learn = RCA = problem management, as an example... same thing, different terminology
Yeah, but they have always cycled differently, with Problem being the "handoff" between the two
not sure I understand. ITIL is a full loop as well, and ITIL V4 is adopting a more agile approach
Incidents group to Problem which could cause a Change, with Known Error and Knowledge Base being in there as well.
Agree with you @bwilliams4 that there are a lot of linkages between the practices. It can also be that a change is linked to an incident. One of the areas Atlassian’s product team has been investing in is giving you visibility into recent changes (such as a view of deploys from Bitbucket when swarming on an incident).
@archana_kataria - You raise a great point about ITIL. Our team at Atlassian recently did a deep dive on ITIL 4, the most recent release of ITIL, and noted quite a lot of influence from agile methodologies. https://www.atlassian.com/whitepapers/itil4
the challenge that we are facing is the complexity of the systems and how to observe them
I mean can it be integrated so that it would catch mainframe alerts
Yes, we have a tool called edge connector which enables you to connect your infrastructure and applications behind the firewall to Opsgenie.
If it can be connected with mainframe, then is there a change management module also?
do the devops metrics retrieve information from GitHub and Jenkins as well, or only Atlassian products out of the box?
We have integrations with GitHub on our roadmap. You can use our API to help tie deployments to incidents.
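As an illustration of what "tying deployments to incidents" can mean in practice, here is a small sketch that lists deployments which landed shortly before an incident started. It does not call any Atlassian or GitHub API; the function name, the two-hour lookback, and the data shape are assumptions.

```python
from datetime import datetime, timedelta


def deployments_before_incident(incident_start, deployments,
                                lookback=timedelta(hours=2)):
    """Return (service, deployed_at) pairs that landed within `lookback`
    before the incident started, most recent first.

    `deployments` is a list of (service, deployed_at) tuples; how you
    collect them (CI webhooks, an API export) is up to your tooling.
    """
    suspects = [
        (service, deployed_at)
        for service, deployed_at in deployments
        if incident_start - lookback <= deployed_at <= incident_start
    ]
    return sorted(suspects, key=lambda d: d[1], reverse=True)


if __name__ == "__main__":
    deployments = [
        ("checkout", datetime(2020, 10, 15, 9, 0)),
        ("search", datetime(2020, 10, 15, 13, 30)),
        ("payments", datetime(2020, 10, 15, 13, 50)),
    ]
    incident_start = datetime(2020, 10, 15, 14, 0)
    for service, deployed_at in deployments_before_incident(incident_start, deployments):
        print(service, deployed_at.isoformat())
```

A time window is a blunt heuristic: it surfaces candidates for the response team to check, it does not prove which change caused the incident.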
RCA on every Incident? Wasn't the goal of Incident to restore service and then with Problem do RCA and then plan for removal?
Something changed somewhere compared with what I know....
we are currently piloting OpsGenie and StatusPage, so it was a good quick session overview for me. Integration with other CI/CD tools for measuring metrics out of the box is key. Not sure if you folks saw the open source repo from Google on the Four Keys, but that would be exactly what we are looking for
Yes agreed, but in many cases in Dev a feature flag or rollback is the path to resolution. But you are right, it also helps with problem management
@dhenry2 those are the actions we want, but we are sometimes challenged when multiple services in the workflow are experiencing outages; pinpointing which one is the issue is a challenge
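For anyone wondering what measuring these metrics looks like at its simplest, below is a hedged sketch of three of them (deployment frequency, mean time to restore, change failure rate) computed from plain timestamps. This is not the code from the Four Keys repo and not an Opsgenie feature; the function names and inputs are assumptions, and real numbers would come from your CI/CD and incident tools.

```python
from datetime import datetime, timedelta


def deployments_per_day(deploy_times, window_days):
    """Average number of deployments per day over the given window."""
    return len(deploy_times) / window_days


def mean_time_to_restore(incidents):
    """Mean duration of incidents given (opened, resolved) datetime pairs."""
    if not incidents:
        raise ValueError("need at least one resolved incident")
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def change_failure_rate(deploy_count, failed_deploy_count):
    """Share of deployments that led to a failure in production."""
    if deploy_count == 0:
        raise ValueError("need at least one deployment")
    return failed_deploy_count / deploy_count


if __name__ == "__main__":
    incidents = [
        (datetime(2020, 10, 1, 10, 0), datetime(2020, 10, 1, 11, 0)),
        (datetime(2020, 10, 2, 10, 0), datetime(2020, 10, 2, 10, 30)),
    ]
    print(mean_time_to_restore(incidents))
    print(change_failure_rate(deploy_count=20, failed_deploy_count=3))
```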
I've worked in places that had an "Incident" every 1.6s, 24 hours a day, 7 days a week.
I agree, we are simply trying to raise more info to the response team. One thing I did not show is that we can also subscribe to all of our 3rd party services that use Statuspage. So we can help determine if an incident is caused by a 3rd party service failure.
we have over 200 integrations and have an API, webhooks, and email, so many of our customers are using Atlassian’s Opsgenie to centralize alerts from their entire stack. Then they use our rules engine to prioritize and in many cases ignore the noise.
too much data with noise drives alert fatigue, and it also does not solve the problem that one cannot predict every failure scenario
Right that is why we use our rules engine to sift through the noise. We also can group alerts and delay notification in case the alert is flapping.
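The three ideas mentioned here (ignore known noise, group alerts, delay notification when an alert is flapping) can be sketched in a few lines. This is only an illustration of the concept, not Opsgenie's rules engine; the thresholds, the ignored source, and the alert fields are assumptions.

```python
from collections import defaultdict

# All values below are hypothetical tuning knobs.
IGNORED_SOURCES = {"dev-sandbox"}   # sources treated as pure noise
FLAP_WINDOW_SECONDS = 300           # how far back to look for repeats
FLAP_THRESHOLD = 3                  # repeats within the window that count as flapping


def triage(alerts):
    """Split alert keys into (notify_now, delayed).

    `alerts` is a list of dicts with 'source', 'key', and 'timestamp'
    (seconds). Alerts sharing a key are grouped; a group that fired
    FLAP_THRESHOLD or more times within FLAP_WINDOW_SECONDS of its latest
    occurrence is treated as flapping and held back.
    """
    grouped = defaultdict(list)
    for alert in alerts:
        if alert["source"] in IGNORED_SOURCES:
            continue  # drop known noise before grouping
        grouped[alert["key"]].append(alert["timestamp"])

    notify_now, delayed = [], []
    for key, stamps in grouped.items():
        latest = max(stamps)
        recent = [t for t in stamps if latest - t <= FLAP_WINDOW_SECONDS]
        if len(recent) >= FLAP_THRESHOLD:
            delayed.append(key)
        else:
            notify_now.append(key)
    return sorted(notify_now), sorted(delayed)


if __name__ == "__main__":
    alerts = [
        {"source": "prod-db", "key": "db-down", "timestamp": 0},
        {"source": "prod-web", "key": "cpu-high", "timestamp": 0},
        {"source": "prod-web", "key": "cpu-high", "timestamp": 60},
        {"source": "prod-web", "key": "cpu-high", "timestamp": 120},
        {"source": "dev-sandbox", "key": "disk-full", "timestamp": 30},
    ]
    print(triage(alerts))
```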
You can't do RCA at that rate, that is why you have Problem
how does (or doesn't) OpsGenie respect the groupings CSUM gives large entities access to?
we'll just ping our TEM 🙂 we've been bitten by plug-ins not playing well together and our customers rely on CSUM for group management
@vmshook if you would like to speak with someone more technical, DM me, I am not familiar with CSUM.
@vmshook I just found out the answer. Opsgenie does not respect the CSUM grouping. The users should be easily accessed, but we do not use the grouping when defining response teams.
Is the presentation live on track 4? I see only "get together/go faster" screen. Tried refreshing the chrome tab
Here is the link to Atlassian’s Incident Management Handbook I mentioned in the presentation. https://www.atlassian.com/incident-management/get-the-handbook