This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-06-24
Channels
- # ask-the-speaker-track-1 (423)
- # ask-the-speaker-track-2 (320)
- # ask-the-speaker-track-3 (405)
- # ask-the-speaker-track-4 (68)
- # bof-arch-engineering-ops (6)
- # bof-covid-19-lessons (6)
- # bof-cust-biz-tech-divide (10)
- # bof-leadership-culture-learning (8)
- # bof-next-gen-ops (17)
- # bof-overcoming-old-wow (1)
- # bof-project-to-product (5)
- # bof-sec-audit-compliance-grc (13)
- # bof-transformation-journeys (10)
- # bof-working-with-data (24)
- # discussion-main (1276)
- # games (69)
- # games-self-tracker (3)
- # grc (1)
- # happy-hour (189)
- # help (166)
- # hiring (12)
- # lean-coffee (20)
- # networking (5)
- # project-to-product (4)
- # snack-club (42)
- # sponsors (85)
- # summit-info (274)
- # summit-stories (3)
- # xpo-datadog (2)
- # xpo-digitalai-accelerates-software-delivery (14)
- # xpo-github-for-enterprises (14)
- # xpo-gitlab-the-one-devops-platform (14)
- # xpo-itrevolution (6)
- # xpo-launchdarkly (1)
- # xpo-pagerduty-always-on (1)
- # xpo-planview-tasktop (7)
- # xpo-slack-does-devops (8)
- # xpo-snyk (2)
this is going to be interesting, I just worked out an entire new setup for our terraform stuff
it may not go into enough detail for you - 30 minutes is a terrifyingly short amount of time to cover such a huge subject
Yeah it's a massive subject 🙂 Thanks for the offer! Will probably find questions 😛
would be great to understand what you did too, we will definitely be able to learn from you too
Super happy to talk that through yeah, probably less concise than your version, preparing a talk tends to structure thoughts better 😄
ha - thanks; it didn’t get nearly as much prep as it should have done :rolling_on_the_floor_laughing:
@richard431 - What is the number of “product teams” developing the (I think you said 20) microservices?
You might discuss this later, but how have you handled rollbacks using terraform? We have a lot of custom wrappers with mixed results.
I don’t quite get to it. The most important thing is to test at the “service” layer (you’ll see that in a sec) that rollback is possible
Splitting states seems to be fairly important in our setup, then you can have the most volatile layer isolated in its change
yeah - splitting state is absolutely vital. We have one statefile per “service” per “environment”
and usually you can just revert the evil commit and run terraform apply over it again - tends to cover most cases
yeah - however if you’ve tested as part of your IaC SDLC that that approach DOES work, then you can be doubly sure that you’re covered
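A minimal sketch of the "one statefile per service per environment" layout described above, assuming an S3 backend (bucket, table, and service names are illustrative):

```hcl
# environments/staging/payments-service/backend.tf
# Each "service" in each "environment" gets its own state file,
# so a bad change (and its revert + re-apply) stays isolated.
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "staging/payments-service/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks" # optional state locking
  }
}
```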
@richard431 Did you ever consider using Feature Flags to manage rollouts and state splitting?
We provide an integration into Terraform so that you can manage all of your feature flagging centrally, controlling everything (canaries, betas, rollbacks, anything wrapped in a flag) from a single UI.
Terraform is also really great for Flag cleanup, as you can keep all the flag definitions as application code.
@richard431 - Slightly off topic from Terraform as a whole, but do you guys live on EKS or self hosted Kubernetes?
terraform modules in separate git repositories is so vital for keeping sane
AND HAVING SOMEONE THAT MAINTAINS THEM.... don't just create it and then ignore it forever 😂
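One common way to consume a module that lives in its own repository is a git source pinned to a release tag, so consumers upgrade deliberately (repo and tag are hypothetical):

```hcl
# Pinning to a tagged release keeps module changes from surprising consumers.
module "s3_bucket" {
  source      = "git::https://github.com/acme/terraform-aws-s3-bucket.git?ref=v1.4.2"
  bucket_name = "acme-artifacts"
}
```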
it means i’m quite surprised when the next slide arrives :rolling_on_the_floor_laughing:
Hah interesting we went away from that module component :thinking_face: It was a large overhead and the team I'm currently in is not that big and only has a few people writing infrastructure
Usually with smaller teams I've found that starting from the component level is a bit easier to get up and running with
Great talk @richard431! We're in the Azure world using Terraform and learning as we go so this is great for us
The components creating alerts for stuff are so smart - I'm so going to steal that 🤯
Be interested to know how you pull the other modules into another terraform. We’re currently using separate modules like you are, 1 to 1 with an AWS resource, but currently setting up say an S3 bucket, then using outputs/remote state to obtain the ARN for example
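For reference, the outputs/remote-state pattern mentioned here typically looks something like this (bucket, key, and output names are illustrative):

```hcl
# Read another service's state to obtain, e.g., a bucket ARN it exported.
data "terraform_remote_state" "s3_bucket" {
  backend = "s3"
  config = {
    bucket = "acme-terraform-state"
    key    = "staging/s3-bucket/terraform.tfstate"
    region = "eu-west-1"
  }
}

# Reference the exported value, assuming the producing service declared
# an `output "bucket_arn"`:
# data.terraform_remote_state.s3_bucket.outputs.bucket_arn
```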
We’ve moved to Terragrunt to combat some of the environment challenges, which has helped a load for us.
@richard431 - Services are re-used across environments? You might have “smaller” versions of a service for staging versus production?
I'm also a keen user of some other Hashicorp products (including packer and vagrant). I love thinking about the 'chicken and egg' challenge with infrastructure-as-code for things like SCM/repository systems that contains the code and binary build artefacts to build the SCM/repository systems 😉 The answer: the egg came first (dinosaurs laid eggs well before chickens existed!) Also it probably doesn't matter anyway, if it can all be self-referencing, bootstrapping, however you want to explain it (I guess similar to compiling a compiler on a kernel compiled by the same compiler...) and idempotent
I asked my 8 month old little boy what should come next and he did an amazing homer impression
npm update is probably the scariest command I usually issue though 😄 it just downloads the entire internet
I think that is node’s fault rather than the concept of dependency management’s fault 😄
Just seeing terragrunt, haven't looked into that in ages - what does it now bring to the table after we have state locking and workspace in place?
the main thing is that you don’t have to repeat your backend configuration in every “service” in the environment
you just have a terragrunt.hcl at the top of the environment which declares that
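A sketch of that single environment-level terragrunt.hcl, assuming an S3 backend (bucket and table names are illustrative); each service then inherits it instead of repeating the backend configuration:

```hcl
# environments/staging/terragrunt.hcl
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite"
  }
  config = {
    bucket         = "acme-terraform-state"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
    # One state file per service, derived from its folder path.
    key            = "${path_relative_to_include()}/terraform.tfstate"
  }
}
```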
I do like it for that, though - and the ability to run Terraform across multiple folders that are dependent on each other.
that’s a good shout @j.white.1 - we are running a plan across the entire environment every hour which uses exactly this feature
That might actually be a really good feature yeah - my folders usually are split by how often they are applied so this is usually not hitting me as hard I suppose
Have not yet had the opportunity to play with it but definitely on my to-do list https://terratest.gruntwork.io/
it’s a bit weird, ’cause usually you’d write your tests in whatever language you’re writing in
I've done a bit of that in very lightweight teams - where the "test" in the end is just a tf apply on a temporary environment followed by a destroy. Doesn't really test on a functional level but at least shows you the code does not explode
Yeah, don't think IaC is quite ready to join the 'test in production movement' just yet... 🙂
Have you used other tools like AWSpec or ServerSpec and if so, how do you feel it compares to Terratest?
I really like them both, and have even forked it with a view of porting it to go so that I can have it in the Go world. We have more go skills at Babylon, so terratest made more sense
ahh yea, we started using terratest just recently. It works surprisingly well with kubernetes
Do you run anything like https://www.runatlantis.io/ for the terraform/terragrunt plan? It’s something i’m looking into myself currently
'extending the concept of a micro service at the infrastructure layer' fundamentally so powerful for pushing higher quality into non-functional (especially performance and operational acceptance) test scenarios
@richard431 Good shout on running terraform plan regularly to see if anyone has manually changed the resource - we'll have that!
Mhm there it is again, the symlink of global variables :thinking_face: That always breaks for me because Linux and Windows had problems with these 😞 (might be all gone now, I haven't looked at that for ages)
This is great @richard431! I see you have broken down the testing and you have Unit Testing -> Does the resource deploy? Are your team's unit tests following the rule of: These tests run even if the machine you are testing on is completely isolated/air gapped?
@richard431 How do you establish a connection between app code repos and the repos that contain the “Service” from an infra perspective?
I'd definitely like to try and learn more on this, especially how this compares to 'managing state-as-code' with things like Terraform Enterprise/Cloud services watching/testing SCM changes for consistency/compliance. Thanks so much @richard431, brilliant talk
Good thing to know is that my approach is not that different so I feel validated now 😂
awesome talk @richard431, validates some of the stuff we have done and gives me a few new ideas as well :thumbsup:
@richard431 Great talk! I can't wait to show it to my coworkers that are struggling with the same issues at Stack Overflow.
This is great for us, we are just at the start of our terraform journey and trying to figure out structures and working practices
Good talk! Really helpful since Terraform is what we use. Gave me some ways to improve and focus on better breakdown.
I am very early on this journey - but know that it’s a journey that needs to happen.
@richard431, nice presentation, thank you! Do you have any rules on how to set boundaries for the scope of your environment creation and testing? For instance, in a scenario where I have an app that consumes an API backend and has an API Management in the middle. In your experience, should we handle each piece individually, or in this case could we handle them as a "package"?
I think you have to look at how they are released. Are changes made to just the application, or just the backend? If so they should probably have a separate release train and be managed separately. If, on the other hand, you find that you are most often making changes simultaneously to both, then they really are just one thing and you should join them together
as a rule of thumb, in my experience, if you aren’t sure which to do, then go for more smaller release packages - so keep them separate.
Sounds also reasonable for starting and iteratively improve it adding the components. Thanks for sharing!
@richard431 I really enjoyed your presentation. I am proposing a terraform / chef implementation for the company I work for and I like your approach. What were your challenges with VNet separation between applications?
at the MS level we’re exclusively using K8s so we segregate using Istio rather than separate vnets - would be interesting to talk through your problem though!
Oh bummer, ok I can go high level. I inherited a structure with large vnets and groups of subnets with separation (loosely) by service, so all apps with infra are in one subnet and services of type A in another
Obviously other control structures in place with these, but in listening to your approach it sounds like you're focused on the container structure. I'm not there yet :)
Actually writing it out like this i think I see how I can manage this in the app grouping similar to your approach
ok we have a “special” service called bootstrap
which creates all the common stuff in the environment - that would create and output the vnets
now i like to use “data only modules” for services to look up what they depend on which already exists, but you could just as easily do a data lookup from the component itself
then you’re saying in your service that you want the infrastructure to be deployed to “subnet type A” and it will look up which subnet that is at deployment time
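Since the questioner is on Azure, that deploy-time lookup might look like this with the azurerm provider (all names are hypothetical; the idea is that the "bootstrap" service created these resources under well-known names):

```hcl
# Look up a subnet the "bootstrap" service created, by well-known name.
data "azurerm_subnet" "type_a" {
  name                 = "subnet-type-a"
  virtual_network_name = "vnet-main"
  resource_group_name  = "rg-network"
}

# The service's infrastructure then lands in that subnet at deploy time,
# e.g. subnet_id = data.azurerm_subnet.type_a.id
```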
Cool, is that within the Terraform ecosystem? Or is it a custom add on? I'm just getting my team started with Terraform, shifting out of direct ARM template management in our repo.
Ok i follow, that sounds way more manageable and we can still verify that no changes have been injected into the deployments
great 🙂 - happy to talk through in more detail if you get stuck - ping me on linkedin or something 🙂
@ann.marie.99 @cncook001 Are you available for Q&A right now?
hey everyone! Just managed to tune in today. What presentation is now on in this track?
There was a scheduling mishap. The audit presentation will be rescheduled.
I'm not sure yet. As soon as I find out, I'll let you know 🙂
@areti.panou Glad to hear! @lewir7 and I are looking forward to everyone seeing it 🙂
@lucasc5 @lewir7 I'd love to connect and compare notes on dashboards.
@bryan.finster I believe there was a scheduling mishap. Our presentation will be rescheduled for a later time. The presentation currently playing is from @ann.marie.99 and @cncook001. Thanks!
@ann.marie.99 and @cncook001, I'd love to connect and compare notes on dashboards. 😂
We are working on building low level engineering metrics to help teams with the right behaviors. The challenge is that getting these metrics can be very time consuming.
"Platform" in this context is something like "Commerce Platform Squads". These teams are focused on things like checkout flows.
Sorry, I was asking @jose_mingorance what tooling they were using for CD platform. @cncook001 I'd love to schedule a zoom meeting sometime, if you and @ann.marie.99 have spare minutes.
@jose_mingorance we use Hygieia to aggregate data. It will integrate with those easily.
we are experimenting with Splunk to aggregate and trend. We did play around with Hygieia but did not go far.
Setup can be a bit of a struggle sometimes, mostly a documentation issue.
@cncook001 we are building dashboards using Hygieia data to gamify metrics and to use the metrics to direct teams to CD playbooks to help improve them.
@jose_mingorance I'd ping the Hygieia core team. They are working to improve that.
Nice. We are trying to do the same. Metrics not to control but to guide and coach. Feedback loop into our Dojo coaching too.
The open source tool I referred to earlier in my talk was Hygieia. No travis support and too hard to add it. We built our own.
We only use it for collection. Visualization we are working to make more actionable.
I like the PR duration scoring. It's on our roadmap for the coming quarter.
We score on integration frequency, build duration, build stability, deploy frequency, and Sonar violations.
I also want to score on code inventory: code on any branch that is not in prod.
I was here for the other presentation but I am glad that I get the chance to watch this one as well. Really useful, thanks!
<!here> Sorry - technical mishap! @ann.marie.99 and @cncook001 will re-air at 1:55. Nationwide will air tomorrow - stay tuned! Sorry everyone!
this is a very useful presentation (even if it wasn't scheduled for now/ended early)! Do you have a breakdown of the various metrics you used? I made a note of some but think I missed some of the points you were making @ann.marie.99 @cncook001
https://github.com/devopsenterprise/2020-London-Virtual I believe “Day 2” will show up here at the end of today.
just fyi - the session that was airing will be played again at the correct time
Well, whatever happened, @ann.marie.99 @cncook001, that was a brilliant set of insights. Love to talk more. I was riveted all the way.
<!here> Ann Marie Fred and Craig Cook's talk will re-air 1:55 during the correct time!
TL;DR yes and no! First as Craig suggested too, my views are my own and not necessarily shared by those of my employer 😉 IBM is so big that if you think of any software/tools, somebody somewhere in IBM is very likely using them in anger but not necessarily globally/enterprise-wide. I'm an IBM employee and I've used Jenkins a lot, it's also kind of 'built in' as a standardised part of some other pipeline offerings on our cloud tooling but our current enterprise-wide solution is Travis CI (fully integrated with an enterprise instance of Github as the strategic SCM of choice at the moment). Personally, I prefer being completely tech/tools agnostic and looking at whatever seems best suited for the context/problem/product/system-under-test at the time...
Another point to add for anyone interested, there's also a recent (I think) offering named CIO Cirrus which is basically an OpenShift based cloud platform solution for hosting IBM internal tools. I expect that strategically many internal teams may want to move their pipelines to this for the benefits over whatever they were doing before instead. The enterprise platform standardisation, ongoing support and availability/reliability advantages may outweigh any in-house supported options.
In our own area of IBM, I’d say we’re about 50% Jenkins and 50% Travis! People tend to use Travis CI if possible, because it’s easier to set up and maintain, but they’ll use Jenkins if they need its more sophisticated set of plug-ins, scheduling, or build chaining capabilities.
We even have one group using UrbanCode Deploy, because they have multiple environments and a set of modules that have to be deployed in a synchronized way.
The CI/CD tool is one where I frankly don’t care what another team is using; whatever works for them is fine with me. But there are, of course, skills and knowledge we can share by at least more or less limiting it to two tools.
There are other parts of IBM that are completely Jenkins because of its support for multiple hardware and operating systems. The only other one that is close for operating systems is Gitlab CI. We need a solution that works well for z/OS and for various languages.
Hm I’ve never tried Gitlab CI. Good to know it has broad platform support.
@richard431 Hi - I want to be able to see this talk where you don't stand still. 🙂 I don't know if I'm being daft but I can't see it in the library amongst all the others. Has it been uploaded?
there were a few problems with my original upload so it only got properly submitted yesterday
I’m sure @jessicam is on it, but she also has a billion other things to do for the speakers who were more organised than me :rolling_on_the_floor_laughing: I’m sure it will be up soon!
@saloni.seth Yes! We will get it in the library soon. Thanks
@aimee.bechtle055 your video looks good to me! Anyone having issues?
@aimee.bechtle055 what was the biggest help in your change to remote workforce with COVID 19?
Hi @rradclif 👋 Apologies to dive in here, but we at Slack created these two blogs with our point of view on the remote workforce that may help; 1️⃣ https://slackhq.com/how-slack-shortens-distances 2️⃣ https://slack.com/intl/en-gb/resources/using-slack/slack-remote-work-tips
np, we use slack extensively… slack spread is our problem; channels are great until you have 100s.
Many of our employees were faced with childcare issues and getting the equipment needed, and were fearful they wouldn't be able to meet their commitments and target dates. We softened our dates and deadlines, and were understanding and empathized with them
https://itrevolution.com/book/full-stack-teams-not-engineers/
My favorite part about that paper is where it mentions how it eliminates a lot of women from applying to full stack jobs
Some ideas are reduced hours in the day for Dojo, using conferencing and collaboration tools like Miro and Zoom or Meet, allowing breakouts or pairings of team members in the Dojo to do work on their own time or time zone. At S&P we have people in Asia-Pacific, UK, and multiple USA time zones. We choose a time frame to collaborate and be a team in the AM, from 8 - Noon, so we can all collaborate during our time zone work hours. Then assignments are completed outside of those hours and with a pairing of team members.
I really like how you put the C’s together, this will be a great help in helping explain this culture change.
If you noticed, so many things begin with a "C". We've all been working for "A"s our whole life, let's celebrate the "C"!
Hi @aimee.bechtle055, are you doing this within an existing funding envelope, or is 'extra' development funding available for the transformation?
That's a common theme over the past couple of days. Unsurprising, I suppose - doing more with the same (at best) or less (probably table stakes now)
I hear "We're DevOps" over and over again; these success criteria help me to help them understand if they are just practicing CI/CD, or are really DevOps
at a minimum, it will help explain what you’re dealing with and manage expectations
@jeff.gallimore I've been thinking of how to put together a matrix of the factors that influence and affect the pace of change, and whether we could look at companies and relatively estimate how long it would take. I'm sure there's too much unpredictability, but how can it be a data-supported estimation when companies need expectations set?
At Company #2 they set out to "transform" in 3 years, when I left in 2019 they had a ways to go
From 3 years to 6 years. The CEO updated the same slide every year and changed the date.
@aimee.bechtle055 - in light of the corp legalese on your slides - can we use your C-Suite concept (with appropriate attribution of course)?
@aimee.bechtle055 that would be a powerful tool. imagine using that tool as a self-diagnostic with leaders… where do YOU think you are? and how fast do YOU think you’ll move? …aaaaaand here’s what the data show…
We have teams 15 years into the transformation, they would be called DevOps or as close to it for a team building product code, but they understand it’s continuous improvement, so I consider them one of the best teams.
thanks! - and did you change your twitter handle? It is coming up as doesn't exist
I told @genek101 once, when I was frustrated, that the trick is finding a way to get change done before leadership changed.
I think Gartner made a run at something similar with their “enterprise technology adoption” profiles: https://www.gartner.com/en/documents/3890775/understanding-gartner-s-enterprise-technology-adoption-p
Yes, I'm going to see how many people can watch your preso @aimee.bechtle055. Very nice.
@jeff.gallimore This is going to be shared today. Thanks for this link from Gartner
Thank you @aimee.bechtle055 great talk and good explanations of your thinking
The DOJO Consortium - A Living Scenius Project (US Bank, Verizon, Walmart) is on the track 2 channel
Welcome to the encore presentation! The story I heard was this presentation was so great it needed to be repeated. I may be biased though.
We had a gut feeling that certain things were slowing down many of our squads. What about you, do any of you have an intuition that something is a problem, but no way to visualize or prove it?
"redeliveries from IV&V" - everyone knows there are code quality problems; impacts in terms of apparent/anecdotally low %C/A (but nobody had bothered measuring); and it was a distraction from improving quality
it's meant to show how often work flows "left to right" without being sent back left
Have you ever seen an Agile or DevOps metrics dashboard used in evil? Have well-meaning but ill-informed people swooped in from outside of the team asking why certain scores were out of range and how soon could they get back to green?
if managers start asking for features like productivity per person, beware
start with the end in mind, what do you want to know/reach and which metrics can provide you with relevant data on this
I really like what the Accelerate book has to say about this, namely that lead time beginning from feature inception is very variable, but change lead time (from code commit to being used in production) is less variable and a great thing to measure.
There’s value in both. If your product management process takes 3 months but your development/deployment/delivery cycle takes a couple of weeks, a) speeding up product management will have much more of an impact than you might think, b) the lead times should be measured separately.
When we looked at the lead time of features that were actually delivered (from the time the story was open to the time it was in production), it was almost always less than 1 month. And stories that were more than 1 month old rarely got delivered at all.
I think measuring “the time from when a story is opened, to when it gets into the sprint backlog” AND “the time from when a story is added to the sprint backlog, to when it is delivered and in production” would both be useful metrics for our own teams.
Defects per developer reminds me of that Dilbert comic: I’m going to write me a minivan!
In my first job out of uni - at a video games company - us testers were ranked on defects raised. So not quite the same as showing defects per developer, but it made for a really, really toxic environment.
It didn't feel like a good learning experience at the time! But with being far removed from that experience, I do now see it as a fantastic learning experience.
True. We’re measuring work item throughput, not velocity in Scrum terms.
Squads seem to have fairly consistent story point sizes, but the meaning is different. That's one reason we made it difficult to compare squads.
One of the great things about counting stories is that (unlike story points) it naturally provides an incentive to make all stories small.
Hm, I hadn’t thought of it that way, but you’re right! We also learned over time that stories larger than 5 points would take FOREVER to deliver. As in several weeks to months, with long running feature branches and hideously painful merges. So we stopped allowing stories to be over 5 points.
Do any of you have problems with pull requests getting “lost”, where nobody notices them, or nobody reviews them? Anyone else have a good solution for that problem?
One technique I use is to coach teams to really care about finishing things (where finishing means used by customers in production). Once a team cares about that, they are more motivated to not just review pull requests but also care about all the other important things that happen after code commit before production.
Do they have a way to quickly see all of the open PRs? Before we built our dashboard, a few of the older squads (the ones with more repos to maintain) built their own tools.
Github has a quick way to see a list of all of the repos for a team, but not an easy way to see how many PRs are open against each of them. You have to click through.
I'm adding that to our team dashboard because even my team has that problem. "Aged PRs" My wife's team has "Code Monkey" a Slack bot that nags them about PRs
Yes, what I’ve seen most often is for teams to build tools for this, to fill in any gaps that e.g. GitHub doesn’t provide out of the box. I think what’s even better is when an organisation has created bandwidth explicitly for tools like that to be built, to enable all the teams.
Developer Productivity includes Developer Enablement (CD Platform), Developer Experience (IDE's Frameworks), and Developer Advocacy (Dojo, Training, helpful tools)
We actually did all of those things before in the CD platform area. Now it's less ad hoc. 🙂
@ann.marie.99 In how many environments is a deployment counted? So if you have software version 1.0.1 and two customer environments, would you count the deployment of version 1.0.1 twice if it is deployed to both environments?
That is counted as "1". Some squads use travis to deploy to 3+ regions. That is counted once.
Our group is running chunks of http://IBM.com, so we don’t have customer deployments.
I agree We have 12K stores, each are HA datacenters. We count unique artifacts only. 🙂
How do you define availability? Should the whole site be down to be in red, or only part of it? And what if some corner-case feature is not working?
Ideally and in most cases, we use synthetic monitors to ensure that customers can actually interact with our services. For example, we’ll check web pages to make sure a certain word renders on the screen, or we’ll make sure customers can log into My IBM and see their account info.
Our APM monitors are almost never down, but our synthetic monitors are down more often.
Since squads own their own services they know what "customer impact" means.
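The synthetic-check idea described above (verify a certain word actually renders on the page) reduces to something like the sketch below. The URL-less check and keyword are placeholders; a real monitor (New Relic etc.) would do the HTTP fetch and also check status codes, latency, and login flows.

```javascript
// Core of a synthetic availability check: does the response body contain
// the keyword a customer should see on a healthy page?
function pageLooksHealthy(body, keyword) {
  return typeof body === "string" && body.includes(keyword);
}

// In a real monitor, `body` would come from an HTTP GET of the page.
const body = "<html><body>Welcome to My Account</body></html>";
console.log(pageLooksHealthy(body, "My Account"));                        // true
console.log(pageLooksHealthy("<html>502 Bad Gateway</html>", "My Account")); // false
```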
great presentation, we are right in the middle of defining our metrics, so very valuable input 🙂
We monitor about 300 services on our production availability dashboard, and 60+ in pre-prod/validation.
I recommend a monitoring tool like New Relic where it’s easy for developers to configure their own monitors. Some (like CheckMK) are just not user-friendly enough.
agree, we use AppDynamics, but I honestly haven't gotten the teams to buy into it, so it's a centralized effort to visualize key metrics now
The first thing I did when I joined was get the Availability dashboard created. Make your uptime visible. Review it each week with execs. It drives interest in monitoring in general.
yep, just need to get the availability defined for each area, will use the input I got today for sure
@ann.marie.99 @cncook001 did you look at the 4 key metrics from the state of devops report?
Do you consider WIP at all in your DevOps score? For example, rewarding teams that focus on finishing a smaller number of work items as opposed to starting to work on a large number of work items?
Do you find teams use that display to change behavior? I ask because teams in our organization recognize the need to limit WIP, but commonly overlook the implications and end up in heavy WIP scenarios.
If teams review WIP in their daily stand-up or weekly retro, yes they will change their behavior based on it.
Does the bucket size metric encourage developers to write as few lines as possible, e.g. defining all variables in one line instead of multiple lines?
We do have plenty of 5-line pull requests, though, and that’s FINE. Not a problem.
Squad comments is our answer to squads who don’t intend to change their practices to improve a score for one reason or another. Often they would ask us to change the scoring system so they would get a Green/Good score. Have any of you run into a similar situation?
This may be a bit out of the purview of this presentation, but it's from a conversation I had at Lean coffee yesterday. Were any of these metrics used to try to reflect the DevOps culture internally? @ann.marie.99
Yes… we had good discussions about each metric, why we should or should not use it, and what we would expect to see if teams are doing what we want them to do.
I know the slides in the middle went fast, but we tried to outline what best practices we were hoping to drive with each metric. You can get the slides from https://github.com/devopsenterprise/2020-London-Virtual tonight after they’re posted if you want more details.
It’s difficult to read the second link on one screen. It’s a link to our open-source `npm audit fixer` tool. The link is here: https://github.com/IBM/npm_audit_fixer It works for us, but if you have trouble using it, feel free to send me a message on Twitter, or if you’re truly lovely, contribute a fix!
This talk made me excited to bring this back to my org. We need to do this!
With daily automated checks for versioning, how are your squads dealing with breaking changes?
It caused a lot of pain with our own squad when we turned that on. The script Ann Marie posted has flags to adjust how aggressive it gets. We tuned it to focus on high/critical defects. Those are not too bad to resolve when they happen.
We also expect 100% unit test coverage. That helps with confidence when we upgrade packages frequently.
100% is only slightly painful when you start at 100% with greenfield code and put a check in to fail the build if it drops below that. It’s definitely harder to get there with legacy code, but some teams have done it. The rest will get to, say, 80% coverage and put a check in the build that won’t let their coverage drop below 80%.
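One common way to wire the "don't let coverage drop below the floor" check into the build is a coverage threshold in the test runner's config. A minimal sketch for Jest (the talk doesn't say which runner these squads use, so this is an assumption); the 80% figures mirror the rule above, and greenfield squads would set them to 100:

```javascript
// jest.config.js — fail the test run (and therefore the build) if
// coverage drops below the floor. Figures are the 80% example from above.
module.exports = {
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 80,
      lines: 80,
      statements: 80,
    },
  },
};
```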
To expand on the automated patching a bit more: we have two `npm audit` scripts - one version will block the build pipeline for higher-severity vulnerabilities, to get developers to fix those right away. It goes in the “test” stage of the build. The one we posted here that runs on a schedule will try to patch any and all vulnerabilities it finds.
GitHub itself has a feature that will automatically notify you of open source vulnerabilities, but it’s not available on our GitHub Enterprise Server yet. I don’t think it actually fixes them for you, either. There’s a bot called Renovate that will fix things, but it doesn’t catch nearly as many things as npm audit does for Node.js apps.
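The "block the pipeline on higher-severity findings" half of the two-script setup can be sketched as below. It assumes you've captured `npm audit --json` output in the npm 7+ shape, where `metadata.vulnerabilities` holds per-severity counts; the configurable severity floor mirrors the tuning flags mentioned for the fixer script.

```javascript
// Decide whether the build should fail, given parsed `npm audit --json`
// output (npm 7+ format) and a minimum severity floor.
function shouldFailBuild(auditJson, floor = "high") {
  const order = ["info", "low", "moderate", "high", "critical"];
  const counts = auditJson.metadata.vulnerabilities; // e.g. { low: 3, high: 0, ... }
  return order
    .slice(order.indexOf(floor)) // severities at or above the floor
    .some((sev) => (counts[sev] || 0) > 0);
}

// Hypothetical audit result: findings only at low/moderate severity.
const audit = {
  metadata: { vulnerabilities: { low: 3, moderate: 1, high: 0, critical: 0 } },
};
console.log(shouldFailBuild(audit, "high"));     // false — nothing high or above
console.log(shouldFailBuild(audit, "moderate")); // true
```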
Thank you @ann.marie.99 & @cncook001. Super interesting, and very helpful. I've been toying with getting various metrics visible to my team and you've definitely given me some pointers for this 👏 👏
The slides will be posted to https://github.com/devopsenterprise/2020-London-Virtual tonight if you want more details.
Once you get the initial connection set up with your code repo, your monitoring system, and a few other things, adding new metrics isn’t bad. Credit also goes to Tony Huo, our developer who implemented most of the calculations and the front-end.
We did have some issues with Zenhub rate limiting… but anything can be worked around. Tony ended up pulling the data in Jenkins batch jobs and storing it in a cloud DB.
Oh that’s a point too - pulling all of the data live was too slow! You don’t want your users to have to wait 20 seconds for a screen to load, so pre-caching the data in a database is important.
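The pre-caching pattern described above splits into two sides: a scheduled batch job (Jenkins, in the talk) that does the slow API calls and writes results to a store, and a dashboard that only ever reads the cached copy. A stand-in sketch, with a `Map` playing the role of the cloud DB:

```javascript
// In production this would be a cloud DB; a Map stands in here.
const metricsStore = new Map();

// Batch-job side: run on a schedule, do the slow upstream calls, store results.
function refreshMetrics(squad, computeFn) {
  const value = computeFn(); // slow: GitHub/ZenHub/monitoring API calls
  metricsStore.set(squad, { value, refreshedAt: Date.now() });
}

// Dashboard side: read-only and fast, never blocks on upstream APIs.
function readMetrics(squad) {
  return metricsStore.get(squad) || null;
}

// Hypothetical squad and metric names, for illustration.
refreshMetrics("squad-a", () => ({ openPRs: 4, deploysThisWeek: 9 }));
console.log(readMetrics("squad-a").value.openPRs); // 4
```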
@aimee.bechtle055 I just watched your session and very much appreciate what I saw. Do you have more long form material regarding your C-suite and DevOps Success Criteria that you would like to share?
@david627 do you remember the Flintstones episode where Fred becomes an executive at the quarry? He was told all he ever had to do was say 3 things: Whose baby is that? What's my line? I'll buy that.
@richard431 I regret I missed your talk. I don't see it in the Library. Are you able to share? Thanks
There were some problems with the original submission. It will be up soon - I'll ping you as soon as it's up 😃