This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-06-25
Channels
- # ask-the-speaker-track-1 (280)
- # ask-the-speaker-track-2 (292)
- # ask-the-speaker-track-3 (326)
- # ask-the-speaker-track-4 (212)
- # bof-arch-engineering-ops (11)
- # bof-covid-19-lessons (18)
- # bof-cust-biz-tech-divide (3)
- # bof-leadership-culture-learning (3)
- # bof-next-gen-ops (1)
- # bof-overcoming-old-wow (2)
- # bof-project-to-product (7)
- # bof-sec-audit-compliance-grc (4)
- # bof-transformation-journeys (2)
- # bof-working-with-data (3)
- # discussion-connect-february (1477)
- # games (131)
- # games-self-tracker (9)
- # happy-hour (38)
- # help (85)
- # hiring (11)
- # networking (24)
- # snack-club (22)
- # sponsors (41)
- # summit-info (263)
- # summit-stories (17)
- # xpo-datadog (1)
- # xpo-digitalai-accelerates-software-delivery (9)
- # xpo-github-for-enterprises (9)
- # xpo-gitlab-the-one-devops-platform (14)
- # xpo-itrevolution (1)
- # xpo-launchdarkly (5)
- # xpo-planview-tasktop (1)
- # xpo-slack-does-devops (5)
- # z-do-not-post-here-old-ask-the-speaker (2)
Looking forward to this one. "How do you level up Audit?" is a frequent conversation we have in the Dojo Consortium.
https://itrevolution.com/book/devops-automated-governance-reference-architecture/
One of the biggest problems with audit in most organizations is that it is hard to reconcile cloud-native activities with ITSM processes.
Newest ITIL gets a little better
But most ITSM mindsets are 10 years in the past
interesting concept that separation of duties is between software and human
I find the mismatch starts with an inaccurate CMDB. If audit relies on the CI/Service Owner, this becomes an issue. Typically cloud-native activities start with git evidence. Typically organizations don't have callbacks between the CI and git evidence.
Absolutely. It's been crucial for what we're building.
Pull request as separation of duties evidence
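As a rough illustration of the pull-request-as-evidence idea, here is a minimal sketch; the field names are hypothetical, not any particular Git host's API, and assume PR metadata has already been fetched into a plain dict:
```python
# Minimal sketch: treat a pull request's review metadata as separation-of-duties
# evidence. Field names below are illustrative, not a real API schema.

def separation_of_duties_holds(pr: dict) -> bool:
    """True if at least one approver is someone other than the author."""
    author = pr["author"]
    approvers = {r["reviewer"] for r in pr["reviews"] if r["state"] == "approved"}
    return bool(approvers - {author})

if __name__ == "__main__":
    pr = {
        "author": "alice",
        "reviews": [
            {"reviewer": "alice", "state": "commented"},
            {"reviewer": "bob", "state": "approved"},
        ],
    }
    print(separation_of_duties_holds(pr))  # True: bob, not alice, approved
```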
You need evidence in an attestation store that comes directly from the pipeline, not the change record.
Yes. We link the pipeline to the change record - actually raise and approve the CR automatically from the pipeline.
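For anyone curious what raising a CR from the pipeline might look like, here is a hedged sketch; the endpoint, payload fields, and token handling are stand-ins for whatever ITSM tool an organization actually runs:
```python
# Hedged sketch of raising a change record from a pipeline stage. The endpoint
# and payload fields are hypothetical placeholders, not a real ITSM API.
import os
import requests

def raise_change_record(service: str, build_id: str, evidence_url: str) -> str:
    resp = requests.post(
        os.environ["ITSM_API_URL"] + "/change_records",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {os.environ['ITSM_TOKEN']}"},
        json={
            "service": service,
            "build_id": build_id,      # ties the CR back to the pipeline run
            "evidence": evidence_url,  # link to pipeline logs/attestations
            "type": "standard",        # pre-approved change class
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["change_record_id"]
```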
A common pattern I see is that the CMDB is "precious" and so only "trusted" people (aka Service Management) are allowed to update it. This in turn means that the people with the correct information can't update it and it gets out of date. As it gets out of date it becomes less valuable and less used, and so less updated. Repeat and rinse
I hinted at this in my talk Tuesday
CMDB is a dev and ops resource
EA and dev teams are just as key stakeholders as ops
In my experience I have not met a company that has better than a 40% accurate CMDB
We are getting better. Incentivising data quality in the CMDB is hard, but we are getting there.
As much automation as possible - and cloud CIs 100% automated & dynamic.
It gets worse with microservices, even worse with serverless... and worse still with Service Mesh activities.
Evidence needs to be objective, not subjective. Or put another way: attestations need to be objective, not subjective. The previously mentioned paper suggests all attestations in a chain of immutable signed lists.
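To make the "chain of immutable signed lists" idea concrete, a toy sketch: each attestation records its predecessor's hash, so tampering anywhere breaks the chain. HMAC stands in here for a real digital signature (the model in the paper would use asymmetric signing), and the record layout is purely illustrative:
```python
import hashlib
import hmac
import json

SECRET = b"pipeline-signing-key"  # placeholder; a real key would live in a KMS

def attest(prev_hash: str, statement: dict) -> dict:
    # Bind each attestation to its predecessor via the prev hash.
    body = {"prev": prev_hash, "statement": statement}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    sig = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
    return {**body, "hash": digest, "sig": sig}

chain = [attest("0" * 64, {"step": "unit-tests", "result": "pass"})]
chain.append(attest(chain[-1]["hash"], {"step": "vuln-scan", "result": "pass"}))

def verify(chain: list) -> bool:
    prev = "0" * 64
    for a in chain:
        body = {"prev": a["prev"], "statement": a["statement"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        expected_sig = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
        if a["prev"] != prev or a["hash"] != digest or not hmac.compare_digest(a["sig"], expected_sig):
            return False
        prev = a["hash"]
    return True

print(verify(chain))  # True until any record is altered
```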
Yes, but I want a freight bill in production with full traceability of what is in it, how it was approved, how it was tested, etc.
I am also pushing my org in that direction, as we have some disconnection between tools that requires manual (thus arbitrary) steps to glue activities together and complete a process
This is all super familiar - it's exactly what we're doing :)
And then collecting the right evidence from key controls. So glad to see so many people doing this or looking to!
This is what our team built to help with Compliance Attestations linked to CI/CD pipelines https://www.youtube.com/watch?v=ll50dAiKPoI
Also this is v similar to Topo Pal's work
@lucasc5 how do you audit the pipelines in a reasonable amount of time?
@bryan.finster Thanks for the question! Our typical audits take about 8-12 weeks from start to finish. We are increasing our use of analytics and automated testing (fewer manual tests), which is helping to reduce the amount of time spent testing.
What controls do you have automated in the pipelines today?
@bryan.finster As an organization, many processes and controls are still manual. We are moving to automation of certain controls to increase efficiencies. One example is automated vulnerability scanning. @lewir7 and I can do a little more digging and follow back up with a few more examples if you're interested.
Yes, I'd love to know what you have as requests to automate in the pipeline. I work in the CD platform area and keep telling Audit that if it's not in the pipeline, it's probably not happening. I'm very interested in what others are trying to automate to make compliance a reality.
A better question is what exactly does the audit look for. I find that the answer is basically these 5... 1. Change record service owner 2. Time window for deploy 3. Backout activities
Good auditors will look for evidence of what you specified you would do in your formal process documentation :)
So if you can update that to be cloud native then you're on the right track
@jwillis A lot of these in the list are related to risks and controls (e.g., not meeting an SLA for time window to deploy could impact the production environment). Auditors should be seeking to understand how the risk is mitigated, rather than sticking to a list of pre-defined controls.
@jwillis I'd love to connect with you and discuss how this could apply to cloud native activities.
Great talk! This idea of using DevOps pipeline evidence to streamline compliance is really getting traction. Great to hear the success stories.
FYI... We are doing a second version of the DevOps Automated Governance paper this summer with a specific focus on policy... policy as code, policy error budgeting.
@lucasc5 and @lewir7 Thanks for sharing your experience. I always found that talking to auditors saved us a lot of work.
Thank you! If I could impart any additional advice, it would be to continue, or even begin your journey today in cultivating a culture of collaboration with your auditors. This can have a tremendous impact in the overall risk management process!
@lucasc5 @lewir7 - Great talk, thank you for sharing.
Welcome @david.jungwirth @max.ehammer ! Thank you @lucasc5 and @lewir7 !
"Not many of us had experience in such dynamic environments, where requirements change constantly"
"A typical food retail use-case is that you at least have 50 products in your cart, often more than 100 items..."
@david.jungwirth and @max.ehammer I didn't get the point about regulation - could you elaborate please? In the context of eCommerce, regulations should help...?
The aspect of regulation is about complying with local laws that restrict how you can deliver goods - for example, you are not allowed to mix meat and fruit in one delivery item
This is a great case study from outside the early adopter agile and DevOps bubble. Thanks for sharing!
"Alone with improvements in process and culture we achieved a deployment time improvement from one week to one day"
Sorry that isn't very inclusive (article in German): https://www.derstandard.at/story/2000116456865/spar-zu-online-shop-bestellungenkapazitaetsgrenzen-erreicht - maybe you will get to this - how do you align tech to business outcomes?
Or to put this differently - how do you align e-commerce with the logistical aspects (which seem to have been the bottleneck here).
@joachimsammer - this is an article regarding COVID times, which created a huge demand on all kinds of e-commerce retailers. Max will explain the business impact of the conducted improvements in general in a few seconds...
The logistics cost is the biggest pain point in food e-commerce; however, our business is trying different approaches to lowering the cost of logistics. This starts with the commissioning (picking) of the goods, delivering the goods, and the return of goods - all with the support of proper IT solutions
As long as there is no appropriate business model, growth is limited - that's for sure
The cultural issues were mainly related to different working behaviours. Coming from mainly waterfall-driven projects, it's a huge change to adopt agile methodologies
Definitely both sides - if you ask me today, business is still lagging behind, although they are catching up
We are still around for some more time in case you have further questions - just let us know...
Stephen, are you guys doing the same open source study for 2020, that you and Gene presented on yesterday? Have you already done the research?
Yes, although this year we're focusing on the consumer side. Currently analyzing the results of a survey we sent out in January to assess the impact of various practices on secure use of open source.
Got it. Looking forward to seeing that data. Are you guys using Datomic after all?
I'm sure you guys are close to done, but please let me know if there is any way I can contribute or help.
The survey data is small enough that we can analyze it locally using Python with pandas and scikit-learn (some of my favorite tools). But I know @genek101 is still using Datomic for some cool related analysis of open source usage data.
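For flavor, a local analysis of that kind might look roughly like this; the file and column names are hypothetical, as is the choice of model:
```python
# Minimal sketch: estimate how survey-reported practices relate to a
# "secure use" outcome. All names below are illustrative placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("survey.csv")                    # hypothetical file
X = df[["uses_sca_tool", "updates_promptly"]]     # hypothetical practice columns
y = df["secure_use"]                              # hypothetical outcome column

model = LogisticRegression().fit(X, y)
print(dict(zip(X.columns, model.coef_[0])))       # per-practice weights
```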
Continuous Everything again. Theme of the conference.
Also: "free up humans to do other things" is the flip side of the automation coin that gets missed so much
The "beacons" in Rasmus's talk from this morning are another great example of this approach.
"It's a government effort but there are community elements." And thank goodness for that.
Full reference for this story here: https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext
This is why I love SonarQube these days - fix new code now vs fix old code over time.
And check out Muse for a SonarQube-type approach but for deeper security and reliability bugs
:D probably not
One nice thing I've seen from SonarQube: it's possible for a single developer to fix dozens of security findings and code smells in 1-2 days, because it gives good, concrete fixes to implement.
Tell me if you have particularly good stories about the dev / security conflict.
More detail on Google's Tricorder platform: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43322.pdf
We run a GitHub Enterprise for a few thousand devs and I've long believed that it's our biggest collection of untapped intelligence about how our company works.
Our CISO sees DevOps and automated tools as the way to keep the company safe. So credit to our Digital Security team, they help build things like CheckMarx into our pipelines. So it's way less of a conflict these days.
It's so effective when security works with devs to provide tools and services that integrate with their workflows.
This from Capital One is my gold standard
How many of their teams are at that level? I'd say our very best teams are close to that, but they are a bit unicorn-y compared to most of our teams!
I believe that this is their standard pipeline. But Topo Pal isn't around this year to answer, I don't think.
Here it's a bit the opposite: we initiated it and proposed that our central sec team use it as well, relieving some of their manual work. But we're still at the beginning and have to support our devs in fixing results from the Checkmarx scans
@tapabrata.pal ?
Yeah, it's a 4-5 year aspiration for us. We have around 8-10 of the 26 checks implemented.
Yep, our private beta is available now to DOES attendees (https://does.muse.dev) and full public launch in the coming weeks.
This is a pattern I've seen with high performers. They add as many tools as possible.
One org let their devs push a jar to their system and it spit out a report. The system had about 20 analysis tools.
And how do you interact? Is this an editor integration, a browser? So many questions on this one.
(vs batch mode long after the code was written, which never works well)
95% of code errors (in the application code, not dependencies) fixed when analysis was deployed in code review at Google.
Quality has many axes; automate to the max but recognise the unique value that people actively poking at things brings
Bolton and Bach on why testing is a human practice https://www.developsense.com/presentations/2018-04-TestingIsTestingAgileIsContext.pdf
Add an automated test for anything they find though so it doesn't reoccur.
So so good. Like paradigm shiftingly good.
Yes, @tom.ayerst! This is why a multiple tools / platform approach is important. So many axes of quality and security that it's hard for one tool to provide everything you need.
The "ROFL" term: https://discovery.ucl.ac.uk/id/eprint/10074600/7/O'Hearn_Continuous%20reasoning.%20Scaling%20the%20impact%20of%20formal%20methods_VoR.pdf
More on what Facebook has been doing: https://cacm.acm.org/magazines/2019/8/238344-scaling-static-analyses-at-facebook/fulltext
And NIST's effort is on GitHub and open to contributions from everyone: https://github.com/usnistgov/ACVP https://github.com/cisco/libacvp
call out to @jwillis, @tapabrata.pal, @samgu, @john_z_rzeszotarski, and all my co-authors on that report!
We minimised the CAB so it's a single button push
Yep, working on that with some of our clients. Been difficult... Bank in rural TX.
As well as the production readiness checklist, having it as automated as possible instead of manual checklists.
more importantly they are getting policy ppl writing policy as code (human readable) that drives the attestation and enforcement pipeline models
Rusty and Clarissa's talk on DevOps and Internal Audit is also a great success story for this sort of integration of controls and evidence collection / audit for those controls. There's some great discussion around that and related efforts in the Track 2 Slack this morning.
What have you needed independent V&V for?
I think that's the right question to ask first. There are probably still aspects of quality / risk mitigation that IV&V is well-positioned to solve, but continuous assurance should be able to shift and narrow their focus to things that can't be automated / handled by tools.
It was there when I arrived. Trying to eliminate it from the value stream entirely, but probably won't be able to do so until the dev team has what they need to improve quality more quickly.
A ridiculous number of low/medium defects are being carried over each week, as found by an end-to-end static analysis tool. What Stephen's talk indicated is that it's quality tools, plural, not a single tool.
A trick I've used before is to ratchet up the automated bar slowly
Fix as much as you can, then accept no more new medium defects as an automated bar
Then no more low ones
Then turn warnings into errors
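A minimal sketch of that ratchet, assuming analysis findings land in a JSON report (the format and severity names are illustrative): fail the build only when counts at the currently ratcheted severities exceed a stored baseline, then tighten the set over time.
```python
import json
import sys

# Tighten over time: start with {"medium"}, then add "low", then warnings.
RATCHETED = {"medium", "low"}

def counts(report_path: str) -> dict:
    with open(report_path) as f:
        findings = json.load(f)  # e.g. [{"severity": "medium", ...}, ...]
    out = {}
    for item in findings:
        out[item["severity"]] = out.get(item["severity"], 0) + 1
    return out

def gate(baseline_path: str, current_path: str) -> int:
    base, cur = counts(baseline_path), counts(current_path)
    failed = [s for s in RATCHETED if cur.get(s, 0) > base.get(s, 0)]
    if failed:
        print(f"New defects above baseline at severities: {failed}")
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(gate("baseline.json", "current.json"))
```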
Yes! I love @samgu's phrase from his talk today: "get clean, stay clean".
Also National Australia Bank is doing this: https://www.youtube.com/watch?v=ll50dAiKPoI
also, all the CAB has is the change record, where the only reference in the change record is an immutable chain of digitally signed attestations.
Superb presentation! Thanks!
Thanks! If you have stories, please reach out to me. We're revising the automated governance paper this summer and would love to incorporate more case studies.
Careful @stephen - mine half demanded he be mentioned in the second book because he was in the first
Thanks @stephen. I really enjoyed your presentation!! Awesome ending, so cute!!
<!here> good morning/afternoon/evening. Hope you enjoy the talk. Let me know if you have questions
Microsoft's change regarding open source in recent years is noticeable. And the acquisition of GitHub was pretty strategic.
One of the problems my company has run into is that if we have too much of our code exposed to the internet, then we will lose our competitive edge. This has resulted in only 2 repos being open sourced. How do I help continue to build more momentum to open source more of our tools?
We were discussing this today in quarterly planning. I don't think it should be a volume game, more about maintaining quality.
Yes, that makes a lot of sense. It always bothered me when people would come out with stats comparing open source engagement
Hi @dacahill7, Adam will be able to answer that in more detail. Come to our booth https://doesvirtual.com/sonatype and we can open a dialogue on this topic.
That's a key point. I'll talk about it in a second in the talk but it really comes down to understanding the "value" of your company and what you produce
and relating that to your competition. Why do people buy your products vs your competitors'?
@jeffmcaffer I've talked to other enterprises with open source programs. One of the drivers was the desire to attract technical talent. How much of that do you see at Microsoft and other places?
What tools did you have to build @jeffmcaffer? And how do you find the half a million new instances of OS every month?
@jeff.gallimore that is a common desire/direction. If that is the driver then typically I've seen that fail in the end. There needs to be a strong product/business need. engaging with open source communities is a durable activity that needs sustained attention.
So maybe a driver but not the top driver :thumbsup:. I can imagine if "attract talent" is at the top, the program (and the open source products) would atrophy
exactly. I would see it more as a consequence or outcome. Where it was a key driver, the company would attract folks but they would leave in a year because the engagement didn't run deep
@matthew.cobby we ended up writing a lot of tools to support that scale. Ultimately instrumenting the builds with tools that "detect" open source is key. Then having an automated way of relating policy to those discovered uses.
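A hedged sketch of that "relate policy to discovered uses" step; the manifest format and allowed-license policy are illustrative placeholders, not the actual tooling described above:
```python
import json

# Example policy: licenses a team is allowed to consume without review.
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-3-Clause"}

def check_manifest(path: str) -> list:
    """Flag components whose detected license falls outside policy."""
    with open(path) as f:
        components = json.load(f)  # e.g. [{"name": ..., "license": ...}, ...]
    return [c["name"] for c in components
            if c.get("license") not in ALLOWED_LICENSES]

violations = check_manifest("detected-components.json")  # hypothetical scanner output
if violations:
    print("Components needing review:", violations)
```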

@matthew.cobby No. We wanted to but there was too much company context in there. I'm sure they could be abstracted but we did not end up doing it
@jeffmcaffer sounds like you're bringing in "software supply chain" concepts there
Even in producing open source, the more you pay attention to the needs of the supply chain (your consumers), the more adoption.
ClearlyDefined for example would not be needed if all the compliance info was readily available
@jeffmcaffer what would help with the compliance tracking? More YAML?
there are lots of tools out there, commercial and open, for tracking. Automation is key. It's so easy for a team to bring in 1000 components with one command (think npm install)
Integration into the engineering system is key. This notion of Proficient is really what most people should strive for, where all the compliance and security work is automated
@jeffmcaffer indeed on both points. Accessing one OSS product is just the tip of the iceberg with all the dependencies that come with it. And yes, we've integrated as much as we can into the pipeline.
@kolton - there's definitely no such thing as 100% availability forever; with outages/downtime it's always a case of when, not if. But unfortunately many large enterprises (and project silos within them!) are still pushing ambiguous (or even missing!) ops requirements that don't consider this properly, without even being aware of what the nines really mean. I have to drop for a call but definitely going to watch this later. Thanks for raising awareness!
Ya, that is the remainder of the talk, the trade-offs from different levels of investment. LMK if you have questions when you catch it later
There are systems that measure their uptime in terms of years; some are at 15 years already and going. There are systems that can't go down.
Even for a few milliseconds? Mother nature is likely going to force that test before DevOps does then
'Nine nines' allows for 31.56 milliseconds of downtime every year apparently: https://en.wikipedia.org/wiki/High_availability#Percentage_calculation
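The arithmetic behind that figure, as a quick sketch of the yearly outage budget for n nines:
```python
# Downtime budget per year for n nines of availability.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s

for n in range(2, 10):
    availability = 1 - 10 ** -n          # e.g. n=9 -> 0.999999999
    budget = (1 - availability) * SECONDS_PER_YEAR
    print(f"{n} nines: {budget:,.4f} seconds/year")
# 9 nines -> ~0.0316 s/year, i.e. the ~31.56 ms mentioned above
```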
we have 7 nines in the z hardware, which helps - 7 nines in the hardware plus multiple systems to handle floating the load
Wild scalability and portability (ilities!) for floating the load are useful for a risk assessment/management approach. Many tier 1 data centres (certainly some of the smaller cloudy ones...) can't really offer five nines effectively?
These are usually running in companies' data centers - multiple, geographically dispersed
How do you decide when to excuse people from the incident call? For example, what if your service is down but it's something you can't do anything about. And once the thing you depend on is fixed, your service will recover automatically. (Like a multi-site Cloud infrastructure outage.) Do you excuse that person from the call?
Another example is when the authentication service is down; apps that depend on it can't do much on their own.
It's a judgement call. Once you can eliminate their service from question, that's often the place to let them go. i.e. you're dealing with an outage and it's clear it's related to the interaction between two services; maybe we identified a change that is suspect and the group feels confident we know what's happened. That's a good spot to let other folks go. Often that is the recovery time, when the 'fix' is going out or taking effect.
I feel like I would love to see a new book come out that is something like The Life of Brent: how to realize you are him, and how to change it
Is the number of 9's OK as part of a service deployment, e.g. 2 9's for an MVP, 3 9's as we add more users or go external?
Ya, I think that's part of the trade off. If you're working on a POC and it's early days, a few outages may be OK. Once it becomes a core part of the system that teams and customers rely on, it's worth the investment to improve it.
This should then also allow a more rapid route to live as you have a minimum set of controls/criteria needed - if the team knows this it should lead to fewer handoffs (minimum guardrails)
In your experimentation with games testing incident response, do you make sure you vary the people available - to make sure there is not that critical one person?
There are a few approaches you can take, whether you inform the teams before you run the drill, or whether you 'surprise' them. Either way, I think you want to rotate through so everyone gets an opportunity to practice.
I know one team that did this by declaring who's dead. Virtually of course
There was a great talk by Dave Rensin, Director of SRE at Google, speaking about testing the knowledge that people have and chaos engineering our organizations.
(https://speakerdeck.com/chaosconf/keynote-chaos-engineering-for-people-systems)
A Gameday is focused on testing the services and technologies, whereas a Fire drill focuses on the team and their response.
The Gameday is often planned and widely communicated, whereas the Fire drill might be run as a surprise, or with limited knowledge beforehand.
'Availability theatre' - reminds me of someone who once argued with me that 'we never test that failure scenario as it's never happened before'. Evidence of past failure (or lack of it!) in complex systems does not always closely correlate with future reliability/failures. There's a similar concern with testing complex systems: there are simply too many combinations of possible outcomes, so we ask how many nines of test coverage are acceptable. Which failure test scenarios are more likely? Arguably, of all possible global disasters, was a pandemic in the top ten most likely to happen, regardless of how we might have prepped and tested readiness to recover? Absolutely loving this talk, thanks for sharing @kolton
Thank you very much. Yeah, it's a prioritization effort of both what you think is likely (based on past outages or your own analysis), and things that might be rare but very impactful when they occur (black swan events)
Thank you all for the opportunity to present! Happy to answer any other questions now or later; DMs are also open, or via email (kolton @ gremlin).