Simulations To Collect and Analyze Data, Make Informed Business Decisions, and Test Solution Impact
The Security Operations team of a top-20 Fortune 100 company was implementing a new vulnerability management tool. Vulnerability management is the quarterly process of identifying, remediating, and mitigating vulnerabilities, where a vulnerability generally refers to a software weakness in a computing system or application. To meet compliance service level agreements (SLAs), all identified vulnerabilities must be processed by the end of the quarter in which they were identified.
The team was transitioning from a manual collection process to an automated vulnerability management solution. Vulnerability volumes rose in each of the two quarters leading up to the transition (Q2 and Q3), and the team was falling behind: 50% of vulnerabilities were processed through exception handling, 12% were “false positives,” and 10% were not processed by the end of the quarter and were therefore out of compliance. Security Operations needed to design a visible and efficient operation for the IT vulnerability scanning and remediation process.
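The compliance rule above can be expressed as a small check. This is a minimal sketch that assumes calendar quarters (a company's fiscal quarters may differ) and treats a vulnerability as compliant only if it was processed on or before the last day of the quarter in which it was identified. The function names `quarter_end` and `is_compliant` are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: calendar quarters (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec).
def quarter_end(d: date) -> date:
    """Return the last day of the calendar quarter containing d."""
    next_q_month = 3 * ((d.month - 1) // 3) + 4  # first month of next quarter: 4, 7, 10, or 13
    year = d.year
    if next_q_month > 12:  # Q4 rolls over into January of the next year
        next_q_month -= 12
        year += 1
    # The quarter ends the day before the next quarter begins.
    return date(year, next_q_month, 1) - timedelta(days=1)

def is_compliant(identified: date, processed: Optional[date]) -> bool:
    """Hypothetical rule: processed on or before the end of the identification quarter."""
    return processed is not None and processed <= quarter_end(identified)
```

Under this assumption, a vulnerability identified on August 1 must be processed by September 30, and one that is still open (`processed=None`) counts as out of compliance.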
The business was entering the third quarter, the backlog of open vulnerabilities was rising, and the vulnerability management team had resource constraints due to competing projects and budget cuts.
The historical process data that could be recovered was unreliable and incomplete, because the data collection process had been managed by a single resource who was no longer employed by the company. The data also lacked the key performance indicators needed to measure and analyze the process accurately. Because remediating a vulnerability requires multiple weeks of development and testing before the fix can be released to production, a meaningful collection period would have had to span roughly a full quarter, and the stakeholders determined that a three-month data collection period was neither reasonable nor cost-effective.
Conduct a full review of the vulnerability scanning and remediation management program based on a Lean Six Sigma DMAIC approach. Define a repeatable and reportable vulnerability management process that reduces volume and increases process efficiency (reduces cycle time).
The Renaissance Data Solutions team employed the DMAIC (Define, Measure, Analyze, Improve, and Control) approach to isolate, diagnose, and address problems in the vulnerability management process. Following this method, we defined the problem and set agreed-upon goals, outlined in the Renaissance Charter. We then entered the Measure phase with no reliable historical data and no time to collect new data.
Prior to a Kaizen event with subject matter experts (SMEs), we created a robust data collection plan that outlined the practical questions the data needed to answer and the data points needed to answer them. During the Kaizen, the team completed a swim lane process map that identified pain points and bottlenecks; establishing an accurate process map was critical to the Renaissance solution. With the SMEs' assistance, we also collected the minimum, mean, and maximum processing time for each process step, which enabled Renaissance to assign a probability distribution to each step. We also captured process volumes, resource allocations, scheduling patterns, and typical waits or delays.
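One common way to turn SME estimates like these into a step-time distribution is a triangular distribution. The sketch below assumes that choice (the source does not name the distributions that were used) and recovers the triangular mode from the minimum, mean, and maximum using the identity mean = (min + mode + max) / 3. The helper names are hypothetical.

```python
import random

def triangular_mode(lo: float, mean: float, hi: float) -> float:
    """Mode of the triangular distribution with the given min, mean, and max.

    Assumption: step times are triangular. Since mean = (lo + mode + hi) / 3,
    the mode is 3 * mean - lo - hi, and it must lie within [lo, hi].
    """
    if not lo <= mean <= hi:
        raise ValueError("mean must lie between min and max")
    mode = 3 * mean - lo - hi
    if not lo <= mode <= hi:
        raise ValueError("no triangular distribution matches these estimates")
    return mode

def sample_step_time(lo: float, mean: float, hi: float, rng=random) -> float:
    """Draw one simulated processing time for a step (hypothetical helper)."""
    mode = triangular_mode(lo, mean, hi)
    # random.triangular takes its arguments in the order (low, high, mode).
    return rng.triangular(lo, hi, mode)
```

For a hypothetical step estimated at a minimum of 1 day, a mean of 3 days, and a maximum of 6 days, the implied mode is 2 days. If the estimates imply a mode outside the min-max range, the sketch raises an error rather than silently clamping, which flags estimates worth revisiting with the SMEs.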
Simulation modeling uses a computer application to imitate the behavior of a real process. These computational models make it possible to run numerical experiments on the process to better understand how it works and how it responds to different conditions. Their greatest advantage is the ability to model complex systems in a relatively short amount of time, which makes the method well suited to business process improvement: proposed changes can be tested in the model before they are made to the live process.
Using graphical simulation software, we built the model by combining the process map with the data collected in the Kaizen event. To calibrate the simulator, the subject matter experts re-validated each simulated step and, where possible, we compared it to actual timings so that the model matched baseline performance.
A brainstorming session produced a list of eight potential improvements. The recommended changes were built into the model and analyzed. A Design of Experiments (DOE) was used to identify interactions between factor effects, along with response surface analysis to identify the improvements necessary to meet our client's goal.
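The graphical tool used in the study is not named, so the following is only a minimal stand-in that shows the idea: a serial process in which each step is a first-in, first-out queue served by a fixed pool of resources, with step times drawn from triangular distributions and averaged over many replications. The step names, resource counts, and (min, mode, max) times in `STEPS` are hypothetical placeholders, not the study's data.

```python
import heapq
import random

# Hypothetical process: (step name, resources, (min, mode, max) days per item).
STEPS = [
    ("triage", 2, (0.5, 1.0, 2.0)),
    ("develop_fix", 3, (5.0, 10.0, 20.0)),
    ("test", 2, (2.0, 4.0, 8.0)),
    ("release", 1, (0.5, 1.0, 3.0)),
]

def simulate(n_items, steps, seed=None):
    """Days needed to push a batch of n_items through a serial process.

    Assumes all items are available at time 0 (a quarterly batch) and that
    each step serves items in the order they arrive (FIFO).
    """
    rng = random.Random(seed)
    ready = [0.0] * n_items
    for _name, servers, (lo, mode, hi) in steps:
        free_at = [0.0] * servers  # min-heap of times each resource becomes free
        done = [0.0] * n_items
        for i in sorted(range(n_items), key=lambda k: ready[k]):
            # The next item takes whichever resource frees up first.
            start = max(ready[i], heapq.heappop(free_at))
            done[i] = start + rng.triangular(lo, hi, mode)
            heapq.heappush(free_at, done[i])
        ready = done  # finish times here are arrival times at the next step
    return max(ready)

def mean_cycle_time(n_items, steps, reps=200, seed=0):
    """Average simulated cycle time over independent replications."""
    master = random.Random(seed)
    runs = [simulate(n_items, steps, seed=master.randrange(2**32)) for _ in range(reps)]
    return sum(runs) / reps

print(mean_cycle_time(200, STEPS))
```

Calibrating a model like this means adjusting the step parameters until the simulated cycle time matches observed baseline timings, which mirrors the SME re-validation described above.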
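For two-level factors, DOE effects can be estimated directly from coded runs. The sketch below assumes each improvement is a factor coded -1 (current state) or +1 (improvement applied) in a full factorial design, with each run's response (for example, simulated cycle time) supplied by the model; the helper names are hypothetical and the response surface step is not shown. An effect is the mean response where the coded column (or the product of columns, for an interaction) is +1, minus the mean response where it is -1.

```python
from itertools import product

def full_factorial(k):
    """All 2**k runs of a two-level design, coded -1 (current) / +1 (improved)."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def effect(design, responses, cols):
    """Estimated effect of one factor or an interaction.

    cols is a tuple of factor indices: (0,) is the main effect of factor 0,
    (0, 1) is the two-factor interaction between factors 0 and 1.
    """
    plus, minus = [], []
    for run, y in zip(design, responses):
        sign = 1
        for c in cols:
            sign *= run[c]  # interaction column = product of the coded columns
        (plus if sign > 0 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)

design = full_factorial(4)  # four hypothetical improvements, 16 runs
# In practice each response would come from a simulation run of that scenario.
```

A full factorial over four factors needs 16 runs; covering all eight candidate improvements the same way would need 2^8 = 256 runs, which is one reason fractional or screening designs are often used first.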
The Results - Approved Improvements
1. Weekly Scanning – The “As Is” process was based on a quarterly baseline scan that identified vulnerabilities “in scope” for remediation in that quarter. The baseline process delayed the scan results 14 days each quarter.
Solution: Implement weekly scanning to eliminate need for baseline scan.
2. Authenticated Scanning – Current scanning methods included unauthenticated scans, which lack the administrative rights needed to confirm all vulnerabilities.
Solution: Implement authenticated scanning to eliminate false positives.
3. Scheduled Maintenance Windows – The lack of scheduled maintenance windows in the test and production environments created a bottleneck that limited throughput.
Solution: Scheduled maintenance windows allow resource managers to staff accurately and increase value-added work by eliminating time spent scheduling and waiting for windows.
4. Web Server Upgrade Process – Data collection showed that one-third of all vulnerabilities were related to web server products that required an upgrade. The cycle time to install an upgrade exceeds 90 days, which puts these vulnerabilities out of compliance.
Solution: Establish a secondary process to upgrade all web servers on a biannual schedule and update policies to match operational procedures.
Baseline Simulation Results
The baseline simulation produced a cycle time of 106 days to process one quarter's volume, confirming that the current-state process cannot meet the 90-day service level agreement (SLA) or overcome the rising backlog. The model also showed that two resources alone cannot support the process, while adding more than four resources yielded diminishing returns.
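The pattern of too few resources failing outright while extra resources yield diminishing returns can be illustrated with a standard analytic queueing model. This sketch swaps in the M/M/c (Erlang C) formula, which assumes random arrivals and exponentially distributed service times; it is not the simulation the study used, and the arrival and service rates are hypothetical. When utilization per resource reaches 1 or more, the queue (the backlog) grows without bound.

```python
import math

def erlang_c_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time an item waits in queue (Wq) in an M/M/c system.

    arrival_rate: items arriving per day; service_rate: items one resource
    completes per day. Returns math.inf if the system is unstable.
    """
    a = arrival_rate / service_rate       # offered load, in busy resources
    rho = a / servers                     # utilization of each resource
    if rho >= 1:
        return math.inf                   # work arrives faster than it is done
    tail = a**servers / math.factorial(servers) / (1 - rho)
    head = sum(a**k / math.factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)         # Erlang C: probability an item must wait
    return p_wait / (servers * service_rate - arrival_rate)

# Hypothetical rates: 2.5 items/day arrive; one resource finishes 1 item/day.
ARRIVALS, SERVICE = 2.5, 1.0
for c in range(2, 7):
    print(c, erlang_c_wait(ARRIVALS, SERVICE, c))
```

With these placeholder rates, two resources are overloaded, and each resource added beyond three reduces the mean wait by less than the one before it.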
Short Term Solution
Dedicate 2-4 resources to eliminating the backlog.
Long Term Solution
Implementing the top four improvements reduced open backlog volume by 47.3% (from 2,089 to 1,101) and produced an ongoing reduction in future false-positive and expiring risk-exception volume. Together, these changes cut the cycle time to remediate the backlog by 27.4% (from 106 days to 77 days).
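The reported percentages follow directly from the before and after figures, as this short check shows; the 77-day result also falls within the 90-day SLA that the 106-day baseline missed.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from a before value to an after value."""
    return 100 * (before - after) / before

print(round(pct_reduction(2089, 1101), 1))  # 47.3 (open backlog volume)
print(round(pct_reduction(106, 77), 1))     # 27.4 (cycle time, days)
print(77 <= 90)                             # True (within the 90-day SLA)
```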
Renaissance Foundation for Success
By combining simulation, hands-on experience, and tailored attention, the Renaissance team identified root causes, raised awareness of key performance indicators, and implemented changes that had a direct impact on the company's customers and its bottom line.