The cybersecurity industry has been counting on machine learning and statistical analysis to give defenders the ultimate weapon in the war against cyber threats. As a result, seemingly hundreds of User Behavioral Analytics (UBA) or User and Entity Behavioral Analytics (UEBA) solutions have come onto the market, while "legacy" Security Information and Event Management (SIEM) vendors have raced to re-tool and re-brand their platforms as UBA/UEBA to capitalize on the trend. Meanwhile, some end users have chosen to build their own cybersecurity data lake to augment or replace SIEM, an attractive option given the immaturity of off-the-shelf options and an emerging security community contributing to open source analytics projects. Regardless of the path we choose, achieving success remains a daunting task even for the well equipped.
Over the last decade I’ve been creating threat detection use cases and building out SIEM systems and large-scale data warehouses focused on analytics for security. A lot has changed over that time, and I don’t pretend to have a perfect blueprint for success, but I’ll humbly attempt to share something useful here. There is no question in my mind that these toolsets can help us mitigate risks, now more than ever.
Before I dive in: I do believe there is a big role for detective controls as part of a comprehensive cybersecurity program, especially for those targeted by advanced threat actors. That said, resources are often better spent on stepping up preventative controls. Reducing attack surface, following admin best practices, enforcing least privilege and segmenting networks not only prevent attacks but also help analytics efforts by filtering out a lot of noise.
Wait, why are we doing this?
If you come away with only one tip from this article, it’s this: focus on use cases above all else. It is all about use cases! Channeling my Steve Ballmer “Developers! Developers! Developers!” … I say “Use cases! Use cases! Use cases!” A “use case” is an analytical goal, usually a detective control aiming to alert on malicious behavior. I’m constantly amazed to hear about people building infrastructure for analytics without a real sense of the use cases they will implement. Infrastructure does matter, and we do need organizational buy-in that logging is a priority, along with committed resources to build reliable, quality data pipelines, but use cases should rule the roost. Even with good input, deploying a tool and pushing data into it won’t deliver great threat detection. This was the unrealistic expectation of SIEM that never materialized, even after decades of development. Even with SIEM, those who focus on use cases find much more satisfaction. Failure is on the horizon if you don’t know what use cases you want to implement. While there are a lot of common threats, not every threat is a priority in every org, so select use cases based on a risk analysis that identifies the priority threats given your business and assets.
Another pitfall is leaving the full operational workflow out of the dev process. A use case is not only “detect lateral movement”. It’s the detective analytics and the operational workflow! So often this is missed. Asking yourself a few questions should get you off on the right foot. What are the data inputs, and do we have access to that data? Can we ensure a reliable feed of that data? Who is going to follow up on those possible lateral movement alerts? What tools, visibility and authorizations are needed to triage those alerts? A focus on use cases will lead you to the right delivery model and provide the best cost justification.
You don’t have to enumerate every use case you will ever need; however, your effort should start with identifying at least a handful of use cases. The more you can enumerate at the outset, the more future-proof your plan will be. Draw from past incidents, risk analysis and scenarios unique to your business.
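The questions above can be captured in a small record per use case, so that gaps show up before any infrastructure is built. This is only a minimal sketch: the `UseCase` class, its field names and the feed names (`netflow`, `windows_auth`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UseCase:
    """A detective use case: the analytics plus the operational workflow."""
    name: str
    target_behavior: str
    data_inputs: list[str] = field(default_factory=list)   # feeds the analytics depend on
    responder: str = ""                                     # who follows up on alerts
    triage_tools: list[str] = field(default_factory=list)  # tools/access needed to triage

    def gaps(self, available_feeds: set[str]) -> list[str]:
        """Return the open questions that still block this use case."""
        problems = []
        if not self.data_inputs:
            problems.append("no data inputs identified")
        for feed in self.data_inputs:
            if feed not in available_feeds:
                problems.append(f"no reliable feed for {feed}")
        if not self.responder:
            problems.append("no one assigned to follow up on alerts")
        if not self.triage_tools:
            problems.append("no triage tools or authorizations identified")
        return problems


# Hypothetical example: analytics are planned, but the workflow is not.
lateral = UseCase(
    name="lateral-movement",
    target_behavior="actor moving from one asset to another",
    data_inputs=["netflow", "windows_auth"],
)
for problem in lateral.gaps(available_feeds={"netflow"}):
    print(problem)
```

A use case with an empty `gaps()` list has both its data inputs and its workflow accounted for; anything else is a planning item rather than a tooling problem.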
The dream is free but the hustle is sold separately!
The second leading cause of death for SIEM/analytics efforts is the lack of staff and expertise to realize the full benefits of the use cases. Analytics tools WILL require staff in one form or another. Even if we assume solutions have “canned” analytics for your target use cases, they will always require customization to your organization’s data and workflow. Every company deploys and uses commodity software in a slightly different way and has unique business characteristics, staff work patterns or IT operational patterns. These factors impact input data and require tweaks through the whole lifecycle, from data wrangling to filtering false positives to optimizing workflow.
The good news is that, regardless of your budget, you can resource the staff needed for success. You may find data engineering resources within your organization that can help you keep the pipeline and engine running while you focus on the use cases. Some software solutions include staff augmentation to tweak the solution as part of the service. Another strategy is to scale data back to what the planned use cases need, saving hardware and software costs; those savings can help make room for staff or services. I recommend focusing on the data needed for use cases regardless of budget, because too many projects fail due to a “boil the ocean” approach. It’s better to implement half of your use cases than to spend all your resources on capacity to support “all of the data” and have nothing to show for it. Prove value, then scale up to bring more use cases online. Even with outside help, I recommend dedicated in-house resources when bringing in software solutions. You don’t have to hire a bunch of Ph.D. mathematicians; you just need curious minds with an aptitude for coding who are eager to learn how to tease threats out of streams of data. It’s critical to have dedicated staff in order to be responsive to the needs of the security operations center (SOC).
The workflows should include automated triage and orchestration that deliver the efficiency needed to maximize SOC resources. Smart resourcing guided by use cases, and care in the design of full SOC workflows, is the path to controlling costs while realizing the benefits of your analytics program.
The right stuff.
It’s critical to regularly test your use cases and make sure they flag the target behavioral pattern. The best testing is Red Team testing that generates the activity on the live network. Even when done well, use cases are fragile… add an extra space to the input data, or let an API key used to look up info get invalidated, and BOOM, the alert doesn’t fire, undoing all your efforts. Regularly generating the target behavior and checking that the alert makes it all the way through the workflow is critical. This “control testing” will ensure that changes in upstream data or downstream ticketing systems that break workflows are promptly detected and fixed.
Another key is that success should be measured by efficacy and not outcomes. A false positive is an outcome, but it really doesn’t tell us whether the use case is effective or not. For example, let’s say we have a use case that detects command and control (C2) channels and it has had a 100% false positive rate over the last 3 months against real live traffic. Should we turn it off? Not necessarily! Why? Because when we test a few different C2 channels modeled after real toolkits, the use case detects the activity 100% of the time. The use case is fully tuned and auto-triaged as much as possible, and it fired 90 times in the last 3 months, costing approximately 8-10 man-hours per month to follow up on. For targeted organizations it’s well worth 10 man-hours a month to detect C2 channels. Our testing gives us confidence that the use case works, and this should be our focus. While we should do everything we can to eliminate explainable anomalies, getting your team to focus on the risks and dumping that antiquated 1999 “IDS false positive” mindset can revolutionize your operations. Tuning should constantly eliminate repeated alarms for authorized activity in order to keep analysts fresh and interested. One way to achieve this is to have a feedback loop and workflow that enable faster tuning cycles.
Add in occasional red teaming to give the SOC some true positives and a feel for what it’s like to WIN, and you have a recipe that boosts enthusiasm and satisfaction up and down the org chart. Meanwhile, you get to catch the real bad guys when they come! Control testing, promoting an analytical culture and a focus on efficacy will mature your analytics program into a critical function that reliably protects your organization when preventative controls fall short.
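To make the efficacy argument concrete, here is a rough sketch that scores the C2 use case from the example on control-test results and triage cost rather than on its false positive rate. The `ControlTest` record, the function names and the scenario labels are hypothetical; the alert count and hours are the figures from the example above.

```python
from dataclasses import dataclass


@dataclass
class ControlTest:
    scenario: str            # hypothetical label for the simulated behavior
    alert_reached_soc: bool  # did the alert make it through the whole workflow?


def detection_rate(tests: list[ControlTest]) -> float:
    """Share of control tests whose alert made it all the way to the SOC."""
    if not tests:
        raise ValueError("no control tests recorded; efficacy is unknown")
    return sum(t.alert_reached_soc for t in tests) / len(tests)


def alerts_per_month(alerts_fired: int, months: int) -> float:
    """Average alert volume the SOC has to absorb each month."""
    return alerts_fired / months


# Simulated C2 channels modeled after real toolkits (labels are made up).
tests = [
    ControlTest("c2-http-beacon", True),
    ControlTest("c2-dns-tunnel", True),
    ControlTest("c2-https-long-poll", True),
]

rate = detection_rate(tests)
monthly_alerts = alerts_per_month(alerts_fired=90, months=3)
# Upper bound of the 8-10 man-hours per month from the example.
minutes_per_alert = 10 / monthly_alerts * 60

print(f"detection rate: {rate:.0%}")
print(f"alerts per month: {monthly_alerts:.0f}")
print(f"triage cost per alert: about {minutes_per_alert:.0f} minutes")
```

Read this way, the use case costs roughly 20 minutes per alert at the high end and catches every simulated channel, which is a very different picture from "100% false positives".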
Perfection is the enemy of profit
I recently heard this quote: “Perfection is the enemy of profit”. Wow, that’s so true when developing use cases. So many valuable use cases end up on the cutting room floor because they generate too many alerts, and they generate too many alerts because the author of the analytics is afraid of missing malicious activity. Even worse, some shops will overwhelm the SOC with one type of incident due to this same fear of tuning out an important event. As a result, so many UEBA platforms end up on the shelf, or set off to the side and seldom used.
A risk-based approach is one secret to unlocking the value of behavioral analytics use cases. As security pros we all know that risk can never be zero, so why would we try to eliminate all risk that our use case misses the target behavior? Consider our example use case: detect lateral movement. The target behavior is a malicious actor or insider moving from one asset to another. A comprehensive strategy would include multiple use cases, but one of them might be to flag anomalous network connections. This use case looks for rare connections between two devices on the network. Even after filtering for known authorized activity, we may find the use case generating too many alerts to follow up on. Instead of abandoning it wholesale, think about the risk: which connections are the most risky? Remote console access like RDP is one example. If we filter the resulting alerts down to just “high risk protocols”, we may see a drastic reduction in alerts. Add some auto-triage and you could end up with a very efficient and effective use case detecting common lateral movement techniques. Even if you have to partially filter a lateral movement protocol like SSH from specific networks, that’s OK, because if it gets alerts filtered down enough to operationalize the use case in the SOC, then you have won! You have reduced the risk of all the other covered protocols being abused for lateral movement.
Don’t worry about what you filtered out; a different analytics approach might be better at monitoring the excluded activity. The big picture is that a risk-based approach allows you to iteratively implement very efficient use cases until you have a critical mass, a virtual minefield that becomes more and more difficult for attackers to navigate without your knowledge and response.
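The filtering described in the lateral movement example might look roughly like the sketch below. It assumes connection records are simple `(source, destination, destination port)` tuples and treats a device pair as "rare" when it does not appear in a baseline window; the function name, the `max_seen` threshold, the port-to-protocol mapping and all addresses are illustrative assumptions, not a prescribed implementation.

```python
import ipaddress
from collections import Counter

# Assumption: identify protocols by their default ports; extend for your environment.
HIGH_RISK_PORTS = {3389: "RDP", 22: "SSH"}


def rare_high_risk_connections(history, today, max_seen=0, exclusions=()):
    """Flag today's high-risk connections between device pairs rarely seen before.

    history, today: iterables of (src_ip, dst_ip, dst_port) tuples.
    max_seen: a pair seen more than this many times in history is not rare.
    exclusions: (ip_network, port) pairs deliberately tuned out.
    """
    seen = Counter((src, dst) for src, dst, _port in history)
    alerts = []
    for src, dst, port in today:
        if port not in HIGH_RISK_PORTS:
            continue  # risk-based filter: keep only high-risk protocols
        if seen[(src, dst)] > max_seen:
            continue  # pair is common in the baseline, so not anomalous
        src_addr = ipaddress.ip_address(src)
        if any(port == p and src_addr in net for net, p in exclusions):
            continue  # partially filtered protocol from a specific network
        alerts.append((src, dst, HIGH_RISK_PORTS[port]))
    return alerts


history = [
    ("10.0.1.5", "10.0.2.10", 3389),  # admin workstation that uses RDP routinely
    ("10.0.1.5", "10.0.2.10", 3389),
    ("10.0.9.7", "10.0.2.20", 22),
]
today = [
    ("10.0.1.5", "10.0.2.10", 3389),   # known pair: skipped
    ("10.0.3.44", "10.0.2.10", 3389),  # new pair over RDP: flagged
    ("10.0.3.44", "10.0.2.30", 80),    # new pair, but not a high-risk protocol
    ("10.0.8.12", "10.0.2.20", 22),    # new pair over SSH from an excluded network
]
exclusions = [(ipaddress.ip_network("10.0.8.0/24"), 22)]

alerts = rare_high_risk_connections(history, today, exclusions=exclusions)
print(alerts)
```

Each filter trades a little coverage for a use case the SOC can actually run; the excluded SSH traffic is a candidate for a different analytic rather than a reason to abandon this one.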
I realize a lot of the secrets to success are buried in the gory details. I was recently asked, “How do you filter all of these sources down to the events that you care about?” My eyes lit up, I was suddenly very interested in the otherwise ordinary conversation, and I hastily blurted out, “Well, that’s the whole trick, isn’t it!?” There are too many tricks to enumerate in one article, but I hate the fact that building valuable use cases and operationalizing analytics is so much of a secret black art, so I’m passionate about helping others be successful with this. Interested in learning more? Collaborate with others focused on these challenges at https://www.ctmx.org/join.
Note: This post was originally published on LinkedIn.