Security Operations Centres (SOCs) are overwhelmed by false positives, driven by the rapid growth in data volumes and by the inability of current analytics models to adapt as logs evolve over time (so-called unstable log data). This creates a need for more efficient solutions. We therefore introduce VoBERT, a sequence anomaly detection method. An improvement on BERT (Bidirectional Encoder Representations from Transformers), VoBERT adds resilience by accurately classifying unstable logs that traditional BERT-like models would treat as out-of-vocabulary. We show that neither a standard BERT model nor a simple heuristic often used in industry, which scores a sequence by the percentage of unseen logs it contains, can cope with logs that change over time. This matters because a more stable model produces significantly fewer false positives and improves attack detection. Our evaluation on the Thunderbird log dataset shows the MCC (Matthews correlation coefficient) of both the standard BERT model and the heuristic falling sharply, from 60% (no unseen logs) to 10% (97% unseen logs). VoBERT, by contrast, showed no significant decay (-2%), maintaining on-par performance under realistic instabilities. We also tested VoBERT on real-world data from a large European bank (50,000+ employees). The results confirmed a stable MCC across all ranges of instability. Analysing real-life datasets also reveals that academic studies often project overly optimistic outcomes by testing solely on artificial datasets. For very low-instability cases, results for all models are alike; however, as instability rises above 40%, the MCC of the heuristic drops to 0, while that of VoBERT remains unchanged.
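The heuristic baseline and the MCC metric mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `unseen_ratio`, `mcc` and `evaluate`, the 0.5 decision threshold, and the toy event IDs are all hypothetical and chosen only to show why a fixed unseen-log score can break down once normal logs start to change.

```python
from math import sqrt


def unseen_ratio(sequence, known_logs):
    """Heuristic anomaly score: fraction of events in the sequence not seen in training."""
    if not sequence:
        return 0.0
    unseen = sum(1 for event in sequence if event not in known_logs)
    return unseen / len(sequence)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = anomalous, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def evaluate(sequences, labels, known_logs, threshold=0.5):
    """Flag a sequence as anomalous when its unseen ratio exceeds the (assumed) threshold."""
    preds = [1 if unseen_ratio(s, known_logs) > threshold else 0 for s in sequences]
    return mcc(labels, preds)


# Toy vocabulary and sequences (hypothetical event IDs, not real Thunderbird logs).
known = {"E1", "E2", "E3"}
labels = [0, 1]  # first sequence is normal, second is an attack

stable = [["E1", "E2", "E3", "E1"], ["E1", "X1", "X2", "X3"]]
drifted = [["N1", "N2", "E3", "N1"], ["N1", "X1", "X2", "X3"]]  # normal logs have changed

print(evaluate(stable, labels, known))   # 1.0
print(evaluate(drifted, labels, known))  # 0.0
```

In the drifted case the normal sequence also consists mostly of unseen events, so the fixed threshold flags everything and the MCC falls to 0. This toy example only mirrors the failure mode described above; it does not reproduce the reported figures.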
This presentation will benefit cybersecurity professionals and SOC analysts, offering insights into practical applications of VoBERT to improve detection results. Attendees will learn the significance of real-world data evaluation and will leave equipped with tools to enhance their detection capabilities.