Building and Defending a Machine Learning Malware Classifier: Taking Third at MLSEC 2021
Nowadays when you read about cybersecurity, you’re almost sure to see machine learning (ML) pitched as the silver bullet for every cyber problem. Of course, ML is no cyber cure-all, and it suffers from problems of its own – chiefly that ML brings with it its own set of vulnerabilities and weaknesses, often termed “adversarial ML.” These weak points range from leaking the private data a model was trained on to being readily evaded by an attacker with the right motivation and context.
In this talk, we’ll go through our own experience leveraging ML to build and defend a robust malware detector as part of our submission to the 2021 Machine Learning Security Evasion Competition. We’ll start with background on adversarial ML, then show how we used those ideas to generate adversarial malware variants and train our model on them. We’ll then shift gears to how we sought to “defend” this model by explicitly attacking the models submitted by the other participants, walking through how we trained a proxy ML model and staged attacks against it.
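To give a taste of the first half, the variant-generation loop boils down to applying functionality-preserving transformations to a PE file and keeping whatever the target classifier mis-scores. Here is a minimal sketch in Python; the `score_pe` callback, the `benign_bytes` donor corpus, and the single overlay-append transformation are illustrative stand-ins, not the competition’s actual API or our full transformation set:

```python
import random

# Functionality-preserving perturbation: bytes appended past the end of
# a PE's declared sections (the "overlay") are never mapped or executed,
# but they do shift the byte-level statistics a static classifier sees.
def append_overlay(pe_bytes: bytes, filler: bytes) -> bytes:
    return pe_bytes + filler

def generate_evasive_variants(pe_bytes, benign_bytes, score_pe,
                              threshold=0.5, tries=100, chunk=4096):
    """Grow a malicious PE with chunks sampled from benign files until
    the classifier's maliciousness score falls below its threshold."""
    variants = []
    candidate = pe_bytes
    for _ in range(tries):
        start = random.randrange(max(1, len(benign_bytes) - chunk))
        candidate = append_overlay(candidate, benign_bytes[start:start + chunk])
        if score_pe(candidate) < threshold:  # this variant evades the model
            variants.append(candidate)
    return variants
```

Variants collected this way can then be folded back into the training set, which is the adversarial-training angle behind our own model.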
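For the attack half, the proxy idea is standard black-box practice: label a corpus of samples by querying the defended model, fit a local surrogate on those labels, and iterate attacks against the surrogate where queries cost nothing. A sketch under the assumption of static feature vectors and a gradient-boosted surrogate – the feature extractor and model family here are illustrative choices, not necessarily the ones we used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_proxy(samples, extract_features, query_remote):
    """Fit a local surrogate that mimics a remote black-box classifier.

    samples          -- raw PE byte strings to label
    extract_features -- maps PE bytes to a fixed-length feature vector
    query_remote     -- returns the remote model's 0/1 verdict
    """
    X = np.array([extract_features(s) for s in samples])
    y = np.array([query_remote(s) for s in samples])
    proxy = GradientBoostingClassifier(n_estimators=200)
    proxy.fit(X, y)
    return proxy

# Attacks are staged against the surrogate, where each trial is free;
# only the most promising variants get spent on real remote queries.
def looks_evasive(proxy, extract_features, candidate, threshold=0.5):
    prob = proxy.predict_proba([extract_features(candidate)])[0, 1]
    return prob < threshold
```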
In the end, our submission took third place in the competition, outperforming all but two of the other contestants. More importantly, the journey surfaced lessons useful both for those looking to get into the space and for those already practicing in it. Attendees of this talk should walk away with an understanding of those lessons, along with pointers to resources they can use to build their own models, including the open-source code and the data behind our submission.