Enhancing robustness of ML-based malware detectors
IoT devices are essentially networked embedded computers, which makes them vulnerable to remote malware infections. Once compromised, these devices can be used to launch large-scale attacks that pose serious threats to the broader Internet. In cyber-physical systems, the consequences can be even more severe, as infected embedded devices may damage equipment or endanger human safety. Detecting malware on such devices is therefore a critical challenge, made harder by the strict resource constraints of embedded hardware. A lightweight detection approach proposed in the literature measures the binary similarity between scanned files and known malware samples. One family of detectors built on this principle, SIMBIoTA-ML, achieves strong detection performance, but it relies on Machine Learning (ML) models that may be susceptible to adversarial examples: carefully crafted malware samples designed to evade detection. Since SIMBIoTA-ML is a family of detectors whose members differ in the underlying ML model, understanding and improving their robustness is a well-motivated research direction.
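To make the similarity-based idea concrete, the sketch below shows a minimal nearest-known-sample detector. It is purely illustrative: the published SIMBIoTA line builds on locality-sensitive similarity hashes of binaries, whereas here a simple Jaccard similarity over byte 4-grams stands in for the similarity metric, and the threshold value is hypothetical.

```python
# Illustrative sketch of similarity-based malware detection.
# A scanned file is flagged if it is sufficiently similar to any
# known malware sample. The feature extraction (byte 4-grams), the
# Jaccard metric, and the 0.5 threshold are all stand-ins for the
# similarity hashing used by the actual SIMBIoTA-style detectors.

def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Feature extraction: the set of byte n-grams in the file."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_malware(sample: bytes, known_malware: list,
               threshold: float = 0.5) -> bool:
    """Flag the sample if it resembles any known malware closely enough."""
    feats = byte_ngrams(sample)
    return any(jaccard(feats, byte_ngrams(m)) >= threshold
               for m in known_malware)
```

The appeal for embedded deployment is that the detector only needs to store a compact set of known-malware signatures and compute cheap pairwise similarities, rather than run a heavyweight scanning engine.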
This project consists of two main parts. In the first part, the student will design and implement adversarial example generation methods against SIMBIoTA-ML, using guided-search techniques in the feature space known from the adversarial ML literature. The second part of the project focuses on strengthening the robustness of SIMBIoTA-ML against such attacks through adversarial training. This involves:
- Defining appropriate metrics to quantify robustness,
- Proposing adversarial training strategies,
- Systematically measuring the robustness of the adversarially trained models across the detector family, and
- Analyzing the trade-offs that arise, for instance, between detection performance and resilience to adversarial examples.
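As a rough illustration of how the two parts fit together, the sketch below runs a greedy guided search in a binary feature space against a toy detector, then performs one round of adversarial training on the evasive samples it finds. Everything here is hypothetical: the perceptron model, the feature encoding, and the set of "safely flippable" (semantics-preserving) features are stand-ins, and SIMBIoTA-ML's actual models and feature space differ.

```python
# Hypothetical sketch of the two project parts on a toy detector:
# (1) greedy guided search that flips features to drive the malware
# score below the decision threshold, and (2) one round of
# adversarial training that adds the evasive samples back to the
# training set with their original 'malware' label.

def score(w, b, x):
    """Toy malware score: a perceptron's raw activation."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train(data, n_feats, epochs=50):
    """Train a toy perceptron detector on (features, label) pairs.
    Integer weights keep the arithmetic exact in this example."""
    w, b = [0] * n_feats, 0
    for _ in range(epochs):
        for x, y in data:
            pred = 1 if score(w, b, x) > 0 else 0
            err = y - pred
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err
    return w, b

def guided_search(x, w, b, flippable, max_flips=8):
    """Part 1: greedily flip semantics-preserving features that most
    reduce the detector's score, until the sample evades detection."""
    x = list(x)
    for _ in range(max_flips):
        if score(w, b, x) <= 0:            # classified benign: evasion
            return x
        best_i, best_s = None, score(w, b, x)
        for i in flippable:
            x[i] ^= 1                      # trial flip
            s = score(w, b, x)
            x[i] ^= 1                      # undo trial flip
            if s < best_s:
                best_i, best_s = i, s
        if best_i is None:                 # no flip helps: local optimum
            break
        x[best_i] ^= 1
    return x

def adversarial_training(data, n_feats, flippable):
    """Part 2: retrain with evasive variants of the malware samples,
    keeping their 'malware' label (one adversarial-training round)."""
    w, b = train(data, n_feats)
    adv = [(guided_search(x, w, b, flippable), 1)
           for x, y in data if y == 1]
    return train(data + adv, n_feats)
```

On a toy dataset where the last three features are flippable, the guided search finds variants that the baseline model misclassifies as benign, while the adversarially retrained model still detects them; measuring how detection performance on clean samples changes at the same time is exactly the kind of trade-off analysis the project calls for.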