Automated generation of adversarial malware samples with a genetic algorithm

2025-2026/I.
Dr. Buttyán Levente

Machine learning-based malware detection is an active research topic, especially in the IoT domain. In this approach, a model is trained on feature vectors of malicious and benign samples, the trained model is distributed to endpoints, and it is used there to predict whether a newly encountered file is malware. This approach can be effective and efficient, even in the IoT domain; however, its robustness against evasion techniques must also be ensured. In particular, this project is concerned with robustness against adversarial samples: it should not be easy to craft malware samples that the trained detector misclassifies as benign files.
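The train, distribute, predict workflow described above can be illustrated with a deliberately tiny sketch. It assumes a nearest-centroid classifier over two made-up numeric features. The classifier choice, the feature values, and the function names are illustrative assumptions and not part of the project description. The last vector shows what an adversarial sample means in feature space: a point that still belongs to the malicious class but lies closer to the benign centroid.

```python
def centroid(vectors):
    """Component-wise mean of a list of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(malicious, benign):
    """'Training' here only stores one centroid per class (toy model)."""
    return {"malware": centroid(malicious), "benign": centroid(benign)}


def predict(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], features))


# Made-up training data: two features per sample (assumption).
malicious_samples = [[0.9, 0.8], [0.8, 0.9]]
benign_samples = [[0.1, 0.2], [0.2, 0.1]]
model = train(malicious_samples, benign_samples)

# A typical malicious feature vector and a shifted, hypothetical adversarial variant.
typical_vector = [0.85, 0.70]
adversarial_vector = [0.40, 0.40]
typical_label = predict(model, typical_vector)
adversarial_label = predict(model, adversarial_vector)
```

In a real deployment the model would be a far richer classifier and the features would be extracted from the binaries themselves; the sketch only fixes the vocabulary used in the rest of the description.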

A well-known approach to making machine learning-based malware detectors more robust against adversarial samples is adversarial training, which requires antivirus providers to enrich their training datasets with a large number of artificially created adversarial samples. It is not clear, however, how such samples should be created without making unnecessarily restrictive assumptions about the attacker. The main idea to be elaborated in this project is to use a genetic algorithm-based approach to find good strategies for modifying an existing IoT malware sample into an adversarial sample. In this way, the antivirus provider does not need to specify adversarial sample creation methods explicitly, but can rely on the genetic algorithm to discover good methods for this purpose.

The specific tasks of the student include the following:

  • Understanding the concept of genetic algorithms;
  • Identifying possible ways to modify an ELF binary such that it remains functionally equivalent to the original binary, and mapping these modifications to a multi-dimensional space of adversarial sample creation strategies;
  • Representing strategies such that they can be mutated and combined via cross-over;
  • Proposing a fitness function that reflects some generic assumptions about the goals of an attacker (e.g., they want to evade detection as much as possible); 
  • Identifying an appropriate genetic algorithm framework in which the evolution of a population of various adversarial sample creation strategies can be simulated;
  • Designing and executing experiments within the identified framework with the purpose of finding good strategies for converting an existing malware sample into an adversarial sample;
  • Drawing some conclusions about the suitability of the genetic algorithm-based approach for creating adversarial samples in light of the results of the experiments.
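
To make the terminology in the task list concrete, the following is a minimal, framework-free sketch of a genetic algorithm over abstract strategies. Everything specific in it is an assumption made for illustration: a strategy is encoded as a list of integer intensities, one per placeholder transformation type; the transformation names are hypothetical; and `toy_fitness` is a stand-in that scores a strategy by its closeness to an arbitrary target vector. In the actual project, the fitness function would instead reflect the detector's response to a sample modified according to the strategy, and a dedicated framework would likely replace the hand-written loop.

```python
import random

# Placeholder transformation types (hypothetical names, not from the project text).
TRANSFORMATIONS = ["transform_a", "transform_b", "transform_c", "transform_d"]
MAX_INTENSITY = 9  # each gene is an integer in 0..MAX_INTENSITY (assumption)


def random_strategy(rng):
    """Create a random strategy: one intensity value per transformation type."""
    return [rng.randint(0, MAX_INTENSITY) for _ in TRANSFORMATIONS]


def mutate(strategy, rng, rate=0.2):
    """Resample each gene independently with probability `rate`."""
    return [rng.randint(0, MAX_INTENSITY) if rng.random() < rate else gene
            for gene in strategy]


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix from one parent, suffix from the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def toy_fitness(strategy):
    """Stand-in fitness: negative distance to an arbitrary target (maximum is 0)."""
    target = [3, 7, 1, 5]
    return -sum(abs(gene - t) for gene, t in zip(strategy, target))


def evolve(fitness, generations=40, pop_size=30, elite=2, seed=0):
    """Evolve a population with elitism and size-3 tournament selection."""
    rng = random.Random(seed)
    population = [random_strategy(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_generation = population[:elite]  # keep the best strategies unchanged
        while len(next_generation) < pop_size:
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            next_generation.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_generation
    return max(population, key=fitness)


best_strategy = evolve(toy_fitness)
```

Because the two best strategies are copied unchanged into each new generation, the best fitness in the population never decreases. The choice of encoding matters more than the loop itself: the list representation is what makes mutation and crossover straightforward to define.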