Concealing attacks against similarity-based malware detection

Dr. Buttyán Levente

Similarity-based malware detection is a recently proposed approach that uses a similarity digest scheme (e.g. TLSH) to efficiently check whether a newly found software binary is similar to any binary on a list of previously known ones. It has proved to be an efficient and accurate method of malware detection that can be used even on resource-limited systems (e.g. IoT devices), where traditional antivirus software cannot run. However, the TLSH similarity digest scheme, which showed great results in this application, has since been found vulnerable: automated attacks have been developed that manipulate input binaries so that TLSH judges originally similar binaries to be dissimilar, or vice versa, without changing the behavior of the modified binaries.
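The detection workflow described above can be illustrated with a minimal Rust sketch. Note that this is not TLSH: the digest below is a deliberately simple, hypothetical stand-in (byte trigrams folded into 64 counting buckets), and the names toy_digest, toy_distance, matches_known and the threshold parameter are assumptions made only for this illustration. It shows the general idea of comparing the digest of a new binary against a list of known digests and flagging it when the distance falls under a threshold.

```rust
/// Number of counting buckets in the toy digest (arbitrary choice).
const BUCKETS: usize = 64;

/// Toy similarity digest, NOT TLSH: every window of three consecutive
/// bytes is mixed into a small hash and counted in one of 64 buckets.
fn toy_digest(data: &[u8]) -> [u32; BUCKETS] {
    let mut counts = [0u32; BUCKETS];
    for w in data.windows(3) {
        let h = (w[0] as usize)
            .wrapping_mul(31)
            .wrapping_add(w[1] as usize)
            .wrapping_mul(31)
            .wrapping_add(w[2] as usize);
        counts[h % BUCKETS] += 1;
    }
    counts
}

/// Distance between two toy digests: sum of absolute bucket differences.
/// A distance of 0 means the bucket counts are identical.
fn toy_distance(a: &[u32; BUCKETS], b: &[u32; BUCKETS]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| x.abs_diff(*y)).sum()
}

/// Returns true if the sample is within `threshold` of any known digest.
/// `threshold` is a hypothetical tuning parameter of this sketch.
fn matches_known(sample: &[u8], known: &[[u32; BUCKETS]], threshold: u32) -> bool {
    let digest = toy_digest(sample);
    known.iter().any(|k| toy_distance(&digest, k) <= threshold)
}
```

In this toy scheme a single changed byte touches at most three trigrams, so it can move the distance by at most 6, while an unrelated file ends up far away. A real scheme such as TLSH is considerably more elaborate, and the attacks mentioned above target its specific construction rather than anything shown here.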

One weakness of these attacks is that they introduce unnatural byte patterns into the files, which may make the tampering easy to detect. The task is to modify these algorithms so that they produce less noticeable patterns, for example patterns consisting of valid machine code instructions for the given architecture.
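To make the detectability concern concrete, here is a small Rust sketch of one check a defender might apply. The topic description does not say which patterns the existing attacks produce, so the heuristic below (flagging long runs of a single repeated byte) is purely an assumed example; longest_byte_run, looks_padded and max_run are hypothetical names introduced for this illustration.

```rust
/// Length of the longest run of one repeated byte value in `data`.
/// Returns 0 for empty input.
fn longest_byte_run(data: &[u8]) -> usize {
    let mut best = 0;
    let mut current = 0;
    let mut prev: Option<u8> = None;
    for &b in data {
        if Some(b) == prev {
            current += 1;
        } else {
            // A new byte value starts a fresh run of length 1.
            current = 1;
            prev = Some(b);
        }
        best = best.max(current);
    }
    best
}

/// Assumed heuristic: treat the file as suspicious if it contains a run
/// of identical bytes longer than `max_run` (a hypothetical cutoff).
fn looks_padded(data: &[u8], max_run: usize) -> bool {
    longest_byte_run(data) > max_run
}
```

Inserted bytes that decode as valid instructions of the target architecture would be less likely to stand out under simple checks of this kind, which is the motivation for the task.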

The student will learn about software binaries, machine code, similarity digest schemes, the attacks against TLSH, and the Rust programming language, in which the attack library is written.