Attacking and improving the TLSH similarity digest scheme

2023-2024/II.
Dr. Buttyán Levente

TLSH is a similarity digest scheme used in malware detection, malware clustering, and digital forensics applications. It computes a hash value for a given input such that similar inputs yield similar hash values, and it includes a scoring algorithm that compares two TLSH hash values and quantifies how much they differ. Recently, TLSH has been used as the underlying similarity digest scheme of SIMBIoTA, a lightweight similarity-based IoT malware detection method. It is also used by VirusTotal (VT) to search for similar malware samples in the VT database.
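
To make the idea of a similarity digest and a difference score concrete, the following is a minimal sketch of a toy scheme. It is not TLSH and makes no claim about TLSH's internals; the names `toy_digest`, `toy_diff`, and `bucket_of`, as well as the bucket count and window size, are assumptions chosen only for illustration. The digest is a histogram of sliding byte windows, and the difference score is the sum of absolute differences between two histograms.

```python
BUCKETS = 64  # number of histogram buckets (arbitrary choice for this toy)
WINDOW = 3    # sliding window size in bytes (arbitrary choice for this toy)


def bucket_of(window: bytes) -> int:
    """Map a byte window to a bucket with a simple polynomial hash."""
    value = 0
    for byte in window:
        value = value * 31 + byte
    return value % BUCKETS


def toy_digest(data: bytes) -> tuple:
    """Return the relative frequency of each bucket over all sliding windows."""
    total = max(len(data) - WINDOW + 1, 0)
    counts = [0] * BUCKETS
    for i in range(total):
        counts[bucket_of(data[i:i + WINDOW])] += 1
    return tuple(c / total if total else 0.0 for c in counts)


def toy_diff(a, b) -> float:
    """Difference score: 0.0 for identical digests, at most 2.0."""
    return sum(abs(x - y) for x, y in zip(a, b))


original = bytes(range(256)) * 4
patched = bytearray(original)
patched[500] ^= 0xFF  # flip one byte in the middle of the input
near = toy_diff(toy_digest(original), toy_digest(bytes(patched)))
far = toy_diff(toy_digest(original), toy_digest(b"B" * 1024))
print(f"one-byte change: {near:.4f}, unrelated input: {far:.4f}")
```

In this toy, a one-byte change touches at most three windows, so the score stays close to zero, while an unrelated input yields a score close to the maximum. A cryptographic hash would behave in the opposite way, since a single changed byte would produce an unrelated hash value.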

These applications rely on the assumed robustness of TLSH against attacks. In the context of malware detection, for instance, an attacker could modify a malware sample such that its functionality is preserved, but the sample becomes either dissimilar to the original or similar to a benign program; in both cases, the modified and still malicious sample could be misclassified as benign. In the strongest attack, the malware sample could be modified such that the TLSH value of the modified sample fully matches the TLSH value of an arbitrary other file. If such attacks are feasible, then TLSH can be considered useless for malware detection and digital forensics purposes, and new, more robust similarity digest schemes need to be developed.
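
The attack goal of making a sample look similar to a benign program can be illustrated on a toy histogram digest. The sketch below is not an attack on TLSH and says nothing about whether TLSH is vulnerable in this way. The digest, the helper `pad_towards`, and the two byte strings standing in for a malware sample and a benign program are all hypothetical, and the sketch assumes that appended bytes do not change the behavior of the sample.

```python
BUCKETS = 64  # number of histogram buckets (arbitrary choice for this toy)
WINDOW = 3    # sliding window size in bytes (arbitrary choice for this toy)


def toy_digest(data: bytes) -> tuple:
    """Toy digest (not TLSH): relative bucket frequencies of sliding windows."""
    total = max(len(data) - WINDOW + 1, 0)
    counts = [0] * BUCKETS
    for i in range(total):
        value = 0
        for byte in data[i:i + WINDOW]:
            value = value * 31 + byte  # simple polynomial hash of the window
        counts[value % BUCKETS] += 1
    return tuple(c / total if total else 0.0 for c in counts)


def toy_diff(a, b) -> float:
    """Difference score: sum of absolute bucket differences (0.0 means equal)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pad_towards(sample: bytes, target: bytes, copies: int) -> bytes:
    """Append `copies` copies of `target`; the original bytes stay untouched."""
    return sample + target * copies


malicious = bytes([0x90]) * 600  # hypothetical stand-in for a malware sample
benign = b"hello world " * 50    # hypothetical stand-in for a benign program

before = toy_diff(toy_digest(malicious), toy_digest(benign))
padded = pad_towards(malicious, benign, copies=20)
after = toy_diff(toy_digest(padded), toy_digest(benign))
print(f"score before padding: {before:.3f}, after padding: {after:.3f}")
```

The padded sample still begins with the unmodified malicious bytes, yet its score against the benign input drops sharply because the appended content dominates the histogram. The price in this toy is a file 21 times larger than the original, so a practical attack would presumably need far smaller modifications.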

The tasks of the student include:

· to propose specific attack algorithms against TLSH and, if possible, demonstrate their feasibility;

· to propose improvements to TLSH that could make it more robust against the identified attacks, and to study the trade-off between the robustness gained and the performance cost of these improvements.

