Security and Privacy in Machine Learning

Dr. Ács Gergely

Machine Learning (Artificial Intelligence) has become undisputedly popular in recent years, and the number of security-critical applications of machine learning has been steadily increasing (self-driving cars, user authentication, decision support, profiling, risk assessment, etc.). However, many privacy and security problems of machine learning remain open. Students can work on the following topics:

Own idea: If you have your own project idea related to data privacy, or to the security/privacy of machine learning, and I find it interesting, you can work on it under my guidance. You'll get +1 grade in that case.

Security of Machine Learning-Based Malware Detection: Adversarial examples are maliciously modified program code where the modification is hard to detect, yet the model's prediction on the slightly modified code differs greatly from its prediction on the unmodified code. For example, a malware developer modifies a few bytes in the malware binary, which causes the malware detector to misclassify the malware as benign. A potential task is to develop solutions to detect adversarial examples, develop robust training algorithms for malware detection, or design backdoor attacks.

Privacy and Security of Transfer Learning: In transfer learning, large companies train large (base) models (e.g., LLMs or diffusion models). Smaller companies, which don't have the resources to train large models themselves, then fine-tune these base models for more specific tasks (e.g., a chatbot for a specific topic, or object recognition for tumor classification) using their own private training data. However, a large company can poison the base model so that the fine-tuned model leaks information about the small company's private data. The task is to design/develop/evaluate/mitigate such attacks.

Privacy and Security of Federated Learning: Federated learning allows multiple parties to collaboratively train a common model by sharing only model updates instead of their training data (e.g., mobile devices train a common model for input text prediction, or hospitals train a better model for tumor classification). Even if this architecture seems more privacy-preserving at first sight, recent works have demonstrated numerous privacy and security attacks, including attacks that infer private and sensitive information from the shared updates. The task is to develop privacy and/or security attacks against federated learning (data poisoning, backdoors, reconstruction attacks), and/or to mitigate these attacks.
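
To make the idea of "sharing only model updates" concrete, here is a minimal sketch of federated averaging on a toy one-parameter regression problem. Everything in it is an illustrative assumption rather than part of any project codebase: the function names (local_update, federated_averaging, make_client), the synthetic data, and the learning rate are made up for the example. Each client trains locally on its own data, and the server only ever sees the resulting model parameter, which is exactly the value an attacker would try to exploit or manipulate.

```python
import random

def local_update(w, data, lr=0.5, epochs=5):
    """One client's local training: gradient descent on squared error for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w  # only this value leaves the client, never `data`

def federated_averaging(client_datasets, rounds=20, w0=0.0):
    """Server loop: broadcast the global model, collect local models, average them."""
    w = w0
    total = sum(len(d) for d in client_datasets)
    for _ in range(rounds):
        local_models = [local_update(w, d) for d in client_datasets]
        # Weight each client's model by the size of its local dataset.
        w = sum(len(d) * wk for d, wk in zip(client_datasets, local_models)) / total
    return w

def make_client(n, rng, true_w=3.0):
    """Hypothetical private dataset: noisy samples of y = true_w * x."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, true_w * x + rng.gauss(0, 0.01)) for x in xs]

rng = random.Random(42)
clients = [make_client(n, rng) for n in (20, 50, 30)]
w_global = federated_averaging(clients)
print(f"global model parameter: {w_global:.3f}")
```

In this sketch a poisoning attack would correspond to one client returning a manipulated value from local_update, and a reconstruction attack would try to infer a client's data from the values it returns.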

Develop a Federated Learning Framework for Medical Data: Federated learning is going to be adopted in health care, where different organizations want to train a common model for different purposes (tumor/disease classification, prediction of survival time, finding an explainable pattern of Covid on whole slide images of livers, etc.) but individually lack sufficient training data. The task is to develop a federated learning framework for such tasks.

(De-)Anonymization of Medical Data: EHR (Electronic Health Records), ECG (Electrocardiogram) and CTG (Cardiotocography) recordings, and diagnostic images (MRI, X-ray) are very sensitive datasets containing the medical records of individuals. The task is to anonymize such datasets (or some aggregates computed over such data) for data sharing with strong, preferably provable privacy guarantees that are also GDPR compliant.

Poisoning Differential Privacy: Differential Privacy is the de facto privacy model used to anonymize datasets (see the US Census data). A small amount of noise is added to the data, which hides the participation of any single individual in the dataset but not the general statistics of the population as a whole. The noise is calibrated to the maximum influence that any single record can have on the result. However, if the data comes from untrusted sources, an attacker can inject fake records into the dataset in order to increase the added noise, which eventually degrades the utility of the anonymized data. The task is to design and implement such an attack.
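
The following minimal sketch illustrates why injected records can inflate the noise. It assumes a sum query released with the Laplace mechanism, where the noise scale is the record bound divided by epsilon, and it assumes (purely for illustration) that the bound is taken directly from the largest value in the collected data. Real deployments may choose the bound differently, and deriving it from the raw data is itself not differentially private in general; the function names and the example values are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noise_scale(records, epsilon):
    # Illustrative assumption: the influence bound is derived from the data itself.
    bound = max(abs(r) for r in records)
    return bound / epsilon

def noisy_sum(records, epsilon, rng):
    """Release sum(records) with Laplace noise calibrated to the largest record."""
    return sum(records) + laplace_noise(noise_scale(records, epsilon), rng)

rng = random.Random(0)
epsilon = 0.5
honest = [34, 45, 29, 61, 50]      # made-up values, e.g. ages
poisoned = honest + [10_000]       # a single fake record injected by the attacker

honest_scale = noise_scale(honest, epsilon)
poisoned_scale = noise_scale(poisoned, epsilon)
print("noise scale without poisoning:", honest_scale)
print("noise scale with poisoning:   ", poisoned_scale)
print("one noisy release (poisoned): ", noisy_sum(poisoned, epsilon, rng))
```

Under these assumptions a single fake record raises the noise scale by more than two orders of magnitude, so the released sum says very little about the honest records.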

More information: https://www.crysys.hu/education/projects/