Security and Privacy in Machine Learning

2021-2022/I.
Dr. Ács Gergely

Machine Learning (Artificial Intelligence) has become enormously popular in recent years, and the number of security-critical applications of machine learning has been steadily increasing (self-driving cars, user authentication, decision support, profiling, risk assessment, etc.). However, machine learning still poses many open privacy and security problems. Students can work on the following topics:

Own idea: If you have your own project idea related to data privacy or to the security/privacy of machine learning, and I find it interesting, you can work on it under my guidance... You'll get +1 grade in that case. (Contact: Gergely Acs)

Adversarial examples: Adversarial examples are maliciously modified samples where the modification is visually imperceptible, yet the model's prediction on the slightly modified sample differs drastically from its prediction on the unmodified one. One way to detect such adversarial examples is to classify the activation values inside a neural network. The task is to develop such classifiers and potentially modify the training so that the activation values generated by an adversarial example become easily distinguishable from those of an ordinary (non-adversarial) sample, as sketched below. (Contact: Szilvia Lestyán)
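
A minimal sketch of the detection idea, assuming a toy PyTorch victim model: FGSM crafts the adversarial examples, the first hidden layer's activations serve as features, and a linear head learns to separate clean from adversarial inputs. All names, sizes, and hyperparameters below are illustrative assumptions, not part of the project statement.

```python
import torch
import torch.nn as nn

model = nn.Sequential(                      # toy victim classifier (assumed)
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

def fgsm(x, y, eps=0.1):
    """Fast Gradient Sign Method: a one-step L-infinity perturbation."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def activations(x):
    """Activations of the first hidden layer -- the detector's features."""
    return torch.relu(model[1](model[0](x)))

detector = nn.Linear(128, 2)                # 0 = clean, 1 = adversarial
opt = torch.optim.Adam(detector.parameters(), lr=1e-3)

def detector_step(x_clean, y):
    """One training step of the activation-based detector."""
    x_adv = fgsm(x_clean, y)
    with torch.no_grad():                   # the victim model stays frozen
        feats = torch.cat([activations(x_clean), activations(x_adv)])
    labels = torch.cat([torch.zeros(len(x_clean), dtype=torch.long),
                        torch.ones(len(x_adv), dtype=torch.long)])
    loss = nn.functional.cross_entropy(detector(feats), labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```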

Watermarking of Machine Learning models: As model extraction is easy (i.e., one can easily steal a machine learning model by using it as an oracle), model owners embed a watermark into the trained model and claim ownership upon a copyright dispute in order to discourage model extraction. Watermarks can be implemented by inserting a backdoor sample into the model that is only known to the model owner. The task is to develop/evaluate watermarking schemes in federated learning and study the interaction between privacy and watermarking. (Contact: Gergely Ács)
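
A minimal sketch of backdoor-based watermarking, under illustrative assumptions (random trigger inputs, a toy model, equal loss weighting): the owner trains on secret trigger samples with fixed random labels, and later claims ownership if a suspect model reproduces those labels far more often than chance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
trigger_x = torch.rand(32, 1, 28, 28)        # secret trigger set (assumed)
trigger_y = torch.randint(0, 10, (32,))      # arbitrary backdoor labels

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))   # toy model

def watermark_loss(x, y):
    """Normal task loss plus the backdoor term on the trigger set."""
    task = nn.functional.cross_entropy(model(x), y)
    mark = nn.functional.cross_entropy(model(trigger_x), trigger_y)
    return task + mark                       # the weighting is a design choice

def verify(suspect, threshold=0.9):
    """Ownership claim: does the suspect model know the secret labels?"""
    with torch.no_grad():
        preds = suspect(trigger_x).argmax(dim=1)
    return (preds == trigger_y).float().mean().item() >= threshold
```

In the federated setting, the open questions are who embeds the trigger (server or clients) and whether the watermark survives aggregation and privacy mechanisms.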

Record reconstruction from aggregate queries: It is (falsely) believed that aggregation preserves privacy, that is, that if one computes several aggregate queries (SUM, AVG, COUNT, etc.) on a database, it is very hard to infer the individual record values from these aggregates alone. The task is to implement attacks which check whether a given set of aggregate queries can be answered without revealing any single record on which the queries were computed. (Contact: Gergely Ács)
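
A minimal sketch of such an attack, with assumed sizes: each SUM query over a random subset of records contributes one linear equation, so enough exact answers let plain least squares recover a hidden 0/1 attribute (the classic Dinur-Nissim observation).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 400                       # records and queries (assumed sizes)
secret = rng.integers(0, 2, n)        # hidden 0/1 attribute of each record

A = rng.integers(0, 2, (m, n)).astype(float)  # query i sums rows with A[i,j]=1
b = A @ secret                        # exact aggregate answers

x, *_ = np.linalg.lstsq(A, b, rcond=None)
recovered = (x > 0.5).astype(int)     # round to the nearest 0/1 value
print("fraction of records recovered:", (recovered == secret).mean())
```

Adding sufficiently large random noise to the answers, as differentially private mechanisms do, is precisely what defeats this kind of reconstruction.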

Anonymization: Sequential data includes any dataset whose records contain a sequence of items per user (e.g., location trajectories, time-series data such as electricity consumption, browsing histories, etc.). The task is to anonymize such datasets with generative models that produce realistic synthetic sequences. (Contact: Gergely Ács)
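
A minimal sketch of the generative approach, assuming trajectories are discretized into a small vocabulary of grid cells; the autoregressive LSTM below is illustrative, and the training loop and any privacy guarantee are omitted.

```python
import torch
import torch.nn as nn

VOCAB = 64                                   # e.g., grid cells of a city map

class SeqGen(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.rnn = nn.LSTM(32, 64, batch_first=True)
        self.out = nn.Linear(64, VOCAB)

    def forward(self, seq):
        h, _ = self.rnn(self.emb(seq))
        return self.out(h)                   # next-item logits per step

def sample(model, length=20):
    """Generate one synthetic trajectory token by token."""
    seq = torch.zeros(1, 1, dtype=torch.long)   # fixed start token (assumed)
    for _ in range(length):
        logits = model(seq)[:, -1]
        nxt = torch.multinomial(logits.softmax(-1), 1)
        seq = torch.cat([seq, nxt], dim=1)
    return seq.squeeze(0)
```

After training on real trajectories, repeated calls to sample() yield a synthetic dataset whose utility and residual privacy leakage can then be evaluated.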

Fairness/privacy/robustness in Machine Learning: Privacy-preserving training is considered unfair to subgroups, as the trained model is less accurate on underrepresented groups (e.g., minorities). Similarly, compression is also believed to impact fairness negatively, as it decreases model capacity. It is an open question how privacy preservation and compression together influence robustness (i.e., resistance to poisoning, which degrades model performance by inserting outliers into the training data), and, in general, whether unfairness helps robustness or not. The task is to study the impact of privacy-preserving techniques and compression on the robustness of training and the fairness of the produced models. (Contact: Gergely Ács)
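
As a starting point for the empirical study, the fairness side can be quantified with a simple per-group accuracy gap; preds, labels, and group below are assumed NumPy arrays, and this metric is only one of several possible fairness proxies.

```python
import numpy as np

def group_accuracy_gap(preds, labels, group):
    """Largest accuracy difference across subgroups (a fairness proxy)."""
    accs = [(preds[group == g] == labels[group == g]).mean()
            for g in np.unique(group)]
    return max(accs) - min(accs)

# Compare this gap for a baseline model against its privately trained
# (e.g., DP-SGD) and compressed (e.g., pruned) counterparts, with and
# without poisoned training data, to map the three-way interaction.
```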

More information: https://www.crysys.hu/education/projects/

