
Security and Privacy in Machine Learning

2020-2021/II.
Dr. Ács Gergely

Machine Learning (Artificial Intelligence) has become undisputedly popular in recent years. The number of security-critical applications of machine learning has been steadily increasing (self-driving cars, user authentication, decision support, profiling, risk assessment, etc.). However, there are still many open privacy and security problems in machine learning. Students can work on the following topics:

    • Detecting adversarial examples: Adversarial examples are maliciously modified samples where the modification is visually imperceptible, yet the model's prediction on the slightly modified sample differs drastically from its prediction on the unmodified sample. One way to detect such adversarial examples is to classify the activation values in a neural network. The task is to develop such classifiers and potentially modify the training so that the activation values generated by an adversarial example become easily distinguishable from those of an ordinary (non-adversarial) sample; a minimal sketch of this detection idea is given after the topic list. (Contact: Szilvia Lestyán, Gergely Ács)
    • GDPR compliance test of websites: Although the GDPR was enacted in 2018, there are still many websites which do not comply with the regulation. For example, they set cookies even if the user did not provide his/her consent to do so, or they do not respect the privacy settings of the user. The task is to crawl Hungarian websites and check whether their cookie management is GDPR compliant or not; a minimal sketch of a first check appears after the topic list. (Contact: Gergely Ács)
    • Watermarking of Machine Learning models: As model extraction is easy (i.e., one can easily steal a machine learning model by using it as an oracle), model owners embed a watermark into the trained model so that they can claim ownership in a copyright dispute and thereby discourage model extraction. Watermarks can be implemented by inserting backdoor samples, known only to the model owner, into the model. The task is to develop/evaluate watermarking schemes in federated learning and to study the interaction between differential privacy and watermarking; a minimal sketch of a backdoor-style watermark is given after the topic list. (Contact: Gergely Ács)
    • Database reconstruction from aggregate queries: It is (falsely) believed that aggregation preserves privacy, that is, if one computes several aggregate queries (SUM, AVG, COUNT, etc.) on a database, then it is very hard to infer the individual record values in the table from these aggregates alone. The task is to implement attacks which check whether a set of aggregate queries can be answered without revealing any single individual record on which these queries were computed; a minimal worked example follows the topic list. (Contact: Gergely Ács)
    • Anonymization of NYC taxi data: This dataset contains the pick-up and drop-off dates/locations of many taxi trips in New York. The task is to anonymize this dataset with Variational Auto-encoders while providing (provable) Differential Privacy guarantees. (Contact: Gergely Ács, Szilvia Lestyán)
    • Anonymization of sequential data: Sequential data includes any data where the records contain a sequence of items of a user (e.g., location trajectories, time-series data such as electricity consumption, browsing history, etc.). The task is to anonymize such datasets with generative models based on Gaussian mixtures and realistic/probabilistic sequence generation. (Contact: Gergely Ács, Szilvia Lestyán)
    • Secure aggregation of sparse data: Secure aggregation is often employed in Federated Learning. It allows an untrusted aggregator to learn only the sum of model updates without revealing the individual members of this sum. However, these model updates are large, and secure aggregation is too slow in practice. The task is to speed up secure aggregation, for example by exploiting the sparsity of the model updates; a minimal sketch of the masking idea behind secure aggregation is given after the topic list. (Contact: Gergely Ács)
    • Compression of quantized model updates: There are several techniques to compress model updates in federated learning. We are looking for solutions that can be integrated with secure aggregation (i.e., linear) and support the compression of quantized update values (typically two- or three-level quantization, e.g., when only the sign of the update values is transferred for aggregation); a minimal sketch of sign quantization is given after the topic list. Another optional task is to develop lossless compression operators which are linear and hence can be integrated with secure aggregation. Indeed, most compression schemes for federated learning are lossy in nature, that is, the decompressed updates are not identical to the original updates that were compressed. (Contact: Gergely Ács)
    • Fairness and privacy: Compression techniques can improve the utility of privacy-preserving machine learning. However, privacy-preserving training is also considered unfair to subgroups, as the trained model is less accurate on underrepresented groups (e.g., minorities). Similarly, compression is also believed to negatively impact fairness as it decreases model capacity. It is an open question how privacy preservation and compression together influence robustness (i.e., resistance to poisoning, which aims at degrading model performance by inserting outliers into the training data), and, in general, whether unfairness helps robustness or not. The task is to study the impact of differential privacy and compression on the robustness of training and the fairness of the produced models. (Contact: Gergely Ács)
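
Detecting adversarial examples via activations: the sketch below only illustrates the detection idea, not a reference implementation. It assumes PyTorch and scikit-learn, a tiny untrained MLP on random data, and a single FGSM step to craft the adversarial inputs; the model, data, layer choice and detector are all placeholders for the real task.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.linear_model import LogisticRegression

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))

    x = torch.randn(200, 20)
    y = torch.randint(0, 2, (200,))

    # FGSM: one gradient-sign step on the input yields adversarial samples.
    x_pert = x.clone().requires_grad_(True)
    nn.functional.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + 0.5 * x_pert.grad.sign()).detach()

    # Forward hook recording the activations of the hidden (ReLU) layer.
    acts = {}
    model[1].register_forward_hook(lambda _m, _i, out: acts.update(h=out.detach()))

    def hidden(inputs):
        with torch.no_grad():
            model(inputs)
        return acts["h"].numpy()

    # Detector: a binary classifier separating clean from adversarial activations.
    feats = np.vstack([hidden(x), hidden(x_adv)])
    labels = np.r_[np.zeros(len(x)), np.ones(len(x_adv))]
    detector = LogisticRegression(max_iter=1000).fit(feats, labels)
    print("detector accuracy on its own training set:", detector.score(feats, labels))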
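
GDPR cookie check: a very coarse first test, assuming the requests library and a hypothetical crawl list. It only asks whether a site sets cookies in its very first response, before the visitor could possibly have consented; a full compliance test would need a real browser (e.g. Selenium) to interact with consent banners.

    import requests

    sites = ["https://example.hu"]          # hypothetical crawl list

    for url in sites:
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException as exc:
            print(url, "unreachable:", exc)
            continue
        # Cookies present here were set without any user interaction at all.
        names = [c.name for c in resp.cookies]
        print(url, "sets", len(names), "cookies before consent:", names)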
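
Backdoor-style watermarking: a minimal sketch of how a watermark set can be constructed, with toy random images and a hypothetical 4x4 corner trigger; the actual training and the verification against a suspect model are left out.

    import numpy as np

    rng = np.random.default_rng(2)
    images = rng.uniform(size=(1000, 28, 28)).astype(np.float32)
    labels = rng.integers(0, 10, size=1000)

    def add_trigger(batch):
        # The secret trigger: a bright 4x4 patch in the bottom-right corner.
        marked = batch.copy()
        marked[:, -4:, -4:] = 1.0
        return marked

    # Watermark set: triggered inputs, all relabelled to the owner's target class.
    wm_images = add_trigger(images[:50])
    wm_labels = np.full(50, 7)

    # The owner trains on the union of normal and watermark data. At dispute
    # time, a suspect model is queried on freshly triggered inputs: if it
    # predicts the target class far above chance, the watermark is present.
    train_x = np.concatenate([images, wm_images])
    train_y = np.concatenate([labels, wm_labels])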
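
Database reconstruction from aggregates: a minimal worked example with a hypothetical four-record table and four SUM queries. It only illustrates that, once the query matrix has full rank, the "aggregate-only" answers determine every individual record exactly.

    import numpy as np

    # Secret table: one numeric attribute (e.g., salary) per individual.
    secret = np.array([1200.0, 800.0, 1500.0, 950.0])

    # Each row of A selects the individuals covered by one SUM query.
    A = np.array([
        [1.0, 1.0, 0.0, 0.0],   # SUM over individuals 0 and 1
        [1.0, 0.0, 1.0, 0.0],   # SUM over individuals 0 and 2
        [1.0, 0.0, 0.0, 1.0],   # SUM over individuals 0 and 3
        [1.0, 1.0, 1.0, 1.0],   # SUM over everyone
    ])
    answers = A @ secret        # what the aggregate interface releases

    # A is invertible (rank 4), so the aggregates pin down every record.
    print(np.linalg.matrix_rank(A))          # 4
    print(np.linalg.solve(A, answers))       # [1200.  800. 1500.  950.]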
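
Secure aggregation via pairwise masking: a minimal sketch of the cancellation idea on toy updates. Real protocols (e.g., Bonawitz et al.) derive the pairwise masks from key agreement and handle dropouts; here they are simply shared random vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    n_clients, dim = 4, 6
    updates = [rng.normal(size=dim) for _ in range(n_clients)]

    # Pairwise masks: masks[(i, j)] is known only to clients i and j.
    masks = {(i, j): rng.normal(size=dim)
             for i in range(n_clients) for j in range(i + 1, n_clients)}

    def masked_update(i):
        # Client i adds masks shared with higher-indexed clients and
        # subtracts masks shared with lower-indexed clients.
        y = updates[i].copy()
        for j in range(n_clients):
            if i < j:
                y += masks[(i, j)]
            elif j < i:
                y -= masks[(j, i)]
        return y

    # The aggregator only ever sees the masked vectors ...
    received = [masked_update(i) for i in range(n_clients)]
    # ... yet their sum equals the sum of the true updates, because every
    # pairwise mask is added exactly once and subtracted exactly once.
    assert np.allclose(sum(received), sum(updates))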
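
Sign quantization of updates: a minimal sketch of what "transferring only the sign" means, with a toy update vector and a heuristic rescaling on the receiver side. It illustrates the quantization and packing step only; making such operators linear so that they compose with secure aggregation is exactly the open task above.

    import numpy as np

    update = np.random.default_rng(1).normal(size=1000).astype(np.float32)

    # Two-level (sign) quantization: keep only +1 / -1 per coordinate.
    signs = (update >= 0)                       # one boolean per weight
    compressed = np.packbits(signs)             # 8 signs per byte on the wire

    # The receiver unpacks and rescales, e.g. by the mean magnitude
    # (a common heuristic; the scale could also be sent alongside).
    decoded = np.unpackbits(compressed, count=update.size).astype(np.float32)
    decoded = (2 * decoded - 1) * np.abs(update).mean()

    print(update.nbytes, "->", compressed.nbytes, "bytes")   # 4000 -> 125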
