BME-HIT

Security and Privacy in Machine Learning

Semester: 2021-2022/II.

Supervisor: Dr. Ács Gergely

Machine learning (artificial intelligence) has become widely popular in recent years, and the number of its security-critical applications has been steadily increasing (self-driving cars, user authentication, decision support, profiling, risk assessment, etc.). However, many privacy and security problems of machine learning remain open. Students can work on the following topics:

Own idea: If you have your own project idea related to data privacy or the security/privacy of machine learning, and I find it interesting, you can work on it under my guidance. You'll get +1 grade in that case. (Contact: Gergely Ács)

Adversarial examples: Adversarial examples are maliciously modified samples where the modification is visually imperceptible, yet the model's prediction on the slightly modified sample differs greatly from its prediction on the unmodified sample. A potential task is to develop solutions that distinguish adversarial from benign samples, or to develop robust training algorithms. (Contact: Szilvia Lestyán, Gergely Ács)
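To make the idea concrete, here is a minimal sketch of one well-known way to craft such a sample, the fast gradient sign method, applied to a tiny logistic regression model. The choice of attack, the weights `w`, the bias `b`, the input `x`, and the budget `eps` are all assumptions made for illustration; they are not part of the project description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """One fast-gradient-sign step against p = sigmoid(w.x + b).

    For the cross-entropy loss, the gradient with respect to the
    input x is (p - y) * w, so the attack moves every feature by
    eps in the direction that increases the loss.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

# Hypothetical model parameters and input (illustration only).
w = np.array([2.0, -3.0, 1.0])
b = 0.1
x = np.array([0.2, -0.1, 0.1])
y = 1.0      # true label of x
eps = 0.2    # maximum change allowed per feature

x_adv = fgsm_perturb(x, y, w, b, eps)

p_clean = sigmoid(np.dot(w, x) + b)      # above 0.5: predicted class 1
p_adv = sigmoid(np.dot(w, x_adv) + b)    # below 0.5: predicted class 0
print(p_clean, p_adv)
```

No feature changes by more than `eps`, yet the predicted class flips. A detection or robust-training method would have to cope with perturbations of this kind.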

Watermarking of Machine Learning models: As model extraction is easy (i.e., one can easily steal a machine learning model by using it as an oracle), model owners embed a watermark into the trained model and claim ownership in a copyright dispute in order to discourage model extraction. Watermarks can be implemented by inserting a backdoor sample into the model that is known only to the model owner. A potential task is to develop and evaluate (compare) watermarking schemes. (Contact: Gergely Ács)

Record reconstruction from aggregate queries: It is (falsely) believed that aggregation preserves privacy, that is, if one computes several aggregation queries (SUM, AVG, COUNT, etc.) on a database then it is very hard to infer the individual record values in the table only from these aggregates. A potential task can be to implement attacks which check whether a set of aggregate queries can be answered without revealing any single individual record on which these queries were computed. (Contact: Gergely Ács)
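As a rough illustration of why aggregates can leak individual values, the sketch below treats each SUM query as a linear equation over the hidden records and solves the resulting system. The function name `reconstruct`, the secret values, and the query sets are hypothetical; a real attack would also have to handle noise, rounding, and other aggregate types.

```python
import numpy as np

def reconstruct(query_matrix, answers):
    """Try to recover individual records from SUM-query answers.

    query_matrix[i, j] is 1 if record j is included in query i, else 0.
    Returns (estimate, fully_determined). The records are uniquely
    determined exactly when the matrix has full column rank.
    """
    A = np.asarray(query_matrix, dtype=float)
    b = np.asarray(answers, dtype=float)
    estimate, _, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    return estimate, bool(rank == A.shape[1])

# Hypothetical secret column; the attacker never sees it directly.
secret = np.array([52.0, 61.0, 47.0, 70.0])

# Each row is one SUM query over a subset of the four records.
queries = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
])
answers = queries @ secret  # what the database would release

recovered, determined = reconstruct(queries, answers)
print(determined, recovered)

# With only the first two queries the system is underdetermined.
_, determined_partial = reconstruct(queries[:2], answers[:2])
print(determined_partial)
```

The rank check is the part that matters for the task described above: it indicates whether a given set of queries pins down every record, before any answer is released.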

Anonymization: Sequential data is any data in which each record contains a sequence of items belonging to a user (e.g., location trajectories, time-series data such as electricity consumption, browsing history, etc.). A potential task is to develop (GDPR-compliant) anonymization methods so that individuals are no longer re-identifiable in the dataset. (Contact: Gergely Ács)
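One simple way to gauge re-identification risk in sequential data is to count how many records have a sequence that nobody else shares. The sketch below does that; the function name `fraction_unique` and the toy trajectories are assumptions for illustration, and uniqueness is only a crude proxy, not a test of GDPR compliance.

```python
from collections import Counter

def fraction_unique(sequences):
    """Return the share of records whose full sequence is unique."""
    counts = Counter(tuple(s) for s in sequences)
    unique = sum(1 for s in sequences if counts[tuple(s)] == 1)
    return unique / len(sequences)

# Hypothetical location trajectories, one per user.
trajectories = [
    ["home", "office", "gym"],
    ["home", "office", "gym"],
    ["home", "cafe", "office"],
    ["station", "office", "home"],
]

share = fraction_unique(trajectories)
print(share)
```

Here two of the four users have a trajectory that appears only once, so anyone who knows their route can single them out. An anonymization method would aim to drive this share down while keeping the data useful.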

Fairness/privacy/robustness in Machine Learning: In machine learning, privacy-preserving training is considered unfair to subgroups, as the trained model is less accurate on underrepresented groups (e.g., minorities). It is an open question how privacy preservation and fairness together influence robustness (i.e., resistance against integrity attacks such as poisoning or adversarial examples). A potential task is to study the relationship between privacy preservation, fairness, and robustness in machine learning. (Contact: Gergely Ács)

More information: https://www.crysys.hu/education/projects/

Number of students: 7

Number of applicants: 7