Trade-offs in Machine Learning

Dr. Pejó Balázs

There are several open problems in ML. Some of these concern privacy (such as protecting the data used for training), and others concern game theory (such as setting the parameters for rational agents). Within this project, the student will become familiar with ML techniques and, depending on the topic (should you choose to accept it), with either privacy-preserving mechanisms or game-theoretical models.

  • Improving Machine Learning by Preclassification: Machine Learning (ML) algorithms generally perform better on larger datasets, so it is usually a good idea to use more data. On the other hand, not all data is created equal: could the model's accuracy be improved by carefully selecting different training data for each phase of the learning?
  • Quality Inference: In Federated Learning, multiple participants iteratively train an ML model together. Due to the enormous communication costs, not everybody participates in each update round. Is it possible to exploit this feature and infer private information about the individual datasets by accessing only the aggregated results?
  • Testing Data Inference: For every ML model, the underlying data is separated into training and testing sets. While Membership Inference aims to determine whether a particular data point was part of the training set, there are currently no known techniques to determine whether a data point was part of the test set. Is it even possible?
  • Approximating Shapley from Aggregates: The Shapley value is a reward distribution scheme among multiple entities that collectively computed something, based on their contributions. Computing it requires access to the individual inputs. Is it possible to approximate this value based on a few aggregated metrics and no individual information?
  • Accuracy vs Privacy - Optimizing the Complexity: More complex ML models perform better, mostly because they are capable of learning more. As a direct consequence, they could potentially leak more information than their simpler counterparts. In which situations does the accuracy gain outweigh the privacy leakage?
  • Privacy-Security-Accuracy Triangle: There is a clear connection between privacy and accuracy within ML. However, more privacy (e.g., noise) could decrease the robustness of the model, as it would be easier to fool it (e.g., misclassification). Could this trade-off be measured and, based on some incentives, optimized?
  • Privacy-Honesty-Accuracy Trade-off: Privacy protection has an explicit effect on accuracy (e.g., more noise, less accurate model). With more privacy comes a higher chance of cheating (since the actual contribution is more and more hidden). Hence there is an implicit effect as well. How could this relationship be modeled?
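
As background for the Shapley topic above, the exact Shapley value can be computed for a handful of participants by averaging each participant's marginal contribution over all orderings. The following is a minimal sketch, not a prescribed method for the project: the function names, the three participants, and the toy value function (square root of the pooled dataset size) are assumptions made purely for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number (the coalition's worth).
    The cost grows factorially with the number of players.
    """
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical toy game: a coalition's worth is the square root of the
# total number of records it pools (diminishing returns on more data).
DATASET_SIZES = {"A": 100, "B": 300, "C": 600}


def toy_value(coalition):
    return sum(DATASET_SIZES[p] for p in coalition) ** 0.5


values = shapley_values(["A", "B", "C"], toy_value)
print(values)
```

Note that the computation evaluates the value function on every coalition, which is exactly the individual-level access that the topic asks to avoid.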