In collaboration with Cambridge AI in Medicine, we hosted Dr Daniela Massiceti from Microsoft Research on 22nd November for a talk on the importance of good data for AI. High-quality datasets are essential for producing functional machine learning systems, and she addressed this with a special focus on the effects of dataset bias. The link to the recorded talk is available here.

Event poster for Dr Daniela Massiceti’s talk.


Large datasets are a fundamental part of current machine learning (ML) pipelines. However, biases often find their way into these datasets and can have serious consequences if left unaddressed. This talk will dive right in with an introduction to dataset bias and the types of bias that are commonly encountered in ML systems. Next, we’ll unpick some examples of what can happen when bias is left unaddressed, and finally we’ll look at an overview of approaches that can be used to tackle it. Join in to learn more about dataset bias and building robust and equitable ML systems.


Daniela is a machine learning (ML) researcher at Microsoft Research (MSR) Cambridge (UK), where she works on ML systems that learn and evolve with human input (“human-in-the-loop”). Her main research directions lie, firstly, in making models robust to the real-world training data they are given, and, secondly, in making models more human-explainable. Advances in these directions will enable completely personalised tools in many spheres, from assistive tools for people who are blind or have low vision to AI-assisted diagnostic tools for doctors. Prior to joining MSR, she did a PhD in computer vision at the University of Oxford, a Master's in Neuroscience, also at Oxford, and a Bachelor's in Electrical and Computer Engineering at the University of Cape Town. She is passionate about diversity and inclusion in machine learning and is an organiser of the Deep Learning Indaba.