How my work has confronted and grappled with the intersection of technology and diversity

Mar 5, 2023

A talk delivered at the Festival of Difference on Monday February 27, 2023, University of Westminster.

In genetics, the lack of diversity is a notorious problem. In my research field, the conversation started with a seminal paper by Alice Popejoy at Stanford, who highlighted that across all of the genome-wide association studies performed to date, about 80% of the data came from people of Northern European descent.

Why is this important for genomic research? Because the promise of precision medicine, the ability to diagnose and treat disease in a way that is personalised, predictive, preventative and participatory, hinges on the genetic makeup of the individual.

Take, for instance, the CYP2D6 gene. It encodes an enzyme that metabolises about 20% of all prescribed drugs, codeine among them. Different ancestries have been found to carry different variants of this gene, and these variants affect patients’ ability to metabolise drugs. Yet the algorithms designed to predict the right dose for a patient who needs codeine have been trained on European cohorts. As a result, patients of African ancestry, for instance, who carry different variants and who are significantly more affected by sickle cell anaemia, may receive a dose that is too high or too low, and so get insufficient pain relief or suffer side effects.

Pharmacology is not the only area affected by the lack of diversity in the datasets on which our knowledge is based. Another example is the genomic prediction of complex diseases. By complex diseases I mean those for which a patient’s susceptibility depends on many genes. The diseases that most people die of in industrialised nations are complex diseases: type 2 diabetes, coronary artery disease, inflammatory bowel disease, atrial fibrillation, breast cancer. All of these diseases have a significant genetic susceptibility component. In other words, your genetic makeup can be as much a risk factor for heart disease as high cholesterol or high blood pressure. Yet our models for these important diseases are all trained on data from wealthy white Northern Europeans. This means that our ability to identify high-risk individuals is compromised for underrepresented populations such as Africans or South Asians. Not only are these populations unable to benefit from scientific advances in genomics; their lack of representation in genetic models also puts them at greater risk of adverse reactions to the drugs they are given as treatment.

How are we bridging this gap? First and foremost, there has been a tremendous move towards making datasets more inclusive of diverse populations, and the motivation behind it concerns us all. It’s not just that more diverse datasets will give us better-trained disease and dose prediction algorithms for susceptible individuals. All of us will benefit, because the greater the diversity of our datasets, the better we can understand how genetics shapes traits and their inheritance. In other words, recording all of the diversity that exists in humankind is invaluable for scientific advancement: it enables us to understand genetics better, which benefits represented and underrepresented populations alike. Moreover, mapping diversity helps us understand our past and our future. A continent like Africa, which has more genetic diversity than the rest of the world combined, offers the opportunity to understand how traits evolved in the past and how they could evolve in the future.