Health Data Science

Theme Leads: Alex Lewin (LSHTM) and Julia Critchley (SGUL)

Data Science at the interface of human health and biology (including modelling, data analytics, artificial intelligence and machine learning) is a strategic priority for the MRC, as is the cross-cutting theme of quantitative skills.

Unprecedented volumes of data are being generated through patient records, population studies, clinical trials, imaging and large-scale biological studies such as genomics. In addition, the possibility of gaining further insights into lifelong health and wellbeing by joining such data to other sources of information, such as administrative data and data from wearables, is opening up new opportunities for research and innovation.

To realise these benefits, complementary methods, tools, infrastructure, skilled people, and regulatory and governance frameworks are needed. Trust also needs to be built with the public, patients and practitioners, so that their interests and privacy are protected while realising the benefits of working together.

Areas of activity (relevant to quantitative skills):

  • Advanced data science, including artificial intelligence: supporting advanced data science and providing the environment to develop and implement new computational technologies and tools in the UK.
  • Methodology research: funding the development of new methodologies needed to inform research practice, policy and healthcare to maximise benefits for researchers, patients and the general population, ensuring biomedical and health research and policy are built on the best possible evidence.
  • Data science capacity and skills: ensuring the UK has the skilled people it needs, working in the best way to support UK biomedical and health-related data science-intensive research and innovation.

LSHTM is a world leader in Health Data Science, with expertise in the creation, linkage and analysis of a wide range of data sources, encompassing data on environmental and social factors as well as ‘omics data, both human and pathogen. We are uniquely placed to translate health data science to low- and middle-income country (LMIC) settings, where this field is rapidly developing.

SGUL also has long-standing expertise in analysing data from very large linked datasets, including primary care databases, ONS mortality and hospitalisation data, and European data on congenital abnormalities and medication utilisation in pregnancy.

In order for a doctoral project to meet the criteria of developing “quantitative skills”, the following should apply.

The project must involve the applicant developing specific quantitative skills that are required for data science-intensive research, in a biomedical, clinical, epidemiological or public health-related field. This should involve skills that a student would not normally learn during a typical MSc in Public Health, Epidemiology, Health Data Science or Medical Statistics (or similar).

The quantitative aspect of the project should be a large component of the overall project. Projects that simply involve analysing data using a standard approach (e.g. standard logistic regression analysis of a case-control study, or quantitative analysis of laboratory data) will not qualify.

The quantitative skills to be developed could include, but are not restricted to:

  • Application of advanced quantitative methods as a significant component of the PhD, including (but not restricted to):
    • machine learning and artificial intelligence
    • mathematical modelling (e.g. modelling of infectious disease, agent-based modelling)
    • advanced statistical methods (e.g. latent class analysis, factor analysis, mediation analysis, marginal structural modelling)
    • advanced epidemiological methods (e.g. causal inference methods, triangulation, quantitative bias analysis, GWAS, Mendelian randomisation)
  • Development or evaluation of quantitative methodology
  • Applied data analysis in contexts which require specific analytic methodologies and considerable context-specific analytic expertise. These could include, but are not restricted to:
    • Large scale health data (e.g. electronic health record data, audit data, registry data, claims data, inpatient hospital data)
    • Omics data and other high dimensional data
    • Complex data from wearables, mobile phone apps or social media