Cambridge NERC Doctoral Training Partnerships

Graduate Research Opportunities
 

Lead supervisor: Henry Moss, DAMTP

Co-supervisor: Colm Caulfield, DAMTP

Brief summary: 
The “Automatic Bayesian Climate Scientist” project aims to develop novel interpretable Machine Learning methods that empower climate scientists to uncover deep insights into intricate physical systems from measurement data.
Importance of the area of research concerned: 
There is a significant disconnect between modern ML methods and the needs of climate scientists seeking deeper scientific understanding from observational data. Specifically, the challenge lies in distilling the fundamental equations governing physical phenomena. This goal is directly at odds with the black-box nature of recent ML advances, which prioritise raw predictive performance over interpretability. To realise the potential of ML within climate science, we therefore need to shift our focus to a fundamentally different methodology, namely equation discovery. Equation discovery produces a human-readable formula that domain experts can readily inspect, understand, and verify, setting it apart from even the most explainable black-box methods. However, although equation discovery has been successful in several impactful climate applications (see references below), existing methods are computationally expensive, lack robustness, use data inefficiently, and require highly specialised domain knowledge to use effectively.
Project summary: 
This project aims to develop a reliable and user-friendly approach for equation discovery. Using recent advances in physics-informed probabilistic ML, we will recast equation discovery as a Bayesian inference problem. Central to the project is the identification of an apt methodology for characterising the complexity of a candidate differential equation: existing methods rely on statistical complexity metrics that fail to properly capture the complexity arising from differential terms. The developed methodology will be applied to empirical data from multiple domains within climate science, including atmospheric and fluid dynamics. A particular focus will be on improving subgrid-scale parameterisations of key physical processes within climate models, such as cloud dynamics in the atmosphere, heat transport in the oceans, and ice formation/melting in the polar regions.
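For readers new to the idea, the following toy sketch illustrates what treating equation discovery as Bayesian inference can look like in its simplest form. It is not the project's methodology. Everything in it is an assumption made for illustration: the synthetic logistic-growth data, the known noise level, the zero-mean Gaussian prior on coefficients, the four-term candidate library, and the helper name log_evidence. Each subset of candidate terms is scored by its marginal likelihood (evidence), which trades goodness of fit against the number of terms.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Synthetic "measurements" of logistic growth: dx/dt = r*x - (r/K)*x^2.
# The system, parameter values and noise level are illustrative assumptions.
r, K, noise_std = 1.5, 1.0, 0.05
x = np.linspace(0.1, 2.0, 80)
dxdt = r * x * (1.0 - x / K) + noise_std * rng.normal(size=x.size)

# Hypothetical library of candidate right-hand-side terms.
library = {"1": np.ones_like(x), "x": x, "x^2": x**2, "x^3": x**3}


def log_evidence(features, y, noise_std, prior_std):
    """Log marginal likelihood of a Bayesian linear model.

    With coefficients w ~ N(0, prior_std^2 I) and Gaussian noise, the data
    satisfy y ~ N(0, noise_std^2 I + prior_std^2 * features @ features.T).
    """
    n = y.size
    cov = noise_std**2 * np.eye(n) + prior_std**2 * features @ features.T
    chol = np.linalg.cholesky(cov)
    whitened = np.linalg.solve(chol, y)  # whitened @ whitened = y^T cov^{-1} y
    log_det_half = np.log(np.diag(chol)).sum()  # equals 0.5 * log det(cov)
    return -0.5 * whitened @ whitened - log_det_half - 0.5 * n * np.log(2 * np.pi)


# Score every non-empty subset of the library as a candidate equation.
names = list(library)
scores = {}
for k in range(1, len(names) + 1):
    for subset in itertools.combinations(names, k):
        phi = np.column_stack([library[name] for name in subset])
        scores[subset] = log_evidence(phi, dxdt, noise_std, prior_std=10.0)

best = max(scores, key=scores.get)
print("Highest-evidence candidate terms:", best)
```

The evidence used here penalises only the number of algebraic terms. It does not address the complexity arising from differential terms, which is the open question the project targets.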
What will the student do?: 
The successful applicant will develop new probabilistic ML methodology for discovering the governing equations of physical systems from noisy and sparse measurement data. The student will become an expert in Bayesian ML (e.g. Gaussian processes) and gain practical knowledge of fluid and atmospheric dynamics while applying their methodology in tandem with climate scientists. The exact climate systems used to test new methodology during the project will be adapted to the student’s interests and background. A key part of the project will be building open-source tooling so that the wider climate community can use the project’s outcomes.
References: 
Zanna, L., & Bolton, T. (2020). Data-driven equation discovery of ocean mesoscale closures. Geophysical Research Letters.
Ross, A., Li, Z., Perezhogin, P., Fernandez‐Granda, C., & Zanna, L. (2023). Benchmarking of machine learning ocean subgrid parameterizations in an idealized model. Journal of Advances in Modeling Earth Systems.
Grundner, A., Beucler, T., Gentine, P., & Eyring, V. (2023). Data-Driven Equation Discovery of a Cloud Cover Parameterization. arXiv preprint arXiv:2304.08063.
Applying
You can find out about applying for this project on the Department of Applied Mathematics and Theoretical Physics (DAMTP) page.