Artificial Intelligence for Life in Space (AI4LS) Working Group
Overview
Computational analysis of biological data spans a variety of approaches, including statistical methods, bioinformatics, mathematical modeling, and artificial intelligence (AI) and machine learning (ML). Each approach has strengths and weaknesses and must be chosen according to the biological question and the data type(s). ML approaches have shown promise for deriving meaningful patterns from complex, heterogeneous, multi-modal biological datasets. Broadly, an ML method fits a mathematical model to patterns in a dataset, iteratively improving the model by minimizing its prediction error on training data, and is then judged by how well it predicts held-out (unseen) data. This approach is often less constrained by specific statistical assumptions or data distributions, and self-supervised ML methods can learn intrinsic patterns from data without human labeling. The space biology field studies how living systems from Earth are affected by exposure to the space environment. Spaceflight presents five main hazards for biological systems: radiation, microgravity, distance from Earth, confinement, and hostile/closed environments. The AI4LS working group aims to build ML models that predict the physiological effects of these hazards, using multi-modal biological and environmental data from the NASA Open Science Data Repository (OSDR).
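The basic ML loop (iteratively reduce prediction error on training data, then measure error on data the model never saw) can be illustrated with a minimal Python sketch. It assumes NumPy, a hypothetical synthetic linear dataset, and plain gradient descent on a linear model; the helper names `split_train_test`, `fit_linear_gd` and `mse` are illustrative only and are not taken from any AI4LS codebase.

```python
import numpy as np


def split_train_test(X, y, test_frac=0.25, seed=0):
    """Randomly hold out a fraction of samples that the model never trains on."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]


def fit_linear_gd(X, y, lr=0.1, epochs=500):
    """Fit y ~ X @ w + b by gradient descent on the training mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = X @ w + b - y                  # current prediction errors
        w -= lr * (2.0 / n) * (X.T @ err)    # gradient step for the weights
        b -= lr * (2.0 / n) * err.sum()      # gradient step for the intercept
    return w, b


def mse(X, y, w, b):
    """Mean squared prediction error."""
    return float(np.mean((X @ w + b - y) ** 2))


# Hypothetical synthetic data: a linear signal plus small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.1 * rng.normal(size=400)

X_tr, X_te, y_tr, y_te = split_train_test(X, y)
w, b = fit_linear_gd(X_tr, y_tr)
print("train MSE:", mse(X_tr, y_tr, w, b), "held-out MSE:", mse(X_te, y_te, w, b))
```

Real biological models are far more flexible than a linear fit, but the division of labor is the same: the optimizer only ever sees the training split, and the held-out error is the honest estimate of how well the model generalizes.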
Focus Areas
Causal Inference
Biological investigations often aim to identify causal relationships between a condition and an effect, but most analysis techniques can only identify correlation, and correlation alone does not establish causation. By combining invariance theory with ML methodology, we aim to elucidate causal relationships in complex biological data.
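This overview does not describe the working group's specific method. To make the idea of invariance concrete, here is a simplified sketch of invariant causal prediction (ICP), the invariance-based approach of Peters, Bühlmann and Meinshausen (2016), swapped in purely for illustration. The intuition: if a set of variables contains all direct causes of an outcome, regressing the outcome on that set leaves residuals whose distribution does not change across environments (hypothetically, ground-control versus flight groups). ICP tests every predictor subset and reports the variables shared by all accepted subsets. Assumptions: linear relationships, samples labeled by environment, and, as a simplification of the published tests, a one-way ANOVA (equal means) plus a Levene test (equal variances) with a Bonferroni correction. The toy system in `simulate` and all function names are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats


def pooled_residuals(X, y):
    """Residuals of an ordinary least-squares fit (with intercept), pooled over environments."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


def is_invariant(X, y, env, subset, alpha):
    """Accept `subset` unless residual means or variances differ across environments."""
    res = pooled_residuals(X[:, list(subset)], y)
    groups = [res[env == e] for e in np.unique(env)]
    p_mean = stats.f_oneway(*groups).pvalue  # H0: equal residual means
    p_var = stats.levene(*groups).pvalue     # H0: equal residual variances
    return min(p_mean, p_var) > alpha / 2    # Bonferroni correction for two tests


def invariant_causal_prediction(X, y, env, alpha=0.05):
    """Intersect all predictor subsets whose residual distribution looks invariant."""
    d = X.shape[1]
    accepted = [
        set(s)
        for k in range(d + 1)
        for s in itertools.combinations(range(d), k)  # exhaustive: only for small d
        if is_invariant(X, y, env, s, alpha)
    ]
    return set.intersection(*accepted) if accepted else set()


def simulate(n_per_env=500, seed=0):
    """Hypothetical toy system X0 -> Y -> X1; environment 1 shifts the mean of X0."""
    rng = np.random.default_rng(seed)
    env = np.repeat([0, 1], n_per_env)
    x0 = rng.normal(size=2 * n_per_env) + 3.0 * env
    y = 2.0 * x0 + rng.normal(size=2 * n_per_env)
    x1 = y + rng.normal(size=2 * n_per_env)  # a downstream effect of Y, not a cause
    return np.column_stack([x0, x1]), y, env


X, y, env = simulate()
print(invariant_causal_prediction(X, y, env))
```

Both X0 and X1 correlate strongly with Y, yet only X0 is a cause. Regressing Y on X1 alone gives residuals whose mean shifts between environments, so that subset is rejected, while any subset containing X0 stays invariant; the intersection therefore points to X0. Exhaustive subset search grows exponentially, so practical tools need screening or approximation for high-dimensional omics data.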
Foundation Models
Transfer learning is a machine learning technique in which a model is first trained on a large, broad dataset to encode underlying features and relationships, and then refined (fine-tuned) on a smaller dataset from a related problem space. This is relevant to space biology research, where datasets typically have small sample sizes and each problem is restricted to a narrow data distribution. We are developing a “model zoo” of foundation models pretrained on larger biomedical datasets that can then be fine-tuned to address space biology research questions.
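A minimal sketch of the pretrain-then-refine pattern, assuming NumPy and a linear model as a stand-in for a real foundation model. Pretraining is a ridge fit on a large hypothetical source task; fine-tuning refits on a small target task while penalizing distance from the pretrained weights (an L2-SP style penalty, one common choice and not necessarily what the model zoo uses). The functions `pretrain`, `fine_tune` and `make_tasks` and the synthetic tasks are all hypothetical.

```python
import numpy as np


def pretrain(X_src, y_src, ridge=1e-3):
    """Stand-in for pretraining: fit a ridge-regularized linear model on the large source set."""
    d = X_src.shape[1]
    return np.linalg.solve(X_src.T @ X_src + ridge * np.eye(d), X_src.T @ y_src)


def fine_tune(X_tgt, y_tgt, w_init, lam=1.0):
    """Refine weights on a small target set while shrinking them toward w_init.

    Minimizes ||y - X w||^2 + lam * ||w - w_init||^2; the closed-form solution is
    w = (X^T X + lam I)^{-1} (X^T y + lam w_init). With w_init = 0 this reduces to
    ordinary ridge regression, i.e. training from scratch.
    """
    d = X_tgt.shape[1]
    A = X_tgt.T @ X_tgt + lam * np.eye(d)
    return np.linalg.solve(A, X_tgt.T @ y_tgt + lam * w_init)


def make_tasks(d=30, n_src=5000, n_tgt=10, seed=0):
    """Hypothetical related tasks: target weights are a small perturbation of source weights."""
    rng = np.random.default_rng(seed)
    w_src = rng.normal(size=d)
    w_tgt = w_src + 0.1 * rng.normal(size=d)
    X_src, X_tgt = rng.normal(size=(n_src, d)), rng.normal(size=(n_tgt, d))
    y_src = X_src @ w_src + 0.1 * rng.normal(size=n_src)
    y_tgt = X_tgt @ w_tgt + 0.1 * rng.normal(size=n_tgt)
    return X_src, y_src, X_tgt, y_tgt, w_tgt


X_src, y_src, X_tgt, y_tgt, w_true = make_tasks()
w_pre = pretrain(X_src, y_src)
w_transfer = fine_tune(X_tgt, y_tgt, w_init=w_pre)
w_scratch = fine_tune(X_tgt, y_tgt, w_init=np.zeros_like(w_pre))
print("weight error, fine-tuned from pretrained:", np.linalg.norm(w_transfer - w_true))
print("weight error, trained from scratch:      ", np.linalg.norm(w_scratch - w_true))
```

With 10 target samples and 30 weights, the target data alone cannot pin down 20 of the weight directions: training from scratch leaves those components at zero, whereas fine-tuning inherits them from the pretrained weights. That is the small-sample argument for foundation models, in miniature.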
“Self-Driving” Labs
Space biology experiments are expensive and require many hours of hands-on astronaut time. Terrestrial biological investigations increasingly take advantage of automated, cloud-based laboratory technology. We aim to support the development of automated, “self-driving” lab technology that is hardened for spaceflight. This technology will also enable the collection of many more data points in a consistent and reproducible manner.
Contact
Sylvain Costes (sylvain.v.costes@nasa.gov), Lauren Sanders (lauren.m.sanders@nasa.gov)
AI/ML Analysis Working Group: https://osdr.nasa.gov/bio/awg/about.html
Recommended Reading
Sanders L.M. et al. “Biological research and self-driving labs in deep space supported by artificial intelligence.” Nature Machine Intelligence 5, 208–219 (2023). DOI: 10.1038/s42256-023-00618-4
Scott R.T. et al. “Biomonitoring and precision health in deep space supported by artificial intelligence.” Nature Machine Intelligence 5, 196–207 (2023). DOI: 10.1038/s42256-023-00617-5
Afshinnekoo E. et al. “Fundamental Biological Features of Spaceflight: Advancing the Field to Enable Deep-Space Exploration.” Cell 183(5), 1162–1184 (2020). DOI: 10.1016/j.cell.2020.10.050
Budd S. et al. “Prototyping CRISP: A Causal Relation and Inference Search Platform applied to Colorectal Cancer Data.” IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech), 517–521 (2021). DOI: 10.1109/LifeTech52111.2021.9391819