Napoleone Ferrara and Giovanni Paternostro have spoken with Andrea Califano. Andrea is the President of the Chan Zuckerberg New York Biohub. He is also Professor of Chemical and Systems Biology at Columbia University. From 2013 to 2023, he was the Founding Chair of the Department of Systems Biology at Columbia University and Director of the JP Sulzberger Columbia Genome Center. Among the numerous awards he has received are the 2015 and 2022 NCI Outstanding Investigator Awards, the 2019 Ruth Leff Award in Pancreatic Cancer, and the 2023 NCI Alfred G. Knudson Award in Cancer Genetics.
Dear Andrea,
What could be achieved if there were a public or nonprofit AI effort with the same scale and level of funding as the current large private efforts? What would be the benefits for society?
Andrea:
Machine learning coupled with experimental approaches has already transformed our ability to predict cell behavior and cellular responses to perturbations. For example, in several of our papers (see (1), for instance), we show that by interrogating regulatory and signaling networks reverse-engineered from large-scale data, we can discover novel mechanisms that can be leveraged to reprogram cell state effectively.
AI, when coupled with appropriate experimental data generation, may help address some key challenges that current computational approaches handle poorly. For instance, the logic of the cell is both causal and intrinsically very loopy, and elucidating causality and loopiness computationally has so far been very challenging. When coupled with large-scale perturbational profiles, AI promises to deepen our understanding of these complex properties of cellular networks, both in signal transduction and in transcriptional regulation.
The Chan Zuckerberg Initiative (CZI) and Biohubs (CZB) have access to what is probably the largest GPU farm available to nonprofit biomedical research. The different groups within CZB and CZI are working hard to leverage this platform and develop novel AI-based methodologies to tackle key biological questions.
At the same time, I'm a little bit skeptical about large language models (LLMs), because they are not well suited to modeling cell behavior. They work really well for predicting protein folding because the dependency model is relatively local, but when you look at the complex graph structure that underlies cell regulation, LLMs don't work very well, because they cannot deal with the loopy or highly non-local dependencies that are intrinsic to cellular networks. We are pioneering several different approaches that couple graph-based dependency models with AI.
Also, it is important to always start from the specific biological problem one is trying to solve rather than developing AI models in a vacuum. At the CZ Biohub in New York, we are trying to reprogram the immune system to do things that natural evolution has not trained it to do: for instance, to detect signals from cryptic diseases, such as Parkinson's or Alzheimer's, or to deplete the immunoevasive subpopulations that are often present in the tumor microenvironment. Among the latter, some of our initial target populations include macrophages, regulatory T cells, and fibroblasts. We are generating the data using very large-scale Perturb-seq approaches, in which we perform CRISPR-mediated silencing of every regulator, including transcription factors, co-factors, and signaling proteins. That's about 6,000 proteins in total, but if you contextualize it to a specific tissue, that number goes down to around 4,000. We are literally able to silence all of them in millions of single cells and then measure the cells' response to each silenced gene by single-cell RNA sequencing. From these large-scale datasets, comprising the genome-wide response of millions of cells to single-gene perturbations, we are able to train AI to do many interesting things.
We have already done the same thing with simpler network-based methods that have worked incredibly well. This was published in Cancer Cell last year (1). We reprogrammed regulatory T cells (Tregs) so that they could no longer become tumor-resident, and using network-based biology we discovered what we call the master regulators of this process. We then validated them using a CHIME assay. You can't perform these kinds of complex assays by silencing all 20,000 genes, but they become feasible when you restrict the number of potential candidates using these methodologies. For this study, we only had to test 17 candidate master regulator genes, and eight of them were experimentally validated. When the top one (TRPS1) was silenced in all the cells, we achieved spontaneous remission of more than half of the tumors. In this case, we could also identify a drug that effectively phenocopies the inactivation of all 17 candidate master regulator genes. This drug is gemcitabine, one of the typical cytotoxic chemotherapy drugs used in oncology, except that it was shown to elicit the desired effect only at one-tenth of the currently used clinical concentration; at clinical doses it would also kill helper CD4 and effector CD8 T cells together with the Tregs.
We have several researchers in the lab contributing to the broader CZI effort to create an AI-based virtual cell that models cell behavior, which could have a large impact on biology and medicine.
We are encouraging researchers at different career stages to share ideas about complex scientific problems that could benefit from a large-scale AI effort. We found that motivation and recognition could be provided if you and other well-known scientists were willing to talk to the people who suggest the best ideas. You would be the judge and decide whether any idea deserves your attention. Any scientist selected might receive advice but could also become a potential collaborator. Many ideas will be produced, and society will take notice. Would you be willing to talk to any of these scientists?
Andrea:
Yes, I would be very happy to do it. We have always been very open to collaborations, and I am a big supporter of open science. At this time, I have more than 60 ongoing collaborations, mostly with people outside of Columbia. I am excited by problems that address relevant biological questions, so if a problem is interesting, I tend to say yes, even when I should say no because I might not have the bandwidth. Collaboration is common in systems biology, because, apart from designing a specific algorithm, no problem in systems biology can be solved by an individual lab, so most of what we do is collaborative. The same is true for AI in the biological sciences, where both computational and experimental expertise are important; some researchers become experts in both and are really helpful in promoting communication.
As a former physicist, I am not fond of attaching special importance to the order of authors, separating the first and last authors from the others. For collaborative work, it may be much better to list authors alphabetically and instead explain what each person contributed. For instance, in a truly collaborative study, there should be no difference between two or more co-first authors or two or more co-corresponding authors. But recruitment committees may still view co-first and co-last authorships with some skepticism.
An important way in which we communicate in science is, of course, the scientific literature. LLMs like ChatGPT are still not very good at mining the literature and providing useful answers to standard questions.
REFERENCES
1 - Obradovic, A., Ager, C., Turunen, M., Nirschl, T., Khosravi-Maharlooei, M., Iuga, A., ... & Califano, A. (2023). Systematic elucidation and pharmacological targeting of tumor-infiltrating regulatory T cell master regulators. Cancer Cell, 41(5), 933-949.