Pierre Baldi is a Distinguished Professor in the Department of Computer Science, Director of the AI in Science Institute, and Associate Director of the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. He is one of the leading researchers in deep learning, and he has applied these techniques to data from many areas of science, in the life sciences but also in physics and chemistry. His research focuses on understanding natural and artificial intelligence (1).
Dear Pierre,
What could be achieved if there was a public or nonprofit AI effort with the same scale and level of funding as the current large private efforts? What would be the benefits for society?
Pierre: Today scientists in universities cannot train and study the most advanced AI systems. I am thinking about systems like GPT-4 and what will be GPT-5; those systems require tens of thousands, if not hundreds of thousands, of GPUs, and therefore very large budgets, and no university can come even close to that.
We cannot train these systems. We can use them once they are published, and we can do a few things on top of them, but we cannot study AI at its cutting edge, and this is bad for science: bad for the science of AI, and bad for AI safety. Imagine that in one year or two we have an even more powerful GPT-5, and someday AGI (Artificial General Intelligence), and imagine that this happens in a for-profit organization. Even if they are well intentioned, it is a thought that makes me uncomfortable. AI is not just a technology; it involves the fundamental understanding of intelligence.
If we think, as a society, collectively, that these are serious problems, it seems to me that there is only one way to solve them: we must create the largest computing and data center in the world, and then organize many AI scientists around that center. You can calculate that this is very expensive; it amounts to many billions of dollars. Training GPT-4 cost about 100 million dollars, and the numbers go up by a factor of 10 every generation, so imagine GPT-5 at a billion dollars, and universities are significantly behind the curve. None of this has really started yet; the existing efforts by the NSF and other entities are too small by at least one order of magnitude. It may take one or two years for politicians to agree, and then you must sustain the project over several decades or more, for the foreseeable future. You also need hundreds of permanent staff in such a center. In short, you're talking about a very significant budget.
You need to sustain budgets of many billions of dollars, maybe hundreds of billions of dollars. Only large economies can do that, primarily with taxpayer money, and so you're looking at the US, you're looking at China, you're looking at Europe, possibly with additional contributions from countries like Japan, Australia, and Canada. That is what I propose: one CERN-AI kind of effort, or maybe two, one in Europe and one in the US, with a very large budget, with permanent staff to run these clusters, and then thousands of academic labs, as at CERN, visiting the center, some working at the center for a period of a year or more, and everybody contributing ideas and training the most advanced kinds of AI.
The funding could come from governments and government agencies, but also from foundations, and possibly from companies. Once this big AI system is built, this CERN-like center, it could have commercial spin-offs, and the spin-offs could give money back to CERN-AI. You could have different funding models, and you probably need that, because so much money is required that you need different stakeholders to put money in.
In the coming decades, as AI advances, it may help us solve, or at least make progress on, some of the major problems facing science. Two major ones come to my mind first. The first concerns the foundations of physics. General relativity and quantum mechanics are the foundation of everything we know in physics, and these two theories do not go together, for instance at the very small scale (the Planck scale), so there is a sense that there ought to be a better unified theory that somehow reconciles general relativity and quantum mechanics. There are all kinds of hypotheses, but so far nobody has found a good solution to this problem. Maybe AI could. Once AI becomes very good at mathematics, that is a very interesting problem for it. Another challenge is the famous "P versus NP" problem, which is maybe the most important and fundamental open problem in computer science and even in mathematics. The short version of this problem is: can we show that there is no computer algorithm that solves the traveling salesman problem exactly in polynomial time? This problem has been known for half a century, and very little progress has been made towards its solution.
Finally, there are many other scientific problems where AI could support advances. For example, climate modeling, drug discovery, materials discovery, vaccine design, finding new chemicals for a variety of applications, various forms of precision medicine, and so forth.
We are encouraging researchers at different career stages to share ideas about complex science problems that could benefit from a large-scale AI effort. We believe that motivation and recognition could be provided if you and other well-known scientists were willing to talk to the people who suggest the best ideas. You would be the judge and decide whether an idea deserves your attention. Any scientist selected might receive advice but could also become a collaborator. Many ideas will be produced, and society will take notice. Would you be willing to talk to any of these scientists?
Pierre: Yes. I am all for collaborations; a lot of the things I do are based on collaborations. What you're saying resonates with me in several different ways.
If there were an effort to create the sort of CERN-AI that I am proposing, one big problem would be the decision process within such an organization. You will have thousands of scientists affiliated with this large computing and data center. There is going to be a divergence of opinions, and it would be good to have this grassroots, bottom-up input of many different ideas. But to get things done, you also need some top-down process to decide which experiments should be carried out, because training cutting-edge AI requires a lot of organization.
There is also the issue of data: how are we going to get the data to train a public CERN-AI effort? Large companies like Google, Microsoft, Apple, Meta, and Tesla have access to a lot of proprietary data. We can crawl the web like the companies do, and we have arXiv and all the scientific literature, but the companies have an edge there. So that is an interesting issue that would need to be carefully addressed.
The other aspect this discussion resonates with is collective intelligence. What is interesting here is that AI can also engage in various forms of collective intelligence, as we do. In fact, it can do it much better than us. Once you have a GPT-5, it is very easy to have 1,000 of them; you just copy the weights. This agentic side of AI is potentially very important: you have a lot of AI agents, they are looking at different types of data in different parts of the world, and then they are sharing and collaborating. In a way this is similar to what humans can do, but on a potentially much larger and faster scale. Humans cannot share synaptic weights.
REFERENCES
1. Caltech Heritage Project, Interview with Pierre Baldi. https://heritageproject.caltech.edu/interviews-updates/pierre-baldi
2. Pierre Baldi, "The Need for New 'AI Telescopes'". https://www.linkedin.com/feed/update/urn:li:activity:7203564233395982336/