Summary | cellcomm.org

The timeline traces the historical milestones that led to AlphaFold, a landmark achievement in protein structure prediction powered by artificial intelligence (AI). It highlights key scientific, methodological, and cultural developments spanning over six decades, beginning with the first protein structures solved by Kendrew and Perutz (1958-1960).

Significant early milestones included the establishment of protein sequence and structure repositories, particularly the Protein Data Bank (PDB), which was initiated in the early 1970s through an effort involving both senior and junior scientists. The PDB grew out of these grassroots efforts gradually, over several decades, amid debates about data-sharing practices. Open data-sharing policies were adopted as a result of community letters and petitions, initiatives prompted by the PDB leaders, and decisions by scientific societies, journals (such as Nature and Science), and funders (such as HHMI and NIH).

Bioinformatics methods and computational tools evolved considerably, from sequence alignment algorithms (1970s-80s) to a broader range of bioinformatics tools in the following decades. These tools adopted open-source software development practices and significantly enhanced sequence analysis capabilities.

The Critical Assessment of Protein Structure Prediction (CASP), launched in 1994, benchmarked computational predictions. After initial improvements, the prediction metrics showed no progress for more than 10 years, until DeepMind’s AlphaFold achieved breakthroughs in 2018 and 2020.

The advent of GPUs (2008), large datasets like ImageNet (2009) and algorithmic innovations like transformers catalyzed advancements in AI.

DeepMind, a company co-founded by Hassabis in 2010 with venture capital support and acquired by Google in 2014, leveraged computational resources and data advancements, notably UniProt's extensive sequence datasets and PDB's comprehensive structure archives. Multiple sequence alignments provided evolutionary information. AlphaFold2 (2020) employed transformer-based neural networks, significantly outperformed prior methods in CASP14, and largely solved structure prediction for most single-chain globular proteins. Substantial challenges remained for disordered regions, multi-protein complexes, and dynamic conformational landscapes.

This historical perspective underscores crucial contributions by numerous scientists and institutions toward open data sharing, algorithmic innovation, and interdisciplinary collaboration. These advances led to AlphaFold and to its open-source academic derivatives, including RoseTTAFold, OpenFold, and ColabFold.

Current efforts focus on extending AI applications to more complex cellular and biomolecular interactions, including the virtual cell, which represent the next frontiers in biology.