The timeline traces the historical milestones that led to AlphaFold, a landmark achievement in protein structure prediction powered by artificial intelligence (AI). It highlights key scientific, methodological, and cultural developments spanning over six decades, beginning with the first protein structures solved by Kendrew and Perutz (1958-1960).
Significant early milestones included the establishment of protein sequence and structure repositories, particularly the Protein Data Bank (PDB), initiated in the early 1970s through an effort involving both senior and junior scientists. The PDB grew from these grassroots efforts amid debates about data-sharing practices, progressing gradually over several decades. The adoption of open data-sharing policies resulted from community letters and petitions, initiatives prompted by the PDB leaders, and decisions by scientific societies, journals (such as Nature and Science), and funders (such as HHMI and NIH).
Bioinformatics methods and computational tools evolved considerably, from sequence alignment algorithms (1970s-80s) to a broader set of bioinformatics tools in the following decades. These tools adopted open-source software development practices and significantly enhanced sequence analysis capabilities.
The Critical Assessment of Protein Structure Prediction (CASP), launched in 1994, benchmarked computational predictions. After initial improvements, the prediction metrics showed no progress for more than 10 years, until DeepMind’s AlphaFold achieved breakthroughs in 2018 and 2020.
The rise of GPU computing (2008) and large annotated datasets like ImageNet (2009) catalyzed advancements in AI. DeepMind, a company co-founded by Hassabis in 2010 with venture capital support and acquired by Google in 2014, leveraged these computational resources and data advancements, notably UniProt's extensive sequence datasets and PDB's comprehensive structure archives. Multiple sequence alignments provided evolutionary information. AlphaFold2 (2020) employed transformer-based neural networks (an architecture introduced in 2017), significantly outperformed prior methods in CASP14, and largely solved structure prediction for most single-chain globular proteins. Substantial challenges remained for disordered regions, multi-protein complexes, and dynamic conformational landscapes.
This historical perspective underscores crucial contributions by numerous scientists and institutions toward open data sharing, algorithmic innovation, and interdisciplinary collaboration, which were foundational to the modern AI-driven breakthroughs exemplified by AlphaFold and its open-source academic derivatives, including RoseTTAFold, OpenFold, and ColabFold.
Current efforts focus on extending AI applications to more complex cellular and biomolecular interactions, including the virtual cell, driving the next frontier in biology.
Main events:
1958-1960: First protein structures solved (myoglobin, hemoglobin).
1970s: Establishment of Protein Data Bank (PDB), key data resource.
1990s: Development of bioinformatics tools (BLAST, HMMER) enhancing sequence analysis.
1994: Launch of CASP to benchmark protein structure prediction methods.
1998-1999: Substantial progress in the sharing policies of structural data, driven by the scientific community and supported by major journals and by funders.
2008-2009: GPUs and large datasets (ImageNet) accelerate AI research.
2010-2014: DeepMind founded, acquired by Google, focusing on artificial general intelligence.
2017: Transformers paper introduces a new type of AI architecture.
2020: DeepMind's AlphaFold2 achieves a significant breakthrough in protein structure prediction using AI.
2024: Nobel Prize in Chemistry awarded to key AlphaFold contributors, recognizing AI's impact on science.