An ambitious protein moonshot is gaining momentum to develop and widely share the tools needed to understand the complete human proteome.
In 2003, biology was all about the genes. Scientists had just mapped the human genome, yielding, they thought, complete knowledge of the book of life. But unearthing those genes didn’t mean scientists had decoded them. Of the genome’s roughly 20,000 protein-coding genes, only around 2,000 are well studied, according to a 2018 report.
“It’s one of the only domains of science where it is bounded and you can map activity in the world on a little box,” said Aled Edwards, a molecular geneticist and professor at the University of Toronto. “And it unequivocally shows we are myopic.”
As a result, about 35% of the human proteome remains uncharted, and a mere 5% has been successfully targeted for drug discovery, according to a new report. The report’s authors now aim to change this with a project to develop and distribute a complete set of tools for studying the human proteome by 2035.
The project, Target 2035, will yield new tools, including small molecules, chemical probes and antibodies, to illuminate the “dark proteome”: the part of our proteome made up of uncharacterized proteins, known proteins with unknown functions and variants of known proteins.
“We’re not going to be able to explain life and disease without knowing what every single one of those proteins does,” said Edwards, also founding director of the Structural Genomics Consortium (SGC), a public-private partnership that jumpstarted Target 2035 in 2020.
Developing tools to study the dark proteome, and sharing those tools through an open science model, could offer crucial insights into human biology and drug development. Those insights could also help scientists interpret cryptic parts of the genome and reveal new treatment targets for genetic diseases.
Now, roughly 60 biologists across more than 50 institutions have signed on to the project. They’re led by Cheryl Arrowsmith, a structural biologist, chief scientist at the SGC’s Toronto laboratories and professor at the University of Toronto.
In its first two years, the team has started cataloging and distributing existing protein tools. Scientists also built a framework to funnel resources from life science companies to projects focused on protein research reagents, including a system through which pharmaceutical companies can donate probes generated for drug discovery projects to Target 2035 scientists for distribution in the public domain.
Based on that framework, scientists hope to first coordinate the development and sharing of new probes for well-understood protein targets before moving on to more challenging targets. All the while, they’ll weave new infrastructure for shared data repositories, research facilities and, ultimately, knowledge.
“What we find is, typically, people who concentrate in the chemistry technologies don’t have access to proteins, people who are protein experts don’t have access to chemistry,” Arrowsmith said. “Often, neither have access to engineering. So it’s those disciplines that need to come together to move the field forward.”
In addition to shared knowledge, illuminating the dark proteome requires a toolbox of chemical probes, prototype drugs, chemogenomic libraries and functional antibodies. But, according to Edwards, the practical utility of any tool hinges on whether scientists can access it.
“To me, the solution is open science,” Edwards said.
Opening the toolshed
The open science movement, which has been gaining traction for decades, strives to ensure research findings are shared by making publications, data, samples, software and tools freely available. Thanks to recent advances, scientists can generate vast amounts of data. But outside the public domain, that data’s potential stalls.
“If the data is in the public domain, then you can learn and learn, and everything should get faster,” Arrowsmith said. “But we’ve got to really encourage that sharing, both of data and technology.”
Target 2035 has partnered with the leaders of new projects run by collaborators around the world, including EUbOPEN, EU-OPENSCREEN, Illuminating the Druggable Genome and the Chemical Probes Portal. Another contributing initiative, the CACHE Challenges, aims to bring academic and industrial scientists together to use artificial intelligence to predict small molecules that bind to key proteins.
So far, the projects chiefly aim to develop and distribute chemical probes and small molecules. Soon, scientists creating other protein tools, including antibodies, will have a unique opportunity to bring quality reagents to biologists.
Currently, “the infrastructure is so, so much better for the small molecules,” said Stephen Fuchs, director of partnerships and alliances at the Institute for Protein Innovation (IPI) and a co-author on Target 2035’s latest report. “On the antibody side, even if tools exist, it’s hard to know which tools are any good.”
That’s partly due to chronic problems with creating, distributing and accessing reliable antibodies. A 2008 study of more than 20,000 commercial antibodies found that half failed quality assurance tests, undermining the reproducibility of experiments in which they might be used.
“Right now, 95% or more of all antibody discovery is aimed at drugs,” Fuchs said. “That’s not actually fixing this problem.”
IPI aims to contribute novel antibodies, including function-blocking antibodies. Sharing data about its antibodies and other protein tools with the scientists who use them will let teams verify the quality of these reagents before using them in their assays.
Identifying the needs of protein scientists across all specialties is crucial to strengthening Target 2035’s open science aims, Fuchs added. Sharing resources to avoid the cost of duplicated work is also key, particularly for knockout cell lines, which are widely used but notoriously expensive to make.
“Somebody’s got to do that work,” he said. “And you shouldn’t have to do it twice.”
Writer: Halle Marchese