In the world of protein production, codon optimization is generally considered the go-to method to enhance expression and increase yields. Swapping native DNA for genes with simplified codons can decrease redundancy, increase efficiency and yield more abundant protein returns.
Or so the thinking goes.
But new work from the Institute for Protein Innovation (IPI), posted as a preprint in March, suggests that scientists may have overestimated the value of codon optimization when producing mammalian proteins in mammalian cell lines — a keystone for the antibody production and protein biologics industry. Instead, native constructs perform as reliably as, or sometimes more reliably than, their optimized counterparts, the study shows, casting doubt on the utility of the practice.
“The long story short is you don’t need to do any codon optimization,” said Rob Meijers, co-author of the paper and head of the Neuroscience Program at IPI. “But if you do it, take control of the optimization.”
The push toward codon optimization
Every biology class cites the twenty amino acids that, in mammals, queue into variable-length peptide chains, bending and twisting into millions of unique function-endowing shapes. But, of course, before the amino acids came the codons: the nucleotide trios that translate DNA into amino acids.
What makes the translation curious is that, in nature, more than one codon can encode the same amino acid. In the lab, protein engineers have exploited that natural redundancy, swapping synonymous DNA codons to produce the same final protein, perhaps with greater efficiency.
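The redundancy is easy to see in code. The sketch below uses a small excerpt of the standard genetic code (the full table has 64 codons) to show two different DNA sequences translating to the identical peptide:

```python
# A minimal illustration of codon redundancy: two different DNA
# sequences that translate to the same peptide. The codon table is a
# small excerpt of the standard genetic code.
CODON_TABLE = {
    "ATG": "M",                                      # methionine (start)
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # four codons, all glycine
    "CTG": "L", "TTA": "L",                          # two of six leucine codons
}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence into a one-letter peptide string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

native    = "ATGGGTTTA"  # Met-Gly-Leu with one set of codons
optimized = "ATGGGCCTG"  # same peptide, synonymous codons swapped in
assert translate(native) == translate(optimized) == "MGL"
```

Codon optimization is exactly this kind of swap, scaled up to a whole gene: the protein sequence is untouched, but the DNA spelling changes.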
Schemes of how best to achieve that optimization have relied on resources such as the Codon Adaptation Index (CAI), a metric that tracks codon frequency in highly expressed genes, and the Codon Usage Tabulated from GenBank (CUTG) database, which compiles usage statistics for more than 200,000 genes across nearly 9,000 organisms.
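The CAI itself is a simple calculation: each codon gets a "relative adaptiveness" score — its usage frequency divided by that of the most-used synonymous codon — and the index is the geometric mean of those scores across the gene. The sketch below uses toy frequencies, not real human codon-usage statistics:

```python
import math

# A sketch of the Codon Adaptation Index. Each codon's relative
# adaptiveness w is its frequency divided by the frequency of the
# most-used synonymous codon; the CAI is the geometric mean of w
# over the gene. Frequencies here are illustrative toy values.
USAGE = {  # amino acid -> {codon: frequency in highly expressed genes}
    "G": {"GGT": 0.16, "GGC": 0.34, "GGA": 0.25, "GGG": 0.25},
    "L": {"CTG": 0.40, "TTA": 0.07},
}

def relative_adaptiveness(usage):
    """w(codon) = freq(codon) / freq(best synonymous codon)."""
    w = {}
    for codons in usage.values():
        best = max(codons.values())
        for codon, freq in codons.items():
            w[codon] = freq / best
    return w

def cai(codons, w):
    """Geometric mean of relative adaptiveness over a coding sequence."""
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))

w = relative_adaptiveness(USAGE)
assert w["GGC"] == 1.0                 # the preferred glycine codon scores 1
assert cai(["GGC", "CTG"], w) == 1.0   # an all-preferred sequence scores 1
assert cai(["GGT", "TTA"], w) < 0.5    # rare codons drag the index down
```

A CAI near 1 means the gene reads like the host's most highly expressed genes; optimization algorithms that maximize CAI push every codon toward the top of that frequency table.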
It became clear that each species had a codon preference. So molecular biologists figured they should match the preference, or codon usage bias, to their model system to keep its translational machinery working smoothly. As gene synthesis technologies matured in the early 2000s, this effort became increasingly commercialized. Companies started offering proprietary algorithms to automatically redesign DNA sequences, often maximizing for metrics based on the CAI and eliminating problem sequence motifs.
In principle, these codon schemes provided labs with a straightforward solution to match host cell biases and appeared to address stubbornly low yields. Some critical researchers identified potential downstream side effects and questioned the underlying assumptions of codon optimization. But with simple tools in hand and a desire for high yields, optimization became a routine first step in recombinant protein engineering.
“People believed codon optimization was the way to go,” said Haisun Zhu, associate director of the IPI platform and co-author on the paper. “All you need to do is pick a codon scheme and go with it.”
A codon showdown
Like many others, IPI was searching for ways to raise its protein yields to enhance the productivity of its antibody discovery platform. IPI scientists tried many of these optimization tools and noticed some schemes worked better than others.
But at that point, “it was all hand waving,” Zhu said. The team could find no study that verified scheme performance or demonstrated a clear need for codon optimization.
So, IPI decided to conduct a methods-focused “bake-off” pitting five different codon optimization strategies against each other, each of which applied a different optimization philosophy.

The Skewing (or Skewed) method exclusively used the most abundant codon in the host organism for each amino acid. The Harmonization approach redistributed rare codons to best recapitulate the gene expression frequency in the native DNA sequence. The LinearDesign scenario sought to stabilize the mRNA middleman between DNA and amino acid, employing an algorithm to limit rare codons while minimizing the free energy of the messenger RNA. A native codon and a proprietary company scheme, similar in strategy to the Skewing method, rounded out the competition.
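The contrast between the two simplest philosophies can be sketched in a few lines. The code below is an illustration of the general ideas, not IPI's actual algorithms, and all frequencies are toy values: Skewing always substitutes the host's most-used codon, while Harmonization preserves each native codon's usage rank when mapping it into the host's codon table.

```python
# Toy codon-usage tables (illustrative numbers, not real statistics).
HOST_USAGE = {    # the expression host, e.g. a human cell line
    "G": {"GGC": 0.34, "GGA": 0.25, "GGG": 0.24, "GGT": 0.17},
    "L": {"CTG": 0.40, "TTA": 0.07},
}
NATIVE_USAGE = {  # the gene's native organism
    "G": {"GGA": 0.40, "GGT": 0.30, "GGC": 0.20, "GGG": 0.10},
    "L": {"TTA": 0.60, "CTG": 0.40},
}
CODON_TO_AA = {c: aa for aa, tbl in HOST_USAGE.items() for c in tbl}

def skew(protein, host_usage):
    """Skewing: use the host's single most frequent codon for every residue."""
    return [max(host_usage[aa], key=host_usage[aa].get) for aa in protein]

def ranked(usage_for_aa):
    """Codons for one amino acid, most frequent first."""
    return sorted(usage_for_aa, key=usage_for_aa.get, reverse=True)

def harmonize(native_codons):
    """Harmonization: a rare native codon maps to a similarly rare host codon."""
    out = []
    for codon in native_codons:
        aa = CODON_TO_AA[codon]
        rank = ranked(NATIVE_USAGE[aa]).index(codon)
        out.append(ranked(HOST_USAGE[aa])[rank])
    return out

assert skew("GL", HOST_USAGE) == ["GGC", "CTG"]
# GGG is the rarest glycine codon natively, so it maps to the host's rarest:
assert harmonize(["GGG", "TTA"]) == ["GGT", "CTG"]
```

The design difference is the point of the bake-off: Skewing flattens every gene to one "best" spelling, while Harmonization deliberately keeps some rare codons on the theory that they pace translation the way the native gene did.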
The team, including former research assistant Chang Yang and summer intern Raina Soni, tested the merit of each scheme by using it to construct and express 18 neurologically relevant human and murine glycoproteins.
The scientists picked a tough challenge, proteins linked to Wnt signaling — including ROR1/2, SFRP1/2 and LRP5/6 — that had previously given low protein yields.
“We were trying to see if using different codon optimization algorithms would salvage [the proteins] and make them express more,” Yang said.
The team seized the opportunity to test an IPI-made vector as the scaffold for all the protein constructs. The so-called pTipi2 vector is a slimmed-down, open-source version of commercial vectors that IPI has made available at low cost through Addgene. In the IPI study, pTipi2 produced high yields for epitope tag antibodies and antibodies produced on the IPI platform, showing it to be a feasible alternative to other expression vectors, which often come with commercial restrictions on downstream use and distribution.

Figure (panel B): Protein yields for antibodies produced using the pTipi2.1 expression vector after one step of affinity purification with protein A. IPI figure.
Unexpected outcomes
After developing the vector, IPI researchers conducted 90 small-scale screens to compare the expression levels of all 18 glycoproteins under each scheme. The worst performer was the LinearDesign method, which was created to maximize messenger RNA stability yet produced the lowest yields. This was somewhat surprising, according to Meijers, because the team had assumed that greater RNA stability would extend the RNA’s lifetime and thereby boost protein expression.

Native and Harmonized gave the most robust performances; neither was ever the worst performer for any of the 18 glycoproteins tested. The proprietary algorithm’s results fluctuated, showing extreme highs and lows depending on the protein. Interestingly, the Skewed method, which used only the single most abundant codon for each of the 20 amino acids, did remarkably well and in some cases gave the best yields.
To further confirm their results, the IPI team scaled up the expression of three proteins, hROR1, hROR2 and mLRP6. Again, the LinearDesign scheme showed poor performance, while the Harmonized, proprietary and native schemes all gave similar results. The Skewed method produced markedly high yields for one protein, indicating that this extreme method might be a reasonable alternative for labs that need to make the same protein many times.
Balancing priorities
In sum, the team found that codon optimization did not increase yields or produce much additional material when making human proteins in a human cell line.
In fact, choosing the wrong optimization scheme at the forefront of a project could have detrimental results, leading researchers toward uncertain or even depleted yields. Most codon optimization algorithms rely on some level of randomization, and unpredictable variation occurs every time the algorithm runs.
“It’s like playing the lottery, but you can only lose,” said Meijers. “It’s better to take command of this aspect of protein production.”
Though some labs developing robust platforms may still benefit from testing multiple codon usage strategies, the results signal that simpler might be better.
After all, “evolution is best,” Zhu said.
Sources:
Haisun Zhu, haisun.zhu@proteininnovation.org
Rob Meijers, rob.meijers@proteininnovation.org
About IPI
The Institute for Protein Innovation is pioneering a new approach to scientific discovery and collaboration. As a nonprofit research institute, we provide the biomedical research community with synthetic antibodies and deep protein expertise, empowering scientists to explore fundamental biological processes and pinpoint new targets for therapeutic development. Our mission is to advance protein science to accelerate research and improve human health. For more information, visit proteininnovation.org or follow us on social media, @ipiproteins.


