Research shared in Nature in May 2023 details work by a team from Baidu Research to develop an AI algorithm that can efficiently design highly stable COVID-19 mRNA vaccine sequences. The algorithm, called LinearDesign, achieved up to a 128-fold increase in the vaccine’s antibody response. The paper, currently unedited, has been made available by the journal to “give early access to its findings”.

Baidu’s “major leap” 

In a statement from Baidu Research the company described this research as a “major leap” for vaccine sequences. It came about through a collaboration with Oregon State University, StemiRNA Therapeutics, and the University of Rochester Medical Centre. The company states that the publication, shared through Accelerated Article Preview (AAP), marks the “first time a Chinese tech company has been credited as the first affiliation on a paper published in Nature”.  

“The paper reveals how a complex biology problem can be tackled by taking a classic approach from natural language processing (NLP), using an elegantly simple solution that has been employed to understand words and grammar.”  

Instability and insufficient protein expression 

During the COVID-19 pandemic, mRNA has made its name as a “revolutionary technology” for vaccine development. Baidu describes it as a “vital messenger” that carries “genetic instructions from DNA” to the “cell’s protein-making machinery”. 

“mRNA enables the creation of specific proteins for various functions in the human body.” 

Furthermore, it has “numerous advantages” from safety to production, which allowed its adoption for use in the pandemic. However, “natural instability” results in insufficient protein expression, which “weakens a vaccine’s capacity to stimulate strong immune responses”. This also presents challenges for storage and transport, particularly in developing countries.  

Using NLP  

Baidu indicates that previous research into optimising the secondary structure stability of mRNA, when combined with optimal codons, has led to improved protein expression. However, the challenge “lies in the mRNA design space”. Due to “synonymous codons”, this is “incredibly vast”.  
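
To give a feel for why synonymous codons make the design space so large, here is a minimal sketch that counts how many distinct mRNA coding sequences encode a given protein. It assumes the standard genetic code; the function name design_space_size and the example peptides are illustrative and do not come from the paper.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes). The 20 values sum to the 61 sense codons.
SYNONYMOUS_CODONS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def design_space_size(protein: str) -> int:
    """Count the mRNA coding sequences that translate to `protein`.

    Each residue can be written with any of its synonymous codons,
    so the total is the product of the per-residue codon counts.
    """
    return prod(SYNONYMOUS_CODONS[aa] for aa in protein)


if __name__ == "__main__":
    # Hypothetical three-residue peptide: Met (1) x Lys (2) x Thr (4) = 8.
    print(design_space_size("MKT"))
    # The count grows multiplicatively with every added residue.
    print(design_space_size("MKT" * 10))
```

Because the count is a product over residues, it grows exponentially with protein length, which is why enumerating every candidate for a full-length protein is impractical.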

Although NLP and biology appear to be unrelated fields, they share “strong mathematical connections” according to the team at Baidu. They compare human language, in which a word sequence and its underlying syntactic tree convey a meaning, with an RNA strand, which has a nucleotide sequence and an “associated secondary structure” based on its folding pattern.


The team used lattice parsing, a language processing technique that represents potential word connections in a graph and selects the most likely option based on grammar. Adapting it for mRNA, the researchers created a graph that “compactly represents all mRNA candidates” using a deterministic finite-state automaton (DFA).
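
The idea of one graph standing in for every candidate sequence can be illustrated with a small sketch. The code below builds a simple trie-style DFA for a short peptide, where each path from the start state to the final state spells one mRNA coding sequence. It is an illustration under stated assumptions, not Baidu's implementation: the state layout and the names build_dfa and enumerate_sequences are hypothetical, the automaton is not minimised, and the lattice parsing step that scores folding stability is not shown.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, bases ordered U, C, A, G; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """Return all codons that encode the amino acid `aa`."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def build_dfa(protein):
    """Build transitions: state -> {nucleotide: next_state}.

    A state is (residue_index, prefix_of_current_codon). The start state is
    (0, "") and the single accepting state is (len(protein), "").
    """
    transitions = defaultdict(dict)
    for i, aa in enumerate(protein):
        for codon in codons_for(aa):
            for k in range(3):
                src = (i, codon[:k])
                # After the third nucleotide, move on to the next residue.
                dst = (i, codon[:k + 1]) if k < 2 else (i + 1, "")
                transitions[src][codon[k]] = dst
    return dict(transitions)


def enumerate_sequences(dfa, protein_length):
    """List every sequence accepted by the DFA (only sensible for tiny inputs)."""
    start, final = (0, ""), (protein_length, "")
    stack, accepted = [(start, "")], []
    while stack:
        state, seq = stack.pop()
        if state == final:
            accepted.append(seq)
            continue
        for nucleotide, nxt in dfa[state].items():
            stack.append((nxt, seq + nucleotide))
    return sorted(accepted)


if __name__ == "__main__":
    peptide = "MF"  # hypothetical two-residue example: Met, Phe
    dfa = build_dfa(peptide)
    print(enumerate_sequences(dfa, len(peptide)))
```

The graph shares states between codons with a common prefix, so its size grows linearly with protein length even though the number of paths through it grows exponentially. Enumeration is used here only to check the sketch on a tiny input; the point of the compact graph is that an algorithm can search it without listing every path.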

With this process, LinearDesign takes “a mere 11 minutes” to generate the most stable mRNA sequence that encodes the Spike protein. Compared with existing vaccine sequences, the sequences designed by LinearDesign demonstrated “significantly improved results”. For COVID-19 sequences, the algorithm achieved up to a 5-fold increase in stability, a 3-fold increase in protein expression within 48 hours, and an “incredible” 128-fold increase in antibody response.

Dr He Zhang, Software Engineer at Baidu Research, hopes this work can be applied to mRNA medicines encoding a “wider range of therapeutic proteins”, with the promise of “broad applications and far-reaching impact”.

“The vaccines designed through our method may offer better protection with the same dosage, and potentially provide equal protection with a smaller dose, leading to fewer side effects.”  

Dr Zhang hopes this will “greatly reduce the vaccine research and development costs” while “improving the outcomes”. Baidu emphasises that it will keep exploring AI applications in the life sciences, hoping to broaden the “scope and depth of inclusive technology”.

“Championing the health and well-being of all humanity.” 
