Colin Carlson, a biologist at Georgetown University, has begun to worry about mousepox.
The virus, discovered in 1930, spreads among mice, killing them brutally. But scientists have never considered it a potential threat to humans. Now Dr. Carlson, his colleagues and their computers are not so sure.
Using a technique known as machine learning, researchers have spent the past few years programming computers to teach themselves about viruses that can infect human cells. The computers combed through vast amounts of information about the biology and ecology of the animal hosts of those viruses, as well as the genomes and other characteristics of the viruses themselves. Over time, the computers came to recognize a number of factors that can predict whether a virus is likely to infect humans.
Once the computers had proved their mettle on viruses that scientists had already studied intensively, Dr. Carlson and his colleagues deployed them on the unknown, eventually producing a short list of animal viruses capable of crossing the species barrier and causing outbreaks in humans.
In the latest runs, the algorithms unexpectedly put the mousepox virus near the top of the list of dangerous pathogens.
“Every time we run this model, it goes very high,” says Dr. Carlson.
Puzzled, Dr. Carlson and his colleagues began poring over the scientific literature. They came across records of a long-forgotten outbreak in 1987 in rural China. Students there had come down with an infection that caused pharyngitis and inflammation of the hands and feet.
Years later, a team of scientists tested throat swabs that had been collected during the outbreak and put into storage. These samples, as the team reported in 2012, contained mousepox DNA. But their research garnered little attention, and a decade later, mousepox is still not considered a threat to humans.
If the computers programmed by Dr. Carlson and his colleagues are right, this virus deserves a new look.
“It is crazy that this is lost in a huge pile that public health has to sift through,” he said. “This really changes the way we think about this virus.”
Scientists have identified about 250 human diseases that arose when an animal virus jumped the species barrier. H.I.V., for example, jumped from chimpanzees, and the new coronavirus originated in bats.
Ideally, scientists want to spot the next spillover virus before it starts infecting people. But there are too many animal viruses for virologists to study. Scientists have identified more than 1,000 viruses in mammals, but that is most likely a fraction of the true number. Some researchers suspect mammals carry tens of thousands of viruses, while others put the number in the hundreds of thousands.
To identify potential new spillovers, researchers like Dr. Carlson are using computers to uncover hidden patterns in scientific data. The machines can, for example, spot viruses that may be particularly likely to give rise to disease in humans, and can also predict which animals are most likely to harbor dangerous viruses that we don’t know about yet.
“It feels like you have new eyes,” says Barbara Han, a disease ecologist at the Cary Institute of Ecosystem Studies in Millbrook, N.Y., who collaborates with Dr. Carlson. “You just can’t see as many dimensions as the model can.”
Dr. Han first learned about machine learning in 2010. Computer scientists had been developing the technique for decades and were starting to build powerful tools with it. Today, machine learning allows computers to detect fraudulent credit charges and recognize people’s faces.
But few researchers had applied machine learning to diseases. Dr. Han wondered if she could use it to answer open questions, such as why fewer than 10 percent of rodent species harbor pathogens known to infect humans.
She fed a computer information about various rodent species from online databases – everything from their age at weaning to their population density. The computer then looked for the features shared by rodents known to harbor large numbers of species-jumping pathogens.
Once the computer had created a model, she tested it on another group of rodent species, seeing how well it could guess which ones were laden with pathogens. In the end, the computer’s model reached 90 percent accuracy.
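The train-then-test routine described above can be illustrated with a small Python sketch. It is a toy under loud assumptions, not Dr. Han's method: the species names, life spans and labels are invented, and a single-trait cutoff rule stands in for whatever algorithm and much larger set of traits the researchers actually used. The point is only the workflow: fit a rule on one group of species, then measure accuracy on a group the rule has never seen.

```python
# Toy illustration only: species names, trait values and labels are invented.
# Each record: (species, maximum life span in years, known to carry
# pathogens that infect humans?)
TRAIN = [
    ("rodent_a", 1.0, True),
    ("rodent_b", 1.5, True),
    ("rodent_c", 2.0, True),
    ("rodent_d", 4.0, False),
    ("rodent_e", 5.5, False),
    ("rodent_f", 7.0, False),
]
HELD_OUT = [
    ("rodent_g", 1.2, True),
    ("rodent_h", 2.5, True),
    ("rodent_i", 3.5, False),
    ("rodent_j", 6.0, False),
    ("rodent_k", 2.8, False),
]


def n_correct(cutoff, records):
    """Count records where 'life span <= cutoff' matches the known label."""
    return sum((life <= cutoff) == label for _, life, label in records)


def fit_cutoff(records):
    """Choose the life-span cutoff that classifies the training set best."""
    values = sorted(life for _, life, _ in records)
    # Candidate cutoffs are the midpoints between neighboring life spans.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda c: n_correct(c, records))


def accuracy(cutoff, records):
    """Fraction of records the cutoff rule classifies correctly."""
    return n_correct(cutoff, records) / len(records)


cutoff = fit_cutoff(TRAIN)                       # learned from TRAIN only
held_out_accuracy = accuracy(cutoff, HELD_OUT)   # scored on unseen species
print(f"cutoff: {cutoff} years, held-out accuracy: {held_out_accuracy:.0%}")
```

Scoring the rule on species that played no part in fitting it is what makes an accuracy figure meaningful; a model judged only on the data it learned from would look better than it is.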
Dr. Han then turned to rodents that had yet to be examined for spillover pathogens and came up with a list of high-priority species. Dr. Han and her colleagues predicted that species such as the montane vole and the northern grasshopper mouse of western North America would be particularly likely to carry worrisome pathogens.
Of all the traits that Dr. Han and her colleagues provided to their computer, the most important one was the rodents’ life span. Short-lived species carry more pathogens, perhaps because evolution has put more of their resources into reproduction than into building a strong immune system.
These results took years of painstaking research, during which Dr. Han and her colleagues combed through ecological databases and scientific studies for useful data. More recently, researchers have sped up this work by building databases explicitly designed to teach computers about viruses and their hosts.
In March, for example, Dr. Carlson and his colleagues published an open-access database called VIRION, which has amassed half a million pieces of information about 9,521 viruses and their 3,692 hosts, and is still growing.
Databases like VIRION are now making it possible to ask more focused questions about new pandemics. When the Covid pandemic struck, it soon became clear that it was caused by a new virus called SARS-CoV-2. Dr. Carlson, Dr. Han and their colleagues created programs to identify which animals are most likely to harbor relatives of the new coronavirus.
SARS-CoV-2 belongs to a group of viruses known as betacoronaviruses, which also includes the viruses that caused the SARS and MERS outbreaks in humans. For the most part, betacoronaviruses infect bats. When SARS-CoV-2 was discovered in January 2020, 79 species of bats were known to carry them.
But scientists have not systematically searched all 1,447 bat species for betacoronaviruses, and such a project would take years to complete.
By feeding biological data about the various types of bats – diet, wing length and so on – into their computers, Dr. Carlson, Dr. Han and their colleagues created a model that could predict which bat species are most likely to harbor betacoronaviruses. They found more than 300 species that fit the bill.
Since that prediction in 2020, researchers have indeed found betacoronaviruses in 47 species of bats – all of which were on the prediction lists generated by several of the computer models they had created for the study.
Daniel Becker, a disease ecologist at the University of Oklahoma who also worked on the betacoronavirus study, said it was striking how simple traits like body size could lead to strong predictions about viruses. “Most of it is the low end of comparative biology,” he said.
Dr. Becker is now following up, from his own backyard, on the list of potential betacoronavirus hosts. It turns out that several species of bats in Oklahoma are predicted to harbor them.
If Dr. Becker does find a betacoronavirus in his backyard, he won’t be in a position to say right away that it is an imminent threat to humans. Scientists would first have to carry out painstaking experiments to assess the risk.
Pranav Pandit, an epidemiologist at the University of California, Davis, warns that these models are very much a work in progress. When tested on well-studied viruses, they do substantially better than random chance, but they could do better.
“It’s not the stage where we can take those results and create an alarm to start telling the world, ‘This is an animal-to-human virus,’” he said.
Nardus Mollentze, a computational virologist at the University of Glasgow, and his colleagues have pioneered a method that could dramatically increase the accuracy of models. Instead of looking at the virus’ host, their models look at its genes. A computer can be taught to recognize subtle features in the genes of viruses that can infect humans.
In their first report on the technique, Dr. Mollentze and his colleagues developed a model that could correctly identify human-infecting viruses more than 70 percent of the time. Dr. Mollentze can’t yet say why his gene-based model works, but he has some ideas. Our cells can recognize foreign genes and send out an alarm to the immune system. Viruses that can infect our cells may have the ability to mimic our own DNA as a kind of viral camouflage.
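One way to picture what "subtle features in the genes" might mean is a short Python sketch that turns a genome sequence into numbers a computer can compare. Everything here is an assumption made for illustration: the article does not say which features Dr. Mollentze's model uses, the choice of dinucleotide frequencies is a stand-in, the sequences are invented and far shorter than any real genome, and a simple distance between profiles is not a trained classifier.

```python
from collections import Counter
from itertools import product

# All 16 possible two-letter DNA combinations, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 16 DNA dinucleotides."""
    seq = sequence.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # Skip pairs containing ambiguous bases such as N.
    counts = Counter(p for p in pairs if set(p) <= set("ACGT"))
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in DINUCLEOTIDES}


def profile_distance(profile_a, profile_b):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(profile_a[k] - profile_b[k]) for k in DINUCLEOTIDES)


# Invented toy sequences (hypothetical, for illustration only).
host_like = dinucleotide_frequencies("ATGCATGCATGCATGC")
virus_x = dinucleotide_frequencies("ATGCATGCATGGATGC")
virus_y = dinucleotide_frequencies("CGCGCGCGCGCGCGCG")

# A smaller distance means a composition closer to the host-like profile.
distance_x = profile_distance(virus_x, host_like)
distance_y = profile_distance(virus_y, host_like)
print(f"virus_x: {distance_x:.3f}, virus_y: {distance_y:.3f}")
```

In this toy, virus_x sits much closer to the host-like profile than virus_y does, which is the kind of signal the "camouflage" idea would predict. A real model would feed many such features from full genomes into a trained classifier and check it against viruses whose hosts are already known.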
When they applied this model to animal viruses, they came up with a list of 272 species at high risk of spilling over. That’s too many for virologists to study in any depth.
“You can only work with so many viruses,” said Emmie de Wit, a virologist at Rocky Mountain Laboratories in Hamilton, Mont., who oversees research on the new coronavirus, influenza and other viruses. “On our side, we really need to narrow it down.”
Dr. Mollentze admits that he and his colleagues need to figure out how to pinpoint the worst of the viruses that infect animals. “This is just a start,” he said.
To follow up on his initial study, Dr. Mollentze is working with Dr. Carlson and his colleagues to merge data about the genes of viruses with data on the biology and ecology of their hosts. The researchers are getting some promising results from this approach, including the mousepox lead.
Other types of data can make for better predictions. For example, one of the most important characteristics of a virus is the coating of sugar molecules on its surface. Different viruses end up with different sugar molecule patterns, and that arrangement can have a huge impact on their success. Some viruses can use this molecular coating to hide from the host’s immune system. In other cases, the virus can use its sugar molecules to attach to new cells, causing a new wave of infection.
This month, Dr. Carlson and his colleagues posted a commentary online asserting that machine learning could glean many insights from the sugar coatings of viruses and their hosts. Scientists have already gathered a lot of that knowledge, but it has yet to be put into a form that computers can learn from.
“My intuitive feeling is that we know more than we think we do,” says Dr. Carlson.
Dr. de Wit says that machine learning models could one day guide virologists like her in studying certain animal viruses. “There is definitely a big benefit from this,” she said.
But she notes that the models to date have mainly focused on a pathogen’s ability to infect human cells. Before it can cause a new disease in humans, a virus must also spread from person to person and cause severe symptoms along the way. She is still waiting for a new generation of machine learning models that can make those predictions, too.
“What we really want to know is not necessarily what kind of virus can infect people, but what kind of virus can cause outbreaks,” she said. “So that’s really the next step that we need to figure out.”