Using neural networks, researchers from the University of Copenhagen have developed a new method to search the human genome for beneficial mutations from Neanderthals and other archaic humans. These archaic humans are known to have interbred with modern humans, but the overall fate of the genetic material inherited from them is still largely unknown. Among other findings, the researchers identified previously unreported mutations involved in core pathways in metabolism, blood-related diseases and immunity.
Tens of thousands of years ago, archaic humans such as Neanderthals and Denisovans went extinct. But before that, they interbred with the ancestors of present-day humans, who to this day still carry genetic mutations from these extinct groups.
Over 40 percent of the Neanderthal genome is thought to have survived across present-day humans of non-African descent, but it is spread out so that any individual genome contains at most about two percent Neanderthal material. Some human populations also carry genetic material from Denisovans, a mysterious group of archaic humans that may have lived in Eastern Eurasia and Oceania tens of thousands of years ago.
The introduction of beneficial genetic material into our gene pool, a process known as adaptive introgression, often happened because the archaic variants were advantageous to humans after they expanded across the globe. To name a few examples, scientists believe some of the mutations affected skin development and metabolism. But many such mutations remain undiscovered.
Now, researchers from GLOBE Institute at the University of Copenhagen have developed a new method using deep learning techniques to search the human genome for undiscovered mutations.
"We developed a deep learning method called 'genomatnn' that jointly models introgression, which is the transfer of genetic information between species, and natural selection. The model was developed in order to identify regions in the human genome where this introgression could have happened," says Associate Professor Fernando Racimo, GLOBE Institute, corresponding author of the new study.
"Our method is highly accurate and outcompetes previous approaches in power. We applied it to various human genomic datasets and found several candidate beneficial gene variants that were introduced into the human gene pool," he says.
The new method is based on a convolutional neural network (CNN), a type of deep learning model commonly used in image and video recognition.
Using hundreds of thousands of simulations, the researchers at the University of Copenhagen trained the CNN to identify patterns in images of the genome that would be produced by adaptive introgression with archaic humans.
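To give a rough sense of what an "image of the genome" could look like, the following is a minimal Python sketch, not the researchers' actual code. It assumes the image is built by dividing a genomic region into a fixed number of bins and counting, for each haplotype, the derived alleles that fall in each bin, and then ordering the haplotypes by similarity to a reference haplotype. The function names `genotypes_to_image` and `sort_columns_by_similarity`, and the toy data, are hypothetical and chosen only for illustration.

```python
def genotypes_to_image(positions, haplotypes, region_start, region_end, n_bins):
    """Bin a haplotype matrix into a fixed-size grid of allele counts.

    positions:  variant positions, one per row of `haplotypes`
    haplotypes: list of rows; each row holds 0/1 alleles, one per haplotype
    Returns n_bins rows by n_haplotypes columns of derived-allele counts.
    """
    n_haps = len(haplotypes[0]) if haplotypes else 0
    image = [[0] * n_haps for _ in range(n_bins)]
    bin_width = (region_end - region_start) / n_bins
    for pos, row in zip(positions, haplotypes):
        if not region_start <= pos < region_end:
            continue  # skip variants outside the region of interest
        b = min(int((pos - region_start) / bin_width), n_bins - 1)
        for j, allele in enumerate(row):
            image[b][j] += allele
    return image


def sort_columns_by_similarity(image, reference_column):
    """Reorder haplotype columns by their distance to a reference column."""
    def distance(j):
        return sum(abs(row[j] - row[reference_column]) for row in image)

    order = sorted(range(len(image[0])), key=distance)
    return [[row[j] for j in order] for row in image]


# Toy data (made up): four variants, three haplotypes, a 100-base region.
positions = [10, 20, 55, 90]
haplotypes = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]
image = genotypes_to_image(positions, haplotypes, 0, 100, n_bins=2)
sorted_image = sort_columns_by_similarity(image, reference_column=0)
print(image)         # [[2, 0, 1], [1, 2, 1]]
print(sorted_image)  # [[2, 1, 0], [1, 1, 2]]
```

Turning every simulated region into a grid of the same size is what would allow a standard image-recognition network to be trained on the simulations, whatever the number of variants in each region.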
Besides confirming previously suggested cases of adaptive introgression, the researchers also discovered candidate mutations that were not previously known to be introgressed.
"We recovered previously identified candidates for adaptive introgression in modern humans, as well as several candidates which have not previously been described," says postdoc Graham Gower, first author of the new study.
Some of the previously undescribed mutations are involved in core pathways in human metabolism and immunity.
"In European genomes, we found two strong candidates for adaptive introgression from Neanderthals in regions of the genome that affect phenotypes related to blood, including blood cell counts. In Melanesian genomes, we found candidate variants introgressed from Denisovans that potentially affected a wide range of traits, such as blood-related diseases, tumor suppression, skin development, metabolism, and various neurological diseases. It's not clear how such traits are affected in present-day carriers of the archaic variants, e.g. neutrally, positively or negatively, although historically the introgressed genetic material is assumed to have had a positive effect on those individuals carrying them," he explains.
The next stage for the research team is to adapt the method to more complex demographic and selection scenarios to understand the overall fate of Neanderthal genetic material. Graham Gower points out that the team also aims to follow up on the function of the candidate variants they found in this study.
Looking forward, it remains a challenge to search the human genome for genetic material from as yet unsampled populations, so-called ghost populations. However, the researchers are hopeful that they can further train the neural network to recognize mutations from these unsampled populations.
"Future work could also involve developing a CNN that can detect adaptive introgression from a ghost population, for cases in which genomic data from the source is unavailable," says Graham Gower.