Software forecasts effects of mysterious mutations
A computer program called DeepBind predicts the effects of mutations in regions of the genome that control gene expression. Researchers can use the tool, described 27 July in Nature Biotechnology, to gauge whether autism-linked mutations might block the genetic landing strips for regulatory proteins [1].
Many mutations associated with autism are located in genes that encode key proteins, disrupting the proteins in obvious ways. Such mutations can also prevent proteins from attaching to RNA, a key step in protein production.
But some mutations fall outside genes, making their effects harder to parse. For instance, mutations can alter DNA in a way that blocks the binding of proteins called transcription factors, which regulate gene expression.
Studies have mapped many of the DNA and RNA stretches that interact with these regulatory proteins. DeepBind uses this information and a technique called ‘deep learning’ to predict whether a protein of interest is likely to bind to a particular sequence.
The program analyzes known DNA-protein interactions for short nucleotide segments called motifs. A given motif, or combination of motifs, can either enhance or diminish a pairing. Based on the patterns it discovers, the program builds a model for the protein that predicts its affinity, expressed as a ‘binding score,’ for any DNA or RNA sequence.
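The idea of a binding score can be illustrated with a toy example. The sketch below is not DeepBind's trained model: it uses a single hand-made motif with hypothetical weights (TOY_MOTIF), slides it along a DNA sequence and reports the best match, with positive weights standing in for motif positions that enhance a pairing and negative weights for those that diminish it. The function names are assumptions made for illustration.

```python
# Toy illustration only: one hand-made motif detector, not DeepBind's trained model.

# Hypothetical weights for a 4-nucleotide motif, one dict per motif position.
# Positive weights favor binding; negative weights disfavor it.
TOY_MOTIF = [
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 2.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 2.0, "T": -1.0},
]


def window_score(window, motif):
    """Sum the per-position weights for one motif-sized stretch of sequence."""
    return sum(weights[base] for base, weights in zip(window, motif))


def binding_score(sequence, motif=TOY_MOTIF):
    """Slide the motif along the sequence and keep the best-matching window."""
    sequence = sequence.upper()
    width = len(motif)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        window_score(sequence[i:i + width], motif)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    print(binding_score("TTGCCGTT"))  # contains the favored stretch GCCG
    print(binding_score("TTTTTTTT"))  # no good match anywhere
```

A real model of this kind would learn many such motif detectors from experimental data and combine their outputs, rather than rely on one fixed table.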
The researchers also created a ‘mutation map,’ which plots the effects of changes in a protein’s target sequence on its binding score. They tested their software on proteins with known DNA and RNA targets.
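A mutation map of this sort can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the researchers' implementation: toy_score is a hypothetical stand-in for a trained model that simply counts matches to the made-up target GCCG, and mutation_map records how the score changes for every possible single-base substitution.

```python
# Minimal sketch of a mutation map; the scoring function is a hypothetical stand-in.
BASES = "ACGT"


def toy_score(sequence):
    """Hypothetical scorer: best count of matches to 'GCCG' over all windows."""
    target = "GCCG"
    return max(
        sum(a == b for a, b in zip(sequence[i:i + len(target)], target))
        for i in range(len(sequence) - len(target) + 1)
    )


def mutation_map(sequence, score=toy_score):
    """Return {position: {base: score change}} for every single-base substitution."""
    reference = score(sequence)
    result = {}
    for pos, original in enumerate(sequence):
        result[pos] = {}
        for base in BASES:
            if base == original:
                continue  # skip the unmutated base
            mutant = sequence[:pos] + base + sequence[pos + 1:]
            result[pos][base] = score(mutant) - reference
    return result


if __name__ == "__main__":
    for pos, changes in mutation_map("TTGCCGTT").items():
        print(pos, changes)
```

Negative entries mark substitutions that would weaken the predicted binding; zeros mark positions where a change makes no difference to this toy scorer. Any other scoring function that takes a sequence and returns a number can be passed in through the score argument.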
In one case, they analyzed mutations leading to an inherited form of high cholesterol. One of these mutations lies near a cholesterol-processing gene called LDLR. It prevents the DNA from interacting with a protein called SP1, which regulates LDLR expression.
The researchers used DeepBind to identify whether other nearby mutations also disrupt LDLR regulation by SP1. The program pinpointed a cluster of mutations as likely to have an effect on SP1 binding. The researchers then found examples in the scientific literature of around a dozen of these mutations in people with cholesterol problems. They also identified several new mutations that would be likely to lead to high cholesterol.
DeepBind’s performance topped that of 26 other tools in predicting the consequences of mutations on protein-DNA binding. The program also performs well when trained with data from a variety of other sources.
A similar tool, described 24 August in Nature Methods, also uses deep learning to predict how mutations disrupt the binding of regulatory proteins to DNA [2].
The researchers have posted the code for DeepBind online. They have also released DeepBind’s models for 538 proteins that interact with DNA and 194 that connect with RNA — including the autism-linked proteins FMRP, RBFOX1 and FOXP2.