A newly developed statistical tool offers a way to control for unmeasured confounding factors in investigations of the downstream effects of autism-linked variants. Researchers describe the tool, called causarray, in a preprint posted on bioRxiv last month.
Existing statistical methods can help establish which variants or genetic changes are associated with alterations in gene expression but not whether those links are causal, says the study’s principal investigator, Kathryn Roeder, professor of statistics and life sciences at Carnegie Mellon University.
For example, there are more than 100 autism-linked genes, but each likely affects only a small proportion of people with the condition. “A major question for the field is whether [and] how so many distinct genes converge biologically,” says Michael Gandal, associate professor of psychiatry, genetics and pediatrics at the University of Pennsylvania, who was not involved in the tool’s development.
Advanced techniques such as Perturb-seq, which combines gene editing with single-cell RNA sequencing, have enabled scientists to alter genes associated with autism or other brain conditions in individual cells and observe the resulting changes in gene expression in the developing mouse brain. With the growth of these approaches, it is crucial to have statistical tools that can separate the direct effects of genetic changes from indirect effects caused by other factors, which can be hard to spot, Gandal says. “By explicitly modeling unmeasured confounding factors, causarray will account for these potential biases, increasing the robustness of the estimated causal effects,” he says.
When scientists introduce a genetic variant into a cell and compare gene expression with a control cell, many factors besides the variant itself, including differences between experiments and variations in the cells themselves, could affect gene expression.
Most tools that control for potential confounders in data adjust for factors such as laboratory conditions or the age and sex of the animal or person the sample came from, but these techniques fail to handle influences that can’t be measured directly, Roeder says.
Causarray helps scientists account for unmeasured confounders. The tool uses a combination of machine learning and statistical techniques to model gene expression and estimate what would have happened if a key factor, such as a treatment or a genetic change, had been different. This approach is known as counterfactual inference. By comparing the counterfactual outcomes with actual outcomes, causarray can isolate the direct effects of genetic variants on gene expression.
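To make the idea concrete, here is a minimal sketch of how a hidden factor can distort a simple treated-versus-control comparison, and how adjusting for an estimate of that factor can bring the answer closer to the true effect. It is not causarray's code or algorithm. It is a toy simulation using only Python's standard library, and every name and number in it (simulate, naive_effect, adjusted_effect, the effect size, the number of proxy genes) is an assumption made for illustration. The hidden factor is approximated by averaging other genes that it also influences, and the adjustment is an ordinary linear regression. Under that linear model, the fitted coefficient equals the average gap between each cell's predicted expression with and without the perturbation, which is the counterfactual contrast described above.

```python
import math
import random
from statistics import fmean


def simulate(n_cells=4000, n_proxy_genes=20, true_effect=1.0, seed=0):
    """Simulate cells in which a hidden factor u raises both the chance of
    carrying the perturbation and the expression of every gene.
    All settings here are illustrative assumptions, not values from the preprint."""
    rng = random.Random(seed)
    treated, target, proxies = [], [], []
    for _ in range(n_cells):
        u = rng.gauss(0, 1)  # unmeasured confounder
        t = 1.0 if rng.random() < 1 / (1 + math.exp(-1.5 * u)) else 0.0
        treated.append(t)
        # Gene of interest: true perturbation effect + hidden factor + noise.
        target.append(true_effect * t + u + rng.gauss(0, 1))
        # Other genes that respond to the hidden factor but not the perturbation.
        proxies.append([u + rng.gauss(0, 1) for _ in range(n_proxy_genes)])
    return treated, target, proxies


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Part of y that a straight-line fit on x does not explain."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def naive_effect(treated, target):
    """Difference in mean expression, ignoring the hidden factor."""
    on = [y for t, y in zip(treated, target) if t == 1.0]
    off = [y for t, y in zip(treated, target) if t == 0.0]
    return fmean(on) - fmean(off)


def adjusted_effect(treated, target, proxies):
    """Estimate the hidden factor from the proxy genes, then adjust for it."""
    u_hat = [fmean(row) for row in proxies]  # stand-in for the hidden factor
    # Strip u_hat out of both treatment and outcome, then regress what is left.
    # This equals the treatment coefficient in a regression on (treated, u_hat).
    rt = residuals(u_hat, treated)
    ry = residuals(u_hat, target)
    _, effect = fit_line(rt, ry)
    return effect


if __name__ == "__main__":
    treated, target, proxies = simulate()
    print("true effect:      1.0")
    print("naive estimate:   ", round(naive_effect(treated, target), 2))
    print("adjusted estimate:", round(adjusted_effect(treated, target, proxies), 2))
```

In this toy setup the naive comparison overstates the effect, because the hidden factor pushes up both the odds of perturbation and the gene's expression. The adjusted estimate lands much closer to the simulated truth, though not exactly on it, since the proxy genes give only a noisy read on the hidden factor.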
Causarray outperformed existing methods that do not use a counterfactual approach, the researchers found. For example, it produced half as many false positives as another method that corrects for hidden biases in gene-expression data, meaning it was better at identifying truly significant genetic changes rather than random noise. When used to analyze Perturb-seq data, causarray identified relationships between autism-related gene variants and downstream effects that more accurately reflect the expected causal effects of perturbations in autism-associated genes, such as the role of SATB2 in neuron development and synapse structure.
Other methods, Roeder adds, associate autism-linked variants with processes such as ribosomal function. Although past studies have linked ribosome activity to autism, these changes could be a ripple effect caused by disruptions in genes that affect synapses.
“I was extremely happy with our analysis,” Roeder says of this experiment. “Each one of these autism genes that they knocked down had a large number of downstream genes that were significantly changed—and, we believe, in a causal way.”
When used on Alzheimer’s disease data, causarray revealed that some genes related to the condition are more affected by aging than others—a finding that could help scientists identify new treatment targets, the preprint suggests.
The causarray tool is similar to matching approaches, which compare similar cells with and without the treatment, but it goes further by estimating hidden influences on gene expression, says Jingyi Jessica Li, professor of statistics at the University of California, Los Angeles, who was not involved in the study.
Causarray offers a more accurate way to identify the direct effects of specific genetic changes involved in autism, Gandal says. But, he adds, sometimes confounding factors are tied to a variable of interest, such as a genetic manipulation, making it hard to identify true causal relationships.
Roeder’s latest work is “rigorous and creative,” says Steven McCarroll, professor of biomedical science and genetics at Harvard Medical School, who was not involved in causarray’s development. The study also serves as a reminder of how much can be learned by reanalyzing datasets in new ways, he says. “I expect [causarray] will be impactful for research on the biological basis of brain disorders.”