
Data access changes to UK Biobank stir unease in neuroscientists

“I feel a little bit in limbo,” says neuroscientist Stephanie Noble, who has paused a study using Biobank data after the repository shifted from a data download to a cloud-only access model.

The UK Biobank—a trove of brain scans and other health information from people in the United Kingdom—has changed the way researchers can access its data, the organization announced earlier this month.

Instead of downloading data, researchers must now access and analyze them on the Biobank’s cloud-based platform, according to the announcement. Scientists who have already downloaded data can still analyze their local copies until their approved projects end, at which point they must delete those copies.

The move prompted neuroscientists to express concerns on X about increased fees and logistical burdens associated with the change. Many institutions cover computing costs on their local networks, or even invest in their own supercomputing centers, so individual labs pay only to access the data. Now, labs would need to pay for computing on the cloud platform, which is hosted by Amazon Web Services.

“A lot of research groups that use this sort of data just aren’t going to keep doing research, because it’s just going to be cost prohibitive,” Timothy Raben, a postdoctoral researcher in statistical genetics at Michigan State University, told The Transmitter. Raben’s annual research expenses would double if he ran all of his analyses on the cloud platform, according to a thread he posted on X.

The change will be especially costly and cumbersome for researchers who use imaging data or develop computational pipelines, says Franco Pestilli, associate professor of psychology at the University of Texas at Austin. Those researchers need to download the data for their work, he says, whereas those who study the relationship between lifestyle factors and aging, for example, “might be fine” with sticking to the platform.

Imaging researchers often tweak and re-run an analysis several times, which is not a problem on local networks, but the expense would be “extremely prohibitive” on a commercial cloud, says Elvisha Dhamala, assistant professor of psychiatry at the Feinstein Institutes for Medical Research.

Stephanie Noble, assistant professor in the psychology and bioengineering departments and at the Center for Cognitive and Brain Health at Northeastern University, says she recently obtained approval to use Biobank data in a study and has spent the past month designing an analysis pipeline and planning a budget; she has not yet downloaded any data. When she learned about the data access change, she says, “I was very surprised; I was a bit apprehensive.” Now her preparation is paused while she awaits training on how to use the platform. “I feel a little bit in limbo.”

The database launched in 2012 and contains genetic and health information, including lifestyle information, health records, DNA sequences, brain imaging scans, and blood, saliva and urine samples, from 500,000 adults.

The size and depth of the data make the UK Biobank “one of a kind,” Noble says. “It’s kind of a force to be reckoned with. In my opinion, it seems to open doors to all kinds of research that people would not have been able to do otherwise.”

Yet the size of the dataset—about 30 petabytes, equal to 30 million gigabytes—has made it increasingly complex to manage, says Rory Collins, principal investigator and CEO of the UK Biobank, a nonprofit organization funded by the Wellcome Trust, UK Medical Research Council and other U.K. government and philanthropic entities.

The switch provides greater control over the growing dataset and extra layers of security for sensitive patient health data, Collins says. The transition is “painful, but it is the way to secure the long-term stability of the resource and the ability to make data available to researchers,” Collins says.

The UK Biobank has a program that provides credits in increments of 1,000 pounds (about $1,300) for data computation and storage to early-career researchers and scientists in low-income countries, which “will be extended to all researchers who need such support,” according to the announcement.

Exceptions to the cloud-only rule are also possible. If a researcher would like to use Biobank data but their analysis would be too expensive or cumbersome to run on the cloud platform, they can apply for a download exemption through the access committee.

“I was slightly encouraged by that,” Raben says, “because to me, that means that somebody is at least taking it somewhat seriously, and if they have an exemption policy that works well for most researchers, that would be amazing. I think the proof will be in the pudding.”

But it’s unclear how liberal the exemption policy will be. “We will take it on a case-by-case basis,” Collins says. Researchers must “make a compelling case for an exemption based on not being able to do some kind of research on the [platform], and we would anticipate working with them in parallel to develop the capacity to support such research on the [platform] in future.”

The UK Biobank introduced the cloud platform in 2021 as a secure way to store and share genetic sequence data. The genetic data have always been available only in the cloud, Collins says, and the organization planned to move the entire dataset onto the platform when it was ready to handle it. “So now in 2024, it feels like the right time to transition to the data being available only on the platform,” Collins says.

Even more health records may become accessible now that the entire dataset has moved to the cloud model, Collins says. Currently, the dataset contains records from hospitals but not from primary care physician visits, which are “carefully controlled,” Collins says. “I think that this move to the data being available on the platform increases our ability, our likelihood of being able to get primary care data. The value of that would be enormous.”

This decision is the “tip of the iceberg” and signals important questions the neuroscience field should discuss surrounding data and computing infrastructure, Pestilli says.

So far, the financial cost of supercomputing has largely been handled at the institution level, he says: Federal datasets in the United States, such as the Adolescent Brain Cognitive Development (ABCD) Study and the Human Connectome Project, are available for download—often at a financial loss for the federal agency—and researchers analyze the data at national or university computing centers. When data are housed and analyzed in commercial cloud platforms instead, researchers must carve out a portion of their grant to pay for computing costs that they otherwise would not have had to account for. If more datasets—and dollars—migrate into commercial cloud space, national and university computing centers may become obsolete.

What are the consequences of “centralizing data in the hands of the few, versus leaving expertise and ability to host and analyze the data to universities?” Pestilli says. That is the “very big picture that is part of what I think is starting here.”
