Neuroscience graduate students deserve comprehensive data-literacy education

Despite growing requirements for how researchers handle and share data, formal training is lacking.

Culture shift: Adding data-literacy training to graduate school curricula will require investment in the time required to learn and teach these new skills.
Illustration by Daniel Barreto

Data-handling and data-sharing are essential parts of the research endeavor, yet most trainees learn these skills through trial and error. Before I went to graduate school, I worked at a contract research organization supporting data management and sharing for clinical trials across the United States. I anticipated that all I had learned in that highly standardized environment would give me a leg up, but I was shocked to find that formal graduate school education completely left out essential basic concepts, such as data validation, metadata and stable file formats.
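
To make those concepts concrete, here is a minimal sketch, in R, of the kind of data validation and stable-format export I mean; the file name and column names are hypothetical, for illustration only.

    # Minimal data-validation sketch in base R; file and column names
    # are hypothetical, for illustration only.
    trials <- read.csv("reaction_times.csv")

    # Validate before analysis: required columns exist, values are
    # plausible and no subject/condition pair appears twice.
    stopifnot(
      all(c("subject_id", "condition", "rt_ms") %in% names(trials)),
      all(trials$rt_ms > 0, na.rm = TRUE),
      !any(duplicated(trials[, c("subject_id", "condition")]))
    )

    # Save the validated table in a stable, software-agnostic format.
    write.csv(trials, "reaction_times_validated.csv", row.names = FALSE)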

In the past 10 years, data-sharing requirements have expanded, but little has changed to institutionalize data-literacy training, outside of the occasional workshop or boot camp. As a faculty member, I developed my own course and was surprised to discover that a definition of basic data-literacy skills has yet to be established for graduate students in the U.S. Some American universities outline broad core competencies for trainees in the biomedical sciences (see, for example, those for the University of California, Berkeley and Drexel University), but these outlines provide virtually no practical details about working with research data. (The situation is similar in the European Union and Canada.)

If we want to move neuroscience forward, this must change. The next generation deserves formal, sustained data-literacy education. This type of training can have a profound impact on a student’s approach to science, and in turn on the field.

Neuroscience presents a particular challenge for data-sharing, only increasing the need for training. The field is constantly evolving, and so are the data. Just as researchers must learn new techniques, they also need to evolve their data-handling skills. To prepare students for the demands of the field, we need faculty who are dedicated to developing innovative course content, keeping up with disciplinary changes and delivering the new material.

Implementing such training on a broad scale will require a cultural shift, however. We need more people to develop the required coursework, and we need to make that work a priority.

In 2016, three years after I finished my graduate training, I joined the Oregon Health & Science University library as the “biomedical science liaison,” charged with bridging communication between researchers and librarians. As a neuroscientist, I was able to identify and articulate the needs of the research community to other faculty librarians and, conversely, to convey to researchers what library resources and expertise were available to them.

It was from my librarian colleagues that I deepened my understanding of open-source research tools, such as Git and R, as alternatives to high-cost statistical software. I also became well versed in the value of data-licensing agreements, embargoes and metadata as ways to promote data-sharing while combating concerns about being “scooped.” And I became especially interested in introducing trainees to these resources; as our future scientific leaders, they could be the most influential group to learn and implement best practices in data-handling and data-sharing within the research ecosystem.
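
To give a flavor of what that looks like in practice, here is a small, hypothetical example of a standard two-group comparison done entirely in free, open-source R, the kind of analysis that once seemed to require an expensive license. The data are simulated and the variable names are mine, for illustration only.

    # A two-group comparison in base R; the data are simulated and the
    # variable names hypothetical, for illustration only.
    set.seed(1)
    df <- data.frame(
      group = rep(c("control", "treatment"), each = 20),
      score = c(rnorm(20, mean = 50, sd = 8), rnorm(20, mean = 56, sd = 8))
    )

    t.test(score ~ group, data = df)       # Welch two-sample t-test
    summary(lm(score ~ group, data = df))  # the same comparison as a linear model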

Building on workshop sessions I offered through the library, and on conversations with students, I created a for-credit course in 2017 based entirely on research methods. The aim was to expose first- and second-year graduate students to data-literacy topics early in their training and to give them protected time to learn new skills. In designing the curriculum, I curated resources from a variety of organizations, including The Carpentries and the Research Data Alliance, and from notable researchers, such as Jeff Leek and Lisa Federer, who advocate for data-sharing and open-science practices.

I organized the curriculum in alignment with the recently established FAIR (Findable, Accessible, Interoperable and Reusable) data principles, which have been adopted across a global scientific network, notably through the mission of the International Neuroinformatics Coordinating Facility. I also made a point to show students how data-handling and data-sharing are intimately connected. To share data effectively, the researcher must, in a sense, “reverse engineer” a data-handling plan, thinking ahead to the prospective experimental outcomes in order to define data types, standards, formatting and other factors.
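
As one hypothetical illustration of what that thinking ahead can produce, a student might draft a small machine-readable metadata record alongside the data themselves. This is a sketch in the spirit of the FAIR principles; the field names are illustrative, not an official schema.

    # A minimal, hypothetical metadata record written with the jsonlite
    # package; the fields shown are illustrative, not an official standard.
    library(jsonlite)

    metadata <- list(
      title        = "Example rodent behavior recordings",
      creator      = "Lastname, Firstname",
      license      = "CC-BY-4.0",   # explicit reuse terms make data reusable
      format       = "text/csv",    # stable, open file format
      variables    = list(
        list(name = "subject_id", type = "string"),
        list(name = "rt_ms", type = "number", unit = "millisecond")
      ),
      date_created = "2017-01-15"
    )

    write_json(metadata, "dataset_metadata.json", pretty = TRUE, auto_unbox = TRUE)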

As we approached the end of the term, it was clear that the one-time offering was not enough. Students wanted ongoing opportunities for mentored training, specifically for working with their data during their doctoral studies. Applying what they learned in the classroom to their evolving research questions had its own challenges, particularly given the significant time gaps between skill-building and practical application. But having proficiency in preserving and sharing data is particularly important at the end of the Ph.D. training period, as graduate students wrap up their work and prepare to leave the lab.

Most graduate programs expect students to learn these skills on the fly, but it’s unreasonable to ask them to piece together an individual academic program from data-literacy tools, skills-development workshops and other resources. Self-directed efforts by students should supplement rather than replace the didactic education delivered by an expert, mirroring their experiences with all other compulsory core courses in their graduate training.

Adding this training to graduate school curricula will require a culture shift around how we invest in the time required to learn and teach these new skills. This will be particularly difficult at research-intensive institutions. As someone who also advocates for STEM equity, I’ve learned that changing culture requires a collective shift in attitudes and beliefs. As a start, we need to provide graduate students with protected time to learn these skills. Training programs should include data-literacy courses and credits in their degree requirements, signaling that advanced data-literacy skills are valuable not just to the trainee but to the overall scientific enterprise.

These kinds of courses will also create demand for faculty with relevant expertise. To incentivize faculty to develop these skills, and thereby increase the pool of available educators, institutions could include data-handling and data-sharing practices as a standard component of annual reviews and promotion. The creation of faculty awards and other forms of recognition would also help encourage the teaching and development of new curricula.

Through my own course, I saw firsthand how important these efforts are. Graduate students told me how empowered they felt after developing their data-literacy skills. Students requested time after class to show me how they had applied what they learned to their own data, completing their first data analysis in RStudio and sharing their code, for example. And they brought that knowledge to their labs, making plans to tidy up messy document-storage systems. These junior scientists were on their way to becoming the change we so needed them to be. Soon, I hope to see the same revolutionary spirit from my faculty peers.


This series of scientist-written essays explores some of the benefits and challenges of data-sharing.