Transforming Research with Data Curation Practices

Written by Center for Open Science | Dec 11, 2024 8:02:03 PM

The following blog is based on a recent webinar with the Data Curation Network. You can watch the full recording here.

Data curation is the ongoing process of managing research data throughout its lifecycle. This involves organizing, describing, cleaning, and preserving data to make it findable, accessible, interoperable, and reusable (FAIR).

The Data Curation Network (DCN), a membership organization of institutional and non-profit data repositories across the United States, advances open research by making data more ethical, reusable, and understandable. As part of these efforts, the DCN creates and disseminates curation best practices, develops community-focused resources and trainings, and facilitates dialogue around data curation.

As DCN members emphasize, data curation is essential for preserving and enhancing research, and for boosting its discoverability, reusability, impact, and integrity.

Why Curate Data?

As Wanda Marsolek, Data Curation Librarian at the University of Minnesota, explained, raw data can be messy and lacking in context. Without proper curation, data can be fragmented, inconsistent, and difficult to interpret.

By way of example, Marsolek shared a data set that did not contain variable labels indicating what each column represents.

“If we were to discover this data, we wouldn't be able to reuse it,” they said. “We wouldn’t have any idea what these rows or columns are – units of measurement, or anything like that.”
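
The kind of gap Marsolek describes can be caught with a simple check. The following is a minimal sketch, not a DCN tool: it assumes the data arrives as CSV text and that a data dictionary is available as a plain Python mapping. The function name, column names, and dictionary layout are all hypothetical.

```python
import csv
import io


def undocumented_columns(csv_text, data_dictionary):
    """Return column names that lack a description or unit in the data dictionary."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row is assumed to hold the column names
    missing = []
    for name in header:
        entry = data_dictionary.get(name, {})
        if not entry.get("description") or not entry.get("unit"):
            missing.append(name)
    return missing


# Hypothetical data set and data dictionary, for illustration only.
sample_csv = "site,temp,v3\nA,12.5,0.4\nB,14.1,0.7\n"
sample_dictionary = {
    "site": {"description": "Sampling site identifier", "unit": "none"},
    "temp": {"description": "Water temperature", "unit": "degrees Celsius"},
}

print(undocumented_columns(sample_csv, sample_dictionary))  # prints ['v3']
```

Here the column `v3` has no entry in the dictionary, so a curator would ask the researcher what it measures and in what unit.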

Data curation is also critical because digital file formats frequently become outdated or obsolete, and data stored in older formats can become inaccessible. Moreover, data is often scattered across various devices and locations. Proper organization and management through curation ensures that data remains discoverable, retrievable, and reusable over time.

“We can't just think about sharing data once, or even curating it once,” Marsolek said. “We have to think of sharing and curation as an ongoing process, especially as the scholarly outputs are reused.”

Who is Involved in Data Curation?

Data curation is a partnership between researchers, repository systems, and data professionals and curators. During the curation process, data is systematically organized, described, cleaned, enhanced, and preserved for public use – similar to how a museum curator prepares artwork for exhibition.

“Data curators are well-positioned to provide feedback on what should be shared openly, where the output should be shared, and what, if any, access restrictions might need to be placed on the data,” Marsolek said. “It is the role of the data curator to support researchers in this effort by providing different levels of curation support.”

Levels of Data Curation

Data sets can be reviewed at different depths, which in turn affects their findability, accessibility, and overall FAIRness. The choice of level may depend on time and capacity limitations, knowledge constraints, the needs of the data, and the partnership between collaborators.

At Level 0, data is deposited exactly as submitted, without cleaning, standardization, or quality checks. At Level 1, Record Level Curation, metadata is briefly reviewed. At Level 2, File Level Curation, the file arrangement is reviewed and format conversions are performed for increased accessibility. Level 3, Documentation Level Curation, is a review of documentation, along with requesting or adding missing information for increased reusability. Level 4, Data Level Curation, includes all of the above in addition to reviewing data contents and annotating or editing for accuracy and interoperability.

“No one of these [levels] is better than the other,” Marsolek said. “It can be a combination, and it’s actually a negotiation between curators, repositories, and researchers.”

“Not all data needs to be kept forever and curated really in-depth,” they continued. “We need to accept this and know when to curate the record and keep on going.”
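
For repositories that track curation depth in their own systems, the five levels described above could be represented roughly as follows. This is an illustrative sketch only: the class and function names are hypothetical, and the assumption that each level builds on the ones below it is ours. The webinar states this explicitly only for Level 4.

```python
from enum import IntEnum


class CurationLevel(IntEnum):
    AS_SUBMITTED = 0   # Level 0
    RECORD = 1         # Level 1, Record Level Curation
    FILE = 2           # Level 2, File Level Curation
    DOCUMENTATION = 3  # Level 3, Documentation Level Curation
    DATA = 4           # Level 4, Data Level Curation


REVIEW_ACTIVITIES = {
    CurationLevel.AS_SUBMITTED: "Deposit as submitted, with no cleaning or quality checks",
    CurationLevel.RECORD: "Briefly review metadata",
    CurationLevel.FILE: "Review file arrangement and convert formats for accessibility",
    CurationLevel.DOCUMENTATION: "Review documentation and request or add missing information",
    CurationLevel.DATA: "Review data contents and annotate or edit for accuracy",
}


def review_plan(level):
    """List review activities for a level.

    Assumption: levels 1-4 are cumulative (only stated for Level 4 in the source).
    """
    level = CurationLevel(level)
    if level == CurationLevel.AS_SUBMITTED:
        return [REVIEW_ACTIVITIES[level]]
    return [REVIEW_ACTIVITIES[step] for step in CurationLevel if 1 <= step <= level]


for activity in review_plan(CurationLevel.DOCUMENTATION):
    print(activity)
```

Because the right level is negotiated case by case, a real workflow would likely let curators pick and combine activities rather than follow a fixed ladder like this one.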

Data Curation Resources

Sophia Lafferty-Hess, Senior Research Data Management Consultant and Curator at Duke University Libraries, highlighted several data curation resources that the DCN has created or helped facilitate.

The DCN has developed a standardized model for curating research data called CURATE(D). Designed as a teaching tool, the CURATE(D) steps are a guide for onboarding data curators and assisting researchers as they prepare to share their data. The DCN provides detailed checklists for each of the CURATE(D) steps.

As Lafferty-Hess advised, curation may not follow the same workflow each time, as procedures may differ depending on data needs or institutional processes. While presented sequentially, the CURATE(D) process is not necessarily linear. Curation can jump between steps and repeat actions as necessary.

The DCN offers online training modules that follow the CURATE(D) workshop curriculum. These materials are designed for those who are new to data curation or looking to refresh their skills, as well as researchers who want to apply curation knowledge to managing their data.

Additionally, the DCN has created peer-reviewed data curation primers. These community-developed resources guide curators through processes for specific disciplines, data types, file formats, and curation topics. The DCN also maintains a robust list of curator resources and tools created by its members and the broader library and information science community.

Data Curation Using the OSF

The OSF, developed by COS, is a free and open-source platform that supports researchers throughout their entire project lifecycle.

The OSF facilitates data curation by enabling researchers to organize, store, version control, and enrich their data. Users can add metadata at multiple levels, including detailed descriptions, resource types, licensing information, and funding details. This allows for greater discoverability, access, and reuse of data while maintaining a record of changes and ensuring data integrity.
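
Before depositing, a researcher might check that the kinds of metadata mentioned above are filled in. The sketch below is a local checklist only. It does not use the OSF API or reproduce the OSF metadata schema, and the class and field names are hypothetical, chosen to mirror the four items named in the paragraph above.

```python
from dataclasses import dataclass, fields


@dataclass
class ProjectMetadata:
    """Hypothetical record of the metadata a researcher plans to enter."""
    description: str = ""
    resource_type: str = ""
    license: str = ""
    funding: str = ""

    def missing_fields(self):
        """Return the names of fields that are empty or whitespace only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative values, not taken from a real project.
record = ProjectMetadata(
    description="Stream temperature readings, 2023 field season",
    resource_type="Dataset",
)

print(record.missing_fields())  # prints ['license', 'funding']
```

A checklist like this only confirms that something was entered. Whether the description is actually useful to a future reader is still a judgment for the researcher and curator.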

The platform also offers features like community-created metadata templates that allow researchers to add discipline-specific descriptive details. Additionally, a built-in Wiki feature enables researchers to create comprehensive README files that provide context and interpretation guidelines for their research materials.

For institutional users, the OSF offers advanced curation features including authentication, content aggregation, and user activity metrics – enabling institutional liaisons such as research support staff and data management librarians to see where they may be able to offer additional support to researchers.

Together, resources like those offered by the DCN and platforms like the OSF play a pivotal role in advancing proper curation practices and ensuring that research data remains accessible, usable, and impactful over time.
