From Research to Researcher: Producing Findable, Accessible, Interoperable and Reusable Data

Photo Credit: Theodore Nelson

For the past six weeks, I have been fortunate to join peers from across the United States in the dkNET Summer of Data Student Program, learning about the principles of Findable, Accessible, Interoperable and Reusable (FAIR) data, both in how such data is generated and in how it is applied.

dkNET is a working group that attempts to integrate data in a FAIR manner across scientific disciplines, which in turn allows for meaningful collaboration with lab partners and, more broadly, across the medical and scientific community[1]. At each session, I learned about the challenges inherent in creating a reproducible scientific ecosystem and infrastructure and, with dkNET a pioneer in this field, I came away from the program with some significant takeaways. First, all published products require a persistent and unique identifier, the most well-known being the digital object identifier (DOI)[2]. DOIs have been widely applied within journal publishing, and now serve as the gold standard for digital objects that need to be easy to access and permanent. Within our community, Columbia Libraries manages Academic Commons, a digital repository of scholarship produced by academics at Columbia University. For wet-lab protocols, we were introduced to protocols.io, which assigns a DOI to each published protocol and allows groups to directly fork existing protocols, that is, to take a particular methodology and build on it, while maintaining a clear and communicable genealogy for that research methodology. dkNET also maintains an up-to-date curated list of repositories for scientific data deposition, in an effort to ensure best practices across disciplines.

Many components of published projects require something like a DOI. Unique identifiers help to disambiguate similarly named things and make it easy to aggregate information on them. At the micro level, the methodologies and data assigned a DOI require specific materials and tools to be reproducible. These can now be collated across studies using Research Resource Identifiers (RRIDs), which allow wet-lab reagents and tools to be cataloged consistently. Based on these studies, dkNET produces resource reports, which show how widely a particular item has been used and validated. Research involving antibodies and cell lines, for example, is plagued by concerns about lack of specificity and contamination, respectively. dkNET's resource reports make it easier both to avoid these issues and to identify the reagent with the greatest validation across the scientific literature.
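To make the idea of collating resources across studies a little more concrete, here is a minimal Python sketch of how RRID citations might be pulled out of methods text and tallied. It is only an illustration, not how dkNET builds its resource reports: the regular expression is a loose assumption rather than the official RRID grammar (it will not fully capture identifiers that contain a second colon), the function names are hypothetical, and the identifiers in the sample text are made-up placeholders.

```python
import re
from collections import Counter

# Loose, illustrative pattern for citations such as "RRID:AB_000001".
# Assumption: a prefix of letters, an underscore, then letters or digits.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9]+)")


def extract_rrids(text):
    """Return the set of RRIDs cited in one piece of text."""
    return {match.upper() for match in RRID_PATTERN.findall(text)}


def count_citing_studies(methods_sections):
    """Count how many studies cite each RRID (one count per study)."""
    counts = Counter()
    for text in methods_sections:
        counts.update(extract_rrids(text))
    return counts


# Made-up methods snippets; the identifiers are placeholders, not real records.
studies = [
    "Cells were stained with anti-X (RRID:AB_000001) in line Y (RRID:CVCL_0001).",
    "We used anti-X (RRID: AB_000001) for western blots.",
]
counts = count_citing_studies(studies)
for rrid, n in counts.most_common():
    print(rrid, n)
```

Because each study contributes a set of identifiers, a reagent mentioned several times in one paper is still counted once for that paper, which is closer to "how many studies used this item" than a raw mention count.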

In sum, once a base of literature has been built around FAIR data principles, computational analysis and the statistical power that comes with pooled studies can be applied to the underlying biological data and results. dkNET stands at the forefront of these efforts, and as FAIR data management practices continue to expand, the opportunities for this kind of meta-analysis research will grow considerably.
