FAIR Data

The FAIR principles are guidelines for making all research outputs, including but not limited to research data, software, and protocols, Findable, Accessible, Interoperable and Reusable. See: https://doi.org/10.1038/sdata.2016.18

 

FINDABLE:

   F1. (meta)data are assigned a globally unique and persistent identifier

   F2. data are described with rich metadata (defined by R1 below)

   F3. metadata clearly and explicitly include the identifier of the data it describes

   F4. (meta)data are registered or indexed in a searchable resource

ACCESSIBLE:

   A1. (meta)data are retrievable by their identifier using a standardized communications protocol

      A1.1 the protocol is open, free, and universally implementable

      A1.2 the protocol allows for an authentication and authorization procedure, where necessary

   A2. metadata are accessible, even when the data are no longer available
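As an illustration of A1: a DOI is an identifier that resolves over HTTPS, an open and universally implementable protocol. The sketch below builds, but does not send, a request that asks the DOI resolver for machine-readable metadata through content negotiation. The helper name `build_metadata_request` is hypothetical, the DOI is that of the FAIR principles paper cited above, and whether a given DOI actually returns this format depends on its registration agency.

```python
from urllib.parse import quote
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"
# Media type for CSL JSON, commonly served through DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> Request:
    """Build an HTTPS request for the metadata behind a DOI (hypothetical helper)."""
    url = DOI_RESOLVER + quote(doi, safe="/")
    # The Accept header asks for metadata instead of the human-readable landing page.
    return Request(url, headers={"Accept": CSL_JSON})


request = build_metadata_request("10.1038/sdata.2016.18")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing `request` to `urllib.request.urlopen` would perform the actual retrieval.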

INTEROPERABLE:

   I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

   I2. (meta)data use vocabularies that follow FAIR principles

   I3. (meta)data include qualified references to other (meta)data

REUSABLE:

   R1. (meta)data are richly described with a plurality of accurate and relevant attributes

      R1.1. (meta)data are released with a clear and accessible data usage license. 

               Find help with the License Selector Tool 

      R1.2. (meta)data are associated with detailed provenance

      R1.3. (meta)data meet domain-relevant community standards
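Several of the principles above translate into fields that a metadata record should carry. The following is a minimal sketch of a completeness check, assuming a simple dictionary-based record. The field names, the mapping to principles, and the placeholder DOI are illustrative assumptions, not a community standard.

```python
# Hypothetical required fields, each mapped to the principle it supports.
REQUIRED_FIELDS = {
    "identifier": "F1: globally unique and persistent identifier",
    "title": "F2: rich descriptive metadata",
    "description": "F2: rich descriptive metadata",
    "license": "R1.1: clear and accessible data usage license",
    "provenance": "R1.2: detailed provenance",
}


def missing_fair_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record: the DOI and the values are made up for illustration.
record = {
    "identifier": "https://doi.org/10.0000/example-dataset",
    "title": "Example sensor measurements",
    "description": "",
    "license": "CC-BY-4.0",
}

for name in missing_fair_fields(record):
    print(f"missing {name} ({REQUIRED_FIELDS[name]})")
```

A check like this only confirms that fields are present; whether the metadata is rich enough and meets domain standards (R1.3) still needs human judgment.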

FAIR-Aware - self-assessment tool curated by fairsfair.eu (DANS, DCC, UNIHB)

FAIR self-assessment tool by Australian Research Data Commons

FAIR Data is NOT OPEN Data! Remember that data can be restricted and still be FAIR. The guiding principle to follow is “as open as possible, as closed as necessary”, depending on specific needs, for example to protect sensitive data or intellectual property.

 

To know more…

What is the difference between “FAIR data” and “Open data” if there is one?

Three camps, one destination: the intersections of research data management, FAIR and Open

Think about how research and innovation could grow faster owing to the increased reproducibility and transparency enabled by FAIR/Open data, and think about the people who could benefit from your data. The very first to benefit from FAIR data is you! “As a scientist, you should treat your data like a love letter to your future self” (Lambert Heller, German National Library of Science and Technology - Nature Index 360°, Feb 2019)

Source: Sara Jones, DCC, University of Glasgow, Open Science Days 2015, 21st & 23rd April, Prague & Brno http://foster.czu.cz/?r=6661

FAIR metadata standards vary from discipline to discipline. Some resources that may be helpful to find the standards for your specific field of research are

Please be aware that in some fields of engineering, technology, and design, FAIR data standards are still evolving. It is useful to check with your research community about the development and co-creation of these standards.

The journey towards FAIR data starts even before you start your research project. A DMP often nudges you to think about your RDM practices right from the beginning. During the research cycle, as shown in the example figure below, you will need to manage and store your data while the research is under way; this is referred to as active data management.

 

Researchers often use university IT storage and compute resources (preferred) or their own servers for active data management. It is important to devise a storage and backup strategy for your active data while writing your DMP. Near or at the end of the project, you will also choose which data to preserve for the long term; this process is called archiving. To archive your research outputs for the long term you need a certified repository, see below. Before you deposit your data or research outputs in a repository, you need to select, from all your outputs, what needs to be preserved.
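One building block of a backup strategy is being able to verify that a copy is still identical to the original. The following is a minimal sketch, assuming SHA-256 checksums are an acceptable fixity check for your project. The function names and the sample file are illustrative, and the demo runs in a temporary directory.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (by relative path) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


# Demo: a manifest taken before and after a file changes no longer matches.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "raw").mkdir()
    sample = root / "raw" / "sample.csv"
    sample.write_text("id,value\n1,42\n")
    original = build_manifest(root)
    sample.write_text("id,value\n1,43\n")
    changed = build_manifest(root)

print("manifests match:", original == changed)
```

Storing such a manifest next to the data lets you compare a backup against the working copy before you rely on it.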

Source: Stanford University Libraries

Different research disciplines have different research outputs and you may need to consider various components to decide what to select and preserve. Here are some general guidelines that apply to most research disciplines. For your own discipline check with your community best practices and/or consult your RDM consultant.

Definitely deposit:

  • Original data sets, original software code, raw data obtained from analysis of physical samples, observational data that cannot be regenerated.
  • Data sets that are not original but that are not easily available online and that you have permission to share.
  • For social science data, include study descriptions, codebooks, and summary statistics.

Maybe deposit:

Intermediate versions of analyses or code if they are potentially useful to others or were used in publications or theses.

Not necessary to deposit:

  • Incomplete, non-functional, or intermediate versions of code that would be of marginal usefulness to others.
  • Output files from analyses if 1) the data set and code used to generate the output are deposited and 2) regenerating the output from the deposited files is fairly easy to do.
  • Data sets that are preserved and accessible via other institutions or organizations.
  • Graphs or charts created from the original data that could easily be regenerated.

Don’t deposit:

Any data that contains personal identifying information for human subjects or data that could breach legal contracts.

Exceptions:

Output files from analyses may be deposited if they are time-intensive to regenerate, are not excessively large, or cannot be easily recreated from the deposited data set and code.
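The selection guidelines above can be read as a small set of rules. The sketch below is a deliberately simplified encoding of them: the `ResearchOutput` fields and the `kind` labels are hypothetical, the rule order is an assumption, and real decisions should still follow your community's best practices.

```python
from dataclasses import dataclass


@dataclass
class ResearchOutput:
    name: str
    kind: str  # "original", "third_party", "intermediate" or "derived"
    sensitive: bool = False            # personal identifying information or contractual limits
    available_elsewhere: bool = False  # preserved and accessible via another organization
    may_share: bool = True             # you have permission to share it
    useful_to_others: bool = False     # e.g. used in a publication or thesis
    hard_to_regenerate: bool = False   # time-intensive or impossible to recreate


def deposit_advice(item: ResearchOutput) -> str:
    """Return a rough deposit recommendation for one research output."""
    if item.sensitive:
        return "don't deposit"
    if item.available_elsewhere:
        return "not necessary"
    if item.kind == "original":
        return "definitely deposit"
    if item.kind == "third_party":
        # Assumption: without permission to share, the data should not be deposited.
        return "definitely deposit" if item.may_share else "don't deposit"
    if item.kind == "intermediate":
        return "maybe deposit" if item.useful_to_others else "not necessary"
    if item.kind == "derived":
        # The exception above: outputs that are costly to regenerate may be deposited.
        return "maybe deposit" if item.hard_to_regenerate else "not necessary"
    raise ValueError(f"unknown kind: {item.kind!r}")


outputs = [
    ResearchOutput("field_observations.csv", "original"),
    ResearchOutput("interview_transcripts.txt", "original", sensitive=True),
    ResearchOutput("figure_3.png", "derived"),
    ResearchOutput("simulation_results.h5", "derived", hard_to_regenerate=True),
]
for item in outputs:
    print(f"{item.name}: {deposit_advice(item)}")
```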

A repository, in the context of research data and outputs, is a digital environment that allows you to preserve your research data and other digital outputs for the long term. Essentially, it should offer the following functionalities:

  1. Stores the data safely
  2. Makes sure the data is findable
  3. Describes the data appropriately (metadata)
  4. Adds license information

You can deposit data in a general repository (e.g. Zenodo, Harvard Dataverse) or a subject-specific repository (e.g. Dryad). Looking for your discipline? Search www.re3data.org for more suitable data repositories. See a demonstration of searching for research data repositories using the re3data directory.

 

Source: OpenAIRE

OpenAIRE has a detailed costing guide that gives an indication of the time, effort and budget required for RDM activities, ranging from data storage, data cleaning and software license fees to data analysis and archiving in a repository.

Please remember that these costs can be budgeted in your funding proposals.

Source: https://doi.org/10.1515/itit-2019-0040

We cannot imagine conducting research without software. Researchers either use software for research activities or develop their own software as part of their research outputs.

For good scientific practice, research software should adhere to the FAIR principles to allow full repeatability, reproducibility, and reuse. Research software should be both archived for reproducibility and actively maintained for reusability.

Publishing research software as open source on platforms such as GitHub and GitLab is an established practice in science.

Growing community initiatives such as Software Carpentry train researchers who have no educational background in software development or programming to establish workflows that help them manage, track, preserve and, where applicable, share or publish research software using task automation tools and version control systems such as Git.

For more on making software FAIR read: Top 10 FAIR Data, Software and Things