
Manage Research Data: Introduction

Introduction to managing research data

Institutions and individuals are discovering the benefits of planning and documenting research data. The amount of data to be collected (size of files), the security of that data, and responsibilities for and ownership of the data are being considered at an earlier stage of the research lifecycle. Both national and international funding bodies increasingly require researchers to provide evidence of appropriate data management and curation in grant applications.

ECU researchers are encouraged to integrate sound data management practices into their projects from the Pre-research stage. The Research Ethics Requirements and Research Data Management Planning processes address interrelated issues. ECU's Research Integrity provides further information about research management at ECU.

Why is Metadata important to Research Data Management?

Recording accurate and useful metadata from the outset of your research project will not only make it easier for you to find, access and analyse the data as the project continues but will also make the completion of data management plans and grant applications quicker and easier. Publishing your data as a supplement to a journal article or as a standalone dataset is also much simpler and more useful to the research community if relevant metadata has been recorded throughout the data collection stages.

 

What is Metadata?

Metadata can be described as structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource (National Information Standards Organization (NISO)).

  • Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords and, for tables, column and row names.
  • Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.

  • Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it.
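
As a rough illustration, the three types of metadata can be recorded together in one structured record. The sketch below uses Python; every field name and value in it is a hypothetical example, not part of any formal metadata schema.

```python
import json

# Hypothetical metadata record for a single dataset, grouped by the
# three metadata types described above. All field names and values are
# illustrative assumptions, not part of any formal schema.
metadata = {
    "descriptive": {
        "title": "Example survey responses",
        "abstract": "Responses to a hypothetical survey.",
        "author": "A. Researcher",
        "keywords": ["survey", "example"],
        "column_names": ["participant_id", "age", "response"],
    },
    "structural": {
        # How the parts of a compound object fit together.
        "files_in_order": ["part_01.csv", "part_02.csv"],
    },
    "administrative": {
        "created": "2024-01-15",
        "file_type": "text/csv",
        "access": "project team only",
    },
}

# Serialise to JSON so the record can be stored alongside the data.
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Storing the record as a separate text file next to the data keeps the metadata readable without any special software. In practice, a schema used in your discipline would determine which fields to record.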

 

Metadata is often referred to as 'data about data'. It contains information that describes the data and its attributes. This information helps users both discover the data and reuse it. The aim of metadata is to provide useful and accurate information that pertains to the dataset in a structured format.

There are several metadata schemas, which control how the metadata is structured and the type of information that is recorded. Schemas are often discipline-specific and reflect the information about data that is important within that field. Although they differ, these schemas share the same aim and contain much of the same foundational information. They can assist you with documenting your data because they prompt you to record the information that is important to your field.

For further information see the NISO Understanding metadata page, the Australian Research Data Commons (ARDC) guide to metadata and the Jisc metadata page.

When creating a file name, keep the golden rule in mind: will this name allow me to find the file and understand what it contains in 6 months' time?
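
One way to apply the golden rule consistently is to build file names from the same descriptive parts every time. The sketch below is a minimal Python example; the function `build_file_name`, the order of the parts, and the separators are all assumptions made for illustration, not a convention prescribed by this guide.

```python
from datetime import date


def build_file_name(project, description, collected, version, extension):
    """Combine descriptive parts into a single file name.

    The order and separators are an assumed convention for illustration.
    """
    parts = [project, description, collected.isoformat(), f"v{version:02d}"]
    # Replace spaces so the name is safe on most file systems.
    stem = "_".join(p.strip().replace(" ", "-") for p in parts)
    return f"{stem}.{extension}"


name = build_file_name("soilstudy", "ph readings", date(2024, 3, 5), 2, "csv")
print(name)  # soilstudy_ph-readings_2024-03-05_v02.csv
```

Writing the date in year-month-day order means that files sort chronologically when listed by name.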

What is data documentation?

By recording and documenting information about your data you add context and useful information that ensures the reusability of your data for both yourself and others.
Data documentation is best practice in data management and should be undertaken concurrently with the research project at all stages. Any information that is important to understanding your data should be maintained; it serves as a reminder of the processes and decisions you undertook with your data.

 

What should you be documenting?

There is no one-size-fits-all approach to data documentation. Each dataset, research project, and discipline will have its own requirements. However, one thing remains consistent in all cases: you should record any information that could be important later, and record as much information as you can within any time constraints you may have. This information could end up being vital to your research later, or in understanding your results. Data documentation also increases your own ability to reuse your data as well as future use by other researchers.

 

A non-exhaustive guide to elements you may wish to document:

  • The context of how the data was collected
  • Creators of the dataset, and their author identifiers such as ORCID
  • The methods and methodology used in the data collection
  • How the data was generated, including what equipment and software was used
  • The organisation and structure of the data files, and associated files
  • How the data has been validated and how the quality was assessed
  • Data manipulation and analysis that has been performed
  • Licensing of the data - access, use and confidentiality
  • Variable names and explanation
  • Codes used and any code classification schemes used
  • Any missing or erroneous data points and the codes used to identify them
  • Special terminology and acronyms
  • Algorithms and formulas used
  • File format and any software necessary to access and use the data
  • Information about, and links to, where this data is published and any related publications, along with any persistent identifiers such as DOIs

Documentation files such as those listed below are supplementary to your dataset and are a way of providing future users with the information required to understand and use the data. These additional files increase the usability and longevity of your data for both yourself and others. They may contain a variety of different types of information about your data, and what they contain will depend on the dataset itself and your discipline's requirements.

  • Readme.txt files are simple documents that may be downloaded and used as a complement to your data. This type of documentation is often more narrative than the other types, and may contain additional information and commentary.
  • Codebooks are documents intended to provide a complete explanation of every code or variable used in the dataset, including what values should be expected and what those values correspond to. This enables other users to be confident in your data. The codebook should also state the codes used for null values, and identify and explain any missing data.
  • Data dictionaries have an overlap with the previous two types of documentation. They contain a description of every element of your data, and should contain a list of all fields with a definition or explanation. For information on how to create a data dictionary please see the OSF guide.
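
To make the idea of a data dictionary more concrete, the Python sketch below lists every field of a small, hypothetical dataset with a definition, the expected values, and the code used for missing data, then writes it out as CSV text. The field names, the missing-value code `-99`, and the helper functions `to_csv` and `undocumented_fields` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical data dictionary: one entry per field in the dataset.
data_dictionary = [
    {"field": "participant_id", "type": "integer",
     "description": "Unique identifier for each participant",
     "allowed_values": "1 or greater", "missing_code": ""},
    {"field": "age", "type": "integer",
     "description": "Age in years at time of collection",
     "allowed_values": "18 to 99", "missing_code": "-99"},
    {"field": "response", "type": "integer",
     "description": "Coded answer to the survey question",
     "allowed_values": "1 = yes; 2 = no", "missing_code": "-99"},
]


def to_csv(entries):
    """Return the data dictionary as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(entries[0].keys()))
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def undocumented_fields(dataset_columns, entries):
    """List dataset columns that have no entry in the data dictionary."""
    documented = {entry["field"] for entry in entries}
    return [col for col in dataset_columns if col not in documented]


print(to_csv(data_dictionary))
print(undocumented_fields(["participant_id", "age", "postcode"], data_dictionary))
```

A check like `undocumented_fields` is one simple way to notice when a column has been added to the data but not yet described in the documentation.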