
Manage Your Research Data: Publishing your dataset

Research data play a crucial role in ensuring the reproducibility and transparency of scientific findings. They enable the reuse and re-analysis of data, as well as the integration of data from various sources, thereby opening up opportunities for further research and the generation of new knowledge. Ideally, reusability encompasses the rights to download, copy, distribute, and automatically process the data, all without financial, technical, or legal barriers. Publishing research data also enhances their citability, thereby boosting the scientific reputation of the authors.

Figure: Benefits of publishing a dataset

Figure: Benefits of sharing data


 

Before publishing data, researchers or data authors must ensure that the dataset is thoroughly prepared. Here are the key steps to follow:

Ensure the dataset is cleaned, verified for accuracy, and suitable for its intended use.


 

Organise the dataset in a well-structured manner.

 

Provide comprehensive documentation, including rich metadata, methodology details, a codebook or variable descriptions, and any data collection tools like questionnaires, if applicable.

Consider any privacy, confidentiality, and security issues to determine if the dataset can be published. If necessary, anonymise or de-identify the data.

Ensure the dataset is in reusable file formats, such as open standard formats like CSV files or widely accepted proprietary formats.

Apply an appropriate license to your dataset to clarify how others can use it. Consider using licenses like Creative Commons to specify usage rights.

Make sure the dataset is accessible to a broad audience, including those with disabilities. This might involve providing alternative text for images or ensuring that data tables are screen reader-friendly.

 

Sharing and access control are crucial in research data management, especially when considering the FAIR and CARE principles. By adhering to these principles, researchers can ensure their data is both useful and ethically managed, fostering trust and collaboration.

 

FAIR Data Principles

Findable: Data should be easy to find for both humans and machines.

How to achieve it?

Keep Necessary Software: Ensure you have the software needed to open and work with your data, keeping a copy if it’s not widely available.


Accessible: Data should be openly accessible, with clear conditions for use.

How to achieve it?

Check Accessibility Regularly: Periodically confirm that you can open and access your data files to catch any issues early.


Interoperable: Data should be able to integrate with other datasets and tools.

How to achieve it?

Use Non-Proprietary Formats: Store your data in widely supported formats like .csv or .txt to ensure long-term accessibility, interoperability and flexibility.
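
As a minimal sketch of the advice above, the following Python example writes tabular records to CSV using only the standard library. The function name `rows_to_csv`, the column names, and the sample records are hypothetical and serve only as illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example records; replace with your own data.
records = [
    {"site_id": "S01", "temperature_c": 21.5},
    {"site_id": "S02", "temperature_c": 19.0},
]

csv_text = rows_to_csv(records, ["site_id", "temperature_c"])
print(csv_text)
```

To save the result to disk, the same text can be written with `open(path, "w", newline="", encoding="utf-8")`. Stating the encoding explicitly helps others open the file on different systems.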


Reusable: Data should be well-documented and provided with clear licenses to ensure that it can be reused for future research.

How to achieve it?

Archive Raw Data: Always keep a copy of your raw data for future reference and validation.

Maintain Backups: Back up your data in a separate folder, ensuring compliance with any sensitivity and security concerns.
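
One way to confirm that a backup is still a faithful copy of the raw data is to compare checksums. The sketch below is an illustration under assumptions rather than a prescribed procedure: it uses SHA-256 from the Python standard library, and the function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, backup):
    """Return True when both files exist and have identical contents."""
    original, backup = Path(original), Path(backup)
    if not (original.is_file() and backup.is_file()):
        return False
    return sha256_of(original) == sha256_of(backup)


# Demonstration with temporary files (hypothetical file names).
with tempfile.TemporaryDirectory() as folder:
    raw = Path(folder) / "raw_data.csv"
    copy = Path(folder) / "raw_data_backup.csv"
    raw.write_text("id,value\n1,10\n", encoding="utf-8")
    copy.write_text("id,value\n1,10\n", encoding="utf-8")
    print(backup_matches(raw, copy))
```

Running a check like this periodically can reveal a corrupted or missing copy before the data is needed.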

CARE Data Principles

Collective Benefit: Data should be collected and used in ways that benefit the communities from which it originates.

How to achieve it?

Use data anonymisation techniques where necessary to protect the identity and privacy of individuals while still allowing for broader, community-wide benefits. Implement data access policies that promote open data sharing within ethical boundaries, while ensuring the protection of sensitive data.


Authority to Control: Communities should have the authority to control the data that belongs to or comes from them.

How to achieve it?

Enable communities to have decision-making power over how their data is used. Ensure that communities retain the right to control access to data and decide whether it can be shared with third parties.


Responsibility: Researchers and organisations have a responsibility to manage data ethically and responsibly.

How to achieve it?

Data Stewardship: Take responsibility for managing the integrity, security, and accessibility of data throughout its lifecycle. This includes ensuring data is accurate, updated, and securely stored.

Address biases in data and ensure that data use does not harm any individual or community.


Ethics: Ethical considerations should be the priority of data management practices.

How to achieve it?

Ensure ethical data collection by obtaining informed consent from data contributors and explaining how their data will be used, shared, and protected. Data anonymisation ensures individuals’ privacy is maintained while still enabling data to be used for broader societal benefits.

 

Funders and publishers mandate or encourage data sharing to enhance the transparency, reproducibility, and impact of research: 

The ARC encourages researchers to deposit data from their projects in publicly accessible repositories.


 

This code includes guidelines on the management of data and information in research, supporting responsible research conduct.

 

These guidelines stipulate that datasets should be made available for reuse unless restricted by compliance obligations.

Their guide on sharing and citing data provides valuable insights into publisher policies and highlights the benefits of data sharing.

This publisher has implemented comprehensive data-sharing policies to increase transparency and reproducibility in research.

 


ARDC Research Data Rights Management Guide

Choose the copyright license that best meets your requirements for sharing data and maximises the potential for reuse and innovation, while still meeting legal, ethical and grant requirements.

The ARDC Creator flowchart offers data creators an easy-to-follow checklist for addressing licensing queries.

When using datasets created by others, it is important to exercise caution and fully understand the licensing terms. In some instances, seeking legal advice is necessary.


 

Managing sensitive data responsibly is crucial for maintaining trust and ensuring compliance with ethical and legal standards. Here are some recommendations to consider:

Handle Sensitive Data with Care: Follow ethical guidelines and ensure sensitive or identifiable information is managed securely.

Adhere to Ethics and Compliance: Stick to the procedures outlined in your ethics approval or participant consent documents, including storage and access requirements.

Plan Data Interaction: Consider how you will access, use, and share the data while maintaining security and privacy.

Prepare for Data Breaches: Develop a response plan for potential data breaches that covers notifying stakeholders promptly, taking corrective action, and implementing preventive measures.

De-identification Techniques
  • Use pseudonyms or identifiers instead of real names.
  • Group ages into ranges rather than using specific birth years.
  • Aggregate data by broader categories, such as region (e.g., urban, rural) instead of specific suburbs.
  • Remove key pieces of information that could lead to identification.
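
The techniques listed above can be sketched in a few lines of Python. This is an illustration under assumptions, not a complete anonymisation method: the field names, the suburb-to-region lookup, and the secret key are all hypothetical, and a keyed hash is only one possible way to generate pseudonyms.

```python
import hashlib
import hmac

# Hypothetical secret key; store it securely and never publish it with the data.
SECRET_KEY = b"replace-with-a-private-key"

# Hypothetical lookup from specific suburbs to broader region categories.
REGION_BY_SUBURB = {"Northside": "urban", "Far Creek": "rural"}


def pseudonym(name):
    """Derive a stable pseudonym from a name using a keyed hash."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:8]


def age_range(age, width=10):
    """Map an exact age to a range label such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def de_identify(record):
    """Copy only the fields needed for analysis, in de-identified form."""
    return {
        "participant": pseudonym(record["name"]),
        "age_range": age_range(record["age"]),
        "region": REGION_BY_SUBURB.get(record["suburb"], "unknown"),
        "response": record["response"],
    }


# Hypothetical input record; the email field is dropped because it is never copied.
raw_record = {
    "name": "Alex Example",
    "email": "alex@example.org",
    "age": 34,
    "suburb": "Northside",
    "response": "agree",
}
print(de_identify(raw_record))
```

Because `de_identify` builds a new record from a fixed set of fields, any direct identifier that is not explicitly copied is left out of the published version.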

Consider Dataset Combinations

  • Keep in mind that combining datasets can increase the risk of re-identifying individuals.
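
One rough way to see this risk is to count how many records share each combination of indirect identifiers, before and after fields from another dataset are added. The sketch below uses a k-anonymity-style group-size check, a technique this guide does not itself name; the function name, field names, threshold `k`, and sample records are all hypothetical.

```python
from collections import Counter


def small_groups(records, quasi_identifiers, k=5):
    """Return identifier combinations shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical records; "occupation" stands in for a field gained by
# linking a second dataset.
records = [
    {"age_range": "30-39", "region": "urban", "occupation": "nurse"},
    {"age_range": "30-39", "region": "urban", "occupation": "teacher"},
    {"age_range": "30-39", "region": "urban", "occupation": "nurse"},
]

# Using age range and region alone, every record shares its combination
# with at least one other record.
print(small_groups(records, ["age_range", "region"], k=2))

# With the extra field, one combination now describes a single person.
print(small_groups(records, ["age_range", "region", "occupation"], k=2))
```

A combination that appears only once or twice suggests that the individuals behind it may be easier to re-identify, so further grouping or removal of fields may be worth considering before publication.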

Alternative Sharing Options

  • Consider publishing only the metadata record or implementing mediated access to your data. This allows others to understand the dataset without exposing sensitive details.