Tutorial: Research Data Management with DataLad

Published: May 8, 2026

This tutorial was originally created by Adina Wagner for the 2020 OHBM Brainhack Traintrack session on DataLad. The notebook accompanies the tutorial video by Adina Wagner.

What is DataLad?

DataLad is a data management multitool that can assist you in handling the entire life cycle of digital objects. It is a free and open-source command-line tool, available for all major operating systems. On the command line, every operation begins with the main datalad command, followed by a subcommand.
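
As a rough illustration of that command structure, the sketch below queries the installed tool for its version and help text. It assumes DataLad is already installed and on the PATH. If it is not, the snippet prints a hint instead of failing. Nothing on disk is changed.

```shell
# Check whether the datalad executable is available before calling it.
if command -v datalad >/dev/null 2>&1; then
    datalad --version        # print the installed DataLad version
    datalad --help           # list the available subcommands
    datalad create --help    # show the help for one subcommand (create)
    datalad_found="yes"
else
    echo "datalad not found; see the Setup page for installation instructions"
    datalad_found="no"
fi
```

Every other operation follows the same pattern: the datalad command first, then a subcommand such as create, followed by that subcommand's options.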

Tutorial contents

This tutorial covers:

  1. Introduction and setup - Getting started with DataLad
  2. Creating a DataLad dataset - Basic dataset creation and management
  3. Version control workflows - Managing data and code changes over time
  4. Dataset consumption and nesting - Working with existing datasets and subdatasets
  5. Dataset nesting - Advanced nested dataset structures
  6. More on data versioning, nesting, and a glimpse into a reproducible paper - Real-world examples
  7. Full provenance capture and reproducibility - Complete workflow tracking and replication

Getting started

To start the tutorial:

  1. Visit the Setup page for installation instructions
  2. Follow along with the Tutorial for hands-on practice

Key features of DataLad

  • Version control for data - Track changes to datasets of any size
  • Data consumption - Install and manage datasets from remote sources
  • Reproducible workflows - Capture complete provenance of data processing
  • Collaboration - Share and synchronize datasets across teams
  • Storage flexibility - Work with data locally or on remote storage
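
The features above map onto a handful of subcommands. The following is a minimal sketch, not the tutorial's own walkthrough. It assumes DataLad, Git, and git-annex are installed and that a Git user name and email are configured. The names my-dataset, my-clone, notes.txt, and linecount.txt are hypothetical and chosen only for illustration. Everything happens in a temporary directory.

```shell
# Hypothetical names used below: my-dataset, my-clone, notes.txt, linecount.txt.
demo_status="skipped"
if command -v datalad >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    (
        cd "$workdir" || exit 1

        # Version control for data: create a dataset and save a file into it.
        datalad create my-dataset || exit 1
        cd my-dataset || exit 1
        echo "some notes" > notes.txt
        datalad save -m "Add notes" notes.txt || exit 1

        # Reproducible workflows: record a command together with its
        # inputs and outputs in the dataset history.
        datalad run -m "Count lines in notes" \
            --input notes.txt --output linecount.txt \
            "wc -l notes.txt > linecount.txt" || exit 1

        # Data consumption: clone the dataset (here from a local path)
        # and retrieve the content of one file on demand.
        cd "$workdir" || exit 1
        datalad clone my-dataset my-clone || exit 1
        cd my-clone || exit 1
        datalad get notes.txt || exit 1

        # The history is ordinary Git history.
        git log --oneline
    ) && demo_status="ran" || demo_status="failed"
else
    echo "datalad not found; skipping the demo"
fi
echo "demo status: $demo_status"
```

Cloning from a local path keeps the sketch self-contained. In practice the source would usually be a URL of a dataset hosted elsewhere.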

Resources
