Tutorial: Research Data Management with DataLad
Acknowledgements
This tutorial was initially created by Adina Wagner for the 2020 OHBM Brainhack Traintrack session on DataLad. The notebook accompanies Adina Wagner's tutorial video.
What is DataLad?
DataLad is a data management multitool that can assist you in handling the entire life cycle of digital objects. It is a free and open-source command-line tool, available for all major operating systems. On the command line, every operation begins with the general datalad command, followed by a subcommand.
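As a minimal sketch of that command structure, the calls below only query DataLad for its version and help text, so they change nothing on disk. The guard around them is an addition for illustration: it assumes nothing beyond a POSIX shell and skips the calls if datalad is not on the PATH.

```shell
# General shape of every DataLad call:
#   datalad [global options] <subcommand> [subcommand options] [arguments]
if command -v datalad >/dev/null 2>&1; then
    datalad --version      # report the installed DataLad version
    datalad -h             # short overview of the available subcommands
    datalad create -h      # short help for a single subcommand
else
    echo "datalad not found on PATH; install it before running the tutorial"
fi
```

The same pattern applies to every other subcommand: appending -h to a subcommand prints its short usage summary.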
Tutorial contents
This tutorial covers:
- Introduction and setup - Getting started with DataLad
- Creating a DataLad dataset - Basic dataset creation and management
- Version control workflows - Managing data and code changes over time
- Dataset consumption and nesting - Working with existing datasets and subdatasets
- Dataset nesting - Advanced nested dataset structures
- More on data versioning, nesting, and a glimpse into a reproducible paper - Real-world examples
- Full provenance capture and reproducibility - Complete workflow tracking and replication
Getting started
To start the tutorial, begin with the first part listed above, Introduction and setup.
Key features of DataLad
- Version control for data - Track changes to datasets of any size
- Data consumption - Install and manage datasets from remote sources
- Reproducible workflows - Capture complete provenance of data processing
- Collaboration - Share and synchronize datasets across teams
- Storage flexibility - Work with data locally or on remote storage
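Several of the features above can be sketched in one short local session. This is an illustration under stated assumptions rather than part of the tutorial itself: it assumes datalad, git, and git-annex are installed and that a Git identity (user.name and user.email) is configured. The names demo-ds, demo-clone, scores.csv, and linecount.txt are hypothetical, and everything happens in a throwaway temporary directory.

```shell
# Sketch only: requires datalad, git, git-annex, and a configured Git identity.
# All dataset and file names below are hypothetical examples.
if command -v datalad >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1

    # Version control for data: create a dataset and record a change
    datalad create demo-ds
    cd demo-ds || exit 1
    echo "subject,score" > scores.csv
    datalad save -m "Add scores table"

    # Reproducible workflows: run a command and record its provenance
    datalad run -m "Count lines in the scores table" \
        --input scores.csv --output linecount.txt \
        "wc -l < scores.csv > linecount.txt"

    # Data consumption: clone the dataset, then fetch file content on demand
    cd ..
    datalad clone demo-ds demo-clone
    cd demo-clone || exit 1
    datalad get scores.csv
else
    echo "datalad is not installed; skipping the sketch"
fi
```

Collaboration and storage flexibility additionally need a configured sibling (a remote copy of the dataset), which this local sketch leaves out; the subcommands typically involved there are datalad push and datalad update.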