Introduction to Version Control

Research Data Management for Psychology and Neuroscience
Course at University of Hamburg, RTG 2753: Emotional Learning and Memory
License: CC BY 4.0


Schedule

Day  Date        Time           Title
2    2026-02-06  09:30 - 10:00  Introduction to Version Control
2    2026-02-06  10:00 - 12:00  Version Control of Data with DataLad
2    2026-02-06  12:00 - 13:00  Lunch Break
2    2026-02-06  13:00 - 14:00  Data publication
2    2026-02-06  14:00 - 16:00  Integrating with data infrastructure at UHH (and beyond)
2    2026-02-06  16:00 - 16:30  Summary & Outlook

1 Introduction to Version Control

Objectives

💡 You know what version control is.
💡 You can argue why version control is useful (for research).
💡 You can name benefits of Git compared to other approaches to version control.
💡 You can explain the difference between Git and GitHub.

Reading

https://lennartwittkuhn.com/version-control-book/chapters/intro-version-control.html

Scenario 1

Imagine a scenario where you crafted a brilliant paragraph for a manuscript (for example, your paper, thesis, or report), but then accidentally ruined it. How would you retrieve the earlier brilliant version? Is it even possible?

  • “Only if I saved it before - otherwise, I’d have to draft another brilliant paragraph.”
  • “I might be able to find it in a cloud backup, like OneDrive or Google Drive.”
  • “Version Control?”
  • “I’d simply revert to the relevant commit.”

Scenario 2

Consider a situation where you are working with five co-authors on a paper. How do you handle the changes and comments they make to the document? If you’re using LibreOffice Writer or Microsoft Word and you accept changes made using the “Track Changes” option, what happens to the history of those modifications?

  • “I always save new versions of files with my initials (and others’ initials when I receive the document), which often results in having 10–20 different file versions.”
  • “I believe the history of modifications is lost after accepting the changes …”

Why we need version control …

… for code (text files) (comic © Jorge Cham, phdcomics.com)

… for data (binary files) (comic © Jorge Cham, phdcomics.com)

When everything is relevant …

… track everything.

What is version control?

“Version control is a systematic approach to record changes made in a […] set of files, over time. This allows you and your collaborators to track the history, see what changed, and recall specific versions later […]” (Turing Way)

  • keep track of changes in a directory (a “repository”)
  • take snapshots (“commits”) of your repo at any time
  • know the history: what was changed when by whom
  • compare commits and go back to any previous state
  • work on parallel “branches” & flexibly “merge” them
  • “push” your repo to a “remote” location & share it
  • share repos on platforms like GitHub or GitLab
  • work together on the same files at the same time
  • others can read, copy, edit and suggest changes
  • make your repo public and openly share your work
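
The first few ideas in the list above (repository, commits, history, comparing, going back) can be illustrated with a toy model. The following Python sketch is not Git and does not call Git. The class `ToyRepo`, its methods, the file name `paper.txt`, and the author name are all hypothetical and invented for this illustration. The sketch records who changed what, but leaves out timestamps (the “when”) so that the commit ids stay reproducible.

```python
import difflib
import hashlib
import json


class ToyRepo:
    """Hypothetical toy model of a repository. This is NOT Git."""

    def __init__(self):
        self.files = {}     # working directory: file name -> text content
        self.commits = {}   # commit id -> recorded snapshot and metadata
        self.history = []   # commit ids, oldest first

    def commit(self, author, message):
        """Take a snapshot of all files and return its id."""
        parent = self.history[-1] if self.history else None
        record = {
            "files": dict(self.files),
            "author": author,
            "message": message,
            "parent": parent,
        }
        # The id is a hash of the snapshot, so any change yields a new id.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        commit_id = hashlib.sha256(payload).hexdigest()[:7]
        self.commits[commit_id] = record
        self.history.append(commit_id)
        return commit_id

    def log(self):
        """Return (id, author, message) tuples, newest first."""
        return [
            (cid, self.commits[cid]["author"], self.commits[cid]["message"])
            for cid in reversed(self.history)
        ]

    def diff(self, old_id, new_id, name):
        """Compare one file between two commits, line by line."""
        old = self.commits[old_id]["files"].get(name, "").splitlines()
        new = self.commits[new_id]["files"].get(name, "").splitlines()
        return list(difflib.unified_diff(old, new, old_id, new_id, lineterm=""))

    def restore(self, commit_id):
        """Bring the working directory back to a previous snapshot."""
        self.files = dict(self.commits[commit_id]["files"])


# Scenario 1 from above: a brilliant paragraph gets ruined, then recovered.
repo = ToyRepo()
repo.files["paper.txt"] = "A brilliant paragraph."
first = repo.commit("Ada", "Draft paragraph")

repo.files["paper.txt"] = "A ruined paragraph."
second = repo.commit("Ada", "Edit paragraph")

changes = repo.diff(first, second, "paper.txt")  # what changed between commits
repo.restore(first)                              # go back to the earlier state
restored_text = repo.files["paper.txt"]
```

In Git itself, the corresponding steps are carried out with commands such as `git commit`, `git log`, `git diff`, and `git restore`. The toy model only mirrors the concepts, not Git's actual storage format.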

What are Git and DataLad?

Git

  • most popular version control system
  • free, open-source command-line tool
  • graphical user interfaces exist, e.g., GitKraken
  • standard tool for most (all?) software developers
  • 100 million GitHub users [1]

DataLad

  • “Git for (large) data”
  • free, open-source command-line tool
  • builds on top of Git and git-annex
  • can version control arbitrarily large datasets [2]
  • graphical user interface exists: DataLad Gooey [3]

What is GitHub?

  • cloud-based platform for version control using Git
  • allows for collaboration on coding projects in real time
  • hosts millions of public and private repositories
  • supports both Git command line and GUI tools (e.g., GitHub Desktop)
  • enables code sharing, project management, issue tracking, and continuous integration
  • used by companies, open-source communities, and individual developers worldwide
  • 100 million users [4]

More benefits of Git(Hub) for project management

  • Discuss and plan your project in issues (even just with your future / past self)
  • Ask questions, share ideas and discuss with your community via GitHub Discussions
  • Propose changes to each other’s projects using pull requests [5]
  • Create a fork of someone else’s repository and extend their work
  • Manage access to your projects with detailed permissions and roles
  • Add documentation to your repository or in a separate wiki
  • Access more features and tools for teaching via GitHub Global Campus

Note

  • The dominance of GitHub (a for-profit company owned by Microsoft) is contested (see #GiveUpGitHub)
  • Hosting a project on GitHub does not amount to FAIR archiving of scholarly outputs (see previous and following slides)

Goal

From this …

To this …

Footnotes

  1. Source: Wikipedia

  2. See the DataLad dataset of the Human Connectome Project (80 TB, 15 million files).

  3. DataLad Gooey apparently is no longer maintained.

  4. Source: Wikipedia

  5. Called pull requests on GitHub and merge requests on GitLab.