Effective Progress Tracking and Collaboration: An Introduction to Version Control of Code and Data

Session 1


License: CC BY 4.0

October 20, 2023

Logistics and admin

Team

Teaching Assistant


Konrad Pagenstedt

konrad.pagenstedt@uni-hamburg.de
GitHub

Who are you?

  • Your name?
  • Your preferred pronouns?
  • Which study program are you currently enrolled in?
  • What did you study before and where?
  • What do you expect from this course?
  • A fun fact about you?
  • Your mood on rubber duck scale?


Course overview

  • Event: Seminar
  • Credits: 4.0
  • Language: English / German
  • Tag: PsyM14-PsyWB-K02

What will the average seminar session look like?

The course will consist of up to 14 sessions (90 minutes each)

  1. Content Review (up to 30 minutes):
    Course participants engage with the online materials, supplemented by concise presentations by the instructors. Some course preparation may occur outside of class.
  2. Interactive Discussions & Quizzes (up to 15 minutes):
    Course participants collectively address any questions about the session’s content and online materials. Instructor-led quiz questions may be interspersed throughout.
  3. Exercises & Implementation (up to 60 minutes):
    Course participants work on hands-on exercises and assignments.

Logistics

  • You need a laptop. Talk to us if you don’t have a laptop.

Note that course participants are not required (but are of course free) to work on course materials outside of class time. All course contents will be covered during class time.

Schedule

| No | Date | Title | Notes | Contents | Reading | Survey |
|---|---|---|---|---|---|---|
| 1 | 2023-10-20 | Introduction to version control | | Organisational matters; Overview of seminar sessions; Computational reproducibility; Introduction to version control; Introduction to Git and its advantages | Intro to version control | Course introduction survey |
| 2 | 2023-10-27 | Command line | | File Systems; Benefits of the Command Line; Basic Command Line commands | Command Line | Command Line survey |
| 3 | 2023-11-03 | Git Basics | | Installation and configuration of Git; Initializing a Git repository; Basic Git commands; Ignoring files with .gitignore; Good commit messages | Installation, setup, first steps with Git | Installation survey, Git Basics survey |
| 4 | 2023-11-10 | Cancelled | Cancelled | | | |
| 5 | 2023-11-17 | Basic Git workflow | | Practicing basic Git commands; Ignoring files with .gitignore; Good commit messages | First steps with Git | Git Basics survey |
| 6 | 2023-11-24 | Cancelled | Cancelled | | | |
| 7 | 2023-12-01 | Quarto workshop | | Introduction to Quarto; Use cases as a scientist; Markdown Syntax; Using code chunks | Workshop Slides | |
| 8 | 2023-12-08 | Git Branching and Merging | | Understanding branches in Git; Creating and switching between branches; Merging branches: fast-forward and recursive; Resolving merge conflicts; Stashing and retrieving changes; Undoing changes; Removing files | Branches | Git branches survey |
| 9 | 2023-12-15 | Introduction to GitHub | | Introduction to remote repositories; Creating a GitHub account; Creating and managing repositories on GitHub; Cloning/Forking a remote repository; Pushing and pulling changes; Branching and merging in a collaborative environment; Graphical User Interfaces (GUIs), e.g., GitKraken | GitHub Intro | GitHub Survey |
| 10 | 2023-12-22 | Repetition and practice | | Initializing repository; Staging and committing; Creating and merging branches | | |
| 11 | 2024-01-12 | GitHub: Collaboration | | Organizing Git repositories and projects; Collaborating with team members using issues and pull requests; Using pull requests; Understanding different Git workflows; Introduction to Gitflow workflow; Working with feature branches in Gitflow | | |
| 12 | 2024-01-19 | GitHub: Advanced, Tags/Releases | | README.md; Licenses; Contributions; Forking; Practicing GitHub workflow; Introduction to tags and their importance; Best practices for tagging in Git; Integration with Zenodo | | |
| 13 | 2024-01-26 | Introduction to DataLad - Version control of (large) datasets | Guest Lecture by Adina Wagner (DataLad Developer & Project Lead of the DataLad Handbook) | | | |
| 14 | 2024-02-02 | Summary & Wrap-Up | | Summary & Wrap-Up | | |

Course Website

lennartwittkuhn.com/version-control-course-uhh-ws23

Version Control Book

lennartwittkuhn.com/version-control-book

Exercises, quizzes & surveys

  • we use surveys to ask you questions and implement exercises or quizzes
  • implemented in the formr survey framework

Anonymity & data usage

  • all raw data are kept anonymous, will be used only for the course, and will never be shared publicly
  • the data will be used exclusively for educational purposes as part of the course
  • if responses are shared as part of the course, they will be aggregated to ensure anonymity is maintained
  • you can also complete the surveys without providing a personal codeword
  • if you want your data to be deleted, send an email with your codeword to sekretariat-luv.psych@uni-hamburg.de

Collaborative notes

HedgeDoc (UHH Pad)

Your role, questions and interactions

Active participation

  • This is a pass / fail course
  • Requirement 1: Come to at least 12 out of 14 sessions (85%) and sign the attendance list
  • Requirement 2: Complete the exercises and quizzes (in class and online)

Questions & discussions during class time

  • Ask questions! There are no stupid questions!
  • Share your ideas in writing via the notepads
  • Participate in the discussions

Questions & discussions outside of class time

Code of Conduct

During this course, we want to ensure a safe, productive, and welcoming environment for everyone who attends. All participants and speakers are expected to abide by this code of conduct. We do not tolerate discrimination or harassment in any form or by any means. If you experience harassment or hear of any incidents of unacceptable behavior, please reach out to the course instructor, Lennart Wittkuhn (lennart.wittkuhn@uni-hamburg.de), so that we can take appropriate action.

Unacceptable behavior is defined as:

  • Harassment, intimidation, or discrimination in any form, verbal abuse of any attendee, speaker, or other person. Examples include, but are not limited to, verbal comments related to gender, sexual orientation, disability, physical appearance, body size, race, religion, national origin, inappropriate use of nudity and/or sexual images in public spaces or in presentations, or threatening or stalking.
  • Disruption of presentations throughout the course. We ask all participants to comply with the instructions of the speaker with regard to dedicated discussion space and time.
  • Participants should not take pictures of any activity in the course room without asking all involved participants for consent and receiving this consent.

A first violation of this code of conduct will result in a warning, and subsequent violations by the same person can result in immediate removal from the course without further warning. The organizers also reserve the right to bar excluded participants from similar future workshops, courses, or meetings they organize.

Two RA positions in our group!

Version Control of Code & Data

  • 📓 Tasks: Support our teaching project!
  • 📆 Duration: as soon as possible until March 31, 2024
  • 🕐 Time: flexible, up to 37 hours / month (WHK)
  • 💰 Salary: 12.00 € / hour (SHK) / 13.95 € / hour (WHK)
  • ✉️ Contact: lennart.wittkuhn@uni-hamburg.de

Memory reactivation in older adults

  • 📓 Tasks: Support of fMRI data collection
  • 📆 Duration: as soon as possible
  • 🕐 Time: flexible, 60 - 80 hours / month
  • 💰 Salary: 12.00 € / hour (SHK) / 13.95 € / hour (WHK)
  • ✉️ Contact: erc-studies-luv.psych@uni-hamburg.de

Survey results

Introduction to version control

Learning objectives

At the end of this session, you should be able to answer the following questions:

  1. What is version control?
  2. Why is version control useful (for research)?
  3. What are Git and GitHub?
  4. What is the difference between Git and GitHub?

Your turn

Read Chapter 1: “Introduction to Version Control” in the Version Control Book

Why we need version control …

… for code (text files) © Jorge Cham (phdcomics.com)

… for data (binary files) © Jorge Cham (phdcomics.com)

When everything is relevant …

… track everything.

What is version control?

“Version control is a systematic approach to record changes made in a […] set of files, over time. This allows you and your collaborators to track the history, see what changed, and recall specific versions later […]” (Turing Way)

keep track of changes in a directory (a “repository”)

take snapshots (“commits”) of your repo at any time

know the history: what was changed when by whom

compare commits and go back to any previous state

work on parallel “branches” & flexibly “merge” them

“push” your repo to a “remote” location & share it

share repos on platforms like GitHub or GitLab

work together on the same files at the same time

others can read, copy, edit and suggest changes [3]

make your repo public and openly share your work
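The ideas of snapshots ("commits"), history, comparison and going back to a previous state can be sketched with a small toy model. This is only an illustration, not how Git is implemented: the class `ToyRepo`, its methods, the file names and the authors "Ada" and "Ben" are all hypothetical and chosen for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """One snapshot of all tracked files plus who/when/why metadata."""
    commit_id: str
    parent: Optional[str]  # ID of the previous commit, None for the first one
    author: str
    date: str
    message: str
    files: Dict[str, str]  # file name -> file content at this point in time


class ToyRepo:
    """Hypothetical in-memory 'repository' (an assumption for illustration)."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.head: Optional[str] = None  # ID of the most recent commit

    def commit(self, files: Dict[str, str], author: str, date: str, message: str) -> str:
        """Take a snapshot of `files` and return its short ID."""
        # The ID is derived from the content and metadata of the snapshot.
        payload = repr((self.head, author, date, message, sorted(files.items())))
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:7]
        self.commits[commit_id] = Commit(
            commit_id, self.head, author, date, message, dict(files)
        )
        self.head = commit_id
        return commit_id

    def log(self) -> List[Commit]:
        """Return the history from newest to oldest by following parent links."""
        history = []
        current = self.head
        while current is not None:
            snapshot = self.commits[current]
            history.append(snapshot)
            current = snapshot.parent
        return history

    def changed_files(self, old_id: str, new_id: str) -> List[str]:
        """Compare two commits: names of files added, removed or modified."""
        old = self.commits[old_id].files
        new = self.commits[new_id].files
        return sorted(
            name for name in old.keys() | new.keys() if old.get(name) != new.get(name)
        )

    def restore(self, commit_id: str) -> Dict[str, str]:
        """Go back to a previous state: return the files as they were then."""
        return dict(self.commits[commit_id].files)


repo = ToyRepo()
first = repo.commit({"analysis.py": "print('v1')"}, "Ada", "2023-10-20", "Add analysis script")
second = repo.commit(
    {"analysis.py": "print('v2')", "README.md": "Notes"},
    "Ben", "2023-10-27", "Revise analysis, add README",
)

# What was changed, when, and by whom (newest first)
for snapshot in repo.log():
    print(snapshot.commit_id, snapshot.date, snapshot.author, snapshot.message)

print(repo.changed_files(first, second))  # files that differ between the two commits
print(repo.restore(first))                # the project as it was at the first commit
```

Each commit stores a link to its parent, so the full history can be recovered by walking backwards from the latest commit. Branching, merging and remotes are left out of this sketch.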

What are Git and DataLad?

Git

  • most popular version control system
  • free, open-source command-line tool
  • graphical user interfaces exist, e.g., GitKraken
  • standard tool for most (all?) software developers
  • 100 million GitHub users [1]

DataLad

  • “Git for (large) data”
  • free, open-source command-line tool
  • builds on top of Git and git-annex
  • allows version control of arbitrarily large datasets [2]
  • graphical user interface exists: DataLad Gooey

Note: We will mainly focus on Git and only refer to DataLad as an outlook.



Footnotes

  1. (Source: Wikipedia)

  2. See the DataLad dataset of 80 TB / 15 million files from the Human Connectome Project.

  3. pull requests on GitHub, merge requests on GitLab