Session 1: Welcome & Introduction to Version Control

Track, organize and share your work: An introduction to Git for research

Course at Max Planck Institute for Human Development

Slides | Source

License: CC BY 4.0

1 Logistics & Admin

About

Me

🧑‍🔬 Position: Postdoctoral Researcher & Lab Manager at the Institute of Psychology at the University of Hamburg

🎓 Education: BSc Psychology & MSc Cognitive Neuroscience (TU Dresden), PhD Cognitive Neuroscience (MPIB)

🔬 Research: I study the role of fast neural memory reactivation (“replay”) in the human brain using fMRI

🔗 Contact: You can connect with me via email, Twitter, Mastodon, GitHub or LinkedIn

ℹ️ Info: Find out more about my work on my website, Google Scholar and ORCiD

This presentation

💻 Slides: Slides are publicly available at https://lennartwittkuhn.com/version-control-course-mpib-2024

📦 Software: Reproducible slides built with Quarto and deployed to GitHub Pages using GitHub Actions

Source: Code is publicly available on GitHub at https://github.com/lnnrtwttkhn/version-control-course-mpib-2024

🙏 Contact: I am happy for any feedback or suggestions via email or GitHub issues. Thank you!

Who are you?

  1. Your name?
  2. Your preferred pronouns?
  3. Your research group?
  4. Your research?
  5. Your mood on a sheep scale?

Mood on a sheep scale.

Course overview

  • Date: Friday, June 21st 2024
  • Time: 9:30 a.m. to 4:30 p.m. (6 hours)
  • Room: Campus Open Space

What will the average session look like?

The course will consist of 6 main sessions (ca. 60 minutes each)

  1. Demonstration (up to 15 minutes):
    The instructor introduces the topic and gives a short demonstration of the main Git commands.
  2. Exercises (up to 45 minutes):
    Course participants work through hands-on exercises and assignments.
  3. Reading (in parallel to the exercises; up to 45 minutes):
    Course participants engage with the online learning materials (a.k.a. our “Version Control Book”).
  4. Discussions (up to 10 minutes):
    Course participants and instructor collectively address any questions related to the session’s content.
  5. Quizzes (up to 10 minutes):
    Course participants complete online quizzes to test their knowledge.

Schedule

| No | Time | Title | Contents | Reading | Survey/Quiz |
| --- | --- | --- | --- | --- | --- |
| 1 | 9:30 - 10:00 | Welcome & Introduction to Version Control | Logistics and course admin; Results of course survey; Introduction to Version Control; Introduction to Git | Introduction to Version Control | Course survey |
| 2 | 10:00 - 11:00 | Basics of the Command Line | File systems and navigation; Benefits of the command line; Basic command line commands | Command Line | Command Line Quiz |
| 3 | 11:00 - 12:00 | Setup & First steps with Git | Configuration and setup of Git; Initializing a Git repository; Fundamental Git commands | Setup, First steps with Git | Git Basics Quiz |
| 4 | 12:00 - 13:00 | Branches, Merging & Merge Conflicts | Understanding branches in Git; Creating and switching between branches; Merging branches; Resolving merge conflicts | Branches | Git Branches Quiz |
| 5 | 13:00 - 14:00 | Lunch Break | Enjoy your lunch! | | |
| 6 | 14:00 - 15:00 | Integration with GitLab / GitHub | Introduction to remote repositories; Creating and managing repositories on GitLab / GitHub; Pushing and pulling changes; Cloning a remote repository | GitHub Intro | GitHub Quiz |
| 7 | 15:00 - 16:00 | Collaboration on GitLab / GitHub | Forking a repository; Collaboration with GitHub Flow; Pull / Merge Requests; Issues; README files | GitHub Advanced, GitHub Issues | GitHub Quiz |
| 8 | 16:00 - 16:30 | Summary & Outlook | Summary of course contents; Outlook to more Git topics; Discussing open questions | | |

Course website

https://lennartwittkuhn.com/version-control-course-mpib-2024

Version Control Book

https://lennartwittkuhn.com/version-control-book

Exercises, quizzes & surveys

  • We use online surveys to ask you questions and implement exercises or quizzes
  • Implemented in the formr survey framework (open-source, hosted in Germany)

Anonymity & data usage

  • all raw data are kept anonymous and will only be used for research and educational purposes
  • if responses are shared as part of the course, they will be aggregated to ensure anonymity is maintained
  • if you want your data to be deleted, send an email with your personal codeword to rdm@mpib-berlin.mpg.de. Your codeword is then forwarded to us (without your name).

Cheatsheets

Example cheatsheet: Basic Git commands

| Command | Description |
| --- | --- |
| git init | Initializes a folder as a Git repository |
| git status | Shows the tracking status of files in the repository |
| git add | Adds file(s) to the staging area |
| git commit | Commits staged files (opens an editor for the commit message) |
| git commit -m "commit message" | Commits staged files with a commit message |
| git log | Shows past commits |
| git diff | Shows changes that have not been staged yet |
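
To see how the commands on this cheatsheet fit together, here is a minimal sketch of a first Git session. It is only an illustration: the temporary folder, the file name recipes.txt, the recipe text and the placeholder name and email are assumptions made up for this example, not part of the course materials.

```shell
#!/bin/sh
set -eu

# Work in a throwaway folder so that nothing else on your computer is touched.
repo=$(mktemp -d)
cd "$repo"

# Turn the folder into a Git repository.
git init

# Placeholder identity (hypothetical), stored for this repository only.
git config user.name "Example Researcher"
git config user.email "researcher@example.com"

# Create a hypothetical file and check what Git knows about it.
echo "Pancakes: flour, milk, eggs" > recipes.txt
git status    # lists recipes.txt as untracked

# Stage the file and commit it with a message.
git add recipes.txt
git commit -m "Add pancake recipe"

# Change the file and inspect the change before staging it.
echo "Waffles: flour, milk, eggs, butter" >> recipes.txt
git diff      # shows the added line

# Stage and commit the change.
git add recipes.txt
git commit -m "Add waffle recipe"

# View the history: two commits, newest first.
git log --oneline
```

The cycle of editing, `git add` and `git commit` is the part you will repeat most often; `git status`, `git diff` and `git log` only inspect the repository and never change it.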

Pair Programming (variant)

  • Find and say hello to your nearest desk neighbor
  • Complete the exercises together, help each other out, etc.

This illustration is created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807

Course exercise: Building an online recipes book

https://lennartwittkuhn.com/recipes

Code of Conduct

During this course, we want to ensure a safe, productive, and welcoming environment for everyone who attends. All participants and speakers are expected to abide by this code of conduct. We do not tolerate discrimination or harassment in any form or by any means. If you experience harassment or hear of any incidents of unacceptable behavior, please reach out to the course instructor, Dr. Lennart Wittkuhn (lennart.wittkuhn@tutanota.com), so that we can take the appropriate action.

Unacceptable behavior is defined as:

  • Harassment, intimidation, or discrimination in any form, verbal abuse of any attendee, speaker, or other person. Examples include, but are not limited to, verbal comments related to gender, sexual orientation, disability, physical appearance, body size, race, religion, national origin, inappropriate use of nudity and/or sexual images in public spaces or in presentations, or threatening or stalking.
  • Disruption of presentations throughout the course. We ask all participants to comply with the instructions of the speaker with regard to dedicated discussion space and time.
  • Taking pictures of any activity in the course room without first asking all involved participants for consent and receiving it.

A first violation of this code of conduct will result in a warning, and subsequent violations by the same person can result in immediate removal from the course without further warning. The organizers also reserve the right to bar excluded participants from attending similar future workshops, courses or meetings they organize.

Breaks

  • We will have a one-hour lunch break at 1 p.m.
  • Feel free to take short breaks between sessions when needed.

2 Survey results

🙏 Thank you for your responses!

Let’s do the splits

3 Introduction to Version Control

Learning objectives

At the end of this session, you should be able to answer the following questions and / or achieve the following learning objectives:

💡 You know what version control is.
💡 You can argue why version control is useful (for research).
💡 You can name benefits of Git compared to other approaches to version control.
💡 You can explain the difference between Git and GitHub.

Your turn

In this session, you will work on the following tasks:

  1. Reading: Read the chapter “Introduction to Version Control” in the Version Control Book.
  2. Discussion: Quietly discuss the learning objectives with your desk neighbor.

4 Discussion

The issue of computational reproducibility in science

“… when the same analysis steps performed on the same dataset consistently produce the same answer.” 1

Illustration by Scriberia for The Turing Way Community (2022), CC BY 4.0
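
The definition quoted above can be made concrete with a small sketch: run the same analysis step twice on the same dataset and check that both runs give the same answer. The dataset, the file names and the `analyze` function (a simple mean) are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
set -eu

# Work in a throwaway folder.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical dataset: one number per line.
printf '4\n8\n15\n16\n23\n42\n' > data.txt

# Hypothetical analysis step: compute the mean of the numbers in a file.
analyze() {
  awk '{ sum += $1 } END { printf "mean=%.2f\n", sum / NR }' "$1"
}

# Perform the same analysis twice on the same dataset.
analyze data.txt > run1.txt
analyze data.txt > run2.txt

# Compare the two results byte by byte.
if cmp -s run1.txt run2.txt; then
  echo "reproducible: both runs produced $(cat run1.txt)"
else
  echo "not reproducible: the two runs differ"
fi
```

Real analyses fail this check for many reasons (changed data, changed code, different software versions), which is why recording exactly which version of the code and data produced a result matters.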

The problem

  • more than half of research is not reproducible 2
    • research data, code, software & materials are often not available “upon reasonable [sic] request”
    • if resources are shared, they are often incomplete
  • 90% of surveyed researchers see a “reproducibility crisis” (N = 1576) 3

Why?

  • computational reproducibility is hard
  • researchers lack training
  • incentives are not (yet) aligned 4
  • “natural selection of bad science” 5

“… accumulated evidence indicates […] substantial room for improvement with regard to research practices to maximize the efficiency of the research community’s use of the public’s financial investment.” (Munafò et al., 2017)

We need a professional toolkit for digital research!

Why we need version control …

… for code (text files) © Jorge Cham (phdcomics.com)

… for data (binary files) © Jorge Cham (phdcomics.com)

When everything is relevant …

… track everything.

What is version control?

“Version control is a systematic approach to record changes made in a […] set of files, over time. This allows you and your collaborators to track the history, see what changed, and recall specific versions later […]” (Turing Way)

  • keep track of changes in a directory (a “repository”)
  • take snapshots (“commits”) of your repo at any time
  • know the history: what was changed when by whom
  • compare commits and go back to any previous state
  • work on parallel “branches” & flexibly “merge” them
  • “push” your repo to a “remote” location & share it
  • share repos on platforms like GitHub or GitLab
  • work together on the same files at the same time
  • others can read, copy, edit and suggest changes
  • make your repo public and openly share your work
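
Several of the ideas above (commits, history, branches, merging, pushing to a remote) can be tried out in a few lines. The following is a minimal sketch, not the course's official exercise: the file analysis.txt, the branch name new-idea and the placeholder identity are hypothetical, and a local bare repository stands in for a remote on GitHub or GitLab. It assumes Git 2.23 or newer, because it uses `git switch`.

```shell
#!/bin/sh
set -eu

# Throwaway base folder that holds both the "remote" and the project.
base=$(mktemp -d)

# A local bare repository as a stand-in for a remote on GitHub or GitLab.
git init --bare "$base/remote.git"

# The repository you actually work in.
git init "$base/project"
cd "$base/project"
git config user.name "Example Researcher"       # placeholder identity
git config user.email "researcher@example.com"

# Take a first snapshot ("commit").
echo "Step 1: load data" > analysis.txt
git add analysis.txt
git commit -m "Add first analysis step"

# Remember the default branch name ("main" or "master", depending on your settings).
main_branch=$(git branch --show-current)

# Work on a parallel branch.
git switch -c new-idea
echo "Step 2: fit model" >> analysis.txt
git commit -am "Add model fitting step"

# Merge the branch back into the default branch.
git switch "$main_branch"
git merge new-idea

# Know the history and compare commits.
git log --oneline
git diff HEAD~1 HEAD

# "Push" the repository to the "remote" location.
git remote add origin "$base/remote.git"
git push -u origin "$main_branch"
```

With a real remote, only the address passed to `git remote add` changes: it would be the URL of a repository on GitHub or GitLab instead of a local folder.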

What is Git?

  • most popular version control system
  • free, open-source command-line tool
  • graphical user interfaces exist, e.g., GitKraken
  • standard tool for most (all?) software developers
  • 100 million GitHub users 6

5 Outlook

Next session: The command line

Source: Wikimedia Commons (free license)

References

Baker, M. (2016). 1,500 scientists lift the lid on reproducibility. Nature, 533(7604), 452–454. https://doi.org/10.1038/533452a.
Crüwell, S., Apthorp, D., Baker, B. J., Colling, L., Elson, M., Geiger, S. J., Lobentanzer, S., Monéger, J., Patterson, A., Schwarzkopf, D. S., Zaneva, M., & Brown, N. J. L. (2023). What’s in a Badge? A Computational Reproducibility Investigation of the Open Data Badge Policy in One Issue of Psychological Science. Psychological Science, 34(4), 512–522. https://doi.org/10.1177/09567976221140828.
Hardwicke, T. E., Bohn, M., MacDonald, K., Hembacher, E., Nuijten, M. B., Peloquin, B. N., deMayo, B. E., Long, B., Yoon, E. J., & Frank, M. C. (2021). Analytic reproducibility in articles receiving open data badges at the journal Psychological Science: An observational study. Royal Society Open Science, 8(1). https://doi.org/10.1098/rsos.201494.
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1). https://doi.org/10.1038/s41562-016-0021.
Obels, P., Lakens, D., Coles, N. A., Gottfried, J., & Green, S. A. (2020). Analysis of Open Data and Computational Reproducibility in Registered Reports in Psychology. Advances in Methods and Practices in Psychological Science, 3(2), 229–237. https://doi.org/10.1177/2515245920918872.
Poldrack, R. A. (2019). The Costs of Reproducibility. Neuron, 101(1), 11–14. https://doi.org/10.1016/j.neuron.2018.11.030.
Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 160384. https://doi.org/10.1098/rsos.160384.
The Turing Way Community. (2022). The Turing Way: A handbook for reproducible, ethical and collaborative research. Zenodo. https://doi.org/10.5281/zenodo.3233853.
Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61(7), 726–728. https://doi.org/10.1037/0003-066x.61.7.726.

Footnotes

  1. The Turing Way Community (2022), see “Guide on Reproducible Research”

  2. for example, in Psychology: Crüwell et al. (2023); Hardwicke et al. (2021); Obels et al. (2020); Wicherts et al. (2006)

  3. see Baker (2016), Nature

  4. see e.g., Poldrack (2019)

  5. see Smaldino & McElreath (2016)

  6. (Source: Wikipedia)

  7. pull requests on GitHub, merge requests on GitLab