Version Control of Code and Data

Track, organize and share your work: An introduction to Git for research

Authors

Lennart Wittkuhn

Konrad Pagenstedt

Preface

Figure 1: This illustration was created by Scriberia with The Turing Way community. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3695300 (Version 3, direct download link).

Welcome to the world of version control! The purpose of this book is to empower scientists, researchers, and students with the knowledge and skills needed to use Git for version control of code and data.

Who is this book for?

This guide is meant to be a gentle introduction to version control for (aspiring) scientists who want to manage the evolution of digital objects on their computers more effectively. Whether you’re conducting experiments, writing code, collaborating with scientific peers, or managing complex data sets, Git provides a robust framework to make your work more efficient, reproducible, and collaborative. While this book was developed with scientists in mind, it is of course open to anyone who wants to learn more about Git. We try to avoid technical jargon as much as we can. When we discuss best practices for using Git commands, we try to offer multiple alternatives but also give opinionated recommendations as guidance for new users.

What is the purpose of this book?

Version control can be a real game-changer for your scientific projects. By adopting Git, you gain the ability to trace the evolution of your work, experiment with new ideas without fear of irreversible consequences, and collaborate with a global community of researchers. Of course, using Git also adds a layer of complexity to your workflow, especially in the beginning. We aim to simplify and demystify this versatile tool for you. Whether you’re new to version control or have dabbled in it before, this book aims to offer something for all levels of expertise.

How to use this book?

Git is fundamentally a command-line tool, which means you typically interact with it by typing text-based commands into a small command-line window rather than clicking on buttons in a graphical user interface (GUI) as in many other applications. This book focuses on teaching Git from the command line. While the command line is arguably the rockier road to learning Git, we believe that it provides more long-term benefits and allows you to make use of the full potential of Git. This book therefore covers the basics of the command line, teaching you just enough to interact with Git effectively. That being said, if you prefer to interact with Git via a graphical user interface, you can still learn about the fundamental concepts of Git in this book and then apply them in your preferred GUI.

We also believe in learning by doing and try to focus on implementation as much as possible. The concepts introduced in this book are accompanied by practical examples, hands-on exercises and quizzes. Feel free to follow the exercises to gain the necessary “muscle memory” to start using Git in your day-to-day work. Try out the commands in each chapter, play around with them, or apply them to a project of your own!

This book was initially created for the full-semester course “Effective Progress Tracking and Collaboration: An Introduction to Version Control of Code and Data” at the University of Hamburg, Germany. The book is therefore structured for a course that is spread out across multiple sessions. Each chapter, together with its accompanying quizzes and exercises, should fill roughly 90 minutes of class time.

How can I contribute?

This book is constantly evolving and meant as a living resource, and your input can make it even better! If you find a typo or an unclear explanation, have a suggestion for improvement, have an idea for a new chapter, or want to see a specific topic covered, you are more than welcome to open an issue or submit a pull request in the GitHub repository of this book.

Testimonials

Testimonials from students enrolled in the course “An Introduction to Version Control of Code and Data” at the University of Hamburg, where this book served as the primary learning resource:

The online, tailor-made course book and wiki, along with the exercises and quizzes, were incredibly helpful and aided my comprehension.

All materials are conveniently accessible through a course website and completely open source, thus conveying important principles of good scientific practice beyond version control. I particularly appreciated this open approach!