Resources

Git for beginners

The Turing Way: Version Control

The Turing Way is a community-driven, open-source project developed by The Alan Turing Institute. Its primary goal is to provide guidance on conducting reproducible, ethical, and collaborative data science. The project aims to make data science accessible and inclusive by offering a set of resources, guidelines, and best practices. The Turing Way Handbook is organized into multiple chapters, each focusing on different aspects of data science and research methodology. The chapter on Version Control in The Turing Way Handbook is dedicated to teaching users the fundamentals and benefits of using version control systems, with a particular focus on Git. It not only teaches the technical aspects of version control but also embeds these practices within the broader context of reproducibility and collaboration in research.

Authors | Title | Website | License | Source
The Turing Way Community (2022) | The Turing Way: A handbook for reproducible, ethical and collaborative research | | CC BY 4.0 |

Software Carpentry: Version Control with Git

The Carpentries is a non-profit organization that teaches foundational coding, data science, and computational skills to researchers, scientists, educators, and data professionals through workshops run around the world. Software Carpentry is one of its key programs, focused on the software skills that are essential for effective and reproducible computational research. The “Software Carpentry: Version Control with Git” lesson is a comprehensive, beginner-friendly introduction to Git. However, it does not cover topics such as branching strategies, rebasing, stashing, and cherry-picking, which are crucial for more complex workflows.

Authors | Title | Website | License | Source
Koziar et al. (2023) | Software Carpentry: Version Control with Git | | CC BY 4.0 |

Pro Git

The Pro Git book is a standard resource for learning Git, offering a comprehensive and in-depth guide to version control with Git. Written by Scott Chacon and Ben Straub, the book covers everything from the basics of Git to advanced topics like branching strategies, workflows, and Git internals. The book is available for free online, making it an excellent resource for anyone looking to deepen their understanding of Git. However, the Pro Git book can be dense, and some sections may feel overwhelming for newcomers. Additionally, while it’s very thorough, users looking for quick, specific answers might find the book’s extensive coverage a bit too detailed. Despite these drawbacks, the Pro Git book is one of the best resources available for mastering Git.

Authors | Title | Website | License | Source
Chacon and Straub (2014) | Pro Git | | CC BY-NC-SA 3.0 |

GitHub

Official GitHub documentation

The official GitHub resources are essential for anyone working with Git and GitHub. They offer clear, detailed guidance on using GitHub for version control, collaboration, and project management, covering everything from basic Git commands to advanced features like Actions, Pull Requests, and Issues. The documentation is well-organized and comprehensive. A downside is that the documentation can sometimes be overwhelming due to its depth and the wide range of topics it covers.

Authors | Title | Website | License | Source
GitHub (2023) | GitHub Docs | | CC BY-NC 4.0 |

Git with R

Happy Git with R

Jennifer Bryan is a prominent figure in the data science and statistics community, particularly known for her work with the R programming language. She is a statistician, data scientist, educator, and open-source advocate with a track record in data analysis, data visualization, and reproducible research. Her guide “Happy Git with R” is a user-friendly resource that helps R users integrate Git and GitHub into their data science workflows. It is particularly useful for data scientists working in R, providing clear, step-by-step instructions on how to use version control and collaboration tools.

Authors | Title | Website | License | Source
Bryan (2023) | Happy Git and GitHub for the useR | | CC BY-NC 4.0 |

Version Control of Data

DataLad

DataLad is an open-source tool for managing, sharing, and reproducing large-scale data and code, particularly in research environments. It builds on Git and git-annex to provide version control for data, making it easier for researchers to handle datasets that are too large to be managed efficiently by Git alone.

The DataLad Handbook

The DataLad Handbook is a comprehensive resource that provides users with guidance on how to use DataLad for managing, sharing, and reproducibly analyzing data. It is available online for free and is designed to be a practical, user-friendly guide for both beginners and advanced users of DataLad.

Authors | Title | Website | License | Source
Community (2021) | The DataLad Handbook | | CC BY-SA 4.0 |