5  First steps with Git

beginner
basics
Summary

In this chapter, we are exploring the most important basic Git commands that you will need in your everyday workflow.

Learning Objectives

💡 You can initialize a Git repository
💡 You can stage and commit changes
💡 You know how to explore the commit history
💡 You can compare different commits
💡 You know how to use and create a .gitignore file
💡 You can discuss which files can (not) be tracked well with Git and why
💡 You know how to track empty folders in Git repositories

Take the quiz!

5.1 Creating a Git repository

Let’s git started! First, we will create a so-called repository (or repo, for short). A repository is a regular folder on your computer that is tracked by Git. The full history of changes of a project and information about who changed what and when is also stored by Git in this folder.

To create a new Git repository, you need to initialize a folder as a Git repository. First, let’s create an empty folder using the command line1. Alternatively, you can create a new folder using the graphical user interface of your file browser.

Next, use the command line to navigate to the folder which you want to initialize as a Git repository 2. Once you are in the desired folder, run the git init command to initialize the folder as a Git repository:

Code
git init

You should see an output similar to:

Output
Initialized empty Git repository in /foldername/.git/

Congratulations on successfully initializing your first Git repository! 🎉

Git is now able to track your file(s) and the changes you make in this folder. You only need to use git init once per folder.

You can also skip creating a folder manually and only use the git init command:

Code
1git init /Users/yourusername/Desktop/foldername
1
Replace /Users/yourusername/Desktop/foldername with the (absolute or relative) path to the folder which you want to initialize as a Git repository.

Git will create an already initialized folder at the path and with the name you use. Check your current working directory to be sure that you create the Git repository in the desired location 3. You can use both absolute and relative paths as arguments to the git init command to specify the location of your Git repository 4.

Git can now track every file you move, create or change in this folder. You can use this folder like any other normal folder on you computer. The only difference is the tiny but powerful folder called .git within it, which stores the full history of files and relevant metadata of your project, thereby enabling the tracking of your project progress.

If you want to verify that Git is tracking your folder, you can look for the .git folder. To do this navigate to the correct folder and use ls -a to receive a list of hidden files. If the folder has been initialized as a Git repository, you should see a folder called .git. Generally, you should not manually modify the .git folder, since doing so can corrupt your repository.

The .git folder in a Git repository contains all the essential information and configuration for the repository to function properly. Tampering with this folder can have serious consequences, as it can lead to data loss, corruption, or the inability to use the repository effectively. It’s essential to only modify Git repositories through Git commands and established workflows to ensure the integrity and reliability of your project. In short: Don’t mess with the .git folder.

Short version (if you know what you are doing): To remove Git from a folder run:

Code
rm -rf .git

Long version: You may find yourself in a situation where you want to undo the initialization of a Git repository. For example, you initialized the wrong folder as a Git repository. Undoing an initialization is straightforward but should be approached with caution. To completely remove the Git setup from a folder, you can delete the .git directory within it. This is where Git stores all the information about the repository, including the history of commits and configurations. To do this, navigate to the directory of your Git project in the terminal and run the command rm -rf .git. This command forcefully (-f) removes (rm) the .git directory and all of its contents recursively (-r), effectively undoing the git init command. Be aware that this action is irreversible and will permanently delete the repository’s history and configurations, leaving you with just the working directory files (the files currently in your project folder).

5.1.1 Subfolders in a Git repository

Generally, you can organize your Git-initialized project folder in any way you want (just like a regular folder on your computer), for example by creating subfolders within your repository. This allows for better organization of your files and directories, making it easier to manage larger projects. Git will also track the changes in these subfolders. Git commands are designed to work recursively from the parent directory in which git init was executed 5. This means that if you use Git commands within any of the child directories (subfolders) of the repository, those commands will still apply to the entire repository context. For example, if you make changes to a file within a subfolder and commit those changes, Git will process this action in the context of the parent repository, ensuring that the entire project’s history is maintained cohesively.

5.1.2 Checking the status of the Git repository

After you initialized a Git repository, you can use git status to receive the current file tracking status from Git.

Code
git status

If you initialized a new Git repository, you should see an output similar to the following:

Output
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Let’s unpack what this output means:

On branch main

This tells you that you are currently working on the main branch of your Git repository. A branch is like a separate line of development in your project. The default branch is often configured during the setup of Git. You can find out more about branches in a later chapter.

No commits yet

This means that there haven’t been any changes or updates made to the repository yet. You haven’t created or saved any new versions of your project. The term “commit” will be introduced below.

nothing to commit (create/copy files and use "git add" to track)

This indicates that there are no new changes to save or “commit” to the repository at this moment. Git is suggesting that if you want to start tracking changes, you should create or copy some files into the repository’s directory and then use the git add command to tell Git to start keeping track of those files. Once you’ve added files with git add, you can then commit those changes to save them in your project’s history (for more details, see below).

As you can see git status gives you an overview, which files Git is tracking for you. The git status command is helpful for understanding the current state of your Git repository and we recommend to use it frequently, especially before adding or committing changes in files to the Git history (see below for more details).

-s: Provides a more compact and simplified output, showing only the file status in a short format.

-b: Includes information about the branch you are currently on along with the status.

-u: Shows untracked files in the output.

-uno: Suppresses the display of untracked files.

-v: Provides more detailed information, including additional status information about ignored files.

You can use these flags in combination. For example, git status -s -b will display a short and concise output along with the branch information.

5.1.3 Adding files to the Git repository

In the next step, please create an empty text file within your repository 6.

Code
touch filename.txt

You can also use any other method or application for creating files.

5.2 Status, staging and committing

5.2.1 Checking the status of your Git repository again

After you initialized a Git repository and added a file, you can use git status to receive the current file tracking status from Git again. This time, you should see an output similar to the following:

Output
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
  
  filename.txt

nothing added to commit but untracked files present (use "git add" to track)

5.2.2 Introduction to staging and committing

We will now introduce central concepts of Git that might not be super intuitive if you learn about them for the first time, so read carefully. After initializing a folder with Git, it’s capable of recording changes in your files, but it won’t do so automatically. In Git, the process of saving changes involves two steps: the staging area and commits. The staging area is like a preparation area where you gather and organize your changes before saving them as a snapshot. You can choose which changes to include in a snapshot by adding them to the staging area using git add. Once you are happy with the changes in the staging area, you create a commit with git commit. A commit is like taking a snapshot of your project at a specific moment, capturing all changes in the staging area and saving them as a new version of your project. Commits serve as milestones to track your project’s progress and allow you to revert to previous versions if needed. As further introduction to the basic workflow of Git you can check out the last two minutes of the YouTube video: Version control for reproducible research from BERD Academy, which explains the process very vividly.

5.2.3 Staging

You can use git add to place new or modified files in the staging area. The staging area acts as a space for gathering changes you plan to commit shortly. It bridges between your modified files and the next commit.

For example, you can stage a specific file like this:

Code
git add filename.txt

If you use git status again, your file(s) should now show up under “Changes to be committed”.

Output
On branch main

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
    new file:   filename.txt

Your change to the file (or the addition of it to the project directory) is now staged. Especially if you have many files, you don’t have to add every file one by one, but you can simply use the -A flag. This means if you want to stage every file in your project for preparation of your next commit, you can use:

Code
git add -A

Especially for existing project folders that have a lot of files in them, be careful when you stage and / or commit all files in the project directory. Git might start to track files that you are actually not interested in tracking. We recommend to use git status frequently to check which files are in the staging area and will therefore be added to the next commit.

-A: Adds all changes, including modifications, deletions, and new files, in the entire working tree.

-u: Adds modifications and deletions, but not new files.

--ignore-errors: Ignores errors when adding files, allowing the command to continue even if some files cannot be added.

5.2.4 Committing

Now that the changes to your file(s) are staged, you are ready to create a commit.

In simple terms, a commit in Git is like taking a snapshot of your project at a specific moment in time. It’s a way to save the changes you’ve made to your files. When you make a commit, you’re saying, “I want to remember what my project looks like right now”. Each commit in Git includes a record of the changes you’ve made, a description of what you did, and a unique identifier. Commits are like milestones in your project’s history, and they allow you to keep track of all the different versions of your work over time. You can go back to any commit to see what your project looked like at that point or even undo changes if needed. Commits help you manage and document the history of your project.

To create a commit, use the git commit command followed by the flag -m and a commit message in quotes that describes the changes you made. The commit message should be short yet informative, providing enough detail to understand the purpose of the commit. If you just use git commit without adding a commit massage, the text editor of your choosing, opens up and lets you type in a commit message.

Code
git commit -m "Add filename.txt file"

You should see output similar to the following:

Output
[main (root-commit) e9ea807] Add filename.txt file
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 filename.txt

Congratulations! You have successfully created your first commit in the Git repository. 🎉

Commits are the core elements of version control and the “commit history” of your Git repository. They allow you to track the history of your project and easily revert changes if needed.

You can use the same workflow of git add and git commit for every file you add or make changes in.

5.2.4.1 Rock climbing analogy

Using a Git commit is like using anchors and other protection when climbing. If you’re crossing a dangerous rock face you want to make sure you’ve used protection to catch you if you fall. Commits play a similar role: if you make a mistake, you can’t fall past the previous commit. Coding without commits is like free-climbing: you can travel much faster in the short-term, but in the long-term the chances of catastrophic failure are high! Like rock climbing protection, you want to be judicious in your use of commits. Committing too frequently will slow your progress; use more commits when you’re in uncertain or dangerous territory. Commits are also helpful to others, because they show your journey, not just the destination.

from R Packages by Hadley Wickham

In this situation, you command line probably looks like this:

Output
git commit -m "Add filename.txt
>
>
>

You can try one of the following solutions:

  1. Close the quotation marks by entering another ".
  2. Abort the git commit command by typing Ctrl + C simultaneously and repeat it (now with quotation marks closed).

5.2.4.2 Commit description

Although it’s not required, it can be a good idea to add a more thorough description to the commit, in addition to the (shorter) commit message. While the commit message is usually a single short (less than 50 character) line summarizing the change, a more thorough description can be used to add more background information or helpful links that may help to understand the changes of the commit.

To add an additional description directly in the command line, add an extra -m after the commit message, followed by the description in quotes:

Code
git commit -m "Message" -m "Description"

-m : Specifies the commit message inline. For example, git commit -m "Fix typo" allows you to provide a short commit message directly in the command.

-a or --all: Automatically stages all modified and deleted files before committing. This skips the separate git add step.

-v or --verbose: Provides a detailed output, showing the diff of the changes being committed.

-e or --edit: Opens the commit message editor, allowing you to edit the commit message before finalizing the commit.

--amend: Modifies the previous commit. It allows you to add new changes to the previous commit or modify its commit message.

5.2.5 Logging commits

To look at the history of your past commits, you can use the git log command.

Code
git log

You should see output similar to the following:

Output
commit e9ea80781ceed7cc3d6bff0c7bfa71f320ec1f60 (HEAD -> main)
Author: Jane Doe <jane@example.com>
Date:   Thu Jun 29 12:23:53 2023 +0200

    Add filename.txt file

git log is a very useful command because it provides a clear and organized view of a repository’s commit history. It allows you to understand the evolution of a project over time by displaying detailed information about each commit, including changes made, authors, and timestamps.

--oneline: Provides a condensed output with each commit displayed on a single line, showing the abbreviated commit hash and commit message.

-n or --max-count=: Limits the number of commits shown to the specified . For example, git log -n 5 will display the latest 5 commits.

--since=: Shows commits made after the specified . You can use various date formats, such as specific dates or relative expressions like “2 weeks ago” or “yesterday”.

--until=: Shows commits made before the specified .

--author=: Filters commits by the author’s name or email using a specified .

HEAD can be thought as the current spot in your project’s timeline, like a bookmark. When you see HEAD -> main, it means your bookmark is on the main branch, usually at the most recent update. It moves with each commit you make, staying always at the forefront of your project’s development. When you switch branches with a command like git switch, HEAD moves to point to the tip of the new branch, changing the context of your working directory to reflect the latest commit on that branch. HEAD is used in many Git commands to reference the current commit. For example, git reset HEAD is a command used to unstage changes by moving the current branch’s tip back to HEAD, without altering the working directory. Similarly, git checkout HEAD~1 will move HEAD back one commit as a way to look at or revert to a previous state of the project without making any changes to the branch itself.

5.2.6 Comparing versions

Another very handy feature is the git diff command. It allows you to compare two different versions of your file(s).

By default it shows you any uncommitted changes since the last commit. You can explore this by pasting any additional text in your .txt file, for example:

Code
Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. 
Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. 
Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

You can then look at the changes using:

Code
git diff

You should see an output similar to the following:

Output
+++ b/filename.txt
@@ -0,0 +1 @@
+Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid ex ea commodi consequat. Quis aute iure reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint obcaecat cupiditat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
\ No newline at end of file

The output of git diff includes several pieces of information:

  • +++ and --- indicate the paths to the files being compared.
  • @@ -0,0 +1 @@ is a unified diff header. It shows where the changes occurred. Note, that this line will look different for each change or commit.
  • + indicates added lines and - indicates removed lines. Lines that are identical between the two versions are not explicitly shown.

You can also use git diff to compare two specific commits. For example, if you want to compare the current state of your file with a commit from a few commits ago:

Code
git diff <commit-A> <commit-B>

Instead of <commit-A> and <commit-B>, you use the commit hashes, that you can see when using git log.

If you’ve staged some changes using git add, you can compare the staged changes to your last commit.

Code
git diff --staged

If you only want to look at the changes of a specific file, you can specify that file.

Code
1git diff <filename>
1
Replace <filename> with the actual name of the file that you want to inspect.

This command can also be combined with additional flags and options.

--cached or --staged: Shows the changes that are staged (added) but not yet committed.

<commit>: Displays the difference between the current working directory and a specific commit. For example, git diff abc123 shows the changes compared to commit abc123.

<commit> .. <commit>: Shows the difference between two specific commits. For example, git diff abc123 def456 displays the changes between commit abc123 and commit def456.

--name-only: Outputs only the names of the files that have differences, without showing the actual changes.

--name-status: Displays the names of files along with a status identifier that indicates if a file was added, modified, or deleted.

--color-words: Highlights the differences at the word level, providing more granular detail.

5.2.6.1 Commit hashes

To compare the changes between two commits, you can use git diff followed by the commit hashes of the two points in history you’re interested in comparing. A commit hash is akin to a unique fingerprint for each commit you make to your repository. This hash is a long string of characters generated by Git using a cryptographic hash function (specifically SHA-1, though Git is transitioning to SHA-256 for enhanced security). The purpose of this hash is to uniquely identify every single change set or commit in the history of your repository. Think of it as a precise identifier that Git uses to track the specific state of your project at the moment of each commit. Every commit hash is unique. Even the slightest change results in a completely different hash.

5.2.7 Good commits

You should not consider a commit as a general Save button, that you use to save all your recent changes in one go. Instead, each commit should ideally contain one isolated and complete change. For example, if you want to rename a variable and add a new enhancement, put the variable rename in one commit and the enhancement in another commit.

This approach to commits helps you down the line. For example, if you want to keep the enhancement but revert the renaming of the variable, you can revert the specific commit that contained the variable rename. If you put the variable rename and the enhancement in the same commit or spread the variable rename across multiple commits, you would spend more effort reverting your changes.

5.2.8 Commit message etiquette

Git allows for a commit message with a 72 character limit and a description without such a limit. In general, you should aim to write clear and short commit messages. There are different conventions across projects/persons but also some general guidelines, you can stick to.

It is standard to start the title with an imperative verb to indicate the purpose of the commit, for example, “add”, “fix” or “improve”. It is also recommended to only write 50 characters for the title and up to 72 for the description (if such a description is even necessary). Here is an example:

Example
git commit -m "Implement user registration feature

This commit adds the user registration functionality to the application. It includes the following changes:
- Created a new 'register' route and view for user registration.
- Added form validation for user input."

An example of commit messages, which are not really optimal, are illustrated in Figure 5.1.

Figure 5.1: “Git Commit” by xkcd (License: CC BY-NC 2.5)

Some examples for commit messages, which stick to the etiquette:

  • "Add 'favourites.txt' to the project"
  • "Fix typo in first recipe"
  • "Improve code comments for clarity"
  • "Fix critical security vulnerability (CVE-2022-1234)"
  • "Refactor database query functions for efficiency"
  • "Update installation instructions in README"

5.2.9 Amending a commit

The git commit --amend command is used to modify (or “amend”) your most recent Git commit in your repository. It allows you to edit the commit message, add more changes to the commit, or simply adjust the last commit without creating a new commit. This flag can be useful if you forgot to include something in your last commit, made a typo or want to change your commit message. You can also combine --amend with other flags. To change the commit message of your last commit (without opening an editor) try:

Code
git commit --amend -m "changed commit message"

Replace "changed commit message" with your actual improved commit message.

If you want to to keep your commit message but include new changes to your file(s) in you last commit, you can use:

Code
git commit --amend --no-edit

The --no-edit flag, lets you keep the commit message of your most recent commit.

5.2.9.1 The repeated --amend workflow

Let’s consider the following part of the rock climbing analogy for commits again:

use more commits when you’re in uncertain or dangerous territory.

The “repeated amend” is a Git workflow where you avoid cluttering your history with numerous tiny commits. Instead, you gradually build up a “good” commit by amending it repeatedly. Continue making small changes and amending the existing commit, refining it over time. This method keeps your Git history useful and looking good, making sure your commits tell a straightforward story of how your project evolved.

This workflow can be useful when you’re working on a new feature or a significant change and you might make several small adjustments to get it right. Instead of cluttering your history with numerous tiny commits for each tweak, you can use the “repeated amend” to gradually build up a well-polished commit that includes the entire feature.

5.2.10 Partial commits

Git allows you to make partial commits by staging only specific parts of your file before committing. This can be achieved using git add with the -p or --patch option, which allows you to interactively choose which changes to stage.

To try this make some changes to your file(s) and use:

Code
git add -p

This will prompt you with each change, giving you options to stage, skip, or split the changes. You’ll see a series of prompts like this:

Output
+ Example text ...
+
+
+
(1/x) Stage this hunk [y,n,q,a,d,e,?]?

These prompt options respectively stand for:

  • y: Stage this hunk.
  • n: Do not stage this hunk.
  • q: Quit. Do not stage this hunk or any remaining hunks.
  • a: Stage this hunk and all later hunks in the file.
  • d: Do not stage this hunk or any later hunks in the file.
  • /: Search for a hunk matching the given regex.
  • e: Manually edit the current hunk.
  • ?: Print help.

Type one of the symbols in the command line to proceed in the desired manner.

In Git, a “hunk” refers to a distinct block of code changes within a file. It represents a cohesive set of added, modified, or deleted lines in a specific location. Git automatically divides changes into hunks to facilitate easier review, selective staging, and conflict resolution during version control operations.

Our recommendation: Use a GUI for partial commits

In our opinion, using partial commits on the command line is a bit of a hassle. This would be a good use case for a Git Graphical User Interface (GUI). To checkout how to do partial commits using GitKraken checkout the GUI chapter in this book.

5.3 What files can/should I track with Git?

In principle, any file can be tracked with Git. However, to make use of the full potential of Git, it is recommended to mainly track plain text files with Git.

5.3.1 Code files

Tracking code files is the most common and original use case for version control with Git. Git is well-suited for tracking changes in source code, and it’s widely used by developers and teams for this purpose. Whether it’s a single developer maintaining a personal project or a large team collaborating on a complex software system, Git excels at tracking code files throughout the development lifecycle.

5.3.2 Plain textfiles

Git is a useful tool even if your project contains few or no code at all, especially in comparison to a setup that uses emails or shared Dropbox folders for “version control” (see the introduction chapter). However, to really be able to use the full set of features of Git, you should rely on plain text files for your project. This is because .docx files are saved as binary files, which makes meaningful outputs of the text inside it impossible for Git. “Plain textfiles” does not mean you have to use .txt files. Instead you can use formattable Markdown (.md) files. Markdown (.md) files are plain text files that contain formatting elements, making them more versatile than plain .txt files. Markdown allows you to add simple formatting like headings, lists, links, and images.

5.3.3 Microsoft Office files

As said earlier, it is not recommended to track Microsoft office files using Git, since Git treats .docx files as binary files. Binary files lack the inherent structure that allows Git to capture and display changes effectively. Git relies on understanding the differences between versions of files. With binary files, you won’t benefit from the ability to view detailed textual differences (git diff) or effectively use branches. Collaborative work on binary files, such as simultaneous editing of a Word document, may result in complex merge conflicts that are challenging to resolve.

So while it is possible to track .docx files with Git, you will not be able to use the many features of Git, which rely on Git being able to display the file, since Git can only output the “zeros and ones” and not the text inside the .docx files.

If you ever choose to track them regardless, you should use detailed commit messages, since you will not be able to easily look at the text content of past versions. You will also need to know about temporary word files. For it’s own version control system, Word creates temporary files in the folder of your word file. You should not track these files using Git, since Word creates a lot of them. To not stage or commit these files by accident, you can use the .gitignore file.

5.4 Ignoring files and folders: .gitignore

.gitignore is a special file. It is used to specify files and folders that should be ignored and not tracked by Git. When you create a .gitignore file, place it inside your Git repository and specify names of files or folders inside it. Git will then exclude these files and directories from being staged or committed, for example when using git add -A which normally stages all your files.

The .gitignore file is useful to prevent certain files or directories that are not essential for the project or generated during the development process from being included in the version history. Files that can recreated from the code, like for example .png plots from R code, should also be included in your project’s .gitignore file. Including only essential files in version control keeps your repository clean, making it easier for collaborators to focus on the important project files without distractions from unnecessary files. .gitignore can also help you avoid accidentally committing sensitive information like passwords, API keys, or personal data, which can lead to serious privacy breaches. Huge files, like big data files should also not be committed, since they can slow down working with your repository. To version control files with a big size, tools like DataLad are more appropriate.

To create a .gitignore file from the command line, navigate to your repository and use the touch command. Alternatively, create the file in your favorite text editor.

Code
touch .gitignore

.gitignore will be a hidden file, so to make it show up in the command line, you will have to use the -a flag in the ls command. For details on listing of files and folders, see the chapter on the command line.

Code
ls -a

Now you can write a filename or folder name inside it, to prevent Git from tracking it.

It’s recommended to add a project-specific .gitignore file to your repository using the regular stage-commit workflow that you learned about in this chapter, for example:

git add .gitignore
git commit -m "Add .gitignore"

5.4.1 Global .gitignore file

Instead of specifying files you want Git to ignore for each folder, you can also create global .gitignore file. To do this, you create a file named .gitignore_global (or any name you prefer) in your users’s home directory (located, for example, at /Users/yourusername). Then, you configure Git to use this global file by running:

Code
git config --global core.excludesfile ~/.gitignore_global

After this setup, you can add common files to this list, that you want to ignore. These rules will apply across to all Git repositories on your computer.

As discussed in the command line chapter wildcards are special characters that represent patterns of filenames or directory names. They can be used to specify multiple files or directories that should be ignored by Git when tracking changes. Wildcards allow you to match multiple files with a single rule, making it more convenient to exclude specific types of files or patterns.

*: Matches any number of characters within a filename. For example, *.txt will match all files with the extension .txt in any directory.

?: Matches a single character within a filename. For example, image?.png will match files like image1.png or imageA.png.

/: Matches the root directory of the repository. For example, /config will match a directory named config only in the root of the repository.

[] (Square Brackets): Matches any single character within the brackets. For example, file[123].txt will match file1.txt, file2.txt, or file3.txt.

!: Negates a pattern and includes files that would otherwise be ignored. For example, !important.txt will exclude important.txt from being ignored, even if there’s a wildcard pattern that matches it.

  • Temporary files and output: Ignore files generated during analysis, like log files, temporary files, or intermediate data files.
  • Data folders: Exclude large datasets or data stored locally.
  • Environment-specific files: Exclude environment-specific files like .env files or venv folders used for local development setups.
  • System-specific files: In macOS, ignore .DS_Store files, and in Windows, ignore Thumbs.db files.
  • R: Ignore .Rdata files and /Rplots.pdf generated by R for plotting.
  • Python: Ignore .pyc (Python compiled) files and pycache folders.
  • LaTeX: Ignore auxiliary files like .aux, .log, .bbl, and .blg files generated during LaTeX compilation.

# R-specific
*.Rproj.user/
*.Rhistory
.RData
.Rproj

# R package specific
.Rcheck/
man/*.Rd
NAMESPACE

# Temporary files
*.bak
*.csv~
*.html
*.pdf

# RMarkdown-specific
*.knit.md
*_cache/
*_files/

# R Markdown Notebook
*.nb.html

# RStudio Project Files
*.Rproj

# R Environment Variables
.Renviron

# R dcf file
DESCRIPTION.meta

# Mac-specific
.DS_Store

.DS_Store is a hidden file created by macOS in every folder to store custom attributes of the folder, such as the position of icons or the choice of a background image. The filename stands for “Desktop Services Store” and it helps macOS Finder maintain the folder’s view settings. .DS_Store is very commonly ignored from version control using the .gitignore file, for various reasons:

  1. Unnecessary for the project: .DS_Store files contain metadata that are only relevant to the local macOS file system and do not provide any useful information for the project itself. Including them in the repository adds unnecessary clutter.

  2. Avoiding conflicts: Since .DS_Store files are automatically generated by macOS, different users may have different versions of .DS_Store files based on their personal Finder settings. Should .DS_Store be tracked by Git, this can lead to unnecessary version conflicts when collaborating on a project.

  3. Irrelevant for collaborators with another operating system: Anyone else who is not using macOS (for example, Windows or Linux users) do not need these files and they serve no purpose for them. Including .DS_Store in the repository could confuse contributors who are not using macOS.

To add .DS_Store to .gitignore, you can include the following line in your .gitignore file:

Code
.DS_Store

This will ensure that any .DS_Store files in the project directory and its subdirectories are ignored by Git and not tracked in the repository.

5.5 Tracking empty folders

Normally, Git does not track empty folders in a repository. However, tracking an empty directory in a Git repository can be useful in some scenarios. For example, you want to track an empty folder as a placeholder for future content or as a way to organize the project more clearly.

A workaround, to get Git to track your empty folder, is adding a hidden .gitkeep file in this folder7. This not an official Git feature but rather a convention adopted by software developers. That means you could name this file anything, but .gitkeep is used by convention to explicitly state its purpose for keeping otherwise empty folders in Git.

As discussed above, Git is not suitable for tracking a lot of large files, as is often the case when working with research data. Imagine that you use Git for your code that you use to analyze data from a research project. Usually, your analysis code needs access to the research data but you think that the research data is too large to be tracked with Git.

Here is what you could do:

  1. Create an empty subfolder in your analysis Git repo, called data.
  2. Add a .gitkeep file to the /data folder and commit it.
  3. Add the /data folder to your .gitignore file.
  4. Add your research data to the /data folder.

This is the result: Your analysis code and research data can now live in the same repository. Your analysis code can access your research data in the /data folder. However, the contents of /data are not committed to the repository, thanks to the .gitignore file. If you later share that repository with collaborators, they will only receive a repository with an empty /data folder, thanks to .gitkeep. The advantage is that your collaborators can easily understand where to put the research data. Note, however, that you still need to make the research data accessible to your collaborators in another way.

If you are interested in version control of larger datasets, you can check out DataLad.

5.6 Cheatsheet

Command Description
git init Initializes a folder as Git repository
git status Views Git tracking status of files in the repository
git add Adds file(s) to the staging area
git commit Commits staged files
git commit -m "commit message" Commits staged files with a commit message
git log Views past commits
git diff Views made changes compared to the last commit

5.7 Acknowledgements & further reading

We would like to express our gratitude to the following resources, which have been essential in shaping this chapter. We recommend these references for further reading:

Authors Title Website License Source
The Turing Way Community (2022) The Turing Way: A handbook for reproducible, ethical and collaborative research CC BY 4.0
Millman et al. (2018) Teaching Computational Reproducibility for Neuroimaging CC BY 4.0. Website:
McBain (2019) Git for Scientists CC BY-SA 4.0
Bryan (2023) Happy Git and GitHub for the useR CC BY-NC 4.0

  1. 💡 Tip: Use mkdir FOLDER_NAME (details here)↩︎

  2. 💡 Tip: Use cd (details here)↩︎

  3. 💡 Tip: Use pwd (details here)↩︎

  4. 💡 Tip: Details about absolute and relative paths here↩︎

  5. 💡 Tip: Details about parent and child directories here↩︎

  6. 💡 Tip: Use touch (details here)↩︎

  7. 💡 Tip: Use touch .gitkeep (details here)↩︎