12  Rewriting History

intermediate
Summary

Dive into Git’s history-rewriting features for cleaner commits and efficient collaboration.

12.1 How to avoid accidental commits

Considering rewriting Git history? This usually comes into play when there are errors or changes needed in previous commits. There are a few tricks to avoid committing things you don’t want committed:

  1. Avoid the catch-all commands git add . and git commit -a. Use git add filename and git rm filename to individually stage files.

  2. Use git add --interactive to individually review and stage changes within each file.

  3. Use git diff --cached to review the changes that you have staged for commit. This is the exact diff that git commit will produce as long as you don’t use the -a flag.

  4. You can use also a visual program like GitHub Desktop or GitKraken to commit changes. Visual programs generally make it easier to see exactly which files will be added, deleted, and modified with each commit.

12.2 Removing changes or files

Be careful about data loss!

This section introduces Git commands that may delete changes in your repository beyond recovery. Especially if you apply them for the first time, take your time and only execute a command if you are certain about its consequences. Better yet, experiment with these commands in a separate test repository, not in a repository where you keep your most important work.

12.2.1 Discarding changes in the working directory

Sometimes you might want to revert files back to the state of your last commit. For example, this can be useful when you realize that the recent changes you made to a file have introduced an error. You can use git restore to discard changes in the working directory and revert the files to the state they were in when you last committed them.

Code
git restore <FILE>

It is also possible to revert all files in your repository to the state of your last commit using:

Code
git restore .

12.2.2 Unstaging files

You can also use git restore with the --staged flag to unstage files, you have previously added to the staging area. This can be useful when you accidentally staged files or want to reorganize your commit.

Code
git restore --staged <FILE>

To remove all files from the staging area use:

Code
git restore --staged .

12.2.3 Deleting files

If you want to delete files from your computer and your Git repository, you can use the git rm command. This command removes files from your working directory and automatically stages this change for your next commit. The workflow would go like this:

Code
git rm <FILE>
Code
git commit -m "Delete file <FILE>"

-f or --force: This flag forces the removal of files, even if they are modified.

-r or --recursive: Use this flag to remove directories and their contents.

--cached: This flag removes files from the staging area but preserves them in the working directory.

-n or --dry-run: With this flag, Git will only show you what would be removed but will not actually perform the deletion.

12.2.4 “Untracking” files

Sometimes, you may have files in your Git repository that you no longer want to track or include in future commits, but want to keep in your local filesystem. It makes sense to include these files in a .gitignore file, as discussed in the chapter on first steps with Git. You could also use git rm in combination with the --cached flag.

Code
git rm --cached <FILE>

12.2.5 Reverting commits

The git revert command is used to create a new commit that undoes the changes made in a previous commit. It’s a way to safely reverse the effects of a specific commit without actually removing that commit from the commit history. It requires specifying the commit hash you want to revert. To look at the hashes of your commits, you can use the git log command, as described in the chapter on first steps with Git. If you have a specific commit that you want to revert, you would use a command like:

Code
git revert <commithash>

This will create a new commit that effectively undoes the changes made in the specified commit. This new commit will have the opposite changes, effectively canceling out the changes from the original commit. You might need to resolve a merge conflict, if the changes you want to revert conflict with changes in subsequent commits. By default the commit message will be revert <commit message of reverted commit>. However, Git will open an editor for you to change this message, if you don’t specify it otherwise.

In Git, a commit hash, also known as a commit ID or SHA-1 hash, is a unique identifier for a specific commit in a Git repository. It’s a 40-character hexadecimal string that represents the contents and history of that commit. Each commit in a Git repository has a unique hash.

-n or --no commit: Prevents Git from automatically creating a new commit after reverting changes. It stages the changes, allowing you to make additional modifications or review them before committing.

-m <parent-number>: When dealing with a merge commit, this flag specifies which parent commit to use as the source for reverting. By default, Git uses the first parent (main branch), but you can specify another parent by providing its number.

--no-edit: This flag prevents the text editor from opening for editing the commit message, making it useful when you want to keep the default commit message.

12.2.6 Resetting to a commit

If you want to reset your repository to a commit that was made in the past, you can use the git reset command.

To get an overview of your past commits you can either use git log or git reflog. Both commands should get you an overview of your past commits. For example:

Code
git log --oneline

will give you an output like:

Output
25a51e8 Update README.md
b7f3a12 Add new feature X
8d76a45 Merge branch 'feature-branch'
2f0e73b Implement new functionality

Using git reset <commit> will undo all the changes you made after the specific commit you picked. Your “branch pointer” will move to the specified commit, and changes after that commit will be uncommitted and moved back to the working directory. For example:

Code
git reset 8d76a45

However, these changes are still present in your working directory, so if you want to discard them completely, you can use add the --hard flag:

Code
git reset --hard <commithash>

This will not only reset your “branch pointer” but also discard all changes in your working directory and staging area since the specified commit. It effectively resets your working directory to the state of the chosen commit.

Be careful about data loss!

Please be careful when using the git reset --hard command! This command reset your working directory to the state of the chosen previous commit. This will delete changes beyond recovery, unless you can retrieve these changes from another location, for example a remote repository like GitHub.

git log shows the commit history, while git reflog shows a log of all changes to branches, including resets and other adjustments. Think of git log as a timeline of commits, and git reflog as a detailed diary of all recent changes, even those that alter commit history, providing a safety net for recovery.

--soft: Resets the branch pointer to the specified commit but keeps changes staged. This allows you to rework the changes and create a new commit.

--mixed (default): Resets the branch pointer to the specified commit and unstages changes. Changes are kept in your working directory, allowing you to modify them before committing.

--hard: Resets the branch pointer to the specified commit, unstages changes, and discards changes in your working directory. Use with caution, as it can lead to data loss.

--merge: Resets the branch pointer and the index to the specified commit, but keeps changes in your working directory. This is useful in aborting a merge.

--keep: Resets the branch pointer and index to the specified commit but refuses to do so if there are uncommitted changes in the working directory.

--patch or -p: Allows you to interactively choose changes to reset, similar to the git add -p command for staging changes.

12.3 Purging a file from your repository’s history

Let’s say that you accidentally added a large file to a previous commit. Now you want to remove the file but keep all commits that came afterwards.

For example, this could be the case if you committed sensitive data as a binary file. You will need to remove the file from the history, as you can’t modify it to remove or replace the data.

12.3.1 git filter-repo

git filter-repo is a very useful tool that is not part of the official Git distribution. It is a third-party tool and is designed to be used alongside Git for more advanced repository history manipulation tasks.

To illustrate how git filter-repo works, we’ll show you how to remove your file with sensitive data from the history of your repository and add it to .gitignore to ensure that it is not accidentally re-committed.

12.3.1.1 Installation

Since git filter-repo is not a part of standard Git, you will need to install it first. You can install git-filter-repo manually or by using a package manager. For example, to install the tool with HomeBrew on macOS, use the brew install command.

brew install git-filter-repo

For more information on installation, see INSTALL.md in the newren/git-filter-repo.

12.3.1.2 Clone a fresh copy of the repository

If you don’t already have a local copy of your repository with sensitive data in its history, clone the repository to your local computer.

1git clone https://github.com/YOUR-USERNAME>/YOUR-REPOSITORY
1
Replace YOUR-USERNAME with your GitHub username and YOUR-REPOSITORY with the name of your GitHub repository.

12.3.1.4 Running git filter-repo command

Warning

If you run git filter-repo after stashing changes, you won’t be able to retrieve your changes with other stash commands. Before running git filter-repo, we recommend unstashing any changes you’ve made. To unstash the last set of changes you’ve stashed, run git stash show -p | git apply -R. For more information, see Git Tools - Stashing and Cleaning.

Let’s say that you accidentally committed (and pushed) a file you actually don’t want to share because it contains sensitive data. After you installed the git filter-repo tool, you can use it to exclude a specific file from you commit history.

Run the following command, replacing PATH-TO-FILE-YOU-WANT-TO-REMOVE with the path to the file you want to remove, not just its file name. These arguments will do the following:

  1. Force Git to process, but not check out, the entire history of every branch and tag
  2. Remove the specified file, as well as any empty commits generated as a result 1- Remove some configurations, such as the remote URL, stored in the .git/config file. You may want to back up this file in advance for restoration later.
  3. Overwrite your existing tags

Essentially, git filter-repo will modify your entire Git repository history to exclude the specified file (PATH-TO-FILE-YOU-WANT-TO-REMOVE). The --invert-paths flag inverts the selection, so it keeps everything except the specified path, effectively removing the file from the entire history of the repository. If you do not use --invert-paths, the command would only keep the specified file in your repository, discarding everything else.

1git filter-repo --invert-paths --path PATH-TO-FILE-YOU-WANT-TO-REMOVE
1
Replace PATH-TO-FILE-YOU-WANT-TO-REMOVE with the path to the file** you want to remove, not just its file name.

If you did not clone a fresh copy of your repository, you may see this message after running the git filter-repo command:

Aborting: Refusing to destructively overwrite repo history since
this does not look like a fresh clone.
  (expected freshly packed repo)
Please operate on a fresh clone instead.  If you want to proceed
anyway, use --force.

As described in the message, you have two options:

  1. Clone a fresh copy of your repository and execute the command there.
  2. Add --force to proceed with the existing repository.

12.3.1.5 Add your file to .gitignore

Add your file to .gitignore to ensure that you don’t accidentally commit it again. You can edit .gitignore in your favorite text editor.

1echo "PATH-TO-FILE-YOU-WANT-TO-REMOVE" >> .gitignore
git add .gitignore
git commit -m "Add YOUR-FILE-WITH-SENSITIVE-DATA to .gitignore"
1
This command writes PATH-TO-FILE-YOU-WANT-TO-REMOVE inside the .gitignore file. >> makes sure that this is written to a new line inside the .gitignore file.

Double-check that you’ve removed everything you wanted to from your repository’s history, and that all of your branches are checked out.

12.3.2 BFG

Instead of using git filter-repo it is also possible to use the open source tool BFG repo cleaner. The tool is especially useful if you need to remove very large files from your commit history.

12.3.3 Force-push your local changes to GitHub

If you are happy with the state of your repository, force-push your local changes to overwrite your repository to GitHub, as well as all the branches you’ve pushed up. A force push is required to remove sensitive data from your commit history.

When you perform a force push, you are essentially overwriting the existing commit history on the remote branch with the new commit history from your local branch. So instead of adding changes to your commit history, like with a normal push, a force push changes or discards the existing commit history on that branch. To do a force push, you use git push together with the --force flag. The --all flag is used to push all branches that have corresponding branches on the remote repository, that way you are updating multiple branches at once.

git push origin --force --all

Did you receive this error message?

fatal: 'origin' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

As described above, this could mean that origin was removed from your repository. Configure origin again using either SSH or HTTPS:

1git remote add origin https://github.com/YOUR-USERNAME/YOUR-REPOSITORY
1
Replace YOUR-USERNAME with your GitHub username and YOUR-REPOSITORY with the name of your GitHub repository.
1git remote add origin git@github.com:YOUR-USERNAME/YOUR-REPOSITORY.git
1
Replace <YOUR-USERNAME> with your GitHub username and <YOUR-REPOSITORY> with the name of your GitHub repository.

In order to remove the sensitive file from your tagged releases, you’ll also need to force-push against your Git tags:

git push origin --force --tags
The dangers of force pushing

As stated in this example, force pushing can be very useful, for example if you committed and pushed data you want to remove again. However force pushing comes with potential drawbacks and should mostly be used as a last resort.

Force pushing can lead to the loss of commit history on the remote branch, which may cause confusion or conflicts for collaborators who have based their work on the previous history. Force pushing to a shared branch can cause conflicts for collaborators who have already cloned the repository and pulled changes. Once a force push is executed, the old commit history may become difficult to recover.

Especially if you collaborate with others on your repository, let them know in advance that you are force-pushing to your common remote repository.

12.4 Fully removing data from GitHub

If you you committed really sensitive data and want to make absolutely sure that it’s deleted, you will have to complete a few additional steps. After using either the BFG tool or git filter-repo to remove the sensitive data and pushing your changes to GitHub, you must take a few more steps to fully remove the data from GitHub.

  1. Contact GitHub Support, asking them to remove cached views and references to the sensitive data in pull requests on GitHub. Please provide the name of the repository and/or a link to the commit you need removed.

  2. Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.

  3. After some time has passed and you’re confident that BFG / git filter-repo had no unintended side effects, you can force all objects in your local repository to be dereferenced and garbage collected with the following commands (using Git 1.8.5 or newer):

git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Tip

You can also achieve this by pushing your filtered history to a new or empty repo.

12.5 Acknowledgements & further reading

We would like to express our gratitude to the following resources, which have been essential in shaping this chapter. We recommend these references for further reading:

Authors Title Website License Source
GitHub (2023) GitHub Docs CC BY-NC 4.0

12.6 Cheatsheet

Command Description
git restore <FILE> Reverts <FILE> back to the state of your last commit
git restore . Reverts all files in your repository back to the state of your last commit
git restore --staged <FILE> Removes <FILE> from your staging area
git restore --staged . Removes all files in your repository from your staging area
git rm <FILE> Deletes <FILE> from your repository
git rm --cached <FILE> Removes <FILE> from your repository but keeps <FILE> on your system
git revert <commithash> Creates a new commit which reverts your repository back to <commithash>
git reflog Logs recent branch changes
git reset <commithash> Resets the branch to a specified commit, keeping changes in the working directory
git reset --hard <commithash> Resets the branch to a specified commit
git clean Deletes untracked files from your directory
git filter-repo --invert-paths --path < PATH-TO-FILE-YOU-WANT-TO-REMOVE > Remove specified file from your repository history
brew install git-filter-repo Installs git filter-repo using brew