Purging data from source control is essential for maintaining a clean, efficient, and secure codebase. In this guide, we’ll define purging in the context of Git repositories, explain why it matters, compare the top tools, and walk through hands-on examples.

What Is Purging?

Purging a repository means removing unwanted or sensitive files from its commit history. This process helps you:
  • Reclaim disk space
  • Eliminate accidental commits
  • Protect secrets from exposure
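To see why a purge has to rewrite history, note that deleting a file in a new commit does not remove it from earlier commits. The sketch below assumes only that `git` is installed; the throwaway repository and the file name `creds.txt` are hypothetical.

```shell
set -e
# Fixed identity so commits work without any global Git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"

# Commit a file, then delete it in a follow-up commit.
echo "secret-token" > "$repo/creds.txt"
git -C "$repo" add creds.txt
git -C "$repo" commit -q -m "add creds"
git -C "$repo" rm -q creds.txt
git -C "$repo" commit -q -m "remove creds"

# The file is gone from the working tree...
in_worktree=no
if [ -e "$repo/creds.txt" ]; then
  in_worktree=yes
fi

# ...but its path still appears among the objects reachable from history.
in_history=no
if git -C "$repo" rev-list --objects --all | grep -q ' creds.txt$'; then
  in_history=yes
fi

echo "in working tree: $in_worktree"
echo "in history:      $in_history"
```

Anyone who clones the repository can still check out the first commit and read the file, which is why the tools covered later rewrite every affected commit.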

Why Purge Files?

By cleaning up your Git history, you can:
  • Optimize Performance: Smaller repos clone and checkout faster.
  • Eliminate Mistakes: Remove large or accidental commits.
  • Protect Secrets: Expunge API keys, passwords, and other sensitive data.
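Always back up your repository before rewriting history. Once the rewritten history is force-pushed and the old objects are garbage-collected, the original commits cannot be recovered without that backup.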
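One way to take such a backup is a mirror clone, which copies every branch and tag into a bare repository. This is a minimal sketch, assuming a local source repository; the paths and the `v1.0` tag are placeholders created just for the demonstration.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the repository you are about to rewrite.
src=$(mktemp -d)
git init -q "$src"
echo "v1" > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" commit -q -m "initial"
git -C "$src" tag v1.0

# Mirror clone: a bare copy that includes all refs, not just one branch.
backup="$(mktemp -d)/backup.git"
git clone -q --mirror "$src" "$backup"

# Confirm the backup holds the same commit and tag as the source.
src_head=$(git -C "$src" rev-parse HEAD)
backup_head=$(git -C "$backup" rev-parse HEAD)
backup_tag=$(git -C "$backup" rev-parse v1.0)
echo "source: $src_head"
echo "backup: $backup_head"
```

Store the backup somewhere outside the directory you are rewriting, and delete it once the purge has been verified if it contains the secrets you removed.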

Repository Cleanup Tools

Here’s a quick comparison of the two leading Git history-rewriting tools:
Tool              Use Case                                                       Documentation
Git filter-repo   Recommended by the Git project, highly configurable, fine-grained   Git filter-repo
BFG Repo-Cleaner  Fast, simple syntax for common cleanup tasks                   BFG Repo-Cleaner

Practical Examples

1. Deleting Large or Unwanted Files

Remove a file named archive.tar.gz:
# Using BFG Repo-Cleaner (matches the file name in any directory):
bfg --delete-files archive.tar.gz

# Or with Git filter-repo (--path is relative to the repository root):
git filter-repo --path archive.tar.gz --invert-paths
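After either command, it is worth confirming that the path no longer appears anywhere in history. The helper below, `path_in_history`, is a hypothetical name rather than part of either tool. It is shown against a throwaway repository that has not been rewritten, so it reports the file as still present.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Succeeds if any commit on any ref ever touched the given path.
# $1 = repository directory, $2 = path relative to the repository root
path_in_history() {
  [ -n "$(git -C "$1" log --all --format=%H -- "$2")" ]
}

# Throwaway repository: the archive is committed, then deleted normally.
repo=$(mktemp -d)
git init -q "$repo"
echo "binary stand-in" > "$repo/archive.tar.gz"
git -C "$repo" add archive.tar.gz
git -C "$repo" commit -q -m "add archive"
git -C "$repo" rm -q archive.tar.gz
git -C "$repo" commit -q -m "drop archive"

if path_in_history "$repo" archive.tar.gz; then
  archive_status="still in history"
else
  archive_status="purged"
fi

if path_in_history "$repo" never-committed.bin; then
  other_status="still in history"
else
  other_status="absent"
fi

echo "archive.tar.gz:      $archive_status"
echo "never-committed.bin: $other_status"
```

Run against a repository that has actually been rewritten, the same check should report the archive as purged.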

2. Removing Sensitive Content

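First, list the strings to scrub in passwords.txt, one per line. Each line is matched as literal text and replaced with ***REMOVED*** by default, and both tools also accept a regex: prefix for patterns. The entries below are placeholders for the actual secret values: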
PASSWORD
API_KEY
Then run:
# Using BFG Repo-Cleaner:
bfg --replace-text passwords.txt

# Or with Git filter-repo:
git filter-repo --replace-text passwords.txt
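To find out which commits introduced or removed a given secret, or to check after the rewrite that none remain, `git log -S` can search all of history for changes involving a string. This sketch uses a throwaway repository and a made-up value, `hunter2-demo-value`.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

secret="hunter2-demo-value"   # hypothetical leaked value

repo=$(mktemp -d)
git init -q "$repo"

# Commit 1 leaks the secret; commit 2 swaps it for a placeholder.
echo "API_KEY=$secret" > "$repo/app.conf"
git -C "$repo" add app.conf
git -C "$repo" commit -q -m "add config"
echo "API_KEY=changeme" > "$repo/app.conf"
git -C "$repo" commit -q -a -m "rotate key"

# -S lists commits that change how often the string occurs, on any ref.
hits=$(git -C "$repo" log --all -S"$secret" --format=%H | wc -l | tr -d ' ')
echo "commits touching the secret: $hits"
```

Both commits are reported here because one adds the string and the other removes it. After a successful `--replace-text` run, the same search should return nothing.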
Force-pushing rewritten history will overwrite the remote. Coordinate with your team to avoid conflicts.

Final Steps

After rewriting history, complete these actions:
  1. Force-push the cleaned history (git filter-repo removes the origin remote, so re-add it first if needed)
    git push --force
    
  2. Notify your team to reclone or reset their local copies (the reset discards local changes; replace main with your default branch):
    git fetch --all
    git reset --hard origin/main
    
Ensure everyone is on the same page to prevent divergent histories.
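The whole hand-off can be rehearsed locally before touching a shared remote. The sketch below is a simulation under stated assumptions: a bare repository stands in for the remote, `git commit --amend` stands in for a real history rewrite, and the branch is named `main` (the `-b` flag needs Git 2.28 or later). All directory names are hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare -b main "$work/remote.git"

# Maintainer publishes a commit that contains a secret.
git init -q -b main "$work/maintainer"
git -C "$work/maintainer" remote add origin "$work/remote.git"
echo "password=demo" > "$work/maintainer/app.conf"
git -C "$work/maintainer" add app.conf
git -C "$work/maintainer" commit -q -m "add config"
git -C "$work/maintainer" push -q origin main

# Teammate clones while the old history is still published.
git clone -q "$work/remote.git" "$work/teammate"

# Stand-in for the rewrite: amending changes the commit hash.
echo "password=***REMOVED***" > "$work/maintainer/app.conf"
git -C "$work/maintainer" commit -q --amend -a -m "add config"
git -C "$work/maintainer" push -q --force origin main

# Teammate drops the old history and adopts the rewritten branch.
git -C "$work/teammate" fetch -q --all
git -C "$work/teammate" reset -q --hard origin/main

maintainer_head=$(git -C "$work/maintainer" rev-parse HEAD)
teammate_head=$(git -C "$work/teammate" rev-parse HEAD)
echo "maintainer: $maintainer_head"
echo "teammate:   $teammate_head"
```

Matching hashes at the end indicate that the teammate's copy no longer diverges from the rewritten remote.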