Working with large repositories

Challenges in Large Repositories
Optimize Clone Time with a Shallow Clone
Managing Large Files
Git Large File Storage (LFS)
Git-Fat
Cross-Repository Sharing
Sparse Checkout for Partial Working Copies
Partial Clone to Defer Large Object Downloads
Background Prefetch to Keep Objects Up to Date
Links and References

As projects evolve, repositories often accumulate extensive commit histories and hefty binary assets. These factors can slow down everyday Git operations like clone, fetch, and checkout. This guide explores strategies and tools to keep your workflow snappy—even in very large repositories.

Challenges in Large Repositories

Long histories, numerous branches, and large binaries contribute to degraded performance. Cloning or fetching a repo with hundreds of megabytes (or gigabytes) of data can take minutes or even hours.

The image illustrates challenges in managing large code repositories, highlighting extensive commit histories and the presence of large binary files.

Optimize Clone Time with a Shallow Clone

A shallow clone downloads only the most recent commits and omits deep history:

git clone --depth <number-of-commits> <repository-url>

Replace <number-of-commits> with how many commits you need (e.g., 1 for just the latest). This approach can reduce clone time and disk usage dramatically.

The image illustrates a concept of optimizing history in large repositories, showing folders labeled "Code" and "Clone" connected to a large repository icon. It suggests using shallow cloning to speed up cloning time.

Shallow clones are ideal for read-only CI jobs or quick code reviews, but avoid them if you need full history for debugging or releases.
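The effect of --depth can be checked locally. The sketch below is a self-contained demo, not a recommended workflow: it builds a throwaway repository with five commits (the directory names and commit messages are made up for illustration), shallow-clones it over a file:// URL, and then converts the clone to a full one with git fetch --unshallow.

```shell
#!/bin/sh
set -eu

# Throwaway workspace; all paths below are hypothetical demo names.
work=$(mktemp -d)

# Build a small "origin" repository with five commits.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Example User"
git config user.email "user@example.com"
for i in 1 2 3 4 5; do
  echo "change $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Shallow clone. A file:// URL is used because --depth is ignored
# when cloning from a plain local path.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
cd "$work/shallow"
shallow_count=$(git rev-list --count HEAD)
is_shallow=$(git rev-parse --is-shallow-repository)
echo "commits after shallow clone: $shallow_count (shallow: $is_shallow)"

# Fetch the remaining history when it turns out to be needed.
git fetch -q --unshallow
full_count=$(git rev-list --count HEAD)
echo "commits after unshallow: $full_count"
```

If only a little more history is needed, git fetch --deepen=<n> extends a shallow clone by n commits instead of fetching everything.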

Managing Large Files

Storing large binaries in Git can bloat your .git directory. Two popular tools help:

Git Large File Storage (LFS)

Git LFS moves big assets—audio, video, datasets—to a separate server while keeping lightweight pointers in your repo:

git lfs install

Then track files by pattern:

git lfs track "*.psd"
git add .gitattributes

The image illustrates Git Large File Storage (LFS), showing how it replaces large files like audio, video, and graphics with text pointers in Git, while storing the actual file contents on a remote server.
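To see the pointer mechanism in practice, the following sketch commits a small placeholder file through LFS and prints what Git actually stores. It assumes the git-lfs extension is installed and skips itself otherwise; the repository path and the design.psd file are hypothetical.

```shell
#!/bin/sh
set -eu

if ! git lfs version >/dev/null 2>&1; then
  # git-lfs is a separate extension; without it there is nothing to demo.
  lfs_available=no
  echo "git-lfs is not installed; skipping demo"
else
  lfs_available=yes
  work=$(mktemp -d)
  git init -q "$work/repo"
  cd "$work/repo"
  git config user.name "Example User"
  git config user.email "user@example.com"

  # Enable the LFS filters for this repository only.
  git lfs install --local

  # Tracking a pattern writes a filter rule into .gitattributes.
  git lfs track "*.psd"
  cat .gitattributes

  # Placeholder content standing in for a real binary asset.
  printf 'placeholder image bytes' > design.psd
  git add .gitattributes design.psd
  git commit -q -m "Track PSD files with Git LFS"

  # List the files managed by LFS, then show the pointer stored in Git.
  git lfs ls-files
  git cat-file -p HEAD:design.psd
fi
```

The last command prints a short text pointer (a spec version, an oid, and a size) rather than the file contents, which is what keeps the repository itself small.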

Git-Fat

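An alternative is Git-Fat, which also offloads large blobs and keeps pointer files in your Git history. Files are selected through a filter entry in .gitattributes, after which you initialize the filter in your clone:

echo "*.zip filter=fat -crlf" >> .gitattributes
git fat init

Git-Fat reads the location of its remote store from a .gitfat file in the repository root. Use git fat push and git fat pull to transfer the large files.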

The image is an infographic about "Git-Fat," a tool for handling large files in Git repositories, showing how it manages audio, video, and graphics files to reduce repository size.

Cross-Repository Sharing

To avoid duplicate code, share common libraries or components across projects. You can use Git submodules, subtrees, or a package manager.

Method	Description	Example Command
Submodule	Embed another repo at a fixed path	`git submodule add <repo-url> path/to/module`
Subtree	Merge external repo into a subdirectory	`git subtree add --prefix=lib/my-lib <repo> main`
Package	Publish and consume via npm, Maven, or NuGet	`npm install @myorg/my-lib`

The image illustrates cross-repository sharing, showing two repositories (A and B) connected by shared code and components to reduce duplication and improve maintainability.
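As an illustration of the submodule row, the sketch below creates two throwaway repositories (a hypothetical my-lib library and an app that consumes it) and shows that the application pins the library at one specific commit. The protocol.file.allow override is needed only because the demo adds a submodule from a local path, which recent Git versions block by default; it would not be needed for a normal HTTPS or SSH URL.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical shared library repository.
git init -q "$work/my-lib"
cd "$work/my-lib"
git config user.name "Example User"
git config user.email "user@example.com"
echo "shared helper" > helper.txt
git add helper.txt
git commit -q -m "Add shared helper"
lib_head=$(git rev-parse HEAD)

# Hypothetical application repository that consumes the library.
git init -q "$work/app"
cd "$work/app"
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

# Embed the library at a fixed path (local-path demo, hence the -c override).
git -c protocol.file.allow=always submodule --quiet add "$work/my-lib" lib/my-lib
git commit -q -m "Add my-lib as a submodule"

# The superproject records the exact library commit it depends on.
pinned=$(git -C lib/my-lib rev-parse HEAD)
git submodule status
cat .gitmodules
```

Collaborators who clone the application would then run git submodule update --init to populate lib/my-lib at the pinned commit.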

Sparse Checkout for Partial Working Copies

If you only need a subset of files, sparse checkout lets you clone the full Git history but only check out selected paths:

git clone <repository-url>
cd <repo-directory>
git sparse-checkout init --cone
git sparse-checkout set path/to/folder another/path

This keeps your working tree lean by populating only the directories you specify.

The image illustrates the concept of sparse-checkout, showing a diagram with "Repository A" and "Repository B" connected to "Large Repositories," highlighting the ability to check out only a subset of a repository.
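The following sketch shows what cone-mode sparse checkout does to the working tree. It uses a throwaway repository with made-up directories (src, docs, assets) and keeps only src; in cone mode, files in the repository root are always checked out as well.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical project layout with three top-level directories.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Example User"
git config user.email "user@example.com"
mkdir -p src docs assets
echo "code" > src/main.txt
echo "manual" > docs/guide.txt
echo "asset placeholder" > assets/logo.txt
echo "readme" > README.txt
git add .
git commit -q -m "Add project layout"

# Clone, then restrict the working tree to src/ (plus root-level files).
git clone -q "file://$work/origin" "$work/clone"
cd "$work/clone"
git sparse-checkout init --cone
git sparse-checkout set src

# Show the active sparse paths and what is left on disk.
sparse_paths=$(git sparse-checkout list)
echo "sparse paths: $sparse_paths"
ls
```

Running git sparse-checkout disable restores the full working tree. Full history is still present in .git, so this reduces the files on disk, not the amount of data cloned.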

Partial Clone to Defer Large Object Downloads

A partial clone avoids downloading all blobs up front and fetches objects on demand:

git clone --filter=blob:none <repository-url>
cd <repo-directory>
git sparse-checkout init --cone       # optional, to combine with sparse checkout

Git will automatically retrieve missing objects the first time you access them.

The image is a flowchart illustrating the process of a partial clone in Git, showing steps to initialize a repository, enable partial clone, and download necessary or all Git objects based on the decision.
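On-demand fetching can be observed with a small local experiment. This sketch assumes a throwaway repository acting as the server, so it enables the uploadpack settings that let a local repository serve filtered clones (hosting services typically handle this on their side). It counts the objects missing from the partial clone, reads an old file version, and counts again.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical "server" repository with two versions of one file.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Example User"
git config user.email "user@example.com"
# Allow this local repository to serve filtered and on-demand requests.
git config uploadpack.allowFilter true
git config uploadpack.allowAnySHA1InWant true
echo "version 1" > data.txt
git add data.txt
git commit -q -m "Add data"
echo "version 2" > data.txt
git commit -q -am "Update data"

# Blobless clone: commits and trees are downloaded, and only the blobs
# needed to check out HEAD are fetched.
git clone -q --filter=blob:none "file://$work/origin" "$work/partial"
cd "$work/partial"

# Count objects reachable from HEAD that are not present locally.
count_missing() {
  git rev-list --objects --missing=print HEAD | grep -c '^?' || true
}

missing_before=$(count_missing)
echo "missing objects before access: $missing_before"

# Reading the old version makes Git fetch that blob from the remote.
old_content=$(git cat-file -p HEAD~1:data.txt)
echo "old content: $old_content"

missing_after=$(count_missing)
echo "missing objects after access: $missing_after"
```

Commands that walk history content, such as git log -p or git blame, can trigger many of these small fetches, so a blobless clone works best when most work touches recent files.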

Background Prefetch to Keep Objects Up to Date

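Enable background maintenance so Git periodically prefetches object data from remotes, which reduces wait times during normal fetch operations. Run this once inside the repository:

git maintenance start

This registers the repository for scheduled maintenance. With the default incremental strategy, the prefetch task runs hourly and stores what it downloads under refs/prefetch/, so your remote-tracking branches stay untouched until you run a normal fetch. You can also have each fetch refresh the commit-graph file:

git config --global fetch.writeCommitGraph true

With these settings, Git maintains an up-to-date local object database that accelerates subsequent fetches.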

The image is an infographic titled "Background Prefetch," illustrating a process for downloading Git object data from large repositories every hour to reduce time for foreground Git fetch calls.
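The prefetch step can also be run once by hand, which makes its behavior easy to inspect without installing a scheduler entry. The sketch below assumes a Git version that includes the git maintenance command and uses throwaway repositories with made-up names. It adds a commit on the remote side, runs the prefetch task in the clone, and checks that the new commit arrived while the remote-tracking branch stayed where it was.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical remote repository.
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > file.txt
git add file.txt
git commit -q -m "First commit"
branch=$(git symbolic-ref --short HEAD)

git clone -q "file://$work/origin" "$work/clone"

# A new commit lands on the remote after the clone was made.
echo "second" >> file.txt
git commit -q -am "Second commit"
new_commit=$(git rev-parse HEAD)

cd "$work/clone"
tracking_before=$(git rev-parse "refs/remotes/origin/$branch")

# Run the same task the background scheduler would run.
git maintenance run --task=prefetch

tracking_after=$(git rev-parse "refs/remotes/origin/$branch")
prefetch_refs=$(git for-each-ref refs/prefetch/ | grep -c . || true)

# The object is now local even though no regular fetch has happened.
if git cat-file -e "$new_commit"; then
  have_object=yes
else
  have_object=no
fi
echo "prefetch refs: $prefetch_refs, new commit present: $have_object"
```

A later git fetch then mostly has to update refs, because the objects are already in the local object database.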

Links and References

Introduction to managing repositories

Large repositories with Scalar
