git, version control

I've read that Git does not store file deltas. If this is true, how does it support rolling a file back to previous versions? If it stores the entire file each time, the repository must grow unmanageably large on disk. Does Git support file rollbacks and diffs all the way back to version 1 of a file? Does it even support a versioning concept for individual files? This is (I believe) essential to my understanding of a VCS/DVCS and to my needs: I need to be able to compare what I'm about to check in with previous versions.

Best Answer

Git does not throw away information on its own*. All previous versions of every file are always available for reverts, diffs, inspections, et cetera.
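As a rough illustration of the "compare what I'm about to check in" case from the question, here is a minimal sketch that builds a throwaway repository and diffs an uncommitted edit against older versions. The temporary directory, the file name foo.c, and the commit identity are placeholders chosen for the example, not anything from a real project.

```shell
#!/bin/sh
set -eu

# Scratch repository; path, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'version 1\n' > foo.c
git add foo.c
git commit -q -m "foo.c v1"
printf 'version 2\n' > foo.c
git commit -q -am "foo.c v2"

# Uncommitted edit: this is what we are "about to check in".
printf 'version 3\n' > foo.c

# Working copy versus the last commit, then versus the commit before it.
git --no-pager diff HEAD -- foo.c
git --no-pager diff HEAD~1 -- foo.c

# Read the very first version without touching the working copy.
first=$(git rev-list --max-parents=0 HEAD)
v1=$(git show "$first:foo.c")

# Roll the file back to the contents it had one commit ago.
git checkout -q HEAD~1 -- foo.c
rolled_back=$(cat foo.c)
echo "first version: $v1; after rollback: $rolled_back"
```

Note that the last step overwrites the uncommitted edit to foo.c, so in real use you would commit or stash first.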

Whole-tree versus Individual-files

What you may be trying to reconcile is the idea of accessing an old version of an individual file versus the fact that Git's history model is focused on the whole tree. Whole-tree versioning does require a bit more work to see (for example) the version of foo.c as it existed ten foo.c-changes ago versus ten whole-tree-changes ago:

# 10 foo.c-changes ago
git show $(git rev-list -n 10 --reverse HEAD -- foo.c | head -1):foo.c

# 10 whole-tree-changes ago
git show HEAD~10:foo.c

The benefits of tree-orientation, chiefly the ability to view commits as a unit of interdependent changes made to various parts of the whole tree, generally greatly outweigh the extra typing (which can be alleviated with aliases, scripts, et cetera) and CPU time spent digging through past commits.
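The difference between counting per-file changes and whole-tree commits can be seen in a small scratch repository. This is only a sketch: the file names, commit messages, and identity are made up, and it uses git rev-list with --skip as one possible way to step back by file changes (the commands above are another).

```shell
#!/bin/sh
set -eu

# Scratch repository; everything here is a placeholder for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Six commits: odd-numbered ones change foo.c, even-numbered ones change bar.c.
for i in 1 2 3 4 5 6; do
    if [ $((i % 2)) -eq 1 ]; then f=foo.c; else f=bar.c; fi
    printf 'revision %s\n' "$i" > "$f"
    git add "$f"
    git commit -q -m "commit $i: change $f"
done

tree_commits=$(git rev-list --count HEAD)          # every commit
foo_commits=$(git rev-list --count HEAD -- foo.c)  # only commits touching foo.c

# foo.c as of two whole-tree commits ago (commit 4, which still holds revision 3).
tree_view=$(git show HEAD~2:foo.c)

# foo.c as of two foo.c-changes ago: skip the two newest commits that touched
# foo.c (commits 5 and 3) and take the next one (commit 1).
file_commit=$(git rev-list -n 1 --skip=2 HEAD -- foo.c)
file_view=$(git show "$file_commit:foo.c")

echo "commits: $tree_commits total, $foo_commits touching foo.c"
echo "HEAD~2 view: $tree_view; two foo.c-changes ago: $file_view"
```

If the per-file form is needed often, it could be wrapped in an alias or a small script, as the answer suggests.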

Storage Efficiency

When a new object (e.g. a file with previously unseen contents) enters the system, it is stored with plain (zlib) compression as a “loose object”. When enough loose objects accumulate (based on the gc.auto configuration option), or when the user runs git gc or one of the lower-level packing commands, Git will collect many loose objects into a single “pack file”.
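The move from loose objects to a pack file can be observed with git count-objects. The sketch below assumes a freshly created scratch repository; the helper named field is something invented for this example to pick one value out of the command's output.

```shell
#!/bin/sh
set -eu

# Scratch repository; path, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# Five commits, each adding a blob, a tree, and a commit object.
for i in 1 2 3 4 5; do
    printf 'line %s\n' "$i" >> notes.txt
    git add notes.txt
    git commit -q -m "notes revision $i"
done

# Hypothetical helper: extract one field from `git count-objects -v`.
field() {
    git count-objects -v | awk -v key="$1:" '$1 == key { print $2 }'
}

loose_before=$(field count)   # loose objects
packs_before=$(field packs)   # pack files

# Pack everything by hand instead of waiting for the automatic threshold.
git gc -q

loose_after=$(field count)
packs_after=$(field packs)
in_pack=$(field in-pack)      # objects now stored inside packs

echo "before gc: $loose_before loose objects, $packs_before packs"
echo "after gc:  $loose_after loose objects, $packs_after packs, $in_pack packed"
```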

Objects in a pack file can be stored either as plain compressed data (same as a loose object, just bundled up with other objects) or as compressed deltas against some other object. Deltas can be chained together to configurable depths (pack.depth) and can be made against any suitable object (pack.window controls how widely Git searches for the best delta base; a version of a historically unrelated file can be used as a base if doing so would yield a good delta compression). The latitude that the depth and window size configurations give the delta compression engine often results in a better delta compression than the CVS-style simple one-version-against-the-next/previous-version “diff” compression.
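To see whether objects in a pack actually ended up as deltas, git verify-pack -v lists each object and, for deltified ones, adds the chain depth and the base object. The following is a sketch under some assumptions: the sample file, its size, and the depth and window values passed to git repack are arbitrary choices for the demonstration, not recommended settings.

```shell
#!/bin/sh
set -eu

# Scratch repository; path, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

# A file large enough that storing a delta is clearly cheaper than a full copy.
awk 'BEGIN { for (i = 1; i <= 500; i++) print "row", i, "of the sample data" }' > data.txt
git add data.txt
git commit -q -m "initial data"

# Several revisions that each differ from the previous one by a single line.
for i in 1 2 3 4 5; do
    printf 'change %s\n' "$i" >> data.txt
    git commit -q -am "data revision $i"
done

# Repack everything, recomputing deltas with explicit depth and window limits.
git repack -a -d -f -q --depth=50 --window=10

# Deltified objects are listed with two extra columns (depth and base object),
# which gives those lines seven fields in total.
deltas=$(git verify-pack -v .git/objects/pack/pack-*.idx |
    awk 'NF == 7 { n++ } END { print n + 0 }')

echo "objects stored as deltas: $deltas"
```

The same limits can be set persistently through the pack.depth and pack.window configuration options mentioned above.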

It is this aggressive delta compression (combined with normal zlib compression) that can often let a Git repository (with full history and an uncompressed working tree) take less space than a single SVN checkout (with uncompressed working tree and pristine copy).

See the “How Git Stores Objects” and “The Packfile” sections of The Git Community Book, as well as the git pack-objects manpage.

* You can tell Git to throw away commits by “rewriting history” with commands like git reset, but even in these cases Git “hangs onto” the newly discarded commits for a while, just in case you decide that you need them. See git reflog and git prune.
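A minimal sketch of that safety net, assuming a scratch repository with placeholder file names and identity: a commit is dropped with git reset and then recovered through the reflog.

```shell
#!/bin/sh
set -eu

# Scratch repository; path, file name, and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.invalid"

printf 'keep\n' > foo.c
git add foo.c
git commit -q -m "first"
printf 'oops\n' > foo.c
git commit -q -am "second"
discarded=$(git rev-parse HEAD)

# "Rewrite history": drop the newest commit from the branch.
git reset -q --hard HEAD~1

# The discarded commit is no longer reachable from the branch ...
on_branch=$(git rev-list HEAD | grep -c "$discarded" || true)

# ... but the reflog still records where HEAD pointed before the reset.
git --no-pager reflog
previous=$(git rev-parse 'HEAD@{1}')

# Move the branch back to the discarded commit.
git reset -q --hard "$previous"
restored=$(cat foo.c)
echo "restored contents: $restored"
```

Recovery like this only works until the reflog entry expires and the unreachable commit is pruned.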

