Minority Opinions

Not everyone can be mainstream, after all.

Uncrunching Git

I abuse version control.  I use it for program development, of course, even when I have no reason to expect anyone else to care.  I use two separate types for my home directory: one private, another public.  I’ve used git as a package manager.  But the best way I’ve found to utterly break it is a saved game directory.

Git was not designed for large binary files.  It can handle them, but it doesn’t like to.  Fortunately, most binary files change rarely or are significantly rewritten whenever they need to be checked in.  For some saved game data, not so much.  Certain games, particularly of the types I enjoy playing, save lots of data as C structures straight from memory to disk in one or more massive files.

At first, everything seems fine.  Checking in is nice and smooth; requesting a diff will complain that some of the files are binary, but quickly.  The trouble comes when trying to push or pull a large set of changes between repositories.  Git will tell you that it’s compressing files, but will appear to stall.  The computer doing the compression will feel sluggish.  Leave it long enough, and it will be impossible to log in.

The first time this happened, I was able to cancel in time, and used a tag to push half of the commits at a time.  Then I forgot about the issue for a year or two before it bit me again.  This time, with a bit more experience under my belt, I tried an explicit git repack.  No dice; that exhibits the exact same problem.  Then I looked at its help page and found the --window-memory option, described as useful for “repositories with a mix of large and small objects” just like mine.  Someone had seen my problem before, and already solved it.
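
As a rough sketch of that explicit repack: the 1g cap below is an assumption (it is the figure used later for the configuration setting; the value passed at the time isn't recorded), and the throwaway repository with its made-up save.dat exists only so the commands can run anywhere.

```shell
# Throwaway repository so the sketch runs anywhere; in practice, run only
# the final repack line from inside the affected repository.
repo=$(mktemp -d)
git init --quiet "$repo"
cd "$repo"

# Hypothetical stand-in for a large binary saved-game file.
head -c 1048576 /dev/urandom > save.dat
git add save.dat
git -c user.name=example -c user.email=example@example.com \
    commit --quiet -m "saved game"

# Repack everything into a single pack, limiting the memory used by the
# delta-search window. The 1g value is assumed, not taken from the original run.
git repack -a -d -q --window-memory=1g
```

Only the last line matters in a real repository.  The -a and -d flags (pack everything into one pack, then delete the redundant ones) are a common choice, not necessarily the exact invocation used here.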

About the third or fourth time I found myself repacking the repository before a slow push or pull, I started thinking that there must be a way to make the implicit repack of the push/pull operation use that option.  The help pages for git pack, git push, and git pull mention several configuration options, but nothing for the window memory.  Eventually, a full internet search pointed me to a Stack Overflow answer with the right configuration option.  Which, of course, is perfectly well documented in the git config help page.  Why didn’t I look there first?

So now I have pack.windowMemory set to 1g in my global .gitconfig on all relevant machines, with no noticeable effect on anything else.  I’m also beginning to think that a default value of unlimited is less than optimal; when a single thread consumes enough memory to start thrashing swap, responsiveness gets thrown out the window.  A better default would seem to be the amount of physical memory in the machine.  Is there a good way to determine that?  How valid is the result in the face of process or cgroup limits, virtual machines, and alternative operating systems?
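
For reference, the same setting can be written from the command line instead of editing the file by hand.  This is a minimal sketch; according to the git config documentation, the limit applies to each pack-objects thread, and leaving it unset (or setting it to 0) means no limit.

```shell
# Cap delta-window memory for every repack, including the implicit one
# behind push and pull (on whichever machine ends up doing the packing).
# 1g is the value mentioned in the text.
git config --global pack.windowMemory 1g

# Print the stored value to confirm what git will use.
git config --global --get pack.windowMemory
```

The result in ~/.gitconfig is a windowMemory = 1g entry under a [pack] section.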

Written by eswald

16 Apr 2013 at 9:35 pm

Posted in Technology
