Git Mail

“So basically, what I’m doing is sorting the email on the server in a git repository, and then pushing and pulling to that as needed, or just working there over ssh,” I explained.

“So what you’re saying, is that you’ve basically reinvented IMAP,” Chris said.

“Yeah, pretty much, except that this works,” I said.

“If you say so.”


I do say so. So this is what it comes down to:

IMAP got one thing right: we need a way of accessing our email that works on public computers, is machine independent, that works offline and online, and that keeps all these mail reading environments synchronized.

The problem is that IMAP is incredibly flakey and inconsistent. Messages that you’ve read suddenly become unread, messages that you’ve moved suddenly pop back into your inbox, it’s slow, and if you don’t have the right mail client and a server that’s tweaked in the right way, it might not really work at all.

If IMAP worked as well as it could, we’d all use it because, ideally, it’s the best way to manage email: everything is stored remotely but cached locally for offline use and backup, and you can use multiple machines without worry. Instead, I suspect most people who need multi-machine email accounts use webmail,[^webmail] and that’s great if it works for you. Gmail and the like are great pieces of software, and I don’t mean to begrudge webmail; I just don’t enjoy the experience of the browser, and seek to avoid it except for browsing.

So while I’d given up on email that synced really well in the last couple of months (preferring to use mutt and procmail locally), I still longed for this kind of email setup.

So I thought a bit and figured some things out, and here’s what I came up with, as a step-by-step explanation of how I get email:

  1. A bunch of email addresses (for different contexts) are forwarded to a gmail account, which despamifies my email, and uses its own filters to sort and forward mail (using +address aliases) to a secret email account on the web server.
  2. The webserver uses procmail to sort and deliver the email into a non-public (obviously) folder/git repository.
  3. I have a series of scripts to manage the git/mail interaction depending on whether I’m on an SSH connection or not. Mostly this is straightforward, but here’s how the sync works (commands from the local perspective in parentheses):
    1. via ssh I add and commit any new mail that may have arrived since the last sync to the repository on the server (ssh git add * && git commit).
    2. Locally I commit any change that’s been made to my mail directory since the last sync, and pull down the new mail from the server (git add * && git commit && git pull)
    3. Push my changes up, so that everything merges and the repositories look the same. Then I reset the index of the remote server to reflect the changes (git push && ssh git reset --hard).[1]
    4. Rejoice.[2]
    5. Set up a cronjob or a launchd daemon to do this automatically every, say 15 minutes, and then you get the illusion of having “push” email.
      • If you get growl to notify you of the results of the pull (end of step 2) you’ll be able to see whether there’s new mail or not.
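
The steps above can be sketched as a single script. This is only a rough sketch under some assumptions, not my exact scripts: the host name, the `mail` directory (relative to the home directory on both machines), the commit message, and the `DRY_RUN` switch are all hypothetical. It also uses `git add -A` instead of `git add *` so that deletions get staged too. Note that newer versions of git refuse a push into a checked-out branch by default, so the server repository would likely need `receive.denyCurrentBranch` set to `ignore` for step 3 to work.

```shell
#!/bin/sh
# Sketch of the four sync steps. All names below are placeholders.
MAIL_HOST="${MAIL_HOST:-mail.example.com}"   # hypothetical server
MAIL_DIR="${MAIL_DIR:-mail}"                 # repo path relative to $HOME
LOCAL_DIR="$HOME/$MAIL_DIR"

# Stage everything (including deletions); commit only if something changed.
COMMIT_ALL='git add -A . && { git diff --cached --quiet || git commit -q -m "mail sync"; }'

# run CMD...: execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

sync_mail() {
    # 1. Commit newly delivered mail on the server.
    run ssh "$MAIL_HOST" "cd $MAIL_DIR && $COMMIT_ALL" &&
    # 2. Commit local changes, then pull the new mail down.
    run sh -c "cd '$LOCAL_DIR' && $COMMIT_ALL && git pull -q" &&
    # 3. Push the merged result back up...
    run sh -c "cd '$LOCAL_DIR' && git push -q" &&
    # ...and make the server's checked-out files match what was pushed.
    run ssh "$MAIL_HOST" "cd $MAIL_DIR && git reset -q --hard"
}

# Only sync when asked to, e.g. "sh sync-mail.sh sync".
if [ "${1:-}" = "sync" ]; then
    sync_mail
fi
```

Running it with `DRY_RUN=1` prints the four commands without touching anything, which is a cheap way to check the quoting before letting cron loose on it. A crontab line along the lines of `*/15 * * * * sh $HOME/bin/sync-mail.sh sync` (path hypothetical) would cover step 5.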

I think you could probably set much of this up in post-commit/post-push hooks. And I think it goes without saying that if there is more than one “client” repository in play, you’ll want to pull changes more often than you sync.
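
As a sketch of the hook idea: as far as I know git has no client-side post-push hook, but the server side has `post-receive`, which runs after a push lands and could do the `git reset --hard` for you. The function name `install_reset_hook` is hypothetical, and this assumes the server repository is a normal (non-bare) one with its hooks in `.git/hooks`.

```shell
#!/bin/sh
# install_reset_hook REPO: write a post-receive hook into REPO/.git/hooks
# so the server's checked-out mail files are refreshed after every push.
install_reset_hook() {
    hook="$1/.git/hooks/post-receive"
    cat > "$hook" <<'EOF'
#!/bin/sh
# Push-time hooks run inside .git with GIT_DIR set, so clear it and
# step out to the work tree before resetting.
unset GIT_DIR
cd .. && git reset -q --hard
EOF
    chmod +x "$hook"
}
```

With something like this in place on the server, the trailing `ssh git reset --hard` in step 3 should become unnecessary, though I’d test it on a scratch repository first, since the reset throws away any uncommitted changes in the server’s mail folder.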

So why would I (and you?) want to do this? Easy:

  • It’s quick. Git is fast, by design, and since email files have a lot of redundancy in the headers and whatnot, git can save a lot of space in the “tube” and speed up your email downloads.
  • It’s robust and flexible. If my server stopped working for some reason, I could set up another, or simply start using fetchmail again. If the server came back up, I’d just push the changes up, and I’d be back in business.
  • Git handles renames implicitly, and this is a really cool feature that I don’t think gets quite enough mention. Basically if I move or rename a file, and add the new file to git (git add * does this), git realizes that I’ve moved the file. So as files move and get named other things, git realizes that it’s happened, deals with it and moves on.
    • Understandably, if you delete a file locally git won’t delete it from the repository “head” unless you tell it to. This bash script takes care of that: for i in $(git status | grep deleted | awk '{print $3}'); do git rm --quiet $i; done
  • It’s secure, or at least it is if you do it right. All the email gets downloaded over SSH, so no clear text passwords (if you’re using public key authentication) and encrypted data transmission.
  • Oh yeah, and it’s not flakey like IMAP. Totally worthwhile.
    • If there is some sort of syncing problem, you know where it is, as opposed to IMAP making an executive decision without telling you and un-sorting all your email.
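
The deleted-file cleanup mentioned above depends on the exact layout of `git status` output, which is meant for humans. A sturdier sketch, assuming a reasonably recent git, asks git directly for tracked files that are missing from disk and removes them from the index. The function name `stage_deletions` is hypothetical.

```shell
#!/bin/sh
# stage_deletions: mark every tracked file that has vanished from the
# working tree as deleted in the index. NUL-separated paths (-z) keep
# odd maildir file names intact, and empty input is a harmless no-op.
stage_deletions() {
    git ls-files --deleted -z | git update-index --remove -z --stdin
}
```

Run inside the mail repository before committing, it stages the deletions without parsing any human-readable output. `git add -A` covers the same ground along with new and changed files, so this is mostly useful when you want to handle deletions on their own.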

Any questions?

Onward and Upward!



Notes:
  1. Git is really good at merging, and because it can track renaming implicitly, it does pretty well in this situation. There are some things you can do to basically ensure that you never have a conflicted merge (because, after all, you’re never really editing these files, and rarely touching them concurrently, for the most part). Remember, after pushing to the server, always reset its index. Under normal conditions we’d not push to a repository that had an index, so this usually isn’t an issue, but here, where it’s important that the server’s repository have files, keeping that index “right” means you won’t undo your changes in successive commits.

  2. My first instinct was to focus on pushing changes up early, and it turns out that this is the wrong thing to do, as it means potentially that the state of the upstream repository could be wonky when you want to pull. Pulling before you push, locally, prevents having more than one kind of change in any one commit, and makes the whole situation a bit more error proof. 


Comments:

  1. You are doing a few things inefficiently.

    First, use “git add .” to add all files, not “git add \*” (or “git add ‘\*’”), because “git add .” is much faster (and was much, much faster).

    Second, in scripts or pipelines, instead of using “git status” and grepping its output, use “git ls-files” with the appropriate option (‘--other’ in your case). Read also about “git update-index” for how to mark all removed files as such (see GitFaq or GitTips on the Git Wiki, http://git.or.cz/gitwiki/); perhaps “git add -u” or “git add -A” (quite new and very new inventions, respectively) would be what you want.

    Third, you can try searching the git mailing list archives for discussion of modifying git in such a way as to better work with maildirs.

    Comment by Jakub Narebski — 30 August 2008 @ 6:03 am


  2. Conceptually, cool. I’m not computer-efficient enough to pull this off (nor do I want to spend more time at my computer right now), but I like following your thought process here. Which I mostly do.

    Comment by Deborah Robson — 30 August 2008 @ 9:28 am


  3. So after some more thinking about it, you could also wire this up backwards, a la:

    Have a bare repository on the centralized server, and then either: run fetchmail in a cron job on your local machine and sync the changes upstream (kinda lame, because you have to have fetchmail and procmail running in the same way on every machine that you check your email on; alternatively, you could wire up a post pull/push hook that syncs up your procmailrc in the repo with your mail folder, but it’s still lame).

    The second option is that you run a script in cron that syncs (pull, then push) the repository where procmail sorts your mail to the bare repository that’s elsewhere on the server. The main downside is that your local check mail script doesn’t definitively tell you if you have new mail, because the new mail cron might not keep the repository that you’re pulling from completely up to date. And while your remote server likely has the capacity for more than one copy of your mail folder, it feels sort of wasteful. This option would also run faster locally, as you’re just doing a pull, whereas, right now, you’re locally doing a lot of magic.

    The second option might make more sense if you had more than two computers working on this system, because I think the push/reset command might get a little harried otherwise, but probably not…

    Comment by tycho garen — 2 September 2008 @ 4:27 pm

