remote encrypted backups with duplicity

In the last couple of months I have been trying to come up with
a coherent data hosting and backup strategy. I have come up with
something I am fairly happy with (though I haven't quite
implemented it all yet):

For large media (music, TV, movies), store it all in one place on
a 300 GB or 500 GB drive, and back it up once a week to an
external USB drive of the same size. Exchange the USB drive for
a similar one once a year or so. (so you have a weekly full
backup, and an annual snapshot)
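
For the weekly copy to the USB drive, something as simple as
rsync should do. A minimal sketch, with hypothetical paths and
the external drive assumed to be mounted at /mnt/usb-backup:

    # mirror the media tree onto the external drive;
    # --delete removes files that no longer exist locally
    rsync -a --delete /data/media/ /mnt/usb-backup/media/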

For smaller but more important data (e.g. the CVS repository that
holds my web site, currently about 30 GB), store it in that same
master space but also create redundant backups of it with various
online services, like Amazon S3 and anywhere else you have space.
(I'm currently using Dreamhost where I have 400 GB of space for
$16/mo, and my disk quota increases faster than I'll use it.)
http://www.dreamhost.com/r.cgi?227949

I haven't implemented the first part of my plan yet, but a
couple of months ago I ran out of room on the server where I was
backing up my CVS repository, so that motivated me to figure out
how to implement the second part of my plan for smaller data.

I wanted to be able to back up my data in a way that doesn't let
the backup host (e.g. Amazon or Dreamhost) read it; there's
nothing especially secret in there, but there might be one day,
and Amazon or whoever would probably happily give up my data if
it were subpoenaed for some reason.

Also, I didn't want to rely on any particular data hosting
service; I don't expect S3 to go away any time soon but I don't
think they've committed to offering it indefinitely, and I'd like
to use multiple storage services in case one fails.

A few things I tried in order to accomplish these goals:

 - Brackup http://brad.livejournal.com/tag/brackup

   Sounds like exactly what I need, but after a couple of hours
   of struggle I was unable to get it to work. I don't remember
   what the problems were, and didn't send bug reports (it was
   over a month ago when I tried). I'm planning to try it again
   after it matures a bit more.

 - Write my own code to encrypt each file in my repository and
   upload it to Amazon S3. I expected that storing a file in S3
   would be extremely simple, but after hours of struggle with
   software libraries in all kinds of different languages I gave
   up on this too.

   I am mystified that this is so difficult; I just want a
   program in any language that lets me say
   "put this file to S3, return an error if you can't", and
   "get this file from S3, return an error if you can't".
   (see the sketch after this list)

 - Duplicity http://duplicity.nongnu.org/

   I was able to get this to work with only minor effort. I used
   it to back up my 30 GB CVS repository to my shell account on
   Dreamhost, and tested it by restoring a file. I haven't
   really examined duplicity closely, so I wouldn't rely on it
   as my only backup, but it seems to work.
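
(Back to the S3 point above: what I want is something like the
following, which I gather the s3cmd command-line client
provides. A sketch I haven't tried, assuming s3cmd is installed
and configured with your AWS keys; the bucket name is
hypothetical, and it should exit nonzero on failure:

    # upload one file to S3
    s3cmd put $HOME/backup/repo.tar.gpg s3://my-backup-bucket/repo.tar.gpg

    # fetch it back
    s3cmd get s3://my-backup-bucket/repo.tar.gpg /tmp/repo.tar.gpg

)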

Notes about using duplicity:

1. the first full backup takes a long time, and if it's
   interrupted in the middle (e.g. due to a network hiccup) it
   seems to have to start over from scratch. My first few
   attempts to back up 30 GB to Dreamhost failed after a few GB
   and I had to start over again hours later. I worked around
   this problem with the command line arg:

       --scp-command 'scp -o ConnectionAttempts=1800'

   which tells scp to keep retrying for half an hour (ssh makes
   one connection attempt per second, so 1800 attempts is about
   30 minutes).

2. Duplicity generates a big file with signatures of all the
   files it backs up; for my 30 GB of data (145674 files) this
   file is 480 MB, which filled up my /tmp the first time. I
   worked around this by pointing it at a temp directory with
   more room:

       TMP=$HOME/tmp
3. By default it breaks the backup data into encrypted chunks of
   5 MB each and scp's each chunk to the remote site one at a
   time; to speed this up I installed openssh v4 to benefit from
   its connection reuse features, and started a master ssh
   session between the two systems inside 'screen'. (a sample
   config is sketched below, after these notes)
   http://larve.net/people/hugo/2005/blog/2006/04/20/speeding-up-ssh-fsh-vs-openssh-v4/

4. Incremental backups seem to work but are really inefficient,
   because the first thing it does each time is download the
   massive 480 MB signature file from the remote server. I
   haven't checked whether there's some way to prevent that,
   e.g. by caching the signature file locally. This is probably
   less of an issue if you aren't backing up lots of small
   files like I am, because the signature file will be smaller.

5. It encrypts everything with GPG, which I haven't figured out
   a good way to run non-interactively yet. I found gpg-agent,
   which presumably works much like ssh-agent, but I haven't
   tried it yet. (one possible approach is sketched after the
   commands below)
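
Regarding note 3: the connection reuse feature I mean is
openssh's ControlMaster/ControlPath. A minimal sketch of the
~/.ssh/config entry, assuming openssh v4 or later on the client:

    Host spunky.dreamhost.com
        # let the first connection become the master, and have
        # every later ssh/scp to this host piggyback on it
        ControlMaster auto
        ControlPath ~/.ssh/master-%r@%h:%p

With that in place, the long-lived ssh session I keep open
inside 'screen' acts as the master, and each 5 MB chunk that
duplicity scp's over skips the connection setup.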

The commands I ended up using to back up my data were:

 For the first full backup:

   TMP=$HOME/tmp duplicity \
       --scp-command 'scp -o ConnectionAttempts=1800' \
        $HOME/cvsroot scp://[email protected]/duplicity/bubbles

 To do an incremental against the last full backup:

   TMP=$HOME/tmp duplicity --incremental \
       --scp-command 'scp -o ConnectionAttempts=1800' \
        $HOME/cvsroot scp://[email protected]/duplicity/bubbles

 (the --incremental isn't really needed; it just tells duplicity
 to abort if it can't do an incremental backup)

 To restore a single file:

   TMP=$HOME/tmp duplicity \
       --file-to-restore www/people/gerald/index.html,v \
       scp://[email protected]/duplicity/bubbles \
       /tmp/restored-index.html,v
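
Following up on note 5: one way to run the incremental backups
non-interactively (e.g. from cron) might be a wrapper script
like the one below. This is only a sketch I haven't tried: it
assumes duplicity will take the GnuPG passphrase from a
PASSPHRASE environment variable instead of prompting for it,
and it keeps the passphrase in a file, which has obvious
security trade-offs:

    #!/bin/sh
    # read the passphrase from a file that only I can read
    # (chmod 600 $HOME/.backup-passphrase)
    PASSPHRASE=`cat $HOME/.backup-passphrase`
    export PASSPHRASE

    TMP=$HOME/tmp duplicity --incremental \
        --scp-command 'scp -o ConnectionAttempts=1800' \
        $HOME/cvsroot scp://[email protected]/duplicity/bubbles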

Various related stuff I had bookmarked:

   http://www.debian-administration.org/articles/209
     (hmm, haven't read that yet, looks very useful)
   http://aws.amazon.com/s3
   http://www.jungledisk.com/
   https://files.dreamhost.com/
   http://joseph.randomnetworks.com/archives/2006/10/03/amazon-s3-vs-dreamhost/
   http://developer.amazonwebservices.com/connect/thread.jspa?messageID=44471
   http://www.muellerware.org/projects/s3u/index.html
   http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=47
   http://s3.amazonaws.com/ServEdge_pub/s3sync/README_s3cmd.txt
   http://jeremy.zawodny.com/blog/archives/007641.html
   http://web.mit.edu/~emin/www/source_code/dibs/index.html
   http://backuppc.sourceforge.net/

--
Gerald Oskoboiny <[email protected]>
http://impressive.net/people/gerald/
