
Keep your files in sync for free

Filed in Ideas, R, Tools

COSTLESS FILE SYNCHRONIZATION TECHNIQUES IN INCREASING ORDER OF COMPLEXITY

It is not uncommon to have two computers at work, four at home, and a server out on the wide open internet. How to keep all these files in sync? Here are some file synchronization tools that we use, listed in increasing order of complexity.


Dropbox dropbox.com
Setup: Easy.
OS: Windows, Linux, or Mac
We use this at Decision Science News, but only for some of our files. New users get 2 GB of storage for free (or 2.25 GB if they sign up through a referral link). If used sparingly it can last a long time. We find this especially useful for often-updated files that one doesn’t want out of sync for even a minute. For files that only need to be synced every day or so, we use Unison, covered next.


Unison www.cis.upenn.edu/~bcpierce/unison
Setup: Moderate for USB drive use, hard for network use (requires installing server software)
OS: Windows, Linux, or Mac

We sync about 10 GB of files with Unison. Unison works across the network or with a portable USB drive. Like the other solutions listed here, it only needs to transfer the differences between files, which is much faster than moving whole files around. We have Unison run as a scheduled task to make sure files get synced at least daily.
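
To give a feel for what "only the differences" means, here is a minimal Python sketch that splits two versions of a file into fixed-size blocks and reports which blocks would need to be sent. This is an illustration under simplifying assumptions, not how Unison itself is implemented: the block size and function names are made up for the example, and real tools such as rsync use rolling checksums so that a few inserted bytes do not shift every later block.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indices of blocks in `new` that differ from, or are missing in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or digest != old_digests[i]
    ]


if __name__ == "__main__":
    old = b"a" * BLOCK_SIZE * 3
    # Overwrite only the middle block; the first and last blocks are untouched.
    new = old[:BLOCK_SIZE] + b"b" * BLOCK_SIZE + old[2 * BLOCK_SIZE:]
    print(changed_blocks(old, new))
```

In this toy case only one of the three blocks differs, so a block-aware tool would transfer roughly a third of the file instead of all of it.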

The best Unison tip is to set up a “star” configuration. That is, you designate one server (or one USB drive) as the hub and all your other machines as spokes off of it. You sync each spoke with the hub, and never sync one spoke directly to another spoke.
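
The star idea can be sketched in a few lines of Python. The model below is deliberately crude and is an assumption of ours, not Unison's behavior: each machine is a dictionary mapping a path to a (version, content) pair, and a sync simply keeps the higher version on both sides. Real synchronizers detect changes and conflicts differently. The point is only that syncing every spoke with the hub is enough for a change made on one spoke to reach all the others.

```python
def sync(a: dict, b: dict) -> None:
    """Two-way sync: for every path, both replicas end up with the newer version.

    Each replica maps path -> (version, content). The version number is a
    hypothetical stand-in for the change detection a real tool performs.
    """
    for path in set(a) | set(b):
        newest = max(a.get(path, (0, "")), b.get(path, (0, "")))
        a[path] = newest
        b[path] = newest


def sync_star(hub: dict, spokes: list) -> None:
    """Sync each spoke with the hub, never spoke-to-spoke.

    Two passes: the first gathers every change on the hub, the second
    delivers changes to spokes that synced before the change arrived.
    """
    for _ in range(2):
        for spoke in spokes:
            sync(spoke, hub)


if __name__ == "__main__":
    hub = {}
    work = {"notes.txt": (2, "edited at work")}
    home = {"notes.txt": (1, "old"), "plot.R": (1, "plot(1)")}
    sync_star(hub, [work, home])
    print(work == home == hub)
```

With a scheduled daily sync on every machine, the "second pass" happens naturally the next day.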

On a Windows 7 system, Unison will create a .unison folder in the C:\Users\YourUserName directory. You can put configuration files (with .prf extensions) there to tell Unison what to do. Here’s a sample config file to sync the directory C:\DG on your machine to a folder E:\DG on a USB drive.

==myconfig.prf == (assumes Unison 2.27.57 is installed on the Windows machine; no server is needed for a USB drive)
root = C:\DG
root = E:\DG
batch=true
fastcheck=true
log=true

We wrote a little batch file to start the sync process:
==sync.bat contents== (assumes Unison is installed under C:)

"C:\Unison-2.27.57 Text.exe" myconfig

When getting started, there’s a GUI version of Unison that helps you get the knack of it. For everyday use, the text version (called from our batch file, above) is the way to go.
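
If you would rather start the sync from a script that can also look at the result (say, to log failures from a scheduled task), a Python wrapper along these lines would do the same job as the batch file. This is a sketch, not something we run: the function names are made up, and the executable path and profile name are simply copied from the batch file above.

```python
import subprocess

# Path and profile name taken from the batch file above.
UNISON_EXE = r"C:\Unison-2.27.57 Text.exe"
PROFILE = "myconfig"


def build_command(exe: str = UNISON_EXE, profile: str = PROFILE) -> list[str]:
    """Return the argument list equivalent to the one-line batch file."""
    return [exe, profile]


def run_sync() -> int:
    """Run Unison and return its exit code.

    Raises FileNotFoundError if the executable is not at the assumed path.
    """
    return subprocess.run(build_command()).returncode
```

Passing the command as a list avoids quoting problems with the space in the executable's file name.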

Want to sync to a server instead of a USB drive? Here is an example config file we use to sync a local directory (C:\DG) to a directory on a Linux server (/home/dsn/DG). We sync all our computers (the spokes) with this same Linux server directory (the hub), which keeps all our computers in sync.

==myconfig.prf == (assumes Unison 2.27.57 is installed on the server and that ssh is installed on the Windows machine)
root = C:\DG
root = ssh://dsn@ourserver.com//home/dsn/DG
batch=true
fastcheck=true
log=true
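
Since every spoke needs nearly the same profile, the files can be generated rather than typed by hand. The Python sketch below builds the profile text using only the settings shown in the config above. The spoke names and the second local path are hypothetical, and saving each result as a .prf file in that machine's .unison folder is left to the reader.

```python
def make_profile(local_root: str, hub_root: str) -> str:
    """Build the text of a Unison profile using only the settings shown above."""
    lines = [
        f"root = {local_root}",
        f"root = {hub_root}",
        "batch=true",
        "fastcheck=true",
        "log=true",
    ]
    return "\n".join(lines) + "\n"


# Hub root copied from the example config above.
HUB = "ssh://dsn@ourserver.com//home/dsn/DG"

# Hypothetical spoke machines and their local directories.
SPOKES = {"office-pc": r"C:\DG", "home-pc": r"D:\DG"}

profiles = {name: make_profile(root, HUB) for name, root in SPOKES.items()}

if __name__ == "__main__":
    for name, text in profiles.items():
        print(f"== profile for {name} ==")
        print(text)
```

Every generated profile points at the same hub root, which is what keeps the star configuration intact.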


Subversion subversion.tigris.org
Setup: Hard (Need to know how to install and configure client and server software)
OS: Windows, Linux, Mac
Subversion was built as a version control system for programmers, but some people use it to keep all their files in sync. It is a programmer’s tool and not easy to learn, though if you read the free Subversion book and are handy with computers, you can learn it. You’ll want to have a server running on the network somewhere to make this a viable option.

We use Subversion to keep our research projects (R source code, documentation, LaTeX writeups, images, PDFs of articles, small data sets) synced across many machines.


Lsyncd code.google.com/p/lsyncd
Setup: Hard (Need to know how to install and compile server software)
OS: Linux only

Also not built for this purpose, Lsyncd can be used in conjunction with Unison to keep files in sync. Lsyncd (or “live syncing daemon”) is a program that watches a set of files, waiting for any of them to change. Once a change occurs, it can trigger arbitrary actions, such as syncing them. J D Long uses Lsyncd to keep his R files (specifically, RStudio output) in sync with his local machine; he describes the setup in two posts (Post 1, Post 2). At DSN, we use Lsyncd to create a magic folder on our server that pushes R plots generated on the server back to our PC automatically.
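
The watch-then-act pattern is easy to sketch in Python. Note the assumptions: Lsyncd is notified of changes by the Linux kernel (inotify), whereas the sketch below simply polls modification times, and all of the function names are ours. The `on_change` callback is where a sync command would go.

```python
import os
import time


def snapshot(root: str) -> dict:
    """Map every file under root to its last-modified time."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished between listing and stat
    return state


def changed_paths(before: dict, after: dict) -> set:
    """Paths that were added, removed, or modified between two snapshots."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}


def watch(root: str, on_change, interval: float = 1.0, max_polls=None) -> None:
    """Poll root and call on_change(paths) whenever something changes.

    Runs forever unless max_polls is given.
    """
    previous = snapshot(root)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        current = snapshot(root)
        changed = changed_paths(previous, current)
        if changed:
            on_change(changed)
        previous = current
        polls += 1
```

For example, `watch("plots", print)` would print the set of changed paths in a (hypothetical) plots folder once a second. Polling is wasteful on large trees, which is one reason a kernel-notified tool like Lsyncd is the better choice on Linux.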

ADDENDUM

Some other ideas have been coming in through the comments. I will list them here for posterity.

  • Box.net
  • DVCS-Autosync
  • Rsync
  • Sparkleshare
  • Sugarsync
  • Ubuntu One
  • Wuala

8 Comments

  1. Jon Baron says:

    You omitted the only ones that I use!

    Rsync:
    http://en.wikipedia.org/wiki/Rsync

    Git:
    http://git-scm.com/
    This is for collaborative projects, when several people are working on the same files (documents or code).

    cp
    http://en.wikipedia.org/wiki/Cp_%28Unix%29

    I use cp in combination with sshfs:
    http://fuse.sourceforge.net/sshfs.html

    In some ways, cp is the best combination of power and ease of use for simple backups, unless you have huge files with small changes, in which case the advantages of rsync or unison (which back up pieces of files rather than entire files) become important.

    November 20, 2011 @ 7:22 am

  2. dan says:

    Jon: As you mention, Unison uses rsync, but I prefer Unison because it has added features that are nice for two-way syncing. I don’t use Git much, but sure, why not? Not sure how to easily achieve sync (as opposed to backup) with cp.

    November 20, 2011 @ 10:54 am

  3. Hasan Diwan says:

    I use cron, along with a VCS (subversion/git/arch/darcs/what-have-you), to keep files in sync across machines.

    November 20, 2011 @ 8:13 am

  4. dan says:

    Hasan: How are you using cron along with a vcs? Does each client machine do a checkout from the repository every day?

    November 20, 2011 @ 10:46 am

  5. Dirk Dittmer says:

    Thanks for the post. This is really becoming a big problem.

    November 20, 2011 @ 8:26 am

  6. Hasan Diwan says:

    Dan,
    We use subversion at work, so there’s a cronjob to update at 9am and 5pm every weekday.

    November 21, 2011 @ 9:10 am

  7. Locklin says:

    There’s also “Ubuntu One”, which comes by default with the Ubuntu Linux system. That, combined with the fact that R is easily installed from the package manager, makes it a dead easy combination.

    November 21, 2011 @ 10:19 am

  8. Dirk Eddelbuettel says:

    Dan,

    Nicely comprehensive survey. Here is one more solution I had also emailed to JD after he blogged about his little hack: http://mayrhofer.eu.org/dvcs-autosync

    This uses a combination of file-system polling/events together with git as a backend to provide a private replacement for Dropbox.

    Cheers, Dirk

    November 21, 2011 @ 4:57 pm
