hgext/largefiles/design.txt
author Matt Mackall <mpm@selenic.com>
Fri, 21 Oct 2011 16:52:16 -0500
branch stable
changeset 15333 f37b71fec602
parent 15315 ca51a5dd5d0b
permissions -rw-r--r--
largefiles: py2.4 doesn't have BaseException ..and it's the wrong base class anyway.
= largefiles - manage large binary files =
This extension is based on Greg Ward's bfiles extension, which can be found
at http://mercurial.selenic.com/wiki/BfilesExtension.

== The largefile store ==

Largefile stores are, in the typical use case, centralized servers that hold
every past revision of a given binary file. Each largefile is identified by
its SHA-1 hash, and all interactions with the store take one of the following
forms:

-Download a largefile with this hash
-Upload a largefile with this hash
-Check if the store has a largefile with this hash
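
The three store interactions listed above can be sketched as a small
content-addressed store. This is only an illustration in modern Python, not
the extension's actual code: the class name DirectoryStore, its method names,
and the flat one-file-per-hash layout are all assumptions made for the
example.

```python
import hashlib
import os
import shutil


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DirectoryStore:
    """Hypothetical store: one file per largefile, named by its SHA-1."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, sha1):
        return os.path.join(self.root, sha1)

    def exists(self, sha1):
        """Check if the store has a largefile with this hash."""
        return os.path.isfile(self._path(sha1))

    def put(self, source_path):
        """Upload a largefile; its hash is computed from its contents."""
        sha1 = sha1_of_file(source_path)
        if not self.exists(sha1):
            shutil.copyfile(source_path, self._path(sha1))
        return sha1

    def get(self, sha1, dest_path):
        """Download the largefile with this hash and verify the copy."""
        if not self.exists(sha1):
            raise KeyError(sha1)
        shutil.copyfile(self._path(sha1), dest_path)
        if sha1_of_file(dest_path) != sha1:
            raise ValueError("hash mismatch for %s" % sha1)
```

Because the hash is the only key, uploading identical contents twice stores
them once, and a download can always be checked against the hash that was
requested.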

Largefile stores can take one of two forms:

-Directories on a network file share
-Mercurial wireproto servers, either via ssh or http (hgweb)

== The Local Repository ==

The local repository has a largefile store in .hg/largefiles, which holds a
subset of the largefiles needed. On a clone, only the largefiles at tip are
downloaded. When largefiles are downloaded from the central store, a copy is
saved in this store.

== The User Cache ==

Largefiles in a local repository store are hardlinked to files in the user
cache. Before a file is downloaded, we check whether it is in the user cache
and, if we find it, hardlink it into the local store.
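
The lookup order described above (local store, then user cache, then
download) might look roughly like the following. The function names, the
download callback, and the fallback to copying when hardlinks are
unavailable are assumptions made for this sketch, not the extension's real
API.

```python
import os
import shutil


def link_or_copy(src, dest):
    """Hardlink src to dest, copying if the filesystem refuses the link."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    try:
        os.link(src, dest)
    except OSError:
        shutil.copyfile(src, dest)


def fetch_largefile(sha1, user_cache, local_store, download):
    """Make sha1 available in local_store and report where it came from.

    download is a hypothetical callable (sha1, dest_path) that retrieves
    the file from the central store.
    """
    local_path = os.path.join(local_store, sha1)
    if os.path.exists(local_path):
        return "local"
    cached_path = os.path.join(user_cache, sha1)
    if os.path.exists(cached_path):
        # Already fetched by some other repository of this user.
        link_or_copy(cached_path, local_path)
        return "cache"
    os.makedirs(user_cache, exist_ok=True)
    download(sha1, cached_path)
    link_or_copy(cached_path, local_path)
    return "download"
```

With this arrangement, a second clone by the same user should not need to
contact the central store for any largefile that is already in the user
cache.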

== Implementation Details ==

Each largefile has a standin, which lives in .hglf. The standin is tracked by
Mercurial and contains the SHA-1 hash of the largefile. When a largefile is
added, removed, copied, renamed, etc., the same operation is applied to
the standin. Thus the history of the standin is the history of the largefile.
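
As an illustration of the standin idea, the sketch below hashes a largefile
and records the hash in a small file under .hglf. It assumes that the
standin mirrors the largefile's relative path and holds the hex hash
followed by a newline; the helper names are hypothetical and this is not the
extension's own code.

```python
import hashlib
import os

STANDIN_DIR = ".hglf"


def standin_path(repo_root, largefile):
    """Return the standin location for a repo-relative largefile path."""
    return os.path.join(repo_root, STANDIN_DIR, largefile)


def write_standin(repo_root, largefile):
    """Hash the largefile and store the hex SHA-1 in its standin."""
    digest = hashlib.sha1()
    with open(os.path.join(repo_root, largefile), "rb") as fp:
        for chunk in iter(lambda: fp.read(1 << 20), b""):
            digest.update(chunk)
    path = standin_path(repo_root, largefile)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fp:
        fp.write(digest.hexdigest() + "\n")
    return path


def read_standin(repo_root, largefile):
    """Return the hash recorded in the standin."""
    with open(standin_path(repo_root, largefile)) as fp:
        return fp.read().strip()
```

Since the standin is an ordinary small text file, Mercurial can version,
diff, and merge it cheaply while the large binary itself stays out of the
changelog.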

For performance reasons, the contents of a standin are only updated before a
commit. The add, remove, copy, and rename Mercurial commands add, remove,
copy, and rename standins, but they do not update their contents. The contents
of a standin will therefore always be the hash of the largefile as of the last
commit. To support some commands (such as revert), some standins are
temporarily updated but are changed back after the command finishes.

A Mercurial dirstate object tracks the state of the largefiles. The dirstate
uses the last-modified time and current size to detect whether a file has
changed, without reading the entire contents of the file.
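
The size-and-mtime check can be sketched as follows. This is a simplified
stand-in for illustration, not Mercurial's dirstate implementation; the
Entry tuple and both function names are assumptions.

```python
import os
from collections import namedtuple

# What this sketch remembers about a file at the time it was last recorded.
Entry = namedtuple("Entry", "size mtime")


def snapshot(path):
    """Record the current size and whole-second modification time."""
    st = os.stat(path)
    return Entry(st.st_size, int(st.st_mtime))


def maybe_changed(path, entry):
    """Return True if size or mtime differs from the recorded entry.

    Only a stat() call is needed, so the file's contents are never read.
    """
    st = os.stat(path)
    return st.st_size != entry.size or int(st.st_mtime) != entry.mtime
```

A check this simple cannot notice an edit that preserves both the size and
the recorded modification time, which is why it is described here as
detecting that a file may have changed rather than proving that it did.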