annotate hgext/largefiles/design.txt @ 15168:cfccd3bee7b3

hgext: add largefiles extension

This code has a number of contributors and a complicated history prior to its
introduction that can be seen by visiting:

https://developers.kilnhg.com/Repo/Kiln/largefiles/largefiles
http://hg.gerg.ca/hg-bfiles

and looking at the included copyright notices and contributors list.
author various
date Sat, 24 Sep 2011 17:35:45 +0200
parents
children ca51a5dd5d0b

= largefiles - manage large binary files =
This extension is based on Greg Ward's bfiles extension, which can be found
at http://mercurial.selenic.com/wiki/BfilesExtension.

== The largefile store ==

Largefile stores are, in the typical use case, centralized servers that have
every past revision of a given binary file. Each largefile is identified by
its SHA1 hash, and all interactions with the store take one of the following
forms:

-Download a largefile with this hash
-Upload a largefile with this hash
-Check if the store has a largefile with this hash

Largefile stores can take one of two forms:

-Directories on a network file share
-Mercurial wireproto servers, either via ssh or http (hgweb)
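
The three store interactions listed above can be sketched for the simplest
kind of store, a plain directory. This is only an illustration and not the
extension's actual code: the names DirectoryStore, get, put, exists and
sha1_of are hypothetical, and the layout (one file per hash, directly under
the store root) is an assumption.

```python
import hashlib
import os
import shutil


def sha1_of(path, chunk_size=128 * 1024):
    """Return the hex SHA1 of the file at path, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class DirectoryStore:
    """Hypothetical store backed by a directory, one file per hash."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, hash_):
        # Assumed layout: the file name is the hash itself.
        return os.path.join(self.root, hash_)

    def exists(self, hash_):
        """Check if the store has a largefile with this hash."""
        return os.path.isfile(self._path(hash_))

    def put(self, source, hash_):
        """Upload a largefile with this hash, refusing mismatched content."""
        if sha1_of(source) != hash_:
            raise ValueError("content does not match hash %s" % hash_)
        shutil.copyfile(source, self._path(hash_))

    def get(self, hash_, dest):
        """Download the largefile with this hash to dest."""
        if not self.exists(hash_):
            raise KeyError(hash_)
        shutil.copyfile(self._path(hash_), dest)
```

Because every operation is keyed only by the hash, a wireproto-backed store
could expose the same three methods and callers would not need to know which
kind of store they are talking to.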

== The Local Repository ==

The local repository has a largefile cache in .hg/largefiles which holds a
subset of the largefiles needed. On a clone, only the largefiles at tip are
downloaded. When largefiles are downloaded from the central store, a copy is
saved in this cache.

== The Global Cache ==

Largefiles in a local repository cache are hardlinked to files in the global
cache. Before a largefile is downloaded, we check whether it is already in
the global cache.
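
The lookup order described in the last two sections might look roughly like
the sketch below. It is written under several assumptions that the text does
not state: find_largefile, link_or_copy and the download callback are
hypothetical names, cached files are assumed to be named after their hash,
a download is assumed to land in the global cache first, and falling back to
a plain copy when hardlinking fails is also an assumption.

```python
import os
import shutil


def link_or_copy(src, dest):
    """Hardlink src to dest, copying instead if the link cannot be made."""
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    try:
        os.link(src, dest)
    except OSError:
        # Assumption: e.g. the two caches live on different filesystems.
        shutil.copyfile(src, dest)


def find_largefile(hash_, local_cache, global_cache, download):
    """Return the path of the largefile in the local cache, fetching it
    from the global cache or, failing that, via download(hash_, dest)."""
    local_path = os.path.join(local_cache, hash_)
    if os.path.exists(local_path):
        return local_path

    global_path = os.path.join(global_cache, hash_)
    if not os.path.exists(global_path):
        # Only contact the central store when the global cache misses.
        os.makedirs(global_cache, exist_ok=True)
        download(hash_, global_path)

    link_or_copy(global_path, local_path)
    return local_path
```

With this arrangement, two clones on the same machine that need the same
largefile trigger a single download, and the second clone only creates a
hardlink.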

== Implementation Details ==

Each largefile has a standin, which is kept in .hglf. The standin is tracked
by Mercurial and contains the SHA1 hash of the largefile. When a largefile is
added/removed/copied/renamed/etc., the same operation is applied to the
standin. Thus the history of the standin is the history of the largefile.
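
To make the standin idea concrete, here is a minimal sketch of writing and
reading one. The helper names standin_path, write_standin and read_standin
are hypothetical, and two details are assumptions rather than statements from
this document: that the standin mirrors the largefile's relative path under
.hglf, and that the hash is stored followed by a newline.

```python
import os

STANDIN_DIR = ".hglf"


def standin_path(filename):
    """Map a largefile's repo-relative path to its standin path
    (assumed layout: the same relative path under .hglf)."""
    return os.path.join(STANDIN_DIR, filename)


def write_standin(repo_root, filename, hash_):
    """Record hash_ as the contents of the standin for filename."""
    path = os.path.join(repo_root, standin_path(filename))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        # Assumption: the hash is followed by a single newline.
        f.write(hash_ + "\n")
    return path


def read_standin(repo_root, filename):
    """Return the hash recorded in the standin for filename."""
    with open(os.path.join(repo_root, standin_path(filename))) as f:
        return f.read().strip()
```

Since only this small text file is committed, the changelog for the standin
records every hash the largefile has ever had, without the large contents
entering the repository history.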

For performance reasons, the contents of a standin are only updated before a
commit. Standins are added/removed/copied/renamed by the add/remove/copy/rename
Mercurial commands, but their contents are not updated at that point. The
contents of a standin are therefore always the hash of the largefile as of the
last commit. To support some commands (e.g. revert), some standins are
temporarily updated, but they are changed back after the command finishes.

A Mercurial dirstate object tracks the state of the largefiles. The dirstate
uses the last modified time and current size to detect whether a file has
changed (without reading the entire contents of the file).
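
The change check described above can be illustrated with a small sketch
that compares a recorded (size, mtime) pair against a fresh stat() call. It
is not Mercurial's dirstate code; snapshot and maybe_changed are hypothetical
names, and the sketch ignores edge cases such as an edit that keeps the size
and lands within the same timestamp granularity.

```python
import os


def snapshot(path):
    """Return the (size, mtime) pair used to recognise an unchanged file."""
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))


def maybe_changed(path, recorded):
    """Return True if size or mtime differ from the recorded snapshot.

    Only stat() is called, so the cost does not depend on the file size.
    """
    return snapshot(path) != recorded
```

For a multi-gigabyte largefile this check costs one stat() call, whereas
recomputing the SHA1 would require reading the whole file, which is why the
hash is only recomputed when it is actually needed.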