Mercurial > hg-stable
comparison notes.txt @ 0:9117c6561b0b
Add back links from file revisions to changeset revisions
Add simple transaction support
Add hg verify
Improve caching in revlog
Fix a bunch of bugs
Self-hosting now that the metadata is close to finalized
author | mpm@selenic.com |
---|---|
date | Tue, 03 May 2005 13:16:10 -0800 |
parents | |
children | 45cc818f2445 |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:9117c6561b0b |
---|---|
1 Some notes about Mercurial's design | |
2 | |
3 Revlogs: | |
4 | |
5 The fundamental storage type in Mercurial is a "revlog". A revlog is | |
6 the set of all revisions to a file. Each revision is either stored | |
7 compressed in its entirety or as a compressed binary delta against the | |
8 previous version. The decision of when to store a full version is made | |
9 based on how much data would be needed to reconstruct the file. This | |
10 lets us ensure that we never need to read huge amounts of data to | |
11 reconstruct a file, regardless of how many revisions of it we store. | |
12 | |
13 In fact, we should always be able to do it with a single read, | |
14 provided we know when and where to read. This is where the index comes | |
15 in. Each revlog has an index containing a special hash (nodeid) of the | |
16 text, hashes for its parents, and where and how much of the revlog | |
17 data we need to read to reconstruct it. Thus, with one read of the | |
18 index and one read of the data, we can reconstruct any version in time | |
19 proportional to the file size. | |
20 | |
21 Similarly, revlogs and their indices are append-only. This means that | |
22 adding a new version is also O(1) seeks. | |
23 | |
24 Generally revlogs are used to represent revisions of files, but they | |
25 also are used to represent manifests and changesets. | |
26 | |
27 Manifests: | |
28 | |
29 A manifest is simply a list of all files in a given revision of a | |
30 project along with the nodeids of the corresponding file revisions. So | |
31 grabbing a given version of the project means simply looking up its | |
32 manifest and reconstruction all the file revisions pointed to by it. | |
33 | |
34 Changesets: | |
35 | |
36 A changeset is a list of all files changed in a check-in along with a | |
37 change description and some metadata like user and date. It also | |
38 contains a nodeid to the relevent revision of the manifest. Changesets | |
39 and manifests are one-to-one, but contain different data for | |
40 convenience. | |
41 | |
42 Nodeids: | |
43 | |
44 Nodeids are unique ids that are used to represent the contents of a | |
45 file AND its position in the project history. That is, if you change a | |
46 file and then change it back, the result will have a different nodeid | |
47 because it has different history. This is accomplished by including | |
48 the parents of a given revision's nodeids with the revision's text | |
49 when calculating the hash. | |
50 | |
51 Graph merging: | |
52 | |
53 Nodeids are implemented as they are to simplify merging. Merging a | |
54 pair of directed acyclic graphs (aka "the family tree" of the file | |
55 history) requires some method of determining if nodes in different | |
56 graphs correspond. Simply comparing the contents of the node (by | |
57 comparing text of given revisions or their hashes) can get confused by | |
58 identical revisions in the tree. | |
59 | |
60 The nodeid approach makes it trivial - the hash uniquely describes a | |
61 revision's contents and its graph position relative to the root, so | |
62 merge is simply checking whether each nodeid in graph A is in the hash | |
63 table of graph B. If not, we pull them in, adding them sequentially to | |
64 the revlog. | |
65 | |
66 Graph resolving: | |
67 | |
68 Mercurial does branching by copying (or COWing) a repository and thus | |
69 keeps everything nice and linear within a repository. However, when a | |
70 merge of repositories (a "pull") is done, we may often have two head | |
71 revisions in a given graph. To keep things simple, Mercurial forces | |
72 the head revisions to be merged. | |
73 | |
74 It first finds the closest common ancestor of the two heads. If one is | |
75 a child of the other, it becomes the new head. Otherwise, we call out | |
76 to a user-specified 3-way merge tool. | |
77 | |
78 Merging files, manifests, and changesets: | |
79 | |
80 We begin by comparing changeset DAGs, pulling all nodes we don't have | |
81 in our DAG from the other repository. As we do so, we collect a list | |
82 of changed files to merge. | |
83 | |
84 Then for each file, we perform a graph merge and resolve as above. | |
85 It's important to merge files using per-file DAGs rather than just | |
86 changeset level DAGs as this diagram illustrates: | |
87 | |
88 M M1 M2 | |
89 | |
90 AB | |
91 |`-------v M2 clones M | |
92 aB AB file A is change in mainline | |
93 |`---v AB' file B is changed in M2 | |
94 | aB / | M1 clones M | |
95 | ab/ | M1 changes B | |
96 | ab' | M1 merges from M2, changes to B conflict | |
97 | | A'B' M2 changes A | |
98 `---+--.| | |
99 | a'B' M2 merges from mainline, changes to A conflict | |
100 `--.| | |
101 ??? depending on which ancestor we choose, we will have | |
102 to redo A hand-merge, B hand-merge, or both | |
103 but if we look at the files independently, everything | |
104 is fine | |
105 | |
106 After we've merged files, we merge the manifest log DAG and resolve | |
107 additions and deletions. Then we are ready to resolve the changeset | |
108 DAG - if our merge required any changes (the new head is not a | |
109 decendent of our tip), we must create a new changeset describing all | |
110 of the changes needed to merge it into the tip. | |
111 | |
112 Merge performance: | |
113 | |
114 The I/O operations for performing a merge are O(changed files), not | |
115 O(total changes) and in many cases, we needn't even unpack the deltas | |
116 to add them to our repository (though this optimization isn't | |
117 necessary). | |
118 | |
119 Rollback: | |
120 | |
121 Rollback is not yet implemented, but will be easy to add. When | |
122 performing a commit or a merge, we order things so that the changeset | |
123 entry gets added last. We keep a transaction log of the name of each | |
124 file and its length prior to the transaction. On abort, we simply | |
125 truncate each file to its prior length. This is one of the nice | |
126 properties of the append-only structure of the revlogs. | |
127 | |
128 Remote access: | |
129 | |
130 Mercurial currently supports pulling from "serverless" repositories. | |
131 Simply making the repo directory accessibly via the web and pointing | |
132 hg at it can accomplish a pull. This is relatively bandwidth efficient | |
133 but no effort has been spent on pipelining, so it won't work | |
134 especially well over LAN yet. | |
135 | |
136 It's also quite amenable to rsync, if you don't mind keeping an intact | |
137 copy of the master around locally. | |
138 | |
139 Also note the append-only and ordering properties of the commit | |
140 guarantee that readers will always see a repository in a consistent | |
141 state and no special locking is necessary. As there is generally only | |
142 one writer to an hg repository, there is in fact no exclusion | |
143 implemented yet. | |
144 | |
145 | |
146 Some comparisons to git: | |
147 | |
148 Most notably, Mercurial uses delta compression and repositories | |
149 created with it will grow much more slowly over time. This also allows | |
150 it to be much more bandwidth efficient. I expect repos sizes and sync | |
151 speeds to be similar to or better than BK, given the use of binary diffs. | |
152 | |
153 Mercurial is roughly the same performance as git and is faster in | |
154 others as it keeps around more metadata. One example is listing and | |
155 retrieving past versions of a file, which it can do without reading | |
156 all the changesets. This metadata will also allow it to perform better | |
157 merges as described above. | |
158 | |
159 |