author | Thomas Arendsen Hein <thomas@intevation.de> |
Sat, 25 Mar 2006 10:31:06 +0100 | |
changeset 2006 | ff8b39daa930 |
parent 1308 | 2073e5a71008 |
child 2227 | 4f072bb06e89 |
permissions | -rw-r--r-- |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
1 |
Some notes about Mercurial's design |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
2 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
3 |
Revlogs: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
4 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
5 |
The fundamental storage type in Mercurial is a "revlog". A revlog is |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
6 |
the set of all revisions to a file. Each revision is either stored |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
7 |
compressed in its entirety or as a compressed binary delta against the |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
8 |
previous version. The decision of when to store a full version is made |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
9 |
based on how much data would be needed to reconstruct the file. This |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
10 |
lets us ensure that we never need to read huge amounts of data to |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
11 |
reconstruct a file, regardless of how many revisions of it we store. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
12 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
13 |
In fact, we should always be able to do it with a single read, |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
14 |
provided we know when and where to read. This is where the index comes |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
15 |
in. Each revlog has an index containing a special hash (nodeid) of the |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
16 |
text, hashes for its parents, and where and how much of the revlog |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
17 |
data we need to read to reconstruct it. Thus, with one read of the |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
18 |
index and one read of the data, we can reconstruct any version in time |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
19 |
proportional to the file size. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
20 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
21 |
Similarly, revlogs and their indices are append-only. This means that |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
22 |
adding a new version is also O(1) seeks. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
23 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
24 |
Generally revlogs are used to represent revisions of files, but they |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
25 |
also are used to represent manifests and changesets. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
26 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
27 |
Manifests: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
28 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
29 |
A manifest is simply a list of all files in a given revision of a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
30 |
project along with the nodeids of the corresponding file revisions. So |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
31 |
grabbing a given version of the project means simply looking up its |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
32 |
manifest and reconstruction all the file revisions pointed to by it. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
33 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
34 |
Changesets: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
35 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
36 |
A changeset is a list of all files changed in a check-in along with a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
37 |
change description and some metadata like user and date. It also |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
38 |
contains a nodeid to the relevent revision of the manifest. Changesets |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
39 |
and manifests are one-to-one, but contain different data for |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
40 |
convenience. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
41 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
42 |
Nodeids: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
43 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
44 |
Nodeids are unique ids that are used to represent the contents of a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
45 |
file AND its position in the project history. That is, if you change a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
46 |
file and then change it back, the result will have a different nodeid |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
47 |
because it has different history. This is accomplished by including |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
48 |
the parents of a given revision's nodeids with the revision's text |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
49 |
when calculating the hash. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
50 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
51 |
Graph merging: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
52 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
53 |
Nodeids are implemented as they are to simplify merging. Merging a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
54 |
pair of directed acyclic graphs (aka "the family tree" of the file |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
55 |
history) requires some method of determining if nodes in different |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
56 |
graphs correspond. Simply comparing the contents of the node (by |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
57 |
comparing text of given revisions or their hashes) can get confused by |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
58 |
identical revisions in the tree. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
59 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
60 |
The nodeid approach makes it trivial - the hash uniquely describes a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
61 |
revision's contents and its graph position relative to the root, so |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
62 |
merge is simply checking whether each nodeid in graph A is in the hash |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
63 |
table of graph B. If not, we pull them in, adding them sequentially to |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
64 |
the revlog. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
65 |
|
260 | 66 |
Branching and merging: |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
67 |
|
260 | 68 |
Everything in Mercurial is potentially a branch and every user |
69 |
effectively works in their own branch. When you do a checkout, |
|
70 |
Mercurial remembers what the parent changeset was and uses it for the |
|
71 |
next check in. |
|
72 |
||
73 |
To do a merge of branches in Mercurial, you check out the heads of the |
|
74 |
two branches into the same working directory which causes a merge to |
|
75 |
be performed, and then check in the result once you're happy with it. |
|
76 |
The resulting checkin will have two parents. |
|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
77 |
|
260 | 78 |
It decides when a merge is necessary by first determining if there are |
79 |
any uncommitted changes in the working directory. This effectively |
|
80 |
makes the working directory a branch off the checked in version it's |
|
81 |
based on. Then it also determines if the working directory is a direct |
|
82 |
ancestor or descendent of the second version we're attempting to |
|
83 |
checkout. If neither is true, we simply replace the working directory |
|
84 |
version with the new version. Otherwise we perform a merge between the |
|
85 |
two versions. |
|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
86 |
|
260 | 87 |
Merging files and manifests: |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
88 |
|
260 | 89 |
We begin by comparing two versions manifests and deciding which files |
90 |
need to be added, deleted, and merged. |
|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
91 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
92 |
Then for each file, we perform a graph merge and resolve as above. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
93 |
It's important to merge files using per-file DAGs rather than just |
1308
2073e5a71008
Cleanup of tabs and trailing spaces.
Thomas Arendsen Hein <thomas@intevation.de>
parents:
260
diff
changeset
|
94 |
changeset level DAGs as this diagram illustrates: |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
95 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
96 |
M M1 M2 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
97 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
98 |
AB |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
99 |
|`-------v M2 clones M |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
100 |
aB AB file A is change in mainline |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
101 |
|`---v AB' file B is changed in M2 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
102 |
| aB / | M1 clones M |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
103 |
| ab/ | M1 changes B |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
104 |
| ab' | M1 merges from M2, changes to B conflict |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
105 |
| | A'B' M2 changes A |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
106 |
`---+--.| |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
107 |
| a'B' M2 merges from mainline, changes to A conflict |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
108 |
`--.| |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
109 |
??? depending on which ancestor we choose, we will have |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
110 |
to redo A hand-merge, B hand-merge, or both |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
111 |
but if we look at the files independently, everything |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
112 |
is fine |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
113 |
|
260 | 114 |
The result is a merged version in the working directory, waiting for |
115 |
check-in. |
|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
116 |
|
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
117 |
Rollback: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
118 |
|
47 | 119 |
When performing a commit or a merge, we order things so that the |
120 |
changeset entry gets added last. We keep a transaction log of the name |
|
121 |
of each file touched and its length prior to the transaction. On |
|
122 |
abort, we simply truncate each file to its prior length. This is one |
|
123 |
of the nice properties of the append-only structure of the revlogs. |
|
260 | 124 |
We can also reuse this journal for "undo". |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
125 |
|
260 | 126 |
Merging between repositories: |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
127 |
|
260 | 128 |
One of the key features of Mercurial is the ability to merge between |
129 |
independent repositories in a decentralized fashion. Each repository |
|
130 |
can act as a read-only server or a client. Clients operating by |
|
131 |
pulling all branches that it hasn't seen from the server and adding |
|
132 |
them into its graph. This is done in two steps: searching for new |
|
133 |
"roots" and pulling a "changegroup" |
|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
134 |
|
260 | 135 |
Searching for new "roots" begins by finding all new heads and |
136 |
searching backwards from those heads to the first unknown nodes in |
|
137 |
their respective branches. These nodes are the 'roots' that are used |
|
138 |
to calculate the 'changegroup': the set of all changesets starting at |
|
139 |
those roots. Mercurial takes pains to make this search efficient in |
|
140 |
both bandwidth and round-trips. |
|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
141 |
|
260 | 142 |
Once the roots are found, the changegroup can be transferred as a |
143 |
single streaming transfer. This is organized as an ordered set of |
|
144 |
deltas for changesets, manifests, and files. Large chunks of deltas |
|
145 |
can be directly added to the repository without unpacking so it's |
|
146 |
fairly fast. |