Mercurial > hg-stable
annotate notes.txt @ 65:d40cc5aacc31 0.4f
Fix up a bunch of bugs in the new merge code
Move getchangegroup/addchangegroup to generators
author | mpm@selenic.com |
---|---|
date | Fri, 13 May 2005 11:47:16 -0800 |
parents | 2bff7c0ea1d3 |
children | d7ce76d82876 |
rev | line source |
---|---|
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
1 Some notes about Mercurial's design |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
2 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
3 Revlogs: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
4 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
5 The fundamental storage type in Mercurial is a "revlog". A revlog is |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
6 the set of all revisions to a file. Each revision is either stored |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
7 compressed in its entirety or as a compressed binary delta against the |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
8 previous version. The decision of when to store a full version is made |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
9 based on how much data would be needed to reconstruct the file. This |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
10 lets us ensure that we never need to read huge amounts of data to |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
11 reconstruct a file, regardless of how many revisions of it we store. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
12 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
13 In fact, we should always be able to do it with a single read, |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
14 provided we know when and where to read. This is where the index comes |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
15 in. Each revlog has an index containing a special hash (nodeid) of the |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
16 text, hashes for its parents, and where and how much of the revlog |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
17 data we need to read to reconstruct it. Thus, with one read of the |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
18 index and one read of the data, we can reconstruct any version in time |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
19 proportional to the file size. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
20 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
21 Similarly, revlogs and their indices are append-only. This means that |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
22 adding a new version is also O(1) seeks. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
23 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
24 Generally revlogs are used to represent revisions of files, but they |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
25 also are used to represent manifests and changesets. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
26 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
27 Manifests: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
28 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
29 A manifest is simply a list of all files in a given revision of a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
30 project along with the nodeids of the corresponding file revisions. So |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
31 grabbing a given version of the project means simply looking up its |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
32 manifest and reconstruction all the file revisions pointed to by it. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
33 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
34 Changesets: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
35 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
36 A changeset is a list of all files changed in a check-in along with a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
37 change description and some metadata like user and date. It also |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
38 contains a nodeid to the relevent revision of the manifest. Changesets |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
39 and manifests are one-to-one, but contain different data for |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
40 convenience. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
41 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
42 Nodeids: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
43 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
44 Nodeids are unique ids that are used to represent the contents of a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
45 file AND its position in the project history. That is, if you change a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
46 file and then change it back, the result will have a different nodeid |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
47 because it has different history. This is accomplished by including |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
48 the parents of a given revision's nodeids with the revision's text |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
49 when calculating the hash. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
50 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
51 Graph merging: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
52 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
53 Nodeids are implemented as they are to simplify merging. Merging a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
54 pair of directed acyclic graphs (aka "the family tree" of the file |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
55 history) requires some method of determining if nodes in different |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
56 graphs correspond. Simply comparing the contents of the node (by |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
57 comparing text of given revisions or their hashes) can get confused by |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
58 identical revisions in the tree. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
59 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
60 The nodeid approach makes it trivial - the hash uniquely describes a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
61 revision's contents and its graph position relative to the root, so |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
62 merge is simply checking whether each nodeid in graph A is in the hash |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
63 table of graph B. If not, we pull them in, adding them sequentially to |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
64 the revlog. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
65 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
66 Graph resolving: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
67 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
68 Mercurial does branching by copying (or COWing) a repository and thus |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
69 keeps everything nice and linear within a repository. However, when a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
70 merge of repositories (a "pull") is done, we may often have two head |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
71 revisions in a given graph. To keep things simple, Mercurial forces |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
72 the head revisions to be merged. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
73 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
74 It first finds the closest common ancestor of the two heads. If one is |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
75 a child of the other, it becomes the new head. Otherwise, we call out |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
76 to a user-specified 3-way merge tool. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
77 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
78 Merging files, manifests, and changesets: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
79 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
80 We begin by comparing changeset DAGs, pulling all nodes we don't have |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
81 in our DAG from the other repository. As we do so, we collect a list |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
82 of changed files to merge. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
83 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
84 Then for each file, we perform a graph merge and resolve as above. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
85 It's important to merge files using per-file DAGs rather than just |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
86 changeset level DAGs as this diagram illustrates: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
87 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
88 M M1 M2 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
89 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
90 AB |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
91 |`-------v M2 clones M |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
92 aB AB file A is change in mainline |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
93 |`---v AB' file B is changed in M2 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
94 | aB / | M1 clones M |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
95 | ab/ | M1 changes B |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
96 | ab' | M1 merges from M2, changes to B conflict |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
97 | | A'B' M2 changes A |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
98 `---+--.| |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
99 | a'B' M2 merges from mainline, changes to A conflict |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
100 `--.| |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
101 ??? depending on which ancestor we choose, we will have |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
102 to redo A hand-merge, B hand-merge, or both |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
103 but if we look at the files independently, everything |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
104 is fine |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
105 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
106 After we've merged files, we merge the manifest log DAG and resolve |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
107 additions and deletions. Then we are ready to resolve the changeset |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
108 DAG - if our merge required any changes (the new head is not a |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
109 decendent of our tip), we must create a new changeset describing all |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
110 of the changes needed to merge it into the tip. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
111 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
112 Merge performance: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
113 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
114 The I/O operations for performing a merge are O(changed files), not |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
115 O(total changes) and in many cases, we needn't even unpack the deltas |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
116 to add them to our repository (though this optimization isn't |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
117 necessary). |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
118 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
119 Rollback: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
120 |
47 | 121 When performing a commit or a merge, we order things so that the |
122 changeset entry gets added last. We keep a transaction log of the name | |
123 of each file touched and its length prior to the transaction. On | |
124 abort, we simply truncate each file to its prior length. This is one | |
125 of the nice properties of the append-only structure of the revlogs. | |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
126 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
127 Remote access: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
128 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
129 Mercurial currently supports pulling from "serverless" repositories. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
130 Simply making the repo directory accessibly via the web and pointing |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
131 hg at it can accomplish a pull. This is relatively bandwidth efficient |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
132 but no effort has been spent on pipelining, so it won't work |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
133 especially well over LAN yet. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
134 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
135 It's also quite amenable to rsync, if you don't mind keeping an intact |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
136 copy of the master around locally. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
137 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
138 Also note the append-only and ordering properties of the commit |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
139 guarantee that readers will always see a repository in a consistent |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
140 state and no special locking is necessary. As there is generally only |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
141 one writer to an hg repository, there is in fact no exclusion |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
142 implemented yet. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
143 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
144 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
145 Some comparisons to git: |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
146 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
147 Most notably, Mercurial uses delta compression and repositories |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
148 created with it will grow much more slowly over time. This also allows |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
149 it to be much more bandwidth efficient. I expect repos sizes and sync |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
150 speeds to be similar to or better than BK, given the use of binary diffs. |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
151 |
59 | 152 Mercurial is roughly the same performance as git in some areas and is |
153 faster in others as it keeps around more metadata. One example is | |
154 listing and retrieving past versions of a file, which it can do | |
155 without reading all the changesets. This metadata will also allow it | |
156 to perform better merges as described above. | |
0
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
157 |
9117c6561b0b
Add back links from file revisions to changeset revisions
mpm@selenic.com
parents:
diff
changeset
|
158 |