notes.txt
changeset 3538 54246ac4b652
parent 3537 060aefba4459
parent 3514 355f2a85ea23
child 3539 a6dd7ab568cc
equal deleted inserted replaced
3537:060aefba4459 3538:54246ac4b652
     1 Some notes about Mercurial's design
       
     2 
       
     3 Revlogs:
       
     4 
       
     5 The fundamental storage type in Mercurial is a "revlog". A revlog is
       
     6 the set of all revisions to a file. Each revision is either stored
       
     7 compressed in its entirety or as a compressed binary delta against the
       
     8 previous version. The decision of when to store a full version is made
       
     9 based on how much data would be needed to reconstruct the file. This
       
    10 lets us ensure that we never need to read huge amounts of data to
       
    11 reconstruct a file, regardless of how many revisions of it we store.
       
    12 
       
    13 In fact, we should always be able to do it with a single read,
       
    14 provided we know when and where to read. This is where the index comes
       
    15 in. Each revlog has an index containing a special hash (nodeid) of the
       
    16 text, hashes for its parents, and where and how much of the revlog
       
    17 data we need to read to reconstruct it. Thus, with one read of the
       
    18 index and one read of the data, we can reconstruct any version in time
       
    19 proportional to the file size.
       
    20 
       
    21 Similarly, revlogs and their indices are append-only. This means that
       
    22 adding a new version is also O(1) seeks.
       
    23 
       
    24 Generally revlogs are used to represent revisions of files, but they
       
    25 also are used to represent manifests and changesets.
       
    26 
       
    27 Manifests:
       
    28 
       
    29 A manifest is simply a list of all files in a given revision of a
       
    30 project along with the nodeids of the corresponding file revisions. So
       
    31 grabbing a given version of the project means simply looking up its
       
    32 manifest and reconstruction all the file revisions pointed to by it.
       
    33 
       
    34 Changesets:
       
    35 
       
    36 A changeset is a list of all files changed in a check-in along with a
       
    37 change description and some metadata like user and date. It also
       
    38 contains a nodeid to the relevent revision of the manifest. Changesets
       
    39 and manifests are one-to-one, but contain different data for
       
    40 convenience.
       
    41 
       
    42 Nodeids:
       
    43 
       
    44 Nodeids are unique ids that are used to represent the contents of a
       
    45 file AND its position in the project history. That is, if you change a
       
    46 file and then change it back, the result will have a different nodeid
       
    47 because it has different history. This is accomplished by including
       
    48 the parents of a given revision's nodeids with the revision's text
       
    49 when calculating the hash.
       
    50 
       
    51 Graph merging:
       
    52 
       
    53 Nodeids are implemented as they are to simplify merging. Merging a
       
    54 pair of directed acyclic graphs (aka "the family tree" of the file
       
    55 history) requires some method of determining if nodes in different
       
    56 graphs correspond. Simply comparing the contents of the node (by
       
    57 comparing text of given revisions or their hashes) can get confused by
       
    58 identical revisions in the tree.
       
    59 
       
    60 The nodeid approach makes it trivial - the hash uniquely describes a
       
    61 revision's contents and its graph position relative to the root, so
       
    62 merge is simply checking whether each nodeid in graph A is in the hash
       
    63 table of graph B. If not, we pull them in, adding them sequentially to
       
    64 the revlog.
       
    65 
       
    66 Branching and merging:
       
    67 
       
    68 Everything in Mercurial is potentially a branch and every user
       
    69 effectively works in their own branch. When you do a checkout,
       
    70 Mercurial remembers what the parent changeset was and uses it for the
       
    71 next check in.
       
    72 
       
    73 To do a merge of branches in Mercurial, you check out the heads of the
       
    74 two branches into the same working directory which causes a merge to
       
    75 be performed, and then check in the result once you're happy with it.
       
    76 The resulting checkin will have two parents.
       
    77 
       
    78 It decides when a merge is necessary by first determining if there are
       
    79 any uncommitted changes in the working directory. This effectively
       
    80 makes the working directory a branch off the checked in version it's
       
    81 based on. Then it also determines if the working directory is a direct
       
    82 ancestor or descendent of the second version we're attempting to
       
    83 checkout. If neither is true, we simply replace the working directory
       
    84 version with the new version. Otherwise we perform a merge between the
       
    85 two versions.
       
    86 
       
    87 Merging files and manifests:
       
    88 
       
    89 We begin by comparing two versions manifests and deciding which files
       
    90 need to be added, deleted, and merged.
       
    91 
       
    92 Then for each file, we perform a graph merge and resolve as above.
       
    93 It's important to merge files using per-file DAGs rather than just
       
    94 changeset level DAGs as this diagram illustrates:
       
    95 
       
    96 M   M1   M2
       
    97 
       
    98 AB
       
    99  |`-------v     M2 clones M
       
   100 aB       AB     file A is change in mainline
       
   101  |`---v  AB'    file B is changed in M2
       
   102  |   aB / |     M1 clones M
       
   103  |   ab/  |     M1 changes B
       
   104  |   ab'  |     M1 merges from M2, changes to B conflict
       
   105  |    |  A'B'   M2 changes A
       
   106   `---+--.|
       
   107       |  a'B'   M2 merges from mainline, changes to A conflict
       
   108       `--.|
       
   109          ???    depending on which ancestor we choose, we will have
       
   110 	        to redo A hand-merge, B hand-merge, or both
       
   111                 but if we look at the files independently, everything
       
   112 		is fine
       
   113 
       
   114 The result is a merged version in the working directory, waiting for
       
   115 check-in.
       
   116 
       
   117 Rollback:
       
   118 
       
   119 When performing a commit or a merge, we order things so that the
       
   120 changeset entry gets added last. We keep a transaction log of the name
       
   121 of each file touched and its length prior to the transaction. On
       
   122 abort, we simply truncate each file to its prior length. This is one
       
   123 of the nice properties of the append-only structure of the revlogs.
       
   124 We can also reuse this journal for "rollback".
       
   125 
       
   126 Merging between repositories:
       
   127 
       
   128 One of the key features of Mercurial is the ability to merge between
       
   129 independent repositories in a decentralized fashion. Each repository
       
   130 can act as a read-only server or a client. Clients operating by
       
   131 pulling all branches that it hasn't seen from the server and adding
       
   132 them into its graph. This is done in two steps: searching for new
       
   133 "roots" and pulling a "changegroup"
       
   134 
       
   135 Searching for new "roots" begins by finding all new heads and
       
   136 searching backwards from those heads to the first unknown nodes in
       
   137 their respective branches. These nodes are the 'roots' that are used
       
   138 to calculate the 'changegroup': the set of all changesets starting at
       
   139 those roots. Mercurial takes pains to make this search efficient in
       
   140 both bandwidth and round-trips.
       
   141 
       
   142 Once the roots are found, the changegroup can be transferred as a
       
   143 single streaming transfer. This is organized as an ordered set of
       
   144 deltas for changesets, manifests, and files. Large chunks of deltas
       
   145 can be directly added to the repository without unpacking so it's
       
   146 fairly fast.