# HG changeset patch # User Matt Mackall # Date 1161811544 18000 # Node ID da9506fe27105dc70da80562d0ef7655400f9c75 # Parent 1295ad66fac8988b09d033b5d00f04b64426456a Remove some old documentation that belongs on the wiki diff -r 1295ad66fac8 -r da9506fe2710 comparison.txt --- a/comparison.txt Wed Oct 25 16:24:28 2006 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,31 +0,0 @@ - Mercurial git BK (*) -storage revlog delta compressed revisions SCCS weave -storage naming by filename by revision hash by filename -merge file DAGs changeset DAG file DAGs? -consistency SHA1 SHA1 CRC -signable? yes yes no - -retrieve file tip O(1) O(1) O(revs) -add rev O(1) O(1) O(revs) -find prev file rev O(1) O(changesets) O(revs) -annotate file O(revs) O(changesets) O(revs) -find file changeset O(1) O(changesets) ? - -checkout O(files) O(files) O(revs)? -commit O(changes) O(changes) ? - 6 patches/s 6 patches/s slow -diff working dir O(changes) O(changes) ? - < 1s < 1s ? -tree diff revs O(changes) O(changes) ? - < 1s < 1s ? -hardlink clone O(files) O(revisions) O(files) - -find remote csets O(log new) rsync: O(revisions) ? - git-http: O(changesets) -pull remote csets O(patch) O(modified files) O(patch) - -repo growth O(patch) O(revisions) O(patch) - kernel history 300M 3.5G? 250M? -lines of code 2500 6500 (+ cogito) ?? - -* I've never used BK so this is just guesses diff -r 1295ad66fac8 -r da9506fe2710 notes.txt --- a/notes.txt Wed Oct 25 16:24:28 2006 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,146 +0,0 @@ -Some notes about Mercurial's design - -Revlogs: - -The fundamental storage type in Mercurial is a "revlog". A revlog is -the set of all revisions to a file. Each revision is either stored -compressed in its entirety or as a compressed binary delta against the -previous version. The decision of when to store a full version is made -based on how much data would be needed to reconstruct the file. This -lets us ensure that we never need to read huge amounts of data to -reconstruct a file, regardless of how many revisions of it we store. - -In fact, we should always be able to do it with a single read, -provided we know when and where to read. This is where the index comes -in. Each revlog has an index containing a special hash (nodeid) of the -text, hashes for its parents, and where and how much of the revlog -data we need to read to reconstruct it. Thus, with one read of the -index and one read of the data, we can reconstruct any version in time -proportional to the file size. - -Similarly, revlogs and their indices are append-only. This means that -adding a new version is also O(1) seeks. - -Generally revlogs are used to represent revisions of files, but they -also are used to represent manifests and changesets. - -Manifests: - -A manifest is simply a list of all files in a given revision of a -project along with the nodeids of the corresponding file revisions. So -grabbing a given version of the project means simply looking up its -manifest and reconstruction all the file revisions pointed to by it. - -Changesets: - -A changeset is a list of all files changed in a check-in along with a -change description and some metadata like user and date. It also -contains a nodeid to the relevent revision of the manifest. Changesets -and manifests are one-to-one, but contain different data for -convenience. - -Nodeids: - -Nodeids are unique ids that are used to represent the contents of a -file AND its position in the project history. That is, if you change a -file and then change it back, the result will have a different nodeid -because it has different history. This is accomplished by including -the parents of a given revision's nodeids with the revision's text -when calculating the hash. - -Graph merging: - -Nodeids are implemented as they are to simplify merging. Merging a -pair of directed acyclic graphs (aka "the family tree" of the file -history) requires some method of determining if nodes in different -graphs correspond. Simply comparing the contents of the node (by -comparing text of given revisions or their hashes) can get confused by -identical revisions in the tree. - -The nodeid approach makes it trivial - the hash uniquely describes a -revision's contents and its graph position relative to the root, so -merge is simply checking whether each nodeid in graph A is in the hash -table of graph B. If not, we pull them in, adding them sequentially to -the revlog. - -Branching and merging: - -Everything in Mercurial is potentially a branch and every user -effectively works in their own branch. When you do a checkout, -Mercurial remembers what the parent changeset was and uses it for the -next check in. - -To do a merge of branches in Mercurial, you check out the heads of the -two branches into the same working directory which causes a merge to -be performed, and then check in the result once you're happy with it. -The resulting checkin will have two parents. - -It decides when a merge is necessary by first determining if there are -any uncommitted changes in the working directory. This effectively -makes the working directory a branch off the checked in version it's -based on. Then it also determines if the working directory is a direct -ancestor or descendent of the second version we're attempting to -checkout. If neither is true, we simply replace the working directory -version with the new version. Otherwise we perform a merge between the -two versions. - -Merging files and manifests: - -We begin by comparing two versions manifests and deciding which files -need to be added, deleted, and merged. - -Then for each file, we perform a graph merge and resolve as above. -It's important to merge files using per-file DAGs rather than just -changeset level DAGs as this diagram illustrates: - -M M1 M2 - -AB - |`-------v M2 clones M -aB AB file A is change in mainline - |`---v AB' file B is changed in M2 - | aB / | M1 clones M - | ab/ | M1 changes B - | ab' | M1 merges from M2, changes to B conflict - | | A'B' M2 changes A - `---+--.| - | a'B' M2 merges from mainline, changes to A conflict - `--.| - ??? depending on which ancestor we choose, we will have - to redo A hand-merge, B hand-merge, or both - but if we look at the files independently, everything - is fine - -The result is a merged version in the working directory, waiting for -check-in. - -Rollback: - -When performing a commit or a merge, we order things so that the -changeset entry gets added last. We keep a transaction log of the name -of each file touched and its length prior to the transaction. On -abort, we simply truncate each file to its prior length. This is one -of the nice properties of the append-only structure of the revlogs. -We can also reuse this journal for "rollback". - -Merging between repositories: - -One of the key features of Mercurial is the ability to merge between -independent repositories in a decentralized fashion. Each repository -can act as a read-only server or a client. Clients operating by -pulling all branches that it hasn't seen from the server and adding -them into its graph. This is done in two steps: searching for new -"roots" and pulling a "changegroup" - -Searching for new "roots" begins by finding all new heads and -searching backwards from those heads to the first unknown nodes in -their respective branches. These nodes are the 'roots' that are used -to calculate the 'changegroup': the set of all changesets starting at -those roots. Mercurial takes pains to make this search efficient in -both bandwidth and round-trips. - -Once the roots are found, the changegroup can be transferred as a -single streaming transfer. This is organized as an ordered set of -deltas for changesets, manifests, and files. Large chunks of deltas -can be directly added to the repository without unpacking so it's -fairly fast.