Mercurial > hg
annotate mercurial/help/internals/linelog.txt @ 38795:422d661056be
linelog: add a Python implementation of the linelog datastructure
This datastructure was originally developed by Jun Wu at Facebook,
inspired by SCCS weaves. It's useful as a cache for blame information,
but also is the magic that makes `hg absorb` easy to implement. In
service of importing the code to Mercurial, I wanted to actually
/understand/ it, and once I did I decided to take a run at
implementing it.
The help/internals/linelog.txt document is the README from Jun Wu's
implementaiton. It all applies to our linelog implementation.
Differential Revision: https://phab.mercurial-scm.org/D3990
author | Augie Fackler <augie@google.com> |
---|---|
date | Mon, 30 Jul 2018 10:42:37 -0400 |
parents | |
children | c10be3fc200b |
rev | line source |
---|---|
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
1 linelog is a storage format inspired by the "Interleaved deltas" idea. See |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
2 https://en.wikipedia.org/wiki/Interleaved_deltas for its introduction. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
4 0. SCCS Weave |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
5 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
6 To understand what linelog is, first we have a quick look at a simplified |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
7 (with header removed) SCCS weave format, which is an implementation of the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
8 "Interleaved deltas" idea. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
9 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
10 0.1 Basic SCCS Weave File Format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
11 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
12 A SCCS weave file consists of plain text lines. Each line is either a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
13 special instruction starting with "^A" or part of the content of the real |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
14 file the weave tracks. There are 3 important operations, where REV denotes |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
15 the revision number: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
16 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
17 ^AI REV, marking the beginning of an insertion block introduced by REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
18 ^AD REV, marking the beginning of a deletion block introduced by REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
19 ^AE REV, marking the end of the block started by "^AI REV" or "^AD REV" |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
20 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
21 Note on revision numbers: For any two different revision numbers, one must |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
22 be an ancestor of the other to make them comparable. This enforces linear |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
23 history. Besides, the comparison functions (">=", "<") should be efficient. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
24 This means, if revisions are strings like git or hg, an external map is |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
25 required to convert them into integers. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
26 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
27 For example, to represent the following changes: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
28 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
29 REV 1 | REV 2 | REV 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
30 ------+-------+------- |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
31 a | a | a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
32 b | b | 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
33 c | 1 | c |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
34 | 2 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
35 | c | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
36 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
37 A possible weave file looks like: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
38 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
39 ^AI 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
40 a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
41 ^AD 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
42 b |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
43 ^AI 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
44 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
45 ^AE 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
46 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
47 ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
48 c |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
49 ^AE 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
50 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
51 An "^AE" does not always match its nearest operation ("^AI" or "^AD"). In |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
52 the above example, "^AE 3" does not match the nearest "^AI 2" but "^AD 3". |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
53 Therefore we need some extra information for "^AE". The SCCS weave uses a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
54 revision number. It could also be a boolean value about whether it is an |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
55 insertion or a deletion (see section 0.4). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
56 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
57 0.2 Checkout |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
58 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
59 The "checkout" operation is to retrieve file content at a given revision, |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
60 say X. It's doable by going through the file line by line and: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
61 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
62 - If meet ^AI rev, and rev > X, find the corresponding ^AE and jump there |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
63 - If meet ^AD rev, and rev <= X, find the corresponding ^AE and jump there |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
64 - Ignore ^AE |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
65 - For normal lines, just output them |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
66 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
67 0.3 Annotate |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
68 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
69 The "annotate" operation is to show extra metadata like the revision number |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
70 and the original line number a line comes from. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
71 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
72 It's basically just a "Checkout". For the extra metadata, they can be stored |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
73 side by side with the line contents. Alternatively, we can infer the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
74 revision number from "^AI"s. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
75 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
76 Some SCM tools have to calculate diffs on the fly and thus are much slower |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
77 on this operation. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
78 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
79 0.4 Tree Structure |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
80 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
81 The word "interleaved" is used because "^AI" .. "^AE" and "^AD" .. "^AE" |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
82 blocks can be interleaved. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
83 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
84 If we consider insertions and deletions separately, they can form tree |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
85 structures, respectively. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
86 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
87 +--- ^AI 1 +--- ^AD 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
88 | +- ^AI 2 | +- ^AD 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
89 | | | | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
90 | +- ^AE 2 | +- ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
91 | | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
92 +--- ^AE 1 +--- ^AE 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
93 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
94 More specifically, it's possible to build a tree for all insertions, where |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
95 the tree node has the structure "(rev, startline, endline)". "startline" is |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
96 the line number of "^AI" and "endline" is the line number of the matched |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
97 "^AE". The tree will have these properties: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
98 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
99 1. child.rev > parent.rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
100 2. child.startline > parent.startline |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
101 3. child.endline < parent.endline |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
102 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
103 A similar tree for all deletions can also be built with the first property |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
104 changed to: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
105 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
106 1. child.rev < parent.rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
107 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
108 0.5 Malformed Cases |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
109 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
110 The following cases are considered malformed in our implementation: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
111 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
112 1. Interleaved insertions, or interleaved deletions. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
113 It can be rewritten to a non-interleaved tree structure. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
114 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
115 ^AI/D x ^AI/D x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
116 ^AI/D y -> ^AI/D y |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
117 ^AE x ^AE y |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
118 ^AE y ^AE x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
119 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
120 2. Nested insertions, where the inner one has a smaller revision number. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
121 It can be rewritten to a non-nested form. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
122 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
123 ^AI x + 1 ^AI x + 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
124 ^AI x -> ^AE x + 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
125 ^AE x ^AI x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
126 ^AE x + 1 ^AE x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
127 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
128 3. Insertion or deletion inside another deletion, where the outer deletion |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
129 block has a smaller revision number. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
130 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
131 ^AD x ^AD x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
132 ^AI/D x + 1 -> ^AE x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
133 ^AE x + 1 ^AI/D x + 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
134 ^AE x ^AE x |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
135 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
136 Some of them may be valid in other implementations for special purposes. For |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
137 example, to "revive" a previously deleted block in a newer revision. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
138 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
139 0.6 Cases Can Be Optimized |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
140 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
141 It's always better to get things nested. For example, the left is more |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
142 efficient than the right while they represent the same content: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
143 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
144 +--- ^AD 2 +- ^AD 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
145 | +- ^AD 1 | LINE A |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
146 | | LINE A +- ^AE 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
147 | +- ^AE 1 +- ^AD 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
148 | LINE B | LINE B |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
149 +--- ^AE 2 +- ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
150 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
151 Our implementation sometimes generates the less efficient data. To always |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
152 get the optimal form, it requires extra code complexity that seems unworthy. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
153 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
154 0.7 Inefficiency |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
155 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
156 The file format can be slow because: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
157 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
158 - Inserting a new line at position P requires rewriting all data after P. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
159 - Finding "^AE" requires walking through the content (O(N), where N is the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
160 number of lines between "^AI/D" and "^AE"). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
161 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
162 1. Linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
163 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
164 The linelog is a binary format that dedicates to speed up mercurial (or |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
165 git)'s "annotate" operation. It's designed to avoid issues mentioned in |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
166 section 0.7. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
167 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
168 1.1 Content Stored |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
169 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
170 Linelog is not another storage for file contents. It only stores line |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
171 numbers and corresponding revision numbers, instead of actual line content. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
172 This is okay for the "annotate" operation because usually the external |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
173 source is fast to checkout the content of a file at a specific revision. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
174 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
175 A typical SCCS weave is also fast on the "grep" operation, which needs |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
176 random accesses to line contents from different revisions of a file. This |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
177 can be slow with linelog's no-line-content design. However we could use |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
178 an extra map ((rev, line num) -> line content) to speed it up. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
179 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
180 Note the revision numbers in linelog should be independent from mercurial |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
181 integer revision numbers. There should be some mapping between linelog rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
182 and hg hash stored side by side, to make the files reusable after being |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
183 copied to another machine. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
184 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
185 1.2 Basic Format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
186 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
187 A linelog file consists of "instruction"s. An "instruction" can be either: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
188 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
189 - JGE REV ADDR # jump to ADDR if rev >= REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
190 - JL REV ADDR # jump to ADDR if rev < REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
191 - LINE REV LINENUM # append the (LINENUM+1)-th line in revision REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
192 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
193 For example, here is the example linelog representing the same file with |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
194 3 revisions mentioned in section 0.1: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
195 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
196 SCCS | Linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
197 Weave | Addr : Instruction |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
198 ------+------+------------- |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
199 ^AI 1 | 0 : JL 1 8 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
200 a | 1 : LINE 1 0 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
201 ^AD 3 | 2 : JGE 3 6 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
202 b | 3 : LINE 1 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
203 ^AI 2 | 4 : JL 2 7 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
204 1 | 5 : LINE 2 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
205 ^AE 3 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
206 2 | 6 : LINE 2 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
207 ^AE 2 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
208 c | 7 : LINE 1 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
209 ^AE 1 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
210 | 8 : END |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
211 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
212 This way, "find ^AE" is O(1) because we just jump there. And we can insert |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
213 new lines without rewriting most part of the file by appending new lines and |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
214 changing a single instruction to jump to them. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
215 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
216 The current implementation uses 64 bits for an instruction: The opcode (JGE, |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
217 JL or LINE) takes 2 bits, REV takes 30 bits and ADDR or LINENUM takes 32 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
218 bits. It also stores the max revision number and buffer size at the first |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
219 64 bits for quick access to these values. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
220 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
221 1.3 Comparing with Mercurial's revlog format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
222 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
223 Apparently, linelog is very different from revlog: linelog stores rev and |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
224 line numbers, while revlog has line contents and other metadata (like |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
225 parents, flags). However, the revlog format could also be used to store rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
226 and line numbers. For example, to speed up the annotate operation, we could |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
227 also pre-calculate annotate results and just store them using the revlog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
228 format. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
229 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
230 Therefore, linelog is actually somehow similar to revlog, with the important |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
231 trade-off that it only supports linear history (mentioned in section 0.1). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
232 Essentially, the differences are: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
233 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
234 a) Linelog is full of deltas, while revlog could contain full file |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
235 contents sometimes. So linelog is smaller. Revlog could trade |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
236 reconstruction speed for file size - best case, revlog is as small as |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
237 linelog. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
238 b) The interleaved delta structure allows skipping large portion of |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
239 uninteresting deltas so linelog's content reconstruction is faster than |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
240 the delta-only version of revlog (however it's possible to construct |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
241 a case where interleaved deltas degrade to plain deltas, so linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
242 worst case would be delta-only revlog). Revlog could trade file size |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
243 for reconstruction speed. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
244 c) Linelog implicitly maintains the order of all lines it stores. So it |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
245 could dump all the lines from all revisions, with a reasonable order. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
246 While revlog could also dump all line additions, it requires extra |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
247 computation to figure out the order putting those lines - that's some |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
248 kind of "merge". |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
249 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
250 "c" makes "hg absorb" easier to implement and makes it possible to do |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
251 "annotate --deleted". |