Mercurial > hg
annotate mercurial/helptext/internals/linelog.txt @ 47233:bcafcd779d2e
branching: merge stable into default
author | Raphaël Gomès <rgomes@octobus.net> |
---|---|
date | Mon, 17 May 2021 15:05:24 +0200 |
parents | 2e017696181f |
children |
rev | line source |
---|---|
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
1 linelog is a storage format inspired by the "Interleaved deltas" idea. See |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
2 https://en.wikipedia.org/wiki/Interleaved_deltas for its introduction. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
4 0. SCCS Weave |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
5 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
6 To understand what linelog is, first we have a quick look at a simplified |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
7 (with header removed) SCCS weave format, which is an implementation of the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
8 "Interleaved deltas" idea. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
9 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
10 0.1 Basic SCCS Weave File Format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
11 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
12 A SCCS weave file consists of plain text lines. Each line is either a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
13 special instruction starting with "^A" or part of the content of the real |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
14 file the weave tracks. There are 3 important operations, where REV denotes |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
15 the revision number: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
16 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
17 ^AI REV, marking the beginning of an insertion block introduced by REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
18 ^AD REV, marking the beginning of a deletion block introduced by REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
19 ^AE REV, marking the end of the block started by "^AI REV" or "^AD REV" |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
20 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
21 Note on revision numbers: For any two different revision numbers, one must |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
22 be an ancestor of the other to make them comparable. This enforces linear |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
23 history. Besides, the comparison functions (">=", "<") should be efficient. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
24 This means, if revisions are strings like git or hg, an external map is |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
25 required to convert them into integers. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
26 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
27 For example, to represent the following changes: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
28 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
29 REV 1 | REV 2 | REV 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
30 ------+-------+------- |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
31 a | a | a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
32 b | b | 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
33 c | 1 | c |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
34 | 2 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
35 | c | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
36 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
37 A possible weave file looks like: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
38 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
39 ^AI 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
40 a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
41 ^AD 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
42 b |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
43 ^AI 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
44 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
45 ^AE 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
46 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
47 ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
48 c |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
49 ^AE 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
50 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
51 An "^AE" does not always match its nearest operation ("^AI" or "^AD"). In |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
52 the above example, "^AE 3" does not match the nearest "^AI 2" but "^AD 3". |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
53 Therefore we need some extra information for "^AE". The SCCS weave uses a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
54 revision number. It could also be a boolean value about whether it is an |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
55 insertion or a deletion (see section 0.4). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
56 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
57 0.2 Checkout |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
58 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
59 The "checkout" operation is to retrieve file content at a given revision, |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
60 say X. It's doable by going through the file line by line and: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
61 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
62 - If meet ^AI rev, and rev > X, find the corresponding ^AE and jump there |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
63 - If meet ^AD rev, and rev <= X, find the corresponding ^AE and jump there |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
64 - Ignore ^AE |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
65 - For normal lines, just output them |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
66 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
67 0.3 Annotate |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
68 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
69 The "annotate" operation is to show extra metadata like the revision number |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
70 and the original line number a line comes from. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
71 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
72 It's basically just a "Checkout". For the extra metadata, they can be stored |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
73 side by side with the line contents. Alternatively, we can infer the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
74 revision number from "^AI"s. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
75 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
76 Some SCM tools have to calculate diffs on the fly and thus are much slower |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
77 on this operation. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
78 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
79 0.4 Tree Structure |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
80 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
81 The word "interleaved" is used because "^AI" .. "^AE" and "^AD" .. "^AE" |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
82 blocks can be interleaved. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
83 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
84 If we consider insertions and deletions separately, they can form tree |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
85 structures, respectively. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
86 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
87 +--- ^AI 1 +--- ^AD 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
88 | +- ^AI 2 | +- ^AD 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
89 | | | | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
90 | +- ^AE 2 | +- ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
91 | | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
92 +--- ^AE 1 +--- ^AE 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
93 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
94 More specifically, it's possible to build a tree for all insertions, where |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
95 the tree node has the structure "(rev, startline, endline)". "startline" is |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
96 the line number of "^AI" and "endline" is the line number of the matched |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
97 "^AE". The tree will have these properties: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
98 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
99 1. child.rev > parent.rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
100 2. child.startline > parent.startline |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
101 3. child.endline < parent.endline |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
102 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
103 A similar tree for all deletions can also be built with the first property |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
104 changed to: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
105 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
106 1. child.rev < parent.rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
107 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
108 0.5 Malformed Cases |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
109 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
110 The following cases are considered malformed in our implementation: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
111 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
112 1. Interleaved insertions, or interleaved deletions. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
113 It can be rewritten to a non-interleaved tree structure. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
114 |
38968 | 115 Take insertions as example, deletions are similar: |
116 | |
117 ^AI x ^AI x | |
118 a a | |
119 ^AI x + 1 -> ^AI x + 1 | |
120 b b | |
121 ^AE x ^AE x + 1 | |
122 c ^AE x | |
123 ^AE x + 1 ^AI x + 1 | |
124 c | |
125 ^AE x + 1 | |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
126 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
127 2. Nested insertions, where the inner one has a smaller revision number. |
38968 | 128 Or nested deletions, where the inner one has a larger revision number. |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
129 It can be rewritten to a non-nested form. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
130 |
38968 | 131 Take insertions as example, deletions are similar: |
132 | |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
133 ^AI x + 1 ^AI x + 1 |
38968 | 134 a a |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
135 ^AI x -> ^AE x + 1 |
38968 | 136 b ^AI x |
137 ^AE x b | |
138 c ^AE x | |
139 ^AE x + 1 ^AI x + 1 | |
140 c | |
141 ^AE x + 1 | |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
142 |
38968 | 143 3. Insertion inside deletion with a smaller revision number. |
144 | |
145 Rewrite by duplicating the content inserted: | |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
146 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
147 ^AD x ^AD x |
38968 | 148 a a |
149 ^AI x + 1 -> b | |
150 b c | |
151 ^AE x + 1 ^AE x | |
152 c ^AI x + 1 | |
153 ^AE x b | |
154 ^AE x + 1 | |
155 | |
156 Note: If "annotate" purely depends on "^AI" information, then the | |
157 duplication content will lose track of where "b" is originally from. | |
38795
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
158 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
159 Some of them may be valid in other implementations for special purposes. For |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
160 example, to "revive" a previously deleted block in a newer revision. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
161 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
162 0.6 Cases Can Be Optimized |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
163 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
164 It's always better to get things nested. For example, the left is more |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
165 efficient than the right while they represent the same content: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
166 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
167 +--- ^AD 2 +- ^AD 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
168 | +- ^AD 1 | LINE A |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
169 | | LINE A +- ^AE 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
170 | +- ^AE 1 +- ^AD 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
171 | LINE B | LINE B |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
172 +--- ^AE 2 +- ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
173 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
174 Our implementation sometimes generates the less efficient data. To always |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
175 get the optimal form, it requires extra code complexity that seems unworthy. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
176 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
177 0.7 Inefficiency |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
178 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
179 The file format can be slow because: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
180 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
181 - Inserting a new line at position P requires rewriting all data after P. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
182 - Finding "^AE" requires walking through the content (O(N), where N is the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
183 number of lines between "^AI/D" and "^AE"). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
184 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
185 1. Linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
186 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
187 The linelog is a binary format that dedicates to speed up mercurial (or |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
188 git)'s "annotate" operation. It's designed to avoid issues mentioned in |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
189 section 0.7. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
190 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
191 1.1 Content Stored |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
192 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
193 Linelog is not another storage for file contents. It only stores line |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
194 numbers and corresponding revision numbers, instead of actual line content. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
195 This is okay for the "annotate" operation because usually the external |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
196 source is fast to checkout the content of a file at a specific revision. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
197 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
198 A typical SCCS weave is also fast on the "grep" operation, which needs |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
199 random accesses to line contents from different revisions of a file. This |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
200 can be slow with linelog's no-line-content design. However we could use |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
201 an extra map ((rev, line num) -> line content) to speed it up. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
202 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
203 Note the revision numbers in linelog should be independent from mercurial |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
204 integer revision numbers. There should be some mapping between linelog rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
205 and hg hash stored side by side, to make the files reusable after being |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
206 copied to another machine. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
207 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
208 1.2 Basic Format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
209 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
210 A linelog file consists of "instruction"s. An "instruction" can be either: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
211 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
212 - JGE REV ADDR # jump to ADDR if rev >= REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
213 - JL REV ADDR # jump to ADDR if rev < REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
214 - LINE REV LINENUM # append the (LINENUM+1)-th line in revision REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
215 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
216 For example, here is the example linelog representing the same file with |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
217 3 revisions mentioned in section 0.1: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
218 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
219 SCCS | Linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
220 Weave | Addr : Instruction |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
221 ------+------+------------- |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
222 ^AI 1 | 0 : JL 1 8 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
223 a | 1 : LINE 1 0 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
224 ^AD 3 | 2 : JGE 3 6 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
225 b | 3 : LINE 1 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
226 ^AI 2 | 4 : JL 2 7 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
227 1 | 5 : LINE 2 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
228 ^AE 3 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
229 2 | 6 : LINE 2 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
230 ^AE 2 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
231 c | 7 : LINE 1 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
232 ^AE 1 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
233 | 8 : END |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
234 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
235 This way, "find ^AE" is O(1) because we just jump there. And we can insert |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
236 new lines without rewriting most part of the file by appending new lines and |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
237 changing a single instruction to jump to them. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
238 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
239 The current implementation uses 64 bits for an instruction: The opcode (JGE, |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
240 JL or LINE) takes 2 bits, REV takes 30 bits and ADDR or LINENUM takes 32 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
241 bits. It also stores the max revision number and buffer size at the first |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
242 64 bits for quick access to these values. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
243 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
244 1.3 Comparing with Mercurial's revlog format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
245 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
246 Apparently, linelog is very different from revlog: linelog stores rev and |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
247 line numbers, while revlog has line contents and other metadata (like |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
248 parents, flags). However, the revlog format could also be used to store rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
249 and line numbers. For example, to speed up the annotate operation, we could |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
250 also pre-calculate annotate results and just store them using the revlog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
251 format. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
252 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
253 Therefore, linelog is actually somehow similar to revlog, with the important |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
254 trade-off that it only supports linear history (mentioned in section 0.1). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
255 Essentially, the differences are: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
256 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
257 a) Linelog is full of deltas, while revlog could contain full file |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
258 contents sometimes. So linelog is smaller. Revlog could trade |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
259 reconstruction speed for file size - best case, revlog is as small as |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
260 linelog. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
261 b) The interleaved delta structure allows skipping large portion of |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
262 uninteresting deltas so linelog's content reconstruction is faster than |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
263 the delta-only version of revlog (however it's possible to construct |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
264 a case where interleaved deltas degrade to plain deltas, so linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
265 worst case would be delta-only revlog). Revlog could trade file size |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
266 for reconstruction speed. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
267 c) Linelog implicitly maintains the order of all lines it stores. So it |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
268 could dump all the lines from all revisions, with a reasonable order. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
269 While revlog could also dump all line additions, it requires extra |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
270 computation to figure out the order putting those lines - that's some |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
271 kind of "merge". |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
272 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
273 "c" makes "hg absorb" easier to implement and makes it possible to do |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
274 "annotate --deleted". |
38968 | 275 |
276 1.4 Malformed Cases Handling | |
277 | |
278 The following "case 1", "case 2", and "case 3" refer to cases mentioned | |
279 in section 0.5. | |
280 | |
281 Using the exposed API (replacelines), case 1 is impossible to generate, | |
282 although it's possible to generate it by constructing rawdata and load that | |
283 via linelog.fromdata. | |
284 | |
285 Doing annotate(maxrev) before replacelines (aka. a1, a2 passed to | |
286 replacelines are related to the latest revision) eliminates the possibility | |
287 of case 3. That makes sense since usually you'd like to make edits on top of | |
288 the latest revision. Practically, both absorb and fastannotate do this. | |
289 | |
290 Doing annotate(maxrev), plus replacelines(rev, ...) where rev >= maxrev | |
291 eliminates the possibility of case 2. That makes sense since usually the | |
292 edits belong to "new revisions", not "old revisions". Practically, | |
293 fastannotate does this. Absorb calls replacelines with rev < maxrev to edit | |
294 past revisions. So it needs some extra care to not generate case 2. | |
295 | |
296 If case 1 occurs, that probably means linelog file corruption (assuming | |
297 linelog is edited via public APIs) the checkout or annotate result could | |
298 be less meaningful or even error out, but linelog wouldn't enter an infinite | |
299 loop. | |
300 | |
301 If either case 2 or 3 occurs, linelog works as if the inner "^AI/D" and "^AE" | |
302 operations on the left side are silently ignored. |