annotate mercurial/helptext/internals/linelog.txt @ 49541:976648e20856

tests: remove non-python3 line matching and tests block We don't support Python2 anymore
author Pierre-Yves David <pierre-yves.david@octobus.net>
date Thu, 22 Sep 2022 16:27:17 +0200
parents 2e017696181f
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
1 linelog is a storage format inspired by the "Interleaved deltas" idea. See
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
2 https://en.wikipedia.org/wiki/Interleaved_deltas for its introduction.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
4 0. SCCS Weave
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
5
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
6 To understand what linelog is, first we have a quick look at a simplified
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
7 (with header removed) SCCS weave format, which is an implementation of the
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
8 "Interleaved deltas" idea.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
9
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
10 0.1 Basic SCCS Weave File Format
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
11
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
12 A SCCS weave file consists of plain text lines. Each line is either a
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
13 special instruction starting with "^A" or part of the content of the real
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
14 file the weave tracks. There are 3 important operations, where REV denotes
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
15 the revision number:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
16
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
17 ^AI REV, marking the beginning of an insertion block introduced by REV
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
18 ^AD REV, marking the beginning of a deletion block introduced by REV
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
19 ^AE REV, marking the end of the block started by "^AI REV" or "^AD REV"
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
20
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
21 Note on revision numbers: For any two different revision numbers, one must
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
22 be an ancestor of the other to make them comparable. This enforces linear
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
23 history. Besides, the comparison functions (">=", "<") should be efficient.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
24 This means, if revisions are strings like git or hg, an external map is
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
25 required to convert them into integers.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
26
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
27 For example, to represent the following changes:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
28
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
29 REV 1 | REV 2 | REV 3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
30 ------+-------+-------
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
31 a | a | a
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
32 b | b | 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
33 c | 1 | c
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
34 | 2 |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
35 | c |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
36
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
37 A possible weave file looks like:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
38
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
39 ^AI 1
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
40 a
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
41 ^AD 3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
42 b
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
43 ^AI 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
44 1
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
45 ^AE 3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
46 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
47 ^AE 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
48 c
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
49 ^AE 1
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
50
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
51 An "^AE" does not always match its nearest operation ("^AI" or "^AD"). In
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
52 the above example, "^AE 3" does not match the nearest "^AI 2" but "^AD 3".
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
53 Therefore we need some extra information for "^AE". The SCCS weave uses a
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
54 revision number. It could also be a boolean value about whether it is an
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
55 insertion or a deletion (see section 0.4).
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
56
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
57 0.2 Checkout
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
58
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
59 The "checkout" operation is to retrieve file content at a given revision,
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
60 say X. It's doable by going through the file line by line and:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
61
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
62 - If meet ^AI rev, and rev > X, find the corresponding ^AE and jump there
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
63 - If meet ^AD rev, and rev <= X, find the corresponding ^AE and jump there
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
64 - Ignore ^AE
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
65 - For normal lines, just output them
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
66
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
67 0.3 Annotate
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
68
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
69 The "annotate" operation is to show extra metadata like the revision number
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
70 and the original line number a line comes from.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
71
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
72 It's basically just a "Checkout". For the extra metadata, they can be stored
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
73 side by side with the line contents. Alternatively, we can infer the
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
74 revision number from "^AI"s.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
75
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
76 Some SCM tools have to calculate diffs on the fly and thus are much slower
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
77 on this operation.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
78
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
79 0.4 Tree Structure
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
80
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
81 The word "interleaved" is used because "^AI" .. "^AE" and "^AD" .. "^AE"
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
82 blocks can be interleaved.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
83
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
84 If we consider insertions and deletions separately, they can form tree
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
85 structures, respectively.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
86
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
87 +--- ^AI 1 +--- ^AD 3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
88 | +- ^AI 2 | +- ^AD 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
89 | | | |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
90 | +- ^AE 2 | +- ^AE 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
91 | |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
92 +--- ^AE 1 +--- ^AE 3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
93
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
94 More specifically, it's possible to build a tree for all insertions, where
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
95 the tree node has the structure "(rev, startline, endline)". "startline" is
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
96 the line number of "^AI" and "endline" is the line number of the matched
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
97 "^AE". The tree will have these properties:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
98
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
99 1. child.rev > parent.rev
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
100 2. child.startline > parent.startline
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
101 3. child.endline < parent.endline
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
102
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
103 A similar tree for all deletions can also be built with the first property
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
104 changed to:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
105
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
106 1. child.rev < parent.rev
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
107
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
108 0.5 Malformed Cases
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
109
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
110 The following cases are considered malformed in our implementation:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
111
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
112 1. Interleaved insertions, or interleaved deletions.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
113 It can be rewritten to a non-interleaved tree structure.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
114
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
115 Take insertions as example, deletions are similar:
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
116
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
117 ^AI x ^AI x
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
118 a a
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
119 ^AI x + 1 -> ^AI x + 1
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
120 b b
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
121 ^AE x ^AE x + 1
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
122 c ^AE x
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
123 ^AE x + 1 ^AI x + 1
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
124 c
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
125 ^AE x + 1
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
126
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
127 2. Nested insertions, where the inner one has a smaller revision number.
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
128 Or nested deletions, where the inner one has a larger revision number.
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
129 It can be rewritten to a non-nested form.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
130
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
131 Take insertions as example, deletions are similar:
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
132
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
133 ^AI x + 1 ^AI x + 1
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
134 a a
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
135 ^AI x -> ^AE x + 1
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
136 b ^AI x
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
137 ^AE x b
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
138 c ^AE x
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
139 ^AE x + 1 ^AI x + 1
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
140 c
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
141 ^AE x + 1
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
142
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
143 3. Insertion inside deletion with a smaller revision number.
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
144
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
145 Rewrite by duplicating the content inserted:
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
146
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
147 ^AD x ^AD x
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
148 a a
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
149 ^AI x + 1 -> b
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
150 b c
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
151 ^AE x + 1 ^AE x
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
152 c ^AI x + 1
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
153 ^AE x b
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
154 ^AE x + 1
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
155
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
156 Note: If "annotate" purely depends on "^AI" information, then the
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
157 duplication content will lose track of where "b" is originally from.
38795
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
158
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
159 Some of them may be valid in other implementations for special purposes. For
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
160 example, to "revive" a previously deleted block in a newer revision.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
161
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
162 0.6 Cases Can Be Optimized
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
163
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
164 It's always better to get things nested. For example, the left is more
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
165 efficient than the right while they represent the same content:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
166
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
167 +--- ^AD 2 +- ^AD 1
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
168 | +- ^AD 1 | LINE A
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
169 | | LINE A +- ^AE 1
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
170 | +- ^AE 1 +- ^AD 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
171 | LINE B | LINE B
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
172 +--- ^AE 2 +- ^AE 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
173
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
174 Our implementation sometimes generates the less efficient data. To always
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
175 get the optimal form, it requires extra code complexity that seems unworthy.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
176
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
177 0.7 Inefficiency
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
178
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
179 The file format can be slow because:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
180
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
181 - Inserting a new line at position P requires rewriting all data after P.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
182 - Finding "^AE" requires walking through the content (O(N), where N is the
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
183 number of lines between "^AI/D" and "^AE").
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
184
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
185 1. Linelog
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
186
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
187 The linelog is a binary format that dedicates to speed up mercurial (or
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
188 git)'s "annotate" operation. It's designed to avoid issues mentioned in
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
189 section 0.7.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
190
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
191 1.1 Content Stored
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
192
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
193 Linelog is not another storage for file contents. It only stores line
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
194 numbers and corresponding revision numbers, instead of actual line content.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
195 This is okay for the "annotate" operation because usually the external
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
196 source is fast to checkout the content of a file at a specific revision.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
197
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
198 A typical SCCS weave is also fast on the "grep" operation, which needs
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
199 random accesses to line contents from different revisions of a file. This
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
200 can be slow with linelog's no-line-content design. However we could use
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
201 an extra map ((rev, line num) -> line content) to speed it up.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
202
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
203 Note the revision numbers in linelog should be independent from mercurial
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
204 integer revision numbers. There should be some mapping between linelog rev
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
205 and hg hash stored side by side, to make the files reusable after being
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
206 copied to another machine.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
207
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
208 1.2 Basic Format
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
209
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
210 A linelog file consists of "instruction"s. An "instruction" can be either:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
211
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
212 - JGE REV ADDR # jump to ADDR if rev >= REV
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
213 - JL REV ADDR # jump to ADDR if rev < REV
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
214 - LINE REV LINENUM # append the (LINENUM+1)-th line in revision REV
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
215
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
216 For example, here is the example linelog representing the same file with
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
217 3 revisions mentioned in section 0.1:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
218
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
219 SCCS | Linelog
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
220 Weave | Addr : Instruction
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
221 ------+------+-------------
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
222 ^AI 1 | 0 : JL 1 8
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
223 a | 1 : LINE 1 0
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
224 ^AD 3 | 2 : JGE 3 6
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
225 b | 3 : LINE 1 1
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
226 ^AI 2 | 4 : JL 2 7
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
227 1 | 5 : LINE 2 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
228 ^AE 3 |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
229 2 | 6 : LINE 2 3
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
230 ^AE 2 |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
231 c | 7 : LINE 1 2
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
232 ^AE 1 |
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
233 | 8 : END
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
234
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
235 This way, "find ^AE" is O(1) because we just jump there. And we can insert
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
236 new lines without rewriting most part of the file by appending new lines and
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
237 changing a single instruction to jump to them.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
238
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
239 The current implementation uses 64 bits for an instruction: The opcode (JGE,
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
240 JL or LINE) takes 2 bits, REV takes 30 bits and ADDR or LINENUM takes 32
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
241 bits. It also stores the max revision number and buffer size at the first
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
242 64 bits for quick access to these values.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
243
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
244 1.3 Comparing with Mercurial's revlog format
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
245
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
246 Apparently, linelog is very different from revlog: linelog stores rev and
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
247 line numbers, while revlog has line contents and other metadata (like
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
248 parents, flags). However, the revlog format could also be used to store rev
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
249 and line numbers. For example, to speed up the annotate operation, we could
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
250 also pre-calculate annotate results and just store them using the revlog
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
251 format.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
252
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
253 Therefore, linelog is actually somehow similar to revlog, with the important
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
254 trade-off that it only supports linear history (mentioned in section 0.1).
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
255 Essentially, the differences are:
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
256
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
257 a) Linelog is full of deltas, while revlog could contain full file
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
258 contents sometimes. So linelog is smaller. Revlog could trade
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
259 reconstruction speed for file size - best case, revlog is as small as
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
260 linelog.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
261 b) The interleaved delta structure allows skipping large portion of
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
262 uninteresting deltas so linelog's content reconstruction is faster than
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
263 the delta-only version of revlog (however it's possible to construct
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
264 a case where interleaved deltas degrade to plain deltas, so linelog
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
265 worst case would be delta-only revlog). Revlog could trade file size
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
266 for reconstruction speed.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
267 c) Linelog implicitly maintains the order of all lines it stores. So it
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
268 could dump all the lines from all revisions, with a reasonable order.
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
269 While revlog could also dump all line additions, it requires extra
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
270 computation to figure out the order putting those lines - that's some
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
271 kind of "merge".
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
272
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
273 "c" makes "hg absorb" easier to implement and makes it possible to do
422d661056be linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff changeset
274 "annotate --deleted".
38968
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
275
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
276 1.4 Malformed Cases Handling
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
277
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
278 The following "case 1", "case 2", and "case 3" refer to cases mentioned
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
279 in section 0.5.
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
280
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
281 Using the exposed API (replacelines), case 1 is impossible to generate,
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
282 although it's possible to generate it by constructing rawdata and load that
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
283 via linelog.fromdata.
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
284
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
285 Doing annotate(maxrev) before replacelines (aka. a1, a2 passed to
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
286 replacelines are related to the latest revision) eliminates the possibility
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
287 of case 3. That makes sense since usually you'd like to make edits on top of
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
288 the latest revision. Practically, both absorb and fastannotate do this.
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
289
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
290 Doing annotate(maxrev), plus replacelines(rev, ...) where rev >= maxrev
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
291 eliminates the possibility of case 2. That makes sense since usually the
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
292 edits belong to "new revisions", not "old revisions". Practically,
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
293 fastannotate does this. Absorb calls replacelines with rev < maxrev to edit
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
294 past revisions. So it needs some extra care to not generate case 2.
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
295
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
296 If case 1 occurs, that probably means linelog file corruption (assuming
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
297 linelog is edited via public APIs) the checkout or annotate result could
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
298 be less meaningful or even error out, but linelog wouldn't enter an infinite
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
299 loop.
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
300
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
301 If either case 2 or 3 occurs, linelog works as if the inner "^AI/D" and "^AE"
c10be3fc200b linelog: update internal help text
Jun Wu <quark@fb.com>
parents: 38795
diff changeset
302 operations on the left side are silently ignored.