author | Martin von Zweigbergk <martinvonz@google.com> |
Fri, 28 Sep 2018 12:56:57 -0700 | |
changeset 39967 | 707c3804e607 |
parent 39006 | c10be3fc200b |
permissions | -rw-r--r-- |
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
1 |
linelog is a storage format inspired by the "Interleaved deltas" idea. See |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
2 |
https://en.wikipedia.org/wiki/Interleaved_deltas for its introduction. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
3 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
4 |
0. SCCS Weave |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
5 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
6 |
To understand what linelog is, first we have a quick look at a simplified |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
7 |
(with header removed) SCCS weave format, which is an implementation of the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
8 |
"Interleaved deltas" idea. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
9 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
10 |
0.1 Basic SCCS Weave File Format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
11 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
12 |
A SCCS weave file consists of plain text lines. Each line is either a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
13 |
special instruction starting with "^A" or part of the content of the real |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
14 |
file the weave tracks. There are 3 important operations, where REV denotes |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
15 |
the revision number: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
16 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
17 |
^AI REV, marking the beginning of an insertion block introduced by REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
18 |
^AD REV, marking the beginning of a deletion block introduced by REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
19 |
^AE REV, marking the end of the block started by "^AI REV" or "^AD REV" |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
20 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
21 |
Note on revision numbers: For any two different revision numbers, one must |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
22 |
be an ancestor of the other to make them comparable. This enforces linear |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
23 |
history. Besides, the comparison functions (">=", "<") should be efficient. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
24 |
This means, if revisions are strings like git or hg, an external map is |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
25 |
required to convert them into integers. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
26 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
27 |
For example, to represent the following changes: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
28 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
29 |
REV 1 | REV 2 | REV 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
30 |
------+-------+------- |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
31 |
a | a | a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
32 |
b | b | 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
33 |
c | 1 | c |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
34 |
| 2 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
35 |
| c | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
36 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
37 |
A possible weave file looks like: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
38 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
39 |
^AI 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
40 |
a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
41 |
^AD 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
42 |
b |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
43 |
^AI 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
44 |
1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
45 |
^AE 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
46 |
2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
47 |
^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
48 |
c |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
49 |
^AE 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
50 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
51 |
An "^AE" does not always match its nearest operation ("^AI" or "^AD"). In |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
52 |
the above example, "^AE 3" does not match the nearest "^AI 2" but "^AD 3". |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
53 |
Therefore we need some extra information for "^AE". The SCCS weave uses a |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
54 |
revision number. It could also be a boolean value about whether it is an |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
55 |
insertion or a deletion (see section 0.4). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
56 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
57 |
0.2 Checkout |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
58 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
59 |
The "checkout" operation is to retrieve file content at a given revision, |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
60 |
say X. It's doable by going through the file line by line and: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
61 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
62 |
- If meet ^AI rev, and rev > X, find the corresponding ^AE and jump there |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
63 |
- If meet ^AD rev, and rev <= X, find the corresponding ^AE and jump there |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
64 |
- Ignore ^AE |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
65 |
- For normal lines, just output them |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
66 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
67 |
0.3 Annotate |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
68 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
69 |
The "annotate" operation is to show extra metadata like the revision number |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
70 |
and the original line number a line comes from. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
71 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
72 |
It's basically just a "Checkout". For the extra metadata, they can be stored |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
73 |
side by side with the line contents. Alternatively, we can infer the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
74 |
revision number from "^AI"s. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
75 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
76 |
Some SCM tools have to calculate diffs on the fly and thus are much slower |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
77 |
on this operation. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
78 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
79 |
0.4 Tree Structure |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
80 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
81 |
The word "interleaved" is used because "^AI" .. "^AE" and "^AD" .. "^AE" |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
82 |
blocks can be interleaved. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
83 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
84 |
If we consider insertions and deletions separately, they can form tree |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
85 |
structures, respectively. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
86 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
87 |
+--- ^AI 1 +--- ^AD 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
88 |
| +- ^AI 2 | +- ^AD 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
89 |
| | | | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
90 |
| +- ^AE 2 | +- ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
91 |
| | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
92 |
+--- ^AE 1 +--- ^AE 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
93 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
94 |
More specifically, it's possible to build a tree for all insertions, where |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
95 |
the tree node has the structure "(rev, startline, endline)". "startline" is |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
96 |
the line number of "^AI" and "endline" is the line number of the matched |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
97 |
"^AE". The tree will have these properties: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
98 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
99 |
1. child.rev > parent.rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
100 |
2. child.startline > parent.startline |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
101 |
3. child.endline < parent.endline |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
102 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
103 |
A similar tree for all deletions can also be built with the first property |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
104 |
changed to: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
105 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
106 |
1. child.rev < parent.rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
107 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
108 |
0.5 Malformed Cases |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
109 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
110 |
The following cases are considered malformed in our implementation: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
111 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
112 |
1. Interleaved insertions, or interleaved deletions. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
113 |
It can be rewritten to a non-interleaved tree structure. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
114 |
|
39006 | 115 |
Take insertions as example, deletions are similar: |
116 |
||
117 |
^AI x ^AI x |
|
118 |
a a |
|
119 |
^AI x + 1 -> ^AI x + 1 |
|
120 |
b b |
|
121 |
^AE x ^AE x + 1 |
|
122 |
c ^AE x |
|
123 |
^AE x + 1 ^AI x + 1 |
|
124 |
c |
|
125 |
^AE x + 1 |
|
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
126 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
127 |
2. Nested insertions, where the inner one has a smaller revision number. |
39006 | 128 |
Or nested deletions, where the inner one has a larger revision number. |
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
129 |
It can be rewritten to a non-nested form. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
130 |
|
39006 | 131 |
Take insertions as example, deletions are similar: |
132 |
||
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
133 |
^AI x + 1 ^AI x + 1 |
39006 | 134 |
a a |
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
135 |
^AI x -> ^AE x + 1 |
39006 | 136 |
b ^AI x |
137 |
^AE x b |
|
138 |
c ^AE x |
|
139 |
^AE x + 1 ^AI x + 1 |
|
140 |
c |
|
141 |
^AE x + 1 |
|
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
142 |
|
39006 | 143 |
3. Insertion inside deletion with a smaller revision number. |
144 |
||
145 |
Rewrite by duplicating the content inserted: |
|
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
146 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
147 |
^AD x ^AD x |
39006 | 148 |
a a |
149 |
^AI x + 1 -> b |
|
150 |
b c |
|
151 |
^AE x + 1 ^AE x |
|
152 |
c ^AI x + 1 |
|
153 |
^AE x b |
|
154 |
^AE x + 1 |
|
155 |
||
156 |
Note: If "annotate" purely depends on "^AI" information, then the |
|
157 |
duplication content will lose track of where "b" is originally from. |
|
38835
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
158 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
159 |
Some of them may be valid in other implementations for special purposes. For |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
160 |
example, to "revive" a previously deleted block in a newer revision. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
161 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
162 |
0.6 Cases Can Be Optimized |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
163 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
164 |
It's always better to get things nested. For example, the left is more |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
165 |
efficient than the right while they represent the same content: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
166 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
167 |
+--- ^AD 2 +- ^AD 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
168 |
| +- ^AD 1 | LINE A |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
169 |
| | LINE A +- ^AE 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
170 |
| +- ^AE 1 +- ^AD 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
171 |
| LINE B | LINE B |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
172 |
+--- ^AE 2 +- ^AE 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
173 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
174 |
Our implementation sometimes generates the less efficient data. To always |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
175 |
get the optimal form, it requires extra code complexity that seems unworthy. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
176 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
177 |
0.7 Inefficiency |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
178 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
179 |
The file format can be slow because: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
180 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
181 |
- Inserting a new line at position P requires rewriting all data after P. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
182 |
- Finding "^AE" requires walking through the content (O(N), where N is the |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
183 |
number of lines between "^AI/D" and "^AE"). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
184 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
185 |
1. Linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
186 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
187 |
The linelog is a binary format that dedicates to speed up mercurial (or |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
188 |
git)'s "annotate" operation. It's designed to avoid issues mentioned in |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
189 |
section 0.7. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
190 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
191 |
1.1 Content Stored |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
192 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
193 |
Linelog is not another storage for file contents. It only stores line |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
194 |
numbers and corresponding revision numbers, instead of actual line content. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
195 |
This is okay for the "annotate" operation because usually the external |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
196 |
source is fast to checkout the content of a file at a specific revision. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
197 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
198 |
A typical SCCS weave is also fast on the "grep" operation, which needs |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
199 |
random accesses to line contents from different revisions of a file. This |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
200 |
can be slow with linelog's no-line-content design. However we could use |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
201 |
an extra map ((rev, line num) -> line content) to speed it up. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
202 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
203 |
Note the revision numbers in linelog should be independent from mercurial |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
204 |
integer revision numbers. There should be some mapping between linelog rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
205 |
and hg hash stored side by side, to make the files reusable after being |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
206 |
copied to another machine. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
207 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
208 |
1.2 Basic Format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
209 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
210 |
A linelog file consists of "instruction"s. An "instruction" can be either: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
211 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
212 |
- JGE REV ADDR # jump to ADDR if rev >= REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
213 |
- JL REV ADDR # jump to ADDR if rev < REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
214 |
- LINE REV LINENUM # append the (LINENUM+1)-th line in revision REV |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
215 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
216 |
For example, here is the example linelog representing the same file with |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
217 |
3 revisions mentioned in section 0.1: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
218 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
219 |
SCCS | Linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
220 |
Weave | Addr : Instruction |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
221 |
------+------+------------- |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
222 |
^AI 1 | 0 : JL 1 8 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
223 |
a | 1 : LINE 1 0 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
224 |
^AD 3 | 2 : JGE 3 6 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
225 |
b | 3 : LINE 1 1 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
226 |
^AI 2 | 4 : JL 2 7 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
227 |
1 | 5 : LINE 2 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
228 |
^AE 3 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
229 |
2 | 6 : LINE 2 3 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
230 |
^AE 2 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
231 |
c | 7 : LINE 1 2 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
232 |
^AE 1 | |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
233 |
| 8 : END |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
234 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
235 |
This way, "find ^AE" is O(1) because we just jump there. And we can insert |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
236 |
new lines without rewriting most part of the file by appending new lines and |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
237 |
changing a single instruction to jump to them. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
238 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
239 |
The current implementation uses 64 bits for an instruction: The opcode (JGE, |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
240 |
JL or LINE) takes 2 bits, REV takes 30 bits and ADDR or LINENUM takes 32 |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
241 |
bits. It also stores the max revision number and buffer size at the first |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
242 |
64 bits for quick access to these values. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
243 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
244 |
1.3 Comparing with Mercurial's revlog format |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
245 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
246 |
Apparently, linelog is very different from revlog: linelog stores rev and |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
247 |
line numbers, while revlog has line contents and other metadata (like |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
248 |
parents, flags). However, the revlog format could also be used to store rev |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
249 |
and line numbers. For example, to speed up the annotate operation, we could |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
250 |
also pre-calculate annotate results and just store them using the revlog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
251 |
format. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
252 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
253 |
Therefore, linelog is actually somehow similar to revlog, with the important |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
254 |
trade-off that it only supports linear history (mentioned in section 0.1). |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
255 |
Essentially, the differences are: |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
256 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
257 |
a) Linelog is full of deltas, while revlog could contain full file |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
258 |
contents sometimes. So linelog is smaller. Revlog could trade |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
259 |
reconstruction speed for file size - best case, revlog is as small as |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
260 |
linelog. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
261 |
b) The interleaved delta structure allows skipping large portion of |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
262 |
uninteresting deltas so linelog's content reconstruction is faster than |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
263 |
the delta-only version of revlog (however it's possible to construct |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
264 |
a case where interleaved deltas degrade to plain deltas, so linelog |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
265 |
worst case would be delta-only revlog). Revlog could trade file size |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
266 |
for reconstruction speed. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
267 |
c) Linelog implicitly maintains the order of all lines it stores. So it |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
268 |
could dump all the lines from all revisions, with a reasonable order. |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
269 |
While revlog could also dump all line additions, it requires extra |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
270 |
computation to figure out the order putting those lines - that's some |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
271 |
kind of "merge". |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
272 |
|
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
273 |
"c" makes "hg absorb" easier to implement and makes it possible to do |
422d661056be
linelog: add a Python implementation of the linelog datastructure
Augie Fackler <augie@google.com>
parents:
diff
changeset
|
274 |
"annotate --deleted". |
39006 | 275 |
|
276 |
1.4 Malformed Cases Handling |
|
277 |
||
278 |
The following "case 1", "case 2", and "case 3" refer to cases mentioned |
|
279 |
in section 0.5. |
|
280 |
||
281 |
Using the exposed API (replacelines), case 1 is impossible to generate, |
|
282 |
although it's possible to generate it by constructing rawdata and load that |
|
283 |
via linelog.fromdata. |
|
284 |
||
285 |
Doing annotate(maxrev) before replacelines (aka. a1, a2 passed to |
|
286 |
replacelines are related to the latest revision) eliminates the possibility |
|
287 |
of case 3. That makes sense since usually you'd like to make edits on top of |
|
288 |
the latest revision. Practically, both absorb and fastannotate do this. |
|
289 |
||
290 |
Doing annotate(maxrev), plus replacelines(rev, ...) where rev >= maxrev |
|
291 |
eliminates the possibility of case 2. That makes sense since usually the |
|
292 |
edits belong to "new revisions", not "old revisions". Practically, |
|
293 |
fastannotate does this. Absorb calls replacelines with rev < maxrev to edit |
|
294 |
past revisions. So it needs some extra care to not generate case 2. |
|
295 |
||
296 |
If case 1 occurs, that probably means linelog file corruption (assuming |
|
297 |
linelog is edited via public APIs) the checkout or annotate result could |
|
298 |
be less meaningful or even error out, but linelog wouldn't enter an infinite |
|
299 |
loop. |
|
300 |
||
301 |
If either case 2 or 3 occurs, linelog works as if the inner "^AI/D" and "^AE" |
|
302 |
operations on the left side are silently ignored. |