mercurial/help/internals/revlogs.txt
author Gregory Szorc <gregory.szorc@gmail.com>
Wed, 03 Oct 2018 12:54:39 -0700
changeset 40178 46a40bce3ae0
parent 32697 19b9fc40cc51
child 41199 d8fe67db5234
permissions -rw-r--r--
wireprotov2: define and implement "filesdata" command Previously, the only way to access file revision data was the "filedata" command. This command is useful to have. But, it only allowed resolving revision data for a single file. This meant that clients needed to send 1 command for each tracked path they were seeking data on. Furthermore, those commands would need to enumerate the exact file nodes they wanted data for. This approach meant that clients were sending a lot of data to remotes in order to request file data. e.g. if there were 1M file revisions, we'd need at least 20,000,000 bytes just to encode file nodes! Many clients on the internet don't have that kind of upload capacity. In order to limit the amount of data that clients must send, we'll need more efficient ways to request repository data. This commit defines and implements a new "filesdata" command. This command allows the retrieval of data for multiple files by specifying changeset revisions and optional file patterns. The command figures out what file revisions are "relevant" and sends them in bulk. The logic around choosing which file revisions to send in the case of haveparents not being set is overly simple and will over-send files. We will need more smarts here eventually. (Specifically, the client will need to tell the server which revisions it knows about.) This work is deferred until a later time. Differential Revision: https://phab.mercurial-scm.org/D4981
Revision logs - or *revlogs* - are an append-only data structure for
storing discrete entries, or *revisions*. They are the primary storage
mechanism of repository data.

Revlogs effectively model a directed acyclic graph (DAG). Each node
has edges to up to 2 *parent* nodes. Each node contains metadata and
the raw value for that node.

Revlogs consist of entries which have metadata and revision data.
Metadata includes the hash of the revision's content, sizes, and
links to its *parent* entries. The collective metadata is referred
to as the *index* and the revision data is the *data*.

Revision data is stored as a series of compressed deltas against previous
revisions.

Revlogs are written in an append-only fashion. We never need to rewrite
a file to insert data, nor do we need to remove data. Rolling back
in-progress writes can be performed by truncating files. Read locks can
be avoided using simple techniques. Because entries are only ever
appended, references to other data in the same revlog *always* refer to
a previous entry.

Revlogs can be modeled as 0-indexed arrays. The first revision is
revision #0 and the second is revision #1. The revision -1 is typically
used to mean *does not exist* or *not defined*.
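Together, the append-only property and the array model mean that a parent's revision number is always smaller than its child's, and that -1 marks a missing parent. The sketch below is illustrative and does not use Mercurial's real index type. It models an index as a hypothetical list of `(p1, p2)` tuples, checks the ordering invariant, and collects ancestors in a single backwards scan.

```python
from typing import List, Set, Tuple

NULLREV = -1  # revision -1: "does not exist" / "not defined"

# Hypothetical minimal index: entry i holds the (p1, p2) parents of revision i.
Index = List[Tuple[int, int]]


def check_append_only(index: Index) -> None:
    """Raise ValueError if any parent is not an earlier revision."""
    for rev, parents in enumerate(index):
        for p in parents:
            if p != NULLREV and not 0 <= p < rev:
                raise ValueError(f"revision {rev} has invalid parent {p}")


def ancestors(index: Index, rev: int) -> Set[int]:
    """Return rev and all of its ancestors."""
    wanted = {rev}
    # Parents always precede children, so scanning downwards sees each
    # child before its parents and one pass is enough.
    for r in range(rev, -1, -1):
        if r in wanted:
            wanted.update(p for p in index[r] if p != NULLREV)
    return wanted
```

For example, with `index = [(-1, -1), (0, -1), (0, -1), (1, 2)]`, revision 3 is a merge of revisions 1 and 2, so `ancestors(index, 3)` returns every revision.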

File Format
===========

A revlog begins with a 32-bit big endian integer holding version info
and feature flags. This integer is shared with the first revision
entry.

This integer is logically divided into 2 16-bit shorts. The least
significant half of the integer is the format/version short. The other
short holds feature flags that dictate behavior of the revlog.

Only a few values of the format/version short are currently defined.
Remaining values are reserved for future use.

The following values for the format/version short are defined:

0
   The original revlog version.
1
   RevlogNG (*next generation*). It replaced version 0 when it was
   implemented in 2006.
2
   In-development version incorporating accumulated knowledge and
   missing features from 10+ years of revlog version 1.
57005 (0xdead)
   Reserved for internal testing of new versions. No defined format
   beyond the 32-bit header.

The feature flags short consists of bit flags. With bit 0 being the
least significant bit, the following bit offsets define flags:

0
   Store revision data inline.
1
   Generaldelta encoding.

2-15
   Reserved for future use.

The following header values are common:

00 00 00 01
   v1
00 01 00 01
   v1 + inline
00 02 00 01
   v1 + generaldelta
00 03 00 01
   v1 + inline + generaldelta
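A minimal sketch of decoding the header described above. It reads the first 4 bytes as a big endian integer, takes the low short as the version and the high short as feature flags, and then tests flag bits 0 (inline) and 1 (generaldelta). The constant names and the `RevlogHeader` type are illustrative and are not Mercurial's own.

```python
import struct
from typing import NamedTuple

INLINE_BIT = 1 << 0        # feature flags bit 0: inline revision data
GENERALDELTA_BIT = 1 << 1  # feature flags bit 1: generaldelta


class RevlogHeader(NamedTuple):
    version: int
    inline: bool
    generaldelta: bool
    unknown_flags: int


def parse_header(data: bytes) -> RevlogHeader:
    """Decode the 32-bit big endian header at the start of a revlog."""
    (value,) = struct.unpack(">I", data[:4])
    version = value & 0xFFFF  # least significant short: format/version
    flags = value >> 16       # most significant short: feature flags
    return RevlogHeader(
        version=version,
        inline=bool(flags & INLINE_BIT),
        generaldelta=bool(flags & GENERALDELTA_BIT),
        unknown_flags=flags & ~(INLINE_BIT | GENERALDELTA_BIT),
    )
```

For instance, `parse_header(bytes.fromhex("00030001"))` reports version 1 with both inline and generaldelta set. The sketch returns unknown flag bits instead of rejecting them. A real reader would likely refuse to open a revlog with flags it does not understand.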

Following the 32-bit header is the remainder of the first index entry.
Following that is the remaining *index* data. Inlined revision data is
possibly located between index entries. This layout is described in more
detail below.

Version 1 Format
================

Version 1 (RevlogNG) begins with an index describing the revisions in
the revlog. If the ``inline`` flag is set, revision data is stored inline,
or between index entries (as opposed to in a separate container).

Each index entry is 64 bytes. The byte layout of each entry is as
follows, with byte 0 being the first byte (all data stored as big endian):

0-3 (4 bytes) (rev 0 only)
   Revlog header

0-5 (6 bytes)
   Absolute offset of the revision's data from the beginning of the
   revision data. For inline revlogs, the index entries interleaved with
   the data are not counted.

6-7 (2 bytes)
   Bit flags impacting revision behavior. Bit offsets are counted from
   the most significant bit of this field (offset 0 is ``1 << 15``). The
   following bit offsets define:

   0: REVIDX_ISCENSORED revision has censor metadata, must be verified.

   1: REVIDX_ELLIPSIS revision hash does not match its data. Used by
   narrowhg.

   2: REVIDX_EXTSTORED revision data is stored externally.

8-11 (4 bytes)
   Compressed length of revision data / chunk as stored in revlog.

12-15 (4 bytes)
   Uncompressed length of revision data. This is the size of the full
   revision data, not the size of the chunk post decompression.

16-19 (4 bytes)
   Base or previous revision this revision's delta was produced against.
   This revision holds full text (as opposed to a delta) if it points to
   itself. For generaldelta repos, this is the previous revision in the
   delta chain. For non-generaldelta repos, this is the base or first
   revision in the delta chain.

20-23 (4 bytes)
   A revision this revision is *linked* to. This allows a revision in
   one revlog to be forever associated with a revision in another
   revlog. For example, a file's revlog may point to the changelog
   revision that introduced it.

24-27 (4 bytes)
   Revision of 1st parent. -1 indicates no parent.

28-31 (4 bytes)
   Revision of 2nd parent. -1 indicates no 2nd parent.

32-63 (32 bytes)
   Hash of revision's full text. Currently, SHA-1 is used and only
   the first 20 bytes of this field are used. The rest of the bytes
   are ignored and should be stored as \0.
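The table above can be written as a single `struct` format string. The sketch below uses that format to decode one entry from a non-inline index. It assumes, following the table, that the 6-byte offset and the 2-byte flags are read together as one 64-bit integer and then split. The `IndexEntry` and `parse_entry` names are hypothetical. Revision 0's offset is forced to 0 because its first 4 bytes hold the revlog header.

```python
import struct
from typing import NamedTuple

# 8 bytes (6-byte offset + 2-byte flags), six signed 32-bit integers,
# a 20-byte hash, then 12 pad bytes: 64 bytes in total.
INDEX_ENTRY = struct.Struct(">Qiiiiii20s12x")
assert INDEX_ENTRY.size == 64


class IndexEntry(NamedTuple):
    offset: int
    flags: int
    compressed_length: int
    uncompressed_length: int
    base_rev: int
    link_rev: int
    p1: int
    p2: int
    node: bytes


def parse_entry(data: bytes, rev: int) -> IndexEntry:
    """Decode entry `rev` from a non-inline index (entries back to back)."""
    offset_flags, comp, uncomp, base, link, p1, p2, node = (
        INDEX_ENTRY.unpack_from(data, rev * INDEX_ENTRY.size))
    offset = offset_flags >> 16  # high 6 bytes
    if rev == 0:
        # Bytes 0-3 of revision 0 hold the revlog header, not offset bits.
        offset = 0
    return IndexEntry(offset, offset_flags & 0xFFFF, comp, uncomp,
                      base, link, p1, p2, node)
```

Signed `i` fields are used so that a parent of -1 decodes as -1 and not as 4294967295.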

If inline revision data is being stored, the compressed revision data
(of the length stored in bytes 8-11 of the index entry) immediately
follows the index entry. There is no header on the revision data. There
is no padding between it and the index entries before and after.

If revision data is not inline, then revision data is stored in a
separate byte container. The offset from bytes 0-5 and the compressed
length from bytes 8-11 define how to access this data.
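Because inline chunks have no header or padding, a reader can locate them either arithmetically or by walking the file. This is a sketch under the layout described above. The labels `"index"` and `"data"` are illustrative names for the index file and the separate data container. The arithmetic assumes that the stored offsets exclude index entries, so revision `rev`'s chunk is preceded by `rev + 1` entries.

```python
import struct
from typing import Iterator, Tuple

INDEX_ENTRY_SIZE = 64


def chunk_span(offset: int, compressed_length: int, rev: int,
               inline: bool) -> Tuple[str, int, int]:
    """Return (container, start, end) locating the stored chunk of `rev`.

    `offset` and `compressed_length` come from bytes 0-5 and 8-11 of the
    revision's index entry.
    """
    if inline:
        # Entries 0..rev (rev + 1 of them) sit before this chunk in the file.
        start = offset + (rev + 1) * INDEX_ENTRY_SIZE
        return "index", start, start + compressed_length
    return "data", offset, offset + compressed_length


def split_inline(blob: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (rev, entry, chunk) by walking an inline revlog sequentially."""
    pos, rev = 0, 0
    while pos < len(blob):
        entry = blob[pos:pos + INDEX_ENTRY_SIZE]
        (length,) = struct.unpack(">i", entry[8:12])  # compressed length
        chunk_start = pos + INDEX_ENTRY_SIZE
        yield rev, entry, blob[chunk_start:chunk_start + length]
        # No header or padding: the next entry starts right after the chunk.
        pos = chunk_start + length
        rev += 1
```

For two revisions with 3-byte and 2-byte chunks, the inline file holds entry 0 at bytes 0-63, chunk 0 at 64-66, entry 1 at 67-130 and chunk 1 at 131-132.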

The first 4 bytes of the revlog are shared between the revlog header
and the 6 byte absolute offset field from the first revlog entry.
Because revision 0's data always begins at offset 0, no information is
lost: readers treat the offset of revision 0 as 0.
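On the write side, the sharing can be sketched as packing revision 0 normally and then overwriting its first 4 bytes with the header. The function name and argument order are hypothetical, and the check that revision 0 starts at offset 0 is an assumption of this sketch.

```python
import struct

INDEX_ENTRY = struct.Struct(">Qiiiiii20s12x")


def pack_entry(rev: int, header: int, offset: int, flags: int, comp: int,
               uncomp: int, base: int, link: int, p1: int, p2: int,
               node: bytes) -> bytes:
    """Serialize one v1 index entry; revision 0 also carries the header."""
    entry = INDEX_ENTRY.pack((offset << 16) | flags, comp, uncomp, base,
                             link, p1, p2, node)
    if rev == 0:
        if offset != 0:
            raise ValueError("revision 0 data must start at offset 0")
        # The 6-byte offset field is all zeros for revision 0, so its first
        # 4 bytes can hold the 32-bit revlog header without losing anything.
        entry = struct.pack(">I", header) + entry[4:]
    return entry
```

Entries for later revisions are packed unchanged. Only the very first entry in the file doubles as the header.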

Version 2 Format
================

(In development. Format not finalized or stable.)

Version 2 is currently identical to version 1. This is expected to
change.

Delta Chains
============

Revision data is encoded as a chain of *chunks*. Each chain begins with
the compressed full text of its base revision. Each subsequent *chunk*
is a *delta* against the previous revision in the chain. We therefore
call these chains of chunks/deltas *delta chains*.

The full text for a revision is reconstructed by loading the original
full text for the base revision of a *delta chain* and then applying
*deltas* until the target revision is reconstructed.
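Reconstruction is a fold over the chain. This document does not specify the delta encoding, so the sketch assumes Mercurial's binary hunk format. In that format each hunk is a 12-byte big endian `(start, end, length)` header followed by `length` bytes that replace `text[start:end]` of the input text, and hunks are sorted and non-overlapping. The sketch also assumes chunks have already been decompressed.

```python
import struct
from typing import Iterable

HUNK_HEADER = struct.Struct(">lll")  # start, end, length of replacement


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Apply one delta; hunk positions refer to offsets within `base`."""
    out = []
    last = 0  # position in base up to which bytes have been copied
    pos = 0
    while pos < len(delta):
        start, end, length = HUNK_HEADER.unpack_from(delta, pos)
        pos += HUNK_HEADER.size
        out.append(base[last:start])        # unchanged bytes before the hunk
        out.append(delta[pos:pos + length])  # replacement bytes
        pos += length
        last = end
    out.append(base[last:])                 # unchanged tail
    return b"".join(out)


def reconstruct(fulltext: bytes, deltas: Iterable[bytes]) -> bytes:
    """Rebuild a revision from its chain base full text plus ordered deltas."""
    text = fulltext
    for delta in deltas:
        text = apply_delta(text, delta)
    return text
```

Each delta applies to the output of the previous step, so the order of the chain matters.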

*Delta chains* are limited in length so lookup time is bounded. The
combined size of the chunks in a chain is limited to roughly 2x the size
of the revision's full text. The linear distance between the base chunk
and the final chunk is also limited so the amount of read I/O to load
all chunks in the delta chain is bounded.
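The two limits can be expressed as checks over index fields. This is an illustrative policy, not Mercurial's exact heuristic. The 2x bound follows the text above, while `max_span_factor` is a hypothetical knob for the distance limit. `Entry` is a hypothetical stand-in holding only bytes 0-5 and 8-11 of an index entry.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical subset of an index entry (bytes 0-5 and 8-11)."""
    offset: int
    compressed_length: int


def chain_stats(index: List[Entry], chain: List[int]) -> Tuple[int, int]:
    """Return (total stored bytes, bytes spanned on disk) for a delta chain.

    `chain` lists revisions from the base full text to the target.
    """
    total = sum(index[r].compressed_length for r in chain)
    first, last = index[chain[0]], index[chain[-1]]
    span = last.offset + last.compressed_length - first.offset
    return total, span


def chain_acceptable(index: List[Entry], chain: List[int], fulltext_size: int,
                     max_span_factor: int = 4) -> bool:
    """Apply both limits described above (illustrative thresholds)."""
    total, span = chain_stats(index, chain)
    # ~2x bound on chunk data; max_span_factor bounds the read distance.
    return (total <= 2 * fulltext_size
            and span <= max_span_factor * fulltext_size)
```

The span and the total can differ. A chain that skips revisions (here revision 2) still has to read past their bytes if the reader wants a single contiguous read.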

Deltas and delta chains are either computed against the previous
revision in the revlog or against another revision (almost certainly one
of the parents of the revision). Historically, deltas were computed
against the previous revision. The *generaldelta* revlog feature flag
(enabled by default in Mercurial 3.7) activates the mode where deltas are
computed against an arbitrary revision (almost certainly a parent revision).
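The two modes change how the base field (bytes 16-19) is interpreted when collecting a chain. The sketch follows the field description in the index layout. A revision that points to itself is a full text. With generaldelta the field names the delta parent directly. Without it, deltas are taken against the previous revision until the chain base is reached. `bases` is a hypothetical list holding that field for each revision.

```python
from typing import List


def delta_chain(bases: List[int], rev: int, generaldelta: bool) -> List[int]:
    """Return the revisions to read for `rev`, chain base (full text) first."""
    chain = []
    current = rev
    while bases[current] != current:  # self-reference marks a full text
        chain.append(current)
        if generaldelta:
            current = bases[current]  # field is the delta parent
        else:
            current -= 1              # delta against the previous revision
    chain.append(current)
    chain.reverse()
    return chain
```

For example, with generaldelta and `bases = [0, 0, 0, 1, 3]`, revision 4 reconstructs from revisions 0, 1, 3 and 4, skipping revision 2 entirely.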

File Storage
============

Revlogs logically consist of an index (metadata for each entry) and
revision data. The two may be stored together in a single file or in
separate files; which layout is used is indicated by the ``inline`` feature
flag on the revlog.
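
Here is a minimal Python sketch that reads the ``inline`` flag from the start of an index file. It assumes the version 1 (RevlogNG) header layout:

- The first 4 bytes form a big-endian integer.
- Its high 16 bits are feature flags (bit 0 ``inline``, bit 1 ``generaldelta``).
- Its low 16 bits are the format version.

``parse_header`` and ``RevlogHeader`` are illustrative names, not Mercurial APIs.

```python
import struct
from typing import NamedTuple

# Assumed feature-flag bits within the 16-bit flags field of a version 1 header.
FLAG_INLINE_DATA = 1 << 0
FLAG_GENERALDELTA = 1 << 1


class RevlogHeader(NamedTuple):
    version: int
    inline: bool
    generaldelta: bool


def parse_header(index_prefix: bytes) -> RevlogHeader:
    """Decode the 4-byte header at the start of a revlog index (.i) file."""
    if len(index_prefix) < 4:
        raise ValueError("need at least 4 bytes of index data")
    # Big-endian 32-bit integer: high 16 bits are flags, low 16 bits the version.
    (value,) = struct.unpack(">I", index_prefix[:4])
    flags, version = value >> 16, value & 0xFFFF
    return RevlogHeader(
        version=version,
        inline=bool(flags & FLAG_INLINE_DATA),
        generaldelta=bool(flags & FLAG_GENERALDELTA),
    )
```

An empty revlog has no header bytes at all; this sketch rejects that case rather than guessing defaults.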

Mercurial uses inline storage until a revlog reaches a certain size, at
which point the revlog is converted to non-inline storage. The size limit
exists to bound how much data must be read to load the index: it would be
wasteful to read tens or hundreds of megabytes of extra data just to access
the index.
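
Why inline storage inflates index loading follows from where each data chunk lives. This sketch makes two assumptions: version 1 index entries are 64 bytes, and in an inline revlog each index entry is immediately followed by that revision's stored chunk. It locates a revision's chunk and counts the bytes spanned when loading the index. ``chunk_location`` and ``index_bytes_to_read`` are hypothetical helpers.

```python
from typing import List, Tuple

INDEX_ENTRY_SIZE = 64  # assumed size of one version 1 index entry


def chunk_location(lengths: List[int], rev: int, inline: bool) -> Tuple[str, int, int]:
    """Return (file suffix, byte offset, length) of the stored chunk for ``rev``."""
    # Logical start: total stored length of all earlier revisions.
    start = sum(lengths[:rev])
    if inline:
        # Index entries 0..rev precede this chunk inside the .i file.
        return (".i", start + (rev + 1) * INDEX_ENTRY_SIZE, lengths[rev])
    # Non-inline: chunks are packed back to back in the .d file.
    return (".d", start, lengths[rev])


def index_bytes_to_read(lengths: List[int], inline: bool) -> int:
    """Bytes spanned by the index entries of a revlog with the given chunks."""
    entries = len(lengths) * INDEX_ENTRY_SIZE
    # Inline entries are interleaved with data, so the data is spanned too.
    return entries + sum(lengths) if inline else entries
```

For a revlog with chunk lengths ``[100, 20, 30]``, the non-inline index is 192 bytes. The inline one spans 342 bytes, and the gap grows with every revision stored.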

The actual layout of revlog files on disk is governed by the repository's
*store format*. Typically, a ``.i`` file holds the index (possibly with
inline revision data) and a ``.d`` file holds the revision data.

Revision Entries
================

Revision entries consist of an optional 1-byte header followed by an
encoding of the revision data. The headers are as follows:

\0 (0x00)
   Revision data is the entirety of the entry, including this header.
u (0x75)
   Raw revision data follows.
x (0x78)
   zlib (RFC 1950) data.

   The 0x78 value is actually the first byte of the zlib header (CMF byte).
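
The header rules above can be sketched as a small decoder. ``decode_chunk`` is a hypothetical helper. It handles only the three headers listed, treats an empty entry as empty data (an assumption), and rejects anything else.

```python
import zlib


def decode_chunk(chunk: bytes) -> bytes:
    """Decode a stored revision entry according to its first byte."""
    if not chunk:
        return chunk  # assumption: an empty entry decodes to empty data
    header = chunk[:1]
    if header == b"\0":
        return chunk  # the 0x00 byte is itself part of the revision data
    if header == b"u":
        return chunk[1:]  # strip the marker; the rest is raw data
    if header == b"x":
        # 0x78 is the zlib CMF byte, so the whole entry is the zlib stream.
        return zlib.decompress(chunk)
    raise ValueError("unknown revision entry header: %r" % header)
```

Note the asymmetry: the ``u`` marker is stripped, while an ``x`` entry is passed to ``zlib.decompress`` intact because the byte belongs to the compressed stream.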

Hash Computation
================

The hash of a revision is stored in the index and serves both as a primary
key and for data integrity verification.

Currently, SHA-1 is the only supported hashing algorithm. To obtain the SHA-1
hash of a revision:

1. Hash the parent nodes
2. Hash the fulltext of the revision

The 20-byte node IDs of the parents are fed into the hasher in ascending order.
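
The hashing procedure can be sketched with Python's ``hashlib``. ``revision_hash`` and ``verify`` are illustrative names. The sketch assumes two things: a missing parent is represented by the 20-byte all-zero null node, and ``text`` is the revision's fulltext exactly as hashed.

```python
import hashlib

NULL_ID = b"\0" * 20  # assumed representation of a missing parent


def revision_hash(text: bytes, p1: bytes, p2: bytes) -> bytes:
    """SHA-1 node of a revision: parent nodes in ascending order, then fulltext."""
    lo, hi = sorted((p1, p2))  # byte-wise ascending order of the two parents
    s = hashlib.sha1(lo)
    s.update(hi)
    s.update(text)
    return s.digest()


def verify(text: bytes, p1: bytes, p2: bytes, expected_node: bytes) -> bool:
    """Integrity check: recompute the node and compare with the stored one."""
    return revision_hash(text, p1, p2) == expected_node
```

Because the parents are sorted before hashing, swapping ``p1`` and ``p2`` yields the same node.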