Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:33 +0100] rev 44371
nodemap: double check the source docket when doing incremental update
In theory, the index will have the information we expect it to have. However by
security, it seems safer to double check that the incremental data are generated
from the data currently on disk.
Differential Revision: https://phab.mercurial-scm.org/D7890
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:24 +0100] rev 44370
nodemap: track the total and unused amount of data in the rawdata file
We need to keep that information around:
* total data will allow transaction to start appending new information without
confusing other reader.
* unused data will allow to detect when we should regenerate new rawdata file.
Differential Revision: https://phab.mercurial-scm.org/D7889
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:14 +0100] rev 44369
nodemap: track the maximum revision tracked in the nodemap
We need a simple way to detect when the on disk data contains less revision
than the index we read from disk. The docket file is meant for this, we just had
to start tracking that data.
We should also try to detect strip operation, but we will deal with this in
later changesets. Right now we are focusing on defining the API for index
supporting persistent nodemap.
Differential Revision: https://phab.mercurial-scm.org/D7888
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:50:04 +0100] rev 44368
nodemap: add a flag to dump the details of the docket
We are about to add more information to the docket. We first introduce a way to
debug its content.
Differential Revision: https://phab.mercurial-scm.org/D7887
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:54 +0100] rev 44367
nodemap: introduce append-only incremental update of the persistent data
Rewriting the full nodemap for each transaction has a cost we would like to
avoid. We introduce a new way to write persistent nodemap data by adding new
information at the end for file. Any new and updated block as added at the end
of the file. The last block is the new root node.
With this method, some of the block already on disk get "dereferenced" and
become dead data. In later changesets, We'll start tracking the amount of dead
data to eventually re-generate a full nodemap.
Differential Revision: https://phab.mercurial-scm.org/D7886
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:45 +0100] rev 44366
nodemap: keep track of the docket for loaded data
To perform incremental update of the on disk data, we need to keep tracks of
some aspect of that data.
Differential Revision: https://phab.mercurial-scm.org/D7885
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:35 +0100] rev 44365
nodemap: introduce an explicit class/object for the docket
We are about to add more information to this docket, having a clear location to
stock them in memory will help.
Differential Revision: https://phab.mercurial-scm.org/D7884
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:26 +0100] rev 44364
nodemap: keep track of the ondisk id of nodemap blocks
If we are to incrementally update the files, we need to keep some details about
the data we read.
Differential Revision: https://phab.mercurial-scm.org/D7883
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:16 +0100] rev 44363
nodemap: provide the on disk data to indexes who support it
Time to start defining the API and prepare the rust index support. We provide
a method to do so. We use a distinct method instead of passing them in the
constructor because we will need this method anyway later (to refresh the mmap
once we update the data on disk).
Differential Revision: https://phab.mercurial-scm.org/D7847
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:49:06 +0100] rev 44362
nodemap: all check that revision and nodes match in the nodemap
More check is always useful.
Differential Revision: https://phab.mercurial-scm.org/D7846
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:48:57 +0100] rev 44361
nodemap: add basic checking of the on disk nodemap content
The simplest check it so verify we have all the revision we needs, and nothing
more.
Differential Revision: https://phab.mercurial-scm.org/D7845
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:48:47 +0100] rev 44360
nodemap: code to parse the persistent binary nodemap data
We now have code to read back what we persisted. This will be put to use in
later changesets.
Differential Revision: https://phab.mercurial-scm.org/D7844
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:48:38 +0100] rev 44359
nodemap: move the iteratio inside the Block object
Having the iteration inside the serialization function does not help
readability. Now that we have a `Block` object, let us move that code there.
Differential Revision: https://phab.mercurial-scm.org/D7843
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:48:28 +0100] rev 44358
nodemap: use an explicit "Block" object in the reference implementation
This will help us to introduce some test around the data currently written on
disk.
Differential Revision: https://phab.mercurial-scm.org/D7842
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:48:19 +0100] rev 44357
nodemap: add a optional `nodemap_add_full` method on indexes
This method can be used to obtains persistent data for a full nodemap. The end
goal is for some index implementation to managed the nodemap serialization them
selves (eg: the rust implementation)
Differential Revision: https://phab.mercurial-scm.org/D7841
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:48:09 +0100] rev 44356
nodemap: add a (python) index class for persistent nodemap testing
Using the persistent nodemap require a compeling performance boost and an
existing implementation. The benefit of the persistent nodemap for pure python
code is unclear and we don't have a C implementation for it. Yet we would like
to actually start testing it in more details and define an API for using that
persistent nodemap.
We introduce a new `devel` config option to use an index class dedicated to
Nodemap Testing. This feature is "pure" only because having using a pure-python
index with the `cext` policy proved more difficult than I would like.
There is nothing going on in that class for now, but the coming changeset will
change that.
Differential Revision: https://phab.mercurial-scm.org/D7840
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:47:59 +0100] rev 44355
nodemap: delete older raw data file when creating a new ones
When we write new full files, it replace an older one with a different name. We
add the associated cleanup for the older file to be removed after the
transaction.
We delete all file matching the expected pattern to give use extra chance to
delete orphan files we might have failed to delete earlier.
Note: eventually we won't rewrite all data for each transaction. This is coming
in later changesets.
Differential Revision: https://phab.mercurial-scm.org/D7839
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:47:50 +0100] rev 44354
nodemap: use an intermediate "docket" file to carry small metadata
This intermediate file will make mmapping, transaction and content validation
easier. (Most of this usefulness will arrive gradually in later changeset). In
particular it will become very useful to append new data are the end of raw file
instead of rewriting on the file on each transaction.
See in code comments for details.
Differential Revision: https://phab.mercurial-scm.org/D7838
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:47:40 +0100] rev 44353
nodemap: only use persistent nodemap for non-inlined revlog
Revlog are inlined while they are small (to avoid having too many file to deal
with). The persistent nodemap will only provides a significant boost for large
enough revlog index. So it does not make sens to add an extra file to store
nodemap for small revlog.
We could consider inclining the nodemap data inside the revlog itself, but the
benefit is unclear so let it be an adventure for another time.
Differential Revision: https://phab.mercurial-scm.org/D7837
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:47:31 +0100] rev 44352
nodemap: add a function to read the data from disk
This changeset is small and mostly an excuse to introduce an API function
reading the data from disk.
Differential Revision: https://phab.mercurial-scm.org/D7836
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:47:21 +0100] rev 44351
nodemap: write nodemap data on disk
Let us start writing data on disk (so that we can read it from there later).
This series of changeset is going to focus first on having data on disk and
updating it.
Right now the data is written right next to the revlog data, in the store. We
might move it to cache (with proper cache validation mechanism) later, but for
now revlog have a storevfs instance and it is simpler to us it. The right
location for this data is not the focus of this series.
Differential Revision: https://phab.mercurial-scm.org/D7835
Pierre-Yves David <pierre-yves.david@octobus.net> [Wed, 15 Jan 2020 15:47:12 +0100] rev 44350
nodemap: have some python code writing a nodemap in persistent binary form
This python code aims to be as "simple" as possible. It is a reference
implementation of the data we are going to write on disk (and possibly,
later a way for pure python install to make sure the on disk data are up to
date).
It is not optimized for performance and rebuild the full data structure from
the index every time.
This is a stepping stone toward a persistent nodemap on disk.
Differential Revision: https://phab.mercurial-scm.org/D7834
Augie Fackler <augie@google.com> [Mon, 10 Feb 2020 17:31:05 -0500] rev 44349
cleanup: re-run black on the codebase
Looks like a few patches have landed without having been blackened. I
strongly suspect I should write a patch for baymax that blackens
things on the way in...
# skip-blame automatic formatting
Differential Revision: https://phab.mercurial-scm.org/D8104
Raphaël Gomès <rgomes@octobus.net> [Thu, 16 Jan 2020 13:34:04 +0100] rev 44348
rust-re2: add wrapper for calling Re2 from Rust
This assumes that Re2 is installed following Google's guide. I am not sure
how we want to integrate it in the project, but I think a follow-up patch would
be more appropriate for such work.
As it stands, *not* having Re2 installed results in a compilation error, which
is a problem as it breaks install compatibility. Hence, this is gated behind
a non-default `with-re2` compilation feature.
Differential Revision: https://phab.mercurial-scm.org/D7910