view tests/test-contrib-relnotes.t @ 40326:fed697fa1734

sqlitestore: file storage backend using SQLite

This commit provides an extension which uses SQLite to store file data (as opposed to revlogs).

As the inline documentation describes, there are still several aspects to the extension that are incomplete. But it's a start. The extension does support basic clone, checkout, and commit workflows, which makes it suitable for simple use cases.

One notable missing feature is support for "bundlerepos." This is probably responsible for the most test failures when the extension is activated as part of the test suite.

All revision data is stored in SQLite. Data is stored as zstd compressed chunks (default if zstd is available), zlib compressed chunks (default if zstd is not available), or raw chunks (if configured or if a compressed delta is not smaller than the raw delta). This makes things very similar to revlogs.

Unlike revlogs, the extension doesn't yet enforce a limit on delta chain length. This is an obvious limitation and should be addressed. This is somewhat mitigated by the use of zstd, which is much faster than zlib to decompress.

There is a dedicated table for storing deltas. Deltas are stored by the SHA-1 hash of their uncompressed content. The "fileindex" table has columns that reference the delta for each revision and the base delta that delta should be applied against. A recursive SQL query is used to resolve the delta chain along with the delta data.

By storing deltas by hash, we are able to de-duplicate delta storage! With revlogs, the same deltas in different revlogs would result in duplicate storage of that delta. In this scheme, inserting the duplicate delta is a no-op and delta chains simply reference the existing delta.

When initially implementing this extension, I did not have content-indexed deltas and deltas could be duplicated across files (just like revlogs). When I implemented content-indexed deltas, the size of the SQLite database for a full clone of mozilla-unified dropped:

  before: 2,554,261,504 bytes
  after:  2,488,754,176 bytes

Surprisingly, this is still larger than the bytes size of revlog files:

  revlog files: 2,104,861,230 bytes
  du -b:        2,254,381,614

I would have expected storage to be smaller since we're not limiting delta chain length and since we're using zstd instead of zlib. I suspect the SQLite indexes and per-column overhead account for the bulk of the differences. (Keep in mind that revlog uses a 64-byte packed struct for revision index data and deltas are stored without padding. Aside from the 12 unused bytes in the 32 byte node field, revlogs are pretty efficient.) Another source of overhead is file name storage. With revlogs, file names are stored in the filesystem. But with SQLite, we need to store file names in the database. This is roughly equivalent to the size of the fncache file, which for the mozilla-unified repository is ~34MB.

Since the SQLite database isn't append-only and since delta chains can reference any delta, this opens some interesting possibilities.

For example, we could store deltas in reverse, such that fulltexts are stored for newer revisions and deltas are applied to reconstruct older revisions. This is likely a more optimal storage strategy for version control, as new data tends to be more frequently accessed than old data. We would obviously need wire protocol support for transferring revision data from newest to oldest. And we would probably need some kind of mechanism for "re-encoding" stores. But it should be doable.

This extension is very much experimental quality. There are a handful of features that don't work. It probably isn't suitable for day-to-day use. But it could be used in limited cases (e.g. read-only checkouts like in CI). And it is also a good proving ground for alternate storage backends. As we continue to define interfaces for all things storage, it will be useful to have a viable alternate storage backend to see how things shake out in practice.

test-storage.py passes on Python 2 and introduces no new test failures on Python 3.

Having the storage-level unit tests has proved to be insanely useful when developing this extension. Those tests caught numerous bugs during development and I'm convinced this style of testing is the way forward for ensuring alternate storage backends work as intended. Of course, test coverage isn't close to what it needs to be. But it is a start. And what coverage we have gives me confidence that basic store functionality is implemented properly.

Differential Revision: https://phab.mercurial-scm.org/D4928
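
The content-indexed delta storage and the recursive delta chain query described in the commit message can be illustrated with a small sketch. This is not the extension's actual code or schema: the column names (`hash`, `data`, `path`, `deltaid`, `baseid`) and the helper functions below are hypothetical simplifications, compression is left out, and deltas are treated as opaque byte strings rather than real binary patches.

```python
import hashlib
import sqlite3


def make_db():
    """Create an in-memory database with a simplified, hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE delta (
            id INTEGER PRIMARY KEY,
            hash BLOB UNIQUE NOT NULL,  -- SHA-1 of the uncompressed delta
            data BLOB NOT NULL
        );
        CREATE TABLE fileindex (
            id INTEGER PRIMARY KEY,
            path TEXT NOT NULL,
            deltaid INTEGER NOT NULL REFERENCES delta(id),
            baseid INTEGER REFERENCES fileindex(id)  -- NULL for a chain base
        );
        """
    )
    return db


def store_delta(db, data):
    """Insert a delta keyed by its SHA-1; a duplicate insert is a no-op."""
    digest = hashlib.sha1(data).digest()
    db.execute(
        "INSERT OR IGNORE INTO delta (hash, data) VALUES (?, ?)",
        (digest, data),
    )
    row = db.execute("SELECT id FROM delta WHERE hash = ?", (digest,)).fetchone()
    return row[0]


def add_revision(db, path, data, baseid=None):
    """Record a revision whose delta applies against revision `baseid`."""
    deltaid = store_delta(db, data)
    cur = db.execute(
        "INSERT INTO fileindex (path, deltaid, baseid) VALUES (?, ?, ?)",
        (path, deltaid, baseid),
    )
    return cur.lastrowid


def delta_chain(db, revid):
    """Return the deltas needed to rebuild `revid`, chain base first."""
    rows = db.execute(
        """
        WITH RECURSIVE chain(id, deltaid, baseid, depth) AS (
            SELECT id, deltaid, baseid, 0 FROM fileindex WHERE id = ?
            UNION ALL
            SELECT f.id, f.deltaid, f.baseid, chain.depth + 1
            FROM fileindex AS f JOIN chain ON f.id = chain.baseid
        )
        SELECT delta.data
        FROM chain JOIN delta ON delta.id = chain.deltaid
        ORDER BY chain.depth DESC
        """,
        (revid,),
    ).fetchall()
    return [bytes(row[0]) for row in rows]
```

In this sketch, two files that happen to produce the same delta bytes share one row in the `delta` table, and each file's chain simply references it. The recursion stops when it reaches a revision whose `baseid` is NULL.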
author Gregory Szorc <gregory.szorc@gmail.com>
date Tue, 09 Oct 2018 08:50:13 -0700
parents 4971c9724206
children 683e99f0b30c

#require test-repo py3exe
  $ . "$TESTDIR/helpers-testrepo.sh"

  $ cd $TESTDIR/..
  $ python3 contrib/relnotes 4.4 --stoprev 4.5
  changeset 3398603c5621: unexpected block in release notes directive feature
  New Features
  ============
  
  revert --interactive
  --------------------
  
  The revert command now accepts the flag --interactive to allow reverting only
  some of the changes to the specified files.
  
  Rebase with different destination per source revision
  -----------------------------------------------------
  
  Previously, rebase only supports one unique destination. Now "SRC" and
  "ALLSRC" can be used in rebase destination revset to precisely define
  destination per each individual source revision.
  
  For example, the following command could move some orphaned changesets to
  reasonable new places so they become no longer orphaned:
  
  hg rebase   -r 'orphan()-obsolete()'   -d 'max((successors(max(roots(ALLSRC) &
  ::SRC)^)-obsolete())::)'
  
  Accessing hidden changesets
  ---------------------------
  
  Set config option 'experimental.directaccess = True' to access hidden
  changesets from read only commands.
  
  githelp extension
  -----------------
  
  The "githelp" extension provides the "hg githelp" command. This command
  attempts to convert a "git" command to its Mercurial equivalent. The extension
  can be useful to Git users new to Mercurial.
  
  Other Changes
  -------------
  
  * When interactive revert is run against a revision other than the working
    directory parent, the diff shown is the diff to *apply* to the working
    directory, rather than the diff to *discard* from the working copy. This is
    in line with related user experiences with 'git' and appears to be less
    confusing with 'ui.interface=curses'.
  
  * Let 'hg rebase' avoid content-divergence by skipping obsolete changesets
    (and their descendants) when they are present in the rebase set along with
    one of their successors but none of their successors is in destination.
  
  * hgweb now displays phases of non-public changesets
  
  * The "HGPLAINEXCEPT" environment variable can now include "color" to allow
    automatic output colorization in otherwise automated environments.
  
  * A new unamend command in uncommit extension which undoes the effect of the
    amend command by creating a new changeset which was there before amend and
    moving the changes that were amended to the working directory.
  
  * A '--abort' flag to merge command to abort the ongoing merge.
  
  * An experimental flag '--rev' to 'hg branch' which can be used to change
    branch of changesets.
  
  Backwards Compatibility Changes
  ===============================
  
  * "log --follow-first -rREV", which is deprecated, now follows the first
    parent of merge revisions from the specified "REV" just like "log --follow
    -rREV".
  
  * "log --follow -rREV FILE.." now follows file history across copies and
    renames.
  
  Bug Fixes
  =========
  
  Issue 5165
  ----------
  
  Bookmark, whose name is longer than 255, can again be exchanged again between
  4.4+ client and servers.
  
  Performance Improvements
  ========================
  
  * bundle2 read I/O throughput significantly increased.
  
  * Significant memory use reductions when reading from bundle2 bundles.
  
    On the BSD repository, peak RSS during changegroup application decreased by
    ~185 MB from ~752 MB to ~567 MB.
  
  API Changes
  ===========
  
  * bundlerepo.bundlerepository.bundle and
    bundlerepo.bundlerepository.bundlefile are now prefixed with an underscore.
  
  * Rename bundlerepo.bundlerepository.bundlefilespos to _cgfilespos.
  
  * dirstate no longer provides a 'dirs()' method.  To test for the existence of
    a directory in the dirstate, use 'dirstate.hasdir(dirname)'.
  
  * bundle2 parts are no longer seekable by default.
  
  * mapping does not contain all template resources. use context.resource() in
    template functions.
  
  * "text=False|True" option is dropped from the vfs interface because of Python
    3 compatibility issue. Use "util.tonativeeol/fromnativeeol()" to convert EOL
    manually.
  
  * wireproto.streamres.__init__ no longer accepts a "reader" argument. Use the
    "gen" argument instead.
  
  * exchange.getbundlechunks() now returns a 2-tuple instead of just an
    iterator.
  
  
  === commands ===
   * amend: do not drop missing files (Bts:issue5732)
   * amend: do not take untracked files as modified or clean (Bts:issue5732)
   * amend: update .hgsubstate before committing a memctx (Bts:issue5677)
   * annotate: add support to specify hidden revs if directaccess config is set
   * bookmark: add methods to binary encode and decode bookmark values
   * bookmark: deprecate direct update of a bookmark value
   * bookmark: introduce a 'bookmarks' part
   * bookmark: introduce in advance a variant of the exchange test
   * bookmark: run 'pushkey' hooks after bookmark move, not 'prepushkey'
   * bookmarks: add bookmarks to hidden revs if directaccess config is set
   * bookmarks: calculate visibility exceptions only once
   * bookmarks: display the obsfate of hidden revision we create a bookmark on
   * bookmarks: fix pushkey compatibility mode (Bts:issue5777)
   * bookmarks: use context managers for lock and transaction in update()
   * bookmarks: use context managers for locks and transaction in pushbookmark()
   * branch: allow changing branch name to existing name if possible
   * clone: add support for storing remotenames while cloning
   * clone: use utility function to write hgrc
   * clonebundle: make it possible to retrieve the initial bundle through largefile
   * commandserver: restore cwd in case of exception
   * commandserver: unblock SIGCHLD
   * help: deprecate ui.slash in favor of slashpath template filter (Bts:issue5572)
   * log: allow matchfn to be non-null even if both --patch/--stat are off
   * log: build follow-log filematcher at once
   * log: don't expand aliases in revset built from command options
   * log: make "slowpath" condition slightly more readable
   * log: make opt2revset table a module constant
   * log: merge getlogrevs() and getgraphlogrevs()
   * log: remove temporary variable 'date' used only once
   * log: resolve --follow thoroughly in getlogrevs()
   * log: resolve --follow with -rREV in cmdutil.getlogrevs()
   * log: simplify 'x or ancestors(x)' expression
   * log: translate column labels at once (Bts:issue5750)
   * log: use revsetlang.formatspec() thoroughly
   * log: use revsetlang.formatspec() to concatenate list expression
   * log: use smartset.slice() to limit number of revisions to be displayed
   * merge: cache unknown dir checks (Bts:issue5716)
   * merge: check created file dirs for path conflicts only once (Bts:issue5716)
   * patch: add within-line color diff capacity
   * patch: catch unexpected case in _inlinediff
   * patch: do not break up multibyte character when highlighting word
   * patch: improve heuristics to not take the word "diff" as header (Bts:issue1879)
   * patch: reverse _inlinediff output for consistency
   * pull: clarify that -u only updates linearly
   * pull: hold wlock for the full operation when --update is used
   * pull: retrieve bookmarks through the binary part when possible
   * pull: store binary node in pullop.remotebookmarks
   * push: include a 'check:bookmarks' part when possible
   * push: restrict common discovery to the pushed set
   * revert: support reverting to hidden cset if directaccess config is set
  
  === core ===
   * filelog: add the ability to report the user facing name
   * revlog: choose between ifh and dfh once for all
   * revlog: don't use slicing to return parents
   * revlog: group delta computation methods under _deltacomputer object
   * revlog: group revision info into a dedicated structure
   * revlog: introduce 'deltainfo' to distinguish from 'delta'
   * revlog: rename 'rev' to 'base', as it is the base revision
   * revlog: separate diff computation from the collection of other info
   * revset: evaluate filesets against each revision for 'file()' (Bts:issue5778)
   * revset: parse x^:: as (x^):: (Bts:issue5764)
   * templater: look up symbols/resources as if they were separated (Bts:issue5699)
   * transaction: register summary callbacks only at start of transaction (BC)
   * util: whitelist NTFS for hardlink creation (Bts:issue4580)
  
  === extensions ===
   * convert: restore the ability to use bzr < 2.6.0 (Bts:issue5733)
   * histedit: add support to output nodechanges using formatter
   * largefiles: add a 'debuglfput' command to put largefile into the store
   * largefiles: add support for 'largefiles://' url scheme
   * largefiles: allow to run 'debugupgraderepo' on repo with largefiles
   * largefiles: convert EOL of hgrc before appending to bytes IO
   * largefiles: explicitly set the source and sink types to 'hg' for lfconvert
   * largefiles: modernize how capabilities are added to the wire protocol
   * largefiles: pay attention to dropped standin files when updating largefiles
   * rebase: add concludememorynode(), and call it when rebasing in-memory
   * rebase: add the --inmemory option flag; assign a wctx object for the rebase
   * rebase: add ui.log calls for whether IMM used, whether rebasing WCP
   * rebase: disable 'inmemory' if the rebaseset contains the working copy
   * rebase: do not bail on uncomitted changes if rebasing in-memory
   * rebase: do not update if IMM; instead, set the overlaywctx's parents
   * rebase: don't run IMM if running rebase in a transaction
   * rebase: don't take out a dirstate guard for in-memory rebase
   * rebase: drop --style option
   * rebase: fix for hgsubversion
   * rebase: pass the wctx object (IMM or on-disk) to merge.update
   * rebase: pass wctx to rebasenode()
   * rebase: rerun a rebase on-disk if IMM merge conflicts arise
   * rebase: switch ui.log calls to common style
   * rebase: use fm.formatlist() and fm.formatdict() to support user template
  
  === hgweb ===
   * hgweb: disable diff.noprefix option for diffstat
   * hgweb: drop support of browsers that don't understand <canvas> (BC)
   * hgweb: only include graph-related data in jsdata variable on /graph pages (BC)
   * hgweb: stop adding strings to innerHTML of #graphnodes and #nodebgs (BC)
  
  === unsorted ===
   * archive: add support to specify hidden revs if directaccess config is set
   * atomicupdate: add an experimental option to use atomictemp when updating
   * bundle: allow bundlerepo to support alternative manifest implementations
   * changelog: introduce a 'tiprev' method
   * changelog: use 'tiprev()' in 'tip()'
   * completion: add support for new "amend" command
   * debugssl: convert port number to int (Bts:issue5757)
   * diff: disable diff.noprefix option for diffstat (Bts:issue5759)
   * dispatch: abort if early boolean options can't be parsed
   * dispatch: add HGPLAIN=+strictflags to restrict early parsing of global options
   * dispatch: add option to not strip command args parsed by _earlygetopt()
   * dispatch: alias --repo to --repository while parsing early options
   * dispatch: convert non-list option parsed by _earlygetopt() to string
   * dispatch: fix early parsing of short option with value like -R=foo
   * dispatch: handle IOError when writing to stderr
   * dispatch: stop parsing of early boolean option at "--"
   * dispatch: verify result of early command parsing
   * evolution: make reporting of new unstable changesets optional
   * extdata: abort if external command exits with non-zero status (BC)
   * fancyopts: add early-options parser compatible with getopt()
   * graphlog: add another graph node type, unstable, using character "*" (BC)
   * hgdemandimport: use correct hyperlink to python-bug in comments (Bts:issue5765)
   * httppeer: add support for tracing all http request made by the peer
   * identify: document -r. explicitly how to disable wdir scanning (Bts:issue5622)
   * lfs: register config options
   * localrepo: specify optional callback parameter to pathauditor as a keyword
   * match: do not weirdly include explicit files excluded by -X option
   * memfilectx: make changectx argument mandatory in constructor (API)
   * morestatus: don't crash with different drive letters for repo.root and CWD
   * outgoing: respect ":pushurl" paths (Bts:issue5365)
   * remove: print message for each file in verbose mode only while using '-A' (BC)
   * rewriteutil: use precheck() in uncommit and amend commands
   * scmutil: don't try to delete origbackup symlinks to directories (Bts:issue5731)
   * sshpeer: add support for request tracing
   * streamclone: add support for bundle2 based stream clone
   * streamclone: add support for cloning non append-only file
   * streamclone: also stream caches to the client
   * streamclone: define first iteration of version 2 of stream format
   * streamclone: move wire protocol status code from wireproto command
   * streamclone: rework canperformstreamclone
   * streamclone: tests phase exchange during stream clone
   * streamclone: use readexactly when reading stream v2
   * subrepo: add config option to reject any subrepo operations (SEC)
   * subrepo: disable git and svn subrepos by default (BC) (SEC)
   * subrepo: extend config option to disable subrepos by type (SEC)
   * subrepo: handle 'C:' style paths on the command line (Bts:issue5770)
   * subrepo: use per-type config options to enable subrepos
   * svnsubrepo: check if subrepo is missing when checking dirty state (Bts:issue5657)
   * tr-summary: keep a weakref to the unfiltered repository
   * unamend: fix command summary line
   * uncommit: unify functions _uncommitdirstate and _unamenddirstate to one
   * update: support updating to hidden cset if directaccess config is set
  
  === BC ===
  
   * extdata: abort if external command exits with non-zero status (BC)
   * graphlog: add another graph node type, unstable, using character "*" (BC)
   * hgweb: drop support of browsers that don't understand <canvas> (BC)
   * hgweb: only include graph-related data in jsdata variable on /graph pages (BC)
   * hgweb: stop adding strings to innerHTML of #graphnodes and #nodebgs (BC)
   * remove: print message for each file in verbose mode only while using '-A' (BC)
   * subrepo: disable git and svn subrepos by default (BC) (SEC)
   * transaction: register summary callbacks only at start of transaction (BC)
  
  === API Changes ===
  
   * memfilectx: make changectx argument mandatory in constructor (API)