view tests/test-mq-qimport.t @ 30442:41a8106789ca

util: implement zstd compression engine

Now that zstd is vendored and being built (in some configurations), we can implement a compression engine for zstd!

The zstd engine is a little different from existing engines. Because it may not always be present, we have to defer loading the module in case importing it fails. We facilitate this via a cached property that holds a reference to the module or None. The "available" method is implemented to reflect reality.

The zstd engine declares its ability to handle bundles using the "zstd" human name and the "ZS" internal name. The latter was chosen because internal names are 2 characters (only by convention, I think) and "ZS" seems reasonable.

The engine, like others, supports specifying the compression level. However, no consumers of this API pass in that argument yet. I have plans to change that, so stay tuned.

Since all we need to do to support bundle generation with a new compression engine is implement and register the compression engine, bundle generation with zstd "just works!" Tests demonstrating this have been added.

How does performance of zstd for bundle generation compare?
On the mozilla-unified repo, `hg bundle --all -t <engine>-v2` yields the following on my i7-6700K on Linux:

engine        CPU time  bundle size    vs orig size  throughput
none          97.0s     4,054,405,584  100.0%        41.8 MB/s
bzip2 (l=9)   393.6s    975,343,098    24.0%         10.3 MB/s
gzip (l=6)    184.0s    1,140,533,074  28.1%         22.0 MB/s
zstd (l=1)    108.2s    1,119,434,718  27.6%         37.5 MB/s
zstd (l=2)    111.3s    1,078,328,002  26.6%         36.4 MB/s
zstd (l=3)    113.7s    1,011,823,727  25.0%         35.7 MB/s
zstd (l=4)    116.0s    1,008,965,888  24.9%         35.0 MB/s
zstd (l=5)    121.0s    977,203,148    24.1%         33.5 MB/s
zstd (l=6)    131.7s    927,360,198    22.9%         30.8 MB/s
zstd (l=7)    139.0s    912,808,505    22.5%         29.2 MB/s
zstd (l=12)   198.1s    854,527,714    21.1%         20.5 MB/s
zstd (l=18)   681.6s    789,750,690    19.5%         5.9 MB/s

On compression, zstd for bundle generation delivers:

* better compression than gzip with significantly less CPU utilization
* better than bzip2 compression ratios while still being significantly faster than gzip
* ability to aggressively tune compression level to achieve significantly smaller bundles

That last point is important. With clone bundles, a server can pre-generate a bundle file, upload it to a static file server, and redirect clients to transparently download it during clone. The server could choose to produce a zstd bundle with the highest compression settings possible. This would take a very long time (an order of magnitude longer than a typical zstd bundle generation), but the result would be hundreds of megabytes smaller! For the clone volume we do at Mozilla, this could translate to petabytes of bandwidth savings per year and faster clones (due to smaller transfer size).

I don't have detailed numbers to report on decompression. However, zstd decompression is fast: >1 GB/s output throughput on this machine, even through the Python bindings. And it can do that regardless of the compression level of the input. By the time you have enough data to worry about the overhead of decompression, you have plenty of other things to worry about performance-wise.
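The "vs orig size" and "throughput" columns appear to be derived from the uncompressed ("none") bundle size of 4,054,405,584 bytes. The ratio looks like each bundle's size divided by that figure, and the throughput looks like that figure divided by the CPU time, in decimal megabytes (10^6 bytes) per second. This is inferred from the numbers, not stated in the message. The Python sketch below recomputes a few rows under that assumption. The bzip2 row is left out because its printed 24.0% would round to 24.1% under this formula.

```python
# Assumption: both derived columns are computed from the uncompressed
# ("none") bundle size; this is inferred from the table, not stated in it.
ORIG_SIZE = 4_054_405_584  # bytes, from the "none" row

# (engine, CPU seconds, bundle size in bytes), copied from the table
ROWS = [
    ("none", 97.0, 4_054_405_584),
    ("gzip (l=6)", 184.0, 1_140_533_074),
    ("zstd (l=3)", 113.7, 1_011_823_727),
    ("zstd (l=18)", 681.6, 789_750_690),
]


def derived_columns(cpu_seconds, bundle_size, orig_size=ORIG_SIZE):
    """Return (percent of original size, throughput in decimal MB/s)."""
    ratio = 100.0 * bundle_size / orig_size
    # Throughput is measured on the *uncompressed* input, not the output.
    throughput = orig_size / cpu_seconds / 1_000_000
    return round(ratio, 1), round(throughput, 1)


for engine, cpu, size in ROWS:
    ratio, mbps = derived_columns(cpu, size)
    print(f"{engine:<12} {ratio:5.1f}%  {mbps:5.1f} MB/s")
```

Each printed row matches the corresponding table entry, which supports the reading that "throughput" counts input bytes processed per CPU second.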
zstd wins all around. I can't wait to implement support for it on the wire protocol and in revlogs.
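The deferred import described at the start of this message (a cached property holding the module or None, plus an "available" method) can be sketched in Python as follows. This is a hypothetical, simplified illustration, not Mercurial's actual `util` code. The class name `LazyCompressionEngine`, its attributes, and the use of `functools.cached_property` (Python 3.8+) are assumptions. The standard-library `zlib` module stands in for a backend that is present, and a deliberately bogus module name stands in for a missing zstd.

```python
import functools
import importlib


class LazyCompressionEngine:
    """Hypothetical engine whose backing module is imported on first use."""

    def __init__(self, name, bundle_name, internal_name, module_name):
        self.name = name
        self.bundle_name = bundle_name      # human name, e.g. "zstd"
        self.internal_name = internal_name  # 2-character name, e.g. "ZS"
        self._module_name = module_name

    @functools.cached_property
    def _module(self):
        # Import lazily and cache the result; None means "not installed".
        try:
            return importlib.import_module(self._module_name)
        except ImportError:
            return None

    def available(self):
        # Reflect reality: the engine is usable only if the import worked.
        return self._module is not None

    def compress(self, data, level=None):
        if not self.available():
            raise RuntimeError(f"{self.name} support is not available")
        # Assumes a zlib-like module-level compress(data[, level]) function.
        if level is None:
            return self._module.compress(data)
        return self._module.compress(data, level)


# Usage: zlib is a stand-in backend; the second module name is bogus on
# purpose to exercise the "not installed" path.
gzip_like = LazyCompressionEngine("zlib", "gzip", "GZ", "zlib")
missing = LazyCompressionEngine("zstd", "zstd", "ZS", "no_such_zstd_module")
print(gzip_like.available(), missing.available())  # True False
```

Caching the import result means a failed import is attempted only once per engine instance, and callers can check `available()` before offering the engine (for example, as a bundle type) without risking an `ImportError`.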
author Gregory Szorc <gregory.szorc@gmail.com>
date Fri, 11 Nov 2016 01:10:07 -0800
parents 0342bf292f73
children 448acdee9161

#require killdaemons

  $ cat > writelines.py <<EOF
  > import sys
  > path = sys.argv[1]
  > args = sys.argv[2:]
  > assert (len(args) % 2) == 0
  > 
  > f = file(path, 'wb')
  > for i in xrange(len(args)/2):
  >    count, s = args[2*i:2*i+2]
  >    count = int(count)
  >    s = s.decode('string_escape')
  >    f.write(s*count)
  > f.close()
  > 
  > EOF
  $ cat <<EOF >> $HGRCPATH
  > [extensions]
  > mq =
  > [diff]
  > git = 1
  > EOF
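`writelines.py` relies on Python 2 features: `file()`, `xrange`, integer division with `/`, and the `string_escape` codec. For readers on Python 3, a roughly equivalent sketch follows. It is not part of this test. It assumes the STRING arguments are ASCII, so decoding them with the `unicode_escape` codec and re-encoding as Latin-1 reproduces what `string_escape` did for inputs such as `'a\r\n'`.

```python
import sys


def unescape(s):
    # Turn backslash escapes such as "\n" or "\r\n" (passed literally on the
    # command line) into the bytes they denote; assumes ASCII input.
    return s.encode("latin-1").decode("unicode_escape").encode("latin-1")


def write_lines(path, args):
    # Arguments come in (COUNT, STRING) pairs: write STRING COUNT times.
    if len(args) % 2 != 0:
        raise ValueError("expected COUNT STRING pairs")
    with open(path, "wb") as f:
        for i in range(0, len(args), 2):
            count, s = int(args[i]), args[i + 1]
            f.write(unescape(s) * count)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python writelines.py PATH COUNT STRING [COUNT STRING ...]
    write_lines(sys.argv[1], sys.argv[2:])
```

With this sketch, `python writelines.py b 5 'a\n' 5 'a\r\n'` would write five LF-terminated lines followed by five CRLF-terminated lines, which is the mixed-line-ending file the CRLF section below depends on.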
  $ hg init repo
  $ cd repo

qimport without file or revision

  $ hg qimport
  abort: no files or revisions specified
  [255]

qimport non-existing-file

  $ hg qimport non-existing-file
  abort: unable to read file non-existing-file
  [255]

qimport null revision

  $ hg qimport -r null
  abort: revision -1 is not mutable
  (see 'hg help phases' for details)
  [255]
  $ hg qseries

import email

  $ hg qimport --push -n email - <<EOF
  > From: Username in email <test@example.net>
  > Subject: [PATCH] Message in email
  > Date: Fri, 02 Jan 1970 00:00:00 +0000
  > 
  > Text before patch.
  > 
  > # HG changeset patch
  > # User Username in patch <test@example.net>
  > # Date 0 0
  > # Node ID 1a706973a7d84cb549823634a821d9bdf21c6220
  > # Parent  0000000000000000000000000000000000000000
  > First line of commit message.
  > 
  > More text in commit message.
  > --- confuse the diff detection
  > 
  > diff --git a/x b/x
  > new file mode 100644
  > --- /dev/null
  > +++ b/x
  > @@ -0,0 +1,1 @@
  > +new file
  > Text after patch.
  > 
  > EOF
  adding email to series file
  applying email
  now at: email

hg tip -v

  $ hg tip -v
  changeset:   0:1a706973a7d8
  tag:         email
  tag:         qbase
  tag:         qtip
  tag:         tip
  user:        Username in patch <test@example.net>
  date:        Thu Jan 01 00:00:00 1970 +0000
  files:       x
  description:
  First line of commit message.
  
  More text in commit message.
  
  
  $ hg qpop
  popping email
  patch queue now empty
  $ hg qdelete email

import URL

  $ echo foo >> foo
  $ hg add foo
  $ hg diff > url.diff
  $ hg revert --no-backup foo
  $ rm foo

On Unix: file:///foobar/blah
On Windows: file:///c:/foobar/blah

  $ patchurl=`pwd | tr '\\\\' /`/url.diff
  $ expr "$patchurl" : "\/" > /dev/null || patchurl="/$patchurl"
  $ hg qimport file://"$patchurl"
  adding url.diff to series file
  $ rm url.diff
  $ hg qun
  url.diff
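The `patchurl` commands above turn the working directory into a `file://` URL in two steps. First, backslashes become forward slashes. Second, a leading slash is added when the path lacks one, which is the Windows case. The same steps in Python look like this. It is an illustrative sketch only, and the function name is hypothetical.

```python
def path_to_file_url(path):
    """Mirror the test's shell logic: normalize separators, ensure a
    leading slash, then prefix the file:// scheme (no percent-encoding)."""
    # Equivalent of `tr '\\' /`: use forward slashes everywhere.
    url_path = path.replace("\\", "/")
    # Equivalent of the `expr "$patchurl" : "\/"` check: Windows paths such
    # as "c:/foobar/blah" lack a leading slash, so add one.
    if not url_path.startswith("/"):
        url_path = "/" + url_path
    return "file://" + url_path


print(path_to_file_url("/foobar/blah"))      # file:///foobar/blah
print(path_to_file_url("c:\\foobar\\blah"))  # file:///c:/foobar/blah
```

The extra slash is what makes the Windows form `file:///c:/...` rather than `file://c:/...`, where `c:` would be misread as a host name.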

import patch that already exists

  $ echo foo2 >> foo
  $ hg add foo
  $ hg diff > ../url.diff
  $ hg revert --no-backup foo
  $ rm foo
  $ hg qimport ../url.diff
  abort: patch "url.diff" already exists
  [255]
  $ hg qpush
  applying url.diff
  now at: url.diff
  $ cat foo
  foo
  $ hg qpop
  popping url.diff
  patch queue now empty

qimport -f

  $ hg qimport -f ../url.diff
  adding url.diff to series file
  $ hg qpush
  applying url.diff
  now at: url.diff
  $ cat foo
  foo2
  $ hg qpop
  popping url.diff
  patch queue now empty

build diff with CRLF

  $ python ../writelines.py b 5 'a\n' 5 'a\r\n'
  $ hg ci -Am addb
  adding b
  $ python ../writelines.py b 2 'a\n' 10 'b\n' 2 'a\r\n'
  $ hg diff > b.diff
  $ hg up -C
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved

qimport CRLF diff

  $ hg qimport b.diff
  adding b.diff to series file
  $ hg qpush
  applying b.diff
  now at: b.diff

try to import --push

  $ cat > appendfoo.diff <<EOF
  > append foo
  > 
  > diff -r 07f494440405 -r 261500830e46 baz
  > --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
  > +++ b/baz	Thu Jan 01 00:00:00 1970 +0000
  > @@ -0,0 +1,1 @@
  > +foo
  > EOF

  $ cat > appendbar.diff <<EOF
  > append bar
  > 
  > diff -r 07f494440405 -r 261500830e46 baz
  > --- a/baz	Thu Jan 01 00:00:00 1970 +0000
  > +++ b/baz	Thu Jan 01 00:00:00 1970 +0000
  > @@ -1,1 +1,2 @@
  >  foo
  > +bar
  > EOF

  $ hg qimport --push appendfoo.diff appendbar.diff
  adding appendfoo.diff to series file
  adding appendbar.diff to series file
  applying appendfoo.diff
  applying appendbar.diff
  now at: appendbar.diff
  $ hg qfin -a
  patch b.diff finalized without changeset message
  $ touch .hg/patches/append_foo
  $ hg qimport -r 'p1(.)::'
  $ hg qapplied
  append_foo__1
  append_bar
  $ hg qfin -a
  $ rm .hg/patches/append_foo
  $ hg qimport -r 'p1(.)::' -P
  $ hg qpop -a
  popping append_bar
  popping append_foo
  patch queue now empty
  $ hg qdel append_foo
  $ hg qdel -k append_bar

qimport -e

  $ hg qimport -e append_bar
  adding append_bar to series file
  $ hg qdel -k append_bar

qimport -e --name newname oldexistingpatch

  $ hg qimport -e --name this-name-is-better append_bar
  renaming append_bar to this-name-is-better
  adding this-name-is-better to series file
  $ hg qser
  this-name-is-better
  url.diff

qimport -e --name without --force

  $ cp .hg/patches/this-name-is-better .hg/patches/3.diff
  $ hg qimport -e --name this-name-is-better 3.diff
  abort: patch "this-name-is-better" already exists
  [255]
  $ hg qser
  this-name-is-better
  url.diff

qimport -e --name with --force

  $ hg qimport --force -e --name this-name-is-better 3.diff
  renaming 3.diff to this-name-is-better
  adding this-name-is-better to series file
  $ hg qser
  this-name-is-better
  url.diff

qimport with a bad name should abort before reading the file

  $ hg qimport non-existent-file --name .hg
  abort: patch name cannot begin with ".hg"
  [255]

qimport http:// patch with trailing slashes in url

set up hgweb

  $ cd ..
  $ hg init served
  $ cd served
  $ echo a > a
  $ hg ci -Am patch
  adding a
  $ hg serve -p $HGPORT -d --pid-file=hg.pid -A access.log -E errors.log
  $ cat hg.pid >> $DAEMON_PIDS

  $ cd ../repo
  $ hg qimport http://localhost:$HGPORT/raw-rev/0///
  adding 0 to series file

check qimport phase:

  $ hg -q qpush
  now at: 0
  $ hg phase qparent
  1: draft
  $ hg qimport -r qparent
  $ hg phase qbase
  1: draft
  $ hg qfinish qbase
  $ echo '[mq]' >> $HGRCPATH
  $ echo 'secret=true' >> $HGRCPATH
  $ hg qimport -r qparent
  $ hg phase qbase
  1: secret

  $ cd ..

  $ killdaemons.py

check patch name generation for non-alphanumeric summary line

  $ cd repo

  $ hg qpop -a -q
  patch queue now empty
  $ hg qseries -v
  0 U imported_patch_b_diff
  1 U 0
  2 U this-name-is-better
  3 U url.diff

  $ echo bb >> b
  $ hg commit -m '==++--=='

  $ hg qimport -r tip
  $ hg qseries -v
  0 A 1.diff
  1 U imported_patch_b_diff
  2 U 0
  3 U this-name-is-better
  4 U url.diff

check reserved patch names

  $ hg qpop -qa
  patch queue now empty
  $ echo >> b
  $ hg commit -m 'status'
  $ echo >> b
  $ hg commit -m '.'
  $ echo >> b
  $ hg commit -m 'taken'
  $ mkdir .hg/patches/taken
  $ touch .hg/patches/taken__1
  $ hg qimport -r -3::
  $ hg qap
  1.diff__1
  2.diff
  taken__2

check very long patch name

  $ hg qpop -qa
  patch queue now empty
  $ echo >> b
  $ hg commit -m 'abcdefghi pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi pqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
  $ hg qimport -r .
  $ hg qap
  abcdefghi_pqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghi_pqrstuvwxyzabcdefg
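The last few sections exercise how `qimport -r` derives patch names from the first line of a commit message. The Python sketch below reproduces the behavior visible in this test's output. It is an approximation written for illustration, not mq's actual code. The function names, the full reserved-name list, and the fallback handling are assumptions. The rules it models are:

* lowercase the summary and collapse runs of non-alphanumeric characters into `_`;
* strip leading and trailing underscores;
* truncate to 75 characters;
* fall back to a default name (in this test, `<rev>.diff`, e.g. `1.diff`) when the result is empty or reserved;
* append `__1`, `__2`, ... until the name is unused.

```python
import re

# Assumption: names mq refuses; this test only exercises "status" and ".hg".
RESERVED = {"series", "status", "guards", ".", ".."}
RESERVED_PREFIXES = (".hg", ".mq")
MAX_LEN = 75  # the long-summary test above yields a 75-character name


def is_reserved(name):
    return name in RESERVED or name.startswith(RESERVED_PREFIXES)


def make_patch_name(summary, default, existing):
    """Derive a patch name from a commit summary line (illustrative only).

    `existing` holds names already used by the series or present as
    files/directories in the patch directory."""
    base = re.sub(r"[\W_]+", "_", summary.lower()).strip("_")[:MAX_LEN]
    if not base or is_reserved(base):
        base = default  # e.g. "<rev>.diff"
    name, i = base, 0
    # Append __1, __2, ... until the name is not already taken.
    while name in existing:
        i += 1
        name = f"{base}__{i}"
    return name


print(make_patch_name("taken", "3.diff", {"taken", "taken__1"}))  # taken__2
```

Under these rules `'==++--=='` reduces to an empty string and becomes `1.diff`. `'status'` is reserved, and because `1.diff` is taken it becomes `1.diff__1`. `'append foo'` collides with the stray `append_foo` file and becomes `append_foo__1`, matching the outputs recorded above.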