view tests/test-journal.t @ 29830:92ac2baaea86

revlog: use an LRU cache for delta chain bases Profiling using statprof revealed a hotspot during changegroup application calculating delta chain bases on generaldelta repos. Essentially, revlog._addrevision() was performing a lot of redundant work tracing the delta chain as part of determining when the chain distance was acceptable. This was most pronounced when adding revisions to manifests, which can have delta chains thousands of revisions long. There was a delta chain base cache on revlogs before, but it only captured a single revision. This was acceptable before generaldelta, when _addrevision would build deltas from the previous revision and thus we'd pretty much guarantee a cache hit when resolving the delta chain base on a subsequent _addrevision call. However, it isn't suitable for generaldelta because parent revisions aren't necessarily the last processed revision. This patch converts the delta chain base cache to an LRU dict cache. The cache can hold multiple entries, so generaldelta repos have a higher chance of getting a cache hit. The impact of this change when processing changegroup additions is significant. On a generaldelta conversion of the "mozilla-unified" repo (which contains heads of the main Firefox repositories in chronological order - this means there are lots of transitions between heads in revlog order), this change has the following impact when performing an `hg unbundle` of an uncompressed bundle of the repo: before: 5:42 CPU time after: 4:34 CPU time Most of this time is saved when applying the changelog and manifest revlogs: before: 2:30 CPU time after: 1:17 CPU time That nearly a 50% reduction in CPU time applying changesets and manifests! Applying a gzipped bundle of the same repo (effectively simulating a `hg clone` over HTTP) showed a similar speedup: before: 5:53 CPU time after: 4:46 CPU time Wall time improvements were basically the same as CPU time. I didn't measure explicitly, but it feels like most of the time is saved when processing manifests. This makes sense, as large manifests tend to have very long delta chains and thus benefit the most from this cache. So, this change effectively makes changegroup application (which is used by `hg unbundle`, `hg clone`, `hg pull`, `hg unshelve`, and various other commands) significantly faster when delta chains are long (which can happen on repos with large numbers of files and thus large manifests). In theory, this change can result in more memory utilization. However, we're caching a dict of ints. At most we have 200 ints + Python object overhead per revlog. And, the cache is really only populated when performing read-heavy operations, such as adding changegroups or scanning an individual revlog. For memory bloat to be an issue, we'd need to scan/read several revisions from several revlogs all while having active references to several revlogs. I don't think there are many operations that do this, so I don't think memory bloat from the cache will be an issue.
author Gregory Szorc <gregory.szorc@gmail.com>
date Mon, 22 Aug 2016 21:48:50 -0700
parents e54bdd3aa23b
children 6d11ae3c4c4b
line wrap: on
line source

Tests for the journal extension; records bookmark locations.

  $ cat >> testmocks.py << EOF
  > # mock out util.getuser() and util.makedate() to supply testable values
  > import os
  > from mercurial import util
  > def mockgetuser():
  >     return 'foobar'
  > 
  > def mockmakedate():
  >     filename = os.path.join(os.environ['TESTTMP'], 'testtime')
  >     try:
  >         with open(filename, 'rb') as timef:
  >             time = float(timef.read()) + 1
  >     except IOError:
  >         time = 0.0
  >     with open(filename, 'wb') as timef:
  >         timef.write(str(time))
  >     return (time, 0)
  > 
  > util.getuser = mockgetuser
  > util.makedate = mockmakedate
  > EOF

  $ cat >> $HGRCPATH << EOF
  > [extensions]
  > journal=
  > testmocks=`pwd`/testmocks.py
  > EOF

Setup repo

  $ hg init repo
  $ cd repo

Test empty journal

  $ hg journal
  previous locations of '.':
  no recorded locations
  $ hg journal foo
  previous locations of 'foo':
  no recorded locations

Test that working copy changes are tracked

  $ echo a > a
  $ hg commit -Aqm a
  $ hg journal
  previous locations of '.':
  cb9a9f314b8b  commit -Aqm a
  $ echo b > a
  $ hg commit -Aqm b
  $ hg journal
  previous locations of '.':
  1e6c11564562  commit -Aqm b
  cb9a9f314b8b  commit -Aqm a
  $ hg up 0
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ hg journal
  previous locations of '.':
  cb9a9f314b8b  up 0
  1e6c11564562  commit -Aqm b
  cb9a9f314b8b  commit -Aqm a

Test that bookmarks are tracked

  $ hg book -r tip bar
  $ hg journal bar
  previous locations of 'bar':
  1e6c11564562  book -r tip bar
  $ hg book -f bar
  $ hg journal bar
  previous locations of 'bar':
  cb9a9f314b8b  book -f bar
  1e6c11564562  book -r tip bar
  $ hg up
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  updating bookmark bar
  $ hg journal bar
  previous locations of 'bar':
  1e6c11564562  up
  cb9a9f314b8b  book -f bar
  1e6c11564562  book -r tip bar

Test that bookmarks and working copy tracking is not mixed

  $ hg journal
  previous locations of '.':
  1e6c11564562  up
  cb9a9f314b8b  up 0
  1e6c11564562  commit -Aqm b
  cb9a9f314b8b  commit -Aqm a

Test that you can list all entries as well as limit the list or filter on them

  $ hg book -r tip baz
  $ hg journal --all
  previous locations of the working copy and bookmarks:
  1e6c11564562  baz       book -r tip baz
  1e6c11564562  bar       up
  1e6c11564562  .         up
  cb9a9f314b8b  bar       book -f bar
  1e6c11564562  bar       book -r tip bar
  cb9a9f314b8b  .         up 0
  1e6c11564562  .         commit -Aqm b
  cb9a9f314b8b  .         commit -Aqm a
  $ hg journal --limit 2
  previous locations of '.':
  1e6c11564562  up
  cb9a9f314b8b  up 0
  $ hg journal bar
  previous locations of 'bar':
  1e6c11564562  up
  cb9a9f314b8b  book -f bar
  1e6c11564562  book -r tip bar
  $ hg journal foo
  previous locations of 'foo':
  no recorded locations
  $ hg journal .
  previous locations of '.':
  1e6c11564562  up
  cb9a9f314b8b  up 0
  1e6c11564562  commit -Aqm b
  cb9a9f314b8b  commit -Aqm a
  $ hg journal "re:ba."
  previous locations of 're:ba.':
  1e6c11564562  baz       book -r tip baz
  1e6c11564562  bar       up
  cb9a9f314b8b  bar       book -f bar
  1e6c11564562  bar       book -r tip bar

Test that verbose, JSON, template and commit output work

  $ hg journal --verbose --all
  previous locations of the working copy and bookmarks:
  000000000000 -> 1e6c11564562 foobar    baz      1970-01-01 00:00 +0000  book -r tip baz
  cb9a9f314b8b -> 1e6c11564562 foobar    bar      1970-01-01 00:00 +0000  up
  cb9a9f314b8b -> 1e6c11564562 foobar    .        1970-01-01 00:00 +0000  up
  1e6c11564562 -> cb9a9f314b8b foobar    bar      1970-01-01 00:00 +0000  book -f bar
  000000000000 -> 1e6c11564562 foobar    bar      1970-01-01 00:00 +0000  book -r tip bar
  1e6c11564562 -> cb9a9f314b8b foobar    .        1970-01-01 00:00 +0000  up 0
  cb9a9f314b8b -> 1e6c11564562 foobar    .        1970-01-01 00:00 +0000  commit -Aqm b
  000000000000 -> cb9a9f314b8b foobar    .        1970-01-01 00:00 +0000  commit -Aqm a
  $ hg journal --verbose -Tjson
  [
   {
    "command": "up",
    "date": [5.0, 0],
    "name": ".",
    "newhashes": ["1e6c11564562b4ed919baca798bc4338bd299d6a"],
    "oldhashes": ["cb9a9f314b8b07ba71012fcdbc544b5a4d82ff5b"],
    "user": "foobar"
   },
   {
    "command": "up 0",
    "date": [2.0, 0],
    "name": ".",
    "newhashes": ["cb9a9f314b8b07ba71012fcdbc544b5a4d82ff5b"],
    "oldhashes": ["1e6c11564562b4ed919baca798bc4338bd299d6a"],
    "user": "foobar"
   },
   {
    "command": "commit -Aqm b",
    "date": [1.0, 0],
    "name": ".",
    "newhashes": ["1e6c11564562b4ed919baca798bc4338bd299d6a"],
    "oldhashes": ["cb9a9f314b8b07ba71012fcdbc544b5a4d82ff5b"],
    "user": "foobar"
   },
   {
    "command": "commit -Aqm a",
    "date": [0.0, 0],
    "name": ".",
    "newhashes": ["cb9a9f314b8b07ba71012fcdbc544b5a4d82ff5b"],
    "oldhashes": ["0000000000000000000000000000000000000000"],
    "user": "foobar"
   }
  ]

  $ cat <<EOF >> $HGRCPATH
  > [templates]
  > j = "{oldhashes % '{node|upper}'} -> {newhashes % '{node|upper}'}
  >      - user: {user}
  >      - command: {command}
  >      - date: {date|rfc3339date}
  >      - newhashes: {newhashes}
  >      - oldhashes: {oldhashes}
  >      "
  > EOF
  $ hg journal -Tj -l1
  previous locations of '.':
  CB9A9F314B8B07BA71012FCDBC544B5A4D82FF5B -> 1E6C11564562B4ED919BACA798BC4338BD299D6A
  - user: foobar
  - command: up
  - date: 1970-01-01T00:00:05+00:00
  - newhashes: 1e6c11564562b4ed919baca798bc4338bd299d6a
  - oldhashes: cb9a9f314b8b07ba71012fcdbc544b5a4d82ff5b

  $ hg journal --commit
  previous locations of '.':
  1e6c11564562  up
  changeset:   1:1e6c11564562
  bookmark:    bar
  bookmark:    baz
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     b
  
  cb9a9f314b8b  up 0
  changeset:   0:cb9a9f314b8b
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     a
  
  1e6c11564562  commit -Aqm b
  changeset:   1:1e6c11564562
  bookmark:    bar
  bookmark:    baz
  tag:         tip
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     b
  
  cb9a9f314b8b  commit -Aqm a
  changeset:   0:cb9a9f314b8b
  user:        test
  date:        Thu Jan 01 00:00:00 1970 +0000
  summary:     a
  

Test for behaviour on unexpected storage version information

  $ printf '42\0' > .hg/journal
  $ hg journal
  previous locations of '.':
  abort: unknown journal file version '42'
  [255]
  $ hg book -r tip doomed
  unsupported journal file version '42'