view tests/test-sparse-merges.t @ 40021:c537144fdbef

wireprotov2: support response caching

One of the things I've learned from managing VCS servers over the years is that they are hard to scale. It is well known that some companies have very beefy (read: very expensive) servers to power their VCS needs. It is also known that specialized servers for various VCS exist in order to facilitate scaling servers. (Mercurial is in this boat.)

One of the aspects that make a VCS server hard to scale is the high CPU load incurred by constant client clone/pull operations. To alleviate the scaling pain associated with data retrieval operations, I want to integrate caching into the Mercurial wire protocol server as robustly as possible such that servers can aggressively cache responses and defer as much server load as possible.

This commit represents the initial implementation of a general caching layer in wire protocol version 2.

We define a new interface and behavior for a wire protocol cacher in repository.py. (This is probably where a reviewer should look first to understand what is going on.)

The bulk of the added code is in wireprotov2server.py, where we define how a command can opt in to being cached and integrate caching into command dispatching.

From a very high-level:

* A command can declare itself as cacheable by providing a callable that can be used to derive a cache key.
* At dispatch time, if a command is cacheable, we attempt to construct a cacher and use it for serving the request and/or caching the request.
* The dispatch layer handles the bulk of the business logic for caching, making cachers mostly "dumb content stores."
* The mechanism for invalidating cached entries (one of the harder parts about caching in general) is by varying the cache key when state changes. As such, cachers don't need to be concerned with cache invalidation.

Initially, we've hooked up support for caching "manifestdata" and "filedata" commands. These are the simplest to cache, as they should be immutable over time. Caching of commands related to changeset data is a bit harder (because cache validation is impacted by changes to bookmarks, phases, etc). This will be implemented later.

(Strictly speaking, censoring a file should invalidate caches. I've added an inline TODO to track this edge case.)

To prove it works, this commit implements a test-only extension providing in-memory caching backed by an lrucachedict. A new test showing this extension behaving properly is added. FWIW, the cacher is ~50 lines of code, demonstrating the relative ease with which a cache can be added to a server.

While the test cacher is not suitable for production workloads, just for kicks I performed a clone of just the changeset and manifest data for the mozilla-unified repository. With a fully warmed cache (of just the manifest data since changeset data is not cached), server-side CPU usage dropped from ~73s to ~28s. That's pretty significant and demonstrates the potential that response caching has on server scalability!

Differential Revision: https://phab.mercurial-scm.org/D4773
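
The caching flow described in the changeset message can be illustrated with a minimal, self-contained Python sketch. This is not the code from repository.py or wireprotov2server.py: the names `MemoryCacher`, `derive_key`, and `dispatch` are hypothetical, and a plain `OrderedDict` stands in for Mercurial's lrucachedict. It only shows the general idea of a "dumb" content store whose entries are invalidated by varying the cache key when state changes.

```python
from collections import OrderedDict
import hashlib
import json


class MemoryCacher:
    """Hypothetical "dumb content store": a bounded LRU map of key -> response."""

    def __init__(self, maxsize=64):
        self._entries = OrderedDict()
        self._maxsize = maxsize

    def lookup(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def store(self, key, response):
        self._entries[key] = response
        self._entries.move_to_end(key)
        while len(self._entries) > self._maxsize:
            self._entries.popitem(last=False)  # evict least recently used


def derive_key(command, args, state):
    """Hash the command name, its arguments, and any state affecting the result.

    Including `state` in the key is what makes stale entries unreachable:
    when the state changes, the key changes, so no explicit invalidation
    is needed in the cacher itself.
    """
    payload = json.dumps([command, args, state], sort_keys=True)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def dispatch(cacher, command, args, state, handler):
    """Serve from the cache when possible; otherwise run the handler and store.

    Returns a (response, was_cache_hit) pair.
    """
    key = derive_key(command, args, state)
    cached = cacher.lookup(key)
    if cached is not None:
        return cached, True
    response = handler(**args)
    cacher.store(key, response)
    return response, False
```

In this sketch the dispatch function owns all of the caching logic, so a different backend would only need to provide `lookup` and `store`. A real cacher would also have to deal with streaming responses and encoding, which this example ignores.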
author Gregory Szorc <gregory.szorc@gmail.com>
date Wed, 26 Sep 2018 17:16:56 -0700
parents 9db856446298
children 4764e8436b2a

Test merging things outside of the sparse checkout

  $ hg init myrepo
  $ cd myrepo
  $ cat > .hg/hgrc <<EOF
  > [extensions]
  > sparse=
  > EOF

  $ echo foo > foo
  $ echo bar > bar
  $ hg add foo bar
  $ hg commit -m initial

  $ hg branch feature
  marked working directory as branch feature
  (branches are permanent and global, did you want a bookmark?)
  $ echo bar2 >> bar
  $ hg commit -m 'feature - bar2'

  $ hg update -q default
  $ hg debugsparse --exclude 'bar**'

  $ hg merge feature
  temporarily included 1 file(s) in the sparse checkout for merging
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  (branch merge, don't forget to commit)

Verify bar was merged temporarily

  $ ls
  bar
  foo
  $ hg status
  M bar

Verify bar disappears automatically when the working copy becomes clean

  $ hg commit -m "merged"
  cleaned up 1 temporarily added file(s) from the sparse checkout
  $ hg status
  $ ls
  foo

  $ hg cat -r . bar
  bar
  bar2

Test merging things outside of the sparse checkout that are not in the working
copy

  $ hg strip -q -r . --config extensions.strip=
  $ hg up -q feature
  $ touch branchonly
  $ hg ci -Aqm 'add branchonly'

  $ hg up -q default
  $ hg debugsparse -X branchonly
  $ hg merge feature
  temporarily included 2 file(s) in the sparse checkout for merging
  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
  (branch merge, don't forget to commit)

  $ cd ..

Test merging a file that is modified in one branch and deleted in another, where
the file is excluded from the sparse checkout

  $ hg init ytest
  $ cd ytest
  $ echo "syntax: glob" >> .hgignore
  $ echo "*.orig" >> .hgignore
  $ hg ci -Aqm "added .hgignore"
  $ for ch in a d; do echo foo > $ch; hg ci -Aqm "added "$ch; done;
  $ cat >> .hg/hgrc <<EOF
  > [alias]
  > glog = log -GT "{rev}:{node|short} {desc}"
  > [extensions]
  > sparse =
  > EOF

  $ hg glog
  @  2:f29feff37cfc added d
  |
  o  1:617125d27d6b added a
  |
  o  0:53f3774ed939 added .hgignore
  
  $ hg rm d
  $ hg ci -m "removed d"

  $ hg up '.^'
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ hg debugsparse --reset
  $ echo bar >> d
  $ hg ci -Am "added bar to d"
  created new head

  $ hg glog
  @  4:6527874a90e4 added bar to d
  |
  | o  3:372c8558de45 removed d
  |/
  o  2:f29feff37cfc added d
  |
  o  1:617125d27d6b added a
  |
  o  0:53f3774ed939 added .hgignore
  
  $ hg debugsparse --exclude "d"
  $ ls
  a

  $ hg merge
  temporarily included 1 file(s) in the sparse checkout for merging
  file 'd' was deleted in other [merge rev] but was modified in local [working copy].
  What do you want to do?
  use (c)hanged version, (d)elete, or leave (u)nresolved? u
  0 files updated, 0 files merged, 0 files removed, 1 files unresolved
  use 'hg resolve' to retry unresolved file merges or 'hg merge --abort' to abandon
  [1]

  $ cd ..

Test merging a file that is renamed and modified on one side and modified on
the other

  $ hg init mvtest
  $ cd mvtest
  $ echo "syntax: glob" >> .hgignore
  $ echo "*.orig" >> .hgignore
  $ hg ci -Aqm "added .hgignore"
  $ for ch in a d; do echo foo > $ch; hg ci -Aqm "added "$ch; done;
  $ cat >> .hg/hgrc <<EOF
  > [alias]
  > glog = log -GT "{rev}:{node|short} {desc}"
  > [extensions]
  > sparse =
  > EOF

  $ hg glog
  @  2:f29feff37cfc added d
  |
  o  1:617125d27d6b added a
  |
  o  0:53f3774ed939 added .hgignore
  
  $ echo babar >> a
  $ hg ci -m "added babar to a"

  $ hg up '.^'
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
  $ hg mv a amove
  $ hg ci -m "moved a to amove"
  created new head

  $ hg up 3
  1 files updated, 0 files merged, 1 files removed, 0 files unresolved
  $ hg glog
  o  4:5d1e85955f6d moved a to amove
  |
  | @  3:a06e41a6c16c added babar to a
  |/
  o  2:f29feff37cfc added d
  |
  o  1:617125d27d6b added a
  |
  o  0:53f3774ed939 added .hgignore
  
  $ hg debugsparse --exclude "a"
  $ ls
  d

  $ hg merge
  temporarily included 1 file(s) in the sparse checkout for merging
  merging a and amove to amove
  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
  (branch merge, don't forget to commit)

  $ hg up -C 4
  cleaned up 1 temporarily added file(s) from the sparse checkout
  1 files updated, 0 files merged, 0 files removed, 0 files unresolved

  $ hg merge
  merging amove and a to amove
  abort: cannot add 'a' - it is outside the sparse checkout
  (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
  [255]