mercurial/help/filesets.txt
author Gregory Szorc <gregory.szorc@gmail.com>
Mon, 22 Aug 2016 21:48:50 -0700
changeset 29830 92ac2baaea86
parent 23109 cf56f7a60b45
child 30729 a4bc8fff67fc
permissions -rw-r--r--
revlog: use an LRU cache for delta chain bases Profiling using statprof revealed a hotspot during changegroup application calculating delta chain bases on generaldelta repos. Essentially, revlog._addrevision() was performing a lot of redundant work tracing the delta chain as part of determining when the chain distance was acceptable. This was most pronounced when adding revisions to manifests, which can have delta chains thousands of revisions long. There was a delta chain base cache on revlogs before, but it only captured a single revision. This was acceptable before generaldelta, when _addrevision would build deltas from the previous revision and thus we'd pretty much guarantee a cache hit when resolving the delta chain base on a subsequent _addrevision call. However, it isn't suitable for generaldelta because parent revisions aren't necessarily the last processed revision. This patch converts the delta chain base cache to an LRU dict cache. The cache can hold multiple entries, so generaldelta repos have a higher chance of getting a cache hit. The impact of this change when processing changegroup additions is significant. On a generaldelta conversion of the "mozilla-unified" repo (which contains heads of the main Firefox repositories in chronological order - this means there are lots of transitions between heads in revlog order), this change has the following impact when performing an `hg unbundle` of an uncompressed bundle of the repo: before: 5:42 CPU time after: 4:34 CPU time Most of this time is saved when applying the changelog and manifest revlogs: before: 2:30 CPU time after: 1:17 CPU time That nearly a 50% reduction in CPU time applying changesets and manifests! Applying a gzipped bundle of the same repo (effectively simulating a `hg clone` over HTTP) showed a similar speedup: before: 5:53 CPU time after: 4:46 CPU time Wall time improvements were basically the same as CPU time. I didn't measure explicitly, but it feels like most of the time is saved when processing manifests. This makes sense, as large manifests tend to have very long delta chains and thus benefit the most from this cache. So, this change effectively makes changegroup application (which is used by `hg unbundle`, `hg clone`, `hg pull`, `hg unshelve`, and various other commands) significantly faster when delta chains are long (which can happen on repos with large numbers of files and thus large manifests). In theory, this change can result in more memory utilization. However, we're caching a dict of ints. At most we have 200 ints + Python object overhead per revlog. And, the cache is really only populated when performing read-heavy operations, such as adding changegroups or scanning an individual revlog. For memory bloat to be an issue, we'd need to scan/read several revisions from several revlogs all while having active references to several revlogs. I don't think there are many operations that do this, so I don't think memory bloat from the cache will be an issue.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
14686
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     1
Mercurial supports a functional language for selecting a set of
18960
170fc0949fb6 check-code: check txt files for trailing whitespace
Mads Kiilerich <madski@unity3d.com>
parents: 15825
diff changeset
     2
files.
14686
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     3
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     4
Like other file patterns, this pattern type is indicated by a prefix,
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     5
'set:'. The language supports a number of predicates which are joined
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     6
by infix operators. Parenthesis can be used for grouping.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     7
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     8
Identifiers such as filenames or patterns must be quoted with single
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
     9
or double quotes if they contain characters outside of
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    10
``[.*{}[]?/\_a-zA-Z0-9\x80-\xff]`` or if they match one of the
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    11
predefined predicates. This generally applies to file patterns other
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    12
than globs and arguments for predicates.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    13
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    14
Special characters can be used in quoted identifiers by escaping them,
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    15
e.g., ``\n`` is interpreted as a newline. To prevent them from being
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    16
interpreted, strings can be prefixed with ``r``, e.g. ``r'...'``.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    17
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    18
There is a single prefix operator:
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    19
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    20
``not x``
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    21
  Files not in x. Short form is ``! x``.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    22
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    23
These are the supported infix operators:
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    24
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    25
``x and y``
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    26
  The intersection of files in x and y. Short form is ``x & y``.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    27
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    28
``x or y``
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    29
  The union of files in x and y. There are two alternative short
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    30
  forms: ``x | y`` and ``x + y``.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    31
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    32
``x - y``
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    33
  Files in x but not in y.
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    34
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    35
The following predicates are supported:
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    36
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    37
.. predicatesmarker
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    38
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    39
Some sample queries:
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    40
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    41
- Show status of files that appear to be binary in the working directory::
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    42
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    43
    hg status -A "set:binary()"
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    44
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    45
- Forget files that are in .hgignore but are already tracked::
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    46
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    47
    hg forget "set:hgignore() and not ignored()"
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    48
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    49
- Find text files that contain a string::
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    50
23109
cf56f7a60b45 help: use "hg files" instead of "hg locate" in "hg help filesets"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 18960
diff changeset
    51
    hg files "set:grep(magic) and not binary()"
14686
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    52
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    53
- Find C files in a non-standard encoding::
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    54
23109
cf56f7a60b45 help: use "hg files" instead of "hg locate" in "hg help filesets"
FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
parents: 18960
diff changeset
    55
    hg files "set:**.c and not encoding('UTF-8')"
14686
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    56
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    57
- Revert copies of large binary files::
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    58
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    59
    hg revert "set:copied() and binary() and size('>1M')"
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    60
14829
968c301a1005 help: fileset foo.lst was named files.lst
Arne Babenhauserheide <bab@draketo.de>
parents: 14686
diff changeset
    61
- Remove files listed in foo.lst that contain the letter a or b::
14686
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    62
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    63
    hg remove "set: 'listfile:foo.lst' and (**a* or **b*)"
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    64
6ab8b17adc03 fileset: add a help topic
Matt Mackall <mpm@selenic.com>
parents:
diff changeset
    65
See also :hg:`help patterns`.