changeset 40242:19ed212de2d1
match: optimize matcher when all patterns are of rootfilesin kind
Internally at Google, we use narrowspecs with only rootfilesin-kind
patterns. Sometimes there are thousands of such patterns
(i.e. thousands of tracked directories). In such cases, it can take
quite long to build and evaluate the resulting matcher.
This patch optimizes matchers that have only patterns of rootfilesin
kind: instead of creating a regular expression, the matcher checks the
given file's directory against the set of directories.
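The idea can be sketched outside Mercurial as follows. This is a minimal illustration, not the code from the patch: `build_regex_matcher` and `build_set_matcher` are hypothetical helper names, the regex shape is modeled on the `(?:beans/[^/]+$)` patterns visible in the old test output, and the sample paths are borrowed from tests/test-walk.t. It is written for Python 3.

```python
import re


def build_regex_matcher(dirs):
    """Regex approach: one alternative per directory (hypothetical helper)."""
    parts = []
    for d in dirs:
        if d == '.':
            # Files directly in the repository root.
            parts.append(r'[^/]+$')
        else:
            # Files directly in directory d, but not in its subdirectories.
            parts.append(re.escape(d) + r'/[^/]+$')
    return re.compile('(?:%s)' % '|'.join(parts)).match


def build_set_matcher(dirs):
    """Set approach: look up the file's parent directory (hypothetical helper)."""
    dirset = set(dirs)

    def match(f):
        i = f.rfind('/')
        parent = f[:i] if i >= 0 else '.'
        return parent in dirset

    return match


dirs = ['.', 'beans', 'mammals/Procyonidae']
files = [
    'fennel',
    'beans/black',
    'mammals/skunk',
    'mammals/Procyonidae/raccoon',
    'beans/sub/x',
]

regex_match = build_regex_matcher(dirs)
set_match = build_set_matcher(dirs)

# Each entry pairs a path with the set-based verdict.
results = [(f, set_match(f)) for f in files]
# Both approaches should agree on every path.
agree = all(bool(regex_match(f)) == set_match(f) for f in files)
```

The set lookup costs one `rfind` and one hash probe per file regardless of how many directories are tracked, whereas the regex has to be compiled from every pattern and may try many alternatives per file.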
In a repo with ~3600 tracked directories, it takes about 1.35 s to
build the matcher and 2.7 s to walk the dirstate before this
patch. After, it takes 0.04 s to create the matcher and 0.87 s to walk
the dirstate.
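A rough way to observe the same kind of difference on synthetic data is sketched below. It does not reproduce the measurements above: the directory layout, the helper names (`make_regex_matcher`, `make_set_matcher`, `compare`), and the file counts are all assumptions made for illustration, and absolute timings will vary by machine and Python version.

```python
import re
import time


def make_regex_matcher(dirs):
    # Hypothetical stand-in for the regex path: one alternative per directory.
    pattern = '(?:%s)' % '|'.join(re.escape(d) + r'/[^/]+$' for d in dirs)
    return re.compile(pattern).match


def make_set_matcher(dirs):
    # Hypothetical stand-in for the set path: parent-directory lookup.
    dirset = set(dirs)

    def match(f):
        i = f.rfind('/')
        return (f[:i] if i >= 0 else '.') in dirset

    return match


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def compare(n_dirs=3600, files_per_dir=2):
    # Synthetic tracked directories plus files inside and outside them.
    dirs = ['pkg%d/sub%d' % (i % 60, i) for i in range(n_dirs)]
    files = ['%s/file%d' % (d, j) for d in dirs for j in range(files_per_dir)]
    files += ['untracked%d/file' % i for i in range(n_dirs)]

    timings, matched = {}, {}
    factories = (('regex', make_regex_matcher), ('set', make_set_matcher))
    for name, factory in factories:
        matcher, build = timed(factory, dirs)
        hits, walk = timed(lambda: sum(1 for f in files if matcher(f)))
        timings[name] = (build, walk)
        matched[name] = hits
    return timings, matched


# A smaller directory count keeps the sketch quick to run.
timings, matched = compare(n_dirs=500)
for name, (build, walk) in timings.items():
    print('%-5s build=%.4fs walk=%.4fs' % (name, build, walk))
```

Both matchers should report the same number of matched files; only the build and walk times are expected to differ.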
It may be worthwhile to do similar optimizations for e.g. patterns of
type "kind:", but that's not a priority for us right now.
Differential Revision: https://phab.mercurial-scm.org/D5058
author | Martin von Zweigbergk <martinvonz@google.com> |
---|---|
date | Sat, 13 Oct 2018 00:22:05 -0700 |
parents | 81e4f039a0cd |
children | 96e50dfd8c94 |
files | mercurial/match.py tests/test-walk.t |
diffstat | 2 files changed, 30 insertions(+), 18 deletions(-) |
--- a/mercurial/match.py Sat Oct 13 06:02:27 2018 -0400
+++ b/mercurial/match.py Sat Oct 13 00:22:05 2018 -0700
@@ -1164,8 +1164,20 @@
 
     regex = ''
     if kindpats:
-        regex, mf = _buildregexmatch(kindpats, globsuffix)
-        matchfuncs.append(mf)
+        if all(k == 'rootfilesin' for k, p, s in kindpats):
+            dirs = {p for k, p, s in kindpats}
+            def mf(f):
+                i = f.rfind('/')
+                if i >= 0:
+                    dir = f[:i]
+                else:
+                    dir = '.'
+                return dir in dirs
+            regex = b'rootfilesin: %s' % sorted(dirs)
+            matchfuncs.append(mf)
+        else:
+            regex, mf = _buildregexmatch(kindpats, globsuffix)
+            matchfuncs.append(mf)
 
     if len(matchfuncs) == 1:
         return regex, matchfuncs[0]
--- a/tests/test-walk.t Sat Oct 13 06:02:27 2018 -0400
+++ b/tests/test-walk.t Sat Oct 13 00:22:05 2018 -0700
@@ -143,25 +143,25 @@
 
   $ hg debugwalk -v 'rootfilesin:'
   * matcher:
-  <patternmatcher patterns='(?:[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['.']">
   f fennel ../fennel
   f fenugreek ../fenugreek
   f fiddlehead ../fiddlehead
   $ hg debugwalk -v -I 'rootfilesin:'
   * matcher:
-  <includematcher includes='(?:[^/]+$)'>
+  <includematcher includes="rootfilesin: ['.']">
   f fennel ../fennel
   f fenugreek ../fenugreek
   f fiddlehead ../fiddlehead
   $ hg debugwalk -v 'rootfilesin:.'
   * matcher:
-  <patternmatcher patterns='(?:[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['.']">
   f fennel ../fennel
   f fenugreek ../fenugreek
   f fiddlehead ../fiddlehead
   $ hg debugwalk -v -I 'rootfilesin:.'
   * matcher:
-  <includematcher includes='(?:[^/]+$)'>
+  <includematcher includes="rootfilesin: ['.']">
   f fennel ../fennel
   f fenugreek ../fenugreek
   f fiddlehead ../fiddlehead
@@ -169,7 +169,7 @@
   * matcher:
   <differencematcher
     m1=<alwaysmatcher>,
-    m2=<includematcher includes='(?:[^/]+$)'>>
+    m2=<includematcher includes="rootfilesin: ['.']">>
   f beans/black ../beans/black
   f beans/borlotti ../beans/borlotti
   f beans/kidney ../beans/kidney
@@ -182,19 +182,19 @@
   f mammals/skunk skunk
   $ hg debugwalk -v 'rootfilesin:fennel'
   * matcher:
-  <patternmatcher patterns='(?:fennel/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['fennel']">
   $ hg debugwalk -v -I 'rootfilesin:fennel'
   * matcher:
-  <includematcher includes='(?:fennel/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['fennel']">
   $ hg debugwalk -v 'rootfilesin:skunk'
   * matcher:
-  <patternmatcher patterns='(?:skunk/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['skunk']">
   $ hg debugwalk -v -I 'rootfilesin:skunk'
   * matcher:
-  <includematcher includes='(?:skunk/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['skunk']">
   $ hg debugwalk -v 'rootfilesin:beans'
   * matcher:
-  <patternmatcher patterns='(?:beans/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['beans']">
   f beans/black ../beans/black
   f beans/borlotti ../beans/borlotti
   f beans/kidney ../beans/kidney
@@ -203,7 +203,7 @@
   f beans/turtle ../beans/turtle
   $ hg debugwalk -v -I 'rootfilesin:beans'
   * matcher:
-  <includematcher includes='(?:beans/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['beans']">
   f beans/black ../beans/black
   f beans/borlotti ../beans/borlotti
   f beans/kidney ../beans/kidney
@@ -212,25 +212,25 @@
   f beans/turtle ../beans/turtle
   $ hg debugwalk -v 'rootfilesin:mammals'
   * matcher:
-  <patternmatcher patterns='(?:mammals/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['mammals']">
   f mammals/skunk skunk
   $ hg debugwalk -v -I 'rootfilesin:mammals'
   * matcher:
-  <includematcher includes='(?:mammals/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['mammals']">
   f mammals/skunk skunk
   $ hg debugwalk -v 'rootfilesin:mammals/'
   * matcher:
-  <patternmatcher patterns='(?:mammals/[^/]+$)'>
+  <patternmatcher patterns="rootfilesin: ['mammals']">
   f mammals/skunk skunk
   $ hg debugwalk -v -I 'rootfilesin:mammals/'
   * matcher:
-  <includematcher includes='(?:mammals/[^/]+$)'>
+  <includematcher includes="rootfilesin: ['mammals']">
   f mammals/skunk skunk
   $ hg debugwalk -v -X 'rootfilesin:mammals'
   * matcher:
   <differencematcher
     m1=<alwaysmatcher>,
-    m2=<includematcher includes='(?:mammals/[^/]+$)'>>
+    m2=<includematcher includes="rootfilesin: ['mammals']">>
   f beans/black ../beans/black
   f beans/borlotti ../beans/borlotti
   f beans/kidney ../beans/kidney