match: support rooted globs in hgignore
In a .hgignore, "glob:foo" always means "**/foo". This cannot be
avoided because there is no syntax like "^" in regexes to say you
don't want the implied "**/" (of course one can use regexes, but glob
syntax is nice).
When you have a long list of fairly specific globs like
path/to/some/thing, this has two consequences:
1. unintended files may be ignored (not too common though)
2. matching performance can suffer significantly
Here is vanilla hg status timing on a private repository:
Using syntax:glob everywhere
real 0m2.199s
user 0m1.545s
sys 0m0.619s
When rooting the appropriate globs
real 0m1.434s
user 0m0.847s
sys 0m0.565s
(Tangentially, none of this shows up in --profile's output. It
seems that C code doesn't play well with profiling.)
The code already supports this, but there is no syntax to make use of
it, so it seems reasonable to create such syntax. This patch adds a new
hgignore syntax, "rootglob".
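To illustrate the difference in semantics, here is a minimal Python
sketch. It is not the code in match.py: simple_glob_to_re, unrooted and
rooted are hypothetical helpers, the translator only handles "*" and
"?", and the "(?:|.*/)" prefix and "(?:/|$)" suffix are assumptions
about how an implied "**/" and a directory match could be written as
regexes.

```python
import re


def simple_glob_to_re(pat):
    """Translate a glob that uses only '*' and '?' into a regex fragment.

    Hypothetical, simplified stand-in for a real glob translator:
    '*' and '?' never cross a '/' and nothing else is special.
    """
    out = []
    for ch in pat:
        if ch == '*':
            out.append('[^/]*')
        elif ch == '?':
            out.append('[^/]')
        else:
            out.append(re.escape(ch))
    return ''.join(out)


# Assumed suffix: a matched directory also covers everything beneath it.
SUFFIX = r'(?:/|$)'


def unrooted(pat):
    # "glob:" in a .hgignore, as described above: an implied "**/" prefix,
    # so the pattern may start at any directory level.
    return re.compile(r'(?:|.*/)' + simple_glob_to_re(pat) + SUFFIX)


def rooted(pat):
    # "rootglob:": the pattern must match from the repository root.
    return re.compile(simple_glob_to_re(pat) + SUFFIX)


# Same layout as the "Check rooted globs" test in test-hgignore.t.
files = ['a/b.ext', 'aa/b.ext', 'b/a/b.ext']

ignored_unrooted = [f for f in files if unrooted('a/*.ext').match(f)]
ignored_rooted = [f for f in files if rooted('a/*.ext').match(f)]

print(ignored_unrooted)
print(ignored_rooted)
```

With the unrooted form, both a/b.ext and b/a/b.ext are selected; with
the rooted form only a/b.ext is, which is the result the new test
expects from "syntax: rootglob".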
Differential Revision: https://phab.mercurial-scm.org/D5493
--- a/mercurial/help/hgignore.txt Wed Nov 07 15:45:09 2018 -0800
+++ b/mercurial/help/hgignore.txt Thu Jan 03 19:02:46 2019 -0500
@@ -59,14 +59,17 @@
Regular expression, Python/Perl syntax.
``glob``
Shell-style glob.
+``rootglob``
+ A variant of ``glob`` that is rooted (see below).
The chosen syntax stays in effect when parsing all patterns that
follow, until another syntax is selected.
-Neither glob nor regexp patterns are rooted. A glob-syntax pattern of
-the form ``*.c`` will match a file ending in ``.c`` in any directory,
-and a regexp pattern of the form ``\.c$`` will do the same. To root a
-regexp pattern, start it with ``^``.
+Neither ``glob`` nor regexp patterns are rooted. A glob-syntax
+pattern of the form ``*.c`` will match a file ending in ``.c`` in any
+directory, and a regexp pattern of the form ``\.c$`` will do the
+same. To root a regexp pattern, start it with ``^``. To get the same
+effect with glob-syntax, you have to use ``rootglob``.
Subdirectories can have their own .hgignore settings by adding
``subinclude:path/to/subdir/.hgignore`` to the root ``.hgignore``. See
--- a/mercurial/help/patterns.txt Wed Nov 07 15:45:09 2018 -0800
+++ b/mercurial/help/patterns.txt Thu Jan 03 19:02:46 2019 -0500
@@ -20,7 +20,9 @@
To use an extended glob, start a name with ``glob:``. Globs are rooted
at the current directory; a glob such as ``*.c`` will only match files
-in the current directory ending with ``.c``.
+in the current directory ending with ``.c``. ``rootglob:`` can be used
+instead of ``glob:`` for a glob that is rooted at the root of the
+repository.
The supported glob syntax extensions are ``**`` to match any string
across path separators and ``{a,b}`` to mean "a or b".
@@ -64,6 +66,7 @@
foo/*.c any name ending in ".c" in the directory foo
foo/**.c any name ending in ".c" in any subdirectory of foo
including itself.
+ rootglob:*.c any name ending in ".c" in the root of the repository
Regexp examples::
--- a/mercurial/match.py Wed Nov 07 15:45:09 2018 -0800
+++ b/mercurial/match.py Thu Jan 03 19:02:46 2019 -0500
@@ -25,6 +25,7 @@
)
allpatternkinds = ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
+ 'rootglob',
'listfile', 'listfile0', 'set', 'include', 'subinclude',
'rootfilesin')
cwdrelativepatternkinds = ('relpath', 'glob')
@@ -221,7 +222,7 @@
for kind, pat in [_patsplit(p, default) for p in patterns]:
if kind in cwdrelativepatternkinds:
pat = pathutil.canonpath(root, cwd, pat, auditor)
- elif kind in ('relglob', 'path', 'rootfilesin'):
+ elif kind in ('relglob', 'path', 'rootfilesin', 'rootglob'):
pat = util.normpath(pat)
elif kind in ('listfile', 'listfile0'):
try:
@@ -1137,7 +1138,7 @@
if pat.startswith('^'):
return pat
return '.*' + pat
- if kind == 'glob':
+ if kind in ('glob', 'rootglob'):
return _globre(pat) + globsuffix
raise error.ProgrammingError('not a regex pattern: %s:%s' % (kind, pat))
@@ -1252,7 +1253,7 @@
r = []
d = []
for kind, pat, source in kindpats:
- if kind == 'glob': # find the non-glob prefix
+ if kind in ('glob', 'rootglob'): # find the non-glob prefix
root = []
for p in pat.split('/'):
if '[' in p or '{' in p or '*' in p or '?' in p:
@@ -1351,6 +1352,7 @@
syntax: glob # defaults following lines to non-rooted globs
re:pattern # non-rooted regular expression
glob:pattern # non-rooted glob
+ rootglob:pat # rooted glob (same root as ^ in regexps)
pattern # pattern of the current default type
if sourceinfo is set, returns a list of tuples:
@@ -1361,6 +1363,7 @@
're': 'relre:',
'regexp': 'relre:',
'glob': 'relglob:',
+ 'rootglob': 'rootglob:',
'include': 'include',
'subinclude': 'subinclude',
}
--- a/tests/test-hgignore.t Wed Nov 07 15:45:09 2018 -0800
+++ b/tests/test-hgignore.t Thu Jan 03 19:02:46 2019 -0500
@@ -239,6 +239,17 @@
dir/c.o is ignored
(ignore rule in $TESTTMP/ignorerepo/.hgignore, line 2: 'dir/**/c.o') (glob)
+Check rooted globs
+
+ $ hg purge --all --config extensions.purge=
+ $ echo "syntax: rootglob" > .hgignore
+ $ echo "a/*.ext" >> .hgignore
+ $ for p in a b/a aa; do mkdir -p $p; touch $p/b.ext; done
+ $ hg status -A 'set:**.ext'
+ ? aa/b.ext
+ ? b/a/b.ext
+ I a/b.ext
+
Check using 'include:' in ignore file
$ hg purge --all --config extensions.purge=
@@ -257,10 +268,15 @@
Check recursive uses of 'include:'
$ echo "include:nested/ignore" >> otherignore
- $ mkdir nested
+ $ mkdir nested nested/more
$ echo "glob:*ignore" > nested/ignore
+ $ echo "rootglob:a" >> nested/ignore
+ $ touch a nested/a nested/more/a
$ hg status
A dir/b.o
+ ? nested/a
+ ? nested/more/a
+ $ rm a nested/a nested/more/a
$ cp otherignore goodignore
$ echo "include:badignore" >> otherignore
@@ -291,18 +307,26 @@
? dir1/file2
? dir2/file1
-Check including subincludes with regexs
+Check including subincludes with other patterns
$ echo "subinclude:dir1/.hgignore" >> .hgignore
+
+ $ mkdir dir1/subdir
+ $ touch dir1/subdir/file1
+ $ echo "rootglob:f?le1" > dir1/.hgignore
+ $ hg status
+ ? dir1/file2
+ ? dir1/subdir/file1
+ ? dir2/file1
+ $ rm dir1/subdir/file1
+
$ echo "regexp:f.le1" > dir1/.hgignore
-
$ hg status
? dir1/file2
? dir2/file1
Check multiple levels of sub-ignores
- $ mkdir dir1/subdir
$ touch dir1/subdir/subfile1 dir1/subdir/subfile3 dir1/subdir/subfile4
$ echo "subinclude:subdir/.hgignore" >> dir1/.hgignore
$ echo "glob:subfil*3" >> dir1/subdir/.hgignore