Gregory Szorc <gregory.szorc@gmail.com> [Sat, 01 Jul 2017 20:51:19 -0700] rev 33389
localrepo: cache types for filtered repos (issue5043)
Python introduces a reference cycle on dynamically created types
via __mro__, making them very easy to leak. See
https://bugs.python.org/issue17950.
Previously, repo.filtered() created a type on every invocation.
Long-running processes (like `hg convert`) could call this
function thousands of times, leading to a steady memory leak.
Since we're unable to stop the leak because this is a bug in
Python, the next best thing is to contain it.
This patch adds a cache of the dynamically generated repoview/filter
types on the localrepo object. Since we only generate each type
once, we cap the amount of memory that can leak to something
reasonable.
After this change, `hg convert` no longer leaks memory on every
revision. The process will likely grow memory usage over time due
to e.g. larger manifests. But there are no leaks.
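A minimal sketch of the caching idea described above, not the actual
localrepo code. The class names, the _filteredrepotypes attribute and
the constructor signatures are all assumptions made for illustration:

```python
class RepoView(object):
    """Hypothetical stand-in for the filtering mixin."""

    def __init__(self, repo, filtername):
        self._unfiltered = repo
        self.filtername = filtername


class LocalRepo(object):
    """Hypothetical stand-in for a repository object."""

    def __init__(self):
        # filter name -> dynamically created type. Each type is created
        # once, so the number of leaked types is bounded by the number
        # of distinct filter names.
        self._filteredrepotypes = {}

    def filtered(self, name):
        cls = self._filteredrepotypes.get(name)
        if cls is None:
            # Only reached the first time a filter name is requested.
            class filteredrepo(RepoView, type(self)):
                pass

            cls = self._filteredrepotypes[name] = filteredrepo
        return cls(self, name)
```

With this shape, calling filtered() thousands of times with the same
filter name creates many view instances but only one type.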
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Tue, 11 Jul 2017 02:10:04 +0900] rev 33388
convert: transcode CVS log messages by specified encoding (issue5597)
Converting from CVS to Mercurial assumes that CVS log messages in "cvs
rlog" output are encoded in UTF-8 (or basic Latin-1). But cvs itself
is usually unaware of the encoding of log messages.
Therefore, if any commit has a log message encoded in something
other than UTF-8, the log message of the corresponding revision in the
converted repository will be broken.
To avoid such broken log messages, this patch transcodes CVS log
messages using the encodings specified via the "convert.cvsps.logencoding"
configuration.
This patch accepts multiple encodings for convenience, because
a mix of encodings in a single repository occurs easily. For example,
UTF-8 (recent POSIX), cp932 (Windows), and EUC-JP (legacy POSIX) are
well-known encodings for Japanese.
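One way to handle a list of candidate encodings is to try them in order
and keep the first one that decodes cleanly. The sketch below assumes
that behavior; the function name and signature are hypothetical and it
is not the code added to the convert extension:

```python
def transcodelog(message, encodings, target='utf-8'):
    """Re-encode a log message (bytes) to target, trying encodings in order."""
    for enc in encodings:
        try:
            text = message.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
        return text.encode(target)
    raise ValueError('no candidate encoding could decode the log message')
```

In such a scheme the order of the list matters: the first encoding that
happens to decode the bytes wins, even if it is not the one the message
was written in.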
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Mon, 10 Jul 2017 23:09:52 +0900] rev 33387
fsmonitor: execute setup procedures only if dirstate is already instantiated
Before this patch, reposetup() of fsmonitor executes setup procedures
for dirstate, even if it isn't yet instantiated at that time.
On the other hand, dirstate might intentionally be instantiated before
reposetup() (prefilling by chg, for example; see
bf3af0eced44 for details). If so, just discarding the already instantiated
one in reposetup() causes an issue.
To resolve both issues above, this patch executes setup procedures
only if dirstate is already instantiated.
BTW, this patch removes "del repo.unfiltered().__dict__['dirstate']",
because that is the responsibility of the code path that causes
instantiation of dirstate before reposetup(). After this patch, using
localrepo.isfilecached() should avoid creating the corresponding entry
in repo.unfiltered().__dict__.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Mon, 10 Jul 2017 23:09:52 +0900] rev 33386
fsmonitor: centralize setup procedures for dirstate
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Mon, 10 Jul 2017 23:09:52 +0900] rev 33385
fsmonitor: avoid needless instantiation of dirstate
Using repo.local() instead of util.safehasattr(repo, 'dirstate') also
avoids executing setup procedures for remote repositories (including
statichttprepo).
This is the reason why this patch also removes a part of the subsequent
comment, and the try/except for AttributeError when accessing repo.wvfs.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Mon, 10 Jul 2017 23:09:51 +0900] rev 33384
journal: use wrapfilecache instead of wrapfunction on func of filecache
wrapfilecache() on filecache-ed property works more strictly than
wrapfunction() directly on func() of filecache.
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Mon, 10 Jul 2017 23:09:51 +0900] rev 33383
journal: execute setup procedures for already instantiated dirstate
If dirstate is instantiated before reposetup() of the journal extension,
it doesn't have the "journalstorage" property, even if it is instantiated
via wrapdirstate() wrapping repo.dirstate(), because wrapdirstate()
behaves the same as the original until reposetup() marks the repo as
"journal"-ing.
This issue can be reproduced by running test-journal.t or
test-journal-share.t with fsmonitor-run-tests.py.
On the other hand, just discarding already instantiated dirstate in
reposetup() prevents chg from filling dirstate before reposetup() (see
bf3af0eced44 for detail).
Therefore, this patch executes setup procedures for already
instantiated dirstate explicitly in reposetup().
To centralize setup procedures for dirstate, this patch also factors
them out from wrapdirstate().
FUJIWARA Katsunori <foozy@lares.dti.ne.jp> [Mon, 10 Jul 2017 23:09:51 +0900] rev 33382
localrepo: add isfilecached to check filecache-ed property is already cached
isfilecached() encapsulates the internal implementation of filecache-ed
properties.
"name in repo.unfiltered().__dict__" or similar can't be used for this
purpose, because the corresponding entry in __dict__ might be discarded by
repo.invalidate(), repo.invalidatedirstate(), and so on (fsmonitor does so,
for example).
This patch makes isfilecached() return not only whether the filecache-ed
property is already cached, but also the already cached value (or None),
in order to avoid subsequent access to the cached object via "repo.NAME",
which prevents the main Mercurial procedure after reposetup() from
validating the cache.
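A rough sketch of the "value plus flag" return shape described above.
The Repo stand-in, its _filecache dict and the order of the returned
pair are assumptions for illustration, not Mercurial's implementation:

```python
class Repo(object):
    """Hypothetical stand-in holding a cache of filecache-ed properties."""

    def __init__(self):
        self._filecache = {}  # property name -> cached object


def isfilecached(repo, name):
    """Return (cached object or None, whether it is cached).

    Callers get the cached object directly, so they never need to touch
    repo.NAME, which would instantiate the property as a side effect.
    """
    if name not in repo._filecache:
        return None, False
    return repo._filecache[name], True
```

A caller in reposetup() could then run its setup only when the second
element of the pair is true.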
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 10 Jul 2017 21:09:46 -0700] rev 33381
sslutil: check for missing certificate and key files (issue5598)
Currently, sslutil._hostsettings() performs validation that web.cacerts
exists. However, client certificates are passed in to the function
and not all callers may validate them. This includes
httpconnection.readauthforuri(), which loads the [auth] section.
If a missing file is specified, the ssl module will raise a generic
IOError. And, it doesn't even give us the courtesy of telling
us which file is missing! Mercurial then prints a generic
"abort: No such file or directory" (or similar) error, leaving users
to scratch their heads as to what file is missing.
This commit introduces explicit validation of all paths passed as
arguments to wrapsocket() and wrapserversocket(). Any missing file
is now reported explicitly.
We should probably catch missing files earlier - as part of loading
the [auth] section. However, I think the sslutil functions should
check for file presence regardless of what callers do because that's
the only way to be sure that missing files are always detected.
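The kind of up-front check described above might look like the sketch
below. The helper name, its keyword arguments and the exception type are
hypothetical; this does not reproduce the sslutil code:

```python
import os


def checkfilespresent(**paths):
    """Raise ValueError naming the first configured path that is missing.

    Keyword names (e.g. certfile, keyfile) are only used as labels in
    the error message. Entries set to None are treated as not configured.
    """
    for label, path in sorted(paths.items()):
        if path is not None and not os.path.exists(path):
            raise ValueError('%s points to a missing file: %s' % (label, path))
```

Checking before handing the paths to the ssl module means the error
message can name the offending file instead of reporting a bare
"No such file or directory".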
Martin von Zweigbergk <martinvonz@google.com> [Fri, 07 Jul 2017 08:55:12 -0700] rev 33380
match: override matchfn instead of __call__ for consistency
The matchers that were recently moved into core from the sparse
extension override __call__, while the previously existing matchers
override matchfn. Let's switch to the latter for consistency.
Martin von Zweigbergk <martinvonz@google.com> [Sun, 09 Jul 2017 17:02:09 -0700] rev 33379
match: express anypats(), not prefix(), in terms of the others
When I added prefix() in 9789b4a7c595 (match: introduce boolean
prefix() method, 2014-10-28), we already had always(), isexact(), and
anypats(), so it made sense to write it in terms of them (a prefix
matcher is one that isn't any of the other types). It's only now that
I realize that it's much more natural to define prefix() explicitly
(it's one that uses path: patterns, roughly speaking) and let
anypats() be defined in terms of the others. Remember that these
methods are all used for determining which fast paths are
possible. anypats() simply means that no fast paths are possible (it
could be called complex() instead). Further evidence is that
rootfilesin:some/dir does not have any patterns, but it's still
considered to be an anypats() matcher. That's because anypats() really
just means that it's not a prefix() matcher (and not always() and not
isexact()).
This patch thus changes prefix() to return False by default and
anypats() to return True only if the other three are False. Having
anypats() be True by default also seems like a good thing, because it
means forgetting to override it will lead only to performance bugs,
not correctness bugs.
Since the base class's implementation changes, we're also forced to
update the subclasses. That change exposed and fixed a bug in the
differencematcher: for example when both its two input matchers were
prefix matchers, we would say that the result was also a prefix
matcher, which is incorrect, because e.g. "path:dir - path:dir/foo" no
longer matches everything under "dir" (which is what prefix() means).
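The relationship between the four predicates described above can be
sketched as follows. The class names are illustrative and the classes
are stripped of all matching logic; this is not the match module itself:

```python
class BaseMatcher(object):
    def always(self):
        return False

    def isexact(self):
        return False

    def prefix(self):
        # False by default; only matchers built from path: style
        # patterns should override this.
        return False

    def anypats(self):
        # True unless one of the fast-path predicates applies, so a
        # subclass that forgets to override anything merely loses fast
        # paths rather than producing wrong results.
        return not self.always() and not self.isexact() and not self.prefix()


class PrefixMatcher(BaseMatcher):
    def prefix(self):
        return True


class DifferenceMatcher(BaseMatcher):
    """m1 minus m2: not a prefix matcher even if both inputs are."""

    def __init__(self, m1, m2):
        self._m1 = m1
        self._m2 = m2
```

Here DifferenceMatcher inherits prefix() returning False, which matches
the reasoning that "path:dir - path:dir/foo" no longer covers everything
under "dir".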
Martin von Zweigbergk <martinvonz@google.com> [Sun, 09 Jul 2017 15:19:27 -0700] rev 33378
match: make nevermatcher an exact matcher and a prefix matcher
The m.isexact() and m.prefix() methods are used by callers to
determine whether m.files() can be used for fast paths. It seems safe
to let callers take any fast paths they can that rely on the empty
m.files().
Jun Wu <quark@fb.com> [Mon, 10 Jul 2017 10:56:40 -0700] rev 33377
revset: define successors revset
This revset returns all successors, including transit nodes and the source
nodes (to be consistent with existing revsets like "ancestors").
To filter out transit nodes, use `successors(X)-obsolete()`.
To filter out divergent case, use `successors(X)-divergent()-obsolete()`.
The revset could be useful to define rebase destination, like:
`max(successors(BASE)-divergent()-obsolete())`. The `max` is to deal with
splits.
There are other implementations where `successors` returns just one level of
successors, and `allsuccessors` returns everything. I think `successors`
returning all successors by default is more user friendly. We have seen
cases in production where people use 1-level `successors` while they really
want `allsuccessors`. So it seems better to just have one single revset
returning all successors by default to avoid user errors.
In the future we might want to add a `depth` keyword argument to it and to
other revsets like `ancestors` etc. Or even build some flexible indexing
syntax [1] to satisfy people who need a depth limit.
[1]: https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-July/101140.html
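To illustrate the "all successors, including transit nodes and the
sources" semantics, here is a small sketch over a toy data model. The
dict of direct successors and both function names are assumptions; real
obsolescence markers carry more information (pruning, for instance, is
ignored here):

```python
def allsuccessors(markers, sources):
    """Return sources plus every node reachable through successor links.

    markers maps a node to the list of its direct successors.
    """
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        for succ in markers.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return seen


def obsolete(markers):
    """Nodes that have at least one successor (simplified definition)."""
    return {node for node, succs in markers.items() if succs}
```

With markers {'a': ['b'], 'b': ['c', 'd']}, the full set for 'a' is
a, b, c and d; subtracting the obsolete nodes leaves c and d, which
mirrors the `successors(X)-obsolete()` idiom for a split.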
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 10 Jul 2017 21:55:43 -0700] rev 33376
sparse: shorten try..except block in updateconfig()
It now only covers refreshwdir(). This is what importfromfiles()
does. I think it is the more appropriate behavior.
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 10 Jul 2017 21:43:19 -0700] rev 33375
sparse: clean up updateconfig()
* Use context manager for wlock
* Rename oldsparsematch to oldmatcher
* Always call parseconfig() because parsing an empty string yields
  the same result as the old code
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 10 Jul 2017 21:39:49 -0700] rev 33374
sparse: move config updating function into core
As part of the move, the ui argument was dropped.
Additional fixups will be made in a follow-up commit.
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 08 Jul 2017 16:18:04 -0700] rev 33373
dirstate: expose a sparse matcher on dirstate (API)
The sparse extension performs a lot of monkeypatching of dirstate
to make it sparse aware. Essentially, various operations need to
take the active sparse config into account. They do this by obtaining
a matcher representing the sparse config and filtering paths through
it.
The monkeypatching is done by stuffing a reference to a repo on
dirstate and calling sparse.matcher() (which takes a repo instance)
during each function call. The reason this function takes a repo
instance is because resolving the sparse config may require resolving
file contents from filelogs, and that requires a repo. (If the
current sparse config references "profile" files, the contents of
those files from the dirstate's parent revisions are resolved.)
I seem to recall people having strong opinions that the dirstate
object not have a reference to a repo. So copying what the sparse
extension does probably won't fly in core. Plus, the dirstate
modifications shouldn't require a full repo: they only need a matcher.
So there's no good reason to stuff a reference to the repo in
dirstate.
This commit exposes a sparse matcher to dirstate via a property that
when looked up will call a function that eventually calls
sparse.matcher(). The repo instance is bound in a closure, so it
isn't exposed to dirstate.
This approach is functionally similar to what the sparse extension does
today, except it hides the repo instance from dirstate. The approach
is not optimal because we have to call a proxy function and
sparse.matcher() on every property lookup. There is room to cache
the matcher instance in dirstate. After all, the matcher only changes
if the dirstate's parents change or if the sparse config changes. It
feels like we should be able to detect both events and update the
matcher when this occurs. But for now we preserve the existing
semantics so we can move the dirstate sparseness bits into core. Once
in core, refactoring becomes a bit easier since it will be clearer how
all these components interact.
The sparse extension has been updated to use the new property.
Because all references to the repo on dirstate have been removed,
the code for setting it has been removed.
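The closure-based approach described above can be sketched like this.
Every name here (Dirstate, Repo, makematcher, sparserules, the
_sparsematcher property) is a hypothetical stand-in, and the "matcher"
is reduced to a simple prefix test:

```python
def makematcher(repo):
    """Build a matcher from the repo's current sparse rules (toy version)."""
    prefixes = tuple(repo.sparserules)
    return lambda path: path.startswith(prefixes)


class Dirstate(object):
    def __init__(self, sparsematchfn):
        # A zero-argument callable; dirstate never sees the repo itself.
        self._sparsematchfn = sparsematchfn

    @property
    def _sparsematcher(self):
        # Re-resolved on every lookup: simple, but not cached.
        return self._sparsematchfn()


class Repo(object):
    def __init__(self, sparserules):
        self.sparserules = list(sparserules)
        # The repo is captured in the closure rather than stored on dirstate.
        self.dirstate = Dirstate(lambda: makematcher(self))
```

Because the property calls the function each time, a change to the
sparse rules is picked up on the next lookup, at the cost of rebuilding
the matcher on every access.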
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 08 Jul 2017 15:42:11 -0700] rev 33372
sparse: use self instead of repo.dirstate
"self" here is the dirstate instance. I'm pretty confident that self
and repo.dirstate will be the exact same object. So remove a dependency
on repo by just looking at self.
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 08 Jul 2017 14:15:07 -0700] rev 33371
sparse: move code for importing rules from files into core
This is a pretty straightforward port. Some code cleanup was
performed. But no major changes to the logic were made.
I'm not a huge fan of this function because it does multiple
things. I'd like to get things into core first to facilitate
refactoring later.
Please also note the added inline comment about the oddities
of writeconfig() and the try..except to undo it. This is because
of the hackiness in which the sparse matcher is obtained by
various consumers, notably dirstate. We'll need a massive
refactor to address this. That refactor is effectively blocked
on having the sparse dirstate hacks live in core.
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 08 Jul 2017 14:01:32 -0700] rev 33370
sparse: refactor activeprofiles into a generic function (API)
activeprofiles() is a special case of a more generic function.
Furthermore, that generic function is essentially already
implemented inline in the sparse extension.
So, refactor activeprofiles() to a generic activeconfig(). Change
the only consumer of activeprofiles() to use it. And have the
inline implementation in the sparse extension use it.
Augie Fackler <raf@durin42.com> [Fri, 07 Jul 2017 15:11:11 -0400] rev 33369
check-code: prohibit `if False` antipattern
Differential Revision: https://phab.mercurial-scm.org/D20
Augie Fackler <raf@durin42.com> [Fri, 07 Jul 2017 15:08:23 -0400] rev 33368
convert: remove `if False` block
This code has never run since its introduction on July 18th,
2007. It's time for it to go.
Differential Revision: https://phab.mercurial-scm.org/D19
Augie Fackler <raf@durin42.com> [Fri, 07 Jul 2017 15:07:36 -0400] rev 33367
filterpyflakes: move self-test into test file
This will avoid a false positive on an upcoming check-code rule.
Differential Revision: https://phab.mercurial-scm.org/D18
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 16:38:04 -0400] rev 33366
test-subrepo: demonstrate a status problem when merge deletes a file
At the interactive update prompt, if (c) is chosen and then followed by `hg rm`,
both `status -R` and `status -S` show the file as 'R', and `files -R` shows no
files (OK, because explicitly removed files aren't supposed to be listed). If
`rm` follows selecting (c), then both flavors of `status` list the file as '!',
and `files -R` lists the missing file. So somehow, the (d) option has followed
a third path.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 16:13:30 -0400] rev 33365
subrepo: make the output references to subrepositories consistent
Well, mostly. The annotation on subrepo functions tacks on a parenthetical to
the abort message, which seems reasonable for a generic mechanism. But now all
messages consistently spell out 'subrepository', and double quote the name of
the repo. I noticed the inconsistency in the change for the last commit.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 02:55:46 -0400] rev 33364
subrepo: consider the parent repo dirty when a file is missing
This simply passes the 'missing' argument down from the context of the parent
repo, so the same rules apply. subrepo.bailifchanged() is hardcoded to care
about missing files, because cmdutil.bailifchanged() is too.
In the end, it looks like this addresses inconsistencies with 'archive',
'identify', blackbox logs, 'merge', and 'update --check'. I wasn't sure how to
implement this in git, so that's left for someone more familiar with it.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 02:46:03 -0400] rev 33363
archival: flag missing files as a dirty wdir() in the metadata file (BC)
Since the identify command adds a '+' for missing files, it's reasonable that
this does too. Perhaps the node field's hex value should be p1+p2 for merges?
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 00:53:16 -0400] rev 33362
cmdutil: simplify the dirty check in howtocontinue()
This is equivalent to the previous code. But it seems to me that if the user is
going to be prompted that a commit is needed, missing files should be ignored,
but branch and merge changes shouldn't be.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 00:23:03 -0400] rev 33361
blackbox: simplify the dirty check
Same idea (and possibly incorrect behavior) as the previous commit.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 00:19:03 -0400] rev 33360
identify: simplify the dirty check
This is equivalent to the previous code, but it seems better to be explicit
about what aspects of dirty are being ignored. Perhaps they shouldn't be, since
the help text says 'followed by a "+" if the working directory has uncommitted
changes'. Both merges and branch changes are committable, even if the files are
unchanged.
Additionally, this will make the `identify` command notice missing subrepo
files, once subrepos are taught to look for missing files.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 09 Jul 2017 00:05:31 -0400] rev 33359
tests: tweak the subrepo dirty state tests
This is a continuation of 439b4d005b4a. I overlooked that blackbox logs also
have a dirty marker. Also, the `hg update --check` test was updating to a
revision where the deleted file wasn't tracked, which is why status seemed to
show the deleted file was restored.
Martin von Zweigbergk <martinvonz@google.com> [Sun, 09 Jul 2017 23:01:11 -0700] rev 33358
match: combine regex code for path: and relpath:
The regexes for path: and relpath: patterns are the same (since the
paths have already been normalized at the point we create the
regexes).
I don't think the "if pat == '.'" will have any effect for relpath:
because relpath: patterns will have the root directory already
normalized to '' by pathutil.canonpath() (unlike path:, for which the
root gets normalized to '.' by util.normpath()).