Gregory Szorc <gregory.szorc@gmail.com> [Sat, 18 Jul 2015 10:57:20 -0700] rev 25823
changegroup: compute seen files as changesets are added (issue4750)
Before this patch, addchangegroup() would walk the changelog and compute
the set of seen files between applying changesets and applying
manifests. When cloning large repositories such as mozilla-central,
this consumed a non-trivial amount of time. On my MBP, this walk takes
~10s. On a dainty EC2 instance, this was measured to take ~125s! On the
latter machine, this delay was enough for the Mercurial server to
disconnect the client, thinking it had timed out, thus causing a clone
to abort.
This patch enables the changelog to compute the set of changed files as
new revisions are added. By doing so, we:
* avoid a potentially heavy computation between changelog and manifest
processing by spreading the computation across all changelog additions
* avoid extra reads from the changelog by operating on the data as it is
added
The downside of this is that the add revision callback does result in
extra I/O. Before, we would perform a flush (and subsequent read to
construct the full revision) when new delta chains were created. For
changelogs, this is typically every 2-4 revisions. Using the callback
guarantees there will be a flush after every added revision *and* an
open + read of the changelog to obtain the full revision in order to
read the added files. So, this increases the frequency of these
operations by the average chain length. In the future, the revlog
should be smart enough to know how to read revisions that haven't been
flushed yet, thus eliminating this extra I/O.
On my MBP, the total CPU times for an `hg unbundle` with a local
mozilla-central gzip bundle containing 251,934 changesets and 211,065
files did not have a statistically significant change with this patch,
holding steady around 360s. So, the increased revlog flushing did not
have a measurable effect on total CPU time.
With this patch, there is no longer a visible pause between applying
changeset and manifest data. Before, it sure felt like Mercurial was
lethargic making this transition. Now, the transition is nearly
instantaneous, giving the impression that Mercurial is faster. Of course,
eliminating this pause means that the potential for network disconnect due
to channel inactivity during the changelog walk is eliminated as well.
And that is the impetus behind this change.
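The idea of accumulating the set of seen files through a per-revision
callback can be sketched roughly as follows. This is a toy illustration, not
Mercurial's code: ToyChangelog, its addgroup() signature, the on_added
parameter and read_files() are all hypothetical names invented for the
sketch, and the in-memory list stands in for the real revlog storage (so the
extra flush and read described above is not modeled).

```python
class ToyChangelog:
    """Hypothetical in-memory stand-in for a changelog (not the real revlog)."""

    def __init__(self):
        self._revisions = []

    def read_files(self, rev):
        # In the real changelog this would require reading the full revision.
        return self._revisions[rev]["files"]

    def addgroup(self, incoming, on_added=None):
        """Append each incoming entry, invoking on_added(self, rev) if given."""
        added = []
        for entry in incoming:
            self._revisions.append(entry)
            rev = len(self._revisions) - 1
            added.append(rev)
            if on_added is not None:
                on_added(self, rev)
        return added


def apply_changesets(changelog, incoming):
    """Add changesets and return the set of files they touch.

    The set is built incrementally as revisions are added, so no separate
    walk over the changelog is needed afterwards.
    """
    seen_files = set()

    def on_revision_added(cl, rev):
        seen_files.update(cl.read_files(rev))

    changelog.addgroup(incoming, on_added=on_revision_added)
    return seen_files
```

With this shape, the set of files is already available when manifest
processing would begin, which is the property the patch relies on to remove
the pause.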
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 18 Jul 2015 10:29:37 -0700] rev 25822
revlog: add support for a callback whenever revisions are added
A subsequent patch will add a feature that performs iterative
computation as changesets are added from a changegroup. To facilitate
this type of processing in a generic manner, we add a mechanism for
calling a function whenever a revision is added via revlog.addgroup().
There are potential performance concerns with this callback, as using it
will flush the revlog after every revision is added.
Laurent Charignon <lcharignon@fb.com> [Fri, 17 Jul 2015 13:44:01 -0700] rev 25821
crecord: throws error instead of crashing for large diffs
Before this patch, crecord was crashing for large diffs
(30k lines on my laptop). This patch catches the exception raised in that case
and uses the error reporting mechanism introduced in the previous patch to
notify the user of the issue. It is not possible to add a test for this for
now, as we don't yet have full-blown UI tests for the curses interface.
Laurent Charignon <lcharignon@fb.com> [Fri, 17 Jul 2015 13:41:17 -0700] rev 25820
crecord: add error reporting for failure in curses interface initialization
Before this patch, we couldn't report to the user any error that occurred:
- after we enabled the curses interface but
- before the interface is set up and drawn
This patch provides a way to set errors that happen during the initialization
of the interface and log them once the curses interface has been displayed.
Yuya Nishihara <yuya@tcha.org> [Sun, 05 Jul 2015 12:15:54 +0900] rev 25819
revset: parse nullary ":" operator as "0:tip"
This is necessary for compatibility with the old-style parser that will be
removed by future patches.
Yuya Nishihara <yuya@tcha.org> [Mon, 06 Jul 2015 22:01:41 +0900] rev 25818
parser: take suffix action if no infix action is defined
If no infix action is defined, a suffix action isn't ambiguous, so it should
be taken no matter if the next token can be an operand. This is exactly the
same flow as prefix/primary handling.
This change has no effect now because all suffix tokens have infix actions.
Yuya Nishihara <yuya@tcha.org> [Mon, 06 Jul 2015 21:55:55 +0900] rev 25817
parser: reorder infix/suffix handling to be similar to prefix/primary flow
It can be exactly the same flow as the prefix/primary handling. A suffix
action is accepted only if the next token never starts a new term.
Yuya Nishihara <yuya@tcha.org> [Sun, 05 Jul 2015 12:09:27 +0900] rev 25816
parser: resolve ambiguity where both prefix and primary actions are defined
If both actions are defined, a primary-expression action is accepted only if
the next token never starts a new term. For example,
  parsed as primary expression:
    ":"    # next token 'end' has no action
    "(:)"  # next token ')' has no action
    ":+y"  # next token '+' is infix operator
  parsed as prefix operator:
    ":y"   # next token 'y' is primary expression
    ":-y"  # next token '-' is prefix operator
This is mostly the same resolution as the infix/suffix rules.
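The resolution rule can be sketched as a small lookup. This is an
illustrative model only: the ACTIONS table, the token names and both
functions are hypothetical and much simpler than the real elements table;
the only thing taken from the message above is the rule that a primary
action wins exactly when the next token cannot start a new term.

```python
# Hypothetical miniature action table: token type -> roles it can play.
ACTIONS = {
    "symbol": {"primary"},
    "(": {"prefix"},
    ")": set(),
    "-": {"prefix", "infix"},
    "+": {"infix"},
    ":": {"primary", "prefix", "infix", "suffix"},
    "end": set(),
}


def starts_term(token_type):
    """A token can start a new term if it has a primary or prefix action."""
    return bool(ACTIONS[token_type] & {"primary", "prefix"})


def resolve_leading(token_type, next_type):
    """Decide how a token at the start of a term should be parsed."""
    actions = ACTIONS[token_type]
    if "primary" in actions and "prefix" in actions:
        # Ambiguous: act as a prefix operator only if an operand can follow.
        return "prefix" if starts_term(next_type) else "primary"
    if "primary" in actions:
        return "primary"
    if "prefix" in actions:
        return "prefix"
    raise ValueError("token %r cannot start a term" % token_type)
```

For instance, resolve_leading(":", "symbol") models the ":y" case and
resolve_leading(":", "+") models the ":+y" case from the list above.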
Yuya Nishihara <yuya@tcha.org> [Sun, 05 Jul 2015 12:02:13 +0900] rev 25815
parser: separate actions for primary expression and prefix operator
This will allow us to define both a primary expression, ":", and a prefix
operator, ":y". The ambiguity will be resolved by the next patch.
Prefix actions in elements table are adjusted as follows:
  original prefix       primary      prefix
  -----------------     --------     -----------------
  ("group", 1, ")")  -> n/a          ("group", 1, ")")
  ("negate", 19)     -> n/a          ("negate", 19)
  ("symbol",)        -> "symbol"     n/a
Pierre-Yves David <pierre-yves.david@fb.com> [Fri, 17 Jul 2015 15:53:56 +0200] rev 25814
changelog: update read pending documentation
The pending index contains a full copy of the index + in-transaction data. We
replace "extend" with "overwrite" to make this clearer.
Matt Harbison <matt_harbison@yahoo.com> [Sun, 15 Jul 2012 12:43:10 -0400] rev 25813
extdiff: add support for subrepos
Git and svn subrepo support is incomplete, because they don't support archiving
the working copy.
Matt Harbison <matt_harbison@yahoo.com> [Wed, 11 Jul 2012 20:48:51 -0400] rev 25812
extdiff: use archiver to take snapshots of committed revisions
This is the last step before supporting extdiff -S. It maintains the existing
behavior of diffing the largefile standins instead of the largefiles themselves.
Note however that the standins are not updated immediately upon modification, so
uncommitted largefile changes are ignored, as they previously were, even with
the diff command.
Matt Harbison <matt_harbison@yahoo.com> [Sat, 11 Jul 2015 23:26:33 -0400] rev 25811
largefiles: allow the archiving of largefiles to be disabled
There are currently no users of this, but it is a necessary step before
converting extdiff to use archive. It may be useful to add an argument to
extdiff in the future and allow largefiles to be diffed, but archiving
largefiles can have significant overhead and may not be very diffable, so
archiving them by default seems wrong.
It is a mystery to me why the lfstatus attribute needs to be set on the
unfiltered repo. However if it is set on the filtered repo instead (and the
filtered repo is passed to the original command), the lfstatus attribute is
False in the overrides for archival.archive() and hgsubrepo.archive() when
invoking the archive command. This smells like the buggy status behavior (see
67d63ec85eb7, which was reverted in df463ca0adef). Neither the status nor
summary commands have this weird behavior in their respective overrides.
Yuya Nishihara <yuya@tcha.org> [Thu, 16 Jul 2015 23:36:08 +0900] rev 25810
parsers: fix buffer overflow by invalid parent revision read from revlog
If a revlog file is corrupted, it can have a parent pointing to an invalid
revision. So we should validate it before updating nothead[], phases[],
seen[], etc. Otherwise it would cause a segfault at best.
We could use "rev" instead of "maxrev" as upper bound, but I think the explicit
"maxrev" can clarify that we just want to avoid possible buffer overflow
vulnerability.
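The validate-before-update pattern can be illustrated in Python, although
the actual fix concerns C code where an out-of-range index corrupts memory.
This is a sketch under assumptions: compute_nothead(), the parent_pairs
input and the NULLREV constant are hypothetical, and the lower-bound check
is an addition for this sketch, since negative indices in Python would
silently wrap around rather than overflow.

```python
NULLREV = -1  # assumed "no parent" marker for this sketch


def compute_nothead(parent_pairs):
    """Return a list where nothead[rev] is True if rev has a child.

    parent_pairs[rev] is a (p1, p2) tuple as it might be read from an index.
    Both parents are validated against maxrev before any array is updated.
    """
    maxrev = len(parent_pairs) - 1
    nothead = [False] * len(parent_pairs)
    for rev, parents in enumerate(parent_pairs):
        # Validate first, so corrupt data never reaches the indexing below.
        for p in parents:
            if p != NULLREV and not (0 <= p <= maxrev):
                raise ValueError(
                    "parent out of range: rev %d has parent %d" % (rev, p)
                )
        for p in parents:
            if p != NULLREV:
                nothead[p] = True
    return nothead
```

A corrupt entry such as (7, -1) in a two-revision index raises ValueError
here instead of indexing past the end of nothead.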