Thu, 08 Mar 2018 16:38:01 -0800 wireprotoserver: access headers through parsed request
Gregory Szorc <gregory.szorc@gmail.com> [Thu, 08 Mar 2018 16:38:01 -0800] rev 36846
wireprotoserver: access headers through parsed request Now that we can access headers via the parsed request object, let's do that. Since the new object uses bytes, hyphens, and is case-insensitive, a bit of code around normalizing values has been removed. I think the new code is much more intuitive because it more closely matches what is going out over the wire. Differential Revision: https://phab.mercurial-scm.org/D2743
Mon, 12 Mar 2018 13:15:00 -0700 hgweb: garbage collect on every request stable
Gregory Szorc <gregory.szorc@gmail.com> [Mon, 12 Mar 2018 13:15:00 -0700] rev 36845
hgweb: garbage collect on every request There appears to be a cycle in localrepository or hgweb that is preventing repositories from being garbage collected when hgwebdir dispatches to hgweb. Every request creates a new repository instance and then leaks that object and other referenced objects. A periodic GC to find cycles will eventually collect the old repositories. But these don't run reliably and rapid requests to hgwebdir can result in rapidly increasing memory consumption. With the Firefox repository, repeated requests to raw-file URLs leak ~100 MB per hgwebdir request (most of this appears to be cached manifest data structures). WSGI processes quickly grow to >1 GB RSS. Breaking the cycles in localrepository is going to be a bit of work. Because we know that hgwebdir leaks localrepository instances, let's put a band aid on the problem in the form of an explicit gc.collect() on every hgwebdir request. As the inline comment states, ideally we'd do this in a finally block for the current request iff it dispatches to hgweb. But _runwsgi() returns an explicit value. We need the finally to run after generator exhaustion. So we'd need to refactor _runwsgi() to "yield" instead of "return." That's too much change for a patch to stable. So we implement this hack one function above and run it on every request. The performance impact of this change should be minimal. Any impact should be offset by benefits from not having hgwebdir processes leak memory.
Sun, 11 Mar 2018 20:10:38 +0900 amend: abort if unresolved merge conflicts found (issue5805) stable
Yuya Nishihara <yuya@tcha.org> [Sun, 11 Mar 2018 20:10:38 +0900] rev 36844
amend: abort if unresolved merge conflicts found (issue5805) It was checked by repo.commit() before e8a7c1a0565a "cmdutil: remove the redundant commit during amend."
Mon, 12 Mar 2018 22:47:33 +0900 debugwireproto: close the write end before consuming all available data
Yuya Nishihara <yuya@tcha.org> [Mon, 12 Mar 2018 22:47:33 +0900] rev 36843
debugwireproto: close the write end before consuming all available data And make it read all available data deterministically. Otherwise util.poll() may deadlock because both stdout and stderr could have no data. Spotted by the next patch which removes stderr from the fds.
Fri, 09 Mar 2018 15:57:16 +0100 graft: check for missing revision first before scanning working copy
Joerg Sonnenberger <joerg@bec.de> [Fri, 09 Mar 2018 15:57:16 +0100] rev 36842
graft: check for missing revision first before scanning working copy Differential Revision: https://phab.mercurial-scm.org/D2753
Sat, 10 Mar 2018 22:02:58 -0500 hook: ensure stderr is flushed when an exception is raised, for test stability
Matt Harbison <matt_harbison@yahoo.com> [Sat, 10 Mar 2018 22:02:58 -0500] rev 36841
hook: ensure stderr is flushed when an exception is raised, for test stability Windows has had issues with output order in test-ssh-proto-unbundle.t[1] since it was created a few weeks ago. Each of the problems occurred when an exception was thrown out of the hook. Now the only thing blocking D2720 is the fact that the "abort: ..." lines on stderr are totally AWOL. I have no idea where there are. [1] https://buildbot.mercurial-scm.org/builders/Win7%20x86_64%20hg%20tests/builds/541/steps/run-tests.py%20%28python%202.7.13%29/logs/stdio
Sat, 10 Mar 2018 10:27:56 -0800 wireproto: raise ProgrammingError instead of Abort
Gregory Szorc <gregory.szorc@gmail.com> [Sat, 10 Mar 2018 10:27:56 -0800] rev 36840
wireproto: raise ProgrammingError instead of Abort This isn't a user-facing error and can only be caused by bad Python code. Thanks to Yuya for spotting this. Differential Revision: https://phab.mercurial-scm.org/D2777
Sat, 10 Mar 2018 19:56:47 +0900 py3: make test-commit-interactive.t byte-safe
Yuya Nishihara <yuya@tcha.org> [Sat, 10 Mar 2018 19:56:47 +0900] rev 36839
py3: make test-commit-interactive.t byte-safe
Sat, 10 Mar 2018 19:49:09 +0900 py3: open patch file in binary mode and convert eol manually
Yuya Nishihara <yuya@tcha.org> [Sat, 10 Mar 2018 19:49:09 +0900] rev 36838
py3: open patch file in binary mode and convert eol manually Here we don't introduce a reader wrapper since it wouldn't be easy to make read(n) handle partial data and length correctly.
Tue, 06 Mar 2018 07:45:57 -0600 py3: wrap file object to write patch in native eol preserving byte-ness
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 07:45:57 -0600] rev 36837
py3: wrap file object to write patch in native eol preserving byte-ness
Tue, 06 Mar 2018 07:24:12 -0600 py3: drop b'' from debug message "moving bookmarks"
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 07:24:12 -0600] rev 36836
py3: drop b'' from debug message "moving bookmarks"
Sat, 10 Mar 2018 15:57:16 +0900 py3: use r'' instead of sysstr('') to get around code transformer
Yuya Nishihara <yuya@tcha.org> [Sat, 10 Mar 2018 15:57:16 +0900] rev 36835
py3: use r'' instead of sysstr('') to get around code transformer Fewer function calls should be better.
Sat, 10 Mar 2018 15:50:09 +0900 ui: remove any combinations of CR|LF from prompt response
Yuya Nishihara <yuya@tcha.org> [Sat, 10 Mar 2018 15:50:09 +0900] rev 36834
ui: remove any combinations of CR|LF from prompt response On Windows, we have to accept both CR+LF and LF. This patch simply makes any trailing CRs and LFs removed from a user input instead of doing stricter parsing, as an input must be a readable text.
Sat, 10 Mar 2018 12:45:10 -0500 sshpeer: check pipe validity before forwarding output from it
Matt Harbison <matt_harbison@yahoo.com> [Sat, 10 Mar 2018 12:45:10 -0500] rev 36833
sshpeer: check pipe validity before forwarding output from it After the previous fix, fileobjectproxy._observedcall() (called when win32.peekpipe() accesses .fileno) started exploding. With this fix, similar checks are needed inside debugwireproto(). Since that is hardcoded to not use os.devnull, IDK if those are worth fixing.
Sat, 10 Mar 2018 12:22:08 -0500 util: forward __bool__()/__nonzero__() on fileobjectproxy
Matt Harbison <matt_harbison@yahoo.com> [Sat, 10 Mar 2018 12:22:08 -0500] rev 36832
util: forward __bool__()/__nonzero__() on fileobjectproxy In trying to debug the Windows process hang in D2720, I changed the stderr pipe to the peer to be os.devnull instead. That caused sshpeer._cleanuppipes()[1] to explode, complaining NoneType has no __iter__ attribute, even though the previous line checked for None. [1] https://www.mercurial-scm.org/repo/hg/file/b434965f984e/mercurial/sshpeer.py#l133
Tue, 06 Mar 2018 07:16:41 -0600 py3: fix slicing of bisect label in templatefilters.shortbisect()
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 07:16:41 -0600] rev 36831
py3: fix slicing of bisect label in templatefilters.shortbisect()
Tue, 06 Mar 2018 07:15:01 -0600 templatefilters: inline hbisect.shortlabel()
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 07:15:01 -0600] rev 36830
templatefilters: inline hbisect.shortlabel() It's pretty simple. I don't think the business logic has to be placed in hbisect.py.
Tue, 06 Mar 2018 07:11:24 -0600 py3: make test-bisect.t bytes-safe
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 07:11:24 -0600] rev 36829
py3: make test-bisect.t bytes-safe
Tue, 06 Mar 2018 07:10:50 -0600 py3: fix integer formatting in bisect error
Yuya Nishihara <yuya@tcha.org> [Tue, 06 Mar 2018 07:10:50 -0600] rev 36828
py3: fix integer formatting in bisect error
Sat, 10 Mar 2018 16:55:54 +0900 py3: silence f.write() in test-annotate.t
Yuya Nishihara <yuya@tcha.org> [Sat, 10 Mar 2018 16:55:54 +0900] rev 36827
py3: silence f.write() in test-annotate.t
Fri, 09 Mar 2018 14:52:36 -0800 xdiff: resolve signed unsigned comparison warning
Jun Wu <quark@fb.com> [Fri, 09 Mar 2018 14:52:36 -0800] rev 36826
xdiff: resolve signed unsigned comparison warning Since the value won't be changed inside the code (because context lines feature was removed by D2705), let's just remove the variable and inline the 0 value. The code might be potentially further simplified. But I'd like to make sure correctness is easily verifiable in this patch. Differential Revision: https://phab.mercurial-scm.org/D2766
Fri, 09 Mar 2018 14:47:29 -0800 xdiff: use int64 for hash table size
Jun Wu <quark@fb.com> [Fri, 09 Mar 2018 14:47:29 -0800] rev 36825
xdiff: use int64 for hash table size Follow-up of the previous "long" -> "int64" change. Now xdiff only uses int for return values and small integers (ex. booleans, shifting score, bits in hash table size, etc) so it should be able to handle large input. Differential Revision: https://phab.mercurial-scm.org/D2765
Fri, 09 Mar 2018 14:39:35 -0800 xdiff: remove unused xpp and xecfg parameters
Jun Wu <quark@fb.com> [Fri, 09 Mar 2018 14:39:35 -0800] rev 36824
xdiff: remove unused xpp and xecfg parameters They are unused. Thus removed. Differential Revision: https://phab.mercurial-scm.org/D2764
Fri, 09 Mar 2018 14:37:55 -0800 xdiff: remove unused flags parameter
Jun Wu <quark@fb.com> [Fri, 09 Mar 2018 14:37:55 -0800] rev 36823
xdiff: remove unused flags parameter After D2683, the flags parameter in some functions is no longer needed. Thus removed. Differential Revision: https://phab.mercurial-scm.org/D2763
Fri, 09 Mar 2018 14:24:27 -0800 xdiff: replace {unsigned ,}long with {u,}int64_t
Jun Wu <quark@fb.com> [Fri, 09 Mar 2018 14:24:27 -0800] rev 36822
xdiff: replace {unsigned ,}long with {u,}int64_t MSVC treats "long" as 4-byte. That could cause overflows since the xdiff code uses "long" in places where "size_t" or "ssize_t" should be used. Let's use explicit 8 byte integers to avoid FWIW git avoids that overflow by limiting diff size to 1GB [1]. After examining the code, I think the remaining risk (the use of "int") is low since "int" is only used for return values and hash table size. Although a wrong hash table size would not affect the correctness of the code, but that could make the code extremely slow. The next patch will change hash table size to 8-byte integer so the 1GB limit is unlikely needed. This patch was done by using `sed`. [1]: https://github.com/git/git/commit/dcd1742e56ebb944c4ff62346da4548e1e3be67 Differential Revision: https://phab.mercurial-scm.org/D2762
Sun, 04 Mar 2018 11:30:16 -0800 xdiff: add comments for fields in xdfile_t
Jun Wu <quark@fb.com> [Sun, 04 Mar 2018 11:30:16 -0800] rev 36821
xdiff: add comments for fields in xdfile_t This makes the related code easier to understand. Differential Revision: https://phab.mercurial-scm.org/D2685
Wed, 07 Mar 2018 14:45:31 -0800 xdiff: add a preprocessing step that trims files
Jun Wu <quark@fb.com> [Wed, 07 Mar 2018 14:45:31 -0800] rev 36820
xdiff: add a preprocessing step that trims files xdiff has a `xdl_trim_ends` step that removes common lines, unmatchable lines. That is in theory good, but happens too late - after splitting, hashing, and adjusting the hash values so they are unique. Those splitting, hashing and adjusting hash values steps could have noticeable overhead. Diffing two large files with minor (one-line-ish) changes are not uncommon. In that case, the raw performance of those preparation steps seriously matter. Even allocating an O(N) array and storing line offsets to it is expensive. Therefore my previous attempts [1] [2] cannot be good enough since they do not remove the O(N) array assignment. This patch adds a preprocessing step - `xdl_trim_files` that runs before other preprocessing steps. It counts common prefix and suffix and lines in them (needed for displaying line number), without doing anything else. Testing with a crafted large (169MB) file, with minor change: ``` open('a','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6000000)) open('b','w').write(''.join('%s\n' % (i % 100000) for i in xrange(30000000) if i != 6003000)) ``` Running xdiff by a simple binary [3], this patch improves the xdiff perf by more than 10x for the above case: ``` # xdiff before this patch 2.41s user 1.13s system 98% cpu 3.592 total # xdiff after this patch 0.14s user 0.16s system 98% cpu 0.309 total # gnu diffutils 0.12s user 0.15s system 98% cpu 0.272 total # (best of 20 runs) ``` It's still slightly slower than GNU diffutils. But it's pretty close now. Testing with real repo data: For the whole repo, this patch makes xdiff 25% faster: ``` # hg perfbdiff --count 100 --alldata -c d334afc585e2 --blocks [--xdiff] # xdiff, after ! wall 0.058861 comb 0.050000 user 0.050000 sys 0.000000 (best of 100) # xdiff, before ! wall 0.077816 comb 0.080000 user 0.080000 sys 0.000000 (best of 91) # bdiff ! wall 0.117473 comb 0.120000 user 0.120000 sys 0.000000 (best of 67) ``` For files that are long (ex. commands.py), the speedup is more than 3x, very significant: ``` # hg perfbdiff --count 3000 --blocks commands.py.i 1 [--xdiff] # xdiff, after ! wall 0.690583 comb 0.690000 user 0.690000 sys 0.000000 (best of 12) # xdiff, before ! wall 2.240361 comb 2.210000 user 2.210000 sys 0.000000 (best of 4) # bdiff ! wall 2.469852 comb 2.440000 user 2.440000 sys 0.000000 (best of 4) ``` [1]: https://phab.mercurial-scm.org/D2631 [2]: https://phab.mercurial-scm.org/D2634 [3]: ``` // Code to run xdiff from command line. No proper error handling. #include <stdlib.h> #include <unistd.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include "mercurial/thirdparty/xdiff/xdiff.h" #define ensure(x) if (!(x)) exit(255); mmfile_t readfile(const char *path) { struct stat st; int fd = open(path, O_RDONLY); fstat(fd, &st); mmfile_t file = { malloc(st.st_size), st.st_size }; ensure(read(fd, file.ptr, st.st_size) == st.st_size); close(fd); return file; } int main(int argc, char const *argv[]) { mmfile_t a = readfile(argv[1]), b = readfile(argv[2]); xpparam_t xpp = {0}; xdemitconf_t xecfg = {0}; xdemitcb_t ecb = {0}; xdl_diff(&a, &b, &xpp, &xecfg, &ecb); return 0; } ``` Differential Revision: https://phab.mercurial-scm.org/D2686
Fri, 09 Mar 2018 14:30:15 -0800 transaction: add a name and a __repr__ implementation (API)
Martin von Zweigbergk <martinvonz@google.com> [Fri, 09 Mar 2018 14:30:15 -0800] rev 36819
transaction: add a name and a __repr__ implementation (API) This has been useful for me for debugging. Differential Revision: https://phab.mercurial-scm.org/D2758
Fri, 09 Mar 2018 16:10:55 +0100 phabricator: update doc string for deprecated token argument
Joerg Sonnenberger <joerg@bec.de> [Fri, 09 Mar 2018 16:10:55 +0100] rev 36818
phabricator: update doc string for deprecated token argument Differential Revision: https://phab.mercurial-scm.org/D2755
Fri, 09 Mar 2018 16:09:27 +0100 phabricator: print deprecation warning only once
Joerg Sonnenberger <joerg@bec.de> [Fri, 09 Mar 2018 16:09:27 +0100] rev 36817
phabricator: print deprecation warning only once Differential Revision: https://phab.mercurial-scm.org/D2754
(0) -30000 -10000 -3000 -1000 -300 -100 -50 -30 +30 +50 +100 +300 +1000 +3000 +10000 tip