httppeer: detect redirect to URL without query string (issue5860)
197d10e157ce subtly changed the HTTP peer's handling of HTTP redirects.
Before that changeset, we instantiated an HTTP peer instance and
performed the capabilities lookup with that instance. The old code had
the following relevant properties:
1) The HTTP request layer would automatically follow HTTP redirects.
2) An encountered HTTP redirect would update a peer instance variable
pointing to the repo URL.
3) The peer would automagically perform a "capabilities" command
request if a caller requested capabilities but capabilities were
not yet defined.
The first HTTP request issued by a peer is for ?cmd=capabilities. If
the server responds with an HTTP redirect to a ?cmd=capabilities URL,
the HTTP request layer automatically followed it, retrieved a valid
capabilities response, and the peer's base URL was updated
automatically so subsequent requests used the proper URL. In other
words, things "just worked."
In the case where the server redirected to a URL without the
?cmd=capabilities query string, the HTTP request layer would follow
the redirect and likely encounter HTML. The peer's base URL would be
updated and the unexpected Content-Type would raise a RepoError. We
would catch RepoError and immediately call between() (testing the case
for pre 0.9.1 servers not supporting the "capabilities" command). e.g.
try:
inst._fetchcaps()
except error.RepoError:
inst.between([(nullid, nullid)])
between() would eventually call into _callstream(). And _callstream()
made a call to self.capable('httpheader'). capable() would call
self.capabilities(), which would see that no capabilities were set
(because HTML was returned for that request) and call the "capabilities"
command to fetch capabilities. Because the base URL had been updated
from the redirect, this 2nd "capabilities" command would succeed and
the client would immediately call "between," which would also succeed.
The legacy handshake succeeded. Only because "capabilities" was
successfully executed as a side effect did the peer recognize that it
was talking to a modern server. In other words, this all appeared to
work accidentally.
After 197d10e157ce, we stopped calling the "capabilities" command on
the peer instance. Instead, we made the request via a low-level opener,
detected the redirect as part of response handling code, and passed the
redirected URL into the constructed peer instance.
For cases where the redirected URL included the query string, this
"just worked." But for cases where the redirected URL stripped the query
string, we threw RepoError and because we removed the "between" handshake
fallback, we fell through to the "is a static HTTP repo" check and
performed an HTTP request for .hg/requires.
While 197d10e157ce was marked as backwards incompatible, the only
intended backwards incompatible behavior was not performing the
"between" fallback. It was not realized that the "between" command
had the side-effect of recovering from an errant redirect that
dropped the query string.
This commit restores the previous behavior and allows clients to
handle a redirect that drops the query string. In the case where
the request is redirected and the query string is dropped, we raise
a special case of RepoError. We then catch this special exception in
the handshake code and perform another "capabilities" request against
the redirected URL. If that works, all is well. Otherwise, we fall back
to the "is a static HTTP repo" check.
The new code is arguably better than before 197d10e157ce, as it is
explicit about the expected behavior and we avoid performing a
"between" request, saving a server round trip.
Differential Revision: https://phab.mercurial-scm.org/D3433
# blackbox.py - log repository events to a file for post-mortem debugging
#
# Copyright 2010 Nicolas Dumazet
# Copyright 2013 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
"""log repository events to a blackbox for debugging
Logs event information to .hg/blackbox.log to help debug and diagnose problems.
The events that get logged can be configured via the blackbox.track config key.
Examples::
[blackbox]
track = *
# dirty is *EXPENSIVE* (slow);
# each log entry indicates `+` if the repository is dirty, like :hg:`id`.
dirty = True
# record the source of log messages
logsource = True
[blackbox]
track = command, commandfinish, commandexception, exthook, pythonhook
[blackbox]
track = incoming
[blackbox]
# limit the size of a log file
maxsize = 1.5 MB
# rotate up to N log files when the current one gets too big
maxfiles = 3
"""
from __future__ import absolute_import
import errno
import re
from mercurial.i18n import _
from mercurial.node import hex
from mercurial import (
encoding,
registrar,
ui as uimod,
util,
)
from mercurial.utils import (
dateutil,
procutil,
)
# Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for
# extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
# be specifying the version(s) of Mercurial they are tested with, or
# leave the attribute unspecified.
testedwith = 'ships-with-hg-core'
cmdtable = {}
command = registrar.command(cmdtable)
configtable = {}
configitem = registrar.configitem(configtable)
configitem('blackbox', 'dirty',
default=False,
)
configitem('blackbox', 'maxsize',
default='1 MB',
)
configitem('blackbox', 'logsource',
default=False,
)
configitem('blackbox', 'maxfiles',
default=7,
)
configitem('blackbox', 'track',
default=lambda: ['*'],
)
lastui = None
def _openlogfile(ui, vfs):
def rotate(oldpath, newpath):
try:
vfs.unlink(newpath)
except OSError as err:
if err.errno != errno.ENOENT:
ui.debug("warning: cannot remove '%s': %s\n" %
(newpath, err.strerror))
try:
if newpath:
vfs.rename(oldpath, newpath)
except OSError as err:
if err.errno != errno.ENOENT:
ui.debug("warning: cannot rename '%s' to '%s': %s\n" %
(newpath, oldpath, err.strerror))
maxsize = ui.configbytes('blackbox', 'maxsize')
name = 'blackbox.log'
if maxsize > 0:
try:
st = vfs.stat(name)
except OSError:
pass
else:
if st.st_size >= maxsize:
path = vfs.join(name)
maxfiles = ui.configint('blackbox', 'maxfiles')
for i in xrange(maxfiles - 1, 1, -1):
rotate(oldpath='%s.%d' % (path, i - 1),
newpath='%s.%d' % (path, i))
rotate(oldpath=path,
newpath=maxfiles > 0 and path + '.1')
return vfs(name, 'a')
def wrapui(ui):
class blackboxui(ui.__class__):
@property
def _bbvfs(self):
vfs = None
repo = getattr(self, '_bbrepo', None)
if repo:
vfs = repo.vfs
if not vfs.isdir('.'):
vfs = None
return vfs
@util.propertycache
def track(self):
return self.configlist('blackbox', 'track')
def debug(self, *msg, **opts):
super(blackboxui, self).debug(*msg, **opts)
if self.debugflag:
self.log('debug', '%s', ''.join(msg))
def log(self, event, *msg, **opts):
global lastui
super(blackboxui, self).log(event, *msg, **opts)
if not '*' in self.track and not event in self.track:
return
if self._bbvfs:
ui = self
else:
# certain ui instances exist outside the context of
# a repo, so just default to the last blackbox that
# was seen.
ui = lastui
if not ui:
return
vfs = ui._bbvfs
if not vfs:
return
repo = getattr(ui, '_bbrepo', None)
if not lastui or repo:
lastui = ui
if getattr(ui, '_bbinlog', False):
# recursion and failure guard
return
ui._bbinlog = True
default = self.configdate('devel', 'default-date')
date = dateutil.datestr(default, '%Y/%m/%d %H:%M:%S')
user = procutil.getuser()
pid = '%d' % procutil.getpid()
formattedmsg = msg[0] % msg[1:]
rev = '(unknown)'
changed = ''
if repo:
ctx = repo[None]
parents = ctx.parents()
rev = ('+'.join([hex(p.node()) for p in parents]))
if (ui.configbool('blackbox', 'dirty') and
ctx.dirty(missing=True, merge=False, branch=False)):
changed = '+'
if ui.configbool('blackbox', 'logsource'):
src = ' [%s]' % event
else:
src = ''
try:
fmt = '%s %s @%s%s (%s)%s> %s'
args = (date, user, rev, changed, pid, src, formattedmsg)
with _openlogfile(ui, vfs) as fp:
fp.write(fmt % args)
except (IOError, OSError) as err:
self.debug('warning: cannot write to blackbox.log: %s\n' %
encoding.strtolocal(err.strerror))
# do not restore _bbinlog intentionally to avoid failed
# logging again
else:
ui._bbinlog = False
def setrepo(self, repo):
self._bbrepo = repo
ui.__class__ = blackboxui
uimod.ui = blackboxui
def uisetup(ui):
wrapui(ui)
def reposetup(ui, repo):
# During 'hg pull' a httppeer repo is created to represent the remote repo.
# It doesn't have a .hg directory to put a blackbox in, so we don't do
# the blackbox setup for it.
if not repo.local():
return
if util.safehasattr(ui, 'setrepo'):
ui.setrepo(repo)
# Set lastui even if ui.log is not called. This gives blackbox a
# fallback place to log.
global lastui
if lastui is None:
lastui = ui
repo._wlockfreeprefix.add('blackbox.log')
@command('^blackbox',
[('l', 'limit', 10, _('the number of events to show')),
],
_('hg blackbox [OPTION]...'))
def blackbox(ui, repo, *revs, **opts):
'''view the recent repository events
'''
if not repo.vfs.exists('blackbox.log'):
return
limit = opts.get(r'limit')
fp = repo.vfs('blackbox.log', 'r')
lines = fp.read().split('\n')
count = 0
output = []
for line in reversed(lines):
if count >= limit:
break
# count the commands by matching lines like: 2013/01/23 19:13:36 root>
if re.match('^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} .*> .*', line):
count += 1
output.append(line)
ui.status('\n'.join(reversed(output)))