view Makefile @ 30760:753b9d43ca81

internals: document compression negotiation As part of adding zstd support to all of the things, we'll need to teach the wire protocol to support non-zlib compression formats. This commit documents how we'll implement that. To understand how we arrived at this proposal, let's look at how things are done today. The wire protocol today doesn't have a unified format. Instead, there is a limited facility for differentiating replies as successful or not. And, each command essentially defines its own response format. A significant deficiency in the current protocol is the lack of payload framing over the SSH transport. In the HTTP transport, chunked transfer is used and the end of an HTTP response body (and the end of a Mercurial command response) can be identified by a 0 length chunk. This is how HTTP chunked transfer works. But in the SSH transport, there is no such framing, at least for certain responses (notably the response to "getbundle" requests). Clients can't simply read until end of stream because the socket is persistent and reused for multiple requests. Clients need to know when they've encountered the end of a request but there is nothing simple for them to key off of to detect this. So what happens is the client must decode the payload (as opposed to being dumb and forwarding frames/packets). This means the payload itself needs to support identifying end of stream. In some cases (bundle2), it also means the payload can encode "error" or "interrupt" events telling the client to e.g. abort processing. The lack of framing on the SSH transport and the transfer of its responsibilities to e.g. bundle2 is a massive layering violation and a wart on the protocol architecture. It needs to be fixed someday by inventing a proper framing protocol. So about compression. The client transport abstractions have a "_callcompressable()" API. This API is called to invoke a remote command that will send a compressible response. The response is essentially a "streaming" response (no framing data at the Mercurial layer) that is fed into a decompressor. On the HTTP transport, the decompressor is zlib and only zlib. There is currently no mechanism for the client to specify an alternate compression format. And, clients don't advertise what compression formats they support or ask the server to send a specific compression format. Instead, it is assumed that non-error responses to "compressible" commands are zlib compressed. On the SSH transport, there is no compression at the Mercurial protocol layer. Instead, compression must be handled by SSH itself (e.g. `ssh -C`) or within the payload data (e.g. bundle compression). For the HTTP transport, adding new compression formats is pretty straightforward. Once you know what decompressor to use, you can stream data into the decompressor until you reach a 0 size HTTP chunk, at which point you are at end of stream. So our wire protocol changes for the HTTP transport are pretty straightforward: the client and server advertise what compression formats they support and an appropriate compression format is chosen. We introduce a new HTTP media type to hold compressed payloads. The header of the payload defines the compression format being used. Whoever is on the receiving end can sniff the first few bytes route to an appropriate decompressor. Support for multiple compression formats is advertised on both server and client. The server advertises a "compression" capability saying which compression formats it supports and in what order they are preferred. Clients advertise their support for multiple compression formats and media types via the introduced "X-HgProto" request header. Strictly speaking, servers don't need to advertise which compression formats they support. But doing so allows clients to fail fast if they don't support any of the formats the server does. This is useful in situations like sending bundles, where the client may have to perform expensive computation before sending data to the server. Rather than simply advertise a list of supported compression formats, we introduce an additional "httpmediatype" server capability advertising which media types the server supports. This means servers are explicit about what formats they exchange. IMO, this is superior to inferring support from other capabilities (like "compression"). By advertising compression support on each request in the "X-HgProto" header and media type and direction at the server level, we are able to gradually transition existing commands/responses to the new media type and possibly compression. Contrast with the old world, where we only supported a single media type and the use of compression was built-in to the semantics of the command on both client and server. In the new world, if "application/mercurial-0.2" is supported, compression is supported. It's that simple. It's worth noting that we explicitly don't use "Accept," "Accept-Encoding," "Content-Encoding," or "Transfer-Encoding" for content negotiation and compression. People knowledgeable of the HTTP specifications will say that we should use these because that's what they are designed to be used for. They have a point and I sympathize with the argument. Earlier versions of this commit even defined supported media types in the "Accept" header. However, my years of experience rolling out services leveraging HTTP has taught me to not trust the HTTP layer, especially if you are going outside the normal spec (such as using a custom "Content-Encoding" value to represent zstd streams). I've seen load balancers, proxies, and other network devices do very bad and unexpected things to HTTP messages (like insisting zlib compressed content is decoded and then re-encoded at a different compression level or even stripping compression completely). I've found that the best way to avoid surprises when writing protocols on top of HTTP is to use HTTP as a dumb transport as much as possible to minimize the chances that an "intelligent" agent between endpoints will muck with your data. While the widespread use of TLS is mitigating many intermediate network agents interfering with HTTP, there are still problems at the edges, with e.g. the origin HTTP server needing to convert HTTP to and from WSGI and buggy or feature-lacking HTTP client implementations. I've found the best way to avoid these problems is to avoid using headers like "Content-Encoding" and to bake as much logic as possible into media types and HTTP message bodies. The protocol changes in this commit do rely on a custom HTTP request header and the "Content-Type" headers. But we used them before, so we shouldn't be increasing our exposure to "bad" HTTP agents. For the SSH transport, we can't easily implement content negotiation to determine compression formats because the SSH transport has no content negotiation capabilities today. And without a framing protocol, we don't know how much data to feed into a decompressor. So in order to implement compression support on the SSH transport, we'd need to invent a mechanism to represent content types and an outer framing protocol to stream data robustly. While I'm fully capable of doing that, it is a lot of work and not something that should be undertaken lightly. My opinion is that if we're going to change the SSH transport protocol, we should take a long hard look at implementing a grand unified protocol that attempts to address all the deficiencies with the existing protocol. While I want this to happen, that would be massive scope bloat standing in the way of zstd support. So, I've decided to take the easy solution: the SSH transport will not gain support for multiple compression formats. Keep in mind it doesn't support *any* compression today. So essentially nothing is changing on the SSH front.
author Gregory Szorc <gregory.szorc@gmail.com>
date Sat, 24 Dec 2016 13:56:36 -0700
parents 06b17f6c6559
children 22a4f664c1a5
line wrap: on
line source

# If you want to change PREFIX, do not just edit it below. The changed
# value wont get passed on to recursive make calls. You should instead
# override the variable on the command like:
#
# % make PREFIX=/opt/ install

export PREFIX=/usr/local
PYTHON=python
$(eval HGROOT := $(shell pwd))
HGPYTHONS ?= $(HGROOT)/build/pythons
PURE=
PYFILES:=$(shell find mercurial hgext doc -name '*.py')
DOCFILES=mercurial/help/*.txt
export LANGUAGE=C
export LC_ALL=C
TESTFLAGS ?= $(shell echo $$HGTESTFLAGS)

# Set this to e.g. "mingw32" to use a non-default compiler.
COMPILER=

COMPILERFLAG_tmp_ =
COMPILERFLAG_tmp_${COMPILER} ?= -c $(COMPILER)
COMPILERFLAG=${COMPILERFLAG_tmp_${COMPILER}}

help:
	@echo 'Commonly used make targets:'
	@echo '  all          - build program and documentation'
	@echo '  install      - install program and man pages to $$PREFIX ($(PREFIX))'
	@echo '  install-home - install with setup.py install --home=$$HOME ($(HOME))'
	@echo '  local        - build for inplace usage'
	@echo '  tests        - run all tests in the automatic test suite'
	@echo '  test-foo     - run only specified tests (e.g. test-merge1.t)'
	@echo '  dist         - run all tests and create a source tarball in dist/'
	@echo '  clean        - remove files created by other targets'
	@echo '                 (except installed files or dist source tarball)'
	@echo '  update-pot   - update i18n/hg.pot'
	@echo
	@echo 'Example for a system-wide installation under /usr/local:'
	@echo '  make all && su -c "make install" && hg version'
	@echo
	@echo 'Example for a local installation (usable in this directory):'
	@echo '  make local && ./hg version'

all: build doc

local:
	$(PYTHON) setup.py $(PURE) \
	  build_py -c -d . \
	  build_ext $(COMPILERFLAG) -i \
	  build_hgexe $(COMPILERFLAG) -i \
	  build_mo
	env HGRCPATH= $(PYTHON) hg version

build:
	$(PYTHON) setup.py $(PURE) build $(COMPILERFLAG)

wheel:
	FORCE_SETUPTOOLS=1 $(PYTHON) setup.py $(PURE) bdist_wheel $(COMPILERFLAG)

doc:
	$(MAKE) -C doc

cleanbutpackages:
	-$(PYTHON) setup.py clean --all # ignore errors from this command
	find contrib doc hgext hgext3rd i18n mercurial tests \
		\( -name '*.py[cdo]' -o -name '*.so' \) -exec rm -f '{}' ';'
	rm -f $(addprefix mercurial/,$(notdir $(wildcard mercurial/pure/[a-z]*.py)))
	rm -f MANIFEST MANIFEST.in hgext/__index__.py tests/*.err
	rm -f mercurial/__modulepolicy__.py
	if test -d .hg; then rm -f mercurial/__version__.py; fi
	rm -rf build mercurial/locale
	$(MAKE) -C doc clean
	$(MAKE) -C contrib/chg distclean

clean: cleanbutpackages
	rm -rf packages

install: install-bin install-doc

install-bin: build
	$(PYTHON) setup.py $(PURE) install --root="$(DESTDIR)/" --prefix="$(PREFIX)" --force

install-doc: doc
	cd doc && $(MAKE) $(MFLAGS) install

install-home: install-home-bin install-home-doc

install-home-bin: build
	$(PYTHON) setup.py $(PURE) install --home="$(HOME)" --prefix="" --force

install-home-doc: doc
	cd doc && $(MAKE) $(MFLAGS) PREFIX="$(HOME)" install

MANIFEST-doc:
	$(MAKE) -C doc MANIFEST

MANIFEST.in: MANIFEST-doc
	hg manifest | sed -e 's/^/include /' > MANIFEST.in
	echo include mercurial/__version__.py >> MANIFEST.in
	sed -e 's/^/include /' < doc/MANIFEST >> MANIFEST.in

dist:	tests dist-notests

dist-notests:	doc MANIFEST.in
	TAR_OPTIONS="--owner=root --group=root --mode=u+w,go-w,a+rX-s" $(PYTHON) setup.py -q sdist

check: tests

tests:
	cd tests && $(PYTHON) run-tests.py $(TESTFLAGS)

test-%:
	cd tests && $(PYTHON) run-tests.py $(TESTFLAGS) $@

testpy-%:
	@echo Looking for Python $* in $(HGPYTHONS)
	[ -e $(HGPYTHONS)/$*/bin/python ] || ( \
	cd $$(mktemp --directory --tmpdir) && \
        $(MAKE) -f $(HGROOT)/contrib/Makefile.python PYTHONVER=$* PREFIX=$(HGPYTHONS)/$* python )
	cd tests && $(HGPYTHONS)/$*/bin/python run-tests.py $(TESTFLAGS)

check-code:
	hg manifest | xargs python contrib/check-code.py

update-pot: i18n/hg.pot

i18n/hg.pot: $(PYFILES) $(DOCFILES) i18n/posplit i18n/hggettext
	$(PYTHON) i18n/hggettext mercurial/commands.py \
	  hgext/*.py hgext/*/__init__.py \
	  mercurial/fileset.py mercurial/revset.py \
	  mercurial/templatefilters.py mercurial/templatekw.py \
	  mercurial/templater.py \
	  mercurial/filemerge.py \
	  mercurial/hgweb/webcommands.py \
	  $(DOCFILES) > i18n/hg.pot.tmp
        # All strings marked for translation in Mercurial contain
        # ASCII characters only. But some files contain string
        # literals like this '\037\213'. xgettext thinks it has to
        # parse them even though they are not marked for translation.
        # Extracting with an explicit encoding of ISO-8859-1 will make
        # xgettext "parse" and ignore them.
	echo $(PYFILES) | xargs \
	  xgettext --package-name "Mercurial" \
	  --msgid-bugs-address "<mercurial-devel@selenic.com>" \
	  --copyright-holder "Matt Mackall <mpm@selenic.com> and others" \
	  --from-code ISO-8859-1 --join --sort-by-file --add-comments=i18n: \
	  -d hg -p i18n -o hg.pot.tmp
	$(PYTHON) i18n/posplit i18n/hg.pot.tmp
        # The target file is not created before the last step. So it never is in
        # an intermediate state.
	mv -f i18n/hg.pot.tmp i18n/hg.pot

%.po: i18n/hg.pot
        # work on a temporary copy for never having a half completed target
	cp $@ $@.tmp
	msgmerge --no-location --update $@.tmp $^
	mv -f $@.tmp $@

# Packaging targets

osx:
	/usr/bin/python2.7 setup.py install --optimize=1 \
	  --root=build/mercurial/ --prefix=/usr/local/ \
	  --install-lib=/Library/Python/2.7/site-packages/
	make -C doc all install DESTDIR="$(PWD)/build/mercurial/"
	mkdir -p $${OUTPUTDIR:-dist}
	HGVER=$$((cat build/mercurial/Library/Python/2.7/site-packages/mercurial/__version__.py; echo 'print(version)') | python) && \
	OSXVER=$$(sw_vers -productVersion | cut -d. -f1,2) && \
	pkgbuild --root build/mercurial/ \
	  --identifier org.mercurial-scm.mercurial \
	  --version "$${HGVER}" \
	  build/mercurial.pkg && \
	productbuild --distribution contrib/macosx/distribution.xml \
	  --package-path build/ \
	  --version "$${HGVER}" \
	  --resources contrib/macosx/ \
	  "$${OUTPUTDIR:-dist/}"/Mercurial-"$${HGVER}"-macosx"$${OSXVER}".pkg

deb:
	contrib/builddeb

ppa:
	contrib/builddeb --source-only

docker-debian-jessie:
	mkdir -p packages/debian-jessie
	contrib/dockerdeb debian jessie

contrib/docker/ubuntu-%: contrib/docker/ubuntu.template
	sed "s/__CODENAME__/$*/" $< > $@

docker-ubuntu-trusty: contrib/docker/ubuntu-trusty
	contrib/dockerdeb ubuntu trusty

docker-ubuntu-trusty-ppa: contrib/docker/ubuntu-trusty
	contrib/dockerdeb ubuntu trusty --source-only

docker-ubuntu-xenial: contrib/docker/ubuntu-xenial
	contrib/dockerdeb ubuntu xenial

docker-ubuntu-xenial-ppa: contrib/docker/ubuntu-xenial
	contrib/dockerdeb ubuntu xenial --source-only

docker-ubuntu-yakkety: contrib/docker/ubuntu-yakkety
	contrib/dockerdeb ubuntu yakkety

docker-ubuntu-yakkety-ppa: contrib/docker/ubuntu-yakkety
	contrib/dockerdeb ubuntu yakkety --source-only

fedora20:
	mkdir -p packages/fedora20
	contrib/buildrpm
	cp rpmbuild/RPMS/*/* packages/fedora20
	cp rpmbuild/SRPMS/* packages/fedora20
	rm -rf rpmbuild

docker-fedora20:
	mkdir -p packages/fedora20
	contrib/dockerrpm fedora20

fedora21:
	mkdir -p packages/fedora21
	contrib/buildrpm
	cp rpmbuild/RPMS/*/* packages/fedora21
	cp rpmbuild/SRPMS/* packages/fedora21
	rm -rf rpmbuild

docker-fedora21:
	mkdir -p packages/fedora21
	contrib/dockerrpm fedora21

centos5:
	mkdir -p packages/centos5
	contrib/buildrpm --withpython
	cp rpmbuild/RPMS/*/* packages/centos5
	cp rpmbuild/SRPMS/* packages/centos5

docker-centos5:
	mkdir -p packages/centos5
	contrib/dockerrpm centos5 --withpython

centos6:
	mkdir -p packages/centos6
	contrib/buildrpm
	cp rpmbuild/RPMS/*/* packages/centos6
	cp rpmbuild/SRPMS/* packages/centos6

docker-centos6:
	mkdir -p packages/centos6
	contrib/dockerrpm centos6

centos7:
	mkdir -p packages/centos7
	contrib/buildrpm
	cp rpmbuild/RPMS/*/* packages/centos7
	cp rpmbuild/SRPMS/* packages/centos7

docker-centos7:
	mkdir -p packages/centos7
	contrib/dockerrpm centos7

.PHONY: help all local build doc cleanbutpackages clean install install-bin \
	install-doc install-home install-home-bin install-home-doc \
	dist dist-notests check tests check-code update-pot \
	osx fedora20 docker-fedora20 fedora21 docker-fedora21 \
	centos5 docker-centos5 centos6 docker-centos6 centos7 docker-centos7