annotate hgext/highlight/__init__.py @ 26680:7a3f6490ef97

highlight: add option to prevent content-only based fallback

When Mozilla enabled Pygments on hg.mozilla.org, we got a lot of weirdly colorized files. Upon further investigation, the highlight extension first attempts a filename+content based match, then falls back to a purely content-driven detection mode in Pygments. Sounds good in theory.

Unfortunately, Pygments' content-driven detection establishes no minimum threshold for returning a lexer. Furthermore, the detection code for a number of languages is very liberal. For example, ActionScript 3 will return a confidence of 0.3 (out of 1.0) if the first 1k of the file we pass in matches the regex "\w+\s*:\s*\w"! Python matches on "import ". It's no coincidence that a number of our extension-less files were getting highlighted improperly.

This patch adds an option to have the highlighter not fall back to purely content-based detection when filename+content detection fails. It can be enabled to render unhighlighted text instead of taking the risk that unknown file types are highlighted incorrectly. The old behavior is still the default.
author Gregory Szorc <gregory.szorc@gmail.com>
date Wed, 14 Oct 2015 18:22:16 -0700
parents 0d93df4d1e44
children 6a98f9408a50
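To make the problem described in the commit message concrete, here is a minimal sketch that uses only the standard library. It applies the two hints quoted in the message (the ActionScript 3 regex and the "import " string) to ordinary prose. The helper names and sample strings are made up for illustration, and treating the Python hint as a plain substring test is an assumption, not a description of how Pygments implements its detection.

```python
import re

# Hints quoted in the commit message. How Pygments applies them internally
# is not shown in the source, so these helpers are only an approximation.
AS3_HINT = re.compile(r"\w+\s*:\s*\w")
PY_HINT = "import "


def looks_like_as3(text):
    """True if the first 1k of text matches the ActionScript 3 hint."""
    return AS3_HINT.search(text[:1024]) is not None


def looks_like_python(text):
    """True if the first 1k of text contains the Python hint (assumed substring test)."""
    return PY_HINT in text[:1024]


# Hypothetical extension-less files that are plainly not source code.
email_header = "Subject: weekly status\n\nNothing to report.\n"
release_note = "Remember to import the old bookmarks before upgrading.\n"

as3_false_positive = looks_like_as3(email_header)
python_false_positive = looks_like_python(release_note)
```

Both checks succeed on text that is neither ActionScript nor Python, which is the kind of low-confidence match the new option is meant to avoid.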
rev   line source
8251
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
1 # highlight - syntax highlighting in hgweb, based on Pygments
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
2 #
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
3 # Copyright 2008, 2009 Patrick Mezard <pmezard@gmail.com> and others
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
4 #
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
5 # This software may be used and distributed according to the terms of the
10263
25e572394f5c Update license to GPLv2+
Matt Mackall <mpm@selenic.com>
parents: 9409
diff changeset
6 # GNU General Public License version 2 or any later version.
8251
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
7 #
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
8 # The original module was split in an interface and an implementation
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
9 # file to defer pygments loading and speedup extension setup.
7fc30044b514 highlight: add copyright and license header
Martin Geisler <mg@lazybytes.net>
parents: 7216
diff changeset
10
8932
f87884329419 extensions: fix up description lines some more
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8894
diff changeset
11 """syntax highlighting for hgweb (requires Pygments)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
12
9262
917e1d5674d6 highlight: wrap docstrings at 70 characters
Martin Geisler <mg@lazybytes.net>
parents: 9210
diff changeset
13 It depends on the Pygments syntax highlighting library:
917e1d5674d6 highlight: wrap docstrings at 70 characters
Martin Geisler <mg@lazybytes.net>
parents: 9210
diff changeset
14 http://pygments.org/
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
15
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
16 There are the following configuration options::
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
17
9210
2667ca525b59 highlight: use reST syntax for literal block
Martin Geisler <mg@lazybytes.net>
parents: 9064
diff changeset
18 [web]
26249
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
19 pygments_style = <style> (default: colorful)
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
20 highlightfiles = <fileset> (default: size('<5M'))
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
21 highlightonlymatchfilename = <bool> (default False)
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
22
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
23 ``highlightonlymatchfilename`` will only highlight files if their type could
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
24 be identified by their filename. When this is not enabled (the default),
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
25 Pygments will try very hard to identify the file type from content and any
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
26 match (even matches with a low confidence score) will be used.
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
27 """
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
28
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
29 import highlight
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
30 from mercurial.hgweb import webcommands, webutil, common
26249
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
31 from mercurial import extensions, encoding, fileset
25186
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 19872
diff changeset
32 # Note for extension authors: ONLY specify testedwith = 'internal' for
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 19872
diff changeset
33 # extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 19872
diff changeset
34 # be specifying the version(s) of Mercurial they are tested with, or
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 19872
diff changeset
35 # leave the attribute unspecified.
16743
38caf405d010 hgext: mark all first-party extensions as such
Augie Fackler <raf@durin42.com>
parents: 16683
diff changeset
36 testedwith = 'internal'
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
37
26679
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
38 def pygmentize(web, field, fctx, tmpl):
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
39 style = web.config('web', 'pygments_style', 'colorful')
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
40 expr = web.config('web', 'highlightfiles', "size('<5M')")
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
41 filenameonly = web.configbool('web', 'highlightonlymatchfilename', False)
26679
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
42
26249
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
43 ctx = fctx.changectx()
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
44 tree = fileset.parse(expr)
3166bcc0c538 highlight: add highlightfiles config option which takes a fileset (issue3005)
Anton Shestakov <av6@dwimlabs.net>
parents: 25602
diff changeset
45 mctx = fileset.matchctx(ctx, subset=[fctx.path()], status=None)
26679
0d93df4d1e44 highlight: inline checkfctx()
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26678
diff changeset
46 if fctx.path() in fileset.getset(mctx, tree):
26680
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
47 highlight.pygmentize(field, fctx, style, tmpl,
7a3f6490ef97 highlight: add option to prevent content-only based fallback
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26679
diff changeset
48 guessfilenameonly=filenameonly)
26678
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
49
25602
85fb416f2fa7 hgweb: provide symrev (symbolic revision) property to the templates
Anton Shestakov <av6@dwimlabs.net>
parents: 25186
diff changeset
50 def filerevision_highlight(orig, web, req, tmpl, fctx):
8874
74baf78202e8 highlight: was broken since 580a79dde2a3 (encoding)
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8866
diff changeset
51 mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
6987
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
52 # only pygmentize for mimetype containing 'html' so we both match
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
53 # 'text/html' and possibly 'application/xhtml+xml' in the future
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
54 # so that we don't have to touch the extension when the mimetype
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
55 # for a template changes; also hgweb optimizes the case that a
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
56 # raw file is sent using rawfile() and doesn't call us, so we
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
57 # can't clash with the file's content-type here in case we
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
58 # pygmentize a html file
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
59 if 'html' in mt:
26678
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
60 pygmentize(web, 'fileline', fctx, tmpl)
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
61
25602
85fb416f2fa7 hgweb: provide symrev (symbolic revision) property to the templates
Anton Shestakov <av6@dwimlabs.net>
parents: 25186
diff changeset
62 return orig(web, req, tmpl, fctx)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
63
7216
292fb2ad2846 extensions: use new wrapper functions
Matt Mackall <mpm@selenic.com>
parents: 7127
diff changeset
64 def annotate_highlight(orig, web, req, tmpl):
8874
74baf78202e8 highlight: was broken since 580a79dde2a3 (encoding)
Dirkjan Ochtman <dirkjan@ochtman.nl>
parents: 8866
diff changeset
65 mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
6987
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
66 if 'html' in mt:
d09e813b21e3 highlight: only pygmentize for HTML mimetypes
Rocco Rutte <pdmef@gmx.net>
parents: 6938
diff changeset
67 fctx = webutil.filectx(web.repo, req)
26678
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
68 pygmentize(web, 'annotateline', fctx, tmpl)
613d850cce53 highlight: consolidate duplicate code
Gregory Szorc <gregory.szorc@gmail.com>
parents: 26295
diff changeset
69
7216
292fb2ad2846 extensions: use new wrapper functions
Matt Mackall <mpm@selenic.com>
parents: 7127
diff changeset
70 return orig(web, req, tmpl)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
71
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
72 def generate_css(web, req, tmpl):
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
73 pg_style = web.config('web', 'pygments_style', 'colorful')
19872
681f7b9213a4 check-code: check for spaces around = for named parameters
Mads Kiilerich <madski@unity3d.com>
parents: 16743
diff changeset
74 fmter = highlight.HtmlFormatter(style=pg_style)
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
75 req.respond(common.HTTP_OK, 'text/css')
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 10263
diff changeset
76 return ['/* pygments_style = %s */\n\n' % pg_style,
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 10263
diff changeset
77 fmter.get_style_defs('')]
6938
ce94b3236ea4 highlight: split code to improve startup times
Patrick Mezard <pmezard@gmail.com>
parents:
diff changeset
78
9409
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
79 def extsetup():
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
80 # monkeypatch in the new version
16683
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 10263
diff changeset
81 extensions.wrapfunction(webcommands, '_filerevision',
525fdb738975 cleanup: eradicate long lines
Brodie Rao <brodie@sf.io>
parents: 10263
diff changeset
82 filerevision_highlight)
9409
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
83 extensions.wrapfunction(webcommands, 'annotate', annotate_highlight)
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
84 webcommands.highlightcss = generate_css
57157a224037 highlight: move code from module top-level into extsetup
Martin Geisler <mg@lazybytes.net>
parents: 9262
diff changeset
85 webcommands.__all__.append('highlightcss')
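
The listing above only forwards `guessfilenameonly` to `highlight.pygmentize`; the detection logic itself lives in the implementation module, which is not shown here. The following is a hedged sketch of the decision that flag controls, as described in the commit message. The function `choose_lexer` and both detector functions are hypothetical stand-ins so the example runs without Pygments; they are not the extension's actual code.

```python
def choose_lexer(filename, text, by_filename, by_content, filename_only=False):
    """Return a lexer name, or None to leave the file unhighlighted."""
    head = text[:1024]  # the commit message mentions passing the first 1k
    lexer = by_filename(filename, head)
    if lexer is not None:
        return lexer
    if filename_only:
        # Strict mode: no filename-based match means no highlighting.
        return None
    # Default mode: accept whatever the content-only detector returns.
    return by_content(head)


# Hypothetical stand-in detectors, for illustration only.
def by_filename(filename, text):
    return "python" if filename.endswith(".py") else None


def by_content(text):
    return "actionscript3" if ":" in text else None


sample = "Subject: hello\n"
default_result = choose_lexer("README", sample, by_filename, by_content)
strict_result = choose_lexer("README", sample, by_filename, by_content,
                             filename_only=True)
known_result = choose_lexer("setup.py", sample, by_filename, by_content,
                            filename_only=True)
```

With the flag off, the extension-less `README` picks up a spurious lexer from the content detector; with it on, the file is left unhighlighted, while a file with a recognized name is still highlighted.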