highlight: ignore Unicode's extra linebreaks (issue4291)
Unicode and Python's unicode.splitlines() treat several legacy ASCII
control characters (here 0x0c, 0x1c, 0x1d and 0x1e) as linebreaks,
even though most other software and Python's own str.splitlines() do
not. Rather than introduce line numbering confusion, we filter them
out before highlighting.
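
The mismatch can be reproduced with a short sketch. It is written for
Python 3 as an assumption, not taken from the patch: there, bytes
behaves like Python 2's str and str behaves like Python 2's unicode.
The helper name strip_extra_breaks and the sample string are
hypothetical and exist only for illustration.

```python
# Python 3 sketch: bytes plays the role of Python 2's str,
# and str plays the role of Python 2's unicode.
EXTRA_BREAKS = "\x0c\x1c\x1d\x1e"  # the code points filtered by this patch


def strip_extra_breaks(text):
    """Drop the extra linebreak code points from a text string.

    Hypothetical helper that mirrors the loop added to highlight.py.
    """
    for c in EXTRA_BREAKS:
        if c in text:
            text = text.replace(c, "")
    return text


# Two "\n"-terminated lines with a form feed and a file separator inside.
raw = "one\x0ctwo\nthree\x1cfour\n"

byte_lines = raw.encode("ascii").splitlines()   # splits on "\n" only here
text_lines = raw.splitlines()                   # also splits on \x0c, \x1c
filtered_lines = strip_extra_breaks(raw).splitlines()

print(len(byte_lines), len(text_lines), len(filtered_lines))  # prints: 2 4 2
```

With the extra characters removed, the text-based split yields the same
number of lines as the byte-based one, which is what keeps the
highlighted output aligned with the line numbers.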
--- a/hgext/highlight/highlight.py Thu Dec 18 21:53:55 2014 +0100
+++ b/hgext/highlight/highlight.py Wed Dec 17 13:25:24 2014 -0600
@@ -32,6 +32,11 @@
     if util.binary(text):
         return
+    # str.splitlines() != unicode.splitlines() because "reasons"
+    for c in "\x0c\x1c\x1d\x1e":
+        if c in text:
+            text = text.replace(c, '')
+
     # Pygments is best used with Unicode strings:
     # <http://pygments.org/docs/unicode/>
     text = text.decode(encoding.encoding, 'replace')