highlight: ignore Unicode's extra linebreaks (issue4291)
author Matt Mackall <mpm@selenic.com>
Wed, 17 Dec 2014 13:25:24 -0600
changeset 23613 7b8ff3fd11d3
parent 23612 6006cad5e7a9
child 23614 cd79fb4d75fd
Unicode and Python's unicode.splitlines() treat several extra legacy ASCII codepoints as linebreaks, even though the vast bulk of computing and Python's own str.splitlines() do not. Rather than introduce line numbering confusion, we filter them out when highlighting.
hgext/highlight/highlight.py
--- a/hgext/highlight/highlight.py	Thu Dec 18 21:53:55 2014 +0100
+++ b/hgext/highlight/highlight.py	Wed Dec 17 13:25:24 2014 -0600
@@ -32,6 +32,11 @@
     if util.binary(text):
         return
 
+    # str.splitlines() != unicode.splitlines() because "reasons"
+    for c in "\x0c\x1c\x1d\x1e":
+        if c in text:
+            text = text.replace(c, '')
+
     # Pygments is best used with Unicode strings:
     # <http://pygments.org/docs/unicode/>
     text = text.decode(encoding.encoding, 'replace')
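
The mismatch this change works around can be reproduced in isolation. The following is a minimal sketch, not part of the changeset: it is written for Python 3, where bytes.splitlines() and str.splitlines() play the roles that str.splitlines() and unicode.splitlines() play in the Python 2 code above. The helper name strip_extra_breaks and the sample data are hypothetical and exist only for illustration.

```python
# Illustrative sketch (Python 3), not part of the changeset.
# bytes.splitlines() breaks only on \n, \r and \r\n, while
# str.splitlines() also breaks on \x0c, \x1c, \x1d and \x1e.

# The four characters filtered out by the patch, as single bytes.
EXTRA_BREAKS = (b"\x0c", b"\x1c", b"\x1d", b"\x1e")


def strip_extra_breaks(data: bytes) -> bytes:
    """Hypothetical helper mirroring the loop added in the diff."""
    for c in EXTRA_BREAKS:
        if c in data:
            data = data.replace(c, b"")
    return data


# Hypothetical file content: two real lines, each with one stray control byte.
sample = b"one\x0ctwo\nthree\x1cfour\n"

# Line count as seen by byte-oriented tools.
byte_lines = sample.splitlines()

# Line count after decoding, as a Unicode-aware highlighter would see it.
text_lines = sample.decode("ascii").splitlines()

# Filtering before decoding brings the two counts back in line.
filtered_lines = strip_extra_breaks(sample).decode("ascii").splitlines()

print(len(byte_lines), len(text_lines), len(filtered_lines))
```

Without the filter, the decoded text yields more lines than the raw bytes, which is the line numbering confusion the commit message refers to. Dropping the characters rather than replacing them keeps the line count aligned at the cost of not showing those control characters in the highlighted output.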