fsmonitor: match watchman and filesystem encoding
watchman's paths encoding can differ from filesystem encoding. For example,
on Windows, it's always utf-8.
Before this patch, on Windows, mismatch in path comparison between fsmonitor
state and osutil.statfiles would yield a clean status for added/modified files.
In addition to status reporting wrong results, this leads to files being
discarded from changesets while doing history editing operations such as rebase.
Benchmark:
There is a little overhead at module import:
python -m timeit "import hgext.fsmonitor"
Windows before patch: 1000000 loops, best of 3: 0.563 usec per loop
Windows after patch: 1000000 loops, best of 3: 0.583 usec per loop
Linx before patch: 1000000 loops, best of 3: 0.579 usec per loop
Linux after patch: 1000000 loops, best of 3: 0.588 usec per loop
10000 calls to _watchmantofsencoding:
python -m timeit -s "from hgext.fsmonitor import _watchmantofsencoding, _fixencoding" "fname = '/path/to/file'" "for i in range(10000):" " if _fixencoding: fname = _watchmantofsencoding(fname)"
Windows (_fixencoding is True): 100 loops, best of 3: 19.5 msec per loop
Linux (_fixencoding is False): 100 loops, best of 3: 3.08 msec per loop
# Read the output of a "svn log --xml" command on stdin, parse it and
# print a subset of attributes common to all svn versions tested by
# hg.
from __future__ import absolute_import
import sys
import xml.dom.minidom
def xmltext(e):
return ''.join(c.data for c
in e.childNodes
if c.nodeType == c.TEXT_NODE)
def parseentry(entry):
e = {}
e['revision'] = entry.getAttribute('revision')
e['author'] = xmltext(entry.getElementsByTagName('author')[0])
e['msg'] = xmltext(entry.getElementsByTagName('msg')[0])
e['paths'] = []
paths = entry.getElementsByTagName('paths')
if paths:
paths = paths[0]
for p in paths.getElementsByTagName('path'):
action = p.getAttribute('action')
path = xmltext(p)
frompath = p.getAttribute('copyfrom-path')
fromrev = p.getAttribute('copyfrom-rev')
e['paths'].append((path, action, frompath, fromrev))
return e
def parselog(data):
entries = []
doc = xml.dom.minidom.parseString(data)
for e in doc.getElementsByTagName('logentry'):
entries.append(parseentry(e))
return entries
def printentries(entries):
fp = sys.stdout
for e in entries:
for k in ('revision', 'author', 'msg'):
fp.write(('%s: %s\n' % (k, e[k])).encode('utf-8'))
for path, action, fpath, frev in sorted(e['paths']):
frominfo = ''
if frev:
frominfo = ' (from %s@%s)' % (fpath, frev)
p = ' %s %s%s\n' % (action, path, frominfo)
fp.write(p.encode('utf-8'))
if __name__ == '__main__':
data = sys.stdin.read()
entries = parselog(data)
printentries(entries)