tests/filterpyflakes.py
author FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
Fri, 03 Mar 2017 02:57:06 +0900
changeset 31220 e1d035905b2e
parent 30430 21772a6a7861
child 32317 80e3002cd29e
permissions -rwxr-xr-x
similar: compare between actual file contents for exact identity Before this patch, similarity detection logic (for addremove and automv) depends entirely on SHA-1 digesting. But this causes incorrect rename detection, if: - removing file A and adding file B occur at same committing, and - SHA-1 hash values of file A and B are same This may prevent security experts from managing sample files for SHAttered issue in Mercurial repository, for example. https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html https://shattered.it/ Hash collision itself isn't so serious for core repository functionality of Mercurial, described by mpm as below, though. https://www.mercurial-scm.org/wiki/mpm/SHA1 This patch compares between actual file contents after hash comparison for exact identity. Even after this patch, SHA-1 is still used, because it is reasonable enough to quickly detect existence of "(almost) same" file. - replacing SHA-1 causes decreasing performance, and - replacement of it has ambiguity, yet Getting content of removed file (= rfctx.data()) at each exact comparison should be cheap enough, even though getting content of added one costs much. ======= ============== ===================== file fctx data() reads from ======= ============== ===================== removed filectx in-memory revlog data added workingfilectx storage ======= ============== =====================

#!/usr/bin/env python

# Filter output by pyflakes to control which warnings we check

from __future__ import absolute_import, print_function

import re
import sys

lines = []
for line in sys.stdin:
    # We blacklist tests that are too noisy for us
    pats = [
        r"undefined name '(WindowsError|memoryview)'",
        r"redefinition of unused '[^']+' from line",
    ]

    keep = True
    for pat in pats:
        if re.search(pat, line):
            keep = False
            break # pattern matches
    if keep:
        fn = line.split(':', 1)[0]
        f = open(fn)
        data = f.read()
        f.close()
        if 'no-' 'check-code' in data:
            continue
        lines.append(line)

for line in lines:
    sys.stdout.write(line)
print()

# self test of "undefined name" detection for other than 'memoryview'
if False:
    print(memoryview)
    print(undefinedname)