comparison hgext/lfs/wrapper.py @ 35476:417e8e040102

lfs: verify lfs object content when transferring to and from the remote store

This avoids inserting corrupt files into the usercache, and local and remote stores. One down side is that the bad file won't be available locally for forensic purposes after a remote download. I'm thinking about adding an 'incoming' directory to the local lfs store to handle the download, and then move it to the 'objects' directory after it passes verification. That would have the additional benefit of not concatenating each transfer chunk in memory until the full file is transferred.

Verification isn't needed when the data is passed back through the revlog interface or when the oid was just calculated, but otherwise it is on by default. The additional overhead should be well worth avoiding problems with file based remote stores, or buggy lfs servers.

Having two different verify functions is a little sad, but the full data of the blob is mostly passed around in memory, because that's what the revlog interface wants. The upload function, however, chunks up the data. It would be ideal if that was how the content is always handled, but that's probably a huge project.

I don't really like printing the long hash, but `hg debugdata` isn't a public interface, and is the only way to get it. The filelog and revision info is nowhere near this area, so recommending `hg verify` is the easiest thing to do.
author Matt Harbison <matt_harbison@yahoo.com>
date Fri, 17 Nov 2017 00:06:45 -0500
parents 02f54a1ec9eb
children 5a73a0446afd
35475:b0c01a5ee35c 35476:417e8e040102
52 oid = p.oid() 52 oid = p.oid()
53 store = self.opener.lfslocalblobstore 53 store = self.opener.lfslocalblobstore
54 if not store.has(oid): 54 if not store.has(oid):
55 p.filename = getattr(self, 'indexfile', None) 55 p.filename = getattr(self, 'indexfile', None)
56 self.opener.lfsremoteblobstore.readbatch([p], store) 56 self.opener.lfsremoteblobstore.readbatch([p], store)
57 text = store.read(oid) 57
58 # The caller will validate the content
59 text = store.read(oid, verify=False)
58 60
59 # pack hg filelog metadata 61 # pack hg filelog metadata
60 hgmeta = {} 62 hgmeta = {}
61 for k in p.keys(): 63 for k in p.keys():
62 if k.startswith('x-hg-'): 64 if k.startswith('x-hg-'):
74 # lfs blob does not contain hg filelog metadata 76 # lfs blob does not contain hg filelog metadata
75 text = text[offset:] 77 text = text[offset:]
76 78
77 # git-lfs only supports sha256 79 # git-lfs only supports sha256
78 oid = hashlib.sha256(text).hexdigest() 80 oid = hashlib.sha256(text).hexdigest()
79 self.opener.lfslocalblobstore.write(oid, text) 81 self.opener.lfslocalblobstore.write(oid, text, verify=False)
80 82
81 # replace contents with metadata 83 # replace contents with metadata
82 longoid = 'sha256:%s' % oid 84 longoid = 'sha256:%s' % oid
83 metadata = pointer.gitlfspointer(oid=longoid, size=str(len(text))) 85 metadata = pointer.gitlfspointer(oid=longoid, size=str(len(text)))
84 86
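
The commit message mentions two flavors of verification: one over a blob held fully in memory, and one over data that is read in chunks. The following is only a rough sketch of that idea using the standard library, not the code added by this changeset. The names `verify_in_memory` and `verify_chunked` are hypothetical, and the sketch assumes the oid is the bare sha256 hex digest (without the `sha256:` prefix).

```python
import hashlib
import io


def verify_in_memory(oid, data):
    """Return True if the sha256 hex digest of `data` equals `oid`.

    Hypothetical helper: the whole blob is already in memory, which is
    how the revlog interface hands content around.
    """
    return hashlib.sha256(data).hexdigest() == oid


def verify_chunked(oid, fp, chunksize=1 << 20):
    """Return True if the content read from `fp` hashes to `oid`.

    Hypothetical helper: the data is hashed chunk by chunk, so the full
    blob never has to be concatenated in memory.
    """
    hasher = hashlib.sha256()
    # iter() with a sentinel stops once read() returns an empty bytes object.
    for chunk in iter(lambda: fp.read(chunksize), b''):
        hasher.update(chunk)
    return hasher.hexdigest() == oid


# git-lfs only supports sha256, so the oid is derived the same way here.
blob = b'example lfs blob contents'
oid = hashlib.sha256(blob).hexdigest()

ok_memory = verify_in_memory(oid, blob)
ok_chunked = verify_chunked(oid, io.BytesIO(blob), chunksize=4)
corrupt_detected = not verify_in_memory(oid, blob + b'corruption')
```

When the oid has just been computed from the same bytes, as in the write path above, repeating the hash adds nothing, which is presumably why that call passes `verify=False`.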