lock: fix race in lock-breaking code
authorValentin Gatien-Baron <valentin.gatienbaron@gmail.com>
Mon, 18 Nov 2019 20:10:38 -0800
changeset 43709 039fbd14d4e2
parent 43708 cd822413b9aa
child 43711 7f443cce2972
lock: fix race in lock-breaking code With low frequency, I see hg pulls fail with output like: abort: no such file or directory: .hg/store/lock I think what happens is, in lock.py, in: def _testlock(self, locker): if not self._lockshouldbebroken(locker): return locker # if locker dead, break lock. must do this with another lock # held, or can race and break valid lock. try: with lock(self.vfs, self.f + b'.break', timeout=0): self.vfs.unlink(self.f) except error.LockError: return locker if a lock is breakable on disk, and two hg processes concurrently get to the "if locker dead" comment, a possible interleaving is: process1 finishes executing the function and then process2 finishes executing the function. If that happens, process2 will either get ENOENT in self.vfs.unlink (resulting in the spurious failure above), or break a valid lock and potentially cause repository corruption. The fix is simple enough: make sure the lock is breakable _inside_ the critical section, because only then can we know that no other process can invalidate our knowledge on the lock on disk. I don't think there are tests for this. I've tested this manually with: diff --git a/mercurial/lock.py b/mercurial/lock.py --- a/mercurial/lock.py +++ b/mercurial/lock.py @@ -351,6 +351,8 @@ class lock(object): if not self._lockshouldbebroken(locker): return locker + import random + time.sleep(1. + random.random()) # if locker dead, break lock. must do this with another lock # held, or can race and break valid lock. try: @@ -358,6 +360,7 @@ class lock(object): self.vfs.unlink(self.f) except error.LockError: return locker + time.sleep(1) def testlock(self): """return id of locker if lock is valid, else None. and I see this change of behavior before/after this commit: $ $hg init repo $ cd repo $ ln -s $HOSTNAME/effffffc:987654321 .hg/wlock $ touch a $ $hg commit -Am_ & $hg commit -Am _; wait -abort: No such file or directory: '/tmp/repo/.hg/wlock' adding a +warning: ignoring unknown working parent 679a8959a8ca! +nothing changed Differential Revision: https://phab.mercurial-scm.org/D7199
mercurial/lock.py
--- a/mercurial/lock.py	Fri Nov 01 19:59:07 2019 -0400
+++ b/mercurial/lock.py	Mon Nov 18 20:10:38 2019 -0800
@@ -355,6 +355,9 @@
         # held, or can race and break valid lock.
         try:
             with lock(self.vfs, self.f + b'.break', timeout=0):
+                locker = self._readlock()
+                if not self._lockshouldbebroken(locker):
+                    return locker
                 self.vfs.unlink(self.f)
         except error.LockError:
             return locker