comparison mercurial/revlogutils/deltas.py @ 40603:2f7e531ef3e7

sparse-revlog: skip the span check in the sparse-revlog case

This significantly improves the performance of unbundling on smaller repositories.

Mercurial: unbundling 1K revisions
    no-sparse-revlog:     500 ms
    sparse-revlog-before: 689 ms
    sparse-revlog-after:  484 ms

Pypy: unbundling 1K revisions
    no-sparse-revlog:     1.242 s
    sparse-revlog-before: 1.135 s
    sparse-revlog-after:  0.860 s

NetBeans: unbundling 1K revisions
    no-sparse-revlog:     1.386 s
    sparse-revlog-before: 2.368 s
    sparse-revlog-after:  1.191 s

Mozilla: unbundling 1K revisions
    no-sparse-revlog:     3.103 s
    sparse-revlog-before: 3.367 s
    sparse-revlog-after:  3.093 s
author Boris Feld <boris.feld@octobus.net>
date Mon, 15 Oct 2018 15:45:08 +0200
parents 324ba8b14d78
children 3ac23dad6364
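
The effect of this change on the read-span check can be sketched in isolation. The function below is a simplified, standalone illustration and not the code from deltas.py: the name span_check_passes and its plain-value parameters are hypothetical stand-ins for revlog._sparserevlog, deltainfo.distance, revinfo.textlen and revlog._maxdeltachainspan.

```python
def span_check_passes(sparserevlog, distance, textlen, maxdeltachainspan):
    """Return True if a candidate delta passes the read-span check.

    Hypothetical standalone version of the post-change logic; the real
    code reads these values from the revlog, deltainfo and revinfo objects.
    """
    defaultmax = textlen * 4
    maxdist = maxdeltachainspan
    if not maxdist:
        # No limit configured: use the distance itself so the check passes.
        maxdist = distance
    maxdist = max(maxdist, defaultmax)

    # With sparse-revlog, sparse reading is trusted to keep reads bounded,
    # so the span is only checked for non-sparse revlogs.
    if not sparserevlog and maxdist < distance:
        return False
    return True


# Same inputs, differing only in whether sparse-revlog is enabled.
non_sparse = span_check_passes(False, distance=1000, textlen=100,
                               maxdeltachainspan=500)
sparse = span_check_passes(True, distance=1000, textlen=100,
                           maxdeltachainspan=500)
```

With these illustrative numbers the non-sparse call rejects the delta because the span (1000) exceeds the limit (500), while the sparse call accepts it because the check is skipped.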
40602:c36175456350 40603:2f7e531ef3e7
487 # bounding it limits the amount of I/O we need to do. 487 # bounding it limits the amount of I/O we need to do.
488 # - 'deltainfo.compresseddeltalen' is the sum of the total size of 488 # - 'deltainfo.compresseddeltalen' is the sum of the total size of
489 # deltas we need to apply -- bounding it limits the amount of CPU 489 # deltas we need to apply -- bounding it limits the amount of CPU
490 # we consume. 490 # we consume.
491 491
492 if revlog._sparserevlog:
493 # As sparse-read will be used, we can consider that the distance,
494 # instead of being the span of the whole chunk,
495 # is the span of the largest read chunk
496 base = deltainfo.base
497
498 if base != nullrev:
499 deltachain = revlog._deltachain(base)[0]
500 else:
501 deltachain = []
502
503 # search for the first non-snapshot revision
504 for idx, r in enumerate(deltachain):
505 if not revlog.issnapshot(r):
506 break
507 deltachain = deltachain[idx:]
508 chunks = slicechunk(revlog, deltachain, deltainfo)
509 all_span = [segmentspan(revlog, revs, deltainfo)
510 for revs in chunks]
511 distance = max(all_span)
512 else:
513 distance = deltainfo.distance
514
515 textlen = revinfo.textlen 492 textlen = revinfo.textlen
516 defaultmax = textlen * 4 493 defaultmax = textlen * 4
517 maxdist = revlog._maxdeltachainspan 494 maxdist = revlog._maxdeltachainspan
518 if not maxdist: 495 if not maxdist:
519 maxdist = distance # ensure the conditional pass 496 maxdist = deltainfo.distance # ensure the conditional pass
520 maxdist = max(maxdist, defaultmax) 497 maxdist = max(maxdist, defaultmax)
521 if revlog._sparserevlog and maxdist < revlog._srmingapsize:
522 # In multiple places, we ignore irrelevant data ranges below a
523 # certain size. We also apply this tradeoff here and relax the span
524 # constraint for small enough content.
525 maxdist = revlog._srmingapsize
526 498
527 # Bad delta from read span: 499 # Bad delta from read span:
528 # 500 #
529 # If the span of data read is larger than the maximum allowed. 501 # If the span of data read is larger than the maximum allowed.
530 if maxdist < distance: 502 #
503 # In the sparse-revlog case, we rely on the associated "sparse reading"
504 # to avoid issues related to the span of data. In theory, it would be
505 # possible to build a pathological revlog whose delta pattern would lead
506 # to too many reads. However, this does not happen in practice, so
507 # we skip the span check entirely.
508 if not revlog._sparserevlog and maxdist < deltainfo.distance:
531 return False 509 return False
532 510
533 # Bad delta from new delta size: 511 # Bad delta from new delta size:
534 # 512 #
535 # If the delta size is larger than the target text, storing the 513 # If the delta size is larger than the target text, storing the