view mercurial/commit.py @ 48687:f8f2ecdde4b5

branchmap: skip obsolete revisions while computing heads

It's time to make this part of core Mercurial obsolescence-aware.

Not considering obsolete revisions when computing heads is clearly what Mercurial should do. But there are a couple of small issues:

- Let's say tip of the repo is obsolete. There are two ways of finding tiprev for branchcache (both are in use): looking at input data for update() and looking at computed heads after update(). Previously, repo tip would be tiprev of the branchcache. With this patch, an obsolete revision can no longer be tiprev. And depending on what way we use for finding tiprev (input data vs computed heads) we'll get a different result. This is relevant when recomputing cache key from cache contents, and may lead to updating cache for obsolete revisions multiple times (not from scratch, because it still would be considered valid for a subset of revisions in the repo).

- If all commits on a branch are obsolete, the branchcache will include that branch, but the list of heads will be empty (that's why there's now `if not heads` when recomputing tiprev/tipnode from cache contents). Having an entry for every branch is currently required for notify extension (and test-notify.t to pass), because notify doesn't handle revsets in its subscription config very well and will throw an error if e.g. a branch doesn't exist.

- Cloning static HTTP repos may try to stat() a non-existent obsstore file. The issue is that we now care about obsolescence during clone, but statichttpvfs doesn't implement a stat method, so a regular vfs.stat() is used, and it assumes that file is local and calls os.stat(). During a clone, we're trying to stat() .hg/store/obsstore, but in static HTTP case we provide a literal URL to the obsstore file on the remote as if it were a local file path. On windows it actually results in a failure in test-static-http.t.

The first issue is going to be addressed in a series dedicated to making sure branchcache is properly and timely written on disk (it wasn't perfect even before this patch, but there aren't enough tests to demonstrate that). The second issue will be addressed in a future patch for notify extension that will make it not raise an exception if a branch doesn't exist. And the third one was partially addressed in the previous patch in this series and will be properly fixed in a future patch when this series is accepted.

filteredhash() grows a keyword argument to make sure that branchcache is also invalidated when there are new obsolete revisions in its repo view. This way the on-disk cache format is unchanged and compatible between versions (although it will obviously be recomputed when switching versions before/after this patch and the repo has obsolete revisions).

There's one test that uses plain `hg up` without arguments while updated to a pruned commit. To make this test pass, simply return current working directory parent. Later in this series this code will be replaced by what prune command does: updating to the closest non-obsolete ancestor.

Test changes:

test-branch-change.t: update branch head and cache update message. The head of default listed in hg heads is changed because revision 2 was rewritten as 7, and 1 is the closest ancestor on the same branch, so it's the head of default now.

The cache invalidation message appears now because of the cache hash change, since we're now accounting for obsolete revisions. Here's some context: "served.hidden" repo filter means everything is visible (no filtered revisions), so before this series branch2-served.hidden file would not contain any cache hash, only revnum and node. Now it also has a hash when there are obsolete changesets in the repo. The command that the message appears for is changing branch of 5 and 6, which are now obsolete, so the cache hash changes.

In general, when cache is simply out-of-date, it can be updated using the old version as a base. But if cache hash differs, then the cache for that particular repo filter is recomputed (at least with the current implementation). This is what happens here.

test-obsmarker-template.t: the pull reports 2 heads changed, but after that the repo correctly sees only 1. The new message could be better, but it's still an improvement over the previous one where hg pull suggested merging with an obsolete revision.

test-obsolete.t: we can see these revisions in hg log --hidden, but they shouldn't be considered heads even with --hidden.

test-rebase-obsolete{,2}.t: there were new heads created previously after making new orphan changesets, but they weren't detected. Now we are properly detecting and reporting them.

test-rebase-obsolete4.t: there's only one head now because the other head is pruned and was falsely reported before.

test-static-http.t: add obsstore to the list of requested files. This file doesn't exist on the remotes, but clients want it anyway (they get 404). This is fine, because there are other nonexistent files that clients request, like .hg/bookmarks or .hg/cache/tags2-served.

Differential Revision: https://phab.mercurial-scm.org/D12097
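
The behaviour described in this changeset (obsolete revisions are not heads, yet a branch whose commits are all obsolete still gets an entry with an empty list of heads) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Mercurial's branchmap code: the names `ancestors` and `branch_heads` and the plain-dict repository representation are hypothetical and exist only for this example.

```python
def ancestors(parents, rev):
    """Return the set of all ancestors of rev (toy helper, not hg API)."""
    seen = set()
    stack = list(parents[rev])
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def branch_heads(parents, branches, obsolete):
    """Return {branch: sorted head revs}, ignoring obsolete revisions.

    parents:  dict mapping rev -> tuple of parent revs
    branches: dict mapping rev -> branch name
    obsolete: set of obsolete revs
    (all three are assumptions of this toy model)
    """
    heads = {}
    for rev in sorted(parents):
        # every branch gets an entry, even if all its commits are obsolete
        bheads = heads.setdefault(branches[rev], set())
        if rev not in obsolete:
            bheads.add(rev)
    for bheads in heads.values():
        # a candidate stops being a head once it is an ancestor of another
        for rev in sorted(bheads):
            bheads -= ancestors(parents, rev)
    return {b: sorted(h) for b, h in heads.items()}


# revision 2 (default) and revision 3 (the only commit on "stable")
# are obsolete in this toy repository
parents = {0: (), 1: (0,), 2: (1,), 3: (0,)}
branches = {0: 'default', 1: 'default', 2: 'default', 3: 'stable'}
result = branch_heads(parents, branches, obsolete={2, 3})
```

With these inputs, `result` maps default to revision 1 (the closest non-obsolete ancestor of the rewritten revision) and stable to an empty list, which mirrors the `if not heads` case mentioned in the description.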
author Anton Shestakov <av6@dwimlabs.net>
date Fri, 07 Jan 2022 11:53:23 +0300
parents 5b9de38a0356
children f1eb77dceb36

# commit.py - functions to perform commits
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

from __future__ import absolute_import

import errno

from .i18n import _
from .node import (
    hex,
    nullrev,
)

from . import (
    context,
    mergestate,
    metadata,
    phases,
    scmutil,
    subrepoutil,
)


def _write_copy_meta(repo):
    """return a (changelog, filelog) boolean tuple

    changelog: copy related information should be stored in the changeset
    filelog:   copy related information should be written in the file revision
    """
    if repo.filecopiesmode == b'changeset-sidedata':
        writechangesetcopy = True
        writefilecopymeta = True
    else:
        writecopiesto = repo.ui.config(b'experimental', b'copies.write-to')
        writefilecopymeta = writecopiesto != b'changeset-only'
        writechangesetcopy = writecopiesto in (
            b'changeset-only',
            b'compatibility',
        )
    return writechangesetcopy, writefilecopymeta


def commitctx(repo, ctx, error=False, origctx=None):
    """Add a new revision to the target repository.
    Revision information is passed via the context argument.

    ctx.files() should list all files involved in this commit, i.e.
    modified/added/removed files. On merge, it may be wider than the
    ctx.files() to be committed, since any file nodes derived directly
    from p1 or p2 are excluded from the committed ctx.files().

    origctx is for convert to work around the problem that bug
    fixes to the files list in changesets change hashes. For
    convert to be the identity, it can pass an origctx and this
    function will use the same files list when it makes sense to
    do so.
    """
    repo = repo.unfiltered()

    p1, p2 = ctx.p1(), ctx.p2()
    user = ctx.user()

    with repo.lock(), repo.transaction(b"commit") as tr:
        mn, files = _prepare_files(tr, ctx, error=error, origctx=origctx)

        extra = ctx.extra().copy()

        if extra is not None:
            for name in (
                b'p1copies',
                b'p2copies',
                b'filesadded',
                b'filesremoved',
            ):
                extra.pop(name, None)
        if repo.changelog._copiesstorage == b'extra':
            extra = _extra_with_copies(repo, extra, files)

        # save the tip to check whether we actually committed anything
        oldtip = repo.changelog.tiprev()

        # update changelog
        repo.ui.note(_(b"committing changelog\n"))
        repo.changelog.delayupdate(tr)
        n = repo.changelog.add(
            mn,
            files,
            ctx.description(),
            tr,
            p1.node(),
            p2.node(),
            user,
            ctx.date(),
            extra,
        )
        rev = repo[n].rev()
        if oldtip != repo.changelog.tiprev():
            repo.register_changeset(rev, repo.changelog.changelogrevision(rev))

        xp1, xp2 = p1.hex(), p2 and p2.hex() or b''
        repo.hook(
            b'pretxncommit',
            throw=True,
            node=hex(n),
            parent1=xp1,
            parent2=xp2,
        )
        # set the new commit in its proper phase
        targetphase = subrepoutil.newcommitphase(repo.ui, ctx)

        # prevent unmarking changesets as public on recommit
        waspublic = oldtip == repo.changelog.tiprev() and not repo[rev].phase()

        if targetphase and not waspublic:
            # retracting the boundary does not alter parent changesets.
            # if a parent has a higher phase, the resulting phase will
            # be compliant anyway
            #
            # if minimal phase was 0 we don't need to retract anything
            phases.registernew(repo, tr, targetphase, [rev])
        return n


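def _prepare_files(tr, ctx, error=False, origctx=None):
    """compute the manifest node and files information for a commit

    Returns a (manifest node, metadata.ChangingFiles) tuple.
    """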
    repo = ctx.repo()
    p1 = ctx.p1()

    writechangesetcopy, writefilecopymeta = _write_copy_meta(repo)
    files = metadata.ChangingFiles()
    ms = mergestate.mergestate.read(repo)
    salvaged = _get_salvaged(repo, ms, ctx)
    for s in salvaged:
        files.mark_salvaged(s)

    if ctx.manifestnode():
        # reuse an existing manifest revision
        repo.ui.debug(b'reusing known manifest\n')
        mn = ctx.manifestnode()
        files.update_touched(ctx.files())
        if writechangesetcopy:
            files.update_added(ctx.filesadded())
            files.update_removed(ctx.filesremoved())
    elif not ctx.files():
        repo.ui.debug(b'reusing manifest from p1 (no file change)\n')
        mn = p1.manifestnode()
    else:
        mn = _process_files(tr, ctx, ms, files, error=error)

    if origctx and origctx.manifestnode() == mn:
        origfiles = origctx.files()
        assert files.touched.issubset(origfiles)
        files.update_touched(origfiles)

    if writechangesetcopy:
        files.update_copies_from_p1(ctx.p1copies())
        files.update_copies_from_p2(ctx.p2copies())

    return mn, files


def _get_salvaged(repo, ms, ctx):
    """return a list of salvaged files

    Returns an empty list if the config option that enables processing of
    salvaged files is not enabled"""
    salvaged = []
    copy_sd = repo.filecopiesmode == b'changeset-sidedata'
    if copy_sd and len(ctx.parents()) > 1:
        if ms.active():
            for fname in sorted(ms.allextras().keys()):
                might_removed = ms.extras(fname).get(b'merge-removal-candidate')
                if might_removed == b'yes':
                    if fname in ctx:
                        salvaged.append(fname)
    return salvaged


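def _process_files(tr, ctx, ms, files, error=False):
    """commit the files changed by ctx and write the resulting manifest

    `files` (a metadata.ChangingFiles) is updated in place. Returns the
    node of the manifest to use for the commit.
    """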
    repo = ctx.repo()
    p1 = ctx.p1()
    p2 = ctx.p2()

    writechangesetcopy, writefilecopymeta = _write_copy_meta(repo)

    m1ctx = p1.manifestctx()
    m2ctx = p2.manifestctx()
    mctx = m1ctx.copy()

    m = mctx.read()
    m1 = m1ctx.read()
    m2 = m2ctx.read()

    # check in files
    added = []
    removed = list(ctx.removed())
    linkrev = len(repo)
    repo.ui.note(_(b"committing files:\n"))
    uipathfn = scmutil.getuipathfn(repo)
    for f in sorted(ctx.modified() + ctx.added()):
        repo.ui.note(uipathfn(f) + b"\n")
        try:
            fctx = ctx[f]
            if fctx is None:
                removed.append(f)
            else:
                added.append(f)
                m[f], is_touched = _filecommit(
                    repo, fctx, m1, m2, linkrev, tr, writefilecopymeta, ms
                )
                if is_touched:
                    if is_touched == 'added':
                        files.mark_added(f)
                    elif is_touched == 'merged':
                        files.mark_merged(f)
                    else:
                        files.mark_touched(f)
                m.setflag(f, fctx.flags())
        except OSError:
            repo.ui.warn(_(b"trouble committing %s!\n") % uipathfn(f))
            raise
        except IOError as inst:
            errcode = getattr(inst, 'errno', errno.ENOENT)
            if error or errcode and errcode != errno.ENOENT:
                repo.ui.warn(_(b"trouble committing %s!\n") % uipathfn(f))
            raise

    # update manifest
    removed = [f for f in removed if f in m1 or f in m2]
    drop = sorted([f for f in removed if f in m])
    for f in drop:
        del m[f]
    if p2.rev() == nullrev:
        files.update_removed(removed)
    else:
        rf = metadata.get_removal_filter(ctx, (p1, p2, m1, m2))
        for f in removed:
            if not rf(f):
                files.mark_removed(f)

    mn = _commit_manifest(tr, linkrev, ctx, mctx, m, files.touched, added, drop)

    return mn


def _filecommit(
    repo,
    fctx,
    manifest1,
    manifest2,
    linkrev,
    tr,
    includecopymeta,
    ms,
):
    """
    commit an individual file as part of a larger transaction

    input:

        fctx:       a file context with the content we are trying to commit
        manifest1:  manifest of changeset first parent
        manifest2:  manifest of changeset second parent
        linkrev:    revision number of the changeset being created
        tr:         current transaction
        includecopymeta: boolean, set to False to skip storing the copy data
                    (only used by the Google specific feature of using
                    changeset extra as copy source of truth).
        ms:         mergestate object

    output: (filenode, touched)

        filenode: the filenode that should be used by this changeset
        touched:  one of: None (meaning untouched), 'added', 'modified'
                  or 'merged'
    """

    fname = fctx.path()
    fparent1 = manifest1.get(fname, repo.nullid)
    fparent2 = manifest2.get(fname, repo.nullid)
    touched = None
    if fparent1 == fparent2 == repo.nullid:
        touched = 'added'

    if isinstance(fctx, context.filectx):
        # This block is a fast path for the most common comparisons. It
        # assumes that a bare filectx is used and no merge happened, hence
        # no need to create a new file revision in this case.
        node = fctx.filenode()
        if node in [fparent1, fparent2]:
            repo.ui.debug(b'reusing %s filelog entry\n' % fname)
            if (
                fparent1 != repo.nullid
                and manifest1.flags(fname) != fctx.flags()
            ) or (
                fparent2 != repo.nullid
                and manifest2.flags(fname) != fctx.flags()
            ):
                touched = 'modified'
            return node, touched

    flog = repo.file(fname)
    meta = {}
    cfname = fctx.copysource()
    fnode = None

    if cfname and cfname != fname:
        # Mark the new revision of this file as a copy of another
        # file.  This copy data will effectively act as a parent
        # of this new revision.  If this is a merge, the first
        # parent will be the nullid (meaning "look up the copy data")
        # and the second one will be the other parent.  For example:
        #
        # 0 --- 1 --- 3   rev1 changes file foo
        #   \       /     rev2 renames foo to bar and changes it
        #    \- 2 -/      rev3 should have bar with all changes and
        #                      should record that bar descends from
        #                      bar in rev2 and foo in rev1
        #
        # this allows this merge to succeed:
        #
        # 0 --- 1 --- 3   rev4 reverts the content change from rev2
        #   \       /     merging rev3 and rev4 should use bar@rev2
        #    \- 2 --- 4        as the merge base
        #

        cnode = manifest1.get(cfname)
        newfparent = fparent2

        if manifest2:  # branch merge
            if (
                fparent2 == repo.nullid or cnode is None
            ):  # copied on remote side
                if cfname in manifest2:
                    cnode = manifest2[cfname]
                    newfparent = fparent1

        # Here, we used to search backwards through history to try to find
        # where the file copy came from if the source of a copy was not in
        # the parent directory. However, this doesn't actually make sense to
        # do (what does a copy from something not in your working copy even
        # mean?) and it causes bugs (eg, issue4476). Instead, we will warn
        # the user that copy information was dropped, so if they didn't
        # expect this outcome it can be fixed, but this is the correct
        # behavior in this circumstance.

        if cnode:
            repo.ui.debug(b" %s: copy %s:%s\n" % (fname, cfname, hex(cnode)))
            if includecopymeta:
                meta[b"copy"] = cfname
                meta[b"copyrev"] = hex(cnode)
            fparent1, fparent2 = repo.nullid, newfparent
        else:
            repo.ui.warn(
                _(
                    b"warning: can't find ancestor for '%s' "
                    b"copied from '%s'!\n"
                )
                % (fname, cfname)
            )

    elif fparent1 == repo.nullid:
        fparent1, fparent2 = fparent2, repo.nullid
    elif fparent2 != repo.nullid:
        if ms.active() and ms.extras(fname).get(b'filenode-source') == b'other':
            fparent1, fparent2 = fparent2, repo.nullid
        elif ms.active() and ms.extras(fname).get(b'merged') != b'yes':
            fparent1, fparent2 = fparent1, repo.nullid
        # is one parent an ancestor of the other?
        else:
            fparentancestors = flog.commonancestorsheads(fparent1, fparent2)
            if fparent1 in fparentancestors:
                fparent1, fparent2 = fparent2, repo.nullid
            elif fparent2 in fparentancestors:
                fparent2 = repo.nullid

    force_new_node = False
    # The file might have been deleted by merge code and the user explicitly
    # chose to revert the file and keep it. The other case is a change-delete
    # or delete-change conflict where the user explicitly chose to keep the
    # file. The goal is to create a new filenode for the user's explicit
    # choices.
    if (
        repo.ui.configbool(b'experimental', b'merge-track-salvaged')
        and ms.active()
        and ms.extras(fname).get(b'merge-removal-candidate') == b'yes'
    ):
        force_new_node = True
    # is the file changed?
    text = fctx.data()
    if (
        fparent2 != repo.nullid
        or fparent1 == repo.nullid
        or meta
        or flog.cmp(fparent1, text)
        or force_new_node
    ):
        if touched is None:  # do not overwrite added
            if fparent2 == repo.nullid:
                touched = 'modified'
            else:
                touched = 'merged'
        fnode = flog.add(text, meta, tr, linkrev, fparent1, fparent2)
    # are just the flags changed during merge?
    elif fname in manifest1 and manifest1.flags(fname) != fctx.flags():
        touched = 'modified'
        fnode = fparent1
    else:
        fnode = fparent1
    return fnode, touched


def _commit_manifest(tr, linkrev, ctx, mctx, manifest, files, added, drop):
    """make a new manifest entry (or reuse an existing one)

    given an initialised manifest context and precomputed list of
    - files: files affected by the commit
    - added: new entries in the manifest
    - drop:  entries present in parents but absent from this one

    Create a new manifest revision, reuse existing ones if possible.

    Return the nodeid of the manifest revision.
    """
    repo = ctx.repo()

    md = None

    # all this is cached, so it is fine to get them all from the ctx.
    p1 = ctx.p1()
    p2 = ctx.p2()
    m1ctx = p1.manifestctx()

    m1 = m1ctx.read()

    if not files:
        # if no "files" actually changed in terms of the changelog,
        # try hard to detect unmodified manifest entry so that the
        # exact same commit can be reproduced later on convert.
        md = m1.diff(manifest, scmutil.matchfiles(repo, ctx.files()))
    if not files and md:
        repo.ui.debug(
            b'not reusing manifest (no file change in '
            b'changelog, but manifest differs)\n'
        )
    if files or md:
        repo.ui.note(_(b"committing manifest\n"))
        # we're using narrowmatch here since it's already applied at
        # other stages (such as dirstate.walk), so we're already
        # ignoring things outside of narrowspec in most cases. The
        # one case where we might have files outside the narrowspec
        # at this point is merges, and we already error out in the
        # case where the merge has files outside of the narrowspec,
        # so this is safe.
        mn = mctx.write(
            tr,
            linkrev,
            p1.manifestnode(),
            p2.manifestnode(),
            added,
            drop,
            match=repo.narrowmatch(),
        )
    else:
        repo.ui.debug(
            b'reusing manifest from p1 (listed files actually unchanged)\n'
        )
        mn = p1.manifestnode()

    return mn


def _extra_with_copies(repo, extra, files):
    """encode copy information into an `extra` dictionary"""
    p1copies = files.copied_from_p1
    p2copies = files.copied_from_p2
    filesadded = files.added
    filesremoved = files.removed
    files = sorted(files.touched)
    if not _write_copy_meta(repo)[1]:
        # If writing only to changeset extras, use None to indicate that
        # no entry should be written. If writing to both, write an empty
        # entry to prevent the reader from falling back to reading
        # filelogs.
        p1copies = p1copies or None
        p2copies = p2copies or None
        filesadded = filesadded or None
        filesremoved = filesremoved or None

    extrasentries = p1copies, p2copies, filesadded, filesremoved
    if extra is None and any(x is not None for x in extrasentries):
        extra = {}
    if p1copies is not None:
        p1copies = metadata.encodecopies(files, p1copies)
        extra[b'p1copies'] = p1copies
    if p2copies is not None:
        p2copies = metadata.encodecopies(files, p2copies)
        extra[b'p2copies'] = p2copies
    if filesadded is not None:
        filesadded = metadata.encodefileindices(files, filesadded)
        extra[b'filesadded'] = filesadded
    if filesremoved is not None:
        filesremoved = metadata.encodefileindices(files, filesremoved)
        extra[b'filesremoved'] = filesremoved
    return extra