view hgext/automv.py @ 34413:014d467f9d08

effectflag: store an empty effect flag for the moment The idea behind effect flag is to store additional information in obs-markers about what changed between a changeset and its successor(s). It's a low-level information that comes without guarantees. This information can be computed a posteriori, but only if we have all changesets locally. This is not the case with distributed workflows where you work with several people or on several computers (eg: laptop + build server). Storing the effect-flag as a bitfield has several advantages: - It's compact, we are using one byte per obs-marker at most for the effect- flag. - It's compoundable, the obsfate log approach needs to display evolve history that could spans several obs-markers. Computing the effect-flag between a changeset and its grand-grand-grand-successor is simple thanks to the bitfield. The effect-flag design has also some limitations: - Evolving a changeset and reverting these changes just after would lead to two obs-markers with the same effect-flag without information that the first and third changesets are the same. The effect-flag current design is a trade-off between compactness and usefulness. Storing this information helps commands to display a more complete and understandable evolve history. For example, obslog (an Evolve command) use it to improve its output: x 62206adfd571 (34302) obscache: skip updating outdated obscache... | rewritten(parent) by Matthieu Laneuville <matthieu.laneuville@octobus... | rewritten(content) by Boris Feld <boris.feld@octobus.net> The effect flag is stored in obs-markers metadata while we iterate on the information we want to store. We plan to extend the existing obsmarkers bit-field when the effect flag design will be stabilized. It's different from the CommitCustody concept, effect-flag are not signed and can be forged. It's also different from the operation metadata as the command name (for example: amend) could alter a changeset in different ways (changing the content with hg amend, changing the description with hg amend -e, changing the user with hg amend -U). Also it's compatible with every custom command that writes obs-markers without needing to be updated. The effect-flag is placed behind an experimental flag set to off by default. Hook the saving of effect flag in create markers, but store only an empty one for the moment, I will refine the values in effect flag in following patches. For more information, see: https://www.mercurial-scm.org/wiki/ChangesetEvolutionDevel#Record_types_of_operation Differential Revision: https://phab.mercurial-scm.org/D533
author Boris Feld <boris.feld@octobus.net>
date Thu, 06 Jul 2017 14:50:17 +0200
parents 54bc88c56ec8
children 38637dd39cfd
line wrap: on
line source

# automv.py
#
# Copyright 2013-2016 Facebook, Inc.
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.
"""check for unrecorded moves at commit time (EXPERIMENTAL)

This extension checks at commit/amend time if any of the committed files
comes from an unrecorded mv.

The threshold at which a file is considered a move can be set with the
``automv.similarity`` config option. This option takes a percentage between 0
(disabled) and 100 (files must be identical), the default is 95.

"""

# Using 95 as a default similarity is based on an analysis of the mercurial
# repositories of the cpython, mozilla-central & mercurial repositories, as
# well as 2 very large facebook repositories. At 95 50% of all potential
# missed moves would be caught, as well as correspond with 87% of all
# explicitly marked moves.  Together, 80% of moved files are 95% similar or
# more.
#
# See http://markmail.org/thread/5pxnljesvufvom57 for context.

from __future__ import absolute_import

from mercurial.i18n import _
from mercurial import (
    commands,
    copies,
    error,
    extensions,
    registrar,
    scmutil,
    similar
)

configtable = {}
configitem = registrar.configitem(configtable)

configitem('automv', 'similarity',
    default=95,
)

def extsetup(ui):
    entry = extensions.wrapcommand(
        commands.table, 'commit', mvcheck)
    entry[1].append(
        ('', 'no-automv', None,
         _('disable automatic file move detection')))

def mvcheck(orig, ui, repo, *pats, **opts):
    """Hook to check for moves at commit time"""
    renames = None
    disabled = opts.pop('no_automv', False)
    if not disabled:
        threshold = ui.configint('automv', 'similarity')
        if not 0 <= threshold <= 100:
            raise error.Abort(_('automv.similarity must be between 0 and 100'))
        if threshold > 0:
            match = scmutil.match(repo[None], pats, opts)
            added, removed = _interestingfiles(repo, match)
            renames = _findrenames(repo, match, added, removed,
                                   threshold / 100.0)

    with repo.wlock():
        if renames is not None:
            scmutil._markchanges(repo, (), (), renames)
        return orig(ui, repo, *pats, **opts)

def _interestingfiles(repo, matcher):
    """Find what files were added or removed in this commit.

    Returns a tuple of two lists: (added, removed). Only files not *already*
    marked as moved are included in the added list.

    """
    stat = repo.status(match=matcher)
    added = stat[1]
    removed = stat[2]

    copy = copies._forwardcopies(repo['.'], repo[None], matcher)
    # remove the copy files for which we already have copy info
    added = [f for f in added if f not in copy]

    return added, removed

def _findrenames(repo, matcher, added, removed, similarity):
    """Find what files in added are really moved files.

    Any file named in removed that is at least similarity% similar to a file
    in added is seen as a rename.

    """
    renames = {}
    if similarity > 0:
        for src, dst, score in similar.findrenames(
                repo, added, removed, similarity):
            if repo.ui.verbose:
                repo.ui.status(
                    _('detected move of %s as %s (%d%% similar)\n') % (
                        matcher.rel(src), matcher.rel(dst), score * 100))
            renames[dst] = src
    if renames:
        repo.ui.status(_('detected move of %d files\n') % len(renames))
    return renames