contrib/synthrepo.py
author Martin von Zweigbergk <martinvonz@google.com>
Thu, 22 Jan 2015 23:18:43 -0800
changeset 24105 0f8baebcdbea
parent 23778 a5dbec255f14
child 24306 6ddc86eedc3b
permissions -rw-r--r--
trydiff: read file data in only one place This moves getfilectx() out of the initial block in the loop, leaving that block to be only about finding pairs of filenames in ctx1 and ctx2 to diff.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     1
# synthrepo.py - repo synthesis
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     2
#
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     3
# Copyright 2012 Facebook
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     4
#
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     5
# This software may be used and distributed according to the terms of the
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     6
# GNU General Public License version 2 or any later version.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     7
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     8
'''synthesize structurally interesting change history
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
     9
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    10
This extension is useful for creating a repository with properties
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    11
that are statistically similar to an existing repository. During
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    12
analysis, a simple probability table is constructed from the history
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    13
of an existing repository.  During synthesis, these properties are
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    14
reconstructed.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    15
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    16
Properties that are analyzed and synthesized include the following:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    17
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    18
- Lines added or removed when an existing file is modified
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    19
- Number and sizes of files added
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    20
- Number of files removed
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    21
- Line lengths
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    22
- Topological distance to parent changeset(s)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    23
- Probability of a commit being a merge
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    24
- Probability of a newly added file being added to a new directory
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    25
- Interarrival time, and time zone, of commits
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
    26
- Number of files in each directory
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    27
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    28
A few obvious properties that are not currently handled realistically:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    29
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    30
- Merges are treated as regular commits with two parents, which is not
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    31
  realistic
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    32
- Modifications are not treated as operations on hunks of lines, but
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    33
  as insertions and deletions of randomly chosen single lines
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    34
- Committer ID (always random)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    35
- Executability of files
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    36
- Symlinks and binary files are ignored
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    37
'''
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    38
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
    39
import bisect, collections, itertools, json, os, random, time, sys
19322
ff1586a3adc5 cleanup: remove unused imports
Simon Heimberg <simohe@besonet.ch>
parents: 18927
diff changeset
    40
from mercurial import cmdutil, context, patch, scmutil, util, hg
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    41
from mercurial.i18n import _
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
    42
from mercurial.node import nullrev, nullid, short
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    43
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    44
testedwith = 'internal'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    45
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    46
cmdtable = {}
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    47
command = cmdutil.command(cmdtable)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    48
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    49
newfile = set(('new fi', 'rename', 'copy f', 'copy t'))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    50
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    51
def zerodict():
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    52
    return collections.defaultdict(lambda: 0)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    53
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    54
def roundto(x, k):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    55
    if x > k * 2:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    56
        return int(round(x / float(k)) * k)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    57
    return int(round(x))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    58
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    59
def parsegitdiff(lines):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    60
    filename, mar, lineadd, lineremove = None, None, zerodict(), 0
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    61
    binary = False
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    62
    for line in lines:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    63
        start = line[:6]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    64
        if start == 'diff -':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    65
            if filename:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    66
                yield filename, mar, lineadd, lineremove, binary
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    67
            mar, lineadd, lineremove, binary = 'm', zerodict(), 0, False
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    68
            filename = patch.gitre.match(line).group(1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    69
        elif start in newfile:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    70
            mar = 'a'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    71
        elif start == 'GIT bi':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    72
            binary = True
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    73
        elif start == 'delete':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    74
            mar = 'r'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    75
        elif start:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    76
            s = start[0]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    77
            if s == '-' and not line.startswith('--- '):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    78
                lineremove += 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    79
            elif s == '+' and not line.startswith('+++ '):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    80
                lineadd[roundto(len(line) - 1, 5)] += 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    81
    if filename:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    82
        yield filename, mar, lineadd, lineremove, binary
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    83
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    84
@command('analyze',
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
    85
         [('o', 'output', '', _('write output to given file'), _('FILE')),
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    86
          ('r', 'rev', [], _('analyze specified revisions'), _('REV'))],
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
    87
         _('hg analyze'), optionalrepo=True)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    88
def analyze(ui, repo, *revs, **opts):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    89
    '''create a simple model of a repository to use for later synthesis
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    90
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    91
    This command examines every changeset in the given range (or all
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    92
    of history if none are specified) and creates a simple statistical
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
    93
    model of the history of the repository. It also measures the directory
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
    94
    structure of the repository as checked out.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    95
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    96
    The model is written out to a JSON file, and can be used by
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    97
    :hg:`synthesize` to create or augment a repository with synthetic
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    98
    commits that have a structure that is statistically similar to the
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
    99
    analyzed repository.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   100
    '''
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   101
    root = repo.root
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   102
    if not root.endswith(os.path.sep):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   103
        root += os.path.sep
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   104
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   105
    revs = list(revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   106
    revs.extend(opts['rev'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   107
    if not revs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   108
        revs = [':']
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   109
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   110
    output = opts['output']
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   111
    if not output:
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   112
        output = os.path.basename(root) + '.json'
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   113
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   114
    if output == '-':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   115
        fp = sys.stdout
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   116
    else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   117
        fp = open(output, 'w')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   118
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   119
    # Always obtain file counts of each directory in the given root directory.
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   120
    def onerror(e):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   121
        ui.warn(_('error walking directory structure: %s\n') % e)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   122
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   123
    dirs = {}
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   124
    rootprefixlen = len(root)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   125
    for dirpath, dirnames, filenames in os.walk(root, onerror=onerror):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   126
        dirpathfromroot = dirpath[rootprefixlen:]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   127
        dirs[dirpathfromroot] = len(filenames)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   128
        if '.hg' in dirnames:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   129
            dirnames.remove('.hg')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   130
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   131
    lineschanged = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   132
    children = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   133
    p1distance = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   134
    p2distance = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   135
    linesinfilesadded = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   136
    fileschanged = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   137
    filesadded = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   138
    filesremoved = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   139
    linelengths = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   140
    interarrival = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   141
    parents = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   142
    dirsadded = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   143
    tzoffset = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   144
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   145
    # If a mercurial repo is available, also model the commit history.
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   146
    if repo:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   147
        revs = scmutil.revrange(repo, revs)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   148
        revs.sort()
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   149
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   150
        progress = ui.progress
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   151
        _analyzing = _('analyzing')
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   152
        _changesets = _('changesets')
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   153
        _total = len(revs)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   154
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   155
        for i, rev in enumerate(revs):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   156
            progress(_analyzing, i, unit=_changesets, total=_total)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   157
            ctx = repo[rev]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   158
            pl = ctx.parents()
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   159
            pctx = pl[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   160
            prev = pctx.rev()
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   161
            children[prev] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   162
            p1distance[rev - prev] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   163
            parents[len(pl)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   164
            tzoffset[ctx.date()[1]] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   165
            if len(pl) > 1:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   166
                p2distance[rev - pl[1].rev()] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   167
            if prev == rev - 1:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   168
                lastctx = pctx
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   169
            else:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   170
                lastctx = repo[rev - 1]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   171
            if lastctx.rev() != nullrev:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   172
                timedelta = ctx.date()[0] - lastctx.date()[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   173
                interarrival[roundto(timedelta, 300)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   174
            diff = sum((d.splitlines() for d in ctx.diff(pctx, git=True)), [])
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   175
            fileadds, diradds, fileremoves, filechanges = 0, 0, 0, 0
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   176
            for filename, mar, lineadd, lineremove, isbin in parsegitdiff(diff):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   177
                if isbin:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   178
                    continue
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   179
                added = sum(lineadd.itervalues(), 0)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   180
                if mar == 'm':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   181
                    if added and lineremove:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   182
                        lineschanged[roundto(added, 5),
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   183
                                     roundto(lineremove, 5)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   184
                        filechanges += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   185
                elif mar == 'a':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   186
                    fileadds += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   187
                    if '/' in filename:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   188
                        filedir = filename.rsplit('/', 1)[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   189
                        if filedir not in pctx.dirs():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   190
                            diradds += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   191
                    linesinfilesadded[roundto(added, 5)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   192
                elif mar == 'r':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   193
                    fileremoves += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   194
                for length, count in lineadd.iteritems():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   195
                    linelengths[length] += count
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   196
            fileschanged[filechanges] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   197
            filesadded[fileadds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   198
            dirsadded[diradds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   199
            filesremoved[fileremoves] += 1
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   200
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   201
    invchildren = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   202
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   203
    for rev, count in children.iteritems():
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   204
        invchildren[count] += 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   205
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   206
    if output != '-':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   207
        ui.status(_('writing output to %s\n') % output)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   208
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   209
    def pronk(d):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   210
        return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   211
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   212
    json.dump({'revs': len(revs),
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
   213
               'initdirs': pronk(dirs),
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   214
               'lineschanged': pronk(lineschanged),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   215
               'children': pronk(invchildren),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   216
               'fileschanged': pronk(fileschanged),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   217
               'filesadded': pronk(filesadded),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   218
               'linesinfilesadded': pronk(linesinfilesadded),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   219
               'dirsadded': pronk(dirsadded),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   220
               'filesremoved': pronk(filesremoved),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   221
               'linelengths': pronk(linelengths),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   222
               'parents': pronk(parents),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   223
               'p1distance': pronk(p1distance),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   224
               'p2distance': pronk(p2distance),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   225
               'interarrival': pronk(interarrival),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   226
               'tzoffset': pronk(tzoffset),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
   227
               },
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   228
              fp)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   229
    fp.close()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   230
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   231
@command('synthesize',
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   232
         [('c', 'count', 0, _('create given number of commits'), _('COUNT')),
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   233
          ('', 'dict', '', _('path to a dictionary of words'), _('FILE')),
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   234
          ('', 'initfiles', 0, _('initial file count to create'), _('COUNT'))],
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   235
         _('hg synthesize [OPTION].. DESCFILE'))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   236
def synthesize(ui, repo, descpath, **opts):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   237
    '''synthesize commits based on a model of an existing repository
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   238
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   239
    The model must have been generated by :hg:`analyze`. Commits will
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   240
    be generated randomly according to the probabilities described in
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   241
    the model. If --initfiles is set, the repository will be seeded with
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   242
    the given number files following the modeled repository's directory
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   243
    structure.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   244
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   245
    When synthesizing new content, commit descriptions, and user
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   246
    names, words will be chosen randomly from a dictionary that is
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   247
    presumed to contain one word per line. Use --dict to specify the
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   248
    path to an alternate dictionary to use.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   249
    '''
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   250
    try:
17887
0e2846b2482c url: use open and not url.open for local files (issue3624)
Siddharth Agarwal <sid0@fb.com>
parents: 17734
diff changeset
   251
        fp = hg.openpath(ui, descpath)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   252
    except Exception, err:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   253
        raise util.Abort('%s: %s' % (descpath, err[0].strerror))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   254
    desc = json.load(fp)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   255
    fp.close()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   256
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   257
    def cdf(l):
18047
9196638b08ce synthrepo: do not crash if a list is empty
Bryan O'Sullivan <bryano@fb.com>
parents: 17887
diff changeset
   258
        if not l:
9196638b08ce synthrepo: do not crash if a list is empty
Bryan O'Sullivan <bryano@fb.com>
parents: 17887
diff changeset
   259
            return [], []
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   260
        vals, probs = zip(*sorted(l, key=lambda x: x[1], reverse=True))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   261
        t = float(sum(probs, 0))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   262
        s, cdfs = 0, []
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   263
        for v in probs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   264
            s += v
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   265
            cdfs.append(s / t)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   266
        return vals, cdfs
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   267
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   268
    lineschanged = cdf(desc['lineschanged'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   269
    fileschanged = cdf(desc['fileschanged'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   270
    filesadded = cdf(desc['filesadded'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   271
    dirsadded = cdf(desc['dirsadded'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   272
    filesremoved = cdf(desc['filesremoved'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   273
    linelengths = cdf(desc['linelengths'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   274
    parents = cdf(desc['parents'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   275
    p1distance = cdf(desc['p1distance'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   276
    p2distance = cdf(desc['p2distance'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   277
    interarrival = cdf(desc['interarrival'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   278
    linesinfilesadded = cdf(desc['linesinfilesadded'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   279
    tzoffset = cdf(desc['tzoffset'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   280
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   281
    dictfile = opts.get('dict') or '/usr/share/dict/words'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   282
    try:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   283
        fp = open(dictfile, 'rU')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   284
    except IOError, err:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   285
        raise util.Abort('%s: %s' % (dictfile, err.strerror))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   286
    words = fp.read().splitlines()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   287
    fp.close()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   288
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   289
    initdirs = {}
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   290
    if desc['initdirs']:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   291
        for k, v in desc['initdirs']:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   292
            initdirs[k.encode('utf-8').replace('.hg', '_hg')] = v
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   293
        initdirs = renamedirs(initdirs, words)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   294
    initdirscdf = cdf(initdirs)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   295
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   296
    def pick(cdf):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   297
        return cdf[0][bisect.bisect_left(cdf[1], random.random())]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   298
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   299
    def pickpath():
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   300
        return os.path.join(pick(initdirscdf), random.choice(words))
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   301
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   302
    def makeline(minimum=0):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   303
        total = max(minimum, pick(linelengths))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   304
        c, l = 0, []
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   305
        while c < total:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   306
            w = random.choice(words)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   307
            c += len(w) + 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   308
            l.append(w)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   309
        return ' '.join(l)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   310
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   311
    wlock = repo.wlock()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   312
    lock = repo.lock()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   313
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   314
    nevertouch = set(('.hgsub', '.hgignore', '.hgtags'))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   315
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   316
    progress = ui.progress
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   317
    _synthesizing = _('synthesizing')
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   318
    _files = _('initial files')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   319
    _changesets = _('changesets')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   320
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   321
    # Synthesize a single initial revision adding files to the repo according
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   322
    # to the modeled directory structure.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   323
    initcount = int(opts['initfiles'])
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   324
    if initcount and initdirs:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   325
        pctx = repo[None].parents()[0]
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   326
        dirs = set(pctx.dirs())
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   327
        files = {}
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   328
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   329
        def validpath(path):
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   330
            # Don't pick filenames which are already directory names.
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   331
            if path in dirs:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   332
                return False
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   333
            # Don't pick directories which were used as file names.
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   334
            while path:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   335
                if path in files:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   336
                    return False
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   337
                path = os.path.dirname(path)
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   338
            return True
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   339
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   340
        for i in xrange(0, initcount):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   341
            ui.progress(_synthesizing, i, unit=_files, total=initcount)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   342
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   343
            path = pickpath()
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   344
            while not validpath(path):
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   345
                path = pickpath()
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   346
            data = '%s contents\n' % path
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   347
            files[path] = context.memfilectx(repo, path, data)
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   348
            dir = os.path.dirname(path)
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   349
            while dir and dir not in dirs:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   350
                dirs.add(dir)
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
   351
                dir = os.path.dirname(dir)
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   352
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   353
        def filectxfn(repo, memctx, path):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   354
            return files[path]
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   355
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   356
        ui.progress(_synthesizing, None)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   357
        message = 'synthesized wide repo with %d files' % (len(files),)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   358
        mc = context.memctx(repo, [pctx.node(), nullid], message,
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   359
                            files.iterkeys(), filectxfn, ui.username(),
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   360
                            '%d %d' % util.makedate())
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   361
        initnode = mc.commit()
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   362
        hexfn = ui.debugflag and hex or short
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   363
        ui.status(_('added commit %s with %d files\n')
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   364
                  % (hexfn(initnode), len(files)))
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   365
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   366
    # Synthesize incremental revisions to the repository, adding repo depth.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   367
    count = int(opts['count'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   368
    heads = set(map(repo.changelog.rev, repo.heads()))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   369
    for i in xrange(count):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   370
        progress(_synthesizing, i, unit=_changesets, total=count)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   371
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   372
        node = repo.changelog.node
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   373
        revs = len(repo)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   374
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   375
        def pickhead(heads, distance):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   376
            if heads:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   377
                lheads = sorted(heads)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   378
                rev = revs - min(pick(distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   379
                if rev < lheads[-1]:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   380
                    rev = lheads[bisect.bisect_left(lheads, rev)]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   381
                else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   382
                    rev = lheads[-1]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   383
                return rev, node(rev)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   384
            return nullrev, nullid
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   385
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   386
        r1 = revs - min(pick(p1distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   387
        p1 = node(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   388
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   389
        # the number of heads will grow without bound if we use a pure
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   390
        # model, so artificially constrain their proliferation
22472
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
   391
        toomanyheads = len(heads) > random.randint(1, 20)
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
   392
        if p2distance[0] and (pick(parents) == 2 or toomanyheads):
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   393
            r2, p2 = pickhead(heads.difference([r1]), p2distance)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   394
        else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   395
            r2, p2 = nullrev, nullid
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   396
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   397
        pl = [p1, p2]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   398
        pctx = repo[r1]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   399
        mf = pctx.manifest()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   400
        mfk = mf.keys()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   401
        changes = {}
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   402
        if mfk:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   403
            for __ in xrange(pick(fileschanged)):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   404
                for __ in xrange(10):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   405
                    fctx = pctx.filectx(random.choice(mfk))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   406
                    path = fctx.path()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   407
                    if not (path in nevertouch or fctx.isbinary() or
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   408
                            'l' in fctx.flags()):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   409
                        break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   410
                lines = fctx.data().splitlines()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   411
                add, remove = pick(lineschanged)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   412
                for __ in xrange(remove):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   413
                    if not lines:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   414
                        break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   415
                    del lines[random.randrange(0, len(lines))]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   416
                for __ in xrange(add):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   417
                    lines.insert(random.randint(0, len(lines)), makeline())
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   418
                path = fctx.path()
21689
503bb3af70fe memfilectx: call super.__init__ instead of duplicating code
Sean Farley <sean.michael.farley@gmail.com>
parents: 20672
diff changeset
   419
                changes[path] = context.memfilectx(repo, path,
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   420
                                                   '\n'.join(lines) + '\n')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   421
            for __ in xrange(pick(filesremoved)):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   422
                path = random.choice(mfk)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   423
                for __ in xrange(10):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   424
                    path = random.choice(mfk)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   425
                    if path not in changes:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   426
                        changes[path] = None
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   427
                        break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   428
        if filesadded:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   429
            dirs = list(pctx.dirs())
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   430
            dirs.insert(0, '')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   431
        for __ in xrange(pick(filesadded)):
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   432
            pathstr = ''
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   433
            while pathstr in dirs:
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   434
                path = [random.choice(dirs)]
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   435
                if pick(dirsadded):
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   436
                    path.append(random.choice(words))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   437
                path.append(random.choice(words))
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   438
                pathstr = '/'.join(filter(None, path))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   439
            data = '\n'.join(makeline()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   440
                             for __ in xrange(pick(linesinfilesadded))) + '\n'
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
   441
            changes[pathstr] = context.memfilectx(repo, pathstr, data)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   442
        def filectxfn(repo, memctx, path):
22446
054ec0212718 contrib/synthrepo: return None to delete files on commit, don't raise IOError
Mike Edgar <adgar@google.com>
parents: 21689
diff changeset
   443
            return changes[path]
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   444
        if not changes:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   445
            continue
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   446
        if revs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   447
            date = repo['tip'].date()[0] + pick(interarrival)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   448
        else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   449
            date = time.time() - (86400 * count)
23234
944d6cfbe166 synthrepo: synthesized dates must be positive, fit in 32-bit signed ints
Mike Edgar <adgar@google.com>
parents: 22709
diff changeset
   450
        # dates in mercurial must be positive, fit in 32-bit signed integers.
944d6cfbe166 synthrepo: synthesized dates must be positive, fit in 32-bit signed ints
Mike Edgar <adgar@google.com>
parents: 22709
diff changeset
   451
        date = min(0x7fffffff, max(0, date))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   452
        user = random.choice(words) + '@' + random.choice(words)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   453
        mc = context.memctx(repo, pl, makeline(minimum=2),
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   454
                            sorted(changes.iterkeys()),
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   455
                            filectxfn, user, '%d %d' % (date, pick(tzoffset)))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   456
        newnode = mc.commit()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   457
        heads.add(repo.changelog.rev(newnode))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   458
        heads.discard(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   459
        heads.discard(r2)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   460
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   461
    lock.release()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
   462
    wlock.release()
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   463
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   464
def renamedirs(dirs, words):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   465
    '''Randomly rename the directory names in the per-dir file count dict.'''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   466
    wordgen = itertools.cycle(words)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   467
    replacements = {'': ''}
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   468
    def rename(dirpath):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   469
        '''Recursively rename the directory and all path prefixes.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   470
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   471
        The mapping from path to renamed path is stored for all path prefixes
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   472
        as in dynamic programming, ensuring linear runtime and consistent
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   473
        renaming regardless of iteration order through the model.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   474
        '''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   475
        if dirpath in replacements:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   476
            return replacements[dirpath]
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   477
        head, _ = os.path.split(dirpath)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   478
        head = head and rename(head) or ''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   479
        renamed = os.path.join(head, wordgen.next())
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   480
        replacements[dirpath] = renamed
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   481
        return renamed
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   482
    result = []
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   483
    for dirpath, count in dirs.iteritems():
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   484
        result.append([rename(dirpath.lstrip(os.sep)), count])
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
   485
    return result