annotate contrib/synthrepo.py @ 22709:889789a2ca9f

contrib/synthrepo: walk a repo's directory structure during analysis

Augments the analyze command to additionally walk the repo's current directory structure (or of any directory tree), counting how many files appear in which paths. This data is saved in the repo model to be used by synthesize, for creating an initial commit with many files.

This change is aimed at developing, testing and measuring scaling improvements when importing/converting a large repository to mercurial.
author Mike Edgar <adgar@google.com>
date Fri, 12 Sep 2014 22:07:23 -0400
parents 4c66e70c3488
children 944d6cfbe166
rev   line source
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
1 # synthrepo.py - repo synthesis
2 #
3 # Copyright 2012 Facebook
4 #
5 # This software may be used and distributed according to the terms of the
6 # GNU General Public License version 2 or any later version.
7
8 '''synthesize structurally interesting change history
9
10 This extension is useful for creating a repository with properties
11 that are statistically similar to an existing repository. During
12 analysis, a simple probability table is constructed from the history
13 of an existing repository. During synthesis, these properties are
14 reconstructed.
15
16 Properties that are analyzed and synthesized include the following:
17
18 - Lines added or removed when an existing file is modified
19 - Number and sizes of files added
20 - Number of files removed
21 - Line lengths
22 - Topological distance to parent changeset(s)
23 - Probability of a commit being a merge
24 - Probability of a newly added file being added to a new directory
25 - Interarrival time, and time zone, of commits
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
26 - Number of files in each directory
17734
27
28 A few obvious properties that are not currently handled realistically:
29
30 - Merges are treated as regular commits with two parents, which is not
31   realistic
32 - Modifications are not treated as operations on hunks of lines, but
33   as insertions and deletions of randomly chosen single lines
34 - Committer ID (always random)
35 - Executability of files
36 - Symlinks and binary files are ignored
37 '''
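The docstring above describes building a probability table during analysis and drawing from it during synthesis. A minimal Python 3 sketch of one way to do the drawing, assuming a table is a histogram mapping a value to how often it was seen (the shape of the `zerodict` counters in `analyze`): turn the counts into a cumulative distribution and pick a bucket with `bisect`. `makecdf` and `sample` are hypothetical names; this is an illustration, not the `synthesize` implementation.

```python
import bisect
import random


def makecdf(counts):
    """Hypothetical helper: turn a {value: count} histogram into
    (sorted values, cumulative probabilities)."""
    items = sorted(counts.items())
    total = float(sum(count for _, count in items))
    values, cdf = [], []
    running = 0
    for value, count in items:
        running += count
        values.append(value)
        # The last entry is exactly total / total == 1.0.
        cdf.append(running / total)
    return values, cdf


def sample(values, cdf, rng=random):
    """Hypothetical helper: draw one value with probability
    proportional to its count."""
    # bisect_left finds the first bucket whose cumulative probability
    # is >= the uniform draw in [0, 1).
    return values[bisect.bisect_left(cdf, rng.random())]


# Illustrative histogram of parents per commit (1 = normal commit, 2 = merge).
values, cdf = makecdf({1: 90, 2: 10})
rng = random.Random(0)
draws = [sample(values, cdf, rng) for _ in range(1000)]
print(values, cdf, draws.count(1))
```

With a fixed seed the draws are reproducible, which is convenient when the goal is a synthetic repository that can be regenerated for repeated scaling measurements.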
38
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
39 import bisect, collections, itertools, json, os, random, time, sys
19322
ff1586a3adc5 cleanup: remove unused imports
Simon Heimberg <simohe@besonet.ch>
parents: 18927
40 from mercurial import cmdutil, context, patch, scmutil, util, hg
17734
41 from mercurial.i18n import _
22708
42 from mercurial.node import nullrev, nullid, short
17734
43
44 testedwith = 'internal'
45
46 cmdtable = {}
47 command = cmdutil.command(cmdtable)
48
49 newfile = set(('new fi', 'rename', 'copy f', 'copy t'))
50
51 def zerodict():
52     return collections.defaultdict(lambda: 0)
53
54 def roundto(x, k):
55     if x > k * 2:
56         return int(round(x / float(k)) * k)
57     return int(round(x))
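`roundto` keeps values up to `2 * k` at unit resolution and snaps larger values to the nearest multiple of `k`, so histograms stay fine-grained for small numbers and coarse for large ones. A Python 3 copy of the function, evaluated on a few inputs to show the bucketing (`k=5` is what the file uses for line counts and lengths, `k=300` seconds for interarrival times):

```python
def roundto(x, k):
    """Python 3 copy of roundto() above."""
    if x > k * 2:
        # Values above 2*k are snapped to the nearest multiple of k.
        return int(round(x / float(k)) * k)
    # Values up to 2*k keep unit resolution.
    return int(round(x))


for x, k in [(7, 5), (10, 5), (12, 5), (13, 5), (1000, 300), (25, 10)]:
    print(x, k, roundto(x, k))
```

The last case is a tie: Python 3's `round()` rounds halves to even, so `roundto(25, 10)` gives 20, whereas Python 2 (which this file targets) rounds halves away from zero and would give 30.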
58
59 def parsegitdiff(lines):
60     filename, mar, lineadd, lineremove = None, None, zerodict(), 0
61     binary = False
62     for line in lines:
63         start = line[:6]
64         if start == 'diff -':
65             if filename:
66                 yield filename, mar, lineadd, lineremove, binary
67             mar, lineadd, lineremove, binary = 'm', zerodict(), 0, False
68             filename = patch.gitre.match(line).group(1)
69         elif start in newfile:
70             mar = 'a'
71         elif start == 'GIT bi':
72             binary = True
73         elif start == 'delete':
74             mar = 'r'
75         elif start:
76             s = start[0]
77             if s == '-' and not line.startswith('--- '):
78                 lineremove += 1
79             elif s == '+' and not line.startswith('+++ '):
80                 lineadd[roundto(len(line) - 1, 5)] += 1
81     if filename:
82         yield filename, mar, lineadd, lineremove, binary
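To see what `parsegitdiff` yields, here is a self-contained Python 3 transcription run on a small, invented git diff. Assumptions: `patch.gitre` is replaced by a local regex `GITRE` that captures the `a/` path of a `diff --git a/X b/Y` header, and `zerodict`/`roundto` are copied inline (`defaultdict(int)` behaves like `defaultdict(lambda: 0)`). The six-character prefixes match git's extended headers such as `new file mode`, `deleted file mode`, `rename from` and `GIT binary patch`.

```python
import collections
import re

# Assumption: stands in for mercurial's patch.gitre.
GITRE = re.compile(r'diff --git a/(.*) b/(.*)')
NEWFILE = {'new fi', 'rename', 'copy f', 'copy t'}


def zerodict():
    return collections.defaultdict(int)


def roundto(x, k):
    if x > k * 2:
        return int(round(x / float(k)) * k)
    return int(round(x))


def parsegitdiff(lines):
    filename, mar, lineadd, lineremove = None, None, zerodict(), 0
    binary = False
    for line in lines:
        start = line[:6]
        if start == 'diff -':
            # A new file section starts: emit the previous one.
            if filename:
                yield filename, mar, lineadd, lineremove, binary
            mar, lineadd, lineremove, binary = 'm', zerodict(), 0, False
            filename = GITRE.match(line).group(1)
        elif start in NEWFILE:
            mar = 'a'       # "new file mode", "rename ...", "copy from/to"
        elif start == 'GIT bi':
            binary = True   # "GIT binary patch"
        elif start == 'delete':
            mar = 'r'       # "deleted file mode"
        elif start:
            s = start[0]
            if s == '-' and not line.startswith('--- '):
                lineremove += 1
            elif s == '+' and not line.startswith('+++ '):
                # Histogram of added line lengths, excluding the '+'.
                lineadd[roundto(len(line) - 1, 5)] += 1
    if filename:
        yield filename, mar, lineadd, lineremove, binary


SAMPLE = """\
diff --git a/src/old.py b/src/old.py
--- a/src/old.py
+++ b/src/old.py
@@ -1,2 +1,2 @@
-print('a')
+print('b')
diff --git a/docs/new.txt b/docs/new.txt
new file mode 100644
--- /dev/null
+++ b/docs/new.txt
@@ -0,0 +1,1 @@
+hello
""".splitlines()

results = [(f, mar, dict(add), rem, isbin)
           for f, mar, add, rem, isbin in parsegitdiff(SAMPLE)]
print(results)
```

Each tuple is (path, modify/add/remove flag, histogram of added line lengths, removed line count, binary flag). The `---`/`+++` file headers are deliberately not counted as removed or added lines.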
83
84 @command('analyze',
22709
85          [('o', 'output', '', _('write output to given file'), _('FILE')),
17734
86           ('r', 'rev', [], _('analyze specified revisions'), _('REV'))],
22709
87          _('hg analyze'), optionalrepo=True)
17734
88 def analyze(ui, repo, *revs, **opts):
89     '''create a simple model of a repository to use for later synthesis
90
91     This command examines every changeset in the given range (or all
92     of history if none are specified) and creates a simple statistical
22709
93     model of the history of the repository. It also measures the directory
94     structure of the repository as checked out.
17734
95
96     The model is written out to a JSON file, and can be used by
97     :hg:`synthesize` to create or augment a repository with synthetic
98     commits that have a structure that is statistically similar to the
99     analyzed repository.
100     '''
22709
101     root = repo.root if repo else os.getcwd()  # repo is None outside a repository (optionalrepo=True)
102     if not root.endswith(os.path.sep):
103         root += os.path.sep
17734
104
105     revs = list(revs)
106     revs.extend(opts['rev'])
107     if not revs:
108         revs = [':']
109
110     output = opts['output']
111     if not output:
22709
112         output = os.path.basename(root.rstrip(os.path.sep)) + '.json'  # basename('/a/b/') is ''
17734
113
114     if output == '-':
115         fp = sys.stdout
116     else:
117         fp = open(output, 'w')
118
22709
119     # Always obtain file counts of each directory in the given root directory.
120     def onerror(e):
121         ui.warn(_('error walking directory structure: %s\n') % e)
122
123     dirs = {}
124     rootprefixlen = len(root)
125     for dirpath, dirnames, filenames in os.walk(root, onerror=onerror):
126         dirpathfromroot = dirpath[rootprefixlen:]
127         dirs[dirpathfromroot] = len(filenames)
128         if '.hg' in dirnames:
129             dirnames.remove('.hg')
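The directory walk does not depend on Mercurial and can be tried on any tree. A minimal Python 3 sketch of the same loop, wrapped in a hypothetical `countfiles(root)` function: it maps each directory path relative to `root` (`''` for the root itself) to the number of non-directory entries `os.walk` reports there, and removes `.hg` from `dirnames` in place, which is how a top-down `os.walk` is told not to descend into a subdirectory. The temporary tree in the example is invented.

```python
import os
import tempfile


def countfiles(root, warn=print):
    """Hypothetical helper: map each directory under root (relative path,
    '' for root) to the number of files directly inside it."""
    if not root.endswith(os.path.sep):
        root += os.path.sep
    dirs = {}
    prefixlen = len(root)

    def onerror(e):
        # os.walk reports unreadable directories here instead of raising.
        warn('error walking directory structure: %s' % e)

    for dirpath, dirnames, filenames in os.walk(root, onerror=onerror):
        dirs[dirpath[prefixlen:]] = len(filenames)
        # Pruning dirnames in place stops os.walk from descending into .hg.
        if '.hg' in dirnames:
            dirnames.remove('.hg')
    return dirs


with tempfile.TemporaryDirectory() as tmp:
    for relpath in ['a.txt', 'src/x.py', 'src/y.py', 'src/lib/z.py',
                    '.hg/store/data']:
        path = os.path.join(tmp, *relpath.split('/'))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, 'w').close()
    counts = countfiles(tmp)

print(counts)
```

Appending the separator before walking is what makes slicing with `prefixlen` yield clean relative paths such as `src` rather than `/src`.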
17734
130
131     lineschanged = zerodict()
132     children = zerodict()
133     p1distance = zerodict()
134     p2distance = zerodict()
135     linesinfilesadded = zerodict()
136     fileschanged = zerodict()
137     filesadded = zerodict()
138     filesremoved = zerodict()
139     linelengths = zerodict()
140     interarrival = zerodict()
141     parents = zerodict()
142     dirsadded = zerodict()
143     tzoffset = zerodict()
144
22709
145     # If a mercurial repo is available, also model the commit history.
146     if repo:
147         revs = scmutil.revrange(repo, revs)
148         revs.sort()
149
150         progress = ui.progress
151         _analyzing = _('analyzing')
152         _changesets = _('changesets')
153         _total = len(revs)
17734
154
22709
155         for i, rev in enumerate(revs):
156             progress(_analyzing, i, unit=_changesets, total=_total)
157             ctx = repo[rev]
158             pl = ctx.parents()
159             pctx = pl[0]
160             prev = pctx.rev()
161             children[prev] += 1
162             p1distance[rev - prev] += 1
163             parents[len(pl)] += 1
164             tzoffset[ctx.date()[1]] += 1
165             if len(pl) > 1:
166                 p2distance[rev - pl[1].rev()] += 1
167             if prev == rev - 1:
168                 lastctx = pctx
169             else:
170                 lastctx = repo[rev - 1]
171             if lastctx.rev() != nullrev:
172                 timedelta = ctx.date()[0] - lastctx.date()[0]
173                 interarrival[roundto(timedelta, 300)] += 1
174             diff = sum((d.splitlines() for d in ctx.diff(pctx, git=True)), [])
175             fileadds, diradds, fileremoves, filechanges = 0, 0, 0, 0
176             for filename, mar, lineadd, lineremove, isbin in parsegitdiff(diff):
177                 if isbin:
178                     continue
179                 added = sum(lineadd.itervalues(), 0)
180                 if mar == 'm':
181                     if added and lineremove:
182                         lineschanged[roundto(added, 5),
183                                      roundto(lineremove, 5)] += 1
184                         filechanges += 1
185                 elif mar == 'a':
186                     fileadds += 1
187                     if '/' in filename:
188                         filedir = filename.rsplit('/', 1)[0]
189                         if filedir not in pctx.dirs():
190                             diradds += 1
191                     linesinfilesadded[roundto(added, 5)] += 1
192                 elif mar == 'r':
193                     fileremoves += 1
194 for length, count in lineadd.iteritems():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
195 linelengths[length] += count
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
196 fileschanged[filechanges] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
197 filesadded[fileadds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
198 dirsadded[diradds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
199 filesremoved[fileremoves] += 1
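The loop ending at line 199 folds each changeset's git diff into histograms, rounding line counts into buckets with `roundto` (defined earlier in the file, not shown here) and recording one sample per changeset for the number of files changed, added and removed. The following is a minimal Python 3 sketch of that bookkeeping, not the file's code: the `(kind, lines_added, lines_removed)` input shape and the `bucket` helper (nearest multiple of `k`) are hypothetical stand-ins, and `bucket` is not claimed to match `roundto` exactly.

```python
from collections import Counter

def bucket(x, k):
    # Hypothetical stand-in for roundto(): round x to the nearest multiple of k.
    return int(round(x / float(k)) * k)

def analyze_commits(commits):
    """Each commit is a list of (kind, lines_added, lines_removed) tuples,
    where kind is 'm' (modified), 'a' (added) or 'r' (removed)."""
    lineschanged, linesinfilesadded = Counter(), Counter()
    fileschanged, filesadded, filesremoved = Counter(), Counter(), Counter()
    for files in commits:
        changes = adds = removes = 0
        for kind, added, removed in files:
            if kind == 'm':
                # Only edits that both add and remove lines count as changes.
                if added and removed:
                    lineschanged[bucket(added, 5), bucket(removed, 5)] += 1
                    changes += 1
            elif kind == 'a':
                adds += 1
                linesinfilesadded[bucket(added, 5)] += 1
            elif kind == 'r':
                removes += 1
        # One histogram sample per commit, keyed by how many files it touched.
        fileschanged[changes] += 1
        filesadded[adds] += 1
        filesremoved[removes] += 1
    return lineschanged, linesinfilesadded, fileschanged, filesadded, filesremoved
```

Keeping one sample per changeset (rather than per file) is what lets the synthesizer later draw "how many files does this commit touch" directly from the histogram.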
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
200
201     invchildren = zerodict()
202
203     for rev, count in children.iteritems():
204         invchildren[count] += 1
205
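`children` maps each parent revision to the number of children it has; lines 201 to 204 invert that into a histogram of child counts (how many revisions have one child, two children, and so on). The same inversion in Python 3, assuming the file's `zerodict` behaves like a zero-defaulting dict so that `collections.Counter` is a fair substitute:

```python
from collections import Counter

def invert_counts(children):
    """Turn {rev: number_of_children} into {number_of_children: number_of_revs}."""
    inverted = Counter()
    for rev, count in children.items():
        inverted[count] += 1
    return inverted
```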
206     if output != '-':
207         ui.status(_('writing output to %s\n') % output)
208
209     def pronk(d):
210         return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)
211
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
212     json.dump({'revs': len(revs),
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
213                'initdirs': pronk(dirs),
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
214                'lineschanged': pronk(lineschanged),
215                'children': pronk(invchildren),
216                'fileschanged': pronk(fileschanged),
217                'filesadded': pronk(filesadded),
218                'linesinfilesadded': pronk(linesinfilesadded),
219                'dirsadded': pronk(dirsadded),
220                'filesremoved': pronk(filesremoved),
221                'linelengths': pronk(linelengths),
222                'parents': pronk(parents),
223                'p1distance': pronk(p1distance),
224                'p2distance': pronk(p2distance),
225                'interarrival': pronk(interarrival),
226                'tzoffset': pronk(tzoffset),
227                },
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
228               fp)
229     fp.close()
230
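The model written by lines 212 to 229 is plain JSON: `pronk` turns each histogram into a list of `(value, count)` pairs sorted by descending count. JSON has no tuple type, so those pairs (and tuple keys such as the `(added, removed)` buckets in `lineschanged`) come back from `json.load` as nested lists, which is the shape `synthesize` later consumes. A Python 3 sketch of that round trip; `roundtrip` is a hypothetical helper for illustration:

```python
import json

def pronk(d):
    # Sort histogram entries by count, most frequent first.
    return sorted(d.items(), key=lambda x: x[1], reverse=True)

def roundtrip(histogram):
    # Serialize one histogram the way the model file stores it, then read it back.
    return json.loads(json.dumps({'lineschanged': pronk(histogram)}))
```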
231 @command('synthesize',
232          [('c', 'count', 0, _('create given number of commits'), _('COUNT')),
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
233           ('', 'dict', '', _('path to a dictionary of words'), _('FILE')),
234           ('', 'initfiles', 0, _('initial file count to create'), _('COUNT'))],
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
235          _('hg synthesize [OPTION].. DESCFILE'))
236 def synthesize(ui, repo, descpath, **opts):
237     '''synthesize commits based on a model of an existing repository
238
239     The model must have been generated by :hg:`analyze`. Commits will
240     be generated randomly according to the probabilities described in
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
241     the model. If --initfiles is set, the repository will be seeded with
242     the given number files following the modeled repository's directory
243     structure.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
244
245     When synthesizing new content, commit descriptions, and user
246     names, words will be chosen randomly from a dictionary that is
247     presumed to contain one word per line. Use --dict to specify the
248     path to an alternate dictionary to use.
249     '''
250     try:
17887
0e2846b2482c url: use open and not url.open for local files (issue3624)
Siddharth Agarwal <sid0@fb.com>
parents: 17734
251         fp = hg.openpath(ui, descpath)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
252     except Exception, err:
253         raise util.Abort('%s: %s' % (descpath, err[0].strerror))
254     desc = json.load(fp)
255     fp.close()
256
257     def cdf(l):
18047
9196638b08ce synthrepo: do not crash if a list is empty
Bryan O'Sullivan <bryano@fb.com>
parents: 17887
258         if not l:
259             return [], []
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
260         vals, probs = zip(*sorted(l, key=lambda x: x[1], reverse=True))
261         t = float(sum(probs, 0))
262         s, cdfs = 0, []
263         for v in probs:
264             s += v
265             cdfs.append(s / t)
266         return vals, cdfs
267
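`cdf` reorders the model's `(value, count)` pairs by descending count and replaces each count with the running fraction of the total, so the values come back paired with a non-decreasing list of cumulative probabilities ending at 1.0. A Python 3 transliteration with a small worked input:

```python
def cdf(pairs):
    """Return (values, cumulative_probabilities) for a list of (value, count)."""
    if not pairs:
        return [], []
    vals, counts = zip(*sorted(pairs, key=lambda x: x[1], reverse=True))
    total = float(sum(counts))
    running, cdfs = 0, []
    for c in counts:
        running += c
        cdfs.append(running / total)
    return vals, cdfs
```

For `[('a', 1), ('b', 3)]` this yields `(('b', 'a'), [0.75, 1.0])`: 'b' covers the first 75% of the probability mass and 'a' the rest.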
268     lineschanged = cdf(desc['lineschanged'])
269     fileschanged = cdf(desc['fileschanged'])
270     filesadded = cdf(desc['filesadded'])
271     dirsadded = cdf(desc['dirsadded'])
272     filesremoved = cdf(desc['filesremoved'])
273     linelengths = cdf(desc['linelengths'])
274     parents = cdf(desc['parents'])
275     p1distance = cdf(desc['p1distance'])
276     p2distance = cdf(desc['p2distance'])
277     interarrival = cdf(desc['interarrival'])
278     linesinfilesadded = cdf(desc['linesinfilesadded'])
279     tzoffset = cdf(desc['tzoffset'])
280
281     dictfile = opts.get('dict') or '/usr/share/dict/words'
282     try:
283         fp = open(dictfile, 'rU')
284     except IOError, err:
285         raise util.Abort('%s: %s' % (dictfile, err.strerror))
286     words = fp.read().splitlines()
287     fp.close()
288
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
289     initdirs = {}
290     if desc['initdirs']:
291         for k, v in desc['initdirs']:
292             initdirs[k.encode('utf-8').replace('.hg', '_hg')] = v
293         initdirs = renamedirs(initdirs, words)
294     initdirscdf = cdf(initdirs)
295
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
296     def pick(cdf):
297         return cdf[0][bisect.bisect_left(cdf[1], random.random())]
298
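`pick` is inverse-transform sampling: draw a uniform number in [0, 1) and use `bisect_left` to find the first cumulative probability that is not smaller than it, then return the value at that index. Because the last cumulative value is exactly 1.0 and `random.random()` never returns 1.0, the index cannot run off the end. A Python 3 sketch in which the uniform draw `u` is an added, hypothetical parameter so the behaviour can be checked deterministically:

```python
import bisect
import random

def pick(cdf, u=None):
    # cdf is a (values, cumulative_probabilities) pair as built by cdf().
    if u is None:
        u = random.random()
    return cdf[0][bisect.bisect_left(cdf[1], u)]
```

With `table = (('b', 'a'), [0.75, 1.0])`, any draw up to and including 0.75 selects 'b' and anything above it selects 'a'.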
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
299     def pickpath():
300         return os.path.join(pick(initdirscdf), random.choice(words))
301
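Lines 289 to 300 prepare the initial file layout: every occurrence of `.hg` in a modeled directory name becomes `_hg`, so synthesized paths cannot land inside Mercurial's metadata directory, and `pickpath` joins a directory drawn from the model with a random dictionary word as the file name. A Python 3 sketch under stated assumptions: `sanitize` is a hypothetical name for the loop on lines 291 and 292 (the `renamedirs` step is omitted), the directory draw is injected as `pick_dir`, and `posixpath.join` replaces `os.path.join` so the result does not depend on the host OS.

```python
import posixpath
import random

def sanitize(model_dirs):
    # Mirror of the initdirs loop: rewrite '.hg' to '_hg' in each directory name.
    return {k.replace('.hg', '_hg'): v for k, v in model_dirs}

def pickpath(pick_dir, words, rng=random):
    # pick_dir() returns a directory sampled from the model; the file
    # name is a random dictionary word.
    return posixpath.join(pick_dir(), rng.choice(words))
```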
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
302     def makeline(minimum=0):
303         total = max(minimum, pick(linelengths))
304         c, l = 0, []
305         while c < total:
306             w = random.choice(words)
307             c += len(w) + 1
308             l.append(w)
309         return ' '.join(l)
310
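`makeline` keeps appending random words until the running character count (each word plus one separator) reaches a target length drawn from `linelengths`. The joined line is therefore at least `target - 1` characters long and usually slightly longer. A Python 3 sketch in which the target is passed in directly instead of being sampled (an assumption made for testability):

```python
import random

def makeline(words, target, minimum=0, rng=random):
    total = max(minimum, target)
    count, chosen = 0, []
    while count < total:
        w = rng.choice(words)
        count += len(w) + 1  # word plus the joining space
        chosen.append(w)
    return ' '.join(chosen)
```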
311     wlock = repo.wlock()
312     lock = repo.lock()
313
314     nevertouch = set(('.hgsub', '.hgignore', '.hgtags'))
315
316     progress = ui.progress
317     _synthesizing = _('synthesizing')
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
318     _files = _('initial files')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
319     _changesets = _('changesets')
320
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
321     # Synthesize a single initial revision adding files to the repo according
322     # to the modeled directory structure.
323     initcount = int(opts['initfiles'])
324     if initcount and initdirs:
325         pctx = repo[None].parents()[0]
326         files = {}
327         for i in xrange(0, initcount):
328             ui.progress(_synthesizing, i, unit=_files, total=initcount)
329
330             path = pickpath()
331             while path in pctx.dirs():
332                 path = pickpath()
333             data = '%s contents\n' % path
334             files[path] = context.memfilectx(repo, path, data)
335
336         def filectxfn(repo, memctx, path):
337             return files[path]
338
339         ui.progress(_synthesizing, None)
340         message = 'synthesized wide repo with %d files' % (len(files),)
341         mc = context.memctx(repo, [pctx.node(), nullid], message,
342                             files.iterkeys(), filectxfn, ui.username(),
343                             '%d %d' % util.makedate())
344         initnode = mc.commit()
345         hexfn = ui.debugflag and hex or short
346         ui.status(_('added commit %s with %d files\n')
347                   % (hexfn(initnode), len(files)))
348
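The loop on lines 327 to 334 redraws a path only when it collides with an existing directory; a path drawn twice simply overwrites its earlier `files` entry, so the initial commit can hold fewer than `--initfiles` files (the status message reports `len(files)`, the actual number). The Python 3 sketch below is a hypothetical variant, not the file's behaviour: it also redraws duplicates and bounds the work with an assumed `max_tries` limit. `pickpath` and `existing_dirs` are injected stand-ins for the file's `pickpath()` and `pctx.dirs()`.

```python
def choose_initial_paths(count, pickpath, existing_dirs, max_tries=100000):
    """Draw up to `count` distinct file paths, skipping directory names.

    Unlike the loop above (where a repeated draw overwrites an earlier
    entry), this variant redraws duplicates; max_tries bounds the work.
    """
    paths = set()
    tries = 0
    while len(paths) < count and tries < max_tries:
        tries += 1
        path = pickpath()
        if path in existing_dirs or path in paths:
            continue
        paths.add(path)
    return paths
```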
349     # Synthesize incremental revisions to the repository, adding repo depth.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
350     count = int(opts['count'])
351     heads = set(map(repo.changelog.rev, repo.heads()))
352     for i in xrange(count):
353         progress(_synthesizing, i, unit=_changesets, total=count)
354
355         node = repo.changelog.node
356         revs = len(repo)
357
358         def pickhead(heads, distance):
359             if heads:
360                 lheads = sorted(heads)
361                 rev = revs - min(pick(distance), revs)
362                 if rev < lheads[-1]:
363                     rev = lheads[bisect.bisect_left(lheads, rev)]
364                 else:
365                     rev = lheads[-1]
366                 return rev, node(rev)
367             return nullrev, nullid
368
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
369 r1 = revs - min(pick(p1distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
370 p1 = node(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
371
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
372 # the number of heads will grow without bound if we use a pure
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
373 # model, so artificially constrain their proliferation
22472
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
374 toomanyheads = len(heads) > random.randint(1, 20)
375 if p2distance[0] and (pick(parents) == 2 or toomanyheads):
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
376 r2, p2 = pickhead(heads.difference([r1]), p2distance)
377 else:
378 r2, p2 = nullrev, nullid
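The throttle on lines 374-375 forces a merge more often as heads accumulate: `len(heads)` is compared against a threshold drawn uniformly from 1 to 20, so with a single head the throttle never fires, and with 21 or more heads it always does (provided the model contains merges at all). A minimal sketch of just that decision follows. It assumes `has_merges` stands in for `p2distance[0]`, `wants_merge` for `pick(parents) == 2`, and that an explicit `random.Random` instance `rng` is passed for reproducibility; `should_merge` is a hypothetical name.

```python
import random


def should_merge(num_heads, has_merges, wants_merge, rng):
    """Decide whether the next synthetic commit gets a second parent.

    has_merges:  stand-in for the model's p2distance[0] check
    wants_merge: stand-in for pick(parents) == 2
    rng:         random.Random instance used for the head throttle
    """
    # randint is inclusive on both ends, so for 1 <= num_heads <= 21 the
    # throttle fires with probability (num_heads - 1) / 20.
    too_many_heads = num_heads > rng.randint(1, 20)
    return bool(has_merges) and (wants_merge or too_many_heads)
```

Checking `has_merges` first mirrors the listing: if the analyzed repo had no merges, the synthesized one never gets any, no matter how many heads pile up.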
379
380 pl = [p1, p2]
381 pctx = repo[r1]
382 mf = pctx.manifest()
383 mfk = mf.keys()
384 changes = {}
385 if mfk:
386 for __ in xrange(pick(fileschanged)):
387 for __ in xrange(10):
388 fctx = pctx.filectx(random.choice(mfk))
389 path = fctx.path()
390 if not (path in nevertouch or fctx.isbinary() or
391 'l' in fctx.flags()):
392 break
393 lines = fctx.data().splitlines()
394 add, remove = pick(lineschanged)
395 for __ in xrange(remove):
396 if not lines:
397 break
398 del lines[random.randrange(0, len(lines))]
399 for __ in xrange(add):
400 lines.insert(random.randint(0, len(lines)), makeline())
401 path = fctx.path()
21689
503bb3af70fe memfilectx: call super.__init__ instead of duplicating code
Sean Farley <sean.michael.farley@gmail.com>
parents: 20672
diff changeset
402 changes[path] = context.memfilectx(repo, path,
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
403 '\n'.join(lines) + '\n')
404 for __ in xrange(pick(filesremoved)):
405 path = random.choice(mfk)
406 for __ in xrange(10):
407 path = random.choice(mfk)
408 if path not in changes:
409 changes[path] = None
410 break
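Lines 393-400 edit an existing file by deleting `remove` random lines and then inserting `add` freshly generated ones. The Python 3 sketch below reproduces that edit on a plain string. `mutate_lines`, its `makeline` callable argument, and the explicit `rng` are assumptions introduced so the step can run outside a repository; in the listing the two counts come from `pick(lineschanged)`.

```python
import random


def mutate_lines(text, add, remove, makeline, rng):
    """Delete `remove` random lines, then insert `add` generated lines.

    makeline: any zero-argument callable returning one new line of text
    rng:      random.Random instance
    """
    lines = text.splitlines()
    for _ in range(remove):
        if not lines:
            break  # nothing left to delete
        del lines[rng.randrange(0, len(lines))]
    for _ in range(add):
        # randint is inclusive, so a new line may also be appended at the end.
        lines.insert(rng.randint(0, len(lines)), makeline())
    # Like the listing, always end the file with a newline.
    return '\n'.join(lines) + '\n'
```

Because deletions run before insertions, a line generated in this edit is never removed by the same edit.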
411 if filesadded:
412 dirs = list(pctx.dirs())
413 dirs.append('')
414 for __ in xrange(pick(filesadded)):
415 path = [random.choice(dirs)]
416 if pick(dirsadded):
417 path.append(random.choice(words))
418 path.append(random.choice(words))
419 path = '/'.join(filter(None, path))
420 data = '\n'.join(makeline()
421 for __ in xrange(pick(linesinfilesadded))) + '\n'
21689
503bb3af70fe memfilectx: call super.__init__ instead of duplicating code
Sean Farley <sean.michael.farley@gmail.com>
parents: 20672
diff changeset
422 changes[path] = context.memfilectx(repo, path, data)
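Lines 414-419 build the path of each added file from an existing directory, an optional new subdirectory, and a file name. A small Python 3 sketch of that path construction follows; `make_added_path`, the boolean `add_dir` (standing in for `pick(dirsadded)`), and the explicit `rng` are hypothetical.

```python
import random


def make_added_path(dirs, words, add_dir, rng):
    """Build a path for a synthetic new file.

    dirs:    existing directories, with '' meaning the repository root
    words:   vocabulary used for new directory and file names
    add_dir: whether to create a new subdirectory (pick(dirsadded) in the listing)
    """
    parts = [rng.choice(dirs)]
    if add_dir:
        parts.append(rng.choice(words))  # new subdirectory name
    parts.append(rng.choice(words))      # file name
    # filter(None, ...) drops the empty root component, so a file placed
    # at the root gets no leading slash.
    return '/'.join(filter(None, parts))
```

Appending `''` to `dirs` (line 413) is what lets new files land at the top level; the `filter` call then keeps those paths clean.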
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
423 def filectxfn(repo, memctx, path):
22446
054ec0212718 contrib/synthrepo: return None to delete files on commit, don't raise IOError
Mike Edgar <adgar@google.com>
parents: 21689
diff changeset
424 return changes[path]
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
425 if not changes:
426 continue
427 if revs:
428 date = repo['tip'].date()[0] + pick(interarrival)
429 else:
430 date = time.time() - (86400 * count)
431 user = random.choice(words) + '@' + random.choice(words)
432 mc = context.memctx(repo, pl, makeline(minimum=2),
433 sorted(changes.iterkeys()),
434 filectxfn, user, '%d %d' % (date, pick(tzoffset)))
435 newnode = mc.commit()
436 heads.add(repo.changelog.rev(newnode))
437 heads.discard(r1)
438 heads.discard(r2)
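Lines 436-438 keep the set of open heads current after each commit: the new revision becomes a head and its parents stop being heads. The sketch below shows that bookkeeping on plain integers in Python 3. `update_heads` and `NULLREV = -1` (a stand-in for `nullrev`) are illustrative assumptions. `set.discard` matters here because `r1` is chosen by distance (line 369) rather than from the head set, and `r2` may be the null revision; neither case raises.

```python
NULLREV = -1  # stand-in for Mercurial's nullrev (hypothetical constant here)


def update_heads(heads, newrev, p1rev, p2rev):
    """Record newrev as a head and retire its parents (mutates `heads`)."""
    heads.add(newrev)
    # discard() is a no-op for parents that were not heads, or NULLREV.
    heads.discard(p1rev)
    heads.discard(p2rev)
    return heads
```

When `r1` was not a head, nothing is removed and the head count grows by one; this is how the synthesized history acquires branches, which the throttle on line 374 then reins in.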
439
440 lock.release()
441 wlock.release()
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
442
443 def renamedirs(dirs, words):
444 '''Randomly rename the directory names in the per-dir file count dict.'''
445 wordgen = itertools.cycle(words)
446 replacements = {'': ''}
447 def rename(dirpath):
448 '''Recursively rename the directory and all path prefixes.
449
450 The mapping from path to renamed path is stored for all path prefixes
451 as in dynamic programming, ensuring linear runtime and consistent
452 renaming regardless of iteration order through the model.
453 '''
454 if dirpath in replacements:
455 return replacements[dirpath]
456 head, _ = os.path.split(dirpath)
457 head = head and rename(head) or ''
458 renamed = os.path.join(head, wordgen.next())
459 replacements[dirpath] = renamed
460 return renamed
461 result = []
462 for dirpath, count in dirs.iteritems():
463 result.append([rename(dirpath.lstrip(os.sep)), count])
464 return result
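`renamedirs` (lines 443-464) memoizes every renamed prefix, so a directory and all of its descendants agree on the new name of their shared parent whichever order the model is iterated in. A Python 3 sketch of the same function follows, assuming POSIX paths (`os.sep == '/'`). It differs from the listing only in using `next(wordgen)` instead of `wordgen.next()` and `dict.items()` instead of `iteritems()`.

```python
import itertools
import os


def renamedirs(dirs, words):
    """Rename every directory in a {dirpath: filecount} mapping.

    Returns a list of [renamed_path, count] pairs. Each prefix is renamed
    once and cached, so shared prefixes stay consistent.
    """
    wordgen = itertools.cycle(words)
    replacements = {'': ''}

    def rename(dirpath):
        if dirpath in replacements:
            return replacements[dirpath]
        head, _ = os.path.split(dirpath)
        # Rename the parent first (recursively), then append a new word.
        head = rename(head) if head else ''
        renamed = os.path.join(head, next(wordgen))
        replacements[dirpath] = renamed
        return renamed

    return [[rename(dirpath.lstrip(os.sep)), count]
            for dirpath, count in dirs.items()]
```

As in the listing, `itertools.cycle` repeats names once `words` is exhausted, so two sibling directories can be given the same name and appear as separate entries with an identical path.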