annotate contrib/synthrepo.py @ 28889:7a1e0711401e

test-remove: drop a useless Windows specific conditional

The Windows branch didn't pick up the 'deleting' progress bar addition from 62e73d42bd14. But since the Windows branch already globbed the error message, let's just drop the other branch.
author Matt Harbison <matt_harbison@yahoo.com>
date Tue, 12 Apr 2016 00:34:02 -0400
parents 62250a48dc7f
children a0939666b836
rev   line source
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
# synthrepo.py - repo synthesis
#
# Copyright 2012 Facebook
#
# This software may be used and distributed according to the terms of the
# GNU General Public License version 2 or any later version.

'''synthesize structurally interesting change history

This extension is useful for creating a repository with properties
that are statistically similar to an existing repository. During
analysis, a simple probability table is constructed from the history
of an existing repository. During synthesis, these properties are
reconstructed.

Properties that are analyzed and synthesized include the following:

- Lines added or removed when an existing file is modified
- Number and sizes of files added
- Number of files removed
- Line lengths
- Topological distance to parent changeset(s)
- Probability of a commit being a merge
- Probability of a newly added file being added to a new directory
- Interarrival time, and time zone, of commits
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
- Number of files in each directory
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

A few obvious properties that are not currently handled realistically:

- Merges are treated as regular commits with two parents, which is not
  realistic
- Modifications are not treated as operations on hunks of lines, but
  as insertions and deletions of randomly chosen single lines
- Committer ID (always random)
- Executability of files
- Symlinks and binary files are ignored
'''

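The "simple probability table" mentioned in the docstring can be pictured as a histogram of observed values that is later sampled in proportion to its counts. The following is a minimal sketch of that idea only. The helper name `make_sampler` and the counts are hypothetical and are not taken from the extension.

```python
import bisect
import collections
import random

def make_sampler(counts, rng):
    """Return a function that draws keys in proportion to their counts.

    Hypothetical helper for illustration; not part of synthrepo.py.
    """
    keys = sorted(counts)
    cumulative = []
    total = 0
    for key in keys:
        total += counts[key]
        cumulative.append(total)

    def pick():
        # Choose a point in [0, total) and find the bucket containing it.
        point = rng.random() * total
        index = bisect.bisect_right(cumulative, point)
        # Guard against floating-point rounding at the upper edge.
        return keys[min(index, len(keys) - 1)]
    return pick

# Made-up histogram: number of parents -> how many commits had that many.
parents = collections.defaultdict(lambda: 0)
for nparents in [1, 1, 1, 1, 2, 1, 1, 2, 1, 1]:
    parents[nparents] += 1

pick = make_sampler(parents, random.Random(0))
sample = [pick() for _ in range(1000)]
```

With these counts, roughly eight in ten draws should be single-parent commits, so the synthetic history would contain merges at about the observed rate.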
28563
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
from __future__ import absolute_import
import bisect
import collections
import itertools
import json
import os
import random
import sys
import time
from mercurial import (
    cmdutil,
    context,
    error,
    hg,
    patch,
    scmutil,
    util,
)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
from mercurial.i18n import _
28563
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
from mercurial.node import (
    nullid,
    nullrev,
    short,
)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

25186
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24306
# Note for extension authors: ONLY specify testedwith = 'internal' for
# extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
# be specifying the version(s) of Mercurial they are tested with, or
# leave the attribute unspecified.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
testedwith = 'internal'

cmdtable = {}
command = cmdutil.command(cmdtable)

newfile = set(('new fi', 'rename', 'copy f', 'copy t'))

def zerodict():
    return collections.defaultdict(lambda: 0)

def roundto(x, k):
    if x > k * 2:
        return int(round(x / float(k)) * k)
    return int(round(x))

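To make the bucketing in `roundto` concrete: values above `2 * k` snap to the nearest multiple of `k`, while smaller values are only rounded to the nearest integer. The sketch below repeats the function so that it runs on its own. The sample inputs are made up, with `k=5` and `k=300` chosen because the listing uses them for line lengths and interarrival times.

```python
def roundto(x, k):
    if x > k * 2:
        return int(round(x / float(k)) * k)
    return int(round(x))

# Small values are kept as they are (after integer rounding).
small = [roundto(7, 5), roundto(10, 5)]                # [7, 10]

# Values above 2 * k fall into buckets that are k wide.
lengths = [roundto(11, 5), roundto(13, 5)]             # [10, 15]

# Time deltas in seconds, bucketed to multiples of 300.
deltas = [roundto(430, 300), roundto(700, 300), roundto(1000, 300)]
# [430, 600, 900]
```

Keeping small values exact while coarsening large ones limits the number of distinct keys in each histogram without blurring the common short cases.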
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
def parsegitdiff(lines):
    filename, mar, lineadd, lineremove = None, None, zerodict(), 0
    binary = False
    for line in lines:
        start = line[:6]
        if start == 'diff -':
            if filename:
                yield filename, mar, lineadd, lineremove, binary
            mar, lineadd, lineremove, binary = 'm', zerodict(), 0, False
            filename = patch.gitre.match(line).group(1)
        elif start in newfile:
            mar = 'a'
        elif start == 'GIT bi':
            binary = True
        elif start == 'delete':
            mar = 'r'
        elif start:
            s = start[0]
            if s == '-' and not line.startswith('--- '):
                lineremove += 1
            elif s == '+' and not line.startswith('+++ '):
                lineadd[roundto(len(line) - 1, 5)] += 1
    if filename:
        yield filename, mar, lineadd, lineremove, binary

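The following is a standalone sketch of what `parsegitdiff` yields for a small git-style diff. It copies the function and its helpers from the listing so that it runs without Mercurial. The `gitre` pattern here is an assumed stand-in for `patch.gitre`, and the sample diff is made up.

```python
import collections
import re

# Assumed stand-in for mercurial.patch.gitre (illustration only).
gitre = re.compile(r'diff --git a/(.*) b/(.*)')

newfile = set(('new fi', 'rename', 'copy f', 'copy t'))

def zerodict():
    return collections.defaultdict(lambda: 0)

def roundto(x, k):
    if x > k * 2:
        return int(round(x / float(k)) * k)
    return int(round(x))

def parsegitdiff(lines):
    filename, mar, lineadd, lineremove = None, None, zerodict(), 0
    binary = False
    for line in lines:
        start = line[:6]
        if start == 'diff -':
            # A new file header: emit the record for the previous file.
            if filename:
                yield filename, mar, lineadd, lineremove, binary
            mar, lineadd, lineremove, binary = 'm', zerodict(), 0, False
            filename = gitre.match(line).group(1)
        elif start in newfile:
            mar = 'a'
        elif start == 'GIT bi':
            binary = True
        elif start == 'delete':
            mar = 'r'
        elif start:
            s = start[0]
            if s == '-' and not line.startswith('--- '):
                lineremove += 1
            elif s == '+' and not line.startswith('+++ '):
                # Histogram of added-line lengths, bucketed by 5.
                lineadd[roundto(len(line) - 1, 5)] += 1
    if filename:
        yield filename, mar, lineadd, lineremove, binary

sample = '''diff --git a/hello.txt b/hello.txt
new file mode 100644
--- /dev/null
+++ b/hello.txt
@@ -0,0 +1,2 @@
+hello
+a somewhat longer second line
diff --git a/old.txt b/old.txt
deleted file mode 100644
--- a/old.txt
+++ /dev/null
@@ -1,1 +0,0 @@
-bye
'''

results = list(parsegitdiff(sample.splitlines()))
```

Each record is `(filename, mar, lineadd, lineremove, binary)`, where `mar` is `'m'`, `'a'` or `'r'` for modified, added or removed. Here the first record describes `hello.txt` as added with two lines in the length buckets 5 and 30, and the second describes `old.txt` as removed with one deleted line.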
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
@command('analyze',
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
         [('o', 'output', '', _('write output to given file'), _('FILE')),
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
          ('r', 'rev', [], _('analyze specified revisions'), _('REV'))],
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
         _('hg analyze'), optionalrepo=True)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
def analyze(ui, repo, *revs, **opts):
    '''create a simple model of a repository to use for later synthesis

    This command examines every changeset in the given range (or all
    of history if none are specified) and creates a simple statistical
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
    model of the history of the repository. It also measures the directory
    structure of the repository as checked out.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

    The model is written out to a JSON file, and can be used by
    :hg:`synthesize` to create or augment a repository with synthetic
    commits that have a structure that is statistically similar to the
    analyzed repository.
    '''
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
    root = repo.root
    if not root.endswith(os.path.sep):
        root += os.path.sep
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

    revs = list(revs)
    revs.extend(opts['rev'])
    if not revs:
        revs = [':']

    output = opts['output']
    if not output:
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
        output = os.path.basename(root) + '.json'
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

    if output == '-':
        fp = sys.stdout
    else:
        fp = open(output, 'w')

22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
    # Always obtain file counts of each directory in the given root directory.
    def onerror(e):
        ui.warn(_('error walking directory structure: %s\n') % e)

    dirs = {}
    rootprefixlen = len(root)
    for dirpath, dirnames, filenames in os.walk(root, onerror=onerror):
        dirpathfromroot = dirpath[rootprefixlen:]
        dirs[dirpathfromroot] = len(filenames)
        if '.hg' in dirnames:
            dirnames.remove('.hg')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

    lineschanged = zerodict()
    children = zerodict()
    p1distance = zerodict()
    p2distance = zerodict()
    linesinfilesadded = zerodict()
    fileschanged = zerodict()
    filesadded = zerodict()
    filesremoved = zerodict()
    linelengths = zerodict()
    interarrival = zerodict()
    parents = zerodict()
    dirsadded = zerodict()
    tzoffset = zerodict()

22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
    # If a mercurial repo is available, also model the commit history.
    if repo:
        revs = scmutil.revrange(repo, revs)
        revs.sort()

        progress = ui.progress
        _analyzing = _('analyzing')
        _changesets = _('changesets')
        _total = len(revs)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>

22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
        for i, rev in enumerate(revs):
            progress(_analyzing, i, unit=_changesets, total=_total)
            ctx = repo[rev]
            pl = ctx.parents()
            pctx = pl[0]
            prev = pctx.rev()
            children[prev] += 1
            p1distance[rev - prev] += 1
            parents[len(pl)] += 1
            tzoffset[ctx.date()[1]] += 1
            if len(pl) > 1:
                p2distance[rev - pl[1].rev()] += 1
            if prev == rev - 1:
                lastctx = pctx
            else:
                lastctx = repo[rev - 1]
            if lastctx.rev() != nullrev:
                timedelta = ctx.date()[0] - lastctx.date()[0]
                interarrival[roundto(timedelta, 300)] += 1
            diff = sum((d.splitlines() for d in ctx.diff(pctx, git=True)), [])
            fileadds, diradds, fileremoves, filechanges = 0, 0, 0, 0
            for filename, mar, lineadd, lineremove, isbin in parsegitdiff(diff):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
201 if isbin:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
202 continue
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
203 added = sum(lineadd.itervalues(), 0)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
204 if mar == 'm':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
205 if added and lineremove:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
206 lineschanged[roundto(added, 5),
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
207 roundto(lineremove, 5)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
208 filechanges += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
209 elif mar == 'a':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
210 fileadds += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
211 if '/' in filename:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
212 filedir = filename.rsplit('/', 1)[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
213 if filedir not in pctx.dirs():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
214 diradds += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
215 linesinfilesadded[roundto(added, 5)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
216 elif mar == 'r':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
217 fileremoves += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
218 for length, count in lineadd.iteritems():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
219 linelengths[length] += count
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
220 fileschanged[filechanges] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
221 filesadded[fileadds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
222 dirsadded[diradds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
223 filesremoved[fileremoves] += 1
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
224
225     invchildren = zerodict()
226
227     for rev, count in children.iteritems():
228         invchildren[count] += 1
229
230     if output != '-':
231         ui.status(_('writing output to %s\n') % output)
232
233     def pronk(d):
234         return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)
235
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
236     json.dump({'revs': len(revs),
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
237                'initdirs': pronk(dirs),
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
238                'lineschanged': pronk(lineschanged),
239                'children': pronk(invchildren),
240                'fileschanged': pronk(fileschanged),
241                'filesadded': pronk(filesadded),
242                'linesinfilesadded': pronk(linesinfilesadded),
243                'dirsadded': pronk(dirsadded),
244                'filesremoved': pronk(filesremoved),
245                'linelengths': pronk(linelengths),
246                'parents': pronk(parents),
247                'p1distance': pronk(p1distance),
248                'p2distance': pronk(p2distance),
249                'interarrival': pronk(interarrival),
250                'tzoffset': pronk(tzoffset),
251                },
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
252               fp)
253     fp.close()
254
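The model written by the `json.dump` call above is a JSON object whose values are lists of (value, count) pairs, most frequent first. The following is a minimal Python 3 sketch of that shape, not the file's own code: it assumes `zerodict()` behaves like `collections.defaultdict(int)` (its definition is not shown in this part of the listing), and the sample counts are made up for illustration.

```python
import json
from collections import defaultdict


def pronk(d):
    # Same idea as pronk() in the listing: (key, count) pairs sorted by
    # descending count. Python 3 spelling: items() instead of iteritems().
    return sorted(d.items(), key=lambda x: x[1], reverse=True)


# Hypothetical sample: number of parents observed for five changesets.
parents = defaultdict(int)
for nparents in [1, 1, 2, 1, 1]:
    parents[nparents] += 1

# json.dumps serializes each (key, count) tuple as a two-element list.
model = json.dumps({'revs': 5, 'parents': pronk(parents)})
print(model)
```

Each histogram is stored as a list of pairs rather than a JSON object, so non-string keys such as integers survive the round trip.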
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
255 @command('synthesize',
256          [('c', 'count', 0, _('create given number of commits'), _('COUNT')),
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
257           ('', 'dict', '', _('path to a dictionary of words'), _('FILE')),
258           ('', 'initfiles', 0, _('initial file count to create'), _('COUNT'))],
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
259          _('hg synthesize [OPTION].. DESCFILE'))
260 def synthesize(ui, repo, descpath, **opts):
261     '''synthesize commits based on a model of an existing repository
262
263     The model must have been generated by :hg:`analyze`. Commits will
264     be generated randomly according to the probabilities described in
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
265     the model. If --initfiles is set, the repository will be seeded with
266     the given number of files following the modeled repository's directory
267     structure.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
268
269     When synthesizing new content, commit descriptions, and user
270     names, words will be chosen randomly from a dictionary that is
271     presumed to contain one word per line. Use --dict to specify the
272     path to an alternate dictionary to use.
273     '''
274     try:
17887
0e2846b2482c url: use open and not url.open for local files (issue3624)
Siddharth Agarwal <sid0@fb.com>
parents: 17734
275         fp = hg.openpath(ui, descpath)
25660
328739ea70c3 global: mass rewrite to use modern exception syntax
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25186
276     except Exception as err:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25660
277         raise error.Abort('%s: %s' % (descpath, err[0].strerror))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
278     desc = json.load(fp)
279     fp.close()
280
281     def cdf(l):
18047
9196638b08ce synthrepo: do not crash if a list is empty
Bryan O'Sullivan <bryano@fb.com>
parents: 17887
282         if not l:
283             return [], []
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
284         vals, probs = zip(*sorted(l, key=lambda x: x[1], reverse=True))
285         t = float(sum(probs, 0))
286         s, cdfs = 0, []
287         for v in probs:
288             s += v
289             cdfs.append(s / t)
290         return vals, cdfs
291
292     lineschanged = cdf(desc['lineschanged'])
293     fileschanged = cdf(desc['fileschanged'])
294     filesadded = cdf(desc['filesadded'])
295     dirsadded = cdf(desc['dirsadded'])
296     filesremoved = cdf(desc['filesremoved'])
297     linelengths = cdf(desc['linelengths'])
298     parents = cdf(desc['parents'])
299     p1distance = cdf(desc['p1distance'])
300     p2distance = cdf(desc['p2distance'])
301     interarrival = cdf(desc['interarrival'])
302     linesinfilesadded = cdf(desc['linesinfilesadded'])
303     tzoffset = cdf(desc['tzoffset'])
304
305     dictfile = opts.get('dict') or '/usr/share/dict/words'
306     try:
307         fp = open(dictfile, 'rU')
25660
328739ea70c3 global: mass rewrite to use modern exception syntax
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25186
308     except IOError as err:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25660
309         raise error.Abort('%s: %s' % (dictfile, err.strerror))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
310     words = fp.read().splitlines()
311     fp.close()
312
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
313     initdirs = {}
314     if desc['initdirs']:
315         for k, v in desc['initdirs']:
316             initdirs[k.encode('utf-8').replace('.hg', '_hg')] = v
317         initdirs = renamedirs(initdirs, words)
318     initdirscdf = cdf(initdirs)
319
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
320     def pick(cdf):
321         return cdf[0][bisect.bisect_left(cdf[1], random.random())]
322
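The `cdf` and `pick` helpers above together sample a value from an empirical histogram by inverse-CDF lookup. The following standalone Python 3 sketch shows the same technique outside Mercurial; the `dist` and `rng` parameter names and the sample histogram are illustrative assumptions, not part of synthrepo.py.

```python
import bisect
import random


def cdf(pairs):
    """Turn [(value, count), ...] into (values, cumulative probabilities)."""
    if not pairs:
        return [], []
    # Most frequent values first, as in the listing.
    vals, counts = zip(*sorted(pairs, key=lambda x: x[1], reverse=True))
    total = float(sum(counts))
    running, cdfs = 0, []
    for c in counts:
        running += c
        cdfs.append(running / total)
    return vals, cdfs


def pick(dist, rng=random):
    """Draw one value: find the first cumulative probability >= a uniform draw."""
    vals, cdfs = dist
    return vals[bisect.bisect_left(cdfs, rng.random())]


# Hypothetical histogram: value 1 seen 6 times, 2 seen 3 times, 3 seen once.
vals, cdfs = cdf([(2, 3), (1, 6), (3, 1)])
random.seed(0)
samples = [pick((vals, cdfs)) for _ in range(1000)]
```

Because `random.random()` returns a number in [0, 1) and the last cumulative probability is 1.0, the `bisect_left` index always falls inside `vals` for a non-empty histogram.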
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
323     def pickpath():
324         return os.path.join(pick(initdirscdf), random.choice(words))
325
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
326     def makeline(minimum=0):
327         total = max(minimum, pick(linelengths))
328         c, l = 0, []
329         while c < total:
330             w = random.choice(words)
331             c += len(w) + 1
332             l.append(w)
333         return ' '.join(l)
334
335     wlock = repo.wlock()
336     lock = repo.lock()
337
338     nevertouch = set(('.hgsub', '.hgignore', '.hgtags'))
339
340     progress = ui.progress
341     _synthesizing = _('synthesizing')
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
342     _files = _('initial files')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
343     _changesets = _('changesets')
344
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
345     # Synthesize a single initial revision adding files to the repo according
346     # to the modeled directory structure.
347     initcount = int(opts['initfiles'])
348     if initcount and initdirs:
349         pctx = repo[None].parents()[0]
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
350         dirs = set(pctx.dirs())
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
351         files = {}
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
352
353         def validpath(path):
354             # Don't pick filenames which are already directory names.
355             if path in dirs:
356                 return False
357             # Don't pick directories which were used as file names.
358             while path:
359                 if path in files:
360                     return False
361                 path = os.path.dirname(path)
362             return True
363
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
364         for i in xrange(0, initcount):
365             ui.progress(_synthesizing, i, unit=_files, total=initcount)
366
367             path = pickpath()
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
368             while not validpath(path):
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
369                 path = pickpath()
370             data = '%s contents\n' % path
371             files[path] = context.memfilectx(repo, path, data)
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
372             dir = os.path.dirname(path)
373             while dir and dir not in dirs:
374                 dirs.add(dir)
375                 dir = os.path.dirname(dir)
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
376
377         def filectxfn(repo, memctx, path):
378             return files[path]
379
380         ui.progress(_synthesizing, None)
381         message = 'synthesized wide repo with %d files' % (len(files),)
382         mc = context.memctx(repo, [pctx.node(), nullid], message,
383                             files.iterkeys(), filectxfn, ui.username(),
384                             '%d %d' % util.makedate())
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
385 initnode = mc.commit()
24306
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
386 if ui.debugflag:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
387 hexfn = hex
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
388 else:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
389 hexfn = short
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
390 ui.status(_('added commit %s with %d files\n')
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
391 % (hexfn(initnode), len(files)))
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
392
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
393 # Synthesize incremental revisions to the repository, adding repo depth.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
394 count = int(opts['count'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
395 heads = set(map(repo.changelog.rev, repo.heads()))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
396 for i in xrange(count):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
397 progress(_synthesizing, i, unit=_changesets, total=count)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
398
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
399 node = repo.changelog.node
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
400 revs = len(repo)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
401
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
402 def pickhead(heads, distance):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
403 if heads:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
404 lheads = sorted(heads)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
405 rev = revs - min(pick(distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
406 if rev < lheads[-1]:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
407 rev = lheads[bisect.bisect_left(lheads, rev)]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
408 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
409 rev = lheads[-1]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
410 return rev, node(rev)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
411 return nullrev, nullid
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
412
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
413 r1 = revs - min(pick(p1distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
414 p1 = node(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
415
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
416 # the number of heads will grow without bound if we use a pure
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
417 # model, so artificially constrain their proliferation
22472
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
418 toomanyheads = len(heads) > random.randint(1, 20)
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
419 if p2distance[0] and (pick(parents) == 2 or toomanyheads):
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
420 r2, p2 = pickhead(heads.difference([r1]), p2distance)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
421 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
422 r2, p2 = nullrev, nullid
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
423
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
424 pl = [p1, p2]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
425 pctx = repo[r1]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
426 mf = pctx.manifest()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
427 mfk = mf.keys()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
428 changes = {}
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
429 if mfk:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
430 for __ in xrange(pick(fileschanged)):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
431 for __ in xrange(10):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
432 fctx = pctx.filectx(random.choice(mfk))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
433 path = fctx.path()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
434 if not (path in nevertouch or fctx.isbinary() or
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
435 'l' in fctx.flags()):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
436 break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
437 lines = fctx.data().splitlines()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
438 add, remove = pick(lineschanged)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
439 for __ in xrange(remove):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
440 if not lines:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
441 break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
442 del lines[random.randrange(0, len(lines))]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
443 for __ in xrange(add):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
444 lines.insert(random.randint(0, len(lines)), makeline())
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
445 path = fctx.path()
21689
503bb3af70fe memfilectx: call super.__init__ instead of duplicating code
Sean Farley <sean.michael.farley@gmail.com>
parents: 20672
diff changeset
446 changes[path] = context.memfilectx(repo, path,
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
447 '\n'.join(lines) + '\n')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
448 for __ in xrange(pick(filesremoved)):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
449 path = random.choice(mfk)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
450 for __ in xrange(10):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
451 path = random.choice(mfk)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
452 if path not in changes:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
453 changes[path] = None
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
454 break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
455 if filesadded:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
456 dirs = list(pctx.dirs())
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
457 dirs.insert(0, '')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
458 for __ in xrange(pick(filesadded)):
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
459 pathstr = ''
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
460 while pathstr in dirs:
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
461 path = [random.choice(dirs)]
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
462 if pick(dirsadded):
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
463 path.append(random.choice(words))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
464 path.append(random.choice(words))
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
465 pathstr = '/'.join(filter(None, path))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
466 data = '\n'.join(makeline()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
467 for __ in xrange(pick(linesinfilesadded))) + '\n'
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
468 changes[pathstr] = context.memfilectx(repo, pathstr, data)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
469 def filectxfn(repo, memctx, path):
22446
054ec0212718 contrib/synthrepo: return None to delete files on commit, don't raise IOError
Mike Edgar <adgar@google.com>
parents: 21689
diff changeset
470 return changes[path]
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
471 if not changes:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
472 continue
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
473 if revs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
474 date = repo['tip'].date()[0] + pick(interarrival)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
475 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
476 date = time.time() - (86400 * count)
23234
944d6cfbe166 synthrepo: synthesized dates must be positive, fit in 32-bit signed ints
Mike Edgar <adgar@google.com>
parents: 22709
diff changeset
477 # dates in mercurial must be positive, fit in 32-bit signed integers.
944d6cfbe166 synthrepo: synthesized dates must be positive, fit in 32-bit signed ints
Mike Edgar <adgar@google.com>
parents: 22709
diff changeset
478 date = min(0x7fffffff, max(0, date))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
479 user = random.choice(words) + '@' + random.choice(words)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
480 mc = context.memctx(repo, pl, makeline(minimum=2),
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
481 sorted(changes.iterkeys()),
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
482 filectxfn, user, '%d %d' % (date, pick(tzoffset)))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
483 newnode = mc.commit()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
484 heads.add(repo.changelog.rev(newnode))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
485 heads.discard(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
486 heads.discard(r2)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
487
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
488 lock.release()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
489 wlock.release()
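
Note on lines 402-411 and 477-478: the head selection and date clamping in the loop above can be tried in isolation. The following is a minimal Python 3 sketch, not the code from synthrepo.py. The names `snap_to_head` and `clamp_date` are hypothetical, and the candidate revision is passed in directly instead of being drawn from the model with `pick(distance)`.

```python
import bisect


def snap_to_head(heads, rev):
    """Map a candidate revision to a head, as pickhead does after picking.

    A candidate below the highest head is moved up to the first head at or
    above it; anything else falls back to the highest head.
    Assumes `heads` is a non-empty collection of revision numbers.
    """
    lheads = sorted(heads)
    if rev < lheads[-1]:
        return lheads[bisect.bisect_left(lheads, rev)]
    return lheads[-1]


def clamp_date(date):
    """Keep a synthesized timestamp non-negative and within 32-bit signed range."""
    return min(0x7fffffff, max(0, date))


heads = {3, 7, 12}
print(snap_to_head(heads, 5))   # candidate between two heads
print(snap_to_head(heads, 20))  # candidate above every head
print(clamp_date(-5))
```

Because the candidate is always moved to an existing head, the second parent of a synthesized merge is a current head, which is how the loop keeps the head count from growing without bound.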
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
490
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
491 def renamedirs(dirs, words):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
492 '''Randomly rename the directory names in the per-dir file count dict.'''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
493 wordgen = itertools.cycle(words)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
494 replacements = {'': ''}
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
495 def rename(dirpath):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
496 '''Recursively rename the directory and all path prefixes.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
497
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
498 The mapping from path to renamed path is stored for all path prefixes
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
499 as in dynamic programming, ensuring linear runtime and consistent
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
500 renaming regardless of iteration order through the model.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
501 '''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
502 if dirpath in replacements:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
503 return replacements[dirpath]
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
504 head, _ = os.path.split(dirpath)
24306
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
505 if head:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
506 head = rename(head)
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
507 else:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
508 head = ''
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
509 renamed = os.path.join(head, wordgen.next())
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
510 replacements[dirpath] = renamed
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
511 return renamed
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
512 result = []
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
513 for dirpath, count in dirs.iteritems():
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
514 result.append([rename(dirpath.lstrip(os.sep)), count])
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
515 return result
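
Note on lines 491-515: the memoized prefix renaming described in the `rename` docstring can be sketched in Python 3 as below. This is an illustration under stated assumptions, not the file's code: it uses `posixpath` and `'/'` instead of `os.path` and `os.sep`, `next(wordgen)` instead of the Python 2 `wordgen.next()`, and the name `renamedirs_sketch` and the sample model are made up for the example.

```python
import itertools
import posixpath


def renamedirs_sketch(dirs, words):
    """Rename every directory in a {path: filecount} dict.

    Each path prefix is renamed once and cached, so 'src/lib' and
    'src/lib/util' share the same renamed 'src' component regardless of
    the order in which the dict is iterated.
    """
    wordgen = itertools.cycle(words)
    replacements = {'': ''}  # memo: original prefix -> renamed prefix

    def rename(dirpath):
        if dirpath in replacements:
            return replacements[dirpath]
        head, _tail = posixpath.split(dirpath)
        # Rename the parent first so children are built on the cached result.
        head = rename(head) if head else ''
        renamed = posixpath.join(head, next(wordgen))
        replacements[dirpath] = renamed
        return renamed

    return [[rename(path.lstrip('/')), count] for path, count in dirs.items()]


# Hypothetical directory shape model: path -> number of files.
model = {'src/lib/util': 2, 'src/lib': 5, 'src': 3, 'docs': 1}
result = renamedirs_sketch(model, ['alpha', 'beta', 'gamma', 'delta'])
print(result)
```

Since the word list is cycled, a model with more directories than words reuses words, so distinct directories can end up with the same renamed path; the sketch has that property too.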