annotate contrib/synthrepo.py @ 42063:912d82daeda3

perf: make perf.run-limits code work with Python 3 We need b'' because perf.py isn't run through the source transformer. We need to cast the exception to bytes using pycompat.bytestr() because ValueError can't be %s formatted due to built-in exceptions lacking __bytes__. We need to pycompat.sysstr() before the float() and int() cast so the ValueError message doesn't have b'' in it. Even with that, it looks like the error message for the ValueError for float casts added quotes, so we need to account for that in test output. Differential Revision: https://phab.mercurial-scm.org/D6200
author Gregory Szorc <gregory.szorc@gmail.com>
date Thu, 04 Apr 2019 17:47:25 -0700
parents 2ff8994ac71d
children c07812bdd568
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
1 # synthrepo.py - repo synthesis
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
2 #
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
3 # Copyright 2012 Facebook
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
4 #
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
6 # GNU General Public License version 2 or any later version.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
7
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
8 '''synthesize structurally interesting change history
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
9
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
10 This extension is useful for creating a repository with properties
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
11 that are statistically similar to an existing repository. During
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
12 analysis, a simple probability table is constructed from the history
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
13 of an existing repository. During synthesis, these properties are
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
14 reconstructed.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
15
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
16 Properties that are analyzed and synthesized include the following:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
17
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
18 - Lines added or removed when an existing file is modified
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
19 - Number and sizes of files added
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
20 - Number of files removed
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
21 - Line lengths
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
22 - Topological distance to parent changeset(s)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
23 - Probability of a commit being a merge
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
24 - Probability of a newly added file being added to a new directory
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
25 - Interarrival time, and time zone, of commits
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
26 - Number of files in each directory
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
27
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
28 A few obvious properties that are not currently handled realistically:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
29
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
30 - Merges are treated as regular commits with two parents, which is not
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
31 realistic
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
32 - Modifications are not treated as operations on hunks of lines, but
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
33 as insertions and deletions of randomly chosen single lines
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
34 - Committer ID (always random)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
35 - Executability of files
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
36 - Symlinks and binary files are ignored
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
37 '''
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
38
28563
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
39 from __future__ import absolute_import
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
40 import bisect
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
41 import collections
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
42 import itertools
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
43 import json
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
44 import os
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
45 import random
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
46 import sys
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
47 import time
29205
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
48
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
49 from mercurial.i18n import _
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
50 from mercurial.node import (
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
51 nullid,
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
52 nullrev,
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
53 short,
a0939666b836 py3: move up symbol imports to enforce import-checker rules
Yuya Nishihara <yuya@tcha.org>
parents: 28563
diff changeset
54 )
28563
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
55 from mercurial import (
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
56 context,
38588
1c93e0237a24 diffutil: move the module out of utils package
Yuya Nishihara <yuya@tcha.org>
parents: 38587
diff changeset
57 diffutil,
28563
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
58 error,
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
59 hg,
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
60 patch,
32337
46ba2cdda476 registrar: move cmdutil.command to registrar module (API)
Yuya Nishihara <yuya@tcha.org>
parents: 32291
diff changeset
61 registrar,
28563
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
62 scmutil,
62250a48dc7f contrib: synthrepo use absolute_import
Pulkit Goyal <7895pulkit@gmail.com>
parents: 26587
diff changeset
63 )
38567
97469c5430cd synthrepo: pass a diffopts object to context.diff
Boris Feld <boris.feld@octobus.net>
parents: 38519
diff changeset
64 from mercurial.utils import (
97469c5430cd synthrepo: pass a diffopts object to context.diff
Boris Feld <boris.feld@octobus.net>
parents: 38519
diff changeset
65 dateutil,
97469c5430cd synthrepo: pass a diffopts object to context.diff
Boris Feld <boris.feld@octobus.net>
parents: 38519
diff changeset
66 )
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
67
29841
d5883fd055c6 extensions: change magic "shipped with hg" string
Augie Fackler <augie@google.com>
parents: 29216
diff changeset
68 # Note for extension authors: ONLY specify testedwith = 'ships-with-hg-core' for
25186
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24306
diff changeset
69 # extensions which SHIP WITH MERCURIAL. Non-mainline extensions should
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24306
diff changeset
70 # be specifying the version(s) of Mercurial they are tested with, or
80c5b2666a96 extensions: document that `testedwith = 'internal'` is special
Augie Fackler <augie@google.com>
parents: 24306
diff changeset
71 # leave the attribute unspecified.
29841
d5883fd055c6 extensions: change magic "shipped with hg" string
Augie Fackler <augie@google.com>
parents: 29216
diff changeset
72 testedwith = 'ships-with-hg-core'
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
73
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
74 cmdtable = {}
32337
46ba2cdda476 registrar: move cmdutil.command to registrar module (API)
Yuya Nishihara <yuya@tcha.org>
parents: 32291
diff changeset
75 command = registrar.command(cmdtable)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
76
32291
bd872f64a8ba cleanup: use set literals
Martin von Zweigbergk <martinvonz@google.com>
parents: 29841
diff changeset
77 newfile = {'new fi', 'rename', 'copy f', 'copy t'}
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
78
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
79 def zerodict():
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
80 return collections.defaultdict(lambda: 0)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
81
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
82 def roundto(x, k):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
83 if x > k * 2:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
84 return int(round(x / float(k)) * k)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
85 return int(round(x))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
86
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
87 def parsegitdiff(lines):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
88 filename, mar, lineadd, lineremove = None, None, zerodict(), 0
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
89 binary = False
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
90 for line in lines:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
91 start = line[:6]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
92 if start == 'diff -':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
93 if filename:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
94 yield filename, mar, lineadd, lineremove, binary
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
95 mar, lineadd, lineremove, binary = 'm', zerodict(), 0, False
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
96 filename = patch.gitre.match(line).group(1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
97 elif start in newfile:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
98 mar = 'a'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
99 elif start == 'GIT bi':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
100 binary = True
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
101 elif start == 'delete':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
102 mar = 'r'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
103 elif start:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
104 s = start[0]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
105 if s == '-' and not line.startswith('--- '):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
106 lineremove += 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
107 elif s == '+' and not line.startswith('+++ '):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
108 lineadd[roundto(len(line) - 1, 5)] += 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
109 if filename:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
110 yield filename, mar, lineadd, lineremove, binary
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
111
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
112 @command('analyze',
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
113 [('o', 'output', '', _('write output to given file'), _('FILE')),
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
114 ('r', 'rev', [], _('analyze specified revisions'), _('REV'))],
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
115 _('hg analyze'), optionalrepo=True)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
116 def analyze(ui, repo, *revs, **opts):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
117 '''create a simple model of a repository to use for later synthesis
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
118
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
119 This command examines every changeset in the given range (or all
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
120 of history if none are specified) and creates a simple statistical
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
121 model of the history of the repository. It also measures the directory
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
122 structure of the repository as checked out.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
123
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
124 The model is written out to a JSON file, and can be used by
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
125 :hg:`synthesize` to create or augment a repository with synthetic
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
126 commits that have a structure that is statistically similar to the
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
127 analyzed repository.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
128 '''
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
129 root = repo.root
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
130 if not root.endswith(os.path.sep):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
131 root += os.path.sep
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
132
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
133 revs = list(revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
134 revs.extend(opts['rev'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
135 if not revs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
136 revs = [':']
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
137
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
138 output = opts['output']
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
139 if not output:
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
140 output = os.path.basename(root) + '.json'
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
141
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
142 if output == '-':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
143 fp = sys.stdout
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
144 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
145 fp = open(output, 'w')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
146
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
147 # Always obtain file counts of each directory in the given root directory.
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
148 def onerror(e):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
149 ui.warn(_('error walking directory structure: %s\n') % e)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
150
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
151 dirs = {}
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
152 rootprefixlen = len(root)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
153 for dirpath, dirnames, filenames in os.walk(root, onerror=onerror):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
154 dirpathfromroot = dirpath[rootprefixlen:]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
155 dirs[dirpathfromroot] = len(filenames)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
156 if '.hg' in dirnames:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
157 dirnames.remove('.hg')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
158
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
159 lineschanged = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
160 children = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
161 p1distance = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
162 p2distance = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
163 linesinfilesadded = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
164 fileschanged = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
165 filesadded = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
166 filesremoved = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
167 linelengths = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
168 interarrival = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
169 parents = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
170 dirsadded = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
171 tzoffset = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
172
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
173 # If a mercurial repo is available, also model the commit history.
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
174 if repo:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
175 revs = scmutil.revrange(repo, revs)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
176 revs.sort()
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
177
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
178 progress = ui.makeprogress(_('analyzing'), unit=_('changesets'),
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
179 total=len(revs))
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
180 for i, rev in enumerate(revs):
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
181 progress.update(i)
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
182 ctx = repo[rev]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
183 pl = ctx.parents()
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
184 pctx = pl[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
185 prev = pctx.rev()
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
186 children[prev] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
187 p1distance[rev - prev] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
188 parents[len(pl)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
189 tzoffset[ctx.date()[1]] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
190 if len(pl) > 1:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
191 p2distance[rev - pl[1].rev()] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
192 if prev == rev - 1:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
193 lastctx = pctx
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
194 else:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
195 lastctx = repo[rev - 1]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
196 if lastctx.rev() != nullrev:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
197 timedelta = ctx.date()[0] - lastctx.date()[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
198 interarrival[roundto(timedelta, 300)] += 1
38587
b62000a28812 diffutil: remove diffopts() in favor of diffallopts()
Yuya Nishihara <yuya@tcha.org>
parents: 38584
diff changeset
199 diffopts = diffutil.diffallopts(ui, {'git': True})
38519
4455e5d4d59c context: explicitly take diffopts in `context.diff` (API)
Boris Feld <boris.feld@octobus.net>
parents: 38409
diff changeset
200 diff = sum((d.splitlines()
38567
97469c5430cd synthrepo: pass a diffopts object to context.diff
Boris Feld <boris.feld@octobus.net>
parents: 38519
diff changeset
201 for d in ctx.diff(pctx, opts=diffopts)), [])
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
202 fileadds, diradds, fileremoves, filechanges = 0, 0, 0, 0
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
203 for filename, mar, lineadd, lineremove, isbin in parsegitdiff(diff):
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
204 if isbin:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
205 continue
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
206 added = sum(lineadd.itervalues(), 0)
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
207 if mar == 'm':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
208 if added and lineremove:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
209 lineschanged[roundto(added, 5),
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
210 roundto(lineremove, 5)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
211 filechanges += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
212 elif mar == 'a':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
213 fileadds += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
214 if '/' in filename:
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
215 filedir = filename.rsplit('/', 1)[0]
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
216 if filedir not in pctx.dirs():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
217 diradds += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
218 linesinfilesadded[roundto(added, 5)] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
219 elif mar == 'r':
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
220 fileremoves += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
221 for length, count in lineadd.iteritems():
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
222 linelengths[length] += count
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
223 fileschanged[filechanges] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
224 filesadded[fileadds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
225 dirsadded[diradds] += 1
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
226 filesremoved[fileremoves] += 1
38409
ce65c25dc161 synthrepo: close progress topics
Martin von Zweigbergk <martinvonz@google.com>
parents: 38408
diff changeset
227 progress.complete()
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
228
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
229 invchildren = zerodict()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
230
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
231 for rev, count in children.iteritems():
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
232 invchildren[count] += 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
233
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
234 if output != '-':
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
235 ui.status(_('writing output to %s\n') % output)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
236
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
237 def pronk(d):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
238 return sorted(d.iteritems(), key=lambda x: x[1], reverse=True)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
239
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
240 json.dump({'revs': len(revs),
22709
889789a2ca9f contrib/synthrepo: walk a repo's directory structure during analysis
Mike Edgar <adgar@google.com>
parents: 22708
diff changeset
241 'initdirs': pronk(dirs),
20672
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
242 'lineschanged': pronk(lineschanged),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
243 'children': pronk(invchildren),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
244 'fileschanged': pronk(fileschanged),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
245 'filesadded': pronk(filesadded),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
246 'linesinfilesadded': pronk(linesinfilesadded),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
247 'dirsadded': pronk(dirsadded),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
248 'filesremoved': pronk(filesremoved),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
249 'linelengths': pronk(linelengths),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
250 'parents': pronk(parents),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
251 'p1distance': pronk(p1distance),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
252 'p2distance': pronk(p2distance),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
253 'interarrival': pronk(interarrival),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
254 'tzoffset': pronk(tzoffset),
05e58b08fdfe synthrepo: move from dict() construction to {} literals
Augie Fackler <raf@durin42.com>
parents: 19322
diff changeset
255 },
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
256 fp)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
257 fp.close()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
258
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
259 @command('synthesize',
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
260 [('c', 'count', 0, _('create given number of commits'), _('COUNT')),
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
261 ('', 'dict', '', _('path to a dictionary of words'), _('FILE')),
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
262 ('', 'initfiles', 0, _('initial file count to create'), _('COUNT'))],
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
263 _('hg synthesize [OPTION].. DESCFILE'))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
264 def synthesize(ui, repo, descpath, **opts):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
265 '''synthesize commits based on a model of an existing repository
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
266
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
267 The model must have been generated by :hg:`analyze`. Commits will
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
268 be generated randomly according to the probabilities described in
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
269 the model. If --initfiles is set, the repository will be seeded with
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
270 the given number files following the modeled repository's directory
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
271 structure.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
272
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
273 When synthesizing new content, commit descriptions, and user
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
274 names, words will be chosen randomly from a dictionary that is
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
275 presumed to contain one word per line. Use --dict to specify the
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
276 path to an alternate dictionary to use.
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
277 '''
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
278 try:
17887
0e2846b2482c url: use open and not url.open for local files (issue3624)
Siddharth Agarwal <sid0@fb.com>
parents: 17734
diff changeset
279 fp = hg.openpath(ui, descpath)
25660
328739ea70c3 global: mass rewrite to use modern exception syntax
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25186
diff changeset
280 except Exception as err:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25660
diff changeset
281 raise error.Abort('%s: %s' % (descpath, err[0].strerror))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
282 desc = json.load(fp)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
283 fp.close()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
284
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
285 def cdf(l):
18047
9196638b08ce synthrepo: do not crash if a list is empty
Bryan O'Sullivan <bryano@fb.com>
parents: 17887
diff changeset
286 if not l:
9196638b08ce synthrepo: do not crash if a list is empty
Bryan O'Sullivan <bryano@fb.com>
parents: 17887
diff changeset
287 return [], []
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
288 vals, probs = zip(*sorted(l, key=lambda x: x[1], reverse=True))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
289 t = float(sum(probs, 0))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
290 s, cdfs = 0, []
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
291 for v in probs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
292 s += v
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
293 cdfs.append(s / t)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
294 return vals, cdfs
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
295
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
296 lineschanged = cdf(desc['lineschanged'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
297 fileschanged = cdf(desc['fileschanged'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
298 filesadded = cdf(desc['filesadded'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
299 dirsadded = cdf(desc['dirsadded'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
300 filesremoved = cdf(desc['filesremoved'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
301 linelengths = cdf(desc['linelengths'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
302 parents = cdf(desc['parents'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
303 p1distance = cdf(desc['p1distance'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
304 p2distance = cdf(desc['p2distance'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
305 interarrival = cdf(desc['interarrival'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
306 linesinfilesadded = cdf(desc['linesinfilesadded'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
307 tzoffset = cdf(desc['tzoffset'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
308
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
309 dictfile = opts.get('dict') or '/usr/share/dict/words'
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
310 try:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
311 fp = open(dictfile, 'rU')
25660
328739ea70c3 global: mass rewrite to use modern exception syntax
Gregory Szorc <gregory.szorc@gmail.com>
parents: 25186
diff changeset
312 except IOError as err:
26587
56b2bcea2529 error: get Abort from 'error' instead of 'util'
Pierre-Yves David <pierre-yves.david@fb.com>
parents: 25660
diff changeset
313 raise error.Abort('%s: %s' % (dictfile, err.strerror))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
314 words = fp.read().splitlines()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
315 fp.close()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
316
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
317 initdirs = {}
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
318 if desc['initdirs']:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
319 for k, v in desc['initdirs']:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
320 initdirs[k.encode('utf-8').replace('.hg', '_hg')] = v
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
321 initdirs = renamedirs(initdirs, words)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
322 initdirscdf = cdf(initdirs)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
323
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
324 def pick(cdf):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
325 return cdf[0][bisect.bisect_left(cdf[1], random.random())]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
326
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
327 def pickpath():
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
328 return os.path.join(pick(initdirscdf), random.choice(words))
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
329
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
330 def makeline(minimum=0):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
331 total = max(minimum, pick(linelengths))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
332 c, l = 0, []
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
333 while c < total:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
334 w = random.choice(words)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
335 c += len(w) + 1
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
336 l.append(w)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
337 return ' '.join(l)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
338
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
339 wlock = repo.wlock()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
340 lock = repo.lock()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
341
32291
bd872f64a8ba cleanup: use set literals
Martin von Zweigbergk <martinvonz@google.com>
parents: 29841
diff changeset
342 nevertouch = {'.hgsub', '.hgignore', '.hgtags'}
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
343
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
344 _synthesizing = _('synthesizing')
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
345 _files = _('initial files')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
346 _changesets = _('changesets')
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
347
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
348 # Synthesize a single initial revision adding files to the repo according
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
349 # to the modeled directory structure.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
350 initcount = int(opts['initfiles'])
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
351 if initcount and initdirs:
41398
2ff8994ac71d cleanup: use repo['.'] instead of repo[None].p1()
Martin von Zweigbergk <martinvonz@google.com>
parents: 41397
diff changeset
352 pctx = repo['.']
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
353 dirs = set(pctx.dirs())
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
354 files = {}
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
355
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
356 def validpath(path):
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
357 # Don't pick filenames which are already directory names.
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
358 if path in dirs:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
359 return False
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
360 # Don't pick directories which were used as file names.
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
361 while path:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
362 if path in files:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
363 return False
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
364 path = os.path.dirname(path)
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
365 return True
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
366
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
367 progress = ui.makeprogress(_synthesizing, unit=_files, total=initcount)
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
368 for i in xrange(0, initcount):
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
369 progress.update(i)
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
370
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
371 path = pickpath()
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
372 while not validpath(path):
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
373 path = pickpath()
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
374 data = '%s contents\n' % path
35398
2123e7629ec0 synthrepo: create filectx instance in 'filectxfn' callback
Martin von Zweigbergk <martinvonz@google.com>
parents: 34023
diff changeset
375 files[path] = data
23778
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
376 dir = os.path.dirname(path)
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
377 while dir and dir not in dirs:
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
378 dirs.add(dir)
a5dbec255f14 synthrepo: new filenames must not also be new directories, and vice-versa
Mike Edgar <adgar@google.com>
parents: 23235
diff changeset
379 dir = os.path.dirname(dir)
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
380
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
381 def filectxfn(repo, memctx, path):
35400
8a0cac20a1ad memfilectx: make changectx argument mandatory in constructor (API)
Martin von Zweigbergk <martinvonz@google.com>
parents: 35398
diff changeset
382 return context.memfilectx(repo, memctx, path, files[path])
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
383
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
384 progress.complete()
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
385 message = 'synthesized wide repo with %d files' % (len(files),)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
386 mc = context.memctx(repo, [pctx.node(), nullid], message,
36299
238646784294 py3: use default dict iterator instead of iterkeys
Augie Fackler <augie@google.com>
parents: 35400
diff changeset
387 files, filectxfn, ui.username(),
36607
c6061cadb400 util: extract all date-related utils in utils/dateutil module
Boris Feld <boris.feld@octobus.net>
parents: 36299
diff changeset
388 '%d %d' % dateutil.makedate())
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
389 initnode = mc.commit()
24306
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
390 if ui.debugflag:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
391 hexfn = hex
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
392 else:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
393 hexfn = short
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
394 ui.status(_('added commit %s with %d files\n')
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
395 % (hexfn(initnode), len(files)))
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
396
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
397 # Synthesize incremental revisions to the repository, adding repo depth.
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
398 count = int(opts['count'])
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
399 heads = set(map(repo.changelog.rev, repo.heads()))
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
400 progress = ui.makeprogress(_synthesizing, unit=_changesets, total=count)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
401 for i in xrange(count):
38408
6540333acb95 synthrepo: use progress helper
Martin von Zweigbergk <martinvonz@google.com>
parents: 36607
diff changeset
402 progress.update(i)
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
403
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
404 node = repo.changelog.node
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
405 revs = len(repo)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
406
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
407 def pickhead(heads, distance):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
408 if heads:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
409 lheads = sorted(heads)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
410 rev = revs - min(pick(distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
411 if rev < lheads[-1]:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
412 rev = lheads[bisect.bisect_left(lheads, rev)]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
413 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
414 rev = lheads[-1]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
415 return rev, node(rev)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
416 return nullrev, nullid
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
417
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
418 r1 = revs - min(pick(p1distance), revs)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
419 p1 = node(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
420
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
421 # the number of heads will grow without bound if we use a pure
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
422 # model, so artificially constrain their proliferation
22472
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
423 toomanyheads = len(heads) > random.randint(1, 20)
2e2577b0ccbe contrib/synthrepo: only generate 2 parents if model contains merges
Mike Edgar <adgar@google.com>
parents: 22446
diff changeset
424 if p2distance[0] and (pick(parents) == 2 or toomanyheads):
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
425 r2, p2 = pickhead(heads.difference([r1]), p2distance)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
426 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
427 r2, p2 = nullrev, nullid
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
428
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
429 pl = [p1, p2]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
430 pctx = repo[r1]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
431 mf = pctx.manifest()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
432 mfk = mf.keys()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
433 changes = {}
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
434 if mfk:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
435 for __ in xrange(pick(fileschanged)):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
436 for __ in xrange(10):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
437 fctx = pctx.filectx(random.choice(mfk))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
438 path = fctx.path()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
439 if not (path in nevertouch or fctx.isbinary() or
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
440 'l' in fctx.flags()):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
441 break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
442 lines = fctx.data().splitlines()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
443 add, remove = pick(lineschanged)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
444 for __ in xrange(remove):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
445 if not lines:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
446 break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
447 del lines[random.randrange(0, len(lines))]
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
448 for __ in xrange(add):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
449 lines.insert(random.randint(0, len(lines)), makeline())
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
450 path = fctx.path()
35398
2123e7629ec0 synthrepo: create filectx instance in 'filectxfn' callback
Martin von Zweigbergk <martinvonz@google.com>
parents: 34023
diff changeset
451 changes[path] = '\n'.join(lines) + '\n'
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
452 for __ in xrange(pick(filesremoved)):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
453 for __ in xrange(10):
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
454 path = random.choice(mfk)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
455 if path not in changes:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
456 break
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
457 if filesadded:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
458 dirs = list(pctx.dirs())
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
459 dirs.insert(0, '')
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
460 for __ in xrange(pick(filesadded)):
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
461 pathstr = ''
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
462 while pathstr in dirs:
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
463 path = [random.choice(dirs)]
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
464 if pick(dirsadded):
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
465 path.append(random.choice(words))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
466 path.append(random.choice(words))
23235
4cdc3e2810b9 synthrepo: when adding files, ensure new path is not a directory
Mike Edgar <adgar@google.com>
parents: 23234
diff changeset
467 pathstr = '/'.join(filter(None, path))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
468 data = '\n'.join(makeline()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
469 for __ in xrange(pick(linesinfilesadded))) + '\n'
35398
2123e7629ec0 synthrepo: create filectx instance in 'filectxfn' callback
Martin von Zweigbergk <martinvonz@google.com>
parents: 34023
diff changeset
470 changes[pathstr] = data
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
471 def filectxfn(repo, memctx, path):
35398
2123e7629ec0 synthrepo: create filectx instance in 'filectxfn' callback
Martin von Zweigbergk <martinvonz@google.com>
parents: 34023
diff changeset
472 if path not in changes:
2123e7629ec0 synthrepo: create filectx instance in 'filectxfn' callback
Martin von Zweigbergk <martinvonz@google.com>
parents: 34023
diff changeset
473 return None
35400
8a0cac20a1ad memfilectx: make changectx argument mandatory in constructor (API)
Martin von Zweigbergk <martinvonz@google.com>
parents: 35398
diff changeset
474 return context.memfilectx(repo, memctx, path, changes[path])
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
475 if not changes:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
476 continue
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
477 if revs:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
478 date = repo['tip'].date()[0] + pick(interarrival)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
479 else:
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
480 date = time.time() - (86400 * count)
23234
944d6cfbe166 synthrepo: synthesized dates must be positive, fit in 32-bit signed ints
Mike Edgar <adgar@google.com>
parents: 22709
diff changeset
481 # dates in mercurial must be positive, fit in 32-bit signed integers.
944d6cfbe166 synthrepo: synthesized dates must be positive, fit in 32-bit signed ints
Mike Edgar <adgar@google.com>
parents: 22709
diff changeset
482 date = min(0x7fffffff, max(0, date))
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
483 user = random.choice(words) + '@' + random.choice(words)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
484 mc = context.memctx(repo, pl, makeline(minimum=2),
34023
ba479850c9c7 python3: replace sorted(<dict>.iterkeys()) with sorted(<dict>)
Augie Fackler <raf@durin42.com>
parents: 32337
diff changeset
485 sorted(changes),
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
486 filectxfn, user, '%d %d' % (date, pick(tzoffset)))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
487 newnode = mc.commit()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
488 heads.add(repo.changelog.rev(newnode))
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
489 heads.discard(r1)
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
490 heads.discard(r2)
38409
ce65c25dc161 synthrepo: close progress topics
Martin von Zweigbergk <martinvonz@google.com>
parents: 38408
diff changeset
491 progress.complete()
17734
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
492
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
493 lock.release()
619068c280fd contrib: add a commit synthesizer for reproducing scaling problems
Bryan O'Sullivan <bryano@fb.com>
parents:
diff changeset
494 wlock.release()
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
495
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
496 def renamedirs(dirs, words):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
497 '''Randomly rename the directory names in the per-dir file count dict.'''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
498 wordgen = itertools.cycle(words)
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
499 replacements = {'': ''}
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
500 def rename(dirpath):
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
501 '''Recursively rename the directory and all path prefixes.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
502
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
503 The mapping from path to renamed path is stored for all path prefixes
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
504 as in dynamic programming, ensuring linear runtime and consistent
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
505 renaming regardless of iteration order through the model.
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
506 '''
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
507 if dirpath in replacements:
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
508 return replacements[dirpath]
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
509 head, _ = os.path.split(dirpath)
24306
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
510 if head:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
511 head = rename(head)
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
512 else:
6ddc86eedc3b style: kill ersatz if-else ternary operators
Jordi Gutiérrez Hermoso <jordigh@octave.org>
parents: 23778
diff changeset
513 head = ''
29216
ead25aa27a43 py3: convert to next() function
timeless <timeless@mozdev.org>
parents: 29205
diff changeset
514 renamed = os.path.join(head, next(wordgen))
22708
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
515 replacements[dirpath] = renamed
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
516 return renamed
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
517 result = []
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
518 for dirpath, count in dirs.iteritems():
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
519 result.append([rename(dirpath.lstrip(os.sep)), count])
4c66e70c3488 contrib/synthrepo: generate initial repo contents using directory shape model
Mike Edgar <adgar@google.com>
parents: 22473
diff changeset
520 return result