comparison tests/test-fileset.t @ 38865:899b4c74209c

fileset: combine union of basic patterns into single matcher

This appears to improve query performance in a big repository more than I thought (roughly 3x in the run below). The less Python we run in a hot loop, the faster the computation.

  $ hg files --cwd mozilla-central --time 'set:a* + b* + c* + d* + e*'
  (orig) time: real 0.670 secs (user 0.640+0.000 sys 0.030+0.000)
  (new) time: real 0.210 secs (user 0.180+0.000 sys 0.020+0.000)
author Yuya Nishihara <yuya@tcha.org>
date Sat, 21 Jul 2018 17:19:12 +0900
parents 73731fa8d1bd
children e79a69af1593
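The idea behind the change is that a union of N pattern matchers makes N regex calls per file from a Python loop, while a single alternation makes one call into the regex engine. The following is a minimal sketch of both shapes, not Mercurial's code: `glob_to_regex`, `to_regex`, `union_matcher` and `combined_matcher` are hypothetical names, only the `glob` and `path` kinds are handled, bare names are assumed to behave like globs, and the glob translation covers just `*` and `?`.

```python
import re
from typing import Callable, List, Tuple

# A basic pattern as (kind, text), e.g. ('glob', 'b?') or ('path', 'b1').
BasicPattern = Tuple[str, str]


def glob_to_regex(pat: str) -> str:
    """Translate a tiny glob subset ('*' and '?') into a regex fragment."""
    out = []
    for ch in pat:
        if ch == '*':
            out.append('[^/]*')  # '*' stays within one path component
        elif ch == '?':
            out.append('.')      # one character, as in the 'b?' -> 'b.$' output
        else:
            out.append(re.escape(ch))
    return ''.join(out)


def to_regex(kind: str, pat: str) -> str:
    """Regex fragment for one basic pattern, anchored at its end."""
    if kind == 'glob':
        return glob_to_regex(pat) + '$'
    if kind == 'path':
        # a path matches itself or anything below it
        return re.escape(pat) + '(?:/|$)'
    raise ValueError('unsupported pattern kind: %r' % kind)


def union_matcher(pats: List[BasicPattern]) -> Callable[[str], bool]:
    """Before: one compiled regex per pattern, tried in turn by a Python loop."""
    regexes = [re.compile(to_regex(k, p)) for k, p in pats]
    return lambda f: any(r.match(f) for r in regexes)


def combined_matcher(pats: List[BasicPattern]) -> "re.Pattern[str]":
    """After: all fragments joined into one alternation, compiled once."""
    return re.compile('(?:%s)' % '|'.join(to_regex(k, p) for k, p in pats))
```

With `[('glob', 'a1'), ('glob', 'a2'), ('path', 'b1')]`, `combined_matcher(...).pattern` is `(?:a1$|a2$|b1(?:/|$))`, the same string the test prints for `a1 or a2 or path:b1`; both matchers select the same files, but the combined one does a single `match` call per file.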
38864:73731fa8d1bd 38865:899b4c74209c
51 (symbol 'a1')) 51 (symbol 'a1'))
52 (kindpat 52 (kindpat
53 (symbol 'glob') 53 (symbol 'glob')
54 (symbol 'b?'))) 54 (symbol 'b?')))
55 * matcher: 55 * matcher:
56 <unionmatcher matchers=[ 56 <patternmatcher patterns='(?:a1(?:/|$)|b.$)'>
57 <patternmatcher patterns='(?:a1(?:/|$))'>,
58 <patternmatcher patterns='(?:b.$)'>]>
59 a1 57 a1
60 b1 58 b1
61 b2 59 b2
62 $ fileset -v --no-show-matcher 'a1 or a2' 60 $ fileset -v --no-show-matcher 'a1 or a2'
63 (or 61 (or
180 (func 178 (func
181 (symbol 'clean') 179 (symbol 'clean')
182 None))) 180 None)))
183 * optimized: 181 * optimized:
184 (or 182 (or
185 (symbol 'a1') 183 (patterns
186 (symbol 'a2') 184 (symbol 'a1')
185 (symbol 'a2'))
187 (and 186 (and
188 (func 187 (func
189 (symbol 'clean') 188 (symbol 'clean')
190 None) 189 None)
191 (func 190 (func
192 (symbol 'grep') 191 (symbol 'grep')
193 (string 'b')))) 192 (string 'b'))))
194 * matcher: 193 * matcher:
195 <unionmatcher matchers=[ 194 <unionmatcher matchers=[
196 <patternmatcher patterns='(?:a1$)'>, 195 <patternmatcher patterns='(?:a1$|a2$)'>,
197 <patternmatcher patterns='(?:a2$)'>,
198 <intersectionmatcher 196 <intersectionmatcher
199 m1=<predicatenmatcher pred=clean>, 197 m1=<predicatenmatcher pred=clean>,
200 m2=<predicatenmatcher pred=grep('b')>>]> 198 m2=<predicatenmatcher pred=grep('b')>>]>
201 a1 199 a1
202 a2 200 a2
203 b1 201 b1
204 b2 202 b2
205 203
204 Union of basic patterns:
205
206 $ fileset -p optimized -s -r. 'a1 or a2 or path:b1'
207 * optimized:
208 (patterns
209 (symbol 'a1')
210 (symbol 'a2')
211 (kindpat
212 (symbol 'path')
213 (symbol 'b1')))
214 * matcher:
215 <patternmatcher patterns='(?:a1$|a2$|b1(?:/|$))'>
216 a1
217 a2
218 b1
219
206 OR expression should be reordered by weight: 220 OR expression should be reordered by weight:
207 221
208 $ fileset -p optimized -s -r. 'grep("a") or a1 or grep("b") or b2' 222 $ fileset -p optimized -s -r. 'grep("a") or a1 or grep("b") or b2'
209 * optimized: 223 * optimized:
210 (or 224 (or
211 (symbol 'a1') 225 (patterns
212 (symbol 'b2') 226 (symbol 'a1')
227 (symbol 'b2'))
213 (func 228 (func
214 (symbol 'grep') 229 (symbol 'grep')
215 (string 'a')) 230 (string 'a'))
216 (func 231 (func
217 (symbol 'grep') 232 (symbol 'grep')
218 (string 'b'))) 233 (string 'b')))
219 * matcher: 234 * matcher:
220 <unionmatcher matchers=[ 235 <unionmatcher matchers=[
221 <patternmatcher patterns='(?:a1$)'>, 236 <patternmatcher patterns='(?:a1$|b2$)'>,
222 <patternmatcher patterns='(?:b2$)'>,
223 <predicatenmatcher pred=grep('a')>, 237 <predicatenmatcher pred=grep('a')>,
224 <predicatenmatcher pred=grep('b')>]> 238 <predicatenmatcher pred=grep('b')>]>
225 a1 239 a1
226 a2 240 a2
227 b1 241 b1
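The `* optimized:` trees in this test show two rewrites of an `or`: its basic-pattern operands (`symbol`, `kindpat`) are gathered into one `patterns` node, and the operands are ordered so the cheap pattern match runs before `grep()`. The sketch below reproduces those trees with plain tuples. The tuple shapes mirror the printed trees, but `optimize_or`, `BASIC_KINDS` and the numeric weights are assumptions for illustration, not Mercurial's optimizer.

```python
from typing import List

# A parse-tree node as a tuple, mirroring the printed trees:
# ('symbol', 'a1'), ('kindpat', ('symbol', 'path'), ('symbol', 'b1')),
# ('func', ('symbol', 'grep'), ('string', 'a')).
Tree = tuple

# Node types treated as basic patterns (assumed from the test output).
BASIC_KINDS = {'symbol', 'kindpat'}

# Hypothetical costs: lower runs first. The real weights may differ.
WEIGHTS = {'patterns': 0.5, 'func': 10.0}
DEFAULT_WEIGHT = 1.0


def weight(node: Tree) -> float:
    """Estimated cost of evaluating a node."""
    return WEIGHTS.get(node[0], DEFAULT_WEIGHT)


def optimize_or(operands: List[Tree]) -> Tree:
    """Merge basic-pattern operands into one 'patterns' node, then sort by cost."""
    basic = [x for x in operands if x[0] in BASIC_KINDS]
    rest = [x for x in operands if x[0] not in BASIC_KINDS]
    if basic:
        rest.append(('patterns',) + tuple(basic))
    # sorted() is stable, so operands of equal weight keep their input order
    ordered = sorted(rest, key=weight)
    if len(ordered) == 1:
        return ordered[0]  # a lone 'patterns' node needs no 'or' around it
    return ('or',) + tuple(ordered)
```

For `grep("a") or a1 or grep("b") or b2` this returns `('or', ('patterns', ('symbol', 'a1'), ('symbol', 'b2')), <grep a>, <grep b>)`, the same shape as the test's optimized tree; for `a1 or a2 or path:b1` the single remaining `patterns` node is returned bare, again as in the test.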