comparison mercurial/help/internals/wireprotocolv2.txt @ 40178:46a40bce3ae0

wireprotov2: define and implement "filesdata" command

Previously, the only way to access file revision data was the "filedata" command. This command is useful to have. But, it only allowed resolving revision data for a single file. This meant that clients needed to send 1 command for each tracked path they were seeking data on. Furthermore, those commands would need to enumerate the exact file nodes they wanted data for.

This approach meant that clients were sending a lot of data to remotes in order to request file data. e.g. if there were 1M file revisions, we'd need at least 20,000,000 bytes just to encode file nodes! Many clients on the internet don't have that kind of upload capacity.

In order to limit the amount of data that clients must send, we'll need more efficient ways to request repository data.

This commit defines and implements a new "filesdata" command. This command allows the retrieval of data for multiple files by specifying changeset revisions and optional file patterns. The command figures out what file revisions are "relevant" and sends them in bulk.

The logic around choosing which file revisions to send in the case of haveparents not being set is overly simple and will over-send files. We will need more smarts here eventually. (Specifically, the client will need to tell the server which revisions it knows about.) This work is deferred until a later time.

Differential Revision: https://phab.mercurial-scm.org/D4981
author Gregory Szorc <gregory.szorc@gmail.com>
date Wed, 03 Oct 2018 12:54:39 -0700
parents 41263df08109
children ed55a0077490
40177:41e2633bcd00 40178:46a40bce3ae0
338 When ``revision`` data is requested, the server chooses to emit either fulltext
339 revision data or a delta. What the server decides can be inferred by looking
340 for the presence of the ``delta`` or ``revision`` keys in the
341 ``fieldsfollowing`` array.
342
343 filesdata
344 ---------
345
346 Obtain various data related to multiple tracked files for specific changesets.
347
348 This command is similar to ``filedata`` with the main difference being that
349 individual requests operate on multiple file paths. This allows clients to
350 request data for multiple paths by issuing a single command.
351
352 The command accepts the following arguments:
353
354 fields
355 (set of bytestring) Which data associated with a file to fetch.
356 The following values are recognized:
357
358 parents
359 Parent nodes for the revision.
360
361 revision
362 The raw revision data for a file.
363
364 haveparents
365 (bool) Whether the client has the parent revisions of all requested
366 nodes.
367
368 pathfilter
369 (map) Defines a filter that determines what file paths are relevant.
370
371 See the *Path Filters* section for more.
372
373 If the argument is omitted, it is assumed that all paths are relevant.
374
375 revisions
376 (array of maps) Specifies revisions whose data is being requested. Each value
377 in the array is a map describing revisions. See the *Revision Specifiers*
378 section below for the format of this map.
379
380 Data will be sent for the union of all revisions resolved by all revision
381 specifiers.
382
383 Only revision specifiers operating on changeset revisions are allowed.
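
As an illustration of the arguments above, the following is a minimal Python sketch of how a client might assemble the argument map before CBOR-encoding it. The helper name ``build_filesdata_args`` is hypothetical and not part of Mercurial. The ``type`` key and the ``changesetdagrange`` value in the usage note are assumptions about the revision specifier format, which is defined outside this hunk.

```python
ALLOWED_FIELDS = {b"parents", b"revision"}


def build_filesdata_args(revisions, fields=(b"parents", b"revision"),
                         haveparents=False, pathfilter=None):
    """Assemble the argument map for a ``filesdata`` request.

    Hypothetical helper: it only builds the map; encoding and sending
    the command are left to the caller.
    """
    fields = set(fields)
    unknown = fields - ALLOWED_FIELDS
    if unknown:
        raise ValueError("unrecognized fields: {!r}".format(sorted(unknown)))

    args = {
        b"fields": fields,
        b"haveparents": bool(haveparents),
        b"revisions": list(revisions),
    }
    # An omitted ``pathfilter`` means that all paths are relevant.
    if pathfilter is not None:
        args[b"pathfilter"] = pathfilter
    return args
```

A call might look like ``build_filesdata_args([{b"type": b"changesetdagrange", b"roots": [], b"heads": [b"\x11" * 20]}], haveparents=True)``, where the 20-byte value stands in for a real changeset node.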
384
385 The response bytestream starts with a CBOR map describing the data that
386 follows. This map has the following bytestring keys:
387
388 totalpaths
389 (unsigned integer) Total number of paths whose data is being transferred.
390
391 totalitems
392 (unsigned integer) Total number of file revisions whose data is being
393 transferred.
394
395 Following the map header are 0 or more sequences of CBOR values. Each sequence
396 represents data for a specific tracked path. Each sequence begins with a CBOR
397 map describing the file data that follows. Following that map are N CBOR values
398 describing file revision data. The format of this data is identical to that
399 returned by the ``filedata`` command.
400
401 Each sequence's map header has the following bytestring keys:
402
403 path
404 (bytestring) The tracked file path whose data follows.
405
406 totalitems
407 (unsigned integer) Total number of file revisions whose data is being
408 transferred for this path.
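
The response layout described above can be sketched as a small Python routine that groups already-decoded CBOR values by path. This is an illustrative sketch, not Mercurial's client code: the function name is hypothetical, and it assumes (based on the ``filedata`` description) that each file revision is a map, optionally carrying a ``fieldsfollowing`` array of (name, size) pairs, followed by one bytestring value per pair.

```python
def parse_filesdata_response(values):
    """Group decoded CBOR values from a ``filesdata`` response by path.

    ``values`` is any iterable of objects already decoded from the
    response bytestream, in order.
    """
    it = iter(values)
    header = next(it)  # map with ``totalpaths`` and ``totalitems``
    paths = {}

    # Each remaining sequence starts with a per-path map header.
    for pathmeta in it:
        revisions = []
        for _ in range(pathmeta[b"totalitems"]):
            meta = next(it)
            # Assumption: as with ``filedata``, every entry in
            # ``fieldsfollowing`` announces one following value.
            following = {}
            for name, _size in meta.get(b"fieldsfollowing", []):
                following[name] = next(it)
            revisions.append((meta, following))
        paths[pathmeta[b"path"]] = revisions

    return header, paths
```

Reading the per-path ``totalitems`` value is what lets the parser know where one path's sequence ends and the next begins.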
409
410 The ``haveparents`` argument has significant implications on the data
411 transferred.
412
413 When ``haveparents`` is true, the command MAY emit data only for file
414 revisions introduced by the set of changeset revisions whose data is being
415 requested. In other words, the command may assume that all file revisions
416 for all relevant paths for ancestors of the requested changeset revisions
417 are present on the receiver.
418
419 When ``haveparents`` is false, the command MUST assume that the receiver
420 has no file revision data. This means that all referenced file revisions
421 in the queried set of changeset revisions will be sent.
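
To make the difference between the two ``haveparents`` modes concrete, here is a toy Python model. It is a simplified sketch and not the server's algorithm: the in-memory ``changesets`` structure and the function name are hypothetical, and "introduced by a changeset" is approximated as "differs from the same path in every parent manifest". Because the true case is only a MAY, a real server is free to send more than this.

```python
def select_file_revisions(changesets, requested, haveparents):
    """Return the set of (path, filenode) pairs to send.

    ``changesets`` maps a changeset id to a dict with ``parents`` (list
    of changeset ids) and ``files`` (full manifest: path -> file node).
    """
    selected = set()
    for cset in requested:
        entry = changesets[cset]
        for path, node in entry["files"].items():
            if haveparents:
                # Skip file revisions already referenced by a parent:
                # the receiver is assumed to have them.
                inherited = any(
                    changesets[p]["files"].get(path) == node
                    for p in entry["parents"]
                )
                if inherited:
                    continue
            selected.add((path, node))
    return selected
```

With ``haveparents`` false, every file revision referenced by a requested changeset is selected, which mirrors the over-sending acknowledged in the commit message.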
422
423 TODO we'll probably want a more complicated mechanism for the client to
424 specify which ancestor revisions are known.
425
426 heads
427 -----
428
429 Obtain DAG heads in the repository.
430
640
641 The DAG range between ``roots`` and ``heads`` will be resolved and all
642 revisions between will be used. Nodes in ``roots`` are not part of the
643 resolved set. Nodes in ``heads`` are. The ``roots`` array may be empty.
644 The ``heads`` array MUST be defined.
645
646 Path Filters
647 ============
648
649 Various commands accept a *path filter* argument that defines the set of file
650 paths relevant to the request.
651
652 A *path filter* is defined as a map with the bytestring keys ``include`` and
653 ``exclude``. Each is an array of bytestring values. Each value defines a pattern
654 rule (see :hg:`help patterns`) that is used to match file paths.
655
656 A path matches the path filter if it is matched by a rule in the ``include``
657 set but doesn't match a rule in the ``exclude`` set. In other words, a path
658 matcher takes the union of all ``include`` patterns and then subtracts the
659 union of all ``exclude`` patterns.
660
661 Patterns MUST be prefixed with their pattern type. Only the following pattern
662 types are allowed: ``path:``, ``rootfilesin:``.
663
664 If the ``include`` key is omitted, it is assumed that all paths are
665 relevant. The patterns from ``exclude`` will still be used, if defined.
666
667 An example value is ``path:tests/foo``, which would match a file named
668 ``tests/foo`` or a directory ``tests/foo`` and all files under it.
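
The matching rules in this section can be sketched in Python as follows. This is a minimal illustration and not Mercurial's matcher: the function names are hypothetical, and the ``rootfilesin:`` behavior (files directly inside the named directory, but not in its subdirectories) is taken from the description in :hg:`help patterns` rather than from this section.

```python
def _matches_rule(path, rule):
    """Test one prefixed pattern rule against a tracked file path."""
    kind, sep, pattern = rule.partition(b":")
    if not sep or kind not in (b"path", b"rootfilesin"):
        raise ValueError("unsupported pattern: {!r}".format(rule))
    pattern = pattern.strip(b"/")
    if kind == b"path":
        # Matches the named file, or anything under the named directory.
        return path == pattern or path.startswith(pattern + b"/")
    # rootfilesin: only files whose immediate directory is ``pattern``.
    return path.rpartition(b"/")[0] == pattern


def path_matches(path, pathfilter=None):
    """Apply a path filter map: union of includes minus union of excludes."""
    pathfilter = pathfilter or {}
    include = pathfilter.get(b"include")
    exclude = pathfilter.get(b"exclude", [])
    # An omitted ``include`` key means every path is relevant.
    if include is not None:
        if not any(_matches_rule(path, rule) for rule in include):
            return False
    return not any(_matches_rule(path, rule) for rule in exclude)
```

For example, with ``{b"include": [b"path:tests"], b"exclude": [b"path:tests/foo"]}`` the path ``tests/bar.py`` matches, while ``tests/foo`` and ``tests/foo/x`` do not.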