view tests/test-minirst.py @ 46607:e9901d01d135

revlog: add a mechanism to verify expected file position before appending

If someone uses `hg debuglocks`, or some non-hg process writes to the .hg directory without respecting the locks, or if the repo's on a networked filesystem, it's possible for the revlog code to write out corrupted data.

The form of this corruption can vary depending on what data was written and how that happened. We are in the "networked filesystem" case (though I've had users also do this to themselves with the "`hg debuglocks`" scenario), and most often see this with the changelog. What ends up happening is we produce two items (let's call them rev1 and rev2) in the .i file that have the same linkrev, baserev, and offset into the .d file, while the data in the .d file is appended properly. rev2's compressed_size is accurate for rev2, but when we go to decompress the data in the .d file, we use the offset that's recorded in the index file, which is the same as rev1, and attempt to decompress rev2.compressed_size bytes of rev1's data. This usually does not succeed. :)

When using inline data, this also fails, though I haven't investigated why too closely. This shows up as a "patch decode" error. I believe what's happening there is that we're basically ignoring the offset field, getting the data properly, but since baserev != rev, it thinks this is a delta based on rev (instead of a full text) and can't actually apply it as such.

For now, I'm going to make this an optional component and default it to entirely off. I may increase the default severity of this in the future, once I've enabled it for my users and we gain more experience with it. Luckily, most of my users have a versioned filesystem and can roll back to before the corruption was written; it's just a hassle to do so, and not everyone knows how (so it's a support burden). Users on other filesystems will not have that luxury, and this can leave them with a corrupted repository that they are unlikely to know how to resolve, and they'll see this as a data-loss event. Refusing to create the corruption is a much better user experience.

This mechanism is not perfect. There may be false-negatives (racy writes that are not detected). There should not be any false-positives (non-racy writes that are detected as such). This is not a mechanism that makes putting a repo on a networked filesystem "safe" or "supported", just *less* likely to cause corruption.

Differential Revision: https://phab.mercurial-scm.org/D9952
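The idea in the title, checking where the data file actually ends before appending to it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the revlog implementation: `append_checked`, `RacyWriteError`, and the `index` list of `(offset, compressed_size)` pairs are hypothetical names, and an in-memory `io.BytesIO` stands in for the `.d` file.

```python
import io
import zlib


class RacyWriteError(Exception):
    """Hypothetical: the data file does not end where the writer expected."""


def append_checked(fh, expected_offset, payload):
    """Append payload only if fh currently ends at expected_offset.

    expected_offset stands in for the offset a revlog writer would record in
    the new index entry. If another process grew the file after that offset
    was computed, writing anyway would make two index entries share an offset.
    The check and the write are not atomic, so a concurrent writer can still
    slip in between them (a false negative, as the description warns).
    """
    fh.seek(0, io.SEEK_END)
    actual = fh.tell()
    if actual != expected_offset:
        raise RacyWriteError(
            "expected data file to end at %d, but it ends at %d"
            % (expected_offset, actual)
        )
    fh.write(payload)
    return expected_offset  # the offset to record in the index entry


# Hypothetical stand-ins: an in-memory .d file and an index of
# (offset, compressed_size) entries.
data = io.BytesIO()
index = []

# Writer A appends rev1 at offset 0.
rev1 = zlib.compress(b"first revision text")
index.append((append_checked(data, 0, rev1), len(rev1)))

# Writer B computed its offset before rev1 landed, so it also expects 0.
# Without the check it would record offset 0, the same offset as rev1.
rev2 = zlib.compress(b"second revision text")
try:
    index.append((append_checked(data, 0, rev2), len(rev2)))
except RacyWriteError as exc:
    print("refused:", exc)
```

In this sketch a refused append leaves both the index and the data file untouched, so no entry ever points at another revision's bytes; how a real writer should recover after the refusal is outside its scope.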
author Kyle Lippincott <spectral@google.com>
date Wed, 03 Feb 2021 16:33:10 -0800
parents aaff3bc75306
children 6000f5b25c9b

from __future__ import absolute_import, print_function
from mercurial import minirst
from mercurial.utils import stringutil


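def debugformat(text, form, **kwargs):
    """Render ``text`` with minirst and print the result.

    ``form`` is either b'html' or an integer column width. When ``keep``
    is passed, the containers that were pruned are printed as well.
    """
    blocks, pruned = minirst.parse(text, **kwargs)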
    if form == b'html':
        print("html format:")
        out = minirst.format(text, style=form, **kwargs)
    else:
        print("%d column format:" % form)
        out = minirst.format(text, width=form, **kwargs)

    print("-" * 70)
    print(out[:-1].decode('utf8'))
    if kwargs.get('keep'):
        print("-" * 70)
        print(stringutil.pprint(pruned).decode('utf8'))
    print("-" * 70)
    print()


def debugformats(title, text, **kwargs):
    """Print ``text`` formatted at 60 and 30 columns, then as HTML."""
    print("== %s ==" % title)
    debugformat(text, 60, **kwargs)
    debugformat(text, 30, **kwargs)
    debugformat(text, b'html', **kwargs)


paragraphs = b"""
This is some text in the first paragraph.

  A small indented paragraph.
  It is followed by some lines
  containing random whitespace.
 \n  \n   \nThe third and final paragraph.
"""

debugformats('paragraphs', paragraphs)

definitions = b"""
A Term
  Definition. The indented
  lines make up the definition.
Another Term
  Another definition. The final line in the
   definition determines the indentation, so
    this will be indented with four spaces.

  A Nested/Indented Term
    Definition.
"""

debugformats('definitions', definitions)

literals = br"""
The fully minimized form is the most
convenient form::

  Hello
    literal
      world

In the partially minimized form a paragraph
simply ends with space-double-colon. ::

  ////////////////////////////////////////
  long un-wrapped line in a literal block
  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

::

  This literal block is started with '::',
    the so-called expanded form. The paragraph
      with '::' disappears in the final output.
"""

debugformats('literals', literals)

lists = b"""
- This is the first list item.

  Second paragraph in the first list item.

- List items need not be separated
  by a blank line.
- And will be rendered without
  one in any case.

We can have indented lists:

  - This is an indented list item

  - Another indented list item::

      - A literal block in the middle
            of an indented list.

      (The above is not a list item since we are in the literal block.)

::

  Literal block with no indentation (apart from
  the two spaces added to all literal blocks).

1. This is an enumerated list (first item).
2. Continuing with the second item.

(1) foo
(2) bar

1) Another
2) List

Line blocks are also a form of list:

| This is the first line.
  The line continues here.
| This is the second line.

Bullet lists are also detected:

* This is the first bullet
* This is the second bullet
  It has 2 lines
* This is the third bullet
"""

debugformats('lists', lists)

options = b"""
There is support for simple option lists,
but only with long options:

-X, --exclude  filter  an option with a short and long option with an argument
-I, --include          an option with both a short option and a long option
--all                  Output all.
--both                 Output both (this description is
                       quite long).
--long                 Output all day long.

--par                 This option has two paragraphs in its description.
                      This is the first.

                      This is the second.  Blank lines may be omitted between
                      options (as above) or left in (as here).


The next paragraph looks like an option list, but lacks the two-space
marker after the option. It is treated as a normal paragraph:

--foo bar baz
"""

debugformats('options', options)

fields = b"""
:a: First item.
:ab: Second item. Indentation and wrapping
     is handled automatically.
:c\:d: a key with colon
:efg\:\:hh: a key with many colon

Next list:

:small: The larger key below triggers full indentation here.
:much too large: This key is big enough to get its own line.
"""

debugformats('fields', fields)

containers = b"""
Normal output.

.. container:: debug

   Initial debug output.

.. container:: verbose

   Verbose output.

   .. container:: debug

      Debug output.
"""

debugformats('containers (normal)', containers)
debugformats('containers (verbose)', containers, keep=[b'verbose'])
debugformats('containers (debug)', containers, keep=[b'debug'])
debugformats(
    'containers (verbose debug)', containers, keep=[b'verbose', b'debug']
)

roles = b"""Please see :hg:`add`."""
debugformats('roles', roles)


sections = b"""
Title
=====

Section
-------

Subsection
''''''''''

Markup: ``foo`` and :hg:`help`
------------------------------
"""
debugformats('sections', sections)


admonitions = b"""
.. note::

   This is a note

   - Bullet 1
   - Bullet 2

   .. warning:: This is a warning Second
      input line of warning

.. danger::
   This is danger
"""

debugformats('admonitions', admonitions)

comments = b"""
Some text.

.. A comment

   .. An indented comment

   Some indented text.

..

Empty comment above
"""

debugformats('comments', comments)


data = [
    [b'a', b'b', b'c'],
    [b'1', b'2', b'3'],
    [b'foo', b'bar', b'baz this list is very very very long man'],
]

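# maketable(data, indent, header) returns the RST table as a list of lines;
# here it is indented by two spaces and the first row is treated as a header.
rst = minirst.maketable(data, 2, True)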
table = b''.join(rst)

print(table.decode('utf8'))

debugformats('table', table)

data = [
    [b's', b'long', b'line\ngoes on here'],
    [b'', b'xy', b'tried to fix here\n        by indenting'],
]

rst = minirst.maketable(data, 1, False)
table = b''.join(rst)

print(table.decode('utf8'))

debugformats('table+nl', table)