annotate mercurial/minirst.py @ 9996:2770d03ae49f stable

handle file URIs correctly, according to RFC 2396 (issue1153) The new code aims to implement the RFC correctly for file URIs. Previously they were handled incorrectly in several ways, which could cause problem on Windows in particular.
author Sune Foldager <cryo@cyanite.org>
date Thu, 03 Dec 2009 11:06:55 +0100
parents 245689e7f869
children a46478b80ea3 25e572394f5c
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
1 # minirst.py - minimal reStructuredText parser
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
2 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
3 # Copyright 2009 Matt Mackall <mpm@selenic.com> and others
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
4 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
6 # GNU General Public License version 2, incorporated herein by reference.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
7
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
8 """simplified reStructuredText parser.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
9
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
10 This parser knows just enough about reStructuredText to parse the
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
11 Mercurial docstrings.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
12
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
13 It cheats in a major way: nested blocks are not really nested. They
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
14 are just indented blocks that look like they are nested. This relies
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
15 on the user to keep the right indentation for the blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
16
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
17 It only supports a small subset of reStructuredText:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
18
9741
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
19 - sections
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
20
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
21 - paragraphs
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
22
9741
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
23 - literal blocks
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
24
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
25 - definition lists
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
26
9741
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
27 - bullet lists (items must start with '-')
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
28
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
29 - enumerated lists (no autonumbering)
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
30
9293
e48a48b754d3 minirst: parse field lists
Martin Geisler <mg@lazybytes.net>
parents: 9292
diff changeset
31 - field lists (colons cannot be escaped)
e48a48b754d3 minirst: parse field lists
Martin Geisler <mg@lazybytes.net>
parents: 9292
diff changeset
32
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
33 - option lists (supports only long options without arguments)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
34
9741
245689e7f869 minirst: update module docstring
Martin Geisler <mg@lazybytes.net>
parents: 9739
diff changeset
35 - inline literals (no other inline markup is not recognized)
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
36 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
37
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
38 import re, sys, textwrap
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
39
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
40
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
41 def findblocks(text):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
42 """Find continuous blocks of lines in text.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
43
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
44 Returns a list of dictionaries representing the blocks. Each block
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
45 has an 'indent' field and a 'lines' field.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
46 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
47 blocks = [[]]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
48 lines = text.splitlines()
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
49 for line in lines:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
50 if line.strip():
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
51 blocks[-1].append(line)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
52 elif blocks[-1]:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
53 blocks.append([])
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
54 if not blocks[-1]:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
55 del blocks[-1]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
56
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
57 for i, block in enumerate(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
58 indent = min((len(l) - len(l.lstrip())) for l in block)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
59 blocks[i] = dict(indent=indent, lines=[l[indent:] for l in block])
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
60 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
61
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
62
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
63 def findliteralblocks(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
64 """Finds literal blocks and adds a 'type' field to the blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
65
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
66 Literal blocks are given the type 'literal', all other blocks are
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
67 given type the 'paragraph'.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
68 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
69 i = 0
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
70 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
71 # Searching for a block that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
72 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
73 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
74 # | paragraph |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
75 # | (ends with "::") |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
76 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
77 # +---------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
78 # | indented literal block |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
79 # +---------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
80 blocks[i]['type'] = 'paragraph'
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
81 if blocks[i]['lines'][-1].endswith('::') and i+1 < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
82 indent = blocks[i]['indent']
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
83 adjustment = blocks[i+1]['indent'] - indent
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
84
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
85 if blocks[i]['lines'] == ['::']:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
86 # Expanded form: remove block
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
87 del blocks[i]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
88 i -= 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
89 elif blocks[i]['lines'][-1].endswith(' ::'):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
90 # Partially minimized form: remove space and both
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
91 # colons.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
92 blocks[i]['lines'][-1] = blocks[i]['lines'][-1][:-3]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
93 else:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
94 # Fully minimized form: remove just one colon.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
95 blocks[i]['lines'][-1] = blocks[i]['lines'][-1][:-1]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
96
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
97 # List items are formatted with a hanging indent. We must
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
98 # correct for this here while we still have the original
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
99 # information on the indentation of the subsequent literal
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
100 # blocks available.
9738
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
101 m = _bulletre.match(blocks[i]['lines'][0])
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
102 if m:
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
103 indent += m.end()
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
104 adjustment -= m.end()
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
105
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
106 # Mark the following indented blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
107 while i+1 < len(blocks) and blocks[i+1]['indent'] > indent:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
108 blocks[i+1]['type'] = 'literal'
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
109 blocks[i+1]['indent'] -= adjustment
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
110 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
111 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
112 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
113
9739
75cff8f12910 minirst: support enumerated lists
Martin Geisler <mg@lazybytes.net>
parents: 9738
diff changeset
114 _bulletre = re.compile(r'(-|[0-9A-Za-z]+\.|\(?[0-9A-Za-z]+\)) ')
9737
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
115 _optionre = re.compile(r'^(--[a-z-]+)((?:[ =][a-zA-Z][\w-]*)? +)(.*)$')
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
116 _fieldre = re.compile(r':(?![: ])([^:]*)(?<! ):( +)(.*)')
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
117 _definitionre = re.compile(r'[^ ]')
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
118
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
119 def splitparagraphs(blocks):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
120 """Split paragraphs into lists."""
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
121 # Tuples with (list type, item regexp, single line items?). Order
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
122 # matters: definition lists has the least specific regexp and must
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
123 # come last.
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
124 listtypes = [('bullet', _bulletre, True),
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
125 ('option', _optionre, True),
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
126 ('field', _fieldre, True),
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
127 ('definition', _definitionre, False)]
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
128
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
129 def match(lines, i, itemre, singleline):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
130 """Does itemre match an item at line i?
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
131
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
132 A list item can be followed by an idented line or another list
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
133 item (but only if singleline is True).
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
134 """
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
135 line1 = lines[i]
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
136 line2 = i+1 < len(lines) and lines[i+1] or ''
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
137 if not itemre.match(line1):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
138 return False
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
139 if singleline:
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
140 return line2 == '' or line2[0] == ' ' or itemre.match(line2)
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
141 else:
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
142 return line2.startswith(' ')
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
143
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
144 i = 0
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
145 while i < len(blocks):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
146 if blocks[i]['type'] == 'paragraph':
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
147 lines = blocks[i]['lines']
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
148 for type, itemre, singleline in listtypes:
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
149 if match(lines, 0, itemre, singleline):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
150 items = []
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
151 for j, line in enumerate(lines):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
152 if match(lines, j, itemre, singleline):
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
153 items.append(dict(type=type, lines=[],
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
154 indent=blocks[i]['indent']))
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
155 items[-1]['lines'].append(line)
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
156 blocks[i:i+1] = items
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
157 break
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
158 i += 1
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
159 return blocks
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
160
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
161
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
162 def findsections(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
163 """Finds sections.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
164
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
165 The blocks must have a 'type' field, i.e., they should have been
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
166 run through findliteralblocks first.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
167 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
168 for block in blocks:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
169 # Searching for a block that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
170 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
171 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
172 # | Section title |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
173 # | ------------- |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
174 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
175 if (block['type'] == 'paragraph' and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
176 len(block['lines']) == 2 and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
177 block['lines'][1] == '-' * len(block['lines'][0])):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
178 block['type'] = 'section'
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
179 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
180
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
181
9623
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
182 def inlineliterals(blocks):
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
183 for b in blocks:
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
184 if b['type'] == 'paragraph':
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
185 b['lines'] = [l.replace('``', '"') for l in b['lines']]
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
186 return blocks
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
187
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
188
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
189 def addmargins(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
190 """Adds empty blocks for vertical spacing.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
191
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
192 This groups bullets, options, and definitions together with no vertical
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
193 space between them, and adds an empty block between all other blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
194 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
195 i = 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
196 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
197 if (blocks[i]['type'] == blocks[i-1]['type'] and
9293
e48a48b754d3 minirst: parse field lists
Martin Geisler <mg@lazybytes.net>
parents: 9292
diff changeset
198 blocks[i]['type'] in ('bullet', 'option', 'field', 'definition')):
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
199 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
200 else:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
201 blocks.insert(i, dict(lines=[''], indent=0, type='margin'))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
202 i += 2
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
203 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
204
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
205
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
206 def formatblock(block, width):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
207 """Format a block according to width."""
9417
4c3fb45123e5 util, minirst: do not crash with COLUMNS=0
Martin Geisler <mg@lazybytes.net>
parents: 9293
diff changeset
208 if width <= 0:
4c3fb45123e5 util, minirst: do not crash with COLUMNS=0
Martin Geisler <mg@lazybytes.net>
parents: 9293
diff changeset
209 width = 78
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
210 indent = ' ' * block['indent']
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
211 if block['type'] == 'margin':
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
212 return ''
9735
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
213 if block['type'] == 'literal':
9291
cd5b6a11b607 minirst: indent literal blocks with two spaces
Martin Geisler <mg@lazybytes.net>
parents: 9156
diff changeset
214 indent += ' '
cd5b6a11b607 minirst: indent literal blocks with two spaces
Martin Geisler <mg@lazybytes.net>
parents: 9156
diff changeset
215 return indent + ('\n' + indent).join(block['lines'])
9735
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
216 if block['type'] == 'section':
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
217 return indent + ('\n' + indent).join(block['lines'])
9735
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
218 if block['type'] == 'definition':
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
219 term = indent + block['lines'][0]
9737
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
220 hang = len(block['lines'][-1]) - len(block['lines'][-1].lstrip())
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
221 defindent = indent + hang * ' '
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
222 text = ' '.join(map(str.strip, block['lines'][1:]))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
223 return "%s\n%s" % (term, textwrap.fill(text, width=width,
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
224 initial_indent=defindent,
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
225 subsequent_indent=defindent))
9735
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
226 initindent = subindent = indent
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
227 if block['type'] == 'bullet':
9738
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
228 m = _bulletre.match(block['lines'][0])
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
229 if m:
f52c4f7a4732 minirst: prepare for general types of bullet lists
Martin Geisler <mg@lazybytes.net>
parents: 9737
diff changeset
230 subindent = indent + m.end() * ' '
9737
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
231 elif block['type'] == 'field':
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
232 m = _fieldre.match(block['lines'][0])
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
233 if m:
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
234 key, spaces, rest = m.groups()
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
235 # Turn ":foo: bar" into "foo bar".
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
236 block['lines'][0] = '%s %s%s' % (key, spaces, rest)
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
237 subindent = indent + (2 + len(key) + len(spaces)) * ' '
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
238 elif block['type'] == 'option':
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
239 m = _optionre.match(block['lines'][0])
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
240 if m:
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
241 option, arg, rest = m.groups()
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
242 subindent = indent + (len(option) + len(arg)) * ' '
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
243
9737
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
244 text = ' '.join(map(str.strip, block['lines']))
9735
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
245 return textwrap.fill(text, width=width,
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
246 initial_indent=initindent,
97d0d910fa5d minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents: 9623
diff changeset
247 subsequent_indent=subindent)
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
248
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
249
9540
cad36e496640 help: un-indent help topics
Martin Geisler <mg@lazybytes.net>
parents: 9417
diff changeset
250 def format(text, width, indent=0):
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
251 """Parse and format the text according to width."""
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
252 blocks = findblocks(text)
9540
cad36e496640 help: un-indent help topics
Martin Geisler <mg@lazybytes.net>
parents: 9417
diff changeset
253 for b in blocks:
cad36e496640 help: un-indent help topics
Martin Geisler <mg@lazybytes.net>
parents: 9417
diff changeset
254 b['indent'] += indent
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
255 blocks = findliteralblocks(blocks)
9623
32727ce029de minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents: 9540
diff changeset
256 blocks = inlineliterals(blocks)
9737
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
257 blocks = splitparagraphs(blocks)
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
258 blocks = findsections(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
259 blocks = addmargins(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
260 return '\n'.join(formatblock(b, width) for b in blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
261
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
262
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
263 if __name__ == "__main__":
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
264 from pprint import pprint
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
265
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
266 def debug(func, blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
267 blocks = func(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
268 print "*** after %s:" % func.__name__
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
269 pprint(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
270 print
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
271 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
272
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
273 text = open(sys.argv[1]).read()
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
274 blocks = debug(findblocks, text)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
275 blocks = debug(findliteralblocks, blocks)
9737
5f101af4a921 minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents: 9735
diff changeset
276 blocks = debug(splitparagraphs, blocks)
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
277 blocks = debug(findsections, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
278 blocks = debug(addmargins, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
279 print '\n'.join(formatblock(b, 30) for b in blocks)