annotate mercurial/minirst.py @ 9206:c1a1b49221e3

color: use reST syntax for literal block
author Martin Geisler <mg@lazybytes.net>
date Thu, 23 Jul 2009 00:22:35 +0200
parents c9c7e8cdac9c
children cd5b6a11b607
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
9156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
1 # minirst.py - minimal reStructuredText parser
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
2 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
3 # Copyright 2009 Matt Mackall <mpm@selenic.com> and others
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
4 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
5 # This software may be used and distributed according to the terms of the
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
6 # GNU General Public License version 2, incorporated herein by reference.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
7
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
8 """simplified reStructuredText parser.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
9
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
10 This parser knows just enough about reStructuredText to parse the
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
11 Mercurial docstrings.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
12
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
13 It cheats in a major way: nested blocks are not really nested. They
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
14 are just indented blocks that look like they are nested. This relies
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
15 on the user to keep the right indentation for the blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
16
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
17 It only supports a small subset of reStructuredText:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
18
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
19 - paragraphs
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
20
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
21 - definition lists (must use ' ' to indent definitions)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
22
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
23 - lists (items must start with '-')
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
24
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
25 - literal blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
26
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
27 - option lists (supports only long options without arguments)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
28
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
29 - inline markup is not recognized at all.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
30 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
31
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
32 import re, sys, textwrap
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
33
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
34
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
35 def findblocks(text):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
36 """Find continuous blocks of lines in text.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
37
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
38 Returns a list of dictionaries representing the blocks. Each block
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
39 has an 'indent' field and a 'lines' field.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
40 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
41 blocks = [[]]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
42 lines = text.splitlines()
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
43 for line in lines:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
44 if line.strip():
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
45 blocks[-1].append(line)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
46 elif blocks[-1]:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
47 blocks.append([])
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
48 if not blocks[-1]:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
49 del blocks[-1]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
50
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
51 for i, block in enumerate(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
52 indent = min((len(l) - len(l.lstrip())) for l in block)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
53 blocks[i] = dict(indent=indent, lines=[l[indent:] for l in block])
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
54 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
55
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
56
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
57 def findliteralblocks(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
58 """Finds literal blocks and adds a 'type' field to the blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
59
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
60 Literal blocks are given the type 'literal', all other blocks are
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
61 given type the 'paragraph'.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
62 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
63 i = 0
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
64 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
65 # Searching for a block that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
66 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
67 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
68 # | paragraph |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
69 # | (ends with "::") |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
70 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
71 # +---------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
72 # | indented literal block |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
73 # +---------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
74 blocks[i]['type'] = 'paragraph'
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
75 if blocks[i]['lines'][-1].endswith('::') and i+1 < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
76 indent = blocks[i]['indent']
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
77 adjustment = blocks[i+1]['indent'] - indent
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
78
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
79 if blocks[i]['lines'] == ['::']:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
80 # Expanded form: remove block
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
81 del blocks[i]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
82 i -= 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
83 elif blocks[i]['lines'][-1].endswith(' ::'):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
84 # Partially minimized form: remove space and both
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
85 # colons.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
86 blocks[i]['lines'][-1] = blocks[i]['lines'][-1][:-3]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
87 else:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
88 # Fully minimized form: remove just one colon.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
89 blocks[i]['lines'][-1] = blocks[i]['lines'][-1][:-1]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
90
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
91 # List items are formatted with a hanging indent. We must
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
92 # correct for this here while we still have the original
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
93 # information on the indentation of the subsequent literal
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
94 # blocks available.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
95 if blocks[i]['lines'][0].startswith('- '):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
96 indent += 2
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
97 adjustment -= 2
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
98
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
99 # Mark the following indented blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
100 while i+1 < len(blocks) and blocks[i+1]['indent'] > indent:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
101 blocks[i+1]['type'] = 'literal'
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
102 blocks[i+1]['indent'] -= adjustment
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
103 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
104 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
105 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
106
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
107
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
108 def findsections(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
109 """Finds sections.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
110
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
111 The blocks must have a 'type' field, i.e., they should have been
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
112 run through findliteralblocks first.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
113 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
114 for block in blocks:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
115 # Searching for a block that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
116 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
117 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
118 # | Section title |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
119 # | ------------- |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
120 # +------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
121 if (block['type'] == 'paragraph' and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
122 len(block['lines']) == 2 and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
123 block['lines'][1] == '-' * len(block['lines'][0])):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
124 block['type'] = 'section'
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
125 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
126
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
127
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
128 def findbulletlists(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
129 """Finds bullet lists.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
130
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
131 The blocks must have a 'type' field, i.e., they should have been
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
132 run through findliteralblocks first.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
133 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
134 i = 0
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
135 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
136 # Searching for a paragraph that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
137 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
138 # +------+-----------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
139 # | "- " | list item |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
140 # +------| (body elements)+ |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
141 # +-----------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
142 if (blocks[i]['type'] == 'paragraph' and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
143 blocks[i]['lines'][0].startswith('- ')):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
144 items = []
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
145 for line in blocks[i]['lines']:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
146 if line.startswith('- '):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
147 items.append(dict(type='bullet', lines=[],
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
148 indent=blocks[i]['indent'] + 2))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
149 line = line[2:]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
150 items[-1]['lines'].append(line)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
151 blocks[i:i+1] = items
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
152 i += len(items) - 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
153 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
154 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
155
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
156
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
157 _optionre = re.compile(r'^(--[a-z-]+)((?:[ =][a-zA-Z][\w-]*)? +)(.*)$')
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
158 def findoptionlists(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
159 """Finds option lists.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
160
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
161 The blocks must have a 'type' field, i.e., they should have been
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
162 run through findliteralblocks first.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
163 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
164 i = 0
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
165 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
166 # Searching for a paragraph that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
167 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
168 # +----------------------------+-------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
169 # | "--" option " " | description |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
170 # +-------+--------------------+ |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
171 # | (body elements)+ |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
172 # +----------------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
173 if (blocks[i]['type'] == 'paragraph' and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
174 _optionre.match(blocks[i]['lines'][0])):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
175 options = []
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
176 for line in blocks[i]['lines']:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
177 m = _optionre.match(line)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
178 if m:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
179 option, arg, rest = m.groups()
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
180 width = len(option) + len(arg)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
181 options.append(dict(type='option', lines=[],
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
182 indent=blocks[i]['indent'],
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
183 width=width))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
184 options[-1]['lines'].append(line)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
185 blocks[i:i+1] = options
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
186 i += len(options) - 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
187 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
188 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
189
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
190
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
191 def finddefinitionlists(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
192 """Finds definition lists.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
193
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
194 The blocks must have a 'type' field, i.e., they should have been
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
195 run through findliteralblocks first.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
196 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
197 i = 0
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
198 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
199 # Searching for a paragraph that looks like this:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
200 #
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
201 # +----------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
202 # | term |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
203 # +--+-------------------------+--+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
204 # | definition |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
205 # | (body elements)+ |
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
206 # +----------------------------+
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
207 if (blocks[i]['type'] == 'paragraph' and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
208 len(blocks[i]['lines']) > 1 and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
209 not blocks[i]['lines'][0].startswith(' ') and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
210 blocks[i]['lines'][1].startswith(' ')):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
211 definitions = []
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
212 for line in blocks[i]['lines']:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
213 if not line.startswith(' '):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
214 definitions.append(dict(type='definition', lines=[],
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
215 indent=blocks[i]['indent']))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
216 definitions[-1]['lines'].append(line)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
217 definitions[-1]['hang'] = len(line) - len(line.lstrip())
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
218 blocks[i:i+1] = definitions
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
219 i += len(definitions) - 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
220 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
221 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
222
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
223
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
224 def addmargins(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
225 """Adds empty blocks for vertical spacing.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
226
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
227 This groups bullets, options, and definitions together with no vertical
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
228 space between them, and adds an empty block between all other blocks.
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
229 """
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
230 i = 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
231 while i < len(blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
232 if (blocks[i]['type'] == blocks[i-1]['type'] and
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
233 blocks[i]['type'] in ('bullet', 'option', 'definition')):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
234 i += 1
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
235 else:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
236 blocks.insert(i, dict(lines=[''], indent=0, type='margin'))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
237 i += 2
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
238 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
239
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
240
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
241 def formatblock(block, width):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
242 """Format a block according to width."""
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
243 indent = ' ' * block['indent']
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
244 if block['type'] == 'margin':
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
245 return ''
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
246 elif block['type'] in ('literal', 'section'):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
247 return indent + ('\n' + indent).join(block['lines'])
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
248 elif block['type'] == 'definition':
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
249 term = indent + block['lines'][0]
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
250 defindent = indent + block['hang'] * ' '
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
251 text = ' '.join(map(str.strip, block['lines'][1:]))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
252 return "%s\n%s" % (term, textwrap.fill(text, width=width,
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
253 initial_indent=defindent,
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
254 subsequent_indent=defindent))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
255 else:
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
256 initindent = subindent = indent
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
257 text = ' '.join(map(str.strip, block['lines']))
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
258 if block['type'] == 'bullet':
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
259 initindent = indent[:-2] + '- '
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
260 subindent = indent
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
261 elif block['type'] == 'option':
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
262 subindent = indent + block['width'] * ' '
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
263
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
264 return textwrap.fill(text, width=width,
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
265 initial_indent=initindent,
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
266 subsequent_indent=subindent)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
267
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
268
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
269 def format(text, width):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
270 """Parse and format the text according to width."""
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
271 blocks = findblocks(text)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
272 blocks = findliteralblocks(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
273 blocks = findsections(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
274 blocks = findbulletlists(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
275 blocks = findoptionlists(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
276 blocks = finddefinitionlists(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
277 blocks = addmargins(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
278 return '\n'.join(formatblock(b, width) for b in blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
279
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
280
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
281 if __name__ == "__main__":
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
282 from pprint import pprint
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
283
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
284 def debug(func, blocks):
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
285 blocks = func(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
286 print "*** after %s:" % func.__name__
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
287 pprint(blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
288 print
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
289 return blocks
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
290
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
291 text = open(sys.argv[1]).read()
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
292 blocks = debug(findblocks, text)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
293 blocks = debug(findliteralblocks, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
294 blocks = debug(findsections, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
295 blocks = debug(findbulletlists, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
296 blocks = debug(findoptionlists, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
297 blocks = debug(finddefinitionlists, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
298 blocks = debug(addmargins, blocks)
c9c7e8cdac9c minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff changeset
299 print '\n'.join(formatblock(b, 30) for b in blocks)