parsimonius 0.10.0__tar.gz

@@ -0,0 +1,10 @@
1
+ # Python-generated files
2
+ __pycache__/
3
+ *.py[oc]
4
+ build/
5
+ dist/
6
+ wheels/
7
+ *.egg-info
8
+
9
+ # Virtual environments
10
+ .venv
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2012 Erik Rose
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
4
+ this software and associated documentation files (the "Software"), to deal in
5
+ the Software without restriction, including without limitation the rights to
6
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
7
+ of the Software, and to permit persons to whom the Software is furnished to do
8
+ so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all
11
+ copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19
+ SOFTWARE.
@@ -0,0 +1,657 @@
1
+ Metadata-Version: 2.4
2
+ Name: parsimonius
3
+ Version: 0.10.0
4
+ Summary: (Soon to be) the fastest pure-Python PEG parser I could muster
5
+ Project-URL: Homepage, https://github.com/erikrose/parsimonius
6
+ License-Expression: MIT
7
+ License-File: LICENSE
8
+ Keywords: grammar,language,packrat,parse,parser,parsing,peg
9
+ Classifier: Development Status :: 3 - Alpha
10
+ Classifier: Intended Audience :: Developers
11
+ Classifier: License :: OSI Approved :: MIT License
12
+ Classifier: Natural Language :: English
13
+ Classifier: Operating System :: OS Independent
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3 :: Only
16
+ Classifier: Programming Language :: Python :: 3.9
17
+ Classifier: Programming Language :: Python :: 3.10
18
+ Classifier: Programming Language :: Python :: 3.11
19
+ Classifier: Programming Language :: Python :: 3.12
20
+ Classifier: Programming Language :: Python :: 3.13
21
+ Classifier: Programming Language :: Python :: 3.14
22
+ Classifier: Topic :: Scientific/Engineering :: Information Analysis
23
+ Classifier: Topic :: Software Development :: Libraries
24
+ Classifier: Topic :: Text Processing :: General
25
+ Requires-Python: <3.15,>=3.9
26
+ Requires-Dist: aiogram==2.25.2; python_version < '3.10'
27
+ Requires-Dist: aiogram>=3.27.0; python_version >= '3.10'
28
+ Requires-Dist: aiohttp<3.9,>=3.8; python_version < '3.10'
29
+ Requires-Dist: aiohttp>=3.13.5; python_version >= '3.10'
30
+ Requires-Dist: parsimonious==0.10.0
31
+ Requires-Dist: regex>=2022.3.15
32
+ Provides-Extra: testing
33
+ Requires-Dist: pytest; extra == 'testing'
34
+ Description-Content-Type: text/x-rst
35
+
36
+ ============
37
+ Parsimonius
38
+ ============
39
+
40
+ Parsimonius aims to be the fastest arbitrary-lookahead parser written in pure
41
+ Python—and the most usable. It's based on parsing expression grammars (PEGs),
42
+ which means you feed it a simplified sort of EBNF notation. Parsimonius was
43
+ designed to undergird a MediaWiki parser that wouldn't take 5 seconds or a GB
44
+ of RAM to do one page, but it's applicable to all sorts of languages.
45
+
46
+ :Code: https://github.com/erikrose/parsimonius/
47
+ :Issues: https://github.com/erikrose/parsimonius/issues
48
+ :License: MIT License (MIT)
49
+ :Package: https://pypi.org/project/parsimonius/
50
+
51
+
52
+ Goals
53
+ =====
54
+
55
+ * Speed
56
+ * Frugal RAM use
57
+ * Minimalistic, understandable, idiomatic Python code
58
+ * Readable grammars
59
+ * Extensible grammars
60
+ * Complete test coverage
61
+ * Separation of concerns. Some Python parsing kits mix recognition with
62
+ instructions about how to turn the resulting tree into some kind of other
63
+ representation. This is limiting when you want to do several different things
64
+ with a tree: for example, render wiki markup to HTML *or* to text.
65
+ * Good error reporting. I want the parser to work *with* me as I develop a
66
+ grammar.
67
+
68
+
69
+ Install
70
+ =======
71
+
72
+ To install Parsimonius, run::
73
+
74
+ $ pip install parsimonius
75
+
76
+
77
+ Example Usage
78
+ =============
79
+
80
+ Here's how to build a simple grammar:
81
+
82
+ .. code:: python
83
+
84
+ >>> from parsimonius.grammar import Grammar
85
+ >>> grammar = Grammar(
86
+ ... """
87
+ ... bold_text = bold_open text bold_close
88
+ ... text = ~"[A-Z 0-9]*"i
89
+ ... bold_open = "(("
90
+ ... bold_close = "))"
91
+ ... """)
92
+
93
+ You can have forward references and even right recursion; it's all taken care
94
+ of by the grammar compiler. The first rule is taken to be the default start
95
+ symbol, but you can override that.
96
+
97
+ Next, let's parse something and get an abstract syntax tree:
98
+
99
+ .. code:: python
100
+
101
+ >>> print(grammar.parse('((bold stuff))'))
102
+     <Node called "bold_text" matching "((bold stuff))">
103
+         <Node called "bold_open" matching "((">
104
+         <RegexNode called "text" matching "bold stuff">
105
+         <Node called "bold_close" matching "))">
106
+
107
+ You'd typically then use a ``nodes.NodeVisitor`` subclass (see below) to walk
108
+ the tree and do something useful with it.
109
+
110
+ Another example is a parser for ``.ini`` files. Consider the following grammar:
111
+
112
+ .. code:: python
113
+
114
+ grammar = Grammar(
115
+ r"""
116
+ expr = (entry / emptyline)*
117
+ entry = section pair*
118
+
119
+ section = lpar word rpar ws
120
+ pair = key equal value ws?
121
+
122
+ key = word+
123
+ value = (word / quoted)+
124
+ word = ~r"[-\w]+"
125
+ quoted = ~'"[^\"]+"'
126
+ equal = ws? "=" ws?
127
+ lpar = "["
128
+ rpar = "]"
129
+ ws = ~r"\s*"
130
+ emptyline = ws+
131
+ """
132
+ )
133
+
134
+
135
+ We could now implement a subclass of ``NodeVisitor`` like so:
136
+
137
+ .. code:: python
138
+
139
+     class IniVisitor(NodeVisitor):
140
+         def visit_expr(self, node, visited_children):
141
+             """ Returns the overall output. """
142
+             output = {}
143
+             for child in visited_children:
144
+                 output.update(child[0])
145
+             return output
146
+
147
+         def visit_entry(self, node, visited_children):
148
+             """ Makes a dict of the section (as key) and the key/value pairs. """
149
+             key, values = visited_children
150
+             return {key: dict(values)}
151
+
152
+         def visit_section(self, node, visited_children):
153
+             """ Gets the section name. """
154
+             _, section, *_ = visited_children
155
+             return section.text
156
+
157
+         def visit_pair(self, node, visited_children):
158
+             """ Gets each key/value pair, returns a tuple. """
159
+             key, _, value, *_ = node.children
160
+             return key.text, value.text
161
+
162
+         def generic_visit(self, node, visited_children):
163
+             """ The generic visit method. """
164
+             return visited_children or node
165
+
166
+ And call it like this:
167
+
168
+ .. code:: python
169
+
170
+ from parsimonius.grammar import Grammar
171
+ from parsimonius.nodes import NodeVisitor
172
+
173
+ data = """[section]
174
+ somekey = somevalue
175
+ someotherkey=someothervalue
176
+
177
+ [anothersection]
178
+ key123 = "what the heck?"
179
+ key456="yet another one here"
180
+
181
+ """
182
+
183
+ tree = grammar.parse(data)
184
+
185
+ iv = IniVisitor()
186
+ output = iv.visit(tree)
187
+ print(output)
188
+
189
+ This would yield:
190
+
191
+ .. code:: python
192
+
193
+ {'section': {'somekey': 'somevalue', 'someotherkey': 'someothervalue'}, 'anothersection': {'key123': '"what the heck?"', 'key456': '"yet another one here"'}}
194
+
195
+ Status
196
+ ======
197
+
198
+ * Everything that exists works. Test coverage is good.
199
+ * I don't plan on making any backward-incompatible changes to the rule syntax
200
+ in the future, so you can write grammars with confidence.
201
+ * It may be slow and use a lot of RAM; I haven't measured either yet, and I
202
+ have yet to begin optimizing in earnest.
203
+ * Error reporting is now in place. ``repr`` methods of expressions, grammars,
204
+ and nodes are clear and helpful as well. The ``Grammar`` ones are
205
+ even round-trippable!
206
+ * The grammar extensibility story is underdeveloped at the moment. You should
207
+ be able to extend a grammar by simply concatenating more rules onto the
208
+ existing ones; later rules of the same name should override previous ones.
209
+ However, this is untested and may not be the final story.
210
+ * Sphinx docs are coming, but the docstrings are quite useful now.
211
+ * Note that there may be API changes until we get to 1.0, so be sure to pin to
212
+ the version you're using.
213
+
214
+ Coming Soon
215
+ -----------
216
+
217
+ * Optimizations to make Parsimonius worthy of its name
218
+ * Tighter RAM use
219
+ * Better-thought-out grammar extensibility story
220
+ * Amazing grammar debugging
221
+
222
+
223
+ A Little About PEG Parsers
224
+ ==========================
225
+
226
+ PEG parsers don't draw a distinction between lexing and parsing; everything is
227
+ done at once. As a result, there is no lookahead limit, as there is with, for
228
+ instance, Yacc. And, due to both of these properties, PEG grammars are easier
229
+ to write: they're basically just a more practical dialect of EBNF. With
230
+ caching, they take O(grammar size * text length) memory (though I plan to do
231
+ better), but they run in O(text length) time.
232
+
233
+ More Technically
234
+ ----------------
235
+
236
+ PEGs can describe a superset of *LL(k)* languages, any deterministic *LR(k)*
237
+ language, and many others—including some that aren't context-free
238
+ (http://www.brynosaurus.com/pub/lang/peg.pdf). They can also deal with what
239
+ would be ambiguous languages if described in canonical EBNF. They do this by
240
+ trading the ``|`` alternation operator for the ``/`` operator, which works the
241
+ same except that it makes priority explicit: ``a / b / c`` first tries matching
242
+ ``a``. If that fails, it tries ``b``, and, failing that, moves on to ``c``.
243
+ Thus, ambiguity is resolved by always yielding the first successful recognition.
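
The ordered-choice behavior described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how the library implements it: ``literal`` and ``ordered_choice`` are hypothetical helpers, and a "match" here is simply the end position or ``None``.

```python
def literal(text):
    """Return a matcher that consumes `text` at `pos`, or returns None."""
    def match(source, pos):
        end = pos + len(text)
        return end if source.startswith(text, pos) else None
    return match


def ordered_choice(*alternatives):
    """Try each alternative in order and commit to the first that succeeds."""
    def match(source, pos):
        for alternative in alternatives:
            end = alternative(source, pos)
            if end is not None:
                return end
        return None
    return match


# "a" is listed first, so it wins even though "ab" would consume more text.
short_first = ordered_choice(literal("a"), literal("ab"))
long_first = ordered_choice(literal("ab"), literal("a"))

print(short_first("ab", 0))  # 1
print(long_first("ab", 0))   # 2
```

Because the first success wins, the order in which you list alternatives is part of the grammar's meaning: put the longer or more specific alternative first if you want it to be tried.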
244
+
245
+
246
+ Writing Grammars
247
+ ================
248
+
249
+ Grammars are defined by a series of rules. The syntax should be familiar to
250
+ anyone who uses regexes or reads programming language manuals. An example will
251
+ serve best:
252
+
253
+ .. code:: python
254
+
255
+ my_grammar = Grammar(r"""
256
+ styled_text = bold_text / italic_text
257
+ bold_text = "((" text "))"
258
+ italic_text = "''" text "''"
259
+ text = ~"[A-Z 0-9]*"i
260
+ """)
261
+
262
+ You can wrap a rule across multiple lines if you like; the syntax is very
263
+ forgiving.
264
+
265
+ If you want to save your grammar into a separate file, you should name it using
266
+ the ``.ppeg`` extension.
267
+
268
+
269
+ Syntax Reference
270
+ ----------------
271
+
272
+ ==================== ========================================================
273
+ ``"some literal"`` Used to quote literals. Backslash escaping and Python
274
+ conventions for "raw" and Unicode strings help support
275
+ fiddly characters.
276
+
277
+ ``b"some literal"`` A bytes literal. Using bytes literals and regular
278
+ expressions allows your grammar to parse binary files.
279
+ Note that all literals and regular expressions must be
280
+ of the same type within a grammar. In grammars that
281
+ process bytestrings, you should make the grammar string
282
+ an ``r"""string"""`` so that byte literals like ``\xff``
283
+ work correctly.
284
+
285
+ [space] Sequences are made out of space- or tab-delimited
286
+ things. ``a b c`` matches spots where those 3
287
+ terms appear in that order.
288
+
289
+ ``a / b / c`` Alternatives. The first to succeed of ``a / b / c``
290
+ wins.
291
+
292
+ ``thing?`` An optional expression. This is greedy, always consuming
293
+ ``thing`` if it exists.
294
+
295
+ ``&thing`` A lookahead assertion. Ensures ``thing`` matches at the
296
+ current position but does not consume it.
297
+
298
+ ``!thing`` A negative lookahead assertion. Matches if ``thing``
299
+ isn't found here. Doesn't consume any text.
300
+
301
+ ``things*`` Zero or more things. This is greedy, always consuming as
302
+ many repetitions as it can.
303
+
304
+ ``things+`` One or more things. This is greedy, always consuming as
305
+ many repetitions as it can.
306
+
307
+ ``~r"regex"ilmsuxa`` Regexes have ``~`` in front and are quoted like
308
+ literals. Any flags_ (``ailmsux``) follow the end quotes
309
+ as single chars. Regexes are good for representing
310
+ character classes (``[a-z0-9]``) and optimizing for
311
+ speed. The downside is that they won't be able to take
312
+ advantage of our fancy debugging, once we get that
313
+ working. Ultimately, I'd like to deprecate explicit
314
+ regexes and instead have Parsimonius dynamically build
315
+ them out of simpler primitives. Parsimonius uses the
316
+ regex_ library instead of the built-in re module.
317
+
318
+ ``~br"regex"`` A bytes regex; required if your grammar parses
319
+ bytestrings.
320
+
321
+ ``(things)`` Parentheses are used for grouping, like in every other
322
+ language.
323
+
324
+ ``thing{n}`` Exactly ``n`` repetitions of ``thing``.
325
+
326
+ ``thing{n,m}`` Between ``n`` and ``m`` repetitions of ``thing`` (inclusive).
327
+
328
+ ``thing{,m}`` At most ``m`` repetitions of ``thing``.
329
+
330
+ ``thing{n,}`` At least ``n`` repetitions of ``thing``.
331
+
332
+ ==================== ========================================================
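
To make the "does not consume" and "greedy" wording in the table concrete, here is a rough sketch in plain Python. The helper names (``literal``, ``lookahead``, ``negative_lookahead``, ``repeat``) are made up for this illustration and are not the library's API; each matcher returns the end position on success or ``None`` on failure.

```python
def literal(text):
    """Match a fixed string at `pos`."""
    def match(source, pos):
        return pos + len(text) if source.startswith(text, pos) else None
    return match


def lookahead(expression):
    """Like ``&thing``: succeed where `expression` matches, consume nothing."""
    def match(source, pos):
        return pos if expression(source, pos) is not None else None
    return match


def negative_lookahead(expression):
    """Like ``!thing``: succeed where `expression` fails, consume nothing."""
    def match(source, pos):
        return pos if expression(source, pos) is None else None
    return match


def repeat(expression, minimum, maximum=None):
    """Like ``thing{n,m}``: greedily take up to `maximum` repetitions."""
    def match(source, pos):
        count = 0
        while maximum is None or count < maximum:
            end = expression(source, pos)
            if end is None:
                break
            count += 1
            if end == pos:  # zero-length match: stop rather than loop forever
                break
            pos = end
        return pos if count >= minimum else None
    return match


a = literal("a")
print(lookahead(literal("ab"))("abc", 0))           # 0 (matched, nothing consumed)
print(negative_lookahead(literal("ab"))("abc", 0))  # None
print(repeat(a, 2, 3)("aaaa", 0))                   # 3 (stops at the maximum)
```

In this sketch, ``thing*`` corresponds to ``repeat(thing, 0)``, ``thing+`` to ``repeat(thing, 1)``, and ``thing?`` to ``repeat(thing, 0, 1)``.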
333
+
334
+ .. _flags: https://docs.python.org/3/howto/regex.html#compilation
335
+ .. _regex: https://github.com/mrabarnett/mrab-regex
336
+
337
+ Optimizing Grammars
338
+ ===================
339
+
340
+ Don't Repeat Expressions
341
+ ------------------------
342
+
343
+ If you need a ``~"[a-z0-9]"i`` at two points in your grammar, don't type it
344
+ twice. Make it a rule of its own, and reference it from wherever you need it.
345
+ You'll get the most out of the caching this way, since cache lookups are by
346
+ expression object identity (for speed).
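
As a rough illustration of why identity-keyed caching rewards shared rules, here is a small stand-alone sketch. The ``Regex`` class and ``cached_match`` function are hypothetical stand-ins written for this example, not the library's internals; the only assumption carried over from the text is that the cache key is built from the expression object's identity and the position.

```python
import re


class Regex:
    """Hypothetical stand-in for a compiled regex expression."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def match_uncached(self, source, pos):
        found = self.pattern.match(source, pos)
        return found.end() if found else None


def cached_match(expression, source, pos, cache):
    """Memoize on (expression identity, position), not on pattern equality."""
    key = (id(expression), pos)
    if key not in cache:
        cache[key] = expression.match_uncached(source, pos)
    return cache[key]


text = "a1"

# One shared rule object: both uses land on the same cache entry.
shared = Regex(r"[a-z0-9]")
shared_cache = {}
cached_match(shared, text, 0, shared_cache)
cached_match(shared, text, 0, shared_cache)

# The same pattern typed twice: two objects, so two cache entries.
first, second = Regex(r"[a-z0-9]"), Regex(r"[a-z0-9]")
duplicate_cache = {}
cached_match(first, text, 0, duplicate_cache)
cached_match(second, text, 0, duplicate_cache)

print(len(shared_cache), len(duplicate_cache))  # 1 2
```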
347
+
348
+ Even if you have an expression that's very simple, not repeating it will
349
+ save RAM, as there can, at worst, be a cached int for every char in the text
350
+ you're parsing. In the future, we may identify repeated subexpressions
351
+ automatically and factor them up while building the grammar.
352
+
353
+ How much should you shove into one regex, versus how much should you break them
354
+ up to not repeat yourself? That's a fine balance and worthy of benchmarking.
355
+ More stuff jammed into a regex will execute faster, because it doesn't have to
356
+ run any Python between pieces, but a broken-up one will give better cache
357
+ performance if the individual pieces are re-used elsewhere. If the pieces of a
358
+ regex aren't used anywhere else, by all means keep the whole thing together.
359
+
360
+
361
+ Quantifiers
362
+ -----------
363
+
364
+ Bring your ``?`` and ``*`` quantifiers up to the highest level you
365
+ can. Otherwise, lower-level patterns could succeed but be empty and put a bunch
366
+ of useless nodes in your tree that didn't really match anything.
367
+
368
+
369
+ Processing Parse Trees
370
+ ======================
371
+
372
+ A parse tree has a node for each expression matched, even if it matched a
373
+ zero-length string, like ``"thing"?`` might.
374
+
375
+ The ``NodeVisitor`` class provides an inversion-of-control framework for
376
+ walking a tree and returning a new construct (tree, string, or whatever) based
377
+ on it. For now, have a look at its docstrings for more detail. There's also a
378
+ good example in ``grammar.RuleVisitor``. Notice how we take advantage of the
379
+ visited children being a sequence by unpacking them in the method's first line:
380
+
381
+ .. code:: python
382
+
383
+     def visit_or_term(self, or_term, visited_children):
384
+         slash, _, term = visited_children
385
+
386
+ For reference, here is the production the above unpacks::
387
+
388
+ or_term = "/" _ term
389
+
390
+ When something goes wrong in your visitor, you get a nice error like this::
391
+
392
+ [normal traceback here...]
393
+ VisitationError: 'Node' object has no attribute 'foo'
394
+
395
+ Parse tree:
396
+ <Node called "rules" matching "number = ~"[0-9]+""> <-- *** We were here. ***
397
+ <Node matching "number = ~"[0-9]+"">
398
+ <Node called "rule" matching "number = ~"[0-9]+"">
399
+ <Node matching "">
400
+ <Node called "label" matching "number">
401
+ <Node matching " ">
402
+ <Node called "_" matching " ">
403
+ <Node matching "=">
404
+ <Node matching " ">
405
+ <Node called "_" matching " ">
406
+ <Node called "rhs" matching "~"[0-9]+"">
407
+ <Node called "term" matching "~"[0-9]+"">
408
+ <Node called "atom" matching "~"[0-9]+"">
409
+ <Node called "regex" matching "~"[0-9]+"">
410
+ <Node matching "~">
411
+ <Node called "literal" matching ""[0-9]+"">
412
+ <Node matching "">
413
+ <Node matching "">
414
+ <Node called "eol" matching "
415
+ ">
416
+ <Node matching "">
417
+
418
+ The parse tree is tacked onto the exception, and the node whose visitor method
419
+ raised the error is pointed out.
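
The general shape of this kind of visitor (dispatch on the node's name, fall back to a generic method, and attach the offending node to any error) can be sketched without the library. Everything below is a simplified, hypothetical model: ``Node``, ``VisitError`` and ``Visitor`` are names invented for this example and are much smaller than the real classes.

```python
class Node:
    """A stripped-down parse tree node (hypothetical)."""

    def __init__(self, name, text, children=()):
        self.name = name
        self.text = text
        self.children = list(children)


class VisitError(Exception):
    """Wraps an exception and remembers which node was being visited."""

    def __init__(self, original, node):
        super().__init__(f"{type(original).__name__}: {original}")
        self.original = original
        self.node = node


class Visitor:
    def visit(self, node):
        # Visit children first so each method receives their results.
        visited_children = [self.visit(child) for child in node.children]
        method = getattr(self, "visit_" + node.name, self.generic_visit)
        try:
            return method(node, visited_children)
        except VisitError:
            raise  # already annotated
        except Exception as error:
            raise VisitError(error, node) from error

    def generic_visit(self, node, visited_children):
        return visited_children or node.text


class Summer(Visitor):
    def visit_number(self, node, visited_children):
        return int(node.text)

    def visit_sum(self, node, visited_children):
        return sum(visited_children)


tree = Node("sum", "1+2", [Node("number", "1"), Node("number", "2")])
print(Summer().visit(tree))  # 3

bad = Node("sum", "1+x", [Node("number", "1"), Node("number", "x")])
try:
    Summer().visit(bad)
except VisitError as error:
    print(error.node.text)  # x
```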
420
+
421
+ Why No Streaming Tree Processing?
422
+ ---------------------------------
423
+
424
+ Some have asked why we don't process the tree as we go, SAX-style. There are
425
+ two main reasons:
426
+
427
+ 1. It wouldn't work. With a PEG parser, no parsing decision is final until the
428
+ whole text is parsed. If we had to change a decision, we'd have to backtrack
429
+ and redo the SAX-style interpretation as well, which would involve
430
+ reconstituting part of the AST and quite possibly scuttling whatever you
431
+ were doing with the streaming output. (Note that some bursty SAX-style
432
+ processing may be possible in the future if we use cuts.)
433
+
434
+ 2. It interferes with the ability to derive multiple representations from the
435
+ AST: for example, turning wiki markup into first HTML and then text.
436
+
437
+
438
+ Future Directions
439
+ =================
440
+
441
+ Rule Syntax Changes
442
+ -------------------
443
+
444
+ * Maybe support left-recursive rules like PyMeta, if anybody cares.
445
+ * Ultimately, I'd like to get rid of explicit regexes and break them into more
446
+ atomic things like character classes. Then we can dynamically compile bits
447
+ of the grammar into regexes as necessary to boost speed.
448
+
449
+ Optimizations
450
+ -------------
451
+
452
+ * Make RAM use almost constant by automatically inserting "cuts", as described
453
+ in
454
+ http://ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf.
455
+ This would also improve error reporting, as we wouldn't backtrack out of
456
+ everything informative before finally failing.
457
+ * Find all the distinct subexpressions, and unify duplicates for a better cache
458
+ hit ratio.
459
+ * Think about having the user (optionally) provide some representative input
460
+ along with a grammar. We can then profile against it, see which expressions
461
+ are worth caching, and annotate the grammar. Perhaps there will even be
462
+ positions at which a given expression is more worth caching. Or we could keep
463
+ a count of how many times each cache entry has been used and evict the most
464
+ useless ones as RAM use grows.
465
+ * We could possibly compile the grammar into VM instructions, like in "A
466
+ parsing machine for PEGs" by Medeiros.
467
+ * If the recursion gets too deep in practice, use trampolining to dodge it.
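
For readers unfamiliar with the last item, trampolining replaces Python-level recursion with an explicit stack that a driver loop manages. The following is a generic sketch of the technique using generators, not a plan for how the library would adopt it; ``depth`` and ``trampoline`` are illustrative names only.

```python
def depth(tree):
    """Generator-style recursion: yield a sub-call, receive its result."""
    deepest = 0
    for child in tree:
        child_depth = yield depth(child)
        deepest = max(deepest, child_depth)
    return deepest + 1


def trampoline(generator):
    """Drive generator-based calls on an explicit stack, not Python's."""
    stack = [generator]
    result = None
    while stack:
        try:
            # Resume the innermost call, handing it the last result.
            request = stack[-1].send(result)
        except StopIteration as stop:
            stack.pop()
            result = stop.value
        else:
            # The call asked for a sub-call: push it and start it fresh.
            stack.append(request)
            result = None
    return result


# Build a list nested far deeper than the default recursion limit allows.
nested = []
for _ in range(50_000):
    nested = [nested]

print(trampoline(depth(nested)))  # 50001
```

A directly recursive version of ``depth`` would hit ``RecursionError`` on this input, while the driver loop only ever has one generator frame active at a time.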
468
+
469
+ Niceties
470
+ --------
471
+
472
+ * Pijnu has a raft of tree manipulators. I don't think I want all of them, but
473
+ a judicious subset might be nice. Don't get into mixing formatting with tree
474
+ manipulation.
475
+ https://github.com/erikrose/pijnu/blob/master/library/node.py#L333. PyPy's
476
+ parsing lib exposes a sane subset:
477
+ http://doc.pypy.org/en/latest/rlib.html#tree-transformations.
478
+
479
+
480
+ Version History
481
+ ===============
482
+
483
+ 0.11.0
484
+ * Correctly handle ``/`` expressions with multiple terms in a row. (lucaswiman)
485
+ * Start using pyproject.toml. (Kolanich)
486
+ * Add a ``ParsimoniusError`` exception base class. (Kevin Kirsche)
487
+ * Fall back to ``re`` when the ``regex`` lib is not available. (Pavel Kirienko)
488
+
489
+ 0.10.0
490
+ * Fix infinite recursion in ``__eq__`` in some cases. (FelisNivalis)
491
+ * Improve error message in left-recursive rules. (lucaswiman)
492
+ * Add support for range ``{min,max}`` repetition expressions (righthandabacus)
493
+ * Fix bug in ``*`` and ``+`` for token grammars (lucaswiman)
494
+ * Add support for grammars on bytestrings (lucaswiman)
495
+ * Fix LazyReference resolution bug #134 (righthandabacus)
496
+ * ~15% speedup on benchmarks with a faster node cache (ethframe)
497
+
498
+ .. warning::
499
+
500
+ This release makes backward-incompatible changes:
501
+
502
+ * Fix precedence of string literal modifiers ``u/r/b``.
503
+ This will break grammars with no spaces between a
504
+ reference and a string literal. (lucaswiman)
505
+
506
+
507
+ 0.9.0
508
+ * Add support for Python 3.7, 3.8, 3.9, 3.10 (righthandabacus, Lonnen)
509
+ * Drop support for Python 2.x, 3.3, 3.4 (righthandabacus, Lonnen)
510
+ * Remove six and go all in on Python 3 idioms (Lonnen)
511
+ * Replace re with regex for improved handling of unicode characters
512
+ in regexes (Oderjunkie)
513
+ * Drop nose in favor of unittest (swayson)
514
+ * ``Grammar.__repr__()`` now correctly escapes backslashes (ingolemo)
515
+ * Custom rules can now be class methods in addition to
516
+ functions (James Addison)
517
+ * Make the ascii flag available in the regex syntax (Roman Inflianskas)
518
+
519
+ 0.8.1
520
+ * Switch to a function-style ``print`` in the benchmark tests so we work
521
+ cleanly as a dependency on Python 3. (Edward Betts)
522
+
523
+ 0.8.0
524
+ * Make Grammar iteration ordered, making the ``__repr__`` more like the
525
+ original input. (Lucas Wiman)
526
+ * Improve text representation and error messages for anonymous
527
+ subexpressions. (Lucas Wiman)
528
+ * Expose BadGrammar and VisitationError as top-level imports.
529
+ * No longer crash when you try to compare a Node to an instance of a
530
+ different class. (Esben Sonne)
531
+ * Pin ``six`` at 1.9.0 to ensure we have ``python_2_unicode_compatible``.
532
+ (Sam Raker)
533
+ * Drop Python 2.6 support.
534
+
535
+ 0.7.0
536
+ * Add experimental token-based parsing, via TokenGrammar class, for those
537
+ operating on pre-lexed streams of tokens. This can, for example, help parse
538
+ indentation-sensitive languages that use the "off-side rule", like Python.
539
+ (Erik Rose)
540
+ * Common codebase for Python 2 and 3: no more 2to3 translation step (Mattias
541
+ Urlichs, Lucas Wiman)
542
+ * Drop Python 3.1 and 3.2 support.
543
+ * Fix a bug in ``Grammar.__repr__`` which fails to work on Python 3 since the
544
+ string_escape codec is gone in Python 3. (Lucas Wiman)
545
+ * Don't lose parentheses when printing representations of expressions.
546
+ (Michael Kelly)
547
+ * Make Grammar an immutable mapping (until we add automatic recompilation).
548
+ (Michael Kelly)
549
+
550
+ 0.6.2
551
+ * Make grammar compilation 100x faster. Thanks to dmoisset for the initial
552
+ patch.
553
+
554
+ 0.6.1
555
+ * Fix bug which made the default rule of a grammar invalid when it
556
+ contained a forward reference.
557
+
558
+ 0.6
559
+ .. warning::
560
+
561
+ This release makes backward-incompatible changes:
562
+
563
+ * The ``default_rule`` arg to Grammar's constructor has been replaced
564
+ with a method, ``some_grammar.default('rule_name')``, which returns a
565
+ new grammar just like the old except with its default rule changed.
566
+ This is to free up the constructor kwargs for custom rules.
567
+ * ``UndefinedLabel`` is no longer a subclass of ``VisitationError``. This
568
+ matters only in the unlikely case that you were catching
569
+ ``VisitationError`` exceptions and expecting to thus also catch
570
+ ``UndefinedLabel``.
571
+
572
+ * Add support for "custom rules" in Grammars. These let you write simple
573
+ custom parsing hooks spelled as Python lambdas. For heavy-duty needs,
574
+ you can put in Compound Expressions with LazyReferences as subexpressions,
575
+ and the Grammar will hook them up for optimal efficiency--no calling
576
+ ``__getitem__`` on Grammar at parse time.
577
+ * Allow grammars without a default rule (in cases where there are no string
578
+ rules), which leads to also allowing empty grammars. Perhaps someone
579
+ building up grammars dynamically will find that useful.
580
+ * Add ``@rule`` decorator, allowing grammars to be constructed out of
581
+ notations on ``NodeVisitor`` methods. This saves looking back and forth
582
+ between the visitor and the grammar when there is only one visitor per
583
+ grammar.
584
+ * Add ``parse()`` and ``match()`` convenience methods to ``NodeVisitor``.
585
+ This makes the common case of parsing a string and applying exactly one
586
+ visitor to the AST shorter and simpler.
587
+ * Improve exception message when you forget to declare a visitor method.
588
+ * Add ``unwrapped_exceptions`` attribute to ``NodeVisitor``, letting you
589
+ name certain exceptions which propagate out of visitors without being
590
+ wrapped by ``VisitationError`` exceptions.
591
+ * Expose much more of the library in ``__init__``, making your imports
592
+ shorter.
593
+ * Drastically simplify reference resolution machinery. (Vladimir Keleshev)
594
+
595
+ 0.5
596
+ .. warning::
597
+
598
+ This release makes some backward-incompatible changes. See below.
599
+
600
+ * Add alpha-quality error reporting. Now, rather than returning ``None``,
601
+ ``parse()`` and ``match()`` raise ``ParseError`` if they don't succeed.
602
+ This makes more sense, since you'd rarely attempt to parse something and
603
+ not care if it succeeds. It was too easy before to forget to check for a
604
+ ``None`` result. ``ParseError`` gives you a human-readable unicode
605
+ representation as well as some attributes that let you construct your own
606
+ custom presentation.
607
+ * Grammar construction now raises ``ParseError`` rather than ``BadGrammar``
608
+ if it can't parse your rules.
609
+ * ``parse()`` now takes an optional ``pos`` argument, like ``match()``.
610
+ * Make the ``__str__()`` method of ``UndefinedLabel`` return the right type.
611
+ * Support splitting rules across multiple lines, interleaving comments,
612
+ putting multiple rules on one line (but don't do that) and all sorts of
613
+ other horrific behavior.
614
+ * Tolerate whitespace after opening parens.
615
+ * Add support for single-quoted literals.
616
+
617
+ 0.4
618
+ * Support Python 3.
619
+ * Fix ``import *`` for ``parsimonius.expressions``.
620
+ * Rewrite grammar compiler so right-recursive rules can be compiled and
621
+ parsing no longer fails in some cases with forward rule references.
622
+
623
+ 0.3
624
+ * Support comments, the ``!`` ("not") operator, and parentheses in grammar
625
+ definition syntax.
626
+ * Change the ``&`` operator to a prefix operator to conform to the original
627
+ PEG syntax. The version in Parsing Techniques was infix, and that's what I
628
+ used as a reference. However, the unary version is more convenient, as it
629
+ lets you spell ``AB & A`` as simply ``A &B``.
630
+ * Take the ``print`` statements out of the benchmark tests.
631
+ * Give Node an evaluate-able ``__repr__``.
632
+
633
+ 0.2
634
+ * Support matching of prefixes and other not-to-the-end slices of strings by
635
+ making ``match()`` public and able to initialize a new cache. Add
636
+ ``match()`` callthrough method to ``Grammar``.
637
+ * Report a ``BadGrammar`` exception (rather than crashing) when there are
638
+ mistakes in a grammar definition.
639
+ * Simplify grammar compilation internals: get rid of superfluous visitor
640
+ methods and factor up repetitive ones. Simplify rule grammar as well.
641
+ * Add ``NodeVisitor.lift_child`` convenience method.
642
+ * Rename ``VisitationException`` to ``VisitationError`` for consistency with
643
+ the standard Python exception hierarchy.
644
+ * Rework ``repr`` and ``str`` values for grammars and expressions. Now they
645
+ both look like rule syntax. Grammars are even round-trippable! This fixes a
646
+ unicode encoding error when printing nodes that had parsed unicode text.
647
+ * Add tox for testing. Stop advertising Python 2.5 support, which never
648
+ worked (and won't unless somebody cares a lot, since it makes Python 3
649
+ support harder).
650
+ * Settle (hopefully) on the term "rule" to mean "the string representation of
651
+ a production". Get rid of the vague, mysterious "DSL".
652
+
653
+ 0.1
654
+ * A rough but usable preview release
655
+
656
+ Thanks to Wiki Loves Monuments Panama for showing their support with a generous
657
+ gift.