python-waterloo-lexer 0.5.0__tar.gz
- python_waterloo_lexer-0.5.0/PKG-INFO +74 -0
- python_waterloo_lexer-0.5.0/README.md +114 -0
- python_waterloo_lexer-0.5.0/README_PYPI.md +61 -0
- python_waterloo_lexer-0.5.0/pyproject.toml +27 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.egg-info/PKG-INFO +74 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.egg-info/SOURCES.txt +10 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.egg-info/dependency_links.txt +1 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.egg-info/entry_points.txt +2 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.egg-info/requires.txt +1 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.egg-info/top_level.txt +1 -0
- python_waterloo_lexer-0.5.0/python_waterloo_lexer.py +803 -0
- python_waterloo_lexer-0.5.0/setup.cfg +4 -0
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: python-waterloo-lexer
|
|
3
|
+
Version: 0.5.0
|
|
4
|
+
Summary: A pygments lexer derived from the Python lexer, for Waterloo docstrings
|
|
5
|
+
Author: Uwe
|
|
6
|
+
License: BSD-2-Clause
|
|
7
|
+
Classifier: Programming Language :: Python :: 3
|
|
8
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
9
|
+
Classifier: Operating System :: OS Independent
|
|
10
|
+
Requires-Python: >=3.10
|
|
11
|
+
Description-Content-Type: text/markdown
|
|
12
|
+
Requires-Dist: pygments
|
|
13
|
+
|
|
14
|
+
# Python-Waterloo Lexer
|
|
15
|
+
|
|
16
|
+

|
|
17
|
+

|
|
18
|
+

|
|
19
|
+
|
|
20
|
+
`python-waterloo-lexer` is a Pygments lexer for Python files that contain
|
|
21
|
+
Waterloo docstrings.
|
|
22
|
+
|
|
23
|
+
It can be used with `pygmentize` and other tools that load Pygments lexers via
|
|
24
|
+
entry points.
|
|
25
|
+
|
|
26
|
+
## What it provides
|
|
27
|
+
|
|
28
|
+
- a `python-waterloo` Pygments lexer alias
|
|
29
|
+
- syntax highlighting for Python files with Waterloo docstrings
|
|
30
|
+
- installation via PyPI, local checkout, or Git URL
|
|
31
|
+
|
|
32
|
+
## Installation
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
pip install python-waterloo-lexer
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Quick test
|
|
39
|
+
|
|
40
|
+
After installation, the lexer is available under the alias
|
|
41
|
+
`python-waterloo`.
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
pygmentize -l python-waterloo -f terminal16m <file.py>
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
You can also check whether Pygments lists the lexer:
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
pygmentize -L lexers | grep -i waterloo || true
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## Terminal viewer
|
|
54
|
+
|
|
55
|
+
For a quick terminal preview, a `less` alias can be handy:
|
|
56
|
+
|
|
57
|
+
```bash
|
|
58
|
+
alias lessh='LESSOPEN="| pygmentize -O style=monokai %s" less -M -R'
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
Then open files with:
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
lessh <file.py>
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
## Project repository
|
|
68
|
+
|
|
69
|
+
Development happens in the Waterloo repository:
|
|
70
|
+
|
|
71
|
+
- <https://github.com/uwe-at-sdv/sdv_doc_waterloo>
|
|
72
|
+
|
|
73
|
+
The repository also contains related tooling, documentation, and editor
|
|
74
|
+
integrations for Waterloo docstrings.
|
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
# Python-Waterloo Lexer (Pygments)
|
|
2
|
+
|
|
3
|
+
This folder contains a custom Pygments lexer for Python files with Waterloo docstrings.
|
|
4
|
+
|
|
5
|
+
> **Branch note (`ide-plugins`)**
|
|
6
|
+
>
|
|
7
|
+
> The Pygments lexer package lives on the `ide-plugins` branch. Installation from Git must
|
|
8
|
+
> reference `@ide-plugins` (see below). General repository documentation is in `@main/README.md`.
|
|
9
|
+
|
|
10
|
+
## Prerequisites
|
|
11
|
+
|
|
12
|
+
- Python >= 3.10
|
|
13
|
+
- `pip`
|
|
14
|
+
- `pygmentize` (provided by the `Pygments` package)
|
|
15
|
+
|
|
16
|
+
## Clone the correct branch
|
|
17
|
+
|
|
18
|
+
The plugin sources live on branch `ide-plugins`. If you want to work from a
|
|
19
|
+
fresh checkout, clone that branch explicitly:
|
|
20
|
+
|
|
21
|
+
HTTPS:
|
|
22
|
+
```bash
|
|
23
|
+
git clone --branch ide-plugins --single-branch https://github.com/uwe-at-sdv/sdv_doc_waterloo.git
|
|
24
|
+
cd sdv_doc_waterloo
|
|
25
|
+
```
|
|
26
|
+
SSH:
|
|
27
|
+
```bash
|
|
28
|
+
git clone --branch ide-plugins --single-branch ssh://git@github.com/uwe-at-sdv/sdv_doc_waterloo.git
|
|
29
|
+
cd sdv_doc_waterloo
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
## Quick test (no install)
|
|
33
|
+
|
|
34
|
+
Run `pygmentize` directly against the lexer source file (no packaging/install needed):
|
|
35
|
+
|
|
36
|
+
Dark terminal theme:
|
|
37
|
+
```bash
|
|
38
|
+
pygmentize -x \
|
|
39
|
+
-l pygments/python_waterloo_lexer.py:PythonWaterlooLexer -f terminal16m -O style=monokai \
|
|
40
|
+
examples-python/example_function_full.py
|
|
41
|
+
```
|
|
42
|
+
Light terminal theme:
|
|
43
|
+
```bash
|
|
44
|
+
pygmentize -x \
|
|
45
|
+
-l pygments/python_waterloo_lexer.py:PythonWaterlooLexer -f terminal16m -O style=friendly \
|
|
46
|
+
examples-python/example_function_full.py
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
## Terminal viewer
|
|
50
|
+
|
|
51
|
+
For ad-hoc file viewing in a terminal, a simple `less` alias can be useful:
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
alias lessh='LESSOPEN="| pygmentize -O style=monokai %s" less -M -R'
|
|
55
|
+
```
|
|
56
|
+
|
|
57
|
+
Then use:
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
lessh examples-python/example_function_full.py
|
|
61
|
+
```
|
|
62
|
+
|
|
63
|
+
If your system already provides `lesspipe`, you can also wire that into your
|
|
64
|
+
shell startup file and have `less` itself perform the preprocessing step.
|
|
65
|
+
|
|
66
|
+
## Install from a local checkout
|
|
67
|
+
|
|
68
|
+
If you have cloned the repository, install the lexer package from this folder:
|
|
69
|
+
|
|
70
|
+
```bash
|
|
71
|
+
pip install ./pygments
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
For development, an editable install can be convenient:
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
pip install -e ./pygments
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
### Sanity check (after install)
|
|
81
|
+
|
|
82
|
+
The lexer should be discoverable by its alias:
|
|
83
|
+
|
|
84
|
+
```bash
|
|
85
|
+
pygmentize -l python-waterloo -f terminal16m examples-python/example_function_full.py
|
|
86
|
+
```
|
|
87
|
+
|
|
88
|
+
Tip: You can also check whether Pygments lists the lexer:
|
|
89
|
+
|
|
90
|
+
```bash
|
|
91
|
+
pygmentize -L lexers | grep -i waterloo || true
|
|
92
|
+
```
|
|
93
|
+
|
|
94
|
+
## Install from PyPI (once available)
|
|
95
|
+
|
|
96
|
+
```bash
|
|
97
|
+
pip install python-waterloo-lexer
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
## Install directly from Git
|
|
101
|
+
|
|
102
|
+
If the repository is reachable by `pip`, the package can also be installed
|
|
103
|
+
directly from a Git URL.
|
|
104
|
+
|
|
105
|
+
**Important:** The `@ide-plugins` ref is required because this package is maintained on that branch.
|
|
106
|
+
|
|
107
|
+
HTTPS:
|
|
108
|
+
```bash
|
|
109
|
+
pip install "git+https://github.com/uwe-at-sdv/sdv_doc_waterloo.git@ide-plugins#subdirectory=pygments"
|
|
110
|
+
```
|
|
111
|
+
SSH:
|
|
112
|
+
```bash
|
|
113
|
+
pip install "git+ssh://git@github.com/uwe-at-sdv/sdv_doc_waterloo.git@ide-plugins#subdirectory=pygments"
|
|
114
|
+
```
|
|
@@ -0,0 +1,61 @@
|
|
|
1
|
+
# Python-Waterloo Lexer
|
|
2
|
+
|
|
3
|
+

|
|
4
|
+

|
|
5
|
+

|
|
6
|
+
|
|
7
|
+
`python-waterloo-lexer` is a Pygments lexer for Python files that contain
|
|
8
|
+
Waterloo docstrings.
|
|
9
|
+
|
|
10
|
+
It can be used with `pygmentize` and other tools that load Pygments lexers via
|
|
11
|
+
entry points.
|
|
12
|
+
|
|
13
|
+
## What it provides
|
|
14
|
+
|
|
15
|
+
- a `python-waterloo` Pygments lexer alias
|
|
16
|
+
- syntax highlighting for Python files with Waterloo docstrings
|
|
17
|
+
- installation via PyPI, local checkout, or Git URL
|
|
18
|
+
|
|
19
|
+
## Installation
|
|
20
|
+
|
|
21
|
+
```bash
|
|
22
|
+
pip install python-waterloo-lexer
|
|
23
|
+
```
|
|
24
|
+
|
|
25
|
+
## Quick test
|
|
26
|
+
|
|
27
|
+
After installation, the lexer is available under the alias
|
|
28
|
+
`python-waterloo`.
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
pygmentize -l python-waterloo -f terminal16m <file.py>
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
You can also check whether Pygments lists the lexer:
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
pygmentize -L lexers | grep -i waterloo || true
|
|
38
|
+
```
|
|
39
|
+
|
|
40
|
+
## Terminal viewer
|
|
41
|
+
|
|
42
|
+
For a quick terminal preview, a `less` alias can be handy:
|
|
43
|
+
|
|
44
|
+
```bash
|
|
45
|
+
alias lessh='LESSOPEN="| pygmentize -O style=monokai %s" less -M -R'
|
|
46
|
+
```
|
|
47
|
+
|
|
48
|
+
Then open files with:
|
|
49
|
+
|
|
50
|
+
```bash
|
|
51
|
+
lessh <file.py>
|
|
52
|
+
```
|
|
53
|
+
|
|
54
|
+
## Project repository
|
|
55
|
+
|
|
56
|
+
Development happens in the Waterloo repository:
|
|
57
|
+
|
|
58
|
+
- <https://github.com/uwe-at-sdv/sdv_doc_waterloo>
|
|
59
|
+
|
|
60
|
+
The repository also contains related tooling, documentation, and editor
|
|
61
|
+
integrations for Waterloo docstrings.
|
|
@@ -0,0 +1,27 @@
|
|
|
1
|
+
[build-system]
|
|
2
|
+
requires = ["setuptools>=70"]
|
|
3
|
+
build-backend = "setuptools.build_meta"
|
|
4
|
+
|
|
5
|
+
[project]
|
|
6
|
+
name = "python-waterloo-lexer"
|
|
7
|
+
description = "A pygments lexer derived from the Python lexer, for Waterloo docstrings"
|
|
8
|
+
readme = "README_PYPI.md"
|
|
9
|
+
dependencies = ["pygments"]
|
|
10
|
+
requires-python = ">=3.10"
|
|
11
|
+
authors = [{ name = "Uwe" }]
|
|
12
|
+
license = { text = "BSD-2-Clause" }
|
|
13
|
+
classifiers = [
|
|
14
|
+
"Programming Language :: Python :: 3",
|
|
15
|
+
"Programming Language :: Python :: 3.10",
|
|
16
|
+
"Operating System :: OS Independent",
|
|
17
|
+
]
|
|
18
|
+
dynamic = ["version"]
|
|
19
|
+
|
|
20
|
+
[project.entry-points."pygments.lexers"]
|
|
21
|
+
python_waterloo = "python_waterloo_lexer:PythonWaterlooLexer"
|
|
22
|
+
|
|
23
|
+
[tool.setuptools.dynamic]
|
|
24
|
+
version = { attr = "python_waterloo_lexer.__version__" }
|
|
25
|
+
|
|
26
|
+
[tool.setuptools]
|
|
27
|
+
py-modules = ["python_waterloo_lexer"]
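The `[project.entry-points."pygments.lexers"]` table above is what lets plugin-aware tools find the lexer after installation. The following is a minimal sketch, using only the standard library, of how one might list what installed distributions advertise in that group. The helper name `pygments_lexer_plugins` is hypothetical, and the result depends entirely on the local environment.

```python
from __future__ import annotations

from importlib.metadata import entry_points


def pygments_lexer_plugins() -> dict[str, str]:
    """Map entry-point names in the "pygments.lexers" group to their targets."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points can be filtered by group.
        selected = eps.select(group="pygments.lexers")
    else:
        # Older interpreters return a plain dict keyed by group name.
        selected = eps.get("pygments.lexers", [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in sorted(pygments_lexer_plugins().items()):
        print(f"{name} -> {target}")
```

With this package installed, the mapping would be expected to contain `python_waterloo` pointing at `python_waterloo_lexer:PythonWaterlooLexer`, as declared above.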
|
|
@@ -0,0 +1,74 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: python-waterloo-lexer
|
|
3
|
+
Version: 0.5.0
|
|
4
|
+
Summary: A pygments lexer derived from the Python lexer, for Waterloo docstrings
|
|
5
|
+
Author: Uwe
|
|
6
|
+
License: BSD-2-Clause
|
|
7
|
+
Classifier: Programming Language :: Python :: 3
|
|
8
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
9
|
+
Classifier: Operating System :: OS Independent
|
|
10
|
+
Requires-Python: >=3.10
|
|
11
|
+
Description-Content-Type: text/markdown
|
|
12
|
+
Requires-Dist: pygments
|
|
13
|
+
|
|
14
|
+
# Python-Waterloo Lexer
|
|
15
|
+
|
|
16
|
+

|
|
17
|
+

|
|
18
|
+

|
|
19
|
+
|
|
20
|
+
`python-waterloo-lexer` is a Pygments lexer for Python files that contain
|
|
21
|
+
Waterloo docstrings.
|
|
22
|
+
|
|
23
|
+
It can be used with `pygmentize` and other tools that load Pygments lexers via
|
|
24
|
+
entry points.
|
|
25
|
+
|
|
26
|
+
## What it provides
|
|
27
|
+
|
|
28
|
+
- a `python-waterloo` Pygments lexer alias
|
|
29
|
+
- syntax highlighting for Python files with Waterloo docstrings
|
|
30
|
+
- installation via PyPI, local checkout, or Git URL
|
|
31
|
+
|
|
32
|
+
## Installation
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
pip install python-waterloo-lexer
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
## Quick test
|
|
39
|
+
|
|
40
|
+
After installation, the lexer is available under the alias
|
|
41
|
+
`python-waterloo`.
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
pygmentize -l python-waterloo -f terminal16m <file.py>
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
You can also check whether Pygments lists the lexer:
|
|
48
|
+
|
|
49
|
+
```bash
|
|
50
|
+
pygmentize -L lexers | grep -i waterloo || true
|
|
51
|
+
```
|
|
52
|
+
|
|
53
|
+
## Terminal viewer
|
|
54
|
+
|
|
55
|
+
For a quick terminal preview, a `less` alias can be handy:
|
|
56
|
+
|
|
57
|
+
```bash
|
|
58
|
+
alias lessh='LESSOPEN="| pygmentize -O style=monokai %s" less -M -R'
|
|
59
|
+
```
|
|
60
|
+
|
|
61
|
+
Then open files with:
|
|
62
|
+
|
|
63
|
+
```bash
|
|
64
|
+
lessh <file.py>
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
## Project repository
|
|
68
|
+
|
|
69
|
+
Development happens in the Waterloo repository:
|
|
70
|
+
|
|
71
|
+
- <https://github.com/uwe-at-sdv/sdv_doc_waterloo>
|
|
72
|
+
|
|
73
|
+
The repository also contains related tooling, documentation, and editor
|
|
74
|
+
integrations for Waterloo docstrings.
|
|
@@ -0,0 +1,10 @@
|
|
|
1
|
+
README.md
|
|
2
|
+
README_PYPI.md
|
|
3
|
+
pyproject.toml
|
|
4
|
+
python_waterloo_lexer.py
|
|
5
|
+
python_waterloo_lexer.egg-info/PKG-INFO
|
|
6
|
+
python_waterloo_lexer.egg-info/SOURCES.txt
|
|
7
|
+
python_waterloo_lexer.egg-info/dependency_links.txt
|
|
8
|
+
python_waterloo_lexer.egg-info/entry_points.txt
|
|
9
|
+
python_waterloo_lexer.egg-info/requires.txt
|
|
10
|
+
python_waterloo_lexer.egg-info/top_level.txt
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
pygments
|
|
@@ -0,0 +1 @@
|
|
|
1
|
+
python_waterloo_lexer
|
|
@@ -0,0 +1,803 @@
|
|
|
1
|
+
r"""
|
|
2
|
+
Preamble:
|
|
3
|
+
profile:
|
|
4
|
+
module
|
|
5
|
+
normative_sections:
|
|
6
|
+
Contract, Definitions, Public_classes, Public_constants
|
|
7
|
+
Definitions:
|
|
8
|
+
Pos_Role_Substring_Triple:
|
|
9
|
+
A tuple (|var|`position`, |var|`token_type`, |var|`substring`) specifying syntax highlighting:
|
|
10
|
+
# |var|`position`: The starting index of the substring in the original text.
|
|
11
|
+
# |var|`token_type`: A Pygments token class (from |type|`pygments.token`).
|
|
12
|
+
# |var|`substring`: The text to be highlighted.
|
|
13
|
+
Contract:
|
|
14
|
+
general:
|
|
15
|
+
|Must| provide a Python module that defines the PythonWaterlooLexer class for syntax highlighting of Waterloo-docstrings.
|
|
16
|
+
Public_classes:
|
|
17
|
+
PythonWaterlooLexer
|
|
18
|
+
Public_constants:
|
|
19
|
+
RE_SECTION_ALLOWED_NORMATIVE:
|
|
20
|
+
A regular expression pattern that matches section names allowed or mandatory\
|
|
21
|
+
in the "normative_sections" subsection of the Preamble.\
|
|
22
|
+
This expression is governed by the normative rules and must be kept up to date.
|
|
23
|
+
Mandatory : CON-002, CON-005, CON-020, CON-034, CPCL-002, CPCON-002, CPMT-002,\
|
|
24
|
+
CPTYP-002, CPVAR-002, DEF-002, DER-004, FAC-009, MPCL-002, MPCON-002, MPFN-002,\
|
|
25
|
+
MPTYP-002, MPVAR-002, PAR-002, RAI-002, RET-002.
|
|
26
|
+
Accepted : DESC-002, SEE-011.
|
|
27
|
+
Refused : NOTE-002, PRE-002, TERM-002.
|
|
28
|
+
"""
|
|
29
|
+
from __future__ import annotations
|
|
30
|
+
|
|
31
|
+
import sys,re
|
|
32
|
+
import ast
|
|
33
|
+
import textwrap
|
|
34
|
+
import inspect
|
|
35
|
+
from typing import Any, Iterable, Iterator
|
|
36
|
+
|
|
37
|
+
from pygments.lexer import Lexer
|
|
38
|
+
from pygments.lexers.python import PythonLexer
|
|
39
|
+
from pygments.token import Error, Generic, Keyword, Name, String, Literal, Number, Operator
|
|
40
|
+
|
|
41
|
+
|
|
42
|
+
#----- Changelog ----------------------------------------------#
|
|
43
|
+
|
|
44
|
+
__version__ = "0.5.0"
|
|
45
|
+
|
|
46
|
+
#----- Constants ----------------------------------------------#
|
|
47
|
+
RE_SECTION = re.compile(
|
|
48
|
+
r"^\s*(?:"
|
|
49
|
+
r"Preamble|Contract|Parameters|Returns|Raises|Notes|See_also|"
|
|
50
|
+
r"Definitions|Terminology|Description|Derived_from|Factory|"
|
|
51
|
+
r"Public_[A-Za-z_][A-Za-z0-9_]*|"
|
|
52
|
+
r"[A-Za-z_][A-Za-z0-9_]*_overview"
|
|
53
|
+
r"):\s*$"
|
|
54
|
+
)
|
|
55
|
+
RE_SECTION_CAPTURE = re.compile(
|
|
56
|
+
r"^\s*("
|
|
57
|
+
r"Preamble|Contract|Parameters|Returns|Raises|Notes|See_also|"
|
|
58
|
+
r"Definitions|Terminology|Description|Derived_from|Factory|"
|
|
59
|
+
r"Public_[A-Za-z_][A-Za-z0-9_]*|"
|
|
60
|
+
r"[A-Za-z_][A-Za-z0-9_]*_overview"
|
|
61
|
+
r"):\s*$"
|
|
62
|
+
)
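To make the section-header patterns above easier to reason about, here is a small standalone sketch. It reproduces the capture pattern locally (the name `SECTION_CAPTURE` and the helper `section_name` are illustrative, not part of the package) and shows which lines count as section headers.

```python
from __future__ import annotations

import re

# Local copy of the capture pattern shown above, reproduced for illustration.
SECTION_CAPTURE = re.compile(
    r"^\s*("
    r"Preamble|Contract|Parameters|Returns|Raises|Notes|See_also|"
    r"Definitions|Terminology|Description|Derived_from|Factory|"
    r"Public_[A-Za-z_][A-Za-z0-9_]*|"
    r"[A-Za-z_][A-Za-z0-9_]*_overview"
    r"):\s*$"
)


def section_name(line: str) -> str | None:
    """Return the section name if the line is a section header, else None."""
    m = SECTION_CAPTURE.match(line)
    return m.group(1) if m else None


if __name__ == "__main__":
    for sample in ("Contract:\n", "    Public_methods:", "general:", "Returns: int"):
        print(repr(sample), "->", section_name(sample))
```

A header must stand alone on its line: trailing text after the colon, as in `Returns: int`, prevents a match.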
|
|
63
|
+
|
|
64
|
+
RE_SECTION_ALLOWED_NORMATIVE = re.compile(
|
|
65
|
+
r"^(?:"
|
|
66
|
+
r"Contract|Parameters|Returns|Raises|Derived_from|"
|
|
67
|
+
r"Definitions|Description|Factory|See_also|"
|
|
68
|
+
r"Public_[A-Za-z_][A-Za-z0-9_]*"
|
|
69
|
+
r")$"
|
|
70
|
+
)
|
|
71
|
+
RE_SUBSECTION_QUALIFIED_IDENTIFIER = re.compile(
|
|
72
|
+
r"^(\s*)([A-Za-z_][A-Za-z0-9_.]*:)(\s*)$"
|
|
73
|
+
)
|
|
74
|
+
RE_SUBSECTION_ANY = re.compile(
|
|
75
|
+
r"^(\s*)(.+:)(\s*)$"
|
|
76
|
+
)
|
|
77
|
+
# Free-form sections like "Definitions" allow subdivision of
|
|
78
|
+
# text using a single pipe operator on a line.
|
|
79
|
+
RE_TEXTFLOW_MARKER = re.compile(
|
|
80
|
+
r"^\s*(?:\|)\s*$"
|
|
81
|
+
)
|
|
82
|
+
RE_LIST_MARKER = re.compile(
|
|
83
|
+
r"^([ \t]*)([-+*#])(\s+)(.*)$"
|
|
84
|
+
)
|
|
85
|
+
|
|
86
|
+
# 1: Normativity keywords
|
|
87
|
+
# 2: Special values
|
|
88
|
+
# 3,4: |ref| and argument
|
|
89
|
+
# 5,6: |lit| and argument
|
|
90
|
+
# 7,8: |var| and argument
|
|
91
|
+
# 9,10: |type| and argument
|
|
92
|
+
# 11,12: |mod| and argument
|
|
93
|
+
# 13,14: |value| and argument
|
|
94
|
+
# 15,16: |op| and argument
|
|
95
|
+
# 17,18: |func| and argument
|
|
96
|
+
# 19,20: |label| and argument
|
|
97
|
+
# 21,22: |attr| and argument
|
|
98
|
+
# 23,24: |file| and argument
|
|
99
|
+
# 25,26: |dfn| and argument
|
|
100
|
+
# 27,28: |term| and argument
|
|
101
|
+
# 29,30: |cmd| and argument
|
|
102
|
+
# 31,32: |opt| and argument
|
|
103
|
+
# 33,34: |tag| and argument
|
|
104
|
+
# 35,36: |norm| and argument
|
|
105
|
+
# 37,38: |var_type| and argument
|
|
106
|
+
# 39,40: Generic role and argument
|
|
107
|
+
# 41: Line connector
|
|
108
|
+
RE_INLINE = re.compile(
|
|
109
|
+
r"(\|(?:Must|must|Must_not|must_not|Should|should|Should_not|should_not|May|may)\|)"
|
|
110
|
+
r"|(\|(?:Self|None|True|False)\|)"
|
|
111
|
+
r"|(\|ref\|)(`[^`]+`)"
|
|
112
|
+
r"|(\|lit\|)(`[^`]+`)"
|
|
113
|
+
r"|(\|var\|)(`[^`]+`)"
|
|
114
|
+
r"|(\|type\|)(`[^`]+`)"
|
|
115
|
+
r"|(\|mod\|)(`[^`]+`)"
|
|
116
|
+
r"|(\|value\|)(`[^`]+`)"
|
|
117
|
+
r"|(\|op\|)(`[^`]+`)"
|
|
118
|
+
r"|(\|func\|)(`[^`]+`)"
|
|
119
|
+
r"|(\|label\|)(`[^`]+`)"
|
|
120
|
+
r"|(\|attr\|)(`[^`]+`)"
|
|
121
|
+
r"|(\|file\|)(`[^`]+`)"
|
|
122
|
+
r"|(\|dfn\|)(`[^`]+`)"
|
|
123
|
+
r"|(\|term\|)(`[^`]+`)"
|
|
124
|
+
r"|(\|cmd\|)(`[^`]+`)"
|
|
125
|
+
r"|(\|opt\|)(`[^`]+`)"
|
|
126
|
+
r"|(\|tag\|)(`[^`]+`)"
|
|
127
|
+
r"|(\|norm\|)(`[^`]+`)"
|
|
128
|
+
r"|(\|var_type\|)(`[^`]+`)"
|
|
129
|
+
r"|(\|[A-Za-z_][A-Za-z0-9_]*\|)(`[^`]+`)"
|
|
130
|
+
r"|(\\)(?=\s*(?:\n)?$)"
|
|
131
|
+
)
|
|
132
|
+
RE_REF_ARG = re.compile(r"^(.*?)\s*<([^<>]+)>\s*$")
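The inline pattern above is long, so a reduced sketch may help show how its groups are meant to be read. The version below keeps only two alternatives (normative keywords and the generic role with a backtick argument) and is an illustration, not the package's pattern. The names `INLINE`, `REF_ARG` and `inline_spans` are hypothetical. It also assumes the reference pattern is applied to argument text with the backticks already removed.

```python
import re

# Reduced, illustrative version of the inline pattern: only two alternatives.
INLINE = re.compile(
    r"(\|(?:Must|Must_not|Should|Should_not|May)\|)"  # group 1: normative keyword
    r"|(\|[A-Za-z_][A-Za-z0-9_]*\|)(`[^`]+`)"         # groups 2, 3: role and argument
)
# Same shape as RE_REF_ARG: display text followed by a target in angle brackets.
REF_ARG = re.compile(r"^(.*?)\s*<([^<>]+)>\s*$")


def inline_spans(text):
    """Yield (start, kind, matched_text) for each inline marker in text."""
    for m in INLINE.finditer(text):
        if m.group(1):
            yield m.start(1), "keyword", m.group(1)
        else:
            yield m.start(2), "role", m.group(2)
            yield m.start(3), "argument", m.group(3)


if __name__ == "__main__":
    print(list(inline_spans("|Must| return a |type|`int` value.")))
    print(REF_ARG.match("Section 3 <CON-002>").groups())
```

Because each role contributes a marker group and an argument group, the group index alone tells the caller which token type to emit for a match.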
|
|
133
|
+
|
|
134
|
+
SUBSECTIONS_WITH_FREE_FORM_LABELS = (
|
|
135
|
+
"Notes",
|
|
136
|
+
"Terminology"
|
|
137
|
+
)
|
|
138
|
+
|
|
139
|
+
SUBSECTIONS_WITH_CSV_IDENTIFIER_LABELS = (
|
|
140
|
+
"Definitions",
|
|
141
|
+
)
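The two constants above are used on the right-hand side of `in` checks, so it matters whether each one is a tuple or a plain string. A short sketch of how Python treats a parenthesized single string with and without a trailing comma (the variable names are illustrative):

```python
as_string = ("Definitions")   # parentheses only: still a str
as_tuple = ("Definitions",)   # trailing comma: a one-element tuple

print(type(as_string).__name__, type(as_tuple).__name__)

# `in` on a str is a substring test, so partial and empty names match.
print("" in as_string, "Def" in as_string)

# `in` on a tuple compares whole elements.
print("" in as_tuple, "Def" in as_tuple, "Definitions" in as_tuple)
```

For a membership test on section names, the tuple form is the one that only accepts the exact name.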
|
|
142
|
+
|
|
143
|
+
PREAMBLE_BUILTIN_SUBSECTIONS = frozenset(("profile", "normative_sections", "status", "scope"))
|
|
144
|
+
CONTRACT_BUILTIN_SUBSECTIONS = frozenset(("general", "constructor", "base", "traits", "invariants", "requires", "ensures"))
|
|
145
|
+
|
|
146
|
+
SECTION_LIST_TYPE_QUALIFIED = frozenset(("Public_classes", "Derived_from"))
|
|
147
|
+
SECTION_LIST_FUNC_QUALIFIED = frozenset(("Public_functions", "Public_methods"))
|
|
148
|
+
SECTION_LIST_GENERIC_QUALIFIED = frozenset(("See_also",))
|
|
149
|
+
|
|
150
|
+
SECTION_MAP_TERM_IDENTIFIER_TO_FREEFORM = frozenset(("Definitions","Terminology"))
|
|
151
|
+
SECTION_MAP_ANY_IDENTIFIER_TO_FREEFORM = frozenset(("Notes",))
|
|
152
|
+
SECTION_MAP_TYPE_IDENTIFIER_TO_FREEFORM = frozenset(("Class_overview", "Public_types"))
|
|
153
|
+
SECTION_MAP_FUNC_IDENTIFIER_TO_FREEFORM = frozenset(("Function_overview", "Method_overview","Factory"))
|
|
154
|
+
SECTION_MAP_VAR_IDENTIFIER_TO_FREEFORM = frozenset(("Public_variables", "Public_constants","Parameters"))
|
|
155
|
+
SECTION_MAP_FUNC_QUALIFIED = frozenset(("Factory",))
|
|
156
|
+
SECTION_MAP_EXCEPTION_QUALIFIED_IDENTIFIER_TO_FREEFORM = frozenset(("Raises",))
|
|
157
|
+
|
|
158
|
+
SECTIONS_WITH_SUBSECTIONS = frozenset((
|
|
159
|
+
"Preamble", "Contract", "Definitions", "Terminology", "Notes", "Parameters", "Raises",
|
|
160
|
+
"Factory", "Class_overview", "Function_overview", "Method_overview",
|
|
161
|
+
"Public_types", "Public_variables", "Public_constants",
|
|
162
|
+
))
|
|
163
|
+
|
|
164
|
+
RE_PREAMBLE = re.compile(r"^\s*Preamble:\s*$")
|
|
165
|
+
RE_CONTRACT = re.compile(r"^\s*Contract:\s*$")
|
|
166
|
+
|
|
167
|
+
#----- Lexer --------------------------------------------------#
|
|
168
|
+
|
|
169
|
+
class PythonWaterlooLexer(PythonLexer):
|
|
170
|
+
r"""
|
|
171
|
+
Preamble:
|
|
172
|
+
profile:
|
|
173
|
+
class
|
|
174
|
+
normative_sections:
|
|
175
|
+
Contract, Public_methods
|
|
176
|
+
Contract:
|
|
177
|
+
general:
|
|
178
|
+
|Must| implement a Pygments lexer class that extends the standard Python\
|
|
179
|
+
lexer to provide syntax highlighting for Waterloo-docstrings.
|
|
180
|
+
|Must| define a static method `analyse_text` that checks if a\
|
|
181
|
+
given string looks like a Waterloo-docstring, returning a float score for lexer selection.
|
|
182
|
+
|Must| implement the `get_tokens_unprocessed` method to yield\
|
|
183
|
+
tokens with appropriate types for different parts of the docstring,\
|
|
184
|
+
such as section headers, normative keywords, and inline markup.
|
|
185
|
+
|Must| use regular expressions to identify section headers and inline markup patterns.
|
|
186
|
+
|Must| ensure that the lexer can be used with Pygments for\
|
|
187
|
+
syntax highlighting in various tools and editors.
|
|
188
|
+
|Must| prioritize this lexer over the standard Python lexer\
|
|
189
|
+
when the input text contains Waterloo-docstring patterns.
|
|
190
|
+
|Must| handle edge cases gracefully, such as docstrings that\
|
|
191
|
+
do not conform to the expected structure or contain mixed indentation.
|
|
192
|
+
|Must| include comprehensive comments and documentation within\
|
|
193
|
+
the code to explain the implementation and usage of the lexer.
|
|
194
|
+
|Must| be implemented in a way that allows for easy maintenance\
|
|
195
|
+
and extension in the future, such as adding support for new sections or inline markup patterns.
|
|
196
|
+
constructor:
|
|
197
|
+
default
|
|
198
|
+
Public_methods:
|
|
199
|
+
highlight_docstring,highlight_line,looks_like_waterloo_docstring,has_mixed_indentation,analyse_text,get_tokens_unprocessed
|
|
200
|
+
"""
|
|
201
|
+
name = "Python-Waterloo"
|
|
202
|
+
aliases = ["python-waterloo"]
|
|
203
|
+
filenames = ["*.py"]
|
|
204
|
+
priority = 0.9
|
|
205
|
+
|
|
206
|
+
def get_tokens_unprocessed(self, text: str) -> Iterator[tuple[int, object, str]]:
|
|
207
|
+
"""
|
|
208
|
+
Preamble:
|
|
209
|
+
profile:
|
|
210
|
+
method
|
|
211
|
+
normative_sections:
|
|
212
|
+
Contract, Definitions, Parameters, Returns, Raises
|
|
213
|
+
scope:
|
|
214
|
+
extension
|
|
215
|
+
Definitions:
|
|
216
|
+
_inherit:
|
|
217
|
+
Pos_Role_Substring_Triple
|
|
218
|
+
Description:
|
|
219
|
+
This is the core tokenization method that Pygments calls to obtain syntax highlighting tokens.
|
|
220
|
+
It iterates over tokens from the parent PythonLexer and intercepts docstrings (String.Doc tokens)
|
|
221
|
+
to apply Waterloo-specific highlighting rules. This method is the entry point for all tokenization
|
|
222
|
+
and determines the highlighting of the entire source code.
|
|
223
|
+
Contract:
|
|
224
|
+
general:
|
|
225
|
+
|Must| analyze the provided string.
|
|
226
|
+
|Must| iterate over the input lines to identify Waterloo-specific tokens.
|
|
227
|
+
|Must| generate a |term|`Pos_Role_Substring_Triple` for each matching token.
|
|
228
|
+
|Must| fall back to default highlighting for non-matching docstring parts.
|
|
229
|
+
Parameters:
|
|
230
|
+
text:
|
|
231
|
+
The input text to tokenize.
|
|
232
|
+
Returns:
|
|
233
|
+
An iterable of |term|`Pos_Role_Substring_Triple` tuples.
|
|
234
|
+
Raises:
|
|
235
|
+
BaseException:
|
|
236
|
+
|May| propagate from the pygments module.
|
|
237
|
+
"""
|
|
238
|
+
self._current_section = ""
|
|
239
|
+
self._current_subsection = ""
|
|
240
|
+
for index, ttype, value in super().get_tokens_unprocessed(text):
|
|
241
|
+
if ttype in String.Doc: # or: ttype is String.Doc
|
|
242
|
+
yield from self.highlight_docstring(index, value)
|
|
243
|
+
else:
|
|
244
|
+
yield index, ttype, value
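The method above follows an intercept-and-delegate pattern: pass most tokens through unchanged and hand docstring tokens to a more specific generator. The sketch below illustrates only that pattern and does not use Pygments. Plain strings stand in for token types, and `base_tokens`, `refine_docstring` and `tokens` are hypothetical names.

```python
from typing import Iterator, Tuple

# (position, token type name, text); strings stand in for real token types.
Token = Tuple[int, str, str]


def base_tokens(text: str) -> Iterator[Token]:
    """Hypothetical stand-in for the parent lexer: one token per line."""
    pos = 0
    for line in text.splitlines(keepends=True):
        kind = "String.Doc" if line.lstrip().startswith('"""') else "Other"
        yield pos, kind, line
        pos += len(line)


def refine_docstring(base: int, value: str) -> Iterator[Token]:
    """Hypothetical refinement: separate the indentation from the body."""
    indent_len = len(value) - len(value.lstrip(" \t"))
    if indent_len:
        yield base, "String.Doc", value[:indent_len]
    yield base + indent_len, "Refined", value[indent_len:]


def tokens(text: str) -> Iterator[Token]:
    """Pass tokens through, delegating docstring tokens to the refiner."""
    for index, kind, value in base_tokens(text):
        if kind == "String.Doc":
            yield from refine_docstring(index, value)
        else:
            yield index, kind, value


if __name__ == "__main__":
    for tok in tokens('x = 1\n  """doc"""\n'):
        print(tok)
```

The refined tokens must still cover the original text exactly, with positions offset by the index of the token they replace.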
|
|
245
|
+
|
|
246
|
+
# waterlint demands the "function" profile here: Pygments' lexer metaclass
|
|
247
|
+
# wraps analyse_text via make_analysator, so at runtime it appears as
|
|
248
|
+
# make_analysator.<locals>.text_analyse. To satisfy waterlint, the docstring
|
|
249
|
+
# therefore uses profile "function", even though this is a static method.
|
|
250
|
+
@staticmethod
|
|
251
|
+
def analyse_text(text: str) -> float:
|
|
252
|
+
"""
|
|
253
|
+
Preamble:
|
|
254
|
+
profile:
|
|
255
|
+
function
|
|
256
|
+
normative_sections:
|
|
257
|
+
Contract, Parameters, Returns, Raises
|
|
258
|
+
Description:
|
|
259
|
+
This method is called by Pygments during lexer selection to prioritize
|
|
260
|
+
the PythonWaterlooLexer over the standard PythonLexer.
|
|
261
|
+
It analyzes the input text for patterns indicative of Waterloo-docstrings,
|
|
262
|
+
such as the presence of "Preamble:" and "Contract:" sections, ensuring
|
|
263
|
+
that files containing structured docstrings are highlighted
|
|
264
|
+
appropriately by this specialized lexer.
|
|
265
|
+
Contract:
|
|
266
|
+
general:
|
|
267
|
+
|Must| determine if the given text resembles a Waterloo-docstring.
|
|
268
|
+
|Must| perform a quick check for presence of "Preamble:" and "Contract:".
|
|
269
|
+
|Must| conduct a detailed line-by-line validation using regex patterns.
|
|
270
|
+
|Must| return 1.0 if both sections are properly matched, otherwise 0.0.
|
|
271
|
+
Parameters:
|
|
272
|
+
text:
|
|
273
|
+
The input text to analyze for Waterloo-docstring characteristics.
|
|
274
|
+
Returns:
|
|
275
|
+
A float value: 1.0 if the text is identified as a Waterloo-docstring, 0.0 otherwise.
|
|
276
|
+
Raises:
|
|
277
|
+
BaseException:
|
|
278
|
+
|May| propagate exceptions from the |mod|`re` module.
|
|
279
|
+
"""
|
|
280
|
+
# This is important in order to prioritize our lexer over the standard Python lexer.
|
|
281
|
+
# Quick check
|
|
282
|
+
if "Preamble:" not in text or "Contract:" not in text:
|
|
283
|
+
return 0.0
|
|
284
|
+
# Closer look
|
|
285
|
+
found_preamble = False
|
|
286
|
+
found_contract = False
|
|
287
|
+
for line in text.splitlines():
|
|
288
|
+
if not found_preamble:
|
|
289
|
+
if "Preamble:" in line:
|
|
290
|
+
if RE_PREAMBLE.match(line):
|
|
291
|
+
found_preamble = True
|
|
292
|
+
if not found_contract:
|
|
293
|
+
if "Contract:" in line:
|
|
294
|
+
if RE_CONTRACT.match(line):
|
|
295
|
+
found_contract = True
|
|
296
|
+
if found_preamble and found_contract:
|
|
297
|
+
return 1.0
|
|
298
|
+
return 0.0
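The two-stage check above (a cheap substring test, then a per-line regex match) can be tried in isolation. The following condensed variant is intended to give the same result without needing Pygments. The function name `waterloo_score` and the local pattern names are illustrative.

```python
import re

# Local copies of the two header patterns, reproduced for illustration.
PREAMBLE = re.compile(r"^\s*Preamble:\s*$")
CONTRACT = re.compile(r"^\s*Contract:\s*$")


def waterloo_score(text: str) -> float:
    """Return 1.0 if both headers appear alone on a line, else 0.0."""
    # Stage 1: cheap substring test that rejects most inputs immediately.
    if "Preamble:" not in text or "Contract:" not in text:
        return 0.0
    # Stage 2: each header must stand alone on its own line.
    lines = text.splitlines()
    has_preamble = any(PREAMBLE.match(line) for line in lines)
    has_contract = any(CONTRACT.match(line) for line in lines)
    return 1.0 if has_preamble and has_contract else 0.0


if __name__ == "__main__":
    doc = '"""\nPreamble:\n    profile:\n        function\nContract:\n    general:\n        text\n"""\n'
    print(waterloo_score(doc))
    print(waterloo_score("Preamble: and Contract: appear inline only"))
```

The second stage is what rejects prose that merely mentions both words inline.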
|
|
299
|
+
|
|
300
|
+
def highlight_docstring(self, base: int, text: str) -> Iterable[tuple[int, object, str]]:
|
|
301
|
+
"""
|
|
302
|
+
Preamble:
|
|
303
|
+
profile:
|
|
304
|
+
method
|
|
305
|
+
normative_sections:
|
|
306
|
+
Definitions, Contract, Parameters, Returns, Raises
|
|
307
|
+
Definitions:
|
|
308
|
+
_inherit:
|
|
309
|
+
Pos_Role_Substring_Triple
|
|
310
|
+
Contract:
|
|
311
|
+
general:
|
|
312
|
+
|Must| check if the given text looks like a Waterloo-docstring.
|
|
313
|
+
|Must| yield the entire text as a single String.Doc token if it does not look like a Waterloo-docstring.
|
|
314
|
+
|Must| reset the parser state and analyze the text line by line if it looks like a Waterloo-docstring.
|
|
315
|
+
|Must| call the |func|`highlight_line` method for each line to identify and yield tokens for that line.
|
|
316
|
+
Parameters:
|
|
317
|
+
base:
|
|
318
|
+
The base index for token positions in the original text.
|
|
319
|
+
text:
|
|
320
|
+
The docstring text to analyze and tokenize.
|
|
321
|
+
Returns:
|
|
322
|
+
An iterable of |term|`Pos_Role_Substring_Triple` tuples for the given docstring.
|
|
323
|
+
Raises:
|
|
324
|
+
BaseException:
|
|
325
|
+
|May| propagate exceptions from the |mod|`re` module.
|
|
326
|
+
|May| propagate exceptions from the |mod|`pygments` module.
|
|
327
|
+
"""
|
|
328
|
+
if not self.looks_like_waterloo_docstring(text):
|
|
329
|
+
yield base, String.Doc, text
|
|
330
|
+
return
|
|
331
|
+
|
|
332
|
+
# Reset parser state per docstring to avoid carry-over between unrelated docstrings.
|
|
333
|
+
self._current_section = ""
|
|
334
|
+
self._current_subsection = ""
|
|
335
|
+
pos = 0
|
|
336
|
+
while pos < len(text):
|
|
337
|
+
nl = text.find("\n", pos)
|
|
338
|
+
# Make sure we identify the last line correctly, even without a trailing newline character
|
|
339
|
+
if nl < 0:
|
|
340
|
+
line = text[pos:]
|
|
341
|
+
next_pos = len(text)
|
|
342
|
+
else:
|
|
343
|
+
# The position is the character after the newline character.
|
|
344
|
+
line = text[pos : nl + 1]
|
|
345
|
+
next_pos = nl + 1
|
|
346
|
+
# We have identified a line, now find out how to highlight.
|
|
347
|
+
yield from self.highlight_line(base + pos, line)
|
|
348
|
+
# Advance to next line.
|
|
349
|
+
pos = next_pos
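The loop above walks the docstring line by line while keeping absolute offsets. A standalone sketch of the same iteration, with the hypothetical helper name `iter_lines_with_offsets`, makes the offset arithmetic easy to check:

```python
from typing import Iterator, Tuple


def iter_lines_with_offsets(text: str, base: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (absolute_offset, line) pairs; each line keeps its trailing newline."""
    pos = 0
    while pos < len(text):
        nl = text.find("\n", pos)
        # The last line may lack a trailing newline, so fall back to the end of text.
        end = len(text) if nl < 0 else nl + 1
        yield base + pos, text[pos:end]
        pos = end


if __name__ == "__main__":
    print(list(iter_lines_with_offsets("a\nbc\nd", base=10)))
```

Searching for `"\n"` explicitly ties the offsets to newline characters only. `str.splitlines` also breaks on other boundaries such as form feed, which would split lines in different places.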
|
|
350
|
+
|
|
351
|
+
def _find_subsection_match(self, stripped: str) -> re.Match[str] | None:
|
|
352
|
+
if self._current_section in SUBSECTIONS_WITH_FREE_FORM_LABELS:
|
|
353
|
+
return RE_SUBSECTION_ANY.match(stripped)
|
|
354
|
+
elif self._current_section in SUBSECTIONS_WITH_CSV_IDENTIFIER_LABELS:
|
|
355
|
+
return RE_SUBSECTION_ANY.match(stripped)
|
|
356
|
+
return RE_SUBSECTION_QUALIFIED_IDENTIFIER.match(stripped)
|
|
357
|
+
|
|
358
|
+
def _token_for_subsection_label(self, section: str, subsection: str) -> Any:
|
|
359
|
+
if section == "Preamble":
|
|
360
|
+
if subsection in PREAMBLE_BUILTIN_SUBSECTIONS:
|
|
361
|
+
return Generic.Subheading
|
|
362
|
+
return Error
|
|
363
|
+
if section == "Contract":
|
|
364
|
+
if subsection in CONTRACT_BUILTIN_SUBSECTIONS:
|
|
365
|
+
return Generic.Subheading
|
|
366
|
+
return Error
|
|
367
|
+
if section == "Definitions":
|
|
368
|
+
if subsection == "_inherit":
|
|
369
|
+
return Keyword
|
|
370
|
+
else:
|
|
371
|
+
return Generic.Emph
|
|
372
|
+
if section == "Terminology":
|
|
373
|
+
return Generic.Emph
|
|
374
|
+
if section in SECTION_MAP_TYPE_IDENTIFIER_TO_FREEFORM:
|
|
375
|
+
return Name.Class
|
|
376
|
+
if section in SECTION_MAP_FUNC_IDENTIFIER_TO_FREEFORM:
|
|
377
|
+
return Name.Function
|
|
378
|
+
if section in SECTION_MAP_VAR_IDENTIFIER_TO_FREEFORM:
|
|
379
|
+
return Name.Variable
|
|
380
|
+
if section in SECTION_MAP_FUNC_QUALIFIED:
|
|
381
|
+
return Name.Function
|
|
382
|
+
if section in SECTION_MAP_EXCEPTION_QUALIFIED_IDENTIFIER_TO_FREEFORM:
|
|
383
|
+
return Name.Exception
|
|
384
|
+
return Generic.Heading
|
|
385
|
+
|
|
386
|
+
def _emit_subsection_line(self, base: int, line: str, match: re.Match[str]) -> Iterable[tuple[int, object, str]]:
|
|
387
|
+
stripped_no_nl = line.rstrip("\r\n")
|
|
388
|
+
line_ending = line[len(stripped_no_nl):]
|
|
389
|
+
prefix, label, suffix = match.groups()
|
|
390
|
+
self._current_subsection = label[:-1].strip()
|
|
391
|
+
cur = base
|
|
392
|
+
if prefix:
|
|
393
|
+
yield cur, String.Doc, prefix
|
|
394
|
+
cur += len(prefix)
|
|
395
|
+
token = self._token_for_subsection_label(self._current_section, self._current_subsection)
|
|
396
|
+
yield cur, token, label
|
|
397
|
+
cur += len(label)
|
|
398
|
+
if suffix:
|
|
399
|
+
yield cur, String.Doc, suffix
|
|
400
|
+
cur += len(suffix)
|
|
401
|
+
if line_ending:
|
|
402
|
+
yield cur, String.Doc, line_ending
|
|
403
|
+
|
|
404
|
+
def _emit_normative_sections_line(self, base: int, line: str) -> Iterable[tuple[int, object, str]]:
|
|
405
|
+
stripped_no_nl = line.rstrip("\r\n")
|
|
406
|
+
line_ending = line[len(stripped_no_nl):]
|
|
407
|
+
cur = base
|
|
408
|
+
|
|
409
|
+
for part in re.split(r"(,)", stripped_no_nl):
|
|
410
|
+
if part == "":
|
|
411
|
+
continue
|
|
412
|
+
if part == ",":
|
|
413
|
+
yield cur, String.Doc, part
|
|
414
|
+
cur += len(part)
|
|
415
|
+
continue
|
|
416
|
+
|
|
417
|
+
leading_len = len(part) - len(part.lstrip())
|
|
418
|
+
trailing_len = len(part) - len(part.rstrip()) if part.strip() else 0  # whitespace-only parts are already covered by leading_len
|
|
419
|
+
ident = part.strip()
|
|
420
|
+
|
|
421
|
+
if leading_len:
|
|
422
|
+
yield cur, String.Doc, part[:leading_len]
|
|
423
|
+
cur += leading_len
|
|
424
|
+
if ident:
|
|
425
|
+
if RE_SECTION_ALLOWED_NORMATIVE.fullmatch(ident):
|
|
426
|
+
yield cur, Name.Constant, ident
|
|
427
|
+
else:
|
|
428
|
+
yield cur, Error, ident
|
|
429
|
+
cur += len(ident)
|
|
430
|
+
if trailing_len:
|
|
431
|
+
yield cur, String.Doc, part[-trailing_len:]
|
|
432
|
+
cur += trailing_len
|
|
433
|
+
|
|
434
|
+
if line_ending:
|
|
435
|
+
yield cur, String.Doc, line_ending
|
|
436
|
+
|
|
437
|
+
def _emit_csv_identifiers_line(self, base: int, line: str, token_for_identifier: Any) -> Iterable[tuple[int, object, str]]:
|
|
438
|
+
stripped_no_nl = line.rstrip("\r\n")
|
|
439
|
+
line_ending = line[len(stripped_no_nl):]
|
|
440
|
+
cur = base
|
|
441
|
+
|
|
442
|
+
for part in re.split(r"(,)", stripped_no_nl):
|
|
443
|
+
if part == "":
|
|
444
|
+
continue
|
|
445
|
+
if part == ",":
|
|
446
|
+
yield cur, String.Doc, part
|
|
447
|
+
cur += len(part)
|
|
448
|
+
continue
|
|
449
|
+
|
|
450
|
+
leading_len = len(part) - len(part.lstrip())
|
|
451
|
+
trailing_len = len(part) - len(part.rstrip()) if part.strip() else 0  # whitespace-only parts are already covered by leading_len
|
|
452
|
+
ident = part.strip()
|
|
453
|
+
|
|
454
|
+
if leading_len:
|
|
455
|
+
yield cur, String.Doc, part[:leading_len]
|
|
456
|
+
cur += leading_len
|
|
457
|
+
if ident:
|
|
458
|
+
yield cur, token_for_identifier, ident
|
|
459
|
+
cur += len(ident)
|
|
460
|
+
if trailing_len:
|
|
461
|
+
yield cur, String.Doc, part[-trailing_len:]
|
|
462
|
+
cur += trailing_len
|
|
463
|
+
|
|
464
|
+
if line_ending:
|
|
465
|
+
yield cur, String.Doc, line_ending
|
|
466
|
+
|
|
467
|
+
def _emit_ref_arg(self, arg_start: int, arg: str) -> Iterable[tuple[int, object, str]]:
|
|
468
|
+
if len(arg) < 2 or arg[0] != "`" or arg[-1] != "`":
|
|
469
|
+
yield arg_start, Error, arg
|
|
470
|
+
return
|
|
471
|
+
|
|
472
|
+
inner = arg[1:-1]
|
|
473
|
+
m_ref = RE_REF_ARG.match(inner)
|
|
474
|
+
if m_ref is None:
|
|
475
|
+
yield arg_start, Error, arg
|
|
476
|
+
return
|
|
477
|
+
|
|
478
|
+
ref_name, ref_target = m_ref.groups()
|
|
479
|
+
pos = arg_start
|
|
480
|
+
yield pos, String, "`"
|
|
481
|
+
pos += 1
|
|
482
|
+
if ref_name:
|
|
483
|
+
yield pos, Name.Namespace, ref_name
|
|
484
|
+
pos += len(ref_name)
|
|
485
|
+
yield pos, String, " <"
|
|
486
|
+
pos += 2
|
|
487
|
+
yield pos, Name.Tag, ref_target
|
|
488
|
+
pos += len(ref_target)
|
|
489
|
+
yield pos, String, ">`"
|
|
490
|
+
|
|
491
|
+
def _emit_inline_line(self, base: int, line: str) -> Iterable[tuple[int, object, str]]:
|
|
492
|
+
cur = 0
|
|
493
|
+
for m in RE_INLINE.finditer(line):
|
|
494
|
+
if m.start() > cur:
|
|
495
|
+
yield base + cur, String.Doc, line[cur : m.start()]
|
|
496
|
+
|
|
497
|
+
token_txt = m.group(0)
|
|
498
|
+
if m.group(1) is not None:
|
|
499
|
+
yield base + m.start(), Keyword, m.group(1)
|
|
500
|
+
elif m.group(2) is not None:
|
|
501
|
+
yield base + m.start(), Keyword.Constant, m.group(2)
|
|
502
|
+
elif m.group(3) is not None and m.group(4) is not None:
|
|
503
|
+
yield base + m.start(), Keyword, m.group(3)
|
|
504
|
+
arg_start = base + m.start() + len(m.group(3))
|
|
505
|
+
yield from self._emit_ref_arg(arg_start, m.group(4))
|
|
506
|
+
elif m.group(5) is not None and m.group(6) is not None: # lit
|
|
507
|
+
yield base + m.start(), Keyword, m.group(5)
|
|
508
|
+
yield base + m.start() + len(m.group(5)), Literal, m.group(6)
|
|
509
|
+
elif m.group(7) is not None and m.group(8) is not None: # var
|
|
510
|
+
yield base + m.start(), Keyword, m.group(7)
|
|
511
|
+
yield base + m.start() + len(m.group(7)), Name.Variable, m.group(8)
|
|
512
|
+
elif m.group(9) is not None and m.group(10) is not None: # type
|
|
513
|
+
yield base + m.start(), Keyword, m.group(9)
|
|
514
|
+
yield base + m.start() + len(m.group(9)), Name.Class, m.group(10)
|
|
515
|
+
elif m.group(11) is not None and m.group(12) is not None: # mod
|
|
516
|
+
yield base + m.start(), Keyword, m.group(11)
|
|
517
|
+
yield base + m.start() + len(m.group(11)), Name.Namespace, m.group(12)
|
|
518
|
+
elif m.group(13) is not None and m.group(14) is not None: # value
|
|
519
|
+
yield base + m.start(), Keyword, m.group(13)
|
|
520
|
+
yield base + m.start() + len(m.group(13)), Name.Constant, m.group(14)
|
|
521
|
+
elif m.group(15) is not None and m.group(16) is not None: # op
|
|
522
|
+
yield base + m.start(), Keyword, m.group(15)
|
|
523
|
+
yield base + m.start() + len(m.group(15)), Operator, m.group(16)
|
|
524
|
+
elif m.group(17) is not None and m.group(18) is not None: # func
|
|
525
|
+
yield base + m.start(), Keyword, m.group(17)
|
|
526
|
+
yield base + m.start() + len(m.group(17)), Name.Function, m.group(18)
|
|
527
|
+
elif m.group(19) is not None and m.group(20) is not None: # label
|
|
528
|
+
yield base + m.start(), Keyword, m.group(19)
|
|
529
|
+
yield base + m.start() + len(m.group(19)), Generic.Subheading, m.group(20)
|
|
530
|
+
elif m.group(21) is not None and m.group(22) is not None: # attr
|
|
531
|
+
yield base + m.start(), Keyword, m.group(21)
|
|
532
|
+
yield base + m.start() + len(m.group(21)), Name.Attribute, m.group(22)
|
|
533
|
+
elif m.group(23) is not None and m.group(24) is not None: # file
|
|
534
|
+
yield base + m.start(), Keyword, m.group(23)
|
|
535
|
+
yield base + m.start() + len(m.group(23)), String.Other, m.group(24)
|
|
536
|
+
elif m.group(25) is not None and m.group(26) is not None: # dfn
|
|
537
|
+
yield base + m.start(), Keyword, m.group(25)
|
|
538
|
+
yield base + m.start() + len(m.group(25)), Generic.Emph, m.group(26)
|
|
539
|
+
elif m.group(27) is not None and m.group(28) is not None: # term
|
|
540
|
+
yield base + m.start(), Keyword, m.group(27)
|
|
541
|
+
yield base + m.start() + len(m.group(27)), Generic.Emph, m.group(28)
|
|
542
|
+
elif m.group(29) is not None and m.group(30) is not None: # cmd
|
|
543
|
+
yield base + m.start(), Keyword, m.group(29)
|
|
544
|
+
yield base + m.start() + len(m.group(29)), Name.Builtin, m.group(30)
|
|
545
|
+
elif m.group(31) is not None and m.group(32) is not None: # opt
|
|
546
|
+
yield base + m.start(), Keyword, m.group(31)
|
|
547
|
+
yield base + m.start() + len(m.group(31)), Name.Variable, m.group(32)
|
|
548
|
+
elif m.group(33) is not None and m.group(34) is not None: # tag
|
|
549
|
+
yield base + m.start(), Keyword, m.group(33)
|
|
550
|
+
yield base + m.start() + len(m.group(33)), Name.Tag, m.group(34)
|
|
551
|
+
elif m.group(35) is not None and m.group(36) is not None: # norm
|
|
552
|
+
yield base + m.start(), Keyword, m.group(35)
|
|
553
|
+
yield base + m.start() + len(m.group(35)), Keyword, m.group(36)
|
|
554
|
+
elif m.group(37) is not None and m.group(38) is not None: # var_type
|
|
555
|
+
yield base + m.start(), Keyword, m.group(37)
|
|
556
|
+
yield base + m.start() + len(m.group(37)), Name.Class, m.group(38)
|
|
557
|
+
elif m.group(39) is not None and m.group(40) is not None: # generic role
|
|
558
|
+
yield base + m.start(), String.Doc, m.group(39)
|
|
559
|
+
yield base + m.start() + len(m.group(39)), String.Doc, m.group(40)
|
|
560
|
+
elif m.group(41) is not None:
|
|
561
|
+
yield base + m.start(), Keyword, m.group(41)
|
|
562
|
+
else:
|
|
563
|
+
yield base + m.start(), Name.Constant, token_txt
|
|
564
|
+
cur = m.end()
|
|
565
|
+
|
|
566
|
+
if cur < len(line):
|
|
567
|
+
yield base + cur, String.Doc, line[cur:]
|
|
568
|
+
|
|
569
|
+
def highlight_line(self, base: int, line: str) -> Iterable[tuple[int, object, str]]:
|
|
570
|
+
"""
|
|
571
|
+
Preamble:
|
|
572
|
+
profile:
|
|
573
|
+
method
|
|
574
|
+
normative_sections:
|
|
575
|
+
Definitions, Contract, Parameters, Returns, Raises
|
|
576
|
+
Definitions:
|
|
577
|
+
_inherit:
|
|
578
|
+
Pos_Role_Substring_Triple
|
|
579
|
+
Contract:
|
|
580
|
+
general:
|
|
581
|
+
|Must| analyze the given line in the context of the current section and subsection.
|
|
582
|
+
|Must| identify section headers and update the current section state.
|
|
583
|
+
|Must| identify subsection headers if the current section allows subsections, and update the current subsection state.
|
|
584
|
+
|Must| identify special markers such as text flow markers and apply appropriate token types.
|
|
585
|
+
|Must| apply specific tokenization rules based on the current section and subsection, such as treating certain lines as lists of identifiers or free-form text.
|
|
586
|
+
|Must| yield tokens with appropriate types for the identified elements in the line.
|
|
587
|
+
Parameters:
|
|
588
|
+
base:
|
|
589
|
+
The base index for token positions in the original text.
|
|
590
|
+
line:
|
|
591
|
+
The line of text to analyze and tokenize.
|
|
592
|
+
Returns:
|
|
593
|
+
An iterable of |term|`Pos_Role_Substring_Triple` tuples for the given line.
|
|
594
|
+
Raises:
|
|
595
|
+
BaseException:
|
|
596
|
+
|May| propagate exceptions from the |mod|`re` module.
|
|
597
|
+
"""
|
|
598
|
+
stripped = line.rstrip("\r\n")
|
|
599
|
+
# Analyze for section labels
|
|
600
|
+
m_sec = RE_SECTION_CAPTURE.match(stripped)
|
|
601
|
+
if m_sec is not None:
|
|
602
|
+
self._current_section = m_sec.group(1)
|
|
603
|
+
self._current_subsection = ""
|
|
604
|
+
yield base, Generic.Heading, line
|
|
605
|
+
return
|
|
606
|
+
# Analyze for subsection labels
|
|
607
|
+
if self._current_section in SECTIONS_WITH_SUBSECTIONS:
|
|
608
|
+
m = self._find_subsection_match(stripped)
|
|
609
|
+
if m is not None:
|
|
610
|
+
yield from self._emit_subsection_line(base, line, m)
|
|
611
|
+
return
|
|
612
|
+
# Analyze for paragraph marker
|
|
613
|
+
if RE_TEXTFLOW_MARKER.fullmatch(stripped):
|
|
614
|
+
yield base, Keyword, line
|
|
615
|
+
return
|
|
616
|
+
# Analyze for bullet list marker
|
|
617
|
+
line_no_nl = line.rstrip("\r\n")
|
|
618
|
+
line_ending = line[len(line_no_nl):]
|
|
619
|
+
m_bullet = RE_LIST_MARKER.match(line_no_nl)
|
|
620
|
+
if m_bullet is not None:
|
|
621
|
+
indent = m_bullet.group(1)
|
|
622
|
+
marker = m_bullet.group(2)
|
|
623
|
+
gap = m_bullet.group(3)
|
|
624
|
+
rest = m_bullet.group(4)
|
|
625
|
+
prefix_len = len(indent) + len(marker) + len(gap)
|
|
626
|
+
if indent:
|
|
627
|
+
yield base, String.Doc, indent
|
|
628
|
+
yield base + len(indent), Keyword, marker
|
|
629
|
+
if gap:
|
|
630
|
+
yield base + len(indent) + len(marker), String.Doc, gap
|
|
631
|
+
if rest:
|
|
632
|
+
yield from self._emit_inline_line(base + prefix_len, rest)
|
|
633
|
+
if line_ending:
|
|
634
|
+
yield base + len(line_no_nl), String.Doc, line_ending
|
|
635
|
+
return
|
|
636
|
+
|
|
637
|
+
# Special handling of "normative_sections"
|
|
638
|
+
if self._current_section == "Preamble":
|
|
639
|
+
if self._current_subsection == "normative_sections":
|
|
640
|
+
yield from self._emit_normative_sections_line(base, line)
|
|
641
|
+
return
|
|
642
|
+
elif self._current_subsection in ("status", "scope"):
|
|
643
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Constant)
|
|
644
|
+
return
|
|
645
|
+
elif self._current_subsection == "profile":
|
|
646
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Constant)
|
|
647
|
+
return
|
|
648
|
+
else:
|
|
649
|
+
yield from self._emit_inline_line(base, line)
|
|
650
|
+
return
|
|
651
|
+
if self._current_section == "Contract":
|
|
652
|
+
if self._current_subsection == "traits":
|
|
653
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Constant)
|
|
654
|
+
return
|
|
655
|
+
elif self._current_subsection == "base":
|
|
656
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Class)
|
|
657
|
+
return
|
|
658
|
+
else:
|
|
659
|
+
yield from self._emit_inline_line(base, line)
|
|
660
|
+
return
|
|
661
|
+
|
|
662
|
+
if self._current_section in SECTION_LIST_TYPE_QUALIFIED:
|
|
663
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Class)
|
|
664
|
+
return
|
|
665
|
+
if self._current_section in SECTION_LIST_FUNC_QUALIFIED:
|
|
666
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Function)
|
|
667
|
+
return
|
|
668
|
+
if self._current_section in SECTION_LIST_GENERIC_QUALIFIED:
|
|
669
|
+
yield from self._emit_csv_identifiers_line(base, line, Name.Variable)
|
|
670
|
+
return
|
|
671
|
+
if self._current_section in SECTION_MAP_EXCEPTION_QUALIFIED_IDENTIFIER_TO_FREEFORM:
|
|
672
|
+
yield from self._emit_inline_line(base, line)
|
|
673
|
+
return
|
|
674
|
+
if self._current_section in SECTION_MAP_TERM_IDENTIFIER_TO_FREEFORM:
|
|
675
|
+
yield from self._emit_inline_line(base, line)
|
|
676
|
+
return
|
|
677
|
+
if self._current_section in SECTION_MAP_ANY_IDENTIFIER_TO_FREEFORM:
|
|
678
|
+
yield from self._emit_inline_line(base, line)
|
|
679
|
+
return
|
|
680
|
+
if self._current_section in SECTION_MAP_VAR_IDENTIFIER_TO_FREEFORM:
|
|
681
|
+
yield from self._emit_inline_line(base, line)
|
|
682
|
+
return
|
|
683
|
+
if self._current_section in SECTION_MAP_TYPE_IDENTIFIER_TO_FREEFORM:
|
|
684
|
+
yield from self._emit_inline_line(base, line)
|
|
685
|
+
return
|
|
686
|
+
if self._current_section in SECTION_MAP_FUNC_IDENTIFIER_TO_FREEFORM:
|
|
687
|
+
yield from self._emit_inline_line(base, line)
|
|
688
|
+
return
|
|
689
|
+
if self._current_section in ("Returns", "Description"):
|
|
690
|
+
yield from self._emit_inline_line(base, line)
|
|
691
|
+
return
|
|
692
|
+
|
|
693
|
+
yield from self._emit_inline_line(base, line)
|
|
694
|
+
|
|
695
|
+
@staticmethod
|
|
696
|
+
def has_mixed_indentation(text: str) -> bool:
|
|
697
|
+
r"""
|
|
698
|
+
Preamble:
|
|
699
|
+
profile:
|
|
700
|
+
method
|
|
701
|
+
normative_sections:
|
|
702
|
+
Contract, Parameters, Returns, Raises
|
|
703
|
+
Contract:
|
|
704
|
+
general:
|
|
705
|
+
|Must| check for the presence of mixed indentation in the text.
|
|
706
|
+
|Must| consider a line to have mixed indentation if it contains\
|
|
707
|
+
both tabs and spaces in the leading whitespace.
|
|
708
|
+
|Must| consider the text to have mixed indentation if there are\
|
|
709
|
+
some lines that use tabs for indentation and some lines that\
|
|
710
|
+
use spaces, even if no individual line has mixed indentation.
|
|
711
|
+
Parameters:
|
|
712
|
+
text:
|
|
713
|
+
The string to analyze for mixed indentation.
|
|
714
|
+
Returns:
|
|
715
|
+
|True| if the text has mixed indentation, |False| otherwise.
|
|
716
|
+
Raises:
|
|
717
|
+
BaseException:
|
|
718
|
+
|May| propagate from module |mod|`re`.
|
|
719
|
+
|May| propagate from module |mod|`sys`.
|
|
720
|
+
"""
|
|
721
|
+
uses_tabs = False
|
|
722
|
+
uses_spaces = False
|
|
723
|
+
for line in text.splitlines():
|
|
724
|
+
if not line.strip():
|
|
725
|
+
continue
|
|
726
|
+
indent = line[: len(line) - len(line.lstrip())]
|
|
727
|
+
if not indent:
|
|
728
|
+
continue
|
|
729
|
+
if " " in indent and "\t" in indent:
|
|
730
|
+
return True
|
|
731
|
+
if "\t" in indent:
|
|
732
|
+
uses_tabs = True
|
|
733
|
+
if " " in indent:
|
|
734
|
+
uses_spaces = True
|
|
735
|
+
if uses_tabs and uses_spaces:
|
|
736
|
+
return True
|
|
737
|
+
return False
|
|
738
|
+
|
|
739
|
+
@staticmethod
|
|
740
|
+
def looks_like_waterloo_docstring(text: str) -> bool:
|
|
741
|
+
r"""
|
|
742
|
+
Preamble:
|
|
743
|
+
profile:
|
|
744
|
+
method
|
|
745
|
+
normative_sections:
|
|
746
|
+
Contract, Parameters, Returns, Raises
|
|
747
|
+
scope:
|
|
748
|
+
core
|
|
749
|
+
Description:
|
|
750
|
+
This method performs a heuristic analysis of the input text
|
|
751
|
+
to determine if it resembles a Waterloo-docstring.
|
|
752
|
+
For a full and exact validation use the tool |cmd|`waterlint` instead,
|
|
753
|
+
which implements the complete set of normative rules for Waterloo-docstrings.
|
|
754
|
+
Contract:
|
|
755
|
+
general:
|
|
756
|
+
|Must| look for the presence of a |label|`Preamble` section and a |label|`Contract` section (PRE-001, CON-001).
|
|
757
|
+
|Must| check that the section labels match the expected format (PRSR-001, PRSR-002, PRSR-003, PRSR-004, PRSR-005).
|
|
758
|
+
|Must| check that the |label|`Preamble` and |label|`Contract` sections appear in the correct order (PRE-001).
|
|
759
|
+
|Must| check that there is no mixed indentation in the docstring (TKN-003).
|
|
760
|
+
Parameters:
|
|
761
|
+
text:
|
|
762
|
+
The string to analyze.
|
|
763
|
+
Returns:
|
|
764
|
+
|True| if the string looks like a Waterloo docstring, |False| otherwise.
|
|
765
|
+
Raises:
|
|
766
|
+
BaseException:
|
|
767
|
+
|May| propagate from module |mod|`re`.
|
|
768
|
+
"""
|
|
769
|
+
found_preamble = False
|
|
770
|
+
found_contract = False
|
|
771
|
+
for line in text.splitlines():
|
|
772
|
+
if not found_preamble and RE_PREAMBLE.match(line):
|
|
773
|
+
found_preamble = True
|
|
774
|
+
if not found_contract and RE_CONTRACT.match(line):
|
|
775
|
+
if not found_preamble:
|
|
776
|
+
return False
|
|
777
|
+
found_contract = True
|
|
778
|
+
if found_preamble and found_contract:
|
|
779
|
+
if PythonWaterlooLexer.has_mixed_indentation(text):
|
|
780
|
+
return False
|
|
781
|
+
return True
|
|
782
|
+
return False
|
|
783
|
+
|
|
784
|
+
def _patch_analyse_text_docstring() -> None:
|
|
785
|
+
try:
|
|
786
|
+
src = inspect.getsource(PythonWaterlooLexer)
|
|
787
|
+
tree = ast.parse(textwrap.dedent(src))
|
|
788
|
+
for node in tree.body:
|
|
789
|
+
if isinstance(node, ast.ClassDef) and node.name == "PythonWaterlooLexer":
|
|
790
|
+
for child in node.body:
|
|
791
|
+
if isinstance(child, ast.FunctionDef) and child.name == "analyse_text":
|
|
792
|
+
doc = ast.get_docstring(child, clean=False)
|
|
793
|
+
if doc:
|
|
794
|
+
PythonWaterlooLexer.analyse_text.__doc__ = doc
|
|
795
|
+
return
|
|
796
|
+
except Exception:
|
|
797
|
+
pass
|
|
798
|
+
|
|
799
|
+
_patch_analyse_text_docstring()
|
|
800
|
+
|
|
801
|
+
# Don't change this.
|
|
802
|
+
if __name__ == "__main__":
|
|
803
|
+
print(__version__)
|
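The comma-separated tokenization performed by `_emit_csv_identifiers_line` above can be illustrated in isolation. The following is a minimal standalone sketch, not the package's code: the function name `split_csv_with_offsets` is hypothetical, and plain strings (`"text"`, `"ident"`) stand in for the Pygments token types. It shows how `re.split` with a capturing group keeps the commas as separate parts, so the emitted offsets stay contiguous and the pieces concatenate back to the input line. The sketch measures trailing whitespace on what remains after the leading whitespace is removed, so a whitespace-only part is emitted exactly once.

```python
import re
from typing import Iterator


def split_csv_with_offsets(base: int, line: str) -> Iterator[tuple[int, str, str]]:
    """Yield (offset, kind, text) triples that cover every character of `line`.

    `kind` is "ident" for an identifier and "text" for commas, whitespace,
    and the line ending. These labels are placeholders for real token types.
    """
    body = line.rstrip("\r\n")
    line_ending = line[len(body):]
    cur = base

    # The capturing group makes re.split return the commas as their own parts.
    for part in re.split(r"(,)", body):
        if part == "":
            continue
        if part == ",":
            yield cur, "text", part
            cur += 1
            continue

        without_leading = part.lstrip()
        leading = len(part) - len(without_leading)
        ident = without_leading.rstrip()
        trailing = len(without_leading) - len(ident)

        if leading:
            yield cur, "text", part[:leading]
            cur += leading
        if ident:
            yield cur, "ident", ident
            cur += len(ident)
        if trailing:
            yield cur, "text", part[len(part) - trailing:]
            cur += trailing

    if line_ending:
        yield cur, "text", line_ending
```

Calling `list(split_csv_with_offsets(0, "Contract, Returns\n"))` yields one triple per identifier, comma, whitespace run, and line ending. Joining the third element of each triple reproduces the original line, which is the property a Pygments lexer needs from its token stream.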