ljavalang 2.0.1 (tar.gz)
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- ljavalang-2.0.1/LICENSE.txt +19 -0
- ljavalang-2.0.1/PKG-INFO +279 -0
- ljavalang-2.0.1/README.md +252 -0
- ljavalang-2.0.1/README.rst +158 -0
- ljavalang-2.0.1/javalang/__init__.py +8 -0
- ljavalang-2.0.1/javalang/ast.py +86 -0
- ljavalang-2.0.1/javalang/javadoc.py +120 -0
- ljavalang-2.0.1/javalang/parse.py +53 -0
- ljavalang-2.0.1/javalang/parser.py +2884 -0
- ljavalang-2.0.1/javalang/test/__init__.py +0 -0
- ljavalang-2.0.1/javalang/test/source/package-info/AnnotationJavadoc.java +5 -0
- ljavalang-2.0.1/javalang/test/source/package-info/AnnotationOnly.java +2 -0
- ljavalang-2.0.1/javalang/test/source/package-info/JavadocAnnotation.java +5 -0
- ljavalang-2.0.1/javalang/test/source/package-info/JavadocOnly.java +4 -0
- ljavalang-2.0.1/javalang/test/source/package-info/NoAnnotationNoJavadoc.java +1 -0
- ljavalang-2.0.1/javalang/test/test_java_10_11_syntax.py +66 -0
- ljavalang-2.0.1/javalang/test/test_java_14_15_syntax.py +195 -0
- ljavalang-2.0.1/javalang/test/test_java_16_17_syntax.py +99 -0
- ljavalang-2.0.1/javalang/test/test_java_21_syntax.py +140 -0
- ljavalang-2.0.1/javalang/test/test_java_8_syntax.py +241 -0
- ljavalang-2.0.1/javalang/test/test_java_9_syntax.py +115 -0
- ljavalang-2.0.1/javalang/test/test_javadoc.py +14 -0
- ljavalang-2.0.1/javalang/test/test_package_declaration.py +61 -0
- ljavalang-2.0.1/javalang/test/test_tokenizer.py +192 -0
- ljavalang-2.0.1/javalang/test/test_upstream_features.py +120 -0
- ljavalang-2.0.1/javalang/test/test_upstream_issues.py +176 -0
- ljavalang-2.0.1/javalang/test/test_util.py +69 -0
- ljavalang-2.0.1/javalang/tokenizer.py +687 -0
- ljavalang-2.0.1/javalang/tree.py +340 -0
- ljavalang-2.0.1/javalang/util.py +165 -0
- ljavalang-2.0.1/javalang/visitor.py +48 -0
- ljavalang-2.0.1/ljavalang.egg-info/PKG-INFO +279 -0
- ljavalang-2.0.1/ljavalang.egg-info/SOURCES.txt +35 -0
- ljavalang-2.0.1/ljavalang.egg-info/dependency_links.txt +1 -0
- ljavalang-2.0.1/ljavalang.egg-info/top_level.txt +1 -0
- ljavalang-2.0.1/pyproject.toml +49 -0
- ljavalang-2.0.1/setup.cfg +4 -0

ljavalang-2.0.1/LICENSE.txt
ADDED
@@ -0,0 +1,19 @@

Copyright (c) 2013 Christopher Thunes

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

ljavalang-2.0.1/PKG-INFO
ADDED
@@ -0,0 +1,279 @@

Metadata-Version: 2.1
Name: ljavalang
Version: 2.0.1
Summary: Enhanced Java parser with Java 9-22 support, fork of javalang
Author-email: LoRexxar <lorexxar@gmail.com>
Maintainer-email: LoRexxar <lorexxar@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/LoRexxar/Ljavalang
Project-URL: Repository, https://github.com/LoRexxar/Ljavalang
Project-URL: Issues, https://github.com/LoRexxar/Ljavalang/issues
Keywords: java,parser,ast,static-analysis,javalang
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Topic :: Software Development :: Compilers
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE.txt

# Ljavalang

> An enhanced fork of [javalang](https://github.com/c2nes/javalang) that fixes upstream AST-construction defects and adds support for Java 9-22 syntax, giving static analysis tools such as [Kunlun-M](https://github.com/LoRexxar/Kunlun-M) an accurate Java syntax tree.

[Tests](https://github.com/LoRexxar/Ljavalang/actions/workflows/tests.yml)

## Differences from upstream

| Feature | Upstream javalang | Ljavalang |
|------|:---:|:---:|
| Java 8 syntax | ✅ | ✅ |
| **Chained-call fix** | ❌ `a.b().c()` parsed as a flat `selectors` list | ✅ Correctly nested as a qualifier chain |
| **Java 9** try-with-resources on effectively final variables | ❌ | ✅ |
| **Java 9** module-info | ❌ | ✅ |
| **Java 10** `var` type inference | ❌ | ✅ |
| **Java 14** switch expressions (arrow / `yield`) | ❌ | ✅ |
| **Java 14** pattern matching for `instanceof` | ❌ | ✅ |
| **Java 15** text blocks (triple-quoted strings) | ❌ | ✅ |
| **Java 16** record classes | ❌ | ✅ |
| **Java 17** sealed / permits / non-sealed | ❌ | ✅ |
| **Java 21** pattern matching for switch | ❌ | ✅ |
| **Java 21** record patterns (deconstruction) | ❌ | ✅ |
| **Java 22** unnamed variable `_` | ❌ | ✅ |
| **Upstream issue fixes** | Some left unfixed | ✅ All 151 issues analyzed, 32 bugs verified |
| **Token position ranges** | ❌ | ✅ `Position.range` |
| **Visitor pattern** | ❌ | ✅ `javalang.visitor.JavaVisitor` |
| **Receiver parameter** | ❌ | ✅ Java 8 `Type.this` parameter |

## Installation

```bash
pip install git+https://github.com/LoRexxar/Ljavalang.git@develop
```

Or clone the repository and install it locally:

```bash
git clone https://github.com/LoRexxar/Ljavalang.git
cd Ljavalang
pip install -e .
```

## Quick start

Usage is fully compatible with upstream javalang:

```python
>>> import javalang
>>> tree = javalang.parse.parse('package com.example; class Test {}')
>>> tree.package.name
'com.example'
>>> tree.types[0].name
'Test'
```

### New syntax examples

**Java 14 switch expression:**
```python
>>> code = '''
... class T {
...     int m(int x) {
...         return switch(x) {
...             case 1 -> 10;
...             case 2 -> 20;
...             default -> 0;
...         };
...     }
... }'''
>>> tree = javalang.parse.parse(code)
>>> # The expression in the return statement is a SwitchExpression
>>> tree.types[0].body[0].body[0].expression
SwitchExpression
```

**Java 16 record:**
```python
>>> tree = javalang.parse.parse('record Point(int x, int y) {}')
>>> tree.types[0]
RecordDeclaration
>>> tree.types[0].name
'Point'
```

**Java 21 record pattern:**
```python
>>> code = '''
... class T {
...     record Point(int x, int y) {}
...     void m(Object o) {
...         switch(o) {
...             case Point(int x, int y) -> System.out.println(x + y);
...             default -> {}
...         }
...     }
... }'''
>>> javalang.parse.parse(code)  # parses without errors
```

**Chained calls (core bug fix):**
```python
>>> code = 'class T { void m(String cmd) { Runtime.getRuntime().exec(cmd); } }'
>>> tree = javalang.parse.parse(code)
>>> # Upstream wrongly puts exec into the selectors list
>>> # Ljavalang parses it as a nested chain of MethodInvocation qualifiers
```
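The following minimal, library-independent sketch makes the difference concrete. The `MethodCall` dataclass and the `call_chain()` helper are hypothetical stand-ins, not Ljavalang's `MethodInvocation` API. Each call points at its receiver through `qualifier`, so `Runtime.getRuntime().exec(cmd)` becomes exec → getRuntime() → Runtime. A scanner can then recover the full receiver chain by walking the qualifiers.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class MethodCall:
    """Hypothetical node: a method call whose receiver is `qualifier`."""
    name: str
    # Either another call (nested chain), a plain name such as "Runtime", or None.
    qualifier: Optional[Union["MethodCall", str]] = None


def call_chain(node: MethodCall) -> str:
    """Rebuild the dotted source form by walking qualifiers from the outermost call."""
    parts = []
    current: Optional[Union[MethodCall, str]] = node
    while isinstance(current, MethodCall):
        parts.append(current.name + "()")
        current = current.qualifier
    if current is not None:
        parts.append(current)  # the leading plain name, e.g. a class
    return ".".join(reversed(parts))


# Runtime.getRuntime().exec(cmd) as a nested qualifier chain (arguments omitted)
expr = MethodCall("exec", qualifier=MethodCall("getRuntime", qualifier="Runtime"))
print(call_chain(expr))  # Runtime.getRuntime().exec()
```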

### Visitor-based traversal

```python
from javalang.visitor import JavaVisitor

class MethodCollector(JavaVisitor):
    def __init__(self):
        super().__init__()
        self.methods = []

    def visit_MethodDeclaration(self, node):
        self.methods.append(node.name)
        self.generic_visit(node)  # keep descending into the method's children

collector = MethodCollector()
collector.visit(tree)  # `tree` is a CompilationUnit from javalang.parse.parse(...)
print(collector.methods)  # ['foo', 'bar', ...]
```
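The `visit_<NodeType>` naming suggests name-based dispatch in the style of Python's `ast.NodeVisitor`. The following self-contained sketch shows how that kind of dispatch and `generic_visit` recursion typically work. The tiny `Node` classes are hypothetical, and this is not necessarily how `javalang.visitor.JavaVisitor` is implemented.

```python
class Node:
    """Hypothetical minimal AST node; `attrs` lists the child attributes."""
    attrs = ()

    def __init__(self, **kwargs):
        for name in self.attrs:
            setattr(self, name, kwargs.get(name))

    def children(self):
        return [getattr(self, name) for name in self.attrs]


class ClassDeclaration(Node):
    attrs = ("name", "body")


class MethodDeclaration(Node):
    attrs = ("name", "body")


class Visitor:
    def visit(self, node):
        # Dispatch to visit_<ClassName> if defined, else fall back to generic_visit.
        handler = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return handler(node)

    def generic_visit(self, node):
        # Recurse into every child Node, including Nodes stored inside lists.
        for child in node.children():
            for item in (child if isinstance(child, list) else [child]):
                if isinstance(item, Node):
                    self.visit(item)


class MethodCollector(Visitor):
    def __init__(self):
        self.methods = []

    def visit_MethodDeclaration(self, node):
        self.methods.append(node.name)
        self.generic_visit(node)  # keep descending into the body, e.g. local classes


tree = ClassDeclaration(name="T", body=[
    MethodDeclaration(name="foo", body=[]),
    ClassDeclaration(name="Inner", body=[MethodDeclaration(name="bar", body=[])]),
])
collector = MethodCollector()
collector.visit(tree)
print(collector.methods)  # ['foo', 'bar']
```

Classes without a matching `visit_*` method, such as `ClassDeclaration` here, fall through to `generic_visit`. This is why the method in the nested class is still found.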

### Token position ranges

```python
from javalang.tokenizer import tokenize

code = 'int x = 42;'
for token in tokenize(code):
    r = token.position.range  # slice-like: has .start/.stop and works in code[r]
    print(f'{token.value} -> code[{r.start}:{r.stop}] = {code[r]!r}')
# int -> code[0:3] = 'int'
# x -> code[4:5] = 'x'
# = -> code[6:7] = '='
# 42 -> code[8:10] = '42'
# ; -> code[10:11] = ';'
```
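Offsets are convenient for slicing, but diagnostics usually need a line and a column. The following minimal stdlib sketch converts a character offset (such as `r.start` above) into a `(line, column)` pair. `LineIndex` is a hypothetical helper, not part of Ljavalang, and the 1-based convention is an assumption.

```python
import bisect


class LineIndex:
    """Hypothetical helper: maps 0-based character offsets to 1-based (line, column)."""

    def __init__(self, source: str):
        # Offset at which each line starts; line 1 always starts at offset 0.
        self.line_starts = [0]
        for i, ch in enumerate(source):
            if ch == "\n":
                self.line_starts.append(i + 1)

    def line_col(self, offset: int) -> tuple:
        # The containing line is the last line start that is <= offset.
        line = bisect.bisect_right(self.line_starts, offset)
        column = offset - self.line_starts[line - 1] + 1
        return line, column


code = "class T {\n    int x = 42;\n}"
index = LineIndex(code)
start = code.index("42")
print(start, index.line_col(start))  # 22 (2, 13)
```

`bisect` keeps each lookup logarithmic in the number of lines, which helps when many token offsets from a large file are mapped.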

## Tests

```bash
# Run the suite (112 test cases), excluding the Java 8 syntax and package-declaration test files
python -m pytest javalang/test/ -v \
    --ignore=javalang/test/test_java_8_syntax.py \
    --ignore=javalang/test/test_package_declaration.py

# Run only the tests for one Java version
python -m pytest javalang/test/test_java_21_syntax.py -v

# Run only the upstream-issue regression tests
python -m pytest javalang/test/test_upstream_issues.py javalang/test/test_upstream_features.py -v
```

Test matrix: Python 3.9 / 3.10 / 3.11 / 3.12, run automatically on GitHub Actions.

## Supported Java syntax features

<details>
<summary>Full list (click to expand)</summary>

### Java 8 (already supported upstream)
- Lambda expressions
- Method references
- Type annotations
- Interface default/static methods
- General try-with-resources
- Receiver parameters (the `Inner.this` parameter)

### Java 9
- `try`-with-resources on effectively final variables
- `module-info.java` (module / open module / requires / exports / opens / uses / provides)
- Private interface methods
- Diamond operator with anonymous classes

### Java 10-11
- `var` local variable type inference
- `var` in for-each / try-with-resources
- `var` in lambda parameters

### Java 14
- Switch expressions (`case X ->` arrow syntax)
- Switch used as an expression (`return switch(...)` / right-hand side of an assignment)
- Multi-label cases (`case 1, 2, 3 ->`)
- `yield` statement
- Pattern matching for `instanceof` (`obj instanceof String s`)

### Java 15
- Text blocks (`"""..."""` triple-quoted strings)

### Java 16
- `record` class declarations
- Local records / enums (inside method bodies)
- Records as class members

### Java 17
- `sealed` classes / interfaces
- `permits` clause
- `non-sealed` modifier

### Java 21
- Pattern matching for switch (`case String s ->`)
- Record pattern deconstruction (`case Point(int x, int y) ->`)
- Nested record patterns
- `case null` matching

### Java 22
- Unnamed variable `_`
- Unnamed lambda parameters

### Upstream bug fixes (32)
- **Chained calls**: `a.b().c()` is no longer flattened into `selectors`; it is correctly nested as a qualifier chain
- **DecimalInteger inheritance**: subclasses `Integer` instead of skipping a level to `Literal`
- **Character token**: a char literal such as `'a'` produces a `Character` token instead of a `String` token
- **Annotations inside generics**: `List<@NotNull String>` parses correctly
- **void return type**: `return_type` is `'void'` instead of `None`
- **prefix/postfix preserved**: unary operators inside parentheses are no longer dropped

</details>

## Project structure

```
javalang/
├── parse.py          # Entry points: parse() / parse_expression(), etc.
├── parser.py         # Recursive-descent parser (~2800 lines)
├── tokenizer.py      # Lexer (~700 lines)
├── tree.py           # AST node definitions (~340 lines)
├── visitor.py        # Visitor-pattern traversal
├── test/             # Test cases (112)
│   ├── test_java_9_syntax.py
│   ├── test_java_10_11_syntax.py
│   ├── test_java_14_15_syntax.py
│   ├── test_java_16_17_syntax.py
│   ├── test_java_21_syntax.py
│   ├── test_upstream_issues.py    # Upstream bug regression tests
│   └── test_upstream_features.py  # Upstream feature tests
└── docs/
    ├── architecture.md            # Architecture documentation
    ├── java-version-roadmap.md    # Java version support roadmap
    ├── upstream-issues.md         # Classification of the 151 upstream issues
    └── issue-fix-progress.md      # Fix progress tracking
```

## Acknowledgements

Built on [c2nes/javalang](https://github.com/c2nes/javalang) (by Chris Thunes) to provide Java parsing for the [Kunlun-M](https://github.com/LoRexxar/Kunlun-M) white-box scanner.

## License

MIT License (inherited from upstream)

ljavalang-2.0.1/README.rst
ADDED
@@ -0,0 +1,158 @@

========
javalang
========

.. image:: https://travis-ci.org/c2nes/javalang.svg?branch=master
    :target: https://travis-ci.org/c2nes/javalang

.. image:: https://badge.fury.io/py/javalang.svg
    :target: https://badge.fury.io/py/javalang

javalang is a pure Python library for working with Java source
code. javalang provides a lexer and parser targeting Java 8. The
implementation is based on the Java language spec available at
http://docs.oracle.com/javase/specs/jls/se8/html/.

The following gives a very brief introduction to using javalang.

---------------
Getting Started
---------------

.. code-block:: python

    >>> import javalang
    >>> tree = javalang.parse.parse("package javalang.brewtab.com; class Test {}")

This will return a ``CompilationUnit`` instance. This object is the root of a
tree that can be traversed to extract different information about the
compilation unit:

.. code-block:: python

    >>> tree.package.name
    'javalang.brewtab.com'
    >>> tree.types[0]
    ClassDeclaration
    >>> tree.types[0].name
    'Test'

The string passed to ``javalang.parse.parse()`` must represent a complete unit,
which simply means it should be a complete, valid Java source file. Other
methods in the ``javalang.parse`` module allow smaller code snippets to
be parsed without providing an entire compilation unit.

Working with the syntax tree
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``CompilationUnit`` is a subclass of ``javalang.ast.Node``, as are its
descendants in the tree. The ``javalang.tree`` module defines the different
``Node`` subclasses, each of which represents one of the syntactic
elements you will find in Java code. For more detail on which node types are
available, see the ``javalang/tree.py`` source file until the documentation is
complete.

``Node`` instances support iteration:

.. code-block:: python

    >>> for path, node in tree:
    ...     print(path, node)
    ...
    () CompilationUnit
    (CompilationUnit,) PackageDeclaration
    (CompilationUnit, [ClassDeclaration]) ClassDeclaration

This iteration can also be filtered by type:

.. code-block:: python

    >>> for path, node in tree.filter(javalang.tree.ClassDeclaration):
    ...     print(path, node)
    ...
    (CompilationUnit, [ClassDeclaration]) ClassDeclaration
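The iteration above can be pictured as a depth-first, pre-order walk that yields each node together with the path of its ancestors. The following standalone sketch uses hypothetical ``Node`` classes. Unlike javalang, whose paths also include the containing lists (the ``[ClassDeclaration]`` entries above), it records only ancestor nodes.

```python
class Node:
    """Hypothetical minimal node type; children are other Node instances."""

    def __init__(self, *children):
        self.children = list(children)

    def __repr__(self):
        return type(self).__name__

    def walk(self, path=()):
        # Pre-order: yield this node first, then each subtree with an extended path.
        yield path, self
        for child in self.children:
            yield from child.walk(path + (self,))

    def filter(self, node_type):
        # Same walk, keeping only nodes of the requested type (subclasses included).
        for path, node in self.walk():
            if isinstance(node, node_type):
                yield path, node


class CompilationUnit(Node):
    pass


class PackageDeclaration(Node):
    pass


class ClassDeclaration(Node):
    pass


tree = CompilationUnit(PackageDeclaration(), ClassDeclaration())
for path, node in tree.walk():
    print(path, node)
# () CompilationUnit
# (CompilationUnit,) PackageDeclaration
# (CompilationUnit,) ClassDeclaration
```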

---------------
Component Usage
---------------

Internally, ``javalang.parse.parse`` is a simple method that creates
a token stream for the input, initializes a new ``javalang.parser.Parser``
instance with the given token stream, and then invokes the parser's ``parse()``
method, returning the resulting ``CompilationUnit``. These components may
also be used individually.

Tokenizer
^^^^^^^^^

The tokenizer/lexer may be invoked directly by calling ``javalang.tokenizer.tokenize``:

.. code-block:: python

    >>> javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
    <generator object tokenize at 0x1ce5190>

This returns a generator that provides a stream of ``JavaToken`` objects. Each
token carries position (line, column) and value information:

.. code-block:: python

    >>> tokens = list(javalang.tokenizer.tokenize('System.out.println("Hello " + "world");'))
    >>> tokens[6].value
    '"Hello "'
    >>> tokens[6].position
    (1, 19)

The tokens are not direct instances of ``JavaToken``. Instead, they are
instances of subclasses that identify their general type:

.. code-block:: python

    >>> type(tokens[6])
    <class 'javalang.tokenizer.String'>
    >>> type(tokens[7])
    <class 'javalang.tokenizer.Operator'>
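Typed token subclasses let later stages dispatch with ``isinstance`` instead of inspecting token text. The following minimal regex-based sketch illustrates the idea. Its class names mirror the ones shown above, but the patterns cover only a tiny subset of Java, and this is not javalang's tokenizer. The sketch also places ``DecimalInteger`` under ``Integer``, matching the fork's DecimalInteger inheritance fix, so a single ``isinstance(tok, Integer)`` check covers it.

```python
import re


class JavaToken:
    def __init__(self, value, position):
        self.value = value
        self.position = position  # (line, column); 1-based in this sketch

    def __repr__(self):
        return f"{type(self).__name__} {self.value!r}"


class Identifier(JavaToken): pass
class Separator(JavaToken): pass
class Operator(JavaToken): pass
class Literal(JavaToken): pass
class String(Literal): pass
class Integer(Literal): pass
class DecimalInteger(Integer): pass


# Order matters: earlier alternatives win when several could match.
TOKEN_SPEC = [
    (String, r'"(?:\\.|[^"\\\n])*"'),
    (DecimalInteger, r"\d+"),
    (Identifier, r"[A-Za-z_$][\w$]*"),
    (Separator, r"[(){}\[\];,.]"),
    (Operator, r"[+\-*/=<>!&|]"),
]
MASTER = re.compile(
    "|".join(f"(?P<T{i}>{pattern})" for i, (_, pattern) in enumerate(TOKEN_SPEC))
    + r"|(?P<WS>\s+)"
)


def tokenize(code):
    pos, line, line_start = 0, 1, 0
    while pos < len(code):
        match = MASTER.match(code, pos)
        if match is None:
            raise ValueError(f"unexpected character {code[pos]!r} at offset {pos}")
        text = match.group()
        if match.lastgroup != "WS":
            token_class = TOKEN_SPEC[int(match.lastgroup[1:])][0]
            yield token_class(text, (line, pos - line_start + 1))
        if "\n" in text:  # only whitespace can span lines in this sketch
            line += text.count("\n")
            line_start = pos + text.rfind("\n") + 1
        pos = match.end()


tokens = list(tokenize('System.out.println("Hello " + "world");'))
print(type(tokens[6]).__name__, type(tokens[7]).__name__)  # String Operator
print(isinstance(list(tokenize("x = 42"))[2], Integer))    # True
```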

**NOTE:** The shift operators ``>>`` and ``>>>`` are represented by multiple
``>`` tokens. This is because multiple ``>`` may appear in a row when closing
nested generic parameter/argument lists. The parser resolves this ambiguity
instead.
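The following sketch shows one way a parser could resolve that ambiguity, under some assumptions. If each ``>`` token records its character offset, then in an expression context the parser can fuse two or three physically adjacent ``>`` tokens into ``>>`` or ``>>>``. In a type-argument context, it simply consumes one ``>`` at a time. The ``(value, offset)`` token pairs and the ``read_shift_operator`` helper are hypothetical and do not reflect javalang's implementation. Compound assignments such as ``>>=`` are not handled.

```python
def read_shift_operator(tokens, i):
    """Try to read '>', '>>' or '>>>' starting at tokens[i] in an expression context.

    tokens is a list of (value, offset) pairs in which every '>' is its own token.
    Returns (operator, next_index), or None if tokens[i] is not '>'.
    Only '>' tokens with no gap between them are fused, so '> >' stays separate.
    """
    if tokens[i][0] != ">":
        return None
    j = i + 1
    while (
        j < len(tokens)
        and j - i < 3  # at most three characters: '>>>'
        and tokens[j][0] == ">"
        and tokens[j][1] == tokens[j - 1][1] + 1  # directly adjacent in the source
    ):
        j += 1
    return ">" * (j - i), j


# Tokens for the expression `a >>> 2`; offsets are character positions.
tokens = [("a", 0), (">", 2), (">", 3), (">", 4), ("2", 6)]
print(read_shift_operator(tokens, 1))  # ('>>>', 4)
```

When closing type arguments, as in ``List<List<String>>``, the parser would not call this helper, so each ``>`` closes exactly one list.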

Parser
^^^^^^

To parse snippets of code, a parser may be used directly:

.. code-block:: python

    >>> tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
    >>> parser = javalang.parser.Parser(tokens)
    >>> parser.parse_expression()
    MethodInvocation

The parse methods are designed for incremental parsing, so they will not restart
at the beginning of the token stream. Attempting to call a parse method more
than once will result in a ``JavaSyntaxError`` exception.
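The no-restart behavior follows from the parser keeping a cursor into the token stream. The following toy recursive-descent sketch illustrates this under stated assumptions. ``TinyParser`` and ``TinySyntaxError`` are hypothetical and only parse ``INT (+ INT)*``, but they show why a second call fails once the first call has consumed the input.

```python
class TinySyntaxError(Exception):
    pass


class TinyParser:
    """Hypothetical parser for `INT (+ INT)*` that keeps its position between calls."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.i = 0  # cursor: index of the next unconsumed token

    def accept(self, predicate):
        # Consume and return the next token if it satisfies predicate, else None.
        if self.i < len(self.tokens) and predicate(self.tokens[self.i]):
            self.i += 1
            return self.tokens[self.i - 1]
        return None

    def expect_int(self):
        token = self.accept(str.isdigit)
        if token is None:
            found = self.tokens[self.i] if self.i < len(self.tokens) else "end of input"
            raise TinySyntaxError(f"expected an integer, found {found!r}")
        return int(token)

    def parse_expression(self):
        total = self.expect_int()
        while self.accept(lambda token: token == "+"):
            total += self.expect_int()
        return total


parser = TinyParser("1 + 2 + 3".split())
print(parser.parse_expression())  # 6
try:
    parser.parse_expression()  # the cursor is already at the end; nothing is re-read
except TinySyntaxError as err:
    print(err)  # expected an integer, found 'end of input'
```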

Invoking the wrong parse method will also result in a ``JavaSyntaxError``
exception:

.. code-block:: python

    >>> tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
    >>> parser = javalang.parser.Parser(tokens)
    >>> parser.parse_type_declaration()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "javalang/parser.py", line 336, in parse_type_declaration
        return self.parse_class_or_interface_declaration()
      File "javalang/parser.py", line 353, in parse_class_or_interface_declaration
        self.illegal("Expected type declaration")
      File "javalang/parser.py", line 122, in illegal
        raise JavaSyntaxError(description, at)
    javalang.parser.JavaSyntaxError

The ``javalang.parse`` module also provides convenience methods for parsing more
common types of code snippets.