cpg2py 1.0.4__tar.gz → 1.1.0__tar.gz

Files changed (44)
  1. cpg2py-1.1.0/PKG-INFO +407 -0
  2. cpg2py-1.1.0/README.md +358 -0
  3. cpg2py-1.1.0/cpg2py/__init__.py +80 -0
  4. {cpg2py-1.0.4/cpg2py/abc → cpg2py-1.1.0/cpg2py/_abc}/__init__.py +2 -2
  5. cpg2py-1.1.0/cpg2py/_abc/edge.py +96 -0
  6. cpg2py-1.1.0/cpg2py/_abc/graph.py +247 -0
  7. cpg2py-1.1.0/cpg2py/_abc/node.py +62 -0
  8. cpg2py-1.1.0/cpg2py/_abc/storage.py +190 -0
  9. cpg2py-1.1.0/cpg2py/_cpg/__init__.py +5 -0
  10. cpg2py-1.1.0/cpg2py/_cpg/edge.py +32 -0
  11. cpg2py-1.1.0/cpg2py/_cpg/graph.py +183 -0
  12. cpg2py-1.1.0/cpg2py/_cpg/node.py +67 -0
  13. cpg2py-1.1.0/cpg2py/_exceptions.py +41 -0
  14. cpg2py-1.1.0/cpg2py/_logger.py +53 -0
  15. cpg2py-1.1.0/cpg2py.egg-info/PKG-INFO +407 -0
  16. cpg2py-1.1.0/cpg2py.egg-info/SOURCES.txt +27 -0
  17. cpg2py-1.1.0/cpg2py.egg-info/requires.txt +6 -0
  18. cpg2py-1.1.0/pyproject.toml +134 -0
  19. cpg2py-1.1.0/tests/test_edge.py +139 -0
  20. cpg2py-1.1.0/tests/test_exceptions.py +132 -0
  21. cpg2py-1.1.0/tests/test_generics.py +195 -0
  22. cpg2py-1.1.0/tests/test_graph.py +464 -0
  23. cpg2py-1.1.0/tests/test_node.py +247 -0
  24. cpg2py-1.1.0/tests/test_storage.py +472 -0
  25. cpg2py-1.0.4/PKG-INFO +0 -261
  26. cpg2py-1.0.4/README.md +0 -222
  27. cpg2py-1.0.4/cpg2py/__init__.py +0 -40
  28. cpg2py-1.0.4/cpg2py/abc/edge.py +0 -39
  29. cpg2py-1.0.4/cpg2py/abc/graph.py +0 -98
  30. cpg2py-1.0.4/cpg2py/abc/node.py +0 -26
  31. cpg2py-1.0.4/cpg2py/abc/storage.py +0 -153
  32. cpg2py-1.0.4/cpg2py/cpg/__init__.py +0 -3
  33. cpg2py-1.0.4/cpg2py/cpg/edge.py +0 -31
  34. cpg2py-1.0.4/cpg2py/cpg/graph.py +0 -97
  35. cpg2py-1.0.4/cpg2py/cpg/node.py +0 -65
  36. cpg2py-1.0.4/cpg2py.egg-info/PKG-INFO +0 -261
  37. cpg2py-1.0.4/cpg2py.egg-info/SOURCES.txt +0 -19
  38. cpg2py-1.0.4/pyproject.toml +0 -24
  39. cpg2py-1.0.4/setup.py +0 -20
  40. {cpg2py-1.0.4 → cpg2py-1.1.0}/LICENSE +0 -0
  41. {cpg2py-1.0.4 → cpg2py-1.1.0}/MANIFEST.in +0 -0
  42. {cpg2py-1.0.4 → cpg2py-1.1.0}/cpg2py.egg-info/dependency_links.txt +0 -0
  43. {cpg2py-1.0.4 → cpg2py-1.1.0}/cpg2py.egg-info/top_level.txt +0 -0
  44. {cpg2py-1.0.4 → cpg2py-1.1.0}/setup.cfg +0 -0
cpg2py-1.1.0/PKG-INFO ADDED
@@ -0,0 +1,407 @@
1
+ Metadata-Version: 2.1
2
+ Name: cpg2py
3
+ Version: 1.1.0
4
+ Summary: A graph-based data structure designed for querying CSV files in Joern format in Python
5
+ Author-email: samhsu-dev <yxu166@jhu.edu>
6
+ License: MIT License
7
+
8
+ Copyright (c) 2025 Yichao Xu
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a copy
11
+ of this software and associated documentation files (the "Software"), to deal
12
+ in the Software without restriction, including without limitation the rights
13
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
14
+ copies of the Software, and to permit persons to whom the Software is
15
+ furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included in all
18
+ copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
23
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
25
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
26
+ SOFTWARE.
27
+
28
+ Project-URL: Homepage, https://github.com/YichaoXu/cpg2py
29
+ Project-URL: Repository, https://github.com/YichaoXu/cpg2py
30
+ Project-URL: Documentation, https://github.com/YichaoXu/cpg2py
31
+ Keywords: Joern,CPG,Graph,CSV
32
+ Classifier: Development Status :: 4 - Beta
33
+ Classifier: Intended Audience :: Developers
34
+ Classifier: License :: OSI Approved :: MIT License
35
+ Classifier: Programming Language :: Python :: 3
36
+ Classifier: Programming Language :: Python :: 3.8
37
+ Classifier: Programming Language :: Python :: 3.9
38
+ Classifier: Programming Language :: Python :: 3.10
39
+ Classifier: Programming Language :: Python :: 3.11
40
+ Classifier: Programming Language :: Python :: 3.12
41
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
42
+ Requires-Python: >=3.8
43
+ Description-Content-Type: text/markdown
44
+ License-File: LICENSE
45
+ Provides-Extra: test
46
+ Provides-Extra: dev
47
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
48
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
49
+
50
+ # **cpg2py: Graph-Based Query Engine for Joern CSV Files**
51
+
52
+ `cpg2py` is a Python library that provides a lightweight **graph-based query engine** for analyzing **Code Property Graphs (CPG)** extracted from Joern CSV files. The library offers an **abstract base class (ABC) architecture**, allowing users to extend and implement their own custom graph queries.
53
+
54
+ ---
55
+
56
+ ## **🚀 Features**
57
+
58
+ - **MultiDiGraph Representation**: A directed multi-graph with support for multiple edges between nodes.
59
+ - **CSV-Based Graph Construction**: Reads `nodes.csv` and `rels.csv` to construct a graph structure.
60
+ - **Type-Safe Generic Types**: Uses Python generics for type-safe graph operations (similar to Java generics).
61
+ - **Extensible Abstract Base Classes (ABC)**:
62
+   - `AbcGraphQuerier` for implementing **custom graph queries** with generic type support.
63
+   - `AbcNodeQuerier` for interacting with **nodes**.
64
+   - `AbcEdgeQuerier` for interacting with **edges**.
65
+ - **Built-in Query Mechanisms**:
66
+   - **Retrieve all nodes and edges** with type-safe iteration.
67
+   - **Get incoming and outgoing edges** of a node.
68
+   - **Find successors and predecessors** with type preservation.
69
+   - **Traverse AST, Control Flow, and Data Flow Graphs**.
70
+ - **Concrete Implementation**: `CpgGraph`, `CpgNode`, and `CpgEdge` provide ready-to-use implementations.
71
+
72
+ ---
73
+
74
+ ## **📚 Installation**
75
+
76
+ ### Using pip
77
+
78
+ To install the package, use:
79
+
80
+ ```bash
81
+ pip install git+https://github.com/samhsu-dev/cpg2py.git
82
+ ```
83
+
84
+ ### Using uv (Recommended)
85
+
86
+ This project uses [uv](https://github.com/astral-sh/uv) for fast and reliable package management.
87
+
88
+ **Install uv:**
89
+ ```bash
90
+ curl -LsSf https://astral.sh/uv/install.sh | sh
91
+ ```
92
+
93
+ **Clone and install:**
94
+ ```bash
95
+ git clone https://github.com/samhsu-dev/cpg2py.git
96
+ cd cpg2py
97
+ uv sync --dev # Install with dev dependencies
98
+ ```
99
+
100
+ **For development:**
101
+ ```bash
102
+ uv sync --dev
103
+ uv run pytest tests/ # Run tests
104
+ ```
105
+
106
+ Or install the released package from PyPI:
107
+
108
+ ```bash
109
+ pip install cpg2py
110
+ ```
111
+
112
+ ---
113
+
114
+ ## **📂 File Structure**
115
+
116
+ - **`nodes.csv`** (Example):
117
+ ```csv
118
+ id:int labels:label type flags:string_array lineno:int code childnum:int funcid:int classname namespace endlineno:int name doccomment
119
+ 0 Filesystem Directory "input"
120
+ 1 Filesystem File "example.php"
121
+ 2 AST AST_TOPLEVEL TOPLEVEL_FILE 1 "" 25 "/input/example.php"
122
+
123
+ ````
124
+ - **`rels.csv`** (Example):
125
+ ```csv
126
+ start end type
127
+ 2 3 ENTRY
128
+ 2 4 EXIT
129
+ 6 7 ENTRY
130
+ 6 9 PARENT_OF
131
+ ````
132
+
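The two-file layout above can be read with the standard library alone. The following is a minimal sketch, not the loader `cpg2py` itself uses: it assumes the export is tab-delimited, and the sample strings, the `read_rows` helper, and the reduced column set are hypothetical stand-ins for real `nodes.csv` and `rels.csv` contents.

```python
import csv
import io

# Hypothetical sample data; the tab delimiter is an assumption about the export format.
NODES_TSV = (
    "id:int\tlabels:label\ttype\tname\n"
    "0\tFilesystem\tDirectory\tinput\n"
    "2\tAST\tAST_TOPLEVEL\t/input/example.php\n"
)
RELS_TSV = "start\tend\ttype\n2\t3\tENTRY\n2\t4\tEXIT\n6\t9\tPARENT_OF\n"


def read_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Index nodes by their ID column and keep edges as (start, end, type) triples.
nodes = {row["id:int"]: row for row in read_rows(NODES_TSV)}
edges = [(row["start"], row["end"], row["type"]) for row in read_rows(RELS_TSV)]
```

Keeping edges as `(start, end, type)` triples allows several edges between the same pair of nodes, which is what a multi-digraph needs.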
133
+ ---
134
+
135
+ ## **🎯 Type Safety with Generics**
136
+
137
+ `cpg2py` uses Python's generic types (similar to Java generics) to provide type-safe operations:
138
+
139
+ ```python
140
+ from typing import Iterable
141
+ from cpg2py import cpg_graph, CpgGraph, CpgNode, CpgEdge
142
+
143
+ # Type checker knows graph is CpgGraph[CpgNode, CpgEdge]
144
+ graph: CpgGraph = cpg_graph("nodes.csv", "rels.csv")
145
+
146
+ # Type checker knows node is CpgNode (not just AbcNodeQuerier)
147
+ node: CpgNode = graph.node("5")
148
+
149
+ # Type checker knows successors are Iterable[CpgNode]
150
+ successors: Iterable[CpgNode] = graph.succ(node)
151
+ for succ in successors:
152
+     succ.code # Type-safe: IDE knows succ is CpgNode
153
+ ```
154
+
155
+ This ensures that:
156
+ - Return types are preserved throughout graph operations
157
+ - IDE autocomplete works correctly
158
+ - Type checkers (mypy, pyright) can verify type correctness
159
+
160
+ For more details, see [Generics Documentation](docs/GENERICS.md).
161
+
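How a type parameter carries the node type through a query can be illustrated without the library. This is a sketch under stated assumptions: `PhpNode`, `GenericGraph`, and the edge list are hypothetical names invented for illustration, not part of the `cpg2py` API.

```python
from typing import Callable, Generic, Iterator, List, Tuple, TypeVar

N = TypeVar("N")  # node type chosen by whoever instantiates the graph


class PhpNode:
    """Hypothetical node wrapper exposing one extra attribute."""

    def __init__(self, nid: str) -> None:
        self.id = nid
        self.code = f"<code of {nid}>"


class GenericGraph(Generic[N]):
    """Hypothetical graph parameterized by the node type it hands back."""

    def __init__(self, make_node: Callable[[str], N], edges: List[Tuple[str, str]]) -> None:
        self._make_node = make_node
        self._edges = edges

    def succ(self, nid: str) -> Iterator[N]:
        # Yield a typed node for every edge that starts at nid.
        for start, end in self._edges:
            if start == nid:
                yield self._make_node(end)


graph: GenericGraph[PhpNode] = GenericGraph(PhpNode, [("5", "6"), ("5", "7"), ("6", "7")])
succ_ids = [node.id for node in graph.succ("5")]
```

Because `succ` is annotated as `Iterator[N]`, a type checker can infer that each item from `graph.succ("5")` is a `PhpNode`, so attribute access such as `node.code` is checked statically.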
162
+ ---
163
+
164
+ ## **📚 Usage**
165
+
166
+ ### **1️⃣ Load Graph from Joern CSVs**
167
+
168
+ ```python
169
+ from cpg2py import cpg_graph
170
+
171
+ # Load graph from CSV files
172
+ graph = cpg_graph("nodes.csv", "rels.csv")
173
+ ```
174
+
175
+ The `cpg_graph` function returns a `CpgGraph` instance, which is the concrete implementation of the graph querier.
176
+
177
+ ---
178
+
179
+ ### **2️⃣ Query Nodes & Edges**
180
+
181
+ ```python
182
+ from cpg2py import CpgGraph, CpgNode, CpgEdge
183
+
184
+ # Get a specific node (returns CpgNode)
185
+ node: CpgNode = graph.node("2")
186
+ print(node.name, node.type) # Example output: "/tmp/example.php" AST_TOPLEVEL
187
+
188
+ # Get a specific edge (returns CpgEdge)
189
+ edge: CpgEdge = graph.edge("2", "3", "ENTRY")
190
+ print(edge.type) # Output: ENTRY
191
+ ```
192
+
193
+ ---
194
+
195
+ ### **3️⃣ Get Node Connections**
196
+
197
+ ```python
198
+ # Get all successor nodes (targets of outgoing edges)
199
+ successors = graph.succ(node)
200
+ for out_node in successors:
201
+     print(out_node.id, out_node.name) # out_node is CpgNode
202
+
203
+ # Get all predecessor nodes (sources of incoming edges)
204
+ predecessors = graph.prev(node)
205
+ for in_node in predecessors:
206
+     print(in_node.id, in_node.name) # in_node is CpgNode
207
+ ```
208
+
209
+ ---
210
+
211
+ ### **4️⃣ AST and Flow Queries**
212
+
213
+ ```python
214
+ # Get top-level file node for a given node
215
+ top_file: CpgNode = graph.topfile_node("5")
216
+ print(top_file.name) # Output: "example.php"
217
+
218
+ # Get child nodes in the AST hierarchy
219
+ children = graph.children(node)
220
+ print([child.id for child in children]) # children are CpgNode instances
221
+
222
+ # Get data flow successors
223
+ flow_successors = graph.flow_to(node)
224
+ print([succ.id for succ in flow_successors]) # successors are CpgNode instances
225
+ ```
226
+
227
+ ---
228
+
229
+ ## **🛠 Abstract Base Classes (ABC)**
230
+
231
+ The following abstract base classes (`ABC`) provide interfaces for extending **node**, **edge**, and **graph** querying behavior. All ABCs are imported directly from the main `cpg2py` package.
232
+
233
+ ---
234
+
235
+ ### **🔹 AbcNodeQuerier (Abstract Node Interface)**
236
+
237
+ This class defines how nodes interact with the graph storage.
238
+
239
+ ```python
240
+ from cpg2py import AbcNodeQuerier, Storage
241
+
242
+ class MyNodeQuerier(AbcNodeQuerier):
243
+     def __init__(self, graph: Storage, nid: str):
244
+         super().__init__(graph, nid)
245
+
246
+     @property
247
+     def name(self):
248
+         return self.get_property("name")
249
+ ```
250
+
251
+ ---
252
+
253
+ ### **🔹 AbcEdgeQuerier (Abstract Edge Interface)**
254
+
255
+ Defines the querying mechanisms for edges in the graph.
256
+
257
+ ```python
258
+ from cpg2py import AbcEdgeQuerier, Storage
259
+
260
+ class MyEdgeQuerier(AbcEdgeQuerier):
261
+     def __init__(self, graph: Storage, f_nid: str, t_nid: str, e_type: str):
262
+         super().__init__(graph, f_nid, t_nid, e_type)
263
+
264
+     @property
265
+     def type(self):
266
+         return self.get_property("type")
267
+ ```
268
+
269
+ ---
270
+
271
+ ### **🔹 AbcGraphQuerier (Abstract Graph Interface)**
272
+
273
+ This class provides an interface for implementing custom graph query mechanisms. It's a generic class that supports type-safe operations.
274
+
275
+ ```python
276
+ from cpg2py import AbcGraphQuerier, Storage
277
+ from typing import Optional
278
+
279
+ class MyGraphQuerier(AbcGraphQuerier[MyNodeQuerier, MyEdgeQuerier]):
280
+     def node(self, nid: str) -> Optional[MyNodeQuerier]:
281
+         return MyNodeQuerier(self.storage, nid)
282
+
283
+     def edge(self, fid: str, tid: str, eid: str) -> Optional[MyEdgeQuerier]:
284
+         return MyEdgeQuerier(self.storage, fid, tid, eid)
285
+ ```
286
+
287
+ **Note**: `AbcGraphQuerier` is a generic class parameterized by node and edge types, ensuring type safety throughout graph operations. The concrete implementation `CpgGraph` is defined as `AbcGraphQuerier[CpgNode, CpgEdge]`.
288
+
289
+ ---
290
+
291
+ ## **🔍 Querying The Graph**
292
+
293
+ After implementing the abstract classes, you can perform advanced queries:
294
+
295
+ ```python
296
+ from cpg2py import Storage
297
+
298
+ storage = Storage()
299
+ graph = MyGraphQuerier(storage)
300
+
301
+ # Query node properties
302
+ node = graph.node("5")
303
+ print(node.name) # Example Output: "main"
304
+
305
+ # Query edge properties
306
+ edge = graph.edge("5", "6", "FLOWS_TO")
307
+ print(edge.type) # Output: "FLOWS_TO"
308
+ ```
309
+
310
+ ### **Using the Built-in CpgGraph**
311
+
312
+ You can also use the built-in `CpgGraph` implementation directly:
313
+
314
+ ```python
315
+ from typing import Iterable
316
+ from cpg2py import cpg_graph, CpgGraph, CpgNode, CpgEdge
317
+
318
+ # Load from CSV files
319
+ graph: CpgGraph = cpg_graph("nodes.csv", "rels.csv")
320
+
321
+ # Type-safe operations
322
+ node: CpgNode = graph.node("5")
323
+ edge: CpgEdge = graph.edge("5", "6", "FLOWS_TO")
324
+
325
+ # Type-safe iteration
326
+ successors: Iterable[CpgNode] = graph.succ(node)
327
+ for succ in successors:
328
+     print(succ.code) # Type checker knows succ is CpgNode
329
+ ```
330
+
331
+ ---
332
+
333
+ ## **🐝 API Reference**
334
+
335
+ For more detailed API documentation, please see our [APIs doc](docs/APIs.md).
336
+
337
+ ### **Main Package Exports**
338
+
339
+ All public APIs are available directly from the `cpg2py` package:
340
+
341
+ ```python
342
+ from cpg2py import (
343
+ # Factory function
344
+ cpg_graph,
345
+
346
+ # Concrete implementations
347
+ CpgGraph,
348
+ CpgNode,
349
+ CpgEdge,
350
+
351
+ # Abstract base classes
352
+ AbcGraphQuerier,
353
+ AbcNodeQuerier,
354
+ AbcEdgeQuerier,
355
+ Storage,
356
+
357
+ # Exceptions
358
+ CPGError,
359
+ NodeNotFoundError,
360
+ EdgeNotFoundError,
361
+ TopFileNotFoundError,
362
+ )
363
+ ```
364
+
365
+ ### **Graph Functions**
366
+
367
+ - `cpg_graph(node_csv: Path, edge_csv: Path, verbose: bool = False) -> CpgGraph`: Loads graph from CSV files and returns a `CpgGraph` instance.
368
+ - `graph.node(nid: str) -> Optional[CpgNode]`: Retrieves a node by ID (returns `CpgNode`).
369
+ - `graph.edge(fid: str, tid: str, eid: str) -> Optional[CpgEdge]`: Retrieves an edge (returns `CpgEdge`).
370
+ - `graph.succ(node: CpgNode) -> Iterable[CpgNode]`: Gets successor nodes.
371
+ - `graph.prev(node: CpgNode) -> Iterable[CpgNode]`: Gets predecessor nodes.
372
+ - `graph.children(node: CpgNode) -> Iterable[CpgNode]`: Gets child nodes via PARENT_OF edges.
373
+ - `graph.parent(node: CpgNode) -> Iterable[CpgNode]`: Gets parent nodes via PARENT_OF edges.
374
+ - `graph.flow_to(node: CpgNode) -> Iterable[CpgNode]`: Gets data flow successors.
375
+ - `graph.flow_from(node: CpgNode) -> Iterable[CpgNode]`: Gets data flow predecessors.
376
+ - `graph.topfile_node(nid: str) -> CpgNode`: Finds the top-level file node.
377
+
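The relationship between the successor, predecessor, and `PARENT_OF`-based queries listed above can be sketched over plain edge triples. This is an illustration only, not the library's implementation: the module-level `succ` and `prev` helpers and the `EDGES` list are hypothetical, and the edge values are borrowed from the `rels.csv` example.

```python
from typing import Iterator, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (start, end, type)

EDGES: List[Edge] = [
    ("2", "3", "ENTRY"),
    ("2", "4", "EXIT"),
    ("6", "7", "ENTRY"),
    ("6", "9", "PARENT_OF"),
]


def succ(edges: List[Edge], nid: str, edge_type: Optional[str] = None) -> Iterator[str]:
    """Yield end nodes of edges leaving nid, optionally filtered by edge type."""
    for start, end, etype in edges:
        if start == nid and (edge_type is None or etype == edge_type):
            yield end


def prev(edges: List[Edge], nid: str, edge_type: Optional[str] = None) -> Iterator[str]:
    """Yield start nodes of edges entering nid, optionally filtered by edge type."""
    for start, end, etype in edges:
        if end == nid and (edge_type is None or etype == edge_type):
            yield start


# Children and parents are successors and predecessors restricted to PARENT_OF edges.
children_of_6 = list(succ(EDGES, "6", "PARENT_OF"))
parents_of_9 = list(prev(EDGES, "9", "PARENT_OF"))
```

A linear scan keeps the sketch short; a real store would more likely index edges by start and end node so that each lookup avoids walking the full edge list.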
378
+ ### **Node Properties (CpgNode)**
379
+
380
+ - `.id`: Node ID (string).
381
+ - `.name`: Node name.
382
+ - `.type`: Node type.
383
+ - `.code`: Source code content.
384
+ - `.label`: Node label.
385
+ - `.line_num`: Source code line number.
386
+ - `.flags`: List of node flags.
387
+ - `.children_num`: Number of children.
388
+ - `.func_id`: Function ID.
389
+ - `.class_name`: Class name.
390
+ - `.namespace`: Namespace.
391
+ - `.end_num`: End line number.
392
+ - `.comment`: Documentation comment.
393
+
394
+ ### **Edge Properties (CpgEdge)**
395
+
396
+ - `.id`: Edge ID tuple `(from_node, to_node, edge_type)`.
397
+ - `.start`: Edge start position.
398
+ - `.end`: Edge end position.
399
+ - `.type`: Edge type.
400
+ - `.var`: Variable name (if applicable).
401
+
402
+ ---
403
+
404
+ ## **🌟 License**
405
+
406
+ This project is licensed under the **MIT License**.
407
+