rusty-graph 0.3.17__cp311-cp311-win_amd64.whl → 0.3.18__cp311-cp311-win_amd64.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- rusty_graph/rusty_graph.cp311-win_amd64.pyd +0 -0
- rusty_graph-0.3.18.dist-info/METADATA +1102 -0
- rusty_graph-0.3.18.dist-info/RECORD +6 -0
- rusty_graph-0.3.17.dist-info/METADATA +0 -8
- rusty_graph-0.3.17.dist-info/RECORD +0 -6
- {rusty_graph-0.3.17.dist-info → rusty_graph-0.3.18.dist-info}/WHEEL +0 -0
- {rusty_graph-0.3.17.dist-info → rusty_graph-0.3.18.dist-info}/licenses/LICENSE +0 -0
rusty_graph/rusty_graph.cp311-win_amd64.pyd: Binary file

rusty_graph-0.3.18.dist-info/METADATA
@@ -0,0 +1,1102 @@
Metadata-Version: 2.4
Name: rusty_graph
Version: 0.3.18
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
License-File: LICENSE
Summary: A high-performance graph database library with Python bindings written in Rust
Keywords: graph,database,knowledge-graph,rust,high-performance,data-science
Author-email: Kristian dF Kollsgård <kkollsg@gmail.com>
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/kkollsga/rusty-graph#readme
Project-URL: Homepage, https://github.com/kkollsga/rusty-graph
Project-URL: Repository, https://github.com/kkollsga/rusty-graph

# Rusty Graph Python Library

A high-performance graph database library with Python bindings written in Rust.

## Table of Contents

- [Installation](#installation)
- [Introduction](#introduction)
- [Key Features](#key-features)
- [Basic Usage](#basic-usage)
- [Working with Nodes](#working-with-nodes)
- [Working with Dates](#working-with-dates)
- [Creating Connections](#creating-connections)
- [Filtering and Querying](#filtering-and-querying)
  - [Basic Filtering](#basic-filtering)
  - [Null Value Checks](#null-value-checks)
  - [Filtering Orphan Nodes](#filtering-orphan-nodes)
  - [Sorting Results](#sorting-results)
  - [Limiting Results](#limiting-results)
- [Traversing the Graph](#traversing-the-graph)
- [Set Operations on Selections](#set-operations-on-selections)
- [Path Finding and Graph Algorithms](#path-finding-and-graph-algorithms)
- [Pattern Matching](#pattern-matching)
- [Subgraph Extraction](#subgraph-extraction)
- [Spatial and Geometry Operations](#spatial-and-geometry-operations)
- [Schema Definition and Validation](#schema-definition-and-validation)
- [Index Management](#index-management)
- [Export Formats](#export-formats)
- [Statistics and Calculations](#statistics-and-calculations)
- [Saving and Loading](#saving-and-loading)
- [Operation Reports](#operation-reports)
- [Performance Tips](#performance-tips)

## Installation

```bash
pip install rusty-graph
# upgrade
pip install rusty-graph --upgrade
```

## Introduction

Rusty Graph is a Rust-based library for building high-performance knowledge graphs in Python. It is designed for aggregating and merging data from SQL databases, making it straightforward to turn relational data into structured knowledge graphs. By combining Rust's efficiency with Python's flexibility, Rusty Graph gives data scientists and developers a practical way to use knowledge graphs in data-driven applications.

## Key Features

- **Efficient Data Integration:** Easily import and merge data from SQL databases to construct knowledge graphs, optimizing for performance and scalability.
- **High-Performance Operations:** Utilize Rust's performance capabilities to handle graph operations, making Rusty Graph ideal for working with large-scale data.
- **Python Compatibility:** Directly integrate Rusty Graph into Python projects, allowing for a smooth workflow within Python-based data analysis and machine learning pipelines.
- **Flexible Graph Manipulation:** Create, modify, and query knowledge graphs with a rich set of features, catering to complex data structures and relationships.
- **Graph Algorithms:** Built-in shortest path, all paths, and connected components algorithms powered by petgraph.
- **Pattern Matching:** Cypher-like query syntax for expressive multi-hop graph traversals.
- **Spatial Operations:** Geographic queries including bounding box, distance (Haversine), and WKT geometry intersection.
- **Export Formats:** Export to GraphML, GEXF, D3 JSON, and CSV for visualization and interoperability.

## Basic Usage

```python
import rusty_graph
import pandas as pd

# Create a new knowledge graph
graph = rusty_graph.KnowledgeGraph()

# Create some data using pandas
users_df = pd.DataFrame({
    'user_id': [1001, 1002, 1003],
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [28, 35, 42]
})

# Add nodes to the graph
graph.add_nodes(
    data=users_df,
    node_type='User',
    unique_id_field='user_id',
    node_title_field='name'
)

# View graph schema
print(graph.get_schema())
```

## Working with Nodes

### Adding Nodes

```python
# Add products to graph
products_df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'title': ['Laptop', 'Phone', 'Tablet'],
    'price': [999.99, 699.99, 349.99],
    'stock': [45, 120, 30]
})

graph.add_nodes(
    data=products_df,
    node_type='Product',
    unique_id_field='product_id',
    node_title_field='title',
    # Optional: specify which columns to include
    columns=['product_id', 'title', 'price', 'stock', 'category'],
    # Optional: how to handle conflicts with existing nodes
    conflict_handling='update'  # Options: 'update', 'replace', 'skip', 'preserve'
)
```

### Retrieving Nodes

```python
# Get all products
products = graph.type_filter('Product')

# Get node information
product_nodes = products.get_nodes()
print(product_nodes)

# Get specific properties
prices = products.get_properties(['price', 'stock'])
print(prices)

# Get only titles
titles = products.get_titles()
print(titles)
```

## Working with Dates

Rusty Graph supports native DateTime values for date-based filtering and operations.

### Specifying Date Columns

When adding nodes, use the `column_types` parameter to specify which columns should be parsed as dates:

```python
import pandas as pd

# Create data with date columns
estimates_df = pd.DataFrame({
    'estimate_id': [1, 2, 3],
    'name': ['Estimate A', 'Estimate B', 'Estimate C'],
    'valid_from': ['2020-01-01', '2020-06-15', '2021-01-01'],
    'valid_to': ['2020-12-31', '2021-06-14', '2021-12-31'],
    'value': [100.5, 250.3, 180.0]
})

# Add nodes with date columns specified
graph.add_nodes(
    data=estimates_df,
    node_type='Estimate',
    unique_id_field='estimate_id',
    node_title_field='name',
    column_types={'valid_from': 'datetime', 'valid_to': 'datetime'}
)
```

### Filtering on Date Fields

Date fields can be filtered using comparison operators. ISO format strings (YYYY-MM-DD) work correctly for date comparisons:

```python
# Find estimates valid after a specific date
recent_estimates = graph.type_filter('Estimate').filter({
    'valid_from': {'>=': '2020-06-01'}
})

# Find estimates within a date range
active_in_2020 = graph.type_filter('Estimate').filter({
    'valid_from': {'<=': '2020-12-31'},
    'valid_to': {'>=': '2020-01-01'}
})
```

### Temporal Queries

For entities with validity periods (like estimates, contracts, or versions), Rusty Graph provides convenient methods to query based on time:

```python
# Find entities valid at a specific point in time
# Default field names: 'date_from' and 'date_to'
valid_estimates = graph.type_filter('Estimate').valid_at('2020-06-15')

# Use custom field names if your data uses different column names
active_contracts = graph.type_filter('Contract').valid_at(
    '2021-03-01',
    date_from_field='start_date',
    date_to_field='end_date'
)

# Find entities valid during a date range (overlapping periods)
overlapping = graph.type_filter('Estimate').valid_during('2020-01-01', '2020-06-30')

# Chain with other operations
high_value_valid = (
    graph.type_filter('Estimate')
    .valid_at('2020-06-15')
    .filter({'value': {'>=': 100.0}})
)
```

**Note:** `valid_at(date)` finds nodes where `date_from <= date <= date_to`. `valid_during(start, end)` finds nodes whose validity period overlaps with the given range.

### Batch Property Updates

Update properties on multiple nodes at once based on a selection:

```python
# Select nodes and update them in batch
result = graph.type_filter('Prospect').filter({'status': 'Inactive'}).update({
    'is_active': False,
    'deactivation_reason': 'status_inactive'
})

# Access the updated graph and count
updated_graph = result['graph']
nodes_updated = result['nodes_updated']
print(f"Updated {nodes_updated} nodes")

# Use keep_selection=True to preserve the selection for chaining
result = selection.update({'processed': True}, keep_selection=True)

# Update with different value types
graph.type_filter('Node').update({
    'count': 42,            # Integer
    'ratio': 3.14159,       # Float
    'active': True,         # Boolean
    'category': 'updated'   # String
})
```

**Note:** The `update()` method returns a dictionary with `graph` (the updated KnowledgeGraph), `nodes_updated` (count of updated nodes), and `report_index` (index of the operation report). By default, the selection is cleared after update; use `keep_selection=True` to preserve it.

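If you need the full report behind an update, the returned `report_index` can be cross-referenced with the operation history. This is a sketch that assumes `report_index` is the position of the report in `get_report_history()` (see [Operation Reports](#operation-reports)):

```python
# Sketch: look up the detailed operation report for a batch update.
# Assumption: report_index is the position of the report in get_report_history().
result = graph.type_filter('Prospect').filter({'status': 'Inactive'}).update({
    'is_active': False
})

graph = result['graph']
history = graph.get_report_history()
update_report = history[result['report_index']]
print(update_report['operation'], update_report['timestamp'])
```
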
### Query Explain

Get insight into how your queries are executed with the `explain()` method:

```python
# Build a query chain
result = (
    graph.type_filter('Prospect')
    .filter({'region': 'North'})
    .traverse('HAS_ESTIMATE')
)

# See the execution plan
print(result.explain())
# Output: TYPE_FILTER Prospect (6775 nodes) -> FILTER (3200 nodes) -> TRAVERSE HAS_ESTIMATE (10954 nodes)

# Works with temporal queries too
valid_estimates = graph.type_filter('Estimate').valid_at('2020-06-15')
print(valid_estimates.explain())
# Output: TYPE_FILTER Estimate (1000 nodes) -> VALID_AT (450 nodes)
```

**Note:** The `explain()` method shows each operation in the query chain with the actual number of nodes at each step. This helps you understand query performance and optimize your queries.

## Creating Connections

```python
# Purchase data
purchases_df = pd.DataFrame({
    'user_id': [1001, 1001, 1002],
    'product_id': [101, 103, 102],
    'date': ['2023-01-15', '2023-02-10', '2023-01-20'],
    'quantity': [1, 2, 1]
})

# Create connections
graph.add_connections(
    data=purchases_df,
    connection_type='PURCHASED',
    source_type='User',
    source_id_field='user_id',
    target_type='Product',
    target_id_field='product_id',
    # Optional additional fields to include
    columns=['date', 'quantity']
)

# Create connections from currently selected nodes
users = graph.type_filter('User')
products = graph.type_filter('Product')
# This would connect all users to all products with a 'VIEWED' connection
users.selection_to_new_connections(connection_type='VIEWED')
```

## Filtering and Querying

### Basic Filtering

```python
# Filter by exact match
expensive_products = graph.type_filter('Product').filter({'price': 999.99})

# Filter using operators
affordable_products = graph.type_filter('Product').filter({
    'price': {'<': 500.0}
})

# Multiple conditions
popular_affordable = graph.type_filter('Product').filter({
    'price': {'<': 500.0},
    'stock': {'>': 50}
})

# In operator
selected_products = graph.type_filter('Product').filter({
    'product_id': {'in': [101, 103]}
})
```

### Null Value Checks

You can filter nodes based on whether a field is null (missing) or not null:

```python
# Find nodes where a field is null or missing
nodes_without_category = graph.type_filter('Product').filter({
    'category': {'is_null': True}
})

# Find nodes where a field exists and is not null
nodes_with_category = graph.type_filter('Product').filter({
    'category': {'is_not_null': True}
})

# Combine with other conditions
incomplete_products = graph.type_filter('Product').filter({
    'description': {'is_null': True},
    'price': {'>': 0}
})
```

### Filtering Orphan Nodes

Orphan nodes are nodes that have no connections (no incoming or outgoing edges). You can filter to include or exclude orphan nodes:

```python
# Get only orphan nodes
orphans = graph.filter_orphans(include_orphans=True)

# Get only nodes that have at least one connection
connected = graph.filter_orphans(include_orphans=False)

# Filter orphans with sorting and limits
recent_orphans = graph.filter_orphans(
    include_orphans=True,
    sort_spec='created_date',
    max_nodes=100
)

# Chain with other operations
product_orphans = graph.type_filter('Product').filter_orphans(include_orphans=True)
```

### Sorting Results

Rusty Graph offers flexible options for sorting nodes based on their properties. The `sort_spec` parameter can be used in various methods including `type_filter()`, `filter()`, `filter_orphans()`, `traverse()`, and the standalone `sort()` method.

#### Sort Specification Format Options

1. **Single field string**: Sorts by the specified field in ascending order.

   ```python
   # Sort products by price (lowest to highest)
   sorted_products = graph.type_filter('Product').sort('price')

   # Can also be used in other methods
   cheap_products = graph.type_filter('Product').filter(
       {'stock': {'>': 10}},
       sort_spec='price'
   )
   ```

2. **Field with direction**: Explicitly specify ascending or descending order.

   ```python
   # Sort products by price (highest to lowest)
   expensive_first = graph.type_filter('Product').sort('price', ascending=False)
   ```

3. **List of tuples**: For multi-field sorting with different directions.

   ```python
   # First sort by stock (descending), then by price (ascending)
   # This prioritizes high-stock items, and for items with equal stock,
   # shows the cheapest ones first
   complex_sort = graph.type_filter('Product').sort([
       ('stock', False),  # False = descending order
       ('price', True)    # True = ascending order
   ])
   ```

4. **Dictionary with field and direction**: Alternative format for single field sorting.

   ```python
   # Sort by rating in descending order
   top_rated = graph.type_filter('Product').filter(
       {},
       sort_spec={'field': 'rating', 'ascending': False}
   )
   ```

#### Using Sort Specifications in Different Methods

Sort specifications work consistently across methods:

```python
# In type_filter
latest_users = graph.type_filter('User', sort_spec='creation_date', max_nodes=10)

# In filter
new_expensive = graph.type_filter('Product').filter(
    {'price': {'>': 500.0}},
    sort_spec=[('creation_date', False), ('price', True)]
)

# In traversal
alice_recent_purchases = graph.type_filter('User').filter({'name': 'Alice'}).traverse(
    connection_type='PURCHASED',
    sort_target='date',
    max_nodes=5
)

# In filter_orphans
recent_orphans = graph.filter_orphans(
    include_orphans=True,
    sort_spec='last_modified',
    max_nodes=20
)

# In children_properties_to_list
expensive_products = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
    property='title',
    sort_spec='price',  # Sort children by price before creating the list
    max_nodes=3,
    store_as='top_expensive_purchases'
)
```

### Limiting Results

```python
# Get at most 5 nodes per group
limited_products = graph.type_filter('Product').max_nodes(5)
```

## Traversing the Graph

```python
# Find products purchased by a specific user
alice = graph.type_filter('User').filter({'name': 'Alice'})
alice_products = alice.traverse(
    connection_type='PURCHASED',
    direction='outgoing'
)

# Access the resulting products
alice_product_data = alice_products.get_nodes()

# Filter the traversal target nodes
expensive_purchases = alice.traverse(
    connection_type='PURCHASED',
    filter_target={'price': {'>=': 500.0}},
    sort_target='price',
    max_nodes=10
)

# Get connection information
connection_data = alice.get_connections(include_node_properties=True)
```

### Filtering on Connection Properties

You can filter traversals based on properties stored on the connections themselves:

```python
# Traverse only through connections with specific property values
high_share_blocks = graph.type_filter('Discovery').traverse(
    connection_type='EXTENDS_INTO',
    filter_connection={'share_pct': {'>=': 50.0}}
)

# Combine connection and target filters
result = graph.type_filter('Discovery').traverse(
    connection_type='EXTENDS_INTO',
    filter_connection={'year': 2021},
    filter_target={'status': 'active'}
)

# Filter connections with null/not-null checks
discounted = user.traverse(
    connection_type='PURCHASED',
    filter_connection={'discount': {'is_not_null': True}}
)
```

## Set Operations on Selections

Rusty Graph supports set operations to combine, intersect, or subtract selections. These operations create new selections without modifying the originals.

### Union

Combines all nodes from both selections (logical OR):

```python
# Select prospects from different geoprovinces
n3_prospects = graph.type_filter('Prospect').filter({'geoprovince': 'N3'})
m3_prospects = graph.type_filter('Prospect').filter({'geoprovince': 'M3'})

# Combine both selections
combined = n3_prospects.union(m3_prospects)
print(f"Total prospects: {len(combined.get_nodes())}")
```

### Intersection

Keeps only nodes present in both selections (logical AND):

```python
# Select large discoveries and discoveries in a specific block
large_discoveries = graph.type_filter('Discovery').filter({'oil_reserves': {'>=': 100.0}})
block_34_discoveries = graph.type_filter('Block').filter({'block_id': 34}).traverse('CONTAINS', direction='incoming')

# Get large discoveries in block 34
result = large_discoveries.intersection(block_34_discoveries)
```

### Difference

Keeps nodes in the first selection but not in the second (subtraction):

```python
# Get all prospects
all_prospects = graph.type_filter('Prospect')

# Get prospects that have estimates
with_estimates = graph.type_filter('ProspectEstimate').traverse('BELONGS_TO', direction='incoming')

# Get prospects WITHOUT estimates
without_estimates = all_prospects.difference(with_estimates)
```

### Symmetric Difference

Keeps nodes that are in exactly one selection but not both (exclusive OR):

```python
# Nodes in category A or B but not both
exclusive_nodes = category_a.symmetric_difference(category_b)
```

### Chaining Operations

Set operations can be chained for complex queries:

```python
# (A union B) intersection C
result = selection_a.union(selection_b).intersection(selection_c)

# A difference (B intersection C)
b_inter_c = selection_b.intersection(selection_c)
result = selection_a.difference(b_inter_c)
```

## Path Finding and Graph Algorithms

Rusty Graph provides efficient implementations of common graph algorithms powered by petgraph.

### Shortest Path

Find the shortest path between two nodes:

```python
# Find shortest path between two nodes
path = graph.shortest_path(
    source_type='Person',
    source_id=1,
    target_type='Person',
    target_id=100,
    max_hops=10  # Optional limit
)

# Path is a list of node dictionaries
for node in path:
    print(f"{node['node_type']}: {node['title']}")
```

### All Paths

Find all paths between nodes up to a maximum number of hops:

```python
# Find all paths up to 4 hops
paths = graph.all_paths(
    source_type='Play',
    source_id=1,
    target_type='Wellbore',
    max_hops=4
)

# Returns a list of paths, each path is a list of nodes
print(f"Found {len(paths)} paths")
for i, path in enumerate(paths):
    print(f"Path {i+1}: {' -> '.join(n['title'] for n in path)}")
```

### Connected Components

Identify connected components in the graph:

```python
# Get all connected components
components = graph.connected_components()

# Returns a list of components, each component is a list of node IDs
print(f"Found {len(components)} connected components")
for i, component in enumerate(components):
    print(f"Component {i+1}: {len(component)} nodes")
```

## Pattern Matching

Query the graph using Cypher-like pattern syntax for expressive multi-hop queries:

```python
# Simple pattern: Find plays with prospects that became discoveries
results = graph.match_pattern(
    '(p:Play)-[:HAS_PROSPECT]->(pr:Prospect)-[:BECAME_DISCOVERY]->(d:Discovery)'
)

# Access matched variables
for match in results:
    print(f"Play: {match['p']['title']}")
    print(f"Prospect: {match['pr']['title']}")
    print(f"Discovery: {match['d']['title']}")

# Pattern with property conditions
results = graph.match_pattern(
    '(u:User)-[:PURCHASED]->(p:Product {category: "Electronics"})'
)

# Limit results for performance on large graphs
results = graph.match_pattern(
    '(a:Person)-[:KNOWS]->(b:Person)',
    max_matches=100
)
```

**Supported pattern syntax:**

- Node patterns: `(variable:NodeType)` or `(variable:NodeType {property: "value"})`
- Relationship patterns: `-[:CONNECTION_TYPE]->`
- Multiple hops: Chain patterns like `(a)-[:REL1]->(b)-[:REL2]->(c)`

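As a quick illustration of how these elements combine, here is a hedged sketch that chains two hops, adds a property condition on the middle node, and caps the number of matches. It reuses the Play/Prospect/Discovery types from the example above; the `"Mature"` status value is made up, so substitute properties and values that exist in your data.

```python
# A minimal sketch combining the syntax above: two chained hops,
# a property condition on the middle node, and a result limit.
# The "Mature" status value is a hypothetical example value.
matches = graph.match_pattern(
    '(p:Play)-[:HAS_PROSPECT]->(pr:Prospect {prospect_status: "Mature"})'
    '-[:BECAME_DISCOVERY]->(d:Discovery)',
    max_matches=50
)

for m in matches:
    # Each match maps the pattern variables to node dictionaries
    print(f"{m['p']['title']} -> {m['pr']['title']} -> {m['d']['title']}")
```
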
## Subgraph Extraction

Extract a portion of the graph for isolated analysis or export:

```python
# Start with a selection and expand to include neighbors
subgraph = (
    graph.type_filter('Company')
    .filter({'name': 'Acme Corp'})
    .expand(hops=2)    # Include all nodes within 2 hops
    .to_subgraph()     # Create independent subgraph
)

# The subgraph is a fully functional KnowledgeGraph
print(f"Subgraph has {subgraph.node_count()} nodes")

# Save the subgraph
subgraph.save('acme_network.bin')

# Export to visualization format
subgraph.export('acme_network.graphml', format='graphml')
```

### Expand Method

The `expand()` method uses breadth-first search to include neighboring nodes:

```python
# Expand selection by 1 hop (immediate neighbors)
expanded = graph.type_filter('Person').filter({'name': 'Alice'}).expand(hops=1)

# Expand by 3 hops for broader context
broad_context = selection.expand(hops=3)
```

## Spatial and Geometry Operations

Query nodes based on geographic location and geometry. Useful for GIS applications and location-based analysis.

### Bounding Box Queries

Find nodes within a rectangular geographic area:

```python
# Find discoveries within a bounding box
north_sea_discoveries = graph.type_filter('Discovery').within_bounds(
    lat_field='latitude',
    lon_field='longitude',
    min_lat=58.0,
    max_lat=62.0,
    min_lon=1.0,
    max_lon=5.0
)
```

### Distance Queries (Haversine)

Find nodes within a radius of a point using great-circle distance:

```python
# Find wellbores within 50km of a location
nearby_wellbores = graph.type_filter('Wellbore').near_point_km(
    lat_field='latitude',
    lon_field='longitude',
    center_lat=60.5,
    center_lon=3.2,
    radius_km=50.0
)
```

### WKT Geometry Intersection

Find nodes whose geometry intersects with a WKT polygon:

```python
# Define a polygon in WKT format
search_area = 'POLYGON((1 58, 5 58, 5 62, 1 62, 1 58))'

# Find fields that intersect the polygon
fields_in_area = graph.type_filter('Field').intersects(
    geometry_field='wkt_geometry',
    wkt=search_area
)
```

## Schema Definition and Validation

Define expected structure and validate your graph data:

### Defining a Schema

```python
# Define schema for node types and connections
graph.define_schema({
    'nodes': {
        'Prospect': {
            'required': ['npdid_prospect', 'prospect_name'],
            'optional': ['prospect_status', 'prospect_geoprovince'],
            'types': {
                'npdid_prospect': 'integer',
                'prospect_name': 'string',
                'prospect_ns_dec': 'float'
            }
        },
        'ProspectEstimate': {
            'required': ['estimate_id'],
            'types': {
                'estimate_id': 'integer',
                'value': 'float'
            }
        }
    },
    'connections': {
        'HAS_ESTIMATE': {
            'source': 'Prospect',
            'target': 'ProspectEstimate'
        }
    }
})
```

### Validating Against Schema

```python
# Validate the graph against the defined schema
errors = graph.validate_schema()

if errors:
    print("Validation errors found:")
    for error in errors:
        print(f"  - {error}")
else:
    print("Graph validates successfully!")
```

### Getting Current Schema

```python
# View the current schema (auto-generated from data)
schema = graph.get_schema()
print(schema)
```

## Index Management

Create indexes for faster filtering on frequently queried properties:

### Creating Indexes

```python
# Create an index on a property
graph.create_index('Prospect', 'prospect_geoprovince')

# Indexed properties get O(1) lookup for equality filters
# This query will be much faster with an index:
north_prospects = graph.type_filter('Prospect').filter({
    'prospect_geoprovince': 'North Sea'
})
```

### Listing and Dropping Indexes

```python
# List all indexes
indexes = graph.list_indexes()
for idx in indexes:
    print(f"Index on {idx['node_type']}.{idx['property']}")

# Drop an index
graph.drop_index('Prospect', 'prospect_geoprovince')
```

**Performance Note:** Benchmarks show ~3.3x speedup for equality filters on indexed properties. Create indexes on properties you frequently filter by exact value.

## Export Formats

Export your graph to various formats for visualization and interoperability:

### Export to File

```python
# GraphML format (compatible with Gephi, yEd, etc.)
graph.export('my_graph.graphml', format='graphml')

# GEXF format (Gephi native format)
graph.export('my_graph.gexf', format='gexf')

# D3.js JSON format (for web visualization)
graph.export('my_graph.json', format='d3')

# CSV format (nodes and edges as CSV)
graph.export('my_graph.csv', format='csv')
```

### Export to String

Get export data as a string for programmatic use:

```python
# Get GraphML as string
graphml_string = graph.export_string(format='graphml')

# Get D3 JSON as string
d3_json = graph.export_string(format='d3')

# Export only current selection
selected_json = graph.type_filter('Person').export_string(
    format='d3',
    selection_only=True
)
```

### Export Subgraphs

Combine with subgraph extraction for partial exports:

```python
# Export just a portion of the graph
subgraph = (
    graph.type_filter('Company')
    .filter({'region': 'Europe'})
    .expand(hops=2)
    .to_subgraph()
)
subgraph.export('europe_companies.graphml', format='graphml')
```

## Statistics and Calculations

### Basic Statistics

```python
# Get statistics for a property
price_stats = graph.type_filter('Product').statistics('price')
print(price_stats)

# Calculate unique values
unique_categories = graph.type_filter('Product').unique_values(
    property='category',
    # Store result in node property
    store_as='category_list',
    max_length=10
)

# Convert children properties to a comma-separated list in parent nodes
# Option 1: Store results in parent nodes
users_with_products = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
    property='title',                # Default is 'title' if not specified
    filter={'price': {'<': 500.0}},  # Optional filtering of children
    sort_spec='price',               # Optional sorting of children
    max_nodes=5,                     # Optional limit of children per parent
    store_as='purchased_products',   # Property name to store the list in parent
    max_length=100,                  # Optional maximum string length (adds "..." if truncated)
    keep_selection=False             # Whether to keep the current selection
)

# Option 2: Get results as a dictionary without storing them
product_names = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
    property='title',
    sort_spec='price',
    max_nodes=5
)
print(product_names)  # Returns {'User1': 'Product1, Product2', 'User2': 'Product3, Product4, Product5'}
```

### Custom Calculations

```python
# Simple calculation: tax inclusive price
with_tax = graph.type_filter('Product').calculate(
    expression='price * 1.1',
    store_as='price_with_tax'
)

# Aggregate calculations per group
user_spending = graph.type_filter('User').traverse('PURCHASED').calculate(
    expression='sum(price * quantity)',
    store_as='total_spent'
)

# Count operations
products_per_user = graph.type_filter('User').traverse('PURCHASED').count(
    store_as='product_count',
    group_by_parent=True
)
```

### Aggregating Connection Properties

Aggregate properties stored on connections (edges) rather than nodes. This is useful when you have data like ownership percentages, weights, or quantities stored on the connections themselves.

```python
# Sum connection properties
# For each Discovery, sum the share_pct on its EXTENDS_INTO connections
total_shares = graph.type_filter('Discovery').traverse('EXTENDS_INTO').calculate(
    expression='sum(share_pct)',
    aggregate_connections=True  # Key parameter for connection aggregation
)
print(total_shares)  # Returns {'Discovery A': 100.0, 'Discovery B': 100.0}

# Average connection properties
avg_ownership = graph.type_filter('Company').traverse('OWNS').calculate(
    expression='avg(ownership_pct)',
    aggregate_connections=True
)

# Count connections
connection_count = graph.type_filter('Parent').traverse('HAS_CHILD').calculate(
    expression='count(any_property)',  # Use any property that exists on connections
    aggregate_connections=True
)

# Store aggregated results on parent nodes
updated_graph = graph.type_filter('Prospect').traverse('HAS_ESTIMATE').calculate(
    expression='sum(weight)',
    aggregate_connections=True,
    store_as='total_weight'  # Stores result on parent Prospect nodes
)
```

**Supported aggregate functions for connections:**

- `sum(property)` - Sum of property values
- `avg(property)` / `mean(property)` - Average of property values
- `min(property)` - Minimum value
- `max(property)` - Maximum value
- `count(property)` - Count of connections (with non-null property values)
- `std(property)` - Standard deviation

**Note:** Connection aggregation requires a traversal before `calculate()`. The results are grouped by the parent (source) node of the traversal.

## Saving and Loading

```python
# Save graph to file
graph.save("my_graph.bin")

# Load graph from file
loaded_graph = rusty_graph.load("my_graph.bin")
```

## Operation Reports

Rusty Graph provides detailed reports for operations that modify the graph, helping you track what changed and diagnose issues.

### Getting Operation Reports

```python
# Add nodes and get the report
report = graph.add_nodes(
    data=df,
    node_type='Product',
    unique_id_field='product_id'
)
print(f"Created {report['nodes_created']} nodes in {report['processing_time_ms']}ms")

# Check for errors
if report['has_errors']:
    print(f"Errors: {report['errors']}")
```

### Report Fields

Node operation reports include:

- `operation`: Type of operation performed
- `timestamp`: When the operation occurred
- `nodes_created`: Number of new nodes created
- `nodes_updated`: Number of existing nodes updated
- `nodes_skipped`: Number of nodes skipped (e.g., due to conflicts)
- `processing_time_ms`: Time taken in milliseconds
- `has_errors`: Boolean indicating if errors occurred
- `errors`: List of error messages (if any)

Connection operation reports include:

- `connections_created`: Number of new connections created
- `connections_skipped`: Number of connections skipped
- `property_fields_tracked`: Number of property fields on connections

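As an example of reading these fields, the sketch below adds connections and then inspects the most recent report. It assumes `add_connections` records its report in the operation history the same way `add_nodes` does, so it can be retrieved with `get_last_report()`.

```python
# Add connections, then inspect the resulting connection report.
# Assumption: add_connections records a report retrievable via get_last_report().
graph.add_connections(
    data=purchases_df,
    connection_type='PURCHASED',
    source_type='User',
    source_id_field='user_id',
    target_type='Product',
    target_id_field='product_id'
)

report = graph.get_last_report()
print(f"Created {report['connections_created']} connections, "
      f"skipped {report['connections_skipped']}, "
      f"tracked {report['property_fields_tracked']} property fields")
```
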
### Operation History

```python
# Get the most recent operation report
last_report = graph.get_last_report()

# Get the operation index (sequential counter)
op_index = graph.get_operation_index()

# Get full operation history
history = graph.get_report_history()
for report in history:
    print(f"{report['operation']}: {report['timestamp']}")
```

## Performance Tips

1. **Batch Operations**: Add nodes and connections in batches rather than individually.

2. **Specify Columns**: When adding nodes or connections, explicitly specify which columns to include to reduce memory usage.

3. **Use Indexing**: Filter on node type first before applying other filters.

4. **Avoid Overloading**: Keep node property count reasonable; too many properties per node will increase memory usage.

5. **Conflict Handling**: Choose the appropriate conflict handling strategy (see the sketch after this list):
   - Use `'update'` to merge new properties with existing ones
   - Use `'replace'` for a complete overwrite
   - Use `'skip'` to avoid any changes to existing nodes
   - Use `'preserve'` to only add missing properties

6. **Connection Direction**: Specify direction in traversals when possible to improve performance.

7. **Limit Results**: Use `max_nodes()` to limit result size when working with large datasets.

8. **Create Indexes**: Use `create_index()` on frequently filtered properties for ~3.3x speedup on equality filters.

9. **Use Pattern Matching Limits**: When using `match_pattern()`, set `max_matches` to avoid scanning the entire graph.

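As a small illustration of tip 5, the sketch below loads product data twice with different strategies. `refreshed_products_df` is a hypothetical second extract of the same products; the merge behaviour of each option is as described in the list above.

```python
# Sketch of conflict handling: the second add_nodes call hits existing
# Product nodes, and conflict_handling decides how they are merged.
graph.add_nodes(
    data=products_df,
    node_type='Product',
    unique_id_field='product_id',
    conflict_handling='update'    # merge new properties into existing nodes
)

graph.add_nodes(
    data=refreshed_products_df,   # hypothetical newer extract of the same products
    node_type='Product',
    unique_id_field='product_id',
    conflict_handling='preserve'  # only fill in properties that are still missing
)
```
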
rusty_graph-0.3.18.dist-info/RECORD
@@ -0,0 +1,6 @@
rusty_graph\__init__.py,sha256=_Fds04T5qV95XgyZm7qIPfLghgoCZi-_hDbw-e_18oA,127
rusty_graph\rusty_graph.cp311-win_amd64.pyd,sha256=MYlmj-C-pbWO6D_YiWB3UEhKcOSfbqY1INIpiDPAi3k,1598976
rusty_graph-0.3.18.dist-info\METADATA,sha256=wlGTPk8AlIjdBdjhLR9OAuh8OAnfYIj25R_9JAzVokg,34186
rusty_graph-0.3.18.dist-info\WHEEL,sha256=X79LywvMB9iCuFHu88xBAFTJDhRqJi6Yh9hhoCI9jao,97
rusty_graph-0.3.18.dist-info\licenses\LICENSE,sha256=APR3-pK3VHs2KSVNVV9RQwAM0yIdoLq4FTCy1Cx2EMs,938
rusty_graph-0.3.18.dist-info\RECORD,,
rusty_graph-0.3.17.dist-info/METADATA
@@ -1,8 +0,0 @@
Metadata-Version: 2.4
Name: rusty_graph
Version: 0.3.17
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
License-File: LICENSE
Requires-Python: >=3.8
rusty_graph-0.3.17.dist-info/RECORD
@@ -1,6 +0,0 @@
rusty_graph\__init__.py,sha256=_Fds04T5qV95XgyZm7qIPfLghgoCZi-_hDbw-e_18oA,127
rusty_graph\rusty_graph.cp311-win_amd64.pyd,sha256=P5gmKv6mNMb2VPv9KiT203DRNtVhRSdl593287XTxo0,1599488
rusty_graph-0.3.17.dist-info\METADATA,sha256=0raMGmGeH9IdUO59MU6qokgChr3Mdi1yS7LjQ_Y9dsk,283
rusty_graph-0.3.17.dist-info\WHEEL,sha256=X79LywvMB9iCuFHu88xBAFTJDhRqJi6Yh9hhoCI9jao,97
rusty_graph-0.3.17.dist-info\licenses\LICENSE,sha256=APR3-pK3VHs2KSVNVV9RQwAM0yIdoLq4FTCy1Cx2EMs,938
rusty_graph-0.3.17.dist-info\RECORD,,

{rusty_graph-0.3.17.dist-info → rusty_graph-0.3.18.dist-info}/WHEEL: File without changes
{rusty_graph-0.3.17.dist-info → rusty_graph-0.3.18.dist-info}/licenses/LICENSE: File without changes