rusty-graph 0.3.17__cp311-cp311-win_amd64.whl → 0.3.18__cp311-cp311-win_amd64.whl

Binary file
@@ -0,0 +1,1102 @@
1
+ Metadata-Version: 2.4
2
+ Name: rusty_graph
3
+ Version: 0.3.18
4
+ Classifier: Development Status :: 4 - Beta
5
+ Classifier: Intended Audience :: Developers
6
+ Classifier: Intended Audience :: Science/Research
7
+ Classifier: License :: OSI Approved :: MIT License
8
+ Classifier: Operating System :: OS Independent
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Programming Language :: Python :: 3.8
11
+ Classifier: Programming Language :: Python :: 3.9
12
+ Classifier: Programming Language :: Python :: 3.10
13
+ Classifier: Programming Language :: Python :: 3.11
14
+ Classifier: Programming Language :: Python :: 3.12
15
+ Classifier: Programming Language :: Rust
16
+ Classifier: Programming Language :: Python :: Implementation :: CPython
17
+ Classifier: Programming Language :: Python :: Implementation :: PyPy
18
+ Classifier: Topic :: Database
19
+ Classifier: Topic :: Scientific/Engineering
20
+ Classifier: Topic :: Software Development :: Libraries :: Python Modules
21
+ License-File: LICENSE
22
+ Summary: A high-performance graph database library with Python bindings written in Rust
23
+ Keywords: graph,database,knowledge-graph,rust,high-performance,data-science
24
+ Author-email: Kristian dF Kollsgård <kkollsg@gmail.com>
25
+ License: MIT
26
+ Requires-Python: >=3.8
27
+ Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
28
+ Project-URL: Documentation, https://github.com/kkollsga/rusty-graph#readme
29
+ Project-URL: Homepage, https://github.com/kkollsga/rusty-graph
30
+ Project-URL: Repository, https://github.com/kkollsga/rusty-graph
31
+
32
+ # Rusty Graph Python Library
33
+
34
+ A high-performance graph database library with Python bindings written in Rust.
35
+
36
+ ## Table of Contents
37
+
38
+ - [Installation](#installation)
39
+ - [Introduction](#introduction)
40
+ - [Basic Usage](#basic-usage)
41
+ - [Working with Nodes](#working-with-nodes)
42
+ - [Working with Dates](#working-with-dates)
43
+ - [Creating Connections](#creating-connections)
44
+ - [Filtering and Querying](#filtering-and-querying)
45
+ - [Basic Filtering](#basic-filtering)
46
+ - [Null Value Checks](#null-value-checks)
47
+ - [Filtering Orphan Nodes](#filtering-orphan-nodes)
48
+ - [Sorting Results](#sorting-results)
49
+ - [Limiting Results](#limiting-results)
50
+ - [Traversing the Graph](#traversing-the-graph)
51
+ - [Set Operations on Selections](#set-operations-on-selections)
52
+ - [Path Finding and Graph Algorithms](#path-finding-and-graph-algorithms)
53
+ - [Pattern Matching](#pattern-matching)
54
+ - [Subgraph Extraction](#subgraph-extraction)
55
+ - [Spatial and Geometry Operations](#spatial-and-geometry-operations)
56
+ - [Schema Definition and Validation](#schema-definition-and-validation)
57
+ - [Index Management](#index-management)
58
+ - [Export Formats](#export-formats)
59
+ - [Statistics and Calculations](#statistics-and-calculations)
60
+ - [Saving and Loading](#saving-and-loading)
61
+ - [Operation Reports](#operation-reports)
62
+ - [Performance Tips](#performance-tips)
63
+
64
+ ## Installation
65
+
66
+ ```bash
67
+ pip install rusty-graph
68
+ # upgrade
69
+ pip install rusty-graph --upgrade
70
+ ```
71
+
72
+ ## Introduction
73
+
74
+ Rusty Graph is a Rust-based library for building high-performance knowledge graphs from Python. It is designed for aggregating and merging data from SQL databases, turning relational tables into structured knowledge graphs. It combines Rust's efficiency with Python's flexibility and is aimed at data scientists and developers who use knowledge graphs in data-driven applications.
75
+
76
+ ## Key Features
77
+
78
+ - **Efficient Data Integration:** Easily import and merge data from SQL databases to construct knowledge graphs, optimizing for performance and scalability.
79
+ - **High-Performance Operations:** Utilize Rust's performance capabilities to handle graph operations, making Rusty Graph ideal for working with large-scale data.
80
+ - **Python Compatibility:** Directly integrate Rusty Graph into Python projects, allowing for a smooth workflow within Python-based data analysis and machine learning pipelines.
81
+ - **Flexible Graph Manipulation:** Create, modify, and query knowledge graphs with a rich set of features, catering to complex data structures and relationships.
82
+ - **Graph Algorithms:** Built-in shortest path, all paths, and connected components algorithms powered by petgraph.
83
+ - **Pattern Matching:** Cypher-like query syntax for expressive multi-hop graph traversals.
84
+ - **Spatial Operations:** Geographic queries including bounding box, distance (Haversine), and WKT geometry intersection.
85
+ - **Export Formats:** Export to GraphML, GEXF, D3 JSON, and CSV for visualization and interoperability.
86
+
87
+ ## Basic Usage
88
+
89
+ ```python
90
+ import rusty_graph
91
+ import pandas as pd
92
+
93
+ # Create a new knowledge graph
94
+ graph = rusty_graph.KnowledgeGraph()
95
+
96
+ # Create some data using pandas
97
+ users_df = pd.DataFrame({
98
+ 'user_id': [1001, 1002, 1003],
99
+ 'name': ['Alice', 'Bob', 'Charlie'],
100
+ 'age': [28, 35, 42]
101
+ })
102
+
103
+ # Add nodes to the graph
104
+ graph.add_nodes(
105
+ data=users_df,
106
+ node_type='User',
107
+ unique_id_field='user_id',
108
+ node_title_field='name'
109
+ )
110
+
111
+ # View graph schema
112
+ print(graph.get_schema())
113
+ ```
114
+
115
+ ## Working with Nodes
116
+
117
+ ### Adding Nodes
118
+
119
+ ```python
120
+ # Add products to graph
121
+ products_df = pd.DataFrame({
122
+ 'product_id': [101, 102, 103],
123
+ 'title': ['Laptop', 'Phone', 'Tablet'],
124
+ 'price': [999.99, 699.99, 349.99],
125
+ 'stock': [45, 120, 30]
126
+ })
127
+
128
+ graph.add_nodes(
129
+ data=products_df,
130
+ node_type='Product',
131
+ unique_id_field='product_id',
132
+ node_title_field='title',
133
+ # Optional: specify which columns to include
134
+ columns=['product_id', 'title', 'price', 'stock'],
135
+ # Optional: how to handle conflicts with existing nodes
136
+ conflict_handling='update' # Options: 'update', 'replace', 'skip', 'preserve'
137
+ )
138
+ ```
139
+
140
+ ### Retrieving Nodes
141
+
142
+ ```python
143
+ # Get all products
144
+ products = graph.type_filter('Product')
145
+
146
+ # Get node information
147
+ product_nodes = products.get_nodes()
148
+ print(product_nodes)
149
+
150
+ # Get specific properties
151
+ prices = products.get_properties(['price', 'stock'])
152
+ print(prices)
153
+
154
+ # Get only titles
155
+ titles = products.get_titles()
156
+ print(titles)
157
+ ```
158
+
159
+ ## Working with Dates
160
+
161
+ Rusty Graph supports native DateTime values for date-based filtering and operations.
162
+
163
+ ### Specifying Date Columns
164
+
165
+ When adding nodes, use the `column_types` parameter to specify which columns should be parsed as dates:
166
+
167
+ ```python
168
+ import pandas as pd
169
+
170
+ # Create data with date columns
171
+ estimates_df = pd.DataFrame({
172
+ 'estimate_id': [1, 2, 3],
173
+ 'name': ['Estimate A', 'Estimate B', 'Estimate C'],
174
+ 'valid_from': ['2020-01-01', '2020-06-15', '2021-01-01'],
175
+ 'valid_to': ['2020-12-31', '2021-06-14', '2021-12-31'],
176
+ 'value': [100.5, 250.3, 180.0]
177
+ })
178
+
179
+ # Add nodes with date columns specified
180
+ graph.add_nodes(
181
+ data=estimates_df,
182
+ node_type='Estimate',
183
+ unique_id_field='estimate_id',
184
+ node_title_field='name',
185
+ column_types={'valid_from': 'datetime', 'valid_to': 'datetime'}
186
+ )
187
+ ```
188
+
189
+ ### Filtering on Date Fields
190
+
191
+ Date fields can be filtered using comparison operators. ISO format strings (YYYY-MM-DD) work correctly for date comparisons:
192
+
193
+ ```python
194
+ # Find estimates valid after a specific date
195
+ recent_estimates = graph.type_filter('Estimate').filter({
196
+ 'valid_from': {'>=': '2020-06-01'}
197
+ })
198
+
199
+ # Find estimates within a date range
200
+ active_in_2020 = graph.type_filter('Estimate').filter({
201
+ 'valid_from': {'<=': '2020-12-31'},
202
+ 'valid_to': {'>=': '2020-01-01'}
203
+ })
204
+ ```
205
+
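The reason ISO strings compare correctly is that a zero-padded `YYYY-MM-DD` string sorts in the same order as the date it represents. The sketch below checks this with the standard library only. It does not call Rusty Graph, and the helper name `same_order` is made up for illustration.

```python
from datetime import date

def same_order(a: str, b: str) -> bool:
    """True if comparing the ISO strings agrees with comparing the dates."""
    return (a < b) == (date.fromisoformat(a) < date.fromisoformat(b))

samples = ['2020-01-01', '2020-06-15', '2020-12-31', '2021-01-01']
all_consistent = all(same_order(a, b) for a in samples for b in samples)

# Without zero padding the property breaks: as text, '2020-6-15' sorts
# after '2020-12-31' even though June comes before December.
unpadded_misorders = '2020-6-15' > '2020-12-31'
```

If your source data is not zero-padded, normalising it to `YYYY-MM-DD` before loading is probably the safer choice.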
206
+ ### Temporal Queries
207
+
208
+ For entities with validity periods (like estimates, contracts, or versions), Rusty Graph provides convenient methods to query based on time:
209
+
210
+ ```python
211
+ # Find entities valid at a specific point in time
212
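+ # Default field names: 'date_from' and 'date_to' (data using other names, such as 'valid_from'/'valid_to' above, needs the custom-field form shown next)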
213
+ valid_estimates = graph.type_filter('Estimate').valid_at('2020-06-15')
214
+
215
+ # Use custom field names if your data uses different column names
216
+ active_contracts = graph.type_filter('Contract').valid_at(
217
+ '2021-03-01',
218
+ date_from_field='start_date',
219
+ date_to_field='end_date'
220
+ )
221
+
222
+ # Find entities valid during a date range (overlapping periods)
223
+ overlapping = graph.type_filter('Estimate').valid_during('2020-01-01', '2020-06-30')
224
+
225
+ # Chain with other operations
226
+ high_value_valid = (
227
+ graph.type_filter('Estimate')
228
+ .valid_at('2020-06-15')
229
+ .filter({'value': {'>=': 100.0}})
230
+ )
231
+ ```
232
+
233
+ **Note:** `valid_at(date)` finds nodes where `date_from <= date <= date_to`. `valid_during(start, end)` finds nodes whose validity period overlaps with the given range.
234
+
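The two rules in the note can be modelled in a few lines of plain Python. This is a sketch of the semantics only, not the library's implementation: the functions work on dictionaries rather than graph selections, and treating both ends of the `valid_during` range as inclusive is an assumption.

```python
def valid_at(record, when, date_from_field='date_from', date_to_field='date_to'):
    """Point-in-time check: date_from <= when <= date_to."""
    return record[date_from_field] <= when <= record[date_to_field]

def valid_during(record, start, end, date_from_field='date_from', date_to_field='date_to'):
    """Overlap check: the record's period and [start, end] share at least one day."""
    return record[date_from_field] <= end and record[date_to_field] >= start

# Illustrative records using the default field names
estimates = [
    {'name': 'Estimate A', 'date_from': '2020-01-01', 'date_to': '2020-12-31'},
    {'name': 'Estimate B', 'date_from': '2020-06-15', 'date_to': '2021-06-14'},
    {'name': 'Estimate C', 'date_from': '2021-01-01', 'date_to': '2021-12-31'},
]

at_mid_2020 = [e['name'] for e in estimates if valid_at(e, '2020-06-15')]
overlap_h1_2020 = [e['name'] for e in estimates
                   if valid_during(e, '2020-01-01', '2020-06-30')]
```

In this model both queries select Estimate A and Estimate B. Estimate C starts in 2021 and is excluded by both.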
235
+ ### Batch Property Updates
236
+
237
+ Update properties on multiple nodes at once based on a selection:
238
+
239
+ ```python
240
+ # Select nodes and update them in batch
241
+ result = graph.type_filter('Prospect').filter({'status': 'Inactive'}).update({
242
+ 'is_active': False,
243
+ 'deactivation_reason': 'status_inactive'
244
+ })
245
+
246
+ # Access the updated graph and count
247
+ updated_graph = result['graph']
248
+ nodes_updated = result['nodes_updated']
249
+ print(f"Updated {nodes_updated} nodes")
250
+
251
+ # Use keep_selection=True to preserve the selection for chaining
252
+ result = selection.update({'processed': True}, keep_selection=True)
253
+
254
+ # Update with different value types
255
+ graph.type_filter('Node').update({
256
+ 'count': 42, # Integer
257
+ 'ratio': 3.14159, # Float
258
+ 'active': True, # Boolean
259
+ 'category': 'updated' # String
260
+ })
261
+ ```
262
+
263
+ **Note:** The `update()` method returns a dictionary with `graph` (the updated KnowledgeGraph), `nodes_updated` (count of updated nodes), and `report_index` (index of the operation report). By default, the selection is cleared after update; use `keep_selection=True` to preserve it.
264
+
265
+ ### Query Explain
266
+
267
+ Get insight into how your queries are executed with the `explain()` method:
268
+
269
+ ```python
270
+ # Build a query chain
271
+ result = (
272
+ graph.type_filter('Prospect')
273
+ .filter({'region': 'North'})
274
+ .traverse('HAS_ESTIMATE')
275
+ )
276
+
277
+ # See the execution plan
278
+ print(result.explain())
279
+ # Output: TYPE_FILTER Prospect (6775 nodes) -> FILTER (3200 nodes) -> TRAVERSE HAS_ESTIMATE (10954 nodes)
280
+
281
+ # Works with temporal queries too
282
+ valid_estimates = graph.type_filter('Estimate').valid_at('2020-06-15')
283
+ print(valid_estimates.explain())
284
+ # Output: TYPE_FILTER Estimate (1000 nodes) -> VALID_AT (450 nodes)
285
+ ```
286
+
287
+ **Note:** The `explain()` method shows each operation in the query chain with the actual number of nodes at each step. This helps you understand query performance and optimize your queries.
288
+
289
+ ## Creating Connections
290
+
291
+ ```python
292
+ # Purchase data
293
+ purchases_df = pd.DataFrame({
294
+ 'user_id': [1001, 1001, 1002],
295
+ 'product_id': [101, 103, 102],
296
+ 'date': ['2023-01-15', '2023-02-10', '2023-01-20'],
297
+ 'quantity': [1, 2, 1]
298
+ })
299
+
300
+ # Create connections
301
+ graph.add_connections(
302
+ data=purchases_df,
303
+ connection_type='PURCHASED',
304
+ source_type='User',
305
+ source_id_field='user_id',
306
+ target_type='Product',
307
+ target_id_field='product_id',
308
+ # Optional additional fields to include
309
+ columns=['date', 'quantity']
310
+ )
311
+
312
+ # Create connections from currently selected nodes
313
+ users = graph.type_filter('User')
314
+ products = graph.type_filter('Product')
315
+ # This would connect all users to all products with a 'VIEWED' connection
316
+ users.selection_to_new_connections(connection_type='VIEWED')
317
+ ```
318
+
319
+ ## Filtering and Querying
320
+
321
+ ### Basic Filtering
322
+
323
+ ```python
324
+ # Filter by exact match
325
+ expensive_products = graph.type_filter('Product').filter({'price': 999.99})
326
+
327
+ # Filter using operators
328
+ affordable_products = graph.type_filter('Product').filter({
329
+ 'price': {'<': 500.0}
330
+ })
331
+
332
+ # Multiple conditions
333
+ popular_affordable = graph.type_filter('Product').filter({
334
+ 'price': {'<': 500.0},
335
+ 'stock': {'>': 50}
336
+ })
337
+
338
+ # In operator
339
+ selected_products = graph.type_filter('Product').filter({
340
+ 'product_id': {'in': [101, 103]}
341
+ })
342
+ ```
343
+
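To make the filter dictionary format concrete, here is a small pure-Python model of how such conditions could be evaluated. It is a sketch under assumptions, not Rusty Graph's code: `matches` and `OPS` are hypothetical names, and combining multiple conditions with AND is inferred from the examples above.

```python
import operator

# Operator symbols taken from the examples above; 'in' tests membership.
OPS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    'in': lambda value, options: value in options,
}

def matches(node, conditions):
    """True if the node satisfies every condition (conditions are ANDed)."""
    for field, cond in conditions.items():
        value = node.get(field)
        if isinstance(cond, dict):
            # Operator form, e.g. {'<': 500.0}
            for op, operand in cond.items():
                if value is None or not OPS[op](value, operand):
                    return False
        elif value != cond:
            # A bare value means exact match
            return False
    return True

products = [
    {'product_id': 101, 'title': 'Laptop', 'price': 999.99, 'stock': 45},
    {'product_id': 102, 'title': 'Phone', 'price': 699.99, 'stock': 120},
    {'product_id': 103, 'title': 'Tablet', 'price': 349.99, 'stock': 30},
]

affordable = [p['title'] for p in products if matches(p, {'price': {'<': 500.0}})]
popular_affordable = [p['title'] for p in products
                      if matches(p, {'price': {'<': 500.0}, 'stock': {'>': 50}})]
selected = [p['title'] for p in products
            if matches(p, {'product_id': {'in': [101, 103]}})]
```

With the sample products from this README, the combined price and stock filter matches nothing in this model, because the only product under 500.0 has a stock of 30.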
344
+ ### Null Value Checks
345
+
346
+ You can filter nodes based on whether a field is null (missing) or not null:
347
+
348
+ ```python
349
+ # Find nodes where a field is null or missing
350
+ nodes_without_category = graph.type_filter('Product').filter({
351
+ 'category': {'is_null': True}
352
+ })
353
+
354
+ # Find nodes where a field exists and is not null
355
+ nodes_with_category = graph.type_filter('Product').filter({
356
+ 'category': {'is_not_null': True}
357
+ })
358
+
359
+ # Combine with other conditions
360
+ incomplete_products = graph.type_filter('Product').filter({
361
+ 'description': {'is_null': True},
362
+ 'price': {'>': 0}
363
+ })
364
+ ```
365
+
366
+ ### Filtering Orphan Nodes
367
+
368
+ Orphan nodes are nodes that have no connections (no incoming or outgoing edges). You can filter to include or exclude orphan nodes:
369
+
370
+ ```python
371
+ # Get only orphan nodes
372
+ orphans = graph.filter_orphans(include_orphans=True)
373
+
374
+ # Get only nodes that have at least one connection
375
+ connected = graph.filter_orphans(include_orphans=False)
376
+
377
+ # Filter orphans with sorting and limits
378
+ recent_orphans = graph.filter_orphans(
379
+ include_orphans=True,
380
+ sort_spec='created_date',
381
+ max_nodes=100
382
+ )
383
+
384
+ # Chain with other operations
385
+ product_orphans = graph.type_filter('Product').filter_orphans(include_orphans=True)
386
+ ```
387
+
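The orphan definition above (no incoming and no outgoing edges) can be illustrated without the library. The sketch below uses plain lists, and `split_orphans` is a hypothetical helper rather than part of the Rusty Graph API.

```python
def split_orphans(node_ids, edges):
    """Partition node IDs into (orphans, connected) given (source, target) pairs."""
    touched = set()
    for source, target in edges:
        touched.add(source)
        touched.add(target)
    orphans = [n for n in node_ids if n not in touched]
    connected = [n for n in node_ids if n in touched]
    return orphans, connected

# User and product IDs from the earlier examples, plus the PURCHASED pairs
node_ids = [1001, 1002, 1003, 101, 102, 103]
purchases = [(1001, 101), (1001, 103), (1002, 102)]
orphans, connected = split_orphans(node_ids, purchases)
```

Here user 1003 is the only orphan, since that user appears in no purchase.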
388
+ ### Sorting Results
389
+
390
+ Rusty Graph offers flexible options for sorting nodes based on their properties. The `sort_spec` parameter is accepted by several methods, including `type_filter()`, `filter()`, and `filter_orphans()`; `traverse()` takes the same specification through its `sort_target` parameter, and the standalone `sort()` method takes it as its first argument.
391
+
392
+ #### Sort Specification Format Options
393
+
394
+ 1. **Single field string**: Sorts by the specified field in ascending order.
395
+ ```python
396
+ # Sort products by price (lowest to highest)
397
+ sorted_products = graph.type_filter('Product').sort('price')
398
+
399
+ # Can also be used in other methods
400
+ cheap_products = graph.type_filter('Product').filter(
401
+ {'stock': {'>': 10}},
402
+ sort_spec='price'
403
+ )
404
+ ```
405
+
406
+ 2. **Field with direction**: Explicitly specify ascending or descending order.
407
+ ```python
408
+ # Sort products by price (highest to lowest)
409
+ expensive_first = graph.type_filter('Product').sort('price', ascending=False)
410
+ ```
411
+
412
+ 3. **List of tuples**: For multi-field sorting with different directions.
413
+ ```python
414
+ # First sort by stock (descending), then by price (ascending)
415
+ # This prioritizes high-stock items, and for items with equal stock,
416
+ # shows the cheapest ones first
417
+ complex_sort = graph.type_filter('Product').sort([
418
+ ('stock', False), # False = descending order
419
+ ('price', True) # True = ascending order
420
+ ])
421
+ ```
422
+
423
+ 4. **Dictionary with field and direction**: Alternative format for single field sorting.
424
+ ```python
425
+ # Sort by rating in descending order
426
+ top_rated = graph.type_filter('Product').filter(
427
+ {},
428
+ sort_spec={'field': 'rating', 'ascending': False}
429
+ )
430
+ ```
431
+
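The list-of-tuples format describes a multi-key sort with a direction per key. One way to reproduce that ordering in plain Python is to rely on the stability of `list.sort` and apply the keys from least to most significant. This is a sketch of the ordering only; `sort_by_spec` is a hypothetical helper and the product rows are made-up data with tied stock values.

```python
def sort_by_spec(rows, spec):
    """Sort dicts by a list of (field, ascending) tuples, first tuple most significant."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields the combined ordering.
    for field, ascending in reversed(spec):
        result.sort(key=lambda row: row[field], reverse=not ascending)
    return result

products = [
    {'title': 'Laptop', 'price': 999.99, 'stock': 45},
    {'title': 'Phone', 'price': 699.99, 'stock': 120},
    {'title': 'Tablet', 'price': 349.99, 'stock': 45},
    {'title': 'Monitor', 'price': 199.99, 'stock': 120},
]

# Stock descending, then price ascending within equal stock
ordered = [p['title'] for p in sort_by_spec(products, [('stock', False), ('price', True)])]
```

The high-stock items come first, and within each stock level the cheaper item leads.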
432
+ #### Using Sort Specifications in Different Methods
433
+
434
+ Sort specifications work consistently across methods:
435
+
436
+ ```python
437
+ # In type_filter
438
+ latest_users = graph.type_filter('User', sort_spec='creation_date', max_nodes=10)
439
+
440
+ # In filter
441
+ new_expensive = graph.type_filter('Product').filter(
442
+ {'price': {'>': 500.0}},
443
+ sort_spec=[('creation_date', False), ('price', True)]
444
+ )
445
+
446
+ # In traversal
447
+ alice_recent_purchases = graph.type_filter('User').filter({'name': 'Alice'}).traverse(
448
+ connection_type='PURCHASED',
449
+ sort_target='date',
450
+ max_nodes=5
451
+ )
452
+
453
+ # In filter_orphans
454
+ recent_orphans = graph.filter_orphans(
455
+ include_orphans=True,
456
+ sort_spec='last_modified',
457
+ max_nodes=20
458
+ )
459
+
460
+ # In children_properties_to_list
461
+ expensive_products = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
462
+ property='title',
463
+ sort_spec='price', # Sort children by price before creating the list
464
+ max_nodes=3,
465
+ store_as='top_expensive_purchases'
466
+ )
467
+ ```
468
+
469
+ ### Limiting Results
470
+
471
+ ```python
472
+ # Get at most 5 nodes per group
473
+ limited_products = graph.type_filter('Product').max_nodes(5)
474
+ ```
475
+
476
+ ## Traversing the Graph
477
+
478
+ ```python
479
+ # Find products purchased by a specific user
480
+ alice = graph.type_filter('User').filter({'name': 'Alice'})
481
+ alice_products = alice.traverse(
482
+ connection_type='PURCHASED',
483
+ direction='outgoing'
484
+ )
485
+
486
+ # Access the resulting products
487
+ alice_product_data = alice_products.get_nodes()
488
+
489
+ # Filter the traversal target nodes
490
+ expensive_purchases = alice.traverse(
491
+ connection_type='PURCHASED',
492
+ filter_target={'price': {'>=': 500.0}},
493
+ sort_target='price',
494
+ max_nodes=10
495
+ )
496
+
497
+ # Get connection information
498
+ connection_data = alice.get_connections(include_node_properties=True)
499
+ ```
500
+
501
+ ### Filtering on Connection Properties
502
+
503
+ You can filter traversals based on properties stored on the connections themselves:
504
+
505
+ ```python
506
+ # Traverse only through connections with specific property values
507
+ high_share_blocks = graph.type_filter('Discovery').traverse(
508
+ connection_type='EXTENDS_INTO',
509
+ filter_connection={'share_pct': {'>=': 50.0}}
510
+ )
511
+
512
+ # Combine connection and target filters
513
+ result = graph.type_filter('Discovery').traverse(
514
+ connection_type='EXTENDS_INTO',
515
+ filter_connection={'year': 2021},
516
+ filter_target={'status': 'active'}
517
+ )
518
+
519
+ # Filter connections with null/not-null checks
520
+ discounted = user.traverse(
521
+ connection_type='PURCHASED',
522
+ filter_connection={'discount': {'is_not_null': True}}
523
+ )
524
+ ```
525
+
526
+ ## Set Operations on Selections
527
+
528
+ Rusty Graph supports set operations to combine, intersect, or subtract selections. These operations create new selections without modifying the originals.
529
+
530
+ ### Union
531
+
532
+ Combines all nodes from both selections (logical OR):
533
+
534
+ ```python
535
+ # Select prospects from different geoprovinces
536
+ n3_prospects = graph.type_filter('Prospect').filter({'geoprovince': 'N3'})
537
+ m3_prospects = graph.type_filter('Prospect').filter({'geoprovince': 'M3'})
538
+
539
+ # Combine both selections
540
+ combined = n3_prospects.union(m3_prospects)
541
+ print(f"Total prospects: {len(combined.get_nodes())}")
542
+ ```
543
+
544
+ ### Intersection
545
+
546
+ Keeps only nodes present in both selections (logical AND):
547
+
548
+ ```python
549
+ # Select large discoveries and discoveries in a specific block
550
+ large_discoveries = graph.type_filter('Discovery').filter({'oil_reserves': {'>=': 100.0}})
551
+ block_34_discoveries = graph.type_filter('Block').filter({'block_id': 34}).traverse('CONTAINS', direction='incoming')
552
+
553
+ # Get large discoveries in block 34
554
+ result = large_discoveries.intersection(block_34_discoveries)
555
+ ```
556
+
557
+ ### Difference
558
+
559
+ Keeps nodes in the first selection but not in the second (subtraction):
560
+
561
+ ```python
562
+ # Get all prospects
563
+ all_prospects = graph.type_filter('Prospect')
564
+
565
+ # Get prospects that have estimates
566
+ with_estimates = graph.type_filter('ProspectEstimate').traverse('BELONGS_TO', direction='incoming')
567
+
568
+ # Get prospects WITHOUT estimates
569
+ without_estimates = all_prospects.difference(with_estimates)
570
+ ```
571
+
572
+ ### Symmetric Difference
573
+
574
+ Keeps nodes that are in exactly one selection but not both (exclusive OR):
575
+
576
+ ```python
577
+ # Nodes in category A or B but not both
578
+ exclusive_nodes = category_a.symmetric_difference(category_b)
579
+ ```
580
+
581
+ ### Chaining Operations
582
+
583
+ Set operations can be chained for complex queries:
584
+
585
+ ```python
586
+ # (A union B) intersection C
587
+ result = selection_a.union(selection_b).intersection(selection_c)
588
+
589
+ # A difference (B intersection C)
590
+ b_inter_c = selection_b.intersection(selection_c)
591
+ result = selection_a.difference(b_inter_c)
592
+ ```
593
+
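These operations mirror Python's built-in set semantics. As a quick reference, the sketch below models three hypothetical selections as sets of node IDs and evaluates the same expressions, including the two chained forms.

```python
# Hypothetical selections, modelled as sets of node IDs
selection_a = {1, 2, 3, 4}
selection_b = {3, 4, 5}
selection_c = {4, 5, 6}

union = selection_a | selection_b          # in A or B
intersection = selection_a & selection_b   # in both A and B
difference = selection_a - selection_b     # in A but not in B
exclusive = selection_a ^ selection_b      # in exactly one of A and B

# (A union B) intersection C
chained = (selection_a | selection_b) & selection_c
# A difference (B intersection C)
nested = selection_a - (selection_b & selection_c)
```

As with the selection methods, each expression builds a new set and leaves its operands unchanged.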
594
+ ## Path Finding and Graph Algorithms
595
+
596
+ Rusty Graph provides efficient implementations of common graph algorithms powered by petgraph.
597
+
598
+ ### Shortest Path
599
+
600
+ Find the shortest path between two nodes:
601
+
602
+ ```python
603
+ # Find shortest path between two nodes
604
+ path = graph.shortest_path(
605
+ source_type='Person',
606
+ source_id=1,
607
+ target_type='Person',
608
+ target_id=100,
609
+ max_hops=10 # Optional limit
610
+ )
611
+
612
+ # Path is a list of node dictionaries
613
+ for node in path:
614
+     print(f"{node['node_type']}: {node['title']}")
615
+ ```
616
+
617
+ ### All Paths
618
+
619
+ Find all paths between nodes up to a maximum number of hops:
620
+
621
+ ```python
622
+ # Find all paths up to 4 hops
623
+ paths = graph.all_paths(
624
+ source_type='Play',
625
+ source_id=1,
626
+ target_type='Wellbore',
627
+ max_hops=4
628
+ )
629
+
630
+ # Returns a list of paths, each path is a list of nodes
631
+ print(f"Found {len(paths)} paths")
632
+ for i, path in enumerate(paths):
633
+     print(f"Path {i+1}: {' -> '.join(n['title'] for n in path)}")
634
+ ```
635
+
636
+ ### Connected Components
637
+
638
+ Identify connected components in the graph:
639
+
640
+ ```python
641
+ # Get all connected components
642
+ components = graph.connected_components()
643
+
644
+ # Returns a list of components, each component is a list of node IDs
645
+ print(f"Found {len(components)} connected components")
646
+ for i, component in enumerate(components):
647
+     print(f"Component {i+1}: {len(component)} nodes")
648
+ ```
649
+
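For readers who want to see what a component grouping looks like, here is a minimal breadth-first sketch on an edge list. It is not the petgraph-backed implementation, and it assumes edges are treated as undirected (weak connectivity), which may differ from what the library does.

```python
from collections import defaultdict, deque

def connected_components(node_ids, edges):
    """Group nodes into components, treating every edge as undirected."""
    neighbours = defaultdict(set)
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    components = []
    for start in node_ids:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

components = connected_components(
    node_ids=['a', 'b', 'c', 'd', 'e'],
    edges=[('a', 'b'), ('c', 'b'), ('d', 'e')],
)
```

The result has the same shape as described above: a list of components, each a list of node IDs.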
650
+ ## Pattern Matching
651
+
652
+ Query the graph using Cypher-like pattern syntax for expressive multi-hop queries:
653
+
654
+ ```python
655
+ # Simple pattern: Find plays with prospects that became discoveries
656
+ results = graph.match_pattern(
657
+ '(p:Play)-[:HAS_PROSPECT]->(pr:Prospect)-[:BECAME_DISCOVERY]->(d:Discovery)'
658
+ )
659
+
660
+ # Access matched variables
661
+ for match in results:
662
+     print(f"Play: {match['p']['title']}")
663
+     print(f"Prospect: {match['pr']['title']}")
664
+     print(f"Discovery: {match['d']['title']}")
665
+
666
+ # Pattern with property conditions
667
+ results = graph.match_pattern(
668
+ '(u:User)-[:PURCHASED]->(p:Product {category: "Electronics"})'
669
+ )
670
+
671
+ # Limit results for performance on large graphs
672
+ results = graph.match_pattern(
673
+ '(a:Person)-[:KNOWS]->(b:Person)',
674
+ max_matches=100
675
+ )
676
+ ```
677
+
678
+ **Supported pattern syntax:**
679
+
680
+ - Node patterns: `(variable:NodeType)` or `(variable:NodeType {property: "value"})`
681
+ - Relationship patterns: `-[:CONNECTION_TYPE]->`
682
+ - Multiple hops: Chain patterns like `(a)-[:REL1]->(b)-[:REL2]->(c)`
683
+
684
+ ## Subgraph Extraction
685
+
686
+ Extract a portion of the graph for isolated analysis or export:
687
+
688
+ ```python
689
+ # Start with a selection and expand to include neighbors
690
+ subgraph = (
691
+ graph.type_filter('Company')
692
+ .filter({'name': 'Acme Corp'})
693
+ .expand(hops=2) # Include all nodes within 2 hops
694
+ .to_subgraph() # Create independent subgraph
695
+ )
696
+
697
+ # The subgraph is a fully functional KnowledgeGraph
698
+ print(f"Subgraph has {subgraph.node_count()} nodes")
699
+
700
+ # Save the subgraph
701
+ subgraph.save('acme_network.bin')
702
+
703
+ # Export to visualization format
704
+ subgraph.export('acme_network.graphml', format='graphml')
705
+ ```
706
+
707
+ ### Expand Method
708
+
709
+ The `expand()` method uses breadth-first search to include neighboring nodes:
710
+
711
+ ```python
712
+ # Expand selection by 1 hop (immediate neighbors)
713
+ expanded = graph.type_filter('Person').filter({'name': 'Alice'}).expand(hops=1)
714
+
715
+ # Expand by 3 hops for broader context
716
+ broad_context = selection.expand(hops=3)
717
+ ```
718
+
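The hop-limited breadth-first expansion can be sketched as a level-by-level frontier walk. The code below is an illustration on plain sets and tuples, not the library's implementation, and following edges in both directions is an assumption. It defines its own standalone `expand` function, which is unrelated to the selection method.

```python
from collections import defaultdict

def expand(selection, edges, hops):
    """Return the selection plus every node within `hops` edges of it."""
    neighbours = defaultdict(set)
    for source, target in edges:
        # Assumption: neighbours are followed in both directions
        neighbours[source].add(target)
        neighbours[target].add(source)

    reached = set(selection)
    frontier = set(selection)
    for _ in range(hops):
        # One BFS level: all unseen neighbours of the current frontier
        frontier = {n for node in frontier for n in neighbours[node]} - reached
        if not frontier:
            break
        reached |= frontier
    return reached

edges = [('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Dave'), ('Eve', 'Frank')]
one_hop = expand({'Alice'}, edges, hops=1)
three_hops = expand({'Alice'}, edges, hops=3)
```

Each extra hop adds one more ring of neighbours, so the result can grow quickly on densely connected graphs.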
719
+ ## Spatial and Geometry Operations
720
+
721
+ Query nodes based on geographic location and geometry. Useful for GIS applications and location-based analysis.
722
+
723
+ ### Bounding Box Queries
724
+
725
+ Find nodes within a rectangular geographic area:
726
+
727
+ ```python
728
+ # Find discoveries within a bounding box
729
+ north_sea_discoveries = graph.type_filter('Discovery').within_bounds(
730
+ lat_field='latitude',
731
+ lon_field='longitude',
732
+ min_lat=58.0,
733
+ max_lat=62.0,
734
+ min_lon=1.0,
735
+ max_lon=5.0
736
+ )
737
+ ```
738
+
739
+ ### Distance Queries (Haversine)
740
+
741
+ Find nodes within a radius of a point using great-circle distance:
742
+
743
+ ```python
744
+ # Find wellbores within 50km of a location
745
+ nearby_wellbores = graph.type_filter('Wellbore').near_point_km(
746
+ lat_field='latitude',
747
+ lon_field='longitude',
748
+ center_lat=60.5,
749
+ center_lon=3.2,
750
+ radius_km=50.0
751
+ )
752
+ ```
753
+
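For reference, the Haversine great-circle distance behind a radius query like this can be written with the standard library alone. The sketch below is independent of Rusty Graph: the mean Earth radius of 6371 km, the helper names, and the wellbore coordinates are all assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def near_point(nodes, center_lat, center_lon, radius_km):
    """Keep nodes whose distance to the centre is at most radius_km."""
    return [n for n in nodes
            if haversine_km(n['latitude'], n['longitude'], center_lat, center_lon) <= radius_km]

wellbores = [
    {'title': 'W-1', 'latitude': 60.5, 'longitude': 3.2},  # at the centre
    {'title': 'W-2', 'latitude': 60.7, 'longitude': 3.2},  # about 22 km north
    {'title': 'W-3', 'latitude': 62.0, 'longitude': 3.2},  # about 167 km north
]
nearby = [n['title'] for n in near_point(wellbores, 60.5, 3.2, 50.0)]
```

Unlike a bounding box, this test measures distance along the Earth's surface, so the selected area is a circle around the centre point.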
754
+ ### WKT Geometry Intersection
755
+
756
+ Find nodes whose geometry intersects with a WKT polygon:
757
+
758
+ ```python
759
+ # Define a polygon in WKT format
760
+ search_area = 'POLYGON((1 58, 5 58, 5 62, 1 62, 1 58))'
761
+
762
+ # Find fields that intersect the polygon
763
+ fields_in_area = graph.type_filter('Field').intersects(
764
+ geometry_field='wkt_geometry',
765
+ wkt=search_area
766
+ )
767
+ ```
768
+
769
+ ## Schema Definition and Validation
770
+
771
+ Define expected structure and validate your graph data:
772
+
773
+ ### Defining a Schema
774
+
775
+ ```python
776
+ # Define schema for node types and connections
777
+ graph.define_schema({
778
+ 'nodes': {
779
+ 'Prospect': {
780
+ 'required': ['npdid_prospect', 'prospect_name'],
781
+ 'optional': ['prospect_status', 'prospect_geoprovince'],
782
+ 'types': {
783
+ 'npdid_prospect': 'integer',
784
+ 'prospect_name': 'string',
785
+ 'prospect_ns_dec': 'float'
786
+ }
787
+ },
788
+ 'ProspectEstimate': {
789
+ 'required': ['estimate_id'],
790
+ 'types': {
791
+ 'estimate_id': 'integer',
792
+ 'value': 'float'
793
+ }
794
+ }
795
+ },
796
+ 'connections': {
797
+ 'HAS_ESTIMATE': {
798
+ 'source': 'Prospect',
799
+ 'target': 'ProspectEstimate'
800
+ }
801
+ }
802
+ })
803
+ ```
804
+
805
+ ### Validating Against Schema
806
+
807
+ ```python
808
+ # Validate the graph against the defined schema
809
+ errors = graph.validate_schema()
810
+
811
+ if errors:
812
+     print("Validation errors found:")
812
+     for error in errors:
813
+         print(f" - {error}")
814
+ else:
815
+     print("Graph validates successfully!")
817
+ ```
818
+
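To show the kind of checks a schema with `required` and `types` entries implies, here is a small validator over plain dictionaries. It is a sketch under assumptions: the error messages, the `validate_nodes` helper, and the exact rules are hypothetical, and the library's `validate_schema()` may check more (for example connections).

```python
TYPE_CHECKS = {
    'integer': lambda v: isinstance(v, int) and not isinstance(v, bool),
    'float': lambda v: isinstance(v, float),
    'string': lambda v: isinstance(v, str),
}

def validate_nodes(nodes, node_schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for node in nodes:
        rules = node_schema.get(node['node_type'])
        if rules is None:
            continue  # no rules declared for this node type
        for field in rules.get('required', []):
            if node.get(field) is None:
                errors.append(f"{node['node_type']}: missing required field '{field}'")
        for field, type_name in rules.get('types', {}).items():
            value = node.get(field)
            if value is not None and not TYPE_CHECKS[type_name](value):
                errors.append(f"{node['node_type']}: field '{field}' is not of type {type_name}")
    return errors

node_schema = {
    'Prospect': {
        'required': ['npdid_prospect', 'prospect_name'],
        'types': {'npdid_prospect': 'integer', 'prospect_name': 'string'},
    }
}
nodes = [
    {'node_type': 'Prospect', 'npdid_prospect': 1, 'prospect_name': 'Alpha'},
    {'node_type': 'Prospect', 'npdid_prospect': '2'},  # wrong type, name missing
]
errors = validate_nodes(nodes, node_schema)
```

The second node produces two problems in this model: a missing `prospect_name` and a non-integer `npdid_prospect`.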
819
+ ### Getting Current Schema
820
+
821
+ ```python
822
+ # View the current schema (auto-generated from data)
823
+ schema = graph.get_schema()
824
+ print(schema)
825
+ ```
826
+
827
+ ## Index Management
828
+
829
+ Create indexes for faster filtering on frequently queried properties:
830
+
831
+ ### Creating Indexes
832
+
833
+ ```python
834
+ # Create an index on a property
835
+ graph.create_index('Prospect', 'prospect_geoprovince')
836
+
837
+ # Indexed properties get O(1) lookup for equality filters
838
+ # This query will be much faster with an index:
839
+ north_prospects = graph.type_filter('Prospect').filter({
840
+ 'prospect_geoprovince': 'North Sea'
841
+ })
842
+ ```
843
+
844
+ ### Listing and Dropping Indexes
845
+
846
+ ```python
847
+ # List all indexes
848
+ indexes = graph.list_indexes()
849
+ for idx in indexes:
850
+     print(f"Index on {idx['node_type']}.{idx['property']}")
851
+
852
+ # Drop an index
853
+ graph.drop_index('Prospect', 'prospect_geoprovince')
854
+ ```
855
+
856
+ **Performance Note:** Benchmarks show ~3.3x speedup for equality filters on indexed properties. Create indexes on properties you frequently filter by exact value.
857
+
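The idea behind an equality index is a lookup table from property value to the nodes that hold it. The sketch below contrasts a full scan with a dictionary lookup in plain Python. It illustrates the concept only: `build_index` is a hypothetical helper, and how Rusty Graph stores its indexes internally is not stated here.

```python
from collections import defaultdict

def build_index(nodes, prop):
    """Map each property value to the positions of the nodes that hold it."""
    index = defaultdict(list)
    for position, node in enumerate(nodes):
        if prop in node:
            index[node[prop]].append(position)
    return index

prospects = [
    {'title': 'P1', 'prospect_geoprovince': 'North Sea'},
    {'title': 'P2', 'prospect_geoprovince': 'Barents Sea'},
    {'title': 'P3', 'prospect_geoprovince': 'North Sea'},
]

# Without an index: scan every node
scanned = [n['title'] for n in prospects
           if n.get('prospect_geoprovince') == 'North Sea']

# With an index: one dictionary lookup, then fetch the matching nodes
index = build_index(prospects, 'prospect_geoprovince')
indexed = [prospects[i]['title'] for i in index.get('North Sea', [])]
```

Both approaches return the same nodes. The index trades extra memory and build time for faster repeated equality lookups, which is why it suits properties you filter on often.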
858
+ ## Export Formats
859
+
860
+ Export your graph to various formats for visualization and interoperability:
861
+
862
+ ### Export to File
863
+
864
+ ```python
865
+ # GraphML format (compatible with Gephi, yEd, etc.)
866
+ graph.export('my_graph.graphml', format='graphml')
867
+
868
+ # GEXF format (Gephi native format)
869
+ graph.export('my_graph.gexf', format='gexf')
870
+
871
+ # D3.js JSON format (for web visualization)
872
+ graph.export('my_graph.json', format='d3')
873
+
874
+ # CSV format (nodes and edges as CSV)
875
+ graph.export('my_graph.csv', format='csv')
876
+ ```
877
+
878
+ ### Export to String
879
+
880
+ Get export data as a string for programmatic use:
881
+
882
+ ```python
883
+ # Get GraphML as string
884
+ graphml_string = graph.export_string(format='graphml')
885
+
886
+ # Get D3 JSON as string
887
+ d3_json = graph.export_string(format='d3')
888
+
889
+ # Export only current selection
890
+ selected_json = graph.type_filter('Person').export_string(
891
+ format='d3',
892
+ selection_only=True
893
+ )
894
+ ```
895
+
896
+ ### Export Subgraphs
897
+
898
+ Combine with subgraph extraction for partial exports:
899
+
900
+ ```python
901
+ # Export just a portion of the graph
902
+ subgraph = (
903
+ graph.type_filter('Company')
904
+ .filter({'region': 'Europe'})
905
+ .expand(hops=2)
906
+ .to_subgraph()
907
+ )
908
+ subgraph.export('europe_companies.graphml', format='graphml')
909
+ ```
910
+
911
+ ## Statistics and Calculations
912
+
913
+ ### Basic Statistics
914
+
915
+ ```python
916
+ # Get statistics for a property
917
+ price_stats = graph.type_filter('Product').statistics('price')
918
+ print(price_stats)
919
+
920
+ # Calculate unique values
921
+ unique_categories = graph.type_filter('Product').unique_values(
922
+ property='category',
923
+ # Store result in node property
924
+ store_as='category_list',
925
+ max_length=10
926
+ )
927
+
928
+ # Convert children properties to a comma-separated list in parent nodes
929
+ # Option 1: Store results in parent nodes
930
+ users_with_products = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
931
+ property='title', # Default is 'title' if not specified
932
+ filter={'price': {'<': 500.0}}, # Optional filtering of children
933
+ sort_spec='price', # Optional sorting of children
934
+ max_nodes=5, # Optional limit of children per parent
935
+ store_as='purchased_products', # Property name to store the list in parent
936
+ max_length=100, # Optional maximum string length (adds "..." if truncated)
937
+ keep_selection=False # Whether to keep the current selection
938
+ )
939
+
940
+ # Option 2: Get results as a dictionary without storing them
941
+ product_names = graph.type_filter('User').traverse('PURCHASED').children_properties_to_list(
942
+ property='title',
943
+ sort_spec='price',
944
+ max_nodes=5
945
+ )
946
+ print(product_names) # Returns {'User1': 'Product1, Product2', 'User2': 'Product3, Product4, Product5'}
947
+ ```
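The `max_length` comment above says a list string gets "..." appended when it is truncated. A minimal pure-Python sketch of that behaviour follows; `join_with_limit` is a hypothetical helper, not part of rusty_graph, and it assumes the limit applies to the text before the ellipsis is added, which the README does not specify.

```python
def join_with_limit(values, max_length=None, separator=", "):
    """Join values into one string; truncate and append '...' if too long."""
    text = separator.join(str(v) for v in values)
    if max_length is not None and len(text) > max_length:
        # Assumption: the limit counts characters before the ellipsis.
        text = text[:max_length] + "..."
    return text


titles = ["Laptop", "Phone", "Tablet"]
full = join_with_limit(titles)                  # no limit, nothing is cut
short = join_with_limit(titles, max_length=10)  # cut after 10 characters
print(full)
print(short)
```

The library may count the ellipsis toward the limit or cut at item boundaries instead, so check the stored property on a small graph before relying on exact lengths.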
948
+
949
+ ### Custom Calculations
950
+
951
+ ```python
952
+ # Simple calculation: tax-inclusive price
953
+ with_tax = graph.type_filter('Product').calculate(
954
+ expression='price * 1.1',
955
+ store_as='price_with_tax'
956
+ )
957
+
958
+ # Aggregate calculations per group
959
+ user_spending = graph.type_filter('User').traverse('PURCHASED').calculate(
960
+ expression='sum(price * quantity)',
961
+ store_as='total_spent'
962
+ )
963
+
964
+ # Count operations
965
+ products_per_user = graph.type_filter('User').traverse('PURCHASED').count(
966
+ store_as='product_count',
967
+ group_by_parent=True
968
+ )
969
+ ```
970
+
971
+ ### Aggregating Connection Properties
972
+
973
+ Aggregate properties stored on connections (edges) rather than nodes. This is useful when you have data like ownership percentages, weights, or quantities stored on the connections themselves.
974
+
975
+ ```python
976
+ # Sum connection properties
977
+ # For each Discovery, sum the share_pct on its EXTENDS_INTO connections
978
+ total_shares = graph.type_filter('Discovery').traverse('EXTENDS_INTO').calculate(
979
+ expression='sum(share_pct)',
980
+ aggregate_connections=True # Key parameter for connection aggregation
981
+ )
982
+ print(total_shares) # Returns {'Discovery A': 100.0, 'Discovery B': 100.0}
983
+
984
+ # Average connection properties
985
+ avg_ownership = graph.type_filter('Company').traverse('OWNS').calculate(
986
+ expression='avg(ownership_pct)',
987
+ aggregate_connections=True
988
+ )
989
+
990
+ # Count connections
991
+ connection_count = graph.type_filter('Parent').traverse('HAS_CHILD').calculate(
992
+ expression='count(any_property)', # Use any property that exists on connections
993
+ aggregate_connections=True
994
+ )
995
+
996
+ # Store aggregated results on parent nodes
997
+ updated_graph = graph.type_filter('Prospect').traverse('HAS_ESTIMATE').calculate(
998
+ expression='sum(weight)',
999
+ aggregate_connections=True,
1000
+ store_as='total_weight' # Stores result on parent Prospect nodes
1001
+ )
1002
+ ```
1003
+
1004
+ **Supported aggregate functions for connections:**
1005
+
1006
+ - `sum(property)` - Sum of property values
1007
+ - `avg(property)` / `mean(property)` - Average of property values
1008
+ - `min(property)` - Minimum value
1009
+ - `max(property)` - Maximum value
1010
+ - `count(property)` - Count of connections (with non-null property values)
1011
+ - `std(property)` - Standard deviation
1012
+
1013
+ **Note:** Connection aggregation requires a traversal before `calculate()`. The results are grouped by the parent (source) node of the traversal.
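To make the grouping in the note above concrete, here is a pure-Python sketch of what connection aggregation computes conceptually: edge property values are grouped by the source node of the traversal and reduced with the chosen function. It does not call rusty_graph; the edge list, the `AGGREGATES` table, and `aggregate_connections` are hypothetical, and skipping connections that lack the property is an assumption based on the `count(property)` description.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical edge list: (source title, target title, edge properties).
edges = [
    ("Discovery A", "Field X", {"share_pct": 60.0}),
    ("Discovery A", "Field Y", {"share_pct": 40.0}),
    ("Discovery B", "Field X", {"share_pct": 100.0}),
    ("Discovery B", "Field Z", {}),  # property missing on this connection
]

# Illustrative subset of the aggregate functions listed above.
AGGREGATES = {"sum": sum, "avg": mean, "min": min, "max": max, "count": len}


def aggregate_connections(edges, prop, func):
    """Group edge property values by source node and reduce each group."""
    grouped = defaultdict(list)
    for source, _target, props in edges:
        value = props.get(prop)
        if value is not None:  # skip connections without the property
            grouped[source].append(value)
    return {src: AGGREGATES[func](vals) for src, vals in grouped.items()}


totals = aggregate_connections(edges, "share_pct", "sum")
counts = aggregate_connections(edges, "share_pct", "count")
print(totals)
print(counts)
```

In this sketch a source node with no usable property values is left out of the result entirely; whether the library omits such nodes or reports them with a null value is not stated in the README.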
1014
+
1015
+ ## Saving and Loading
1016
+
1017
+ ```python
1018
+ # Save graph to file
1019
+ graph.save("my_graph.bin")
1020
+
1021
+ # Load graph from file
1022
+ loaded_graph = rusty_graph.load("my_graph.bin")
1023
+ ```
1024
+
1025
+ ## Operation Reports
1026
+
1027
+ Rusty Graph provides detailed reports for operations that modify the graph, helping you track what changed and diagnose issues.
1028
+
1029
+ ### Getting Operation Reports
1030
+
1031
+ ```python
1032
+ # Add nodes and get the report
1033
+ report = graph.add_nodes(
1034
+ data=df,
1035
+ node_type='Product',
1036
+ unique_id_field='product_id'
1037
+ )
1038
+ print(f"Created {report['nodes_created']} nodes in {report['processing_time_ms']}ms")
1039
+
1040
+ # Check for errors
1041
+ if report['has_errors']:
1042
+     print(f"Errors: {report['errors']}")
1043
+ ```
1044
+
1045
+ ### Report Fields
1046
+
1047
+ Node operation reports include:
1048
+
1049
+ - `operation`: Type of operation performed
1050
+ - `timestamp`: When the operation occurred
1051
+ - `nodes_created`: Number of new nodes created
1052
+ - `nodes_updated`: Number of existing nodes updated
1053
+ - `nodes_skipped`: Number of nodes skipped (e.g., due to conflicts)
1054
+ - `processing_time_ms`: Time taken in milliseconds
1055
+ - `has_errors`: Boolean indicating if errors occurred
1056
+ - `errors`: List of error messages (if any)
1057
+
1058
+ Connection operation reports include:
1059
+
1060
+ - `connections_created`: Number of new connections created
1061
+ - `connections_skipped`: Number of connections skipped
1062
+ - `property_fields_tracked`: Number of property fields on connections
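The fields listed above can be condensed into a one-line log message. The sketch below assumes a report behaves like a plain `dict` (the README only shows `report['...']` indexing) and uses a hand-written example report; `summarize_report` is a hypothetical helper, not a library function.

```python
def summarize_report(report):
    """Build a one-line summary from an operation report dictionary."""
    parts = [report.get("operation", "unknown operation")]
    # Include whichever counters are present, for node or connection reports.
    for field in ("nodes_created", "nodes_updated", "nodes_skipped",
                  "connections_created", "connections_skipped"):
        if field in report:
            parts.append(f"{field}={report[field]}")
    if report.get("has_errors"):
        parts.append(f"errors={len(report.get('errors', []))}")
    return ", ".join(parts)


# Hand-written example; real reports come from graph operations.
example = {
    "operation": "add_nodes",
    "nodes_created": 3,
    "nodes_updated": 1,
    "nodes_skipped": 0,
    "processing_time_ms": 1.5,
    "has_errors": True,
    "errors": ["row 2: missing product_id"],
}
summary = summarize_report(example)
print(summary)
```

Checking `field in report` rather than indexing directly keeps the helper usable for both node and connection reports, which carry different counters.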
1063
+
1064
+ ### Operation History
1065
+
1066
+ ```python
1067
+ # Get the most recent operation report
1068
+ last_report = graph.get_last_report()
1069
+
1070
+ # Get the operation index (sequential counter)
1071
+ op_index = graph.get_operation_index()
1072
+
1073
+ # Get full operation history
1074
+ history = graph.get_report_history()
1075
+ for report in history:
1076
+     print(f"{report['operation']}: {report['timestamp']}")
1077
+ ```
1078
+
1079
+ ## Performance Tips
1080
+
1081
+ 1. **Batch Operations**: Add nodes and connections in batches rather than individually.
1082
+
1083
+ 2. **Specify Columns**: When adding nodes or connections, explicitly specify which columns to include to reduce memory usage.
1084
+
1085
+ 3. **Use Indexing**: Filter on node type before applying other filters.
1086
+
1087
+ 4. **Avoid Overloading**: Keep node property count reasonable; too many properties per node will increase memory usage.
1088
+
1089
+ 5. **Conflict Handling**: Choose the appropriate conflict handling strategy:
1090
+ - Use `'update'` to merge new properties with existing ones
1091
+ - Use `'replace'` for a complete overwrite
1092
+ - Use `'skip'` to avoid any changes to existing nodes
1093
+ - Use `'preserve'` to only add missing properties
1094
+
1095
+ 6. **Connection Direction**: Specify direction in traversals when possible to improve performance.
1096
+
1097
+ 7. **Limit Results**: Use `max_nodes()` to limit result size when working with large datasets.
1098
+
1099
+ 8. **Create Indexes**: Use `create_index()` on frequently filtered properties for a ~3.3x speedup on equality filters.
1100
+
1101
+ 9. **Use Pattern Matching Limits**: When using `match_pattern()`, set `max_matches` to avoid scanning the entire graph.
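The four conflict handling strategies in tip 5 can be read as different ways of merging two property dictionaries. The sketch below illustrates that reading in plain Python; `resolve_conflict` is hypothetical and only mirrors the one-line descriptions in the list, so the library's actual behaviour (for example around null values) may differ.

```python
def resolve_conflict(existing, incoming, strategy):
    """Return the node properties that result from a conflict strategy."""
    if strategy == "update":
        # Merge: incoming values win, untouched existing values stay.
        return {**existing, **incoming}
    if strategy == "replace":
        # Complete overwrite: only the incoming properties remain.
        return dict(incoming)
    if strategy == "skip":
        # Existing node is left unchanged.
        return dict(existing)
    if strategy == "preserve":
        # Only add properties the existing node does not have yet.
        return {**incoming, **existing}
    raise ValueError(f"unknown strategy: {strategy!r}")


existing = {"name": "Widget", "price": 10.0}
incoming = {"price": 12.0, "color": "red"}

for strategy in ("update", "replace", "skip", "preserve"):
    print(strategy, resolve_conflict(existing, incoming, strategy))
```

The difference between `'update'` and `'preserve'` is only which side wins when both dictionaries define the same key.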
1102
+
@@ -0,0 +1,6 @@
1
+ rusty_graph\__init__.py,sha256=_Fds04T5qV95XgyZm7qIPfLghgoCZi-_hDbw-e_18oA,127
2
+ rusty_graph\rusty_graph.cp311-win_amd64.pyd,sha256=MYlmj-C-pbWO6D_YiWB3UEhKcOSfbqY1INIpiDPAi3k,1598976
3
+ rusty_graph-0.3.18.dist-info\METADATA,sha256=wlGTPk8AlIjdBdjhLR9OAuh8OAnfYIj25R_9JAzVokg,34186
4
+ rusty_graph-0.3.18.dist-info\WHEEL,sha256=X79LywvMB9iCuFHu88xBAFTJDhRqJi6Yh9hhoCI9jao,97
5
+ rusty_graph-0.3.18.dist-info\licenses\LICENSE,sha256=APR3-pK3VHs2KSVNVV9RQwAM0yIdoLq4FTCy1Cx2EMs,938
6
+ rusty_graph-0.3.18.dist-info\RECORD,,
@@ -1,8 +0,0 @@
1
- Metadata-Version: 2.4
2
- Name: rusty_graph
3
- Version: 0.3.17
4
- Classifier: Programming Language :: Rust
5
- Classifier: Programming Language :: Python :: Implementation :: CPython
6
- Classifier: Programming Language :: Python :: Implementation :: PyPy
7
- License-File: LICENSE
8
- Requires-Python: >=3.8
@@ -1,6 +0,0 @@
1
- rusty_graph\__init__.py,sha256=_Fds04T5qV95XgyZm7qIPfLghgoCZi-_hDbw-e_18oA,127
2
- rusty_graph\rusty_graph.cp311-win_amd64.pyd,sha256=P5gmKv6mNMb2VPv9KiT203DRNtVhRSdl593287XTxo0,1599488
3
- rusty_graph-0.3.17.dist-info\METADATA,sha256=0raMGmGeH9IdUO59MU6qokgChr3Mdi1yS7LjQ_Y9dsk,283
4
- rusty_graph-0.3.17.dist-info\WHEEL,sha256=X79LywvMB9iCuFHu88xBAFTJDhRqJi6Yh9hhoCI9jao,97
5
- rusty_graph-0.3.17.dist-info\licenses\LICENSE,sha256=APR3-pK3VHs2KSVNVV9RQwAM0yIdoLq4FTCy1Cx2EMs,938
6
- rusty_graph-0.3.17.dist-info\RECORD,,