xyzgraph 1.0.1__tar.gz
- xyzgraph-1.0.1/LICENSE +21 -0
- xyzgraph-1.0.1/PKG-INFO +656 -0
- xyzgraph-1.0.1/README.md +639 -0
- xyzgraph-1.0.1/pyproject.toml +34 -0
- xyzgraph-1.0.1/setup.cfg +4 -0
- xyzgraph-1.0.1/src/xyzgraph/__init__.py +52 -0
- xyzgraph-1.0.1/src/xyzgraph/ascii_renderer.py +324 -0
- xyzgraph-1.0.1/src/xyzgraph/cli.py +120 -0
- xyzgraph-1.0.1/src/xyzgraph/compare.py +185 -0
- xyzgraph-1.0.1/src/xyzgraph/data/__init__.py +3 -0
- xyzgraph-1.0.1/src/xyzgraph/data/atom_symbols.json +9 -0
- xyzgraph-1.0.1/src/xyzgraph/data/expected_valences.json +107 -0
- xyzgraph-1.0.1/src/xyzgraph/data/valence_electrons.json +12 -0
- xyzgraph-1.0.1/src/xyzgraph/data/vdw_radii.json +477 -0
- xyzgraph-1.0.1/src/xyzgraph/data_loader.py +97 -0
- xyzgraph-1.0.1/src/xyzgraph/graph_builders.py +1507 -0
- xyzgraph-1.0.1/src/xyzgraph/utils.py +171 -0
- xyzgraph-1.0.1/src/xyzgraph.egg-info/PKG-INFO +656 -0
- xyzgraph-1.0.1/src/xyzgraph.egg-info/SOURCES.txt +21 -0
- xyzgraph-1.0.1/src/xyzgraph.egg-info/dependency_links.txt +1 -0
- xyzgraph-1.0.1/src/xyzgraph.egg-info/entry_points.txt +3 -0
- xyzgraph-1.0.1/src/xyzgraph.egg-info/requires.txt +3 -0
- xyzgraph-1.0.1/src/xyzgraph.egg-info/top_level.txt +1 -0
xyzgraph-1.0.1/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025 Ali Goodfellow
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
xyzgraph-1.0.1/PKG-INFO
ADDED
|
@@ -0,0 +1,656 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: xyzgraph
|
|
3
|
+
Version: 1.0.1
|
|
4
|
+
Summary: Convert xyz molecule file to a graph.
|
|
5
|
+
Author: Dr Alister Goodfellow
|
|
6
|
+
Project-URL: homepage, https://github.com/aligfellow/xyzgraph
|
|
7
|
+
Classifier: Programming Language :: Python :: 3
|
|
8
|
+
Classifier: License :: OSI Approved :: MIT License
|
|
9
|
+
Classifier: Operating System :: OS Independent
|
|
10
|
+
Requires-Python: >=3.8
|
|
11
|
+
Description-Content-Type: text/markdown
|
|
12
|
+
License-File: LICENSE
|
|
13
|
+
Requires-Dist: numpy
|
|
14
|
+
Requires-Dist: rdkit
|
|
15
|
+
Requires-Dist: networkx
|
|
16
|
+
Dynamic: license-file
|
|
17
|
+
|
|
18
|
+
# xyzgraph: Molecular Graph Construction from Cartesian Coordinates
|
|
19
|
+
|
|
20
|
+
**xyzgraph** is a Python toolkit for building molecular graphs (bond connectivity, bond orders, formal charges, and partial charges) directly from 3D atomic coordinates in XYZ format. It provides both **cheminformatics-based** and **quantum chemistry-based** (xTB) workflows.
|
|
21
|
+
|
|
22
|
+
---
|
|
23
|
+
|
|
24
|
+
## Table of Contents
|
|
25
|
+
|
|
26
|
+
1. [Key Features](#key-features)
|
|
27
|
+
2. [Installation](#installation)
|
|
28
|
+
3. [Quick Start](#quick-start)
|
|
29
|
+
4. [Methodology Overview](#methodology-overview)
|
|
30
|
+
5. [Workflow Comparison](#workflow-comparison)
|
|
31
|
+
6. [CLI Reference](#cli-reference)
|
|
32
|
+
7. [Python API](#python-api)
|
|
33
|
+
8. [Visualization](#visualization)
|
|
34
|
+
9. [Limitations & Future Work](#limitations--future-work)
|
|
35
|
+
10. [References](#references)
|
|
36
|
+
11. [Contributing & Contact](#contributing--contact)
|
|
37
|
+
|
|
38
|
+
---
|
|
39
|
+
|
|
40
|
+
## Key Features
|
|
41
|
+
|
|
42
|
+
- **Distance-based initial bonding** using *consistent* van der Waals radii across *all elements* from Charry and Tkatchenko [[1]](https://doi.org/10.1021/acs.jctc.4c00784)
|
|
43
|
+
- **Two construction methods**:
|
|
44
|
+
- `cheminf`: Pure cheminformatics with bond order optimization
|
|
45
|
+
- `xtb`: Semi-empirical bond orders (Wiberg) and Mulliken charges read from an xTB calculation [[2]](https://pubs.acs.org/doi/10.1021/acs.jctc.8b01176)
|
|
46
|
+
- **Cheminformatics modes**:
|
|
47
|
+
- `--quick`: Fast (crude) valence adjustment
|
|
48
|
+
- Full optimization with valence and charge minimisation
|
|
49
|
+
- `--optimizer`:
|
|
50
|
+
**beam**: optimization across multiple paths (slightly slower)
|
|
51
|
+
**greedy**: iterative valence adjustment
|
|
52
|
+
- **Aromatic detection**: Hückel 4n+2 rule for 6-membered rings
|
|
53
|
+
- **Charge computation**: Gasteiger (cheminf) or Mulliken (xTB) partial charges
|
|
54
|
+
- **RDKit/xyz2mol comparison**: validation against RDKit bond perception [[3]](https://github.com/jensengroup/xyz2mol), [[4]](https://github.com/rdkit)
|
|
55
|
+
- **ASCII 2D depiction** with layout alignment for method comparison (see also [[5]](https://github.com/whitead/moltext))
|
|
56
|
+
|
|
57
|
+
---
|
|
58
|
+
|
|
59
|
+
## Installation
|
|
60
|
+
|
|
61
|
+
### From PyPI - *coming soon (maybe)*
|
|
62
|
+
```bash
|
|
63
|
+
pip install xyzgraph
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### From Source
|
|
67
|
+
```bash
|
|
68
|
+
git clone https://github.com/aligfellow/xyzgraph
|
|
69
|
+
cd xyzgraph
|
|
70
|
+
pip install .
|
|
71
|
+
```
|
|
72
|
+
|
|
73
|
+
### Dependencies
|
|
74
|
+
- **Core**: `numpy`, `networkx`, `rdkit`
|
|
75
|
+
- **Optional**: [xTB binary](https://github.com/grimme-lab/xtb) (for `--method xtb`)
|
|
76
|
+
|
|
77
|
+
To install xTB (Linux/macOS) see [here](https://github.com/grimme-lab/xtb):
|
|
78
|
+
```bash
|
|
79
|
+
conda install -c conda-forge xtb # or download from GitHub releases
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
---
|
|
83
|
+
|
|
84
|
+
## Quick Start
|
|
85
|
+
|
|
86
|
+
### CLI Examples
|
|
87
|
+
|
|
88
|
+
**Minimal usage** (auto-displays ASCII depiction):
|
|
89
|
+
```bash
|
|
90
|
+
xyzgraph molecule.xyz
|
|
91
|
+
```
|
|
92
|
+
|
|
93
|
+
**Specify charge and method**:
|
|
94
|
+
```bash
|
|
95
|
+
xyzgraph molecule.xyz --method xtb --charge -1 --multiplicity 2
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
**Detailed debug output**:
|
|
99
|
+
```bash
|
|
100
|
+
xyzgraph molecule.xyz --debug
|
|
101
|
+
```
|
|
102
|
+
|
|
103
|
+
**Compare with RDKit**:
|
|
104
|
+
```bash
|
|
105
|
+
xyzgraph molecule.xyz --compare-rdkit
|
|
106
|
+
```
|
|
107
|
+
|
|
108
|
+
### Python API
|
|
109
|
+
|
|
110
|
+
**Basic usage**:
|
|
111
|
+
```python
|
|
112
|
+
from xyzgraph import build_graph, graph_to_ascii, read_xyz_file
|
|
113
|
+
|
|
114
|
+
atoms = read_xyz_file("molecule.xyz")
|
|
115
|
+
G = build_graph(atoms, charge=0)
|
|
116
|
+
# OR
|
|
117
|
+
G = build_graph("molecule.xyz", charge=0)
|
|
118
|
+
|
|
119
|
+
# Print ASCII structure
|
|
120
|
+
print(graph_to_ascii(G, scale=3.0, include_h=False))
|
|
121
|
+
```
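
For orientation, the XYZ format read here is simple: an atom count, a free-form comment line, then one `symbol x y z` line per atom. The parser below is a stand-alone sketch and not xyzgraph's own reader; `parse_xyz`, its `(symbol, (x, y, z))` return shape, and the `bohr` switch are illustrative assumptions (the real `read_xyz_file` may return a different structure).

```python
from typing import List, Tuple

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 value

Atom = Tuple[str, Tuple[float, float, float]]


def parse_xyz(text: str, bohr: bool = False) -> List[Atom]:
    """Parse XYZ-format text into (symbol, (x, y, z)) tuples in Angstrom."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    scale = BOHR_TO_ANGSTROM if bohr else 1.0
    atoms = []
    for line in lines[2:2 + n_atoms]:  # lines[1] is the free-form comment
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x) * scale, float(y) * scale, float(z) * scale)))
    return atoms


water = """3
water
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""
atoms = parse_xyz(water)
print(len(atoms), atoms[0][0])
```

Passing `bohr=True` mirrors the idea behind the CLI's `-b/--bohr` flag: coordinates are rescaled to Angstrom before any distance check.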
|
|
122
|
+
|
|
123
|
+
---
|
|
124
|
+
|
|
125
|
+
## Methodology Overview
|
|
126
|
+
|
|
127
|
+
### Design Philosophy
|
|
128
|
+
|
|
129
|
+
xyzgraph offers two distinct pathways for molecular graph construction:
|
|
130
|
+
|
|
131
|
+
1. **Cheminformatics Path** (`method='cheminf'`):
|
|
132
|
+
- Pure graph-based approach using chemical heuristics
|
|
133
|
+
- No external quantum chemistry calls
|
|
134
|
+
- Cached scoring, valence, edge and graph properties
|
|
135
|
+
- Fast and suitable for both organic *and* inorganic molecules
|
|
136
|
+
|
|
137
|
+
2. **Quantum Chemistry Path** (`method='xtb'`):
|
|
138
|
+
- Uses GFN2-xTB (extended tight-binding) calculations [[2]](https://pubs.acs.org/doi/10.1021/acs.jctc.8b01176)
|
|
139
|
+
- Reads in Wiberg bond orders and Mulliken charges from output
|
|
140
|
+
- Potentially more accurate for unusual bonding situations
|
|
141
|
+
- *though, xTB may be less robust in these situations*
|
|
142
|
+
- Requires xTB binary installation
|
|
143
|
+
|
|
144
|
+
### Cheminformatics Workflow (method='cheminf')
|
|
145
|
+
|
|
146
|
+
```
|
|
147
|
+
┌─────────────────────────────────────────────────────────────────┐
|
|
148
|
+
│ 1. Input Processing │
|
|
149
|
+
│ • Parse XYZ file internally │
|
|
150
|
+
│ • Load reference data (VDW radii, valences, electrons) │
|
|
151
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
152
|
+
│
|
|
153
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
154
|
+
│ 2. Initial Bond Graph (Distance-Based) │
|
|
155
|
+
│ • Compute pairwise distances │
|
|
156
|
+
│ • Apply scaled VDW thresholds: │
|
|
157
|
+
│ - H-nonmetal: 0.42 × (r₁ + r₂) │
|
|
158
|
+
│ - H-metal: 0.50 × (r₁ + r₂) │
|
|
159
|
+
│ - Nonmetal-nonmetal: 0.55 × (r₁ + r₂) │
|
|
160
|
+
│ - Metal-ligand: 0.65 × (r₁ + r₂) │
|
|
161
|
+
│ • Create graph with single bonds (order = 1.0) │
|
|
162
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
163
|
+
│
|
|
164
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
165
|
+
│ 3. Ring Pruning │
|
|
166
|
+
│ • Detect cycles (NetworkX cycle_basis) │
|
|
167
|
+
│ • Remove geometrically distorted small rings (3,4-membered) │
|
|
168
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
169
|
+
│
|
|
170
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
171
|
+
│ 3.5 Kekulé Initialization for Aromatic Rings │
|
|
172
|
+
│ • Find 6-membered planar rings with C/N/O/S/B │
|
|
173
|
+
│ • Initialize alternating bond orders: 2-1-2-1-2-1 │
|
|
174
|
+
│ • Handle fused rings (naphthalene, anthracene): │
|
|
175
|
+
│ - Detecting shared edges from previous rings │
|
|
176
|
+
│ - Validated across extended ring system │
|
|
177
|
+
│ • Gives optimizer excellent starting point │
|
|
178
|
+
│ • Reduces iterations needed for aromatic systems │
|
|
179
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
180
|
+
│
|
|
181
|
+
┌──────────┴─────────────┐
|
|
182
|
+
│ │
|
|
183
|
+
┌─────────▼────────────┐ ┌───────▼──────────────────────────────┐
|
|
184
|
+
│ 4a. Quick Mode │ │ 4b. Full Optimization │
|
|
185
|
+
│ • Lock metal bonds │ │ • Lock metal bonds at 1.0 │
|
|
186
|
+
│ • 3 iterations │ │ • Iterative BIDIRECTIONAL search: │
|
|
187
|
+
│ • Promote bonds │ │ - Test both +1 AND -1 changes │
|
|
188
|
+
│ where both atoms │ │ - Allows Kekulé structure swaps │
|
|
189
|
+
│ need increased │ │ • Score = f(valence_error, │
|
|
190
|
+
│ valence │ │ formal_charges, │
|
|
191
|
+
│ • Distance check │ │ electronegativity, │
|
|
192
|
+
│ │ │ conjugation_penalty) │
|
|
193
|
+
│ │ │ • Optimizer choice: │
|
|
194
|
+
│ │ │ - Beam: parallel hypotheses │
|
|
195
|
+
│ │ │ - Greedy: single best change │
|
|
196
|
+
│ │ │ • Cache where possible for speed │
|
|
197
|
+
│ │ │ • Top-k edge candidate selection │
|
|
198
|
+
└─────────┬────────────┘ └──────────┬───────────────────────────┘
|
|
199
|
+
└───────────────────────────┘
|
|
200
|
+
│
|
|
201
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
202
|
+
│ 5. Aromatic Detection (Hückel 4n+2) │
|
|
203
|
+
│ • Find 5/6-membered rings with C/N/O/S/P │
|
|
204
|
+
│ • Count π electrons (sp² carbons → 1e, N/O/S LP → 2e) │
|
|
205
|
+
│ • Apply Hückel rule: 4n+2 π electrons │
|
|
206
|
+
│ • Set aromatic bonds to 1.5 │
|
|
207
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
208
|
+
│
|
|
209
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
210
|
+
│ 6. Formal Charge Assignment │
|
|
211
|
+
│ • For each non-metal atom: │
|
|
212
|
+
│ - B = 2 × Σ(bond_orders) │
|
|
213
|
+
│ - L = max(0, target - B) [target: 2 for H, 8 otherwise] │
|
|
214
|
+
│ - formal = V_electrons - (L + B/2) │
|
|
215
|
+
│ • Balance total to match system charge │
|
|
216
|
+
│ • Metals forced to 0 (coordination not oxidation state) │
|
|
217
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
218
|
+
│
|
|
219
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
220
|
+
│ 7. Gasteiger Partial Charges │
|
|
221
|
+
│ • Convert bond orders to RDKit bond types │
|
|
222
|
+
│ • Compute Gasteiger charges │
|
|
223
|
+
│ • Adjust for total charge conservation │
|
|
224
|
+
│ • Aggregate H charges onto heavy atoms │
|
|
225
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
226
|
+
│
|
|
227
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
228
|
+
│ 8. Output Graph │
|
|
229
|
+
│ Nodes: symbol, formal_charge, charges{}, agg_charge, valence │
|
|
230
|
+
│ Edges: bond_order, bond_type, metal_coord │
|
|
231
|
+
└─────────────────────────────────────────────────────────────────┘
|
|
232
|
+
```
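
Step 2 of the workflow above (distance-based bonding) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: the radii are placeholder Bondi-type values rather than the Charry-Tkatchenko set shipped with xyzgraph, `METALS` is a small illustrative subset, and `scale_factor` and `initial_bond_graph` are hypothetical names. The `threshold` argument is assumed to multiply the cutoff in the way the CLI's `-t/--threshold` option describes.

```python
import itertools
import math

import networkx as nx

# Placeholder Bondi-type radii (Angstrom); NOT the radii shipped with xyzgraph.
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}
METALS = {"Fe", "Cu", "Zn", "Pd"}  # illustrative subset only


def scale_factor(a: str, b: str) -> float:
    """Pick the scaling factor quoted in step 2 for an element pair."""
    has_h = "H" in (a, b)
    has_metal = a in METALS or b in METALS
    if has_h and has_metal:
        return 0.50  # H-metal
    if has_h:
        return 0.42  # H-nonmetal
    if has_metal:
        return 0.65  # metal-ligand
    return 0.55      # nonmetal-nonmetal


def initial_bond_graph(atoms, threshold: float = 1.0) -> nx.Graph:
    """atoms: list of (symbol, (x, y, z)) in Angstrom; every bond starts at order 1.0."""
    G = nx.Graph()
    for i, (sym, _) in enumerate(atoms):
        G.add_node(i, symbol=sym)
    for (i, (si, pi)), (j, (sj, pj)) in itertools.combinations(enumerate(atoms), 2):
        cutoff = threshold * scale_factor(si, sj) * (VDW[si] + VDW[sj])
        if math.dist(pi, pj) < cutoff:
            G.add_edge(i, j, bond_order=1.0)
    return G


water = [("O", (0.0, 0.0, 0.117)), ("H", (0.0, 0.757, -0.469)), ("H", (0.0, -0.757, -0.469))]
G = initial_bond_graph(water)
print(sorted(G.edges()))
```

With these placeholder radii the two O-H contacts (about 0.96 Å) fall under the 0.42 × (r₁ + r₂) cutoff, while the H···H contact (about 1.51 Å) does not, so only the two O-H bonds are created.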
|
|
233
|
+
|
|
234
|
+
### xTB Workflow (method='xtb')
|
|
235
|
+
|
|
236
|
+
```
|
|
237
|
+
┌─────────────────────────────────────────────────────────────────┐
|
|
238
|
+
│ 1. Input Processing │
|
|
239
|
+
│ • Parse XYZ file internally │
|
|
240
|
+
│ • Write XYZ to temporary directory │
|
|
241
|
+
│ • Set up xTB calculation parameters │
|
|
242
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
243
|
+
│
|
|
244
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
245
|
+
│ 2. Run xTB Calculation │
|
|
246
|
+
│ Command: xtb <file>.xyz --chrg <charge> --uhf <unpaired> │
|
|
247
|
+
│ • GFN2-xTB Hamiltonian │
|
|
248
|
+
│ • Single-point calculation │
|
|
249
|
+
│ • Wiberg bond order analysis │
|
|
250
|
+
│ • Mulliken population analysis │
|
|
251
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
252
|
+
│
|
|
253
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
254
|
+
│ 3. Parse xTB Output │
|
|
255
|
+
│ • Read wbo file (Wiberg bond orders) │
|
|
256
|
+
│ • Read charges file (Mulliken atomic charges) │
|
|
257
|
+
│ • Threshold: bond_order > 0.5 → create edge │
|
|
258
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
259
|
+
│
|
|
260
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
261
|
+
│ 4. Build Graph from xTB Data │
|
|
262
|
+
│ • Create nodes with Mulliken charges │
|
|
263
|
+
│ • Create edges with Wiberg bond orders │
|
|
264
|
+
│ • No further optimization needed │
|
|
265
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
266
|
+
│
|
|
267
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
268
|
+
│ 5. Cleanup (optional) │
|
|
269
|
+
│ • Remove temporary xTB files (unless --no-clean) │
|
|
270
|
+
└────────────────────┬────────────────────────────────────────────┘
|
|
271
|
+
│
|
|
272
|
+
┌────────────────────▼────────────────────────────────────────────┐
|
|
273
|
+
│ 6. Output Graph │
|
|
274
|
+
│ Nodes: symbol, charges{'mulliken': ...}, agg_charge, valence │
|
|
275
|
+
│ Edges: bond_order (Wiberg), bond_type, metal_coord │
|
|
276
|
+
└─────────────────────────────────────────────────────────────────┘
|
|
277
|
+
```
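
Steps 2 to 4 of the xTB workflow above can be sketched as follows. The helper names `xtb_command` and `graph_from_wbo` are hypothetical, and the numeric values are invented for illustration; only the command template and the 0.5 bond-order cutoff come from the diagram. The one added fact is that the number of unpaired electrons passed to `--uhf` equals the spin multiplicity minus one.

```python
import networkx as nx


def xtb_command(xyz_path: str, charge: int = 0, multiplicity: int = 1) -> list:
    """Assemble the command shown in step 2; --uhf takes the unpaired-electron count."""
    unpaired = multiplicity - 1
    return ["xtb", xyz_path, "--chrg", str(charge), "--uhf", str(unpaired)]


def graph_from_wbo(symbols, mulliken, wbo_pairs, threshold: float = 0.5) -> nx.Graph:
    """Build a graph from per-atom Mulliken charges and (i, j, wbo) triples."""
    G = nx.Graph()
    for i, (sym, q) in enumerate(zip(symbols, mulliken)):
        G.add_node(i, symbol=sym, charges={"mulliken": q})
    for i, j, wbo in wbo_pairs:
        if wbo > threshold:  # interactions at or below the cutoff are not bonds
            G.add_edge(i, j, bond_order=wbo)
    return G


# Made-up numbers for a water-like molecule, for illustration only.
symbols = ["O", "H", "H"]
mulliken = [-0.56, 0.28, 0.28]
wbo_pairs = [(0, 1, 0.92), (0, 2, 0.92), (1, 2, 0.01)]

G = graph_from_wbo(symbols, mulliken, wbo_pairs)
print(xtb_command("molecule.xyz", charge=-1, multiplicity=2))
print(sorted(G.edges(data="bond_order")))
```

Because the Wiberg bond orders are kept as-is on the edges, the resulting graph carries fractional orders, in contrast to the integer-like orders of the cheminformatics path.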
|
|
278
|
+
|
|
279
|
+
---
|
|
280
|
+
|
|
281
|
+
## Workflow Comparison
|
|
282
|
+
|
|
283
|
+
| Feature | cheminf (quick) | cheminf (full) | xtb |
|---------|----------------|----------------|-----|
| **Speed** | Very Fast | Fast | Moderate |
| **Accuracy** | Okay for simple molecules | Very good across various systems | Only limited by xTB performance (QM-based) |
| **External deps** | None | None | Requires xTB binary |
| **Bond orders** | Heuristic (integer-like) | Optimized formal charge and valency | Wiberg (fractional) |
| **Charges** | Gasteiger | Gasteiger | Mulliken |
| **Metal complexes** | Limited | Reasonable | Reasonable (limited by xTB metal performance) |
| **Conjugated systems** | Basic | Excellent | Excellent |
| **Best for** | Quick checks where connectivity matters most | Most cases | Awkward bonding, validation |
|
|
293
|
+
|
|
294
|
+
### When to Use Each Method
|
|
295
|
+
|
|
296
|
+
**Use `--method cheminf` (default)**:
|
|
297
|
+
- Most use cases
|
|
298
|
+
- No xTB installation available
|
|
299
|
+
- Batch processing structures
|
|
300
|
+
|
|
301
|
+
**Use `--method cheminf --quick`**:
|
|
302
|
+
- Extremely large molecules
|
|
303
|
+
- Initial rapid screening
|
|
304
|
+
- When approximate bond orders suffice
|
|
305
|
+
|
|
306
|
+
**Use `--method xtb`**:
|
|
307
|
+
- Validation of cheminf results
|
|
308
|
+
- Unusual electronic structures
|
|
309
|
+
- Low confidence in bonding structure
|
|
310
|
+
|
|
311
|
+
### Optimizer Algorithms (cheminf full mode only)
|
|
312
|
+
|
|
313
|
+
**Beam Search Optimizer** (`--optimizer beam`, default `--beam-width 3`):
|
|
314
|
+
- Explores multiple optimization paths in parallel
|
|
315
|
+
- Maintains top-k hypotheses at each iteration
|
|
316
|
+
- Bidirectional: tests both +1 and -1 bond orders for each hypothesis
|
|
317
|
+
- More robust against local minima
|
|
318
|
+
- Slower, but better convergence
|
|
319
|
+
- Best for robust bonding assignment across periodic table
|
|
320
|
+
|
|
321
|
+
**Greedy Optimizer** (`--optimizer greedy`, default in code):
|
|
322
|
+
- Tests all candidate edges, picks single best change per iteration
|
|
323
|
+
- Bidirectional: tests both +1 and -1 bond order changes
|
|
324
|
+
- Fast and effective for most molecules
|
|
325
|
+
- Can get stuck in local minima (*e.g.* α,β-unsaturated systems)
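
The difference between the two optimizers can be illustrated with a toy problem. The sketch below is not xyzgraph's optimizer: the score is a bare valence error (the real score also weighs formal charges, electronegativity, and conjugation), all names are hypothetical, and the edge order is chosen deliberately so that the width-1 (greedy) search breaks a tie the unlucky way. The molecule is the carbon skeleton of butadiene, CH2-CH-CH-CH2, with all C-C bonds starting at order 1.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
State = Dict[Edge, int]

H_COUNT = {0: 2, 1: 1, 2: 1, 3: 2}  # hydrogens on each carbon of butadiene
TARGET = 4                           # target valence for carbon
EDGES: List[Edge] = [(1, 2), (0, 1), (2, 3)]  # central bond listed first on purpose


def score(state: State) -> int:
    """Total valence error over all carbons; lower is better."""
    valence = dict(H_COUNT)
    for (i, j), order in state.items():
        valence[i] += order
        valence[j] += order
    return sum(abs(TARGET - v) for v in valence.values())


def neighbours(state: State) -> List[State]:
    """States reachable by a +1 or -1 change on one bond (orders kept in 1..3)."""
    out = []
    for edge in EDGES:
        for delta in (+1, -1):
            new_order = state[edge] + delta
            if 1 <= new_order <= 3:
                out.append({**state, edge: new_order})
    return out


def beam_search(start: State, width: int, max_iter: int = 50) -> State:
    """Keep the `width` best states per iteration; width=1 behaves greedily."""
    beam, best = [start], start
    for _ in range(max_iter):
        candidates = sorted((n for s in beam for n in neighbours(s)), key=score)
        beam = candidates[:width]
        if not beam or score(beam[0]) >= score(best):
            break  # no hypothesis improved on the best score so far
        best = beam[0]
    return best


start = {e: 1 for e in EDGES}
greedy = beam_search(start, width=1)
beam = beam_search(start, width=3)
print(score(greedy), score(beam))
```

In this toy case the greedy search promotes the central bond first and then cannot improve further, while a beam of width 3 keeps the alternative hypotheses alive and reaches the C=C-C=C pattern with zero valence error.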
|
|
326
|
+
|
|
327
|
+
---
|
|
328
|
+
|
|
329
|
+
## CLI Reference
|
|
330
|
+
|
|
331
|
+
### Command Syntax
|
|
332
|
+
```bash
|
|
333
|
+
> xyzgraph -h
|
|
334
|
+
usage: xyzgraph [-h] [--method {cheminf,xtb}] [-q] [--max-iter MAX_ITER] [--edge-per-iter EDGE_PER_ITER] [-o {greedy,beam}] [-bw BEAM_WIDTH] [--bond BOND]
|
|
335
|
+
[--unbond UNBOND] [-c CHARGE] [-m MULTIPLICITY] [-b] [-d] [-a] [-as ASCII_SCALE] [-H] [--compare-rdkit] [--no-clean]
|
|
336
|
+
xyz
|
|
337
|
+
|
|
338
|
+
Build molecular graph from XYZ.
|
|
339
|
+
|
|
340
|
+
positional arguments:
|
|
341
|
+
xyz Input XYZ file
|
|
342
|
+
|
|
343
|
+
options:
|
|
344
|
+
-h, --help show this help message and exit
|
|
345
|
+
--method {cheminf,xtb}
|
|
346
|
+
Graph construction method (default: cheminf) (xtb requires xTB binary installed and available in PATH)
|
|
347
|
+
-q, --quick Quick mode: fast heuristics, less accuracy (NOT recommended)
|
|
348
|
+
--max-iter MAX_ITER Maximum iterations for bond order optimization (default: 50, cheminf only)
|
|
349
|
+
-t THRESHOLD, --threshold THRESHOLD
|
|
350
|
+
Scaling factor for bond detection thresholds (default: 1.0)
|
|
351
|
+
--edge-per-iter EDGE_PER_ITER
|
|
352
|
+
Number of edges to adjust per iteration (default: 10, cheminf only)
|
|
353
|
+
-o {greedy,beam}, --optimizer {greedy,beam}
|
|
354
|
+
Optimization algorithm (default: beam, cheminf , BEAM recommended)
|
|
355
|
+
-bw BEAM_WIDTH, --beam-width BEAM_WIDTH
|
|
356
|
+
Beam width for beam search (default: 3). i.e. number of candidate graphs to retain per iteration
|
|
357
|
+
--bond BOND Specify atoms that must be bonded in the graph construction. Example: --bond 0,1 2,3
|
|
358
|
+
--unbond UNBOND Specify that two atoms indices are NOT bonded in the graph construction. Example: --unbond 0,1 1,2
|
|
359
|
+
-c CHARGE, --charge CHARGE
|
|
360
|
+
Total molecular charge (default: 0)
|
|
361
|
+
-m MULTIPLICITY, --multiplicity MULTIPLICITY
|
|
362
|
+
Spin multiplicity (auto-detected if not specified)
|
|
363
|
+
-b, --bohr XYZ file provided in units bohr (default is Angstrom)
|
|
364
|
+
-d, --debug Enable debug output (construction details + graph report)
|
|
365
|
+
-a, --ascii Show 2D ASCII depiction (auto-enabled if no other output)
|
|
366
|
+
-as ASCII_SCALE, --ascii-scale ASCII_SCALE
|
|
367
|
+
ASCII scaling factor (default: 3.0)
|
|
368
|
+
-H, --show-h Include hydrogens in visualizations (hidden by default)
|
|
369
|
+
--compare-rdkit Compare with xyz2mol output (uses rdkit implementation)
|
|
370
|
+
--no-clean Keep temporary xTB files (only for --method xtb)
|
|
371
|
+
```
|
|
372
|
+
|
|
373
|
+
**Method comparison**:
|
|
374
|
+
```bash
|
|
375
|
+
xyzgraph molecule.xyz --debug > cheminf.txt
|
|
376
|
+
xyzgraph molecule.xyz --method xtb --debug > xtb.txt
|
|
377
|
+
diff cheminf.txt xtb.txt
|
|
378
|
+
```
|
|
379
|
+
|
|
380
|
+
**Validate against RDKit**:
|
|
381
|
+
```bash
|
|
382
|
+
xyzgraph molecule.xyz --compare-rdkit
|
|
383
|
+
```
|
|
384
|
+
|
|
385
|
+
---
|
|
386
|
+
|
|
387
|
+
## Python API
|
|
388
|
+
|
|
389
|
+
Direct graph construction:
|
|
390
|
+
|
|
391
|
+
```python
|
|
392
|
+
from xyzgraph import build_graph, graph_debug_report
|
|
393
|
+
|
|
394
|
+
# Cheminf full optimization
|
|
395
|
+
G_full = build_graph(
|
|
396
|
+
atoms='molecule.xyz',
|
|
397
|
+
charge=0,
|
|
398
|
+
max_iter=50, # maximum iterations (normally converged <20)
|
|
399
|
+
edge_per_iter=6,
|
|
400
|
+
bond=[(0,1)], # ensure a bond between 0 and 1
|
|
401
|
+
debug=True
|
|
402
|
+
)
|
|
403
|
+
```
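
Once a graph is built, its contents can be read with ordinary NetworkX calls. The sketch below assumes the node and edge attribute names listed in the "Output Graph" step of the workflow (`symbol`, `formal_charge`, `bond_order`); to stay self-contained it uses a hand-built stand-in graph rather than a real `build_graph` result, and `summarise` is a hypothetical helper.

```python
import networkx as nx

# Hand-built stand-in for a build_graph result (a C=O fragment), for illustration.
G = nx.Graph()
G.add_node(0, symbol="C", formal_charge=0)
G.add_node(1, symbol="O", formal_charge=0)
G.add_edge(0, 1, bond_order=2.0)


def summarise(G: nx.Graph) -> list:
    """Return one text line per atom and per bond."""
    lines = []
    for i, data in G.nodes(data=True):
        lines.append(f"atom {i}: {data['symbol']} formal={data['formal_charge']:+d}")
    for i, j, data in G.edges(data=True):
        lines.append(f"bond {i}-{j}: order={data['bond_order']:.2f}")
    return lines


print("\n".join(summarise(G)))
```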
|
|
404
|
+
|
|
405
|
+
---
|
|
406
|
+
|
|
407
|
+
## Visualization
|
|
408
|
+
|
|
409
|
+
### ASCII Depiction
|
|
410
|
+
|
|
411
|
+
xyzgraph includes a built-in ASCII renderer for 2D molecular structures. This is heavily inspired by work elsewhere, *e.g.* [[5]](https://github.com/whitead/moltext) by Andrew White.
|
|
412
|
+
|
|
413
|
+
```python
|
|
414
|
+
from xyzgraph import graph_to_ascii
|
|
415
|
+
|
|
416
|
+
# Basic rendering
|
|
417
|
+
ascii_art = graph_to_ascii(G, scale=3.0, include_h=False)
|
|
418
|
+
print(ascii_art)
|
|
419
|
+
```
|
|
420
|
+
|
|
421
|
+
**Output example** (acyl isothiouronium):
|
|
422
|
+
```
|
|
423
|
+
C
|
|
424
|
+
\
|
|
425
|
+
\
|
|
426
|
+
C-------C
|
|
427
|
+
///
|
|
428
|
+
---C- /C-------C
|
|
429
|
+
C--- --- // \ /C----
|
|
430
|
+
/ -C------N\ \ / ---C
|
|
431
|
+
C / \\ /C-------C/ \\
|
|
432
|
+
\\ / \\ // \ C
|
|
433
|
+
\\ ---C- -C\-----N/ \ //
|
|
434
|
+
C--- ---- --- \ C--- //
|
|
435
|
+
-S- \ ----C
|
|
436
|
+
/C===
|
|
437
|
+
// =======O
|
|
438
|
+
C\ ====
|
|
439
|
+
\\
|
|
440
|
+
\\
|
|
441
|
+
/C\
|
|
442
|
+
//
|
|
443
|
+
C/
|
|
444
|
+
```
|
|
445
|
+
|
|
446
|
+
**Features**:
|
|
447
|
+
- Single bonds: `-`, `|`, `/`, `\`
|
|
448
|
+
- Double bonds: `=`, `‖` (parallel lines)
|
|
449
|
+
- Triple bonds: `#`
|
|
450
|
+
- Aromatic: 1.5 bond orders shown as single
|
|
451
|
+
- Special edges: `*` (TS), `.` (NCI) if `G.edges[i,j]['TS']=True`
|
|
452
|
+
|
|
453
|
+
### Layout Alignment
|
|
454
|
+
|
|
455
|
+
Compare methods by aligning their ASCII depictions:
|
|
456
|
+
|
|
457
|
+
```python
|
|
458
|
+
from xyzgraph import build_graph, graph_to_ascii
|
|
459
|
+
|
|
460
|
+
# Build with both methods (atoms: output of read_xyz_file, or a path to an XYZ file)
|
|
461
|
+
G_cheminf = build_graph(atoms, method='cheminf')
|
|
462
|
+
G_xtb = build_graph(atoms, method='xtb')
|
|
463
|
+
|
|
464
|
+
# Generate aligned depictions
|
|
465
|
+
ascii_ref, layout = graph_to_ascii(G_cheminf)
|
|
466
|
+
ascii_xtb = graph_to_ascii(G_xtb, reference_layout=layout)
|
|
467
|
+
|
|
468
|
+
print("Cheminf:\n", ascii_ref)
|
|
469
|
+
print("\nxTB:\n", ascii_xtb)
|
|
470
|
+
```
|
|
471
|
+
|
|
472
|
+
### Debug Report
|
|
473
|
+
|
|
474
|
+
Tabular listing of all atoms and bonds:
|
|
475
|
+
|
|
476
|
+
```python
|
|
477
|
+
from xyzgraph import graph_debug_report
|
|
478
|
+
|
|
479
|
+
report = graph_debug_report(G, include_h=False)
|
|
480
|
+
print(report)
|
|
481
|
+
```
|
|
482
|
+
|
|
483
|
+
**Full example**:
|
|
484
|
+
```text
|
|
485
|
+
> xyzgraph benzene_NH4-cation-pi.xyz -c 1 -a -d
|
|
486
|
+
|
|
487
|
+
============================================================
|
|
488
|
+
BUILDING GRAPH (CHEMINF, FULL MODE)
|
|
489
|
+
Atoms: 17, Charge: 1, Multiplicity: 1
|
|
490
|
+
============================================================
|
|
491
|
+
|
|
492
|
+
Added 17 atoms
|
|
493
|
+
Initial bonds: 16
|
|
494
|
+
Found 1 rings
|
|
495
|
+
Initial bonds: 16
|
|
496
|
+
Pruning distorted rings (sizes: [3, 4])
|
|
497
|
+
Initialized 1 6-membered carbon rings with Kekulé pattern
|
|
498
|
+
============================================================
|
|
499
|
+
|
|
500
|
+
BEAM SEARCH OPTIMIZATION (width=3)
|
|
501
|
+
============================================================
|
|
502
|
+
Initial score: 15.50
|
|
503
|
+
|
|
504
|
+
Iteration 1:
|
|
505
|
+
No improvements found in any beam, stopping
|
|
506
|
+
|
|
507
|
+
Applying best solution to graph...
|
|
508
|
+
------------------------------------------------------------
|
|
509
|
+
Explored 13 states across 1 iterations
|
|
510
|
+
Found 0 improvements
|
|
511
|
+
Score: 15.50 → 15.50
|
|
512
|
+
------------------------------------------------------------
|
|
513
|
+
|
|
514
|
+
============================================================
|
|
515
|
+
AROMATIC RING DETECTION (Hückel 4n+2)
|
|
516
|
+
============================================================
|
|
517
|
+
|
|
518
|
+
Ring 1 (6-membered): ['C0', 'C1', 'C2', 'C3', 'C4', 'C5']
|
|
519
|
+
π electrons: 6 (C0:1, C1:1, C2:1, C3:1, C4:1, C5:1)
|
|
520
|
+
✓ AROMATIC (4n+2 rule: n=1)
|
|
521
|
+
|
|
522
|
+
------------------------------------------------------------
|
|
523
|
+
SUMMARY: 1 aromatic rings, 6 bonds set to 1.5
|
|
524
|
+
------------------------------------------------------------
|
|
525
|
+
|
|
526
|
+
Gasteiger charge calculation failed: Explicit valence for atom # 12 N, 4, is greater than permitted
|
|
527
|
+
|
|
528
|
+
============================================================
|
|
529
|
+
GRAPH CONSTRUCTION COMPLETE
|
|
530
|
+
============================================================
|
|
531
|
+
|
|
532
|
+
# Molecular Graph: 17 atoms, 16 bonds
|
|
533
|
+
# total_charge=1 multiplicity=1 sum(gasteiger)=+1.000 sum(gasteiger_raw)=+0.000
|
|
534
|
+
# (C–H hydrogens hidden; heteroatom-bound hydrogens shown; valences still include all H)
|
|
535
|
+
# [idx] Sym val=.. chg=.. agg=.. | neighbors: idx(order / aromatic flag)
|
|
536
|
+
[ 0] C val=4.00 formal=+0 chg=+0.059 agg=+0.118 | 1(1.50*) 5(1.50*)
|
|
537
|
+
[ 1] C val=4.00 formal=+0 chg=+0.059 agg=+0.118 | 0(1.50*) 2(1.50*)
|
|
538
|
+
[ 2] C val=4.00 formal=+0 chg=+0.059 agg=+0.118 | 1(1.50*) 3(1.50*)
|
|
539
|
+
[ 3] C val=4.00 formal=+0 chg=+0.059 agg=+0.118 | 2(1.50*) 4(1.50*)
|
|
540
|
+
[ 4] C val=4.00 formal=+0 chg=+0.059 agg=+0.118 | 3(1.50*) 5(1.50*)
|
|
541
|
+
[ 5] C val=4.00 formal=+0 chg=+0.059 agg=+0.118 | 0(1.50*) 4(1.50*)
|
|
542
|
+
[ 12] N val=4.00 formal=+1 chg=+0.059 agg=+0.294 | 13(1.00) 14(1.00) 15(1.00) 16(1.00)
|
|
543
|
+
[ 13] H val=1.00 formal=+0 chg=+0.059 agg=+0.059 | 12(1.00)
|
|
544
|
+
[ 14] H val=1.00 formal=+0 chg=+0.059 agg=+0.059 | 12(1.00)
|
|
545
|
+
[ 15] H val=1.00 formal=+0 chg=+0.059 agg=+0.059 | 12(1.00)
|
|
546
|
+
[ 16] H val=1.00 formal=+0 chg=+0.059 agg=+0.059 | 12(1.00)
|
|
547
|
+
|
|
548
|
+
# Bonds (i-j: order) (filtered)
|
|
549
|
+
[ 0- 1]: 1.50
|
|
550
|
+
[ 0- 5]: 1.50
|
|
551
|
+
[ 1- 2]: 1.50
|
|
552
|
+
[ 2- 3]: 1.50
|
|
553
|
+
[ 3- 4]: 1.50
|
|
554
|
+
[ 4- 5]: 1.50
|
|
555
|
+
[12-13]: 1.00
|
|
556
|
+
[12-14]: 1.00
|
|
557
|
+
[12-15]: 1.00
|
|
558
|
+
[12-16]: 1.00
|
|
559
|
+
|
|
560
|
+
============================================================
|
|
561
|
+
# ASCII Depiction
|
|
562
|
+
============================================================
|
|
563
|
+
-C-------------------C-
|
|
564
|
+
--- ----
|
|
565
|
+
---- ----
|
|
566
|
+
C- -C
|
|
567
|
+
\\ ///
|
|
568
|
+
\\\ //
|
|
569
|
+
\\ ///
|
|
570
|
+
\C-------------------C/
|
|
571
|
+
|
|
572
|
+
|
|
573
|
+
H
|
|
574
|
+
|
|
|
575
|
+
|
|
|
576
|
+
|
|
|
577
|
+
H-------------------N--------------------H
|
|
578
|
+
|
|
|
579
|
+
|
|
|
580
|
+
|
|
|
581
|
+
H
|
|
582
|
+
```
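
The `formal=+1` reported for N12 above follows from the rule in step 6 of the cheminformatics workflow. The sketch below is a direct transcription of that rule for a few non-metal elements; `formal_charge` and the `VALENCE_ELECTRONS` table are hypothetical names standing in for the package's own data, and charge balancing across the whole molecule is not shown.

```python
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6}


def formal_charge(symbol: str, bond_orders) -> float:
    """formal = V_electrons - (L + B/2), with B = 2 * sum(bond orders)."""
    bonding = 2 * sum(bond_orders)        # B: electrons shared in bonds
    target = 2 if symbol == "H" else 8    # duet for H, octet otherwise
    lone = max(0, target - bonding)       # L: non-bonding electrons
    return VALENCE_ELECTRONS[symbol] - (lone + bonding / 2)


# Ammonium nitrogen from the report: four N-H single bonds.
print(formal_charge("N", [1.0, 1.0, 1.0, 1.0]))
# Aromatic carbon: two ring bonds at 1.5 plus one (hidden) C-H bond.
print(formal_charge("C", [1.5, 1.5, 1.0]))
# Singly bonded oxygen, as in the O- of a nitro group.
print(formal_charge("O", [1.0]))
```

For the ammonium nitrogen, B = 8 and L = 0, so the charge is 5 - (0 + 4) = +1; the singly bonded oxygen has B = 2 and L = 6, giving 6 - (6 + 1) = -1.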
|
|
583
|
+
|
|
584
|
+
---
|
|
585
|
+
|
|
586
|
+
## Limitations & Future Work
|
|
587
|
+
|
|
588
|
+
### Current Limitations
|
|
589
|
+
|
|
590
|
+
1. **Metal Complexes**
|
|
591
|
+
- Bond orders locked at 1.0 (no d-orbital chemistry)
|
|
592
|
+
- Formal charges set to 0 (coordination, not oxidation state)
|
|
593
|
+
- Metal-metal bonds *not* supported
|
|
594
|
+
- **Future**:
|
|
595
|
+
- Metal-metal bonds
|
|
596
|
+
|
|
597
|
+
2. **Radicals & Open-Shell Systems**
|
|
598
|
+
- The optimizer should still arrive at a valence structure
|
|
599
|
+
- Not explicitly handled at present
|
|
600
|
+
- *May* behave, *may* be unreliable
|
|
601
|
+
|
|
602
|
+
3. **Zwitterions**
|
|
603
|
+
- Formal charge and valence analysis does identify `-[N+](=O)(-[O-])` bonding and formal charge pattern
|
|
604
|
+
- *May* not always be fully robust
|
|
605
|
+
|
|
606
|
+
4. **Large Conjugated Systems**
|
|
607
|
+
- May need many iterations for convergence (better with Kekulé-initialised rings)
|
|
608
|
+
- Conjugation penalty heuristic (not full π-MO analysis)
|
|
609
|
+
|
|
610
|
+
5. **Charged Aromatics**
|
|
611
|
+
- Hückel electron counting simplified (doesn't account for ionic charge)
|
|
612
|
+
- Should still solve with valence/charge optimisation
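
The charged-aromatics limitation is easiest to see with the electron count itself. The sketch below applies the simplified rule from the workflow (one π electron per sp² carbon, two per N/O/S lone pair) and then the 4n+2 test; `huckel_aromatic` is a hypothetical helper, and the per-atom contributions are supplied by hand rather than derived from a graph.

```python
def huckel_aromatic(pi_electrons_per_atom):
    """Return (is_aromatic, total) under the 4n+2 rule for a single ring."""
    total = sum(pi_electrons_per_atom)
    n, remainder = divmod(total - 2, 4)
    return total >= 2 and remainder == 0, total


benzene = [1] * 6                  # six sp2 carbons
furan = [1, 1, 1, 1, 2]            # four sp2 carbons plus one O lone pair
cyclobutadiene = [1] * 4           # 4 electrons: fails 4n+2
cyclopentadienyl = [1] * 5         # anion counted WITHOUT its extra electron

for name, ring in [("benzene", benzene), ("furan", furan),
                   ("cyclobutadiene", cyclobutadiene),
                   ("cyclopentadienyl", cyclopentadienyl)]:
    print(name, huckel_aromatic(ring))
```

Under this simplified count the cyclopentadienyl anion gets 5 π electrons and is rejected, although counting the ionic charge would give 6; this is the kind of case that the later valence and charge optimisation is expected to resolve.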
|
|
613
|
+
|
|
614
|
+
---
|
|
615
|
+
|
|
616
|
+
### Built-in Comparison
|
|
617
|
+
|
|
618
|
+
xyzgraph can directly compare its output to RDKit/xyz2mol:
|
|
619
|
+
|
|
620
|
+
```bash
|
|
621
|
+
xyzgraph molecule.xyz --compare-rdkit --debug
|
|
622
|
+
```
|
|
623
|
+
|
|
624
|
+
**Output includes**:
|
|
625
|
+
- Layout-aligned ASCII depictions
|
|
626
|
+
- Edge differences (bonds only in one method)
|
|
627
|
+
- Bond order differences (Δ ≥ 0.25)
|
|
628
|
+
|
|
629
|
+
**Example**:
|
|
630
|
+
```
|
|
631
|
+
# Bond differences: only_in_native=1 only_in_rdkit=0 bond_order_diffs=2
|
|
632
|
+
# only_in_native: 4-7
|
|
633
|
+
# bond_order_diffs (Δ≥0.25):
|
|
634
|
+
# 1-2 native=1.50 rdkit=1.00 Δ=+0.50
|
|
635
|
+
# 2-3 native=2.00 rdkit=1.50 Δ=+0.50
|
|
636
|
+
```
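
A comparison of this kind can be sketched with plain set operations on two NetworkX graphs. This is an illustration under assumptions, not the code in `compare.py`: `compare_bonds` is a hypothetical helper, both graphs are assumed to share atom indices and to store a `bond_order` edge attribute, and the two small graphs are built by hand to reproduce the numbers in the example above.

```python
import networkx as nx


def compare_bonds(G_a: nx.Graph, G_b: nx.Graph, tol: float = 0.25):
    """Return (only_in_a, only_in_b, order_diffs) for two graphs on the same atoms."""
    edges_a = {frozenset(e) for e in G_a.edges()}
    edges_b = {frozenset(e) for e in G_b.edges()}
    only_a = sorted(tuple(sorted(e)) for e in edges_a - edges_b)
    only_b = sorted(tuple(sorted(e)) for e in edges_b - edges_a)
    diffs = []
    for e in edges_a & edges_b:
        i, j = sorted(e)
        delta = G_a[i][j]["bond_order"] - G_b[i][j]["bond_order"]
        if abs(delta) >= tol:  # report only differences of at least `tol`
            diffs.append((i, j, delta))
    return only_a, only_b, sorted(diffs)


native = nx.Graph()
native.add_edge(1, 2, bond_order=1.5)
native.add_edge(2, 3, bond_order=2.0)
native.add_edge(4, 7, bond_order=1.0)

rdkit_like = nx.Graph()
rdkit_like.add_edge(1, 2, bond_order=1.0)
rdkit_like.add_edge(2, 3, bond_order=1.5)

only_native, only_rdkit, order_diffs = compare_bonds(native, rdkit_like)
print(only_native, only_rdkit, order_diffs)
```

Edges are compared as unordered pairs, so a bond stored as `(2, 1)` in one graph and `(1, 2)` in the other is still treated as the same bond.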
|
|
637
|
+
|
|
638
|
+
---
|
|
639
|
+
|
|
640
|
+
## References
|
|
641
|
+
|
|
642
|
+
1. **van der Waals Radii**: Jorge Charry and Alexandre Tkatchenko, *J. Chem. Theory Comput.*, 2024, **20**, 7469–7478. [DOI](https://doi.org/10.1021/acs.jctc.4c00784).
|
|
643
|
+
|
|
644
|
+
2. **xTB (Extended Tight Binding)**: Christoph Bannwarth, Sebastian Ehlert, and Stefan Grimme, *J. Chem. Theory Comput.* 2019, **15**, 1652–1671. [DOI](https://pubs.acs.org/doi/10.1021/acs.jctc.8b01176). [Repo](https://github.com/grimme-lab/xtb).
|
|
645
|
+
|
|
646
|
+
3. **xyz2mol**: Jan Jensen *et al.*, [xyz2mol](https://github.com/jensengroup/xyz2mol). Now integrated into RDKit as `Chem.rdDetermineBonds.DetermineBonds()`. See also Y. Kim, W. Y. Kim, *Bull. Korean Chem. Soc.*, 2015, **36**, 1769–1777.
|
|
647
|
+
|
|
648
|
+
4. **RDKit**: RDKit: Open-source cheminformatics. [https://www.rdkit.org](https://www.rdkit.org). [Repo](https://github.com/rdkit).
|
|
649
|
+
|
|
650
|
+
5. **moltext**: A. White, *moltext*. [Repo](https://github.com/whitead/moltext)
|
|
651
|
+
|
|
652
|
+
---
|
|
653
|
+
|
|
654
|
+
## Contributing & Contact
|
|
655
|
+
|
|
656
|
+
Contributions welcome! Please open an issue or pull request and get in touch with any questions [here](https://github.com/aligfellow/xyzgraph/issues).
|