pasted 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- pasted-0.1.0/PKG-INFO +330 -0
- pasted-0.1.0/README.md +307 -0
- pasted-0.1.0/pyproject.toml +78 -0
- pasted-0.1.0/setup.cfg +4 -0
- pasted-0.1.0/src/pasted/__init__.py +71 -0
- pasted-0.1.0/src/pasted/__main__.py +3 -0
- pasted-0.1.0/src/pasted/_atoms.py +240 -0
- pasted-0.1.0/src/pasted/_generator.py +626 -0
- pasted-0.1.0/src/pasted/_io.py +64 -0
- pasted-0.1.0/src/pasted/_metrics.py +292 -0
- pasted-0.1.0/src/pasted/_placement.py +325 -0
- pasted-0.1.0/src/pasted/cli.py +242 -0
- pasted-0.1.0/src/pasted.egg-info/PKG-INFO +330 -0
- pasted-0.1.0/src/pasted.egg-info/SOURCES.txt +20 -0
- pasted-0.1.0/src/pasted.egg-info/dependency_links.txt +1 -0
- pasted-0.1.0/src/pasted.egg-info/entry_points.txt +2 -0
- pasted-0.1.0/src/pasted.egg-info/requires.txt +7 -0
- pasted-0.1.0/src/pasted.egg-info/top_level.txt +1 -0
- pasted-0.1.0/tests/test_atoms.py +187 -0
- pasted-0.1.0/tests/test_generator.py +266 -0
- pasted-0.1.0/tests/test_metrics.py +213 -0
- pasted-0.1.0/tests/test_placement.py +197 -0
pasted-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,330 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: pasted
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Pointless Atom STructure with Entropy Diagnostics — structure fuzzer for QC/ML-potential codes
|
|
5
|
+
License-Expression: MIT
|
|
6
|
+
Keywords: quantum chemistry,structure generation,fuzzing,machine learning potentials
|
|
7
|
+
Classifier: Development Status :: 3 - Alpha
|
|
8
|
+
Classifier: Intended Audience :: Science/Research
|
|
9
|
+
Classifier: Programming Language :: Python :: 3
|
|
10
|
+
Classifier: Programming Language :: Python :: 3.10
|
|
11
|
+
Classifier: Programming Language :: Python :: 3.11
|
|
12
|
+
Classifier: Programming Language :: Python :: 3.12
|
|
13
|
+
Classifier: Topic :: Scientific/Engineering :: Chemistry
|
|
14
|
+
Classifier: Topic :: Scientific/Engineering :: Physics
|
|
15
|
+
Requires-Python: >=3.10
|
|
16
|
+
Description-Content-Type: text/markdown
|
|
17
|
+
Requires-Dist: numpy>=1.24
|
|
18
|
+
Requires-Dist: scipy>=1.10
|
|
19
|
+
Provides-Extra: dev
|
|
20
|
+
Requires-Dist: pytest>=8.0; extra == "dev"
|
|
21
|
+
Requires-Dist: ruff>=0.4; extra == "dev"
|
|
22
|
+
Requires-Dist: mypy>=1.10; extra == "dev"
|
|
23
|
+
|
|
24
|
+
# PASTED
|
|
25
|
+
|
|
26
|
+
**P**ointless **A**tom **ST**ructure with **E**ntropy **D**iagnostics
|
|
27
|
+
|
|
28
|
+
PASTED is a structure fuzzer for quantum chemistry and machine learning potential codes.
|
|
29
|
+
|
|
30
|
+
A CLI tool that generates intentionally random, physically meaningless atomic structures and evaluates their degree of disorder through a suite of structural metrics. Useful for stress-testing structure optimizers, generating worst-case inputs for quantum chemistry codes, or studying what "maximum chaos" looks like in structural space.
|
|
31
|
+
|
|
32
|
+
## Features
|
|
33
|
+
|
|
34
|
+
- **Three placement modes** — fully random (`gas`), chain-growth (`chain`), coordination-complex-like (`shell`)
|
|
35
|
+
- **10 disorder metrics** computed per structure, all usable as output filters
|
|
36
|
+
- **Element pool** specified by atomic number (Z = 1–106, H through Sg); composition sampled randomly per structure
|
|
37
|
+
- **Always outputs `--n-atoms` atoms** — placement is unrestricted; Pyykkö covalent radii are enforced by mandatory post-placement repulsion relaxation
|
|
38
|
+
- **Auto-scaled `--cutoff`** — defaults to `cov_scale × 1.5 × median(r_i + r_j)` over the element pool, so graph and Steinhardt metrics are meaningful regardless of which elements are used
|
|
39
|
+
- Charge/multiplicity parity validation before any geometry is generated
|
|
40
|
+
- Multi-structure batch generation with `--n-samples`; per-structure progress on stderr, XYZ on stdout
|
|
41
|
+
- Reproducible runs via `--seed`
|
|
42
|
+
|
|
43
|
+
## Requirements
|
|
44
|
+
|
|
45
|
+
```
|
|
46
|
+
Python >= 3.10
|
|
47
|
+
numpy
|
|
48
|
+
scipy
|
|
49
|
+
```
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
pip install numpy scipy
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
## Installation
|
|
56
|
+
|
|
57
|
+
```bash
|
|
58
|
+
git clone https://github.com/yourname/pasted.git
|
|
59
|
+
cd pasted
|
|
60
|
+
# no build step required; run directly
|
|
61
|
+
python pasted.py --help
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
## Quick Start
|
|
65
|
+
|
|
66
|
+
```bash
|
|
67
|
+
# 10 atoms drawn from H–Zn, placed randomly in a sphere of radius 8 Å
|
|
68
|
+
python pasted.py --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
|
|
69
|
+
--mode gas --region sphere:8
|
|
70
|
+
|
|
71
|
+
# Organic-looking chain structure (C/N/O only)
|
|
72
|
+
python pasted.py --n-atoms 15 --elements 6,7,8 --charge 0 --mult 1 \
|
|
73
|
+
--mode chain --branch-prob 0.4 --n-samples 20 -o organic_junk.xyz
|
|
74
|
+
|
|
75
|
+
# Coordination-complex-like structure with Fe as center
|
|
76
|
+
python pasted.py --n-atoms 12 --elements 6,7,8,26 --charge 0 --mult 1 \
|
|
77
|
+
--mode shell --center-z 26 --coord-range 4:6 --n-samples 10
|
|
78
|
+
|
|
79
|
+
# Generate 100 structures, keep only the most disordered ones
|
|
80
|
+
python pasted.py --n-atoms 12 --elements 1-30 --charge 0 --mult 1 \
|
|
81
|
+
--mode gas --region sphere:9 --n-samples 100 \
|
|
82
|
+
--filter H_total:2.0:- --filter shape_aniso:0.3:- -o disordered.xyz
|
|
83
|
+
```
|
|
84
|
+
|
|
85
|
+
## Placement Modes
|
|
86
|
+
|
|
87
|
+
### `gas` (default)
|
|
88
|
+
|
|
89
|
+
Atoms are placed independently and uniformly at random inside the specified region.
|
|
90
|
+
Closest to true spatial randomness; highest expected `H_spatial`.
|
|
91
|
+
|
|
92
|
+
```
|
|
93
|
+
--region sphere:R sphere of radius R Å
|
|
94
|
+
--region box:L cube of side L Å
|
|
95
|
+
--region box:LX,LY,LZ orthorhombic box
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
### `chain`
|
|
99
|
+
|
|
100
|
+
Atoms grow one by one from a seed atom via a random walk with directional persistence.
|
|
101
|
+
At each step, a random active tip is selected and the new atom is placed at a random bond length.
|
|
102
|
+
The direction of each step is constrained by `--chain-persist` to avoid self-tangling.
|
|
103
|
+
A branching probability controls whether the old tip is kept (branch) or replaced (linear advance).
|
|
104
|
+
Produces elongated, tree-like structures.
|
|
105
|
+
|
|
106
|
+
```
|
|
107
|
+
--branch-prob FLOAT branching probability (default: 0.3)
|
|
108
|
+
--chain-persist FLOAT directional persistence 0.0–1.0 (default: 0.5)
|
|
109
|
+
0.0 = fully random (may self-tangle)
|
|
110
|
+
0.5 = rear 120° cone excluded
|
|
111
|
+
1.0 = front hemisphere only, nearly straight
|
|
112
|
+
--bond-range LO:HI bond length range in Å (default: 1.2:1.6)
|
|
113
|
+
```
|
|
114
|
+
|
|
115
|
+
### `shell`
|
|
116
|
+
|
|
117
|
+
One atom is placed at the origin as the "center", surrounded by a coordination shell at a random radius,
|
|
118
|
+
followed by tail atoms growing from shell members.
|
|
119
|
+
Produces structures that superficially resemble coordination complexes.
|
|
120
|
+
|
|
121
|
+
```
|
|
122
|
+
--center-z Z atomic number of center atom
|
|
123
|
+
(default: random from the sample's composition)
|
|
124
|
+
--coord-range MIN:MAX coordination number range (default: 4:8)
|
|
125
|
+
--shell-radius LO:HI shell radius range in Å (default: 1.8:2.5)
|
|
126
|
+
--bond-range LO:HI tail bond length range in Å (default: 1.2:1.6)
|
|
127
|
+
```
|
|
128
|
+
|
|
129
|
+
The center atom and its Z are recorded in the XYZ comment line as `center=Fe(Z=26)`.
|
|
130
|
+
|
|
131
|
+
## Element Pool
|
|
132
|
+
|
|
133
|
+
```
|
|
134
|
+
--elements SPEC
|
|
135
|
+
```
|
|
136
|
+
|
|
137
|
+
Elements are specified by atomic number. Omit to use all supported elements (Z = 1–106).
|
|
138
|
+
|
|
139
|
+
| Syntax | Meaning |
|
|
140
|
+
|--------|---------|
|
|
141
|
+
| `1-30` | Z = 1 through 30 (H to Zn) |
|
|
142
|
+
| `6,7,8` | Z = 6, 7, 8 (C, N, O) |
|
|
143
|
+
| `1-10,26,28` | Z = 1–10 plus Fe(26) and Ni(28) |
|
|
144
|
+
| `72-80` | 5d metals Hf through Hg |
|
|
145
|
+
| *(omitted)* | all Z = 1–106 |
|
|
146
|
+
|
|
147
|
+
For each structure, `--n-atoms` elements are drawn independently and uniformly from this pool.
|
|
148
|
+
The resulting composition varies per sample.
|
|
149
|
+
|
|
150
|
+
If H (Z = 1) is in the pool and the sampled composition contains no hydrogen, a random number of H atoms is automatically appended (approximately `1 + uniform(0,1) × n_atoms × 1.2`). This can be disabled with `--no-add-hydrogen`.
|
|
151
|
+
|
|
152
|
+
## Charge and Multiplicity
|
|
153
|
+
|
|
154
|
+
`--charge` and `--mult` are required and apply to every generated structure.
|
|
155
|
+
Before placement, PASTED checks two conditions against the randomly sampled composition:
|
|
156
|
+
|
|
157
|
+
1. Total electron count `N_e = Σ Z − charge > 0`
|
|
158
|
+
2. Parity: `N_e % 2 == (mult − 1) % 2`
|
|
159
|
+
|
|
160
|
+
Structures that fail either check are logged as `[invalid]` and skipped.
|
|
161
|
+
High-spin vs. low-spin selection is **not enforced**; that is the user's responsibility.
|
|
162
|
+
|
|
163
|
+
Because composition is random, parity failures are common when the element pool contains many odd-Z elements and `mult=1` is specified. Increasing `--n-samples` or using `--mult 2` reduces this.
|
|
164
|
+
|
|
165
|
+
## Interatomic Distance Control
|
|
166
|
+
|
|
167
|
+
PASTED enforces a minimum interatomic distance using Pyykkö single-bond covalent radii (Pyykkö & Atsumi, *Chem. Eur. J.* **15**, 186–197, 2009).
|
|
168
|
+
|
|
169
|
+
The threshold for each atom pair (i, j) is:
|
|
170
|
+
|
|
171
|
+
```
|
|
172
|
+
d_min(i, j) = cov_scale × (r_i + r_j)
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
- Default `--cov-scale 1.0` = exact sum of covalent radii.
|
|
176
|
+
- Values below 1.0 allow closer contacts; values above 1.0 enforce additional clearance.
|
|
177
|
+
- Z > 86 (Fr through Sg): no single-bond literature values are available. PASTED uses the same-group nearest lighter element as a proxy (e.g. Fr → Cs, U → Nd, Rf → Hf).
|
|
178
|
+
|
|
179
|
+
### Post-placement repulsion relaxation
|
|
180
|
+
|
|
181
|
+
Placement does **not** check for distance violations — atoms are placed freely in the requested geometry (region/chain/shell). After placement, a mandatory **repulsion relaxation** step resolves all violations iteratively: for each pair below the threshold, both atoms are pushed apart along their connecting vector by half the deficit. This repeats until no violations remain or `--relax-cycles` is exhausted.
|
|
182
|
+
|
|
183
|
+
This design guarantees that `--n-atoms` atoms are always placed, regardless of how crowded the initial configuration is. If relaxation does not converge within `--relax-cycles`, a `[warn]` line is printed to stderr and the structure is output as-is.
|
|
184
|
+
|
|
185
|
+
## Disorder Metrics
|
|
186
|
+
|
|
187
|
+
All metrics are computed for every structure and embedded in the XYZ comment line.
|
|
188
|
+
All are usable in `--filter`.
|
|
189
|
+
|
|
190
|
+
| Metric | Description | Range |
|
|
191
|
+
|--------|-------------|-------|
|
|
192
|
+
| `H_atom` | Shannon entropy of element composition | 0 (single element) to ln(*k*) |
|
|
193
|
+
| `H_spatial` | Shannon entropy of the pairwise-distance histogram | higher = more uniform distances |
|
|
194
|
+
| `H_total` | Weighted sum: `w_atom · H_atom + w_spatial · H_spatial` | — |
|
|
195
|
+
| `RDF_dev` | RMS deviation of empirical *g*(*r*) from ideal-gas baseline | 0 = perfectly random |
|
|
196
|
+
| `shape_aniso` | Relative shape anisotropy from the gyration tensor | 0 = spherical, 1 = rod-like |
|
|
197
|
+
| `Q4`, `Q6`, `Q8` | Steinhardt bond-orientational order parameters (averaged over atoms) | 0 = disordered |
|
|
198
|
+
| `graph_lcc` | Fraction of atoms in the largest connected component at `--cutoff` | 0–1 |
|
|
199
|
+
| `graph_cc` | Mean clustering coefficient at `--cutoff` | 0–1 |
|
|
200
|
+
|
|
201
|
+
### Distance cutoff for graph and Steinhardt metrics
|
|
202
|
+
|
|
203
|
+
The `--cutoff` parameter determines which atom pairs are considered "connected" for `graph_lcc`, `graph_cc`, and `Q4/Q6/Q8`. Setting this too small relative to the actual interatomic distances causes all metrics to collapse to zero (no neighbours found); setting it too large makes every atom a neighbour of every other.
|
|
204
|
+
|
|
205
|
+
By default, `--cutoff` is set automatically to:
|
|
206
|
+
|
|
207
|
+
```
|
|
208
|
+
cutoff = cov_scale × 1.5 × median(r_i + r_j) over all element-pool pairs
|
|
209
|
+
```
|
|
210
|
+
|
|
211
|
+
This scales with the element pool: light-element pools (e.g. C/N/O) get a cutoff around 2.1 Å; 5d-metal pools get around 3.8 Å. The auto value is printed to stderr at startup:
|
|
212
|
+
|
|
213
|
+
```
|
|
214
|
+
[cutoff] 2.130 Å (auto: cov_scale=1.0 × 1.5 × median(r_i+r_j)=1.420 Å)
|
|
215
|
+
```
|
|
216
|
+
|
|
217
|
+
Override with `--cutoff FLOAT` when needed.
|
|
218
|
+
|
|
219
|
+
### Other metric tuning
|
|
220
|
+
|
|
221
|
+
```
|
|
222
|
+
--n-bins N histogram bins for H_spatial and RDF_dev (default: 20)
|
|
223
|
+
--w-atom FLOAT weight of H_atom in H_total (default: 0.5)
|
|
224
|
+
--w-spatial FLOAT weight of H_spatial in H_total (default: 0.5)
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
## Filtering
|
|
228
|
+
|
|
229
|
+
```
|
|
230
|
+
--filter METRIC:MIN:MAX
|
|
231
|
+
```
|
|
232
|
+
|
|
233
|
+
Only structures whose metric falls in [MIN, MAX] are written to output.
|
|
234
|
+
Use `-` for an open bound.
|
|
235
|
+
The flag is repeatable; all conditions must be satisfied simultaneously.
|
|
236
|
+
|
|
237
|
+
```bash
|
|
238
|
+
# Keep structures with high total entropy
|
|
239
|
+
--filter H_total:2.0:-
|
|
240
|
+
|
|
241
|
+
# Keep elongated structures (rod-like)
|
|
242
|
+
--filter shape_aniso:0.5:-
|
|
243
|
+
|
|
244
|
+
# Keep well-connected chains
|
|
245
|
+
--filter graph_lcc:0.8:- --filter graph_cc:0.4:-
|
|
246
|
+
|
|
247
|
+
# Keep structures with low local order (no accidental crystallinity)
|
|
248
|
+
--filter Q6:-:0.4
|
|
249
|
+
```
|
|
250
|
+
|
|
251
|
+
## Output Format
|
|
252
|
+
|
|
253
|
+
Structures are written as a concatenated multi-structure XYZ file.
|
|
254
|
+
Progress and statistics are written to stderr; the XYZ data goes to stdout (or `--output`).
|
|
255
|
+
|
|
256
|
+
```
|
|
257
|
+
12
|
|
258
|
+
sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:5,O:3] H_atom=1.0986 H_spatial=2.7812 H_total=1.9399 RDF_dev=3.2451 shape_aniso=0.5123 Q4=0.5210 Q6=0.5880 Q8=0.6014 graph_lcc=1.0000 graph_cc=0.5714
|
|
259
|
+
C 1.234567 -0.987654 2.345678
|
|
260
|
+
N -1.456789 3.210987 -0.123456
|
|
261
|
+
...
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
```bash
|
|
265
|
+
# XYZ to file, progress to terminal
|
|
266
|
+
python pasted.py ... -o out.xyz
|
|
267
|
+
|
|
268
|
+
# pipe XYZ, discard progress
|
|
269
|
+
python pasted.py ... 2>/dev/null | downstream_tool
|
|
270
|
+
|
|
271
|
+
# progress only (dry run to check filter hit rate)
|
|
272
|
+
python pasted.py ... -o /dev/null
|
|
273
|
+
```
|
|
274
|
+
|
|
275
|
+
## Full Option Reference
|
|
276
|
+
|
|
277
|
+
```
|
|
278
|
+
required:
|
|
279
|
+
--n-atoms N number of atoms per structure
|
|
280
|
+
--charge INT total system charge
|
|
281
|
+
--mult INT spin multiplicity 2S+1
|
|
282
|
+
|
|
283
|
+
placement mode:
|
|
284
|
+
--mode {gas,chain,shell}
|
|
285
|
+
--region SPEC [gas] sphere:R | box:L | box:LX,LY,LZ
|
|
286
|
+
--branch-prob FLOAT [chain] branching probability (default: 0.3)
|
|
287
|
+
--chain-persist FLOAT [chain] directional persistence 0.0–1.0 (default: 0.5)
|
|
288
|
+
--bond-range LO:HI [chain/shell] bond length range Å (default: 1.2:1.6)
|
|
289
|
+
--center-z Z [shell] fix center atom by atomic number
|
|
290
|
+
--coord-range MIN:MAX [shell] coordination number range (default: 4:8)
|
|
291
|
+
--shell-radius LO:HI [shell] shell radius range Å (default: 1.8:2.5)
|
|
292
|
+
|
|
293
|
+
elements:
|
|
294
|
+
--elements SPEC atomic-number spec (default: all Z=1-106)
|
|
295
|
+
|
|
296
|
+
placement:
|
|
297
|
+
--cov-scale FLOAT min dist = cov_scale × (r_i + r_j), Pyykkö radii (default: 1.0)
|
|
298
|
+
--relax-cycles INT max cycles for post-placement repulsion relaxation (default: 1500)
|
|
299
|
+
--no-add-hydrogen disable automatic H augmentation
|
|
300
|
+
|
|
301
|
+
sampling:
|
|
302
|
+
--n-samples INT number of structures to attempt (default: 1)
|
|
303
|
+
--seed INT random seed
|
|
304
|
+
|
|
305
|
+
metrics:
|
|
306
|
+
--n-bins INT histogram bins (default: 20)
|
|
307
|
+
--w-atom FLOAT H_atom weight in H_total (default: 0.5)
|
|
308
|
+
--w-spatial FLOAT H_spatial weight in H_total (default: 0.5)
|
|
309
|
+
--cutoff FLOAT distance cutoff Å for Q_l / graph_*
|
|
310
|
+
(default: auto = cov_scale × 1.5 × median(r_i+r_j))
|
|
311
|
+
|
|
312
|
+
filtering:
|
|
313
|
+
--filter METRIC:MIN:MAX repeatable; use - for open bound
|
|
314
|
+
|
|
315
|
+
output:
|
|
316
|
+
--validate check charge/mult against one random composition, then exit
|
|
317
|
+
-o / --output FILE XYZ output file (default: stdout)
|
|
318
|
+
```
|
|
319
|
+
|
|
320
|
+
## Notes and Limitations
|
|
321
|
+
|
|
322
|
+
- **Interatomic distances** use Pyykkö (2009) single-bond covalent radii. For Z > 86 (Fr through Sg), same-group proxies are used (e.g. Fr → Cs, U → Nd, Rf → Hf).
|
|
323
|
+
- **Repulsion relaxation** guarantees that no pair falls below `cov_scale × (r_i + r_j)` when it converges. If `[warn] relax_positions did not converge` appears, the structure may contain marginal violations but is still output. Increase `--relax-cycles` if convergence is important.
|
|
324
|
+
- **Auto cutoff** is computed from the element pool before any structures are generated and is fixed for the entire run. If the actual composition drawn per sample is much lighter or heavier than the pool median, the effective neighbour count may still be low or high. Use `--cutoff` to override when needed.
|
|
325
|
+
- **RDF_dev** is a finite-system approximation; treat it as a relative indicator.
|
|
326
|
+
- Charge/mult parity failures are common with large element pools and `mult=1`. Increase `--n-samples` or use `--mult 2` to compensate.
|
|
327
|
+
|
|
328
|
+
## License
|
|
329
|
+
|
|
330
|
+
MIT License. See [LICENSE](LICENSE).
|
pasted-0.1.0/README.md
ADDED
|
@@ -0,0 +1,307 @@
|
|
|
1
|
+
# PASTED
|
|
2
|
+
|
|
3
|
+
**P**ointless **A**tom **ST**ructure with **E**ntropy **D**iagnostics
|
|
4
|
+
|
|
5
|
+
PASTED is a structure fuzzer for quantum chemistry and machine learning potential codes.
|
|
6
|
+
|
|
7
|
+
A CLI tool that generates intentionally random, physically meaningless atomic structures and evaluates their degree of disorder through a suite of structural metrics. Useful for stress-testing structure optimizers, generating worst-case inputs for quantum chemistry codes, or studying what "maximum chaos" looks like in structural space.
|
|
8
|
+
|
|
9
|
+
## Features
|
|
10
|
+
|
|
11
|
+
- **Three placement modes** — fully random (`gas`), chain-growth (`chain`), coordination-complex-like (`shell`)
|
|
12
|
+
- **10 disorder metrics** computed per structure, all usable as output filters
|
|
13
|
+
- **Element pool** specified by atomic number (Z = 1–106, H through Sg); composition sampled randomly per structure
|
|
14
|
+
- **Always outputs `--n-atoms` atoms** — placement is unrestricted; Pyykkö covalent radii are enforced by mandatory post-placement repulsion relaxation
|
|
15
|
+
- **Auto-scaled `--cutoff`** — defaults to `cov_scale × 1.5 × median(r_i + r_j)` over the element pool, so graph and Steinhardt metrics are meaningful regardless of which elements are used
|
|
16
|
+
- Charge/multiplicity parity validation before any geometry is generated
|
|
17
|
+
- Multi-structure batch generation with `--n-samples`; per-structure progress on stderr, XYZ on stdout
|
|
18
|
+
- Reproducible runs via `--seed`
|
|
19
|
+
|
|
20
|
+
## Requirements
|
|
21
|
+
|
|
22
|
+
```
|
|
23
|
+
Python >= 3.10
|
|
24
|
+
numpy
|
|
25
|
+
scipy
|
|
26
|
+
```
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
pip install numpy scipy
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
## Installation
|
|
33
|
+
|
|
34
|
+
```bash
|
|
35
|
+
git clone https://github.com/yourname/pasted.git
|
|
36
|
+
cd pasted
|
|
37
|
+
# no build step required; run directly
|
|
38
|
+
python pasted.py --help
|
|
39
|
+
```
|
|
40
|
+
|
|
41
|
+
## Quick Start
|
|
42
|
+
|
|
43
|
+
```bash
|
|
44
|
+
# 10 atoms drawn from H–Zn, placed randomly in a sphere of radius 8 Å
|
|
45
|
+
python pasted.py --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
|
|
46
|
+
--mode gas --region sphere:8
|
|
47
|
+
|
|
48
|
+
# Organic-looking chain structure (C/N/O only)
|
|
49
|
+
python pasted.py --n-atoms 15 --elements 6,7,8 --charge 0 --mult 1 \
|
|
50
|
+
--mode chain --branch-prob 0.4 --n-samples 20 -o organic_junk.xyz
|
|
51
|
+
|
|
52
|
+
# Coordination-complex-like structure with Fe as center
|
|
53
|
+
python pasted.py --n-atoms 12 --elements 6,7,8,26 --charge 0 --mult 1 \
|
|
54
|
+
--mode shell --center-z 26 --coord-range 4:6 --n-samples 10
|
|
55
|
+
|
|
56
|
+
# Generate 100 structures, keep only the most disordered ones
|
|
57
|
+
python pasted.py --n-atoms 12 --elements 1-30 --charge 0 --mult 1 \
|
|
58
|
+
--mode gas --region sphere:9 --n-samples 100 \
|
|
59
|
+
--filter H_total:2.0:- --filter shape_aniso:0.3:- -o disordered.xyz
|
|
60
|
+
```
|
|
61
|
+
|
|
62
|
+
## Placement Modes
|
|
63
|
+
|
|
64
|
+
### `gas` (default)
|
|
65
|
+
|
|
66
|
+
Atoms are placed independently and uniformly at random inside the specified region.
|
|
67
|
+
Closest to true spatial randomness; highest expected `H_spatial`.
|
|
68
|
+
|
|
69
|
+
```
|
|
70
|
+
--region sphere:R sphere of radius R Å
|
|
71
|
+
--region box:L cube of side L Å
|
|
72
|
+
--region box:LX,LY,LZ orthorhombic box
|
|
73
|
+
```
|
|
74
|
+
|
|
75
|
+
### `chain`
|
|
76
|
+
|
|
77
|
+
Atoms grow one by one from a seed atom via a random walk with directional persistence.
|
|
78
|
+
At each step, a random active tip is selected and the new atom is placed at a random bond length.
|
|
79
|
+
The direction of each step is constrained by `--chain-persist` to avoid self-tangling.
|
|
80
|
+
A branching probability controls whether the old tip is kept (branch) or replaced (linear advance).
|
|
81
|
+
Produces elongated, tree-like structures.
|
|
82
|
+
|
|
83
|
+
```
|
|
84
|
+
--branch-prob FLOAT branching probability (default: 0.3)
|
|
85
|
+
--chain-persist FLOAT directional persistence 0.0–1.0 (default: 0.5)
|
|
86
|
+
0.0 = fully random (may self-tangle)
|
|
87
|
+
0.5 = rear 120° cone excluded
|
|
88
|
+
1.0 = front hemisphere only, nearly straight
|
|
89
|
+
--bond-range LO:HI bond length range in Å (default: 1.2:1.6)
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
### `shell`
|
|
93
|
+
|
|
94
|
+
One atom is placed at the origin as the "center", surrounded by a coordination shell at a random radius,
|
|
95
|
+
followed by tail atoms growing from shell members.
|
|
96
|
+
Produces structures that superficially resemble coordination complexes.
|
|
97
|
+
|
|
98
|
+
```
|
|
99
|
+
--center-z Z atomic number of center atom
|
|
100
|
+
(default: random from the sample's composition)
|
|
101
|
+
--coord-range MIN:MAX coordination number range (default: 4:8)
|
|
102
|
+
--shell-radius LO:HI shell radius range in Å (default: 1.8:2.5)
|
|
103
|
+
--bond-range LO:HI tail bond length range in Å (default: 1.2:1.6)
|
|
104
|
+
```
|
|
105
|
+
|
|
106
|
+
The center atom and its Z are recorded in the XYZ comment line as `center=Fe(Z=26)`.
|
|
107
|
+
|
|
108
|
+
## Element Pool
|
|
109
|
+
|
|
110
|
+
```
|
|
111
|
+
--elements SPEC
|
|
112
|
+
```
|
|
113
|
+
|
|
114
|
+
Elements are specified by atomic number. Omit to use all supported elements (Z = 1–106).
|
|
115
|
+
|
|
116
|
+
| Syntax | Meaning |
|
|
117
|
+
|--------|---------|
|
|
118
|
+
| `1-30` | Z = 1 through 30 (H to Zn) |
|
|
119
|
+
| `6,7,8` | Z = 6, 7, 8 (C, N, O) |
|
|
120
|
+
| `1-10,26,28` | Z = 1–10 plus Fe(26) and Ni(28) |
|
|
121
|
+
| `72-80` | 5d metals Hf through Hg |
|
|
122
|
+
| *(omitted)* | all Z = 1–106 |
|
|
123
|
+
|
|
124
|
+
For each structure, `--n-atoms` elements are drawn independently and uniformly from this pool.
|
|
125
|
+
The resulting composition varies per sample.
|
|
126
|
+
|
|
127
|
+
If H (Z = 1) is in the pool and the sampled composition contains no hydrogen, a random number of H atoms is automatically appended (approximately `1 + uniform(0,1) × n_atoms × 1.2`). This can be disabled with `--no-add-hydrogen`.
|
|
128
|
+
|
|
129
|
+
## Charge and Multiplicity
|
|
130
|
+
|
|
131
|
+
`--charge` and `--mult` are required and apply to every generated structure.
|
|
132
|
+
Before placement, PASTED checks two conditions against the randomly sampled composition:
|
|
133
|
+
|
|
134
|
+
1. Total electron count `N_e = Σ Z − charge > 0`
|
|
135
|
+
2. Parity: `N_e % 2 == (mult − 1) % 2`
|
|
136
|
+
|
|
137
|
+
Structures that fail either check are logged as `[invalid]` and skipped.
|
|
138
|
+
High-spin vs. low-spin selection is **not enforced**; that is the user's responsibility.
|
|
139
|
+
|
|
140
|
+
Because composition is random, parity failures are common when the element pool contains many odd-Z elements and `mult=1` is specified. Increasing `--n-samples` or using `--mult 2` reduces this.
|
|
141
|
+
|
|
142
|
+
## Interatomic Distance Control
|
|
143
|
+
|
|
144
|
+
PASTED enforces a minimum interatomic distance using Pyykkö single-bond covalent radii (Pyykkö & Atsumi, *Chem. Eur. J.* **15**, 186–197, 2009).
|
|
145
|
+
|
|
146
|
+
The threshold for each atom pair (i, j) is:
|
|
147
|
+
|
|
148
|
+
```
|
|
149
|
+
d_min(i, j) = cov_scale × (r_i + r_j)
|
|
150
|
+
```
|
|
151
|
+
|
|
152
|
+
- Default `--cov-scale 1.0` = exact sum of covalent radii.
|
|
153
|
+
- Values below 1.0 allow closer contacts; values above 1.0 enforce additional clearance.
|
|
154
|
+
- Z > 86 (Fr through Sg): no single-bond literature values are available. PASTED uses the same-group nearest lighter element as a proxy (e.g. Fr → Cs, U → Nd, Rf → Hf).
|
|
155
|
+
|
|
156
|
+
### Post-placement repulsion relaxation
|
|
157
|
+
|
|
158
|
+
Placement does **not** check for distance violations — atoms are placed freely in the requested geometry (region/chain/shell). After placement, a mandatory **repulsion relaxation** step resolves all violations iteratively: for each pair below the threshold, both atoms are pushed apart along their connecting vector by half the deficit. This repeats until no violations remain or `--relax-cycles` is exhausted.
|
|
159
|
+
|
|
160
|
+
This design guarantees that `--n-atoms` atoms are always placed, regardless of how crowded the initial configuration is. If relaxation does not converge within `--relax-cycles`, a `[warn]` line is printed to stderr and the structure is output as-is.
|
|
161
|
+
|
|
162
|
+
## Disorder Metrics
|
|
163
|
+
|
|
164
|
+
All metrics are computed for every structure and embedded in the XYZ comment line.
|
|
165
|
+
All are usable in `--filter`.
|
|
166
|
+
|
|
167
|
+
| Metric | Description | Range |
|
|
168
|
+
|--------|-------------|-------|
|
|
169
|
+
| `H_atom` | Shannon entropy of element composition | 0 (single element) to ln(*k*) |
|
|
170
|
+
| `H_spatial` | Shannon entropy of the pairwise-distance histogram | higher = more uniform distances |
|
|
171
|
+
| `H_total` | Weighted sum: `w_atom · H_atom + w_spatial · H_spatial` | — |
|
|
172
|
+
| `RDF_dev` | RMS deviation of empirical *g*(*r*) from ideal-gas baseline | 0 = perfectly random |
|
|
173
|
+
| `shape_aniso` | Relative shape anisotropy from the gyration tensor | 0 = spherical, 1 = rod-like |
|
|
174
|
+
| `Q4`, `Q6`, `Q8` | Steinhardt bond-orientational order parameters (averaged over atoms) | 0 = disordered |
|
|
175
|
+
| `graph_lcc` | Fraction of atoms in the largest connected component at `--cutoff` | 0–1 |
|
|
176
|
+
| `graph_cc` | Mean clustering coefficient at `--cutoff` | 0–1 |
|
|
177
|
+
|
|
178
|
+
### Distance cutoff for graph and Steinhardt metrics
|
|
179
|
+
|
|
180
|
+
The `--cutoff` parameter determines which atom pairs are considered "connected" for `graph_lcc`, `graph_cc`, and `Q4/Q6/Q8`. Setting this too small relative to the actual interatomic distances causes all metrics to collapse to zero (no neighbours found); setting it too large makes every atom a neighbour of every other.
|
|
181
|
+
|
|
182
|
+
By default, `--cutoff` is set automatically to:
|
|
183
|
+
|
|
184
|
+
```
|
|
185
|
+
cutoff = cov_scale × 1.5 × median(r_i + r_j) over all element-pool pairs
|
|
186
|
+
```
|
|
187
|
+
|
|
188
|
+
This scales with the element pool: light-element pools (e.g. C/N/O) get a cutoff around 2.1 Å; 5d-metal pools get around 3.8 Å. The auto value is printed to stderr at startup:
|
|
189
|
+
|
|
190
|
+
```
|
|
191
|
+
[cutoff] 2.130 Å (auto: cov_scale=1.0 × 1.5 × median(r_i+r_j)=1.420 Å)
|
|
192
|
+
```
|
|
193
|
+
|
|
194
|
+
Override with `--cutoff FLOAT` when needed.
|
|
195
|
+
|
|
196
|
+
### Other metric tuning
|
|
197
|
+
|
|
198
|
+
```
|
|
199
|
+
--n-bins N histogram bins for H_spatial and RDF_dev (default: 20)
|
|
200
|
+
--w-atom FLOAT weight of H_atom in H_total (default: 0.5)
|
|
201
|
+
--w-spatial FLOAT weight of H_spatial in H_total (default: 0.5)
|
|
202
|
+
```
|
|
203
|
+
|
|
204
|
+
## Filtering
|
|
205
|
+
|
|
206
|
+
```
|
|
207
|
+
--filter METRIC:MIN:MAX
|
|
208
|
+
```
|
|
209
|
+
|
|
210
|
+
Only structures whose metric falls in [MIN, MAX] are written to output.
|
|
211
|
+
Use `-` for an open bound.
|
|
212
|
+
The flag is repeatable; all conditions must be satisfied simultaneously.
|
|
213
|
+
|
|
214
|
+
```bash
|
|
215
|
+
# Keep structures with high total entropy
|
|
216
|
+
--filter H_total:2.0:-
|
|
217
|
+
|
|
218
|
+
# Keep elongated structures (rod-like)
|
|
219
|
+
--filter shape_aniso:0.5:-
|
|
220
|
+
|
|
221
|
+
# Keep well-connected chains
|
|
222
|
+
--filter graph_lcc:0.8:- --filter graph_cc:0.4:-
|
|
223
|
+
|
|
224
|
+
# Keep structures with low local order (no accidental crystallinity)
|
|
225
|
+
--filter Q6:-:0.4
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
## Output Format
|
|
229
|
+
|
|
230
|
+
Structures are written as a concatenated multi-structure XYZ file.
|
|
231
|
+
Progress and statistics are written to stderr; the XYZ data goes to stdout (or `--output`).
|
|
232
|
+
|
|
233
|
+
```
|
|
234
|
+
12
|
|
235
|
+
sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:5,O:3] H_atom=1.0986 H_spatial=2.7812 H_total=1.9399 RDF_dev=3.2451 shape_aniso=0.5123 Q4=0.5210 Q6=0.5880 Q8=0.6014 graph_lcc=1.0000 graph_cc=0.5714
|
|
236
|
+
C 1.234567 -0.987654 2.345678
|
|
237
|
+
N -1.456789 3.210987 -0.123456
|
|
238
|
+
...
|
|
239
|
+
```
|
|
240
|
+
|
|
241
|
+
```bash
|
|
242
|
+
# XYZ to file, progress to terminal
|
|
243
|
+
python pasted.py ... -o out.xyz
|
|
244
|
+
|
|
245
|
+
# pipe XYZ, discard progress
|
|
246
|
+
python pasted.py ... 2>/dev/null | downstream_tool
|
|
247
|
+
|
|
248
|
+
# progress only (dry run to check filter hit rate)
|
|
249
|
+
python pasted.py ... -o /dev/null
|
|
250
|
+
```
|
|
251
|
+
|
|
252
|
+
## Full Option Reference
|
|
253
|
+
|
|
254
|
+
```
|
|
255
|
+
required:
|
|
256
|
+
--n-atoms N number of atoms per structure
|
|
257
|
+
--charge INT total system charge
|
|
258
|
+
--mult INT spin multiplicity 2S+1
|
|
259
|
+
|
|
260
|
+
placement mode:
|
|
261
|
+
--mode {gas,chain,shell}
|
|
262
|
+
--region SPEC [gas] sphere:R | box:L | box:LX,LY,LZ
|
|
263
|
+
--branch-prob FLOAT [chain] branching probability (default: 0.3)
|
|
264
|
+
--chain-persist FLOAT [chain] directional persistence 0.0–1.0 (default: 0.5)
|
|
265
|
+
--bond-range LO:HI [chain/shell] bond length range Å (default: 1.2:1.6)
|
|
266
|
+
--center-z Z [shell] fix center atom by atomic number
|
|
267
|
+
--coord-range MIN:MAX [shell] coordination number range (default: 4:8)
|
|
268
|
+
--shell-radius LO:HI [shell] shell radius range Å (default: 1.8:2.5)
|
|
269
|
+
|
|
270
|
+
elements:
|
|
271
|
+
--elements SPEC atomic-number spec (default: all Z=1-106)
|
|
272
|
+
|
|
273
|
+
placement:
|
|
274
|
+
--cov-scale FLOAT min dist = cov_scale × (r_i + r_j), Pyykkö radii (default: 1.0)
|
|
275
|
+
--relax-cycles INT max cycles for post-placement repulsion relaxation (default: 1500)
|
|
276
|
+
--no-add-hydrogen disable automatic H augmentation
|
|
277
|
+
|
|
278
|
+
sampling:
|
|
279
|
+
--n-samples INT number of structures to attempt (default: 1)
|
|
280
|
+
--seed INT random seed
|
|
281
|
+
|
|
282
|
+
metrics:
|
|
283
|
+
--n-bins INT histogram bins (default: 20)
|
|
284
|
+
--w-atom FLOAT H_atom weight in H_total (default: 0.5)
|
|
285
|
+
--w-spatial FLOAT H_spatial weight in H_total (default: 0.5)
|
|
286
|
+
--cutoff FLOAT distance cutoff Å for Q_l / graph_*
|
|
287
|
+
(default: auto = cov_scale × 1.5 × median(r_i+r_j))
|
|
288
|
+
|
|
289
|
+
filtering:
|
|
290
|
+
--filter METRIC:MIN:MAX repeatable; use - for open bound
|
|
291
|
+
|
|
292
|
+
output:
|
|
293
|
+
--validate check charge/mult against one random composition, then exit
|
|
294
|
+
-o / --output FILE XYZ output file (default: stdout)
|
|
295
|
+
```
|
|
296
|
+
|
|
297
|
+
## Notes and Limitations
|
|
298
|
+
|
|
299
|
+
- **Interatomic distances** use Pyykkö (2009) single-bond covalent radii. For Z > 86 (Fr through Sg), same-group proxies are used (e.g. Fr → Cs, U → Nd, Rf → Hf).
|
|
300
|
+
- **Repulsion relaxation** guarantees that no pair falls below `cov_scale × (r_i + r_j)` when it converges. If `[warn] relax_positions did not converge` appears, the structure may contain marginal violations but is still output. Increase `--relax-cycles` if convergence is important.
|
|
301
|
+
- **Auto cutoff** is computed from the element pool before any structures are generated and is fixed for the entire run. If the actual composition drawn per sample is much lighter or heavier than the pool median, the effective neighbour count may still be low or high. Use `--cutoff` to override when needed.
|
|
302
|
+
- **RDF_dev** is a finite-system approximation; treat it as a relative indicator.
|
|
303
|
+
- Charge/mult parity failures are common with large element pools and `mult=1`. Increase `--n-samples` or use `--mult 2` to compensate.
|
|
304
|
+
|
|
305
|
+
## License
|
|
306
|
+
|
|
307
|
+
MIT License. See [LICENSE](LICENSE).
|