pasted 0.1.0__tar.gz

pasted-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,330 @@
1
+ Metadata-Version: 2.4
2
+ Name: pasted
3
+ Version: 0.1.0
4
+ Summary: Pointless Atom STructure with Entropy Diagnostics — structure fuzzer for QC/ML-potential codes
5
+ License-Expression: MIT
6
+ Keywords: quantum chemistry,structure generation,fuzzing,machine learning potentials
7
+ Classifier: Development Status :: 3 - Alpha
8
+ Classifier: Intended Audience :: Science/Research
9
+ Classifier: Programming Language :: Python :: 3
10
+ Classifier: Programming Language :: Python :: 3.10
11
+ Classifier: Programming Language :: Python :: 3.11
12
+ Classifier: Programming Language :: Python :: 3.12
13
+ Classifier: Topic :: Scientific/Engineering :: Chemistry
14
+ Classifier: Topic :: Scientific/Engineering :: Physics
15
+ Requires-Python: >=3.10
16
+ Description-Content-Type: text/markdown
17
+ Requires-Dist: numpy>=1.24
18
+ Requires-Dist: scipy>=1.10
19
+ Provides-Extra: dev
20
+ Requires-Dist: pytest>=8.0; extra == "dev"
21
+ Requires-Dist: ruff>=0.4; extra == "dev"
22
+ Requires-Dist: mypy>=1.10; extra == "dev"
23
+
24
+ # PASTED
25
+
26
+ **P**ointless **A**tom **ST**ructure with **E**ntropy **D**iagnostics
27
+
28
+ PASTED is a structure fuzzer for quantum chemistry and machine learning potential codes.
29
+
30
+ It is a CLI tool that generates intentionally random, physically meaningless atomic structures and scores their degree of disorder with a suite of structural metrics. It is useful for stress-testing structure optimizers, generating worst-case inputs for quantum chemistry codes, or studying what "maximum chaos" looks like in structural space.
31
+
32
+ ## Features
33
+
34
+ - **Three placement modes** — fully random (`gas`), chain-growth (`chain`), coordination-complex-like (`shell`)
35
+ - **10 disorder metrics** computed per structure, all usable as output filters
36
+ - **Element pool** specified by atomic number (Z = 1–106, H through Sg); composition sampled randomly per structure
37
+ - **Always outputs `--n-atoms` atoms** — placement is unrestricted; Pyykkö covalent radii are enforced by mandatory post-placement repulsion relaxation
38
+ - **Auto-scaled `--cutoff`** — defaults to `cov_scale × 1.5 × median(r_i + r_j)` over the element pool, so graph and Steinhardt metrics are meaningful regardless of which elements are used
39
+ - Charge/multiplicity parity validation before any geometry is generated
40
+ - Multi-structure batch generation with `--n-samples`; per-structure progress on stderr, XYZ on stdout
41
+ - Reproducible runs via `--seed`
42
+
43
+ ## Requirements
44
+
45
+ ```
46
+ Python >= 3.10
47
+ numpy
48
+ scipy
49
+ ```
50
+
51
+ ```bash
52
+ pip install numpy scipy
53
+ ```
54
+
55
+ ## Installation
56
+
57
+ ```bash
58
+ git clone https://github.com/yourname/pasted.git
59
+ cd pasted
60
+ # no build step required; run directly
61
+ python pasted.py --help
62
+ ```
63
+
64
+ ## Quick Start
65
+
66
+ ```bash
67
+ # 10 atoms drawn from H–Zn, placed randomly in a sphere of radius 8 Å
68
+ python pasted.py --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
69
+ --mode gas --region sphere:8
70
+
71
+ # Organic-looking chain structure (C/N/O only)
72
+ python pasted.py --n-atoms 15 --elements 6,7,8 --charge 0 --mult 1 \
73
+ --mode chain --branch-prob 0.4 --n-samples 20 -o organic_junk.xyz
74
+
75
+ # Coordination-complex-like structure with Fe as center
76
+ python pasted.py --n-atoms 12 --elements 6,7,8,26 --charge 0 --mult 1 \
77
+ --mode shell --center-z 26 --coord-range 4:6 --n-samples 10
78
+
79
+ # Generate 100 structures, keep only the most disordered ones
80
+ python pasted.py --n-atoms 12 --elements 1-30 --charge 0 --mult 1 \
81
+ --mode gas --region sphere:9 --n-samples 100 \
82
+ --filter H_total:2.0:- --filter shape_aniso:0.3:- -o disordered.xyz
83
+ ```
84
+
85
+ ## Placement Modes
86
+
87
+ ### `gas` (default)
88
+
89
+ Atoms are placed independently and uniformly at random inside the specified region.
90
+ Closest to true spatial randomness; highest expected `H_spatial`.
91
+
92
+ ```
93
+ --region sphere:R sphere of radius R Å
94
+ --region box:L cube of side L Å
95
+ --region box:LX,LY,LZ orthorhombic box
96
+ ```
97
+
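As a minimal sketch of what "uniformly at random inside the region" can mean, the snippet below draws points in a sphere and in a box using only the standard library. The function names, the cube-root radius trick, and the choice to centre the box on the origin are assumptions made for illustration; they are not taken from `pasted.py`.

```python
import math
import random

def sample_in_sphere(n, radius, rng):
    """Return n points drawn uniformly from a ball of the given radius."""
    points = []
    for _ in range(n):
        # Uniform direction: normalise a vector of independent Gaussians.
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        # Uniform in volume: the radius scales as the cube root of a uniform number.
        r = radius * rng.random() ** (1.0 / 3.0)
        points.append(tuple(r * c / norm for c in v))
    return points

def sample_in_box(n, lengths, rng):
    """Return n points drawn uniformly from a box centred on the origin (assumed)."""
    return [
        tuple(rng.uniform(-length / 2.0, length / 2.0) for length in lengths)
        for _ in range(n)
    ]

rng = random.Random(0)
pts = sample_in_sphere(10, 8.0, rng)  # mirrors --n-atoms 10 --region sphere:8
```

Sampling the radius as `R * u ** (1/3)` avoids the clustering near the centre that a plain uniform radius would produce.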
98
+ ### `chain`
99
+
100
+ Atoms grow one by one from a seed atom via a random walk with directional persistence.
101
+ At each step, a random active tip is selected and the new atom is placed at a random bond length.
102
+ The direction of each step is constrained by `--chain-persist` to avoid self-tangling.
103
+ A branching probability controls whether the old tip is kept (branch) or replaced (linear advance).
104
+ Produces elongated, tree-like structures.
105
+
106
+ ```
107
+ --branch-prob FLOAT branching probability (default: 0.3)
108
+ --chain-persist FLOAT directional persistence 0.0–1.0 (default: 0.5)
109
+ 0.0 = fully random (may self-tangle)
110
+ 0.5 = rear 120° cone excluded
111
+ 1.0 = front hemisphere only, nearly straight
112
+ --bond-range LO:HI bond length range in Å (default: 1.2:1.6)
113
+ ```
114
+
115
+ ### `shell`
116
+
117
+ One atom is placed at the origin as the "center", surrounded by a coordination shell at a random radius,
118
+ followed by tail atoms growing from shell members.
119
+ Produces structures that superficially resemble coordination complexes.
120
+
121
+ ```
122
+ --center-z Z atomic number of center atom
123
+ (default: random from the sample's composition)
124
+ --coord-range MIN:MAX coordination number range (default: 4:8)
125
+ --shell-radius LO:HI shell radius range in Å (default: 1.8:2.5)
126
+ --bond-range LO:HI tail bond length range in Å (default: 1.2:1.6)
127
+ ```
128
+
129
+ The center atom and its Z are recorded in the XYZ comment line as `center=Fe(Z=26)`.
130
+
131
+ ## Element Pool
132
+
133
+ ```
134
+ --elements SPEC
135
+ ```
136
+
137
+ Elements are specified by atomic number. Omit to use all supported elements (Z = 1–106).
138
+
139
+ | Syntax | Meaning |
140
+ |--------|---------|
141
+ | `1-30` | Z = 1 through 30 (H to Zn) |
142
+ | `6,7,8` | Z = 6, 7, 8 (C, N, O) |
143
+ | `1-10,26,28` | Z = 1–10 plus Fe(26) and Ni(28) |
144
+ | `72-80` | 5d metals Hf through Hg |
145
+ | *(omitted)* | all Z = 1–106 |
146
+
147
+ For each structure, `--n-atoms` elements are drawn independently and uniformly from this pool.
148
+ The resulting composition varies per sample.
149
+
150
+ If H (Z = 1) is in the pool and the sampled composition contains no hydrogen, a random number of H atoms is automatically appended (approximately `1 + uniform(0,1) × n_atoms × 1.2`). This can be disabled with `--no-add-hydrogen`.
151
+
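The element-pool syntax described in this section can be expanded along the following lines. This is a hedged sketch: `parse_element_spec` and `sample_composition` are hypothetical names, and the real parser may validate or order its input differently.

```python
import random

def parse_element_spec(spec, z_max=106):
    """Expand a spec such as '1-10,26,28' into a sorted list of atomic numbers."""
    pool = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            lo, hi = (int(part) for part in token.split("-"))
        else:
            lo = hi = int(token)
        if not 1 <= lo <= hi <= z_max:
            raise ValueError(f"invalid element range: {token!r}")
        pool.update(range(lo, hi + 1))
    return sorted(pool)

def sample_composition(pool, n_atoms, rng):
    """Draw n_atoms atomic numbers independently and uniformly from the pool."""
    return [rng.choice(pool) for _ in range(n_atoms)]

pool = parse_element_spec("1-10,26,28")
composition = sample_composition(pool, 12, random.Random(42))
```

Because every draw is independent, the same element can appear many times in one sample and some pool members may not appear at all.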
152
+ ## Charge and Multiplicity
153
+
154
+ `--charge` and `--mult` are required and apply to every generated structure.
155
+ Before placement, PASTED checks two conditions against the randomly sampled composition:
156
+
157
+ 1. Total electron count `N_e = Σ Z − charge > 0`
158
+ 2. Parity: `N_e % 2 == (mult − 1) % 2`
159
+
160
+ Structures that fail either check are logged as `[invalid]` and skipped.
161
+ High-spin vs. low-spin selection is **not enforced**; that is the user's responsibility.
162
+
163
+ Because composition is random, parity failures are common when the element pool contains many odd-Z elements and `mult=1` is specified. Increase `--n-samples` to compensate for the skipped structures, or use `--mult 2` if an odd electron count is acceptable.
164
+
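The two checks listed above amount to a few lines of arithmetic. The sketch below is illustrative only; `check_charge_mult` is a hypothetical helper, not a function exported by PASTED.

```python
def check_charge_mult(atomic_numbers, charge, mult):
    """Return (ok, reason) for the electron-count and parity checks."""
    n_electrons = sum(atomic_numbers) - charge
    if n_electrons <= 0:
        return False, "no electrons left"
    # An even electron count requires odd multiplicity, and vice versa.
    if n_electrons % 2 != (mult - 1) % 2:
        return False, "electron parity does not match multiplicity"
    return True, "ok"

# CH4-like composition: 10 electrons, singlet is allowed.
methane_ok, _ = check_charge_mult([6, 1, 1, 1, 1], charge=0, mult=1)
# NO-like composition: 15 electrons, singlet is rejected, doublet is allowed.
no_singlet_ok, _ = check_charge_mult([7, 8], charge=0, mult=1)
no_doublet_ok, _ = check_charge_mult([7, 8], charge=0, mult=2)
```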
165
+ ## Interatomic Distance Control
166
+
167
+ PASTED enforces a minimum interatomic distance using Pyykkö single-bond covalent radii (Pyykkö & Atsumi, *Chem. Eur. J.* **15**, 186–197, 2009).
168
+
169
+ The threshold for each atom pair (i, j) is:
170
+
171
+ ```
172
+ d_min(i, j) = cov_scale × (r_i + r_j)
173
+ ```
174
+
175
+ - Default `--cov-scale 1.0` = exact sum of covalent radii.
176
+ - Values below 1.0 allow closer contacts; values above 1.0 enforce additional clearance.
177
+ - Z > 86 (Fr through Sg): no single-bond literature values are available. PASTED uses the same-group nearest lighter element as a proxy (e.g. Fr → Cs, U → Nd, Rf → Hf).
178
+
179
+ ### Post-placement repulsion relaxation
180
+
181
+ Placement does **not** check for distance violations — atoms are placed freely in the requested geometry (region/chain/shell). After placement, a mandatory **repulsion relaxation** step resolves all violations iteratively: for each pair below the threshold, both atoms are pushed apart along their connecting vector by half the deficit. This repeats until no violations remain or `--relax-cycles` is exhausted.
182
+
183
+ This design guarantees that `--n-atoms` atoms are always placed, regardless of how crowded the initial configuration is. If relaxation does not converge within `--relax-cycles`, a `[warn]` line is printed to stderr and the structure is output as-is.
184
+
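The relaxation loop described above can be sketched in pure Python as follows. Several details are assumptions: the name `relax_sketch`, the sequential pair-by-pair update order, the small tolerance `tol`, the random direction used for coincident atoms, and the three radii, which are illustrative values that should be checked against the Pyykkö table.

```python
import math
import random

# Illustrative single-bond radii in Å; confirm against the Pyykkö table before use.
RADII = {"C": 0.75, "N": 0.71, "O": 0.63}

def relax_sketch(symbols, positions, cov_scale=1.0, max_cycles=1500, tol=1e-9, seed=0):
    """Push apart every pair closer than cov_scale * (r_i + r_j).

    Returns (positions, converged). Each atom of a violating pair moves by
    half of the deficit along the connecting vector.
    """
    rng = random.Random(seed)
    pos = [list(p) for p in positions]
    for _ in range(max_cycles):
        moved = False
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                d_min = cov_scale * (RADII[symbols[i]] + RADII[symbols[j]])
                delta = [b - a for a, b in zip(pos[i], pos[j])]
                dist = math.sqrt(sum(c * c for c in delta))
                if dist >= d_min - tol:
                    continue
                if dist < 1e-12:
                    # Coincident atoms have no connecting vector; pick a random one.
                    delta = [rng.gauss(0.0, 1.0) for _ in range(3)]
                    norm = math.sqrt(sum(c * c for c in delta))
                else:
                    norm = dist
                shift = 0.5 * (d_min - dist)
                for k in range(3):
                    step = shift * delta[k] / norm
                    pos[i][k] -= step
                    pos[j][k] += step
                moved = True
        if not moved:
            return pos, True
    return pos, False

relaxed, converged = relax_sketch(["C", "C"], [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)])
```

Fixing one pair can create a new violation with a third atom, which is why the loop repeats until a full pass makes no move or the cycle budget runs out.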
185
+ ## Disorder Metrics
186
+
187
+ All metrics are computed for every structure and embedded in the XYZ comment line.
188
+ All are usable in `--filter`.
189
+
190
+ | Metric | Description | Range |
191
+ |--------|-------------|-------|
192
+ | `H_atom` | Shannon entropy of element composition | 0 (single element) to ln(*k*) |
193
+ | `H_spatial` | Shannon entropy of the pairwise-distance histogram | higher = more uniform distances |
194
+ | `H_total` | Weighted sum: `w_atom · H_atom + w_spatial · H_spatial` | — |
195
+ | `RDF_dev` | RMS deviation of empirical *g*(*r*) from ideal-gas baseline | 0 = perfectly random |
196
+ | `shape_aniso` | Relative shape anisotropy from the gyration tensor | 0 = spherical, 1 = rod-like |
197
+ | `Q4`, `Q6`, `Q8` | Steinhardt bond-orientational order parameters (averaged over atoms) | 0 = disordered |
198
+ | `graph_lcc` | Fraction of atoms in the largest connected component at `--cutoff` | 0–1 |
199
+ | `graph_cc` | Mean clustering coefficient at `--cutoff` | 0–1 |
200
+
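To make three of the table rows concrete, here is a small sketch of `H_atom`, `H_total`, and `shape_aniso`. It assumes natural logarithms (consistent with the ln(*k*) upper bound in the table) and an unweighted gyration tensor; whether PASTED weights atoms by mass is not stated, so treat that choice as an assumption. The function names are hypothetical.

```python
import math
from collections import Counter

def h_atom(symbols):
    """Shannon entropy (natural log) of the element composition."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())

def h_total(h_atom_value, h_spatial_value, w_atom=0.5, w_spatial=0.5):
    """Weighted sum of the two entropies."""
    return w_atom * h_atom_value + w_spatial * h_spatial_value

def shape_anisotropy(positions):
    """Relative shape anisotropy from the (unweighted) gyration tensor."""
    n = len(positions)
    centroid = [sum(p[k] for p in positions) / n for k in range(3)]
    s = [
        [sum((p[a] - centroid[a]) * (p[b] - centroid[b]) for p in positions) / n
         for b in range(3)]
        for a in range(3)
    ]
    trace = s[0][0] + s[1][1] + s[2][2]
    if trace == 0.0:
        return 0.0
    trace_sq = sum(s[a][b] * s[b][a] for a in range(3) for b in range(3))  # tr(S^2)
    # Sum of pairwise eigenvalue products, obtained from invariants of S.
    pair_sum = 0.5 * (trace * trace - trace_sq)
    return 1.0 - 3.0 * pair_sum / (trace * trace)

rod = [(float(x), 0.0, 0.0) for x in range(5)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
```

Using the tensor invariants avoids an eigenvalue solver: the anisotropy is 1 − 3(λ₁λ₂ + λ₂λ₃ + λ₃λ₁)/(λ₁ + λ₂ + λ₃)², and both sums follow from the trace of S and of S².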
201
+ ### Distance cutoff for graph and Steinhardt metrics
202
+
203
+ The `--cutoff` parameter determines which atom pairs are considered "connected" for `graph_lcc`, `graph_cc`, and `Q4/Q6/Q8`. Setting it too small relative to the actual interatomic distances causes these metrics to collapse to zero (no neighbours found); setting it too large makes every atom a neighbour of every other.
204
+
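The effect of the cutoff on connectivity can be seen in a short union-find sketch of a `graph_lcc`-style quantity. This is an illustration under assumptions: the function name is hypothetical, pairs at exactly the cutoff distance are treated as connected, and an isolated atom counts as a component of size one, so the floor here is 1/n rather than 0. PASTED's own definition may differ on these points.

```python
import math

def largest_component_fraction(positions, cutoff):
    """Fraction of atoms in the largest connected component at the given cutoff."""
    n = len(positions)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= cutoff:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n

atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
```

With these four atoms, a cutoff of 2.13 Å links the first three and leaves the last one isolated, while a very large cutoff merges everything into one component.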
205
+ By default, `--cutoff` is set automatically to:
206
+
207
+ ```
208
+ cutoff = cov_scale × 1.5 × median(r_i + r_j) over all element-pool pairs
209
+ ```
210
+
211
+ This scales with the element pool: light-element pools (e.g. C/N/O) get a cutoff around 2.1 Å; 5d-metal pools get around 3.8 Å. The auto value is printed to stderr at startup:
212
+
213
+ ```
214
+ [cutoff] 2.130 Å (auto: cov_scale=1.0 × 1.5 × median(r_i+r_j)=1.420 Å)
215
+ ```
216
+
217
+ Override with `--cutoff FLOAT` when needed.
218
+
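The auto-cutoff formula above can be sketched as follows. The radii are illustrative values for C, N, and O, and the way pairs are enumerated (unordered pairs, including like pairs) is an assumption; the tool's own enumeration may differ, so the result need not match the `[cutoff]` line above exactly.

```python
from itertools import combinations_with_replacement
from statistics import median

# Illustrative single-bond radii in Å keyed by atomic number (C, N, O).
RADII = {6: 0.75, 7: 0.71, 8: 0.63}

def auto_cutoff(pool, cov_scale=1.0, factor=1.5):
    """cov_scale * factor * median(r_i + r_j) over unordered element-pool pairs."""
    pair_sums = [
        RADII[a] + RADII[b] for a, b in combinations_with_replacement(pool, 2)
    ]
    return cov_scale * factor * median(pair_sums)

cutoff = auto_cutoff([6, 7, 8])  # roughly 2.1 Å for a C/N/O pool
```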
219
+ ### Other metric tuning
220
+
221
+ ```
222
+ --n-bins N histogram bins for H_spatial and RDF_dev (default: 20)
223
+ --w-atom FLOAT weight of H_atom in H_total (default: 0.5)
224
+ --w-spatial FLOAT weight of H_spatial in H_total (default: 0.5)
225
+ ```
226
+
227
+ ## Filtering
228
+
229
+ ```
230
+ --filter METRIC:MIN:MAX
231
+ ```
232
+
233
+ Only structures whose metric falls in [MIN, MAX] are written to output.
234
+ Use `-` for an open bound.
235
+ The flag is repeatable; all conditions must be satisfied simultaneously.
236
+
237
+ ```bash
238
+ # Keep structures with high total entropy
239
+ --filter H_total:2.0:-
240
+
241
+ # Keep elongated structures (rod-like)
242
+ --filter shape_aniso:0.5:-
243
+
244
+ # Keep well-connected chains
245
+ --filter graph_lcc:0.8:- --filter graph_cc:0.4:-
246
+
247
+ # Keep structures with low local order (no accidental crystallinity)
248
+ --filter Q6:-:0.4
249
+ ```
250
+
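A `METRIC:MIN:MAX` specification can be interpreted roughly like this. The helper names are hypothetical, and the sketch assumes both bounds are inclusive, as the [MIN, MAX] notation above suggests.

```python
import math

def parse_filter(spec):
    """Turn 'H_total:2.0:-' into ('H_total', 2.0, inf); '-' marks an open bound."""
    metric, lo, hi = spec.split(":")
    lower = -math.inf if lo == "-" else float(lo)
    upper = math.inf if hi == "-" else float(hi)
    return metric, lower, upper

def passes(metrics, filters):
    """True only if every filter condition holds for the given metric values."""
    return all(lo <= metrics[name] <= hi for name, lo, hi in filters)

filters = [parse_filter("H_total:2.0:-"), parse_filter("Q6:-:0.4")]
keep = passes({"H_total": 2.5, "Q6": 0.3}, filters)
```

Mapping an open bound to ±infinity keeps the comparison uniform, so no special case is needed when one side is missing.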
251
+ ## Output Format
252
+
253
+ Structures are written as a concatenated multi-structure XYZ file.
254
+ Progress and statistics are written to stderr; the XYZ data goes to stdout (or `--output`).
255
+
256
+ ```
257
+ 12
258
+ sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:4,O:4] H_atom=1.0986 H_spatial=2.7812 H_total=1.9399 RDF_dev=3.2451 shape_aniso=0.5123 Q4=0.5210 Q6=0.5880 Q8=0.6014 graph_lcc=1.0000 graph_cc=0.5714
259
+ C 1.234567 -0.987654 2.345678
260
+ N -1.456789 3.210987 -0.123456
261
+ ...
262
+ ```
263
+
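For downstream tools, the comment line shown above can be read back as key/value pairs. The following is a sketch of one way to do it, assuming tokens are separated by whitespace and contain no internal spaces; `parse_comment` and `read_frames` are hypothetical helpers, not part of PASTED.

```python
def parse_comment(line):
    """Split 'key=value' tokens; numeric values become floats, others stay strings."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")
        try:
            fields[key] = float(value)
        except ValueError:
            fields[key] = value
    return fields

def read_frames(text):
    """Return a list of (comment_fields, atoms) from concatenated XYZ text."""
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1
            continue
        n_atoms = int(lines[i])
        fields = parse_comment(lines[i + 1])
        atoms = []
        for row in lines[i + 2 : i + 2 + n_atoms]:
            symbol, x, y, z = row.split()
            atoms.append((symbol, float(x), float(y), float(z)))
        frames.append((fields, atoms))
        i += 2 + n_atoms
    return frames

example = (
    "2\n"
    "sample=3 mode=chain charge=+0 mult=1 comp=[C:1,N:1] H_total=1.9399\n"
    "C 1.0 0.0 0.0\n"
    "N 0.0 1.0 0.0\n"
)
frames = read_frames(example)
```

Splitting on the first `=` only means a value such as `Fe(Z=26)` in the `center=` field survives intact as a string.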
264
+ ```bash
265
+ # XYZ to file, progress to terminal
266
+ python pasted.py ... -o out.xyz
267
+
268
+ # pipe XYZ, discard progress
269
+ python pasted.py ... 2>/dev/null | downstream_tool
270
+
271
+ # progress only (dry run to check filter hit rate)
272
+ python pasted.py ... -o /dev/null
273
+ ```
274
+
275
+ ## Full Option Reference
276
+
277
+ ```
278
+ required:
279
+ --n-atoms N number of atoms per structure
280
+ --charge INT total system charge
281
+ --mult INT spin multiplicity 2S+1
282
+
283
+ placement mode:
284
+ --mode {gas,chain,shell}
285
+ --region SPEC [gas] sphere:R | box:L | box:LX,LY,LZ
286
+ --branch-prob FLOAT [chain] branching probability (default: 0.3)
287
+ --chain-persist FLOAT [chain] directional persistence 0.0–1.0 (default: 0.5)
288
+ --bond-range LO:HI [chain/shell] bond length range Å (default: 1.2:1.6)
289
+ --center-z Z [shell] fix center atom by atomic number
290
+ --coord-range MIN:MAX [shell] coordination number range (default: 4:8)
291
+ --shell-radius LO:HI [shell] shell radius range Å (default: 1.8:2.5)
292
+
293
+ elements:
294
+ --elements SPEC atomic-number spec (default: all Z=1-106)
295
+
296
+ placement:
297
+ --cov-scale FLOAT min dist = cov_scale × (r_i + r_j), Pyykkö radii (default: 1.0)
298
+ --relax-cycles INT max cycles for post-placement repulsion relaxation (default: 1500)
299
+ --no-add-hydrogen disable automatic H augmentation
300
+
301
+ sampling:
302
+ --n-samples INT number of structures to attempt (default: 1)
303
+ --seed INT random seed
304
+
305
+ metrics:
306
+ --n-bins INT histogram bins (default: 20)
307
+ --w-atom FLOAT H_atom weight in H_total (default: 0.5)
308
+ --w-spatial FLOAT H_spatial weight in H_total (default: 0.5)
309
+ --cutoff FLOAT distance cutoff Å for Q_l / graph_*
310
+ (default: auto = cov_scale × 1.5 × median(r_i+r_j))
311
+
312
+ filtering:
313
+ --filter METRIC:MIN:MAX repeatable; use - for open bound
314
+
315
+ output:
316
+ --validate check charge/mult against one random composition, then exit
317
+ -o / --output FILE XYZ output file (default: stdout)
318
+ ```
319
+
320
+ ## Notes and Limitations
321
+
322
+ - **Interatomic distances** use Pyykkö (2009) single-bond covalent radii. For Z > 86 (Fr through Sg), same-group proxies are used (e.g. Fr → Cs, U → Nd, Rf → Hf).
323
+ - **Repulsion relaxation** guarantees that no pair falls below `cov_scale × (r_i + r_j)` when it converges. If `[warn] relax_positions did not converge` appears, the structure may contain marginal violations but is still output. Increase `--relax-cycles` if convergence is important.
324
+ - **Auto cutoff** is computed from the element pool before any structures are generated and is fixed for the entire run. If the actual composition drawn per sample is much lighter or heavier than the pool median, the effective neighbour count may still be low or high. Use `--cutoff` to override when needed.
325
+ - **RDF_dev** is a finite-system approximation; treat it as a relative indicator.
326
+ - Charge/mult parity failures are common with large element pools and `mult=1`. Increase `--n-samples` or use `--mult 2` to compensate.
327
+
328
+ ## License
329
+
330
+ MIT License. See [LICENSE](LICENSE).
pasted-0.1.0/README.md ADDED
@@ -0,0 +1,307 @@
1
+ # PASTED
2
+
3
+ **P**ointless **A**tom **ST**ructure with **E**ntropy **D**iagnostics
4
+
5
+ PASTED is a structure fuzzer for quantum chemistry and machine learning potential codes.
6
+
7
+ It is a CLI tool that generates intentionally random, physically meaningless atomic structures and scores their degree of disorder with a suite of structural metrics. It is useful for stress-testing structure optimizers, generating worst-case inputs for quantum chemistry codes, or studying what "maximum chaos" looks like in structural space.
8
+
9
+ ## Features
10
+
11
+ - **Three placement modes** — fully random (`gas`), chain-growth (`chain`), coordination-complex-like (`shell`)
12
+ - **10 disorder metrics** computed per structure, all usable as output filters
13
+ - **Element pool** specified by atomic number (Z = 1–106, H through Sg); composition sampled randomly per structure
14
+ - **Always outputs `--n-atoms` atoms** — placement is unrestricted; Pyykkö covalent radii are enforced by mandatory post-placement repulsion relaxation
15
+ - **Auto-scaled `--cutoff`** — defaults to `cov_scale × 1.5 × median(r_i + r_j)` over the element pool, so graph and Steinhardt metrics are meaningful regardless of which elements are used
16
+ - Charge/multiplicity parity validation before any geometry is generated
17
+ - Multi-structure batch generation with `--n-samples`; per-structure progress on stderr, XYZ on stdout
18
+ - Reproducible runs via `--seed`
19
+
20
+ ## Requirements
21
+
22
+ ```
23
+ Python >= 3.10
24
+ numpy
25
+ scipy
26
+ ```
27
+
28
+ ```bash
29
+ pip install numpy scipy
30
+ ```
31
+
32
+ ## Installation
33
+
34
+ ```bash
35
+ git clone https://github.com/yourname/pasted.git
36
+ cd pasted
37
+ # no build step required; run directly
38
+ python pasted.py --help
39
+ ```
40
+
41
+ ## Quick Start
42
+
43
+ ```bash
44
+ # 10 atoms drawn from H–Zn, placed randomly in a sphere of radius 8 Å
45
+ python pasted.py --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
46
+ --mode gas --region sphere:8
47
+
48
+ # Organic-looking chain structure (C/N/O only)
49
+ python pasted.py --n-atoms 15 --elements 6,7,8 --charge 0 --mult 1 \
50
+ --mode chain --branch-prob 0.4 --n-samples 20 -o organic_junk.xyz
51
+
52
+ # Coordination-complex-like structure with Fe as center
53
+ python pasted.py --n-atoms 12 --elements 6,7,8,26 --charge 0 --mult 1 \
54
+ --mode shell --center-z 26 --coord-range 4:6 --n-samples 10
55
+
56
+ # Generate 100 structures, keep only the most disordered ones
57
+ python pasted.py --n-atoms 12 --elements 1-30 --charge 0 --mult 1 \
58
+ --mode gas --region sphere:9 --n-samples 100 \
59
+ --filter H_total:2.0:- --filter shape_aniso:0.3:- -o disordered.xyz
60
+ ```
61
+
62
+ ## Placement Modes
63
+
64
+ ### `gas` (default)
65
+
66
+ Atoms are placed independently and uniformly at random inside the specified region.
67
+ Closest to true spatial randomness; highest expected `H_spatial`.
68
+
69
+ ```
70
+ --region sphere:R sphere of radius R Å
71
+ --region box:L cube of side L Å
72
+ --region box:LX,LY,LZ orthorhombic box
73
+ ```
74
+
75
+ ### `chain`
76
+
77
+ Atoms grow one by one from a seed atom via a random walk with directional persistence.
78
+ At each step, a random active tip is selected and the new atom is placed at a random bond length.
79
+ The direction of each step is constrained by `--chain-persist` to avoid self-tangling.
80
+ A branching probability controls whether the old tip is kept (branch) or replaced (linear advance).
81
+ Produces elongated, tree-like structures.
82
+
83
+ ```
84
+ --branch-prob FLOAT branching probability (default: 0.3)
85
+ --chain-persist FLOAT directional persistence 0.0–1.0 (default: 0.5)
86
+ 0.0 = fully random (may self-tangle)
87
+ 0.5 = rear 120° cone excluded
88
+ 1.0 = front hemisphere only, nearly straight
89
+ --bond-range LO:HI bond length range in Å (default: 1.2:1.6)
90
+ ```
91
+
92
+ ### `shell`
93
+
94
+ One atom is placed at the origin as the "center", surrounded by a coordination shell at a random radius,
95
+ followed by tail atoms growing from shell members.
96
+ Produces structures that superficially resemble coordination complexes.
97
+
98
+ ```
99
+ --center-z Z atomic number of center atom
100
+ (default: random from the sample's composition)
101
+ --coord-range MIN:MAX coordination number range (default: 4:8)
102
+ --shell-radius LO:HI shell radius range in Å (default: 1.8:2.5)
103
+ --bond-range LO:HI tail bond length range in Å (default: 1.2:1.6)
104
+ ```
105
+
106
+ The center atom and its Z are recorded in the XYZ comment line as `center=Fe(Z=26)`.
107
+
108
+ ## Element Pool
109
+
110
+ ```
111
+ --elements SPEC
112
+ ```
113
+
114
+ Elements are specified by atomic number. Omit to use all supported elements (Z = 1–106).
115
+
116
+ | Syntax | Meaning |
117
+ |--------|---------|
118
+ | `1-30` | Z = 1 through 30 (H to Zn) |
119
+ | `6,7,8` | Z = 6, 7, 8 (C, N, O) |
120
+ | `1-10,26,28` | Z = 1–10 plus Fe(26) and Ni(28) |
121
+ | `72-80` | 5d metals Hf through Hg |
122
+ | *(omitted)* | all Z = 1–106 |
123
+
124
+ For each structure, `--n-atoms` elements are drawn independently and uniformly from this pool.
125
+ The resulting composition varies per sample.
126
+
127
+ If H (Z = 1) is in the pool and the sampled composition contains no hydrogen, a random number of H atoms is automatically appended (approximately `1 + uniform(0,1) × n_atoms × 1.2`). This can be disabled with `--no-add-hydrogen`.
128
+
129
+ ## Charge and Multiplicity
130
+
131
+ `--charge` and `--mult` are required and apply to every generated structure.
132
+ Before placement, PASTED checks two conditions against the randomly sampled composition:
133
+
134
+ 1. Total electron count `N_e = Σ Z − charge > 0`
135
+ 2. Parity: `N_e % 2 == (mult − 1) % 2`
136
+
137
+ Structures that fail either check are logged as `[invalid]` and skipped.
138
+ High-spin vs. low-spin selection is **not enforced**; that is the user's responsibility.
139
+
140
+ Because composition is random, parity failures are common when the element pool contains many odd-Z elements and `mult=1` is specified. Increase `--n-samples` to compensate for the skipped structures, or use `--mult 2` if an odd electron count is acceptable.
141
+
142
+ ## Interatomic Distance Control
143
+
144
+ PASTED enforces a minimum interatomic distance using Pyykkö single-bond covalent radii (Pyykkö & Atsumi, *Chem. Eur. J.* **15**, 186–197, 2009).
145
+
146
+ The threshold for each atom pair (i, j) is:
147
+
148
+ ```
149
+ d_min(i, j) = cov_scale × (r_i + r_j)
150
+ ```
151
+
152
+ - Default `--cov-scale 1.0` = exact sum of covalent radii.
153
+ - Values below 1.0 allow closer contacts; values above 1.0 enforce additional clearance.
154
+ - Z > 86 (Fr through Sg): no single-bond literature values are available. PASTED uses the same-group nearest lighter element as a proxy (e.g. Fr → Cs, U → Nd, Rf → Hf).
155
+
156
+ ### Post-placement repulsion relaxation
157
+
158
+ Placement does **not** check for distance violations — atoms are placed freely in the requested geometry (region/chain/shell). After placement, a mandatory **repulsion relaxation** step resolves all violations iteratively: for each pair below the threshold, both atoms are pushed apart along their connecting vector by half the deficit. This repeats until no violations remain or `--relax-cycles` is exhausted.
159
+
160
+ This design guarantees that `--n-atoms` atoms are always placed, regardless of how crowded the initial configuration is. If relaxation does not converge within `--relax-cycles`, a `[warn]` line is printed to stderr and the structure is output as-is.
161
+
162
+ ## Disorder Metrics
163
+
164
+ All metrics are computed for every structure and embedded in the XYZ comment line.
165
+ All are usable in `--filter`.
166
+
167
+ | Metric | Description | Range |
168
+ |--------|-------------|-------|
169
+ | `H_atom` | Shannon entropy of element composition | 0 (single element) to ln(*k*) |
170
+ | `H_spatial` | Shannon entropy of the pairwise-distance histogram | higher = more uniform distances |
171
+ | `H_total` | Weighted sum: `w_atom · H_atom + w_spatial · H_spatial` | — |
172
+ | `RDF_dev` | RMS deviation of empirical *g*(*r*) from ideal-gas baseline | 0 = perfectly random |
173
+ | `shape_aniso` | Relative shape anisotropy from the gyration tensor | 0 = spherical, 1 = rod-like |
174
+ | `Q4`, `Q6`, `Q8` | Steinhardt bond-orientational order parameters (averaged over atoms) | 0 = disordered |
175
+ | `graph_lcc` | Fraction of atoms in the largest connected component at `--cutoff` | 0–1 |
176
+ | `graph_cc` | Mean clustering coefficient at `--cutoff` | 0–1 |
177
+
178
+ ### Distance cutoff for graph and Steinhardt metrics
179
+
180
+ The `--cutoff` parameter determines which atom pairs are considered "connected" for `graph_lcc`, `graph_cc`, and `Q4/Q6/Q8`. Setting it too small relative to the actual interatomic distances causes these metrics to collapse to zero (no neighbours found); setting it too large makes every atom a neighbour of every other.
181
+
182
+ By default, `--cutoff` is set automatically to:
183
+
184
+ ```
185
+ cutoff = cov_scale × 1.5 × median(r_i + r_j) over all element-pool pairs
186
+ ```
187
+
188
+ This scales with the element pool: light-element pools (e.g. C/N/O) get a cutoff around 2.1 Å; 5d-metal pools get around 3.8 Å. The auto value is printed to stderr at startup:
189
+
190
+ ```
191
+ [cutoff] 2.130 Å (auto: cov_scale=1.0 × 1.5 × median(r_i+r_j)=1.420 Å)
192
+ ```
193
+
194
+ Override with `--cutoff FLOAT` when needed.
195
+
196
+ ### Other metric tuning
197
+
198
+ ```
199
+ --n-bins N histogram bins for H_spatial and RDF_dev (default: 20)
200
+ --w-atom FLOAT weight of H_atom in H_total (default: 0.5)
201
+ --w-spatial FLOAT weight of H_spatial in H_total (default: 0.5)
202
+ ```
203
+
204
+ ## Filtering
205
+
206
+ ```
207
+ --filter METRIC:MIN:MAX
208
+ ```
209
+
210
+ Only structures whose metric falls in [MIN, MAX] are written to output.
211
+ Use `-` for an open bound.
212
+ The flag is repeatable; all conditions must be satisfied simultaneously.
213
+
214
+ ```bash
215
+ # Keep structures with high total entropy
216
+ --filter H_total:2.0:-
217
+
218
+ # Keep elongated structures (rod-like)
219
+ --filter shape_aniso:0.5:-
220
+
221
+ # Keep well-connected chains
222
+ --filter graph_lcc:0.8:- --filter graph_cc:0.4:-
223
+
224
+ # Keep structures with low local order (no accidental crystallinity)
225
+ --filter Q6:-:0.4
226
+ ```
227
+
228
+ ## Output Format
229
+
230
+ Structures are written as a concatenated multi-structure XYZ file.
231
+ Progress and statistics are written to stderr; the XYZ data goes to stdout (or `--output`).
232
+
233
+ ```
234
+ 12
235
+ sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:4,O:4] H_atom=1.0986 H_spatial=2.7812 H_total=1.9399 RDF_dev=3.2451 shape_aniso=0.5123 Q4=0.5210 Q6=0.5880 Q8=0.6014 graph_lcc=1.0000 graph_cc=0.5714
236
+ C 1.234567 -0.987654 2.345678
237
+ N -1.456789 3.210987 -0.123456
238
+ ...
239
+ ```
240
+
241
+ ```bash
242
+ # XYZ to file, progress to terminal
243
+ python pasted.py ... -o out.xyz
244
+
245
+ # pipe XYZ, discard progress
246
+ python pasted.py ... 2>/dev/null | downstream_tool
247
+
248
+ # progress only (dry run to check filter hit rate)
249
+ python pasted.py ... -o /dev/null
250
+ ```
251
+
252
+ ## Full Option Reference
253
+
254
+ ```
255
+ required:
256
+ --n-atoms N number of atoms per structure
257
+ --charge INT total system charge
258
+ --mult INT spin multiplicity 2S+1
259
+
260
+ placement mode:
261
+ --mode {gas,chain,shell}
262
+ --region SPEC [gas] sphere:R | box:L | box:LX,LY,LZ
263
+ --branch-prob FLOAT [chain] branching probability (default: 0.3)
264
+ --chain-persist FLOAT [chain] directional persistence 0.0–1.0 (default: 0.5)
265
+ --bond-range LO:HI [chain/shell] bond length range Å (default: 1.2:1.6)
266
+ --center-z Z [shell] fix center atom by atomic number
267
+ --coord-range MIN:MAX [shell] coordination number range (default: 4:8)
268
+ --shell-radius LO:HI [shell] shell radius range Å (default: 1.8:2.5)
269
+
270
+ elements:
271
+ --elements SPEC atomic-number spec (default: all Z=1-106)
272
+
273
+ placement:
274
+ --cov-scale FLOAT min dist = cov_scale × (r_i + r_j), Pyykkö radii (default: 1.0)
275
+ --relax-cycles INT max cycles for post-placement repulsion relaxation (default: 1500)
276
+ --no-add-hydrogen disable automatic H augmentation
277
+
278
+ sampling:
279
+ --n-samples INT number of structures to attempt (default: 1)
280
+ --seed INT random seed
281
+
282
+ metrics:
283
+ --n-bins INT histogram bins (default: 20)
284
+ --w-atom FLOAT H_atom weight in H_total (default: 0.5)
285
+ --w-spatial FLOAT H_spatial weight in H_total (default: 0.5)
286
+ --cutoff FLOAT distance cutoff Å for Q_l / graph_*
287
+ (default: auto = cov_scale × 1.5 × median(r_i+r_j))
288
+
289
+ filtering:
290
+ --filter METRIC:MIN:MAX repeatable; use - for open bound
291
+
292
+ output:
293
+ --validate check charge/mult against one random composition, then exit
294
+ -o / --output FILE XYZ output file (default: stdout)
295
+ ```
296
+
297
+ ## Notes and Limitations
298
+
299
+ - **Interatomic distances** use Pyykkö (2009) single-bond covalent radii. For Z > 86 (Fr through Sg), same-group proxies are used (e.g. Fr → Cs, U → Nd, Rf → Hf).
300
+ - **Repulsion relaxation** guarantees that no pair falls below `cov_scale × (r_i + r_j)` when it converges. If `[warn] relax_positions did not converge` appears, the structure may contain marginal violations but is still output. Increase `--relax-cycles` if convergence is important.
301
+ - **Auto cutoff** is computed from the element pool before any structures are generated and is fixed for the entire run. If the actual composition drawn per sample is much lighter or heavier than the pool median, the effective neighbour count may still be low or high. Use `--cutoff` to override when needed.
302
+ - **RDF_dev** is a finite-system approximation; treat it as a relative indicator.
303
+ - Charge/mult parity failures are common with large element pools and `mult=1`. Increase `--n-samples` or use `--mult 2` to compensate.
304
+
305
+ ## License
306
+
307
+ MIT License. See [LICENSE](LICENSE).