PyPI - pymdkit - Versions diffs - 1.1.2__tar.gz → 1.1.4__tar.gz - Mend

pymdkit 1.1.2.tar.gz → 1.1.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (30)

{pymdkit-1.1.2/src/pymdkit.egg-info → pymdkit-1.1.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pymdkit
-Version: 1.1.2
+Version: 1.1.4
 Summary: A unified command-line toolkit for atomistic / MD structure workflows.
 Author-email: Yueda Wang <ydwang0608@ustc.edu.cn>
 License-Expression: GPL-3.0-or-later
@@ -33,18 +33,19 @@ scripts into each working folder and running `python some_script.py`, you instal
 Every command exposes named `--flags` (no positional guessing), and each underlying
 script is still runnable on its own.
-## Install ("compiling" the executable)
+## Install
-Python isn't compiled to a binary; the equivalent step is installing the package,
-which creates the `pymdkit` command on your `PATH`.
+Create a clean conda environment, activate it, then install `pymdkit` with pip:
 ```bash
+conda create -n pymdkit python=3.10
+conda activate pymdkit
 pip install pymdkit
 ```
-On an HPC cluster, activate your conda env / `module load` first so `pymdkit` lands in
-that environment's `bin`. This installs every dependency (numpy, scipy, ase, pymatgen,
-pyxtal, mp_api, gemmi, tqdm), so all commands work out of the box.
+This installs the `pymdkit` command into the active conda environment, together
+with its dependencies (numpy, scipy, ase, pymatgen, pyxtal, mp_api, gemmi,
+tqdm).
 Verify:
@@ -63,6 +64,7 @@ directory for job sub-folders automatically.
 | Command | What it does |
 |---|---|
 | `add-groups` | Tag atoms with a GPUMD group index by element order |
+| `electrostatic-energy` | Compute CIF electrostatic energy with pymatgen EwaldSummation |
 | `ehull` | Auto-detect VASP job folders and compute E_hull vs Materials Project |
 | `gather-contcar` | Collect CONTCARs from VASP job folders into one folder, renamed `<folder>.vasp` |
 | `msd` | Diffusivity & conductivity from GPUMD MSD jobs (auto-scans `<structure>/<temp>/`) |
@@ -70,6 +72,7 @@ directory for job sub-folders automatically.
 | `rmsd` | Compute RMSD between two structure files, or all pairs in a folder |
 | `select-candidate` | Split a NEP training set into candidate/accurate sets by energy error |
 | `stru2xyz` | Convert structure file(s) of any format to extxyz |
+| `substitute` | Randomly substitute or remove selected atoms/sites from a structure |
 | `supercell` | Build a supercell with cell lengths capped at a maximum (Angstrom); optional per-temperature GPUMD setup |
 | `symmetrize` | Import space-group symmetry into a structure file (or folder) -> CIF |
 | `vasp-relax` | Write VASP relaxation inputs for a structure (or folder); INCAR tags overridable |
@@ -99,6 +102,11 @@ pymdkit gather-contcar -of vasp-opted               # CONTCARs -> vasp-opted/<fo
 pymdkit gather-contcar -of vasp-opted -ehull 0.028  # only structures with E_hull < 0.028 eV/atom
 pymdkit outcar2xyz                                  # scans ./ for OUTCAR folders -> scf-converged.xyz
 pymdkit outcar2xyz --position-only                  # write positions only, without energy/forces/stress
+pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we Na -wn 3 -on 100
+pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we none -on 100
+pymdkit substitute -i Li96Ta6La11Cl72.cif -se Li1 Li2 -sn 20 67 -we none -on 100
+pymdkit electrostatic-energy -i Li3YCl6.cif
+pymdkit electrostatic-energy -if Li3YCl6-all
 pymdkit select-candidate                            # RMSE bands: <low all accurate, >high all candidate, else worst 50%
 pymdkit select-candidate -r 0.8                     # in the middle band, take worst 80% as candidate.xyz
 pymdkit rmsd a.cif b.cif                            # RMSD of two files -> rmsd.txt
@@ -154,8 +162,10 @@ pymdkit/
         |-- add_groups.py
         |-- compute_ehull.py
         |-- compute_rmsd.py
+        |-- electrostatic_energy.py
         |-- outcar2xyz.py
         |-- stru2xyz.py
+        |-- substitute.py
         |-- supercell.py
         |-- vasp_relax.py
         |-- vasp_static.py
@@ -188,8 +198,8 @@ if __name__ == "__main__":          # keeps the script runnable on its own
     raise SystemExit(run(_p.parse_args()))
 ```
-It will appear in `pymdkit --help` automatically — no central registration needed.
-Put heavy imports (pymatgen, ase, …) inside `run()` where practical; the dispatcher
+It will appear in `pymdkit --help` automatically - no central registration needed.
+Put heavy imports (pymatgen, ase, ...) inside `run()` where practical; the dispatcher
 reads each command's name and help without importing it, so `pymdkit --help` stays
 fast and a missing optional dependency only affects the one command that needs it.
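The README states that the dispatcher reads each command's name and help "without importing it". The body of `_discover()` is not part of this diff, so the following is only a sketch of one way to do that, assuming each command module defines top-level string constants `COMMAND` and `HELP` (as `electrostatic_energy.py` and `substitute.py` do). The helper names `read_command_meta` and `discover` are hypothetical.

```python
import ast
from pathlib import Path
from typing import Dict, Optional, Tuple


def read_command_meta(path: Path) -> Optional[Tuple[str, str]]:
    """Return (COMMAND, HELP) from a module's source without executing it.

    Hypothetical helper: the file is parsed with ast and only top-level
    assignments of string literals are inspected, so the module's own
    imports (pymatgen, ase, ...) are never run.
    """
    tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    found = {}
    for node in tree.body:  # top-level statements only
        if not isinstance(node, ast.Assign) or len(node.targets) != 1:
            continue
        target = node.targets[0]
        if (isinstance(target, ast.Name) and target.id in {"COMMAND", "HELP"}
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            found[target.id] = node.value.value
    if "COMMAND" not in found:
        return None  # not a command module
    return found["COMMAND"], found.get("HELP", "")


def discover(commands_dir: Path) -> Dict[str, Tuple[str, str]]:
    """Map command name -> (module stem, help text), skipping _helpers."""
    cmds = {}
    for path in sorted(commands_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue  # shared helpers such as _fileio.py are not commands
        meta = read_command_meta(path)
        if meta is not None:
            cmds[meta[0]] = (path.stem, meta[1])
    return cmds
```

Because nothing is executed, a module whose imports would fail still appears in the listing; the failure only surfaces when that one command is actually run, which matches the behavior the README describes.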

{pymdkit-1.1.2 → pymdkit-1.1.4}/README.md RENAMED

@@ -8,18 +8,19 @@ scripts into each working folder and running `python some_script.py`, you instal
 Every command exposes named `--flags` (no positional guessing), and each underlying
 script is still runnable on its own.
-## Install ("compiling" the executable)
-Python isn't compiled to a binary; the equivalent step is installing the package,
-which creates the `pymdkit` command on your `PATH`.
-```bash
-pip install pymdkit
-```
-On an HPC cluster, activate your conda env / `module load` first so `pymdkit` lands in
-that environment's `bin`. This installs every dependency (numpy, scipy, ase, pymatgen,
-pyxtal, mp_api, gemmi, tqdm), so all commands work out of the box.
+## Install
+Create a clean conda environment, activate it, then install `pymdkit` with pip:
+```bash
+conda create -n pymdkit python=3.10
+conda activate pymdkit
+pip install pymdkit
+```
+This installs the `pymdkit` command into the active conda environment, together
+with its dependencies (numpy, scipy, ase, pymatgen, pyxtal, mp_api, gemmi,
+tqdm).
 Verify:
@@ -38,6 +39,7 @@ directory for job sub-folders automatically.
 | Command | What it does |
 |---|---|
 | `add-groups` | Tag atoms with a GPUMD group index by element order |
+| `electrostatic-energy` | Compute CIF electrostatic energy with pymatgen EwaldSummation |
 | `ehull` | Auto-detect VASP job folders and compute E_hull vs Materials Project |
 | `gather-contcar` | Collect CONTCARs from VASP job folders into one folder, renamed `<folder>.vasp` |
 | `msd` | Diffusivity & conductivity from GPUMD MSD jobs (auto-scans `<structure>/<temp>/`) |
@@ -45,6 +47,7 @@ directory for job sub-folders automatically.
 | `rmsd` | Compute RMSD between two structure files, or all pairs in a folder |
 | `select-candidate` | Split a NEP training set into candidate/accurate sets by energy error |
 | `stru2xyz` | Convert structure file(s) of any format to extxyz |
+| `substitute` | Randomly substitute or remove selected atoms/sites from a structure |
 | `supercell` | Build a supercell with cell lengths capped at a maximum (Angstrom); optional per-temperature GPUMD setup |
 | `symmetrize` | Import space-group symmetry into a structure file (or folder) -> CIF |
 | `vasp-relax` | Write VASP relaxation inputs for a structure (or folder); INCAR tags overridable |
@@ -74,18 +77,23 @@ pymdkit gather-contcar -of vasp-opted               # CONTCARs -> vasp-opted/<fo
 pymdkit gather-contcar -of vasp-opted -ehull 0.028  # only structures with E_hull < 0.028 eV/atom
 pymdkit outcar2xyz                                  # scans ./ for OUTCAR folders -> scf-converged.xyz
 pymdkit outcar2xyz --position-only                  # write positions only, without energy/forces/stress
+pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we Na -wn 3 -on 100
+pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we none -on 100
+pymdkit substitute -i Li96Ta6La11Cl72.cif -se Li1 Li2 -sn 20 67 -we none -on 100
+pymdkit electrostatic-energy -i Li3YCl6.cif
+pymdkit electrostatic-energy -if Li3YCl6-all
 pymdkit select-candidate                            # RMSE bands: <low all accurate, >high all candidate, else worst 50%
 pymdkit select-candidate -r 0.8                     # in the middle band, take worst 80% as candidate.xyz
 pymdkit rmsd a.cif b.cif                            # RMSD of two files -> rmsd.txt
 pymdkit rmsd vasp-opted/                            # all pairs in a folder -> rmsd.txt
-pymdkit symmetrize -i opted.cif --symprec 0.01 --add_oxidation yes -o opted-symm.cif
-pymdkit symmetrize -if my_cifs/ --symprec 0.01 --add_oxidation no -of my_cifs-symm
+pymdkit symmetrize -i opted.cif --symprec 0.01 --add_oxidation yes -o opted-symm.cif
+pymdkit symmetrize -if my_cifs/ --symprec 0.01 --add_oxidation no -of my_cifs-symm
 ```
 VASP input commands (`vasp-relax`, `vasp-static`) always produce **individual**
 jobs (one structure per folder): `-i` writes into the current dir (or `-o`),
 `-if` creates one `./<name>/` folder per structure, and `-it` creates one
-`./frame_N/` folder per trajectory frame - all directly in the current path.
+`./frame_N/` folder per trajectory frame - all directly in the current path.
 They start from sensible default INCAR settings; override them by passing a
 settings file with `-custom-setting FILE`. The file may be a Python-dict block
@@ -101,7 +109,7 @@ custom_settings = {
 `vasp-static -it traj.xyz` (also available on `vasp-relax`) reads a
 multi-structure trajectory and writes one job sub-folder per frame
-(`frame_1/`, `frame_2/`, ..., prefix configurable via `--frame-prefix`). Each
+(`frame_1/`, `frame_2/`, ..., prefix configurable via `--frame-prefix`). Each
 folder also keeps a `frame_N.xyz`, so `Config_type` survives for a later
 `outcar2xyz`.
@@ -110,7 +118,7 @@ Each command's full flag list is in `pymdkit <command> --help`.
 `ehull` auto-detects every sub-folder of the current path that contains a
 `vasprun.xml`, groups them by chemical system (elements ordered by
 electronegativity, e.g. `Li-Y-Cl`), and builds/reuses one `mp_cache_<system>.json`
-per system - so a pure Li-Y-Cl batch yields a single `mp_cache_Li-Y-Cl.json`, while
+per system - so a pure Li-Y-Cl batch yields a single `mp_cache_Li-Y-Cl.json`, while
 a mixed Li-Y-Cl + La-O batch yields both `mp_cache_Li-Y-Cl.json` and
 `mp_cache_La-O.json`. (Formation energy is reported alongside E_hull in
 `ehull.txt`.)
@@ -119,23 +127,25 @@ a mixed Li-Y-Cl + La-O batch yields both `mp_cache_Li-Y-Cl.json` and
 ```
 pymdkit/
-|-- pyproject.toml              # package metadata + the `pymdkit` entry point
+|-- pyproject.toml              # package metadata + the `pymdkit` entry point
 |-- README.md
-`-- src/pymdkit/
+`-- src/pymdkit/
     |-- pymdkit_main.py         # dispatcher: discovers and runs commands
-    `-- commands/               # one module per command
+    `-- commands/               # one module per command
         |-- _fileio.py          # shared -i/-o/-if/-of helper (not a command)
         |-- _vaspset.py         # shared VASP input-set helper (not a command)
         |-- add_groups.py
         |-- compute_ehull.py
         |-- compute_rmsd.py
+        |-- electrostatic_energy.py
         |-- outcar2xyz.py
         |-- stru2xyz.py
+        |-- substitute.py
         |-- supercell.py
         |-- vasp_relax.py
         |-- vasp_static.py
         |-- ...
-        `-- symmetrize.py
+        `-- symmetrize.py
 ```
 Modules whose name starts with `_` are shared helpers and are skipped by the
@@ -163,8 +173,8 @@ if __name__ == "__main__":          # keeps the script runnable on its own
     raise SystemExit(run(_p.parse_args()))
 ```
-It will appear in `pymdkit --help` automatically — no central registration needed.
-Put heavy imports (pymatgen, ase, …) inside `run()` where practical; the dispatcher
+It will appear in `pymdkit --help` automatically - no central registration needed.
+Put heavy imports (pymatgen, ase, ...) inside `run()` where practical; the dispatcher
 reads each command's name and help without importing it, so `pymdkit --help` stays
 fast and a missing optional dependency only affects the one command that needs it.
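The `ehull` paragraph in the README above describes grouping VASP job folders by chemical system, ordering elements by electronegativity, and keeping one `mp_cache_<system>.json` per system. Below is a minimal sketch of only that grouping step. It assumes the element list of each folder is already known. The small hard-coded Pauling table stands in for whatever electronegativity source pymdkit actually uses, which this diff does not show, and `system_key` and `group_by_system` are hypothetical names.

```python
from collections import defaultdict

# Pauling electronegativities for the elements in the README example.
# Illustrative subset only; a real tool would look these up rather than
# hard-code them.
PAULING_X = {"Li": 0.98, "La": 1.10, "Y": 1.22, "Cl": 3.16, "O": 3.44}


def system_key(elements):
    """Chemical-system key: unique elements ordered by electronegativity."""
    ordered = sorted(set(elements), key=lambda el: (PAULING_X[el], el))
    return "-".join(ordered)


def group_by_system(folder_elements):
    """Group folders by chemical system and name one cache file per system."""
    groups = defaultdict(list)
    for folder, elements in folder_elements.items():
        groups[system_key(elements)].append(folder)
    return {
        system: {"folders": sorted(folders), "cache": f"mp_cache_{system}.json"}
        for system, folders in groups.items()
    }
```

With two Li-Y-Cl folders and one La-O folder, this yields exactly two systems, `Li-Y-Cl` and `La-O`, each with its own cache filename, mirroring the mixed-batch example in the README.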

{pymdkit-1.1.2 → pymdkit-1.1.4}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "pymdkit"
-version = "1.1.2"
+version = "1.1.4"
 description = "A unified command-line toolkit for atomistic / MD structure workflows."
 readme = "README.md"
 requires-python = ">=3.9"

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/__init__.py RENAMED

@@ -1,3 +1,3 @@
 """pymdkit -- a unified CLI for atomistic / MD structure workflows."""
-__version__ = "1.1.2"
+__version__ = "1.1.4"

pymdkit-1.1.4/src/pymdkit/commands/electrostatic_energy.py ADDED

@@ -0,0 +1,103 @@
+"""
+Compute electrostatic energy with pymatgen's EwaldSummation.
+Examples:
+    pymdkit electrostatic-energy -i Li3YCl6.cif
+    pymdkit electrostatic-energy -if Li3YCl6-all
+Input files must be CIF files containing oxidation-state information, such as
+an _atom_type_oxidation_number loop.
+"""
+from __future__ import annotations
+import argparse
+from pathlib import Path
+COMMAND = "electrostatic-energy"
+HELP = "Compute CIF electrostatic energy using pymatgen EwaldSummation."
+def read_charged_structure(path):
+    from pymatgen.core import Structure
+    structure = Structure.from_file(str(path))
+    missing = []
+    for site in structure:
+        for specie in site.species:
+            if getattr(specie, "oxi_state", None) is None:
+                missing.append(str(specie))
+    if missing:
+        raise ValueError(
+            "missing oxidation states; CIF must include "
+            "_atom_type_oxidation_number")
+    return structure
+def electrostatic_energy(path):
+    from pymatgen.analysis.ewald import EwaldSummation
+    structure = read_charged_structure(path)
+    return float(EwaldSummation(structure).total_energy)
+def cif_files(folder):
+    folder = Path(folder)
+    if not folder.is_dir():
+        raise SystemExit(f"Error: input folder not found: {folder}")
+    files = sorted(p for p in folder.iterdir()
+                   if p.is_file() and p.suffix.lower() == ".cif")
+    if not files:
+        raise SystemExit(f"Error: no CIF files found in {folder}")
+    return files
+def add_arguments(parser: argparse.ArgumentParser) -> None:
+    parser.add_argument("-i", "--input", help="Single input CIF file.")
+    parser.add_argument("-if", "--input-folder", dest="input_folder",
+                        help="Folder containing input CIF files.")
+    parser.add_argument("-o", "--output", default="electrostatic-energy.txt",
+                        help="Output text filename (default: electrostatic-energy.txt).")
+def run(args) -> int:
+    if bool(args.input) == bool(args.input_folder):
+        print("Error: provide exactly one of -i/--input or -if/--input-folder.")
+        return 1
+    if args.input:
+        input_path = Path(args.input)
+        if input_path.suffix.lower() != ".cif":
+            print("Error: electrostatic-energy requires CIF input.")
+            return 1
+        output_path = input_path.with_name(args.output)
+        files = [input_path]
+    else:
+        input_folder = Path(args.input_folder)
+        output_path = input_folder / args.output
+        files = cif_files(input_folder)
+    results = []
+    for path in files:
+        try:
+            energy = electrostatic_energy(path)
+        except Exception as exc:  # noqa: BLE001 - report per-file failures
+            print(f"{path.name}: error ({exc})")
+            continue
+        results.append((path.name, energy))
+        print(f"{path.name}\t{energy:.3f} eV")
+    if args.input_folder:
+        results.sort(key=lambda item: (item[1], item[0]))
+    lines = [f"{name}\t{energy:.3f} eV" for name, energy in results]
+    output_path.parent.mkdir(parents=True, exist_ok=True)
+    output_path.write_text("\n".join(lines) + ("\n" if lines else ""))
+    print(f"Wrote {len(lines)} result(s) to {output_path}")
+    return 0 if lines else 1
+if __name__ == "__main__":
+    _p = argparse.ArgumentParser(description=__doc__)
+    add_arguments(_p)
+    raise SystemExit(run(_p.parse_args()))
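The new `run()` enforces "exactly one of -i/--input or -if/--input-folder" by hand and returns 1. For comparison only, and not what pymdkit does, argparse can express the same constraint with `add_mutually_exclusive_group(required=True)`. The parser then rejects zero inputs or two inputs itself, printing a usage error and exiting with status 2. `build_parser` is a hypothetical name, and the flags are copied from `add_arguments` above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="electrostatic-energy")
    # Exactly one input source is required; argparse enforces this itself.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-i", "--input", help="Single input CIF file.")
    source.add_argument("-if", "--input-folder", dest="input_folder",
                        help="Folder containing input CIF files.")
    parser.add_argument("-o", "--output", default="electrostatic-energy.txt",
                        help="Output text filename (default: electrostatic-energy.txt).")
    return parser
```

The trade-off is consistency. The hand-written check keeps pymdkit's own `Error: ...` message style and exit code 1, whereas the argparse group produces argparse's standard usage message and exit code 2.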

pymdkit-1.1.4/src/pymdkit/commands/substitute.py ADDED

@@ -0,0 +1,452 @@
+"""
+Randomly substitute or remove selected atoms from a structure.
+Examples:
+    pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we Na -wn 3 -on 100
+    pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we none -on 100
+    pymdkit substitute -i Li96Ta6La11Cl72.cif -se Li1 Li2 -sn 20 67 -we none -on 100
+"""
+from __future__ import annotations
+import argparse
+import itertools
+import math
+import random
+import re
+import shlex
+from pathlib import Path
+COMMAND = "substitute"
+HELP = "Randomly substitute or remove selected atoms from a structure."
+def _clean_token(token):
+    token = token.strip()
+    if len(token) >= 2 and token[0] == token[-1] and token[0] in {"'", '"'}:
+        return token[1:-1]
+    return token
+def _float_token(token, default=None):
+    token = _clean_token(token)
+    if token in {"", ".", "?"}:
+        return default
+    token = token.split("(", 1)[0]
+    try:
+        return float(token)
+    except ValueError:
+        return default
+def _frac_distance(a, b):
+    import numpy as np
+    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
+    diff -= np.round(diff)
+    return float(np.linalg.norm(diff))
+def _tokenize(line):
+    lexer = shlex.shlex(line, posix=True)
+    lexer.whitespace_split = True
+    lexer.commenters = ""
+    return list(lexer)
+def _element_from_label(label):
+    m = re.match(r"[A-Z][a-z]?", label)
+    return m.group(0) if m else label
+def _eval_symop_part(expr, x, y, z):
+    expr = expr.strip().lower()
+    if not re.fullmatch(r"[xyz0-9+\-./ ]+", expr):
+        raise ValueError(f"unsupported symmetry expression: {expr}")
+    return eval(expr, {"__builtins__": {}}, {"x": x, "y": y, "z": z}) % 1.0
+def _apply_symop(op, coords):
+    parts = [part.strip() for part in op.strip().strip("'\"").split(",")]
+    if len(parts) != 3:
+        return coords
+    x, y, z = coords
+    return tuple(_eval_symop_part(part, x, y, z) for part in parts)
+def _dedupe_label_records(records, tol=1e-4):
+    unique = []
+    for rec in records:
+        duplicate = False
+        for other in unique:
+            if (rec["label"] == other["label"]
+                    and rec["symbol"] == other["symbol"]
+                    and _frac_distance(rec["coords"], other["coords"]) <= tol):
+                duplicate = True
+                break
+        if not duplicate:
+            unique.append(rec)
+    return unique
+def read_cif_labels(path):
+    """Return CIF atom-site labels with fractional coordinates, if available."""
+    path = Path(path)
+    if path.suffix.lower() != ".cif":
+        return []
+    try:
+        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
+    except OSError:
+        return []
+    records = []
+    symops = []
+    i = 0
+    while i < len(lines):
+        if lines[i].strip().lower() != "loop_":
+            i += 1
+            continue
+        i += 1
+        headers = []
+        while i < len(lines) and lines[i].strip().startswith("_"):
+            headers.append(lines[i].strip())
+            i += 1
+        if ("_symmetry_equiv_pos_as_xyz" in headers
+                or "_space_group_symop_operation_xyz" in headers):
+            op_i = (headers.index("_symmetry_equiv_pos_as_xyz")
+                    if "_symmetry_equiv_pos_as_xyz" in headers
+                    else headers.index("_space_group_symop_operation_xyz"))
+            while i < len(lines):
+                stripped = lines[i].strip()
+                if (not stripped or stripped.lower() == "loop_"
+                        or stripped.startswith("_")
+                        or stripped.lower().startswith("data_")):
+                    break
+                tokens = _tokenize(stripped)
+                if len(tokens) > op_i:
+                    symops.append(_clean_token(tokens[op_i]))
+                i += 1
+            continue
+        if "_atom_site_label" not in headers:
+            continue
+        def idx(name):
+            return headers.index(name) if name in headers else None
+        label_i = idx("_atom_site_label")
+        sym_i = idx("_atom_site_type_symbol")
+        x_i = idx("_atom_site_fract_x")
+        y_i = idx("_atom_site_fract_y")
+        z_i = idx("_atom_site_fract_z")
+        if None in {label_i, x_i, y_i, z_i}:
+            continue
+        while i < len(lines):
+            stripped = lines[i].strip()
+            if (not stripped or stripped.lower() == "loop_"
+                    or stripped.startswith("_")
+                    or stripped.lower().startswith("data_")):
+                break
+            tokens = _tokenize(stripped)
+            if len(tokens) >= len(headers):
+                label = _clean_token(tokens[label_i])
+                symbol = (_clean_token(tokens[sym_i]) if sym_i is not None
+                          else _element_from_label(label))
+                coords = (
+                    _float_token(tokens[x_i]),
+                    _float_token(tokens[y_i]),
+                    _float_token(tokens[z_i]),
+                )
+                if label and all(v is not None for v in coords):
+                    for expanded in (symops or ["x,y,z"]):
+                        try:
+                            expanded_coords = _apply_symop(expanded, coords)
+                        except Exception:
+                            expanded_coords = coords
+                        records.append({
+                            "label": label,
+                            "symbol": symbol,
+                            "coords": expanded_coords,
+                        })
+            i += 1
+    return _dedupe_label_records(records)
+def read_structure(path):
+    """Read any pymatgen-readable structure, falling back to ASE."""
+    from pymatgen.core import Structure
+    path = Path(path)
+    try:
+        return Structure.from_file(str(path))
+    except Exception:
+        from ase.io import read as ase_read
+        from pymatgen.io.ase import AseAtomsAdaptor
+        atoms = ase_read(str(path))
+        return AseAtomsAdaptor.get_structure(atoms)
+def write_structure(structure, path):
+    """Write output. CIF is the default because it is stable for substitutions."""
+    from pymatgen.io.cif import CifWriter
+    path = Path(path)
+    path.parent.mkdir(parents=True, exist_ok=True)
+    if path.suffix.lower() == ".cif":
+        CifWriter(structure).write_file(str(path))
+    else:
+        structure.to(filename=str(path))
+def site_symbols(structure):
+    return [site.specie.symbol for site in structure]
+def map_cif_labels_to_sites(structure, label_records, tol=1e-3):
+    labels = [None] * len(structure)
+    used = set()
+    for idx, site in enumerate(structure):
+        best_rec = None
+        best_dist = float("inf")
+        for rec_idx, rec in enumerate(label_records):
+            if rec_idx in used or rec["symbol"] != site.specie.symbol:
+                continue
+            dist = _frac_distance(site.frac_coords, rec["coords"])
+            if dist < best_dist:
+                best_dist = dist
+                best_rec = rec_idx
+        if best_rec is not None and best_dist <= tol:
+            used.add(best_rec)
+            labels[idx] = label_records[best_rec]["label"]
+    return labels
+def target_indices(structure, selectors, input_path):
+    symbols = site_symbols(structure)
+    labels = map_cif_labels_to_sites(structure, read_cif_labels(input_path))
+    result = []
+    for selector in selectors:
+        if any(label == selector for label in labels):
+            result.append([i for i, label in enumerate(labels) if label == selector])
+        else:
+            result.append([i for i, sym in enumerate(symbols) if sym == selector])
+    return result
+def choose_combinations(groups, counts, output_number):
+    total = math.prod(math.comb(len(group), count)
+                      for group, count in zip(groups, counts))
+    if total == 0:
+        return [], total
+    if total <= output_number:
+        pools = [itertools.combinations(group, count)
+                 for group, count in zip(groups, counts)]
+        combos = [tuple(sorted(itertools.chain.from_iterable(parts)))
+                  for parts in itertools.product(*pools)]
+        random.shuffle(combos)
+        return combos, total
+    seen = set()
+    max_attempts = max(output_number * 100, 1000)
+    attempts = 0
+    while len(seen) < output_number and attempts < max_attempts:
+        attempts += 1
+        selected = []
+        for group, count in zip(groups, counts):
+            selected.extend(random.sample(group, count))
+        seen.add(tuple(sorted(selected)))
+    if len(seen) < output_number:
+        pools = [itertools.combinations(group, count)
+                 for group, count in zip(groups, counts)]
+        for parts in itertools.product(*pools):
+            seen.add(tuple(sorted(itertools.chain.from_iterable(parts))))
+            if len(seen) >= output_number:
+                break
+    combos = list(seen)
+    random.shuffle(combos)
+    return combos[:output_number], total
+def _multinomial_count(items):
+    counts = {}
+    for item in items:
+        counts[item] = counts.get(item, 0) + 1
+    total = math.factorial(len(items))
+    for count in counts.values():
+        total //= math.factorial(count)
+    return total
+def _unique_permutations(items):
+    return sorted(set(itertools.permutations(items)))
+def _random_replacement_assignment(selected, replacements):
+    repl = list(replacements)
+    random.shuffle(repl)
+    return tuple(sorted(zip(selected, repl)))
+def choose_operations(groups, counts, replacements, output_number):
+    """Return unique remove/replace operations and the total unique count."""
+    combos, selection_total = choose_combinations(groups, counts, output_number)
+    if replacements is None:
+        return [("remove", combo) for combo in combos], selection_total
+    arrangement_total = _multinomial_count(replacements)
+    total = selection_total * arrangement_total
+    if total <= output_number:
+        all_selection_combos, _ = choose_combinations(groups, counts, selection_total)
+        operations = []
+        for combo in all_selection_combos:
+            for assignment in _unique_permutations(replacements):
+                operations.append(("replace", tuple(sorted(zip(combo, assignment)))))
+        random.shuffle(operations)
+        return operations, total
+    seen = set()
+    max_attempts = max(output_number * 100, 1000)
+    attempts = 0
+    while len(seen) < output_number and attempts < max_attempts:
+        attempts += 1
+        selected = []
+        for group, count in zip(groups, counts):
+            selected.extend(random.sample(group, count))
+        selected = tuple(sorted(selected))
+        seen.add(("replace", _random_replacement_assignment(selected, replacements)))
+    if len(seen) < output_number:
+        all_selection_combos, _ = choose_combinations(groups, counts, selection_total)
+        for combo in all_selection_combos:
+            for assignment in _unique_permutations(replacements):
+                seen.add(("replace", tuple(sorted(zip(combo, assignment)))))
+                if len(seen) >= output_number:
+                    break
+            if len(seen) >= output_number:
+                break
+    operations = list(seen)
+    random.shuffle(operations)
+    return operations[:output_number], total
+def replacement_plan(args, selected_count):
+    if len(args.with_element) == 1 and args.with_element[0].lower() == "none":
+        return None
+    if not args.with_number:
+        if len(args.with_element) == 1:
+            return [args.with_element[0]] * selected_count
+        raise SystemExit("Error: provide -wn/--with-number for multiple replacement elements.")
+    if len(args.with_number) != len(args.with_element):
+        raise SystemExit("Error: -wn must have the same length as -we.")
+    if sum(args.with_number) != selected_count:
+        raise SystemExit("Error: sum(-wn) must equal sum(-sn).")
+    replacements = []
+    for element, count in zip(args.with_element, args.with_number):
+        replacements.extend([element] * count)
+    return replacements
+def apply_operation(structure, operation):
+    new_structure = structure.copy()
+    mode, payload = operation
+    if mode == "remove":
+        selected = list(payload)
+        new_structure.remove_sites(sorted(selected, reverse=True))
+        return new_structure
+    for site_idx, element in payload:
+        new_structure.replace(site_idx, element)
+    return new_structure
+def substitute_one(input_path, output_dir, args):
+    structure = read_structure(input_path)
+    groups = target_indices(structure, args.select_element, input_path)
+    for selector, group, count in zip(args.select_element, groups, args.select_number):
+        if len(group) < count:
+            raise SystemExit(
+                f"Error: selector {selector!r} has {len(group)} matching site(s), "
+                f"but -sn requests {count}.")
+    selected_count = sum(args.select_number)
+    replacements = replacement_plan(args, selected_count)
+    operations, total = choose_operations(
+        groups, args.select_number, replacements, args.output_number)
+    if not operations:
+        print(f"No combinations generated for {input_path}.")
+        return 0
+    output_dir.mkdir(parents=True, exist_ok=True)
+    stem = Path(input_path).stem
+    suffix = args.output_format
+    for idx, operation in enumerate(operations, start=1):
+        new_structure = apply_operation(structure, operation)
+        write_structure(new_structure, output_dir / f"{stem}_r{idx}.{suffix}")
+    print(f"{input_path}: total unique combinations = {total}")
+    print(f"{input_path}: wrote {len(operations)} unique structure(s) -> {output_dir}/")
+    return len(operations)
+def add_arguments(parser: argparse.ArgumentParser) -> None:
+    parser.add_argument("-i", "--input", required=True,
+                        help="Input structure file.")
+    parser.add_argument("-se", "--select-element", nargs="+", required=True,
+                        help="Element symbols or CIF atom-site labels to select, e.g. Li or Li1 Li2.")
+    parser.add_argument("-sn", "--select-number", nargs="+", type=int, required=True,
+                        help="Number of selected atoms/sites for each -se selector.")
+    parser.add_argument("-we", "--with-element", nargs="+", required=True,
+                        help="Replacement element(s), or 'none' to remove selected atoms.")
+    parser.add_argument("-wn", "--with-number", nargs="+", type=int,
+                        help="Replacement count(s), required for multiple -we values.")
+    parser.add_argument("-on", "--output-number", type=int, required=True,
+                        help="Number of random substituted/removed structures to output.")
+    parser.add_argument("-o", "--output-folder",
+                        help="Output folder (default: <input-stem>_substitute).")
+    parser.add_argument("--output-format", default="cif",
+                        help="Output file format/extension (default: cif).")
+    parser.add_argument("--seed", type=int,
+                        help="Random seed for reproducible selections.")
+def run(args) -> int:
+    if len(args.select_number) != len(args.select_element):
+        print("Error: -sn must have the same length as -se.")
+        return 1
+    if args.output_number <= 0:
+        print("Error: -on/--output-number must be positive.")
+        return 1
+    if args.seed is not None:
+        random.seed(args.seed)
+    input_path = Path(args.input)
+    if not input_path.is_file():
+        print(f"Input file not found: {input_path}")
+        return 1
+    output_dir = (Path(args.output_folder) if args.output_folder
+                  else input_path.with_name(input_path.stem + "_substitute"))
+    substitute_one(input_path, output_dir, args)
+    return 0
+if __name__ == "__main__":
+    _p = argparse.ArgumentParser(description=__doc__)
+    add_arguments(_p)
+    raise SystemExit(run(_p.parse_args()))
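The `substitute` command reports a total count of unique combinations and then writes up to `-on` structures. Below is a minimal, hypothetical sketch of that selection step. It is not pymdkit's implementation, and it rests on four assumptions:

- Sites are given as `(label, element)` pairs.
- A `-se` selector matches either a CIF atom-site label or an element symbol.
- The selector pools do not overlap.
- "Unique" means plain combinatorial uniqueness, with no symmetry reduction.

The helper names `match_sites` and `sample_selections` are illustrative only.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Site = Tuple[str, str]  # (CIF atom-site label, element symbol); hypothetical representation


def match_sites(sites: Sequence[Site], selector: str) -> List[int]:
    """Indices of sites whose label (e.g. 'Li1') or element (e.g. 'Li') equals the selector."""
    return [i for i, (label, element) in enumerate(sites) if selector in (label, element)]


def sample_selections(
    sites: Sequence[Site],
    selectors: Sequence[str],
    counts: Sequence[int],
    n_out: int,
    seed: Optional[int] = None,
) -> Tuple[int, List[Tuple[int, ...]]]:
    """Return (total combinations, up to n_out distinct random site selections)."""
    if len(selectors) != len(counts):
        raise ValueError("each selector needs exactly one count (-se/-sn length mismatch)")
    pools = [match_sites(sites, s) for s in selectors]
    for sel, pool, k in zip(selectors, pools, counts):
        if k > len(pool):
            raise ValueError(f"{sel}: requested {k} sites but only {len(pool)} match")
    # Assumption: pools are disjoint, so every combination is counted exactly once.
    if len(set().union(*pools)) != sum(len(p) for p in pools):
        raise ValueError("selectors overlap (e.g. 'Li' together with 'Li1')")
    # Ways to choose counts[j] sites from pool j, multiplied over all pools.
    total = math.prod(math.comb(len(p), k) for p, k in zip(pools, counts))
    rng = random.Random(seed)  # local RNG so a seed gives reproducible picks
    seen = set()
    picks: List[Tuple[int, ...]] = []
    while len(picks) < min(n_out, total):
        # Draw k sites from every pool, then canonicalize by sorting the indices.
        choice = tuple(sorted(i for p, k in zip(pools, counts) for i in rng.sample(p, k)))
        if choice not in seen:  # reject combinations that were already drawn
            seen.add(choice)
            picks.append(choice)
    return total, picks


if __name__ == "__main__":
    # Toy cell: three Li sites split over labels Li1/Li2, one Y, six Cl.
    cell = [("Li1", "Li"), ("Li1", "Li"), ("Li2", "Li"), ("Y1", "Y")] + [("Cl1", "Cl")] * 6
    print(sample_selections(cell, ["Li"], [2], n_out=100, seed=0))
    print(sample_selections(cell, ["Li1", "Li2"], [1, 1], n_out=100, seed=0))
```

Rejection sampling stays cheap when `-on` is much smaller than the total. When the two are close, enumerating all combinations with `itertools.combinations` and calling `rng.sample` on that list would be the better choice.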

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/pymdkit_main.py RENAMED

@@ -69,10 +69,10 @@ def _discover() -> Dict[str, Tuple[str, str]]:
 def _build_top_parser(cmds: Dict[str, Tuple[str, str]]) -> argparse.ArgumentParser:
     parser = argparse.ArgumentParser(
-        prog="pymdkit",
-        description="Unified CLI for atomistic / MD structure workflows.",
+        prog=f"pymdkit {__version__}",
+        description="Available commands:",
     )
-    parser.add_argument("--version", action="version", version=f"%(prog)s {__version__}")
+    parser.add_argument("--version", action="version", version=f"pymdkit {__version__}")
     sub = parser.add_subparsers(dest="command", metavar="<command>")
     for name in sorted(cmds):
         sub.add_parser(name, help=cmds[name][1], add_help=False)

{pymdkit-1.1.2 → pymdkit-1.1.4/src/pymdkit.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pymdkit
-Version: 1.1.2
+Version: 1.1.4
 Summary: A unified command-line toolkit for atomistic / MD structure workflows.
 Author-email: Yueda Wang <ydwang0608@ustc.edu.cn>
 License-Expression: GPL-3.0-or-later
@@ -33,18 +33,19 @@ scripts into each working folder and running `python some_script.py`, you instal
 Every command exposes named `--flags` (no positional guessing), and each underlying
 script is still runnable on its own.
-## Install ("compiling" the executable)
+## Install
-Python isn't compiled to a binary; the equivalent step is installing the package,
-which creates the `pymdkit` command on your `PATH`.
+Create a clean conda environment, activate it, then install `pymdkit` with pip:
 ```bash
+conda create -n pymdkit python=3.10
+conda activate pymdkit
 pip install pymdkit
 ```
-On an HPC cluster, activate your conda env / `module load` first so `pymdkit` lands in
-that environment's `bin`. This installs every dependency (numpy, scipy, ase, pymatgen,
-pyxtal, mp_api, gemmi, tqdm), so all commands work out of the box.
+This installs the `pymdkit` command into the active conda environment, together
+with its dependencies (numpy, scipy, ase, pymatgen, pyxtal, mp_api, gemmi,
+tqdm).
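You can confirm that the dependencies listed above resolve in the active environment with `importlib.util.find_spec`, which locates a top-level module without importing it. This is a minimal sketch and not part of pymdkit. The names are import names: the Materials Project client is installed from PyPI as `mp-api` but imported as `mp_api`.

```python
import importlib.util
from typing import Dict, Sequence

# Import names of the dependencies listed in the README.
DEPENDENCIES = ("numpy", "scipy", "ase", "pymatgen", "pyxtal", "mp_api", "gemmi", "tqdm")


def check_dependencies(names: Sequence[str] = DEPENDENCIES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name:10s} {'ok' if ok else 'MISSING'}")
```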
 Verify:
@@ -63,6 +64,7 @@ directory for job sub-folders automatically.
 | Command | What it does |
 |---|---|
 | `add-groups` | Tag atoms with a GPUMD group index by element order |
+| `electrostatic-energy` | Compute CIF electrostatic energy with pymatgen EwaldSummation |
 | `ehull` | Auto-detect VASP job folders and compute E_hull vs Materials Project |
 | `gather-contcar` | Collect CONTCARs from VASP job folders into one folder, renamed `<folder>.vasp` |
 | `msd` | Diffusivity & conductivity from GPUMD MSD jobs (auto-scans `<structure>/<temp>/`) |
@@ -70,6 +72,7 @@ directory for job sub-folders automatically.
 | `rmsd` | Compute RMSD between two structure files, or all pairs in a folder |
 | `select-candidate` | Split a NEP training set into candidate/accurate sets by energy error |
 | `stru2xyz` | Convert structure file(s) of any format to extxyz |
+| `substitute` | Randomly substitute or remove selected atoms/sites from a structure |
 | `supercell` | Build a supercell with cell lengths capped at a maximum (Angstrom); optional per-temperature GPUMD setup |
 | `symmetrize` | Import space-group symmetry into a structure file (or folder) -> CIF |
 | `vasp-relax` | Write VASP relaxation inputs for a structure (or folder); INCAR tags overridable |
@@ -99,6 +102,11 @@ pymdkit gather-contcar -of vasp-opted               # CONTCARs -> vasp-opted/<fo
 pymdkit gather-contcar -of vasp-opted -ehull 0.028  # only structures with E_hull < 0.028 eV/atom
 pymdkit outcar2xyz                                  # scans ./ for OUTCAR folders -> scf-converged.xyz
 pymdkit outcar2xyz --position-only                  # write positions only, without energy/forces/stress
+pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we Na -wn 3 -on 100
+pymdkit substitute -i Li3YCl6.cif -se Li -sn 3 -we none -on 100
+pymdkit substitute -i Li96Ta6La11Cl72.cif -se Li1 Li2 -sn 20 67 -we none -on 100
+pymdkit electrostatic-energy -i Li3YCl6.cif
+pymdkit electrostatic-energy -if Li3YCl6-all
 pymdkit select-candidate                            # RMSE bands: <low all accurate, >high all candidate, else worst 50%
 pymdkit select-candidate -r 0.8                     # in the middle band, take worst 80% as candidate.xyz
 pymdkit rmsd a.cif b.cif                            # RMSD of two files -> rmsd.txt
@@ -154,8 +162,10 @@ pymdkit/
         |-- add_groups.py
         |-- compute_ehull.py
         |-- compute_rmsd.py
+        |-- electrostatic_energy.py
         |-- outcar2xyz.py
         |-- stru2xyz.py
+        |-- substitute.py
         |-- supercell.py
         |-- vasp_relax.py
         |-- vasp_static.py
@@ -188,8 +198,8 @@ if __name__ == "__main__":          # keeps the script runnable on its own
     raise SystemExit(run(_p.parse_args()))
 ```
-It will appear in `pymdkit --help` automatically — no central registration needed.
-Put heavy imports (pymatgen, ase, …) inside `run()` where practical; the dispatcher
+It will appear in `pymdkit --help` automatically - no central registration needed.
+Put heavy imports (pymatgen, ase, ...) inside `run()` where practical; the dispatcher
 reads each command's name and help without importing it, so `pymdkit --help` stays
 fast and a missing optional dependency only affects the one command that needs it.
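The README says the dispatcher reads each command's name and help without importing the module. One way to do that is to parse the source with the standard-library `ast` module. This is a hypothetical sketch, not pymdkit's `_discover()`. It assumes each command file may declare module-level string constants named `COMMAND` and `HELP`. When they are absent, it falls back to the file name and the first docstring line.

```python
import ast
from pathlib import Path
from typing import Dict, Tuple


def read_meta(path: Path) -> Tuple[str, str]:
    """Return (command name, help) for one command file without executing it."""
    tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
    constants: Dict[str, str] = {}
    for node in tree.body:
        # Only simple top-level assignments such as COMMAND = "ehull" are considered.
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id in ("COMMAND", "HELP")):
            try:
                value = ast.literal_eval(node.value)
            except (ValueError, TypeError, SyntaxError):
                continue  # not a plain literal; ignore it
            if isinstance(value, str):
                constants[node.targets[0].id] = value
    doc_lines = (ast.get_docstring(tree) or "").strip().splitlines()
    name = constants.get("COMMAND", path.stem.replace("_", "-"))
    help_text = constants.get("HELP", doc_lines[0] if doc_lines else "")
    return name, help_text


def discover(commands_dir: Path) -> Dict[str, Tuple[str, str]]:
    """Map command name -> (module name, help) for every public *.py file."""
    found: Dict[str, Tuple[str, str]] = {}
    for path in sorted(commands_dir.glob("*.py")):
        if path.name.startswith("_"):  # skip __init__.py and private helpers like _fileio.py
            continue
        name, help_text = read_meta(path)
        found[name] = (path.stem, help_text)
    return found
```

Because the module body is never executed, a heavy or missing import at the top of a command file does not affect the listing. Only running that command would need it.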

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit.egg-info/SOURCES.txt RENAMED

@@ -16,10 +16,12 @@ src/pymdkit/commands/add_groups.py
 src/pymdkit/commands/compute_ehull.py
 src/pymdkit/commands/compute_msd_all_groups.py
 src/pymdkit/commands/compute_rmsd.py
+src/pymdkit/commands/electrostatic_energy.py
 src/pymdkit/commands/gather_contcar.py
 src/pymdkit/commands/outcar2xyz.py
 src/pymdkit/commands/select_candidate.py
 src/pymdkit/commands/stru2xyz.py
+src/pymdkit/commands/substitute.py
 src/pymdkit/commands/supercell.py
 src/pymdkit/commands/symmetrize.py
 src/pymdkit/commands/vasp_relax.py

{pymdkit-1.1.2 → pymdkit-1.1.4}/LICENSE RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/setup.cfg RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/__init__.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/_fileio.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/_vaspset.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/add_groups.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/compute_ehull.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/compute_msd_all_groups.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/compute_rmsd.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/gather_contcar.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/outcar2xyz.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/select_candidate.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/stru2xyz.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/supercell.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/symmetrize.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/vasp_relax.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit/commands/vasp_static.py RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit.egg-info/dependency_links.txt RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit.egg-info/entry_points.txt RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit.egg-info/requires.txt RENAMED

File without changes

{pymdkit-1.1.2 → pymdkit-1.1.4}/src/pymdkit.egg-info/top_level.txt RENAMED

File without changes
