calkit-python 0.5.0__tar.gz → 0.6.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. {calkit_python-0.5.0 → calkit_python-0.6.1}/PKG-INFO +27 -7
  2. {calkit_python-0.5.0 → calkit_python-0.6.1}/README.md +26 -6
  3. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/__init__.py +2 -1
  4. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/main.py +32 -1
  5. calkit_python-0.6.1/calkit/conda.py +162 -0
  6. calkit_python-0.6.1/docs/tutorials/conda-envs.md +64 -0
  7. {calkit_python-0.5.0 → calkit_python-0.6.1}/.github/FUNDING.yml +0 -0
  8. {calkit_python-0.5.0 → calkit_python-0.6.1}/.github/workflows/publish-test.yml +0 -0
  9. {calkit_python-0.5.0 → calkit_python-0.6.1}/.github/workflows/publish.yml +0 -0
  10. {calkit_python-0.5.0 → calkit_python-0.6.1}/.gitignore +0 -0
  11. {calkit_python-0.5.0 → calkit_python-0.6.1}/LICENSE +0 -0
  12. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/__init__.py +0 -0
  13. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/config.py +0 -0
  14. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/core.py +0 -0
  15. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/import_.py +0 -0
  16. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/list.py +0 -0
  17. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/new.py +0 -0
  18. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/notebooks.py +0 -0
  19. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cli/office.py +0 -0
  20. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/cloud.py +0 -0
  21. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/config.py +0 -0
  22. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/core.py +0 -0
  23. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/data.py +0 -0
  24. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/docker.py +0 -0
  25. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/dvc.py +0 -0
  26. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/git.py +0 -0
  27. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/gui.py +0 -0
  28. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/jupyter.py +0 -0
  29. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/models.py +0 -0
  30. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/office.py +0 -0
  31. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/server.py +0 -0
  32. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/__init__.py +0 -0
  33. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/core.py +0 -0
  34. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/__init__.py +0 -0
  35. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/article/paper.tex +0 -0
  36. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/core.py +0 -0
  37. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/jfm/jfm.bst +0 -0
  38. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/jfm/jfm.cls +0 -0
  39. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/jfm/lineno-FLM.sty +0 -0
  40. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/jfm/paper.tex +0 -0
  41. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/templates/latex/jfm/upmath.sty +0 -0
  42. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/__init__.py +0 -0
  43. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/cli/__init__.py +0 -0
  44. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/cli/test_list.py +0 -0
  45. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/cli/test_main.py +0 -0
  46. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/cli/test_new.py +0 -0
  47. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/test_core.py +0 -0
  48. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/test_dvc.py +0 -0
  49. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/test_jupyter.py +0 -0
  50. {calkit_python-0.5.0 → calkit_python-0.6.1}/calkit/tests/test_templates.py +0 -0
  51. {calkit_python-0.5.0 → calkit_python-0.6.1}/docs/tutorials/adding-latex-pub-docker.md +0 -0
  52. {calkit_python-0.5.0 → calkit_python-0.6.1}/docs/tutorials/img/run-proc.png +0 -0
  53. {calkit_python-0.5.0 → calkit_python-0.6.1}/docs/tutorials/procedures.md +0 -0
  54. {calkit_python-0.5.0 → calkit_python-0.6.1}/pyproject.toml +0 -0
@@ -1,6 +1,6 @@
  Metadata-Version: 2.3
  Name: calkit-python
- Version: 0.5.0
+ Version: 0.6.1
  Summary: Reproducibility simplified.
  Project-URL: Homepage, https://github.com/calkit/calkit
  Project-URL: Issues, https://github.com/calkit/calkit/issues
@@ -31,19 +31,39 @@ Description-Content-Type: text/markdown

  # Calkit

- [Calkit](https://calkit.io) helps simplify reproducibility,
- acting as a layer on top of
- [Git](https://git-scm.com/), [DVC](https://dvc.org/),
- [Docker](https://docker.com),
- and adds a domain-specific data model
+ Calkit is a lightweight framework for doing reproducible research.
+ It acts as a top-level layer to integrate and simplify the use of enabling
+ technologies such as
+ [Git](https://git-scm.com/),
+ [DVC](https://dvc.org/),
+ [Conda](https://docs.conda.io/en/latest/),
+ and [Docker](https://docker.com).
+ Calkit also adds a domain-specific data model
  such that all aspects of the research process can be fully described in a
  single repository and therefore easily consumed by others.

+ Our goal is to make reproducibility easier so it becomes more common.
+ To do this, we try to make it easy for users to follow two simple rules:
+
+ 1. **Keep everything in version control.** This includes large files like
+ datasets, enabled by DVC. The [Calkit cloud](https://calkit.io)
+ serves as a simple default DVC remote storage location for those who do not
+ want to manage their own infrastructure.
+ 2. **Generate all important artifacts with a single pipeline.** There should be
+ no special instructions required to reproduce a project's artifacts.
+ It should be as simple as calling `calkit run`.
+ The DVC pipeline (in a project's `dvc.yaml` file) is therefore the main
+ thing to "build" throughout a research project.
+ Calkit provides helper functionality to build pipeline stages that
+ keep computational environments up-to-date and label their outputs for
+ convenient reuse.
+
  ## Tutorials

+ - [Keeping track of conda environments](docs/tutorials/conda-envs.md)
  - [Defining and executing manual procedures](docs/tutorials/procedures.md)
  - [Adding a new LaTeX-based publication with its own Docker build environment](docs/tutorials/adding-latex-pub-docker.md)
- - [A reproducibly workflow using Microsoft Office (Word and Excel)](https://petebachant.me/office-repro/)
+ - [A reproducible workflow using Microsoft Office (Word and Excel)](https://petebachant.me/office-repro/)
  - [Reproducible OpenFOAM simulations](https://petebachant.me/reproducible-openfoam/)

  ## Why does reproducibility matter?
@@ -1,18 +1,38 @@
  # Calkit

- [Calkit](https://calkit.io) helps simplify reproducibility,
- acting as a layer on top of
- [Git](https://git-scm.com/), [DVC](https://dvc.org/),
- [Docker](https://docker.com),
- and adds a domain-specific data model
+ Calkit is a lightweight framework for doing reproducible research.
+ It acts as a top-level layer to integrate and simplify the use of enabling
+ technologies such as
+ [Git](https://git-scm.com/),
+ [DVC](https://dvc.org/),
+ [Conda](https://docs.conda.io/en/latest/),
+ and [Docker](https://docker.com).
+ Calkit also adds a domain-specific data model
  such that all aspects of the research process can be fully described in a
  single repository and therefore easily consumed by others.

+ Our goal is to make reproducibility easier so it becomes more common.
+ To do this, we try to make it easy for users to follow two simple rules:
+
+ 1. **Keep everything in version control.** This includes large files like
+ datasets, enabled by DVC. The [Calkit cloud](https://calkit.io)
+ serves as a simple default DVC remote storage location for those who do not
+ want to manage their own infrastructure.
+ 2. **Generate all important artifacts with a single pipeline.** There should be
+ no special instructions required to reproduce a project's artifacts.
+ It should be as simple as calling `calkit run`.
+ The DVC pipeline (in a project's `dvc.yaml` file) is therefore the main
+ thing to "build" throughout a research project.
+ Calkit provides helper functionality to build pipeline stages that
+ keep computational environments up-to-date and label their outputs for
+ convenient reuse.
+
  ## Tutorials

+ - [Keeping track of conda environments](docs/tutorials/conda-envs.md)
  - [Defining and executing manual procedures](docs/tutorials/procedures.md)
  - [Adding a new LaTeX-based publication with its own Docker build environment](docs/tutorials/adding-latex-pub-docker.md)
- - [A reproducibly workflow using Microsoft Office (Word and Excel)](https://petebachant.me/office-repro/)
+ - [A reproducible workflow using Microsoft Office (Word and Excel)](https://petebachant.me/office-repro/)
  - [Reproducible OpenFOAM simulations](https://petebachant.me/reproducible-openfoam/)

  ## Why does reproducibility matter?
@@ -1,4 +1,4 @@
- __version__ = "0.5.0"
+ __version__ = "0.6.1"

  from .core import *
  from . import git
@@ -9,3 +9,4 @@ from . import config
  from . import models
  from . import office
  from . import templates
+ from . import conda
@@ -553,7 +553,9 @@ def run_in_env(
              typer.echo(f"Running command: {docker_cmd}")
          subprocess.call(docker_cmd, cwd=wdir)
      elif env["kind"] == "conda":
-         cmd = ["conda", "run", "-n", env_name] + cmd
+         with open(env["path"]) as f:
+             conda_env = calkit.ryaml.load(f)
+         cmd = ["conda", "run", "-n", conda_env["name"]] + cmd
          if verbose:
              typer.echo(f"Running command: {cmd}")
          subprocess.call(cmd, cwd=wdir)
@@ -784,3 +786,32 @@ def run_procedure(
          )
          if step.wait_after_s:
              wait(step.wait_after_s)
+
+
+ @app.command(
+     name="check-conda-env",
+     help="Check a conda environment and rebuild if necessary.",
+ )
+ def check_conda_env(
+     env_fpath: Annotated[
+         str,
+         typer.Option(
+             "--file", "-f", help="Path to conda environment YAML file."
+         ),
+     ] = "environment.yml",
+     output_fpath: Annotated[
+         str,
+         typer.Option(
+             "--output",
+             "-o",
+             help=(
+                 "Path to which existing environment should be exported. "
+                 "If not specified, will have the same filename with '-loc' "
+                 "appended to it, keeping the same extension."
+             ),
+         ),
+     ] = None,
+ ):
+     calkit.conda.check_env(
+         env_fpath=env_fpath, output_fpath=output_fpath, log_func=typer.echo
+     )
@@ -0,0 +1,162 @@
+ """Functionality for working with conda environments."""
+
+ import json
+ import os
+ import subprocess
+
+ import calkit
+ from calkit import ryaml
+
+
+ def check_env(
+     env_fpath: str = "environment.yml",
+     use_mamba=True,
+     log_func=None,
+     output_fpath: str = None,
+ ):
+     """Check that a conda environment matches its spec.
+
+     If it doesn't match, recreate it.
+
+     Note that this only works with exact or no version specification.
+     Using greater than and less than operators is not supported.
+     """
+     if log_func is None:
+         log_func = calkit.logger.info
+     log_func(f"Checking conda env defined in {env_fpath}")
+     envs = json.loads(
+         subprocess.check_output(["conda", "env", "list", "--json"]).decode()
+     )["envs"]
+     # Get existing env names, but skip the base environment
+     existing_env_names = [env.split("/")[-1] for env in envs[1:]]
+     with open(env_fpath) as f:
+         env_spec = ryaml.load(f)
+     env_name = env_spec["name"]
+     env_check_fpath = os.path.join(
+         os.path.expanduser("~"),
+         ".calkit",
+         "conda-env-checks",
+         env_name + ".yml",
+     )
+     env_check_dir = os.path.dirname(env_check_fpath)
+     os.makedirs(env_check_dir, exist_ok=True)
+     conda = "mamba" if use_mamba else "conda"
+     # Check if env even exists
+     if env_name not in existing_env_names:
+         log_func(f"Environment {env_name} doesn't exist; creating")
+         # Environment doesn't exist, so create it
+         subprocess.check_call([conda, "env", "create", "-f", env_fpath])
+         env_needs_rebuild = False
+         env_needs_export = True
+     else:
+         env_needs_export = False
+         # Environment does exist, so check it
+         if os.path.isfile(env_check_fpath):
+             # Open up the env check result file
+             with open(env_check_fpath) as f:
+                 env_check = ryaml.load(f)
+             # Check the prefix mtime saved to that file against the actual
+             # prefix mtime
+             # If they match, the environment saved in env_check is still
+             # valid, so we don't need to re-export
+             existing_mtime = env_check["mtime"]
+             current_mtime = os.path.getmtime(env_check["prefix"])
+             env_needs_export = existing_mtime != current_mtime
+         else:
+             env_needs_export = True
+         if env_needs_export:
+             log_func(f"Exporting existing env to {env_check_fpath}")
+             env_check = json.loads(
+                 subprocess.check_output(
+                     [
+                         "conda",
+                         "env",
+                         "export",
+                         "-n",
+                         env_name,
+                         "--no-builds",
+                         "--json",
+                     ]
+                 ).decode()
+             )
+             env_check["mtime"] = os.path.getmtime(env_check["prefix"])
+             with open(env_check_fpath, "w") as f:
+                 ryaml.dump(env_check, f)
+         # Determine if the env matches
+         env_needs_rebuild = False
+         existing_conda_deps = env_check["dependencies"][:-1]
+         existing_pip_deps = env_check["dependencies"][-1]["pip"]
+         required_conda_deps = env_spec["dependencies"][:-1]
+         required_pip_deps = env_spec["dependencies"][-1]["pip"]
+         log_func("Checking conda dependencies")
+         for dep in required_conda_deps:
+             dep_split = dep.split("=")
+             package = dep_split[0]
+             if len(dep_split) > 1:
+                 version = dep_split[1]
+             else:
+                 version = None
+             if version is not None and dep not in existing_conda_deps:
+                 log_func(f"Found missing dependency: {dep}")
+                 env_needs_rebuild = True
+                 break
+             elif version is None:
+                 if package not in [
+                     d.split("=")[0] for d in existing_conda_deps
+                 ]:
+                     log_func(f"Found missing dependency: {dep}")
+                     env_needs_rebuild = True
+                     break
+         if not env_needs_rebuild:
+             log_func("Checking pip dependencies")
+             for dep in required_pip_deps:
+                 dep_split = dep.split("==")
+                 package = dep_split[0]
+                 if len(dep_split) > 1:
+                     version = dep_split[1]
+                 else:
+                     version = None
+                 if version is not None and dep not in existing_pip_deps:
+                     env_needs_rebuild = True
+                     log_func(f"Found missing dependency: {dep}")
+                     break
+                 elif version is None:
+                     if package not in [
+                         d.split("==")[0] for d in existing_pip_deps
+                     ]:
+                         log_func(f"Found missing dependency: {dep}")
+                         env_needs_rebuild = True
+                         break
+     if env_needs_rebuild:
+         log_func(f"Rebuilding {env_name} since it does not match spec")
+         subprocess.check_call([conda, "env", "create", "-y", "-f", env_fpath])
+         env_needs_export = True
+     else:
+         log_func(f"Environment {env_name} matches spec")
+     # If the env was rebuilt, export the env check
+     if env_needs_export:
+         log_func(f"Exporting existing env to {env_check_fpath}")
+         env_check = json.loads(
+             subprocess.check_output(
+                 [
+                     "conda",
+                     "env",
+                     "export",
+                     "-n",
+                     env_name,
+                     "--no-builds",
+                     "--json",
+                 ]
+             ).decode()
+         )
+         env_check["mtime"] = os.path.getmtime(env_check["prefix"])
+         with open(env_check_fpath, "w") as f:
+             ryaml.dump(env_check, f)
+     if output_fpath is None:
+         fname, ext = os.path.splitext(env_fpath)
+         output_fpath = fname + "-lock" + ext
+     log_func(f"Exporting lock file to {output_fpath}")
+     with open(output_fpath, "w") as f:
+         _ = env_check.pop("mtime")
+         _ = env_check.pop("prefix")
+         ryaml.dump(env_check, f)
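The two dependency loops in `check_env` above apply the same rule with different separators (`=` for conda, `==` for pip). The following is a minimal sketch of that rule as a standalone function, written under that reading of the diff and not taken from the package: `find_missing_dep` and its `sep` parameter are hypothetical names introduced here for illustration.

```python
def find_missing_dep(required, installed, sep="="):
    """Return the first required dependency not satisfied, or None.

    Hypothetical helper, not part of calkit. A pinned requirement
    (e.g. "numpy=1.26.4") must appear verbatim in `installed`; an
    unpinned one (e.g. "numpy") only needs a matching package name.
    """
    installed_names = {d.split(sep)[0] for d in installed}
    for dep in required:
        name, _, version = dep.partition(sep)
        if version:
            # Pinned: compare the whole "name<sep>version" string.
            if dep not in installed:
                return dep
        elif name not in installed_names:
            # Unpinned: compare the package name only.
            return dep
    return None


# Conda-style entries use "=", pip-style entries use "==".
exported = ["python=3.12.4", "numpy=1.26.4"]
print(find_missing_dep(["numpy"], exported))
print(find_missing_dep(["numpy=2.0.0"], exported))
```

Because pinned entries are compared verbatim, a spec entry such as `python=3.12` does not match an exported `python=3.12.4` in this sketch, which is in line with the docstring's note that only exact or absent version specifications are supported.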
@@ -0,0 +1,64 @@
+ # Keeping track of conda environments
+
+ It can be difficult to know if a conda environment present on your machine
+ matches one in your project's `environment.yml` file.
+ You may be collaborating with a team on a project and someone adds a
+ dependency, then all of a sudden things won't run on your
+ machine.
+ Or maybe you use multiple machines to run the same project.
+
+ Calkit has a feature to make working with conda environments more
+ reproducible,
+ without needing to rebuild the environment all the time.
+ If you're working on a project with a conda `environment.yml` file,
+ you can simply run:
+
+ ```sh
+ calkit check-conda-env
+ ```
+
+ and the environment on your local machine will be rebuilt if it doesn't
+ match the spec,
+ or it will be created if it doesn't exist.
+ Note that this will delete the existing environment and rebuild from scratch,
+ so make sure you don't have any unsaved changes in there.
+ Also note that for some combinations of `pip` dependencies,
+ it may not be possible to arrive at an environment that matches the spec,
+ so it is recommended to only put the "top-level" dependencies in
+ `environment.yml` rather than a full export.
+
+ We can also add an environment check to our DVC pipeline
+ so if we're running any stages with that environment, we make sure
+ the environment is correct before doing so.
+ For example, we could have the following in `dvc.yaml`:
+
+ ```yaml
+ stages:
+   check-env:
+     cmd: calkit check-conda-env
+     deps:
+       - environment.yml
+     outs:
+       - environment-lock.yml:
+           cache: false
+     always_changed: true
+   run-my-script:
+     cmd: conda run -n my-env python my-script.py
+     deps:
+       - my-script.py
+       - environment-lock.yml
+ ```
+
+ In the example above, we use the `always_changed` option so the conda env
+ will be checked in every call to `calkit run` or `dvc repro`.
+ If the output file `environment-lock.yml` changes,
+ DVC will rerun the `run-my-script` stage.
+ With the pipeline setup this way,
+ our collaborators (or ourselves on other computers)
+ can simply call `calkit run` without needing to think about
+ getting our conda environment into the correct state beforehand.
+
+ Note that this pattern can also be expanded to projects that use multiple
+ conda environments.
+ For example, if an environment spec is saved to `env-2.yml`,
+ we can call `calkit check-conda-env -f env-2.yml`.
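The `environment-lock.yml` output named in the pipeline above follows from how `check_env` in `calkit/conda.py` picks its default output path when `--output` is not given. A minimal sketch of that naming rule, assuming the `os.path.splitext` logic shown in the diff; `default_lock_path` is a hypothetical helper name, not a calkit function.

```python
import os


def default_lock_path(env_fpath):
    """Insert "-lock" before the file extension (hypothetical helper)."""
    fname, ext = os.path.splitext(env_fpath)
    return fname + "-lock" + ext


# The spec file from the tutorial and the multi-environment example.
for spec in ["environment.yml", "env-2.yml"]:
    print(spec, "->", default_lock_path(spec))
```

Under this rule, a second environment checked with `-f env-2.yml` would write `env-2-lock.yml`, so a stage depending on that environment would presumably list `env-2-lock.yml` in its `deps`.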
File without changes
File without changes