dvc-utils 0.0.9__tar.gz → 0.1.0__tar.gz
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/PKG-INFO +8 -5
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/README.md +7 -4
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils/cli.py +21 -27
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils.egg-info/PKG-INFO +8 -5
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils.egg-info/SOURCES.txt +1 -0
- dvc-utils-0.1.0/dvc_utils.egg-info/requires.txt +4 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/setup.py +2 -1
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/LICENSE +0 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils/__init__.py +0 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils/path.py +0 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils.egg-info/dependency_links.txt +0 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils.egg-info/entry_points.txt +0 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/dvc_utils.egg-info/top_level.txt +0 -0
- {dvc-utils-0.0.9 → dvc-utils-0.1.0}/setup.cfg +0 -0
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: dvc-utils
-Version: 0.0
+Version: 0.1.0
 Summary: CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first
 Home-page: https://github.com/runsascoded/dvc-utils
 Author: Ryan Williams
@@ -12,6 +12,8 @@ License-File: LICENSE
 # dvc-utils
 Diff [DVC] files, optionally piping through other commands first.
 
+[][PyPI]
+
 <!-- toc -->
 - [Installation](#installation)
 - [Usage](#usage)
@@ -246,7 +248,7 @@ pel "ddcr $r guc h1 spc kq kcr snc 'sdf seds' sort"
 
 <details>
 <summary>
-
+Explanation of aliases
 </summary>
 
 - [`gdno`] (`git diff --name-only`): list files changed in the given commit range and directory
@@ -254,9 +256,9 @@ Aliases used in the pipeline:
 - [`ddcr`] (`dvc-diff -cr`): colorized `diff` output, revision range `$r`
 - [`guc`] (`gunzip -c`): uncompress the `.csv.gz` files
 - [`h1`] (`head -n1`): only examine each file's header line
-- [`spc`] (`tr , $'\n'`):
-- [`kq`] (`tr -d '"'`):
-- [`kcr`] (`tr -d '\r'`):
+- [`spc`] (`tr , $'\n'`): **sp**lit the header line by **c**ommas (so each column name will be on one line, for easier `diff`ing below)
+- [`kq`] (`tr -d '"'`): **k**ill **q**uote characters (in this case, header-column name quoting changed, but I don't care about that)
+- [`kcr`] (`tr -d '\r'`): **k**ill **c**arriage **r**eturns (line endings also changed)
 - [`snc`] (`sed -f 'snake_case.sed'`): snake-case column names
 - [`sdf`] (`sed -f`): execute the `sed` substitution commands defined in the `seds` file above
 - `sort`: sort the column names alphabetically (to identify missing or added columns, ignore rearrangements)
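The alias explanations added in this hunk describe a small header-normalization pipeline. As a rough illustration only, the same steps (`spc`, `kq`, `kcr`, `sort`) can be sketched in Python. The function name `normalize_header` and the sample header lines are hypothetical; `started_at` is a made-up shared column, while the dropped and added field names are the ones the README mentions. The `snc` and `sdf` steps are left out because their `sed` scripts are not shown in this diff.

```python
from typing import List


def normalize_header(header_line: str) -> List[str]:
    """Approximate `spc | kq | kcr | sort` on a single CSV header line."""
    # spc: split the header on commas, one column name per entry
    columns = header_line.split(",")
    # kq: drop double-quote characters
    columns = [c.replace('"', "") for c in columns]
    # kcr: drop carriage returns (and the trailing newline that `head -n1` keeps)
    columns = [c.replace("\r", "").rstrip("\n") for c in columns]
    # sort: order alphabetically so column rearrangements do not show up as changes
    return sorted(columns)


# Hypothetical "before" and "after" header lines
old = normalize_header('"tripduration","started_at","bikeid","birth_year","gender"\r\n')
new = normalize_header("ride_id,rideable_type,started_at\n")

dropped = sorted(set(old) - set(new))
added = sorted(set(new) - set(old))
print("dropped:", dropped)
print("added:", added)
```

Comparing the two normalized lists plays the role of the final `diff`: quoting and line-ending differences disappear, and only genuinely dropped or added columns remain.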
@@ -311,6 +313,7 @@ s3/ctbk/csvs/202003-citibike-tripdata.csv.gz.dvc:
 This helped me see that the data update in question (`c0..c1`) dropped some fields (`bikeid, birth_year`, `gender`, `tripduration`) and added others (`ride_id`, `rideable_type`), for `202001` and later.
 
 [DVC]: https://dvc.org/
+[PyPI]: https://pypi.org/project/dvc-utils/
 [`parquet2json`]: https://github.com/jupiter/parquet2json
 [hudcostreets/nj-crashes]: https://github.com/hudcostreets/nj-crashes
 [Parquet]: https://parquet.apache.org/
@@ -1,6 +1,8 @@
 # dvc-utils
 Diff [DVC] files, optionally piping through other commands first.
 
+[][PyPI]
+
 <!-- toc -->
 - [Installation](#installation)
 - [Usage](#usage)
@@ -235,7 +237,7 @@ pel "ddcr $r guc h1 spc kq kcr snc 'sdf seds' sort"
 
 <details>
 <summary>
-
+Explanation of aliases
 </summary>
 
 - [`gdno`] (`git diff --name-only`): list files changed in the given commit range and directory
@@ -243,9 +245,9 @@ Aliases used in the pipeline:
 - [`ddcr`] (`dvc-diff -cr`): colorized `diff` output, revision range `$r`
 - [`guc`] (`gunzip -c`): uncompress the `.csv.gz` files
 - [`h1`] (`head -n1`): only examine each file's header line
-- [`spc`] (`tr , $'\n'`):
-- [`kq`] (`tr -d '"'`):
-- [`kcr`] (`tr -d '\r'`):
+- [`spc`] (`tr , $'\n'`): **sp**lit the header line by **c**ommas (so each column name will be on one line, for easier `diff`ing below)
+- [`kq`] (`tr -d '"'`): **k**ill **q**uote characters (in this case, header-column name quoting changed, but I don't care about that)
+- [`kcr`] (`tr -d '\r'`): **k**ill **c**arriage **r**eturns (line endings also changed)
 - [`snc`] (`sed -f 'snake_case.sed'`): snake-case column names
 - [`sdf`] (`sed -f`): execute the `sed` substitution commands defined in the `seds` file above
 - `sort`: sort the column names alphabetically (to identify missing or added columns, ignore rearrangements)
@@ -300,6 +302,7 @@ s3/ctbk/csvs/202003-citibike-tripdata.csv.gz.dvc:
 This helped me see that the data update in question (`c0..c1`) dropped some fields (`bikeid, birth_year`, `gender`, `tripduration`) and added others (`ride_id`, `rideable_type`), for `202001` and later.
 
 [DVC]: https://dvc.org/
+[PyPI]: https://pypi.org/project/dvc-utils/
 [`parquet2json`]: https://github.com/jupiter/parquet2json
 [hudcostreets/nj-crashes]: https://github.com/hudcostreets/nj-crashes
 [Parquet]: https://parquet.apache.org/
@@ -4,7 +4,8 @@ from typing import Tuple
 
 import click
 from click import option, argument, group
-from utz import
+from utz import process, err
+from qmdx import join_pipelines
 
 from dvc_utils.path import dvc_paths, dvc_path as dvc_cache_path
 
@@ -62,39 +63,32 @@ def dvc_utils_diff(
         raise ValueError(f"Invalid refspec: {refspec}")
 
     log = err if verbose else False
-
-
-
+    path1 = dvc_cache_path(before, dvc_path, log=log)
+    path2 = path if after is None else dvc_cache_path(after, dvc_path, log=log)
+
+    diff_args = [
+        *(['-w'] if ignore_whitespace else []),
+        *(['-U', str(unified)] if unified is not None else []),
+        *(['--color=always'] if color else []),
+    ]
     if cmds:
         cmd, *sub_cmds = cmds
+        cmds1 = [ f'{cmd} {path1}', *sub_cmds ]
+        cmds2 = [ f'{cmd} {path2}', *sub_cmds ]
         if not shell:
-
-
-
-
-            ]
-
-
-                *sub_cmds,
-            ]
-            shell_kwargs = {}
-        else:
-            before_cmds = [ f'{cmd} {before_path}', *sub_cmds ]
-            after_cmds = [ f'{cmd} {after_path}', *sub_cmds ]
-            shell_kwargs = dict(shell=shell)
-
-        diff_cmds(
-            before_cmds,
-            after_cmds,
+            cmds1 = [ shlex.split(cmd) for cmd in cmds1 ]
+            cmds2 = [ shlex.split(cmd) for cmd in cmds2 ]
+
+        join_pipelines(
+            base_cmd=['diff', *diff_args],
+            cmds1=cmds1,
+            cmds2=cmds2,
             verbose=verbose,
-
-            unified=unified,
-            ignore_whitespace=ignore_whitespace,
+            shell=shell,
             shell_executable=shell_executable,
-            **shell_kwargs,
         )
     else:
-        process.run('diff',
+        process.run('diff', *diff_args, path1, path2, log=log)
 
 
 if __name__ == '__main__':
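To make the restructured logic in `cli.py` easier to follow, here is a minimal standalone sketch of the two pieces this hunk introduces: building the `diff` flags once, and deriving the two per-side command lists. It assumes nothing about `qmdx.join_pipelines` beyond the keyword arguments visible in the call above, and it does not call it. The helper names `build_diff_args` and `build_pipelines` and the sample paths are hypothetical.

```python
import shlex
from typing import List, Optional, Sequence, Tuple


def build_diff_args(
    ignore_whitespace: bool = False,
    unified: Optional[int] = None,
    color: bool = False,
) -> List[str]:
    """Mirror the `diff_args` list from the hunk: each flag is included only if set."""
    return [
        *(['-w'] if ignore_whitespace else []),
        *(['-U', str(unified)] if unified is not None else []),
        *(['--color=always'] if color else []),
    ]


def build_pipelines(
    cmds: Sequence[str], path1: str, path2: str, shell: bool
) -> Tuple[list, list]:
    """Append each side's path to the first command; later commands are shared."""
    cmd, *sub_cmds = cmds
    cmds1 = [f'{cmd} {path1}', *sub_cmds]
    cmds2 = [f'{cmd} {path2}', *sub_cmds]
    if not shell:
        # Without a shell, each command string is tokenized into an argv list
        cmds1 = [shlex.split(c) for c in cmds1]
        cmds2 = [shlex.split(c) for c in cmds2]
    return cmds1, cmds2


# Hypothetical paths standing in for the resolved DVC cache paths
args = build_diff_args(ignore_whitespace=True, unified=3, color=True)
argv1, argv2 = build_pipelines(['gunzip -c', 'head -n1'], 'a.csv.gz', 'b.csv.gz', shell=False)
print(['diff', *args])
print(argv1)
print(argv2)
```

One apparent benefit of the new shape is that the same `diff_args` list feeds both the pipeline branch and the plain `process.run('diff', ...)` branch, so the two code paths cannot drift apart on flag handling.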
@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: dvc-utils
-Version: 0.0
+Version: 0.1.0
 Summary: CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first
 Home-page: https://github.com/runsascoded/dvc-utils
 Author: Ryan Williams
@@ -12,6 +12,8 @@ License-File: LICENSE
 # dvc-utils
 Diff [DVC] files, optionally piping through other commands first.
 
+[][PyPI]
+
 <!-- toc -->
 - [Installation](#installation)
 - [Usage](#usage)
@@ -246,7 +248,7 @@ pel "ddcr $r guc h1 spc kq kcr snc 'sdf seds' sort"
 
 <details>
 <summary>
-
+Explanation of aliases
 </summary>
 
 - [`gdno`] (`git diff --name-only`): list files changed in the given commit range and directory
@@ -254,9 +256,9 @@ Aliases used in the pipeline:
 - [`ddcr`] (`dvc-diff -cr`): colorized `diff` output, revision range `$r`
 - [`guc`] (`gunzip -c`): uncompress the `.csv.gz` files
 - [`h1`] (`head -n1`): only examine each file's header line
-- [`spc`] (`tr , $'\n'`):
-- [`kq`] (`tr -d '"'`):
-- [`kcr`] (`tr -d '\r'`):
+- [`spc`] (`tr , $'\n'`): **sp**lit the header line by **c**ommas (so each column name will be on one line, for easier `diff`ing below)
+- [`kq`] (`tr -d '"'`): **k**ill **q**uote characters (in this case, header-column name quoting changed, but I don't care about that)
+- [`kcr`] (`tr -d '\r'`): **k**ill **c**arriage **r**eturns (line endings also changed)
 - [`snc`] (`sed -f 'snake_case.sed'`): snake-case column names
 - [`sdf`] (`sed -f`): execute the `sed` substitution commands defined in the `seds` file above
 - `sort`: sort the column names alphabetically (to identify missing or added columns, ignore rearrangements)
@@ -311,6 +313,7 @@ s3/ctbk/csvs/202003-citibike-tripdata.csv.gz.dvc:
 This helped me see that the data update in question (`c0..c1`) dropped some fields (`bikeid, birth_year`, `gender`, `tripduration`) and added others (`ride_id`, `rideable_type`), for `202001` and later.
 
 [DVC]: https://dvc.org/
+[PyPI]: https://pypi.org/project/dvc-utils/
 [`parquet2json`]: https://github.com/jupiter/parquet2json
 [hudcostreets/nj-crashes]: https://github.com/hudcostreets/nj-crashes
 [Parquet]: https://parquet.apache.org/
@@ -2,11 +2,12 @@ from setuptools import setup
 
 setup(
     name='dvc-utils',
-    version="0.0
+    version="0.1.0",
     description="CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first",
     long_description=open("README.md").read(),
     long_description_content_type="text/markdown",
     packages=['dvc_utils'],
+    install_requires=open("requirements.txt").read(),
     entry_points={
         'console_scripts': [
             'dvc-utils = dvc_utils.cli:cli',
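The new `install_requires` line passes the raw text of `requirements.txt` straight to `setup()`. A common alternative, sketched below purely as an illustration and not as what the package does, is to turn that text into an explicit list first, skipping blank lines and comment lines. The helper name `parse_requirements` and the sample file contents are hypothetical; the actual contents of `requirements.txt` are not shown in this diff.

```python
from typing import List


def parse_requirements(text: str) -> List[str]:
    """Return one requirement string per non-blank, non-comment line."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and whole-line comments
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


# Hypothetical file contents, using the packages imported in cli.py as examples
sample = "click\nutz\n\n# pipeline helper\nqmdx\n"
print(parse_requirements(sample))
```

An explicit list makes it easier to see what ends up in the built metadata, at the cost of a few extra lines in `setup.py`.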
The remaining 7 files (listed above with +0 -0) are unchanged.