dvc-utils 0.0.9__tar.gz → 0.1.0__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: dvc-utils
- Version: 0.0.9
+ Version: 0.1.0
  Summary: CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first
  Home-page: https://github.com/runsascoded/dvc-utils
  Author: Ryan Williams
@@ -12,6 +12,8 @@ License-File: LICENSE
  # dvc-utils
  Diff [DVC] files, optionally piping through other commands first.

+ [![dvc-utils on PyPI](https://img.shields.io/pypi/v/dvc-utils?label=dvc-utils)][PyPI]
+
  <!-- toc -->
  - [Installation](#installation)
  - [Usage](#usage)
@@ -246,7 +248,7 @@ pel "ddcr $r guc h1 spc kq kcr snc 'sdf seds' sort"

  <details>
  <summary>
- Aliases used in the pipeline:
+ Explanation of aliases
  </summary>

  - [`gdno`] (`git diff --name-only`): list files changed in the given commit range and directory
@@ -254,9 +256,9 @@ Aliases used in the pipeline:
  - [`ddcr`] (`dvc-diff -cr`): colorized `diff` output, revision range `$r`
  - [`guc`] (`gunzip -c`): uncompress the `.csv.gz` files
  - [`h1`] (`head -n1`): only examine each file's header line
- - [`spc`] (`tr , $'\n'`): split the header line by commas (so each column name will be on one line, for easier `diff`ing below)
- - [`kq`] (`tr -d '"'`): kill quote characters (in this case, header-column name quoting changed, but I don't care about that)
- - [`kcr`] (`tr -d '\r'`): kill carriage returns (line endings also changed)
+ - [`spc`] (`tr , $'\n'`): **sp**lit the header line by **c**ommas (so each column name will be on one line, for easier `diff`ing below)
+ - [`kq`] (`tr -d '"'`): **k**ill **q**uote characters (in this case, header-column name quoting changed, but I don't care about that)
+ - [`kcr`] (`tr -d '\r'`): **k**ill **c**arriage **r**eturns (line endings also changed)
  - [`snc`] (`sed -f 'snake_case.sed'`): snake-case column names
  - [`sdf`] (`sed -f`): execute the `sed` substitution commands defined in the `seds` file above
  - `sort`: sort the column names alphabetically (to identify missing or added columns, ignore rearrangements)
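For readers unfamiliar with the aliases, the header-normalizing stages of the pipeline (`guc h1 spc kq kcr sort`) can be approximated in Python. This is only a rough sketch: `normalized_header_columns` is a hypothetical name, the `snc` and `sdf seds` stages are left out because their `sed` scripts do not appear in this diff, and Python's `sorted` orders by code point, which may differ from a locale-aware shell `sort`.

```python
import gzip
import io
from typing import List


def normalized_header_columns(gz_bytes: bytes) -> List[str]:
    """Hypothetical helper approximating `guc h1 spc kq kcr sort` for one .csv.gz payload."""
    # guc + h1: decompress and keep only the first line (binary readline splits on b"\n")
    with gzip.open(io.BytesIO(gz_bytes), "rb") as f:
        header = f.readline().decode("utf-8")
    header = header.rstrip("\n")
    # spc: one column name per entry
    columns = header.split(",")
    # kq + kcr: drop double quotes and carriage returns
    columns = [c.replace('"', "").replace("\r", "") for c in columns]
    # sort: alphabetical order, so column rearrangements do not show up as differences
    return sorted(columns)


if __name__ == "__main__":
    # Made-up sample data, loosely modeled on the column names mentioned in the README
    sample = gzip.compress(b'"tripduration","bikeid","gender"\r\n1,2,3\r\n')
    print(normalized_header_columns(sample))
```

Applying this to the "before" and "after" versions of a file and comparing the two lists gives the same kind of added/dropped-column view that the shell pipeline feeds to `diff`.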
@@ -311,6 +313,7 @@ s3/ctbk/csvs/202003-citibike-tripdata.csv.gz.dvc:
  This helped me see that the data update in question (`c0..c1`) dropped some fields (`bikeid, birth_year`, `gender`, `tripduration`) and added others (`ride_id`, `rideable_type`), for `202001` and later.

  [DVC]: https://dvc.org/
+ [PyPI]: https://pypi.org/project/dvc-utils/
  [`parquet2json`]: https://github.com/jupiter/parquet2json
  [hudcostreets/nj-crashes]: https://github.com/hudcostreets/nj-crashes
  [Parquet]: https://parquet.apache.org/
@@ -1,6 +1,8 @@
  # dvc-utils
  Diff [DVC] files, optionally piping through other commands first.

+ [![dvc-utils on PyPI](https://img.shields.io/pypi/v/dvc-utils?label=dvc-utils)][PyPI]
+
  <!-- toc -->
  - [Installation](#installation)
  - [Usage](#usage)
@@ -235,7 +237,7 @@ pel "ddcr $r guc h1 spc kq kcr snc 'sdf seds' sort"

  <details>
  <summary>
- Aliases used in the pipeline:
+ Explanation of aliases
  </summary>

  - [`gdno`] (`git diff --name-only`): list files changed in the given commit range and directory
@@ -243,9 +245,9 @@ Aliases used in the pipeline:
  - [`ddcr`] (`dvc-diff -cr`): colorized `diff` output, revision range `$r`
  - [`guc`] (`gunzip -c`): uncompress the `.csv.gz` files
  - [`h1`] (`head -n1`): only examine each file's header line
- - [`spc`] (`tr , $'\n'`): split the header line by commas (so each column name will be on one line, for easier `diff`ing below)
- - [`kq`] (`tr -d '"'`): kill quote characters (in this case, header-column name quoting changed, but I don't care about that)
- - [`kcr`] (`tr -d '\r'`): kill carriage returns (line endings also changed)
+ - [`spc`] (`tr , $'\n'`): **sp**lit the header line by **c**ommas (so each column name will be on one line, for easier `diff`ing below)
+ - [`kq`] (`tr -d '"'`): **k**ill **q**uote characters (in this case, header-column name quoting changed, but I don't care about that)
+ - [`kcr`] (`tr -d '\r'`): **k**ill **c**arriage **r**eturns (line endings also changed)
  - [`snc`] (`sed -f 'snake_case.sed'`): snake-case column names
  - [`sdf`] (`sed -f`): execute the `sed` substitution commands defined in the `seds` file above
  - `sort`: sort the column names alphabetically (to identify missing or added columns, ignore rearrangements)
@@ -300,6 +302,7 @@ s3/ctbk/csvs/202003-citibike-tripdata.csv.gz.dvc:
  This helped me see that the data update in question (`c0..c1`) dropped some fields (`bikeid, birth_year`, `gender`, `tripduration`) and added others (`ride_id`, `rideable_type`), for `202001` and later.

  [DVC]: https://dvc.org/
+ [PyPI]: https://pypi.org/project/dvc-utils/
  [`parquet2json`]: https://github.com/jupiter/parquet2json
  [hudcostreets/nj-crashes]: https://github.com/hudcostreets/nj-crashes
  [Parquet]: https://parquet.apache.org/
@@ -4,7 +4,8 @@ from typing import Tuple

  import click
  from click import option, argument, group
- from utz import diff_cmds, process, err
+ from utz import process, err
+ from qmdx import join_pipelines

  from dvc_utils.path import dvc_paths, dvc_path as dvc_cache_path

@@ -62,39 +63,32 @@ def dvc_utils_diff(
          raise ValueError(f"Invalid refspec: {refspec}")

      log = err if verbose else False
-     before_path = dvc_cache_path(before, dvc_path, log=log)
-     after_path = path if after is None else dvc_cache_path(after, dvc_path, log=log)
-
+     path1 = dvc_cache_path(before, dvc_path, log=log)
+     path2 = path if after is None else dvc_cache_path(after, dvc_path, log=log)
+
+     diff_args = [
+         *(['-w'] if ignore_whitespace else []),
+         *(['-U', str(unified)] if unified is not None else []),
+         *(['--color=always'] if color else []),
+     ]
      if cmds:
          cmd, *sub_cmds = cmds
+         cmds1 = [ f'{cmd} {path1}', *sub_cmds ]
+         cmds2 = [ f'{cmd} {path2}', *sub_cmds ]
          if not shell:
-             sub_cmds = [ shlex.split(c) for c in sub_cmds ]
-             before_cmds = [
-                 shlex.split(f'{cmd} {before_path}'),
-                 *sub_cmds,
-             ]
-             after_cmds = [
-                 shlex.split(f'{cmd} {after_path}'),
-                 *sub_cmds,
-             ]
-             shell_kwargs = {}
-         else:
-             before_cmds = [ f'{cmd} {before_path}', *sub_cmds ]
-             after_cmds = [ f'{cmd} {after_path}', *sub_cmds ]
-             shell_kwargs = dict(shell=shell)
-
-         diff_cmds(
-             before_cmds,
-             after_cmds,
+             cmds1 = [ shlex.split(cmd) for cmd in cmds1 ]
+             cmds2 = [ shlex.split(cmd) for cmd in cmds2 ]
+
+         join_pipelines(
+             base_cmd=['diff', *diff_args],
+             cmds1=cmds1,
+             cmds2=cmds2,
              verbose=verbose,
-             color=color,
-             unified=unified,
-             ignore_whitespace=ignore_whitespace,
+             shell=shell,
              shell_executable=shell_executable,
-             **shell_kwargs,
          )
      else:
-         process.run('diff', before_path, after_path, log=log)
+         process.run('diff', *diff_args, path1, path2, log=log)


  if __name__ == '__main__':
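In the hunk above, the `-w`, `-U`, and `--color` handling moves out of the `diff_cmds(...)` keyword arguments and into a single `diff_args` list, which is then also passed to the plain `process.run('diff', ...)` branch (the old call passed only the two paths). A minimal standalone sketch of that flag-building logic follows, using only the standard library. `build_diff_args` and `diff_files` are hypothetical names for illustration and are not part of dvc-utils, utz, or qmdx.

```python
import subprocess
from typing import List, Optional


def build_diff_args(
    ignore_whitespace: bool = False,
    unified: Optional[int] = None,
    color: bool = False,
) -> List[str]:
    """Hypothetical helper mirroring the `diff_args` list built in the hunk above."""
    return [
        *(["-w"] if ignore_whitespace else []),
        # `is not None` (rather than truthiness) keeps an explicit `unified=0`
        *(["-U", str(unified)] if unified is not None else []),
        *(["--color=always"] if color else []),
    ]


def diff_files(path1: str, path2: str, **flags) -> int:
    """Run `diff` on two paths and return its exit code (1 just means the files differ)."""
    cmd = ["diff", *build_diff_args(**flags), path1, path2]
    return subprocess.run(cmd).returncode
```

Building the list once and splatting it into both branches is what keeps the piped and non-piped code paths consistent.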
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: dvc-utils
- Version: 0.0.9
+ Version: 0.1.0
  Summary: CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first
  Home-page: https://github.com/runsascoded/dvc-utils
  Author: Ryan Williams
@@ -12,6 +12,8 @@ License-File: LICENSE
  # dvc-utils
  Diff [DVC] files, optionally piping through other commands first.

+ [![dvc-utils on PyPI](https://img.shields.io/pypi/v/dvc-utils?label=dvc-utils)][PyPI]
+
  <!-- toc -->
  - [Installation](#installation)
  - [Usage](#usage)
@@ -246,7 +248,7 @@ pel "ddcr $r guc h1 spc kq kcr snc 'sdf seds' sort"

  <details>
  <summary>
- Aliases used in the pipeline:
+ Explanation of aliases
  </summary>

  - [`gdno`] (`git diff --name-only`): list files changed in the given commit range and directory
@@ -254,9 +256,9 @@ Aliases used in the pipeline:
  - [`ddcr`] (`dvc-diff -cr`): colorized `diff` output, revision range `$r`
  - [`guc`] (`gunzip -c`): uncompress the `.csv.gz` files
  - [`h1`] (`head -n1`): only examine each file's header line
- - [`spc`] (`tr , $'\n'`): split the header line by commas (so each column name will be on one line, for easier `diff`ing below)
- - [`kq`] (`tr -d '"'`): kill quote characters (in this case, header-column name quoting changed, but I don't care about that)
- - [`kcr`] (`tr -d '\r'`): kill carriage returns (line endings also changed)
+ - [`spc`] (`tr , $'\n'`): **sp**lit the header line by **c**ommas (so each column name will be on one line, for easier `diff`ing below)
+ - [`kq`] (`tr -d '"'`): **k**ill **q**uote characters (in this case, header-column name quoting changed, but I don't care about that)
+ - [`kcr`] (`tr -d '\r'`): **k**ill **c**arriage **r**eturns (line endings also changed)
  - [`snc`] (`sed -f 'snake_case.sed'`): snake-case column names
  - [`sdf`] (`sed -f`): execute the `sed` substitution commands defined in the `seds` file above
  - `sort`: sort the column names alphabetically (to identify missing or added columns, ignore rearrangements)
@@ -311,6 +313,7 @@ s3/ctbk/csvs/202003-citibike-tripdata.csv.gz.dvc:
  This helped me see that the data update in question (`c0..c1`) dropped some fields (`bikeid, birth_year`, `gender`, `tripduration`) and added others (`ride_id`, `rideable_type`), for `202001` and later.

  [DVC]: https://dvc.org/
+ [PyPI]: https://pypi.org/project/dvc-utils/
  [`parquet2json`]: https://github.com/jupiter/parquet2json
  [hudcostreets/nj-crashes]: https://github.com/hudcostreets/nj-crashes
  [Parquet]: https://parquet.apache.org/
@@ -8,4 +8,5 @@ dvc_utils.egg-info/PKG-INFO
  dvc_utils.egg-info/SOURCES.txt
  dvc_utils.egg-info/dependency_links.txt
  dvc_utils.egg-info/entry_points.txt
+ dvc_utils.egg-info/requires.txt
  dvc_utils.egg-info/top_level.txt
@@ -0,0 +1,4 @@
+ click
+ pyyaml
+ qmdx
+ utz>=0.11.3
@@ -2,11 +2,12 @@ from setuptools import setup

  setup(
      name='dvc-utils',
-     version="0.0.9",
+     version="0.1.0",
      description="CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first",
      long_description=open("README.md").read(),
      long_description_content_type="text/markdown",
      packages=['dvc_utils'],
+     install_requires=open("requirements.txt").read(),
      entry_points={
          'console_scripts': [
              'dvc-utils = dvc_utils.cli:cli',
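The `setup.py` change passes the raw text of `requirements.txt` to `install_requires`. A more explicit variant, sketched here as an assumption and not as what the package does, parses that text into a list and skips blank lines and full-line comments. `read_requirements` is a hypothetical helper name; it does not handle inline comments or `-r` includes.

```python
from typing import List


def read_requirements(text: str) -> List[str]:
    """Hypothetical helper: turn requirements.txt text into a list of requirement specifiers."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and full-line comments
        if not line or line.startswith("#"):
            continue
        requirements.append(line)
    return requirements


if __name__ == "__main__":
    # The four entries below are the ones listed in the new requires.txt in this diff
    print(read_requirements("click\npyyaml\nqmdx\nutz>=0.11.3\n"))
```

In a `setup.py`, such a helper would be called as `install_requires=read_requirements(open("requirements.txt").read())`.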
File without changes
File without changes
File without changes