dvc-utils 0.0.1__tar.gz → 0.0.2__tar.gz

@@ -0,0 +1,201 @@
+ Metadata-Version: 2.1
+ Name: dvc-utils
+ Version: 0.0.2
+ Summary: CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first
+ Home-page: https://github.com/runsascoded/dvc-utils
+ Author: Ryan Williams
+ Author-email: ryan@runsascoded.com
+ License: MIT
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+
+ # dvc-utils
+ CLI for diffing [DVC] files at two commits (or one commit vs. current worktree), optionally passing both through another command first
+
+ ## Installation
+ ```bash
+ pip install dvc-utils
+ ```
+
+ ## Usage
+ ```bash
+ dvc-utils --help
+ # Usage: dvc-utils [OPTIONS] COMMAND [ARGS]...
+ #
+ # Options:
+ # --help Show this message and exit.
+ #
+ # Commands:
+ # diff Diff a DVC-tracked file at two commits (or one commit vs. current
+ # worktree), optionally passing both through another command first
+ ```
+
+ ### `dvc-utils diff`
+ ```bash
+ dvc-utils diff --help
+ # Usage: dvc-utils diff [OPTIONS] [cmd...] <path>
+ #
+ # Diff a file at two commits (or one commit vs. current worktree), optionally
+ # passing both through `cmd` first
+ #
+ # Examples:
+ #
+ # dvc-utils diff -r HEAD^..HEAD wc -l foo.dvc # Compare the number of lines
+ # (`wc -l`) in `foo` (the file referenced by `foo.dvc`) at the previous vs.
+ # current commit (`HEAD^..HEAD`).
+ #
+ # dvc-utils diff md5sum foo # Diff the `md5sum` of `foo` (".dvc" extension is
+ # optional) at HEAD (last committed value) vs. the current worktree content.
+ #
+ # Options:
+ # -r, --refspec TEXT <commit 1>..<commit 2> (compare two commits) or <commit>
+ # (compare <commit> to the worktree)
+ # -S, --no-shell Don't pass `shell=True` to Python `subprocess`es
+ # -v, --verbose Log intermediate commands to stderr
+ # --help Show this message and exit.
+ ```
+
+ ## Examples
+ See sample commands and output below for inspecting changes to [a DVC-tracked Parquet file][commit path] in [a given commit][commit].
+
+ ```bash
+ git clone https://github.com/neighbor-ryan/nj-crashes
+ commit=c8ae28e
+ path=njdot/data/2001/NewJersey2001Accidents.pqt.dvc
+ ```
+
+ ### Parquet schema diff
+ Use [`parquet2json`] to observe schema changes to a Parquet file, in [a given commit][commit] from [neighbor-ryan/nj-crashes]:
+ ```bash
+ parquet_schema() {
+ parquet2json "$1" schema
+ }
+ export -f parquet_schema
+ dvc-utils diff -r $commit^..$commit parquet_schema $path
+ ```
+ <details><summary>Output</summary>
+
+ ```diff
+ 2d1
+ < OPTIONAL BYTE_ARRAY Year (STRING);
+ 8,10d6
+ < OPTIONAL BYTE_ARRAY Crash Date (STRING);
+ < OPTIONAL BYTE_ARRAY Crash Day Of Week (STRING);
+ < OPTIONAL BYTE_ARRAY Crash Time (STRING);
+ 14,17c10,13
+ < OPTIONAL BYTE_ARRAY Total Killed (STRING);
+ < OPTIONAL BYTE_ARRAY Total Injured (STRING);
+ < OPTIONAL BYTE_ARRAY Pedestrians Killed (STRING);
+ < OPTIONAL BYTE_ARRAY Pedestrians Injured (STRING);
+ ---
+ > OPTIONAL INT64 Total Killed;
+ > OPTIONAL INT64 Total Injured;
+ > OPTIONAL INT64 Pedestrians Killed;
+ > OPTIONAL INT64 Pedestrians Injured;
+ 20,21c16,17
+ < OPTIONAL BYTE_ARRAY Alcohol Involved (STRING);
+ < OPTIONAL BYTE_ARRAY HazMat Involved (STRING);
+ ---
+ > OPTIONAL BOOLEAN Alcohol Involved;
+ > OPTIONAL BOOLEAN HazMat Involved;
+ 23c19
+ < OPTIONAL BYTE_ARRAY Total Vehicles Involved (STRING);
+ ---
+ > OPTIONAL INT64 Total Vehicles Involved;
+ 29c25
+ < OPTIONAL BYTE_ARRAY Mile Post (STRING);
+ ---
+ > OPTIONAL DOUBLE Mile Post;
+ 47,48c43,44
+ < OPTIONAL BYTE_ARRAY Latitude (STRING);
+ < OPTIONAL BYTE_ARRAY Longitude (STRING);
+ ---
+ > OPTIONAL DOUBLE Latitude;
+ > OPTIONAL DOUBLE Longitude;
+ 51a48
+ > OPTIONAL INT64 Date (TIMESTAMP(MICROS,false));
+ ```
+
+ Here we can see that various date/time columns were consolidated, and several stringly-typed columns were converted to ints, floats, and booleans.
+
+ </details>
+
+ ### Parquet row diff
+ Diff the first row of the Parquet file above (pretty-printed as JSON), before and after the given commit:
+
+ ```bash
+ pretty_print_first_row() {
+ parquet2json "$1" cat -l 1 | jq .
+ }
+ export -f pretty_print_first_row
+ dvc-utils diff -r $commit^..$commit pretty_print_first_row $path
+ ```
+
+ <details><summary>Output</summary>
+
+ ```diff
+ 2d1
+ < "Year": "2001",
+ 8,10d6
+ < "Crash Date": "12/21/2001",
+ < "Crash Day Of Week": "F",
+ < "Crash Time": "1834",
+ 14,17c10,13
+ < "Total Killed": "0",
+ < "Total Injured": "0",
+ < "Pedestrians Killed": "0",
+ < "Pedestrians Injured": "0",
+ ---
+ > "Total Killed": 0,
+ > "Total Injured": 0,
+ > "Pedestrians Killed": 0,
+ > "Pedestrians Injured": 0,
+ 20,21c16,17
+ < "Alcohol Involved": "N",
+ < "HazMat Involved": "N",
+ ---
+ > "Alcohol Involved": false,
+ > "HazMat Involved": false,
+ 23c19
+ < "Total Vehicles Involved": "2",
+ ---
+ > "Total Vehicles Involved": 2,
+ 29c25
+ < "Mile Post": "",
+ ---
+ > "Mile Post": null,
+ 47,48c43,44
+ < "Latitude": "",
+ < "Longitude": "",
+ ---
+ > "Latitude": null,
+ > "Longitude": null,
+ 51c47,48
+ < "Reporting Badge No.": "830"
+ ---
+ > "Reporting Badge No.": "830",
+ > "Date": "2001-12-21 18:34:00 +00:00"
+ ```
+
+ This reflects the schema changes above.
+
+ </details>
+
+ ### Parquet row count diff
+ ```bash
+ parquet_row_count() {
+ parquet2json "$1" rowcount
+ }
+ export -f parquet_row_count
+ dvc-utils diff -r $commit^..$commit parquet_row_count $path
+ ```
+
+ This time we get no output; [the given `$commit`][commit] didn't change the row count in the DVC-tracked Parquet file [`$path`][commit path].
+
+ [DVC]: https://dvc.org/
+ [`parquet2json`]: https://github.com/jupiter/parquet2json
+ [neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
+ [Parquet]: https://parquet.apache.org/
+ [commit]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7
+ [commit path]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7#diff-7f812dce61e0996354f4af414203e0933ccdfe9613cb406c40c1c41a14b9769c
+ [neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
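
The `-r/--refspec` behaviour described in the help text above (two commits, one commit vs. the worktree, or `HEAD` vs. the worktree when no refspec is given) and the "pass both sides through `cmd` first" step can be sketched in Python. This is a minimal illustration, not the package's actual code: `parse_refspec`, `run_on`, and `diff_through` are hypothetical names, and the sketch emits a unified diff via `difflib` rather than the classic `diff` format shown in the README output.

```python
import difflib
import shlex
import subprocess
from typing import List, Optional, Tuple


def parse_refspec(refspec: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (before, after); `after` is None when comparing to the worktree."""
    if not refspec:
        # No -r given: per the help text, compare HEAD to the worktree.
        return "HEAD", None
    before, sep, after = refspec.partition("..")
    if not sep:
        # A single commit: compare it to the worktree.
        return before, None
    return before, after


def run_on(cmd: str, path: str) -> List[str]:
    """Run `cmd <path>` through the shell and return its stdout as lines."""
    result = subprocess.run(
        f"{cmd} {shlex.quote(path)}",
        shell=True, check=True, capture_output=True, text=True,
    )
    return result.stdout.splitlines(keepends=True)


def diff_through(cmd: str, before_path: str, after_path: str) -> str:
    """Unified diff of `cmd`'s output on two files (empty string if equal)."""
    return "".join(difflib.unified_diff(
        run_on(cmd, before_path), run_on(cmd, after_path),
        fromfile=before_path, tofile=after_path,
    ))
```

Running the command through the shell mirrors the documented default (`-S/--no-shell` turns it off), which is also why exported shell functions such as `parquet_schema` can be used as `cmd` in the README examples.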
@@ -45,14 +45,21 @@ dvc-utils diff --help
  ```
 
  ## Examples
- Use [`parquet2json`] to observe schema changes to a Parquet file, in a given commit from [neighbor-ryan/nj-crashes](https://github.com/neighbor-ryan/nj-crashes):
+ See sample commands and output below for inspecting changes to [a DVC-tracked Parquet file][commit path] in [a given commit][commit].
+
+ ```bash
+ git clone https://github.com/neighbor-ryan/nj-crashes
+ commit=c8ae28e
+ path=njdot/data/2001/NewJersey2001Accidents.pqt.dvc
+ ```
+
+ ### Parquet schema diff
+ Use [`parquet2json`] to observe schema changes to a Parquet file, in [a given commit][commit] from [neighbor-ryan/nj-crashes]:
  ```bash
  parquet_schema() {
  parquet2json "$1" schema
  }
  export -f parquet_schema
- commit=7fa6a07
- path=njdot/data/2001/NewJersey2001Accidents.pqt.dvc
  dvc-utils diff -r $commit^..$commit parquet_schema $path
  ```
  <details><summary>Output</summary>
@@ -97,8 +104,12 @@ dvc-utils diff -r $commit^..$commit parquet_schema $path
  51a48
  > OPTIONAL INT64 Date (TIMESTAMP(MICROS,false));
  ```
+
+ Here we can see that various date/time columns were consolidated, and several stringly-typed columns were converted to ints, floats, and booleans.
+
  </details>
 
+ ### Parquet row diff
  Diff the first row of the Parquet file above (pretty-printed as JSON), before and after the given commit:
 
  ```bash
@@ -154,10 +165,26 @@ dvc-utils diff -r $commit^..$commit pretty_print_first_row $path
  > "Reporting Badge No.": "830",
  > "Date": "2001-12-21 18:34:00 +00:00"
  ```
+
+ This reflects the schema changes above.
+
  </details>
 
+ ### Parquet row count diff
+ ```bash
+ parquet_row_count() {
+ parquet2json "$1" rowcount
+ }
+ export -f parquet_row_count
+ dvc-utils diff -r $commit^..$commit parquet_row_count $path
+ ```
 
+ This time we get no output; [the given `$commit`][commit] didn't change the row count in the DVC-tracked Parquet file [`$path`][commit path].
 
  [DVC]: https://dvc.org/
  [`parquet2json`]: https://github.com/jupiter/parquet2json
  [neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
+ [Parquet]: https://parquet.apache.org/
+ [commit]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7
+ [commit path]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7#diff-7f812dce61e0996354f4af414203e0933ccdfe9613cb406c40c1c41a14b9769c
+ [neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
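
The README refers to `foo` as "the file referenced by `foo.dvc`". A rough sketch of how such a pointer file could be resolved to a local cache file follows. It assumes the common single-file `.dvc` layout (an `outs` entry carrying an `md5` field) and DVC's content-addressed cache directory (`files/md5/<2 hex chars>/<30 hex chars>` under the cache root in DVC 3.x, directly under the cache root in earlier versions). `md5_from_dvc_text` and `cache_path` are hypothetical helper names; nothing here is taken from the dvc-utils source.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a line like "- md5: <32 hex chars>" in a single-file .dvc pointer.
# Directory outputs (md5 values ending in ".dir") are deliberately not handled.
MD5_RE = re.compile(r"^\s*-?\s*md5:\s*([0-9a-f]{32})\s*$", re.MULTILINE)


def md5_from_dvc_text(text: str) -> Optional[str]:
    """Extract the first output's md5 from the text of a .dvc file, if any."""
    match = MD5_RE.search(text)
    return match.group(1) if match else None


def cache_path(md5: str, cache_dir: str = ".dvc/cache", dvc3_layout: bool = True) -> Path:
    """Map an md5 to its assumed location inside the local DVC cache."""
    root = Path(cache_dir)
    if dvc3_layout:
        # Assumed DVC 3.x layout: <cache>/files/md5/<first 2>/<remaining 30>
        root = root / "files" / "md5"
    return root / md5[:2] / md5[2:]
```

Under these assumptions, the pointer text for a given commit could come from `git show <commit>:<path>`, and the resulting cache path is what a command such as `parquet2json` or `wc -l` would be run against.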
@@ -8,5 +8,4 @@ dvc_utils.egg-info/PKG-INFO
  dvc_utils.egg-info/SOURCES.txt
  dvc_utils.egg-info/dependency_links.txt
  dvc_utils.egg-info/entry_points.txt
- dvc_utils.egg-info/requires.txt
  dvc_utils.egg-info/top_level.txt
@@ -0,0 +1,20 @@
+ from setuptools import setup
+
+ setup(
+     name='dvc-utils',
+     version="0.0.2",
+     description="CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first",
+     long_description=open("README.md").read(),
+     long_description_content_type="text/markdown",
+     packages=['dvc_utils'],
+     entry_points={
+         'console_scripts': [
+             'dvc-utils = dvc_utils.main:cli',
+         ],
+     },
+     license="MIT",
+     author="Ryan Williams",
+     author_email="ryan@runsascoded.com",
+     author_url="https://github.com/ryan-williams",
+     url="https://github.com/runsascoded/dvc-utils",
+ )
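
The 0.0.2 `setup.py` above passes no `install_requires`, whereas the 0.0.1 version read it from `requirements.txt` and the removed `requires.txt` listed `click`, `pyyaml`, and `utz`. The diff does not say whether dropping the dependency declaration was intentional. If it were to be restored, one hedged sketch is a small helper that reads requirement lines while skipping blanks and comments. `read_requirements` is a hypothetical name, and the presence of a `requirements.txt` in the source tree is an assumption.

```python
from pathlib import Path
from typing import List


def read_requirements(path: str = "requirements.txt") -> List[str]:
    """Return requirement specifiers from `path`, skipping blanks and comments."""
    requirements = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements
```

The result could then be passed as `install_requires=read_requirements()` in the `setup(...)` call.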
dvc-utils-0.0.1/PKG-INFO DELETED
@@ -1,4 +0,0 @@
- Metadata-Version: 2.1
- Name: dvc-utils
- Version: 0.0.1
- License-File: LICENSE
@@ -1,3 +0,0 @@
- click
- pyyaml
- utz
dvc-utils-0.0.1/setup.py DELETED
@@ -1,13 +0,0 @@
- from setuptools import setup
-
- setup(
-     name='dvc-utils',
-     version="0.0.1",
-     install_requires=open("requirements.txt").readlines(),
-     packages=['dvc_utils'],
-     entry_points={
-         'console_scripts': [
-             'dvc-utils = dvc_utils.main:cli',
-         ],
-     },
- )
File without changes