dvc-utils 0.0.1__tar.gz → 0.0.2__tar.gz
- dvc-utils-0.0.2/PKG-INFO +201 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/README.md +30 -3
- dvc-utils-0.0.2/dvc_utils.egg-info/PKG-INFO +201 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/SOURCES.txt +0 -1
- dvc-utils-0.0.2/setup.py +20 -0
- dvc-utils-0.0.1/PKG-INFO +0 -4
- dvc-utils-0.0.1/dvc_utils.egg-info/PKG-INFO +0 -4
- dvc-utils-0.0.1/dvc_utils.egg-info/requires.txt +0 -3
- dvc-utils-0.0.1/setup.py +0 -13
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/LICENSE +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils/__init__.py +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils/main.py +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils/named_pipes.py +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/dependency_links.txt +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/entry_points.txt +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/top_level.txt +0 -0
- {dvc-utils-0.0.1 → dvc-utils-0.0.2}/setup.cfg +0 -0
dvc-utils-0.0.2/PKG-INFO
ADDED
@@ -0,0 +1,201 @@
+Metadata-Version: 2.1
+Name: dvc-utils
+Version: 0.0.2
+Summary: CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first
+Home-page: https://github.com/runsascoded/dvc-utils
+Author: Ryan Williams
+Author-email: ryan@runsascoded.com
+License: MIT
+Description-Content-Type: text/markdown
+License-File: LICENSE
+
+# dvc-utils
+CLI for diffing [DVC] files at two commits (or one commit vs. current worktree), optionally passing both through another command first
+
+## Installation
+```bash
+pip install dvc-utils
+```
+
+## Usage
+```bash
+dvc-utils --help
+# Usage: dvc-utils [OPTIONS] COMMAND [ARGS]...
+#
+# Options:
+# --help Show this message and exit.
+#
+# Commands:
+# diff Diff a DVC-tracked file at two commits (or one commit vs. current
+# worktree), optionally passing both through another command first
+```
+
+### `dvc-utils diff`
+```bash
+dvc-utils diff --help
+# Usage: dvc-utils diff [OPTIONS] [cmd...] <path>
+#
+# Diff a file at two commits (or one commit vs. current worktree), optionally
+# passing both through `cmd` first
+#
+# Examples:
+#
+# dvc-utils diff -r HEAD^..HEAD wc -l foo.dvc # Compare the number of lines
+# (`wc -l`) in `foo` (the file referenced by `foo.dvc`) at the previous vs.
+# current commit (`HEAD^..HEAD`).
+#
+# dvc-utils diff md5sum foo # Diff the `md5sum` of `foo` (".dvc" extension is
+# optional) at HEAD (last committed value) vs. the current worktree content.
+#
+# Options:
+# -r, --refspec TEXT <commit 1>..<commit 2> (compare two commits) or <commit>
+# (compare <commit> to the worktree)
+# -S, --no-shell Don't pass `shell=True` to Python `subprocess`es
+# -v, --verbose Log intermediate commands to stderr
+# --help Show this message and exit.
+```
+
+## Examples
+See sample commands and output below for inspecting changes to [a DVC-tracked Parquet file][commit path] in [a given commit][commit].
+
+```bash
+git clone https://github.com/neighbor-ryan/nj-crashes
+commit=c8ae28e
+path=njdot/data/2001/NewJersey2001Accidents.pqt.dvc
+```
+
+### Parquet schema diff
+Use [`parquet2json`] to observe schema changes to a Parquet file, in [a given commit][commit] from [neighbor-ryan/nj-crashes]:
+```bash
+parquet_schema() {
+parquet2json "$1" schema
+}
+export -f parquet_schema
+dvc-utils diff -r $commit^..$commit parquet_schema $path
+```
+<details><summary>Output</summary>
+
+```diff
+2d1
+< OPTIONAL BYTE_ARRAY Year (STRING);
+8,10d6
+< OPTIONAL BYTE_ARRAY Crash Date (STRING);
+< OPTIONAL BYTE_ARRAY Crash Day Of Week (STRING);
+< OPTIONAL BYTE_ARRAY Crash Time (STRING);
+14,17c10,13
+< OPTIONAL BYTE_ARRAY Total Killed (STRING);
+< OPTIONAL BYTE_ARRAY Total Injured (STRING);
+< OPTIONAL BYTE_ARRAY Pedestrians Killed (STRING);
+< OPTIONAL BYTE_ARRAY Pedestrians Injured (STRING);
+---
+> OPTIONAL INT64 Total Killed;
+> OPTIONAL INT64 Total Injured;
+> OPTIONAL INT64 Pedestrians Killed;
+> OPTIONAL INT64 Pedestrians Injured;
+20,21c16,17
+< OPTIONAL BYTE_ARRAY Alcohol Involved (STRING);
+< OPTIONAL BYTE_ARRAY HazMat Involved (STRING);
+---
+> OPTIONAL BOOLEAN Alcohol Involved;
+> OPTIONAL BOOLEAN HazMat Involved;
+23c19
+< OPTIONAL BYTE_ARRAY Total Vehicles Involved (STRING);
+---
+> OPTIONAL INT64 Total Vehicles Involved;
+29c25
+< OPTIONAL BYTE_ARRAY Mile Post (STRING);
+---
+> OPTIONAL DOUBLE Mile Post;
+47,48c43,44
+< OPTIONAL BYTE_ARRAY Latitude (STRING);
+< OPTIONAL BYTE_ARRAY Longitude (STRING);
+---
+> OPTIONAL DOUBLE Latitude;
+> OPTIONAL DOUBLE Longitude;
+51a48
+> OPTIONAL INT64 Date (TIMESTAMP(MICROS,false));
+```
+
+Here we can see that various date/time columns were consolidated, and several stringly-typed columns were converted to ints, floats, and booleans.
+
+</details>
+
+### Parquet row diff
+Diff the first row of the Parquet file above (pretty-printed as JSON), before and after the given commit:
+
+```bash
+pretty_print_first_row() {
+parquet2json "$1" cat -l 1 | jq .
+}
+export -f pretty_print_first_row
+dvc-utils diff -r $commit^..$commit pretty_print_first_row $path
+```
+
+<details><summary>Output</summary>
+
+```diff
+2d1
+< "Year": "2001",
+8,10d6
+< "Crash Date": "12/21/2001",
+< "Crash Day Of Week": "F",
+< "Crash Time": "1834",
+14,17c10,13
+< "Total Killed": "0",
+< "Total Injured": "0",
+< "Pedestrians Killed": "0",
+< "Pedestrians Injured": "0",
+---
+> "Total Killed": 0,
+> "Total Injured": 0,
+> "Pedestrians Killed": 0,
+> "Pedestrians Injured": 0,
+20,21c16,17
+< "Alcohol Involved": "N",
+< "HazMat Involved": "N",
+---
+> "Alcohol Involved": false,
+> "HazMat Involved": false,
+23c19
+< "Total Vehicles Involved": "2",
+---
+> "Total Vehicles Involved": 2,
+29c25
+< "Mile Post": "",
+---
+> "Mile Post": null,
+47,48c43,44
+< "Latitude": "",
+< "Longitude": "",
+---
+> "Latitude": null,
+> "Longitude": null,
+51c47,48
+< "Reporting Badge No.": "830"
+---
+> "Reporting Badge No.": "830",
+> "Date": "2001-12-21 18:34:00 +00:00"
+```
+
+This reflects the schema changes above.
+
+</details>
+
+### Parquet row count diff
+```bash
+parquet_row_count() {
+parquet2json "$1" rowcount
+}
+export -f parquet_row_count
+dvc-utils diff -r $commit^..$commit parquet_row_count $path
+```
+
+This time we get no output; [the given `$commit`][commit] didn't change the row count in the DVC-tracked Parquet file [`$path`][commit path].
+
+[DVC]: https://dvc.org/
+[`parquet2json`]: https://github.com/jupiter/parquet2json
+[neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
+[Parquet]: https://parquet.apache.org/
+[commit]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7
+[commit path]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7#diff-7f812dce61e0996354f4af414203e0933ccdfe9613cb406c40c1c41a14b9769c
+[neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
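The `-r, --refspec` help text in the PKG-INFO above describes two accepted forms: `<commit 1>..<commit 2>`, or a bare `<commit>` that is compared to the worktree. The `md5sum foo` help example also suggests that HEAD is compared to the worktree when no refspec is given. A minimal sketch of that parsing follows. `parse_refspec` is a hypothetical helper written for illustration, not code taken from `dvc_utils/main.py`, and it rejects three-dot ranges rather than interpreting them.

```python
from typing import Optional, Tuple


def parse_refspec(refspec: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Return (before, after); `after` is None when comparing to the worktree."""
    if not refspec:
        # Assumption drawn from the help examples: no refspec means HEAD vs. worktree.
        return "HEAD", None
    before, sep, after = refspec.partition("..")
    if not sep:
        # A bare commit is compared against the worktree.
        return refspec, None
    if not before or not after or after.startswith("."):
        raise ValueError(f"expected '<commit 1>..<commit 2>', got {refspec!r}")
    return before, after


print(parse_refspec("HEAD^..HEAD"))  # ('HEAD^', 'HEAD')
print(parse_refspec("c8ae28e"))      # ('c8ae28e', None)
```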
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/README.md
@@ -45,14 +45,21 @@ dvc-utils diff --help
 ```
 
 ## Examples
-
+See sample commands and output below for inspecting changes to [a DVC-tracked Parquet file][commit path] in [a given commit][commit].
+
+```bash
+git clone https://github.com/neighbor-ryan/nj-crashes
+commit=c8ae28e
+path=njdot/data/2001/NewJersey2001Accidents.pqt.dvc
+```
+
+### Parquet schema diff
+Use [`parquet2json`] to observe schema changes to a Parquet file, in [a given commit][commit] from [neighbor-ryan/nj-crashes]:
 ```bash
 parquet_schema() {
 parquet2json "$1" schema
 }
 export -f parquet_schema
-commit=7fa6a07
-path=njdot/data/2001/NewJersey2001Accidents.pqt.dvc
 dvc-utils diff -r $commit^..$commit parquet_schema $path
 ```
 <details><summary>Output</summary>
@@ -97,8 +104,12 @@ dvc-utils diff -r $commit^..$commit parquet_schema $path
 51a48
 > OPTIONAL INT64 Date (TIMESTAMP(MICROS,false));
 ```
+
+Here we can see that various date/time columns were consolidated, and several stringly-typed columns were converted to ints, floats, and booleans.
+
 </details>
 
+### Parquet row diff
 Diff the first row of the Parquet file above (pretty-printed as JSON), before and after the given commit:
 
 ```bash
@@ -154,10 +165,26 @@ dvc-utils diff -r $commit^..$commit pretty_print_first_row $path
 > "Reporting Badge No.": "830",
 > "Date": "2001-12-21 18:34:00 +00:00"
 ```
+
+This reflects the schema changes above.
+
 </details>
 
+### Parquet row count diff
+```bash
+parquet_row_count() {
+parquet2json "$1" rowcount
+}
+export -f parquet_row_count
+dvc-utils diff -r $commit^..$commit parquet_row_count $path
+```
 
+This time we get no output; [the given `$commit`][commit] didn't change the row count in the DVC-tracked Parquet file [`$path`][commit path].
 
 [DVC]: https://dvc.org/
 [`parquet2json`]: https://github.com/jupiter/parquet2json
 [neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
+[Parquet]: https://parquet.apache.org/
+[commit]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7
+[commit path]: https://github.com/neighbor-ryan/nj-crashes/commit/c8ae28e64f4917895d84074913f48e0a7afbc3d7#diff-7f812dce61e0996354f4af414203e0933ccdfe9613cb406c40c1c41a14b9769c
+[neighbor-ryan/nj-crashes]: https://github.com/neighbor-ryan/nj-crashes
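The README examples all follow one pattern: run a command on the "before" and "after" versions of a file, then diff the two outputs, with empty output meaning nothing changed (as in the row count example). The sketch below illustrates that pattern only. It takes two ordinary file paths, whereas the real tool resolves DVC-tracked content at given commits, and it uses Python's `difflib` unified format rather than the classic `diff` format shown in the README output. `run_on` and `diff_outputs` are hypothetical names, not functions from `dvc_utils`.

```python
import difflib
import subprocess
from typing import List, Sequence


def run_on(cmd: Sequence[str], path: str) -> List[str]:
    """Run `cmd` with `path` appended and return its stdout as a list of lines."""
    result = subprocess.run(
        [*cmd, path], check=True, capture_output=True, text=True
    )
    return result.stdout.splitlines(keepends=True)


def diff_outputs(cmd: Sequence[str], before_path: str, after_path: str) -> str:
    """Diff the output of `cmd` on two files; an empty string means no change."""
    before = run_on(cmd, before_path)
    after = run_on(cmd, after_path)
    return "".join(
        difflib.unified_diff(before, after, fromfile=before_path, tofile=after_path)
    )
```

Calling `diff_outputs(["md5sum"], "old.pqt", "new.pqt")` (hypothetical file names) would print nothing when both files hash identically, which mirrors the "no output" case above.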
dvc-utils-0.0.2/dvc_utils.egg-info/PKG-INFO
ADDED
@@ -0,0 +1,201 @@
(Content identical to dvc-utils-0.0.2/PKG-INFO above.)
dvc-utils-0.0.2/setup.py
ADDED
@@ -0,0 +1,20 @@
+from setuptools import setup
+
+setup(
+    name='dvc-utils',
+    version="0.0.2",
+    description="CLI for diffing DVC files at two commits (or one commit vs. current worktree), optionally passing both through another command first",
+    long_description=open("README.md").read(),
+    long_description_content_type="text/markdown",
+    packages=['dvc_utils'],
+    entry_points={
+        'console_scripts': [
+            'dvc-utils = dvc_utils.main:cli',
+        ],
+    },
+    license="MIT",
+    author="Ryan Williams",
+    author_email="ryan@runsascoded.com",
+    author_url="https://github.com/ryan-williams",
+    url="https://github.com/runsascoded/dvc-utils",
+)
dvc-utils-0.0.1/PKG-INFO
DELETED
dvc-utils-0.0.1/setup.py
DELETED
@@ -1,13 +0,0 @@
-from setuptools import setup
-
-setup(
-    name='dvc-utils',
-    version="0.0.1",
-    install_requires=open("requirements.txt").readlines(),
-    packages=['dvc_utils'],
-    entry_points={
-        'console_scripts': [
-            'dvc-utils = dvc_utils.main:cli',
-        ],
-    },
-)
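The 0.0.1 `setup.py` populated `install_requires` from `requirements.txt`, while the 0.0.2 `setup.py` has no `install_requires` at all, which is consistent with `requires.txt` disappearing from the egg-info in the file list. The diff does not say whether dropping the declared dependencies was intentional. If it was not, one possible way to restore them is sketched below. `read_requirements` is a hypothetical helper, not part of the package. Unlike the `readlines()` call in 0.0.1, it strips trailing newlines, skips blank and comment lines, and returns an empty list when the file is absent.

```python
from pathlib import Path
from typing import List


def read_requirements(path: str = "requirements.txt") -> List[str]:
    """Return requirement specifiers from `path`, skipping blanks and comments."""
    file = Path(path)
    if not file.exists():
        # For example, a build from an sdist that did not ship requirements.txt.
        return []
    requirements = []
    for line in file.read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


# Hypothetical use inside setup.py:
#     setup(..., install_requires=read_requirements())
```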
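{dvc-utils-0.0.1 → dvc-utils-0.0.2}/LICENSE
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils/__init__.py
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils/main.py
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils/named_pipes.py
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/dependency_links.txt
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/entry_points.txt
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/dvc_utils.egg-info/top_level.txt
File without changes
{dvc-utils-0.0.1 → dvc-utils-0.0.2}/setup.cfg
File without changes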