PyPI - csvpath - Versions diffs - 0.0.2__tar.gz - Mend

csvpath 0.0.2__tar.gz


Files changed (54)

csvpath-0.0.2/PKG-INFO +184 -0
csvpath-0.0.2/README.md +169 -0
csvpath-0.0.2/csvpath/__init__.py +1 -0
csvpath-0.0.2/csvpath/csvpath.py +368 -0
csvpath-0.0.2/csvpath/matching/__init__.py +1 -0
csvpath-0.0.2/csvpath/matching/expression_encoder.py +108 -0
csvpath-0.0.2/csvpath/matching/expression_math.py +123 -0
csvpath-0.0.2/csvpath/matching/expression_utility.py +29 -0
csvpath-0.0.2/csvpath/matching/functions/above.py +36 -0
csvpath-0.0.2/csvpath/matching/functions/add.py +24 -0
csvpath-0.0.2/csvpath/matching/functions/below.py +36 -0
csvpath-0.0.2/csvpath/matching/functions/concat.py +25 -0
csvpath-0.0.2/csvpath/matching/functions/count.py +44 -0
csvpath-0.0.2/csvpath/matching/functions/count_lines.py +12 -0
csvpath-0.0.2/csvpath/matching/functions/count_scans.py +13 -0
csvpath-0.0.2/csvpath/matching/functions/divide.py +30 -0
csvpath-0.0.2/csvpath/matching/functions/end.py +18 -0
csvpath-0.0.2/csvpath/matching/functions/every.py +33 -0
csvpath-0.0.2/csvpath/matching/functions/first.py +46 -0
csvpath-0.0.2/csvpath/matching/functions/function.py +31 -0
csvpath-0.0.2/csvpath/matching/functions/function_factory.py +114 -0
csvpath-0.0.2/csvpath/matching/functions/inf.py +38 -0
csvpath-0.0.2/csvpath/matching/functions/is_instance.py +95 -0
csvpath-0.0.2/csvpath/matching/functions/length.py +33 -0
csvpath-0.0.2/csvpath/matching/functions/lower.py +21 -0
csvpath-0.0.2/csvpath/matching/functions/minf.py +167 -0
csvpath-0.0.2/csvpath/matching/functions/multiply.py +27 -0
csvpath-0.0.2/csvpath/matching/functions/no.py +10 -0
csvpath-0.0.2/csvpath/matching/functions/notf.py +26 -0
csvpath-0.0.2/csvpath/matching/functions/now.py +33 -0
csvpath-0.0.2/csvpath/matching/functions/orf.py +28 -0
csvpath-0.0.2/csvpath/matching/functions/percent.py +29 -0
csvpath-0.0.2/csvpath/matching/functions/random.py +33 -0
csvpath-0.0.2/csvpath/matching/functions/regex.py +38 -0
csvpath-0.0.2/csvpath/matching/functions/subtract.py +28 -0
csvpath-0.0.2/csvpath/matching/functions/tally.py +36 -0
csvpath-0.0.2/csvpath/matching/functions/upper.py +21 -0
csvpath-0.0.2/csvpath/matching/matcher.py +215 -0
csvpath-0.0.2/csvpath/matching/matching_lexer.py +66 -0
csvpath-0.0.2/csvpath/matching/parser.out +1287 -0
csvpath-0.0.2/csvpath/matching/parsetab.py +1427 -0
csvpath-0.0.2/csvpath/matching/productions/equality.py +158 -0
csvpath-0.0.2/csvpath/matching/productions/expression.py +16 -0
csvpath-0.0.2/csvpath/matching/productions/header.py +30 -0
csvpath-0.0.2/csvpath/matching/productions/matchable.py +41 -0
csvpath-0.0.2/csvpath/matching/productions/term.py +11 -0
csvpath-0.0.2/csvpath/matching/productions/variable.py +15 -0
csvpath-0.0.2/csvpath/parser_utility.py +39 -0
csvpath-0.0.2/csvpath/scanning/__init__.py +1 -0
csvpath-0.0.2/csvpath/scanning/parser.out +1 -0
csvpath-0.0.2/csvpath/scanning/parsetab.py +231 -0
csvpath-0.0.2/csvpath/scanning/scanner.py +165 -0
csvpath-0.0.2/csvpath/scanning/scanning_lexer.py +47 -0
csvpath-0.0.2/pyproject.toml +22 -0

csvpath-0.0.2/PKG-INFO ADDED

@@ -0,0 +1,184 @@
+Metadata-Version: 2.1
+Name: csvpath
+Version: 0.0.2
+Summary:
+Author: David Kershaw
+Author-email: dk107dk@hotmail.com
+Requires-Python: >=3.12,<4.0
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: ply (>=3.11,<4.0)
+Requires-Dist: pytest (>=8.2.2,<9.0.0)
+Requires-Dist: python-dateutil (>=2.9.0.post0,<3.0.0)
+Description-Content-Type: text/markdown
+# CsvPath
+CsvPath defines a declarative syntax for inspecting and updating CSV files. Though much simpler, it is similar to:
+- XPath: CsvPath is to a CSV file like XPath is to an XML file
+- Schematron: Schematron is basically XPath rules applied using XSLT. CsvPath paths can be used as validation rules.
+- CSS selectors: CsvPath picks out structured data in a similar way to how CSS selectors pick out HTML structures.
+CsvPath is intended to fit with other DataOps and data quality tools. Files are streamed. The interface is simple. Custom functions can be added.
+# Usage
+CsvPath paths have two parts, scanning and matching. For usage, see the unit tests in [tests/test_scanner.py](tests/test_scanner.py), [tests/test_matcher.py](tests/test_matcher.py) and [tests/test_functions.py](tests/test_functions.py).
+    path = CsvPath(delimiter=",")
+    path.parse('$test.csv[5-25][#0=="Frog" @lastname="Bats" count()==2]')
+    for i, line in enumerate( path.next() ):
+        print(f"{i}: {line}")
+    print(f"path vars: {path.variables}")
+This scanning and matching path says:
+- Open test.csv
+- Scan lines 5 through 25
+- Match the second time we see a line where the first column equals "Frog", and set the variable called "lastname" to "Bats"
+# Scanning
+The scanner enumerates lines. For each line returned, the line number, the scanned line count, and the match count are available. The set of line numbers scanned is also available.
+The scan part of the path starts with '$' to indicate the root, meaning the file from the top. After the '$' comes the file path. The scanning instructions are in a bracket. The rules are:
+- `[*]` means all
+- `[3*]` means starting from line 3 and going to the end of the file
+- `[3]` by itself means just line 3
+- `[1-3]` means lines 1 through 3
+- `[1+3]` means line 1 and line 3
+- `[1+3-8]` means line 1 and lines 3 through 8
+# Matching
+The match part is also bracketed. Matches have space-separated
+components that are ANDed together. A match component is one of several types:
+<table>
+<tr>
+<td>Type</td>
+<td>Returns</td>
+<td>Matches</td>
+<td>Description</td>
+<td>Examples</td>
+</tr>
+    <tr>
+        <td>Term </td><td> Value </td><td> True when used alone, otherwise calculated </td>
+        <td>A quoted string or date, optionally quoted number, or
+        regex. Regex features are limited. A regex is wrapped  in "/" characters.</td>
+        <td>
+            <li/> `"Massachusetts"`
+            <li/> `89.7`
+            <li/> `/[0-9a-zA-Z]+!/`
+        </td>
+    </tr>
+    <tr>
+        <td>Function </td><td> Calculated   </td><td> Calculated </td>
+        <td>A function name followed by parentheses. Functions can
+contain terms, variables, headers and other  functions. Some functions
+take a specific or  unlimited number of types as arguments.     </td>
+        <td>
+            <li/> `not(count()==2)`
+        </td>
+    </tr>
+    <tr>
+        <td>Variable </td>
+        <td>Value</td>
+        <td>True/False when value tested. True when set, True/False existence when used alone</td>
+        <td>An @ followed by a name. A variable is
+            set or tested depending on the usage. By itself, it is an existence test. When used as
+            the left hand side of an "=" its value is set.
+            When it is used on either side of an "==" it is an equality test.</td>
+        <td>
+            <li/> `@weather="cloudy"`
+            <li/> `count(@weather=="sunny")`
+            <li/> `@weather`
+            <li/> `#summer==@weather`
+In the examples above, the first is an assignment that sets the variable and returns True. The second is an argument used as a test in a way that is specific to the function. The third is an existence test. The fourth is an equality test.
+        </td>
+    </tr>
+    <tr>
+        <td>Header   </td>
+        <td>Value     </td>
+        <td>True/False existence when used alone, otherwise calculated</td>
+        <td>A # followed by a name or integer. The name references a value in line 0, the header
+ row. A number references a column by the 0-based column order.   </td>
+        <td>
+            <li/> `#firstname`
+            <li/> `#3`
+        </td>
+    </tr>
+    <tr>
+        <td>Equality</td>
+        <td>Calculated   </td>
+        <td>True at assignment, otherwise calculated   </td>
+        <td>Two of the other types joined with an "=" or "==".</td>
+        <td>
+            <li/> `@type_of_tree="Oak"`
+            <li/> `#name == @type_of_tree`
+        </td>
+    </tr>
+</table>
+    [ #common_name #0=="field" @tail=end() not(in(@tail, 'short|medium')) ]
+In the path above, the rules applied are:
+- `#common_name` indicates a header named "common_name". Headers are the values in the 0th line. This component of the match is an existence test.
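+- `#0=="field"` tests whether the first column equals "field". A number after `#` references a column by position, counting from 0, so `#2` would mean the 3rd column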
+- Functions and column references are ANDed together
+- `@tail` creates a variable named "tail" and sets it to the value of the last column
+- Functions can contain functions, equality tests, and/or literals
+Most of the work of matching is done in functions. The match functions are:
+| Function                      | What it does                                              |Done|
+|-------------------------------|-----------------------------------------------------------|----|
+| add(value, value, ...)        | adds numbers                                              | X  |
+| after(value)                  | finds things after a date, number, string                 | X  |
+| average(number, type)         | returns the average up to current "line", "scan", "match" | X  |
+| before(value)                 | finds things before a date, number, string                | X  |
+| concat(value, value)          | concatenates two values                                   | X  |
+| count()                       | counts the number of matches                              | X  |
+| count(value)                  | count matches of value                                    | X  |
+| count_lines()                 | count lines to this point in the file                     | X  |
+| count_scans()                 | count lines we checked for match                          | X  |
+| divide(value, value, ...)     | divides numbers                                           | X  |
+| end()                         | returns the value of the last column                      | X  |
+| every(value, number)          | match every Nth time a value is seen                      | X  |
+| first(value)                  | match the first occurrence and capture line               | X  |
+| in(value, list)               | match in a pipe-delimited list                            | X  |
+| increment(value, n)           | increments a variable by n each time seen                 |    |
+| isinstance(value, typestr)    | tests for "int","float","complex","bool","usd"            | X  |
+| length(value)                 | returns the length of the value                           | X  |
+| lower(value)                  | makes value lowercase                                     | X  |
+| max(value, type)              | largest value seen up to current "line", "scan", "match"  | X  |
+| median(value, type)           | median value up to current "line", "scan", "match"        | X  |
+| min(value, type)              | smallest value seen up to current "line", "scan", "match" | X  |
+| multiply(value, value, ...)   | multiplies numbers                                        | X  |
+| no()                          | always false                                              | X  |
+| not(value)                    | negates a value                                           | X  |
+| now(format)                   | a datetime, optionally formatted                          | X  |
+| or(value, value,...)          | match any one                                             | X  |
+| percent(type)                 | % of total lines for "scan", "match", "line"              | X  |
+| random(list)                  | pick from a list                                          |    |
+| random(starting, ending)      | generates a random int from starting to ending            | X  |
+| regex(regex-string)           | match on a regular expression                             | X  |
+| subtract(value, value, ...)   | subtracts numbers                                         | X  |
+| tally(value, value, ...)      | counts times values are seen, including as a set          | X  |
+| then(y,m,d,hh,mm,ss,format)   | a datetime, optionally formatted                          |    |
+| upper(value)                  | makes value uppercase                                     | X  |
+# Not Ready For Production
+Anything could change. This project is a hobby.
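The scanning rules listed in the README text above (`[*]`, `[3*]`, `[3]`, `[1-3]`, `[1+3]`, `[1+3-8]`) can be illustrated with a short sketch. This is not csvpath's scanner, which is built on a PLY grammar; `scan_lines` is a hypothetical helper, and the 0-based line numbering is an assumption taken from the README's statement that the header row is line 0.

```python
def scan_lines(spec: str, total_lines: int) -> list[int]:
    """Expand a scan spec such as '[1+3-8]' into sorted line numbers.

    Hypothetical helper for illustration only; it is not part of csvpath.
    Lines are assumed to be numbered from 0 (the header row).
    """
    spec = spec.strip().strip("[]")
    selected: set[int] = set()
    for part in spec.split("+"):
        part = part.strip()
        if part.endswith("*"):
            # '3*' means line 3 through the end of the file; '*' means all lines
            start = int(part[:-1] or 0)
            selected.update(range(start, total_lines))
        elif "-" in part:
            # '1-3' means lines 1 through 3, inclusive
            low, high = (int(p) for p in part.split("-"))
            selected.update(range(low, high + 1))
        else:
            # a bare number selects just that line
            selected.add(int(part))
    # drop anything that falls outside the file
    return sorted(n for n in selected if 0 <= n < total_lines)


if __name__ == "__main__":
    for example in ("[*]", "[3*]", "[3]", "[1-3]", "[1+3]", "[1+3-8]"):
        print(example, scan_lines(example, 10))
```

Returning a sorted list keeps the result easy to compare against the "set of line numbers scanned" that the README says the scanner exposes.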

csvpath-0.0.2/README.md ADDED

@@ -0,0 +1,169 @@
+# CsvPath
+CsvPath defines a declarative syntax for inspecting and updating CSV files. Though much simpler, it is similar to:
+- XPath: CsvPath is to a CSV file like XPath is to an XML file
+- Schematron: Schematron is basically XPath rules applied using XSLT. CsvPath paths can be used as validation rules.
+- CSS selectors: CsvPath picks out structured data in a similar way to how CSS selectors pick out HTML structures.
+CsvPath is intended to fit with other DataOps and data quality tools. Files are streamed. The interface is simple. Custom functions can be added.
+# Usage
+CsvPath paths have two parts, scanning and matching. For usage, see the unit tests in [tests/test_scanner.py](tests/test_scanner.py), [tests/test_matcher.py](tests/test_matcher.py) and [tests/test_functions.py](tests/test_functions.py).
+    path = CsvPath(delimiter=",")
+    path.parse('$test.csv[5-25][#0=="Frog" @lastname="Bats" count()==2]')
+    for i, line in enumerate( path.next() ):
+        print(f"{i}: {line}")
+    print(f"path vars: {path.variables}")
+This scanning and matching path says:
+- Open test.csv
+- Scan lines 5 through 25
+- Match the second time we see a line where the first column equals "Frog", and set the variable called "lastname" to "Bats"
+# Scanning
+The scanner enumerates lines. For each line returned, the line number, the scanned line count, and the match count are available. The set of line numbers scanned is also available.
+The scan part of the path starts with '$' to indicate the root, meaning the file from the top. After the '$' comes the file path. The scanning instructions are in a bracket. The rules are:
+- `[*]` means all
+- `[3*]` means starting from line 3 and going to the end of the file
+- `[3]` by itself means just line 3
+- `[1-3]` means lines 1 through 3
+- `[1+3]` means line 1 and line 3
+- `[1+3-8]` means line 1 and lines 3 through 8
+# Matching
+The match part is also bracketed. Matches have space-separated
+components that are ANDed together. A match component is one of several types:
+<table>
+<tr>
+<td>Type</td>
+<td>Returns</td>
+<td>Matches</td>
+<td>Description</td>
+<td>Examples</td>
+</tr>
+    <tr>
+        <td>Term </td><td> Value </td><td> True when used alone, otherwise calculated </td>
+        <td>A quoted string or date, optionally quoted number, or
+        regex. Regex features are limited. A regex is wrapped  in "/" characters.</td>
+        <td>
+            <li/> `"Massachusetts"`
+            <li/> `89.7`
+            <li/> `/[0-9a-zA-Z]+!/`
+        </td>
+    </tr>
+    <tr>
+        <td>Function </td><td> Calculated   </td><td> Calculated </td>
+        <td>A function name followed by parentheses. Functions can
+contain terms, variables, headers and other  functions. Some functions
+take a specific or  unlimited number of types as arguments.     </td>
+        <td>
+            <li/> `not(count()==2)`
+        </td>
+    </tr>
+    <tr>
+        <td>Variable </td>
+        <td>Value</td>
+        <td>True/False when value tested. True when set, True/False existence when used alone</td>
+        <td>An @ followed by a name. A variable is
+            set or tested depending on the usage. By itself, it is an existence test. When used as
+            the left hand side of an "=" its value is set.
+            When it is used on either side of an "==" it is an equality test.</td>
+        <td>
+            <li/> `@weather="cloudy"`
+            <li/> `count(@weather=="sunny")`
+            <li/> `@weather`
+            <li/> `#summer==@weather`
+In the examples above, the first is an assignment that sets the variable and returns True. The second is an argument used as a test in a way that is specific to the function. The third is an existence test. The fourth is an equality test.
+        </td>
+    </tr>
+    <tr>
+        <td>Header   </td>
+        <td>Value     </td>
+        <td>True/False existence when used alone, otherwise calculated</td>
+        <td>A # followed by a name or integer. The name references a value in line 0, the header
+ row. A number references a column by the 0-based column order.   </td>
+        <td>
+            <li/> `#firstname`
+            <li/> `#3`
+        </td>
+    </tr>
+    <tr>
+        <td>Equality</td>
+        <td>Calculated   </td>
+        <td>True at assignment, otherwise calculated   </td>
+        <td>Two of the other types joined with an "=" or "==".</td>
+        <td>
+            <li/> `@type_of_tree="Oak"`
+            <li/> `#name == @type_of_tree`
+        </td>
+    </tr>
+</table>
+    [ #common_name #0=="field" @tail=end() not(in(@tail, 'short|medium')) ]
+In the path above, the rules applied are:
+- `#common_name` indicates a header named "common_name". Headers are the values in the 0th line. This component of the match is an existence test.
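+- `#0=="field"` tests whether the first column equals "field". A number after `#` references a column by position, counting from 0, so `#2` would mean the 3rd column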
+- Functions and column references are ANDed together
+- `@tail` creates a variable named "tail" and sets it to the value of the last column
+- Functions can contain functions, equality tests, and/or literals
+Most of the work of matching is done in functions. The match functions are:
+| Function                      | What it does                                              |Done|
+|-------------------------------|-----------------------------------------------------------|----|
+| add(value, value, ...)        | adds numbers                                              | X  |
+| after(value)                  | finds things after a date, number, string                 | X  |
+| average(number, type)         | returns the average up to current "line", "scan", "match" | X  |
+| before(value)                 | finds things before a date, number, string                | X  |
+| concat(value, value)          | concatenates two values                                   | X  |
+| count()                       | counts the number of matches                              | X  |
+| count(value)                  | count matches of value                                    | X  |
+| count_lines()                 | count lines to this point in the file                     | X  |
+| count_scans()                 | count lines we checked for match                          | X  |
+| divide(value, value, ...)     | divides numbers                                           | X  |
+| end()                         | returns the value of the last column                      | X  |
+| every(value, number)          | match every Nth time a value is seen                      | X  |
+| first(value)                  | match the first occurrence and capture line               | X  |
+| in(value, list)               | match in a pipe-delimited list                            | X  |
+| increment(value, n)           | increments a variable by n each time seen                 |    |
+| isinstance(value, typestr)    | tests for "int","float","complex","bool","usd"            | X  |
+| length(value)                 | returns the length of the value                           | X  |
+| lower(value)                  | makes value lowercase                                     | X  |
+| max(value, type)              | largest value seen up to current "line", "scan", "match"  | X  |
+| median(value, type)           | median value up to current "line", "scan", "match"        | X  |
+| min(value, type)              | smallest value seen up to current "line", "scan", "match" | X  |
+| multiply(value, value, ...)   | multiplies numbers                                        | X  |
+| no()                          | always false                                              | X  |
+| not(value)                    | negates a value                                           | X  |
+| now(format)                   | a datetime, optionally formatted                          | X  |
+| or(value, value,...)          | match any one                                             | X  |
+| percent(type)                 | % of total lines for "scan", "match", "line"              | X  |
+| random(list)                  | pick from a list                                          |    |
+| random(starting, ending)      | generates a random int from starting to ending            | X  |
+| regex(regex-string)           | match on a regular expression                             | X  |
+| subtract(value, value, ...)   | subtracts numbers                                         | X  |
+| tally(value, value, ...)      | counts times values are seen, including as a set          | X  |
+| then(y,m,d,hh,mm,ss,format)   | a datetime, optionally formatted                          |    |
+| upper(value)                  | makes value uppercase                                     | X  |
+# Not Ready For Production
+Anything could change. This project is a hobby.

csvpath-0.0.2/csvpath/__init__.py ADDED

@@ -0,0 +1 @@
+