csvpath 0.0.454__tar.gz → 0.0.455__tar.gz

Files changed (193)
  1. {csvpath-0.0.454 → csvpath-0.0.455}/PKG-INFO +1 -1
  2. csvpath-0.0.455/config/config.ini +19 -0
  3. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/managers/files_manager.py +4 -1
  4. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/config.py +86 -29
  5. csvpath-0.0.455/docs/asbool.md +22 -0
  6. csvpath-0.0.455/docs/assignment.md +81 -0
  7. csvpath-0.0.455/docs/config.md +69 -0
  8. csvpath-0.0.455/docs/examples.md +119 -0
  9. csvpath-0.0.455/docs/files.md +55 -0
  10. csvpath-0.0.455/docs/functions/above.md +34 -0
  11. csvpath-0.0.455/docs/functions/advance.md +27 -0
  12. csvpath-0.0.455/docs/functions/after_blank.md +18 -0
  13. csvpath-0.0.455/docs/functions/all.md +39 -0
  14. csvpath-0.0.455/docs/functions/andor.md +30 -0
  15. csvpath-0.0.455/docs/functions/any.md +33 -0
  16. csvpath-0.0.455/docs/functions/average.md +21 -0
  17. csvpath-0.0.455/docs/functions/between.md +25 -0
  18. csvpath-0.0.455/docs/functions/collect.md +30 -0
  19. csvpath-0.0.455/docs/functions/correlate.md +39 -0
  20. csvpath-0.0.455/docs/functions/count.md +67 -0
  21. csvpath-0.0.455/docs/functions/count_headers.md +26 -0
  22. csvpath-0.0.455/docs/functions/date.md +20 -0
  23. csvpath-0.0.455/docs/functions/empty.md +42 -0
  24. csvpath-0.0.455/docs/functions/end.md +12 -0
  25. csvpath-0.0.455/docs/functions/every.md +58 -0
  26. csvpath-0.0.455/docs/functions/fail.md +46 -0
  27. csvpath-0.0.455/docs/functions/first.md +25 -0
  28. csvpath-0.0.455/docs/functions/get.md +23 -0
  29. csvpath-0.0.455/docs/functions/has_dups.md +28 -0
  30. csvpath-0.0.455/docs/functions/header.md +18 -0
  31. csvpath-0.0.455/docs/functions/header_name.md +24 -0
  32. csvpath-0.0.455/docs/functions/header_names_mismatch.md +46 -0
  33. csvpath-0.0.455/docs/functions/implementing_functions.md +85 -0
  34. csvpath-0.0.455/docs/functions/import.md +47 -0
  35. csvpath-0.0.455/docs/functions/in.md +25 -0
  36. csvpath-0.0.455/docs/functions/increment.md +57 -0
  37. csvpath-0.0.455/docs/functions/jinja.md +84 -0
  38. csvpath-0.0.455/docs/functions/last.md +25 -0
  39. csvpath-0.0.455/docs/functions/line_number.md +39 -0
  40. csvpath-0.0.455/docs/functions/max.md +24 -0
  41. csvpath-0.0.455/docs/functions/metaphone.md +74 -0
  42. csvpath-0.0.455/docs/functions/mismatch.md +43 -0
  43. csvpath-0.0.455/docs/functions/no.md +19 -0
  44. csvpath-0.0.455/docs/functions/not.md +27 -0
  45. csvpath-0.0.455/docs/functions/now.md +13 -0
  46. csvpath-0.0.455/docs/functions/percent_unique.md +20 -0
  47. csvpath-0.0.455/docs/functions/pop.md +42 -0
  48. csvpath-0.0.455/docs/functions/print.md +44 -0
  49. csvpath-0.0.455/docs/functions/print_line.md +23 -0
  50. csvpath-0.0.455/docs/functions/print_queue.md +18 -0
  51. csvpath-0.0.455/docs/functions/regex.md +37 -0
  52. csvpath-0.0.455/docs/functions/replace.md +23 -0
  53. csvpath-0.0.455/docs/functions/reset_headers.md +20 -0
  54. csvpath-0.0.455/docs/functions/stdev.md +31 -0
  55. csvpath-0.0.455/docs/functions/stop.md +41 -0
  56. csvpath-0.0.455/docs/functions/string_functions.md +65 -0
  57. csvpath-0.0.455/docs/functions/subtract.md +64 -0
  58. csvpath-0.0.455/docs/functions/sum.md +38 -0
  59. csvpath-0.0.455/docs/functions/tally.md +48 -0
  60. csvpath-0.0.455/docs/functions/total_lines.md +6 -0
  61. csvpath-0.0.455/docs/functions/track.md +32 -0
  62. csvpath-0.0.455/docs/functions/variables.md +13 -0
  63. csvpath-0.0.455/docs/functions/variables_and_headers.md +30 -0
  64. csvpath-0.0.455/docs/functions.md +199 -0
  65. csvpath-0.0.455/docs/grammar.md +17 -0
  66. csvpath-0.0.455/docs/headers.md +47 -0
  67. csvpath-0.0.455/docs/images/logo-wordmark-white-on-black-trimmed-padded.png +0 -0
  68. csvpath-0.0.455/docs/images/logo-wordmark-white-trimmed.png +0 -0
  69. csvpath-0.0.455/docs/paths.md +119 -0
  70. csvpath-0.0.455/docs/qualifiers.md +118 -0
  71. csvpath-0.0.455/docs/references.md +63 -0
  72. csvpath-0.0.455/docs/terms.md +28 -0
  73. csvpath-0.0.455/docs/variables.md +117 -0
  74. {csvpath-0.0.454 → csvpath-0.0.455}/pyproject.toml +2 -1
  75. {csvpath-0.0.454 → csvpath-0.0.455}/LICENSE +0 -0
  76. {csvpath-0.0.454 → csvpath-0.0.455}/README.md +0 -0
  77. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/__init__.py +0 -0
  78. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/csvpath.py +0 -0
  79. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/csvpaths.py +0 -0
  80. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/managers/__init__.py +0 -0
  81. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/managers/csvpath_result.py +0 -0
  82. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/managers/csvpaths_manager.py +0 -0
  83. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/managers/results_manager.py +0 -0
  84. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/__init__.py +0 -0
  85. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/__init__.py +0 -0
  86. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/all.py +0 -0
  87. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/andf.py +0 -0
  88. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/any.py +0 -0
  89. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/between.py +0 -0
  90. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/empty.py +0 -0
  91. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/exists.py +0 -0
  92. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/inf.py +0 -0
  93. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/no.py +0 -0
  94. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/notf.py +0 -0
  95. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/orf.py +0 -0
  96. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/boolean/yes.py +0 -0
  97. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/count.py +0 -0
  98. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/count_headers.py +0 -0
  99. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/count_lines.py +0 -0
  100. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/count_scans.py +0 -0
  101. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/every.py +0 -0
  102. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/increment.py +0 -0
  103. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/tally.py +0 -0
  104. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/counting/total_lines.py +0 -0
  105. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/function.py +0 -0
  106. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/function_factory.py +0 -0
  107. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/function_focus.py +0 -0
  108. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/headers/end.py +0 -0
  109. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/headers/header_name.py +0 -0
  110. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/headers/header_names_mismatch.py +0 -0
  111. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/headers/headers.py +0 -0
  112. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/headers/mismatch.py +0 -0
  113. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/headers/reset_headers.py +0 -0
  114. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/advance.py +0 -0
  115. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/after_blank.py +0 -0
  116. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/dups.py +0 -0
  117. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/first.py +0 -0
  118. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/first_line.py +0 -0
  119. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/last.py +0 -0
  120. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/lines/stop.py +0 -0
  121. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/above.py +0 -0
  122. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/add.py +0 -0
  123. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/divide.py +0 -0
  124. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/equals.py +0 -0
  125. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/mod.py +0 -0
  126. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/multiply.py +0 -0
  127. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/round.py +0 -0
  128. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/subtract.py +0 -0
  129. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/math/sum.py +0 -0
  130. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/collect.py +0 -0
  131. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/datef.py +0 -0
  132. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/get.py +0 -0
  133. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/importf.py +0 -0
  134. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/intf.py +0 -0
  135. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/nonef.py +0 -0
  136. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/now.py +0 -0
  137. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/pushpop.py +0 -0
  138. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/random.py +0 -0
  139. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/regex.py +0 -0
  140. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/replace.py +0 -0
  141. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/track.py +0 -0
  142. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/misc/variables.py +0 -0
  143. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/print/jinjaf.py +0 -0
  144. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/print/print_line.py +0 -0
  145. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/print/print_queue.py +0 -0
  146. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/print/printf.py +0 -0
  147. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/stats/correlate.py +0 -0
  148. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/stats/minf.py +0 -0
  149. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/stats/percent.py +0 -0
  150. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/stats/percent_unique.py +0 -0
  151. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/stats/stdev.py +0 -0
  152. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/concat.py +0 -0
  153. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/length.py +0 -0
  154. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/lower.py +0 -0
  155. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/metaphone.py +0 -0
  156. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/starts_with.py +0 -0
  157. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/strip.py +0 -0
  158. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/substring.py +0 -0
  159. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/strings/upper.py +0 -0
  160. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/validation.py +0 -0
  161. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/validity/fail.py +0 -0
  162. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/functions/validity/failed.py +0 -0
  163. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/lark_parser.py +0 -0
  164. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/lark_transformer.py +0 -0
  165. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/matcher.py +0 -0
  166. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/__init__.py +0 -0
  167. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/equality.py +0 -0
  168. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/expression.py +0 -0
  169. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/header.py +0 -0
  170. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/matchable.py +0 -0
  171. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/qualified.py +0 -0
  172. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/reference.py +0 -0
  173. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/term.py +0 -0
  174. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/productions/variable.py +0 -0
  175. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/util/exceptions.py +0 -0
  176. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/util/expression_encoder.py +0 -0
  177. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/util/expression_utility.py +0 -0
  178. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/util/lark_print_parser.py +0 -0
  179. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/matching/util/print_parser.py +0 -0
  180. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/scanning/__init__.py +0 -0
  181. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/scanning/exceptions.py +0 -0
  182. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/scanning/parser.out +0 -0
  183. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/scanning/parsetab.py +0 -0
  184. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/scanning/scanner.py +0 -0
  185. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/scanning/scanning_lexer.py +0 -0
  186. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/config_exception.py +0 -0
  187. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/error.py +0 -0
  188. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/exceptions.py +0 -0
  189. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/last_line_stats.py +0 -0
  190. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/line_monitor.py +0 -0
  191. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/log_utility.py +0 -0
  192. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/metadata_parser.py +0 -0
  193. {csvpath-0.0.454 → csvpath-0.0.455}/csvpath/util/printer.py +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.1
2
2
  Name: csvpath
3
- Version: 0.0.454
3
+ Version: 0.0.455
4
4
  Summary: A declarative language for data extraction and validation of CSV files
5
5
  Author: David Kershaw
6
6
  Author-email: dk107dk@hotmail.com
@@ -0,0 +1,19 @@
1
+ [csvpath_files]
2
+ extensions = txt, csvpath, csvpaths
3
+
4
+ [csv_files]
5
+ extensions = txt, csv, tsv, dat, tab, psv, ssv
6
+
7
+ [errors]
8
+ csvpath = raise, collect, stop, fail
9
+ csvpaths = raise, collect
10
+
11
+ [logging]
12
+ csvpath = info
13
+ csvpaths = info
14
+ log_file = logs/csvpath.log
15
+ log_files_to_keep = 100
16
+ log_file_size = 52428800
17
+
18
+ [config]
19
+ path = config/config.ini
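The new `config.ini` can be read with the same `RawConfigParser` class that `csvpath/util/config.py` imports. The following is a minimal sketch, not the package's own loader: the `as_list` helper and the `DEFAULT_INI` constant are hypothetical names introduced here, and the assumption is that comma-separated values are meant to be split into lists.

```python
from configparser import RawConfigParser
from typing import List

# A copy of the settings added in config/config.ini (the [config] section is omitted).
DEFAULT_INI = """
[csvpath_files]
extensions = txt, csvpath, csvpaths

[csv_files]
extensions = txt, csv, tsv, dat, tab, psv, ssv

[errors]
csvpath = raise, collect, stop, fail
csvpaths = raise, collect

[logging]
csvpath = info
csvpaths = info
log_file = logs/csvpath.log
log_files_to_keep = 100
log_file_size = 52428800
"""


def as_list(raw: str) -> List[str]:
    """Hypothetical helper: split a comma-separated value into trimmed items."""
    return [item.strip() for item in raw.split(",") if item.strip()]


parser = RawConfigParser()
parser.read_string(DEFAULT_INI)

csv_extensions = as_list(parser["csv_files"]["extensions"])
csvpath_errors = as_list(parser["errors"]["csvpath"])
log_file_size = parser.getint("logging", "log_file_size")
```

The `log_file_size` value of 52428800 bytes is 50 MiB (50 * 1024 * 1024).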
@@ -68,7 +68,10 @@ class FilesManager(CsvPathsFilesManager): # pylint: disable=C0115
68
68
  path = os.path.join(base, p)
69
69
  self.named_files[name] = path
70
70
  else:
71
- pass
71
+ self.csvpaths.logger.debug(
72
+ "Skipping %s because extension not in accept list",
73
+ os.path.join(base, p),
74
+ )
72
75
 
73
76
  def add_named_file(self, *, name: str, path: str) -> None:
74
77
  self.named_files[name] = path
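The hunk above replaces a silent `pass` with a debug log entry when a file's extension is not in the accept list. A rough sketch of that pattern follows. The function `collect_named_files`, the logger name, and the rule of naming a file by its stem are assumptions made for illustration; the diff does not show how `FilesManager` derives names.

```python
import logging
import os
from typing import Dict, List

logger = logging.getLogger("csvpath.sketch")  # hypothetical logger name


def collect_named_files(
    base: str, filenames: List[str], extensions: List[str]
) -> Dict[str, str]:
    """Map a name to a path for each file whose extension is accepted."""
    named: Dict[str, str] = {}
    for p in filenames:
        stem, ext = os.path.splitext(p)
        if ext.lstrip(".").lower() in extensions:
            # Assumption: the file stem is used as the name.
            named[stem] = os.path.join(base, p)
        else:
            # Same message as the new debug call in the diff.
            logger.debug(
                "Skipping %s because extension not in accept list",
                os.path.join(base, p),
            )
    return named


found = collect_named_files(
    "data", ["orders.csv", "notes.md", "people.tsv"], ["csv", "tsv"]
)
```

Logging the skipped path at debug level keeps normal runs quiet while still leaving a trace when a file is unexpectedly missing from the named files.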
@@ -1,6 +1,7 @@
1
1
  from configparser import RawConfigParser
2
2
  from dataclasses import dataclass
3
3
  from os import path, environ
4
+ import os
4
5
  from typing import Dict, List
5
6
  from enum import Enum
6
7
  import logging
@@ -57,17 +58,17 @@ class CsvPathConfig:
57
58
  def __init__(self, holder):
58
59
  self._holder = holder
59
60
  self._config = RawConfigParser()
61
+ self.log_file_handler = None
60
62
  self._configpath = environ.get(CsvPathConfig.CSVPATH_CONFIG_FILE_ENV)
61
63
  if self._configpath is None:
62
64
  self._configpath = CsvPathConfig.CONFIG
63
- self.log_file_handler = None
64
65
  self._load_config()
65
66
 
66
67
  def reload(self):
68
+ self._config = RawConfigParser()
67
69
  self._load_config()
68
70
 
69
71
  def set_config_path_and_reload(self, path: str) -> None:
70
- self._config = RawConfigParser()
71
72
  self._configpath = path
72
73
  self.reload()
73
74
 
@@ -76,9 +77,9 @@ class CsvPathConfig:
76
77
  return self._configpath
77
78
 
78
79
  def _get(self, section: str, name: str):
80
+ if self._config is None:
81
+ raise ConfigurationException("No config object available")
79
82
  try:
80
- if self._config is None:
81
- raise ConfigurationException("No config object available")
82
83
  s = self._config[section][name]
83
84
  s = s.strip()
84
85
  ret = None
@@ -92,31 +93,83 @@ class CsvPathConfig:
92
93
  f"Check config at {self.config_path} for [{section}][{name}]"
93
94
  )
94
95
 
95
- def _load_config(self):
96
- if not path.isfile(self._configpath):
97
- raise ConfigurationException(
98
- "No config file at {self._configpath}"
99
- ) # pragma: no cover
100
- else:
101
- self._config.read(self._configpath)
102
- self.csvpath_file_extensions = self._get(
103
- Sections.CSVPATH_FILES.value, "extensions"
104
- )
105
- self.csv_file_extensions = self._get(Sections.CSV_FILES.value, "extensions")
106
-
107
- self.csvpath_errors_policy = self._get(Sections.ERRORS.value, "csvpath")
108
- self.csvpaths_errors_policy = self._get(Sections.ERRORS.value, "csvpaths")
109
-
110
- self.csvpath_log_level = self._get(Sections.LOGGING.value, "csvpath")
111
- self.csvpaths_log_level = self._get(Sections.LOGGING.value, "csvpaths")
96
+ def _create_default_config(self) -> None:
97
+ if not path.exists("config"):
98
+ os.makedirs("config")
99
+ with open(CsvPathConfig.CONFIG, "w") as file:
100
+ c = """
101
+ [csvpath_files]
102
+ extensions = txt, csvpath, csvpaths
103
+ [csv_files]
104
+ extensions = txt, csv, tsv, dat, tab, psv, ssv
105
+ [errors]
106
+ csvpath = raise, collect, stop, fail
107
+ csvpaths = raise, collect
108
+ [logging]
109
+ csvpath = info
110
+ csvpaths = info
111
+ log_file = logs/csvpath.log
112
+ log_files_to_keep = 100
113
+ log_file_size = 52428800
114
+ [config]
115
+ path =
116
+ """
117
+ file.write(c)
118
+ print(f"Creating a default config file at {CsvPathConfig.CONFIG}.")
119
+ print("If you want your config to be somewhere else remember to")
120
+ print("update the path in the default config.ini")
121
+
122
+ def _assure_logs_path(self) -> None:
123
+ filepath = self.log_file
124
+ if not filepath:
125
+ filepath = "logs/csvpath.log"
126
+ self.log_file = filepath
127
+ dirpath = self._get_dir_path(filepath)
128
+ if dirpath and not path.exists(dirpath):
129
+ os.makedirs(dirpath)
130
+
131
+ def _get_dir_path(self, filepath):
132
+ if filepath.find(os.sep) > -1:
133
+ dirpath = filepath[0 : filepath.rfind(os.sep)]
134
+ return dirpath
135
+ return None
136
+
137
+ def _assure_config_file_path(self) -> None:
138
+ if not self._configpath or not os.path.isfile(self._configpath):
139
+ self._configpath = CsvPathConfig.CONFIG
140
+ self._create_default_config()
112
141
 
113
- self.log_file = self._get(Sections.LOGGING.value, LogFile.LOG_FILE.value)
114
- self.log_files_to_keep = self._get(
115
- Sections.LOGGING.value, LogFile.LOG_FILES_TO_KEEP.value
116
- )
117
- self.log_file_size = self._get(
118
- Sections.LOGGING.value, LogFile.LOG_FILE_SIZE.value
119
- )
142
+ def _load_config(self, norecurse=False):
143
+ self._assure_config_file_path()
144
+ #
145
+ #
146
+ #
147
+ self._config.read(self._configpath)
148
+ self.csvpath_file_extensions = self._get(
149
+ Sections.CSVPATH_FILES.value, "extensions"
150
+ )
151
+ self.csv_file_extensions = self._get(Sections.CSV_FILES.value, "extensions")
152
+
153
+ self.csvpath_errors_policy = self._get(Sections.ERRORS.value, "csvpath")
154
+ self.csvpaths_errors_policy = self._get(Sections.ERRORS.value, "csvpaths")
155
+
156
+ self.csvpath_log_level = self._get(Sections.LOGGING.value, "csvpath")
157
+ self.csvpaths_log_level = self._get(Sections.LOGGING.value, "csvpaths")
158
+
159
+ self.log_file = self._get(Sections.LOGGING.value, LogFile.LOG_FILE.value)
160
+ self.log_files_to_keep = self._get(
161
+ Sections.LOGGING.value, LogFile.LOG_FILES_TO_KEEP.value
162
+ )
163
+ self.log_file_size = self._get(
164
+ Sections.LOGGING.value, LogFile.LOG_FILE_SIZE.value
165
+ )
166
+ path = self._get("config", "path")
167
+ if path:
168
+ path = path.strip().lower()
169
+ if path and path != "" and path != self._configpath.strip().lower():
170
+ self._configpath = path
171
+ self.reload()
172
+ return
120
173
  self.validate_config()
121
174
 
122
175
  def validate_config(self) -> None:
@@ -176,10 +229,14 @@ class CsvPathConfig:
176
229
  if self.csvpaths_log_level not in LogLevels:
177
230
  raise ConfigurationException(f"CsvPaths log level {_} is wrong")
178
231
  #
179
- # log files
232
+ # log files config
180
233
  #
181
234
  if self.log_file is None or not isinstance(self.log_file, str):
182
235
  raise ConfigurationException(f"Log file path is wrong: {self.log_file}")
236
+ #
237
+ # make sure the log dir exists
238
+ #
239
+ self._assure_logs_path()
183
240
  if self.log_files_to_keep is None or not isinstance(
184
241
  self.log_files_to_keep, int
185
242
  ):
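The new `_assure_logs_path` and `_get_dir_path` methods find the directory part of the log path by searching for `os.sep` and create the directory if it is missing. One possible alternative, sketched here under the assumption that only the parent directory needs to exist, uses `os.path.dirname` and `os.makedirs(..., exist_ok=True)`. The function name `ensure_parent_dir` is hypothetical and is not part of the package.

```python
import os
import tempfile


def ensure_parent_dir(filepath: str) -> str:
    """Create the parent directory of filepath if needed and return it."""
    dirpath = os.path.dirname(filepath)
    if dirpath:
        # exist_ok avoids a separate existence check before creating.
        os.makedirs(dirpath, exist_ok=True)
    return dirpath


with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "logs", "csvpath.log")
    created = ensure_parent_dir(log_file)
    dir_exists = os.path.isdir(created)

# A bare file name has no directory part, so nothing is created.
bare = ensure_parent_dir("csvpath.log")
```

On Windows, `os.path.dirname` recognizes both `/` and `\` as separators, so a configured value such as `logs/csvpath.log` is split the same way on every platform.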
@@ -0,0 +1,22 @@
1
+
2
+ # Asbool
3
+
4
+ The `asbool` qualifier makes CsvPath consider the match component it is used on as a bool according to its value, rather than as an existence test.
5
+
6
+ The difference is:
7
+
8
+ - With `asbool`, CsvPath evaluates "true" and "false" as their bool equivalents, `True` and `False` respectively
9
+ - A match component used as an existence test without `asbool` evaluates to `True` or `False` based only on whether its value is not `None`. For example, the value `False` evaluates to `True` because `False` exists
10
+
11
+ As an example, the value of `not.asbool()` is assigned according to:
12
+
13
+ | When | Example | Example's result |
14
+ |---------------------------------------------|---------------------|---------------------|
15
+ | If used alone, as a boolean match condition | #a.asbool | evaluated as a bool |
16
+ | Assignment | @a.asbool = #b | True |
17
+ | With the `nocontrib` qualifier | @a.nocontrib.asbool = no() | True. In this case `nocontrib` overrides the match value of `asbool`. |
18
+ | With the `onchange` qualifier | @a.onchange.asbool = @b | evaluated as a bool |
19
+ | With the `latch` qualifier | @a.latch.asbool = @b| evaluated as a bool |
20
+ | With the `onmatch` qualifier | @a.onmatch.asbool = @b | True for the purposes of the whole row matching and evaluated as a bool if/when it does |
21
+
22
+
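The difference between an existence test and an `asbool` evaluation, as described in `asbool.md`, can be illustrated in Python. This is a sketch of the documented behavior, not CsvPath's implementation: the function names are hypothetical, and treating "true" and "false" case-insensitively is an assumption.

```python
from typing import Any


def exists(value: Any) -> bool:
    """Existence test: anything that is not None counts as a match."""
    return value is not None


def as_bool(value: Any) -> bool:
    """Like bool(x), but the strings "true" and "false" map to True and False."""
    if isinstance(value, str):
        lowered = value.strip().lower()  # assumption: case-insensitive
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    return bool(value)
```

For example, `exists(False)` is `True` because the value exists, while `as_bool("false")` is `False` because the value is interpreted as a bool.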
@@ -0,0 +1,81 @@
1
+
2
+ # The Role Of Qualifiers In Assignment
3
+
4
+ Assignment is the process of setting a variable equal to a value from another variable or a header, function, or term. There is more to assignment than you might think.
5
+
6
+ Assignments have this common `x = y` form:
7
+
8
+ - `@x = #y`
9
+ - `@x = "a value"`
10
+ - `@x = @y`
11
+ - `@x.y = @z`
12
+ - `@x = count()`
13
+
14
+ Where things get more interesting is qualifiers.
15
+
16
+ There are a number of qualifiers. These are the ones that are involved with assignment:
17
+
18
+ - `asbool`
19
+ - `decrease`
20
+ - `increase`
21
+ - `latch`
22
+ - `nocontrib`
23
+ - `notnone`
24
+ - `onchange`
25
+ - `onmatch`
26
+
27
+ These qualifiers are consistently applied to all variable assignments. This is in contrast to the more case-by-case way qualifiers work in other match components.
28
+
29
+ You can <a href='https://github.com/dk107dk/csvpath/blob/main/docs/qualifiers.md'>read about all the qualifiers here</a>.
30
+
31
+ Qualifiers can go together in any order. CsvPath decides what to do in an assignment based on all the qualifiers it finds and which are applicable. There are three ways a qualifier relates to an assignment:
32
+
33
+ - It may qualify the specific act of the assignment
34
+ - It may qualify the assignment's relationship to the whole row
35
+ - Or it can apply to just the actual value itself that is being assigned
36
+
37
+ You can think of these happening in layers.
38
+
39
+ ## Match Values Of Assignments
40
+
41
+ | Form | Match value |
42
+ |----------------------------------------------------|--------------------------------------------|
43
+ | `@x = y` | True |
44
+ | `@x.latch = y` | True on first assignment, otherwise False |
45
+ | `@x.onchange = y` | True if changed, otherwise False |
46
+ | `@x.onmatch = y` | True if row matches, otherwise False |
47
+ | `@x.increase = y` | True if y is > x, otherwise False |
48
+ | `@x.decrease = y` | True if y is < x, otherwise False |
49
+ | `@x.notnone = y` | True if y is not None, otherwise False |
50
+ | `@x.[any-qualifiers-except-nocontrib].asbool = y` | True or False determined by the value of x |
51
+ | `@x.[any-other-qualifiers].nocontrib = y`          | True                                       |
52
+
53
+ A typical assignment doesn't contribute to the match decision for a row. On the other hand, `increase`, `decrease`, `latch`, `notnone`, and `onchange` assignments contribute to matching.
54
+
55
+ A latched assignment is one where the variable is set once and then never changes. For any row, regardless of whether the latched variable has been set, the match value is True. Again, the idea is that the latched assignment is a non-consideration for matching.
56
+
57
+ An `onchange` assignment is one that only happens when the variable's value changes. Obviously, if a variable's value is the same as the new value, it doesn't matter for that variable whether the assignment happens. However, `onchange`'s effect is on the match, not the assignment. If a variable with `onchange` is assigned a value that it already holds, the match fails for the whole row. Conversely, if the same variable is given a new value, the variable's contribution to the row matching is True.
58
+
59
+ `onmatch` is similar. If a row matches in all other respects, its `onmatch` variable assignments happen. If the same row doesn't match in other respects, the `onmatch`ed variable concurs -- it doesn't make the assignment and returns False in the match.
60
+
61
+ `notnone` simply blocks a variable assignment if the value to be assigned is None. In that case the match vote is negative.
62
+
63
+ `increase` and `decrease` are similar to `notnone`. They block assignment and report False if the value to be assigned to the variable is less than or greater than the current value, respectively.
64
+
65
+ `asbool` overrides the previously described variable qualifiers. If `asbool` is found, the assignment returns True or False in the match according to the value the variable is set to. The interpretation of the variable's new value as a bool is similar to Python's `bool(x)`, but with the addition of "true" and "false" being treated as the True and False value, respectively.
66
+
67
+ Finally, `nocontrib` is superior to all the other variable qualifiers, from the perspective of match decision voting. If found, the assignment is not considered for matching.
68
+
69
+ The primacy or order of consideration is:
70
+
71
+ 1. Assignments with no qualifiers always succeed
72
+ 2. `nocontrib` overrides other assignment qualifiers
73
+ 3. `asbool` overlays all other assignment qualifiers except nocontrib
74
+ 4. `onmatch` determines if any of the below qualifiers come into play
75
+ 5. `notnone` is the next highest priority
76
+ 6. `increase` and `decrease` are the next highest priority after `notnone`
77
+ 7. `onchange` or `latch` are the lowest priority, meaning that they can be overridden, blocked from consideration, or have their match vote (or absence of voting) modified by any of the other variable qualifiers.
78
+
79
+ This may seem like a lot. And it is. The silver lining is that the qualifiers have a lot of expressive power that you can tap into when you need it, and ignore when you don't.
80
+
81
+
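Part of the match-value table in `assignment.md` can be modeled as a small Python function. This is a toy model for a single qualifier only, covering the unqualified, `notnone`, `onchange`, `increase`, and `decrease` rows. It is not CsvPath's implementation, the name `assignment_vote` is hypothetical, and `latch`, `onmatch`, `asbool`, and `nocontrib` are deliberately left out because they depend on state the table does not fully specify.

```python
from typing import Any, Optional


def assignment_vote(qualifier: Optional[str], current: Any, new: Any) -> bool:
    """Return the match value of `@x.<qualifier> = y` for x=current, y=new."""
    if qualifier is None:
        return True  # a plain assignment always matches
    if qualifier == "notnone":
        return new is not None
    if qualifier == "onchange":
        return new != current
    if qualifier == "increase":
        return new > current
    if qualifier == "decrease":
        return new < current
    raise ValueError(f"qualifier not modeled in this sketch: {qualifier}")
```

For instance, `assignment_vote("onchange", 1, 1)` is `False` because the value did not change, while `assignment_vote("increase", 5, 7)` is `True` because 7 is greater than 5.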
@@ -0,0 +1,69 @@
1
+
2
+ # Config
3
+
4
+ CsvPaths has a few config options. By default, the config options are in `./config/config.ini`. You can change the location of your .ini file in two ways:
5
+ - Set a `CSVPATH_CONFIG_FILE` env var pointing to your file
6
+ - Create an instance of CsvPathConfig, set its CONFIG property, and call the `reload()` method
7
+
8
+ The config options, at this time, are about:
9
+ - File extensions
10
+ - Error handling
11
+ - Logging
12
+
13
+ There are two types of files you can set extensions for:
14
+ - CSV files
15
+ - CsvPath files
16
+
17
+ ## File Extensions
18
+
19
+ The defaults for these are:
20
+
21
+ ```ini
22
+ [csvpath_files]
23
+ extensions = txt, csvpath, csvpaths
24
+
25
+ [csv_files]
26
+ extensions = txt, csv, tsv, dat, tab, psv, ssv
27
+ ```
28
+
29
+ ## Error Handling
30
+
31
+ The error settings are for when CsvPath or CsvPaths instances encounter problems. The options are:
32
+ - `stop` - Halt processing; the CsvPath stopped property is set to True
33
+ - `fail` - Mark the currently running CsvPath as having failed
34
+ - `raise` - Raise the exception in as noisy a way as possible
35
+ - `quiet` - Do nothing that affects the system out; this protects command line redirection of `print()` output. Logging is also minimized such that errors that would release a lot of metadata are slimmed down.
36
+ - `collect` - Collect the errors in the error results for the CsvPath. This option is available with and without a CsvPaths instance.
37
+
38
+ Several of these settings can be configured together. `quiet` and `raise` do not coexist well. When both are set, `raise` wins because seeing problems lets you fix them.
39
+
40
+ ## Logging
41
+
42
+ Logging levels are set at the major-component level. The components are:
43
+ - `csvpath`
44
+ - `csvpaths`
45
+ - `matcher`
46
+ - `scanner`
47
+
48
+ Four levels are available:
49
+ - `error`
50
+ - `warning`
51
+ - `debug`
52
+ - `info`
53
+
54
+ The levels are intended for the same functionality as their Python equivalents.
55
+
56
+ CsvPath logs are directed to a file. The log file settings are:
57
+ - `log_file` - a path to the log
58
+ - `log_files_to_keep` - a number of logs, 1 to 100, kept in rotation before being deleted
59
+ - `log_file_size` - an indication of roughly when a log file will be rotated
60
+
61
+ As an example:
62
+ ```ini
63
+ log_file = logs/csvpath.log
64
+ log_files_to_keep = 100
65
+ log_file_size = 52428800
66
+ ```
67
+
68
+
69
+
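The three log file settings in `config.md` map naturally onto Python's standard `logging.handlers.RotatingFileHandler`, where `maxBytes` is the rotation size and `backupCount` is the number of rotated files kept. The sketch below assumes that mapping. Whether CsvPath builds its handler this way is not shown in this diff, and `make_rotating_handler` and the logger name are hypothetical.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_rotating_handler(
    log_file: str, files_to_keep: int, file_size: int
) -> RotatingFileHandler:
    """Build a size-based rotating handler from the three log settings."""
    dirpath = os.path.dirname(log_file)
    if dirpath:
        os.makedirs(dirpath, exist_ok=True)  # the handler needs the directory
    return RotatingFileHandler(
        log_file, maxBytes=file_size, backupCount=files_to_keep
    )


with tempfile.TemporaryDirectory() as tmp:
    handler = make_rotating_handler(
        os.path.join(tmp, "logs", "csvpath.log"), 100, 52428800
    )
    logger = logging.getLogger("csvpath.rotation.sketch")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.info("rotation sketch")
    # Detach and close before the temporary directory is removed.
    logger.removeHandler(handler)
    handler.close()
    max_bytes, backup_count = handler.maxBytes, handler.backupCount
```

With these values a log file is rotated at roughly 50 MiB and up to 100 rotated files are kept before the oldest is deleted.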
@@ -0,0 +1,119 @@
1
+
2
+ # Examples
3
+
4
+ These are simple examples of csvpath match parts. Test them yourself before relying on them. See the unit tests for more simple path ideas.
5
+
6
+ 1. Find a value
7
+
8
+ ```bash
9
+ [ ~A running average of ages from 5 and 85~
10
+ between(#age, 4, 86)
11
+ @average_age.onmatch = average(#age, "match")
12
+ last.nocontrib() -> print("The average age between 5 and 85 is $.variables.average_age")
13
+ ]
14
+ ```
15
+
16
+ 2. Create a file
17
+
18
+ ```bash
19
+ [ ~Create a new CSV file sampling sales greater than $2000 in a region~
20
+ #region == or( "emea", "us" )
21
+ @r = random(0,1)
22
+ @line = line_count()
23
+ gt(#sale, 2000)
24
+ @ave = average.test.onmatch(#sale, "line")
25
+
26
+ count_lines() == 1 ->
27
+ print("line, region, average, sale, salesperson")
28
+ @r == 1 ->
29
+ print("$.variables.line, $.headers.region, $.variables.ave, $.headers.sale, $.headers.seller")
30
+ ]
31
+ ```
32
+
33
+ 3. Validate a file
34
+
35
+ ```bash
36
+ [ ~Apply five rules to check if this file meets expectations~
37
+ @last_age.onchange = @current_age
38
+ @current_age = #age
39
+
40
+ gt(length(#lastname), 30) -> print("$.csvpath.line_count: lastname $.headers.lastname is > 30")
41
+ not( column(2) == "firstname" ) -> print("$.csvpath.line_count: 3rd header must be firstname, not $.headers.2")
42
+ not(any(header())) -> print("$.csvpath.line_count: check for missing values")
43
+ not(in(#title, "ceo|minion")) -> print("$.csvpath.line_count: title cannot be $.headers.title")
44
+ gt(@last_age, @current_age) -> print("$.csvpath.line_count: check age, it went down!")
45
+ ]
46
+ ```
47
+
48
+ 4. Find a first value
49
+
50
+ ```bash
51
+ [ ~ Find the first times fruit were the most popular and the most recent popular fruit ~
52
+ @fruit = in( #food, "Apple|Pear|Blueberry")
53
+ exists( @fruit.asbool )
54
+ first.year.onmatch( #year )
55
+ @fruit.asbool -> print("$.headers.food was the most popular food for the first time in $.headers.year")
56
+ last.nocontrib() -> print("First years for a type of fruit: $.variables.year")
57
+ ]
58
+ ```
59
+
60
+ 5. Keep it simple
61
+
62
+ This works:
63
+
64
+ ```bash
65
+ $/User/fred/some_dir/csvpaths/test.csv[*][
66
+ line_count() == 1 -> print("$.csvpath.headers")
67
+ not( line_count() == 1 ) -> stop()
68
+ ]
69
+ ```
70
+
71
+ This is better:
72
+
73
+ ```bash
74
+ $test[*][
75
+ line_count() == 1 -> print("$.csvpath.headers")
76
+ not( line_count() == 1 ) -> stop()
77
+ ]
78
+ ```
79
+
80
+ Still better:
81
+
82
+ ```bash
83
+ $test[*][
84
+ line_count() == 1 -> print("$.csvpath.headers")
85
+ stop()
86
+ ]
87
+ ```
88
+
89
+ Moving on up:
90
+
91
+ ```bash
92
+ $test[*][
93
+ firstline() -> print("$.csvpath.headers")
94
+ stop()
95
+ ]
96
+ ```
97
+
98
+ Getting there:
99
+
100
+ ```bash
101
+ $test[*][
102
+ print("$.csvpath.headers")
103
+ stop()
104
+ ]
105
+ ```
106
+
107
+ Stop here?
108
+
109
+ ```bash
110
+ $test[0][ print("$.csvpath.headers") ]
111
+ ```
112
+
113
+ Best:
114
+
115
+ ```bash
116
+ $[0][ print("$.csvpath.headers") ]
117
+ ```
118
+
119
+
@@ -0,0 +1,55 @@
1
+
2
+ # Named Files
3
+
4
+ The file identifier following the root `$` and preceding the scanning part of the csvpath can be:
5
+ - A relative or absolute file path
6
+ - A logical identifier that points indirectly to a physical file, as described below
7
+ - The empty string, in which case the file association happens in CsvPaths on the fly
8
+
9
+ Filenames must match this regular expression: `[A-Z,a-z,0-9\._/\-\\#&]+`. That is, they may contain:
10
+
11
+ - alphanums
12
+ - forward and backward slashes
13
+ - dots
14
+ - hash marks
15
+ - dashes
16
+ - underscores, and
17
+ - ampersands.
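A quick way to check a candidate name against the pattern quoted above is Python's `re.fullmatch`. This is only a sketch: the `is_valid_filename` helper is a name invented for this example, not part of CsvPath.

```python
import re

# The pattern quoted above. fullmatch() requires the whole name to conform.
FILENAME_PATTERN = re.compile(r"[A-Z,a-z,0-9\._/\-\\#&]+")


def is_valid_filename(name: str) -> bool:
    """Return True if every character of name is allowed by the pattern."""
    return FILENAME_PATTERN.fullmatch(name) is not None
```

Note that the commas inside the character class are literal, so a name containing a comma also matches, while a name containing a space or a colon does not.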
18
+
19
+ ## Using CsvPaths To Work With Files
20
+
21
+ You can use the `CsvPaths` class to set up a list of named files so that you can have more concise csvpaths. Named files can take the form of:
22
+
23
+ - A JSON file with a dictionary of file system paths under name keys
24
+ - A dict object passed into the CsvPaths object containing the same name-to-file-path structure
25
+ - A file system path pointing to a directory that will be used to populate the named files with all contained files
26
+
27
+ Using named files requires `CsvPaths`, but the configuration happens in the `CsvPaths` instance's <a href='https://github.com/dk107dk/csvpath/blob/main/csvpath/managers/files_manager.py'>FilesManager</a>.
28
+
29
+ ## Example
30
+
31
+ ```python
32
+ paths = CsvPaths()
33
+ paths.files_manager.add_named_file("test", "tests/test_resources/test.csv")
34
+ path = paths.csvpath()
35
+ path.parse( """$test[*][#firstname=="Fred"]""" )
36
+ rows = path.collect()
37
+ ```
38
+ This csvpath will be applied to the file named `"test"` and match rows where the `firstname` is `"Fred"`. The matched rows will be returned from the `collect()` method.
39
+
40
+ ## FilesManager
41
+
42
+ The FilesManager methods are:
43
+
44
+ | Method | Description |
45
+ |-------------------------------------|---------------------------------------------------------------------|
46
+ | add_named_files_from_dir(dir_path) | Adds all files to the named files set by names minus any extension |
47
+ | add_named_files_from_json(filename) | Adds named files paths from a dict JSON structure found in the file |
48
+ | set_named_files(Dict[str, str]) | Replaces all named files with the contents of a dict |
49
+ | add_named_file(name, path) | Adds a single named file path |
50
+ | get_named_file(name) | Gets the full file path associated with the name |
51
+ | remove_named_file(name) | Removes a named file |
52
+
53
+
54
+ Using these methods you can set up a CsvPaths, like the example above, then use a csvpath like `$logical_name[*][yes()]` to apply the csvpath to the file named `logical_name` in your CsvPaths object's `files_manager`. This use is easy and nearly transparent.
55
+
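To picture the directory-based form, where each file is named by its file name minus any extension, here is a minimal standard-library sketch. `named_files_from_dir` is a stand-in written for this example and is not FilesManager's code. It strips only the final extension, which is an assumption about how "minus any extension" is interpreted.

```python
from pathlib import Path
from typing import Dict


def named_files_from_dir(dir_path: str) -> Dict[str, str]:
    """Map each file's name, minus its final extension, to its path."""
    return {
        entry.stem: str(entry)
        for entry in sorted(Path(dir_path).iterdir())
        if entry.is_file()  # subdirectories are ignored in this sketch
    }
```

A dict shaped like this is the same name-to-path structure that `set_named_files(Dict[str, str])` is described as accepting.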
@@ -0,0 +1,34 @@
1
+
2
+ # Above, Below
3
+
4
+ The ordinal comparison functions have multiple aliases but share a single, simple behavior. They are:
5
+
6
+ - `above()`, `gt()`, `after()`
7
+ - `below()`, `lt()`, `before()`
8
+
9
+ The functionality is just what you would expect: they implement the `>` and `<` operators as functions.
10
+
11
+ Above and below handle number, date, and string comparisons. At this time they don't have any settings that would refine their function for specific use cases. Comparison by the three types is attempted in this order:
12
+ - Number
13
+ - Date
14
+ - String
15
+
16
+ Keep in mind two important considerations:
17
+ - A number compared with a stringified number is a viable comparison, no different than a number and a number. The conversion is the same as in the underlying Python.
18
+ - These functions answer a question: _is this thing before or after this other thing?_ The answer to: _is None after 1048?_ is _no, it is not_. The result of comparing `None` or `nan` to a regular `int`, `float`, `date`, or `string` is `False`. The functions believe the question is valid and the answer is `False` because there is no ordinal relationship between the two things being compared.
19
+
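The comparison order and the `None`/`nan` rule described above can be sketched in plain Python. This is an illustration of the stated behavior under simplifying assumptions, not the library's implementation. `is_above` and its helpers are hypothetical names.

```python
import math
from datetime import date


def _is_missing(value) -> bool:
    """None and NaN have no ordinal relationship to anything."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def _as_number(value):
    """Return value as a float, or None if it is not number-like."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_above(left, right) -> bool:
    """Sketch of a '>' comparison tried as number, then date, then string."""
    if _is_missing(left) or _is_missing(right):
        return False
    # 1. Number: a stringified number such as "3" converts like a number.
    left_num, right_num = _as_number(left), _as_number(right)
    if left_num is not None and right_num is not None:
        return left_num > right_num
    # 2. Date: only compare two values of the same date type.
    if isinstance(left, date) and type(left) is type(right):
        return left > right
    # 3. String: fall back to comparing the string forms.
    return str(left) > str(right)
```

Trying the numeric comparison first is what makes `is_above("10", 9)` true here, even though `"10"` sorts before `"9"` as a string.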
20
+ There are no differences between each function's three aliases. The reason to have them is just to suit the use case. For instance, in comparing two numbers `lt()` and `gt()` may feel more right; whereas, in finding a point in time between two dates `before()` and `after()` may feel like a better fit.
21
+
22
+ ## Examples
23
+
24
+ ```bash
25
+ $[*][ before(#graduation, 2003) ]
26
+ ```
27
+ This path matches on year of graduation as an int.
28
+
29
+ ```bash
30
+ $[*][ before(@graduation, date("2020-05-30", "%Y-%m-%d"))
31
+ after(@graduation, date("2000-05-30", "%Y-%m-%d"))]
32
+ ```
33
+ This path uses `before()` and `after()` to check whether a date falls between two dates.
34
+
@@ -0,0 +1,27 @@
1
+
2
+ # Advance
3
+
4
+ `advance()` allows your csvpath to skip ahead some number of rows. Skipping rows avoids the processing overhead of checking if the rows match. The rows skipped are, of course, iterated; however, depending on the content of your csvpath, substantial latency may be avoided. Advance may be most useful for sampling or spot-checking.
5
+
6
+ Note that you cannot advance during an advance. This is because `advance()` is a match component and will not be activated during the skipped iterations of an existing call to `advance()`.
7
+
8
+ ## Examples
9
+
10
+ ```bash
11
+ $[1*][
12
+ ~ collect a sample of 1000 responders ~
13
+ below(count(), 1001) -> advance(random(1,50))
14
+ ]
15
+ ```
16
+
17
+ This csvpath collects a random sample of 1000 rows, starting after the header row. The sampled rows are from 1 to 50 lines apart.
18
+
19
+ One way to run it looks like this, using dict objects to identify the csvpaths and files. There are, of course, other simple ways to do it.
20
+
21
+ ```python
22
+ paths = CsvPaths()
23
+ paths.files_manager.set_named_files(nf)
24
+ paths.paths_manager.set_named_paths(np)
25
+ lines = paths.collect_paths(pathsname="sample", filename="survey")
26
+ ```
27
+
@@ -0,0 +1,18 @@
1
+
2
+ # After Blank
3
+
4
+ `after_blank()` matches when the current line was preceded by a line with no values.
5
+
6
+ Bear in mind a few things:
7
+ - By default CsvPath skips truly blank lines
8
+ - Lines with delimiters but no data are not exactly blank
9
+ - Lines with fewer than the expected number of headers aren't blank
10
+
11
+ This function considers:
12
+ - The physical lines in the file
13
+ - The possibility of delimiters but no values
14
+
15
+ In short, if the preceding line had no data under any header, or contained no characters at all (or only whitespace characters), `after_blank()` returns True.
16
+
17
+ `after_blank()` takes no arguments and always returns a bool value.
18
+
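The rule above can be sketched over lines that have already been split into fields. This is only an illustration of the described behavior. `is_blank` and `after_blank_flags` are hypothetical helpers, not CsvPath functions, and the sketch assumes every physical line is visited.

```python
from typing import Iterable, List, Sequence


def is_blank(fields: Sequence[str]) -> bool:
    """True when a line has no fields, or only empty/whitespace-only fields."""
    return all(field.strip() == "" for field in fields)


def after_blank_flags(lines: Iterable[Sequence[str]]) -> List[bool]:
    """For each line, report whether the line before it was blank."""
    flags = []
    previous_blank = False  # the first line has no predecessor
    for fields in lines:
        flags.append(previous_blank)
        previous_blank = is_blank(fields)
    return flags
```

A line made only of delimiters splits into empty fields, so it counts as blank here, as does a line with no characters at all.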