persidict 0.4.1__tar.gz → 0.5.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


This version of persidict might be problematic.

@@ -1,6 +1,6 @@
  MIT License
 
- Copyright (c) 2020-2023 Vlad (Volodymyr) Pavlov, Illia Shestakov, Kai Zhao.
+ Copyright (c) 2020-2024 Vlad (Volodymyr) Pavlov, Illia Shestakov, Kai Zhao.
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: persidict
- Version: 0.4.1
+ Version: 0.5.0
  Summary: Simple persistent key-value store for Python. Values are stored as files on a disk or as S3 objects on AWS cloud.
  Home-page: https://github.com/vladlpavlov/persidict
  Author: Vlad (Volodymyr) Pavlov
@@ -149,6 +149,7 @@ that simultaneously work with the same instance of a dictionary.
 
  * Keys must be sequences of URL/filename-safe non-empty strings.
  * Values must be pickleable Python objects.
+ * You can constrain values to be an instance of a specific class.
  * Insertion order is not preserved.
  * You can not assign initial key-value pairs to a dictionary in its constructor.
  * `PersiDict` API has additional methods `delete_if_exists()`, `mtimestamp()`,
@@ -160,9 +161,18 @@ not available in Python dicts.
  `PersiDict` subclasses have a number of parameters that can be used
  to impact behaviour of a dictionary.
 
+ * `base_class_for_values` - A base class for values stored in a dictionary.
+ If specified, it will be used to check types of values in the dictionary.
+ If not specified (if set to `None`), no type checking will be performed
+ and all types will be allowed.
  * `file_type` - a string that specifies the type of files used to store objects.
- Possible values are "json" and "lz4". Default value is "lz4".
- Storing objects as JSON files is mostly supported for debugging purposes.
+ If `file_type` has one of two values: "lz4" or "json", it defines
+ which file format will be used by the dictionary to store values.
+ For all other values of `file_type`, the file format will always be plain
+ text. "lz4" and "json" allow storing arbitrary Python objects,
+ while all other `file_type` values only work with str objects;
+ this means `base_class_for_values` must be explicitly set to `str`
+ if `file_type` is not set to "lz4" or "json".
  * `immutable_items` - a boolean that specifies whether items in a dictionary
  can be modified/deleted. It enables various distributed cache optimizations
  for remote storage. True means an append-only dictionary.
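The interaction between `file_type` and `base_class_for_values` described in this hunk can be restated as a small decision rule. The sketch below is only an illustration, not persidict code: the helper name `storage_format` and its return strings are hypothetical, and the branch order (named formats first, then the `str` check) is assumed from the `_save_to_file` changes in this diff.

```python
from typing import Optional


def storage_format(file_type: str,
                   base_class_for_values: Optional[type]) -> str:
    """Return the on-disk format implied by the two parameters.

    Hypothetical helper that restates the documented rule.
    """
    # "lz4" and "json" select a named format regardless of the value type.
    if file_type in {"lz4", "json"}:
        return file_type
    # Any other file_type is only valid when values are constrained to str.
    if (base_class_for_values is not None
            and issubclass(base_class_for_values, str)):
        return "plain text"
    raise ValueError(
        "For non-string values file_type must be either lz4 or json")


print(storage_format("lz4", None))
print(storage_format("txt", str))
```

Under these assumptions, a custom extension such as `"txt"` is accepted only together with `base_class_for_values=str`; every other combination has to use `"lz4"` or `"json"`.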
@@ -189,6 +199,7 @@ Binary installers for the latest released version are available at the Python pa
 
  * [jsonpickle](https://jsonpickle.github.io)
  * [joblib](https://joblib.readthedocs.io)
+ * [lz4](https://python-lz4.readthedocs.io)
  * [pandas](https://pandas.pydata.org)
  * [numpy](https://numpy.org)
  * [boto3](https://boto3.readthedocs.io)
@@ -120,6 +120,7 @@ that simultaneously work with the same instance of a dictionary.
 
  * Keys must be sequences of URL/filename-safe non-empty strings.
  * Values must be pickleable Python objects.
+ * You can constrain values to be an instance of a specific class.
  * Insertion order is not preserved.
  * You can not assign initial key-value pairs to a dictionary in its constructor.
  * `PersiDict` API has additional methods `delete_if_exists()`, `mtimestamp()`,
@@ -131,9 +132,18 @@ not available in Python dicts.
  `PersiDict` subclasses have a number of parameters that can be used
  to impact behaviour of a dictionary.
 
+ * `base_class_for_values` - A base class for values stored in a dictionary.
+ If specified, it will be used to check types of values in the dictionary.
+ If not specified (if set to `None`), no type checking will be performed
+ and all types will be allowed.
  * `file_type` - a string that specifies the type of files used to store objects.
- Possible values are "json" and "lz4". Default value is "lz4".
- Storing objects as JSON files is mostly supported for debugging purposes.
+ If `file_type` has one of two values: "lz4" or "json", it defines
+ which file format will be used by the dictionary to store values.
+ For all other values of `file_type`, the file format will always be plain
+ text. "lz4" and "json" allow storing arbitrary Python objects,
+ while all other `file_type` values only work with str objects;
+ this means `base_class_for_values` must be explicitly set to `str`
+ if `file_type` is not set to "lz4" or "json".
  * `immutable_items` - a boolean that specifies whether items in a dictionary
  can be modified/deleted. It enables various distributed cache optimizations
  for remote storage. True means an append-only dictionary.
@@ -160,6 +170,7 @@ Binary installers for the latest released version are available at the Python pa
 
  * [jsonpickle](https://jsonpickle.github.io)
  * [joblib](https://joblib.readthedocs.io)
+ * [lz4](https://python-lz4.readthedocs.io)
  * [pandas](https://pandas.pydata.org)
  * [numpy](https://numpy.org)
  * [boto3](https://boto3.readthedocs.io)
@@ -3,13 +3,14 @@
  This functionality is implemented by the class FileDirDict
  (inherited from PersiDict): a dictionary that
  stores key-value pairs as files on a local hard-drive.
- A key is used to compose a filename, while a value is stored
- as a binary or a json object in the file.
+ A key is used to compose a filename, while a value is stored in the file
+ as a binary object, as a json object, or as plain text
+ (depending on configuration parameters).
  """
  from __future__ import annotations
 
  import os
- from typing import Any
+ from typing import Any, Optional, Union
 
  import joblib
  import jsonpickle
@@ -33,31 +34,43 @@ class FileDirDict(PersiDict):
  Insertion order is not preserved.
 
  FileDirDict can store objects in binary files or in human-readable
- text files (using jsonpickles).
+ text files (either in JSON format or as plain text).
  """
 
  def __init__(self
  , dir_name: str = "FileDirDict"
  , file_type: str = "lz4"
  , immutable_items:bool = False
- , digest_len:int = 8):
+ , digest_len:int = 8
+ , base_class_for_values: Optional[type] = None):
  """A constructor defines location of the store and file format to use.
 
  dir_name is a directory that will contain all the files in
  the FileDirDict. If the directory does not exist, it will be created.
 
- file_type can take one of two values: "lz4" or "json".
- It defines which file format will be used by FileDirDict
- to store values.
+ base_class_for_values constrains the type of values that can be
+ stored in the dictionary. If specified, it will be used to
+ check types of values in the dictionary. If not specified,
+ no type checking will be performed and all types will be allowed.
+
+ file_type is the extension that will be used for all files in the dictionary.
+ If file_type has one of two values: "lz4" or "json", it defines
+ which file format will be used by FileDirDict to store values.
+ For all other values of file_type, the file format will always be plain
+ text. "lz4" and "json" allow storing arbitrary Python objects,
+ while all other file_type values only work with str objects.
  """
 
  super().__init__(immutable_items = immutable_items
- ,digest_len = digest_len)
+ ,digest_len = digest_len
+ ,base_class_for_values = base_class_for_values)
 
  self.file_type = file_type
 
- assert file_type in {"json", "lz4"}, (
- "file_type must be either lz4 or json")
+ if (base_class_for_values is None or
+ not issubclass(base_class_for_values,str)):
+ assert file_type in {"json", "lz4"}, ("For non-string values "
+ + "file_type must be either lz4 or json")
  assert not os.path.isfile(dir_name)
  if not os.path.isdir(dir_name):
  os.mkdir(dir_name)
@@ -80,10 +93,11 @@ class FileDirDict(PersiDict):
  """ Get number of key-value pairs in the dictionary."""
 
  num_files = 0
+ suffix = "." + self.file_type
  for subdir_info in os.walk(self.base_dir):
  files = subdir_info[2]
  files = [f_name for f_name in files
- if f_name.endswith(self.file_type)]
+ if f_name.endswith(suffix)]
  num_files += len(files)
  return num_files
 
@@ -95,8 +109,9 @@ class FileDirDict(PersiDict):
 
  for subdir_info in os.walk(self.base_dir, topdown=False):
  (subdir_name, _, files) = subdir_info
+ suffix = "." + self.file_type
  for f in files:
- if f.endswith(self.file_type):
+ if f.endswith(suffix):
  os.remove(os.path.join(subdir_name, f))
  if (subdir_name != self.base_dir) and (
  len(os.listdir(subdir_name)) == 0 ):
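The switch from `endswith(self.file_type)` to `endswith("." + self.file_type)` in the counting and clearing loops changes which files are treated as dictionary entries. A minimal sketch of the difference, using made-up file names:

```python
file_type = "json"

# Hypothetical directory listing; only the first two names carry
# a real ".json" extension.
file_names = ["a.json", "b.json", "notes_json", "c.myjson"]

# Old check: any name that merely ends with the letters "json" matches.
old_matches = [n for n in file_names if n.endswith(file_type)]

# New check: the dot is part of the suffix, so only true extensions match.
suffix = "." + file_type
new_matches = [n for n in file_names if n.endswith(suffix)]

print(old_matches)
print(new_matches)
```

Because `file_type` can now be an arbitrary string, matching without the dot would be more likely to pick up unrelated files that happen to share trailing characters.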
@@ -149,8 +164,12 @@ class FileDirDict(PersiDict):
  elif self.file_type == "json":
  with open(file_name, 'r') as f:
  result = jsonpickle.loads(f.read())
+ elif issubclass(self.base_class_for_values, str):
+ with open(file_name, 'r') as f:
+ result = f.read()
  else:
- raise ValueError("file_type must be either lz4 or json")
+ raise ValueError("When base_class_for_values is not str,"
+ + " file_type must be either lz4 or json")
  return result
 
  def _save_to_file(self, file_name:str, value:Any) -> None:
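The plain-text branches added to `_read_from_file` and `_save_to_file` reduce to an ordinary text-mode write and read. The following sketch reproduces that round trip outside the class; the function names `save_text` and `load_text` and the file name are hypothetical.

```python
import os
import tempfile


def save_text(file_name: str, value: str) -> None:
    # Same call shape as the new branch: text mode, no explicit encoding.
    with open(file_name, "w") as f:
        f.write(value)


def load_text(file_name: str) -> str:
    with open(file_name, "r") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "greeting.txt")
    save_text(path, "hello\nworld")
    restored = load_text(path)

print(restored == "hello\nworld")
```

Neither `open` call in the diff passes `encoding`, so Python falls back to the platform default; values containing non-ASCII characters may therefore be stored differently on different systems.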
@@ -162,8 +181,12 @@ class FileDirDict(PersiDict):
  elif self.file_type == "json":
  with open(file_name, 'w') as f:
  f.write(jsonpickle.dumps(value, indent=4))
+ elif issubclass(self.base_class_for_values, str):
+ with open(file_name, 'w') as f:
+ f.write(value)
  else:
- raise ValueError("file_type must be either lz4 or json")
+ raise ValueError("When base_class_for_values is not str,"
+ + " file_type must be either lz4 or json")
 
  def __contains__(self, key:PersiDictKey) -> bool:
  """True if the dictionary has the specified key, else False. """
@@ -182,6 +205,11 @@ class FileDirDict(PersiDict):
 
  def __setitem__(self, key:PersiDictKey, value:Any):
  """Set self[key] to value."""
+ if self.base_class_for_values is not None:
+ if not isinstance(value, self.base_class_for_values):
+ raise TypeError(
+ f"Value must be of type {self.base_class_for_values}")
+
  key = SafeStrTuple(key)
  filename = self._build_full_path(key, create_subdirs=True)
  if self.immutable_items:
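The type check added at the top of `__setitem__` can be tried in isolation. The sketch below uses a hypothetical in-memory class, `CheckedDict`, as a stand-in for a `PersiDict` subclass; only the `isinstance` guard is taken from the diff.

```python
from typing import Any, Optional


class CheckedDict(dict):
    """Hypothetical in-memory stand-in, not part of persidict."""

    def __init__(self, base_class_for_values: Optional[type] = None):
        super().__init__()
        self.base_class_for_values = base_class_for_values

    def __setitem__(self, key: Any, value: Any) -> None:
        # Same guard as in the diff: skip the check when no class is set.
        if self.base_class_for_values is not None:
            if not isinstance(value, self.base_class_for_values):
                raise TypeError(
                    f"Value must be of type {self.base_class_for_values}")
        super().__setitem__(key, value)


d = CheckedDict(base_class_for_values=int)
d["a"] = 1
d["b"] = True  # accepted: bool is a subclass of int

try:
    d["c"] = "text"
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```

Since the guard relies on `isinstance`, instances of subclasses pass as well, which matches the parameter's description as a base class rather than an exact type.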
@@ -218,9 +246,10 @@ class FileDirDict(PersiDict):
  return tuple(result)
 
  def step():
+ suffix = "." + self.file_type
  for dir_name, _, files in walk_results:
  for f in files:
- if f.endswith(self.file_type):
+ if f.endswith(suffix):
  prefix_key = os.path.relpath(
  dir_name, start=self.base_dir)
 
@@ -22,7 +22,7 @@ from __future__ import annotations
 
  from abc import abstractmethod
  import random
- from typing import Any, Dict, Union, Sequence
+ from typing import Any, Dict, Union, Sequence, Optional
  from collections.abc import MutableMapping
 
  from .safe_str_tuple import SafeStrTuple
@@ -42,7 +42,8 @@ class PersiDict(MutableMapping):
 
  An abstract base class for key-value stores. It accepts keys in a form of
  SafeStrSequence - a URL/filename-safe sequence of strings.
- It imposes no restrictions on types of values in the key-value pairs.
+ It assumes no restrictions on types of values in the key-value pairs,
+ but allows users to impose such restrictions.
 
  The API for the class resembles the API of Python's built-in Dict
  (see https://docs.python.org/3/library/stdtypes.html#mapping-types-dict)
@@ -68,6 +69,12 @@ class PersiDict(MutableMapping):
  of persistent dictionaries with case-insensitive
  (even if case-preserving) filesystems, such as MacOS HFS.
 
+ base_class_for_values: Any
+ A base class for values stored in the dictionary.
+ If specified, it will be used to check types of values
+ in the dictionary. If not specified, no type checking
+ will be performed and all types will be allowed.
+
  """
  # TODO: refactor to support variable length of min_digest_len
  digest_len:int
@@ -76,10 +83,12 @@ class PersiDict(MutableMapping):
  def __init__(self
  , immutable_items:bool
  , digest_len:int = 8
+ , base_class_for_values:Any = None
  , *args, **kwargas):
  assert digest_len >= 0
  self.digest_len = int(digest_len)
  self.immutable_items = bool(immutable_items)
+ self.base_class_for_values = base_class_for_values
 
 
  def __repr__(self):
@@ -88,6 +97,7 @@ class PersiDict(MutableMapping):
  repr_str += repr(dict(self.items()))
  repr_str += f", immutable_items={self.immutable_items}"
  repr_str += f", digest_len={self.digest_len}"
+ repr_str += f", base_class_for_values={self.base_class_for_values}"
  repr_str += ")"
  return repr_str
 
@@ -35,6 +35,7 @@ class S3Dict(PersiDict):
  , file_type:str = "pkl"
  , immutable_items:bool = False
  , digest_len:int = 8
+ , base_class_for_values:Any = None
  ,*args ,**kwargs):
  """A constructor defines location of the store and object format to use.
 
@@ -46,9 +47,17 @@ class S3Dict(PersiDict):
 
  dir_name is a local directory that will be used to store tmp files.
 
- file_type can take one of two values: "pkl" or "json".
- It defines which object format will be used by S3_Dict
- to store values.
+ base_class_for_values constrains the type of values that can be
+ stored in the dictionary. If specified, it will be used to
+ check types of values in the dictionary. If not specified,
+ no type checking will be performed and all types will be allowed.
+
+ file_type is the extension that will be used for all files in the dictionary.
+ If file_type has one of two values: "lz4" or "json", it defines
+ which file format will be used by FileDirDict to store values.
+ For all other values of file_type, the file format will always be plain
+ text. "lz4" and "json" allow storing arbitrary Python objects,
+ while all other file_type values only work with str objects.
  """
 
  super().__init__(immutable_items = immutable_items, digest_len = 0)
@@ -58,6 +67,7 @@ class S3Dict(PersiDict):
  dir_name = dir_name
  , file_type = file_type
  , immutable_items = immutable_items
+ , base_class_for_values=base_class_for_values
  , digest_len = digest_len)
 
  self.region = region
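In the `S3Dict` constructor hunks, `base_class_for_values` is forwarded to the local cache, while the `super().__init__(immutable_items = immutable_items, digest_len = 0)` call shown as context does not pass it. Assuming nothing else assigns the attribute afterwards, the base-class initializer would leave it at its default of `None`. The sketch below illustrates that effect with hypothetical classes (`Base`, `NotForwarding`, `Forwarding`); it is not persidict code.

```python
from typing import Optional


class Base:
    def __init__(self, digest_len: int = 8,
                 base_class_for_values: Optional[type] = None):
        self.digest_len = digest_len
        self.base_class_for_values = base_class_for_values


class NotForwarding(Base):
    def __init__(self, base_class_for_values: Optional[type] = None):
        # Mirrors the call shape in the hunk: the argument is dropped here.
        super().__init__(digest_len=0)


class Forwarding(Base):
    def __init__(self, base_class_for_values: Optional[type] = None):
        # Passing the argument on keeps the constraint on the instance.
        super().__init__(digest_len=0,
                         base_class_for_values=base_class_for_values)


print(NotForwarding(str).base_class_for_values)
print(Forwarding(str).base_class_for_values)
```

If the attribute does stay `None` on the outer object, an `isinstance` guard that reads `self.base_class_for_values` would be skipped there, so forwarding the argument in the `super().__init__` call may be worth considering.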
@@ -137,6 +147,11 @@ class S3Dict(PersiDict):
  def __setitem__(self, key:PersiDictKey, value:Any):
  """Set self[key] to value. """
 
+ if self.base_class_for_values is not None:
+ if not isinstance(value, self.base_class_for_values):
+ raise TypeError(
+ f"Value must be of type {self.base_class_for_values}")
+
  key = SafeStrTuple(key)
  file_name = self.local_cache._build_full_path(key, create_subdirs=True)
  obj_name = self._build_full_objectname(key)
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: persidict
- Version: 0.4.1
+ Version: 0.5.0
  Summary: Simple persistent key-value store for Python. Values are stored as files on a disk or as S3 objects on AWS cloud.
  Home-page: https://github.com/vladlpavlov/persidict
  Author: Vlad (Volodymyr) Pavlov
@@ -149,6 +149,7 @@ that simultaneously work with the same instance of a dictionary.
 
  * Keys must be sequences of URL/filename-safe non-empty strings.
  * Values must be pickleable Python objects.
+ * You can constrain values to be an instance of a specific class.
  * Insertion order is not preserved.
  * You can not assign initial key-value pairs to a dictionary in its constructor.
  * `PersiDict` API has additional methods `delete_if_exists()`, `mtimestamp()`,
@@ -160,9 +161,18 @@ not available in Python dicts.
  `PersiDict` subclasses have a number of parameters that can be used
  to impact behaviour of a dictionary.
 
+ * `base_class_for_values` - A base class for values stored in a dictionary.
+ If specified, it will be used to check types of values in the dictionary.
+ If not specified (if set to `None`), no type checking will be performed
+ and all types will be allowed.
  * `file_type` - a string that specifies the type of files used to store objects.
- Possible values are "json" and "lz4". Default value is "lz4".
- Storing objects as JSON files is mostly supported for debugging purposes.
+ If `file_type` has one of two values: "lz4" or "json", it defines
+ which file format will be used by the dictionary to store values.
+ For all other values of `file_type`, the file format will always be plain
+ text. "lz4" and "json" allow storing arbitrary Python objects,
+ while all other `file_type` values only work with str objects;
+ this means `base_class_for_values` must be explicitly set to `str`
+ if `file_type` is not set to "lz4" or "json".
  * `immutable_items` - a boolean that specifies whether items in a dictionary
  can be modified/deleted. It enables various distributed cache optimizations
  for remote storage. True means an append-only dictionary.
@@ -189,6 +199,7 @@ Binary installers for the latest released version are available at the Python pa
 
  * [jsonpickle](https://jsonpickle.github.io)
  * [joblib](https://joblib.readthedocs.io)
+ * [lz4](https://python-lz4.readthedocs.io)
  * [pandas](https://pandas.pydata.org)
  * [numpy](https://numpy.org)
  * [boto3](https://boto3.readthedocs.io)
@@ -5,7 +5,7 @@ with open("README.md", "r") as f:
 
  setuptools.setup(
  name="persidict"
- ,version="0.4.1"
+ ,version="0.5.0"
  ,author="Vlad (Volodymyr) Pavlov"
  ,author_email="vlpavlov@ieee.org"
  ,description= "Simple persistent key-value store for Python. "
File without changes