jsonlitedb 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Metadata-Version: 2.1
Name: jsonlitedb
Version: 0.1.0
Summary: A lightweight JSON-based database module
Home-page: https://github.com/Jwink3101/jsonlitedb
Author: Justin Winokur
Author-email: Jwink3101@users.noreply.github.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown

# JSONLiteDB

SQLite3-backed JSON document database with support for indices and advanced queries.

![100% Coverage][100%]

## Premise and Inspiration

JSONLiteDB leverages [SQLite3](https://sqlite.org/index.html) and [JSON1](https://sqlite.org/json1.html) to create a fast JSON document store with easy persistence, indexing capability, and extensible use.

JSONLiteDB provides an easy API with no need to load the entire database into memory, nor dump it when inserting! JSONLiteDB SQLite files are easily usable in other tools with no proprietary formats or encoding. JSONLiteDB is a great replacement for reading a JSON or JSONLines file. Entries can be modified in place. Queries can be indexed for *greatly* improved query speed and, optionally, to enforce uniqueness.
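
The core idea behind a SQLite-backed JSON store can be sketched with nothing but the standard-library `sqlite3` module. Each document is stored as JSON text in one column, and fields are read with JSON1's `json_extract()` inside SQL, so filtering happens in SQLite rather than in Python memory. This is an illustrative sketch, not JSONLiteDB's actual schema: the table name `docs` and column name `data` are assumptions, and it assumes your SQLite build includes the JSON1 functions (built in since SQLite 3.38).

```python
import json
import sqlite3

# Hypothetical single-table layout: one JSON document per row.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")

items = [
    {"first": "John", "last": "Lennon", "born": 1940},
    {"first": "George", "last": "Martin", "born": 1926},
]
con.executemany(
    "INSERT INTO docs (data) VALUES (?)", [(json.dumps(i),) for i in items]
)

# json_extract() pulls a field straight out of the stored JSON text,
# so the WHERE clause is evaluated by SQLite, not by Python.
rows = con.execute(
    "SELECT data FROM docs WHERE json_extract(data, '$.born') < ?", (1930,)
).fetchall()
results = [json.loads(r[0]) for r in rows]
print(results)  # [{'first': 'George', 'last': 'Martin', 'born': 1926}]
```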

Similar tools and inspiration:

- [TinyDB](https://github.com/msiemens/tinydb). The API and process of TinyDB heavily inspired JSONLiteDB. But TinyDB reads the entire JSON DB into memory and needs to dump the entire database upon insertion. That is hardly efficient or scalable, and queries are still O(N).

- [Dataset](https://github.com/pudo/dataset) is promising but creates new columns for every key and is very "heavy" with its dependencies. As far as I can tell, there is no native way to support multi-column and/or unique indexes. But still, a very promising tool!

- [DictTable](https://github.com/Jwink3101/dicttable) (also written by me) is nice but entirely in-memory and not always efficient for non-equality queries.

<!--- BEGIN AUTO GENERATED -->
<!--- Auto Generated -->
<!--- DO NOT MODIFY. WILL NOT BE SAVED -->
## Basic Usage

With some fake data.

```python
>>> from jsonlitedb import JSONLiteDB
```

```python
>>> db = JSONLiteDB(":memory:")
>>> # more generally:
>>> # db = JSONLiteDB('my_data.db')
```

Insert some data. You can use `insert()` with any number of items or `insertmany()` with an iterable (`insertmany([...])` is equivalent to `insert(*[...])`).

You can also use a context manager (`with db: ...`) to batch the insertions (or deletions).

```python
>>> db.insert(
...     {"first": "John", "last": "Lennon", "born": 1940, "role": "guitar"},
...     {"first": "Paul", "last": "McCartney", "born": 1942, "role": "bass"},
...     {"first": "George", "last": "Harrison", "born": 1943, "role": "guitar"},
...     {"first": "Ringo", "last": "Starr", "born": 1940, "role": "drums"},
...     {"first": "George", "last": "Martin", "born": 1926, "role": "producer"},
... )
```
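
Why batching helps can be illustrated with plain `sqlite3`: running many inserts inside one transaction means a single commit for the whole batch, and a failure part-way through rolls the whole batch back. This sketch assumes `with db:` behaves along these lines; it uses the standard-library connection context manager on a hypothetical `docs` table and is not JSONLiteDB's code.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
con.commit()

batch = [{"first": "John"}, {"first": "Paul"}, {"first": "George"}]

# The connection's context manager commits once when the block exits,
# so every insert in the batch shares a single transaction.
with con:
    for item in batch:
        con.execute("INSERT INTO docs (data) VALUES (?)", (json.dumps(item),))

# If anything fails inside the block, the whole batch is rolled back.
try:
    with con:
        con.execute(
            "INSERT INTO docs (data) VALUES (?)", (json.dumps({"first": "Ringo"}),)
        )
        raise RuntimeError("simulated failure mid-batch")
except RuntimeError:
    pass

count = con.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
print(count)  # 3: the failed batch left nothing behind
```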

```python
>>> len(db)
```

    5

```python
>>> list(db)
```

    [{'first': 'John', 'last': 'Lennon', 'born': 1940, 'role': 'guitar'},
     {'first': 'Paul', 'last': 'McCartney', 'born': 1942, 'role': 'bass'},
     {'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
     {'first': 'Ringo', 'last': 'Starr', 'born': 1940, 'role': 'drums'},
     {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

### Simple Queries

Let's do some simple queries. By default, `query()` returns an iterator, so we wrap the results in a list.

```python
>>> list(db.query(first="George"))
```

    [{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
     {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

```python
>>> list(db.query(first="George", last="Martin"))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Now let's query with a dictionary to match against.

```python
>>> list(db.query({"first": "George"}))
```

    [{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
     {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Multiple keys are always combined with AND.

```python
>>> list(db.query({"first": "George", "last": "Martin"}))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

The conditions can also be split across separate dictionaries; they are still combined with AND.

```python
>>> list(db.query({"first": "George"}, {"last": "Martin"}))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
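
One plausible way to translate these equality queries into SQL is to turn every key from every dictionary (and keyword argument) into a `json_extract(...) = ?` condition and join them with `AND`. The `build_where()` helper below is hypothetical, handles only plain top-level string keys, and is not JSONLiteDB's implementation. It writes the JSON path into the SQL as a literal (so the expression text stays stable) and binds only the value as a parameter.

```python
import json
import sqlite3


def build_where(*dicts, **kwargs):
    """Hypothetical: AND together json_extract equality tests for every key."""
    conditions, params = [], []
    for d in (*dicts, kwargs):
        for key, value in d.items():
            path = '$."{}"'.format(key)
            # Embed the path as an SQL string literal, escaping single quotes.
            literal = "'" + path.replace("'", "''") + "'"
            conditions.append(f"json_extract(data, {literal}) = ?")
            params.append(value)
    # "1" (always true) if no conditions were given.
    return " AND ".join(conditions) or "1", params


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
people = [
    {"first": "George", "last": "Harrison", "born": 1943},
    {"first": "George", "last": "Martin", "born": 1926},
    {"first": "Paul", "last": "McCartney", "born": 1942},
]
con.executemany(
    "INSERT INTO docs (data) VALUES (?)", [(json.dumps(p),) for p in people]
)

where, params = build_where({"first": "George"}, {"last": "Martin"})
print(where)
# json_extract(data, '$."first"') = ? AND json_extract(data, '$."last"') = ?
rows = con.execute(f"SELECT data FROM docs WHERE {where}", params).fetchall()
matches = [json.loads(r[0]) for r in rows]
print(matches)  # [{'first': 'George', 'last': 'Martin', 'born': 1926}]
```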

### Query Objects

Query objects enable more complex combinations and inequalities. Query objects can come from the database (`db.Query` or `db.Q`) or be created on their own (`Query()` or `Q()`). All four are equivalent.

```python
>>> list(db.query(db.Q.first == "George"))
```

    [{'first': 'George', 'last': 'Harrison', 'born': 1943, 'role': 'guitar'},
     {'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Be careful with parentheses: in Python, `&` and `|` have higher precedence than comparison operators such as `==`, so each comparison must be wrapped in its own parentheses.

```python
>>> list(db.query((db.Q.first == "George") & (db.Q.last == "Martin")))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
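
The precedence pitfall can be checked with Python's own parser via the standard-library `ast` module. Without parentheses, `&` grabs the two middle operands first, and the result is a chained comparison rather than an AND of two comparisons. Plain names `a` and `b` stand in for `db.Q` attributes here.

```python
import ast


def shape(expr):
    """Return the type name of the top-level AST node of a Python expression."""
    return type(ast.parse(expr, mode="eval").body).__name__


# Without parentheses Python reads this as: a == ("George" & b) == "Martin",
# i.e. a chained comparison whose middle operand is a bitwise AND.
unparenthesized = 'a == "George" & b == "Martin"'
# With parentheses the top-level operation is the intended `&`.
parenthesized = '(a == "George") & (b == "Martin")'

print(shape(unparenthesized))  # Compare
print(shape(parenthesized))    # BinOp

tree = ast.parse(unparenthesized, mode="eval").body
print(type(tree.comparators[0]).__name__)  # BinOp (the "George" & b part)
```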

Inequalities work too:

```python
>>> list(db.query(db.Q.born < 1930))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

Queries support `==`, `!=`, `<`, `<=`, `>`, and `>=` for normal comparisons.

In addition, they support:

- `%` : `LIKE`
- `*` : `GLOB`
- `@` : `REGEXP` using Python's regex module

```python
>>> list(db.query(db.Q.role % "prod%"))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

```python
>>> list(db.query(db.Q.role * "prod*"))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

```python
>>> list(db.query(db.Q.role @ "prod"))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]
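
These three operators correspond to SQLite's `LIKE`, `GLOB`, and `REGEXP`. SQLite ships no built-in `REGEXP` implementation, so a Python function has to be registered for it; the sketch below does this with `re.search` via `sqlite3`'s `create_function()`. This is an assumption about the general approach, not JSONLiteDB's actual code, and the `docs` table is hypothetical. Search-style matching is chosen because `"prod"` matched `producer` above.

```python
import json
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): the pattern comes first.
    if value is None:
        return False
    return re.search(pattern, str(value)) is not None


con = sqlite3.connect(":memory:")
con.create_function("REGEXP", 2, regexp)
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
roles = ["guitar", "bass", "drums", "producer"]
con.executemany(
    "INSERT INTO docs (data) VALUES (?)", [(json.dumps({"role": r}),) for r in roles]
)


def roles_matching(op, pattern):
    """Return the roles for which `role <op> pattern` is true."""
    sql = (
        "SELECT json_extract(data, '$.role') FROM docs "
        f"WHERE json_extract(data, '$.role') {op} ?"
    )
    return [row[0] for row in con.execute(sql, (pattern,))]


print(roles_matching("LIKE", "prod%"))   # ['producer']
print(roles_matching("GLOB", "prod*"))   # ['producer']
print(roles_matching("REGEXP", "prod"))  # ['producer']
print(roles_matching("LIKE", "PROD%"))   # ['producer'] (LIKE ignores ASCII case)
print(roles_matching("GLOB", "PROD*"))   # [] (GLOB is case-sensitive)
```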

### Speeding up queries

Queries can be **greatly accelerated** with an index. Note that SQLite is *extremely* picky about how you write the index! For the most part, if you use the same method to write the query as you used to create the index, you will be fine. (This is more of an issue with nested queries and *advanced* query formulations.)

The name of the index is immaterial. It is derived from the fields, so it will not look like the field names you passed in.

```python
>>> db.create_index("last")
>>> db.indexes
```

    {'ix_items_1bd45eb5': ['$."last"']}

```python
>>> # of course, with five items, this makes little difference
>>> list(db.query(last="Martin"))
```

    [{'first': 'George', 'last': 'Martin', 'born': 1926, 'role': 'producer'}]

An index can also be used to enforce uniqueness across one or more fields.

```python
>>> db.create_index("first", "last", unique=True)
>>> db.indexes
```

    {'ix_items_1bd45eb5': ['$."last"'],
     'ix_items_250e4243_UNIQUE': ['$."first"', '$."last"']}

```python
>>> # db.insert({'first': 'George', 'last': 'Martin', 'type':'FAKE ENTRY'})
>>> # Causes: IntegrityError: UNIQUE constraint failed: index 'ix_items_250e4243_UNIQUE'
```

See [Advanced Usage](Advanced%20Usage.ipynb) for more examples, including nested queries.
<!--- END AUTO GENERATED -->
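
The uniqueness behavior shown in the example above can be reproduced in plain SQLite with a `UNIQUE` index over `json_extract()` expressions. It seems likely that the `IntegrityError` comes from something similar, but that is an assumption. The table name `docs` and index name `ix_first_last` are made up for this sketch and are not the names JSONLiteDB generates.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
# A UNIQUE index over two extracted fields: each (first, last) pair may appear once.
con.execute("""
    CREATE UNIQUE INDEX ix_first_last ON docs (
        json_extract(data, '$."first"'),
        json_extract(data, '$."last"')
    )
""")


def insert(item):
    with con:  # commit on success, roll back on error
        con.execute("INSERT INTO docs (data) VALUES (?)", (json.dumps(item),))


insert({"first": "George", "last": "Martin", "role": "producer"})
insert({"first": "George", "last": "Harrison", "role": "guitar"})  # new pair: accepted

error = None
try:
    insert({"first": "George", "last": "Martin", "type": "FAKE ENTRY"})
except sqlite3.IntegrityError as err:
    error = str(err)
print("rejected:", error)

count = con.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
print(count)  # 2
```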

## Queries and Paths

Queries are documented in the `db.query()` method. All queries and paths can take four basic forms, but query objects are by far the most versatile.

<table>
<thead>
<tr>
<th>Type</th>
<th>Path (e.g. <code>create_index()</code>)</th>
<th>Query (e.g. <code>query()</code>)</th>
<th>Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td>Plain string</td>
<td><code>'itemkey'</code></td>
<td><code>{'itemkey':'query_val'}</code></td>
<td>Limited to a single top-level key</td>
</tr>
<tr>
<td>JSON Path string</td>
<td>
<code>'$.itemkey'</code>
<br>
<code>'$.itemkey.subkey'</code>
<br>
<code>'$.itemkey[4]'</code>
<br>
<code>'$.itemkey.subkey[4]'</code>
</td>
<td>
<code>{'$.itemkey':'query_val'}</code>
<br>
<code>{'$.itemkey.subkey':'query_val'}</code>
<br>
<code>{'$.itemkey[4]':'query_val'}</code>
<br>
<code>{'$.itemkey.subkey[4]':'query_val'}</code>
</td>
<td>Be careful about indices on JSON path strings. See more below.</td>
</tr>
<tr>
<td>Tuples (or lists)</td>
<td>
<code>('itemkey',)</code>
<br>
<code>('itemkey','subkey')</code>
<br>
<code>('itemkey',4)</code>
<br>
<code>('itemkey','subkey',4)</code>
</td>
<td>
<code>{('itemkey',):'query_val'}</code>
<br>
<code>{('itemkey','subkey'):'query_val'}</code>
<br>
<code>{('itemkey',4):'query_val'}</code>
<br>
<code>{('itemkey','subkey',4):'query_val'}</code>
</td>
<td></td>
</tr>
<tr>
<td>Query Objects.<br>(Let <code>db</code> be your database)</td>
<td>
<code>db.Q.itemkey</code>
<br>
<code>db.Q.itemkey.subkey</code>
<br>
<code>db.Q.itemkey[4]</code>
<br>
<code>db.Q.itemkey.subkey[4]</code>
</td>
<td>
<code>db.Q.itemkey == 'query_val'</code>
<br>
<code>db.Q.itemkey.subkey == 'query_val'</code>
<br>
<code>db.Q.itemkey[4] == 'query_val'</code>
<br>
<code>db.Q.itemkey.subkey[4] == 'query_val'</code>
</td>
<td>
See below. Can also do many more types of comparisons beyond equality.
</td>
</tr>
</tbody>
</table>
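
Converting a tuple path into a quoted JSON path string can be sketched as follows. The `to_json_path()` helper is hypothetical. It assumes string elements become double-quoted object keys (as in `'$."itemkey"."subkey"'`) and integers become array indices such as `[4]`. The last lines check the generated path against SQLite's own `json_extract()`.

```python
import json
import sqlite3


def to_json_path(parts):
    """Hypothetical: ('itemkey', 'subkey', 4) -> '$."itemkey"."subkey"[4]'."""
    if isinstance(parts, str):
        parts = (parts,)  # a plain string is a single top-level key
    path = "$"
    for part in parts:
        if isinstance(part, int):
            path += f"[{part}]"   # integers index into arrays
        else:
            path += f'."{part}"'  # strings become quoted object keys
    return path


print(to_json_path("itemkey"))                 # $."itemkey"
print(to_json_path(("itemkey", "subkey")))     # $."itemkey"."subkey"
print(to_json_path(("itemkey", "subkey", 4)))  # $."itemkey"."subkey"[4]

# SQLite accepts the quoted form directly.
doc = json.dumps({"itemkey": {"subkey": [10, 11, 12, 13, 14]}})
value = sqlite3.connect(":memory:").execute(
    "SELECT json_extract(?, ?)", (doc, to_json_path(("itemkey", "subkey", 4)))
).fetchone()[0]
print(value)  # 14
```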

Note that the JSON Path strings shown here are unquoted, whereas all other forms are converted to quoted paths. For example, `'$.itemkey.subkey'` and `('itemkey','subkey')` are *functionally* identical; the latter becomes `'$."itemkey"."subkey"'`. While they are functionally the same, an index created on one will not be used on the other.
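
The index-matching caveat can be observed directly with SQLite's `EXPLAIN QUERY PLAN`. An index on `json_extract(data, '$."last"')` is used by a query written with exactly that expression, but not by the functionally identical `'$.last'`, because SQLite matches indexed expressions by their form, not by their meaning. The table and index names are hypothetical, and the wording of the plan text varies between SQLite versions, so the sketch only looks for the index name.

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")
con.execute("""CREATE INDEX ix_last ON docs (json_extract(data, '$."last"'))""")
con.execute("INSERT INTO docs (data) VALUES (?)", (json.dumps({"last": "Martin"}),))


def plan(where):
    """Return the query-plan text for a SELECT with the given WHERE clause."""
    rows = con.execute(
        f"EXPLAIN QUERY PLAN SELECT data FROM docs WHERE {where}", ("Martin",)
    )
    return " ".join(row[-1] for row in rows)  # last column holds the plan detail


quoted = plan("""json_extract(data, '$."last"') = ?""")
unquoted = plan("json_extract(data, '$.last') = ?")
print(quoted)    # mentions ix_last: the index is used
print(unquoted)  # a full-table scan; ix_last is not mentioned
```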

### Query Objects

Query Objects provide a great deal more flexibility than the other forms.

In addition to equality (`==`), they can handle inequalities, including `!=`, `<`, `<=`, `>`, and `>=`.

    db.Q.item < 10
    db.Q.other_item > 'bla'

They can also handle logic. Note that you must be *very careful* about parentheses.

    (db.Q.item < 10) & (db.Q.other_item > 'bla') # AND
    (db.Q.item < 10) | (db.Q.other_item > 'bla') # OR

Note that while something like `10 <= var <= 20` is valid Python, a query must be written as:

    (10 <= db.Q.var) & (db.Q.var <= 20)
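
The chained form fails because of Python semantics: `10 <= x <= 20` is evaluated as `(10 <= x) and (x <= 20)`, and `and` simply returns its second operand whenever the first is truthy. The toy `Field` and `Cond` classes below are hypothetical stand-ins for query objects; they show the first condition being silently dropped. JSONLiteDB's real classes may guard against this differently.

```python
class Cond:
    """Toy condition that just records an SQL-like string."""

    def __init__(self, sql):
        self.sql = sql

    def __and__(self, other):
        return Cond(f"({self.sql}) AND ({other.sql})")


class Field:
    """Toy field whose comparisons build Cond objects instead of bools."""

    def __init__(self, name):
        self.name = name

    def __le__(self, other):
        return Cond(f"{self.name} <= {other!r}")

    def __ge__(self, other):
        return Cond(f"{self.name} >= {other!r}")


var = Field("var")

# `10 <= var` falls back to the reflected var.__ge__(10).
# The chained form evaluates (10 <= var) and (var <= 20); a Cond is truthy,
# so `and` returns only the second condition.
chained = 10 <= var <= 20
explicit = (10 <= var) & (var <= 20)

print(chained.sql)   # var <= 20
print(explicit.sql)  # (var >= 10) AND (var <= 20)
```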

And, as noted in "Basic Usage," they can do SQL `LIKE` comparisons (`db.Q.key % "%Val%"`), `GLOB` comparisons (`db.Q.key * "file*.txt"`), and `REGEXP` comparisons (`db.Q.key @ r"\S+?\.[A-Z]"`).

#### Form

You can mix and match item access (`[...]`) and attribute access for keys. The following are all **identical**:

- `db.Q.itemkey.subkey`
- `db.Q['itemkey'].subkey`
- `db.Q['itemkey','subkey']`
- `db.Q['itemkey']['subkey']`
- ...
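
Supporting both spellings requires little more than having `__getattr__` and `__getitem__` append to the same key path. The `PathBuilder` class below is a hypothetical sketch of that idea, not JSONLiteDB's `Query` class.

```python
class PathBuilder:
    """Hypothetical: accumulate a key path from attribute and item access."""

    def __init__(self, path=()):
        self._path = tuple(path)

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. JSON keys.
        if name.startswith("_"):
            raise AttributeError(name)
        return PathBuilder(self._path + (name,))

    def __getitem__(self, key):
        # Q['a', 'b'] passes a tuple; Q['a'] or Q[4] passes a single key.
        keys = key if isinstance(key, tuple) else (key,)
        return PathBuilder(self._path + keys)

    @property
    def path(self):
        return self._path


Q = PathBuilder()
forms = [
    Q.itemkey.subkey,
    Q["itemkey"].subkey,
    Q["itemkey", "subkey"],
    Q["itemkey"]["subkey"],
]
print({f.path for f in forms})  # {('itemkey', 'subkey')}
print(Q.itemkey[4].path)        # ('itemkey', 4)
```

One trade-off of this sketch: because `path` is a real attribute, a JSON key literally named `path` would have to be reached with item access (`Q["path"]`).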

## Command Line Tools

JSONLiteDB also installs a tool called `jsonlitedb` that makes it easy to read JSONL and JSON files into a database. This is useful for converting existing databases or appending data.

    $ jsonlitedb insert mydb.db newfile.jsonl
    $ cat newdata.jsonl | jsonlitedb insert mydb.db

It can also dump a database to JSONL.

    $ jsonlitedb dump mydb.db # stdout
    $ jsonlitedb dump mydb.db --output db.jsonl
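
The JSONL handling behind `insert` and `dump` can be approximated in a few lines of standard-library Python: parse one JSON document per non-blank line on the way in, and write one compact JSON document per line on the way out. This sketch uses a hypothetical `docs` table and in-memory text streams; it is not the tool's implementation and mirrors none of its options.

```python
import io
import json
import sqlite3


def insert_jsonl(con, stream):
    """Insert one JSON document per non-blank line; return how many were added."""
    n = 0
    with con:  # one transaction for the whole file
        for line in stream:
            if line.strip():
                doc = json.loads(line)  # malformed JSON raises and rolls back the batch
                con.execute("INSERT INTO docs (data) VALUES (?)", (json.dumps(doc),))
                n += 1
    return n


def dump_jsonl(con, stream):
    """Write every stored document as one line of JSON, in insertion order."""
    for (data,) in con.execute("SELECT data FROM docs ORDER BY id"):
        stream.write(json.dumps(json.loads(data)) + "\n")


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT)")

source = io.StringIO('{"first": "John"}\n\n{"first": "Paul"}\n')
print(insert_jsonl(con, source))  # 2

out = io.StringIO()
dump_jsonl(con, out)
print(out.getvalue(), end="")
# {"first": "John"}
# {"first": "Paul"}
```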

## Known Limitations

- Dictionary keys must be strings that contain no dots, double quotes, or square brackets, and they may not start with `_`. (Some of these may work but could have unexpected outcomes.)
- There is no distinction between an entry having a key with a value of `None` and not having the key at all. However, you can use `query_by_path_exists()` to query items that have a certain path. It still cannot be combined with other queries, which can only test for existence by comparing with `None`.
- Non-dictionary items such as strings are accepted as single items, but queries on them do not work reliably.
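
The `None` limitation matches how SQLite's JSON functions report values. `json_extract()` returns SQL `NULL` both for an explicit JSON `null` and for a path that does not exist, so both reach Python as `None`. `json_type()`, by contrast, returns the text `'null'` for an explicit null and SQL `NULL` for a missing path. Whether `query_by_path_exists()` relies on anything like this is an assumption; the sketch only demonstrates the underlying SQLite behavior.

```python
import sqlite3

con = sqlite3.connect(":memory:")


def probe(doc, path="$.role"):
    """Return (json_extract, json_type) for a path in a JSON document."""
    return con.execute(
        "SELECT json_extract(?, ?), json_type(?, ?)", (doc, path, doc, path)
    ).fetchone()


print(probe('{"role": null}'))      # (None, 'null')   key present, value null
print(probe('{"first": "Ringo"}'))  # (None, None)     key absent
print(probe('{"role": "drums"}'))   # ('drums', 'text')
```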

## FAQs

### Wouldn't it be better to use different SQL columns rather than all as JSON?

Yes and no. The point is that no schema is needed at all, which is a notable improvement over a JSON file. Plus, if you index the field of interest, you get super-fast queries all the same!

<!-- From https://github.com/dwyl/repo-badges -->
+ [100%]:data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHhtbG5zOnhsaW5rPSJodHRwOi8vd3d3LnczLm9yZy8xOTk5L3hsaW5rIiB3aWR0aD0iMTA0IiBoZWlnaHQ9IjIwIiByb2xlPSJpbWciIGFyaWEtbGFiZWw9ImNvdmVyYWdlOiAxMDAlIj48dGl0bGU+Y292ZXJhZ2U6IDEwMCU8L3RpdGxlPjxsaW5lYXJHcmFkaWVudCBpZD0icyIgeDI9IjAiIHkyPSIxMDAlIj48c3RvcCBvZmZzZXQ9IjAiIHN0b3AtY29sb3I9IiNiYmIiIHN0b3Atb3BhY2l0eT0iLjEiLz48c3RvcCBvZmZzZXQ9IjEiIHN0b3Atb3BhY2l0eT0iLjEiLz48L2xpbmVhckdyYWRpZW50PjxjbGlwUGF0aCBpZD0iciI+PHJlY3Qgd2lkdGg9IjEwNCIgaGVpZ2h0PSIyMCIgcng9IjMiIGZpbGw9IiNmZmYiLz48L2NsaXBQYXRoPjxnIGNsaXAtcGF0aD0idXJsKCNyKSI+PHJlY3Qgd2lkdGg9IjYxIiBoZWlnaHQ9IjIwIiBmaWxsPSIjNTU1Ii8+PHJlY3QgeD0iNjEiIHdpZHRoPSI0MyIgaGVpZ2h0PSIyMCIgZmlsbD0iIzRjMSIvPjxyZWN0IHdpZHRoPSIxMDQiIGhlaWdodD0iMjAiIGZpbGw9InVybCgjcykiLz48L2c+PGcgZmlsbD0iI2ZmZiIgdGV4dC1hbmNob3I9Im1pZGRsZSIgZm9udC1mYW1pbHk9IlZlcmRhbmEsR2VuZXZhLERlamFWdSBTYW5zLHNhbnMtc2VyaWYiIHRleHQtcmVuZGVyaW5nPSJnZW9tZXRyaWNQcmVjaXNpb24iIGZvbnQtc2l6ZT0iMTEwIj48dGV4dCBhcmlhLWhpZGRlbj0idHJ1ZSIgeD0iMzE1IiB5PSIxNTAiIGZpbGw9IiMwMTAxMDEiIGZpbGwtb3BhY2l0eT0iLjMiIHRyYW5zZm9ybT0ic2NhbGUoLjEpIiB0ZXh0TGVuZ3RoPSI1MTAiPmNvdmVyYWdlPC90ZXh0Pjx0ZXh0IHg9IjMxNSIgeT0iMTQwIiB0cmFuc2Zvcm09InNjYWxlKC4xKSIgZmlsbD0iI2ZmZiIgdGV4dExlbmd0aD0iNTEwIj5jb3ZlcmFnZTwvdGV4dD48dGV4dCBhcmlhLWhpZGRlbj0idHJ1ZSIgeD0iODE1IiB5PSIxNTAiIGZpbGw9IiMwMTAxMDEiIGZpbGwtb3BhY2l0eT0iLjMiIHRyYW5zZm9ybT0ic2NhbGUoLjEpIiB0ZXh0TGVuZ3RoPSIzMzAiPjEwMCU8L3RleHQ+PHRleHQgeD0iODE1IiB5PSIxNDAiIHRyYW5zZm9ybT0ic2NhbGUoLjEpIiBmaWxsPSIjZmZmIiB0ZXh0TGVuZ3RoPSIzMzAiPjEwMCU8L3RleHQ+PC9nPjwvc3ZnPg==