PyPI - jupyter-duckdb - Versions diffs - 1.2.0.2__tar.gz → 1.2.0.4__tar.gz - Mend

jupyter-duckdb 1.2.0.2.tar.gz → 1.2.0.4.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (91)

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: jupyter-duckdb
-Version: 1.2.0.2
+Version: 1.2.0.4
 Summary: a basic wrapper kernel for DuckDB
 Home-page: https://github.com/erictroebs/jupyter-duckdb
 Author: Eric Tröbs
@@ -32,10 +32,6 @@ This is a simple DuckDB wrapper kernel which accepts SQL as input, executes it
 using a previously loaded DuckDB instance and formats the output as a table.
 There are some magic commands that make teaching easier with this kernel.
-## Quick Start
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fdbgit.prakinf.tu-ilmenau.de%2Fertr8623%2Fjupyter-duckdb.git/master)
 ## Table of Contents
 - [Setup](#setup)
@@ -85,6 +81,12 @@ Execute the following command to pull and run a prepared image.
 docker run -p 8888:8888 troebs/jupyter-duckdb
 ```
+There is also a second image. It contains an additional instance of PostgreSQL:
+```bash
+docker run -p 8888:8888 troebs/jupyter-duckdb:postgresql
+```
 This image can also be used with JupyterHub and the
 [DockerSpawner / SwarmSpawner](https://github.com/jupyterhub/dockerspawner)
 and probably with the
@@ -138,6 +140,13 @@ Please note that `:memory:` is also a valid file path for DuckDB. The data is
 then stored exclusively in the main memory. In combination with `CREATE`
 and `OF` this makes it possible to work on a temporary copy in memory.
+Although the name suggests otherwise, the kernel can also be used with other
+databases:
+- **SQLite** is automatically used as a fallback if the DuckDB dependency is
+  missing.
+- To connect to a **PostgreSQL** instance, you need to specify a database URI
+  starting with `(postgresql|postgres|pgsql|psql|pg)://`.
 ### Schema Diagrams
 The magic command `SCHEMA` can be used to create a simple schema diagram of the
@@ -153,6 +162,10 @@ representation requires more space, but can improve readability.
 %SCHEMA TD
 ```
+The optional argument `ONLY`, followed by one or more table names separated by a
+comma, can be used to display only the named tables and all those connected with
+a foreign key.
 Graphviz (`dot` in PATH) is required to render schema diagrams.
 ### Number of Rows
@@ -234,6 +247,11 @@ UNION
 SELECT 1, 'Name 1'
 ```
+By default, failed tests will display an explanation, but the notebook will
+continue to run. Set the `DUCKDB_TESTS_RAISE_EXCEPTION` environment variable to
+`true` to raise an exception when a test fails. This can be useful for automated
+testing in CI environments.
 Disclaimer: The integrated testing is work-in-progress and thus subject to
 potentially incompatible changes and enhancements.
@@ -259,6 +277,9 @@ The supported operations are:
 - Cross Product `×`
 - Division `÷`
+The optional flag `ANALYZE` can be used to add an execution diagram to the
+output.
 The Dockerfile also installs the Jupyter Lab plugin
 [jupyter-ra-extension](https://pypi.org/project/jupyter-ra-extension/). It adds
 the symbols mentioned above and some other supported symbols to the toolbar for

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/README.md RENAMED

@@ -4,10 +4,6 @@ This is a simple DuckDB wrapper kernel which accepts SQL as input, executes it
 using a previously loaded DuckDB instance and formats the output as a table.
 There are some magic commands that make teaching easier with this kernel.
-## Quick Start
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fdbgit.prakinf.tu-ilmenau.de%2Fertr8623%2Fjupyter-duckdb.git/master)
 ## Table of Contents
 - [Setup](#setup)
@@ -57,6 +53,12 @@ Execute the following command to pull and run a prepared image.
 docker run -p 8888:8888 troebs/jupyter-duckdb
 ```
+There is also a second image. It contains an additional instance of PostgreSQL:
+```bash
+docker run -p 8888:8888 troebs/jupyter-duckdb:postgresql
+```
 This image can also be used with JupyterHub and the
 [DockerSpawner / SwarmSpawner](https://github.com/jupyterhub/dockerspawner)
 and probably with the
@@ -110,6 +112,13 @@ Please note that `:memory:` is also a valid file path for DuckDB. The data is
 then stored exclusively in the main memory. In combination with `CREATE`
 and `OF` this makes it possible to work on a temporary copy in memory.
+Although the name suggests otherwise, the kernel can also be used with other
+databases:
+- **SQLite** is automatically used as a fallback if the DuckDB dependency is
+  missing.
+- To connect to a **PostgreSQL** instance, you need to specify a database URI
+  starting with `(postgresql|postgres|pgsql|psql|pg)://`.
 ### Schema Diagrams
 The magic command `SCHEMA` can be used to create a simple schema diagram of the
@@ -125,6 +134,10 @@ representation requires more space, but can improve readability.
 %SCHEMA TD
 ```
+The optional argument `ONLY`, followed by one or more table names separated by a
+comma, can be used to display only the named tables and all those connected with
+a foreign key.
 Graphviz (`dot` in PATH) is required to render schema diagrams.
 ### Number of Rows
@@ -206,6 +219,11 @@ UNION
 SELECT 1, 'Name 1'
 ```
+By default, failed tests will display an explanation, but the notebook will
+continue to run. Set the `DUCKDB_TESTS_RAISE_EXCEPTION` environment variable to
+`true` to raise an exception when a test fails. This can be useful for automated
+testing in CI environments.
 Disclaimer: The integrated testing is work-in-progress and thus subject to
 potentially incompatible changes and enhancements.
@@ -231,6 +249,9 @@ The supported operations are:
 - Cross Product `×`
 - Division `÷`
+The optional flag `ANALYZE` can be used to add an execution diagram to the
+output.
 The Dockerfile also installs the Jupyter Lab plugin
 [jupyter-ra-extension](https://pypi.org/project/jupyter-ra-extension/). It adds
 the symbols mentioned above and some other supported symbols to the toolbar for
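
The README change above says a PostgreSQL connection is selected by a database URI starting with `(postgresql|postgres|pgsql|psql|pg)://`. The code that performs this check is not part of this diff, so the following is only a minimal sketch of how such a prefix test could look. The names `PG_URI` and `looks_like_postgres_uri` are hypothetical and do not come from the package.

```python
import re

# Hypothetical helper: the kernel's real detection logic is not shown in this diff.
# The pattern is copied from the README text and anchored at the start of the string.
PG_URI = re.compile(r'^(postgresql|postgres|pgsql|psql|pg)://')


def looks_like_postgres_uri(path: str) -> bool:
    """Return True if the path starts with one of the documented URI schemes."""
    return PG_URI.match(path) is not None
```

Under this sketch, `pg://user@localhost/db` and `postgres://localhost/db` would match, while `:memory:` or a plain file path would not and would fall through to the DuckDB (or SQLite fallback) path.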

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/src/duckdb_kernel/kernel.py RENAMED

@@ -9,11 +9,12 @@ from typing import Optional, Dict, List, Tuple
 from ipykernel.kernelbase import Kernel
-from .db import Connection, DatabaseError
+from .db import Connection, DatabaseError, Table
 from .db.error import *
 from .magics import *
 from .parser import RAParser, DCParser
 from .util.ResultSetComparator import ResultSetComparator
+from .util.TestError import TestError
 from .util.formatting import row_count, rows_table, wrap_image
 from .visualization import *
@@ -139,6 +140,7 @@ class DuckDBKernel(Kernel):
             return False
     def _execute_stmt(self, query: str, silent: bool,
+                      column_name_mapping: Dict[str, str],
                       max_rows: Optional[int]) -> Tuple[Optional[List[str]], Optional[List[List]]]:
         if self._db is None:
             raise AssertionError('load a database first')
@@ -168,7 +170,8 @@ class DuckDBKernel(Kernel):
         else:
             if columns is not None:
                 # table header
-                table_header = ''.join(f'<th>{c}</th>' for c in columns)
+                mapped_columns = (column_name_mapping.get(c, c) for c in columns)
+                table_header = ''.join(f'<th>{c}</th>' for c in mapped_columns)
                 # table data
                 if max_rows is not None and len(rows) > max_rows:
@@ -302,12 +305,23 @@ class DuckDBKernel(Kernel):
         result_columns = [col.rsplit('.', 1)[-1] for col in result_columns]
         # extract data for test
-        data = self._tests[name]
+        test_data = self._tests[name]
+        # execute test
+        try:
+            self._execute_test(test_data, result_columns, result)
+            self.print_data(wrap_image(True))
+        except TestError as e:
+            self.print_data(wrap_image(False, e.message))
+            if os.environ.get('DUCKDB_TESTS_RAISE_EXCEPTION', 'false').lower() in ('true', '1'):
+                raise e
+    @staticmethod
+    def _execute_test(test_data: Dict, result_columns: List[str], result: List[List]):
         # check columns if required
-        if isinstance(data['equals'], dict):
+        if isinstance(test_data['equals'], dict):
             # get column order
-            data_columns = list(data['equals'].keys())
+            data_columns = list(test_data['equals'].keys())
             column_order = []
             for dc in data_columns:
@@ -318,39 +332,37 @@ class DuckDBKernel(Kernel):
                         found += 1
                 if found == 0:
-                    return self.print_data(wrap_image(False, f'attribute {dc} missing'))
+                    raise TestError(f'attribute {dc} missing')
                 if found >= 2:
-                    return self.print_data(wrap_image(False, f'ambiguous attribute {dc}'))
+                    raise TestError(f'ambiguous attribute {dc}')
             # abort if columns from result are unnecessary
             for i, rc in enumerate(result_columns):
                 if i not in column_order:
-                    return self.print_data(wrap_image(False, f'unnecessary attribute {rc}'))
+                    raise TestError(f'unnecessary attribute {rc}')
             # reorder columns and transform to list of lists
             sorted_columns = [x for _, x in sorted(zip(column_order, data_columns))]
             rows = []
-            for row in zip(*(data['equals'][col] for col in sorted_columns)):
+            for row in zip(*(test_data['equals'][col] for col in sorted_columns)):
                 rows.append(row)
         else:
-            rows = data['equals']
+            rows = test_data['equals']
         # ordered test
-        if data['ordered']:
+        if test_data['ordered']:
             # calculate diff
             rsc = ResultSetComparator(result, rows)
             missing = len(rsc.ordered_right_only)
             if missing > 0:
-                return self.print_data(wrap_image(False, f'{row_count(missing)} missing'))
+                raise TestError(f'{row_count(missing)} missing')
             missing = len(rsc.ordered_left_only)
             if missing > 0:
-                return self.print_data(wrap_image(False, f'{row_count(missing)} more than required'))
-            return self.print_data(wrap_image(True))
+                raise TestError(f'{row_count(missing)} more than required')
         # unordered test
         else:
@@ -362,13 +374,11 @@ class DuckDBKernel(Kernel):
             # print result
             if below > 0 and above > 0:
-                self.print_data(wrap_image(False, f'{row_count(below)} missing, {row_count(above)} unnecessary'))
+                raise TestError(f'{row_count(below)} missing, {row_count(above)} unnecessary')
             elif below > 0:
-                self.print_data(wrap_image(False, f'{row_count(below)} missing'))
+                raise TestError(f'{row_count(below)} missing')
             elif above > 0:
-                self.print_data(wrap_image(False, f'{row_count(above)} unnecessary'))
-            else:
-                self.print_data(wrap_image(True))
+                raise TestError(f'{row_count(above)} unnecessary')
     def _all_magic(self, silent: bool):
         return {
@@ -404,7 +414,7 @@ class DuckDBKernel(Kernel):
             whitelist = set()
             # split and strip names
-            names = [n.strip() for n in re.split(r'[, \t]', only)]
+            names = [Table.normalize_name(n.strip()) for n in re.split(r'[, \t]', only)]
             # add initial tables to result set
             for name in names:
@@ -503,10 +513,11 @@ class DuckDBKernel(Kernel):
         root_node = DCParser.parse_query(code)
         # generate sql
-        sql = root_node.to_sql(tables)
+        sql, cnm = root_node.to_sql_with_renamed_columns(tables)
         return {
-            'generated_code': sql
+            'generated_code': sql,
+            'column_name_mapping': cnm
         }
     # jupyter related functions
@@ -530,6 +541,10 @@ class DuckDBKernel(Kernel):
                 clean_code = execution_args['generated_code']
                 del execution_args['generated_code']
+            # set default column name mapping if none provided
+            if 'column_name_mapping' not in execution_args:
+                execution_args['column_name_mapping'] = {}
             # execute statement if needed
             if clean_code.strip():
                 cols, rows = self._execute_stmt(clean_code, silent, **execution_args)
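
The kernel.py hunks above replace early `return self.print_data(...)` calls with `raise TestError(...)`, and let a single caller decide whether to only report the failure or also re-raise it. The control flow can be sketched in isolation as follows. This is a simplified stand-in, not the kernel's code: `check_row_counts`, `run_test` and the `report` callback are hypothetical, and the messages omit the `row_count(...)` formatting used in the original.

```python
import os
from typing import Callable, Optional


class TestError(Exception):
    """Mirrors the small exception class added in util/TestError.py."""

    @property
    def message(self) -> str:
        return str(self)


def check_row_counts(below: int, above: int) -> None:
    # Hypothetical stand-in for the unordered comparison in _execute_test.
    if below > 0 and above > 0:
        raise TestError(f'{below} missing, {above} unnecessary')
    elif below > 0:
        raise TestError(f'{below} missing')
    elif above > 0:
        raise TestError(f'{above} unnecessary')


def run_test(below: int, above: int,
             report: Callable[[bool, Optional[str]], None]) -> bool:
    try:
        check_row_counts(below, above)
        report(True, None)
        return True
    except TestError as e:
        # Always show the explanation first ...
        report(False, e.message)
        # ... then re-raise only if the environment variable asks for it.
        if os.environ.get('DUCKDB_TESTS_RAISE_EXCEPTION', 'false').lower() in ('true', '1'):
            raise
        return False
```

Note that the README documents the value `true`, while the check in the diff compares case-insensitively and also accepts `1`.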

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/src/duckdb_kernel/parser/elements/binary/ConditionalSet.py RENAMED

@@ -42,7 +42,7 @@ class ConditionalSet:
                 # If a constant was found, we store the value and replace it with a random attribute name.
                 constant = le.names[i]
-                new_token = Token.random()
+                new_token = Token.random(constant)
                 new_operand = DCOperand(le.relation, le.names[:i] + (new_token,) + le.names[i + 1:], skip_comma=True)
                 # We now need an equality comparison to ensure the introduced attribute is equal to the constant.
@@ -103,7 +103,7 @@ class ConditionalSet:
         # The default case is to return the LogicElement with not DCOperands.
         return le, []
-    def to_sql(self, tables: Dict[str, Table]) -> str:
+    def to_sql_with_renamed_columns(self, tables: Dict[str, Table]) -> Tuple[str, Dict[str, str]]:
         # First we have to find and remove all DCOperands from the operator tree.
         condition, dc_operands = self.split_tree(self.condition)
@@ -339,5 +339,18 @@ class ConditionalSet:
             sql_join_filters += f' AND {join_filter}'
         sql_condition = condition.to_sql(joined_columns) if condition is not None else '1=1'
+        sql_query = f'SELECT DISTINCT {sql_select} FROM {sql_tables} WHERE ({sql_join_filters}) AND ({sql_condition})'
+        # Create a mapping from intermediate column names to constant values.
+        column_name_mapping = {
+            p: p.constant
+            for o in dc_operands
+            for p in o.names
+            if p.constant is not None
+        }
-        return f'SELECT DISTINCT {sql_select} FROM {sql_tables} WHERE ({sql_join_filters}) AND ({sql_condition})'
+        return sql_query, column_name_mapping
+    def to_sql(self, tables: Dict[str, Table]) -> str:
+        sql, _ = self.to_sql_with_renamed_columns(tables)
+        return sql
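
Taken together, the `column_name_mapping` returned here and the `mapped_columns` line in kernel.py mean that generated placeholder column names are swapped back to the constant the user wrote before the table header is rendered. A minimal sketch of that lookup, assuming a plain `dict` of strings and a hypothetical helper name `render_header`:

```python
from typing import Dict, List


def render_header(columns: List[str], column_name_mapping: Dict[str, str]) -> str:
    # Columns without an entry in the mapping keep their original name.
    mapped = (column_name_mapping.get(c, c) for c in columns)
    return ''.join(f'<th>{c}</th>' for c in mapped)


# '__tmp' stands in for a generated name such as the one from Token.random().
header = render_header(['__tmp', 'ename'], {'__tmp': '2'})
# header == '<th>2</th><th>ename</th>'
```

This matches what the new `test_anonymous_column_names` test expects: a query such as `{ * | Episodes(_, _, 2, ename) }` reports the columns `2` and `ename` rather than an internal name.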

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/src/duckdb_kernel/parser/tokenizer/Token.py RENAMED

@@ -1,8 +1,9 @@
+from typing import Optional
 from uuid import uuid4
 class Token(str):
-    def __new__(cls, value: str):
+    def __new__(cls, value: str, constant: 'Token' = None):
         while True:
             # strip whitespaces
             value = value.strip()
@@ -38,20 +39,40 @@ class Token(str):
         return super().__new__(cls, value)
+    def __init__(self, value: str, constant: 'Token' = None):
+        self.constant: Optional[Token] = constant
     @staticmethod
-    def random() -> 'Token':
-        return Token('__' + str(uuid4()).replace('-', '_'))
+    def random(constant: 'Token' = None) -> 'Token':
+        return Token('__' + str(uuid4()).replace('-', '_'), constant)
     @property
     def empty(self) -> bool:
         return len(self) == 0
+    @property
+    def is_temporary(self) -> bool:
+        return self.startswith('__')
     @property
     def is_constant(self) -> bool:
         return ((self[0] == '"' and self[-1] == '"') or
                 (self[0] == "'" and self[-1] == "'") or
                 self.replace('.', '', 1).isnumeric())
+    @property
+    def no_quotes(self) -> str:
+        quotes = ('"', "'")
+        if self[0] in quotes and self[-1] in quotes:
+            return self[1:-1]
+        if self[0] in quotes:
+            return self[1:]
+        if self[-1] in quotes:
+            return self[:-1]
+        else:
+            return self
     @property
     def single_quotes(self) -> str:
         # TODO Is this comparison useless because tokens are cleaned automatically?
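
The Token.py hunk above attaches an extra `constant` attribute to a `str` subclass. Because `str` is immutable, the value is fixed in `__new__`, and `__init__` receives the same arguments and only stores the extra attribute. The following reduced sketch shows the pattern; the class name `Tok` is hypothetical and the cleaning loop of the real `Token.__new__` is replaced by a plain `strip()`.

```python
from typing import Optional
from uuid import uuid4


class Tok(str):
    def __new__(cls, value: str, constant: Optional['Tok'] = None):
        # The string content must be set here; __init__ cannot change it.
        return super().__new__(cls, value.strip())

    def __init__(self, value: str, constant: Optional['Tok'] = None):
        # Called with the same arguments as __new__; only stores the extra field.
        self.constant: Optional['Tok'] = constant

    @staticmethod
    def random(constant: Optional['Tok'] = None) -> 'Tok':
        # Same naming scheme as in the diff: '__' plus a UUID with underscores.
        return Tok('__' + str(uuid4()).replace('-', '_'), constant)

    @property
    def is_temporary(self) -> bool:
        return self.startswith('__')


placeholder = Tok.random(Tok('2'))
mapping = {placeholder: placeholder.constant}
```

Since `Tok` inherits hashing and equality from `str`, the resulting `mapping` can be queried with the plain column names returned by the database, which is what the `column_name_mapping.get(c, c)` lookup in kernel.py relies on.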

jupyter_duckdb-1.2.0.4/src/duckdb_kernel/util/TestError.py ADDED

@@ -0,0 +1,4 @@
+class TestError(Exception):
+    @property
+    def message(self) -> str:
+        return str(self)

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/src/jupyter_duckdb.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.2
 Name: jupyter-duckdb
-Version: 1.2.0.2
+Version: 1.2.0.4
 Summary: a basic wrapper kernel for DuckDB
 Home-page: https://github.com/erictroebs/jupyter-duckdb
 Author: Eric Tröbs
@@ -32,10 +32,6 @@ This is a simple DuckDB wrapper kernel which accepts SQL as input, executes it
 using a previously loaded DuckDB instance and formats the output as a table.
 There are some magic commands that make teaching easier with this kernel.
-## Quick Start
-[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/git/https%3A%2F%2Fdbgit.prakinf.tu-ilmenau.de%2Fertr8623%2Fjupyter-duckdb.git/master)
 ## Table of Contents
 - [Setup](#setup)
@@ -85,6 +81,12 @@ Execute the following command to pull and run a prepared image.
 docker run -p 8888:8888 troebs/jupyter-duckdb
 ```
+There is also a second image. It contains an additional instance of PostgreSQL:
+```bash
+docker run -p 8888:8888 troebs/jupyter-duckdb:postgresql
+```
 This image can also be used with JupyterHub and the
 [DockerSpawner / SwarmSpawner](https://github.com/jupyterhub/dockerspawner)
 and probably with the
@@ -138,6 +140,13 @@ Please note that `:memory:` is also a valid file path for DuckDB. The data is
 then stored exclusively in the main memory. In combination with `CREATE`
 and `OF` this makes it possible to work on a temporary copy in memory.
+Although the name suggests otherwise, the kernel can also be used with other
+databases:
+- **SQLite** is automatically used as a fallback if the DuckDB dependency is
+  missing.
+- To connect to a **PostgreSQL** instance, you need to specify a database URI
+  starting with `(postgresql|postgres|pgsql|psql|pg)://`.
 ### Schema Diagrams
 The magic command `SCHEMA` can be used to create a simple schema diagram of the
@@ -153,6 +162,10 @@ representation requires more space, but can improve readability.
 %SCHEMA TD
 ```
+The optional argument `ONLY`, followed by one or more table names separated by a
+comma, can be used to display only the named tables and all those connected with
+a foreign key.
 Graphviz (`dot` in PATH) is required to render schema diagrams.
 ### Number of Rows
@@ -234,6 +247,11 @@ UNION
 SELECT 1, 'Name 1'
 ```
+By default, failed tests will display an explanation, but the notebook will
+continue to run. Set the `DUCKDB_TESTS_RAISE_EXCEPTION` environment variable to
+`true` to raise an exception when a test fails. This can be useful for automated
+testing in CI environments.
 Disclaimer: The integrated testing is work-in-progress and thus subject to
 potentially incompatible changes and enhancements.
@@ -259,6 +277,9 @@ The supported operations are:
 - Cross Product `×`
 - Division `÷`
+The optional flag `ANALYZE` can be used to add an execution diagram to the
+output.
 The Dockerfile also installs the Jupyter Lab plugin
 [jupyter-ra-extension](https://pypi.org/project/jupyter-ra-extension/). It adds
 the symbols mentioned above and some other supported symbols to the toolbar for

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/src/jupyter_duckdb.egg-info/SOURCES.txt RENAMED

@@ -72,6 +72,7 @@ src/duckdb_kernel/parser/util/RenamableColumn.py
 src/duckdb_kernel/parser/util/RenamableColumnList.py
 src/duckdb_kernel/parser/util/__init__.py
 src/duckdb_kernel/util/ResultSetComparator.py
+src/duckdb_kernel/util/TestError.py
 src/duckdb_kernel/util/__init__.py
 src/duckdb_kernel/util/formatting.py
 src/duckdb_kernel/visualization/Drawer.py

{jupyter_duckdb-1.2.0.2 → jupyter_duckdb-1.2.0.4}/test/test_dc.py RENAMED

@@ -7,7 +7,23 @@ def test_case_insensitivity():
             '{ username | users(id, username) }',
             '{ username | Users(id, username) }',
             '{ username | USERS(id, username) }',
-            '{ username | uSers(id, username) }'
+            '{ username | uSers(id, username) }',
+    ):
+        root = DCParser.parse_query(query)
+        # execute to test case insensitivity
+        with Connection() as con:
+            assert con.execute_dc(root) == [
+                ('Alice',),
+                ('Bob',),
+                ('Charlie',)
+            ]
+    for query in (
+            '{ username | users(id, username) }',
+            '{ Username | users(id, username) }',
+            '{ USERNAME | users(id, username) }',
+            '{ userName | users(id, username) }',
     ):
         root = DCParser.parse_query(query)
@@ -79,7 +95,8 @@ def test_conditions():
             ]
         for query in [
-            '{ id | Users(id, name) ∧ name > "B" ∧ name < "C" }'
+            '{ id | Users(id, name) ∧ name = "Bob" }',
+            '{ id | Users(id, name) ∧ name > "B" ∧ name < "C" }',
         ]:
             root = DCParser.parse_query(query)
             assert con.execute_dc(root) == [
@@ -189,6 +206,33 @@ def test_joins():
             ]
+def test_disjunction_joins():
+    with Connection() as con:
+        for query in [
+            "{ enum, snum, sid | Episodes(enum, snum, sid, ename) ∧ (Characters('Character B', enum, snum, sid, _) ∨ Characters('Character D', enum, snum, sid, _)) }",
+            "{ enum, snum, sid | Episodes(enum, snum, sid, ename) ∧ (Characters(cname1, enum, snum, sid, _) ∧ cname1 = 'Character B' ∨ Characters(cname2, enum, snum, sid, _) ∧ cname2 = 'Character D') }",
+        ]:
+            root = DCParser.parse_query(query)
+            assert con.execute_dc(root) == [
+                (1, 1, 1),
+                (2, 1, 1)
+            ]
+def test_cross_join():
+    with Connection() as con:
+        for query in [
+            "{ cname1, cname2 | Characters(cname1, _, _, 2, _) ∧ Characters(cname2, _, _, 2, _) }",
+        ]:
+            root = DCParser.parse_query(query)
+            assert con.execute_dc(root) == [
+                ('Character E', 'Character E'),
+                ('Character E', 'Character F'),
+                ('Character F', 'Character E'),
+                ('Character F', 'Character F'),
+            ]
 def test_underscores():
     with Connection() as con:
         # distinct underscores
@@ -239,3 +283,53 @@ def test_underscores():
                 ('Show 2 / Season 2 / Episode 3',),
                 ('Show 2 / Season 2 / Episode 4',)
             ]
+def test_anonymous_column_names():
+    with Connection() as con:
+        for query in [
+            '{ * | Episodes(_, _, 2, ename) }',
+        ]:
+            root = DCParser.parse_query(query)
+            cols, _ = con.execute_dc_return_cols(root)
+            assert cols == ['2', 'ename']
+        for query in [
+            "{ * | Episodes(_, _, '2', ename) }",
+        ]:
+            root = DCParser.parse_query(query)
+            cols, _ = con.execute_dc_return_cols(root)
+            assert cols == ["'2'", 'ename']
+        for query in [
+            "{ * | Episodes(_, _, sid, 'ename') }",
+        ]:
+            root = DCParser.parse_query(query)
+            cols, _ = con.execute_dc_return_cols(root)
+            assert cols == ['sid', "'ename'"]
+        for query in [
+            '{ * | Episodes(_, 1, 2, ename) }',
+        ]:
+            root = DCParser.parse_query(query)
+            cols, _ = con.execute_dc_return_cols(root)
+            assert cols == ['1', '2', 'ename']
+        for query in [
+            '{ * | Episodes(_, 2, 2, ename) }',
+        ]:
+            root = DCParser.parse_query(query)
+            cols, _ = con.execute_dc_return_cols(root)
+            assert cols == ['2', '2', 'ename']
+        for query in [
+            '{ * | Episodes(_, _, sid, ename) ∧ sid = 2 }',
+        ]:
+            root = DCParser.parse_query(query)
+            cols, _ = con.execute_dc_return_cols(root)
+            assert cols == ['sid', 'ename']