databricks-ddbxutils 0.1.0__tar.gz → 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,108 @@
+Metadata-Version: 2.1
+Name: databricks-ddbxutils
+Version: 0.3.0
+Summary: extends databricks dbutils
+Author: Haneul Kim
+Author-email: haneul.kim@data-dynamics.io
+Requires-Python: >=3.11,<4.0
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Requires-Dist: cloudpickle (>=3.1.1,<4.0.0)
+Requires-Dist: databricks-sdk (>=0.64.0,<0.65.0)
+Requires-Dist: dotenv (>=0.9.9,<0.10.0)
+Requires-Dist: jinja2 (>=3.1.6,<4.0.0)
+Requires-Dist: pyarrow (==20.0.0)
+Requires-Dist: pyspark (>=4.0.0,<5.0.0)
+Requires-Dist: python-dateutil (>=2.9.0.post0,<3.0.0)
+Requires-Dist: pytz (>=2025.2,<2026.0)
+Description-Content-Type: text/markdown
+
+# databricks-ddbxutils
+
+ddbxutils extends dbutils where it falls short.
+
+## Features
+
+* [x] jinja2 templates applied to `dbutils.widgets`
+
+## Setup
+
+```shell
+cd <PROJECT_ROOT>
+pip install poetry
+```
+
+## venv
+
+```shell
+poetry shell
+```
+
+## Build
+
+```shell
+poetry build
+```
+
+## Run
+
+### In Databricks w/o init_script (= Serverless)
+
+* Add Wheel
+  * Create a Volume for wheel uploads, then upload the wheel
+    * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
+  * In the notebook's Environment panel (right side), set the Environment version to 2, add the wheel file uploaded to the Volume, then click Apply
+* Usage
+  ```python
+  # dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
+  # dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
+  import ddbxutils
+  next_day = ddbxutils.widgets.get('next_day')
+  # next_day: 2025-05-25
+  ```
+
+### In Databricks w/ init_script
+
+* Add Wheel
+  * Create a Volume for wheel uploads, then upload the wheel
+    * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
+* Libraries
+  * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
+  * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh`
+  ```shell
+  #!/bin/bash
+
+  STARTUP_SCRIPT=/tmp/pyspark_startup.py
+
+  cat >> ${STARTUP_SCRIPT} << EOF
+
+  prefix = 'PYTHONSTARTUP_ddbxutils'
+  print(f'{prefix} custom startup script loading...')
+  try:
+      import ddbxutils
+      print(f'{prefix} Custom modules [ddbxutils] are loaded.')
+  except Exception as e:
+      print(f'{prefix} e={e}')
+      print(f'{prefix} import ddbxutils failed')
+  EOF
+  ```
+* Spark config
+  ```text
+  spark.executorEnv.PYTHONSTARTUP /tmp/pyspark_startup.py
+  ```
+* Environment variables
+  ```shell
+  PYTHONSTARTUP=/tmp/pyspark_startup.py
+  ```
+* Init scripts
+  ```text
+  /Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh
+  ```
+* Usage
+  ```python
+  # dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
+  # dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
+  next_day = ddbxutils.widgets.get('next_day')
+  # next_day: 2025-05-25
+  ```
@@ -0,0 +1,88 @@
+# databricks-ddbxutils
+
+ddbxutils extends dbutils where it falls short.
+
+## Features
+
+* [x] jinja2 templates applied to `dbutils.widgets`
+
+## Setup
+
+```shell
+cd <PROJECT_ROOT>
+pip install poetry
+```
+
+## venv
+
+```shell
+poetry shell
+```
+
+## Build
+
+```shell
+poetry build
+```
+
+## Run
+
+### In Databricks w/o init_script (= Serverless)
+
+* Add Wheel
+  * Create a Volume for wheel uploads, then upload the wheel
+    * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
+  * In the notebook's Environment panel (right side), set the Environment version to 2, add the wheel file uploaded to the Volume, then click Apply
+* Usage
+  ```python
+  # dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
+  # dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
+  import ddbxutils
+  next_day = ddbxutils.widgets.get('next_day')
+  # next_day: 2025-05-25
+  ```
+
+### In Databricks w/ init_script
+
+* Add Wheel
+  * Create a Volume for wheel uploads, then upload the wheel
+    * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
+* Libraries
+  * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
+  * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh`
+  ```shell
+  #!/bin/bash
+
+  STARTUP_SCRIPT=/tmp/pyspark_startup.py
+
+  cat >> ${STARTUP_SCRIPT} << EOF
+
+  prefix = 'PYTHONSTARTUP_ddbxutils'
+  print(f'{prefix} custom startup script loading...')
+  try:
+      import ddbxutils
+      print(f'{prefix} Custom modules [ddbxutils] are loaded.')
+  except Exception as e:
+      print(f'{prefix} e={e}')
+      print(f'{prefix} import ddbxutils failed')
+  EOF
+  ```
+* Spark config
+  ```text
+  spark.executorEnv.PYTHONSTARTUP /tmp/pyspark_startup.py
+  ```
+* Environment variables
+  ```shell
+  PYTHONSTARTUP=/tmp/pyspark_startup.py
+  ```
+* Init scripts
+  ```text
+  /Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/init_script_ddbxutils.sh
+  ```
+* Usage
+  ```python
+  # dbutils.widgets.text('rawdate', '2025-05-24', 'Raw Date')
+  # dbutils.widgets.text('next_day', '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}', 'Next Day')
+  next_day = ddbxutils.widgets.get('next_day')
+  # next_day: 2025-05-25
+  ```
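
The `{{add_days(...)}}` resolution shown in the Usage blocks is plain jinja2: the `WidgetImpl` hunk further below registers `add_days` and `add_datetime` as template globals and renders each widget value against the full widget dictionary. Here is a minimal standalone sketch of that path; the `add_days` body is a hypothetical stand-in, since `ddbxutils.functions` is not included in this diff (the third argument is assumed to be an output format, with `''` meaning "same as the input format"):

```python
from datetime import datetime, timedelta

from jinja2 import Environment


def add_days(value, fmt, out_fmt, days):
    # Hypothetical stand-in: the real ddbxutils.functions.add_days is not in this diff.
    # Assumes the third argument is an output format; '' falls back to the input format.
    dt = datetime.strptime(value, fmt) + timedelta(days=days)
    return dt.strftime(out_fmt or fmt)


environment = Environment()
environment.globals['add_days'] = add_days

widget_values = {
    'rawdate': '2025-05-24',
    'next_day': '{{add_days(rawdate, "%Y-%m-%d", "", 1)}}',
}
# Render every widget value against the raw widget dictionary, as WidgetImpl.refresh does.
rendered = {key: environment.from_string(value).render(widget_values)
            for key, value in widget_values.items()}
print(rendered['next_day'])  # 2025-05-25
```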
@@ -0,0 +1,4 @@
+from . import widgets
+
+# Wrapped in () so a generator is returned, for lazy evaluation
+generator = (widgets.get_instance() for x in range(1))
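
The comment's intent: a bare `widgets.get_instance()` call at import time would construct the instance (and its `WorkspaceClient`, per the `WidgetImpl` hunk below) immediately, whereas a generator expression defers the call until the generator is first consumed. A minimal sketch of the idiom, with a hypothetical `expensive_init` placeholder:

```python
def expensive_init():
    # Stands in for widgets.get_instance(); nothing here runs at definition time.
    print('connecting...')
    return object()

lazy = (expensive_init() for _ in range(1))  # no side effects yet
instance = next(lazy)                        # 'connecting...' prints only now
```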
@@ -0,0 +1,122 @@
+import base64
+from dataclasses import dataclass
+
+import cloudpickle
+from pyspark.sql.datasource import DataSource, DataSourceReader, InputPartition
+from pyspark.sql.types import StructType, StructField, IntegerType
+
+
+@dataclass
+class PythonFunctionPartition(InputPartition):
+    """
+    Partition definition: the start and end of each partition's range
+    """
+    start: int
+    end: int
+
+
+class PythonFunctionReader(DataSourceReader):
+    """
+    DataSourceReader implementation
+    """
+
+    def __init__(self, schema: StructType, options: dict, serialized_func_b64):
+        self.schema = schema
+        self.options = options
+        self.func = cloudpickle.loads(base64.b64decode(serialized_func_b64))
+
+    def partitions(self):
+        lower = int(self.options.get('lowerLimit', '0'))
+        upper = int(self.options.get('upperLimit', '0'))
+        num_parts = int(self.options.get('numPartitions', '1'))
+        step = (upper - lower) // num_parts if num_parts > 0 else (upper - lower)
+        # print(f'step={step}')
+        parts = []
+        start = lower
+        for i in range(num_parts):
+            end = upper if i == num_parts - 1 else start + step
+            parts.append(PythonFunctionPartition(start, end))
+            start = end
+        return parts
+
+    def read(self, partition: PythonFunctionPartition):
+        for x in range(partition.start, partition.end):
+            # yield (self.func(x),)
+            yield self.func(x)
+
+
+class PythonFunctionDataSource(DataSource):
+    """
+    DataSource implementation
+
+    .. versionadded:: 0.3.0
+
+    Notes
+    -----
+    The user-defined function must return one of the following types: tuple, list, `pyspark.sql.types.Row`, or `pyarrow.RecordBatch`.
+
+    Examples
+    --------
+    >>> spark = ...
+
+    Return a single input partition:
+
+    >>> def partitions(self):
+    ...     return [PythonFunctionPartition(1, 3)]
+
+    Return multiple input partitions:
+
+    >>> def partitions(self):
+    ...     return [PythonFunctionPartition(1, 3), PythonFunctionPartition(4, 6)]
+
+    Example in the PySpark shell, using a function that returns `pyspark.sql.Row`:
+
+    >>> import base64
+    >>> import cloudpickle
+    >>> from ddbxutils.datasources.pyfunc import PythonFunctionDataSource
+    >>> spark.dataSource.register(PythonFunctionDataSource)
+    >>> # ...
+    >>> from pyspark.sql import Row
+    >>> def user_function_row(x) -> Row:
+    ...     from datetime import datetime
+    ...     from pytz import timezone
+    ...     return Row(str(x), str(x * x), datetime.now(timezone('Asia/Seoul')).strftime("%Y-%m-%d %H:%M:%S"))
+    ...
+    >>> df = (spark.read.format("pyfunc").
+    ...       schema('value1 string, value2 string, ts string').
+    ...       option("lowerLimit", "0").
+    ...       option("upperLimit", "10").
+    ...       option("numPartitions", "100").
+    ...       option("func", base64.b64encode(cloudpickle.dumps(user_function_row)).decode('utf-8')).
+    ...       load())
+    >>> df.show()
+
+    Example in the PySpark shell, using a function that returns a list:
+
+    >>> def user_function_row(x):
+    ...     from datetime import datetime
+    ...     from pytz import timezone
+    ...     return [str(x), str(x * x), datetime.now(timezone('Asia/Seoul')).strftime("%Y-%m-%d %H:%M:%S")]
+    ...
+    >>> df = (spark.read.format("pyfunc").
+    ...       schema('value1 string, value2 string, ts string').
+    ...       option("lowerLimit", "0").
+    ...       option("upperLimit", "10").
+    ...       option("numPartitions", "100").
+    ...       option("func", base64.b64encode(cloudpickle.dumps(user_function_row)).decode('utf-8')).
+    ...       load())
+    >>> df.show()
+    """
+
+    @classmethod
+    def name(cls):
+        return 'pyfunc'
+
+    def schema(self):
+        return StructType([StructField('value', IntegerType(), nullable=False)])
+
+    def reader(self, schema: StructType):
+        # options are strings, so convert as needed
+        # func = self.options.get('func', None)
+        func = self.options['func']
+        return PythonFunctionReader(self.schema(), self.options, func)
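
To make the arithmetic in `PythonFunctionReader.partitions()` concrete: the step is the integer quotient of the range by `numPartitions`, and the last partition absorbs the remainder. A standalone check of the same logic with plain tuples:

```python
def split(lower: int, upper: int, num_parts: int) -> list[tuple[int, int]]:
    # Mirrors PythonFunctionReader.partitions(): fixed step, remainder to the last partition.
    step = (upper - lower) // num_parts if num_parts > 0 else (upper - lower)
    parts, start = [], lower
    for i in range(num_parts):
        end = upper if i == num_parts - 1 else start + step
        parts.append((start, end))
        start = end
    return parts

print(split(0, 10, 3))  # [(0, 3), (3, 6), (6, 10)]
```

Note that when `numPartitions` exceeds the range, as in the docstring's `upperLimit=10` with `numPartitions=100`, the step is 0 and every partition except the last is empty; the last one then covers the entire range.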
@@ -0,0 +1,14 @@
+# Import our custom ddbxutils module
+import ddbxutils
+
+# Get the default value of the 'next_day' widget
+initial_value = ddbxutils.widgets.get('next_day')
+print(f'initial next_day value: {initial_value}')
+
+# Fetch the value again after it changes
+updated_value = ddbxutils.widgets.get('next_day')
+print(f'updated next_day value: {updated_value}')
+
+# Fetch a widget that does not exist
+other_value = ddbxutils.widgets.get('another_widget')
+print(f'another_widget value: {other_value}')
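
`WidgetImpl.get` itself does not appear in this diff, so what the last lookup does for a non-existent widget (raise, or return a default) is not determined here. If it surfaces a `KeyError` from the rendered-values dictionary, a defensive pattern would be:

```python
# Hypothetical guard; assumes WidgetImpl.get raises KeyError for unknown widgets.
try:
    other_value = ddbxutils.widgets.get('another_widget')
except KeyError:
    other_value = None
```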
@@ -0,0 +1,45 @@
+from .core import WidgetImpl
+
+_widget_impl_instance: WidgetImpl | None = None
+
+
+def get_instance():
+    """
+    Returns the singleton WidgetImpl instance,
+    creating it on the first call.
+
+    :return: the WidgetImpl instance
+    """
+    global _widget_impl_instance
+    if _widget_impl_instance is None:
+        _widget_impl_instance = WidgetImpl()
+    return _widget_impl_instance
+
+
+def get(widget_name: str):
+    """
+    Gets a rendered widget value from the singleton instance.
+    The instance is created automatically on first use.
+
+    :param widget_name: widget key
+    :return: resolved widget value
+    """
+    widget_impl = get_instance()
+    # if _widget_impl_instance is None:
+    #     raise RuntimeError('ddbxutils.widgets is not initialized. Call `ddbxutils.widgets.init(dbutils)` first.')
+    return widget_impl.get(widget_name)
+
+
+def refresh():
+    """
+    Refreshes the widget values.
+
+    Re-reads all widgets and re-renders their templates.
+    :return: None
+    """
+    widget_impl = get_instance()
+    # if _widget_impl_instance is None:
+    #     raise RuntimeError('ddbxutils.widgets is not initialized. Call `ddbxutils.widgets.init(dbutils)` first.')
+    # if dbutils is None:
+    #     raise RuntimeError('dbutils is required.')
+    widget_impl.refresh()
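
Net effect of this rewrite: the 0.1.0 `init(dbutils)` step disappears, and `get()`/`refresh()` bootstrap the singleton themselves via `get_instance()`. A usage sketch, assuming an environment where `WorkspaceClient()` can authenticate (see the `WidgetImpl` hunk below):

```python
import ddbxutils

value = ddbxutils.widgets.get('next_day')  # WidgetImpl is created on first use
ddbxutils.widgets.refresh()                # re-reads and re-renders all widgets
value = ddbxutils.widgets.get('next_day')
```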
@@ -1,6 +1,8 @@
-from ddbxutils.functions import add_days, add_datetime
+from databricks.sdk import WorkspaceClient
 from jinja2 import Environment
 
+from ddbxutils.functions import add_days, add_datetime
+
 environment = Environment()
 environment.globals['add_days'] = add_days
 environment.globals['add_datetime'] = add_datetime
@@ -10,14 +12,15 @@ class WidgetImpl:
     dbutils = None
     rendered_widget_values = None
 
-    def __init__(self, dbutils):
-        self.refresh(dbutils)
+    def __init__(self):
+        self.refresh()
 
-    def refresh(self, dbutils):
+    def refresh(self):
         """
         Sets or adds the widget values.
         """
-        self.dbutils = dbutils
+        if self.dbutils is None:
+            self.dbutils = WorkspaceClient().dbutils
         widget_values = self.dbutils.widgets.getAll()
         self.rendered_widget_values = {key: environment.from_string(value).render(widget_values) for key, value in widget_values.items()}
 
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "databricks-ddbxutils"
-version = "0.1.0"
+version = "0.3.0"
 description = "extends databricks dbutils"
 authors = ["Haneul Kim <haneul.kim@data-dynamics.io>"]
 readme = "README.md"
@@ -8,10 +8,14 @@ packages = [{ include = "ddbxutils" }]
 
 [tool.poetry.dependencies]
 python = "^3.11"
-databricks-sdk = "^0.57.0"
+databricks-sdk = "^0.64.0"
 dotenv = "^0.9.9"
 jinja2 = "^3.1.6"
 python-dateutil = "^2.9.0.post0"
+pyspark = "^4.0.0"
+pyarrow = "20.0.0"
+cloudpickle = "^3.1.1"
+pytz = "^2025.2"
 
 
 [build-system]
@@ -1,56 +0,0 @@
-Metadata-Version: 2.1
-Name: databricks-ddbxutils
-Version: 0.1.0
-Summary: extends databricks dbutils
-Author: Haneul Kim
-Author-email: haneul.kim@data-dynamics.io
-Requires-Python: >=3.11,<4.0
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.11
-Classifier: Programming Language :: Python :: 3.12
-Requires-Dist: databricks-sdk (>=0.57.0,<0.58.0)
-Requires-Dist: dotenv (>=0.9.9,<0.10.0)
-Requires-Dist: jinja2 (>=3.1.6,<4.0.0)
-Requires-Dist: python-dateutil (>=2.9.0.post0,<3.0.0)
-Description-Content-Type: text/markdown
-
-# databricks-ddbxutils
-
-ddbxutils extends dbutils where it falls short.
-
-## Features
-
-* [x] jinja2 templates applied to `dbutils.widgets`
-
-## Setup
-
-```shell
-cd <PROJECT_ROOT>
-pip install poetry
-```
-
-## venv
-
-```shell
-poetry shell
-```
-
-## Build
-
-```shell
-poetry build
-```
-
-## Run
-
-### In Databricks w/o init_script
-
-* Add Wheel
-  * Create a Volume for wheel uploads, then upload the wheel
-    * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
-  * In the notebook's Environment panel (right side), set the Environment version to 2, add the wheel file uploaded to the Volume, then click Apply
-
-### In Databricks w/ init_script
-
-[//]: # (TODO)
-
@@ -1,39 +0,0 @@
-# databricks-ddbxutils
-
-ddbxutils extends dbutils where it falls short.
-
-## Features
-
-* [x] jinja2 templates applied to `dbutils.widgets`
-
-## Setup
-
-```shell
-cd <PROJECT_ROOT>
-pip install poetry
-```
-
-## venv
-
-```shell
-poetry shell
-```
-
-## Build
-
-```shell
-poetry build
-```
-
-## Run
-
-### In Databricks w/o init_script
-
-* Add Wheel
-  * Create a Volume for wheel uploads, then upload the wheel
-    * `/Volumes/<CATALOG>/<DATABASE>/<VOLUME_NAME>/ddbxutils-<VERSION>-py3-none-any.whl`
-  * In the notebook's Environment panel (right side), set the Environment version to 2, add the wheel file uploaded to the Volume, then click Apply
-
-### In Databricks w/ init_script
-
-[//]: # (TODO)
@@ -1,6 +0,0 @@
-from databricks.sdk import WorkspaceClient
-
-from . import widgets
-
-w = WorkspaceClient()
-widgets.init(w.dbutils)
@@ -1,14 +0,0 @@
-# Import our custom ddbxutils module
-import ddbxutils
-
-# Get the default value of the 'next_day' widget
-initial_value = ddbxutils.widgets.get("next_day")
-print(f"initial 'next_day' value: {initial_value}")
-
-# Fetch the value again after it changes
-updated_value = ddbxutils.widgets.get("next_day")
-print(f"updated 'next_day' value: {updated_value}")
-
-# Fetch a widget that does not exist
-other_value = ddbxutils.widgets.get("another_widget")
-print(f"'another_widget' value: {other_value}")
@@ -1,42 +0,0 @@
-from .core import WidgetImpl
-
-_widget_impl_instance: WidgetImpl = None
-
-
-def init(dbutils):
-    """
-    Initializes the widgets module.
-    This function must be called once with a dbutils object.
-
-    :param dbutils: dbutils
-    :return: None
-    """
-    global _widget_impl_instance
-    _widget_impl_instance = WidgetImpl(dbutils)
-
-
-def get(widget_name: str):
-    """
-    Gets a widget value from the initialized instance.
-    Raises an exception if init() has not been called.
-
-    :param widget_name: widget key
-    :return: resolved widget value
-    """
-    if _widget_impl_instance is None:
-        raise RuntimeError('ddbxutils.widgets is not initialized. Call `ddbxutils.widgets.init(dbutils)` first.')
-    return _widget_impl_instance.get(widget_name)
-
-
-def refresh(dbutils):
-    """
-    Refreshes the widget values.
-
-    :param dbutils: dbutils
-    :return: None
-    """
-    if _widget_impl_instance is None:
-        raise RuntimeError('ddbxutils.widgets is not initialized. Call `ddbxutils.widgets.init(dbutils)` first.')
-    if dbutils is None:
-        raise RuntimeError('dbutils is required.')
-    _widget_impl_instance.refresh(dbutils)