PyPI - salesforce-data-customcode - Versions diffs - 0.1.2__tar.gz → 0.1.5__tar.gz - Mend

salesforce-data-customcode 0.1.2 (tar.gz) → 0.1.5 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (36)

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: salesforce-data-customcode
-Version: 0.1.2
+Version: 0.1.5
 Summary: Data Cloud Custom Code SDK
 License: Apache-2.0
 Requires-Python: >=3.10,<3.12
@@ -31,6 +31,14 @@ More specifically, this codebase gives you ability to test code locally before p
 Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_USE.md)
+## Prerequisites
+- Python 3.11 (If your system version is different, we recommend using [pyenv](https://github.com/pyenv/pyenv) to configure 3.11)
+- [Azul Zulu OpenJDK 17.x](https://www.azul.com/downloads/?version=java-17-lts&package=jdk#zulu)
+- Docker support like [Docker Desktop](https://docs.docker.com/desktop/)
+- A salesforce org, with some DLOs or DMOs with data
+- A [connected app](#creating-a-connected-app)
 ## Installation
 The SDK can be downloaded directly from PyPI with `pip`:
 ```
@@ -42,12 +50,16 @@ You can verify it was properly installed via CLI:
 datacustomcode version
 ```
-## Development Setup
-We offer two built-in development interfaces: `devcontainers` and Jupyter, but you can set up any tool you would like manually.
+## Quick start
+Ensure you have all the [prerequisites](#prerequisites) prepared on your machine.
-To get started, use the CLI to initialize a new development environment:
-```
-datacustomcode init [DIRECTORY TO DUMP NEW REPO]
+To get started, create a directory and initialize a new project with the CLI:
+```zsh
+mkdir datacloud && cd datacloud
+python3.11 -m venv .venv
+source .venv/bin/activate
+pip install salesforce-data-customcode
+datacustomcode init my_package
 ```
 This will yield all necessary files to get started:
@@ -66,11 +78,33 @@ This will yield all necessary files to get started:
 * `Dockerfile` <span style="color:grey;font-style:italic;">(Do not update)</span> – Development container emulating the remote execution environment.
 * `requirements-dev.txt` <span style="color:grey;font-style:italic;">(Do not update)</span> – These are the dependencies for the development environment.
 * `jupyterlab.sh` <span style="color:grey;font-style:italic;">(Do not update)</span> – Helper script for setting up Jupyter.
-* `requirements.txt` – Here you define the requirements that you will need remotely
+* `requirements.txt` – Here you define the requirements that you will need for your script.
 * `payload` – This folder will be compressed and deployed to the remote execution environment.
   * `config.json` – This config defines permissions on the back and can be generated programmatically with `scan` CLI method.
   * `entrypoint.py` – The script that defines the data transformation logic.
+A functional entrypoint.py is provided so you can run once you've configured your connected app:
+```zsh
+cd my_package
+datacustomcode configure
+datacustomcode run ./payload/entrypoint.py
+```
+> [!IMPORTANT]
+> The example entrypoint.py requires a `Account_Home__dll` DLO to be present.  And in order to deploy the script (next step), the output DLO (which is `Account_Home_copy__dll` in the example entrypoint.py) also needs to exist and be in the same dataspace as `Account_Home__dll`.
+After modifying the `entrypoint.py` as needed, using any dependencies you add in the `.venv` virtual environment, you can run this script in Data Cloud:
+```zsh
+datacustomcode scan ./payload/entrypoint.py
+datacustomcode deploy --path ./payload --name my_custom_script
+```
+> [!TIP]
+> The `deploy` process can take several minutes.  If you'd like more feedback on the underlying process, you can add `--debug` to the command like `datacustomcode --debug deploy --path ./payload --name my_custom_script`
+You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the `Run Now` button to run it.
+Once the Data Transform run is successful, check the DLO your script is writing to and verify the correct records were added.
 ## API
 You entry point script will define logic using the `Client` object which wraps data access layers.
@@ -157,3 +191,31 @@ Options:
 - `--config-file TEXT`: Path to configuration file
 - `--dependencies TEXT`: Additional dependencies (can be specified multiple times)
+#### `datacustomcode zip`
+Zip a transformation job in preparation to upload to Data Cloud.
+Options:
+- `--path TEXT`: Path to the code directory (default: ".")
+## Prerequisite details
+### Creating a connected app
+1. Log in to salesforce as an admin. In the top right corner, click on the gear icon and go to `Setup`
+2. In the left hand side, search for "App Manager" and select the `App Manager` underneath `Apps`
+3. Click on `New Connected App` in the upper right
+4. Fill in the required fields within the `Basic Information` section
+5. Under the `API (Enable OAuth Settings)` section:
+    1. Click on the checkbox to Enable OAuth Settings.
+    2. Provide a callback URL like http://localhost:55555/callback
+    3. In the Selected OAuth Scopes, make sure that `refresh_token`, `api`, `cdp_query_api`, `cdp_profile_api` is selected.
+    4. Click on Save to save the connected app
+6. From the detail page that opens up afterwards, click the "Manage Consumer Details" button to find your client id and client secret
+7. Go back to `Setup`, then `OAuth and OpenID Connect Settings`, and enable the "Allow OAuth Username-Password Flows" option
+You now have all fields necessary for the `datacustomcode configure` command.
+## Other docs
+[Troubleshooting](./docs/troubleshooting.md)

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/README.md RENAMED

@@ -8,6 +8,14 @@ More specifically, this codebase gives you ability to test code locally before p
 Use of this project with Salesforce is subject to the [TERMS OF USE](./TERMS_OF_USE.md)
+## Prerequisites
+- Python 3.11 (If your system version is different, we recommend using [pyenv](https://github.com/pyenv/pyenv) to configure 3.11)
+- [Azul Zulu OpenJDK 17.x](https://www.azul.com/downloads/?version=java-17-lts&package=jdk#zulu)
+- Docker support like [Docker Desktop](https://docs.docker.com/desktop/)
+- A salesforce org, with some DLOs or DMOs with data
+- A [connected app](#creating-a-connected-app)
 ## Installation
 The SDK can be downloaded directly from PyPI with `pip`:
 ```
@@ -19,12 +27,16 @@ You can verify it was properly installed via CLI:
 datacustomcode version
 ```
-## Development Setup
-We offer two built-in development interfaces: `devcontainers` and Jupyter, but you can set up any tool you would like manually.
+## Quick start
+Ensure you have all the [prerequisites](#prerequisites) prepared on your machine.
-To get started, use the CLI to initialize a new development environment:
-```
-datacustomcode init [DIRECTORY TO DUMP NEW REPO]
+To get started, create a directory and initialize a new project with the CLI:
+```zsh
+mkdir datacloud && cd datacloud
+python3.11 -m venv .venv
+source .venv/bin/activate
+pip install salesforce-data-customcode
+datacustomcode init my_package
 ```
 This will yield all necessary files to get started:
@@ -43,11 +55,33 @@ This will yield all necessary files to get started:
 * `Dockerfile` <span style="color:grey;font-style:italic;">(Do not update)</span> – Development container emulating the remote execution environment.
 * `requirements-dev.txt` <span style="color:grey;font-style:italic;">(Do not update)</span> – These are the dependencies for the development environment.
 * `jupyterlab.sh` <span style="color:grey;font-style:italic;">(Do not update)</span> – Helper script for setting up Jupyter.
-* `requirements.txt` – Here you define the requirements that you will need remotely
+* `requirements.txt` – Here you define the requirements that you will need for your script.
 * `payload` – This folder will be compressed and deployed to the remote execution environment.
   * `config.json` – This config defines permissions on the back and can be generated programmatically with `scan` CLI method.
   * `entrypoint.py` – The script that defines the data transformation logic.
+A functional entrypoint.py is provided so you can run once you've configured your connected app:
+```zsh
+cd my_package
+datacustomcode configure
+datacustomcode run ./payload/entrypoint.py
+```
+> [!IMPORTANT]
+> The example entrypoint.py requires a `Account_Home__dll` DLO to be present.  And in order to deploy the script (next step), the output DLO (which is `Account_Home_copy__dll` in the example entrypoint.py) also needs to exist and be in the same dataspace as `Account_Home__dll`.
+After modifying the `entrypoint.py` as needed, using any dependencies you add in the `.venv` virtual environment, you can run this script in Data Cloud:
+```zsh
+datacustomcode scan ./payload/entrypoint.py
+datacustomcode deploy --path ./payload --name my_custom_script
+```
+> [!TIP]
+> The `deploy` process can take several minutes.  If you'd like more feedback on the underlying process, you can add `--debug` to the command like `datacustomcode --debug deploy --path ./payload --name my_custom_script`
+You can now use the Salesforce Data Cloud UI to find the created Data Transform and use the `Run Now` button to run it.
+Once the Data Transform run is successful, check the DLO your script is writing to and verify the correct records were added.
 ## API
 You entry point script will define logic using the `Client` object which wraps data access layers.
@@ -133,3 +167,31 @@ Argument:
 Options:
 - `--config-file TEXT`: Path to configuration file
 - `--dependencies TEXT`: Additional dependencies (can be specified multiple times)
+#### `datacustomcode zip`
+Zip a transformation job in preparation to upload to Data Cloud.
+Options:
+- `--path TEXT`: Path to the code directory (default: ".")
+## Prerequisite details
+### Creating a connected app
+1. Log in to salesforce as an admin. In the top right corner, click on the gear icon and go to `Setup`
+2. In the left hand side, search for "App Manager" and select the `App Manager` underneath `Apps`
+3. Click on `New Connected App` in the upper right
+4. Fill in the required fields within the `Basic Information` section
+5. Under the `API (Enable OAuth Settings)` section:
+    1. Click on the checkbox to Enable OAuth Settings.
+    2. Provide a callback URL like http://localhost:55555/callback
+    3. In the Selected OAuth Scopes, make sure that `refresh_token`, `api`, `cdp_query_api`, `cdp_profile_api` is selected.
+    4. Click on Save to save the connected app
+6. From the detail page that opens up afterwards, click the "Manage Consumer Details" button to find your client id and client secret
+7. Go back to `Setup`, then `OAuth and OpenID Connect Settings`, and enable the "Allow OAuth Username-Password Flows" option
+You now have all fields necessary for the `datacustomcode configure` command.
+## Other docs
+[Troubleshooting](./docs/troubleshooting.md)

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/pyproject.toml RENAMED

@@ -18,7 +18,7 @@ license = "Apache-2.0"
 name = "salesforce-data-customcode"
 readme = "README.md"
 requires-python = ">=3.10,<3.12"
-version = "0.1.2"
+version = "0.1.5"
 [tool.black]
 exclude = '''

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/src/datacustomcode/cli.py RENAMED

@@ -69,6 +69,15 @@ def configure(
     ).update_ini(profile=profile)
+@cli.command()
+@click.argument("path", default="payload")
+def zip(path: str):
+    from datacustomcode.deploy import zip
+    logger.debug("Zipping project")
+    zip(path)
 @cli.command()
 @click.option("--profile", default="default")
 @click.option("--path", default="payload")
@@ -127,8 +136,11 @@ def init(directory: str):
 @click.argument("filename")
 @click.option("--config")
 @click.option("--dry-run", is_flag=True)
-def scan(filename: str, config: str, dry_run: bool):
-    from datacustomcode.scan import dc_config_json_from_file
+@click.option(
+    "--no-requirements", is_flag=True, help="Skip generating requirements.txt file"
+)
+def scan(filename: str, config: str, dry_run: bool, no_requirements: bool):
+    from datacustomcode.scan import dc_config_json_from_file, write_requirements_file
     config_location = config or os.path.join(os.path.dirname(filename), "config.json")
     click.echo(
@@ -143,6 +155,13 @@ def scan(filename: str, config: str, dry_run: bool):
         with open(config_location, "w") as f:
             json.dump(config_json, f, indent=2)
+        if not no_requirements:
+            requirements_path = write_requirements_file(filename)
+            click.echo(
+                "Generated requirements file: "
+                + click.style(requirements_path, fg="blue", bold=True)
+            )
 @cli.command()
 @click.argument("entrypoint")
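
Unless `--no-requirements` is passed, `scan` now calls `write_requirements_file` (defined in the `scan.py` diff further down), which merges newly discovered imports into any existing `requirements.txt`. The merge step can be sketched in isolation as follows. `merge_requirements` is a hypothetical helper written for illustration, not part of the package; it only mirrors the set-union-then-sort logic visible in the diff.

```python
def merge_requirements(existing_lines: list[str], discovered: set[str]) -> list[str]:
    """Hypothetical stand-in for the merge step in write_requirements_file.

    Every non-blank existing line is kept verbatim (after stripping
    whitespace), unioned with the discovered import names, then sorted.
    """
    existing = {line.strip() for line in existing_lines if line.strip()}
    return sorted(existing | discovered)


# An existing file with a pinned entry, a comment, and a blank line.
existing_file = ["pandas==2.2.0\n", "# pinned\n", "\n"]
print(merge_requirements(existing_file, {"pandas", "numpy"}))
# prints ['# pinned', 'numpy', 'pandas', 'pandas==2.2.0']
```

Because the merge compares whole lines, a pinned entry such as `pandas==2.2.0` and a freshly discovered bare `pandas` both survive, and comment lines are sorted along with the package names. Reviewing the generated file after `scan` is probably worthwhile.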

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/src/datacustomcode/deploy.py RENAMED

@@ -169,25 +169,14 @@ def prepare_dependency_archive(directory: str) -> None:
         archive_file = os.path.join(archives_dir, DEPENDENCIES_ARCHIVE_NAME)
         with tarfile.open(archive_file, "w:gz") as tar:
             for file in os.listdir(temp_dir):
+                # Exclude requirements.txt from the archive
+                if file == "requirements.txt":
+                    continue
                 tar.add(os.path.join(temp_dir, file), arcname=file)
         logger.debug(f"Dependencies downloaded and archived to {archive_file}")
-def zip_and_upload_directory(directory: str, file_upload_url: str) -> None:
-    file_upload_url = unescape(file_upload_url)
-    logger.debug(f"Zipping directory... {directory}")
-    shutil.make_archive(ZIP_FILE_NAME.rstrip(".zip"), "zip", directory)
-    logger.debug(f"Uploading deployment to {file_upload_url}")
-    with open(ZIP_FILE_NAME, "rb") as zip_file:
-        response = requests.put(
-            file_upload_url, data=zip_file, headers={"Content-Type": "application/zip"}
-        )
-        response.raise_for_status()
 class DeploymentsResponse(BaseModel):
     deploymentStatus: str
@@ -325,6 +314,71 @@ def create_data_transform(
     return response
+def has_nonempty_requirements_file(directory: str) -> bool:
+    """
+    Check if requirements.txt exists in the given directory and has at least
+    one non-comment line.
+    Args:
+        directory (str): The directory to check for requirements.txt.
+    Returns:
+        bool: True if requirements.txt exists and has a non-comment line,
+        False otherwise.
+    """
+    # Look for requirements.txt in the parent directory of the given directory
+    requirements_path = os.path.join(os.path.dirname(directory), "requirements.txt")
+    try:
+        if os.path.isfile(requirements_path):
+            with open(requirements_path, "r", encoding="utf-8") as f:
+                for line in f:
+                    # Consider non-empty if any line is not a comment (ignoring
+                    # leading whitespace)
+                    if line.strip() and not line.lstrip().startswith("#"):
+                        return True
+    except Exception as e:
+        logger.error(f"Error reading requirements.txt: {e}")
+    return False
+def upload_zip(file_upload_url: str) -> None:
+    file_upload_url = unescape(file_upload_url)
+    with open(ZIP_FILE_NAME, "rb") as zip_file:
+        response = requests.put(
+            file_upload_url, data=zip_file, headers={"Content-Type": "application/zip"}
+        )
+        response.raise_for_status()
+def zip(
+    directory: str,
+):
+    # Create a zip file excluding .DS_Store files
+    import zipfile
+    # prepare payload only if requirements.txt is non-empty
+    if has_nonempty_requirements_file(directory):
+        prepare_dependency_archive(directory)
+    else:
+        logger.info(
+            f"Skipping dependency archive: requirements.txt is missing or empty "
+            f"in {directory}"
+        )
+    logger.debug(f"Zipping directory... {directory}")
+    with zipfile.ZipFile(ZIP_FILE_NAME, "w", zipfile.ZIP_DEFLATED) as zipf:
+        for root, dirs, files in os.walk(directory):
+            # Skip .DS_Store files when adding to zip
+            for file in files:
+                if file != ".DS_Store":
+                    file_path = os.path.join(root, file)
+                    # Preserve relative path structure in the zip file
+                    arcname = os.path.relpath(file_path, directory)
+                    zipf.write(file_path, arcname)
+    logger.debug(f"Created zip file: {ZIP_FILE_NAME}")
 def deploy_full(
     directory: str,
     metadata: TransformationJobMetadata,
@@ -340,7 +394,8 @@ def deploy_full(
     # create deployment and upload payload
     deployment = create_deployment(access_token, metadata)
-    zip_and_upload_directory(directory, deployment.fileUploadUrl)
+    zip(directory)
+    upload_zip(deployment.fileUploadUrl)
     wait_for_deployment(access_token, metadata, callback)
     # create data transform
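
The new `zip` path in this file does two things: it checks whether a `requirements.txt` beside the payload directory has any non-comment line, and it writes the archive while skipping `.DS_Store` files. The following is a minimal standalone sketch of both steps, assuming a `project/payload` layout. The function names `has_requirements` and `zip_payload` and the explicit `zip_path` argument are illustrative choices; the package itself writes to its `ZIP_FILE_NAME` constant.

```python
import os
import tempfile
import zipfile


def has_requirements(payload_dir: str) -> bool:
    """True if requirements.txt beside payload_dir has a non-comment line."""
    path = os.path.join(os.path.dirname(payload_dir), "requirements.txt")
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8") as f:
        return any(line.strip() and not line.lstrip().startswith("#") for line in f)


def zip_payload(payload_dir: str, zip_path: str) -> list[str]:
    """Zip payload_dir into zip_path, skipping .DS_Store; return archive names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(payload_dir):
            for name in files:
                if name == ".DS_Store":
                    continue
                full = os.path.join(root, name)
                # Store paths relative to the payload root.
                zf.write(full, os.path.relpath(full, payload_dir))
        return sorted(zf.namelist())


with tempfile.TemporaryDirectory() as project:
    payload = os.path.join(project, "payload")
    os.makedirs(payload)
    for name in ("entrypoint.py", ".DS_Store"):
        with open(os.path.join(payload, name), "w") as f:
            f.write("# demo\n")
    with open(os.path.join(project, "requirements.txt"), "w") as f:
        f.write("# only a comment\n")
    print(has_requirements(payload))  # prints False: comment-only counts as empty
    print(zip_payload(payload, os.path.join(project, "deployment.zip")))
    # prints ['entrypoint.py']
```

One detail worth knowing: `os.path.dirname("./payload/")` returns `"./payload"`, so a trailing slash on the path makes this lookup search inside the payload directory rather than beside it.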

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/src/datacustomcode/scan.py RENAMED

@@ -15,9 +15,12 @@
 from __future__ import annotations
 import ast
+import os
 from typing import (
     Any,
+    ClassVar,
     Dict,
+    Set,
     Union,
 )
@@ -131,6 +134,137 @@ class ClientMethodVisitor(ast.NodeVisitor):
         )
+class ImportVisitor(ast.NodeVisitor):
+    """AST Visitor that extracts external package imports from Python code."""
+    # Standard library modules that should be excluded from requirements
+    STANDARD_LIBS: ClassVar[set[str]] = {
+        "abc",
+        "argparse",
+        "ast",
+        "asyncio",
+        "base64",
+        "collections",
+        "configparser",
+        "contextlib",
+        "copy",
+        "csv",
+        "datetime",
+        "enum",
+        "functools",
+        "glob",
+        "hashlib",
+        "http",
+        "importlib",
+        "inspect",
+        "io",
+        "itertools",
+        "json",
+        "logging",
+        "math",
+        "os",
+        "pathlib",
+        "pickle",
+        "random",
+        "re",
+        "shutil",
+        "site",
+        "socket",
+        "sqlite3",
+        "string",
+        "subprocess",
+        "sys",
+        "tempfile",
+        "threading",
+        "time",
+        "traceback",
+        "typing",
+        "uuid",
+        "warnings",
+        "xml",
+        "zipfile",
+    }
+    # Additional packages to exclude from requirements.txt
+    EXCLUDED_PACKAGES: ClassVar[set[str]] = {
+        "datacustomcode",  # Internal package
+        "pyspark",  # Provided by the runtime environment
+    }
+    def __init__(self) -> None:
+        self.imports: Set[str] = set()
+    def visit_Import(self, node: ast.Import) -> None:
+        """Visit an import statement (e.g., import os, sys)."""
+        for name in node.names:
+            # Get the top-level package name
+            package = name.name.split(".")[0]
+            if (
+                package not in self.STANDARD_LIBS
+                and package not in self.EXCLUDED_PACKAGES
+                and not package.startswith("_")
+            ):
+                self.imports.add(package)
+        self.generic_visit(node)
+    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
+        """Visit a from-import statement (e.g., from os import path)."""
+        if node.module is not None:
+            # Get the top-level package
+            package = node.module.split(".")[0]
+            if (
+                package not in self.STANDARD_LIBS
+                and package not in self.EXCLUDED_PACKAGES
+                and not package.startswith("_")
+            ):
+                self.imports.add(package)
+        self.generic_visit(node)
+def scan_file_for_imports(file_path: str) -> Set[str]:
+    """Scan a Python file for external package imports."""
+    with open(file_path, "r") as f:
+        code = f.read()
+        tree = ast.parse(code)
+        visitor = ImportVisitor()
+        visitor.visit(tree)
+        return visitor.imports
+def write_requirements_file(file_path: str) -> str:
+    """
+    Scan a Python file for imports and write them to requirements.txt.
+    Args:
+        file_path: Path to the Python file to scan
+    Returns:
+        Path to the generated requirements.txt file
+    """
+    imports = scan_file_for_imports(file_path)
+    # Write requirements.txt in the parent directory of the Python file
+    file_dir = os.path.dirname(file_path)
+    parent_dir = os.path.dirname(file_dir) if file_dir else "."
+    requirements_path = os.path.join(parent_dir, "requirements.txt")
+    # If the file exists, read existing requirements and merge with new ones
+    existing_requirements = set()
+    if os.path.exists(requirements_path):
+        with open(requirements_path, "r") as f:
+            existing_requirements = {line.strip() for line in f if line.strip()}
+    # Merge existing requirements with newly discovered ones
+    all_requirements = existing_requirements.union(imports)
+    # Write the combined requirements
+    with open(requirements_path, "w") as f:
+        for package in sorted(all_requirements):
+            f.write(f"{package}\n")
+    return requirements_path
 def scan_file(file_path: str) -> DataAccessLayerCalls:
     """Scan a single Python file for Client read/write method calls."""
     with open(file_path, "r") as f:
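
The import scanning added above can be sketched in a few lines. This version uses `ast.walk` instead of the diff's `ast.NodeVisitor` subclass, and the `STDLIB` and `EXCLUDED` sets are deliberately tiny stand-ins for `STANDARD_LIBS` and `EXCLUDED_PACKAGES`; `top_level_imports` is a hypothetical name used only here.

```python
import ast

# Assumption: small stand-ins for the diff's much longer exclusion sets.
STDLIB = {"os", "json", "sys"}
EXCLUDED = {"datacustomcode", "pyspark"}


def top_level_imports(source: str) -> set[str]:
    """Return top-level names of imported packages that are not filtered out."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module is not None:
            names = [node.module]
        else:
            continue
        for name in names:
            package = name.split(".")[0]  # "numpy.linalg" -> "numpy"
            if (
                package not in STDLIB
                and package not in EXCLUDED
                and not package.startswith("_")
            ):
                found.add(package)
    return found


sample = """
import os, numpy.linalg
from pyspark.sql.functions import col
from pandas import DataFrame
from . import helpers
"""
print(sorted(top_level_imports(sample)))  # prints ['numpy', 'pandas']
```

Two limitations follow from this approach. The result is a set of import names, which do not always match PyPI distribution names (for example, `sklearn` is installed as `scikit-learn`). Also, like the diff's visitor, the sketch does not inspect `node.level`, so a relative import such as `from .utils import x` is recorded as `utils`.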

salesforce_data_customcode-0.1.5/src/datacustomcode/templates/account.ipynb ADDED

@@ -0,0 +1,86 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from datacustomcode.client import Client\n",
+    "from datacustomcode.io.writer.base import WriteMode\n",
+    "from pyspark.sql.functions import col, upper"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "client = Client()\n",
+    "\n",
+    "df = client.read_dlo(\"Account_Home__dll\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Perform transformations on the DataFrame\n",
+    "df_upper1 = df.withColumn(\"Description__c\", upper(col(\"Description__c\")))\n",
+    "\n",
+    "# Drop specific columns related to relationships\n",
+    "df_upper1 = df_upper1.drop(\"KQ_ParentId__c\")\n",
+    "df_upper1 = df_upper1.drop(\"KQ_Id__c\")\n",
+    "\n",
+    "df_upper1.show()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the transformed DataFrame\n",
+    "dlo_name = \"Account_Home_copy__dll\"\n",
+    "client.write_to_dlo(dlo_name, df_upper1, write_mode=WriteMode.APPEND)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

{salesforce_data_customcode-0.1.2 → salesforce_data_customcode-0.1.5}/src/datacustomcode/templates/jupyterlab.sh RENAMED

@@ -48,13 +48,13 @@ check_docker() {
 # Function to start Jupyter server
 start_jupyter() {
     echo "Building the docker image"
-    docker build -t datacloud-byoc .
+    docker build -t datacloud-customcode .
     echo "Running the docker container"
     docker run -d --rm -p 8888:8888 \
         -v $(pwd):/workspace \
         --name jupyter-server \
-        datacloud-byoc jupyter lab \
+        datacloud-customcode jupyter lab \
         --ip=0.0.0.0 \
         --port=8888 \
         --no-browser \