PyPI - ml-analytics-tools - Versions diffs - 0.2.1__tar.gz → 0.3.0__tar.gz - Mend

ml-analytics-tools 0.2.1tar.gz → 0.3.0tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (31) hide show

{ml_analytics_tools-0.2.1 → ml_analytics_tools-0.3.0}/PKG-INFO RENAMED Viewed

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ml-analytics-tools
-Version: 0.2.1
+Version: 0.3.0
 Summary: Tools for ML projects and data management
 Requires-Python: >=3.11
 Description-Content-Type: text/markdown
@@ -30,6 +30,7 @@ Requires-Dist: seaborn>=0.13.2
 Requires-Dist: setuptools>=42.0.0
 Requires-Dist: shap>=0.47.2
 Requires-Dist: slack-sdk>=3.27.0
+Requires-Dist: snowflake-connector-python[pandas,secure-local-storage]>=4.6.0
 Dynamic: license-file
 # ML Analytics Tools
@@ -49,7 +50,7 @@ arguments.
 ## What Is Included
-- `DataConnector`: run Redshift SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
+- `DataConnector`: run Redshift or Snowflake SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
 - `S3Connector`: read, write, list, delete, and query S3 data with DuckDB.
 - `GSheet`: read, write, share, and export Google Sheets data.
 - `SlackConnector`: send messages, upload files, and manage simple Slack interactions.
@@ -90,6 +91,18 @@ BI_REDSHIFT_USER=analytics_user
 BI_REDSHIFT_PASSWORD=secret
 BI_REDSHIFT_PORT=5439
+# Snowflake
+SNOWFLAKE_USER=your.name@example.com
+SNOWFLAKE_ACCOUNT=your-account
+SNOWFLAKE_WAREHOUSE=ANALYTICS_S_WH
+SNOWFLAKE_DATABASE=ANALYTICS_DB
+SNOWFLAKE_SCHEMA=PUBLIC
+SNOWFLAKE_AUTHENTICATOR=externalbrowser
+# Browser-free Snowflake auth for local or Databricks jobs
+SNOWFLAKE_PRIVATE_KEY_PATH=~/.snowflake/rsa_key.p8
+SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=secret
 # S3
 ML_ANALYTICS_S3_BUCKET=my-analytics-bucket
@@ -141,6 +154,43 @@ df = dc.sql("SELECT * FROM analytics.customer_features LIMIT 100")
 df_polars = dc.sql("queries/features.sql", format="polars", country="es")
 ```
+### Query Snowflake
+```python
+from ml_analytics import DataConnector
+dc = DataConnector(engine="snowflake")
+df = dc.sql("SELECT 1 AS col_1")
+```
+For local interactive work, `SNOWFLAKE_AUTHENTICATOR=externalbrowser` is supported.
+SSO tokens are cached in the OS keychain, so the browser login only happens once
+per token lifetime.
+For Databricks and Spark jobs, use key-pair auth instead. The connector reads
+default Databricks personal-scope secrets automatically:
+```bash
+databricks secrets put-secret user-your.name@example.com snowflake_key --bytes-value """$(cat rsa_key.p8)"""
+databricks secrets put-secret user-your.name@example.com snowflake_key_pass --string-value """<password>"""
+```
+Then build Spark connector options without opening a browser:
+```python
+from ml_analytics import DataConnector
+dc = DataConnector(engine="snowflake", secret_scope="user-your.name@example.com")
+options = dc.snowflake_spark_options()
+df = (
+    spark.read.format("net.snowflake.spark.snowflake")
+    .options(**options)
+    .option("query", "SELECT 1 AS col_1")
+    .load()
+)
+```
 ### Create A Redshift Table From A DataFrame
 ```python
@@ -181,6 +231,24 @@ df = gsheet.read_sheet(spreadsheet_id="...", sheet_name="Input")
 gsheet.write_sheet(df, spreadsheet_id="...", sheet_name="Results")
 ```
+#### OAuth authentication (alternative to a service account)
+`GSheet` can authenticate as your own Google account using OAuth installed-app
+credentials (e.g. Preply's Google Workspace CLI credentials). Set these env vars
+and the connector uses OAuth automatically when no service-account credentials
+are found:
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `GOOGLE_OAUTH_CLIENT_ID` | yes | OAuth client id (`...apps.googleusercontent.com`) |
+| `GOOGLE_OAUTH_CLIENT_SECRET` | yes | OAuth client secret (`GOCSPX-...`) |
+| `GOOGLE_CLOUD_PROJECT` | optional | GCP project id (e.g. `preply-gworkspace-cli`) |
+| `GSHEET_TOKEN_PATH` | optional | Token cache path (default `~/.config/ml-analytics/gsheet_token.json`) |
+The first run opens a browser for one-time consent; the cached refresh token
+makes later runs non-interactive. Under OAuth, `get_service_account_email()`
+returns `None`.
 ### Log To MLflow
 ```python

{ml_analytics_tools-0.2.1 → ml_analytics_tools-0.3.0}/README.md RENAMED Viewed

@@ -15,7 +15,7 @@ arguments.
 ## What Is Included
-- `DataConnector`: run Redshift SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
+- `DataConnector`: run Redshift or Snowflake SQL, load SQL files, unload/load data through S3, and create Redshift tables from DataFrames.
 - `S3Connector`: read, write, list, delete, and query S3 data with DuckDB.
 - `GSheet`: read, write, share, and export Google Sheets data.
 - `SlackConnector`: send messages, upload files, and manage simple Slack interactions.
@@ -56,6 +56,18 @@ BI_REDSHIFT_USER=analytics_user
 BI_REDSHIFT_PASSWORD=secret
 BI_REDSHIFT_PORT=5439
+# Snowflake
+SNOWFLAKE_USER=your.name@example.com
+SNOWFLAKE_ACCOUNT=your-account
+SNOWFLAKE_WAREHOUSE=ANALYTICS_S_WH
+SNOWFLAKE_DATABASE=ANALYTICS_DB
+SNOWFLAKE_SCHEMA=PUBLIC
+SNOWFLAKE_AUTHENTICATOR=externalbrowser
+# Browser-free Snowflake auth for local or Databricks jobs
+SNOWFLAKE_PRIVATE_KEY_PATH=~/.snowflake/rsa_key.p8
+SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=secret
 # S3
 ML_ANALYTICS_S3_BUCKET=my-analytics-bucket
@@ -107,6 +119,43 @@ df = dc.sql("SELECT * FROM analytics.customer_features LIMIT 100")
 df_polars = dc.sql("queries/features.sql", format="polars", country="es")
 ```
+### Query Snowflake
+```python
+from ml_analytics import DataConnector
+dc = DataConnector(engine="snowflake")
+df = dc.sql("SELECT 1 AS col_1")
+```
+For local interactive work, `SNOWFLAKE_AUTHENTICATOR=externalbrowser` is supported.
+SSO tokens are cached in the OS keychain, so the browser login only happens once
+per token lifetime.
+For Databricks and Spark jobs, use key-pair auth instead. The connector reads
+default Databricks personal-scope secrets automatically:
+```bash
+databricks secrets put-secret user-your.name@example.com snowflake_key --bytes-value """$(cat rsa_key.p8)"""
+databricks secrets put-secret user-your.name@example.com snowflake_key_pass --string-value """<password>"""
+```
+Then build Spark connector options without opening a browser:
+```python
+from ml_analytics import DataConnector
+dc = DataConnector(engine="snowflake", secret_scope="user-your.name@example.com")
+options = dc.snowflake_spark_options()
+df = (
+    spark.read.format("net.snowflake.spark.snowflake")
+    .options(**options)
+    .option("query", "SELECT 1 AS col_1")
+    .load()
+)
+```
 ### Create A Redshift Table From A DataFrame
 ```python
@@ -147,6 +196,24 @@ df = gsheet.read_sheet(spreadsheet_id="...", sheet_name="Input")
 gsheet.write_sheet(df, spreadsheet_id="...", sheet_name="Results")
 ```
+#### OAuth authentication (alternative to a service account)
+`GSheet` can authenticate as your own Google account using OAuth installed-app
+credentials (e.g. Preply's Google Workspace CLI credentials). Set these env vars
+and the connector uses OAuth automatically when no service-account credentials
+are found:
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `GOOGLE_OAUTH_CLIENT_ID` | yes | OAuth client id (`...apps.googleusercontent.com`) |
+| `GOOGLE_OAUTH_CLIENT_SECRET` | yes | OAuth client secret (`GOCSPX-...`) |
+| `GOOGLE_CLOUD_PROJECT` | optional | GCP project id (e.g. `preply-gworkspace-cli`) |
+| `GSHEET_TOKEN_PATH` | optional | Token cache path (default `~/.config/ml-analytics/gsheet_token.json`) |
+The first run opens a browser for one-time consent; the cached refresh token
+makes later runs non-interactive. Under OAuth, `get_service_account_email()`
+returns `None`.
 ### Log To MLflow
 ```python

ml-analytics-tools 0.2.1__tar.gz → 0.3.0__tar.gz

ml-analytics-tools 0.2.1tar.gz → 0.3.0tar.gz