rgwfuncs 0.0.35__tar.gz → 0.0.36__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.2
  Name: rgwfuncs
- Version: 0.0.35
+ Version: 0.0.36
  Summary: A functional programming paradigm for mathematical modelling and data science
  Home-page: https://github.com/ryangerardwilson/rgwfunc
  Author: Ryan Gerard Wilson
@@ -667,28 +667,49 @@ Drop duplicate rows based on specified columns, retaining the last occurrence.

  ### 11. `load_data_from_query`

- Load data from a database query into a DataFrame based on a configuration preset.
+ Load data from a specified database using a SQL query and return the results in a Pandas DataFrame. The database connection configurations are determined by a preset name specified in a configuration file.

- - **Parameters:**
- - `db_preset_name` (str): Name of the database preset in the configuration file.
- - `query` (str): The SQL query to execute.
+ #### Features

- - **Returns:**
- - `pd.DataFrame`: A DataFrame containing the query result.
+ - Multi-Database Support: This function supports different database types, including MSSQL, MySQL, ClickHouse, Google BigQuery, and AWS Athena, based on the configuration preset selected.
+ - Configuration-Based: It utilizes a configuration file to store database connection details securely, avoiding hardcoding sensitive information directly into the script.
+ - Dynamic Query Execution: Capable of executing custom user-defined SQL queries against the specified database.
+ - Automatic Result Loading: Fetches query results and loads them directly into a Pandas DataFrame for further manipulation and analysis.

- - **Notes:**
- - The configuration file is assumed to be located at `~/.rgwfuncsrc`.
+ #### Parameters

- - **Example:**
+ - `db_preset_name` (str): The name of the database preset found in the configuration file. This preset determines which database connection details to use.
+ - `query` (str): The SQL query string to be executed on the database.
+
+ #### Returns
+
+ - `pd.DataFrame`: Returns a DataFrame that contains the results from the executed SQL query.

- from rgwfuncs import load_data_from_query
+ #### Configuration Details
+
+ - The configuration file is expected to be in JSON format and located at `~/.rgwfuncsrc`.
+ - Each preset within the configuration file must include:
+ - `name`: Name of the database preset.
+ - `db_type`: Type of the database (`mssql`, `mysql`, `clickhouse`, `google_big_query`, `aws_athena`).
+ - `credentials`: Necessary credentials such as host, username, password, and potentially others depending on the database type.
+
+ #### Example
+
+ from rgwfuncs import load_data_from_query
+
+ # Load data using a preset configuration
+ df = load_data_from_query(
+     db_preset_name="MyDBPreset",
+     query="SELECT * FROM my_table"
+ )
+ print(df)

- df = load_data_from_query(
-     db_preset_name="MyDBPreset",
-     query="SELECT * FROM my_table"
- )
- print(df)
+ #### Notes

+ - Security: Ensure that the configuration file (`~/.rgwfuncsrc`) is secure and accessible only to authorized users, as it contains sensitive information.
+ - Pre-requisites: Ensure the necessary Python packages are installed for each database type you wish to query. For example, `pymssql` for MSSQL, `mysql-connector-python` for MySQL, and so on.
+ - Error Handling: The function raises a `ValueError` if the specified preset name does not exist or if the database type is unsupported. Additional exceptions may arise from network issues or database errors.
+ - Environment: For AWS Athena, ensure that AWS credentials are configured properly for the boto3 library to authenticate successfully. Consider using AWS IAM roles or AWS Secrets Manager for better security management.

  --------------------------------------------------------------------------------

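The Configuration Details above say presets live in a JSON file at `~/.rgwfuncsrc` and are selected by `name`, but the README does not show the file's top-level layout. The following is only a minimal sketch under an assumption: presets sit in a list under a hypothetical `db_presets` key, and `find_db_preset` and `load_config` are illustrative helpers, not rgwfuncs functions. The example preset uses the flat keys that `query_athena` in this diff actually reads (`region`, `database`, `output_bucket`, `aws_access_key`, `aws_secret_key`) and the `db_type` value `'athena'` that its dispatch branch checks, rather than a nested `credentials` object.

```python
import json
import os
from typing import Any, Dict


def load_config(path: str = "~/.rgwfuncsrc") -> Dict[str, Any]:
    """Parse the JSON configuration file at path (with ~ expanded)."""
    with open(os.path.expanduser(path), "r") as f:
        return json.load(f)


def find_db_preset(config: Dict[str, Any], preset_name: str) -> Dict[str, Any]:
    """Return the preset whose "name" equals preset_name.

    ASSUMPTION: presets are stored as a list under a hypothetical top-level
    "db_presets" key; the README does not document this layout.
    """
    for preset in config.get("db_presets", []):
        if preset.get("name") == preset_name:
            return preset
    # Mirrors the README note that an unknown preset name raises ValueError.
    raise ValueError(f"No db preset named {preset_name!r}")


# Hypothetical example config; all values are placeholders.
example_config = {
    "db_presets": [
        {
            "name": "MyAthenaPreset",
            "db_type": "athena",
            "region": "us-east-1",
            "database": "my_database",
            "output_bucket": "s3://my-bucket/athena-results/",
            "aws_access_key": "<your-access-key-id>",
            "aws_secret_key": "<your-secret-access-key>",
        }
    ]
}

preset = find_db_preset(example_config, "MyAthenaPreset")
print(preset["db_type"])  # athena
```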
@@ -641,28 +641,49 @@ Drop duplicate rows based on specified columns, retaining the last occurrence.

  ### 11. `load_data_from_query`

- Load data from a database query into a DataFrame based on a configuration preset.
+ Load data from a specified database using a SQL query and return the results in a Pandas DataFrame. The database connection configurations are determined by a preset name specified in a configuration file.

- - **Parameters:**
- - `db_preset_name` (str): Name of the database preset in the configuration file.
- - `query` (str): The SQL query to execute.
+ #### Features

- - **Returns:**
- - `pd.DataFrame`: A DataFrame containing the query result.
+ - Multi-Database Support: This function supports different database types, including MSSQL, MySQL, ClickHouse, Google BigQuery, and AWS Athena, based on the configuration preset selected.
+ - Configuration-Based: It utilizes a configuration file to store database connection details securely, avoiding hardcoding sensitive information directly into the script.
+ - Dynamic Query Execution: Capable of executing custom user-defined SQL queries against the specified database.
+ - Automatic Result Loading: Fetches query results and loads them directly into a Pandas DataFrame for further manipulation and analysis.

- - **Notes:**
- - The configuration file is assumed to be located at `~/.rgwfuncsrc`.
+ #### Parameters

- - **Example:**
+ - `db_preset_name` (str): The name of the database preset found in the configuration file. This preset determines which database connection details to use.
+ - `query` (str): The SQL query string to be executed on the database.
+
+ #### Returns
+
+ - `pd.DataFrame`: Returns a DataFrame that contains the results from the executed SQL query.

- from rgwfuncs import load_data_from_query
+ #### Configuration Details
+
+ - The configuration file is expected to be in JSON format and located at `~/.rgwfuncsrc`.
+ - Each preset within the configuration file must include:
+ - `name`: Name of the database preset.
+ - `db_type`: Type of the database (`mssql`, `mysql`, `clickhouse`, `google_big_query`, `aws_athena`).
+ - `credentials`: Necessary credentials such as host, username, password, and potentially others depending on the database type.
+
+ #### Example
+
+ from rgwfuncs import load_data_from_query
+
+ # Load data using a preset configuration
+ df = load_data_from_query(
+     db_preset_name="MyDBPreset",
+     query="SELECT * FROM my_table"
+ )
+ print(df)

- df = load_data_from_query(
-     db_preset_name="MyDBPreset",
-     query="SELECT * FROM my_table"
- )
- print(df)
+ #### Notes

+ - Security: Ensure that the configuration file (`~/.rgwfuncsrc`) is secure and accessible only to authorized users, as it contains sensitive information.
+ - Pre-requisites: Ensure the necessary Python packages are installed for each database type you wish to query. For example, `pymssql` for MSSQL, `mysql-connector-python` for MySQL, and so on.
+ - Error Handling: The function raises a `ValueError` if the specified preset name does not exist or if the database type is unsupported. Additional exceptions may arise from network issues or database errors.
+ - Environment: For AWS Athena, ensure that AWS credentials are configured properly for the boto3 library to authenticate successfully. Consider using AWS IAM roles or AWS Secrets Manager for better security management.

  --------------------------------------------------------------------------------

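The Security note asks that `~/.rgwfuncsrc` be accessible only to authorized users. On POSIX systems one common way to do that is to give the file mode `0o600` (owner read/write only). The sketch below assumes a POSIX platform; `write_private_config` and `is_private` are hypothetical helper names, not part of rgwfuncs.

```python
import json
import os
import stat
from typing import Any, Dict


def write_private_config(config: Dict[str, Any], path: str = "~/.rgwfuncsrc") -> str:
    """Write config as JSON so that only the file's owner can read or write it.

    POSIX-only sketch: relies on os.fchmod and Unix permission bits.
    """
    full_path = os.path.expanduser(path)
    # The mode argument only takes effect when the file is newly created.
    fd = os.open(full_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    # Also tighten a pre-existing file before any credentials are written.
    os.fchmod(fd, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, indent=2)
    return full_path


def is_private(path: str) -> bool:
    """Return True if no group or other permission bits are set."""
    mode = os.stat(os.path.expanduser(path)).st_mode
    return not (mode & (stat.S_IRWXG | stat.S_IRWXO))


# Example (writes to the real home directory, so run it deliberately):
# write_private_config({"db_presets": []})
```

Calling `os.fchmod` on the open descriptor covers the case where the file already existed with looser permissions, which the creation mode alone would not fix.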
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "rgwfuncs"
- version = "0.0.35"
+ version = "0.0.36"
  authors = [
      { name = "Ryan Gerard Wilson", email = "ryangerardwilson@gmail.com" },
  ]
@@ -1,6 +1,6 @@
  [metadata]
  name = rgwfuncs
- version = 0.0.35
+ version = 0.0.36
  author = Ryan Gerard Wilson
  author_email = ryangerardwilson@gmail.com
  description = A functional programming paradigm for mathematical modelling and data science
@@ -384,8 +384,7 @@ def load_data_from_query(db_preset_name: str, query: str) -> pd.DataFrame:
          raise ConnectionError(
              "All attempts to connect to ClickHouse failed.")

-     def query_google_big_query(
-             db_preset: Dict[str, Any], query: str) -> pd.DataFrame:
+     def query_google_big_query(db_preset: Dict[str, Any], query: str) -> pd.DataFrame:
          json_file_path = db_preset['json_file_path']
          project_id = db_preset['project_id']

@@ -400,6 +399,56 @@ def load_data_from_query(db_preset_name: str, query: str) -> pd.DataFrame:

          return pd.DataFrame(rows, columns=columns)

+
+     def query_athena(db_preset: Dict[str, Any], query: str) -> pd.DataFrame:
+
+         def execute_athena_query(athena_client, query: str, database: str, output_bucket: str) -> str:
+             response = athena_client.start_query_execution(
+                 QueryString=query,
+                 QueryExecutionContext={"Database": database},
+                 ResultConfiguration={"OutputLocation": output_bucket}
+             )
+             return response["QueryExecutionId"]
+
+         def wait_for_athena_query_to_complete(athena_client, query_execution_id: str):
+             while True:
+                 response = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
+                 state = response["QueryExecution"]["Status"]["State"]
+                 if state == "SUCCEEDED":
+                     break
+                 elif state in ("FAILED", "CANCELLED"):
+                     raise Exception(f"Query failed with state: {state}")
+                 time.sleep(1)
+
+         def download_athena_query_results(athena_client, query_execution_id: str) -> pd.DataFrame:
+             paginator = athena_client.get_paginator("get_query_results")
+             result_pages = paginator.paginate(QueryExecutionId=query_execution_id)
+             rows = []
+             columns = []
+             for page in result_pages:
+                 if not columns:
+                     columns = [col["Name"] for col in page["ResultSet"]["ResultSetMetadata"]["ColumnInfo"]]
+                 rows.extend(page["ResultSet"]["Rows"])
+
+             data = [[col.get("VarCharValue", None) for col in row["Data"]] for row in rows[1:]]
+             return pd.DataFrame(data, columns=columns)
+
+
+         aws_region = db_preset['region']
+         database = db_preset['database']
+         output_bucket = db_preset['output_bucket']
+
+         athena_client = boto3.client(
+             'athena',
+             region_name=aws_region,
+             aws_access_key_id=db_preset['aws_access_key'],
+             aws_secret_access_key=db_preset['aws_secret_key']
+         )
+
+         query_execution_id = execute_athena_query(athena_client, query, database, output_bucket)
+         wait_for_athena_query_to_complete(athena_client, query_execution_id)
+         return download_athena_query_results(athena_client, query_execution_id)
+
      # Assume the configuration file is located at ~/.rgwfuncsrc
      config_path = os.path.expanduser('~/.rgwfuncsrc')
      with open(config_path, 'r') as f:
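`wait_for_athena_query_to_complete` in this hunk polls with `while True` and has no upper bound, so a query that stays in `QUEUED` or `RUNNING` blocks the caller indefinitely; it also raises a bare `Exception`. A hedged sketch of a bounded variant follows. It reads the same `QueryExecution.Status.State` field the diff uses; the name `wait_for_query`, the `timeout_seconds` and `poll_interval` parameters, and `AthenaQueryError` are assumptions for illustration, and a small stub client stands in for a boto3 Athena client so the logic runs offline.

```python
import time
from typing import Any, Callable, List


class AthenaQueryError(RuntimeError):
    """Hypothetical error type for failed, cancelled, or timed-out queries."""


def wait_for_query(
    athena_client: Any,
    query_execution_id: str,
    timeout_seconds: float = 300.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_query_execution until the query finishes or the timeout passes.

    Returns "SUCCEEDED"; raises AthenaQueryError on FAILED, CANCELLED,
    or when timeout_seconds elapses first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = athena_client.get_query_execution(QueryExecutionId=query_execution_id)
        status = response["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            return state
        if state in ("FAILED", "CANCELLED"):
            # Athena may describe the cause in StateChangeReason.
            reason = status.get("StateChangeReason", "no reason given")
            raise AthenaQueryError(f"Query {query_execution_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            raise AthenaQueryError(
                f"Query {query_execution_id} still {state} after {timeout_seconds}s"
            )
        sleep(poll_interval)


class FakeAthenaClient:
    """Stub that replays a fixed sequence of states, repeating the last one."""

    def __init__(self, states: List[str]):
        self._states = list(states)

    def get_query_execution(self, QueryExecutionId: str) -> dict:
        state = self._states.pop(0) if len(self._states) > 1 else self._states[0]
        return {"QueryExecution": {"Status": {"State": state}}}


client = FakeAthenaClient(["QUEUED", "RUNNING", "SUCCEEDED"])
print(wait_for_query(client, "example-id", sleep=lambda s: None))  # SUCCEEDED
```

Injecting `sleep` keeps the stub example instant; with a real client the default `time.sleep` runs between polls.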
@@ -422,6 +471,8 @@ def load_data_from_query(db_preset_name: str, query: str) -> pd.DataFrame:
          return query_clickhouse(db_preset, query)
      elif db_type == 'google_big_query':
          return query_google_big_query(db_preset, query)
+     elif db_type == 'athena':
+         return query_athena(db_preset, query)
      else:
          raise ValueError(f"Unsupported db_type: {db_type}")

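The README's Configuration Details list `aws_athena` as the Athena `db_type`, but the branch added in this hunk tests `db_type == 'athena'`, so a preset written to the README's spelling would reach the `ValueError` branch. One way to accept both spellings is a table-driven dispatch with an alias map. The sketch below is hypothetical: `make_dispatcher`, the alias table, and the lambda handlers stand in for the real per-database functions, which return DataFrames.

```python
from typing import Any, Callable, Dict

# Signature shared by the per-database query functions: (db_preset, query) -> result.
QueryFn = Callable[[Dict[str, Any], str], Any]


def make_dispatcher(handlers: Dict[str, QueryFn], aliases: Dict[str, str]) -> QueryFn:
    """Build a function that routes a query to the handler for preset["db_type"]."""

    def dispatch(db_preset: Dict[str, Any], query: str) -> Any:
        db_type = db_preset["db_type"]
        # Normalise alternate spellings, e.g. the README's "aws_athena".
        db_type = aliases.get(db_type, db_type)
        handler = handlers.get(db_type)
        if handler is None:
            raise ValueError(f"Unsupported db_type: {db_type}")
        return handler(db_preset, query)

    return dispatch


# Hypothetical stand-in handlers that just echo the query.
handlers = {
    "clickhouse": lambda preset, q: f"clickhouse ran: {q}",
    "google_big_query": lambda preset, q: f"bigquery ran: {q}",
    "athena": lambda preset, q: f"athena ran: {q}",
}
dispatch = make_dispatcher(handlers, aliases={"aws_athena": "athena"})
print(dispatch({"db_type": "aws_athena"}, "SELECT 1"))  # athena ran: SELECT 1
```

Keeping the handlers in a dictionary means adding a database type is a one-line registration, and the unsupported-type error stays in one place.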