PyPI - sqlServerConnector - Versions diffs - 0.1.6__tar.gz → 0.1.8__tar.gz - Mend

sqlServerConnector 0.1.6 (tar.gz) → 0.1.8 (tar.gz)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (12)

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sqlServerConnector
-Version: 0.1.6
+Version: 0.1.8
 Summary: A custom SQL Server Connector for ETL processes with Pandas
 Author-email: Nguyen Minh Son <nguyen.minhson1511@gmail.com>
 Project-URL: Homepage, https://github.com/johnnyb1509/sqlServerConnector
@@ -20,8 +20,8 @@ Requires-Dist: jupyterlab
 # SQL Server Connector
 A SQL Server connection library built for ETL tasks, optimized for **Pandas**, with support for **Vietnamese (Unicode)** text and high-performance **Upsert (Merge)**.
-## Update 0.1.6
-> Fixed a minor bug affecting upserts into tables whose columns contain Vietnamese text
+## Update 0.1.7
+> Fixed a minor bug affecting upserts into tables whose columns contain Vietnamese text
 ## 🚀 Key features
@@ -69,9 +69,10 @@ db_info:
 ## 📝 Quick start guide
 1. Initialize the connection
-```python
+```python
 import yaml
 from connector import SQLServerConnector
 # Load config
 with open('config/db_config.yaml', 'r') as f:
     conf = yaml.safe_load(f)['db_info']
@@ -88,46 +89,68 @@ db = SQLServerConnector(
 2. Fetch data (Read)
 ```python
 # Option 1: Fetch an entire table
-df = db.get_data("DM_KhachHang")
+df = db.get_data("SELECT * FROM DM_KhachHang")
-# Option 2: Use an arbitrary SQL statement
+# Option 2: Use a custom SQL statement
 query = """
-    SELECT TOP 100 * FROM Sales_Transaction
-    WHERE created_date >= '2023-01-01'
+    SELECT TOP 100 * FROM Sales_Transaction
+    WHERE created_date >= :from_date
 """
-df_sales = db.get_data(query)
+df_sales = db.get_data(query, params={"from_date": "2023-01-01"})
 print(df_sales.head())
 ```
-3. Write data (Upsert)
+3. Check whether a table exists
+```python
+if not db.check_table_exists("Fact_Sales"):
+    print("Table Fact_Sales does not exist yet")
+```
+4. Write data (Upsert)
 ```python
 import pandas as pd
-# Mock data
+# Mock data
 data = {
     'TransactionID': [101, 102],
-    'Product': ['Laptop Dell', 'Chuột Logitech'], # Vietnamese text is supported
+    'Product': ['Laptop Dell', 'Chuot Logitech'],  # Vietnamese text is supported
     'Amount': [15000000, 250000]
 }
 df_new = pd.DataFrame(data)
-# Push to the DB
+# Push to the DB
 db.upsert_data(
     df=df_new,
     target_table="Fact_Sales",
-    primary_key="TransactionID",  # Column used as the identifier (avoids duplicates)
-    auto_evolve_schema=True       # Automatically add missing columns
+    match_columns=["TransactionID"],  # Match key (Primary Key)
+    conflict_strategy="last",         # "last" or "skip"
+    auto_evolve_schema=True            # Automatically add missing columns
+)
+print("Data upserted successfully!")
+```
+5. Execute a statement that returns no data
+```python
+# Example: delete old data
+db.execute_query(
+    "DELETE FROM Fact_Sales WHERE created_date < :cutoff",
+    params={"cutoff": "2023-01-01"}
 )
-print("Data upserted successfully!")
 ```
-4. Close the connection
+6. Close the connection
 ```python
-# Always close the connection when finished to release resources
+# Always close the connection when finished to release resources
 db.dispose()
 ```
 ## ⚠️ Important notes
-1. **Primary Key:** When using upsert_data, you must provide primary_key. If the table has no Primary Key yet, the library sets that column as the primary key when it creates the new table.
+1. **Primary Key:** When using upsert_data, you must provide `match_columns`. If the table has no Primary Key yet, the library tries to set these columns as the primary key when it creates the new table.
 2. **Date Time:** Date columns should be converted to datetime64[ns] in Pandas before being pushed, to ensure accuracy.
+3. **Upgrade version:** Always check for and upgrade to the latest version to benefit from the newest features and bug fixes. For developers: change the version in `pyproject.toml`, then build and upload to PyPI:
+```bash
+python -m build
+python -m twine upload dist/*
+```
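The README hunk above replaces an inlined date literal with a named bind parameter (`:from_date` plus a `params` dict). The same pattern can be sketched without a SQL Server instance. The sketch below uses the standard-library `sqlite3` module as a stand-in database, which is an assumption made purely for illustration: the package itself goes through SQLAlchemy and pyodbc, and the table contents here are invented sample rows.

```python
import sqlite3

# Stand-in database: stdlib sqlite3 instead of SQL Server (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales_Transaction (id INTEGER, created_date TEXT)")
conn.executemany(
    "INSERT INTO Sales_Transaction VALUES (?, ?)",
    [(1, "2022-12-31"), (2, "2023-01-01"), (3, "2023-06-15")],  # invented sample rows
)

# The cutoff date travels as a named parameter; it is never formatted into the SQL text.
query = (
    "SELECT id FROM Sales_Transaction "
    "WHERE created_date >= :from_date ORDER BY id"
)
rows = conn.execute(query, {"from_date": "2023-01-01"}).fetchall()
recent_ids = [row[0] for row in rows]
conn.close()
```

Keeping the value out of the SQL string means the driver handles quoting, which is the usual reason to prefer this form over string interpolation.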

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/README.md RENAMED

@@ -1,8 +1,8 @@
 # SQL Server Connector
 A SQL Server connection library built for ETL tasks, optimized for **Pandas**, with support for **Vietnamese (Unicode)** text and high-performance **Upsert (Merge)**.
-## Update 0.1.6
-> Fixed a minor bug affecting upserts into tables whose columns contain Vietnamese text
+## Update 0.1.7
+> Fixed a minor bug affecting upserts into tables whose columns contain Vietnamese text
 ## 🚀 Key features
@@ -50,9 +50,10 @@ db_info:
 ## 📝 Quick start guide
 1. Initialize the connection
-```python
+```python
 import yaml
 from connector import SQLServerConnector
 # Load config
 with open('config/db_config.yaml', 'r') as f:
     conf = yaml.safe_load(f)['db_info']
@@ -69,46 +70,68 @@ db = SQLServerConnector(
 2. Fetch data (Read)
 ```python
 # Option 1: Fetch an entire table
-df = db.get_data("DM_KhachHang")
+df = db.get_data("SELECT * FROM DM_KhachHang")
-# Option 2: Use an arbitrary SQL statement
+# Option 2: Use a custom SQL statement
 query = """
-    SELECT TOP 100 * FROM Sales_Transaction
-    WHERE created_date >= '2023-01-01'
+    SELECT TOP 100 * FROM Sales_Transaction
+    WHERE created_date >= :from_date
 """
-df_sales = db.get_data(query)
+df_sales = db.get_data(query, params={"from_date": "2023-01-01"})
 print(df_sales.head())
 ```
-3. Write data (Upsert)
+3. Check whether a table exists
+```python
+if not db.check_table_exists("Fact_Sales"):
+    print("Table Fact_Sales does not exist yet")
+```
+4. Write data (Upsert)
 ```python
 import pandas as pd
-# Mock data
+# Mock data
 data = {
     'TransactionID': [101, 102],
-    'Product': ['Laptop Dell', 'Chuột Logitech'], # Vietnamese text is supported
+    'Product': ['Laptop Dell', 'Chuot Logitech'],  # Vietnamese text is supported
     'Amount': [15000000, 250000]
 }
 df_new = pd.DataFrame(data)
-# Push to the DB
+# Push to the DB
 db.upsert_data(
     df=df_new,
     target_table="Fact_Sales",
-    primary_key="TransactionID",  # Column used as the identifier (avoids duplicates)
-    auto_evolve_schema=True       # Automatically add missing columns
+    match_columns=["TransactionID"],  # Match key (Primary Key)
+    conflict_strategy="last",         # "last" or "skip"
+    auto_evolve_schema=True            # Automatically add missing columns
 )
-print("Data upserted successfully!")
+print("Data upserted successfully!")
 ```
-4. Close the connection
+5. Execute a statement that returns no data
 ```python
-# Always close the connection when finished to release resources
+# Example: delete old data
+db.execute_query(
+    "DELETE FROM Fact_Sales WHERE created_date < :cutoff",
+    params={"cutoff": "2023-01-01"}
+)
+```
+6. Close the connection
+```python
+# Always close the connection when finished to release resources
 db.dispose()
 ```
 ## ⚠️ Important notes
-1. **Primary Key:** When using upsert_data, you must provide primary_key. If the table has no Primary Key yet, the library sets that column as the primary key when it creates the new table.
+1. **Primary Key:** When using upsert_data, you must provide `match_columns`. If the table has no Primary Key yet, the library tries to set these columns as the primary key when it creates the new table.
 2. **Date Time:** Date columns should be converted to datetime64[ns] in Pandas before being pushed, to ensure accuracy.
+3. **Upgrade version:** Always check for and upgrade to the latest version to benefit from the newest features and bug fixes. For developers: change the version in `pyproject.toml`, then build and upload to PyPI:
+```bash
+python -m build
+python -m twine upload dist/*
+```
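The upsert example in this README passes `auto_evolve_schema=True`. As a rough illustration of what that flag decides, the sketch below reconciles a DataFrame's column names against a table's column names using plain lists. The helper name `reconcile_columns` is hypothetical and not part of the package; it only mirrors the add-or-drop decision, not the `ALTER TABLE` statements.

```python
from typing import List, Tuple


def reconcile_columns(
    df_cols: List[str], db_cols: List[str], auto_evolve: bool
) -> Tuple[List[str], List[str]]:
    """Return (columns_to_add_to_db, columns_to_write).

    Hypothetical helper for illustration only.
    """
    # Columns present in the DataFrame but missing from the table.
    new_cols = [c for c in df_cols if c not in db_cols]
    if auto_evolve:
        # Evolve the schema: every new column is added, and all columns are written.
        return new_cols, list(df_cols)
    # Strict schema: nothing is added, and unknown columns are dropped from the write.
    return [], [c for c in df_cols if c in db_cols]


evolved = reconcile_columns(["id", "name", "note"], ["id", "name"], auto_evolve=True)
strict = reconcile_columns(["id", "name", "note"], ["id", "name"], auto_evolve=False)
```

With `auto_evolve=False` the extra `note` column is silently left out of the write, so callers who rely on strict mode may want to log which columns were dropped.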

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/pyproject.toml RENAMED

@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "sqlServerConnector"
-version = "0.1.6"
+version = "0.1.8"
 description = "A custom SQL Server Connector for ETL processes with Pandas"
 readme = "README.md"
 requires-python = ">=3.8"

sqlserverconnector-0.1.8/src/connector.py ADDED

@@ -0,0 +1,244 @@
+import os
+import pandas as pd
+import numpy as np
+import uuid
+import sqlalchemy
+from typing import List, Optional, Dict, Union, Any, Literal
+from loguru import logger
+from sqlalchemy import create_engine, text, URL, inspect
+from sqlalchemy.types import NVARCHAR, FLOAT, INTEGER, DATE, DATETIME, BIGINT
+class SQLServerConnector:
+    """
+    Standardized SQL Server connector (Full Features - Fixed Missing Attribute).
+    Includes:
+    - Fast Executemany (high throughput).
+    - Unicode Support (NVARCHAR).
+    - Upsert Strategy (Last/Skip).
+    - Schema Evolution.
+    - Helper methods: check_table_exists.
+    """
+    def __init__(self, server: str, database: str, username: str, password: str, driver: str = 'ODBC Driver 17 for SQL Server', **kwargs):
+        self.server = server
+        self.database = database
+        self.username = username
+        self.password = password
+        self.driver = driver
+        # Build the connection URL
+        self.connection_url = URL.create(
+            "mssql+pyodbc",
+            query={
+                "odbc_connect": (
+                    f"DRIVER={self.driver};"
+                    f"SERVER={self.server};"
+                    f"DATABASE={self.database};"
+                    f"UID={self.username};"
+                    f"PWD={self.password};"
+                    "Encrypt=no;TrustServerCertificate=yes;"
+                )
+            }
+        )
+        # Engine with fast_executemany=True
+        self.engine = create_engine(
+            self.connection_url,
+            fast_executemany=True,
+            pool_pre_ping=True
+        )
+    def get_data(self, query: str, params: Optional[Dict] = None) -> pd.DataFrame:
+        """Run a SELECT and return a DataFrame"""
+        try:
+            with self.engine.connect() as conn:
+                return pd.read_sql(text(query), conn, params=params)
+        except Exception as e:
+            logger.error(f"Get data error: {e}")
+            raise e
+    def execute_query(self, query: str, params: Optional[Dict] = None):
+        """Run a statement that returns no data (UPDATE, DELETE, etc.)"""
+        try:
+            with self.engine.begin() as conn:
+                conn.execute(text(query), params or {})
+        except Exception as e:
+            logger.error(f"Execute query error: {e}")
+            raise e
+    # --- [THIS METHOD HAS BEEN RE-ADDED] ---
+    def check_table_exists(self, table_name: str) -> bool:
+        """Check whether the table exists in the database"""
+        try:
+            inspector = inspect(self.engine)
+            return inspector.has_table(table_name)
+        except Exception as e:
+            logger.error(f"Check table exists failed: {e}")
+            return False
+    # --------------------------------
+    def _generate_dtype_mapping(self, df: pd.DataFrame) -> Dict:
+        """Automatically map data types (NVARCHAR for strings)"""
+        dtype_map = {}
+        for col in df.columns:
+            if df[col].dtype == 'object' or pd.api.types.is_string_dtype(df[col]):
+                dtype_map[col] = NVARCHAR(length=None)
+            elif pd.api.types.is_datetime64_any_dtype(df[col]):
+                dtype_map[col] = DATETIME()
+            elif pd.api.types.is_float_dtype(df[col]):
+                dtype_map[col] = FLOAT()
+            elif pd.api.types.is_integer_dtype(df[col]):
+                dtype_map[col] = BIGINT()
+        return dtype_map
+    def _get_table_columns(self, table_name: str, conn) -> List[str]:
+        """Get the list of columns that currently exist in the DB"""
+        inspector = inspect(conn)
+        columns = [col['name'] for col in inspector.get_columns(table_name)]
+        return columns
+    def _add_missing_columns(self, table_name: str, missing_cols: List[str], dtype_map: Dict, conn):
+        """Alter the table to add missing columns (Schema Evolution)"""
+        for col in missing_cols:
+            col_type = dtype_map.get(col, NVARCHAR(255))
+            # SQLAlchemy type to string conversion logic simplified
+            type_str = "NVARCHAR(MAX)" # Default safe fallback
+            if isinstance(col_type, FLOAT): type_str = "FLOAT"
+            elif isinstance(col_type, BIGINT): type_str = "BIGINT"
+            elif isinstance(col_type, DATETIME): type_str = "DATETIME"
+            elif isinstance(col_type, DATE): type_str = "DATE"
+            alter_sql = f"ALTER TABLE [{table_name}] ADD [{col}] {type_str}"
+            conn.execute(text(alter_sql))
+            logger.info(f"Auto-evolve: Added column '{col}' to table '{table_name}'")
+    def upsert_data(self,
+                    df: pd.DataFrame,
+                    target_table: str,
+                    match_columns: List[str],
+                    conflict_strategy: Literal['last', 'skip'] = 'last',
+                    auto_evolve_schema: bool = False):
+        """
+        Robust upsert function.
+        Args:
+            df: DataFrame to upload.
+            target_table: Name of the target table.
+            match_columns: List of columns used as the match key (Primary Key).
+            conflict_strategy:
+                - 'last': Overwrite the existing row with the new data (Default).
+                - 'skip': If the key already exists, skip the row without updating.
+            auto_evolve_schema:
+                - True: Automatically add columns to the DB when the DF has new columns.
+                - False: Ignore DF columns that the DB does not have (Strict Schema).
+        """
+        if df.empty:
+            logger.warning(f"DataFrame for {target_table} is empty. Skip.")
+            return
+        # 1. Map Unicode Types
+        dtype_mapping = self._generate_dtype_mapping(df)
+        # 2. Staging Table Name
+        staging_table = f"##Staging_{uuid.uuid4().hex[:8]}"
+        try:
+            with self.engine.begin() as conn:
+                # --- A. Check Schema & Table ---
+                inspector = inspect(conn)
+                if not inspector.has_table(target_table):
+                    logger.info(f"Table {target_table} not found. Creating new...")
+                    df.to_sql(target_table, conn, index=False, dtype=dtype_mapping)
+                    # Create a Primary Key if needed
+                    if match_columns:
+                        pk_str = ", ".join([f"[{c}]" for c in match_columns])
+                        try:
+                            conn.execute(text(f"ALTER TABLE [{target_table}] ADD CONSTRAINT PK_{target_table.replace('.','_')}_{uuid.uuid4().hex[:4]} PRIMARY KEY ({pk_str})"))
+                        except Exception as e:
+                            logger.warning(f"Could not create PK: {e}")
+                    return
+                # --- B. Handle Schema Evolution ---
+                db_cols = self._get_table_columns(target_table, conn)
+                df_cols = list(df.columns)
+                # Find columns present in the DF but not in the DB
+                new_cols = [c for c in df_cols if c not in db_cols]
+                if new_cols:
+                    if auto_evolve_schema:
+                        self._add_missing_columns(target_table, new_cols, dtype_mapping, conn)
+                        db_cols.extend(new_cols) # Update the DB column list
+                    else:
+                        # Without auto-evolve, keep only the columns that match the DB
+                        valid_cols = [c for c in df_cols if c in db_cols]
+                        if len(valid_cols) < len(df_cols):
+                            logger.warning(f"Schema strict: Dropping columns {new_cols} because they are not in DB.")
+                            df = df[valid_cols]
+                # --- C. Load into Staging (Fast Executemany) ---
+                df.to_sql(
+                    name=staging_table,
+                    con=conn,
+                    if_exists='replace',
+                    index=False,
+                    dtype=dtype_mapping
+                )
+                # --- D. Run the MERGE ---
+                # Use only the columns common to the DF and the DB for the merge (avoids missing-column errors)
+                common_cols = [c for c in df.columns if c in db_cols]
+                on_clause = " AND ".join([f"Target.[{col}] = Source.[{col}]" for col in match_columns])
+                # Logic Insert
+                insert_cols = ", ".join([f"[{col}]" for col in common_cols])
+                insert_vals = ", ".join([f"Source.[{col}]" for col in common_cols])
+                # Logic Update
+                merge_sql = ""
+                # Case 1: Update ('last')
+                if conflict_strategy == 'last':
+                    update_cols = [c for c in common_cols if c not in match_columns]
+                    if update_cols:
+                        update_set = ", ".join([f"Target.[{col}] = Source.[{col}]" for col in update_cols])
+                        merge_sql = f"""
+                        MERGE [{target_table}] AS Target
+                        USING {staging_table} AS Source
+                        ON {on_clause}
+                        WHEN MATCHED THEN
+                            UPDATE SET {update_set}
+                        WHEN NOT MATCHED BY TARGET THEN
+                            INSERT ({insert_cols}) VALUES ({insert_vals});
+                        """
+                    else:
+                        # Only PK columns, nothing to update -> insert if not exists only
+                        merge_sql = f"""
+                        MERGE [{target_table}] AS Target
+                        USING {staging_table} AS Source
+                        ON {on_clause}
+                        WHEN NOT MATCHED BY TARGET THEN
+                            INSERT ({insert_cols}) VALUES ({insert_vals});
+                        """
+                # Case 2: Skip (insert only, no update)
+                elif conflict_strategy == 'skip':
+                    merge_sql = f"""
+                    MERGE [{target_table}] AS Target
+                    USING {staging_table} AS Source
+                    ON {on_clause}
+                    WHEN NOT MATCHED BY TARGET THEN
+                        INSERT ({insert_cols}) VALUES ({insert_vals});
+                    """
+                conn.execute(text(merge_sql))
+                conn.execute(text(f"DROP TABLE IF EXISTS {staging_table}"))
+                logger.info(f"Upserted {len(df)} rows to {target_table} (Strategy: {conflict_strategy})")
+        except Exception as e:
+            logger.error(f"Upsert failed for {target_table}: {e}")
+            raise e
+    def dispose(self):
+        self.engine.dispose()
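The `upsert_data` method added above assembles one of two `MERGE` statements depending on `conflict_strategy`. The sketch below isolates that string-building step as a pure function so the difference between `'last'` and `'skip'` can be inspected without a database. The function name `build_merge_sql` and the staging table name in the usage lines are hypothetical. Unlike the diffed code, which would leave `merge_sql` empty for an unknown strategy, this sketch raises a `ValueError`.

```python
from typing import List


def build_merge_sql(
    target: str,
    staging: str,
    columns: List[str],
    match_columns: List[str],
    strategy: str = "last",
) -> str:
    """Build a T-SQL MERGE string (hypothetical helper, for illustration only).

    Identifiers are interpolated directly, so they must come from trusted input.
    """
    if strategy not in ("last", "skip"):
        raise ValueError(f"Unknown conflict strategy: {strategy!r}")

    on_clause = " AND ".join(f"Target.[{c}] = Source.[{c}]" for c in match_columns)
    insert_cols = ", ".join(f"[{c}]" for c in columns)
    insert_vals = ", ".join(f"Source.[{c}]" for c in columns)
    # Key columns are never updated; only the remaining columns are overwritten.
    update_cols = [c for c in columns if c not in match_columns]

    parts = [
        f"MERGE [{target}] AS Target",
        f"USING {staging} AS Source",
        f"ON {on_clause}",
    ]
    if strategy == "last" and update_cols:
        update_set = ", ".join(f"Target.[{c}] = Source.[{c}]" for c in update_cols)
        parts.append(f"WHEN MATCHED THEN UPDATE SET {update_set}")
    parts.append(
        f"WHEN NOT MATCHED BY TARGET THEN INSERT ({insert_cols}) VALUES ({insert_vals});"
    )
    return "\n".join(parts)


cols = ["TransactionID", "Product", "Amount"]
sql_last = build_merge_sql("Fact_Sales", "##Staging_demo", cols, ["TransactionID"], "last")
sql_skip = build_merge_sql("Fact_Sales", "##Staging_demo", cols, ["TransactionID"], "skip")
```

Here `sql_last` carries a `WHEN MATCHED THEN UPDATE` branch while `sql_skip` contains only the insert branch, so existing rows are left untouched under `'skip'`.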

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/src/sqlServerConnector.egg-info/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: sqlServerConnector
-Version: 0.1.6
+Version: 0.1.8
 Summary: A custom SQL Server Connector for ETL processes with Pandas
 Author-email: Nguyen Minh Son <nguyen.minhson1511@gmail.com>
 Project-URL: Homepage, https://github.com/johnnyb1509/sqlServerConnector
@@ -20,8 +20,8 @@ Requires-Dist: jupyterlab
 # SQL Server Connector
 A SQL Server connection library built for ETL tasks, optimized for **Pandas**, with support for **Vietnamese (Unicode)** text and high-performance **Upsert (Merge)**.
-## Update 0.1.6
-> Fixed a minor bug affecting upserts into tables whose columns contain Vietnamese text
+## Update 0.1.7
+> Fixed a minor bug affecting upserts into tables whose columns contain Vietnamese text
 ## 🚀 Key features
@@ -69,9 +69,10 @@ db_info:
 ## 📝 Quick start guide
 1. Initialize the connection
-```python
+```python
 import yaml
 from connector import SQLServerConnector
 # Load config
 with open('config/db_config.yaml', 'r') as f:
     conf = yaml.safe_load(f)['db_info']
@@ -88,46 +89,68 @@ db = SQLServerConnector(
 2. Fetch data (Read)
 ```python
 # Option 1: Fetch an entire table
-df = db.get_data("DM_KhachHang")
+df = db.get_data("SELECT * FROM DM_KhachHang")
-# Option 2: Use an arbitrary SQL statement
+# Option 2: Use a custom SQL statement
 query = """
-    SELECT TOP 100 * FROM Sales_Transaction
-    WHERE created_date >= '2023-01-01'
+    SELECT TOP 100 * FROM Sales_Transaction
+    WHERE created_date >= :from_date
 """
-df_sales = db.get_data(query)
+df_sales = db.get_data(query, params={"from_date": "2023-01-01"})
 print(df_sales.head())
 ```
-3. Write data (Upsert)
+3. Check whether a table exists
+```python
+if not db.check_table_exists("Fact_Sales"):
+    print("Table Fact_Sales does not exist yet")
+```
+4. Write data (Upsert)
 ```python
 import pandas as pd
-# Mock data
+# Mock data
 data = {
     'TransactionID': [101, 102],
-    'Product': ['Laptop Dell', 'Chuột Logitech'], # Vietnamese text is supported
+    'Product': ['Laptop Dell', 'Chuot Logitech'],  # Vietnamese text is supported
     'Amount': [15000000, 250000]
 }
 df_new = pd.DataFrame(data)
-# Push to the DB
+# Push to the DB
 db.upsert_data(
     df=df_new,
     target_table="Fact_Sales",
-    primary_key="TransactionID",  # Column used as the identifier (avoids duplicates)
-    auto_evolve_schema=True       # Automatically add missing columns
+    match_columns=["TransactionID"],  # Match key (Primary Key)
+    conflict_strategy="last",         # "last" or "skip"
+    auto_evolve_schema=True            # Automatically add missing columns
+)
+print("Data upserted successfully!")
+```
+5. Execute a statement that returns no data
+```python
+# Example: delete old data
+db.execute_query(
+    "DELETE FROM Fact_Sales WHERE created_date < :cutoff",
+    params={"cutoff": "2023-01-01"}
 )
-print("Data upserted successfully!")
 ```
-4. Close the connection
+6. Close the connection
 ```python
-# Always close the connection when finished to release resources
+# Always close the connection when finished to release resources
 db.dispose()
 ```
 ## ⚠️ Important notes
-1. **Primary Key:** When using upsert_data, you must provide primary_key. If the table has no Primary Key yet, the library sets that column as the primary key when it creates the new table.
+1. **Primary Key:** When using upsert_data, you must provide `match_columns`. If the table has no Primary Key yet, the library tries to set these columns as the primary key when it creates the new table.
 2. **Date Time:** Date columns should be converted to datetime64[ns] in Pandas before being pushed, to ensure accuracy.
+3. **Upgrade version:** Always check for and upgrade to the latest version to benefit from the newest features and bug fixes. For developers: change the version in `pyproject.toml`, then build and upload to PyPI:
+```bash
+python -m build
+python -m twine upload dist/*
+```

sqlserverconnector-0.1.6/src/connector.py DELETED

@@ -1,192 +0,0 @@
-import os
-import pandas as pd
-import numpy as np
-import uuid
-import sqlalchemy
-from typing import List, Optional, Dict, Union, Any
-from loguru import logger
-from sqlalchemy import create_engine, text, URL, inspect
-from sqlalchemy.types import NVARCHAR, FLOAT, INTEGER, DATE, DATETIME, BIGINT
-from sqlalchemy.exc import SQLAlchemyError
-class SQLServerConnector:
-    """
-    SQL Server connector optimized for ETL (Extract-Transform-Load).
-    Features:
-    - High-performance Upsert (Merge) via a temporary table.
-    - Automatic Unicode (Vietnamese) support through NVARCHAR.
-    - Automatic Schema and Primary Key management.
-    """
-    def __init__(self, server: str, database: str, username: str, password: str, driver: str = 'ODBC Driver 17 for SQL Server'):
-        self.server = server
-        self.database = database
-        self.username = username
-        self.password = password
-        self.driver = driver
-        # Build the connection URL
-        self.connection_url = URL.create(
-            "mssql+pyodbc",
-            query={
-                "odbc_connect": (
-                    f"DRIVER={self.driver};"
-                    f"SERVER={self.server};"
-                    f"DATABASE={self.database};"
-                    f"UID={self.username};"
-                    f"PWD={self.password};"
-                    "Encrypt=no;TrustServerCertificate=yes;" # Flexible SSL configuration
-                )
-            }
-        )
-        # Create the Engine with fast_executemany=True to speed up Insert/Upsert
-        self.engine = create_engine(
-            self.connection_url,
-            fast_executemany=True, # IMPORTANT: makes writes many times faster
-            pool_pre_ping=True     # Reconnect automatically if the connection drops
-        )
-    def get_data(self, query: str, params: Optional[Dict] = None, chunksize: Optional[int] = None) -> Union[pd.DataFrame, Any]:
-        """
-        Run a SELECT statement and return a DataFrame.
-        """
-        try:
-            with self.engine.connect() as conn:
-                # Use text() to stay compatible with SQLAlchemy 2.0
-                sql_query = text(query)
-                return pd.read_sql(sql_query, conn, params=params, chunksize=chunksize)
-        except Exception as e:
-            logger.error(f"Failed to retrieve data: {e}")
-            raise
-    def execute_query(self, query: str, params: Optional[Dict] = None):
-        """Run a statement that returns no data (UPDATE, DELETE, SP...)."""
-        try:
-            with self.engine.begin() as conn: # Commits automatically
-                conn.execute(text(query), params or {})
-        except Exception as e:
-            logger.error(f"Failed to execute query: {e}")
-            raise
-    def _generate_dtype_mapping(self, df: pd.DataFrame) -> Dict:
-        """
-        Automatically build the SQL data-type mapping.
-        IMPORTANT: Map every string/object column to NVARCHAR to support Vietnamese.
-        """
-        dtype_map = {}
-        for col in df.columns:
-            # String -> NVARCHAR (Unicode support)
-            if df[col].dtype == 'object' or pd.api.types.is_string_dtype(df[col]):
-                # Compute the actual max length to optimize, or use None (NVARCHAR(MAX))
-                max_len = df[col].astype(str).map(len).max()
-                if pd.isna(max_len) or max_len == 0:
-                    length = 255
-                else:
-                    length = int(max_len * 1.5) + 50 # Extra buffer
-                    if length > 4000: length = None # NVARCHAR(MAX)
-                dtype_map[col] = NVARCHAR(length=length)
-            # Datetime
-            elif pd.api.types.is_datetime64_any_dtype(df[col]):
-                dtype_map[col] = DATETIME()
-            # Float
-            elif pd.api.types.is_float_dtype(df[col]):
-                dtype_map[col] = FLOAT()
-            # Integer
-            elif pd.api.types.is_integer_dtype(df[col]):
-                dtype_map[col] = BIGINT()
-        return dtype_map
-    def upsert_data(self, df: pd.DataFrame, table_name: str, pk_cols: List[str]):
-        """
-        Upsert (Insert or Update) data into a SQL Server table.
-        Uses a Staging Table plus a MERGE statement.
-        """
-        if df.empty:
-            logger.warning(f"DataFrame for table {table_name} is empty. Skipping upsert.")
-            return
-        # 1. Prepare the staging table name
-        staging_table = f"##Staging_{uuid.uuid4().hex[:8]}"
-        # 2. Build the data-type mapping (fixes the Unicode bug)
-        dtype_mapping = self._generate_dtype_mapping(df)
-        try:
-            with self.engine.begin() as conn:
-                # A. Load the data into the staging table
-                # fast_executemany=True on the engine makes this step very fast
-                df.to_sql(
-                    name=staging_table,
-                    con=conn,
-                    if_exists='replace',
-                    index=False,
-                    dtype=dtype_mapping # Force NVARCHAR here
-                )
-                # B. Check whether the target table exists
-                inspector = inspect(conn)
-                if not inspector.has_table(table_name):
-                    logger.info(f"Table {table_name} does not exist. Creating from staging...")
-                    # Create the main table from the staging table (copies structure and data)
-                    # Note: SELECT INTO creates a new table
-                    create_sql = f"SELECT * INTO {table_name} FROM {staging_table}"
-                    conn.execute(text(create_sql))
-                    # Create a Primary Key for the new table
-                    if pk_cols:
-                        pk_str = ", ".join([f"[{c}]" for c in pk_cols])
-                        try:
-                            alter_pk = f"ALTER TABLE {table_name} ADD CONSTRAINT PK_{table_name}_{uuid.uuid4().hex[:4]} PRIMARY KEY ({pk_str})"
-                            conn.execute(text(alter_pk))
-                        except Exception as ex_pk:
-                            logger.warning(f"Could not create PK: {ex_pk}")
-                else:
-                    # C. Run the MERGE (Upsert)
-                    # Get the column list
-                    cols = [c for c in df.columns]
-                    # 1. ON condition (Primary Keys)
-                    on_clause = " AND ".join([f"Target.[{col}] = Source.[{col}]" for col in pk_cols])
-                    # 2. UPDATE condition (non-PK columns)
-                    update_cols = [col for col in cols if col not in pk_cols]
-                    if update_cols:
-                        update_clause = ", ".join([f"Target.[{col}] = Source.[{col}]" for col in update_cols])
-                        matched_action = f"WHEN MATCHED THEN UPDATE SET {update_clause}"
-                    else:
-                        # Table has only PK columns (rare): do nothing on match
-                        matched_action = ""
-                    # 3. INSERT condition (all columns)
-                    insert_cols_str = ", ".join([f"[{col}]" for col in cols])
-                    insert_vals_str = ", ".join([f"Source.[{col}]" for col in cols])
-                    merge_sql = f"""
-                    MERGE [{table_name}] AS Target
-                    USING {staging_table} AS Source
-                    ON {on_clause}
-                    {matched_action}
-                    WHEN NOT MATCHED BY TARGET THEN
-                        INSERT ({insert_cols_str})
-                        VALUES ({insert_vals_str});
-                    """
-                    conn.execute(text(merge_sql))
-                    logger.info(f"Upserted {len(df)} rows into {table_name}.")
-                # D. Drop the staging table (optional, since ## temp tables are dropped when the connection closes, but drop it to keep things clean)
-                conn.execute(text(f"DROP TABLE IF EXISTS {staging_table}"))
-        except Exception as e:
-            logger.error(f"Upsert failed for {table_name}: {e}")
-            raise
-    def dispose(self):
-        self.engine.dispose()
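One behavioral difference between the two `connector.py` versions is how string columns are sized: the deleted 0.1.6 code derives an `NVARCHAR` length from the longest value, while the 0.1.8 code always passes `length=None`. The sketch below restates the 0.1.6 heuristic over a plain iterable instead of a Pandas column so the arithmetic is easy to check. The function name `nvarchar_length` is hypothetical, and the empty-input handling is a simplification of the original `pd.isna` check.

```python
from typing import Iterable, Optional


def nvarchar_length(values: Iterable[object]) -> Optional[int]:
    """Return an NVARCHAR length, or None to mean NVARCHAR(MAX).

    Hypothetical restatement of the 0.1.6 sizing rule, for illustration only.
    """
    # Longest string representation among the values (0 when there are none).
    max_len = max((len(str(v)) for v in values), default=0)
    if max_len == 0:
        return 255  # default width when there is nothing to measure
    # 1.5x the observed maximum plus a fixed buffer of 50 characters.
    length = int(max_len * 1.5) + 50
    # Beyond 4000 characters, fall back to NVARCHAR(MAX).
    return None if length > 4000 else length


short_len = nvarchar_length(["abc"])
empty_len = nvarchar_length([])
long_len = nvarchar_length(["x" * 3000])
```

A length computed from one batch can be too narrow for a later batch with longer values, which may be one reason the newer version always uses `NVARCHAR(MAX)`; the diff itself does not state the motivation.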

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/setup.cfg RENAMED

File without changes

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/src/__init__.py RENAMED

File without changes

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/src/sqlServerConnector.egg-info/SOURCES.txt RENAMED

File without changes

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/src/sqlServerConnector.egg-info/dependency_links.txt RENAMED

File without changes

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/src/sqlServerConnector.egg-info/requires.txt RENAMED

File without changes

{sqlserverconnector-0.1.6 → sqlserverconnector-0.1.8}/src/sqlServerConnector.egg-info/top_level.txt RENAMED

File without changes
