sqlServerConnector 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,57 @@
+ Metadata-Version: 2.4
+ Name: sqlServerConnector
+ Version: 0.1.0
+ Summary: A custom SQL Server Connector for ETL processes with Pandas
+ Author-email: Nguyen Minh Son <nguyen.minhson1511@gmail.com>
+ Project-URL: Homepage, https://github.com/johnnyb1509/sqlServerConnector
+ Keywords: sql,etl,pandas,sqlalchemy
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: pandas>=1.5.0
+ Requires-Dist: numpy
+ Requires-Dist: sqlalchemy>=2.0.0
+ Requires-Dist: pyodbc
+ Requires-Dist: pyyaml
+ Requires-Dist: loguru
+ Requires-Dist: jupyterlab
+
+ # SQL Data Connector Project
+
+ This project provides a robust Python module (`dbConnector`) for interacting with Microsoft SQL Server, optimized for Data Engineering tasks such as ETL, large bulk inserts, and synchronizing (upserting) data from a Pandas DataFrame.
+
+ ## Key Features
+
+ * **Highly automated**: Automatically creates tables, detects data types, and adds new columns when the DataFrame changes.
+ * **High performance**: Uses SQLAlchemy/pyodbc's `fast_executemany=True` to speed up inserts.
+ * **Data safety**: Supports transactions (commit/rollback) to guarantee data integrity.
+ * **Smart sync**: The `check_and_update_table` function compares rows, updating only changed rows and inserting new ones.
+ * **Utilities**: Supports cleaning numeric strings (e.g., converting "1.5M" to 1,500,000).
+
+ ## Installation Requirements
+
+ 1. **Operating system**: Windows, Linux, or macOS.
+ 2. **Driver**: Requires **ODBC Driver 17 for SQL Server**.
+    * [Download here (Microsoft)](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server)
+ 3. **Python libraries**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Project Structure
+
+ * `src/dbConnector.py`: Main module containing the `dbJob` class.
+ * `config/db_config.yaml`: Database configuration file (create it yourself from the template).
+ * `notebooks/demo_usage.ipynb`: Usage examples.
+
+ ## Quick Start
+
+ ### 1. Configure the connection
+ Create the file `config/db_config.yaml`:
+ ```yaml
+ db_info:
+   server: "localhost"
+   database: "MyDatabase"
+   username: "sa"
+   password: "mypassword"
+ ```
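As a minimal standalone sketch (not part of the package), the `db_info` fields above map one-to-one onto the ODBC connection string the connector builds internally; `build_odbc_connect` is a hypothetical helper name used here only for illustration:

```python
# Hypothetical helper mirroring the db_config.yaml template above.
db_info = {
    "server": "localhost",
    "database": "MyDatabase",
    "username": "sa",
    "password": "mypassword",
}

def build_odbc_connect(cfg: dict, driver: str = "ODBC Driver 17 for SQL Server") -> str:
    # Assemble the raw ODBC connection string from the config fields.
    return (
        f"DRIVER={driver};"
        f"SERVER={cfg['server']};"
        f"DATABASE={cfg['database']};"
        f"UID={cfg['username']};"
        f"PWD={cfg['password']};"
    )

conn_str = build_odbc_connect(db_info)
```

In the package itself this string is passed to SQLAlchemy via `URL.create("mssql+pyodbc", query={"odbc_connect": ...})` rather than used directly.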
@@ -0,0 +1,38 @@
+ # SQL Data Connector Project
+
+ This project provides a robust Python module (`dbConnector`) for interacting with Microsoft SQL Server, optimized for Data Engineering tasks such as ETL, large bulk inserts, and synchronizing (upserting) data from a Pandas DataFrame.
+
+ ## Key Features
+
+ * **Highly automated**: Automatically creates tables, detects data types, and adds new columns when the DataFrame changes.
+ * **High performance**: Uses SQLAlchemy/pyodbc's `fast_executemany=True` to speed up inserts.
+ * **Data safety**: Supports transactions (commit/rollback) to guarantee data integrity.
+ * **Smart sync**: The `check_and_update_table` function compares rows, updating only changed rows and inserting new ones.
+ * **Utilities**: Supports cleaning numeric strings (e.g., converting "1.5M" to 1,500,000).
+
+ ## Installation Requirements
+
+ 1. **Operating system**: Windows, Linux, or macOS.
+ 2. **Driver**: Requires **ODBC Driver 17 for SQL Server**.
+    * [Download here (Microsoft)](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server)
+ 3. **Python libraries**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Project Structure
+
+ * `src/dbConnector.py`: Main module containing the `dbJob` class.
+ * `config/db_config.yaml`: Database configuration file (create it yourself from the template).
+ * `notebooks/demo_usage.ipynb`: Usage examples.
+
+ ## Quick Start
+
+ ### 1. Configure the connection
+ Create the file `config/db_config.yaml`:
+ ```yaml
+ db_info:
+   server: "localhost"
+   database: "MyDatabase"
+   username: "sa"
+   password: "mypassword"
+ ```
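The "smart sync" behavior described in the README (update only changed rows, insert new ones) can be sketched in plain pandas. `split_for_upsert` below is a hypothetical standalone helper, not the package's `check_and_update_table`, and it assumes the existing table has unique keys:

```python
import pandas as pd

def split_for_upsert(existing: pd.DataFrame, incoming: pd.DataFrame, key: str):
    """Return (inserts, updates): incoming rows whose key is absent from
    `existing`, and rows whose key matches but whose values differ."""
    merged = incoming.merge(existing, on=key, how="left",
                            suffixes=("", "_old"), indicator=True)
    # Rows with no match in the existing table -> plain inserts.
    inserts = incoming[merged["_merge"].eq("left_only").values]

    # Rows with a match: update only if some non-key column changed.
    value_cols = [c for c in incoming.columns if c != key]
    matched = merged[merged["_merge"].eq("both")]
    changed = (matched[value_cols].values !=
               matched[[f"{c}_old" for c in value_cols]].values).any(axis=1)
    updates = incoming[merged["_merge"].eq("both").values][changed]
    return inserts, updates

existing = pd.DataFrame({"id": [1, 2], "price": [10, 20]})
incoming = pd.DataFrame({"id": [2, 3], "price": [25, 30]})
inserts, updates = split_for_upsert(existing, incoming, "id")
```

The package itself pushes this comparison into SQL Server via a staging table and a `MERGE` statement, which scales better than a client-side diff.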
@@ -0,0 +1,34 @@
+ # File: pyproject.toml
+
+ [build-system]
+ requires = ["setuptools>=61.0"]
+ build-backend = "setuptools.build_meta"
+
+ [project]
+ name = "sqlServerConnector"
+ version = "0.1.0"
+ description = "A custom SQL Server Connector for ETL processes with Pandas"
+ readme = "README.md"
+ requires-python = ">=3.8"
+ authors = [
+     { name="Nguyen Minh Son", email="nguyen.minhson1511@gmail.com" },
+ ]
+ keywords = ["sql", "etl", "pandas", "sqlalchemy"]
+ classifiers = [
+     "Programming Language :: Python :: 3",
+     "Operating System :: OS Independent",
+ ]
+
+ # These libraries are installed automatically when a user installs your package
+ dependencies = [
+     "pandas>=1.5.0",
+     "numpy",
+     "sqlalchemy>=2.0.0",
+     "pyodbc",
+     "pyyaml",
+     "loguru",
+     "jupyterlab"
+ ]
+
+ [project.urls]
+ "Homepage" = "https://github.com/johnnyb1509/sqlServerConnector"
@@ -0,0 +1,4 @@
+ [egg_info]
+ tag_build =
+ tag_date = 0
+
@@ -0,0 +1,6 @@
+ # File: src/sql_etl_lib/__init__.py
+
+ from .connector import SQLServerConnector
+
+ # Declare what is exported publicly
+ __all__ = ["SQLServerConnector"]
@@ -0,0 +1,406 @@
+ import os
+ import numpy as np
+ import pandas as pd
+ import yaml
+ from typing import List, Optional, Dict, Union, Any
+ from loguru import logger
+ from sqlalchemy import create_engine, inspect, text, URL
+ from sqlalchemy.types import NVARCHAR, FLOAT, INTEGER, DATE, DATETIME, BIGINT
+ from sqlalchemy.exc import SQLAlchemyError
+
+ # ========================================================
+ # SQL SERVER CONNECTOR (Standardized ETL Object)
+ # Tech Stack: SQLAlchemy 2.0+, Pandas, PyODBC
+ # Unicode Support: YES (Vietnamese/UTF-8)
+ # ========================================================
+
+ class SQLServerConnector:
+     """
+     A robust, SQLAlchemy 2.0 compliant connector for SQL Server designed for ETL processes.
+
+     Features:
+     - High-performance Upserts (Merge) using Staging Tables.
+     - Full Unicode/Vietnamese Support (NVARCHAR + UTF8).
+     - Automatic Schema Evolution (adds missing columns).
+     - Automatic Primary Key detection and creation.
+     """
+
+     def __init__(self, server: str, database: str, username: str, password: str, driver: str = 'ODBC Driver 17 for SQL Server'):
+         self.server = server
+         self.database = database
+         self.username = username
+         self.password = password
+         self.driver = driver
+
+         # Connection URL Construction
+         # CRITICAL: 'fast_executemany' is required for proper Unicode handling in bulk inserts with PyODBC
+         self.connection_url = URL.create(
+             "mssql+pyodbc",
+             query={
+                 "odbc_connect": (
+                     f"DRIVER={self.driver};"
+                     f"SERVER={self.server};"
+                     f"DATABASE={self.database};"
+                     f"UID={self.username};"
+                     f"PWD={self.password};"
+                     "Charsets=UTF-8;"  # Explicitly request UTF-8
+                 ),
+                 "fast_executemany": "True"
+             }
+         )
+
+         # Create Engine
+         self.engine = create_engine(
+             self.connection_url,
+             pool_pre_ping=True,
+             pool_size=20,
+             max_overflow=10
+         )
+
+     def get_engine(self):
+         """Returns the SQLAlchemy engine object."""
+         return self.engine
+
+     def close(self):
+         """Alias for dispose(). Closes all connections in the pool."""
+         self.dispose()
+
+     def dispose(self):
+         """Dispose of the engine and close all connections."""
+         self.engine.dispose()
+         logger.info("Database engine disposed and connections closed.")
+
+     # ========================================================
+     # SCHEMA & METADATA METHODS
+     # ========================================================
+
+     def get_table_names(self) -> List[str]:
+         """Retrieve all table names in the database."""
+         try:
+             inspector = inspect(self.engine)
+             return inspector.get_table_names()
+         except SQLAlchemyError as e:
+             logger.error(f"Error retrieving table names: {e}")
+             return []
+
+     def check_table_exists(self, table_name: str) -> bool:
+         """Check if a specific table exists."""
+         return inspect(self.engine).has_table(table_name)
+
+     def get_primary_key(self, table_name: str) -> Optional[str]:
+         """Retrieve the primary key column name for a table."""
+         try:
+             inspector = inspect(self.engine)
+             pk_constraint = inspector.get_pk_constraint(table_name)
+             if pk_constraint and pk_constraint['constrained_columns']:
+                 return pk_constraint['constrained_columns'][0]
+
+             # Fallback heuristic
+             columns = [col['name'] for col in inspector.get_columns(table_name)]
+             for candidate in ['id_date', 'id', 'ID', 'Date', 'date']:
+                 if candidate in columns:
+                     return candidate
+             return None
+         except Exception as e:
+             logger.warning(f"Could not inspect PK for {table_name}: {e}")
+             return None
+
+     def get_columns_info(self, table_name: str) -> Dict[str, str]:
+         """Get column names and their SQL types."""
+         inspector = inspect(self.engine)
+         columns = inspector.get_columns(table_name)
+         return {col['name']: str(col['type']) for col in columns}
+
+     # ========================================================
+     # DATA RETRIEVAL METHODS
+     # ========================================================
+
+     def get_data(self, query_or_table: str, params: Optional[Dict] = None) -> pd.DataFrame:
+         """
+         Execute a raw SQL query or fetch a whole table.
+
+         Args:
+             query_or_table: SQL Query string OR Table Name.
+             params: Dictionary of parameters for the query.
+         """
+         if "SELECT" not in query_or_table.upper() and " " not in query_or_table:
+             query = text(f"SELECT * FROM {query_or_table}")
+         else:
+             query = text(query_or_table)
+
+         try:
+             with self.engine.connect() as conn:
+                 df = pd.read_sql(query, conn, params=params)
+                 return df
+         except Exception as e:
+             logger.error(f"Error fetching data: {e}")
+             return pd.DataFrame()
+
+     # ========================================================
+     # CORE ETL METHODS (Upsert Logic)
+     # ========================================================
+
+     def upsert_data(self,
+                     df: pd.DataFrame,
+                     target_table: str,
+                     primary_key: str = 'id_date',
+                     match_columns: Optional[List[str]] = None,
+                     auto_evolve_schema: bool = True):
+         """
+         Main ETL Function with Unicode Support.
+
+         Args:
+             df: The new data to push.
+             target_table: The SQL table name.
+             primary_key: The Database Primary Key.
+             match_columns: Columns to match on (e.g. ['Ticker', 'Date']) for detecting updates.
+             auto_evolve_schema: If True, adds missing columns to SQL automatically.
+         """
+         if df.empty:
+             logger.warning(f"Dataframe for {target_table} is empty. Skipping.")
+             return
+
+         # 1. PRE-PROCESS DATA
+         df_clean = self._sanitize_dataframe(df)
+
+         # 2. CHECK TARGET TABLE
+         if not self.check_table_exists(target_table):
+             logger.info(f"Table {target_table} does not exist. Creating new table.")
+             self._create_table_from_df(df_clean, target_table, primary_key)
+             return
+
+         # 3. SCHEMA EVOLUTION
+         if auto_evolve_schema:
+             self._sync_columns(df_clean, target_table)
+
+         # 4. DETERMINE MATCHING LOGIC
+         if match_columns:
+             join_keys = match_columns
+         elif primary_key in df_clean.columns:
+             join_keys = [primary_key]
+         else:
+             logger.error(f"CRITICAL: Primary Key '{primary_key}' is missing from DataFrame (likely Auto-Increment).")
+             logger.error("You MUST provide 'match_columns' to identify which rows to update.")
+             raise ValueError("Missing match keys for Identity Column Upsert.")
+
+         # 5. EXECUTE UPSERT VIA STAGING
+         self._execute_merge_upsert(df_clean, target_table, join_keys)
+
+     def _execute_merge_upsert(self, df: pd.DataFrame, target_table: str, join_keys: List[str]):
+         """Internal: Uploads to a temp table and runs a SQL MERGE."""
+         staging_table = f"##staging_{target_table}"
+
+         with self.engine.begin() as conn:
+             try:
+                 # A. Upload to Staging
+                 # IMPORTANT: We use NVARCHAR mapping implicitly here via pandas to_sql,
+                 # but explicit dtype mapping is safer for Unicode preservation.
+                 dtype_map = {}
+                 for col in df.columns:
+                     if df[col].dtype == 'object':
+                         dtype_map[col] = NVARCHAR(None)  # Force Unicode (NVARCHAR) for all strings
+
+                 df.to_sql(staging_table, conn, if_exists='replace', index=False, chunksize=5000, dtype=dtype_map)
+
+                 # B. Build Dynamic SQL
+                 source_cols = [col for col in df.columns]
+
+                 # Join Condition: Target.Key = Source.Key AND ...
+                 on_clause = " AND ".join([f"Target.[{k}] = Source.[{k}]" for k in join_keys])
+
+                 # Update Clause
+                 update_stmts = [f"Target.[{col}] = Source.[{col}]" for col in source_cols
+                                 if col not in join_keys]
+
+                 # Insert Clause
+                 insert_cols_str = ", ".join([f"[{col}]" for col in source_cols])
+                 insert_vals_str = ", ".join([f"Source.[{col}]" for col in source_cols])
+
+                 # C. Construct MERGE Query
+                 # Notice the N prefix is usually for literals, but since we are copying column-to-column
+                 # from a staging table that is ALREADY NVARCHAR, we don't need N'' prefixes here.
+                 if not update_stmts:
+                     merge_query = f"""
+                         MERGE [{target_table}] AS Target
+                         USING [{staging_table}] AS Source
+                         ON ({on_clause})
+                         WHEN NOT MATCHED BY TARGET THEN
+                             INSERT ({insert_cols_str}) VALUES ({insert_vals_str});
+                     """
+                 else:
+                     merge_query = f"""
+                         MERGE [{target_table}] AS Target
+                         USING [{staging_table}] AS Source
+                         ON ({on_clause})
+                         WHEN MATCHED THEN
+                             UPDATE SET {", ".join(update_stmts)}
+                         WHEN NOT MATCHED BY TARGET THEN
+                             INSERT ({insert_cols_str}) VALUES ({insert_vals_str});
+                     """
+
+                 conn.execute(text(merge_query))
+                 logger.success(f"Upsert successful for {target_table}. Matched on {join_keys}.")
+
+                 conn.execute(text(f"DROP TABLE [{staging_table}]"))
+
+             except SQLAlchemyError as e:
+                 logger.error(f"Upsert failed for {target_table}: {e}")
+                 raise
+
+     # ========================================================
+     # HELPER: SCHEMA & CREATION
+     # ========================================================
+
+     def _sync_columns(self, df: pd.DataFrame, table_name: str):
+         """Add missing columns to the SQL table."""
+         db_cols = self.get_columns_info(table_name)
+         existing_cols_lower = {k.lower() for k in db_cols.keys()}
+
+         new_cols = [col for col in df.columns if col.lower() not in existing_cols_lower]
+
+         if new_cols:
+             logger.info(f"Schema Evolution: Adding {len(new_cols)} new columns to {table_name}.")
+             with self.engine.connect() as conn:
+                 for col in new_cols:
+                     dtype = df[col].dtype
+                     # VIETNAMESE SUPPORT: Default to NVARCHAR(MAX) for new string columns
+                     sql_type = "NVARCHAR(MAX)"
+
+                     if pd.api.types.is_integer_dtype(dtype):
+                         sql_type = "BIGINT"
+                     elif pd.api.types.is_float_dtype(dtype):
+                         sql_type = "FLOAT"
+                     elif pd.api.types.is_datetime64_any_dtype(dtype):
+                         sql_type = "DATETIME"
+
+                     try:
+                         conn.execute(text(f"ALTER TABLE [{table_name}] ADD [{col}] {sql_type} NULL"))
+                         conn.commit()
+                     except Exception as e:
+                         logger.warning(f"Failed to add column {col}: {e}")
+
+     def _create_table_from_df(self, df: pd.DataFrame, table_name: str, primary_key: Optional[str] = None):
+         """Create a new table with Unicode support (NVARCHAR)."""
+         dtype_map = {}
+         for col in df.columns:
+             # VIETNAMESE SUPPORT: Explicitly map all object columns to NVARCHAR
+             if df[col].dtype == 'object':
+                 dtype_map[col] = NVARCHAR(None)  # None = MAX
+
+         df.to_sql(table_name, self.engine, index=False, dtype=dtype_map)
+
+         if primary_key:
+             if primary_key in df.columns:
+                 pk_dtype = df[primary_key].dtype
+                 self.set_primary_key(table_name, primary_key, source_dtype=pk_dtype)
+             else:
+                 logger.warning(f"Skipping PK creation: Column '{primary_key}' not found in new data.")
+
+     def set_primary_key(self, table_name: str, column_name: str, source_dtype=None):
+         """Sets a primary key with type detection."""
+         sql_type = "INT"
+         if source_dtype is not None:
+             if pd.api.types.is_integer_dtype(source_dtype):
+                 sql_type = "BIGINT"
+             elif pd.api.types.is_float_dtype(source_dtype):
+                 sql_type = "BIGINT"
+             elif pd.api.types.is_string_dtype(source_dtype):
+                 # VIETNAMESE SUPPORT: PKs that are strings must also be NVARCHAR
+                 sql_type = "NVARCHAR(450)"
+             elif pd.api.types.is_datetime64_any_dtype(source_dtype):
+                 sql_type = "DATE"
+
+         with self.engine.connect() as conn:
+             with conn.begin():
+                 try:
+                     conn.execute(text(f"ALTER TABLE [{table_name}] ALTER COLUMN [{column_name}] {sql_type} NOT NULL"))
+                     conn.execute(text(f"ALTER TABLE [{table_name}] ADD PRIMARY KEY ([{column_name}])"))
+                     logger.info(f"Primary key set on {table_name}.{column_name}")
+                 except SQLAlchemyError as e:
+                     logger.error(f"Failed to set PK on {table_name}: {e}")
+
+     # ========================================================
+     # HELPER: DATA CLEANING
+     # ========================================================
+
+     def _sanitize_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
+         """Cleans numeric strings, NaT, and NaN values."""
+         df = df.copy()
+
+         # 1. Clean Numeric Strings
+         for col in df.select_dtypes(include=['object']).columns:
+             # Only try to convert to float if it looks like a number (digits or currency symbols)
+             # Avoid trying to convert Vietnamese text columns
+             sample = df[col].dropna().head(10).astype(str).tolist()
+             if any(any(char.isdigit() for char in str(x)) for x in sample):
+                 try:
+                     # Attempt conversion, but strictly ignore errors so we don't break text columns
+                     # We use a temp series to check if conversion is successful for majority
+                     temp = df[col].apply(self._clean_numeric_string)
+                     # If the column was actually text (e.g., "Hà Nội"), _clean_numeric_string returns the original string.
+                     # We trust _clean_numeric_string to be safe.
+                     df[col] = temp
+                 except Exception:
+                     pass
+
+         # 2. Clean Dates (NaT -> None)
+         for col in df.select_dtypes(include=['datetime', 'datetimetz']).columns:
+             df[col] = df[col].replace({pd.NaT: None})
+             df[col] = df[col].astype(object).where(df[col].notnull(), None)
+
+         # 3. Clean NaN -> None
+         df = df.replace({np.nan: None})
+         df = df.where(pd.notnull(df), None)
+         return df
+
+     @staticmethod
+     def _clean_numeric_string(value):
+         """Convert '2.5B', '1,000' to float. Safe for Vietnamese text."""
+         if pd.isna(value) or value is None:
+             return None
+         if isinstance(value, (int, float)):
+             return value
+
+         s = str(value).strip().upper()
+         if not s:
+             return None
+
+         # Heuristic: If it contains many letters (excluding K,M,B,T for magnitudes), it's probably text
+         alpha_count = sum(c.isalpha() for c in s)
+         if alpha_count > 1 and s[-1] not in ['K', 'M', 'B', 'T']:
+             return value  # It's likely text (e.g. "Cổ phiếu")
+
+         # Clean common financial chars
+         s_clean = s.replace(',', '').replace('%', '')
+
+         multipliers = {'K': 1e3, 'M': 1e6, 'B': 1e9, 'T': 1e12}
+         if s_clean and s_clean[-1] in multipliers:
+             try:
+                 return float(s_clean[:-1]) * multipliers[s_clean[-1]]
+             except ValueError:
+                 return value
+
+         try:
+             return float(s_clean)
+         except ValueError:
+             return value
+
+ # ========================================================
+ # ENTRY POINT
+ # ========================================================
+
+ # def get_db_connector(yaml_path: Optional[str] = None, env_prefix: str = "DB") -> SQLServerConnector:
+ #     """Factory function to initialize connector."""
+ #     if yaml_path and os.path.exists(yaml_path):
+ #         with open(yaml_path, 'r') as f:
+ #             config = yaml.safe_load(f).get('db_info', {})
+ #         return SQLServerConnector(
+ #             config.get('server'),
+ #             config.get('database'),
+ #             config.get('username'),
+ #             config.get('password')
+ #         )
+ #     else:
+ #         return SQLServerConnector(
+ #             os.environ.get(f'{env_prefix}_SERVER'),
+ #             os.environ.get(f'{env_prefix}_NAME'),
+ #             os.environ.get(f'{env_prefix}_USER'),
+ #             os.environ.get(f'{env_prefix}_PASS')
+ #         )
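The numeric-string cleaning in `_clean_numeric_string` above can be reproduced as a standalone, stdlib-only sketch (the pandas `pd.isna` check is replaced with a plain `None` check; otherwise the heuristic is the same):

```python
# Standalone sketch of the cleaning heuristic: magnitude suffixes K/M/B/T are
# expanded, thousands separators and '%' dropped, and real text passed through.
def clean_numeric_string(value):
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return value

    s = str(value).strip().upper()
    if not s:
        return None

    # More than one letter and no magnitude suffix -> probably real text.
    if sum(c.isalpha() for c in s) > 1 and s[-1] not in "KMBT":
        return value

    s_clean = s.replace(",", "").replace("%", "")
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}
    if s_clean and s_clean[-1] in multipliers:
        try:
            return float(s_clean[:-1]) * multipliers[s_clean[-1]]
        except ValueError:
            return value
    try:
        return float(s_clean)
    except ValueError:
        return value
```

For example, `"1.5M"` becomes `1500000.0` while a Vietnamese place name like `"Hà Nội"` is returned unchanged, which is what keeps `_sanitize_dataframe` safe to run over mixed text and numeric columns.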
@@ -0,0 +1,57 @@
+ Metadata-Version: 2.4
+ Name: sqlServerConnector
+ Version: 0.1.0
+ Summary: A custom SQL Server Connector for ETL processes with Pandas
+ Author-email: Nguyen Minh Son <nguyen.minhson1511@gmail.com>
+ Project-URL: Homepage, https://github.com/johnnyb1509/sqlServerConnector
+ Keywords: sql,etl,pandas,sqlalchemy
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Operating System :: OS Independent
+ Requires-Python: >=3.8
+ Description-Content-Type: text/markdown
+ Requires-Dist: pandas>=1.5.0
+ Requires-Dist: numpy
+ Requires-Dist: sqlalchemy>=2.0.0
+ Requires-Dist: pyodbc
+ Requires-Dist: pyyaml
+ Requires-Dist: loguru
+ Requires-Dist: jupyterlab
+
+ # SQL Data Connector Project
+
+ This project provides a robust Python module (`dbConnector`) for interacting with Microsoft SQL Server, optimized for Data Engineering tasks such as ETL, large bulk inserts, and synchronizing (upserting) data from a Pandas DataFrame.
+
+ ## Key Features
+
+ * **Highly automated**: Automatically creates tables, detects data types, and adds new columns when the DataFrame changes.
+ * **High performance**: Uses SQLAlchemy/pyodbc's `fast_executemany=True` to speed up inserts.
+ * **Data safety**: Supports transactions (commit/rollback) to guarantee data integrity.
+ * **Smart sync**: The `check_and_update_table` function compares rows, updating only changed rows and inserting new ones.
+ * **Utilities**: Supports cleaning numeric strings (e.g., converting "1.5M" to 1,500,000).
+
+ ## Installation Requirements
+
+ 1. **Operating system**: Windows, Linux, or macOS.
+ 2. **Driver**: Requires **ODBC Driver 17 for SQL Server**.
+    * [Download here (Microsoft)](https://learn.microsoft.com/en-us/sql/connect/odbc/download-odbc-driver-for-sql-server)
+ 3. **Python libraries**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ## Project Structure
+
+ * `src/dbConnector.py`: Main module containing the `dbJob` class.
+ * `config/db_config.yaml`: Database configuration file (create it yourself from the template).
+ * `notebooks/demo_usage.ipynb`: Usage examples.
+
+ ## Quick Start
+
+ ### 1. Configure the connection
+ Create the file `config/db_config.yaml`:
+ ```yaml
+ db_info:
+   server: "localhost"
+   database: "MyDatabase"
+   username: "sa"
+   password: "mypassword"
+ ```
@@ -0,0 +1,9 @@
+ README.md
+ pyproject.toml
+ src/__init__.py
+ src/connector.py
+ src/sqlServerConnector.egg-info/PKG-INFO
+ src/sqlServerConnector.egg-info/SOURCES.txt
+ src/sqlServerConnector.egg-info/dependency_links.txt
+ src/sqlServerConnector.egg-info/requires.txt
+ src/sqlServerConnector.egg-info/top_level.txt
@@ -0,0 +1,7 @@
+ pandas>=1.5.0
+ numpy
+ sqlalchemy>=2.0.0
+ pyodbc
+ pyyaml
+ loguru
+ jupyterlab
@@ -0,0 +1,2 @@
+ __init__
+ connector