ddeutil-workflow 0.0.4__py3-none-any.whl → 0.0.5__py3-none-any.whl

This diff compares the contents of two publicly released versions of the package as they appear in their public registry. It is provided for informational purposes only.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: ddeutil-workflow
- Version: 0.0.4
+ Version: 0.0.5
  Summary: Data Developer & Engineer Workflow Utility Objects
  Author-email: ddeutils <korawich.anu@gmail.com>
  License: MIT
@@ -9,7 +9,7 @@ Project-URL: Source Code, https://github.com/ddeutils/ddeutil-workflow/
  Keywords: data,workflow,utility,pipeline
  Classifier: Topic :: Utilities
  Classifier: Natural Language :: English
- Classifier: Development Status :: 3 - Alpha
+ Classifier: Development Status :: 4 - Beta
  Classifier: Intended Audience :: Developers
  Classifier: Operating System :: OS Independent
  Classifier: Programming Language :: Python
@@ -23,21 +23,16 @@ Description-Content-Type: text/markdown
  License-File: LICENSE
  Requires-Dist: fmtutil
  Requires-Dist: ddeutil-io
- Requires-Dist: python-dotenv
- Provides-Extra: test
- Requires-Dist: sqlalchemy ==2.0.30 ; extra == 'test'
- Requires-Dist: paramiko ==3.4.0 ; extra == 'test'
- Requires-Dist: sshtunnel ==0.4.0 ; extra == 'test'
- Requires-Dist: boto3 ==1.34.117 ; extra == 'test'
- Requires-Dist: fsspec ==2024.5.0 ; extra == 'test'
- Requires-Dist: polars ==0.20.31 ; extra == 'test'
- Requires-Dist: pyarrow ==16.1.0 ; extra == 'test'
+ Requires-Dist: python-dotenv ==1.0.1
+ Requires-Dist: schedule ==1.2.2

  # Data Utility: _Workflow_

  [![test](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml/badge.svg?branch=main)](https://github.com/ddeutils/ddeutil-workflow/actions/workflows/tests.yml)
  [![python support version](https://img.shields.io/pypi/pyversions/ddeutil-workflow)](https://pypi.org/project/ddeutil-workflow/)
  [![size](https://img.shields.io/github/languages/code-size/ddeutils/ddeutil-workflow)](https://github.com/ddeutils/ddeutil-workflow)
+ [![gh license](https://img.shields.io/github/license/ddeutils/ddeutil-workflow)](https://github.com/ddeutils/ddeutil-workflow/blob/main/LICENSE)
+

  **Table of Contents**:

@@ -46,10 +41,11 @@ Requires-Dist: pyarrow ==16.1.0 ; extra == 'test'
  - [Connection](#connection)
  - [Dataset](#dataset)
  - [Schedule](#schedule)
- - [Examples](#examples)
- - [Python](#python)
+ - [Pipeline Examples](#examples)
+ - [Python & Shell](#python--shell)
  - [Tasks (EL)](#tasks-extract--load)
- - [Hooks (T)](#hooks-transform)
+ - [Hooks (T)](#tasks-transform)
+ - [Configuration](#configuration)

  This **Utility Workflow** objects was created for easy to make a simple metadata
  driven pipeline that able to **ETL, T, EL, or ELT** by `.yaml` file.
@@ -80,7 +76,7 @@ This project need `ddeutil-io`, `ddeutil-model` extension namespace packages.

  The first step, you should start create the connections and datasets for In and
  Out of you data that want to use in pipeline of workflow. Some of this component
- is similar component of the **Airflow** because I like it concepts.
+ is similar component of the **Airflow** because I like it orchestration concepts.

  The main feature of this project is the `Pipeline` object that can call any
  registries function. The pipeline can handle everything that you want to do, it
@@ -91,44 +87,7 @@ will passing parameters and catching the output for re-use it to next step.
  > dynamic registries instead of main features because it have a lot of maintain
  > vendor codes and deps. (I do not have time to handle this features)

- ### Connection
-
- The connection for worker able to do any thing.
-
- ```yaml
- conn_postgres_data:
-   type: conn.Postgres
-   url: 'postgres//username:${ENV_PASS}@hostname:port/database?echo=True&time_out=10'
- ```
-
- ```python
- from ddeutil.workflow.conn import Conn
-
- conn = Conn.from_loader(name='conn_postgres_data', externals={})
- assert conn.ping()
- ```
-
- ### Dataset
-
- The dataset is define any objects on the connection. This feature was implemented
- on `/vendors` because it has a lot of tools that can interact with any data systems
- in the data tool stacks.
-
- ```yaml
- ds_postgres_customer_tbl:
-   type: dataset.PostgresTbl
-   conn: 'conn_postgres_data'
-   features:
-     id: serial primary key
-     name: varchar( 100 ) not null
- ```
-
- ```python
- from ddeutil.workflow.vendors.pg import PostgresTbl
-
- dataset = PostgresTbl.from_loader(name='ds_postgres_customer_tbl', externals={})
- assert dataset.exists()
- ```
+ ---

  ### Schedule

@@ -139,7 +98,7 @@ schd_for_node:
  ```

  ```python
- from ddeutil.workflow.schedule import Schedule
+ from ddeutil.workflow.on import Schedule

  scdl = Schedule.from_loader(name='schd_for_node', externals={})
  assert '*/5 * * * *' == str(scdl.cronjob)
@@ -152,18 +111,35 @@ assert '2022-01-01 00:20:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
  assert '2022-01-01 00:25:00' f"{cron_iterate.next:%Y-%m-%d %H:%M:%S}"
  ```

+ ---
+
+ ### Pipeline
+
+ ```yaml
+ run_py_local:
+   type: ddeutil.workflow.pipeline.Pipeline
+   ...
+ ```
+
+ ```python
+ from ddeutil.workflow.pipeline import Pipeline
+
+ pipe = Pipeline.from_loader(name='run_py_local', externals={})
+ pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
+ ```
+
  ## Examples

  This is examples that use workflow file for running common Data Engineering
  use-case.

- ### Python
+ ### Python & Shell

  The state of doing lists that worker should to do. It be collection of the stage.

  ```yaml
  run_py_local:
-   type: ddeutil.workflow.pipe.Pipeline
+   type: ddeutil.workflow.pipeline.Pipeline
    params:
      author-run:
        type: str
@@ -194,6 +170,12 @@ run_py_local:
              echo: ${{ stages.define-func.outputs.echo }}
            run: |
              echo('Caller')
+     second-job:
+       stages:
+         - name: Echo Shell Script
+           id: shell-echo
+           shell: |
+             echo "Hello World from Shell"
  ```

  ```python
@@ -207,13 +189,16 @@ pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
  > Hello Local Workflow
  > Receive x from above with Local Workflow
  > Hello Caller
+ > Hello World from Shell
  ```

+ ---
+
  ### Tasks (Extract & Load)

  ```yaml
  pipe_el_pg_to_lake:
-   type: ddeutil.workflow.pipe.Pipeline
+   type: ddeutil.workflow.pipeline.Pipeline
    params:
      run-date:
        type: datetime
@@ -236,11 +221,15 @@ pipe_el_pg_to_lake:
            endpoint: "/${{ params.name }}"
  ```

+ ---
+
  ### Tasks (Transform)

+ > I recommend you to use task for all actions that you want to do.
+
  ```yaml
  pipe_hook_mssql_proc:
-   type: ddeutil.workflow.pipe.Pipeline
+   type: ddeutil.workflow.pipeline.Pipeline
    params:
      run_date: datetime
      sp_name: str
@@ -261,6 +250,16 @@ pipe_hook_mssql_proc:
            target: ${{ params.target_name }}
  ```

+ > [!NOTE]
+ > The above parameter use short declarative statement. You can pass a parameter
+ > type to the key of a parameter name.
+
+ ## Configuration
+
+ ```text
+
+ ```
+
  ## License

  This project was licensed under the terms of the [MIT license](LICENSE).
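The README diff above renames the importable modules (`ddeutil.workflow.schedule` becomes `ddeutil.workflow.on`, `ddeutil.workflow.pipe` becomes `ddeutil.workflow.pipeline`). The following sketch only combines the snippets shown in the diff into one script; where the YAML configs (`schd_for_node`, `run_py_local`) live and how the loader discovers them is not shown here and is assumed to be configured separately.

```python
# Sketch of the 0.0.5 usage documented in the README diff above.
# Assumes the 'schd_for_node' and 'run_py_local' YAML configs are on the
# loader's configured search path (not shown in this diff).
from ddeutil.workflow.on import Schedule
from ddeutil.workflow.pipeline import Pipeline

# Build the schedule object from its YAML config and inspect the cron spec.
scdl = Schedule.from_loader(name='schd_for_node', externals={})
print(str(scdl.cronjob))  # e.g. '*/5 * * * *'

# Build the pipeline from its YAML config and run it with parameters.
pipe = Pipeline.from_loader(name='run_py_local', externals={})
pipe.execute(params={'author-run': 'Local Workflow', 'run-date': '2024-01-01'})
```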
@@ -0,0 +1,17 @@
+ ddeutil/workflow/__about__.py,sha256=jgkUUyo8sKJnE1-6McC_AbxbZqvAFoYRfSE3HCAexlk,27
+ ddeutil/workflow/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
+ ddeutil/workflow/__regex.py,sha256=bOngaQ0zJgy3vfNwF2MlI8XhLu_Ei1Vz8y50iLj8ao4,1061
+ ddeutil/workflow/__scheduler.py,sha256=wSzv6EN2Nx99lLxzrA80qfqQ_AxOFJAOk_EZYgafVzk,20965
+ ddeutil/workflow/__types.py,sha256=AkpQq6QlrclpurCZZVY9RMxoyS9z2WGzhaz_ikeTaCU,453
+ ddeutil/workflow/exceptions.py,sha256=XAq82VHSMLNb4UjGatp7hYfjxFtMiKFtBqJyAhwTl-s,434
+ ddeutil/workflow/loader.py,sha256=xiOtxluhLXfryMp3q1OIJggykr01WENKV7zBkcJ9-Yc,5763
+ ddeutil/workflow/on.py,sha256=vYNcvh74MN2ttd9KOehngOhJDNqDheFbbsXeEhDSqyk,5280
+ ddeutil/workflow/pipeline.py,sha256=tnzeAQrXb_-OIo53IGv6LxZoMiOJyMPWXAbisdBCXHI,19298
+ ddeutil/workflow/utils.py,sha256=A-k3L4OUFFq6utlsbpEVMGyofiLbDrvThM8IqFDE9gE,6093
+ ddeutil/workflow/tasks/__init__.py,sha256=HcQ7xNETFOKovMOs4lL2Pl8hXFZ515jU5Mc3LFZcSGE,336
+ ddeutil/workflow/tasks/dummy.py,sha256=b_y6eHGxj4aQ-ZmvcNL7aBHu3eIzL6BeXgqj0MDqSPw,1460
+ ddeutil_workflow-0.0.5.dist-info/LICENSE,sha256=nGFZ1QEhhhWeMHf9n99_fdt4vQaXS29xWKxt-OcLywk,1085
+ ddeutil_workflow-0.0.5.dist-info/METADATA,sha256=dcZadiwnhPD6DnoBaugdPHlOw108lRCmhDD7KF2s-Dg,7717
+ ddeutil_workflow-0.0.5.dist-info/WHEEL,sha256=R0nc6qTxuoLk7ShA2_Y-UWkN8ZdfDBG2B6Eqpz2WXbs,91
+ ddeutil_workflow-0.0.5.dist-info/top_level.txt,sha256=m9M6XeSWDwt_yMsmH6gcOjHZVK5O0-vgtNBuncHjzW4,8
+ ddeutil_workflow-0.0.5.dist-info/RECORD,,
@@ -1,5 +1,5 @@
  Wheel-Version: 1.0
- Generator: bdist_wheel (0.43.0)
+ Generator: setuptools (72.1.0)
  Root-Is-Purelib: true
  Tag: py3-none-any

ddeutil/workflow/conn.py DELETED
@@ -1,240 +0,0 @@
- # ------------------------------------------------------------------------------
- # Copyright (c) 2022 Korawich Anuttra. All rights reserved.
- # Licensed under the MIT License. See LICENSE in the project root for
- # license information.
- # ------------------------------------------------------------------------------
- from __future__ import annotations
-
- import logging
- from collections.abc import Iterator
- from pathlib import Path
- from typing import Annotated, Any, Literal, Optional, TypeVar
-
- from ddeutil.io.models.conn import Conn as ConnModel
- from pydantic import BaseModel, ConfigDict, Field
- from pydantic.functional_validators import field_validator
- from pydantic.types import SecretStr
- from typing_extensions import Self
-
- from .__types import DictData, TupleStr
- from .loader import Loader
-
- EXCLUDED_EXTRAS: TupleStr = (
-     "type",
-     "url",
- )
-
-
- class BaseConn(BaseModel):
-     """Base Conn (Connection) Model"""
-
-     model_config = ConfigDict(arbitrary_types_allowed=True)
-
-     # NOTE: This is fields
-     dialect: str
-     host: Optional[str] = None
-     port: Optional[int] = None
-     user: Optional[str] = None
-     pwd: Optional[SecretStr] = None
-     endpoint: str
-     extras: Annotated[
-         DictData,
-         Field(default_factory=dict, description="Extras mapping of parameters"),
-     ]
-
-     @classmethod
-     def from_dict(cls, values: DictData) -> Self:
-         """Construct Connection Model from dict data. This construct is
-         different with ``.model_validate()`` because it will prepare the values
-         before using it if the data dose not have 'url'.
-
-         :param values: A dict data that use to construct this model.
-         """
-         # NOTE: filter out the fields of this model.
-         filter_data: DictData = {
-             k: values.pop(k)
-             for k in values.copy()
-             if k not in cls.model_fields and k not in EXCLUDED_EXTRAS
-         }
-         if "url" in values:
-             url: ConnModel = ConnModel.from_url(values.pop("url"))
-             return cls(
-                 dialect=url.dialect,
-                 host=url.host,
-                 port=url.port,
-                 user=url.user,
-                 pwd=url.pwd,
-                 # NOTE:
-                 #   I will replace None endpoint with memory value for SQLite
-                 #   connection string.
-                 endpoint=(url.endpoint or "memory"),
-                 # NOTE: This order will show that externals this the top level.
-                 extras=(url.options | filter_data),
-             )
-         return cls.model_validate(
-             obj={
-                 "extras": (values.pop("extras", {}) | filter_data),
-                 **values,
-             }
-         )
-
-     @classmethod
-     def from_loader(cls, name: str, externals: DictData) -> Self:
-         """Construct Connection with Loader object with specific config name.
-
-         :param name: A config name.
-         :param externals: A external data that want to adding to extras.
-         """
-         loader: Loader = Loader(name, externals=externals)
-         # NOTE: Validate the config type match with current connection model
-         if loader.type != cls:
-             raise ValueError(f"Type {loader.type} does not match with {cls}")
-         return cls.from_dict(
-             {
-                 "extras": (loader.data.pop("extras", {}) | externals),
-                 **loader.data,
-             }
-         )
-
-     @field_validator("endpoint")
-     def __prepare_slash(cls, value: str) -> str:
-         """Prepare slash character that map double form URL model loading."""
-         if value.startswith("//"):
-             return value[1:]
-         return value
-
-
- class Conn(BaseConn):
-     """Conn (Connection) Model that implement any necessary methods. This object
-     should be the base for abstraction to any connection model object.
-     """
-
-     def get_spec(self) -> str:
-         """Return full connection url that construct from all fields."""
-         return (
-             f"{self.dialect}://{self.user or ''}"
-             f"{f':{self.pwd}' if self.pwd else ''}"
-             f"{self.host or ''}{f':{self.port}' if self.port else ''}"
-             f"/{self.endpoint}"
-         )
-
-     def ping(self) -> bool:
-         """Ping the connection that able to use with this field value."""
-         raise NotImplementedError("Ping does not implement")
-
-     def glob(self, pattern: str) -> Iterator[Any]:
-         """Return a list of object from the endpoint of this connection."""
-         raise NotImplementedError("Glob does not implement")
-
-     def find_object(self, _object: str):
-         raise NotImplementedError("Glob does not implement")
-
-
- class FlSys(Conn):
-     """File System Connection."""
-
-     dialect: Literal["local"] = "local"
-
-     def ping(self) -> bool:
-         return Path(self.endpoint).exists()
-
-     def glob(self, pattern: str) -> Iterator[Path]:
-         yield from Path(self.endpoint).rglob(pattern=pattern)
-
-     def find_object(self, _object: str) -> bool:
-         return (Path(self.endpoint) / _object).exists()
-
-
- class SFTP(Conn):
-     """SFTP Server Connection."""
-
-     dialect: Literal["sftp"] = "sftp"
-
-     def __client(self):
-         from .vendors.sftp import WrapSFTP
-
-         return WrapSFTP(
-             host=self.host,
-             port=self.port,
-             user=self.user,
-             pwd=self.pwd.get_secret_value(),
-         )
-
-     def ping(self) -> bool:
-         with self.__client().simple_client():
-             return True
-
-     def glob(self, pattern: str) -> Iterator[str]:
-         yield from self.__client().walk(pattern=pattern)
-
-
- class Db(Conn):
-     """RDBMS System Connection"""
-
-     def ping(self) -> bool:
-         from sqlalchemy import create_engine
-         from sqlalchemy.engine import URL, Engine
-         from sqlalchemy.exc import OperationalError
-
-         engine: Engine = create_engine(
-             url=URL.create(
-                 self.dialect,
-                 username=self.user,
-                 password=self.pwd.get_secret_value() if self.pwd else None,
-                 host=self.host,
-                 port=self.port,
-                 database=self.endpoint,
-                 query={},
-             ),
-             execution_options={},
-         )
-         try:
-             return engine.connect()
-         except OperationalError as err:
-             logging.warning(str(err))
-             return False
-
-
- class SQLite(Db):
-     dialect: Literal["sqlite"]
-
-
- class ODBC(Conn): ...
-
-
- class Doc(Conn):
-     """No SQL System Connection"""
-
-
- class Mongo(Doc): ...
-
-
- class SSHCred(BaseModel):
-     ssh_host: str
-     ssh_user: str
-     ssh_password: Optional[SecretStr] = Field(default=None)
-     ssh_private_key: Optional[str] = Field(default=None)
-     ssh_private_key_pwd: Optional[SecretStr] = Field(default=None)
-     ssh_port: int = Field(default=22)
-
-
- class S3Cred(BaseModel):
-     aws_access_key: str
-     aws_secret_access_key: SecretStr
-     region: str = Field(default="ap-southeast-1")
-     role_arn: Optional[str] = Field(default=None)
-     role_name: Optional[str] = Field(default=None)
-     mfa_serial: Optional[str] = Field(default=None)
-
-
- class AZServPrinCred(BaseModel):
-     tenant: str
-     client_id: str
-     secret_id: SecretStr
-
-
- class GoogleCred(BaseModel):
-     google_json_path: str
-
-
- SubclassConn = TypeVar("SubclassConn", bound=Conn)
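The removed `Db.ping` above opened a SQLAlchemy connection and returned the connection object rather than a boolean, and 0.0.5 no longer declares `sqlalchemy` at all (the `test` extra was dropped from METADATA). If you relied on that behaviour, a standalone equivalent is sketched below; it is not part of the package and assumes you install `sqlalchemy` yourself.

```python
# Standalone sketch of the removed Db.ping behaviour, assuming sqlalchemy is
# installed separately (ddeutil-workflow 0.0.5 no longer depends on it).
import logging

from sqlalchemy import create_engine
from sqlalchemy.engine import URL
from sqlalchemy.exc import OperationalError


def ping_db(dialect: str, user: str, pwd: str, host: str, port: int, database: str) -> bool:
    """Return True if a connection to the database can be opened."""
    engine = create_engine(
        URL.create(dialect, username=user, password=pwd, host=host, port=port, database=database)
    )
    try:
        # Unlike the deleted code, close the connection and return a real bool.
        with engine.connect():
            return True
    except OperationalError as err:
        logging.warning(str(err))
        return False
```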
@@ -1,82 +0,0 @@
- # ------------------------------------------------------------------------------
- # Copyright (c) 2022 Korawich Anuttra. All rights reserved.
- # Licensed under the MIT License. See LICENSE in the project root for
- # license information.
- # ------------------------------------------------------------------------------
- from __future__ import annotations
-
- from datetime import datetime
- from typing import Annotated
- from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
-
- from ddeutil.workflow.vendors.__schedule import CronJob, CronRunner
- from pydantic import BaseModel, ConfigDict, Field
- from pydantic.functional_validators import field_validator
- from typing_extensions import Self
-
- from .__types import DictData
- from .loader import Loader
-
-
- class BaseSchedule(BaseModel):
-     """Base Schedule (Schedule) Model"""
-
-     model_config = ConfigDict(arbitrary_types_allowed=True)
-
-     # NOTE: This is fields
-     cronjob: Annotated[CronJob, Field(description="Cron job of this schedule")]
-     tz: Annotated[str, Field(description="Timezone")] = "utc"
-     extras: Annotated[
-         DictData,
-         Field(default_factory=dict, description="Extras mapping of parameters"),
-     ]
-
-     @classmethod
-     def from_loader(
-         cls,
-         name: str,
-         externals: DictData,
-     ) -> Self:
-         loader: Loader = Loader(name, externals=externals)
-         if "cronjob" not in loader.data:
-             raise ValueError("Config does not set ``cronjob`` value")
-         return cls(cronjob=loader.data["cronjob"], extras=externals)
-
-     @field_validator("tz")
-     def __validate_tz(cls, value: str):
-         try:
-             _ = ZoneInfo(value)
-             return value
-         except ZoneInfoNotFoundError as err:
-             raise ValueError(f"Invalid timezone: {value}") from err
-
-     @field_validator("cronjob", mode="before")
-     def __prepare_cronjob(cls, value: str | CronJob) -> CronJob:
-         return CronJob(value) if isinstance(value, str) else value
-
-     def generate(self, start: str | datetime) -> CronRunner:
-         """Return Cron runner object."""
-         if not isinstance(start, datetime):
-             start: datetime = datetime.fromisoformat(start)
-         return self.cronjob.schedule(date=(start.astimezone(ZoneInfo(self.tz))))
-
-
- class Schedule(BaseSchedule):
-     """Schedule (Schedule) Model.
-
-     See Also:
-         * ``generate()`` is the main usecase of this schedule object.
-     """
-
-
- class ScheduleBkk(Schedule):
-     """Asia Bangkok Schedule (Schedule) timezone Model.
-
-     This model use for change timezone from utc to Asia/Bangkok
-     """
-
-     tz: Annotated[str, Field(description="Timezone")] = "Asia/Bangkok"
-
-
- class AwsSchedule(BaseSchedule):
-     """Implement Schedule for AWS Service."""
@@ -1,54 +0,0 @@
- import logging
- import math
-
- try:
-     import pandas as pd
-
-     logging.debug(f"Pandas version: {pd.__version__}")
- except ImportError as err:
-     raise ImportError(
-         "``split_iterable`` function want to use pandas package that does"
-         "not install on your interpreter."
-     ) from err
-
-
- def split_iterable(iterable, chunk_size=None, generator_flag: bool = True):
-     """
-     Split an iterable into mini batch with batch length of batch_number
-     supports batch of a pandas dataframe
-     usage:
-         >> for i in split_iterable([1,2,3,4,5], chunk_size=2):
-         >>     print(i)
-         [1, 2]
-         [3, 4]
-         [5]
-
-         for idx, mini_data in split_iterable(batch(df, chunk_size=10)):
-             print(idx)
-             print(mini_data)
-     """
-
-     chunk_size: int = chunk_size or 25000
-     num_chunks = math.ceil(len(iterable) / chunk_size)
-     if generator_flag:
-         for _ in range(num_chunks):
-             if isinstance(iterable, pd.DataFrame):
-                 yield iterable.iloc[_ * chunk_size : (_ + 1) * chunk_size]
-             else:
-                 yield iterable[_ * chunk_size : (_ + 1) * chunk_size]
-     else:
-         _chunks: list = []
-         for _ in range(num_chunks):
-             if isinstance(iterable, pd.DataFrame):
-                 _chunks.append(
-                     iterable.iloc[_ * chunk_size : (_ + 1) * chunk_size]
-                 )
-             else:
-                 _chunks.append(iterable[_ * chunk_size : (_ + 1) * chunk_size])
-         return _chunks
-
-
- def chunks(dataframe: pd.DataFrame, n: int):
-     """Yield successive n-sized chunks from dataframe."""
-     for i in range(0, len(dataframe), n):
-         yield dataframe.iloc[i : i + n]
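The deleted helper above raised at import time unless pandas was installed, and 0.0.5 drops the data-stack extras entirely. For plain Python sequences the same chunking can be done without pandas; the sketch below is not part of the package and is only an illustrative replacement under that assumption.

```python
# Dependency-free sketch of the deleted chunking helper for plain sequences.
# DataFrames would still need .iloc-based slicing as in the removed code.
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def split_iterable(seq: Sequence[T], chunk_size: int = 25000) -> Iterator[Sequence[T]]:
    """Yield successive chunk_size-sized slices of seq."""
    for i in range(math.ceil(len(seq) / chunk_size)):
        yield seq[i * chunk_size : (i + 1) * chunk_size]


# Example: list(split_iterable([1, 2, 3, 4, 5], chunk_size=2)) -> [[1, 2], [3, 4], [5]]
```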