Flowfile 0.2.2__py3-none-any.whl → 0.3.0__py3-none-any.whl

flowfile/__init__.py CHANGED
@@ -18,7 +18,7 @@ from flowfile_frame.expr import (
18
18
  sum, min, max, mean, count, when
19
19
  )
20
20
  from flowfile_frame.group_frame import GroupByFrame
21
- from flowfile_frame.utils import create_etl_graph, open_graph_in_editor
21
+ from flowfile_frame.utils import create_flow_graph, open_graph_in_editor
22
22
  from flowfile_frame.selectors import (
23
23
  numeric, float_, integer, string, temporal,
24
24
  datetime, date, time, duration, boolean,
@@ -57,7 +57,7 @@ __all__ = [
57
57
  'by_dtype', 'contains', 'starts_with', 'ends_with', 'matches',
58
58
 
59
59
  # Utilities
60
- 'create_etl_graph', 'open_graph_in_editor',
60
+ 'create_flow_graph', 'open_graph_in_editor',
61
61
 
62
62
  # Data types from Polars
63
63
  'Int8', 'Int16', 'Int32', 'Int64', 'Int128',
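Reviewer note: the hunks above rename the public export `create_etl_graph` to `create_flow_graph`, and no alias for the old name is visible in this diff, so code that imports the old name would presumably fail on 0.3.0. Below is a minimal sketch of a version-tolerant lookup. The helper `resolve_renamed` and the `SimpleNamespace` stand-ins are hypothetical and exist only for illustration; in real code the `module` argument would be the imported `flowfile_frame.utils`.

```python
from types import SimpleNamespace


def resolve_renamed(module, new_name, old_name):
    """Return module.<new_name> if it exists, otherwise module.<old_name>."""
    if hasattr(module, new_name):
        return getattr(module, new_name)
    return getattr(module, old_name)


# Hypothetical stand-ins for flowfile_frame.utils as shipped in 0.2.2 and 0.3.0.
utils_022 = SimpleNamespace(create_etl_graph=lambda: "graph built by 0.2.2")
utils_030 = SimpleNamespace(create_flow_graph=lambda: "graph built by 0.3.0")

for utils in (utils_022, utils_030):
    # Prefer the new name, fall back to the old one.
    create_graph = resolve_renamed(utils, "create_flow_graph", "create_etl_graph")
    print(create_graph())
```

Pinning the dependency to a single version range is the simpler option when supporting both releases is not required.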
flowfile-0.3.0.dist-info/METADATA ADDED
@@ -0,0 +1,219 @@
1
+ Metadata-Version: 2.3
2
+ Name: Flowfile
3
+ Version: 0.3.0
4
+ Summary: Project combining flowfile core (backend) and flowfile_worker (compute offloader) and flowfile_frame (api)
5
+ Author: Edward van Eechoud
6
+ Author-email: evaneechoud@gmail.com
7
+ Requires-Python: >=3.10,<3.13
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: Programming Language :: Python :: 3.10
10
+ Classifier: Programming Language :: Python :: 3.11
11
+ Classifier: Programming Language :: Python :: 3.12
12
+ Requires-Dist: XlsxWriter (>=3.2.0,<3.3.0)
13
+ Requires-Dist: aiofiles (>=24.1.0,<25.0.0)
14
+ Requires-Dist: airbyte-cdk (==6.47.2)
15
+ Requires-Dist: bcrypt (>=4.3.0,<5.0.0)
16
+ Requires-Dist: connectorx (>=0.4.2,<0.5.0)
17
+ Requires-Dist: databases (>=0.9.0,<0.10.0)
18
+ Requires-Dist: faker (>=23.1.0,<23.2.0)
19
+ Requires-Dist: fastapi (>=0.115.2,<0.116.0)
20
+ Requires-Dist: fastexcel (>=0.12.0,<0.13.0)
21
+ Requires-Dist: google-api-python-client (>=2.149.0,<2.150.0)
22
+ Requires-Dist: gspread (>=6.1.3,<6.2.0)
23
+ Requires-Dist: loky (>=3.4.1,<3.5.0)
24
+ Requires-Dist: methodtools (>=0.4.7,<0.5.0)
25
+ Requires-Dist: openpyxl (>=3.1.2,<3.2.0)
26
+ Requires-Dist: passlib (>=1.7.4,<1.8.0)
27
+ Requires-Dist: pendulum (==2.1.2) ; python_version < "3.12"
28
+ Requires-Dist: polars (>1.8.2,<=1.25.2)
29
+ Requires-Dist: polars-distance (>=0.4.3,<0.5.0)
30
+ Requires-Dist: polars-ds (>=0.6.0)
31
+ Requires-Dist: polars-expr-transformer (>0.4.7.0)
32
+ Requires-Dist: polars-grouper (>=0.3.0,<0.4.0)
33
+ Requires-Dist: polars_simed (>=0.3.4,<0.4.0)
34
+ Requires-Dist: pyairbyte-flowfile (==0.20.2)
35
+ Requires-Dist: pyarrow (>=18.0.0,<19.0.0)
36
+ Requires-Dist: pydantic (>=2.9.2,<2.10.0)
37
+ Requires-Dist: pyinstaller (>=6.11.0,<7.0.0)
38
+ Requires-Dist: pytest (>=8.3.4,<9.0.0)
39
+ Requires-Dist: python-jose (>=3.4.0,<4.0.0)
40
+ Requires-Dist: python-multipart (>=0.0.12,<0.1.0)
41
+ Requires-Dist: uvicorn (>=0.32.0,<0.33.0)
42
+ Description-Content-Type: text/markdown
43
+
44
+ <h1 align="center">
45
+ <img src="https://raw.githubusercontent.com/Edwardvaneechoud/Flowfile/main/.github/images/logo.png" alt="Flowfile Logo" width="100">
46
+ <br>
47
+ Flowfile
48
+ </h1>
49
+
50
+ <p align="center">
51
+ <b>Main Repository</b>: <a href="https://github.com/Edwardvaneechoud/Flowfile">Edwardvaneechoud/Flowfile</a><br>
52
+ <b>Documentation</b>:
53
+ <a href="https://edwardvaneechoud.github.io/Flowfile/">Website</a> -
54
+ <a href="https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_core/README.md">Core</a> -
55
+ <a href="https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_worker/README.md">Worker</a> -
56
+ <a href="https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_frontend/README.md">Frontend</a> -
57
+ <a href="https://dev.to/edwardvaneechoud/building-flowfile-architecting-a-visual-etl-tool-with-polars-576c">Technical Architecture</a>
58
+ </p>
59
+
60
+ <p>
61
+ Flowfile is a visual ETL tool and Python library suite that combines drag-and-drop workflow building with the speed of Polars dataframes. Build data pipelines visually, transform data using powerful nodes, or define data flows programmatically with Python and analyze results - all with high-performance data processing.
62
+ </p>
63
+
64
+ <div align="center">
65
+ <img src="https://raw.githubusercontent.com/Edwardvaneechoud/Flowfile/main/.github/images/group_by_screenshot.png" alt="Flowfile Interface" width="800"/>
66
+ </div>
67
+
68
+ ## ⚡ Technical Design
69
+
70
+ The `Flowfile` PyPI package provides the backend services and the `flowfile_frame` Python library:
71
+
72
+ - **Core (`flowfile_core`)** (FastAPI): The main ETL engine using Polars for high-performance data transformations. Typically runs on port `:63578`.
73
+ - **Worker (`flowfile_worker`)** (FastAPI): Handles computation-intensive tasks and caching of data operations, supporting the Core service. Typically runs on port `:63579`.
74
+ - **FlowFrame API (`flowfile_frame`)**: A Python library with a Polars-like API for defining data manipulation pipelines programmatically, which also generates an underlying ETL graph compatible with the Flowfile ecosystem.
75
+
76
+ Each flow is represented as a directed acyclic graph (DAG), where nodes represent data operations and edges represent data flow between operations.
77
+
78
+ For a deeper dive into the technical architecture, check out [this article](https://dev.to/edwardvaneechoud/building-flowfile-architecting-a-visual-etl-tool-with-polars-576c) on how Flowfile leverages Polars for efficient data processing.
79
+
80
+ ## ✨ Introducing FlowFile Frame - A Polars-Like API for ETL
81
+
82
+ FlowFile Frame is a Python library that provides a familiar Polars-like API for data manipulation, while simultaneously building an ETL (Extract, Transform, Load) graph under the hood. This allows you to:
83
+
84
+ 1. Write data transformation code using a simple, Pandas/Polars-like API
85
+ 2. Automatically generate executable ETL workflows compatible with the Flowfile ecosystem
86
+ 3. Visualize, save, and share your data pipelines
87
+ 4. Get the performance benefits of Polars with the traceability of ETL graphs
88
+
89
+ ### FlowFrame Quick Start
90
+
91
+ ```python
92
+ import flowfile_frame as ff
93
+ from flowfile_frame.utils import open_graph_in_editor
94
+
95
+ # Create a complex data pipeline
96
+ df = ff.from_dict({
97
+ "id": [1, 2, 3, 4, 5],
98
+ "category": ["A", "B", "A", "C", "B"],
99
+ "value": [100, 200, 150, 300, 250]
100
+ })
101
+
102
+ open_graph_in_editor(df.flow_graph)
103
+
104
+ ```
105
+
106
+ ### Key FlowFrame Features
107
+
108
+ - **Familiar API**: Based on Polars, making it easy to learn if you know Pandas or Polars
109
+ - **ETL Graph Generation**: Automatically builds a directed acyclic graph of your data operations
110
+ - **Lazy Evaluation**: Operations are not executed until `collect()` or a write operation
111
+ - **Interoperability**: Saved `.flowfile` graphs can be opened in the visual Flowfile Designer
112
+ - **High Performance**: Leverages Polars for fast data processing
113
+ - **Reproducible**: Save and share your data transformation workflows
114
+
115
+ ### Common FlowFrame Operations
116
+
117
+ ```python
118
+ import flowfile_frame as ff
119
+ from flowfile_frame import col, when
120
+
121
+ # Create from dictionary
122
+ df = ff.from_dict({
123
+ "id": [1, 2, 3],
124
+ "name": ["Alice", "Bob", "Charlie"],
125
+ "age": [25, 35, 28]
126
+ })
127
+
128
+ flow_graph = df.flow_graph
129
+ # Reading data
130
+ # df_csv = ff.read_csv("data.csv")
131
+ # df_parquet = ff.read_parquet("data.parquet")
132
+
133
+ # Filtering
134
+ adults = df.filter(col("age") >= 30)
135
+
136
+ # Select and transform
137
+ result = df.select(
138
+ col("name"),
139
+ (col("age") * 2).alias("double_age")
140
+ )
141
+
142
+ # Add new columns
143
+ df_with_cols = df.with_columns([
144
+ (col("age") + 10).alias("future_age"),
145
+ when(col("age") >= 30).then(ff.lit("Senior")).otherwise(ff.lit("Junior")).alias("status")]
146
+ )
147
+
148
+ # Group by and aggregate
149
+ df_sales = ff.from_dict({
150
+ "region": ["North", "South", "North", "South"],
151
+ "sales": [100, 200, 150, 300]
152
+ })
153
+ sales_by_region = df_sales.group_by("region").agg([
154
+ col("sales").sum().alias("total_sales"),
155
+ col("sales").mean().alias("avg_sales")
156
+ ])
157
+
158
+ # Joins
159
+ customers = ff.from_dict({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]}, flow_graph=flow_graph)
160
+ orders = ff.from_dict({"id": [101, 102], "customer_id": [1, 2], "amount": [100, 200]}, flow_graph=flow_graph)
161
+ joined = customers.join(orders, left_on="id", right_on="customer_id")
162
+
163
+ # Save and visualize ETL graph
164
+
165
+ result.save_graph("my_pipeline.flowfile")
166
+ # open_graph_in_editor(result.flow_graph, "my_pipeline.flowfile") # Opens in Designer UI if installed
167
+ ```
168
+
169
+ For more detailed information on all available operations, including pivoting, window functions, complex workflows, and more, please refer to the [FlowFrame documentation](https://github.com/Edwardvaneechoud/Flowfile/blob/main/flowfile_frame/README.md).
170
+
171
+ ## 🔥 Example Use Cases
172
+
173
+ Flowfile is great for:
174
+
175
+ - **Data Cleaning & Transformation**
176
+ - Complex joins (fuzzy matching)
177
+ - Text-to-rows transformations
178
+ - Advanced filtering and grouping
179
+ - Custom formulas and expressions
180
+ - Filter data based on conditions
181
+
182
+ - **Performance**
183
+ - Built to scale out of core
184
+ - Using Polars for data processing
185
+
186
+ - **Data Integration**
187
+ - Standardize data formats
188
+ - Handle messy Excel files
189
+
190
+ - **ETL Operations**
191
+ - Data quality checks
192
+
193
+ (For more visual examples of these use cases, please see our [main GitHub repository](https://github.com/Edwardvaneechoud/Flowfile#-example-use-cases)).
194
+
195
+ ## 🚀 Getting Started
196
+
197
+ ### Installing the Flowfile Python Package
198
+
199
+ This package provides the `flowfile_core` and `flowfile_worker` backend services, and the `flowfile_frame` library.
200
+
201
+ ```bash
202
+ pip install Flowfile
203
+ ```
204
+
205
+ Once installed, you can use `flowfile_frame` as a library in your Python scripts (see Quick Start above).
206
+
207
+ ### Full Application with Visual Designer
208
+
209
+ For the complete visual ETL experience with the Designer UI, please see the [installation instructions in the main repository](https://github.com/Edwardvaneechoud/Flowfile#-getting-started).
210
+
211
+ Available options include:
212
+ - Desktop application (recommended for most users)
213
+ - Docker setup (backend services + web frontend)
214
+ - Manual setup for development
215
+
216
+ ## 📋 Development Roadmap
217
+
218
+ For the latest development roadmap and TODO list, please refer to the [main repository](https://github.com/Edwardvaneechoud/Flowfile#-todo).
219
+
flowfile-0.2.2.dist-info/RECORD → flowfile-0.3.0.dist-info/RECORD CHANGED
@@ -1,7 +1,7 @@
1
1
  build_backends/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
2
2
  build_backends/main.py,sha256=hLmfqTeHLSTiwwZ5mUuoLQgtO40Igvl1_4NbnvzWSgI,9912
3
3
  build_backends/main_prd.py,sha256=JR2tYCMWM5ThooQjv5pw6nwVKMQjgsiHgKMhYn9NXWI,6927
4
- flowfile/__init__.py,sha256=jERp50eC0SrT-lsuMJpHaFoN6NIEzWYkqLvZF0vZ6ls,2299
4
+ flowfile/__init__.py,sha256=ci66Dom0DjMuu_S1wsyy9StLkp95wp6KCqGhj7YVQ0E,2301
5
5
  flowfile/__main__.py,sha256=X8ItB1LEC1ZXw_tvegu7sagb2CwqUeWSwWybbO1HtUs,630
6
6
  flowfile_core/__init__.py,sha256=dGxpVE9ol33CMRPJSPWlL7AZqXowBmlCx8unxCVWJXQ,254
7
7
  flowfile_core/auth/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
@@ -119,15 +119,15 @@ flowfile_core/utils/excel_file_manager.py,sha256=EIad2LenHu-3Yw1FcLmE0KgmLflnvNK
119
119
  flowfile_core/utils/fileManager.py,sha256=LnJhK_pwjb9MIApG2e4Hp3L5Z7Wny8YYHaL9SkW8WlE,1371
120
120
  flowfile_core/utils/fl_executor.py,sha256=eNnNZHZ9451brzZD00_X8aoCHFl1hR1gVOIGxtE0Db4,1301
121
121
  flowfile_core/utils/utils.py,sha256=NkEu21OF1l5weu01g-dAVdJ6BRHdpT2jBrWBSi-wp3c,270
122
- flowfile_frame/__init__.py,sha256=5Jofwsln7SI6-NcwR3DvMzRhg0nN19z3Ro3ZDZ5fClw,1349
122
+ flowfile_frame/__init__.py,sha256=-fD0U8K8VAw9uLGm9FU7R-mCgLoVtvcNh1YJsfhOzeE,1350
123
123
  flowfile_frame/__main__.py,sha256=qN0H-bYXDLIDoJNAurXep27crvKNHiz90HOGWbvC0HM,413
124
124
  flowfile_frame/adapters.py,sha256=C6JZZKANoKbHHmwMaF9AqAcZvITvQeb2Dklcpg5JAdY,530
125
125
  flowfile_frame/expr.py,sha256=wzJLiHtAqBA4NUEyXW8UhNJJ2jiGDvWEX8RfWT5ICwQ,47370
126
- flowfile_frame/flow_frame.py,sha256=X4p5Wr2DNoAZhkhCveVPGtFF2LG9FgDZHerLDrVqJdY,77553
126
+ flowfile_frame/flow_frame.py,sha256=TP3u7Nyq0TuT3sba0NGLCgMPnug5MuFCKE7Djafud-U,77559
127
127
  flowfile_frame/group_frame.py,sha256=MMGE2_DC8n1J2UxBBm_TyacGiRT_1V2CcWUvsIyhbIQ,9115
128
128
  flowfile_frame/join.py,sha256=pezHyNQMcaSPK9vhjaCaelMkgKdrvOQxwiROsa0fmN8,2480
129
129
  flowfile_frame/selectors.py,sha256=Ny5IpDP481ClNr5gI7_SjXzeqF16LsPcVQxiyUf5tUw,9130
130
- flowfile_frame/utils.py,sha256=fTcHWDyN7LVxRtVBoLwlfdEanhGhFZXEtslQUK2LDpQ,5923
130
+ flowfile_frame/utils.py,sha256=WGCY-eDeDlVkIT_zF1Si_80557TcxS51FMSAIdG5iTw,5928
131
131
  flowfile_worker/__init__.py,sha256=ZDdn3JCP7LWTiTsmntVIVduB4p2bUkJcZUKVEj7V9TU,1375
132
132
  flowfile_worker/configs.py,sha256=7fYtlj06vxDrMiRuMbwvSDOD1JRVMZqnPbcQFuikCJM,2714
133
133
  flowfile_worker/create/__init__.py,sha256=vkWy5uODffivUdxt3nNVALj6xgQK3HPBetqR-QqZ-uo,1643
@@ -164,8 +164,8 @@ test_utils/__init__.py,sha256=8WwOgIuKw6YtOc1GWR1DqIhQ8BhlLWqsMyQJSpxnzKk,66
164
164
  test_utils/postgres/__init__.py,sha256=y3V_6a9N1Pvm5NIBaA8CFf3i4mvPVY-H1teHA-rg0VU,33
165
165
  test_utils/postgres/commands.py,sha256=4oA8EHW3EqwGkG02HSqEGbXEBGM01sUW5FsyHm86W4k,4347
166
166
  test_utils/postgres/fixtures.py,sha256=kR8UBjQr3pgbe-xM-V8x8VseTHCPv0EmDEzPHl5Qc8Y,13507
167
- flowfile-0.2.2.dist-info/LICENSE,sha256=pCfLAA27jMHReYk_wGiirZxWRRXz_Bm7PVInRCa9P5g,1075
168
- flowfile-0.2.2.dist-info/METADATA,sha256=4jZprT4VmdoP8ahxW6WAm0gpwN5ulfHXr21-I07Hzww,7409
169
- flowfile-0.2.2.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
170
- flowfile-0.2.2.dist-info/entry_points.txt,sha256=CiyNXUvc77hDbE9rDaAMQFdFCQs-XdBm5_o1WV9_gQA,335
171
- flowfile-0.2.2.dist-info/RECORD,,
167
+ flowfile-0.3.0.dist-info/LICENSE,sha256=pCfLAA27jMHReYk_wGiirZxWRRXz_Bm7PVInRCa9P5g,1075
168
+ flowfile-0.3.0.dist-info/METADATA,sha256=049rH0Qr90sG9YaKOxeF9MoEBXgsKg8FnCvd7ithtAw,8952
169
+ flowfile-0.3.0.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
170
+ flowfile-0.3.0.dist-info/entry_points.txt,sha256=CiyNXUvc77hDbE9rDaAMQFdFCQs-XdBm5_o1WV9_gQA,335
171
+ flowfile-0.3.0.dist-info/RECORD,,
flowfile_frame/__init__.py CHANGED
@@ -4,7 +4,7 @@
4
4
  # Core classes
5
5
  from flowfile_frame.flow_frame import FlowFrame # noqa: F401
6
6
 
7
- from flowfile_frame.utils import create_etl_graph # noqa: F401
7
+ from flowfile_frame.utils import create_flow_graph # noqa: F401
8
8
 
9
9
  # Commonly used functions
10
10
  from flowfile_frame.expr import ( # noqa: F401
flowfile_frame/flow_frame.py CHANGED
@@ -16,7 +16,7 @@ from flowfile_core.schemas import input_schema, transform_schema
16
16
  from flowfile_frame.expr import Expr, Column, lit, col
17
17
  from flowfile_frame.selectors import Selector
18
18
  from flowfile_frame.group_frame import GroupByFrame
19
- from flowfile_frame.utils import _parse_inputs_as_iterable, create_etl_graph
19
+ from flowfile_frame.utils import _parse_inputs_as_iterable, create_flow_graph
20
20
  from flowfile_frame.join import _normalize_columns_to_list, _create_join_mappings
21
21
 
22
22
  node_id_counter = 0
@@ -92,7 +92,7 @@ class FlowFrame:
92
92
 
93
93
  # Create a new flow graph if none is provided
94
94
  if flow_graph is None:
95
- flow_graph = create_etl_graph()
95
+ flow_graph = create_flow_graph()
96
96
 
97
97
  flow_id = flow_graph.flow_id
98
98
 
@@ -198,7 +198,7 @@ class FlowFrame:
198
198
 
199
199
  # Initialize graph
200
200
  if flow_graph is None:
201
- flow_graph = create_etl_graph()
201
+ flow_graph = create_flow_graph()
202
202
  self.flow_graph = flow_graph
203
203
  # Set up data
204
204
  if isinstance(data, FlowDataEngine):
@@ -1922,7 +1922,7 @@ def read_csv(file_path, *, flow_graph: FlowGraph = None, separator: str = ';',
1922
1922
  # Create new node ID
1923
1923
  node_id = generate_node_id()
1924
1924
  if flow_graph is None:
1925
- flow_graph = create_etl_graph()
1925
+ flow_graph = create_flow_graph()
1926
1926
 
1927
1927
  flow_id = flow_graph.flow_id
1928
1928
 
@@ -1982,7 +1982,7 @@ def read_parquet(file_path, *, flow_graph: FlowGraph = None, description: str =
1982
1982
  node_id = generate_node_id()
1983
1983
 
1984
1984
  if flow_graph is None:
1985
- flow_graph = create_etl_graph()
1985
+ flow_graph = create_flow_graph()
1986
1986
 
1987
1987
  flow_id = flow_graph.flow_id
1988
1988
 
@@ -2028,7 +2028,7 @@ def from_dict(data, *, flow_graph: FlowGraph = None, description: str = None) ->
2028
2028
  node_id = generate_node_id()
2029
2029
 
2030
2030
  if not flow_graph:
2031
- flow_graph = create_etl_graph()
2031
+ flow_graph = create_flow_graph()
2032
2032
  flow_id = flow_graph.flow_id
2033
2033
 
2034
2034
  input_node = input_schema.NodeManualInput(
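Reviewer note: the hunks above keep two different guards for the default graph. `FlowFrame`, `read_csv` and `read_parquet` test `flow_graph is None`, while `from_dict` tests `not flow_graph`. Whether `FlowGraph` defines `__len__` or `__bool__` is not visible in this diff; if it defines neither, the two checks behave identically. The sketch below uses a hypothetical `EmptyableGraph` class (not part of Flowfile) to show how the checks would diverge for an object that is falsy while empty.

```python
class EmptyableGraph:
    """Hypothetical container that is falsy while it holds no nodes."""

    def __init__(self):
        self.nodes = []

    def __len__(self):
        return len(self.nodes)


def default_with_is_none(graph=None):
    # Guard style of FlowFrame, read_csv and read_parquet: only None is replaced.
    if graph is None:
        graph = EmptyableGraph()
    return graph


def default_with_truthiness(graph=None):
    # Guard style of from_dict: any falsy argument is replaced.
    if not graph:
        graph = EmptyableGraph()
    return graph


shared = EmptyableGraph()  # empty, therefore falsy
kept = default_with_is_none(shared) is shared
replaced = default_with_truthiness(shared) is not shared
print(kept, replaced)
```

Under that assumption, an empty graph passed to the truthiness guard would be silently swapped for a new one, so the `is None` form is the safer of the two.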
flowfile_frame/utils.py CHANGED
@@ -33,7 +33,7 @@ def _generate_id() -> int:
33
33
  return int(uuid.uuid4().int % 100000)
34
34
 
35
35
 
36
- def create_etl_graph() -> FlowGraph:
36
+ def create_flow_graph() -> FlowGraph:
37
37
  flow_id = _generate_id()
38
38
  flow_settings = schemas.FlowSettings(
39
39
  flow_id=flow_id,
@@ -127,13 +127,13 @@ def import_flow_to_editor(flow_path: str, auth_token: str) -> Optional[int]:
127
127
  return None
128
128
 
129
129
 
130
- def open_graph_in_editor(etl_graph: FlowGraph, storage_location: str = None) -> bool:
130
+ def open_graph_in_editor(flow_graph: FlowGraph, storage_location: str = None) -> bool:
131
131
  """
132
132
  Save the ETL graph and open it in the Flowfile editor.
133
133
 
134
134
  Parameters:
135
135
  -----------
136
- etl_graph : FlowGraph
136
+ flow_graph : FlowGraph
137
137
  The graph to save and open
138
138
  storage_location : str, optional
139
139
  Where to save the flowfile. If None, a default name is used.
@@ -152,8 +152,8 @@ def open_graph_in_editor(etl_graph: FlowGraph, storage_location: str = None) ->
152
152
  # Ensure path is absolute
153
153
  storage_location = os.path.abspath(storage_location)
154
154
 
155
- etl_graph.apply_layout()
156
- etl_graph.save_flow(storage_location)
155
+ flow_graph.apply_layout()
156
+ flow_graph.save_flow(storage_location)
157
157
  print(f"Flow saved to: {storage_location}")
158
158
 
159
159
  # Check if Flowfile is running, and start it if not
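Reviewer note: the first parameter of `open_graph_in_editor` is renamed from `etl_graph` to `flow_graph`. Positional callers are unaffected, but a call such as `open_graph_in_editor(etl_graph=...)` raises `TypeError` against the 0.3.0 signature shown above. Below is a minimal sketch of one way a caller could bridge the rename. The decorator `renamed_kwarg` is hypothetical, and the decorated function is a stand-in that only echoes its arguments, not the real Flowfile function.

```python
import functools
import warnings


def renamed_kwarg(old, new):
    """Decorator factory: accept keyword `old` as a deprecated alias of `new`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass either {old!r} or {new!r}, not both")
                warnings.warn(f"{old!r} is deprecated; use {new!r}",
                              DeprecationWarning, stacklevel=2)
                # Forward the value under the new keyword name.
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Stand-in with the 0.3.0 parameter names (hypothetical body).
@renamed_kwarg("etl_graph", "flow_graph")
def open_graph_in_editor(flow_graph, storage_location=None):
    return flow_graph, storage_location


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = open_graph_in_editor(etl_graph="my-graph")
```

The warning gives callers a migration window; dropping the decorator later restores the plain 0.3.0 behaviour.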
flowfile-0.2.2.dist-info/METADATA REMOVED
@@ -1,225 +0,0 @@
1
- Metadata-Version: 2.3
2
- Name: Flowfile
3
- Version: 0.2.2
4
- Summary: Project combining flowfile core (backend) and flowfile_worker (compute offloader) and flowfile_frame (api)
5
- Author: Edward van Eechoud
6
- Author-email: evaneechoud@gmail.com
7
- Requires-Python: >=3.10,<3.13
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: Programming Language :: Python :: 3.10
10
- Classifier: Programming Language :: Python :: 3.11
11
- Classifier: Programming Language :: Python :: 3.12
12
- Requires-Dist: XlsxWriter (>=3.2.0,<3.3.0)
13
- Requires-Dist: aiofiles (>=24.1.0,<25.0.0)
14
- Requires-Dist: airbyte-cdk (==6.47.2)
15
- Requires-Dist: bcrypt (>=4.3.0,<5.0.0)
16
- Requires-Dist: connectorx (>=0.4.2,<0.5.0)
17
- Requires-Dist: databases (>=0.9.0,<0.10.0)
18
- Requires-Dist: faker (>=23.1.0,<23.2.0)
19
- Requires-Dist: fastapi (>=0.115.2,<0.116.0)
20
- Requires-Dist: fastexcel (>=0.12.0,<0.13.0)
21
- Requires-Dist: google-api-python-client (>=2.149.0,<2.150.0)
22
- Requires-Dist: gspread (>=6.1.3,<6.2.0)
23
- Requires-Dist: loky (>=3.4.1,<3.5.0)
24
- Requires-Dist: methodtools (>=0.4.7,<0.5.0)
25
- Requires-Dist: openpyxl (>=3.1.2,<3.2.0)
26
- Requires-Dist: passlib (>=1.7.4,<1.8.0)
27
- Requires-Dist: pendulum (==2.1.2) ; python_version < "3.12"
28
- Requires-Dist: polars (>1.8.2,<=1.25.2)
29
- Requires-Dist: polars-distance (>=0.4.3,<0.5.0)
30
- Requires-Dist: polars-ds (>=0.6.0)
31
- Requires-Dist: polars-expr-transformer (>0.4.7.0)
32
- Requires-Dist: polars-grouper (>=0.3.0,<0.4.0)
33
- Requires-Dist: polars_simed (>=0.3.4,<0.4.0)
34
- Requires-Dist: pyairbyte-flowfile (==0.20.2)
35
- Requires-Dist: pyarrow (>=18.0.0,<19.0.0)
36
- Requires-Dist: pydantic (>=2.9.2,<2.10.0)
37
- Requires-Dist: pyinstaller (>=6.11.0,<7.0.0)
38
- Requires-Dist: pytest (>=8.3.4,<9.0.0)
39
- Requires-Dist: python-jose (>=3.4.0,<4.0.0)
40
- Requires-Dist: python-multipart (>=0.0.12,<0.1.0)
41
- Requires-Dist: uvicorn (>=0.32.0,<0.33.0)
42
- Description-Content-Type: text/markdown
43
-
44
- <h1 align="center">
45
- <img src=".github/images/logo.png" alt="Flowfile Logo" width="100">
46
- <br>
47
- Flowfile
48
- </h1>
49
- <p align="center">
50
- <b>Documentation</b>:
51
- <a href="https://edwardvaneechoud.github.io/Flowfile/">Website</a>
52
- -
53
- <a href="flowfile_core/README.md">Core</a>
54
- -
55
- <a href="flowfile_worker/README.md">Worker</a>
56
- -
57
- <a href="flowfile_frontend/README.md">Frontend</a>
58
- -
59
- <a href="https://dev.to/edwardvaneechoud/building-flowfile-architecting-a-visual-etl-tool-with-polars-576c">Technical Architecture</a>
60
- </p>
61
- <p>
62
- Flowfile is a visual ETL tool that combines drag-and-drop workflow building with the speed of Polars dataframes. Build data pipelines visually, transform data using powerful nodes, and analyze results - all without writing code.
63
- </p>
64
-
65
- <div align="center">
66
- <img src=".github/images/group_by_screenshot.png" alt="Flowfile Interface" width="800"/>
67
- </div>
68
-
69
- ## ⚡ Technical Design
70
-
71
- Flowfile operates as three interconnected services:
72
-
73
- - **Designer** (Electron + Vue): Visual interface for building data flows
74
- - **Core** (FastAPI): ETL engine using Polars for data transformations (`:63578`)
75
- - **Worker** (FastAPI): Handles computation and caching of data operations (`:63579`)
76
-
77
- Each flow is represented as a directed acyclic graph (DAG), where nodes represent data operations and edges represent data flow between operations.
78
-
79
- For a deeper dive into the technical architecture, check out [this article](https://dev.to/edwardvaneechoud/building-flowfile-architecting-a-visual-etl-tool-with-polars-576c) on how Flowfile leverages Polars for efficient data processing.
80
-
81
- ## 🔥 Example Use Cases
82
-
83
- - **Data Cleaning & Transformation**
84
- - Complex joins (fuzzy matching)
85
- - Text to rows transformations
86
- - Advanced filtering and grouping
87
- - Custom formulas and expressions
88
- - Filter data based on conditions
89
-
90
- <div align="center">
91
- <img src=".github/images/flowfile_demo_1.gif" alt="Flowfile Layout" width="800"/>
92
- </div>
93
-
94
- ---
95
-
96
- - **Performance**
97
- - Build to scale out of core
98
- - Using polars for data processing
99
-
100
- <div align="center">
101
- <img src=".github/images/demo_flowfile_write.gif" alt="Flowfile Layout" width="800"/>
102
- </div>
103
-
104
- ---
105
-
106
- ### **Data Integration**
107
- - Standardize data formats
108
- - Handle messy Excel files
109
-
110
-
111
- <div align="center">
112
- <img src=".github/images/read_excel_flowfile.gif" alt="Flowfile Layout" width="800"/>
113
- </div>
114
-
115
-
116
- ---
117
-
118
- - **ETL Operations**
119
- - Data quality checks
120
-
121
-
122
- ## 🚀 Getting Started
123
-
124
- ### Prerequisites
125
- - Python 3.10+
126
- - Node.js 16+
127
- - Poetry (Python package manager)
128
- - Docker & Docker Compose (option, for Docker setup)
129
- - Make (optional, for build automation)
130
-
131
- ### Installation Options
132
-
133
- #### 1. Desktop Application
134
- The desktop version offers the best experience with a native interface and integrated services. You can either:
135
-
136
- **Option A: Download Pre-built Application**
137
- - Download the latest release from [GitHub Releases](https://github.com/Edwardvaneechoud/Flowfile/releases)
138
- - Run the installer for your platform (Windows, macOS, or Linux)
139
- - Note: You may see security warnings since the installer isn't signed. On Windows, click "More info" then "Run anyway". On macOS, right-click the app, select "Open", then confirm. These warnings appear because the app isn't signed with a developer certificate.
140
-
141
- **Option B: Build from Source:**
142
- ```bash
143
- git clone https://github.com/edwardvaneechoud/Flowfile.git
144
- cd Flowfile
145
-
146
- # Build packaged executable
147
- make # Creates platform-specific executable
148
-
149
- # Or manually:
150
- poetry install
151
- poetry run build_backends
152
- cd flowfile_frontend
153
- npm install
154
- npm run build # All platforms
155
- ```
156
-
157
- #### 2. Docker Setup
158
- Perfect for quick testing, development or deployment scenarios. Runs all services in containers with proper networking and volume management:
159
- ```bash
160
- # Clone and start all services
161
- git clone https://github.com/edwardvaneechoud/Flowfile.git
162
- cd Flowfile
163
- docker compose up -d
164
-
165
- # Access services:
166
- Frontend: http://localhost:8080 # main service
167
- Core API: http://localhost:63578/docs
168
- Worker API: http://localhost:63579/docs
169
- ```
170
- Just place your files that you want to transform in the directory in shared_data and you're all set!
171
-
172
- Docker Compose is also excellent for development, as it automatically sets up all required services and ensures proper communication between them. Code changes in the mounted volumes will be reflected in the running containers.
173
-
174
- #### 3. Manual Setup (Development)
175
- Ideal for development work when you need direct access to all services and hot-reloading:
176
-
177
- ```bash
178
- git clone https://github.com/edwardvaneechoud/Flowfile.git
179
- cd Flowfile
180
-
181
- # Install Python dependencies
182
- poetry install
183
-
184
- # Start backend services
185
- poetry run flowfile_worker # Starts worker on :63579
186
- poetry run flowfile_core # Starts core on :63578
187
-
188
- # Start web frontend
189
- cd flowfile_frontend
190
- npm install
191
- npm run dev:web # Starts web interface on :8080
192
- ```
193
-
194
- ## 📋 TODO
195
-
196
- ### Core Features
197
- - [ ] Add cloud storage support
198
- - S3 integration
199
- - Azure Data Lake Storage (ADLS)
200
- - [x] Multi-flow execution support
201
- - [ ] Polars code reverse engineering
202
- - Generate Polars code from visual flows
203
- - Import existing Polars scripts
204
-
205
- ### Documentation
206
- - [ ] Add comprehensive docstrings
207
- - [x] Create detailed node documentation
208
- - [x] Add architectural documentation
209
- - [ ] Improve inline code comments
210
- - [ ] Create user guides and tutorials
211
-
212
- ### Infrastructure
213
- - [ ] Implement proper testing
214
- - [x] Add CI/CD pipeline
215
- - [x] Improve error handling
216
- - [x] Add monitoring and logging
217
-
218
- ## 📝 License
219
-
220
- [MIT License](LICENSE)
221
-
222
- ## Acknowledgments
223
-
224
- Built with Polars, Vue.js, FastAPI, Vueflow and Electron.
225
-