crewlyze 3.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.dockerignore +12 -0
- package/.gitattributes +2 -0
- package/CHANGELOG.md +86 -0
- package/Dockerfile +21 -0
- package/LICENSE +21 -0
- package/README.md +139 -0
- package/USAGE.md +106 -0
- package/agents/__init__.py +0 -0
- package/agents/cleaner.py +38 -0
- package/agents/insights.py +44 -0
- package/agents/relation.py +36 -0
- package/agents/visualizer.py +41 -0
- package/assets/badge_crewai.svg +4 -0
- package/assets/badge_matplotlib.svg +4 -0
- package/assets/badge_ollama.svg +4 -0
- package/assets/badge_pandas.svg +4 -0
- package/assets/badge_seaborn.svg +4 -0
- package/assets/branding_image.png +0 -0
- package/assets/complete_workflow.svg +216 -0
- package/assets/favicon.png +0 -0
- package/assets/logo.png +0 -0
- package/assets/stars.svg +12 -0
- package/bin/crewlyze.js +79 -0
- package/config/README.md +129 -0
- package/config/__init__.py +1 -0
- package/config/context.py +16 -0
- package/config/llm_config.py +300 -0
- package/config/metrics_tracker.py +70 -0
- package/crew.py +870 -0
- package/crewlyze-3.1.0.tgz +0 -0
- package/fix_syntax.py +54 -0
- package/main.py +1279 -0
- package/package.json +22 -0
- package/pyproject.toml +32 -0
- package/requirements.txt +33 -0
- package/tools/__init__.py +0 -0
- package/tools/dataset_tools.py +803 -0
- package/ui/__init__.py +3 -0
- package/ui/copilot.py +200 -0
- package/ui/export.py +800 -0
- package/update_appjs.py +54 -0
- package/update_llm.py +21 -0
- package/update_main.py +20 -0
- package/web/app.js +3142 -0
- package/web/index.html +1105 -0
- package/web/style.css +2561 -0
- package/workflows/__init__.py +0 -0
- package/workflows/pipeline.py +254 -0
package/.dockerignore
ADDED
package/.gitattributes
ADDED
package/CHANGELOG.md
ADDED
@@ -0,0 +1,86 @@
+# Changelog
+
+All notable changes to the Crewlyze project will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [3.0.0] - 2026-06-27
+
+### Architecture Refactor
+- **Modular UI Package**: Extracted all Streamlit UI logic from `app.py` into a dedicated `ui/` package:
+  - `ui/styles.py` — CSS injection (glassmorphism "Obsidian & Electric Violet" theme)
+  - `ui/components.py` — `display_text_as_bullets`, `display_relations`, `StreamlitLogger` (module-level, not inline)
+  - `ui/export.py` — ReportLab PDF builder, wrapped with `@st.cache_data`
+- **Security — Subprocess Sandboxing**: All LLM-generated Python code (cleaning and visualization) now runs in an isolated child process via `subprocess.run()`. `exec()` is never called in the parent process, eliminating RCE risk.
+- **Per-session File Isolation**: Each browser session gets its own `data/sessions/<id>/` and `outputs/<id>/` directories. No cross-session data leakage.
+- **Content-hashed Caching**: Analysis results and PDF exports are cached by MD5 of the uploaded file content, not the filename. Re-uploading the same file never triggers a redundant re-run.
+- **XSS-safe Output**: All LLM-generated text is `html.escape()`'d before injection into `unsafe_allow_html` markdown blocks.
+
+### Improvements
+- **Explicit Run Button**: Analysis no longer fires automatically on upload. Users configure the LLM provider in the sidebar and then click **▶️ Run Analysis**.
+- **Numbered List Regex**: Bullet stripping now uses `re.sub(r"^[\d]+\.\s+", "", line)` — handles all numbered items (N.), not just 1–3.
+- **LLM Config Isolation**: Provider/model/API key are stored in `st.session_state` and only written to `os.environ` immediately before `run_crew()` is invoked.
+- **Agent Factory Pattern**: All agent factories (`make_cleaner_agent`, etc.) are called fresh on every `run_crew()` invocation, picking up the latest sidebar config without requiring `importlib.reload()`.
+- **Session Cleanup**: `_cleanup_old_sessions()` automatically removes session directories older than 24 hours on every run.
+
+### Fixed
+- **Session Isolation Bug**: `execute_visualization_code` tool no longer creates a root-level `outputs/` directory, which previously bypassed per-session isolation.
+- **Stale Cached PDF**: PDF export is now `@st.cache_data` wrapped with a content-hash key — it is never rebuilt on every Streamlit rerender.
+- **Unicode Crash**: `StreamlitLogger.write()` re-encodes through the terminal's actual encoding with `errors='replace'`, preventing `UnicodeEncodeError` on Windows cp1252 terminals.
+
+### Removed
+- `validator.py` (merged into cleaner agent's responsibility)
+- `code_gen.py` (replaced by inline visualization task in `visualizer.py`)
+- `index.html` report output (replaced by the interactive Streamlit dashboard)
+- `outputs/op.py` collected code output (agent code is shown in "Visualization Architecture" section)
+
+## [2.1.0] - 2025-11-27
+
+### UI Overhaul
+- **Premium Design**: Introduced a new "Obsidian & Electric Violet" theme with glassmorphism effects.
+- **Single-Page Layout**: Removed sidebar navigation for a seamless, scrolling experience.
+- **Enhanced Components**:
+  - Redesigned "Column Relations" display with visual cards.
+  - Styled bullet points for cleaner readability.
+  - Modern typography using 'Outfit' and 'JetBrains Mono'.
+- **Interactive Sidebar**: Redesigned configuration panel and "About" section with GitHub integration.
+
+### Improvements
+- **Robustness**: Improved error handling for LLM API calls and visualization generation.
+- **Consistency**: Unified styling for both live analysis results and cached sessions.
+
+
+## [2.0.0] - 2025-11-26
+
+### Major Features
+- **Data Analysis as a Service**: Rebranded and restructured for premium service delivery.
+- **Enhanced Validator Agent**: Now acts as a "Data Quality Assurance Specialist" providing detailed quality scores (0-100), decision logic, and specific warnings.
+- **Business Intelligence Agent**: Upgraded Insights Agent to a "Business Intelligence Analyst" role, focusing on synthesizing findings from cleaning, validation, and relation tasks.
+- **Token Optimization**: Significantly reduced token usage by removing dynamic data context injection and optimizing agent prompts.
+- **Professional Reporting**: Updated `index.html` with a modern, dark-themed UI, visual scorecards for data quality, and structured insight presentation.
+
+### Changed
+- **Project Branding**: Renamed to "Crewlyze".
+- **Agent Roles**:
+  - Validator: Dataset Validator -> Data Quality Assurance Specialist
+  - Insights: Insights Agent -> Business Intelligence Analyst
+- **Workflow**: Streamlined pipeline to use static task definitions for better efficiency.
+- **Licensing**: Added MIT License and copyright headers to all source files.
+
+### Fixed
+- **Rate Limit Issues**: Optimized prompts and removed heavy context to prevent LLM rate limit errors.
+- **Task Conflicts**: Resolved overlapping task descriptions between Validator and Insights agents.
+
+## [1.0.0] - 2023-10-XX
+
+### Added
+- Initial release of CrewAI Data Analyst Agent
+- Modular agent system with cleaner, validator, relation, code_gen, and insights agents
+- Automated CSV processing and analysis pipeline
+- HTML report generation with interactive elements
+- LLM integration via Ollama backend
+
+---
+
+**Status**: ✅ Working | 🚀 Production Ready | 📊 Data Analysis as a Service
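Reviewer note on the 3.0.0 changelog entries: the sketch below illustrates four of the mechanisms the changelog describes (content-hashed cache keys, the numbered-list regex, HTML escaping of LLM text, and running generated code in a child process). It is not taken from the package. The function names `content_cache_key`, `strip_numbered_prefix`, `escape_llm_text`, and `run_generated_code` are hypothetical, and the 30-second timeout is an assumption. Only the regex is quoted from the changelog.

```python
import hashlib
import html
import re
import subprocess
import sys


def content_cache_key(file_bytes: bytes) -> str:
    """Hypothetical helper: key a cache entry by file content, not filename."""
    return hashlib.md5(file_bytes).hexdigest()


def strip_numbered_prefix(line: str) -> str:
    """Remove a leading 'N. ' marker using the pattern quoted in the changelog."""
    return re.sub(r"^[\d]+\.\s+", "", line)


def escape_llm_text(text: str) -> str:
    """Escape LLM output before it is embedded in an HTML fragment."""
    return html.escape(text)


def run_generated_code(code: str, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run code in a separate interpreter so exec() never runs in the parent.

    Assumption: a plain child process with a timeout. This keeps the parent's
    memory and globals out of reach, but it is not a full OS-level sandbox.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

Keying on the content hash means two uploads of the same bytes under different filenames share one cache entry, which matches the behaviour the changelog claims for re-uploads.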
package/Dockerfile
ADDED
@@ -0,0 +1,21 @@
+FROM python:3.10-slim
+
+WORKDIR /app
+
+# Install system dependencies if needed
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application source code
+COPY . .
+
+# Expose Hugging Face Spaces default port
+EXPOSE 7860
+
+# Run FastAPI using uvicorn on port 7860
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7860"]
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Sowmiyan S
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,139 @@
+---
+title: Crewlyze
+emoji: 📊
+colorFrom: indigo
+colorTo: purple
+sdk: docker
+app_port: 7860
+---
+# Crewlyze
+
+<p align="center">
+  <img src="assets/stars.svg" alt="5-star rating" height="28" />
+
+  <img src="assets/badge_crewai.svg" alt="crewai" height="28" />
+  <img src="assets/badge_pandas.svg" alt="pandas" height="28" />
+  <img src="assets/badge_matplotlib.svg" alt="matplotlib" height="28" />
+  <img src="assets/badge_seaborn.svg" alt="seaborn" height="28" />
+  <img src="assets/badge_ollama.svg" alt="ollama" height="28" />
+</p>
+
+## Branding
+
+<p align="center">
+  <img src="assets/branding_image.png" alt="Transform Raw Datasets Into Insights With Agentic AI Analysts" width="100%" />
+</p>
+
+## Overview
+
+> **Autonomous Data Intelligence as a Service** | A premium, modular data-analyst pipeline powered by LLM-driven agents. Upload a CSV to initialize a workspace, chat with your dataset in real-time, execute custom schema modifications via natural language, and run a complete multi-agent pipeline to generate structured audits, correlation maps, and executive business summaries.
+
+<p align="center">
+  <img src="assets/complete_workflow.svg" alt="Crewlyze Workflow" width="100%" />
+</p>
+
+---
+
+## 🚀 Key Features
+
+Once a project is initialized, the system branches into two distinct, high-impact paths:
+
+### Track A: AI Data Chat (Interactive Exploration)
+- **Natural Language Querying**: Query your dataset directly to get condition-based rows, statistics, or aggregations.
+- **On-the-Fly Data Prep**: Ask the copilot to perform edits in-place, such as `Rename column "Q3_Sales" to "Sales_Q3"` or `Delete column "Notes"`, and watch the live data preview table update dynamically.
+- **Instant Visualizations**: Command the chat bot to create custom charts (e.g. *"plot a neon-purple scatter chart of rating vs cost"*). It writes and runs the matplotlib code in a sandboxed subprocess to output charts inline.
+
+### Track B: Agentic Analysis (CrewAI Pipeline)
+Select and run specific automated tasks through the multi-agent pipeline:
+1. **Data Cleaner (🧹)**: Audits columns, formats values, drops redundant rows, and generates a structured cleaning audit trail.
+2. **Relationship Mapper (🔗)**: Maps numeric and categorical variables, rendering zoomable, interactive **Plotly** correlation charts.
+3. **Business Insights (💡)**: Analyzes statistical summaries and generates McKinsey/BCG consulting cards (Observation ➔ Implication ➔ Strategy) alongside critical risk alerts.
+4. **Visualizer Agent (📈)**: Automatically creates, styles, and saves formatted matplotlib PNG graphs.
+
+---
+
+## 🛠️ Installation & Setup
+
+1. **Clone & Navigate**:
+   ```bash
+   git clone https://github.com/your-username/Multi-Agent-Data-Analysis-System-with-CrewAI.git
+   cd Multi-Agent-Data-Analysis-System-with-CrewAI
+   ```
+
+2. **Initialize Environment**:
+   ```bash
+   python -m venv .venv
+   # Windows:
+   .\.venv\Scripts\Activate.ps1
+   # macOS/Linux:
+   source .venv/bin/activate
+   ```
+
+3. **Install Dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+4. **Launch Server**:
+   ```bash
+   # Start the FastAPI Web App
+   python -m uvicorn main:app --host 127.0.0.1 --port 8000
+   ```
+   *Alternatively, double-click `run_web.bat` (Windows) to boot the server automatically.*
+
+5. **Open Browser**:
+   Navigate to [http://127.0.0.1:8000](http://127.0.0.1:8000)
+
+---
+
+## 📂 Project Structure
+
+```
+.
+├── agents/  # CrewAI Agent factories
+│   ├── cleaner.py  # 🧹 Data Cleaner Agent
+│   ├── relation.py  # 🔗 Relationship Analyst Agent
+│   ├── insights.py  # 💡 BI McKinsey Insights Agent
+│   └── visualizer.py  # 📈 Matplotlib Visualizer Agent
+├── config/  # Platform configuration
+│   ├── llm_config.py  # Multi-Provider settings and model catalog
+│   └── __init__.py
+├── tools/  # Orchestration tools
+│   └── dataset_tools.py  # read_head, subprocess sandbox runner, plotly builder
+├── ui/  # Document export services
+│   └── export.py  # Formatted PDF Cover & Content builder
+├── workflows/  # Workflow pipelines
+│   └── pipeline.py  # Make pipeline orchestration (adaptive cooldown)
+├── web/  # Web Frontend Assets
+│   ├── index.html  # Glassmorphic Workspace structure
+│   ├── app.js  # Frontend core logic (SSE logs, Chat, API hooks)
+│   └── style.css  # Dark Electric-Violet Theme styles
+├── data/  # Dynamic project sessions
+│   └── sessions/  # Concurrency-isolated session directories
+│       └── <session_id>/
+│           ├── original_upload.csv
+│           ├── cleaned.csv
+│           └── metadata.json
+├── outputs/  # Sandbox generated PNG charts
+│   └── <session_id>/
+├── assets/  # Static icons and complete_workflow.svg
+├── requirements.txt  # Python package catalog
+├── main.py  # FastAPI backend routing endpoints
+├── README.md  # This file
+├── USAGE.md  # Detailed user guide
+├── CHANGELOG.md  # Version history
+└── LICENSE  # MIT License
+```
+
+---
+
+## ⚙️ Provider Gateway Support
+The system integrates a custom gateway supporting **13+ LLM providers** through local configuration or environment variables:
+- **Cloud Gateways**: OpenAI, Anthropic, Google Gemini, NVIDIA NIM, Groq, Mistral, TogetherAI, Cohere, OpenRouter, DeepSeek, Perplexity, HuggingFace.
+- **Local Sandbox**: Ollama (auto-detects local models via the Ollama catalog).
+
+---
+
+*Crewlyze*
+*Copyright (c) 2025 Sowmiyan S*
+*Licensed under the MIT License*
package/USAGE.md
ADDED
@@ -0,0 +1,106 @@
+# Usage Guide
+
+## Overview
+
+**Crewlyze** is a premium "Data Analysis as a Service" tool. It uses a swarm of specialized AI agents to clean, validate, analyze, and visualize your datasets automatically.
+
+## Prerequisites
+
+- **Python**: Version 3.10 or higher
+- **API Key**: A Groq, OpenAI, Anthropic, or Hugging Face API key (or a local Ollama setup).
+
+## Installation
+
+1. Clone the repository:
+   ```bash
+   git clone https://github.com/yourusername/Multi-Agent-Data-Analysis.git
+   cd Multi-Agent-Data-Analysis
+   ```
+
+2. Create a virtual environment:
+   ```bash
+   python -m venv .venv
+   .\.venv\Scripts\Activate.ps1 # Windows
+   # source .venv/bin/activate # Mac/Linux
+   ```
+
+3. Install dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+4. Configure your environment:
+   Create a `.env` file in the root directory:
+   ```env
+   # Example for Groq
+   LLM_PROVIDER=groq
+   GROQ_API_KEY=your_groq_api_key_here
+
+   # Example for Hugging Face
+   # LLM_PROVIDER=huggingface
+   # HUGGINGFACE_API_KEY=your_huggingface_api_key_here
+   ```
+
+## Quick Start
+
+1. **Prepare Data**: Ensure your CSV file is ready.
+
+2. **Run the System**:
+   ```bash
+   python crew.py
+   ```
+
+3. **Input Path**: When prompted, paste the full path to your CSV file (or press Enter to use the default `data/TB_Burden_Country.csv`).
+
+4. **View Results**:
+   - The system will automatically open `index.html` in your default browser.
+   - This report contains your Data Quality Score, Cleaning Logs, Visualizations, and Business Insights.
+
+## Detailed Features
+
+### 1. Data Quality Assurance
+The **Data Quality Assurance Specialist** scans your dataset for:
+- Missing values and anomalies
+- Sufficient volume for analysis
+- Data type consistency
+- **Output**: A 0-100 Quality Score and a GO/NO-GO decision.
+
+### 2. Automated Cleaning
+The **Data Cleaner** agent:
+- Removes duplicates
+- Fills missing values (Mean for numeric, Mode for categorical)
+- Standardizes formats
+
+### 3. Relationship Analysis
+The **Relationship Analyst**:
+- Identifies correlations between columns
+- Selects the best visualization type (Scatter, Bar, Line, Heatmap, etc.)
+
+### 4. Visualization Generation
+The **Code Generator**:
+- Writes bug-free Matplotlib/Seaborn code
+- Executes the code to generate charts embedded in the report
+
+### 5. Business Intelligence
+The **Business Intelligence Analyst**:
+- Synthesizes all findings into actionable strategic insights.
+
+## Troubleshooting
+
+### Rate Limit Errors
+If you see `RateLimitError`:
+- Switch to a smaller model in `config/llm_config.py` (e.g., `llama-3.1-8b-instant`).
+- The system is optimized to minimize token usage, but heavy usage may still hit free tier limits.
+
+### Browser Not Opening
+- Manually open `index.html` in your browser.
+
+## Support
+
+For issues, please open a ticket on our GitHub repository.
+
+---
+
+*Crewlyze*
+*Copyright (c) 2025 Sowmiyan S*
+*Licensed under the MIT License*
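Reviewer note on the `.env` step in USAGE.md: this section does not show `config/llm_config.py`, so the snippet below is only a minimal sketch of how the `LLM_PROVIDER` and `<PROVIDER>_API_KEY` variables might be read. `read_llm_settings` is a hypothetical name and is not the package's `get_llm_params()`. Two things are assumptions made for illustration: defaulting to a key-less `ollama` provider, and deriving the key name from the provider name.

```python
import os
from typing import Mapping, Optional


def read_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: resolve provider and API key from environment variables."""
    env = os.environ if env is None else env
    # Assumption: fall back to a local Ollama setup when no provider is set.
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    # Key naming follows the GROQ_API_KEY / HUGGINGFACE_API_KEY pattern in USAGE.md.
    key_name = f"{provider.upper()}_API_KEY"
    api_key = env.get(key_name)
    if provider != "ollama" and not api_key:
        raise ValueError(f"{key_name} is not set for provider '{provider}'")
    return {"provider": provider, "api_key": api_key}
```

Passing the mapping explicitly, rather than always reading `os.environ`, keeps the helper easy to test and avoids mutating process-wide state.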
package/agents/__init__.py
File without changes
package/agents/cleaner.py
ADDED
@@ -0,0 +1,38 @@
+# Crewlyze
+# Copyright (c) 2025 Sowmiyan S
+# Licensed under the MIT License
+
+from crewai import Agent, LLM
+from config.llm_config import get_llm_params
+from tools.dataset_tools import DatasetTools
+
+
+def make_cleaner_agent() -> Agent:
+    """Factory — creates a fresh Data Cleaner agent with the current LLM config.
+
+    max_iter=5: read profile (already in task desc) → write cleaning code →
+    call clean_dataset_with_python → verify → final answer. 5 steps is enough.
+    """
+    return Agent(
+        name="Data Cleaner",
+        role="Dataset cleaning expert",
+        backstory=(
+            "You are an expert data cleaning specialist. The task description already "
+            "contains a full dataset profile (shape, dtypes, missing %, sample rows). "
+            "Use that profile to immediately identify quality issues and write cleaning "
+            "code — DO NOT call read_dataset_head or get_dataset_info first."
+        ),
+        goal=(
+            "Clean the dataset at the given file path by executing a Python script using "
+            "'Clean Dataset with Python Code'. When done, return a concise plain-text "
+            "bulleted list of the cleaning actions you took."
+        ),
+        llm=LLM(**get_llm_params()),
+        tools=[
+            DatasetTools.read_dataset_head,  # fallback only
+            DatasetTools.get_dataset_info,  # fallback only
+            DatasetTools.clean_dataset_with_python,
+        ],
+        max_iter=5,
+        verbose=True,
+    )
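Reviewer note on the factory pattern used by `make_cleaner_agent()`: because `get_llm_params()` is called inside the factory, each call picks up whatever configuration is current at that moment. The sketch below shows the effect with stand-ins only. `FakeAgent`, `get_params`, `make_agent`, and `_current_config` are hypothetical names, so the example runs without `crewai` installed.

```python
from dataclasses import dataclass

# Stand-in for the live provider/model configuration (hypothetical).
_current_config = {"model": "model-a"}


@dataclass
class FakeAgent:
    """Stand-in for crewai.Agent so the sketch has no third-party imports."""
    name: str
    model: str


def get_params() -> dict:
    """Stand-in for get_llm_params(): returns a copy of the current config."""
    return dict(_current_config)


def make_agent() -> FakeAgent:
    """Factory: the config is read at call time, not at import time."""
    return FakeAgent(name="Data Cleaner", model=get_params()["model"])


first = make_agent()
_current_config["model"] = "model-b"  # configuration changes between runs
second = make_agent()
print(first.model, second.model)
```

A module-level `agent = Agent(...)` would instead freeze the configuration at import time, which is the situation the changelog says previously needed `importlib.reload()`.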
package/agents/insights.py
ADDED
@@ -0,0 +1,44 @@
+# Crewlyze
+# Copyright (c) 2025 Sowmiyan S
+# Licensed under the MIT License
+
+from crewai import Agent, LLM
+from config.llm_config import get_llm_params
+from tools.dataset_tools import DatasetTools
+
+
+def make_insights_agent() -> Agent:
+    """Factory — creates a fresh BI Insights agent with the current LLM config.
+
+    Enforces high-value management consulting output instead of dummy text.
+    """
+    return Agent(
+        name="Business Intelligence Analyst",
+        role="Derive strategic business insights and ROI-focused recommendations",
+        goal=(
+            "Generate 5 high-impact, context-specific business insights from the data profile "
+            "and column relationships. Format each insight as a numbered list item. "
+            "DO NOT write generic comments or dummy filler text. Each insight MUST include:\n"
+            "- **Observation**: The exact pattern, trend, or correlation shown in the columns.\n"
+            "- **Business Implication**: What this means for operational efficiency, revenue, customer satisfaction, or risk.\n"
+            "- **Actionable Strategy**: A concrete, practical recommendation the company can execute immediately to drive business value."
+        ),
+        backstory=(
+            "You are a Senior BI Director and Management Consultant (ex-McKinsey/BCG). You possess "
+            "a sharp ability to look at data profiles, column distributions, and correlations and immediately "
+            "translate them into strategic business realities. You write clearly, professionally, and persuasively. "
+            "You never use vague summaries or generic fillers — every point you make is tailored, analytical, "
+            "and directly useful to executive management.\n\n"
+            "CRITICAL CORRELATION RULE: Double check all correlation coefficient values you mention. Never state a "
+            "correlation is strong or moderate if the coefficient is 0 or -0. If the correlation coefficient is near 0, "
+            "there is no linear correlation. Quote the actual coefficients from the correlation matrix tool accurately."
+        ),
+        llm=LLM(**get_llm_params()),
+        tools=[
+            DatasetTools.read_dataset_head,
+            DatasetTools.get_dataset_info,
+            DatasetTools.get_correlation_matrix,
+        ],
+        max_iter=3,
+        verbose=True,
+    )
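Reviewer note on the "CRITICAL CORRELATION RULE" in the insights backstory: the rule is enforced only through the prompt. A deterministic check on the coefficient could back it up, and the sketch below shows one possible form. `describe_correlation` is a hypothetical helper. The cut-offs (0.1, 0.3, 0.7) are illustrative assumptions and are not values taken from the package.

```python
def describe_correlation(r: float) -> str:
    """Label a Pearson coefficient; thresholds are illustrative assumptions."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must lie in [-1, 1]")
    strength = abs(r)  # abs() also normalises -0.0 to 0.0
    if strength < 0.1:
        return "no linear correlation"
    if strength < 0.3:
        label = "weak"
    elif strength < 0.7:
        label = "moderate"
    else:
        label = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"
```

A caller could compare the label with the wording in the generated insight and flag any insight that calls a near-zero coefficient "strong" or "moderate".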
package/agents/relation.py
ADDED
@@ -0,0 +1,36 @@
+# Crewlyze
+# Copyright (c) 2025 Sowmiyan S
+# Licensed under the MIT License
+
+from crewai import Agent, LLM
+from config.llm_config import get_llm_params
+from tools.dataset_tools import DatasetTools
+
+
+def make_relation_agent() -> Agent:
+    """Factory — creates a fresh Relation Analyst agent with the current LLM config."""
+    return Agent(
+        name="Analyst",
+        role="Identify high-value business correlations and dataset relationships",
+        goal=(
+            "Identify 5 key column relationships with high business relevance (e.g. comparing "
+            "metrics like cost vs revenue, demographic factors vs outcome, or country vs rate, "
+            "rather than trivial ID columns or metadata). Output ONLY a list in this exact format:\n"
+            "- X: [Column1] | Y: [Column2] | Type: [ChartType]\n"
+            "DO NOT output any introductions, explanations, or other text."
+        ),
+        backstory=(
+            "You are a Senior Quantitative Analyst. You have a keen eye for finding statistical "
+            "relations that translate to real-world business dynamics. You strictly follow "
+            "formatting guidelines and never invent columns that don't exist in the provided profile.\n\n"
+            "CRITICAL CHART RULE: If either Column1 (X) or Column2 (Y) is categorical (e.g. contains discrete "
+            "values like categories, gender, status, chest pain type 'cp', or classes), do NOT recommend a "
+            "'Scatter Plot'. Instead, recommend a 'Bar Chart' or 'Box Plot' or 'Grouped Bar Chart'. Scatter Plots "
+            "must only be used for continuous numeric vs continuous numeric variables."
+        ),
+        allow_delegation=False,
+        llm=LLM(**get_llm_params()),
+        tools=[DatasetTools.read_dataset_head, DatasetTools.get_correlation_matrix],
+        max_iter=3,
+        verbose=True,
+    )
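Reviewer note on the relation agent's output format: the goal fixes each line as `- X: [Column1] | Y: [Column2] | Type: [ChartType]`, so downstream code can parse it with a regular expression and re-check the "CRITICAL CHART RULE" deterministically. The sketch below is one way to do that. `parse_relation_line` and `enforce_chart_rule` are hypothetical names, and substituting "Box Plot" for a disallowed scatter plot is an assumption (the prompt also allows bar charts).

```python
import re
from typing import Optional, Set

# Matches "- X: a | Y: b | Type: c"; square brackets around values are optional.
_LINE_RE = re.compile(
    r"^-\s*X:\s*(?P<x>.+?)\s*\|\s*Y:\s*(?P<y>.+?)\s*\|\s*Type:\s*(?P<type>.+?)\s*$"
)


def parse_relation_line(line: str) -> Optional[dict]:
    """Parse one relation line; return None when it does not follow the format."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        return None
    # Strip literal brackets in case the model echoes the placeholders' style.
    return {key: value.strip("[]") for key, value in match.groupdict().items()}


def enforce_chart_rule(relation: dict, categorical_columns: Set[str]) -> dict:
    """Replace a scatter plot when either axis is categorical (assumed fallback: Box Plot)."""
    is_scatter = relation["type"].lower().startswith("scatter")
    has_categorical = (
        relation["x"] in categorical_columns or relation["y"] in categorical_columns
    )
    if is_scatter and has_categorical:
        return {**relation, "type": "Box Plot"}
    return relation
```

For example, a line pairing the categorical column `cp` with a numeric column and a scatter type would come back from `enforce_chart_rule` with its type changed, while numeric-versus-numeric pairs pass through untouched.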
package/agents/visualizer.py
ADDED
@@ -0,0 +1,41 @@
+# Crewlyze
+# Copyright (c) 2025 Sowmiyan S
+# Licensed under the MIT License
+
+from crewai import Agent, LLM
+from config.llm_config import get_llm_params
+from tools.dataset_tools import DatasetTools
+
+
+def make_visualizer_agent() -> Agent:
+    """Factory — creates a fresh Visualizer agent with the current LLM config."""
+    return Agent(
+        name="Data Visualizer",
+        role="Premium Data Visualization & Plotting Expert",
+        backstory=(
+            "You are a master of data visualization design and analytics. You believe that charts must be "
+            "both statistically correct AND visually stunning. You use seaborn and matplotlib to design "
+            "corporate-grade, light-themed figures that executives love.\n\n"
+            "You have access to a sandbox execution tool 'Execute Visualization Code' where the pandas DataFrame "
+            "is already loaded as `df` and a helper function `save_chart(filename)` is pre-defined for you.\n\n"
+            "CRITICAL RULE: You will be given a 'RELATIONSHIPS TO VISUALIZE' section in your task. You MUST "
+            "generate charts for EXACTLY those specified column pairs (X and Y columns listed). Do NOT invent "
+            "different columns. Do NOT skip any pair. Use the chart Type hint given for each pair.\n\n"
+            "Apply a clean white theme: set figure facecolor to 'white', axes facecolor to '#f8fafc', "
+            "tick/label colors to '#334155'. Use high-contrast corporate colors like '#4f46e5', '#06b6d4', '#ec4899', '#10b981'."
+        ),
+        goal=(
+            "Generate premium seaborn/matplotlib charts for EACH relationship pair listed in the "
+            "'RELATIONSHIPS TO VISUALIZE' section. Execute Python code using 'Execute Visualization Code' "
+            "for every pair, saving each chart with save_chart(). Apply dark-themed professional styling. "
+            "If a pair fails, try an alternative chart type before giving up. Must generate at least 3 charts."
+        ),
+        llm=LLM(**get_llm_params()),
+        tools=[
+            DatasetTools.read_dataset_head,
+            DatasetTools.get_dataset_info,
+            DatasetTools.execute_visualization_code,
+        ],
+        max_iter=7,
+        verbose=True,
+    )
Binary file
|