lecrapaud 0.19.0__py3-none-any.whl → 0.22.6__py3-none-any.whl

Files changed (37)
  1. lecrapaud/__init__.py +22 -1
  2. lecrapaud/{api.py → base.py} +331 -241
  3. lecrapaud/config.py +15 -3
  4. lecrapaud/db/alembic/versions/2025_10_25_0635-07e303521594_add_unique_constraint_to_score.py +39 -0
  5. lecrapaud/db/alembic/versions/2025_10_26_1727-033e0f7eca4f_merge_score_and_model_trainings_into_.py +264 -0
  6. lecrapaud/db/alembic/versions/2025_10_28_2006-0a8fb7826e9b_add_number_of_targets_and_remove_other_.py +75 -0
  7. lecrapaud/db/models/__init__.py +2 -4
  8. lecrapaud/db/models/base.py +116 -65
  9. lecrapaud/db/models/experiment.py +195 -182
  10. lecrapaud/db/models/feature_selection.py +0 -3
  11. lecrapaud/db/models/feature_selection_rank.py +0 -18
  12. lecrapaud/db/models/model_selection.py +2 -2
  13. lecrapaud/db/models/{score.py → model_selection_score.py} +29 -12
  14. lecrapaud/db/session.py +4 -0
  15. lecrapaud/experiment.py +44 -17
  16. lecrapaud/feature_engineering.py +45 -674
  17. lecrapaud/feature_preprocessing.py +1202 -0
  18. lecrapaud/feature_selection.py +145 -332
  19. lecrapaud/integrations/sentry_integration.py +46 -0
  20. lecrapaud/misc/tabpfn_tests.ipynb +2 -2
  21. lecrapaud/mixins.py +247 -0
  22. lecrapaud/model_preprocessing.py +295 -0
  23. lecrapaud/model_selection.py +612 -242
  24. lecrapaud/pipeline.py +548 -0
  25. lecrapaud/search_space.py +2 -1
  26. lecrapaud/utils.py +36 -3
  27. lecrapaud-0.22.6.dist-info/METADATA +423 -0
  28. lecrapaud-0.22.6.dist-info/RECORD +51 -0
  29. {lecrapaud-0.19.0.dist-info → lecrapaud-0.22.6.dist-info}/WHEEL +1 -1
  30. {lecrapaud-0.19.0.dist-info → lecrapaud-0.22.6.dist-info/licenses}/LICENSE +1 -1
  31. lecrapaud/db/models/model_training.py +0 -64
  32. lecrapaud/jobs/__init__.py +0 -13
  33. lecrapaud/jobs/config.py +0 -17
  34. lecrapaud/jobs/scheduler.py +0 -30
  35. lecrapaud/jobs/tasks.py +0 -17
  36. lecrapaud-0.19.0.dist-info/METADATA +0 -249
  37. lecrapaud-0.19.0.dist-info/RECORD +0 -48
--- lecrapaud-0.19.0.dist-info/METADATA
+++ /dev/null
@@ -1,249 +0,0 @@
- Metadata-Version: 2.3
- Name: lecrapaud
- Version: 0.19.0
- Summary: Framework for machine and deep learning, with regression, classification and time series analysis
- License: Apache License
- Author: Pierre H. Gallet
- Requires-Python: ==3.12.*
- Classifier: License :: Other/Proprietary License
- Classifier: Programming Language :: Python :: 3
- Classifier: Programming Language :: Python :: 3.12
- Requires-Dist: catboost (>=1.2.8)
- Requires-Dist: category-encoders (>=2.8.1)
- Requires-Dist: celery (>=5.5.3)
- Requires-Dist: ftfy (>=6.3.1)
- Requires-Dist: joblib (>=1.5.1)
- Requires-Dist: keras (>=3.10.0)
- Requires-Dist: lightgbm (>=4.6.0)
- Requires-Dist: matplotlib (>=3.10.3)
- Requires-Dist: mlxtend (>=0.23.4)
- Requires-Dist: numpy (>=2.1.3)
- Requires-Dist: openai (>=1.88.0)
- Requires-Dist: pandas (>=2.3.0)
- Requires-Dist: pydantic (>=2.9.2)
- Requires-Dist: python-dotenv (>=1.1.0)
- Requires-Dist: scikit-learn (>=1.6.1)
- Requires-Dist: scipy (<1.14.0)
- Requires-Dist: seaborn (>=0.13.2)
- Requires-Dist: sqlalchemy (>=2.0.41)
- Requires-Dist: tensorboardx (>=2.6.4)
- Requires-Dist: tensorflow (>=2.19.0)
- Requires-Dist: tiktoken (>=0.9.0)
- Requires-Dist: tqdm (>=4.67.1)
- Requires-Dist: xgboost (>=3.0.2)
- Description-Content-Type: text/markdown
-
- <div align="center">
-
- <img src="https://s3.amazonaws.com/pix.iemoji.com/images/emoji/apple/ios-12/256/frog-face.png" width=120 alt="crapaud"/>
-
- ## Welcome to LeCrapaud
-
- **An all-in-one machine learning framework**
-
- [![GitHub stars](https://img.shields.io/github/stars/pierregallet/lecrapaud.svg?style=flat&logo=github&colorB=blue&label=stars)](https://github.com/pierregallet/lecrapaud/stargazers)
- [![PyPI version](https://badge.fury.io/py/lecrapaud.svg)](https://badge.fury.io/py/lecrapaud)
- [![Python versions](https://img.shields.io/pypi/pyversions/lecrapaud.svg)](https://pypi.org/project/lecrapaud)
- [![License](https://img.shields.io/github/license/pierregallet/lecrapaud.svg)](https://github.com/pierregallet/lecrapaud/blob/main/LICENSE)
- [![codecov](https://codecov.io/gh/pierregallet/lecrapaud/branch/main/graph/badge.svg)](https://codecov.io/gh/pierregallet/lecrapaud)
-
- </div>
-
- ## 🚀 Introduction
-
- LeCrapaud is a high-level Python library for end-to-end machine learning workflows on tabular data, with a focus on financial and stock datasets. It provides a simple API to handle feature engineering, model selection, training, and prediction, all in a reproducible and modular way.
-
- ## ✨ Key Features
-
- - 🧩 Modular pipeline: Feature engineering, preprocessing, selection, and modeling as independent steps
- - 🤖 Automated model selection and hyperparameter optimization
- - 📊 Easy integration with pandas DataFrames
- - 🔬 Supports both regression and classification tasks
- - 🛠️ Simple API for both full pipeline and step-by-step usage
- - 📦 Ready for production and research workflows
-
- ## ⚡ Quick Start
-
-
- ### Install the package
-
- ```sh
- pip install lecrapaud
- ```
-
- ### How it works
-
- This package provides a high-level API to manage experiments for feature engineering, model selection, and prediction on tabular data (e.g. stock data).
-
- ### Typical workflow
-
- ```python
- from lecrapaud import LeCrapaud
-
- # 1. Create the main app
- app = LeCrapaud(uri=uri)
-
- # 2. Define your experiment context (see your notebook or api.py for all options)
- context = {
-     "data": your_dataframe,
-     "columns_drop": [...],
-     "columns_date": [...],
-     # ... other config options
- }
-
- # 3. Create an experiment
- experiment = app.create_experiment(**context)
-
- # 4. Run the full training pipeline
- experiment.train(your_dataframe)
-
- # 5. Make predictions on new data
- predictions = experiment.predict(new_data)
- ```
-
- ### Database Configuration (Required)
-
- LeCrapaud requires access to a MySQL database to store experiments and results. You must either:
-
- - Pass a valid MySQL URI to the `LeCrapaud` constructor:
- ```python
- app = LeCrapaud(uri="mysql+pymysql://user:password@host:port/dbname")
- ```
- - **OR** set the following environment variables before using the package:
- - `DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`
- - Or set `DB_URI` directly with your full connection string.
-
- If neither is provided, database operations will not work.
-
- ### Using OpenAI Embeddings (Optional)
-
- If you want to use the `columns_pca` embedding feature (for advanced feature engineering), you must set the `OPENAI_API_KEY` environment variable with your OpenAI API key:
-
- ```sh
- export OPENAI_API_KEY=sk-...
- ```
-
- If this variable is not set, features relying on OpenAI embeddings will not be available.
-
- ### Experiment Context Arguments
-
- Below are the main arguments you can pass to `create_experiment` (or the `Experiment` class):
-
- | Argument | Type | Description | Example/Default |
- | -------------------- | --------- | ---------------------------------------------------------------------------------------- | ------------------ |
- | `columns_binary` | list | Columns to treat as binary | `['flag']` |
- | `columns_boolean` | list | Columns to treat as boolean | `['is_active']` |
- | `columns_date` | list | Columns to treat as dates | `['date']` |
- | `columns_drop` | list | Columns to drop during feature engineering | `['col1', 'col2']` |
- | `columns_frequency` | list | Columns to frequency encode | `['category']` |
- | `columns_onehot` | list | Columns to one-hot encode | `['sector']` |
- | `columns_ordinal` | list | Columns to ordinal encode | `['grade']` |
- | `columns_pca` | list | Columns to use for PCA/embeddings (requires `OPENAI_API_KEY` if using OpenAI embeddings) | `['text_col']` |
- | `columns_te_groupby` | list | Columns for target encoding groupby | `['sector']` |
- | `columns_te_target` | list | Columns for target encoding target | `['target']` |
- | `data` | DataFrame | Your main dataset (required for new experiment) | `your_dataframe` |
- | `date_column` | str | Name of the date column | `'date'` |
- | `experiment_name` | str | Name for the training session | `'my_session'` |
- | `group_column` | str | Name of the group column | `'stock_id'` |
- | `max_timesteps` | int | Max timesteps for time series models | `30` |
- | `models_idx` | list | Indices of models to use for model selection | `[0, 1, 2]` |
- | `number_of_trials` | int | Number of trials for hyperparameter optimization | `20` |
- | `perform_crossval` | bool | Whether to perform cross-validation | `True`/`False` |
- | `perform_hyperopt` | bool | Whether to perform hyperparameter optimization | `True`/`False` |
- | `plot` | bool | Whether to plot results | `True`/`False` |
- | `preserve_model` | bool | Whether to preserve the best model | `True`/`False` |
- | `target_clf` | list | List of classification target column indices/names | `[1, 2, 3]` |
- | `target_mclf` | list | Multi-class classification targets (not yet implemented) | `[11]` |
- | `target_numbers` | list | List of regression target column indices/names | `[1, 2, 3]` |
- | `test_size` | int/float | Test set size (count or fraction) | `0.2` |
- | `time_series` | bool | Whether the data is time series | `True`/`False` |
- | `val_size` | int/float | Validation set size (count or fraction) | `0.2` |
-
- **Note:**
- - Not all arguments are required; defaults may exist for some.
- - For `columns_pca` with OpenAI embeddings, you must set the `OPENAI_API_KEY` environment variable.
-
-
-
- ### Modular usage
-
- You can also use each step independently:
-
- ```python
- data_eng = experiment.feature_engineering(data)
- train, val, test = experiment.preprocess_feature(data_eng)
- features = experiment.feature_selection(train)
- std_data, reshaped_data = experiment.preprocess_model(train, val, test)
- experiment.model_selection(std_data, reshaped_data)
- ```
-
- ## ⚠️ Using Alembic in Your Project (Important for Integrators)
-
- If you use Alembic for migrations in your own project and you share the same database with LeCrapaud, you must ensure that Alembic does **not** attempt to drop or modify LeCrapaud tables (those prefixed with `{LECRAPAUD_TABLE_PREFIX}_`).
-
- By default, Alembic's autogenerate feature will propose to drop any table that exists in the database but is not present in your project's models. To prevent this, add the following filter to your `env.py`:
-
- ```python
- def include_object(object, name, type_, reflected, compare_to):
-     if type_ == "table" and name.startswith(f"{LECRAPAUD_TABLE_PREFIX}_"):
-         return False # Ignore LeCrapaud tables
-     return True
-
- context.configure(
-     # ... other options ...
-     include_object=include_object,
- )
- ```
-
- This will ensure that Alembic ignores all tables created by LeCrapaud when generating migrations for your own project.
-
- ---
-
- ## 🤝 Contributing
-
- ### Reminders for Github usage
-
- 1. Creating Github repository
-
- ```sh
- $ brew install gh
- $ gh auth login
- $ gh repo create
- ```
-
- 2. Initializing git and first commit to distant repository
-
- ```sh
- $ git init
- $ git add .
- $ git commit -m 'first commit'
- $ git remote add origin <YOUR_REPO_URL>
- $ git push -u origin master
- ```
-
- 3. Use conventional commits
- https://www.conventionalcommits.org/en/v1.0.0/#summary
-
- 4. Create environment
-
- ```sh
- $ pip install virtualenv
- $ python -m venv .venv
- $ source .venv/bin/activate
- ```
-
- 5. Install dependencies
-
- ```sh
- $ make install
- ```
-
- 6. Deactivate virtualenv (if needed)
-
- ```sh
- $ deactivate
- ```
-
- ---
-
- Pierre Gallet © 2025
--- lecrapaud-0.19.0.dist-info/RECORD
+++ /dev/null
@@ -1,48 +0,0 @@
- lecrapaud/__init__.py,sha256=oCxbtw_nk8rlOXbXbWo0RRMlsh6w-hTiZ6e5PRG_wp0,28
- lecrapaud/api.py,sha256=GsylHdScug-D8ePbPKo5r7Wa0myj9Ol0OqNwlNsbgs8,22518
- lecrapaud/config.py,sha256=itiqC31HB8i2Xo-kn2viCQrg_9tnA07-TJuZ-xdnx44,1126
- lecrapaud/db/__init__.py,sha256=82o9fMfaqKXPh2_rt44EzNRVZV1R4LScEnQYvj_TjK0,34
- lecrapaud/db/alembic/README,sha256=MVlc9TYmr57RbhXET6QxgyCcwWP7w-vLkEsirENqiIQ,38
- lecrapaud/db/alembic/env.py,sha256=RvTTBa3bDVBxmDtapAfzUoeWBgmVQU3s9U6HmQCAP84,2421
- lecrapaud/db/alembic/script.py.mako,sha256=MEqL-2qATlST9TAOeYgscMn1uy6HUS9NFvDgl93dMj8,635
- lecrapaud/db/alembic/versions/2025_06_23_1748-f089dfb7e3ba_.py,sha256=hyPW0Mt_B4ZAHnJYLREy7MAncNDLnEIyJQJW2pyz_LY,17228
- lecrapaud/db/alembic/versions/2025_06_24_1216-c62251b129ed_.py,sha256=6Pf36HAXEVrVlnrohAe2O7gVaXpDiv3LLIP_EEgTyA0,917
- lecrapaud/db/alembic/versions/2025_06_24_1711-86457e2f333f_.py,sha256=KjwjYvFaNqYmBLTYel8As37fyaBtNVWTqN_3M7y_2eI,1357
- lecrapaud/db/alembic/versions/2025_06_25_1759-72aa496ca65b_.py,sha256=MiqooJuZ1etExl2he3MniaEv8G0LrmqY-0m22m9xKmc,943
- lecrapaud/db/alembic/versions/2025_08_25_1434-7ed9963e732f_add_best_score_to_model_selection.py,sha256=gyQDFFHp1dlILuDtXSPdUU_MsLlX-UzTP-E96Aj_Hto,966
- lecrapaud/db/alembic/versions/2025_08_28_1516-c36e9fee22b9_add_avg_precision_to_score.py,sha256=Bpi1zegNGX1qU-8RVzRfwjyv2cVaQ5P9cpKQ1QDJgxs,945
- lecrapaud/db/alembic/versions/2025_08_28_1622-8b11c1ba982e_change_name_column.py,sha256=g6H2Z9MwB6UEiqdGlBoHBXpO9DTaWkwHt8FS6joVOm0,1191
- lecrapaud/db/alembic.ini,sha256=Zw2rdwsKV6c7J1SPtoFIPDX08_oTP3MuUKnNxBDiY8I,3796
- lecrapaud/db/models/__init__.py,sha256=Lhyw9fVLdom0Fc6yIP-ip8FjkU1EwVwjae5q2VM815Q,740
- lecrapaud/db/models/base.py,sha256=Sc6g38LsNsjn9-qpWOMSsZlbUER0Xr56-yLIJLpTMDU,7808
- lecrapaud/db/models/experiment.py,sha256=HlaHnAdjTRo9q87FUWq83YlKw5vB_o1sULxUQdmuCvo,14869
- lecrapaud/db/models/feature.py,sha256=5o77O2FyRObnLOCGNj8kaPSGM3pLv1Ov6mXXHYkmnYY,1136
- lecrapaud/db/models/feature_selection.py,sha256=mk42xuw1Sm_7Pznfg7TNc5_S4hscdw79QgIe3Bt9ZRI,3245
- lecrapaud/db/models/feature_selection_rank.py,sha256=Ydsb_rAT58FoSH13wkGjGPByzsjPx3DITXgJ2jgZmow,2198
- lecrapaud/db/models/model.py,sha256=F0hyMjd4FFHCv6_arIWBEmBCGOfG3b6_uzU8ExtFE90,952
- lecrapaud/db/models/model_selection.py,sha256=tJuICcporf3TxQHbJbHxnKgkaVc02z2kJJoCYS2nDcw,2001
- lecrapaud/db/models/model_training.py,sha256=jAIYPdwBln2jf593soLQ730uYrTfNK8zdG8TesOqmh0,1698
- lecrapaud/db/models/score.py,sha256=oo9-IAP8iRgrQYe39W_T1nW4zA_E3cLuTnFaDPAAK1A,1703
- lecrapaud/db/models/target.py,sha256=DKnfeaLU8eT8J_oh_vuFo5-o1CaoXR13xBbswme6Bgk,1649
- lecrapaud/db/models/utils.py,sha256=-a-nWWmpJ2XzidIxo2COVUTrGZIPYCfBzjhcszJj_bM,1109
- lecrapaud/db/session.py,sha256=87W5AkGRYu8b2rF5FNQ0MFC6wtGc-gGagNJQN_0vnDQ,3667
- lecrapaud/directories.py,sha256=0LrANuDgbuneSLker60c6q2hmGnQ3mKHIztTGzTx6Gw,826
- lecrapaud/experiment.py,sha256=1xLWjOrqAxJh9CdXOx9ppQuRFRRj0GH-xYZqg-ty9hI,2463
- lecrapaud/feature_engineering.py,sha256=ib1afBrwqePiXUaw0Cpe6hY3VNl5afg8YVntb88SCT4,39199
- lecrapaud/feature_selection.py,sha256=6ry-oVPQHbipm1XSE5YsH7AY0lQFt4CFbWiHiRs1nxg,43593
- lecrapaud/integrations/openai_integration.py,sha256=hHLF3fk5Bps8KNbNrEL3NUFa945jwClE6LrLpuMZOd4,7459
- lecrapaud/jobs/__init__.py,sha256=ZkrsyTOR21c_wN7RY8jPhm8jCrL1oCEtTsf3VFIlQiE,292
- lecrapaud/jobs/config.py,sha256=AmO0j3RFjx8H66dfKw_7vnshaOJb9Ox5BAZ9cwwLFMY,377
- lecrapaud/jobs/scheduler.py,sha256=OKXhb_gxE1-R7D1HyPns88iIS31Wd4gRqEzk4EqS0J4,774
- lecrapaud/jobs/tasks.py,sha256=sbD2_IT45DE4yQQbR6DVb9xv5x06rYDtUvSK8exYxes,332
- lecrapaud/misc/tabpfn_tests.ipynb,sha256=VkgsCUJ30d8jaL2VaWtQAgb8ngHPNtPgnXLs7QQTjqg,6676
- lecrapaud/misc/test-gpu-bilstm.ipynb,sha256=4nLuZRJVe2kn6kEmauhRiz5wkWT9AVrYhI9CEk_dYUY,9608
- lecrapaud/misc/test-gpu-resnet.ipynb,sha256=27Vu7nYwujYeh3fOxBNCnKJn3MXNPKZU-U8oDDUbymg,4944
- lecrapaud/misc/test-gpu-transformers.ipynb,sha256=k6MBSs_Um1h4PykvE-LTBcdpbWLbIFST_xl_AFW2jgI,8444
- lecrapaud/model_selection.py,sha256=z6sMU6ZGaymZOWdJehPw4yaWdzcYTABWweyH5LvCJwk,76980
- lecrapaud/search_space.py,sha256=FCIEHZBK1pUQ4CphJuxwXY2N_BdrCelRzHsCXnNLlVI,36334
- lecrapaud/utils.py,sha256=ATKu9pbXjYFRa2YzBYjqyLHJrzfnZ7SJrOD_qAnEBYE,8242
- lecrapaud-0.19.0.dist-info/LICENSE,sha256=MImCryu0AnqhJE_uAZD-PIDKXDKb8sT7v0i1NOYeHTM,11350
- lecrapaud-0.19.0.dist-info/METADATA,sha256=VdLOpM5P_sI6pe2UBIcBwcrA8870ZBWUUmjBbZINQbI,11115
- lecrapaud-0.19.0.dist-info/WHEEL,sha256=b4K_helf-jlQoXBBETfwnf4B04YC67LOev0jo4fX5m8,88
- lecrapaud-0.19.0.dist-info/RECORD,,
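
Each RECORD row removed above has the form `path,sha256=<digest>,<size>`, where the digest is the file's SHA-256 encoded as URL-safe base64 without trailing `=` padding, and the row for RECORD itself leaves both fields empty. The following is a minimal sketch of how such a row could be checked against a file's bytes. The helper names (`record_hash`, `parse_record_line`, `matches_record`) are illustrative and are not part of lecrapaud or of any packaging tool, and the parser assumes paths contain no quoted commas.

```python
import base64
import hashlib
from typing import Optional, Tuple


def record_hash(data: bytes) -> str:
    """Return the RECORD-style hash field for a file's raw bytes."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"sha256={encoded}"


def parse_record_line(line: str) -> Tuple[str, str, Optional[int]]:
    """Split one RECORD row into (path, hash_field, size).

    Simplification (assumption): the path holds no quoted commas, so
    splitting from the right is enough; a full parser would use csv.
    """
    path, hash_field, size = line.rsplit(",", 2)
    return path, hash_field, int(size) if size else None


def matches_record(line: str, data: bytes) -> bool:
    """Check that data has the hash and size recorded on the given row."""
    _, hash_field, size = parse_record_line(line)
    return hash_field == record_hash(data) and size == len(data)
```

Under these assumptions, `matches_record(row, open(path, "rb").read())` would confirm that an extracted file still corresponds to its row; the `RECORD,,` row carries no hash and therefore never matches.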