StatsPAI 0.1.0__tar.gz

@@ -0,0 +1,36 @@
# Changelog

All notable changes to StatsPAI will be documented in this file.

## [0.1.0] - 2024-07-26

### Added
- **Core Regression Framework**
  - OLS (Ordinary Least Squares) regression with a formula interface
  - Heteroskedasticity-robust standard errors (HC0, HC1, HC2, HC3)
  - Clustered standard errors
  - Weighted Least Squares (WLS) support

- **Causal Inference Module**
  - Causal Forest implementation inspired by Wager & Athey (2018)
  - Honest estimation (splits and leaf effects are estimated on separate subsamples) to reduce bias in treatment effect estimates
  - Bootstrap confidence intervals for treatment effects
  - Formula interface: `"outcome ~ treatment | features | controls"`

- **Output Management (outreg2)**
  - Excel export similar to Stata's `outreg2`
  - Multiple regression models in a single output table
  - Customizable formatting options
  - Professional table layout

- **Unified API Design**
  - Consistent `reg()` function interface
  - Formula parsing with R/Stata-style syntax: `"y ~ x1 + x2"`
  - Type hints throughout the codebase
  - Comprehensive documentation

### Technical Details
- Python 3.8+ support
- Core dependencies include numpy, scipy, pandas, scikit-learn, and openpyxl
- MIT License
- Comprehensive test suite
@@ -0,0 +1,205 @@
# Contributing to StatsPAI

We welcome contributions to StatsPAI! This document provides guidelines for contributing to the project.

## 🤝 How to Contribute

### Types of Contributions

1. **Bug Reports**: Help us identify and fix issues
2. **Feature Requests**: Suggest new econometric methods or improvements
3. **Code Contributions**: Implement new features or fix bugs
4. **Documentation**: Improve docs, examples, or tutorials
5. **Testing**: Add test cases or improve test coverage

### Getting Started

1. **Fork and Clone the Repository**

   Fork the repository on GitHub, then clone your fork (replace `YOUR-USERNAME` with your GitHub username):
   ```bash
   git clone https://github.com/YOUR-USERNAME/pyEconometrics.git
   cd pyEconometrics
   ```

2. **Set Up Development Environment**
   ```bash
   # Create a virtual environment
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate

   # Install in development mode
   pip install -e ".[dev]"

   # pre-commit and isort are not part of the [dev] extra; install them separately
   pip install pre-commit isort

   # Install pre-commit hooks
   pre-commit install
   ```

3. **Create a Branch**
   ```bash
   git checkout -b feature/your-feature-name
   # or
   git checkout -b fix/issue-description
   ```

## 📝 Development Workflow

### Before Making Changes

1. **Check existing issues** to avoid duplicate work
2. **Create an issue** for major changes to discuss the approach
3. **Read the documentation** to understand the codebase structure

### Making Changes

1. **Write Tests First** (TDD approach recommended)
   ```bash
   # Create a test file
   touch tests/test_your_feature.py

   # Write tests that fail before your change, then run them
   pytest tests/test_your_feature.py
   ```

2. **Implement Your Changes**
   - Follow existing code style and patterns
   - Add type hints for all function signatures
   - Include docstrings for public functions
   - Add inline comments for complex logic

3. **Run Tests**
   ```bash
   # Run all tests
   pytest

   # Run with coverage
   pytest --cov=src/statspai

   # Run specific tests
   pytest tests/test_your_feature.py -v
   ```

4. **Check Code Quality**
   ```bash
   # Format code
   black src/ tests/
   isort src/ tests/

   # Check linting
   flake8 src/ tests/

   # Type checking (if mypy is configured)
   mypy src/
   ```

### Commit Guidelines

Use the Conventional Commits format:

```
type(scope): brief description

Detailed explanation if needed.

Fixes #issue_number
```

**Types:**
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, etc.)
- `refactor`: Code refactoring
- `test`: Adding or updating tests
- `chore`: Maintenance tasks

**Examples:**
```bash
git commit -m "feat(causal): add bootstrap confidence intervals to CausalForest"
git commit -m "fix(outreg2): handle empty model lists gracefully"
git commit -m "docs(readme): update installation instructions"
```
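
A short script can check a message against this format before you commit. This is a minimal sketch and not a tool shipped with StatsPAI. `check_commit_header` is a hypothetical helper, and it treats the `(scope)` part as optional, which is an assumption.

```python
import re

# Commit types allowed by this guide.
ALLOWED_TYPES = {"feat", "fix", "docs", "style", "refactor", "test", "chore"}

# type(scope): description, where "(scope)" is treated as optional (an assumption).
HEADER_PATTERN = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[a-z0-9_-]+)\))?: (?P<description>\S.*)$"
)


def check_commit_header(message: str) -> bool:
    """Return True if the first line of `message` is a valid commit header."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = HEADER_PATTERN.match(header)
    return match is not None and match.group("type") in ALLOWED_TYPES


if __name__ == "__main__":
    for msg in (
        "feat(causal): add bootstrap confidence intervals to CausalForest",
        "Added some stuff",
    ):
        print(check_commit_header(msg), msg)
```

Only the header line is validated. The body and the `Fixes #issue_number` footer are left free-form. Git passes a `commit-msg` hook the path of the file that holds the proposed message, so a hook could read that file and call this function.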

## 🏗 Code Structure

### Package Organization
```
src/statspai/
├── __init__.py          # Main API exports
├── core/                # Core regression functionality
│   ├── __init__.py
│   ├── base.py          # Base classes
│   └── regression.py    # Main regression implementation
├── causal/              # Causal inference methods
│   ├── __init__.py
│   └── causal_forest.py # Causal Forest implementation
└── output/              # Output and formatting
    ├── __init__.py
    └── outreg2.py       # Excel export functionality
```

### Code Style Guidelines

1. **Follow PEP 8** with a line length of 88 characters
2. **Use type hints** for all function parameters and return values
3. **Write docstrings** in Google format:
   ```python
   def function_name(param1: int, param2: str) -> bool:
       """Brief description of the function.

       Args:
           param1: Description of param1.
           param2: Description of param2.

       Returns:
           Description of return value.

       Raises:
           ValueError: When param1 is negative.
       """
   ```

4. **Use descriptive variable names**
5. **Add comments for complex algorithms**

### Testing Guidelines

1. **Test Coverage**: Aim for >90% test coverage
2. **Test Types**:
   - Unit tests for individual functions
   - Integration tests for workflows
   - Regression tests against known results
3. **Test Structure**:
   ```python
   def test_function_name_scenario():
       # Arrange (create_test_data, function_to_test, and expected_value are placeholders)
       data = create_test_data()

       # Act
       result = function_to_test(data)

       # Assert
       assert result.some_property == expected_value
   ```

4. **Use fixtures** for common test data:
   ```python
   import pandas as pd
   import pytest


   @pytest.fixture
   def sample_data():
       return pd.DataFrame({
           'y': [1, 2, 3, 4, 5],
           'x': [2, 4, 6, 8, 10]
       })
   ```
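
The sketch below is an example of a regression test against a known result. It checks a least-squares fit against coefficients you can verify by hand. It assumes NumPy and pandas are available. `ols_coefficients` is a hypothetical stand-in for the estimator under test, not StatsPAI's `reg()`, so swap in the real call when you write an actual test.

```python
import numpy as np
import pandas as pd


def ols_coefficients(data: pd.DataFrame, outcome: str, regressors: list) -> np.ndarray:
    """Return [intercept, slopes...] by least squares (hypothetical stand-in estimator)."""
    X = np.column_stack(
        [np.ones(len(data))] + [data[col].to_numpy(dtype=float) for col in regressors]
    )
    y = data[outcome].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def test_ols_matches_hand_computed_result():
    # Arrange: a textbook data set whose fit can be checked by hand:
    # slope = Sxy / Sxx = 6 / 10 = 0.6, intercept = mean(y) - 0.6 * mean(x) = 2.2
    data = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]})

    # Act
    beta = ols_coefficients(data, "y", ["x"])

    # Assert: compare floating-point estimates with a tolerance, not ==
    np.testing.assert_allclose(beta, [2.2, 0.6], rtol=0, atol=1e-10)
```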

## 📞 Getting Help

- **GitHub Issues**: For bugs and feature requests
- **Discussions**: For questions and general discussion
- **Email**: For security issues or private communication

## 📄 License

By contributing to StatsPAI, you agree that your contributions will be licensed under the MIT License.

---

Thank you for contributing to StatsPAI! 🎉
statspai-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Bryce Wang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
@@ -0,0 +1,44 @@
# Include essential files for PyPI distribution
include LICENSE
include README.md
include CHANGELOG.md
include CONTRIBUTING.md
include pyproject.toml

# Include package source code
recursive-include src *.py

# Include tests
recursive-include tests *.py

# Exclude internal documentation and development files
exclude PROJECT_SUMMARY.md
exclude FINAL_PROJECT_STATUS.md
exclude RELEASE_CHECKLIST.md
exclude build_and_release.sh
exclude Makefile
exclude *.sh

# Exclude development directories (not needed for users)
# ("exclude" matches files only; "prune" drops whole directory trees)
prune docs
prune examples
prune .github

# Exclude build and cache files
recursive-exclude * __pycache__
recursive-exclude * *.py[co]
exclude .git*
prune .pytest_cache
prune .mypy_cache
prune htmlcov
exclude .coverage
exclude .DS_Store
exclude *.egg-info
prune build
prune dist
prune .venv
prune venv
exclude .pre-commit-config.yaml
@@ -0,0 +1,252 @@
Metadata-Version: 2.4
Name: StatsPAI
Version: 0.1.0
Summary: The AI-powered Statistics & Econometrics Toolkit for Python
Author-email: Bryce Wang <bryce.wang@example.com>
Maintainer-email: Bryce Wang <bryce.wang@example.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/brycewang-stanford/pyEconometrics
Project-URL: Documentation, https://statspai.readthedocs.io/
Project-URL: Repository, https://github.com/brycewang-stanford/pyEconometrics
Project-URL: Bug Reports, https://github.com/brycewang-stanford/pyEconometrics/issues
Keywords: econometrics,statistics,regression,causal-inference,causal-forest,panel-data,instrumental-variables,stata,R,machine-learning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Office/Business :: Financial
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: statsmodels>=0.13.0
Requires-Dist: linearmodels>=4.25
Requires-Dist: numba>=0.56.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: patsy>=0.5.0
Requires-Dist: openpyxl>=3.0.0
Requires-Dist: xlsxwriter>=3.0.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: sphinx; extra == "dev"
Requires-Dist: sphinx-rtd-theme; extra == "dev"
Provides-Extra: performance
Requires-Dist: jax[cpu]>=0.4.0; extra == "performance"
Requires-Dist: jaxlib>=0.4.0; extra == "performance"
Provides-Extra: plotting
Requires-Dist: matplotlib>=3.5.0; extra == "plotting"
Requires-Dist: seaborn>=0.11.0; extra == "plotting"
Requires-Dist: plotly>=5.0.0; extra == "plotting"
Dynamic: license-file

# StatsPAI

[![PyPI version](https://badge.fury.io/py/StatsPAI.svg)](https://badge.fury.io/py/StatsPAI)
[![Python versions](https://img.shields.io/pypi/pyversions/StatsPAI.svg)](https://pypi.org/project/StatsPAI/)
[![License](https://img.shields.io/github/license/brycewang-stanford/pyEconometrics.svg)](https://github.com/brycewang-stanford/pyEconometrics/blob/main/LICENSE)
[![Build Status](https://github.com/brycewang-stanford/pyEconometrics/workflows/CI%2FCD%20Pipeline/badge.svg)](https://github.com/brycewang-stanford/pyEconometrics/actions)
[![codecov](https://codecov.io/gh/brycewang-stanford/pyEconometrics/branch/main/graph/badge.svg)](https://codecov.io/gh/brycewang-stanford/pyEconometrics)

**The AI-powered Statistics & Econometrics Toolkit for Python**

StatsPAI bridges the gap between user-friendly syntax and powerful econometric analysis, making advanced techniques accessible to researchers and practitioners.

## 🚀 Features

### Core Econometric Methods
- **Linear Regression**: OLS and WLS with robust standard errors
- **Instrumental Variables**: 2SLS estimation
- **Panel Data**: Fixed Effects and Random Effects models
- **Causal Inference**: Causal Forest implementation (inspired by EconML; based on Wager & Athey, 2018)

### User Experience
- **Formula Interface**: Intuitive R/Stata-style syntax `"y ~ x1 + x2"`
- **Excel Export**: Professional output tables via `outreg2` (Stata-inspired)
- **Flexible API**: Both formula and matrix interfaces supported
- **Rich Output**: Detailed summary statistics and diagnostic tests

### Technical Excellence
- **Robust Implementation**: Based on proven econometric theory
- **Performance Optimized**: Efficient algorithms for large datasets
- **Well Tested**: Comprehensive test suite ensuring reliability
- **Type Hints**: Full type annotation for better development experience

## 📦 Installation

```bash
# Latest stable version
pip install StatsPAI

# Development version
pip install git+https://github.com/brycewang-stanford/pyEconometrics.git
```

### Requirements
- Python 3.8+
- NumPy, SciPy, Pandas
- scikit-learn (for Causal Forest)
- openpyxl (for Excel export)
- statsmodels, linearmodels, patsy, numba, and xlsxwriter (all installed automatically by `pip`)

## 🏁 Quick Start

### Basic Regression Analysis
```python
import pandas as pd
from statspai import reg, outreg2

# Load your data
df = pd.read_csv('data.csv')

# Run OLS regression
result1 = reg('wage ~ education + experience', data=df)
print(result1.summary())

# Add control variables
result2 = reg('wage ~ education + experience + age + gender', data=df)

# Export results to Excel
outreg2([result1, result2], 'regression_results.xlsx',
        title='Wage Regression Analysis')
```
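
A formula string such as `"wage ~ education + experience"` puts the outcome to the left of `~` and the regressors, joined by `+`, to the right. The IV and Causal Forest examples below add `|`-separated sections. The sketch shows one way to split such a string. `parse_formula` is hypothetical and not StatsPAI's parser. It handles only plain variable names, with no interactions, transformations, or intercept handling.

```python
from typing import List, Tuple


def parse_formula(formula: str) -> Tuple[str, List[List[str]]]:
    """Split 'y ~ a + b | c + d' into ('y', [['a', 'b'], ['c', 'd']]).

    Hypothetical helper: plain variable names joined by '+', with optional
    '|'-separated sections; no interactions or transformations.
    """
    if formula.count("~") != 1:
        raise ValueError("formula must contain exactly one '~'")
    lhs, rhs = (side.strip() for side in formula.split("~"))
    if not lhs:
        raise ValueError("formula needs an outcome to the left of '~'")

    sections = []
    for section in rhs.split("|"):
        # Each '|' starts a new group of terms (e.g. instruments or features)
        terms = [term.strip() for term in section.split("+") if term.strip()]
        if not terms:
            raise ValueError(f"empty section in formula: {formula!r}")
        sections.append(terms)
    return lhs, sections


print(parse_formula("wage ~ education + experience"))
# ('wage', [['education', 'experience']])
```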

### Instrumental Variables
```python
# 2SLS estimation
iv_result = reg('wage ~ education | mother_education + father_education',
                data=df, method='2sls')
print(iv_result.summary())
```
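
Conceptually, 2SLS first regresses each regressor on the instruments and then regresses the outcome on the fitted values. The NumPy sketch below computes the point estimates that way on simulated data in which unobserved ability confounds education and wages. It illustrates the technique under stated assumptions and is not StatsPAI's implementation. `two_stage_least_squares` is a hypothetical helper, and it returns coefficients only, because naive second-stage standard errors are not valid for 2SLS.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """2SLS point estimates.

    y: (n,) outcome.
    X: (n, k) regressors, including the intercept and the endogenous variable(s).
    Z: (n, m) instruments plus all exogenous regressors (incl. intercept), m >= k.
    """
    # First stage: regress every column of X on Z and keep the fitted values
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Second stage: regress y on the first-stage fitted values
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


# Simulated data: ability raises both education and wages,
# while the two instruments shift education only
rng = np.random.default_rng(0)
n = 5_000
instruments = rng.normal(size=(n, 2))  # stand-ins for mother's and father's education
ability = rng.normal(size=n)
education = 1.0 + instruments @ np.array([0.8, 0.5]) + ability + rng.normal(size=n)
wage = 2.0 + 0.5 * education + 2.0 * ability + rng.normal(size=n)

X = np.column_stack([np.ones(n), education])
Z = np.column_stack([np.ones(n), instruments])
beta_ols, *_ = np.linalg.lstsq(X, wage, rcond=None)
beta_iv = two_stage_least_squares(wage, X, Z)
print(f"true effect 0.5 | OLS {beta_ols[1]:.3f} | 2SLS {beta_iv[1]:.3f}")
```

OLS absorbs the ability bias and overstates the return to education. 2SLS uses only the variation in education that the instruments induce.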

### Panel Data Analysis
```python
# Fixed effects model
fe_result = reg('y ~ x1 + x2', data=df,
                entity_col='firm_id', time_col='year',
                method='fixed_effects')
```
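
You can compute an entity fixed effects regression with the within transformation. Subtract each entity's mean from the outcome and the regressors, then run OLS on the demeaned data. The sketch below covers one-way entity (firm) effects only and does not absorb time effects. `within_estimator` is a hypothetical helper rather than StatsPAI code, and it omits standard errors.

```python
import numpy as np
import pandas as pd


def within_estimator(df, outcome, regressors, entity_col):
    """One-way (entity) fixed-effects slopes via the within transformation."""
    cols = [outcome] + list(regressors)
    # Subtract each entity's mean; this removes everything constant within an
    # entity, including the fixed effect itself
    demeaned = df[cols] - df.groupby(entity_col)[cols].transform("mean")
    X = demeaned[list(regressors)].to_numpy(dtype=float)
    y = demeaned[outcome].to_numpy(dtype=float)
    # No intercept: demeaning has already removed it
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(beta, index=list(regressors))


# Simulated firm-year panel in which x1 is correlated with the firm effect
rng = np.random.default_rng(1)
n_firms, n_years = 200, 10
firm_id = np.repeat(np.arange(n_firms), n_years)
firm_effect = rng.normal(size=n_firms)[firm_id]
x1 = firm_effect + rng.normal(size=firm_id.size)
x2 = rng.normal(size=firm_id.size)
y = 1.5 * x1 - 0.7 * x2 + 3.0 * firm_effect + rng.normal(size=firm_id.size)
df = pd.DataFrame({
    "firm_id": firm_id,
    "year": np.tile(np.arange(2000, 2000 + n_years), n_firms),
    "y": y, "x1": x1, "x2": x2,
})

print(within_estimator(df, "y", ["x1", "x2"], entity_col="firm_id"))
```

Pooled OLS on this panel would overstate the `x1` coefficient because `x1` is correlated with the omitted firm effect. Demeaning removes that effect before estimation.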

### Causal Forest for Heterogeneous Treatment Effects
```python
from statspai import CausalForest, outreg2

# Initialize Causal Forest
cf = CausalForest(n_estimators=100, random_state=42)

# Fit model: outcome ~ treatment | features | controls
cf.fit('income ~ job_training | age + education + experience | region + year',
       data=df)

# Estimate individual treatment effects
individual_effects = cf.effect(df)

# Get confidence intervals
effects_ci = cf.effect_interval(df, alpha=0.05)

# Export results
cf_summary = cf.summary()
outreg2([cf_summary], 'causal_forest_results.xlsx')
```
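
Causal forests in the style of Wager & Athey (2018) rely on honest estimation. One subsample chooses where a tree splits, and a separate subsample estimates the treatment effect in each leaf. This prevents the same noise from both selecting a split and inflating its estimate. The toy below applies that idea to a single split on one feature and assumes a randomized binary treatment. It is not StatsPAI's `CausalForest`, since a real causal forest averages many deeper honest trees grown on random subsamples. `honest_stump` and `diff_in_means` are hypothetical helpers.

```python
import numpy as np


def diff_in_means(y, w):
    """Treated-minus-control mean outcome; unbiased under random assignment."""
    return y[w == 1].mean() - y[w == 0].mean()


def honest_stump(x, w, y, rng, min_per_arm=50):
    """Toy honest causal 'tree' with one split on feature x.

    One half of the sample chooses the threshold; the other half, which played
    no part in that choice, estimates the effect on each side of it.
    """
    order = rng.permutation(len(y))
    split_idx, est_idx = order[: len(y) // 2], order[len(y) // 2:]

    # 1) Splitting half: pick the threshold with the largest gap in effects
    xs, ws, ys = x[split_idx], w[split_idx], y[split_idx]
    best_gap, best_t = -np.inf, None
    for t in np.quantile(xs, np.linspace(0.1, 0.9, 17)):
        left = xs <= t
        # Require enough treated and control units on both sides of the split
        counts = [np.sum(side & (ws == arm)) for side in (left, ~left) for arm in (0, 1)]
        if min(counts) < min_per_arm:
            continue
        gap = abs(diff_in_means(ys[left], ws[left]) - diff_in_means(ys[~left], ws[~left]))
        if gap > best_gap:
            best_gap, best_t = gap, t
    if best_t is None:
        raise ValueError("no admissible split; lower min_per_arm or add data")

    # 2) Estimation half: re-estimate each leaf's effect on fresh observations
    xe, we, ye = x[est_idx], w[est_idx], y[est_idx]
    left = xe <= best_t
    return best_t, diff_in_means(ye[left], we[left]), diff_in_means(ye[~left], we[~left])


# Simulated experiment: the treatment effect is 2 when x > 0 and 0 otherwise
rng = np.random.default_rng(42)
n = 4_000
x = rng.uniform(-1, 1, size=n)
w = rng.integers(0, 2, size=n)  # randomized binary treatment
tau = np.where(x > 0, 2.0, 0.0)
y = x + tau * w + rng.normal(size=n)

threshold, effect_left, effect_right = honest_stump(x, w, y, rng)
print(f"split at x <= {threshold:.2f}: effects {effect_left:.2f} (left), {effect_right:.2f} (right)")
```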

## 📊 Advanced Usage

### Robust Standard Errors
```python
# Heteroskedasticity-robust standard errors
result = reg('y ~ x1 + x2', data=df, robust=True)

# Clustered standard errors
result = reg('y ~ x1 + x2', data=df, cluster='firm_id')
```
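
Both options replace the classical OLS covariance with a sandwich estimator. The NumPy sketch below shows two versions. The first is HC1, which is White's estimator scaled by n/(n - k) and is the variant Stata's `robust` option reports for `regress`. The second is a cluster-robust version that sums the scores within each cluster before taking the outer product. Which HC variant `robust=True` selects in StatsPAI is an assumption here, and `ols_with_robust_se` is a hypothetical helper.

```python
import numpy as np


def ols_with_robust_se(X, y, clusters=None):
    """OLS coefficients with HC1 or cluster-robust (CR1) standard errors.

    X: (n, k) design matrix including the intercept column; y: (n,) outcome;
    clusters: optional (n,) array of cluster labels.
    """
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    if clusters is None:
        # HC1: sum of e_i^2 x_i x_i', scaled by n / (n - k)
        meat = (X * resid[:, None] ** 2).T @ X
        scale = n / (n - k)
    else:
        # Cluster-robust: add up the scores X_g' e_g within each cluster g first
        labels = np.unique(clusters)
        meat = np.zeros((k, k))
        for g in labels:
            in_g = clusters == g
            score = X[in_g].T @ resid[in_g]
            meat += np.outer(score, score)
        n_clusters = len(labels)
        # Stata-style finite-sample correction
        scale = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)

    cov = scale * bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Simulated firm data with errors that are correlated within each firm
rng = np.random.default_rng(0)
n_firms, per_firm = 50, 20
firm_id = np.repeat(np.arange(n_firms), per_firm)
firm_shock = rng.normal(size=n_firms)[firm_id]
x1 = rng.normal(size=firm_id.size) + rng.normal(size=n_firms)[firm_id]
y = 1.0 + 0.5 * x1 + firm_shock + rng.normal(size=firm_id.size)
X = np.column_stack([np.ones_like(x1), x1])

_, se_hc1 = ols_with_robust_se(X, y)
_, se_cluster = ols_with_robust_se(X, y, clusters=firm_id)
print("HC1:", se_hc1, "clustered by firm:", se_cluster)
```

If every observation forms its own cluster, the cluster-robust formula reduces exactly to HC1, which makes a useful sanity check.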

### Model Comparison
```python
from statspai import compare_models

models = [
    reg('y ~ x1', data=df),
    reg('y ~ x1 + x2', data=df),
    reg('y ~ x1 + x2 + x3', data=df)
]

comparison = compare_models(models)
print(comparison.summary())
```

### Custom Output Formatting
```python
# `results` is a list of fitted models, e.g. [result1, result2] from the Quick Start
outreg2(results, 'output.xlsx',
        title='Regression Results',
        add_stats={'Observations': lambda r: r.nobs,
                   'R-squared': lambda r: r.rsquared},
        decimal_places=4,
        star_levels=[0.01, 0.05, 0.1])
```
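
The `star_levels` list maps p-values to significance stars. The sketch below shows that mapping. It assumes the common convention that the smallest level earns the most stars and that a p-value must fall strictly below a level to qualify. `significance_stars` and `format_coefficient` are hypothetical helpers, not part of `outreg2`.

```python
from typing import Sequence


def significance_stars(p_value: float,
                       star_levels: Sequence[float] = (0.01, 0.05, 0.1)) -> str:
    """Return '***', '**', '*', or '' for a p-value.

    The smallest level earns the most stars; p must be strictly below a level.
    """
    levels = sorted(star_levels)
    for n_stars, level in zip(range(len(levels), 0, -1), levels):
        if p_value < level:
            return "*" * n_stars
    return ""


def format_coefficient(coef: float, p_value: float, decimal_places: int = 4,
                       star_levels: Sequence[float] = (0.01, 0.05, 0.1)) -> str:
    """Round a coefficient and append its significance stars."""
    return f"{coef:.{decimal_places}f}{significance_stars(p_value, star_levels)}"


print(format_coefficient(0.123456, 0.003))  # 0.1235***
```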

## 📚 Documentation

- **[User Guide](docs/user_guide.md)**: Comprehensive tutorials and examples
- **[API Reference](docs/api_reference.md)**: Detailed function documentation
- **[Theory Guide](docs/theory_guide.md)**: Mathematical foundations
- **[Examples](examples/)**: Jupyter notebooks with real-world applications

## 🤝 Contributing

We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md) for details.

### Development Setup
```bash
# Clone repository
git clone https://github.com/brycewang-stanford/pyEconometrics.git
cd pyEconometrics

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks (pre-commit is not part of the [dev] extra)
pip install pre-commit
pre-commit install

# Run tests
pytest
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Inspired by Stata's `outreg2` command for output formatting
- Causal Forest implementation based on Wager & Athey (2018)
- Built on the shoulders of NumPy, SciPy, and scikit-learn

## 📞 Contact

- **Author**: Bryce Wang
- **Email**: brycewang2018@gmail.com
- **GitHub**: [brycewang-stanford](https://github.com/brycewang-stanford)

## 📈 Citation

If you use StatsPAI in your research, please cite:

```bibtex
@software{wang2024statspai,
  title={StatsPAI: The AI-powered Statistics & Econometrics Toolkit for Python},
  author={Wang, Bryce},
  year={2024},
  url={https://github.com/brycewang-stanford/pyEconometrics},
  version={0.1.0}
}
```