PyPI - StatsPAI - Versions diffs - 0.1.0__tar.gz - Mend

StatsPAI 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

statspai-0.1.0/CHANGELOG.md +36 -0
statspai-0.1.0/CONTRIBUTING.md +205 -0
statspai-0.1.0/LICENSE +21 -0
statspai-0.1.0/MANIFEST.in +44 -0
statspai-0.1.0/PKG-INFO +252 -0
statspai-0.1.0/README.md +199 -0
statspai-0.1.0/pyproject.toml +92 -0
statspai-0.1.0/setup.cfg +4 -0
statspai-0.1.0/src/StatsPAI.egg-info/PKG-INFO +252 -0
statspai-0.1.0/src/StatsPAI.egg-info/SOURCES.txt +24 -0
statspai-0.1.0/src/StatsPAI.egg-info/dependency_links.txt +1 -0
statspai-0.1.0/src/StatsPAI.egg-info/requires.txt +28 -0
statspai-0.1.0/src/StatsPAI.egg-info/top_level.txt +1 -0
statspai-0.1.0/src/statspai/__init__.py +54 -0
statspai-0.1.0/src/statspai/causal/__init__.py +10 -0
statspai-0.1.0/src/statspai/causal/causal_forest.py +795 -0
statspai-0.1.0/src/statspai/core/__init__.py +12 -0
statspai-0.1.0/src/statspai/core/base.py +87 -0
statspai-0.1.0/src/statspai/core/results.py +180 -0
statspai-0.1.0/src/statspai/core/utils.py +229 -0
statspai-0.1.0/src/statspai/output/__init__.py +10 -0
statspai-0.1.0/src/statspai/output/outreg2.py +729 -0
statspai-0.1.0/src/statspai/regression/__init__.py +11 -0
statspai-0.1.0/src/statspai/regression/ols.py +388 -0
statspai-0.1.0/tests/__init__.py +3 -0
statspai-0.1.0/tests/test_ols.py +124 -0

statspai-0.1.0/CHANGELOG.md ADDED

@@ -0,0 +1,36 @@
+# Changelog
+All notable changes to StatsPAI will be documented in this file.
+## [0.1.0] - 2024-07-26
+### Added
+- **Core Regression Framework**
+  - OLS (Ordinary Least Squares) regression with formula interface
+  - Robust standard errors (HC0, HC1, HC2, HC3)
+  - Clustered standard errors
+  - Weighted Least Squares (WLS) support
+- **Causal Inference Module**
+  - Causal Forest implementation inspired by Wager & Athey (2018)
+  - Honest estimation (separate samples for tree splitting and effect estimation) for unbiased treatment effect estimates
+  - Bootstrap confidence intervals for treatment effects
+  - Formula interface: `"outcome ~ treatment | features | controls"`
+- **Output Management (outreg2)**
+  - Excel export functionality similar to Stata's outreg2
+  - Support for multiple regression models in single output
+  - Customizable formatting options
+  - Professional table layout
+- **Unified API Design**
+  - Consistent `reg()` function interface
+  - Formula parsing: R/Stata-style syntax `"y ~ x1 + x2"`
+  - Type hints throughout the codebase
+  - Comprehensive documentation
+### Technical Details
+- Python 3.8+ support
+- Dependencies: numpy, scipy, pandas, statsmodels, linearmodels, numba, scikit-learn, patsy, openpyxl, xlsxwriter
+- MIT License
+- Test suite for OLS regression (`tests/test_ols.py`)
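The changelog mentions bootstrap confidence intervals for treatment effects. StatsPAI's `CausalForest` source is not reproduced in this diff, so the following is only a sketch of the general technique under simple assumptions: a binary treatment `w`, a difference-in-means average treatment effect, and a percentile bootstrap that resamples within each treatment arm. The helper names `difference_in_means` and `bootstrap_ate_ci` are hypothetical.

```python
import numpy as np

# Hypothetical helpers sketching a percentile bootstrap; not StatsPAI's API.


def difference_in_means(y: np.ndarray, w: np.ndarray) -> float:
    """ATE estimate: mean outcome of treated (w == 1) minus mean of controls (w == 0)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    return float(y[w == 1].mean() - y[w == 0].mean())


def bootstrap_ate_ci(y, w, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Point estimate and (1 - alpha) percentile bootstrap interval for the ATE."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    treated, control = y[w == 1], y[w == 0]
    rng = np.random.default_rng(seed)
    replicates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample within each arm so every replicate has treated and control units
        t = rng.choice(treated, size=treated.size, replace=True)
        c = rng.choice(control, size=control.size, replace=True)
        replicates[b] = t.mean() - c.mean()
    lower, upper = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return difference_in_means(y, w), (float(lower), float(upper))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    w = rng.integers(0, 2, size=1000)
    y = 1.0 + 2.0 * w + rng.normal(size=1000)  # simulated data, true ATE = 2
    ate, (lo, hi) = bootstrap_ate_ci(y, w)
    print(f"ATE = {ate:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Resampling within each arm (a stratified bootstrap) guarantees that no replicate is missing a treatment group. Per-unit heterogeneous effects, as returned by `cf.effect_interval`, are outside the scope of this sketch.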

statspai-0.1.0/CONTRIBUTING.md ADDED

@@ -0,0 +1,205 @@
+# Contributing to StatsPAI
+We welcome contributions to StatsPAI! This document provides guidelines for contributing to the project.
+## 🤝 How to Contribute
+### Types of Contributions
+1. **Bug Reports**: Help us identify and fix issues
+2. **Feature Requests**: Suggest new econometric methods or improvements
+3. **Code Contributions**: Implement new features or fix bugs
+4. **Documentation**: Improve docs, examples, or tutorials
+5. **Testing**: Add test cases or improve test coverage
+### Getting Started
+1. **Fork and Clone the Repository**
+   Fork the repository on GitHub, then clone your fork:
+   ```bash
+   git clone https://github.com/<your-username>/pyEconometrics.git
+   cd pyEconometrics
+   ```
+2. **Set Up Development Environment**
+   ```bash
+   # Create virtual environment
+   python -m venv venv
+   source venv/bin/activate  # On Windows: venv\Scripts\activate
+   # Install in development mode
+   pip install -e ".[dev]"
+   # Install pre-commit hooks (pre-commit itself is not part of the [dev] extra)
+   pip install pre-commit
+   pre-commit install
+   ```
+3. **Create a Branch**
+   ```bash
+   git checkout -b feature/your-feature-name
+   # or
+   git checkout -b fix/issue-description
+   ```
+## 📝 Development Workflow
+### Before Making Changes
+1. **Check existing issues** to avoid duplicate work
+2. **Create an issue** for major changes to discuss the approach
+3. **Read the documentation** to understand the codebase structure
+### Making Changes
+1. **Write Tests First** (TDD approach recommended)
+   ```bash
+   # Create test file
+   touch tests/test_your_feature.py
+   # Write failing tests
+   pytest tests/test_your_feature.py
+   ```
+2. **Implement Your Changes**
+   - Follow existing code style and patterns
+   - Add type hints for all function signatures
+   - Include docstrings for public functions
+   - Add inline comments for complex logic
+3. **Run Tests**
+   ```bash
+   # Run all tests
+   pytest
+   # Run with coverage
+   pytest --cov=src/statspai
+   # Run specific tests
+   pytest tests/test_your_feature.py -v
+   ```
+4. **Check Code Quality**
+   ```bash
+   # Format code
+   black src/ tests/
+   isort src/ tests/  # isort is not part of the [dev] extra; install it separately
+   # Check linting
+   flake8 src/ tests/
+   # Type checking (if mypy is configured)
+   mypy src/
+   ```
+### Commit Guidelines
+Use the Conventional Commits format, with a blank line between the subject, body, and footer:
+```
+type(scope): brief description
+
+Detailed explanation if needed.
+
+Fixes #issue_number
+```
+**Types:**
+- `feat`: New feature
+- `fix`: Bug fix
+- `docs`: Documentation changes
+- `style`: Code style changes (formatting, etc.)
+- `refactor`: Code refactoring
+- `test`: Adding or updating tests
+- `chore`: Maintenance tasks
+**Examples:**
+```bash
+git commit -m "feat(causal): add bootstrap confidence intervals to CausalForest"
+git commit -m "fix(outreg2): handle empty model lists gracefully"
+git commit -m "docs(readme): update installation instructions"
+```
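To check a subject line locally before committing, a small regular expression over the types listed above is enough. This is an illustrative sketch, not a hook shipped with the project: `is_valid_subject` is a hypothetical name, the scope is treated as optional (as in the Conventional Commits specification), and the `!` breaking-change marker is deliberately not handled.

```python
import re

# Types taken from the list in this guide; "(scope)" is optional.
# Hypothetical checker, not part of the StatsPAI repository.
COMMIT_SUBJECT = re.compile(
    r"^(feat|fix|docs|style|refactor|test|chore)"  # type
    r"(\([a-z0-9_-]+\))?"                          # optional (scope)
    r": \S.*$"                                     # ": " followed by a description
)


def is_valid_subject(subject: str) -> bool:
    """True if the first line of a commit message follows type(scope): description."""
    return COMMIT_SUBJECT.match(subject) is not None


if __name__ == "__main__":
    print(is_valid_subject("feat(causal): add bootstrap confidence intervals to CausalForest"))
```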
+## 🏗 Code Structure
+### Package Organization
+```
+src/statspai/
+├── __init__.py          # Main API exports
+├── core/                # Base classes, results, and utilities
+│   ├── __init__.py
+│   ├── base.py          # Base classes
+│   ├── results.py       # Results objects
+│   └── utils.py         # Utility functions
+├── regression/          # Regression estimators
+│   ├── __init__.py
+│   └── ols.py           # OLS implementation
+├── causal/              # Causal inference methods
+│   ├── __init__.py
+│   └── causal_forest.py # Causal Forest implementation
+└── output/              # Output and formatting
+    ├── __init__.py
+    └── outreg2.py       # Excel export functionality
+```
+### Code Style Guidelines
+1. **Follow PEP 8**, with a maximum line length of 88 characters (Black's default)
+2. **Use type hints** for all function parameters and return values
+3. **Write docstrings** in Google format:
+   ```python
+   def function_name(param1: int, param2: str) -> bool:
+       """Brief description of the function.
+       Args:
+           param1: Description of param1.
+           param2: Description of param2.
+       Returns:
+           Description of return value.
+       Raises:
+           ValueError: When param1 is negative.
+       """
+   ```
+4. **Use descriptive variable names**
+5. **Add comments for complex algorithms**
+### Testing Guidelines
+1. **Test Coverage**: Aim for >90% test coverage
+2. **Test Types**:
+   - Unit tests for individual functions
+   - Integration tests for workflows
+   - Regression tests against known results
+3. **Test Structure**:
+   ```python
+   def test_function_name_scenario():
+       # Arrange
+       data = create_test_data()
+       # Act
+       result = function_to_test(data)
+       # Assert
+       assert result.some_property == expected_value
+   ```
+4. **Use fixtures** for common test data:
+   ```python
+   import pandas as pd
+   import pytest
+   @pytest.fixture
+   def sample_data():
+       return pd.DataFrame({
+           'y': [1, 2, 3, 4, 5],
+           'x': [2, 4, 6, 8, 10]
+       })
+   ```
+## 📞 Getting Help
+- **GitHub Issues**: For bugs and feature requests
+- **Discussions**: For questions and general discussion
+- **Email**: For security issues or private communication
+## 📄 License
+By contributing to StatsPAI, you agree that your contributions will be licensed under the MIT License.
+---
+Thank you for contributing to StatsPAI! 🎉
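The testing guidelines recommend regression tests against known results. The `sample_data` fixture's values satisfy y = 0.5 * x exactly, which makes a convenient known answer. A minimal sketch of such a test, following the Arrange/Act/Assert layout; `fit_ols` is a hypothetical stand-in for the estimator under test (plain NumPy least squares), not StatsPAI's `reg()`.

```python
import numpy as np


def fit_ols(y, x):
    """Hypothetical stand-in estimator: OLS of y on [1, x]; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def test_fit_ols_recovers_exact_linear_relationship():
    # Arrange: same values as the sample_data fixture, where y = 0.5 * x exactly
    x = np.array([2, 4, 6, 8, 10])
    y = np.array([1, 2, 3, 4, 5])
    # Act
    intercept, slope = fit_ols(y, x)
    # Assert: compare with the known answer using a floating-point tolerance
    np.testing.assert_allclose([intercept, slope], [0.0, 0.5], atol=1e-10)
```

Because the test uses plain `assert`-style checks, pytest collects it without extra imports; exact equality is avoided since least squares returns values that are only equal up to rounding error.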

statspai-0.1.0/LICENSE ADDED

@@ -0,0 +1,21 @@
+MIT License
+Copyright (c) 2025 Bryce Wang
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

statspai-0.1.0/MANIFEST.in ADDED

@@ -0,0 +1,44 @@
+# Include essential files for PyPI distribution
+include LICENSE
+include README.md
+include CHANGELOG.md
+include CONTRIBUTING.md
+include pyproject.toml
+# Include package source code
+recursive-include src *.py
+# Include tests
+recursive-include tests *.py
+# Exclude internal documentation and development files
+exclude PROJECT_SUMMARY.md
+exclude FINAL_PROJECT_STATUS.md
+exclude RELEASE_CHECKLIST.md
+exclude build_and_release.sh
+exclude Makefile
+exclude *.sh
+# Exclude development directories (not needed for users)
+prune docs
+prune examples
+prune .github
+# Exclude build and cache files
+recursive-exclude * __pycache__
+recursive-exclude * *.py[co]
+exclude .git*
+prune .pytest_cache
+prune .mypy_cache
+prune htmlcov
+exclude .coverage
+exclude .DS_Store
+exclude *.egg-info
+prune build
+prune dist
+prune .venv
+prune venv
+exclude .pre-commit-config.yaml
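To confirm that the `MANIFEST.in` rules keep development directories out of the source distribution, one option is to inspect the built archive with the standard-library `tarfile` module. A minimal sketch; the helper name `unwanted_members` and the list of forbidden prefixes are assumptions to adapt.

```python
import tarfile
from typing import Iterable, List

# Directories the MANIFEST.in rules are meant to keep out (adjust as needed).
FORBIDDEN_PREFIXES = ("docs/", "examples/", ".github/", "build/", "dist/", "venv/", ".venv/")


def unwanted_members(sdist_path: str, forbidden: Iterable[str] = FORBIDDEN_PREFIXES) -> List[str]:
    """Hypothetical check: list archive members that fall under a forbidden directory."""
    prefixes = tuple(forbidden)
    offending = []
    with tarfile.open(sdist_path, "r:*") as archive:  # "r:*" detects the compression
        for name in archive.getnames():
            # Members look like "statspai-0.1.0/src/...": strip the top-level folder
            _, _, relative = name.partition("/")
            # Append "/" so a bare directory entry such as "docs" also matches "docs/"
            if (relative + "/").startswith(prefixes):
                offending.append(name)
    return offending
```

For example, after `python -m build --sdist`, calling `unwanted_members("dist/statspai-0.1.0.tar.gz")` (archive path assumed) should return an empty list if nothing from those directories was packaged.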

statspai-0.1.0/PKG-INFO ADDED

@@ -0,0 +1,252 @@
+Metadata-Version: 2.4
+Name: StatsPAI
+Version: 0.1.0
+Summary: The AI-powered Statistics & Econometrics Toolkit for Python
+Author-email: Bryce Wang <bryce.wang@example.com>
+Maintainer-email: Bryce Wang <bryce.wang@example.com>
+License-Expression: MIT
+Project-URL: Homepage, https://github.com/brycewang-stanford/pyEconometrics
+Project-URL: Documentation, https://statspai.readthedocs.io/
+Project-URL: Repository, https://github.com/brycewang-stanford/pyEconometrics
+Project-URL: Bug Reports, https://github.com/brycewang-stanford/pyEconometrics/issues
+Keywords: econometrics,statistics,regression,causal-inference,causal-forest,panel-data,instrumental-variables,stata,R,machine-learning
+Classifier: Development Status :: 3 - Alpha
+Classifier: Intended Audience :: Science/Research
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.8
+Classifier: Programming Language :: Python :: 3.9
+Classifier: Programming Language :: Python :: 3.10
+Classifier: Programming Language :: Python :: 3.11
+Classifier: Programming Language :: Python :: 3.12
+Classifier: Topic :: Scientific/Engineering :: Mathematics
+Classifier: Topic :: Office/Business :: Financial
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: numpy>=1.20.0
+Requires-Dist: pandas>=1.3.0
+Requires-Dist: scipy>=1.7.0
+Requires-Dist: statsmodels>=0.13.0
+Requires-Dist: linearmodels>=4.25
+Requires-Dist: numba>=0.56.0
+Requires-Dist: scikit-learn>=1.0.0
+Requires-Dist: patsy>=0.5.0
+Requires-Dist: openpyxl>=3.0.0
+Requires-Dist: xlsxwriter>=3.0.0
+Provides-Extra: dev
+Requires-Dist: pytest>=6.0; extra == "dev"
+Requires-Dist: pytest-cov; extra == "dev"
+Requires-Dist: black; extra == "dev"
+Requires-Dist: flake8; extra == "dev"
+Requires-Dist: mypy; extra == "dev"
+Requires-Dist: sphinx; extra == "dev"
+Requires-Dist: sphinx-rtd-theme; extra == "dev"
+Provides-Extra: performance
+Requires-Dist: jax[cpu]>=0.4.0; extra == "performance"
+Requires-Dist: jaxlib>=0.4.0; extra == "performance"
+Provides-Extra: plotting
+Requires-Dist: matplotlib>=3.5.0; extra == "plotting"
+Requires-Dist: seaborn>=0.11.0; extra == "plotting"
+Requires-Dist: plotly>=5.0.0; extra == "plotting"
+Dynamic: license-file
+# StatsPAI
+[![PyPI version](https://badge.fury.io/py/StatsPAI.svg)](https://badge.fury.io/py/StatsPAI)
+[![Python versions](https://img.shields.io/pypi/pyversions/StatsPAI.svg)](https://pypi.org/project/StatsPAI/)
+[![License](https://img.shields.io/github/license/brycewang-stanford/pyEconometrics.svg)](https://github.com/brycewang-stanford/pyEconometrics/blob/main/LICENSE)
+[![Build Status](https://github.com/brycewang-stanford/pyEconometrics/workflows/CI%2FCD%20Pipeline/badge.svg)](https://github.com/brycewang-stanford/pyEconometrics/actions)
+[![codecov](https://codecov.io/gh/brycewang-stanford/pyEconometrics/branch/main/graph/badge.svg)](https://codecov.io/gh/brycewang-stanford/pyEconometrics)
+**The AI-powered Statistics & Econometrics Toolkit for Python**
+StatsPAI bridges the gap between user-friendly syntax and powerful econometric analysis, making advanced techniques accessible to researchers and practitioners.
+## 🚀 Features
+### Core Econometric Methods
+- **Linear Regression**: OLS, WLS with robust standard errors
+- **Instrumental Variables**: 2SLS estimation
+- **Panel Data**: Fixed Effects, Random Effects models
+- **Causal Inference**: Causal Forest implementation (inspired by EconML)
+### User Experience
+- **Formula Interface**: Intuitive R/Stata-style syntax `"y ~ x1 + x2"`
+- **Excel Export**: Professional output tables via `outreg2` (Stata-inspired)
+- **Flexible API**: Both formula and matrix interfaces supported
+- **Rich Output**: Detailed summary statistics and diagnostic tests
+### Technical Excellence
+- **Robust Implementation**: Based on proven econometric theory
+- **Performance Optimized**: Efficient algorithms for large datasets
+- **Well Tested**: Comprehensive test suite ensuring reliability
+- **Type Hints**: Full type annotation for better development experience
+## 📦 Installation
+```bash
+# Latest release from PyPI
+pip install StatsPAI
+# Development version
+pip install git+https://github.com/brycewang-stanford/pyEconometrics.git
+```
+### Requirements
+- Python 3.8+
+- NumPy, SciPy, Pandas
+- scikit-learn (for Causal Forest)
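+- openpyxl (for Excel export)
+- statsmodels, linearmodels, numba, patsy, and xlsxwriter (installed automatically with the package)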
+## 🏁 Quick Start
+### Basic Regression Analysis
+```python
+import pandas as pd
+from statspai import reg, outreg2
+# Load your data
+df = pd.read_csv('data.csv')
+# Run OLS regression
+result1 = reg('wage ~ education + experience', data=df)
+print(result1.summary())
+# Add control variables
+result2 = reg('wage ~ education + experience + age + gender', data=df)
+# Export results to Excel
+outreg2([result1, result2], 'regression_results.xlsx',
+        title='Wage Regression Analysis')
+```
+### Instrumental Variables
+```python
+# 2SLS estimation
+iv_result = reg('wage ~ education | mother_education + father_education',
+                data=df, method='2sls')
+print(iv_result.summary())
+```
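Reading the IV example as `education` being the endogenous regressor and `mother_education` and `father_education` its excluded instruments (an assumption about the formula syntax), the textbook two-stage least squares estimator is `beta = (X' Pz X)^-1 X' Pz y` with `Pz = Z (Z'Z)^-1 Z'`. The NumPy sketch below computes it as two explicit least-squares stages. It is not StatsPAI's implementation, `two_stage_least_squares` is a hypothetical name, and it returns coefficients only.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """Textbook 2SLS coefficients (hypothetical helper, not StatsPAI's API).

    X: n x k regressors, including a column of ones and the endogenous variable.
    Z: n x m instruments (m >= k), including the constant and any exogenous regressors.
    Standard errors from a naive second-stage regression would be wrong,
    so only coefficients are returned.
    """
    y, X, Z = (np.asarray(a, dtype=float) for a in (y, X, Z))
    # First stage: project every column of X onto the instruments (Pz X)
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Second stage: regress y on the fitted values; equals (X'Pz X)^-1 X'Pz y
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    z = rng.normal(size=(n, 2))  # two instruments
    u = rng.normal(size=n)       # unobserved confounder
    educ = 0.5 * z[:, 0] + 0.5 * z[:, 1] + u + rng.normal(size=n)
    wage = 1.0 + 2.0 * educ + 3.0 * u + rng.normal(size=n)  # true effect = 2
    X = np.column_stack([np.ones(n), educ])
    Z = np.column_stack([np.ones(n), z])
    ols, *_ = np.linalg.lstsq(X, wage, rcond=None)
    print("OLS slope (biased by u):", round(float(ols[1]), 3))
    print("2SLS slope:", round(float(two_stage_least_squares(wage, X, Z)[1]), 3))
```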
+### Panel Data Analysis
+```python
+# Fixed effects model
+fe_result = reg('y ~ x1 + x2', data=df,
+                entity_col='firm_id', time_col='year',
+                method='fixed_effects')
+```
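A one-way fixed-effects (within) estimator subtracts each entity's mean from every variable and then runs OLS on the demeaned data. The NumPy sketch below shows that transformation for entity effects only; it ignores the `time_col` used in the example above, omits the degrees-of-freedom adjustments needed for standard errors, and uses hypothetical helper names rather than StatsPAI's API.

```python
import numpy as np


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the 'within' transformation)."""
    values = np.asarray(values, dtype=float)
    if values.ndim == 2:
        # Demean each column separately
        return np.column_stack([demean_by_group(col, groups) for col in values.T])
    _, inverse = np.unique(np.asarray(groups), return_inverse=True)
    inverse = inverse.ravel()
    group_means = np.bincount(inverse, weights=values) / np.bincount(inverse)
    return values - group_means[inverse]


def within_estimator(y, X, entity):
    """One-way entity fixed-effects slopes (hypothetical helper, not StatsPAI's API)."""
    y_tilde = demean_by_group(y, entity)
    X_tilde = demean_by_group(np.asarray(X, dtype=float).reshape(len(y_tilde), -1), entity)
    # No intercept column: demeaning removes it together with the entity effects
    beta, *_ = np.linalg.lstsq(X_tilde, y_tilde, rcond=None)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    firm_id = np.repeat(np.arange(50), 10)             # 50 firms, 10 years each
    firm_effect = rng.normal(scale=5.0, size=50)[firm_id]
    x = firm_effect + rng.normal(size=firm_id.size)    # x correlated with the firm effect
    y = firm_effect + 1.5 * x                          # true slope = 1.5
    print(within_estimator(y, x, firm_id))
```

Pooled OLS on the same simulated data would be pulled away from 1.5 because `x` is correlated with the firm effect; demeaning removes that effect entirely.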
+### Causal Forest for Heterogeneous Treatment Effects
+```python
+from statspai import CausalForest
+# Initialize Causal Forest
+cf = CausalForest(n_estimators=100, random_state=42)
+# Fit model: outcome ~ treatment | features | controls
+cf.fit('income ~ job_training | age + education + experience | region + year',
+       data=df)
+# Estimate individual treatment effects
+individual_effects = cf.effect(df)
+# Get confidence intervals
+effects_ci = cf.effect_interval(df, alpha=0.05)
+# Export results
+cf_summary = cf.summary()
+outreg2([cf_summary], 'causal_forest_results.xlsx')
+```
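The Causal Forest formula packs four roles into one string: `outcome ~ treatment | features | controls`. A hypothetical parser (not StatsPAI's own) showing how such a string can be split into those roles, assuming all three `|`-separated parts are present and each contains `+`-separated variable names only:

```python
from typing import Dict, List, Union


def parse_causal_formula(formula: str) -> Dict[str, Union[str, List[str]]]:
    """Split 'y ~ w | x1 + x2 | c1 + c2' into outcome, treatment, features, controls."""
    if "~" not in formula:
        raise ValueError("formula must contain '~'")
    lhs, rhs = (side.strip() for side in formula.split("~", 1))
    parts = [part.strip() for part in rhs.split("|")]
    if len(parts) != 3:
        raise ValueError("expected 'outcome ~ treatment | features | controls'")

    def terms(part: str) -> List[str]:
        # Additive terms only: no interactions or transformations
        return [t.strip() for t in part.split("+") if t.strip()]

    return {
        "outcome": lhs,
        "treatment": terms(parts[0]),
        "features": terms(parts[1]),
        "controls": terms(parts[2]),
    }


if __name__ == "__main__":
    print(parse_causal_formula(
        "income ~ job_training | age + education + experience | region + year"))
```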
+## 📊 Advanced Usage
+### Robust Standard Errors
+```python
+# Heteroskedasticity-robust standard errors
+result = reg('y ~ x1 + x2', data=df, robust=True)
+# Clustered standard errors
+result = reg('y ~ x1 + x2', data=df, cluster='firm_id')
+```
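Both options in this example correspond to sandwich covariance estimators, and the changelog lists the HC0 to HC3 variants. The NumPy sketch below uses the usual textbook definitions. Which variant `robust=True` selects and which small-sample factor StatsPAI applies to clustered errors are not stated in this diff, so the defaults here (HC1, and the Stata-style factor G/(G-1) * (n-1)/(n-k) for G clusters) are assumptions; `sandwich_se` is a hypothetical helper.

```python
import numpy as np


def sandwich_se(X, y, kind="HC1", clusters=None):
    """OLS coefficients and robust standard errors (hypothetical helper).

    kind: "HC0", "HC1", "HC2", or "HC3" (ignored when clusters is given).
    clusters: optional array of cluster labels for one-way clustered SEs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta

    if clusters is not None:
        labels, inverse = np.unique(clusters, return_inverse=True)
        inverse = inverse.ravel()
        n_groups = len(labels)
        # Sum the scores x_i * e_i within each cluster, giving a (G, k) array
        scores = np.zeros((n_groups, k))
        np.add.at(scores, inverse, X * resid[:, None])
        meat = scores.T @ scores
        correction = n_groups / (n_groups - 1) * (n - 1) / (n - k)
        cov = correction * bread @ meat @ bread
    else:
        leverage = np.einsum("ij,jk,ik->i", X, bread, X)  # diagonal of the hat matrix
        weights = {
            "HC0": resid**2,
            "HC1": resid**2 * n / (n - k),
            "HC2": resid**2 / (1 - leverage),
            "HC3": resid**2 / (1 - leverage) ** 2,
        }[kind]
        meat = (X * weights[:, None]).T @ X
        cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))
```

A handy consistency check: when every observation is its own cluster, the cluster factor reduces to n/(n-k) and the clustered result equals HC1.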
+### Model Comparison
+```python
+from statspai import compare_models
+models = [
+    reg('y ~ x1', data=df),
+    reg('y ~ x1 + x2', data=df),
+    reg('y ~ x1 + x2 + x3', data=df)
+]
+comparison = compare_models(models)
+print(comparison.summary())
+```
+### Custom Output Formatting
+```python
+outreg2([result1, result2], 'output.xlsx',
+        title='Regression Results',
+        add_stats={'Observations': lambda r: r.nobs,
+                   'R-squared': lambda r: r.rsquared},
+        decimal_places=4,
+        star_levels=[0.01, 0.05, 0.1])
+```
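The `decimal_places` and `star_levels` options control how each coefficient cell is printed. A hypothetical formatter illustrating the usual convention (one star for every significance level the p-value falls below, so three stars for p < 0.01, two for p < 0.05, one for p < 0.1, and the standard error in parentheses); it sketches the convention, not outreg2's code.

```python
from typing import Sequence, Tuple


def format_cell(coef: float, se: float, pvalue: float,
                decimal_places: int = 3,
                star_levels: Sequence[float] = (0.01, 0.05, 0.1)) -> Tuple[str, str]:
    """Hypothetical helper: (coefficient with stars, standard error in parentheses)."""
    # One star for every significance level the p-value falls below
    stars = "*" * sum(pvalue < level for level in star_levels)
    return f"{coef:.{decimal_places}f}{stars}", f"({se:.{decimal_places}f})"


if __name__ == "__main__":
    print(format_cell(0.5, 0.01, 0.001, decimal_places=4))
```

Cells produced this way could be arranged column by column in a pandas `DataFrame` and written out with `DataFrame.to_excel`, which uses openpyxl for `.xlsx` files.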
+## 📚 Documentation
+- **[User Guide](docs/user_guide.md)**: Comprehensive tutorials and examples
+- **[API Reference](docs/api_reference.md)**: Detailed function documentation
+- **[Theory Guide](docs/theory_guide.md)**: Mathematical foundations
+- **[Examples](examples/)**: Jupyter notebooks with real-world applications
+## 🤝 Contributing
+We welcome contributions! See our [Contributing Guide](CONTRIBUTING.md) for details.
+### Development Setup
+```bash
+# Clone repository
+git clone https://github.com/brycewang-stanford/pyEconometrics.git
+cd pyEconometrics
+# Install in development mode
+pip install -e ".[dev]"
+# Install pre-commit hooks (pre-commit itself is not part of the [dev] extra)
+pip install pre-commit
+pre-commit install
+# Run tests
+pytest
+```
+## 📄 License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+## 🙏 Acknowledgments
+- Inspired by Stata's `outreg2` command for output formatting
+- Causal Forest implementation based on Wager & Athey (2018)
+- Built on the shoulders of NumPy, SciPy, and scikit-learn
+## 📞 Contact
+- **Author**: Bryce Wang
+- **Email**: brycewang2018@gmail.com
+- **GitHub**: [brycewang-stanford](https://github.com/brycewang-stanford)
+## 📈 Citation
+If you use StatsPAI in your research, please cite:
+```bibtex
+@software{wang2024statspai,
+  title={{StatsPAI}: The AI-powered Statistics \& Econometrics Toolkit for Python},
+  author={Wang, Bryce},
+  year={2024},
+  url={https://github.com/brycewang-stanford/pyEconometrics},
+  version={0.1.0}
+}
+```