npm - @wentorai/research-plugins - Versions diffs - 1.0.0 → 1.1.0 - Mend

@wentorai/research-plugins 1.0.0 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (203)

package/skills/domains/finance/risk-modeling-guide/SKILL.md ADDED

@@ -0,0 +1,260 @@
+---
+name: risk-modeling-guide
+description: "Financial risk modeling including VaR, stress testing, and credit risk"
+metadata:
+  openclaw:
+    emoji: "chart-decreasing"
+    category: "domains"
+    subcategory: "finance"
+    keywords: ["risk-modeling", "var", "stress-testing", "credit-risk", "monte-carlo", "basel"]
+    source: "wentor"
+---
+# Risk Modeling Guide
+A skill for quantitative financial risk modeling, covering Value at Risk, Expected Shortfall, credit risk, stress testing, and Monte Carlo simulation methods. Essential for financial engineering research and regulatory risk analysis.
+## Market Risk: Value at Risk
+### VaR Methodologies
+| Method | Description | Pros | Cons |
+|--------|-------------|------|------|
+| Historical simulation | Replay past returns | No distributional assumption | Assumes past repeats |
+| Variance-covariance | Assume normal returns | Fast, analytical | Underestimates tail risk |
+| Monte Carlo simulation | Simulate from fitted model | Flexible distributions | Computationally expensive |
+| Filtered historical simulation | GARCH + historical innovations | Captures volatility clustering | More complex |
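The "underestimates tail risk" entry for the variance-covariance method can be checked numerically. The sketch below is illustrative only: the function name `compare_var_methods` and the simulated Student-t returns (4 degrees of freedom, scaled to roughly 1% daily moves) are assumptions, not part of the package.

```python
import numpy as np
from scipy.stats import norm


def compare_var_methods(returns: np.ndarray, confidence: float = 0.99) -> dict:
    """Return historical and normal (variance-covariance) VaR side by side."""
    alpha = 1 - confidence
    # Historical simulation: empirical quantile of the return sample
    historical = -np.percentile(returns, alpha * 100)
    # Variance-covariance: quantile of a normal fitted by mean and std
    mu = returns.mean()
    sigma = returns.std(ddof=1)
    normal = -(mu + sigma * norm.ppf(alpha))
    return {"historical": float(historical), "normal": float(normal)}


# Hypothetical fat-tailed daily returns (assumption for illustration)
rng = np.random.default_rng(0)
fat_tailed_returns = 0.01 * rng.standard_t(df=4, size=200_000)

result = compare_var_methods(fat_tailed_returns)
print(result)
```

On fat-tailed data like this, the normal VaR at 99% comes out below the empirical quantile, which is the gap the table refers to.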
+### Implementation
+```python
+import numpy as np
+import pandas as pd
+from scipy.stats import norm, t as t_dist
+def historical_var(returns: np.ndarray, confidence: float = 0.99,
+                    horizon_days: int = 1) -> dict:
+    """
+    Compute Value at Risk using historical simulation.
+    returns: array of daily log returns
+    confidence: confidence level (e.g., 0.99 for 99% VaR)
+    horizon_days: risk horizon in days
+    """
+    # Scale returns to horizon
+    if horizon_days > 1:
+        # Rolling sum for overlapping returns
+        scaled_returns = pd.Series(returns).rolling(horizon_days).sum().dropna().values
+    else:
+        scaled_returns = returns
+    alpha = 1 - confidence
+    var = -np.percentile(scaled_returns, alpha * 100)
+    es = -np.mean(scaled_returns[scaled_returns <= -var])
+    return {
+        "VaR": round(var, 6),
+        "Expected_Shortfall": round(es, 6),
+        "confidence": confidence,
+        "horizon_days": horizon_days,
+        "n_observations": len(scaled_returns),
+    }
+def parametric_var(returns: np.ndarray, confidence: float = 0.99,
+                    distribution: str = "normal") -> dict:
+    """
+    Parametric VaR assuming normal or Student-t distribution.
+    """
+    mu = np.mean(returns)
+    sigma = np.std(returns, ddof=1)
+    if distribution == "normal":
+        z = norm.ppf(1 - confidence)
+        var = -(mu + sigma * z)
+        # Analytical ES for normal
+        es = -mu + sigma * norm.pdf(norm.ppf(1 - confidence)) / (1 - confidence)
+    elif distribution == "student-t":
+        # Fit Student-t
+        df, loc, scale = t_dist.fit(returns)
+        z = t_dist.ppf(1 - confidence, df)
+        var = -(loc + scale * z)
+        # ES for Student-t
+        t_pdf = t_dist.pdf(t_dist.ppf(1 - confidence, df), df)
+        es = -loc + scale * (t_pdf / (1 - confidence)) * ((df + z**2) / (df - 1))
+    else:
+        raise ValueError(f"Unknown distribution: {distribution}")
+    return {
+        "VaR": round(var, 6),
+        "Expected_Shortfall": round(es, 6),
+        "distribution": distribution,
+        "mean": round(mu, 6),
+        "std": round(sigma, 6),
+    }
+```
+### Monte Carlo VaR
+```python
+def monte_carlo_var(returns: np.ndarray, n_simulations: int = 100000,
+                     confidence: float = 0.99,
+                     horizon_days: int = 10) -> dict:
+    """
+    Monte Carlo VaR assuming i.i.d. normal log returns (the log-return form of
+    Geometric Brownian Motion, GBM).
+    """
+    mu = np.mean(returns)
+    sigma = np.std(returns, ddof=1)
+    # Draw cumulative horizon returns directly: the sum of horizon_days i.i.d.
+    # normal daily returns is N(mu * h, sigma^2 * h)
+    rng = np.random.default_rng(42)
+    simulated = rng.normal(
+        mu * horizon_days,
+        sigma * np.sqrt(horizon_days),
+        size=n_simulations,
+    )
+    alpha = 1 - confidence
+    var = -np.percentile(simulated, alpha * 100)
+    es = -np.mean(simulated[simulated <= -var])
+    return {
+        "VaR": round(var, 6),
+        "Expected_Shortfall": round(es, 6),
+        "n_simulations": n_simulations,
+        "confidence": confidence,
+        "horizon_days": horizon_days,
+    }
+```
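Because the simulation above draws from a normal distribution, its result should converge to the closed-form normal VaR for the same horizon. A minimal sketch of that sanity check follows; the helper name `analytic_horizon_var` and the values of `mu` and `sigma` are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def analytic_horizon_var(mu: float, sigma: float, confidence: float = 0.99,
                         horizon_days: int = 10) -> float:
    """Closed-form VaR when horizon returns are N(mu*h, sigma^2*h)."""
    z = norm.ppf(1 - confidence)
    return float(-(mu * horizon_days + sigma * np.sqrt(horizon_days) * z))


# Hypothetical daily mean and volatility (assumptions for illustration)
mu, sigma, horizon_days = 0.0005, 0.012, 10

rng = np.random.default_rng(42)
simulated = rng.normal(mu * horizon_days, sigma * np.sqrt(horizon_days),
                       size=100_000)
mc_var = float(-np.percentile(simulated, 1.0))   # 99% VaR from simulation
closed_form = analytic_horizon_var(mu, sigma, 0.99, horizon_days)

relative_gap = abs(mc_var - closed_form) / closed_form
print(mc_var, closed_form, relative_gap)
```

If the two numbers agree closely, the simulation adds little over the parametric formula; Monte Carlo earns its cost when the return model is not normal.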
+## Credit Risk Modeling
+### Probability of Default Estimation
+```python
+from sklearn.linear_model import LogisticRegression
+def build_pd_model(features: pd.DataFrame,
+                    default_flag: pd.Series) -> dict:
+    """
+    Build a Probability of Default (PD) model using logistic regression.
+    Common features: debt-to-income, credit utilization, payment history,
+    employment length, loan amount.
+    """
+    model = LogisticRegression(max_iter=1000, class_weight="balanced")
+    model.fit(features, default_flag)
+    # Coefficient interpretation
+    coef_df = pd.DataFrame({
+        "feature": features.columns,
+        "coefficient": model.coef_[0],
+        "odds_ratio": np.exp(model.coef_[0]),
+    }).sort_values("coefficient", ascending=False)
+    # Model discrimination
+    from sklearn.metrics import roc_auc_score
+    pred_proba = model.predict_proba(features)[:, 1]
+    auc = roc_auc_score(default_flag, pred_proba)
+    return {
+        "auc": round(auc, 4),
+        "coefficients": coef_df.to_dict("records"),
+        "intercept": round(model.intercept_[0], 4),
+    }
+```
+### Loss Given Default and EAD
+```python
+def compute_expected_loss(pd_score: float, lgd: float,
+                           ead: float) -> dict:
+    """
+    Compute Expected Loss = PD x LGD x EAD.
+    pd_score: probability of default (0-1)
+    lgd: loss given default (0-1, fraction of exposure lost)
+    ead: exposure at default (dollar amount)
+    """
+    el = pd_score * lgd * ead
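+    return {
+        "PD": pd_score,
+        "LGD": lgd,
+        "EAD": ead,
+        "Expected_Loss": round(el, 2),
+        # Default is Bernoulli(PD), so the loss standard deviation is
+        # EAD * LGD * sqrt(PD * (1 - PD)); 2.33 is the one-sided 99% normal
+        # quantile (a rough approximation for a single exposure)
+        "Unexpected_Loss_99": round(2.33 * ead * lgd * np.sqrt(pd_score * (1 - pd_score)), 2),
+    }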
+```
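Expected loss is additive across exposures, so the single-loan formula extends to a portfolio by summing. The sketch below assumes a hypothetical loan table with columns `pd`, `lgd`, and `ead`; the function name and the three example loans are illustrative.

```python
import pandas as pd


def portfolio_expected_loss(loans: pd.DataFrame) -> pd.DataFrame:
    """Add per-loan expected loss and each loan's share of the total."""
    out = loans.copy()
    out["expected_loss"] = out["pd"] * out["lgd"] * out["ead"]
    out["el_share"] = out["expected_loss"] / out["expected_loss"].sum()
    return out


# Hypothetical three-loan portfolio (assumption for illustration)
loans = pd.DataFrame({
    "loan_id": ["A", "B", "C"],
    "pd": [0.02, 0.05, 0.10],
    "lgd": [0.40, 0.45, 0.60],
    "ead": [100_000.0, 50_000.0, 20_000.0],
})

result = portfolio_expected_loss(loans)
total_el = result["expected_loss"].sum()   # 800 + 1125 + 1200, about 3125
print(result[["loan_id", "expected_loss", "el_share"]])
```

Note that the smallest exposure (loan C) contributes the largest expected loss here, which is why ranking by EAD alone can mislead.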
+## Stress Testing
+### Scenario-Based Stress Tests
+```python
+def run_stress_test(portfolio_returns: pd.DataFrame,
+                     scenarios: dict[str, dict]) -> pd.DataFrame:
+    """
+    Apply macroeconomic stress scenarios to a portfolio.
+    scenarios: {name: {factor: shock_value}} where factors are
+    macroeconomic variables (interest_rate, gdp_growth, unemployment, etc.)
+    """
+    # Factor sensitivities (betas from regression)
+    # In practice, estimated via historical regression
+    factor_betas = {
+        "interest_rate": -0.15,    # portfolio loses 15bp per 1% rate increase
+        "gdp_growth": 0.08,        # gains 8bp per 1% GDP growth
+        "unemployment": -0.12,     # loses 12bp per 1% unemployment increase
+        "equity_market": 0.45,     # 45bp per 1% equity market move
+        "credit_spread": -0.25,    # loses 25bp per 1% spread widening
+    }
+    results = []
+    for name, shocks in scenarios.items():
+        portfolio_impact = 0
+        for factor, shock in shocks.items():
+            beta = factor_betas.get(factor, 0)
+            portfolio_impact += beta * shock
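+        results.append({
+            "scenario": name,
+            # Shocks are in percentage points and betas in % per point,
+            # so the sum is already a percentage (no further scaling)
+            "portfolio_impact_pct": round(portfolio_impact, 2),
+            "shocks": shocks,
+        })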
+    return pd.DataFrame(results)
+# Example scenarios
+scenarios = {
+    "Mild Recession": {
+        "interest_rate": -0.5, "gdp_growth": -2.0,
+        "unemployment": 2.0, "equity_market": -15.0,
+        "credit_spread": 1.5,
+    },
+    "Severe Recession": {
+        "interest_rate": -1.0, "gdp_growth": -5.0,
+        "unemployment": 5.0, "equity_market": -40.0,
+        "credit_spread": 4.0,
+    },
+    "Rate Shock": {
+        "interest_rate": 3.0, "gdp_growth": -1.0,
+        "unemployment": 1.0, "equity_market": -10.0,
+        "credit_spread": 1.0,
+    },
+}
+```
+## Regulatory Framework
+### Basel III Capital Requirements
+| Risk Type | Measurement | Capital Charge |
+|-----------|-------------|---------------|
+| Market risk | FRTB (Fundamental Review of the Trading Book) | ES at 97.5%, stressed calibration |
+| Credit risk | SA or IRB approach | PD, LGD, EAD based risk weights |
+| Operational risk | Standardised approach (replaces the Basic Indicator, Standardised, and AMA approaches) | Business Indicator Component x ILM |
+| Liquidity risk | LCR and NSFR ratios | High-quality liquid assets buffer |
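The 97.5% Expected Shortfall level in the market risk row is close to a 99% VaR when returns are normal, which a short calculation can confirm. This is a sketch under the normality assumption only; the helper names are hypothetical.

```python
from scipy.stats import norm


def normal_var_multiplier(confidence: float) -> float:
    """VaR of a standard normal loss, in units of sigma."""
    return float(norm.ppf(confidence))


def normal_es_multiplier(confidence: float) -> float:
    """Expected Shortfall of a standard normal loss, in units of sigma."""
    alpha = 1 - confidence
    return float(norm.pdf(norm.ppf(alpha)) / alpha)


var_99 = normal_var_multiplier(0.99)     # about 2.33 sigma
es_975 = normal_es_multiplier(0.975)     # about 2.34 sigma
print(round(var_99, 4), round(es_975, 4))
```

For fat-tailed returns the two measures diverge, with ES the larger one, since it averages over the whole tail rather than reading a single quantile.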
+## Tools and Libraries
+- **QuantLib (Python/C++)**: Derivatives pricing and risk analytics
+- **riskfolio-lib**: Portfolio risk and optimization in Python
+- **arch (Python)**: GARCH models for volatility estimation
+- **pyfolio**: Portfolio performance and risk analysis
+- **OpenGamma Strata**: Open-source market risk analytics (Java)
+- **Moody's Analytics / Bloomberg PORT**: Commercial risk platforms

package/skills/domains/finance/stata-accounting-research/SKILL.md ADDED

@@ -0,0 +1,372 @@
+---
+name: stata-accounting-research
+description: "STATA code patterns for empirical accounting and finance research"
+metadata:
+  openclaw:
+    emoji: "📒"
+    category: "domains"
+    subcategory: "finance"
+    keywords: ["STATA", "accounting", "empirical finance", "panel data", "earnings management", "audit"]
+    source: "https://github.com/stata-accounting/resources"
+---
+# STATA Accounting Research Guide
+## Overview
+Empirical accounting research relies heavily on STATA for data manipulation, statistical analysis, and robustness testing. The field has developed standardized methodological approaches -- earnings quality models, event studies, difference-in-differences for regulatory changes, and instrumental-variable strategies for endogeneity -- that are implemented in a relatively stable set of STATA patterns.
+This guide provides the core STATA code patterns used in top accounting journals (The Accounting Review, Journal of Accounting Research, Journal of Accounting and Economics, and Review of Accounting Studies). These patterns are drawn from commonly used research designs in financial reporting, auditing, tax, and managerial accounting research.
+Whether you are estimating discretionary accruals, conducting an event study around an earnings announcement, testing the effect of auditor rotation on audit quality, or implementing a regulatory shock analysis, these patterns provide tested, reviewable STATA implementations.
+## Data Preparation
+### Loading and Cleaning COMPUSTAT Data
+```stata
+* ============================================================
+* COMPUSTAT Annual Data Preparation for Accounting Research
+* Standard preparation used across most empirical accounting papers
+* ============================================================
+* Load COMPUSTAT annual data
+use "compustat_annual.dta", clear
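+* Keep relevant variables (sic is needed for the industry filters below)
+keep gvkey fyear datadate sic at sale cogs xsga dp ib oancf act lct che dlc ///
+     csho prcc_f ceq re dltt txp xrd ppegt ppent invt rect
+* Set panel structure (destring leaves already-numeric variables unchanged)
+destring gvkey sic, replace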
+xtset gvkey fyear
+* --- Basic cleaning ---
+* Drop financial firms (SIC 6000-6999) and utilities (SIC 4900-4999)
+drop if inrange(sic, 6000, 6999) | inrange(sic, 4900, 4999)
+* Require minimum observations
+bysort gvkey: gen nobs = _N
+drop if nobs < 3
+drop nobs
+* --- Generate common variables ---
+* Total accruals (balance sheet approach)
+gen total_accruals = (D.act - D.che) - (D.lct - D.dlc) - dp
+* Total accruals (cash flow approach, preferred)
+gen total_accruals_cf = ib - oancf
+* Scale by lagged total assets
+gen lag_at = L.at
+gen ta_scaled = total_accruals_cf / lag_at
+gen sale_scaled = sale / lag_at
+gen ppe_scaled = ppent / lag_at
+gen dsale = D.sale / lag_at
+gen drec = D.rect / lag_at
+gen roa = ib / lag_at
+* Market value of equity
+gen mve = csho * prcc_f
+* Book-to-market ratio
+gen btm = ceq / mve
+* Leverage
+gen leverage = (dlc + dltt) / at
+* Firm size
+gen size = ln(at)
+* --- Winsorize at 1% and 99% ---
+foreach var of varlist ta_scaled sale_scaled ppe_scaled roa btm leverage size {
+    winsor2 `var', replace cuts(1 99)
+}
+* Label variables
+label var ta_scaled "Total accruals / lagged assets"
+label var roa "Return on assets"
+label var btm "Book-to-market ratio"
+label var leverage "Total debt / total assets"
+label var size "Log(total assets)"
+save "compustat_clean.dta", replace
+```
+## Earnings Quality Models
+### Modified Jones Model (Dechow et al., 1995)
+```stata
+* ============================================================
+* Modified Jones Model: Estimate discretionary accruals
+* Standard model for earnings management research
+* ============================================================
+use "compustat_clean.dta", clear
+* --- Step 1: Estimate non-discretionary accruals by industry-year ---
+* Jones (1991) model estimated cross-sectionally
+gen inv_lag_at = 1 / lag_at
+gen dsale_drec = dsale - drec  // Modified Jones adjustment
+* Estimate by 2-digit SIC and year (require >= 15 obs per group)
+gen sic2 = floor(sic / 100)
+* Cross-sectional estimation
+gen da_mj = .
+gen nda_mj = .
+levelsof fyear, local(years)
+foreach y of local years {
+    levelsof sic2 if fyear == `y', local(industries)
+    foreach ind of local industries {
+        * Count observations in this industry-year
+        count if sic2 == `ind' & fyear == `y' & !missing(ta_scaled, inv_lag_at, dsale_drec, ppe_scaled)
+        if r(N) >= 15 {
+            * Estimate Jones model
+            quietly reg ta_scaled inv_lag_at dsale_drec ppe_scaled ///
+                if sic2 == `ind' & fyear == `y', robust
+            * Predict non-discretionary accruals
+            quietly predict temp_nda if sic2 == `ind' & fyear == `y', xb
+            quietly replace nda_mj = temp_nda if sic2 == `ind' & fyear == `y'
+            drop temp_nda
+        }
+    }
+}
+* Discretionary accruals = Total accruals - Non-discretionary accruals
+replace da_mj = ta_scaled - nda_mj
+* Absolute discretionary accruals (common measure of earnings quality)
+gen abs_da = abs(da_mj)
+label var da_mj "Discretionary accruals (Modified Jones)"
+label var abs_da "Absolute discretionary accruals"
+save "accruals_data.dta", replace
+```
+### Performance-Matched Discretionary Accruals (Kothari et al., 2005)
+```stata
+* ============================================================
+* Kothari et al. (2005): Performance-adjusted discretionary accruals
+* Controls for correlation between performance and accruals
+* ============================================================
+* Add ROA to the Jones model as a regressor (regression-based variant;
+* no firm-to-firm matching on ROA is performed here)
+gen da_kothari = .
+levelsof fyear, local(years)
+foreach y of local years {
+    levelsof sic2 if fyear == `y', local(industries)
+    foreach ind of local industries {
+        count if sic2 == `ind' & fyear == `y' & !missing(ta_scaled, inv_lag_at, dsale_drec, ppe_scaled, roa)
+        if r(N) >= 15 {
+            quietly reg ta_scaled inv_lag_at dsale_drec ppe_scaled roa ///
+                if sic2 == `ind' & fyear == `y', robust
+            quietly predict temp_res if sic2 == `ind' & fyear == `y', residuals
+            quietly replace da_kothari = temp_res if sic2 == `ind' & fyear == `y'
+            drop temp_res
+        }
+    }
+}
+gen abs_da_kothari = abs(da_kothari)
+label var da_kothari "Discretionary accruals (Kothari)"
+label var abs_da_kothari "Absolute DA (Kothari)"
+```
+## Event Study
+```stata
+* ============================================================
+* Short-window event study around earnings announcements
+* Standard methodology for capital markets research
+* ============================================================
+use "crsp_daily_returns.dta", clear
+* Merge with event dates
+merge m:1 gvkey fyear using "earnings_dates.dta", keep(match) nogen
+* --- Estimation window: [-250, -30] relative to announcement ---
+* Note: date - rdq counts calendar days, not trading days
+gen event_day = date - rdq  // rdq = report date of quarterly earnings
+keep if inrange(event_day, -250, 10)
+* Estimate market model in estimation window
+gen est_window = inrange(event_day, -250, -30)
+gen event_window = inrange(event_day, -1, 1)  // 3-day window [-1, +1]
+* Market model: R_i = alpha + beta * R_m + epsilon
+bysort permno fyear: egen has_enough = total(est_window)
+keep if has_enough >= 100  // Require 100+ days in estimation window
+* Estimate market model parameters
+gen alpha = .
+gen beta_mkt = .
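+* One regression per firm-event, so parameters are not pooled across
+* a firm's announcements
+egen event_id = group(permno fyear)
+levelsof event_id, local(events)
+foreach e of local events {
+    capture quietly reg ret mktrf if event_id == `e' & est_window == 1
+    if _rc == 0 {
+        quietly replace alpha = _b[_cons] if event_id == `e'
+        quietly replace beta_mkt = _b[mktrf] if event_id == `e'
+    }
+}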
+* Abnormal returns
+gen ar = ret - (alpha + beta_mkt * mktrf)
+* Cumulative abnormal returns [-1, +1]
+bysort permno fyear (event_day): egen car_3day = total(ar) if event_window == 1
+* Cross-sectional test
+preserve
+    keep if event_day == 0
+    * t-test: Is average CAR different from zero?
+    ttest car_3day == 0
+    * Regression with controls
+    reg car_3day surprise size btm, robust
+restore
+```
+## Regression Specifications
+### Standard Panel Regression with Fixed Effects
+```stata
+* ============================================================
+* Standard regression specification for accounting research
+* Includes firm and year fixed effects, clustered standard errors
+* ============================================================
+use "merged_analysis_data.dta", clear
+* --- Main specification ---
+* DV: Absolute discretionary accruals (earnings quality)
+* Key IV: Big 4 auditor indicator
+* Model 1: Pooled OLS (baseline, for comparison only)
+reg abs_da big4 size leverage btm roa loss, robust
+estimates store m1
+* Model 2: Year fixed effects
+reg abs_da big4 size leverage btm roa loss i.fyear, robust
+estimates store m2
+* Model 3: Industry + Year fixed effects
+reg abs_da big4 size leverage btm roa loss i.sic2 i.fyear, robust
+estimates store m3
+* Model 4: Firm + Year fixed effects (preferred specification)
+reghdfe abs_da big4 size leverage btm roa loss, absorb(gvkey fyear) ///
+    cluster(gvkey)
+estimates store m4
+* Model 5: Firm + Year FE, two-way clustering (firm and year)
+reghdfe abs_da big4 size leverage btm roa loss, absorb(gvkey fyear) ///
+    cluster(gvkey fyear)
+estimates store m5
+* --- Output table ---
+esttab m1 m2 m3 m4 m5 using "table_main.tex", replace ///
+    star(* 0.10 ** 0.05 *** 0.01) ///
+    b(%9.4f) se(%9.4f) ///
+    stats(N r2 r2_a, fmt(%9.0g %9.4f %9.4f) ///
+        labels("Observations" "R-squared" "Adj. R-squared")) ///
+    title("Effect of Auditor Type on Earnings Quality") ///
+    label booktabs
+```
+## Robustness Tests
+### Propensity Score Matching
+```stata
+* ============================================================
+* Propensity Score Matching (PSM) for endogeneity concerns
+* Used when treatment assignment (e.g., Big 4 auditor) is not random
+* ============================================================
+* Step 1: Estimate propensity score
+logit big4 size leverage btm roa loss age_firm, robust
+predict pscore, pr
+* Step 2: Common support check
+gen cs = pscore >= 0.1 & pscore <= 0.9  // Trim extreme propensity scores
+* Step 3: Nearest-neighbor matching (1:1, without replacement)
+* psmatch2 matches with replacement unless noreplacement is specified
+psmatch2 big4 size leverage btm roa loss if cs == 1, ///
+    outcome(abs_da) neighbor(1) caliper(0.01) common noreplacement
+* Check covariate balance after matching
+pstest size leverage btm roa loss, both
+* Step 4: Re-estimate on matched sample
+gen matched = _weight != .
+reg abs_da big4 size leverage btm roa loss if matched == 1, robust
+```
+### Heckman Selection Model
+```stata
+* ============================================================
+* Heckman two-stage model for sample selection bias
+* Example: Analyst coverage → Earnings quality
+* ============================================================
+* First stage: Selection equation (what determines analyst coverage?)
+probit analyst_covered size btm roa institutional_ownership sp500 ///
+    exchange_listed, robust
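+* Second stage: Outcome equation with inverse Mills ratio
+* analyst_covered is the selection indicator, so it cannot also be a regressor:
+* it equals 1 for every selected observation. To estimate the effect of coverage
+* itself, an endogenous-treatment model (etregress) is the usual alternative.
+heckman abs_da size leverage btm roa, ///
+    select(analyst_covered = size btm roa institutional_ownership ///
+           sp500 exchange_listed) ///
+    twostep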
+```
+## Publication-Ready Output
+```stata
+* ============================================================
+* Generating publication-ready tables and statistics
+* ============================================================
+* Summary statistics table
+estpost summarize abs_da big4 size leverage btm roa loss, detail
+esttab using "table_sumstats.tex", replace ///
+    cells("count mean sd p25 p50 p75") ///
+    label booktabs title("Summary Statistics")
+* Correlation matrix
+pwcorr abs_da big4 size leverage btm roa, star(0.05) sig
+estpost correlate abs_da big4 size leverage btm roa, matrix listwise
+esttab using "table_corr.tex", replace unstack not noobs ///
+    label booktabs title("Correlation Matrix")
+* Univariate comparison (treatment vs. control)
+ttest abs_da, by(big4) unequal
+ranksum abs_da, by(big4)
+```
+## Best Practices
+- **Always cluster standard errors** by firm (at minimum) in panel data. Two-way clustering by firm and year is increasingly required by reviewers.
+- **Use `reghdfe`** for high-dimensional fixed effects. It is faster and more memory-efficient than `areg` or `xtreg, fe`.
+- **Report economic magnitude.** For example, state that a one-standard-deviation change in X is associated with a Y% change in the dependent variable, rather than reporting statistical significance alone.
+- **Include all robustness tests** that reviewers expect: PSM, Heckman, placebo tests, entropy balancing, and alternative variable definitions.
+- **Winsorize at 1% and 99%** as a default; report results at 5%/95% as a robustness check.
+- **Use `eststo` and `esttab`** for consistent, automated table generation. Never hand-type regression results.
+## References
+- Dechow, P. M., Sloan, R. G., & Sweeney, A. P. (1995). Detecting Earnings Management. The Accounting Review, 70(2), 193-225.
+- Kothari, S. P., Leone, A. J., & Wasley, C. E. (2005). Performance Matched Discretionary Accrual Measures. Journal of Accounting and Economics, 39(1), 163-197.
+- [WRDS (Wharton Research Data Services)](https://wrds-www.wharton.upenn.edu/) -- Standard data platform for accounting/finance research
+- [reghdfe documentation](https://github.com/sergiocorreia/reghdfe) -- Fast fixed-effects estimation in STATA
+- Gow, I. D., Ormazabal, G., & Taylor, D. J. (2010). Correcting for Cross-Sectional and Time-Series Dependence in Accounting Research. The Accounting Review, 85(2), 483-512.