process.stepwise
- process.stepwise(model, pvalue_limit: float = 0.05)
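Stepwise process for Statsmodels regression models. As the usage example shows, the function re-estimates the model repeatedly, discarding a predictor whose p-value exceeds pvalue_limit in each round, until no such predictor remains. It then prints the summary of the reduced model and returns it. The intercept is kept even when its p-value exceeds the limit, as in the final model of the example.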
Usage example
In [1]: import statsmodels.api as sm

In [2]: from statstests.datasets import empresas

In [3]: from statstests.process import stepwise

# import empresas dataset
In [4]: df = empresas.get_data()

# Estimate and fit model
In [5]: model = sm.OLS.from_formula("retorno ~ disclosure + endividamento + ativos + liquidez", df).fit()

# Print summary
In [6]: print(model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                retorno   R-squared:                       0.833
Model:                            OLS   Adj. R-squared:                  0.827
Method:                 Least Squares   F-statistic:                     147.9
Date:                Thu, 20 Oct 2022   Prob (F-statistic):           3.35e-45
Time:                        14:03:22   Log-Likelihood:                -401.07
No. Observations:                 124   AIC:                             812.1
Df Residuals:                     119   BIC:                             826.2
Df Model:                           4
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         6.0506      4.080      1.483      0.141      -2.028      14.129
disclosure        0.1067      0.048      2.227      0.028       0.012       0.202
endividamento    -0.0882      0.051     -1.723      0.087      -0.190       0.013
ativos            0.0035      0.001      5.134      0.000       0.002       0.005
liquidez          1.9762      0.396      4.987      0.000       1.191       2.761
==============================================================================
Omnibus:                       35.509   Durbin-Watson:                   2.065
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.127
Skew:                          -0.136   Prob(JB):                       0.0283
Kurtosis:                       1.858   Cond. No.                     2.94e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.94e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

# Stepwise process
In [7]: stepwise(model, pvalue_limit=0.05)
Regression type: OLS

Estimating model...:
 retorno ~ disclosure + endividamento + ativos + liquidez

 Discarding atribute "endividamento" with p-value equal to 0.08749071283026419

Estimating model...:
 retorno ~ disclosure + ativos + liquidez

 Discarding atribute "disclosure" with p-value equal to 0.06514029954311086

Estimating model...:
 retorno ~ ativos + liquidez

 No more atributes with p-value higher than 0.05

 Atributes discarded on the process...:

{'atribute': 'endividamento', 'p-value': 0.08749071283026419}
{'atribute': 'disclosure', 'p-value': 0.06514029954311086}

 Model after stepwise process...:
 retorno ~ ativos + liquidez

                            OLS Regression Results
==============================================================================
Dep. Variable:                retorno   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.820
Method:                 Least Squares   F-statistic:                     282.1
Date:                Thu, 20 Oct 2022   Prob (F-statistic):           2.76e-46
Time:                        14:03:22   Log-Likelihood:                -404.37
No. Observations:                 124   AIC:                             814.7
Df Residuals:                     121   BIC:                             823.2
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.5348      2.341     -1.083      0.281      -7.169       2.100
ativos         0.0040      0.001      7.649      0.000       0.003       0.005
liquidez       2.7391      0.258     10.637      0.000       2.229       3.249
==============================================================================
Omnibus:                       23.591   Durbin-Watson:                   1.926
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                5.887
Skew:                          -0.087   Prob(JB):                       0.0527
Kurtosis:                       1.947   Cond. No.                     1.65e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.65e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

Out[7]: <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ff9aa456460>
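- Parameters:
  - model: Statsmodels model. A fitted regression model, such as the object returned by .fit() in the usage example.
  - pvalue_limit: float, default 0.05. P-value threshold; predictors with a p-value higher than this limit are discarded.
- Returns:
  - model: Stepwised model. The model re-estimated on the remaining predictors (a RegressionResultsWrapper in the usage example).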
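
The elimination loop traced in the usage example can be sketched in plain Python. This is only an illustration, not the statstests implementation: backward_eliminate and its fit_pvalues callback are hypothetical names, and dropping the single largest p-value per round is an assumption that happens to match the trace. The p-values are copied from the output above (the summary table rounds them to three decimals) rather than re-estimated.

```python
from typing import Callable, Dict, List, Tuple


def backward_eliminate(
    predictors: List[str],
    fit_pvalues: Callable[[Tuple[str, ...]], Dict[str, float]],
    pvalue_limit: float = 0.05,
) -> Tuple[List[str], List[Dict[str, object]]]:
    """Hypothetical helper: drop predictors until all p-values <= pvalue_limit."""
    kept = list(predictors)
    discarded: List[Dict[str, object]] = []
    while kept:
        # "Estimate" the model on the predictors that are still kept.
        pvalues = fit_pvalues(tuple(kept))
        # Assumption: remove only the least significant predictor per round.
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= pvalue_limit:
            break  # every remaining predictor is within the limit
        discarded.append({"attribute": worst, "p-value": pvalues[worst]})
        kept.remove(worst)
    return kept, discarded


# P-values taken from the example output; the intercept is left out on purpose
# because the example keeps it regardless of its p-value.
EXAMPLE_PVALUES = {
    ("disclosure", "endividamento", "ativos", "liquidez"): {
        "disclosure": 0.028,
        "endividamento": 0.08749071283026419,
        "ativos": 0.000,
        "liquidez": 0.000,
    },
    ("disclosure", "ativos", "liquidez"): {
        "disclosure": 0.06514029954311086,
        "ativos": 0.000,
        "liquidez": 0.000,
    },
    ("ativos", "liquidez"): {"ativos": 0.000, "liquidez": 0.000},
}

kept, discarded = backward_eliminate(
    ["disclosure", "endividamento", "ativos", "liquidez"],
    EXAMPLE_PVALUES.__getitem__,  # stand-in for refitting the model
)
print("retorno ~ " + " + ".join(kept))
for item in discarded:
    print(item)
```

With a real model, fit_pvalues could refit sm.OLS.from_formula on the kept predictors and return the pvalues of the fitted results without the Intercept entry. Note that disclosure is significant in the full model (0.028) and only crosses the limit after endividamento is removed, which is why the model must be re-estimated in every round.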