statstests.process.stepwise

statstests.process.stepwise(model, pvalue_limit: float = 0.05)

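Stepwise process for Statsmodels regression models. Starting from the fitted model, predictors whose p-value exceeds pvalue_limit are discarded one at a time, and the model is re-estimated after each removal until no remaining predictor has a p-value above the limit.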

Usage example

In [1]: import statsmodels.api as sm

In [2]: from statstests.datasets import empresas

In [3]: from statstests.process import stepwise

# Load the empresas dataset
In [4]: df = empresas.get_data()

# Estimate and fit model
In [5]: model = sm.OLS.from_formula("retorno ~ disclosure + endividamento + ativos + liquidez", df).fit()

# Print summary
In [6]: print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                retorno   R-squared:                       0.833
Model:                            OLS   Adj. R-squared:                  0.827
Method:                 Least Squares   F-statistic:                     147.9
Date:                Thu, 20 Oct 2022   Prob (F-statistic):           3.35e-45
Time:                        14:03:22   Log-Likelihood:                -401.07
No. Observations:                 124   AIC:                             812.1
Df Residuals:                     119   BIC:                             826.2
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         6.0506      4.080      1.483      0.141      -2.028      14.129
disclosure        0.1067      0.048      2.227      0.028       0.012       0.202
endividamento    -0.0882      0.051     -1.723      0.087      -0.190       0.013
ativos            0.0035      0.001      5.134      0.000       0.002       0.005
liquidez          1.9762      0.396      4.987      0.000       1.191       2.761
==============================================================================
Omnibus:                       35.509   Durbin-Watson:                   2.065
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                7.127
Skew:                          -0.136   Prob(JB):                       0.0283
Kurtosis:                       1.858   Cond. No.                     2.94e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.94e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

# Stepwise process
In [7]: stepwise(model, pvalue_limit=0.05)
Regression type: OLS 

Estimating model...: 
 retorno ~ disclosure + endividamento + ativos + liquidez

 Discarding atribute "endividamento" with p-value equal to 0.08749071283026419 

Estimating model...: 
 retorno ~ disclosure + ativos + liquidez

 Discarding atribute "disclosure" with p-value equal to 0.06514029954311086 

Estimating model...: 
 retorno ~ ativos + liquidez

 No more atributes with p-value higher than 0.05

 Atributes discarded on the process...: 

{'atribute': 'endividamento', 'p-value': 0.08749071283026419}
{'atribute': 'disclosure', 'p-value': 0.06514029954311086}

 Model after stepwise process...: 
 retorno ~ ativos + liquidez 

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                retorno   R-squared:                       0.823
Model:                            OLS   Adj. R-squared:                  0.820
Method:                 Least Squares   F-statistic:                     282.1
Date:                Thu, 20 Oct 2022   Prob (F-statistic):           2.76e-46
Time:                        14:03:22   Log-Likelihood:                -404.37
No. Observations:                 124   AIC:                             814.7
Df Residuals:                     121   BIC:                             823.2
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.5348      2.341     -1.083      0.281      -7.169       2.100
ativos         0.0040      0.001      7.649      0.000       0.003       0.005
liquidez       2.7391      0.258     10.637      0.000       2.229       3.249
==============================================================================
Omnibus:                       23.591   Durbin-Watson:                   1.926
Prob(Omnibus):                  0.000   Jarque-Bera (JB):                5.887
Skew:                          -0.087   Prob(JB):                       0.0527
Kurtosis:                       1.947   Cond. No.                     1.65e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.65e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
Out[7]: <statsmodels.regression.linear_model.RegressionResultsWrapper at 0x7ff9aa45d490>
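
Parameters:
    model : Statsmodels model
        Fitted regression model, as returned by .fit().
    pvalue_limit : float, default 0.05
        Predictors with a p-value higher than this limit are discarded.

Returns:
    model : Statsmodels model
        The model re-estimated with only the predictors kept by the stepwise process.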
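
The elimination loop seen in the output above can be sketched in a few lines. This is only an illustration of the general backward-elimination idea, not the library's actual implementation: the function `backward_eliminate` and its `pvalues_for` callback are hypothetical names, and the rule of dropping the predictor with the largest p-value first is an assumption. To keep the sketch runnable without the dataset, the p-values are copied (rounded) from the example output rather than re-estimated.

```python
from typing import Callable, Dict, List, Tuple


def backward_eliminate(
    response: str,
    predictors: List[str],
    pvalues_for: Callable[[str], Dict[str, float]],
    pvalue_limit: float = 0.05,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Drop one predictor at a time until every p-value is <= pvalue_limit.

    `pvalues_for` is a hypothetical callback: given a formula string it
    returns {predictor name: p-value}, with the intercept excluded.
    """
    remaining = list(predictors)
    discarded: List[Tuple[str, float]] = []
    while remaining:
        formula = f"{response} ~ {' + '.join(remaining)}"
        pvalues = pvalues_for(formula)
        # Assumption: the least significant predictor is removed first.
        worst = max(remaining, key=lambda name: pvalues[name])
        if pvalues[worst] <= pvalue_limit:
            break
        discarded.append((worst, pvalues[worst]))
        remaining.remove(worst)
    rhs = " + ".join(remaining) if remaining else "1"
    return f"{response} ~ {rhs}", discarded


# P-values keyed by formula, rounded from the example output above.
# The intermediate-model values for ativos and liquidez are not printed
# there; they are assumed to stay below the limit.
EXAMPLE_PVALUES = {
    "retorno ~ disclosure + endividamento + ativos + liquidez": {
        "disclosure": 0.028, "endividamento": 0.0875,
        "ativos": 0.000, "liquidez": 0.000,
    },
    "retorno ~ disclosure + ativos + liquidez": {
        "disclosure": 0.0651, "ativos": 0.000, "liquidez": 0.000,
    },
    "retorno ~ ativos + liquidez": {"ativos": 0.000, "liquidez": 0.000},
}

formula, dropped = backward_eliminate(
    "retorno",
    ["disclosure", "endividamento", "ativos", "liquidez"],
    EXAMPLE_PVALUES.__getitem__,
)
print(formula)  # retorno ~ ativos + liquidez
print(dropped)  # [('endividamento', 0.0875), ('disclosure', 0.0651)]
```

With real data, the callback could wrap Statsmodels directly, for example `lambda f: sm.OLS.from_formula(f, df).fit().pvalues.drop("Intercept").to_dict()`. Passing the fitting step in as a callback keeps the loop independent of the regression type.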
