tests.vuong_test

tests.vuong_test(m1: Poisson | NegativeBinomial, m2: ZeroInflatedPoisson | ZeroInflatedNegativeBinomialP)

Performs the Vuong test for detecting zero inflation in count-data regression models.

The vuong_test function in the statstests.tests module reports the results of the Vuong test.

The Vuong (1989) test compares a zero-inflated model (ZIP or ZINB) against its non-inflated counterpart (Poisson or negative binomial, respectively). Under the null hypothesis that the two models fit equally well, the test statistic is asymptotically standard normal; large values favor the ZIP or ZINB model over the Poisson or negative binomial model, respectively.
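To make the statistic concrete, here is a minimal sketch of the textbook, uncorrected form of the Vuong statistic, computed from per-observation log-likelihoods. It is not the statstests implementation, which may apply corrections or use a different variance estimator. The function name vuong_statistic, the population standard deviation, the one-sided p-value, and the demo values are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, fmean, pstdev


def vuong_statistic(ll_base, ll_zi):
    """Return (z, p_value) for an uncorrected Vuong-type comparison.

    ll_base: per-observation log-likelihoods of the base model
             (Poisson or negative binomial).
    ll_zi:   per-observation log-likelihoods of the zero-inflated model
             (ZIP or ZINB), in the same observation order.
    """
    if len(ll_base) != len(ll_zi):
        raise ValueError("both models must cover the same observations")
    # Pointwise log-likelihood ratio, zero-inflated minus base.
    m = [zi - base for base, zi in zip(ll_base, ll_zi)]
    n = len(m)
    # Assumption: population standard deviation (divisor n).
    z = math.sqrt(n) * fmean(m) / pstdev(m)
    # One-sided p-value: large positive z favors the zero-inflated model.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Made-up log-likelihood values, for illustration only.
ll_base_demo = [-2.0, -3.0, -4.0, -5.0]
ll_zi_demo = [-1.5, -2.0, -3.5, -4.0]
z_demo, p_demo = vuong_statistic(ll_base_demo, ll_zi_demo)
print(z_demo, p_demo)
```

With this sign convention, a positive statistic means the zero-inflated model assigns higher likelihood to the observed data on average.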

In [1]: import pandas as pd

In [2]: import statsmodels.api as sm

In [3]: from statstests.datasets import corruption

In [4]: from statstests.tests import vuong_test

In [5]: from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP, ZeroInflatedPoisson

In [6]: from statsmodels.discrete.discrete_model import Poisson, NegativeBinomial

In [7]: import warnings

In [8]: warnings.filterwarnings('ignore')

# Load the corruption dataset
In [9]: df = corruption.get_data()

# Define the dependent variable (from the corruption dataset 'df')
In [10]: y = df.violations

# Define the predictors that enter the count component
In [11]: x = df[['staff','post','corruption']]

In [12]: X = sm.add_constant(x)

In [13]: X = pd.get_dummies(X, columns=['post'], drop_first=True)

In [14]: X["post_yes"] = X["post_yes"].astype("int")

# Fit the Poisson model
In [15]: modelo_poisson = Poisson(endog=y, exog=X).fit()
Optimization terminated successfully.
         Current function value: 6.952328
         Iterations 9

# Parameters of modelo_poisson
In [16]: print(modelo_poisson.summary())
                          Poisson Regression Results                          
==============================================================================
Dep. Variable:             violations   No. Observations:                  298
Model:                        Poisson   Df Residuals:                      294
Method:                           MLE   Df Model:                            3
Date:                Mon, 02 Feb 2026   Pseudo R-squ.:                  0.3992
Time:                        11:29:15   Log-Likelihood:                -2071.8
converged:                       True   LL-Null:                       -3448.6
Covariance Type:            nonrobust   LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.2127      0.031     71.134      0.000       2.152       2.274
staff          0.0219      0.001     17.807      0.000       0.019       0.024
corruption     0.3418      0.027     12.430      0.000       0.288       0.396
post_yes      -4.2968      0.197    -21.762      0.000      -4.684      -3.910
==============================================================================

# Fit the negative binomial model
In [17]: modelo_bneg = NegativeBinomial(endog=y, exog=X, loglike_method='nb2').fit()
Optimization terminated successfully.
         Current function value: 1.904031
         Iterations: 19
         Function evaluations: 23
         Gradient evaluations: 23

# Parameters of modelo_bneg
In [18]: print(modelo_bneg.summary())
                     NegativeBinomial Regression Results                      
==============================================================================
Dep. Variable:             violations   No. Observations:                  298
Model:               NegativeBinomial   Df Residuals:                      294
Method:                           MLE   Df Model:                            3
Date:                Mon, 02 Feb 2026   Pseudo R-squ.:                  0.1549
Time:                        11:29:15   Log-Likelihood:                -567.40
converged:                       True   LL-Null:                       -671.37
Covariance Type:            nonrobust   LLR p-value:                 8.088e-45
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.9469      0.205      9.477      0.000       1.544       2.350
staff          0.0400      0.014      2.945      0.003       0.013       0.067
corruption     0.4527      0.133      3.396      0.001       0.191       0.714
post_yes      -4.2746      0.266    -16.065      0.000      -4.796      -3.753
alpha          2.0963      0.243      8.614      0.000       1.619       2.573
==============================================================================

# Define the predictors that enter the count component
In [19]: x1 = df[['staff','post','corruption']]

In [20]: X1 = sm.add_constant(x1)

# Define the predictors that enter the logit (inflate) component
In [21]: x2 = df[['corruption']]

In [22]: X2 = sm.add_constant(x2)

# Fitting the model without dummy-encoding the categorical variables
# raises an error
In [23]: X1 = pd.get_dummies(X1, columns=['post'], drop_first=True, dtype='int')

# Fit the ZIP model with the 'ZeroInflatedPoisson' class from 'statsmodels'.
# The 'exog_infl' argument holds the variables that enter the logit (inflate)
# component.
# modelo_zip = ZeroInflatedPoisson(y, 
#                                 X1, 
#                                 exog_infl=X2,
#                                 inflation='logit').fit(maxiter=1000000000)
# # Model parameters
# print(modelo_zip.summary())
# vuong_test(modelo_poisson, modelo_zip)
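The count and logit (inflate) components used above can be illustrated with a small sketch of the zero-inflated Poisson probability mass function. It is not taken from statsmodels or statstests. The helper names zip_pmf and logistic and the two linear-predictor values are hypothetical. The sketch assumes the usual parameterization, in which the inflate component gives the probability of a structural zero through a logistic link and the count component gives the Poisson mean through a log link.

```python
import math


def logistic(eta):
    """Map a linear predictor to a probability (logit link inverse)."""
    return 1.0 / (1.0 + math.exp(-eta))


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    lam: Poisson mean from the count component.
    pi:  probability of a structural zero from the inflate component.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from either the inflate part or the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


# Hypothetical linear predictors for a single observation.
pi_demo = logistic(-1.0)   # inflate component (plays the role of X2 @ gamma)
lam_demo = math.exp(0.5)   # count component (plays the role of X1 @ beta)
probs = [zip_pmf(k, lam_demo, pi_demo) for k in range(60)]
print(probs[0], sum(probs))
```

Whenever pi is positive, the probability of a zero exceeds the plain Poisson value exp(-lam). That excess is what a zero-inflation test is looking for.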