tests.vuong_test
tests.vuong_test(m1: Poisson | NegativeBinomial, m2: ZeroInflatedPoisson | ZeroInflatedNegativeBinomialP)
Performs the Vuong test to identify zero inflation in count-data regression models.
The vuong_test function of the statstests.tests package reports the results of the Vuong test.
The function reports the Vuong (1989) test of a ZIP model against a Poisson model, or of a ZINB model against a negative binomial model. Under the null hypothesis that the two models fit equally well, the test statistic is asymptotically standard normal; large positive values favor the ZIP or ZINB model over the Poisson or negative binomial model, respectively.
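To make the statistic concrete, the following is a minimal sketch of the uncorrected Vuong statistic computed from per-observation log-likelihoods. It is not the statstests implementation, which may apply corrections or use different conventions. The function name `vuong_statistic`, its arguments, and the toy log-likelihood values are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import math

import numpy as np


def vuong_statistic(ll_base, ll_zi):
    """Return (statistic, one-sided p-value) for a Vuong-style comparison.

    ll_base: per-observation log-likelihoods of the Poisson or NB model.
    ll_zi:   per-observation log-likelihoods of the ZIP or ZINB model.
    Hypothetical helper for illustration only.
    """
    ll_base = np.asarray(ll_base, dtype=float)
    ll_zi = np.asarray(ll_zi, dtype=float)
    if ll_base.ndim != 1 or ll_base.shape != ll_zi.shape:
        raise ValueError("inputs must be 1-D arrays of equal length")

    # Pointwise log-likelihood ratio: positive values favor the
    # zero-inflated model for that observation.
    m = ll_zi - ll_base
    n = m.size

    # Assumption: sample standard deviation (ddof=1), no correction term.
    stat = float(math.sqrt(n) * m.mean() / m.std(ddof=1))

    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))
    return stat, p_value


# Toy values, made up for illustration (not taken from the corruption data).
ll_poisson = np.array([-2.0, -1.5, -3.0, -2.5])
ll_zip = np.array([-1.0, -1.0, -2.0, -2.0])
stat, p_value = vuong_statistic(ll_poisson, ll_zip)
print(f"Vuong statistic: {stat:.3f}, one-sided p-value: {p_value:.2e}")
```

With fitted statsmodels models, the per-observation log-likelihoods would typically come from something like `results.model.loglikeobs(results.params)`; treat that as an assumption to check against the installed statsmodels version.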
In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: from statstests.datasets import corruption
In [4]: from statstests.tests import vuong_test
In [5]: from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP, ZeroInflatedPoisson
In [6]: from statsmodels.discrete.discrete_model import Poisson, NegativeBinomial
In [7]: import warnings
In [8]: warnings.filterwarnings('ignore')

# Import the corruption dataset
In [9]: df = corruption.get_data()

# Define the dependent variable (going back to the 'df_corruption' dataset)
In [10]: y = df.violations

# Define the predictor variables that enter the count component
In [11]: x = df[['staff','post','corruption']]
In [12]: X = sm.add_constant(x)
In [13]: X = pd.get_dummies(X, columns=['post'], drop_first=True)
In [14]: X["post_yes"] = X["post_yes"].astype("int")

# Fit the Poisson model
In [15]: modelo_poisson = Poisson(endog=y, exog=X).fit()
Optimization terminated successfully.
         Current function value: 6.952328
         Iterations 9

# Parameters of modelo_poisson
In [16]: print(modelo_poisson.summary())
                          Poisson Regression Results
==============================================================================
Dep. Variable:             violations   No. Observations:                  298
Model:                        Poisson   Df Residuals:                      294
Method:                           MLE   Df Model:                            3
Date:                Mon, 02 Feb 2026   Pseudo R-squ.:                  0.3992
Time:                        11:29:15   Log-Likelihood:                -2071.8
converged:                       True   LL-Null:                       -3448.6
Covariance Type:            nonrobust   LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.2127      0.031     71.134      0.000       2.152       2.274
staff          0.0219      0.001     17.807      0.000       0.019       0.024
corruption     0.3418      0.027     12.430      0.000       0.288       0.396
post_yes      -4.2968      0.197    -21.762      0.000      -4.684      -3.910
==============================================================================

# Fit the negative binomial model
In [17]: modelo_bneg = NegativeBinomial(endog=y, exog=X, loglike_method='nb2').fit()
Optimization terminated successfully.
         Current function value: 1.904031
         Iterations: 19
         Function evaluations: 23
         Gradient evaluations: 23

# Parameters of modelo_bneg
In [18]: print(modelo_bneg.summary())
                     NegativeBinomial Regression Results
==============================================================================
Dep. Variable:             violations   No. Observations:                  298
Model:               NegativeBinomial   Df Residuals:                      294
Method:                           MLE   Df Model:                            3
Date:                Mon, 02 Feb 2026   Pseudo R-squ.:                  0.1549
Time:                        11:29:15   Log-Likelihood:                -567.40
converged:                       True   LL-Null:                       -671.37
Covariance Type:            nonrobust   LLR p-value:                 8.088e-45
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          1.9469      0.205      9.477      0.000       1.544       2.350
staff          0.0400      0.014      2.945      0.003       0.013       0.067
corruption     0.4527      0.133      3.396      0.001       0.191       0.714
post_yes      -4.2746      0.266    -16.065      0.000      -4.796      -3.753
alpha          2.0963      0.243      8.614      0.000       1.619       2.573
==============================================================================

# Define the predictor variables that enter the count component
In [19]: x1 = df[['staff','post','corruption']]
In [20]: X1 = sm.add_constant(x1)

# Define the predictor variables that enter the logit (inflate) component
In [21]: x2 = df[['corruption']]
In [22]: X2 = sm.add_constant(x2)

# If the model is fitted without dummy-coding the categorical variables,
# it raises an error
In [23]: X1 = pd.get_dummies(X1, columns=['post'], drop_first=True, dtype='int')

# Fit the ZIP model with the 'ZeroInflatedPoisson' class from the 'statsmodels' package
# The 'exog_infl' argument holds the variables that enter the logit (inflate) component
# modelo_zip = ZeroInflatedPoisson(y,
#                                  X1,
#                                  exog_infl=X2,
#                                  inflation='logit').fit(maxiter=1000000000)
# # Model parameters
# print(modelo_zip.summary())
# vuong_test(modelo_poisson, modelo_zip)