Abstract
Background: For sensitive questions, such as the number of alcoholics a respondent knows, the majority of respondents may answer zero. The Poisson regression model (P) is the standard tool for analyzing count data. However, P fits poorly when counts are zero-inflated and over-dispersed. Therefore, this study aimed to compare the performance of alternative count regression models and to investigate whether respondents' characteristics affect their responses.
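The two symptoms named in the Background, over-dispersion and excess zeros, can each be checked with one number. The following is a minimal sketch that assumes a hypothetical toy sample (not the study's responses) and uses illustrative function names. It computes the variance-to-mean ratio, which is close to 1 for Poisson counts, and compares the observed share of zeros with the share exp(-mean) that a Poisson with the same mean would predict.

```python
import math
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, > 1 signals over-dispersion."""
    m = mean(counts)
    return pvariance(counts, m) / m


def zero_excess(counts):
    """Observed share of zeros vs. the share a Poisson with the same mean predicts."""
    observed = sum(1 for y in counts if y == 0) / len(counts)
    expected = math.exp(-mean(counts))  # P(Y = 0) under Poisson(mean)
    return observed, expected


# Hypothetical toy responses (NOT the study data): many zeros plus a long right tail.
toy = [0] * 12 + [1, 1, 2, 3, 5, 8, 10, 15]
print(round(dispersion_index(toy), 2))  # 7.28
observed_zero, expected_zero = zero_excess(toy)
print(round(observed_zero, 2), round(expected_zero, 3))  # 0.6 0.105
```

In this toy sample the variance is about seven times the mean, and zeros are nearly six times more frequent than a Poisson with the same mean allows. This is the pattern that motivates the NB and zero-inflated alternatives.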
Methods: A total of 700 participants were asked how many people they knew in three hidden groups: alcoholics, methadone users, and Female Sex Workers (FSWs). Five regression models were fitted to these outcomes: Logistic, P, Negative Binomial (NB), Zero-Inflated Poisson (ZIP), and Zero-Inflated Negative Binomial (ZINB). The models were compared using the Likelihood Ratio Test (LRT), the Vuong test, the Akaike Information Criterion (AIC), and the Sum of Squared Errors (SSE).
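The count models in the Methods differ only in the probability they assign to each count, and AIC and the Vuong test compare those probabilities. The following minimal Python sketch makes that comparison explicit under several stated assumptions. The models are intercept-only (no covariates) and NB uses the NB2 parameterization (variance mu + alpha * mu**2). The parameter values are picked by hand instead of being estimated by maximum likelihood. All names are hypothetical, and the logistic model and SSE are left out.

```python
import math
from statistics import mean, stdev


def logpmf_poisson(y, mu):
    # log P(Y = y) for a Poisson with mean mu
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def logpmf_nb(y, mu, alpha):
    # NB2 (assumed): mean mu, variance mu + alpha * mu**2, size r = 1 / alpha
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def logpmf_zi(y, pi, base_logpmf, *params):
    # Zero inflation: a structural zero with probability pi, otherwise the base model
    if y == 0:
        return math.log(pi + (1 - pi) * math.exp(base_logpmf(0, *params)))
    return math.log(1 - pi) + base_logpmf(y, *params)


def loglik(ys, logpmf):
    # Total log-likelihood of the sample under one model
    return sum(logpmf(y) for y in ys)


def aic(loglik_value, n_params):
    # Akaike information criterion: smaller is better
    return 2 * n_params - 2 * loglik_value


def vuong(ys, logpmf_1, logpmf_2):
    # Vuong statistic for non-nested models; large positive values favour model 1
    # (compare against standard normal quantiles)
    m = [logpmf_1(y) - logpmf_2(y) for y in ys]
    return math.sqrt(len(m)) * mean(m) / stdev(m)


# Hypothetical toy responses (NOT the study data)
toy = [0] * 12 + [1, 1, 2, 3, 5, 8, 10, 15]

# Hand-picked, hypothetical parameters; each model has mean 2.25.
# Tuple = (per-observation log-pmf, number of estimated parameters)
models = {
    "P": (lambda y: logpmf_poisson(y, 2.25), 1),
    "NB": (lambda y: logpmf_nb(y, 2.25, 2.0), 2),
    "ZIP": (lambda y: logpmf_zi(y, 0.55, logpmf_poisson, 5.0), 2),
    "ZINB": (lambda y: logpmf_zi(y, 0.4, logpmf_nb, 3.75, 0.8), 3),
}
for name, (f, k) in models.items():
    print(name, round(aic(loglik(toy, f), k), 1))
print("Vuong NB vs P:", round(vuong(toy, models["NB"][0], models["P"][0]), 2))
```

In an actual analysis, each lambda would be replaced by the fitted model's per-observation log-likelihood, with covariates such as age, sex and education. For the LRT of P against NB, the statistic is 2(logL_NB - logL_P). Because alpha = 0 lies on the boundary of the parameter space, the chi-square(1) p-value is commonly halved.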
Results: The percentages of zero responses were 35% for the number of alcoholics, 50% for methadone users, and 65% for FSWs. ZINB provided the best fit for the number of alcoholics, and NB provided the best fit for the other two outcomes. In addition, young respondents, male respondents, and those with low education were more likely to know, or to reveal, sensitive information.
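A short arithmetic check helps interpret these zero percentages. If a count followed a Poisson distribution with mean mu, its zero probability would be exp(-mu). An observed zero share p_0 is therefore consistent with a Poisson only if the mean equals mu_0 = -ln p_0. The following applies this to the reported percentages:

```latex
\begin{aligned}
P(Y = 0) &= e^{-\mu} \;\Longrightarrow\; \mu_0 = -\ln p_0 \\
p_0 = 0.35 &\;\Rightarrow\; \mu_0 \approx 1.05 \quad \text{(alcoholics)} \\
p_0 = 0.50 &\;\Rightarrow\; \mu_0 \approx 0.69 \quad \text{(methadone users)} \\
p_0 = 0.65 &\;\Rightarrow\; \mu_0 \approx 0.43 \quad \text{(FSWs)}
\end{aligned}
```

The sample means are not reported here. If an outcome's sample mean exceeds its mu_0, the data contain more zeros than a Poisson with that mean would produce. This is an informal sign of zero inflation, not a formal test.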
Conclusions: Although P is the first choice for modeling count data in many settings, the over-dispersion of zero-inflated counts arising from sensitive questions suggests that other models, specifically NB and ZINB, may provide a better goodness of fit.