Journal of Postgraduate Medicine
 Open access journal indexed with Index Medicus & ISI's SCI  
 ORIGINAL ARTICLE
Year : 2016  |  Volume : 62  |  Issue : 1  |  Page : 26-31

Analysis of sparse data in logistic regression in medical research: A newer approach


1 Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu, India
2 Department of Statistics, St. Thomas College, Palai, Kerala, India

Correspondence Address:
L Jeyaseelan
Department of Biostatistics, Christian Medical College, Vellore, Tamil Nadu
India

Source of Support: None, Conflict of Interest: None


DOI: 10.4103/0022-3859.173193


Background and Objective: Logistic regression is the usual method for analyzing a dichotomous response variable. However, its performance in the presence of sparse data is questionable. A common problem in such situations is a very high odds ratio (OR) with a very wide 95% confidence interval (CI) (OR: >999.999, 95% CI: <0.001, >999.999). In this paper, we address this issue using the penalized logistic regression (PLR) method.
Materials and Methods: Data from a case-control study on hyponatremia and hiccups conducted at Christian Medical College, Vellore, Tamil Nadu, India were used. The outcome variable was the presence or absence of hiccups, and the main exposure variable was hyponatremia status. Simulated datasets were created with different sample sizes and different numbers of covariates.
Results: A total of 23 cases and 50 controls were analyzed using both ordinary logistic regression and PLR. Hyponatremia, the main exposure variable, was present in nine (39.13%) of the cases and in four (8.0%) of the controls. All 23 hiccup cases were male, as were 46 (92.0%) of the controls. Thus, the complete separation between gender and the disease group led to an infinite OR with ordinary logistic regression (OR: >999.999, 95% CI: <0.001, >999.999), whereas PLR gave a finite and consistent regression coefficient for gender (OR: 5.35; 95% CI: 0.42, 816.48). After adjusting for all the confounding variables, hyponatremia was associated with a 7.9 (95% CI: 2.06, 38.86) times higher risk of developing hiccups using PLR, whereas the conventional method overestimated the risk (OR: 10.76; 95% CI: 2.17, 53.41). The simulation experiment shows that the estimated coverage probability of PLR is near the nominal level of 95% even for small sample sizes and for a large number of covariates.
Conclusions: PLR gives results nearly equal to those of ordinary logistic regression when the sample size is large and is superior when cell counts are small.
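
To illustrate why a penalty yields a finite estimate when one cell is empty, the following is a minimal sketch, not the authors' analysis. It assumes the penalty is Firth's bias-reduction (Jeffreys-prior) penalty, which the abstract does not specify, and it fits only an unadjusted model of hiccups on gender, using the 2x2 counts reported in the Results (23 male and 0 female cases; 46 male and 4 female controls). The function name `firth_logistic` and all variable names are hypothetical.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Fit a logistic regression with Firth's penalty (assumed form of PLR).

    Uses Fisher scoring on the modified score
        U*(beta) = X' (y - p + h * (0.5 - p)),
    where h holds the diagonal of the weighted hat matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities
        w = p * (1.0 - p)                        # logistic weights
        info = X.T @ (w[:, None] * X)            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Leverages: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))    # Firth-modified score
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# 2x2 table from the abstract: every case is male, so the female-case cell is 0.
male = np.r_[np.ones(23), np.ones(46), np.zeros(4)]   # 23 cases, then 50 controls
hiccups = np.r_[np.ones(23), np.zeros(50)]
X = np.column_stack([np.ones(73), male])              # intercept + gender

beta = firth_logistic(X, hiccups)
print("penalized log-OR for male gender:", beta[1])
print("penalized OR for male gender:", np.exp(beta[1]))
```

With ordinary maximum likelihood the gender coefficient has no finite maximizer on these counts, because no female case was observed. For a single binary covariate, the Firth estimate coincides with adding 0.5 to each cell of the table, so the sketch should return an OR of (23.5 × 4.5) / (0.5 × 46.5). It is not expected to reproduce the OR of 5.35 reported above, since the exact model specification behind that figure is not given in the abstract.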







Online since 12th February '04
2004 - Journal of Postgraduate Medicine
Official Publication of the Staff Society of the Seth GS Medical College and KEM Hospital, Mumbai, India
Published by Wolters Kluwer - Medknow