Methodological aspects in the use of scoring and multivariable logistic regression as predictive model
Department of Biostatistics and Medical Informatics, University College of Medical Sciences, Dilshad Garden, Delhi, India
How to cite this article:
Kumar R. Methodological aspects in the use of scoring and multivariable logistic regression as predictive model. J Postgrad Med 2013;59:343-344. Available from: http://www.jpgmonline.com/text.asp?2013/59/4/343/123193
I would like to present a few considerations related to the statistical analysis and scoring system used by the authors to predict post total thyroidectomy (TT) hypocalcemia [1].
Firstly, the authors assigned an equal score (one) to each predictor if the condition in [Table 1] of the original article was fulfilled; otherwise they assigned a score of zero. The rationale for giving every predictor the same weight is unclear. Secondly, the threshold for each predictor seems arbitrary, although the authors tried to justify it by reporting the sensitivity and specificity at the chosen cut-off point. It would have been better to construct the receiver operating characteristic (ROC) curve, use Youden's index to find the best cut-off point, and then weigh that cut-off against clinical judgment to decide the optimal threshold for each predictor. More generally, dichotomizing a continuous predictor is inadvisable and discards approximately one-third of the information [2].
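The cut-off selection suggested above can be sketched as follows. This is a minimal illustration, not the original authors' method: the function name `youden_cutoff`, the calcium values, and the outcome labels are all hypothetical, and the rule "positive if value <= cutoff" assumes that lower values indicate higher risk. Youden's index is J = sensitivity + specificity - 1, and the candidate cut-off with the largest J is returned.

```python
def youden_cutoff(values, outcomes):
    """Return (cutoff, sensitivity, specificity, J) with the largest Youden's J.

    Assumption: a patient is test-positive when value <= cutoff,
    i.e. lower values indicate higher risk.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, outcomes) if y == 1 and v <= cutoff)
        tn = sum(1 for v, y in zip(values, outcomes) if y == 0 and v > cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        # Keep the first cutoff that reaches the highest J seen so far
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


# Hypothetical postoperative calcium values (mg/dL) and outcomes (1 = hypocalcemia)
calcium = [7.2, 7.5, 7.8, 8.0, 8.1, 8.3, 8.4, 8.6, 8.8, 9.0]
hypocalcemia = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

cutoff, sens, spec, j = youden_cutoff(calcium, hypocalcemia)
print(f"cutoff={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}, J={j:.2f}")
```

A statistically optimal cut-off found this way would still need to be checked against clinical justification before being adopted as a threshold.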
The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), with 95% confidence intervals (CIs), at various cut-off points were presented in [Table 5] of the original article. When any of these indices was equal or close to one, the reported 95% CI appeared to be wrong. For example, the specificity at a hypocalcemia score ≥4 was reported as 0.995 with a 95% CI of 0.95-0.99. A 95% CI must contain the point estimate, i.e. 0.995 should lie within the interval. Exact binomial methods should be applied to obtain the 95% CI when the proportion is close to or equal to 0 or 1. These methods are readily available in statistical software such as Stata, SAS, and R.
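As an illustration of an exact binomial interval, the sketch below computes the Clopper-Pearson CI by bisection using only the Python standard library. The function names are hypothetical, and the example count (101 of 101 non-cases correctly classified) is an assumed scenario chosen because 145 - 44 = 101 patients did not develop hypocalcemia; it is not taken from [Table 5] of the original article.

```python
from math import comb


def binom_prob(n, p, ks):
    """Sum of Binomial(n, p) probabilities over the outcomes in ks."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in ks)


def exact_ci(x, n, level=0.95):
    """Clopper-Pearson (exact binomial) CI for x successes in n trials.

    Intended for modest n; very large n could overflow the float arithmetic.
    """
    alpha = 1 - level

    def bisect(below_bound):
        # below_bound(p) is True while p lies below the bound being sought
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if below_bound(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= x | p) = alpha/2; it is 0 when x == 0
    lower = 0.0 if x == 0 else bisect(
        lambda p: binom_prob(n, p, range(x, n + 1)) < alpha / 2)
    # Upper bound solves P(X <= x | p) = alpha/2; it is 1 when x == n
    upper = 1.0 if x == n else bisect(
        lambda p: binom_prob(n, p, range(0, x + 1)) > alpha / 2)
    return lower, upper


# Assumed example: specificity of 101/101 = 1.0
lower, upper = exact_ci(101, 101)
print(f"specificity = 1.000, 95% CI {lower:.3f} to {upper:.3f}")
```

Under this assumed count the interval runs from roughly 0.964 to 1, so it contains the point estimate, which the interval quoted in the letter does not.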
The authors applied multivariable logistic regression (MLR) to develop a predictive model for post TT hypocalcemia. It is not clear why MLR was applied when the score for each predictor had already been decided. Moreover, the MLR as used here has several flaws as a predictive model. Eight predictors were included, yet only 44 of the 145 patients who underwent thyroidectomy developed post TT hypocalcemia. The number of events per predictor was therefore small (44/8 = 5.5). A small number of events per predictor reduces the accuracy and precision of the estimated regression coefficients and can produce large standard errors and wide CIs [3]. High standard errors indicate instability in the regression coefficients, which can yield spurious associations. In this study, high standard errors were seen for postoperative calcium (18.3), postoperative serum intact parathyroid hormone (iPTH; 279.39), and age (951.42). These may be due to the small sample size, collinearity (correlation between preoperative and postoperative calcium), or perfect classification of the outcome by a predictor. A rule of thumb derived from simulation studies is that a predictive model should have at least 10 events per predictor, and preferably more. The authors did, however, acknowledge the small sample size as a limitation.
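The events-per-predictor arithmetic can be checked with a few lines of Python. The counts (145 patients, 44 events, 8 predictors) and the threshold of 10 come from the text above; the variable names are illustrative, and taking the smaller of events and non-events as the limiting count is a common convention assumed here.

```python
n_patients = 145
n_events = 44          # patients who developed post TT hypocalcemia
n_predictors = 8
rule_of_thumb = 10     # minimum events per predictor cited in the letter

# Assumed convention: the limiting count is the rarer of events and non-events
limiting = min(n_events, n_patients - n_events)

epv = limiting / n_predictors                      # 44 / 8
events_needed = rule_of_thumb * n_predictors       # events needed for 8 predictors
predictors_supported = limiting // rule_of_thumb   # predictors 44 events can support

print(f"events per predictor: {epv}")
print(f"events needed for {n_predictors} predictors: {events_needed}")
print(f"predictors supported by {limiting} events: {predictors_supported}")
```

By this rule, 44 events would support about four predictors, or eight predictors would call for at least 80 events.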
Furthermore, the authors did not apply any validation method to this predictive model. It is recommended that a predictive model be validated, at least internally [4].
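One common form of internal validation is the bootstrap optimism correction, sketched below. This is only an assumed illustration: the letter does not name a specific method, the "model" here is a single cut-off chosen by Youden's J rather than an MLR, and the data and function names are hypothetical. The rule is refitted in each bootstrap resample, and the average drop in performance when that rule is applied back to the original data estimates the optimism.

```python
import random


def youden_j(values, outcomes, cutoff):
    """Youden's J for the rule 'positive if value <= cutoff'."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    tp = sum(1 for v, y in zip(values, outcomes) if y == 1 and v <= cutoff)
    tn = sum(1 for v, y in zip(values, outcomes) if y == 0 and v > cutoff)
    return tp / n_pos + tn / n_neg - 1


def fit_cutoff(values, outcomes):
    """'Fit' the rule: pick the observed value with the largest J."""
    return max(sorted(set(values)), key=lambda c: youden_j(values, outcomes, c))


def optimism_corrected_j(values, outcomes, n_boot=200, seed=0):
    """Return (apparent J, optimism-corrected J) from n_boot resamples."""
    rng = random.Random(seed)
    n = len(values)
    apparent = youden_j(values, outcomes, fit_cutoff(values, outcomes))
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bv = [values[i] for i in idx]
        by = [outcomes[i] for i in idx]
        if sum(by) in (0, n):
            continue  # resample lacks one outcome class; draw again
        c = fit_cutoff(bv, by)
        # Performance in the resample minus performance on the original data
        optimism.append(youden_j(bv, by, c) - youden_j(values, outcomes, c))
    return apparent, apparent - sum(optimism) / n_boot


# Hypothetical data (1 = hypocalcemia)
calcium = [7.2, 7.5, 7.8, 8.0, 8.1, 8.3, 8.4, 8.6, 8.8, 9.0]
hypocalcemia = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

apparent, corrected = optimism_corrected_j(calcium, hypocalcemia)
print(f"apparent J = {apparent:.3f}, optimism-corrected J = {corrected:.3f}")
```

For an actual MLR, the same loop would refit the full regression in each resample, including any variable selection step.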
MLR is the most frequently applied method in the medical literature, but a small sample size, the presence of collinearity, and the lack of validation of the predictive model lead to unstable model estimates and untrustworthy results. Our previous study, in which MLR quality was evaluated using well-established criteria, revealed that compliance with these criteria is poor in Indian medical journals [5]. Articles using MLR should be reviewed by competent statisticians or epidemiologists to avoid such errors.
1. Pradeep PV, Ramalingam K, Jayashree B. Post total thyroidectomy hypocalcemia: A novel multi-factorial scoring system to enable its prediction to facilitate an early discharge. J Postgrad Med 2013;59:4-8.
2. Altman DG, Royston P. Statistics notes: The cost of dichotomising continuous variables. BMJ 2006;332:1080-1.
3. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol 1996;49:1373-9.
4. Bleeker SE, Moll HA, Steyerberg EW, Donders AR, Derksen-Lubsen G, Grobbee DE, et al. External validation is necessary in prediction research: A clinical example. J Clin Epidemiol 2003;56:826-32.
5. Kumar R, Indrayan A, Chhabra P. Reporting quality of multivariable logistic regression in selected Indian medical journals. J Postgrad Med 2012;58:123-6.