Attributing death to cancer: cause-specific survival estimation
A Mathew, M Pandey
Division of Epidemiology and Clinical Research, Regional Cancer Centre, Medical College P.O., Trivandrum - 695 011, India.
Source of Support: None, Conflict of Interest: None
PMID: 12571396
Cancer survival estimation is an important part of assessing the overall strength of cancer care in a region. Generally, the death of a patient is taken as the end point in estimating overall survival, and the cause of death is not taken into account. With increasing demand for better survival of cancer patients, it is important for clinicians and researchers to know the survival attributable to the disease of interest, i.e. net survival, and to choose the best method for estimating it. The increasing use of computer programmes has made it possible to carry out statistical analysis without guidance from a biostatistician. This is of prime importance in third-world countries, as there are few trained biostatisticians to guide clinicians and researchers. The present communication describes current methods used to estimate net survival, such as cause-specific survival and relative survival. The limitations of cause-specific survival estimation, particularly in India, and the usefulness of relative survival are discussed, as are the various sources of data for estimating cancer survival. As survival estimates are to be projected onto the population at large, it is important to measure the variation of the estimates, and confidence intervals are therefore used. Rothman's confidence interval gives the most satisfactory result for survival estimates.
Keywords: Cause of Death, Death Certificates, Human, Life Tables, Neoplasms, mortality, Registries
Survival from cancer refers to the time from diagnosis to eventual death due to cancer. Survival figures from a population-based study measure the overall effectiveness of cancer care in a region, while the corresponding figures from a hospital-based study provide a summary index of the efficacy of treatment and a sound platform for therapeutic planning.
People with cancer generally experience a much lower overall survival probability than the general population. The overall survival probability for people with cancer can be considered the result of two components, corresponding respectively to deaths due to the disease being studied and deaths due to all other causes taken together. Net survival can thus be defined as “the survival which would occur if the risks of death other than the disease under study were removed from the overall survival”. Analysing net survival is therefore equivalent to analysing the excess mortality in the group under study. The level at which net survival reaches a plateau represents the percentage of subjects who can be considered cured of the cancer under study. Net survival probabilities are useful for comparisons between subpopulations from the same region, or between populations from different regions, where mortality from other causes may differ so that comparing overall survival probabilities could lead to biased conclusions.
There are a number of methods to estimate overall and net survival probabilities, as well as several ways to estimate confidence intervals for a survival probability. The present paper critically compares various methods currently used for estimating net survival probability and confidence intervals for these probabilities. The advantages and disadvantages of each method are also discussed.
For estimating survival probability, a group of individuals with some common morbidity experience is followed up from a well-defined starting date. The starting date varies according to the purpose of the study. Commonly used starting dates in cancer survival estimation are the date of first symptom, date of diagnosis, date of first visit to hospital (date of registration) and date of beginning of treatment. For example, when the survival probability is used to measure the end results of a treatment, the date of onset of therapy would be appropriate. For studying the natural history of a particular cancer, the date of appearance of the first symptom would be a suitable starting date.
All individuals in the study are then followed up to a predetermined cut-off date, and the vital status of each individual is assessed at the end of the study. When the period of observation ends, some patients will have died, either of the disease of interest or of causes other than that under study. Some will still be alive, and their follow-up is terminated by the closure of the study; such patients are called ‘withdrawn alive’. Others will have been lost to follow-up before the end of the study. Any observation that terminates due to the disease of interest is called a ‘failure’. Failures always have complete follow-up; all other observations have incomplete follow-up and are called ‘censored’ observations.
Another informative component of survival analysis is the length of follow-up. For failures, it is the time from the starting date to the terminal event. For patients whose follow-up is terminated by the closure of the study, it is the time from the starting date to the end of the study. For patients lost to follow-up, as no information is available at the end of the study, it is measured from the starting date to the date on which the patient was last seen. The length of follow-up is expressed in arbitrary units of days, months or years, chosen according to the prognosis of the cancer site under consideration: for cancers with better prognosis the unit could be years; otherwise it is months.
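The classification of observations and the measurement of follow-up length described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the status codes, the `Patient` record and the `follow_up` function are hypothetical, and the default month length of 30.4375 days (365.25/12) is an assumption. Deaths from other causes are censored at the date of death, in line with treating them as termination of follow-up in cause-specific survival.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical codes for each patient's vital status at the end of the study.
DEAD_OF_CANCER = "dead_of_cancer"        # failure: death due to the disease of interest
DEAD_OTHER_CAUSE = "dead_other_cause"    # censored at the date of death
WITHDRAWN_ALIVE = "withdrawn_alive"      # censored at the closure of the study
LOST_TO_FOLLOW_UP = "lost_to_follow_up"  # censored at the date last seen


@dataclass
class Patient:
    start: date                       # starting date, e.g. date of diagnosis
    status: str                       # one of the codes above
    last_date: Optional[date] = None  # date of death or date last seen


def follow_up(patient: Patient, study_end: date, unit_days: float = 30.4375):
    """Return (length of follow-up in units of `unit_days`, failure flag)."""
    if patient.status == WITHDRAWN_ALIVE:
        end = study_end                      # followed up to closure of the study
    elif patient.last_date is not None:
        end = patient.last_date              # date of death or date last seen
    else:
        raise ValueError("a date of death or date last seen is required")
    length = (end - patient.start).days / unit_days
    # Only deaths from the disease of interest are failures; all else is censored.
    return length, patient.status == DEAD_OF_CANCER
```

Passing `unit_days=365.25` expresses follow-up in years, as might suit a cancer site with good prognosis.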
The following are some of the commonly used sources of data for cancer survival studies.
a) Cancer registries
Cancer registries that routinely collect follow-up information on patients are often the main source of data for estimating cancer survival. Survival probabilities calculated from incident cases in a population are distinct from those obtained from hospital patients.
Population-based survival figures are generated from a population-based cancer registry (PBCR), as PBCRs are concerned with all newly diagnosed cases of cancer occurring in a population of well-defined composition and size. Such figures represent the average prognosis in the population and provide an objective index of the effectiveness of cancer care in the region concerned. Findings on population-based cancer survival can be generalised because they are based on all cancer patients in that geographic area and are therefore representative with respect to age, socio-demographic and clinical factors.
Hospital-based survival refers to the survival of a selected group of patients treated in a particular hospital, or of a selected group seen within a hospital. Such patients are unlikely to be representative of all cancer patients with respect to age, socio-demographic and clinical factors, so hospital-based survival figures are not representative of all incident cases in the population from which they are derived. On the other hand, survival as a function of many cancer-related factors can be studied in hospital-based series, as these factors are routinely recorded in hospitals.
For survival estimation, many population-based registries collect follow-up information through collaborating hospital registries or hospital record departments, which in turn conduct annual follow-up surveys of registered cases through the patients' attending doctors. Other active follow-up methods used to confirm the status of a patient make use of surveys or registers set up for other purposes. Many registries therefore use sources such as population registers, national health service registers, health insurance or social security registers, and voters' lists. Telephone enquiries, medical insurance data, enquiries by general practitioners and personal enquiries are some of the other methods used to ascertain patient status.
Many of the above methods may not be applicable in developing countries because of poor coverage of records and their improper or inadequate maintenance. A few registries in developing countries have developed indigenous methods to remedy the follow-up inadequacies that remain after routine review of case records. These include reply-paid postal enquiries to several addresses and routine scrutiny of obituary announcements in newspapers.
In hospital-based registries, patient follow-up is mainly maintained through a referral system: the disease status of a patient is obtained from wherever the patient was referred back to upon discharge. The primary contact is generally maintained through the physician responsible for patient care. Hence follow-up data from hospital registries are more complete than those from population registries.
b) Death certificates
Another method of collecting follow-up information on registered patients is through death certificates from the vital statistics department of the region. Routine scrutiny of death certificates gives the primary cause of death, which is useful for calculating cause-specific survival probability. Any registered patient whose death has not been notified to the registry by the vital statistics department, or for whom record linkage fails because of poor identification, is considered to be alive. This kind of follow-up may therefore yield biased (overestimated) survival estimates if death registration through the vital statistics department is incomplete or inaccurate.
Cases known to the cancer registry only from their death certificates cannot be included in survival analysis, because the actual date of diagnosis is not known. Cancer registries usually trace cases first identified from death certificates back through hospital records to establish at least the date of diagnosis, which allows them to be included in survival analysis.
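Passive follow-up by death-certificate linkage, including the exclusion of death-certificate-only (DCO) cases, can be illustrated with a small Python sketch. The registry and certificate dictionaries, their patient identifiers and the `passive_follow_up` function are all hypothetical, and linkage is reduced here to an exact match on an assumed patient id.

```python
from datetime import date

# Hypothetical registry extract: patient id -> date of diagnosis.
# None marks a death-certificate-only (DCO) case with no known diagnosis date.
registry_diagnosis = {
    "P1": date(1995, 3, 1),
    "P2": date(1995, 6, 15),
    "P3": None,
}

# Hypothetical death certificates that were successfully linked to the registry.
linked_certificates = {"P2": (date(1996, 1, 10), "cancer")}


def passive_follow_up(diagnosis_dates, certificates):
    """Assign vital status from death-certificate linkage alone.

    DCO cases are excluded; registered patients with no linked certificate
    are presumed alive, which overstates survival if deaths go unregistered.
    """
    status = {}
    for pid, diagnosed in diagnosis_dates.items():
        if diagnosed is None:
            continue  # DCO case: cannot enter survival analysis
        if pid in certificates:
            death_date, cause = certificates[pid]
            status[pid] = ("dead", death_date, cause)
        else:
            status[pid] = ("presumed_alive", None, None)
    return status
```

If patient P1 had in fact died but the certificate was never notified or linked, P1 would still be counted as alive, which is how incomplete death registration inflates survival estimates.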
c) Verbal autopsy
Verbal autopsy (VA) is another, indirect, method of ascertaining the cause of death for estimating cause-specific survival probability. It is particularly appropriate for third-world countries, where there is no regular system of death registration and obtaining a specific cause of death is next to impossible. Verbal autopsy uses information obtained from a close relative or caretaker of the deceased person about the circumstances, symptoms and signs during the terminal illness, and assigns a cause of death according to the International Classification of Diseases codes. The World Health Organisation recommended lay reporting of health information by people without formal medical training and subsequently published a “death record”, which is probably the first VA questionnaire. Data obtained by these methods have been shown to be at least as useful as a physician’s review. These methods are cheap, quick and simple, and their applicability is far wider in overcoming the problem of non-reporting.
There are a number of methods used to estimate overall survival probabilities, such as the actuarial (life-table) method and the Kaplan-Meier (product-limit) method. All deaths occurring among subjects, including deaths from causes other than cancer, are considered ‘failures’ in the calculation of overall survival. To describe the deaths attributable to the disease under study, two classical methods are available: cause-specific survival and relative survival.
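A minimal product-limit (Kaplan-Meier) sketch in Python is given below for illustration. `kaplan_meier` is a hypothetical helper taking follow-up lengths and a failure flag; for overall survival every death, whatever its cause, is passed as a failure. Ties follow the usual convention that observations censored at a failure time are still counted as at risk at that time.

```python
def kaplan_meier(times, failures):
    """Product-limit survival estimate.

    times:    length of follow-up for each patient
    failures: True if follow-up ended in a failure, False if censored
    Returns (time, survival) pairs at each distinct failure time.
    """
    data = sorted(zip(times, failures))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all observations sharing this time; censored ones at the same
        # time are still counted as at risk for the deaths occurring then.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

For five hypothetical patients with follow-up times `[1, 2, 2, 3, 4]` and deaths from any cause flagged as `[True, True, False, True, False]`, the overall survival estimates are about 0.8, 0.6 and 0.3 at times 1, 2 and 3.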
When reliable information on the cause of death of all registered cancer patients is available, cause-specific survival can be obtained by treating deaths from causes other than the registered cancer as ‘censored’ observations, i.e. only deaths due to the disease of interest are considered failures, while other deaths are treated simply as termination of follow-up (in the same way as cases lost to follow-up or withdrawn alive). Cause-specific survival can be estimated by either the actuarial method or the Kaplan-Meier method.
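As an illustration, the actuarial (life-table) method can be applied to cause-specific survival as sketched below. The function `actuarial_survival` and its argument names are hypothetical. It assumes the conventional actuarial adjustment in which censored observations (losses, withdrawals and, for cause-specific survival, deaths from other causes) are counted as at risk for half of the interval in which they leave.

```python
def actuarial_survival(n_start, cancer_deaths, censored):
    """Cumulative cause-specific survival at the end of each interval.

    n_start:       number of patients entering the first interval
    cancer_deaths: deaths from the disease of interest in each interval (failures)
    censored:      losses + withdrawals + deaths from other causes in each interval
    """
    n = n_start
    cumulative = 1.0
    result = []
    for d, w in zip(cancer_deaths, censored):
        effective = n - w / 2          # censored cases at risk for half the interval
        cumulative *= 1 - d / effective
        result.append(cumulative)
        n -= d + w                     # number entering the next interval
    return result
```

For a hypothetical cohort of 100 patients with 10 and 8 cancer deaths in two intervals, and censored counts of 10 and 4 (for instance 3 + 7 and 1 + 3 other-cause deaths plus losses or withdrawals), `actuarial_survival(100, [10, 8], [10, 4])` gives cumulative cause-specific survival of about 0.895 and 0.803.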
The method of cause-specific survival can be applied only when the cause of death is recorded on the death certificate, and information on cause of death is sometimes unavailable. Further, the reliability of the cause of death listed on the death certificate is questionable, particularly when looking at a specific cancer site: sometimes a metastatic site is recorded as the cause of death. It is also frequently difficult to classify the death of a cancer patient as a cancer or a non-cancer death.
In India, only the city of Mumbai has reliable data on cause of death, as all deaths there have to be medically certified according to the coroner’s act. In urban areas, information on cause of death is in some instances obtained through hospital inpatient medical records. Owing to the absence of a central death registration system, certification of cause of death is incomplete. In rural areas, information on cause of death is collected through paramedical workers. As mortality statistics in India are deficient because of incomplete death certification, death registers are of poor reliability for cause-specific survival studies.
The method of relative survival does not require knowledge of the cause of death and thus avoids the difficulties associated with its determination. Relative survival is defined as the ratio of the observed (overall) survival of the patient group to the survival expected for a comparable group of people in the general population.
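In symbols, $R(t) = S_o(t)/S_e(t)$, where $S_o(t)$ (a symbol introduced here) is the observed overall survival and $S_e(t)$ the expected survival at time $t$. A minimal Python sketch of the ratio follows; `relative_survival` is an illustrative helper, and the figures in the example are hypothetical.

```python
def relative_survival(observed, expected):
    """Relative survival R(t) = S_o(t) / S_e(t) at matching follow-up times.

    observed: overall (observed) survival of the patient group at each time
    expected: expected survival of a comparable general-population group
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same times")
    return [s_o / s_e for s_o, s_e in zip(observed, expected)]
```

With an observed survival of 0.70 against an expected 0.98, the relative survival is about 0.714; with 0.50 against 0.90, it is about 0.556. The lower the expected survival (for example in older patients), the more the relative figure exceeds the observed one.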
Expected survival corresponds to the mortality of the general population, taking into account the initial distribution, within the group under study, of the factors one wishes to control for. If only age is considered, the expected survival is the proportion of survivors that would be predicted at time $t$ in a group having the same initial age structure as the group under study but subject only to the force of mortality of the general population. The expected number of survivors is first calculated for each subgroup defined by age at diagnosis, in single years or in larger age groups according to the available life-tables, and these expected numbers of survivors are then summed. If $n_x$ is the number of subjects of age $x$ at the beginning of follow-up and $S_{ex}(t)$ is the probability of survival to time $t$ for a subject with initial age $x$, then the number of survivors at time $t$ for this age group is
$$e_x(t) = n_x\,S_{ex}(t)$$
The total number of survivors at this time is thus:
$$\sum_x e_x(t) = \sum_x n_x\,S_{ex}(t)$$
Consequently, the overall expected survival is
$$S_e(t) = \frac{\sum_x n_x\,S_{ex}(t)}{\sum_x n_x}$$
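The weighted average $S_e(t) = \sum_x n_x S_{ex}(t) / \sum_x n_x$ can be computed directly, as in the Python sketch below. Weighting each age group by its initial size in this way is what is often called the Ederer I method. The helper `expected_survival`, the age-group labels and the survival values in the example are hypothetical.

```python
def expected_survival(n_by_age, surv_by_age):
    """Expected survival S_e(t) = sum_x n_x S_ex(t) / sum_x n_x.

    n_by_age:    {age group x: n_x, patients of that age at start of follow-up}
    surv_by_age: {age group x: S_ex(t), life-table survival to time t}
    """
    # Expected survivors in each age group, summed over groups.
    expected_survivors = sum(n * surv_by_age[x] for x, n in n_by_age.items())
    return expected_survivors / sum(n_by_age.values())
```

For 40 patients aged 45-54 with life-table survival 0.95 and 60 patients aged 55-64 with 0.90, the expected survivors are 38 + 54 = 92 out of 100, so $S_e(t) = 0.92$.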
When factors other than age are to be controlled for, it is preferable to take them into account in calculating expected survival. This can be achieved if the data needed to construct life-tables are available as a function of these variables.
The method of relative survival is based on the assumption that general mortality, as described by the life-table of the population, adequately describes mortality from all causes other than the specific cause under study; mortality from the cancer under study is considered negligible in comparison with all other causes of death. Only under this condition can relative survival provide an acceptable approximation to net survival. If the assumption does not hold, the life-table overstates mortality from other causes, and net survival is overestimated as a result. In practice, mortality from a specific cancer constitutes a negligible fraction of total mortality, so survival rates computed from general population life-tables provide satisfactory estimates of expected rates when analysing the survival of patients with cancer of a specific site.
Life-tables established by statistical services are usually available for calculating expected survival in any country. However, there may be no life-table suitable for the population being studied, because that population is too selected for its mortality to be described by the official table available. In this situation, a table can be built from available mortality rates, provided they are sufficiently reliable and accurate.
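Building such a table can be sketched as follows, assuming single-year age-specific central death rates $m_x$ and deaths spread uniformly within each year, so that the probability of dying within the year is $q_x = m_x/(1 + m_x/2)$. This conversion is a common actuarial approximation, not one prescribed in the text, and `survival_from_rates` is a hypothetical helper returning the probability that a person aged $x$ survives a further $t$ years. Abridged tables with 5-year age groups would need a different interval length.

```python
def survival_from_rates(rates, start_age, years):
    """Probability that a person aged `start_age` survives `years` more years.

    rates: {age: central death rate m_x} for single years of age.
    Assumes deaths are uniform within each year, so q_x = m_x / (1 + m_x / 2).
    """
    survival = 1.0
    for age in range(start_age, start_age + years):
        m = rates[age]
        q = m / (1 + m / 2)       # probability of dying during this year of age
        survival *= 1 - q
    return survival
```

With a hypothetical rate of 0.02 at ages 60 and 61, the two-year survival from age 60 is about 0.961.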
In India, Sample Registration System (SRS)-based abridged life-tables are available from the Vital Statistics Division, Office of the Registrar General of India. The latest life-tables, giving 5-year “normal” survival probabilities based on mortality experience in the calendar years 1991-95, are available separately for urban and rural areas, by sex, and for the major states and the country as a whole.
A survival probability based on a sample of observations is frequently used to draw conclusions about a larger population. In the selection of samples, sampling variation occurs, and the standard error measures the extent to which this variation influences the computed survival probability. Estimating the standard error of a survival probability therefore becomes essential, especially when it is calculated from a small group of patients. Several formulae have been proposed for computing the standard error, and hence confidence intervals, for the survival probability at a given point. The choice between the many different ways of calculating the confidence interval depends on practical considerations and on how conservative an estimate is required.
A rough estimate of the standard error (SE) of the survival probability $P_i$ at the end of a given interval $i$ can be calculated using the formula of Peto et al:
$$SE(P_i) = \sqrt{P_i(1 - P_i)/N_i}$$
where $N_i$ is the number of observations at risk during interval $i$. On the assumption that $P_i$ has an approximately normal sampling distribution, the 95% confidence interval for the survival probability $P_i$ is $P_i \pm 1.96\,SE(P_i)$.
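The symmetric interval $P_i \pm 1.96\,SE(P_i)$ with $SE(P_i) = \sqrt{P_i(1-P_i)/N_i}$, taken as written in the text, can be sketched in Python as below. `peto_interval` is a hypothetical helper; nothing in it keeps the limits inside [0, 1].

```python
import math


def peto_interval(p, n_at_risk, z=1.96):
    """Symmetric interval p +/- z * SE, with SE = sqrt(p(1 - p) / N).

    p:         survival probability at the end of the interval
    n_at_risk: number of observations at risk during the interval
    z:         standard normal deviate (1.96 for a 95% interval)
    """
    se = math.sqrt(p * (1 - p) / n_at_risk)
    return p - z * se, p + z * se
```

For example, `peto_interval(0.95, 10)` gives an upper limit of about 1.085, above the largest possible survival probability.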
Greenwood suggested another formula for the standard error of $P_i$, based on an estimate of its variance:
$$SE(P_i) = P_i \sqrt{\sum_{j=1}^{i} \frac{d_j}{N_j(N_j - d_j)}}$$
where $N_j$ is the number of observations at risk and $d_j$ the number of deaths during interval $j$. On the assumption that $P_i$ has an approximately normal sampling distribution, the 95% confidence interval is $P_i \pm 1.96\,SE(P_i)$.
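Greenwood's formula can be sketched in Python as follows. The helper `greenwood` is hypothetical; it assumes that $P_i$ is the cumulative product of the interval survival proportions $1 - d_j/N_j$ (as in the product-limit or actuarial methods, with $N_j$ the effective number at risk) and that $d_j < N_j$ in every interval, since the formula is undefined when everyone at risk dies. For cause-specific survival, $d_j$ counts only deaths from the disease of interest.

```python
import math


def greenwood(at_risk, deaths, z=1.96):
    """Cumulative survival with Greenwood standard errors.

    at_risk[j] = N_j and deaths[j] = d_j for intervals j = 1, 2, ...; d_j < N_j.
    Returns (P_i, SE(P_i), lower, upper) for each interval i.
    """
    p = 1.0
    running_sum = 0.0
    rows = []
    for n, d in zip(at_risk, deaths):
        p *= 1 - d / n                      # interval survival proportion
        running_sum += d / (n * (n - d))    # Greenwood term for interval j
        se = p * math.sqrt(running_sum)
        rows.append((p, se, p - z * se, p + z * se))
    return rows
```

With 10 at risk and 2 deaths in the first interval, then 8 at risk and 2 deaths in the second, the cumulative survival is 0.8 and then 0.6, with Greenwood standard errors of about 0.126 and 0.155.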
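Rothman, in 1978, suggested another formula for the confidence interval, whose limits always lie between 0 and 1:
$$P_{L,U} = \frac{2 n_0 P_i + z_\alpha^2 \mp z_\alpha \sqrt{z_\alpha^2 + 4 n_0 P_i (1 - P_i)}}{2\,(n_0 + z_\alpha^2)}$$
where the minus sign gives the lower limit $P_L$ and the plus sign the upper limit $P_U$, $n_0 = P_i(1-P_i)/V_i$, $V_i$ is Greenwood's variance and $z_\alpha$ is the standard normal deviate for the chosen level of significance.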
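Rothman's limits can be computed with a short Python function, assuming the form $\left[2n_0P + z^2 \mp z\sqrt{z^2 + 4n_0P(1-P)}\right] / \left[2(n_0 + z^2)\right]$ with effective sample size $n_0 = P(1-P)/V$, where $V$ is the Greenwood variance (the square of the Greenwood standard error). `rothman_interval` is a hypothetical helper, and it requires $0 < P < 1$ and $V > 0$.

```python
import math


def rothman_interval(p, variance, z=1.96):
    """Rothman-type confidence limits for a survival probability.

    p:        estimated survival probability, 0 < p < 1
    variance: Greenwood variance of p (square of its standard error), > 0
    z:        standard normal deviate (1.96 for a 95% interval)
    """
    if not 0 < p < 1 or variance <= 0:
        raise ValueError("need 0 < p < 1 and a positive variance")
    n0 = p * (1 - p) / variance           # effective sample size
    centre = 2 * n0 * p + z ** 2
    half_width = z * math.sqrt(z ** 2 + 4 * n0 * p * (1 - p))
    denominator = 2 * (n0 + z ** 2)
    return (centre - half_width) / denominator, (centre + half_width) / denominator
```

With $P = 0.95$ and $V = 0.00475$ (so $n_0 = 10$), the limits are roughly 0.655 and 0.995, both inside [0, 1], whereas the symmetric interval $P \pm 1.96\sqrt{V}$ would have an upper limit of about 1.085. The Rothman interval is also asymmetric, extending further on the side away from the nearer boundary.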
The confidence interval based on the formula of Peto et al gives an easily obtained estimate of the magnitude of the variability of the survival probability estimate, but it can go outside the range 0 to 1 for small sample sizes or for very large or very small probabilities; it is therefore not a good approximation in these situations. Greenwood's formula can underestimate the variance over long follow-up periods when the sample size is not sufficiently large, so it is valid only for large samples. Anderson et al have shown that Rothman's method with Greenwood's variance provides, on average, the most satisfactory result.
The above formulae can be used to estimate confidence intervals for cause-specific survival. If random variation in expected survival can be assumed to be negligible, the confidence interval for relative survival is proportional to that of the overall survival probability: the standard error of relative survival is obtained by dividing the standard error of overall survival by the expected survival.
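A short Python sketch of this scaling follows, treating expected survival as a fixed constant as the text assumes. `relative_survival_ci` is a hypothetical helper, and the normal-approximation limits are only one choice: the limits of any interval for overall survival, such as Rothman's, could be divided by the expected survival in the same way.

```python
def relative_survival_ci(s_obs, se_obs, s_exp, z=1.96):
    """Relative survival with a normal-approximation confidence interval.

    s_obs:  observed (overall) survival
    se_obs: standard error of the observed survival
    s_exp:  expected survival, treated as fixed (no random variation)
    Returns (relative survival, its SE, lower limit, upper limit).
    """
    r = s_obs / s_exp
    se_r = se_obs / s_exp       # SE scales by the same factor as the estimate
    return r, se_r, r - z * se_r, r + z * se_r
```

For an observed survival of 0.6 with standard error 0.05 and an expected survival of 0.8, the relative survival is 0.75 with standard error 0.0625.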
In conclusion, attributing death to cancer is simple and yet difficult: simple because it requires no special techniques, difficult because of the absence of reliable information. Information from death registers is mostly incomplete or fails to answer the simple question of how the person died. Verbal autopsy techniques are used, but they too are subject to bias because of lay reporting. The best way out of this situation is to use abridged life-tables and calculate relative survival. Although not perfect, this is the most efficient method for third-world countries like India. There is an urgent need to streamline death registration and to raise awareness, through education, of the need for correct certification.