INTRODUCTION
Why Survival Analysis?
Clinical outcomes come in a variety of statistical forms. Some are continuous, such as systolic blood pressure, and can be easily analyzed with linear regression. Others, such as mortality or myocardial infarction (MI), are distinct events and are slightly more complex to analyze statistically. If the observed clinical outcome is “either-or,” such as whether or not a patient suffers an MI, logistic regression can be used. However, if the time to MI is the observed outcome, the data are analyzed using statistical methods for survival analysis. It should be noted that despite the name “survival analysis,” these methods can be used for any time-to-event outcome, such as the time until a patient experiences an MI or the time to hospitalization. Such studies report event-free survival, which is the proportion of subjects who have not yet experienced the event.
There are important clinical and statistical reasons for investigating a time-to-event outcome using survival analysis. For example, consider a study in which the final observed proportions of events in two treatment groups are identical. If one group had all of its events shortly after randomization while the other had no events until just before the end of follow-up, the two treatments would logically be considered to have different clinical effects despite the identical proportions at the end of follow-up. Similarly, if all-cause mortality is the outcome, a sufficiently long follow-up would reveal identical survival proportions of 0% in all groups. In such cases, the time to an event contains much more clinical information than whether or not the event occurred. There is much more statistical information as well: survival analyses tend to have greater statistical power to detect a significant treatment or exposure effect than methods for binary outcomes such as logistic regression.
It is typical in these studies to have subjects who did not experience the event before the end of the study or who dropped out before the event of interest occurred. These subjects are said to be right-censored. Although these may seem to be cases of missing data, since the time-to-event is not actually observed, such subjects are highly valuable: the observation that they went a certain amount of time without experiencing an event is itself informative. One of the most important properties of survival methods is their ability to handle such censored observations, which are ignored by methods such as the t-test (or analysis of variance) for comparing the survival times of two (or more) groups and by linear regression. It is worth noting that survival methods can handle other types of censoring, such as left-censoring, where a subject had already experienced the event at the time of enrollment, and interval-censoring, where it is known only that the event happened in a particular interval, say between the last and current visit. This article discusses only right-censored data.
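As a minimal illustration of how right-censored data are recorded in practice (hypothetical values; the convention of censor = 0 for a censored observation follows the Appendix code), each subject contributes an observed follow-up time plus an event indicator:

/* Hypothetical right-censored data: time = years of follow-up; */
/* censor = 1 if the event was observed, 0 if censored.         */
data example;
  input id time censor;
  datalines;
1 2.3 1
2 5.0 0
3 8.1 1
4 10.0 0
;
run;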
OVERVIEW OF SURVIVAL METHODS AND THEIR USE IN NUCLEAR CARDIOLOGY
In comparing the survival distributions of two or more groups (for example, new therapy vs standard of care), Kaplan-Meier estimation1 and the log-rank test2 are the basic statistical methods of analysis. These are non-parametric methods in that no mathematical form of the survival distributions is assumed. If an investigator is interested in quantifying or investigating the effects of known covariates (e.g., age or race) or predictor variables (e.g., blood pressure), regression models are utilized. As with conventional linear regression models, survival regression models allow for the quantification of the effect on survival of a set of predictors, the interaction of two predictors, or the effect of a new predictor above and beyond other covariates.
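For reference, the Kaplan-Meier (product-limit) estimate of the survival function has a simple closed form. If $d_i$ events occur at time $t_i$ among the $Y_i$ subjects still at risk, then

$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right),$$

so censored subjects contribute to the risk sets $Y_i$ up to their censoring times without ever contributing an event.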
Among the available survival regression models, the Cox proportional hazards model developed by Sir David Cox3 has seen great use in epidemiological and medical studies, and the field of nuclear cardiology is no exception. What follows are some examples of Cox models being used in nuclear cardiology. Xu et al4 looked at how myocardial scarring (assessed with positron emission tomography [PET] or single photon emission computed tomography [SPECT]) and other demographic and medical history factors predicted mortality in patients with advanced heart failure who received cardiac resynchronization therapy. Bourque et al5 looked at how left ventricular ejection fraction (LVEF, assessed with angiography) and nuclear summed rest score (SRS, assessed with SPECT) interacted to change the risk of mortality. Hachamovitch and Berman6 looked at the incremental prognostic value of myocardial perfusion SPECT (MPS) parameters in the prediction of sudden cardiac death. Nakata et al7 looked at how the heart-to-mediastinum ratio (assessed with metaiodobenzylguanidine [MIBG] imaging) predicted cardiac death.
Survival models other than the Cox model have been used in nuclear cardiology as well. For example, in a study of diagnostic strategies for quantifying myocardial perfusion with SPECT, Duvall et al8 utilized a log-normal survival model, a member of the parametric family of regression survival models, because initial data exploration revealed that the proportional hazards assumption of the Cox model was invalid. While this is an excellent example of when to utilize other survival models, it has been more common to see such data presented in conjunction with a Cox model analysis. In earlier studies of MPS-derived predictors of cardiac events, Hachamovitch et al9 used Cox models to identify significant predictors and parametric models, specifically the accelerated failure time (AFT) model, to estimate the time to certain percentiles of survival. An identical analysis strategy was used by the research group of Cuocolo, Acampa, Petretta, Daniele, et al10–13 in their research on the impact of various SPECT-derived predictors on the occurrence of cardiac events.
IMPORTANT QUANTITIES IN SURVIVAL ANALYSIS
In analyzing survival or time-to-event data, there are several important quantities to define. One of the most important is the survival function, denoted by S(t), which gives the probability of surviving past a given time t. To illustrate, suppose that death is the event of interest and time is measured in years from study enrollment. Examples of survival functions for two groups are displayed in Figure 1A. Group 1 has a higher risk of death than Group 2 because its survival curve decreases faster than the curve for Group 2. It is expected that about 61% of Group 1 and about 76% of Group 2 will survive past 5 years from study enrollment, while about 25% of Group 1 and 47% of Group 2 will survive past 10 years. The median survival time for each group is found at the intersection of the horizontal line at a survival probability of 0.5 with the particular survival curve. From Figure 1A, the median time is about 6.2 years for Group 1 and about 9.4 years for Group 2.
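For illustration, the curves in Figure 1A are Weibull survival functions, for which the survival probability and the median have closed forms:

$$S(t) = \exp\{-(t/\lambda)^{k}\}, \qquad S(t_{0.5}) = 0.5 \;\Rightarrow\; t_{0.5} = \lambda(\ln 2)^{1/k}.$$

With shape k = 1.5 and scales λ = 8 and λ = 12 (the values used in the simulation code in the Appendix), the medians are approximately 6.3 and 9.4 years, in agreement with the values read off the figure.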
Figure 1. Plot of sample Weibull survival functions (A) and the corresponding hazard functions (B). In each panel, the solid red curve represents Group 1 and the dashed blue curve represents Group 2.
Another important quantity in the analysis of survival data is the rate at which a person who is event-free at a given point in time will instantaneously experience the event. This rate is quantified by the hazard function, denoted by h(t). The value of the hazard function is not a probability, but it is an indicator of the risk of experiencing the event: the higher the value of the hazard function, the higher the risk. Mathematically, the hazard function is related to how fast the survival function decreases over time; the faster the survival function decreases, the higher the hazard. Returning to Figure 1A, since the survival curve for Group 1 decreases faster than that of Group 2, the hazard function for Group 1 is expected to be higher than that of Group 2, as seen in Figure 1B, which displays the hazard functions corresponding to the survival curves in Figure 1A. It is also worth noting that both hazard functions increase over time.
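Formally, the hazard is minus the slope of the log survival function, which makes the link between a rapidly decreasing survival curve and a high hazard precise:

$$h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\ln S(t), \qquad S(t) = \exp\left\{-\int_0^t h(u)\,du\right\},$$

where f(t) denotes the density of the event times.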
COX PROPORTIONAL HAZARDS MODEL
The hazard function plays a very important role in survival analysis. The Cox proportional hazards model, the most widely used survival regression model, investigates the relationship between predictors and the time-to-event through the hazard function. It assumes that the predictors have a multiplicative effect on the hazard and that this effect is constant over time, i.e.,
$$h(t \mid x_1, \ldots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p),$$
where h(t|x) is the hazard at time t for a subject with a set of predictors x1,…,xp, h0(t) is the baseline hazard function, and β1,…,βp are the model parameters describing the effect of the predictors on the overall hazard. The interpretation of the Cox model is therefore done using hazard ratios (HR), defined as the ratio of the predicted hazard function under two different values of a predictor variable. A hazard ratio greater than 1 means the event is more likely to occur, and a ratio less than 1 means the event is less likely to occur. A hazard ratio of exactly 1 means the predictor has no effect on the hazard of the event. Also, due to the regression framework of the model, one can obtain hazard ratio estimates that are adjusted for other covariates in the model, such as age, sex, and race.
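For example, for a binary predictor x1 with all other covariates held fixed, the baseline hazard cancels and the hazard ratio reduces to a single coefficient:

$$\mathrm{HR} = \frac{h(t \mid x_1 = 1, x_2, \ldots, x_p)}{h(t \mid x_1 = 0, x_2, \ldots, x_p)} = e^{\beta_1},$$

which does not depend on t; this constancy over time is precisely the proportional hazards assumption.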
Cox models have achieved great popularity because they do not require the investigator to assume a particular survival distribution for the data; instead, the baseline hazard function is left unspecified. When an estimate of the baseline hazard is needed, the Cox model uses the so-called Aalen-Breslow estimator, which is a generalization of the non-parametric Nelson-Aalen estimator of the cumulative hazard function.14 The lack of a parametric form for the survival distribution gives the Cox model its other name, the semiparametric model, since the only parameters to estimate are those describing how the predictors affect the hazard.
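For completeness, the Nelson-Aalen estimator of the cumulative hazard, and the Breslow generalization of it used for the baseline cumulative hazard in the Cox model, are

$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}, \qquad \hat{H}_0(t) = \sum_{t_i \le t} \frac{d_i}{\sum_{j \in R(t_i)} \exp(\hat{\beta}_1 x_{j1} + \cdots + \hat{\beta}_p x_{jp})},$$

where $d_i$ is the number of events at time $t_i$, $Y_i$ is the number of subjects at risk at $t_i$, and $R(t_i)$ is the set of subjects at risk at $t_i$.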
Despite the seeming ease with which the Cox model can be implemented, there is one significant assumption that must be checked. The Cox model assumes proportional hazards between the values of the predictors regardless of how the underlying hazard may change over time (admittedly, the addition of time-varying covariates affects this assumption). Since the proportional hazards model is built entirely around this assumption, if it is invalid for a set of predictors in a given dataset, then the Cox model should not be used on that dataset, and any results would be questionable. One way around this issue is to fit a stratified Cox model, for which the baseline hazard can differ from stratum to stratum, or to fit a model that includes time-varying covariates.14 The latter can handle broader situations, but the interpretation of the resulting model is not straightforward.
It is important to examine the proportional hazards assumption for all predictors tested in a Cox model. One way to assess proportional hazards for a continuous predictor is to plot the Schoenfeld residuals vs time: if the proportional hazards assumption is valid, the Schoenfeld residuals should look like a random scatter around zero.15 When examining a categorical predictor like a medical treatment or disease status, it is easiest to compare a log-log transformation of the Kaplan-Meier survival curves for the different categories. Under proportional hazards, the curves should be approximately parallel and should not intersect once they have separated. Note that a bit of crossing at early time points may be a product of noise in the survival estimates and need not constitute a violation of the proportional hazards assumption. Both Schoenfeld residuals and Kaplan-Meier estimates can be easily obtained from statistical software such as SAS, as in the sketch below.
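A minimal SAS sketch of both checks, assuming the HF dataset and variable names from the Appendix (the output dataset and residual variable names here are illustrative):

/* Schoenfeld residuals for the group effect */
proc phreg data=HF;
  class group;
  model Z*censor(0) = group;
  output out=schres ressch=sch_group;
run;

/* A flat scatter around zero is consistent with proportional hazards */
proc sgplot data=schres;
  scatter x=Z y=sch_group;
  refline 0 / axis=y;
run;

/* Log-log Kaplan-Meier curves: the lls request plots log(-log S(t)) vs log(t) */
proc lifetest data=HF plots=(lls);
  time Z*censor(0);
  strata group;
run;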
PARAMETRIC REGRESSION MODELS
An alternative to the Cox model is a parametric survival model, wherein a particular form of the survival distribution is assumed. There are several classes of parametric models: (1) the parametric proportional hazards model, which takes the form of the Cox model but assumes a parametric form for the baseline hazard; (2) the additive hazards model, where the predictors affect the hazard function additively rather than multiplicatively; and (3) the AFT model, which is most similar to conventional linear regression.
Unlike the proportional hazards model and the additive hazards model, which examine how predictors affect the hazard function, the AFT model postulates a direct relationship between the predictors and the survival time, making its interpretation easier. Suppose there is only one predictor, denoting the presence or absence of a disease. The AFT model assumes that the disease either accelerates or decelerates the rate of decrease of the survival function. In other words, if S1(t) and S2(t) denote the survival functions in the presence and absence of the disease, respectively, then the AFT model assumes the relationship S1(t) = S2(t/η), where η is the acceleration factor. If the median time-to-event is of interest, the AFT model implies that the median time for those with the disease is η times the median time of those without the disease. If η > 1 (or η < 1), then the group with the disease has a longer (or shorter) median time-to-event relative to those without the disease. The same interpretation holds for other percentiles of the survival distribution.
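The stated median relationship follows directly from this assumption: if m2 is the median time for those without the disease, then

$$S_1(\eta\, m_2) = S_2\!\left(\frac{\eta\, m_2}{\eta}\right) = S_2(m_2) = 0.5,$$

so the median for those with the disease is m1 = η m2, and the same argument applies to any other percentile.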
In general, the AFT model can be expressed in two ways:
$$T = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p) \times \varepsilon$$
or
$$\ln(T) = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \ln(\varepsilon),$$
where T is the time-to-event (the failure time); x1,…,xp and β1,…,βp are the predictor variables and their corresponding coefficients, respectively; ε is an error term assumed to have a particular parametric distribution; and ln(ε) is the natural log of the error term. Traditional regression and the AFT model differ in the following ways: (1) predictor variables in the AFT model affect the event time multiplicatively; (2) the AFT model accommodates censored observations; and (3) the error terms in the AFT model, although still independent and identically distributed, no longer follow a normal distribution. Commonly assumed parametric distributions in survival models include the exponential, Weibull, generalized gamma, log-normal, and log-logistic (see, for instance, Klein and Moeschberger15 for discussion of the properties of these distributions). These are used in place of a normal distribution because event times are positively valued and generally have a skewed distribution, making the symmetric normal distribution a poor choice for fitting the data closely.
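In the log-linear form, each of these named survival distributions corresponds to a familiar distribution for the error term on the log scale (up to a scale parameter on ln(ε)):

$$\ln(\varepsilon) \sim \text{extreme value} \Rightarrow T \sim \text{Weibull}; \qquad \ln(\varepsilon) \sim \text{normal} \Rightarrow T \sim \text{log-normal}; \qquad \ln(\varepsilon) \sim \text{logistic} \Rightarrow T \sim \text{log-logistic}.$$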
In practice, the choice of parametric distribution is made by comparing the model fit for a variety of distributions. The choice of candidate distributions is driven by prior assumptions or scientific insight about the data. The comparison may be done graphically using probability plots, which show how well the observed data follow an assumed parametric model. Distributions with multiple parameters defining their shape may fit better, but if parsimony is desired, it is better to rely on a penalized metric such as the Akaike information criterion (AIC)16 or Bayesian information criterion (BIC)17 to choose the distribution that gives the best fit with the fewest parameters among the candidates. These indices allow for numeric comparison, which may be less subjective than comparing graphs; however, there is no formal statistical test associated with them. The choice of distribution should never be based on which distribution gives a favorable P value. It should be noted that no distribution provides a perfect fit, and it is possible that more than one distribution fits the data well.
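A sketch of this comparison in SAS, again assuming the HF dataset from the Appendix; each PROC LIFEREG run reports fit statistics for the distribution named in the DIST= option (recent SAS/STAT releases include AIC and BIC among them):

%macro fitdist(d);
  proc lifereg data=HF;
    class group;
    model Z*censor(0) = group / dist=&d;
  run;
%mend fitdist;

/* Candidate distributions; dist=gamma fits the generalized gamma */
%fitdist(weibull);
%fitdist(lnormal);
%fitdist(llogistic);
%fitdist(gamma);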
Once the distribution of the outcome has been decided, an investigator can focus on the effects of the variables of interest on the time to an event. As previously noted, the effects of individual predictors in the AFT model are interpreted using time ratios (TR), where the ratio denotes the acceleration factor. In contrast to the HR, a time ratio greater than 1 means that an event is less likely to occur, since the investigator must wait longer for the event to happen. Similarly, a time ratio less than 1 implies that the event is more likely to happen.
An important point to note is that when the survival distribution of the event of interest follows a Weibull distribution, the AFT model and the Cox proportional hazards model coincide.15 In other words, the AFT model assumes proportional hazards if and only if the distribution is Weibull. For all other parametric distributions, the AFT model implies non-proportional hazards. This underlines an important distinction between the two models: for a given set of data, the AFT model and the Cox model (without covariates that vary with time) cannot both be correct unless the survival distribution is Weibull.
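In the Weibull case, the two parameterizations are linked through the shape parameter k: a Weibull AFT coefficient translates into a proportional hazards coefficient of the opposite sign scaled by k, so that

$$\beta_j^{\mathrm{PH}} = -k\,\beta_j^{\mathrm{AFT}} \qquad\Longrightarrow\qquad \mathrm{HR} = \mathrm{TR}^{-k}.$$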
ILLUSTRATIVE EXAMPLE
A fictitious study enrolled a cohort of 200 patients with New York Heart Association (NYHA) Class II-III diastolic heart failure who were followed over time. Suppose that 100 of these patients have diabetes mellitus (DM), while the other 100 are non-diabetic (non-DM). Let the goal of the study be the comparison of cardiovascular-related mortality between diabetics and non-diabetics, all of whom have NYHA Class II-III diastolic heart failure. Using the statistical package SAS version 9.3, times to death for both groups were artificially generated through simulation based on the mathematical model that generated the survival curves in Figure 1A, which are Weibull survival functions. Censoring was also artificially generated by assuming a maximum follow-up of 10 years and allowing for dropouts and loss to follow-up (details of the simulation and analyses are in the Appendix).
In the simulated data, there were 38 deaths in the DM group and 22 deaths in the non-DM group. Figure 2 displays the Kaplan-Meier curves for the two groups and the P value of the log-rank test. Patients with diabetes have significantly lower survival than those without diabetes (P = .002). Note that a Kaplan-Meier curve does not go all the way down to zero when the largest observed time (around 9.5 years for both groups in this example) is censored.
Fitting a Cox model with only one predictor variable (i.e., presence of DM), a significant group difference (P = .003) was found, just as in the log-rank test. However, with the Cox model one can also estimate the hazard ratio. In this case, the estimate of the model coefficient (β) for diabetes is 0.809, with non-DM as the reference group. Thus, the hazard ratio estimate is HR = e^0.809 = 2.24 (95% confidence interval (CI): 1.3–3.8). This means that the DM group is estimated to have a hazard rate about twice that of the non-DM group. Figure 3 displays a graphical check of the proportionality of hazards using log-log-transformed Kaplan-Meier (product-limit) curves. Since the curves in Figure 3 are approximately parallel, there is no evidence of a violation of the proportional hazards assumption.
In fitting an AFT model, it is worth noting that because the data were simulated, the survival distribution (Weibull in this case) is known. In practice, however, the true distribution of the event times is unknown, so it is recommended to fit several parametric distributions. Here, the Weibull, log-normal, log-logistic, and generalized gamma distributions were fitted, and the Weibull was chosen as the best-fitting model using the AIC and BIC criteria (see Table 1). The fact that the Weibull was the best-fitting model also supports the appropriateness of the proportional hazards assumption of the Cox model.
As with the log-rank test and the Cox model, the Weibull AFT model with only DM as a predictor variable found a significant group difference (P = .0034). The estimate of the model parameter associated with DM is −0.58 (95% CI: −0.96, −0.19), with non-DM as the reference group. Consequently, the estimate of the TR is exp(−0.58) = 0.56 (95% CI: 0.38, 0.83), meaning DM shortened the survival time.
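As a rough consistency check under the Weibull model, using the true shape parameter k = 1.5 from the simulation code in the Appendix, the relation HR = TR^(−k) gives 0.56^(−1.5) ≈ 2.4, in line with the Cox model estimate of 2.24.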
One may also prefer to provide estimates of the median time to death for each group. Table 2 displays the estimate of the median and its 95% CI for each group. Using this information, it is estimated that a patient from this artificially generated population with DM has a median time to death of 5.76 years (95% CI: 4.59–7.23). Note that the ratio of the median time-to-death estimates for patients with DM and non-DM is 0.56 (= 5.76 ÷ 10.26), which matches the TR obtained previously.
SUMMARY AND CONCLUSIONS
This paper reviews some basic concepts of survival analysis, including discussion and comparison of the semiparametric Cox proportional hazards model and the parametric AFT model. The appeal of the AFT model lies in the ease of interpreting its results, because the AFT model describes the effect of predictors and covariates directly on the survival time rather than through the hazard function. If the proportional hazards assumption of the Cox model is met, the AFT model can be used with the Weibull distribution; if proportional hazards is violated, the AFT model can be used with distributions other than the Weibull.
It is essential to consider the model assumptions and to recognize that if the assumptions are not met, the results may be erroneous or misleading. The AFT model assumes a certain parametric distribution for the failure times and that the effect of the covariates on the failure time is multiplicative; several different distributions should be considered before choosing one. The Cox model assumes proportional hazards of the predictors over time. Model diagnostic tools and goodness-of-fit tests should be utilized to assess these assumptions before statistical inferences are made.
In conclusion, although the Cox proportional hazards model tends to be more popular in the literature, the AFT model should also be considered when planning a survival analysis. The choice should be dictated by the research hypothesis and by which model's assumptions are valid for the data being analyzed, and never by which model gives a significant P value for the predictor of interest.
Acknowledgments
We would like to thank Dr Edsel Pena and Dr Fadi Hage for their valuable comments and suggestions.
Funding provided by NHLBI T32HL079888.
APPENDIX
See Tables 3, 4, 5, 6.
Table 3
Term | Symbol | Description
---|---|---
Survival function | S(t) | Proportion of subjects who are event-free at time t
Hazard function | h(t) | Instantaneous rate of experiencing an event, given the subject is event-free at time t
Predictor variables/covariates | x1, x2,…,xp | The p independent variables in a regression model, such as age, sex, race, and treatment group
Predictor parameters | β1, β2,…,βp | The values corresponding to the p predictors that quantify how each predictor affects the outcome
Table 4
Term | Definition | Explanation
---|---|---
AIC | Akaike information criterion | Index that quantifies how well the statistical model fits the data, with a penalty for added complexity (model parameters)
BIC | Bayesian information criterion | Index that quantifies how well the statistical model fits the data, with a penalty for added complexity (model parameters)
AFT | Accelerated failure time | Statistical model that can test the effects of multiple predictors on survival, controlling for the others; assumes the event times follow a parametric distribution chosen by the analyst
HR | Hazard ratio | The proportion by which the hazard changes in the presence of a categorical predictor variable, or with a one-unit increase in a continuous predictor; calculated from the Cox model
TR | Time ratio | The proportion by which the time-to-event changes in the presence of a categorical predictor variable, or with a one-unit increase in a continuous predictor; calculated from the AFT model
SAS | Statistical Analysis System | Commercial software for statistical analysis, developed by SAS Institute Inc., Cary, NC
Table 5
Name | # of parameters | Notes
---|---|---
Exponential | 1 | Assumes constant hazard; special case of the Weibull
Weibull | 2 | Assumes proportional hazards; hazard is monotonically increasing or decreasing; special case of the generalized gamma
Generalized gamma | 3 | Flexible hazard shapes, including monotonically increasing or decreasing
Log-normal | 2 | Hazard increases and later decreases
Log-logistic | 2 | Hazard is either monotonically decreasing or increases and later decreases

Note that the number of parameters relates to the model complexity; if a simpler model is desired, a survival distribution with fewer parameters is preferred. Furthermore, to a limited extent, additional parameters result in a better fit to the data.
Table 6
%macro simsurv(seed=, data=, g=, a=);
data &data;
  length group $ 7;
  group = &g;
  b = 1.5;                          /* Weibull shape parameter */
  n = 100;                          /* subjects per group */
  call streaminit(&seed);
  do i = 1 to n;
    T = rand('WEIBULL', b, &a);     /* event time: Weibull(shape=b, scale=&a) */
    U = rand('UNIFORM');
    C = U*10;                       /* censoring time: uniform on (0, 10) */
    Z = min(T, C);                  /* observed follow-up time */
    if T > C then censor = 0;       /* 0 = censored, 1 = event observed */
    else censor = 1;
    output;
  end;
run;
%mend simsurv;

%simsurv(seed=246195, data=g1, g='DM', a=8);
%simsurv(seed=950127, data=g2, g='NonDM', a=12);

data HF;
  set g1 g2;
run;

/* Kaplan-Meier plots and log-rank test */
proc lifetest data=HF plots=(survival(test) lls);
  time Z*censor(0);
  strata group;
run;

/* Fitting the Cox model */
proc phreg data=HF;
  class group;
  model Z*censor(0) = group;
  hazardratio group;
run;

/* Fitting the Weibull AFT model */
proc lifereg data=HF;
  class group;
  model Z*censor(0) = group / dist=weibull alpha=0.05;
  output out=outb quantiles=.5 std=std p=predtime;
run;

/* Computing the median and its 95% CI */
proc sort data=outb;
  by group;
run;

data out;
  set outb;
  by group;
  if first.group;
run;

data out1;
  set out;
  ltime = log(predtime);
  stde = std/predtime;              /* SE of log(median) via the delta method */
  upper = exp(ltime + 1.96*stde);
  lower = exp(ltime - 1.96*stde);
  keep group predtime upper lower;
run;

proc print data=out1;
run;
References
1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457–481.
2. Savage IR. Contributions to the theory of rank order statistics: The two sample case. Ann Math Stat. 1956;27(3):590–615.
3. Cox DR. Regression models and life-tables. J R Stat Soc Ser B (Methodol). 1972;34(2):187–220.
4. Xu Y-Z, Cha Y-M, Feng D, Powell BD, Wise HJ, Hua W, et al. Impact of myocardial scarring on outcomes of cardiac resynchronization therapy: Extent or location? J Nucl Med. 2012;53(1):47–54.
5. Bourque JM, Velazquez EJ, Tuttle RH, Shaw LK, O’Connor CM, Borges-Neto S. Mortality risk associated with ejection fraction differs among resting nuclear perfusion findings. J Nucl Cardiol. 2007;14(2):165–173.
6. Hachamovitch R, Berman DS. The use of nuclear cardiology in clinical decision making. Semin Nucl Med. 2005;35(1):62–72.
7. Nakata T, Miyamoto K, Doi A, Sasao H, Wakabayashi T, Kobayashi H, et al. Cardiac death prediction and impaired cardiac sympathetic innervation assessed by MIBG in patients with failing and nonfailing hearts. J Nucl Cardiol. 1998;5(6):579–590.
8. Duvall WL, Wijetunga MN, Klein TM, Razzouk L, Godbold J, Croft LB, et al. The prognosis of a normal stress-only Tc-99m myocardial perfusion imaging study. J Nucl Cardiol. 2010;17(3):370–377.
9. Hachamovitch R, Hayes S, Friedman JD, Cohen I, Shaw LJ, Germano G, et al. Determinants of risk and its temporal variation in patients with normal stress myocardial perfusion scans: What is the warranty period of a normal scan? J Am Coll Cardiol. 2003;41(8):1329–1340.
10. Acampa W, Evangelista L, Petretta M, Liuzzi R, Cuocolo A. Usefulness of stress cardiac single-photon emission computed tomographic imaging late after percutaneous coronary intervention for assessing cardiac events and time to such events. Am J Cardiol. 2007;100(3):436–441.
11. Petretta M, Acampa W, Evangelista L, Daniele S, Ferro A, Cuocolo A. Impact of inducible ischemia by stress SPECT in cardiac risk assessment in diabetic patients: Rationale and design of a prospective multicenter trial. J Nucl Cardiol. 2008;15(1):100–104.
12. Daniele S, Nappi C, Acampa W, Storto G, Pellegrino T, Ricci F, et al. Incremental prognostic value of coronary flow reserve with single-photon emission computed tomography. J Nucl Cardiol. 2011;18(4):612–619.
13. Acampa W, Petretta M, Cuocolo R, Daniele S, Cantoni V, Cuocolo A. Warranty period of normal stress myocardial perfusion imaging in diabetic patients: A propensity score analysis. J Nucl Cardiol. 2014;21(1):50–56.
14. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. 2nd ed. Hoboken: John Wiley and Sons; 2002.
15. Klein JP, Moeschberger ML. Survival analysis: Techniques for censored and truncated data. New York: Springer; 2003.
16. Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control. 1974;19(6):716–723.
17. Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6(2):461–464.