statistics Tutorial: Survival analysis is the analysis of data

Wednesday, January 18, 2017

Survival analysis is the analysis of data in the form of times from a well-defined start point to a particular event of interest. The actual survival time, t, of an individual is a realisation of a random variable T, which can take any non-negative value. This random variable has an associated probability distribution, with an underlying probability density function f(t). Two functions are of central interest in survival analysis: the survivor function and the hazard function. The survivor function, S(t) = P(T ≥ t), is the probability that an individual’s survival time is greater than or equal to some value t. The hazard function, h(t), can be thought of as the instantaneous event rate at time t, given survival up to that time.
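As a minimal sketch of how the survivor and hazard functions relate, the Python snippet below uses a Weibull distribution. That choice of distribution, the parameter values and the function names are assumptions made purely for illustration. The hazard is computed once as the density divided by the survivor function, and once as the conditional probability of an event in a short interval divided by the interval length.

```python
import math

def weibull_survivor(t, shape, scale):
    """Survivor function S(t) for a Weibull distribution (illustrative choice)."""
    return math.exp(-((t / scale) ** shape))

def weibull_density(t, shape, scale):
    """Probability density function f(t) of the same distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1) * weibull_survivor(t, shape, scale)

def weibull_hazard(t, shape, scale):
    """Hazard h(t) = f(t) / S(t): the instantaneous event rate at time t."""
    return weibull_density(t, shape, scale) / weibull_survivor(t, shape, scale)

def approximate_hazard(t, shape, scale, delta=1e-6):
    """P(t <= T < t + delta | T >= t) / delta, for a small interval delta."""
    s_now = weibull_survivor(t, shape, scale)
    s_later = weibull_survivor(t + delta, shape, scale)
    return (s_now - s_later) / (delta * s_now)

# Hypothetical parameter values, chosen only to have something to evaluate.
exact = weibull_hazard(2.0, shape=1.5, scale=3.0)
approx = approximate_hazard(2.0, shape=1.5, scale=3.0)
print(exact, approx)
```

The two printed values should agree closely. With shape equal to 1 the Weibull reduces to the exponential distribution, whose hazard is constant over time.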

An important issue in survival analysis is censoring. Censoring occurs when an individual’s actual survival time cannot be measured, and we instead observe a censored time for that individual. There are generally three types of censoring: (i) right censoring, which occurs when the censored survival time is less than the actual, unknown survival time; (ii) left censoring, which occurs when the censored survival time is greater than the actual, unknown survival time; and (iii) interval censoring, which occurs when the actual survival time is known only to lie within some interval. The censoring mechanism is typically assumed to be independent of the event of interest.
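One way to make the three types concrete is to record, for each individual, the smallest and largest values the true survival time could take. The sketch below assumes that (lower, upper) encoding, which is a common convention but is not taken from the text above. The function name and the data are hypothetical.

```python
import math

def censoring_type(lower, upper):
    """Classify one observation from the bounds known for the true survival time.

    lower and upper are the smallest and largest values the true time could
    take. This encoding is an assumption made for illustration.
    """
    if lower < 0 or lower > upper:
        raise ValueError("need 0 <= lower <= upper")
    if lower == upper:
        return "exact"      # event time observed directly
    if math.isinf(upper):
        return "right"      # true time exceeds the censored time `lower`
    if lower == 0:
        return "left"       # true time is below the censored time `upper`
    return "interval"       # true time lies somewhere between the bounds

# Hypothetical observations: exact, right, left and interval censored.
observations = [(4.0, 4.0), (6.5, math.inf), (0.0, 2.0), (3.0, 5.0)]
labels = [censoring_type(lo, hi) for lo, hi in observations]
print(labels)
```

Treating every observation as an interval keeps the four cases in a single data structure, which is convenient when one likelihood has to handle all of them.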

In many situations, individuals’ survival times are accompanied by a number of explanatory variables, or covariates. Interest usually centres on how one or more of these covariates affect an individual’s survival time. In these situations, simple non-parametric approaches are not sufficient, and more sophisticated modelling is necessary. Many of the principles and procedures of linear modelling carry over readily to the modelling of survival data.
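To illustrate how ideas from linear modelling carry over, the sketch below lets covariates act on the hazard through a linear predictor, in the style of a proportional hazards model. That model is one possible choice and is not named in the text above. The baseline hazard, coefficients and covariate values are all hypothetical.

```python
import math

def linear_predictor(covariates, coefficients):
    """The familiar linear-model term: sum of coefficient * covariate."""
    return sum(x * b for x, b in zip(covariates, coefficients))

def hazard(t, covariates, coefficients, baseline_hazard):
    """Proportional hazards form: baseline hazard scaled by exp(linear predictor)."""
    return baseline_hazard(t) * math.exp(linear_predictor(covariates, coefficients))

def baseline(t):
    """Hypothetical baseline hazard that increases linearly with time."""
    return 0.1 * t

# Hypothetical coefficients for two covariates: exposure indicator and age.
coefficients = [0.7, -0.02]
exposed = [1, 60]
unexposed = [0, 60]

# The hazard ratio compares two individuals who differ only in exposure.
ratio_early = hazard(2.0, exposed, coefficients, baseline) / hazard(2.0, unexposed, coefficients, baseline)
ratio_late = hazard(5.0, exposed, coefficients, baseline) / hazard(5.0, unexposed, coefficients, baseline)
print(ratio_early, ratio_late)
```

Under this assumed form the baseline hazard cancels from the ratio, so both printed values equal exp(0.7) up to rounding, whatever the time point.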

A proper survival distribution has total probability mass 1, so that the resulting Kaplan-Meier curve has its asymptote at zero. That is, standard survival analysis assumes that every individual in the sample is susceptible to the event of interest. In some situations, however, there may be individuals who would never experience the event of interest, regardless of how long they were followed. These individuals can be thought of as cured, or immune to the event of interest. If a proportion of the individuals in the data are indeed immune, fitting a proper survival model that ignores this may give misleading results. An improper survival distribution formally allows infinite survival times. Cure rate models allow the limiting value of the cumulative distribution function of the survival times, as t tends to infinity, to be strictly less than 1, corresponding to the presence of immunes in the population.
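A minimal sketch of an improper survival distribution is the mixture cure model, in which a fraction of the population is immune and the remainder follows a proper distribution. The mixture form, the exponential distribution for susceptible individuals, the parameter values and the function names are all assumptions made for illustration.

```python
import math

def susceptible_survivor(t, rate):
    """Proper survivor function for susceptible individuals (exponential, assumed)."""
    return math.exp(-rate * t)

def population_survivor(t, cure_fraction, rate):
    """Mixture cure model: immunes never fail, susceptibles follow a proper law."""
    return cure_fraction + (1.0 - cure_fraction) * susceptible_survivor(t, rate)

def population_cdf(t, cure_fraction, rate):
    """Cumulative distribution function of the survival time in the population."""
    return 1.0 - population_survivor(t, cure_fraction, rate)

# Hypothetical values: 30% of individuals are immune.
cure_fraction, rate = 0.3, 0.5
for t in (0.0, 1.0, 10.0, 1000.0):
    print(t, population_survivor(t, cure_fraction, rate), population_cdf(t, cure_fraction, rate))
```

As t grows, the survivor function levels off at the cure fraction rather than at zero, and the cumulative distribution function approaches one minus the cure fraction, which is strictly less than 1.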
