Wednesday, January 18, 2017

Survival analysis: topics covered

Survival Analysis

Survival analysis is applied when the data set includes subjects that are tracked until an event happens (failure) or we lose them from the sample. We are interested in how long they stay in the sample before the event (survival) and in their risk of failure at each point in time (hazard rates). Examples include loan performance and default, firm survival and exit, and time to retirement.

Survival analysis: topics covered
  • Survival analysis set up and features
  • Extensions of basic survival analysis
  • Survival, hazard, and cumulative hazard functions
  • Nonparametric analysis (Kaplan-Meier survival function)
  • Parametric models (Exponential, Weibull, Gompertz, and Log-logistic)
  • Semi-parametric models (Cox proportional hazards model)


Survival analysis AKA failure time analysis

Classically, the analysis of time to death.

But it can be used wherever you want to know what factors affect the time until an event occurs:
  • Germination timing
  • Arrival of a migrant or parasite
  • Dispersal of seeds or offspring
  • Failure time in mechanical systems
  • Response to stimulus

Censoring: dealing with missing data

Right censoring:

Occurs when the time of death is unknown but is known to be later than some observed date, e.g.:
  • Date of death is after the end of the study
  • Subject is removed from the study (patient withdraws, animal escapes, plant gets eaten, etc.)
Survival analysis can account for this kind of censoring.
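To make right censoring concrete, here is a minimal sketch in plain Python (not R) of how each subject can be reduced to an observed time plus an event indicator. The 10-unit study length and the `encode` helper are hypothetical. The true death time is passed in only for illustration: in a real study it is exactly what goes unobserved when a subject is censored.

```python
from typing import Optional, Tuple

STUDY_END = 10.0  # hypothetical study length, in the same units as the times


def encode(event_time: Optional[float],
           withdrawal_time: Optional[float],
           study_end: float = STUDY_END) -> Tuple[float, int]:
    """Reduce one subject to (observed time, event indicator).

    The indicator is 1 if death is observed during follow-up and
    0 if the subject is right-censored.
    """
    # Observation stops at the end of the study or at withdrawal, whichever is first.
    stop = study_end if withdrawal_time is None else min(study_end, withdrawal_time)
    if event_time is not None and event_time <= stop:
        return event_time, 1  # death observed while still under follow-up
    return stop, 0            # right-censored: death, if any, happens after `stop`


print(encode(4.0, None))   # death observed at t = 4
print(encode(12.0, None))  # dies after the study ends, so censored at t = 10
print(encode(6.0, 2.0))    # withdraws at t = 2 before dying, so censored at t = 2
```

In both censored cases the analysis only learns that the survival time exceeds the recorded time. That partial information is what survival methods use instead of discarding the subject.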

Left censoring:

Occurs when a subject's survival time is incomplete on the left side of the follow-up period: the event is known to have happened before some time, but not exactly when
  • e.g. when following up a patient after being tested for an infection, we don't know the exact time of exposure
  • Less common than right censoring
  • Survival analysis can account for this (see ref 4)

ASSUMPTION: censoring must be independent of the event being studied (non-informative censoring)!

The Survival Function (Survival curve)

\[ S(t)=\Pr(T>t) \]
The survival function (\( S \)) is the probability that the time of death (\( T \)) is greater than some specified time (\( t \)).
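\( S(t) \) can be estimated directly from right-censored data with the Kaplan-Meier (product-limit) estimator listed in the topics above, \( \hat S(t)=\prod_{t_j\le t}(1-d_j/n_j) \). Here \( d_j \) is the number of deaths at time \( t_j \) and \( n_j \) is the number still at risk just before it. The plain-Python function below is a minimal sketch of that formula. The name `kaplan_meier` and the toy data are hypothetical. In R, survfit() with a formula such as Surv(time, event) ~ 1 produces this estimate.

```python
from collections import Counter
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) = Pr(T > t).

    times:  observed time per subject (death or censoring time)
    events: 1 if death was observed, 0 if right-censored
    Returns (t_j, S(t_j)) at each distinct death time t_j; S is flat in between.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # n_j: subjects still under observation just before t. Subjects censored
        # at exactly t are counted as at risk (the usual convention).
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.3f}")
```

Censored subjects never reduce \( \hat S \) directly. They simply leave the risk set, which makes each later drop larger than it would be if the censored subjects were treated as deaths or ignored.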

In a regression model, it is built from two components:

  • The underlying hazard function (how the risk of death per unit time changes over time at baseline covariate values)
  • The effect parameters (how the hazard varies in response to the covariates)
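The link between these components and \( S(t) \) can be written out. The first line uses the standard definitions for a continuous survival time \( T \) with density \( f(t) \), hazard \( h(t) \) and cumulative hazard \( H(t) \). The second line assumes the covariates act multiplicatively on a baseline hazard \( h_0(t) \), which is the proportional-hazards form used by the Cox model below. Other ways of combining the two components exist.

\[ h(t)=\lim_{\Delta t\to 0}\frac{\Pr(t\le T<t+\Delta t \mid T\ge t)}{\Delta t}=\frac{f(t)}{S(t)}, \qquad H(t)=\int_0^t h(u)\,du, \qquad S(t)=\exp\{-H(t)\} \]

\[ h(t\mid x)=h_0(t)\exp(\beta^\top x) \quad\Longrightarrow\quad S(t\mid x)=\exp\{-H_0(t)\,e^{\beta^\top x}\}=S_0(t)^{\exp(\beta^\top x)} \]

Under this form the baseline hazard fixes the shape of the curve over time. The effect parameters \( \beta \) then raise or lower that curve for each subject.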

The Cox Proportional-Hazards Model

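The most common model used to determine the effects of covariates on survival:
\[ h_i(t)=h_0(t)\exp(\beta_{1}x_{i1} + \beta_{2}x_{i2} + \dots + \beta_{k}x_{ik}) \]
where \( h_i(t) \) is the hazard for subject \( i \) at time \( t \), \( h_0(t) \) is the baseline hazard, and \( x_{i1},\dots,x_{ik} \) are the subject's \( k \) covariates.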
It is a semi-parametric model:
  • The baseline hazard function \( h_0(t) \) is left unspecified
  • The effects of the covariates are multiplicative, entering through \( \exp(\beta_{1}x_{i1} + \dots + \beta_{k}x_{ik}) \)
  • It therefore makes no arbitrary assumptions about the shape/form of the baseline hazard function
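Because \( h_0(t) \) is left unspecified, the coefficients are estimated from Cox's partial likelihood, in which the baseline hazard cancels out. Each observed death contributes \( \exp(\beta x_i)/\sum_{j\in R(t_i)}\exp(\beta x_j) \), where \( R(t_i) \) is the set of subjects still at risk at \( t_i \).

The sketch below maximises this partial likelihood by Newton-Raphson for a single covariate in plain Python. The function names and toy data are hypothetical. Tied death times simply share the full risk set, which amounts to Breslow's approximation; R's coxph() defaults to Efron's method instead.

```python
import math
from typing import List, Tuple


def risk_set_sums(beta: float, x: List[float], times: List[float],
                  t: float) -> Tuple[float, float, float]:
    """Sums of w, w*x and w*x^2 over subjects at risk at t, where w = exp(beta*x)."""
    s0 = s1 = s2 = 0.0
    for xj, tj in zip(x, times):
        if tj >= t:  # still under observation just before t
            w = math.exp(beta * xj)
            s0 += w
            s1 += w * xj
            s2 += w * xj * xj
    return s0, s1, s2


def partial_loglik(beta: float, x: List[float], times: List[float],
                   events: List[int]) -> float:
    """Cox log partial likelihood; the baseline hazard h0(t) never appears."""
    ll = 0.0
    for xi, ti, ei in zip(x, times, events):
        if ei:
            s0, _, _ = risk_set_sums(beta, x, times, ti)
            ll += beta * xi - math.log(s0)
    return ll


def fit_cox_1d(x: List[float], times: List[float], events: List[int],
               max_iter: int = 50, tol: float = 1e-10) -> float:
    """Newton-Raphson on the partial likelihood for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for xi, ti, ei in zip(x, times, events):
            if ei:
                s0, s1, s2 = risk_set_sums(beta, x, times, ti)
                mean = s1 / s0                 # weighted mean of x in the risk set
                score += xi - mean             # first derivative of the log-likelihood
                info += s2 / s0 - mean * mean  # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data: x = 1 for treated subjects, 0 for controls.
x = [0, 0, 0, 1, 1, 1]
times = [1, 3, 5, 2, 4, 6]
events = [1, 1, 0, 1, 1, 1]

beta_hat = fit_cox_1d(x, times, events)
print(f"beta = {beta_hat:.3f}, hazard ratio = {math.exp(beta_hat):.3f}")
```

Only the ordering of event times and the risk sets enter the fit. This is why no shape has to be assumed for \( h_0(t) \).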

The Proportional Hazards Assumption

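  • Each covariate multiplies the hazard by a constant factor (\( \exp(\beta_j) \) per unit increase in \( x_j \))
  • e.g. a drug may halve a subject's risk of death at any time \( t \)
  • The effect is the same at every time point, so the hazards of any two subjects keep a constant ratio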
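The assumption can be checked numerically on a toy example. The plain-Python sketch below encodes the drug example with \( \exp(\beta)=0.5 \). The Weibull baseline, its parameters and the fading-effect alternative are all hypothetical choices.

Under proportional hazards, the treated/untreated hazard ratio is 0.5 at every time, whatever the baseline. An effect that fades over time instead gives a ratio that drifts towards 1, which violates the assumption.

```python
import math


def baseline_hazard(t: float, shape: float = 1.5, scale: float = 10.0) -> float:
    """Hypothetical Weibull baseline hazard h0(t); any positive function would do."""
    return (shape / scale) * (t / scale) ** (shape - 1)


BETA_DRUG = math.log(0.5)  # exp(beta) = 0.5: the drug halves the hazard


def hazard_ph(t: float, treated: int) -> float:
    """Proportional hazards: the covariate effect does not depend on t."""
    return baseline_hazard(t) * math.exp(BETA_DRUG * treated)


def hazard_non_ph(t: float, treated: int) -> float:
    """Hypothetical violation: the drug's effect fades as t grows."""
    beta_t = BETA_DRUG * math.exp(-t / 5.0)
    return baseline_hazard(t) * math.exp(beta_t * treated)


times = [0.5, 1, 2, 5, 10, 20]
ph_ratios = [hazard_ph(t, 1) / hazard_ph(t, 0) for t in times]
non_ph_ratios = [hazard_non_ph(t, 1) / hazard_non_ph(t, 0) for t in times]
for t, r1, r2 in zip(times, ph_ratios, non_ph_ratios):
    print(f"t = {t:>4}: PH ratio = {r1:.3f}, fading-effect ratio = {r2:.3f}")
```

A Cox model fitted to data generated like the second case would report a single averaged hazard ratio. That single number would misstate the effect both early and late in follow-up.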

Violating the PH assumption can seriously invalidate your model!

Survival analysis in R

This uses the survival package.
The response variable is a Surv object, which (in its counting-process form) takes in:

  • start time (after study start)
  • stop time (after study start)
  • whether or not an event occurred

Otherwise, the model is specified in the same way as a standard regression formula, using the coxph() function.

Toolkit
Surv(): Define a survival object

coxph(): Run a Cox PH regression

survfit(): Fit a survival curve to a model or formula
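The counting-process form Surv(start, stop, event) described above lets one subject contribute several rows, typically because a covariate changes during follow-up. The plain-Python sketch below only illustrates that row layout. The `Interval` record, the `split_at_treatment` helper and the treatment-start scenario are hypothetical and are not part of the survival package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    start: float  # start of the interval (time after study start)
    stop: float   # end of the interval (time after study start)
    event: int    # 1 if the event occurred at `stop`, else 0
    treated: int  # covariate value during (start, stop]


def split_at_treatment(entry: float, exit_time: float, died: bool,
                       treatment_start: Optional[float] = None) -> List[Interval]:
    """Hypothetical helper: turn one subject into (start, stop, event) rows."""
    if treatment_start is None or treatment_start >= exit_time:
        return [Interval(entry, exit_time, int(died), 0)]  # never treated
    if treatment_start <= entry:
        return [Interval(entry, exit_time, int(died), 1)]  # treated throughout
    return [
        Interval(entry, treatment_start, 0, 0),            # untreated, no event yet
        Interval(treatment_start, exit_time, int(died), 1),
    ]


# Enrolled at t = 0, starts treatment at t = 3, dies at t = 7.
for row in split_at_treatment(0, 7, True, treatment_start=3):
    print(row)
```

Only the last row of a subject can carry event = 1. The earlier rows say that the subject survived that interval with the covariate value recorded for it.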