Survival Models
Our final chapter concerns models for the analysis of data which have three main characteristics: (1) the dependent variable or response is the waiting time until the occurrence of a well-defined event, (2) observations are censored, in the sense that for some units the event of interest has not occurred at the time the data are analyzed, and (3) there are predictors or explanatory variables whose effect on the waiting time we wish to assess or control. We start with some basic definitions.
7.1 The Hazard and Survival Functions
Let T be a non-negative random variable representing the waiting time until the occurrence of an event. For simplicity we will adopt the terminology of survival analysis, referring to the event of interest as ‘death’ and to the waiting time as ‘survival’ time, but the techniques to be studied have much wider applicability. They can be used, for example, to study age at marriage, the duration of marriage, the intervals between successive births to a woman, the duration of stay in a city (or in a job), and the length of life. The observant demographer will have noticed that these examples include the fields of fertility, mortality and migration.
7.1.1 The Survival Function
We will assume for now that T is a continuous random variable with probability density function (p.d.f.) f(t) and cumulative distribution function (c.d.f.) F(t) = Pr{T < t}, giving the probability that the event has occurred by duration t.
G. Rodríguez. Revised September, 2010
It will often be convenient to work with the complement of the c.d.f., the survival function
\[
S(t) = \Pr\{T \ge t\} = 1 - F(t) = \int_t^{\infty} f(x)\,dx, \tag{7.1}
\]
which gives the probability of being alive just before duration t, or more generally, the probability that the event of interest has not occurred by duration t.
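As a rough numerical illustration of the two expressions for S(t), the sketch below evaluates both 1 − F(t) and the integral of the density from t upward. The text does not specify a distribution, so the exponential distribution and the rate value used here are assumptions chosen purely for illustration, as are the function names and the truncation of the infinite upper limit at a large finite value.

```python
import math

RATE = 0.5  # assumed rate parameter of an illustrative exponential distribution


def density(x, rate=RATE):
    """p.d.f. f(x) of the (assumed) exponential distribution."""
    return rate * math.exp(-rate * x)


def cdf(t, rate=RATE):
    """c.d.f. F(t) of the (assumed) exponential distribution."""
    return 1.0 - math.exp(-rate * t)


def survival_from_cdf(t, rate=RATE):
    """S(t) computed as the complement of the c.d.f., 1 - F(t)."""
    return 1.0 - cdf(t, rate)


def survival_from_density(t, rate=RATE, upper=80.0, steps=100_000):
    """S(t) approximated as the integral of f(x) from t to infinity.

    Uses the trapezoidal rule and truncates the infinite upper limit
    at `upper`, beyond which the density is negligible for this rate.
    """
    width = (upper - t) / steps
    total = 0.5 * (density(t, rate) + density(upper, rate))
    for i in range(1, steps):
        total += density(t + i * width, rate)
    return total * width


# Compare the two routes to S(t) at a few durations.
for t in (0.0, 1.0, 2.0, 5.0):
    from_cdf = survival_from_cdf(t)
    from_density = survival_from_density(t)
    print(f"t={t:4.1f}  1-F(t)={from_cdf:.6f}  integral={from_density:.6f}")
```

The two columns should agree to within the error of the numerical integration, and S(t) should start at one and decrease as t grows.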
7.1.2 The Hazard Function
An alternative characterization of the distribution of T is given by the hazard function, or