Biostatistics Ph.D. Qualifying Examination (APPLIED)
Instructions: You will find 6 questions in this exam, each worth the same number of points. However, a complete solution to one question is worth more than partial (half) solutions to two questions. You are asked to solve 4 of the 6 questions. You need to indicate clearly which questions you selected to solve, and the solution to each question has to be saved in a separate file on the memory stick. If you submit solutions to more than four questions, the first four will be graded. Most questions have comma-delimited data files associated with them. All data files can be found on the memory stick provided to you. The exam will last for 4 hours, starting at 9am and ending at 1pm.
1. File lung.csv contains information on 137 lung cancer patients. The four variables are survival time in days (Time), death indicator (Delta), treatment (Treatment), and cell type (Celltype). In particular:
Time (until death or end of study, in days)
Celltype (1=squamous, 2=smallcell, 3=adeno, 4=large)
Use a statistical package to answer the following questions:
i. Plot the Kaplan-Meier survival curve based on all 137 patients.
ii. Provide a 95% confidence interval for the six-month survival probability based on
the log transformation of the cumulative hazard.
iii. Provide a 95% confidence interval for the median survival time.
iv. Provide a 95% confidence interval for the mean survival time.
v. Interpret the confidence intervals obtained in points ii., iii. and iv.
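As a sketch of the computations behind parts i., ii. and iv., the pure-Python code below builds the Kaplan-Meier curve with Greenwood's variance sum V(t) = sum of d_j / (Y_j (Y_j - d_j)) over death times t_j <= t (Y_j at risk, d_j deaths). The interval based on the log transformation of the cumulative hazard H(t) = -log S(t) runs from S(t0)^(1/theta) to S(t0)^theta with theta = exp(z sqrt(V(t0)) / log S(t0)). The mean is computed as the area under the curve up to a time tau, because the Kaplan-Meier curve is undefined beyond the largest observation when that observation is censored. The function names and the toy times/events are hypothetical; they are not the lung.csv data. Six months also has to be expressed in days first (for example about 183 days, an assumption about month length). In practice a package does this, e.g. R's `survival::survfit` with `conf.type = "log-log"`.

```python
import math


def km_curve(times, events):
    """Kaplan-Meier estimate: list of (t, S(t), Greenwood sum) at each distinct death time."""
    steps, surv, gw = [], 1.0, 0.0
    death_times = sorted({t for t, d in zip(times, events) if d == 1})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)  # subjects censored at t still count as at risk
        deaths = sum(1 for ti, d in zip(times, events) if ti == t and d == 1)
        surv *= 1.0 - deaths / at_risk
        if at_risk > deaths:  # Greenwood term is undefined once everyone at risk has died
            gw += deaths / (at_risk * (at_risk - deaths))
        steps.append((t, surv, gw))
    return steps


def surv_at(steps, t0):
    """Step-function lookup: S(t0) and the Greenwood sum at the last death time <= t0."""
    s, gw = 1.0, 0.0
    for t, st, g in steps:
        if t > t0:
            break
        s, gw = st, g
    return s, gw


def loglog_ci(steps, t0, z=1.959964):
    """S(t0) and its CI from the log transform of the cumulative hazard H(t0) = -log S(t0)."""
    s, gw = surv_at(steps, t0)
    if s <= 0.0 or s >= 1.0:
        return s, None, None  # interval is not defined at S = 0 or S = 1
    theta = math.exp(z * math.sqrt(gw) / math.log(s))
    return s, s ** (1.0 / theta), s ** theta


def restricted_mean(steps, tau):
    """Area under the Kaplan-Meier curve on [0, tau] (mean survival restricted to tau)."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, st, _ in steps:
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, st
    return area + prev_s * (tau - prev_t)


# Hypothetical toy data (not lung.csv): time in days, 1 = death, 0 = censored.
times = [5, 8, 12, 12, 20, 25, 30, 40]
events = [1, 1, 1, 0, 1, 0, 1, 1]
steps = km_curve(times, events)
s, lo, hi = loglog_ci(steps, t0=20)
print(round(s, 5), round(lo, 3), round(hi, 3))
print(restricted_mean(steps, tau=max(times)))
```

Here the largest toy observation is a death, so the restricted mean coincides with the full Kaplan-Meier mean; with a censored largest time it would underestimate the true mean.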
(b) Assume that survival time follows a lognormal distribution. The maximum likelihood
estimates of µ and σ for the lognormal distribution are 4.158 and 1.378, respectively. Use these to compute a point estimate of the mean survival time.
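If log T is normal with mean µ and standard deviation σ, the mean of T is E(T) = exp(µ + σ²/2). A minimal sketch of the arithmetic with the estimates above (the function name is illustrative):

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of T when log T ~ Normal(mu, sigma**2)."""
    return math.exp(mu + sigma ** 2 / 2.0)


mean_days = lognormal_mean(4.158, 1.378)
print(round(mean_days, 1))  # 165.2
```

The estimate, about 165 days, is well above the lognormal median exp(µ), about 64 days, which reflects the right skew of the fitted distribution.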
i. Plot the survival curves for each of the treatment groups.
ii. Perform the logrank test to test the equality of the survival curves. What is your conclusion? For this hypothesis test, state the null and alternative hypotheses, the value of the test statistic, the null distribution and the p-value.
iii. Formally compare the equality of the survival curves at six months.
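A minimal pure-Python sketch of the two-sample logrank statistic for part ii., assuming two treatment groups (the number of groups is not stated here; with more groups the statistic uses the vector of observed minus expected deaths and its covariance matrix). Under H0: S_A(t) = S_B(t) for all t, against the alternative that the curves differ, the statistic is approximately chi-square with 1 degree of freedom. For part iii., one simple option (an assumption, not necessarily the intended method) is a z-test on the difference between the two Kaplan-Meier estimates at six months, using their Greenwood standard errors. Function names and toy inputs are hypothetical; R's `survival::survdiff` reports the logrank test directly.

```python
import math


def logrank_two_sample(times_a, events_a, times_b, events_b):
    """Two-sample logrank chi-square statistic (1 df) and its p-value."""
    data = [(t, d, 0) for t, d in zip(times_a, events_a)] + \
           [(t, d, 1) for t, d in zip(times_b, events_b)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, d, _ in data if d == 1}):
        n = sum(1 for ti, _, _ in data if ti >= t)                 # total at risk
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)    # at risk in group A
        d = sum(1 for ti, di, _ in data if ti == t and di == 1)    # total deaths at t
        d_a = sum(1 for ti, di, g in data if ti == t and di == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n                             # observed - expected in A
        if n > 1:  # hypergeometric variance; zero when only one subject is at risk
            var += n_a * (n - n_a) * d * (n - d) / (n ** 2 * (n - 1))
    stat = o_minus_e ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square with 1 df
    return stat, p_value


def compare_at_time(s_a, se_a, s_b, se_b):
    """z-test of S_A(t0) = S_B(t0) from two independent KM estimates and their SEs."""
    z = (s_a - s_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value


# Hypothetical toy data in which every observation is a death.
stat, p = logrank_two_sample([1, 2], [1, 1], [3, 4], [1, 1])
print(round(stat, 4), round(p, 4))
```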
2. An investigator is planning an epidemiologic study of sexually transmitted infections. He plans to send out N = 200 letters to eligible young women to invite them and their male partners to participate in the study. Let Y_i be the number of recruits resulting from the i-th letter, i.e., Y_i = 0 if the woman declines to participate, Y_i = 1 if the woman agrees to participate as an individual subject, and Y_i = 2 if the woman decides to participate together with her male partner, i = 1, 2, ..., N. Assuming Y_1, Y_2, ..., Y_N are independently distributed random variables, the epidemiologist assigned reasonable probabilities p_i(j) = P(Y_i = j) for j = 0, 1, 2, respectively, based on pilot data. These probabilities are summarized below:
How many participants do you expect to recruit? Compute an approximate 90% confidence interval for the total number of participants the study will be able to recruit. Explain the rationale and assumptions behind your interval.
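Assuming the same probabilities p(0), p(1), p(2) apply to every letter, E(Y_i) = p(1) + 2p(2) and Var(Y_i) = p(1) + 4p(2) - [p(1) + 2p(2)]². Because the Y_i are independent, the total T = Y_1 + ... + Y_N has mean N E(Y_i) and variance N Var(Y_i), and the central limit theorem gives the approximate 90% interval N E(Y_i) ± 1.645 sqrt(N Var(Y_i)). The sketch below implements this; the probabilities in the example call are hypothetical placeholders, not the pilot-data values.

```python
import math


def recruitment_interval(p0, p1, p2, n_letters=200, z=1.644854):
    """Expected total recruits and an approximate (CLT-based) 90% interval."""
    assert abs(p0 + p1 + p2 - 1.0) < 1e-9, "probabilities must sum to 1"
    mean_y = p1 + 2 * p2                     # E(Y_i)
    var_y = p1 + 4 * p2 - mean_y ** 2        # Var(Y_i) = E(Y_i^2) - E(Y_i)^2
    total_mean = n_letters * mean_y
    total_sd = math.sqrt(n_letters * var_y)  # independence: variances add
    return total_mean, total_mean - z * total_sd, total_mean + z * total_sd


# Hypothetical probabilities for illustration only; substitute the pilot-data values.
print(recruitment_interval(0.5, 0.3, 0.2))
```

With N = 200 letters the normal approximation to the sum is usually adequate unless one outcome has probability very close to 1.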
3. The random-zero sphygmomanometer is a blood pressure meter widely used in research studies because it is believed to have superior measurement accuracy to the standard mercury blood pressure meter. A clinical investigator purchased a new random-zero sphygmomanometer for his study. He decided to first test the device on his research assistant. He took 8 systolic blood pressure measurements in the right arm of his assistant with the new device, while the assistant was in a seated position. The eight readings are given below: 128, 119, 126, 129, 123, 122, 124, 121 (mmHg). Assuming that these eight measurements were random observations from the true systolic blood pressure distribution of this individual, please derive a 95% prediction interval for the systolic blood pressure measurement of the next random observation.
To receive full credit, you need to derive the prediction interval
formula step by step and clearly state all assumptions.
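Under the assumption that the readings X_1, ..., X_n and the next reading X_{n+1} are independent N(µ, σ²), X_{n+1} - X̄ ~ N(0, σ²(1 + 1/n)) and is independent of the sample variance S², so (X_{n+1} - X̄) / (S sqrt(1 + 1/n)) follows a t distribution with n - 1 degrees of freedom. Inverting gives X̄ ± t_{0.975, n-1} S sqrt(1 + 1/n). The sketch below evaluates this for the eight readings; since the Python standard library has no t quantile function, the tabulated value t_{0.975, 7} ≈ 2.3646 is hard-coded.

```python
import math
import statistics

readings = [128, 119, 126, 129, 123, 122, 124, 121]  # systolic BP, mmHg
n = len(readings)
xbar = statistics.mean(readings)   # sample mean
s = statistics.stdev(readings)     # sample SD with divisor n - 1
T_975_DF7 = 2.364624               # tabulated t quantile, 95% two-sided, 7 df

half_width = T_975_DF7 * s * math.sqrt(1 + 1 / n)
lower, upper = xbar - half_width, xbar + half_width
print(f"mean={xbar:.2f}, sd={s:.3f}, 95% PI=({lower:.2f}, {upper:.2f}) mmHg")
```

With these readings the sample mean is 124 mmHg and S² = 12, giving an interval of roughly (115.3, 132.7) mmHg. The factor sqrt(1 + 1/n) is what distinguishes a prediction interval from a confidence interval for µ: it adds the variability of the new observation itself.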
4. File bone.csv contains data on 43 bone marrow transplant patients, collected by the Ohio State University Bone Marrow Transplant Unit. All patients had either Hodgkin's disease (HOD) or non-Hodgkin's lymphoma (NHL) and were given either an allogeneic (Allo) transplant from a human leukocyte antigen (HLA) matched sibling donor or an autogenic (Auto) transplant. Also included are two possible explanatory variables, Z1 = Karnofsky score at transplant and Z2 = the waiting time in months from diagnosis to transplant. The Karnofsky score is a measure of performance status; it runs from 100 to 0, where 100 denotes "perfect" health and 0 is death. Of interest is whether the leukemia-free survivorship depends on the transplant-disease type (Allo-NHL, Auto-NHL, Allo-HOD, or Auto-HOD).
The data are taken from Table 1.5 on page 12 of Klein and Moeschberger (second edition).
(a) Fit a Cox proportional hazards model containing Z1, Z2, and transplant-disease type using the Breslow, Efron and exact methods for handling tied survival times. Which of the three methods would you adopt and why?
(b) Assess the significance of the model in part (a) using the likelihood ratio, Wald, and
score tests. Does the model predict survivorship?
(c) Based on the model in part (a), which variables appear to be insignificant to the model?
(d) Remove the insignificant variables and test for the adequacy of the reduced model
compared to the original model using the likelihood ratio test.
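The likelihood ratio statistic for part (d) is 2(log L_full - log L_reduced), referred to a chi-square distribution whose degrees of freedom equal the number of coefficients dropped. Packages report this directly; the sketch below only makes the arithmetic explicit, using a closed-form chi-square upper tail that is valid for integer degrees of freedom. The log-likelihood values in the example are hypothetical, not results from bone.csv.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2(1 - Phi(sqrt x)) plus a finite series in x
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def lr_test(loglik_full, loglik_reduced, df):
    """Likelihood ratio statistic and p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)


# Hypothetical log partial likelihoods, two coefficients removed.
print(lr_test(-100.0, -103.0, df=2))
```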
(e) Write down the fitted reduced model and discuss the effects of independent variables
(f) Plot the estimated survival function versus time for an NHL patient who waited 60 days to receive an Allo transplant and whose pre-transplant Karnofsky score was 10.
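Under the Cox model, the estimated survival function for a covariate vector z is S(t | z) = S_0(t)^exp(β'z), where S_0 is the estimated baseline survival (all covariates equal to zero). A minimal sketch follows; the baseline steps, coefficient and covariate coding are hypothetical. Because Z2 is recorded in months, a 60-day wait corresponds to roughly 2 months (assuming 30-day months). In R, `survfit()` applied to a `coxph` fit with a `newdata` row carries out this step.

```python
import math


def profile_survival(baseline, beta, z):
    """S(t | z) = S0(t) ** exp(beta'z) at each baseline step (t, S0(t))."""
    risk = math.exp(sum(b * zi for b, zi in zip(beta, z)))  # relative risk exp(beta'z)
    return [(t, s0 ** risk) for t, s0 in baseline]


# Hypothetical baseline survival steps and coefficient, for illustration only.
baseline = [(10, 0.95), (30, 0.85), (90, 0.70)]
beta = [math.log(2.0)]   # one hypothetical covariate with hazard ratio 2
z = [1.0]
print(profile_survival(baseline, beta, z))
```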
(g) Conduct residual analyses to assess the overall adequacy of the fitted model.
(h) Conduct graphical checks to assess the proportional hazards assumption.
The variables in the dataset are as follows:
- (1=Non Hodgkin lymphoma, 2=Hodgkins disease)
- Death/relapse indicator (0=alive, 1=dead)
Transplant - Waiting time to transplant in months
5. File cd4long.csv contains data from a randomized, double-blind study of AIDS patients with advanced immune suppression. Patients were randomized to one of four daily regimens containing 600mg of zidovudine. Measurements of CD4 counts were scheduled to be collected at baseline and at 8-week intervals during follow-up. However, the CD4 count data are unbalanced due to mistimed measurements and missing data that resulted from skipped visits and dropout. The number of measurements of CD4 counts during the first 40 weeks of follow-up varied from 1 to 9, with a median of 4. The response variable is the log-transformed CD4 count, log(CD4 count + 1) (logCD4), available on 1309 patients identified by ID. The categorical variable Treatment is coded 1 = zidovudine alternating monthly with 400mg didanosine, 2 = zidovudine plus 2.25mg of zalcitabine, 3 = zidovudine plus 400mg of didanosine, and 4 = zidovudine plus 400mg of didanosine plus 400mg of nevirapine. The variable Week represents time since baseline (in weeks).
(a) Fit a linear mixed model where each subject’s CD4 trajectory is represented by a
randomly varying line. Allow the average slopes to vary by treatment group, but
assume the baseline mean response to be the same in the four groups. What treatments are beneficial (higher CD4 count is better) according to this analysis?
(b) Fit the model with the same mean structure as in part (a), but with randomly varying
intercepts only. Is this model defensible? Why?
(c) Fit a model with each subject's CD4 trajectory represented by a randomly varying piecewise linear spline with a knot at week 16, that is, a model with random intercepts and two random slopes, one for the log CD4 counts before week 16 and one after week 16. Allow the average slopes for changes in response before and after week 16 to vary by treatment group, but assume the baseline mean response to be the same in the four groups. Can you compare this model with the model in part (a) directly? Why or why not?
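One way to write the mean structure of part (c) (an assumed parameterization) uses the basis min(Week, 16) and (Week - 16)+ = max(Week - 16, 0): E(logCD4) = β0 + Σ_g β1g I(Treatment = g) min(Week, 16) + Σ_g β2g I(Treatment = g) (Week - 16)+. There is a single intercept (common baseline mean) and no treatment main effect; β1g is the group-g slope before week 16 and β2g the slope after it. The random effects use the same basis (1, min(Week, 16), (Week - 16)+). Under this coding, the null hypothesis of part (d) sets the four β1g equal and the four β2g equal, i.e. 6 linear constraints. The sketch below only builds fixed-effects design rows; fitting would be done in a mixed-model package such as R's lme4 or SAS PROC MIXED.

```python
KNOT = 16.0  # knot at week 16


def spline_basis(week, knot=KNOT):
    """Piecewise-linear basis: (time before the knot, time after the knot)."""
    return min(week, knot), max(week - knot, 0.0)


def fixed_effects_row(week, treatment, n_groups=4):
    """Design row: common intercept, then group-specific pre-16 and post-16 slopes."""
    pre, post = spline_basis(week)
    row = [1.0]  # same baseline mean in all groups (no treatment main effect)
    row += [pre if treatment == g else 0.0 for g in range(1, n_groups + 1)]
    row += [post if treatment == g else 0.0 for g in range(1, n_groups + 1)]
    return row


print(fixed_effects_row(8, 2))   # a visit before the knot, Treatment 2
print(fixed_effects_row(24, 3))  # a visit after the knot, Treatment 3
```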
(d) Based on the model in part (c), construct a test of the null hypothesis of no treatment
effects in the changes in log CD4 counts. What are your conclusions?
(e) Based on the analysis in part (c), how would you interpret the treatment effects from baseline to week 16 and from week 16 to week 40?
(f) What assumptions need to be made in the analysis in part (c) to guarantee the validity
6. Data set bodyweight.csv contains weight data on offspring of genetically modified mice.
There were 3 groups of mice. The first 2 groups consist of offspring of mice lacking a milk production protein, with the first group fed by wild-type mice (GM-XF) and the second group fed by their own mothers (GM). The third group consists of wild-type offspring fed by their own mothers (WT). All pups were weaned on day 23 by separating them from the nursing female and placing them in cages with same-sex littermates. From weaning until the age of 42 days, water-soaked diet was provided ad libitum in Petri dishes on the floor of the weaned pups' cages in all groups to facilitate solid food intake by growth-impaired animals. Before weaning, the pups were individually weighed on days 1, 3, 7, 14 and 21.
After weaning, the pups were weighed on day 23 and weekly until day 57. Investigators are interested in answers to the following questions:
(a) Are the rates of weight gain different before and after weaning?
(b) Do the rates depend on the group (GM, GM-XF, and WT)?
(c) Did the three groups differ in weight gain over the study period?
(d) Did male mice grow faster than female mice?
State all the assumptions you make in the analysis, including the form of the mean model and the covariance structure, with the justification for both. (Hint: exploratory data analysis might be very helpful.)
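As one exploratory step in the spirit of the hint (an assumption about how to start, not the required final model), each pup's weight trajectory can be summarized by separate least-squares slopes before and after weaning on day 23. Plotting these slopes by group and sex gives a first look at questions (a) to (d) and at whether a piecewise-linear mean with a knot at day 23 is reasonable. Here day 23 is treated as the first post-weaning measurement; the function names and the example trajectory are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def pre_post_slopes(days, weights, weaning_day=23):
    """Separate growth rates (weight units per day) before and after weaning."""
    pre = [(d, w) for d, w in zip(days, weights) if d < weaning_day]
    post = [(d, w) for d, w in zip(days, weights) if d >= weaning_day]
    return ols_slope(*zip(*pre)), ols_slope(*zip(*post))


# Hypothetical days and weights for one pup, for illustration only.
days = [1, 3, 7, 14, 21, 23, 30, 37, 44, 51]
weights = [1.5, 2.4, 4.0, 6.8, 9.5, 10.0, 13.5, 16.4, 18.9, 20.8]
print(pre_post_slopes(days, weights))
```

The spread of these per-pup slopes within each group also hints at whether random slopes are needed in the covariance structure.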