Preface

References

Acknowledgements

Notation

1 What’s the problem with missing data?

1.1 What do we mean by missing data?

1.1.1 Monotone and non-monotone missing data

1.1.2 Modeling missingness, modeling the missing value and ignorability

1.1.3 Types of missingness (MCAR, MAR, and MNAR)

1.1.4 Missing data and study objectives

1.2 An illustration

1.3 Why can’t I use only the available primary endpoint data?

1.4 What’s the problem with using last observation carried forward?

1.5 Can we just assume that data are missing at random?

1.6 What can be done if data may be missing not at random?

1.7 Stress-testing study results for robustness to missing data

1.8 How the pattern of dropouts can bias the outcome

1.9 How do we formulate a strategy for missing data?

1.10 Description of example datasets

1.10.1 Example dataset in Parkinson’s disease treatment

1.10.2 Example dataset in insomnia treatment

1.10.3 Example dataset in mania treatment

1.A Appendix: Formal definitions of MCAR, MAR, and MNAR

References

2 The prevention of missing data

2.1 Introduction

2.2 The impact of ‘too much’ missing data

2.2.1 Example from human immunodeficiency virus

2.2.2 Example from acute coronary syndrome

2.2.3 Example from studies in pain

2.3 The role of the statistician in the prevention of missing data

2.3.1 Illustrative example from HIV

Step 1: Quantifying the amount of missing data in previous trials, and its resultant impact

Step 2: Identifying subgroups of subjects who require an increased level of trial retention support

Step 3: Translating statistical analysis of previous trial data into information to inform future subject care

Step 4: Education of the clinical trial team and participation in the creation of missing data prevention plans

2.4 Methods for increasing subject retention

2.5 Improving understanding of reasons for subject withdrawal

2.6 Acknowledgements

2.7 Appendix 2.A: example protocol text for missing data prevention

References

3 Regulatory guidance – a quick tour

3.1 International Conference on Harmonization guideline: Statistical principles for clinical trials: E9

3.2 The U.S. and EU regulatory documents

3.3 Key points in the regulatory documents on missing data

3.4 Regulatory guidance on particular statistical approaches

3.4.1 Available cases

3.4.2 Single imputation methods

3.4.3 Methods that generally assume MAR

3.4.4 Methods that are used assuming MNAR

3.5 Guidance about how to plan for missing data in a study

3.6 Differences in emphasis between the NRC report and EU guidance documents

3.6.1 The term “conservative”

3.6.2 Last observation carried forward

3.6.3 Post hoc analyses

3.6.4 Non-monotone or intermittently missing data

3.6.5 Assumptions should be readily interpretable

3.6.6 Study report

3.6.7 Training

3.7 Other technical points from the NRC report

3.7.1 Time-to-event analyses

3.7.2 Tipping point sensitivity analyses

3.8 Other U.S./EU/international guidance documents that refer to missing data

3.8.1 Committee for Medicinal Products for Human Use guideline on anticancer products, recommendations on survival analysis

3.8.2 U.S. guidance on considerations when research supported by Office for Human Research Protections is discontinued

3.8.3 FDA guidance on data retention

3.9 And in practice?

References

4 A guide to planning for missing data

4.1 Introduction

4.1.1 Missing data may bias trial results or make them more difficult to generalize to subjects outside the trial

4.1.2 Credibility of trial results when there is missing data

4.1.3 Demand for better practice with regard to missing data

4.2 Planning for missing data

4.2.1 The case report form and non-statistical sections of the protocol

4.2.2 The statistical sections of the protocol and the statistical analysis plan

4.2.2.1 Summary of what the protocol should say about missing data in its statistical sections

4.2.2.2 Prespecification and flexibility in planning for missing data

4.2.2.3 Sources of bias in analyses of missing data

4.2.3 Using historic data to narrow the choice of primary analysis and sensitivity analyses

4.2.3.1 Using historic data, example 1: Parkinson’s disease

4.2.3.2 Using historic data, example 2: insomnia

4.2.3.3 Using historic data, example 3: mania

4.2.3.4 Using historic data, general conclusions from examples

4.2.4 Key points in choosing an approach for missing data

4.2.4.1 Assumptions for missing data: when to consider, when to avoid

MAR

LOCF-like approaches

BOCF-like and worst-case approaches

Control-based or reference-based imputation

Assuming a variety of outcomes for missing values, depending on reason for discontinuation

4.2.4.2 Methods of implementing assumptions: when to consider, when to avoid

Likelihood-based analysis of continuous outcome variables

Likelihood-based analysis of binary response variables using longitudinal generalized linear mixed models (GLMM)

Doubly robust estimation

Multiple imputation

Pattern-mixture models

Selection models and shared parameter models

4.3 Exploring and presenting missingness

4.4 Model checking

4.5 Interpreting model results when there is missing data

4.6 Sample size and missing data

Appendix 4.A: Sample protocol/SAP text for study in Parkinson’s disease

Appendix 4.B: A formal definition of a sensitivity parameter

References

5 Mixed Models for Repeated Measures Using Categorical Time Effects (MMRM)

5.1 Introduction

5.2 Specifying the MMRM

5.2.1 The mixed model

5.2.2 Covariance structures

5.2.2.1 Unstructured

5.2.2.2 Toeplitz and heterogeneous Toeplitz patterns

5.2.2.3 Spatial covariance patterns

5.2.3 MMRM versus generalized estimating equations (GEE)

5.2.4 MMRM versus LOCF

5.3 Understanding the data

5.3.1 Parkinson’s disease example

5.3.2 A second example showing the usefulness of plots: CATIE

5.4 Applying the MMRM

5.4.1 Specifying the model

5.4.1.1 Analysis plan

5.4.1.2 Strategies to improve convergence

5.4.1.3 SAS code

5.4.2 Interpreting and presenting results

5.4.2.1 Unstructured covariance pattern

5.4.2.2 Heterogeneous Toeplitz covariance pattern

5.4.2.3 Spatial power covariance pattern

5.4.2.4 LOCF

5.4.2.5 GEE

5.5 Additional MMRM topics

5.5.1 Treatment by subgroup and treatment by site interactions

5.5.2 Calculating the effect size

5.5.3 Another strategy to model baseline

5.6 Logistic regression MMRM with generalized linear mixed model (GLMM)

5.6.1 The generalized linear mixed model

5.6.2 Specifying the model

5.6.2.1 Analysis plan

5.6.2.2 Pooling investigator sites to improve convergence

5.6.2.3 SAS code

5.6.3 Interpreting and presenting results

5.6.3.1 Logistic GLMM model adjusting for pooled site

5.6.3.2 Alternate repeated measures models

5.6.3.3 Univariate logistic models

5.6.3.4 Adjustment for pooled site versus country

5.6.4 Other modeling options

References

Table of SAS Code Fragments

6 Multiple imputation

6.1 Introduction

6.1.1 How is MI different from single imputation?

6.1.2 How is MI different from maximum likelihood methods?

6.1.3 MI’s assumptions about missingness mechanism

6.1.4 A general 3-step process for multiple imputation and inference

6.1.5 Imputation versus analysis model

6.1.6 Note on notation use

6.2 Imputation phase

6.2.1 Missing patterns: monotone and non-monotone

6.2.2 How do we get multiple imputations?

6.2.3 Imputation strategies: sequential univariate versus joint multivariate

6.2.4 Overview of the imputation methods

6.2.5 Reusing the multiply-imputed dataset for different analyses or summary scales

6.3 Analysis phase: analyzing multiple imputed datasets

6.4 Pooling phase: combining results from multiple datasets

6.4.1 Combination rules

6.4.2 Pooling analyses of continuous outcomes

6.4.3 Pooling analyses of categorical outcomes

6.5 Required number of imputations

6.6 Some practical considerations

6.6.1 Choosing an imputation model

6.6.2 Multivariate normality

6.6.3 Rounding and restricting the range for the imputed values

6.6.4 Convergence of MCMC

6.7 Pre-specifying details of analysis with multiple imputation

6.A Appendix: Additional methods for multiple imputation

References

7 Analyses under missing-not-at-random assumptions

7.1 Introduction

7.2 Background to sensitivity analyses and pattern-mixture models

7.2.1 The purpose of a sensitivity analysis

7.2.2 Pattern-mixture models as sensitivity analyses

7.2.2.1 Traditional identifying restrictions: complete cases, neighboring cases, available cases

7.2.2.2 The versatility of pattern-mixture models

7.3 Two methods of implementing sensitivity analyses via PMMs

7.3.1 A sequential method of implementing pattern-mixture models with MI

7.3.2 Providing stress-testing “what ifs” using pattern-mixture models

7.3.3 Two implementations of pattern-mixture models for sensitivity analyses

7.3.4 Characteristics and limitations of the sequential modeling method of implementing PMMs

7.3.5 PMMs implemented using the joint modeling method

7.3.6 Characteristics of the joint modeling method of implementing PMMs

7.3.7 Summary of differences between the joint modeling and sequential modeling methods

7.4 A “toolkit”: implementing sensitivity analyses via SAS

7.4.1 Reminder: general approach using MI with regression

7.4.2 Sensitivity analyses assuming withdrawals have trajectory of control arm

7.4.3 Sensitivity analyses assuming withdrawals have distribution of control arm

7.4.4 BOCF-like and LOCF-like analyses

7.4.5 The general principle of using selected subsets of observed data as the basis to implement “what if” stress-tests

7.4.6 Using a mixture of “what ifs,” depending on reason for discontinuation

7.4.7 Assuming trajectory of withdrawals is worse by some delta: delta adjustment and tipping point analysis

7.4.7.1 Illustrative dataset for a clinical trial in mania

7.4.7.2 Implementing a delta adjustment

7.4.7.3 Implementing a “tipping point” sensitivity analysis

7.5 Examples of realistic strategies and results for illustrative datasets of three indications

7.5.1 Parkinson’s disease

7.5.2 Insomnia

7.5.3 Mania

Appendix 7.A: How one could implement NCMV using visit-by-visit MI for the example trial

Appendix 7.B: SAS code to model withdrawals from the experimental arm, using observed data from the control arm

Appendix 7.C: SAS code to model early withdrawals from the experimental arm, using the LOCF-like values

Appendix 7.D: SAS macro to impose delta adjustment on a responder variable in the mania dataset

Appendix 7.E: SAS code to implement tipping point via exhaustive scenarios for the withdrawals in the mania dataset

Appendix 7.F: Code to perform sensitivity analyses for the Parkinson’s disease dataset

Appendix 7.G: Code to perform sensitivity analyses for the insomnia dataset

Appendix 7.H: Code to perform sensitivity analyses for the mania dataset

Appendix 7.I: Selection models

Appendix 7.J: Shared parameter models

References

Table of SAS Code Fragments

8 Doubly Robust Estimation

8.1 Introduction

8.2 Inverse probability weighted estimation

8.2.1 IPW estimators for estimating equations

8.2.2 Summary of IPW advantages

8.2.3 IPW disadvantages

8.3 Doubly robust estimation

8.3.1 Doubly robust methods explained

8.3.1.1 Linear regression example

8.3.2 Advantages of DR methods

8.3.3 Limitations of DR methods

8.4 Vansteelandt et al. method for doubly robust estimation

8.4.1 Theoretical justification for the Vansteelandt et al. method

8.4.2 Implementation of the Vansteelandt et al. method for DR estimation

8.4.2.1 Bootstrap estimates of the variance

8.4.2.2 Missingness model for longitudinal data

8.4.2.3 Final analysis model

8.4.2.4 Characteristics of Vansteelandt et al. method macro for calculating DR estimates of treatment effects

Non-monotone missing data

How regression parameters are estimated

Restrictions on regression parameters

Missing covariates

Binary covariates

Binary responses

Auxiliary variables

8.5 Implementing the Vansteelandt et al. method via SAS

8.5.1 Mania dataset

8.5.1.1 Implementing the Vansteelandt et al. method for the illustrative mania dataset

8.5.2 Insomnia dataset

Appendix 8.A: How to implement Vansteelandt et al. method for mania dataset (binary response)

Appendix 8.B: SAS code to calculate estimates from the bootstrapped data sets

Appendix 8.C: How to implement Vansteelandt et al. method for insomnia dataset

References

Table of SAS Code Fragments

Bibliography