What is included with this book?
This book provides practical guidance for statisticians, clinicians, and researchers involved in clinical trials in the biopharmaceutical industry and in medical and public health organisations. Academics and students needing an introduction to handling missing data will also find this book invaluable.
The authors describe how missing data can affect the outcome and credibility of a clinical trial, show by examples how a clinical team can work to prevent missing data, and present the reader with approaches to address missing data effectively.
The book is illustrated throughout with realistic case studies and worked examples, and presents clear and concise guidelines to enable good planning for missing data. The authors show how to handle missing data in a way that is transparent and easy to understand for clinicians, regulators and patients. New developments are presented to improve the choice and implementation of primary and sensitivity analyses for missing data. Many SAS code examples are included, giving the reader a toolbox for implementing analyses under a variety of assumptions.
References
Acknowledgements
Notation
1 What’s the problem with missing data?
1.1 What do we mean by missing data?
1.1.1 Monotone and non-monotone missing data
1.1.2 Modeling missingness, modeling the missing value and ignorability
1.1.3 Types of missingness (MCAR, MAR, and MNAR)
1.1.4 Missing data and study objectives
1.2 An illustration
1.3 Why can’t I use only the available primary endpoint data?
1.4 What’s the problem with using last observation carried forward?
1.5 Can we just assume that data are missing at random?
1.6 What can be done if data may be missing not at random?
1.7 Stress-testing study results for robustness to missing data
1.8 How the pattern of dropouts can bias the outcome
1.9 How do we formulate a strategy for missing data?
1.10 Description of example datasets
1.10.1 Example dataset in Parkinson’s disease treatment
1.10.2 Example dataset in insomnia treatment
1.10.3 Example dataset in mania treatment
1.A Appendix: Formal definitions of MCAR, MAR, and MNAR
References
2 The prevention of missing data
2.1 Introduction
2.2 The impact of ‘too much’ missing data
2.2.1 Example from human immunodeficiency virus
2.2.2 Example from acute coronary syndrome
2.2.3 Example from studies in pain
2.3 The role of the statistician in the prevention of missing data
2.3.1 Illustrative example from HIV
Step 1: Quantifying the amount of missing data in previous trials, and its resultant impact
Step 2: Identifying subgroups of subjects who require an increased level of trial retention support
Step 3: Translating statistical analysis of previous trial data into information to inform future subject care
Step 4: Education of the clinical trial team and participation in the creation of missing data prevention plans
2.4 Methods for increasing subject retention
2.5 Improving understanding of reasons for subject withdrawal
2.6 Acknowledgements
2.7 Appendix 2.A: example protocol text for missing data prevention
References
3 Regulatory guidance – a quick tour
3.1 International Conference on Harmonization guideline: Statistical principles for clinical trials: E9
3.2 The U.S. and EU regulatory documents
3.3 Key points in the regulatory documents on missing data
3.4 Regulatory guidance on particular statistical approaches
3.4.1 Available cases
3.4.2 Single imputation methods
3.4.3 Methods that generally assume MAR
3.4.4 Methods that are used assuming MNAR
3.5 Guidance about how to plan for missing data in a study
3.6 Differences in emphasis between the NRC report and EU guidance documents
3.6.1 The term “conservative”
3.6.2 Last observation carried forward
3.6.3 Post hoc analyses
3.6.4 Non-monotone or intermittently missing data
3.6.5 Assumptions should be readily interpretable
3.6.6 Study report
3.6.7 Training
3.7 Other technical points from the NRC report
3.7.1 Time-to-event analyses
3.7.2 Tipping point sensitivity analyses
3.8 Other U.S./EU/international guidance documents that refer to missing data
3.8.1 Committee for Medicinal Products for Human Use guideline on anticancer products, recommendations on survival analysis
3.8.2 U.S. guidance on considerations when research supported by Office of Human Research Protections is discontinued
3.8.3 FDA guidance on data retention
3.9 And in practice?
References
4 A guide to planning for missing data
4.1 Introduction
4.1.1 Missing data may bias trial results or make them more difficult to generalize to subjects outside the trial
4.1.2 Credibility of trial results when there is missing data
4.1.3 Demand for better practice with regard to missing data
4.2 Planning for missing data
4.2.1 The case report form and non-statistical sections of the protocol
4.2.2 The statistical sections of the protocol and the statistical analysis plan
4.2.2.1 Summary of what the protocol should say about missing data in its statistical sections
4.2.2.2 Prespecification and flexibility in planning for missing data
4.2.2.3 Sources of bias in analyses of missing data
4.2.3 Using historic data to narrow the choice of primary analysis and sensitivity analyses
4.2.3.1 Using historic data, example 1: Parkinson’s disease
4.2.3.2 Using historic data, example 2: insomnia
4.2.3.3 Using historic data, example 3: mania
4.2.3.4 Using historic data, general conclusions from examples
4.2.4 Key points in choosing an approach for missing data
4.2.4.1 Assumptions for missing data: when to consider, when to avoid
MAR
LOCF-like approaches
BOCF-like and worst-case approaches
Control-based or reference-based imputation
Assuming a variety of outcomes for missing values, depending on reason for discontinuation
4.2.4.2 Methods of implementing assumptions: when to consider, when to avoid
Likelihood-based analysis of continuous outcome variables
Likelihood-based analysis of binary response variables using longitudinal generalized linear mixed models (GLMM)
Doubly robust estimation
Multiple imputation
Pattern-mixture models
Selection models and shared parameter models
4.3 Exploring and presenting missingness
4.4 Model checking
4.5 Interpreting model results when there is missing data
4.6 Sample size and missing data
Appendix 4.A: Sample protocol/SAP text for study in Parkinson’s disease
Appendix 4.B: A formal definition of a sensitivity parameter
References
5 Mixed Models for Repeated Measures Using Categorical Time Effects (MMRM)
5.1 Introduction
5.2 Specifying the MMRM
5.2.1 The mixed model
5.2.2 Covariance structures
5.2.2.1 Unstructured
5.2.2.2 Toeplitz and heterogeneous Toeplitz patterns
5.2.2.3 Spatial covariance patterns
5.2.3 MMRM versus generalized estimating equations (GEE)
5.2.4 MMRM versus LOCF
5.3 Understanding the data
5.3.1 Parkinson’s disease example
5.3.2 A second example showing the usefulness of plots: CATIE
5.4 Applying the MMRM
5.4.1 Specifying the model
5.4.1.1 Analysis plan
5.4.1.2 Strategies to improve convergence
5.4.1.3 SAS code
5.4.2 Interpreting and presenting results
5.4.2.1 Unstructured covariance pattern
5.4.2.2 Heterogeneous Toeplitz covariance pattern
5.4.2.3 Spatial power covariance pattern
5.4.2.4 LOCF
5.4.2.5 GEE
5.5 Additional MMRM topics
5.5.1 Treatment by subgroup and treatment by site interactions
5.5.2 Calculating the effect size
5.5.3 Another strategy to model baseline
5.6 Logistic regression MMRM with generalized linear mixed model (GLMM)
5.6.1 The generalized linear mixed model
5.6.2 Specifying the model
5.6.2.1 Analysis plan
5.6.2.2 Pooling investigator sites to improve convergence
5.6.2.3 SAS code
5.6.3 Interpreting and presenting results
5.6.3.1 Logistic GLMM model adjusting for pooled site
5.6.3.2 Alternative repeated measures models
5.6.3.3 Univariate logistic models
5.6.3.4 Adjustment for pooled site versus country
5.6.4 Other modeling options
References
Table of SAS Code Fragments
6 Multiple imputation
6.1 Introduction
6.1.1 How is MI different from single imputation?
6.1.2 How is MI different from maximum likelihood methods?
6.1.3 MI’s assumptions about missingness mechanism
6.1.4 A general 3-step process for multiple imputation and inference
6.1.5 Imputation versus analysis model
6.1.6 Note on notation use
6.2 Imputation Phase
6.2.1 Missing patterns: monotone and non-monotone
6.2.2 How do we get multiple imputations?
6.2.3 Imputation strategies: sequential univariate versus joint multivariate
6.2.4 Overview of the imputation methods
6.2.5 Reusing the multiply-imputed dataset for different analyses or summary scales
6.3 Analysis phase: analyzing multiply imputed datasets
6.4 Pooling phase: combining results from multiple datasets
6.4.1 Combination rules
6.4.2 Pooling analyses of continuous outcomes
6.4.3 Pooling analyses of categorical outcomes
6.5 Required number of imputations
6.6 Some practical considerations
6.6.1 Choosing an imputation model
6.6.2 Multivariate normality
6.6.3 Rounding and restricting the range for the imputed values
6.6.4 Convergence of MCMC
6.7 Pre-specifying details of analysis with multiple imputation
6.A Appendix: Additional methods for multiple imputation
References
7 Analyses under missing-not-at-random assumptions
7.1 Introduction
7.2 Background to sensitivity analyses and pattern-mixture models
7.2.1 The purpose of a sensitivity analysis
7.2.2 Pattern-mixture models as sensitivity analyses
7.2.2.1 Traditional identifying restrictions: complete cases, neighboring cases, available cases
7.2.2.2 The versatility of pattern-mixture models
7.3 Two methods of implementing sensitivity analyses via PMMs
7.3.1 A sequential method of implementing pattern-mixture models with MI
7.3.2 Providing stress-testing “what ifs” using pattern-mixture models
7.3.3 Two implementations of pattern-mixture models for sensitivity analyses
7.3.4 Characteristics and limitations of the sequential modeling method of implementing PMMs
7.3.5 PMMs implemented using the joint modeling method
7.3.6 Characteristics of the joint modeling method of implementing PMMs
7.3.7 Summary of differences between the joint modeling and sequential modeling methods
7.4 A “toolkit”: implementing sensitivity analyses via SAS
7.4.1 Reminder: general approach using MI with regression
7.4.2 Sensitivity analyses assuming withdrawals have trajectory of control arm
7.4.3 Sensitivity analyses assuming withdrawals have distribution of control arm
7.4.4 BOCF-like and LOCF-like analyses
7.4.5 The general principle of using selected subsets of observed data as the basis to implement “what if” stress-tests
7.4.6 Using a mixture of “what ifs,” depending on reason for discontinuation
7.4.7 Assuming trajectory of withdrawals is worse by some delta: delta adjustment and tipping point analysis
7.4.7.1 Illustrative dataset for a clinical trial in mania
7.4.7.2 Implementing a delta adjustment
7.4.7.3 Implementing a “tipping point” sensitivity analysis
7.5 Examples of realistic strategies and results for illustrative datasets of three indications
7.5.1 Parkinson’s disease
7.5.2 Insomnia
7.5.3 Mania
Appendix 7.A: How one could implement NCMV using visit-by-visit MI for the example trial
Appendix 7.B: SAS code to model withdrawals from the experimental arm, using observed data from the control arm
Appendix 7.C: SAS code to model early withdrawals from the experimental arm, using the LOCF-like values
Appendix 7.D: SAS macro to impose delta adjustment on a responder variable in the mania dataset
Appendix 7.E: SAS code to implement tipping point via exhaustive scenarios for the withdrawals in the mania dataset
Appendix 7.F: Code to perform sensitivity analyses for the Parkinson’s disease dataset
Appendix 7.G: Code to perform sensitivity analyses for the insomnia dataset
Appendix 7.H: Code to perform sensitivity analyses for the mania dataset
Appendix 7.I: Selection models
Appendix 7.J: Shared parameter models
References
Table of SAS Code Fragments
8 Doubly Robust Estimation
8.1 Introduction
8.2 Inverse probability weighted estimation
8.2.1 IPW estimators for estimating equations
8.2.2 Summary of IPW advantages
8.2.3 IPW disadvantages
8.3 Doubly robust estimation
8.3.1 Doubly robust methods explained
8.3.1.1 Linear regression example
8.3.2 Advantages of DR methods
8.3.3 Limitations of DR methods
8.4 Vansteelandt et al. method for doubly robust estimation
8.4.1 Theoretical justification for the Vansteelandt et al. method
8.4.2 Implementation of the Vansteelandt et al. method for DR estimation
8.4.2.1 Bootstrap estimates of the variance
8.4.2.2 Missingness model for longitudinal data
8.4.2.3 Final analysis model
8.4.2.4 Characteristics of Vansteelandt et al. method macro for calculating DR estimates of treatment effects
Non-monotone missing data
How regression parameters are estimated
Restrictions on regression parameters
Missing covariates
Binary covariates
Binary responses
Auxiliary variables
8.5 Implementing the Vansteelandt et al. method via SAS
8.5.1 Mania dataset
8.5.1.1 Implementing the Vansteelandt et al. method for the illustrative mania dataset
8.5.2 Insomnia dataset
Appendix 8.A: How to implement Vansteelandt et al. method for mania dataset (binary response)
Appendix 8.B: SAS code to calculate estimates from the bootstrapped data sets
Appendix 8.C: How to implement Vansteelandt et al. method for insomnia dataset
References
Table of SAS Code Fragments
Bibliography