# Regression Analysis by Example

**by** Chatterjee, Samprit; Hadi, Ali S.

### ISBN: 9780470905845

## Author Biography

**SAMPRIT CHATTERJEE, PhD,** is Professor Emeritus of Statistics at New York University. A Fellow of the American Statistical Association, Dr. Chatterjee has been a Fulbright scholar in both Kazakhstan and Mongolia. He is the coauthor of *Sensitivity Analysis in Linear Regression* and *A Casebook for a First Course in Statistics and Data Analysis,* both published by Wiley.

**ALI S. HADI, PhD,** is a Distinguished University Professor and former vice provost at the American University in Cairo (AUC). He is the founding Director of the Actuarial Science Program at AUC. He is also a Stephen H. Weiss Presidential Fellow and Professor Emeritus at Cornell University. Dr. Hadi is the author of four other books, a Fellow of the American Statistical Association, and an elected Member of the International Statistical Institute.

## Table of Contents

- Preface
- Introduction
  - What Is Regression Analysis?
  - Publicly Available Data Sets
  - Selected Applications of Regression Analysis
  - Steps in Regression Analysis
  - Scope and Organization of the Book
  - Exercises
- Simple Linear Regression
  - Introduction
  - Covariance and Correlation Coefficient
  - Example: Computer Repair Data
  - The Simple Linear Regression Model
  - Parameter Estimation
  - Tests of Hypotheses
  - Confidence Intervals
  - Predictions
  - Measuring the Quality of Fit
  - Regression Line Through the Origin
  - Trivial Regression Models
  - Bibliographic Notes
  - Exercises
- Multiple Linear Regression
  - Introduction
  - Description of the Data and Model
  - Example: Supervisor Performance Data
  - Parameter Estimation
  - Interpretations of Regression Coefficients
  - Centering and Scaling
  - Properties of the Least Squares Estimators
  - Multiple Correlation Coefficient
  - Inference for Individual Regression Coefficients
  - Tests of Hypotheses in a Linear Model
  - Predictions
  - Summary
  - Exercises
  - Appendix: Multiple Regression in Matrix Notation
- Regression Diagnostics: Detection of Model Violations
  - Introduction
  - The Standard Regression Assumptions
  - Various Types of Residuals
  - Graphical Methods
  - Graphs Before Fitting a Model
  - Graphs After Fitting a Model
  - Checking Linearity and Normality Assumptions
  - Leverage, Influence, and Outliers
  - Measures of Influence
  - The Potential-Residual Plot
  - What to Do with the Outliers?
  - Role of Variables in a Regression Equation
  - Effects of an Additional Predictor
  - Robust Regression
  - Exercises
- Qualitative Variables as Predictors
  - Introduction
  - Salary Survey Data
  - Interaction Variables
  - Systems of Regression Equations
  - Other Applications of Indicator Variables
  - Seasonality
  - Stability of Regression Parameters Over Time
  - Exercises
- Transformation of Variables
  - Introduction
  - Transformations to Achieve Linearity
  - Bacteria Deaths Due to X-Ray Radiation
  - Transformations to Stabilize Variance
  - Detection of Heteroscedastic Errors
  - Removal of Heteroscedasticity
  - Weighted Least Squares
  - Logarithmic Transformation of Data
  - Power Transformation
  - Summary
  - Exercises
- Weighted Least Squares
  - Introduction
  - Heteroscedastic Models
  - Two-Stage Estimation
  - Education Expenditure Data
  - Fitting a Dose-Response Relationship Curve
  - Exercises
- The Problem of Correlated Errors
  - Introduction: Autocorrelation
  - Consumer Expenditure and Money Stock
  - Durbin-Watson Statistic
  - Removal of Autocorrelation by Transformation
  - Iterative Estimation With Autocorrelated Errors
  - Autocorrelation and Missing Variables
  - Analysis of Housing Starts
  - Limitations of Durbin-Watson Statistic
  - Indicator Variables to Remove Seasonality
  - Regressing Two Time Series
  - Exercises
- Analysis of Collinear Data
  - Introduction
  - Effects of Collinearity on Inference
  - Effects of Collinearity on Forecasting
  - Detection of Collinearity
  - Exercises
- Working With Collinear Data
  - Introduction
  - Principal Components
  - Computations Using Principal Components
  - Imposing Constraints
  - Searching for Linear Functions of the β's
  - Biased Estimation of Regression Coefficients
  - Principal Components Regression
  - Reduction of Collinearity in the Estimation Data
  - Constraints on the Regression Coefficients
  - Principal Components Regression: A Caution
  - Ridge Regression
  - Estimation by the Ridge Method
  - Ridge Regression: Some Remarks
  - Summary
  - Bibliographic Notes
  - Exercises
  - Principal Components
  - Ridge Regression
  - Surrogate Ridge Regression
- Variable Selection Procedures
  - Introduction
  - Formulation of the Problem
  - Consequences of Variables Deletion
  - Uses of Regression Equations
  - Criteria for Evaluating Equations
  - Collinearity and Variable Selection
  - Evaluating All Possible Equations
  - Variable Selection Procedures
  - General Remarks on Variable Selection Methods
  - A Study of Supervisor Performance
  - Variable Selection With Collinear Data
  - The Homicide Data
  - Variable Selection Using Ridge Regression
  - Selection of Variables in an Air Pollution Study
  - A Possible Strategy for Fitting Regression Models
  - Bibliographic Notes
  - Exercises
  - Appendix: Effects of Incorrect Model Specifications
- Logistic Regression
  - Introduction
  - Modeling Qualitative Data
  - The Logit Model
  - Example: Estimating Probability of Bankruptcies
  - Logistic Regression Diagnostics
  - Determination of Variables to Retain
  - Judging the Fit of a Logistic Regression
  - The Multinomial Logit Model
  - Multinomial Logistic Regression
  - Classification Problem: Another Approach
  - Exercises
- Further Topics
  - Introduction
  - Generalized Linear Model
  - Poisson Regression Model
  - Introduction of New Drugs
  - Robust Regression
  - Fitting a Quadratic Model
  - Distribution of PCB in U.S. Bays
  - Exercises
- Statistical Tables
- References
- Index

Table of Contents provided by Publisher. All Rights Reserved.