Data Quality and Record Linkage Techniques

by Thomas N. Herzog; Fritz J. Scheuren; William E. Winkler
  • ISBN13: 9780387695020
  • ISBN10: 0387695028
  • Format: Paperback
  • Copyright: 2007-05-25
  • Publisher: Springer Verlag

Summary

This book helps practitioners gain a deeper understanding, at an applied level, of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models, focusing on the Fellegi-Holt edit-imputation model, the Little-Rubin multiple-imputation scheme, and the Fellegi-Sunter record linkage model. Brief examples show how these techniques work. In the second part of the book, the authors present real-world case studies in which one or more of these techniques are used. The case studies cover a wide variety of application areas, including mortgage guarantee insurance, the medical and biomedical fields, highway safety, and social insurance, as well as the construction of list frames and administrative lists. Readers will find this book a mixture of practical advice, mathematical rigor, management insight, and philosophy. The long list of references at the end of the book enables readers to delve more deeply into the subjects discussed. The authors also discuss the software that has been developed to apply the techniques described in the text.

Author Biography

Thomas N. Herzog, Ph.D., ASA, is the Chief Actuary at the U.S. Department of Housing and Urban Development. Fritz J. Scheuren, Ph.D., is a Vice President for Statistics with the National Opinion Research Center at the University of Chicago. William E. Winkler, Ph.D., is Principal Researcher at the U.S. Census Bureau.

Table of Contents

Preface
About the Authors
Introduction
  Audience and Objective
  Scope
  Structure

Data Quality: What It Is, Why It Is Important, and How to Achieve It

What Is Data Quality and Why Should We Care?
  When Are Data of High Quality?
  Why Care About Data Quality?
  How Do You Obtain High-Quality Data?
  Practical Tips
  Where Are We Now?
Examples of Entities Using Data to their Advantage/Disadvantage
  Data Quality as a Competitive Advantage
  Data Quality Problems and their Consequences
  How Many People Really Live to 100 and Beyond? Views from the United States, Canada, and the United Kingdom
  Disabled Airplane Pilots - A Successful Application of Record Linkage
  Completeness and Accuracy of a Billing Database: Why It Is Important to the Bottom Line
  Where Are We Now?
Properties of Data Quality and Metrics for Measuring It
  Desirable Properties of Databases/Lists
  Examples of Merging Two or More Lists and the Issues that May Arise
  Metrics Used when Merging Lists
  Where Are We Now?
Basic Data Quality Tools
  Data Elements
  Requirements Document
  A Dictionary of Tests
  Deterministic Tests
  Probabilistic Tests
  Exploratory Data Analysis Techniques
  Minimizing Processing Errors
  Practical Tips
  Where Are We Now?

Specialized Tools for Database Improvement

Mathematical Preliminaries for Specialized Data Quality Techniques
  Conditional Independence
  Statistical Paradigms
  Capture-Recapture Procedures and Applications
Automatic Editing and Imputation of Sample Survey Data
  Introduction
  Early Editing Efforts
  Fellegi-Holt Model for Editing
  Practical Tips
  Imputation
  Constructing a Unified Edit/Imputation Model
  Implicit Edits - A Key Construct of Editing Software
  Editing Software
  Is Automatic Editing Taking Up Too Much Time and Money?
  Selective Editing
  Tips on Automatic Editing and Imputation
  Where Are We Now?
Record Linkage - Methodology
  Introduction
  Why Did Analysts Begin Linking Records?
  Deterministic Record Linkage
  Probabilistic Record Linkage - A Frequentist Perspective
  Probabilistic Record Linkage - A Bayesian Perspective
  Where Are We Now?
Estimating the Parameters of the Fellegi-Sunter Record Linkage Model
  Basic Estimation of Parameters Under Simple Agreement/Disagreement Patterns
  Parameter Estimates Obtained via Frequency-Based Matching
  Parameter Estimates Obtained Using Data from Current Files
  Parameter Estimates Obtained via the EM Algorithm
  Advantages and Disadvantages of Using the EM Algorithm to Estimate m- and u-probabilities
  General Parameter Estimation Using the EM Algorithm
  Where Are We Now?
Standardization and Parsing
  Obtaining and Understanding Computer Files
  Standardization of Terms
  Parsing of Fields
  Where Are We Now?
Phonetic Coding Systems for Names
  Soundex System of Names
  NYSIIS Phonetic Decoder
  Where Are We Now?
Blocking
  Independence of Blocking Strategies
  Blocking Variables
  Using Blocking Strategies to Identify Duplicate List Entries
  Using Blocking Strategies to Match Records Between Two Sample Surveys
  Estimating the Number of Matches Missed
  Where Are We Now?
String Comparator Metrics for Typographical Error
  Jaro String Comparator Metric for Typographical Error
  Adjusting the Matching Weight for the Jaro String Comparator
  Winkler String Comparator Metric for Typographical Error
  Adjusting the Weights for the Winkler Comparator Metric
  Where Are We Now?

Record Linkage Case Studies

Duplicate FHA Single-Family Mortgage Records: A Case Study of Data Problems, Consequences, and Corrective Steps
  Introduction
  FHA Case Numbers on Single-Family Mortgages
  Duplicate Mortgage Records
  Mortgage Records with an Incorrect Termination Status
  Estimating the Number of Duplicate Mortgage Records
Record Linkage Case Studies in the Medical, Biomedical, and Highway Safety Areas
  Biomedical and Genetic Research Studies
  Who Goes to a Chiropractor?
  National Master Patient Index
  Provider Access to Immunization Register Securely (PAiRS) System
  Studies Required by the Intermodal Surface Transportation Efficiency Act of 1991
  Crash Outcome Data Evaluation System
Constructing List Frames and Administrative Lists
  National Address Register of Residences in Canada
  USDA List Frame of Farms in the United States
  List Frame Development for the US Census of Agriculture
  Post-enumeration Studies of US Decennial Census
Social Security and Related Topics
  Hidden Multiple Issuance of Social Security Numbers
  How Social Security Stops Benefit Payments after Death
  CPS-IRS-SSA Exact Match File
  Record Linkage and Terrorism

Other Topics

Confidentiality: Maximizing Access to Micro-data while Protecting Privacy
  Importance of High Quality of Data in the Original File
  Documenting Public-use Files
  Checking Re-identifiability
  Elementary Masking Methods and Statistical Agencies
  Protecting Confidentiality of Medical Data
  More-advanced Masking Methods - Synthetic Datasets
  Where Are We Now?
Review of Record Linkage Software
  Government
  Commercial
  Checklist for Evaluating Record Linkage Software
Summary Chapter

Bibliography
Index

Table of Contents provided by Ingram. All Rights Reserved.