Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Analyzing Microarray Gene Expression Data

by Geoffrey J. McLachlan; Kim-Anh Do; Christophe Ambroise
  • ISBN13: 9780471226161
  • ISBN10: 0471226165

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 2004-08-04
  • Publisher: Wiley-Interscience

Note: Supplemental materials are not guaranteed with Rental or Used book purchases.

Purchase Benefits

  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks and apparel do not qualify for this offer.
List Price: $200.47 Save up to $50.12
  • Buy Used
    $150.35
    Free Shipping

    USUALLY SHIPS IN 2-4 BUSINESS DAYS

Summary

  • Emphasis on clustering of data of gene tissues
  • Includes new research findings and activities in molecular biology
  • Highlights the important general field of bioinformatics and genomics and discusses the impact of microarray analysis on both

Author Biography

GEOFFREY J. McLACHLAN, PhD, is Professor of Statistics at the University of Queensland, Australia, and the author of four very successful statistical texts. KIM-ANH DO, PhD, is Professor of Biostatistics at the University of Texas MD Anderson Cancer Center in Houston, Texas. CHRISTOPHE AMBROISE, PhD, is Lecturer at the Université de Technologie de Compiègne in France.

Table of Contents

Preface
1 Microarrays in Gene Expression Studies
1.1 Introduction
1.2 Background Biology
1.2.1 Genome, Genotype, and Gene Expression
1.2.2 Of Wild-Types and Other Alleles
1.2.3 Aspects of Underlying Biology and Physiochemistry
1.3 Polymerase Chain Reaction
1.4 cDNA
1.4.1 Expressed Sequence Tag
1.5 Microarray Technology and Application
1.5.1 History of Microarray Development
1.5.2 Tools of Microarray Technology
1.5.3 Limitations of Microarray Technology
1.5.4 Oligonucleotides versus cDNA Arrays
1.5.5 SAGE: Another Method for Detecting and Measuring Gene Expression Levels
1.5.6 Emerging Technologies
1.6 Sampling of Relevant Research Entities and Public Resources
2 Cleaning and Normalization
2.1 Introduction
2.2 Cleaning Procedures
2.2.1 Image Processing to Extract Information
2.2.2 Missing Value Estimation
2.2.3 Sources of Nonlinearity
2.3 Normalization and Plotting Procedures for Oligonucleotide Arrays
2.3.1 Global Approaches for Oligonucleotide Array Data
2.3.2 Spiked Standard Approaches
2.3.3 Geometric Mean and Linear Regression Normalization for Multiple Arrays
2.3.4 Nonlinear Normalization for Multiple Arrays Using Smooth Curves
2.4 Normalization Methods for cDNA Microarray Data
2.4.1 Single-Array Normalization
2.4.2 Multiple Slides Normalization
2.4.3 ANOVA and Related Methods for Normalization
2.4.4 Mixed-Model Method for Normalization
2.4.5 SNOMAD
2.5 Transformations and Replication
2.5.1 Importance of Replication
2.5.2 Transformations
2.6 Analysis of the Alon Data Set
2.7 Comparison of Normalization Strategies and Discussion
3 Some Cluster Analysis Methods
3.1 Introduction
3.2 Reduction in the Dimension of the Feature Space
3.3 Cluster Analysis
3.4 Some Hierarchical Agglomerative Techniques
3.5 k-Means Clustering
3.6 Cluster Analysis with No A Priori Metric
3.7 Clustering via Finite Mixture Models
3.7.1 Definition
3.7.2 Advantages of Model-Based Clustering
3.8 Fitting Mixture Models Via the EM Algorithm
3.8.1 E-Step
3.8.2 M-Step
3.8.3 Choice of Starting Values for the EM Algorithm
3.9 Clustering Via Normal Mixtures
3.9.1 Heteroscedastic Components
3.9.2 Homoscedastic Components
3.9.3 Spherical Components
3.9.4 Choice of Root
3.9.5 Available Software
3.10 Mixtures of t Distributions
3.11 Mixtures of Factor Analyzers
3.12 Choice of Clustering Solution
3.13 Classification ML Approach
3.14 Mixture Models for Clinical and Microarray Data
3.14.1 Unconditional Approach
3.14.2 Conditional Approach
3.15 Choice of the Number of Components in a Mixture Model
3.15.1 Order of a Mixture Model
3.15.2 Approaches for Assessing Mixture Order
3.15.3 Bayesian Information Criterion
3.15.4 Integrated Classification Likelihood Criterion
3.16 Resampling Approach
3.17 Other Resampling Approaches for Number of Clusters
3.17.1 The Gap Statistic
3.17.2 The Clest Method for the Number of Clusters
3.18 Simulation Results for Two Resampling Approaches
3.19 Principal Component Analysis
3.19.1 Introduction
3.19.2 Singular Value Decomposition
3.19.3 Some Other Multivariate Exploratory Methods
3.20 Canonical Variate Analysis
3.20.1 Linear Projections with Group Structure
3.20.2 Canonical Variates
3.21 Partial Least Squares
4 Clustering of Tissue Samples
4.1 Introduction
4.2 Notation
4.3 Two Clustering Problems
4.4 Principal Component Analysis
4.5 The EMMIX-GENE Clustering Procedure
4.6 Step 1: Screening of Genes
4.7 Step 2: Clustering of Genes: Formation of Metagenes
4.8 Step 3: Clustering of Tissues
4.9 EMMIX-GENE Software
4.10 Example: Clustering of Alon Data
4.10.1 Clustering on Basis of 446 Genes
4.10.2 Clustering on Basis of Gene Groups
4.10.3 Clustering on Basis of Metagenes
4.11 Example: Clustering of van't Veer Data
4.11.1 Screening and Clustering of Genes
4.11.2 Usefulness of the Selected Genes
4.11.3 Clustering of Tissues
4.11.4 Use of Underlying Signatures with Clinical Data
4.12 Choosing the Number of Clusters in Microarray Data
4.12.1 Some Previous Attempts
4.13 Likelihood Ratio Test Applied to Microarray Data
4.13.1 Golub Data
4.13.2 Alizadeh Data
4.13.3 Bittner Data
4.13.4 van't Veer Data
4.14 Effect of Selection Bias on the Number of Clusters
4.15 Clustering on Microarray and Clinical Data
4.16 Discussion
5 Screening and Clustering of Genes
5.1 Detection of Differentially Expressed Genes
5.1.1 Introduction
5.1.2 Fold Change
5.1.3 Multiplicity Problem
5.1.4 Overview of Literature
5.2 Test of a Single Hypothesis
5.3 Gene Statistics
5.3.1 Calculation of Interactions via ANOVA Models
5.3.2 Two-Sample t-Statistics
5.4 Multiple Hypothesis Testing
5.4.1 Outcomes with Multiple Hypotheses
5.4.2 Controlling the FWER
5.4.3 False Discovery Rate (FDR)
5.4.4 Benjamini-Hochberg Procedure
5.4.5 False Nondiscovery Rate (FNR)
5.4.6 Positive FDR
5.4.7 Positive FNR
5.4.8 Linking False Rates with Posterior Probabilities
5.5 Null Distribution of Test Statistic
5.5.1 Permutation Method
5.5.2 Null Replications of the Test Statistic
5.5.3 The SAM Method
5.5.4 Application of SAM Method to Alon Data
5.6 Recent Approaches for Strong Control of the FDR
5.6.1 The q-Value
5.6.2 Technical Definition of q-Value
5.6.3 Controlling FDR Strongly
5.6.4 Selecting Genes via the q-Value
5.6.5 Application to Hedenfalk Data
5.7 Two-Component Mixture Model Framework
5.7.1 Definition of Model
5.7.2 Bayes Rule
5.7.3 Estimated FDR
5.7.4 Bayes Risk in terms of Estimated FDR and FNR
5.8 Nonparametric Empirical Bayes Approach
5.8.1 Method of Efron et al. (2001)
5.8.2 Mixture Model Method (MMM)
5.8.3 Nonparametric Bayesian Approach
5.8.4 Application of Empirical Bayes Methods to Alon Data
5.9 Parametric Mixture Models for Differential Gene Expression
5.9.1 Parametric Empirical Bayes Methods
5.9.2 Finding Clusters of Differentially Expressed Genes
5.9.3 Example: Fitting Normal Mixtures to t-Statistic Values
5.10 Use of the P-Value as a Summary Statistic
5.10.1 Beta Mixture for Distribution of P-Values
5.10.2 Example: Fitting Beta Mixtures to P-Values
5.11 Clustering of Genes
5.12 Finding Correlated Genes
5.13 Clustering of Genes via Full Expression Profiles
5.14 Clustering of Genes via PCA of Expression Profiles
5.15 Clustering of Genes with Repeated Measurements
5.15.1 A Mixture Model for Technical Replicates
5.15.2 Application of EM Algorithm
5.15.3 M-Step
5.16 Gene Shaving
5.16.1 Introduction
5.16.2 Methodology and Implementation
5.16.3 Optimal Cluster Size via the Gap Statistic
5.16.4 Supervised Gene Shaving
5.16.5 Real Data Example
5.16.6 Computer Software
6 Discriminant Analysis
6.1 Introduction
6.2 Basic Notation
6.3 Error Rates
6.4 Decision-Theoretic Approach
6.5 Training Data
6.6 Different Types of Error Rates
6.7 Sample-Based Discriminant Rules
6.8 Parametric Discriminant Rules
6.9 Discrimination via Normal Models
6.9.1 Heteroscedastic Normal Model
6.9.2 Plug-in Sample NQDR
6.9.3 Homoscedastic Normal Model
6.9.4 Optimal Error Rates
6.9.5 Plug-in Sample NLDR
6.9.6 Normal Mixture Model
6.10 Fisher's Linear Discriminant Function
6.10.1 Separation Approach
6.10.2 Regression Approach
6.11 Logistic Discrimination
6.12 Nearest-Centroid Rule
6.13 Support Vector Machines
6.13.1 Two Classes
6.13.2 Selection of Feature Variables
6.13.3 Multiple Classes
6.13.4 Computer Software
6.14 Variants of Support Vector Machines
6.15 Neural Networks
6.16 Nearest-Neighbor Rules
6.16.1 Introduction
6.16.2 Definition of a k-NN Rule
6.17 Classification Trees
6.18 Error-Rate Estimation
6.18.1 Apparent Error Rate
6.18.2 Bias Correction of the Apparent Error Rate
6.19 Cross-Validation
6.19.1 Leave-One-Out (LOO) Estimator
6.19.2 q-Fold Cross-Validation
6.20 Error-Rate Estimation via the Bootstrap
6.20.1 The 0.632 Estimator
6.20.2 Mean Squared Error of the Estimated Error Rate
6.21 Selection of Feature Variables
6.22 Error-Rate Estimation with Selection Bias
6.22.1 Selection Bias
6.22.2 External Cross-Validation
6.22.3 The 0.632+ Estimator
7 Supervised Classification of Tissue Samples
7.1 Introduction
7.2 Reducing the Dimension of the Feature Space of Genes
7.2.1 Principal Components
7.2.2 Partial Least Squares
7.2.3 Ranking of Genes
7.2.4 Grouping of Genes
7.3 SVM with Recursive Feature Elimination (RFE)
7.4 Selection Bias: SVM with RFE
7.5 Selection Bias: Fisher's Rule with Forward Selection
7.6 Selection Bias: Noninformative Data
7.7 Discussion of Selection Bias
7.8 Selection of Marker Genes with SVM
7.8.1 Description of van de Vijver Breast Cancer Data
7.8.2 Application of SVM with RFE
7.9 Nearest-Shrunken Centroids
7.9.1 Definition
7.10 Comparison of Nearest-Shrunken Centroids with SVM
7.10.1 Alon Data
7.10.2 van de Vijver Data
7.11 Selection Bias Working with the Top 70 Genes
7.11.1 Bias in Error Rates
7.11.2 Bias in Comparative Studies of Error Rates
7.11.3 Bias in Plots
7.12 Discriminant Rules Via Initial Grouping of Genes
7.12.1 Supervised Version of EMMIX-GENE
7.12.2 Bayesian Tree Classification
7.12.3 Tree Harvesting
7.12.4 Block PCA
7.12.5 Grouping of Genes via Supervised Procedures
8 Linking Microarray Data with Survival Analysis
8.1 Introduction
8.2 Four Lung Cancer Data Sets
8.3 Statistical Analysis of Two Data Sets
8.4 Ontario Data Set
8.4.1 Cluster Analysis
8.4.2 Survival Analysis
8.4.3 Discriminant Analysis
8.5 Stanford Data Set
8.5.1 Cluster Analysis of AC Tumors
8.5.2 Survival Analysis
8.5.3 Discriminant Analysis
8.6 Discussion
References
Author Index
Subject Index

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
