Introduction to Data Mining

  • ISBN13: 9780321321367
  • ISBN10: 0321321367

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 5/2/2005
  • Publisher: Pearson

List Price: $161.80

Summary

Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.

Table of Contents

Preface
Introduction
  What Is Data Mining?
  Motivating Challenges
  The Origins of Data Mining
  Data Mining Tasks
  Scope and Organization of the Book
  Bibliographic Notes
  Exercises
Data
  Types of Data
    Attributes and Measurement
    Types of Data Sets
  Data Quality
    Measurement and Data Collection Issues
    Issues Related to Applications
  Data Preprocessing
    Aggregation
    Sampling
    Dimensionality Reduction
    Feature Subset Selection
    Feature Creation
    Discretization and Binarization
    Variable Transformation
  Measures of Similarity and Dissimilarity
    Basics
    Similarity and Dissimilarity between Simple Attributes
    Dissimilarities between Data Objects
    Similarities between Data Objects
    Examples of Proximity Measures
    Issues in Proximity Calculation
    Selecting the Right Proximity Measure
  Bibliographic Notes
  Exercises
Exploring Data
  The Iris Data Set
  Summary Statistics
    Frequencies and the Mode
    Percentiles
    Measures of Location: Mean and Median
    Measures of Spread: Range and Variance
    Multivariate Summary Statistics
    Other Ways to Summarize the Data
  Visualization
    Motivations for Visualization
    General Concepts
    Techniques
    Visualizing Higher-Dimensional Data
    Do's and Don'ts
  OLAP and Multidimensional Data Analysis
    Representing Iris Data as a Multidimensional Array
    Multidimensional Data: The General Case
    Analyzing Multidimensional Data
    Final Comments on Multidimensional Data Analysis
  Bibliographic Notes
  Exercises
Classification: Basic Concepts, Decision Trees, and Model Evaluation
  Preliminaries
  General Approach to Solving a Classification Problem
  Decision Tree Induction
    How a Decision Tree Works
    How to Build a Decision Tree
    Methods for Expressing Attribute Test Conditions
    Measures for Selecting the Best Split
    Algorithm for Decision Tree Induction
    An Example: Web Robot Detection
    Characteristics of Decision Tree Induction
  Model Overfitting
    Overfitting Due to Presence of Noise
    Overfitting Due to Lack of Representative Samples
    Overfitting and the Multiple Comparison Procedure
    Estimation of Generalization Errors
    Handling Overfitting in Decision Tree Induction
  Evaluating the Performance of a Classifier
    Holdout Method
    Random Subsampling
    Cross-Validation
    Bootstrap
  Methods for Comparing Classifiers
    Estimating a Confidence Interval for Accuracy
    Comparing the Performance of Two Models
    Comparing the Performance of Two Classifiers
  Bibliographic Notes
  Exercises
Classification: Alternative Techniques
  Rule-Based Classifier
    How a Rule-Based Classifier Works
    Rule-Ordering Schemes
    How to Build a Rule-Based Classifier
    Direct Methods for Rule Extraction
    Indirect Methods for Rule Extraction
    Characteristics of Rule-Based Classifiers
  Nearest-Neighbor Classifiers
    Algorithm
    Characteristics of Nearest-Neighbor Classifiers
  Bayesian Classifiers
    Bayes Theorem
    Using the Bayes Theorem for Classification
    Naive Bayes Classifier
    Bayes Error Rate
    Bayesian Belief Networks
  Artificial Neural Network (ANN)
    Perceptron
    Multilayer Artificial Neural Network
    Characteristics of ANN
  Support Vector Machine (SVM)
    Maximum Margin Hyperplanes
    Linear SVM: Separable Case
    Linear SVM: Nonseparable Case
    Nonlinear SVM
    Characteristics of SVM
  Ensemble Methods
    Rationale for Ensemble Method
    Methods for Constructing an Ensemble Classifier
    Bias-Variance Decomposition
    Bagging
    Boosting
    Random Forests
    Empirical Comparison among Ensemble Methods
  Class Imbalance Problem
    Alternative Metrics
    The Receiver Operating Characteristic Curve
    Cost-Sensitive Learning
    Sampling-Based Approaches
  Multiclass Problem
  Bibliographic Notes
  Exercises
Association Analysis: Basic Concepts and Algorithms
  Problem Definition
  Frequent Itemset Generation
    The Apriori Principle
    Frequent Itemset Generation in the Apriori Algorithm
    Candidate Generation and Pruning
    Support Counting
    Computational Complexity
  Rule Generation
    Confidence-Based Pruning
    Rule Generation in Apriori Algorithm
    An Example: Congressional Voting Records
  Compact Representation of Frequent Itemsets
    Maximal Frequent Itemsets
    Closed Frequent Itemsets
  Alternative Methods for Generating Frequent Itemsets
  FP-Growth Algorithm
    FP-Tree Representation
    Frequent Itemset Generation in FP-Growth Algorithm
  Evaluation of Association Patterns
    Objective Measures of Interestingness
    Measures beyond Pairs of Binary Variables
    Simpson's Paradox
  Effect of Skewed Support Distribution
  Bibliographic Notes
  Exercises
Association Analysis: Advanced Concepts
  Handling Categorical Attributes
  Handling Continuous Attributes
    Discretization-Based Methods
    Statistics-Based Methods
    Non-discretization Methods
  Handling a Concept Hierarchy
  Sequential Patterns
    Problem Formulation
    Sequential Pattern Discovery
    Timing Constraints
    Alternative Counting Schemes
  Subgraph Patterns
    Graphs and Subgraphs
    Frequent Subgraph Mining
    Apriori-like Method
    Candidate Generation
    Candidate Pruning
    Support Counting
  Infrequent Patterns
    Negative Patterns
    Negatively Correlated Patterns
    Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns
    Techniques for Mining Interesting Infrequent Patterns
    Techniques Based on Mining Negative Patterns
    Techniques Based on Support Expectation
  Bibliographic Notes
  Exercises
Cluster Analysis: Basic Concepts and Algorithms
  Overview
    What Is Cluster Analysis?
    Different Types of Clusterings
    Different Types of Clusters
  K-means
    The Basic K-means Algorithm
    K-means: Additional Issues
    Bisecting K-means
    K-means and Different Types of Clusters
    Strengths and Weaknesses
    K-means as an Optimization Problem
  Agglomerative Hierarchical Clustering
    Basic Agglomerative Hierarchical Clustering Algorithm
    Specific Techniques
    The Lance-Williams Formula for Cluster Proximity
    Key Issues in Hierarchical Clustering
    Strengths and Weaknesses
  DBSCAN
    Traditional Density: Center-Based Approach
    The DBSCAN Algorithm
    Strengths and Weaknesses
  Cluster Evaluation
    Overview
    Unsupervised Cluster Evaluation Using Cohesion and Separation
    Unsupervised Cluster Evaluation Using the Proximity Matrix
    Unsupervised Evaluation of Hierarchical Clustering
    Determining the Correct Number of Clusters
    Clustering Tendency
    Supervised Measures of Cluster Validity
    Assessing the Significance of Cluster Validity Measures
  Bibliographic Notes
  Exercises
Cluster Analysis: Additional Issues and Algorithms
  Characteristics of Data, Clusters, and Clustering Algorithms
    Example: Comparing K-means and DBSCAN
    Data Characteristics
    Cluster Characteristics
    General Characteristics of Clustering Algorithms
  Prototype-Based Clustering
    Fuzzy Clustering
    Clustering Using Mixture Models
    Self-Organizing Maps (SOM)
  Density-Based Clustering
    Grid-Based Clustering
    Subspace Clustering
    Denclue: A Kernel-Based Scheme for Density-Based Clustering
  Graph-Based Clustering
    Sparsification
    Minimum Spanning Tree (MST) Clustering
    Opossum: Optimal Partitioning of Sparse Similarities Using METIS
    Chameleon: Hierarchical Clustering with Dynamic Modeling
    Shared Nearest Neighbor Similarity
    The Jarvis-Patrick Clustering Algorithm
    SNN Density
    SNN Density-Based Clustering
  Scalable Clustering Algorithms
    Scalability: General Issues and Approaches
    Birch
    Cure
  Which Clustering Algorithm?
  Bibliographic Notes
  Exercises
Anomaly Detection
  Preliminaries
    Causes of Anomalies
    Approaches to Anomaly Detection
    The Use of Class Labels
    Issues
  Statistical Approaches
    Detecting Outliers in a Univariate Normal Distribution
    Outliers in a Multivariate Normal Distribution
    A Mixture Model Approach for Anomaly Detection
    Strengths and Weaknesses
  Proximity-Based Outlier Detection
    Strengths and Weaknesses
  Density-Based Outlier Detection
    Detection of Outliers Using Relative Density
    Strengths and Weaknesses
  Clustering-Based Techniques
    Assessing the Extent to Which an Object Belongs to a Cluster
    Impact of Outliers on the Initial Clustering
    The Number of Clusters to Use
    Strengths and Weaknesses
  Bibliographic Notes
  Exercises
Appendix A Linear Algebra
  Vectors
    Definition
    Vector Addition and Multiplication by a Scalar
    Vector Spaces
    The Dot Product, Orthogonality, and Orthogonal Projections
    Vectors and Data Analysis
  Matrices
    Matrices: Definitions
    Matrices: Addition and Multiplication by a Scalar
    Matrices: Multiplication
    Linear Transformations and Inverse Matrices
    Eigenvalue and Singular Value Decomposition
    Matrices and Data Analysis
  Bibliographic Notes
Appendix B Dimensionality Reduction
  PCA and SVD
    Principal Components Analysis (PCA)
    SVD
  Other Dimensionality Reduction Techniques
    Factor Analysis
    Locally Linear Embedding (LLE)
    Multidimensional Scaling, FastMap, and ISOMAP
    Common Issues
  Bibliographic Notes
Appendix C Probability and Statistics
  Probability
    Expected Values
  Statistics
    Point Estimation
    Central Limit Theorem
    Interval Estimation
  Hypothesis Testing
Appendix D Regression
  Preliminaries
  Simple Linear Regression
    Least Square Method
    Analyzing Regression Errors
    Analyzing Goodness of Fit
  Multivariate Linear Regression
  Alternative Least-Square Regression Methods
Appendix E Optimization
  Unconstrained Optimization
    Numerical Methods
  Constrained Optimization
    Equality Constraints
    Inequality Constraints
Author Index
Subject Index
Copyright Permissions