Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel® with XLMiner®

by Galit Shmueli; Nitin R. Patel; Peter C. Bruce
  • ISBN13: 9780470084854
  • ISBN10: 0470084855

  • Format: Hardcover
  • Copyright: 2006-12-01
  • Publisher: Wiley-Interscience
List Price: $126.50

Summary

In today's world, businesses are increasingly able to reach their ideal consumers, and an understanding of data mining contributes to this success. Data Mining for Business Intelligence, which was developed from a course taught at the Massachusetts Institute of Technology's Sloan School of Management and at the University of Maryland's Smith School of Business, uses real data and actual cases to illustrate how data mining can be applied to develop successful business models. Featuring XLMiner, the Microsoft Office Excel add-in, this book allows readers to follow along and implement algorithms at their own pace, with a minimal learning curve. In addition, students and practitioners of data mining techniques are presented with hands-on, business-oriented applications. Numerous exercises and examples are provided to motivate learning and understanding.

Data Mining for Business Intelligence:
  • Provides both a theoretical and practical understanding of the key methods of classification, prediction, reduction, exploration, and affinity analysis
  • Features a business decision-making context for these key methods
  • Illustrates the application and interpretation of these methods using real business cases and data

This book helps readers understand the beneficial relationship that can be established between data mining and smart business practices, and is an excellent learning tool for creating valuable strategies and making wiser business decisions.

Author Biography

GALIT SHMUELI, PHD, is Assistant Professor of Statistics in the Decision and Information Technologies Department of the Robert H. Smith School of Business at the University of Maryland.

NITIN R. PATEL, PHD, is Chairman, Founder, and Chief Technology Officer of Cambridge-based Cytel Incorporated and a Visiting Professor in the Engineering Systems Division at the Massachusetts Institute of Technology.

PETER C. BRUCE is President and owner of statistics.com, the leading provider of professional development courses in statistics.

Table of Contents

Foreword
Preface
Acknowledgments
Introduction
  What Is Data Mining?
  Where Is Data Mining Used?
  The Origins of Data Mining
  The Rapid Growth of Data Mining
  Why Are There So Many Different Methods?
  Terminology and Notation
  Road Maps to This Book
Overview of the Data Mining Process
  Introduction
  Core Ideas in Data Mining
  Supervised and Unsupervised Learning
  The Steps in Data Mining
  Preliminary Steps
  Building a Model: Example with Linear Regression
  Using Excel for Data Mining
  Problems
Data Exploration and Dimension Reduction
  Introduction
  Practical Considerations
    Example 1: House Prices in Boston
  Data Summaries
  Data Visualization
  Correlation Analysis
  Reducing the Number of Categories in Categorical Variables
  Principal Components Analysis
    Example 2: Breakfast Cereals
    Principal Components
    Normalizing the Data
    Using Principal Components for Classification and Prediction
  Problems
Evaluating Classification and Predictive Performance
  Introduction
  Judging Classification Performance
    Accuracy Measures
    Cutoff for Classification
    Performance in Unequal Importance of Classes
    Asymmetric Misclassification Costs
    Oversampling and Asymmetric Costs
    Classification Using a Triage Strategy
  Evaluating Predictive Performance
  Problems
Multiple Linear Regression
  Introduction
  Explanatory vs. Predictive Modeling
  Estimating the Regression Equation and Prediction
    Example: Predicting the Price of Used Toyota Corolla Automobiles
  Variable Selection in Linear Regression
    Reducing the Number of Predictors
    How to Reduce the Number of Predictors
  Problems
Three Simple Classification Methods
  Introduction
    Example 1: Predicting Fraudulent Financial Reporting
    Example 2: Predicting Delayed Flights
  The Naive Rule
  Naive Bayes
    Conditional Probabilities and Pivot Tables
    A Practical Difficulty
    A Solution: Naive Bayes
    Advantages and Shortcomings of the Naive Bayes Classifier
  k-Nearest Neighbors
    Example 3: Riding Mowers
    Choosing k
    k-NN for a Quantitative Response
    Advantages and Shortcomings of k-NN Algorithms
  Problems
Classification and Regression Trees
  Introduction
  Classification Trees
  Recursive Partitioning
  Example 1: Riding Mowers
    Measures of Impurity
  Evaluating the Performance of a Classification Tree
    Example 2: Acceptance of Personal Loan
  Avoiding Overfitting
    Stopping Tree Growth: CHAID
    Pruning the Tree
  Classification Rules from Trees
  Regression Trees
    Prediction
    Measuring Impurity
    Evaluating Performance
  Advantages, Weaknesses, and Extensions
  Problems
Logistic Regression
  Introduction
  The Logistic Regression Model
    Example: Acceptance of Personal Loan
    Model with a Single Predictor
    Estimating the Logistic Model from Data: Computing Parameter Estimates
    Interpreting Results in Terms of Odds
  Why Linear Regression Is Inappropriate for a Categorical Response
  Evaluating Classification Performance
    Variable Selection
  Evaluating Goodness of Fit
  Example of Complete Analysis: Predicting Delayed Flights
    Data Preprocessing
    Model Fitting and Estimation
    Model Interpretation
    Model Performance
    Goodness of Fit
    Variable Selection
  Logistic Regression for More Than Two Classes
    Ordinal Classes
    Nominal Classes
  Problems
Neural Nets
  Introduction
  Concept and Structure of a Neural Network
  Fitting a Network to Data
    Example 1: Tiny Dataset
    Computing Output of Nodes
    Preprocessing the Data
    Training the Model
    Example 2: Classifying Accident Severity
    Avoiding Overfitting
    Using the Output for Prediction and Classification
  Required User Input
  Exploring the Relationship Between Predictors and Response
  Advantages and Weaknesses of Neural Networks
  Problems
Discriminant Analysis
  Introduction
    Example 1: Riding Mowers
    Example 2: Personal Loan Acceptance
  Distance of an Observation from a Class
  Fisher's Linear Classification Functions
  Classification Performance of Discriminant Analysis
  Prior Probabilities
  Unequal Misclassification Costs
  Classifying More Than Two Classes
    Example 3: Medical Dispatch to Accident Scenes
  Advantages and Weaknesses
  Problems
Association Rules
  Introduction
  Discovering Association Rules in Transaction Databases
  Example 1: Synthetic Data on Purchases of Phone Faceplates
  Generating Candidate Rules
    The Apriori Algorithm
  Selecting Strong Rules
    Support and Confidence
    Lift Ratio
    Data Format
    The Process of Rule Selection
    Interpreting the Results
    Statistical Significance of Rules
  Example 2: Rules for Similar Book Purchases
  Summary
  Problems
Cluster Analysis
  Introduction
  Example: Public Utilities
  Measuring Distance Between Two Records
    Euclidean Distance
    Normalizing Numerical Measurements
    Other Distance Measures for Numerical Data
    Distance Measures for Categorical Data
    Distance Measures for Mixed Data
  Measuring Distance Between Two Clusters
  Hierarchical (Agglomerative) Clustering
    Minimum Distance (Single Linkage)
    Maximum Distance (Complete Linkage)
    Group Average (Average Linkage)
    Dendrograms: Displaying Clustering Process and Results
    Validating Clusters
    Limitations of Hierarchical Clustering
  Nonhierarchical Clustering: The k-Means Algorithm
    Initial Partition into k Clusters
  Problems
Cases
  Charles Book Club
  German Credit
  Tayko Software Cataloger
  Segmenting Consumers of Bath Soap
  Direct-Mail Fundraising
  Catalog Cross-Selling
  Predicting Bankruptcy
References
Index

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
