Data Mining for Business Intelligence, Second Edition uses real data and actual cases to illustrate how data mining (DM) can be applied to develop successful business models. Featuring complimentary access to XLMiner®, the Microsoft Office Excel® add-in, this book allows readers to follow along and implement algorithms at their own pace, with a minimal learning curve. In addition, students and practitioners of DM techniques are presented with hands-on, business-oriented applications. An abundance of exercises and examples, doubled in number for the second edition, motivates learning and understanding.

This book helps readers understand the beneficial relationship between DM and smart business practices, and is an excellent learning tool for creating valuable strategies and making wiser business decisions. New topics include detailed coverage of visualization, enhanced by Spotfire subroutines, and of time series forecasting, among other subjects.

The Second Edition now features:

-Three new chapters on time series forecasting, introducing popular business forecasting methods, including moving average and exponential smoothing methods and regression-based models, as well as topics such as explanatory vs. predictive modeling, two-level models, and ensembles

-A revised chapter on data visualization that now features interactive visualization principles and added assignments that demonstrate interactive visualization in practice

-Separate chapters on k-nearest neighbors and on Naïve Bayes methods

-Summaries at the start of each chapter that supply an outline of key topics

GALIT SHMUELI, PhD, is Associate Professor of Statistics and Director of the eMarkets Research Lab in the Robert H. Smith School of Business at the University of Maryland. Dr. Shmueli is the coauthor of Statistical Methods in e-Commerce Research and Modeling Online Auctions, both published by Wiley.

NITIN R. PATEL, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology for over ten years.

PETER C. BRUCE is President and owner of statistics.com, the leading provider of online education in statistics.

Foreword
Preface
Acknowledgments
Preliminaries
Introduction
What Is Data Mining?
Where Is Data Mining Used?
The Origins of Data Mining
The Rapid Growth of Data Mining
Why Are There So Many Different Methods?
Terminology and Notation
Road Maps to This Book
Overview of the Data Mining Process
Introduction
Core Ideas in Data Mining
Supervised and Unsupervised Learning
The Steps in Data Mining
Preliminary Steps
Building a Model: Example with Linear Regression
Using Excel for Data Mining
Problems
Data Exploration and Dimension Reduction
Data Visualization
Uses of Data Visualization
Data Examples
Boston Housing Data
Ridership on Amtrak Trains
Basic Charts: bar charts, line graphs, and scatterplots
Distribution Plots
Heatmaps: visualizing correlations and missing values
Multidimensional Visualization
Adding Variables: color, hue, size, shape, multiple panels, animation
Manipulations: rescaling, aggregation and hierarchies, zooming and panning, filtering
Reference: trend line and labels
Scaling up: large datasets
Multivariate plot: parallel coordinates plot
Interactive visualization
Specialized Visualizations
Visualizing networked data
Visualizing hierarchical data: treemaps
Visualizing geographical data: maps
Summary of major visualizations and operations, according to data mining goal
Prediction
Classification
Time series forecasting
Unsupervised learning
Problems
Dimension Reduction
Introduction
Practical Considerations
House Prices in Boston
Data Summaries
Correlation Analysis
Reducing the Number of Categories in Categorical Variables
Converting a Categorical Variable to a Numerical Variable
Principal Components Analysis
Breakfast Cereals
Principal Components
Normalizing the Data
Using Principal Components for Classification and Prediction
Dimension Reduction Using Regression Models
Dimension Reduction Using Classification and Regression Trees
Problems
Performance Evaluation
Evaluating Classification and Predictive Performance
Introduction
Judging Classification Performance
Benchmark: The Naive Rule
Class Separation
The Classification Matrix
Using the Validation Data
Accuracy Measures
Cutoff for Classification
Performance in Unequal Importance of Classes
Asymmetric Misclassification Costs
Oversampling and Asymmetric Costs
Classification Using a Triage Strategy
Evaluating Predictive Performance
Benchmark: The Average
Prediction Accuracy Measures
Problems
Prediction and Classification Methods
Multiple Linear Regression
Introduction
Explanatory vs. Predictive Modeling
Estimating the Regression Equation and Prediction
Example: Predicting the Price of Used Toyota Corolla Automobiles
Variable Selection in Linear Regression
Reducing the Number of Predictors
How to Reduce the Number of Predictors
Problems
k-Nearest Neighbors (kNN)
The kNN Classifier
Determining Neighbors
Classification Rule
Example: Riding Mowers
Choosing k
Setting the Cutoff Value
kNN with More Than 2 Classes
kNN for a Numerical Response
Advantages and Shortcomings of kNN
Algorithms
Problems
Naive Bayes
Introduction
Predicting Fraudulent Financial Reporting
The Practical Difficulty with the Complete (Exact) Bayes Procedure
The Solution: Naive Bayes
Predicting Fraudulent Financial Reports, 2 Predictors
Predicting Delayed Flights
Advantages and Shortcomings of the Naive Bayes Classifier
Problems
Classification and Regression Trees
Introduction
Classification Trees
Recursive Partitioning
Riding Mowers
Measures of Impurity
Evaluating the Performance of a Classification Tree
Acceptance of Personal Loan
Avoiding Overfitting
Stopping Tree Growth: CHAID
Pruning the Tree
Classification Rules from Trees
Classification Trees for More Than 2 Classes
Regression Trees
Prediction
Measuring Impurity
Evaluating Performance
Advantages, Weaknesses, and Extensions
Problems
Logistic Regression
Introduction
The Logistic Regression Model
Example: Acceptance of Personal Loan
Model with a Single Predictor
Estimating the Logistic Model from Data: Computing Parameter Estimates
Interpreting Results in Terms of Odds
Evaluating Classification Performance
Variable Selection
Example of Complete Analysis: Predicting Delayed Flights
Data Preprocessing
Model Fitting and Estimation
Model Interpretation
Model Performance
Variable Selection
Appendix A: Logistic Regression for Profiling
Appendix B: Evaluating Goodness of Fit
Appendix C: Logistic Regression for More Than Two Classes
Problems
Neural Nets
Introduction
Concept and Structure of a Neural Network
Fitting a Network to Data
Tiny Dataset
Computing Output of Nodes
Preprocessing the Data
Training the Model
Classifying Accident Severity
Avoiding Overfitting
Using the Output for Prediction and Classification
Required User Input
Exploring the Relationship Between Predictors and Response
Advantages and Weaknesses of Neural Networks
Problems
Discriminant Analysis
Introduction
Riding Mowers
Personal Loan Acceptance
Distance of an Observation from a Class
Fisher's Linear Classification Functions
Classification Performance of Discriminant Analysis
Prior Probabilities
Unequal Misclassification Costs
Classifying More Than Two Classes
Medical Dispatch to Accident Scenes
Advantages and Weaknesses
Problems
Mining Relationships Among Records
Association Rules
Introduction
Discovering Association Rules in Transaction Databases
Synthetic Data on Purchases of Phone Faceplates
Generating Candidate Rules
The Apriori Algorithm
Selecting Strong Rules
Support and Confidence
Lift Ratio
Data Format
The Process of Rule Selection
Interpreting the Results
Statistical Significance of Rules
Rules for Similar Book Purchases
Summary
Problems
Cluster Analysis
Introduction
Example: Public Utilities
Measuring Distance Between Two Records
Euclidean Distance
Normalizing Numerical Measurements
Other Distance Measures for Numerical Data
Distance Measures for Categorical Data
Distance Measures for Mixed Data
Measuring Distance Between Two Clusters
Hierarchical (Agglomerative) Clustering
Minimum Distance (Single Linkage)
Maximum Distance (Complete Linkage)
Average Distance (Average Linkage)
Dendrograms: Displaying Clustering Process and Results
Validating Clusters
Limitations of Hierarchical Clustering
Nonhierarchical Clustering: The k-Means Algorithm
Initial Partition into k Clusters
Problems
Forecasting Time Series
Handling Time Series
Introduction
Explanatory vs. Predictive Modeling
Popular Forecasting Methods in Business
Combining Methods
Time Series Components
Example: Ridership on Amtrak Trains
Data Partitioning
Problems
Regression-Based Forecasting
A Model with Trend
Linear Trend
Exponential Trend
Polynomial Trend
A Model with Seasonality
A Model with Trend and Seasonality
Autocorrelation and ARIMA Models
Computing Autocorrelation
Improving Forecasts by Integrating Autocorrelation Information
Evaluating Predictability
Problems
Smoothing Methods
Introduction
Moving Average
Centered Moving Average for Visualization
Trailing Moving Average for Forecasting
Choosing Window Width
Simple Exponential Smoothing
Choosing Smoothing Parameter
Relation Between Moving Average and Simple Exponential Smoothing
Advanced Exponential Smoothing
Series with a Trend
Series with a Trend and Seasonality
Series with Seasonality
Problems
Cases
Cases
Charles Book Club
German Credit
Tayko Software Cataloger
Segmenting Consumers of Bath Soap
Direct-Mail Fundraising
Catalog Cross-Selling
Predicting Bankruptcy
Time Series Case: Forecasting Public Transportation Demand
References
Index

Data Mining for Business Intelligence : Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner

ISBN-13: 9780470526828

ISBN-10: 0470526823
