Introduction | p. 1 |
Motivation | p. 1 |
Data and Knowledge | p. 2 |
Tycho Brahe and Johannes Kepler | p. 4 |
Intelligent Data Analysis | p. 6 |
The Data Analysis Process | p. 7 |
Methods, Tasks, and Tools | p. 11 |
How to Read This Book | p. 13 |
References | p. 14 |
Practical Data Analysis: An Example | p. 15 |
The Setup | p. 15 |
Data Understanding and Pattern Finding | p. 16 |
Explanation Finding | p. 20 |
Predicting the Future | p. 21 |
Concluding Remarks | p. 23 |
Project Understanding | p. 25 |
Determine the Project Objective | p. 26 |
Assess the Situation | p. 28 |
Determine Analysis Goals | p. 30 |
Further Reading | p. 31 |
References | p. 32 |
Data Understanding | p. 33 |
Attribute Understanding | p. 34 |
Data Quality | p. 37 |
Data Visualization | p. 40 |
Methods for One and Two Attributes | p. 40 |
Methods for Higher-Dimensional Data | p. 48 |
Correlation Analysis | p. 59 |
Outlier Detection | p. 62 |
Outlier Detection for Single Attributes | p. 63 |
Outlier Detection for Multidimensional Data | p. 64 |
Missing Values | p. 65 |
A Checklist for Data Understanding | p. 68 |
Data Understanding in Practice | p. 69 |
Data Understanding in KNIME | p. 70 |
Data Understanding in R | p. 73 |
References | p. 78 |
Principles of Modeling | p. 81 |
Model Classes | p. 82 |
Fitting Criteria and Score Functions | p. 85 |
Error Functions for Classification Problems | p. 87 |
Measures of Interestingness | p. 89 |
Algorithms for Model Fitting | p. 89 |
Closed Form Solutions | p. 89 |
Gradient Method | p. 90 |
Combinatorial Optimization | p. 92 |
Random Search, Greedy Strategies, and Other Heuristics | p. 92 |
Types of Errors | p. 93 |
Experimental Error | p. 94 |
Sample Error | p. 99 |
Model Error | p. 100 |
Algorithmic Error | p. 101 |
Machine Learning Bias and Variance | p. 101 |
Learning Without Bias? | p. 102 |
Model Validation | p. 102 |
Training and Test Data | p. 102 |
Cross-Validation | p. 103 |
Bootstrapping | p. 104 |
Measures for Model Complexity | p. 105 |
Model Errors and Validation in Practice | p. 111 |
Errors and Validation in KNIME | p. 111 |
Validation in R | p. 111 |
Further Reading | p. 113 |
References | p. 113 |
Data Preparation | p. 115 |
Select Data | p. 115 |
Feature Selection | p. 116 |
Dimensionality Reduction | p. 121 |
Record Selection | p. 121 |
Clean Data | p. 123 |
Improve Data Quality | p. 123 |
Missing Values | p. 124 |
Construct Data | p. 127 |
Provide Operability | p. 127 |
Assure Impartiality | p. 129 |
Maximize Efficiency | p. 131 |
Complex Data Types | p. 134 |
Data Integration | p. 135 |
Vertical Data Integration | p. 136 |
Horizontal Data Integration | p. 136 |
Data Preparation in Practice | p. 138 |
Data Preparation in KNIME | p. 139 |
Data Preparation in R | p. 141 |
References | p. 142 |
Finding Patterns | p. 145 |
Hierarchical Clustering | p. 147 |
Overview | p. 148 |
Construction | p. 150 |
Variations and Issues | p. 152 |
Notion of (Dis-)Similarity | p. 155 |
Prototype- and Model-Based Clustering | p. 162 |
Overview | p. 162 |
Construction | p. 164 |
Variations and Issues | p. 167 |
Density-Based Clustering | p. 169 |
Overview | p. 170 |
Construction | p. 171 |
Variations and Issues | p. 173 |
Self-organizing Maps | p. 175 |
Overview | p. 175 |
Construction | p. 176 |
Frequent Pattern Mining and Association Rules | p. 179 |
Overview | p. 179 |
Construction | p. 181 |
Variations and Issues | p. 187 |
Deviation Analysis | p. 194 |
Overview | p. 194 |
Construction | p. 195 |
Variations and Issues | p. 197 |
Finding Patterns in Practice | p. 198 |
Finding Patterns with KNIME | p. 199 |
Finding Patterns in R | p. 201 |
Further Reading | p. 203 |
References | p. 204 |
Finding Explanations | p. 207 |
Decision Trees | p. 208 |
Overview | p. 209 |
Construction | p. 210 |
Variations and Issues | p. 213 |
Bayes Classifiers | p. 218 |
Overview | p. 218 |
Construction | p. 220 |
Variations and Issues | p. 224 |
Regression | p. 229 |
Overview | p. 230 |
Construction | p. 231 |
Variations and Issues | p. 234 |
Two Class Problems | p. 242 |
Rule Learning | p. 244 |
Propositional Rules | p. 245 |
Inductive Logic Programming or First-Order Rules | p. 251 |
Finding Explanations in Practice | p. 253 |
Finding Explanations with KNIME | p. 253 |
Using Explanations with R | p. 255 |
Further Reading | p. 257 |
References | p. 258 |
Finding Predictors | p. 259 |
Nearest-Neighbor Predictors | p. 261 |
Overview | p. 261 |
Construction | p. 263 |
Variations and Issues | p. 265 |
Artificial Neural Networks | p. 269 |
Overview | p. 269 |
Construction | p. 272 |
Variations and Issues | p. 276 |
Support Vector Machines | p. 277 |
Overview | p. 278 |
Construction | p. 282 |
Variations and Issues | p. 283 |
Ensemble Methods | p. 284 |
Overview | p. 284 |
Construction | p. 286 |
Further Reading | p. 289 |
Finding Predictors in Practice | p. 290 |
Finding Predictors with KNIME | p. 290 |
Using Predictors in R | p. 292 |
References | p. 294 |
Evaluation and Deployment | p. 297 |
Evaluation | p. 297 |
Deployment and Monitoring | p. 299 |
References | p. 301 |
Statistics | p. 303 |
Terms and Notation | p. 304 |
Descriptive Statistics | p. 305 |
Tabular Representations | p. 305 |
Graphical Representations | p. 306 |
Characteristic Measures for One-Dimensional Data | p. 309 |
Characteristic Measures for Multidimensional Data | p. 316 |
Principal Component Analysis | p. 318 |
Probability Theory | p. 323 |
Probability | p. 323 |
Basic Methods and Theorems | p. 327 |
Random Variables | p. 333 |
Characteristic Measures of Random Variables | p. 339 |
Some Special Distributions | p. 343 |
Inferential Statistics | p. 349 |
Random Samples | p. 350 |
Parameter Estimation | p. 351 |
Hypothesis Testing | p. 361 |
The R Project | p. 369 |
Installation and Overview | p. 369 |
Reading Files and R Objects | p. 370 |
R Functions and Commands | p. 372 |
Libraries/Packages | p. 373 |
R Workspace | p. 373 |
Finding Help | p. 374 |
Further Reading | p. 374 |
KNIME | p. 375 |
Installation and Overview | p. 375 |
Building Workflows | p. 377 |
Example Flow | p. 378 |
R Integration | p. 380 |
References | p. 383 |
Index | p. 385 |
Table of Contents provided by Ingram. All Rights Reserved. |