Introduction | p. 1 |
What Is Data Mining? | p. 3 |
Some More Real-World Applications | p. 3 |
Data Mining Methods - An Overview | p. 6 |
Basic Problem Types | p. 6 |
Prediction | p. 6 |
Classification | p. 6 |
Regression | p. 7 |
Knowledge Discovery | p. 7 |
Deviation Detection | p. 7 |
Cluster Analysis | p. 7 |
Visualization | p. 8 |
Association Rules | p. 8 |
Segmentation | p. 8 |
Data Mining Viewed from the Data Side | p. 9 |
Types of Data | p. 10 |
Conclusion | p. 11 |
Data Preparation | p. 13 |
Data Cleaning | p. 13 |
Handling Outliers | p. 14 |
Handling Noisy Data | p. 14 |
Missing Values Handling | p. 16 |
Coding | p. 16 |
Recognition of Correlated or Redundant Attributes | p. 16 |
Abstraction | p. 17 |
Attribute Construction | p. 17 |
Images | p. 17 |
Time Series | p. 18 |
Web Data | p. 19 |
Conclusions | p. 22 |
Methods for Data Mining | p. 23 |
Decision Tree Induction | p. 23 |
Basic Principle | p. 23 |
Terminology of Decision Tree | p. 24 |
Subtasks and Design Criteria for Decision Tree Induction | p. 25 |
Attribute Selection Criteria | p. 28 |
Information Gain Criteria and Gain Ratio | p. 29 |
Gini Function | p. 30 |
Discretization of Attribute Values | p. 31 |
Binary Discretization | p. 32 |
Multi-interval Discretization | p. 34 |
Discretization of Categorical or Symbolical Attributes | p. 41 |
Pruning | p. 42 |
Overview | p. 43 |
Cost-Complexity Pruning | p. 43 |
Some General Remarks | p. 44 |
Summary | p. 46 |
Case-Based Reasoning | p. 46 |
Background | p. 47 |
The Case-Based Reasoning Process | p. 47 |
CBR Maintenance | p. 48 |
Knowledge Containers in a CBR System | p. 49 |
Design Consideration | p. 50 |
Similarity | p. 50 |
Formalization of Similarity | p. 50 |
Similarity Measures | p. 51 |
Similarity Measures for Images | p. 51 |
Case Description | p. 53 |
Organization of Case Base | p. 53 |
Learning in a CBR System | p. 55 |
Learning of New Cases and Forgetting of Old Cases | p. 56 |
Learning of Prototypes | p. 56 |
Learning of Higher Order Constructs | p. 56 |
Learning of Similarity | p. 56 |
Conclusions | p. 57 |
Clustering | p. 57 |
Introduction | p. 57 |
General Comments | p. 58 |
Distance Measures for Metrical Data | p. 59 |
Using Numerical Distance Measures for Categorical Data | p. 60 |
Distance Measure for Nominal Data | p. 61 |
Contrast Rule | p. 62 |
Agglomerate Clustering Methods | p. 62 |
Partitioning Clustering | p. 64 |
Graphs Clustering | p. 64 |
Similarity Measure for Graphs | p. 65 |
Hierarchical Clustering of Graphs | p. 69 |
Conclusion | p. 71 |
Conceptual Clustering | p. 71 |
Introduction | p. 71 |
Concept Hierarchy and Concept Description | p. 71 |
Category Utility Function | p. 72 |
Algorithmic Properties | p. 73 |
Algorithm | p. 73 |
Conceptual Clustering of Graphs | p. 75 |
Notion of a Case and Similarity Measure | p. 75 |
Evaluation Function | p. 75 |
Prototype Learning | p. 76 |
An Example of a Learned Concept Hierarchy | p. 76 |
Conclusion | p. 79 |
Evaluation of the Model | p. 79 |
Error Rate, Correctness, and Quality | p. 79 |
Sensitivity and Specificity | p. 81 |
Test-and-Train | p. 82 |
Random Sampling | p. 82 |
Cross Validation | p. 82 |
Conclusion | p. 83 |
Feature Subset Selection | p. 83 |
Introduction | p. 83 |
Feature Subset Selection Algorithms | p. 83 |
The Wrapper and the Filter Model for Feature Subset Selection | p. 84 |
Feature Selection Done by Decision Tree Induction | p. 85 |
Feature Subset Selection Done by Clustering | p. 86 |
Contextual Merit Algorithm | p. 87 |
Floating Search Method | p. 88 |
Conclusion | p. 88 |
Applications | p. 91 |
Controlling the Parameters of an Algorithm/Model by Case-Based Reasoning | p. 91 |
Modelling Concerns | p. 91 |
Case-Based Reasoning Unit | p. 92 |
Management of the Case Base | p. 93 |
Case Structure and Case Base | p. 94 |
Non-image Information | p. 95 |
Image Information | p. 96 |
Image Similarity Determination | p. 97 |
Image Similarity Measure 1 (ISim_1) | p. 97 |
Image Similarity Measure 2 (ISim_2) | p. 98 |
Comparison of ISim_1 and ISim_2 | p. 98 |
Segmentation Algorithm and Segmentation Parameters | p. 99 |
Similarity Determination | p. 100 |
Overall Similarity | p. 100 |
Similarity Measure for Non-image Information | p. 101 |
Similarity Measure for Image Information | p. 101 |
Knowledge Acquisition Aspect | p. 101 |
Conclusion | p. 102 |
Mining Images | p. 102 |
Introduction | p. 102 |
Preparing the Experiment | p. 103 |
Image Mining Tool | p. 105 |
The Application | p. 106 |
Brainstorming and Image Catalogue | p. 107 |
Interviewing Process | p. 107 |
Setting Up the Automatic Image Analysis and Feature Extraction Procedure | p. 107 |
Image Analysis | p. 108 |
Feature Extraction | p. 109 |
Collection of Image Descriptions into the Data Base | p. 111 |
The Image Mining Experiment | p. 112 |
Review | p. 113 |
Using the Discovered Knowledge | p. 114 |
Lessons Learned | p. 115 |
Conclusions | p. 116 |
Conclusion | p. 117 |
Appendix | p. 119 |
The IRIS Data Set | p. 119 |
References | p. 121 |
Index | p. 129 |
Table of Contents provided by Publisher. All Rights Reserved. |