Knowledge Discovery from Data Streams, by Joao Gama (ISBN: 9781439826119)
List of Tables | p. xi |
List of Figures | p. xiii |
List of Algorithms | p. xv |
Foreword | p. xvii |
Acknowledgments | p. xix |
Knowledge Discovery from Data Streams | p. 1 |
Introduction | p. 1 |
An Illustrative Example | p. 2 |
A World in Movement | p. 4 |
Data Mining and Data Streams | p. 5 |
Introduction to Data Streams | p. 7 |
Data Stream Models | p. 7 |
Research Issues in Data Stream Management Systems | p. 8 |
An Illustrative Problem | p. 8 |
Basic Streaming Methods | p. 9 |
Illustrative Examples | p. 10 |
Counting the Number of Occurrences of the Elements in a Stream | p. 10 |
Counting the Number of Distinct Values in a Stream | p. 11 |
Bounds of Random Variables | p. 11 |
Poisson Processes | p. 13 |
Maintaining Simple Statistics from Data Streams | p. 14 |
Sliding Windows | p. 14 |
Computing Statistics over Sliding Windows: The ADWIN Algorithm | p. 16 |
Data Synopsis | p. 19 |
Sampling | p. 19 |
Synopsis and Histograms | p. 20 |
Wavelets | p. 21 |
Discrete Fourier Transform | p. 22 |
Illustrative Applications | p. 23 |
A Data Warehouse Problem: Hot-Lists | p. 23 |
Computing the Entropy in a Stream | p. 24 |
Monitoring Correlations Between Data Streams | p. 27 |
Monitoring Threshold Functions over Distributed Data Streams | p. 29 |
Notes | p. 30 |
Change Detection | p. 33 |
Introduction | p. 33 |
Tracking Drifting Concepts | p. 34 |
The Nature of Change | p. 35 |
Characterization of Drift Detection Methods | p. 36 |
Data Management | p. 37 |
Detection Methods | p. 38 |
Adaptation Methods | p. 40 |
Decision Model Management | p. 41 |
A Note on Evaluating Change Detection Methods | p. 41 |
Monitoring the Learning Process | p. 42 |
Drift Detection Using Statistical Process Control | p. 42 |
An Illustrative Example | p. 45 |
Final Remarks | p. 46 |
Notes | p. 47 |
Maintaining Histograms from Data Streams | p. 49 |
Introduction | p. 49 |
Histograms from Data Streams | p. 50 |
K-buckets Histograms | p. 50 |
Exponential Histograms | p. 51 |
An Illustrative Example | p. 52 |
Discussion | p. 52 |
The Partition Incremental Discretization Algorithm - PiD | p. 53 |
Analysis of the Algorithm | p. 56 |
Change Detection in Histograms | p. 56 |
An Illustrative Example | p. 57 |
Applications to Data Mining | p. 59 |
Applying PiD in Supervised Learning | p. 59 |
Time-Changing Environments | p. 61 |
Notes | p. 62 |
Evaluating Streaming Algorithms | p. 63 |
Introduction | p. 63 |
Learning from Data Streams | p. 64 |
Evaluation Issues | p. 65 |
Design of Evaluation Experiments | p. 66 |
Evaluation Metrics | p. 67 |
Error Estimators Using a Single Algorithm and a Single Dataset | p. 68 |
An Illustrative Example | p. 68 |
Comparative Assessment | p. 69 |
The 0-1 Loss Function | p. 70 |
Illustrative Example | p. 71 |
Evaluation Methodology in Non-Stationary Environments | p. 72 |
The Page-Hinkley Algorithm | p. 72 |
Illustrative Example | p. 73 |
Lessons Learned and Open Issues | p. 75 |
Notes | p. 77 |
Clustering from Data Streams | p. 79 |
Introduction | p. 79 |
Clustering Examples | p. 80 |
Basic Concepts | p. 80 |
Partitioning Clustering | p. 82 |
The Leader Algorithm | p. 82 |
Single Pass k-Means | p. 82 |
Hierarchical Clustering | p. 83 |
Micro Clustering | p. 85 |
Discussion | p. 86 |
Monitoring Cluster Evolution | p. 86 |
Grid Clustering | p. 87 |
Computing the Fractal Dimension | p. 88 |
Fractal Clustering | p. 88 |
Clustering Variables | p. 90 |
A Hierarchical Approach | p. 91 |
Growing the Hierarchy | p. 91 |
Aggregating at Concept Drift Detection | p. 94 |
Analysis of the Algorithm | p. 96 |
Notes | p. 96 |
Frequent Pattern Mining | p. 97 |
Introduction to Frequent Itemset Mining | p. 97 |
The Search Space | p. 98 |
The FP-growth Algorithm | p. 100 |
Summarizing Itemsets | p. 100 |
Heavy Hitters | p. 101 |
Mining Frequent Itemsets from Data Streams | p. 103 |
Landmark Windows | p. 104 |
The LossyCounting Algorithm | p. 104 |
Frequent Itemsets Using LossyCounting | p. 104 |
Mining Recent Frequent Itemsets | p. 105 |
Maintaining Frequent Itemsets in Sliding Windows | p. 105 |
Mining Closed Frequent Itemsets over Sliding Windows | p. 106 |
Frequent Itemsets at Multiple Time Granularities | p. 108 |
Sequence Pattern Mining | p. 110 |
Reservoir Sampling for Sequential Pattern Mining over Data Streams | p. 111 |
Notes | p. 113 |
Decision Trees from Data Streams | p. 115 |
Introduction | p. 115 |
The Very Fast Decision Tree Algorithm | p. 116 |
VFDT: The Base Algorithm | p. 116 |
Analysis of the VFDT Algorithm | p. 118 |
Extensions to the Basic Algorithm | p. 119 |
Processing Continuous Attributes | p. 119 |
Exhaustive Search | p. 119 |
Discriminant Analysis | p. 121 |
Functional Tree Leaves | p. 123 |
Concept Drift | p. 124 |
Detecting Changes | p. 126 |
Reacting to Changes | p. 127 |
Final Comments | p. 128 |
OLIN: Info-Fuzzy Algorithms | p. 129 |
Notes | p. 132 |
Novelty Detection in Data Streams | p. 133 |
Introduction | p. 133 |
Learning and Novelty | p. 134 |
Desiderata for Novelty Detection | p. 135 |
Novelty Detection as a One-Class Classification Problem | p. 135 |
Autoassociator Networks | p. 136 |
The Positive Naive-Bayes | p. 137 |
Decision Trees for One-Class Classification | p. 138 |
The One-Class SVM | p. 138 |
Evaluation of One-Class Classification Algorithms | p. 139 |
Learning New Concepts | p. 141 |
Approaches Based on Extreme Values | p. 141 |
Approaches Based on the Decision Structure | p. 142 |
Approaches Based on Frequency | p. 143 |
Approaches Based on Distances | p. 144 |
The Online Novelty and Drift Detection Algorithm | p. 144 |
Initial Learning Phase | p. 145 |
Continuous Unsupervised Learning Phase | p. 146 |
Identifying Novel Concepts | p. 147 |
Attempting to Determine the Nature of New Concepts | p. 149 |
Merging Similar Concepts | p. 149 |
Automatically Adapting the Number of Clusters | p. 150 |
Computational Cost | p. 150 |
Notes | p. 151 |
Ensembles of Classifiers | p. 153 |
Introduction | p. 153 |
Linear Combination of Ensembles | p. 155 |
Sampling from a Training Set | p. 156 |
Online Bagging | p. 157 |
Online Boosting | p. 158 |
Ensembles of Trees | p. 160 |
Option Trees | p. 160 |
Forest of Trees | p. 161 |
Generating Forest of Trees | p. 162 |
Classifying Test Examples | p. 162 |
Adapting to Drift Using Ensembles of Classifiers | p. 162 |
Mining Skewed Data Streams with Ensembles | p. 165 |
Notes | p. 166 |
Time Series Data Streams | p. 167 |
Introduction to Time Series Analysis | p. 167 |
Trend | p. 167 |
Seasonality | p. 169 |
Stationarity | p. 169 |
Time-Series Prediction | p. 169 |
The Kalman Filter | p. 170 |
Least Mean Squares | p. 173 |
Neural Nets and Data Streams | p. 173 |
Stochastic Sequential Learning of Neural Networks | p. 174 |
Illustrative Example: Load Forecast in Data Streams | p. 175 |
Similarity between Time-Series | p. 177 |
Euclidean Distance | p. 177 |
Dynamic Time-Warping | p. 178 |
Symbolic Approximation: SAX | p. 180 |
The SAX Transform | p. 180 |
Piecewise Aggregate Approximation (PAA) | p. 181 |
Symbolic Discretization | p. 181 |
Distance Measure | p. 182 |
Discussion | p. 182 |
Finding Motifs Using SAX | p. 183 |
Finding Discords Using SAX | p. 183 |
Notes | p. 184 |
Ubiquitous Data Mining | p. 185 |
Introduction to Ubiquitous Data Mining | p. 185 |
Distributed Data Stream Monitoring | p. 186 |
Distributed Computing of Linear Functions | p. 187 |
A General Algorithm for Computing Linear Functions | p. 188 |
Computing Sparse Correlation Matrices Efficiently | p. 189 |
Monitoring Sparse Correlation Matrices | p. 191 |
Detecting Significant Correlations | p. 192 |
Dealing with Data Streams | p. 192 |
Distributed Clustering | p. 193 |
Conquering the Divide | p. 193 |
Furthest Point Clustering | p. 193 |
The Parallel Guessing Clustering | p. 193 |
DGClust - Distributed Grid Clustering | p. 194 |
Local Adaptive Grid | p. 194 |
Frequent State Monitoring | p. 195 |
Centralized Online Clustering | p. 196 |
Algorithm Granularity | p. 197 |
Algorithm Granularity Overview | p. 199 |
Formalization of Algorithm Granularity | p. 200 |
Algorithm Granularity Procedure | p. 200 |
Algorithm Output Granularity | p. 201 |
Notes | p. 203 |
Final Comments | p. 205 |
The Next Generation of Knowledge Discovery | p. 205 |
Mining Spatial Data | p. 206 |
The Time Situation of Data | p. 206 |
Structured Data | p. 206 |
Where We Want to Go | p. 206 |
Resources | p. 209 |
Software | p. 209 |
Datasets | p. 209 |
Bibliography | p. 211 |
Index | p. 235 |
Table of Contents provided by Ingram. All Rights Reserved.