Knowledge Discovery from Data Streams

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 2010-05-25
  • Publisher: Chapman & Hall/

Since the beginning of the Internet age and the increased use of ubiquitous computing devices, the large volume and continuous flow of distributed data have imposed new constraints on the design of learning algorithms. Exploring how to extract knowledge structures from evolving and time-changing data, Knowledge Discovery from Data Streams presents a coherent overview of state-of-the-art research in learning from data streams.

The book covers the fundamentals that are imperative to understanding data streams and describes important applications, such as TCP/IP traffic, GPS data, sensor networks, and customer click streams. It also addresses several challenges of data mining in the future, when stream mining will be at the core of many applications. These challenges involve designing useful and efficient data mining solutions applicable to real-world problems. In the appendix, the author includes examples of publicly available software and online data sets.

This practical, up-to-date book focuses on the new requirements of the next generation of data mining. Although the concepts presented in the text are mainly about data streams, they also are valid for different areas of machine learning and data mining.

Table of Contents

List of Tables
List of Figures
List of Algorithms
Foreword
Acknowledgments
Knowledge Discovery from Data Streams
Introduction
An Illustrative Example
A World in Movement
Data Mining and Data Streams
Introduction to Data Streams
Data Stream Models
Research Issues in Data Stream Management Systems
An Illustrative Problem
Basic Streaming Methods
Illustrative Examples
Counting the Number of Occurrences of the Elements in a Stream
Counting the Number of Distinct Values in a Stream
Bounds of Random Variables
Poisson Processes
Maintaining Simple Statistics from Data Streams
Sliding Windows
Computing Statistics over Sliding Windows: The ADWIN Algorithm
Data Synopsis
Sampling
Synopsis and Histograms
Wavelets
Discrete Fourier Transform
Illustrative Applications
A Data Warehouse Problem: Hot-Lists
Computing the Entropy in a Stream
Monitoring Correlations Between Data Streams
Monitoring Threshold Functions over Distributed Data Streams
Notes
Change Detection
Introduction
Tracking Drifting Concepts
The Nature of Change
Characterization of Drift Detection Methods
Data Management
Detection Methods
Adaptation Methods
Decision Model Management
A Note on Evaluating Change Detection Methods
Monitoring the Learning Process
Drift Detection Using Statistical Process Control
An Illustrative Example
Final Remarks
Notes
Maintaining Histograms from Data Streams
Introduction
Histograms from Data Streams
K-buckets Histograms
Exponential Histograms
An Illustrative Example
Discussion
The Partition Incremental Discretization Algorithm - PiD
Analysis of the Algorithm
Change Detection in Histograms
An Illustrative Example
Applications to Data Mining
Applying PiD in Supervised Learning
Time-Changing Environments
Notes
Evaluating Streaming Algorithms
Introduction
Learning from Data Streams
Evaluation Issues
Design of Evaluation Experiments
Evaluation Metrics
Error Estimators Using a Single Algorithm and a Single Dataset
An Illustrative Example
Comparative Assessment
The 0 - 1 Loss Function
Illustrative Example
Evaluation Methodology in Non-Stationary Environments
The Page-Hinkley Algorithm
Illustrative Example
Lessons Learned and Open Issues
Notes
Clustering from Data Streams
Introduction
Clustering Examples
Basic Concepts
Partitioning Clustering
The Leader Algorithm
Single Pass k-Means
Hierarchical Clustering
Micro Clustering
Discussion
Monitoring Cluster Evolution
Grid Clustering
Computing the Fractal Dimension
Fractal Clustering
Clustering Variables
A Hierarchical Approach
Growing the Hierarchy
Aggregating at Concept Drift Detection
Analysis of the Algorithm
Notes
Frequent Pattern Mining
Introduction to Frequent Itemset Mining
The Search Space
The FP-growth Algorithm
Summarizing Itemsets
Heavy Hitters
Mining Frequent Itemsets from Data Streams
Landmark Windows
The LossyCounting Algorithm
Frequent Itemsets Using LossyCounting
Mining Recent Frequent Itemsets
Maintaining Frequent Itemsets in Sliding Windows
Mining Closed Frequent Itemsets over Sliding Windows
Frequent Itemsets at Multiple Time Granularities
Sequence Pattern Mining
Reservoir Sampling for Sequential Pattern Mining over Data Streams
Notes
Decision Trees from Data Streams
Introduction
The Very Fast Decision Tree Algorithm
VFDT-The Base Algorithm
Analysis of the VFDT Algorithm
Extensions to the Basic Algorithm
Processing Continuous Attributes
Exhaustive Search
Discriminant Analysis
Functional Tree Leaves
Concept Drift
Detecting Changes
Reacting to Changes
Final Comments
OLIN: Info-Fuzzy Algorithms
Notes
Novelty Detection in Data Streams
Introduction
Learning and Novelty
Desiderata for Novelty Detection
Novelty Detection as a One-Class Classification Problem
Autoassociator Networks
The Positive Naive-Bayes
Decision Trees for One-Class Classification
The One-Class SVM
Evaluation of One-Class Classification Algorithms
Learning New Concepts
Approaches Based on Extreme Values
Approaches Based on the Decision Structure
Approaches Based on Frequency
Approaches Based on Distances
The Online Novelty and Drift Detection Algorithm
Initial Learning Phase
Continuous Unsupervised Learning Phase
Identifying Novel Concepts
Attempting to Determine the Nature of New Concepts
Merging Similar Concepts
Automatically Adapting the Number of Clusters
Computational Cost
Notes
Ensembles of Classifiers
Introduction
Linear Combination of Ensembles
Sampling from a Training Set
Online Bagging
Online Boosting
Ensembles of Trees
Option Trees
Forest of Trees
Generating forest of Trees
Classifying Test Examples
Adapting to Drift Using Ensembles of Classifiers
Mining Skewed Data Streams with Ensembles
Notes
Time Series Data Streams
Introduction to Time Series Analysis
Trend
Seasonality
Stationarity
Time-Series Prediction
The Kalman Filter
Least Mean Squares
Neural Nets and Data Streams
Stochastic Sequential Learning of Neural Networks
Illustrative Example: Load Forecast in Data Streams
Similarity between Time-Series
Euclidean Distance
Dynamic Time-Warping
Symbolic Approximation-SAX
The SAX Transform
Piecewise Aggregate Approximation (PAA)
Symbolic Discretization
Distance Measure
Discussion
Finding Motifs Using SAX
Finding Discords Using SAX
Notes
Ubiquitous Data Mining
Introduction to Ubiquitous Data Mining
Distributed Data Stream Monitoring
Distributed Computing of Linear Functions
A General Algorithm for Computing Linear Functions
Computing Sparse Correlation Matrices Efficiently
Monitoring Sparse Correlation Matrices
Detecting Significant Correlations
Dealing with Data Streams
Distributed Clustering
Conquering the Divide
Furthest Point Clustering
The Parallel Guessing Clustering
DGClust - Distributed Grid Clustering
Local Adaptive Grid
Frequent State Monitoring
Centralized Online Clustering
Algorithm Granularity
Algorithm Granularity Overview
Formalization of Algorithm Granularity
Algorithm Granularity Procedure
Algorithm Output Granularity
Notes
Final Comments
The Next Generation of Knowledge Discovery
Mining Spatial Data
The Time Situation of Data
Structured Data
Where We Want to Go
Resources
Software
Datasets
Bibliography
Index
Table of Contents provided by Ingram. All Rights Reserved.