Introduction | p. 1 |
A Natural Representation of Data Similarity | p. 3 |
Goals | p. 8 |
Outline | p. 10 |
Basic Structure of High-Dimensional Spaces | p. 13 |
Comparing Attributes | p. 13 |
Comparing Records | p. 14 |
Similarity | p. 14 |
High-Dimensional Spaces | p. 16 |
Summary | p. 18 |
Algorithms | p. 19 |
Improving the Natural Geometry | p. 19 |
Projection | p. 20 |
Singular Value Decompositions | p. 20 |
Random Projections | p. 22 |
Algorithms that Find Standalone Clusters | p. 23 |
Clusters Based on Density | p. 23 |
Parallel Coordinates | p. 24 |
Independent Component Analysis | p. 24 |
Latent Dirichlet Allocation | p. 25 |
Algorithms that Find Clusters and Their Relationships | p. 25 |
Clusters Based on Distance | p. 25 |
Clusters Based on Distribution | p. 26 |
Semidiscrete Decomposition | p. 27 |
Hierarchical Clustering | p. 29 |
Minimum Spanning Tree with Collapsing | p. 29 |
Overall Process for Constructing a Skeleton | p. 30 |
Algorithms that Wrap Clusters | p. 31 |
Distance-Based | p. 32 |
Distribution-Based | p. 32 |
1-Class Support Vector Machines | p. 32 |
Autoassociative Neural Networks | p. 33 |
Covers | p. 34 |
Algorithms to Place Boundaries Between Clusters | p. 34 |
Support Vector Machines | p. 35 |
Random Forests | p. 35 |
Overall Process for Constructing Empty Space | p. 36 |
Summary | p. 37 |
Spaces with a Single Center | p. 39 |
Using Distance | p. 39 |
Using Density | p. 40 |
Understanding the Skeleton | p. 42 |
Understanding Empty Space | p. 43 |
Summary | p. 45 |
Spaces with Multiple Centers | p. 47 |
What is a Cluster? | p. 48 |
Identifying Clusters | p. 50 |
Clusters Known Already | p. 50 |
Finding Clusters | p. 50 |
Finding the Skeleton | p. 55 |
Empty Space | p. 58 |
An Outer Boundary and Novel Data | p. 58 |
Interesting Data | p. 60 |
One-Cluster Boundaries | p. 63 |
One-Cluster-Against-the-Rest Boundaries | p. 63 |
Summary | p. 64 |
Representation by Graphs | p. 67 |
Building a Graph from Records | p. 68 |
Local Similarities | p. 68 |
Embedding Choices | p. 69 |
Using the Embedding for Clustering | p. 70 |
Summary | p. 71 |
Using Models of High-Dimensional Spaces | p. 73 |
Understanding Clusters | p. 73 |
Structure in the Set of Clusters | p. 76 |
Semantic Stratified Sampling | p. 77 |
Ranking Using the Skeleton | p. 78 |
Ranking Using Empty Space | p. 87 |
Applications to Streaming Data | p. 89 |
Concealment | p. 90 |
Summary | p. 91 |
Including Contextual Information | p. 93 |
What is Context? | p. 93 |
Changing Data | p. 93 |
Changing Analyst and Organizational Properties | p. 94 |
Changing Algorithmic Properties | p. 95 |
Letting Context Change the Models | p. 95 |
Recomputing the View | p. 95 |
Recomputing Derived Structures | p. 96 |
Recomputing the Clustering | p. 97 |
Summary | p. 98 |
Conclusions | p. 99 |
References | p. 103 |
Index | p. 107 |