Understanding High-Dimensional Spaces

by Skillicorn, David B.

ISBN13: 9783642333972
ISBN10: 3642333974
Format: Paperback
Copyright: 2012-09-27
Publisher: Springer-Nature New York Inc

Summary

High-dimensional spaces arise as a way of modelling datasets with many attributes. Such a dataset can be represented directly in a space spanned by its attributes, with each record a point whose position depends on its attribute values. These spaces are not easy to work with because of their high dimensionality: our intuition about space is not reliable, and measures such as distance are less informative than we might expect. There are three main areas where large, complex, high-dimensional datasets arise naturally: data collected by online retailers, preference sites, social media sites, and customer relationship databases, where large but sparse records are available for each individual; data derived from text and speech, where the attributes are words and so the corresponding datasets are wide and sparse; and data collected for security, defense, law enforcement, and intelligence purposes, where the datasets are both large and wide. Such datasets are usually understood either by finding the clusters they contain or by looking for outliers, but these strategies conceal subtleties that are often ignored. In this book the author suggests new ways of thinking about high-dimensional spaces using two models: a skeleton that relates the clusters to one another, and boundaries in the empty space between clusters that provide new perspectives on outliers and outlying regions. The book will be of value to practitioners, graduate students, and researchers.
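The remark that distance is less informative in high dimensions can be illustrated with a small experiment. The sketch below is not taken from the book; it is a minimal illustration, assuming points drawn uniformly at random from the unit hypercube. It measures the relative contrast (farthest distance minus nearest distance, divided by nearest distance) from a random query point. The function name `relative_contrast`, the uniform data, and the point counts are illustrative assumptions. As the dimension grows, the contrast typically shrinks, so the nearest and farthest records become harder to tell apart.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Hypothetical helper: (max - min) / min of the Euclidean distances
    from one random query point to n_points uniform random points in the
    unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# The contrast typically falls sharply as the dimension increases.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(500, dim):.3f}")
```

Uniform data is chosen only because it makes the effect easy to see; real datasets with cluster structure concentrate less, which is part of why the book looks beyond raw distances.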

Table of Contents

Introduction	p. 1
  A Natural Representation of Data Similarity	p. 3
  Goals	p. 8
  Outline	p. 10
Basic Structure of High-Dimensional Spaces	p. 13
  Comparing Attributes	p. 13
  Comparing Records	p. 14
  Similarity	p. 14
  High-Dimensional Spaces	p. 16
  Summary	p. 18
Algorithms	p. 19
  Improving the Natural Geometry	p. 19
  Projection	p. 20
  Singular Value Decompositions	p. 20
  Random Projections	p. 22
  Algorithms that Find Standalone Clusters	p. 23
  Clusters Based on Density	p. 23
  Parallel Coordinates	p. 24
  Independent Component Analysis	p. 24
  Latent Dirichlet Allocation	p. 25
  Algorithms that Find Clusters and Their Relationships	p. 25
  Clusters Based on Distance	p. 25
  Clusters Based on Distribution	p. 26
  Semidiscrete Decomposition	p. 27
  Hierarchical Clustering	p. 29
  Minimum Spanning Tree with Collapsing	p. 29
  Overall Process for Constructing a Skeleton	p. 30
  Algorithms that Wrap Clusters	p. 31
  Distance-Based	p. 32
  Distribution-Based	p. 32
  1-Class Support Vector Machines	p. 32
  Autoassociative Neural Networks	p. 33
  Covers	p. 34
  Algorithms to Place Boundaries Between Clusters	p. 34
  Support Vector Machines	p. 35
  Random Forests	p. 35
  Overall Process for Constructing Empty Space	p. 36
  Summary	p. 37
Spaces with a Single Center	p. 39
  Using Distance	p. 39
  Using Density	p. 40
  Understanding the Skeleton	p. 42
  Understanding Empty Space	p. 43
  Summary	p. 45
Spaces with Multiple Centers	p. 47
  What is a Cluster?	p. 48
  Identifying Clusters	p. 50
  Clusters Known Already	p. 50
  Finding Clusters	p. 50
  Finding the Skeleton	p. 55
  Empty Space	p. 58
  An Outer Boundary and Novel Data	p. 58
  Interesting Data	p. 60
  One-Cluster Boundaries	p. 63
  One-Cluster-Against-the-Rest Boundaries	p. 63
  Summary	p. 64
Representation by Graphs	p. 67
  Building a Graph from Records	p. 68
  Local Similarities	p. 68
  Embedding Choices	p. 69
  Using the Embedding for Clustering	p. 70
  Summary	p. 71
Using Models of High-Dimensional Spaces	p. 73
  Understanding Clusters	p. 73
  Structure in the Set of Clusters	p. 76
  Semantic Stratified Sampling	p. 77
  Ranking Using the Skeleton	p. 78
  Ranking Using Empty Space	p. 87
  Applications to Streaming Data	p. 89
  Concealment	p. 90
  Summary	p. 91
Including Contextual Information	p. 93
  What is Context?	p. 93
  Changing Data	p. 93
  Changing Analyst and Organizational Properties	p. 94
  Changing Algorithmic Properties	p. 95
  Letting Context Change the Models	p. 95
  Recomputing the View	p. 95
  Recomputing Derived Structures	p. 96
  Recomputing the Clustering	p. 97
  Summary	p. 98
Conclusions	p. 99
References	p. 103
Index	p. 107
Table of Contents provided by Ingram. All Rights Reserved.
