

Introduction to Clustering Large and High-Dimensional Data

by Jacob Kogan
  • ISBN13: 9780521852678
  • ISBN10: 0521852676
  • Format: Hardcover
  • Copyright: 2006-11-13
  • Publisher: Cambridge University Press

Summary

There is a growing need for automated systems that partition data sets into groups, or clusters. For example, as digital libraries and the World Wide Web continue to grow exponentially, the ability to find useful information increasingly depends on the indexing infrastructure or search engine. Clustering techniques can be used to discover natural groups in data sets and to identify abstract structures that might reside there, without any background knowledge of the characteristics of the data. Clustering has been used in a variety of areas, including computer vision, VLSI design, data mining, bioinformatics (gene expression analysis), and information retrieval, to name just a few. This book focuses on a few of the most important clustering algorithms, providing a detailed account of these major models in an information retrieval context. The early chapters introduce the classic algorithms in detail, while the later chapters describe clustering through divergences and present recent research for more advanced audiences.
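The summary mentions the classic algorithms introduced in the book's early chapters; the most basic of these is batch k-means, which alternates between assigning each point to its nearest centroid and recomputing each centroid as its cluster mean. As a rough illustration only (not code from the book; the deterministic "first k points" initialization is a simplifying assumption), a minimal sketch might look like:

```python
def batch_kmeans(points, k, max_iters=100):
    # Initialize centroids with the first k points (a simple, deterministic choice;
    # real implementations use more careful seeding).
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid
        # under squared Euclidean (quadratic) distance.
        new_assignment = []
        for p in points:
            dists = [sum((x - c) ** 2 for x, c in zip(p, cent)) for cent in centroids]
            new_assignment.append(dists.index(min(dists)))
        if new_assignment == assignment:
            break  # partition is stable: the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assignment
```

Each iteration can only decrease the total within-cluster quadratic cost, which is why the alternation terminates; the book's later chapters replace the quadratic distance with cosine similarity and entropy-like divergences while keeping this same two-step skeleton.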

Author Biography

Jacob Kogan is an Associate Professor in the Department of Mathematics and Statistics at the University of Maryland, Baltimore County.

Table of Contents

Foreword
Preface
Introduction and motivation
A way to embed ASCII documents into a finite dimensional Euclidean space
Clustering and this book
Bibliographic notes
Quadratic k-means algorithm
Classical batch k-means algorithm
Quadratic distance and centroids
Batch k-means clustering algorithm
Batch k-means: advantages and deficiencies
Incremental algorithm
Quadratic functions
Incremental k-means algorithm
Quadratic k-means: summary
Numerical experiments with quadratic k-means
Stable partitions
Quadratic k-means
Spectral relaxation
Bibliographic notes
BIRCH
Balanced iterative reducing and clustering algorithm
BIRCH-like k-means
Bibliographic notes
Spherical k-means algorithm
Spherical batch k-means algorithm
Spherical batch k-means: advantages and deficiencies
Computational considerations
Spherical two-cluster partition of one-dimensional data
One-dimensional line vs. the unit circle
Optimal two-cluster partition on the unit circle
Spherical batch and incremental clustering algorithms
First variation for spherical k-means
Spherical incremental iterations: computational complexity
The "ping-pong" algorithm
Quadratic and spherical k-means
Bibliographic notes
Linear algebra techniques
Two approximation problems
Nearest line
Principal directions divisive partitioning
Principal direction divisive partitioning (PDDP)
Spherical principal directions divisive partitioning (sPDDP)
Clustering with PDDP and sPDDP
Largest eigenvector
Power method
An application: hubs and authorities
Bibliographic notes
Information theoretic clustering
Kullback-Leibler divergence
k-means with Kullback-Leibler divergence
Numerical experiments
Distance between partitions
Bibliographic notes
Clustering with optimization techniques
Optimization framework
Smoothing k-means algorithm
Convergence
Numerical experiments
Bibliographic notes
k-means clustering with divergences
Bregman distance
ϕ-divergences
Clustering with entropy-like distances
BIRCH-type clustering with entropy-like distances
Numerical experiments with (ν, μ) k-means
Smoothing with entropy-like distances
Numerical experiments with (ν, μ) smoka
Bibliographic notes
Assessment of clustering results
Internal criteria
External criteria
Bibliographic notes
Appendix: Optimization and linear algebra background
Eigenvalues of a symmetric matrix
Lagrange multipliers
Elements of convex analysis
Conjugate functions
Asymptotic cones
Asymptotic functions
Smoothing
Bibliographic notes
Solutions to selected problems
Bibliography
Index
Table of Contents provided by Ingram. All Rights Reserved.
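A central variant in the table of contents above is the spherical k-means algorithm, which clusters unit-normalized document vectors by cosine similarity rather than Euclidean distance. As a rough sketch of the idea only (hypothetical code, not from the book; the "first k vectors" initialization is a simplifying assumption):

```python
import math

def spherical_kmeans(vectors, k, max_iters=100):
    # Spherical k-means works on the unit sphere: documents are normalized,
    # similarity is the dot product (cosine similarity for unit vectors),
    # and each "concept" centroid is the normalized mean of its cluster.
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v] if n > 0 else list(v)

    data = [normalize(v) for v in vectors]
    concepts = [data[i] for i in range(k)]  # deterministic init: first k vectors
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each vector joins the concept it is most similar to.
        new_assignment = []
        for v in data:
            sims = [sum(x * c for x, c in zip(v, concept)) for concept in concepts]
            new_assignment.append(sims.index(max(sims)))
        if new_assignment == assignment:
            break  # stable partition
        assignment = new_assignment
        # Update step: each concept vector is the normalized cluster mean.
        for j in range(k):
            members = [v for v, a in zip(data, assignment) if a == j]
            if members:
                concepts[j] = normalize([sum(xs) for xs in zip(*members)])
    return concepts, assignment
```

The structure mirrors batch k-means exactly; only the distance changes, which is the pattern the book then pushes further with Kullback-Leibler and other entropy-like divergences.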
