What is included with this book?
1. Cluster Analysis.
2. Model-Based Co-Clustering.
3. Co-Clustering of Binary and Categorical Data.
4. Co-Clustering of Contingency Tables.
5. Co-Clustering of Continuous Data.
About the Authors
Gérard Govaert is Professor at the University of Technology of Compiègne, France. He is also a member of the CNRS Laboratory Heudiasyc (Heuristic and diagnostic of complex systems). His research interests include latent structure modeling, model selection, model-based cluster analysis, block clustering and statistical pattern recognition. He is one of the authors of the MIXMOD (MIXtureMODelling) software.
Mohamed Nadif is Professor at the University of Paris-Descartes, France, where he is a member of LIPADE (Paris Descartes computer science laboratory) in the Mathematics and Computer Science department. His research interests include machine learning, data mining, model-based cluster analysis, co-clustering, factorization and data analysis.
Cluster analysis is an important tool in a variety of scientific areas. Chapter 1 briefly presents the state of the art of well-established as well as more recent methods. The hierarchical, partitioning and fuzzy approaches are discussed, amongst others. The authors review the difficulties these classical methods face with high dimensionality, sparsity and scalability.
Chapter 2 discusses the value of co-clustering, presenting different approaches and defining a co-cluster. The authors focus on co-clustering as the simultaneous clustering of rows and columns and discuss the cases of binary, continuous and co-occurrence data. The criteria and algorithms are described and illustrated on simulated and real data.
Chapter 3 treats co-clustering from a model-based perspective. A latent block model is defined for different kinds of data. Parameter estimation and co-clustering are tackled under two approaches: maximum likelihood and classification maximum likelihood. Hard and soft algorithms are described and applied to simulated and real data.
Chapter 4 considers co-clustering as a matrix approximation problem. The tri-factorization approach is considered and algorithms based on update rules are described. Links with numerical and probabilistic approaches are established. A combination of algorithms is proposed and evaluated on simulated and real data.
Chapter 5 considers co-clustering, or bi-clustering, as the search for co-clusters that are coherent in biological terms, or the extraction of co-clusters under conditions. Classical algorithms are described and evaluated on simulated and real data. Different indices for evaluating the quality of co-clusters are presented and used in numerical experiments.
Ch 1: Cluster Analysis
1. Hierarchical clustering
2. Partitional clustering
3. Subspace clustering
Ch 2: Co-clustering analysis
1. Two-sided cluster analysis
2. Nature of a co-cluster
3. Objective functions and algorithms
4. Binary data and Crobin
5. Continuous data and Croeuc
6. Co-occurrence data: Dhillon, Croki2, etc. (text mining)
7. Numerical experiments
Ch 3: Model-based co-clustering
1. Latent block model
2. Classification maximum likelihood approach
3. Complete data likelihood
4. Block CEM algorithm
5. Binary, continuous and co-occurrence table
6. Maximum likelihood approach
8. Variational algorithm
9. Binary, continuous and co-occurrence table
10. Selection of the model
11. Numerical experiments
Ch 4: Matrix-based co-clustering
1. Matrix formulation of co-clustering
2. Nonnegative matrix factorization
3. Nonnegative matrix tri-factorization
4. Numerical experiments
Ch 5: Co-clustering in Bioinformatics
1. Cheng and Church algorithm
3. Plaid model
4. Spectral algorithm
5. Xmotifs algorithm
6. Numerical experiments
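The Chapter 5 outline lists the Cheng and Church algorithm, which is commonly described as searching for sub-matrices with a low mean squared residue. As an illustration only, and not taken from the book, the following minimal Python sketch computes that index for a candidate co-cluster. The function name `mean_squared_residue` and the toy matrix are assumptions made for this example.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the sub-matrix selected by `rows` and `cols`.

    Each entry's residue is: value - row mean - column mean + overall mean,
    with all means taken inside the selected sub-matrix. A value of 0 means
    the sub-matrix follows a perfectly additive (coherent) pattern.
    """
    n_rows, n_cols = len(rows), len(cols)
    # Extract the candidate co-cluster (hypothetical helper layout).
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy data (invented for illustration): the top-left 3x3 block is additive,
# the remaining entries are arbitrary.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

coherent = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

On this toy matrix the additive block scores essentially zero, while the full matrix scores clearly higher, which is the kind of contrast such an index is meant to capture.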