A Dirichlet process mixture of generalized Dirichlet distributions for proportional data modeling

IEEE Trans Neural Netw. 2010 Jan;21(1):107-22. doi: 10.1109/TNN.2009.2034851. Epub 2009 Dec 4.

Abstract

In this paper, we propose a clustering algorithm based on both Dirichlet processes and the generalized Dirichlet distribution, which has been shown to be very flexible for proportional data modeling. Our approach can be viewed as an extension of the finite generalized Dirichlet mixture model to the infinite case. The extension is based on nonparametric Bayesian analysis. This clustering algorithm does not require the number of mixture components to be specified in advance and estimates it in a principled manner. Our approach is Bayesian and relies on the estimation of the posterior distribution of clusterings using a Gibbs sampler. Through applications involving real-data classification and image database categorization using visual words, we show that clustering via infinite mixture models offers more powerful and robust performance than classic finite mixtures.
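
To illustrate why a Dirichlet process prior removes the need to fix the number of mixture components in advance, the following minimal sketch draws cluster labels from the Chinese restaurant process, a standard constructive view of that prior. It is not the Gibbs sampler described in the paper: it samples from the prior only and ignores the generalized Dirichlet likelihood. The function name `crp_assignments` and the concentration parameter value `alpha=1.0` are assumptions chosen for illustration.

```python
import random


def crp_assignments(n_points, alpha, seed=0):
    """Draw cluster labels for n_points items from a Chinese restaurant process.

    alpha is the (assumed) concentration parameter; it must be positive.
    Larger values make opening a new cluster more likely.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently assigned to cluster k
    labels = []
    for _ in range(n_points):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_points=200, alpha=1.0)
print(len(counts), "clusters; sizes:", counts)
```

The number of occupied clusters is an outcome of the draw rather than an input. In a full posterior sampler, each of these prior weights would additionally be multiplied by the likelihood of the observation under the corresponding component.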

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Bayes Theorem
  • Cluster Analysis*
  • Humans
  • Image Interpretation, Computer-Assisted
  • Information Storage and Retrieval*
  • Models, Statistical