Weighted Latent Dirichlet allocation, an Improved Probabilistic Model for Large Corpus of Data

Authors: Bhat, M.R. and Wani, M.A.

Journal: 12th Indiacom; 5th International Conference on Computing for Sustainable Global Development, Indiacom 2018

Pages: 3827-3833

Abstract:

In text modelling, Latent Dirichlet Allocation (LDA) is a probabilistic model based on the Dirichlet distribution. It is a joint statistical model in which each observed word in a document is attributed to an unobserved (latent) topic, since the model posits that each document is a mixture of many topics. This assumption is modelled using the Dirichlet distribution, a continuous multivariate distribution over probability vectors, generally written Dir(α), where α is a single parameter or a vector of parameters. This study proposes a new distribution, the Weighted Dirichlet Distribution (WDD), as a generalization and optimization of the existing Dirichlet distribution, in which a weight is applied to the parameter α. The characteristic, statistical, and structural properties of the introduced distribution, especially its measures of central tendency and dispersion, are thoroughly investigated. Simulated data were generated from the WDD to compare various special cases of the proposed distribution in terms of their AIC, AICc, BIC, and p-values. The WDD outperformed the other distributions in modelling the data. As a generalization and optimization of the existing LDA model, a new probabilistic model named Weighted Latent Dirichlet Allocation (WLDA) is also proposed in this study. The generative process of WLDA is explained in detail, and the working of the model is illustrated.
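
For orientation, the generative process of standard LDA that the abstract refers to can be sketched as follows. This is a minimal illustration of plain LDA only, not of the proposed WLDA: the abstract does not specify how the weight is applied to α, so no weighting is shown. All function names, the vocabulary size, and the hyperparameter values are hypothetical choices made for this sketch.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dir(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, alpha, topic_word, rng):
    """Generate one document under the standard LDA assumptions.

    theta  : the document's topic mixture, drawn from Dir(alpha)
    topics : the latent topic assigned to each word position
    words  : the observed word index at each position
    """
    theta = sample_dirichlet(alpha, rng)
    topics, words = [], []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)          # latent topic for this word
        w = sample_categorical(topic_word[z], rng)  # observed word given the topic
        topics.append(z)
        words.append(w)
    return theta, topics, words


# Illustrative settings (assumptions): 3 topics, vocabulary of 8 word indices.
rng = random.Random(0)
alpha = [0.5, 0.5, 0.5]
topic_word = [sample_dirichlet([0.5] * 8, rng) for _ in alpha]
theta, topics, words = generate_document(20, alpha, topic_word, rng)
```

Here `theta` is the per-document topic mixture, and each entry of `words` is attributed to the latent topic at the same position in `topics`, which is the attribution of observed words to unobserved topics described in the abstract.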

Source: Scopus