Topic Modeling

Topic Modeling: An Application

Introduction My work involves the use and the development of topic modeling algorithms. A surprising challenge I have had is communicating the output of topic modeling algorithms to people not familiar with text analytics. Here is my 10 cents explanation of the LDA output to my econ friends. The use of text data for economic analysis is gaining attractions. One popular analytical tool is Latent Dirichlet Allocation (LDA), also called topic modeling (Blei, Ng, and Jordan 2003).

Introduction to LDA

Introduction An important development of text analytics is the invention of the Latent Dirichlet Allocation (LDA) algorithm (also called topic modeling) in 2003. LDA is non negative matrix factorization algorithm. A matrix factorization consists of decomposing a matrix into a product of two or more matrices. It turned out that these linear algebra techniques have applications for data analysis. These applications are generaly referred as data dimension reductions methods. Examples of matrix factorization methods in statistics include Factor Analysis, Principal Component Analysis, and Latent Dirichlet Allocation.