Recent Posts

Introduction: In this post, I will scrape the 2018 State of the State Addresses (SoSAs) and convert the speeches into a dataframe of word counts, with the rows representing the speeches and the columns representing the words. This type of dataframe is known as a document-term matrix (DTM). I will also perform some exploratory analysis of the constructed dataset. Every year, at the beginning of the year, most U.S. governors present their visions for their states in their SoSAs.
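
As a rough illustration of the document-term matrix described above, here is a minimal Python sketch using only the standard library. The two "speeches", the state names, and the helper names `tokenize` and `document_term_matrix` are made up for this example; they are not taken from the post, which may build its matrix with different tools and preprocessing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def document_term_matrix(docs):
    """Build a document-term matrix from a {name: text} mapping.

    Returns the sorted vocabulary and a list of rows, one row per
    document (in the mapping's order), one column per vocabulary term.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    matrix = [[counts[name][term] for term in vocab] for name in docs]
    return vocab, matrix


# Hypothetical toy "speeches", invented purely for illustration.
docs = {
    "state_a": "Jobs and schools. Jobs first.",
    "state_b": "Roads and schools.",
}

vocab, matrix = document_term_matrix(docs)
print(vocab)
for name, row in zip(docs, matrix):
    print(name, row)
```

Each row is one speech and each column is one word, so the entry in row "state_a" and column "jobs" is the number of times "jobs" appears in that speech.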

Introduction: Whenever I give a talk on topic modeling to people not familiar with the subject, the usual question I receive is: “Can you provide some intuition behind topic modeling?” Another variant of the same question is: “This is magic. How can the computer identify the topics in the documents?” No! It is not magic. It is math. I presented the math behind Latent Dirichlet Allocation, and an example application, in previous posts.

Introduction: My work involves the use and development of topic modeling algorithms. A surprising challenge I have had is communicating the output of topic modeling algorithms to people not familiar with text analytics. Here is my 10-cent explanation of the LDA output for my econ friends. The use of text data for economic analysis is gaining traction. One popular analytical tool is Latent Dirichlet Allocation (LDA), also called topic modeling (Blei, Ng, and Jordan 2003).

Introduction: An important development in text analytics was the invention of the Latent Dirichlet Allocation (LDA) algorithm (also called topic modeling) in 2003. LDA is a non-negative matrix factorization algorithm. A matrix factorization consists of decomposing a matrix into a product of two or more matrices. It turns out that these linear algebra techniques have applications in data analysis. These applications are generally referred to as data dimension reduction methods. Examples of matrix factorization methods in statistics include Factor Analysis, Principal Component Analysis, and Latent Dirichlet Allocation.
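
To make the idea of decomposing a matrix into a product of matrices concrete, here is a minimal pure-Python sketch of plain non-negative matrix factorization using the standard multiplicative update rules. It is not LDA and not the code from the post: the toy matrix `v`, the rank of 2, and the function names are assumptions chosen for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction error V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (n x m) by W (n x rank) times H (rank x m), all >= 0."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Start from strictly positive random factors.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical 4 x 4 count matrix (rows: documents, columns: words).
v = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 4]]

w, h = nmf(v, rank=2)
print("reconstruction error:", frobenius_error(v, w, h))
```

In a text setting, the rows of `w` can be read as how strongly each document loads on each of the two latent components, and the rows of `h` as how strongly each component loads on each word, which is the dimension-reduction reading mentioned above.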

Working Papers

This paper provides empirical evidence for the role of political leadership in economic performance. Although “good political leadership” is often cited as important, empirical tests are suspect because good leadership is typically an ex-post conclusion. We posit that good leaders are committed to economic performance and that this commitment will be evident in their public announcements. Taking advantage of recent developments in machine learning, we apply Latent Dirichlet Allocation (LDA) to quantify the priorities expressed in U.S. governors’ State of the State Addresses. We validate the approach by showing that these thematic contents mirror objective measures of actual future state budgets. More importantly, we find strong evidence that consistency in priorities predicts measures of economic performance. The approach developed and expounded upon in this paper shows that a leader’s commitment to economic performance can be measured objectively and that this commitment has real and measurable consequences. (JEL C38, O21, P16, R50, H52)
Keywords: Political leadership; Development; Economic Performance; Topic modeling; Canonical Correlation Analysis; State of the State Addresses.
2018.

Topic Modeling (TM) is a text data dimension reduction algorithm, akin to factor analysis (FA) or principal component analysis (PCA), widely used for text data analysis (classification, clustering, etc.). Modern TM algorithms such as Latent Dirichlet Allocation (LDA) are probabilistic and complex, which impedes their intuitive understanding. However, relating them to Non-Negative Matrix Factorization (NMF) and PCA mitigates this impediment. Indeed, besides being analogous to NMF, LDA also emerges from PCA, and both NMF and PCA are intuitively easy to understand. Therefore, presenting LDA as emerging from NMF and/or PCA provides an intuitive grounding for modern TM algorithms.
2018.

Attracting and expanding businesses in the state often appear prominently in the speeches of political leaders. Do they merely talk the talk, or do they actually follow through? Do their commitments to promote businesses matter? The talk about promoting businesses suggests that political leaders play a role in the expansion of business establishments. Though that may be true, the economics literature is silent on the ability of political leaders to alter the dynamics of business establishments. Applying machine learning algorithms to U.S. governors’ State of the State Addresses from 2001 to 2013, this paper captures the level of the governors’ professed business-friendly agendas, and then studies the relationship between a governor’s long-term commitment to his/her business agenda and business dynamics in his/her state. The paper shows that a governor’s commitment to expanding business establishments in his/her state is positively associated with the growth rate of business establishments in that state. (JEL O10, R50, C38)
2017.

In recent years, a growing number of economists have come to recognize the importance of political leadership in promoting economic performance. However, without an agreed-upon measure of leadership, formally demonstrating and testing this relationship remains elusive. This paper proposes identifying economic leadership by measuring the consistency with which leaders talk about economic issues. We employ a text analytics approach, topic modeling, to study leaders’ discourses, and measure the relationship between these discourses and economic growth. Specifically, using the Latent Dirichlet Allocation (LDA) algorithm, we identify the topical content of U.S. governors’ State of the State speeches from 2001 to 2013, construct a consistency measure over these topics, and study the relationship between the consistency of this topical content and the states’ real GDP growth. We find that the consistency with which governors address economic issues is strongly associated with economic growth. (JEL C40, H70, O40)
2016.

Recent & Upcoming Talks

Projects

Coming Soon

I will present projects I have worked on or am currently working on.

Teaching

I have taught introductory statistics for business students (several times) and a math boot camp for beginning econ PhD students, and I am currently teaching a graduate-level multivariate statistics course:

  • Econ-215: Statistics
  • Stat-873: Applied Multivariate Statistical Analysis
  • Econ-815: Analytical Methods in Economics and Business