This is a demo of 'LDAvis', our interactive visualization tool for topic models fit using LDA. Below is a visualization of a 40-topic model fit to the AP data (2246 Associated Press documents made available by David Blei on his website).
For an explanation of our tool, see our paper, LDAvis: A method for visualizing and interpreting topics, to be presented at the 2014 ACL Workshop on Interactive Language Learning, Visualization, and Interfaces in Baltimore on June 27, 2014. Or check out our repo on GitHub.
To use the visualization tool, click a circle in the left panel to select a topic; the bar chart in the right panel will then display the 30 most relevant terms for the selected topic. We define the relevance of a term to a topic, given a weight parameter 0 ≤ λ ≤ 1, as λ log(p(term | topic)) + (1 - λ) log(p(term | topic)/p(term)). The red bars represent the frequency of a term in a given topic (proportional to p(term | topic)), and the blue bars represent a term's frequency across the entire corpus (proportional to p(term)). Change the value of λ to adjust the term rankings: small values of λ (near 0) highlight potentially rare but exclusive terms for the selected topic, and large values of λ (near 1) highlight frequent, but not necessarily exclusive, terms for the selected topic. A user study described in our paper suggested that setting λ near 0.6 aids users in topic interpretation, although we expect this to vary across topics and data sets (hence our tool, which allows you to flexibly adjust λ).
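To make the relevance definition concrete, here is a minimal Python sketch of the formula above. It is only an illustration, not the code behind the tool: the function names `relevance` and `top_terms` and the toy probabilities are hypothetical, chosen so that the effect of λ on the ranking is easy to see.

```python
import math


def relevance(p_term_given_topic, p_term, lam):
    """Relevance of a term to a topic for a weight 0 <= lam <= 1.

    Computes lam * log(p(term | topic))
             + (1 - lam) * log(p(term | topic) / p(term)).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    log_prob = math.log(p_term_given_topic)
    log_lift = math.log(p_term_given_topic / p_term)
    return lam * log_prob + (1.0 - lam) * log_lift


def top_terms(topic_probs, corpus_probs, lam, n=30):
    """Return up to n terms, ordered from most to least relevant."""
    scores = {
        term: relevance(p, corpus_probs[term], lam)
        for term, p in topic_probs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy values (not taken from the AP model):
# p(term | topic) for one topic, and p(term) across the corpus.
topic_probs = {"said": 0.05, "soviet": 0.02, "gorbachev": 0.01}
corpus_probs = {"said": 0.04, "soviet": 0.002, "gorbachev": 0.0005}

print(top_terms(topic_probs, corpus_probs, lam=1.0))  # ['said', 'soviet', 'gorbachev']
print(top_terms(topic_probs, corpus_probs, lam=0.0))  # ['gorbachev', 'soviet', 'said']
```

With λ = 1 the ranking follows p(term | topic) alone, so the frequent but unexclusive term comes first; with λ = 0 it follows the ratio p(term | topic)/p(term), so the rare but exclusive term comes first.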
Carson Sievert and Kenny Shirley
May 28, 2014
[Updated: March 17, 2015]
PS. In case you were wondering, yes, these news articles are mostly from the late 1980s and early 1990s.