Task-Driven Comparison of Topic Models

IEEE Trans Vis Comput Graph. 2016 Jan;22(1):320-9. doi: 10.1109/TVCG.2015.2467618. Epub 2015 Aug 13.

Abstract

Topic modeling, a method of statistically extracting thematic content from a large collection of texts, is used for a wide variety of tasks within text analysis. Though there are a growing number of tools and techniques for exploring single models, comparisons between models are generally reduced to a small set of numerical metrics. These metrics may or may not reflect a model's performance on the analyst's intended task, and can therefore be insufficient to diagnose what causes differences between models. In this paper, we explore task-centric topic model comparison, considering how we can both provide detail for a more nuanced understanding of differences and address the wealth of tasks for which topic models are used. We derive comparison tasks from single-model uses of topic models, which predominantly fall into the categories of understanding topics, understanding similarity, and understanding change. Finally, we provide several visualization techniques that facilitate these tasks, including buddy plots, which combine color and position encodings to allow analysts to readily view changes in document similarity.
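To make the buddy plot idea concrete, here is a minimal illustrative sketch, not the paper's implementation: it assumes two hypothetical document-topic matrices (theta_a, theta_b) and a chosen reference document, encodes each document's similarity to that reference under model A as horizontal position and its similarity under model B as color, so shifts in document similarity between the two models are visible at a glance.

```python
import numpy as np
import matplotlib.pyplot as plt


def cosine_to_reference(theta, ref):
    """Cosine similarity of every document's topic vector to the reference document's."""
    ref_vec = theta[ref]
    norms = np.linalg.norm(theta, axis=1) * np.linalg.norm(ref_vec)
    return theta @ ref_vec / np.clip(norms, 1e-12, None)


def buddy_plot_sketch(theta_a, theta_b, ref=0):
    """Hedged sketch of a buddy-plot-style view: position from model A, color from model B."""
    sim_a = cosine_to_reference(theta_a, ref)  # position encoding (model A)
    sim_b = cosine_to_reference(theta_b, ref)  # color encoding (model B)
    fig, ax = plt.subplots(figsize=(8, 2))
    points = ax.scatter(sim_a, np.zeros_like(sim_a), c=sim_b,
                        cmap="viridis", vmin=0.0, vmax=1.0, s=40)
    ax.set_yticks([])
    ax.set_xlabel("similarity to reference document (model A)")
    fig.colorbar(points, ax=ax, label="similarity to reference document (model B)")
    return fig


if __name__ == "__main__":
    # Toy document-topic weights; real weights would come from two fitted topic models.
    rng = np.random.default_rng(0)
    theta_a = rng.dirichlet(np.ones(10), size=50)
    theta_b = rng.dirichlet(np.ones(10), size=50)
    buddy_plot_sketch(theta_a, theta_b, ref=0)
    plt.show()
```

Documents that keep the same color ordering as their position ordering are ranked similarly by both models; documents whose color clashes with their position mark where the two models disagree about similarity to the reference.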

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.