Research

STYLOMETRY

Integration of natural language processing, machine learning, and literary criticism

Stylometry is the study of authorial or literary style using quantitative metrics. A wide range of stylistic features are amenable to measurement, from vocabulary and punctuation to syntax and rhythm. The use of quantitative evidence in literary study has long been a component of “philology,” a technical term for the study of language and literature, and as such has regularly featured in classical scholarship. More recent research has exploited the power of computation to perform calculations that are far more numerous and more advanced than those possible by hand. Stylometry thus forms one of the main research activities within computational text analysis today.

Stylometry has often been used to answer questions of disputed authorship, since the data can provide objective evidence to support or reject the attribution of a text to a particular author. QCL redeploys similar methods for the purpose of literary criticism. We create quantitative profiles with the primary aim of drawing inferences about complex, subjective questions, such as the literary significance of stylistic modulation within an individual work or across a broader tradition.

Central to QCL’s stylometric research is the use of sophisticated machine learning techniques to understand high-dimensional literary data and to profile large-scale trends in the evolution of literary style and influence. Our first paper on stylometry, published in PNAS in 2017, uses anomaly detection methods to characterize the citational practices of the Roman historian Livy, with the parallel goal of demonstrating the promise of machine learning-aided literary criticism to scholars outside the humanities. QCL has long-term interests in developing cross-cultural stylometric profiles of literature and in leveraging quantitative philological methods for natural language processing tasks across domains.
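To make the idea of a quantitative profile concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the PNAS paper or elsewhere in QCL's research. The feature set (five English function words, mean sentence length, commas per word), the function names `profile` and `flag_anomalies`, and the z-score threshold are all assumptions chosen for brevity. A simple z-score test stands in for the more sophisticated anomaly detection methods mentioned above.

```python
import re
import statistics

# Hypothetical feature set: a few English function words plus two surface metrics.
FUNCTION_WORDS = ("the", "and", "of", "to", "in")


def profile(text):
    """Return a dict of simple stylometric features for one passage."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty passages
    features = {f"freq_{w}": words.count(w) / n_words for w in FUNCTION_WORDS}
    features["mean_sentence_length"] = len(words) / max(len(sentences), 1)
    features["commas_per_word"] = text.count(",") / n_words
    return features


def flag_anomalies(passages, threshold=2.0):
    """Return indices of passages where any feature's |z-score| exceeds threshold.

    Assumption: a per-feature z-score is a crude stand-in for a real
    anomaly detection model.
    """
    profiles = [profile(p) for p in passages]
    flagged = set()
    for name in profiles[0]:
        values = [p[name] for p in profiles]
        mean = statistics.mean(values)
        spread = statistics.pstdev(values)
        if spread == 0:
            continue  # feature is constant across passages, so it carries no signal
        for i, value in enumerate(values):
            if abs(value - mean) / spread > threshold:
                flagged.add(i)
    return sorted(flagged)
```

Calling `flag_anomalies` on a list of passages returns the positions of those whose profile departs sharply from the rest. In a literary setting, such passages are candidates for closer reading rather than conclusions in themselves.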