This presentation discusses two related efforts to substantially advance research in the emerging area of digital intertextual studies: a novel approach to large-scale intertextual study and an extension of digital methods to take account of citations in secondary literature.
First, to be useful, “big data” must be accompanied by “big interpretation.” Too often, the data generated by new high-throughput methods overwhelm traditional analytical approaches. The Tesserae Project has pioneered the computational study of intertextuality by implementing search tools for specific intertexts (Coffee et al. 2012, Coffee et al. 2013). As more tools are developed to capture the full range of linguistic, thematic, and metrical allusions in classical literature (Thomas 1986, Hinds 1998), the capacity to generate intertextual hypotheses threatens to outpace any one critic’s ability to evaluate them.
The application of machine learning clustering algorithms to intertextuality can help address this problem. In this part of the presentation, we reflect on the value of unsupervised machine learning for large-scale, unbiased studies of literary intertextuality, using techniques borrowed from biology. In the 1990s, the development of microarray technologies enabled simultaneous measurement of the expression levels of thousands of genes (Chu et al. 1998, Brown and Botstein 1999), and the standard clustering algorithms k-means and hierarchical clustering are now routinely applied to genome-wide studies (Eisen et al. 1998, Brunet et al. 2004). Here we discuss the application to intertextual studies of non-negative matrix factorization (NNMF), a more robust alternative to these standard clustering algorithms, taking as a test case the influence of the Aeneid, the Metamorphoses, and Senecan tragedy on Silver Latin epic. This analysis appears to support a shift away from viewing Silver epic primarily through a Vergilian lens (Hardie 1993, Gervais 2013). A key advantage of clustering is that it supports both discovery and (initial) interpretation: beyond generating lists of potential intertexts, clustering can indicate groups of thematically associated parallels.
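As a minimal sketch of how NNMF can cluster intertextual data, the Python below factors a small non-negative matrix V ≈ WH and assigns each column to the component with the largest weight in H, following the assignment rule of Brunet et al. (2004). The matrix layout (rows as source-text features, columns as later epic passages), the toy counts, and the function names `nmf` and `cluster_columns` are hypothetical illustrations, not the project's data or code; the squared-error multiplicative updates of Lee and Seung are one standard way to compute NNMF and may differ from the variant used in the study.

```python
import random


def nmf(V, k, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix V as W (m x k) times H (k x n) using
    Lee and Seung's multiplicative updates for the squared-error objective."""
    m, n = len(V), len(V[0])
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        WtV = [[sum(W[i][a] * V[i][j] for i in range(m)) for j in range(n)] for a in range(k)]
        WtW = [[sum(W[i][a] * W[i][b] for i in range(m)) for b in range(k)] for a in range(k)]
        WtWH = [[sum(WtW[a][b] * H[b][j] for b in range(k)) for j in range(n)] for a in range(k)]
        H = [[H[a][j] * WtV[a][j] / (WtWH[a][j] + eps) for j in range(n)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        VHt = [[sum(V[i][j] * H[a][j] for j in range(n)) for a in range(k)] for i in range(m)]
        HHt = [[sum(H[a][j] * H[b][j] for j in range(n)) for b in range(k)] for a in range(k)]
        WHHt = [[sum(W[i][b] * HHt[b][a] for b in range(k)) for a in range(k)] for i in range(m)]
        W = [[W[i][a] * VHt[i][a] / (WHHt[i][a] + eps) for a in range(k)] for i in range(m)]
    return W, H


def cluster_columns(V, k, **kwargs):
    """Assign each column (sample) to the component with the largest weight in H."""
    _, H = nmf(V, k, **kwargs)
    return [max(range(k), key=lambda a: H[a][j]) for j in range(len(V[0]))]


# Hypothetical toy data: rows are source-text features (e.g. counts of detected
# parallels with two Vergilian and two Ovidian passage groups); columns are six
# later epic passages. Real input would come from an intertext search tool.
V = [
    [5, 4, 6, 0, 1, 0],
    [4, 5, 5, 1, 0, 0],
    [0, 1, 0, 6, 5, 4],
    [1, 0, 0, 5, 6, 5],
]
labels = cluster_columns(V, k=2)
print(labels)
```

With real data the number of components k would itself have to be chosen; Brunet et al. assess how stable the cluster assignments are across many random initializations to select it. The rows of W then suggest which source features characterize each cluster, which is where initial interpretation begins.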
In parallel with the development of large-scale analytical techniques are efforts to use digital intertextual approaches to understand the history of scholarly discussion. Canonical citations of primary sources in journal articles and other secondary sources can indicate intertextual parallels and, more generally, passages that the author believes to be related. By capturing canonical citations, we can track how texts and parts of texts have been studied over time, information essential for a data-driven study of intertextuality and text reception. Canonical citations can be envisioned as a network whose nodes are the citing texts (e.g. journal articles) and the cited passages of the primary sources, connected by edges (i.e. links) representing the actual citations. Citation networks among modern publications on classics and archaeology have been extensively studied (Brughmans 2012). The networks of citations between modern publications and ancient texts, however, remain largely unexplored, although they have great potential, particularly for the study of intertextuality and text reception.
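The network of canonical citations described here is bipartite: one set of nodes for citing texts, another for cited passages. The Python below is a minimal sketch under stated assumptions. It stores each citation as a (citing text, year, cited passage) record, builds adjacency maps in both directions, counts the citations a passage receives per year to track its study over time, and projects the network onto passages by co-citation, that is, how many secondary sources cite two passages together. The `Citation` record, the article identifiers, and the toy citations are hypothetical; the abbreviated passage references serve only as node labels.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Citation:
    citing: str  # hypothetical identifier of the citing secondary source
    year: int    # publication year of the citing source
    cited: str   # canonical reference to a primary-source passage


def build_network(citations):
    """Return the bipartite network as two adjacency maps:
    citing text -> cited passages, and cited passage -> citing texts."""
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for c in citations:
        out_edges[c.citing].add(c.cited)
        in_edges[c.cited].add(c.citing)
    return out_edges, in_edges


def citations_by_year(citations, passage):
    """Count the citations a passage receives in each year (repeats included)."""
    return Counter(c.year for c in citations if c.cited == passage)


def co_citation(out_edges):
    """Project onto passages: weight = number of citing texts citing both."""
    weights = Counter()
    for passages in out_edges.values():
        for a, b in combinations(sorted(passages), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical toy citations, for illustration only.
citations = [
    Citation("article-A", 1993, "Verg. Aen. 12.945"),
    Citation("article-A", 1993, "Stat. Theb. 12.816"),
    Citation("article-B", 2013, "Verg. Aen. 12.945"),
    Citation("article-B", 2013, "Stat. Theb. 12.816"),
    Citation("article-B", 2013, "Ov. Met. 1.1"),
]
out_edges, in_edges = build_network(citations)
print(citations_by_year(citations, "Verg. Aen. 12.945"))
print(co_citation(out_edges).most_common(1))
```

Keeping the year on each edge, rather than only on the citing node, is one simple design choice that lets time slices of the network be extracted directly; a dedicated graph library could replace the plain dictionaries for larger corpora.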
We address here the technical challenges that must be solved to automatically capture canonical citations and their meaning (Romanello, Boschetti, and Crane 2009; Romanello 2013) and so create this key service in the emerging cyberinfrastructure for classics (Crane, Seales, and Terras 2009; Smith 2010), complementing recently developed tools, such as Tesserae (Coffee et al. 2013), that allow scholars to discover new possible intertextual parallels. At the same time, visually exploring a vast citation network poses challenges for data visualization. We conclude with examples illustrating the proposed approach to visualization, drawn from two different but essential corpora: the journal articles contained in JSTOR and the analytical abstracts of L’Année Philologique.
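Automatic capture of citations begins with recognizing canonical references in running text. The sketch below is deliberately naive and is not the method of the works cited above. Using Python's standard `re` module, it matches a hypothetical, hand-made lexicon of author and work abbreviations followed by a numeric reference such as `12.816-817`. Even this toy version exposes the challenges mentioned: abbreviations vary between publications, references without an explicit work (e.g. a bare `4.1` after an earlier mention) are missed, and a practical lexicon would need to be far larger.

```python
import re

# Hypothetical abbreviation lexicon mapping surface forms to work names;
# a real system would need a much larger, curated list.
WORKS = {
    "Verg. Aen.": "Vergil, Aeneid",
    "Ov. Met.": "Ovid, Metamorphoses",
    "Stat. Theb.": "Statius, Thebaid",
}

# Try longer abbreviations first so overlapping forms prefer the longer match,
# then capture a reference like 1.1, 12.816 or a range such as 12.945–952.
PATTERN = re.compile(
    r"(?P<work>" + "|".join(re.escape(w) for w in sorted(WORKS, key=len, reverse=True)) + r")"
    r"\s+(?P<ref>\d+(?:\.\d+)*(?:[-–]\d+)?)"
)


def extract_citations(text):
    """Return (work, reference) pairs for every canonical citation matched in text."""
    return [(WORKS[m.group("work")], m.group("ref")) for m in PATTERN.finditer(text)]


sample = "Compare Verg. Aen. 12.945–952 with Stat. Theb. 12.816-817 and Ov. Met. 1.1."
print(extract_citations(sample))
```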