This paper takes a system that has been designed, and is still in development, for aggregating metadata about collections of documentary literary sound recordings as an object for theoretical and practical discussion of how information about diverse collections of time-based media should be managed, and of what such schema and system development mean for our engagement with the contents of these collections as artifacts of humanist inquiry. Swallow (Swallow Metadata Management System 2019), the interoperable spoken-audio metadata ingest system that serves as the boundary object for this talk, emerged from the goals of the SpokenWeb SSHRC Partnership Grant research network: to digitize, process, describe, and aggregate the metadata of a diverse range of sound collections documenting literary and cultural activity in Canada since the 1950s. Our talk, collaboratively written and delivered by a literary scholar and critical theorist, a digital projects and systems development librarian, and a library developer/programmer, outlines 1) a theoretical rationale for the audiotext as a significant form of data in the humanities, 2) the modes of description consequently deemed necessary to make such data useful for humanities scholars, and 3) a rationale for developing a specific form of database system, given the material and systems contexts that currently inform our national holdings of documentary literary sound recordings.