Jeremiah Mercurio is the Head Librarian for Humanities and History at Columbia, and our Subject Librarian for Linguistics. He spoke with us about his journey with research, a librarian’s role within the university, and how Columbia Libraries supports linguistics students. Read on to learn about how an information scientist manages his own research, and for a glimpse of the many resources available to Columbia linguistics students!
Can you tell us a little about yourself?
I’m a research librarian and literary historian who has served since 2018 as the Head of Humanities and History in the Columbia Libraries. Before that, I worked as a research librarian and visiting professor at Haverford College and Fairfield University. I grew up in West Virginia but have lived in many places along the East Coast, including New York, Boston, Philadelphia, and Southwest Virginia. For several years while completing my Ph.D., I also lived in St Andrews, Scotland, which was a lovely experience and certainly a linguistically interesting one! Currently, I live on the Upper West Side with my family and two cats.
What does it mean to be Columbia’s subject librarian for linguistics, and how did you find your way to this role?
As the subject librarian for linguistics, I’m responsible for providing specialized research support and training for students, acquiring and stewarding collections (both print and electronic), and serving as a liaison between the Linguistics program and the Libraries. While I serve as the Libraries’ primary point of contact for linguistics majors, students also benefit from the support and expertise of my colleagues across the Libraries. These include colleagues who support research in global languages and cultures, such as our Global Studies team and the C.V. Starr East Asian Library, as well as our Research Data Services team and other subject librarians in the sciences and social sciences.
Columbia hired me in 2018 as the Libraries’ Head of Humanities and History. In that position, I lead a team of subject specialists and other public-service librarians who assist both undergraduate and graduate students. When I arrived at Columbia, the linguistics role (a rather small duty before the program was reestablished) was being covered by a former Humanities and History librarian who had recently moved into a different position in our Rare Book & Manuscript Library. As linguistics became an even smaller part of that colleague’s work, it made sense to return the responsibility to the Humanities and History team. I had previously served as a subject librarian for linguistics at Haverford College, so it also seemed natural for me to take on that responsibility directly.
The approval of the linguistics major in 2019 created an imperative for the Libraries to take a much more active role in supporting research in linguistics and ensuring that our collections were tailored to the needs of current students and faculty. In conversation with Prof. Landman and others, and drawing on my own experience as a subject librarian for linguistics, I began to use the still-small collections budget to fill gaps in our holdings and to target a broader array of format types. I also began to meet with an increasing number of students for research consultations and attempted to enhance the discovery of our linguistics resources by creating a guide for researchers.
My own training is in literary history and not linguistics, although my dissertation explored the influence on literary Decadence of the scientific turn in linguistics during the 19th century (through the work of the German Neogrammarians, for example). However, my real exposure to linguistics came through my work at Haverford with Prof. Brook Lillehaugen of the Tri-College (Haverford, Swarthmore, and Bryn Mawr) Department of Linguistics. Her work on Zapotec languages and her collaborative approach to teaching and project development—for example, the digital text explorer, Ticha, for Colonial Zapotec that she created with the Haverford Libraries—really demonstrated not only how exciting and interesting the work could be for students and scholars, but also how important and useful it could be for the speech communities themselves.
You work with undergraduates on all types of research projects. What kinds of resources can all students take advantage of through Columbia Libraries? Are there any underrated/hidden gem resources you’d like to highlight?
The Libraries’ collections are enormous and spread across various digital and physical repositories, so our subject experts are some of the best starting points for working with the collections. In addition to the books, journals, and databases that you’d expect to find in the Libraries, we have rich collections of audio-visual material, rare books and manuscripts, specialized software (such as NVivo for coding and analyzing data), art objects, organizational archives, and many other types of primary and secondary research materials. We also regularly offer drop-in workshops and other trainings on the use of specialized tools and databases that are available to all students.
Some underrated (or at least less obvious) benefits for all students include the Libraries’ digital subscriptions to major newspapers and magazines like The New York Times, Financial Times, The Atlantic, and The Wall Street Journal. These subscriptions provide individual access to the publications’ websites and mobile apps. We also have access to several collections of streaming feature films and documentaries, not all of which are discoverable through our catalog.
For linguistics students, one potentially useful but hidden resource is the Libraries’ license for ProQuest TDM Studio, a web-based portal that allows users to create, analyze, and visualize corpora of selected full-text documents from the many ProQuest databases to which we subscribe. Access to ProQuest TDM Studio is separate from our access to the ProQuest databases themselves and requires consultation with our Research Data Services team.
Yet another gem for linguistics students is The Language and Culture Archive of Ashkenazic Jewry (LCAAJ), a project started in 1959 by Professor Uriel Weinreich, then Chairman of Columbia University’s Department of Linguistics. The collection includes nearly 6,000 hours of taped field interviews with Yiddish speakers and approximately 100,000 pages of accompanying field notes.
Linguistics seniors are currently staring down the barrel of their senior thesis paper. What advice can you give to students as we begin such a large research project, and how can we use the Libraries’ materials to make our research more efficient, accurate, and thorough?
Embarking on a large research project is exciting but daunting. Working with your advisors to refine your research questions, identify appropriate methodologies, and get recommended readings is a great place to begin. However, it’s also good to remember how circuitous the research process is and how much it involves false starts, dead ends, and a great deal of self-education. The messiness of the process is also what makes it so generative, helping you to identify gaps in the published literature, opportunities for further research, and insights that might not otherwise have been obvious. It’s also important to remember that a senior thesis paper is not a dissertation and that an excellent thesis doesn’t necessarily have to exhaust its subject or stretch beyond the scope of the assignment. The work you do for the project could feed into graduate study or other kinds of work or community engagement down the road.
Having said all that, I would recommend some resources and tools that can make your work more efficient and thorough. One excellent resource for starting your research is the Oxford Bibliographies in Linguistics. These are curated bibliographies on a range of linguistics topics that outline introductory, foundational, notable, and recent works; for instance, you can find bibliographies on causatives, Caddoan languages, and Noam Chomsky—to name just three entries from the Cs. These bibliographies can help you locate scholarship quickly and give you a sense of its relative importance.
Another tool that I always recommend is Zotero, a free research management program that allows you to collect, organize, store, cite, annotate, and share your research. Adopting Zotero early in the research process keeps your sources organized from the outset and makes it simple to cite them once you’re in the writing phase(s) of your project. The Libraries regularly offer workshops and individualized support for using Zotero.
How do Columbia Libraries build linguistics collections? How do you find materials, add them to the collection, and catalogue them? Is there anything unique about Columbia’s process, compared to other institutions you’ve been at?
The Libraries’ collection strategies depend greatly on format. For books and ebooks, libraries increasingly rely on vendor approval plans and publisher frontlists. Approval plans allow us to set parameters based on publisher, subject, audience, cost, and so on, which the vendors use to automatically send us new titles that adhere to those criteria. For some well-reputed publishers, we also buy packages of all or most of the titles they publish each year in the subjects for which we collect. We of course use traditional strategies as well—such as book reviews in linguistics journals and recommendations from faculty and students—but the great advantage of the approval plans and frontlists is that they allow the librarians to dig deeper in the small-press catalogs and organizational publications to expand the range of our holdings and incorporate a wider set of voices. A good example of this kind of work is the Lakota Language Consortium’s recent edition of its bilingual Lakota and English dictionary.
Acquiring language data is much more challenging. There are some commercially produced or distributed datasets that the Libraries can and do acquire, but even those data require different strategies to identify, purchase, store, and preserve them and to make them discoverable. These datasets also often come with restrictive licenses that require us to provide more mediated access than for traditional research materials. Because collecting and maintaining linguistics data is challenging for individual libraries, disciplinary repositories like the Linguistic Data Consortium, of which Columbia is a member, play a very important role in making such material available to researchers.
Of course, many library collections that were not acquired for linguistics could have value for linguistics researchers. The Libraries’ oral history archives are one example. Another is the ProQuest TDM Studio that I previously mentioned.
Any new acquisitions you’d like to highlight?
We recently acquired a large set of full-text corpus data in English, Spanish, and Portuguese. Often called the Davies corpora after the retired linguistics professor, Mark Davies, who compiled and maintains them, they represent data from many web and media sources and include the Corpus of Contemporary American English (COCA) and Corpus of Historical American English (COHA).
Can undergraduates request acquisitions, and if so, how?
Absolutely. If the item isn’t in CLIO, students can use the “Recommend a Title for Purchase” form or, more simply, email me with suggestions.
Linguistics is unique among the humanities because it’s a relatively new field, and the key materials are not always text-based (i.e. much of linguistics is done via recording). How does this affect the types of materials you acquire, and how you catalogue them?
Columbia’s collecting for linguistics typically centers on text-based, peer-reviewed scholarship because of the small size of the budget and my role in acquiring general circulating—as opposed to unique or rare—materials. However, we also regularly acquire commercially produced video and audio content, datasets, and other non-textual material.
The barriers to collecting generally relate not to format but to the type of publisher. Columbia’s procurement policies require vendors to go through an approval process that protects the university but can be challenging for small organizations and individuals from whom we might want to purchase content. Those same vendors often can’t host the material on their own websites, requiring the Libraries to develop an infrastructure for hosting and stewarding content that increases our acquisition and preservation costs. Traditional publishers also supply metadata. For material that hasn’t already been collected and described by another institution, we have to catalog it locally and might not have in-house expertise to do so, depending on the language, script, or subject knowledge.
We face another set of barriers for collecting archival material. We don’t have an active curatorial program for linguistics at Columbia; however, the archives do contain rich sets of historical material valuable to linguists. This material, such as the Franz Boas Chinook language materials in our Rare Book & Manuscript Library, was collected in ways that don’t necessarily adhere to today’s best practices. Its use requires researchers to understand and interrogate its perspective, limitations, biases, potential distortions, and omissions.
We’d love to hear about your research interests! What kinds of projects do you work on, and how do you organize your own research?
Most of my own research centers on British literature of the long 19th century with a particular emphasis on the Victorian fin de siècle. I’m especially interested in the relationship between word and image, including what my frequent collaborator and I term “literary doodling,” which is the doodling by authors in their literary manuscripts, notebooks, and personal libraries and the role it plays in the creation and reception of literature. We published a book on that topic, The Form and Theory of Literary Doodling (Cambridge University Press), in February and are currently co-editing a series on doodles and marginalia for the same press.
Because my work focuses so much on visual culture, I’m frequently dealing with a large number of digital images. In addition to reproductions of illustrations, doodles, and other drawings, I collect numerous photographs of manuscripts and other archival material from my visits to literary archives. I’ve started to use Tropy to help manage these files. Tropy is an open-source tool developed by the same research center at George Mason University that created Zotero. It helps researchers to organize, tag, and annotate their research photographs. I’m hoping to offer a few workshops on it for the Columbia community this year.
I also use Zotero for managing citations and note-taking. Sometimes my co-author and I use OneNote for collaboratively editing manuscripts. Otherwise, my methods are fairly old-fashioned: taking notes in Word or Google Docs, naming files consistently, and storing them in predictable places on cloud storage.
Given that you have experience both using and curating research materials, can you share some “insider information” related to note-taking, finding/using secondary sources, keeping track of large amounts of data, or other common struggles for researchers?
In my own work, I’ve found that there isn’t any magic formula for note-taking and organizing sources other than to adopt tools and methods that you can use consistently. Even the most advanced annotation tools aren’t valuable unless you keep using them. Of course, it’s important to think about your needs before you embark on a project: What kinds of data and metadata do you need? What format types will you be working with? And what are your long-term plans for the recordings, data, etc.? If you’re making your data available alongside your more formal publications, it’s worth thinking about long-term stewardship, including file types, permissions, privacy, and storage. But again, the best approach is always the one you’ll stick with, so long as the tools and methods you use provide the type of information you’ll need in the short and long term. For those getting ready to conduct fieldwork, the Libraries provide access to a number of guides that offer more detailed and specific recommendations for tools and methods, including Linguistic Fieldwork: A Practical Guide and Field Linguistics: A Beginner’s Guide.
In terms of finding sources, identifying the main journals in your area(s) of research is a good way to stay on top of the current literature. Similarly, tracing citations backward (through bibliographies) and forward (via tools such as Google Scholar) in time helps you to identify scholarly discussions outside the scope of individual journals. Resources such as the Oxford Bibliographies that I mentioned earlier and Annual Reviews are also useful ways to get your head around a field or subfield, especially if you’re new to the research in that area.
Beyond those sources, understanding how different types of sources are cataloged and organized can help you to be a better researcher. For example, you can eliminate some of the messiness of keyword searching by using Library of Congress subject headings. For research on individual languages, this is an especially useful strategy because there will be a main subject heading for the language as well as subheadings that group resources based on their focus, regardless of the particular keywords they use. For example, you can easily find sources on Hausa language > History or Hausa language > Phonology by finding those headings and using them in CLIO or another catalog such as WorldCat. If you are using keywords (and it’s a good idea to vary your strategies), a tool like Ethnologue can help you to identify variant spellings for language names that you can add to your search. Archival material and datasets often use different controlled vocabularies in separate repositories, so understanding the conventions used to organize and describe each material type will further help with your searching. For instance, using the Libraries’ archives portal to find special collections and knowing how to read a finding aid (for which the Libraries offer some drop-in workshops) will greatly improve your archival research.
Finally, I would simply emphasize how fun and rewarding research can be, despite and sometimes because of its messiness. Having the opportunity to research a topic at length is valuable not only for its research outputs but also for the experience, which is sometimes a hard lesson to remember when a deadline is bearing down on you. Don’t forget your human guides: your professors, librarians, and peers. Good luck!