Linguistics alumna Eleanor Lin (CC ’24) is now a PhD student in Computer Science at UMich, making waves in natural language processing (NLP) research. She’s especially interested in the intersection of NLP and multilingualism. Eleanor generously shared insights on her path to graduate school, her senior thesis project and current research interests, and her inside advice for undergraduates interested in NLP research and careers.
Can you tell us a little about yourself?
I am a second-year PhD student in computer science and engineering at the University of Michigan. I double-majored in computer science and linguistics at Columbia. I currently research NLP. NLP builds systems that can understand and produce language, and also applies those systems to study language and communication.
Many of us stumble into linguistics, discovering it somewhat fortuitously. What is your linguist origin story?
I come from a multilingual family. Our heritage languages are Taiwanese Hakka and Taiwanese Southern Min, but we also speak Mandarin Chinese and General American English by necessity. In grade school, I also studied Spanish, Latin, and German (in that order). I enjoyed and excelled in my language classes, so my parents encouraged me to continue following my interests. I had only a vague understanding of what exactly linguists did when I began the linguistics major, but I had enough early exposure to different languages to understand that, as systems of sound, meaning, and social interaction, languages are really amazing. I just did not have the proper vocabulary to articulate this at the time.
You’ve been a very active NLP researcher, both at Columbia and now at UMich. How did you get started with NLP research?
My first undergraduate research position was with then-PhD student Katherine Moore, in the Cognitive Science in Education program at Teachers College. I found this position through a listing on the Undergraduate Research and Fellowships webpage. This was during the spring of my first year of undergraduate studies. At that point, my technical skills were not very advanced yet, so the most I could help with was manual data annotation. However, I learned a lot from working on this project about just how impactful NLP is across disciplines. In this specific project, Dr. Moore was using NLP to study the psychology of collaborative learning, which has implications for how teachers design classroom group-work activities. Interdisciplinary research has continued to be a key part of my intellectual identity since then, and is one of the things I enjoy most about my research area.
What role did working at Dr. Hirschberg’s lab play in this?
Dr. Julia Hirschberg’s Spoken Language Processing Group (also known as the Speech Lab) was a great place to learn about all aspects of the research process. Research is a very complex endeavor. Besides the things we typically think about and see, like running experiments, there are lots of other crucial things that need to go right, e.g., teamwork, time management, and paper writing. I was very fortunate to be mentored by Dr. Hirschberg and several of her PhD students (Run Chen, Ziwei Gong, and Debasmita Bhattacharya), who helped me learn these things.
What is the focus of your PhD dissertation?
I have not chosen my PhD dissertation topic yet, but I am particularly interested in both applying NLP to understand how multilingual people communicate and building technologies that better serve multilinguals.
More generally, the norm for computer science doctoral programs differs from some social science and humanities fields. Instead of writing one giant dissertation, we usually write several papers (on average, one per year in my advisor’s lab), ideally all following some common theme. Then, these papers help form the chapters of our dissertation.
For example, I recently wrote a paper looking at language mixing in LLMs. There are patterns to this mixing, and I wanted to see if those patterns could indicate something useful about how the models are trying to solve problems. A few days ago, I actually presented that workshop paper at the NeurIPS conference, which is an AI conference in San Diego!
NLP requires a strong background in computer science and linguistics. Can you tell us about your experiences learning to work in two disciplines? How much overlap do you see between them, and what is very distinct/unique about linguistics?
I actually had the privilege of asking the illustrious Dan Jurafsky some version of this question once, and I agree with the answer he gave, which I will restate in my own words here. Linguistics (and the social sciences and humanities more broadly) helps us design better and more interesting research questions to pursue. Computer science gives us the tools to answer those questions. For example, traditional corpus linguistics might require manually annotating thousands of examples of human language, a painstaking and time-consuming endeavor. Now with advances in large language models (LLMs), we can accelerate this annotation process. But LLMs cannot tell us which questions to ask about the data (at least, not yet).
Pursuing the double major was quite intense. I took almost no courses that were not for the linguistics major, the computer science major, or Core Curriculum during my time at Columbia, and I still had at least 16 or 17 credits of coursework every semester. However, from where I stand now, I am quite happy that I decided to double major, as I believe that having this dual background is a unique strength. Many NLP researchers today lack linguistics knowledge, and I think this is something that needs to change about our field. Specifically, I think linguistics training fills a gap in computer science training, since it teaches you to design and execute experiments. I try to align the technologies we develop more with the humanistic values that I see reflected in what linguists advocate for.
Most of your interviewers (3/4) are currently working on their senior thesis projects. We recall your project mixing computer science and linguistics to build a translator for Taiwanese. Super cool! Can you tell us more about your senior thesis at Columbia?
For my senior thesis, I took up the challenge of building a system to translate spoken Taiwanese Southern Min into spoken English, automatically. While this has been done previously (by Meta), the twist that I added was that I wanted to use only free and publicly available resources. This was because in practice, most language communities don’t have financial, computational, or technical resources comparable to Meta’s to build language technologies for the languages they speak. So I wanted to see if this was practicable for one of my community’s languages.
In my thesis I argued that this model worked in some aspects, although not all. I think the approaches I used are scalable, but I would say that I didn’t have nearly enough data to make a useful translation system for my language community.
How did your senior thesis project shape your thinking about your future research, especially as you were looking ahead to graduate school? How has your senior thesis research been relevant to your PhD research?
For my senior thesis, I had to review a lot of literature and apply methods for machine translation and speech translation. This definitely continues to be applicable to my PhD research, as I am still working on multilingual NLP and speech processing. More broadly, conducting that project (as well as my current research) has convinced me that multilingual NLP is a deeply worthwhile, though currently undervalued, research area. As linguists, most of us probably agree that the right to freely speak one’s mother tongue is a fundamental human right. However, even from a purely market-driven perspective, advancing multilingual NLP would offer tech companies vast opportunities for growth. There are simply so many people on Earth who speak languages other than English, and the performance gap they experience for their languages compared to English is huge. So I am optimistic that my research area will continue to grow in the future.
You went straight from undergrad to graduate school. Can you tell us about how you made this decision, and what the experience has been like for you?
A significant deciding factor for me was finances. Some people choose to complete a master’s degree before applying for doctoral programs, although in the United States this is not a requirement. However, most master’s degrees in the United States are expensive and do not provide financial aid. In contrast, all reputable doctoral programs cover most or all of the costs of attendance for their students, and additionally pay us a stipend so we can afford basic living costs.
A second factor is that I was pretty sure I wanted to pursue an academic career, so I knew that I needed a PhD. Overall, I do think that no matter how much research you conduct as an undergraduate, you will still feel a gap transitioning from undergrad to PhD. However, a PhD is mostly about learning just how much you can learn by doing, so that gap is definitely bridgeable.
Lots of students here are interested in NLP research and jobs. What types of projects, trends, and opportunities should students be aware of? Do you have any advice for undergraduates with these goals in mind?
The great news is that almost everyone is finding ways to incorporate NLP into their research nowadays. I highly recommend staying open-minded and searching for opportunities from any and all departments. In the past, I have seen NLP-related research positions advertised by the Department of Computer Science, the School of Journalism, and the Data Science Institute. You can also cold-email professors, network with classmates, or connect with Undergraduate Research and Fellowships. Summer Research Experiences for Undergraduates (funded by the National Science Foundation) are an additional option beyond Columbia.
Don’t be discouraged by the initial rejections and never take them personally. Once you get your first experience, it becomes much easier to apply for other research experiences. So take your responsibilities seriously, no matter how simple or seemingly menial the task. And don’t be afraid to thoroughly understand expectations and advocate for yourself throughout as well. Both your mentors and you should be mutually benefiting from the experience.
What’s next for you? Do you have any burgeoning research interests?
I want to stay in the multilingual NLP space for now. Recently, I’ve been thinking that it’s time to get more involved in speech research again, as I’ve been focusing on LLMs over the past year.
But we shall see!