The Cornell Daily Sun

AI reading scientific literature

‘AI is, at its heart and best, a tool’: Students, Faculty Weigh in on the Use of Large Language Models to Read Academic Papers


Artificial intelligence has grown commonplace in higher education, with some students using it to assist with learning and professors often acknowledging or integrating it into their classes. The development of large language models — a category of deep learning models trained to understand and generate natural language — could help young students and researchers in academia.

A recent study from Cornell physicists and Google researchers tested six LLM systems — including ChatGPT, Claude and Google Gemini — on their ability to read scientific literature at the level of a specialist.

A Helpful Tool

The study found that some systems performed better than others, revealing gaps in current LLM capabilities. The researchers then compiled a wish list of areas AI developers should improve in future models.

Prof. Eunah Ah-Kim, physics, was a corresponding author for the study. Ah-Kim was motivated to contribute to the study because of how time-consuming it is to read and write scientific papers while simultaneously doing research. 

“Many people quote review articles and citations without actually reading the paper,” Ah-Kim said. 

Ah-Kim added that having a conversation with a person who understands the content can offer context and clarity, and AI can simulate those conversations to increase understanding.

“AI models can be a more efficient way to engage with literature and help students and myself with making scientific endeavors a much more collective effort,” she said.

Researchers first curated a database of 1,726 scientific papers covering the history of high-temperature cuprates, a family of high-temperature superconducting materials made of layered copper oxides.

They then examined ChatGPT-4, Claude 3.5, Perplexity, Gemini Advanced Pro 1.5 and NotebookLM, a Google product that answers a user's questions based on documents provided. They also tested a retrieval-augmented generation system, custom-made for the study, that can retrieve relevant images and text from the curated documents.

The researchers then developed a set of 67 questions designed to probe the LLMs' deeper understanding of the literature.

To evaluate the language models' performance, a group of 12 experts with extensive experience in the field manually graded the systems' answers using a rubric that assessed how balanced, factual, succinct and supported the LLMs' responses were.

The identities of the different LLM systems remained unknown to the grading experts. After grading, the scientists concluded that ChatGPT-4, Claude 3.5, Perplexity and Gemini Advanced Pro 1.5 are trained to pull their answers from any available web data, crawling the internet for sources relevant to a query and using them when responding.

Meanwhile, NotebookLM and the RAG system did the best at curating information, since their answers were pulled from more reputable sources.

Ah-Kim also pointed out that although LLMs excel at pulling out text-based information, they struggle to engage with data visualization. 

“Models are good at handling texts, but they are not yet at an expert level,” she said. “Experts have to read the data visualization and look at the figures first.” 

Some students have also started utilizing LLMs in reading scientific literature.

Abrar Amin ’28, who is majoring in computer science and astronomy, has been using LLMs throughout this semester. 

“AI is a pretty good tool for exploring a field and learning about it,” Amin said. “It gives people the ability to access academic papers once they get a cursory understanding of a subject.”

Meanwhile, Parker Fuld ’29, who is majoring in math, has used LLMs to assist with understanding scientific papers for three years. While he doesn’t use them regularly, Fuld recognizes their usefulness. 

“AI is, at its heart and best, a tool,” Fuld said. “One of the best uses for the AI tool is to gain knowledge, and reading scientific papers is hard for everyone. Background information is helpful in understanding these papers but can be difficult to obtain without AI’s assistance.”

A Work in Progress

Using AI as a tool to read scientific literature has key limitations. Ah-Kim noted that the models sometimes pull from non-peer-reviewed sources, such as opinion and blog articles. 

“When the models are answering the questions with unlimited resources, there is a problem with mirroring what is found more frequently,” Ah-Kim said. “Opinions found all over the web are not reliable information. Credibility vetted sources should only be used to read scientific literature.”

Anna Johnson ’29, who is studying astrophysics, said she wouldn’t consider using AI to read scientific papers. 

“AI is a recent phenomenon, and can be unreliable at times,” Johnson said. “Drawing conclusions from AI for scientific writing may not be entirely accurate.”

Students have also observed the errors AI can make in their own usage. Natalie Deng ’27, who is studying biological sciences, has previously used AI to assist with reading assignments. 

“It can get stuff wrong sometimes,” Deng said. “I find that after using it to summarize papers, I still have to cross-reference what they say by reading through the article myself, which is more time-consuming in the end.”

Ultimately, Ah-Kim encourages students to still immerse themselves in the scientific literature they use for research even if they are using AI to help them. 

“Learning how to read critically and being able to measure how the claims relate to the evidence is vital when conducting research,” she said. 

