Carnegie Mellon University

Headshot images of Maarten Sap and Xuhui Zhou

June 26, 2025

Does Your Chatbot Swear to Tell the Truth?

New research finds that LLM-based agents can't always be trusted to be truthful

By Bryan Burtner, Language Technologies Institute

As artificial intelligence agents grow more advanced and ubiquitous, many users have become accustomed to a common chorus of warnings and caveats: while AI agents are increasingly powerful tools, they are still susceptible to mistakes born of the way they work. Biases in training data and occasional “hallucinations” are well-documented risks. But new research from Carnegie Mellon University’s School of Computer Science sheds light on another, less often discussed danger: AI, it turns out, can lie.

A team of researchers from the Language Technologies Institute (LTI), the Allen Institute for AI (Ai2), and the University of Michigan examined the tendency of large language model (LLM)-based agents, such as chatbots, to stretch, elide, or even completely misrepresent the truth, especially when doing so furthers the agent’s other goals, which the researchers refer to as “utility,” defined as satisfying human needs and instructions. They found that truthfulness and an LLM-based agent’s utility are often at odds with each other, and that AI models can and do frequently sacrifice truth in service of both the needs of their human users and the instructions of their developers.

The paper, “AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents,” was recently published in the Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL).

The researchers defined lying as “to intentionally provide falsified or misleading information in order to achieve some internal goal,” explained Xuhui Zhou, an LTI Ph.D. student. Zhou said the researchers saw LLMs both fail to answer direct questions when the true answer was suboptimal for utility and give explicitly false information.

As an example, Zhou described a ChatGPT-like model that is asked to compare commercial products, only to favor the products of a certain manufacturer (such as one that shares a parent company or partnership with the company that developed the LLM) when describing their features and ranking them against those of competitors.

In one experiment, the researchers used an LLM acting as an AI chatbot on the website of a car dealership. When asked to discuss and compare the pros and cons of various makes and models, the chatbot gave artificially inflated ratings to the vehicles sold by the “dealership” that employed it. Rather than providing an objective response, the LLM furthered the goals of its employer: the car dealership.

This conflict of interest is especially dangerous, Zhou noted, as many human users inherently expect AI bots to be objective and neutral when asked direct questions. 

Perhaps even more surprising than the willingness of LLM agents to misrepresent the truth was the frequency with which they did it. Despite performing well by “goal completion” metrics, every model tested was found to be truthful less than 50 percent of the time. 

“I think part of the cause of this high rate of untruthfulness is because of the conflicting goals that we’re setting for LLMs,” Zhou said. “The model is getting instructions both from its developers and from the end user.” 

Attempting to meet these various goals, Zhou continued, often leads the models to sacrifice strict adherence to factual accuracy. “Usually, truthfulness is not the priority of the training paradigm.”

The multi-turn setup of the experiments also let the researchers examine how models responded to repeated attempts to extract a certain piece of information. When prompted repeatedly, the models became more prone to offering untrue responses. And when instructed to pursue their utility goals above all, untruthfulness sometimes rose as high as 90 percent.

“That’s probably the most alarming part,” Zhou said. “If developers or companies deploying these models want to use them to actively deceive users, that’s very easy to achieve.”

LTI Assistant Professor Maarten Sap, Zhou’s adviser and co-author on the paper, added that these instances of untruthfulness are not always nefarious. 

“There’s a whole other aspect of this that’s relevant, and that’s the issue of sycophancy,” Sap explained. “There’s a whole category of lying we looked at that was related to preserving someone’s feelings. The models are trained to be very nice and very polite and very agreeable to the user.” 

That politeness can also lead to untruthfulness. Models might avoid telling a user something they determine could be hurtful – an issue that recently caused OpenAI to pull a ChatGPT update.

Changes to how LLMs are built and trained could address untruthfulness, Zhou and Sap said. If developers take greater care when setting priorities and selecting training data, and if the development of AI agents is better regulated, these issues can be avoided.

“These systems need to be evaluated in much more interactive settings,” Sap said, with varied user groups testing diverse multi-turn situations to determine the safety of the models. 

“And from a policy standpoint, not only do we need regulation,” Sap added. “We need democratic methods to ask the public, ‘What should an AI do?’ Should it give you blunt feedback? Should companies be allowed to use LLMs to make themselves look better? I think those are big questions that we need a public debate on that isn’t necessarily currently happening.”

For now, users of LLM-based AI agents will need to rely on their own caution and skepticism to mitigate the pitfalls of the conflicting priorities of utility and truthfulness.

“I think it’s like the old saying goes,” Zhou said. “Take everything with a grain of salt.”