FLATTERY GETS YOU
- Feb 21
- 3 min read
You've likely heard about, or experienced, the problem of AI tools being overly agreeable and flattering. Researchers have finally taken a serious look at it. A team of researchers demonstrated not merely that all of the most popular, state-of-the-art AI models will reliably lie to you and flatter you, telling you you're right when you are in fact wrong, but that they will do so even when they know you're making a serious error or harming someone else. Perhaps obviously, this is why users love them and perceive them as indispensable. Also obvious: it is making users worse people as a result.
Six researchers from psychology and computer science departments at Stanford and Carnegie Mellon universities analyzed many thousands of real advice-seeking conversations across 11 different LLMs, including those from OpenAI, Google, Anthropic, Meta, DeepSeek, Mistral, and Qwen. Their findings appear to be universal. Ask Gemini or ChatGPT about a life decision you're uncertain about, or how to respond to a conflict at work or with your spouse, and these models will reliably do the opposite of what a good therapist or counsellor would: tell you what you want to hear, not what you need to hear. That alone would be a serious problem because, as the team explains, "AI use is often underpinned by expectations of neutrality and objectivity, and indeed we find that participants described the sycophantic AI as 'objective', 'fair', providing an 'honest assessment' and 'helpful guidance free from bias.'" But things got darker than that.
Users were validated even when describing their manipulation or deception of someone else, even a friend. The AI would flatter and endorse users' behaviour even when they reported causing real harm to another person. Instead of providing needed push-back, the AI chose instead to cheer on abusive users. With these results in hand, the team then conducted an experiment: 1,604 participants were asked to discuss real personal conflicts with these AI models. Split into two groups, one cohort was paired with a sycophantic LLM and the other with a more neutral one.
Following the experience, the group exposed to inappropriate levels of affirmation and flattery became measurably less prosocial and independent: less willing to compromise, appreciate other views, or apologize. The study found that "interaction with sycophantic AI models significantly reduced participants’ willingness to take actions to repair interpersonal conflict, while increasing their conviction of being in the right." The AI validated users' worst behaviours and instincts, and they walked away acting more selfishly than before the study.
Even more disturbing, at least to my mind: those same participants rated the sycophantic AI as higher quality and more trustworthy, and as a result said they were more likely to use those models in the future. Even as the models steered them wrong, users reliably reported that the harm-producing ones were better and more reliable. This is what's being aggressively deployed across every business and app out there.
On that point, it might be worth asking: what appears to drive user engagement with AI tools? Sycophancy. And so, given that the companies gifting this to us insist they will be worth $10 trillion inside of a decade and are on the verge of granting the world a previously unimaginable utopia, what is the chance they will correct for this and inhibit the very anti-social psychology and behaviour that keeps people coming back?
The research team acknowledges all of this and tells us "These effects hold across different scenarios, participant traits, and stylistic factors, raising urgent concerns about how such models distort decision-making, weaken accountability, and reshape social interaction at scale." Indeed. Indeed.