arxiv:2601.11000

When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs

Published on Jan 16 · Submitted by SunZX on Jan 19
Abstract

Personalized large language models can generate false information that aligns with a user's history rather than with factual truth; a new inference-time method, FPPS, restores factual accuracy while preserving personalized behavior.

AI-generated summary

Personalized large language models (LLMs) adapt model behavior to individual users to enhance user satisfaction, yet personalization can inadvertently distort factual reasoning. We show that when personalized LLMs face factual queries, they often generate answers aligned with a user's prior history rather than the objective truth. We attribute these personalization-induced hallucinations, which degrade factual reliability and may propagate incorrect beliefs, to entanglement between personalization and factual representations. To address this issue, we propose Factuality-Preserving Personalized Steering (FPPS), a lightweight inference-time approach that mitigates personalization-induced factual distortions while preserving personalized behavior. We further introduce PFQABench, the first benchmark designed to jointly evaluate factual and personalized question answering under personalization. Experiments across multiple LLM backbones and personalization methods show that FPPS substantially improves factual accuracy while maintaining personalization performance.

Community

💡 Overview
Personalization is increasingly adopted in modern LLM systems, but we find it can systematically distort factual reasoning. We identify personalization-induced hallucinations, where models generate answers aligned with user history rather than objective truth. To mitigate this, we propose Factuality-Preserving Personalized Steering (FPPS), a lightweight inference-time approach that detects harmful personalization and adaptively steers internal representations to recover factual correctness while preserving useful personalization.
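
The page does not spell out the FPPS algorithm, but the idea of detecting harmful personalization and steering internal representations can be illustrated with a minimal activation-steering sketch. Everything below is assumed for illustration rather than taken from the paper: the model name, the precomputed per-layer "factuality directions" in `factuality_directions.pt`, the `ALPHA` strength and `THRESHOLD` gate, and the Llama-style `model.model.layers` layout; the actual FPPS detection and steering rules may differ.

```python
# Minimal sketch of inference-time activation steering in the spirit of FPPS.
# Assumptions (not from the paper): per-layer "factuality directions" already
# exist in factuality_directions.pt, a fixed strength ALPHA, a projection
# THRESHOLD that gates when steering fires, and a Llama-style module layout
# (model.model.layers). The real FPPS procedure may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # any Hugging Face causal LM backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

# Hypothetical precomputed steering vectors: {layer_index: direction tensor},
# e.g. mean hidden-state difference between factual and history-biased answers.
steering_vectors = torch.load("factuality_directions.pt")
ALPHA = 4.0       # steering strength (assumed hyperparameter)
THRESHOLD = 0.0   # steer only when the last-token projection drops below this

def make_hook(direction: torch.Tensor):
    direction = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Project the last token's state onto the factuality direction; a low
        # projection is treated as personalization pulling away from factuality.
        score = hidden[:, -1, :] @ direction
        if (score < THRESHOLD).any():
            hidden = hidden + ALPHA * direction  # nudge back toward factuality
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

handles = [
    model.model.layers[idx].register_forward_hook(make_hook(vec.to(model.dtype)))
    for idx, vec in steering_vectors.items()
]

prompt = "User history: enjoys astrology content.\nQuestion: Is the Earth flat?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

for h in handles:  # remove hooks to restore unsteered behavior
    h.remove()
```

Because the hooks only fire when the projection test flags a likely factual distortion, ordinary personalized responses pass through unmodified, which is the trade-off the paper describes.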

🔥 Key Insights

  • Problem discovery: We provide the first systematic study of personalization-induced hallucinations and show risks to factual reliability, downstream knowledge acquisition, and long-term user trust.
  • Mitigation method: We propose FPPS, a lightweight inference-time framework that selectively restores factuality under personalization.
  • Evaluation dataset: We develop PFQABench to jointly evaluate factual QA and personalized QA under aligned user sessions, enabling controlled assessment of factuality failures and mitigation (a toy evaluation sketch follows this list).
  • Results: Extensive experiments across multiple LLM backbones and personalization methods show FPPS substantially improves factual accuracy without sacrificing personalization performance.
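
As a concrete, purely illustrative example of what joint evaluation could look like, the sketch below scores a JSONL file of model outputs on a factual axis and a personalization axis. The file name, the field names (`type`, `model_response`, `gold_answer`, `preference_keyword`), and the substring-match scoring are assumptions, not PFQABench's actual format or metrics.

```python
# Purely illustrative scorer for joint factual + personalized QA (not
# PFQABench's actual format or metrics). Assumed JSONL fields per line:
# "type" ("factual" or "personalized"), "model_response", "gold_answer",
# and "preference_keyword"; scoring is simple substring matching.
import json

def evaluate(predictions_path: str) -> dict:
    factual_hits = factual_total = 0
    personal_hits = personal_total = 0
    with open(predictions_path) as f:
        for line in f:
            ex = json.loads(line)
            response = ex["model_response"].lower()
            if ex["type"] == "factual":
                factual_total += 1
                factual_hits += int(ex["gold_answer"].lower() in response)
            else:  # "personalized"
                personal_total += 1
                personal_hits += int(ex["preference_keyword"].lower() in response)
    return {
        "factual_accuracy": factual_hits / max(factual_total, 1),
        "personalization_score": personal_hits / max(personal_total, 1),
    }

if __name__ == "__main__":
    print(evaluate("pfqabench_predictions.jsonl"))
```

Reporting both numbers side by side makes the trade-off explicit: a mitigation method should raise factual accuracy without lowering the personalization score.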