Adversarial Moral Stress Testing of Large Language Models
By: Saeid Jamshidi, Foutse Khomh, Arghavan Moradi Dakhel, Amin Nikanjam, Mohammad Hamdaqa, Kawser Wazed Nafi
Published: 2026-04-02
Category: cs.AI
Abstract
This paper investigates adversarial moral stress testing of large language models, with the aim of identifying vulnerabilities and biases in their ethical decision-making under challenging conditions. Such testing is essential for deploying ethical and robust AI systems.