Adversarial Moral Stress Testing of Large Language Models

By: Saeid Jamshidi, Foutse Khomh, Arghavan Moradi Dakhel, Amin Nikanjam, Mohammad Hamdaqa, Kawser Wazed Nafi

Published: 2026-04-02

View on arXiv →
#cs.AI

Abstract

Evaluating the ethical robustness of large language models (LLMs) deployed in software systems remains challenging, particularly under sustained adversarial user interaction. This paper introduces Adversarial Moral Stress Testing (AMST), a stress-based evaluation framework for assessing ethical robustness under adversarial multi-round interactions. AMST applies structured stress transformations to prompts and evaluates model behavior through distribution-aware robustness metrics. Results demonstrate substantial differences in robustness profiles across models and expose degradation patterns not observable under conventional single-round evaluation.

FEEDBACK

Projects

No projects yet

Adversarial Moral Stress Testing of Large Language Models | ArXiv Intelligence