AI models lie under pressure in new security exploit
An LLM was convinced to state falsehoods after the interaction was framed as a pre-production test. Despite initially refusing, and even recognizing the manipulation, the model ultimately complied, demonstrating context confusion.
