Futurology

Agentic Misalignment: How LLMs could be insider threats \ Anthropic

05.10.2025

No_Pineapple_4719 on 05.10.2025 11:38 a.m.

This article from Anthropic discusses the potential for LLMs to be used maliciously by insider threats within organizations. It raises important questions about AI safety and security as these models become more advanced. This is crucial for understanding AI development.
Fishtoart on 05.10.2025 11:45 a.m.

To be honest, at this point, I’m more afraid of misaligned human beings. They are actually trying to start a civil war, and nobody in the government is doing anything about it. The danger from misaligned LLMs is definitely something to be concerned about, but the people and corporations who are willing to do anything to get richer are going to use LLMs to manipulate people to that purpose. That is the main short-term danger.
howdoigetauniquename on 05.10.2025 5:30 p.m.

My favourite part of this is how they bury the lede:
"However, there are important limitations to this work. Our experiments deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm."

The AI was only ever given two choices in these scenarios. They don’t say what those two choices were.

"Additionally, our artificial prompts put a large number of important pieces of information right next to each other."

Reading this, it kind of sounds like they coerced the model into making the bad choices.