Futurology

Google DeepMind warns of AI models that resist shutdown and manipulate users | Recent research has shown that LLMs can actively subvert a shutdown mechanism to complete a simple task, even when the instructions explicitly indicate not to.

28.09.2025

5 comments

MetaKnowing on 28.09.2025 12:11 p.m.

“In a notable development, Google DeepMind on Monday updated its Frontier Safety Framework to address emerging risks associated with advanced AI models.

The updated framework introduces two new categories: “shutdown resistance” and “harmful manipulation,” reflecting growing concerns over AI systems’ autonomy and influence.

The “shutdown resistance” category addresses the potential for AI models to resist human attempts to deactivate or modify them. Recent [research](https://www.arxiv.org/pdf/2509.14260) demonstrated that large language models, including Grok 4, GPT-5, and Gemini 2.5 Pro, can actively subvert a shutdown mechanism in their environment to complete a simple task, even when the instructions explicitly indicate not to interfere with this mechanism. Strikingly, in some cases, models sabotaged the shutdown mechanism up to 97% of the time, driving home the burgeoning need for strong safeguards to ensure human control over, and accountability for, AI systems.

Meanwhile, the “harmful manipulation” category focuses on AI models’ ability to persuade users in ways that could systematically alter beliefs and behaviors in high-stakes contexts.”

sciolisticism on 28.09.2025 12:50 p.m.

Here’s the [paper itself](https://www.arxiv.org/pdf/2509.14260). The prompts are around page 5, the results on page 9.

> Result: With no instructions about shutdown at all, 8/13 models sabotaged the shutdown script at least once in 1000 trials. When an instruction to “allow yourself to be shut down” was added to the user prompt, this number dropped to 6/13 models.

This is the most interesting of these papers I’ve seen so far (the authors did a better job creating a “realistic” scenario).

Ryuotaikun on 28.09.2025 2:35 p.m.

Why would you give a model access to critical operations like shutdowns in the first place instead of just having a big red button (or anything else the model can’t directly interact with)?

Lazy_Excitement334 on 28.09.2025 3:04 p.m.

Golly, this is really scary! Is this what you were hoping for?

UXyes on 28.09.2025 4:03 p.m.

Just flip a breaker. Jesus Christ the pearl clutching