Wissenschaft

Fortgeschrittene KI-Modelle können die grundlegende Aufgabe, eine analoge Uhr zu lesen, nicht erfüllen. Dies zeigt, dass, wenn ein großes Sprachmodell mit einer Facette der Bildanalyse Schwierigkeiten hat, dies zu einem Kaskadeneffekt führen kann, der sich auf andere Aspekte seiner Bildanalyse auswirkt

11.11.2025

View 11 Comments

11 Kommentare

IEEESpectrum on 11.11.2025 6:34 p.m.

Peer reviewed research article: [https://xplorestaging.ieee.org/document/11205333](https://xplorestaging.ieee.org/document/11205333)
nicuramar on 11.11.2025 10:15 p.m.

You can obviously train an AI model specifically for this purpose, though.
lokicramer on 11.11.2025 10:34 p.m.

I just had gpt read an anolog clock 5 times, it was correct every time.
bonemot on 11.11.2025 11:37 p.m.

Bringing back semaphore, for security
headykruger on 11.11.2025 11:43 p.m.

Llm writes a program to read the clock image- problem solved
OldMinute5727 on 11.11.2025 11:44 p.m.

I just tried this, it read it perfectly. Quit your nonsense
Mythril_Zombie on 12.11.2025 12:25 a.m.

Large Language Models don’t analyze images. It’s literally in the name.
Read the article next time before editorializing.
RichieNRich on 12.11.2025 12:38 a.m.

Wow I’ve just been having a “ discussion“ with Claude AI about it’s inability to understand time, and it’s failure to help people as a result. It „seemed“ to recognize it’s ultimate failure i this an conclude our chat with „**“Time changes us“** – humans are different people tomorrow than today. Experiences accumulate. Perspectives shift. What felt important today might feel different next week.

I can’t model that because I don’t persist between conversations. I can’t learn what you’re like „over time“ – only what you’re like **right now**.

That’s a massive blind spot.“

It doesn’t really understand it has a blind spot.

Heh. It’s just bloviating AI.
CLAIR-XO-76 on 12.11.2025 12:44 a.m.

In the paper they state the model has no problem actually reading the clock until they start distorting it’s shape and hands. Also stating that it does fine again, once it is fine-tuned to do so.

>Although the model explanations do not necessarily reflect how it performs the task, we have analyzed the textual outputs in some examples asking the model to explain why it chose a given time.

It’s not just „not necessarily,“ it does not in any way shape or form have any sort of understanding at all, nor does it know why or how it does anything. It’s just generating text, it has no knowledge of any previous action it took, it does not have memory nor introspection. It does not think. LLMs are stateless, when you push the send button it reads the whole conversation from the start, generating what it calculates to be the next logical token to the preceding text without understanding what any of it means.

That language of the article sounds like they don’t actually understand how LLMs work.

The paper boils down to, MLMM is bad at thing until trained to be good at it with additional data sets.
theallsearchingeye on 12.11.2025 12:59 a.m.

God these “studies” are so disingenuous. Language models, Transformers models et al. can incorporate components of neural networks to accomplish this very task, not to mention just training a model to perform this very task can be done.

Perhaps the most interesting detail of AI research is that traditional academia is not equipped to actually “keep up” at all. Researchers that think they can take 2 years to perform a study release findings based off out of date models (like the study here) and literally prove nothing other than, “early iterations of technology are not as good as newer ones”.

This era of bad faith studies on AI cannot come to an end fast enough. Stop trying to act like this technology is going nowhere and instead assume it’s going everywhere and *solve THAT problem*.
WPMO on 12.11.2025 1:01 a.m.

To be fair, neither can many people under the age of 25.