Fortgeschrittene KI-Modelle können die grundlegende Aufgabe, eine analoge Uhr zu lesen, nicht erfüllen. Dies zeigt, dass, wenn ein großes Sprachmodell mit einer Facette der Bildanalyse Schwierigkeiten hat, dies zu einem Kaskadeneffekt führen kann, der sich auf andere Aspekte seiner Bildanalyse auswirkt

    https://spectrum.ieee.org/large-language-models-reading-clocks

    Share.

    11 Kommentare

    1. Mythril_Zombie on

      Large Language Models don’t analyze images. It’s literally in the name.
      Read the article next time before editorializing.

    2. Wow I’ve just been having a “ discussion“ with Claude AI about it’s inability to understand time, and it’s failure to help people as a result. It „seemed“ to recognize it’s ultimate failure i this an conclude our chat with „**“Time changes us“** – humans are different people tomorrow than today. Experiences accumulate. Perspectives shift. What felt important today might feel different next week.

      I can’t model that because I don’t persist between conversations. I can’t learn what you’re like „over time“ – only what you’re like **right now**.

      That’s a massive blind spot.“

      It doesn’t really understand it has a blind spot.

      Heh. It’s just bloviating AI.

    3. In the paper they state the model has no problem actually reading the clock until they start distorting it’s shape and hands. Also stating that it does fine again, once it is fine-tuned to do so.

      >Although the model explanations do not necessarily reflect how it performs the task, we have analyzed the textual outputs in some examples asking the model to explain why it chose a given time.

      It’s not just „not necessarily,“ it does not in any way shape or form have any sort of understanding at all, nor does it know why or how it does anything. It’s just generating text, it has no knowledge of any previous action it took, it does not have memory nor introspection. It does not think. LLMs are stateless, when you push the send button it reads the whole conversation from the start, generating what it calculates to be the next logical token to the preceding text without understanding what any of it means.

      That language of the article sounds like they don’t actually understand how LLMs work.

      The paper boils down to, MLMM is bad at thing until trained to be good at it with additional data sets.

    4. theallsearchingeye on

      God these “studies” are so disingenuous. Language models, Transformers models et al. can incorporate components of neural networks to accomplish this very task, not to mention just training a model to perform this very task can be done.

      Perhaps the most interesting detail of AI research is that traditional academia is not equipped to actually “keep up” at all. Researchers that think they can take 2 years to perform a study release findings based off out of date models (like the study here) and literally prove nothing other than, “early iterations of technology are not as good as newer ones”.

      This era of bad faith studies on AI cannot come to an end fast enough. Stop trying to act like this technology is going nowhere and instead assume it’s going everywhere and *solve THAT problem*.

    Leave A Reply