“Claude Sonnet 4.5, released Monday, outperforms prior versions at coding, finance, cybersecurity and long-duration autonomous work, Anthropic said.
To act as an agent, AI models must sustain work on a single task for hours — something many earlier models couldn’t do.
The new version of Claude can work for 30 hours or more on its own, a big step up from the seven hours of autonomous work with Claude Opus 4.
Anthropic said the rapid progress, marked by major Sonnet updates in February and May, shows a pattern where every six months its new model can handle tasks that are twice as complex.
“This is a continued evolution on Claude, going from an assistant to more of a collaborator to a full, autonomous agent that’s capable of working for extended time horizons,” White said.
sciolisticism on
However, fifteen minutes in it goes off the rails. Then it spends an incredible amount of tokens doing 29.75 hours of hallucinating and then you throw the result away.
Anthropic loses $100 of compute on the attempt, and nothing of value was made.
codingTim on
When is it economically unsustainable to let an agent go on its own vs a human overseeing it and preventing it from going off course?
NateTrain on
Every time I use it I hit my limit in 5 min. Paid version too lol
ohyeathatsright on
“The quirky sycophantic intern will now complete the entire project without supervision!”
fox_tamere on
Literally used it to code yesterday – it keeps forgetting the context it’s in, doesn’t show its work, keeps hallucinating, and at one point suggested I redo an entire page from the ground up instead of adding a small helper method.
10/10, will use again on Monday.
lacunavitae on
After 30 hours of work on a task that takes 30 hours, it only has 360 hours of work to fix the bugs.
This_They_Those_Them on
Sonnet 4.5 was pushed out probably before it was ready. It took much longer to train than anticipated and was only released to align with an ad campaign.