KingofHazor said:
Logos Stick said:
For the others on this board: that image about integer arithmetic comes from a 2023 research paper titled "GPT Can Solve Mathematical Problems Without a Calculator".
Might as well publish an example from 1990.
Grok 4, released in July of last year, scored 100% on the AIME 2025. The AIME is a notoriously difficult high-school math competition exam used to qualify students for the USA Mathematical Olympiad. Grok 4 aced it!
Your claim is just cherry-picking an old, narrow benchmark to make a broad negative point about AI capabilities. It was a legitimate criticism in 2023-2024, but it doesn't hold water against frontier 2025-2026 models.
You guys can claim AI accuracy all you want, but my frequent use of Gemini, Claude, Grok, ChatGPT, and Elicit shows that they remain replete with all kinds of errors and are absolutely untrustworthy.
The anecdotal stories of AIs failing some test and then passing it a year later with flying colors sound like the AIs are being revised specifically to pass the tests they failed, without fixing the underlying problems that caused them to fail multiple different types of tests. It's reminiscent of stock traders tweaking their models to fit historical data perfectly, only to have those models fail consistently on real-time trades.
Unlike your argument ("my frequent use shows AI just doesn't work"), my post is not anecdotal. It's a real-world math test that Grok 4 aced. Of course the models are being improved over time. Why do you consider that a negative? LOL. Makes no sense.