After a little more playing around with GPT-4o, I think it's added some fun new features, but it isn't significantly better than its previous incarnations at anything that really matters. It's also restricted for free users to an absurd and almost offensive degree: the allowance is barely enough to tell if it would be worth paying for, let alone to do anything useful with.
Some tests: I fed it links to five arXiv papers and asked it to identify the first author of each. It got four of them wrong, and managed to correct just one of those when I told it so. Interestingly, when I fed it screenshots of the same papers, it identified all the authors correctly on the first try. I deliberately cropped the screenshots differently to show different parts of the text, sometimes truncating the title/abstract and sometimes not. This didn't seem to make any difference.
When I fed it a whole PDF, it could also identify the first author correctly, but as with other LLMs capable of handling PDFs, its answers to simple questions were a uselessly mixed bag. It got a numerical estimate from a figure completely wrong the first time, but corrected it to something far more sensible the second time. (I didn't ask it to correct the value or suggest a better one; I just asked it to explain how it got the number. ChatGPT still insists on treating any question about its answer as a request for a correction.) On another figure it just came up with a nonsense value that bore no relation to anything in the figure at all.
So nothing has really changed here. Advertising this as "free" when you get ~10 messages and maybe ~5 files every 24 hours is like Amazon's continued insistence on giving me two free months of Audible: optimistic at best. Multimodal features are nice to have but near-useless while the hallucination problems remain unfixed. It's like giving games better and better graphics but forgetting to fix bugs in the gameplay. I know, it's not the same, and hallucinations are a fundamental part of how LLMs work. But still, continuing to add ever more sophisticated ways of accessing inaccurate information feels very much like building a house of cards. Yes, these things are still good for inspiration, and there are many cases where accuracy isn't needed. Still, I would have hoped for a bit of progress here, and there hasn't been any as far as I can tell. That's worrying.
#AI
#ChatGPT