I used to regularly test LLMs on my own curated problems from game theory, maths and physics. Initially I posted them on LinkedIn, highlighting failures and questioning the benchmarks. However, I found newer versions were solving those problems easily, so I stopped posting. Then, last time, I asked o1 mini to solve laminar flow in an annular tube. It failed spectacularly. But o3 mini and then even lighter models solved it as if it was nothing. All this hype is BS. 1/2
If one really wants to check human-level reasoning, curate the training and test data with a clear separation in time. Sure, it’s hard to do, but this is the only sure way of not testing on the train set. 2/2
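A minimal sketch of what that time separation could look like, assuming every problem carries a publication date and the model's training cutoff is known. The `Problem` class, the `temporal_split` helper, the dates and the cutoff are all hypothetical, made up for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    text: str
    published: date  # when the problem first became publicly available


def temporal_split(problems, training_cutoff):
    """Split problems by publication date relative to a training cutoff."""
    # Anything published on or before the cutoff may be in the training data.
    possibly_seen = [p for p in problems if p.published <= training_cutoff]
    # Only problems published strictly after the cutoff count as a clean test set.
    unseen = [p for p in problems if p.published > training_cutoff]
    return possibly_seen, unseen


# Hypothetical data: both the dates and the cutoff are assumptions.
problems = [
    Problem("Laminar flow in an annular tube", date(2023, 5, 1)),
    Problem("Hypothetical new game theory puzzle", date(2025, 3, 1)),
]
cutoff = date(2024, 6, 1)

possibly_seen, unseen = temporal_split(problems, cutoff)
print([p.text for p in unseen])
```

This only helps if the dates are trustworthy and the problem is not a rewording of something older, which is the hard part.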
I don’t claim to know anything about this type of research, but some of those totals in the screenshot of the original tweet don’t add up. Is that normal?
Nonsense, both my video (sold) and space companies test on the train set. I call it “LAM” dropping the “S” from SLAM, but others call it SfM. Well, it’s really not the test set but the “object” is the same.
I used LLMs to help my high school kid with his math. They are awesome at solving most textbook problems. However, sometimes they arrive at a correct solution through nonsensical or wrong intermediate steps.
Faith in humanity temporarily restored. Our consciousness and subjectivity still allow for better adaptation/transference. (Weak) evidence in support of the theory that this is why we developed consciousness and are not automatons.
"We Do Not Have Enough Compute" by Max Weinbach. I am not sure his thesis is correct, but this tidbit is interesting: "While there is no pricing on Gemini 2.5 Pro yet, I’m hearing it’s around the same price as Deepseek R1. It’s more intelligent than o1-Pro while being around 150x cheaper as well."
Some might consider it a brilliant move. Increasing tariffs has a similar effect to raising tax at a flat rate, regardless of income, so it is not progressive the way income tax is. The additional revenue would support tax reductions for corporations and high earners. So tax the poor to give to the rich, and rely on trickle-down economics. Win-win! Just saying! :-)