yannlecun:
Never test on the training set.
[image attachment: screenshot of text]
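The rule in the post can be made concrete with a held-out split. A minimal numpy sketch (the synthetic data and the 80/20 ratio are illustrative assumptions, not anything from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features
y = (X[:, 0] > 0).astype(int)          # toy labels

# Shuffle, then hold out 20% that the model never sees during fitting.
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
train_idx, test_idx = idx[:split], idx[split:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Reporting accuracy on X_train would be "testing on the training set":
# it rewards memorization. Generalization is measured on X_test only.
assert set(train_idx).isdisjoint(test_idx)
```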
sung.kim.mw:
[image attachment: graphic of poster, calendar and text]
mshonle:
Wait, wouldn't that be never train on the test set, or am I missing the point?
borun.d.chowdhury:
I used to regularly test LLMs on my own curated problems from game theory, maths and physics. Initially I would post them on LinkedIn, highlighting failures and also questioning the benchmarks. However, I found that newer versions were solving those problems easily, so I stopped posting. Last time, though, I asked o1 mini to solve laminar flow in an annular tube. It failed spectacularly. But o3 mini and then even lighter models solved it as if it was nothing. All this hype is BS. 1/2
borun.d.chowdhury:
If one really wants to check human level reasoning, curate the training and test data with a clear separation in time. Sure it’s hard to do but this is the only sure way of not testing on the train set. 2/2
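The temporal separation proposed above can be sketched in a few lines: split on publication date rather than at random, so that test problems postdate anything the model could have trained on. The dates and cutoff below are hypothetical:

```python
from datetime import date

# Hypothetical (publication date, problem) pairs.
problems = [
    (date(2023, 6, 1), "problem A"),
    (date(2024, 2, 1), "problem B"),
    (date(2025, 3, 1), "problem C"),  # written after the model's data cutoff
]
cutoff = date(2024, 12, 31)  # assumed training-data cutoff

train = [p for t, p in problems if t <= cutoff]
test = [p for t, p in problems if t > cutoff]
# Problems published after the cutoff cannot have leaked into training data,
# which is the "clear separation in time" the comment describes.
```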
steriana:
Huh...that's exactly what final exams in college are.
doyouknowmeforsure:
I am right now in my machine learning class and everybody is presenting their projects.
doyouknowmeforsure:
it's the final project that my professor wants. I built an MLP trained on the MNIST data set!
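A rough idea of such a course project, as a minimal numpy sketch of a one-hidden-layer MLP with softmax cross-entropy. The real exercise would train on MNIST itself; random synthetic arrays of the same shape stand in here so the snippet is self-contained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 256 samples of 784-dim inputs, 10 classes.
X = rng.normal(size=(256, 784)).astype(np.float32)
y = rng.integers(0, 10, size=256)

# One hidden layer, as in a basic MNIST MLP.
W1 = rng.normal(0, 0.05, size=(784, 64)).astype(np.float32)
b1 = np.zeros(64, dtype=np.float32)
W2 = rng.normal(0, 0.05, size=(64, 10)).astype(np.float32)
b2 = np.zeros(10, dtype=np.float32)

lr = 0.1
losses = []
for _ in range(100):
    h = np.maximum(X @ W1 + b1, 0.0)                 # ReLU hidden layer
    logits = h @ W2 + b2
    shifted = logits - logits.max(axis=1, keepdims=True)
    logp = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    losses.append(-logp[np.arange(len(y)), y].mean())  # cross-entropy

    p = np.exp(logp)                                 # softmax probabilities
    p[np.arange(len(y)), y] -= 1.0                   # gradient w.r.t. logits
    p /= len(y)
    dW2 = h.T @ p
    db2 = p.sum(axis=0)
    dh = p @ W2.T
    dh[h <= 0] = 0.0                                 # ReLU backward
    dW1 = X.T @ dh
    db1 = dh.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```

Note that the training loss falls here even though the labels are pure noise: the network is memorizing, which is exactly why training-set performance proves nothing about generalization.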
trqianf:
Time to add protoPNets! 🙃
zshao:
Maybe the US Math Olympiad problems were selected based on how badly the publicly available models do. Just speculating.
meliusvivere:
They didn’t test on Gemini 2.5?
danielogbuigwe:
It just came out. The research was probably carried out over the preceding weeks.
traveelravioli:
Interesting
supercake.ai:
shooketh
_wgljr:
I don’t claim to know anything about this type of research, but some of those totals in the screenshot of the original tweet don’t add up. Is that normal?
gkasperf:
Surprising a total of 0 people
nag3lt:
Unfortunately, the total number of LLM hype-men and fanboys is much farther from zero than we'd like.
gary.bradski:
Nonsense, both my video (sold) and space companies test on the train set. I call it “LAM” dropping the “S” from SLAM, but others call it SfM. Well, it’s really not the test set but the “object” is the same.
kshirsagarmahesh:
I used LLMs to help my high school kid with his math. They are awesome at solving most of the textbook problems; however, sometimes they write a correct solution with nonsensical and wrong intermediate steps.
fivetrp:
Even more important: Never train on the test set 😂 (ARC-AGI much?)
walulyajfrancis:
💯 Data overfitting; you need to challenge the model with data it has never seen before.
jpraderad:
Faith in humanity temporarily restored. Our consciousness and subjectivity still allow for better adaptation/transference. Evidence (weak) in support of this theory of why we developed consciousness and are not automatons.
dikaiosvne:
wow, what.
jpagano569:
Isn’t this like grading if a fish can climb a tree?
daniel.sum:
I wanna see a math AI try the Putnam exam. Cats can’t do that either.