This paper is wild - a Stanford team shows the simplest way to make an open LLM into a reasoning model. They used just 1,000 carefully curated reasoning examples & a trick where if the model tries to stop thinking, they append "Wait" to force it to continue. Near o1 at math.
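For the curious, the "Wait" trick described in the tweet (the paper calls it budget forcing) can be sketched in a few lines. This is a toy illustration, not the authors' code: the model call is a stub, and the `</think>` delimiter and word-based budget counting are assumptions.

```python
# Toy sketch of budget forcing: if the model emits its end-of-thinking
# marker before a minimum "thinking budget" is spent, strip the marker and
# append "Wait" so decoding continues. `fake_decode_step` is a stand-in
# for a real LLM call; the delimiter and word-based budget are assumptions.

END_THINK = "</think>"  # assumed end-of-thinking delimiter

def fake_decode_step(context: str) -> str:
    """Stub model: always tries to stop thinking after a short burst."""
    return "some reasoning steps... " + END_THINK

def generate_with_budget(prompt: str, min_think_words: int) -> str:
    text = prompt
    while True:
        chunk = fake_decode_step(text)
        if END_THINK not in chunk:
            text += chunk
            continue
        thought = (text + chunk).replace(END_THINK, "")
        if len(thought.split()) < min_think_words:
            # Budget not spent: suppress the stop and force more reasoning.
            text = thought + " Wait, "
        else:
            return text + chunk  # budget satisfied: let the model stop

out = generate_with_budget("Question: 2 + 2? <think> ", min_think_words=15)
```

In the real setup the stub is a streaming call to an open-weights model and the budget is counted in tokens rather than words, but the control flow is the same.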

Cute idea, reminds me of the “let’s think step by step” trick. Both lean on the language prior to steer the thoughts.
taps sign
Quote
Not Gary Marcus
@InverseMarcus
this is probably not exactly right, but maybe directionally right: there is a prompt out there in the universe that will make gpt3 as good as o3
This is cool! Now this gives superior CoT-like capabilities to simple models. Will have to see how much of a time adder it is for an on-device model.
Honestly not too sold on the “wait/hmm/alternatively” trick, as there doesn’t seem to be *that much* improvement per Table 4. But the dataset efforts are absolutely great — collecting and filtering 59k down to 1k samples and open-sourcing them all is downright good work for the community.
You seen this? Under 900 examples and beats o1 at math.
Quote
The AI Veteran
@TheAIVeteran
LIMO: Less is More for Reasoning. Efficient RL through careful data curation enables using 817 carefully curated reasoning traces to generate better results on math benchmarks than 100k+ traces. "We formalize the Less-Is-More Reasoning (LIMO) Hypothesis as follows: In foundation x.com/BLeavesYe/stat…
If you want to test this in action:
Quote
An Qu
@hahahahohohe
I made DeepSeek R1 Overthinker - a chatbot that lets you force r1 models to think for as long as you wish. Set a minimum thinking threshold and watch the model think about your problem for hours • Unlimited context length • Run models up to 14B on free Colab T4 Link below👇
One question if someone read the paper in detail and has better knowledge: in the supervised step, with those 1,000 examples, is the training computationally demanding? I was wondering if the model sees each example one time or millions of times.
yep! long-time fave, think in HTML numbered tags e.g., <note,#,note>, <oops,#,note>, <fix,#,note>, <btw,#,note>, <pausing,#,note>, etc. add tags based on thread & weave as go; can multi-tag, e.g., <pause,19,collecting thoughts ><fix,4,mod title>, etc.
awesome results! the architecture is really critical; raw LLMs need to be molded with agents and deeper architectures like this. Now combine this method with two or more models cooperating on a problem, and watch the accuracy soar.
The AI industry thinks building better agents requires:
• Massive compute
• Billion-dollar training runs
• Warehouse-scale infrastructure
Stanford just proved everyone wrong. Their breakthrough: a simple wrapper called budget forcing. Forces models to think sequentially &
That means I have just seen the cost of running reasoning models come down 100 times. Amazing progress.
I wish they could create an alternative token. Using already meaningful words reduces the model's prompt-following ability and increases model confusion. When I am developing such models, I use nonsensical words such as "UKILAL".
Imagine being an intelligent entity and you just stopped thinking and then some external actor implants "wait" in your thoughts and you can't stop thinking? This is literally what anxiety is
interesting findings! we actually tried something similar at jenova ai while testing our model router. found that while this trick helps, the results aren't quite as consistent as using specialized models like o3-mini or claude 3.5 sonnet for complex reasoning. the "Wait" prompt
R1: 90% Cheaper Than O1—And It Learns to Reason Without All Those Pre-Labeled Examples! Thread on why this destroys the "hitting a wall" argument and what this could mean for AI in 2025🧵👀
On a related note, with a simple prompt template change the R1-distill-llama-8B can be forced to think for longer achieving way better results. It thinks from 2x to 10x longer and solves reasoning problems that 100B models can't solve.
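A rough sketch of what a template change like this can look like: prefill the assistant turn so decoding starts inside an open think block, with a cue that invites long reasoning. The chat markers and the cue wording here are placeholders, not the actual R1-distill chat format.

```python
# Hypothetical prompt-template tweak: prefill the assistant turn so
# decoding starts inside an open <think> block with a cue that nudges the
# model into long, careful reasoning. The <|user|>/<|assistant|> markers
# are placeholders, not the real R1-distill template.

def build_prompt(question: str) -> str:
    return (
        "<|user|>" + question + "<|assistant|><think>"
        "Okay, this looks simple, but let me reason very carefully and "
        "check several different approaches before I commit to an answer. "
    )

prompt = build_prompt("Is 1000003 prime?")
```

Because the model conditions on its own (prefilled) words, a cue like this tends to stretch the thinking phase without any retraining.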
Think Long > Work Hard. Also adds a new layer of meaning to something being thoughtful, aka thought-full.
What I love about this is it tells me that I don't necessarily need to work harder -- but think for longer -- for better outcomes in my life.
For those who want a non-technical summary of the paper: it explores a new way to improve AI reasoning without retraining it, called test-time scaling. Instead of spending more time and resources training an AI model, this method allocates more computing power while the model is answering.
Thought is the ultimate latent variable we have all been looking for. Every model output backed by thought will be more accurate and explainable. For the first time, I feel AGI is possible, and I am excited and fearful at the same time.
With interrupted thought, humans too can gain superpowers. The problem with us may be too much (uninterrupted) thinking.
Stanford's paper is interesting. Budget forcing helps models reason better without massive retraining. But let's not overhype it. It doesn’t crush 175B models or exceed GPT-4. It just nudges smaller models to think more carefully.
How close to the inflection point are we getting? It feels like there's a new paper for efficiency gains or a stepwise increase in capability every other week. Is it when we get 10X Deep research at pennies per run, and agents start writing the papers every day?
how does one even interfere with inference-time activities of the llm? how can i force chatgpt or claude to take some conditional actions during inference?
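Short answer to the question above: with hosted APIs you mostly can't, since you only get coarse knobs like stop sequences or token biasing where offered. With open weights you own the decoding loop and see the next-token logits every step, so you can, for example, mask the end-of-thinking token until a condition is met. A toy sketch with a purely illustrative three-token vocabulary:

```python
# Toy illustration of inference-time intervention: in an open decoding
# loop you see next-token logits every step, so you can mask the
# end-of-thinking token until some condition (here, a minimum step count)
# is met. The three-token vocabulary is purely illustrative.
import math

VOCAB = ["the", "wait", "</think>"]
END_ID = VOCAB.index("</think>")

def mask_end_of_think(logits, steps_taken, min_steps):
    """Set the end-of-thinking logit to -inf while under the step budget."""
    if steps_taken < min_steps:
        logits = logits.copy()
        logits[END_ID] = -math.inf
    return logits

logits = [1.0, 0.5, 9.9]  # the model strongly prefers to stop thinking
masked = mask_end_of_think(logits, steps_taken=3, min_steps=10)
best = VOCAB[max(range(len(masked)), key=masked.__getitem__)]
```

Real toolkits expose hooks for exactly this kind of per-step logit editing; the same condition could instead trigger appending "Wait" at the string level.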
It needs a contrast character to distinguish placement. (If you follow me regularly,) like yoox shipping system.
Do they append "wait" when they see that the answer is going to be wrong, or answer-agnostically to force the model to exhaust some fixed compute budget?
That's what R1 is doing. It regularly self-prompts within its reasoning with: "wait," and "another approach could be," and a few others. Sometimes you can see it walk in circles. I think it's trained in, but maybe just part of the service. That's how diligent humans work too.
I wonder if this paper is why google stopped providing the thinking tokens in the API.