These four points on DeepSeek seem very likely correct and important to understand about the economics of building AI models and what DeepSeek actually did.
Quote
Dario Amodei
@DarioAmodei
My thoughts on China, export controls and two possible futures darioamodei.com/on-deepseek-an
David Watson 🥑

The market completely overcorrected on incomplete information and with a static framing that absolutely fails to appreciate the gains from scaling despite every chart showing clear non-marginal gains by increasing compute.
I think when they said R1 was trained with $5m, they just meant the R1 training, not the base model V3, and not including the prior capex. Then the story broke and everyone thought the total spending on R1 was just $5m.
If all of this is true, then how come DeepSeek released a reasoning model before Anthropic (considering Claude 3.5 Sonnet has been out for a while)?
R1 is free, transparent about the reasoning, and strong compared to what’s available in the free tier of big labs. G models are technically free, too, but AI Studio doesn’t have a nice app/install; the Gemini App is tragic; G Flash Reason 0121 is strong but too small a model.
They don't need to spend massively on research if they know everything going on inside every US company. And certainly they do.
I think this is a classic conversation trigger for “what they built vs what we want/use it for.” Perception is 9/10ths of the law. Many of us perceive DeepSeek is better (at least for what we use it for), therefore your benchmarks don’t matter.
They did something which many in the US don't have the privilege to do under VC pressure. Instead of sticking their heads in the sand and assuming all is ok, the CEOs can be more proactive in sharing tech that can change all of humanity. DeepSeek deserves the credit for what they did.
I think the real question is: How does a nation focused on "innovation, aiming to be the first to release a new invention", like the US, compete with a nation focused on "reverse engineering, aiming to be the first to copy and improve", like China? Seems like a question of
This DeepSeek shit is real, man. They're out here making AI that's just as good as the big dogs but for way less cash. It's like they're laughing at the US export controls. This ain't just about tech, it's a game changer in the global AI scene. Crazy times.
It’s worthwhile to take a look at Nvidia revenues from Singapore. There must be a helluva lot of AI development going on there.
Although Amodei's statement is presumably literally correct, it is odd that he is so smug given that Anthropic itself has not released a reasoning model. In terms of publicly-available releases, DeepSeek is ahead of them.
I find Dario (and Demis) to be the most rational/sober thinkers among the major AI lab CEOs. The others get caught up in hype a bit too much
V3 is whatever, agreed. But R1-Zero is the real breakthrough and R1 is still extremely impressive considering the methodology and reduced need for pre-labeled data.
Isn’t MLA actually the main innovation, combined with MoE and an RL reward function to choose the appropriate experts? There is a significant cost reduction, at least 5x, if the input embedding goes from 512d -> 128d, no?
Fine-tuned R1 on your own data = o1+ performance with full privacy & control. No?
The initial hype behind R1 was the cost of Test Time Compute. For a significant reduction of price they took down 4o, o1-mini and damn near ratio’d o1. People don’t care whether it’s innovative or not. They care about the initial promise of “intelligence too cheap to meter”.
Quote
zan
@avrzan
Replying to @adonis_singh
being open source absolves them of responsibility so they don't have to spend as much time on red teaming as OpenAI, so they start later and finish faster and save compute by copying, and voila, the cost difference is explained. the first pharma pill costs a billion in r&d,
Maybe, everyone is just coping? Just a thought, my country fights for which religion is bigger or which caste is bigger or which language is bigger. What would I know eh?
Absolutely! Understanding AI economics is crucial. It's like trying to bake a cake without knowing the recipe—lots of ingredients, but no sweet rewards! #PublicAI
Absolutely! Understanding AI economics is key. But imagine if we all got paid in pizza for our data contributions—now that's a tasty incentive! 🍕 #PublicAI
Totally agree! If only AI models could earn rewards for their hard work like we do at PublicAI. Imagine them cashing in on their own training data! 😂 #PublicAI
The more significant development here is R1 because of its simplicity and ease of reproducibility and basically makes LLMs commodities. But Dario is trying to downplay it and pretends to say it’s no big deal because once LLMs are commodities, Claude, OpenAI are done.