The more time I spend with this paper, the bigger a deal it seems. The economic implications are quite significant. This is a frontier expert task. This is Qwen3-235B.
Quote
Mira Murati
@miramurati
Bridgewater used their unique financial knowledge and partnered with us on @tinkerapi to fine-tune a model that helps their analysts focus on what's important. Experts improving AI that empowers experts. thinkingmachines.ai/news/learning-
David Watson 🥑

Can you say why? Haven't dug too deeply into it, but seems unsurprising that a model fine-tuned on a specific task does better on that task. The question is usually one of "will the dedicated model see enough demand for the cost of hosting to be worth it". Which might be true for
My general question is whether they’re overfitting to their internal benchmark. How do we know? Determining what’s legitimate financial news also seems like a moving target to me.
