open.substack.com/pub/1thousandf
thanks for reading! very excited about this one
ok in hindsight i could have simply not used the hitler version of this idea ...
you would think LLMs of all things would be great at semantic association!
most of them are pretty good at it, but wikipedia just works in unique ways and I was grading on a pretty narrow definition.
For the Google tool-use models, did you have this turned on in Google AI Studio?
aaayo this is so sick! could also be an indicator for tacit knowledge association based on the path
thank u!! I’ve been working hard on this and I’m really excited to see where it goes next
nice! i tried RL on a variation of this and saw some similar behavior. happy to see someone working on a benchmark
evenish.bearblog.dev/qwen3-wiki-rl/
oh hell yeah, super cool to see someone working on similar stuff! do you mind if I link this in my article?
Nice!
A few basic things that would make this a less noisy measure:
- take best-of-N (ideally with a different start and target page for each)
- say "don't give up" in the prompt
- a version where you give the LLM a list of scraped links from each page
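A minimal sketch of the best-of-N suggestion, assuming a hypothetical `play(start, target, attempt)` function that runs one game and returns the number of clicks on success or `None` on failure (nothing like this is confirmed by the thread). Each (start, target) pair gets N independent attempts, the best success is kept, and the results are aggregated over distinct pairs:

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Optional, Sequence

# Hypothetical game runner: (start_page, target_page, attempt_index) -> clicks, or None if it failed/gave up.
GameFn = Callable[[str, str, int], Optional[int]]


def best_of_n(play: GameFn, start: str, target: str, n: int) -> Optional[int]:
    """Run n attempts on one pair and return the fewest clicks among successful attempts."""
    results = [play(start, target, attempt) for attempt in range(n)]
    successes = [r for r in results if r is not None]
    return min(successes) if successes else None


def score(play: GameFn, pairs: Sequence[tuple[str, str]], n: int) -> dict:
    """Aggregate best-of-n results over a set of distinct (start, target) pairs."""
    best = [best_of_n(play, start, target, n) for start, target in pairs]
    solved = [b for b in best if b is not None]
    return {
        "solve_rate": len(solved) / len(pairs),
        # Average path length only over pairs that were solved at least once.
        "mean_clicks_when_solved": mean(solved) if solved else None,
    }
```

Using distinct pairs (rather than repeating one pair N times and averaging) keeps one lucky or unlucky start page from dominating the score.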
Now that you have made it public, they will start cheating on it just so they can say "We're SOTA on nu-benchmark"