As a fan of weird AI benchmarks, I like MCBench, where you vote on which LLM makes the best Minecraft build based on a prompt.
Also interesting how much every leaderboard converges no matter what metric: Claude 3.7 & 3.5 and GPT-4.5 lead here, too. Suggests an underlying characteristic. mcbench.ai
March 18, 2025 at 7:24 PM
4 reposts
1 quote
48 likes
I regret to announce that the meme Turing Test has been passed
LLMs produce funnier memes than the average human, as judged by humans. Humans working with AI get no boost (a finding that is coming up often in AI-creativity work). The best human memers still beat AI, however. arxiv.org/abs/2501.11433
I initially read this as "Weird Al" benchmarks (e.g., song parodies, polka covers, food puns), which could possibly be useful in their own right ...