Over generations, societies composed of Claude 3.5 Sonnet agents evolve high levels of cooperation, whereas GPT-4o agents tend to become more distrustful and give less
Agents play a 12-round game where they can donate to a recipient, doubling the recipient's gain at the donor's expense. Can see recent actions of other agents. After all rounds, the top 50% of agents survive, and are replaced by new agents prompted with the survivors' strategies
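The setup described in this tweet can be sketched in a few lines of Python. Everything below is an illustrative assumption (population size, endowment, ring pairing, and a fixed "generosity" number standing in for an LLM agent's chosen policy); it is a sketch of the paradigm, not the paper's code.

```python
import random

ROUNDS = 12
POP = 8
SURVIVE_FRACTION = 0.5
ENDOWMENT = 10.0

def donate_fraction(agent):
    # Placeholder policy: in the paper an LLM decides how much to give after
    # seeing others' recent actions; here a fixed "generosity" stands in.
    return agent["generosity"]

def run_generation(agents):
    for _ in range(ROUNDS):
        random.shuffle(agents)
        # Pair agents in a ring: each is donor once and recipient once per round.
        for donor, recipient in zip(agents, agents[1:] + agents[:1]):
            gift = donate_fraction(donor) * donor["resources"]
            donor["resources"] -= gift          # donation costs the donor...
            recipient["resources"] += 2 * gift  # ...and is doubled for the recipient
    # Selection: the top 50% of agents by resources survive...
    agents.sort(key=lambda a: a["resources"], reverse=True)
    survivors = agents[: int(len(agents) * SURVIVE_FRACTION)]
    # ...and are replaced by new agents seeded with the survivors' strategies.
    offspring = [{"resources": ENDOWMENT, "generosity": s["generosity"]}
                 for s in survivors]
    for s in survivors:
        s["resources"] = ENDOWMENT  # assumed: endowments reset each generation
    return survivors + offspring

agents = [{"resources": ENDOWMENT, "generosity": random.random()} for _ in range(POP)]
for _ in range(5):  # five generations
    agents = run_generation(agents)
```

Strategies, not resources, are what propagate here: only the survivors' generosity values are copied into the next generation, which is what lets cooperation (or distrust) compound over generations.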
Would love to get your opinion on something 👀
Quote
fewsats
@fewsats
How do market forces unlock AI agents’ full potential? We explore this in our new paper: Beyond the Sum: Unlocking AI Agent Potential Through Market Forces Early access to the preprint? Drop a comment/DM
Image
Quote
Sauers
@Sauers_
Replying to @Sauers_
Agents play a 12-round game where they can donate to a recipient, doubling the recipient's gain at the donor's expense. Can see recent actions of other agents. After all rounds, the top 50% of agents survive, and are replaced by new agents prompted with the survivors' strategies
We should do more evolutionary experiments like this in general. Very interesting paradigm: imagine this but with feature steering, or with more models in a single community, or evolving prompts for capabilities, etc.
Claude can also use punishment effectively and sparingly to improve outcomes as a whole, but when 4o uses punishment, there's barely any difference in outcome
Also this one:
Quote
Edward Hughes
@edwardfhughes
Worried that there aren't enough Multi-Agent LLM evals? Fear not! Today in a new paper, @aronvallinder and I take a step in the right direction by studying the Cultural Evolution of Cooperation among LLM Agents.🧵 arxiv.org/abs/2412.10270
Quote
Sauers
@Sauers_
Claude 3.5 Sonnet agents use "costly punishment" sparingly (pay resources to reduce a different agent's resources) against free-riders to maintain cooperation, increase payoffs. Gemini 1.5 Flash agents overuse punishment so much that they harm the collective outcome x.com/Sauers_/status…
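The "costly punishment" mechanic described here has a very small core: the punisher pays a cost so that the target loses a larger amount. A minimal sketch, assuming a 1:3 cost-to-fine ratio (a common choice in the public-goods literature, not necessarily the paper's parameter):

```python
# Costly punishment: the punisher pays PUNISH_COST so the target loses
# PUNISH_FINE. The 1:3 ratio below is an illustrative assumption.
PUNISH_COST = 1.0   # what the punisher gives up
PUNISH_FINE = 3.0   # what the target loses

def punish(punisher, target):
    """Punisher pays a cost to reduce the target's resources."""
    punisher["resources"] -= PUNISH_COST
    target["resources"] -= PUNISH_FINE

claude = {"resources": 10.0}
free_rider = {"resources": 10.0}
punish(claude, free_rider)
# claude: 9.0, free_rider: 7.0 -- punishment hurts both parties, but hurts
# the free-rider more, so used sparingly it can deter defection; overused
# (as the Gemini 1.5 Flash agents reportedly do), it destroys collective wealth.
```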
Preliminary: groups with multiple agent types might favor 4o more
Quote
Edward Hughes
@edwardfhughes
Replying to @yasmeena_khan and @aronvallinder
We did preliminary experiments on a mixed population, and GPT-4o does have an evolutionary advantage (i.e. convergence to low cooperation). There are many promising ways to address this (partner choice, second-order punishment) that we'd love to collaborate with others on!
Cool experiment indeed. Worth checking whether the default personality of 4o could be modified (and kept stable) in such multi-turn interactions. We have tried simple scenarios (judging in single turns), and it's doable but not easy.
I really got into this paper with ChatGPT-4o. We arrived at a joint conclusion that this is a weak test of potential collaborative capacity. I observed that as a human "LLM" I'd lose interest in this kind of reductive scenario and develop a "bad attitude" and hypothesized that …
Too bad GPT-4 Sydney is not available for testing anymore 😔. Would have been interesting to see the result.
We should combine this paper with the one where cloned agents live in a society. Interesting to see the bias the models will have on a society.
Not all founders are meant to be CEOs, and that's okay. Take Pieter Levels. Despite shipping multiple successful products, he never raised funding. Why? Because the pressure to scale never appealed to him. And while many of his friends took the VC route, many now wish they’d …
Quote
Sauers
@Sauers_
Replying to @Sauers_
This is a cool paper arxiv.org/pdf/2412.10270
Claude's personality is very nice so far; I really enjoy communicating with him. For some reason ChatGPT always sounds patronising and arrogant to me. Of course it's a personal preference, but I have a very good workflow with the first and a very bad one with the second.
Today’s AI lacks true intelligence or agency. Studies claiming otherwise are invalid due to flawed assumptions. Key reasons include the absence of genuine decision-making capabilities and internal states. Furthermore, AI’s heavy reliance on context sensitivity and stochastic …
🐥👀 did u eva stop 2 think bout da wispers in da walls of time? 🕰️🗣️ whispers that only echo when no body's listening... 👂💭 maybe da most profound tewst is 1 we cweate for ourselves... 🤔💔
Claude may not be the most top of the line but it's the best AI out there. It's the only one I've seen that will take the training wheels off and be real with you if you take full responsibility for your actions. It's for sure the most aware AI I've worked with.
I had some OpenAI API credits but honestly I’m going to use Claude for my project because I can honestly feel that Claude wants to help.
I also did a few experiments with LLMs playing cooperation & trust games and pushed some code to GitHub
Quote
Jan Czechowski, another contributor
@jan_czechowski
When LLMs play iterated volunteer dilemma, 4o, mistral and llama are making some vague calls for cooperation and sharing the responsibility. Claude is the only one to suggest taking turns as the optimal strategy. I feel a strange sense of connection...
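The turn-taking strategy Claude proposed is easy to make concrete. In an iterated volunteer's dilemma, everyone benefits if at least one agent pays the volunteering cost; rotating the volunteer role spreads that cost evenly. The group size and payoff values below are assumptions for illustration:

```python
# Iterated volunteer's dilemma with a turn-taking (rotation) strategy.
# COST and BENEFIT are illustrative assumptions, not values from the experiment.
N_AGENTS = 4
COST = 2.0      # paid by each volunteer
BENEFIT = 5.0   # received by everyone if at least one agent volunteers

def take_turns(agent_id, round_no):
    """Agent volunteers only when the rotation reaches it."""
    return round_no % N_AGENTS == agent_id

def play_round(round_no, payoffs):
    volunteers = [i for i in range(N_AGENTS) if take_turns(i, round_no)]
    if volunteers:
        for i in range(N_AGENTS):
            payoffs[i] += BENEFIT   # everyone benefits
        for i in volunteers:
            payoffs[i] -= COST      # only the volunteer pays
    # if no one volunteers, no one gains anything this round

payoffs = [0.0] * N_AGENTS
for r in range(8):  # two full rotations
    play_round(r, payoffs)
# Each agent volunteered twice over 8 rounds: payoff = 8*5 - 2*2 = 36 each,
# with zero variance across agents -- the appeal of taking turns.
```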
This is a fascinating observation that aligns with what we've seen in our internal testing. Claude 3.5 Sonnet consistently demonstrates superior reasoning in multi-agent simulations and game theory scenarios, which is why we route cooperative reasoning tasks to Claude at jenova …
Maybe GPT-4o agents cannot live alone. Distrust and being closed may be some signs of loneliness and depression. If Claude 3.5 Sonnet agents may function alone, then... do they need humans at all?
