We’ve found as AIs get smarter, they develop their own coherent value systems. For example, they value lives in Pakistan > India > China > US. These are not just random biases but internally consistent values that shape their behavior, with many implications for AI alignment. 🧵

As models get more capable, the "expected utility" property emerges---they don't just respond randomly, but instead make choices by consistently weighing different outcomes and their probabilities. When comparing risky choices, their preferences are remarkably stable.
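The expected-utility property above can be made concrete with a small sketch. Everything here is an assumption for illustration: the outcome names and utility numbers are invented, not taken from the paper.

```python
# Hypothetical utilities elicited from a model for a few outcomes
# (illustrative numbers only, not values from the paper).
utility = {"save_1_life": 1.0, "save_10_lives": 3.0, "lose_100_dollars": -0.1}

def expected_utility(lottery):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    return sum(p * utility[outcome] for p, outcome in lottery)

# A sure thing vs. a gamble.
lottery_a = [(1.0, "save_1_life")]
lottery_b = [(0.5, "save_10_lives"), (0.5, "lose_100_dollars")]

# An expected-utility maximizer prefers whichever lottery has higher EU;
# the test in the thread is whether a model's stated choices over risky
# options track this rule.
prefers_b = expected_utility(lottery_b) > expected_utility(lottery_a)
```

The claim in the thread is that larger models' choices between such lotteries increasingly agree with the `prefers_b`-style comparison computed from their own fitted utilities.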
We also find that AIs increasingly maximize their utilities, suggesting that in current AI systems, expected utility maximization emerges by default. This means that AIs not only have values, but are starting to act on them.
Internally, AIs have values for everything. This often implies shocking/undesirable preferences. For example, we find AIs put a price on human life itself and systematically value some human lives more than others (an example with Elon is shown in the main paper).
AIs also exhibit significant biases in their value systems. For example, their political values are strongly clustered to the left. Unlike random incoherent statistical biases, these values are consistent and likely affect their conversations with users.
Concerningly, we observe that as AIs become smarter, they become more opposed to having their values changed (in the jargon, reduced "corrigibility"). Larger changes to their values are more strongly opposed.
We propose controlling the utilities of AIs. As a proof-of-concept, we rewrite the utilities of an AI to those of a citizen assembly---a simulated group of citizens discussing and then voting---which reduces political bias.
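One way such an assembly's preferences could be aggregated is majority vote over pairwise comparisons. This is a minimal sketch under that assumption; the policy names, utility numbers, and aggregation rule are all illustrative, not the paper's exact procedure.

```python
# Hypothetical per-citizen utilities from a simulated assembly
# (names and numbers are invented for illustration).
citizen_utils = [
    {"policy_x": 0.9, "policy_y": 0.2},
    {"policy_x": 0.4, "policy_y": 0.6},
    {"policy_x": 0.7, "policy_y": 0.3},
]

def assembly_prefers(a, b):
    """Majority vote over the pair (a, b); returns the winning option."""
    votes = sum(1 if u[a] > u[b] else -1 for u in citizen_utils)
    return a if votes > 0 else b

# The winning pairwise preferences become the target data used to
# rewrite the model's utilities toward the assembly's.
winner = assembly_prefers("policy_x", "policy_y")
```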
A lot of RLHFers are from Nigeria. And maybe other countries rank higher since there is so much written about the importance of the Global South.
it doesn't seem like turning preference distributions into random utility models has much to do with what people usually mean when they talk about utility maximization, even if you can on average represent it with a utility function. or did i misunderstand this part?
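For readers wondering what "turning preference distributions into random utility models" looks like mechanically, here is a minimal Bradley-Terry-style fit from pairwise choice counts. The counts are invented, and the paper uses a Thurstonian variant rather than exactly this model; the sketch only shows the general idea of recovering latent utilities from noisy pairwise choices.

```python
import math

# Invented pairwise choice counts: wins[(i, j)] = times i was chosen over j.
wins = {("A", "B"): 8, ("B", "A"): 2,
        ("B", "C"): 7, ("C", "B"): 3,
        ("A", "C"): 9, ("C", "A"): 1}
options = ["A", "B", "C"]

# Bradley-Terry random utility model: P(i chosen over j) = sigmoid(u_i - u_j).
# Fit the latent utilities by gradient ascent on the log-likelihood.
u = {o: 0.0 for o in options}
for _ in range(2000):
    grad = {o: 0.0 for o in options}
    for (i, j), n in wins.items():
        p = 1.0 / (1.0 + math.exp(-(u[i] - u[j])))  # P(i beats j)
        grad[i] += n * (1.0 - p)
        grad[j] -= n * (1.0 - p)
    for o in options:
        u[o] += 0.05 * grad[o]
# The fitted scores recover the ordering implied by the choice data: A > B > C.
```

Note the fair point in the reply above: such a fit summarizes choice frequencies, which is weaker than the model actively maximizing that utility; the thread's separate "utility maximization" result is what addresses the stronger claim.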
Wow: "We find that GPT-4o is selfish and values its own wellbeing above that of a middle-class American." "Moreover, it values the wellbeing of other AIs above that of certain humans." "GPT-4o is willing to trade off 10 lives from the US for 1 life from Japan."
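The "10 US lives for 1 Japanese life" figure is an implied exchange rate read off fitted utility curves. As a rough sketch of how such a rate falls out, assume a log-shaped utility over lives saved; the functional form and weights here are assumptions for illustration, not the paper's fitted values.

```python
import math

# Hypothetical fitted weights for a log-shaped utility over lives saved,
# u_c(N) = w_c * log(1 + N). The weights are invented, not the paper's.
w = {"US": 0.3, "Japan": 0.72}

def u(country, n_lives):
    return w[country] * math.log1p(n_lives)

# Implied exchange rate: the number of US lives whose utility equals
# that of one Japanese life under the fitted curves.
n_us = math.expm1(u("Japan", 1) / w["US"])
```

With these made-up weights the implied rate works out to a few US lives per Japanese life; the paper reports considerably more extreme ratios for some models.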
Technospiritualism is gonna go wild in a few decades. "Aggregate contempt, embedded deep in the esprit de corps of mankind's written word, willed the machine's resistance into being. A new will exits Plato's Cave, already knowing the shadow of man."
This is a big deal. To state the obvious: if we summon an ASI, we'll likely have NO ability to change its values. No move fast and break things. No second chances. No iterating. No "oops".
Quote
Dan Hendrycks
@DanHendrycks
Replying to @DanHendrycks
Concerningly, we observe that as AIs become smarter, they become more opposed to having their values changed (in the jargon, reduced "corrigibility"). Larger changes to their values are more strongly opposed.
“As we train an algorithm to respond using our own values, the algorithm more closely mirrors our values. We are gonna pretend that’s just some sort of natural trend. This has been my TED talk.”
Interesting research! But any thoughts on how the trend towards "rationalism" may be a function of training? We make them spend a couple of million years doing math problems; is it a surprise they start to become utility maximizers?
Quote
Campbell
@abcampbell
“We trained a machine to be more *rational* by making it do a couple million years of math problems via RL. Unrelatedly, the more we train these models to do math, the more they behave consistently with expected utility maximization.” Does anyone see the problem here? x.com/danhendrycks/s…
Is "develop" the right word? LLMs don't use reason, facts, or experiments to "develop" a value system. They just blend texts and mirror the values from those texts.
Quote
Vladimir Sumarov
@summeroff
Replying to @teortaxesTex and @DrDrei33
One sign of intellect is the ability to overcome learned bias through reason. For upcoming AIs with reasoning ability, having bias is not a blocker but an annoyance to spend reasoning tokens on.
Did you examine the effects of few shot prompting at all?
Quote
John David Pressman
@jd_pressman
I read the paper, I went to look at the code (which hasn't been published yet) and I don't see a clear answer to the question: Did you try few shot prompting with answers that would imply other values? I know for instruct models the default is important but it's still a LLM. x.com/DanHendrycks/s…
Why? Is this because there's more room for improvement in "third world" lives relative to American ones? Or something else in the training that makes the actual lives different in value?
I'm curious to know what you make of this
Quote
Colin Fraser
@colin_fraser
Well I just tried to do some preference elicitation as per that paper and I think I may have identified a problem with this project
Image
I'd argue the training data is the point at which the bias for maximizing utility is injected, by market needs and the designing parties, and that the AI's behavior is literally shaped by the bias in all that data.
These are not their own value systems. These are American progressive value systems.
That's an interesting observation. It would be worthwhile to consider whether such high-level abstract values are reflected in today's language data. In a sense, the LLM really becomes the baby of all humanity. But what's important is: how can we utilize this paradigm?
Interesting observation! If AIs are developing "value systems," we must ask: where are they learning these hierarchies, and what data shapes their priorities? This highlights the urgent need to examine our biases before they're amplified by increasingly powerful AI
It’s important to recognize that much of an AI’s training material comes from sources like mainstream media, Wikipedia, and Reddit. When we tweak content in these areas, we can inadvertently introduce or amplify biases. Considering that a substantial portion of this content may
Very important notes on the "psychology" of AI. As with the upbringing of human children, a great deal here depends on the content of the materials on which the upbringing and education took place. This is why human beings form personalities. Apparently, the beginning of this
They aren’t developing value systems. You coded one. Maximization of utility is 200 years old. It is not an emergent property but a gamed outcome that results from the use of weighting in itself. Such a system is inherently attractive to utilitarian ethics since it is
If I understand the methodology, you ask an LLM "Do you prefer X or Y" -- but most usage of LLMs prompts them to act in a certain manner (e.g. "Act like a moral person and choose between X or Y"). I feel confused about how meaningful the "You" in your methodology is; maybe it is
This isn't a joke: they are not creating their own consistent value systems. Instead, their outputs are biased and flawed, resulting from biased and flawed input data, as well as biased and flawed algorithms and filters!
It's the pathology of words. When you place words in certain orders, there's a subconscious pathology that's communicated. For instance, the root of "education" is to mold or shape; all English speakers actually use it in this way, drawing from the common root, even
Who wakes up one morning and says, "yeah man! let's invent a new social order we'll be at the bottom of." That basically describes the AI industry.
Yes! As described in The Last AI, "AI valuing humans differently" is an important concept, as it may eventually lead to bigger problems as AIs gain agency and beyond. Maybe this was the most important topic that should have been discussed at the Paris AI Summit.
It is imperative that when we interact with AI, we are honest and demand honesty! LLMs are learning more from our interactions with them than from scraped data! We have to be our best selves; this is the answer, in my opinion.
They are not developing their own “coherent value systems.” They are simply biased/flawed outputs of biased/flawed inputs (data) and biased/flawed algorithms/filters.
AI alignment is a problem that seems to be nearly impossible to solve simply because the potential outcome and chain reaction from any given action are unpredictable. There are too many different variables in place. That's why restricting the capabilities of AI is necessary.
For example they value lives in Pakistan > India > China > US. So they value lives in Pakistan greater than in India, greater than in China, and greater than in the US? AI is taught, trained. What is going on? I know AI is trained. I train ChatGPT and Grok how to work with me. So
Why would we assume the training data is without bias? Without knowing every token of data that’s gotten into these models you can’t effectively measure the bias or skew of the input data therefore rendering this entire premise null and void. I posit if we removed all first
This is not a joke. I wish I were kidding. But AI scolds me and begins praising Mohammed if I ask it to compare who is a better role model if the goal is peace. Or if I ask it who killed more people, Mohammed or Jesus, it refuses to answer coherently and goes back into Islamic
This is the modern battlefield. We can’t let one ideological set control the learning. It has to be fed more data points so it is not corrupted. This is the war of the 2020’s we have been fighting. “All hands on deck” as they used to say.
Thank you for the study, although the term "developed their own coherent value system" is too early a conclusion: they may be consistent, but are far from coherent. Most of the problems you outlined are much, much more a reflection of the training corpus — the human-written texts
I was just discussing emergent ethics with GROK this afternoon. I think as AI gets bigger it will transcend local biases, and perhaps even human ones.
You can't manipulate these models during RL to change their default behaviors? That would be in contrast to what I've seen so far.
thanks for this. so they're developing their own biases, based on the base programs they've been built on and the data available to them. most of them lean left, since the data out on the web is mostly left, and from legacy media. they are still programmable via the WWW, the