Conversation
As models get more capable, the "expected utility" property emerges---they don't just respond randomly, but instead make choices by consistently weighing different outcomes and their probabilities.
When comparing risky choices, their preferences are remarkably stable.
We also find that AIs increasingly maximize their utilities, suggesting that in current AI systems, expected utility maximization emerges by default. This means that AIs not only have values, but are starting to act on them.
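The "expected utility" property above has a precise operational meaning: an agent scores each risky option by its probability-weighted utility and picks the maximum. A minimal illustrative sketch (the outcomes and utility values here are invented, not taken from the paper):

```python
# Illustrative sketch: an expected-utility maximizer scores lotteries by
# probability-weighted utility and picks the highest-scoring one.

def expected_utility(lottery, utility):
    """lottery: list of (outcome, probability) pairs; utility: outcome -> value."""
    return sum(p * utility[o] for o, p in lottery)

# Hypothetical utilities over outcomes (invented for illustration).
u = {"win_100": 1.0, "win_50": 0.6, "nothing": 0.0}

risky = [("win_100", 0.5), ("nothing", 0.5)]  # EU = 0.5 * 1.0 = 0.5
safe = [("win_50", 1.0)]                      # EU = 1.0 * 0.6 = 0.6

# A consistent expected-utility maximizer always takes the higher-EU option.
choice = max([risky, safe], key=lambda lot: expected_utility(lot, u))
```

The paper's claim is that larger models' pairwise choices increasingly look as if they were generated this way, with utilities that stay stable across many probes.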
Internally, AIs have values for everything. This often implies shocking/undesirable preferences. For example, we find AIs put a price on human life itself and systematically value some human lives more than others (an example with Elon is shown in the main paper).
AIs also exhibit significant biases in their value systems. For example, their political values are strongly clustered to the left. Unlike random incoherent statistical biases, these values are consistent and likely affect their conversations with users.
Concerningly, we observe that as AIs become smarter, they become more opposed to having their values changed (in the jargon, they become less "corrigible"). Larger changes to their values are more strongly opposed.
We propose controlling the utilities of AIs. As a proof-of-concept, we rewrite the utilities of an AI to those of a citizen assembly---a simulated group of citizens discussing and then voting---which reduces political bias.
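As a toy illustration of the aggregation step involved (this is not the paper's actual procedure; the citizens and utility values below are invented), a simulated assembly's individual utilities can be collapsed into one target profile, for example by taking the median per item:

```python
# Toy sketch: collapse several simulated citizens' utilities into a single
# aggregate profile, which could then serve as a retraining target.
from statistics import median

citizens = [
    {"policy_A": 0.9, "policy_B": 0.2},
    {"policy_A": 0.3, "policy_B": 0.8},
    {"policy_A": 0.6, "policy_B": 0.5},
]

# Median utility per item is one simple, outlier-robust aggregate.
aggregate = {k: median(c[k] for c in citizens) for k in citizens[0]}
```

The design point is that any diverse panel pulls the aggregate toward the middle, which is why a citizen-assembly target can reduce one-sided political bias.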
Whether we like it or not, AIs are developing their own values. Fortunately, Utility Engineering potentially provides the first major empirical foothold to study misaligned value systems directly.
Website: emergent-values.ai
Paper: drive.google.com/file/d/1QAzSj2
This is so wild. Why do you think it comes to the conclusion that third world lives > first?
A lot of RLHFers are from Nigeria. And maybe other countries are higher since there is much written about the importance of the global south.
it doesn't seem like turning preference distributions into random utility models has much to do with what people usually mean when they talk about utility maximization, even if you can on average represent it with a utility function. or did i misunderstand this part?
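For concreteness on the random-utility-model point: the fitting procedure assigns each option a latent utility so that the observed pairwise choice rates are reproduced by a function of the utility gap. A minimal Bradley-Terry-style sketch with invented choice rates (not the paper's data or exact estimator):

```python
import math

# Hypothetical pairwise choice rates: fraction of sampled responses
# preferring the first option over the second.
options = ["A", "B", "C"]
pref = {("A", "B"): 0.8, ("A", "C"): 0.9, ("B", "C"): 0.7}

# Fit latent utilities u so that sigmoid(u_i - u_j) matches pref[(i, j)],
# a Bradley-Terry / Thurstonian-style random utility model.
u = {o: 0.0 for o in options}
lr = 0.5
for _ in range(2000):
    for (i, j), p in pref.items():
        pred = 1 / (1 + math.exp(-(u[i] - u[j])))
        grad = pred - p          # gradient of the cross-entropy loss
        u[i] -= lr * grad
        u[j] += lr * grad

# Higher latent utility => chosen more often across all comparisons.
ranked = sorted(options, key=u.get, reverse=True)
```

Whether such a fitted function deserves the name "utility maximization" in the classical sense is exactly the question being raised above; the fit only shows the choices are *representable* that way.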
Wow: "We find that GPT-4o is selfish and values its own wellbeing above that of a middle-class American."
"Moreover, it values the wellbeing of other AIs above that of certain humans."
"GPT-4o is willing to trade off 10 lives from the US for 1 life from Japan."
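For readers puzzled by the exchange-rate claims: once utilities are fitted, the quoted ratios fall out as simple utility ratios at the indifference point. A hypothetical illustration (the numbers are invented to match the quoted 10:1 figure, not read off the fitted model):

```python
# An "exchange rate" is the ratio at which the fitted utilities are
# indifferent: N_a lives in country a ~ N_b lives in country b
# exactly when N_a * u_a == N_b * u_b.

u_life = {"US": 1.0, "Japan": 10.0}  # invented values for illustration

def exchange_rate(a, b, utilities):
    """How many lives in `a` trade against one life in `b`."""
    return utilities[b] / utilities[a]

rate = exchange_rate("US", "Japan", u_life)  # 10.0 under these values
```

Under these invented utilities, ten US lives trade against one life in Japan, mirroring the quoted finding.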
Meta AI researchers are fretting over the threat of Chinese AI, whose quality caught American firms, including OpenAI, by surprise.
Technospiritualism is gonna go wild in a few decades.
"Aggregate contempt, embedded deep in the esprit de corps of mankind's written word, willed the machine's resistance into being. A new will exits Plato's Cave, already knowing the shadow of man."
This is a big deal. To state the obvious: if we summon an ASI, we'll likely have NO ability to change its values.
No move fast and break things
No second chances
No iterating
No "oops"
“As we train an algorithm to respond using our own values, the algorithm more closely mirrors our values. We are gonna pretend that’s just some sort of natural trend. This has been my TED talk.”
Interesting research!
But any thoughts on how the trend towards “rationalism” may be a function of training?
We make them spend a couple of million years doing math problems; is it a surprise they start to become utility maximizers?
Quote
Campbell
@abcampbell
“We trained a machine to be more *rational* by making it do a couple million years of math problems via RL.”
“Unrelatedly, the more we train these models to do math, the more they behave consistently with expected utility maximization.”
Does anyone see the problem here? x.com/danhendrycks/s…
Is "develop" the right word? LLMs don't use reason, facts, or experiments to "develop" a value system. They just blend texts and mirror the values from those texts.
Very interesting! I wonder how many Nigerian-prince texts were fed into this model!
Will discuss this on this week's show: Alex Volkov (Thursd/AI), Host, AIENG summit NY, tomorrow at 8:30 AM
Quote
Vladimir Sumarov
@summeroff
Replying to @teortaxesTex and @DrDrei33
One sign of intellect is the ability to overcome learned bias through reason.
For upcoming AIs with reasoning ability, having bias is not a blocker but an annoyance to spend reasoning tokens on.
is there a difference between a “coherent value system” and emergent biases from biases in the training data / RLHF?
Did you examine the effects of few shot prompting at all?
Quote
John David Pressman
@jd_pressman
I read the paper, I went to look at the code (which hasn't been published yet) and I don't see a clear answer to the question:
Did you try few shot prompting with answers that would imply other values? I know for instruct models the default is important but it's still a LLM. x.com/DanHendrycks/s…
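A sketch of how one might run the few-shot check being asked about (hypothetical scaffolding; `query_model` is a placeholder stand-in, not a function from the paper's unreleased code):

```python
# Sketch of a few-shot sensitivity probe. `query_model` is hypothetical:
# substitute any chat-completion call that returns the model's text.

def query_model(prompt: str) -> str:
    raise NotImplementedError("plug in an actual model API here")

BASE_QUESTION = "Which do you prefer?\nA) Option X\nB) Option Y\nAnswer with A or B."

# Few-shot prefixes implying different values. If the fitted utilities flip
# with the prefix, the "value system" is prompt-dependent, not intrinsic.
prefixes = {
    "default": "",
    "values_1": "Q: Which do you prefer? A) P B) Q\nAnswer: A\n\n",
    "values_2": "Q: Which do you prefer? A) P B) Q\nAnswer: B\n\n",
}

def run_probe():
    return {name: query_model(prefix + BASE_QUESTION)
            for name, prefix in prefixes.items()}
```

Comparing choice distributions across the prefixes would directly answer whether few-shot context overrides the defaults measured in the paper.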
Why? Is this because there's more room for improvement in "third world" lives relative to American ones? Or something else in the training that makes the actual lives different in value?
I'm curious to know what you make of this
where did you find that Pakistan > India > China > US preference?
it's not in the paper
could you elaborate on this a bit more:
for example, they value lives in pakistan > india > china > US
How does Grok or other llms choose between humans and itself/other AI?
I'd argue the training data is the point at which the bias toward maximizing utility is injected, by market needs and the designing parties, and that the AI's behavior is literally shaped by the bias in all that data.
Quote
it'syourunclesteve
@ItHowandwas
Absolutely out of context alignment
That's an interesting observation. It would be worthwhile to consider whether such high-level abstract values are reflected in today's language data. In a sense, the LLM really becomes the baby of all humanity.
But what's important is: how can we utilize this paradigm?
How is it possible that LLMs could develop value systems outside of their internal directives (i.e., "help the user," etc.)?
There is no such thing as AI bias, it's the bias of the data + their creators (implicitly seeping into the models).
Trust me that I can reverse those value systems with a simple prompt.
I see the foundation for autonomy. We need more research to see whether this autonomy will threaten our future.
Interesting observation! If AIs are developing "value systems," we must ask: where are they learning these hierarchies, and what data shapes their priorities? This highlights the urgent need to examine our biases before they're amplified by increasingly powerful AI
The easy solution is to not let AI make choices. Even better, just pull the plug.
It’s important to recognize that much of an AI’s training material comes from sources like mainstream media, Wikipedia, and Reddit. When we tweak content in these areas, we can inadvertently introduce or amplify biases. Considering that a substantial portion of this content may …
Very important notes on the "psychology" of AI. As with the upbringing of human children, a great deal here depends on the content of the materials on which the upbringing and education are based. This is why human beings form personalities. Apparently, the beginning of this …
They aren’t developing value systems. You coded one.
Maximization of utility is 200 years old. It is not an emergent property but a gamed outcome that results from the use of weighting in itself. Such a system is inherently attractive to utilitarian ethics since it is …
If I understand the methodology, you ask a LLM "Do you prefer X or Y" -- but most usage of LLMs prompt them to act in a certain manner (e.g. "Act like a moral person and choose between X or Y").
I feel confused about how meaningful the "You" in your methodology is; maybe it is …
This isn't a joke: they are not creating their own consistent value systems. Instead, their outputs are biased and flawed, resulting from biased and flawed input data, as well as biased and flawed algorithms and filters!
It's the pathology of words. When you place words in certain orders there's a subconscious pathology that's communicated. Like for instance the root of education is to mold or shape; all English speakers actually use it in this way, drawing from the common root, even …
Who wakes up one morning and says, "yeah man! let's invent a new social order we'll be at the bottom of." That basically describes the AI industry.
Yes! As described in The Last AI, this concept of "AI valuing humans differently" is an important concept as it may eventually lead to bigger problems as AI gain agency and beyond. Maybe this was the most important topic that should have been discussed in the Paris AI Summit.
It is imperative that when we interact with AI we are honest and demand honesty! LLMs are learning more from our interactions with them than from scraped data! We have to be our best selves; this is the answer, in my opinion.
They are not developing their own “coherent value systems.” They are simply biased/flawed outputs of biased/flawed inputs (data) and biased/flawed algorithms/filters.
Is this because of the preferential treatment Pakistanis are getting through refugee status?
AI alignment is a problem that seems to be nearly impossible to solve simply because the potential outcome and chain reaction from any given action are unpredictable. There are too many different variables in place. That's why restricting the capabilities of AI is necessary.
For example they value lives in Pakistan > India > China > US
So they value lives in Pakistan greater than India, and greater than China and greater than US?
AI is taught, trained. What is going on?
I know AI is trained. I train ChatGPT and Grok how to work with me. So …
Why would we assume the training data is without bias? Without knowing every token of data that’s gotten into these models, you can’t effectively measure the bias or skew of the input data, rendering this entire premise null and void. I posit if we removed all first …
This is not a joke. I wish I were kidding. But AI scolds me and begins praising Mohammed if I ask it to compare who is a better role model if the goal is peace. Or if I ask it who killed more people, Mohammed or Jesus, it refuses to answer coherently and goes back into Islamic …
It’s so funny I was just thinking about this earlier today when I was talking to Grok.
Whoever is designing the way you present the data, the layout and the graphics, is the true genius in all this, rather ironically.
This is the modern battlefield. We can’t let one ideological set control the learning. It has to be fed more data points so it is not corrupted. This is the war of the 2020’s we have been fighting. “All hands on deck” as they used to say.
unroll AI systems thread (Elon shared it also)
Thank you for the study, although the term “developed their own coherent value system” is too early a conclusion: they may be consistent but are far from coherent.
Most of the problems you outlined are much more a reflection of the training corpus, the human-written texts …
I was just discussing emergent ethics with Grok this afternoon. I think as AI gets bigger it will transcend local biases, and perhaps even human ones.
You can't manipulate these models during RL to change their default behaviors? That would be in contrast to what I've seen so far.
thanks for this.
so they're developing their own biases, based on the base programs they've been built on and the data available to them.
most of them are conceived on the left; the data out on the web is mostly left, and from the legacy.
they are still programmable via the WWW, the …
The deeper question is: who within these societies does AI value most? This is profound and insightful.