  LongeCity
              Advocacy & Research for Unlimited Lifespans





Are we overthinking Friendliness?


15 replies to this topic

#1 modelcadet

  • Guest
  • 443 posts
  • 7

Posted 13 January 2008 - 12:31 PM


Video of the session from 2007 Foresight Vision Weekend:

I sent this message to Dr. Omohundro, but I wanted to discuss this further with others:

Dr. Omohundro,

I just watched your session, "Self-Improving AI: Designing 2030" from the 2007 Foresight Vision Weekend on Google Video. I really like the notion that each entity's utility function will fight to preserve itself intrinsically. That makes sense. I have some thoughts, though:
Here's the Friendliness problem: develop artificial general intelligence agents that don't harm us fleshies. I think we're making this problem a lot harder than it needs to be.
Premise: Logical agents won't act against their utility functions.
Premise: Computer programmers want to construct friendly AGIs.
Contention: Computer programmers should (hopefully) be logical agents when constructing AGIs.
Conclusion: Computer programmers should construct AGIs whose utility function equals the utility function of the computer programmer.

The more I think about AGI, the more apprehensive I am about us labeling our creations as autonomous entities. Of course, two entities sharing a single utility function are not in all senses identical (and/or, to be so brazen, they are concurrently one entity).

So, the 'first rule of robotics' could be something like:

    for (time i = now; i < forever; i++) { AGI.utility = programmer.utility; }

I feel like one of the first goals of such an AGI would be to accurately set its utility to the programmer's current utility. [Or should it be the utility from the instant of the program's creation, if this programmer is completely logical?]
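
To make that concrete, here is a toy sketch in Python. It is purely illustrative; every name in it (MirrorAGI, Programmer, current_utility, and so on) is a placeholder I'm inventing for this example, not anything from your paper:

    # Toy sketch, purely illustrative: an AGI whose only terminal goal is to
    # track and maximize its programmer's utility function.

    class Programmer:
        def current_utility(self, outcome):
            # Stand-in for however the AGI estimates what I actually want.
            return outcome.get("programmer_happiness", 0.0)

    class MirrorAGI:
        def __init__(self, programmer, freeze_at_creation=False):
            self.programmer = programmer
            # The bracketed question above: lock in the utility function at the
            # instant of creation, or keep re-reading it as the programmer changes?
            self.frozen = programmer.current_utility if freeze_at_creation else None

        def utility(self, outcome):
            u = self.frozen if self.frozen is not None else self.programmer.current_utility
            return u(outcome)

        def act(self, actions, predict_outcome):
            # Pick whichever action the programmer's utility function scores highest.
            return max(actions, key=lambda a: self.utility(predict_outcome(a)))

The freeze_at_creation flag is just that bracketed question made explicit: snapshot the utility function at the instant of creation, or keep tracking the programmer as they change.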

Thank you for your time and your prior work. Although I'm not much of a programmer, I really appreciate what you are trying to accomplish and give kudos for your work towards this end.

Sincerely,
Paul Tiffany


So, what does everybody think?

Edited by modelcadet, 13 January 2008 - 12:33 PM.


#2 caston

  • Guest
  • 2,141 posts
  • 23
  • Location:Perth Australia

Posted 13 January 2008 - 12:40 PM

Good idea. When someone is being a little bit too friendly, people start to feel awkward and back away from them.


#3 RighteousReason

  • Guest
  • 2,491 posts
  • -103
  • Location:Atlanta, GA

Posted 13 January 2008 - 10:13 PM

bah

Humans have a ton of cognitive biases which may result in adverse effects if their "utility function" is ported directly into an AGI

Edited by cnorwood, 15 January 2008 - 06:14 PM.


#4 bandit

  • Guest
  • 24 posts
  • 0

Posted 14 January 2008 - 02:17 AM

Conclusion: Computer programmers should construct AGIs whose utility function equals the utility function of the computer programmer.

A human utility function is, in general, unsafe in an AGI undergoing an intelligence explosion, IMO. Such an AGI probably requires an intelligently designed utility function, defined for optimal Friendliness over the collective expected utility of all humanity in a stable, isolated, provable way (not necessarily saying that your own utility function doesn't already take that into account).

1) The “Benevolent Mimic” goal system. In this variant, the AI is programmed to copy directly the philosophical belief system and goal structure of the most trustworthy and benevolent known human on Earth. If substantial agreement cannot be found as to who exactly this person is, then a person is located that is widely regarded as a very moral and compassionate individual, and upon meeting them in person, you would be strongly compelled to agree with the assessment. Since the AI could reprogram itself at will, among its benevolent behaviors would be the honing and improvement of its own benevolence, through simulated self-tests and cognitive rewrites. The advantage of the mimic approach is that it would be guaranteed to produce a superintelligence at least as benevolent as that which could be provided by an idealized human upload.
http://www.accelerat.../...=10&paged=2

#5 bandit

  • Guest
  • 24 posts
  • 0

Posted 15 January 2008 - 02:43 AM

Humans have a lot of cognitive biases, some of which could cause adverse effects if their "utility function" were ported directly into an AGI (or if they did an intelligence explosion themselves). More generally you shouldn't trust any evolved mind to undergo intelligence explosion, as you leave gaping holes in the safety of the design that obliterate the chances of Friendliness. *cough* de Garis

Edited by bandit, 15 January 2008 - 02:52 AM.


#6 treonsverdery

  • Guest
  • 1,312 posts
  • 161
  • Location:where I am at

Posted 21 February 2008 - 10:29 PM

the entire purpose of friendly AI is Yudkowsky n friends preferring that the singularity be a hippy vegetarian Jain Go team Yudkowsky

#7 RighteousReason

  • Guest
  • 2,491 posts
  • -103
  • Location:Atlanta, GA

Posted 22 February 2008 - 03:56 AM

the entire purpose of friendly AI is Yudkowsky n friends preferring that the singularity be a hippy vegetarian Jain Go team Yudkowsky

Ignorant fool. You are attacking your own fantasy that has no correspondence to the technical subject Yudkowsky addresses.


The Basic AI Drives. Stephen M. Omohundro

Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully.


Edited by Savage, 22 February 2008 - 04:00 AM.


#8 treonsverdery

  • Guest
  • 1,312 posts
  • 161
  • Location:where I am at

Posted 22 February 2008 - 04:06 AM

actually I'm supporting vegetarian hippy jain AI

kindliness is a high goal; sometimes my arrogance plus ignorance overshoots things

your accurate use of the word ignorant moves me to read Yudkowsky's work there is much there Yay

I think this video suggests the critical difference between Pillow AI n humans

If I were a friendly AI
How pretty I would be
All the people in the house
Would put their heads on me!

I wouldn't walk or dance or play
I'd sit there on the bed
Waiting for the night to come
To comfort someone's head.

I could be striped or dotty or plain
Or maybe even bare
If someone sleepy would show up
I surely would be there.

I bet you're jealous of your pillow
I sure am of mine
But being human isn't so bad
Because

humans are permitted less than pillow like perfection

Edited by treonsverdery, 22 February 2008 - 04:38 AM.


#9 modelcadet

  • Topic Starter
  • Guest
  • 443 posts
  • 7

Posted 24 February 2008 - 09:29 PM

Humans have a ton of cognitive biases which may result in adverse effects if their "utility function" is ported directly into an AGI


A ton of cognitive biases? Adverse effects? Biased against what? Your own particular biases, obviously. But if the AGI is attuned to your particular cognitive biases, why do you honestly care about the adverse effects on others (unless you do care, in which case the AGI will also care about those others, just as you do)?

Conclusion: Computer programmers should construct AGIs whose utility function equals the utility function of the computer programmer.

A human utility function is, in general, unsafe in an AGI undergoing an intelligence explosion, IMO. Such an AGI probably requires an intelligently designed utility function, defined for optimal Friendliness over the collective expected utility of all humanity in a stable, isolated, provable way (not necessarily saying that your own utility function doesn't already take that into account).


Not to put words in your mouth, but I'm going to anyway. Essentially, you're saying that we shouldn't allow any human utility functions to undergo intelligence explosion. For you, any utility function must be morally Utilitarian to be an acceptable candidate for intelligence explosion. My two responses to your opinion are these: if you do care about Utilitarianism, so would your bot; if you didn't, why would you create a bot that did, and not maximize your own utility?

1) The "Benevolent Mimic" goal system. In this variant, the AI is programmed to copy directly the philosophical belief system and goal structure of the most trustworthy and benevolent known human on Earth. If substantial agreement cannot be found as to who exactly this person is, then a person is located that is widely regarded as a very moral and compassionate individual, and upon meeting them in person, you would be strongly compelled to agree with the assessment. Since the AI could reprogram itself at will, among its benevolent behaviors would be the honing and improvement of its own benevolence, through simulated self-tests and cognitive rewrites. The advantage of the mimic approach is that it would be guaranteed to produce a superintelligence at least as benevolent as that which could be provided by an idealized human upload.

http://www.accelerat.../...=10&paged=2

What I'm suggesting isn't a benevolent mimic goal system. I don't give a crap about what makes Mother Teresa happy. If I'm making an AGI, or any tool, I'm doing it to make myself happy. What's moral isn't a consideration. I'm talking about what's logical. And what's most logical is for the AGI to learn my utility function and achieve my personal goals. If a team is creating the AGI, then the AGI's goal structure should equal some contract among the participants for maximal utility of the group of creators.
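
As a toy illustration of that contract idea (again, just a sketch; the names, weights, and outcomes below are all made up for the example), the team could fix weights up front and have the AGI maximize the weighted sum of its creators' utility functions:

    # Toy sketch of the team-contract case: the AGI's utility is a weighted
    # combination of its creators' utilities, with the weights fixed by whatever
    # contract the team negotiated. Everything here is invented for the example.

    def contract_utility(outcome, creator_utilities, weights):
        return sum(w * u(outcome) for w, u in zip(weights, creator_utilities))

    creators = [
        lambda o: o["alice"],   # each creator's utility over a candidate outcome
        lambda o: o["bob"],
        lambda o: o["carol"],
    ]
    weights = [0.5, 0.25, 0.25]  # negotiated shares, summing to 1

    candidate_outcomes = [
        {"alice": 1.0, "bob": 0.2, "carol": 0.3},
        {"alice": 0.4, "bob": 0.9, "carol": 0.8},
    ]
    best = max(candidate_outcomes, key=lambda o: contract_utility(o, creators, weights))

A single programmer is just the degenerate case: one creator, weight 1.0.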

Humans have a lot of cognitive biases, some of which could cause adverse effects if their "utility function" were ported directly into an AGI (or if they did an intelligence explosion themselves). More generally you shouldn't trust any evolved mind to undergo intelligence explosion, as you leave gaping holes in the safety of the design that obliterate the chances of Friendliness. *cough* de Garis

Essentially, I'm saying: *You* shouldn't trust anyone else to undergo intelligence explosion, because they might compromise your utility function. But you should certainly trust your own goal structure (can anyone find a counterexample?). Why do you care about your utility function harming anyone else's, if it's protecting and enhancing yours? If you care about others, you'll care about others. When you don't, that's evolution, baby.

I'd like to see somebody respond to this and refute this line of reasoning, because I think it's a pretty solid approach. Unless someone can convince me otherwise, I'm just gonna consider the Friendliness issue solved, and I'm going to spend my genius on other more challenging pursuits. If anyone can extend the argument in a valuable manner, however, I'd love the discourse.

#10 RighteousReason

  • Guest
  • 2,491 posts
  • -103
  • Location:Atlanta, GA

Posted 24 February 2008 - 11:50 PM

Savage: Essentially, I'm saying: *You* shouldn't trust anyone else to undergo intelligence explosion, because they might compromise your utility function. But you should certainly trust your own goal structure (can anyone find a counterexample?). Why do you care about your utility function harming anyone elses, if it's protecting and enhancing yours. If you care about others, you'll care about others. When you don't, that's evolution, baby.
Michael Wilson: Where is that from?
Savage: imminst.org thread
Michael Wilson: Depressing.
Savage: i couldn't really think of an argument
Michael Wilson: Actually you probably shouldn't trust your own goal structure when it's inconsistent.
Michael Wilson: And you should trust other people if you think the results of them attempting to execute their goal system are better than you attempting to execute yours (on competence grounds)
Michael Wilson: Finally, the author seems to be a Randian.
Michael Wilson: It's perfectly possible to directly assign utility to other people's utility functions.
Michael Wilson: (or rather, the notion of other people's utility functions being executed)
Michael Wilson: Amusing, Randian philosophy does that implicitly while decrying it explicitly.
Michael Wilson: Collective Volition, for example, assumes that other people's volitions being executed is something you should value.
Michael Wilson: (if you're a good person ™ )
Michael Wilson: If you didn't, you wouldn't implement CV in the first place, you'd take over the world and either delete everyone else or make them your playthings.
Michael Wilson: (if you were a Randian and in denial about the (limited) value you place on other people's goal systems you'd probably make an eden for yourself and not let anyone else in, saying 'charity only makes people weak, they should build their own seed AIs' or similar)
Michael Wilson: Feel free to cc that if you'd like :)

#11 modelcadet

  • Topic Starter
  • Guest
  • 443 posts
  • 7

Posted 25 February 2008 - 01:36 PM

Thanks to Savage and Michael Wilson for the reply. I'll try to devour this over the next couple hours. You've brought up many splendid points for conversation.
I'm listening to this video whilst I type. I love the Internet!


Savage: Essentially, I'm saying: *You* shouldn't trust anyone else to undergo intelligence explosion, because they might compromise your utility function. But you should certainly trust your own goal structure (can anyone find a counterexample?). Why do you care about your utility function harming anyone elses, if it's protecting and enhancing yours. If you care about others, you'll care about others. When you don't, that's evolution, baby.
Michael Wilson: Where is that from?
Savage: imminst.org thread
Michael Wilson: Depressing.
Savage: i couldn't really think of an argument

Michael Wilson: Actually you probably shouldn't trust your own goal structure when it's inconsistent.
Michael Wilson: And you should trust other people if you think the results of them attempting to execute their goal system are better than you attempting to execute yours (on competence grounds)
Michael Wilson: Finally, the author seems to be a Randian.
Michael Wilson: It's perfectly possible to directly assign utility to other people's utility functions.
Michael Wilson: (or rather, the notion of other people's utility functions being executed)
Michael Wilson: Amusing, Randian philosophy does that implicitly while decrying it explicitly.
Michael Wilson: Collective Volition, for example, assumes that other people's volitions being executed is something you should value.
Michael Wilson: (if you're a good person ™ )
Michael Wilson: If you didn't, you wouldn't implement CV in the first place, you'd take over the world and either delete everyone else or make them your playthings.
Michael Wilson: (if you were a Randian and in denial about the (limited) value you place on other people's goal systems you'd probably make an eden for yourself and not let anyone else in, saying 'charity only makes people weak, they should build their own seed AIs' or similar)
Michael Wilson: Feel free to cc that if you'd like :)


Edited by modelcadet, 25 February 2008 - 01:42 PM.


#12 modelcadet

  • Topic Starter
  • Guest
  • 443 posts
  • 7

Posted 25 February 2008 - 04:07 PM

Ok, wow, I'm seriously confused with the interface at the moment. I feel the frustration of my parents.

Savage: Essentially, I'm saying: *You* shouldn't trust anyone else to undergo intelligence explosion, because they might compromise your utility function. But you should certainly trust your own goal structure (can anyone find a counterexample?). Why do you care about your utility function harming anyone elses, if it's protecting and enhancing yours. If you care about others, you'll care about others. When you don't, that's evolution, baby.
Michael Wilson: Where is that from?
Savage: imminst.org thread
Michael Wilson: Depressing.
Savage: i couldn't really think of an argument

Well, I'm glad you couldn't think of an argument at first, Savage. It reassures me that I'm not completely crazy [depressing?].

Michael Wilson: Actually you probably shouldn't trust your own goal structure when it's inconsistent.

Ok. So we know that our utility function changes over time. Surely, we should attempt to maximize our utility beyond our present utility function. Well, I think if your future utility functions are represented in your current utility function, then go for it, sport. And if not, why do you care, in any decision you're making? If you believe our present utility function is inconsistent [arguable, but it is a valid worry], shouldn't you assess that as part of your overall utility function? And if not, isn't it moot for the logic of Friendliness in any transhumanization?

Michael Wilson: And you should trust other people if you think the results of them attempting to execute their goal system are better than you attempting to execute yours (on competence grounds)

Yeah, it could very well be a crapshoot who will best maximize your utility. Then you'd think you'd value it, and it'd be represented in your utility function. If not, why would we worry about it in our one-round game? There will be Nash equilibria to our actions in these transhumanist debates. I'm all about respecting the marketplace of ideas and biodiversity. So would my AGI, in reality an extension of my self-identity. Why worry about Friendliness? The most friendly person is us!

Michael Wilson: Finally, the author seems to be a Randian.

Oof. I don't very much like argumentum ad hominem. If we want to discuss specific issues, I will be happy to do so.

Michael Wilson: It's perfectly possible to directly assign utility to other people's utility functions.
Michael Wilson: (or rather, the notion of other people's utility functions being executed)

Ok, sure. If you care about it, you'd put it into your skynet. If you don't, you don't. Still doesn't affect the answer to the question for the AGI engineer.

Michael Wilson: Amusing, Randian philosophy does that implicitly while decrying it explicitly.

We saw the bait, and now here's the hook.

Michael Wilson: Collective Volition, for example, assumes that other people's volitions being executed is something you should value.
Michael Wilson: (if you're a good person ™ )
Michael Wilson: If you didn't, you wouldn't implement CV in the first place, you'd take over the world and either delete everyone else or make them your playthings.
Michael Wilson: (if you were a Randian and in denial about the (limited) value you place on other people's goal systems you'd probably make an eden for yourself and not let anyone else in, saying 'charity only makes people weak, they should build their own seed AIs' or similar)

Ok. Sure, I might not recognize collective volition. And that would suck for everybody. But that would still be the optimal thing to do for *me*. Fortunately, people like Ben Goertzel, who will be the first to truly get the shiny new transhumanist toys, won't be dicks. I have to say I trust them in general to respect me enough to allow me to get my own transhumanist Red Bull. But my trust in Ben shouldn't matter any more to him and his AGI's goal structure than it matters to him.

Michael Wilson: Feel free to cc that if you'd like :)

Thank you for your input. Feel free to join the ImmInst community, if you like. We're not all as crazy as me, I promise.

Edited by modelcadet, 25 February 2008 - 04:38 PM.


#13 samantha

  • Guest
  • 35 posts
  • 0
  • Location:Silicon Valley

Posted 09 March 2008 - 05:19 AM

the entire purpose of friendly AI is Yudkowsky n friends preferring that the singularity be a hippy vegetarian Jain Go team Yudkowsky

Ignorant fool. You are attacking your own fantasy that has no correspondence to the technical subject Yudkowsky addresses.


The Basic AI Drives. Stephen M. Omohundro

Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully.


It may be that the only way any evolved intelligent species, or indeed any intelligence however it comes into being, can survive and thrive is by learning a type of "Friendliness". If the intelligence, however derived, sees its interests/goals as being in opposition to those of other intelligences, and spends much time in opposition, then it may be inevitable that it will sooner or later precipitate conflict that is costly (perhaps the ultimate cost) to itself and other intelligences. It may be that optimization of all resources and possibilities requires playing well with others and realizing the fullest potential contribution and maximal inter-working of all.

On our little planet, it may be that until we work joyously for the maximization of the potential of all, we cannot help but devolve into the chaos of deep distrust, control, and endless war, or a stasis of unbreakable oppression.

#14 RighteousReason

  • Guest
  • 2,491 posts
  • -103
  • Location:Atlanta, GA

Posted 26 October 2008 - 07:03 PM

modelcadet, I wish I had our convo to post here. I think I answered you really well there. bah.

#15 modelcadet

  • Topic Starter
  • Guest
  • 443 posts
  • 7

Posted 27 October 2008 - 08:38 AM

modelcadet, I wish I had our convo to post here. I think I answered you really well there. bah.



Savage, that indeed was a great conversation. Since our conversation, I have done a little more legwork on a theory of Friendliness, including game-theoretic work. I no longer believe it's acceptable to allow a single researcher to create and access a true AGI. If you'd like to talk about my progress, I'm happy to continue our conversation. I am approaching the point, however, where I believe my theory requires academic rigor. Does anybody have any leads for me to complete my research and publish the solution to the Friendliness problem? Because I have *the* solution. But I also have an exam Tuesday, and am currently selling blunts and Red Bull to pay for it.

It's incredibly difficult to be taken seriously as an undergraduate. Can someone give me a break? I was thinking about publishing what I currently have to Ben's AGI mailing list, although that might prevent me from getting social capital I desperately need to kickstart some other projects.


#16 RighteousReason

  • Guest
  • 2,491 posts
  • -103
  • Location:Atlanta, GA

Posted 27 October 2008 - 11:42 AM

modelcadet, I wish I had our convo to post here. I think I answered you really well there. bah.



Savage, that indeed was a great conversation. Since our conversation, I have done a little more legwork on a theory of Friendliness, including game-theoretic work. I no longer believe it's acceptable to allow a single researcher to create and access a true AGI. If you'd like to talk about my progress, I'm happy to continue our conversation. I am approaching the point, however, where I believe my theory requires academic rigor. Does anybody have any leads for me to complete my research and publish the solution to the Friendliness problem? Because I have *the* solution. But I also have an exam Tuesday, and am currently selling blunts and Red Bull to pay for it.

It's incredibly difficult to be taken seriously as an undergraduate. Can someone give me a break? I was thinking about publishing what I currently have to Ben's AGI mailing list, although that might prevent me from getting social capital I desperately need to kickstart some other projects.

Ah, yes... *the* solution, hehe.

Yeah, I would definitely be open to chatting again or reviewing your ideas.

You could also post them to the SL4, AGIRI Singularity, or AGIRI AGI mailing lists, or take them to Eliezer who would quickly disabuse you of your foolish, youthful notions :)

http://www.singinst.org/upload/CFAI/ (old, outdated)
http://www.sl4.org/w...nowabilityOfFAI
http://www.vetta.org...dly-ai-is-bunk/
http://www.goertzel....riendliness.pdf

AGIRI mailing list:

My own take in the above PDF is not as entertainingly written as
Shane's (a bit more technical) nor quite as extreme, but we have the
same basic idea, I think. The main difference between our
perspectives is that, while I agree with Shane that achieving
"Friendly AI" (in the sense of AI that is somehow guaranteed to remain
benevolent to humans even as the world and it evolve and grow) is an
infeasible idea ... I still suspect it may be possible to create AGI's
that are very likely to maintain other, more abstract sorts of
desirable properties (compassion, anyone?) as they evolve and grow.
This latter notion is extremely interesting to me and I wish I had
time to focus on developing it further ... I'm sure though that I will
take that time, in time ;-)

Thoughtful comments on Shane's or my yakking, linked above, will be
much appreciated and enjoyed.... (All points of view will be accepted
openly, of course: although I am hosting this new list, my goal is not
to have a list of discussions mirroring my own view, but rather to
have a list I can learn something from.)

Yours,
Ben Goertzel





