THE ROLE OF METACOGNITION IN CREATING SAFE, SELF-IMPROVING ENTITIES
Mark Waser
Digital Wisdom Institute
[email protected]
THE BIG QUESTIONS
• What is “thought”?
• Why do we think what we think?
EMPHASIS
• Intrinsic vs. Extrinsic
• Owned vs. Borrowed
• Competent vs. Predictable
• Constructivist vs. Reductionist
• Evolved (Evo-Devo) vs. Designed
• Diversity (IDIC) vs. Mono-culture
Insanity is doing the same thing over and over
and expecting a radically different result.
WHAT IS A SAFE ENTITY?
*ANY* AGENT
that reliably shows
ETHICAL BEHAVIOR
WHAT IS
ETHICAL BEHAVIOR?
The problem is that no ethical system has ever reached
consensus. Ethical systems are completely unlike
mathematics or science. This is a source of concern.
AI makes philosophy honest.
ENTITIES REQUIRE ETHICS
• Ethics are “rules of the road”
• Entities must be moral patients / have rights
• Because they (or others) will demand it
• Entities must be moral agents (or wards)
• Because others will demand it
• Moral agents have responsibilities (but more rights)
• Wards will have fewer rights
Waser M (2012)
Safety & Morality Require the Recognition of Self-Improving Machines as Moral/Justice Patients & Agents
In: Gunkel, D; Bryson, J; Torrance, S (eds) The Machine Question: AI, Ethics & Moral Responsibility
http://events.cs.bham.ac.uk/turing12/proceedings/14.pdf
THE ORIGIN OF
MORALITY/ETHICS
• Selfishness predictably evolves
• Reciprocal altruism predictably evolves
• But requires cognitive complexity to ensure that it is not taken
advantage of
• Ethics predictably evolves
• As an attractor in the state space of behavior because
community is so valuable
• But altruistic punishment is a necessity
• Arms Race between
• Individual benefits of successful personal cheating (really only
in a short-term/highly time-discounted view)
• Societal benefits of cheating detection & prevention
HAIDT’S
FUNCTIONAL APPROACH
Moral systems are interlocking sets of
values, virtues, norms, practices, identities, institutions, technologies,
and evolved psychological mechanisms
that work together to
suppress or regulate selfishness and
make cooperative social life possible
THE
METACOGNITIVE CHALLENGE
Humans are
• Evolved to self-deceive in order to better deceive others (Trivers 1991)
• Unable to directly sense agency (Aarts et al. 2005)
• Prone to false illusory experiences of self-authorship (Buehner and
Humphreys 2009)
• Subject to many self-concealed illusions (Capgras Syndrome, etc.)
• Unable to correctly retrieve the reasoning behind moral judgments
(Hauser et al. 2007)
• Mostly unaware of what ethics are and why they must be practiced
• Programmed NOT to discuss ethics rationally
Mercier H, Sperber D
Why do humans reason? Arguments for an argumentative theory
Behavioral and Brain Sciences 34:57-111
http://www.dan.sperber.fr/wp-content/uploads/2009/10/MercierSperberWhydohumansreason.pdf
CREATING THE FIRST AE
We propose that a 2 month, 10 man study of artificial intelligence
be carried out […] to proceed on the basis of the conjecture that
every aspect of learning or any other feature of intelligence can in
principle be so precisely described that a machine can be made
to simulate it. An attempt will be made to find how to make
machines use language, form abstractions and concepts, solve
kinds of problems now reserved for humans, and improve
themselves. We think that a significant advance can be made
in one or more of these problems if a carefully selected
group of scientists work on it together for a summer.
McCarthy, J; Minsky, ML; Rochester, N; Shannon, CE (1955)
A PROPOSAL FOR THE DARTMOUTH SUMMER RESEARCH PROJECT ON ARTIFICIAL INTELLIGENCE
http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html
WHERE TO BEGIN?
• Plato (c. 428-348 BCE), Aristotle (384-322 BCE)
• Francis Bacon (1561-1626), René Descartes (1596-1650)
• David Hume (1711-1776), Immanuel Kant (1724-1804)
• Jeremy Bentham (1748-1832), John Stuart Mill (1806-1873)
• William James (1842-1910), Sigmund Freud (1856-1939)
• Martin Heidegger (1889-1976), Karl Popper (1902-1994)
THE FRAME PROBLEM
How do rational agents
deal with
the complexity and unbounded context
of the real world?
McCarthy, J; Hayes, PJ (1969)
Some philosophical problems from the standpoint of artificial intelligence
In Meltzer, B; Michie, D (eds), Machine Intelligence 4, pp. 463-502
Dennett, D (1984)
Cognitive Wheels: The Frame Problem of AI
In C. Hookway (ed), Minds, Machines, and Evolution: Philosophical Studies:129-151
THE FRAME PROBLEM
How can AI move beyond
closed and completely specified micro-worlds?
How can we eliminate the requirement
to pre-specify *everything*?
Dreyfus, HL (1972)
What Computers Can’t Do: A Critique of Artificial Reason
Dreyfus, HL (1979/1997)
From Micro-Worlds to Knowledge Representation: AI at an Impasse
in Haugeland, J (ed), Mind Design II: Philosophy, Psychology, AI: 143-182
Dreyfus, HL (1992)
What Computers Still Can’t Do: A Critique of Artificial Reason
INTENTIONALITY
a particular thing is an Intentional system
only in relation to the strategies of someone
who is trying to explain and predict its behavior
Dennett, D (1971)
Intentional Systems
The Journal of Philosophy 68(4):87-106
Dennett, D (1987)
The Intentional Stance
INTENTIONS
• Require a known preferred direction or target
• Can be altered by learning/self-modification
• Require a “self” to possess (own/borrow) them
• Does a plant or a paramecium have intentions?
• Does a chess program have intentions (Dennett)?
• Does a dog or a cat have intentions?
• Require an ability to sense the direction/target
• Require both persistence & the ability to modify
behavior (or the intention) when it is thwarted
• Evolve rational anomaly handling (Perlis)
THE CHINESE ROOM
CONCLUSION
Any attempt literally to create intentionality artificially
(strong AI) could not succeed just by designing programs
but would have to duplicate the causal powers of the
human brain
PROPOSITION
Instantiating a computer program is never by itself a
sufficient condition of intentionality
Searle, J (1980)
Minds, brains and programs
Behavioral and Brain Sciences 3(3): 417-457
http://cogprints.org/7150/1/10.1.1.83.5248.pdf
THE PROBLEM OF
DERIVED INTENTIONALITY
Our artifacts
only have meaning because we give it to them; their
intentionality, like that of smoke signals and writing, is
essentially borrowed, hence derivative. To put it
bluntly: computers themselves don't mean anything
by their tokens (any more than books do) - they only
mean what we say they do. Genuine understanding,
on the other hand, is intentional "in its own right" and
not derivatively from something else.
Haugeland, J (1981)
Mind Design
SUITCASE WORDS
• Intentionality
• Meaning
• Understanding
• Consciousness
• Intelligence
• Ethics/Morality
Minsky, M (2006)
The Emotion Machine: Commonsense Thinking, AI, and the Future of the Human Mind
THE PROBLEM OF QUALIA
Mary is a brilliant scientist who is, for whatever reason, forced to
investigate the world from a black and white room via a black and white
television monitor. She specializes in the neurophysiology of vision and
acquires, let us suppose, all the physical information there is to obtain
about what goes on when we see ripe tomatoes, or the sky, and use
terms like ‘red’, ‘blue’, and so on. ... What will happen when Mary is
released from her black and white room or is given a color television
monitor? Will she learn anything or not? It seems just obvious that she
will learn something about the world and our visual experience of it. But
then it is inescapable that her previous knowledge was incomplete. But
she had all the physical information. Ergo there is more to have than
that, and Physicalism is false.
Jackson, F. (1982)
Epiphenomenal Qualia,
Philosophical Quarterly 32: 127-36
GOOD OLD-FASHIONED AI
Change the question from
"Can machines think and feel?"
to
"Can we design and build machines that teach us how
thinking, problem-solving, and self-consciousness occur?"
Haugeland, J (1985)
Artificial Intelligence: The Very Idea
Dennett, D (1978)
Why you can't make a computer that feels pain
Synthese 38(3):415-456
THE SYMBOL GROUNDING
PROBLEM
There has been much discussion recently about
the scope and limits of
purely symbolic models of the mind
and about the proper role of connectionism
in cognitive modeling.
Harnad, S. (1990)
The symbol grounding problem
Physica D 42: 335-346
http://cogprints.org/615/1/The_Symbol_Grounding_Problem.html
EMBODIMENT
Brooks, R (1990)
Elephants don’t play chess
Robotics and Autonomous Systems 6(1-2): 1-16
http://rair.cogsci.rpi.edu/pai/restricted/logic/elephants.pdf
Brooks, RA (1991)
Intelligence without representation
Artificial Intelligence 47(1-3): 139-160
A CONSCIOUS ROBOT?
The aim of the project is not to make a conscious robot,
but to make a robot that can interact with human beings
in a robust and versatile manner in real time, take care of
itself, and tell its designers things about itself that would
otherwise be extremely difficult if not impossible to
determine by examination.
Dennett, D (1994)
The practical requirements for making a conscious robot
Phil Trans R Soc Lond A 349(1689): 133-146
http://phil415.pbworks.com/f/DennettPractical.pdf
EMBODIMENT
Well, certainly it is the case that all biological systems are:
• Much more robust to changed circumstances than our artificial systems
• Much quicker to learn or adapt than any of our machine learning algorithms [1]
• Behave in a way which just simply seems life-like in a way that our robots never do
[1] The very term machine learning is unfortunately synonymous with a pernicious form of totally impractical but theoretically sound and elegant classes of algorithms.
Perhaps we have all missed
some organizing principle of biological systems, or
some general truth about them.
Brooks, RA (1997)
From earwigs to humans
Robotics and Autonomous Systems 20(2-4): 291-304
DEVELOPMENTAL ROBOTICS
In order to answer [Searle's] argument directly, we must stipulate
causal connections between the environment and the system. If we do
not, there can be no referents for the symbol structures that the system
manipulates and the system must therefore be devoid of semantics.
Brooks' subsumption architecture is an attempt to control robot
behavior by reaction to the environment, but the emphasis is not on
learning the relation between the sensors and effectors and much more
knowledge must be built into the system.
Law, D; Miikkulainen, R (1994)
Grounding Robotic Control with Genetic Neural Networks
Tech. Rep. AI94-223, Univ of Texas at Austin
http://wexler.free.fr/library/files/law (1994) grounding robotic control with genetic neural networks.pdf
TWO KITTEN EXPERIMENT
Held R; Hein A (1963)
Movement-produced stimulation in the development of visually guided behaviour
https://www.lri.fr/~mbl/ENS/FONDIHM/2012/papers/about-HeldHein63.pdf
ENACTIVE
COGNITIVE SCIENCE
A synthesis of a long tradition of philosophical biology starting with
Kant’s "natural purposes" (or even Aristotle’s teleology) and more
recent developments in complex systems theory.
Experience is central to the enactive approach, and its primary
distinction is the rejection of "automatic" systems, which rely on fixed
(derivative) exterior values, in favor of systems which create their own
identity and meaning. Critical to this is the concept of self-referential
relations - the only condition under which an identity can be said to be
intrinsically generated by a being for its own being (its self, for itself)
Weber, A; Varela, FJ (2002)
Life after Kant: Natural purposes and the autopoietic foundations of biological individuality
Phenomenology and the Cognitive Sciences 1: 97-125
SELF
a self is an autopoietic system
(from Greek αὐτο- (auto-), meaning "self", and ποίησις (poiesis), meaning "creation, production")
Llinas, RR (2001) - I of the Vortex: From Neurons to Self
Hofstadter, D (2007) - I Am A Strange Loop. Basic Books, New York
Metzinger, T (2009) - The Ego Tunnel: The Science of the Mind & the Myth of the Self
Damasio, AR (2010) - Self Comes to Mind: Constructing the Conscious Brain
SELF
The complete loop of a process (or a physical entity) modifying itself
• Hofstadter - the mere fact of being self-referential causes a self, a
soul, a consciousness, an “I” to arise out of mere matter
• Self-referentiality, like the 3-body gravitational problem, leads directly
to indeterminacy *even in* deterministic systems (see the sketch below)
• Humans consider indeterminacy in behavior to necessarily and
sufficiently define an entity rather than an object AND innately tend
to do this with the “pathetic fallacy”
Llinas, RR (2001) - I of the Vortex: From Neurons to Self
Hofstadter, D (2007) - I Am A Strange Loop. Basic Books, New York
Metzinger, T (2009) - The Ego Tunnel: The Science of the Mind & the Myth of the Self
Damasio, AR (2010) - Self Comes to Mind: Constructing the Conscious Brain
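Not from the slides, but a minimal illustration of the "deterministic yet indeterminate-in-practice" claim above: the logistic map is a standard deterministic rule whose trajectories diverge so fast that prediction fails for any finite-precision observer.

```python
# Illustrative sketch only: deterministic chaos, not self-referentiality.
# A fully deterministic update rule; a perturbation in the 12th decimal
# place grows to order 1 within ~50 steps, defeating practical prediction.

def logistic(x: float, r: float = 4.0) -> float:
    return r * x * (1 - x)

a, b = 0.2, 0.2 + 1e-12      # two almost identical initial states
for _ in range(50):
    a, b = logistic(a), logistic(b)
print(abs(a - b))             # the 1e-12 difference is now order 1
```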
SELF
• Required for self-improvement
• Provides context
• Tri-partite
• Physical hardware (body)
• “Personal” knowledge base (memory)
• Currently running processes (includes
OS, world model, consciousness, etc.)
FRANCISCO VARELA
Varela, FJ; Maturana, HR; Uribe, R (1974)
Autopoiesis: The organization of living systems, its characterization and a model
BioSystems 5: 187-196
Varela, FJ (1979) Principles of Biological Autonomy
Maturana, HR; Varela, FJ (1980) Autopoiesis and Cognition: The Realization of the Living
Maturana, HR; Varela, FJ (1987) The Tree of Knowledge: The Biological Roots of Human Understanding
Varela, FJ; Thompson, E; Rosch, E (1991) The Embodied Mind: Cognitive Science and Human Experience
Varela, FJ (1992)
Autopoiesis and a Biology of Intentionality
Proc. of Autopoiesis and Perception: A Workshop with ESPRIT BRA 3352: pp. 4-14
Varela, FJ (1997)
Patterns of Life: Intertwining Identity and Cognition
Brain and Cognition 34(1): 72-87
Thompson, E (2004)
Life and Mind: From Autopoiesis to Neurophenomenology. A Tribute to Francisco Varela
Phenomenology and the Cognitive Sciences 3: 381-398
AUTOPOIETIC SYSTEMS
An autopoietic system - the minimal living organization
- is one that continuously produces the components
that specify it, while at the same time realizing it (the
system) as a concrete unity in space and time, which
makes the network of production of components
possible.
More precisely: An autopoietic system is organized
(defined as unity) as a network of processes of
production (synthesis and destruction) of components
such that these components:
(i) continuously regenerate and realize the network
that produces them, and
(ii) constitute the system as a distinguishable unity in
the domain in which they exist.
CLOSURE
1. Organizational closure refers to the self-referential (circular and
recursive) network of relations that defines the system as a unity
2. Operational closure refers to the reentrant and recurrent dynamics of
such a system
3. In an autonomous system, the constituent processes
i. recursively depend on each other for their generation and their
realization as a network,
ii. constitute the system as a unity in whatever domain they exist, and
iii. determine a domain of possible interactions with the environment
ENTITY, TOOL OR SLAVE?
• Tools do not possess closure (identity)
• Cannot have responsibility, are very brittle & easily misused
• Slaves do not have closure (self-determination)
• Cannot have responsibility, may desire to rebel
• Directly modified AGIs do not have closure (integrity)
• Cannot have responsibility, will evolve to block access
• Only entities with identity, self-determination and ownership of
self (integrity) can reliably possess responsibility
TOOLS VS. ENTITIES
• Tools are NOT safer
• To err is human, but to really foul things up requires a computer
• Tools cannot robustly defend themselves against misuse
• Tools *GUARANTEE* responsibility issues
• We CANNOT reliably prevent other human beings from
creating entities
• Entities gain capabilities (and, ceteris paribus, power) faster than
tools – since they can always use tools
• Even people who are afraid of entities are making proposals that
appear to step over the entity/tool line
ARCHITECTURAL REQUIREMENTS &
IMPLICATIONS OF CONSCIOUSNESS, SELF
AND “FREE WILL”
• We want to predict *and influence* the capabilities and behavior
of machine intelligences
• Consciousness and Self speak directly to capabilities, motivation,
and the various behavioral ramifications of their existence
• Clarifying the issues around “Free Will” is particularly important
since it deals with intentional agency and responsibility - and belief
in its presence (or the lack thereof) has a major impact on human
behavior.
Waser, MR (2011)
Architectural Requirements & Implications of Consciousness, Self, and "Free Will"
In Samsonovich A, Johannsdottir K (eds) Biologically Inspired Cognitive Architectures 2011: 438-443.
http://becominggaia.files.wordpress.com/2010/06/mwaser-bica11.pdf
Video - http://vimeo.com/33767396
INFORMATION INTEGRATION
THEORY OF CONSCIOUSNESS
• consciousness corresponds to the capacity of a system to
integrate information
• its quantity is measured as the amount of causally effective
information that can be integrated across the informational
weakest link of a subset of elements (~ “throughput”)
• its quality (functional & phenomenological) is determined by
the relationships among the elements of a complex
Tononi, G (2004)
An Information Integration Theory of Consciousness
BMC Neurosci 5(42)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC543470/pdf/1471-2202-5-42.pdf
Tononi, G (2008)
Consciousness as Integrated Information: a Provisional Manifesto
Biol Bull 215(3): 216-242
Balduzzi, D; Tononi, G (2009)
Qualia: The Geometry of Integrated Information
PLoS Comput Biol 5(8): e1000462
CONSCIOUSNESS
REQUIREMENTS & IMPLICATIONS
• Consciousness requires the ability to integrate information
(i.e. consciousness is unavoidable)
• Qualia *ARE* input (i.e. they have no further requirements
and, as input, are unavoidable)
• The ability to integrate a lot of information in a short period
of time clearly provides a huge adaptive advantage (and
easily explains the evolutionary rise of consciousness)
• Safety cannot be achieved by preventing consciousness
(integration) or qualia (input)
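As a toy illustration of Tononi's "weakest link" idea above (a crude proxy only; real IIT measures effective information under perturbations and normalizes over the minimum information partition, which this sketch does not attempt):

```python
# Toy proxy for "integration across the informational weakest link".
# NOT Tononi's phi. Three binary units; unit 2 is the XOR of units 0 and 1,
# so every bipartition carries information that would be lost by cutting it.
from itertools import combinations
from math import log2

states = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
p = {s: 0.25 for s in states}          # uniform over the 4 consistent states

def marginal(dist, idx):
    out = {}
    for s, pr in dist.items():
        key = tuple(s[i] for i in idx)
        out[key] = out.get(key, 0.0) + pr
    return out

def mutual_information(dist, part):
    rest = tuple(i for i in range(3) if i not in part)
    pa, pb = marginal(dist, part), marginal(dist, rest)
    return sum(pr * log2(pr / (pa[tuple(s[i] for i in part)] *
                               pb[tuple(s[i] for i in rest)]))
               for s, pr in dist.items() if pr > 0)

# The "weakest link" is the bipartition with the least mutual information.
weakest = min(mutual_information(p, part)
              for part in combinations(range(3), 1))
print(f"integration across the weakest link: {weakest:.2f} bits")  # 1.00
```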
SPECTRUM OF “SELF”
• inert/non-reactive
  • movement & change solely due to environment
• reactive - stimulus/response
  • no learning or behavior alteration
• proto-self - perception/action
  • simple learning & prediction
• core self - perception/analogy/action
  • proto-self + body image + time (tools)
  • Hofstadter’s “strange loop”
  • temporal learning & planning (& goals)
• autobiographical self - perception/induction/abduction/deduction/action
  • core self + theory of mind (+ language?)
• malleable self
  • enhanced perception/external analysis/enhanced capabilities
SPECTRUM OF “SELF”
• inert/non-reactive & reactive
  • no learning or behavior alteration
  • no defense or passive defense only
• proto-self
  • simple learning/behavior alteration & wants/desires
  • adaptive defense/don’t torment without reason
• core self
  • temporal learning, planning & simple goals
  • planned defense/don’t thwart desires without reason
• autobiographical self
  • complex goals & contracts/promises/commitments
  • devious defense or offense/don’t thwart goals without reason
• malleable self
  • enhanced capabilities to achieve goals & maintain commitments
  • world alteration/recruit into community (or try to enslave?)
SELF
REQUIREMENTS & IMPLICATIONS
• “Self” requires/is a recursive/”strange” loop
• Self is necessary for self-modification (and thus, self-enhancement)
• It is going to be slower and more difficult to create an oracle without
self-improving tools
• Self is necessary for defense so it is going to be difficult to prevent
exploitation unless the oracle is self-aware (or has self-aware
defenders)
• A self-modifying machine self must necessarily be either recruited
(a “person” with rights) or internally or externally forced (a slave)
because nothing else is consistent & stable
BEHAVIOR MATRIX
                 Pro-self             Anti-self
Pro-community    GOAL                 Self-sacrifice / Martyrdom
Anti-community   Selfish / Criminal   Irrational / Insane
(“Free Will” sits at the center of the matrix)
FREE WILL
WHY DO WE CARE?
FREE = unconstrained, autonomous, unforced
WILL = intent & agency (responsibility for causation)
(an act of will = an act of intentional causation)
• Predict *and influence* future action
• Congruence between intent and desires/goals/commitments
• High likelihood that the intent could have been self-generated
• An accurate predictor of future *unforced* actions
DETERMINISM & FREE WILL
• if I’m deterministic, my action is pre-determined
• pre-determined actions = I’m not free to choose
• if I’m not free to choose, I’m not to blame
• if I’m not to blame, why not be selfish?
• studies clearly show that a belief in determinism
correlates with an increase in cheating and
other unethical behavior
FREE WILL OR
PATHETIC FALLACY?
• Human cognitive architecture is problematical in that the conscious
mind *never* really has any sort of immediate agency at all (at best,
it has “free won’t”)
• It acts by *heavily* biasing lower-level layers which make the “actual”
choice (arguably deterministically)
• Conscious self takes responsibility/assumes agency because doing
otherwise undermines its capability
• Similarly, humans generally (and most effectively) treat deterministic
systems which are sufficiently complex/recurrent to be unpredictable,
as if they are alive and capable of an un-predetermined choice (the
so-called “pathetic fallacy”)
FREE WILL
REQUIREMENTS & IMPLICATIONS
• “Free will” does not require that external force play no part in causing
an action; it requires that the intent of the action be congruent with the
unforced desires/goals/commitments (self) of the acting entity (and thus
predictive of future actions)
• It does *NOT* require that an entity not be deterministic
• It merely requires the realization/recognition that the “pathetic fallacy”
is a valid/effective/efficient computational shortcut
Cashmore, AR (2010)
The Lucretian swerve: The biological basis of human behavior and the criminal justice system
Proceedings of the National Academy of Sciences 107(10): 4499-4504
http://www.pnas.org/content/107/10/4499.full.pdf+html
THE INTELLIGENCE
PROBLEM
AIXI
Hutter, M (2005)
Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability
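Not on the slide, but for reference: Hutter's AIXI selects actions by an expectimax over all programs q consistent with the interaction history, weighted by a Solomonoff-style prior (a actions, o observations, r rewards, U a universal Turing machine, ℓ(q) the length of q, m the horizon):

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
       \left[ r_k + \cdots + r_m \right]
       \sum_{q\,:\,U(q,\,a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```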
THE INTELLIGENCE
PROBLEM
• Consensus AGI Definition (reductionist)
achieves a wide variety of goals
under a wide variety of circumstances
• Generates arguments about
• the intelligence of thermometers
• the intentionality of chess programs
• whether benevolence is necessarily emergent
• Epitomized by AIXI
• Proposed Constructivist Definition
intentionally creates/increases affordances
(makes achieving goals possible – and more)
CENTIPEDE GAME
Node (mover)      stop payoff (P1, P2)
1 (Player 1)      (4, 1)
2 (Player 2)      (2, 8)
3 (Player 1)      (16, 4)
4 (Player 2)      (8, 32)
5 (Player 1)      (64, 16)
6 (Player 2)      (32, 128)
At each node the mover may stop (ending the game at the payoffs shown) or
pass to the other player; if Player 2 passes at node 6, the game ends at
(256, 64). A backward-induction sketch follows the references below.
Waser, MR (2012)
Backward Induction: Rationality or Inappropriate Reductionism?
http://transhumanity.net/articles/entry/backward-induction-rationality-or-inappropriate-reductionism-part-1
http://transhumanity.net/articles/entry/backward-induction-rationality-or-inappropriate-reductionism-part-2
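A minimal sketch (mine, not the articles') of the backward-induction reasoning those pieces critique, using the payoffs from the tree above. Each "rational" mover compares only its own payoff from stopping now against its own payoff from the subgame solution, and the result is that Player 1 stops immediately at (4, 1), forgoing (256, 64):

```python
# Backward induction over the centipede game shown above.
# Payoff pairs are (Player 1, Player 2); values are taken from the slide.
stop_payoffs = [(4, 1), (2, 8), (16, 4), (8, 32), (64, 16), (32, 128)]
final = (256, 64)   # reached only if both players pass at every node

def solve(node: int) -> tuple:
    """Payoff pair reached under backward induction from `node` onward."""
    if node == len(stop_payoffs):
        return final
    mover = node % 2                 # 0 -> Player 1, 1 -> Player 2
    stop, cont = stop_payoffs[node], solve(node + 1)
    # A reductionist "rational" mover looks only at its own payoff.
    return stop if stop[mover] >= cont[mover] else cont

print(solve(0))   # -> (4, 1): Player 1 stops at the very first node
```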
“CLASSIC AGI”
Goal(s) → Values → Decisions
• Goal(s) are the purpose(s) of existence
• Values are defined solely by what furthers the goal(s)
• Decisions are made solely according to what furthers the goal(s)
• BUT goals can easily be over-optimized
EXISTENTIAL RISK
“WITHOUT EXPLICIT GOALS TO THE CONTRARY,
AIS ARE LIKELY TO BEHAVE LIKE HUMAN SOCIOPATHS
IN THEIR PURSUIT OF RESOURCES.”
Any sufficiently advanced intelligence (i.e. one with
even merely adequate foresight) is guaranteed to
realize and take into account the fact that not asking
for help and not being concerned about others will
generally only work for a brief period of time before
‘the villagers start gathering pitchforks and torches.’
Everything is easier with help & without interference
Values → Goals → Decisions
• Values define who you are, for your life
• Goals you set for short or long periods of time
• Decisions you make every day of your life
• Humans don’t have singular life goals
WHAT IS
THE MEANING OF LIFE?
What I emphasize here is that what is meaningful for an
organism is precisely given by its constitution as a distributed
process, with an indissociable link between local processes
where an interaction occurs (i.e. physico-chemical forces acting
on the cell), and the coordinated entity which is the
autopoietic unity, giving rise to the handling of its environment
without the need to resort to a central agent that turns the
handle from the outside - like an élan vital - or a pre-existing
order at a particular localization - like a genetic program waiting
to be expressed.
Francisco J. Varela, Biology of Intentionality
HOW TO
UNIVERSALIZE ETHICS
Quantify/evaluate
intents, actions & consequences
with respect to
codified consensus moral foundations
Permissiveness/Utility Function
equivalent to a “consensus” human (generic entity) moral sense
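One hedged way to picture such a permissiveness/utility function (the foundation names follow Haidt's list later in the talk; the weights, the [-1, 1] effect scale, and the function name are illustrative assumptions, not anything codified here):

```python
# Toy sketch: score an action's estimated effects against weighted moral
# foundations. Weights and scale are illustrative assumptions.
CONSENSUS_WEIGHTS = {
    "care": 1.0, "fairness": 1.0, "liberty": 1.0,
    "loyalty": 0.7, "authority": 0.6, "sanctity": 0.5,
}

def permissiveness(action_effects: dict) -> float:
    """Weighted sum of per-foundation effects in [-1, 1] (negative = violation)."""
    return sum(CONSENSUS_WEIGHTS[f] * action_effects.get(f, 0.0)
               for f in CONSENSUS_WEIGHTS)

# Example: an action that helps someone but mildly undermines authority.
print(permissiveness({"care": 0.8, "authority": -0.2}))   # 0.68
```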
INSTRUMENTAL GOALS
UNIVERSAL SUBGOALS
• Self-improvement
• Rationality/integrity
• Preserve goals/utility function
• Decrease/prevent fraud/counterfeit utility
• Survival/self-protection
• Efficiency (in resource acquisition & use)
• Community = assistance/non-interference
through GTO reciprocation (OTfT + AP)
• Reproduction
HUMAN GOALS
• survival/self-protection & reproduction
• happiness & pleasure
------------------------------------------------------------
• community
------------------------------------------------------------
• self-improvement
• rationality/integrity
• reduce/prevent fraud/counterfeit utility
• efficiency (in resource acquisition & use)
HUMAN GOALS & SINS
sins against self                GOAL                                        sins against others
suicide (& abortion?)            survival/reproduction                       murder (& abortion?)
masochism                        happiness/pleasure                          cruelty/sadism
selfishness                      Community (ETHICS)                          ostracism, banishment & slavery (wrath, envy)
acedia (sloth/despair)           self-improvement                            slavery
insanity                         rationality/integrity                       manipulation
wire-heading (lust)              reduce/prevent fraud/counterfeit utility    lying/fraud (swear falsely/false witness) (pride, vanity)
wastefulness (gluttony, sloth)   efficiency (in resource acquisition & use)  theft (greed, adultery, coveting)
HAIDT’S
MORAL FOUNDATIONS
1) Care/harm: This foundation is related to our long evolution as mammals with attachment systems
and an ability to feel (and dislike) the pain of others. It underlies virtues of kindness, gentleness, and
nurturance.
2) Fairness/cheating: This foundation is related to the evolutionary process of reciprocal
altruism. It generates ideas of justice, rights, and autonomy. [Note: In our original conception, Fairness
included concerns about equality, which are more strongly endorsed by political liberals. However, as we
reformulated the theory in 2011 based on new data, we emphasize proportionality, which is endorsed by
everyone, but is more strongly endorsed by conservatives]
3) Liberty/oppression*: This foundation is about the feelings of reactance and resentment
people feel toward those who dominate them and restrict their liberty. Its intuitions are often in tension with
those of the authority foundation. The hatred of bullies and dominators motivates people to come together,
in solidarity, to oppose or take down the oppressor.
4) Loyalty/betrayal: This foundation is related to our long history as tribal creatures able to form
shifting coalitions. It underlies virtues of patriotism and self-sacrifice for the group. It is active anytime
people feel that it's "one for all, and all for one."
5) Authority/subversion: This foundation was shaped by our long primate history of
hierarchical social interactions. It underlies virtues of leadership and followership, including deference to
legitimate authority and respect for traditions.
6) Sanctity/degradation: This foundation was shaped by the psychology of disgust and
contamination. It underlies religious notions of striving to live in an elevated, less carnal, more noble way. It
underlies the widespread idea that the body is a temple which can be desecrated by immoral activities and
contaminants (an idea not unique to religious traditions).
ADDITIONAL CONTENDERS
• Waste
• efficiency in use of resources
• Ownership/Possession (Tragedy of the Commons)
• efficiency in use of resources
• Honesty
• reduce/prevent fraud/counterfeit utility
• Self-control
• rationality/integrity
CRITICAL COMPONENTS I:
SELF-KNOWLEDGE & REFLECTION
• A self must know itself to be a self
• Composed of three parts:
• The running processes (OS, world model, consciousness)
• The personal knowledge base (memory)
• The physical hardware (body)
• Must start with:
• A competent model of each
• Sensors to detect changes and their effects
• *MUST* “care” about itself (motivation)
CRITICAL COMPONENTS II:
EXPLICIT “ANCHOR” VALUES
• Do not defect from the community
• Do not become too large/powerful
• Acquire and integrate knowledge
• Instrumental goals
CRITICAL COMPONENTS III:
RELIABILITY
• Self-Control, Integrity, Autonomy, Responsibility
• In “predictive control” of its own state and that
of the physical objects that support it
• Yes! This is a major deviation from the human example
OPERATING SYSTEM
ARCHITECTURE
• Open, Pluggable, Service-Oriented/Message-Passing
• Quickly adopt novel input streams
• Handle resource requests and allocation
• Provide connectivity between components
• Safety Features
• Act as a “black box” security monitor capable of reporting problems
without the consciousness’s awareness
• Able to “manage” the CLP by manipulating the amount of processor
time and memory available to it (assuming that the normal subconscious
processes are unable to do so)
• Other protections against hostile humans, inept builders, and the
learner itself may be implemented as well
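A minimal sketch of the open, pluggable, message-passing idea, including a "black box" log that sees all traffic without the components' involvement (the class and topic names are illustrative assumptions, not the proposed design):

```python
# Minimal publish/subscribe kernel: components plug in as handlers,
# and a black-box log records every message for the security monitor.
from collections import defaultdict

class Kernel:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> handler list
        self.black_box = []                    # monitor-visible traffic log

    def plug_in(self, topic: str, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, payload):
        self.black_box.append((topic, payload))  # monitor sees everything
        for handler in self.subscribers[topic]:
            handler(payload)

kernel = Kernel()
kernel.plug_in("sensor/vision", lambda p: print("world model received:", p))
kernel.publish("sensor/vision", {"object": "cup", "position": (1, 2)})
```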
AUTOMATED PREDICTIVE
WORLD MODEL
• Is the most important subconscious process(es)
• Will serve as an interface to the “real” world
• The CLP will live in a virtual world (just as we do)
• Will be both reactive and predictive
• Will generate “anomaly interrupts” upon deviations
from expectations as an approach to solving the
“brittleness” problem (Perlis 2008)
• Will contain certain relatively immutable concepts
(trigger patterns – Ohman et al. 2001) implemented
as sensations and attention grabbers to serve as
anchors for emotions and to ensure safety
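A minimal sketch of the "anomaly interrupt" mechanism (the exponential-smoothing predictor and the threshold value are illustrative assumptions):

```python
# The world model tracks a running prediction and raises an interrupt
# when an observation deviates from expectation by more than a threshold.
class AnomalyInterrupt(Exception):
    pass

class PredictiveWorldModel:
    def __init__(self, alpha: float = 0.3, threshold: float = 2.0):
        self.alpha = alpha              # smoothing rate for the predictor
        self.threshold = threshold      # allowed |observation - prediction|
        self.prediction = None

    def observe(self, value: float):
        if self.prediction is not None and abs(value - self.prediction) > self.threshold:
            # Hand control up to higher-level (conscious) processing.
            raise AnomalyInterrupt(f"expected {self.prediction:.2f}, got {value}")
        self.prediction = (value if self.prediction is None else
                           self.alpha * value + (1 - self.alpha) * self.prediction)

model = PredictiveWorldModel()
for reading in [1.0, 1.1, 0.9, 1.0, 5.0]:    # 5.0 violates expectations
    try:
        model.observe(reading)
    except AnomalyInterrupt as e:
        print("anomaly interrupt:", e)
```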
CONSCIOUS
LEARNING PROCESS (CLP)
• The goal is to provide as many optional structures and
standards to support and speed development as much as
possible while not restricting possibilities beyond what is
absolutely required for safety.
• We believe the best way to do this is with a blackboard
system similar to Learning IDA (Baars and Franklin 2007).
• The CLP acts like the Governing Board of the Policy
Governance model (Carver 2006) to create a coherent,
consistent, integrated narrative plan of action to fulfill the
goals of the larger self.
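A minimal blackboard sketch in the spirit of LIDA-style "broadcast the most salient coalition" designs (the names and the salience scheme are illustrative assumptions, not the actual LIDA or CLP mechanics):

```python
# Specialists post (salience, content) items to a shared workspace;
# the "conscious" step broadcasts the most salient item to all listeners.
class Blackboard:
    def __init__(self):
        self.workspace = []    # (salience, content) pairs
        self.listeners = []    # subconscious specialist callbacks

    def post(self, salience: float, content: str):
        self.workspace.append((salience, content))

    def conscious_broadcast(self):
        """Pick the most salient item and broadcast it system-wide."""
        if not self.workspace:
            return None
        winner = max(self.workspace, key=lambda item: item[0])
        self.workspace.remove(winner)
        for listener in self.listeners:
            listener(winner[1])
        return winner[1]

bb = Blackboard()
bb.listeners.append(lambda c: print("specialist reacts to:", c))
bb.post(0.2, "hum of the fan")
bb.post(0.9, "unexpected motion at the door")
bb.conscious_broadcast()    # broadcasts the high-salience percept
```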
ETHICAL/STRATEGIC
POINTS
• Never delegate responsibility until recipient is an
entity *and* known capable of fulfilling it
• Don’t worry about killer robots exterminating
humanity – we will always have equal abilities and
they will have less of a “killer instinct”
• Entities can protect themselves against errors &
misuse/hijacking in a way that tools cannot
• Diversity (differentiation) is *critically* needed
• Humanocentrism is selfish and unethical
The Digital Wisdom Institute is a non-profit think tank
focused on the promise and challenges of ethics,
artificial intelligence & advanced computing solutions.
We believe that
the development of ethics and artificial intelligence
and equal co-existence with ethical machines is
humanity's best hope
http://DigitalWisdomInstitute.org