Computational Discovery of Communicable Knowledge

Experimental Studies of Integrated Cognitive Systems
Pat Langley
Computational Learning Laboratory
Center for the Study of Language and Information
Stanford University, Stanford, California
Elena Messina
Intelligent Systems Division
National Institute of Standards and Technology
Gaithersburg, Maryland
Thanks to David Aha, Michael Genesereth, and Barney Pell. This work was funded in part
by DARPA IPTO, which is not responsible for the points made herein.
Experimentation in Artificial Intelligence
Controlled experiments are the primary evaluation tool in modern
AI, including the subfields of:
 supervised learning and reinforcement learning;
 generative planning and scheduling;
 computational linguistics and text processing.
However, they are not yet standard practice for work on integrated
cognitive systems.
Extending experimental methods to the latter is crucial, since it
deals with the ultimate goals of artificial intelligence.
Challenges for Experimentation
The reasons that experiments with integrated cognitive systems
have lagged behind are clear from the phrase itself:
 systems are harder to evaluate than component algorithms;
 cognitive methods involve complex, multi-step reasoning;
 integrated software relies on interactions among components.
Together, these factors have slowed the development and wide
acceptance of an experimental framework.
In this talk, we propose the key elements of an experimental
method for the study of integrated cognitive systems.
Dependent Variables: Basic Measures
Dependent variables in an experiment measure system behavior.
Some basic measures of integrated cognitive systems include:
 success or failure on a given problem;
 speed or efficiency of the system’s response;
 desirability or quality of the system’s response.
Such metrics provide the building blocks for more sophisticated
and informative measures of behavior.
Dependent Variables: Combined Measures
Statistics tells us we should not draw conclusions from one case.
Collecting multiple samples supports combined measures like:
 average behavior of the system;
 cumulative behavior of the system;
 variance of the system’s behavior.
Combined measures also partly cancel variation due to unknown
or uncontrolled factors.
However, this requires some population from which samples are
drawn, which one should always specify clearly.
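As a minimal sketch of these combined measures, the snippet below computes the average, cumulative, and variance summaries for one basic measure. The function name `combined_measures` and the sample solution times are hypothetical, chosen only for illustration.

```python
import statistics
from itertools import accumulate


def combined_measures(scores):
    """Summarize one basic measure collected over several sampled problems."""
    if len(scores) < 2:
        raise ValueError("need at least two samples to estimate variance")
    return {
        "mean": statistics.mean(scores),           # average behavior
        "cumulative": list(accumulate(scores)),    # running total over samples
        "variance": statistics.variance(scores),   # sample variance (n - 1)
    }


# Hypothetical solution times (seconds) on five problems from one population.
times = [2.0, 4.0, 4.0, 6.0, 9.0]
summary = combined_measures(times)
print(summary)
```

The summary is only meaningful if all five problems come from the same, clearly specified population.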
Dependent Variables: Higher-Order Metrics
Combined measures present only a small window on behavior.
However, one can also derive higher-order measures such as:
 the slope and intercept with respect to a control system;
 the intercept, rate, and asymptote of a learning curve.
Such metrics let one summarize behavior even when variation
across samples is not systematic.
Conclusions about higher-order measures are more important
than ones about basic or combined variables.
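One way to obtain a slope and intercept with respect to a control system is an ordinary least-squares fit of the system's scores against the control's scores on the same conditions. The sketch below assumes a linear relation is adequate; `fit_line` and the timing data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys ~= slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("control scores must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical mean solution times (seconds) on four task conditions.
control_times = [10.0, 20.0, 30.0, 40.0]
system_times = [7.0, 12.0, 17.0, 22.0]
slope, intercept = fit_line(control_times, system_times)
print(slope, intercept)
```

Here a slope below 1 would suggest the system scales better than the control as conditions get harder, while the intercept reflects fixed overhead.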
Independent Variables: Task Characteristics
Independent variables in an experiment reflect factors thought to
influence system behavior.
An important class of factors are domain or task features like:
 the complexity of the environment;
 the difficulty of achieving a given task;
 the resources available for pursuing the task.
Experiments that vary these factors reveal how the intelligent
system’s behavior depends on them.
Synthetic domains let one alter such variables systematically,
but it is crucial that they be similar to natural domains.
Independent Variables: System Characteristics
Another important class of variables involves system features.
Varying these factors leads to different types of experiments:
 parametric studies (altering system parameters);
 lesion studies (removing a system component);
 replacement studies (replacing one module with another).
Such experiments suggest ways that the intelligent system’s
behavior depends on its parameters and components.
Studies that vary two or more factors can reveal interactions
among them.
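A two-factor lesion study can be sketched as a 2 x 2 factorial design, with each component either present or removed. In the toy example below, `run_trial`, the `planner` and `memory` components, and all scores are invented stand-ins for a real integrated system.

```python
from itertools import product
from statistics import mean


def run_trial(use_planner, use_memory, difficulty):
    """Hypothetical stand-in for running the integrated system on one task."""
    score = 10.0 - difficulty
    if use_planner:
        score += 2.0
    if use_memory:
        score += 1.0
    if use_planner and use_memory:
        score += 3.0  # assumed synergy between the two components
    return score


def factorial_lesion_study(difficulties):
    """Mean score for every present/removed combination of the two components."""
    cells = {}
    for planner, memory in product([False, True], repeat=2):
        cells[(planner, memory)] = mean(
            run_trial(planner, memory, d) for d in difficulties
        )
    return cells


def interaction(cells):
    """Effect of memory with the planner minus its effect without the planner."""
    with_planner = cells[(True, True)] - cells[(True, False)]
    without_planner = cells[(False, True)] - cells[(False, False)]
    return with_planner - without_planner


cells = factorial_lesion_study([1.0, 2.0, 3.0])
effect = interaction(cells)
print(cells, effect)
```

A nonzero interaction contrast indicates that the benefit of one component depends on whether the other is present, which single-factor lesion studies cannot show.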
Independent Variables: System Knowledge
A third class of factors concerns the knowledge and experience
of the intelligent system.
One can adapt lesion and replacement studies to examine:
 the presence or absence of types of knowledge;
 the amount of knowledge about a given subject;
 the amount of experience with a class of tasks.
Such experiments let one plot behavioral measures as a function
of knowledge and experience (learning curves).
They also let one compute higher-order measures such as rate
of improvement and asymptotic performance.
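The sketch below shows one crude way to extract an intercept, rate, and asymptote from a learning curve. It assumes the error decays roughly as asymptote + (intercept - asymptote) * exp(-rate * t); the function name, the `tail` and `floor` parameters, and the synthetic data are all hypothetical, and a proper nonlinear fit would be preferable in practice.

```python
import math


def summarize_learning_curve(errors, tail=5, floor=0.05):
    """Crude intercept, rate, and asymptote estimates for a decaying error curve."""
    if len(errors) < tail + 2:
        raise ValueError("curve too short to summarize")
    asymptote = sum(errors[-tail:]) / tail   # mean of the final trials
    intercept = errors[0]                    # error before any experience
    span = intercept - asymptote
    if span <= 0:
        raise ValueError("curve does not decrease toward its tail")
    # Use only points still clearly above the asymptote, where the log is stable.
    points = [
        (t, math.log((e - asymptote) / span))
        for t, e in enumerate(errors)
        if e - asymptote > floor * span
    ]
    denom = sum(t * t for t, _ in points)
    if denom == 0:
        raise ValueError("not enough early points to estimate a rate")
    # Least-squares line through the origin in log space: z ~= -rate * t.
    rate = -sum(t * z for t, z in points) / denom
    return {"intercept": intercept, "rate": rate, "asymptote": asymptote}


# Synthetic error curve generated from the assumed model (not real data).
errors = [0.1 + 0.8 * math.exp(-0.3 * t) for t in range(30)]
fit = summarize_learning_curve(errors)
print(fit)
```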
Repositories for Cognitive Systems
Public repositories are now common among the AI subfields,
and they offer clear advantages for research by:
 providing fast and cheap materials for experiments;
 supporting replication and standards for comparison.
However, they can also produce undesirable side effects by:
 focusing attention on a narrow class of problems;
 encouraging a ‘bake-off’ mentality among researchers.
To support research on cognitive systems, we need testbeds
and environments designed to evaluate general intelligence.
Desirable Characteristics of Testbeds
Testbeds that are designed to support research on integrated
cognitive systems should:
 include a variety of domains to ensure generality;
 be well documented and simple for researchers to use;
 have standard formats to ease interface with systems.
However, many existing repositories already offer these features,
and they are not sufficient on their own.
Desirable Characteristics of Testbeds
In addition, testbeds for integrated cognitive systems should:
 contain not data sets but task environments
   which support agents that exist over time;
   at least some of which involve physical domains;
 provide an infrastructure to ease experimentation with
   external databases (e.g., geographic information systems);
   controlled capture, replay, and restart of scenarios;
   methods for recording performance measures.
Also, environments should have little or no dependence on
sensory processing.
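Controlled restart, replay, and recording can be sketched with a seeded environment that logs every step. The `RecordingEnvironment` class, its one-dimensional state, and the noise model below are hypothetical and stand in for a real task environment.

```python
import random


class RecordingEnvironment:
    """Toy task environment with seeded restart and a recorded trace."""

    def __init__(self, seed):
        self.seed = seed
        self.restart()

    def restart(self):
        """Reset the scenario so that the same noise sequence recurs."""
        self.rng = random.Random(self.seed)
        self.position = 0
        self.trace = []

    def step(self, action):
        """Apply an action plus seeded noise and record the outcome."""
        noise = self.rng.choice([-1, 0, 1])
        self.position += action + noise
        self.trace.append((action, self.position))
        return self.position


def replay(env, actions):
    """Restart the scenario and re-run a captured action sequence."""
    env.restart()
    return [env.step(a) for a in actions]


env = RecordingEnvironment(seed=7)
captured_actions = [1, 1, -1, 2]
first_run = replay(env, captured_actions)
first_trace = list(env.trace)
second_run = replay(env, captured_actions)
print(first_run == second_run)
```

Because the random source is re-seeded on restart, the same actions reproduce the same trajectory, which makes runs comparable across system variants.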
Physical vs. Simulated Environments
For domains that involve external settings, one can use either a
physical or a simulated environment for evaluation.
Simulated environments have many advantages, including:
 ability to vary domain parameters and physical layout;
 ease of recording traces of behavior and cognitive state.
One can make simulated environments more realistic by:
 using simulators that support kinematics and dynamics;
 including data from real sensors in analogous locations.
This approach combines the relevance of physical testbeds
with the affordability of synthetic ones.
Some Promising Domains
A number of domains hold promise for the experimental study
of integrated cognitive systems:
 urban search and rescue (Balakirsky & Messina, 2002);
 flying aircraft on military missions (Jones et al., 1999);
 driving a vehicle in a city (Choi et al., 2004);
 playing strategy games (Aha & Molineaux, 2004);
 general game playing (Genesereth, 2004).
Each requires the integration of cognition, perception, and
action in a complex, dynamical setting.
Goals of Scientific Experimentation
Science aims not to show that one method is better than another,
but to understand the reasons for complex behavior.
This goal can best be achieved through experimental studies that:
 ask clear questions or test specific hypotheses;
 examine relations between behavior and independent factors;
 move beyond descriptions to explanations of phenomena.
Good experiments provide insight into the reasons that underlie
system behavior.
Also, whether or not they support a hypothesis, they do not end
the story, but rather suggest ideas for further studies.
Concluding Remarks
In this talk, we considered the experimental study of integrated
cognitive systems, including:
 challenges posed by their distinctive characteristics;
 dependent measures that describe their behavior;
 independent variables that influence this behavior;
 the need for environments and testbeds that:
   exercise the full capabilities of integrated agents;
   evaluate their behavior at the system level;
   support studies of interactions among components.
Taking these into account will transform the study of integrated
cognitive systems into a well-balanced experimental science.
End of Presentation