CONFUCIUS:
An Intelligent MultiMedia Storytelling
Interpretation and Presentation System

Minhua Eunice Ma
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee

Faculty Research Student Conference, Jordanstown, 15 Jan 2004
Outline
 Related research
 Overview of CONFUCIUS
 Automatic generation of 3D animation
 Semantic representation
 Natural language processing
 Current state of implementation
 Relation to other work
 Conclusion & Future work
Related research
 3D visualisation
 Virtual humans & embodied agents: Jack, Improv, BEAT
 MultiModal interactive storytelling: AesopWorld, KidsRoom, Larsen & Petersen's Interactive Storytelling, computer games
 Automatic Text-to-Graphics Systems: WordsEye, CD-based language animation
 Related research in NLP
 Lexical semantics
 Levin's verb classes
 Jackendoff's Lexical Conceptual Structure
 Schank's scripts
Objectives of CONFUCIUS
[Diagram: storywriter/playwright → movie/drama script → CONFUCIUS → 3D animation → user/story listener]
 To interpret natural language sentences/stories and to extract conceptual semantics from the natural language
 To generate 3D animation and virtual worlds automatically from natural language
 To integrate 3D animation with speech and non-speech audio, to form an intelligent multimedia storytelling system
Architecture of CONFUCIUS
[Architecture diagram: natural language stories, or a movie/drama script from the script writer, pass through the script parser into the natural language processing module, which draws on the LCS lexicon, grammar and language knowledge to produce semantic representations. The animation generation module maps these onto visual knowledge (a 3D graphic library of prefabricated objects, built with 3D authoring tools from existing 3D models & character models). Text-To-Speech and sound effects are then synchronized & fused with the animation into a 3D world with audio in VRML.]
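A minimal Java sketch of this three-stage pipeline, assuming hypothetical class and method names (CONFUCIUS' actual code will differ):

```java
// Hypothetical sketch of the pipeline: language processing produces semantic
// representations, which animation generation turns into a VRML world.
// All names here are illustrative, not CONFUCIUS' actual classes.
class SemanticRepresentation {
    final String lvsr;                       // e.g. an LVSR expression
    SemanticRepresentation(String lvsr) { this.lvsr = lvsr; }
}

interface LanguageProcessor {
    SemanticRepresentation interpret(String story);
}

interface AnimationGenerator {
    String toVrml(SemanticRepresentation semantics);
}

class Storyteller {
    private final LanguageProcessor nlp;
    private final AnimationGenerator animator;

    Storyteller(LanguageProcessor nlp, AnimationGenerator animator) {
        this.nlp = nlp;
        this.animator = animator;
    }

    // Full pipeline: story text in, VRML scene (later fused with TTS audio
    // and sound effects) out.
    String tell(String story) {
        return animator.toVrml(nlp.interpret(story));
    }
}
```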
Software & Standards
 Java
 parsing semantic representations
 changing VRML code to add/modify animation
 integrating modules
 Natural language processing tools
 Connexor Machinese FDG parser (morphological and syntactic parsing)
 WordNet (lexicon, semantic inference)
 3D graphic modelling
 Existing 3D models (virtual humans/objects) on the Internet
 Authoring tools
   Humanoid characters: Character Studio
   Props & stage: 3D Studio Max
   Narrator: Microsoft Agent
 Modelling languages & standards
   VRML 97 for modelling the geometry of objects, props and environments
   H-Anim specification for humanoid modelling
Agents and Avatars: how much autonomy?
 Autonomous agents have higher requirements for sensing, memory, reasoning, planning, behaviour control & emotion (sense-emotion-control-action structure)
 "User-controlled" avatars require fewer autonomous actions -- basic naïve physics such as collision detection and reaction is still required
 Virtual characters in non-interactive storytelling fall between agents and avatars -- their behaviours, emotions and responses to the changing environment are described in the story input
[Diagram: virtual humans ordered by autonomy & intelligence, from low to high: avatars, characters in non-interactive storytelling, interface agents, autonomous agents]
Graphics library
[Diagram: the graphics library stores simple geometry files for objects/props; geometry & joint hierarchy files (H-Anim) for characters; and a keyframe animation library for motions, which are instantiated on characters.]
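A minimal sketch of how such a library might be indexed, with hypothetical names and file paths (for illustration only):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical index over the graphics library: props map to simple geometry
// files, characters to H-Anim files, motions to keyframe animations.
class GraphicsLibrary {
    private final Map<String, String> objectFiles = new HashMap<>();
    private final Map<String, String> characterFiles = new HashMap<>();
    private final Map<String, String> motionFiles = new HashMap<>();

    GraphicsLibrary() {
        objectFiles.put("cup", "models/cup.wrl");            // simple geometry file
        characterFiles.put("john", "models/john_hanim.wrl"); // geometry & joint hierarchy
        motionFiles.put("walk", "motions/walk.wrl");         // keyframe interpolators
    }

    String geometryFor(String prop)  { return objectFiles.get(prop); }
    String characterFor(String name) { return characterFiles.get(name); }
    String motionFor(String verb)    { return motionFiles.get(verb); }
}
```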
Level of Articulation (LOA) of H-Anim
 CONFUCIUS adopts LOA1 in human animation
 the animation engine adds ROUTEs dynamically, based on H-Anim's joints & animation keyframes
 CONFUCIUS' human animation can be adapted for other LOAs
[Figures: joints and segments of LOA1; example Site nodes on the hands, used for pushing and holding objects]
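A minimal sketch of the dynamic ROUTE generation in Java; the TimeSensor and interpolator DEF names are hypothetical, while the ROUTE statements follow standard VRML 97 syntax and H-Anim joint naming:

```java
// Emits the two VRML 97 ROUTE statements that wire a TimeSensor through an
// OrientationInterpolator to one H-Anim Joint. DEF names are illustrative.
class RouteWriter {
    static String routesFor(String timer, String interpolator, String jointName) {
        return "ROUTE " + timer + ".fraction_changed TO " + interpolator + ".set_fraction\n"
             + "ROUTE " + interpolator + ".value_changed TO " + jointName + ".set_rotation\n";
    }

    public static void main(String[] args) {
        // e.g. animate the right shoulder joint of an LOA1 humanoid
        System.out.print(routesFor("WalkClock", "RShoulderInterp", "hanim_r_shoulder"));
    }
}
```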
Semantic representations
Categories                           Knowledge representations                       Typical applications
general knowledge                    rule-based representation                       expert systems
representation & reasoning           FOPC (First Order Predicate Calculus)           sentence representation, expert systems
                                     semantic networks                               lexical semantics
                                     Schank's scripts                                story understanding
                                     frame-based representations,                    multimodal semantics
                                     XML-based representations
physical knowledge representation    Conceptual Dependency (CD)                      dynamic vision (movement)
& reasoning (inc. spatial/temporal   event-logic truth conditions                    recognition & generation
reasoning)                           x-schema and f-structure
                                     Lexical-Conceptual Structure (LCS)
                                     Lexical Visual Semantic Representation (LVSR)
Lexical Visual Semantic Representation
 Lexical Visual Semantic Representation (LVSR): a semantic representation between language syntax and 3D models
 LVSR is based on Jackendoff's LCS, adapted to the task of language visualization (enhanced with Schank's scripts)
 Ontological categories: OBJ, HUMAN, EVENT, STATE, PLACE, PATH, PROPERTY
 OBJ -- props/places (e.g. buildings)
 HUMAN -- human beings/other articulated animated characters (e.g. animals), as long as their skeleton hierarchy is defined
 EVENT -- actions, movements and manners
 STATE -- static existence
 PROPERTY -- attributes of OBJ/HUMAN
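As an illustration (not necessarily CONFUCIUS' exact notation), a sentence such as "John put the cup on the table" could be decomposed over these categories in an LCS-style form:

```
[EVENT CAUSE([HUMAN john],
             [EVENT GO([OBJ cup],
                       [PATH to([PLACE on([OBJ table])])])])]
```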
PATH & PLACE predicates
 interpret spatial movement of OBJ/HUMANs
 62 common English prepositions
 7 PATH predicates & 11 PLACE predicates
PATH predicate   Direction feature   Termination feature
to                       1                   1
from                     0                   1
toward                   1                   0
away_from                0                   0
via                     n/a                  0
across                  n/a                 n/a
along                   n/a                 n/a

PLACE predicate   contact/attach feature
at                unmarked
behind            <-contact>
end_of            n/a
in                unmarked
in_front_of       <-contact>
near              <-contact>
on                <+contact>
out               unmarked
over              <-contact>
top_of            n/a
under             unmarked
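These features could be encoded directly for the animation engine; a hypothetical Java sketch mirroring the PLACE table above:

```java
// Hypothetical encoding of the PLACE predicates and their contact/attach
// feature from the table above; Feature.NA marks entries with no value.
enum Feature { PLUS_CONTACT, MINUS_CONTACT, UNMARKED, NA }

enum PlacePredicate {
    AT(Feature.UNMARKED), BEHIND(Feature.MINUS_CONTACT), END_OF(Feature.NA),
    IN(Feature.UNMARKED), IN_FRONT_OF(Feature.MINUS_CONTACT), NEAR(Feature.MINUS_CONTACT),
    ON(Feature.PLUS_CONTACT), OUT(Feature.UNMARKED), OVER(Feature.MINUS_CONTACT),
    TOP_OF(Feature.NA), UNDER(Feature.UNMARKED);

    final Feature contact;
    PlacePredicate(Feature contact) { this.contact = contact; }
}
```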
NLP in CONFUCIUS
[Diagram: NLP pipeline -- pre-processing; morphological parsing, part-of-speech tagging and syntactic parsing (Connexor FDG parser); disambiguation over FEATURES; semantic inference using WordNet and the LCS database; coreference resolution; and temporal reasoning over lexical and post-lexical temporal relations.]
Visual valency & verb ontology
2.2.1. Human action verbs
2.2.1.1. One visual valency (the role is a human, (partial) movement)
2.2.1.1.1. Biped kinematics: arm actions (wave, scratch), leg actions (walk,
jump, kick), torso actions (bow), combined actions (climb)
2.2.1.1.2. Facial expressions & lip movement, e.g. laugh, fear, say, sing, order
2.2.1.2. Two visual valency (at least one role is human)
2.2.1.2.1. One human and one object (vt. or vi.+instrument)
e.g. throw, push, kick, open, eat, drink, bake, trolley
2.2.1.2.2. Two humans, e.g. fight, chase, guide
2.2.1.3. Visual valency ≥ 3 (at least one role is human)
2.2.1.3.1. Two humans and one object (inc. ditransitive verbs), e.g. give, show
2.2.1.3.2. One human and 2+ objects (vt. + object + implicit instr./goal/theme)
e.g. cut, write, butter, pocket, dig, cook
2.2.1.4. Verbs without distinct visualisation when out of context: verbs of trying,
helping, letting, creating/destroying
2.2.1.5. High level behaviours (routine events), political and social activities
e.g. interview, eat out (go to restaurant), go shopping
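A minimal sketch of how this ontology might be looked up when planning an animation, with a hypothetical Java table following the examples above:

```java
import java.util.Map;

// Hypothetical lookup of a verb's visual valency (how many visual roles the
// animation needs), following the examples in the ontology above.
class VisualValency {
    static final Map<String, Integer> VALENCY = Map.of(
        "wave", 1, "walk", 1, "laugh", 1,   // one role: a (partially) moving human
        "push", 2, "eat", 2, "chase", 2,    // two roles: human + object, or two humans
        "give", 3, "cut", 3                 // three+ roles: ditransitives, implicit instruments
    );

    static int rolesFor(String verb) {
        return VALENCY.getOrDefault(verb, 2); // default guess for unlisted verbs
    }
}
```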
Level-of-Detail (LOD): basic-level verbs & troponyms
[Diagram: verb hierarchy by level of detail -- event-level verbs (EVENT: go, cause, ...); manner-level verbs under go (walk, climb, run, jump, ...); troponym-level verbs (under walk: limp, stride, trot, swagger; under run: jog, romp, skip, bounce, hop)]
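One natural use of this hierarchy is to animate a troponym by reusing its manner-level parent's keyframes with an adjusted manner; a hypothetical sketch (the decomposition entries are illustrative):

```java
import java.util.Map;

// Hypothetical reuse of manner-level animations for troponyms: each troponym
// maps to its parent verb's keyframes plus a manner adjustment.
class TroponymTable {
    record BaseManner(String baseVerb, String manner) {}

    static final Map<String, BaseManner> TROPONYMS = Map.of(
        "limp",   new BaseManner("walk", "uneven gait"),
        "stride", new BaseManner("walk", "long steps"),
        "jog",    new BaseManner("run",  "slow, steady pace"),
        "hop",    new BaseManner("run",  "single-foot bounces")
    );
}
```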
Current status of implementation
 Collision detection example (contact verbs: hit, collide, scratch, touch)
"The car collided with a wall."
 using ParallelGraphics' VRML extension -- object-to-object collision
 non-speech sound effects
 H-Anim examples:
3 visual valency verbs
"John put a cup of coffee on the table."
 H-Anim Site node
 locative tags of objects (an on_table tag for the table object)
2 visual valency verbs
"John pushed the door."
"John ate the bread."
"Nancy sat on the chair."
1 visual valency verb
"The waiter came to me: 'Can I help you, Sir?'"
 speech modality & lip synchronization
 camera direction (avatar's point-of-view)
Relation to other work
 Domain-independent, general-purpose humanoid character animation
 CONFUCIUS' character animation focuses on the language-to-humanoid-animation process, rather than solely on human modelling & motion
 An implementable semantic representation, LVSR, connecting linguistic semantics to visual semantics & suitable for action execution (animation)
 Categorization and visualisation of eventive verbs based on visual valency
 A reusable common sense knowledge base to elicit implied actions, instruments, goals and themes underspecified in the language input
Conclusion & Future work
 Humanoid animation explores problems in language visualization & automatic animation production
 Formalizes the meaning of action verbs and spatial prepositions
 Maps language primitives to visual primitives
 Reusable common sense knowledge base for other systems

Further work
 Discourse level interpretation
 Action composition for simultaneous activities
 Verbs concerning multiple characters' synchronization & coordination (e.g. introduce)

Prospective applications
 Children's education
 Movie/drama production
 Multimedia presentation
 Computer games
 Virtual Reality