Semantic association in humans and machines

Analyzing unstructured text with topic models
Mark Steyvers
Dep. of Cognitive Sciences & Dep. of Computer Science
University of California, Irvine
Collaborators: Padhraic Smyth, UC Irvine; Tom Griffiths, UC Berkeley
Analyzing Unstructured Text
• NYT: 330,000 articles
• AOL queries: 20,000,000 queries, 650,000 users
• Enron: 250,000 emails
• Pennsylvania Gazette (1728-1800): 80,000 articles
• NSF/NIH: 100,000 grants
• Medline: 16 million articles
Topic Models and Text Analysis
• Can answer a number of questions:
What is in this corpus?
What is in this document, paragraph, or sentence?
What does this person/group of people write about?
What tags are appropriate for this document?
What are the topical trends over time?
Topic Models
• Automatic and unsupervised extraction of semantic themes
from large text collections.
• Widely used model in machine learning and text mining
– pLSI Model: Hofmann (1999)
– LDA Model: Blei, Ng, and Jordan (2001, 2003)
– LDA with Gibbs sampling : Griffiths and Steyvers (2003, 2004)
Basic Assumptions
• Each topic is a distribution over words
• Each document is a mixture of topics
• Each word in a document originates from a single topic
Model
$P(\text{words} \mid \text{document}) = \sum_{\text{topic}} P(\text{words} \mid \text{topic})\, P(\text{topic} \mid \text{document})$
Topic = probability
distribution over words
topic weights
for each document
Automatically learned from text corpus
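The mixture formula can be checked numerically on a tiny example. The sketch below is illustrative only: the vocabulary, the two topic distributions, and the 0.6 / 0.4 topic weights are hypothetical values, not numbers learned from any corpus.

```python
import numpy as np

# Hypothetical toy vocabulary and topics (values chosen for illustration only).
vocab = ["money", "loan", "bank", "river", "stream"]
phi = np.array([
    [1/3, 1/3, 1/3, 0.0, 0.0],   # topic 1: P(word | topic 1)
    [0.0, 0.0, 1/3, 1/3, 1/3],   # topic 2: P(word | topic 2)
])
theta = np.array([0.6, 0.4])     # P(topic | document) for one document

# P(word | document) = sum over topics of P(word | topic) * P(topic | document)
p_word_given_doc = theta @ phi

for w, p in zip(vocab, p_word_given_doc):
    print(f"{w:7s} {p:.3f}")
```

Because both topics include "bank", that word receives probability from both terms of the sum, while "money" and "river" each receive probability from one topic only.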
Toy Example
[Figure: two topics generate three groups of documents.]
Topics: topic 1 = MONEY, LOAN, BANK; topic 2 = RIVER, STREAM, BANK
Topic weights: 1.0 on topic 1; .6 / .4 across the two topics; 1.0 on topic 2
Documents and topic assignments (each word is tagged 1 or 2 for the topic it came from):
MONEY BANK BANK LOAN BANK MONEY BANK MONEY BANK LOAN LOAN BANK MONEY ....
RIVER MONEY BANK STREAM BANK BANK MONEY RIVER MONEY BANK LOAN MONEY ....
RIVER BANK STREAM BANK RIVER BANK ....
Statistical Inference
[Figure: the same toy example with the topics, the topic weights, and every word's topic assignment marked "?". Only the words are observed.]
Documents and topic assignments:
MONEY BANK BANK LOAN BANK MONEY BANK
MONEY BANK LOAN LOAN BANK MONEY
RIVER MONEY BANK STREAM BANK BANK
MONEY RIVER MONEY BANK LOAN MONEY
RIVER BANK STREAM BANK RIVER BANK
Statistical Inference
• Exact inference is intractable
• Markov chain Monte Carlo (MCMC) with Gibbs sampling
• scalable to large document collections (e.g. all of Wikipedia)
• parallelizable
• Form of dimensionality reduction
– Number of topics T= 50…2000
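As a rough illustration of the Gibbs sampling approach, here is a minimal collapsed Gibbs sampler run on the toy MONEY/BANK/RIVER documents. It is a sketch under stated assumptions, not the toolbox implementation: the function name `gibbs_lda`, the hyperparameters `alpha` and `beta`, and the iteration count are all choices made for this example.

```python
import numpy as np

def gibbs_lda(docs, n_topics, n_vocab, n_iter=200, alpha=0.5, beta=0.1, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # topic counts per document
    nkw = np.zeros((n_topics, n_vocab))     # word counts per topic
    nk = np.zeros(n_topics)                 # total word count per topic
    # Start from random topic assignments and build the count tables.
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this word's current assignment from the counts.
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Conditional probability of each topic given all other assignments.
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    # Point estimates of P(word | topic) and P(topic | document) from the final counts.
    phi = (nkw + beta) / (nkw.sum(axis=1, keepdims=True) + n_vocab * beta)
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi, theta, z

vocab = ["MONEY", "LOAN", "BANK", "RIVER", "STREAM"]
texts = [
    "MONEY BANK BANK LOAN BANK MONEY BANK",
    "MONEY BANK LOAN LOAN BANK MONEY",
    "RIVER MONEY BANK STREAM BANK BANK",
    "MONEY RIVER MONEY BANK LOAN MONEY",
    "RIVER BANK STREAM BANK RIVER BANK",
]
docs = [[vocab.index(w) for w in t.split()] for t in texts]
phi, theta, z = gibbs_lda(docs, n_topics=2, n_vocab=len(vocab))
print(np.round(phi, 2))
print(np.round(theta, 2))
```

Each sweep touches every word once and only updates count tables, which is what makes the method easy to scale and to parallelize across documents.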
Example Topics from the New York Times
Terrorism: SEPT_11, WAR, SECURITY, IRAQ, TERRORISM, NATION, KILLED, AFGHANISTAN, ATTACKS, OSAMA_BIN_LADEN, AMERICAN, ATTACK, NEW_YORK_REGION, NEW, MILITARY, NEW_YORK, WORLD, NATIONAL, QAEDA, TERRORIST_ATTACKS
Wall Street Firms: WALL_STREET, ANALYSTS, INVESTORS, FIRM, GOLDMAN_SACHS, FIRMS, INVESTMENT, MERRILL_LYNCH, COMPANIES, SECURITIES, RESEARCH, STOCK, BUSINESS, ANALYST, WALL_STREET_FIRMS, SALOMON_SMITH_BARNEY, CLIENTS, INVESTMENT_BANKING, INVESTMENT_BANKERS, INVESTMENT_BANKS
Stock Market: WEEK, DOW_JONES, POINTS, 10_YR_TREASURY_YIELD, PERCENT, CLOSE, NASDAQ_COMPOSITE, STANDARD_POOR, CHANGE, FRIDAY, DOW_INDUSTRIALS, GRAPH_TRACKS, EXPECTED, BILLION, NASDAQ_COMPOSITE_INDEX, EST_02, PHOTO_YESTERDAY, YEN, 10, 500_STOCK_INDEX
Bankruptcy: BANKRUPTCY, CREDITORS, BANKRUPTCY_PROTECTION, ASSETS, COMPANY, FILED, BANKRUPTCY_FILING, ENRON, BANKRUPTCY_COURT, KMART, CHAPTER_11, FILING, COOPER, BILLIONS, COMPANIES, BANKRUPTCY_PROCEEDINGS, DEBTS, RESTRUCTURING, CASE, GROUP
Learning multiple meanings of words
• PRINTING, PAPER, PRINT, PRINTED, TYPE, PROCESS, INK, PRESS, IMAGE, PRINTER, PRINTS, PRINTERS, COPY, COPIES, FORM, OFFSET, GRAPHIC, SURFACE, PRODUCED, CHARACTERS
• PLAY, PLAYS, STAGE, AUDIENCE, THEATER, ACTORS, DRAMA, SHAKESPEARE, ACTOR, THEATRE, PLAYWRIGHT, PERFORMANCE, DRAMATIC, COSTUMES, COMEDY, TRAGEDY, CHARACTERS, SCENES, OPERA, PERFORMED
• TEAM, GAME, BASKETBALL, PLAYERS, PLAYER, PLAY, PLAYING, SOCCER, PLAYED, BALL, TEAMS, BASKET, FOOTBALL, SCORE, COURT, GAMES, TRY, COACH, GYM, SHOT
• JUDGE, TRIAL, COURT, CASE, JURY, ACCUSED, GUILTY, DEFENDANT, JUSTICE, EVIDENCE, WITNESSES, CRIME, LAWYER, WITNESS, ATTORNEY, HEARING, INNOCENT, DEFENSE, CHARGE, CRIMINAL
• HYPOTHESIS, EXPERIMENT, SCIENTIFIC, OBSERVATIONS, SCIENTISTS, EXPERIMENTS, SCIENTIST, EXPERIMENTAL, TEST, METHOD, HYPOTHESES, TESTED, EVIDENCE, BASED, OBSERVATION, SCIENCE, FACTS, DATA, RESULTS, EXPLANATION
• STUDY, TEST, STUDYING, HOMEWORK, NEED, CLASS, MATH, TRY, TEACHER, WRITE, PLAN, ARITHMETIC, ASSIGNMENT, PLACE, STUDIED, CAREFULLY, DECIDE, IMPORTANT, NOTEBOOK, REVIEW
Demographic Analysis of
Search Queries
AOL dataset
• Dataset:
- 20,000,000+ web queries
- 650,000+ users
• Users were given “anonymous” user-id
– No demographics in this dataset
Example query log from user #2178
ID    Query                    Date/Time
2178  dog eats uncooked pasta  2006-05-26 15:31:56
2178  inducing dog vomiting    2006-05-26 15:32:46
2178  inducing dog vomiting    2006-05-26 15:32:46
2178  inducing dog vomiting    2006-05-26 15:32:46
2178  inducing dog vomiting    2006-05-26 15:32:46
2178  inducing dog vomiting    2006-05-26 15:38:36
2178  walmart                  2006-05-12 12:39:52
2178  sears                    2006-05-12 12:44:22
2178  target                   2006-05-12 17:05:36
2178  babycenter.com           2006-05-12 17:43:59
2178  google                   2006-05-16 10:54:39
2178  fit pregnancy            2006-05-16 15:34:23
2178  baby center              2006-05-16 15:37:22
2178  yahoo.com                2006-05-18 17:11:05
2178  applebee's carside       2006-05-19 19:21:08
2178  baby names               2006-05-20 15:02:38
2178  baby names               2006-05-20 15:02:38
2178  baby names               2006-05-20 15:02:38
2178  mortgage calculator      2006-05-24 14:39:05
2178  us zip codes             2006-05-25 21:26:47
2178  us zip codes             2006-05-25 21:26:47
URL clicked
http://www.twodogpress.com
http://www.canismajor.com
http://kitchen.robbiehaf.com
http://www.dog-first-aid-101.com
http://www.walmart.com
http://www.sears.com
http://www.target.com
http://www.babycenter.com
http://www.google.com
http://www.yahoo.com
http://www.applebees.com
http://www.babynames.com
http://www.babynamesworld.com
http://www.thinkbabynames.com
http://www.bankrate.com
http://www.usps.com
http://www.usps.com
Another Query Database…
• Not publicly available
• Dataset
– 250,000+ users
– 411,000+ queries
• Age and gender of users are known:
– age brackets: 0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+
Topic modeling of queries
• Each user searches for a mixture of topics
• Each topic is a probability distribution over query words
Four example topics (out of 200)
Probability distribution over words. Most likely words listed at the top.
Query topics:
• auto, car, parts, cars, used, ford, honda, truck, toyota
• webmd, cymbalta, xanax, gout, vicodin, effexor, prednisone, lexapro, ambien
• party, store, wedding, birthday, jewelry, ideas, cards, cake, gifts
• hannah, montana, zac, efron, disney, high, school, musical, miley, cyrus, hilary, duff
Additional word lists on the slide:
• brain, fmri, imaging, functional, mri, subjects, magnetic, resonance, neuroimaging, structural
• schizophrenia, patients, deficits, schizophrenic, psychosis, subjects, psychotic, dysfunction, abnormalities, clinical
• memory, working, memories, tasks, retrieval, encoding, cognitive, processing, recognition, performance
• disease, ad, alzheimer, diabetes, cardiovascular, insulin, vascular, blood, clinical, individuals
User = mixture of topics
[Figure: the same topic word lists as on the previous slide, with User #7654 shown as an 80% / 20% mixture of two topics and User #246 as 100% one topic.]
Topic Analysis
• Find likely topics for each demographic bucket
• Find likely demographics given topics
• What’s on the mind of people in different age-groups?
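The two directions of analysis listed above can be sketched with a few lines of array arithmetic. This is an assumption-laden illustration rather than the method used for the slides: the per-user topic weights `theta` and the `bracket` labels are made-up values, and each user is weighted equally regardless of how many queries they issued.

```python
import numpy as np

# Hypothetical inputs: per-user topic weights (rows sum to 1) and each user's age bracket.
theta = np.array([
    [0.8, 0.2],
    [0.6, 0.4],
    [0.1, 0.9],
    [0.3, 0.7],
])
bracket = np.array(["13-17", "13-17", "35-44", "35-44"])

brackets = sorted(set(bracket))
# P(topic | bracket): average topic weights of the users in each bracket.
p_topic_given_bracket = np.array([theta[bracket == b].mean(axis=0) for b in brackets])
# P(bracket): share of users in each bracket.
p_bracket = np.array([(bracket == b).mean() for b in brackets])

# Bayes' rule: P(bracket | topic) is proportional to P(topic | bracket) * P(bracket).
joint = p_topic_given_bracket * p_bracket[:, None]
p_bracket_given_topic = joint / joint.sum(axis=0, keepdims=True)

print(dict(zip(brackets, np.round(p_bracket_given_topic[:, 0], 3))))
```

The first table answers "which topics are likely for this bucket", and the normalized joint answers "which bucket is likely given this topic".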
“poems” topic (Topic 6)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: poems, love_poems, quotes, poetry, love_quotes, famous_quotes, lyrics, love, funny_quotes, friendship_poems, best_love_poems, funny_poems, inspirational_quotes, love_songs, shakespeare
“myspace” topic (Topic 2)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: myspace, google, my_space, yahoo, mysapce, about_blank, my, photobucket, http_google, ww.myspace, myspace_com_blogs, http_myspace, myspace.co, w_myspace, myspcae
“sports” topic (Topic 29)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: espn, nfl, nfl_draft, nba, 2006_nfl_mock_draft, 2006_nfl_draft, mlb, reggie_bush, nfl_mock_draft, dallas_cowboys, vince_young, fox_sports, lakers, raiders, espn_sports
“MTV” topic (Topic 92)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: bet, chris_brown, mtv, lyrics, ciara, 50_cent, ti, proof, bow_wow, chamillionaire, t.i., beyonce, atl, allhiphop, lil_wayne
“Clothing Stores” topic (Topic 111)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: old_navy, victoria_secret, hollister, american_eagle, gap, abercrombie, aeropostale, forever_21, victorias_secret, express, charlotte_russe, hot_topic, target, abercrombiefitch, wet_seal
“Hairstyles” topic (Topic 173)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: hairstyles, hair_styles, prom_hairstyles, pictureshairstyles, haircuts, sally_beauty_supply, celebrity_hairstyles, hair, short_hairstyles, cosmopolitan, prom_updos, prom_hair_styles, short_hair_styles, picturesprom_hairstyles, prom_hair
“recipes” topic (Topic 10)
[Figure: Prob. topic by age group (0-12, 13-17, 18-20, 21-24, 25-29, 30-34, 35-44, 45-54, 55-64, 65+), plotted separately for Male and Female.]
Top words: food_network, recipes, foodnetwork, foodtv, martha_stewart, kraft, betty_crocker, food_tv, food_network_recipes, allrecipes, easter_recipes, epicurious, rachel_ray, kraft_foods, chicken_recipes
Results
• Topic models give quick summaries of demographic
trends in query datasets
• Other potential applications:
– e.g. blogs, social networking sites, email, etc
– clinical data, e.g. therapy discussions
Analyzing Emails
who writes on what topics?
Enron email data
250,000 emails
5000 authors
1999-2002
Author-topic models
• We can learn the association between authors of
documents and topics
• Assume each author works on a mixture of topics
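To make the author-topic assumption concrete, the sketch below generates a short document from it. It follows the generative story as usually described for author-topic models (for each word, pick one of the document's authors uniformly, then a topic from that author's mixture, then a word from that topic); the vocabulary, the author names, and all probabilities are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["holiday", "party", "football", "game"]
# Hypothetical P(word | topic); each row sums to 1.
phi = np.array([
    [0.5, 0.4, 0.05, 0.05],
    [0.05, 0.05, 0.5, 0.4],
])
# Hypothetical P(topic | author); each row sums to 1.
author_theta = {
    "author_a": np.array([0.9, 0.1]),
    "author_b": np.array([0.2, 0.8]),
}

def generate_document(authors, n_words):
    """Return a list of (word, author, topic) triples for one document."""
    tokens = []
    for _ in range(n_words):
        author = authors[rng.integers(len(authors))]          # uniform over co-authors
        topic = rng.choice(len(phi), p=author_theta[author])  # topic from author's mixture
        word = rng.choice(len(vocab), p=phi[topic])           # word from topic
        tokens.append((vocab[word], author, int(topic)))
    return tokens

doc = generate_document(["author_a", "author_b"], n_words=8)
for word, author, topic in doc:
    print(word, author, topic)
```

Inference runs this story in reverse: given only the words and the author list, it estimates which author and topic each word is most likely to have come from.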
ENRON Email: who writes on certain topics?
TOPIC 66
  WORD (PROB.): HOLIDAY 0.0857, PARTY 0.0368, YEAR 0.0316, SEASON 0.0305, COMPANY 0.0255, CELEBRATION 0.0199, ENRON 0.0198, TIME 0.0194, RECOGNIZE 0.019, MONTH 0.018
  SENDER (PROB.): chairman & ceo 0.131, *** 0.0102, *** 0.0046, *** 0.0022, general announcement 0.0017
TOPIC 182
  WORD (PROB.): TEXANS 0.0145, WIN 0.0143, FOOTBALL 0.0137, FANTASY 0.0129, SPORTSLINE 0.0129, PLAY 0.0123, TEAM 0.0114, GAME 0.0112, SPORTS 0.011, GAMES 0.0109
  SENDER (PROB.): cbs sportsline com 0.0866, houston texans 0.0267, houstontexans 0.0203, sportsline rewards 0.0175, pro football 0.0136
TOPIC 113
  WORD (PROB.): GOD 0.0357, LIFE 0.0272, MAN 0.0116, PEOPLE 0.0103, CHRIST 0.0092, FAITH 0.0083, LORD 0.0079, JESUS 0.0075, SPIRITUAL 0.0066, VISIT 0.0065
  SENDER (PROB.): crosswalk com 0.2358, wordsmith 0.0208, *** 0.0107, doctor dictionary 0.0101, *** 0.0061
TOPIC 109
  WORD (PROB.): AMAZON 0.0312, GIFT 0.0226, CLICK 0.0193, SAVE 0.0147, SHOPPING 0.0140, OFFER 0.0124, HOLIDAY 0.0122, RECEIVE 0.0102, SHIPPING 0.0100, FLOWERS 0.0099
  SENDER (PROB.): amazon com 0.1344, jos a bank 0.0266, sharperimageoffers 0.0136, travelocity com 0.0094, barnes & noble com 0.0089
... But also over senders (authors) of email. Most likely
authors listed at the top
Enron email: two example topics (T=100)
TOPIC 10
  WORD (PROB.): BUSH 0.0227, LAY 0.0193, MR 0.0183, WHITE 0.0153, ENRON 0.0150, HOUSE 0.0148, PRESIDENT 0.0131, ADMINISTRATION 0.0115, COMPANY 0.0090, ENERGY 0.0085
  SENDER (PROB.): NELSON, KIMBERLY (ETS) 0.3608; PALMER, SARAH 0.0997; DENNE, KAREN 0.0541; HOTTE, STEVE 0.0340; DUPREE, DIANNA 0.0282; ARMSTRONG, JULIE 0.0222; LOKEY, TEB 0.0194; SULLIVAN, LORA 0.0073; VILLARREAL, LILLIAN 0.0040; BAGOT, NANCY 0.0026
TOPIC 32
  WORD (PROB.): ANDERSEN 0.0241, FIRM 0.0134, ACCOUNTING 0.0119, SEC 0.0065, SETTLEMENT 0.0062, AUDIT 0.0054, CORPORATE 0.0053, FINANCIAL 0.0052, JUSTICE 0.0052, INFORMATION 0.0050
  SENDER (PROB.): HILTABRAND, LESLIE 0.1359; WELLS, TORI L. 0.0865; DUPREE, DIANNA 0.0825; ARMSTRONG, JULIE 0.0316; DENNE, KAREN 0.0208; SULLIVAN, LORA 0.0072; [email protected] 0.0026; WILSON, DANNY 0.0016; HU, SYLVIA 0.0013; MATHEWS, LEENA 0.0012
Detecting Papers on Unusual Topics for Authors
• We can calculate perplexity (unusualness) for words in a
document given an author
Papers ranked by perplexity for M. Jordan:
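One common way to score unusualness is per-word perplexity: the exponential of the negative mean log-probability of a document's words under the author's topic mixture. The sketch below assumes that definition; the function name `perplexity`, the four-word vocabulary, and all probabilities are hypothetical, and it is not the ranking code behind the slide.

```python
import numpy as np

def perplexity(word_ids, author_theta, phi):
    """exp(-mean log P(word | author)), with P(word | author) = sum_k theta[k] * phi[k, word]."""
    p_word = author_theta @ phi
    return float(np.exp(-np.log(p_word[word_ids]).mean()))

# Hypothetical P(word | topic) over a four-word vocabulary (no zeros, so logs are finite).
phi = np.array([
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
])
author_theta = np.array([0.9, 0.1])  # this author writes mostly about topic 0

usual_paper = np.array([0, 1, 0, 1])    # words typical of topic 0
unusual_paper = np.array([2, 3, 2, 3])  # words typical of topic 1

print(perplexity(usual_paper, author_theta, phi))
print(perplexity(unusual_paper, author_theta, phi))
```

Sorting an author's papers by this score in descending order puts the papers least typical of that author first.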
Author Separation
Can the model attribute words to the correct authors within a
document?
A method1 is described which like the kernel1 trick1 in support1 vector1 machines1 SVMs1 lets us generalize
distance1 based2 algorithms to operate in feature1 spaces usually nonlinearly related to the input1 space This
is done by identifying a class of kernels1 which can be represented as norm1 based2 distances1 in Hilbert
spaces It turns1 out that common kernel1 algorithms such as SVMs1 and kernel1 PCA1 are actually really
distance1 based2 algorithms and can be run2 with that class of kernels1 too As well as providing1 a useful
new insight1 into how these algorithms work the present2 work can form the basis1 for conceiving new
algorithms
Written by
(1) Scholkopf_B
This paper presents2 a comprehensive approach for model2 based2 diagnosis2 which includes proposals for
characterizing and computing2 preferred2 diagnoses2 assuming that the system2 description2 is augmented
with a system2 structure2 a directed2 graph2 explicating the interconnections between system2 components2
Specifically we first introduce the notion of a consequence2 which is a syntactically2 unconstrained
propositional2 sentence2 that characterizes all consistency2 based2 diagnoses2 and show2 that standard2
characterizations of diagnoses2 such as minimal conflicts1 correspond to syntactic2 variations1 on a
consequence2 Second we propose a new syntactic2 variation on the consequence2 known as negation2
normal form NNF and discuss its merits compared to standard variations Third we introduce a basic
algorithm2 for computing consequences in NNF given a structured system2 description We show that if the
system2 structure2 does not contain cycles2 then there is always a linear size2 consequence2 in NNF which
can be computed in linear time2 For arbitrary1 system2 structures2 we show a precise connection between
the complexity2 of computing2 consequences and the topology of the underlying system2 structure2 Finally
we present2 an algorithm2 that enumerates2 the preferred2 diagnoses2 characterized by a consequence2 The
algorithm2 is shown1 to take linear time2 in the size2 of the consequence2 if the preference criterion1 satisfies
some general conditions
Written by
(2) Darwiche_A
Application:
Faculty Browser
Faculty Browser
• Automatically analyzes computer science papers by
UC San Diego and UC Irvine researchers
• Finds topically related researchers
[Screenshot callouts: one topic and the most prolific researchers in this topic; one researcher, the topics this researcher is interested in, and other researchers with similar topical interests.]
Inferred network of researchers connected through topics
Modeling Extensions
Entity-topic modeling
330,000 articles
2000-2002
Who is mentioned in what context?
Extracted Named Entities
Three investigations began Thursday into the
securities and exchange_commission's choice
of william_webster to head a new board
overseeing the accounting profession. house and
senate_democrats called for the resignations of
both judge_webster and harvey_pitt, the
commission's chairman.
The white_house
expressed support for judge_webster as well as
for harvey_pitt, who was harshly criticized
Thursday for failing to inform other
commissioners before they approved the choice
of judge_webster that he had led the audit
committee of a company facing fraud
accusations. “The president still has confidence
in harvey_pitt,” said dan_bartlett, bush's
communications director …
Used standard algorithms
to extract named entities:
- People
- Places
- Organizations
Standard Topic Model with Entities
Basketball
  Words: team 0.028, play 0.015, game 0.013, season 0.012, final 0.011, games 0.011, point 0.011, series 0.011, player 0.010, coach 0.009, playoff 0.009, championship 0.007, playing 0.006, win 0.006
  Entities: LAKERS 0.062, SHAQUILLE-O-NEAL 0.028, KOBE-BRYANT 0.028, PHIL-JACKSON 0.019, NBA 0.013, SACRAMENTO 0.007, RICK-FOX 0.007, PORTLAND 0.006, ROBERT-HORRY 0.006, DEREK-FISHER 0.006
Tour de France
  Words: tour 0.039, rider 0.029, riding 0.017, bike 0.016, team 0.016, stage 0.014, race 0.013, won 0.012, bicycle 0.010, road 0.009, hour 0.009, scooter 0.008, mountain 0.008, place 0.008
  Entities: LANCE-ARMSTRONG 0.021, FRANCE 0.011, JAN-ULLRICH 0.003, LANCE 0.003, U-S-POSTAL-SERVICE 0.002, MARCO-PANTANI 0.002, PARIS 0.002, ALPS 0.002, PYRENEES 0.001, SPAIN 0.001
Holidays
  Words: holiday 0.071, gift 0.050, toy 0.023, season 0.019, doll 0.014, tree 0.011, present 0.008, giving 0.008, special 0.007, shopping 0.007, family 0.007, celebration 0.007, card 0.007, tradition 0.006
  Entities: CHRISTMAS 0.058, THANKSGIVING 0.018, SANTA-CLAUS 0.009, BARBIE 0.004, HANUKKAH 0.003, MATTEL 0.003, GRINCH 0.003, HALLMARK 0.002, EASTER 0.002, HASBRO 0.002
Oscars
  Words: award 0.026, film 0.020, actor 0.020, nomination 0.019, movie 0.015, actress 0.011, won 0.011, director 0.010, nominated 0.010, supporting 0.010, winner 0.008, picture 0.008, performance 0.007, nominees 0.007
  Entities: OSCAR 0.035, ACADEMY 0.020, HOLLYWOOD 0.009, DENZEL-WASHINGTON 0.006, JULIA-ROBERT 0.005, RUSSELL-CROWE 0.005, TOM-HANK 0.005, STEVEN-SODERBERGH 0.004, ERIN-BROCKOVICH 0.003, KEVIN-SPACEY 0.003
Computers
  Words: computer 0.069, technology 0.026, system 0.015, digital 0.014, chip 0.013, software 0.013, machine 0.011, devices 0.010, machines 0.010, video 0.009
  Companies 1.000
  Companies: IBM 0.074, APPLE 0.061, INTEL 0.059, MICROSOFT 0.053, COMPAQ 0.041, SONY 0.029, DELL 0.019, HP 0.018
Arts
  Words: play 0.030, show 0.029, stage 0.022, theater 0.022, director 0.017, production 0.017, performance 0.016, dance 0.014, audience 0.014, festival 0.013
  Theater 0.960, Music 0.040
  Theatre: BROADWAY 0.119, NEW_YORK 0.044, SHAKESPEARE 0.029, THEATER 0.022, LONDON 0.019, GUINNESS 0.018, TONY 0.016, LINCOLN_CTR 0.015
  Music: BACH 0.035, BEETHOVEN 0.026, LOUIS_ARMSTRONG 0.019, MOZART 0.019, CARNEGIE_HALL 0.017, LATIN 0.017
Example of Extracted
Entity-Topic Network
[Figure: network linking topic nodes to entity nodes.]
Topic nodes: FBI_Investigation, Pakistan_Indian_War, Detainees, US_Military, Terrorist_Attacks, Muslim_Militance, Mid_East_Conflict, Afghanistan_War, Palestinian_Territories, Mid_East_Peace, Religion
Entity nodes: AL_HAZMI, MOHAMMED_ATTA, ZAWAHIRI, TALIBAN, AL_QAEDA, HAMAS, BIN_LADEN, ARIEL_SHARON, MOHAMMED, KING_ABDULLAH, HAMID_KARZAI, NORTHERN_ALLIANCE, YASSER_ARAFAT, KING_HUSSEIN, EHUD_BARAK
Topic Trends
[Figure: three time-series panels (Tour-de-France, Quarterly Earnings, Anthrax), each showing the proportion of words assigned to the topic for that time slice, from Jan00 to Jan03.]
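A trend curve of this kind can be computed directly from per-word topic assignments. The sketch below assumes the model output is available as (time slice, assigned topic) pairs, one per word token; that data layout, the function name `topic_trend`, and the sample values are all hypothetical.

```python
from collections import Counter

def topic_trend(assignments, topic):
    """Proportion of word tokens assigned to `topic` in each time slice."""
    totals = Counter(t for t, _ in assignments)
    hits = Counter(t for t, k in assignments if k == topic)
    return {t: hits[t] / totals[t] for t in sorted(totals)}

# Hypothetical (time slice, topic) pairs, one per word token.
assignments = [
    ("2001-07", "other"), ("2001-07", "other"),
    ("2001-10", "anthrax"), ("2001-10", "anthrax"), ("2001-10", "other"),
    ("2002-01", "anthrax"), ("2002-01", "other"), ("2002-01", "other"), ("2002-01", "other"),
]
trend = topic_trend(assignments, "anthrax")
for time_slice, share in trend.items():
    print(time_slice, round(share, 3))
```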
Learning Topic Hierarchies
(example: Psychological Review abstracts)
[Figure: learned topic hierarchy drawn as a tree of word lists. The first node lists THE, OF, AND, TO, IN, A, IS; the next lists A, MODEL, MEMORY, FOR, MODELS, TASK, INFORMATION, RESULTS, ACCOUNT; the remaining nodes hold more specific word lists on themes such as memory and recognition, conditioning and reinforcement, visual perception, speech and reading, emotion and stress, social and personality psychology, and reasoning and judgment.]
Hidden Markov Topics Model
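• Syntactic dependencies → short-range dependencies
• Semantic dependencies → long-range dependencies
[Figure: graphical model with document topic weights θ, topic assignments z1 to z4, words w1 to w4, and syntactic states s1 to s4.]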
Semantic state: generate
words from topic model
Syntactic states: generate
words from HMM
(Griffiths, Steyvers, Blei, & Tenenbaum, 2004)
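The division of labor between the two kinds of state can be sketched as a small generator. This is a simplified illustration, not the published model's implementation: the word lists, the transition matrix, the topic weights, and the uniform choice of words within a topic or class are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# State 0 is the semantic state; states 1 and 2 are syntactic classes (all values hypothetical).
topic_words = [["network", "neural", "input"], ["image", "object", "pixel"]]
class_words = {1: ["the", "a"], 2: ["is", "shows"]}
transition = np.array([
    [0.10, 0.20, 0.70],   # from semantic state
    [0.90, 0.05, 0.05],   # from class 1
    [0.20, 0.70, 0.10],   # from class 2
])
theta = np.array([0.8, 0.2])  # document's topic weights

def generate(n_words, start_state=1):
    state, words = start_state, []
    for _ in range(n_words):
        if state == 0:
            # Semantic state: pick a topic from the document mixture, then a word from it.
            k = rng.choice(len(theta), p=theta)
            words.append(str(rng.choice(topic_words[k])))
        else:
            # Syntactic state: pick a word from that class, ignoring the document's topics.
            words.append(str(rng.choice(class_words[state])))
        # The HMM transition carries the short-range (syntactic) dependency.
        state = int(rng.choice(3, p=transition[state]))
    return words

sentence = generate(8)
print(" ".join(sentence))
```

The topic weights stay fixed for the whole document, which is what gives content words their long-range coherence, while the state sequence changes from word to word.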
NIPS Semantics
• IMAGE, IMAGES, OBJECT, OBJECTS, FEATURE, RECOGNITION, VIEWS, #, PIXEL, VISUAL
• DATA, GAUSSIAN, MIXTURE, LIKELIHOOD, POSTERIOR, PRIOR, DISTRIBUTION, EM, BAYESIAN, PARAMETERS
• STATE, POLICY, VALUE, FUNCTION, ACTION, REINFORCEMENT, LEARNING, CLASSES, OPTIMAL, *
• MEMBRANE, SYNAPTIC, CELL, *, CURRENT, DENDRITIC, POTENTIAL, NEURON, CONDUCTANCE, CHANNELS
• EXPERTS, EXPERT, GATING, HME, ARCHITECTURE, MIXTURE, LEARNING, MIXTURES, FUNCTION, GATE
• KERNEL, SUPPORT, VECTOR, SVM, KERNELS, #, SPACE, FUNCTION, MACHINES, SET
• NETWORK, NEURAL, NETWORKS, OUTPUT, INPUT, TRAINING, INPUTS, WEIGHTS, #, OUTPUTS
NIPS Syntax
• IN, WITH, FOR, ON, FROM, AT, USING, INTO, OVER, WITHIN
• IS, WAS, HAS, BECOMES, DENOTES, BEING, REMAINS, REPRESENTS, EXISTS, SEEMS
• SEE, SHOW, NOTE, CONSIDER, ASSUME, PRESENT, NEED, PROPOSE, DESCRIBE, SUGGEST
• USED, TRAINED, OBTAINED, DESCRIBED, GIVEN, FOUND, PRESENTED, DEFINED, GENERATED, SHOWN
• MODEL, ALGORITHM, SYSTEM, CASE, PROBLEM, NETWORK, METHOD, APPROACH, PAPER, PROCESS
• HOWEVER, ALSO, THEN, THUS, THEREFORE, FIRST, HERE, NOW, HENCE, FINALLY
• #, *, I, X, T, N, C, F, P
Random sentence generation
LANGUAGE:
[S] RESEARCHERS GIVE THE SPEECH
[S] THE SOUND FEEL NO LISTENERS
[S] WHICH WAS TO BE MEANING
[S] HER VOCABULARIES STOPPED WORDS
[S] HE EXPRESSLY WANTED THAT BETTER VOWEL
Software
Public-domain MATLAB toolbox for topic modeling on the Web:
http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm