IGERT External Advisory Board Meeting
Wednesday, March 14, 2007
INSTITUTE FOR COGNITIVE SCIENCES
University of Pennsylvania
COGS 501 & COGS 502
A two-semester sequence
that aims to provide basic mathematical and algorithmic tools
for the study of animal, human or machine communication.
COGS 501:
1. Mathematical and programming basics:
linear algebra and Matlab
2. Probability
3. Model fitting
4. Signal processing
COGS 502 topics:
1. Information theory
2. Formal language theory
3. Logic
4. Machine learning
Challenges

- Diverse student background:
  from almost nothing to MS-level skills in
  - Mathematics
  - Programming
- Breadth of topics and applications:
  normally many courses with many prerequisites
- Lack of suitable instructional materials
Precedent: LING 525 / CIS 558
Computer Analysis and Modeling of Biological Signals and Systems.
A hands-on signal and image processing course for non-EE graduate students needing these skills.
We will go through all the fundamentals of signal and image processing using computer exercises
developed in MATLAB. Examples will be drawn from speech analysis and synthesis, computer vision,
and biological modeling.
History
CIS 558 / LING 525: “Digital signal processing for non-EEs”
- started in 1996 by Simoncelli and Liberman
- similar problems of student diversity,
breadth of topics, lack of suitable materials
Solutions:
- Matlab-based lab course: concepts, digital methods, applications
- Several tiers for each topic: basic, intermediate, advanced
- Extensive custom-built lecture notes and problem sets
- Lots of individual hand-holding
Results:
- Successful uptake for wide range of student backgrounds
(e.g. from “no math since high school” to “MS in math”;
from “never programmed” to “five years in industry”)
- Successor now a required course in NYU neuroscience program:
“Mathematical tools for neural science”
Mathematical Foundations
Course goal:
IGERT students should understand and be able to apply
- Models of language and communication
- Experimental design and analysis
- Corpus-based methods
in research areas including
- Sentence processing
- Animal communication
- Language learning
- Communicative interaction
- Cognitive neuroscience
COGS 501-2 rev. 0.1

Problem is somewhat more difficult:
- Students are even more diverse
- Concepts and applications are even broader
- Lack of pre-prepared lecture notes and problems
  (except for those derived from other courses)
Advance preparation was inadequate:
- Not enough coordination by faculty
- Not enough explicit connections to research
COGS 501-2: how to do better
Will start sequence again in Fall 2007
Plans for rev. 0.9:
- Advance preparation of course-specific lecture notes and problem sets
- Systematic remediation where needed
  - for mathematical background
  - for entry-level Matlab programming
- Connection to research themes
  (e.g. sequence modeling, birdsong analysis, artificial language learning)
  - Historical papers
  - Contemporary research
Research theme: example
“Colorless green ideas sleep furiously”
- Shannon 1948
- Chomsky 1957
- Pereira 2000
- (?)
Word sequences: Shannon
C. Shannon, “A mathematical theory of communication”, BSTJ, 1948:
…a sufficiently complex stochastic process will give a satisfactory
representation of a discrete source.
[The entropy] H … can be determined by limiting operations directly from the
statistics of the message sequences…
[Specifically:]
Theorem 6: Let p(B_i, S_j) be the probability of sequence B_i followed by symbol S_j and
p_{B_i}(S_j) … be the conditional probability of S_j after B_i. Let
    F_N = -\sum_{i,j} p(B_i, S_j) \log p_{B_i}(S_j)
where the sum is over all blocks B_i of N-1 symbols and over all symbols S_j.
Then F_N is a monotonic decreasing function of N, …
and \lim_{N \to \infty} F_N = H.
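
As a possible lab exercise tied to this result, here is a minimal Matlab sketch (not from the course materials) that estimates F_N from the raw N-gram counts of an integer-coded symbol sequence. The function name shannon_FN and its arguments are hypothetical, and the counts are unsmoothed maximum-likelihood estimates, so the value only approaches H for long sequences and modest N.

function FN = shannon_FN(seq, N)
% SHANNON_FN  Maximum-likelihood estimate of Shannon's F_N (Theorem 6).
%   seq : row vector of integer symbol codes
%   N   : N-gram order, N >= 2 (block of N-1 symbols plus one following symbol)
%   F_N = -sum_{i,j} p(B_i, S_j) log2 p_{B_i}(S_j)
L = numel(seq) - N + 1;                          % number of N-grams in the sequence
grams = zeros(L, N);
for k = 1:L
    grams(k, :) = seq(k:k+N-1);                  % row k = block B_i followed by symbol S_j
end
[~, ~, gid] = unique(grams, 'rows');             % id of each distinct N-gram
[~, ~, bid] = unique(grams(:, 1:N-1), 'rows');   % id of each distinct block B_i
pg = accumarray(gid, 1) / L;                     % relative frequency p(B_i, S_j)
pb = accumarray(bid, 1) / L;                     % relative frequency p(B_i)
pcond = pg(gid) ./ pb(bid);                      % p_{B_i}(S_j) at each sequence position
[ug, first] = unique(gid);                       % one representative per distinct N-gram
FN = -sum(pg(ug) .* log2(pcond(first)));         % sum over distinct (B_i, S_j) pairs
end

For example, shannon_FN(double('the cat sat on the mat'), 3) returns the trigram estimate F_3, in bits per character, for that (very short) string.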
Word sequences: Chomsky
N. Chomsky, Syntactic Structures, 1957:
(1) Colorless green ideas sleep furiously.
(2) Furiously sleep ideas green colorless.
. . . It is fair to assume that neither sentence (1) nor (2)
(nor indeed any part of these sentences) has ever
occurred in an English discourse. Hence, in any
statistical model for grammaticalness, these sentences
will be ruled out on identical grounds as equally
‘remote’ from English. Yet (1), though nonsensical, is
grammatical, while (2) is not.
Word sequences: Pereira
F. Pereira,
“Formal grammar and information theory: together again?”, 2000:
[Chomsky’s argument] relies on the unstated assumption that any
probabilistic model necessarily assigns zero probability to unseen
events. Indeed, this would be the case if the model probability
estimates were just the relative frequencies of observed events (the
maximum-likelihood estimator). But we now understand that this
naive method badly overfits the training data. […] To avoid this, one
usually smoothes the data […] In fact, one of the earliest such
methods, due to Turing and Good (Good, 1953), had been published
before Chomsky's attack on empiricism…
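
To make the zero-versus-nonzero point concrete, here is a minimal Matlab sketch of the Good-Turing idea applied to toy counts. It is a deliberately simplified stand-in (no smoothing of the counts-of-counts, fallback to raw counts where N_{r+1} = 0, and an explicit renormalization of the seen mass), not the estimator as Good (1953) presents it; the counts themselves are hypothetical.

% Toy Good-Turing sketch: reserve probability mass N_1/N for unseen events
% instead of the zero that maximum likelihood would assign.
counts = [5 3 2 2 1 1 1];              % hypothetical frequencies of 7 seen word types
N  = sum(counts);                      % total training tokens
Nr = accumarray(counts(:), 1);         % Nr(r) = number of types observed exactly r times
p_unseen = Nr(1) / N;                  % total mass reserved for unseen events (nonzero)
rstar = zeros(size(counts));
for k = 1:numel(counts)
    r = counts(k);
    if r < numel(Nr) && Nr(r+1) > 0
        rstar(k) = (r + 1) * Nr(r+1) / Nr(r);   % Good-Turing adjusted count r*
    else
        rstar(k) = r;                           % no types seen r+1 times: keep raw count
    end
end
p_ml   = counts / N;                            % maximum likelihood: unseen prob = 0
p_seen = (1 - p_unseen) * rstar / sum(rstar);   % smoothed, renormalized seen probabilities

With these toy counts the smoothed model holds back N_1/N = 3/15 = 0.2 of the probability mass for word types never seen in training, rather than the zero that the maximum-likelihood estimate p_ml implies.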
Word sequences: Pereira
Hidden variables … can also be used to create factored models of joint distributions
that have far fewer parameters to estimate, and are thus easier to learn, than models
of the full joint distribution. As a very simple but useful example, we may
approximate the conditional probability p(x, y) of occurrence of two words x and y
in a given configuration as
    p(x, y) \approx p(x) \sum_c p(y \mid c)\, p(c \mid x)
where c is a hidden “class” variable for the associations between x and y …
When (x, y) = (v_i, v_{i+1}) we have an aggregate bigram model … which is useful for
modeling word sequences that include unseen bigrams.
With such a model, we can approximate the probability of a string p(w_1 … w_n) by
    p(w_1 \ldots w_n) \approx p(w_1) \prod_{i=2}^{n} p(w_i \mid w_{i-1})
Word sequences: Pereira
Using this estimate for the probability of a string and an aggregate model with
C=16 trained on newspaper text using the expectation-maximization method,
we find that
    \frac{p(\text{Colorless green ideas sleep furiously})}{p(\text{Furiously sleep ideas green colorless})} \approx 2 \times 10^5
Word sequences: concepts & problems

Concepts:
- Information entropy
  (and conditional entropy, cross-entropy, mutual information)
- Markov models, N-gram models
- Chomsky hierarchy (first glimpse)
Problems:
- Entropy estimation algorithms
  (n-gram, LZW, BW, etc.)
- LNRE smoothing methods
- EM estimation of hidden variables
  (learned earlier for gaussian mixtures)
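
As one illustration of the last problem, here is a minimal Matlab sketch of EM for a one-dimensional two-component Gaussian mixture on synthetic data; all names and the data are hypothetical, and the same E-step / M-step pattern carries over to estimating the hidden class variable in the aggregate bigram model above.

% EM for a 1-D mixture of two Gaussians (toy problem-set sketch on synthetic data).
rng(0);
x = [0.5*randn(1,200) + 0.0, 1.0*randn(1,300) + 3.0];        % two overlapping clusters

mu = [min(x), max(x)];   sigma = [1, 1];   w = [0.5, 0.5];   % crude initialization
gauss = @(x, m, s) exp(-(x - m).^2 ./ (2*s.^2)) ./ (sqrt(2*pi)*s);

r = zeros(2, numel(x));                 % responsibilities (posterior class probabilities)
for iter = 1:100
    % E-step: posterior probability of each component given each data point
    for k = 1:2
        r(k, :) = w(k) * gauss(x, mu(k), sigma(k));
    end
    r = r ./ sum(r, 1);
    % M-step: re-estimate mixture weights, means, and standard deviations
    nk = sum(r, 2)';
    w  = nk / numel(x);
    for k = 1:2
        mu(k)    = sum(r(k, :) .* x) / nk(k);
        sigma(k) = sqrt(sum(r(k, :) .* (x - mu(k)).^2) / nk(k));
    end
end

After convergence, mu and sigma should land close to the generating parameters (0, 0.5) and (3, 1.0).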