Tanya Korelsky - The COCOSDA home page

NSF Funding of LT resources
Tanya Korelsky, Program Director
Robust Intelligence Cluster
Division of Information and Intelligent Systems
Directorate for Computer and Information Science and Engineering
National Science Foundation
[email protected]
http://www.nsf.gov/
How NSF is organized
Office of the Director
Biological Sciences
Geosciences
Computer and Information Science and Engineering
Mathematical and Physical Sciences
Education and Human Resources
Social, Behavioral and Economic Sciences
Engineering
How CISE is organized
Office of the Director
Office of the Assistant Director for CISE
CCF: Computing and Communications Foundations (Clusters)
CNS: Computer and Network Systems (Clusters)
IIS: Information and Intelligent Systems (Clusters)
OCI: Office of Cyberinfrastructure (formerly SCI, now with NSF-wide mission, reporting to Director of NSF)
Crosscutting Emphasis Areas
Funding Rate for Competitive Awards in CISE
[Chart: Competitive Proposal Actions, Competitive Awards, and Funding Rate by fiscal year, 1994 to 2004. Axes: Number of Proposals and Awards (0 to 7,000) and Funding Rate (0% to 100%).]
CISE Proposal/Award Statistics
FY    Proposals  Awards  Funding Rate  CGIs   Supplements
2005  4,962      1,086   23%           1,398  581
2004  6,266      1,017   16%           1,297  400
2003  5,346      1,174   22%           1,023  354
2002  4,314      1,038   24%           918    308
2001  3,579      885     25%           768    231
2000  2,853      903     32%           547    210
1999  2,209      746     34%           493    301
1998  1,885      667     35%           476    211
1997  1,894      684     36%           527    219
1996  1,760      601     34%           610    183
1995  1,941      708     36%           631    215
*ADJUSTED
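The Funding Rate column appears to be awards divided by proposals, rounded to a whole percent. The sketch below checks that reading against the figures listed above. The definition and the function name `funding_rate` are assumptions for illustration, not something the slides state. Under this assumption every year matches except FY 2005, which computes to 22% rather than the listed 23%; that may be connected to the "*ADJUSTED" note.

```python
# (fiscal year, proposals, awards, funding rate in percent as listed on the slide)
ROWS = [
    (2005, 4962, 1086, 23),
    (2004, 6266, 1017, 16),
    (2003, 5346, 1174, 22),
    (2002, 4314, 1038, 24),
    (2001, 3579, 885, 25),
    (2000, 2853, 903, 32),
    (1999, 2209, 746, 34),
    (1998, 1885, 667, 35),
    (1997, 1894, 684, 36),
    (1996, 1760, 601, 34),
    (1995, 1941, 708, 36),
]


def funding_rate(proposals, awards):
    """Awards as a whole-number percentage of proposals (assumed definition)."""
    return round(100 * awards / proposals)


# Fiscal years where the computed rate differs from the listed one.
mismatches = [fy for fy, p, a, listed in ROWS if funding_rate(p, a) != listed]

for fy, proposals, awards, listed in ROWS:
    print(f"FY {fy}: computed {funding_rate(proposals, awards)}%, listed {listed}%")
print("Years that differ from the listed rate:", mismatches)
```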
CISE Budget: 2003-2007
[Chart: CISE budget in millions of dollars by fiscal year, 2003 through the 2007 request; labeled values are $496M and $527M.]
Requested 6.1% increase includes $20M for cybersecurity and $10M for GENI
The Human Language and Communication Program (HLC)
Initiated by Dr. Mary Harper
• The HLC program emphasizes innovative advances in computer and information sciences relating to all forms of human communication.
• High-level human communication topics:
  - Text Processing
  - Speech Processing
  - Multimodal Communication Processing
• HLC is attempting to strengthen current research while broadening future research directions of the language processing research community (e.g., multimodal communication).
HLC/ITR LT recent resource, annotation and evaluation metrics awards
• ITR ’03: Collaborative effort on Interlingual Annotation
• HLC ’04: Constructing an Enhanced Version of WordNet, $100K (12 months)
• HLC ’05:
  - Rapid Development of Frame Semantic lexicon, to ICSI, UC Berkeley, $400K (36 months)
  - SGER: Learning Syntax-based Evaluation Metrics for Machine Translation, Dr. Rebecca Hwa, University of Pittsburgh, $200K (24 months)
  - A Framework for Learning High Accuracy Evaluation Metrics for NLP Applications, Dr. Alon Lavie, CMU, $150K (24 months)
CISE CRI (Computing Research Infrastructure) Program
• Funds community resources for IIS programs; reviewers are supplied by the technical program directors
• ’04 LT resource planning award to Vassar College: An Open Linguistic Infrastructure for American English, $50K (12 months)
• ’05 LT resource/annotation awards:
  - Towards a Comprehensive Linguistic Annotation of Language (Brandeis, UColorado, Pitt, Penn, NYU), $850K, 24 months; goals include achieving an international consensus on a meta-specification framework
  - Another planning award ($100K) to Vassar College and Princeton University: An Open Linguistic Infrastructure for American English; goals include annotation of semantic categories using WordNet and FrameNet
Information and Intelligent Systems
Reorganization into Clusters
• Robust Intelligence: Artificial Intelligence, Human Language and Communication, Robotics, Computer Vision, Computational Neuroscience
• Human-centered Computing: Human Computer Interaction, Social Informatics, Universal Access
• Information Integration and Informatics: Data, Information, and Knowledge Management; Information Integration; Science and Engineering Informatics; Digital Libraries; Digital Government
Information and Intelligent Systems
New Cluster-oriented Solicitation
• Scheduled to be published in May, with a submission deadline in late October to early November
• One of the cross-cutting threads: Human-Robot Interaction
• Implications for the HLC area: renewed attention to
  - dialogue (human-human, machine-human);
  - ASR of imperfect and affected speech;
  - speech-to-concept understanding; concept-to-speech generation
• Need corpora to support these research areas!
One Small Current Effort
• SGER (Small Grant for Exploratory Research)
  - Creation of a Goal-Oriented, Human-Machine Spoken Corpus
  - ICSI (UC Berkeley), Dr. Dilek Hakkani-Tur
    - Building a spoken mixed-initiative dialogue system for conference services
    - Deploying the system for the IEEE SLT Workshop (December 2006)
    - Collecting and annotating the dialogue corpus
Digital Tools Summit at Michigan State University (June 2006)
• Funded jointly by the Linguistics Program and (former) HLC program
• Addresses a functionality gap between the tools that documentary linguists and typologists need and the ability of existing tools to annotate partially-understood linguistic data
• Existing methods and tools presuppose a regularized digital corpus of a well-understood language and require a high degree of computational sophistication
• Aims to develop a roadmap for creating regional and national language archives and the tools to achieve it
• Brings together theoretical computational linguists and “data-driven” linguists to brainstorm the challenging issues
NSF perspective on funding LT resources
• New corpora for dialogue research
• New corpora for ASR research:
  - mixed language (English-Spanish)
  - affected speech (911 calls); senior speech
• New general corpora (ANC), both text and speech
• Dependency treebanks and parsers
• Harmonization of existing semantic resources (WordNet and FrameNet)
• Basic research on semantic annotation: ambivalent attitude to standardization
NSF perspective on funding LT resources (international resources)
• Parallel corpora for new MT research on statistical methods applied to syntactic and semantic representations
• Research on MT for minority languages (pending award to CMU for Inupiaq and Aymara)
• Corpora for research on language identification
• International collaboration on speech processing (NYU-EBIRE-CNRS) and on unified linguistic annotation
• International workshop on dependency representations (2007 ACL in Prague)
Thank you
Tanya Korelsky
Robust Intelligence
Human Language and Communication
Division of Information and Intelligent Systems
Directorate for Computer and Information Science and Engineering
National Science Foundation
[email protected]
http://www.nsf.gov/
Digital Living 2010
People across the globe will have access to each other and
information provided by pervasive devices, embedded sensors
and systems because all will be connected to the Internet.
Communications
Games
Photography
Inventory/Sales tracking
Entertainment Systems
Banking and Commerce
Health/Medical
Home Computer
Home Appliances
Surveillance and Security (at home, work, or in public)
PDA
Telephone
Car
Building Automation
Thanks to David Kotz at Dartmouth
Global Environment for Networking Innovations (GENI)
Limitations of the Internet
• Security mechanisms not included in the IP layer
• End-to-end robustness cannot be assumed or assured
• Scaling limitations
• Quality of service mechanisms have not diffused widely in the public Internet
• Support for new technologies difficult (e.g., wireless, mobility, sensors)
Global Environment for Networking Innovations
• New networking and distributed system architectures
• Build in security and robustness
• Enable pervasive computing, bridging the gap between the physical and virtual worlds by including mobile, wireless and sensor networks
• Enable control and management of other critical infrastructures
• Include ease of operation and usability
• New classes of societal-level services and applications
Global Environment for Networking Innovations
Research Program
• Supports research, design, and development of new networking and distributed systems
• Builds on many years of knowledge and experience, but reexamines all networking assumptions and reinvents where needed
• Designs for intended capabilities; deploys and validates architectures; builds new services and applications
• Encourages users to participate in experimentation
• Takes a system-wide approach to the synthesis of new architectures
Global Environment for Networking Innovations
Facility
• Shared use through slicing and virtualization (where "slice" denotes the subset of resources bound to a particular experiment)
• Access to physical facilities through programmable platforms (e.g., via customized protocol stacks)
• Large-scale user participation by "user opt-in" and IP tunnels
• Protection and collaboration among researchers by controlled isolation and connection among slices
• A broad range of investigations using new classes of platforms and networks, a variety of access circuits and technologies, and global control and management software
• Interconnection of independent facilities via federated design.
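The slicing idea above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Facility` class, its method names, and the resource names are hypothetical and do not correspond to any actual GENI interface. It assumes each virtualized resource unit is bound to at most one slice, and that slices are isolated unless explicitly connected.

```python
from dataclasses import dataclass, field


@dataclass
class Facility:
    """Toy model of a shared facility; not an actual GENI interface."""
    resources: set                               # all virtualized resource units
    slices: dict = field(default_factory=dict)   # experiment -> its resource units
    links: set = field(default_factory=set)      # explicitly connected slice pairs

    def create_slice(self, experiment, wanted):
        """Bind the requested units to one experiment, keeping slices disjoint."""
        wanted = set(wanted)
        if not wanted <= self.resources:
            raise ValueError("unknown resource requested")
        in_use = set().union(*self.slices.values())
        if wanted & in_use:
            raise ValueError("resource already bound to another slice")
        self.slices[experiment] = wanted

    def connect(self, a, b):
        """Allow two existing slices to communicate (controlled connection)."""
        if a not in self.slices or b not in self.slices:
            raise KeyError("both slices must exist")
        self.links.add(frozenset((a, b)))

    def can_communicate(self, a, b):
        """Slices are isolated unless they were explicitly connected."""
        return a == b or frozenset((a, b)) in self.links


facility = Facility(resources={"node1", "node2", "node3", "link1"})
facility.create_slice("routing-exp", {"node1", "link1"})
facility.create_slice("sensor-exp", {"node2"})
isolated_by_default = not facility.can_communicate("routing-exp", "sensor-exp")
facility.connect("routing-exp", "sensor-exp")
connected_on_request = facility.can_communicate("routing-exp", "sensor-exp")
print(isolated_by_default, connected_on_request)
```

Keeping slices disjoint is the simplest way to model isolation; a real facility would presumably let several slices share one physical device through virtualization, which this sketch folds into the notion of a "resource unit".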
Global Environment for Networking Innovations
Outreach
• CISE has supported numerous community workshops in support of GENI
• CISE is supporting on-going planning efforts, including needs assessment and requirements for the GENI Facility.
• CISE will hold town meetings and continue to support future workshops to broaden community participation.
• CISE will work with industry, other US agencies, and international groups to broaden participation in GENI beyond NSF and the US government.