UNCLASSIFIED
Searching for the Quantifiable, Scalable,
Verifiable, and Understandable
Dewey Murdick, Ph.D.
Program Manager
Quantitative Methods in Defense of National Security, 25 May 2010
Intelligence Advanced
Research Projects Activity
(IARPA)
Overview
IARPA’s mission is to invest in high-risk/high-payoff research programs that
have the potential to provide the U.S. with an overwhelming intelligence
advantage over our future adversaries

• This is about taking real risk.
– This is NOT about “quick wins”, “low-hanging fruit”, “sure things”, etc.
• CAVEAT: HIGH-RISK/HIGH-PAYOFF IS NOT A FREE PASS FOR STUPIDITY.
– Competent failure is acceptable; incompetence is not.
• “Best and brightest”.
– World-class PMs.
o IARPA will not start a program without a good idea and an exceptional person to
lead its execution.
– Full and open competition to the greatest possible extent.
• Cross-community focus.
– Address cross-community challenges
– Leverage agency expertise (both operational and R&D)
– Work transition strategies and plans
The “P” in IARPA is very important
• Technical and programmatic excellence are required
• Each Program will have a clearly defined and measurable end-goal,
typically 3-5 years out.
– Intermediate milestones to measure progress are also required
– Every Program has a beginning and an end
– A new program may be started that builds upon what has been
accomplished in a previous program, but that new program must
compete against all other new programs
• This approach, coupled with rotational PM positions, ensures that…
– IARPA does not “institutionalize” programs
– Fresh ideas and perspectives are always coming in
– Status quo is always questioned
– Only the best ideas are pursued, and only the best performers are
funded.
The “Heilmeier Questions”
1. What are you trying to do?
2. How does this get done at present? Who does it? What are the
limitations of the present approaches?
– Are you aware of the state-of-the-art and have you thoroughly thought
through all the options?
3. What is new about your approach? Why do you think you can be
successful at this time?
– Given that you’ve provided clear answers to 1 & 2, have you created a
compelling option?
– What does first-order analysis of your approach reveal?
4. If you succeed, what difference will it make?
– Why should we care?
5. How long will it take? How much will it cost? What are your mid-term
and final exams?
– What is your program plan? How will you measure progress? What are your
milestones/metrics? What is your transition strategy?
The Three Strategic Thrusts (Offices)
• Smart Collection: dramatically improve the value of collected data
– Innovative modeling and analysis approaches to identify where to
look and what to collect.
– Novel approaches to access.
– Innovative methods to ensure the veracity of data collected from a
variety of sources.
• Incisive Analysis: maximizing insight from the information we collect,
in a timely fashion
– Advanced tools and techniques that will enable effective use of
large volumes of multiple and disparate sources of information.
– Innovative approaches (e.g., using virtual worlds, shared
workspaces) that dramatically enhance insight and productivity.
– Methods that incorporate socio-cultural and linguistic factors into
the analytic process.
– Estimation and communication of uncertainty and risk.
• Safe and Secure Operations: countering new capabilities of our
adversaries that could threaten our ability to operate effectively in a
networked world
– Cybersecurity
o Focus on future vulnerabilities
o Approaches to advancing the "science" of cybersecurity, to include
the development of fundamental laws and metrics
– Quantum information science & technology
Program Manager Interest Areas by Office
[Figure: Program Manager interest areas grouped by office: smart collection, incisive analysis, safe and secure operations]
Concluding Thoughts on IARPA
• Technical Excellence & Technical Truth
– Scientific Method
– Peer/independent review
– Full and open competition
• We are looking for outstanding PMs.
• How to find out more about IARPA:
www.iarpa.gov

Conference on Technical Information Discovery, Extraction & Organization
– Mark Heiligman, IARPA PM, Mile-wide, Mile-deep (M2) Exploration
– Held October 28-29, 2008, consisted of talks, breakout sessions, and open discussion
– Attended by 30+ researchers, business intelligence, and government participants

Facilitated an open and active discussion on current methods, challenges,
and opportunities in:
– Information Retrieval
– Text Processing
– Knowledge Discovery
– Information Extraction
– Social Network Analysis
– Scientometrics
– Information Visualization and
– Closely related research domains

This talk is a personal summary of the materials presented and
discussed at the conference.

Goal: Drive technical innovation and explore novel applications in the area
of systematically mining the global technical literature for useful and
non-obvious information and insights
M2 Information Content
• Formal Presentations
– Mile-wide, Mile-deep, Mark Heiligman, IARPA
– Information Retrieval, Scientometrics/Text Mining, and Literature-related
Discovery and Innovation, Ron Kostoff, MITRE
– From Knowledge Mapping to Innovation Evolution, Hsinchun Chen, University of
Arizona
– Machine Learning for Extraction, Integration and Mining of Research Literature,
Andrew McCallum, University of Massachusetts Amherst
– Information Retrieval: The Path Ahead, Jamie Callan, Carnegie Mellon University
– Sentiment Analysis from User Forums, Ronen Feldman, Hebrew University
– The Accuracy of a Map of Science: Measurement & Implications, Richard
Klavans, SciTech Strategies, Inc
– Document Classification Using Nonnegative Matrix Factorization, Michael W.
Berry, University of Tennessee, Knoxville
• Breakout Sessions & Open Discussion – richest idea content, and biggest
contribution to what follows
• MITRE Summary:
– A Two-step Analytic-workshop Process For Identifying Promising Research
Opportunities, by Ronald Kostoff et al.
Problems
• Too Much Data / Diversity
– Scale
– Textual / Multimedia
– Multilingual
– Multiple Sources
• Too Complex
– Motivation (Create / Disseminate)
– Topics / Domains (# / Connectedness)
– Shared Intentionally or Not
• Too Fast – Streaming
Example for Technical Topics:
Scientific Literature, Patents, Conference Proceedings, Talks, Technical Blogs,
S&T News, Social Media, Experimental Data, Computational Models / Code,
Forecasts, Corporate Filings, Government Funding, Policy, Public Opinion, etc.
Weak Signals in Context
• Find weak signals
• Use weak signals within context for
– Finding connections
– Anomaly detection/rare events
– Cultural meaning / implications
• Manage uncertainty
• Develop new standards for “ground truth”
Connecting Weak Signals
• Automated Connection Making / Knowledge Discovery
– Iterative information retrieval (IR), information extraction (IE), and linkage
identification
– Leveraging previous relevancy judgments and feedback
– Probabilistic linking of subjective qualities within text
• Goal: find high-value, low-signature information in context
[Figure labels: “Material processing method X may be interesting for property Y!”; “Intriguing Rumors, Uncertain Source”; Analyst; Analyst w/ Quantitative System]
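One simple way to picture automated connection making is co-occurrence linking across documents: two terms that never appear together may still be connected through terms they each share with others. The sketch below is only an illustration of that idea, not a method presented at the conference. The function names, the toy corpus, and the term labels are all hypothetical, and it assumes each document has already been reduced to a set of extracted terms.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Map each term to the set of terms it shares at least one document with."""
    neighbors = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def indirect_links(docs, start):
    """Rank terms never seen with `start` by how many bridging terms connect them."""
    neighbors = cooccurrence(docs)
    direct = neighbors.get(start, set())
    bridges = defaultdict(set)
    for b in direct:
        for c in neighbors[b]:
            # Keep only terms that are not already directly linked to `start`.
            if c != start and c not in direct:
                bridges[c].add(b)
    # Most bridging terms first; ties broken alphabetically for stable output.
    return sorted(bridges.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Hypothetical toy corpus: one set of extracted terms per document.
docs = [
    {"method X", "grain refinement"},
    {"method X", "residual stress"},
    {"grain refinement", "property Y"},
    {"residual stress", "property Y"},
    {"residual stress", "property Z"},
]
candidates = indirect_links(docs, "method X")
for term, via in candidates:
    print(term, "via", sorted(via))
```

In this toy corpus "method X" and "property Y" never share a document, yet two bridging terms connect them, so "property Y" ranks first. A real system would need to weight links probabilistically and filter out very common terms.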
Enhancing Contextual Awareness
• Automatically
– Leverage element characteristics in connection building process
– Focused information augmentation from secondary sources
– Characterize and apply to analogous situations
o Network Behaviors and Features
o Assessments of subjectivity (e.g., theme, sentiment)
• Goal: rapidly inform non-experts with context about a given area/issue
[Figure labels: Context; Analyst]
Identifying Outliers, Rare Events
• Automatically
– Measuring and analyzing low-frequency indicators in group trends
– Systematically identifying anomalies from records of interest and early-stage
emerging technologies
– Identifying rare events based on non-technical phrase association patterns
– Extracting technical phrases of interest by targeting non-technical phrases
such as sentiment, analysis, stylistics, etc.
– Intelligent clustering techniques
• Goal: Identify significant rare events
[Figure labels: “Is Jim doing something illegal?”; Bank statements; Analyst]
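As a minimal illustration of flagging an anomaly against a group trend, the sketch below uses the modified z-score, which is built on the median and the median absolute deviation (MAD) so that a single extreme record does not mask itself. This is one common technique chosen for illustration, not something the slides prescribe. The function name, the 3.5 cutoff default, and the per-person counts are assumptions.

```python
from statistics import median


def robust_outliers(counts, threshold=3.5):
    """Return the keys whose value deviates strongly from the group median."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the group, so nothing can be scored
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [k for k, v in counts.items()
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical monthly transaction counts taken from bank statements.
monthly_transfers = {"ann": 12, "bob": 14, "cara": 11, "dev": 13, "eve": 12, "jim": 41}
flagged = robust_outliers(monthly_transfers)
print(flagged)
```

A flag of this kind only marks a record as unusual relative to the group. Deciding whether it is significant still requires context and an analyst.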
Collaboration
(Two Different Kinds)
• Common playground facilitating:
– Large-scale data sharing
– Data discovery annotation
– Error corrections
– Multi-source integration
– Recall of what has been done in the past
• Measure collaboration
– Recognize cultural differences
– Discover key players
– Process changes over time
Multilingual Methods
• Need algorithms that can process, filter, and
analyze multilingual data
• Leverage domain-specific machine
translation
• Compare and contrast translated and
multilingual data for improvements in
queries, trends, etc.
• Language translation is high cost
• Translation is not enough to understand
meaning in non-English text
• Cultural information helps to understand
social landscape, motivation, and production
of scientists in S&T
No Black Boxes
• No algorithm black boxes
– Shared environment for algorithm development
– Success verifiable through indicator metrics
– Output must be humanly comprehensible
• Human comprehension metrics:
o Number of potential associations
o Number of dimensions simultaneously analyzed
o Steps to finding information
o Amount of time to digest information
o Amount of information at a time
o Efficiency of user-driven tuning of level-of-detail
• Algorithmic output exportable to interactive tools
User-Friendly Displays for Data Analysis
• Interactive and multifaceted views of
scientific landscape
– Geo-location
– Entity Networks
– Topical Networks
• Environments that provide both
contextual awareness and
visualizations
– Contextual information
(Wikipedia style) provided when
user encounters unfamiliar term
or concept
• Interactive interfaces to pull out
information
Metric Validation Processes
• User studies and human labeling to verify
data in information extraction (IE) and NLP
are costly
• Use hybrid methods (e.g., boosting)
• Leverage automatically processed
information from an external source to
validate output
• Automate identification of trusted sources
to help the validation process
• Validate results with historical studies,
knowledge of current state, and forecasts
Serious Need for Novel Thinking
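Validating extraction output against a trusted external source usually comes down to comparing two sets of items. The sketch below computes precision, recall, and F1 for a set of extracted (entity, affiliation) pairs against a reference set. It is a generic illustration, assuming both sides have been normalized to comparable tuples. The function name and the sample pairs are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Score extracted items against a reference ("gold standard") set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference pairs, e.g. taken from a trusted external source.
gold = {
    ("author A", "org 1"),
    ("author B", "org 2"),
    ("author C", "org 3"),
    ("author D", "org 4"),
}
# Hypothetical system output: two correct pairs and one wrong affiliation.
extracted = {
    ("author A", "org 1"),
    ("author B", "org 2"),
    ("author C", "org 9"),
}
precision, recall, f1 = precision_recall_f1(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The scores are only as trustworthy as the reference set, which is why identifying trusted sources matters for the validation process.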
Things to Remember
• Track Uncertainty
– Indicator metrics
– Weak signals
• No black boxes
– Human comprehensible output
• Provide clear view of evaluation metrics
– Gold standards
– Ground truth
Take Action
• Respond to an open BAA
• Chat with a Program Manager (PM)
• Come up with new ideas for programs,
become a PM
• Provide information to open RFIs