Key Challenges for Theoretical Computer Science

Key Challenges for
Theoretical Computer Science
Richard M. Karp
NSF, August 31, 2005
What is Theoretical Computer Science?
TCS applies abstract models and mathematical
reasoning to problems related to computation.
• Provides a set of tools and ways of thinking applicable to
a wide variety of applied problems.
• Contributes to national security through cryptographic
protocols and to computational science through
fundamental algorithms.
Its core: fundamental and interrelated
questions about the nature of computation.
Topics at the Core of TCS
 Algorithms and complexity of computation
 Computational limits of proof methods
 Logic and program verification
 The power of randomization
 Cryptography
 Quantum computation
 Distributed computation and communication
 Computational learning theory
The Revolutionary Impact of Algorithms
 Optimization
 Scientific computing
 Cryptography
 Genome sequencing
 Compiler construction
 Algebraic computation
 Data structures
Fundamental Questions: Complexity and Algorithms
• Two possible worlds:
  • (P = NP) Combinatorial search easy but
    cryptography impossible.
  • (P ≠ NP) Combinatorial search hard but
    unbreakable cryptography possible.
• Find the best algorithms for multiplying numbers,
the discrete Fourier transform and matrix multiplication.
• Which tautologies in propositional logic have
short proofs?
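The multiplication question can be made concrete with a small sketch (illustrative, not from the talk): Karatsuba's method multiplies large integers with three recursive half-size products instead of the naive four, beating the schoolbook algorithm.

```python
def karatsuba(x, y):
    """Multiply non-negative integers using three half-size recursive
    products (hi*hi, lo*lo, (hi+lo)*(hi+lo)) instead of the naive four."""
    if x < 10 or y < 10:
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    a = karatsuba(x_hi, y_hi)                          # high product
    b = karatsuba(x_lo, y_lo)                          # low product
    cross = karatsuba(x_hi + x_lo, y_hi + y_lo) - a - b
    return (a << (2 * half)) + (cross << half) + b

assert karatsuba(1234567, 7654321) == 1234567 * 7654321
```

The same divide-and-conquer spirit underlies the fast Fourier transform and fast matrix multiplication; the open question is how close to linear time such problems can be pushed.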
Fundamental Questions: Nature & Limits of Proofs
• Find the limits of computationally sound interactive proofs,
which prove a statement by performing a computation
that would be infeasible if the statement were false.
• Can we prove that a statement is true without
revealing any additional information?
(Prove you earned <$100K without revealing your salary.)
• Can we design proofs that can be verified by
spot checking rather than checking every step?
(Traditional proofs are only as strong as their weakest link.)
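The zero-knowledge idea can be illustrated with the classic graph-isomorphism protocol (a textbook example, not taken from the talk; the graph and permutation below are made up): the prover convinces the verifier that two graphs are isomorphic, yet each answer it reveals is just a uniformly random relabeling.

```python
import random

def relabel(edges, perm):
    """Apply a vertex permutation to an edge set."""
    return {frozenset((perm[u], perm[v])) for u, v in edges}

def zk_round(g1, g2, pi, n):
    """One round: the prover commits to a random relabeling H of g1,
    then answers a random challenge by opening an isomorphism from g1
    or from g2 to H. Either opening alone is a uniformly random
    permutation, so it leaks nothing about the secret pi."""
    sigma = list(range(n))
    random.shuffle(sigma)
    h = relabel(g1, sigma)                    # the commitment
    challenge = random.randint(0, 1)          # verifier's coin flip
    if challenge == 0:
        opening, target = sigma, g1
    else:
        inv_pi = {pi[v]: v for v in range(n)}
        opening = [sigma[inv_pi[v]] for v in range(n)]
        target = g2
    return relabel(target, opening) == h      # verifier's check

# Hypothetical instance: g2 is g1 (a 5-cycle) relabeled by the secret pi.
n = 5
g1 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
pi = [2, 4, 1, 3, 0]
g2 = [(pi[u], pi[v]) for u, v in g1]
assert all(zk_round(g1, g2, pi, n) for _ in range(50))
```

A cheating prover holding non-isomorphic graphs fails each round with probability 1/2, so repetition drives the soundness error down exponentially.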
Fundamental Questions: Randomness, Quantum, Crypto
• Does access to random numbers give more
computing power?
• Are hypothetical computers based on the principles of
quantum mechanics more powerful?
• Is there a cryptosystem where everyone can send an
encrypted message to Alice, but only she can
read it?
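The last question describes public-key encryption. A toy RSA-style sketch (textbook numbers, far too small to be secure) shows the asymmetry: the encryption key (n, e) is public, while the decryption exponent d stays with Alice.

```python
# Textbook-sized primes; real RSA uses primes hundreds of digits long.
p, q = 61, 53
n = p * q                       # public modulus
phi = (p - 1) * (q - 1)
e = 17                          # public exponent, coprime to phi
d = pow(e, -1, phi)             # Alice's private exponent (Python 3.8+)

message = 42
ciphertext = pow(message, e, n)           # anyone can encrypt to Alice
assert pow(ciphertext, d, n) == message   # only d recovers the message
```

Breaking the scheme from (n, e) alone appears to require factoring n, which is exactly where the two possible worlds of P vs. NP-style hardness come in.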
Fundamental Questions: Mechanism Design,
Distributed Comp., Learning
 Can we cause self-interested agents to co-operate
over the Internet?
 Can we reach agreement in the face of asynchronicity
and faulty parties?
 What are the inherent limits on the ability of
computers to infer patterns from examples?
Evolution of TCS
60s: Automata, Complexity, PL
70s: Random, NP-Complete, PK Crypto, Approx
80s: Learning, Derand, Parallel/Dist, Interactive Proofs
90s–: Boosting, Network Sec, Quantum Comp., Model Checking,
Mechanisms, Comp. Bio, Approx Hardness, Web Search,
Stat. Physics Con., Sublinear, Large Datasets
Evolution of TCS
(Same timeline, annotated with the field's recurring concerns.)
Persistent Themes: efficient algorithms, complexity,
nature of proofs, cryptography, randomness, verification
TCS’s Greatest Strength: Unexpected Pay-offs
(Diagram of unexpected connections among: NP-Completeness;
hardness amplification; PK Crypto (Blum-Micali-Yao etc.);
Boosting and AdaBoost in machine learning; Zero Knowledge;
interactive proofs and PCP; loss-resilient coding; list decoding.)
Emerging Challenges
• Theory of networked computation
  • Security and privacy
  • Incentives, pricing and sharing
  • Reliable communication
  • Massive distributed data sets
• Formal methods for reliable systems
• Ties to physical and biological sciences:
  • Statistical physics
  • Quantum computing
  • Computational biology
Theory of Networked Computation
• The emergence of large networks (e.g. the Web) is a
profound shift in the focus of CS.
• Networks are built, operated and used by parties with
diverse interests and varying degrees of
cooperation and competition.
• Challenges: build and manage large systems
consisting of autonomous parties.
• Ensure the rights of individuals and full and fair
exploitation of shared resources.
Internet Algorithmics
• Emerged with the spread of the Web.
• Produced significant results on
  • Search and information retrieval
  • Network protocols
  • Error correction
  • Peer-to-peer networks
  • E-commerce
  • Internet-based auctions
  • Mechanism design
  • Massive distributed data sets
Theory of Networked Computation: Agenda
• A theoretical complement to the GENI Initiative and the
Cyberinfrastructure program.
• Close in spirit to Patterson’s SPUR manifesto:
Security, Privacy, Usability, Reliability.
Security and Privacy
• Users today invoke complex financial interactions
with a single click.
• The current design of the Internet is based on trust,
providing inadequate protection against worms, viruses,
spam and identity theft.
• Must ensure appropriate use of information by a
dynamic and potentially large set of authorized
users.
Formal Models of Security
• Security cannot be tested by experimentation or
simulation.
• We need quantitative measures of security with
respect to realistic models of user behavior.
• Past TCS work: cryptographic primitives (RSA,
Diffie-Hellman, DES), protocols (signatures,
e-commerce, secure interactions), study of
protocol composition.
Security: Ongoing TCS Work
• Expand protocol design to address the scale, complexity
and interactivity of the modern environment.
• Use economic theory to obtain security through
properly positioned incentives.
• New techniques for sanitizing public data,
traceback, intrusion detection, etc.
Incentives, Pricing and Sharing
 Networks are built, operated and used by multiple
parties with diverse goals and interests.
 Algorithmic distributed mechanism design
studies economic mechanisms that induce
globally efficient behavior in self-interested
agents.
 Builds on algorithms, economic theory and game
theory.
 Areas of study: auctions, routing, congestion
control, caching, border gateway protocol, pricing
of multicast, network design, price of anarchy.
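A minimal mechanism-design example (illustrative, not from the talk) is the Vickrey second-price auction: charging the winner the second-highest bid makes truthful bidding a dominant strategy for self-interested agents.

```python
def vickrey(bids):
    """Second-price sealed-bid auction: the highest bidder wins but
    pays the second-highest bid. bids: dict agent -> bid."""
    ranked = sorted(bids, key=bids.get, reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(ranked) > 1 else 0
    return winner, price

assert vickrey({"a": 10, "b": 7, "c": 3}) == ("a", 7)
```

Overbidding risks winning at a loss and underbidding risks losing a profitable item, so reporting one's true value is optimal regardless of what others do; distributed mechanism design asks how far such incentive properties extend to routing, caching and network design.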
Massive Distributed Data Sets
 Robust trends in IT: ever-decreasing cost of data
storage, ever-increasing ubiquity of computers
and networks, accelerating deployment of sensor
networks and surveillance systems.
 New computational models: data streaming,
external memory and cache oblivious models,
sampling, property testing, sublinear time
algorithms.
 Randomization and approximation are essential.
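One of these streaming models can be sketched concretely (an illustrative choice, not necessarily an example from the talk): the Misra-Gries algorithm finds heavy hitters in a single pass using memory independent of the stream length.

```python
def misra_gries(stream, k):
    """One-pass frequent-items sketch with at most k-1 counters.
    Every item occurring more than len(stream)/k times is guaranteed
    to survive as a candidate (a second pass can remove false positives)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No room: decrement all counters, dropping any that hit 0.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = ["a"] * 50 + ["b"] * 30 + list("cdefg") * 4
candidates = misra_gries(stream, k=3)
assert "a" in candidates   # frequency 50 > 100/3, so "a" must survive
```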
Massive Data Sets: Challenges
 Data replication, placement, access and
persistence.
 Security and privacy, strategic and adversarial
behavior, complex data formats (images, video,
audio)
 Personalized search, complex queries, full-text
search, defenses against adversarial behavior by
web page owners.
Reliable Storage and Communication
• Maintaining the integrity of data is a classical
challenge to computing.
• Modern issues:
  • Explosion in the amount of data
  • Radical differences in the nature of communication and
    storage media
    – Communication medium = Internet
    – Storage medium = World Wide Web
Reliable Storage & Communications:
TCS Achievements & Challenges
 Achievements: ability to correct more errors, faster
error-correction algorithms, rateless codes,
checkable codes, list decoding, computationally
bounded channels.
 Challenges: more powerful error-correction
techniques, ultra-fast decoding, malicious errors,
integration with network protocols such as
multicast.
 Connections with probabilistically checkable
proofs, cryptographic protocols, pseudorandom
number generation.
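The flavor of error correction can be shown with the classical Hamming(7,4) code (a deliberately simple sketch; the achievements above concern far more powerful codes): four data bits become seven coded bits, and any single flipped bit is located by the parity-check syndrome and repaired.

```python
def hamming_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits (positions 1..7:
    p1, p2, d1, p3, d2, d3, d4)."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming_correct(c):
    """Fix at most one flipped bit, then return the 4 data bits.
    The syndrome bits spell out the 1-indexed error position."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

code = hamming_encode([1, 0, 1, 1])
code[3] ^= 1                              # one bit flipped in transit
assert hamming_correct(code) == [1, 0, 1, 1]
```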
Complexity Theory of Networked Computation
 Needed: A theory of the fundamental limits of
networked computation.
• How is a networked computational problem involving
multiple agents specified?
• What is meant by a correct solution?
 Require formal models capturing massive scale,
user self-interest, subnetwork autonomy,
distributed control, network failures.
 Must define computational resources and cost,
reductions between problems, complexity classes,
complete problems, intractable problems.
Formal Models of Reliable Systems
 Classical approach to reliability is simulation and
testing.
 Detects errors only in late stages of development;
coverage is only partial.
 More principled approach: rigorous mathematical
specification and formal verification of system
behavior.
 Need certified software with precise and well
understood specifications.
 Particularly critical in embedded systems and
autonomous medical applications.
Role of Logic
Logic provides languages for formalizing requirements:
 Floyd-Hoare logic for sequential programs
 Temporal and fixpoint logics for reactive
programs
 Logics tailored for authentication and security
properties of crypto protocols.
 Led to standardized industrial-strength formal
specification languages.
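A Floyd-Hoare specification can be mimicked with runtime assertions (an illustration, not from the talk): the triple {x >= 0} y := isqrt(x) {y*y <= x < (y+1)*(y+1)} becomes pre- and postcondition checks, though a verifier proves them once for all inputs rather than testing one run at a time.

```python
def isqrt_verified(x):
    """Compute floor(sqrt(x)) with the Hoare triple
    {x >= 0}  y := isqrt(x)  {y*y <= x < (y+1)*(y+1)}
    checked as runtime assertions."""
    assert x >= 0                               # precondition
    y = 0
    while (y + 1) * (y + 1) <= x:               # invariant: y*y <= x
        y += 1
    assert y * y <= x < (y + 1) * (y + 1)       # postcondition
    return y

assert isqrt_verified(10) == 3
```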
Role of Automata Theory
 Model checker SPIN uses linear temporal logic as
requirement language and an automata-theoretic
model checking algorithm.
 Research on timed and hybrid automata provides
foundation for the emerging area of embedded
systems.
Role of Decision Procedures
 Modern solvers for propositional satisfiability
used routinely on industrial-scale problems with
hundreds of thousands of variables.
 Symbolic fixpoint evaluation research led to
industrial interest in model checking.
 Decision procedures are continually being refined
and improved for use in verification tools.
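The solvers mentioned above descend from the DPLL procedure; a minimal sketch (unit propagation plus case splitting, without the clause learning, watched literals and heuristics of industrial solvers) looks like this. Clauses are lists of signed integers, so -3 means "variable 3 is false".

```python
def dpll(clauses, assignment=None):
    """Return a satisfying (partial) assignment, or None if unsatisfiable."""
    assignment = dict(assignment or {})
    changed = True
    while changed:                          # unit propagation to fixpoint
        changed = False
        simplified = []
        for clause in clauses:
            lits, satisfied = [], False
            for lit in clause:
                var, val = abs(lit), lit > 0
                if var in assignment:
                    if assignment[var] == val:
                        satisfied = True
                        break
                else:
                    lits.append(lit)
            if satisfied:
                continue
            if not lits:
                return None                 # empty clause: conflict
            if len(lits) == 1:              # unit clause forces a value
                assignment[abs(lits[0])] = lits[0] > 0
                changed = True
            simplified.append(lits)
        clauses = simplified
    if not clauses:
        return assignment                   # every clause satisfied
    var = abs(clauses[0][0])                # split on an unassigned variable
    for val in (True, False):
        result = dpll(clauses, {**assignment, var: val})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
assert dpll([[1, 2], [-1, 3], [-2, -3]]) is not None
```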
TCS Connections with Biology and the
Physical Sciences
 Statistical physics
 Quantum computation
 Computational Biology
Statistical Physics
 Studies macroscopic properties of large systems
involving simple microscopic components
undergoing local interactions.
Examples: freezing of water, ferromagnetism.
 CS analogy: global properties of WWW emerge from
local interactions; structure of complex combinatorial
problems derives from local constraints.
 Statistical physics studies random interactions;
TCS studies algorithms on random structures.
Phase Transitions and Sharp Thresholds
 Infinitesimal change in the parameters governing
local interactions causes a drastic change in
macroscopic behavior:
• Physics: transformation from water to steam.
• CS: random satisfiability instances switch from easy
to hard when ratio of clauses to variables passes a
critical value.
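The SAT threshold can be observed in a rough brute-force experiment (illustrative only; the variable counts here are far too small for the asymptotic threshold near 4.27 to appear exactly):

```python
import itertools
import random

def random_3sat(n_vars, n_clauses):
    """Random 3-SAT: each clause picks 3 distinct variables and signs."""
    return [[random.choice([1, -1]) * v
             for v in random.sample(range(1, n_vars + 1), 3)]
            for _ in range(n_clauses)]

def satisfiable(clauses, n_vars):
    """Brute-force satisfiability check over all 2^n assignments."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in clause)
               for clause in clauses):
            return True
    return False

random.seed(0)
n_vars, trials = 10, 15
results = {}
for ratio in (2.0, 4.3, 7.0):
    sat = sum(satisfiable(random_3sat(n_vars, int(ratio * n_vars)), n_vars)
              for _ in range(trials))
    results[ratio] = sat
    print(f"clause/variable ratio {ratio}: {sat}/{trials} satisfiable")
```

Below the threshold almost every instance is satisfiable; above it almost none is, and instances near the crossover are empirically the hardest for solvers.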
Cross-Fertilization
 Spin glasses are fluid at high temperatures but at lower
temperatures have many clusters of stable configurations.
Similarly, constraint satisfaction problems with sparse
constraints are fluid, but with dense constraints get stuck
in suboptimal solutions.
 Algorithmic paradigm based on this analogy has been
spectacularly successful.
 Markov Chain Monte Carlo used in physics as model of
evolution of a physical system, and in CS as technique
for approximation algorithms.
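A minimal Metropolis sketch on a one-dimensional Ising ring (illustrative parameters, not from the talk) shows the shared machinery: propose a local change, then accept it with a probability governed by the energy difference.

```python
import math
import random

def metropolis_ising(n=20, beta=0.5, steps=20000, seed=1):
    """Metropolis sampler for a 1D Ising ring (J = 1): propose flipping
    one spin, accept always if energy drops, else with prob exp(-beta*dE)."""
    random.seed(seed)
    spins = [random.choice([-1, 1]) for _ in range(n)]
    for _ in range(steps):
        i = random.randrange(n)
        # Energy change from flipping spin i (neighbors on a ring).
        d_e = 2 * spins[i] * (spins[i - 1] + spins[(i + 1) % n])
        if d_e <= 0 or random.random() < math.exp(-beta * d_e):
            spins[i] = -spins[i]
    return spins

print("magnetization:", sum(metropolis_ising()) / 20)
```

The identical loop, with an energy function encoding a combinatorial objective, is the sampling engine behind many randomized approximation algorithms.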
Quantum Computation
 Quantum mechanics holds the promise of
exponentially faster computers and perfectly
secure communication channels.
 Large numbers can be factored rapidly, allowing
RSA to be broken.
 To realize quantum computers, must guard
against decoherence, the dissipation of quantum
information into the environment.
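The classical half of Shor's factoring algorithm can be sketched directly (with brute-force order finding standing in for the quantum step, which is the only part a quantum computer accelerates):

```python
import math
import random

def order(a, n):
    """Multiplicative order of a mod n, by brute force. This is the
    step Shor's quantum algorithm performs exponentially faster."""
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def shor_classical(n, seed=7):
    """Classical reduction from order-finding to factoring n (n odd,
    composite, not a prime power)."""
    random.seed(seed)
    while True:
        a = random.randrange(2, n)
        g = math.gcd(a, n)
        if g > 1:
            return g                      # lucky guess shares a factor
        r = order(a, n)
        if r % 2 == 0 and pow(a, r // 2, n) != n - 1:
            f = math.gcd(pow(a, r // 2, n) - 1, n)
            if 1 < f < n:
                return f                  # gcd exposes a factor

print(shor_classical(15))
```

Since RSA's security rests on the hardness of factoring, a scalable quantum order-finder would break it, which is why the decoherence problem matters well beyond physics.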
Challenges for Theory of Quantum Computation
• Defeat decoherence using quantum error-correcting codes.
• Understand the structure of problems that can be
solved exponentially faster on quantum
computers than on classical ones.
• Use quantum computation as a test of the validity
of quantum mechanics.
Revolution in Biology
• The sequencing of the human genome is a landmark
event in the history of science.
• Biology is becoming a quantitative, information-based science.
• Goals:
  • Detailed, predictive model of how cells work at the
    molecular level.
  • Understand mechanisms of cancer, global
    organization of physiological systems, processes of
    development from embryo to complex organism.
  • Tailor therapy to the genetic makeup of individuals.
Role of Algorithms in Computational Biology
 Understand the information hidden in the
genome.
 Construct mathematical models of complex
cellular processes.
 Extract patterns from large biological data sets.
 Determine associations between genetic variation
and disease.
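As one concrete example of such algorithms (an illustrative sketch, not from the talk), global sequence alignment scores the similarity of two sequences by dynamic programming in O(mn) time:

```python
def align_score(s, t, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of s and t (Needleman-Wunsch
    style dynamic programming over an (m+1) x (n+1) table)."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * gap                       # s aligned against gaps
    for j in range(1, n + 1):
        dp[0][j] = j * gap                       # t aligned against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # match/substitute
                           dp[i - 1][j] + gap,       # gap in t
                           dp[i][j - 1] + gap)       # gap in s
    return dp[m][n]

assert align_score("GATTACA", "GATTACA") == 7
```

Variants of this recurrence, engineered for genome-scale inputs, sit at the heart of sequence comparison tools throughout computational biology.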