BellLabs06 - Computer Science, Columbia University


Henning Schulzrinne
Dept. of Computer Science
Columbia University
Networking Research: Moving from a Hardware
Model to a Service Model
October 3, 2006
Overview
• Disclaimers
– network systems research
– grossly oversimplified
• Technology evolution
• Research – the big picture
• Networking research
• New research themes:
– user-created content
– programmability
– TCO
Bell Labs
Lifecycle of technologies
• Traditional technology propagation: military → corporate → consumer
– military: opex/capex doesn’t matter; expert support; “Can it be done?”
– corporate: capex/opex sensitive, but amortized; expert support; “Can I afford it?”
– consumer: capex sensitive; amateur; “Can my mother use it?”
• Example labels on the diagram: COTS (e.g., GPS); IM, digital photo
Evolution of VoIP
• 1996-2000: “amazing – the phone rings”
– long-distance calling, ca. 1930
• 2000-2003: “does it do call transfer?”
– catching up with the digital PBX
• 2004-: “how can I make it stop ringing?”
– going beyond the black phone
IEEE Top 100 R&D Spenders
Does Research Pay?
• Benefits tend to accrue to everyone
– despite patent protection
– long-term industrial research seems to require a
monopoly
• Old Bell Labs, Microsoft
– national research: workforce training
• But leadership requires research
– vs. followers & copiers
• What is research?
– Is Cisco doing Research? Is Google?
– Is research anything that is published in a peer-reviewed journal?
– Has research paid off for Microsoft?
Why Research?
• Need goal clarity – multiple goals are OK, but goals often differ between researcher and organization
• Science
– new fundamental insights
• Engineering for the public good
– e.g., open standards
• Support product engineering
– new & better local products
• IPR development
– capture value developed elsewhere
• Annual report gloss
– should get funding from marketing
Science vs. Engineering
• Computer Science has identity crisis: applied math, experimental science or engineering?
• Applied math
– general abstractions & elegant models
– reality only a distant motivator
– metric: can it be published in J Applied Probability?
• Experimental science
– emphasis on general insights
– measurements & models
– often reflective: “analyze Gnutella structure”
– point solutions
– metric: does it fit Small World and is it self-similar? is it optimal?
• Engineering
– emphasis on real-world impact
– constrained by existing large systems
– system solutions: needs to play nice with rest of the world
– metrics: scalability, cost, maintainability, implementability
• Honesty about what we’re doing
Traditional research
• Inspired by physics or chemistry
• Physics: Theory → experiment → lab bench → prototype → (semiconductor) product
– Communications: Research → advanced development → development
• Necessary for hardware
• Dubious for software-intensive systems
– rewrite several times (if not forgotten)
– less qualified each time
– BL example: Unix
Who’s the customer?
• Business units → carrier → consumer
• Goals may not be identical
– BU: preserve investment, confirm earlier choices
• ATM, SS7
– Carrier: preserve product differentiation, business
model, customer lock-in, monopoly rent, …
• walled gardens, WAP, AAA, DRM, IMS, …
– Consumer: fashion, functionality, cost
• search engines, WiFi, MP3, Skype, web hosting, …
• Easier for some organizations
– e.g., Google: direct customer is advertiser, but revenue driven by page views → consumer
Good ideas
• Myth: Good ideas will win
– “Build a better mousetrap and the world will beat a path to your door.” (Ralph Waldo Emerson)
– modern version: IEEE 802.11 will dig through IEEE Infocom proceedings to find your master paper
– even most Sigcomm papers have had no (engineering) impact
• Myth: Just ahead of its time – it will take 10 years to have impact
– reality: most papers either have immediate impact or none, ever
• Mediocre ideas with commitment win over brilliant ideas without
– particularly if part of a larger system
– cost of understanding ideas
– possible encumbrances (patents)
– → researchers need to accompany their “children” through teenage years
Translation into Practice
• Relay model
– research → advanced development → product
– information loss rate of 95%?
– lack of sense of ownership
– hand-off: original owners have moved on to next
project
• Google model
– repeated, continuous refinement
– public beta
– no separate “research”
– still has problems with polish & completion
Big Bet vs. 1000 Flowers
• Old research approach:
– large bets, with huge, but rare pay-off
– however, more often cancelled
– nobody seems to be able to predict success early
– IH syndrome → “can’t be good”
• Another approach:
– low-cost projects – more feasible for software
– try it on customer (“beta”)
– fail early and often
• but threshold to success is lower
• find alternate routes to success (e.g., spin off as
open source project)
Standards Work
• Old approach
– standards group goes to Geneva
– Input: dinners
– Output: PowerPoint
– software groups convert finished standard into products (maybe)
• New approach
– standards contributors directly develop (or supervise) libraries, prototypes and other tools
• possibly in conjunction with academic research groups
– early, pre-completion feedback
– rapid early release → possible early implementation IPR
– train development staff
– participate in interop testing
Aside: Motivating Researchers
• Industry research attractors:
– no fund-raising
– better paid
– larger, more mature research groups (not just grad students)
– larger projects
• not just 1-student NSF projects
– impact on real products, not just papers
• engineering motivator vs. science motivator
• want to see real-world impact
• Danger of losing attractors
– projects that get cancelled semi-randomly
– legacy-support research
– rapid, fashion-driven changes in wind direction
– directional uncertainty (“why are we here?”)
– discouragement of community building & maintenance
Some network technology crystal-ball gazing
• Killer applications
• Internet evolution
• Resource scarcity: bits → humans
• Programmability
• Reliability
Killer Application
• Carriers looking for killer application
– justify huge infrastructure investment
– “video conferencing” (*1950 – †2000)
– ?
• “There is no killer application”
– Network television blockbuster → YouTube hit
– “Army of one”
– Users create their own custom applications that are
important to them
– Little historical evidence that carriers (or equipment
vendors) will find that application if it exists
• Killer app = application that kills the carrier
Bell Labs
17
Service Providers
• Old-style service providers:
– want to avoid being bit pipes only
• but only really successful at that
– major innovation: custom ring tones
• Boundary between service providers and software
providers vanishing
– “web 2.0” – software? service?
– APIs
Internet and networks timeline
[Figure: timeline, 1960 to 2010, by decade. Phases: theory; university prototypes; production use in research; commercial, early residential; broadband home. Port speeds: 100 kb/s, 1 Mb/s, 10 Mb/s, 100 Mb/s, 1 Gb/s. Topic labels: queuing, architecture, routing, cong. control; DQDB, ATM; QoS, VoD; p2p, ad-hoc, sensor. Internet protocols: email, ftp, TCP, UDP, SMTP, DNS, RIP, SNMP, finger, BGP, OSPF, Mbone, IPsec, HTTP, HTML, RTP, XML, OWL, SIP, Jabber.]
Networking research is fashion-driven
[Figure: the fashion cycle in networking research. Stages shown: workshop, white paper; DARPA, NSF → $$; EU Nth framework; Sigcomm, Infocom, Mobicom, ICNP; secondary conferences (“First (European) workshop on X”, “YAP on X”); networking courses; trailing-edge research. Topics shown: ATM, DQDB, QoS, active networks, mobile networks, wireless, ad-hoc, sensor.]
Cause of death for the next big thing
[Table: causes of death (rows) checked against technologies (columns); the per-cell check marks are not recoverable here.]
• Technologies: QoS, multicast, mobile IP, active networks, IPsec, IPv6
• Causes of death:
– not manageable across competing domains
– not configurable by normal users (or apps writers)
– no business model for ISPs
– no initial gain
– 80% solution in existing system
– increase system vulnerability
• Cell annotation: (NAT)
Maturing network research
• Old questions:
– Can we make X work over packet networks?
• All major dedicated network applications (flight reservations, embedded systems, radio, TV, telephone, fax, messaging, …) are now available on IP
– Can we get M/G/T bits to the end user?
– Raw bits everywhere: “any media, anytime, anywhere”
• New questions:
– Dependency on communications → Can we make the network reliable?
– Can non-technical users use networks without becoming amateur sysadmins? → auto/zero-configuration, autonomous computing, self-healing networks, …
– Can we prevent social and financial damage inflicted through networks (viruses, spam, DOS, identity theft, privacy violations, …)?
Impact of network research
• What’s promising/interesting – two different axes:
– Intellectual merit → interesting analysis, broadly applicable, …
– Satisfies practical needs → may not be a scientific breakthrough
• Field has few grand challenges and metrics
– cf., speech understanding or face recognition
• Depends largely on external technology inputs
– faster CPUs, better optical gear, compression
– typical performance improvements in queueing: 20-50%
• Networking research impact
– on deployed systems and protocols?
– on understanding network behavior?
– on other papers?
• Which of the 10,000 QoS papers had real impact?
• What papers were responsible for most important networking advances?
– TCP, web?, email?
Recent network R&D successes
• Early networking: success = thousands of other researchers, 80% PhD
• Success now = millions of users, 1% PhD
– iPod → ease of use
– Blogs, Wiki, YouTube, Wikipedia → user-created content
• ease of content creation for non-experts
– PHP, Ruby-on-Rails
• ease of development for non-experts
– Skype
• ease of configuration (none)
• Axiom: The chance of creating a successful new application is inversely proportional to the amount of formal network knowledge
– HTTP/1.0 would flunk any network design exam
What’s fashionable (and not)
• Judging from Infocom submissions and NSF panels:
– Security of any sort
– Peer-to-peer networks
– Sensor networks
– Overlay networks
– Network measurements
• Ideal paper
– “Ad-hoc MIMO sensor network exploiting small-world phenomena in peer-to-peer overlays”
• What’s not:
– QoS: scheduling, admission control, …
– Active networks
– Multicast
Infrastructure research questions: Scaling, Maintainability,
Security, …
• Scaling
– no major changes for 20+ years (link-state, DV, etc.)
– two-layer (intra/inter) → other routing paradigms
• Maintainability
– protocols and systems are not designed with fault diagnosis capabilities
– e.g., “transparent” proxies, routing, DNS, hacked traceroute
• Security
– secure routing protocols
– DOS prevention (pushback, source discovery)
… and Reliability
• We don’t know precisely why network applications fail
– components and backbones appear to be pretty reliable
– but we measured 99.5% of usable time → far below 99.999% in telecom networks
– lots of possible culprits, including DNS and carrier interconnects
• Temporary overloads
• Reduce operator errors
– e.g., XCONF effort in IETF
– inherently safe or fail-safe protocols?
• Faster convergence in routing protocols
– BGP → up to 20-30 minutes!
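To make the gap between 99.5% and 99.999% concrete, the two availability figures can be converted into expected downtime per year. This is only a back-of-the-envelope sketch: the helper name and the 365-day year are assumptions for illustration, not part of the talk.

```python
# Hypothetical helper: convert an availability percentage into
# expected downtime per year. Assumes a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours per year during which the service is unavailable."""
    return (1.0 - availability_percent / 100.0) * HOURS_PER_YEAR


measured_internet = downtime_hours_per_year(99.5)    # figure quoted on the slide
telecom_target = downtime_hours_per_year(99.999)     # "five nines"

print(f"99.5%   -> {measured_internet:.1f} hours/year")
print(f"99.999% -> {telecom_target * 60:.1f} minutes/year")
print(f"ratio   -> {measured_internet / telecom_target:.0f}x more downtime")
```

Under these assumptions, 99.5% corresponds to nearly two days of downtime per year, against a few minutes for 99.999%.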
Why do good ideas fail?
• Research: O(.), CPU overhead
– “per-flow reservation (RSVP) doesn’t scale” → not the problem
– at least now: routinely handle O(50,000) routing states
• Reality:
– deployment cost of any new L3 technology is probably billions of $
– coordination costs
• The QoS problem is a lawyer problem, not an engineering problem
• Cost of failure:
– conservative estimate (1 grad student year = 2 papers)
– 10,000 QoS papers @ $20,000/paper → $200 million
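The cost-of-failure estimate is simple arithmetic and can be checked directly. The sketch below uses only the slide's own figures; the variable names are illustrative, and the per-student-year cost is derived from those figures rather than stated in the talk.

```python
# Figures taken from the slide; variable names are illustrative.
QOS_PAPERS = 10_000
COST_PER_PAPER_USD = 20_000
PAPERS_PER_GRAD_STUDENT_YEAR = 2

# Total spend implied by the slide's estimate.
total_cost_usd = QOS_PAPERS * COST_PER_PAPER_USD

# Derived quantities (not stated on the slide).
grad_student_years = QOS_PAPERS // PAPERS_PER_GRAD_STUDENT_YEAR
cost_per_student_year_usd = COST_PER_PAPER_USD * PAPERS_PER_GRAD_STUDENT_YEAR

print(f"total cost:            ${total_cost_usd:,}")
print(f"grad student years:    {grad_student_years:,}")
print(f"cost per student year: ${cost_per_student_year_usd:,}")
```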
Paper counts by search term:
        QoS      quality-of-service
IEEE    10,377   12,876
ACM     3,487    4,388
Resource Scarcity
• Old model:
– scarce disk, memory, CPU, bandwidth
• New model:
– disks at <$.50/GB, memory at <$150/GB
– 22 million SMEs
– 100 million households
– system administrator: call center in India + teenager (or son/daughter)
• Missing:
– automated installation
– self-diagnosis
– automated scaling across multiple servers
• single-server web-app: trivial
• 2-server web-app: circular master-slave with custom file sync
– automated backup and recovery
In more detail…
• Deployment problems
• Layer creep
• Simple and universal wins
• Scaling in human terms
• Cross-cutting concerns, e.g.,
– CPU vs. human cycles
• we optimize the $100 component, not the $100/hour labor
– introspection
– graceful upgrades
– no policy magic
Transition in cost balance
• Total cost of ownership
– Ethernet port cost ≈ $10
– about 80% of Columbia CS’s system support cost is staff cost
• about $2500/person/year ≈ 2 new PCs/year
• much of the rest is backup & license for spam filters
• Does not count hours of employee or son/daughter time
• PC, Ethernet port and router cost seem to have reached
plateau
– just that the $10 now buys a 100 Mb/s port instead of
10 Mb/s
CRF 2007 budget precis
Item                                   Amount ($)
toner                                  16,000
tapes                                  8,000
supplies (cables, tools, parts, …)     8,000
misc. (shipping, books, …)             3,000
software licenses                      22,000
maintenance                            12,000
cell phones, cable modems              7,000
hardware (servers)                     ~22,000
file storage + backup, amortized       20,000
Staff (5), incl. fringe                500,000
Total                                  618,000
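As a quick check, the budget lines can be summed and the staff share computed. This sketch assumes each amount pairs with the label listed in the same order on the slide, and treats the approximate hardware figure as exactly 22,000; the dictionary keys are shortened labels chosen for illustration.

```python
# Budget lines in dollars, as listed on the slide.
# Assumption: "~22,000" for hardware is treated as exactly 22,000.
budget_usd = {
    "toner": 16_000,
    "tapes": 8_000,
    "supplies": 8_000,
    "misc": 3_000,
    "software licenses": 22_000,
    "maintenance": 12_000,
    "cell phones, cable modems": 7_000,
    "hardware (servers)": 22_000,
    "file storage + backup": 20_000,
    "staff (5), incl. fringe": 500_000,
}

total_usd = sum(budget_usd.values())
staff_share = budget_usd["staff (5), incl. fringe"] / total_usd

print(f"total:       ${total_usd:,}")
print(f"staff share: {staff_share:.1%}")
```

With these numbers the staff line is roughly four fifths of the total, in line with the "about 80%" figure quoted for staff cost.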
User issues (guesses)
• Lack of trust
– small mistakes → identity gone
– waste time on spam, viruses, worms, spyware, …
• Lack of reliability
– 99.5% instead of 99.999%
– even IETF meeting can’t get reliable 802.11
connectivity
• Lack of symmetry
– asymmetric bandwidth: ADSL
– asymmetric addressing: NAT, firewalls → client(server) only, packet relaying via TURN or p2p
• Users as “Internet mechanics”
– why does a user need to know whether to use IMAP
or POP?
– navigate circle of blame
Technical infrastructure issues
• Multi-homing and mobility
– address vs. locator issues
• Large-scale Internet
– secure routing
– routing scaling (60,000 AS)
• Architecture
– standardization delays → now routinely 3-5 years for minor extensions
– resistance to change at ≤ L4
– difficulty in deploying new applications:
• Internet service = outbound port 80 and 443
What has gone wrong?
• Familiar to anybody who has an old house…
• Entropy
– as parts are added, complexity and interactions increase
• Changing assumptions
– trust model: research colleagues → far more spammers and phishers than friends
• AOL: 80% of email is spam
– internationalization: internationalized domain names, email character sets
– criticality: email research papers → transfers $B and dial “9-1-1”
– economics: competing providers
• “Internet does not route money” (Clark)
• Backfitting
– had to backfit security, I18N, autoconfiguration, …
• → Tear down the old house, gut interior or more wallpaper?
The transformation of protocol stacks
[Figure: three protocol stacks side by side.]
• H. Zimmermann, ca. 1980: application, presentation, session, transport, network, link, physical
• Internet, ca. 1995: application, TCP, IP, 802.3, physical
• Internet, ca. 2005: application, SOAP, HTTP, TCP, IP-in-IP, IP, MPLS, PoE, PoS, ATM, physical
Simple wins (mostly)
• Examples:
– Ethernet vs. all other L2 technologies
– HTTP vs. HTTPng and all the other hypertext attempts
– SMTP vs. X.400
– SDP vs. SDPng
– TLS vs. IPsec (simpler to re-use)
– no QoS & MPLS vs. RSVP
– DNS-SD (“Bonjour”) vs. SLP
– SIP vs. H.323 (but conversely: SIP vs. Jabber, SIP vs. Asterisk)
– the failure of almost all middleware
– future: demise of 3G vs. plain SIP
• Efficiency is not important
– BitTorrent, P2P searching, RSS, …
Customer Programmability
• Old model:
– 1000s of 5ESS programmers in Naperville using one-of-a-kind language
– successor model: JAIN, CAMEL
• New model:
– carrier programmers are probably no smarter than
best early-adopter customers
– see Linksys WRT54G
– Google maps mash-ups
– Google/Yahoo libraries
Programmable Applications
• Old model:
– mostly closed applications
– sometimes SDKs, but highly complex
• New model:
– every application designed for humans will be
integrated into other applications
• Microsoft started trend with Office SDKs and OLE,
but limited
• web APIs
• XUL (web browsers)
– applications as components
(My) guidelines for a new Internet
• Maintain success factors, such as
– service transparency
– low barrier to entry
– narrow interfaces
• New guidelines
– optimize human cycles, not CPU cycles
– design for symmetry
– security built-in, not bolted-on
– everything can be mobile, including networks
– sending me data is a privilege, not a right
– reliability paramount
– isolation of flows
• New possibilities:
– another look at circuit switching?
– knowledge and control (“signaling”) planes?
– separate packet forwarding from control
– better alignment of costs and benefit
– better scaling for Internet-scale routing
– more general services
Academic Collaboration
• Advantage: naïve
– don’t know that something can’t be done and has
never been done like that before
• “Speak truth to power”
• Not beholden to carrier business models
• Often exposed to more modern programming models
– e.g., Eclipse vs. gcc
• Graduate student demographic is closer to future
customers
– digital natives vs. digital immigrants
• Can apply for NSF grants
Conclusion
• Clarity of goals and purpose
– science, engineering for the greater good, annual report gloss or product development?
• Is networking research becoming like civil engineering: large, important infrastructure, but resistant to fundamental change?
• Challenges are in reliability and maintainability, rather than performance or packet-loss & jitter QoS
• As a community, need to learn more from our collective and individual mistakes…
– Need series “The design mistakes in [(formerly) popular system or protocol]”