What is a Platform? - Stanford University


The Stanford Platform Laboratory
John Ousterhout and Guru Parulkar
Stanford University
http://platformlab.stanford.edu/
Platform Lab Faculty
Bill Dally
Nick McKeown
Sachin Katti
John Ousterhout (Faculty Director)
Christos Kozyrakis
Phil Levis
Guru Parulkar (Executive Director)
Mendel Rosenblum
Keith Winstein
2
New Platforms Enable New Applications
• 1980’s:
– Platform: relational database
– Applications: enterprise applications (e.g. ERP systems)
• 1990’s:
– Platform: HTTP + HTML + JavaScript
– Applications: online commerce
• 2000’s:
– Platform: GFS + MapReduce
– Applications: large-scale analytics
• 2010’s:
– Platform: smart phones + GPS
– Applications: Uber and many others
3
What is a Platform?
• General-purpose substrate
– Makes it easier to build applications or higher-level platforms
– Solves significant problems
– Usually introduces some restrictions
• Software and/or hardware
• Example: MapReduce computational model
– Simplifies construction of applications that use hundreds of
servers to compute on large datasets
– Hides communication latency: data transferred in large blocks
– Automatically handles failures & slow servers
– Restrictions: 2 levels of computation, sequential data access (a toy word-count sketch follows this slide)
4
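As a rough illustration of this restricted two-level model, here is a minimal word-count sketch in the map/shuffle/reduce style. It is a single-process toy of the programming model only, not the MapReduce implementation; all names in it are illustrative.

```cpp
// Word count in the two-level MapReduce style: map() emits (word, 1) pairs,
// the framework groups pairs by key, reduce() sums them.
// Single-process toy; a real framework shards both phases across servers.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KV = std::pair<std::string, int>;

// Map phase: one input record (a line) -> list of intermediate pairs.
std::vector<KV> mapFn(const std::string& line) {
    std::vector<KV> out;
    std::istringstream words(line);
    for (std::string w; words >> w;) out.emplace_back(w, 1);
    return out;
}

// Reduce phase: all values for one key -> one output value.
int reduceFn(const std::vector<int>& counts) {
    int sum = 0;
    for (int c : counts) sum += c;
    return sum;
}

int main() {
    std::vector<std::string> input = {"the quick fox", "the lazy dog"};

    // Shuffle: group intermediate pairs by key (done by the framework).
    std::map<std::string, std::vector<int>> groups;
    for (const auto& line : input)
        for (const auto& [word, count] : mapFn(line)) groups[word].push_back(count);

    for (const auto& [word, counts] : groups)
        std::cout << word << ": " << reduceFn(counts) << "\n";
}
```

In a real framework both phases run on hundreds of servers and the shuffle moves data in large blocks, which is where the latency hiding described above comes from.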
Platform Lab Vision
Create the next generation of platforms
to stimulate new classes of applications
Platforms
Large Systems
Collaboration
5
Drivers for Next Generation Platforms
• Achieve physical limits
– Can we have layers of abstraction without giving up performance?
• Heterogeneity and specialization
– General-purpose systems fundamentally inefficient
– Can we find a small set of specialized components that are highly efficient and, taken
together, provide a general-purpose set of abstractions?
• Raise the floor of developer productivity
– How to create abstractions that are extremely easy to use, while still providing high enough performance to meet the application’s needs?
• Scalability and elasticity
– How to achieve high throughput and low latency with horizontal scaling?
– How to achieve elasticity, for example from 1K to 1M users, without reimplementation?
6
Initial Focus Platform:
Swarm Pod
[Diagram: the Swarm Pod, a Next-Generation Datacenter Pod connected through switches/routers and wired/wireless networks to swarms of devices: self-driving cars, cell phones, Internet of Things, drones]
8
Changing Technologies and Requirements
[Diagram: the same Swarm Pod picture (datacenter pod, switches/routers, wired/wireless networks, self-driving cars, cell phones, Internet of Things, drones) annotated with changing technologies and requirements: specialized components, need better visibility and control, increasing core density, more devices online, low-latency interconnects, collaboration between devices, large nonvolatile memories]
9
Swarm Pod Research Topics
[Diagram: research topics overlaid on the Swarm Pod: RAMCloud storage system, self-incentivizing networks, programmable network fabrics, scalable control planes, IX operating system, new memory/storage systems, low-latency software stack]
10
The Low-Latency Datacenter
• Phase 1 of datacenter revolution: scale
– How can one application harness thousands of servers?
– New platforms such as MapReduce, Spark
– But, based on high-latency technologies:
• 1990’s networking: 300-500µs round-trips
• Disks: 10ms access time
• Phase 2 of datacenter revolution: low latency
– New networking hardware:
• 5-10µs round-trips today
• 2-3µs in the future
– New nonvolatile memory technologies
• Storage access times < 10µs
– Low latency will enable new applications
How does low latency affect system architecture?
11
Eliminating Layers
• Existing software stacks highly layered
– Great for software structuring
– Layer crossings add latency
– Software latency hidden by slow networks and disks
• Can’t achieve low latency with today’s stacks
– Death by a thousand cuts: no single place to optimize
– Networks:
• Complex OS protocol stacks
• Marshaling/serialization costs
– Storage systems:
• OS file system overheads
• Low-latency systems will require a new software stack
– Can layers be reduced without making systems unmanageable?
– Must eliminate layer crossings
– What are the new APIs?
12
The RAMCloud Storage System
• New class of storage for low-latency datacenters:
– All data in DRAM at all times
– Large scale: 1,000-10,000 servers
– Low latency: 5-10µs remote access
– Durability/availability equivalent to replicated disk
• 1000x improvements in latency and throughput (relative to disk-based storage)
• Goal: enable new data-intensive applications
[Diagram: 1,000-100,000 application servers, each running an application linked with the RAMCloud library, connect over the datacenter network to a coordinator and 1,000-10,000 storage servers, each running a master and a backup]
13
New RAMCloud Projects
New software stack layers for low-latency datacenter:
• New remote procedure call (RPC) system
– Homa: new transport protocol
• Receiver-managed flow and congestion control
• Minimize buffering
– Microsecond-scale latency
– 1M connections/server
• New thread scheduling mechanism (a small user-space sketch follows this slide)
– Threads scheduled by application, not OS
– OS allocates cores to applications, manages competing apps
– Same mechanism extends to VMMs: hypervisor allocates cores to guest OS
14
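To make the thread-scheduling bullet concrete, here is a minimal user-space sketch, assuming a model in which the kernel has granted the application a fixed number of cores and the application multiplexes its own tasks onto them without further syscalls. The class and method names (AppScheduler, spawn) are invented for illustration; this is not the lab's mechanism, and a real design would use per-core run queues and polling rather than a single lock.

```cpp
// Minimal sketch of application-level thread scheduling: the "OS" grants a
// fixed number of cores to the application, and the application multiplexes
// its own lightweight tasks onto those cores in user space.
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class AppScheduler {
public:
    // 'cores' models the number of cores the kernel has allocated to this app.
    explicit AppScheduler(unsigned cores) {
        for (unsigned i = 0; i < cores; i++)
            workers.emplace_back([this] { runCore(); });
    }
    // Application-level "thread creation": just enqueue a task; no syscall.
    void spawn(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex);
        runQueue.push(std::move(task));
    }
    void shutdown() {
        { std::lock_guard<std::mutex> lock(mutex); stopping = true; }
        for (auto& w : workers) w.join();
    }
private:
    void runCore() {
        // Each granted core polls the application's run queue directly,
        // keeping the kernel scheduler off the fast path.
        while (true) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mutex);
                if (runQueue.empty()) {
                    if (stopping) return;
                } else {
                    task = std::move(runQueue.front());
                    runQueue.pop();
                }
            }
            if (task) task();
        }
    }
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> runQueue;
    std::mutex mutex;
    bool stopping = false;
};

int main() {
    AppScheduler sched(2);          // pretend the kernel granted us 2 cores
    for (int i = 0; i < 4; i++)
        sched.spawn([i] { std::cout << "task " << i << " ran\n"; });
    sched.shutdown();
}
```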
Reimagining Memory and Storage
• New nonvolatile memories coming soon
• Example: Intel/Micron Crosspoint devices:
– 1-10µs access time?
– 10 TB capacity?
– DIMM form factor: directly addressable
• What are the right abstractions for shared storage?
– Files have high overheads for OS lookups, protection checks
– Does paging make sense again?
– Single-level store?
• Relationship between data and computation:
– Move data to computation or vice versa?
15
Hollowing Out of the OS
[Diagram: functionality migrates out of the operating system; thread scheduling, kernel-bypass networking, and direct storage access move up into the application, while physical memory management and device drivers move down into the hypervisor]
Does a radical OS redesign/simplification make sense?
16
Next logical step in SDN:
Take programmability
all the way down to the wire
• Status quo: switch OS → run-time API → driver → fixed-function ASIC
– The chip dictates behavior to the switch OS: “This is roughly how I process packets …”
– Prone to bugs
– Very long and unpredictable lead time
18
Turning the tables
• With P4: switch OS (driven by a P4 program) → run-time API → driver → PISA device (Protocol-Independent Switch Architecture)
– Now the program dictates behavior to the device: “This is precisely how you must process packets”
19
P4 and PISA
[Diagram: a compiler maps P4 code onto a PISA target: a programmable parser followed by a pipeline of match-action stages (e.g. L2, IPv4, IPv6, and ACL tables, each pairing a table match with macro and fixed actions) and output queues]
20
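As a rough, hypothetical illustration of the match-action abstraction that P4 programs describe and PISA hardware implements, the following sketch models a parser-filled packet header passing through a pipeline of match-action tables in software. All names are invented; this is not P4 code or a compiler target.

```cpp
// Toy model of a PISA-style pipeline: a parser extracts header fields, then a
// sequence of match-action tables processes the packet. Purely illustrative.
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <vector>

struct Packet {
    uint32_t dstIp = 0;     // parsed header field
    int egressPort = -1;    // set by actions; -1 means drop
};

// One match-action table: exact match on a field, run the bound action.
class MatchActionTable {
public:
    using Action = std::function<void(Packet&)>;
    void addEntry(uint32_t key, Action action) { entries[key] = std::move(action); }
    void apply(Packet& pkt) const {
        auto it = entries.find(pkt.dstIp);
        if (it != entries.end()) it->second(pkt);   // hit: run the action
        // miss: fall through to the next stage unchanged
    }
private:
    std::map<uint32_t, Action> entries;
};

int main() {
    // "Control plane" populates an IPv4 forwarding table at run time.
    MatchActionTable ipv4;
    ipv4.addEntry(0x0A000001, [](Packet& p) { p.egressPort = 7; });

    // Pipeline of stages; real PISA chips have a fixed number of stages.
    std::vector<MatchActionTable> pipeline = {ipv4};

    Packet pkt;
    pkt.dstIp = 0x0A000001;                       // the parser would fill this in
    for (const auto& stage : pipeline) stage.apply(pkt);
    std::cout << "egress port: " << pkt.egressPort << "\n";   // prints 7
}
```

The table entries are installed at run time (addEntry above), while the structure of the pipeline itself is what a P4 program would define.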
Current Research Projects
1. P4 as a front-end to configure OVS (with Ben Pfaff and Princeton)
– Approach 1: Statically compile P4 program to replace parse and matching in OVS
– Approach 2: Compile P4 to eBPF and dynamically load to kernel
– Early results suggest no performance penalty for programmability; in some cases faster
2. Domino: A higher level language (with MIT)
– C-like, process-to-completion. Includes stateful processing. Compiler generates P4
code.
3. PIFO: A hardware abstraction for programmable packet scheduling algorithms
4. xFabric: Calculating flow rates based on a programmer-specified utility function
5. PERC: Fast congestion control by proactive, direct calculation of flow rates in the
forwarding plane.
21
xFabric: Programmable Datacenter Fabric
• Applications declare their resource preferences
– e.g. lowest latency, a particular bandwidth allocation
• Network operators declare their resource-usage policies
• Challenge: automate optimal resource allocation for diverse applications at datacenter scale
As a platform, xFabric computes resource allocations that are optimal while satisfying both application requirements and operator policies (a minimal rate-allocation sketch follows)
22
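To make "flow rates from a utility function" concrete, here is a minimal sketch for a single shared link, assuming each flow i declares the utility w_i·log(x_i): the allocation that maximizes total utility under the capacity constraint is weighted proportional sharing. This is textbook network-utility-maximization math, not xFabric's actual algorithm, which must handle an entire fabric and richer policies.

```cpp
// Weighted proportional-fair allocation on a single link of capacity C:
// maximize sum_i w_i * log(x_i) subject to sum_i x_i <= C.
// The closed-form optimum is x_i = C * w_i / sum_j w_j.
#include <iostream>
#include <numeric>
#include <vector>

std::vector<double> proportionalFair(const std::vector<double>& weights, double capacity) {
    double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::vector<double> rates;
    for (double w : weights) rates.push_back(capacity * w / total);
    return rates;
}

int main() {
    // Three flows with weights expressing their declared preferences.
    std::vector<double> weights = {1.0, 2.0, 1.0};
    for (double r : proportionalFair(weights, 40.0))   // a 40 Gbps link
        std::cout << r << " Gbps\n";                    // prints 10, 20, 10
}
```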
Scalable Control Plane: Generalized and Customizable
• Separation of control plane is a common trend: networks/systems
– SDN, storage systems, MapReduce scheduler, …
• Control-plane design is challenging
– Scale, demanding throughput and latency targets, and abstractions that are easy to use
• We have been building control planes for specific systems
– ONOS, SoftRAN, and RAMCloud Coordinator
• Can we design a generalized scalable control plane with
– A common foundation that can be customized for different contexts
Goal: design a new platform that makes it significantly easier to develop diverse control planes without sacrificing functionality or performance
23
Key Performance Requirements
• High volume of state: ~500 GB-2 TB
• High throughput: ~500K-20M ops/second (~100M state ops/second)
• Low latency to events: 1-10s of ms
• Overall: high throughput | low latency | consistency | high availability
[Diagram: control apps operate on a global network view/state maintained by a cluster of servers and storage]
A distributed platform is required to meet these metrics; a difficult challenge!
24
Generalized and Customizable Scalable Control Plane
• Northbound abstractions/APIs (C/C++, declarative programming, REST)
– Interface to apps; provide different APIs; customize for the context
– Strongly consistent, transaction semantics?
• Distributed core
– Context independent
– Cluster of servers, 10-100 Gbps, low-latency RPC
– Distributed state-management primitives
• Southbound abstraction
– Interface to the data plane, with plug-ins for different contexts: OpenFlow/NetConf for switches, a RAN protocol (?) for eNBs, RPC for servers and storage
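A minimal sketch of the split described above: a context-independent core with swappable northbound and southbound adapters. The interfaces and names here are hypothetical illustrations of the architecture, not the lab's design.

```cpp
// Sketch of the plug-in structure: the core stores state and drives devices;
// northbound and southbound adapters are swapped per context
// (SDN controller, RAN controller, storage coordinator, ...).
#include <iostream>
#include <map>
#include <string>

// Southbound plug-in: how the core talks to a particular kind of data plane.
class SouthboundPlugin {
public:
    virtual ~SouthboundPlugin() = default;
    virtual void applyConfig(const std::string& deviceId, const std::string& config) = 0;
};

// Example plug-in for an OpenFlow-like context (illustrative only).
class OpenFlowPlugin : public SouthboundPlugin {
public:
    void applyConfig(const std::string& deviceId, const std::string& config) override {
        std::cout << "push flow rules to " << deviceId << ": " << config << "\n";
    }
};

// Context-independent core: state management plus dispatch to the data plane.
class ControlPlaneCore {
public:
    explicit ControlPlaneCore(SouthboundPlugin& sb) : southbound(sb) {}
    // Northbound call used by control apps (could be exposed via C++/REST/etc.).
    void setPolicy(const std::string& deviceId, const std::string& policy) {
        state[deviceId] = policy;                 // in reality: consistent, replicated
        southbound.applyConfig(deviceId, policy); // drive the data plane
    }
private:
    std::map<std::string, std::string> state;
    SouthboundPlugin& southbound;
};

int main() {
    OpenFlowPlugin openflow;
    ControlPlaneCore core(openflow);
    core.setPolicy("switch-1", "dstIp=10.0.0.1 -> port 7");
}
```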
Platform Lab Vision
New platforms enable new applications
Platforms
Large Systems
Collaboration
26
Large Systems
• Why universities should do large systems projects:
– Companies don’t have time to evaluate alternatives and find the best approach
– Universities can lead the market
– Produce better graduates
• Goal for Platform Lab:
– Create environment where large systems projects flourish
27
Collaboration
• Convergence of computing, storage, and networking is crucial to future infrastructure
• The Swarm Pod and its target applications require expertise in many systems areas
• The Platform Lab brings together professors and their students with expertise in different systems areas to collaborate on the challenges at this convergence
28
Expected Results
Difficult to know the specifics at this point, but our expectations and history suggest the Platform Lab will lead to:
• Influential ideas and architecture directions
• Real systems or platforms with communities of users
• Graduates with strong systems skill sets
• Impact on the practice of computing and networking
• Commercial impact through ideas, open-source systems, and startups
Across several areas of systems: hardware & software; computing, networking, and storage; different layers of the system; application domains; …
29
Engagement Model
• Regular Members – Event Based Interactions
– Regular reviews and retreats
– Early access to results
– Access to faculty and students
• Premium Members – Active Collaboration
– Work together with committed engineers/researchers
– Be part of architecture, design, implementation, and evaluation of platforms
– Company staff participate in regular meetings including weekly meetings
30
Questions? Reactions?
Thank You!
Example Target Platforms
• Low-latency datacenter
(Dally, Katti, Kozyrakis, Levis, Ousterhout)
• RAMCloud
(Ousterhout, Rosenblum)
• Scalable control planes
(Katti, Ousterhout, Parulkar)
• Programmable network fabrics
(Katti, Levis, McKeown, Ousterhout, Parulkar)
• New memory/storage systems for the 21st Century
(Dally, Kozyrakis, Levis)
• Cloud query planner
(Winstein, Levis)
33
21st Century Abstractions for Memory and Storage
• Memory abstractions and storage hierarchies are obsolete for today’s workloads and technologies
– E.g., the old assumptions (memory is limited, workloads have temporal locality, moving data to computation is efficient) are not true for many of today’s apps and environments
• Goals: revisit memory/storage abstractions and implementations
– Heterogeneous: combination of DRAM and SCM
– Aggressive memory sharing among apps across a server, cluster, and datacenter
– Support for QoS, near-data processing, and security
34
21st Century Abstractions for Memory and Storage
• Proposed design ideas
– A single-level store based on objects or segments that will span apps, memory technologies, and servers
– Objects will have logical attributes: persistence, indexing, …
– Objects will have physical attributes: encryption, replication requirements, …
– Apps/users specify logical attributes; compilers and run-time systems manage the mapping and do background optimizations
• Develop hardware & software platforms for a single-level store (a small attribute sketch follows this slide)
– Efficient hardware structures for fast access
– Compiler & system software for efficient mapping within and across servers
– APIs and storage representation schemes
– Security and privacy support
– Cluster-wide management and optimization
35
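A minimal sketch of the attribute-driven object model, assuming an invented API: the application declares logical attributes and a runtime stand-in chooses physical placement. None of these names come from the proposed platform.

```cpp
// Toy single-level-store object: the application states logical attributes
// (persistence, indexing); a runtime decides physical placement (DRAM vs.
// storage-class memory, replication). Illustrative only.
#include <iostream>
#include <string>

struct LogicalAttributes {
    bool persistent = false;   // must survive crashes
    bool indexed = false;      // needs a secondary index
};

struct PhysicalPlacement {
    std::string medium;        // "DRAM" or "SCM"
    int replicas = 1;
};

// A stand-in for the runtime/compiler layer that maps logical -> physical.
PhysicalPlacement plan(const LogicalAttributes& attrs) {
    PhysicalPlacement p;
    p.medium = attrs.persistent ? "SCM" : "DRAM";  // persistent data goes to NVM
    p.replicas = attrs.persistent ? 3 : 1;         // and is replicated
    return p;
}

int main() {
    LogicalAttributes sessionCache;                 // transient, unindexed
    LogicalAttributes customerRecords{true, true};  // persistent, indexed

    for (const auto& attrs : {sessionCache, customerRecords}) {
        PhysicalPlacement p = plan(attrs);
        std::cout << "medium=" << p.medium << " replicas=" << p.replicas << "\n";
    }
}
```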
Logically Centralized Control Plane
• Provides global network view
• Makes it easy to program control, management, and configuration apps
• Enables new apps
36
Scalable Control Plane:
Perfect Platform for the Laboratory
Requires overcoming all of the fundamental challenges identified
• Physical limits
– To deliver on performance
• Heterogeneity and specialization
– Target environments are diverse: hardware to apps
• Scalability and elasticity
– Most control plane scenarios need scalability and elasticity
• Raise the floor of developer productivity
– Typically devops/netops people write apps for the controllers; programming abstractions have to suit them
37
Target Platforms: Low Latency Datacenter
Evolution of Datacenters
• Phase 1: manage scale
– 10,000-100,000 servers within a 50m radius
– 1 PB DRAM
– 100 PB disk storage
– Challenge: how can one application harness thousands of servers?
• Answer: MapReduce, etc.
• But communication latency is high:
– 300-500µs round-trip times
– Must process data sequentially to hide latency (e.g. MapReduce)
– Interactive applications limited in functionality
39
Why Does Latency Matter?
[Diagram: a traditional application keeps UI, application logic, and data structures on a single machine with << 1µs latency; a web application splits UI and application logic across application servers and its data across storage servers, with datacenter round trips of 0.5-10ms between them]
• Large-scale apps struggle with high latency
– Random-access data rate has not scaled!
– Facebook: can only make 100-150 internal requests per page (see the worked example after this slide)
40
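A back-of-the-envelope calculation (my own illustrative numbers, not from the slides) shows why the per-page budget is so small and how much low latency changes it: with a fixed amount of server-side time per page, the number of sequential internal requests is simply the budget divided by the per-request latency.

```cpp
// Back-of-the-envelope: how many sequential internal requests fit in a
// fixed server-side time budget at different per-request latencies?
// The budget and latencies below are illustrative assumptions.
#include <iostream>

int main() {
    const double budgetUs = 50'000;                // ~50 ms of server time per page
    const double latenciesUs[] = {500.0, 5.0};     // old datacenter RPC vs. low-latency RPC
    for (double l : latenciesUs) {
        std::cout << "at " << l << " us/request: "
                  << static_cast<long>(budgetUs / l)
                  << " sequential requests per page\n";   // 100 vs. 10,000
    }
}
```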
Goal: Scale and Latency
[Diagram: the same traditional vs. web application picture, but with datacenter access latency reduced from 0.5-10ms to 5-10µs]
• Enable new class of applications:
– Large-scale graph algorithms (machine learning?)
– Collaboration at scale?
41
Large-Scale Collaboration
[Diagram: the “Region of Consciousness” (the data an application can draw on at once) grows from data for one user (Gmail: email for one user) to Facebook (50-500 friends) to a morning commute (10,000-100,000 cars)]
42
Low Latency Datacenter
Goal: Build new hardware and software infrastructure that operates at
microsecond-scale latencies
• Build on RAMCloud RPC implementation:
– Reduce software overhead below the current ~2µs
– Support throughput as well as latency
– Reduce state per connection to support 1M connections/server in future
43
Target Platforms: RAMCloud
RAMCloud
Storage system for low-latency datacenters:
• General-purpose
• All data always in DRAM (not a cache)
• Durable and available
• Scale: 1000+ servers, 100+ TB
• Low latency: 5-10µs remote access
45
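The combination of "all data always in DRAM" and "durable and available" rests on replicating each write to several backups before acknowledging it; backups can flush to secondary storage later. The sketch below is a toy single-process model of that idea under my own simplifying assumptions, not RAMCloud code, and all names are invented.

```cpp
// Toy model of durable writes with all data in DRAM: a master keeps the
// authoritative copy in memory, and a write is acknowledged only after it has
// been buffered on R backups (which would later flush to disk in batches).
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Backup {
    std::vector<std::pair<std::string, std::string>> logBuffer;  // flushed later
    void append(const std::string& key, const std::string& value) {
        logBuffer.emplace_back(key, value);
    }
};

class Master {
public:
    explicit Master(std::vector<Backup*> b) : backups(std::move(b)) {}
    // Durable write: update DRAM copy, replicate to every backup, then "ack".
    void write(const std::string& key, const std::string& value) {
        dram[key] = value;
        for (Backup* b : backups) b->append(key, value);
        std::cout << "write(" << key << ") acknowledged after "
                  << backups.size() << " backup copies\n";
    }
    // Reads are served entirely from DRAM: no disk on the read path.
    const std::string& read(const std::string& key) { return dram.at(key); }
private:
    std::map<std::string, std::string> dram;
    std::vector<Backup*> backups;
};

int main() {
    Backup b1, b2, b3;
    Master master({&b1, &b2, &b3});   // replication factor 3
    master.write("user:42", "Alice");
    std::cout << "read -> " << master.read("user:42") << "\n";
}
```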
RAMCloud: Distributed Storage with Low Latency
[Diagram: 1,000-100,000 application servers, each running an application linked with the RAMCloud library, connect over the datacenter network (high-speed networking: 5µs round-trips, full bisection bandwidth) to 1,000-10,000 commodity storage servers with 64-256 GB per server, each running a master and a backup; a coordinator, a standby coordinator, and external storage (ZooKeeper) manage cluster configuration]
Build higher-level abstractions for ease of use while preserving or improving performance
46
RAMCloud Performance
• Using Infiniband networking (24 Gb/s, kernel bypass)
– Other networking also supported, but slower
• Reads:
– 100B objects: 4.7µs
– 10KB objects: 10µs
– Single-server throughput (100B objects): 900 Kops/sec.
– Small-object multi-reads: 2M objects/sec.
• Durable writes:
– 100B objects: 13.5µs
– 10KB objects: 35µs
– Small-object multi-writes: 400-500K objects/sec.
47
RAMCloud Next Steps
Support higher-level features/abstractions
• Secondary indexes
• Multi-object transactions
• Graph operations
Without compromising scale and latency (as much as possible)
48