Network-I/O Convergence
in “Too Fast” Networks:
Threats and Countermeasures
David R. Cheriton
Stanford University
Network-I/O Convergence:
An Old Story
File transfer vs. tape in 1950/60’s
File I/O from way back
1975: Thoth – message-based but RDMA-like
“move” operation necessary for read/writes
1985: V – distributed message passing – again an
RDMA-like move for file I/O
File servers – only external I/O is network
Blade servers – only I/O is network, except on boot
Network-I/O Convergence is not new
Recent Problem:
Attacks and Attack Resistance
Reordering – what does that do to delivery
Reordering must not kill performance
Inserted/forged packets – encrypted
Must not corrupt memory or use extra bandwidth
Replay – partial packets
Being attacked by your peripherals
iSCSI – one of your SAN disks could be compromised
Ideal: performance of host not degraded by attacks
New Problem: “Too Fast” Networks
E.g. 10 Gbps Ethernet
Network speeds exceed memory and processor speeds
Network processing cost increases: MAC and decryption
Too fast to handle goodput in software
i.e. receive, demux, decrypt, deliver into memory
Too fast for protecting host in software
Too many hardware resources used to reject a packet
Need for encryption just makes it worse
Not Just Zero-Copy
“Too fast” means it is very expensive not to be protected, and not feasible to do protection in software
Objective: Zero-copy, zero-corruption, zero-compromise
Receiver authorization: fixed limit on the cost in receiver host processor/memory resources (a minimal accounting sketch follows this slide)
E.g. 10 percent of mem/processor cycles per net I/F
System performance depends on this
No combination of packet traffic (attack) can exceed the limit
Resource allocation/protection problem
How to do it, how to do it efficiently, how to do it safely
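A minimal sketch (not from the talk) of the receiver-authorization idea: a fixed per-interface budget on host cycles and memory bandwidth, so that no mix of packets can exceed it. The names (rx_budget, rx_admit) and the periodic reset are assumptions made for illustration.

#include <stdbool.h>
#include <stdint.h>

/* Per-interface budget: e.g. 10% of host cycles and a bounded share of
 * memory bandwidth per accounting period (names are hypothetical). */
struct rx_budget {
    uint64_t cycle_limit;   /* cycles the receive path may spend       */
    uint64_t membw_limit;   /* bytes it may move through host memory   */
    uint64_t cycles_used;
    uint64_t bytes_used;
};

/* Charge a received packet against the budget; over-budget packets are
 * dropped before touching host memory, bounding any attack's cost.    */
static bool rx_admit(struct rx_budget *b, uint64_t cycles, uint64_t bytes)
{
    if (b->cycles_used + cycles > b->cycle_limit ||
        b->bytes_used + bytes > b->membw_limit)
        return false;                        /* drop */
    b->cycles_used += cycles;
    b->bytes_used  += bytes;
    return true;                             /* deliver */
}

/* A periodic timer (e.g. every millisecond) would zero the *_used
 * counters to start the next accounting period. */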
What about Moore’s Law?
“Too fast” is intrinsic
Fiber operating at the limit of the memory system
Switch write/read
Processor operating at the limit of the memory system
System write/read on reception
Memory goes faster – network will go faster
Too fast networks – at the limit of memory speeds
Collision between I/O and Processor
for Hardware Resources
Contention for pins to memory between I/O and processor
Memory is random-access, caching – latency-sensitive, temporal/spatial locality, single subsystem
I/O streams: streaming, multiple subsystems
Contention for on-chip state
Mapping state for network vs VM, cache, etc.
Like VM page tables
Contention for on-chip logic:
Software-centric protocol design overhead
Worse with multiprocessors
Multiple integrated NICs in processor chip so more demands
If not in the processor, way across the I/O “network”
E.g. Infiniband link
Threat: Infiniband
Specialized networks for I/O
Just like Fiber channel
1995 (actually 1999): RDMA over TCP, versus NextGenIO, FutureIO and now Infiniband
2005 (actually 2003): RDMA-based transport
Unified I/O designed protocol architecture
Safer because limited range – data center network
Potential for deconvergence, at least relative to general-purpose networking, Ethernet, IP
Note: need for remote disaster recovery
GE Ethernet adaptors on the edge of the data center net
Fix IP for storage or else lose to IB
The Multi-Layer Solution?
Ethernet/IP/ESP/TCP/MPA/DDP/RDMA
Many layers, redundancy, complexity, semantics
E.g. TCP sequencing semantics
Plus, still need control-plane communication, so need HW demux, delivery, decryption there too
Can an attacker compromise the host by:
Forcing high CPU use, extra memory bandwidth
Flooding garbage traffic, including the control plane
Tripping up the “fast path” with exceptions
Very complex “solution”, or “meta solution”
Meta-protocols
Standardize yet don’t provide interoperability
E.g. RDMA over …
Several different choices at the next level down, e.g. SCTP, but TCP allowed
Standards are too flexible to design hardware to
Good standards require hard choices to get interoperability and market size, not meta-protocols
High-performance RPC Problem
High-performance RPC including framing support (marshal/demarshal) for large parameters
E.g. file/block read and write
Networks as fast as memory so copy is painful
Semantic gap between transport and OO RPC
Transport is byte stream but RPC is frames
Makes it hard to avoid copies
Safe, secure transport with prevention of DoS, e.g. SYN attacks
RPC – control plane for RDMA
Not just RDMA, RPC needs to be handled too
Proposed Solution: Refactoring the
Transport Layer Protocol
Theory: refactoring the protocol design problem between the hardware level and the non-hardware (software) level
Hardware
Hardware must protect resources – mem BW, memory, processor
HW data path, e.g. receive, decrypt, MAC, deliver to memory in the right place and the right way, else drop
Software handles control, with control building on the hardware path so it is “immune” to attacks
Solution: RDMA-based Protocol
Region-level: handles packet-based data delivery
Receive, decrypt, MAC and copy to memory region
Control-level: RPC using RDMA regions
Shamelessly harvest TCP techniques
fast retransmit
Slow-start, etc.
Connection Management: RPC using special RDMA region
Integrate key exchange, session setup
Region
What: Collection of packet frames to/from which a sequence of packets of a particular flow is mapped
Flow label plus an offset/seqNo field maps to a frame in the region, else drop
Static MTU per frame from MTU discovery
Transmission:
Gather, encryption and authentication
Region – like in virtual memory, but frames, not pages
Similar mapping to page tables
[Diagram: region structure – flow label plus sequence number (frames k, k+1, …, k+w) selects a frame within the region, e.g. a file system disk buffer]
E.g. UDP header plus 32-bit offset
Similar to page mapping, except MTU-sized packet frames
Packet reception state in region descriptor and frame descriptors (a C sketch of this structure follows)
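A hedged C sketch of how the region/frame mapping described above might be represented; the type and field names (rop_region, rop_frame, base_seq, etc.) are assumptions, not from the talk. The flow label selects the region and the packet's offset/sequence number selects an MTU-sized frame slot, much like a page-table lookup.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One MTU-sized frame slot in a region (reception state kept per frame). */
struct rop_frame {
    void *addr;          /* host memory this frame maps to             */
    bool  received;      /* has a packet been delivered here yet?      */
};

/* Region descriptor: a window of frames for one flow, analogous to a
 * page table but over MTU-sized frames instead of pages.              */
struct rop_region {
    uint32_t          flow_label;  /* identifies the flow              */
    uint32_t          base_seq;    /* sequence number of frame 0 (k)   */
    uint32_t          nframes;     /* window size (w + 1)              */
    uint32_t          mtu;         /* static MTU from MTU discovery    */
    struct rop_frame *frames;      /* e.g. a file system disk buffer   */
};

/* Map (flow label, offset/seqNo) to a frame, else signal a drop.       */
static struct rop_frame *
region_lookup(struct rop_region *r, uint32_t flow, uint32_t seq)
{
    if (flow != r->flow_label)
        return NULL;                      /* unknown flow: drop         */
    uint32_t off = seq - r->base_seq;     /* wraps large if seq < base  */
    if (off >= r->nframes)
        return NULL;                      /* outside window: drop       */
    return &r->frames[off];
}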
Region, cont’d
Delivery conditions:
Only deliver if the packet decrypts and its MAC verifies
Only deliver if it maps to a region and buffer
Only deliver into the exact location
Best-effort delivery with retransmission at a higher level (receive check sketched below)
Pros:
Simple state and logic for xmit, recv, acking
Competitive with Infiniband
Protection: no memory cost if packet not accepted
Multiple Protocols feasible: UDP, ESP
Fine for data, what about control?
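A sketch of the delivery conditions above as a receive-path check, continuing the hypothetical rop_region/rop_frame types from the previous sketch; decrypt_and_verify_mac is a stub standing in for whatever cipher/MAC the flow's security association uses.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stub: real hardware would decrypt the payload and verify its MAC.    */
static bool decrypt_and_verify_mac(uint32_t flow, const uint8_t *in,
                                   uint32_t len, uint8_t *out)
{
    (void)flow;
    memcpy(out, in, len);    /* placeholder for the real transform      */
    return true;
}

enum rx_verdict { RX_DELIVERED, RX_DROPPED };

static enum rx_verdict
rop_receive(struct rop_region *r, uint32_t flow, uint32_t seq,
            const uint8_t *payload, uint32_t len)
{
    uint8_t plain[9216];                       /* scratch, >= any MTU   */

    /* Only deliver if the packet maps to a region/frame and fits.      */
    struct rop_frame *f = region_lookup(r, flow, seq);
    if (f == NULL || len > r->mtu || len > sizeof plain)
        return RX_DROPPED;

    /* Only deliver if the packet decrypts and its MAC verifies.        */
    if (!decrypt_and_verify_mac(flow, payload, len, plain))
        return RX_DROPPED;

    /* Only deliver into the exact location; a rejected packet has cost
     * no host memory, and loss is repaired by higher-level retransmit. */
    memcpy(f->addr, plain, len);
    f->received = true;
    return RX_DELIVERED;
}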
ROP Control Level
Build on hardware region mechanism for delivery and transmission
Exploit OO RPC technology
Call, return and update regions as well as application-level RDMA regions
Referred to as a “channel” (sketched after this slide)
Acks as RPC calls into update region
Software or hardware processing of acks
Same HW decryption, authentication, delivery mechanisms apply
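A hedged sketch of what a control-level “channel” might look like: a bundle of call, return and update regions, with acks delivered as RPC calls into the update region. It reuses the hypothetical rop_region type from the earlier sketch; the talk names the region roles but none of these structures.

#include <stdint.h>

/* A channel: the regions the control level is built from (names are
 * illustrative). Application-level RDMA regions sit alongside these.   */
struct rop_channel {
    struct rop_region call_region;     /* incoming RPC calls            */
    struct rop_region return_region;   /* incoming RPC returns          */
    struct rop_region update_region;   /* acks and other updates        */
};

/* An ack arrives as a small RPC frame in the update region and can be
 * processed in software or, for simple cases, in hardware.             */
struct rop_ack {
    uint32_t flow_label;
    uint32_t acked_seq;    /* highest contiguously received frame       */
};

/* Sketch of ack processing against a transmit-side region: mark the
 * acknowledged frames; a real sender would also slide its window and
 * schedule retransmission of anything still missing.                   */
static void process_ack(struct rop_region *tx, const struct rop_ack *a)
{
    if (a->flow_label != tx->flow_label || a->acked_seq < tx->base_seq)
        return;
    uint32_t n = a->acked_seq - tx->base_seq + 1;
    if (n > tx->nframes)
        n = tx->nframes;
    for (uint32_t i = 0; i < n; i++)
        tx->frames[i].received = true;     /* frame acknowledged        */
}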
File Write
Client has return region and update region
Server has a call region and an update region, plus a region per RDMA buffer
Client knows the identifier for the buffer from file open
File write (client-side sketch after this slide):
Client RDMAs data to the remote buffer using the region transmission mechanism
Client sends an RPC write call to the server's call region, identifying the buffer
Server is notified of reception of the write call
Server checks the data is completely received
If not, requests retransmission of the missing frames
Server performs write processing and maps the buffer to new memory
Server sends an RPC return frame to the client's return region
Client processes the return frame and returns
Regions are used as (hardware) windows for flow control
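A client-side sketch of the file-write exchange above, reusing the hypothetical rop_channel type from the earlier sketch. Every function here (rdma_write_region, rpc_call) is a placeholder for a step named on the slide, stubbed out so the sketch compiles; it is not a real API.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct write_call  { uint32_t buffer_id; uint64_t file_offset; uint32_t len; };
struct write_reply { int32_t status; };

/* Stub: would run the region transmission mechanism (gather, encrypt,
 * authenticate) to place the data in the server's RDMA buffer region.  */
static int rdma_write_region(struct rop_channel *ch, uint32_t buffer_id,
                             const void *data, uint32_t len)
{ (void)ch; (void)buffer_id; (void)data; (void)len; return 0; }

/* Stub: would place the call frame in the server's call region and wait
 * for the matching return frame in the client's return region.         */
static int rpc_call(struct rop_channel *ch, const void *req, size_t req_len,
                    void *rep, size_t rep_len)
{ (void)ch; (void)req; (void)req_len; memset(rep, 0, rep_len); return 0; }

int rop_file_write(struct rop_channel *ch, uint32_t buffer_id,
                   uint64_t file_offset, const void *data, uint32_t len)
{
    /* 1. RDMA the data into the server's buffer (known from file open). */
    if (rdma_write_region(ch, buffer_id, data, len) != 0)
        return -1;

    /* 2. RPC write call identifying the buffer; the server checks the
     *    data arrived completely (requesting retransmission of missing
     *    frames if not), performs the write, remaps the buffer, and
     *    sends the return frame to the client's return region.          */
    struct write_call  call  = { buffer_id, file_offset, len };
    struct write_reply reply;
    if (rpc_call(ch, &call, sizeof call, &reply, sizeof reply) != 0)
        return -1;

    return reply.status;   /* 3. client processes the return and returns */
}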
Connection Setup
Channel Manager: mechanism to create/set up new channels
Provides a default call region for channel setup (cookie-based setup sketched after this slide)
Channel setup as RPCs
Present credentials and get a cookie
Present a cookie and get a new channel
Builds on experience with SYN attacks
Exposure: flooding of these channel-setup RPCs
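A hedged sketch of the two-step, cookie-based channel setup (present credentials, get a cookie; present the cookie, get a channel), in the spirit of the SYN-cookie experience the slide alludes to. All names and the toy MAC are assumptions purely for illustration.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COOKIE_LIFETIME 5            /* seconds; illustrative value      */

struct setup_cookie {
    uint64_t expiry;
    uint8_t  mac[32];                /* keyed MAC over creds + expiry    */
};

/* Toy keyed checksum standing in for a real MAC (e.g. HMAC-SHA-256);
 * NOT secure, present only so the sketch is self-contained.             */
static void toy_mac(const uint8_t key[32], const void *msg, size_t len,
                    uint8_t out[32])
{
    const uint8_t *m = msg;
    for (size_t i = 0; i < 32; i++) out[i] = key[i];
    for (size_t i = 0; i < len; i++) out[i % 32] ^= (uint8_t)(m[i] + i);
}

static void cookie_mac(const uint8_t key[32], uint64_t expiry,
                       const void *creds, size_t len, uint8_t out[32])
{
    uint8_t buf[256];
    size_t n = len < sizeof buf - 8 ? len : sizeof buf - 8;
    memcpy(buf, &expiry, 8);
    memcpy(buf + 8, creds, n);
    toy_mac(key, buf, n + 8, out);
}

/* Step 1: present credentials, get a cookie. The channel manager keeps
 * no per-client state, so floods of these cost it almost nothing.       */
struct setup_cookie issue_cookie(const uint8_t key[32], const void *creds,
                                 size_t len, uint64_t now)
{
    struct setup_cookie c = { .expiry = now + COOKIE_LIFETIME };
    cookie_mac(key, c.expiry, creds, len, c.mac);
    return c;
}

/* Step 2: present the cookie, get a new channel. Only a valid, unexpired
 * cookie causes channel state (regions, keys) to be allocated.          */
int redeem_cookie(const uint8_t key[32], const struct setup_cookie *c,
                  const void *creds, size_t len, uint64_t now)
{
    uint8_t mac[32];
    if (now > c->expiry)
        return -1;
    cookie_mac(key, c->expiry, creds, len, mac);
    return memcmp(mac, c->mac, 32) == 0 ? 0 : -1;
}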
Conclusions
Network-I/O convergence – old story
The threat: “too fast” networks – too fast for software protocol design as usual
Memory is most limited, I/O is memory-intensive
Attacks on host resources
Threat: specialized I/O networks vs. complex general-purpose networks
Refactoring protocol design between hardware and software
Protecting resources, efficient delivery
Direct data placement only part of the story
New transport protocol better than going to a competing network protocol architecture, e.g. Infiniband
RPC-simplified protocol – demux/delivery for control plane
There is a counter-measure – can the IP/Ethernet community respond?