Interrupts and Exceptions
Socket Layer
COMS W6998
Spring 2010
Erich Nahum
Outline
Sockets API Refresher
Linux Sockets Architecture
Interface between BSD sockets and AF_INET
Interface between AF_INET and TCP/UDP
Receive Path
Send Path
BSD Socket API
Originally developed by UC Berkeley at the
dawn of time
Used by 90% of network oriented programs
Standard interface across operating systems
Simple, well understood by programmers
User Space Socket API
socket() / bind() / accept() / listen(): initialization, addressing and hand shaking
select() / poll() / epoll(): waiting for events
send() / recv(): stream oriented (e.g. TCP) Rx / Tx
sendto() / recvfrom(): datagram oriented (e.g. UDP) Rx / Tx
close(), shutdown(): closing down an association
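The "waiting for events" group can be illustrated with a small user-space sketch. It assumes a POSIX system and uses a connected AF_UNIX socket pair purely for convenience; the function name poll_readable_after_write is made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the number of ready descriptors poll() reports once one byte
 * has been written to the peer, or -1 on error or if the socket was
 * already readable before the write. */
int poll_readable_after_write(void)
{
    int fds[2];
    int ready = -1;
    struct pollfd pfd;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Nothing has been sent yet, so a zero-timeout poll() finds no event. */
    if (poll(&pfd, 1, 0) != 0)
        goto out;

    if (write(fds[1], "x", 1) != 1)
        goto out;

    /* Data is now queued on fds[0]; wait up to one second for POLLIN. */
    ready = poll(&pfd, 1, 1000);
    if (ready == 1 && !(pfd.revents & POLLIN))
        ready = -1;
out:
    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

select() would serve the same purpose here; poll() is used only because its interface is more compact.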
Standard Socket Sequence
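The ‘server’ application: socket(), bind(), listen(), accept(), read(), write(), close()
The ‘client’ application: socket(), bind(), connect(), write(), read(), close()
Exchanges between the two:
3-way handshake (client connect(), server accept())
data flow to server (client write(), server read())
data flow to client (server write(), client read())
4-way handshake (close() on each side)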
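The sequence above can be sketched in a single process over the loopback interface. This is a minimal illustration, not a realistic server: the function name run_socket_sequence is invented here, the client skips its optional bind(), and the port is chosen by the kernel (port 0) and read back with getsockname().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Walks through the server and client call sequences in one process.
 * Returns 0 if the client gets its message echoed back, -1 otherwise. */
int run_socket_sequence(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    int lfd, cfd, sfd;
    int rc = -1;

    /* Server: socket(), bind(), listen() */
    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out_listen;

    /* Client: socket(), connect(); the kernel performs the 3-way handshake */
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out_listen;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out_client;

    /* Server: accept() picks up the established connection */
    sfd = accept(lfd, NULL, NULL);
    if (sfd < 0)
        goto out_client;

    /* Data flow to server, then data flow back to client */
    if (write(cfd, "ping", 4) != 4 || read(sfd, buf, sizeof(buf)) != 4)
        goto out_server;
    if (write(sfd, buf, 4) != 4 || read(cfd, buf, sizeof(buf)) != 4)
        goto out_server;
    rc = memcmp(buf, "ping", 4) == 0 ? 0 : -1;

out_server:
    close(sfd);                             /* close() starts the teardown */
out_client:
    close(cfd);
out_listen:
    close(lfd);
    return rc;
}
```

Running both ends in one process works because connect() on loopback completes against the listen backlog before accept() is called.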
Socket() System Call
Creating a socket from user space is done by the
socket() system call:
int socket(int family, int type, int protocol);
On success, a file descriptor for the new socket is
returned.
The open() system call (for files) likewise returns a
file descriptor.
“Everything is a file” Unix paradigm.
The first parameter, family, is also sometimes referred
to as “domain”.
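A short sketch of the "everything is a file" point: the descriptor returned by socket() can be passed to fstat() like any other descriptor, and the file type it reports is "socket". The helper name socket_fd_is_socket_file is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if the descriptor from socket() is reported by fstat() as a
 * socket, 0 if it is some other file type, and -1 on error. */
int socket_fd_is_socket_file(void)
{
    struct stat st;
    int is_sock;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    is_sock = S_ISSOCK(st.st_mode) ? 1 : 0;
    close(fd);                  /* sockets are closed like any other fd */
    return is_sock;
}
```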
Socket(): Family
A family is a suite of protocols
Each family is a subdirectory of linux/net
E.g., linux/net/ipv4, linux/net/decnet, linux/net/packet
IPv4: PF_INET
IPv6: PF_INET6
Packet sockets: PF_PACKET
Operate at the device driver layer.
pcap library for Linux uses PF_PACKET sockets
pcap library is in use by sniffers such as tcpdump.
Protocol Family == Address Family
PF_INET == AF_INET (in include/linux/socket.h)
Address/Protocol Families
    /* Supported address families. */
    #define AF_UNSPEC       0
    #define AF_UNIX         1   /* Unix domain sockets           */
    #define AF_LOCAL        1   /* POSIX name for AF_UNIX        */
    #define AF_INET         2   /* Internet IP Protocol          */
    #define AF_AX25         3   /* Amateur Radio AX.25           */
    #define AF_IPX          4   /* Novell IPX                    */
    #define AF_APPLETALK    5   /* AppleTalk DDP                 */
    #define AF_NETROM       6   /* Amateur Radio NET/ROM         */
    #define AF_BRIDGE       7   /* Multiprotocol bridge          */
    #define AF_ATMPVC       8   /* ATM PVCs                      */
    #define AF_X25          9   /* Reserved for X.25 project     */
    #define AF_INET6        10  /* IP version 6                  */
    #define AF_ROSE         11  /* Amateur Radio X.25 PLP        */
    #define AF_DECnet       12  /* Reserved for DECnet project   */
    #define AF_NETBEUI      13  /* Reserved for 802.2LLC project */
    #define AF_SECURITY     14  /* Security callback pseudo AF   */
    #define AF_KEY          15  /* PF_KEY key management API     */
    ..
    #define AF_ISDN         34  /* mISDN sockets                 */
    #define AF_PHONET       35  /* Phonet sockets                */
    #define AF_IEEE802154   36  /* IEEE802154 sockets            */
    #define AF_MAX          37  /* For now..                     */
include/linux/socket.h
Socket(): Type
SOCK_STREAM and SOCK_DGRAM are
the most commonly used types.
SOCK_STREAM for TCP, SCTP
SOCK_DGRAM for UDP.
SOCK_RAW for RAW sockets.
Some families support more than one type: a Unix
domain socket (AF_UNIX), for example, can be either
SOCK_STREAM or SOCK_DGRAM.
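The AF_UNIX case can be sketched with socketpair(), which accepts either type. The helper first_read_len is invented for this example; it sends two 2-byte messages and reports how many bytes the first read() returns. A datagram socket keeps message boundaries, so the first read() returns one message; a stream socket has no boundaries and may return both writes together.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Sends "ab" then "cd" over an AF_UNIX socket pair of the given type and
 * returns the byte count of the first read(), or -1 on error. */
long first_read_len(int type)
{
    int fds[2];
    char buf[16];
    long n = -1;

    if (socketpair(AF_UNIX, type, 0, fds) < 0)
        return -1;
    if (write(fds[1], "ab", 2) == 2 && write(fds[1], "cd", 2) == 2)
        n = (long)read(fds[0], buf, sizeof(buf));
    close(fds[0]);
    close(fds[1]);
    return n;
}
```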
Socket(): Protocol
Protocol is the protocol number within a family.
Internet protocols are assigned by IANA
http://www.iana.org/assignments/protocol-numbers/
For AF_INET, it’s usually 0.
IPPROTO_IP is 0, see: include/linux/in.h.
For SCTP:
protocol is IPPROTO_SCTP (132)
sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_SCTP);
For UDP-Lite:
protocol is IPPROTO_UDPLITE (136)
Socket Layer Architecture
[Figure: socket layer architecture, top to bottom]
User: Application
Kernel, socket interface: BSD Socket Layer
Kernel, protocol layers: PF_INET, PF_PACKET, PF_UNIX, PF_IPX, ...; socket types shown: SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW; IPV4 beneath TCP and UDP
Kernel, device layer: Network Device Layer (Ethernet, Token Ring, PPP, SLIP, FDDI); driver example: Intel E1000
Hardware
Key Concepts
Function pointer tables (“ops”)
In-kernel interfaces for socket functions
Binding between BSD sockets and AF_XXX families
Binding between AF_INET and transports (TCP, UDP)
Socket data structures
struct socket (BSD socket)
struct sock (protocol family socket, network state)
struct packet_sock (PF_PACKET)
struct inet_sock (PF_INET)
struct udp_sock
struct tcp_sock
Socket Data Structures
For every socket which is created by a user space application,
there is a corresponding struct socket and struct sock in the
kernel.
These are confusing.
struct socket: include/linux/net.h
Data common to the BSD socket layer
Has only 8 members
Any variable “sock” always refers to a struct socket
struct sock: include/net/sock.h
Data common to the Network Protocol layer (i.e., AF_INET)
has more than 30 members, and is one of the biggest structures
in the networking stack.
Any variable “sk” always refers to a struct sock.
struct socket
    struct socket {
        socket_state             state;         // SS_CONNECTING etc.
        short                    type;          // SOCK_STREAM etc.
        unsigned long            flags;
        struct fasync_struct     *fasync_list;
        wait_queue_head_t        wait;          // tasks waiting
        struct file              *file;         // back ptr to inode
        struct sock              *sk;           // AF specific state
        const struct proto_ops   *ops;          // AF specific operations
    };
include/linux/net.h
Socket State
    typedef enum {
        SS_FREE = 0,          /* not allocated                */
        SS_UNCONNECTED,       /* unconnected to any socket    */
        SS_CONNECTING,        /* in process of connecting     */
        SS_CONNECTED,         /* connected to socket          */
        SS_DISCONNECTING      /* in process of disconnecting  */
    } socket_state;
These states are not layer 4 states (like TCP_ESTABLISHED or
TCP_CLOSE).
include/linux/net.h
Socket Types
    enum sock_type {
        SOCK_STREAM    = 1,
        SOCK_DGRAM     = 2,
        SOCK_RAW       = 3,
        SOCK_RDM       = 4,
        SOCK_SEQPACKET = 5,
        SOCK_DCCP      = 6,
        SOCK_PACKET    = 10,
    };
include/linux/net.h
Comment in include/net/sock.h
/*
* This structure really needs to be cleaned up.
* Most of it is for TCP, and not used by any of
* the other protocols.
*/
struct sock_common
/* minimal network layer representation of sockets */
    struct sock_common {
        /*
         * first fields are not copied in sock_copy()
         */
        union {
            struct hlist_node        skc_node;             // main hash linkage for lookup
            struct hlist_nulls_node  skc_nulls_node;       // main hash for TCP/UDP
        };
        atomic_t                     skc_refcnt;
        int                          skc_tx_queue_mapping; // tx queue for this connection
        union {
            unsigned int             skc_hash;             // hash value for lookup
            __u16                    skc_u16hashes[2];
        };
        unsigned short               skc_family;           // network address family
        volatile unsigned char       skc_state;            // Connection state
        unsigned char                skc_reuse;            // SO_REUSEADDR setting
        int                          skc_bound_dev_if;     // bound if !=0
        union {
            struct hlist_node        skc_bind_node;        // bind hash linkage
            struct hlist_nulls_node  skc_portaddr_node;    // bind hash for UDP/Lite
        };
        struct proto                 *skc_prot;            // protocol handlers in a net family
    };
include/net/sock.h
Outline
Sockets API Refresher
Linux Sockets Architecture
Interface between BSD sockets and AF_INET
Interface between AF_INET and TCP/UDP
Receive Path
Send Path
BSD Socket AF Interface
Main data structures
struct net_proto_family
struct proto_ops
Key function
sock_register(struct net_proto_family *ops)
Each address family:
Implements the struct net_proto_family.
Calls the function sock_register() when the protocol
family is initialized.
Implements the struct proto_ops for binding the BSD
socket layer and protocol family layer.
net_proto_family
[Diagram: the interface between the BSD Socket Layer and the AF Socket Layer]
Describes each of the supported protocol families
    struct net_proto_family {
        int family;
        int (*create)(struct net *net, struct socket *sock,
                      int protocol, int kern);
        struct module *owner;
    };
Specifies the handler for socket creation
create() function is called whenever a new socket of this type is
created
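The registration pattern can be modeled in user space. The sketch below is a toy, not kernel code: every fam_* name and the table size FAM_NPROTO are invented for illustration, and the error codes are chosen to resemble what a registry of this kind would plausibly return. It shows a family registering its create() handler and the socket layer looking that handler up by family number.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define FAM_NPROTO 8    /* hypothetical table size for this sketch */

struct fam_socket {
    int family;
    int type;
    int protocol;
};

/* Toy counterpart of a per-family descriptor with a create() handler. */
struct fam_proto_family {
    int family;
    int (*create)(struct fam_socket *sock, int protocol);
};

static const struct fam_proto_family *fam_table[FAM_NPROTO];

/* Records the family's descriptor so later socket creation can find it. */
int fam_register(const struct fam_proto_family *ops)
{
    if (ops->family < 0 || ops->family >= FAM_NPROTO)
        return -ENOBUFS;
    if (fam_table[ops->family])
        return -EEXIST;
    fam_table[ops->family] = ops;
    return 0;
}

/* Generic layer: find the family, then hand off to its create() handler. */
int fam_socket_create(struct fam_socket *sock, int family, int type,
                      int protocol)
{
    if (family < 0 || family >= FAM_NPROTO || !fam_table[family])
        return -EAFNOSUPPORT;
    sock->family = family;
    sock->type = type;
    return fam_table[family]->create(sock, protocol);
}

static int fam_inet_create(struct fam_socket *sock, int protocol)
{
    sock->protocol = protocol;  /* family-specific setup would go here */
    return 0;
}

/* Family number 2 stands in for AF_INET. */
const struct fam_proto_family fam_inet_ops = {
    .family = 2,
    .create = fam_inet_create,
};
```

The generic layer never names a specific family; it only indexes the table, which is what lets families be added without touching it.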
INET and PACKET proto_family
    /* af_inet.c */
    static const struct net_proto_family inet_family_ops = {
        .family = PF_INET,
        .create = inet_create,
        .owner  = THIS_MODULE,
    };

    /* af_packet.c */
    static const struct net_proto_family packet_family_ops = {
        .family = PF_PACKET,
        .create = packet_create,
        .owner  = THIS_MODULE,
    };
proto_ops
Defines the binding between the BSD
socket layer and address family (AF_*)
layer.
The proto_ops tables contain functions
exported by the AF socket layer to the BSD
socket layer.
Each table consists of the address family type and a
set of pointers to socket operation routines
specific to a particular address family.
struct proto_ops
    /* argument lists omitted, as on the slide */
    struct proto_ops {
        int             family;
        struct module   *owner;
        int             (*release);
        int             (*bind);
        int             (*connect);
        int             (*socketpair);
        int             (*accept);
        int             (*getname);
        unsigned int    (*poll);
        int             (*ioctl);
        int             (*compat_ioctl);
        int             (*listen);
        int             (*shutdown);
        int             (*setsockopt);
        int             (*getsockopt);
        int             (*compat_setsockopt);
        int             (*compat_getsockopt);
        int             (*sendmsg);
        int             (*recvmsg);
        int             (*mmap);
        ssize_t         (*sendpage);
        ssize_t         (*splice_read);
    };
include/linux/net.h
PF_PACKET proto_ops
    static const struct proto_ops packet_ops = {
        .family     = PF_PACKET,
        .owner      = THIS_MODULE,
        .release    = packet_release,
        .bind       = packet_bind,
        .connect    = sock_no_connect,
        .socketpair = sock_no_socketpair,
        .accept     = sock_no_accept,
        .getname    = packet_getname,
        .poll       = packet_poll,
        .ioctl      = packet_ioctl,
        .listen     = sock_no_listen,
        .shutdown   = sock_no_shutdown,
        .setsockopt = packet_setsockopt,
        .getsockopt = packet_getsockopt,
        .sendmsg    = packet_sendmsg,
        .recvmsg    = packet_recvmsg,
        .mmap       = packet_mmap,
        .sendpage   = sock_no_sendpage,
    };
net/packet/af_packet.c
PF_INET proto_ops
                inet_stream_ops (TCP)     inet_dgram_ops (UDP)      inet_sockraw_ops (RAW)
.family         PF_INET                   PF_INET                   PF_INET
.owner          THIS_MODULE               THIS_MODULE               THIS_MODULE
.release        inet_release              inet_release              inet_release
.bind           inet_bind                 inet_bind                 inet_bind
.connect        inet_stream_connect       inet_dgram_connect        inet_dgram_connect
.socketpair     sock_no_socketpair        sock_no_socketpair        sock_no_socketpair
.accept         inet_accept               sock_no_accept            sock_no_accept
.getname        inet_getname              inet_getname              inet_getname
.poll           tcp_poll                  udp_poll                  datagram_poll
.ioctl          inet_ioctl                inet_ioctl                inet_ioctl
.listen         inet_listen               sock_no_listen            sock_no_listen
.shutdown       inet_shutdown             inet_shutdown             inet_shutdown
.setsockopt     sock_common_setsockopt    sock_common_setsockopt    sock_common_setsockopt
.getsockopt     sock_common_getsockopt    sock_common_getsockopt    sock_common_getsockopt
.sendmsg        tcp_sendmsg               inet_sendmsg              inet_sendmsg
.recvmsg        sock_common_recvmsg       sock_common_recvmsg       sock_common_recvmsg
.mmap           sock_no_mmap              sock_no_mmap              sock_no_mmap
.sendpage       tcp_sendpage              inet_sendpage             inet_sendpage
.splice_read    tcp_splice_read           --                        --
net/ipv4/af_inet.c
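The table shows the same slot filled differently per socket type, with sock_no_* entries where an operation makes no sense (UDP has no listen() or accept()). A toy user-space model of that idea follows; all ops_* names are invented for this sketch, and the stub's return value of -EOPNOTSUPP is meant to match how such stubs report an unsupported operation.

```c
#include <assert.h>
#include <errno.h>

/* Toy operations table: one function pointer per socket call. */
struct ops_table {
    const char *name;
    int (*listen)(int backlog);
};

/* Stub used by socket types that do not support the operation. */
static int ops_no_listen(int backlog)
{
    (void)backlog;
    return -EOPNOTSUPP;
}

static int ops_stream_listen(int backlog)
{
    (void)backlog;              /* a real implementation would size a queue */
    return 0;
}

const struct ops_table ops_stream = { "stream", ops_stream_listen };
const struct ops_table ops_dgram  = { "dgram",  ops_no_listen };

/* Generic layer: dispatch through the table without knowing the type. */
int ops_sys_listen(const struct ops_table *ops, int backlog)
{
    return ops->listen(backlog);
}
```

Filling every slot, even with a stub, means the generic layer can call through the pointer without a NULL check.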
Outline
Sockets API Refresher
Linux Sockets Architecture
Interface between BSD sockets and AF_INET
Interface between AF_INET and TCP/UDP
Binding between IP and TCP/UDP (upcall)
Binding between AF_INET and TCP (downcall)
Receive Path
Send Path
AF_INET Transport API
[Diagram: the interface between the AF_INET Layer and the Transport Layer]
inet_protos (array of struct net_protocol)
Interface between IP and the transport layer
Is the upcall binding from IP to transport
Method for demultiplexing IP packets to proper transport
struct proto
Defines interface for individual protocols (TCP, UDP, etc)
Is the downcall binding for AF_INET to transport
Transport-specific functions for socket API
struct inet_protosw
Describes the PF_INET protocols
Defines the different SOCK types for PF_INET
SOCK_STREAM (TCP), SOCK_DGRAM (UDP), SOCK_RAW
Recall IP’s inet_protos
[Figure: net_protocol inet_protos[MAX_INET_PROTOS], an array with slots 0, 1, ..., MAX_INET_PROTOS. Each slot points to a net_protocol with the fields handler, err_handler, gso_send_check, gso_segment, gro_receive, gro_complete. Examples shown: handler udp_rcv() with err_handler udp_err(); handler igmp_rcv() with err_handler Null.]
Receive binding from the IP layer to the transport layer.
inet_init() calls inet_add_protocol(p) to add each protocol to the hash
queues.
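The upcall binding amounts to a table indexed by IP protocol number. Below is a toy user-space model under that assumption; the demux_* names, the table size, and the handler signature are all invented here and much simpler than the kernel's. A transport registers a receive handler, and the "IP layer" delivers a payload by looking up the protocol number from the packet header.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

#define DEMUX_MAX_PROTOS 256    /* one slot per 8-bit IP protocol number */

/* Toy counterpart of a per-protocol receive descriptor. */
struct demux_protocol {
    int (*handler)(const char *payload);
};

static const struct demux_protocol *demux_protos[DEMUX_MAX_PROTOS];

/* Registers a transport's handler; fails if the slot is already taken. */
int demux_add_protocol(const struct demux_protocol *prot, unsigned char num)
{
    if (demux_protos[num])
        return -1;
    demux_protos[num] = prot;
    return 0;
}

/* "IP layer" upcall: pick the handler by protocol number and call it. */
int demux_deliver(unsigned char num, const char *payload)
{
    const struct demux_protocol *p = demux_protos[num];

    if (!p || !p->handler)
        return -1;              /* no transport registered for this number */
    return p->handler(payload);
}

/* Stand-in for a UDP receive routine: reports the payload length. */
static int demux_udp_rcv(const char *payload)
{
    return (int)strlen(payload);
}

const struct demux_protocol demux_udp = { .handler = demux_udp_rcv };
```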
struct proto
    /* Networking protocol blocks we attach to sockets.
     * socket layer -> transport layer interface
     */
    /* argument lists omitted, as on the slide */
    struct proto {
        void            (*close);
        int             (*connect);
        int             (*disconnect);
        struct sock *   (*accept);
        int             (*ioctl);
        int             (*init);
        void            (*destroy);
        void            (*shutdown);
        int             (*setsockopt);
        int             (*getsockopt);
        int             (*sendmsg);
        int             (*recvmsg);
        int             (*sendpage);
        int             (*bind);
        int             (*backlog_rcv);
        void            (*hash);
        void            (*unhash);
        int             (*get_port);
    };
include/net/sock.h
udp_prot
    struct proto udp_prot = {
        .name              = "UDP",
        .owner             = THIS_MODULE,
        .close             = udp_lib_close,
        .connect           = ip4_datagram_connect,
        .disconnect        = udp_disconnect,
        .ioctl             = udp_ioctl,
        .destroy           = udp_destroy_sock,
        .setsockopt        = udp_setsockopt,
        .getsockopt        = udp_getsockopt,
        .sendmsg           = udp_sendmsg,
        .recvmsg           = udp_recvmsg,
        .sendpage          = udp_sendpage,
        .backlog_rcv       = __udp_queue_rcv_skb,
        .hash              = udp_lib_hash,
        .unhash            = udp_lib_unhash,
        .get_port          = udp_v4_get_port,
        .memory_allocated  = &udp_memory_allocated,
        .sysctl_mem        = sysctl_udp_mem,
        .sysctl_wmem       = &sysctl_udp_wmem_min,
        .sysctl_rmem       = &sysctl_udp_rmem_min,
        .obj_size          = sizeof(struct udp_sock),
        .slab_flags        = SLAB_DESTROY_BY_RCU,
        .h.udp_table       = &udp_table,
    #ifdef CONFIG_COMPAT
        .compat_setsockopt = compat_udp_setsockopt,
        .compat_getsockopt = compat_udp_getsockopt,
    #endif
    };
net/ipv4/udp.c
inet_protosw
    static struct inet_protosw inetsw_array[] =
    {
        {
            .type     = SOCK_STREAM,
            .protocol = IPPROTO_TCP,
            .prot     = &tcp_prot,
            .ops      = &inet_stream_ops,
            .no_check = 0,
            .flags    = INET_PROTOSW_PERMANENT |
                        INET_PROTOSW_ICSK,
        },
        {
            .type     = SOCK_DGRAM,
            .protocol = IPPROTO_UDP,
            .prot     = &udp_prot,
            .ops      = &inet_dgram_ops,
            .no_check = UDP_CSUM_DEFAULT,
            .flags    = INET_PROTOSW_PERMANENT,
        },
        {
            .type     = SOCK_RAW,
            .protocol = IPPROTO_IP,    /* wild card */
            .prot     = &raw_prot,
            .ops      = &inet_sockraw_ops,
            .no_check = UDP_CSUM_DEFAULT,
            .flags    = INET_PROTOSW_REUSE,
        }
    };
On startup (inet_init()), TCP, UDP, and Raw socket protocols are
inserted into the inetsw_array[].
Other protocols call inet_register_protosw().
inet_unregister_protosw() will not remove protocols with PERMANENT set.
net/ipv4/af_inet.c
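How a (type, protocol) pair from socket() might be resolved against such a table can be sketched in user space. The matching rules below are an assumption modeled loosely on the wild-card comment in the array: a caller passing 0 (IPPROTO_IP) takes the entry's own protocol, and an entry whose protocol is IPPROTO_IP accepts any non-zero protocol. The sw_* names and the reduced entry layout are invented for this example.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Reduced stand-in for a protocol switch entry. */
struct sw_entry {
    int type;
    int protocol;
    const char *prot_name;      /* stands in for the .prot / .ops pointers */
};

static const struct sw_entry sw_table[] = {
    { SOCK_STREAM, IPPROTO_TCP, "TCP" },
    { SOCK_DGRAM,  IPPROTO_UDP, "UDP" },
    { SOCK_RAW,    IPPROTO_IP,  "RAW" },    /* wild card entry */
};

/* Returns the name of the matching transport, or NULL if none matches. */
const char *sw_lookup(int type, int protocol)
{
    size_t i;

    for (i = 0; i < sizeof(sw_table) / sizeof(sw_table[0]); i++) {
        const struct sw_entry *e = &sw_table[i];

        if (e->type != type)
            continue;
        if (protocol == e->protocol) {
            if (protocol != IPPROTO_IP)
                return e->prot_name;        /* exact, non-wild match */
        } else if (protocol == IPPROTO_IP || e->protocol == IPPROTO_IP) {
            return e->prot_name;            /* one side is the wild card */
        }
    }
    return NULL;
}
```

Under these rules a raw socket must name a concrete protocol: both sides being the wild card matches nothing.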
Relationships
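[Figure: how the socket structures point to one another]
struct socket: state, type, flags, fasync_list, wait, file, sk (points to struct sock), ops (points to struct proto_ops)
struct sock: sk_common (a struct sock_common), sk_lock, sk_backlog, ..., (*sk_prot_creator), sk_socket (points back to struct socket), sk_send_head, ...
struct proto_ops (PF_INET, af_inet.c): inet_release, inet_bind, inet_accept, ...
struct sock_common: skc_node, skc_refcnt, skc_hash, ..., skc_prot (points to struct proto), skc_net
struct proto: udp_lib_close, ip4_datagram_connect, udp_sendmsg, udp_recvmsg, ...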
Example: inet_accept()
    int inet_accept(struct socket *sock, struct socket *newsock, int flags)
    {
        struct sock *sk1 = sock->sk;
        int err = -EINVAL;
        struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, &err);

        if (!sk2)
            goto do_err;

        lock_sock(sk2);

        WARN_ON(!((1 << sk2->sk_state) &
                  (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT | TCPF_CLOSE)));

        sock_graft(sk2, newsock);

        newsock->state = SS_CONNECTED;
        err = 0;
        release_sock(sk2);
    do_err:
        return err;
    }
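The two-level dispatch in inet_accept() (socket operations table first, then the transport's table via sk_prot) can be modeled in user space. Everything below is a simplified stand-in: the model_* types carry only the fields needed for the call chain, locking and state checks are left out, and the single static child object replaces a real accept queue.

```c
#include <assert.h>
#include <stddef.h>

enum model_state { MODEL_SS_UNCONNECTED, MODEL_SS_CONNECTED };

struct model_sock;
struct model_socket;

/* Transport-level table (the role struct proto plays). */
struct model_proto {
    struct model_sock *(*accept)(struct model_sock *sk, int *err);
};

/* Protocol-family socket (the role struct sock plays). */
struct model_sock {
    const struct model_proto *sk_prot;
    struct model_socket *sk_socket;
};

/* Family-level table (the role struct proto_ops plays). */
struct model_proto_ops {
    int (*accept)(struct model_socket *sock, struct model_socket *newsock);
};

/* BSD-level socket (the role struct socket plays). */
struct model_socket {
    enum model_state state;
    struct model_sock *sk;
    const struct model_proto_ops *ops;
};

static struct model_sock model_child;   /* stand-in for a queued connection */

static struct model_sock *model_tcp_accept(struct model_sock *sk, int *err)
{
    model_child.sk_prot = sk->sk_prot;
    *err = 0;
    return &model_child;
}

/* Family level: call down into the transport, then attach the result. */
static int model_inet_accept(struct model_socket *sock,
                             struct model_socket *newsock)
{
    int err = -1;
    struct model_sock *sk2 = sock->sk->sk_prot->accept(sock->sk, &err);

    if (!sk2)
        return err;
    sk2->sk_socket = newsock;           /* link the two objects both ways */
    newsock->sk = sk2;
    newsock->state = MODEL_SS_CONNECTED;
    return 0;
}

const struct model_proto model_tcp_prot = { .accept = model_tcp_accept };
const struct model_proto_ops model_inet_stream_ops = {
    .accept = model_inet_accept,
};

/* BSD level: knows only the ops table, not the family or transport. */
int model_sys_accept(struct model_socket *sock, struct model_socket *newsock)
{
    newsock->ops = sock->ops;
    return sock->ops->accept(sock, newsock);
}
```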
Backup