Transcript: Slide 1
National Sun Yat-Sen University
Embedded System Laboratory
XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems
Presenter: Hung-Lun Chen
2013/12/2
Miura, S.; Hanawa, T.; Boku, T.; Sato, M. (Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan)
Embedded and Ubiquitous Computing (EUC), 2011 IFIP 9th International Conference on
Multi-core processor technology has been applied to the processors in
embedded systems as well as in ordinary PC systems. In multi-core
embedded processors, however, a processor may consist of heterogeneous
CPU cores that are not configured with a shared memory and do not have a
communication mechanism for inter-core communication. MCAPI is a highly
portable API standard for providing inter-core communication independent of
the architecture heterogeneity. In this paper, we extend the current MCAPI to multi-chip, distributed-memory configurations and propose its portable
implementation, named XMCAPI, on a commodity network stack. With
XMCAPI, the inter-core communication method for intra-chip cores is
extended to inter-chip cores. We evaluate the XMCAPI implementation,
xmcapi/ip, on a standard socket in a portable software development
environment.
          IA32              Embedded System
Memory    Shared            Often not shared
Cache     Coherent cache    Often not shared
Fig 1. Comparison of IA32 and Embedded System
Therefore, the communication mechanisms used in IA32 systems may not be appropriate for embedded systems.
We need an efficient communication mechanism to coordinate the cores in distributed or parallel applications.
MCAPI is used to create a new communication mechanism, named XMCAPI, that supports not only intra-chip communication but also inter-chip communication over distributed memory.
MCAPI: Multicore Communications API Working Group (MCAPI®)
Why choose MCAPI rather than MPI or OpenMP?
MPI or OpenMP
。 They consume too many resources from a system, and many of the functions they provide are overkill.
。 The large footprint required by their library implementations cannot fit into the local memory of most embedded platforms.
MCAPI
。 High portability, independence, flexibility, low overhead, small memory footprint
Fig 2. Overview of MCAPI, from the official website
[Related-work diagram: prior work [1]-[10] is grouped by the properties it addresses, such as avoiding race conditions, improving performance, message passing interfaces, light weight, and support for physical interfaces. This paper, XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems, aims at: 1. high portability, 2. independence, 3. flexibility, 4. low overhead, 5. small memory footprint.]
XMCAPI supports a variety of physical interfaces and protocols.
If shared memory is available, it should be used for inter-core communication.
Libraries in the software stack:
Open MPI: the most typical MPI implementation
PM libraries: support multiple physical interfaces
Why not just use the traditional communication libraries?
To achieve better performance for MCAPI, XMCAPI should access the physical interface directly to avoid overheads.
Notes on the components in Fig 3:
Ethernet: the common network type in modern development; it defines the connection at the physical layer.
InfiniBand: a network communications link used in high-performance computing and enterprise data centers.
OFED (OpenFabrics Enterprise Distribution): the OFED stack includes software drivers, core kernel code, middleware, and user-level interfaces.
PEARL (Process and Experiment Automation Realtime Language): a computer programming language designed for multitasking and real-time programming.
Fig 3. Overview of XMCAPI software stacks
Ethernet
Commonly used in many systems, including embedded systems.
TCP/IP (socket APIs)
Advantage: does not depend on the operating system or the network device.
Disadvantage: communication performance decreases because the services/interfaces/protocols are not clearly defined in its library.
。 This goes against XMCAPI's property of accessing the physical interface directly.
To improve the portability of the program, the paper implements XMCAPI on the socket API, in a module named XMCAPI/IP.
The main purpose of XMCAPI/IP is to provide a test bed for MCAPI applications.
Advantage: portability
Disadvantage: decreased communication performance due to the use of TCP/IP
。 This is a trade-off between ease of implementation and the cost of using TCP/IP.
。 The purpose here is to extend the utilization and coverage of MCAPI from its limited platforms to a wider system configuration with multi-chip solutions.
The communicator thread and the user thread are implemented with POSIX threads (i.e., pthreads).
User thread: runs the user application.
Communicator thread: handles the communication between cores.
Inside the communicator thread:
epoll() (event trigger): manages the connections between two or more cores and raises notifications through pipe().
pipe(): used only for event and control-signal notification; the data itself is exchanged through buffers shared by the two threads. (A sketch of this pattern follows the figure.)
Fig 4. Overview of the XMCAPI/IP implementation
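To make the pattern concrete, here is a minimal C sketch (not the paper's code) of a communicator thread that blocks in epoll_wait() on the read end of a pipe written by the user thread. Names such as request_t, pipe_fds, and MAX_EVENTS are hypothetical.

/* Minimal sketch of the communicator-thread pattern described above. */
#include <pthread.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16

typedef struct { int dest_fd; const void *buf; size_t len; } request_t;

static int pipe_fds[2];                  /* user thread -> communicator */

static void *communicator_thread(void *arg)
{
    (void)arg;
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pipe_fds[0] };
    epoll_ctl(epfd, EPOLL_CTL_ADD, pipe_fds[0], &ev);
    /* ...sockets to the other chips would be registered the same way... */

    for (;;) {
        struct epoll_event evs[MAX_EVENTS];
        int n = epoll_wait(epfd, evs, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (evs[i].data.fd == pipe_fds[0]) {
                /* A send request arrived from the user thread.  Only the
                 * small request travels over the pipe; the payload is
                 * read from memory shared within the process.          */
                request_t req;
                read(pipe_fds[0], &req, sizeof req);
                write(req.dest_fd, req.buf, req.len);
            } else {
                /* Data from a remote communicator thread: receive it
                 * into the local buffer here.                          */
            }
        }
    }
    return NULL;
}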
Flow:
First step: the user thread tells the communicator thread to send data.
Second step: the communicator thread sends the data described by the request.
Third step: the communicator thread on the other side receives the data and stores it in a buffer.
(A sketch of the user-thread side of this flow is shown below.)
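Continuing the hypothetical sketch above (same request_t and pipe_fds), the user-thread side of the first step is just a small write() into the pipe; the payload itself never travels through the pipe.

/* User-thread side of the flow above. */
static void user_send(int dest_fd, const void *buf, size_t len)
{
    request_t req = { dest_fd, buf, len };  /* step 1: describe the transfer */
    write(pipe_fds[1], &req, sizeof req);   /* wake the communicator thread  */
    /* Step 2 (the actual socket send) happens asynchronously in the
     * communicator thread; step 3 is the remote side's receive.        */
}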
Packet header (size: 24 bytes):
。 send_id, receiver_id, port_id: indicate the queue pairs
。 Message type: determines whether it is an ACK packet or not
。 Payload size: the size of the payload
The header is followed by the data (payload); field widths in bits are given in Fig 5.
Fig 5. Packet format on the XMCAPI/IP module
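As an illustration only, a C struct matching the 24-byte header described above might look as follows; the individual field widths are assumptions that merely add up to the stated total.

#include <stdint.h>

/* Hypothetical layout of the 24-byte XMCAPI/IP header. */
typedef struct {
    uint32_t send_id;      /* sender endpoint                    */
    uint32_t receiver_id;  /* receiver endpoint                  */
    uint32_t port_id;      /* together these pick the queue pair */
    uint32_t msg_type;     /* data packet or ACK                 */
    uint64_t payload_size; /* size of the data that follows      */
} xmcapi_ip_header_t;      /* 4 * 4 + 8 = 24 bytes               */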
Message
Data arrival is guaranteed by TCP (handshaking protocol).
Problem:
。 The receiver buffer may not have enough space to store the data.
Solution:
。 If the receiver buffer is full, the receiver responds with an ACK carrying a stop flag.
。 If the receiver buffer has enough space, it tells the sender to send the data. (A sketch of this flow control follows.)
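A minimal sketch of this receiver-side flow control, assuming a hypothetical ACK_STOP flag (the slide names the stop flag but not its encoding):

#include <stdint.h>
#include <unistd.h>

#define MSG_ACK  1u
#define ACK_STOP (1u << 0)    /* hypothetical "buffer full" bit */

typedef struct { uint32_t type; uint32_t flags; } ack_t;

/* Receiver side: acknowledge a packet, asking the sender to pause
 * whenever the local buffer cannot hold the next message. */
static void send_ack(int sock_fd, size_t free_space, size_t msg_size)
{
    ack_t ack = { MSG_ACK, 0 };
    if (free_space < msg_size)
        ack.flags |= ACK_STOP;          /* stop flag: sender must wait */
    write(sock_fd, &ack, sizeof ack);   /* a plain ACK lets it resume  */
}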
Packet Channel (Unidirectional FIFO)
XMCAPI/IP accesses the buffer of the user application directly.
。 Advantage: user applications can access the receive buffer directly, so unnecessary memory copies are avoided in the packet channel.
Scalar Channel (Unidirectional FIFO)
。 Advantage: applying this mechanism in the XMCAPI/IP module also reduces the frequency of read()/write() system calls. (A sketch of this batching idea follows.)
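For the scalar channel, a minimal sketch of the batching idea: many small scalars are aggregated into one buffer and flushed with a single write(). The buffer size and names are assumptions, not the paper's definitions.

#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BATCH_BYTES 256          /* hypothetical aggregation buffer */

typedef struct {
    int     fd;                  /* socket to the remote core       */
    size_t  used;
    uint8_t buf[BATCH_BYTES];
} scalar_batch_t;

/* Queue one scalar; flush (one write() syscall) only when the buffer
 * is full, instead of issuing one syscall per scalar. */
static void scalar_put(scalar_batch_t *b, uint64_t v)
{
    if (b->used + sizeof v > BATCH_BYTES) {
        write(b->fd, b->buf, b->used);
        b->used = 0;
    }
    memcpy(b->buf + b->used, &v, sizeof v);
    b->used += sizeof v;
}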
SC: Single-Core, DC: Dual-Core
Fig 6. Latency for various data sizes of the MCAPI message
In this evaluation, the latency of XMCAPI/IP is larger than that of normal TCP/IP. The communication time for pipe(), about 10 usec, is added in the XMCAPI/IP environment, and the multi-threading operation adds another overhead of about 10 usec in the single-core environment. Subtracting these overheads from the measured latencies gives the same base latency: xmcapi/ip (SC): 74 - 10 - 10 = 54 usec; xmcapi/ip (DC): 64 - 10 = 54 usec.
We transmit 1.0 Gbytes of data between two nodes in single-direction communication.
The performance in both the single-core and dual-core environments is approximately 112 Mbytes/sec at a 32-Kbyte message size.
This realizes about 90% of the theoretical peak of Gigabit Ethernet (112 out of 125 Mbytes/sec).
Fig 7. Bandwidth for various data sizes on the MCAPI packet channel
XMCAPI is an extension of the MCAPI concept whose purpose is to allow inter-chip and inter-node communication while keeping compatibility with the original MCAPI API.
To provide high portability on various hardware platforms, we implemented XMCAPI based on TCP/IP over Ethernet with a socket library.
To support asynchronous communication in MCAPI, we introduced a communicator thread, using POSIX threads, for each user thread. The added thread causes a certain amount of overhead that increases the latency for short messages.
This overhead is accepted in order to keep compatibility and portability.
We achieved approximately 90% of the theoretical peak bandwidth of Gigabit Ethernet.
My comments
The experiments do not include other implementations for comparison.
The library size of XMCAPI/IP is not shown in the paper.
The latency is still a little too high and needs to be reduced.