Transcript Slide 1

National Sun Yat-Sen University
Embedded System Laboratory

XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems

Presenter: Hung-Lun Chen
2013/12/2

Paper authors: Miura, S.; Hanawa, T.; Boku, T.; Sato, M.
Center for Comput. Sci., Univ. of Tsukuba, Tsukuba, Japan
Embedded and Ubiquitous Computing (EUC), 2011 IFIP 9th International Conference on
1
2015/7/21

2
Multi-core processor technology has been applied to the processors in
embedded systems as well as in ordinary PC systems. In multi-core
embedded processors, however, a processor may consist of heterogeneous
CPU cores that are not configured with a shared memory and do not have a
communication mechanism for inter-core communication. MCAPI is a highly
portable API standard for providing inter-core communication independent of
the architecture heterogeneity. In this paper, we extend the current MCAPI to
a multi-chip in a distributed memory configuration and propose its portable
implementation, named XMCAPI, on a commodity network stack. With
XMCAPI, the inter-core communication method for intra-chip cores is
extended to inter-chip cores. We evaluate the XMCAPI implementation,
xmcapi/ip, on a standard socket in a portable software development
environment.
         IA32            Embedded System
Memory   Shared          Often not shared
Cache    Coherent cache  Often not shared

Fig 1. Comparison of IA32 and Embedded System
3

Therefore, communication mechanisms designed for IA32 may not
be appropriate to implement in embedded systems.

So, we need an efficient communication mechanism to
coordinate the cores in distributed or parallel applications.

The paper uses MCAPI to create a new communication mechanism, named
XMCAPI, that supports not only intra-chip communication but also
inter-chip communication over distributed memory.


MCAPI: MULTICORE COMMUNICATIONS API WORKING GROUP (MCAPI®)
Why choose MCAPI for the implementation rather than MPI or OpenMP?

MPI or OpenMP
。 They consume too many resources on a system, and many of the functions they provide are overkill.
。 The large footprint required by their library implementations cannot fit into the local memory of most
embedded platforms.

MCAPI
。 High portability, independence, flexibility, low overhead, small memory footprint
Fig 2. The overview of the MCAPI from
their official website
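To make the message-passing model concrete, here is a minimal sketch, modeled in Python for illustration. The class and function names (ToyEndpoint, msg_send, msg_recv) are hypothetical stand-ins, not the real MCAPI C API; the only assumption taken from the slide is that endpoints are addressed by node and port and messages are connection-less.

```python
# Toy Python model of MCAPI-style message passing between endpoints.
# All names here are illustrative, not the real MCAPI C API.
from collections import deque

class ToyEndpoint:
    def __init__(self, node_id, port_id):
        self.node_id = node_id
        self.port_id = port_id
        self.inbox = deque()  # per-endpoint receive queue (FIFO)

def msg_send(sender, receiver, data):
    """Connection-less message send: deliver into the receiver's queue."""
    receiver.inbox.append(bytes(data))

def msg_recv(endpoint):
    """A real API would block until data arrives; here we just pop the FIFO."""
    return endpoint.inbox.popleft()

# Two "cores" exchange a message.
ep0 = ToyEndpoint(node_id=0, port_id=1)
ep1 = ToyEndpoint(node_id=1, port_id=1)
msg_send(ep0, ep1, b"hello")
print(msg_recv(ep1))  # b'hello'
```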
4
[Slide 5 is a diagram relating prior work to this paper: the Message Passing Interface and earlier approaches [1]-[10] target goals such as avoiding race conditions, improving performance, supporting physical interfaces, and staying lightweight. MCAPI's properties: 1. high portability, 2. independence, 3. flexibility, 4. low overhead, 5. small memory footprint.]

This paper: XMCAPI: Inter-Core Communication Interface on Multi-chip Embedded Systems
5

Supports a variety of physical interfaces and protocols.

If shared memory is available, we should use it for inter-core
communication.

Libraries used

Open MPI: the most typical MPI implementation

PM Libraries: support multiple physical interfaces

Why not just use the traditional communication libraries?

To achieve better performance for MCAPI, XMCAPI should access the physical
interface directly to avoid overheads.
Side notes on the software stack:
。 Ethernet: the common network type in modern development; it defines the connection at the physical layer.
。 InfiniBand: a network link used in high-performance computing and enterprise data centers.
。 OFED (OpenFabrics Enterprise Distribution): the OFED stack includes software drivers, core kernel code, middleware, and user-level interfaces.
。 PEARL (Process and Experiment Automation Realtime Language): a computer programming language designed for multitasking and real-time programming.

Fig 3. Overview of XMCAPI software stacks
6

Ethernet

Commonly used in many systems, including embedded systems.

TCP/IP (socket APIs)

Advantage: does not depend on the operating system or the network device.

Disadvantage: communication performance decreases because the
services/interfaces/protocols are not clearly defined in its library.
。 This goes against XMCAPI's property of accessing the physical interface directly.

To improve the portability of the program, the paper implements
XMCAPI with the socket API, named the XMCAPI/IP module.

The main purpose of XMCAPI/IP is to provide a test bed for MCAPI applications.

Advantage: portability

Disadvantage: lower communication performance because of using TCP/IP.
。 A trade-off between ease of implementation and the cost of using TCP/IP.
。 The purpose here is to extend the utilization and coverage of MCAPI from its limited platforms to a
wider system configuration with multi-chip solutions.
7


The communicator thread and user thread are implemented with POSIX threads (i.e., pthreads).

User thread: runs the user application.

Communicator thread: handles communication between cores.
Communicator thread details:

epoll() (event trigger): manages the connections between two or more cores, and
notifies the system with pipe().

pipe(): used only for event and control-signal notification and the exchange of data.
Fig 4. Overview of the XMCAPI/IP implementation
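The user-thread / communicator-thread split can be sketched as follows, modeled in Python (threading + os.pipe) for illustration instead of pthreads; the real XMCAPI/IP module multiplexes sockets with epoll(). The variable names (send_queue, notify_w) are hypothetical.

```python
# Sketch of the two-thread design: the user thread enqueues a send
# request and wakes the communicator thread through a pipe. The real
# module watches sockets with epoll(); names here are illustrative.
import os
import threading

notify_r, notify_w = os.pipe()  # pipe(): event/control notification
send_queue = []                 # requests posted by the user thread
lock = threading.Lock()
results = []

def communicator():
    # Block until the user thread signals a pending request.
    os.read(notify_r, 1)
    with lock:
        request = send_queue.pop(0)
    # A real implementation would write `request` to a TCP socket here.
    results.append(request)

t = threading.Thread(target=communicator)
t.start()

# User thread: post a send request, then wake the communicator.
with lock:
    send_queue.append(b"payload")
os.write(notify_w, b"\x01")
t.join()
print(results)  # [b'payload']
```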
8

Flow:

First step: the user thread tells the communicator thread to send data.

Second step: the communicator thread sends the data from the request.

Third step: the communicator thread on the other side receives the data and stores it
in a buffer.
Packet format (header size: 24 bytes):
。 Message type: determines whether it is an ACK packet or not
。 send_id, receiver_id, port_id: indicate the queue pairs
。 Payload size: size of the payload
。 Data: the payload itself
9
Fig 5. Packet format on the XMCAPI/IP module
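A minimal sketch of packing such a header, in Python. The paper only states that the header is 24 bytes and lists the fields; the concrete layout below (six 32-bit little-endian words) is an assumption for illustration.

```python
# Illustrative packing of a 24-byte header like Fig 5. The field
# widths and order are assumed (six 32-bit little-endian words);
# the slide only gives the field names and the 24-byte total.
import struct

HEADER_FMT = "<IIIIII"  # type, ack, send_id, receiver_id, port_id, size
assert struct.calcsize(HEADER_FMT) == 24

def pack_header(msg_type, is_ack, send_id, receiver_id, port_id, size):
    return struct.pack(HEADER_FMT, msg_type, int(is_ack),
                       send_id, receiver_id, port_id, size)

def unpack_header(raw):
    return struct.unpack(HEADER_FMT, raw)

payload = b"hello"
hdr = pack_header(1, False, 0, 1, 7, len(payload))
packet = hdr + payload
print(unpack_header(packet[:24]))  # (1, 0, 0, 1, 7, 5)
```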

Message

Data arrival is guaranteed by TCP (handshaking protocol).

Problem:
。 The receiver buffer may not have enough space to store the data.

Solution:
。 If the receiver buffer is full, the receiver responds with an ACK carrying a stop flag.
。 If the receiver buffer has enough space, it tells the sender to send data.

Packet Channel (unidirectional FIFO)

XMCAPI/IP accesses the buffer of the user application directly.
。 Advantage: user applications can access the receiver buffer directly, so unnecessary memory copies are
reduced in the packet channel.
Scalar Channel (unidirectional FIFO)

Advantage: applying this mechanism in the XMCAPI/IP module also
reduces the frequency of read()/write() system calls.
10
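The syscall-reduction idea can be illustrated by batching: instead of one write() per scalar, several scalars are packed and sent in a single call. A minimal sketch using os.pipe in Python (the real module does this over sockets):

```python
# Sketch of syscall batching for a scalar channel: pack four 64-bit
# scalars and issue one write() instead of four. Illustrative only.
import os
import struct

r, w = os.pipe()
scalars = [1, 2, 3, 4]

batched = struct.pack("<4Q", *scalars)  # 4 x 64-bit scalars, 32 bytes
os.write(w, batched)                    # a single system call
data = os.read(r, len(batched))
received = list(struct.unpack("<4Q", data))
os.close(w)
os.close(r)
print(received)  # [1, 2, 3, 4]
```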
SC: Single-Core, DC: Dual-Core
Fig 6. Latency of various data sizes of the MCAPI message (xmcapi/ip SC: 74 usec, xmcapi/ip DC: 64 usec)

In this evaluation, the latency of XMCAPI/IP is larger than that of normal TCP/IP.

The communication time for pipe(), about 10 usec, is added in the XMCAPI/IP
environment: 64 − 10 = 54.

A further overhead of 10 usec is added by the multi-threading operation in the
single-core environment: 74 − 10 − 10 = 54.
11
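The latency decomposition can be checked numerically: both measured figures reduce to the same base latency once the stated overheads are subtracted.

```python
# Latency decomposition from the evaluation: subtracting the pipe()
# overhead (~10 usec) and, on a single core, the multi-threading
# overhead (~10 usec) yields the same ~54 usec base in both setups.
PIPE_US = 10    # pipe() notification overhead
THREAD_US = 10  # multi-threading overhead (single core only)

base_sc = 74 - THREAD_US - PIPE_US  # single-core measurement
base_dc = 64 - PIPE_US              # dual-core measurement
print(base_sc, base_dc)  # 54 54
```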

We transmit data of 1.0 Gbytes between two nodes in single-direction communication.

The performance in single-core and dual-core environments is
approximately 112 Mbytes/sec at 32 Kbytes.

The XMCAPI/IP environment realizes about 90% of the theoretical peak performance
(112 Mbytes/sec at 32 KBytes).
Fig 7. Bandwidth of various data size on the MCAPI packet channel
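The ~90% figure follows from Gigabit Ethernet's theoretical peak of 1000 Mbit/s = 125 Mbytes/s:

```python
# Checking the ~90% claim: 112 Mbytes/sec measured at 32 KB messages
# against Gigabit Ethernet's theoretical peak of 125 Mbytes/sec.
peak_mbytes = 1000 / 8  # 125 Mbytes/sec
measured = 112          # Mbytes/sec at 32 KB
efficiency = measured / peak_mbytes
print(round(efficiency * 100))  # 90
```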
12

XMCAPI is an extension of the MCAPI concept with the purpose of allowing inter-chip
and inter-node communication while keeping compatibility with the original MCAPI
API.

To provide high portability on various hardware platforms, we implemented XMCAPI
based on TCP/IP on Ethernet with a socket library.

To support asynchronous communication in MCAPI, we introduced a communicator
thread, implemented with POSIX threads, for each user thread. The added thread causes a certain
amount of overhead that increases the latency for short messages.
This overhead is accepted to preserve compatibility and portability.

We could achieve approximately 90% of the theoretical peak bandwidth of Gigabit
Ethernet.
13

My comments

The experiments do not compare against other implementations.

The library size of XMCAPI/IP is not shown in the paper.

The latency is still a little too high and needs to be reduced.