Inter-Processor Communication for
Heterogeneous Dual Core Systems
2006/09/27
Chun-Ming Huang, Ph.D.
National Chip Implementation Center (CIC)
Agenda
▪ IPC Overview
▪ IPC Schemes
▪ Nokia DSP Gateway
▪ TI DSP/BIOS Link
▪ IPC Hardware Architecture
▪ Conclusions
C. M. Huang / SLDC-IPC / 09.2006
IPC Overview
What is IPC?
▪ Inter-Process Communication
▪ Inter-Processor Communication
[Figure: processes (P1, P2, P3) running on cores (C1 to C4) in four configurations: single-chip, multi-chip, single-core, and multi-core]
How to provide inter-process communication services for multi-core systems?
Independent & Cooperating Processes
▪ Processes executing concurrently in the multitasking environment may be either independent processes or cooperating processes
▪ A process is independent if it cannot affect or be affected by the other processes executing in the system; any process that does not share data with any other process is independent
▪ A process is cooperating if it can affect or be affected by the other processes executing in the system; any process that shares data with other processes is a cooperating process
Silberschatz, et al., Operating System Principles, Seventh Edition
Why Allow Process Cooperation?
▪ Information sharing
▪ Computation speedup
▪ Modularity
▪ Convenience
▪ Cooperating processes require an inter-process communication (IPC) mechanism that allows them to exchange data and information
Silberschatz, et al., Operating System Principles, Seventh Edition
IPC Example
▪ Unix pipe
– Command: ls -l / | grep 2005 | wc
– Output: 2 19 98
▪ The grep utility searches text files for a pattern and prints all lines that contain that pattern.
▪ The wc utility displays a count of lines, words and characters in a text file.
▪ Data exchange
▪ Synchronization
Operating System Kernel Components
▪ Process scheduler
– determines when and for how long a process executes on a processor
▪ Memory manager
– determines when and how memory is allocated to processes and what to do when memory becomes full
▪ I/O manager
– services input and output requests from and to hardware devices
▪ Inter-process communication (IPC) manager
– allows processes to communicate with one another
▪ File system manager
– organizes named collections of data on storage devices and provides an interface for accessing data on those devices
Deitel, et al., Operating Systems, Third Edition
Linux Kernel 2.6.17.11
▪ Top-level source directories
drwxr-xr-x arch
drwxr-xr-x block
drwxr-xr-x crypto
drwxr-xr-x drivers
drwxr-xr-x fs
drwxr-xr-x include
drwxr-xr-x init
drwxr-xr-x ipc
drwxr-xr-x kernel
drwxr-xr-x lib
drwxr-xr-x mm
drwxr-xr-x net
drwxr-xr-x scripts
drwxr-xr-x security
drwxr-xr-x sound
drwxr-xr-x usr
▪ Contents of the ipc directory
-rw-r--r-- Makefile
-rw-r--r-- compat.c
-rw-r--r-- compat_mq.c
-rw-r--r-- mqueue.c
-rw-r--r-- msg.c
-rw-r--r-- msgutil.c
-rw-r--r-- sem.c
-rw-r--r-- shm.c
-rw-r--r-- util.c
-rw-r--r-- util.h
http://www.kernel.org
Machine-Independent SW in the FreeBSD Kernel
Category                         Lines of Code   Percentage of Kernel (%)
headers                                 38,158    4.8
initialization                           1,663    0.2
kernel facilities                       53,805    6.7
generic interfaces                      22,191    2.8
interprocess communication              10,019    1.3
terminal handling                        5,798    0.7
virtual memory                          24,714    3.1
vnode memory                            22,764    2.9
local filesystem                        28,067    3.5
miscellaneous filesystems (19)          58,753    7.4
network filesystem                      22,436    2.8
network communication                   46,570    5.8
Internet V4 protocols                   41,220    5.2
Internet V6 protocols                   45,527    5.7
IPsec                                   17,956    2.2
netgraph                                74,338    9.3
cryptographic support                    7,515    0.9
GEOM layer                              11,563    1.4
CAM layer                               41,805    5.2
ATA layer                               14,192    1.8
ISA bus                                 10,984    1.4
PCI bus                                 72,366    9.1
pccard bus                               6,916    0.9
Linux compatibility                     10,474    1.3
Total Machine Independent              689,794   86.4
McKusick & Neville-Neil, The Design and Implementation of the FreeBSD Operating System
Homogeneous vs. Heterogeneous
[Figure: two example chips, labeled Sun and TI OMAP 5910]
Multiprocessor OS Organizations
▪ Can classify systems based on how processors share operating system responsibilities
▪ Three types
– Master/slave
– Separate kernels
– Symmetrical organization
Deitel, et al., Operating Systems, Third Edition
Master/Slave
▪ Master/Slave organization
– Master processor executes the operating system
– Slaves execute only user processes
– Hardware asymmetry
– Low fault tolerance
– Good for computationally intensive jobs
– Example: nCUBE system
Deitel, et al., Operating Systems, Third Edition
Separate Kernels
▪ Separate kernels organization
– Each processor executes its own operating system
– Some globally shared operating system data
– Loosely coupled
– Catastrophic failure unlikely, but failure of one processor results in termination of processes on that processor
– Little contention over resources
– Example: Tandem system
Deitel, et al., Operating Systems, Third Edition
Symmetrical Organization
▪ Symmetrical organization
– Operating system manages a pool of identical processors
– High amount of resource sharing
– Need for mutual exclusion
– Highest degree of fault tolerance of any organization
– Some contention for resources
– Example: BBN Butterfly
Deitel, et al., Operating Systems, Third Edition
Memory Access Architectures
▪ Memory access
– Can classify multiprocessors based on how processors share memory
– Goal: Fast memory access from all processors to all memory
• Contention in large systems makes this impractical
Deitel, et al., Operating Systems, Third Edition
Uniform Memory Access
▪ Uniform memory access (UMA) multiprocessor
– All processors share all memory
– Access to any memory page is nearly the same for all processors and all memory modules (disregarding cache hits)
– Typically uses shared bus or crossbar-switch matrix
– Also called symmetric multiprocessing (SMP)
– Small multiprocessors (typically two to eight processors)
[Figure: uniform memory access (UMA) multiprocessor]
Deitel, et al., Operating Systems, Third Edition
Non-Uniform Memory Access
▪ Non-uniform memory access (NUMA) multiprocessor
– Each node contains a few processors and a portion of system memory, which is local to that node
– Access to local memory faster than access to global memory (rest of memory)
– More scalable than UMA (fewer bus collisions)
[Figure: non-uniform memory access (NUMA) multiprocessor]
Deitel, et al., Operating Systems, Third Edition
Cache-Only Memory Architecture
▪ Cache-only memory architecture (COMA) multiprocessor
– Physically interconnected as a NUMA is
• Local memory vs. global memory
– Main memory is viewed as a cache and called an attraction memory (AM)
• Allows system to migrate data to node that most often accesses it at granularity of a memory line (more efficient than a memory page)
• Reduces the number of cache misses serviced remotely
• Overhead
– Duplicated data items
– Complex protocol to ensure all updates are received at all processors
[Figure: cache-only memory architecture (COMA) multiprocessor]
Deitel, et al., Operating Systems, Third Edition
No Remote Memory Access
▪ No-remote-memory-access (NORMA) multiprocessor
– Does not share physical memory
– Some implement the illusion of shared physical memory: shared virtual memory (SVM)
– Loosely coupled
– Communication through explicit messages
– Distributed systems
– Not networked system
[Figure: no-remote-memory-access (NORMA) multiprocessor]
Deitel, et al., Operating Systems, Third Edition
Four Possible Cases
                      Symmetrical OSs               Asymmetrical OSs
Homogeneous Cores     CPU_A(OS_X) + CPU_A(OS_X)     CPU_A(OS_X) + CPU_A(OS_Y)
Heterogeneous Cores   CPU_A(OS_X) + CPU_B(OS_X)     CPU_A(OS_X) + CPU_B(OS_Y)
IPC Schemes
Communication via Files
▪ Communication via files is in fact the oldest way of exchanging data between programs. Program A writes data to a file and Program B reads it. In a system in which only one program can be run at any given time, this does not present any problem.
▪ In a multitasking system, however, both programs could be run as processes at least quasi-parallel to each other. Race conditions then usually produce inconsistencies in the file data, which result from one program reading a data area before the other has finished modifying it, or from both processes modifying the same area of memory at the same time.
Communication via Files
▪ Locking entire files
– lock file
– fcntl( ) (POSIX), flock( ) (BSD 4.3)
▪ Locking file areas (record locking)
– Deadlock
[Figure: Process 1 and Process 2 each reading and writing file areas 1 and 2, illustrating a record-locking deadlock]
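A minimal sketch of record locking, assuming a POSIX system and Python's standard fcntl module. The helper name locked_increment and the 16-byte counter record are illustrative choices, not taken from the slides. The function holds an exclusive lock on one file area while it performs a read-modify-write, which is the kind of update that races when two processes run it unprotected.

```python
import fcntl
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed-size record holding a decimal counter


def locked_increment(path):
    """Increment the counter stored in the first record of `path` under a lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Exclusive lock on bytes [0, RECORD_SIZE); blocks until it is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, 0, os.SEEK_SET)
        try:
            raw = os.pread(fd, RECORD_SIZE, 0).strip()
            value = int(raw or b"0") + 1
            os.pwrite(fd, str(value).encode().ljust(RECORD_SIZE), 0)
            return value
        finally:
            # Release the record lock even if the update failed.
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, 0, os.SEEK_SET)
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "counter")
    print(locked_increment(demo_path), locked_increment(demo_path))
```

Deadlock becomes possible as soon as a process needs two such locks: if Process 1 locks area 1 then area 2 while Process 2 locks area 2 then area 1, each can end up waiting for the other. Acquiring locks in one agreed order is a common way to avoid this.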
Process Communication Models
▪ Message passing
▪ Shared memory
[Figure: two panels showing Process A, Process B and the kernel; in one, a message M passes through the kernel, and in the other, both processes access a shared region]
Silberschatz, et al., Operating System Principles, Seventh Edition
IPC for Linux
▪ Linux IPC
– Many IPC mechanisms derived from traditional UNIX IPC
• Allow processes to exchange information
– Some are better suited for particular applications
• For example, those that communicate over a network or exchange short messages with other local applications
Deitel, et al., Operating Systems, Third Edition
IPC for Linux
▪ Signal
▪ Pipe
▪ Message queue
▪ Shared memory
▪ System V Semaphores
▪ Sockets
Signals
▪ Signals
– One of the first interprocess communication mechanisms available in UNIX systems
– Kernel uses them to notify processes when certain events occur
– Do not allow processes to specify more than a word of data to exchange with other processes
– Created by the kernel in response to interrupts and exceptions, and sent to a process or thread
• as a result of executing an instruction (such as a segmentation fault)
• from another process (such as when one process terminates another)
• from an asynchronous event
Deitel, et al., Operating Systems, Third Edition
POSIX Signals
Deitel, et al., Operating Systems, Third Edition
Signals
▪ A process/thread can handle a signal in one of three ways:
1. Ignore the signal: processes can ignore all but the SIGSTOP and SIGKILL signals.
2. Catch the signal: when a process catches a signal, it invokes its signal handler to respond to the signal.
3. Execute the default action that the kernel defines for that signal.
▪ Default actions
– Abort: terminate immediately
– Memory dump: copies execution context before exiting
– Ignore
– Stop (i.e., suspend)
– Continue (i.e., resume)
Deitel, et al., Operating Systems, Third Edition
Signals
▪ Signal blocking
– A process or thread can block a signal
• Signal is not delivered until process/thread stops blocking it
– While a signal handler is running, signals of that type are blocked by default
• Still possible to receive signals of a different type
– Common signals are not queued
• Real-time signals provide signal queuing
Deitel, et al., Operating Systems, Third Edition
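Catching and blocking a signal can be sketched with Python's standard signal module. The sketch assumes a POSIX system and a single-threaded program running in the main thread. The function name blocked_signal_demo is illustrative. It installs a handler for SIGUSR1, blocks the signal, sends it twice to its own process, and then unblocks it.

```python
import os
import signal
import time


def blocked_signal_demo():
    """Return (handler calls while blocked, pending while blocked, calls after unblock)."""
    received = []

    def on_usr1(signum, frame):
        # "Catch the signal": the handler only records that it ran.
        received.append(signum)

    previous = signal.signal(signal.SIGUSR1, on_usr1)
    try:
        # Block SIGUSR1 so that it stays pending instead of being delivered.
        signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
        os.kill(os.getpid(), signal.SIGUSR1)
        os.kill(os.getpid(), signal.SIGUSR1)  # second instance of a standard signal
        pending = signal.SIGUSR1 in signal.sigpending()
        seen_while_blocked = len(received)

        # Unblocking lets the pending signal through to the handler.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
        deadline = time.monotonic() + 1.0
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
        return seen_while_blocked, pending, len(received)
    finally:
        signal.signal(signal.SIGUSR1, previous)  # restore the earlier disposition


if __name__ == "__main__":
    print(blocked_signal_demo())
```

Because standard signals are not queued, the two sends made while the signal is blocked collapse into a single pending signal, so the handler is expected to run once after unblocking.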
Pipes
▪ Pipes
– Producer process writes data to the pipe, after which the consumer process reads data from the pipe in first-in-first-out order
– When pipe is created, an inode that points to pipe buffer (page of data) is created
– Access to pipes is controlled by file descriptors
• Can be passed between related processes (e.g., parent and child)
– Named pipes (FIFOs)
• Can be accessed via the directory tree
– Limitation: Fixed-size buffer
Deitel, et al., Operating Systems, Third Edition
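The producer/consumer pattern on an anonymous pipe can be sketched as follows, assuming a Unix-like system where os.fork is available. The function name pipe_roundtrip is illustrative. The child inherits the pipe's file descriptors from the parent, writes lines into the pipe, and the parent reads them back in first-in-first-out order.

```python
import os


def pipe_roundtrip(lines):
    """Send `lines` from a child (producer) to the parent (consumer) over a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: producer. It inherited both descriptors and keeps only the write end.
        status = 1
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as writer:
                for line in lines:
                    writer.write(line + "\n")
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup code
    # Parent: consumer. Closing the unused write end lets EOF arrive
    # once the child has closed its copy.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        received = [line.rstrip("\n") for line in reader]
    os.waitpid(pid, 0)
    return received


if __name__ == "__main__":
    print(pipe_roundtrip(["first", "second", "third"]))
```

If the producer writes faster than the consumer reads, the fixed-size pipe buffer fills up and the write call blocks until space is available, which is how the pipe also provides synchronization.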
Message Queues
▪ Message queues
– Allow processes to transmit information that is composed of a message type and a variable-length data area
• Stored in message queues, remain until a process is ready to receive them
• Related processes can search for a message queue identifier in a global array of message queue descriptors
– Message queue descriptor contains
» Queue of pending messages
» Queue of processes waiting for messages
» Queue of processes waiting to send messages
» Data describing the size and contents of the message queue
Deitel, et al., Operating Systems, Third Edition
Shared Memory
▪ Shared memory [protection schemes]
– Advantages
• Improves performance for processes that frequently access shared data
• Processes can share as much data as they can address
– Standard interfaces
• System V shared memory
• POSIX shared memory
– Does not allow processes to change privileges for a segment of shared memory
Deitel, et al., Operating Systems, Third Edition
System V Shared Memory System Calls
Deitel, et al., Operating Systems, Third Edition
Shared Memory
▪ Shared memory implementation
– Treats region of shared memory as a file
– Shared memory page frames are freed when file is deleted
– Tmpfs (temporary file system) stores such files
• Tmpfs pages are swappable
• Permissions can be set
• File system does not require formatting
Deitel, et al., Operating Systems, Third Edition
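A small sketch of named shared memory, assuming Python 3.8 or later and its standard multiprocessing.shared_memory module. The function name share_bytes is illustrative. For brevity both handles live in one process; in practice the second handle would be opened by a different process that knows the segment name.

```python
from multiprocessing import shared_memory


def share_bytes(payload):
    """Write `payload` into a named segment and read it back via a second handle."""
    if not payload:
        raise ValueError("payload must not be empty")
    segment = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        segment.buf[:len(payload)] = payload
        # A cooperating process would attach using only the segment name.
        peer = shared_memory.SharedMemory(name=segment.name)
        try:
            # The segment may be rounded up to a page, so slice to the payload size.
            return bytes(peer.buf[:len(payload)])
        finally:
            peer.close()
    finally:
        segment.close()
        segment.unlink()  # deleting the name lets the system free the pages


if __name__ == "__main__":
    print(share_bytes(b"hello from shared memory"))
```

The unlink step mirrors the point above that the page frames are freed when the backing file is deleted. The module itself provides no synchronization, so concurrent writers still need a lock or semaphore.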
System V Semaphores
▪ System V semaphores
– Designed for user processes to access via the system call interface
▪ Semaphore arrays
– Protect a group of related resources
– Before a process can access resources protected by a semaphore array, the kernel requires that there be sufficient available resources to satisfy the process’s request
– Otherwise, kernel blocks requesting process until resources become available
▪ Preventing deadlock
– When a process exits, the kernel reverses all the semaphore operations it performed to allocate its resources
Deitel, et al., Operating Systems, Third Edition
Sockets
▪ Sockets
– Allow pairs of processes to exchange data by establishing direct bidirectional communication channels
– Primarily used for bidirectional communication between multiple processes on different systems, but can be used for processes on the same system
– Stored internally as files
– File name used as socket’s address, accessed via the VFS
Deitel, et al., Operating Systems, Third Edition
Sockets
▪ Stream sockets
– Implement the traditional client/server model
– Data is transferred as a stream of bytes
– Use TCP to communicate, so they are more appropriate for reliable communication
▪ Datagram sockets
– Faster, but less reliable communication
– Data is transferred using datagram packets
▪ Socketpairs
– Pair of connected, unnamed sockets
– Limited to use by processes that share file descriptors
Deitel, et al., Operating Systems, Third Edition
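A socketpair can be sketched with Python's standard socket module. The function name socketpair_echo is illustrative, and both ends are used from a single process for brevity. In a real program the pair is usually created before a fork so that parent and child each keep one end.

```python
import socket


def socketpair_echo(message):
    """Send `message` one way over a socketpair and an upper-cased reply back."""
    left, right = socket.socketpair()  # two connected, unnamed sockets
    try:
        left.sendall(message)
        # A short message arrives in one piece here; a general stream reader
        # would loop until it has all the bytes it expects.
        request = right.recv(len(message))
        right.sendall(request.upper())  # the channel is bidirectional
        return left.recv(len(message))
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(socketpair_echo(b"ping"))
```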
sf01a:cmhuang[/] ipcs
IPC status from <running system> as of Thu Sep 21 14:35:30 CST 2006
T   ID        KEY          MODE          OWNER     GROUP
Message Queues:
Shared Memory:
m   1         0x50000d1d   --rw-r--r--   root      root
m   2         0xabbaca01   --rw-rw-rw-   pc62      TR
m   3103      0            --rw-rw-rw-   cmhuang   DSD
m   1404      0            --rw-rw-rw-   root      root
Semaphores:
s   0         0x1          --ra-ra-ra-   root      root
s   2031617   0            --ra-ra-ra-   cmhuang   DSD
s   917506    0            --ra-ra-ra-   cmhuang   DSD
IPC for WinXP
▪ Data oriented
– Pipes
– Mailslots (message queues)
– Shared memory
▪ Procedure oriented / object oriented
– Remote procedure calls
– Microsoft COM objects
– Clipboard
– GUI drag-and-drop capability
Deitel, et al., Operating Systems, Third Edition
Pipes
▪ Manipulated with file system calls
– Read
– Write
– Open
▪ Pipe server
– Process that creates pipe
▪ Pipe clients
– Processes that connect to pipe
▪ Modes
– Read: pipe server receives data from pipe clients
– Write: pipe server sends data to pipe clients
– Duplex: pipe server sends and receives data
Deitel, et al., Operating Systems, Third Edition
Pipes
▪ Anonymous Pipes
– Unidirectional
– Between local processes
– Synchronous
– Pipe handles, usually passed through inheritance
▪ Named Pipes
– Unidirectional or bidirectional
– Between local or remote processes
– Synchronous or asynchronous
– Opened by name
– Byte stream vs. message stream
– Default mode vs. write-through mode
Deitel, et al., Operating Systems, Third Edition
Mailslots
▪ Mailslot server: creates mailslot
▪ Mailslot clients: send messages to mailslot
▪ Communication
– Unidirectional
– No acknowledgement of receipt
– Local or remote communication
– Implemented as files
– Two modes
• Datagram: for small messages
• Server Message Block (SMB): for large messages
Deitel, et al., Operating Systems, Third Edition
Shared Memory
▪ File mapping
– Processes map their virtual memory to same page frames in physical memory
– Multiple processes access same file
– No synchronization guaranteed
▪ File mapping object
– Maps file to main memory
▪ File view
– Maps a process’s virtual memory to main memory mapped by file mapping object
Deitel, et al., Operating Systems, Third Edition
Nokia DSP Gateway
Nokia DSP Gateway Overview
▪ Supports TI OMAP1510, 1610, 5910, 5912, 2410, and 2412.
▪ GPP side
– Linux kernel 2.6.6
– Linux device driver
– Access DSP through normal system calls such as read() and write()
▪ DSP side
– TI DSP/BIOS
– DSP kernel library (tokliBIOS) and API
http://dspgateway.sourceforge.net/pub/index.php
Nokia DSP Gateway Overview
▪ Current version: 3.3.1 (2006-09-13)
▪ Open source software
▪ Current license state:
Release   License
1.0       GPL
2.X       GPL
3.X       ARM pack: GPL; DSP pack: BSD
TI OMAP 1610
Summary of changes from v2.6.5 to v2.6.6
============================================
[ARM PATCH] 1777/1: Add TI OMAP support to ARM core files
Patch from Tony Lindgren
This patch updates the ARM Linux core files to add support for Texas Instruments
OMAP-1510, 1610, and 730 processors.
OMAP is an embedded ARM processor with integrated DSP.
OMAP-1610 has hardware support for USB OTG, which might be of interest to Linux
developers. OMAP-1610 could be easily be used as development platform to add USB
OTG support to Linux.
This patch is an updated version of an earlier patch 1767/1 with the dummy
Kconfig added for OMAP as suggested by Russell King here:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=1767/1
This patch is brought to you by various linux-omap developers.
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.6
TI DSP/BIOS
▪ Scalable real-time kernel
▪ Real-time scheduling and synchronization
▪ Host-to-target communication
▪ Real-time instrumentation
▪ Preemptive multi-threading
▪ Hardware abstraction
▪ Real-time analysis and configuration tools
▪ Application programs use DSP/BIOS by making calls to the API
▪ All DSP/BIOS modules provide C-callable interfaces
DSP Gateway System Architecture
Mailbox in OMAP1
▪ Each set of mailbox registers consists of two 16-bit registers and a 1-bit flag register.
▪ The interrupting processor can use one 16-bit register to pass a data word to the interrupted processor and the other 16-bit register to pass a command word.
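The register set described above can be pictured with a small software model. This is only an illustration, not hardware access code: the class name Omap1Mailbox and its methods are hypothetical, and the handshake (flag set on send, cleared on receive, sender refused while the flag is set) is an assumption made for the sketch rather than something stated on the slide.

```python
class Omap1Mailbox:
    """Toy model of one mailbox set: two 16-bit registers plus a 1-bit flag."""

    def __init__(self):
        self.command = 0  # 16-bit command word
        self.data = 0     # 16-bit data word
        self.flag = 0     # assumed meaning: 1 while a message is waiting

    def send(self, command, data):
        """Interrupting processor side: post a command/data pair."""
        if self.flag:
            return False  # previous message not yet consumed (assumed behaviour)
        self.data = data & 0xFFFF        # registers hold 16 bits each
        self.command = command & 0xFFFF
        self.flag = 1                    # stands in for raising the interrupt
        return True

    def receive(self):
        """Interrupted processor side: fetch the pair and clear the flag."""
        if not self.flag:
            return None
        message = (self.command, self.data)
        self.flag = 0
        return message


if __name__ == "__main__":
    mailbox = Omap1Mailbox()
    mailbox.send(0x0010, 0x00FF)
    print(mailbox.receive())
```

With only 32 bits of payload per message, the mailbox suits commands and small parameters; larger transfers need a buffer in memory that both processors can reach.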
Mailbox in OMAP2
▪ 6 sets of mailbox registers; each message register can carry 32 bits of data
▪ Two mailbox queues are reserved: MAILBOX_0 for the ARM-to-DSP direction and MAILBOX_1 for the DSP-to-ARM direction
Mailbox Command and Data Register
▪ Command register bit definitions
▪ Data register bit definitions
Mailbox Command Definition
Mailbox Command Sequence
▪ Configuration sequence
– System configuration
– Task configuration
– Task add/delete
▪ Data transfer sequence
– ARM to DSP transfer
– DSP to ARM transfer
– Task control
– Read/write DSP register
– Read/write DSP system parameters
System Configuration Sequence
DSPCFG Command
ARM to DSP Passive Word Receiving
ARM to DSP Active Word Receiving
ARM to DSP Passive Block Receiving
IPC Buffer
▪ It is unrealistic to transfer a large amount of data between two processors with only mailbox registers. Therefore, the IPBUF (Inter-Processor Buffer) is introduced for large block data transfers.
▪ There are three types of IPBUFs:
– Global IPBUF
– Private IPBUF
– System IPBUF
Global IPBUF
▪ The Global IPBUFs are defined for block data transfer between ARM and DSP.
▪ The Global IPBUF lines are identified by a BID (Buffer ID), and all tasks can share them.
▪ The maximum line size is 64k words (128k bytes).
DSP Gateway Linux Device Interfaces
DSP Gateway Linux APIs
Passive Receiving Task
Active Receiving Task
TI DSP/BIOS Link
TI DSP/BIOS Link
▪ For TI OMAP5910/5912, DaVinci, and DM642 devices.
▪ DSP/BIOS Link is a no-charge, royalty-free product and is provided in C source code form.
▪ Current version: 1.30.06 (Nov. 22, 2005)
▪ Portable across different operating systems.
▪ OS (GPP) + DSP/BIOS (DSP)
http://focus.ti.com/dsp/docs/dspsupportatn.tsp?sectionId=3&tabId=477&familyId=44&toolTypeId=5
DSP/BIOS Link Supported Platforms
▪ DaVinci running Montavista Linux Pro 4.0 or PrKernel v4.1 on ARM
▪ OMAP5912 running Montavista Linux Pro 3.1 on ARM
▪ DA300 running PrKernel v4.1 on ARM
▪ DM642 connected to a PC running Red Hat Linux 9.0 or Red Hat Enterprise Linux 4.0
Software Architecture of DSP/BIOS Link
On the GPP Side
▪ The OS ADAPTATION LAYER encapsulates the generic OS services that are required by the other components of DSP/BIOS LINK. This component exports a generic API that insulates the other components from the specifics of an OS. All other components use this API instead of direct OS calls. This makes DSP/BIOS LINK portable across different operating systems.
▪ The LINK DRIVER encapsulates the low-level control operations on the physical link between the GPP and DSP. This module is responsible for controlling the execution of the DSP and for data transfer using a defined protocol across the GPP-DSP boundary.
On the GPP Side
▪ The PROCESSOR MANAGER maintains book-keeping information for all components. It also allows different boot-loaders to be plugged into the system. It exposes the control operations provided by the LINK DRIVER to the user through the API layer.
▪ The DSP/BIOS LINK API is the interface for all clients on the GPP side. This is a very thin component and usually doesn’t do any more processing than parameter validation. The API layer can be considered as ‘skin’ on the ‘muscle’ mass contained in the PROCESSOR MANAGER and LINK DRIVER.
On the DSP Side
▪ The LINK DRIVER is one of the drivers in DSP/BIOS. This driver specializes in communicating with the GPP over the physical link.
▪ There is no specific DSP/BIOS LINK API on the DSP. The communication (data/message transfer) is done using the DSP/BIOS modules SIO, GIO, and MSGQ.
DSP/BIOS Link Key Components
▪ PROC
– This component represents the DSP processor in the application space.
– This component provides services to:
• Initialize the DSP & make it available for access from the GPP.
• Load code on the DSP.
• Start execution from the run address specified in the executable.
• Read from or write to DSP memory.
• Stop execution.
• Additional platform-specific control actions.
– In the current version, only one processor is supported. However, the APIs are designed to support multiple DSPs and hence they accept a processorID argument to support this future enhancement.
DSP/BIOS Link Key Components
▪ CHNL
– This component represents a logical data transfer channel in the application space.
– CHNL is responsible for the data transfer across the GPP and DSP.
– CHNL is an abbreviation of ‘channel’.
– A channel (when referred to in the context of DSP/BIOS LINK) is:
• A means of transferring data across GPP and DSP.
• A logical entity mapped over a physical connectivity between the GPP and DSP.
• Uniquely identified by a number within the range of channels for a specific physical link towards a DSP.
• Unidirectional. The direction of a channel is decided at run time based on the attributes passed to the corresponding API.
DSP/BIOS Link Key Components
▪ MSGQ
– This component represents queue-based messaging.
– This component is responsible for exchanging short messages of variable length between the GPP and DSP clients. It is based on the MSGQ module in DSP/BIOS.
– The messages are sent and received through message queues.
– A reader gets the message from the queue and a writer puts the message on a queue. A message queue can have only one reader and many writers. A task may read from and write to multiple message queues.
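The one-reader, many-writers rule can be illustrated with an analogy built on Python's standard queue and threading modules. This is not the DSP/BIOS Link MSGQ API; the function name run_writers and the (writer id, sequence number) message layout are hypothetical.

```python
import queue
import threading


def run_writers(num_writers, messages_per_writer):
    """Have several writers post to one queue and drain it with a single reader."""
    message_queue = queue.Queue()

    def writer(writer_id):
        for sequence in range(messages_per_writer):
            message_queue.put((writer_id, sequence))  # any writer may put

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(num_writers)]
    for thread in threads:
        thread.start()

    # The calling thread is the queue's only reader.
    total = num_writers * messages_per_writer
    received = [message_queue.get() for _ in range(total)]

    for thread in threads:
        thread.join()
    return received


if __name__ == "__main__":
    for writer_id, sequence in run_writers(3, 2):
        print(writer_id, sequence)
```

Messages from different writers may interleave in any order, but each writer's own messages reach the reader in the order they were put.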
DSP/BIOS Link Key Components
▪ POOL
– This component provides APIs to open and close memory pools, which are used by the CHNL and MSGQ components for allocating the buffers used in data transfer and messaging respectively.
– This component is responsible for providing a uniform view of different memory pool implementations, which may be specific to the hardware architecture or OS on which DSP/BIOS LINK is ported. This component is based on the POOL interface in DSP/BIOS.
Initialization Phase API
▪ PROC
– PROC_Setup()
– PROC_Attach()
– PROC_Load()
▪ CHNL
– CHNL_Create()
– CHNL_AllocateBuffer()
▪ MSGQ
– MSGQ_TransportOpen()
– MSGQ_Open()
– MSGQ_SetErrorHandler()
– MSGQ_Locate()
▪ POOL
– POOL_Open()
Execution Phase API
▪ PROC
– PROC_Start()
– PROC_Read()
– PROC_Write()
– PROC_Stop()
▪ CHNL
– CHNL_Issue()
– CHNL_Reclaim()
▪ MSGQ
– MSGQ_Alloc()
– MSGQ_Put()
– MSGQ_Get()
– MSGQ_GetSrcQueue()
– MSGQ_Free()
Finalization Phase API
▪ PROC
– PROC_Detach()
– PROC_Destroy()
▪ CHNL
– CHNL_FreeBuffer()
– CHNL_Delete()
▪ MSGQ
– MSGQ_Release()
– MSGQ_TransportClose()
– MSGQ_Close()
▪ POOL
– POOL_Close()
IPC Hardware Architecture
Tightly Coupled vs. Loosely Coupled Systems
▪ Tightly coupled systems
– Processors share most resources including memory
– Communicate over shared buses using shared physical memory
▪ Loosely coupled systems
– Processors do not share most resources
– Most communication through explicit messages or shared virtual memory (although not shared physical memory)
▪ Comparison
– Loosely coupled systems: more flexible, fault tolerant, scalable
– Tightly coupled systems: more efficient, less burden to operating system programmers
Deitel, et al., Operating Systems, Third Edition
Tightly Coupled Systems
Deitel, et al., Operating Systems, Third Edition
Loosely Coupled Systems
Deitel, et al., Operating Systems, Third Edition
Processor Interconnection Schemes
▪ Interconnection scheme
– Describes how the system’s components, such as processors and memory modules, are connected
– Consists of nodes (components or switches) and links (connections)
– Parameters used to evaluate interconnection schemes
• Node degree
• Bisection width
• Network diameter
• Cost of the interconnection scheme
Deitel, et al., Operating Systems, Third Edition
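The first three evaluation parameters can be worked through for a concrete topology. The sketch below builds an n-dimensional hypercube and measures it directly; all function names are illustrative. It treats node degree as the number of links per node, network diameter as the longest shortest path between any two nodes, and counts the links cut by one balanced split. For a hypercube, that split on the highest address bit is known to give the bisection width.

```python
from collections import deque


def hypercube(dimensions):
    """Adjacency lists: nodes are integers whose neighbours differ in exactly one bit."""
    return {
        node: [node ^ (1 << bit) for bit in range(dimensions)]
        for node in range(1 << dimensions)
    }


def node_degree(graph):
    """Largest number of links attached to any single node."""
    return max(len(neighbours) for neighbours in graph.values())


def diameter(graph):
    """Longest shortest path between any two nodes, found by BFS from every node."""
    def eccentricity(source):
        distance = {source: 0}
        frontier = deque([source])
        while frontier:
            current = frontier.popleft()
            for neighbour in graph[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    frontier.append(neighbour)
        return max(distance.values())

    return max(eccentricity(node) for node in graph)


def links_cut_by_top_bit(dimensions):
    """Links crossing the split of the hypercube on its highest address bit."""
    top = 1 << (dimensions - 1)
    graph = hypercube(dimensions)
    return sum(
        1
        for node, neighbours in graph.items()
        for neighbour in neighbours
        if not node & top and neighbour & top
    )


if __name__ == "__main__":
    for n in (3, 4):
        cube = hypercube(n)
        print(n, node_degree(cube), diameter(cube), links_cut_by_top_bit(n))
```

For a hypercube the degree and diameter both grow with the dimension n, while the number of nodes grows as 2^n, which is why it trades a higher per-node link cost for short paths.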
Processor Interconnection Schemes
[Figure: Shared bus multiprocessor organization.]
[Figure: Crossbar-switch matrix multiprocessor organization.]
[Figure: 4-connected 2-D mesh network.]
[Figure: 3- and 4-dimensional hypercubes.]
[Figure: Multistage baseline network.]
Deitel, et al., Operating Systems, Third Edition
A Simple IPC Architecture
▪ ARM writes command in shared memory
▪ ARM interrupts DSP
▪ DSP responds to interrupt and reads command in shared memory
▪ DSP executes a task based on the command
▪ DSP interrupts ARM upon completion of the task
TMS320DM644x DMSoC ARM Subsystem Reference Guide (SPRUE14)
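The five steps above can be mimicked in software, assuming two threads stand in for the ARM and the DSP, a dictionary stands in for the shared memory region, and threading.Event objects stand in for the two interrupt lines. All names, and the "square" command, are hypothetical.

```python
import threading


def run_exchange(command, operand):
    """Play the ARM side of the exchange and return the result left by the 'DSP'."""
    shared_memory = {}                 # stand-in for the shared memory region
    irq_to_dsp = threading.Event()     # stand-in for the ARM-to-DSP interrupt
    irq_to_arm = threading.Event()     # stand-in for the DSP-to-ARM interrupt

    def dsp():
        irq_to_dsp.wait()                                      # respond to interrupt
        cmd, arg = shared_memory["command"], shared_memory["operand"]
        if cmd == "square":                                    # execute the task
            shared_memory["result"] = arg * arg
        else:
            shared_memory["result"] = None                     # unknown command
        irq_to_arm.set()                                       # signal completion

    dsp_thread = threading.Thread(target=dsp)
    dsp_thread.start()

    shared_memory["command"] = command   # ARM writes the command
    shared_memory["operand"] = operand
    irq_to_dsp.set()                     # ARM interrupts the DSP
    irq_to_arm.wait()                    # ARM waits for the completion interrupt
    dsp_thread.join()
    return shared_memory["result"]


if __name__ == "__main__":
    print(run_exchange("square", 7))
```

The interrupt does double duty here: it notifies the other side and it orders the accesses, since the ARM finishes writing before it raises the interrupt and reads the result only after the completion interrupt.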
TI OMAP5910
OMAP5910 IPC Architecture
▪ Mailbox registers
– Each direction: 32-bit × 2
– Interrupt occurrence
▪ MPU interface (MPUI)
– MPU accesses DSP memory space directly
▪ Shared memory
– Arrangement with the Traffic Controller
– 3 types of memory
– Best suited to sharing large amounts of data
Traffic Controller (TC)
▪ The IMIF allows access to the 192K bytes of on-chip SRAM.
▪ The EMIFS interface provides 16-bit-wide access to asynchronous or synchronous memories.
▪ The EMIFF interface provides 16-bit-wide access to standard SDRAM memories.
▪ The TC provides the functions of
– arbitrating contending accesses to the same memory interface from different initiators (MPU, DSP, System DMA, Local Bus),
– synchronizing accesses, since the initiators and the memory interfaces run at different clock rates,
– and buffering data, which allows burst access for more efficient multiplexing of transfers from multiple initiators to the memory interfaces.
▪ The TC’s architecture allows simultaneous transfers between initiators and different memory interfaces without penalty. For instance, if the MPU is accessing the EMIFF at the same time as the DSP is accessing the IMIF, the transfers may occur simultaneously since there is no contention for resources.
ARM IPCM Module
▪ The IPCM provides up to 32 mailboxes with control logic and interrupt generation to support inter-processor communication.
▪ An AHB interface enables access from source and destination cores.
▪ The IPCM:
– sends interrupts to other cores
– passes small amounts of data to other cores.
▪ A source core can have multiple mailboxes and send messages in parallel (multitasking).
PrimeCell Inter-Processor Communications Module Technical Reference Manual
IPCM Components
▪ 1-32 programmable mailboxes, each comprising:
– a single 1-32-bit Mailbox Source Register
– a single 1-32-bit Mailbox Destination Register
– a single 2-bit Mailbox Mode Register
– a single 1-32-bit Mailbox Mask Register
– a single 2-bit Mailbox Send Register
– 0-7 32-bit data registers to store the message.
▪ 1-32 sets of read-only interrupt status registers, one for each interrupt, each comprising:
– 1-32-bit Raw Interrupt Status Register (one bit per mailbox)
– 1-32-bit Masked Interrupt Status Register (one bit per mailbox).
▪ A 32-bit Configuration Status Register
IPCM Functional Block
PrimeCell Inter-Processor Communications Module Technical Reference Manual
IPCM Example
▪ Core0 has a message to send to Core1. Core0 claims the mailbox by setting bit 0 in the Mailbox Source Register. Core0 then sets bit 1 in the Mailbox Destination Register, enables the interrupts and programs the message into the Mailbox Data Registers. Finally, Core0 sends the message by writing 01 to the Mailbox Send Register. This asserts the interrupt to Core1.
▪ When Core1 is interrupted, it reads the Masked Interrupt Status Register for IPCMINT[1] to determine which mailbox contains the message. Core1 reads the message in that mailbox, then clears the interrupt and asserts the acknowledge interrupt by writing 10 to the Mailbox Send Register.
▪ Core0 is interrupted with the acknowledge message, completing the operation. Core0 then decides whether to retain the mailbox to send another message or release the mailbox, freeing it up for other cores in the system to use it.
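The send/acknowledge exchange above can be traced with a simplified software model. The classes IpcmMailbox and Ipcm, the function run_ipcm_example, and the interrupt rule in masked_status are assumptions made for illustration; they follow only the steps listed on the slide and do not reproduce the full register behaviour documented in the Technical Reference Manual. The model also assumes that "enables the interrupts" means setting the mask bits for both cores.

```python
class IpcmMailbox:
    """Registers of one mailbox, reduced to plain integers."""

    def __init__(self, data_registers=7):
        self.source = 0          # one bit per core: which core owns the mailbox
        self.destination = 0     # one bit per core: which cores receive the message
        self.mask = 0            # one bit per core: interrupt enabled for that core
        self.send = 0b00         # 01 = message sent, 10 = acknowledge sent
        self.data = [0] * data_registers


class Ipcm:
    def __init__(self, mailboxes=1):
        self.mailboxes = [IpcmMailbox() for _ in range(mailboxes)]

    def masked_status(self, core):
        """Bit i is set if mailbox i is interrupting `core` (simplified rule)."""
        core_bit = 1 << core
        status = 0
        for index, box in enumerate(self.mailboxes):
            to_destination = box.send == 0b01 and box.destination & core_bit
            to_source = box.send == 0b10 and box.source & core_bit
            if (to_destination or to_source) and box.mask & core_bit:
                status |= 1 << index
        return status


def run_ipcm_example(word):
    ipcm = Ipcm()
    box = ipcm.mailboxes[0]
    trace = {}

    # Core0: claim the mailbox, address Core1, enable interrupts, load the data, send.
    box.source = 1 << 0
    box.destination = 1 << 1
    box.mask = 0b11              # assumption: interrupts enabled for Core0 and Core1
    box.data[0] = word & 0xFFFFFFFF
    box.send = 0b01
    trace["core1_irq_after_send"] = ipcm.masked_status(1)
    trace["core0_irq_after_send"] = ipcm.masked_status(0)

    # Core1: find the mailbox, read the message, then acknowledge.
    trace["message"] = box.data[0]
    box.send = 0b10
    trace["core0_irq_after_ack"] = ipcm.masked_status(0)
    trace["core1_irq_after_ack"] = ipcm.masked_status(1)

    # Core0: clear the acknowledge and release the mailbox for other cores.
    box.send = 0b00
    box.source = 0
    trace["core0_irq_after_release"] = ipcm.masked_status(0)
    return trace


if __name__ == "__main__":
    for step, value in run_ipcm_example(0xCAFE).items():
        print(step, hex(value))
```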
Conclusions
Conclusions
▪ IPC schemes for supporting many cores
▪ Performance and power consumption analysis for different IPC schemes
▪ IPC API schemes
Thanks for Your Attention!