Networking for Embedded Systems
HW/SW Co-design
Prof. Jun-Dong Cho, Sungkyunkwan University
Contents
1. The concept of SDR
2. Hardware and software partitioning
3. MP-SoC
4. HW/SW reconfigurable multimedia platforms
H/W and S/W Co-design
Codesign Definition and Key Concepts
Codesign
Exploiting the trade-offs between hardware and software in a system through their concurrent design
Key concepts
Concurrent: hardware and software developed at the same time on parallel paths
Integrated: interaction between hardware and software developments to produce designs that meet performance criteria and functional specifications
Motivations for Codesign
Instruction Set Processors (ISPs) available as cores in many design kits (386s, DSPs, microcontrollers, etc.)
Systems on silicon: many transistors available in typical processes (> 10 million transistors available in IBM ASIC process, etc.)
Increasing capacity of field-programmable devices; some devices can even be reprogrammed on the fly (FPGAs, CPLDs, etc.)
Efficient C compilers for embedded processors
Hardware synthesis capabilities
SOC Co-Design Challenges
Current systems are complex and heterogeneous: they contain many different types of components.
Half of the chip can be filled with 200 low-power, RISC-like processors (ASIPs) interconnected by field-programmable buses and embedded in 20 Mbytes of distributed DRAM and flash memory; the other half is ASIC.
Computational power will come not from multi-GHz clocking but from parallelism, at clock rates below 200 MHz. This will greatly simplify design for correct timing, testability, and signal integrity.
A general codesign framework
Hardware and software partitioning
Hardware design
Software design
Interface design
Quantitative evaluation
Evaluate next structure
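The partition, design, evaluate, and iterate loop above can be sketched as an exhaustive search over HW/SW assignments. This is a minimal sketch under made-up assumptions: each task has hypothetical software and hardware latencies plus a hardware area cost, tasks run one after another, and "quantitative evaluation" is simply total latency checked against a deadline, whereas real flows evaluate candidates by cosimulation.

```python
from itertools import product

# Hypothetical task profile: name -> (sw_latency_us, hw_latency_us, hw_area_units)
TASKS = {
    "filter": (400, 50, 30),
    "fft": (900, 120, 80),
    "control": (60, 40, 20),
}

def evaluate(assignment):
    """Return (latency, area) for one HW/SW assignment; tasks run sequentially."""
    latency = area = 0
    for name, target in assignment.items():
        sw, hw, cost = TASKS[name]
        if target == "HW":
            latency += hw
            area += cost
        else:
            latency += sw
    return latency, area

def explore(deadline_us):
    """Try every partition; keep the smallest-area one that meets the deadline."""
    best = None
    names = list(TASKS)
    for targets in product(("SW", "HW"), repeat=len(names)):
        assignment = dict(zip(names, targets))
        latency, area = evaluate(assignment)
        if latency <= deadline_us and (best is None or area < best[2]):
            best = (assignment, latency, area)
    return best  # None if no partition meets the deadline

if __name__ == "__main__":
    print(explore(deadline_us=600))
```

Exhaustive enumeration grows as 2^n in the number of tasks, so it only works for a handful of blocks; practical partitioners replace the loop with heuristics while keeping the same evaluate-and-iterate structure.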
Mixing H/W and S/W
Argument: mixed hardware/software systems represent the best of both worlds: high performance, flexibility, design reuse, etc.
Counterpoint: from a design standpoint, they are the worst of both worlds
Simulation: verification and test become harder
Interface: too many tools, too many interactions, too much heterogeneity
Hardware/software partitioning is “AI-complete”!
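HW/SW Co-design Space
[Figure: HW/SW co-design space. Y-axis: flexibility (number of products the design can be applied to), 1 to 10,000; x-axis: power efficiency (MIPS/W), 10^1 to 10^5. Software moves toward increasing flexibility, hardware toward increasing power efficiency; the co-design space lies between them.]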
Design space exploration
[Figure: design space exploration flow (source: Ernst, IEEE Design & Test of Computers). Blocks: customer/marketing; system architect; system analysis; cospecification; high-level transformation; process transformation with reused functions and processes; HW/SW partitioning and scheduling; HW architecture and components with reused HW & SW components; HW synthesis; SW synthesis; evaluation (cosimulation).]
Synthesis tasks
Scheduling: make sure that data is available when it is needed.
Allocation: make sure that processes don’t compete for the PE.
Partitioning: break operations into separate processes to increase parallelism; put serial operations in one process to reduce communication.
Mapping: take PE and communication link characteristics into account.
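The scheduling task ("data available when it is needed") can be illustrated with an as-soon-as-possible (ASAP) schedule over a task graph. A minimal sketch, assuming unlimited processing elements and hypothetical task names and durations; allocation and mapping would then restrict which tasks may share a PE.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical task durations (time units) and data dependencies (task -> predecessors)
DURATION = {"read": 2, "fir": 5, "fft": 8, "detect": 3, "write": 1}
DEPS = {
    "read": set(),
    "fir": {"read"},
    "fft": {"read"},
    "detect": {"fir", "fft"},
    "write": {"detect"},
}

def asap_schedule(duration, deps):
    """Start each task as soon as all of its input data has been produced."""
    start = {}
    for task in TopologicalSorter(deps).static_order():
        # A task may start only after every predecessor has finished.
        start[task] = max((start[p] + duration[p] for p in deps[task]), default=0)
    return start

if __name__ == "__main__":
    sched = asap_schedule(DURATION, DEPS)
    for task, t in sorted(sched.items(), key=lambda kv: kv[1]):
        print(f"{task:7s} starts at {t}")
```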
Horizontal Partitioning in Codesign
[Figure: the system is partitioned horizontally into a core processor (SW) and application-specific coprocessors (HW).]
Bridging the architectural gap
One million gates of reconfigurable logic, one million gates of hardwired logic.
50 GIPS for programmable components or 500 GIPS for dedicated hardware
Product reliability: design at a level far above the RT level, with reuse factors in excess of 100
Trade-off: 100 MOPS/W (microprocessor) vs. 100 GOPS/W (hardwired); reconfigurable computing with a large number of computing nodes and a very restricted instruction set (Pleiades) sits between the two
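The efficiency gap quoted above is easier to see as energy per operation, which is simply the reciprocal of OPS/W (ops/s per J/s = ops/J). A small worked sketch; the one-billion-operation workload is a hypothetical example.

```python
def energy_per_op_joules(ops_per_watt: float) -> float:
    """Energy per operation (J/op) is the reciprocal of efficiency in ops/J (= OPS/W)."""
    return 1.0 / ops_per_watt

MICROPROCESSOR = 100e6   # 100 MOPS/W, from the slide
HARDWIRED = 100e9        # 100 GOPS/W, from the slide

if __name__ == "__main__":
    e_up = energy_per_op_joules(MICROPROCESSOR)   # 1e-8 J, i.e. 10 nJ per operation
    e_hw = energy_per_op_joules(HARDWIRED)        # 1e-11 J, i.e. 10 pJ per operation
    workload = 1e9  # hypothetical: one billion operations
    print(f"microprocessor: {e_up * 1e9:.1f} nJ/op, {e_up * workload:.2f} J total")
    print(f"hardwired:      {e_hw * 1e12:.1f} pJ/op, {e_hw * workload:.3f} J total")
    print(f"ratio: {e_up / e_hw:.0f}x")
```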
HW/SW architecture in 2005
SDR
[Figure: a soft radio digital signal processing engine and its configurable functions]
Configuration
• Digital Down/Up Conversion (DDC)
– Channel center
– Decimation/interpolation rates
– Compensation filters
– Matched filter α = {0.25, 0.35, ...}
• FEC
– Convolutional
– Reed-Solomon
– Concatenated coding
– Turbo CC/PC
– (De-)interleave
• Beam forming
• Security
• Modulation format
– QPSK
– DQPSK
– π/4 DQPSK
– {16, 64, 256, 1024} QAM
– OFDM
– OFDM CDMA
• Channel access
– CDMA
– TDMA
• DSSS
– Rake, track, acquire
– Multi-user detection (MUD)
– ICU
• Network interface definition
Link Adaptation Technique
Adaptive Modulation and Coding
[Figure: throughput vs. C/I for QPSK R=1/4, 8PSK R=1/4, 16QAM R=1/4 and 16QAM R=1/2; the hull of AMC follows the best mode at each C/I, with modulation/coding transitions such as 8PSK → 16QAM.]
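AMC picks, at each measured C/I, the mode that lies on the upper hull of the throughput curves. A minimal sketch of that selection rule, using the four modes from the figure; the C/I switching thresholds are hypothetical placeholders, and throughput is approximated as bits per symbol times code rate.

```python
from fractions import Fraction

# (name, bits per symbol, code rate, hypothetical minimum C/I in dB)
MODES = [
    ("QPSK R=1/4", 2, Fraction(1, 4), 0.0),
    ("8PSK R=1/4", 3, Fraction(1, 4), 4.0),
    ("16QAM R=1/4", 4, Fraction(1, 4), 7.0),
    ("16QAM R=1/2", 4, Fraction(1, 2), 11.0),
]

def select_mode(ci_db):
    """Return (mode, info bits per symbol) with the highest throughput whose C/I threshold is met."""
    usable = [(name, bits * rate) for name, bits, rate, thr in MODES if ci_db >= thr]
    if not usable:
        return None  # link too poor for any mode
    return max(usable, key=lambda m: m[1])

if __name__ == "__main__":
    for ci in (-2.0, 3.0, 5.5, 9.0, 15.0):
        print(ci, select_mode(ci))
```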
Reconfigurable Radios (Roy Want, Intel Research)
– Manual selection of wireless interface
– Automatic selection of video format (high-res, low-res)
Key Software Radio Components
[Figure: key software radio components. Signal path: multibeam antenna array; multiband RF conversion (spectral purity); wideband A/D & D/A conversion; IF processing; WB digital DSP & software design; modulator/demodulator; bitstream processing; advanced control with environment characterization. Design goals: service quality (isochronism, throughput, response time) and time to market; transmit and receive toward a larger network. On-line adaptation: SNR/BER optimization, data rate adaptation, interference suppression, band/mode selection. Off-line support: development, optimization, over-the-air delivery.]
SDR Architecture
[Figure: an RF unit (receive/transmit antenna, Rx and Tx synthesizers, LNA, PA, exciter) connected over a cPCI bus to a signal processing/control unit (control interface, data converter, quadrature MODEM, baseband MODEM) with input/output and an HMI terminal.]
Isao TESHIMA, Hitachi Kokusai Electric Inc.
Specification of Prototype
Signal processing: FPGA for the quadrature MODEM, DSP for the baseband MODEM
FPGA: XCV2000E x 3
DSP: TMS320C6701 x 4
CPU: control module (Celeron), peripheral module
System bus: cPCI
Operating system: Linux
HMI: operated from a web browser
Interface: audio I/O, serial I/O, Ethernet (100BASE-TX)
Specification of Prototype (continued)
RF range: 2 to 500 MHz
Waveform: SSB, AM, FM, BPSK, QPSK, 8PSK, 16QAM
Number of channels: four full-duplex
Radio relay: repeat/bridge
Frequency accuracy: < 0.1 ppm
Rx IF frequency: 70 MHz
Tx IF frequency: 25 MHz
Dynamic range: 14 bits
Rx IF sampling frequency: 40 MHz
Tx IF sampling frequency: 100 MHz
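The IF figures above imply bandpass sampling on the receive side: a 70 MHz IF sampled at 40 MHz lies above the 20 MHz Nyquist frequency and folds down to a lower alias, while the 25 MHz Tx IF sits well inside the 50 MHz Nyquist band of a 100 MHz converter. A small worked sketch of that folding, plus the ideal quantization SNR for the 14-bit dynamic range using the standard 6.02N + 1.76 dB formula for a full-scale sine; it ignores clock jitter and converter nonlinearity.

```python
def alias_frequency(f_signal_hz: float, f_sample_hz: float) -> float:
    """Frequency in [0, fs/2] at which a tone appears after sampling at f_sample_hz."""
    f = f_signal_hz % f_sample_hz          # fold into [0, fs)
    return min(f, f_sample_hz - f)         # reflect into [0, fs/2]

def ideal_snr_db(bits: int) -> float:
    """Ideal quantization SNR of an N-bit converter for a full-scale sine wave."""
    return 6.02 * bits + 1.76

if __name__ == "__main__":
    print(alias_frequency(70e6, 40e6) / 1e6, "MHz")   # Rx: 70 MHz IF sampled at 40 MHz
    print(alias_frequency(25e6, 100e6) / 1e6, "MHz")  # Tx: 25 MHz IF at 100 MHz
    print(round(ideal_snr_db(14), 2), "dB")
```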
PACT’s SDR XPP
Martin Vorbach
PACT XPP Technologies, Germany
µP vs. XPP
An SDR/Multimedia Solution
HW/SW Cost Trends
[Figure: percent of total costs (0 to 100) vs. year (1955 to 2000) for hardware, software development and software maintenance.]
Function-Architecture CoDesign
Cadence
Partitioning
Performance requirements
Some functions are easier to implement in hardware:
blocks that are used repeatedly
blocks with a parallel structure
Modifiability
Blocks implemented in software are easy to modify
Implementation cost
Blocks implemented in hardware can be shared
Scheduling
Schedule the blocks assigned to HW and SW so that they meet the given constraints
SW operations must be scheduled sequentially
As long as there are no data or control dependencies, SW and HW blocks can be scheduled concurrently
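The two scheduling rules above (SW operations run one at a time on the processor; HW blocks run concurrently whenever data and control dependencies allow) can be sketched as a simple greedy list scheduler. A minimal sketch with hypothetical blocks, durations and HW/SW mapping; it assumes each HW block has its own dedicated hardware, ignores communication delay, and processes blocks in topological order without backfilling, so it is not optimal in general.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical blocks: name -> (target, duration, predecessors)
BLOCKS = {
    "parse": ("SW", 3, set()),
    "dct":   ("HW", 4, {"parse"}),
    "stats": ("SW", 2, {"parse"}),
    "quant": ("HW", 2, {"dct"}),
    "pack":  ("SW", 3, {"quant", "stats"}),
}

def schedule(blocks):
    """Return {block: (start, finish)}; SW blocks share one CPU, HW blocks run in parallel."""
    deps = {name: preds for name, (_, _, preds) in blocks.items()}
    times = {}
    cpu_free = 0  # time at which the single processor becomes idle
    for name in TopologicalSorter(deps).static_order():
        target, duration, preds = blocks[name]
        ready = max((times[p][1] for p in preds), default=0)  # all inputs available
        start = max(ready, cpu_free) if target == "SW" else ready
        times[name] = (start, start + duration)
        if target == "SW":
            cpu_free = start + duration  # SW operations are sequential
    return times

if __name__ == "__main__":
    for name, (s, f) in sorted(schedule(BLOCKS).items(), key=lambda kv: kv[1]):
        print(f"{name:6s} {BLOCKS[name][0]} {s:2d}-{f:2d}")
```

In the result, the SW block `stats` overlaps the HW block `dct`, which is exactly the concurrency the slide allows once no dependency links them.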
Interface
Why an interface block is needed
Data transfer between hardware and software blocks
An efficient interface block is required to reduce the overhead between HW and SW blocks
Interface methods
Shared Memory
FIFO
Handshaking protocol
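Of the interface methods listed, the FIFO is the easiest to model: the producer (for example a HW block) writes only when the buffer is not full, and the consumer (for example SW) reads only when it is not empty. A minimal software model of a fixed-depth circular FIFO with full/empty flags; the class name and depth are illustrative assumptions, not a specific hardware IP.

```python
class CircularFifo:
    """Fixed-depth FIFO with read/write pointers, as a HW/SW interface model."""

    def __init__(self, depth: int):
        self.buf = [None] * depth
        self.depth = depth
        self.rd = 0      # read pointer
        self.wr = 0      # write pointer
        self.count = 0   # number of stored words

    @property
    def full(self) -> bool:
        return self.count == self.depth

    @property
    def empty(self) -> bool:
        return self.count == 0

    def push(self, word) -> bool:
        """Write one word; returns False (producer must stall) when full."""
        if self.full:
            return False
        self.buf[self.wr] = word
        self.wr = (self.wr + 1) % self.depth
        self.count += 1
        return True

    def pop(self):
        """Read one word; returns None (consumer must wait) when empty."""
        if self.empty:
            return None
        word = self.buf[self.rd]
        self.rd = (self.rd + 1) % self.depth
        self.count -= 1
        return word

if __name__ == "__main__":
    fifo = CircularFifo(depth=4)
    print([fifo.push(w) for w in range(6)])  # the last two writes stall
    print([fifo.pop() for _ in range(5)])    # the last read finds the FIFO empty
```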
Logical Bus Architecture
System bus signals
Address, data and control signals
The address space consists of the memory space and the I/O space
Memory space: memory of the SW component
I/O space: ports within the SW component and registers in other HW components
Port signals
Specialized signals that interface directly between SW and HW components
Interrupt signals
Raised when SW and HW components have completed an operation, or when an error condition is detected
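The split of the bus address space into memory space and I/O space can be sketched as an address decoder that routes each access either to the SW component's memory or to a port/register in a HW block. A minimal sketch; the address ranges and target names are hypothetical.

```python
# Hypothetical address map: (base, size, space, target)
ADDRESS_MAP = [
    (0x0000, 0x8000, "memory", "program/data RAM"),
    (0x8000, 0x0010, "io", "hw_ctrl_regs"),
    (0x8010, 0x0010, "io", "fifo_port"),
]

def decode(address: int):
    """Return (space, target, offset) for a bus address, or raise on an unmapped access."""
    for base, size, space, target in ADDRESS_MAP:
        if base <= address < base + size:
            return space, target, address - base
    raise ValueError(f"bus error: unmapped address {address:#06x}")

if __name__ == "__main__":
    for addr in (0x0100, 0x8004, 0x8011):
        print(hex(addr), decode(addr))
```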
Existing Tools
Academic:
POLIS: U.C. Berkeley
PTOLEMY: U.C. Berkeley
VULCAN: Stanford U. (Hardware C)
CHINOOK: U. of Washington (VHDL)
COSYMA: U. of Braunschweig (C*)
MEIJE: INRIA and others (Esterel, Lustre, Signal)
Commercial:
Arexys: SDL, VHDL, C
CoWare: C/C++
LavalLogic: Java to Verilog
Cynlib: C++ to Verilog
Art, Algorithm to RT: C++ to RTL
SUPERLOG: system-level description language
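POLIS Codesign Methodology
[Figure: POLIS flow. Graphical EFSM, ESTEREL and other inputs go through compilers to CFSMs, which feed formal verification, simulation and partitioning; partitioning leads to SW synthesis, interface + RTOS synthesis (producing SW code + RTOS) and HW synthesis (producing a logic netlist), followed by rapid prototyping.]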
Rapid Prototyping
RASSP design flow
[Figure: RASSP design flow built on reuse design libraries and a database. System definition → function design → HW & SW codesign (HW & SW partitioning, virtual prototype) → HW design and HW fab (primarily hardware) in parallel with SW design and SW code (primarily software) → integration & test.]
[Figure: SPW-based design flow. Inputs: IP authoring with SPW, C/C++, MATLAB, HDL, RF J-K models, 3G wireless libraries, multimedia libraries. Steps: full system specification; SPW floating-point algorithm analysis; SPW fixed-point algorithm analysis; software development (SPW DSP behavioral specification, core ISS models); hardware development (IP block authoring, block-level specification, SPW HDS block implementation); SPW-NC-SIM C++/RTL cross debug; HW/SW Verifier (Verification Cockpit); synthesis / place & route etc.; platform integration.]
SystemC support:
New SystemC Products and Support from:
Mentor Graphics - Seamless® C-Bridge™
Verisity - SpecMan™ Elite
Forte Design Systems - ESC Library
Emulation & Verification Engineering - Zebu
Axys Design - MaxSim™
New SystemC Capabilities from:
CoWare - N2C updated for SystemC 2.0
Cadence - SPW 4.8 / SystemC v2.0 IF
Synopsys - CoCentric System Studio
Plus the Kluwer book “System Design Using SystemC”, 2002
Design Flow using Seamless CVE
Typical System Design Process
[Figure: specify system → design hardware and design software separately → integrate & test together → OK? If not, iterate back through design; this late integration loop is labeled inefficient.]
Logic Simulation
Requires a fully functional microprocessor model
Bus functional models are not fully functional
Software models are too slow
Software models may not be available
Hardware models have limited capability
Limited debugging capability
Okay for verifying hardware
Ineffective for running software
Instruction Set Simulation
Fast
Good debugging capability
Can model custom hardware
Limited I/O and interrupt handling
Seamless CVE
[Figure: Seamless CVE. Software debug (XRAY for debug) and software simulation are coupled through Seamless with hardware simulation (ModelSim for simulation) and hardware debug; Seamless provides system control and performance optimization.]
OCAPI-xl design flow
Application Structure
Specification and modeling
Executable specification: Verilog, VHDL, C, C++, Java.
Common models: synchronous dataflow (SDF), sequential programs (Prog.), communicating sequential processes (CSP), object-oriented programming (OOP), FSMs, hierarchical/concurrent FSMs (HCFSM).
Depending on the application domain and specification semantics, they are based on different models of computation.
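For synchronous dataflow (SDF), one of the models listed above, every actor produces and consumes a fixed number of tokens per firing, so a periodic schedule exists only if the balance equations r[src] × produced = r[dst] × consumed hold on every edge. A minimal sketch that solves them for a connected graph; the three-actor graph and its token rates are a hypothetical example.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

# Hypothetical SDF edges: (source, tokens produced per firing, sink, tokens consumed per firing)
EDGES = [("A", 2, "B", 3), ("B", 1, "C", 2)]

def repetition_vector(edges):
    """Smallest positive integer firing counts with r[src]*prod == r[dst]*cons on every edge."""
    rate = {edges[0][0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates across the (connected) graph
        changed = False
        for src, prod, dst, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    for src, prod, dst, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph: no periodic schedule")
    scale = lcm(*(r.denominator for r in rate.values()))  # clear all denominators
    return {actor: int(r * scale) for actor, r in rate.items()}

if __name__ == "__main__":
    print(repetition_vector(EDGES))
```

Here A must fire 3 times and B 2 times per period so that the 6 tokens A produces are exactly the 6 tokens B consumes; the same reasoning applies to the B → C edge.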
Hardware Synthesis
Many commercial CAD tools at the RTL, logic, and physical levels.
Some emerging high-level synthesis tools: Behavioral Compiler (Synopsys), Monet (Mentor Graphics), and RapidPath (DASYS).
Many open problems: memory optimization, parallel heterogeneous hardware architectures, programmable hardware synthesis and optimization, communication optimization.
Software synthesis
The use of real-time operating systems (RTOSs)
The use of DSPs and microcontrollers: code generation issues
Compilation for special-purpose processors is in many cases still far less efficient than manual code generation!
Retargeting issues: C code developed for the TI TMS320C6x is not optimized to run on the Philips TriMedia processor.
Interface synthesis
Interface between:
- hardware-hardware
- hardware-software
- software-software
Timing and protocols
Recently, the first commercial tools appeared: the CoWare system (HW-SW protocols) and the Synopsys Protocol Compiler (HW interface synthesis tool)
HW/SW communication design
HW/SW communication services
Component-based design environment
VDSL application: the
subset
A Multicore SoC
RTL Architecture Model
WCDMA BER with Image Quality Verification (Cadence)
[Figure: a floating-point WCDMA channel (A) and a fixed-point WCDMA channel (B), using polymorphism for rapid conversion of floating- to fixed-point models, feed an Image Quality Tester (IQT) that reports: modulation transfer function area (MTFA), integrated contrast sensitivity (ICS), square root integral (SQRI), subjective quality factor (SQF), folded SQRI, folded MTFA, and peak signal-to-noise ratio (PSNR).]
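Of the image-quality metrics listed, PSNR is the simplest to compute: PSNR = 10 · log10(MAX² / MSE), with MAX = 255 for 8-bit pixels. A minimal sketch over flat sequences of 8-bit samples; the function name and data layout are assumptions for illustration, since the slide does not describe the tester's implementation.

```python
import math

def psnr_db(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sequences of pixel values."""
    if len(reference) != len(decoded):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

if __name__ == "__main__":
    ref = [52, 55, 61, 66, 70, 61, 64, 73]
    noisy = [p + 1 for p in ref]  # every pixel off by one, so MSE = 1
    print(round(psnr_db(ref, noisy), 2))
```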
Derivative Design
Problem
The corrupted voice mail decoding is caused by the JPEG decoding having a higher priority on the DSP than the QCELP audio decoding of the voice mail. There are two possible solutions:
1. HW/SW tradeoff
Move the JPEG decoding, which stalled the QCELP audio decoding, into hardware
2. SW/SW tradeoff
Re-prioritise the QCELP audio decoding
We will explore option 1 by moving part of the JPEG decoding (IDCT) to dedicated HW.
JPEG Encoding/Decoding Design of Systems-on-a-Chip
Masaharu Imai, School of IST, Osaka University
http://vlsilab.ics.es.osaka-u.ac.jp/~imai/
Principle of codesign method
Specification of Machikane-I
JPEG Encoding
Decoding
Block Diagram of Machikane-I
Estimation of Design Quality
Optimum Partitioning
Design strategy
Put high-rate but simple functions on peripheral processors; this also moves control physically closer.
Consolidate low-rate background tasks on the main CPU.
Can multiple processes execute concurrently?
Is the performance granularity of available components fine enough to allow efficient search of the solution space?
Do computation and communication requirements conflict?
How accurately can we estimate performance?
[Figure: implementation spectrum from software to custom ASICs]
Granularity of Description
Tradeoff between HW Cost and Performance
Spec. of Digital Motion Camera
Functions: image input, image display, image compression, image store, image transmission, user interface
Encoding procedure:
R2Y: transform from RGB to YCbCr
DCT: discrete cosine transform
Q: quantization
VLC: variable length coding
Decoding procedure:
VLD: variable length decoding
IQ: inverse quantization
IDCT: inverse discrete cosine transform
Y2R: transform from YCbCr to RGB
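The first two encoding stages, R2Y and DCT, can be sketched in a few lines. A minimal sketch, assuming the JFIF (full-range BT.601) coefficients for R2Y and a direct, unoptimized orthonormal 8 × 8 DCT-II; the slides do not say which variants Machikane-I uses, and a real codec would use a fast factorized DCT.

```python
import math

def r2y(r, g, b):
    """RGB -> YCbCr using the JFIF (full-range BT.601) equations."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as 8 rows of 8 samples."""
    n = 8

    def c(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

if __name__ == "__main__":
    print(r2y(255, 255, 255))                    # white: Y close to 255, Cb and Cr close to 128
    flat = [[100 - 128] * 8 for _ in range(8)]   # level-shifted flat block
    coeffs = dct_8x8(flat)
    print(round(coeffs[0][0], 3))                # a flat block has only a DC coefficient
```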
Block Diagram
CODEC
Software Implementation
Some Functions by Hardware
Computation Time
Pipeline Processing of Images
Case 1: Parallel Processing
Case 2: Sequential Processing
Partitioning methods
Partitioning Method C
Partitioning Method D
Partitioning vs. Design Quality
Partitioning method
[Figure: grouping of similar process components. Process components 1 to n are divided by major and detailed classification into groups (group 1 to group 6); each group is then assigned to hardware, to software, or left undecided, yielding the design target model.]
Y. Endo, H. Koizumi
Dept. of Computer & System Eng., Tokyo Denki Univ., Japan
Development of a Real-time Motion Image Encoder using Codesign Methodology
[Figure: system model of the image encoder. A PC/AT-compatible host (master) exchanges motion images and compressed image data with the image encoder (slave) through the interface logic. Inside the encoder, motion compensation produces motion vectors and estimation errors for image compression; the compressed image data passes through buffer control, which feeds compression control parameters back to image compression.]
System model of the real-time image encoder
Master-slave model for the system (PC/AT: master; image encoder: slave)
Purpose of the real-time image encoder: compress a stream of motion images with a resolution of 128 by 128 pixels
Slave system: a coprocessor that reduces the workload of the host system
Host system: gains more time to handle other tasks such as the user interface and data logging
System specifications (description: requirement)
Size of image: 128 by 128 pixels
Compression ratio: adjustable; 30 times compression with 29 dB SNR
Processing speed: maximum compression speed of 15 frames/sec.
Host system: PC/AT or compatible computer
Size: same as a PC/AT add-on card
Cost: less than US $200.00
Interface logic module
DMA interface: implements two data transfer channels:
one for transferring motion images from the master to the real-time image encoder,
and one for transferring the compressed image data back to the master
[Figure: interface logic. A PC/AT I/O space decoder, configuration and control registers, and two DMA interfaces with FIFO buffers connect the PC/AT to the motion compensation module, the image compression module and the buffer control module.]
Motion Compensation Module
[Figure: an 8 by 8 pixel block is searched for within a searching window.]
Image Compression Module
Discrete Cosine Transform (DCT) algorithm: transforms the estimation error from the spatial (pixel) domain to the frequency domain
After the transformation: DC data (proportional to the mean of the pixels in the block) and the frequency data of the block
The frequency-domain data is much smaller than the spatial-domain data, since essentially only edge information produces significant frequency components
Huffman coding: performed on the quantized data and the motion vectors to further reduce redundancy
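Huffman coding, as applied here to quantized coefficients and motion vectors, gives shorter codewords to more frequent symbols. A minimal sketch that builds a code table from symbol counts with Python's `heapq`; the symbol stream is a made-up example, and the paper's actual code tables are not given.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(symbols):
    """Return {symbol: bitstring} for an optimal prefix code over the given symbol stream."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

if __name__ == "__main__":
    data = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]   # e.g. quantized AC values, mostly zero
    table = huffman_code(data)
    bits = "".join(table[s] for s in data)
    print(table, len(bits), "bits")
```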
Buffer Control Module
Maintains a constant bit rate at the output of the real-time image encoder
A constant bit rate output is important because variable bit rate image compression causes uneven transmission time and jagged playback
Timeout timer: indicates that the transfer bit rate is too high or too low
When either state is active, the buffer controller requests the image compression module to adjust the compression ratio to maintain the required transfer bit rate
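[Figure: buffer control. The buffer controller, with its timeout timer, sits between the interface logic and the image compression module's FIFO buffer and sends requests to the image compression module.]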
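The buffer controller's feedback loop (a transfer rate that is too high or too low triggers a change in compression ratio) can be sketched as a simple rate controller. A minimal sketch that swaps the paper's timeout timer for an output-buffer occupancy check; the thresholds, the quantizer-scale parameter and its limits are all hypothetical.

```python
def adjust_quant_scale(occupancy: int, capacity: int, qscale: int,
                       high=0.75, low=0.25, q_min=1, q_max=31) -> int:
    """Return a new quantizer scale from output-buffer occupancy.

    Buffer filling up (bit rate too high) -> coarser quantization (more compression).
    Buffer draining (bit rate too low)    -> finer quantization (less compression).
    """
    fill = occupancy / capacity
    if fill > high:
        return min(qscale + 1, q_max)
    if fill < low:
        return max(qscale - 1, q_min)
    return qscale

if __name__ == "__main__":
    q = 8
    for occ in (900, 950, 500, 100, 50):   # hypothetical occupancy samples, capacity 1000
        q = adjust_quant_scale(occ, 1000, q)
        print(occ, q)
```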
Software/Hardware Partition
Interface logic module: implemented only in hardware, because it is the hardware interface between the ISA bus and the real-time image encoder
Several hardware/software partition methods are considered for the other three modules
[Figure: the PC/AT connects through in/out FIFO buffers to the interface control and buffer control logic and to a DSP56001R40 with ROM and SRAM.]
Method 1
All the functions except interface logic and buffer control are implemented in software.
The DSP56001R40 is used as the processor: low cost (fixed-point DSP), high precision (24-bit word length), high performance (40 million operations per second), and simple hardware implementation (simple bus interface).
Simulator: sim56000 for the DSP56000 family
Clock cycles for compressing one image: 7.27 × 10^6
Compression speed of this system: 5.5 frames per second
Therefore this design clearly cannot meet the system specifications.
System architecture
[Figure: the PC/AT connects through in/out FIFO buffers to the interface control and buffer control logic; a motion compensation block with memories REG-FIFO, BANK1 and BANK2; and a DSP56001R40 with ROM and SRAM.]
Method 2
The IN-FIFO is used to prefetch the next image block.
The REG-FIFO holds the current image block during block searching.
To keep the data for further use, the data read out must be written back to the FIFO immediately.
BANK1 and BANK2 are used as a double image buffer.
One bank stores the previous decoded frame, while the other is filled with the current decoded frame by the external image compression module.
After motion compensation of the current frame finishes, the roles of the two banks switch and motion compensation of the next frame starts.
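Motion compensation module
[Figure: the REG-FIFO, the IN-FIFO (fed from the PC/AT) and BANK1/BANK2 (addressed by counters CTR1/CTR2) feed multiplexers into a subtract unit, absolute unit and accumulator; a comparator updates the MIN-ERR and MIN-VTR registers under a control unit. Outputs go to the image compression module and the interface logic.]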
Method 2 (continued)
CTR1 and CTR2 generate addresses for the two SRAM banks.
The subtraction unit, absolute unit, accumulator, comparator, MIN-ERR register and MIN-VTR register are used to find the minimum total absolute error and the corresponding motion vector of the current block search.
If the error of the current iteration is smaller than the error in MIN-ERR, MIN-ERR and MIN-VTR are updated to the new minimum total absolute error and the corresponding motion vector, respectively.
After all iterations of the current block search, the motion vector with the minimum error is stored in REG-VTR.
The motion compensation chip then signals the external image compression module to read this motion vector.
Clock cycles used for each image: 2.43 × 10^6
Compression speed of this system: 16.46 frames/second
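The datapath described above (subtract, absolute value, accumulate, compare, then update MIN-ERR and MIN-VTR) performs full-search block matching with a sum-of-absolute-differences (SAD) criterion. A minimal software sketch of the same search; the frame layout (lists of rows), the 8 × 8 block size default and the ±4 search range are assumptions for illustration.

```python
def block_match(cur, ref, bx, by, block=8, search=4):
    """Full-search block matching on 2-D frames stored as lists of rows.

    Returns (min_err, (dx, dy)) for the block whose top-left corner is (bx, by) in `cur`,
    searching displacements in [-search, search] inside `ref`.
    """
    height, width = len(ref), len(ref[0])
    min_err, min_vtr = None, (0, 0)           # play the role of MIN-ERR / MIN-VTR
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + block > width or y0 + block > height:
                continue                       # candidate falls outside the frame
            err = 0                            # accumulator
            for j in range(block):
                for i in range(block):
                    err += abs(cur[by + j][bx + i] - ref[y0 + j][x0 + i])
            if min_err is None or err < min_err:   # comparator
                min_err, min_vtr = err, (dx, dy)
    return min_err, min_vtr

if __name__ == "__main__":
    size = 24
    ref = [[(7 * x + 13 * y) % 256 for x in range(size)] for y in range(size)]
    # Current frame: reference content displaced so that cur[y][x] = ref[y + 2][x - 1]
    cur = [[ref[min(y + 2, size - 1)][max(x - 1, 0)] for x in range(size)] for y in range(size)]
    print(block_match(cur, ref, bx=8, by=8))
```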
Method 3
All modules are implemented in hardware.
The interface logic, buffer control and motion compensation modules are implemented as before.
However, the image compression module cannot be implemented in a single chip: at least two Actel A1280 devices are needed, and two A1280s alone would exceed the system cost.
This approach is therefore rejected.
System architecture
[Figure: the PC/AT connects through in/out FIFO buffers to the interface control and buffer control logic; a motion compensation chip with memories REG-FIFO, BANK1 and BANK2; a DSP56001R40 with ROM and SRAM; and an 8051 MCU linked to the DSP through a FIFO buffer.]
Method 4
Interface logic, buffer control, motion compensation module: hardware
The other modules: software
The 8051 MCU has enough internal memory for Huffman coding, so no external memory is required.
A FIFO buffer interfaces the 8051 and the DSP.
DSP: estimation error, DCT/IDCT, buffer control
Clock cycles used for each image: 2.1 × 10^6
Compression speed: 15.7 frames/second
An in-circuit simulator from Philips is used to simulate the Huffman coding.
Results
Method 1
Hardware: interface logic, buffer control
Software: block searching, estimation error, DCT, inverse DCT, Huffman coding
Compression speed: 5.5 frames/sec.; cost: US$189.00
Method 2
Hardware: interface logic, buffer control, block searching
Software: estimation error, DCT, inverse DCT, Huffman coding
Compression speed: 16.46 frames/sec.; cost: US$295.00
Method 3
Hardware: interface logic, buffer control, block searching, estimation error, DCT, inverse DCT, Huffman coding
Compression speed: -; cost: -
Method 4
Hardware: interface logic, buffer control, block searching
Software: estimation error, DCT, inverse DCT, Huffman coding
Compression speed: 15.45 frames/sec.; cost: US$280.70
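The frame rates in these results follow directly from cycles per image and processor clock: frames/s = f_clk / cycles per frame. A small check, assuming the DSP56001R40 runs at 40 MHz (the slides give only the part number and "40 million operations per second"); under that assumption the reported 5.5 and 16.46 frames/s for methods 1 and 2 are reproduced, and the 15 frames/s specification leaves a budget of about 2.67 × 10^6 cycles per frame.

```python
CLOCK_HZ = 40e6   # assumed DSP56001R40 clock
TARGET_FPS = 15   # system specification

# Cycles per compressed image, as reported for methods 1 and 2
CYCLES_PER_FRAME = {"method 1": 7.27e6, "method 2": 2.43e6}

def frames_per_second(cycles_per_frame: float, clock_hz: float = CLOCK_HZ) -> float:
    """Throughput of a processor that needs cycles_per_frame cycles for each image."""
    return clock_hz / cycles_per_frame

if __name__ == "__main__":
    budget = CLOCK_HZ / TARGET_FPS   # cycles available per frame at 15 frames/s
    print(f"cycle budget per frame: {budget:.3e}")
    for method, cycles in CYCLES_PER_FRAME.items():
        fps = frames_per_second(cycles)
        verdict = "meets" if fps >= TARGET_FPS else "misses"
        print(f"{method}: {fps:.2f} frames/s, {verdict} the {TARGET_FPS} frames/s spec")
```

By the same formula, method 4's 15.7 frames/s at 2.1 × 10^6 cycles per image would correspond to a clock of roughly 33 MHz rather than 40 MHz, which is consistent with the conclusion that its processor runs at a lower clock frequency.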
Conclusions
Methods 2 and 4 both meet the system requirements, but method 4 is selected for the prototype.
Although method 4 compresses more slowly than method 2, it still satisfies the system specifications at a lower system cost.
Because the processor in method 4 runs at a lower clock frequency, its PCB layout is less difficult than that of method 2.
With the codesign methodology, hidden design problems can be discovered at an earlier stage, which reduces development time and cost.