Douglas Gourlay, VP Marketing, Arista Networks, Santa Clara, CA
Trading Dynamics
20-25% CAGR in market volumes
Competitive advantage hinges on speed, transparency, and proximity to data sources.
The application must be in the data path – seamlessly
Quest to balance risk/compliance with performance
HPC on Wall Street - 2012
10GbE Switches for the Virtualized Data Center, but a software company at the core
>1300 Customers
>325 Employees
Profitable, self-funded, pre-IPO network infrastructure provider
Open Linux-based OS
Fully automated testing and SW development
Arista Application Switch - 7124FX
• Couples ultra-low latency switch with next generation programmable FPGA and memory subsystem
• Customer programmable FPGA and Control Plane provides total control over the network, forwarding, inspection, redirection, etc.
• Targeted for early adopters of hardware accelerated applications such as risk analysis, data arbitrage, order routing
Exegy believes…
• Exegy believes in continually challenging the status quo of market data delivery systems and trading platforms.
– First to market with hardware-accelerated market data appliances based on FPGA technology.
– Best of breed solutions for major use cases faced by low-latency, high-capacity consumers of financial market data feeds.
• Exegy believes that delivery and consumption of quality market data should be as easy and painless as possible.
– Fully managed and constantly monitored appliances to assure optimal performance and the best customer experience.
– A passion to help our customers succeed in the face of escalating complexity and the increasing demands placed on them.
Impulse C, Custom FPGA-Accelerated Solutions for the Arista 7124FX
Brian Durwood, Co-founder
Converting C to multiple streaming hardware processes ain’t that hard.
Focus on reducing clock cycles
Verify as you go
Iterate, iterate, iterate (no “magic button”)
The tool flow is a bit awkward for first timers.
Visual Studio or equivalent
Impulse C co-development, analysis & compile
Altera Quartus II for place & route into FPGA
Things you can do to get up to speed quickly:
Work from known good sw modules
Get up-front training or factory engineering
Programming With Impulse C
Not a new language
Based on standard ANSI C
C-language for FPGA programming
For embedded and HPC applications
Supports standard C development tools
Supports multi-process partitioning
A software-to-hardware compiler
Optimizes C code for parallelism
Generates HDL, ready for FPGA synthesis
Also generates hardware/software interfaces
Purpose
Describe hardware accelerators using C
Move compute-intensive functions to FPGAs
[Diagram labels: C language applications; Generate accelerator hardware (HDL files, Arista’s on-board FPGA); Generate hardware interfaces; Generate software interfaces (C software libraries)]
www.ImpulseAccelerated.com
Reference slides from hereafter
www.ImpulseC.com
Custom FPGA-Accelerated Solutions for the Arista 7124FX
Brian Durwood, Co-founder
Converting C to Multiple Streaming Hardware Processes
FPGAs – Advantages Over Software
Massive parallelism
At system level, loop level, instruction level
One FPGA can replace multiple CPUs
For specific tasks/algorithms, using much lower power
No need for a separate NIC
Enables inline processing at near line speed
Minimizes OS interference in filtering
Especially during high transaction load events
Reduces jitter and other interference
Offloads standard CPUs with customized pre-processors
e.g. select limited analysis of X message types that meet X criteria for X symbols
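The pre-processor idea in the last bullet can be modeled in a few lines. The following is a plain-Python sketch of the selection logic only, not FPGA or Impulse C code; the `Message` fields, the `make_prefilter` helper, and the sample symbols and size threshold are all hypothetical, chosen just to show "X message types that meet X criteria for X symbols".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical message layout, for illustration only.
    msg_type: str
    symbol: str
    size: int


def make_prefilter(msg_types, symbols, min_size):
    """Build a predicate that keeps only the messages the host should see."""
    wanted_types = frozenset(msg_types)
    wanted_symbols = frozenset(symbols)

    def accept(msg):
        return (msg.msg_type in wanted_types
                and msg.symbol in wanted_symbols
                and msg.size >= min_size)

    return accept


feed = [
    Message("trade", "ABC", 500),
    Message("quote", "ABC", 500),
    Message("trade", "XYZ", 50),
    Message("trade", "QRS", 900),
    Message("trade", "XYZ", 200),
]

accept = make_prefilter({"trade"}, {"ABC", "XYZ"}, min_size=100)
forwarded = [m for m in feed if accept(m)]
print(len(forwarded), "of", len(feed), "messages reach the host CPU")
```

In hardware the same predicate would sit in the data path, so rejected messages never consume host CPU cycles.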
Confidential
3 Popular FPGA Configurations
Usage Option 1: Create a hardware module (generated hardware module in the FPGA)
Usage Option 2: Accelerate an embedded CPU (embedded CPU core plus generated hardware accelerators in the FPGA)
Usage Option 3: Accelerate an external/host CPU or computing cluster (host processor or cluster plus generated hardware accelerators in an FPGA coprocessor)
Configurations Can Be Combined
Combining streaming, embedded processor, and host processor
[Diagram labels: 10G Ethernet; Stream processing and parsing; Matching algorithm and strategy; Host message generation; Embedded CPU for configuration; Embedded and shared RAM; FPGA]
FPGA strategies can be coded using C for hardware and for embedded CPU, with shared RAM for hash table lookup or other local data
Impulse C Programming Model
[Diagram labels: C-language H/W processes and S/W processes]
Communicating C-Language Processes
Supports dataflow and message-based communications
Supports parallelism at the application level and at the level of individual processes
Allows simulation and debugging of parallel software processes.
Parallelism via Multiple Processes
Spatial parallelism
Temporal parallelism (system-level pipelining)
[Diagram: several C processes illustrating the two forms]
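As a rough illustration of why system-level pipelining matters, the sketch below compares cycle counts for a chain of processes that handles one item at a time against the same chain run as a pipeline. It assumes every stage takes exactly one cycle per item and ignores stalls; the function names are hypothetical, and this is plain Python arithmetic rather than Impulse C.

```python
def sequential_cycles(stages, items):
    """Each item passes through every stage before the next item starts."""
    return stages * items


def pipelined_cycles(stages, items):
    """Stages overlap: the first result needs `stages` cycles,
    then one further result completes on every cycle."""
    if items == 0:
        return 0
    return stages + items - 1


stages, items = 4, 1000
seq = sequential_cycles(stages, items)
pipe = pipelined_cycles(stages, items)
print(f"sequential: {seq} cycles, pipelined: {pipe} cycles, "
      f"speedup: {seq / pipe:.2f}x")
```

For long streams the speedup approaches the number of stages, which is why splitting an algorithm into multiple streaming processes pays off.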
An Impulse C Process
Multiple methods of process-to-process communications are supported
[Diagram labels: C process; Stream inputs/outputs; Signal inputs/outputs; Register inputs/outputs; Shared memory block reads/writes; App Monitor outputs]
Processes are independently synchronized
Compile and Optimize
Optimize the results using interactive tools
Pipeline analysis
Loop unrolling
Instruction scheduling
Generate FPGA hardware
VHDL or Verilog
Low level interfaces to memory, I/O and buses.
ModelSim Test bench
Debug and Verify
Use C tools for application debugging
Source-level debuggers
C-language testing
Test and analyze parallel dataflow with the Impulse Application Monitor
Automatically generate VHDL or Verilog Testbenches
Constructs Familiar to C Programmers
Concept is similar to getc(), putc() in C for I/O
co_stream_create: Used in configuration
co_stream_open: Open the stream (clear eos)
co_stream_close: Close the stream (set eos)
co_stream_eos: Check end of stream (eos)
co_stream_read: Read from stream (with rdy, en)
co_stream_write: Write to stream (with rdy, en)
co_stream_read_nb: Non-blocking read (no rdy)
co_stream_write_nb: Non-blocking write (no rdy)
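The open/write/close/eos life cycle listed above can be modeled with a toy FIFO. This is a single-threaded Python sketch of the semantics as the slide describes them (open clears eos, close sets eos, a reader drains until eos); it is not the Impulse C API, and the `Stream` class, `producer`, and `consumer` are hypothetical names. Blocking reads and the rdy/en handshake are not modeled.

```python
from collections import deque


class Stream:
    """Toy FIFO mirroring the slide's stream operations (illustrative only)."""

    def __init__(self):              # role of co_stream_create
        self._fifo = deque()
        self._closed = True

    def open(self):                  # role of co_stream_open: clear eos
        self._closed = False

    def close(self):                 # role of co_stream_close: set eos
        self._closed = True

    def write(self, value):          # role of co_stream_write
        if self._closed:
            raise ValueError("write to a closed stream")
        self._fifo.append(value)

    def read_nb(self):               # role of co_stream_read_nb: never waits
        if self._fifo:
            return True, self._fifo.popleft()
        return False, None

    def eos(self):                   # role of co_stream_eos
        # End of stream: the writer has closed and all data is drained.
        return self._closed and not self._fifo


def producer(out, values):
    out.open()
    for v in values:
        out.write(v)
    out.close()


def consumer(inp):
    total = 0
    while not inp.eos():
        ok, value = inp.read_nb()
        if ok:
            total += value
    return total


s = Stream()
producer(s, [1, 2, 3, 4])
result = consumer(s)
print("sum of streamed values:", result)
```

The getc()/putc() analogy holds in that each side sees only a sequence of values plus an end marker, which is what lets a process be moved between software and hardware.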
Credible Solution in use by:
Multiple confidential, NDA-covered financial teams
Impulse Platform Support Package
[Diagram labels: Impulse CoDeveloper™ produces the FPGA fabric processing core; FPGA embedded processor; Memory resources; Host interfaces; Ethernet; Other I/O]
PSP generates HW/SW wrappers between FPGA core & system elements
Extensions (scripts and wrapper generators)
Platform-specific library functions
Documentation and tutorials
Current ready to run examples for platform
Examples of FPGA processing:
Financial feed kernel bypass or Full Hardware based trading
Direct handling of financial feeds
Parsing incoming feeds and triggering outbound orders – your strategy in hardware
Normalization or Protocol Conversion
Gateway sending a sub-feed of data
Pre-Trade Risk Checking
Low Latency Broker Dealer Compliance
Financial valuations
Co-processor off-loading for Monte Carlo and other algorithms
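To make the pre-trade risk checking item concrete, here is a minimal Python sketch of the kind of per-order checks that could be placed inline. The specific limits (`max_order_qty`, `max_notional`, `max_position`), the `check_order` function, and the rejection reasons are assumptions for illustration; the slides do not specify which checks are performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Hypothetical risk limits, for illustration only.
    max_order_qty: int
    max_notional: float
    max_position: int


def check_order(side, qty, price, position, limits):
    """Return (accepted, reason) for a single order against fixed limits."""
    if side not in ("buy", "sell") or qty <= 0 or price <= 0:
        return False, "invalid"
    if qty > limits.max_order_qty:
        return False, "order_qty"
    if qty * price > limits.max_notional:
        return False, "notional"
    signed_qty = qty if side == "buy" else -qty
    if abs(position + signed_qty) > limits.max_position:
        return False, "position"
    return True, "ok"


limits = Limits(max_order_qty=1_000, max_notional=50_000.0, max_position=2_500)
print(check_order("buy", 500, 20.0, position=1_900, limits=limits))
print(check_order("buy", 700, 20.0, position=1_900, limits=limits))
```

Each check is a comparison against a constant or a running total, which is the kind of logic that maps naturally onto a fixed-latency hardware stage.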
Stand-Alone Feed Handling Solution
Usage 3
[Diagram labels: 1G or 10G Ethernet MAC; RX Adapter (Verilog); Feed Handler and Outbound UDP (Impulse C); TX Adapter (Verilog)]
Network Processing Pipeline
[Diagram labels, FPGA: 1/10GigE MAC; Enet Filter; UDP Parser and/or TCP/IP Stack; Custom Filtering Application; Embedded CPU; Host I/O Interface]
[Diagram labels, Host System: Driver; User Application; Host Memory]
UDP and TCP/IP implemented directly in FPGA hardware for low latency
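The UDP parser stage can be sketched in software to show what it has to extract. The UDP header is four 16-bit big-endian fields (source port, destination port, length, checksum); the Python below parses it with the standard `struct` module. The helper names and the port numbers are hypothetical, the checksum is not verified, and this models only the parsing step, not an FPGA implementation.

```python
import struct

# UDP header: source port, destination port, length, checksum (network byte order).
UDP_HEADER = struct.Struct("!HHHH")


def parse_udp(datagram: bytes):
    """Split a raw UDP datagram into (src_port, dst_port, payload)."""
    if len(datagram) < UDP_HEADER.size:
        raise ValueError("datagram shorter than the UDP header")
    src, dst, length, _checksum = UDP_HEADER.unpack_from(datagram)
    # The length field covers header plus payload.
    if length < UDP_HEADER.size or length > len(datagram):
        raise ValueError("bad UDP length field")
    return src, dst, datagram[UDP_HEADER.size:length]


def filter_by_port(datagram: bytes, wanted_port: int):
    """Pass the payload on only if it is addressed to the wanted port."""
    _src, dst, payload = parse_udp(datagram)
    return payload if dst == wanted_port else None


# Hypothetical feed datagram: 8-byte header plus a 5-byte payload.
sample = UDP_HEADER.pack(40000, 26400, UDP_HEADER.size + 5, 0) + b"hello"
print(parse_udp(sample))
```

Because the header fields sit at fixed offsets, a hardware parser can extract them as the bytes arrive, without buffering the whole packet.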
Complex Order Support
[Diagram; labels recovered from the slide, grouping approximate:]
Exchanges, feed handlers, order data sources
Direct connection: Impulse UDP/TCP, 10 Gb/s Ethernet
Standard and custom adapters without OS processing, e.g.: ITCH, OUCH, OPRA, BATS, RMDS, Bloomberg, and custom
Incoming feed handler: decompression & decryption, normalizing formats across feeds
Ultra-fast pattern matching, data filtering, produce sub-feed, pull and present opportunities
Apply trade logic: algorithms, analytics
Manage risk: insert risk limitations awaiting confirm
Message management: hardwire potential X required responses
Outgoing: revert feed to exchange formats, trade with exchanges
Replace generic NIC, replace UDP
FPGA or FPGA-based board
User and trading applications
Three Ways To Get Started
Learn the tools
Acquire an Impulse CoDeveloper license.
Work from the included reference designs.
Experiment with ways to optimize your algorithms to run efficiently as multiple streaming processes in FPGA.
Turn Key System (“Bump in the Wire”)
License above +
UDP or other network attached FPGA-enabled reference design.
FPGA-based accelerator platform.
Impulse factory engineers to help get your system online.
Turn Key System Running A Target Algorithm
License above + Turn Key System above +
Impulse Engineers, under NDA, refactor your target algorithm(s) for efficient compilation to FPGA.
Impulse Engineers train your team on how the refactoring works.
About Impulse
Most widely used C to FPGA tool
Pure ANSI C
No PAR or HW statements inserted
Founded in 2002
By part of the original ABEL team
Additional Resources
Engineering consultation
[email protected]
Tutorials:
www.ImpulseAccelerated.com/Tutorials
Book:
Practical FPGA Programming in C
Arista Application Switch – Systems Design
Compute, Storage, Memory, I/O, Application Acceleration – Together
Platform Details
Console Port
Clock Input
Air Vents
16 Base SFP/SFP+ Ports
24 Wirespeed 1G/10G SFP/SFP+ Ports
8 FX SFP/SFP+ Ports
USB Port
Management Port
High Availability:
Dual Hot-swappable Power Supplies
Multiple Hot-swappable Fan Units
Designed for Data Center + Colocation:
Flexible Front-to-Rear or Rear-to-Front Airflow
Choice of AC or DC Power Supplies
Application Switching for Cloud Networks
Arista Application Switch - 7124FX
Ultra Low Latency 24 port 10GbE Switch
• 16 10GbE ports connected to LLE ASIC
• 8 10GbE ports connected through Stratix V FPGA
• Built in 50GB SSD
• Optional Chip-Scale Atomic Clock and External Clock Source
Application Switch Markets
Financial Services: Broker/DMA, Market Data, HFT/Algo, Exchanges
Government: Signals Intelligence, Link Encryption, Distributed Lawful Intercept
HPC & Medical: Diagnostic Imaging, Telemedicine, Data Filtering
Telecom: Video Broadcasting, Transcoding, Network Security
Financial Services Applications
Inline Risk Analysis: Low Latency Broker Dealer Compliance
Feed Handling and A/B Arbitration: Offload line arbitration to dramatically improve application performance
Real-time Data analysis: Instrument transaction performance at high resolution
Algorithmic trading: Reducing system latency increases performance of trading strategies
Order Protocol Conversion: Convert or normalize multiple order entry formats to a common format
Order Execution Routing: Set order policies for best execution
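A/B line arbitration, mentioned above, generally means taking two redundant copies of the same sequence-numbered feed and forwarding whichever copy of each message arrives first. The Python sketch below shows that idea under simplifying assumptions: arrivals are given as one merged list in arrival order, each message carries a sequence number, and the `arbitrate` function and tuple layout are hypothetical. A real implementation would bound the duplicate-tracking window instead of keeping every sequence number.

```python
def arbitrate(arrivals):
    """Forward the first copy of each sequence number; drop later duplicates.

    `arrivals` is an iterable of (line, seq, payload) in arrival order,
    where `line` is "A" or "B".
    """
    seen = set()
    forwarded = []
    for line, seq, payload in arrivals:
        if seq in seen:
            continue  # the other line already delivered this message
        seen.add(seq)
        forwarded.append((seq, payload, line))
    return forwarded


arrivals = [
    ("A", 1, "x"),
    ("B", 1, "x"),   # duplicate of seq 1, dropped
    ("B", 2, "y"),   # line A lost or delayed seq 2, so B wins
    ("A", 3, "z"),
    ("B", 3, "z"),   # duplicate of seq 3, dropped
    ("A", 2, "y"),   # late copy of seq 2, dropped
]

for seq, payload, line in arbitrate(arrivals):
    print(seq, payload, "from line", line)
```

Doing this in the switch means the application receives a single de-duplicated stream instead of processing both lines itself.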
March 19, 2011
Developing on the Application Switch
Full Custom: Customer programs the FPGA subsystem. Arista provides software that validates HW and implements the FPGA for the 8 ports
Outsourced Custom: Customer outsources development to an Arista verified partner like Impulse-C or Enyx who develops the custom capability
Off the Shelf: Customer purchases a prebuilt application such as Exegy to run on the Application Switch
Application Switch Development Partners
Complete integrated appliance model
• Novasparks: 100% Hardware market data solution
• Exegy: Appliance based robust ticker plant
System integrators and development support
• Impulse C: C to RTL tools
• Enyx: Customer trading solutions and IP blocks
Arista Application Switch 7124FX
A new category of product that provides a network accelerated platform for high performance app vendors to develop on
Combining a true network switch, with full routing and switching protocols, with fully programmable hardware creates a new market for the most demanding applications
Application logic inserted into real-time environments with complete transparency