Transcript Slide 1
QNX Technology Overview
Kerry Johnson
July 18, 2015
Networking System Complexity
Fault Management
Remote
Network Management
Diagnostics
Health Monitoring
Alarm Detection &
Reporting
SNMP
TL1
Control Loops
Network Topology
and Restoration
G-MPLS
CR-LDP
OSPF
Performance Monitoring
IP Statistics
SONET/SDH
Local Craft
Interface
CLI
GUI
QNX Confidential. All content copyright QNX Software Systems.
Developing Complex Systems
Diverse system requirements
Management
Interfaces
> Rich feature set
> Demanding performance requirements
> Demanding availability requirements
Fault
Management
Large, multi-site development teams
> Geographic and time zone separation
Division of responsibilities,
functional areas and expertise
> Differing designer skill sets
Parallel development, followed by
system integration and verification
Restoration
The QNX development platform
QNX® Momentics® development suite
Develop
Deploy
QNX® Neutrino® RTOS
Debug
Analyze
Optimize
QNX Product Snapshot
QNX4
Wavemaker “Talk”
Achieving Scalability
Scaling the platform
> Distributed, multiprocessor system to accommodate required
capacity
> Processing-intensive nodes can accommodate growth through a
multiprocessor approach (symmetric multiprocessing systems)
Scaling the software
> Software architecture needs to accommodate distributed
environment
> Software needs to support parallel processing
Scaling development teams
> Building large scale systems is also a people issue
> Tools and development techniques are a key consideration
Distributed processing with a monolithic kernel
Application
Monolithic kernel provides
process management, timers,
scheduling, interrupt handling
In addition, many OS services
are implemented in the kernel
Application
Monolithic Kernel
> Network stack
Process Management, Scheduling
Timers, IPC
> File Systems
> Device drivers
TCP/IP
Stack
File
System
Ethernet
Driver
Flash
Driver
Ethernet
Port
Flash
Memory
PCI
Driver
PCI
Bus
Distributed processing only
applies to the application
level
> TCP/IP stack, file systems, device
drivers are not addressable by
external systems
Distributed processing with a micro kernel
Application
Micro kernel provides basic
process management, IPC,
interrupts, scheduling
Most OS services are
implemented as applications in
user space
Application
Micro Kernel
Process Management, Scheduling
Timers, IPC
> File systems
> Device drivers
TCP/IP
Stack
File
System
Ethernet
Driver
Flash
Driver
PCI
Driver
Ethernet
Port
Flash
Memory
PCI
Bus
> Network stack
Services exist as applications
that can be used either locally
or remotely
> External applications can address
OS services on a node
> Protection / control is provided
through a process interface
Distributed Processing
Internet
QNX
Networking
Stack
Application
Flash File
System
Transparent Distributed
Processing extends message
passing over a transport layer
Connectivity (e.g. Ethernet)
QNX Neutrino
Microkernel
Applications / services can be built in a fully distributed
manner without special code
Application
Node 2
> Networking stack
> File systems
Database
QNX Neutrino
Microkernel
Flash File
System
Application
Node 1
> Hardware ports
Seamless
sharing of I/O
resources between cores (e.g.
use flash memory located on
another node)
fd = open("/dev/ffs1",…);
open("/net/node2/dev/ffs1",…);
write(fd, …);
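The open()/write() fragment above can be fleshed out as a small POSIX C sketch. The helper name write_to_resource and the DEMO_MAIN macro are hypothetical, and the two paths are simply the ones shown on the slide. The point it illustrates is that the calling code does not change between a local path and a /net/node2/... path; only the string does. On a system without QNX networking, any ordinary file path can be passed to try it out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes from `buf` to the resource at `path`.
 * Returns 0 on success, -1 on failure. Nothing here depends on whether
 * `path` names a local resource or one reached through another node. */
int write_to_resource(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);   /* may write fewer bytes than asked */
        if (n == -1) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return close(fd);
}

#ifdef DEMO_MAIN
int main(int argc, char **argv)
{
    /* Defaults to the local path from the slide; pass the remote form,
     * e.g. /net/node2/dev/ffs1, as the first argument instead. */
    const char *path = argc > 1 ? argv[1] : "/dev/ffs1";
    const char msg[] = "hello\n";

    if (write_to_resource(path, msg, strlen(msg)) != 0) {
        perror(path);
        return 1;
    }
    return 0;
}
#endif
```

Building with -DDEMO_MAIN produces a small command-line program that takes the target path as its first argument.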
Scaling performance through multi-processing
Applications
Applications
Applications
OS
OS
OS
Core
Core 1
Core 2
Interconnect
Memory
Core 1
Core 2
Interconnect
I/O
Memory
Single core
I/O
Multi-core
Memory
I/O
Discrete
Multiprocessing
Processor scaling by adding CPUs and/or CPU cores
Software scaling through symmetric multiprocessing (SMP)
abstracts physical processor configuration from software
Transparent distributed processing abstracts processor
boundaries
Scaling development by partitioning
Management
Interfaces
Restoration
System development
• System designers allocate CPU
budget to subsystems /
development teams
• Design teams can develop their
own priority schemes to use
CPU time effectively
• Test under worst case load to
verify partition budgets
Partitioning OS
• Budgets enforced by RTOS
• Priority based scheduling within
partition
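The budgeting step described above can be sketched in C as a simple table check. This is only an illustration and not the RTOS partitioning interface: the struct, the function names, the DEMO_MAIN macro, and the 100 ms window are all hypothetical, and the pairing of subsystem names with the 10% / 40% figures is an assumption, since the slide does not state which budget belongs to which subsystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct partition {
    const char *name;
    int budget_percent;   /* share of CPU time guaranteed to the partition */
};

/* ASSUMPTION: the name-to-budget pairing below is illustrative only. */
const struct partition example_partitions[] = {
    { "Management Interfaces",  10 },
    { "Restoration",            40 },
    { "Fault Management",       10 },
    { "Performance Monitoring", 40 },
};

/* Returns 1 if every budget is within 0..100 and the budgets total 100. */
int budgets_valid(const struct partition *p, size_t n)
{
    assert(p != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i].budget_percent < 0 || p[i].budget_percent > 100)
            return 0;
        total += p[i].budget_percent;
    }
    return total == 100;
}

/* CPU time, in microseconds, guaranteed within one averaging window. */
long guaranteed_us(int budget_percent, long window_us)
{
    return window_us * budget_percent / 100;
}

#ifdef DEMO_MAIN
int main(void)
{
    const size_t n = sizeof example_partitions / sizeof example_partitions[0];
    const long window_us = 100000L;   /* hypothetical 100 ms window */

    printf("budgets valid: %d\n", budgets_valid(example_partitions, n));
    for (size_t i = 0; i < n; i++)
        printf("%-24s %3d%% -> %ld us per window\n",
               example_partitions[i].name,
               example_partitions[i].budget_percent,
               guaranteed_us(example_partitions[i].budget_percent, window_us));
    return 0;
}
#endif
```

A check like this fits the "test under worst case load" step: the table records what each team was allocated, and measured usage can be compared against the per-window figure.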
10%
40%
10%
40%
Fault
Management
Performance
Monitoring
Developing, Troubleshooting & Optimization
Visual system analysis
QNX Momentics
on development host
Display
Trace
Information
Target system running
QNX Neutrino
instrumented kernel
• Single step to capture, upload and
view system trace
• Quickly visualize system
interaction and behavior
System Profiler
• View interrupts, thread states,
event timing, CPU usage,
partitions, IPC, and much more
High Availability
Availability = MTBF / (MTBF + MTTR)
where MTBF = Mean Time Between Failure and MTTR = Mean Time To Repair
> Increase in MTBF increases availability
> Decrease in MTTR increases availability
> The formula above is simplified; other considerations:
Reducing the impact of a failure also increases availability
Parallel (redundant) components increase availability
Serial dependencies, where failure of one component causes
unavailability of dependent components, decrease availability
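The availability relationships above can be checked with a few lines of C. This is a minimal sketch that assumes component failures are independent, which the slide does not state; the function names, the DEMO_MAIN macro, and the sample figures in main are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Availability from MTBF and MTTR, both expressed in the same time unit. */
double availability(double mtbf, double mttr)
{
    assert(mtbf > 0.0 && mttr >= 0.0);
    return mtbf / (mtbf + mttr);
}

/* Serial dependency: the chain is up only while both components are up. */
double serial(double a, double b)
{
    return a * b;
}

/* Parallel redundancy: the pair is down only while both components are down. */
double parallel(double a, double b)
{
    return 1.0 - (1.0 - a) * (1.0 - b);
}

#ifdef DEMO_MAIN
int main(void)
{
    /* Hypothetical figures: 1000 h between failures, 1 h to repair. */
    double a = availability(1000.0, 1.0);

    printf("single component:     %.6f\n", a);
    printf("halved repair time:   %.6f\n", availability(1000.0, 0.5));
    printf("two in series:        %.6f\n", serial(a, a));
    printf("two in parallel:      %.6f\n", parallel(a, a));
    return 0;
}
#endif
```

Chaining serial() across every component a service depends on shows why a long dependency chain lowers availability, while parallel() shows the gain from a redundant unit.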
Reducing impact of failures
Application
Application
Application
Micro Kernel
Monolithic Kernel
TCP/IP
Stack
File
System
Ethernet
Driver
Flash
Driver
Ethernet
Port
Flash
Memory
TCP/IP
Stack
File
System
PCI
Driver
Ethernet
Driver
Flash
Driver
PCI
Driver
PCI
Bus
Ethernet
Port
Flash
Memory
PCI
Bus
Many serial dependencies
High impact of failure
Longer recovery time (system restart)
Application
Serial dependencies reduced
Lower impact of failure
Faster recovery time (restart component)
Automatic recovery to improve MTTR
Application
• Reduce downtime
• Restart / recover processes without
affecting higher layers
Application
Microkernel
CPM Guardian
TCP/IP
Stack
File
System
recover
restart
Ethernet
Driver
Flash
Driver
PCI
Driver
Ethernet
Port
Flash
Memory
PCI
Bus
monitor
notify
Critical Process
Monitor
Remote Node
• CPM / Guardian provide high availability
monitor
• Heartbeat monitoring
• Detect process deaths
• Restart failed processes
• Multi-stage recovery
• High availability connections provide
failure notification / recovery procedures
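The "detect process deaths" and "restart failed processes" steps above can be sketched with plain POSIX fork() and waitpid(). This is a generic illustration and not the CPM / Guardian interface; supervise, worker_fn, flaky_worker, and the DEMO_MAIN macro are hypothetical names, and heartbeat monitoring and multi-stage recovery are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A worker returns 0 on clean completion, non-zero on failure. */
typedef int (*worker_fn)(int attempt);

/* Run `worker` in a child process and restart it whenever it dies or
 * exits with a failure status, up to `max_restarts` restarts.
 * Returns the number of restarts that were needed, or -1 if the worker
 * never completed cleanly or a system call failed. */
int supervise(worker_fn worker, int max_restarts)
{
    assert(worker != NULL && max_restarts >= 0);

    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(worker(attempt));      /* child: run the monitored work */

        int status;
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return attempt;              /* clean exit: nothing to recover */
        /* Otherwise the child failed or was killed: loop and restart it. */
    }
    return -1;                           /* restart budget exhausted */
}

/* Sample worker for demonstration: fails on its first two attempts. */
int flaky_worker(int attempt)
{
    return attempt < 2 ? 1 : 0;
}

#ifdef DEMO_MAIN
int main(void)
{
    int restarts = supervise(flaky_worker, 5);
    printf("restarts needed: %d\n", restarts);
    return restarts < 0;
}
#endif
```

Because the worker runs in its own process, its failure is contained and only that process is restarted, which is the property the slide relies on to reduce MTTR.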
Partitioning to increase availability
Management
Interfaces
Restoration
Add partitioning to:
• Contain faulty subsystems
that cause CPU overload
• Guarantee CPU time for
failure recovery
• Guarantee CPU time for
user interfaces / remote
monitoring
35%
10%
10%
Performance
Monitoring
40%
5%
High Availability
Fault
Management
CPM Guardian
Critical Process
Monitor
Scalable and Available
Scalability through distributed multiprocessing
> Transparent Distributed Processing
> Multi-Core / multiprocessing support
> System visualization tools
Architecture for high availability
> Microkernel architecture and memory protection
> Critical Process Monitoring
> Adaptive Partitioning