No Slide Title - icalepcs 2005


Key Topics
 NIF Overview
 Distributed Component Architecture
 Distribution Complexity
 Connection Management
 Diagnostic Framework
 Shot Automation
 Commissioning Tools
 Software Framework Evolution
NIF Facility flyover
NIF is a stadium-sized facility that will contain
 a 192-beam, 1.8-megajoule, 500-terawatt ultraviolet laser system
 a 10-meter-diameter target chamber with room for nearly 100 diagnostics
Symmetry & nomenclature
National Ignition Facility
“Quads” and “Bundles” are the basic
building blocks of the NIF
The Integrated Computer Control System (ICCS)
for NIF is based on a scalable software framework
 Object-oriented Ada95, Java, CORBA
 Ada implements control system semantics
 Java implements GUI layer and COTS integration
 CORBA provides transparent language binding and distributed
communication using TCP/IP transport
 60,000 control points, 140,000 CORBA objects, 750 computers
Framework Approach
[Diagram: the Framework Toolkit (architecture & environment, application patterns, customizable software components, and common services such as configuration, logging, archive, etc.) is used to build the NIF Control System subsystems: operators, supervisory controls, bundle managers, automatic controls, and front-end processors, joined by "plug-in" interconnections over the distribution mechanism down to the LRU actuators, sensors, and devices.]
ICCS Distributed Component Architecture
[Diagram: Framework Server, GUI Navigator, Status & Control Supervisor, Shot Supervisor, and FEP (with attached device) processes communicate over CORBA.]
A framework layer (FWL) resides in each process

Process Kind                   Process Counts   CORBA Objects
Framework Server                           10             338
Status & Control Supervisor               219            3736
Shot Supervisor                           249            3864
Front-End Processors (FEP)                700          134916
GUI Navigator                              50             250
Totals                                   1228          143104
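The totals row can be cross-checked from the per-process-kind figures. The short Python sketch below simply re-adds the numbers listed on the slide; the variable names are illustrative and not part of ICCS.

```python
# Figures as listed on the slide: (process count, CORBA objects) per process kind.
process_table = {
    "Framework Server": (10, 338),
    "Status & Control Supervisor": (219, 3736),
    "Shot Supervisor": (249, 3864),
    "Front-End Processors (FEP)": (700, 134916),
    "GUI Navigator": (50, 250),
}

# Sum each column independently of the slide's own totals row.
total_processes = sum(count for count, _ in process_table.values())
total_objects = sum(objects for _, objects in process_table.values())

# Fraction of all CORBA objects that live in front-end processors.
fep_share = process_table["Front-End Processors (FEP)"][1] / total_objects

print(total_processes, total_objects, round(fep_share, 3))
```

Both sums agree with the slide (1228 processes, 143104 objects), and roughly 94% of the CORBA objects reside in the front-end processors.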
Common Application Process Architecture
[Diagram: each application process contains application objects, connection objects (CORBA over TCP), and framework agents for the common services (object factory, system manager, status monitor, message log, shot service, machine config, alerts, events, reservations, archive, history, name service), together with startup/shutdown, heartbeat, and a Diagnostic Agent (UDP).]
• Common startup and shutdown protocol
• Behavior of application processes is completely data driven
• Service distribution encapsulated by Framework Service APIs
Distribution Complexity
 Layered client-server computer architecture
 340 distinct CORBA interfaces
 700 Front-End Processors (PowerPC, x86, Sun) interface to
various sensors, actuators, instruments
 50 server-class computers (Sun) host supervisor software and
framework servers
 14 console PCs host Java GUIs in the control room
 Bundle-based hardware partitioning eliminates scaling risk by
bounding CORBA object populations and TCP connections
 CORBA provides transparent language binding (Java, Ada95) and
location independent inter-process communication
 Policies define interface de-coupling mechanisms and common
exception pattern
ICCS employs a component-based communication
architecture with connection management
 Decoupled inter-process communication reduces deadlock
potential
 Object reconnection allows transparent process restarts
 Subscription management restores all subscription services
 Process heartbeats verify process health
 Status health heartbeats provide positive status health feedback
 Timed remote invocations protect clients from problematic services
ICCS provides fault resilience – degraded operation in the
presence of server failure and recovery upon server restoration
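Two of the mechanisms listed above, timed remote invocations and process heartbeats, can be illustrated with a minimal Python sketch. This is not the ICCS implementation (which is built on Ada95, Java, and CORBA); the names `timed_invoke`, `InvocationTimeout`, and `HeartbeatMonitor` are hypothetical, and a local thread pool stands in for a remote service.

```python
import concurrent.futures
import time


class InvocationTimeout(Exception):
    """Raised when a call does not reply within its deadline (hypothetical name)."""


def timed_invoke(executor, func, *args, timeout_s=1.0):
    """Run func on the executor and stop waiting after timeout_s seconds.

    The caller is protected from a hung service; the worker itself is not killed.
    """
    future = executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise InvocationTimeout(f"no reply within {timeout_s} s") from None


class HeartbeatMonitor:
    """Records the last heartbeat time per process and reports silent ones."""

    def __init__(self, max_silence_s, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}

    def beat(self, process_name):
        self.last_seen[process_name] = self.clock()

    def stale(self):
        """Return the names of processes whose last beat is too old."""
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.max_silence_s)
```

A client would wrap each remote call in `timed_invoke` and treat `InvocationTimeout` as a trigger for reconnection, while a supervisor would poll `stale()` to decide which processes need attention.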
Diagnostic Architecture Framework
 “Out-of-CORBA-band” probes (UDP)
 System Level Diagnostics
 Network, I/O, memory, CPU
 SNMP based
 Process Level Diagnostics
 Diagnostic Agent embedded into ICCS process architecture
 Custom diagnostic objects register with the Diagnostic Agent
 Remotely activated and displayed
 Stored for off-line analysis
 Supports framework and application diagnostic classes
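The registration pattern described above can be sketched in a few lines of Python. This is an illustration only: the class and method names are assumptions, the probes are plain callables, and the UDP transport is reduced to producing a JSON payload that could be sent as a datagram.

```python
import json


class DiagnosticAgent:
    """Holds named diagnostic probes that can be switched on and off remotely."""

    def __init__(self):
        self._probes = {}     # probe name -> callable returning a dict of values
        self._active = set()  # names of probes currently activated

    def register(self, name, probe):
        """Called by a custom diagnostic object to make itself available."""
        self._probes[name] = probe

    def activate(self, name):
        if name not in self._probes:
            raise KeyError(f"unknown diagnostic: {name}")
        self._active.add(name)

    def deactivate(self, name):
        self._active.discard(name)

    def collect(self):
        """Sample every active probe; inactive probes cost nothing."""
        return {name: self._probes[name]() for name in sorted(self._active)}

    def report_bytes(self):
        """Encode the current samples as a payload suitable for one datagram."""
        return json.dumps(self.collect(), sort_keys=True).encode("utf-8")
```

Keeping activation separate from registration means a probe is only sampled when someone has asked for it, which matches the "remotely activated" behavior listed on the slide.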
Shot Automation Framework
 Requirements derived from NIF Early Light experience
 Intensive, year-long design and development effort
 Achieves NIF scale through bundle partitioning
 State machine guides operators through 10 well-defined shot cycle states
 Shot model describes (in data) subsystem activities and dependencies
 A workflow engine executes the shot model
 Calculated participation based on laser beam destination and diagnostic use
(only control components that are in the beam path)
 Supports error recovery
 Operators given Pause/Play, Stop, Manual Step, Retry, and Resume
Automation semantics
 Can de-participate non-performing non-critical components/segments
 Shot model editor provides flexibility to define different NIF operating
scenarios
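The idea of a shot model held in data and executed by a workflow engine can be sketched as follows. This is a simplified Python illustration, not the ICCS engine: the step names, the `run_shot_model` function, and the rule that non-participating steps are simply skipped are all assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_shot_model(model, actions, participating):
    """Execute the participating steps of a shot model in dependency order.

    model:         dict mapping each step to the steps it depends on (pure data)
    actions:       dict mapping each step to a callable that performs it
    participating: set of steps calculated to take part in this shot
    """
    executed = []
    # static_order() yields every step after all of its dependencies.
    for step in TopologicalSorter(model).static_order():
        if step not in participating:
            continue  # components outside the beam path are not controlled
        actions[step]()
        executed.append(step)
    return executed


# Hypothetical shot model: activities and dependencies described only in data.
shot_model = {
    "align_beams": [],
    "charge_power": ["align_beams"],
    "arm_diagnostics": ["align_beams"],
    "fire": ["charge_power", "arm_diagnostics"],
}
log = []
actions = {step: (lambda s=step: log.append(s)) for step in shot_model}

done = run_shot_model(shot_model, actions,
                      participating={"align_beams", "charge_power", "fire"})
```

Because the model is data, a different operating scenario is a different dictionary rather than a new software release; the engine itself does not change.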
Commissioning Tools facilitate NIF build-out
 Tools that enable efficient calibration/qualification of laser
components
 Framework utilizes existing device layer CORBA interfaces and
separates displays from algorithms via autonomous threads
 Contain complex algorithms that send commands to collections of
devices and aggregate/process data
 Results stored in a configuration database for use in integrated shot
operations
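The separation of displays from algorithms via autonomous threads can be illustrated with a small Python sketch. The device readers, the function names, and the use of a mean as the aggregate are assumptions for the example; in ICCS the devices would be reached through the existing device-layer CORBA interfaces.

```python
import queue
import statistics
import threading


def calibration_worker(devices, results):
    """Algorithm thread: command each device, then aggregate the readings."""
    readings = {name: read() for name, read in devices.items()}
    results.put({
        "readings": readings,
        "mean": statistics.mean(list(readings.values())),
    })


def run_calibration(devices, timeout_s=5.0):
    """Display side: start the algorithm and wait for its summary on a queue."""
    results = queue.Queue()
    worker = threading.Thread(target=calibration_worker, args=(devices, results))
    worker.start()
    summary = results.get(timeout=timeout_s)  # the display never calls devices
    worker.join()
    return summary


# Hypothetical devices, each represented by a callable that returns a reading.
summary = run_calibration({"sensor_a": lambda: 1.0, "sensor_b": lambda: 3.0})
```

The queue is the only coupling between the two sides, so the display stays responsive while the algorithm runs and the summary can be written to a configuration store afterward.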
ICCS Framework Evolution
 Developed over the past 4+ years
 Iterative process – requirements, design, and refinement
 April and Sept 2001 releases of Framework 1.0 and 2.0 contained common
services and templates
 April 2002 release of Framework 3.0 satisfied significant portions of
connection management and fault tolerant performance requirements
 Application layers built on Framework 3.x supported NIF Early Light
activation and experimental campaigns through summer of 2004
 Sept 2003 release of Framework 4.0 contained migration to new versions of
COTS (OS, Database, ORBs, compilers, CM systems)
 Nov 2004 release of Framework 4.1 contained Diagnostic, Shot Automation,
and Commissioning Tool Frameworks
 Jan 2006 release of Framework 5.0 contains additional commissioning tools
and refinements to Alert, Reservation, and Shot Automation
ICCS is positioned for June 2006 multi-bundle
milestone with framework release 5.0
 Connection Management minimizes impact of process failures
 Distribution complexity is mitigated through hardware partitioning
and de-coupled interface design patterns
 Model-driven shot automation provides flexibility to operate the NIF
in different ways
 Highly data-driven architecture allows modification of run-time
behavior without new software releases
 Run-time diagnostics for collecting and evaluating system
performance
 Commissioning tools help meet rapid laser construction schedule
 Rigorous manual and automated formal software testing program
removes >95% of software defects before deployment
Status of the use of Large-Scale CORBA Distributed Software Framework for
NIF Controls
ICALEPCS 2005
October 10-14, 2005
R. Carey, R. Bettenhausen, C. Estes, J. Fisher, J. Krammen, L. Lagin,
A. Ludwigsen, D. Mathisen, J. Matone, R. Patterson, C. Reynolds,
R. Sanchez, E. Stout, J. Tappero, P. VanArsdall
National Ignition Facility
Lawrence Livermore National Laboratory
The work was performed under the auspices of the U.S. Department of Energy
National Nuclear Security Administration by the University of California
Lawrence Livermore National Laboratory under Contract No. W-7405-ENG-48.
Test suite characterized detailed effects of
CORBA failure modes
[Diagram: two client/server stacks tested across the network. Stack 1: Java and Ada clients, servers, and objects; CORBA: Visibroker, OIS; transport: TCP/IP; OS: Solaris. Stack 2: Ada clients, servers, and objects; CORBA: OIS; transport: TCP/IP; OS: VxWorks.]
 Failures under different socket conditions:
 Server fails before/after initial client connection
 Client fails after server connection
 Failures during request processing:
 Server fails during processing
 Client fails during a request
 Client registers callback; client fails, restarts, and re-registers; server attempts client callback
 Client sends request to server; server hangs
CORBA failure modes are handled by
Connection Management Framework
Plug-and-play component architecture is
designed to scale up
[Diagram: ICCS core servers (configuration, archive, history, message log, alerts, events, reservations, shot setup), supervisor consoles (system manager, supervisor control units, graphical user interfaces), and front-end processors (device control, device reservation, device emulation, status monitor, controller interface, device history) plug into a software distribution bus (CORBA over network).]