Stable Data Communication Conditions
Assessment of Digital Equipment
for Safety and High Integrity
Applications – Session 4 of 6
Assessment of Reliability/
Dependability – COTS Components
Thuy Nguyen and Ray Torok
Joint IAEA - EPRI Workshop on
Modernization of Instrumentation and Control
Systems in NPPs
3 - 6 October, 2006
Vienna, Austria
Commercial off-the-Shelf (COTS)
Components are Attractive
• Many advantages
– Proven track record
– Lower vendor costs
– More available
– Opportunity to standardize
– Features
– …….
• However, for applications critical to safety or power
production, want assurance of high quality/dependability
• Problematic for digital equipment, even more so for COTS
• Don’t forget – other industries have this problem too
• The alternative, developing new equipment from scratch,
is even worse for safety and dependability
© 2006 Electric Power Research Institute, Inc. All rights reserved.
Review
- Digital “Issues”
• New behaviors and failure modes
• Greater complexity
• Human-machine interface
• Software (real-time)
– Quality
– Limited testability
– Common mode failure
– Flaws are ‘designed in’
• ‘Like-for-like’ replacement not generally possible
Assessment of COTS Components is
Problematic
• COTS components are usually “evolutionary”
– Variable development process
– Rely on expertise of individuals
– Variable documentation - not up to nuclear safety expectations
– Operating history used to detect/fix problems
– Still, the end product can be highly dependable
• Strong development process is considered important for digital equipment
• Vendor cooperation is needed to ‘look inside the box’ to understand design
features, defensive measures and failure modes
• Can’t (and don’t want to) force vendors to use nuclear safety
standards
• Want to find and credit all evidence of high dependability
Establishing Assurance Quality / Dependability
[Figure: level of assurance built up for nuclear grade versus commercial grade digital equipment. Labels for nuclear grade: nuclear vendor activities (vendor), experience, utility evaluation, install/test (utility). Labels for commercial grade: commercial vendor activities, additional activities (vendor), experience, supplemental activities (utility), utility evaluation, install/test. Both columns are shown against an "adequate level of assurance".]
Tests and Evaluations Do Not Add Quality,
They Seek to Confirm its Existence
• Environmental qualification – temperature, humidity,
seismic, electromagnetic compatibility, etc.
• Functional & challenge testing
• Review vendor processes & documentation
– Software development
– Configuration management
– Corrective actions
– Manufacturing
• Review and credit use of standards, third party
certifications as appropriate – TUV, IEC, IEEE, ISO, etc.
(with verification)
Tests and Evaluations, cont’d
• Operating history assessment (mostly non-nuclear)
– Relevance
– Extent
– Success
– Evidence / documentation
• Critical design review
– software/hardware architectures
– failure modes
– abnormal behaviors
• Grade effort based on complexity and safety significance
• Base judgment on preponderance of evidence
• Want “reasonable assurance” (there are no guarantees)
EPRI ‘COTS Guidelines’ for Digital
• EPRI TR-106439, Guideline on Evaluation and Acceptance of
Commercial Grade Digital Equipment for Nuclear Safety
Applications, October 1996
– Endorsed by NRC in SER, July 1997
• EPRI TR-107339, Evaluating Commercial Digital Equipment for High
Integrity Applications - A Supplement to EPRI Report TR-106439,
December 1997
– More detailed, ‘how-to’ guidance
• EPRI 1011710, Handbook for Evaluating Critical Digital Equipment
and Systems, November 2005
– Update based on lessons learned
Popular Components for Evaluation
Smart transmitter
Single loop controller
Positioners for air-operated valve
Circuit breaker trip controller
General Results of
EPRI Component Evaluations
• Evolutionary development
• Experienced development team
• Good manufacturing controls
• Successful operating history
• Software development documentation lacking
• “Continue to run” design philosophy
• Limited diagnostics
• Failed parts of EMC tests
Lessons Learned –
Selecting Devices and Vendors
• The purchase price is a small fraction of the overall cost for
qualification. (Don’t select device based on price)
• Establish acceptable failure modes and abnormal behaviors
before selecting candidate devices
• If possible, select simplest device that will do the job
• Costs for qualification will depend on:
– To what extent commercial testing and/or certifications can be
credited
– What is required to extend device capabilities beyond
commercial specifications (e.g. EMC filter)
– Complexity of the device
– Extent and relevance of device operating history
– Level of involvement and cooperation of device vendor
Lessons Learned – Project Planning
• Avoid special application requirements or configurations
not in accordance with manufacturer recommendations.
• Establish appropriate level of QA for control of device,
testing, and V&V of test equipment.
• Define and budget for mitigation efforts for problems that
may be encountered during testing.
• Establish method for maintaining qualification.
Lessons Learned –
Vendor/Device On-site Review
• Review vendor design and development documents before
the visit to streamline and focus the on-site review.
• Assure the review team has appropriate experience and
expertise.
• Expect critical design review (CDR) shortcomings and plan for compensation.
• Develop a matrix of the critical attributes and methods of
verification prior to the on-site review.
Lessons Learned –
EMC Qualification
• Investigate and credit (if possible) vendor testing to CE
Mark, European EMC Directives, etc.
• Assure test equipment is immune to expected EMI levels
for device qualification testing.
• Identify potential device vulnerabilities through informal
testing.
• Fully understand test laboratory capabilities and
expertise of personnel.
• Plan and budget for fixes as failures are encountered.
Evaluation of Programmable Logic Controller
(PLC) Platforms
• Apply the same COTS evaluation techniques
• Added complexity increases difficulty
• Vendor should take the lead
• Three platforms have been “pre-qualified” by US regulator
– Siemens Teleperm XS
– Invensys/Triconex Tricon
– Westinghouse Common Q
• Others are considering pre-qualification
Inter-Channel / Inter-System
Data Communications
Data Communication in Digital I&C Systems
• Advanced digital I&C architectures may feature data
communication between:
– Redundant divisions of I&C systems important to safety
– I&C systems of different safety classes
• Objective: improve error detection and fault tolerance
• May concern
– Digital upgrade of obsolete analog I&C systems
– Digital I&C in new plants
IEEE Standard 603-1998
• Standard Criteria for Safety Systems for Nuclear Power
Generating Stations
• Independence and physical separation between the
redundant channels of a safety system
– The failure of one channel cannot adversely affect the
ability of redundant channels to perform the
necessary safety functions
• Credible failures in, and consequential actions by, other
systems cannot adversely affect the ability of the safety
system to perform its intended safety functions
Data Communication and
Digital Common Cause Failures (CCF)
• Potential for digital CCF due to possible
– Failure of data communication links
– Uncommon (but correct) modes of data communication
links
• These could trigger concurrent digital failures of redundant
divisions or multiple systems
– Error propagation through data communication
• Identification of susceptibilities to digital CCF
– Diversity Guideline of BTP-19 and NUREG/CR-6303: 7
forms of diversity
– Complementary approach based on the analysis of
defensive measures (EPRI D3 technical report TR1002835)
Defensive Measures for Data Communication
• Fault-tolerant overall digital architecture
– Single failure criterion
– Multiple data communication links
• Defensive measures against CCF of multiple links
– One-way data communication gateways
• Reliable data communication links
– Prevention of data communication failures and CCF
– Stable data communication conditions
• Communicating stations tolerant to:
– Data communication link failures
– Transmission of erroneous data
Simplified Example of
Fault-Tolerant Overall Architecture
[Figure: simplified fault-tolerant overall architecture. Labels: two one-way gateways to lower safety classes; Voting & Priority Logic.]
Simplified Example of
Fault-Tolerant Overall Architecture - Cont’d
[Figure: four redundant divisions (Division A, Division B, Division C, Division D). The labels A, B, C, D and 1234 are repeated for each division.]
Preventing Data Communication Failure
• Application of rigorous development standards
– Low level of residual faults
• As few internal states as possible
– Facilitates testing and recovery
• Transparency to plant conditions
– Data communication links transparent to transmitted data values
– Stable data communication rates and conditions
• Protection against failures of communicating stations
– Station failures cannot affect the behavior of communication links,
apart from acknowledgement and transmission of station availability /
unavailability
• Detection & correction or signaling of data transmission errors
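The last bullet above (detection and signaling of transmission errors) can be illustrated with a small sketch. It assumes a CRC-32 checksum appended to each message. The slides do not name a specific mechanism, so the checksum choice and the function names `frame_message` and `check_message` are illustrative assumptions. The sketch covers detection and signaling only, not correction.

```python
import struct
import zlib


def frame_message(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (assumed framing)."""
    return payload + struct.pack(">I", zlib.crc32(payload))


def check_message(frame: bytes):
    """Return (payload, ok); ok is False if the frame is too short or the CRC differs."""
    if len(frame) < 4:
        return b"", False
    payload = frame[:-4]
    (received_crc,) = struct.unpack(">I", frame[-4:])
    return payload, zlib.crc32(payload) == received_crc


# Hypothetical usage: one intact frame and one with a single flipped bit.
good = frame_message(b"\x01\x02\x03\x04")
corrupted = bytes([good[0] ^ 0x01]) + good[1:]
```

A receiving station would discard any frame for which `ok` is False and signal the error, rather than use the payload.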
Preventing Data Communication CCF
• Different applications and operating
conditions
– Communicating stations, Data
messages, Cycle time, ...
– Influence conditions need to be
identified, and differences / similarities
need to be assessed
• Same data communication platform
– Design measures can be taken to
reduce the likelihood of CCF due to
faults in data communication platform
• Overall design, Software, Hardware
Stable Data Communication Conditions
• Deterministic cyclic functioning of communication links
– Fixed cycle time
– For each cycle, fixed number of messages of fixed length, of fixed
semantics, in a fixed order
• Fixed number and identity of communicating stations
– Station withdrawal and reinsertion do not affect the pre-determined
cyclic behavior
– Station states (availability / unavailability) transmitted at each cycle
• Fixed role for each communicating station
– With respect to each message (send / receive / ignore)
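The conditions listed above can be sketched as a static schedule that is walked identically on every cycle. The cycle time, station names, message identifiers and lengths below are hypothetical values chosen for illustration; the slides give no concrete figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    sender: str      # station with the fixed "send" role for this message
    message_id: int  # fixed semantics
    length: int      # fixed payload length in bytes


CYCLE_TIME_MS = 50  # hypothetical fixed cycle time
SCHEDULE = (        # hypothetical fixed messages, in a fixed order
    Slot("A", 1, 8),
    Slot("B", 2, 8),
    Slot("C", 3, 8),
    Slot("D", 4, 8),
)


def build_cycle(payloads: dict) -> list:
    """Build the frames for one cycle.

    A station missing from `payloads` is treated as unavailable. Its slot is
    still emitted, zero-filled and flagged, so the number, order and length
    of messages never change.
    """
    frames = []
    for slot in SCHEDULE:
        data = payloads.get(slot.sender)
        available = data is not None
        if available and len(data) != slot.length:
            raise ValueError(f"message {slot.message_id} must be {slot.length} bytes")
        body = data if available else bytes(slot.length)
        frames.append((slot.message_id, slot.sender, available, body))
    return frames
```

In this sketch the traffic pattern does not depend on plant data or on which stations are present, so withdrawing a station changes only its availability flag.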
Tolerance to Failures of
Data Communication Links
• Multiple communication links in diverse operating conditions
– Reflecting overall redundancy, separation and diversity in the
I&C architecture
• Identification and characterization of failure modes of communication
links
– Detection of communication link failures by stations
– Safety-classified stations can perform their safety functions or
reach safe state even when communication links fail
• Protection of stations against communication link failures
– Failures cannot affect station behavior beyond the required
actions
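One way a station might detect a link failure and fall back to a safe state is to count cycles without fresh data. This is a minimal sketch under stated assumptions: each message carries a sequence number, the limit `MAX_MISSED_CYCLES` is an arbitrary illustrative value, and the safe substitute for a remote trip signal is assumed to be "trip demanded". The appropriate safe state is application specific and is not given in the slides.

```python
from dataclasses import dataclass
from typing import Optional

MAX_MISSED_CYCLES = 3  # hypothetical tolerance before the link is declared failed
SAFE_VALUE = True      # assumption: "trip demanded" is the safe substitute value


@dataclass
class LinkMonitor:
    last_seq: Optional[int] = None
    last_value: bool = SAFE_VALUE
    missed: int = 0

    def step(self, seq: Optional[int], value: Optional[bool]) -> bool:
        """Call once per cycle; pass seq=None when nothing was received.

        Returns the value the station should use in its own logic.
        """
        fresh = seq is not None and seq != self.last_seq
        if fresh:
            self.last_seq, self.last_value, self.missed = seq, bool(value), 0
        else:
            # Nothing received, or a repeated (stale) sequence number.
            self.missed += 1
        if self.missed >= MAX_MISSED_CYCLES:
            return SAFE_VALUE  # link declared failed: use the safe substitute
        return self.last_value
```

The station's own processing continues in every case; the link failure only changes which input value is used.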
Tolerance to Transmission of Erroneous Data
• Plausibility checks of data received through
communication links
• Erroneous data caused by a single postulated failure
received through communication links cannot prevent a
safety classified station from performing its safety
functions
– May cause safe failures
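A plausibility check on received data might combine a range check with a rate-of-change check, as in the sketch below. The signal, its limits and the function name `is_plausible` are hypothetical; real limits would come from the process and sensor design.

```python
import math
from typing import Optional

# Hypothetical limits for a pressure signal, in bar.
RANGE_MIN, RANGE_MAX = 0.0, 200.0
MAX_STEP_PER_CYCLE = 5.0  # largest credible change between two cycles


def is_plausible(value: float, previous: Optional[float]) -> bool:
    """Return True if a received value passes the range and rate-of-change checks."""
    if math.isnan(value):
        return False
    if not (RANGE_MIN <= value <= RANGE_MAX):
        return False
    if previous is not None and abs(value - previous) > MAX_STEP_PER_CYCLE:
        return False
    return True
```

A station would mark an implausible value as invalid instead of using it, which may lead to a safe failure but should not prevent the safety function.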
Conclusion
• Appropriate defensive measures can provide reasonable
assurance that data communication between redundant
channels or safety / non-safety systems will not trigger
digital CCF
• Measures to be taken within the data communication
subsystems, within safety-classified stations, and at the
interface between communication subsystems and
stations