Environmental Information eXchange Network

Designing a Data Exchange: Best Practices
• Data Exchange Scenarios
– Sender vs. Receiver-initiated exchanges
– Node Design
• Best Practices:
– Handling Large Transactions
– State Management
– Data Services
– Data Validation
– Schema Design
Data Exchange Scenarios
[Diagram: a DATA PROVIDER (Database, Exchange Network Node) offers a Menu of Services (1. GetFacilities, 2. GetPermits, 3. GetProjects, 4. Get...) to DATA CONSUMER A, DATA CONSUMER B, and DATA CONSUMER C. Other labels shown: EPA CDX Node, Database, EPA, Desktop Software, Exchange Network Node, Web Site.]
• Data Synchronization Exchange: Submit to Data Consumer
• Data Publishing Exchange: Get from Data Provider
Requesting Data (1 of 3)
Simple Query
[Diagram: PARTNER A and PARTNER B exchange a Query followed by a Query Response]
– Synchronous process
– Ideal for small data sets
– Ideal for both ad hoc and planned exchanges
– Onus is on requestor to initiate exchange
Requesting Data (2 of 3)
Solicit with Download
[Diagram: PARTNER A and PARTNER B exchange a Solicit followed by a Solicit Response; ...time passes...; then a Download followed by a Download Response]
– Asynchronous process
– Good for larger datasets
– Data Provider can schedule processing of request
– Requestor can use “GetStatus” to see if data is ready yet
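The Solicit with Download sequence can be sketched as a polling loop. This is a minimal illustration rather than the actual node web service API: the `FakeProvider` class, its `solicit`, `get_status`, and `download` methods, the transaction id format, and the "Pending"/"Completed" status strings are all hypothetical stand-ins.

```python
import time


class FakeProvider:
    """Hypothetical in-memory stand-in for a data provider node."""

    def __init__(self, polls_until_ready=2):
        self._polls_until_ready = polls_until_ready
        self._polls = {}
        self._results = {}

    def solicit(self, service, params):
        # Accept the request and hand back a transaction id immediately.
        transaction_id = f"txn-{len(self._polls) + 1}"
        self._polls[transaction_id] = 0
        self._results[transaction_id] = (
            f"<Result service='{service}' state='{params[0]}'/>"
        )
        return transaction_id

    def get_status(self, transaction_id):
        # Pretend the provider finishes processing after a few polls.
        self._polls[transaction_id] += 1
        if self._polls[transaction_id] >= self._polls_until_ready:
            return "Completed"
        return "Pending"

    def download(self, transaction_id):
        return self._results[transaction_id]


def solicit_with_download(provider, service, params,
                          poll_seconds=0.01, max_polls=10):
    """Solicit, poll GetStatus until the data is ready, then Download."""
    transaction_id = provider.solicit(service, params)
    for _ in range(max_polls):
        if provider.get_status(transaction_id) == "Completed":
            return provider.download(transaction_id)
        time.sleep(poll_seconds)  # ...time passes...
    raise TimeoutError(f"{transaction_id} not ready after {max_polls} polls")


document = solicit_with_download(FakeProvider(), "GetFacilities", ["WA"])
print(document)
```

In a real exchange the poll interval would likely be minutes or hours rather than milliseconds, since the provider schedules the processing.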
Requesting Data (3 of 3)
Solicit with Submit
[Diagram: PARTNER A and PARTNER B exchange a Solicit followed by a Solicit Response; ...time passes...; then a Submit followed by a Submit Response]
– Asynchronous process
– Good for larger datasets
– Does not require the requestor to continuously poll the data provider to see if data is ready
Sending Data (1 of 2)
Simple Submit
[Diagram: PARTNER A and PARTNER B exchange a Submit followed by a Submit Response]
– Very simple and very common process
– Typical for traditional regulatory flows
– “Hides” data, since it is not exposed as a service
Sending Data (2 of 2)
Notify with Download
[Diagram: PARTNER A and PARTNER B exchange a Notify; ...time passes...; then a Download followed by a Download Response]
– Asynchronous approach to Simple Submit
– Receiver can perform download at the time of their own choosing
Data Exchange Scenarios
• Nodes wait for requests
• Nodes may initiate actions (e.g. Submit)
• How can a node do both?
Node Components
[Diagram: Example Node Architecture. Components shown: Flow A, Flow B, Flow Database, Request Processor, Web Services Interface, Node Database, Node Administration Utility, Internet.]
Node Components
Node can be divided into components, each playing a different role:
1. The Web Services Interface
• Acts as a listener for inbound requests and submissions
• Hosted on a Web Server (e.g. IIS, WebSphere)
• Should not do any heavy lifting (i.e. data processing)
Node Components (continued)
2. Request Processor
• Performs all data processing
– Composes XML files for outbound delivery
– Decomposes and processes inbound XML files
• Coupled with a scheduler component
– Enables node to process Solicit requests at a time of the node administrator’s choosing
– Automatically kicks off outbound processes (e.g. daily Submit)
• Flow agnostic
– Decoupled from specific flow implementations
• Ideally installed on an Application Server
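The split between a thin listener and a scheduled processor can be sketched with a work queue. This is an illustrative sketch only: the `Request`, `WebServicesInterface`, and `RequestProcessor` classes and their methods are hypothetical names, and a production node would use a persistent store (such as the node database) rather than an in-memory queue.

```python
import queue
from dataclasses import dataclass


@dataclass
class Request:
    method: str   # e.g. "Solicit" or "Submit"
    service: str  # data service name, e.g. "GetFacilities"
    payload: str  # parameters or an inbound XML document


class WebServicesInterface:
    """Listener: records the request and returns immediately."""

    def __init__(self, work_queue):
        self._work_queue = work_queue
        self._next_id = 1

    def receive(self, request):
        transaction_id = f"txn-{self._next_id}"
        self._next_id += 1
        # No data processing happens here; the request is only queued.
        self._work_queue.put((transaction_id, request))
        return transaction_id


class RequestProcessor:
    """Does the heavy lifting later, when the scheduler runs it."""

    def __init__(self, work_queue):
        self._work_queue = work_queue
        self.completed = {}

    def run_scheduled_batch(self):
        # Drain everything queued since the last scheduled run.
        while True:
            try:
                transaction_id, request = self._work_queue.get_nowait()
            except queue.Empty:
                break
            self.completed[transaction_id] = (
                f"processed {request.method} {request.service}"
            )
        return len(self.completed)


work_queue = queue.Queue()
interface = WebServicesInterface(work_queue)
processor = RequestProcessor(work_queue)

txn = interface.receive(Request("Solicit", "GetFacilities", "WA"))
pending_before = work_queue.qsize()
processor.run_scheduled_batch()
print(txn, pending_before, processor.completed[txn])
```

Because the listener only enqueues, the web server stays responsive even when a request triggers a long-running extract.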
Node Components (continued)
3. Node Administration Utility
– Create and manage local accounts
– Install new data exchange components
– Set processing schedules
– Audit Node activity
– Extract documents (inbound and outbound should be stored)
Node Components (continued)
4. Flow-specific components (Flow A, Flow B)
– Discrete components tailored for a specific data exchange
– Hot-swappable
– Services (interface) is generic
• Node configuration determines which services are internal or public
• Node configuration determines whether a given service is for Query or Solicit
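One way to picture a generic service interface driven by node configuration is a small registry. The sketch below is an assumption-laden illustration: the `Node` and `ServiceConfig` classes, the `install` and `invoke` methods, and the handler functions are hypothetical and not part of any Exchange Network specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ServiceConfig:
    handler: Callable[[List[str]], str]  # supplied by a flow component
    public: bool                         # False means internal only
    methods: frozenset                   # subset of {"Query", "Solicit"}


class Node:
    def __init__(self):
        self._services: Dict[str, ServiceConfig] = {}

    def install(self, name, handler, public, methods):
        # Installing or replacing a flow service does not change the node code.
        self._services[name] = ServiceConfig(handler, public, frozenset(methods))

    def invoke(self, name, method, params, external=True):
        config = self._services.get(name)
        if config is None:
            raise KeyError(f"unknown service: {name}")
        if external and not config.public:
            raise PermissionError(f"{name} is internal")
        if method not in config.methods:
            raise ValueError(f"{name} does not support {method}")
        return config.handler(params)


def get_facilities(params):
    # Hypothetical flow-specific handler.
    return f"<Facilities state='{params[0]}'/>"


def get_inspections(params):
    # Hypothetical flow-specific handler.
    return "<Inspections/>"


node = Node()
node.install("GetFacilities", get_facilities, public=True, methods={"Solicit"})
node.install("GetInspections", get_inspections, public=True, methods={"Query"})

result = node.invoke("GetFacilities", "Solicit", ["WA"])
print(result)
```

Changing a service from Solicit-only to Query-capable is then a configuration change (`methods={"Query", "Solicit"}`) rather than a change to the flow component.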
Node Components (continued)
[Diagram: Flow-to-Node Interface, connecting the Web Services Interface, Request Processor, and Node Admin Utility to Flow A, Flow B... Labels shown, in order: GetFacilities(params[]); Pass Thru (solicit); Pass Thru (submit (out)); GetInspections(params[]); Pass Thru (query); ProcessInboundData(XML); Internal (submit (in)).]
Large Transactions
• Can cause problems in several areas:
– Data retrieval (SQL)
– XML serialization (sender side)
– Transmission over Internet
– XML deserialization (receiver side)
– Schema validation (both sender and receiver)
Large Transactions
• Stage data in a model similar to that which is used by the schema
[Diagram: Source Database (Intranet), Firewall (SQL), Flow Database (DMZ), NODE]
– XML is hierarchical whereas RDBMS is relational
– More secure
– Source system unaffected by node operations
– Index query parameter fields
Large Transactions (continued)
• Use an asynchronous exchange
– Use Solicit, not Query
• Schema design considerations
– Schema KEY/KEYREF discouraged
– Element naming may significantly affect file size
<MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>
• Query “costing”
– Calculate the size of a given result set (e.g. COUNT(*)) before running full query.
– Not very much experience in this area
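Query costing with COUNT(*) can be sketched as follows. This is an illustrative example under assumptions: the `facility` table, its columns, the `MAX_QUERY_ROWS` threshold, and the rule of falling back to Solicit for large result sets are all hypothetical choices, and SQLite stands in for whatever database backs the flow.

```python
import sqlite3

MAX_QUERY_ROWS = 100  # hypothetical threshold; tune per flow


def cost_query(conn, state_code):
    """Return the number of rows the full query would produce."""
    row = conn.execute(
        "SELECT COUNT(*) FROM facility WHERE state_code = ?", (state_code,)
    ).fetchone()
    return row[0]


def choose_exchange(conn, state_code):
    """Pick Query for small result sets and Solicit for large ones."""
    if cost_query(conn, state_code) <= MAX_QUERY_ROWS:
        return "Query"
    return "Solicit"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facility (id INTEGER PRIMARY KEY, state_code TEXT)")
# Indexing the query parameter field keeps the count itself cheap.
conn.execute("CREATE INDEX idx_facility_state ON facility (state_code)")
conn.executemany(
    "INSERT INTO facility (state_code) VALUES (?)",
    [("WA",)] * 250 + [("OR",)] * 40,
)

print(choose_exchange(conn, "WA"))  # 250 rows exceeds the threshold
print(choose_exchange(conn, "OR"))  # 40 rows is within the threshold
```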
Large Transactions (continued)
• A well-designed flow can help avoid large transactions
– “List” services can return only high-level data
Scenario 1:
• RCRA.GetFacilities(“WA”)
Scenario 2:
• RCRA.GetFacilityList(“WA”)
• RCRA.GetFacilityDetail(“WA”, “FACID1234”)
– Data service parameters can be used to limit transaction size
Scenario 3:
• RCRA.GetFacilitiesByType(“WA”, “LQG”)
– All options affect schema design
Large Transactions (continued)
• File compression
– Zipping files can reduce file size by over 90%
• Compact storage (archiving)
• Significant reduction in time to transmit
• Disk I/O versus memory I/O
– If possible, avoid using techniques which require system to read entire document into memory in order to process. Toughie…
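Both points on this slide can be illustrated with the Python standard library: compressing a repetitive XML document, then reading it back through a streaming parser instead of building the whole tree. The document structure here (`Facilities`, `Facility`, `FacilityIdentifier`) is invented for the example, and the measured ratio depends entirely on the data, so it is printed rather than promised.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def build_document(count):
    """Build a deliberately repetitive XML document as bytes."""
    parts = ["<Facilities>"]
    for i in range(count):
        parts.append(
            f"<Facility><FacilityIdentifier>{i}</FacilityIdentifier>"
            "<MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>"
            "</Facility>"
        )
    parts.append("</Facilities>")
    return "".join(parts).encode("utf-8")


def count_facilities(stream):
    """Stream through the document, clearing each element once handled."""
    total = 0
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == "Facility":
            total += 1
            element.clear()  # release the children of the finished element
    return total


raw = build_document(5000)
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, compressed={len(compressed)} bytes, ratio={ratio:.3f}")

# Decompress and parse incrementally rather than loading one large string.
with gzip.open(io.BytesIO(compressed), "rb") as stream:
    facility_count = count_facilities(stream)
print(facility_count)
```

For brevity the compressed bytes are held in memory here; with real files, `gzip.open` on a path gives the same streaming behavior from disk. Note that `element.clear()` frees each record's children but the root still holds the emptied `Facility` elements, so very large documents may need the root cleared as well.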
State Management
• State Management is required any time two systems must be synchronized
• Contrast to Data Publishing exchange
• Typically the sender’s burden, but does not have to be
• Partial rejects compound the difficulty
State Management (continued)
• Flagging source data
– Set “submission status” indicator on source data
– Complexity is directly related to transaction granularity
– Compounded if record-level rejects are performed
[Diagram: two views of the levels Permit, Discharge Point, Parameter, Measurement]
Fine-Grain Transactions: INSERT, UPDATE, DELETE at each level, with GetPermits(), GetPipes(), GetParameters(), GetMeasurements()
Coarse-Grain Transactions: INSERT, UPDATE, DELETE with GetPermitDetails() and GetMeasurements()
State Management (continued)
• Exchange Network Header
– Same schema can be used to perform different transactions
– Can remove the need for TransactionCode (i.e. INSERT, UPDATE, DELETE) in schema
• “Delta” to derive data changes since last submit
– Many systems do not store deleted data
– Compare last submission snapshot with current snapshot, derive what has changed
• Incremental and full refresh services
– e.g. Facility Flow
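The snapshot comparison behind a "delta" can be sketched in a few lines. This is a minimal illustration, assuming each snapshot is a dictionary keyed by a stable record identifier; the `derive_delta` function and the sample facility records are hypothetical.

```python
def derive_delta(previous, current):
    """Compare two snapshots keyed by record id and classify the changes."""
    delta = {"INSERT": [], "UPDATE": [], "DELETE": []}
    for record_id, record in current.items():
        if record_id not in previous:
            delta["INSERT"].append(record_id)
        elif previous[record_id] != record:
            delta["UPDATE"].append(record_id)
    for record_id in previous:
        if record_id not in current:
            # Detected even though the source system kept no deleted row.
            delta["DELETE"].append(record_id)
    return delta


last_submission = {
    "FAC1": {"name": "Mill A", "state": "WA"},
    "FAC2": {"name": "Plant B", "state": "WA"},
    "FAC3": {"name": "Depot C", "state": "OR"},
}
current_snapshot = {
    "FAC1": {"name": "Mill A", "state": "WA"},             # unchanged
    "FAC2": {"name": "Plant B (renamed)", "state": "WA"},  # updated
    "FAC4": {"name": "Yard D", "state": "WA"},             # new
}

delta = derive_delta(last_submission, current_snapshot)
print(delta)
```

The sender has to persist the last accepted snapshot for this to work, which is part of why state management is typically the sender's burden.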
Data Services Best Practices
• Data service naming conventions
{Prefix}.{Action}{Object}[By{Parameter(s)}]
e.g. FacID.GetFacilityByName
• Work in Progress
• What about versioning?
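A service name following this convention can be split into its parts with a regular expression. The pattern below is a sketch built on assumptions the slide does not state: that the action is a single capitalized word (such as "Get") and that the optional parameter part begins with "By" followed by a capital letter. The `parse_service_name` function is hypothetical.

```python
import re

# {Prefix}.{Action}{Object}[By{Parameter(s)}]
SERVICE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."      # e.g. FacID
    r"(?P<action>[A-Z][a-z]+)"                  # assumed: one capitalized word
    r"(?P<object>[A-Za-z0-9]+?)"                # shortest match before "By"
    r"(?:By(?P<parameters>[A-Z][A-Za-z0-9]*))?$"
)


def parse_service_name(name):
    """Return the named parts of a data service name, or raise ValueError."""
    match = SERVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.groupdict()


by_name = parse_service_name("FacID.GetFacilityByName")
no_param = parse_service_name("RCRA.GetFacilities")
print(by_name)
print(no_param)
```

An object name that itself contains "By" followed by a capital letter would be split at that point, which is one reason a convention like this benefits from being documented explicitly.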
Data Services Best Practices
Documenting data services:
– Data Service name
– Whether the service is supported by Query, Solicit, or both
– Parameters
• Parameter Name
• Index (order)
• Required/Optional
• Minimum/Maximum allowed values
• Data type (string, integer, Boolean, Date…)
• Whether multiple values can be supplied to the parameter
• Whether wildcard searches are supported and default wildcard behavior
• Special formatting considerations
– Access/Security settings
– Return schema
– Special fault conditions
• Wildcards: %
• Parameter delimiter: | (pipe character)
• Parameter operation: AND
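The three conventions at the end of this slide can be illustrated with a small matcher. This sketch rests on one loudly flagged assumption: the slide says parameters are combined with AND, but it does not say how pipe-delimited values inside a single parameter combine, so the code treats them as alternatives (OR). The `wildcard_to_regex` and `matches` functions and the sample records are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a value that uses % as its wildcard into a compiled regex."""
    pieces = [re.escape(piece) for piece in pattern.split("%")]
    return re.compile("^" + ".*".join(pieces) + "$")


def matches(record, params):
    """True when the record satisfies every parameter (parameters are ANDed)."""
    for field, raw_value in params.items():
        # Assumption: pipe-delimited values in one parameter are alternatives.
        alternatives = [wildcard_to_regex(v) for v in raw_value.split("|")]
        if not any(rx.match(record.get(field, "")) for rx in alternatives):
            return False
    return True


facilities = [
    {"name": "Portland Mill", "state": "OR"},
    {"name": "Port Angeles Depot", "state": "WA"},
    {"name": "Seattle Yard", "state": "WA"},
    {"name": "Boise Plant", "state": "ID"},
]

params = {"name": "Port%|Sea%", "state": "WA"}
selected = [f["name"] for f in facilities if matches(f, params)]
print(selected)
```

Whatever behavior a flow actually adopts, it belongs in the data service documentation alongside the default wildcard behavior.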
Data Validation Best Practices
• XML instance files should be validated against the schema by the sender before submittal
• CDX offers pre-submittal validation services for some flows
• Schematron (Doug Timms)
Schema Design Best Practices
• DRC 1.0 and DRC 1.1
– Schema Namespace
– Schema Versioning
– Exchange Network Schema Types
– Use the Shared Schema Components