Reliable and Scalable Data Streaming in Multi-Hop

Download Report

Transcript Reliable and Scalable Data Streaming in Multi-Hop

Reliable and Scalable Data Streaming in
Multi-Hop Architecture
Sudhir Sangra, BMC Software
Lalit Shukla , BMC Software
www.cmgindia.org
Computer Measurement Group, India
1
Contents
• Introduction
• Challenges
• Approaches
– Acknowledge based
– Store and forward
– Distributed consumer model (Peer to Peer Acknowledge)
• Conclusion
• Questions
Computer Measurement Group, India
2
Introduction
• Multi-Hop architecture for various reasons but not
limited to
– Time to market - Buy decision – multiple component to
integrate
• Integration of the different component using different
technologies underline, to provide end to end solution
• Independent products performance and scalability needs to
be remained intact while new features are added to solve
the customer problem and provide value
– Multiple hop to increase scalability, failover, load balance
• Fully developed solution organically with technology parity
in end to end multi-hop architecture for streaming and
scalability but still there are challenges to handle when
middleware need to be stateless
Computer Measurement Group, India
3
Challenges
• Real time data streaming consumes resources on both
the end (provider and consumer) impact scalability
– Multi hop protocol difference
– No TCP-IP like protocol to handle guaranteed delivery in multihop environment
• Network bandwidth (Concerns when on WAN) specially
on Software as a Service model
– In-consistence detection trigger full data sync by provider
– Provider failover from one middleware to another middleware
initiated the full sync of data
• Priority data such as event data is treated as metric data
for transmission leading to delay in IT problem resolution
Computer Measurement Group, India
4
System behavior
The end to end system performance used to be function of the state of middle
ware. Analytics resource requirement (CPU/Memory) were reflecting the spikes
and used to take long time to settle after fixing the fault at the middle ware or
network
Computer Measurement Group, India
5
System behavior
Database Performance was also a function of middleware state and network. Abrupt
failure of any components were very expensive for DB operations.
Computer Measurement Group, India
6
Approach
• There could be various approaches to provide
reliability and scalability for data streaming but we
will be focusing in context of performance and
monitoring domain where data can be categorized in
3 different sets
– Event
– Performance metric
– Discovered resource instance
Computer Measurement Group, India
7
End to End Acknowledge Model Approach
• For each message sent, acknowledge is needed for reliability delivery, in
case acknowledge is not received message is retransmitted.
– Provide and consumer is sole responsible , middleware is stateless
– Message are not discarded from provider until consumer sends an ACK or time to
live timer expires
• Instead of network layer TCP/IP protocol, application protocol is devised to
handle the sequencing and ACK’ing as provider and consumer are not
directly connected over socket
• Round trip to receive ACK message is used to determine the time interval
for retransmission of message. This allows to take the server load in
consideration
• Dynamic sliding window to handle the bust of messages
– Message is added to window if window is empty and also added to disk cache
– In case window is full, message is written to disk cache
• Discovered resources instances messages are best suitable for this model
as
– No tolerance in data loss
Computer Measurement Group, India
8
End to End Acknowledge Model Approach
Computer Measurement Group, India
9
Store and Forward Model Approach
• In case of network glitch, server under maintenance mode or
server unavailability due to unknown reasons, data should not
be lost
– Application layer is transparent to these network issues
– Virtual socket abstract the file system and TCP/IP socket to peer
– Reestablishment of the connection via same hop or different hop in
case of fail over will not re-send the whole state but only delta which
is generated after the link was broken increase scalability
• Basically suitable for “performance metric” for which
acknowledgment is not desired because
– Huge data
– Some % of tolerance is acceptable for data loss
• Messages such as discovered resources instance piggy back
on this model but does not degrade the model because
change in system characteristics/resources are not frequent
Computer Measurement Group, India
10
Store and Forward Model Approach
Computer Measurement Group, India
11
Distributed Data Consumer Model Approach
• Event messages are highest priority message. Loss in
these messages leads to business loss
• These message requires high grade reliability and
message loss which are in transition is not acceptable
• Peer – to – Peer acknowledgment distributed consumer
model is best suited for these messages
–
–
–
–
Hop connected on TCP/IP socket is responsible for acknowledge
100 % fault tolerant and reliability
Each hop is designated as processing hop or presentation hop
Event processing happens on each hop which reduces the
network bandwidth and/or resources needed on end consumer
to process the message because pre-processing of message is
already done
Computer Measurement Group, India
12
Distributed Data Consumer Model Approach
Computer Measurement Group, India
13
System behavior
Process pronet_cntl
Sun 01/26/14 10:00 AM to Mon 01/27/14 10:00 AM
The system resources utilization is now uniformly distributed and no longer remains
the function of faulty middleware states.
Computer Measurement Group, India
14
Conclusion
• Categorizing the data and applying the relevant
model provided a significant improvement
– The architecture has increased the number of allowed
attributes under analytics from 1.2 M to 1.7 M, which is
nearly 30% more than the traditional integration method
of pulling data from the source nodes with no or minimal
data loss
– Number of device supported went from 200 to 1000 which
is 5 X improvement with no or minimal data loss
– The architecture became stateless and thus could fit to
extend linearly with a mix of distributed analytics and
distributed middleware
Computer Measurement Group, India
15
Questions
?s
Computer Measurement Group, India
16