1 - Lyle School of Engineering
IIIT Allahabad
VoIP Data
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275, USA
[email protected]
Support provided by Fulbright Grant and IIIT Allahabad
• VoIP overview
• CDR
• CDR Example using EMM
VoIP Data Outline
VoIP Overview
http://www.voipmechanic.com/what-is-voip.htm
• Travel: service can be used from any location with an internet connection
• Cost reduction
• Additional Features: Voice messages, call forwarding, logs,
caller ID, …
• Integration of business tools
• Common network infrastructure
VoIP Advantages
• Needs a reliable broadband internet connection
• Voice quality depends on network conditions (packet loss, delay, jitter)
VoIP Disadvantages
• An Analog Telephone Adapter (ATA) converts the analog phone call to a digital signal.
• The digital signal is sent over the internet as data packets.
• At the destination, the packets are converted back to an analog signal.
Telephone-VoIP Steps
• Software on server or ATA that converts voice signal into
digital data.
• COmpressor – DECompressor
• COder – DECoder
• Sample (8000, 24000, 32000 times per second)
• Sort
• Compress
• Packetize
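The sample, compress, and packetize steps can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the textbook mu-law companding formula as a stand-in for a real codec, a 20 ms packet size that is not taken from the slides, and hypothetical function names. The "sort" step is not modeled.

```python
import math

SAMPLE_RATE = 8000   # samples per second (the lowest rate listed above)
FRAME_MS = 20        # assumed packet duration; not specified in the slides
MU = 255             # mu-law companding constant


def sample_tone(freq_hz, duration_ms, rate=SAMPLE_RATE):
    """Sample a sine tone, standing in for the analog voice signal."""
    n = rate * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def mu_law_encode(x, mu=MU):
    """Compress one sample in [-1, 1] into an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * 255))


def mu_law_decode(code, mu=MU):
    """Expand an 8-bit code back to an approximate sample in [-1, 1]."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def packetize(codes, rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Split the compressed codes into fixed-duration payloads."""
    size = rate * frame_ms // 1000
    return [bytes(codes[i:i + size]) for i in range(0, len(codes), size)]


samples = sample_tone(440.0, 100)             # 100 ms of a 440 Hz tone
codes = [mu_law_encode(s) for s in samples]   # one byte per sample
packets = packetize(codes)                    # 160-byte payloads
```

At 8000 samples per second and one byte per sample, each 20 ms payload carries 160 bytes; the receiving side would apply `mu_law_decode` to recover an approximation of the original samples.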
VoIP Codec
• SIP (Session Initiation Protocol)
  • Signaling to set up and tear down sessions
• SDP (Session Description Protocol)
  • Describes the call
• RTP (Real-time Transport Protocol)
  • Media transport: exchanges the data/voice packets
Protocols
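As an illustration of how RTP carries voice payloads, the sketch below packs and parses the fixed 12-byte RTP header (version, marker, payload type, sequence number, timestamp, SSRC). The function names and the sample field values are hypothetical; payload type 0 is the static assignment for PCMU (G.711 mu-law) audio.

```python
import struct

RTP_HEADER = "!BBHII"   # network byte order, 12 bytes in total


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0, marker=False):
    """Prepend a minimal RTP header (no padding, extension, or CSRCs)."""
    byte0 = 2 << 6                                    # version 2
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(RTP_HEADER, byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet):
    """Decode the fixed header fields of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack(RTP_HEADER, packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical values: one 20 ms payload of 160 bytes of 8 kHz audio.
packet = build_rtp_packet(bytes(160), seq=1, timestamp=160, ssrc=0x1234ABCD)
fields = parse_rtp_header(packet)
```

The sequence number lets the receiver detect lost or reordered packets, and the timestamp lets it play the samples back at the right time.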
• Setup
• Connect
• Disconnect
• Syntax similar to HTTP
• Bind to IP address using SIP registration
• URLs for address format: [email protected]
• Independent of application or data types
• Uses RTP and SDP
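To make the HTTP-like syntax concrete, here is a minimal sketch that assembles a SIP INVITE request with an SDP body as plain text. All addresses, tags, the branch value, and the Call-ID are hypothetical placeholders, and the helper names are illustrative; a real user agent would generate these values and handle the responses.

```python
def build_sdp(user, ip, rtp_port):
    """Describe one audio stream (PCMU over RTP) in SDP."""
    lines = [
        "v=0",
        f"o={user} 2890844526 2890844526 IN IP4 {ip}",
        "s=VoIP call",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller, callee, call_id, sdp):
    """Build a SIP INVITE: request line, headers, blank line, body."""
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        "Via: SIP/2.0/UDP client.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag=1928301774",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


sdp = build_sdp("alice", "192.0.2.10", 49170)
invite = build_invite("alice@example.com", "bob@example.com",
                      "a84b4c76e66710@client.example.com", sdp)
head, body = invite.split("\r\n\r\n", 1)
```

The request line and `Name: value` headers mirror HTTP, while the SDP body tells the callee where to send the RTP audio and which codec to use.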
SIP
SIP Overview
http://www.voipmechanic.com/sip-basics.htm
VoIP Data Packet [4]
• Any of this digital data could be saved and analyzed.
• Typically only statistical/summary information about the calls is saved.
• These Call Detail Records (CDRs) are used for billing and analysis.
VoIP Data
• Log of VoIP usage
• May be by account
• Typical attributes:
  • Source
  • Destination
  • Duration of call
  • Amount billed
  • Total usage time in billing period
  • Remaining time in billing period
  • Total charge in billing period
• The format of the CDR varies among VoIP providers or
programs. Some programs allow CDRs to be configured by the
user.
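As a rough illustration of what such a record might look like, the sketch below parses CSV-formatted CDRs and derives the per-account totals. The column names, phone numbers, and amounts are hypothetical; as noted above, real CDR layouts vary by provider.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallDetailRecord:
    source: str
    destination: str
    duration_s: int        # duration of call, in seconds
    amount_billed: float


# Hypothetical sample data; not taken from any real provider.
SAMPLE_CSV = """source,destination,duration_s,amount_billed
2145550101,9725550123,125,0.42
2145550101,4695550188,0,0.00
9725550123,2145550101,60,0.20
"""


def load_cdrs(text):
    """Parse CSV text into a list of CallDetailRecord objects."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        CallDetailRecord(row["source"], row["destination"],
                         int(row["duration_s"]), float(row["amount_billed"]))
        for row in reader
    ]


def usage_by_source(records):
    """Total usage time and total charge per source account."""
    totals = defaultdict(lambda: [0, 0.0])
    for rec in records:
        totals[rec.source][0] += rec.duration_s
        totals[rec.source][1] += rec.amount_billed
    return {src: (secs, charge) for src, (secs, charge) in totals.items()}


records = load_cdrs(SAMPLE_CSV)
summary = usage_by_source(records)
```

Here `summary` holds the "total usage time" and "total charge" attributes computed from the individual call rows rather than stored in each record.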
Call Detail Record
• Usually created by a dedicated Authentication, Authorization, and Accounting (AAA) server.
• May also be created by the logging capabilities of a gateway or router, using syslog server software.
• Normally in plain CSV format.
• Normally sent over UDP, so the underlying data packets are not sequenced and may be lost. (Redundant servers can help.)
• Timestamps between routers can be synchronized using the Network Time Protocol (NTP).
• A CDR is generated for both the forward and the return leg of a call.
• http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
CDR Generation [3]
• VoIP traffic logged at Cisco's Richardson, Texas facility from Mon Sep 22 12:17:32 2003 to Mon Nov 17 11:29:11 2003.
• Over 1.5 million call trials were logged.
• 272,646 connected calls
• 66 attributes including source, destination, starting time, duration, routing/switching, device, etc.
• Application: Anomaly Detection (Classification)
• Goal: Find unusual call patterns based on type and time
of call
• Technique: New data structure, New classification
algorithm, New visualization technique
• Sample of raw csv data:
http://lyle.smu.edu/~mhd/iiit/start.csv
Example: CISCO CDR Data
• Remove the attributes other than source, destination, starting
time, duration from the logs.
• Count the connected calls and discard unconnected calls.
• The total number of connected calls was 272,646.
• 5 phone classes: internal, local, national, international, unknown.
• 25 link classes (source class + destination class)
• Data is aggregated into 15 minute time intervals.
• The total number of time points is 5422 and the total number
of attributes is 26.
• Add two attributes, namely, type of day (workday or weekend)
and time of the day, to the processed data. This step gives a
spatio-temporal cube in the model space.
• http://www.engr.smu.edu/~mhd/7331f08/CISCOEMM.xls
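The aggregation steps above can be sketched as follows. This is a simplified illustration, not the preprocessing code used for the Cisco data: it assumes each connected call has already been assigned a source and destination phone class, and the function and variable names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from itertools import product

PHONE_CLASSES = ["internal", "local", "national", "international", "unknown"]
LINK_CLASSES = list(product(PHONE_CLASSES, PHONE_CLASSES))   # 25 pairs


def interval_start(ts):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 15, second=0, microsecond=0)


def aggregate(calls):
    """calls: iterable of (start_time, source_class, destination_class).

    Returns one row per non-empty interval:
    (interval start, 25 link-class counts, type of day, time of day).
    """
    bins = defaultdict(Counter)
    for start, src, dst in calls:
        bins[interval_start(start)][(src, dst)] += 1
    rows = []
    for t in sorted(bins):
        counts = [bins[t][link] for link in LINK_CLASSES]
        day_type = "weekend" if t.weekday() >= 5 else "workday"
        rows.append((t, counts, day_type, t.strftime("%H:%M")))
    return rows


# Hypothetical connected calls.
calls = [
    (datetime(2003, 9, 22, 12, 17, 32), "internal", "local"),
    (datetime(2003, 9, 22, 12, 20, 0), "internal", "local"),
    (datetime(2003, 9, 22, 12, 31, 0), "local", "national"),
    (datetime(2003, 9, 27, 9, 0, 0), "internal", "internal"),
]
rows = aggregate(calls)
```

Intervals with no calls are simply absent from `rows` in this sketch; a full time series would fill them with zero vectors.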
CISCO Preprocessing
CISCO Data Visualization
http://www.lyle.smu.edu/~mhd/7331f11/CiscoEMM.png
• Records may arrive at a rapid rate
• High volume (possibly infinite) of continuous data
• Concept drift: the data distribution changes on the fly
• Data do not necessarily fit any distribution pattern
• Multidimensional
• Temporal
• Spatial
• Data are collected in discrete time intervals
• Data are in a structured format, <a1, a2, …>
• Data hold an approximation of the Markov property
Spatiotemporal Stream Data
• Events arrive in a stream
• At any time, t, we can view the state of the problem as represented by a vector of n numeric values:
Vt = <S1t, S2t, ..., Snt>

        V1     V2     …     Vq
S1      S11    S12    …     S1q
S2      S21    S22    …     S2q
…       …      …      …     …
Sn      Sn1    Sn2    …     Snq
        Time →

Spatiotemporal Environment
Data Stream Modeling
• Single pass: each record is examined at most once
• Bounded storage: limited memory for storing the synopsis
• Real-time: per-record processing time must be low
• Summarization (synopsis) of data
• Use the data, NOT a SAMPLE
• Temporal and spatial
• Dynamic
• Continuous (infinite stream)
• Learn
• Forget
• Sublinear growth rate: clustering
A first-order Markov chain is a finite or countably infinite sequence of events {E1, E2, …} over discrete time points, where Pij = P(Ej | Ei), and at any time the future behavior of the process depends solely on the current state.
A Markov Model (MM) is a graph with m vertices or states, S, and directed arcs, A, such that:
• S = {N1, N2, …, Nm}, and
• A = {Lij | i ∈ {1, 2, …, m}, j ∈ {1, 2, …, m}}, and each arc Lij = <Ni, Nj> is labeled with a transition probability Pij = P(Nj | Ni).
MM
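One common way to obtain the transition probabilities Pij is to count observed transitions in a state sequence and normalize each row. The sketch below illustrates that idea with a hypothetical sequence of node labels; it is an illustration of the definition, not code from the cited work.

```python
from collections import Counter, defaultdict


def estimate_transitions(states):
    """Estimate Pij = P(Nj | Ni) from consecutive pairs in a state sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return {
        i: {j: c / sum(row.values()) for j, c in row.items()}
        for i, row in counts.items()
    }


# Hypothetical observed sequence of states.
sequence = ["N1", "N1", "N2", "N1", "N2", "N2"]
P = estimate_transitions(sequence)
```

Each row of `P` sums to 1, and only the current state is needed to look up the distribution over the next state, which is the first-order Markov property.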
• Time Varying Discrete First Order Markov Model
• Nodes are clusters of real world states.
• Learning continues during application phase.
• Learning:
• Transition probabilities between nodes
• Node labels (centroid/medoid of cluster)
• Nodes are added and removed as data arrives
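A minimal sketch of the learning step might look like the following. It makes several assumptions that are not from the slides: Euclidean distance, a fixed distance threshold for deciding when to create a new node, running-mean centroids as node labels, and hypothetical class and method names. Node removal (forgetting) is omitted.

```python
import math
from collections import defaultdict


class EMMSketch:
    """Toy extensible Markov model: nodes are clusters, arcs carry counts."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed clustering threshold
        self.centroids = []                   # node labels (cluster centroids)
        self.node_counts = []                 # number of inputs per node
        self.transitions = defaultdict(int)   # (from_node, to_node) -> count
        self.current = None                   # node of the previous input

    def learn(self, vector):
        v = [float(x) for x in vector]
        node, best = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(v, c)
            if d < best:
                node, best = i, d
        if node is None or best > self.threshold:
            # No existing node is close enough: add a new node.
            self.centroids.append(v)
            self.node_counts.append(0)
            node = len(self.centroids) - 1
        else:
            # Update the matched node's centroid as a running mean.
            n = self.node_counts[node]
            old = self.centroids[node]
            self.centroids[node] = [(c * n + x) / (n + 1) for c, x in zip(old, v)]
        self.node_counts[node] += 1
        if self.current is not None:
            self.transitions[(self.current, node)] += 1
        self.current = node
        return node

    def transition_probability(self, i, j):
        out = sum(c for (a, _), c in self.transitions.items() if a == i)
        return self.transitions.get((i, j), 0) / out if out else 0.0


emm = EMMSketch(threshold=1.5)
stream = [
    (18, 10, 3, 3, 1, 0, 0), (17, 10, 2, 3, 1, 0, 0), (16, 9, 2, 3, 1, 0, 0),
    (14, 8, 2, 3, 1, 0, 0), (14, 8, 2, 3, 0, 0, 0), (18, 10, 3, 3, 1, 1, 0),
]
path = [emm.learn(v) for v in stream]
```

Each input is examined once, and the model grows only when an input falls outside every existing cluster, so the number of nodes can grow more slowly than the number of inputs.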
Extensible Markov Model (EMM)
EMM Creation
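[Figure: EMM creation example. Input vectors shown: <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, <18,10,3,3,1,1,0>. The vectors are clustered into nodes N1, N2, and N3, and the arcs between nodes are labeled with transition ratios such as 1/1, 1/2, 1/3, 2/2, and 2/3.]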
• The EMMRare algorithm indicates whether the current input event is rare. Using a threshold occurrence percentage, the input event is determined to be rare if either of the following holds:
• The frequency of the node at time t+1 is below this threshold
• The updated transition probability of the Markov chain transition from the node at time t to the node at time t+1 is below the threshold
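The two conditions can be sketched as a small check over node and transition counts. This is an illustrative reading of the rule, not the published algorithm: the function name, the dictionary layout, and the sample counts are hypothetical, and the counts are assumed to already include the current event.

```python
def is_rare(node_counts, transition_counts, prev_node, cur_node, threshold):
    """Return True if the event that moved prev_node -> cur_node is rare.

    node_counts: {node: number of times the node was visited}
    transition_counts: {(from_node, to_node): number of transitions}
    threshold: occurrence threshold expressed as a fraction, e.g. 0.05
    """
    total = sum(node_counts.values())
    node_frequency = node_counts.get(cur_node, 0) / total if total else 0.0

    outgoing = sum(c for (a, _), c in transition_counts.items() if a == prev_node)
    transition_probability = (
        transition_counts.get((prev_node, cur_node), 0) / outgoing if outgoing else 0.0
    )
    return node_frequency < threshold or transition_probability < threshold


# Hypothetical counts after 100 inputs.
node_counts = {"N1": 50, "N2": 48, "N3": 2}
transition_counts = {
    ("N1", "N1"): 30, ("N1", "N2"): 19, ("N1", "N3"): 1,
    ("N2", "N2"): 47, ("N2", "N1"): 1,
}

rare_node = is_rare(node_counts, transition_counts, "N1", "N3", 0.05)
common = is_rare(node_counts, transition_counts, "N1", "N2", 0.05)
rare_transition = is_rare(node_counts, transition_counts, "N2", "N1", 0.05)
```

The third case shows why both tests are needed: N1 is a frequently visited node, yet reaching it from N2 is unusual, so the event is still flagged.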
EMMRare
Sublinear Growth Rate
Rare Event in Cisco Data
1. VoIP Mechanic, “What is VoIP?, a tutorial,” http://www.voipmechanic.com/what-is-voip.htm
2. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.
3. Cisco, “CDR Logging Configuration with Syslog Servers and Cisco IOS Gateways,” Document ID: 14068, February 24, 2006, http://www.cisco.com/en/US/tech/tk1077/technologies_tech_note09186a0080094e72.shtml
4. Cisco, “Voice Over IP – Per Call Bandwidth Consumption,” Document ID: 7934, February 2, 2008, http://www.cisco.com/en/US/tech/tk652/tk698/technologies_tech_note09186a0080094ae2.shtml
5. “VoIPThink,” http://www.en.voipforo.com, accessed February 1, 2012.
6. Jie Huang, Yu Meng, and Margaret H. Dunham, “Extensible Markov Model,” Proceedings IEEE ICDM Conference, November 2004, pp 371-374.
7. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.)
8. Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol 1, No 3, June 2006, pp 43-50.
9. Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal,” Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol 3918, 2006, Springer Berlin/Heidelberg, pp 750-754.) (Extended version submitted to Journal of Computers.)
10. Yu Meng, Margaret Dunham, Marco Marchetti, and Jie Huang, “Rare Event Detection in a Spatiotemporal Environment,” Proceedings of the IEEE Conference on Granular Computing, May 2006, pp 629-634.
References