Traffic Clusters in Networks of Convenience

Ron McLeod, PhD. (Candidate)
Director - Research and Corporate Development
Telecom Applications Research Alliance
(TARA)
FloCon 2009
Who is TARA
• Private consortium of 35 member companies and
research institutions all working in IT/Telecom.
• Most active investor in early stage IT companies in
Atlantic Canada.
• Senior Partners include:
– Bell Aliant
– Cisco Systems Canada
– Nortel Networks
• We are actively seeking Research collaborations
The Project
TARA has partnered with a group of companies in a multi-year project
to analyze the outbound and inbound traffic in Networks of
Convenience.
The specific companies and specific objectives of The Project remain
confidential at this point.
However, from an analysis perspective we are first interested in
understanding the nature of this traffic.
Data sources are real traffic captures from hotels, airports and general
hotspots from around the world.
The Project
Networks of convenience are a relatively new and rapidly growing
sector of the ISP community.
These are networks that serve a transient population.
The provider is compensated either by fees charged to end users, or by
the hosting organization which absorbs the cost as overhead.
The networks may be wired, typically using Ethernet, or wireless
(802.11).
Relatively little is known about the ways in which these networks are
used.
The Project
We believe that Networks of Convenience may be used by criminals
and/or terrorists in attempts to conceal their activities, identities, or both.
Networks of convenience are the “payphones” of the twenty-first
century. Users of these networks take advantage of the implicit
anonymity that comes with their use.
We do not know how common other forms of malicious activity may be
in these networks.
The Project
Network traffic characterization approaches in the past have relied on
the availability of stable data in an environment of perfect information.
An analyst could have access to static IP and MAC address databases
or DHCP lease logs that could be used to collate traffic to specific
origins such as identifiable workstation/user combinations, servers or
other network attached devices.
In this environment, normal-versus-anomalous behaviour models could
be used to profile network and user behaviour to detect misuse or
anomalous behaviour such as masquerade attacks or worm propagation.
Data Gathering
Since the sources tend to be NAT’ed, we use network taps
on the interfaces inside of the edge router. We are currently
capturing inbound and outbound data separately.
Prior to analysis, full packet captures are first converted to
primitive flows.
Our research is focused on flow-level analysis, but this
conversion also helps to allay providers’ concerns for their
customers’ privacy (i.e. we don’t look at your data, only the
packet headers).
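The conversion step can be illustrated with a minimal sketch. The deck does not define "primitive flow", so this assumes a unidirectional 5-tuple aggregation; the `Packet` record and `packets_to_flows` function are hypothetical names, and only header fields are kept, never payload.

```python
from collections import namedtuple

# Hypothetical header-only packet record (assumption: no payload is retained).
Packet = namedtuple("Packet", "ts sip dip proto sport dport flags length")

def packets_to_flows(packets):
    """Aggregate packets into unidirectional flows keyed by the 5-tuple."""
    flows = {}
    for p in packets:
        key = (p.sip, p.dip, p.proto, p.sport, p.dport)
        flow = flows.get(key)
        if flow is None:
            # First packet of this 5-tuple opens a new flow record.
            flows[key] = {"start": p.ts, "end": p.ts, "packets": 1,
                          "bytes": p.length, "flags": set(p.flags)}
        else:
            flow["start"] = min(flow["start"], p.ts)
            flow["end"] = max(flow["end"], p.ts)
            flow["packets"] += 1
            flow["bytes"] += p.length
            flow["flags"] |= set(p.flags)  # union of TCP flag letters seen
    return flows
```

Keeping inbound and outbound captures in separate calls would mirror the separate capture described above.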
Observations During Conversion
100 Internal IPs Monitored for 1 month.
Of all Packets Read:
• Not IPv4: 1.7%
• Fragmented: 0.06%
• Too Short: 0.0%
• Incomplete (No Ports and/or Flags): 0.0%
Overall, traffic is characterised by its non-uniformity.
TCP=65%
UDP=34%
Protocol Flows were a Little Unusual
[Figure: Flows by Protocol, log scale; x-axis: IP protocol number (1, 2, 6, 17, 41, 47)]
IPv6 Encapsulation at 0.00003%
Multicast Host Management at 0.18%
VPNs smaller than I expected at 0.09%
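For reading the protocol chart, a small sketch of how per-protocol flow shares could be computed. The protocol names are the IANA assignments for the numbers on the x-axis (47 is GRE, which may be what the VPN annotation refers to); `protocol_shares` and its input format are assumptions for illustration.

```python
from collections import Counter

# IANA-assigned IP protocol numbers that appear on the chart's x-axis.
PROTO_NAMES = {1: "ICMP", 2: "IGMP", 6: "TCP", 17: "UDP",
               41: "IPv6 encapsulation", 47: "GRE"}

def protocol_shares(protos):
    """Given one protocol number per flow, return percent of flows per protocol."""
    counts = Counter(protos)
    total = sum(counts.values())
    return {PROTO_NAMES.get(p, str(p)): 100.0 * n / total
            for p, n in counts.items()}
```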
Outbound Bytes by Host Show Large Variations
[Figure: Outbound Bytes by Internal Host, log scale]
Obvious in a Linear Scale
[Figure: Outbound Bytes by Internal Host, linear scale, 0 to 2,000,000,000 bytes]
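One way the host that stands out on the linear scale could be picked out programmatically is sketched below. The threshold rule (a multiple of the median host total) is an assumption for illustration, not the method used in the talk; the function names and the `(sip, bytes)` input format are hypothetical.

```python
from collections import defaultdict
from statistics import median

def outbound_bytes_by_host(flows):
    """Sum outbound bytes per internal source IP from (sip, nbytes) pairs."""
    totals = defaultdict(int)
    for sip, nbytes in flows:
        totals[sip] += nbytes
    return dict(totals)

def heavy_hosts(totals, factor=100):
    """Return hosts whose total exceeds factor x the median, largest first.

    The factor of 100 is an arbitrary illustrative threshold.
    """
    if not totals:
        return []
    cutoff = factor * median(totals.values())
    return sorted((h for h, b in totals.items() if b > cutoff),
                  key=totals.get, reverse=True)
```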
Let's take a closer look at this guy
Flows and Bytes to DIPs for Suspicious Host
[Figure: Flows by DIP and Bytes by DIP, log scale; data labels on the chart: 22, 29, 43, 36, 15, 8, 1]
Flows and Bytes by Dport to DIPs for Suspicious Host
[Figure: Flows by Dport and Bytes by Dport, log scale; x-axis: Dport; data labels on the chart: 60890, 52584, 40331, 27535, 10752, 778, 21]
VRML Multi User 4204
Together they accounted for 41%
Note that the DPort 0 data point is ICMP Traffic
We expected DPorts 80 and 443 to represent most traffic…
[Figure: Flows by Dport, log scale]
Note that the DPort 0 data point is ICMP Traffic
4204 lists as VRML Multi-User, almost all from 1 host
One host, only 25 flows on BitTorrent
[Figure: Flows by Dport, log scale; x-axis: Dport]
Only minute traces of Half Life Gaming
SSH 10% of all flows
[Figure: Sport Flows, log scale; chart annotations: Mac Skype; 36459; 13991 and 44849; 52523]
Fasttrack 50 hosts 1700 flows
No 6667 listening?
Number of Destinations by Host Shows Substantial Spikes
[Figure: Total Destination IPs by Host, log scale; x-axis ticks 1 to 3041]
Linear Scale
[Figure: Total Destination IPs by Host, linear scale, 0 to 14,000; x-axis ticks 1 to 3025]
Suspicious Host accessed sequential ranges through multiple /16s
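A sketch of how distinct destinations per host and a sequential sweep through /16s might be measured. The "fraction of consecutive-address steps" metric is an assumption chosen for illustration; the function names and the `(sip, dip)` input format are hypothetical.

```python
import ipaddress
from collections import defaultdict

def destinations_by_host(flows):
    """Map each source IP to its set of distinct destination IPs (as integers)."""
    dests = defaultdict(set)
    for sip, dip in flows:
        dests[sip].add(int(ipaddress.IPv4Address(dip)))
    return dests

def sequential_profile(dips):
    """Return (number of /16 networks touched, fraction of consecutive steps).

    A step is consecutive when two neighbouring addresses in sorted order
    differ by exactly 1; a value near 1.0 suggests a sequential sweep.
    """
    ordered = sorted(dips)
    nets16 = len({d >> 16 for d in ordered})
    if len(ordered) < 2:
        return nets16, 0.0
    steps = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a == 1)
    return nets16, steps / (len(ordered) - 1)
```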
Let's look a little closer at his activity
36459 Dominates the SPorts…
[Figure: Flows by SPort, log scale]
SPort use is near sequential above 49153
Removing 36459 in a linear scale we get
[Figure: SPort Flows with 36459 removed, linear scale, 0 to 2,500 flows]
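The two source-port observations (one dominant port, near-sequential use above 49153) could be quantified along these lines. This is a sketch under assumptions: `sport_profile` is a hypothetical helper, and "near sequential" is approximated as the fraction of neighbouring distinct ports that differ by exactly 1.

```python
from collections import Counter

def sport_profile(sports, floor=49153):
    """Summarise one host's source ports, given one port number per flow."""
    if not sports:
        raise ValueError("no flows supplied")
    counts = Counter(sports)
    top_port, top_flows = counts.most_common(1)[0]
    # Distinct ports strictly above the floor, in ascending order.
    high = sorted(p for p in counts if p > floor)
    steps = sum(1 for a, b in zip(high, high[1:]) if b - a == 1)
    seq = steps / (len(high) - 1) if len(high) > 1 else 0.0
    return {"top_port": top_port,
            "top_share": top_flows / len(sports),
            "sequential_fraction": seq}
```

Dropping the dominant port before plotting, as the linear chart does, corresponds to filtering `sports` on `top_port` first.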
His DPort values in Log Scale
[Figure: Flows by Dport, log scale]
Not quite sequential but most ports above 1024 are accessed.
Average is less than 10 flows (packets) per Dport.
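The "most ports above 1024" and "fewer than 10 flows per Dport" observations can be expressed as two numbers. A minimal sketch, assuming one destination port per flow as input; `dport_coverage` is a hypothetical name, and the average here is taken over ports above 1024 only, which may differ from how the slide's average was computed.

```python
from collections import Counter

def dport_coverage(dports, floor=1024):
    """Return (fraction of ports above floor that were touched,
    mean flows per touched port above floor)."""
    counts = Counter(dports)
    high = [p for p in counts if p > floor]
    coverage = len(high) / (65535 - floor)  # ports floor+1 .. 65535
    mean_flows = sum(counts[p] for p in high) / len(high) if high else 0.0
    return coverage, mean_flows
```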
Linear version Dports with Port 80 removed
[Figure: Flows by Dport with 80 removed, linear scale, 0 to 2,500 flows]
count   pro   dPort   flags   packets   bytes
21005   6     80      A       1         40
5502    6     80      A       1         52
1373    6     443     A       1         40
1280    6     80      RA      1         40
1261    17    1900            1         61
1173    6     80      S       1         52
914     6     80      S       1         48
866     6     65209   R       1         40
673     6     80      FA      1         40
430     6     80      A       1         60
Protocol Distribution for Suspicious Host
[Figure: Protocol Flows, log scale; x-axis: IP protocol number (1, 2, 6, 17)]
ICMP ratio (0.08%) is double the aggregate value (0.04%)
Multi-Cast Host Management 0.18%
65% TCP
34% UDP
Would expect these to be more equal for a peer and vastly skewed for a
scanner.
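The comparison of one host's protocol mix against the aggregate can be written as a short sketch. The function names and the `{protocol number: flow count}` input format are assumptions for illustration.

```python
def share(counts, proto):
    """Fraction of flows that use the given IP protocol number."""
    total = sum(counts.values())
    return counts.get(proto, 0) / total if total else 0.0

def compare_to_aggregate(host_counts, all_counts, proto):
    """Ratio of the host's share to the aggregate share (None if undefined)."""
    agg = share(all_counts, proto)
    return share(host_counts, proto) / agg if agg else None

def tcp_udp_split(counts):
    """Return (TCP share, UDP share) using protocol numbers 6 and 17."""
    return share(counts, 6), share(counts, 17)
```

With an ICMP share of 0.08% for the host and 0.04% overall, `compare_to_aggregate` would return roughly 2, matching the "double" noted on the slide.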
Tradition Demands that I ask
this Question
Massive Destination IPs (sequential /16s)
Massive Source Ports (near sequential)
Massive Destination Ports (near sequential)
Multi-Cast Host Management Protocol
Larger than expected ICMP Ratio
Standard TCP/UDP Ratio
WHO AM I ?
However, unlike previous years…..
I HAVE NO IDEA…….
Summary
Some obvious challenges
How to tell when host changes?
- Will test user-host profiler presented at
FloCon 2006.
- Until this is nailed down, assumptions are
more like “let’s pretend”.
Some Intriguing Opportunities
- Oops, I’m not allowed to talk about those
yet.
Thank You
I am seeking help and would welcome any
private feedback, discussions or ideas you
might have.
If you had access to this data – what would
you do?