Structure Preserving Anonymization of Router Configuration Data


Structure Preserving Anonymization
of Router Configuration Data
David A. Maltz,
Jibin Zhan, Geoffrey Xie, Hui Zhang
Carnegie Mellon University
Gisli Hjalmtysson, Albert Greenberg, Jennifer Rexford
AT&T Labs Research
1
Why Configuration Files are Valuable
Configuration file = program loaded on each router
• Controls operation of router
• Controls interactions between routers
Configuration files allow researchers to study the
details of real networks
• The problem is getting access to them
• We have developed a technique for
anonymizing configuration files
• We have a proposal for how configs could be
made accessible to the research community
2
Why Configuration Files are Valuable - 2
The set of configurations defines the network
• Captures many of the network’s properties
– Topology (node degree, interconnectivity)
– Policies (CoS, QoS, packet filters, reachability)
– Routing (neighbors, OSPF weights, BGP policies)
– Security (vulnerabilities, mitigations)
Only source of insight for Enterprise networks
• 10K+ networks that are currently a mystery
• Interesting! 10 – 1200 routers, global scale
• Configs are the only way to look at them
– Networks firewalled, external probes dropped
3
Topology
Internet
Router 1 Config
interface Serial1/0.5
ip address 1.1.1.1/30
Router 2 Config
interface Serial2/1.5
ip address 1.1.1.2/30
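The two interface stanzas above share the subnet 1.1.1.0/30, which is how a link between the routers can be read out of the configs. A minimal sketch of that inference follows; the tuple layout and the function name `infer_links` are illustrative assumptions, not part of the original talk.

```python
import ipaddress
from collections import defaultdict

# Hypothetical input: (router, interface, address/prefix) tuples pulled from configs.
interfaces = [
    ("Router1", "Serial1/0.5", "1.1.1.1/30"),
    ("Router2", "Serial2/1.5", "1.1.1.2/30"),
]

def infer_links(interfaces):
    """Group interfaces by subnet; a subnet seen on two or more interfaces implies a link."""
    by_subnet = defaultdict(list)
    for router, ifname, addr in interfaces:
        subnet = ipaddress.ip_interface(addr).network
        by_subnet[subnet].append((router, ifname))
    return {str(net): ends for net, ends in by_subnet.items() if len(ends) > 1}

links = infer_links(interfaces)
```

Because the grouping only relies on subnet membership, it gives the same links on anonymized configs as long as the anonymizer preserves prefix relationships.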
4
Quality of Service
class-map GoodCustomer
match access-group 136
policy-map GoldService
class GoodCustomer
bandwidth 2000
queue-limit 40
class class-default
fair-queue 16
queue-limit 20
interface Serial0/0
service-policy output GoldService
Callouts: Class definition, CB-WFQ parameters, CB-WFQ policy name
5
Routing
router bgp 65501
neighbor EdgeSwitch peer-group
neighbor EdgeSwitch remote-as 64740
neighbor EdgeSwitch distribute-list 11 in
neighbor EdgeSwitch route-map exportRoutes out
neighbor 192.168.96.8 peer-group EdgeSwitch
neighbor 192.168.96.9 peer-group EdgeSwitch
neighbor 10.217.248.14 remote-as 65500
neighbor 10.217.248.14 ebgp-multihop 5
Callouts: AS Numbers, Policies, Peers
6
Security Issues
access-list 143 deny 53 any any
access-list 143 deny 55 any any
access-list 143 deny 77 any any
access-list 143 permit ip any any
Access list 143: Drops packets that can attack Cisco interfaces
interface Serial0.2 multipoint
ip access-group 143 in
ip address 66.248.162.13 255.255.255.224
This interface is safe
interface Ethernet0
ip address 144.201.41.59 255.255.255.0
This interface is not
7
How to Get Configuration Files?
Considered proprietary secrets of network owners
• Discloses business strategy
• Discloses vulnerabilities
Anonymization breaks tie between data and owner
• Anonymized configs will show some network is vulnerable,
but which/where to attack?
We developed a method for anonymizing configuration files
• Approach convinced some customers of AT&T to disclose
their configs to CMU researchers
8
Anonymization Challenges
We don’t know the intended use of the data
• Must anonymize entire configuration file
• A customized data set is easier to anonymize
Must preserve structure of information in files
• Relationships of identifiers inside/between files
• IP address subnet relationships
Traditional parsing tools are of no use
• No published grammar for Cisco IOS
• 200+ different versions seen in 31 networks
9
Anonymize Non-numeric Tokens
Created “pass list” of words by string-scraping Cisco’s
web pages
• Contains most IOS commands
• Other words are generic networking terms (“IETF”)
All tokens not in pass list are hashed with salted SHA1
Before:
router bgp 64780
redistribute ospf 64 match route-map NYOffice
neighbor 1.2.3.4 remote-as 701
route-map NYOffice deny 10
match ip address 4
After:
router bgp 64780
redistribute ospf 64 match route-map 8aTzlvBrbaW
neighbor 66.253.160.68 remote-as 701
route-map 8aTzlvBrbaW deny 10
match ip address 4
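The pass-list step can be sketched roughly as follows. The pass list, the salt value, the base64 rendering and the 11-character truncation are all assumptions made for illustration; the talk only states that tokens outside the pass list are hashed with salted SHA1. Numeric tokens and dotted addresses are passed through here because other steps handle them.

```python
import base64
import hashlib
import re

# Hypothetical pass list; the real one was scraped from Cisco's web pages.
PASS_LIST = {"router", "bgp", "redistribute", "ospf", "match", "route-map",
             "neighbor", "remote-as", "deny", "permit", "ip", "address"}
SALT = b"per-network-secret"  # assumed: a secret kept by the network owner

def hash_token(token: str) -> str:
    """Salted SHA1 of a token, rendered as short alphanumeric text."""
    digest = hashlib.sha1(SALT + token.encode("utf-8")).digest()
    # altchars swaps '+' and '/' for letters; the truncation length is arbitrary.
    return base64.b64encode(digest, altchars=b"xy").decode("ascii")[:11]

def anonymize_line(line: str) -> str:
    out = []
    for tok in line.split(" "):
        if tok == "" or tok.lower() in PASS_LIST or re.fullmatch(r"[\d./]+", tok):
            out.append(tok)          # spacing, known-safe word, or numeric token
        else:
            out.append(hash_token(tok))
    return " ".join(out)
```

Since the hash is deterministic for a given salt, an identifier such as `NYOffice` maps to the same string everywhere it appears, which keeps references between a `route-map` definition and its uses intact.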
10
Anonymize Specific Numbers
Most numbers are harmless, some reveal identity
• Public AS numbers
• Phone numbers (NOCs, backup modems)
26 rules used to find and anonymize context-dependent items
• "neighbor\\s+$ipAddrPatt\\s+remote-as"
• "neighbor\s+\w+\s+remote-as"
Before:
router bgp 64780
redistribute ospf 64 match route-map NYOffice
neighbor 1.2.3.4 remote-as 701
route-map NYOffice deny 10
match ip address 4
After:
router bgp 64780
redistribute ospf 64 match route-map 8aTzlvBrbaW
neighbor 66.253.160.68 remote-as 1237
route-map 8aTzlvBrbaW deny 10
match ip address 4
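One of the context-dependent rules might look like the sketch below, written in Python rather than the original tool's language. The sequential placeholder mapping is a stand-in chosen for readability (a real tool would more plausibly use a keyed permutation of the public AS number space), and the names `map_asn`, `seen_public` and `anonymize_remote_as` are hypothetical.

```python
import re

PRIVATE_ASNS = range(64512, 65535)   # private-use 16-bit AS numbers (64512-65534)
asn_map = {}                         # public ASN -> anonymized ASN, kept consistent
seen_public = set()                  # recorded so a later pass can check for leaks

def map_asn(asn: int) -> int:
    if asn in PRIVATE_ASNS:
        return asn                   # private ASNs do not identify the owner
    seen_public.add(asn)
    if asn not in asn_map:
        # Assumed placeholder scheme, not the authors' mapping.
        asn_map[asn] = 1000 + len(asn_map)
    return asn_map[asn]

# \S+ stands in for either a peer address or a peer-group name.
REMOTE_AS = re.compile(r"(neighbor\s+\S+\s+remote-as\s+)(\d+)")

def anonymize_remote_as(line: str) -> str:
    return REMOTE_AS.sub(lambda m: m.group(1) + str(map_asn(int(m.group(2)))), line)
```

The rule only rewrites the number that follows `remote-as`, so harmless numbers elsewhere on the line (list numbers, sequence numbers) are left alone.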
11
Limits of Anonymization
Anonymization is a lossy process
• Comments & meaningful identifiers removed
• (Were they right anyway???)
Anonymizer preserves relationships it knows about
• Doesn’t know about IP addr <-> ASN mapping
• A packet filter, based on IP address, and route
policy, based on ASN, could target same AS
• Post-anonymization: both mechanisms preserved,
but won’t show them targeting same AS
• (Router didn’t have that external information either)
12
Potential Vulnerabilities: Textual Attacks
Identifying information left in configs
Heuristics used as double-check
• Rules that anonymize public AS numbers
record the public AS numbers they find
• Search post-anonymization file for any
remaining occurrences
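The double-check could be sketched as a simple scan of the output. Here `recorded_asns` stands for the set of public AS numbers the anonymization rules recorded; the function name and the whole-number matching rule are assumptions.

```python
import re

def find_leaks(anonymized_text: str, recorded_asns) -> set:
    """Return recorded public ASNs that still appear as whole numbers."""
    leaks = set()
    for asn in recorded_asns:
        # Lookarounds keep 701 from matching inside a longer number such as 17012.
        if re.search(rf"(?<!\d){asn}(?!\d)", anonymized_text):
            leaks.add(asn)
    return leaks
```

This is a heuristic: a hit may be a coincidence (for example, an unrelated list number), so each one would be flagged for human review rather than rewritten automatically.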
13
Potential Vulnerabilities:
Fingerprinting Attacks
Network characteristics (fingerprint) extracted from
anonymized configs matched against public data
Potential fingerprints
• BGP community strings
• Number of POPs, number of BGP peers
• Structure of address space utilization
• Others…
Evaluation still in progress
• Seems like backbone networks are identifiable
• Seems like enterprise networks are not
14
A Clearinghouse for Configuration Data
[Diagram: a website enforcing a single-blind methodology sits between network owners and researchers]
• Network owners: retrieve anonymizer; anonymize & test configs; upload configs; questions and results exchanged by blinded email
• Researchers: register with site; retrieve configs; analyze data; questions and results exchanged by blinded email
• Run tools on site: scalable, pictures
Boot-strap with configs from academic/research institutions?
15
Questions?
16
Fingerprinting Attacks
[Plot: BGP peers per POP (y-axis) against POPs sorted by peers/POP (x-axis). Data from networks in repository of anonymized configs.]
1. For each anonymized network, compute fingerprint from
anonymized config files
• Will be 100% accurate
2. Experimentally measure real networks
17
Fingerprinting Attacks
[Plot: BGP peers per POP (y-axis) against POPs sorted by peers/POP (x-axis). Measured network characteristics.]
Evaluation still in progress
• Seems like backbone networks are identifiable
• Seems like enterprise networks are not
18
Anonymize Regular Expressions
Some AS numbers appear in regular expressions
• Expressions w/ only private AS numbers → no change
ip as-path access-list 99 permit _6451[2-9]_
→ 64512, 64513, … 64519
→ ip as-path access-list 99 permit _6451[2-9]_
• Expressions w/ public AS numbers → expand and anonymize
ip as-path access-list 101 permit _70[1-3]_
→ 701, 702, 703
→ Anonymize: 1234, 543, 21
→ ip as-path access-list 101 permit _(1234|543|21)_
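The expand-and-anonymize step can be sketched by brute force over the 16-bit AS number space. This assumes the expression has the form `_<core>_` and that the core uses only constructs that Cisco and Python regular expressions share; the function name and the `map_asn` callback are hypothetical.

```python
import re

PRIVATE = range(64512, 65535)   # private-use 16-bit AS numbers (64512-65534)

def anonymize_aspath_regex(pattern: str, map_asn) -> str:
    """Expand '_<core>_' into the ASNs it matches, then map the public ones."""
    m = re.fullmatch(r"_(.+)_", pattern)
    if not m:
        return pattern                        # other forms are out of scope here
    core = re.compile(m.group(1))
    matched = [a for a in range(1, 65536) if core.fullmatch(str(a))]
    if not matched or all(a in PRIVATE for a in matched):
        return pattern                        # only private ASNs: no change
    mapped = [str(a if a in PRIVATE else map_asn(a)) for a in matched]
    return "_(" + "|".join(mapped) + ")_"

# Mapping values taken from the slide's example.
table = {701: 1234, 702: 543, 703: 21}
result = anonymize_aspath_regex("_70[1-3]_", table.__getitem__)
```

Enumerating all 65,535 candidates is cheap and avoids having to reason about the expression symbolically; the cost is that the rewritten expression is an explicit alternation rather than a compact range.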
19
Anonymize IP Addresses
Extended Minshall’s prefix-preserving algorithm
Made it class preserving
• Class A to Class A, etc.
– RIP and older protocols are classful
Made it “subnet address” preserving
• Assume 128.2.0.0/16 is subnet
• We want 128.2.0.0 → 150.7.0.0
• Before extension, 128.2.0.0 → 150.7.43.66
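To make the prefix-preserving and class-preserving properties concrete, here is a minimal sketch using a keyed hash per address bit. It is neither Minshall's table-based algorithm nor the authors' extension, and it does not implement the "subnet address" preservation described above; the key value and function names are assumptions.

```python
import hashlib
import hmac
import ipaddress

KEY = b"per-network-secret"   # assumed secret key, hypothetical value

def _flip_bit(prefix_bits: str) -> int:
    """Keyed pseudo-random bit that depends only on the preceding prefix."""
    mac = hmac.new(KEY, prefix_bits.encode("ascii"), hashlib.sha1).digest()
    return mac[0] & 1

def anonymize_ip(addr: str) -> str:
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = []
    for i, b in enumerate(bits):
        # Class preservation: the leading run of 1s and the first 0 within the
        # first four bits are left untouched, so Class A maps to Class A, etc.
        in_class_prefix = i < 4 and bits[:i] == "1" * i
        flip = 0 if in_class_prefix else _flip_bit(bits[:i])
        out.append(str(int(b) ^ flip))
    return str(ipaddress.IPv4Address(int("".join(out), 2)))
```

Because the decision to flip bit i depends only on bits 0 through i-1 of the input, two addresses that share a k-bit prefix before anonymization share exactly a k-bit prefix afterwards, which is what keeps subnet relationships visible in the anonymized configs.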
20
Anonymize IP Addresses - 2
Made it “special address” preserving
• Multicast, private address space
• Must fix collisions in mapping function
[Flowchart: IP Addr → Special? (Y / N) → Anonymize → Special? (Y / N)]
21
Anonymization Overview
Minimize dependence on context
• If in doubt, hash it out
1. Remove all comments
2. Find all IP addresses and hash using specialized
prefix-preserving anonymization
3. Hash all non-numeric tokens not known to be safe
4. Anonymize specific numeric tokens using regular
expressions
5. Anonymize regular expressions appearing in
configs
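Step 1 of the list above could be sketched as follows. In IOS, lines beginning with `!` are comments; treating `description` lines as comments as well is an assumption made here because they carry free text, and the function name is hypothetical.

```python
def strip_comments(config_text: str) -> str:
    """Drop IOS comment lines and (by assumption) free-text description lines."""
    kept = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("!"):
            continue                  # IOS comment or separator line
        if stripped.startswith("description "):
            continue                  # assumption: free text is treated as a comment
        kept.append(line)
    return "\n".join(kept)
```

Running this before the hashing steps follows the "if in doubt, hash it out" principle in its strongest form: free text that cannot be checked against a pass list is removed rather than transformed.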
22