Zmap
FAST INTERNET-WIDE SCANNING AND ITS SECURITY APPLICATION
What is Zmap?
Port scanner
0-65535 (2^16)
Protocols such as TCP, UDP, ICMP…
Common examples
HTTP TCP/80, HTTPS TCP/443, DNS TCP/53 UDP/53, SIP UDP/5060, etc…
Like Nmap
Nmap is a port scanner too, released in 1997
Still in active development
Very well-known to sysadmins and power users
Why Zmap?
Nmap is great at what it does, isn’t it?
Yes, but it’s not great at what it doesn’t do
Nmap is built to scan smaller groups of hosts, not the entire Internet
Simply put: Nmap is too slow at scanning 4 billion addresses
How is Zmap better?
Architecture specifically built for Internet-scale scanning
Optimised probing
Random permutation of IPv4 address space using a cyclic multiplicative group
Skip the kernel’s TCP/IP stack: open a raw socket and send Ethernet frames directly over the wire
No per-connection state
No TCP/IP stack, no half-open SYN connections, better performance
No retransmission
No reply? Oh well
Research shows that scanning at 1GigE achieves 98% network coverage using only a
single probe
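A minimal sketch of the random-permutation idea, not Zmap's actual code. It walks the multiplicative group modulo a prime p slightly larger than the address space, so every address is visited exactly once in a scattered order while only the current value is kept in memory. The function names and the small demo values (1,000 addresses, p = 1009) are assumptions for illustration; for IPv4 the prime would need to be just above 2^32.

```python
import random


def prime_factors(n):
    """Return the set of prime factors of n (trial division)."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_generator(g, p):
    """g generates the whole group mod p if g^((p-1)/q) != 1 for every prime q | p-1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def find_generator(p, rng):
    """Pick random candidates until one generates the full group."""
    while True:
        g = rng.randrange(2, p)
        if is_generator(g, p):
            return g


def permuted_addresses(n, p, rng):
    """Yield 0..n-1 exactly once each, in pseudo-random order (p prime, p > n)."""
    g = find_generator(p, rng)
    first = rng.randrange(1, p)   # random starting element of the group
    x = first
    while True:
        if x - 1 < n:             # group elements are 1..p-1; skip those past n
            yield x - 1
        x = (x * g) % p           # one multiplication per address, no lookup table
        if x == first:            # back at the start: the cycle is complete
            return


order = list(permuted_addresses(1000, 1009, random.Random(1)))
print(order[:5])
```

Choosing a fresh generator and starting point per scan gives a different order each time, and consecutive probes land in unrelated parts of the address space.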
How is Zmap better? (continued)
Zmap will send as quickly as the CPU and NIC will allow
Uses multiple threads in tight loops
Zmap sends raw Ethernet packets instead of relying on the TCP/IP stack
Allows for caching certain packet values
Ethernet header (excluding checksum)
Avoids: kernel routing lookup, ARP cache lookup, and Netfilter checks
Side effect of bypassing the TCP/IP stack: the kernel has no record of the probe, so it resets (RST) the connection as soon as a SYN-ACK is received
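To make "raw packets" concrete, here is a hedged sketch of building a single TCP SYN probe by hand with Python's struct module. It only constructs the 40 bytes of IPv4 + TCP header; it does not open a raw socket or send anything, and it is not Zmap's probe module. The function names, the field values (TTL 64, window 65535) and the documentation-range addresses are assumptions.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IPv4 and TCP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_syn(src_ip, dst_ip, src_port, dst_port, seq):
    """Return a 40-byte IPv4 packet carrying a TCP SYN with no options."""
    src = socket.inet_aton(src_ip)
    dst = socket.inet_aton(dst_ip)

    # TCP header: ports, seq, ack=0, data offset 5 words, SYN flag, window,
    # checksum placeholder, urgent pointer.
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, seq, 0,
                      5 << 4, 0x02, 65535, 0, 0)
    # The TCP checksum also covers a pseudo-header with both addresses.
    pseudo = src + dst + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(tcp))
    tcp = tcp[:16] + struct.pack("!H", internet_checksum(pseudo + tcp)) + tcp[18:]

    # IPv4 header: version/IHL, TOS, total length, id, flags/fragment, TTL,
    # protocol, checksum placeholder, source, destination.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0,
                     64, socket.IPPROTO_TCP, 0, src, dst)
    ip = ip[:10] + struct.pack("!H", internet_checksum(ip)) + ip[12:]
    return ip + tcp


packet = build_syn("192.0.2.1", "198.51.100.7", 40000, 443, 12345)
print(len(packet))
```

Between probes only the destination address, the checksums and any per-target fields change, which is why the remaining header bytes can be cached.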
Zmap design
Zmap is modular
Probe modules
Responsible for generating the actual packets sent down the wire
Output handlers
Pipe results to another process, to a database, or do other things
Probe modules
Out-of-the-box includes
TCP SYN scanning: check if a port is open
ICMP echo scanning: check if a “host is alive”
Write custom handlers
Your own type of probe module for your specific purpose (e.g. TCP ACK scan, etc.)
Output modules
Out-of-the-box includes
Simple text output (list of IPs that have the specified port open)
Callbacks
Scan initialisation
Probe packet sent
Response received
Trigger application-level handshake (SSH, ESMTP, etc.)
Global progress updates
Scan termination
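The callback list above can be pictured as an interface that an output module fills in. This is a hypothetical Python sketch of that shape, not Zmap's real API: the class names, method names and the fake scan driver are all invented for illustration.

```python
class OutputHandler:
    """Hypothetical callback interface; override only what you need."""

    def scan_started(self, port):
        pass

    def probe_sent(self, ip):
        pass

    def response_received(self, ip, is_open):
        pass

    def progress(self, sent, total):
        pass

    def scan_finished(self):
        pass


class OpenPortList(OutputHandler):
    """Simple text-style output: collect the IPs that have the port open."""

    def __init__(self):
        self.sent = 0
        self.open_hosts = []

    def probe_sent(self, ip):
        self.sent += 1

    def response_received(self, ip, is_open):
        if is_open:
            self.open_hosts.append(ip)


def run_fake_scan(handler, port, results):
    """Drive the callbacks from canned results (True=open, False=closed, None=no reply)."""
    handler.scan_started(port)
    for count, (ip, outcome) in enumerate(results.items(), start=1):
        handler.probe_sent(ip)
        if outcome is not None:          # silent hosts never trigger a response callback
            handler.response_received(ip, outcome)
        handler.progress(count, len(results))
    handler.scan_finished()


handler = OpenPortList()
run_fake_scan(handler, 443, {"192.0.2.1": True, "192.0.2.2": False, "192.0.2.3": None})
print(handler.open_hosts)
```

An application-level handshake (SSH, ESMTP, etc.) would hang off response_received in the same way.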
forge_socket kernel module
Reuse Zmap SYN packet, avoid resending SYN at application level (TCP/IP stack)
Pass raw sequence numbers and other TCP parameters to the kernel
Validation and measurement
How fast is it?
Scans the whole IPv4 address space (~2^32 addresses) in 44 minutes
Using modest hardware, but with an excellent Internet connection
Gigabit upstream needed for this, so I guess Dunedin’s pretty sorted
Not really
Test setup was on 20Gbit campus network
Is it too fast?
Nope
1,000 to 1.4M PPS
No statistically significant differences in results across these send rates
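A rough back-of-the-envelope check on these packet rates. The sketch assumes each probe occupies 84 bytes on the wire (a minimum 64-byte Ethernet frame plus 8 bytes of preamble and a 12-byte inter-frame gap); the function names are made up for this example.

```python
ETHERNET_WIRE_BYTES = 84  # 64-byte minimum frame + 8 preamble + 12 inter-frame gap


def max_packets_per_second(link_bits_per_second, wire_bytes=ETHERNET_WIRE_BYTES):
    """Upper bound on minimum-size frames per second for a given link speed."""
    return link_bits_per_second / (wire_bytes * 8)


def scan_minutes(addresses, packets_per_second):
    """Minutes needed to send one probe to each address at a steady rate."""
    return addresses / packets_per_second / 60


line_rate = max_packets_per_second(1_000_000_000)
print(f"gigabit line rate: {line_rate:,.0f} pps")
print(f"2^32 probes at 1.4M pps: {scan_minutes(2**32, 1_400_000):.1f} minutes")
```

So 1.4M pps is close to what gigabit Ethernet can carry at all. Probing every one of the 2^32 addresses at that rate would take about 51 minutes; the 44-minute figure is presumably lower because reserved and excluded ranges are never probed, though that explanation is an assumption here.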
Is one packet really enough?
Unlike Nmap, Zmap sends one packet without retransmissions
To allow amazing 44-minute Internet-scale scans
Hard to really measure
Hosts always going up and down
The world is a big place
Repeating scans could show vastly different results as packets are increased
Wouldn’t be a real snapshot of time
Tests that sent 8 or more SYN packets to each host in a 1% sample of the Internet
found a stable number of responding hosts. With fewer packets, this number visibly
decreases. Extrapolating from this data, sending 1 packet gives about 98% coverage,
which is sufficient for research purposes.
Variation with time of day
This used to not be an issue
Nmap and other tools took months to complete
Now it is…
But results show that there’s only a 3.1% difference; not too serious
Achieved by scanning port 443
Skype uses 80 or 443 when not in use by other system services, so could be
related if the router happens to forward these ports
Direct comparison with Nmap
TCP SYN scan on port 443
Using Nmap’s “insane” template – yes it’s really called that
Would take longer than 1 year to complete a full Internet scan for one single port
Further tuning lowered this to 63 days, still about 1300x slower than Zmap
Why is Zmap so much faster?
Nmap uses the TCP/IP stack
Nmap keeps state in the kernel
And to avoid overloading it, runs serially and keeps a timeout for each probe
Zmap will accept a delayed reply at any time during the entire duration
of the scan, whereas Nmap would time out after a maximum of 600ms when
using 2 probes.
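Accepting a late reply without a per-probe timer means the scanner must recognise its own probes without a table of outstanding connections. One way to do that, sketched here as an assumption rather than as Zmap's exact scheme, is to derive the SYN's sequence number from the target address with a keyed hash and check the acknowledgement number of any SYN-ACK that comes back. The names SCAN_KEY, probe_seq and is_valid_synack are hypothetical.

```python
import hashlib
import hmac
import os
import socket
import struct

SCAN_KEY = os.urandom(16)  # fresh secret per scan; the only state kept


def probe_seq(dst_ip, key=SCAN_KEY):
    """Sequence number to put in the SYN sent to dst_ip."""
    digest = hmac.new(key, socket.inet_aton(dst_ip), hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def is_valid_synack(src_ip, ack, key=SCAN_KEY):
    """A genuine SYN-ACK acknowledges our sequence number plus one."""
    return ack == (probe_seq(src_ip, key) + 1) % 2**32


target = "192.0.2.10"
good_ack = (probe_seq(target) + 1) % 2**32
print(is_valid_synack(target, good_ack))
print(is_valid_synack(target, (good_ack + 1) % 2**32))
```

Because the expected value is recomputed from the reply itself, a response that arrives minutes later is checked exactly like one that arrives immediately.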
Comparison with previous studies
Scan TLS certificates
Discover hosts
Perform TLS handshakes
Collect and parse resulting certificate
Done in 10 hours with Zmap
It took SSL Observatory 3 computers and 3 months to do this in 2010
But with what network infrastructure?
Results discovered 196% more TLS certificates
5 years is a long time, SSL uptake has been high since then
Applications and security implications
So you can scan the IPv4 address space in an hour huh…
Where there’s good, there’s evil
Heartbleed was disclosed after Zmap was published
Serious vulnerability that had the potential to leak credentials and private information
People were scanning and creating databases of vulnerable hosts overnight
Researchers were also scanning to see how quickly it was getting patched
But attackers were also exploiting it
“While this can be a powerful defensive tool for researchers—for instance, to measure the
severity of a problem or to track the application of a patch—it also creates the possibility for an
attacker with control of only a small number of machines to scan for and infect all public hosts
suffering from a new vulnerability within minutes.”
Tracking protocol adoption (stats)
Finding out new protocols, address depletion, common misconfigurations,
and vulnerabilities
Done before obviously, but typically only on small samples of the Internet
Full scans completed over extended period with massive
parallelisation and cloud providers with high bandwidth
availability
Zmap lowers barrier of entry
I could do it, you could do it
Maybe not at gigabit speeds though
Discovering unadvertised services
Security by obscurity? Not anymore!
Example: Tor bridges (unlisted Tor nodes)
Not published for anonymity reasons
But we know how to identify Tor nodes…
Perform TLS handshake with set of cipher suites supported by Tor’s TLSv1 handshake
Gives specific response
Check self-signed certificate
Bam: 67,432 Tor nodes on port 443, and 2,952 on port 9001
More on privacy…
So you think you’re safe with a Dynamic IP?
A lot of CPE is remotely administrable by ISPs and has an SSL certificate.
Scan often, and you’ll find their new IP
Previous work has fingerprinted SIP devices and other protocols which
inadvertently expose unique identifiers
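The dynamic-IP point boils down to joining two scans on a stable identifier. A small sketch, assuming each scan has already been reduced to a mapping from certificate fingerprint to the IP that presented it; the fingerprints, addresses and function name are invented.

```python
def moved_hosts(earlier, later):
    """Return {fingerprint: (old_ip, new_ip)} for certificates seen at a new address."""
    return {
        fp: (earlier[fp], later[fp])
        for fp in earlier.keys() & later.keys()   # certificates present in both scans
        if earlier[fp] != later[fp]
    }


monday = {"ab12": "192.0.2.10", "cd34": "192.0.2.20"}
tuesday = {"ab12": "198.51.100.99", "cd34": "192.0.2.20", "ef56": "203.0.113.5"}
print(moved_hosts(monday, tuesday))
```

Here the device with certificate ab12 changed address overnight but is still trivially recognisable.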
Monitor service availability
Find when services go down and when they are restored
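As a sketch of how repeated scans turn into outage data: given a time-ordered list of (timestamp, set of responding hosts), report when a host disappeared and when it came back. The data layout and the function name are assumptions for illustration.

```python
def outage_events(scans):
    """Return (host, went_down_at, restored_at) tuples from consecutive scans."""
    events = []
    down_since = {}
    for (_, previous), (when, current) in zip(scans, scans[1:]):
        for host in sorted(previous - current):   # answered last time, silent now
            down_since[host] = when
        for host in sorted(current - previous):   # silent last time, answering now
            if host in down_since:
                events.append((host, down_since.pop(host), when))
    return events


scans = [
    ("09:00", {"192.0.2.1", "192.0.2.2"}),
    ("10:00", {"192.0.2.1"}),
    ("11:00", {"192.0.2.1", "192.0.2.2"}),
]
print(outage_events(scans))
```

Since a single probe only reaches about 98% of hosts, one missed scan is weak evidence of an outage; a real monitor would probably want several consecutive misses before reporting anything.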
Being a good Internet Citizen
Scanning the Internet is not recommended…
High-speed scanning uses a large amount of bandwidth
Congest your local network, your upstream’s network, or intermediaries
Scare your destination into thinking they’re being attacked
Single-probe TCP scan results in a 40-byte SYN packet
At gigabit speeds, a /24 network will receive a packet every 10 seconds
Awesome, that’s pretty negligible for the destination
But not necessarily intermediaries, and definitely not your source
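The "every 10 seconds" figure can be checked with simple arithmetic, assuming the probes are spread evenly over a 44-minute scan (which the random address ordering roughly guarantees). The function names are made up for this sketch.

```python
def seconds_between_probes(scan_minutes, prefix_len):
    """Average gap between probes arriving in one network of the given prefix length."""
    hosts_in_prefix = 2 ** (32 - prefix_len)
    return scan_minutes * 60 / hosts_in_prefix


def inbound_bits_per_second(scan_minutes, prefix_len, packet_bytes=40):
    """Average scan traffic that network receives, counting 40-byte SYN packets."""
    return packet_bytes * 8 / seconds_between_probes(scan_minutes, prefix_len)


print(seconds_between_probes(44, 24))     # one probe roughly every 10.3 seconds per /24
print(inbound_bits_per_second(44, 24))    # about 31 bits per second
```

The destination's share is tiny, while the source and its upstream carry all of it at once.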
Being a good Internet Citizen (cont)
If you scan at gigabit speeds, get multiple dedicated IPs
Set appropriate reverse DNS records (e.g. researchscan1.university.com)
Have a web server running on each of them explaining what it is you’re
doing and why
User responses
Having done 200 Internet-wide scans
Received e-mail responses from 145 unique recipients
Mostly to WHOIS abuse e-mail
But also to institution help desk, chief security officer and departmental administrator
Immediately responded to any inquiries explaining purpose of scan
Excluded anyone who wanted to be excluded
Totalling 4 million exclusions (0.11% of the public IPv4 address space)
Received 15 actively hostile responses
Got DDoSed twice, but that’s ok, it was null-routed at the ISP level
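Honouring opt-out requests amounts to keeping a list of excluded networks and checking every target against it. A minimal sketch using Python's ipaddress module; the block list and function names are hypothetical, and a real scanner would use a faster lookup structure than a linear search.

```python
import ipaddress


def excluded_address_count(blocks):
    """Total addresses covered by the exclusion list, counting overlaps once."""
    networks = ipaddress.collapse_addresses(ipaddress.ip_network(b) for b in blocks)
    return sum(net.num_addresses for net in networks)


def is_excluded(ip, blocks):
    """True if ip falls inside any excluded network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(b) for b in blocks)


exclusions = ["192.0.2.0/24", "192.0.2.0/25", "198.51.100.0/25", "198.51.100.128/25"]
print(excluded_address_count(exclusions))
print(is_excluded("192.0.2.77", exclusions), is_excluded("203.0.113.5", exclusions))
```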
Conclusion
Super fast tool
Very configurable
No kernel modules needed, works with any NIC
Makes people angry
Callbacks to let you do anything you need with a positive result
No special hardware required
Can scan the Internet for good and evil in 45 minutes
Free DDoS retaliation attacks
No IP-level exclusion standard like HTTP’s robots.txt
Can’t actively opt-out without contacting the source first
Criticism
Utilising your full bandwidth is not exactly the best way to use your Internet connection
You will annoy more people over time
Mostly your closest ISPs and their peering partners
Endpoints will likely block your IPs entirely, so your results will get worse
It’s a goldmine for malicious users
There are better ways to monitor outages… http://www.outages.org/
Crowdsourced, but good enough
Doesn’t pollute the Internet with scans
Criticism (cont)
TLS Server Name Indication (SNI) allows hosting multiple SSL certificates on
a single IP
In order to get the right certificate the client needs to send the hostname
at the start of the handshake
TLS SNI is not supported by Internet Explorer 6, but people are using better
browsers today and so the uptake is steadily and inevitably increasing
This is also what our final report hopes to address
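To see why SNI matters for an IP-only scan, the sketch below generates a TLS ClientHello in memory with Python's ssl module, once with a hostname and once without, and checks whether the name appears in the bytes the client would send. Nothing touches the network. The function name and the hostname scan-target.example are assumptions.

```python
import ssl


def client_hello(hostname=None):
    """Return the first bytes a TLS client would send, without connecting anywhere."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = hostname is not None    # must be off when no name is given
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()                       # writes the ClientHello, then stalls
    except ssl.SSLWantReadError:
        pass                                     # expected: no server is answering
    return outgoing.read()


with_sni = client_hello("scan-target.example")
without_sni = client_hello()
print(b"scan-target.example" in with_sni)
print(b"scan-target.example" in without_sni)
```

A scanner that only knows the IP address sends the second kind of hello, so a server hosting several names can only answer with whatever certificate it treats as the default.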