Zmap
FAST INTERNET-WIDE SCANNING AND ITS SECURITY APPLICATIONS
What is Zmap?

Port scanner

0-65535 (2^16)

Protocols such as TCP, UDP, ICMP…

Common examples


HTTP TCP/80, HTTPS TCP/443, DNS TCP/53 UDP/53, SIP UDP/5060, etc…
Like Nmap

Nmap is a port scanner too, released in 1997

Still in active development

Very well-known to sysadmins and power users
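
For anyone new to port scanning, the basic question "is this port open?" can be sketched in a few lines. This is a plain TCP connect check through the operating system's stack, which is not how Zmap probes; the function name `is_port_open` is made up for illustration.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a full TCP connection to host:port succeeds."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in 0..65535 (2^16 possible values)")
    try:
        # create_connection performs the complete three-way handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: all treated as "not open" here.
        return False
```

Usage would look like `is_port_open("127.0.0.1", 443)`. Only run it against hosts you are allowed to probe.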
Why Zmap?

Nmap is great at what it does, isn’t it?

Yes, but it’s not great at what it doesn’t do

Nmap was built to scan smaller segments of hosts, not the entire Internet

Simply put: Nmap is too slow at scanning 4 billion addresses
How is Zmap better?

Architecture specifically built for Internet-scale scanning


Optimised probing

Random permutation of IPv4 address space using a cyclic multiplicative group

Skip the kernel’s TCP/IP stack: open a raw socket and send Ethernet frames directly over
the wire
No per-connection state


No TCP/IP stack, no half-open SYN connections, better performance
No retransmission

No reply? Oh well

Research shows that scanning at 1GigE achieves 98% network coverage using only a
single probe
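
The random permutation via a cyclic multiplicative group can be illustrated on a toy scale. The sketch below walks the group of integers modulo a small prime (257, covering an 8-bit "address space") instead of a prime just above 2^32 as the Zmap paper describes; the function names and the toy parameters are assumptions chosen for illustration.

```python
def is_generator(g: int, p: int, prime_factors_of_p_minus_1: list) -> bool:
    """True if g generates the whole multiplicative group modulo prime p."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors_of_p_minus_1)


def permuted_addresses(p: int, g: int, start: int, limit: int):
    """Yield every integer in [0, limit) exactly once, in pseudo-random order.

    The group elements are 1..p-1; each is shifted down by one to get a
    0-based address, and values at or above `limit` are skipped.
    """
    x = start
    while True:
        if x - 1 < limit:
            yield x - 1
        x = (x * g) % p      # one multiplication per step, no stored state
        if x == start:       # back at the start: the cycle is complete
            break


# Toy example: p = 257 is prime, p - 1 = 2^8, and 3 generates the group.
assert is_generator(3, 257, [2])
order = list(permuted_addresses(p=257, g=3, start=1, limit=256))
print(order[:4])   # [0, 2, 8, 26]
```

The appeal of this design is that the scanner only needs to remember the current value and the starting value to cover every address once, in an order that spreads probes across networks.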
How is Zmap better? (continued)

Zmap will send as quickly as the CPU and NIC will allow


Uses multiple threads in tight loops
Zmap sends raw Ethernet packets instead of relying on the TCP/IP stack


Allows for caching certain packet values

Ethernet header (excluding checksum)

Prevents: kernel routing lookup, ARP cache lookup, and Netfilter checks
Side effect of bypassing the TCP/IP stack: the kernel has no record of the probe, so it answers the reply with a RST and the connection is closed on receipt
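
To make the hand-built packets more concrete, here is a sketch that assembles an IPv4 + TCP SYN packet byte by byte, without sending anything. It stops at the IP layer (no Ethernet header) and is not Zmap's code; `build_syn` and `internet_checksum` are illustrative names. In a scan most of these fields stay constant between probes, which is what makes caching worthwhile.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IPv4 and TCP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_syn(src_ip: str, dst_ip: str, src_port: int, dst_port: int, seq: int) -> bytes:
    """Return a 40-byte IPv4 + TCP SYN packet (no options, no payload)."""
    src, dst = socket.inet_aton(src_ip), socket.inet_aton(dst_ip)

    # TCP header: data offset 5 words, SYN flag (0x02), checksum filled in below.
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, seq, 0, 5 << 4, 0x02, 65535, 0, 0)
    pseudo = src + dst + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(tcp))
    tcp = tcp[:16] + struct.pack("!H", internet_checksum(pseudo + tcp)) + tcp[18:]

    # IPv4 header: version 4, header length 5 words, TTL 64, protocol TCP.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0, 64,
                     socket.IPPROTO_TCP, 0, src, dst)
    ip = ip[:10] + struct.pack("!H", internet_checksum(ip)) + ip[12:]
    return ip + tcp


packet = build_syn("192.0.2.1", "198.51.100.7", 40000, 443, 12345)
print(len(packet))   # 40
```

The addresses above come from the documentation ranges and are placeholders only.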
Zmap design

Zmap is modular

Probe modules


Responsible for generating the actual packets sent down the wire
Output handlers

Pipe results to another process, to a database, or do other things
Probe modules

Out-of-the-box includes

TCP SYN scanning

Check if a port is open

ICMP echo scanning

Check if a “host is alive”
Write custom handlers

Your own type of probe module for your specific purpose (e.g. TCP ACK scan, etc.)
Output modules

Out-of-the-box includes


Simple text output (list of IPs that have the specified port open)
Callbacks

Scan initialisation

Probe packet sent

Response received


Trigger application-level handshake (SSH, ESMTP, etc.)

Global progress updates

Scan termination
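
The callback idea can be sketched as a small event registry. Everything here is hypothetical (the class `ScanHooks`, the event names, and the fake scan loop); it only mirrors the five callback points listed above and is not Zmap's actual interface.

```python
from typing import Callable, Dict, Iterable, List


class ScanHooks:
    """Minimal registry mapping scan events to user-supplied handlers."""

    EVENTS = ("scan_init", "probe_sent", "response_received", "progress", "scan_end")

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = {e: [] for e in self.EVENTS}

    def on(self, event: str, handler: Callable) -> None:
        if event not in self._handlers:
            raise ValueError(f"unknown event: {event}")
        self._handlers[event].append(handler)

    def emit(self, event: str, *args) -> None:
        for handler in self._handlers[event]:
            handler(*args)


def run_fake_scan(targets: List[str], responders: Iterable[str], hooks: ScanHooks) -> None:
    """Simulate a scan: no packets are sent, `responders` stands in for replies."""
    responders = set(responders)
    hooks.emit("scan_init", len(targets))
    for done, ip in enumerate(targets, start=1):
        hooks.emit("probe_sent", ip)
        if ip in responders:
            # A real handler might start an application-level handshake here.
            hooks.emit("response_received", ip)
        hooks.emit("progress", done, len(targets))
    hooks.emit("scan_end")
```

A handler registered with `hooks.on("response_received", open_hosts.append)` would collect responding addresses into a list as the scan runs.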
forge_socket kernel module

Reuse Zmap SYN packet, avoid resending SYN at application level (TCP/IP stack)

Pass raw sequence numbers and other TCP parameters to the kernel
Validation and measurement

How fast is it?

Scans the whole IPv4 address space (~2^32 addresses) in 44 minutes

Using modest hardware, but with an excellent Internet connection

Gigabit upstream needed for this, so I guess Dunedin’s pretty sorted


Not really
Test setup was on 20Gbit campus network
Is it too fast?

Nope

1,000 to 1.4M PPS

No statistically significant differences in results across these send rates
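
A quick back-of-the-envelope check of the 44-minute figure, using the 1.4M PPS rate above. The figure of roughly 3.7 billion publicly routable addresses is an assumption added here, not a number from the slides.

```python
def scan_minutes(addresses: int, packets_per_second: float) -> float:
    """Minutes needed to send one probe to each address at a fixed rate."""
    return addresses / packets_per_second / 60


FULL_IPV4 = 2**32                # every possible IPv4 address
PUBLIC_IPV4 = 3_700_000_000      # assumption: rough count after reserved ranges
RATE = 1_400_000                 # packets per second

print(round(scan_minutes(FULL_IPV4, RATE)))     # 51
print(round(scan_minutes(PUBLIC_IPV4, RATE)))   # 44
```

Under that assumption the 44 minutes works out if only publicly routable space is probed; all 2^32 addresses would take about 51 minutes at the same rate.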
Is one packet really enough?

Unlike Nmap, Zmap sends one packet without retransmissions



To allow amazing 44-minute Internet-scale scans
Hard to really measure

Hosts always going up and down

The world is a big place

Repeating scans could show vastly different results as packets are increased

Wouldn’t be a real snapshot of time
Tests showed that sending 8 or more SYN packets to a 1% sample of the Internet
found a stable number of hosts; with fewer packets, this number visibly decreases.
Extrapolating from this data, a single packet gives about 98% coverage, which is
sufficient for research purposes.
Variation with time of day

This used to not be an issue


Nmap and other tools took months to complete
Now it is…

But results show that there’s only a 3.1% difference; not too serious

Achieved by scanning port 443

Skype uses 80 or 443 when not in use by other system services, so could be
related if the router happens to forward these ports
Direct comparison with Nmap

TCP SYN scan on port 443

Using Nmap’s “insane” template – yes it’s really called that


Would take longer than 1 year to complete a full Internet scan for one single port
More customisation lowered this to 1300x slower than Zmap (63 days)
Why is Zmap so much faster?

Nmap uses the TCP/IP stack

Nmap keeps state in the kernel


And to avoid overloading it, runs serially and keeps a timeout for each probe
Zmap will accept a delayed packet at any time during the entire duration
of the scan, whereas Nmap would time out after a maximum of 600ms when
using 2 probes.
Comparison with previous studies

Scan TLS certificates

Discover hosts

Perform TLS handshakes

Collect and parse resulting certificate

Done in 10 hours with Zmap

It took SSL Observatory 3 computers and 3 months to do this in 2010


But with what network infrastructure?
Results discovered 196% more TLS certificates

5 years is a long time, SSL uptake has been high since then
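
The "perform TLS handshake, collect certificate" steps can be sketched for a single host with Python's standard library. This is a simplified stand-in for the study's tooling, not a reproduction of it; `fetch_certificate_der` and `fingerprint` are illustrative names, and validation is switched off on purpose because the goal is to collect whatever certificate is presented.

```python
import hashlib
import socket
import ssl
from typing import Optional


def fetch_certificate_der(host: str, port: int = 443, sni: Optional[str] = None,
                          timeout: float = 5.0) -> Optional[bytes]:
    """Complete a TLS handshake and return the server certificate as DER bytes."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # collect the certificate, do not judge it
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # `sni` is the hostname sent in the handshake, or None to send none.
        with ctx.wrap_socket(sock, server_hostname=sni) as tls:
            return tls.getpeercert(binary_form=True)


def fingerprint(der: bytes) -> str:
    """SHA-256 fingerprint, handy for de-duplicating collected certificates."""
    return hashlib.sha256(der).hexdigest()
```

Calling `fingerprint(fetch_certificate_der("192.0.2.10"))` would give one row of such a dataset; the address is a placeholder.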
Applications and security implications

So you can scan the IPv4 address space in an hour huh…

Where there’s good, there’s evil

Heartbleed occurred after the publication of Zmap

Serious vulnerability that had the potential to leak credentials and private information

People were scanning and creating databases of vulnerable hosts overnight

Researchers were also scanning to see how quickly it was getting patched

But hackers were also exploiting

“While this can be a powerful defensive tool for researchers—for instance, to measure the
severity of a problem or to track the application of a patch—it also creates the possibility for an
attacker with control of only a small number of machines to scan for and infect all public hosts
suffering from a new vulnerability within minutes.”
Tracking protocol adoption (stats)


Finding out new protocols, address depletion, common misconfigurations,
and vulnerabilities

Done before obviously, but typically only on small samples of the Internet

Full scans completed over extended period with massive
parallelisation and cloud providers with high bandwidth
availability
Zmap lowers the barrier to entry

I could do it, you could do it

Maybe not at gigabit speeds though
Discovering unadvertised services

Security by obscurity? Not anymore!

Example: Tor nodes

Not published for anonymity reasons

But we know how to identify Tor nodes…

Perform TLS handshake with the set of cipher suites supported by Tor’s TLSv1 handshake

Gives specific response

Check self-signed certificate

Bam: 67,432 Tor nodes on port 443, and 2,952 on port 9001
More on privacy…

So you think you’re safe with a Dynamic IP?

A lot of CPE is remotely administrable by ISPs and has an SSL certificate.


Scan often, and you’ll find their new IP
Previous work has fingerprinted SIP devices and other protocols which
inadvertently expose unique identifiers
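
How repeated scans defeat a dynamic IP can be shown with made-up data: if each scan records which certificate fingerprint was seen at which address, the same fingerprint turning up at a new address gives the device away. The function names and sample values below are hypothetical.

```python
from typing import Dict, List, Optional


def track_by_fingerprint(scans: List[Dict[str, str]]) -> Dict[str, List[Optional[str]]]:
    """Map each fingerprint to the IP it was seen at in every scan (None if unseen).

    `scans` is a list of {ip: fingerprint} snapshots in time order. If several
    IPs share a fingerprint within one scan, only the last one is kept.
    """
    history: Dict[str, List[Optional[str]]] = {}
    for index, scan in enumerate(scans):
        for ip, fp in scan.items():
            history.setdefault(fp, [None] * len(scans))[index] = ip
    return history


def moved(history: Dict[str, List[Optional[str]]]) -> List[str]:
    """Fingerprints that appeared at more than one distinct IP."""
    return sorted(fp for fp, ips in history.items()
                  if len({ip for ip in ips if ip}) > 1)


scans = [
    {"203.0.113.5": "aa11", "203.0.113.9": "bb22"},
    {"203.0.113.77": "aa11", "203.0.113.9": "bb22"},
]
print(moved(track_by_fingerprint(scans)))   # ['aa11']
```

The approach only works when the identifier really is unique per device; a default certificate shared across a whole product line would produce false matches.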
Monitor service availability

Find when services go down and when they are restored
Being a good Internet Citizen


Scanning the Internet is not recommended…

High-speed scanning uses a large amount of bandwidth

Congest your local network, your upstream’s network, or intermediaries

Scare your destination into thinking they’re being attacked
Single-probe TCP scan results in a 40-byte SYN packet

At gigabit speeds, a /24 network will receive a packet every 10 seconds

Awesome, that’s pretty negligible for the destination

But not necessarily intermediaries, and definitely not your source
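
The "packet every 10 seconds" claim can be sanity-checked with a little arithmetic. The sketch assumes that a 40-byte SYN occupies a minimum-size Ethernet frame (84 bytes on the wire including preamble and inter-frame gap) and that about 3.7 billion addresses are probed; both figures are assumptions added here.

```python
WIRE_BYTES_PER_PROBE = 84   # 64-byte minimum frame + 8 preamble + 12 inter-frame gap


def line_rate_pps(link_bits_per_second: float) -> float:
    """Maximum minimum-size frames per second a link can carry."""
    return link_bits_per_second / (WIRE_BYTES_PER_PROBE * 8)


def seconds_between_probes(addresses: int, pps: float, block_size: int = 256) -> float:
    """Average gap between probes landing in one block (a /24 has 256 addresses)."""
    scan_seconds = addresses / pps
    return scan_seconds / block_size


gigabit_pps = line_rate_pps(1_000_000_000)
print(int(gigabit_pps))                                             # 1488095
print(round(seconds_between_probes(3_700_000_000, gigabit_pps), 1)) # 9.7
```

So a single /24 sees roughly one small packet every ten seconds, while the sending side is pushing close to 1.5 million packets per second through its own uplink.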
Being a good Internet Citizen (cont)

If you scan at gigabit speeds, get multiple dedicated IPs

Set appropriate reverse DNS records (e.g. researchscan1.university.com)

Have a web server running on each of them explaining what it is you’re
doing and why
User responses

Having done 200 Internet-wide scans

Received e-mail responses from 145 unique recipients

Mostly to WHOIS abuse e-mail

But also to institution help desk, chief security officer and departmental administrator

Immediately responded to any inquiries explaining purpose of scan

Excluded anyone who wanted to be excluded


Totalling 4 million exclusions (0.11% of the public IPv4 address space)
Received 15 actively hostile responses

Got DDoSed twice, but that’s ok, it was null-routed at the ISP level
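
Honouring exclusion requests comes down to checking each target against a list of excluded networks before probing. A minimal sketch with Python's `ipaddress` module follows; the class name and the sample networks are made up, and a linear search like this would be far too slow for a real scan.

```python
import ipaddress
from typing import Iterable


class ExclusionList:
    """Networks whose owners asked not to be scanned, in CIDR notation."""

    def __init__(self, cidrs: Iterable[str]) -> None:
        self.networks = [ipaddress.ip_network(c) for c in cidrs]

    def is_excluded(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.networks)

    def excluded_count(self) -> int:
        # Collapse adjacent or overlapping networks so nothing is counted twice.
        return sum(n.num_addresses for n in ipaddress.collapse_addresses(self.networks))


optout = ExclusionList(["192.0.2.0/24", "198.51.100.0/25", "198.51.100.128/25"])
print(optout.is_excluded("192.0.2.77"))   # True
print(optout.excluded_count())            # 512
```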
Conclusion

Super fast tool

Very configurable

Callbacks to let you do anything you need with a positive result

No special hardware required

No kernel modules needed, works with any NIC

Makes people angry

Can scan the Internet for good and evil in 45 minutes

Free DDoS retaliation attacks

No IP-level exclusion standard like HTTP’s robots.txt

Can’t actively opt-out without contacting the source first
Criticism

Utilising full bandwidth is not exactly the best way to use your Internet

You will annoy more people over time


Mostly your closest ISPs and their peering partners

Endpoints will likely block your IPs entirely, so your results will get worse

It’s a goldmine for malicious users
There are better ways to monitor outages… http://www.outages.org/

Crowdsourced, but good enough

Doesn’t pollute the Internet with scans
Criticism (cont)


TLS Server Name Indication (SNI) allows hosting multiple SSL certificates on
a single IP

In order to get the right certificate you need to negotiate what the hostname is
at the start of the handshaking process

TLS SNI is not supported by Internet Explorer 6, but people are using better
browsers today and so the uptake is steadily and inevitably increasing
This is also what our final report hopes to address