info@papamike.ca

Understanding tcpdump

operating system used
OpenBSD 3.7 stable (kernel and sourcecode: Aug 5, 2005)

software used
program	version	manner of installation
tcpdump	3.4.0	OpenBSD default install

document history
version	date	changes
1.0.0	2004-04-14	conceived
1.0.1	2005-10-05	new document history scheme; minor corrections and some new examples

preamble

In order to benefit from this tutorial, you should already understand the basics of the TCP/IP suite and have some experience in network administration. The main course tonight is the packet sniffer tcpdump. I will be running all software on OpenBSD 3.7, as listed above. So before going any further, make sure that you at least have tcpdump installed and working on a Unix-like system.

The tutorial is organized like so:

introducing tcpdump
decomposing a sample packet
basic usage
intermediate usage

introducing tcpdump

Packet capture is the real-time collection of data as it travels over a network. For our purposes, a packet is any message that uses the IP protocol to communicate and has been encapsulated in various headers. tcpdump is a tool that does the capturing. It has a lot of switches, so the command line can become messy. Because of this I decided to structure the tutorial mostly by example. For a more systematic approach you can read the man page.

decomposing a sample packet

Let us start off by looking at one such packet as displayed by tcpdump:

1023873914.125606 fulton.ssh > spider.1145: P 3066603742:3066603806(64) ack 1646168027 win 17520 [tos 0x10]
                         4510 0068 7e87 4000 4006 3862 c0a8 011e
                         c0a8 0128 0016 0479 b6c8 a8de 621e 87db
                         5018 4470 1813 0000 e492 152f 23c3 8a2b
                         4ee7 dbf8 0d48 88e8 0110 2b01 4295 39f4
                         52c9 a05b 31d7 e3ae 1c62 2dbd d955 d604
                         b5d2 63d1 8fbc 4ab7 1615 b382 571c 70e0
                         a368 a03f 425b 6211

What we see here is not laid out the way the packet is actually structured. The "upper" portion (the first line) is tcpdump's summary of what's going on. The "lower" portion (the rows and columns) is the actual stuff on the wire, in hexadecimal. Note that the Ethernet frame header (which encapsulates the IP packet) is not shown here. I determined it to be consistently 14 bytes in length (6 bytes each for the destination and source MAC addresses and 2 bytes for the "ethertype").
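The 14-byte layout just described can be checked with a short Python sketch. It assumes a plain Ethernet II frame with no VLAN tag; the function split_ethernet_header and the example frame are hypothetical, with the frame made up from the MAC addresses of fulton and spider that appear later in this tutorial.

```python
import struct

def split_ethernet_header(frame: bytes):
    """Split an Ethernet II frame (no VLAN tag assumed) into its header fields."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than a 14-byte Ethernet header")
    # 6 bytes destination MAC, 6 bytes source MAC, 2-byte ethertype (network order)
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def mac(raw: bytes) -> str:
        # Colon-separated hex octets without leading zeros, as tcpdump -e prints them
        return ":".join(format(b, "x") for b in raw)

    return mac(dst), mac(src), ethertype, frame[14:]

# Hypothetical frame: to fulton, from spider, ethertype 0x0800 (IPv4),
# followed by the first two bytes of an IP header.
frame = bytes.fromhex("0050be2b448f" "00a04d03e01d" "0800" "4510")
dst, src, ethertype, rest = split_ethernet_header(frame)
print(dst, src, hex(ethertype), rest.hex())
# 0:50:be:2b:44:8f 0:a0:4d:3:e0:1d 0x800 4510
```

Everything after those 14 bytes (here just 4510) is what tcpdump shows in its hex rows.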

Here is a breakdown:

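1023873914.125606 is the time the packet came across our network card, in seconds (and microseconds) since the Unix epoch; it is not part of the packet
fulton.ssh > spider.1145 is the source host and port followed by the destination host and port of the communication taking place
P is the TCP flags field (here PUSH)
3066603742:3066603806(64) is the sequence number range covered by this packet's data, with the number of data bytes in parentheses
ack 1646168027 is the acknowledgment number, the next sequence number the sender expects to receive
win 17520 is the window size: the number of bytes that the source (sender) is currently prepared to receive
[tos 0x10] is the IP type of service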

Important: packet structure and information depend on the nature of the packet. This example is a TCP packet carrying SSH traffic from the middle of a session. Learn more about TCP and IP by reading RFC793 and RFC791 respectively.

Here is a quick reference for TCP flags:

TCP Flag	Flag in tcpdump	Flag Meaning
SYN	S	SYN packet, a session establishment request. The first part of any TCP connection.
ACK	ack	ACK packet, used to acknowledge the receipt of data from the sender. May appear in conjunction with other flags.
FIN	F	Finish flag, used to indicate the sender's intention to terminate the connection to the receiving host.
RESET	R	Indicates the sender's intention to immediately abort the existing connection.
PUSH	P	Signals the immediate push of data from the sending host to the receiving host. For interactive applications such as telnet, the main issue is the quickest response time, which this "push" flag signals.
URGENT	urg	Urgent data should take precedence over other data. For example, a Ctrl-C to terminate an FTP download.
Placeholder	.	If the packet does not have a SYN, FIN, RESET, or PUSH flag set, this placeholder is printed after the destination port. Note that it also appears in conjunction with the `ack` flag.

Back to our sample packet...

So the information in the first line is also "hidden" in the hexadecimal portion of the output, which contains the packet headers (here the IP and TCP headers). Of course you already know that a typical IP header is 20 bytes in length and a TCP header is another 20 bytes, right? And you also know that 2 hex digits make up one byte, right? Since each chunk of 4 hex digits is 2 bytes, the 40 bytes of headers lie within the first 20 chunks. The IP header is the first 10 chunks (the whole first row plus c0a8 0128 on the second row), and the TCP header is the next 10 chunks (from 0016 on the second row through 0000 on the third row):

4510 0068 7e87 4000 4006 3862 c0a8 011e
c0a8 0128 0016 0479 b6c8 a8de 621e 87db
5018 4470 1813 0000 e492 152f 23c3 8a2b
4ee7 dbf8 0d48 88e8 0110 2b01 4295 39f4
52c9 a05b 31d7 e3ae 1c62 2dbd d955 d604
b5d2 63d1 8fbc 4ab7 1615 b382 571c 70e0
a368 a03f 425b 6211

Let's see, that's 20 chunks for the headers, so we have 32 left for data. The summary line says (64), meaning 64 bytes of data, and that is exactly the remaining 32 chunks. Pretty good.

Now we will satisfy ourselves with decoding just a few choice bytes:

The second chunk (0068) is the total length of the IP datagram (headers plus data) in bytes. In our head we calculate that 0x68 is 104 in decimal (6x16+8). So, again, we score: 104 bytes is exactly our 52 chunks. Yes!
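The chunk counting and the total-length check can be reproduced in a few lines of Python. This is a sketch that simply pastes in the hex rows of our sample packet; the helper name hex_rows_to_bytes is hypothetical, and instead of assuming 20-byte headers it reads the header lengths from the IP header-length and TCP data-offset fields (both count 4-byte words).

```python
# Hex rows of the sample packet, pasted exactly as tcpdump -x printed them
HEX_DUMP = """
4510 0068 7e87 4000 4006 3862 c0a8 011e
c0a8 0128 0016 0479 b6c8 a8de 621e 87db
5018 4470 1813 0000 e492 152f 23c3 8a2b
4ee7 dbf8 0d48 88e8 0110 2b01 4295 39f4
52c9 a05b 31d7 e3ae 1c62 2dbd d955 d604
b5d2 63d1 8fbc 4ab7 1615 b382 571c 70e0
a368 a03f 425b 6211
"""

def hex_rows_to_bytes(dump: str) -> bytes:
    # Re-join the chunks with single spaces; bytes.fromhex accepts spaces between bytes
    return bytes.fromhex(" ".join(dump.split()))

packet = hex_rows_to_bytes(HEX_DUMP)
chunks = len(packet) // 2                                # one chunk = 4 hex digits = 2 bytes
total_length = int.from_bytes(packet[2:4], "big")        # the second chunk: 0x0068
ip_header_len = (packet[0] & 0x0F) * 4                   # header length in 4-byte words: 5 -> 20
tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4   # TCP data offset: 5 -> 20
data_len = total_length - ip_header_len - tcp_header_len
print(chunks, total_length, ip_header_len, tcp_header_len, data_len)   # 52 104 20 20 64
```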

The second byte of chunk 5 (the 06 in 4006) is the protocol field, which names the protocol carried in the IP payload. Bingo! It is 6, and that means TCP:

# grep tcp /etc/protocols
tcp     6       TCP             # transmission control protocol

The last 2 chunks (4 bytes) of the IP header (chunks 9 & 10) represent the destination address. Since host spider is known to be using 192.168.1.40 we can check this out:

c0 = (12 x 16) + (0 x 1) = 192
a8 = (10 x 16) + (8 x 1) = 168
01 = ( 0 x 16) + (1 x 1) =   1
28 = ( 2 x 16) + (8 x 1) =  40

Yep. Looks good.
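Here is the same decoding of the protocol field and addresses done in Python, as a sketch. It works on the first 10 chunks of our sample packet, and the small protocol table is a hand-copied excerpt of /etc/protocols rather than a lookup of the real file.

```python
# The first 10 chunks (20 bytes) of the sample packet: the IP header.
ip_header = bytes.fromhex("4510 0068 7e87 4000 4006 3862 c0a8 011e c0a8 0128")

def dotted_quad(raw: bytes) -> str:
    # Each byte is one decimal number, e.g. 0xc0 = (12 x 16) + 0 = 192
    return ".".join(str(b) for b in raw)

# Hand-copied excerpt of /etc/protocols (an assumption, not read from disk)
PROTOCOLS = {1: "icmp", 6: "tcp", 17: "udp", 50: "esp", 51: "ah"}

protocol = ip_header[9]                      # second byte of chunk 5
source = dotted_quad(ip_header[12:16])       # chunks 7 and 8
destination = dotted_quad(ip_header[16:20])  # chunks 9 and 10
print(PROTOCOLS.get(protocol, protocol), source, destination)
# tcp 192.168.1.30 192.168.1.40
```

The source address belongs to fulton, the sender in the summary line.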

Finally, let's look in the TCP header and decode the source port. The summary line (fulton.ssh) tells us we're in the midst of an SSH session, port 22. The source port occupies bytes 1 and 2 of the TCP header, which is chunk 11 of the dump:

0016 = 22 (perfect)
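The rest of the TCP header can be unpacked the same way. The sketch below uses Python's struct module on chunks 11 to 20 of the sample packet; the field layout follows RFC793, and the variable names are my own. Reassuringly, the numbers match the summary line.

```python
import struct

# Chunks 11-20 of the sample packet: the 20-byte TCP header.
tcp_header = bytes.fromhex("0016 0479 b6c8 a8de 621e 87db 5018 4470 1813 0000")

# "!" = network (big-endian) byte order. Fields: source port, destination port,
# sequence number, acknowledgment number, data offset byte, flags byte,
# window, checksum, urgent pointer.
(src_port, dst_port, seq, ack, offset_byte, flags,
 window, checksum, urgent) = struct.unpack("!HHIIBBHHH", tcp_header)

print(src_port, dst_port)   # 22 1145   -> fulton.ssh > spider.1145
print(seq, seq + 64)        # 3066603742 3066603806 -> the (64)-byte range
print(ack, window)          # 1646168027 17520
print(hex(flags))           # 0x18 = ACK + PSH -> "P" ... "ack"
```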

So that ends our look at the anatomy of a sample packet. Let's move on to the business of issuing the tcpdump command.

basic usage

We can run tcpdump by simply typing:

# tcpdump

Resulting in:

23:29:04.050167 spider.3224 > 66-28-147-032.servercentral.net.6020: . ack 36517 win 16044
23:29:04.059645 66-28-147-032.servercentral.net.6020 > spider.3224: P 36517:37969(1452) ack 1 win 5840 (DF)
23:29:04.092955 daffy.pmatulis.homeunix.net.netbios-ns > 192.168.1.255.netbios-ns: nbt-query-req-bcast
23:29:04.093587 daffy.pmatulis.homeunix.net.netbios-ns > 192.168.1.255.netbios-ns: nbt-query-req-bcast
23:29:04.093836 mudra.pmatulis.homeunix.net.netbios-ns > daffy.pmatulis.homeunix.net.netbios-ns: nbt-query-positive-resp (DF)
23:29:04.095028 mudra.pmatulis.homeunix.net.netbios-ns > daffy.pmatulis.homeunix.net.netbios-ns: nbt-query-positive-resp (DF)
23:29:04.097645 daffy.pmatulis.homeunix.net.netbios-ns > 192.168.1.255.netbios-ns: nbt-registration-req-bcast
23:29:04.098410 mudra.pmatulis.homeunix.net.netbios-ns > daffy.pmatulis.homeunix.net.netbios-ns: nbt-registration-negative-resp (DF)
23:29:04.143267 66-28-147-032.servercentral.net.6020 > spider.3224: P 37969:39421(1452) ack 1 win 5840 (DF)
23:29:04.145122 spider.3224 > 66-28-147-032.servercentral.net.6020: . ack 39421 win 13140

This is a sample of traffic on my network. From this we discover some of tcpdump's default behaviour. For instance, it attempts to resolve IP addresses to (fully qualified) host names and port numbers to service names. It also displays a timestamp in a different format from what we saw in our first sample packet. And most importantly, it only summarizes the headers and does not show the raw packet contents (hex is nowhere to be seen).

Often, we want to take a sample of traffic and save it somewhere. We accomplish this in standard Unix fashion:

# tcpdump > textfile

Proceed like this if you want to view the capture as it is being saved:

# tcpdump -l | tee textfile

That's fine, but we will now really begin our study of tcpdump. We will do this by acquainting ourselves with the concept of tracefiles (or dumpfiles). Tracefiles are binary versions of the traffic samples. They require less processing power (and space), as tcpdump writes the raw packets without parsing them into legible text. (tcpdump always works from this raw data; the readable text is produced from it, either as the packets arrive or later, when a saved file is read back.) The data is raw and quite unintelligible to humans. Here is how to save a tracefile:

# tcpdump -w tracefile &

Presumably we will one day want to use that tracefile so here we go:

# tcpdump -r tracefile

This will produce legible output as if we had run tcpdump with no switches at all. So remember, the "w" and "r" switches go together: a tracefile written with "-w" is meant to be read back with "-r".
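To make "binary version" a little more concrete: tcpdump's -w writes the libpcap savefile format, which is a 24-byte file header followed by a 16-byte record header (timestamp, captured length, original length) in front of each packet. The Python sketch below reads that classic format; read_tracefile is a hypothetical helper, it ignores the newer pcapng format, and the example builds a tiny tracefile in memory instead of reading a real capture.

```python
import struct

def read_tracefile(data: bytes):
    """Parse a classic libpcap tracefile held in memory.

    Returns (snaplen, linktype, packets), where each packet is a tuple of
    (seconds, microseconds, captured_bytes, original_length).
    """
    # The magic number is written in the capturing host's byte order.
    if data[:4] == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif data[:4] == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap tracefile")
    # File header: magic, major, minor, timezone, sigfigs, snaplen, link type
    _, _, _, _, _, snaplen, linktype = struct.unpack(endian + "IHHiIII", data[:24])
    packets, offset = [], 24
    while offset + 16 <= len(data):
        sec, usec, incl_len, orig_len = struct.unpack(endian + "IIII", data[offset:offset + 16])
        offset += 16
        packets.append((sec, usec, data[offset:offset + incl_len], orig_len))
        offset += incl_len
    return snaplen, linktype, packets

# Hypothetical tracefile: snaplen 96, link type 1 (Ethernet), and one record
# that kept only 20 bytes of a 118-byte frame.
fake_frame = bytes(range(20))
file_header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 96, 1)
record = struct.pack("<IIII", 1023873914, 125606, len(fake_frame), 118) + fake_frame
snaplen, linktype, packets = read_tracefile(file_header + record)
print(snaplen, linktype, len(packets), packets[0][3])   # 96 1 1 118
```

The gap between the captured and original lengths is exactly what the snapshot length ("-s") controls.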

intermediate usage

Let us proceed to the next level by taming tcpdump; so far he's been quite wild. There are 4 approaches at our disposal. They involve telling tcpdump:

how to behave (program control)
what packet information to show
what traffic to capture (packet filtering)
how to display that packet information (data formatting)

1. controlling tcpdump

The switches we have seen so far (l, w, r) fall in this category. Here are a few more:

# tcpdump -i xl0 -c 100 -s 400

Ok, so that tells tcpdump to listen on the xl0 network interface, capture only the first 100 packets that come across it, and suck up the first 400 bytes of data from each packet (which includes the 14 bytes from the ethernet frame). The corresponding defaults are: the "lowest" interface (example: eth0 and not eth1); an indefinite number of packets; and 96 bytes.

2. what packet information to show

For the traffic that is captured, we have a say in what information is shown. You can tell tcpdump to be "quiet", "verbose", or "very verbose"; the corresponding switches are "-q", "-v", and "-vv" and, quite frankly, they are no big deal. The more interesting switches are "-e" and "-x". The former tells tcpdump to display the source and destination MAC (Ethernet) addresses, and the latter tells it to show the contents of each packet (everything after the link-level header, up to the snapshot length) in hexadecimal. This last switch is one of the more important switches available to us, and it is what we used to display that first packet in the tutorial. Here are some examples:

Let's say we're just interested in the mac addresses:

# tcpdump -qec1

00:57:06.154240 0:a0:4d:3:e0:1d 0:50:be:2b:44:8f 60: spider.4454 > fulton.pmatulis.homeunix.net.ssh: tcp 0

The two MAC addresses you see are the source and destination, in that order, and the number after them (60) is the length of the frame in bytes. So here, spider has MAC address 0:a0:4d:3:e0:1d and fulton has MAC address 0:50:be:2b:44:8f. I used the quiet option to cut down on the clutter and requested a count of only 1 packet.

Let's go further and tell tcpdump to show the packet contents in hex as well:

# tcpdump -qec1 -x

01:04:18.762895 0:50:ba:2b:44:8f 0:a0:4b:3:e0:1d 118: fulton.pmatulis.homeunix.net.ssh > spider.4454: tcp 64 (DF) [tos 0x10]
                         4510 0068 ca56 4000 4006 ec92 c0a8 011e
                         c0a8 0128 0016 1166 fe89 6677 1140 5338
                         5018 4470 9250 0000 3b6e e13e c39e cebe
                         6ad5 5a78 8d62 090c 7dcf e1f1 37e0 9f64
                         1c54 0ef7 8534 1ec9 0240 d02d c8a1 e54b
                         fa3b

A final and very useful switch in this category has tcpdump convert the packet contents to ASCII. Normally this is used in conjunction with hex (the "x" switch), although when the ASCII switch is used, hex is often displayed whether requested or not. To display ASCII we employ the "X" switch:

# tcpdump -qec1xX

On another system this gave me info on some Samba exchanges:

21:04:34.339233 0:c:6e:77:68:8d 0:c0:4f:ac:12:3b 107: bureau.danville.ca.2343 > candyman.danville.ca.netbios-ssn: tcp 53 (DF)
0x0000   4500 005d 138d 4000 8006 635e c0a8 013f        E..]..@...c^...?
0x0010   c0a8 0120 0927 008b 8fba 2130 aa33 e823        .....'....!0.3.#
0x0020   5018 fe64 030d 0000 0000 0031 ff53 4d42        P..d.......1.SMB
0x0030   2b00 0000 0018 43c0 0000 0000 0000 0000        +.....C.........
0x0040   0000 0000 ffff fffe 0000 feff 0101 000c        ................
0x0050   004a                                           .J
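The layout of -X output (offset, hex chunks, then printable characters) is easy to imitate, which helps when reading it. The Python sketch below is not tcpdump's code: hex_ascii_dump is a hypothetical helper, and treating only characters 0x21 to 0x7e as printable is an assumption based on the output above, where the byte 0x20 (a space) shows up as a dot.

```python
def hex_ascii_dump(data: bytes, width: int = 16) -> list:
    """Format bytes as offset, 4-digit hex chunks, and printable characters."""
    pad = (width // 2) * 5 - 1           # width of a full row of chunks (39 for 16 bytes)
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        # Pair up bytes into 4-hex-digit chunks, as tcpdump does
        chunks = " ".join(row[i:i + 2].hex() for i in range(0, len(row), 2))
        # Visible ASCII (0x21-0x7e) is shown as-is; anything else becomes "."
        text = "".join(chr(b) if 0x21 <= b <= 0x7E else "." for b in row)
        lines.append(f"0x{offset:04x}   {chunks:<{pad}}        {text}")
    return lines

# The first three rows of the Samba packet above
samba = bytes.fromhex(
    "4500 005d 138d 4000 8006 635e c0a8 013f"
    " c0a8 0120 0927 008b 8fba 2130 aa33 e823"
    " 5018 fe64 030d 0000 0000 0031 ff53 4d42"
)
for line in hex_ascii_dump(samba):
    print(line)
```

The "SMB" at the end of the third row is the start of the Samba (SMB) message itself.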

3. what traffic to capture

We can capture traffic based on:

address
protocol
port
packet characteristics
combinations thereof

address filtering

An address can refer to a host, a network, a multicast/broadcast, or a mac/ethernet address
An address can be a source or a destination

So if I am interested in all IP traffic involving host mudra:

# tcpdump host mudra

Let's say I want only traffic where mudra is the destination:

# tcpdump dst host mudra

Obviously, if you use names (instead of IP addresses), name resolution must be set up.

Below we capture all traffic whose source Ethernet address is 0:a0:3b:3:e1:1d:

# tcpdump ether src host 0:a0:3b:3:e1:1d

Note the syntax: ether [src|dst] host {mac address}

Here we will display the first 100 packets involving network 192.168.1.0 with a netmask of 255.255.255.0 and save everything to a text file (called net_1.0.txt.dump):

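# tcpdump -c 100 -l net 192.168.1.0 mask 255.255.255.0 | tee net_1.0.txt.dump

Note that the switches must come first, and the filter expression must come before the pipe; anything after the | is handed to tee, not to tcpdump.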

protocol filtering

Five common protocols can be specified by name (also called keywords):
ip tcp udp icmp igmp

The remaining protocols must be specified by number.

So if I want to capture all udp related packets:

# tcpdump udp

If I wanted to do the same, but save the first 1000 UDP packets seen on interface vr0 to a tracefile (called udp1000hits.dump), keeping 300 bytes of each packet, then:

# tcpdump -c 1000 -i vr0 -s 300 -w udp1000hits.dump udp

Here I am telling tcpdump to capture ESP (IPsec Encapsulating Security Payload, protocol 50) packets:

# tcpdump proto 50

Note the syntax: proto {protocol number} (see /etc/protocols)

port filtering

For example, I am sniffing around my network for any Telnet activity:

# tcpdump port 23

Note the syntax: port {port number} (see /etc/services)

packet characteristics filtering

Remember in the beginning when we were decoding fields in the headers of a packet? Well we can filter using the same strategy. Here is the syntax:

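tcpdump "{protocol}[{bytes to bypass}:{bytes to read}] = {number}"

The ":{bytes to read}" part is optional; it may be 1, 2, or 4 and defaults to 1. The available protocols are: ip, tcp, udp, icmp, ether, arp, rarp, and fddi. What "protocol" refers to is a little murky: it tells tcpdump where to start counting. If we use "ip", counting starts at the first byte of the IP header. If we use "tcp", "udp", or "icmp", counting starts right after the IP header, at the first byte of the TCP, UDP, or ICMP header (which is also the first byte of the payload, or data portion, of the IP packet). With "ether", it starts at the Ethernet header.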

An indirect way to capture all icmp traffic is like this:

# tcpdump "ip[9]=1"

This works because the protocol field is the tenth byte of the IP header (so we bypass 9 bytes), and ICMP's protocol number is 1.

Now given that host daffy is currently pinging host spider, the following command will capture only the echo replies of spider:

# tcpdump -x "icmp[0]=0"

04:20:22.056929 spider > daffy.pmatulis.homeunix.net: icmp: echo reply
                         4500 0054 c20e 0000 8001 f4db c0a8 0128
                         c0a8 0146 0000 6354 d87b 0000 3d09 a7a8
                         000d f46d 0809 0a0b 0c0d 0e0f 1011 1213
                         1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
                         2425 2627 2829 2a2b 2c2d 2e2f 3031 3233
                         3435

Focus your attention on the first 2 digits of chunk 11. That's the first byte of the data portion of the IP packet (right after the 20-byte IP header), and it is zero. Well, the first byte of an ICMP message is its type, and by golly type "0" means echo reply.
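The byte-offset filters used in this section can be mimicked in Python to see exactly which byte each one examines. This is a simplified sketch, not libpcap's implementation: ip_byte and icmp_byte are hypothetical helpers, the packets are the leading bytes of the two ICMP captures in this section, and, as with tcpdump's icmp[...], the ICMP offset is counted from the end of the IP header, whose length comes from the header-length field. tcpdump also checks that the packet really is ICMP, which the ip[9] test imitates here.

```python
def ip_byte(packet: bytes, offset: int) -> int:
    """Like tcpdump's ip[offset]: count from the first byte of the IP header."""
    return packet[offset]

def icmp_byte(packet: bytes, offset: int) -> int:
    """Like tcpdump's icmp[offset]: count from the first byte after the IP header."""
    ip_header_len = (packet[0] & 0x0F) * 4    # header-length field counts 4-byte words
    return packet[ip_header_len + offset]

# Leading bytes of the two ICMP captures (IP header, then start of ICMP)
echo_reply = bytes.fromhex("4500 0054 c20e 0000 8001 f4db c0a8 0128 c0a8 0146 0000 6354")
echo_request = bytes.fromhex("4500 0054 8c8e 0000 ff01 ab5b c0a8 0146 c0a8 0128 08")

for name, pkt in (("reply", echo_reply), ("request", echo_request)):
    is_icmp = ip_byte(pkt, 9) == 1             # "ip[9]=1"
    print(name, is_icmp, icmp_byte(pkt, 0))    # "icmp[0]" is the ICMP type
```

The echo request is 21 bytes long, which is all that "-s 35" leaves after the 14-byte frame header.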

Let's try that again but this time we'll capture only what we need for the test (the 14 bytes for the frame header, the first 20 bytes for the IP header, and 1 byte for the ICMP type). We'll test for echo requests on the part of daffy this time (echo request is type "8"):

# tcpdump -x -s 35 "icmp[0]=8"

04:56:12.618507 daffy.pmatulis.homeunix.net > spider: [|icmp]
                         4500 0054 8c8e 0000 ff01 ab5b c0a8 0146
                         c0a8 0128 08

Pretty good. Exactly as expected.

Something else that's fun is to filter on TCP flags. The key here is to find where the flags are located in the TCP header. Well, they are found in byte 14 (we need to bypass 13 bytes). Therefore, we can do the following if we want to identify all packets whose flags are exactly RST and ACK. (A reset does not have to carry the ACK flag, so resets without it will not match; to catch every reset regardless of the other flags, test the single bit with "tcp[13] & 4 != 0" instead.)

# tcpdump -x "tcp[13]=20"

Why? Well first, it's actually only the last 6 bits of byte 14 that supply flag information:

|  -  |  -  | URG | ACK | PSH | RST | SYN | FIN |
| 128 |  64 |  32 |  16 |  8  |  4  |  2  |  1  |

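Therefore ACK and RST give us a byte of 00010100, which is 16 + 4 = 20 in decimal. The first two bits were reserved (zero) in the original TCP specification (RFC793); RFC3168 later assigned them to ECN (the CWR and ECE flags), so they may be set on modern networks. Note that tcpdump reads a plain 14 as decimal fourteen; to use hex we would have to write 0x14, which tcpdump also accepts.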
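The flag arithmetic can be checked in Python. This sketch uses the bit values of the six classic flags from RFC793; the dictionary and helper names are hypothetical. It also shows why tcp[13]=20 is an exact match: a bare reset has a different flag byte, while a test of the single RST bit still catches it.

```python
# Bit values of the six classic TCP flags in byte 14 of the TCP header (tcp[13])
FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04, "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def flag_byte(*names: str) -> int:
    """Combine flag names into the number compared by tcp[13]=..."""
    value = 0
    for name in names:
        value |= FLAGS[name]
    return value

def flag_names(value: int) -> list:
    """List the flags set in a flag byte, lowest bit first."""
    return [name for name, bit in FLAGS.items() if value & bit]

rst_ack = flag_byte("ACK", "RST")
print(rst_ack, format(rst_ack, "08b"), hex(rst_ack))   # 20 00010100 0x14
print(flag_names(0x18))   # ['PSH', 'ACK'], the flags of our first sample packet

bare_rst = flag_byte("RST")
print(bare_rst == rst_ack, (bare_rst & FLAGS["RST"]) != 0)   # False True
```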

Sometimes it may be useful to identify (datagram) fragments. In this case we filter on fields in the IP header where information regarding fragmentation is stored. Without elaborating on fragmentation theory, suffice it to say that if the "fragment offset" in the IP header is zero, then that packet is either not a fragment or it is the datagram's very first fragment.

The offset is given by bytes 7 and 8, except for their first 3 bits, which are flags: reserved (must be zero), "don't fragment" (DF), and "more fragments" (MF). The remaining 13 bits hold the fragment offset. The two bytes of any one datagram will therefore look like this:

type	bytes 7 and 8
non-fragment	0?00000000000000
first fragment	0010000000000000
intervening fragment	001xxxxxxxxxxxxx
last fragment	000xxxxxxxxxxxxx

Where ? may be "1" or "0" and the string of x's contains at least one "1".

So if we AND those two bytes with the following binary number (the mask), a result of zero tells us the datagram is one of the first two types:

0001111111111111

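Written with that binary mask, the test for non-fragments or "frag zeroes" (initial fragments) is:

ip[6:2] & 0001111111111111 = 0

Why? Well, we bypass 6 bytes and examine the next two (bytes 7 and 8). We then perform the AND operation and require a result of zero.

However, tcpdump has no binary notation (a number with a leading zero is read as octal), so we must give the mask in hexadecimal instead. The command that sniffs non-fragments and initial fragments is therefore:

# tcpdump "ip[6:2] & 0x1fff=0"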
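Unlike tcpdump, Python does accept binary literals, so the mask can be written both ways and the table of fragment types checked directly. This is a sketch: classify is a hypothetical helper that takes the 16-bit value tcpdump calls ip[6:2], and the sample values are made up apart from 0x4000, which is what our first sample packet carries (only DF set).

```python
MASK = 0b0001111111111111        # binary mask from the text...
assert MASK == 0x1FFF == 8191    # ...and the hex/decimal forms tcpdump accepts

MORE_FRAGMENTS = 0x2000          # the MF bit (third bit of byte 7)

def classify(flags_and_offset: int) -> str:
    """Classify a datagram from the 16-bit value tcpdump calls ip[6:2]."""
    offset = flags_and_offset & MASK              # 13-bit fragment offset
    more = flags_and_offset & MORE_FRAGMENTS
    if offset == 0:
        return "first fragment" if more else "non-fragment"
    return "intervening fragment" if more else "last fragment"

for value in (0x4000, 0x0000, 0x2000, 0x20B9, 0x00B9):
    passes = (value & MASK) == 0     # the same test as "ip[6:2] & 0x1fff=0"
    print(f"{value:016b}", classify(value), passes)
```

Only the first three values pass the filter, matching the first two rows of the table.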

compound filtering

Compound filtering is where we combine multiple filtering options in the same tcpdump command. This is very common on real networks. I can either specify exactly what I want or specify what I don't want. We do this through the use of three operators:

Negation -- (not or !)
Concatenation -- (and or &&)
Alternation -- (or or ||)

An example: Say I am working with tcpdump remotely by SSH and I want to take a sample capture of my network traffic and store it in a tracefile. Now I don't want to contaminate my capture with the remote session traffic itself so here is what I must do:

# tcpdump -w tracefile not port 22

But wait. This is no good. What if there is other SSH activity on my network that I do want to capture? Hmm. I will tell tcpdump to forget only about the SSH activity between me (spider) and the remote host (fulton):

# tcpdump -w tracefile not port 22 and host spider and host fulton

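Good idea, but the logic is all wrong: "not" binds more tightly than "and", so it applies only to "port 22", and the command captures nothing but the non-SSH traffic between spider and fulton. We need to group the expression with parentheses: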

# tcpdump -w tracefile not (port 22 and host spider and host fulton)

We're saying to ignore anything that makes the entire contents of the parentheses true. This means packets that involve port 22 AND host spider AND host fulton will be ignored. But the shell treats parentheses as special characters and will complain before tcpdump even sees them, so we must quote the expression. The real final command is this:

# tcpdump -w tracefile not "(port 22 and host spider and host fulton)"
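Python's not also binds more tightly than and, so the difference between the two filters can be sketched as two small predicates. The packet model here (one port number and the set of hosts involved) is a hypothetical simplification, not how tcpdump represents packets; in tcpdump, port 22 matches either the source or the destination port.

```python
def naive(pkt: dict) -> bool:
    # not port 22 and host spider and host fulton
    # ("not" applies only to "port 22")
    return (not pkt["port"] == 22) and "spider" in pkt["hosts"] and "fulton" in pkt["hosts"]

def grouped(pkt: dict) -> bool:
    # not "(port 22 and host spider and host fulton)"
    return not (pkt["port"] == 22 and "spider" in pkt["hosts"] and "fulton" in pkt["hosts"])

packets = [
    {"port": 22, "hosts": {"spider", "fulton"}},  # my own remote SSH session
    {"port": 22, "hosts": {"daffy", "mudra"}},    # someone else's SSH session
    {"port": 80, "hosts": {"daffy", "mudra"}},    # unrelated web traffic
    {"port": 80, "hosts": {"spider", "fulton"}},  # web traffic between spider and fulton
]
for pkt in packets:
    print(pkt["port"], sorted(pkt["hosts"]), "naive:", naive(pkt), "grouped:", grouped(pkt))
```

The naive filter keeps only the last packet, while the grouped one drops only my own session.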

One big bonus is that we can specify the filtering expression in a filter file. We work with such files using the "-F" switch:

# cat > filterfile
dst host spider and (udp or proto 51) and not (src host daffy or src host fulton)
Ctrl-D
# tcpdump -F filterfile

So we can save compound filters that interest us and capture traffic with ease. Because the file is read by tcpdump and not by the shell, the parentheses need no quotes there (quotes would actually confuse tcpdump). By the way, the above filter captures all IP traffic destined for host spider that carries either UDP or AH (protocol 51) and was not sent by host daffy or host fulton.

4. how to display packet information

The fourth and final approach at taming tcpdump is specifying how our data is to be displayed. It is important to realize that our filter (or the lack of one) determines what data is actually captured. After that, we tell tcpdump how to output that data. We have already looked at how to output certain kinds of data (subsection 2). In this section, we look at how to format that output.

We can format data output in the following ways:

"-a" -- also try to convert network and broadcast addresses to names (host names are resolved by default)
"-n" -- remove all name resolution (show addresses and ports as numbers)
"-N" -- remove the domain part of host names (fulton instead of fulton.pmatulis.homeunix.net)
"-f" -- show "foreign" (remote) addresses numerically
"-t" -- remove timestamp
"-tt" -- no timestamp formatting (seconds since the epoch)
"-ttt" -- format timestamp with day and month

So there's nothing spectacular here. The most we can say is that removing name resolution can speed things up. Removing timestamps also unclutters the screen quite well. I leave it up to you, dear reader, to experiment with these formatting options.
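As an example of what "-tt" output means, the raw timestamp of our first sample packet can be turned back into the default HH:MM:SS.microseconds form. This Python sketch is an approximation: default_timestamp is a hypothetical helper, and it uses UTC, whereas tcpdump prints local time, so your clock reading will differ by your time zone offset.

```python
from datetime import datetime, timezone

def default_timestamp(epoch_text: str) -> str:
    """Convert a -tt timestamp ("seconds.microseconds") to HH:MM:SS.microseconds in UTC."""
    seconds, micros = epoch_text.split(".")
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.strftime("%H:%M:%S") + "." + micros

print(default_timestamp("1023873914.125606"))                      # 09:25:14.125606
print(datetime.fromtimestamp(1023873914, tz=timezone.utc).date())  # 2002-06-12
```

Splitting the string instead of converting it to a float keeps the microseconds exact.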

Let me end with two more examples.

Let's say I am working at my desk and I want to keep tabs on what bastards are bombarding my firewall with junk. I could issue the following command and take a look from time to time:

# tcpdump -i tun0 -nq \
not "(port 22 and host spider)" \
and not "(port 53 or 80 or 110 or 119 or 443)" \
and dst host 216.209.50.188

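Line by line:
-i tun0 -- on my system the interface is tun0
not "(port 22 and host spider)" -- so as not to have my SSH session interfere
and not "(port 53 or 80 or 110 or 119 or 443)" -- so as not to show valid packets
dst host 216.209.50.188 -- the IP currently assigned to tun0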

What if I am interested in what my firewall is actually sending out onto the internet? I can remove some known good outgoing traffic and observe the rest (which ideally should not exist):

# tcpdump -i tun0 -nq \
not "(port 20 or 21 or 25 or 53 or 80 or 110 or 119 or 123 or 443)" \
and not icmp \
and src host 216.209.50.188

I became alarmed when I saw outgoing traffic to Microsoft ports 135 and 445. After some frantic research I discovered that I was indeed sending responses back to internet hosts, but only because of the way my OpenBSD firewall was set up: it was resetting connections instead of allowing them to time out.