Get DuckDB swimming in your Packet Pond in 10 Minutes!

Get DuckDB swimming in your Packet Pond in 10 Minutes!

At some point over the last few years I discovered DuckDB as an alternative lightweight SQL database that is super fast and has support for a wide variety of file formats, but until now I'd only used CSV and Parquet, but recently a pcap extension was released, so I thought I'd try it out! Fortunately now there are binaries available so you don't have to go through the headache of a building at DuckDB extension.

Installation

If you don't already have it installed, install DuckDB 1.0 on on your platform.

Following the instructions from the updated blog post at the bottom, you then install the pre-compiled extension.

$ duckdb -unsigned
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
D SET custom_extension_repository='https://w3c2.c20.e2-5.dev/ppcap/latest';
D install ppcap;
D load ppcap;

If you want to see all the extensions available, use the following query:

D select extension_name from duckdb_extensions() where loaded = True;
┌────────────────┐
│ extension_name │
│    varchar     │
├────────────────┤
│ autocomplete   │
│ fts            │
│ icu            │
│ inet           │
│ jemalloc       │
│ json           │
│ parquet        │
│ ppcap          │
│ shell          │
│ tpch           │
├────────────────┤
│    10 rows     │
└────────────────┘

This creates the following files in ~/.duckdb

$ ls -alR
.:
total 12
drwxr-xr-x  3 mfranz mfranz 4096 Aug 31 12:54 .
drwxr-x--- 41 mfranz mfranz 4096 Aug 31 12:56 ..
drwxr-xr-x  3 mfranz mfranz 4096 Aug 31 12:54 extensions

./extensions:
total 12
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 .
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 ..
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 v1.0.0

./extensions/v1.0.0:
total 12
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 .
drwxr-xr-x 3 mfranz mfranz 4096 Aug 31 12:54 ..
drwxr-xr-x 2 mfranz mfranz 4096 Aug 31 12:55 linux_amd64_gcc4

./extensions/v1.0.0/linux_amd64_gcc4:
total 22980
drwxr-xr-x 2 mfranz mfranz     4096 Aug 31 12:55 .
drwxr-xr-x 3 mfranz mfranz     4096 Aug 31 12:54 ..
-rw-rw-r-- 1 mfranz mfranz 19712350 Aug 31 12:54 httpfs.duckdb_extension
-rw-rw-r-- 1 mfranz mfranz      131 Aug 31 12:54 httpfs.duckdb_extension.info
-rw-rw-r-- 1 mfranz mfranz  3797542 Aug 31 12:55 ppcap.duckdb_extension
-rw-rw-r-- 1 mfranz mfranz      144 Aug 31 12:55 ppcap.duckdb_extension.info

Loading your PCAP

You don't have to install the extension each time so the next time you run, you just load it it then create a table from your pcap

$ duckdb -unsigned
v1.0.0 1f98600c2c
Enter ".help" for usage hints.
D 
D load ppcap;
D 
D create table bridged as from read_pcap("~/bridged.pcap");

All your normal SQL commands work

D describe bridged;
┌─────────────┬─────────────┬──────┬─────┬─────────┬───────┐
│ column_name │ column_type │ null │ key │ default │ extra │
├─────────────┼─────────────┼──────┼─────┼─────────┼───────┤
│ timestamp   │ TIMESTAMP   │ YES  │     │         │       │
│ source_ip   │ VARCHAR     │ YES  │     │         │       │
│ dest_ip     │ VARCHAR     │ YES  │     │         │       │
│ source_port │ INTEGER     │ YES  │     │         │       │
│ dest_port   │ INTEGER     │ YES  │     │         │       │
│ length      │ INTEGER     │ YES  │     │         │       │
│ tcp_session │ VARCHAR     │ YES  │     │         │       │
│ source_mac  │ VARCHAR     │ YES  │     │         │       │
│ dest_mac    │ VARCHAR     │ YES  │     │         │       │
│ protocols   │ VARCHAR[]   │ YES  │     │         │       │
│ payload     │ BLOB        │ YES  │     │         │       │
│ tcp_flags   │ VARCHAR[]   │ YES  │     │         │       │
│ tcp_seq_num │ UINTEGER    │ YES  │     │         │       │
└─────────────┴─────────────┴──────┴─────┴─────────┴───────┘

D select count(*) from bridged;
┌──────────────┐
│ count_star() │
├──────────────┤
│ 104851       │
└──────────────┘

D .mode line
D pragma database_size;
database_name = memory
database_size = 0 bytes
   block_size = 0
 total_blocks = 0
  used_blocks = 0
  free_blocks = 0
     wal_size = 0 bytes
 memory_usage = 114.6 MiB
 memory_limit = 37.5 GiB

Slicing and Dicing with SQL

What are the IP protocols active on the network. Mostly UDP! (I do wish this field had been an INTEGER instead of a VARCHAR.

D select protocols, count(*) as cnt from bridged group by protocols order by cnt desc;
┌─────────────────────────┬────────┐
│        protocols        │  cnt   │
├─────────────────────────┼────────┤
│ [Ethernet, IP, UDP]     │ 101644 │
│ [Ethernet, IP, TCP]     │ 2831   │
│ [Ethernet, IP, Unknown] │ 346    │
│ [Ethernet, IP, ICMP]    │ 30     │
└─────────────────────────┴────────┘

Let's find all the IPs on my local network

D SELECT distinct (source_ip) from bridged where source_ip like '192.168.12%';
┌────────────────┐
│   source_ip    │
├────────────────┤
│ 192.168.12.131 │
│ 192.168.12.159 │
│ 192.168.12.113 │
│ 192.168.12.1   │
└────────────────┘

Let's go down to the link layer and see where most of the frames are going from!

D SELECT DISTINCT (source_mac, dest_mac) from bridged;
┌────────────────────────────────────────┐
│     main.row(source_mac, dest_mac)     │
├────────────────────────────────────────┤
│ (00:16:3e:9b:21:27, 33:33:00:00:00:16) │
│ (00:16:3e:7d:5c:ca, 33:33:00:00:00:16) │
│ (d6:06:10:3e:e9:ed, 56:1c:4b:ac:c3:15) │
│ (3a:99:83:4e:f6:9f, 33:33:00:00:00:16) │
│ (c8:99:b2:e1:47:17, 01:80:c2:00:00:13) │
│ (3a:99:83:4e:f6:9f, 33:33:00:00:00:02) │
│ (56:1c:4b:ac:c3:15, d6:06:10:3e:e9:ed) │
│ (56:1c:4b:ac:c3:15, 01:00:5e:7f:ff:fa) │
│ (c8:99:b2:e1:47:17, 33:33:00:00:00:01) │
│ (3a:99:83:4e:f6:9f, 01:00:5e:00:00:fb) │
│ (84:47:09:19:c3:16, 33:33:00:00:00:16) │
│ (00:16:3e:53:19:f5, 33:33:00:00:00:16) │
│ (00:16:3e:aa:1b:f3, 33:33:00:00:00:16) │
│ (52:54:00:8b:d8:ac, 33:33:00:00:00:16) │
│ (56:1c:4b:ac:c3:15, c8:99:b2:e1:47:17) │
│ (d6:06:10:3e:e9:ed, 33:33:00:00:00:16) │
│ (c8:99:b2:e1:47:1c, 01:80:c2:00:00:13) │
│ (c8:99:b2:e1:47:17, 56:1c:4b:ac:c3:15) │
│ (3a:99:83:4e:f6:9f, 33:33:00:00:00:fb) │
│ (3a:99:83:4e:f6:9f, 01:00:5e:00:00:16) │
│ (c8:99:b2:e1:47:17, 33:33:ff:45:46:08) │
└────────────────────────────────────────┘



D select dest_mac, count(*) as cnt from bridged group by dest_mac order by cnt desc;
┌───────────────────┬───────┐
│     dest_mac      │  cnt  │
├───────────────────┼───────┤
│ 56:1c:4b:ac:c3:15 │ 52469 │
│ d6:06:10:3e:e9:ed │ 51857 │
| c8:99:b2:e1:47:17 │ 492   │
│ 33:33:00:00:00:16 │ 16    │
│ 01:00:5e:7f:ff:fa │ 4     │
│ 33:33:00:00:00:fb │ 3     │
│ 01:00:5e:00:00:fb │ 3     │
│ 01:80:c2:00:00:13 │ 2     │
│ 01:00:5e:00:00:16 │ 2     │
│ 33:33:00:00:00:02 │ 1     │
│ 33:33:ff:45:46:08 │ 1     │
│ 33:33:00:00:00:01 │ 1     │
└───────────────────┴───────┘

The first entry is my workstation so it makes sense most of the comms are with my router.

D select source_ip, count(*) as cnt from bridged where dest_mac = '56:1c:4b:ac:c3:15' and source_ip like '192.168.12%' group by source_ip order by cnt desc limit 25;
┌────────────────┬───────┐
│   source_ip    │  cnt  │
├────────────────┼───────┤
│ 192.168.12.159 │ 51977 │
│ 192.168.12.1   │ 36    │
└────────────────┴───────┘

Let's see the unique ports

D select distinct(dest_port) from bridged where dest_mac = 'c8:99:b2:e1:47:17' and source_port != 0;
┌───────────┐
│ dest_port │
├───────────┤
│ 1900      │
│ 3478      │
│ 443       │
│ 80        │
│ 53        │
│ 5351      │
└───────────┘


D select dest_port, count(*) as cnt from bridged where dest_mac = 'c8:99:b2:e1:47:17' and source_port != 0 group by dest_port order by cnt desc;
┌───────────┬─────┐
│ dest_port │ cnt │
├───────────┼─────┤
│ 443       │ 181 │
│ 3478      │ 71  │
│ 80        │ 37  │
│ 53        │ 26  │
│ 5351      │ 8   │
│ 1900      │ 2   │

Looking at the packets!

Getting used to switching output formats is useful depending on your screen resolution.

D select * from bridged where length < 15 limit 3;
  timestamp = 2024-08-31 16:56:25
  source_ip = 192.168.12.131
    dest_ip = 209.177.145.120
source_port = 46017
  dest_port = 443
     length = 14
tcp_session = 192.168.12.131:46017 -> 209.177.145.120:443
 source_mac = 56:1c:4b:ac:c3:15
   dest_mac = c8:99:b2:e1:47:17
  protocols = [Ethernet, IP, TCP]
    payload = \x00\x00\x01\x01\x08\x0A\xF5\xB3K.\x8B\xFB\x96\x1C
  tcp_flags = [ACK]
tcp_seq_num = 3986401156

  timestamp = 2024-08-31 16:56:25
  source_ip = 192.168.12.159
    dest_ip = 192.168.12.131
source_port = 24679
  dest_port = 6443
     length = 14
tcp_session = 192.168.12.159:24679 -> 192.168.12.131:6443
 source_mac = d6:06:10:3e:e9:ed
   dest_mac = 56:1c:4b:ac:c3:15
  protocols = [Ethernet, IP, TCP]
    payload = \x00\x00\x01\x01\x08\x0A\xAE\x87H\xD5\xBC\xF6d\x12
  tcp_flags = [ACK]
tcp_seq_num = 1817817319

  timestamp = 2024-08-31 16:56:25
  source_ip = 192.168.12.159
    dest_ip = 192.168.12.131
source_port = 24679
  dest_port = 6443
     length = 14
tcp_session = 192.168.12.159:24679 -> 192.168.12.131:6443
 source_mac = d6:06:10:3e:e9:ed
   dest_mac = 56:1c:4b:ac:c3:15
  protocols = [Ethernet, IP, TCP]
    payload = \x00\x00\x01\x01\x08\x0A\xAE\x87H\xD5\xBC\xF6d\x12
  tcp_flags = [ACK]
tcp_seq_num = 1817817319

References

Reading PCAP Files (Directly) With DuckDB - rud.is
2024-08-30 UPDATE: Binary versions of this extension are available for amd64 Linux (linux_amd64 & linux_amd64_gcc4) and Apple Silicon. (osx_arm64). $ duckdb -unsigned v1.0.0 1f98600c2c Enter ”.help” for usage hints. Connected to a transient in-memory database. Use ”.open FILENAME” to reopen on a persistent database. D SET custom_extension_repository=‘https://w3c2.c20.e2-5.dev/ppcap/latest’; D INSTALL ppcap; D LOAD ppcap; 2024-08-29 UPDATE: […]
matano-scripts/data/cloudtrail/duckdb at main · mdfranz/matano-scripts
Random Things for Interacting with Matano. Contribute to mdfranz/matano-scripts development by creating an account on GitHub.