What is Conntrack?

Posted on Apr 30, 2025 • 5 min read

In the last blog, I explored the NFTables mode for Kube-Proxy and looked into its potential performance gains over the IPTables mode.

One thing I skipped over while writing the last entry was how the different kube-proxy modes were benchmarked against each other. As I started digging into how these metrics were captured, I kept seeing mentions of Conntrack, a Linux networking submodule that was totally unknown to me. Fast-forward a week, and I am compiling my notes here on Conntrack and how it is used in the Linux networking stack.

What is conntrack?

Conntrack is the connection tracking submodule of the Linux kernel. It tracks connections and their states, which makes it possible to support widely used use-cases such as NAT and stateful firewalls.

It extracts the connection tuple (Src IP, Src Port, Dest IP, Dest Port, Protocol) from the packets received and uses that to identify the packets on the same connection. It maintains these connection entries in a database, and exposes these connection details to other layers for supporting stateful use-cases.
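To make the tuple extraction concrete, here is a minimal Python sketch that reads the five fields from a raw IPv4 packet carrying TCP or UDP. The names `ConnTuple`, `extract_tuple` and `build_packet` are made up for this post and are not kernel APIs; the kernel does the equivalent work in C on its own packet buffers.

```python
import socket
import struct
from typing import NamedTuple


class ConnTuple(NamedTuple):
    """The 5-tuple described above (hypothetical name, not a kernel type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP


def extract_tuple(packet: bytes) -> ConnTuple:
    """Read the 5-tuple from a raw IPv4 packet carrying TCP or UDP."""
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    protocol = packet[9]
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # TCP and UDP headers both start with 16-bit source and destination ports.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return ConnTuple(src_ip, src_port, dst_ip, dst_port, protocol)


def build_packet(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 protocol: int) -> bytes:
    """Build a minimal IPv4 header (checksum left at 0) plus ports, for the demo only."""
    ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 24, 0, 0, 64, protocol, 0,
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    return ip_header + struct.pack("!HH", src_port, dst_port)


# Two packets of the same flow yield the same tuple, so they map to one entry.
first = extract_tuple(build_packet("10.0.0.1", 40000, "10.0.0.2", 80, 6))
second = extract_tuple(build_packet("10.0.0.1", 40000, "10.0.0.2", 80, 6))
same_connection = first == second
```

Because the tuple is a plain immutable value, it can be used directly as a dictionary key, which is the property the connection table relies on.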

What does connection mean here?

Before continuing our discussion on how conntrack is used and implemented, I would like to highlight that a connection in conntrack is conceptually different from a connection at the L4 (transport) layer, such as a TCP connection.

TCP is a connection-oriented L4 protocol, which means each peer maintains the TCP state required for packet retransmission and other features. UDP, on the other hand, is a connectionless protocol with no built-in support for packet retransmission or retries.

A connection in Conntrack is simply the flow identified by the connection tuple defined above. This means even UDP, which is connectionless from the L4 perspective, can have a connection entry identified by a connection tuple. Not just that, even ICMP, which is an L3 protocol, is tracked by conntrack.

Each protocol has its own implementation of connection tracking. In the case of ICMP, Conntrack treats an ICMP echo request and its reply as a connection. Since ICMP has no ports, it generates the connection tuple from ICMP header values such as the type, code and identifier.
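As a rough illustration of the ICMP case, the sketch below derives a flow key from an ICMP echo header so that a request and its reply land on the same key. It is a simplification under stated assumptions: the name `icmp_flow_key` is hypothetical, only echo request (type 8) and echo reply (type 0) are handled, and the reply is normalised to the request's direction instead of being stored as a separate tuple.

```python
import struct
from typing import NamedTuple

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


class IcmpFlowKey(NamedTuple):
    """Hypothetical key for an echo exchange, always in the request's direction."""
    src_ip: str
    dst_ip: str
    icmp_id: int


def icmp_flow_key(src_ip: str, dst_ip: str, icmp_header: bytes) -> IcmpFlowKey:
    """Build a flow key from the first 8 bytes of an ICMP echo header."""
    # Echo header layout: type (1), code (1), checksum (2), identifier (2), sequence (2).
    icmp_type, _code, _checksum, icmp_id, _seq = struct.unpack(
        "!BBHHH", icmp_header[:8]
    )
    if icmp_type == ICMP_ECHO_REPLY:
        # A reply travels in the opposite direction, so swap the addresses back.
        src_ip, dst_ip = dst_ip, src_ip
    elif icmp_type != ICMP_ECHO_REQUEST:
        raise ValueError(f"unsupported ICMP type in this sketch: {icmp_type}")
    return IcmpFlowKey(src_ip, dst_ip, icmp_id)


# A ping from 10.0.0.1 to 10.0.0.2 and its reply share the identifier 0x1234.
request = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0x1234, 1)
reply = struct.pack("!BBHHH", ICMP_ECHO_REPLY, 0, 0, 0x1234, 1)
request_key = icmp_flow_key("10.0.0.1", "10.0.0.2", request)
reply_key = icmp_flow_key("10.0.0.2", "10.0.0.1", reply)
```

The identifier plays the role that ports play for TCP and UDP: two concurrent pings between the same hosts use different identifiers and therefore get different keys.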

How does it work?

To track the connections, Conntrack builds on the Netfilter framework in the Linux kernel:

  1. Attaches callbacks to the Netfilter hooks exposed by the Linux kernel.

    Netfilter hooks are different points in the lifecycle of a packet in the Linux networking stack where the kernel supports attaching callbacks, so that different modules can inspect the packets and perform certain actions. The image below shows the different netfilter hooks in the Linux kernel.

    Figure: Netfilter Hooks in Linux Kernel

  2. Sets up the Connection table in the kernel.

    • A hash table is created by the conntrack module in the kernel. The table uses the connection tuple as the key. You can think of it as the primary key identifying a unique connection.

  3. Extracts information from the hooked packets and updates the connection entries in the connection table.

    Conntrack extracts the connection tuple from the L3/L4 packet headers and tries to find the related connection in the connection table:

    • If no existing entry is found, it creates a new conntrack entry for the connection.
    • If an existing entry is found, it updates the related conntrack entry state and statistics in the connection table.
  4. Garbage collects stale connections from the connection table.
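
Steps 2 to 4 above can be modelled with a toy connection table in Python. This is only a sketch under loud assumptions: a plain `dict` stands in for the kernel hash table, the class name `ConnTable`, the 30-second timeout and the two state names are made up for illustration, and the real module uses per-protocol state machines and timeouts.

```python
from dataclasses import dataclass
from typing import Dict, Hashable


@dataclass
class Entry:
    """State and statistics kept per tracked connection (illustrative fields)."""
    state: str
    packets: int
    last_seen: float


class ConnTable:
    """Toy connection table keyed by the connection tuple."""

    def __init__(self, timeout: float = 30.0) -> None:
        self.timeout = timeout  # assumed idle timeout in seconds
        self.entries: Dict[Hashable, Entry] = {}

    def on_packet(self, key: Hashable, now: float) -> Entry:
        """Step 3: look up the tuple, then create or update its entry."""
        entry = self.entries.get(key)
        if entry is None:
            # No existing entry: this packet starts a new tracked connection.
            entry = Entry(state="NEW", packets=0, last_seen=now)
            self.entries[key] = entry
        else:
            # Simplification: any later packet marks the flow as established.
            entry.state = "ESTABLISHED"
        entry.packets += 1
        entry.last_seen = now
        return entry

    def collect_garbage(self, now: float) -> int:
        """Step 4: remove entries idle for longer than the timeout."""
        stale = [key for key, entry in self.entries.items()
                 if now - entry.last_seen > self.timeout]
        for key in stale:
            del self.entries[key]
        return len(stale)
```

Passing `now` in explicitly, instead of reading the clock inside the class, keeps the sketch deterministic and easy to experiment with.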

Conntrack by itself doesn’t manipulate or drop packets. Its responsibility is to keep track of the state of different connections and expose that state to other submodules, which then make those decisions.

Figure: Interaction between Conntrack, NAT and IPTables modules

The two major sub-modules using this exposed state from conntrack are:

  • IPTables/NFTables for stateful firewalls: Let’s consider the case where a system administrator wants to drop any TCP ACK packets received for which no connection has been seen already. You would use the state from conntrack to set up iptables rules that drop ACK packets which don’t have an entry in the conntrack table.
  • NAT: Network Address Translation allows multiple devices (physical or virtual) to share a common public IP. Conntrack tracks the mapping between the originating IP:Port and the external IP:Port used by the NAT gateway. This allows the NAT gateway to translate the addresses when transmitting packets in and out of the network.
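
The stateful firewall case from the first bullet can be sketched in a few lines of Python. This is not how iptables or nftables evaluate rules; it only shows the shape of the decision, with a set of tracked flows standing in for the conntrack table. The names `Flow` and `verdict` are hypothetical.

```python
from typing import NamedTuple, Set


class Flow(NamedTuple):
    """Connection tuple for a TCP flow (illustrative)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def verdict(tracked: Set[Flow], flow: Flow, tcp_flags: Set[str]) -> str:
    """Drop bare ACK packets that do not belong to any tracked connection."""
    is_bare_ack = "ACK" in tcp_flags and "SYN" not in tcp_flags
    if is_bare_ack and flow not in tracked:
        return "DROP"
    return "ACCEPT"


# One connection has already been seen; the other flow is unknown.
known = Flow("10.0.0.1", 40000, "10.0.0.2", 80)
unknown = Flow("10.0.0.9", 41000, "10.0.0.2", 80)
tracked_flows = {known}

known_ack = verdict(tracked_flows, known, {"ACK"})
stray_ack = verdict(tracked_flows, unknown, {"ACK"})
new_syn = verdict(tracked_flows, unknown, {"SYN"})
```

The firewall rule itself stays stateless; all the memory of which flows exist lives in the tracked set, which is the division of labour described above.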

If you have ever used Docker or Kubernetes to run containers, you have already encountered the NAT use-case. Docker and Kubernetes both use the NAT functionality to provide individual IPs to containers/pods running on the nodes.

💡 Typically, each tracked connection ends up under two different keys in the hash table, one for the original flow and the other for the opposite reply flow. This bi-directional tracking is required for NAT, where the IP:Ports can differ between the two directions and both need to be tracked.
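
A small Python sketch of the two-key idea, assuming source NAT: the reply tuple is addressed to the NAT gateway's external IP:Port instead of the originating one, and both tuples point at the same connection object. The names `FlowTuple`, `reply_tuple` and `track`, and all addresses used, are illustrative only.

```python
from typing import Dict, NamedTuple, Optional


class FlowTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def reply_tuple(original: FlowTuple, nat_ip: Optional[str] = None,
                nat_port: Optional[int] = None) -> FlowTuple:
    """Tuple expected on reply packets; with source NAT they target the NAT address."""
    return FlowTuple(
        src_ip=original.dst_ip,
        src_port=original.dst_port,
        dst_ip=nat_ip if nat_ip is not None else original.src_ip,
        dst_port=nat_port if nat_port is not None else original.src_port,
        protocol=original.protocol,
    )


def track(table: Dict[FlowTuple, dict], original: FlowTuple,
          nat_ip: Optional[str] = None, nat_port: Optional[int] = None) -> dict:
    """Register one connection under both its original and its reply tuple."""
    conn = {"original": original,
            "reply": reply_tuple(original, nat_ip, nat_port)}
    table[conn["original"]] = conn
    table[conn["reply"]] = conn
    return conn


# A container at 172.17.0.2 reaches a server through a gateway at 203.0.113.5.
table: Dict[FlowTuple, dict] = {}
outgoing = FlowTuple("172.17.0.2", 40000, "198.51.100.7", 80, "tcp")
conn = track(table, outgoing, nat_ip="203.0.113.5", nat_port=50000)

# The reply arrives addressed to the gateway, yet resolves to the same connection.
incoming = FlowTuple("198.51.100.7", 80, "203.0.113.5", 50000, "tcp")
same_entry = table[incoming] is table[outgoing]
```

Without NAT the reply tuple is just the original tuple mirrored; with NAT the two differ, which is why a single key would not be enough.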

Conntrack events

One more amazing feature I would like to highlight here is Conntrack events. Conntrack publishes events that allow userspace programs to get insights into the lifecycle and state changes of connections. These events can be used to troubleshoot and monitor network traffic. All the events can have timestamps attached, giving users an accurate picture of what happened in the network and when.
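
As a sketch of consuming these events from userspace, the Python snippet below parses a text line shaped like the output of `conntrack -E` from conntrack-tools. The sample line is hand-written to resemble that output and is not captured from a real system, so treat the exact layout as an assumption; `parse_event` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Hand-written sample resembling a `conntrack -E` line (an assumption, not a capture).
SAMPLE = ("    [NEW] tcp      6 120 SYN_SENT src=192.168.1.10 dst=198.51.100.7 "
          "sport=40000 dport=80 [UNREPLIED] src=198.51.100.7 dst=192.168.1.10 "
          "sport=80 dport=40000")

EVENT_RE = re.compile(r"\[(NEW|UPDATE|DESTROY)\]\s+(\w+)")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_event(line: str) -> Optional[Dict[str, str]]:
    """Return the event type, protocol and original-direction fields, or None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    event = {"event": match.group(1), "protocol": match.group(2)}
    # Fields repeat for the reply direction; keep only the first occurrence,
    # which describes the original direction of the flow.
    for key, value in FIELD_RE.findall(line):
        event.setdefault(key, value)
    return event


parsed = parse_event(SAMPLE)
```

A monitoring tool could feed each line of the event stream through a parser like this and pair `NEW` and `DESTROY` events for the same tuple to measure how long connections live.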

If you have read my previous blogs and have seen the performance numbers for the Kube-Proxy nftables backend, those numbers were captured by tracking the conntrack events from the kernel. This gave an exact picture of how the first packet latency was affected by the different backends.

Summary

Conntrack is a fundamental networking module of the Linux kernel. It acts as the stateful layer tracking the connections on the node. This allows other networking submodules like NAT and IPTables to leverage the state maintained by the Conntrack module. Conntrack treats packets as belonging to a single connection if they share the same connection tuple, which is used as the hash key in the conntrack hash table.

References