My self is steam

Insights into computer security, programming and math


March 8, 2023
The CoreDNS Cache Poisoning Conjecture

One of the most important breakthroughs of recent years in the field of network security was the revival of Kaminsky's original cache poisoning attack by Keyu Man et al.: thanks to a side channel affecting the ICMP rate limit in the Linux kernel network stack, their technique showed how it was possible to unveil the source port of the UDP request initiated by a DNS resolver towards the name server. Dan Kaminsky's original attack exploited the small space of random transaction IDs, $2^{16}$, to bypass the bailiwick rule which prevented rogue glue records from being cached. The fix at the time introduced another 16 bits of randomness by requiring that all source ports of UDP requests initiated by the resolver be randomized. The SAD DNS attack demonstrated that the randomized port could be discovered by leveraging the side channel to infer which resolver-initiated UDP ports are effectively open during a port scan, thereby reducing once again the effort needed to mount the cache poisoning. Indeed, since the UDP protocol is connection-less, when the resolver initiates a request, anybody knowing the source port number can send datagrams to the underlying socket by using that source port as the destination. "Private" UDP sockets are also susceptible to this behavior, with the additional requirement that the illegitimate datagrams must spoof the IP address of the legitimate server the resolver contacted. The side channel, along with a few more discovered by the same group, was promptly fixed by the kernel team by randomizing the way the rate limit counter is incremented. Regardless of the means by which it was deployed, the technique brought attention once again to the importance of randomizing the source ports of client-side UDP requests, since they represent one half of the randomness required to thwart the original cache poisoning attack, the other half being the transaction IDs.

With these observations in mind, I decided to go code spelunking through the CoreDNS source. As per its documentation, CoreDNS does not embed a native recursive resolver, so it is not susceptible to the classic Kaminsky bug or the SAD DNS variations. Its exposure to poisoning attacks is therefore limited to the "forward" mode of operation, besides the cache functionality itself. Both are implemented as plugins. The forward plugin was found vulnerable to elementary cache poisoning during a security audit by Cure53. The vulnerability was fixed in commit 5616fcb175865f2d8ede0460e2537c3b584debad:

plugin/forward/forward.go

func (f *Forward) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
... 
       // Check if the reply is correct; if not return FormErr.
       if !state.Match(ret) {
               formerr := state.ErrorMessage(dns.RcodeFormatError)
               w.WriteMsg(formerr)
               return 0, nil
       }
...

The Match method of the request structure is found under request/request.go:

func (r *Request) Match(reply *dns.Msg) bool {
        if len(reply.Question) != 1 {
                return false
        }

        if !reply.Response {
                return false
        }

        if strings.ToLower(reply.Question[0].Name) != r.Name() {
                return false
        }

        if reply.Question[0].Qtype != r.QType() {
                return false
        }

        return true
}

In essence, it checks that the reply carries the same qname and qtype as the request; nothing in the answer section is inspected.
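
To make this concrete, here is a minimal hedged sketch, built with the github.com/miekg/dns package that CoreDNS uses, of a forged reply that would pass Match: it only has to echo the question section (and, as we will see, the transaction Id), while the answer records are entirely attacker-chosen. The forgedReply helper is hypothetical, not CoreDNS code.

package main

import (
        "fmt"
        "net"

        "github.com/miekg/dns"
)

// forgedReply builds a response that passes request.Match: it echoes the
// question section and the transaction Id, while the answer section is
// entirely attacker-chosen. Hypothetical helper, not CoreDNS code.
func forgedReply(query *dns.Msg) *dns.Msg {
        reply := new(dns.Msg)
        reply.SetReply(query) // copies the Id and the question section
        reply.Answer = []dns.RR{&dns.A{
                Hdr: dns.RR_Header{
                        Name:   query.Question[0].Name,
                        Rrtype: dns.TypeA,
                        Class:  dns.ClassINET,
                        Ttl:    3600,
                },
                A: net.IPv4(6, 6, 6, 6), // attacker-chosen address
        }}
        return reply
}

func main() {
        query := new(dns.Msg)
        query.SetQuestion("things-viral.doom.com.", dns.TypeA)
        fmt.Println(forgedReply(query))
}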

The cache plugin was found vulnerable to a different form of poisoning during an audit by Trail of Bits. Since the caching mechanism is built upon the FNV-1 hash function, which is not collision resistant, it was possible to forge a response that produced the same hash key as a legitimate response. The vulnerability was fixed in commit 68c887f19e811ffa8bce982305a040dff549891e:

plugin/cache/item.go

type item struct {
    Name               string
    QType              uint16
    Rcode              int
    AuthenticatedData  bool
    RecursionAvailable bool
...
}
...
func newItem(m *dns.Msg, now time.Time, d time.Duration) *item {
        i := new(item)
        if len(m.Question) != 0 {
                i.Name = m.Question[0].Name
                i.QType = m.Question[0].Qtype
        }
...

func (i *item) matches(state request.Request) bool {
        if state.QType() == i.QType && strings.EqualFold(state.QName(), i.Name) {
                return true
        }
        return false
}

After the fix, each cached item stores the qname and qtype of the original request, and each incoming request's qname and qtype are checked against them, even when the hashes match.
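
To illustrate why the extra comparison matters, here is a minimal sketch, assuming an FNV-1 key computed over qname and qtype in the spirit of the cache plugin (not its exact key derivation): since FNV-1 is not collision resistant, two distinct questions can land on the same key, and only the stored Name/QType fields disambiguate them.

package main

import (
        "encoding/binary"
        "fmt"
        "hash/fnv"
        "strings"
)

// key sketches an FNV-1 hash over qname and qtype, in the spirit of the
// cache plugin's key (not its exact derivation). FNV-1 is fast but not
// collision resistant, so distinct questions can share a key.
func key(qname string, qtype uint16) uint64 {
        h := fnv.New64() // FNV-1
        h.Write([]byte(strings.ToLower(qname)))
        b := make([]byte, 2)
        binary.BigEndian.PutUint16(b, qtype)
        h.Write(b)
        return h.Sum64()
}

// matches mirrors the post-fix check: a colliding key is not enough, the
// stored qname/qtype must also match the incoming request.
func matches(storedName string, storedType uint16, qname string, qtype uint16) bool {
        return storedType == qtype && strings.EqualFold(storedName, qname)
}

func main() {
        fmt.Printf("%#x\n", key("things-viral.doom.com.", 1)) // qtype 1 = A
        fmt.Println(matches("things-viral.doom.com.", 1, "evil.doom.com.", 1))
}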

One thing that comes to mind is that, in both Cure53's and ToB's findings, the attack scenario presumes that the attacker serves poisoned replies from a legitimate network position, i.e. the attacker is not racing against the upstream server by flooding the forwarder with datagrams aimed at the source port from which the query was forwarded. This implies that the resolver with which the forwarder communicates is assumed to be malicious: the forwarder has to explicitly connect to the resolver (or to another forwarder) in order to receive the poisoned response. However, under these assumptions, even with the above mitigations in place, the resolver could still return responses matching the original qname and qtype but pointing to a different IP than the legitimate one.

A possible explanation for this threat model may be that in CoreDNS all initiated UDP requests are private, i.e. the socket underlying the request is connected and can only receive datagrams from the same remote address to which datagrams are sent. This is necessary in order to avoid accepting random inbound junk, given the connection-less nature of the UDP protocol. It turns out that this behavior is enforced by the Go net package:

plugin/forward/connect.go 

func (p *Proxy) Connect(ctx context.Context, state request.Request, opts options) (*dns.Msg, error) {
        start := time.Now()

        proto := ""
        switch {
        case opts.forceTCP: // TCP flag has precedence over UDP flag
                proto = "tcp"
        case opts.preferUDP:
                proto = "udp"
        default:
                proto = state.Proto()
        }

        pc, cached, err := p.transport.Dial(proto)
...

Regardless of the transport protocol of the request, or of the protocol enforced by the plugin configuration, a Dial method is invoked:

func (t *Transport) Dial(proto string) (*persistConn, bool, error) {
        ...
        reqTime := time.Now()
        timeout := t.dialTimeout()
        if proto == "tcp-tls" {
                conn, err := dns.DialTimeoutWithTLS("tcp", t.addr, t.tlsConfig, timeout)
                t.updateDialTimeout(time.Since(reqTime))
                return &persistConn{c: conn}, false, err
        }
        conn, err := dns.DialTimeout(proto, t.addr, timeout)
        t.updateDialTimeout(time.Since(reqTime))
        return &persistConn{c: conn}, false, err
}

The effective request is carried out by the dns library package, which is not part of CoreDNS. Therein we can find the DialTimeout function, along with its siblings, under dns/client.go:

func DialTimeout(network, address string, timeout time.Duration) (conn *Conn, err error) {
        client := Client{Net: network, Dialer: &net.Dialer{Timeout: timeout}}
        return client.Dial(address)
}
...
func (c *Client) Dial(address string) (conn *Conn, err error) {
        return c.DialContext(context.Background(), address)
}

func (c *Client) DialContext(ctx context.Context, address string) (conn *Conn, err error) {
        // create a new dialer with the appropriate timeout
        var d net.Dialer
        ...
        conn = new(Conn)
        if useTLS {
            ...
        } else {
                conn.Conn, err = d.DialContext(ctx, network, address)
        }

The DialContext method of the Dialer structure is found in the Go standard library under net/dial.go. It takes at least seven intermediary function calls to finally arrive at the function that establishes the UDP socket:

DialContext -> dialParallel -> dialSerial -> dialSingle -> dialUDP -> internetSocket -> socket -> dial

The last one is the only relevant call; the others are just setup:

net/sock_posix.go

func (fd *netFD) dial(ctx context.Context, laddr, raddr sockaddr, ...) error {
...
    if raddr != nil {
        if rsa, err = raddr.sockaddr(fd.family); err != nil {
            return err
        }
        if crsa, err = fd.connect(ctx, lsa, rsa); err != nil {
            return err
        }
        fd.isConnected = true
    }
...

As can be seen, the connect function is called on the socket's file descriptor, which has the effect of rendering the UDP socket connected. An attacker who wants to poison the forwarder is therefore left with a single option: blindly sending IP-spoofed responses to the forwarder. This is indeed the classic scenario of the original DNS cache poisoning attack, where the attacker is off-path; otherwise they must already be in a privileged network position like the one assumed in the Cure53 and ToB findings. Attackers in a trust relationship with the forwarder (or resolver, in the original attack), or in-the-middle, can always reply with a poisoned response without having to overcome the more demanding requirements placed on an off-path attacker, namely guessing both the transaction ID and the source port of the UDP request. In a sense, the poisoning attacks uncovered during the Cure53 and Trail of Bits audits are necessarily more "privileged" than the classic Kaminsky attack, where the attacker is off-path.
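
The effect of that connect() can be observed directly. In the hedged sketch below (Linux assumed, addresses illustrative, error handling elided for brevity), a socket created with net.Dial, as the dns library does, never sees a datagram sent to its local port by a third-party socket, whereas an unconnected socket would have accepted it.

package main

import (
        "fmt"
        "net"
        "time"
)

func main() {
        // Stand-in for the upstream resolver.
        upstream, _ := net.ListenPacket("udp", "127.0.0.1:0")
        defer upstream.Close()

        // Connected socket, as produced by net.Dial / Dialer.DialContext.
        conn, _ := net.Dial("udp", upstream.LocalAddr().String())
        defer conn.Close()
        fmt.Println("ephemeral local address:", conn.LocalAddr())

        // A third party that knows the source port but cannot spoof the
        // upstream's IP address (bound to another loopback address).
        attacker, _ := net.ListenPacket("udp", "127.0.0.2:0")
        defer attacker.Close()
        attacker.WriteTo([]byte("junk"), conn.LocalAddr())

        // The kernel filters the foreign datagram: this read times out.
        conn.SetReadDeadline(time.Now().Add(time.Second))
        if _, err := conn.Read(make([]byte, 512)); err != nil {
                fmt.Println("read failed as expected:", err)
        }
}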

This is why RFC 5452 mandates that the source port of the client-side connection be randomized, and this is what practically fixed the Kaminsky bug, since at the time the source port was commonly set to a static number.

To recap: connect() on the forwarder-initiated requests leaves the attacker with only one option, IP-spoofing responses in an attempt to guess the source port and the transaction ID of the forwarded request. This amounts to $2^{32}$ attempts to be delivered to the forwarder in a very limited time frame, which makes the attack unfeasible in the absence of any flaw in the way source ports are allocated or transaction IDs are generated.
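
Spelled out, the off-path search space is the product of the two 16-bit pools:

$$2^{16}\ \text{source ports} \times 2^{16}\ \text{transaction IDs} = 2^{32} \approx 4.3 \times 10^{9}\ \text{candidate datagrams}$$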

With regard to the transaction IDs, which represent the other half of the random pool carried by the client-side requests, an issue was found in the underlying dns library that made all transaction IDs correlated, and therefore easier to predict, since an insecure non-cryptographic random generator was used for the task. The issue was promptly fixed, though.

At first glance, one might be tempted to believe that the client-side requests initiated by the forwarder cannot carry improperly set source ports. Indeed, on Linux systems the randomization of source ports is taken care of by the kernel whenever the source port is not specified by the application, and the DialContext method under net/dial.go of the Go standard library always leaves the local address, and hence the source port, unspecified by default:

type Dialer struct {
...
    // LocalAddr is the local address to use when dialing an
    // address. The address must be of a compatible type for the
    // network being dialed.
    // If nil, a local address is automatically chosen.
    LocalAddr Addr
...
}
...
func (d *Dialer) DialContext(ctx context.Context, network, address string) (Conn, error) {
    ...
    sd := &sysDialer{
        Dialer:  *d, 
        network: network,
        address: address,
    }
    ...
    return sd.dialParallel(ctx, primaries, fallbacks)
}

Dialer is embedded by value into sysDialer, and since the dns library never sets LocalAddr, the field retains its zero value, nil. This means that whenever the forwarder initiates a new UDP connection, the source port is effectively random.
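
This is easy to confirm: with LocalAddr left nil, every new dial obtains a fresh kernel-chosen ephemeral port. A minimal sketch (192.0.2.1 is a documentation address; a UDP "connect" sends no datagram, so this runs offline):

package main

import (
        "fmt"
        "net"
)

func main() {
        // With Dialer.LocalAddr left nil, the kernel picks the source port.
        for i := 0; i < 3; i++ {
                conn, err := net.Dial("udp", "192.0.2.1:53")
                if err != nil {
                        panic(err)
                }
                fmt.Println(conn.LocalAddr()) // different ephemeral port each time
                conn.Close()
        }
}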

However, the CoreDNS forward plugin implements a persistent connection cache, which makes the forwarder reuse connections that were already allocated within a predefined time frame (10 seconds by default). The connection cache logic can be found under plugin/forward/persistent.go:

func (t *Transport) connManager() {
        ticker := time.NewTicker(defaultExpire)
        defer ticker.Stop()
Wait:
        for {
                select {
                case proto := <-t.dial:
                        transtype := stringToTransportType(proto)
                        // take the last used conn - complexity O(1)
                        if stack := t.conns[transtype]; len(stack) > 0 {
                                pc := stack[len(stack)-1]
                                if time.Since(pc.used) < t.expire {
                                        // Found one, remove from pool and return this conn.
                                        t.conns[transtype] = stack[:len(stack)-1]
                                        t.ret <- pc
                                        continue Wait
                                }
                                // clear entire cache if the last conn is expired
                                t.conns[transtype] = nil
                                // now, the connections being passed to closeConns() are not reachable from
                                // transport methods anymore. So, it's safe to close them in a separate goroutine
                                go closeConns(stack)
                        }
                        t.ret <- nil

                case pc := <-t.yield:
                        transtype := t.transportTypeFromConn(pc)
                        t.conns[transtype] = append(t.conns[transtype], pc)
...

The dial channel of the Transport struct receives a string message indicating the protocol of the new connection the forwarder wants to initiate; each protocol defines a bucket, and each bucket holds the currently active connections in a stack (a Go slice); connections are always returned in a last-in-first-out fashion. Once the caller is done with a connection, it releases it by putting it on the yield channel of the transport.

The forwarder refers to the upstream resolvers as "proxies"; connections to the proxies are initiated with the Connect method of the Proxy structure, under plugin/forward/connect.go, where we get a better view of the connections' lifecycle:

func (p *Proxy) Connect(ctx context.Context, state request.Request, opts options) (*dns.Msg, error) {
        ...
①       pc, cached, err := p.transport.Dial(proto)
        if err != nil {
                return nil, err
        }
        ...

        if err := pc.c.WriteMsg(state.Req); err != nil {
                pc.c.Close() // not giving it back
                if err == io.EOF && cached {
                        return nil, ErrCachedClosed
                }
                return nil, err
        }

        var ret *dns.Msg
        pc.c.SetReadDeadline(time.Now().Add(readTimeout))
        for {
                ret, err = pc.c.ReadMsg()
                if err != nil {
                    ...
                }
                // drop out-of-order responses
                if state.Req.Id == ret.Id {
                        break
                }
        }

②       p.transport.Yield(pc)
    ...

Line ① is where the interaction with the cache happens: the Dial method of Transport puts a message on the dial channel, requesting any available cached connection for the protocol proto, and then waits for a value on the ret channel:

func (t *Transport) Dial(proto string) (*persistConn, bool, error) {
        // If tls has been configured; use it.
        if t.tlsConfig != nil {
                proto = "tcp-tls"
        }

        t.dial <- proto
        pc := <-t.ret
        if pc != nil {
    ...

After the DNS request is written to the socket, the response is read, and finally the connection can be put back into the cache by means of the `Yield` method invoked at line ②; the method simply places the pc connection object on the yield channel described above.

This mechanism implies that on a very busy forwarder, i.e. one where connections to the upstream proxy happen regularly enough to keep refreshing the connection expiration time, the likelihood of always reusing the same connection is very high.

This behavior can, of course, be coerced: think of a web page that triggers an XHR request every five seconds in the background when visited by a peer served by the forwarder (as in the original Kaminsky paper); more generally, an unprivileged, cooperating host (or container) whose resolution requests are served by the forwarder can make the cached connection permanent by performing a lookup every $n < 10$ seconds. This can be easily verified by standing up a forwarder instance with a simple configuration like:

### Corefile.forwarder
.:8853 {
    forward . 10.191.237.6:8853
    log
    errors
}

The upstream resolver is configured with the following:

### Corefile.resolver
.:8853 {
    file doom.db
    log
    errors
}

### doom.db
doom.com.        IN  SOA dns.doom.com. gco.doom.com. 2015082541 7200 3600 1209600 3600

dns.doom.com    IN  A   10.71.161.139
things-viral.doom.com   IN  A   1.2.3.4

The forwarder has IP 10.191.237.174, and queries are sent through it with:

while true; do dig @10.191.237.174 -p 8853 things-viral.doom.com; sleep 5; done

For convenience, we print a debug message from within the Dial function (alternatively, one could attach gdb to a running instance of the forwarder):

        if pc != nil {
                ConnCacheHitsCount.WithLabelValues(t.addr, proto).Add(1)
                fmt.Printf("Cached connection: LocalAddress: %v, RemoteAddress: %v ###", pc.c.LocalAddr().String(), pc.c.RemoteAddr().String())
                return pc, true, nil
        }

This way, we have the source port easily logged:

[INFO] 10.191.237.1:57004 - 22032 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.002193317s
Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:49263 - 19067 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.000724985s
Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:50915 - 18066 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.000728109s
Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:34574 - 47229 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.001512207s
Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:54639 - 39535 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.000718289s
Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:51254 - 29698 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.001259638s
...

As can be seen, the source port of the UDP connection can be made static.

According to RFC 5452, section 4.5:

If the source port of the original query is random, but static, any authoritative nameserver under observation by the attacker can be used to determine this port. This means that matching this condition often requires no guess work.

The RFC refers to the situation where the resolver initiates the connection: since the resolver eventually connects to the nameserver (which, in that case, can be controlled by the attacker), a static source port is eventually leaked. In our scenario, the target of the poisoning is the forwarder, and the forwarder will never connect to an upstream server the attacker controls, unless one operates under the same relaxed assumptions as the Cure53 and Trail of Bits attacks. This means that the attacker still has to perform some guesswork in order to infer the single static port from the ephemeral range (32768-60999 on Linux).
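
Concretely, with the default Linux range, the guesswork left to a blind attacker is:

$$60999 - 32768 + 1 = 28232\ \text{ports}, \qquad 28232 \times 2^{16} \approx 1.85 \times 10^{9}\ \text{port/ID combinations}$$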

The archetypal source port de-randomization attacks introduced by Herzberg and Shulman and by Alharbi et al. both propose a threat model where the off-path attacker coordinates with a cooperating program running on the same host that initiates the client-side requests. In both papers the program is unprivileged and only exploits the permissive port allocation policy of the underlying operating system. Indeed, despite ulimit's apparent restriction on the number of simultaneously open sockets, the OS usually does not prevent a program from forking repeatedly and reserving as many ports as needed, to the extent that exhaustion of the whole available port range is possible. The result is that the pool of source ports available for each forwarded DNS query becomes small and predictable.
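
A hedged sketch of that exhaustion primitive, assuming only an unprivileged process that keeps unconnected UDP sockets open: every port held below is a port the kernel can no longer assign to the forwarder's next dial. In practice a single process hits the file-descriptor limit first, which is why the papers resort to forking.

package main

import (
        "fmt"
        "net"
)

func main() {
        var held []net.PacketConn
        // Grab unconnected UDP sockets on kernel-chosen ports until the
        // kernel (or the per-process fd limit) refuses: every port held
        // here is one the forwarder's next dial can no longer receive.
        for {
                pc, err := net.ListenPacket("udp", ":0")
                if err != nil {
                        break
                }
                held = append(held, pc)
        }
        fmt.Printf("holding %d ephemeral ports\n", len(held))
        for _, pc := range held {
                pc.Close()
        }
}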

Interestingly, this exact strategy could be devised in the shared environments in which CoreDNS is usually deployed: given that the forwarder's source port never changes, an unprivileged container co-located on the same node as the forwarder could exploit the way container ports are NAT'ed to the host. Without loss of generality, a container instance of an image defined by a Dockerfile of the form:

FROM alpine
EXPOSE 32000-40000/udp
ENTRYPOINT /bin/sh

when run with the -P option, will make the runtime engine map all of the specified ports to host ports drawn from the host's ephemeral port range. The host source ports do not seem to be picked randomly: only the first port is randomly picked, whereas all the others are progressively decremented from it:

$ docker port $(docker ps -q) | sort -n -t'/' -k1
32000/udp -> 0.0.0.0:48970
32001/udp -> 0.0.0.0:48969
32002/udp -> 0.0.0.0:48968
32003/udp -> 0.0.0.0:48967
32004/udp -> 0.0.0.0:48966
32005/udp -> 0.0.0.0:48965
32006/udp -> 0.0.0.0:48964
32007/udp -> 0.0.0.0:48963
32008/udp -> 0.0.0.0:48962
32009/udp -> 0.0.0.0:48961
32010/udp -> 0.0.0.0:48960
32011/udp -> 0.0.0.0:48959
...
39997/udp -> 0.0.0.0:40973
39998/udp -> 0.0.0.0:40972
39999/udp -> 0.0.0.0:40971
40000/udp -> 0.0.0.0:40970

This behavior mimics that of prominent NAT devices (including iptables NAT) and could be used as a modern source port de-randomization primitive, helping the attacker restrict the candidate ephemeral ports to smaller ranges, while the static, persistent connections of the CoreDNS forwarder may extend the time window for a cache poisoning attack indefinitely.
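
Given the sequential allocation observed above, a single leaked mapping is enough to enumerate all the others. A small sketch under that assumption; hostPorts is a hypothetical helper:

package main

import "fmt"

// hostPorts predicts the host-side ports for an EXPOSEd container range,
// assuming the sequential (decrementing) allocation shown above and one
// known mapping (knownContainer -> knownHost). Hypothetical helper.
func hostPorts(firstContainer, lastContainer, knownContainer, knownHost int) map[int]int {
        m := make(map[int]int)
        for p := firstContainer; p <= lastContainer; p++ {
                m[p] = knownHost - (p - knownContainer)
        }
        return m
}

func main() {
        m := hostPorts(32000, 40000, 32000, 48970)
        fmt.Println(m[32005]) // 48965, matching the docker port output above
}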

It is important to mention that the de-randomization scenario above is only one example of how an unprivileged attacker could reduce their guessing effort. The crux of the matter is that the persistent connection gives the attacker a reasonably long time window during which, once the source port is unveiled, they can send multiple forged responses differing only in the transaction ID. There is indeed a huge difference between a random source port that is allocated for each new connection and lasts only a few seconds, and a single static port that, although selected randomly, hardly ever changes.

To recap: the connection cache turns the kernel's per-connection port randomization into a single long-lived port, so once that port is known, only the 16-bit transaction ID stands between an off-path attacker and the cache.

A possible mitigation strategy that would not radically affect the connection caching mechanism could be to simply try to detect Id enumeration attempts: after a certain number of replies with different message Ids are collected for the same in-flight request, the connection is either destroyed or the forwarder switches to TCP. Cloudflare appears to use something similar to mitigate spoofed replies injected towards its resolvers, despite its predisposition towards long-lived connections.
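
A hedged sketch of such a guard, written against the miekg/dns Conn type wrapped by the forwarder's persistConn; readMatching, errEnumeration and the threshold are hypothetical names, not CoreDNS code:

import (
        "errors"

        "github.com/miekg/dns"
)

// errEnumeration signals a probable transaction Id enumeration attempt.
var errEnumeration = errors.New("too many mismatched transaction Ids")

// readMatching is a hypothetical replacement for the forwarder's read loop:
// it still drops out-of-order replies, but treats a burst of wrong Ids as an
// enumeration attempt, after which the caller should close the cached
// connection (and possibly retry the query over TCP) instead of yielding it.
func readMatching(c *dns.Conn, req *dns.Msg, maxMismatches int) (*dns.Msg, error) {
        mismatches := 0
        for {
                ret, err := c.ReadMsg()
                if err != nil {
                        return nil, err
                }
                if ret.Id == req.Id {
                        return ret, nil
                }
                mismatches++
                if mismatches >= maxMismatches {
                        return nil, errEnumeration
                }
        }
}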

Presently, the CoreDNS forwarder only verifies that the transaction Id of the reply matches that of the pending request:

 plugin/forward/connect.go
...
        for {
                ret, err = pc.c.ReadMsg()
                if err != nil {
                    ...
                }
                // drop out-of-order responses
                if state.Req.Id == ret.Id {
                        break
                }
        }

Whichever reply carrying the right Id arrives first might eventually succeed in poisoning the forwarder's cache.