March 8, 2023
The CoreDNS Cache Poisoning Conjecture
One of the most important network security breakthroughs of recent years was the revival of Kaminsky's original cache poisoning attack by Keyu Man et al.: thanks to a side channel affecting the ICMP rate limit in the Linux kernel network stack, the technique showed how to unveil the source port of the UDP request that a DNS resolver sends to a name server. The original attack by Dan Kaminsky exploited the small space of random transaction IDs, $2^{16}$, to bypass the bailiwick rule that prevented rogue glue records from being cached. The fix at the time added a further 16 bits of randomness (another factor of $2^{16}$) by requiring that the source ports of all UDP requests initiated by the resolver be randomized. The SADDNS attack demonstrated that the randomized port could be discovered by leveraging the side channel to infer, during a port scan, which UDP ports the resolver currently has open, thereby reducing once more the effort needed to mount the cache poisoning. Indeed, because UDP is connectionless, once the resolver initiates a request, anybody who knows the source port number can send datagrams to the underlying socket by using that port as the destination. "Private" UDP sockets are also susceptible to this behavior, with the additional requirement that the illegitimate datagrams spoof the legitimate IP address the resolver contacted. The side channel, and a few more that were discovered by the same group, were promptly fixed by the kernel team by randomizing the way the rate limit counter was incremented. Regardless of the means by which it was carried out, the technique once again drew attention to the importance of randomizing the source ports of client-side UDP requests, since they provide one half of the randomness required to thwart the original cache poisoning attack, the other half being the transaction IDs.
With these last observations in mind, I decided to go code
spelunking through the CoreDNS source code. As per its documentation, CoreDNS does
not have a native recursive resolver, therefore it's not susceptible
to the classic Kaminsky bug and the SADDNS variations. Its exposure
to poisoning attacks is then limited to the "forward" mode of
operation, besides the cache functionality itself. Both are implemented
in the form of plugins.
The forward plugin was found vulnerable to elementary cache poisoning
during a security audit by Cure53.
The vulnerability was fixed in commit 5616fcb175865f2d8ede0460e2537c3b584debad:
plugin/forward/forward.go

    func (f *Forward) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg
        ...
        // Check if the reply is correct; if not return FormErr.
        if !state.Match(ret) {
            formerr := state.ErrorMessage(dns.RcodeFormatError)
            w.WriteMsg(formerr)
            return 0, nil
        }
        ...
The Match method of the Request structure is found under request/request.go:
    func (r *Request) Match(reply *dns.Msg) bool {
        if len(reply.Question) != 1 {
            return false
        }

        if !reply.Response {
            return false
        }

        if strings.ToLower(reply.Question[0].Name) != r.Name() {
            return false
        }

        if reply.Question[0].Qtype != r.QType() {
            return false
        }

        return true
    }
It basically checks if the reply matches the qname and the qtype from the request.
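The check can be exercised outside CoreDNS with a minimal sketch. The `question` and `reply` types below are hypothetical stand-ins for the dns library types, reduced to the fields the check reads, and the request name is assumed to be already lowercased:

```go
package main

import (
	"fmt"
	"strings"
)

// question and reply are hypothetical, reduced stand-ins for the dns
// library types; they carry only the fields the check needs.
type question struct {
	Name  string
	Qtype uint16
}

type reply struct {
	Response bool
	Question []question
}

// matches mirrors the checks of Request.Match shown above. reqName is
// assumed to be lowercased by the caller.
func matches(reqName string, reqType uint16, r reply) bool {
	if len(r.Question) != 1 || !r.Response {
		return false
	}
	if strings.ToLower(r.Question[0].Name) != reqName {
		return false
	}
	return r.Question[0].Qtype == reqType
}

func main() {
	const typeA = 1 // DNS type code for A records

	good := reply{Response: true, Question: []question{{"Example.ORG.", typeA}}}
	bad := reply{Response: true, Question: []question{{"evil.example.", typeA}}}

	fmt.Println(matches("example.org.", typeA, good)) // true
	fmt.Println(matches("example.org.", typeA, bad))  // false
}
```

Note that the check says nothing about the answer section: a reply that echoes the right question is accepted whatever records it carries.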
The cache plugin was found vulnerable
to a different form of poisoning during an audit
by Trail of Bits. Since the caching mechanism is built upon
the FNV-1 hashing function, which is not collision-resistant, it
was possible to forge a response that produced the same hash key
as a legitimate response. The vulnerability was fixed in commit
68c887f19e811ffa8bce982305a040dff549891e:
plugin/cache/item.go

    type item struct {
        Name               string
        QType              uint16
        Rcode              int
        AuthenticatedData  bool
        RecursionAvailable bool
        ...
    }
    ...
    func newItem(m *dns.Msg, now time.Time, d time.Duration) *item {
        i := new(item)
        if len(m.Question) != 0 {
            i.Name = m.Question[0].Name
            i.QType = m.Question[0].Qtype
        }
        ...
    func (i *item) matches(state request.Request) bool {
        if state.QType() == i.QType && strings.EqualFold(state.QName(), i.Name) {
            return true
        }
        return false
    }
After the fix, each cached item is stored with the qname and qtype of the original request, and each incoming request's qname and qtype are checked against those, so a lookup fails even when the hash is the same.
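The effect of the extra comparison can be reproduced with a toy cache. This is only a sketch: the key below is FNV-1 from the standard library deliberately truncated to 8 bits (an assumption made so that a collision can be brute-forced instantly), and the `item`, `cache`, and `collide` names are hypothetical, not CoreDNS code:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// item keeps the qname and qtype next to the cached data, as in the fix.
type item struct {
	Name   string
	QType  uint16
	Answer string
}

// key hashes name and type with FNV-1 and keeps only the low 8 bits.
// The truncation is an assumption of this sketch to force collisions.
func key(name string, qtype uint16) uint8 {
	h := fnv.New64()
	h.Write([]byte(strings.ToLower(name)))
	h.Write([]byte{byte(qtype >> 8), byte(qtype)})
	return uint8(h.Sum64())
}

type cache map[uint8]item

// get returns a hit only if the stored qname and qtype match the lookup.
func (c cache) get(name string, qtype uint16) (item, bool) {
	it, ok := c[key(name, qtype)]
	if !ok || it.QType != qtype || !strings.EqualFold(it.Name, name) {
		return item{}, false
	}
	return it, true
}

// collide brute-forces a name whose truncated key equals that of target.
func collide(target string, qtype uint16) string {
	want := key(target, qtype)
	for i := 0; i < 1<<20; i++ {
		cand := fmt.Sprintf("h%d.attacker.example.", i)
		if key(cand, qtype) == want {
			return cand
		}
	}
	return ""
}

func main() {
	const typeA = 1
	victim := "example.org."
	forged := collide(victim, typeA)

	c := cache{}
	c[key(forged, typeA)] = item{Name: forged, QType: typeA, Answer: "192.0.2.66"}

	_, hit := c.get(victim, typeA)
	fmt.Println("same key:", key(forged, typeA) == key(victim, typeA))
	fmt.Println("victim lookup served from forged entry:", hit)
}
```

Without the name and type comparison in `get`, the lookup for the victim name would return the forged entry.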
One thing that comes to mind is that, in both Cure53 and ToB's findings, the attack scenario presumes that the attacker is serving poisoned replies from a legitimate network position, i.e. the attacker is not racing against the upstream server by flooding the forwarder with datagrams to the same source port from which the query was forwarded. This implies that the resolver with which the forwarder communicates is assumed to be malicious: the forwarder has to explicitly connect to the resolver (or to another forwarder) in order to receive the poisoned response. However, under these assumptions, even with the above mitigations in place, the resolver could still return responses that match the original qname and qtype but point to a different IP than the legitimate one.
A possible explanation for this threat model may be that in CoreDNS, all initiated UDP requests are private, i.e. the socket underlying the request is set as connected and can only receive datagrams from the same remote address to which datagrams are sent. This is necessary in order to avoid accepting random inbound junk, due to the connection-less nature of the UDP protocol. It turns out, this behavior is enforced by the Go net package:
plugin/forward/connect.go

    func (p *Proxy) Connect(ctx context.Context, state request.Request, opts options) (*dns.Msg, error) {
        start := time.Now()

        proto := ""
        switch {
        case opts.forceTCP: // TCP flag has precedence over UDP flag
            proto = "tcp"
        case opts.preferUDP:
            proto = "udp"
        default:
            proto = state.Proto()
        }

        pc, cached, err := p.transport.Dial(proto)
        ...
Regardless of the transport protocol of the request, or of the protocol
enforced by the plugin configuration, a Dial method is invoked:
    func (t *Transport) Dial(proto string) (*persistConn, bool, error) {
        ...
        reqTime := time.Now()
        timeout := t.dialTimeout()
        if proto == "tcp-tls" {
            conn, err := dns.DialTimeoutWithTLS("tcp", t.addr, t.tlsConfig, timeout)
            t.updateDialTimeout(time.Since(reqTime))
            return &persistConn{c: conn}, false, err
        }
        conn, err := dns.DialTimeout(proto, t.addr, timeout)
        t.updateDialTimeout(time.Since(reqTime))
        return &persistConn{c: conn}, false, err
    }
The actual request is carried out by the dns library package, which is
not part of CoreDNS. Therein, we can find the method DialTimeout,
besides its siblings, under dns/client.go:
    func DialTimeout(network, address string, timeout time.Duration) (conn *Conn, err error) {
        client := Client{Net: network, Dialer: &net.Dialer{Timeout: timeout}}
        return client.Dial(address)
    }
    ...
    func (c *Client) Dial(address string) (conn *Conn, err error) {
        return c.DialContext(context.Background(), address)
    }

    func (c *Client) DialContext(ctx context.Context, address string) (conn *Conn, err error) {
        // create a new dialer with the appropriate timeout
        var d net.Dialer
        ...
        conn = new(Conn)
        if useTLS {
            ...
        } else {
            conn.Conn, err = d.DialContext(ctx, network, address)
        }
The DialContext method of the Dialer structure is
found in the Go standard library under net/dial.go. It
takes at least seven intermediary function calls to finally arrive at
the function that establishes the UDP socket:
DialContext -> dialParallel -> dialSerial -> dialSingle -> dialUDP -> internetSocket -> socket -> dial
The last one is the only relevant one; the others are just setup calls:
net/sock_posix.go

    func (fd *netFD) dial(ctx context.Context, laddr, raddr sockaddr, ...) error {
        ...
        if raddr != nil {
            if rsa, err = raddr.sockaddr(fd.family); err != nil {
                return err
            }
            if crsa, err = fd.connect(ctx, lsa, rsa); err != nil {
                return err
            }
            fd.isConnected = true
        }
        ...
As can be seen, the connect function is called on the file descriptor of the socket, which has the effect of rendering the UDP socket connected. Therefore, an attacker willing to poison the forwarder is only left with the option of blindly sending IP-spoofed responses to the forwarder. This is indeed the classic scenario of the original DNS cache poisoning attack: the attacker is considered off-path, otherwise they must already be in a privileged network position such as the one assumed in the findings of Cure53 and ToB. Attackers in a trust relationship with the forwarder (or resolver, in the original attack), or in-the-middle, could always reply with a poisoned response without the need to overcome the more demanding requirements of an off-path attacker, namely guessing the transaction ID and the source port of the UDP request. In a sense, the poisoning attacks uncovered during the Cure53 and Trail of Bits audits are necessarily more "privileged" than the classic Kaminsky attack, where the attacker is off-path.
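The filtering performed by a connected UDP socket can be observed on the loopback interface with a short sketch. The `peer`, `stranger`, and `client` roles and the 200 ms timeout are choices made for this example only; the stranger does not spoof its address, so its datagram is expected to be discarded by the kernel:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// receives reports whether conn reads a datagram within the timeout.
func receives(conn *net.UDPConn, timeout time.Duration) bool {
	conn.SetReadDeadline(time.Now().Add(timeout))
	buf := make([]byte, 512)
	_, err := conn.Read(buf)
	return err == nil
}

// demo returns whether the connected client accepted a datagram from an
// unrelated socket and from the peer it is connected to, in that order.
func demo() (fromStranger, fromPeer bool, err error) {
	loopback := &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)} // port 0: kernel picks

	peer, err := net.ListenUDP("udp4", loopback)
	if err != nil {
		return false, false, err
	}
	defer peer.Close()

	stranger, err := net.ListenUDP("udp4", loopback)
	if err != nil {
		return false, false, err
	}
	defer stranger.Close()

	// DialUDP connects the socket to the peer's address.
	client, err := net.DialUDP("udp4", nil, peer.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return false, false, err
	}
	defer client.Close()
	clientAddr := client.LocalAddr().(*net.UDPAddr)

	// Right destination port, wrong source address and port.
	stranger.WriteToUDP([]byte("forged"), clientAddr)
	fromStranger = receives(client, 200*time.Millisecond)

	// Same destination, sent from the address the client connected to.
	peer.WriteToUDP([]byte("legit"), clientAddr)
	fromPeer = receives(client, 200*time.Millisecond)

	return fromStranger, fromPeer, nil
}

func main() {
	fromStranger, fromPeer, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("accepted from stranger:", fromStranger)
	fmt.Println("accepted from peer:", fromPeer)
}
```

On Linux the first read should time out and the second should succeed, which is why an off-path attacker has to spoof the upstream address in addition to guessing the port.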
This is why RFC 5452 mandates that the source port of the client-side connection be randomized, and this is what practically fixed the original Kaminsky bug, since at the time the source port was commonly set to a static number.
To recap: connect() on the forwarder-initiated requests leaves the attacker with the only option of IP-spoofing the responses with which they attempt to guess the source port and the transaction ID of the forwarder request. This amounts to up to $2^{32}$ attempts to be delivered to the forwarder in a very limited time frame, which makes the attack unfeasible in the absence of any flaw in the way the source ports are allocated, or in the generation of the transaction IDs.
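To put numbers on this, the following sketch computes the chance that a burst of forged replies contains the right combination. The burst size of 50,000 packets per race window is a hypothetical figure chosen for illustration, and the port space is taken as $2^{16}$ as in the text above:

```go
package main

import "fmt"

// hitProbability is the chance that one of n distinct guesses matches a
// value drawn uniformly from a space of the given size.
func hitProbability(n, space float64) float64 {
	if n >= space {
		return 1
	}
	return n / space
}

func main() {
	const (
		txIDs = 1 << 16 // transaction ID space
		ports = 1 << 16 // source port space, as counted in the text
		burst = 50000   // hypothetical forged replies per race window
	)

	fmt.Printf("port and ID unknown: %.2e\n", hitProbability(burst, txIDs*ports)) // 1.16e-05
	fmt.Printf("port known:          %.2f\n", hitProbability(burst, txIDs))       // 0.76
}
```

The gap of almost five orders of magnitude between the two cases is what source port randomization buys, and what is lost once the port becomes known.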
With regard to the latter, which represent the other half of the random pool carried by the client-side requests, an issue was found in the underlying dns library which made all transaction IDs correlated and therefore easier to predict, because an insecure, non-cryptographic random generator was used for the task. The issue was promptly fixed, though.
At first glance, one might be tempted to believe that the client-side
requests initiated by the forwarder are not susceptible to carrying
improperly set source ports. Indeed, on Linux systems, the randomization
of source ports is taken care of by the kernel when the source port is
not specified by the application. The DialContext method
under net/dial.go of the Go library leaves the local
address, and hence the source port, unspecified by default:
    type Dialer struct {
        ...
        // LocalAddr is the local address to use when dialing an
        // address. The address must be of a compatible type for the
        // network being dialed.
        // If nil, a local address is automatically chosen.
        LocalAddr Addr
        ...
    }
    ...
    func (d *Dialer) DialContext(ctx context.Context, network, address string) (Conn, error) {
        ...
        sd := &sysDialer{
            Dialer:  *d,
            network: network,
            address: address,
        }
        ...
        return sd.dialParallel(ctx, primaries, fallbacks)
    }
Dialer is type-embedded into sysDialer, but
no value for LocalAddr, nor for any other field, is specified,
leaving it at its initialization value nil. This means that
whenever the forwarder initiates a UDP connection, the source port is
effectively random.
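This is easy to observe with a zero-value `net.Dialer`. The sketch below dials the same UDP destination a few times and collects the source ports; the destination `127.0.0.1:8853` is an arbitrary choice for the example and nothing needs to listen there, since connecting a UDP socket sends no packets:

```go
package main

import (
	"fmt"
	"net"
)

// localPorts dials dst over UDP n times with a zero-value net.Dialer
// (LocalAddr is nil) and returns the source port the kernel assigned
// to each socket.
func localPorts(dst string, n int) ([]int, error) {
	var d net.Dialer
	ports := make([]int, 0, n)
	for i := 0; i < n; i++ {
		c, err := d.Dial("udp", dst)
		if err != nil {
			return nil, err
		}
		ports = append(ports, c.LocalAddr().(*net.UDPAddr).Port)
		c.Close()
	}
	return ports, nil
}

func main() {
	ports, err := localPorts("127.0.0.1:8853", 5)
	if err != nil {
		fmt.Println("dial failed:", err)
		return
	}
	// Each socket gets its own kernel-chosen ephemeral port.
	fmt.Println(ports)
}
```

Each run should print five ports from the ephemeral range, normally all different, because every call creates a fresh socket.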
However, the CoreDNS forward plugin implements a persistent connection
cache, which makes the forwarder reuse connections that were already
allocated during a pre-defined time frame (default is 10 seconds).
The connection cache logic can be found under
plugin/forward/persistent.go:
    func (t *Transport) connManager() {
        ticker := time.NewTicker(defaultExpire)
        defer ticker.Stop()
    Wait:
        for {
            select {
            case proto := <-t.dial:
                transtype := stringToTransportType(proto)
                // take the last used conn - complexity O(1)
                if stack := t.conns[transtype]; len(stack) > 0 {
                    pc := stack[len(stack)-1]
                    if time.Since(pc.used) < t.expire {
                        // Found one, remove from pool and return this conn.
                        t.conns[transtype] = stack[:len(stack)-1]
                        t.ret <- pc
                        continue Wait
                    }
                    // clear entire cache if the last conn is expired
                    t.conns[transtype] = nil
                    // now, the connections being passed to closeConns() are not reachable from
                    // transport methods anymore. So, it's safe to close them in a separate goroutine
                    go closeConns(stack)
                }
                t.ret <- nil

            case pc := <-t.yield:
                transtype := t.transportTypeFromConn(pc)
                t.conns[transtype] = append(t.conns[transtype], pc)
            ...
The dial channel of the Transport struct is used to
receive a string message indicating the protocol of the new connection
that the forwarder wants to initiate; each protocol defines a bucket,
and each bucket holds the currently active connections in an array;
connections are always returned in a last-in-first-out scheme. Once the
caller is done with a connection, it releases it by putting it in the
yield channel of the transport.
The forwarder refers to the upstream resolvers as
"proxies"; connections to the proxies are initiated with the
Connect method of the Proxy structure, under
plugin/forward/connect.go, where we can have a better view
of the connections' lifecycle:
    func (p *Proxy) Connect(ctx context.Context, state request.Request, opts options) (*dns.Msg, error) {
        ...
    ①  pc, cached, err := p.transport.Dial(proto)
        if err != nil {
            return nil, err
        }
        ...
        if err := pc.c.WriteMsg(state.Req); err != nil {
            pc.c.Close() // not giving it back
            if err == io.EOF && cached {
                return nil, ErrCachedClosed
            }
            return nil, err
        }

        var ret *dns.Msg
        pc.c.SetReadDeadline(time.Now().Add(readTimeout))
        for {
            ret, err = pc.c.ReadMsg()
            if err != nil {
                ...
            }
            // drop out-of-order responses
            if state.Req.Id == ret.Id {
                break
            }
        }

    ②  p.transport.Yield(pc)
        ...
Line ① is where the interaction with the cache happens: the
Dial method of Transport puts a message on the
dial channel requesting any available cached connection for
the protocol proto; any value from the ret channel
is then waited upon:
    func (t *Transport) Dial(proto string) (*persistConn, bool, error) {
        // If tls has been configured; use it.
        if t.tlsConfig != nil {
            proto = "tcp-tls"
        }

        t.dial <- proto
        pc := <-t.ret

        if pc != nil {
        ...
After writing the DNS request to the socket, the response is read and
finally the connection can be inserted in the cache again by means of
the Yield method invoked on line ②; the method will just put the
pc connection object in the yield channel
described above.
This mechanism implies that on a very busy forwarder, i.e. one where connections to the upstream proxy happen regularly and fast enough to refresh the connection expiration time, the likelihood of always reusing the same connection is very high.
This behavior could, of course, be coerced: one can think of a web page triggering an XHR request every five seconds in the background when visited by a peer served by the forwarder (as in the original Kaminsky paper); more generally, an unprivileged, cooperating host (or container) whose resolution requests are served by the forwarder can make the cached connection permanent by performing a lookup every $n < 10$ seconds. This can be easily verified by setting up a forwarder instance with a simple configuration like:
    ### Corefile.forwarder
    .:8853 {
        forward . 10.191.237.6:8853
        log
        errors
    }
The upstream resolver is configured with the following:
    ### Corefile.resolver
    .:8853 {
        file doom.db
        log
        errors
    }

    ### doom.db
    doom.com. IN SOA dns.doom.com. gco.doom.com. 2015082541 7200 3600 1209600 3600
    dns.doom.com IN A 10.71.161.139
    things-viral.doom.com IN A 1.2.3.4
The forwarder has IP 10.191.237.174 and will be used to send queries through:
while true; do dig @10.191.237.174 -p 8853 things-viral.doom.com; sleep 5; done
For convenience, we print a debug message from within the Dial
function (or attach gdb to a running instance of the forwarder):
    if pc != nil {
        ConnCacheHitsCount.WithLabelValues(t.addr, proto).Add(1)
        fmt.Printf("Cached connection: LocalAddress: %v, RemoteAddress: %v ###", pc.c.LocalAddr().String(), pc.c.RemoteAddr().String())
        return pc, true, nil
    }
This way, we have the source port easily logged:
    [INFO] 10.191.237.1:57004 - 22032 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.002193317s
    Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:49263 - 19067 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.000724985s
    Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:50915 - 18066 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.000728109s
    Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:34574 - 47229 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.001512207s
    Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:54639 - 39535 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.000718289s
    Cached connection: LocalAddress: 10.191.237.174:38355, RemoteAddress: 10.191.237.6:8853 ###[INFO] 10.191.237.1:51254 - 29698 "A IN things-viral.doom.com. udp 62 false 1232" NOERROR qr,aa,rd 99 0.001259638s
    ...
As can be seen, the source port of the UDP connection can be made static: every cached connection in the log uses local port 38355.
According to RFC 5452, section 4.5:
If the source port of the original query is random, but static, any authoritative nameserver under observation by the attacker can be used to determine this port. This means that matching this conditions often requires no guess work.
The RFC refers to the situation where the resolver initiates the connection, and since the resolver connects eventually to the nameserver (which in this case can be controlled by the attacker), a static source port is eventually leaked. In our scenario, the target of the poisoning is the forwarder, and the forwarder will not ever connect to any upstream server the attacker controls, unless one operates under the same relaxed assumptions of Cure53 and Trail of Bits' attacks. This means that the attacker still has to perform some guesswork in order to infer the single static port from the ephemeral range (32768-60999 on Linux).
The archetypal source port de-randomization attacks introduced by Herzberg and Shulman and by Alharbi et al. both propose a threat model where the off-path attacker coordinates with a cooperating program on the same host that initiates the client-side requests. In both papers, the program is unprivileged and only exploits the permissive port allocation policy of the underlying operating system. Indeed, despite ulimit's apparent restriction on the number of simultaneously open sockets, the OS usually does not prevent a program from forking repeatedly and reserving as many ports as needed, to the point where the available port range can be exhausted. The result is that the pool of source ports available for each forwarded DNS query becomes small and predictable.
Interestingly, this exact strategy could be devised in shared environments, in which CoreDNS is usually deployed: given that the forwarder's source port never changes, an unprivileged container co-located on the same node as the forwarder could exploit the way the container ports are NAT'ed to the host. Without loss of generality, for example, a container instance of an image defined with a Dockerfile of the form:
    FROM alpine
    EXPOSE 32000-40000/udp
    ENTRYPOINT /bin/sh
when run with the -P option, will make the runtime engine
map all of the specified ports to host ports sourced from the host's
ephemeral port range. The host ports do not seem to be picked
in a random way: indeed, only the first port is randomly picked,
whereas all the others decrement sequentially from it:
    $ docker port $(docker ps -q) | sort -n -t'/' -k1
    32000/udp -> 0.0.0.0:48970
    32001/udp -> 0.0.0.0:48969
    32002/udp -> 0.0.0.0:48968
    32003/udp -> 0.0.0.0:48967
    32004/udp -> 0.0.0.0:48966
    32005/udp -> 0.0.0.0:48965
    32006/udp -> 0.0.0.0:48964
    32007/udp -> 0.0.0.0:48963
    32008/udp -> 0.0.0.0:48962
    32009/udp -> 0.0.0.0:48961
    32010/udp -> 0.0.0.0:48960
    32011/udp -> 0.0.0.0:48959
    ...
    39997/udp -> 0.0.0.0:40973
    39998/udp -> 0.0.0.0:40972
    39999/udp -> 0.0.0.0:40971
    40000/udp -> 0.0.0.0:40970
This behavior mimics that of prominent NAT devices (including iptables NAT), and could be used as a modern source port de-randomization primitive, which helps the attacker restrict the available ephemeral ports to smaller ranges, whereas the static, persistent connections of the CoreDNS forwarder may extend the time window for a cache-poisoning attack indefinitely.
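As a rough worked example of the reduction, the sketch below counts how many ephemeral ports remain once a contiguous block is reserved. It assumes the Linux default range 32768-60999, takes the block 40970-48970 from the listing above, and further assumes that reserved host ports are unavailable to newly created sockets; the `remaining` helper is hypothetical:

```go
package main

import "fmt"

// remaining counts the ports in [lo, hi] that are not covered by the
// reserved block [resLo, resHi].
func remaining(lo, hi, resLo, resHi int) int {
	total := hi - lo + 1
	if resLo < lo {
		resLo = lo
	}
	if resHi > hi {
		resHi = hi
	}
	if resHi < resLo {
		return total // the block lies outside the range
	}
	return total - (resHi - resLo + 1)
}

func main() {
	// 28232 ephemeral ports minus the 8001 mapped ones.
	fmt.Println(remaining(32768, 60999, 40970, 48970)) // 20231
}
```

A single container of this kind removes roughly 28% of the candidates under these assumptions.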
It is important to mention that the de-randomization scenario above is only one example where an unprivileged attacker could reduce their guessing efforts. The crux of the matter is that the persistent connection gives the attacker a reasonably long time window during which they could send multiple forged responses with several different transaction IDs, and only those, once the source port is unveiled. There is indeed a huge difference between a random source port allocated for each new connection and which lasts only a few seconds, and a single static port that, although selected randomly, hardly ever changes.
To recap:
- An off-path attacker willing to poison the cache of the forwarder could not normally infer the source port of request UDP connections initiated by the forwarder.
- Given that the forwarder connections can be made static by lookups made at regular intervals, the guessing work does not need to be repeated, and the attacker gets the opportunity to try to inject spoofed responses indefinitely. This effectively reduces the effort from $2^{32}$ attempts, in a limited time frame of only a few seconds, to $2^{16}$ for guessing the transaction IDs in a potentially unlimited time frame, once the source port is discovered by other means.
- If the same off-path attacker manages to obtain co-location on the same node as the forwarder, they could influence the source port allocation in order to reduce the guessing work; this is necessary because, contrary to the resolver of RFC5452#section-4.5, the forwarder does not usually connect to a "server under observation by the attacker" (unlike the scenario assumed in the Cure53 and Trail of Bits audits).
A possible mitigation strategy which would not radically affect the connection caching mechanism could be to simply try to detect Id enumeration attempts: after a certain amount of different message Ids for the same response are collected, the connection is either destroyed or the forwarder switches to TCP. It seems that Cloudflare uses this to mitigate injected spoofed replies to their resolvers, despite their predisposition to long-lived connections.
Presently, CoreDNS forwarder only verifies that the transaction Id of the reply matches that of the former request:
plugin/forward/connect.go

    ...
    for {
        ret, err = pc.c.ReadMsg()
        if err != nil {
            ...
        }
        // drop out-of-order responses
        if state.Req.Id == ret.Id {
            break
        }
    }
Whichever reply carrying the right Id comes first, it might eventually succeed in poisoning the forwarder cache.
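A minimal sketch of the detection idea described above could look like the following. It is not CoreDNS code: the `idGuard` type, the verdict names, and the threshold of three mismatches are all hypothetical, and what "abandon" means in practice (closing the connection, retrying over TCP) is left to the caller:

```go
package main

import "fmt"

// verdict tells the read loop what to do with a reply.
type verdict string

const (
	accept  verdict = "accept"  // ID matches: use the reply
	drop    verdict = "drop"    // ID mismatch below the limit: keep reading
	abandon verdict = "abandon" // too many mismatches: give up on this socket
)

// idGuard counts replies carrying a wrong transaction ID for one
// outstanding query on one connection.
type idGuard struct {
	wantID     uint16
	limit      int // hypothetical threshold, to be tuned
	mismatches int
}

// check classifies a reply by its transaction ID.
func (g *idGuard) check(gotID uint16) verdict {
	if gotID == g.wantID {
		return accept
	}
	g.mismatches++
	if g.mismatches >= g.limit {
		return abandon
	}
	return drop
}

func main() {
	g := &idGuard{wantID: 0x1234, limit: 3}
	// Three wrong guesses arrive before the genuine reply.
	for _, id := range []uint16{0x0001, 0x0002, 0x0003, 0x1234} {
		v := g.check(id)
		fmt.Println(v)
		if v != drop {
			break // prints drop, drop, abandon and never reads the fourth reply
		}
	}
}
```

The trade-off in such a design is the threshold: too low and ordinary out-of-order replies on a shared connection trigger it, too high and the attacker gets more guesses per query.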