Solution for slow localhost lookups with getaddrinfo on OS X
Saturday, September 6, 2008
I noticed that sometimes, running \ foo edit
would take several
seconds to execute; sometimes as long as a minute. On my laptop, the
editors.jedit
vocabulary is loaded, meaning the edit hook connects to
the running jEdit instance over a socket on localhost
, transmitting
the file name and line number which defines foo
.
After fiddling around for a while with watch
and making it print
messages when a word was called and returned, I noticed my observing the
messages that it was blocking in name resolution. And indeed, resolving
localhost
seemed to take a while:
( scratchpad ) [ "localhost" 1234 resolve-host drop ] time
==== RUNNING TIME
194 ms
One would expect it to return instantly, but obviously, there’s a DNS
query happening here, which should not be the case since localhost
is
defined in /etc/hosts
.
Factor uses getaddrinfo()
for name resolution. I tried the older,
deprecated gethostbyname()
function:
( scratchpad ) FUNCTION: void* gethostbyname ( char* name ) ;
( scratchpad ) [ "localhost" gethostbyname drop ] time
==== RUNNING TIME
0 ms
No performance problem there.
The next step was to look at the tcpdump
output:
05:50:05.340667 IP 192.168.1.105.62560 > cns.westlandrdc.mi.michigan.comcast.net.domain: 27262+ SRV? _http._tcp.localhost. (38)
05:50:05.374190 IP cns.westlandrdc.mi.michigan.comcast.net.domain > 192.168.1.105.62560: 27262 NXDomain 0/1/0 (113)
05:50:05.374492 IP 192.168.1.105.62561 > cns.westlandrdc.mi.michigan.comcast.net.domain: 518+[|domain]
05:50:05.424012 IP cns.westlandrdc.mi.michigan.comcast.net.domain > 192.168.1.105.62561: 518 NXDomain 0/1/0 (138)
05:50:05.424568 IP 192.168.1.105.62562 > cns.westlandrdc.mi.michigan.comcast.net.domain: 14115+ SRV? _http._tcp.localhost.mn.comcast.net. (53)
05:50:05.474268 IP cns.westlandrdc.mi.michigan.comcast.net.domain > 192.168.1.105.62562: 14115 NXDomain 0/1/0 (105)
05:50:05.474767 IP 192.168.1.105.62563 > cns.westlandrdc.mi.michigan.comcast.net.domain: 56054+ SRV? _http._tcp.localhost.comcast.net. (50)
05:50:05.530655 IP cns.westlandrdc.mi.michigan.comcast.net.domain > 192.168.1.105.62563: 56054 NXDomain 0/1/0 (102)
As you can see, Factor is making some SRV
DNS queries here. This made
me think of the servname
parameter to getaddrinfo()
. We always set
this to "http"
, since I found that on FreeBSD, setting it to a string
representation of an integer does not work. However, on Mac OS X 10.5,
it appears that DNS queries are always made if this field is set, even
if the host name is listed in /etc/hosts
. Changing the resolver code
to pass a NULL pointer as servname
seems to fix the problem, and I
verified it to work on Windows, FreeBSD, OpenBSD, NetBSD and Linux.
A minor refactoring later, and its as fast as it should be, with no network traffic generated:
( scratchpad ) [ "localhost" 1234 resolve-host drop ] time
==== RUNNING TIME
0 ms
So perhaps this is completely obvious to some people, but I never
realized that setting servname
to a non-NULL value would cause DNS
queries to be performed when resolving localhost
. Certainly this isn’t
documented anywhere, at least not that I can see.
In conclusion, if you are developing a program which communicates with
another service running on the same machine via TCP/IP or UDP/IP, and
you notice that getaddrinfo()
blocks for a long time, try changing the
code to pass NULL for servname
.