
IssueNscdAliasedHostsNotCached

Symptoms

Summary: Name resolution sometimes fails, causing applications to fail. The nscd does not cache hosts that resolve to multiple IP addresses.

Occurrences

At what times did this problem occur (used to estimate frequency):

2009-06-17

Observations

The nscd does not seem to cache some host entries. Setting the debug level in /etc/nscd.conf to a value greater than 1 shows that for some hostnames the cache lookup always fails, while caching works correctly for others.
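To confirm which debug level and log file nscd is actually configured with before reading the log, a small helper can read global settings from nscd.conf. This is a minimal bash sketch; the function name `nscd_global_setting` is hypothetical, and it assumes global settings are written as a `key value` pair on one line (as `debug-level` and `logfile` are in nscd.conf).

```shell
# Print the value of a global nscd.conf setting such as debug-level or logfile.
# Usage: nscd_global_setting KEY [CONFFILE]   (CONFFILE defaults to /etc/nscd.conf)
# Hypothetical helper; if the key appears more than once, the last value is printed.
nscd_global_setting() {
    local key=$1 conf=${2:-/etc/nscd.conf}
    # Commented lines start with "#", so their first field never equals the key.
    awk -v k="$key" '$1 == k { v = $2 } END { if (v != "") print v }' "$conf"
}
```

For example, `nscd_global_setting debug-level` should print `2` or higher while debugging, and `nscd_global_setting logfile` shows where the debug output is written. nscd has to be restarted after changing these settings.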

Experimentation shows that caching fails systematically for hosts that resolve to multiple IP addresses.
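To check whether a given host falls into the affected class, one can count the addresses in the output of `host`. The following is a minimal bash sketch with hypothetical function names; it assumes the `NAME has address IP` output format shown in the test example and only counts IPv4 addresses.

```shell
# Count the IPv4 "has address" lines in `host NAME` output read from stdin.
count_host_addresses() {
    # grep -c prints 0 and exits 1 when nothing matches; keep the exit status 0.
    grep -c ' has address ' || true
}

# Succeed if the `host` output on stdin lists more than one IPv4 address,
# i.e. the host belongs to the class that nscd was observed not to cache.
is_multi_address() {
    [ "$(count_host_addresses)" -gt 1 ]
}
```

Usage: `host cmsdbsprod.cern.ch | is_multi_address && echo "multiple addresses: probably affected"`.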

To make matters worse, the lookup failures themselves are cached by nscd, so all subsequent requests for that host fail until the negative cache entry expires (negative-time-to-live config parameter, 20 s by default).
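Since the length of this failure window is set by the negative-time-to-live value for the hosts table, it can be useful to read the value that is actually in effect. This is a minimal bash sketch; the function name `hosts_negative_ttl` is hypothetical, and it assumes the per-service form `negative-time-to-live hosts SECONDS` and falls back to the 20 s default stated above when the setting is absent.

```shell
# Print the negative-time-to-live (seconds) for the hosts cache in nscd.conf.
# Usage: hosts_negative_ttl [CONFFILE]   (CONFFILE defaults to /etc/nscd.conf)
# Falls back to 20, the default mentioned in this page, if no setting is found.
hosts_negative_ttl() {
    local conf=${1:-/etc/nscd.conf}
    awk '$1 == "negative-time-to-live" && $2 == "hosts" { v = $3 }
         END { print (v == "" ? 20 : v) }' "$conf"
}
```

A cached failure can also be discarded before it expires with `nscd -i hosts`, which invalidates the whole hosts cache.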

This situation is particularly bad on our T3 because the DMZ nameserver that we use is protected against too many requests from the same host within a short time span. Since lookups for the affected hosts are never served from the cache, every one of them reaches the nameserver, and once its limit is hit, the host lookups fail. The problem was noticed with CRAB jobs trying to resolve cmsdbsprod to register data sets.

Test example:

cmsdbsprod resolves to two IP addresses:

host cmsdbsprod.cern.ch

cmsdbsprod.cern.ch has address 128.142.142.178
cmsdbsprod.cern.ch has address 128.142.142.133

A small stress test (199 lookups in a row):

for ((n=1; n<200; n++)); do gethostip cmsdbsprod.cern.ch; done
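The loop above only prints results; a variant that counts failed lookups makes it easier to see how often resolution actually fails. This is a minimal bash sketch with a hypothetical function name `stress_lookup`; it assumes the lookup command signals failure through a nonzero exit status, which `getent` does (it exits with 2 when the key is not found).

```shell
# Run a lookup command N times and report how many invocations failed.
# Usage: stress_lookup N COMMAND [ARGS...]
stress_lookup() {
    local n=$1 i failures=0
    shift
    for ((i = 0; i < n; i++)); do
        # Discard output; only the exit status is counted.
        "$@" > /dev/null 2>&1 || failures=$((failures + 1))
    done
    echo "$failures/$n lookups failed"
}
```

Usage: `stress_lookup 199 getent hosts cmsdbsprod.cern.ch`. `getent` resolves through NSS, so its requests go through nscd just like those of the applications.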

Example nscd.log entries:

...
19128: handle_request: request received (Version = 2) from PID 26386
19128:  GETHOSTBYNAME (cmsdbsprod.cern.ch)
19128: Haven't found "cmsdbsprod.cern.ch" in hosts cache!
19128: handle_request: request received (Version = 2) from PID 26386
19128:  GETHOSTBYNAME (cmsdbsprod.cern.ch)
19128: Haven't found "cmsdbsprod.cern.ch" in hosts cache!
...
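To find which hostnames keep missing the cache, the debug log can be summarized per hostname. This is a minimal bash sketch; the function name `hosts_cache_misses` is hypothetical, and it assumes the exact message format shown in the log excerpt above (`Haven't found "NAME" in hosts cache!`) and a default log path of /var/log/nscd.log.

```shell
# Count "not found in hosts cache" messages per hostname in an nscd debug log.
# Usage: hosts_cache_misses [LOGFILE]   (defaults to /var/log/nscd.log)
# Output: one "COUNT HOSTNAME" line per host, most frequent first.
hosts_cache_misses() {
    local log=${1:-/var/log/nscd.log}
    # Extract the quoted hostname from lines like:
    #   19128: Haven't found "cmsdbsprod.cern.ch" in hosts cache!
    sed -n 's/.*Haven'\''t found "\(.*\)" in hosts cache!.*/\1/p' "$log" |
        sort | uniq -c | sort -rn
}
```

A hostname that keeps accumulating misses while it is being looked up repeatedly is not being cached and is a candidate for this problem.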

Solution or Workaround

A web search turned up only one reference to this problem (http://bugs.gentoo.org/196241). There, upgrading glibc to 2.8 was recommended, but the submitter never reported whether this helped. We currently run glibc-2.3.4 on our SL4 installations.
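To check whether a node already runs a glibc at or above the version recommended in the Gentoo report, the versions have to be compared numerically per field (2.12 is newer than 2.8, which a plain string comparison gets wrong). This is a minimal bash sketch; the function name `glibc_at_least` is hypothetical, it reads the running version via `getconf GNU_LIBC_VERSION` (which prints e.g. `glibc 2.3.4`), and it assumes a `sort` that supports `-V`, which older coreutils such as those on SL4 may lack.

```shell
# Succeed if glibc version CURRENT is at least REQUIRED.
# Usage: glibc_at_least REQUIRED [CURRENT]
# CURRENT defaults to the running system's version as reported by getconf.
glibc_at_least() {
    local required=$1
    local current=${2:-$(getconf GNU_LIBC_VERSION | awk '{print $2}')}
    # sort -V orders version strings field by field, so the smaller one comes first.
    [ "$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}
```

Usage: `glibc_at_least 2.8 || echo "glibc older than 2.8"`.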

An ugly workaround is to hardcode the few problematic hosts in /etc/hosts. This has been done for the moment.
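When the workaround is rolled out to many nodes, it helps if adding an entry is idempotent, so that rerunning the configuration step does not duplicate lines. This is a minimal bash sketch; the function name `add_hosts_entry` is hypothetical, and the file is passed as a parameter so the function can be tried on a scratch copy before touching /etc/hosts.

```shell
# Append "IP NAME [ALIAS...]" to a hosts file unless the IP/NAME pair is present.
# Usage: add_hosts_entry FILE IP NAME [ALIAS...]
add_hosts_entry() {
    local file=$1 ip=$2 name=$3
    shift 3
    # Exit status 0 from awk means some line already maps IP to NAME.
    if awk -v ip="$ip" -v name="$name" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$file" 2> /dev/null; then
        return 0
    fi
    # "${*:+ $*}" appends the aliases, space-separated, only if any were given.
    printf '%s\t%s%s\n' "$ip" "$name" "${*:+ $*}" >> "$file"
}
```

Usage: `add_hosts_entry /etc/hosts 128.142.142.178 cmsdbsprod.cern.ch cmsdbsprod`, using an address from the `host` output above. Note that by default glibc may return only the first /etc/hosts match for a name, so listing both addresses on separate lines does not necessarily give clients both (see the `multi` option in host.conf).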

Monitoring for this condition

-- DerekFeichtinger - 17 Jun 2009

IssueForm
Affected Service: all services that depend on host resolution
Symptom summary: Name resolution sometimes fails, causing applications to fail. The nscd does not cache hosts that resolve to multiple IP addresses.
Reason Understood: yes
Solution Exists: workaround
Obsolete: no
Topic revision: r4 - 2009-11-19 - DerekFeichtinger
 
This site is powered by the TWiki collaboration platform. Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.