I'm learning about puppet and trying to experiment with it on a VM at home. I'm not using a puppet server yet, just running things locally. It works okay, but every time I run puppet apply ..., I get a delay of several seconds, after which it displays the message

warning: Could not retrieve fact fqdn

I assume the message is linked to the delay, and I want to get rid of it (the delay--I can live with the message). Googling for a solution seems to indicate that it's somehow related to DNS lookups, but I can't really find anything else about it, which seems surprising. All I want is to be able to apply manifests in my vm quickly so I can experiment. How can I speed things up?

Update: I don't see any extra info in the debug output, but it looks like this:

$ puppet apply -dv puppet-1.pp 
warning: Could not retrieve fact fqdn
debug: Failed to load library 'rubygems' for feature 'rubygems'
debug: Failed to load library 'selinux' for feature 'selinux'
debug: Puppet::Type::File::ProviderMicrosoft_windows: feature microsoft_windows is missing
...

Update: I added the "ruby" tag because puppet has so few followers. If this doesn't belong in ruby, or if you know a better tag for it, let me know.

Update again: Having learned some more about puppet, I now understand that this message is coming from the component called "Facter" that sniffs out "facts" about the system that Puppet is running on. I found some configuration options and played around with "certname", "node_name" and "node_name_value", but I couldn't get the delay to go away. Does anyone know specifically how to either tell Facter to ignore the fqdn or how to make Facter able to find the fqdn on an Ubuntu 11.10 vm?

Progress:

$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 192.168.1.1

That's my router, which is running Dnsmasq via Tomato.

$ dig -x 192.168.1.129 192.168.1.1

; <<>> DiG 9.7.3 <<>> -x 192.168.1.129 192.168.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21838
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;129.1.168.192.in-addr.arpa.    IN  PTR

;; ANSWER SECTION:
129.1.168.192.in-addr.arpa. 0   IN  PTR desk-vm-ubuntu-beta.

;; Query time: 14 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sun Oct 16 17:47:47 2011
;; MSG SIZE  rcvd: 77

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27462
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;192.168.1.1.           IN  A

;; ANSWER SECTION:
192.168.1.1.        0   IN  A   192.168.1.1

;; Query time: 11 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Sun Oct 16 17:47:47 2011
;; MSG SIZE  rcvd: 45

strace led me to arp, which was blocking for 5 seconds and was called twice on each facter run:

$ time arp -a
? (10.0.2.2) at 52:54:00:12:35:02 [ether] on eth0

real    0m5.127s
user    0m0.004s
sys     0m0.016s

I changed the VM from NAT networking to bridged, so that it now has an IP on the network, and arp returns immediately now. (I'm no networking guru, so I have no idea why this worked, but it seemed a reasonable thing to try.) But facter still takes about 4-5 seconds total to run and still reports "Could not retrieve fact fqdn". facter -d shows several occurrences of "value for domain is still nil", all the way to the end. I'm thinking something still isn't quite right.

Ryan Stewart

6 Answers

Since puppet uses the fqdn fact to determine which node it is running as, it may not be able to run properly if that fact can't be determined. Given what you're describing, the simplest thing to debug is facter fqdn rather than your full puppet command line.

If the "several seconds" is very close to exactly 5 seconds, it's very likely that your DNS configuration is broken with a single bad DNS server listed. What's in /etc/resolv.conf? What happens if you run dig -x $HOSTIP $DNSSERVERIP with the first nameserver listed in resolv.conf?

If you look in facter/fqdn.rb you can see what exactly facter is trying to do to resolve the fqdn. In the version I have most handy it's using facter/hostname.rb and facter/domainname.rb which call code from facter/util/resolution.rb.

Exactly what happens will depend on what version of facter you have, what OS, and possibly also what exactly you have installed. Calls to /bin/hostname, uname and the like, plus DNS lookups, are all quite likely. You can always use strace -t facter fqdn to see what is taking the time (look for the gap in timestamps).
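
One way to spot that gap without reading the whole trace by eye is to let awk compare consecutive timestamps. This is only a rough sketch: the helper name find_gaps and the 2-second default threshold are made up for illustration, and it assumes plain strace -t output where every line starts with HH:MM:SS (no -f PID prefix) and the trace does not cross midnight.

```shell
# find_gaps: read `strace -t` output on stdin and print every line that
# started at least THRESHOLD seconds after the previous timestamped line.
# Hypothetical helper, not part of strace or facter.
find_gaps() {
    awk -v threshold="${1:-2}" '
    {
        # The first field should look like 17:47:42; skip anything else.
        if (split($1, t, ":") != 3) next
        now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
        if (seen && now - prev >= threshold)
            printf "%ds gap before: %s\n", now - prev, $0
        prev = now
        seen = 1
    }'
}

# Usage (strace writes the trace to a file with -o):
#   strace -t -o /tmp/facter.trace facter fqdn
#   find_gaps 2 < /tmp/facter.trace
```

The line printed after each gap is the first call that completed once the wait was over, so the call just before it in the trace is usually the one that blocked.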

From everything you've described, it does sound like the problem is that puppet/facter really wants to have a domain name and you don't have one, you just have a naked hostname.

Adding domain example.com to /etc/resolv.conf should do the trick. Running hostname foo.example.com should also do the trick (but will need to be re-applied after every reboot). Permanent solutions depend on the exact OS setup.
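
To check whether a domain is actually configured before and after such a change, something like the following could help. It is only a sketch: resolv_domain is a made-up helper name, and it assumes your Facter version falls back to the domain or search line of /etc/resolv.conf, which may differ between versions (facter/domain.rb is the place to confirm).

```shell
# resolv_domain: print the value of the first `domain` line, or failing that
# the first name on the first `search` line, from resolv.conf text on stdin.
# Prints nothing if neither directive is present. Hypothetical helper.
resolv_domain() {
    awk '
    $1 == "domain" && !domain { domain = $2 }
    $1 == "search" && !search { search = $2 }
    END {
        if (domain)      print domain
        else if (search) print search
    }'
}

# Usage:
#   resolv_domain < /etc/resolv.conf
```

With the resolv.conf shown in the question (a single nameserver line) this prints nothing, which matches the "value for domain is still nil" messages from facter -d.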

freiheit
  • Ah! `strace` found something: `/usr/lib/ruby/1.8/facter/arp.rb` was calling `arp -a` which was blocking for 5 seconds. I added the resolv.conf, dig output, and arp output to my question. It's faster now, but still not fast. Does 4-5 seconds sound right for a vm on one desktop core? – Ryan Stewart Oct 16 '11 at 23:14
  • It's trying really hard to find a domain name and failing every way it tries. Give it a domain and it will succeed faster. I'll add something to the answer in a moment. – freiheit Oct 16 '11 at 23:55
  • Since the vm is a desktop distro, I set a domain in my router and restarted network-manager, and it added the domain to resolv.conf itself. That took care of the domain and fqdn. I've never found a need to have a domain name configured before. It's at about 4 seconds max now--much better than it was, and I'm not sure it will get much better. Thanks for your help! – Ryan Stewart Oct 17 '11 at 00:19
  • More recent Facter versions use `arp -an` to avoid the reverse lookup. You can modify arp.rb to use this call, or upgrade facter to 1.6.1 - either will fix the timeout problem. – eshamow Oct 23 '11 at 19:21
  • Adding `domain example.com` to `/etc/resolv.conf` did the trick for me. Thanks! – François Beausoleil Jan 12 '12 at 13:38
  • Just want to point out that the actual domain should follow FQDN conventions, i.e., webserver.local will not be picked up, but webserver.local.com will. – Alex Skrypnyk May 20 '14 at 08:43

I got the same error when running puppet on my home machine (Xubuntu). What worked for me was changing the second line of file /etc/hosts. The first two lines before the change:

127.0.0.1   localhost
127.0.1.1   box

And after the change:

127.0.0.1   localhost
127.0.1.1   box.example.com box

Now, the command hostname -f returns box.example.com instead of box, and puppet is happy.
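
The order of names on that line matters because the first name after the IP address is treated as the canonical one. A small check along these lines can confirm which name comes first; canonical_name is a hypothetical helper written for this example, not an existing tool.

```shell
# canonical_name: print the first (canonical) name on the hosts-file line
# that mentions HOST, reading hosts-file text on stdin.
# Hypothetical helper for illustration only.
canonical_name() {
    awk -v host="$1" '
    {
        sub(/#.*/, "")                 # drop comments; awk re-splits the fields
        for (i = 2; i <= NF; i++)      # field 1 is the IP address
            if ($i == host) { print $2; exit }
    }'
}

# Usage:
#   canonical_name "$(hostname)" < /etc/hosts
```

A dotted result such as box.example.com means the full name is listed first; a bare name such as box means the line still lists the short name first.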

Teemu Leisti
  • I had to put the full name before the short name in /etc/hosts; otherwise it still didn't work. – Josiah Apr 25 '12 at 19:38
  • Make sure you have the order of your redirect correct. You gotta watch out for doing something like `127.0.1.1 box www.box.com`. – Adam Harte Aug 16 '12 at 02:02

Adding

  config.vm.hostname = "vagrant.example.com"

to my Vagrantfile fixed it for me.

shredding
  • This is in line with the information on this page: http://blog.doismellburning.co.uk/2013/01/19/upgrading-puppet-in-vagrant-boxes/ – demaniak Mar 08 '14 at 14:42

FQDN stands for "fully qualified domain name": a host's own name plus the full domain it belongs to. In a Windows domain (or other similar LDAP-based domain), for example, it would be something like "server1.organization.internal", where "organization.internal" is the name of your network domain - the domain that your computers and servers are joined to, and the domain which contains your network groups and user accounts.

So my guess is that it had trouble getting the fqdn for some authentication needed to perform the rest of the configuration steps.

http://en.wikipedia.org/wiki/Fully_qualified_domain_name

It's possible that you'll get a better answer on ServerFault, since system/configuration management also crosses over into their realm.

jefflunt
  • Right, I understand what an fqdn is. I've seen indications that it's somehow related to auth, but it doesn't actually prevent anything from happening. It all works. It just introduces an annoying delay. This also isn't a Windows box or on a Windows domain. It's a VirtualBox VM running Ubuntu 11.10. Good idea about ServerFault--if I don't get any more input here, I might see if I can figure out how to move the question over there. I added a little more detail that I've discovered to my question. – Ryan Stewart Oct 16 '11 at 15:18
  • Ah, gotcha. To migrate the question, people basically have to vote to close it as off topic (suggesting it move to ServerFault). If you don't get any traction, you'll probably just have to cross post it yourself. – jefflunt Oct 16 '11 at 15:24
  • I tried to answer (more gave you things to look at), but definitely ServerFault has more people that can answer a question like this. Instead of cross-posting or waiting for enough close/migrate votes, you can flag the question for moderator attention and ask them to migrate it for you. – freiheit Oct 16 '11 at 19:11

Append this line to /etc/resolv.conf:

domain abc.com

Then run facter fqdn again.

The fqdn fact requires a domain name, which may be missing on a freshly installed Ubuntu 12 system.

Kit Ho

Another way to work around the problem is to override the fact.

http://www.puppetcookbook.com/posts/override-a-facter-fact.html

FACTER_fqdn=box.example.com facter

On Windows this would be

SET FACTER_fqdn=box.example.com
facter fqdn
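
On a Unix-like shell, the one-line form sets the variable for that single command only, whereas SET on Windows affects the rest of the session. The sketch below illustrates that scoping. Here sh -c is just a stand-in for facter fqdn so the snippet runs on a machine without Facter, and box.example.com is a placeholder value. The same prefix can presumably be put in front of puppet apply as well, but that is an assumption I have not verified across versions.

```shell
# Placeholder value; use whatever name the node should report.
fqdn_override="box.example.com"

# A VAR=value prefix sets the variable for that one command only.
# `sh -c` stands in here for `facter fqdn`.
seen_by_child=$(FACTER_fqdn="$fqdn_override" sh -c 'printf %s "$FACTER_fqdn"')

# The calling shell itself is left untouched by the prefix form.
seen_by_parent=${FACTER_fqdn:-unset}

printf 'child saw:  %s\nparent has: %s\n' "$seen_by_child" "$seen_by_parent"
```

To keep the override for a whole session instead, export FACTER_fqdn=box.example.com does the same job as SET does on Windows.
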
ferventcoder