0

I have set up Ganglia(Ganglia Core 3.6.0 and Ganglia Web 3.5.10) to monitor my cluster.

When gmond is restarted in a machine, metrics from all other gmond machines also gets stopped ie I am not able to see metrics getting published from other machines in Ganglia Web. And I can also see Hosts up going to 0 and Hosts down as 13(total number of machines). As time goes, the Hosts up comes back to 13.

Am I missing something ?? Can some one help me...

vishnu
  • 451
  • 4
  • 18

1 Answers1

0

If it's always the same machine, it should be a gmond 'end-point'. The gmetad daemon is querying only one gmond (no redundancy), if he goes down everybody seems to be going down. If there are a redundancy (eg. more than one host in a data source), you can expect some lag if the first one goes down because of the number of TCP queries before it timesout.

aimnor
  • 56
  • 4