To quote its website: "Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids." In other words: it shows at a glance all sorts of metrics that you would otherwise have to collect with numerous different shell tools.
We, the HPC team, use Ganglia on a daily basis to monitor the state of our cluster(s). As a user, you can monitor the state of the nodes your jobs are running on.
Without further ado, here is the top-level link to our Ganglia page: Ganglia Entry point for the Mogon Clusters.