Saturday, August 31, 2013

ganglia RRD graph is broken (shown with gaps on the time scale)



Problem: ganglia RRD graph is broken (shown with gaps on the time scale)

In my case, after upgrading to the latest ganglia version, my RRD graphs looked like this:



It took a while to find out that two unintended changes had gone into gmetad.conf.

[1] The RRAs had been replaced with the default ones. To understand RRAs and the RRD database file, read this article. An RRD is a circular buffer.

ganglia gmetad polls gmond every 15 seconds (default). My RRAs below create two circular buffers in the RRD database file.

     RRAs    "RRA:AVERAGE:0.5:1:172800"   "RRA:AVERAGE:0.5:24:87600"


  • Write one value per polling interval (:1:) and keep the last 172800 samples. Thus, 15 seconds x 172800 samples / (3600 seconds x 24 hours) = 30 days. It stores 30 days of data at 15-second resolution.

  • Write one sample by averaging the values of 24 polls (:24:) (24 polls x 15 sec = 360 sec = 6 min) and keep the last 87600 samples. 6 min x 87600 samples / (60 min x 24 hr) = 365 days. It stores 1 year of data at 6-minute resolution.
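The retention arithmetic above can be checked with a few lines of Python. This is only a sketch: the helper names `parse_rra` and `rra_retention_seconds` are made up for illustration, and the 15-second step is the gmetad default polling interval mentioned above.

```python
def parse_rra(spec):
    """Split 'RRA:CF:xff:steps:rows' into (cf, xff, steps, rows)."""
    tag, cf, xff, steps, rows = spec.split(":")
    if tag != "RRA":
        raise ValueError("not an RRA definition: " + spec)
    return cf, float(xff), int(steps), int(rows)


def rra_retention_seconds(step_seconds, steps_per_row, rows):
    """Seconds of history one RRA holds: step x steps-per-row x rows."""
    return step_seconds * steps_per_row * rows


STEP = 15  # assumed polling interval in seconds (gmetad default)

for spec in ("RRA:AVERAGE:0.5:1:172800", "RRA:AVERAGE:0.5:24:87600"):
    cf, xff, steps, rows = parse_rra(spec)
    seconds = rra_retention_seconds(STEP, steps, rows)
    # Resolution is one stored row per (STEP * steps) seconds.
    print(f"{spec}: one row per {STEP * steps} s, {seconds / 86400:g} days kept")
```

If the polling interval is changed, the same RRA definitions cover a different time span, so the step has to be considered together with the RRAs.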

For the systems added after the ganglia upgrade, RRD database files were created with the new (default) RRA values. So the RRD database files of some systems had a different layout from the other systems' RRD files.
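One way to spot which files ended up with a different layout is to compare the RRA section of `rrdtool info` output between hosts. The sketch below parses that output from a string; the function name `rra_layout` and the sample text are illustrative only, and the sample is trimmed to the fields the parser reads (real output contains many more lines).

```python
import re

# Matches lines such as: rra[0].rows = 172800
RRA_LINE = re.compile(r'^rra\[(\d+)\]\.(cf|rows|pdp_per_row) = (.+)$')


def rra_layout(info_text):
    """Return [(cf, pdp_per_row, rows), ...] from `rrdtool info` text."""
    rras = {}
    for line in info_text.splitlines():
        match = RRA_LINE.match(line.strip())
        if match:
            index, key, value = match.groups()
            rras.setdefault(int(index), {})[key] = value.strip('"')
    return [
        (r["cf"], int(r["pdp_per_row"]), int(r["rows"]))
        for _, r in sorted(rras.items())
    ]


# Illustrative sample, shaped like `rrdtool info` output for the RRAs above.
SAMPLE = """\
step = 15
rra[0].cf = "AVERAGE"
rra[0].rows = 172800
rra[0].pdp_per_row = 1
rra[1].cf = "AVERAGE"
rra[1].rows = 87600
rra[1].pdp_per_row = 24
"""

print(rra_layout(SAMPLE))
```

To use it on a real file, the text could be obtained with `subprocess.run(["rrdtool", "info", path], capture_output=True, text=True, check=True).stdout`, where `path` is whichever .rrd file you want to inspect. Files whose layout differs from the expected list are the ones created with the replaced RRAs.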
 
[2] Originally there was no polling interval defined for the cluster data_source (the polling interval is 15 seconds if not specified). But in the new config file it was 120. (Either remove the 120 or replace it with 15.)

         data_source app_cluster 120 appsystem01 appsystem02 appsystem03
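To catch this kind of change after an upgrade, the polling interval of each data_source line can be extracted and compared against what you expect. A minimal sketch, assuming the optional interval is the first all-digit token after the cluster name and that no host is named with digits only; `parse_data_source` is a hypothetical helper, not part of ganglia.

```python
import shlex

DEFAULT_INTERVAL = 15  # seconds, used when the line gives no interval


def parse_data_source(line):
    """Return (name, interval_seconds, hosts) for one data_source line."""
    tokens = shlex.split(line)  # also handles quoted cluster names
    if len(tokens) < 2 or tokens[0] != "data_source":
        raise ValueError("not a data_source line: " + line)
    name, rest = tokens[1], tokens[2:]
    if rest and rest[0].isdigit():
        return name, int(rest[0]), rest[1:]
    return name, DEFAULT_INTERVAL, rest


line = "data_source app_cluster 120 appsystem01 appsystem02 appsystem03"
name, interval, hosts = parse_data_source(line)
if interval != DEFAULT_INTERVAL:
    print(f"{name}: polling every {interval} s instead of {DEFAULT_INTERVAL} s")
```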

Did you find any other fix for the same problem?
 


