
I'm working with RabbitMQ instances hosted at CloudAMQP. I'm calling the management API to get detailed queue statistics. About 1 in 10 calls to the API return invalid numbers.

The endpoint is `/api/queues/[vhost]/[queue]?msg_rates_age=600&msg_rates_incr=30`. I'm looking for average message rates at 30-second increments over a 10-minute span. Usually that returns valid data for the stats I'm interested in, e.g.

{
    "messages": 16,
    "consumers": 30,
    "message_stats": {
        "ack_details": {
            "avg_rate": 441
        },
        "publish_details": {
            "avg_rate": 441
        }
    }
}
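
For reference, the request described above could be sketched roughly as follows. This is a minimal sketch using only the Python standard library; the helper names (`build_queue_url`, `fetch_queue_stats`, `avg_rates`) and the use of HTTP basic auth are assumptions for illustration, not something taken from the CloudAMQP setup in question.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_queue_url(base_url, vhost, queue, age=600, incr=30):
    """Build the queue-stats URL with the rate-history parameters."""
    # vhost and queue are path segments, so "/" must be sent as "%2F".
    path = "/api/queues/{}/{}".format(
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue, safe=""),
    )
    query = urllib.parse.urlencode({"msg_rates_age": age, "msg_rates_incr": incr})
    return "{}{}?{}".format(base_url.rstrip("/"), path, query)


def fetch_queue_stats(url, user, password, timeout=10):
    """GET the URL with basic auth (assumed) and decode the JSON body."""
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def avg_rates(stats):
    """Pull the two avg_rate values shown above; None if a field is missing."""
    message_stats = stats.get("message_stats", {})
    return {
        name: message_stats.get(name + "_details", {}).get("avg_rate")
        for name in ("ack", "publish")
    }
```

Returning `None` for a missing field keeps the "no `avg_rate` at all" case distinct from a genuine rate of 0.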

But sometimes I get incorrect results for one or both "avg_rate" values, often 714676 or higher. If I then wait 15 seconds and call the same API again, the numbers go back down to normal. There's no way the average over 10 minutes jumps by a factor of 200 or more and then comes back down seconds later.
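
Since a second call about 15 seconds later returns sane numbers, one stopgap is to reject implausible readings and retry. The sketch below is a workaround, not a fix for the underlying problem; `read_plausible_rate` and the `max_plausible_rate` ceiling are hypothetical, and the ceiling would have to be chosen from what the queue can realistically sustain.

```python
import time


def read_plausible_rate(read_rate, max_plausible_rate, retries=3,
                        delay_seconds=15, sleep=time.sleep):
    """Call read_rate() until it returns a value within the plausible range.

    read_rate is any zero-argument callable returning an avg_rate (or None).
    Returns None if every attempt looks wrong, so the caller can skip the sample.
    """
    for attempt in range(retries + 1):
        rate = read_rate()
        if rate is not None and 0 <= rate <= max_plausible_rate:
            return rate
        # Wait before asking again, but not after the final attempt.
        if attempt < retries:
            sleep(delay_seconds)
    return None
```

Passing `sleep` as a parameter only makes the function easy to exercise without real delays; in normal use the default `time.sleep` applies.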

I haven't been able to reproduce the issue with a local install, only in production where the queue is always very busy. The data displayed on the admin web page always looks correct. Is there some other way to get the same stats as accurately as the UI does?

Matt S
  • Can you show the problem response that you are getting? My guess is that you're seeing a high "average rate" if the denominator is close to zero (keep in mind a rate always is a ratio of two numbers). I don't know what they're using here (1 second?) – theMayer Apr 05 '18 at 14:56
  • This might be something to poke to the bug tracker too, if it's reproducible. – theMayer Apr 05 '18 at 14:58
  • The response contains `"avg_rate": 714676` or similar for either value. There might be some calculation bug (like an overflow) in the RabbitMQ backend. I'm wondering if there are better parameters I can pass or if there's another data point I should be looking at. I thought the 30 second increment param would be good. – Matt S Apr 05 '18 at 15:00
  • The documentation is poor, but I am wondering if `msg_rates_incr` is actually in units of milliseconds? If so, that would explain this. Try setting it to 1000 and see what happens. – theMayer Apr 05 '18 at 15:02
  • @theMayer If I pass 30000 for incr the "avg_rate" field isn't returned at all. It returns "rate" of 0. I guess incr really is seconds and it knows it can't calculate with an increment larger than age. – Matt S Apr 05 '18 at 15:12
  • Even if you increase your age also? I'm just spitballing here. – theMayer Apr 05 '18 at 15:13
  • @theMayer That works, but my avg_rate drops to 0 in my local environment because it's averaged over such a long span of time. BTW this is very hard to reproduce locally but easy to see in my live environment, where the queue is always very busy. I'm leaning toward reporting a bug or at least opening a ticket with CloudAMQP so they can see it. – Matt S Apr 05 '18 at 15:16
  • That makes sense. I wonder if there are better ways to go about getting this information overall (e.g. logging statistics via elasticsearch query) as opposed to this. I've honestly never thought about doing what you're doing. Usually this type of meta analysis is done in its own subsystem. – theMayer Apr 05 '18 at 15:18
  • Hey guys, do you have any update on this? I'm facing the same issue with statistics, and not only for queue statistics but for message statistics as well. – alxbxbx Nov 08 '18 at 10:34
  • @alxbxbx I never found a solution. Please post back if you figure it out! – Matt S Nov 08 '18 at 14:28
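
Following up on the denominator point raised in the comments: another option is to ignore the server-computed `avg_rate` and derive the average from the raw samples instead. The sketch below assumes each `*_details` object also carries a `samples` array of cumulative counts (`sample`) with millisecond `timestamp` values, which is how the management API documents range queries but is not shown in the response above. The function name and the `min_span_seconds` guard are hypothetical.

```python
def average_rate_from_samples(samples, min_span_seconds=1.0):
    """Average messages/second between the oldest and newest sample.

    Each sample is assumed to look like {"sample": <cumulative count>,
    "timestamp": <milliseconds since epoch>}.
    Returns None when there is too little data to divide safely.
    """
    if len(samples) < 2:
        return None
    ordered = sorted(samples, key=lambda s: s["timestamp"])
    first, last = ordered[0], ordered[-1]
    span_seconds = (last["timestamp"] - first["timestamp"]) / 1000.0
    # A near-zero time span would inflate the rate, so refuse to compute it.
    if span_seconds < min_span_seconds:
        return None
    return (last["sample"] - first["sample"]) / span_seconds
```

Comparing this value with the reported `avg_rate` on a bad response would also show whether the raw samples or the server-side division is at fault.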

0 Answers