
I have had a MongoDB replica-set cluster in production for one year.

For the past few days, one secondary node has been saturating the maximum IOPS capacity of its disk (100% utilization):

    Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
    dm-0              0.00     0.00    2.00  594.00     8.00  6855.00    23.03   149.99  288.84  194.00  289.16   1.68 100.00
    dm-0              0.00     0.00    0.00  587.00     0.00  5489.00    18.70   150.39  245.96    0.00  245.96   1.70 100.00
    dm-0              0.00     0.00    2.00  720.00   132.00  5640.50    15.99   149.68  204.47  856.00  202.66   1.39 100.00
    dm-0              0.00     0.00    4.00  748.00   512.00  8962.00    25.20   147.64  210.12  264.00  209.83   1.33 100.00
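
For reference, per-device output in this format comes from iostat in extended mode; a minimal sketch of the invocation (the device name and interval are assumptions):

    $ iostat -x dm-0 1    # extended stats for dm-0, refreshed every second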

The other two nodes don't have this problem; their IOPS stay under 10% of capacity.
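
To quantify the difference from MongoDB's side, the write traffic each member reports can be compared directly; a minimal sketch using standard mongostat options (host names taken from the rs.status() output below):

    $ mongostat --host contr001.ecs.net --port 27018 --discover    # --discover shows stats for all replica-set members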

This is my cluster configuration:

rs0:PRIMARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2014-11-25T15:51:28Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 1,
            "name" : "contr002.ecs.net:27018",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 78849,
            "optime" : Timestamp(1416930685, 235),
            "optimeDate" : ISODate("2014-11-25T15:51:25Z"),
            "lastHeartbeat" : ISODate("2014-11-25T15:51:26Z"),
            "lastHeartbeatRecv" : ISODate("2014-11-25T15:51:26Z"),
            "pingMs" : 0,
            "syncingTo" : "contr001.ecs.net:27018"
        },
        {
            "_id" : 2,
            "name" : "contr003.ecs.net:27018",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 39,
            "optime" : Timestamp(1416927433, 64),
            "optimeDate" : ISODate("2014-11-25T14:57:13Z"),
            "lastHeartbeat" : ISODate("2014-11-25T15:51:27Z"),
            "lastHeartbeatRecv" : ISODate("2014-11-25T15:51:26Z"),
            "pingMs" : 0,
            "syncingTo" : "contr002.ecs.net:27018"
        },
        {
            "_id" : 3,
            "name" : "contr001.ecs.net:27018",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 15575163,
            "optime" : Timestamp(1416930688, 200),
            "optimeDate" : ISODate("2014-11-25T15:51:28Z"),
            "self" : true
        }
    ],
    "ok" : 1
}
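
One detail that stands out in this output: contr003.ecs.net has an uptime of only 39 seconds and its optime is roughly 54 minutes behind the primary, which suggests it was recently restarted and is still catching up. The lag of each secondary can be checked directly with the standard shell helpers; a minimal sketch:

    rs0:PRIMARY> rs.printSlaveReplicationInfo()    // per-secondary lag behind the primary
    rs0:SECONDARY> db.currentOp()                  // on the busy member: list operations currently in flight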

I have already tried to resync the secondary node, but nothing changed.
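
For reference, a resync on this generation of MongoDB usually means stopping the node, clearing its data directory, and restarting it so it performs a full initial sync from another member; a minimal sketch (the dbpath and service name are assumptions):

    $ sudo service mongod stop
    $ rm -rf /var/lib/mongodb/*     # assumed dbpath; wiping it forces a full initial sync on restart
    $ sudo service mongod start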

Do you have any idea what could be causing this?

Thanks

  • Anything unusual in the logs? Anything else running on the machine? There's either extra IOPS coming from somewhere or the disk is messed up. There shouldn't be writes beyond replication for a secondary, but check the logs and the operations to be sure. Then look at what else is on the machine using IOPS. – wdberkeley Nov 25 '14 at 18:52
  • The logs are clean. For debugging, I stopped all services on the machine and verified that there weren't any IOPS on the disk (MongoDB has a dedicated SSD in RAID1). – Daniel Bellantuono Nov 26 '14 at 00:08

0 Answers