We are having a problem with one of our Kafka clusters. We have 6 nodes running v1.0, and every topic has a replication factor of 3 and 10 partitions, which seemed to be enough for us.
Due to a power failure, 3 of the nodes went down for a while, and now a lot of our topics are reported as having under-replicated partitions.
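For reference, by "under-replicated" we mean partitions whose in-sync replica (ISR) list is shorter than their assigned replica list. Below is a minimal sketch of that check, assuming the tab-separated text that `kafka-topics.sh --describe` prints. The helper names, topic name and broker IDs in the sample are made up for illustration and are not from our cluster.

```python
import re

# Matches the per-partition lines of `kafka-topics.sh --describe` output.
# Assumption: the fields appear in the order Topic, Partition, Leader, Replicas, Isr.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\w+)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]*)\s+Isr:\s*(?P<isr>[\d,]*)"
)


def parse_ids(csv):
    """Turn a comma-separated broker list such as '1,2,3' into [1, 2, 3]."""
    return [int(x) for x in csv.split(",") if x]


def under_replicated(describe_output):
    """Return (topic, partition, missing_brokers) for each partition
    whose ISR is smaller than its assigned replica set."""
    result = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # topic summary lines and blank lines are skipped
        replicas = parse_ids(m["replicas"])
        isr = parse_ids(m["isr"])
        if len(isr) < len(replicas):
            missing = sorted(set(replicas) - set(isr))
            result.append((m["topic"], int(m["partition"]), missing))
    return result


# Hypothetical sample output: partition 1 has lost brokers 5 and 6 from its ISR.
SAMPLE = (
    "Topic:orders\tPartitionCount:2\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: orders\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2,3\n"
    "\tTopic: orders\tPartition: 1\tLeader: 4\tReplicas: 4,5,6\tIsr: 4\n"
)

if __name__ == "__main__":
    for topic, partition, missing in under_replicated(SAMPLE):
        print(topic, partition, "missing from ISR:", missing)
```

Grouping the result by missing broker ID should show whether the lagging replicas all sit on the 3 nodes that went down.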
The only solution we have seen on forums (and it seems to be the most accepted one) is to do a rolling restart until everything gets magically fixed, but I hope there is a better way. Has anybody recovered from this situation? Network and CPU shouldn't be what keeps the replicas from getting back in sync, as neither is anywhere near its limits.
Thanks a lot!