
I am doing data replication in Kafka, but the size of the Kafka log file increases very quickly; it reaches 5 GB in a day. As a solution to this problem, I want to delete processed data immediately. I am using the deleteRecords method in AdminClient to delete all records before an offset. But when I look at the log file, the data corresponding to that offset is not deleted.

// Delete all records in the partition up to (but not including) the given offset
RecordsToDelete recordsToDelete = RecordsToDelete.beforeOffset(offset);
TopicPartition topicPartition = new TopicPartition(topicName, partition);
Map<TopicPartition, RecordsToDelete> deleteConf = new HashMap<>();
deleteConf.put(topicPartition, recordsToDelete);
adminClient.deleteRecords(deleteConf);

I don't want suggestions like log.retention.hours, log.retention.bytes, log.segment.bytes, or log.cleanup.policy=delete.

That is because I only want to delete data that has already been consumed by the consumer. With those retention settings, data that has not yet been consumed would be deleted as well.

What are your suggestions?

  • See https://stackoverflow.com/questions/28586008/delete-message-after-consuming-it-in-kafka – Giorgos Myrianthous Oct 24 '18 at 06:23
  • @Gio most of the answers there were before there existed an AdminClient to delete records... – OneCricketeer Oct 24 '18 at 07:24
  • What if *other consumers* want that data? It shouldn't be up to the client to delete offsets, that's a server side configuration – OneCricketeer Oct 24 '18 at 07:25
  • There is a separate topic for each consumer, so there is no problem on the client side, and I can delete offsets. My problem is that the size of the log file does not decrease when I delete records before an offset. Because the log file is not shrinking, the disk fills up quickly. – omerstack Oct 24 '18 at 07:27
  • Possible duplicate of [Kafka: How to delete records from a topic using Java API?](https://stackoverflow.com/questions/51283736/kafka-how-to-delete-records-from-a-topic-using-java-api) – trix Dec 20 '18 at 10:45

2 Answers


You didn't do anything wrong. The code you provided works; I've tested it. Just in case I've overlooked something in your code, here is mine:

public void deleteMessages(String topicName, int partitionIndex, int beforeIndex) {
    // Deletes all records whose offset is smaller than beforeIndex
    // in the given partition of the topic
    TopicPartition topicPartition = new TopicPartition(topicName, partitionIndex);
    Map<TopicPartition, RecordsToDelete> deleteMap = new HashMap<>();
    deleteMap.put(topicPartition, RecordsToDelete.beforeOffset(beforeIndex));
    kafkaAdminClient.deleteRecords(deleteMap);
}

I've used the Gradle dependency group: 'org.apache.kafka', name: 'kafka-clients', version: '2.0.0'

So check that you are targeting the right partition (0 for the first one).
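
If it helps, here is a minimal sketch for listing the topic's partitions so you can confirm the index, assuming the same adminClient and topicName as in your snippet:

// Minimal sketch: list the topic's partitions to confirm the index exists.
// Imports needed: java.util.Collections,
// org.apache.kafka.clients.admin.TopicDescription,
// org.apache.kafka.common.TopicPartitionInfo
public void listPartitions(AdminClient adminClient, String topicName) throws Exception {
    TopicDescription description = adminClient
            .describeTopics(Collections.singleton(topicName))
            .all().get()  // blocks until the broker responds
            .get(topicName);
    for (TopicPartitionInfo partitionInfo : description.partitions()) {
        System.out.println("Partition index: " + partitionInfo.partition());
    }
}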

Check your broker version: https://kafka.apache.org/20/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html says:

This operation is supported by brokers with version 0.11.0.0

Produce the messages from the same application, to be sure you're connected properly.
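
For example, a minimal producer sketch you could use for that check; "localhost:9092" and the key/value here are placeholder values:

// Minimal sketch: produce a test message from the same application to
// verify connectivity. Imports needed: java.util.Properties and the
// org.apache.kafka.clients.producer / common.serialization packages.
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    producer.send(new ProducerRecord<>(topicName, "test-key", "test-value"));
}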

There is one more option you can consider: using cleanup.policy=compact. If your message keys repeat, you could benefit from it, not just because older messages for a key are automatically deleted, but because a message with a null payload (a tombstone) deletes all messages for that key. Just don't forget to set delete.retention.ms and min.compaction.lag.ms to small enough values. In that case you can consume a message and then produce a null payload for the same key (but be cautious with this approach, since this way you can also delete messages with that key that you didn't consume).
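
A minimal sketch of that consume-then-tombstone pattern; it assumes producer is a Producer<String, String> and record is the ConsumerRecord<String, String> you just processed:

// After processing a consumed record, produce a tombstone (null value)
// for the same key so compaction eventually removes that key.
// The topic must have cleanup.policy=compact for this to take effect.
producer.send(new ProducerRecord<>(record.topic(), record.key(), null));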

– OneCricketeer

Try this

// recordsToDelete here is the Map<TopicPartition, RecordsToDelete>
// built as in the question
DeleteRecordsResult result = adminClient.deleteRecords(recordsToDelete);
Map<TopicPartition, KafkaFuture<DeletedRecords>> lowWatermarks = result.lowWatermarks();
try {
    for (Map.Entry<TopicPartition, KafkaFuture<DeletedRecords>> entry : lowWatermarks.entrySet()) {
        // get() blocks until the deletion has completed on the broker
        System.out.println(entry.getKey().topic() + " " + entry.getKey().partition()
                + " " + entry.getValue().get().lowWatermark());
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
}
adminClient.close();

In this code, you need to call entry.getValue().get().lowWatermark(): adminClient.deleteRecords(recordsToDelete) returns a map of futures, so you need to wait for each future to complete by calling get().
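
If you don't need the per-partition low watermarks, a shorter sketch is to wait on the combined future via all(), which is on the same DeleteRecordsResult:

// Minimal sketch: wait for all deletions at once instead of iterating
// the per-partition futures. get() throws InterruptedException /
// ExecutionException like any KafkaFuture.
DeleteRecordsResult result = adminClient.deleteRecords(recordsToDelete);
result.all().get();  // blocks until deletion completes for every partition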

– trix