
I am inserting data into Cassandra using a batch. I get the exception below when I run the job.

caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large   
  at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)   
  at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)     
  at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:184)    
  at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)    
  at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:798)   
  at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)    
  at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)  
  at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)   
  at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)     
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)   
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)     
  at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)     
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)   
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)

I have read a lot of blog posts about this issue, but they were not helpful. I tried setting "spark.cassandra.output.batch.size.bytes" in the Spark conf during initialization, but that did not resolve my problem; I am getting the same error. My batch has about 1000 insert statements.

Please find my code below.

CassandraConnector connector = CassandraConnector.apply(javaSparkContext.getConf());
pairRDD.mapToPair(earnCalculatorKeyIterableTuple2 -> {
            if (condition) {
                // do something...
            }
            else {
                Session session = connector.openSession();
                BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
                batch.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
                PreparedStatement statement = session.prepare("my insert query");
                for (condition) {
                    if (!condition) {
                        break;
                    }
                    earnCalculatorKeyIterableTuple2._2.forEach(s -> {
                        if (!condition) {
                            LOG.info(message);
                        }
                        else {
                            BoundStatement boundStatement = statement.bind("bind variables");
                            batch.add(boundStatement);
                        }
                    });
                    session.execute(batch);
                    batch.clear();
                }
                session.close();
            }
            return earnCalculatorKeyIterableTuple2;
        });
        return s;
    }

Appreciate any help.

mrsrinivas
sandy
  • Are you actually using Spark? I ask because your trace doesn't seem to have any Spark Cassandra Connector frames, and changing batch.size.bytes would change the number of statements in your insert. – RussS Jan 09 '17 at 22:51
  • Yes, I am using the Spark Cassandra connector. I tried setting batch.size.bytes = auto, but that didn't fix it either. – sandy Jan 10 '17 at 01:13
  • Could you actually provide a code sample? – RussS Jan 10 '17 at 01:41
  • Added the code; please have a look. – sandy Jan 10 '17 at 16:03

1 Answer


You are manually creating batches and your batches are too large. Add fewer rows to each batch. There are a lot of ways to do this manually, but perhaps the easiest is to add a counter that submits the batch every time X statements have been added.

The parameters you are changing only relate to the automatic batching done by saveToCassandra.
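As a minimal sketch of the counter approach: the chunking logic below is plain Java, with a `List` standing in for the driver's `BatchStatement` so it is self-contained; the comments mark where `session.execute(batch)` and `batch.clear()` would go in your code. The class and method names are illustrative, not part of any API.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {

    // Accumulate statements and "submit" a batch every maxBatch additions,
    // then flush whatever remains at the end. In real code, the two marked
    // lines become session.execute(batch) and batch.clear().
    static int flushInChunks(List<String> statements, int maxBatch,
                             List<List<String>> flushed) {
        List<String> batch = new ArrayList<>();
        for (String s : statements) {
            batch.add(s);                             // batch.add(boundStatement)
            if (batch.size() >= maxBatch) {
                flushed.add(new ArrayList<>(batch));  // session.execute(batch)
                batch.clear();                        // batch.clear()
            }
        }
        if (!batch.isEmpty()) {                       // don't lose the remainder
            flushed.add(new ArrayList<>(batch));
        }
        return flushed.size();
    }

    public static void main(String[] args) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            stmts.add("stmt" + i);
        }
        List<List<String>> flushed = new ArrayList<>();
        // 1000 statements in chunks of 100 => 10 batches instead of one huge one
        System.out.println(flushInChunks(stmts, 100, flushed)); // prints 10
    }
}
```

Capping each batch at a fixed statement count keeps every batch well under the server's batch size threshold, which is what the "Batch too large" error is enforcing.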

RussS