Questions tagged [spark-cassandra-connector]

Connects Apache Spark and Cassandra for clustered queries

Summary

Get lightning-fast cluster computing with Spark and Cassandra. This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.

839 questions
55 votes · 8 answers

How to list all cassandra tables

There are many tables in the Cassandra database which contain a column titled user_id. The user_id values refer to users stored in the users table. As some users are deleted, I would like to delete orphan records in all tables that contain the column…
Niko Gamulin • 63,517 • 91 • 213 • 274
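One way to approach this is to ask Cassandra itself which tables carry the column, via the system_schema tables. A minimal sketch, assuming Cassandra 3.x+ (older versions use system.schema_columns instead), a reachable node at 127.0.0.1, and the CassandraConnector from the spark-cassandra-connector on the classpath:

```scala
import scala.collection.JavaConverters._
import org.apache.spark.SparkConf
import com.datastax.spark.connector.cql.CassandraConnector

val conf = new SparkConf().set("spark.cassandra.connection.host", "127.0.0.1")
val connector = CassandraConnector(conf)

// List every (keyspace, table) pair that has a user_id column.
// ALLOW FILTERING is tolerable here because system tables are tiny.
val tablesWithUserId = connector.withSessionDo { session =>
  val rs = session.execute(
    "SELECT keyspace_name, table_name FROM system_schema.columns " +
    "WHERE column_name = 'user_id' ALLOW FILTERING")
  rs.all().asScala.map(row =>
    (row.getString("keyspace_name"), row.getString("table_name")))
}
```

The resulting list can then drive per-table delete jobs for the orphaned user_id values.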
50 votes · 5 answers

How to query JSON data column using Spark DataFrames?

I have a Cassandra table that for simplicity looks something like: key: text, jsonData: text, blobData: blob. I can create a basic data frame for this using Spark and the spark-cassandra-connector with: val df = sqlContext.read …
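A common pattern for this is to declare the structure hidden inside the text column and promote it to real columns with from_json. A sketch, assuming Spark 2.1+ (where from_json is available); the keyspace, table, and JSON field names are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "my_table")) // illustrative names
  .load()

// Describe the shape of the JSON stored in the jsonData text column...
val jsonSchema = new StructType()
  .add("device", StringType)
  .add("reading", DoubleType)

// ...and turn it into ordinary, queryable columns.
val parsed = df
  .withColumn("json", from_json($"jsonData", jsonSchema))
  .select($"key", $"json.device", $"json.reading")
```

After the select, the JSON fields can be filtered and aggregated like any other DataFrame column.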
12 votes · 5 answers

How to fix java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List to field type scala.collection.Seq?

This error has been the hardest to trace. I am not sure what is going on. I am running a Spark cluster on my local machine, so the entire Spark cluster is under one host, 127.0.0.1, and I run in standalone mode. JavaPairRDD
user1870400
  • 4,540
  • 11
  • 41
  • 87
11 votes · 12 answers

java.lang.NoClassDefFoundError: org/apache/spark/Logging

I'm always getting the following error. Can somebody help me, please? Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging at java.lang.ClassLoader.defineClass1(Native Method) at…
11 votes · 1 answer

How to retrieve Metrics like Output Size and Records Written from Spark UI?

How do I collect these metrics on a console (Spark shell or spark-submit job) right after the task or job is done? We are using Spark to load data from MySQL into Cassandra, and it is quite huge (e.g. ~200 GB and 600M rows). When the task is done, we…
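The numbers shown in the UI can also be captured programmatically with a SparkListener, so they are available on the console as soon as the job finishes. A sketch, assuming Spark 2.x (the onTaskEnd hook and outputMetrics accessors shown here; exact field shapes vary across versions) and an existing SparkContext sc:

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Accumulate output metrics across every finished task.
class OutputMetricsListener extends SparkListener {
  val bytesWritten = new AtomicLong(0)
  val recordsWritten = new AtomicLong(0)

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val m = taskEnd.taskMetrics
    if (m != null) {
      bytesWritten.addAndGet(m.outputMetrics.bytesWritten)
      recordsWritten.addAndGet(m.outputMetrics.recordsWritten)
    }
  }
}

val listener = new OutputMetricsListener
sc.addSparkListener(listener)
// ... run the MySQL -> Cassandra load here ...
println(s"written: ${listener.bytesWritten.get} bytes, " +
        s"${listener.recordsWritten.get} records")
```

Registering the listener before the job starts is important; tasks that finish earlier are not reported retroactively.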
11 votes · 2 answers

scala.ScalaReflectionException: is not a term

I have the following piece of code in Spark: rdd .map(processFunction(_)) .saveToCassandra("keyspace", "tableName") Where def processFunction(src: String): Seq[Any] = src match { case "a" => List(A("a", 123112, "b"), A("b", 142342, "c")) …
Paulo • 575 • 2 • 7 • 19
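The usual cause of this reflection error is the Seq[Any] return type: the connector derives a row writer from the element type at compile time, and Any defeats that. A sketch of the fix, returning a concrete case class instead (the rdd, field names, and values are illustrative, taken from the question's shape):

```scala
import com.datastax.spark.connector._

// A concrete element type keeps the type information the connector needs.
case class A(name: String, id: Long, label: String)

def processFunction(src: String): Seq[A] = src match {
  case "a" => Seq(A("a", 123112L, "b"), A("b", 142342L, "c"))
  case _   => Seq.empty
}

// flatMap because each input may expand to several rows.
rdd
  .flatMap(processFunction)
  .saveToCassandra("keyspace", "tableName")
```

Tuples of concrete types work as well; the point is that no Any may appear in the element type that reaches saveToCassandra.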
10 votes · 2 answers

Datastax Cassandra Driver throwing CodecNotFoundException

The exact exception is as follows: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [varchar <-> java.math.BigDecimal]. These are the versions of software I am using: Spark 1.5…
10 votes · 3 answers

Guava version while using spark-shell

I'm trying to use the spark-cassandra-connector via spark-shell on Dataproc, however I am unable to connect to my cluster. It appears that there is a version mismatch, since the classpath includes a much older Guava version from somewhere else,…
9 votes · 2 answers

Reading from Cassandra using Spark Streaming

I have a problem when I use Spark Streaming to read from Cassandra: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md#reading-from-cassandra-from-the-streamingcontext As in the link above, I use val rdd =…
Yao Yu • 133 • 2 • 8
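The pattern from the linked streaming guide boils down to building a Cassandra-backed RDD from the StreamingContext. A sketch, assuming spark-cassandra-connector 1.x with its streaming package on the classpath and an existing SparkContext sc; keyspace, table, and the where clause are illustrative (the where clause must target a partition/clustering key or an indexed column):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._

val ssc = new StreamingContext(sc, Seconds(10))

// ssc.cassandraTable builds an RDD bound to the streaming context;
// select and where push the projection and filter down to Cassandra.
val rdd = ssc.cassandraTable("streaming_test", "key_value")
  .select("key", "value")
  .where("key = ?", "some-key")
```

To re-read the table on every batch, the lookup is typically performed inside a transform or foreachRDD on the input DStream rather than once at setup time.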
8 votes · 1 answer

How to setup Spark with a multi node Cassandra cluster?

First of all, I am not using DSE Cassandra. I am building this on my own and using Microsoft Azure to host the servers. I have a 2-node Cassandra cluster, and I've managed to set up Spark on a single node, but I couldn't find any online resources…
RoyaumeIX • 1,767 • 3 • 11 • 31
8 votes · 2 answers

How to write streaming Dataset to Cassandra?

So I have a Python stream-sourced DataFrame df that has all the data I want to place into a Cassandra table with the spark-cassandra-connector. I've tried doing this in two ways: df.write \ .format("org.apache.spark.sql.cassandra") \ …
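On Spark 2.4+ the batch Cassandra sink can be reused from Structured Streaming through foreachBatch; earlier versions need a custom ForeachWriter. A Scala sketch of the foreachBatch route (the same shape works from PySpark); streamingDf and the keyspace/table names are illustrative assumptions:

```scala
import org.apache.spark.sql.DataFrame

// Each micro-batch is written with the ordinary batch Cassandra sink.
val query = streamingDf.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "my_table"))
      .mode("append")
      .save()
  }
  .start()

query.awaitTermination()
```

The trade-off is that delivery is at-least-once: a retried batch may rewrite rows, which Cassandra upserts tolerate well.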
7 votes · 1 answer

How to push down a limit predicate to Cassandra when using DataFrames?

I have a large Cassandra table. I want to load only 50 rows from it. The following code: val ds = sparkSession.read .format("org.apache.spark.sql.cassandra") .options(Map("table" -> s"$Aggregates", "keyspace" -> s"$KeySpace")) …
addmeaning • 1,238 • 1 • 11 • 34
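Historically the DataFrame source did not push .limit(n) down to Cassandra, so the whole table could still be scanned. The RDD API offers a workaround: CassandraRDD.limit(n) appends a CQL LIMIT to the query issued for each Spark partition. A sketch, assuming an existing SparkContext sc; the keyspace and table names are illustrative:

```scala
import com.datastax.spark.connector._

// limit(50L) caps the rows fetched from Cassandra *per Spark partition*;
// take(50) then trims the combined result to 50 rows on the driver.
val rows = sc.cassandraTable("keyspace", "aggregates")
  .limit(50L)
  .take(50)
```

Note the semantics: the per-partition cap keeps the fetch cheap, but which 50 rows arrive depends on token ordering, so this suits sampling rather than deterministic selection.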
7 votes · 1 answer

saveToCassandra with spark-cassandra connector throws java.lang.ClassCastException

When trying to save data to Cassandra (in Scala), I get the following exception: java.lang.ClassCastException: com.datastax.driver.core.DefaultResultSetFuture cannot be cast to com.google.common.util.concurrent.ListenableFuture. Please note that…
neeraj baji • 211 • 2 • 11
7 votes · 1 answer

Not able to change Authentication in spark-cassandra-connector

I am creating a Spark-Cassandra app (Spark 1.6.0 & spark-cassandra-connector 1.6.0-M1) in which I am asking multiple users to enter their Cassandra properties, such as host, username, password, keyspace, table, and others. To change the above…
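Rather than mutating the shared SparkConf per user, a connector instance can be built per set of credentials and supplied to the read/write calls, which accept a CassandraConnector implicitly. A sketch, assuming connector 1.6.x and an existing SparkContext sc; the host and credentials are illustrative:

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

// Build a dedicated connector from a copy of the conf with this
// user's host and credentials; the shared conf is left untouched.
def connectorFor(host: String, user: String, password: String): CassandraConnector =
  CassandraConnector(sc.getConf
    .set("spark.cassandra.connection.host", host)
    .set("spark.cassandra.auth.username", user)
    .set("spark.cassandra.auth.password", password))

implicit val connector = connectorFor("10.0.0.5", "alice", "secret")

// cassandraTable picks up the implicit connector instead of the default one.
val rdd = sc.cassandraTable("ks", "table")
```

Scoping the implicit per user request keeps concurrent users from clobbering each other's authentication settings.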
6 votes · 0 answers

Getting an exception while reading from and writing to a Cassandra table through Spark

I have these configurations set for Spark, but every time I read from or write to a Cassandra table I get an IOException: .setMaster(sparkIp) .set("spark.cassandra.connection.host", cassandraIp) …