Questions tagged [spark-cassandra-connector]

Connects Apache Spark and Cassandra for clustered queries

Summary

Get lightning-fast cluster computing with Spark and Cassandra. This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.

839 questions
4
votes
1 answer

java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy

I am trying to run spark-shell from DSE 5.0.11. I can successfully create an RDD, but trying to query it yields:
Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at…
4
votes
2 answers

Spark parquet s3 Error : AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: xxxxx, AWS Error Code: null

I am trying to read a Parquet file stored in AWS S3 and am getting the error below. 17/12/19 11:27:40 DEBUG DAGScheduler: ShuffleMapTask finished on 0 17/12/19 11:27:40 DEBUG DAGScheduler: submitStage(ResultStage 2) 17/12/19 11:27:40 DEBUG…
dks551
4
votes
1 answer

Spark - sortWithinPartitions over sort

Below is the sample dataset representing the employees' in_date and out_date. I have to obtain the last in_time of all employees. Spark is running on a 4-node standalone cluster. Initial Dataset: EmployeeID-----in_date-----out_date 1111111 …
4
votes
0 answers

Heavy writing in cassandra cluster causes a node to fail

I have to read a massive number of table entries that I want to write to another table. Therefore I wrote a Java and Scala program that uses an RDD to scan the source table and write each entry to the target table. The program is submitted to a spark…
João Matos
4
votes
0 answers

Is there an alternative to joinWithCassandraTable for DataFrames in Spark (Scala) when retrieving data from only certain Cassandra partitions?

When extracting a small number of partitions from a large C* table using RDDs, we can use this: val rdd = … // rdd including partition data val data = rdd.repartitionByCassandraReplica(keyspace, tableName) .joinWithCassandraTable(keyspace,…
4
votes
0 answers

bulk insert into Cassandra UDT using spark

I am using Spark to bulk insert Facebook data for analysis. I have comment as a UDT in Cassandra, and a table fbpost that has a set of comments as a column. Below is the schema: CREATE TYPE analytics.comment ( commentid text, commenttext…
Count
4
votes
1 answer

Spark cassandra : join table with condition on the query based on attribute from the primary RDD ("where tableA.myValue > tableB.myOtherValue")

Is there a way to join two tables with a condition on columns from both tables? Example: case class TableA(pkA: Int, valueA: Int) case class TableB(pkB: Int, valueB: Int) val rddA = sc.cassandraTable[TableA]("ks",…
Gridou
4
votes
2 answers

How to use java.time.LocalDate in Cassandra query from Spark?

We have a table in Cassandra with a column start_time of type date. When we execute the following code: val resultRDD = inputRDD.joinWithCassandraTable(KEY_SPACE,TABLE) .where("start_time = ?", java.time.LocalDate.now) We get the following…
4
votes
2 answers

Spark-cassandra-connector: toArray does not work

I am using the spark-cassandra-connector with Scala and I want to read data from Cassandra and display it via the method toArray. However, I get an error message that it is not a member of the class, even though it is listed in the API. Could somebody help…
Andi Maier
4
votes
1 answer

How does Spark select a Cassandra node for reads?

I have a Cassandra cluster with N nodes on N machines. I also have a Spark worker on every machine. For reading from Cassandra I'm using the DataStax spark-cassandra connector. When I set up the workers (standalone mode) I only specify the master host for them. In…
Cortwave
4
votes
1 answer

Spark + Cassandra on EMR LinkageError

I have Spark 1.6 deployed on EMR 4.4.0. I am connecting to DataStax Cassandra 2.2.5 deployed on EC2. The connection works for saving data into Cassandra using spark-connector 1.4.2_s2.10 (since it has Guava 14). However, reading data from Cassandra fails…
4
votes
4 answers

Error Running Cassandra from Spark in Java - NoClassDefFoundError at org.apache.spark.sql.catalyst

I am using Cassandra 3.0.3, Spark 1.6.0 and trying to run by combining code from the old documentation in http://www.datastax.com/dev/blog/accessing-cassandra-from-spark-in-java and the new one in…
M.R. Murazza
4
votes
1 answer

How to convert CassandraRow into Row (Apache Spark)?

I am trying to create a DataFrame from RDD[CassandraRow], but I can't, because createDataFrame(RDD[Row], schema: StructType) needs RDD[Row], not RDD[CassandraRow]. How can I achieve this? Also, as per the answer in this question How to convert rdd…
4
votes
1 answer

Getting Exception java.util.NoSuchElementException: key not found: 'text' in spark-cassandra-connector

I want to save data from Spark RDDs to a Cassandra table. I am using the spark-cassandra-connector for Java from https://github.com/datastax/spark-cassandra-connector Code to save as per the docs: rddJavaFunctions.writerBuilder("populartweets",…
4
votes
0 answers

PySpark Cassandra Connector efficiently querying across partition keys

I'm faced with the following problem using PySpark and dataframes with the cassandra-connector. My Cassandra data lake consists of metric measurements across (network) devices, and the entries are of type (device,interface,metric,time,value). My…
DrNik