Questions tagged [spark-cassandra-connector]

Connects Apache Spark and Cassandra for clustered queries

Summary

Get lightning-fast cluster computing with Spark and Cassandra. This library lets you expose Cassandra tables as Spark RDDs, write Spark RDDs to Cassandra tables, and execute arbitrary CQL queries in your Spark applications.

Links

GitHub site
User group: another place to ask questions & get support
DataStax overview and tutorial
DZone tutorial

839 questions

1 answer

java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy

I am trying to run spark-shell from DSE 5.0.11. I can successfully create an RDD, but trying to query it yields:

Caused by: java.io.IOException: java.lang.reflect.InvocationTargetException
        at…

asked Feb 01 '18 at 22:04

Mark Bidewell

2 answers

Spark Parquet S3 error: AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: xxxxx, AWS Error Code: null

I am trying to read a Parquet file stored in AWS S3 and am getting the error below.

17/12/19 11:27:40 DEBUG DAGScheduler: ShuffleMapTask finished on 0
17/12/19 11:27:40 DEBUG DAGScheduler: submitStage(ResultStage 2)
17/12/19 11:27:40 DEBUG…

apache-spark amazon-s3 parquet spark-cassandra-connector

asked Dec 19 '17 at 11:59

dks551

1 answer

Spark: sortWithinPartitions over sort

Below is a sample dataset representing employees' in_date and out_date. I need to obtain the last in_time of every employee. Spark is running on a 4-node standalone cluster. Initial dataset:

EmployeeID-----in_date-----out_date
1111111 …

apache-spark apache-spark-sql spark-cassandra-connector apache-spark-dataset

asked Nov 30 '17 at 17:19

ThinkTank0790

0 answers

Heavy writes to a Cassandra cluster cause a node to fail

I need to read a massive number of entries from one table and write them to another table. I therefore wrote a Java + Scala program that uses an RDD to scan the source table and write each entry to the target table. The program is submitted to a spark…

apache-spark cassandra spark-cassandra-connector

asked Oct 08 '17 at 16:05

João Matos

0 answers

Is there an alternative to joinWithCassandraTable for DataFrames in Spark (Scala) when retrieving data from only certain Cassandra partitions?

When extracting a small number of partitions from a large C* table using RDDs, we can use this:

    val rdd = … // rdd including partition data
    val data = rdd.repartitionByCassandraReplica(keyspace, tableName)
      .joinWithCassandraTable(keyspace,…

scala apache-spark cassandra spark-cassandra-connector

asked Apr 21 '17 at 21:48

JackONeill

0 answers

Bulk insert into a Cassandra UDT using Spark

I am using Spark to bulk insert Facebook data for analysis. I have comment as a UDT in Cassandra, and a table fbpost that has a set of comments as a column. Below is the schema:

    CREATE TYPE analytics.comment (
      commentid text,
      commenttext…

java apache-spark cassandra spark-cassandra-connector

asked Jan 26 '17 at 12:22

Count

1 answer

Spark cassandra : join table with condition on the query based on attribute from the primary RDD ("where tableA.myValue > tableB.myOtherValue")

Is there a way to join two tables while adding a condition on columns across the two tables? Example:

    case class TableA(pkA: Int, valueA: Int)
    case class TableB(pkB: Int, valueB: Int)
    val rddA = sc.cassandraTable[TableA]("ks",…

apache-spark cassandra spark-cassandra-connector

asked Nov 21 '16 at 10:37

Gridou

2 answers

How to use java.time.LocalDate in Cassandra query from Spark?

We have a table in Cassandra with a column start_time of type date. When we execute the following code:

    val resultRDD = inputRDD.joinWithCassandraTable(KEY_SPACE, TABLE)
      .where("start_time = ?", java.time.LocalDate.now)

we get the following…

apache-spark cassandra apache-spark-sql spark-cassandra-connector

asked Oct 13 '16 at 12:54

Marcin Armatys

2 answers

Spark-cassandra-connector: toArray does not work

I am using the spark-cassandra-connector with Scala, and I want to read data from Cassandra and display it via the toArray method. However, I get an error message saying that it is not a member of the class, even though it is listed in the API. Could somebody help…

scala spark-cassandra-connector

asked Sep 30 '16 at 08:34

Andi Maier

1 answer

How does Spark select a Cassandra node for reads?

I have a Cassandra cluster with N nodes on N machines. I also have a Spark worker on every machine. For reading from Cassandra I'm using the DataStax spark-cassandra-connector. When I set up the workers (standalone mode), I only specify the master host for them. In…

apache-spark cassandra spark-cassandra-connector

asked Apr 26 '16 at 17:28

Cortwave

1 answer

Spark + Cassandra on EMR LinkageError

I have Spark 1.6 deployed on EMR 4.4.0. I am connecting to DataStax Cassandra 2.2.5 deployed on EC2. The connection works for saving data into Cassandra using spark-connector 1.4.2_s2.10 (since it has Guava 14). However, reading data from Cassandra fails…

apache-spark cassandra datastax spark-cassandra-connector datastax-startup

asked Mar 29 '16 at 00:13

lazywiz

4 answers

Error Running Cassandra from Spark in Java - NoClassDefFoundError at org.apache.spark.sql.catalyst

I am using Cassandra 3.0.3 and Spark 1.6.0, and I am trying to run code that combines the old documentation at http://www.datastax.com/dev/blog/accessing-cassandra-from-spark-in-java and the new one in…

java spark-cassandra-connector

asked Mar 04 '16 at 06:54

M.R. Murazza

1 answer

How to convert CassandraRow into Row (Apache Spark)?

I am trying to create a DataFrame from an RDD[CassandraRow], but I can't, because createDataFrame(RDD[Row], schema: StructType) needs an RDD[Row], not an RDD[CassandraRow]. How can I achieve this? Also, as per the answer to this question, How to convert rdd…

apache-spark cassandra spark-cassandra-connector

asked Feb 01 '16 at 05:29

Parth Vishvajit

1 answer

Getting Exception java.util.NoSuchElementException: key not found: 'text' in spark-cassandra-connector

I want to save data from Spark RDDs to a Cassandra table. I am using the spark-cassandra-connector for Java from https://github.com/datastax/spark-cassandra-connector. Code to save, as per the docs:

    rddJavaFunctions.writerBuilder("populartweets",…

java cassandra spark-streaming twitter-streaming-api spark-cassandra-connector

asked Dec 16 '15 at 04:54

hard coder

0 answers

PySpark Cassandra Connector efficiently querying across partition keys

I'm facing the following problem using PySpark and DataFrames with the Cassandra connector. My Cassandra data lake consists of metric measurements across (network) devices, and the entries are of type (device, interface, metric, time, value). My…

apache-spark pyspark spark-cassandra-connector

asked Dec 16 '15 at 02:17

DrNik