SparkContext and SparkSession : How to get the "parallelizePairs()"?

Question

I'm a newbie on Spark and need to parallelizePairs() (working on Java).

First, I've started my driver with:

SparkSession spark = SparkSession
        .builder()
        .appName("My App")
        .config("driver", "org.postgresql.Driver")
        .getOrCreate();

But spark don't have the function I need. Just parallelize() thru spark.sparkContext()

Now I'm tempted to add

SparkConf sparkConf = new SparkConf();
sparkConf.setAppName("My App");
JavaSparkContext context = new JavaSparkContext(sparkConf);

This way, context have the function I need but I'm very confusing here.

First, I never needed JavaSparkContext because I'm running using spark-submit and setting the master address there.

Second, why spark.sparkContext() is not the same of JavaSparkContext and how to get it using the SparkSession?

If I'm passing the master in command line, must I set sparkConf.setMaster( '<master-address-again>' )?

I already read this: How to create SparkSession from existing SparkContext and undesrtood the problem but I realy need the builder way because I need to pass the .config("driver", "org.postgresql.Driver") to it.

Please some light here...

EDIT

    Dataset<Row> graphDatabaseTable = spark.read()
            .format("jdbc")
            .option("url", "jdbc:postgresql://192.168.25.103:5432/graphx")
            .option("dbtable", "public.select_graphs")
            .option("user", "postgres")
            .option("password", "admin")
            .option("driver", "org.postgresql.Driver")
            .load();        
    SQLContext graphDatabaseContext = graphDatabaseTable.sqlContext();
    graphDatabaseTable.createOrReplaceTempView("select_graphs");

    String sql = "select * from select_graphs where parameter_id = " + indexParameter;          
    Dataset<Row> graphs = graphDatabaseContext.sql(sql);

score 3 · Accepted Answer · answered Sep 10 '17 at 09:43

Initialize JavaSparkContext using existing SparkContext:

JavaSparkContext context = JavaSparkContext(spark.sparkContext());

why spark.sparkContext() is not the same of JavaSparkContext and how to get it using the SparkSession

In short, because Scala is much richer language than Java and JavaSparkContext is a convenience wrapper, designed to get around some Java limitations. At the same time RDD API is moved aside.

If I'm passing the master in command line, must I set sparkConf.setMaster( '' )

No. Precedence is:

configuration files
spark-submit options
SparkConf and SparkContext options.

but I realy need the builder way because I need to pass the .config("driver", "org.postgresql.Driver") to it.

It doesn't look right. driver option is used by DataFrameWriter and DataFrameReader. It should be passed there.

About the `driver` option : Yes, I made a mistake. See my edit and please give me your opinion. — Magno C, Sep 11 '17 at 12:21

score 0 · Answer 2 · edited Jun 22 '18 at 02:39

0

sparkContext = SparkSession.sparkContext();

SparkSession spark = SparkSession
    .builder()
    .appName("My App")
    .config("driver", "org.postgresql.Driver")
    .getOrCreate(); 
sparkContext = spark.sparkContext;

edited Jun 22 '18 at 02:39

Stephen Rauch

40,722
30
82
105

answered Jun 22 '18 at 02:14

gong situ

39
2

SparkContext and SparkSession : How to get the "parallelizePairs()"?

2 Answers2