31

I have a Spark application which uses the new Spark 2.0 API with SparkSession. I am building this application on top of another application which uses SparkContext. I would like to pass the SparkContext to my application and initialize the SparkSession from the existing SparkContext.

However, I could not find a way to do that. I found that the SparkSession constructor that takes a SparkContext is private, so I can't initialize it that way, and the builder does not offer any setSparkContext method. Do you think there is some workaround?
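
For illustration, this is roughly the shape of what I'd like to write (just a sketch of my setup, not working code):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// `sc` is the SparkContext created by the host application and handed to mine
def initSession(sc: SparkContext): SparkSession = {
  // what I would like to do, but it does not compile because the constructor is private[sql]:
  // new SparkSession(sc)
  ???  // and SparkSession.builder has no setSparkContext(...) method
}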

Stefan Repcek
  • I'm not very sure, but to my knowledge there is no workaround – Balaji Reddy Mar 21 '17 at 18:22
  • yea :( so if there is no workaround there are two options left: using SparkContext in my application, or adding support for SparkSession to the application I am building on top of (it is spark-jobserver; I am using their branch spark-2.0-preview, however they still use SparkContext) – Stefan Repcek Mar 21 '17 at 18:30
  • You only need to add support for an external SparkContext to the application and access the session.sparkContext. Shouldn't be a big issue. – matfax Mar 21 '17 at 22:21
  • can you explain more what you mean by "add support for an external SparkContext"? I read that you should use just one instance of SparkContext – Stefan Repcek Mar 21 '17 at 23:31
  • I suppose the application creates its own SparkContext. Since you only want one SparkContext (for good reasons), you need to add a parameter to the application's constructor or builder that accepts the external SparkContext that you already created using the session builder. – matfax Mar 22 '17 at 01:10
  • the problem is the application I am using (spark-jobserver) doesn't allow passing my SparkContext; it creates its own – Stefan Repcek Mar 22 '17 at 11:03
  • That's why you need to edit the code of spark-jobserver (the application) not to create its own. Fork it, make your modifications, and publish it (e.g., with Jitpack). As Balaji said, there is no workaround. The only alternative is to edit Spark itself, which I wouldn't recommend. – matfax Mar 22 '17 at 12:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/138731/discussion-between-matthias-fax-and-stevesk). – matfax Mar 22 '17 at 12:27

6 Answers

22

Deriving the SparkSession object from a SparkContext or even a SparkConf is easy; it's just that you might find the API slightly convoluted. Here's an example (I'm using Spark 2.4, but this should work in the older 2.x releases as well):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

// If you already have a SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()

// Another example which builds a SparkConf, SparkContext and SparkSession from scratch
val conf = new SparkConf().setAppName("spark-test").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
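
If I'm not mistaken, getOrCreate() goes through SparkContext.getOrCreate under the hood, so the resulting session wraps the SparkContext that is already running instead of starting a second one. A quick sanity check (using the `sc` and `spark` from above):

// the session's underlying context should be the very same instance
assert(spark.sparkContext eq sc)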

Hope that helps!

Rishabh
21

As noted in the answer above, you cannot create the SparkSession directly because its constructor is private. Instead you can create a SQLContext using the SparkContext, and later get the SparkSession from the SQLContext, like this:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sparkContext)
val spark = sqlContext.sparkSession

Hope this helps

philantrovert
Partha Sarathy
    When I do this in Spark 2.2, it says SQLContext is deprecated and to use SparkSession.Builder() instead – covfefe Mar 14 '18 at 22:35
  • Correct. In Spark 2, SQLContext is deprecated because everything is consolidated to the SparkSession, which is why you'd just use `SparkSession.sql()` to execute your Spark SQL, `SparkSession.sparkContext` to get the context if you need it, etc. If you're looking for Hive support (previously HiveContext), you do something like `val spark = SparkSession.builder().enableHiveSupport()` – Anthony May 22 '18 at 19:23
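
To flesh out the comment above: a Spark 2.x sketch that skips the deprecated SQLContext entirely, assuming you still start from an existing SparkContext `sc` (enableHiveSupport only if you need what HiveContext used to provide):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config(sc.getConf)     // carry over the existing context's configuration
  .enableHiveSupport()    // optional, replaces the old HiveContext
  .getOrCreate()

spark.sql("SHOW TABLES").show()   // what you would previously have done with sqlContext.sql(...)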
14

Apparently there is no way to initialize a SparkSession from an existing SparkContext.

Stefan Repcek
6
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Create the JavaSparkContext the application runs on
public JavaSparkContext getSparkContext()
{
        SparkConf conf = new SparkConf()
                    .setAppName("appName")
                    .setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        return jsc;
}


// Build a SparkSession directly on top of the underlying SparkContext
public SparkSession getSparkSession()
{
        SparkSession sparkSession = new SparkSession(getSparkContext().sc());
        return sparkSession;
}


You can also try using the builder:

public SparkSession getSparkSession()
{
        SparkConf conf = new SparkConf()
                        .setAppName("appName")
                        .setMaster("local");

        SparkSession sparkSession = SparkSession
                                    .builder()
                                    .config(conf)
                                    .getOrCreate();
        return sparkSession;
}
Mostwanted Mani
  • 1
    in your second method you don't use any SparkContext; in Scala I can't construct a SparkSession like in your getSparkSession() – Stefan Repcek May 10 '17 at 20:30
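
As far as I can tell, the `new SparkSession(...)` call in getSparkSession() only compiles from Java because Scala's private[sql] restriction is not enforced in the generated bytecode; from Scala you have to go through the builder instead, essentially what the next answer shows (sketch, with `jsc` being the JavaSparkContext from getSparkContext() above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .config(jsc.sc.getConf)  // reuse the existing context's configuration
  .getOrCreate()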
4
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
lostsoul29
1

You would have noticed that we are using SparkSession and SparkContext, and this is not an error. Let's revisit the annals of Spark history for a perspective. It is important to understand where we came from, as you will hear about these connection objects for some time to come.

Prior to Spark 2.0.0, the three main connection objects were SparkContext, SQLContext, and HiveContext. The SparkContext object was the connection to the Spark execution environment and was used to create RDDs and other low-level resources, SQLContext worked with Spark SQL on top of the SparkContext, and HiveContext interacted with Hive stores.

Spark 2.0.0 introduced Datasets/DataFrames as the main distributed data abstraction interface and the SparkSession object as the entry point to a Spark execution environment. Appropriately, the SparkSession object is found in the namespace org.apache.spark.sql.SparkSession (Scala) or pyspark.sql.SparkSession (Python). A few points to note are as follows:

In Scala and Java, Datasets form the main data abstraction as typed data; however, for Python and R (which do not have compile time type checking), the data...

https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781785889271/4/ch04lvl1sec31/sparksession-versus-sparkcontext
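
Tying the excerpt back to the original question, a minimal sketch of how the old and new entry points relate (plain stock Spark APIs, nothing specific to the book):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf().setAppName("entry-points-demo").setMaster("local[*]")

// Spark 1.x style, for reference: new SparkContext(conf), new SQLContext(sc), new HiveContext(sc)

// Spark 2.x style: one SparkSession, with the older objects reachable from it
val spark = SparkSession.builder.config(conf).getOrCreate()
val sc = spark.sparkContext      // the SparkContext lives inside the session
val sqlCtx = spark.sqlContext    // kept for backwards compatibility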