You would have noticed that we are using SparkSession and SparkContext, and this is not an error. Let's revisit the annals of Spark history for a perspective. It is important to understand where we came from, as you will hear about these connection objects for some time to come.
Prior to Spark 2.0.0, the three main connection objects were SparkContext, SqlContext, and HiveContext. The SparkContext object was the connection to a Spark execution environment and created RDDs and others, SQLContext worked with SparkSQL in the background of SparkContext, and HiveContext interacted with the Hive stores.
Spark 2.0.0 introduced Datasets/DataFrames as the main distributed data abstraction interface and the SparkSession object as the entry point to a Spark execution environment. Appropriately, the SparkSession object is found in the namespace, org.apache.spark.sql.SparkSession (Scala), or pyspark.sql.sparkSession. A few points to note are as follows:
In Scala and Java, Datasets form the main data abstraction as typed data; however, for Python and R (which do not have compile time type checking), the data...
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781785889271/4/ch04lvl1sec31/sparksession-versus-sparkcontext