
I have two dataframes aaa_01 and aaa_02 in Apache Spark 2.1.0.

I perform an inner join on these two dataframes, selecting a few columns from each to appear in the output.

The join works fine, but the output dataframe keeps the column names from the input dataframes. This is where I am stuck: I need the output dataframe to have new column names instead.

Sample code is given below for reference:

DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select("a.col1","a.col2","b.col4")

I am getting the output dataframe with the column names "col1, col2, col4". I tried to modify the code as below, but in vain:

DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select("a.col1","a.col2","b.col4" as "New_Col")

Any help is appreciated. Thanks in advance.

Edited

I searched and found the similar posts listed below, but I do not see an answer to my question in them.

Updating a dataframe column in spark

Renaming Column names of a Data frame in spark scala

The answers in this post: Spark Dataframe distinguish columns with duplicated name are not relevant to me. They relate more to pyspark than to Scala, and they explain how to rename all the columns of a dataframe, whereas my requirement is to rename only one or a few columns.

JKC
  • Possible duplicate of [Spark Dataframe distinguish columns with duplicated name](https://stackoverflow.com/questions/33778664/spark-dataframe-distinguish-columns-with-duplicated-name) – Davis Broda Aug 25 '17 at 16:13

2 Answers


You want to rename columns of a dataset; the fact that your dataset comes from a join does not change anything. You can try any example from this answer, for instance:

DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner")
    .select("a.col1","a.col2","b.col4")
    .withColumnRenamed("col4","New_col")
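For anyone who wants to try this end to end, here is a minimal sketch of the same approach. The local `SparkSession`, the object and method names, and the sample rows are assumptions made up for illustration; only the column names come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameAfterJoin {
  // Joins on primary_col, keeps three columns, then renames col4.
  // withColumnRenamed leaves the other columns untouched.
  def joinAndRename(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.alias("a")
      .join(df2.alias("b"), df1("primary_col") === df2("primary_col"), "inner")
      .select("a.col1", "a.col2", "b.col4")
      .withColumnRenamed("col4", "New_col")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the question)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rename-after-join")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df1 = Seq((1, "x1", "y1"), (2, "x2", "y2")).toDF("primary_col", "col1", "col2")
    val df2 = Seq((1, "z1"), (2, "z2")).toDF("primary_col", "col4")

    // Print the resulting column names
    println(joinAndRename(df1, df2).columns.mkString(", "))
    spark.stop()
  }
}
```

Note that after the `select`, the column is addressed as `col4` (without the `b.` prefix), which is why `withColumnRenamed("col4", ...)` is used here.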
Fabich

You can use .as to alias the columns:

import sqlContext.implicits._
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select($"a.col1".as("first"),$"a.col2".as("second"),$"b.col4".as("third"))

Or you can use .alias:

import sqlContext.implicits._
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select($"a.col1".alias("first"),$"a.col2".alias("second"),$"b.col4".alias("third"))

If you only want to rename one column, you can do:

import sqlContext.implicits._
DF1.alias("a").join(DF2.alias("b"),DF1("primary_col") === DF2("primary_col"), "inner").select($"a.col1", $"a.col2", $"b.col4".alias("third"))
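As a side note on why the attempt in the question fails: `"b.col4" as "New_Col"` calls `as` on a plain `String`, which has no such method, and `select` accepts either all strings or all `Column` values, not a mix. A minimal sketch of the fix that does not rely on the `$` syntax, using `col` from `org.apache.spark.sql.functions`, could look like this. The object and method names, the local session, and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object AliasInSelect {
  // Every argument to select is a Column, so .as can rename just one of them.
  def joinAndAlias(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.alias("a")
      .join(df2.alias("b"), col("a.primary_col") === col("b.primary_col"), "inner")
      .select(col("a.col1"), col("a.col2"), col("b.col4").as("New_Col"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the question)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("alias-in-select")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df1 = Seq((1, "x1", "y1")).toDF("primary_col", "col1", "col2")
    val df2 = Seq((1, "z1")).toDF("primary_col", "col4")

    // Print the resulting column names
    println(joinAndAlias(df1, df2).columns.mkString(", "))
    spark.stop()
  }
}
```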
Ramesh Maharjan