I want to rename one column of a DataFrame in PySpark. The column is currently named rate%year, and I want to rename it to rateyear.

Columns can be renamed either at the DataFrame level or at the table level after registering the DataFrame as a table, but at the table level the "%" causes problems, so I want to rename the column at the DataFrame level itself.

I tried this: data.selectExpr("rate%year as rateyear")

but I get this error: pyspark.sql.utils.AnalysisException: u"cannot resolve '`rate`' given input columns

Thanks.

Mayank Porwal
andy
  • Possible duplicate of [How to change dataframe column names in pyspark?](https://stackoverflow.com/questions/34077353/how-to-change-dataframe-column-names-in-pyspark) – Nordle Oct 23 '18 at 05:49
  • @Matt B, I went through the links and tried this- data.selectExpr("rate%year as rateyear") but getting this error pyspark.sql.utils.AnalysisException: u"cannot resolve '`rate`' given input columns. – andy Oct 23 '18 at 06:41

4 Answers

2

Try this:

sqlContext.registerDataFrameAsTable(data, "myTable")
# Backticks quote the column name; unquoted, % is parsed as the modulo operator
data = sqlContext.sql("SELECT `rate%year` AS rateyear FROM myTable")
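
For context, here is a rough sketch of why quoting matters. In a Spark SQL expression an unquoted rate%year reads as "rate modulo year", which would explain an error about a column called rate. The helper names below (quote_identifier, rename_expr, select_renamed) are made up for illustration and are not part of PySpark; the only DataFrame calls assumed are df.columns and df.selectExpr.

```python
def quote_identifier(name):
    """Wrap a column name in backticks for use in a Spark SQL expression.

    An embedded backtick is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def rename_expr(old, new):
    """Build an expression such as `rate%year` AS `rateyear`."""
    return "{} AS {}".format(quote_identifier(old), quote_identifier(new))


def select_renamed(df, old, new):
    """Hypothetical helper: select every column, renaming only `old`."""
    exprs = [
        rename_expr(c, new) if c == old else quote_identifier(c)
        for c in df.columns
    ]
    return df.selectExpr(*exprs)
```

With the DataFrame from the question, select_renamed(data, "rate%year", "rateyear") should give the same result as data.selectExpr("`rate%year` as rateyear") while keeping all the other columns.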
Mayank Porwal
2

I wrote an easy and fast function for you to remove % from column names. Enjoy! :)

def rename_cols(rename_df):
    for column in rename_df.columns:
        new_column = column.replace('%','')
        rename_df = rename_df.withColumnRenamed(column, new_column)
    return rename_df
Zilong Z
0

A possible way to rename at the DataFrame level:

from functools import reduce  # reduce is not a builtin in Python 3

oldColumns = ['rate%year']
newColumns = ["rateyear"]
# Rename each old column to the new name at the same index
df1 = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newColumns[idx]), range(len(oldColumns)), df)

This works fine at the DataFrame level. Any suggestions on how to resolve it at the table level?
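
One way to handle the table level is to rename first and register afterwards, so the table never sees the "%". A minimal sketch, assuming Spark 2.0+ where DataFrame.createOrReplaceTempView is available (on older versions sqlContext.registerDataFrameAsTable would take its place). The function names rename_columns and rename_and_register are hypothetical.

```python
def rename_columns(df, mapping):
    """Rename columns of df according to mapping {old_name: new_name}."""
    for old, new in mapping.items():
        # withColumnRenamed is a no-op if `old` is not a column of df
        df = df.withColumnRenamed(old, new)
    return df


def rename_and_register(df, mapping, table_name):
    """Rename columns, then register the cleaned DataFrame as a temp view."""
    renamed = rename_columns(df, mapping)
    # Assumption: Spark 2.0+ (createOrReplaceTempView)
    renamed.createOrReplaceTempView(table_name)
    return renamed
```

For example, rename_and_register(df, {"rate%year": "rateyear"}, "myTable") should make a plain SELECT rateyear FROM myTable possible without any quoting.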

andy
  • Just rename the column prior to registering the dataframe as a table, e.g. `df = df.withColumnRenamed('rate%year', 'rateyear')` and then `sqlContext.registerDataFrameAsTable(df, "myTable")`. No need for reduce and lambdas. – Davos Nov 21 '19 at 12:05
0

Simple and quick way to alter dataframe column names.

def format_col(df):    
    cols = [col.replace("%", "") for col in df.columns]
    res_df = df.toDF(*cols)
    return res_df
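
One thing to watch with bulk stripping: two different names can collapse into the same one (for example rate%year and rateyear both becoming rateyear). Below is a small sketch that checks for this before calling toDF. The names strip_char and format_col_checked are made up for illustration; only df.columns and df.toDF are assumed from PySpark.

```python
def strip_char(columns, char="%"):
    """Return column names with `char` removed, failing on collisions."""
    cleaned = [c.replace(char, "") for c in columns]
    if len(set(cleaned)) != len(cleaned):
        dupes = sorted({c for c in cleaned if cleaned.count(c) > 1})
        raise ValueError("renaming would create duplicate columns: %s" % dupes)
    return cleaned


def format_col_checked(df, char="%"):
    """Hypothetical variant of format_col that refuses ambiguous renames."""
    return df.toDF(*strip_char(df.columns, char))
```

The name check is plain Python, so it can be tried on a list of column names without a Spark session.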
Jyoti Gupta