pyspark withColumnRenamed, drop functions, u'Reference ambiguous error

Question

I have a function that replaces the column headers of a DataFrame with a new set of headers supplied as a list.

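from functools import reduce  # built in on Python 2, must be imported on Python 3

def updateHeaders(dataFrame, newHeader):
    oldColumns = dataFrame.schema.names
    # Rename the columns one at a time, pairing old and new names by position
    dfNewCol = reduce(lambda df, idx: df.withColumnRenamed(oldColumns[idx], newHeader[idx]), range(len(oldColumns)), dataFrame)
    return dfNewCol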

I get the newHeader list from another function. The first header in the list is named Action. Later I apply a filter function that drops the Action column and returns a new DataFrame:

def willBeInserted(dataFrame):
    insertData = ["I"] # Some rows of 'Action' column include value "I"
    insertDF = dataFrame.filter(dataFrame.Action.isin(insertData)).drop('Action')
    return insertDF

Then I call the functions:

DF1 = updateHeaders(someDF, headerList) #Update the headers
DF2 = willBeInserted(DF1) #Drop 'Action' column and create new DF

The result is the following error:

pyspark.sql.utils.AnalysisException: u'Reference 'Action' is ambiguous, could be: Action#29, Action#221.;"

I tried the approaches suggested in this link and in other similar questions, with no change so far. Any ideas?
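The error message lists two attributes (Action#29 and Action#221) with the same name, which means the DataFrame holds two columns called Action by the time the filter runs. One possible cause, which the question does not confirm, is that the header list itself contains the name twice, perhaps differing only in case, since Spark compares column names case-insensitively by default. A minimal sketch for checking the list before renaming; find_duplicate_headers and the sample header_list are hypothetical and not part of the original code:

```python
from collections import Counter

def find_duplicate_headers(headers, case_sensitive=False):
    """Return the names that occur more than once, in first-seen order."""
    # Spark resolves column names case-insensitively by default,
    # so "Action" and "action" are treated as the same name here.
    keys = [h if case_sensitive else h.lower() for h in headers]
    counts = Counter(keys)
    seen = set()
    duplicates = []
    for name, key in zip(headers, keys):
        if counts[key] > 1 and key not in seen:
            seen.add(key)
            duplicates.append(name)
    return duplicates

# Hypothetical stand-in for the headerList used in the question
header_list = ["Action", "id", "value", "action"]
print(find_duplicate_headers(header_list))  # ['Action']
```

If the check returns an empty list, the duplicate probably enters somewhere else, for example through a join earlier in the pipeline.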

score 1 · Answer 1 · answered Jan 27 '18 at 11:05

Here is some code that renames columns using a helper function (a plain Python function, not a Spark UDF). Hope this helps:

dataDf=spark.createDataFrame(data=[('Alice',4.300,None),('Bob',float('nan'),897)],schema=['name','High','Low'])
dataDf.show()

+-----+----+----+
| name|High| Low|
+-----+----+----+
|Alice| 4.3|null|
|  Bob| NaN| 897|
+-----+----+----+


newColNames=['FistName','newHigh','newLow']

def changeColNames(df,newColNameLst):
    for field,newCol in zip(df.schema.fields,newColNameLst):
        df = df.withColumnRenamed(str(field.name), newCol)
    return df

df2=changeColNames(dataDf,newColNames)
df2.show()

+--------+-------+------+
|FistName|newHigh|newLow|
+--------+-------+------+
|   Alice|    4.3|  null|
|     Bob|    NaN|   897|
+--------+-------+------+

Unfortunately, this also returned the same error as above. – ylcnky, Jan 29 '18 at 10:27
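Since renaming column by column gives the same error, the loop itself may not be the problem. If the new header list turns out to contain repeated names, one option is to make the names unique first and then rename every column by position in a single call. A sketch under that assumption; make_unique is a hypothetical helper, and the commented-out line relies on DataFrame.toDF, which assigns new names positionally:

```python
def make_unique(headers):
    """Append _1, _2, ... to names that repeat (compared case-insensitively)."""
    used = set()
    unique = []
    for name in headers:
        candidate = name
        suffix = 1
        # Keep trying suffixes until the candidate collides with nothing so far
        while candidate.lower() in used:
            candidate = "{}_{}".format(name, suffix)
            suffix += 1
        used.add(candidate.lower())
        unique.append(candidate)
    return unique

# Hypothetical header list with a repeated name
header_list = ["Action", "id", "Action"]
print(make_unique(header_list))  # ['Action', 'id', 'Action_1']

# With a live DataFrame, all columns could then be renamed at once:
# DF1 = someDF.toDF(*make_unique(header_list))
```

With unique names in place, drop('Action') refers to exactly one column.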
