I have a function that replaces the column headers of a DataFrame with a new set of headers supplied as a list:
    def updateHeaders(dataFrame, newHeader):
        oldColumns = dataFrame.schema.names
        dfNewCol = reduce(lambda dataFrame, idx: dataFrame.withColumnRenamed(oldColumns[idx], newHeader[idx]), xrange(len(oldColumns)), dataFrame)
        return dfNewCol
I get the newHeader list from another function. The first header in the list is named Action. Later I apply a filter function that filters on the Action column, drops it, and returns a new DF:
    def willBeInserted(dataFrame):
        insertData = ["I"]  # Some rows of the 'Action' column contain the value "I"
        insertDF = dataFrame.filter(dataFrame.Action.isin(insertData)).drop('Action')
        return insertDF
Then I call both functions:
    DF1 = updateHeaders(someDF, headerList)  # Update the headers
    DF2 = willBeInserted(DF1)  # Filter on 'Action', drop it, and create a new DF
The result is the following error:
    pyspark.sql.utils.AnalysisException: u"Reference 'Action' is ambiguous, could be: Action#29, Action#221.;"
I tried the approaches suggested in this link and in other similar questions, but nothing has changed so far. Any ideas?
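Since the error lists two attributes named Action (Action#29 and Action#221), one thing worth checking is whether the renamed DataFrame ends up with two columns that resolve to the same name. Below is a minimal sketch of such a check. Note that find_duplicate_headers is a hypothetical helper written for this post, not a PySpark function, and that it assumes Spark's default case-insensitive column resolution (spark.sql.caseSensitive left at false), so names differing only in case are treated as a collision.

```python
from collections import defaultdict


def find_duplicate_headers(headers):
    """Group header names that collide when compared case-insensitively.

    Hypothetical helper (not part of PySpark). Returns a dict that maps each
    lower-cased name to the original spellings, but only for names that
    occur more than once.
    """
    groups = defaultdict(list)
    for name in headers:
        # Assumption: Spark resolves column names case-insensitively by default,
        # so "Action" and "ACTION" would refer to the same column.
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Made-up header list for illustration only; not the real headerList.
example_headers = ["Action", "id", "value", "ACTION"]
duplicates = find_duplicate_headers(example_headers)
```

Running the same check on both headerList and someDF.columns before calling updateHeaders should show whether the duplicate name is already present in the inputs or only appears during the chained renames.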