I saw this post and it was somewhat helpful, but I need to rename the columns of a dataframe using a list. The list is long and changes with every dataset I load, so I can't hard-code the new column names.
Ex:
df = sqlContext.read.load("./assets/" + filename,
                          format='com.databricks.spark.csv',
                          header='false',
                          inferSchema='false')
devices = df.first()
metrics = df.take(2)[1]
# Adding the two header rows together as one as a way of later searching through and sorting rows
# The delimiter is "..." since it doesn't occur anywhere in the data, so we don't have to worry about multiple splits
header = [str(devices[i]) +"..."+ str(metrics[i]) for i in range(len(devices))]
df2 = df.toDF(header)
Then of course I get this error:
IllegalArgumentException: u"requirement failed: The number of columns doesn't match.\nOld column names (278):
len(header) is 278, and the dataframe has the same number of columns. So the real question is: how do I rename the columns of a dataframe without hard-coding them, when I have a list of the new names?
I suspect the input shouldn't be an actual list object, but how do I do this without iterating through each column (with selectExpr or alias) and creating several new dataframes (they're immutable), each with one more renamed column? (yuck)
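One likely cause: PySpark declares this method with a varargs signature, `toDF(*cols)`, so a list passed directly arrives as a single argument and Spark compares one new name against 278 old ones. The sketch below shows that difference without a Spark session. `to_df_stub` is a hypothetical stand-in that only mimics the signature, and the column names are made up. With a real dataframe the equivalent call would presumably be `df.toDF(*header)`.

```python
def to_df_stub(*cols):
    """Hypothetical stand-in: mimics only the varargs signature of toDF(*cols)."""
    # Return the arguments exactly as the function received them.
    return list(cols)


# Made-up combined header names in the "device...metric" style from the question.
header = ["dev1...temp", "dev2...load", "dev3...volt"]

# Passing the list itself: the function receives ONE argument (the whole list).
as_single_arg = to_df_stub(header)

# Unpacking with *: each name is passed as its own positional argument.
unpacked = to_df_stub(*header)

print(len(as_single_arg))  # number of arguments seen when the list is passed as-is
print(len(unpacked))       # number of arguments seen when the list is unpacked

# Assumed equivalent on the real dataframe from the question:
# df2 = df.toDF(*header)
```

If that is the cause, no per-column loop with alias or selectExpr would be needed, since the unpacked call renames every column in one step.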