
How do I assign column names in a generic way? I want col1, col2, … instead of _1, _2, …

+---+---+---+---+---+---+---+---+---+---+---+---+
| _1| _2| _3| _4| _5| _6| _7| _8| _9|_10|_11|_12|
+---+---+---+---+---+---+---+---+---+---+---+---+
|  0|  0|  0|  1|  0|  1|  0|  0|  0|  1|  0|   |
|  0|  0|  0|  1|  0|  1|  0|  0|  0|  1|  0|   |
|  0|  0|  0|  0|  0|  1|  1|  0|  1|  1|  0|   |
|  0|  0|  0|  0|  0|  1|  1|  0|  1|  1|  0|   |
|  0|  0|  0|  0|  0|  1|  1|  0|  1|  1|  0|   |
  • I just answered the similar question - https://stackoverflow.com/questions/63259555/is-there-any-generic-functions-to-assign-column-names-in-pyspark – Som Aug 05 '20 at 08:23
  • that one is not working... it shows an error like this: `df_split=index.select(sf.split(index.binary,"")).rdd.flatMap(lambda X: X).toDF(*["col_{}".format(i) for i in range(1,len(index.columns)+1)])` → `TypeError: toDF() takes from 1 to 3 positional arguments but 10 were given` – Vanitha Gopireddy Aug 05 '20 at 08:30
  • Are you the same user having 2 accounts? – Som Aug 05 '20 at 08:31
  • `df_split=index.select(sf.split(index.binary,""),"binary").rdd.flatMap(lambda X: X).toDF()` – Vanitha Gopireddy Aug 05 '20 at 08:31
  • Why is this required? `.rdd.flatMap(lambda X: X).` – Som Aug 05 '20 at 08:32
  • | _1| _2| _3| _4| _5| _6| _7| _8| _9|_10|_11|_12| +---+---+---+---+---+---+---+---+---+---+---+---+ | 0| 0| 0| 1| 0| 1| 0| 0| 0| 1| 0| | – Vanitha Gopireddy Aug 05 '20 at 08:34
  • Does this answer your question? [How to change dataframe column names in pyspark?](https://stackoverflow.com/questions/34077353/how-to-change-dataframe-column-names-in-pyspark) – Powers Aug 05 '20 at 13:52

1 Answer


Assuming df is your DataFrame, you can just rename each column:

# Turn _1, _2, … into col1, col2, … by replacing the leading underscore
for col in df.columns:
    df = df.withColumnRenamed(col, col.replace("_", "col"))
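Alternatively, a minimal sketch (assuming `df` is the 12-column DataFrame from the question): build the generic names first, then rename everything in one call with `DataFrame.toDF`, which accepts the new names as varargs:

```python
# Build generic names col1..colN from the column count (12 in the question).
new_names = ["col{}".format(i) for i in range(1, 13)]

# With PySpark, DataFrame.toDF(*cols) renames every column in one pass:
# df = df.toDF(*new_names)
```

Note on the `TypeError` from the comments: `RDD.toDF()` only takes a schema (and an optional sample ratio), so passing many names as positional arguments fails there; call `.toDF()` on the RDD first, then `DataFrame.toDF(*new_names)` on the resulting DataFrame.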
Steven