
How do I assign column names in a generic way? I want col1, col2, … instead of _1, _2, …

+---+---+---+---+---+---+---+---+---+---+---+---+
| _1| _2| _3| _4| _5| _6| _7| _8| _9|_10|_11|_12|
+---+---+---+---+---+---+---+---+---+---+---+---+
|  0|  0|  0|  1|  0|  1|  0|  0|  0|  1|  0|   |
|  0|  0|  0|  1|  0|  1|  0|  0|  0|  1|  0|   |
|  0|  0|  0|  0|  0|  1|  1|  0|  1|  1|  0|   |
|  0|  0|  0|  0|  0|  1|  1|  0|  1|  1|  0|   |
|  0|  0|  0|  0|  0|  1|  1|  0|  1|  1|  0|   |
  • I just answered the similar question - https://stackoverflow.com/questions/63259555/is-there-any-generic-functions-to-assign-column-names-in-pyspark – Som Aug 05 '20 at 08:23
  • that one is not working... it shows an error like this: `df_split=index.select(sf.split(index.binary,"")).rdd.flatMap(lambda X: X).toDF(*["col_{}".format(i) for i in range(1,len(index.columns)+1)])` → `TypeError: toDF() takes from 1 to 3 positional arguments but 10 were given` – Vanitha Gopireddy Aug 05 '20 at 08:30
  • Are you the same user having 2 accounts? – Som Aug 05 '20 at 08:31
  • `df_split=index.select(sf.split(index.binary,""),"binary").rdd.flatMap(lambda X: X).toDF()` – Vanitha Gopireddy Aug 05 '20 at 08:31
  • Why is this required? `.rdd.flatMap(lambda X: X).` – Som Aug 05 '20 at 08:32
  • | _1| _2| _3| _4| _5| _6| _7| _8| _9|_10|_11|_12| +---+---+---+---+---+---+---+---+---+---+---+---+ | 0| 0| 0| 1| 0| 1| 0| 0| 0| 1| 0| | – Vanitha Gopireddy Aug 05 '20 at 08:34
  • Does this answer your question? [How to change dataframe column names in pyspark?](https://stackoverflow.com/questions/34077353/how-to-change-dataframe-column-names-in-pyspark) – Powers Aug 05 '20 at 13:52

1 Answer


Assuming df is your DataFrame, you can just rename each column:

# Turn _1, _2, … into col1, col2, … by replacing the leading underscore
for col in df.columns:
    df = df.withColumnRenamed(col, col.replace("_", "col"))
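Alternatively, a minimal sketch (assuming `df` is the 12-column DataFrame from the question): build the generic names first, then rename everything in one call with `DataFrame.toDF`, which accepts the new names as varargs:

```python
# Build generic names col1..colN from the column count (12 in the question).
new_names = ["col{}".format(i) for i in range(1, 13)]

# With PySpark, DataFrame.toDF(*cols) renames every column in one pass:
# df = df.toDF(*new_names)
```

Note on the `TypeError` from the comments: `RDD.toDF()` only takes a schema (and an optional sample ratio), so passing many names as positional arguments fails there; call `.toDF()` on the RDD first, then `DataFrame.toDF(*new_names)` on the resulting DataFrame.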
Steven