The documentation doesn't say whether RenameField can rename several columns in one call. I can't get it to work, and chaining multiple DFs one after another isn't very clean.

E.g.

    df1 = RenameField.apply(frame = df, old_name = "col1", new_name = "COL1")
    df2 = RenameField.apply(frame = df1, old_name = "col2", new_name = "COL2")

I tried a few variants and, based on other Glue transforms, thought the following would work:

    df1 = RenameField.apply[(frame = df, old_name = "col1", new_name = "COL1"),
                            (frame = df, old_name = "col2", new_name = "COL2")]
DataDog

2 Answers

You can write clean chained code if you accept a round trip: DynamicFrame -> DataFrame -> DynamicFrame. The DynamicFrame class has conversion methods for this, toDF and fromDF, so you can do the renaming on a PySpark DataFrame and convert the result back afterwards.

The PySpark DataFrame class has several column-renaming methods; see "How to change dataframe column names in pyspark?"
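
A minimal sketch of that round trip, in case it helps. The helper names (`rename_columns`, `rename_dynamic_frame`) and the `"renamed"` frame name are my own and not part of Glue; the only library calls assumed are `toDF`, `DynamicFrame.fromDF` and PySpark's `withColumnRenamed`. The `awsglue` import sits inside the function because that module is only available in the Glue runtime.

```python
from functools import reduce


def rename_columns(df, renames):
    """Apply withColumnRenamed once per (old, new) pair in `renames`.

    `df` is expected to be a PySpark DataFrame; `renames` maps old names to new names.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        renames.items(),
        df,
    )


def rename_dynamic_frame(dyf, renames, glue_context, name="renamed"):
    """DynamicFrame -> DataFrame -> rename -> DynamicFrame (hypothetical helper)."""
    # Imported here because awsglue only exists inside a Glue job.
    from awsglue.dynamicframe import DynamicFrame

    return DynamicFrame.fromDF(rename_columns(dyf.toDF(), renames), glue_context, name)
```

With the question's names, the call would look something like `rename_dynamic_frame(df, {"col1": "COL1", "col2": "COL2"}, glueContext)`, where `glueContext` is whatever GlueContext the job already has.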

hiropon

You can use the applyMapping method of DynamicFrame to rename columns and/or cast them to another data type (Scala):

    val mappedDynamicFrame = sourceDynamicFrame.applyMapping(
      mappings = Seq(
        ("col1", "string", "column_1", "string"),
        ("col2", "string", "column_2", "string"),
        ("col3", "long", "column_3", "timestamp")
      ),
      caseSensitive = false,
      transformationContext = s"mapped-source"
    )
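
Since the question is in Python, here is a rough PySpark-side sketch of the same idea using the `ApplyMapping` transform from `awsglue.transforms`. The helpers `build_mappings` and `rename_with_apply_mapping` are hypothetical names of mine, and the sketch assumes you already know each column's Glue type and pass it in as `(name, type)` pairs. As far as I know, fields left out of the mappings are dropped from the result, so the helper emits an entry for every field and only changes the names listed in `renames`.

```python
def build_mappings(fields, renames):
    """Build ApplyMapping tuples: (source name, source type, target name, target type).

    `fields` is a list of (name, type) pairs for every column to keep;
    `renames` maps old names to new names. Types are passed through unchanged.
    """
    return [(name, dtype, renames.get(name, name), dtype) for name, dtype in fields]


def rename_with_apply_mapping(dyf, fields, renames):
    """Rename several columns of a DynamicFrame in one ApplyMapping call (hypothetical helper)."""
    # Imported here because awsglue only exists inside a Glue job.
    from awsglue.transforms import ApplyMapping

    return ApplyMapping.apply(frame=dyf, mappings=build_mappings(fields, renames))
```

For example, `build_mappings([("col1", "string"), ("col2", "string")], {"col1": "COL1", "col2": "COL2"})` produces the two rename tuples in one list, which avoids the intermediate `df1`, `df2` frames.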
Yuriy Bondaruk