Normally, when we read a CSV file in R, spaces in the column names are automatically converted to '.':
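For context, this renaming comes from `read.csv()`'s `check.names` argument, which is `TRUE` by default and runs `make.names()` on the header. A quick way to see it (the inline CSV text here is just illustrative):

```r
# read.csv() applies make.names() to the header by default,
# so spaces become dots.
csv_text <- "LR Number,Vehicle Number\n123,KA-01"
df <- read.csv(text = csv_text)
names(df)                # "LR.Number" "Vehicle.Number"
make.names("LR Number")  # "LR.Number"
```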
> df <- read.csv("report.csv")
> str(df)
'data.frame': 598 obs. of 61 variables:
$ LR.Number
$ Vehicle.Number
However, when we read the same CSV file in SparkR, the spaces remain intact and are not handled implicitly by Spark:
# To read a csv file
df <- read.df(sqlContext, path = "report.csv", source = "com.databricks.spark.csv", inferSchema = "true", header="true")
printSchema(df)
root
|-- LR Number: string (nullable = true)
|-- Vehicle Number: string (nullable = true)
Because of this, performing any operation on such a column causes a lot of trouble, and the column has to be referenced like this:
head(select(df, df$`LR Number`))
How can I handle this explicitly? Or can SparkR handle it implicitly?
I am using SparkR version 1.5.0.
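One possible workaround (a sketch, not a SparkR built-in: it assumes `columns()` and `withColumnRenamed()` from the SparkR 1.5 API) is to rename every column right after loading. Note that underscores are used here rather than dots, because '.' has a special meaning (struct field access) in Spark SQL column references:

```r
# Load the CSV as before via the spark-csv package.
df <- read.df(sqlContext, path = "report.csv",
              source = "com.databricks.spark.csv",
              inferSchema = "true", header = "true")

# Replace spaces in every column name with underscores.
# ('.' is avoided because Spark SQL treats it as struct access.)
for (old in columns(df)) {
  new <- gsub(" ", "_", old, fixed = TRUE)
  if (new != old) {
    df <- withColumnRenamed(df, old, new)
  }
}

printSchema(df)
```

After this, columns can be referenced without backticks, e.g. `head(select(df, df$LR_Number))`.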