The following code used to work for me, but now it fails with this error:

AttributeError: 'DataFrame' object has no attribute 'toDF'

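from pyspark import SparkContext
from pyspark.sql import SQLContext

if __name__ == "__main__":
  sc = SparkContext(appName="test")
  sqlContext = SQLContext(sc)

  # Read the CSV file through the external spark-csv package
  df = (sqlContext.read.format('com.databricks.spark.csv')
        .options(header='false', delimiter=',', inferSchema='true')
        .load('test'))

  # Rename the columns by position
  df = df.toDF('a', 'b', 'c')
  ...
  sc.stop()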
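If upgrading Spark is not an option, one version-tolerant workaround is to use `toDF` when the DataFrame has it and otherwise fall back to `select()` with `Column.alias()`. This is a minimal sketch, not a fix confirmed in this thread: `rename_columns` is a hypothetical helper, and it assumes `DataFrame.columns`, `DataFrame.select` and `Column.alias` are available in your PySpark version.

```python
def rename_columns(df, *new_names):
    """Rename all columns of a PySpark DataFrame by position.

    Uses DataFrame.toDF when the installed PySpark provides it; otherwise
    falls back to select() with Column.alias(), which avoids toDF entirely.
    """
    old_names = df.columns
    if len(new_names) != len(old_names):
        raise ValueError(
            "expected %d new names, got %d" % (len(old_names), len(new_names)))
    if hasattr(df, "toDF"):
        return df.toDF(*new_names)
    # df[old] returns a Column; alias() gives it the new name
    return df.select(*[df[old].alias(new)
                       for old, new in zip(old_names, new_names)])


# Usage, with the DataFrame loaded in the question:
# df = rename_columns(df, 'a', 'b', 'c')
```

Selecting aliased columns builds the renamed DataFrame in one step, so a new name can never clash with an old name that has not been renamed yet.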
Devid Farinelli
user3610141

2 Answers

I figured it out: it was a Spark version issue. The code works with Spark 1.6.
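Because this kind of failure depends on which Spark version is actually running, a small guard that checks the version up front can make it easier to diagnose. This is a minimal sketch: `spark_version_tuple` and `require_spark` are hypothetical helpers, and the 1.6 minimum is taken from this answer, not from any documented cut-off for `toDF`. In PySpark the running version string is available as `sc.version`.

```python
def spark_version_tuple(version):
    """Return (major, minor) as ints from a Spark version string.

    Handles strings such as '1.6.0' or '2.4.8-SNAPSHOT'.
    """
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1].split("-")[0]) if len(parts) > 1 else 0
    return major, minor


def require_spark(version, minimum=(1, 6)):
    """Raise RuntimeError if the running Spark is older than `minimum`.

    The (1, 6) default reflects the version reported to work in this
    answer; it is an assumption, not an official requirement.
    """
    if spark_version_tuple(version) < minimum:
        raise RuntimeError(
            "Spark %s detected; this script was tested with %d.%d or later"
            % ((version,) + tuple(minimum)))


# Usage, right after creating the SparkContext:
# require_spark(sc.version)
```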

user3610141

If you are working with Spark 1.6, use this code to convert an RDD into a DataFrame:

from pyspark.sql import SQLContext, Row

# sc is an existing SparkContext and rdd an existing RDD
sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame(rdd)

If you want to name the columns, map each element to a Row first:

# rdd.map() alone returns an RDD of Rows, not a DataFrame
rows = rdd.map(lambda p: Row(ip=p[0], time=p[1], zone=p[2]))
df = sqlContext.createDataFrame(rows)

Here ip, time and zone are the column names.
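A sketch of an alternative that skips `Row`: `createDataFrame` also accepts a list of column names as its `schema` argument, in which case the column types are inferred from the data. The `parse_line` and `to_named_df` helpers are hypothetical, the input format (comma-separated lines whose first three fields are ip, time and zone) is an assumption, and `'input.csv'` is a placeholder path.

```python
def parse_line(line):
    """Split one comma-separated line into an (ip, time, zone) tuple.

    Assumes the first three fields hold ip, time and zone, in that order.
    """
    ip, time, zone = [field.strip() for field in line.split(",")[:3]]
    return ip, time, zone


def to_named_df(sql_context, rdd, names=("ip", "time", "zone")):
    """Create a DataFrame from an RDD of tuples, naming the columns.

    Passing a list of names as the schema lets Spark infer column types.
    """
    return sql_context.createDataFrame(rdd, list(names))


# Usage with an existing SparkContext `sc` and SQLContext `sqlContext`:
# rdd = sc.textFile('input.csv').map(parse_line)
# df = to_named_df(sqlContext, rdd)
```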

Hamid Ali