I am trying to access inner attributes of following json using pyspark
[
{
"432": [
{
"atttr1": null,
"atttr2": "7DG6",
"id":432,
"score": 100
}
]
},
{
"238": [
{
"atttr1": null,
"atttr2": "7SS8",
"id":432,
"score": 100
}
]
}
]
In the output, I am looking for something like below in form of csv atttr1, atttr2,id,score null,"7DG6",432,100 null,"7SS8",238,100
I understand I can get these details like below but I don't want to pass 432 or 238 in lambda expression as in bigger json this(italic one) will vary. I want to iterate over all available values.
print(inputDF.rdd.map(lambda x:(x['*432*'])).first())
print(inputDF.rdd.map(lambda x:(x['*238*'])).first())
I also tried registering a temp table with the name "test" but it gave an error with message element._id doesn't exist.
inputDF.registerTempTable("test")
srdd2 = spark.sql("select element._id from test limit 1")
Any help will be highly appreciated. I am using spark 2.4