I am scraping some data with complex hierarchical structure and need to export the result to JSON.

I defined the items as

from scrapy import Item, Field

class FamilyItem(Item):
    name = Field()
    sons = Field()       # list of SonsItem

class SonsItem(Item):
    name = Field()
    grandsons = Field()  # list of GrandsonsItem

class GrandsonsItem(Item):
    name = Field()
    age = Field()
    weight = Field()
    sex = Field()

When the spider finishes running, the printed item output looks like this:

{'name': 'Jenny',
 'sons': [
     {'name': u'S1',
      'grandsons': [
          {'name': u'GS1',
           'age': 18,
           'weight': 50},
          {'name': u'GS2',
           'age': 19,
           'weight': 51}]}]}

But when I run scrapy crawl myscaper -o a.json, it always says the result "is not JSON serializable". When I copy and paste the item output into an IPython console and call json.dumps() on it, it works fine. So where is the problem? This is driving me nuts...

Tomas Sykora
Shadow Lau

2 Answers

When saving the nested items, make sure to wrap them in a call to dict(), e.g.:

gs1 = GrandsonsItem()
gs1['name'] = 'GS1'
gs1['age'] = 18
gs1['weight'] = 50

gs2 = GrandsonsItem()
gs2['name'] = 'GS2'
gs2['age'] = 19
gs2['weight'] = 51

s1 = SonsItem()
s1['name'] = 'S1'
s1['grandsons'] = [dict(gs1), dict(gs2)]

jenny = FamilyItem()
jenny['name'] = 'Jenny'
jenny['sons'] = [dict(s1)]
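
One plausible reading of why the pasted output worked in IPython while the export failed: the printed item looks like a plain dict, but the nested values are still item objects, which the standard json encoder does not know how to handle. The sketch below illustrates this without Scrapy installed. StandInItem is a hypothetical stand-in for an item class (dict-like, but not a dict subclass), and to_plain is a hypothetical helper, not part of Scrapy; behavior of real items may differ between Scrapy versions.

```python
import json
from collections.abc import MutableMapping


class StandInItem(MutableMapping):
    """Hypothetical stand-in for an item: dict-like, but not a dict subclass."""

    def __init__(self, **values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        self._values[key] = value

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __repr__(self):
        # Prints like a plain dict, which hides the real type of the object
        return repr(dict(self))


def to_plain(value):
    """Recursively convert mapping-like objects and sequences to dicts and lists."""
    if isinstance(value, MutableMapping):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


gs1 = StandInItem(name="GS1", age=18, weight=50)
s1 = StandInItem(name="S1", grandsons=[gs1])
jenny = StandInItem(name="Jenny", sons=[s1])

try:
    # Only the outer level becomes a dict; the nested items are left as-is
    json.dumps(dict(jenny))
    nested_failed = False
except TypeError:
    nested_failed = True

# Converting every level first gives the encoder only plain dicts and lists
plain = to_plain(jenny)
encoded = json.dumps(plain)
```

Wrapping each nested item in dict() by hand, as in the answer above, achieves the same thing one level at a time; a recursive helper is just a convenience when the nesting is deep or varies.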
Myle Ott

I'm not sure whether Scrapy supports nesting item classes directly, but lists of items work fine. You could do something like this:

grandson = Grandson(name='Grandson', age=2)

son = Son(name='Son', grandsons=[grandson])

item = Item(name='Name', son=[son])
akohout
Leo