
I'm trying to download Australian Bureau of Statistics (ABS) data using pandasdmx. I can download the ERP_COB dataset via SDMX without any problems, but for ERP by SA2, age and sex I get a timeout error. I have limited the time period to 2018 only, but the request still times out. The weird thing is that it works sometimes, but mostly it doesn't. I would like to try limiting parameters such as age or sex, but I'm not sure how to do this. Any help would be much appreciated.

Thanks in advance.

from pandasdmx import Request

Agency_Code = 'ABS'
Dataset_Id = 'ABS_ERP_ASGS2016'
ABS = Request(Agency_Code)
data_response = ABS.data(resource_id='ABS_ERP_ASGS2016', params={'startTime': '2018','endTime': '2018'})
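# Note: this function is never called (and is defined after the request is made),
# so it has no effect on the request timeout
def timeout(self, value):
    self.client.config['timeout'] = 10000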
ERP2018=data_response.write().unstack().reset_index()

ERP2018 = ERP2018[(ERP2018.REGIONTYPE =='AUS') | (ERP2018.REGIONTYPE =='STE')]

ERP2018.to_csv('c:\\Temp\\erp2018.csv')
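Because the request reportedly succeeds some of the time, one generic workaround to try (not something either answer uses) is to retry the call with a growing pause between attempts. This is a minimal sketch; `call_with_retries` is a hypothetical helper, not part of pandasdmx, and its default delays are arbitrary.

```python
import time


def call_with_retries(func, attempts=3, initial_delay=5.0, backoff=2.0,
                      exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Hypothetical helper (not part of pandasdmx). After each failure it waits
    `delay` seconds, starting at initial_delay and multiplying by backoff.
    The last exception is re-raised if every attempt fails.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay *= backoff
```

For example: `data_response = call_with_retries(lambda: ABS.data(resource_id=Dataset_Id, params={'startTime': '2018', 'endTime': '2018'}))`. Catching every `Exception` is deliberately broad because the exact timeout exception raised through pandasdmx isn't shown here; narrowing `exceptions` once you know the type is safer.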

2 Answers


I've managed to figure this out largely thanks to Anthony Kong.

I modified the resource_id to match the filtered query URL given on the ABS website. This makes the request smaller and avoids the timeouts. The ABS staff also told me how to change the timeout value. See below.

from pandasdmx import Request

Agency_Code = 'ABS'
Dataset_Id = 'ABS_ERP_ASGS2016'
ABS = Request(Agency_Code)
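# Raise the HTTP timeout before making the request (value suggested by ABS staff)
ABS.client.config['timeout'] = 100000
# The resource_id carries the filter key from the ABS website's query URL,
# so only the selected series are downloaded
data_response = ABS.data(resource_id='ABS_ERP_ASGS2016/ERP.3+1+2.TT+A04+A59+A10+A15+A20+A25+A30+A35+A40+A45+A50+A55+A60+A65+A70+A75+A80+8599.AUS+STE..A/all?', params={'startTime': '2009','endTime': '2018'})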

ERP2018=data_response.write().unstack().reset_index()

ERP2018.to_csv('c:\\Temp\\erp2018.csv')
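The resource_id above appends a dot-separated SDMX key to the dataset ID: within each position, `+` joins several codes, and an empty position (the `..` before `A`) leaves that dimension unfiltered. Reading the key, the third position holds age codes (TT, A04, and so on) and the fourth holds AUS+STE, matching the REGIONTYPE filter in the question; the meaning of the other positions (for example 3+1+2 presumably being the sex codes) is an assumption. The hypothetical helper below rebuilds such a key from lists, so age or sex can be narrowed by editing a list.

```python
def build_sdmx_key(*dimensions):
    """Join per-dimension code lists into a dot-separated SDMX key.

    Pass one list of codes per dimension, in the dataset's dimension order.
    Codes within a dimension are joined with '+'; an empty list produces an
    empty slot, which leaves that dimension unfiltered.
    """
    return '.'.join('+'.join(codes) for codes in dimensions)


# Hypothetical example: same key as the answer, but with the age
# dimension narrowed to the single code 'TT'.
key = build_sdmx_key(['ERP'], ['3', '1', '2'], ['TT'], ['AUS', 'STE'], [], ['A'])
resource_id = 'ABS_ERP_ASGS2016/' + key + '/all?'
# resource_id == 'ABS_ERP_ASGS2016/ERP.3+1+2.TT.AUS+STE..A/all?'
```

The resulting `resource_id` can then be passed to `ABS.data(...)` exactly as in the code above.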

If you turn on logging:

import logging

from pandasdmx import Request
Agency_Code = 'ABS'
Dataset_Id = 'ABS_ERP_ASGS2016'
ABS = Request(Agency_Code, log_level=logging.INFO)

you can see that pandasdmx is trying to download from http://stat.data.abs.gov.au/sdmx-json/data/ABS_ERP_ASGS2016. If you open this URL in your browser, you will see that the ABS server does not return anything.
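To repeat this browser check with a smaller, filtered query, the URL can be assembled by hand. This is a minimal sketch, assuming the `/{key}/all` path layout seen in the first answer's resource_id and the `startTime`/`endTime` query parameters from the question; `abs_data_url` is a hypothetical helper, not a pandasdmx function.

```python
from urllib.parse import urlencode

# Base URL as quoted from the pandasdmx log output above.
BASE_URL = 'http://stat.data.abs.gov.au/sdmx-json/data'


def abs_data_url(dataset_id, key=None, agency='all', **params):
    """Build an SDMX-JSON data URL to paste into a browser.

    Hypothetical helper: the key/agency path layout mirrors the resource_id
    used in the first answer, not a documented pandasdmx function.
    """
    url = f'{BASE_URL}/{dataset_id}'
    if key is not None:
        # Filter key first, then the agency segment ('all' in the answer)
        url += f'/{key}/{agency}'
    if params:
        # Keyword arguments keep their order, so the query string does too
        url += '?' + urlencode(params)
    return url
```

For example, `abs_data_url('ABS_ERP_ASGS2016', key='ERP.3.TT.AUS..A', startTime='2018', endTime='2018')` gives a URL for a single small series that should respond quickly if the service is working.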

I don't see any problem with your python code.

The odd thing is that there is no dataset for the prior year (2015) or for the following years (2017, 2018), so this dataset seems to be the odd one out.

It is likely a data issue. You can either contact the maintainers of pandasdmx or talk to ABS directly.

Anthony Kong
Thanks Anthony. It is called *_2016 because the estimates are generated from the 2016 census. Agreed, it might be a data issue, because the query consistently works with a smaller dataset but only inconsistently with this one. Thanks for taking the time to check it out! – epidemiologistseekshelp Feb 17 '20 at 22:50