I want to access an XML file that's in a public Google Cloud Storage bucket. I tried the following:

import requests

url = 'https://storage.cloud.google.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml'

response = requests.get(url)

The response was the HTML of a Google login page rather than the XML file I wanted. How should I access this data in Python?

D'Arcy
  • Check the xml file share permissions. – Pedro Lobito Jan 30 '19 at 18:42
  • Ah, this particular file is 100% open to the public, you just need to be signed into google to view it, which I suspect is the source of the issue. – D'Arcy Jan 30 '19 at 18:44
  • Yes, I confirm that you can download the file directly if you're logged into google, otherwise you need to sign in. Please read: https://cloud.google.com/storage/docs/access-control/making-data-public – Pedro Lobito Jan 30 '19 at 18:46
  • response.text contains that XML for me. Are you seeing something different? – Kannan Kandasamy Jan 30 '19 at 18:48
  • I'm checking the link, it's more an issue of how to sign python into google, if that makes sense? And when I do response.text I definitely get the code for the google sign in page – D'Arcy Jan 30 '19 at 18:51
  • Possible duplicate of [How to "log in" to a website using Python's Requests module?](https://stackoverflow.com/questions/11892729/how-to-log-in-to-a-website-using-pythons-requests-module) – G. Anderson Jan 30 '19 at 18:54
  • I mean, not really, in all likelihood that method would lead to my google account getting flagged – D'Arcy Jan 30 '19 at 19:02

1 Answer

To download the file directly (without being logged into Google), you'll need to change the URL, i.e.:

From:

https://storage.cloud.google.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml

To:

https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml
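If you have several links of the first kind to convert, the host swap above can be sketched as a small helper. The function name `to_public_url` is hypothetical, and the sketch only rewrites the host; it assumes the object itself is publicly readable.

```python
from urllib.parse import urlsplit, urlunsplit

def to_public_url(url):
    """Swap the sign-in host for storage.googleapis.com, leaving the path untouched."""
    parts = urlsplit(url)
    if parts.netloc == "storage.cloud.google.com":
        parts = parts._replace(netloc="storage.googleapis.com")
    # URLs on any other host are returned unchanged
    return urlunsplit(parts)
```

Calling `to_public_url(url)` on the URL from the question should give the second URL shown above.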

Python Sample:

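import requests

# storage.googleapis.com serves public objects without a Google sign-in
u = "https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/04/Q/FJ/S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE/MTD_MSIL1C.xml"
r = requests.get(u)
r.raise_for_status()  # raise on an HTTP error instead of saving an error page
with open('MTD_MSIL1C.xml', 'wb') as f:
    f.write(r.content)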

MTD_MSIL1C.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Level-1C_User_Product xmlns:n1="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-1C.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-1C.xsd">
    <n1:General_Info>
        <Product_Info>
            <PRODUCT_START_TIME>2019-01-26T21:09:21.024Z</PRODUCT_START_TIME>
            <PRODUCT_STOP_TIME>2019-01-26T21:09:21.024Z</PRODUCT_STOP_TIME>
            <PRODUCT_URI>S2A_MSIL1C_20190126T210921_N0207_R057_T04QFJ_20190126T221757.SAFE</PRODUCT_URI>
            <PROCESSING_LEVEL>Level-1C</PROCESSING_LEVEL>
            <PRODUCT_TYPE>S2MSI1C</PRODUCT_TYPE>
...
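Once the XML is downloaded, you may want to read values out of it rather than just save it. Here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The `SAMPLE` string is a shortened, hand-closed copy of the excerpt above (the real file is much longer), and `product_info` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the excerpt above, closed by hand so that it is well-formed.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<n1:Level-1C_User_Product xmlns:n1="https://psd-14.sentinel2.eo.esa.int/PSD/User_Product_Level-1C.xsd">
    <n1:General_Info>
        <Product_Info>
            <PRODUCT_START_TIME>2019-01-26T21:09:21.024Z</PRODUCT_START_TIME>
            <PROCESSING_LEVEL>Level-1C</PROCESSING_LEVEL>
            <PRODUCT_TYPE>S2MSI1C</PRODUCT_TYPE>
        </Product_Info>
    </n1:General_Info>
</n1:Level-1C_User_Product>
"""

def product_info(xml_bytes):
    """Return the direct children of <Product_Info> as a {tag: text} dict."""
    root = ET.fromstring(xml_bytes)
    # Product_Info carries no namespace prefix in the excerpt, so a plain tag name matches
    info = root.find(".//Product_Info")
    if info is None:
        raise ValueError("no Product_Info element found")
    return {child.tag: child.text for child in info}

info = product_info(SAMPLE.encode("utf-8"))
print(info["PRODUCT_TYPE"])
```

With a real download you would pass `r.content` instead of the sample bytes.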

Notes:

  1. Accessing Public Data (API Link)
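  2. Changing the URL works because storage.cloud.google.com is the browser endpoint that requires a Google sign-in, while storage.googleapis.com serves publicly readable objects without authentication.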
Pedro Lobito
  • Thanks man, that's a real step in the right direction, but now I get a new problem. This is what gets returned to me: `We're sorry... but your computer or network may be sending automated queries. To protect our users, we can't process your request right now.`

    – D'Arcy Jan 30 '19 at 18:56
  • Ah yes, but this is the response I get within python – D'Arcy Jan 30 '19 at 19:01
  • Hi Pedro, given sufficient delay between queries it works. I suspect that warrants a new question, though. Thanks so much for your help! – D'Arcy Jan 31 '19 at 11:09
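For anyone hitting the same "automated queries" page, the delay mentioned in the last comment can be sketched as a retry loop that waits longer after each failed attempt. This is only an illustration. The helper name `get_with_backoff`, the retry count, the delays, and the check for the phrase "automated queries" in the body are all assumptions, not anything Google documents. `fetch` is any callable that returns a response with `status_code` and `content`, such as `requests.get`.

```python
import time

def get_with_backoff(fetch, url, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url) until the response looks usable, doubling the wait each time."""
    for attempt in range(attempts):
        response = fetch(url)
        # Assumption: the block page can be recognised by this phrase in its body
        blocked = b"automated queries" in response.content
        if response.status_code == 200 and not blocked:
            return response
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # waits 2 s, 4 s, 8 s, ... by default
    raise RuntimeError(f"no usable response from {url} after {attempts} attempts")
```

Usage would look like `r = get_with_backoff(requests.get, u)` with `u` from the answer. How long a delay is actually sufficient is not stated in the thread, so tune `base_delay` for your own case.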