
We are using Google Cloud Platform for big-data analytics. For processing we are currently using Google Cloud Dataproc and Spark Streaming.

I want to submit a Spark job using the REST API, but when I call the URI with the API key, I get the error below:

{
  "error": {
    "code": 403,
    "message": "The caller does not have permission",
    "status": "PERMISSION_DENIED"
  }
}

URI: https://dataproc.googleapis.com/v1/projects/orion-0010/regions/us-central1-f/clusters/spark-recon-1?key=AIzaSyA8C2lF9kT*************SGxAipL0

I created the API key from the Google Console > API Manager.

Remis Haroon - رامز

1 Answer


While API keys can be used for associating calls with a developer project, they're not actually used for authorization. Dataproc's REST API, like most other billable REST APIs within Google Cloud Platform, uses oauth2 for authentication and authorization. If you want to call the API programmatically, you'll likely want to use one of the client libraries, such as the Java SDK for Dataproc, which provide convenience wrappers around the low-level JSON protocols as well as handy thick libraries for handling oauth2 credentials.
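
If you just want to kick the tires before wiring up a client library, the gcloud CLI wraps the same API and handles the oauth2 flow for you. A minimal sketch of submitting a Spark job that way might look like this (the --class and --jars values are illustrative placeholders; the cluster name is taken from your URI):

# Submit a Spark job to an existing cluster; gcloud handles oauth2 for you.
# The --class and --jars values here are illustrative placeholders to adapt.
gcloud dataproc jobs submit spark \
    --cluster spark-recon-1 \
    --class org.apache.spark.examples.SparkPi \
    --jars file:///usr/lib/spark/lib/spark-examples.jar \
    -- 1000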

You can also experiment with the direct REST API using Google's API explorer where you'll need to click the button on the top right that says "Authorize requests using OAuth 2.0".

I also noticed you used us-central1-f under the regions/ path of the Dataproc URI. Note that Dataproc's regions don't map one-to-one to Compute Engine zones or regions; rather, each Dataproc region can contain multiple Compute Engine zones or regions. Currently only one Dataproc region is available publicly; it is called global and is capable of deploying clusters into all Compute Engine zones.

For an easy illustration of using an oauth2 access token, you can simply use curl along with gcloud if you have the gcloud CLI installed:

PROJECT='<YOUR PROJECT HERE>'
# Fetch a short-lived oauth2 access token from application-default credentials
# (newer gcloud releases offer this command without the "beta" component).
ACCESS_TOKEN=$(gcloud beta auth application-default print-access-token)
# List clusters in the "global" Dataproc region, sending the token as a
# standard Bearer authorization header.
curl \
    --header "Authorization: Bearer ${ACCESS_TOKEN}" \
    --header "Content-Type: application/json" \
    https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/global/clusters
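
The same pattern extends to submitting the Spark job itself, which is what you were originally after. Here's a sketch against the v1 jobs:submit endpoint; the request body wraps a Job resource, and the main class and jar URI are again illustrative placeholders:

# Submit a Spark job through the raw REST API. The request body wraps a Job
# resource; the mainClass and jarFileUris values are placeholders to adapt.
curl \
    --header "Authorization: Bearer ${ACCESS_TOKEN}" \
    --header "Content-Type: application/json" \
    --data '{
      "job": {
        "placement": {"clusterName": "spark-recon-1"},
        "sparkJob": {
          "mainClass": "org.apache.spark.examples.SparkPi",
          "jarFileUris": ["file:///usr/lib/spark/lib/spark-examples.jar"],
          "args": ["1000"]
        }
      }
    }' \
    https://dataproc.googleapis.com/v1/projects/${PROJECT}/regions/global/jobs:submit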

Keep in mind that the ACCESS_TOKEN printed by gcloud here expires by nature (in about 5 minutes, if I remember correctly). The key concept is that the token you pass along in HTTP headers for each request will generally be a "short-lived" token; by design, you'll have code which separately fetches new tokens using a "refresh token" whenever the access tokens expire. This helps protect against accidentally compromising long-lived credentials, and this "refresh" flow is part of what the thick auth libraries handle under the hood.
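
For illustration, the refresh exchange itself is just a POST to the oauth2 token endpoint. Here's a minimal sketch with curl, assuming you already have an oauth2 client ID/secret and a refresh token (all three values below are placeholders you'd supply):

# Exchange a long-lived refresh token for a fresh short-lived access token.
# CLIENT_ID, CLIENT_SECRET, and REFRESH_TOKEN are placeholder variables.
curl \
    --data "client_id=${CLIENT_ID}" \
    --data "client_secret=${CLIENT_SECRET}" \
    --data "refresh_token=${REFRESH_TOKEN}" \
    --data "grant_type=refresh_token" \
    https://oauth2.googleapis.com/token

The response is a JSON document containing a new access_token along with its expires_in lifetime in seconds.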

Dennis Huo