Questions tagged [google-cloud-dataprep]

An intelligent cloud data service to visually explore, clean, and prepare data for analysis.

Dataprep (more precisely, Cloud Dataprep by Trifacta) is a visual data transformation tool built by Trifacta and offered as part of Google Cloud Platform.

It can read data from and write data to other Google Cloud services such as BigQuery and Cloud Storage.

Data is transformed using recipes, which are shown alongside a visual representation of the data. This lets the user preview changes, profile columns, and spot outliers and type mismatches.
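Dataprep performs column profiling in its UI, and its actual rules are not described here. The Python sketch below only illustrates the idea of profiling a column that should be numeric: counting valid, missing, and mismatched cells, and flagging outliers. The function name `profile_numeric_column`, the z-score rule, and the 2.0 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def profile_numeric_column(values, z_threshold=2.0):
    """Profile a column that is expected to contain numbers.

    Returns (status_counts, outlier_positions): how many cells are valid,
    missing, or type mismatches, and which row positions hold numeric
    outliers under a simple z-score rule (an assumption for illustration).
    """
    status = Counter()
    parsed = []  # (row position, numeric value) for cells that parse as numbers
    for pos, raw in enumerate(values):
        if raw is None or str(raw).strip() == "":
            status["missing"] += 1
            continue
        try:
            parsed.append((pos, float(raw)))
            status["valid"] += 1
        except ValueError:
            status["mismatched"] += 1  # e.g. text in a numeric column
    outliers = []
    if len(parsed) >= 2:
        nums = [v for _, v in parsed]
        mu, sigma = mean(nums), pstdev(nums)
        if sigma > 0:
            outliers = [pos for pos, v in parsed if abs(v - mu) / sigma > z_threshold]
    return dict(status), outliers

# Example column with one text value, one blank, and one extreme number.
column = ["10", "12", "11", "abc", "", "13", "12", "11", "10", "95"]
status, outliers = profile_numeric_column(column)
print(status)    # counts of valid / mismatched / missing cells
print(outliers)  # row positions flagged as outliers
```

With very small columns a z-score rule is weak (a single outlier can only reach a limited z-value), which is one reason real profilers often use several signals rather than one threshold.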

When a Dataprep flow is run (manually or on a schedule), a Dataflow job is created to execute it. Dataflow is Google's managed service for running Apache Beam pipelines.

196 questions
2 votes, 1 answer

Google Cloud Dataprep: Transformation engine unavailable due to prior crash (exit code: -1)

I am trying to create a flow using Google Cloud Dataprep. The flow takes a BigQuery dataset containing Firebase Analytics app event data and flattens the event parameters for easier analysis. I keep getting the following error before even…
2 votes, 1 answer

Dataprep: job finish event

We are considering running Dataprep on an automatic schedule to wrangle and load a folder of GCS .gz files into BigQuery. The challenge is: how can the source .gz files be moved to cold storage once they are processed? I can't find an event…
jldupont
2 votes, 2 answers

Google Cloud DataPrep fails with cross-region error when using EU BigQuery db

I hit some issues today while developing some new flows, the first I've built that read from and load into EU-region BigQuery databases. To isolate the issue, I took the following steps: create a new BQ database in the EU region; create a table by…
2 votes, 1 answer

Google Dataprep - replace data in columns

I have started using Google's Dataprep to cleanse eCommerce product feeds. As I receive data from hundreds of eCommerce stores, I want to cleanse the data for consistency and standardize the various spellings of brand names. For example, I have a…
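The usual approach to this kind of cleanup is a lookup table that maps every known spelling to one canonical value. The Python sketch below shows only that idea, not Dataprep's own transformation syntax; the brand names "Acme" and "Globex", the alias table `BRAND_ALIASES`, and the helper names are hypothetical.

```python
import re

# Hypothetical alias table: normalized spelling -> canonical brand name.
BRAND_ALIASES = {
    "acme": "Acme",
    "acme corp": "Acme",
    "acme corporation": "Acme",
    "acme inc.": "Acme",
}

def normalize_key(name):
    """Lower-case, trim, and collapse runs of whitespace so near-identical spellings match."""
    return re.sub(r"\s+", " ", name.strip().lower())

def canonical_brand(name):
    """Return the canonical brand for a known spelling, else the trimmed original."""
    return BRAND_ALIASES.get(normalize_key(name), name.strip())

feed = ["ACME Corp", "  acme   corporation ", "Acme", "Globex"]
print([canonical_brand(b) for b in feed])
```

Normalizing case and whitespace before the lookup keeps the alias table short, since it then only needs entries for genuinely different spellings.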
2 votes, 2 answers

Google Cloud Dataprep - Functions

Are there any functions like discretization, normalization and data transformation (categorical to numeric) on Google Cloud Dataprep?
Gozde
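Whether Dataprep ships these as built-in functions is not established here. As a reference for what each transformation computes, here is a plain-Python sketch of min-max normalization, equal-width discretization, and one-hot encoding (categorical to numeric); the function names and the sorted category order are assumptions for illustration.

```python
def min_max_normalize(values):
    """Rescale numbers linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Discretize numbers into n_bins equal-width bins, returning indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def one_hot(categories):
    """Encode categorical values as 0/1 indicator vectors, one column per sorted level."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

print(min_max_normalize([10, 20, 30]))
print(equal_width_bins([0, 1, 5, 9, 10], 2))
print(one_hot(["red", "blue", "red"]))
```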
2 votes, 2 answers

How do I give access to Google Cloud Dataprep?

I created a flow in Cloud Dataprep and the job executed without problems. However, my colleagues, who also have the Owner role in this GCP project, cannot see the flow I created. I can't find sharing options anywhere. How should it be set up so…
paulboony
2 votes, 2 answers

Dataflow Workers unable to connect to Dataflow Service

I am using Google Dataprep to start Dataflow jobs and am facing some difficulties. For background, we used Dataprep for some weeks and it worked without problems until we started having authorization issues with the service account. When we finally…
1 vote, 0 answers

GCP DataPrep - Unable to rename output files Error

I have created a simple Dataprep workflow (source: a CSV file from GCS; transformation: a simple upper-case conversion; target: load into BigQuery). When I run this workflow job in the Dataprep UI, I get the error: Unable to rename output files…
1 vote, 1 answer

Dataprep - accents and special characters

How do I solve this problem with accents and special characters in Dataprep? I need this information to appear correctly. Thank you very much for your attention.
Theorp
1 vote, 0 answers

Dataprep recipe fails to load with "Cannot read property 'expandScriptLines' of undefined"

A recent update to Dataprep sometime between August 7 and 10 has broken a number of our Dataprep recipes. Broken recipes fail to load with the error "Cannot read property 'expandScriptLines' of undefined". The browser console shows the following…
ty.
1 vote, 0 answers

Dataprep Bigquery running in different region

I get the following error when running Dataprep: java.io.IOException: Query job beam_job_9e016180fbb74637b35319c89b6ed6d7_clouddataprepleads6085795bynick-query-d23eb37a1bee4a788e7b16c1de1f92e6 failed, status: { "errorResult" : { "message" : "Not…
1 vote, 0 answers

Dataprep - missing rows after processing

I have a CSV file containing 1.5 million rows. I prepared a Dataprep job that parses the data and stores it in BQ (or CSV). But after processing, nearly half of the rows (around 700k) are missing. When I run this Dataprep job without any recipe steps I got the same…
Jozef Cechovsky
1 vote, 0 answers

Dataprep is leaving Datasets/Tables behind in BigQuery

I am using Google Cloud Dataprep to process data stored in BigQuery. I am having an issue where Dataprep/Dataflow creates a new dataset with a name starting with "temp_dataset_beam_job_". It seems to create the temporary dataset both for failed and…
1 vote, 1 answer

In Trifacta or Google Cloud Dataprep, I'm trying to flag rows with non-alphanumeric characters (�). What formula do I use?

In Trifacta or Google Cloud Dataprep, I'm trying to flag rows with non-alphanumeric characters (�). What formula do I use? I tried this formula but it doesn't work: Replace Matches of `�` from EMPLOYEE_FIRST with NOT VALID
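The "�" character is U+FFFD, the Unicode replacement character that decoders insert for bytes they cannot decode. The Python sketch below illustrates the check itself rather than Dataprep's formula syntax: it flags any value containing a character that is not alphanumeric. Treating space, hyphen, and apostrophe as allowed is an assumption, as are the helper names `has_invalid_chars` and `flag_rows`.

```python
REPLACEMENT_CHAR = "\ufffd"  # "�": what decoders emit for bytes they cannot decode

def has_invalid_chars(value, allowed_extra=" -'"):
    """True if value contains a character that is neither alphanumeric nor in allowed_extra.

    str.isalnum() is Unicode-aware, so accented letters such as "é" count as valid,
    while REPLACEMENT_CHAR does not.
    """
    return any(not (ch.isalnum() or ch in allowed_extra) for ch in value)

def flag_rows(names):
    """Replace each name that contains an invalid character with the marker 'NOT VALID'."""
    return ["NOT VALID" if has_invalid_chars(n) else n for n in names]

# Latin-1 bytes decoded as UTF-8 are one common way a "�" ends up in the data.
garbled = b"Jos\xe9".decode("utf-8", errors="replace")
print(flag_rows(["José", garbled, "Anne-Marie"]))
```

Because the check is Unicode-aware, legitimate accented names pass while the replacement character is flagged; if "�" appears widely, fixing the source file's encoding is usually better than flagging rows after the fact.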
1 vote, 1 answer

Conversion of DateTime to Timestamp - adding the timestamp

I want to convert dates from the former format to the latter: 2020-04-14T14:56:43 to 2020-04-14 14:56:43 UTC. Basically, how do I convert a DATETIME into a TIMESTAMP in Dataprep?
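At the string level, the conversion parses the ISO-8601 value, attaches a UTC zone, and reformats it with a space separator and a zone suffix. The Python sketch below shows that logic only, not Dataprep's date functions; it assumes the source values carry no offset and already represent UTC, and `datetime_to_utc_timestamp` is a hypothetical name.

```python
from datetime import datetime, timezone

def datetime_to_utc_timestamp(value):
    """Turn '2020-04-14T14:56:43' into '2020-04-14 14:56:43 UTC'.

    Assumes the input carries no UTC offset and is already UTC wall-clock time.
    """
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S %Z")

print(datetime_to_utc_timestamp("2020-04-14T14:56:43"))
```

If the source values are actually in a local time zone, they would need converting to UTC first instead of simply being labeled UTC.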