Questions tagged [tensorflow-datasets]

TensorFlow's `tf.data` module provides a functional API for building input pipelines, using the `tf.data.Dataset` and `tf.data.Iterator` classes.

1667 questions
13
votes
2 answers

tensorflow dataset shuffle then batch or batch then shuffle

I recently began learning TensorFlow. I am unsure whether there is a difference between x = np.array([[1],[2],[3],[4],[5]]) dataset = tf.data.Dataset.from_tensor_slices(x) ds.shuffle(buffer_size=4) ds.batch(4) and x =…
Lim Kaizhuo
  • 524
  • 2
  • 5
  • 14
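
The ordering question above can be illustrated without TensorFlow. The following is a minimal plain-Python sketch, not the tf.data implementation: the helpers `batch` and `shuffle` are hypothetical stand-ins for the Dataset methods of the same name, and the sketch assumes a full shuffle rather than the buffered shuffle that `buffer_size` controls.

```python
import random


def batch(items, size):
    """Group consecutive items into lists of at most `size` elements."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def shuffle(items, rng):
    """Return a shuffled copy (a full shuffle; an assumption of this sketch)."""
    items = list(items)
    rng.shuffle(items)
    return items


data = [1, 2, 3, 4, 5]
rng = random.Random(0)  # fixed seed so repeated runs give the same order

# Shuffle then batch: individual elements are mixed first,
# so which elements share a batch can change from run to run.
shuffle_then_batch = batch(shuffle(data, rng), 4)

# Batch then shuffle: batches are formed from consecutive elements first,
# so only the order of whole batches changes.
batch_then_shuffle = shuffle(batch(data, 4), rng)

print(shuffle_then_batch)
print(batch_then_shuffle)
```

In the second case every batch still holds the same consecutive elements it started with, which is usually why shuffling before batching is preferred for training data.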
13
votes
4 answers

How to iterate a dataset several times using TensorFlow's Dataset API?

How can I output the values in a dataset several times? (The dataset is created with TensorFlow's Dataset API.) import tensorflow as tf dataset = tf.contrib.data.Dataset.range(100) iterator = dataset.make_one_shot_iterator() next_element =…
void
  • 195
  • 1
  • 1
  • 8
13
votes
1 answer

How does one move data to multiple GPU towers using Tensorflow's Dataset API

We are running multi-GPU jobs on TensorFlow and evaluating a migration from the queue-based model (using the string_input_producer interface) to the new TensorFlow Dataset API. The latter appears to offer an easier way to switch between Train and…
7hacker
  • 1,648
  • 2
  • 16
  • 29
12
votes
5 answers

Not able to import tensorflow_datasets module in jupyter notebook

I am taking the TensorFlow course from Udacity, which uses Google Colab to write and run the code. I want to run the code on my local machine and have created a new environment for it, but I am unable to import tensorflow_dataset into the…
12
votes
1 answer

How to use py_func with a function that returns dict

I'm writing an input pipeline using tf.data.Dataset. I'd like to use Python code to load and transform my samples; the code returns a dictionary of tensors. Unfortunately I don't see how I can define that as the output type that is passed to…
Piotr Czapla
  • 23,150
  • 23
  • 90
  • 120
12
votes
2 answers

Is there a way to stack two tensorflow datasets?

I want to stack two dataset objects in TensorFlow (like the rbind function in R). I have created one dataset A from tfRecord files and one dataset B from numpy arrays. Both have the same variables. Do you know if there is a way to stack these two datasets to…
Kent930
  • 131
  • 1
  • 1
  • 5
12
votes
1 answer

tf.train.MonitoredTrainingSession and reinitializable iterator from Dataset

It seems as if a MonitoredTrainingSession does some operations (logging?) before the first call to .run(..), meaning that when I do: train_data = reader.traindata() # returns a tf.contrib.data.Dataset it =…
11
votes
1 answer

IDE breakpoint in TensorFlow Dataset API mapped py_function?

I'm using the TensorFlow Dataset API to prepare my data for input into my network. During this process, I have some custom Python functions which are mapped to the dataset using tf.py_function. I want to be able to debug the data going into these…
golmschenk
  • 9,361
  • 17
  • 69
  • 117
11
votes
1 answer

How to make custom loss with extra input in tensorflow 2.0

I'm having a lot of trouble getting a custom loss function with an extra argument to work in TF 2.0 using tf.keras and a dataset. In the following case, the extra argument is the input data into the model, which is contained in a Dataset. In 1.14…
Luke
  • 5,052
  • 9
  • 39
  • 68
11
votes
2 answers

How to acquire tf.data.dataset's shape?

I know a dataset has output_shapes, but it displays as below: data_set: DatasetV1Adapter shapes: {item_id_hist: (?, ?), tags: (?, ?), client_platform: (?,), entrance: (?,), item_id: (?,), lable: (?,), mode: (?,), time: (?,), user_id: (?,)}, types:…
11
votes
2 answers

How do you send arguments to a generator function using tf.data.Dataset.from_generator()?

I would like to create a number of tf.data.Dataset using the from_generator() function. I would like to send an argument to the generator function (raw_data_gen). The idea is that the generator function will yield different data depending on the…
11
votes
3 answers

Tensorflow: How to find the size of a tf.data.Dataset API object

I understand that the Dataset API is a sort of iterator that does not load the entire dataset into memory, and so it cannot report the size of the Dataset. I am talking in the context of a large corpus of data that is stored in text files or…
omsrisagar
  • 191
  • 2
  • 9
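
The reasoning in this question, that a lazily streamed source has no size known up front, can be shown with a plain-Python sketch that does not use TensorFlow. The generator `lazy_records` and the helper `count_elements` are hypothetical names introduced only for illustration. Counting by consuming the stream once is one possible approach, assumed here, and it costs a full pass over the data.

```python
def lazy_records(n):
    """Hypothetical stand-in for a streaming source such as lines in text files."""
    for i in range(n):
        yield f"record-{i}"


def count_elements(stream):
    """Count items by consuming the stream once; nothing is held in memory."""
    return sum(1 for _ in stream)


stream = lazy_records(1000)

# A generator exposes no length, which mirrors the limitation described above.
has_len = hasattr(stream, "__len__")

# The only general way to learn the size is to iterate over every element.
total = count_elements(stream)

print(has_len, total)
```

Note that the stream is exhausted after counting, so a fresh one has to be created before the data can be read again.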
11
votes
1 answer

tf.data.Dataset.padded_batch pad differently each feature

I have a tf.data.Dataset instance which holds 3 different features: label, which is a scalar; sequence_feature, which is a sequence of scalars; and seq_of_seqs_feature, which is a sequence of sequences. I am trying to use…
bluesummers
  • 7,494
  • 4
  • 55
  • 85
11
votes
3 answers

Oversampling functionality in Tensorflow dataset API

I would like to ask whether the current datasets API allows implementing an oversampling algorithm. I am dealing with a highly imbalanced class problem. I was thinking that it would be nice to oversample specific classes during dataset parsing i.e. online…
K Kolasinski
  • 300
  • 2
  • 11
11
votes
1 answer

How do I use the "group_by_window" function in TensorFlow

In TensorFlow's new set of input pipeline functions, there is an ability to group sets of records together using the "group_by_window" function. It is described in the documentation…
John Scolaro
  • 617
  • 6
  • 18