Questions tagged [tensorflow-datasets]

TensorFlow's `tf.data` module provides a functional API for building input pipelines, using the `tf.data.Dataset` and `tf.data.Iterator` classes.

1667 questions
19 votes
2 answers

Replacing tf.placeholder and feed_dict with tf.data API

I have an existing TensorFlow model which used a tf.placeholder for the model input and the feed_dict parameter of tf.Session().run to feed in data. Previously the entire dataset was read into memory and passed in this way. I want to use a much…
erobertc
18 votes
3 answers

Parallelism isn't reducing the time in dataset map

The TF map function supports parallel calls, but I see no improvement when passing num_parallel_calls to map: with num_parallel_calls=1 and num_parallel_calls=10, the run time is the same. Here is a simple example: import time def…
Kracekumar
18 votes
1 answer

Tensorflow Data API - prefetch

I am trying to use the new features of TF, namely the Data API, and I am not sure how prefetch works. In the code below: def dataset_input_fn(...) dataset = tf.data.TFRecordDataset(filenames, compression_type="ZLIB") dataset = dataset.map(lambda…
MPękalski
18 votes
3 answers

How to map a function with additional parameter using the new Dataset api in TF1.3?

I'm playing with the Dataset API in TensorFlow v1.3. It's great. It is possible to map a dataset with a function as described here. I would like to know how I can pass a function which takes an additional argument, for example arg1: def…
18 votes
4 answers

How do I create padded batches in Tensorflow for tf.train.SequenceExample data using the DataSet API?

For training an LSTM model in TensorFlow, I have structured my data in the tf.train.SequenceExample format and stored it in a TFRecord file. I would now like to use the new Dataset API to generate padded batches for training. In the documentation…
Marijn Huijbregts
16 votes
2 answers

How to improve the performance of this data pipeline for my tensorflow model

I have a tensorflow model which I am training on google-colab. The actual model is more complex, but I condensed it into a reproducible example (removed saving/restoring, learning rate decay, asserts, tensorboard events, gradient clipping and so…
Salvador Dali
16 votes
1 answer

Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords?

My question is about how to get batch inputs from multiple (or sharded) tfrecords. I've read the example https://github.com/tensorflow/models/blob/master/inception/inception/image_processing.py#L410. The basic pipeline is, take the training set as…
mining
15 votes
2 answers

How to get the string value out of a tf.Tensor whose dtype is string

I want to use the tf.data.Dataset.list_files function to feed my datasets. But because the files are not images, I need to load them manually. The problem is that tf.data.Dataset.list_files passes each file name as a tf.Tensor, and my Python code cannot handle tensors. How…
Ko Ohhashi
15 votes
1 answer

How to apply data augmentation in TensorFlow 2.0 after tfds.load()

I'm following this guide. It shows how to download datasets from the new TensorFlow Datasets using the tfds.load() method: import tensorflow_datasets as tfds SPLIT_WEIGHTS = (8, 1, 1) splits =…
15 votes
5 answers

TensorFlow: logits and labels must have the same first dimension

I am new to TensorFlow and I want to adapt the MNIST tutorial https://www.tensorflow.org/tutorials/layers to my own data (images of 40x40). This is my model function: def cnn_model_fn(features, labels, mode): # Input Layer …
15 votes
4 answers

How can I filter tf.data.Dataset by specific values?

I create a dataset by reading the TFRecords and map the values, and I want to filter the dataset for specific values. But since the result is a dict of tensors, I am not able to get the actual value of a tensor or to check it with tf.cond() /…
tsveti_iko
15 votes
3 answers

In Tensorflow's Dataset API how do you map one element into multiple elements?

In the tensorflow Dataset pipeline I'd like to define a custom map function which takes a single input element (data sample) and returns multiple elements (data samples). The code below is my attempt, along with the desired results. I could not…
David Parks
15 votes
1 answer

When to use tensorflow datasets api versus pandas or numpy

There are a number of guides I've seen on using LSTMs for time series in tensorflow, but I am still unsure about the current best practices in terms of reading and processing data - in particular, when one is supposed to use the tf.data.Dataset API.…
ira
14 votes
1 answer

Correct way of doing data augmentation in TensorFlow with the dataset api?

So, I've been playing around with the TensorFlow dataset API for loading images and segmentation masks (for a semantic segmentation project). I would like to be able to generate batches of images and masks, with each image having randomly gone…
Hasnain Raza
13 votes
2 answers

Tensorflow tf.data.Dataset API, dataset unzip function?

In TensorFlow 1.12 there is the Dataset.zip function, documented here. However, I was wondering if there is a dataset unzip function which will return the original two datasets. # NOTE: The following examples use `{ ... }` to represent the #…
Ouwen Huang