Questions tagged [tensorflow-datasets]

TensorFlow's `tf.data` module provides a functional API for building input pipelines, using the `tf.data.Dataset` and `tf.data.Iterator` classes.

1667 questions
111 votes, 5 answers

Meaning of buffer_size in Dataset.map, Dataset.prefetch and Dataset.shuffle

As per the TensorFlow documentation, the prefetch and map methods of the tf.contrib.data.Dataset class both have a parameter called buffer_size. For the prefetch method, the parameter is known as buffer_size and, according to the documentation: buffer_size: A…
Ujjwal
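A minimal sketch of what buffer_size controls in each transformation, using the modern tf.data.Dataset names (tf.contrib.data was the pre-1.4 location of the same class):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(100)

# shuffle: buffer_size is the number of elements held in memory, from which
# the next element is sampled uniformly; larger buffers shuffle better.
ds = ds.shuffle(buffer_size=100)

# prefetch: buffer_size is how many (possibly batched) elements are
# prepared ahead of time while the consumer works on the current one.
ds = ds.batch(10).prefetch(buffer_size=1)
```

Note that shuffle only approximates a full shuffle when buffer_size is smaller than the dataset, and that prefetch placed after batch buffers whole batches, so a buffer_size of 1 is often enough.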
72 votes, 4 answers

What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?

I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to the TensorFlow type tf.data.Dataset. I am struggling to understand the difference between these two methods: Dataset.from_tensors and…
Llewlyn
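A sketch of the difference, assuming a small constant tensor: from_tensors wraps the whole tensor in a single element, while from_tensor_slices slices it along the first axis:

```python
import tensorflow as tf

data = tf.constant([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)

# from_tensors: ONE element containing the whole (3, 2) tensor.
whole = tf.data.Dataset.from_tensors(data)

# from_tensor_slices: one element PER ROW, i.e. three (2,) tensors.
rows = tf.data.Dataset.from_tensor_slices(data)
```

For a (num_features, num_examples) matrix as in the question, one would typically transpose first so that examples lie along the first axis and become the elements.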
52 votes, 8 answers

Split a dataset created by the TensorFlow Dataset API into Train and Test?

Does anyone know how to split a dataset created by the Dataset API (tf.data.Dataset) in TensorFlow into test and train sets?
Dani
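One common sketch (not the only approach) uses take and skip; shuffling with reshuffle_each_iteration=False keeps the two subsets disjoint even though each is a separate pass over the pipeline:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10).shuffle(
    10, seed=42, reshuffle_each_iteration=False)

train = ds.take(7)  # first 70% of the shuffled order
test = ds.skip(7)   # remaining 30%
```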
47 votes, 2 answers

tf.data with multiple inputs / outputs in Keras

For applications such as pair text similarity, the input data is similar to: pair_1, pair_2. In these problems, we usually have multiple inputs. Previously, I trained my models successfully with: model.fit([pair_1, pair_2], labels,…
Amir
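A minimal sketch of the tf.data equivalent of model.fit([pair_1, pair_2], labels, …): yield ((input_a, input_b), label) tuples and Keras matches the inner tuple to the model's inputs. The toy two-input model and the shapes below are assumptions, not the questioner's model:

```python
import numpy as np
import tensorflow as tf

pair_1 = np.random.rand(8, 4).astype("float32")
pair_2 = np.random.rand(8, 4).astype("float32")
labels = np.random.randint(0, 2, size=(8, 1)).astype("float32")

# Each element is ((input_a, input_b), label); Keras unpacks the inner
# tuple across the model's two inputs.
ds = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels)).batch(4)

in_a = tf.keras.Input(shape=(4,))
in_b = tf.keras.Input(shape=(4,))
out = tf.keras.layers.Dense(1, activation="sigmoid")(
    tf.keras.layers.concatenate([in_a, in_b]))
model = tf.keras.Model([in_a, in_b], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(ds, epochs=1, verbose=0)
```

A dict keyed by input-layer names works the same way as the inner tuple.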
43 votes, 15 answers

tf.data.Dataset: how to get the dataset size (number of elements in an epoch)?

Let's say I have defined a dataset in this way: filename_dataset = tf.data.Dataset.list_files("{}/*.png".format(dataset)). How can I get the number of elements that are inside the dataset (hence, the number of single elements that compose an…
nessuno
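A sketch of the two usual answers: cardinality() when the size can be determined statically, and counting by reduction when it cannot (e.g. after list_files or from_generator):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(42).batch(5)

# Known statically for many pipelines; for list_files / from_generator
# pipelines this may instead be tf.data.UNKNOWN_CARDINALITY.
n_batches = ds.cardinality()

# Fallback that always works, at the cost of one full pass over the data.
n_elements = ds.unbatch().reduce(0, lambda count, _: count + 1)
```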
43 votes, 3 answers

TensorFlow: training on my own image

I am new to TensorFlow and am looking for help with image recognition, where I can train on my own image dataset. Is there an example for training on a new dataset?
VICTOR
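A minimal sketch of a from-scratch image input pipeline: list file paths with labels, then decode and resize inside a map. The two synthetic PNGs and the 32×32 target size below are placeholders standing in for a real image folder:

```python
import os
import tempfile

import tensorflow as tf

# Placeholder data: two tiny black PNGs standing in for real images.
tmp_dir = tempfile.mkdtemp()
paths, labels = [], []
for i in range(2):
    path = os.path.join(tmp_dir, f"example_{i}.png")
    tf.io.write_file(path, tf.io.encode_png(tf.zeros([8, 8, 3], tf.uint8)))
    paths.append(path)
    labels.append(i)

def load(path, label):
    # Decode the file and normalize to float32 in [0, 1].
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    return tf.image.resize(image, [32, 32]), label

ds = (tf.data.Dataset.from_tensor_slices((paths, labels))
      .map(load, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(2))
```

Such a dataset can be passed straight to model.fit; for a folder-per-class layout, tf.keras.utils.image_dataset_from_directory builds a similar pipeline in one call.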
38 votes, 3 answers

parallelising tf.data.Dataset.from_generator

I have a non-trivial input pipeline that from_generator is perfect for… dataset = tf.data.Dataset.from_generator(complex_img_label_generator, (tf.int32, tf.string)) dataset = dataset.batch(64) iter =…
mat kelcey
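from_generator itself runs its Python generator serially, but several generator-backed datasets can run side by side via interleave. A sketch, assuming the work can be sharded across generator instances (the 4-way modulo sharding below is an assumption about the data):

```python
import tensorflow as tf

def gen(shard_index):
    # Each shard yields its own slice of the data
    # (here: the indices congruent to shard_index mod 4).
    for i in range(int(shard_index), 20, 4):
        yield i

# Four generator-backed datasets, evaluated in parallel and interleaved.
ds = tf.data.Dataset.range(4).interleave(
    lambda i: tf.data.Dataset.from_generator(
        gen, args=(i,),
        output_signature=tf.TensorSpec(shape=(), dtype=tf.int64)),
    cycle_length=4,
    num_parallel_calls=tf.data.AUTOTUNE)
```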
34 votes, 2 answers

How do I split Tensorflow datasets?

I have a TensorFlow dataset based on one .tfrecord file. How do I split the dataset into test and train datasets, e.g. 70% train and 30% test? Edit: my TensorFlow version is 1.8. I've checked; there is no "split_v" function as mentioned in the possible…
Lukas Hestermeyer
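When the dataset size is unknown up front (typical for a single .tfrecord file), one sketch is to route elements by index: enumerate, then filter on the index modulo 10 for an exact interleaved 70/30 split. Dataset.range stands in for the real TFRecordDataset here:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(20)  # stand-in for a TFRecordDataset

# Out of every 10 consecutive elements, send 7 to train and 3 to test.
train = (ds.enumerate()
           .filter(lambda i, x: i % 10 < 7)
           .map(lambda i, x: x))
test = (ds.enumerate()
          .filter(lambda i, x: i % 10 >= 7)
          .map(lambda i, x: x))
```

Unlike take/skip, this needs no size estimate, though both passes still read the whole file.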
31 votes, 6 answers

How to extract data/labels back from TensorFlow dataset

There are plenty of examples of how to create and use TensorFlow datasets, e.g. dataset = tf.data.Dataset.from_tensor_slices((images, labels)). My question is how to get the data/labels back from the TF dataset in NumPy form? In other words, what would…
Valentin
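In TF 2.x the usual answer is as_numpy_iterator(); a sketch that rebuilds the original arrays from a from_tensor_slices dataset:

```python
import numpy as np
import tensorflow as tf

images = np.arange(12, dtype="float32").reshape(4, 3)
labels = np.array([0, 1, 0, 1])
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# as_numpy_iterator() yields plain NumPy values, one (image, label) per element.
xs, ys = zip(*ds.as_numpy_iterator())
images_back = np.stack(xs)
labels_back = np.array(ys)
```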
28 votes, 0 answers

How to create only one copy of graph in tensorboard events file with custom tf.Estimator?

I'm using a custom tf.Estimator object to train a neural network. The problem is the size of the events file after training: it is unreasonably large. I've already solved the problem of saving part of the dataset as a constant by using…
27 votes, 4 answers

Difference between tf.data.Dataset.map() and tf.data.Dataset.apply()

With the recent upgrade to version 1.4, TensorFlow included tf.data in the library core. One "major new feature" described in the 1.4 release notes is tf.data.Dataset.apply(), a "method for applying custom transformation functions".…
GPhilo
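The distinction in a sketch: map transforms each element, while apply transforms the dataset as a whole (its argument is a function from Dataset to Dataset):

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# map(): element -> element.
doubled = ds.map(lambda x: x * 2)

# apply(): Dataset -> Dataset, useful for whole-pipeline transformations
# that cannot be expressed one element at a time.
def drop_odd(dataset):
    return dataset.filter(lambda x: x % 2 == 0)

evens = ds.apply(drop_odd)
```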
23 votes, 4 answers

How to input a list of lists with different sizes in tf.data.Dataset

I have a long list of lists of integers (representing sentences, each of a different size) that I want to feed using the tf.data library. Each list (of the lists of lists) has a different length, and I get an error, which I can reproduce here: t =…
Escachator
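Variable-length rows cannot be stacked into a dense tensor directly; one sketch is to feed them through from_generator with a (None,)-shaped signature and pad per batch (the three toy sentences are placeholders):

```python
import tensorflow as tf

sentences = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]  # rows of different lengths

ds = tf.data.Dataset.from_generator(
    lambda: iter(sentences),
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32))

# Each batch is padded (with zeros by default) to its longest row.
batched = ds.padded_batch(3, padded_shapes=(None,))
```

A TF 2.x alternative that avoids padding entirely is tf.data.Dataset.from_tensor_slices(tf.ragged.constant(sentences)).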
21 votes, 2 answers

How to improve data input pipeline performance?

I am trying to optimize my data input pipeline. The dataset is a set of 450 TFRecord files of ~70 MB each, hosted on GCS. The job is executed with GCP ML Engine; there is no GPU. Here is the pipeline: def build_dataset(file_pattern): return…
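A sketch of the standard levers for a TFRecord pipeline shaped like the build_dataset above: parallel file reads via interleave, parallel parsing in map, and prefetch at the end. The two synthetic shard files and the trivial scalar records are stand-ins for the real 450 files:

```python
import os
import tempfile

import tensorflow as tf

# Stand-in data: two tiny TFRecord shards of serialized scalar tensors.
tmp_dir = tempfile.mkdtemp()
for shard in range(2):
    path = os.path.join(tmp_dir, f"shard_{shard}.tfrecord")
    with tf.io.TFRecordWriter(path) as w:
        for i in range(5):
            w.write(tf.io.serialize_tensor(tf.constant(i)).numpy())

def build_dataset(file_pattern):
    return (tf.data.Dataset.list_files(file_pattern, shuffle=True)
            # Read several files concurrently instead of sequentially.
            .interleave(tf.data.TFRecordDataset,
                        num_parallel_calls=tf.data.AUTOTUNE)
            # Parse records in parallel; ensure_shape pins the static shape.
            .map(lambda r: tf.ensure_shape(tf.io.parse_tensor(r, tf.int32), []),
                 num_parallel_calls=tf.data.AUTOTUNE)
            .batch(4)
            # Overlap input production with model computation.
            .prefetch(tf.data.AUTOTUNE))

ds = build_dataset(os.path.join(tmp_dir, "shard_*.tfrecord"))
```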
21 votes, 2 answers

TensorFlow - tf.data.Dataset reading large HDF5 files

I am setting up a TensorFlow pipeline for reading large HDF5 files as input to my deep learning models. Each HDF5 file contains 100 videos of variable length, stored as collections of compressed JPG images (to keep the size on disk manageable).…
verified.human
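The usual pattern is one from_generator per file, combined with interleave; with real data the generator body would open the file with h5py and decode the JPG frames. The stand-in below fabricates variable-length "videos" so the sketch stays self-contained:

```python
import tensorflow as tf

def videos_from_file(path):
    # Stand-in: a real implementation would open `path` with h5py here
    # and yield one decoded frame stack per video.
    for length in (3, 5):  # videos of variable length
        yield tf.zeros([length, 2, 2, 3], tf.float32)

files = tf.data.Dataset.from_tensor_slices(["a.h5", "b.h5"])
ds = files.interleave(
    lambda p: tf.data.Dataset.from_generator(
        videos_from_file, args=(p,),
        output_signature=tf.TensorSpec(shape=(None, 2, 2, 3),
                                       dtype=tf.float32)),
    cycle_length=2)
```

The (None, …) leading dimension in the signature is what lets each element carry a different number of frames.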
20 votes, 1 answer

TensorFlow tf.data AUTOTUNE

I was reading the TF performance guide's Data Loading section. For prefetch it says: The tf.data API provides a software pipelining mechanism through the tf.data.Dataset.prefetch transformation, which can be used to decouple the time when…
dgumo
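Passing tf.data.AUTOTUNE (tf.data.experimental.AUTOTUNE before TF 2.4) as buffer_size or num_parallel_calls asks the tf.data runtime to tune the value dynamically instead of hand-picking it; a minimal sketch:

```python
import tensorflow as tf

ds = (tf.data.Dataset.range(100)
      .map(lambda x: x * x, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(10)
      # The prefetch buffer size is tuned at runtime, not hand-picked.
      .prefetch(tf.data.AUTOTUNE))
```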