TensorFlow's `tf.data` module provides a functional API for building input pipelines, using the `tf.data.Dataset` and `tf.data.Iterator` classes.
Questions tagged [tensorflow-datasets]
1667 questions
111 votes · 5 answers
Meaning of buffer_size in Dataset.map, Dataset.prefetch and Dataset.shuffle
As per the TensorFlow documentation, the prefetch and map methods of the tf.contrib.data.Dataset class both have a parameter called buffer_size.
For the prefetch method, the parameter is known as buffer_size and according to the documentation:
buffer_size: A…
— Ujjwal (1,438)
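The short version of the accepted answer: buffer_size means something different in each transformation. A minimal sketch on a toy dataset (not the question's own code) illustrating the two most commonly confused ones:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# shuffle: buffer_size is the size of the in-memory pool that elements are
# randomly drawn from; only a buffer_size >= the dataset size gives a full
# uniform shuffle.
ds = ds.shuffle(buffer_size=10, seed=0)

# prefetch: buffer_size is how many elements to prepare ahead of the consumer,
# overlapping pipeline work with the training step.
ds = ds.prefetch(buffer_size=1)

values = sorted(int(x) for x in ds)
print(values)  # all ten elements survive; only the iteration order changes
```

Neither transformation drops or duplicates data; shuffle changes the order, prefetch only changes when elements are computed.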
72 votes · 4 answers
What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?
I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to a TensorFlow tf.data.Dataset.
I am struggling to understand the difference between these two methods: Dataset.from_tensors and…
— Llewlyn (1,143)
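The distinction is easiest to see on a small tensor. A sketch (toy data, not the asker's matrix):

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])

# from_tensors: the whole tensor becomes a single dataset element.
whole = tf.data.Dataset.from_tensors(t)         # 1 element of shape (2, 2)

# from_tensor_slices: the tensor is sliced along axis 0, one element per row.
slices = tf.data.Dataset.from_tensor_slices(t)  # 2 elements of shape (2,)

n_whole = int(whole.cardinality())
n_slices = int(slices.cardinality())
```

For a matrix of shape (num_features, num_examples), from_tensor_slices would slice along the feature axis, so transpose first if each element should be one example.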
52 votes · 8 answers
Split a dataset created by the TensorFlow Dataset API into Train and Test?
Does anyone know how to split a dataset created by the Dataset API (tf.data.Dataset) in TensorFlow into Test and Train?
— Dani (567)
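The usual answer is take/skip. A sketch on a toy dataset; note reshuffle_each_iteration=False, without which the two halves can overlap across epochs:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(100)
# Shuffle once with a fixed order so take/skip produce disjoint splits.
ds = ds.shuffle(100, seed=0, reshuffle_each_iteration=False)

n = int(ds.cardinality())
n_train = int(n * 0.7)
train = ds.take(n_train)   # first 70% of the shuffled order
test = ds.skip(n_train)    # remaining 30%

train_vals = [int(x) for x in train]
test_vals = [int(x) for x in test]
```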
47 votes · 2 answers
tf.data with multiple inputs / outputs in Keras
For applications such as pair text similarity, the input data is similar to: pair_1, pair_2. In these problems, we usually have multiple inputs. Previously, I implemented my models successfully:
model.fit([pair_1, pair_2], labels,…
— Amir (13,841)
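The key is to structure each dataset element as ((input_1, input_2), label), so Keras maps the inner tuple onto the model's two inputs. A sketch with hypothetical toy arrays standing in for pair_1 / pair_2:

```python
import numpy as np
import tensorflow as tf

pair_1 = np.random.rand(8, 4).astype("float32")
pair_2 = np.random.rand(8, 4).astype("float32")
labels = np.random.randint(0, 2, (8, 1)).astype("float32")

# Nested structure: ((x1, x2), y) per element.
ds = tf.data.Dataset.from_tensor_slices(((pair_1, pair_2), labels)).batch(4)

# Two-input functional model.
in1 = tf.keras.Input(shape=(4,))
in2 = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Concatenate()([in1, in2])
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model([in1, in2], out)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(ds, epochs=1, verbose=0)
```

Multiple outputs work the same way: replace the label with a tuple (or dict keyed by output names).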
43 votes · 15 answers
tf.data.Dataset: how to get the dataset size (number of elements in an epoch)?
Let's say I have defined a dataset in this way:
filename_dataset = tf.data.Dataset.list_files("{}/*.png".format(dataset))
how can I get the number of elements that are inside the dataset (hence, the number of single elements that compose an…
— nessuno (23,549)
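In current TensorFlow the first thing to try is Dataset.cardinality(); it is statically known for many transformations, but can return UNKNOWN_CARDINALITY for sources like list_files, where counting by iteration is the fallback. A sketch:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(42).batch(8)

# Statically known for range/batch: ceil(42 / 8) = 6 batches.
n_batches = int(ds.cardinality())

# Fallback when cardinality is UNKNOWN (e.g. list_files + filter):
counted = sum(1 for _ in ds)
```

The fallback consumes a full epoch of the pipeline, so it can be expensive for large datasets.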
43 votes · 3 answers
TensorFlow: training on my own images
I am new to TensorFlow. I am looking for help with image recognition, where I can train on my own image dataset.
Is there any example of training on a new dataset?
— VICTOR (1,724)
38 votes · 3 answers
parallelising tf.data.Dataset.from_generator
I have a non-trivial input pipeline that from_generator is perfect for...
dataset = tf.data.Dataset.from_generator(complex_img_label_generator,
                                         (tf.int32, tf.string))
dataset = dataset.batch(64)
iter =…
— mat kelcey (2,977)
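from_generator runs the Python generator serially, so it cannot be parallelised directly. The standard workaround is to keep the generator cheap (yield ids only) and move the heavy per-element work into a parallel map. A sketch with a hypothetical heavy_load function standing in for image decoding:

```python
import tensorflow as tf

# Cheap generator: yields lightweight ids, runs in a single Python thread.
def id_generator():
    for i in range(8):
        yield i

ds = tf.data.Dataset.from_generator(
    id_generator,
    output_signature=tf.TensorSpec(shape=(), dtype=tf.int64))

# Heavy work happens here, where tf.data can run it in parallel.
def heavy_load(i):  # hypothetical stand-in for decode/augment
    return tf.cast(i, tf.float32) * 2.0

ds = ds.map(heavy_load, num_parallel_calls=tf.data.AUTOTUNE).batch(4)
results = [b.numpy().tolist() for b in ds]
```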
34 votes · 2 answers
How do I split Tensorflow datasets?
I have a TensorFlow dataset based on one .tfrecord file. How do I split the dataset into test and train datasets? E.g. 70% train and 30% test?
Edit:
My TensorFlow version: 1.8
I've checked, and there is no "split_v" function as mentioned in the possible…
— Lukas Hestermeyer (453)
31 votes · 6 answers
How to extract data/labels back from TensorFlow dataset
There are plenty of examples of how to create and use TensorFlow datasets, e.g.
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
My question is: how do I get the data/labels back from the TF dataset in NumPy form? In other words, what would…
— Valentin (976)
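In eager TensorFlow, Dataset.as_numpy_iterator() does exactly this. A round-trip sketch on toy arrays:

```python
import numpy as np
import tensorflow as tf

images = np.arange(12, dtype=np.float32).reshape(4, 3)
labels = np.array([0, 1, 0, 1])
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# as_numpy_iterator() yields plain NumPy values, one (image, label) per element.
xs, ys = zip(*ds.as_numpy_iterator())
images_back = np.stack(xs)
labels_back = np.array(ys)
```

This materialises the whole dataset in memory, so it is only practical for datasets that fit in RAM.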
28 votes · 0 answers
How to create only one copy of graph in tensorboard events file with custom tf.Estimator?
I'm using a custom tf.Estimator object to train a neural network. The problem is the size of the events file after training - it is unreasonably large.
I've already solved the problem with saving part of a dataset as constant by using…
— Andrii Zadaianchuk (700)
27 votes · 4 answers
Difference between tf.data.Dataset.map() and tf.data.Dataset.apply()
With the recent upgrade to version 1.4, TensorFlow included tf.data in the library core.
One "major new feature" described in the version 1.4 release notes is tf.data.Dataset.apply(), which is a "method for applying custom transformation functions".…
— GPhilo (15,115)
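The difference in one line: map takes a function from one element to one element, apply takes a function from a whole dataset to a dataset. A sketch:

```python
import tensorflow as tf

ds = tf.data.Dataset.range(5)

# map: transforms each ELEMENT independently.
doubled = ds.map(lambda x: x * 2)

# apply: transforms the whole DATASET, for operations that cannot be
# expressed element-wise (filtering, custom batching schemes, ...).
def keep_even(dataset):
    return dataset.filter(lambda x: x % 2 == 0)

evens = ds.apply(keep_even)
```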
23 votes · 4 answers
How to input a list of lists with different sizes in tf.data.Dataset
I have a long list of lists of integers (representing sentences, each of a different length) that I want to feed using the tf.data library. Each list (of the list of lists) has a different length, and I get an error, which I can reproduce here:
t =…
— Escachator (1,283)
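One common route for variable-length rows is a generator with a (None,)-shaped output_signature, then padded_batch to pad each batch to its longest row. A sketch with hypothetical toy sentences (tf.ragged.constant plus from_tensor_slices is an alternative route):

```python
import tensorflow as tf

sentences = [[1, 2, 3], [4, 5], [6]]

def gen():
    for s in sentences:
        yield s

# shape=(None,) declares a variable-length 1-D element.
ds = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32))

# padded_batch pads with 0 up to the longest row in each batch.
batch = next(iter(ds.padded_batch(3, padded_shapes=[None])))
```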
21 votes · 2 answers
How to improve data input pipeline performance?
I am trying to optimize my data input pipeline.
The dataset is a set of 450 TFRecord files of size ~70MB each, hosted on GCS.
The job is executed with GCP ML Engine. There is no GPU.
Here is the pipeline:
def build_dataset(file_pattern):
    return…
— AlexisBRENON (2,516)
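The usual shape of the answer for sharded TFRecord input: interleave file reads, parse in parallel, prefetch. A self-contained sketch that writes two tiny TFRecord shards with a hypothetical single-int schema (not the question's real records) so it runs end to end:

```python
import os
import tempfile
import tensorflow as tf

# Write two tiny shards so the sketch is runnable.
tmp = tempfile.mkdtemp()
for shard in range(2):
    path = os.path.join(tmp, f"data-{shard}.tfrecord")
    with tf.io.TFRecordWriter(path) as w:
        for i in range(2):
            ex = tf.train.Example(features=tf.train.Features(feature={
                "x": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[shard * 2 + i]))}))
            w.write(ex.SerializeToString())

def parse(record):
    return tf.io.parse_single_example(
        record, {"x": tf.io.FixedLenFeature([], tf.int64)})["x"]

def build_dataset(file_pattern):
    files = tf.data.Dataset.list_files(file_pattern, shuffle=False)
    # interleave: read several shards concurrently.
    ds = files.interleave(tf.data.TFRecordDataset,
                          num_parallel_calls=tf.data.AUTOTUNE)
    # Parallel parsing, then prefetch to overlap with the training step.
    ds = ds.map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(2).prefetch(tf.data.AUTOTUNE)

values = sorted(int(v)
                for batch in build_dataset(os.path.join(tmp, "*.tfrecord"))
                for v in batch)
```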
21 votes · 2 answers
TensorFlow - tf.data.Dataset reading large HDF5 files
I am setting up a TensorFlow pipeline for reading large HDF5 files as input for my deep learning models. Each HDF5 file contains 100 videos of variable length, stored as collections of compressed JPG images (to keep the size on disk manageable).…
— verified.human (1,197)
20 votes · 1 answer
Tensorflow tf.data AUTOTUNE
I was reading the Data Loading section of the TF performance guide. For prefetch it says:
The tf.data API provides a software pipelining mechanism through the tf.data.Dataset.prefetch transformation, which can be used to decouple the time when…
— dgumo (1,568)
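tf.data.AUTOTUNE (tf.data.experimental.AUTOTUNE before TF 2.4) tells the runtime to pick the buffer size or parallelism level dynamically instead of hard-coding it. A minimal sketch:

```python
import tensorflow as tf

# AUTOTUNE lets tf.data tune the prefetch buffer and map parallelism at
# runtime, overlapping pipeline work with the consuming training step.
ds = (tf.data.Dataset.range(10)
      .map(lambda x: x + 1, num_parallel_calls=tf.data.AUTOTUNE)
      .prefetch(tf.data.AUTOTUNE))

result = [int(x) for x in ds]
```

Parallel map is deterministic by default, so element order is preserved even with AUTOTUNE.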