How to extract data/labels back from TensorFlow dataset

Question

There are plenty of examples of how to create and use TensorFlow datasets, e.g.

dataset = tf.data.Dataset.from_tensor_slices((images, labels))

My question is how to get the data/labels back from the TF dataset in NumPy form. In other words, what would be the reverse operation of the line above? That is, I have a TF dataset and want to get the images and labels back from it.

score 22 · Answer 1 · edited Nov 04 '20 at 09:34


In case your tf.data.Dataset is batched, the following code will retrieve all the y labels:

y = np.concatenate([y for x, y in ds], axis=0)
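
The same pattern extends to features and labels together. What follows is only a minimal sketch, assuming TensorFlow 2.x with eager execution and a batched dataset of (features, labels) pairs. The toy arrays and the names x_all and y_all are made up for illustration:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data standing in for real images and labels.
images = np.arange(40, dtype=np.float32).reshape(10, 4)
labels = np.arange(10, dtype=np.int64)
ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(3)

# One pass over the dataset: collect each batch as NumPy arrays.
x_batches, y_batches = [], []
for x, y in ds.as_numpy_iterator():
    x_batches.append(x)
    y_batches.append(y)

# Join the batches along the batch axis (the last batch may be smaller).
x_all = np.concatenate(x_batches, axis=0)
y_all = np.concatenate(y_batches, axis=0)
print(x_all.shape, y_all.shape)
```

Collecting both lists in one loop keeps features and labels aligned even if the dataset reshuffles on every iteration.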

answered Jul 09 '20 at 20:30 by kawingkelvin · edited by Jaroslav Bezděk


Elegant and pythonic! +1 – Tim Mironov Jan 17 '21 at 22:11
@TimMironov Thanks. I could also have used _ for the x in that one-liner. Actually, I think there's a downside if you want to extract both x and y. I haven't yet figured out if you can do it in a similar one-liner. – kawingkelvin Jan 20 '21 at 23:26
This should be the answer. thanks. – Sahar Millis Feb 13 '21 at 03:56

score 14 · Answer 2 · answered Aug 27 '19 at 14:25

Supposing your tf.data.Dataset is called train_dataset, with eager execution enabled you can retrieve images and labels like this:

for images, labels in train_dataset.take(1):  # only take first element of dataset
    numpy_images = images.numpy()
    numpy_labels = labels.numpy()

The .numpy() method converts tf.Tensors into NumPy arrays.
If you want to retrieve more elements of the dataset, just increase the number passed to take. If you want all elements, pass -1.

Note that when the dataset is batched, take(count) returns count batches of images rather than count individual images. – Mr. Duhart, Jul 27 '20 at 20:25
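
To make the batching caveat concrete, here is a small sketch, assuming TensorFlow 2.x with eager execution. The toy data and the variable names batch_shape and single_shape are illustrative only. It contrasts take(1) on a batched dataset with the same call after unbatch():

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 8 "images" of shape 2x2 with integer labels.
images = np.zeros((8, 2, 2), dtype=np.float32)
labels = np.arange(8, dtype=np.int64)
train_dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)

# On a batched dataset, take(1) yields one batch, not one image.
for batch_images, batch_labels in train_dataset.take(1):
    batch_shape = batch_images.numpy().shape

# unbatch() splits the batches back into individual elements first.
for image, label in train_dataset.unbatch().take(1):
    single_shape = image.numpy().shape

print(batch_shape, single_shape)
```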

Answer 3 · answered May 20 '19 at 19:28 by Dylan

I think we get a good example here:

https://colab.research.google.com/github/tensorflow/datasets/blob/master/docs/overview.ipynb#scrollTo=BC4pEXtkp4K-

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

# mnist_train is a tf.data.Dataset
mnist_train = tfds.load(name="mnist", split=tfds.Split.TRAIN)
assert isinstance(mnist_train, tf.data.Dataset)

mnist_example, = mnist_train.take(1)
image, label = mnist_example["image"], mnist_example["label"]

plt.imshow(image.numpy()[:, :, 0].astype(np.float32), cmap=plt.get_cmap("gray"))
print("Label: %d" % label.numpy())

So each element of the dataset can be accessed much like a dictionary. Presumably different datasets have different field names (Boston housing won't have 'image' and 'label', but might have 'features' and 'target' or 'price'):

cnn = tfds.load(name="cnn_dailymail", split=tfds.Split.TRAIN)
assert isinstance(cnn, tf.data.Dataset)
cnn_ex, = cnn.take(1)
print(cnn_ex)

This prints a dict with keys such as 'article' and 'highlights', each holding a string tensor.
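
If you are unsure which field names a dataset uses, you can inspect them before indexing. The sketch below assumes TensorFlow 2.x. To avoid a download it builds a small dict-structured dataset locally instead of calling tfds.load, so the field names and shapes are hypothetical stand-ins:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for a dataset whose elements are dicts,
# built locally so no download is needed.
features = {
    "image": np.zeros((5, 28, 28, 1), dtype=np.uint8),
    "label": np.arange(5, dtype=np.int64),
}
ds = tf.data.Dataset.from_tensor_slices(features)

# element_spec describes the structure of one element without iterating.
field_names = sorted(ds.element_spec.keys())

# as_numpy_iterator() yields each element as a dict of NumPy values.
first = next(iter(ds.as_numpy_iterator()))
print(field_names, first["image"].shape, first["label"])
```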

score 1 · Answer 4 · answered May 21 '19 at 12:50

Here is my own solution to the problem:

import tensorflow as tf  # written for TensorFlow 1.x (graph mode)

def dataset2numpy(dataset, steps=1):
    """Helper function to get data/labels back from a TF dataset."""
    iterator = dataset.make_one_shot_iterator()
    next_val = iterator.get_next()
    with tf.Session() as sess:
        for _ in range(steps):
            inputs, labels = sess.run(next_val)
            yield inputs, labels

Please note that this function yields the inputs/labels of one dataset batch at a time. The steps argument controls how many batches are taken from the dataset.
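
The generator above relies on make_one_shot_iterator and tf.Session, which belong to the TensorFlow 1.x graph-mode API. Under TensorFlow 2.x with eager execution, a roughly equivalent sketch might look like this. The name dataset_to_numpy and the toy dataset are illustrative, not part of the original answer:

```python
import numpy as np
import tensorflow as tf

def dataset_to_numpy(dataset, steps=1):
    """Yield up to `steps` (inputs, labels) batches as NumPy arrays."""
    for inputs, labels in dataset.take(steps).as_numpy_iterator():
        yield inputs, labels

# Hypothetical toy dataset with 3 batches of 2 examples each.
data = np.arange(12, dtype=np.float32).reshape(6, 2)
targets = np.arange(6, dtype=np.int64)
ds = tf.data.Dataset.from_tensor_slices((data, targets)).batch(2)

batches = list(dataset_to_numpy(ds, steps=2))
print(len(batches), batches[0][0].shape)
```

As in the original helper, each yielded item is a whole batch if the dataset is batched.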

score 1 · Answer 5 · answered Jun 09 '20 at 09:35

This worked for me:

features = np.array([list(x[0].numpy()) for x in list(ds_test)])
labels = np.array([x[1].numpy() for x in list(ds_test)])



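# NOTE: ds_test was created as follows. as_supervised=True is an addition here (assumption):
# it makes each element a (features, label) tuple, which the x[0] / x[1] indexing above needs.
iris, iris_info = tfds.load('iris', with_info=True, as_supervised=True)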
ds_orig = iris['train']
ds_orig = ds_orig.shuffle(150, reshuffle_each_iteration=False)
ds_train = ds_orig.take(100)
ds_test = ds_orig.skip(100)

score 0 · Answer 6 · answered Dec 29 '20 at 22:06

If you are OK with keeping the images and labels as tf.Tensors, you can do

images, labels = tuple(zip(*dataset))

Think of the effect of the dataset as zip(images, labels). When we want to get images and labels back, we can simply unzip it.

If you need the numpy array version, convert them using np.array():

images = np.array(images)
labels = np.array(labels)
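
Put together, the unzip idea can be sketched end to end as follows, assuming TensorFlow 2.x and an unbatched dataset of (image, label) pairs. The toy arrays images_in and labels_in are hypothetical, and as_numpy_iterator() is used here so the elements arrive as NumPy values directly:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data in place of real images and labels.
images_in = np.arange(24, dtype=np.float32).reshape(6, 2, 2)
labels_in = np.arange(6, dtype=np.int64)
dataset = tf.data.Dataset.from_tensor_slices((images_in, labels_in))

# "Unzip": iterate once and regroup the pairs into two tuples.
images, labels = zip(*dataset.as_numpy_iterator())

# Stack the per-example values into single arrays.
images = np.stack(images)
labels = np.stack(labels)
print(images.shape, labels.shape)
```

If the dataset is batched, each item is a whole batch, so np.concatenate(..., axis=0) would be the better fit than np.stack.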
