
I'm using the tf.data.Dataset API of TensorFlow 1.12, as in this Q&A, to read several .h5 files (one pre-saved batch per file) from a directory. I first made a generator:

import h5py

class generator_yield:
    def __init__(self, file):
        self.file = file

    def __call__(self):
        # yield the whole pre-saved batch stored in this .h5 file
        with h5py.File(self.file, 'r') as f:
            yield f['X'][:], f['y'][:]

Then I make a list of filenames and pass them into a Dataset:

import os
import tensorflow as tf

def _fnamesmaker(dir, mode='h5'):
    # collect the absolute paths of every .h5 file under `dir`
    fnames = []
    for dirpath, _, filenames in os.walk(dir):
        for fname in filenames:
            if fname.endswith(mode):
                fnames.append(os.path.abspath(os.path.join(dirpath, fname)))
    return fnames

fnames = _fnamesmaker('./')
len_fnames = len(fnames)
fnames = tf.data.Dataset.from_tensor_slices(fnames)

Then I apply the interleave method of the Dataset:

# handle multiple files
ds = fnames.interleave(
    lambda filename: tf.data.Dataset.from_generator(
        generator_yield(filename),
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape([100, 100, 1]), tf.TensorShape([100, 100, 1]))),
    cycle_length=len_fnames)
ds = ds.batch(5).shuffle(5).prefetch(5)

# init iterator
it = ds.make_initializable_iterator()
init_op = it.initializer
X_it, y_it = it.get_next()

Model:

# model
with tf.name_scope("Conv1"):
    W = tf.get_variable("W", shape=[3, 3, 1, 1],
                         initializer=tf.contrib.layers.xavier_initializer())
    b = tf.get_variable("b", shape=[1], initializer=tf.contrib.layers.xavier_initializer())
    layer1 = tf.nn.conv2d(X_it, W, strides=[1, 1, 1, 1], padding='SAME') + b
    logits = tf.nn.relu(layer1)


    loss = tf.reduce_mean(tf.losses.mean_squared_error(labels=y_it, predictions=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)

Start session:

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), init_op])
    while True:
        try:
            # train_op itself returns None, so also fetch the loss to print
            _, loss_val = sess.run([train_op, loss])
            print(loss_val)
        except tf.errors.OutOfRangeError:
            print('done.')
            break

The error looks like:

TypeError: expected str, bytes or os.PathLike object, not Tensor

It occurs in the __init__ method of the generator. Apparently, when interleave is applied, it is a Tensor (not a string) that gets passed through to the generator.
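For reference, the usual way around this is to forward the filename through the args keyword of from_generator, which evaluates the tensor and calls the generator with a plain bytes value at runtime. A minimal sketch, assuming a TF 1.x build whose from_generator accepts args and reusing fnames and len_fnames from above (generator_from_file is an illustrative name):

def generator_from_file(path):
    # with args=..., `path` arrives as bytes at runtime, not as a Tensor
    with h5py.File(path.decode('utf-8'), 'r') as f:
        yield f['X'][:], f['y'][:]

ds = fnames.interleave(
    lambda filename: tf.data.Dataset.from_generator(
        generator_from_file,
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape([100, 100, 1]), tf.TensorShape([100, 100, 1])),
        args=(filename,)),
    cycle_length=len_fnames)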

– Zézouille

2 Answers


You cannot run the dataset object directly through sess.run. You have to define an iterator and get the next element. Try something like:

next_elem = files.make_one_shot_iterator().get_next()
data = sess.run(next_elem)

You should be able to get your tensors.
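For instance, a minimal sketch of that pattern, assuming the ds pipeline from the question builds successfully (note the parentheses on make_one_shot_iterator()):

next_elem = ds.make_one_shot_iterator().get_next()

with tf.Session() as sess:
    while True:
        try:
            X_batch, y_batch = sess.run(next_elem)
            print(X_batch.shape, y_batch.shape)
        except tf.errors.OutOfRangeError:
            break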

– kvish
  • @Zeliang Su All the dataset objects need to use some iterator to get the elements. All the methods are transformations or aggregations, and their results are accessible only through the iterator. Check out the guide on [Importing data](https://www.tensorflow.org/guide/datasets) for more clarity on how the API is structured. – kvish Feb 22 '19 at 15:53

According to this post, my case won't benefit in performance from parallel_interleave:

...have a transformation that transforms each element of a source dataset into multiple elements into the destination dataset...

It's more relevant to the typical classification problem, where data (dog, cat, ...) are saved in separate directories. Here we have a segmentation problem, which means that a label has the same dimensions as its input image. All data are stored in one directory, and each .h5 file contains an image and its labels (masks).

Here, a simple map with num_parallel_calls is sufficient.
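A minimal sketch of that map-based pipeline, assuming TF 1.12's tf.py_func and the same 100x100x1 shapes as above (_load_h5 and the parallelism value are illustrative):

import h5py
import numpy as np
import tensorflow as tf

def _load_h5(path):
    # tf.py_func hands the filename over as bytes
    with h5py.File(path.decode('utf-8'), 'r') as f:
        return f['X'][:].astype(np.float32), f['y'][:].astype(np.float32)

# `fnames` here is the plain list of paths returned by _fnamesmaker
ds = tf.data.Dataset.from_tensor_slices(fnames)
ds = ds.map(lambda p: tf.py_func(_load_h5, [p], (tf.float32, tf.float32)),
            num_parallel_calls=4)
# tf.py_func drops static shape information, so restore it explicitly
ds = ds.map(lambda X, y: (tf.reshape(X, [100, 100, 1]),
                          tf.reshape(y, [100, 100, 1])))
ds = ds.shuffle(5).batch(5).prefetch(5)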

– Zézouille