
I am trying to replicate the behaviour of tf.nn.dynamic_rnn using the low-level API tf.nn.raw_rnn. To do so, I use the same batch of data, set the random seed, and use the same hparams to create the cell and the recurrent neural network. However, the outputs generated by the two implementations are not equal to each other. Below are the data and the code.

The data and lengths:

import numpy as np
import tensorflow as tf

X = np.array([[[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [0.0, 0.0, 0.0]],
              [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]],
              [[1.1, 2.2, 3.3], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]], dtype=np.float32)
X_len = np.array([2, 3, 1], dtype=np.int32)

The tf.nn.dynamic_rnn implementation:

tf.reset_default_graph()
tf.set_random_seed(42)

inputs = tf.placeholder(shape=(3, None, 3), dtype=tf.float32)
lengths = tf.placeholder(shape=(None,), dtype=tf.int32)

lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
outputs, state = tf.nn.dynamic_rnn(inputs=inputs, sequence_length=lengths, cell=lstm_cell, dtype=tf.float32,
                                   initial_state=lstm_cell.zero_state(3, dtype=tf.float32), time_major=True)
outputs_reshaped = tf.transpose(outputs, perm=[1, 0, 2])

sess = tf.Session()
sess.run(tf.initializers.global_variables())
X = np.transpose(X, (1, 0, 2))
hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: X, lengths: X_len})
print(hidden_state)

The tf.nn.raw_rnn implementation:

tf.reset_default_graph()
tf.set_random_seed(42)

inputs = tf.placeholder(shape=(3, None, 3), dtype=tf.float32)
lengths = tf.placeholder(shape=(None,), dtype=tf.int32)

inputs_ta = tf.TensorArray(dtype=tf.float32, size=3)
inputs_ta = inputs_ta.unstack(inputs)

lstm_cell = tf.nn.rnn_cell.LSTMCell(5)

def loop_fn(time, cell_output, cell_state, loop_state):
    emit_output = cell_output  # == None for time == 0
    if cell_output is None:  # time == 0
        next_cell_state = lstm_cell.zero_state(3, tf.float32)
    else:
        next_cell_state = cell_state

    elements_finished = (time >= lengths)
    finished = tf.reduce_all(elements_finished)
    next_input = tf.cond(finished, true_fn=lambda: tf.zeros([3, 3], dtype=tf.float32),
                         false_fn=lambda: inputs_ta.read(time))

    next_loop_state = None

    return (elements_finished, next_input, next_cell_state, emit_output, next_loop_state)

outputs_ta, final_state, _ = tf.nn.raw_rnn(lstm_cell, loop_fn)
outputs_reshaped = tf.transpose(outputs_ta.stack(), perm=[1, 0, 2])

sess = tf.Session()
sess.run(tf.initializers.global_variables())

X = np.transpose(X, (1, 0, 2))
hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: X, lengths: X_len})

print(hidden_state)

I am sure there is some discrepancy between the two, but I am not able to figure out where it is or what causes it. If anyone has an idea, that would be awesome.

Looking forward to your answers!

1 Answer


The reason for the discrepancy is that your variables are initialised to different values. You can see this by calling:

print(sess.run(tf.trainable_variables()))

after they have been initialised.

The reason for this discrepancy is that TensorFlow has both a graph-level (global) seed and a per-op seed, so setting the global random seed does not force the initialiser call buried in the LSTM code to use the same seed. Refer to this answer for more details. To summarise: the seed used for anything random starts from your global seed and then depends on the id of the last operation added to the graph.
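To illustrate (a minimal sketch assuming the TF 1.x seeding behaviour described above; the helper sample and the extra tf.constant op are only for demonstration): adding one extra op to the graph before the random op changes its op-level seed, and therefore its values, even though the global seed is identical:

import tensorflow as tf

def sample(extra_op):
    tf.reset_default_graph()
    tf.set_random_seed(42)
    if extra_op:
        _ = tf.constant(0.0)   # one extra op shifts the id used to derive the op-level seed
    x = tf.random_normal([2])  # no explicit op-level seed given
    with tf.Session() as sess:
        return sess.run(x)

print(sample(extra_op=False))
print(sample(extra_op=True))   # different values despite the same global seed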

Knowing this, we can force the variable seeds to be the same across both implementations by building the graph in exactly the same order up until the variables are constructed: both graphs then start from the same global seed and add the same operations in the same order before the variables are created, so the variables end up with the same op-level seed. We can do this like so:

tf.reset_default_graph()
tf.set_random_seed(42)
lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
inputs_shape = (3, None, 3)
lstm_cell.build(inputs_shape)

The build method is needed as this is what actually adds the variables to the graph.
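As a quick sanity check (a sketch under the same assumptions as above), you can print the trainable variables before and after the call to build to see that this is the point where the cell's kernel and bias are created:

tf.reset_default_graph()
tf.set_random_seed(42)
lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
print(tf.trainable_variables())  # [] -- no variables yet
lstm_cell.build((3, None, 3))
print(tf.trainable_variables())  # the cell's kernel and bias now exist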

Here is the full working version of what you had:

import tensorflow as tf
import numpy as np


X = np.array([[[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [0.0, 0.0, 0.0]],
              [[1.1, 2.2, 3.3], [4.4, 5.5, 6.6], [7.7, 8.8, 9.9]],
              [[1.1, 2.2, 3.3], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]], dtype=np.float32)
X_len = np.array([2, 3, 1], dtype=np.int32)


def dynamic():
    tf.reset_default_graph()
    tf.set_random_seed(42)
    # Create and build the cell before anything else so its variables are added
    # to the graph at the same point in both implementations (same op-level seed).
    lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
    inputs_shape = (3, None, 3)
    lstm_cell.build(inputs_shape)

    inputs = tf.placeholder(shape=inputs_shape, dtype=tf.float32)
    lengths = tf.placeholder(shape=(None,), dtype=tf.int32)

    outputs, state = tf.nn.dynamic_rnn(inputs=inputs, sequence_length=lengths, cell=lstm_cell, dtype=tf.float32,
                                       initial_state=lstm_cell.zero_state(3, dtype=tf.float32), time_major=True)
    outputs_reshaped = tf.transpose(outputs, perm=[1, 0, 2])

    sess = tf.Session()
    sess.run(tf.initializers.global_variables())
    a = np.transpose(X, (1, 0, 2))
    hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: a, lengths: X_len})
    print(hidden_state)


def replicated():
    tf.reset_default_graph()
    tf.set_random_seed(42)
    # Create and build the cell in exactly the same order as in dynamic().
    lstm_cell = tf.nn.rnn_cell.LSTMCell(5)
    inputs_shape = (3, None, 3)
    lstm_cell.build(inputs_shape)

    inputs = tf.placeholder(shape=inputs_shape, dtype=tf.float32)
    lengths = tf.placeholder(shape=(None,), dtype=tf.int32)

    inputs_ta = tf.TensorArray(dtype=tf.float32, size=3)
    inputs_ta = inputs_ta.unstack(inputs)


    def loop_fn(time, cell_output, cell_state, loop_state):
        emit_output = cell_output  # == None for time == 0
        if cell_output is None:  # time == 0
            next_cell_state = lstm_cell.zero_state(3, tf.float32)
        else:
            next_cell_state = cell_state

        elements_finished = (time >= lengths)
        finished = tf.reduce_all(elements_finished)
        next_input = tf.cond(finished, true_fn=lambda: tf.zeros([3, 3], dtype=tf.float32),
                             false_fn=lambda: inputs_ta.read(time))

        next_loop_state = None

        return (elements_finished, next_input, next_cell_state, emit_output, next_loop_state)

    outputs_ta, final_state, _ = tf.nn.raw_rnn(lstm_cell, loop_fn)
    outputs_reshaped = tf.transpose(outputs_ta.stack(), perm=[1, 0, 2])

    sess = tf.Session()
    sess.run(tf.initializers.global_variables())

    a = np.transpose(X, (1, 0, 2))
    hidden_state = sess.run(outputs_reshaped, feed_dict={inputs: a, lengths: X_len})

    print(hidden_state)


if __name__ == '__main__':
    dynamic()
    replicated()
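With the variables now starting from identical values, dynamic() and replicated() should print the same hidden states.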