I'm running a reinforcement learning program, implemented in TensorFlow, in a gym environment (BipedalWalker-v2). I've manually set the random seeds of the environment, TensorFlow, and NumPy as follows:
import os
import random

import gym
import numpy as np
import tensorflow as tf

os.environ['PYTHONHASHSEED']=str(42)
random.seed(42)
np.random.seed(42)
tf.set_random_seed(42)
env = gym.make('BipedalWalker-v2')
env.seed(0)
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
# run the graph with sess
However, I get different results every time I run my program (without changing any code). Why are the results not consistent and what should I do if I want to obtain the same result?
Update:
The only places I can think of that may introduce randomness (other than the neural networks) are:
- I use tf.truncated_normal to generate the random noise epsilon in order to implement a noisy layer
- I use np.random.uniform to randomly select samples from the replay buffer
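Both sources can in principle be pinned down explicitly. The following is a minimal sketch, not the code from my program: it assumes the replay buffer draws its indices from a private np.random.RandomState instead of the global np.random state, so the sampling no longer depends on what else calls np.random. The class name SeededReplayBuffer and its methods are hypothetical. For the noise, tf.truncated_normal also accepts an op-level seed argument in addition to the graph-level tf.set_random_seed.

```python
import numpy as np


class SeededReplayBuffer:
    """Hypothetical replay buffer that owns its random number generator."""

    def __init__(self, capacity, seed):
        self.capacity = capacity
        self.storage = []
        self.next_slot = 0
        # Private generator: unaffected by other callers of np.random.*
        self.rng = np.random.RandomState(seed)

    def add(self, transition):
        # Append until full, then overwrite the oldest entry.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_slot] = transition
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample_indices(self, batch_size):
        # Mirrors the np.random.uniform approach: draw floats in [0, len) and floor them.
        size = len(self.storage)
        draws = self.rng.uniform(0, size, size=batch_size).astype(np.int64)
        # Guard against the rare rounding case where a draw equals the upper bound.
        return np.minimum(draws, size - 1)

    def sample(self, batch_size):
        return [self.storage[i] for i in self.sample_indices(batch_size)]
```

Two buffers built with the same seed and filled with the same transitions should then return the same batches, regardless of how often the global np.random state is used elsewhere.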
I also noticed that the scores are fairly consistent over the first 10 episodes but then begin to diverge. Other quantities, such as the losses, show a similar trend, but their numeric values are not identical between runs.
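Since the runs agree at first and only later drift apart, it may help to locate the first step at which two runs disagree. This is a minimal sketch, assuming each run logs one scalar per step (a score or a loss) into a plain list; first_divergence and its tolerance parameters are hypothetical names.

```python
import math


def first_divergence(run_a, run_b, rel_tol=0.0, abs_tol=0.0):
    """Return the index of the first step where two logged runs differ, or None.

    With both tolerances at 0.0 the comparison is exact, which is what
    bitwise reproducibility requires.
    """
    for step, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return step
    # One run being longer than the other also counts as a difference.
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None
```

A tiny difference at the first divergent step would suggest floating-point non-determinism in the computation, whereas a large one would more likely point to a different random draw.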
Update 2:
I've also set "PYTHONHASHSEED" and restricted TensorFlow to a single CPU thread, as @jaypops96 described, but I still cannot reproduce the results. The code block above has been updated accordingly.
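One detail that may be worth checking: Python reads PYTHONHASHSEED when the interpreter starts, so assigning os.environ['PYTHONHASHSEED'] inside an already running script affects only child processes, not the script itself. The sketch below illustrates this by launching fresh interpreters with the variable set; string_hash_in_child is a hypothetical helper name.

```python
import os
import subprocess
import sys


def string_hash_in_child(hash_seed):
    """Start a fresh interpreter with PYTHONHASHSEED set and return hash('abc') from it."""
    # Copy the current environment and add the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('abc'))"], env=env
    )
    return int(out.decode().strip())
```

Two children started with the same seed should report the same hash. For the training script, this would mean launching it as PYTHONHASHSEED=42 python train.py (train.py being a placeholder name) rather than setting the variable from within the script.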