
I am using the GoogleNet model for binary classification of images. Earlier I was using a virtual machine, and now I am using Ubuntu 14.04. Both give me different results. I tried hard to find where the problem is but couldn't pinpoint it.

I have trained two models separately: one on Ubuntu 14.04 and another in the virtual machine. Both models use the CPU; cuDNN is not used in either case. As for the BLAS library, I am using the default ATLAS.

Any suggestions would be of great help.

Ashutosh Singla
  • How different are the results? Did you start training from a trained model ("finetuning") or from scratch (random weights)? Is it possible that the random seed of caffe was different in the two runs? – Shai May 02 '16 at 12:04
  • In both cases, I started the training from scratch and with the same parameters. And the difference in the results is huge, more than 20%. – Ashutosh Singla May 02 '16 at 12:14
  • have you fixed [`random_seed`](https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L207) in your `solver.prototxt` to the same seed in both cases explicitly? – Shai May 02 '16 at 12:16
  • No, I didn't know about it. – Ashutosh Singla May 02 '16 at 12:50

1 Answer


Since you started your training from scratch in both cases and did not explicitly fix the `random_seed` parameter in your `solver.prototxt`, it is very likely that caffe initialized your model with different random weights for each of the two training processes. Starting from different points is very likely to end with differently trained models.
If you are concerned about possible differences in caffe between the two architectures, try repeating the training with the same `random_seed` parameter in `solver.prototxt`.
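As an illustration, the seed is set as a top-level field of the solver configuration; the value `310` below is arbitrary (any fixed integer works, as long as both runs use the same one), and the `net` path is a hypothetical placeholder:

```
# solver.prototxt -- a fixed random_seed makes caffe's weight
# initialization and data shuffling reproducible on the same machine
net: "train_val.prototxt"   # hypothetical network definition file
random_seed: 310            # any fixed integer; use the same value in both runs
```

Note that even with a fixed seed, results can still differ across machines or library builds, since floating-point behavior may vary between BLAS implementations.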

Shai
  • So, shall I use the line below to set the random seed: `random_seed = 10`? Is this command right, or do I need to assign some other value? – Ashutosh Singla May 02 '16 at 12:57
  • @AshutoshSingla the correct syntax would be `random_seed: 10`, or any integer number for that matter. My personal favorite is 310 ;) – Shai May 02 '16 at 12:58
  • Ok. Thanks a lot for your help. I will do the same and will update you. – Ashutosh Singla May 02 '16 at 12:59
  • @AshutoshSingla well, it might be more appropriate to wait with "Accepting" this answer until the results of your new experiment are in... (not that I'm complaining) – Shai May 02 '16 at 13:01
  • I set `random_seed: 1` in my solver file and I am still getting different results. I also looked into the issue and found this link: https://groups.google.com/forum/#!topic/caffe-users/bsp34yOcEZo – Ashutosh Singla May 03 '16 at 08:51
  • @AshutoshSingla how different are your results? can this difference be explained as a "random perturbation" or is there a *significant* difference? – Shai May 03 '16 at 08:55
  • In Linux: True Positive 99, True Negative 23. In Virtual Machine: True Positive 91, True Negative 48. – Ashutosh Singla May 03 '16 at 08:59
  • @AshutoshSingla shouldn't they add up to the same value? i.e. 99 + 23 and 91 + 48 should have given the same result... – Autonomous Jun 04 '16 at 02:09
  • @Shai - Does setting `random_seed` also help ensure that the total time taken to train the network will remain the same if I run it today and then tomorrow with the same network parameters? – Chetan Arvind Patil Feb 04 '18 at 02:04
  • @ChetanArvindPatil I don't know. My guess is that run time is not affected by the actual value of parameters/data – Shai Feb 04 '18 at 04:31