TensorFlow cannot restore vocabulary in evaluation process

Question

I am new to TensorFlow and neural networks. I started a project on detecting errors in Persian text. I used the code at this address and developed my own code here. Please check the code, because I cannot post all of it here.

What I want to do is give the model several Persian sentences for training and then see whether it can detect incorrect sentences. The model works fine with English data, but when I use it with Persian data I encounter this issue.

The code is too long to post here, so I will point to the part I think might be causing the issue. I used these lines in train.py, which work fine and store the vocabulary:

x_text, y = data_helpers.load_data_labels(datasets)
# Build vocabulary
max_document_length = max([len(x.split(" ")) for x in x_text])
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)
x = np.array(list(vocab_processor.fit_transform(x_text)))

However, after training, when I run this code in eval.py:

vocab_path = os.path.join(FLAGS.checkpoint_dir, "..", "vocab")
vocab_processor = learn.preprocessing.VocabularyProcessor.restore(vocab_path)
x_test = np.array(list(vocab_processor.transform(x_raw)))

this error happens:

    vocab_processor = learn.preprocessing.VocabularyProcessor.restore(vocab_path)
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\tensorflow\contrib\learn\python\learn\preprocessing\text.py", line 226, in restore
    return pickle.loads(f.read())
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 118, in read
    self._preread_check()
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 78, in _preread_check
    compat.as_bytes(self.__name), 1024 * 512, status)
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\contextlib.py", line 66, in __exit__
    next(self.gen)
  File "C:\WinPython-64bit-3.5.2.3Qt5\python-3.5.2.amd64\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: ..\vocab : The system cannot find the file specified.

I think the problem is that it cannot read the vocabulary stored after training, because the data is Unicode and not English. Can anyone help me, please?

Are you certain that the file exists in the expected directory? Can you try using an absolute path rather than a relative one? — mrry, Dec 04 '17 at 16:17
The code works with english dataset. It also works for polarity data set. So I guess the relative directory doesn't make any problem here. — Masoud Masoumi Moghadam, Dec 04 '17 at 20:01
Does anybody know any sample code which works with tensorflow and unicode data? something that I can make use of it to solve my problem? I want to know if tensorflow can save unicode vocabulary. — Masoud Masoumi Moghadam, Dec 04 '17 at 20:07
The most likely explanation for a `NotFound` error is that the file is not in the requested location. Did you set the `--checkpoint_dir` flag? (From the error message, it appears that the flag has an empty value.) — mrry, Dec 04 '17 at 21:01
Do you have an extra space, like this `..\vocab `, in the filename? — Jiang Xiang, Dec 08 '17 at 22:31
I guess that's the point. The address to vocab file is causing the problem. thanks bro — Masoud Masoumi Moghadam, Dec 08 '17 at 22:39
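
The diagnosis in the comments can be checked with a short sketch. It assumes, as suggested above, that the checkpoint directory flag is empty; `describe_vocab_path` is a hypothetical helper name, and `ntpath` is used only so that Windows joining rules apply on any platform.

```python
import ntpath
import os

# Assumption: the flag value is empty, as suspected in the comments.
checkpoint_dir = ""

# ntpath applies Windows path-joining rules regardless of the host platform.
relative_vocab = ntpath.join(checkpoint_dir, "..", "vocab")


def describe_vocab_path(checkpoint_dir):
    """Return the absolute path eval.py would try to open, and whether it exists."""
    path = os.path.abspath(os.path.join(checkpoint_dir, "..", "vocab"))
    return path, os.path.exists(path)


resolved, found = describe_vocab_path(checkpoint_dir)
print(relative_vocab)
print(resolved, found)
```

With an empty directory the join collapses to the bare relative path seen in the error message, so the lookup depends entirely on the current working directory.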

Masoud Masoumi Moghadam · Accepted Answer · 2017-12-10T09:19:58.720

This problem happens because the path to the vocab file is not correct. In train.py, after line 144, where out_dir is set, I added this:

# Save the output directory so eval.py can find it later
with open('model_dir.txt', 'w') as f:
    f.write(out_dir)

After training, the path of the model directory is saved in the current working directory in a file named model_dir.txt.

Then in eval.py I added this:

# Read the saved directory back and build the vocab path from it
with open('model_dir.txt') as f:
    model_dir = f.readline().strip()
vocab_path = os.path.join(model_dir, "vocab")

Now the path is set correctly and the code works with no problems.
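
The two snippets above can be combined into one self-contained sketch. The function names `save_model_dir` and `load_vocab_path` are hypothetical, and the absolute-path conversion and the existence check are additions that are not part of the original fix. A temporary directory stands in for a real training run.

```python
import os
import tempfile


def save_model_dir(out_dir, pointer_file="model_dir.txt"):
    """train.py side: record the absolute training output directory."""
    with open(pointer_file, "w", encoding="utf-8") as f:
        f.write(os.path.abspath(out_dir))


def load_vocab_path(pointer_file="model_dir.txt"):
    """eval.py side: rebuild the vocab path from the recorded directory."""
    with open(pointer_file, encoding="utf-8") as f:
        model_dir = f.readline().strip()
    vocab_path = os.path.join(model_dir, "vocab")
    # Added check (not in the original answer): fail early with a clear message.
    if not os.path.exists(vocab_path):
        raise FileNotFoundError("vocab file not found at " + vocab_path)
    return vocab_path


# Demonstration with a throwaway directory standing in for a training run.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "vocab"), "wb").close()  # empty placeholder vocab file
pointer = os.path.join(demo_dir, "model_dir.txt")
save_model_dir(demo_dir, pointer)
demo_vocab_path = load_vocab_path(pointer)
print(demo_vocab_path)
```

Storing an absolute path means the lookup no longer depends on which directory eval.py is started from.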

score 3 · Answer 2 · answered Dec 05 '17 at 22:53

Have you tried adding this at the top of your file?

# -*- coding: utf-8 -*-

Myles Hollowed

you mean the dataset ? – Masoud Masoumi Moghadam Dec 05 '17 at 23:00
In the file that's loading the data. – Myles Hollowed Dec 05 '17 at 23:45
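
On the Unicode question itself: the traceback shows that restore reads the vocab file with pickle.loads, and pickle preserves Unicode strings. The sketch below uses a plain dictionary as a stand-in for the vocabulary, not the actual VocabularyProcessor object, and the Persian tokens are illustrative only.

```python
import pickle

# Stand-in for a vocabulary mapping; the tokens are illustrative only.
vocab = {"سلام": 1, "کتاب": 2, "مدرسه": 3}

# Serialize and restore, roughly what saving and restoring a pickled object does.
restored = pickle.loads(pickle.dumps(vocab))

print(restored == vocab)
```

If the round trip preserves the tokens, the text encoding is unlikely to be the cause of a NotFoundError, which points back to the file path.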
