
I've been getting some irregular behavior from an LDA topic model program, and right now it won't save the LDA model it creates. I'm really not sure why.

Here's a code snippet. It will take me more time to write a reproducible example, since I'm really just loading files I created beforehand.

import gensim
from gensim import corpora

def naive_LDA_implementation(name_of_lda, create_dict=False, remove_low_freq=False):

    LDA_MODEL_PATH = "lda_dir/" + str(name_of_lda) + "/model_dir/"  # For some reason this location doesn't entirely work, and yes, I have created a directory with this name in the folder.
    # This ends up saving the .state, .id2word, and .expElogbeta.npy files. But normally, when saving an LDA model actually works, a fourth file is included that, to my understanding, is the model itself.
    # LDA_MODEL_PATH = "models/"  # This is what I originally had as the location for LDA_MODEL_PATH. I was using a directory called models for multiple LDA models. This no longer works.

    doc_df = getCorpus(name_of_lda, cleaned=True) # returns a dataframe containing a row for each text record and an extra column that contains the tokenized version of the text's post/string of words.
    dict_path = "lda_dir/" + str(name_of_lda) + "/dict_of_tokens.dict"
    docs_of_tokens = convert_cleaned_tokens_entries(doc_df['cleaned_tokens'])
    if create_dict:
        doc_dict = corpora.Dictionary(docs_of_tokens)
        if remove_low_freq:
            doc_dict.filter_extremes(no_below=5, no_above=0.6)
        doc_dict.save(dict_path)
        print("Finished saving") 
    else:
        doc_dict = corpora.Dictionary.load(dict_path)
    doc_term_matrix = [doc_dict.doc2bow(doc) for doc in docs_of_tokens]  # converts each document to bag-of-words (token id, count) pairs

    Lda = gensim.models.ldamodel.LdaModel
    ldamodel = Lda(doc_term_matrix, num_topics=15, id2word=doc_dict, passes=20, chunksize=10000)
    ldamodel.save(LDA_MODEL_PATH)

To put it straightforwardly, I have no clue why permission is being denied when I try to save my LDA model to a particular location. Right now even the original models/ directory gives me "permission denied" with this error message. It seems like any and all directories I could use just won't work. This is odd behavior, and I can't find questions that discuss this error in the same context. I have found posts from people who got this error message when they tried storing to locations that did not exist, but that isn't the case for me.

When I first got this error, I wondered whether it was because I had another LDA topic model named topic_model_1, stored in the models/ subdirectory. I thought the name might be the cause and changed it to lda_model_topic_1 to see whether that would change the result, but nothing is working.

Even if you can't figure out which solution applies to my situation (especially since I don't have reproducible code right now, just my work), can someone tell me what this error message means? When and why does it come up? Maybe that's a start.

Traceback (most recent call last):
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 679, in save
    _pickle.dump(self, fname_or_handle, protocol=pickle_protocol)
TypeError: file must have a 'write' attribute

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "text_mining.py", line 461, in <module>
    main()
  File "text_mining.py", line 453, in main
    naive_LDA_implementation(name_of_lda="lda_model_topic_1", create_dict=True, remove_low_freq=True)
  File "text_mining.py", line 411, in naive_LDA_implementation
    ldamodel.save(LDA_MODEL_PATH)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\models\ldamodel.py", line 1583, in save
    super(LdaModel, self).save(fname, ignore=ignore, separately=separately, *args, **kwargs)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 682, in save
    self._smart_save(fname_or_handle, separately, sep_limit, ignore, pickle_protocol=pickle_protocol)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 538, in _smart_save
    pickle(self, fname, protocol=pickle_protocol)
  File "C:\Users\biney\Miniconda3\lib\site-packages\gensim\utils.py", line 1337, in pickle
    with smart_open(fname, 'wb') as fout:  # 'b' for binary, needed on Windows
  File "C:\Users\biney\Miniconda3\lib\site-packages\smart_open\smart_open_lib.py", line 181, in smart_open
    fobj = _shortcut_open(uri, mode, **kw)
  File "C:\Users\biney\Miniconda3\lib\site-packages\smart_open\smart_open_lib.py", line 287, in _shortcut_open
    return io.open(parsed_uri.uri_path, mode, **open_kwargs)
PermissionError: [Errno 13] Permission denied: 'lda_dir/lda_model_topic_1/model_dir/'
Byron Smith
1 Answer

It seems that because you are using a relative path, you may be attempting to save to SCRIPT_LAUNCH_PATH + lda_dir/lda_model_topic_1/model_dir/, which is not accessible for writing (it may be that SCRIPT_LAUNCH_PATH is actually the installation directory of your Python interpreter).

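You could check your current working directory (where relative paths are resolved) and the directory the script lives in:

import os
print(os.getcwd())  # current working directory; relative paths are resolved against this
print(os.path.dirname(os.path.abspath(__file__)))  # directory containing the script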

or (better) save the file to an absolute path, e.g. C:\Users\<youruser>\Documents\... (remember to replace <youruser> with your Windows login name), where you should have full write permissions.
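A minimal sketch of the absolute-path approach, using only the standard library. The helper name `build_model_path`, the file name `lda.model`, and the base folder are placeholders of mine, not anything from your code or from gensim:

```python
import os
import tempfile
from pathlib import Path


def build_model_path(base_dir, name_of_lda, filename="lda.model"):
    """Return an absolute file path for the model, creating parent folders.

    All names here are hypothetical placeholders for illustration.
    """
    model_dir = (
        Path(base_dir).expanduser().resolve()
        / "lda_dir" / str(name_of_lda) / "model_dir"
    )
    model_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return model_dir / filename


if __name__ == "__main__":
    # A temporary folder stands in for a folder under your user profile.
    with tempfile.TemporaryDirectory() as base:
        path = build_model_path(base, "lda_model_topic_1")
        print("Absolute:", path.is_absolute())
        print("Folder writable:", os.access(path.parent, os.W_OK))
        print("Would save to:", path)
```

You would then pass `str(path)` to `ldamodel.save(...)` instead of the relative string.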

Another possible reason is that you are running the script as a different user than the one who created the directory.
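It may also be worth ruling out the trailing slash in the path itself. This is an assumption on my part, but the path in your traceback ends in `/`, so the final `open` call is handed a directory rather than a file name. The sketch below uses only the standard library, with a temporary folder as a stand-in and a made-up file name `lda.model`. It shows that opening an existing directory for writing fails even when the directory is writable. As far as I know, the error is `PermissionError` (Errno 13) on Windows and `IsADirectoryError` on Linux and macOS; both are subclasses of `OSError`:

```python
import os
import tempfile


def try_open_for_write(path):
    """Try to open `path` in binary write mode.

    Returns the exception class name on failure, or None on success.
    """
    try:
        with open(path, "wb"):
            pass
        return None
    except OSError as exc:
        return type(exc).__name__


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        dir_path = folder + os.sep                      # a directory, like "model_dir/"
        file_path = os.path.join(folder, "lda.model")   # a file inside that directory
        print("directory:", try_open_for_write(dir_path))
        print("file:", try_open_for_write(file_path))
```

If that is what is happening here, appending a file name to the path (something like `.../model_dir/lda.model`) may let the save go through.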

sophros