So I learned how to use pickle this morning to dump lists to a text file, because you cannot use .write to send lists to a file. I am watching a YouTube video, Natural Language Processing With Python and NLTK p.4; you can see what the full output should be there. He does not push the data to a txt file, but I wanted to take it further to learn more.

Sample Terminal Output: [('PRESIDENT', 'NNP'), ('GEORGE', 'NNP'), ('W.', 'NNP'), ('BUSH', 'NNP'), ("'S", 'POS') Note: This is supposed to go on for the whole speech, and it does in the terminal.

Full File Output: €]q (X (qh†qX ApplauseqX NNPq†qX .qh†qX )qh†q e.

My Code:

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer
import pickle

output = open('stoutput.txt', 'wb')
train_text = state_union.raw('2005-GWBush.txt')
sample_text = state_union.raw('2006-GWBush.txt')

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():

    try:
        for i in tokenized:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            print(tagged)
            pickle.dump(tagged, open('stoutput.txt', 'wb'))
    except Exception as e:
           pickle.dump(e, open('stoutput.txt', 'wb'))
           print(str(e))


process_content()

Any help is greatly appreciated as I know it takes time. Thanks for reading.

Yepram Yeransian

1 Answer

pickle serializes Python objects, and tagged is a list, so what you see in the file is the pickled byte stream of that list displayed as if it were text. That is why the strings 'Applause' and 'NNP' (elements of the list) appear surrounded by gibberish. If you want the same representation you get from print(tagged), forget pickle and write the list converted to a string:

# Text mode ('w'), because str() returns text rather than bytes
with open('stoutput.txt', 'w') as f:
    f.write(str(tagged))

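although you probably want the with statement outside of your for loop: opening the file in write mode on every iteration truncates it, so only the last sentence's output survives, which matches the single 'Applause' sentence in your file.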
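As a rough sketch of that restructuring, the file can be opened once before the loop and each tagged sentence written on its own line. The `tagged_sentences` list below is a hard-coded stand-in for what `nltk.pos_tag` would return per sentence, and the temporary output path is only an assumption so the example runs anywhere; neither comes from the question's code.

```python
import os
import tempfile

# Stand-in for the per-sentence results of nltk.pos_tag(); hard-coded here
# so the sketch runs without NLTK or its corpora installed.
tagged_sentences = [
    [("PRESIDENT", "NNP"), ("GEORGE", "NNP"), ("W.", "NNP"), ("BUSH", "NNP")],
    [("(", "("), ("Applause", "NNP"), (".", "."), (")", ")")],
]

# Hypothetical output location; the question writes to 'stoutput.txt'.
out_path = os.path.join(tempfile.mkdtemp(), "stoutput.txt")

# Open once, in text mode, before the loop, so every sentence goes through
# the same file handle instead of truncating the file on each iteration.
with open(out_path, "w", encoding="utf-8") as f:
    for tagged in tagged_sentences:
        f.write(str(tagged) + "\n")

# Read the file back to confirm that all sentences were kept.
with open(out_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

With this layout the file holds one line per sentence rather than only the last one.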

EDIT: if your goal is to be able to use this data in a later Python session or script, but you want it in a more readable form than pickle gives you, I would suggest converting your list to CSV; see this question for instructions.
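One possible shape for the CSV route, using the standard library csv module, is sketched below. The column names (`sentence`, `word`, `tag`), the temporary file path, and the hard-coded `tagged_sentences` data are all assumptions made for illustration, not something taken from the linked question.

```python
import csv
import os
import tempfile

# Stand-in for the per-sentence results of nltk.pos_tag().
tagged_sentences = [
    [("PRESIDENT", "NNP"), ("GEORGE", "NNP"), ("W.", "NNP"), ("BUSH", "NNP")],
    [("(", "("), ("Applause", "NNP"), (".", "."), (")", ")")],
]

# Hypothetical output location.
csv_path = os.path.join(tempfile.mkdtemp(), "stoutput.csv")

# Write one row per (word, tag) pair, remembering which sentence it came from.
# newline="" is the setting the csv module documentation recommends.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["sentence", "word", "tag"])
    for sentence_index, tagged in enumerate(tagged_sentences):
        for word, tag in tagged:
            writer.writerow([sentence_index, word, tag])

# Later session: read the rows back and regroup them by sentence index.
with open(csv_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

restored = {}
for row in rows:
    restored.setdefault(int(row["sentence"]), []).append((row["word"], row["tag"]))
restored_sentences = [restored[i] for i in sorted(restored)]
```

The file stays readable in a text editor or spreadsheet, and the data can be rebuilt without `eval()`. Note that a sentence with no tokens would leave no rows, so this layout would not preserve empty sentences.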

apteryx
  • Bad advice to convert a Python object into a string without first serializing it. The user has to later use `eval()` to read the file and that might cause a whole lot of problems... – alvas Feb 05 '18 at 01:39
  • @alvas you're right, I made the (hasty) assumption that the question asker just wanted the file to read, not to load back into python – apteryx Feb 05 '18 at 14:33
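
If the file does need to be loaded back into Python, a pickle round trip avoids `eval()` entirely. The sketch below is only an illustration: it assumes all tagged sentences are collected into one list first, and the `tagged_sentences` data and the temporary `.pkl` path are made up for the example. A pickle file is binary, so it will still look like gibberish in a text editor, and only pickle files from a trusted source should be loaded.

```python
import os
import pickle
import tempfile

# Stand-in for the per-sentence results of nltk.pos_tag().
tagged_sentences = [
    [("PRESIDENT", "NNP"), ("GEORGE", "NNP"), ("W.", "NNP"), ("BUSH", "NNP")],
    [("(", "("), ("Applause", "NNP"), (".", "."), (")", ")")],
]

# Hypothetical output location.
pkl_path = os.path.join(tempfile.mkdtemp(), "stoutput.pkl")

# Dump the whole list once, in binary mode, instead of once per sentence.
with open(pkl_path, "wb") as f:
    pickle.dump(tagged_sentences, f)

# Later session: load the same list of lists of tuples back.
with open(pkl_path, "rb") as f:
    loaded = pickle.load(f)
```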