How to use LSA for dimension reduction in text analytics with R

Question

I am a beginner at data science, and I am working on a text analytics/sentiment analysis project with tweets. What I have been trying to do is perform dimension reduction on my training set of tweets, feed the reduced training set into a NaiveBayes learner, and use the trained model to predict the sentiment of the tweets in my test set.

I have been following the steps in this article:

http://www.analyticskhoj.com/data-mining/text-analytics-part-iv-cluster-analysis-on-terms-and-documents-using-r/

Their explanation is too brief for a beginner like me.

I have used lsa() to create what RStudio labels a "Large LSAspace (3 elements)". Following their example, I created three more data frames:

lsa.train.tk = as.data.frame(lsa.train$tk)
lsa.train.dk = as.data.frame(lsa.train$dk)
lsa.train.sk = as.data.frame(lsa.train$sk)
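For readers unsure what the three elements contain: an LSA space is a truncated singular value decomposition of the term-document matrix, so tk holds one row per term, dk one row per document, and sk the singular values. The sketch below reproduces that structure with base R's svd() on a made-up toy matrix. The matrix, the variable names, and the choice k = 2 are all assumptions for illustration, not the data or settings from the question.

```r
# Toy term-document matrix (rows = terms, columns = documents).
# All counts are invented for illustration.
tdm <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 2, 1,
    0, 0, 1, 2,
    1, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("good", "great", "bad", "awful", "movie"),
                  paste0("doc", 1:4))
)

k <- 2          # number of latent dimensions to keep (assumed)
s <- svd(tdm)   # full SVD: tdm = u %*% diag(d) %*% t(v)

tk <- s$u[, 1:k, drop = FALSE]  # terms x k, analogue of lsa.train$tk
sk <- s$d[1:k]                  # k singular values, analogue of lsa.train$sk
dk <- s$v[, 1:k, drop = FALSE]  # documents x k, analogue of lsa.train$dk

# Rank-k approximation of the original matrix
approx_tdm <- tk %*% diag(sk, nrow = k) %*% t(dk)
```

Under this reading, sk carries no per-document information (it is just k numbers), while dk has one k-dimensional row per document, which is the shape a classifier would normally expect as input.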

When I view the lsa.train.tk data, it looks like this (lsa.train.dk looks pretty similar to this matrix):

And my lsa.train.sk looks like the following:

My question is: how do I interpret this information? How can I use it to create something that I can feed into my NaiveBayes learner? I tried just using lsa.train.sk for the NaiveBayes learner, but I cannot think of any good explanation that would justify what I've tried. Any help would be much appreciated!

EDIT: What I've done so far:

1. Convert everything into a term-document matrix.
2. Pass the matrix into the NaiveBayes learner.
3. Predict using the trained model.
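One way reduced features could slot into these steps is to project both training and test documents into the same k-dimensional space before training. The sketch below does this with base R's svd() and the standard fold-in formula (document vector times tk times the inverse of the singular values). The toy matrices, the labels, and the helper name project() are hypothetical, and the classifier call itself is left out because the question does not show which NaiveBayes implementation is used.

```r
# Hypothetical toy data: rows = terms, columns = documents.
terms <- c("good", "great", "bad", "awful", "movie")
train_tdm <- matrix(
  c(2, 1, 0, 0,
    1, 2, 0, 0,
    0, 0, 2, 1,
    0, 0, 1, 2,
    1, 0, 1, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(terms, paste0("train", 1:4))
)
test_tdm <- matrix(
  c(1, 0,
    1, 0,
    0, 1,
    0, 2,
    1, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(terms, c("test1", "test2"))
)
train_labels <- factor(c("pos", "pos", "neg", "neg"))

k <- 2                          # latent dimensions to keep (assumed)
s <- svd(train_tdm)             # decomposition fitted on training data only
tk <- s$u[, 1:k, drop = FALSE]  # terms x k
sk <- s$d[1:k]                  # k singular values

# Fold documents into the latent space: t(tdm) %*% tk %*% diag(1 / sk).
# The test matrix must use the same terms, in the same order, as training.
project <- function(tdm) t(tdm) %*% tk %*% diag(1 / sk, nrow = k)

train_features <- as.data.frame(project(train_tdm))  # 4 documents x k
test_features  <- as.data.frame(project(test_tdm))   # 2 documents x k

# Training set in the shape most learners expect: features plus label column.
train_set <- cbind(train_features, sentiment = train_labels)
```

For the training documents this projection gives back the document vectors of the decomposition, so the test documents end up described in the same coordinates the model was trained on.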

My problems are:

1. Accuracy is only 50%, and I realized that the model labels everything as positive sentiment (so I would have gotten 0% accuracy if my test set contained only negative tweets).
2. The current code is not scalable. Because it uses large matrices, I can only handle up to about 3.5k rows of data; with more than that, my computer crashes. That is why I wanted to do dimension reduction, so that I can handle more data (such as 10k or 100k tweets).
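The first problem can be made visible with a confusion matrix and a majority-class baseline. The sketch below uses base R's table() on invented labels and predictions that mimic the "everything is positive" behaviour; the vectors actual and predicted are placeholders, not the real results.

```r
# Hypothetical labels and predictions standing in for the real test set.
actual    <- factor(c("pos", "neg", "pos", "neg", "neg", "pos"),
                    levels = c("neg", "pos"))
predicted <- factor(rep("pos", 6), levels = c("neg", "pos"))

# Rows = predicted class, columns = actual class.
confusion <- table(predicted = predicted, actual = actual)
print(confusion)

# Overall accuracy: correctly classified documents over all documents.
accuracy <- sum(diag(confusion)) / sum(confusion)

# Accuracy of always guessing the most frequent class.
majority_baseline <- max(table(actual)) / length(actual)
```

If accuracy is no better than majority_baseline and one row of the confusion matrix is all zeros, the model is effectively ignoring its features, which points to a problem in the inputs rather than in the test set.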

It's hard to tell what you're looking for, as written. You may want to reduce your post to the minimal reproduction of what you've already tried and what result you expected. — effel, Mar 17 '16 at 02:29
