Questions tagged [latent-semantic-analysis]

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. Use this tag for questions related to the natural language processing technique.


30 questions
13
votes
1 answer

R Supervised Latent Dirichlet Allocation Package

I'm using this LDA package for R. Specifically, I am trying to do supervised latent Dirichlet allocation (slda). In the linked package, there's an slda.em function. However, what confuses me is that it asks for alpha, eta and variance parameters. As…
Alex R.
6
votes
1 answer

In Latent Semantic Analysis, how do you recombine the decomposed matrices after truncating the singular values?

I'm reading Matrix decompositions and latent semantic indexing (Online edition © 2009 Cambridge UP), and I'm trying to understand how you reduce the number of dimensions in a matrix. There's an example on page 13 which I'm trying to replicate using…
mtanti
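
For the recombination question above, a minimal NumPy sketch may help. It assumes the usual LSA setup: factor the term-document matrix as U, S, Vt, keep only the k largest singular values, and multiply the truncated factors back together. The function name `truncated_reconstruction` and the small count matrix are illustrative assumptions, not taken from the book or from the question.

```python
import numpy as np


def truncated_reconstruction(C, k):
    """Return the rank-k approximation of C from its k largest singular triplets."""
    # full_matrices=False gives the compact SVD: U is (m, r), s is (r,), Vt is (r, n)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    # NumPy returns singular values in descending order, so the first k are the largest
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical 5-term x 6-document count matrix (rows are terms, columns are documents)
C = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

C2 = truncated_reconstruction(C, k=2)
print(np.round(C2, 2))
```

The result has the same shape as the original matrix but rank at most k. Slicing the factors gives the same product as zeroing out the discarded singular values and multiplying the full factors.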
5
votes
1 answer

Combining LSA/LSI with Naive Bayes for document classification

I'm new to the gensim package and vector space models in general, and I'm unsure of what exactly I should do with my LSA output. To give a brief overview of my goal, I'd like to enhance a Naive Bayes classifier using topic modeling to improve…
3
votes
1 answer

Using Latent Semantic Analysis to measure passage similarity

I'm currently developing a program to compare two pieces of text based on their semantics (meaning). I understand there are libraries such as lingpipe which provide useful methods to compare string distances; however, I've heard that LSA is the best…
3
votes
1 answer

LSA - Feature selection

I have this SVD decomposition of the document. I've read this page, but I don't understand how I can compute the best feature for document separation. I know that: S x Vt gives me the relation between documents and features; U x S gives me the relation…
2
votes
0 answers

AttributeError: 'int' object has no attribute 'toarray'

I tried to solve this issue, but the error persists. It is not a problem for English texts, but it is for Arabic. Any idea how to solve this problem? top_n_words_lsa = get_top_n_words(10, lsa_keys, small_document_term_matrix,…
2
votes
2 answers

Topic Modelling: LDA, word frequency in each topic and Wordcloud

Question: How can I compute and code the frequency of words in each topic? My goal is to create a word cloud from each topic. P.S. I have no problem with wordcloud. From the code: burnin <- 4000 #We do not collect this. iter <- 4000 thin…
2
votes
1 answer

Using the lsa package in R - Error in Ops.simple_triplet_matrix(m, 1) : Incompatible dimensions

I am trying to learn to use the lsa package in R. I am working with a much larger data set than the example below, but this is for the purposes of reproducibility (props to this person for posting this code on his site, it's a great resource). I…
1
vote
0 answers

Latent text analysis (lsa package) using whole documents in R

I have code that successfully performs Latent Text Analysis on short citations using the lsa package in R (see below). However, I would rather use this method on text from larger documents. Copy-pasting the whole thing in each citation…
Naomi Peer
1
vote
1 answer

Semantic Similarity between Sentences in a Text

I have used material from here and a previous forum page to write some code for a program that will automatically calculate the semantic similarity between consecutive sentences across a whole text. Here it is: the code for the first part is copy…
1
vote
1 answer

gensim Generating LSI model causes "Python has stopped working"

So I am trying to use gensim to generate an LSI model along with corpus_lsi following this tutorial. I start with a corpus and a dictionary that I generated myself. The list of documents is very small (9 lines = 9 documents), which is the sample…
1
vote
1 answer

How Does Latent Semantic Analysis Handle Semantics

I have gone through the LSA method. It is said that LSA can be used for semantic analysis, but I cannot understand how that works in LSA. Can anyone please tell me how LSA handles semantics?
Chamath Sajeewa
1
vote
1 answer

Latent semantic analysis (LSA) singular value decomposition (SVD) understanding

Bear with me through my modest understanding of LSI (Mechanical Engineering background): After performing SVD in LSI, you have 3 matrices: U, S, and V transpose. U compares words with topics and S is a sort of measure of strength of each feature. Vt…
user2040444
1
vote
1 answer

How to generate recommendations with matrix factorization

I've read some papers on Matrix Factorization (Latent Factor Model) in recommendation systems, and I can implement the algorithm. I can get an RMSE result similar to the one the paper reports on the MovieLens dataset. However, I find that if I try to…
0
votes
1 answer

How do I retain numbers while preprocessing data using gensim in Python?

I have used gensim.utils.simple_preprocess(str(sentence)) to create a dictionary of words that I want to use for topic modelling. However, this also filters out important numbers (house resolutions, bill numbers, etc.) that I really need. How did I…