Questions tagged [lsa]

LSA stands for Latent Semantic Analysis, a natural language processing technique that analyses the relationships between a set of documents and the terms they contain by producing a set of concepts related to both.

For the Microsoft Windows subsystem, see (local-security-authority).
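As a rough illustration of the latent semantic analysis idea in the tag description, here is a minimal sketch in plain Python. The toy corpus, the function names, and the choice of `k=2` are all assumptions made for illustration. To stay dependency-free, it recovers the top singular directions by power iteration with deflation on the document Gram matrix instead of calling a library SVD routine; a real project would more likely use an existing SVD implementation.

```python
import math

# Hypothetical toy corpus: docs 0-2 are about cats, docs 3-4 about finance.
# Docs 0 and 2 share no words directly; they are linked only through doc 1.
docs = ["cat pet", "pet kitten", "kitten feline", "stock market", "market shares"]

def term_document_matrix(texts):
    """Return (vocab, A) where A[t][d] counts term t in document d."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(texts) for _ in vocab]
    for d, text in enumerate(texts):
        for w in text.split():
            matrix[index[w]][d] += 1.0
    return vocab, matrix

def gram(matrix):
    """Document-by-document Gram matrix G = A^T A."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)]
            for i in range(n)]

def top_eigenpairs(sym, k, iterations=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed, asymmetric start vector
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(work[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(texts, k):
    """Coordinates of each document in a k-dimensional concept space."""
    _, a = term_document_matrix(texts)
    pairs = top_eigenpairs(gram(a), k)
    # Eigenvalues of A^T A are squared singular values of A.
    return [[math.sqrt(lam) * v[d] for lam, v in pairs] for d in range(len(texts))]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

doc_vecs = lsa_document_vectors(docs, k=2)
print("cat doc vs. cat doc:", cosine(doc_vecs[0], doc_vecs[2]))
print("cat doc vs. finance doc:", cosine(doc_vecs[0], doc_vecs[3]))
```

In this toy setup, documents 0 and 2 have no terms in common, so their raw term-count vectors are orthogonal, yet in the reduced concept space they end up pointing in nearly the same direction. That is the kind of indirect relationship the technique is meant to surface.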

113 questions
10
votes
1 answer

How do we decide the number of dimensions for latent semantic analysis?

I have been working on latent semantic analysis lately. I have implemented it in Java using the Jama package. Here is the code: Matrix vtranspose ; a = new Matrix(termdoc); termdoc = a.getArray(); a = a.transpose() ;…
CTsiddharth
  • 907
  • 12
  • 21
10
votes
3 answers

Custom Windows Authentication Package

So, here is the scenario. I am developing a logon system in Windows 7. I have created a Credential Provider containing one Credential. The Credential has three input fields: username, password, and PIN. From what I have learned the documentation…
9
votes
4 answers

LSA - Latent Semantic Analysis - How to code it in PHP?

I would like to implement Latent Semantic Analysis (LSA) in PHP in order to find out topics/tags for texts. Here is what I think I have to do. Is this correct? How can I code it in PHP? How do I determine which words to choose? I don't want to use…
caw
  • 29,212
  • 58
  • 168
  • 279
8
votes
1 answer

Python LSA with Sklearn

I'm currently trying to implement LSA with Sklearn to find synonyms in multiple documents. Here is my code: #import the essential tools for lsa from sklearn.feature_extraction.text import CountVectorizer from sklearn.feature_extraction.text import…
Schweigerama
  • 109
  • 1
  • 1
  • 9
6
votes
2 answers

SVD on a term-document matrix does not give me the values I want

I am trying to replicate an example in a paper called "An introduction to LSA". In the example they have the following term-document matrix: And then they apply SVD and get the following: Trying to replicate this, I wrote…
dpalma
  • 498
  • 4
  • 20
6
votes
2 answers

Randomized SVD for LSA/LSI in a Windows environment

I am working on a project which includes the use of latent semantic analysis (LSA). This requires the usage of singular value decomposition (SVD), sometimes on large data sets. Is there an implementation of randomized-SVD (rSVD) available for…
Leeor
  • 617
  • 7
  • 22
5
votes
1 answer

What can cause a Kerberos TGT session key on Windows to be all zeros

I recently asked a question about some problems I was having getting MIT Kerberos to work nicely with Microsoft's LSA credentials cache. I was told that setting the registry key AllowTGTSessionKey should fix the problem. However, I'm still having…
jalf
  • 229,000
  • 47
  • 328
  • 537
5
votes
1 answer

Probabilistic latent semantic analysis/Indexing - Introduction

But recently I found this link quite helpful to understand the principles of LSA without too much math. http://www.puffinwarellc.com/index.php/news-and-articles/articles/33-latent-semantic-analysis-tutorial.html. It forms a good basis on which I…
Sharmila
  • 1,557
  • 2
  • 22
  • 30
5
votes
4 answers

How to build a conceptual search engine?

I would like to build an internal search engine (I have a very large collection of thousands of XML files) that is able to map queries to concepts. For example, if I search for "big cats", I would want highly ranked results to return documents with…
DevX
  • 1,464
  • 2
  • 13
  • 17
5
votes
1 answer

Latent Semantic Analysis (LSA) Tutorial

I am trying to work with a tutorial in LSA in this link (edit: July 2017. Remove dead link) Here is the code of the tutorial: titles = [doc1,doc2] stopwords = ['and','edition','for','in','little','of','the','to'] ignorechars = ''',:'!''' class…
Tasos
  • 6,330
  • 10
  • 64
  • 148
4
votes
0 answers

Inspect TermDocumentMatrix to get full list of words / terms in R

I am trying to use inspect(TermDocumentMatrix()) to get a list of word/term frequencies between text documents (in R). Using the example code from ?TermDocumentMatrix: data("crude") tdm <- TermDocumentMatrix(crude, control = list(removePunctuation =…
b_g
  • 267
  • 4
  • 14
4
votes
1 answer

Clustering Using Latent Semantic Analysis

Suppose I have a corpus of documents and I run the LSA algorithm on it. How can I use the final matrix obtained after applying SVD to semantically cluster all the words appearing in my corpus of documents? Wikipedia says LSA can be used to find relation…
user2115183
  • 837
  • 2
  • 9
  • 13
3
votes
1 answer

Discovering synonyms from set of documents using LSA transform in Ruby

After applying the LSA transform to a document array, how can this be used to generate synonyms? For instance, I have the following sample documents: D1 = Mobilization D2 = Reflective Pavement D3 = Maintenance of Traffic D4 = Special Detour D5 =…
reectrix
  • 5,419
  • 15
  • 44
  • 74
3
votes
2 answers

Why use LSA before K-Means when doing text clustering

I'm following this tutorial from scikit-learn on text clustering using K-Means: http://scikit-learn.org/stable/auto_examples/text/document_clustering.html In the example, LSA (using SVD) is optionally used to perform dimensionality reduction. Why is…
Niko Nelissen
  • 90
  • 1
  • 10
3
votes
1 answer

How to handle negative values of cosine similarities

I computed the tf-idf of my documents based on terms. Then I applied LSA to reduce the dimensionality of the terms. 'similarity_dist' contains values which are negative (see table below). How can I compute cosine distance with the range…
kitchenprinzessin
  • 769
  • 3
  • 7
  • 26