Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, "cat" and "meow" will appear in documents about cats (source: Wikipedia).
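To make that intuition concrete, here is a minimal sketch with two hand-made topics. The word probabilities, the `UNSEEN` floor and the helper names are invented for illustration, not estimated from data. It scores a document under each topic by summing log word probabilities, which is a single-topic (mixture-of-unigrams) simplification rather than a full topic model such as LDA.

```python
import math

# Hypothetical topic-word probabilities, invented for illustration (each topic sums to 1).
TOPICS = {
    "dogs": {"dog": 0.4, "bone": 0.3, "cat": 0.05, "meow": 0.05, "the": 0.2},
    "cats": {"dog": 0.05, "bone": 0.05, "cat": 0.4, "meow": 0.3, "the": 0.2},
}
UNSEEN = 1e-6  # small probability assigned to words a topic does not list


def log_likelihood(tokens, word_probs):
    """Log-probability of the tokens if every word is drawn from a single topic."""
    return sum(math.log(word_probs.get(t, UNSEEN)) for t in tokens)


def best_topic(tokens):
    """Name of the topic under which the document is most probable."""
    return max(TOPICS, key=lambda name: log_likelihood(tokens, TOPICS[name]))


print(best_topic(["the", "dog", "chewed", "a", "bone"]))  # dogs
print(best_topic(["the", "cat", "said", "meow"]))         # cats
```

Words that neither topic lists ("chewed", "a", "said") get the same floor under both topics, so they do not change which topic wins.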

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)
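LDA's generative story can be sketched with the standard library alone: draw per-document topic proportions θ from a Dirichlet(α), then for each word draw a topic z from θ and a word from that topic's distribution φ_z. The φ values, α and the function names below are hypothetical; a real LDA implementation learns φ and θ by inverting this process (for example with variational inference or Gibbs sampling), which is not shown here.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(phi, vocab, alpha, length, rng):
    """Generate one document under LDA's generative process.

    phi:   list of K topic-word distributions, each a list of weights over vocab
    alpha: list of K Dirichlet concentration parameters
    """
    theta = sample_dirichlet(alpha, rng)  # topic proportions for this document
    words = []
    for _ in range(length):
        z = rng.choices(range(len(phi)), weights=theta)[0]  # topic for this word
        w = rng.choices(vocab, weights=phi[z])[0]           # word drawn from topic z
        words.append(w)
    return theta, words


vocab = ["dog", "bone", "cat", "meow"]
phi = [[0.45, 0.45, 0.05, 0.05],   # a hypothetical "dogs" topic
       [0.05, 0.05, 0.45, 0.45]]   # a hypothetical "cats" topic
theta, words = generate_document(phi, vocab, alpha=[0.5, 0.5], length=8, rng=random.Random(0))
print(theta, words)
```

A small α (below 1) tends to produce documents dominated by one topic; larger values give more even mixtures.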

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (Scala)
Gensim – Topic Modelling for Humans (Python)

Related Tags:

topicmodels

858 questions

6 answers

Remove empty documents from DocumentTermMatrix in R topicmodels?

I am doing topic modelling using the topicmodels package in R. I am creating a Corpus object, doing some basic preprocessing, and then creating a DocumentTermMatrix: corpus <- Corpus(VectorSource(vec), readerControl=list(language="en")) corpus <-…

r lda topic-modeling topicmodels

asked Dec 19 '12 at 01:25

Bill M
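For the question above: topicmodels' LDA() reports an error when any row of the DocumentTermMatrix has no non-zero entries, which can happen after stop-word removal or sparse-term filtering. The usual remedy is to keep only the rows whose total count is positive. The idea is sketched here in Python on a plain list-of-lists matrix standing in for the real sparse DTM; `drop_empty_documents` is a hypothetical helper. It also returns the kept row indices, so that document metadata can be subset the same way.

```python
def drop_empty_documents(dtm, doc_ids):
    """Remove all-zero rows from a dense document-term matrix.

    dtm:     list of rows, one per document, each a list of term counts
    doc_ids: identifiers aligned with the rows of dtm
    Returns the filtered matrix, the matching ids, and the original row indices kept.
    """
    keep = [i for i, row in enumerate(dtm) if sum(row) > 0]
    return [dtm[i] for i in keep], [doc_ids[i] for i in keep], keep


dtm = [[1, 0, 2],
       [0, 0, 0],   # an empty document, e.g. only stop words before preprocessing
       [0, 3, 0]]
print(drop_empty_documents(dtm, ["a", "b", "c"]))
```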

2 answers

LDA topic modeling - Training and testing

I have read about LDA and I understand the mathematics of how the topics are generated when one inputs a collection of documents. References say that LDA is an algorithm which, given a collection of documents and nothing more (no supervision needed), can…

lda topic-modeling

asked Jun 22 '12 at 18:52

tan

2 answers

Simple Python implementation of collaborative topic modeling?

I came across these 2 papers which combined collaborative filtering (Matrix factorization) and Topic modelling (LDA) to recommend users similar articles/posts based on topic terms of post/articles that users are interested in. The papers (in PDF)…

python machine-learning lda topic-modeling collaborative-filtering

asked Aug 25 '15 at 23:40

jxn

5 answers

Understanding LDA implementation using gensim

I am trying to understand how gensim package in Python implements Latent Dirichlet Allocation. I am doing the following: Define the dataset documents = ["Apple is releasing a new product", "Amazon sells many things", …

python gensim lda topic-modeling dirichlet

asked Dec 03 '13 at 11:31

visakh

10 answers

How to print the LDA topic models from gensim? Python

Using gensim, I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by the LDA models? When printing lda.print_topics(10), the code gave the following error because print_topics() returns a…

python nlp lda topic-modeling gensim

asked Feb 22 '13 at 02:47

alvas

2 answers

What's the disadvantage of LDA for short texts?

I am trying to understand why Latent Dirichlet Allocation (LDA) performs poorly in short-text environments like Twitter. I've read the paper 'A biterm topic model for short text'; however, I still do not understand "the sparsity of word…

nlp lda topic-modeling

asked Apr 22 '15 at 03:05

Shuguang Zhu
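On the question above: with only a handful of words per document, LDA has very little evidence for estimating each document's topic mixture. The biterm topic model (BTM) from the cited paper works around this by modelling unordered word pairs ("biterms") pooled across the whole corpus, instead of word occurrences within each document. The sketch below shows only the biterm-extraction step, treating each short text as a single context window; BTM's inference over the biterms is not shown, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """All unordered word pairs from one short text (the whole text is the context window)."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool biterms across the corpus; BTM models this pooled set rather than each document."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


tweets = [["apple", "iphone", "release"], ["iphone", "apple", "price"]]
print(corpus_biterm_counts(tweets).most_common(1))  # [(('apple', 'iphone'), 2)]
```

A text of n tokens yields n(n-1)/2 biterms, so even very short texts contribute several co-occurrence observations to the shared pool.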

2 answers

Topic models: cross validation with loglikelihood or perplexity

I'm clustering documents using topic modeling. I need to come up with the optimal number of topics. So, I decided to do ten-fold cross-validation with 10, 20, ... 60 topics. I have divided my corpus into ten batches and set aside one batch for a holdout…

r tm cross-validation topic-modeling

asked Jan 25 '14 at 17:52

user37874
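Held-out perplexity, a common criterion for choosing the number of topics in this kind of cross-validation, is the exponentiated negative average per-token log-likelihood: perplexity = exp(-Σ_d log p(w_d) / Σ_d N_d), where N_d is the number of tokens in held-out document d; lower is better. (R's topicmodels provides a `perplexity()` function for fitted models.) The sketch below assumes you can already obtain each held-out document's log-likelihood from your fitted model; it shows only the fold split and the aggregation. `fit_and_score` is a hypothetical callback you would implement with your library.

```python
import math
import random


def perplexity(log_likelihoods, token_counts):
    """exp(-(sum of held-out log-likelihoods) / (total held-out tokens))."""
    return math.exp(-sum(log_likelihoods) / sum(token_counts))


def k_fold_indices(n_docs, k, seed=0):
    """Shuffle document indices once and split them into k nearly equal folds."""
    idx = list(range(n_docs))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_perplexity(n_docs, k, n_topics, fit_and_score):
    """Average held-out perplexity over k folds.

    fit_and_score(train_idx, test_idx, n_topics) is a hypothetical callback that trains
    a model on train_idx and returns (log_likelihoods, token_counts) for test_idx.
    """
    folds = k_fold_indices(n_docs, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [d for j, fold in enumerate(folds) if j != i for d in fold]
        ll, counts = fit_and_score(train_idx, test_idx, n_topics)
        scores.append(perplexity(ll, counts))
    return sum(scores) / len(scores)
```

Sanity check: a model that spreads probability uniformly over a 4-word vocabulary gives every token log(1/4), so its perplexity is exactly 4. Averaging per-fold perplexities is one design choice; pooling all folds' log-likelihoods and token counts into a single `perplexity` call is another. Repeating the whole procedure for each candidate topic number (10, 20, ..., 60) gives the curve to compare.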

2 answers

Gensim: KeyError: "word not in vocabulary"

I have a Word2vec model trained with Python's Gensim library. I have a tokenized list as below. The vocab size is 34, but I am only showing a few of the 34: b = ['let', 'know', 'buy', 'someth', 'featur', 'mashabl', 'might', 'earn', 'affili', …

python nlp gensim word2vec topic-modeling

asked Jul 31 '17 at 15:59

Krishnang K Dalal

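This KeyError means the token was never added to the model's vocabulary. The most common causes are that it occurred fewer times than `min_count` (gensim's Word2Vec defaults to 5, which matters on a 34-word corpus) or that it was preprocessed differently at lookup time than at training time. A defensive pattern is to check membership before indexing; gensim's `KeyedVectors` (`model.wv`) supports the `in` operator. The sketch below uses a plain dict as a stand-in for the trained vectors; the helper names and the toy vectors are hypothetical.

```python
def split_known_tokens(tokens, vectors):
    """Partition tokens into those present in `vectors` and those that would raise KeyError.

    `vectors` is any mapping-like object supporting `in` and `[]`; a plain dict stands in
    here for a trained model's keyed vectors.
    """
    known = [t for t in tokens if t in vectors]
    missing = [t for t in tokens if t not in vectors]
    return known, missing


def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known, _ = split_known_tokens(tokens, vectors)
    if not known:
        return None
    dim = len(vectors[known[0]])
    return [sum(vectors[t][i] for t in known) / len(known) for i in range(dim)]


toy_vectors = {"let": [1.0, 0.0], "know": [0.0, 1.0]}  # hypothetical 2-d vectors
print(split_known_tokens(["let", "know", "buy"], toy_vectors))  # (['let', 'know'], ['buy'])
```

Printing the `missing` list is usually the quickest way to see whether the problem is `min_count` or inconsistent preprocessing (e.g. stemmed vs. unstemmed tokens).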

5 answers

Using scikit-learn vectorizers and vocabularies with gensim

I am trying to recycle scikit-learn vectorizer objects with gensim topic models. The reasons are simple: first of all, I already have a great deal of vectorized data; second, I prefer the interface and flexibility of scikit-learn vectorizers; third,…

python scikit-learn topic-modeling gensim

asked Feb 04 '14 at 12:25

emiguevara

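For the question above, gensim's bag-of-words corpus format is simply one list of (token_id, count) pairs per document, and a fitted scikit-learn CountVectorizer exposes its term-to-column mapping as `vocabulary_`. Reusing a vectorizer therefore mostly comes down to converting rows to (id, count) pairs and inverting `vocabulary_` into the id-to-word dict that gensim models accept as `id2word`. gensim also provides `gensim.matutils.Sparse2Corpus` for wrapping a SciPy sparse matrix directly. It treats columns as documents by default, so scikit-learn's one-document-per-row output needs `documents_columns=False`. The pure-Python sketch below uses a dense list-of-lists matrix as a stand-in for the vectorizer's output; the helper names are hypothetical.

```python
def rows_to_bow(dtm):
    """Convert a dense document-term matrix (documents as rows) into gensim-style
    bag-of-words: one list of (term_id, count) pairs per document, zeros omitted."""
    return [[(j, count) for j, count in enumerate(row) if count] for row in dtm]


def invert_vocabulary(vocabulary):
    """Turn a term -> column-index mapping (like CountVectorizer.vocabulary_)
    into the column-index -> term mapping gensim expects as id2word."""
    return {index: term for term, index in vocabulary.items()}


vocabulary = {"apple": 0, "amazon": 1, "product": 2}   # hypothetical fitted vocabulary
dtm = [[1, 0, 1],                                      # "Apple ... product"
       [0, 2, 0]]                                      # "Amazon ... Amazon"
print(rows_to_bow(dtm))               # [[(0, 1), (2, 1)], [(1, 2)]]
print(invert_vocabulary(vocabulary))  # {0: 'apple', 1: 'amazon', 2: 'product'}
```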

3 answers

Using Word2Vec for topic modeling

I have read that the most common technique for topic modeling (extracting possible topics from text) is Latent Dirichlet allocation (LDA). However, I am interested in whether it is a good idea to try out topic modeling with Word2Vec as it clusters…

nlp topic-modeling word2vec

asked Oct 06 '15 at 20:35

user1814735

3 answers

LDA with topicmodels, how can I see which topics different documents belong to?

I am using LDA from the topicmodels package, and I have run it on about 30,000 documents, acquired 30 topics, and got the top 10 words for the topics, which look very good. But I would like to see which documents belong to which topic with the…

r lda topic-modeling tm

asked Feb 14 '13 at 12:22

d12n
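After fitting, every document has a distribution over topics (its row of the document-topic matrix θ). In topicmodels, `posterior(model)$topics` returns that matrix and `topics(model)` gives the most likely topic per document. Because a document usually mixes several topics, it can also be useful to list every topic above a probability threshold. The language-neutral logic is sketched here in Python; the θ values and helper names are hypothetical.

```python
def dominant_topic(theta_row):
    """Index of the most probable topic for one document."""
    return max(range(len(theta_row)), key=lambda k: theta_row[k])


def topics_above(theta_row, threshold):
    """All (topic, probability) pairs at or above threshold, most probable first."""
    hits = [(k, p) for k, p in enumerate(theta_row) if p >= threshold]
    return sorted(hits, key=lambda kp: kp[1], reverse=True)


def documents_by_topic(theta, n_topics):
    """Group document indices by their dominant topic."""
    groups = {k: [] for k in range(n_topics)}
    for d, row in enumerate(theta):
        groups[dominant_topic(row)].append(d)
    return groups


theta = [[0.7, 0.2, 0.1],   # hypothetical document-topic proportions, one row per document
         [0.1, 0.3, 0.6],
         [0.5, 0.4, 0.1]]
print(documents_by_topic(theta, 3))  # {0: [0, 2], 1: [], 2: [1]}
```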

1 answer

Export pyLDAvis graphs as standalone webpage

I am analysing text with topic modelling, using Gensim and pyLDAvis. I would like to share the results with distant colleagues without them needing to install Python and all the required libraries. Is there a way to export interactive…

python gensim lda topic-modeling

asked Jan 30 '17 at 13:10

Darius

1 answer

Predicting LDA topics for new data

It looks like this question may have been asked a few times before (here and here), but it has yet to be answered. I'm hoping this is due to the previous ambiguity of the question(s) asked, as indicated by comments. I apologize if I am breaking…

r lda topic-modeling

asked Apr 20 '13 at 00:01

David

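Inferring topics for unseen documents usually means holding the trained topic-word distributions φ fixed and estimating only the new document's topic proportions θ ("folding in"). In topicmodels this is what `posterior(model, newdata)` is for; the usual advice is to build the new DocumentTermMatrix over the training vocabulary. The sketch below illustrates the idea with a simple EM fold-in (a maximum-likelihood estimate of θ plus a small smoothing pseudo-count). It is not the variational or Gibbs procedure any particular library uses, and `fold_in` and the φ values are hypothetical.

```python
def fold_in(word_counts, phi, n_iter=50, smoothing=0.01):
    """Estimate topic proportions for a new document with topic-word distributions fixed.

    word_counts: dict mapping word index -> count in the new document
    phi:         list of K topic-word distributions (lists indexed by word)
    smoothing:   small pseudo-count added to each topic to keep estimates positive
    """
    k_topics = len(phi)
    theta = [1.0 / k_topics] * k_topics  # start from uniform proportions
    for _ in range(n_iter):
        expected = [smoothing] * k_topics
        for w, n in word_counts.items():
            # E-step: responsibility of each topic for word w under the current theta
            weights = [theta[k] * phi[k][w] for k in range(k_topics)]
            total = sum(weights)
            if total == 0:
                continue  # word has zero probability under every topic; ignore it
            for k in range(k_topics):
                expected[k] += n * weights[k] / total
        # M-step: renormalise expected topic counts into proportions
        norm = sum(expected)
        theta = [e / norm for e in expected]
    return theta


phi = [[0.45, 0.45, 0.05, 0.05],   # hypothetical topic 0 over a 4-word vocabulary
       [0.05, 0.05, 0.45, 0.45]]   # hypothetical topic 1
print(fold_in({0: 3, 1: 2}, phi))  # mostly topic 0
```

Words in the new document that are missing from the training vocabulary have no column in φ, which is one reason the new document-term matrix has to be built over the training vocabulary.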

4 answers

LDA model generates different topics every time I train on the same corpus

I am using Python gensim to train a Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time I repeat the process, it generates different topics. Why do the same LDA parameters and corpus generate…

python nlp lda topic-modeling gensim

asked Feb 25 '13 at 13:08

alvas

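LDA training starts from random values, so separate runs can converge to different local optima, and on a corpus as small as 231 sentences the differences can be large. Fixing the seed makes runs repeatable; gensim's `LdaModel` accepts a `random_state` argument for this. The effect is sketched below with the random first step of a collapsed Gibbs sampler, one common LDA trainer (gensim itself uses variational inference, which also starts from random values). The function name is hypothetical. Note that a fixed seed makes results repeatable without making them stable, so comparing a few seeds is still informative.

```python
import random


def random_topic_assignments(docs, n_topics, seed=None):
    """Randomly assign a topic to every token, as a collapsed Gibbs sampler does at its first step.

    With seed=None the result typically changes between runs; with a fixed seed it is reproducible.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n_topics) for _ in doc] for doc in docs]


docs = [["apple", "release", "product"], ["amazon", "sells", "things"]]
run_a = random_topic_assignments(docs, n_topics=5, seed=42)
run_b = random_topic_assignments(docs, n_topics=5, seed=42)
print(run_a == run_b)  # True: same seed, same starting point
```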

1 answer

How to interpret LDA components (using sklearn)?

I used Latent Dirichlet Allocation (sklearn implementation) to analyse about 500 scientific article abstracts and I got topics containing the most important words (in German). My problem is interpreting these values associated with the most…

python-3.x scikit-learn lda topic-modeling

asked Feb 01 '16 at 20:53

LSz
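In scikit-learn's `LatentDirichletAllocation`, `components_` has shape (n_components, n_features) and holds unnormalised variational parameters. The documentation describes them as pseudo-counts of how often each word was assigned to each topic. Dividing each row by its sum turns it into that topic's distribution over words, which makes the values comparable across topics (with NumPy: `components_ / components_.sum(axis=1)[:, None]`). The word list would come from the fitted vectorizer, e.g. `get_feature_names_out()`. A pure-Python sketch of the normalisation and a top-words listing follows; the matrix, vocabulary and helper names are invented.

```python
def normalise_rows(components):
    """Divide each topic's pseudo-counts by their sum so each row is a probability distribution."""
    result = []
    for row in components:
        total = sum(row)
        result.append([value / total for value in row])
    return result


def top_words(components, vocabulary, n=3):
    """For each topic, the n words with the largest normalised weight, paired with that weight."""
    result = []
    for row in normalise_rows(components):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:n]
        result.append([(vocabulary[j], row[j]) for j in ranked])
    return result


components = [[8.0, 1.0, 1.0],   # hypothetical pseudo-counts for topic 0
              [1.0, 1.0, 2.0]]   # hypothetical pseudo-counts for topic 1
vocabulary = ["zelle", "protein", "gen"]
print(top_words(components, vocabulary, n=1))  # [[('zelle', 0.8)], [('gen', 0.5)]]
```

After normalisation, a value of 0.8 reads as "80% of this topic's probability mass is on this word", whereas the raw `components_` values also scale with how many words the topic was assigned overall.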
