Questions tagged [nltk]

The Natural Language Toolkit is a Python library for computational linguistics.

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python 2.7 and 3.2+.

NLTK provides a wide range of common natural language processing tools: a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. It also bundles many common corpora, including the Brown Corpus, Reuters, and WordNet, as well as a few non-English corpora in Portuguese, Polish, and Spanish.
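
As a quick illustration of a few of these tools (this assumes the punkt and averaged_perceptron_tagger resources have already been fetched with nltk.download()):

    import nltk

    # Tokenize a sentence and tag each token with its part of speech
    tokens = nltk.word_tokenize("NLTK makes natural language processing in Python approachable.")
    print(nltk.pos_tag(tokens))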

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 US licence. A citable paper, NLTK: The Natural Language Toolkit, was first published in 2003 and again in 2006, so that researchers can acknowledge NLTK's contribution to ongoing research in computational linguistics.

NLTK is currently distributed under the Apache 2.0 licence.

6577 questions
344 votes · 7 answers

What is "entropy and information gain"?

I am reading this book (NLTK) and it is confusing. Entropy is defined as: Entropy is the sum of the probability of each label times the log probability of that same label How can I apply entropy and maximum entropy in terms of text mining? Can…
TIMEX
  • 217,272
  • 324
  • 727
  • 1,038
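
For reference, a small sketch of the entropy formula quoted in this question, computed over a list of labels (the function name and example labels are illustrative only):

    import math
    from collections import Counter

    def label_entropy(labels):
        # H = -sum over labels of p(label) * log2(p(label))
        counts = Counter(labels)
        total = sum(counts.values())
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    print(label_entropy(["pos", "pos", "neg", "neg"]))  # 1.0 bit: maximally uncertain
    print(label_entropy(["pos", "pos", "pos", "pos"]))  # 0.0 bits: no uncertainty
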
163 votes · 16 answers

Failed loading english.pickle with nltk.data.load

When trying to load the punkt tokenizer... import nltk.data tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle') ...a LookupError was raised: > LookupError: > ********************************************************************* …
Martin
  • 1,633
  • 2
  • 12
  • 5
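
The usual cause of this LookupError is simply that the punkt model has not been downloaded yet; a minimal sketch of the common fix:

    import nltk

    # One-time download of the missing resource into the NLTK data directory
    nltk.download('punkt')

    import nltk.data
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print(tokenizer.tokenize("Hello there. How are you today?"))
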
155 votes · 10 answers

What is the difference between lemmatization and stemming?

When do I use each ? Also...is the NLTK lemmatization dependent upon Parts of Speech? Wouldn't it be more accurate if it was?
TIMEX
  • 217,272
  • 324
  • 727
  • 1,038
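
A short sketch of the difference, and of how the WordNet lemmatizer's output depends on the part of speech it is given (assumes the wordnet corpus has been downloaded):

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    # Stemming chops suffixes heuristically; lemmatization maps to a dictionary form
    print(stemmer.stem("running"))                   # run
    print(stemmer.stem("better"))                    # better (no dictionary knowledge)
    print(lemmatizer.lemmatize("better", pos="a"))   # good (adjective lemma)
    print(lemmatizer.lemmatize("running", pos="v"))  # run (verb lemma)
    print(lemmatizer.lemmatize("running"))           # running (defaults to noun, so unchanged)
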
151 votes · 15 answers

n-grams in python, four, five, six grams?

I'm looking for a way to split a text into n-grams. Normally I would do something like: import nltk from nltk import bigrams string = "I really like python, it's pretty awesome." string_bigrams = bigrams(string) print string_bigrams I am aware that…
Shifu
  • 1,863
  • 2
  • 14
  • 15
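
One way to generalise beyond bigrams is nltk.util.ngrams, sketched here for four-grams (assumes the punkt tokenizer is available for word_tokenize):

    from nltk import word_tokenize
    from nltk.util import ngrams

    text = "I really like python, it's pretty awesome."
    tokens = word_tokenize(text)

    # n can be anything: 4 for four-grams, 5 for five-grams, and so on
    fourgrams = list(ngrams(tokens, 4))
    print(fourgrams[:3])
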
149 votes · 8 answers

What are all possible pos tags of NLTK?

How do I find a list with all possible pos tags used by the Natural Language Toolkit (nltk)?
OrangeTux
  • 9,840
  • 7
  • 45
  • 67
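
NLTK can print its own tagset documentation; a brief sketch (requires the tagsets resource, e.g. nltk.download('tagsets')):

    import nltk

    nltk.help.upenn_tagset()        # every Penn Treebank tag with a description and examples
    nltk.help.upenn_tagset('NN.*')  # or only the tags matching a regular expression
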
148 votes · 10 answers

How to check if a word is an English word with Python?

I want to check in a Python program if a word is in the English dictionary. I believe nltk wordnet interface might be the way to go but I have no clue how to use it for such a simple task. def is_english_word(word): pass # how to I implement…
Barthelemy
  • 7,041
  • 6
  • 30
  • 35
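
Two common approaches, sketched with the words corpus and WordNet (both need a one-time nltk.download; the helper mirrors the stub in the question):

    from nltk.corpus import words, wordnet

    # Requires: nltk.download('words') and nltk.download('wordnet')
    english_vocab = set(w.lower() for w in words.words())

    def is_english_word(word):
        return word.lower() in english_vocab

    print(is_english_word("house"))        # True
    print(is_english_word("qwzjk"))        # False
    print(bool(wordnet.synsets("house")))  # True: WordNet has at least one synset for it
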
130 votes · 11 answers

How to get rid of punctuation using NLTK tokenizer?

I'm just starting to use NLTK and I don't quite understand how to get a list of words from text. If I use nltk.word_tokenize(), I get a list of words and punctuation. I need only the words instead. How can I get rid of punctuation? Also…
lizarisk
  • 6,692
  • 9
  • 42
  • 66
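
Two common ways to do this, sketched below: tokenize on word characters only, or tokenize normally and then filter:

    from nltk.tokenize import RegexpTokenizer, word_tokenize

    text = "Hello there! This is a test, isn't it?"

    # Option 1: a regex tokenizer that keeps runs of word characters only
    tokenizer = RegexpTokenizer(r'\w+')
    print(tokenizer.tokenize(text))

    # Option 2: tokenize normally, then drop tokens that contain no letters
    print([t for t in word_tokenize(text) if any(c.isalpha() for c in t)])
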
120 votes · 14 answers

How to remove stop words using nltk or python

So I have a dataset that I would like to remove stop words from using stopwords.words('english') I'm struggling how to use this within my code to just simply take out these words. I have a list of the words from this dataset already, the part i'm…
Alex
  • 1,595
  • 5
  • 14
  • 15
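
A small sketch of filtering a tokenized text against stopwords.words('english') (assumes the stopwords and punkt resources have been downloaded):

    from nltk.corpus import stopwords
    from nltk.tokenize import word_tokenize

    stop_words = set(stopwords.words('english'))

    text = "This is a sample sentence showing off stop word filtration."
    filtered = [w for w in word_tokenize(text) if w.lower() not in stop_words]
    print(filtered)
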
115 votes · 8 answers

How to check which versions of nltk and scikit-learn are installed?

In shell script I am checking whether this packages are installed or not, if not installed then install it. So withing shell script: import nltk echo nltk.__version__ but it stops shell script at import line in linux terminal tried to see in this…
nlper
  • 1,987
  • 5
  • 21
  • 35
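
The version checks have to run inside Python, not directly in the shell; a minimal sketch:

    import nltk
    import sklearn

    print("nltk:", nltk.__version__)
    print("scikit-learn:", sklearn.__version__)

    # From a shell script, the same check as a one-liner:
    #   python -c "import nltk, sklearn; print(nltk.__version__, sklearn.__version__)"
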
111 votes · 26 answers

pip issue installing almost any library

I have a difficult time using pip to install almost anything. I'm new to coding, so I thought maybe this is something I've been doing wrong and have opted out to easy_install to get most of what I needed done, which has generally worked. However,…
contentclown
  • 1,141
  • 2
  • 8
  • 8
107 votes · 18 answers

Resource u'tokenizers/punkt/english.pickle' not found

My Code: import nltk.data tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle') ERROR Message: [ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py Traceback (most recent call last): File "mapper_local_v1.0.py", line 16,…
Supreeth Meka
  • 1,679
  • 2
  • 12
  • 16
99 votes · 6 answers

Python: tf-idf-cosine: to find document similarity

I was following a tutorial which was available at Part 1 & Part 2. Unfortunately the author didn't have the time for the final section which involved using cosine similarity to actually find the distance between two documents. I followed the…
add-semi-colons
  • 14,928
  • 43
  • 126
  • 211
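
A sketch of the missing final step using scikit-learn, with a toy document set chosen only for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [
        "The cat sat on the mat.",
        "A cat was sitting on the mat.",
        "Stock prices fell sharply today.",
    ]

    tfidf = TfidfVectorizer().fit_transform(docs)

    # Cosine similarity of the first document against every document (including itself)
    print(cosine_similarity(tfidf[0:1], tfidf))
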
92 votes · 18 answers

How to use Stanford Parser in NLTK using Python

Is it possible to use Stanford Parser in NLTK? (I am not talking about Stanford POS.)
ThanaDaray
  • 1,573
  • 4
  • 20
  • 28
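
One way this is commonly done in newer NLTK versions is through nltk.parse.corenlp.CoreNLPParser, talking to a CoreNLP server; the URL below assumes a server already running locally on the default port:

    from nltk.parse.corenlp import CoreNLPParser

    parser = CoreNLPParser(url='http://localhost:9000')

    tree = next(parser.raw_parse("The quick brown fox jumps over the lazy dog."))
    tree.pretty_print()
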
90 votes · 7 answers

How to configure the NLTK data directory from code?

How to configure the NLTK data directory from code?
Juanjo Conti
  • 25,163
  • 37
  • 101
  • 128
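
A brief sketch: nltk.data.path is an ordinary list of search directories, and downloads can be pointed at a custom directory as well (the path below is only an example):

    import nltk

    # Make NLTK look in a custom directory first
    nltk.data.path.insert(0, '/home/me/nltk_data')

    # Optionally download resources straight into that directory
    nltk.download('punkt', download_dir='/home/me/nltk_data')
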
84 votes · 4 answers

Creating a new corpus with NLTK

I reckoned that often the answer to my title is to go and read the documentations, but I ran through the NLTK book but it doesn't give the answer. I'm kind of new to Python. I have a bunch of .txt files and I want to be able to use the corpus…
alvas
  • 94,813
  • 90
  • 365
  • 641
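
A sketch using PlaintextCorpusReader pointed at a directory of .txt files (the path is a placeholder; sents() additionally needs the punkt model):

    from nltk.corpus.reader.plaintext import PlaintextCorpusReader

    corpus = PlaintextCorpusReader('/path/to/txt/files', r'.*\.txt')

    print(corpus.fileids())      # the .txt files the reader found
    print(corpus.words()[:20])   # tokenized words across the corpus
    print(corpus.sents()[:2])    # sentence-segmented text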