Questions tagged [named-entity-extraction]

92 questions
21
votes
3 answers

Training n-gram NER with Stanford NLP

Recently I have been trying to train n-gram entities with Stanford CoreNLP. I followed the tutorial at http://nlp.stanford.edu/software/crf-faq.shtml#b. With this, I am able to specify only unigram tokens and the class each belongs to.…
20
votes
2 answers

How to use DBPedia to extract Tags/Keywords from content?

I am exploring how I can use Wikipedia's taxonomy information to extract Tags/Keywords from my content. I found articles about DBpedia. DBpedia is a community effort to extract structured information from Wikipedia and to make this information…
Pritam Raut
13
votes
6 answers

Extracting webpage information based on a template in Java

Right now I periodically use Jsoup to extract certain information (not all the text) from some third-party webpages. This works fine until the HTML of a webpage changes. Such a change forces a change to the existing Java code; this is…
vikasing
10
votes
1 answer

What are the entity types for NLTK?

I've been trying to find the full list of entity types of NLTK. I was only able to find the most common ones on this page, but not the full list. Could you please share the full list of named entity types NLTK has?
Furkanicus
10
votes
3 answers

Methods for extracting locations from text?

What are the recommended methods for extracting locations from free text? What I can think of is to use regex rules like "words ... in location". But are there better approaches than this? I can also think of having a lookup hash table with…
8
votes
1 answer

Difference between named entity recognition and resolution?

What is the difference between named entity recognition and named entity resolution? Would appreciate a practical example.
London guy
8
votes
4 answers

Entity extraction web services

Are there any paid or free named entity recognition web services available? Basically, I'm looking for something where, if I pass a text like "John had french fries at Burger King", it should identify something along the lines of: Person:…
Gublooo
7
votes
2 answers

How can Stanford CoreNLP Named Entity Recognition capture measurements like 5 inches, 5", 5 in., 5 in

I'm looking to capture measurements using Stanford CoreNLP. (If you can suggest a different extractor, that is fine too.) For example, I want to find 15kg, 15 kg, 15.0 kg, 15 kilogram, 15 lbs, 15 pounds, etc. But among CoreNLP's extraction rules, I…
6
votes
1 answer

How to recognize entities in text that is the output of optical character recognition (OCR)?

I am trying to do multi-class classification with textual data. The problem I am facing is that I have unstructured textual data. I'll explain the problem with an example. Consider this image: I want to extract and classify text information…
6
votes
1 answer

Semi-automatic annotation tool - How to find RDF Triplets

I'm developing a semi-automatic annotation tool for medical texts and I am completely lost in finding the RDF triplets for annotation. I am currently trying to use an NLP-based approach. I have already looked into Stanford NER and OpenNLP and they…
5
votes
1 answer

extending NLP entity extraction

We would like to identify neighborhoods and streets in various cities from a simple search. We use not only English but also various Cyrillic-script languages. We need to be able to identify spelling mistakes in locations. When looking at Python…
Dory Zidon
5
votes
3 answers

Entity Extraction Library

I’m looking for a library that does text analysis and extracts entities. The type/classification of an entity is not critical; it’s the identification of something that’s worthwhile that is critical. The universe of entities in this case is infinite,…
4
votes
4 answers

Fast algorithm to extract thousands of simple patterns out of large amounts of text

I want to be able to efficiently match thousands of regexps against GBs of text, knowing that most of these regexps will be fairly simple, like: \bBarack\s(Hussein\s)?Obama\b \b(John|J\.)\sBoehner\b etc. My current idea is to try to extract out of…
jp.
4
votes
5 answers

How do I do Entity Extraction in Lucene

I'm trying to do Entity Extraction (more like matching) in Lucene. Here is a sample workflow: Given some text (from a URL) AND a list of people's names, try to extract names of people from the text. Note: Names of people are not completely …
ankimal
3
votes
3 answers

Exact Dictionary based Named Entity Recognition with Stanford

I have a dictionary of named entities, extracted from Wikipedia. I want to use it as the dictionary of an NER. I want to know how I can use Stanford NER with this data. I have also downloaded LingPipe, although I have no idea how I can use…