
I'm interested in extracting triples (subject,predicate,object) from questions.

For example, I would like to transform the following question:

Who is the wife of the president of the USA?

into:

(x,isWifeOf,y) ∧ (y,isPresidentof,USA)

x and y are unknowns that we have to find in order to answer the question (∧ denotes conjunction).
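To make "x and y are unknowns" concrete: once a question has been turned into a conjunction of triple patterns, answering it means finding values for the unknowns that make every pattern match a stored fact. This is what a SPARQL basic graph pattern does against a store such as DBpedia. The sketch below is only an illustration under stated assumptions: it uses a tiny in-memory list of hypothetical facts, writes the unknowns SPARQL-style as `?x` and `?y`, and evaluates the query with a naive backtracking join rather than the indexed evaluation a real triple store would use.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical toy knowledge base; "?"-prefixed terms in queries are unknowns.
FACTS: List[Triple] = [
    ("Alice", "isWifeOf", "Bob"),
    ("Bob", "isPresidentof", "USA"),
    ("Carol", "isWifeOf", "Dave"),
]

def match(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if extended.get(p, f) != f:  # unknown already bound to another value
                return None
            extended[p] = f
        elif p != f:                     # constants must match exactly
            return None
    return extended

def solve(patterns: List[Triple], binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (naive backtracking join)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in FACTS:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from solve(rest, extended)

# (x, isWifeOf, y) ∧ (y, isPresidentof, USA)
query = [("?x", "isWifeOf", "?y"), ("?y", "isPresidentof", "USA")]
answers = list(solve(query))
print(answers)  # [{'?x': 'Alice', '?y': 'Bob'}]
```

The conjunction does the real work here: ("Carol", "isWifeOf", "Dave") satisfies the first pattern, but the binding ?y = Dave is rejected by the second one.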

I have read many papers on this topic, and I would like to perform this task using existing parsers such as the Stanford parser. As far as I know, parsers output two types of data:

  • parse structure tree (constituency relations)
  • dependency tree (dependency relations)

Some papers try to build triples from the constituency parse tree (e.g., "Triple Extraction from Sentences"); however, this approach seems too weak to deal with complicated questions.

On the other hand, dependency trees contain a lot of information relevant to triple extraction. Many papers claim to use them for this, but I haven't found any that explicitly gives a detailed procedure or algorithm. Most of the time, the authors say they analyze the dependencies to produce triples according to rules they don't spell out.
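To give an idea of what such dependency rules can look like, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the parse of the example question is hand-written in Universal Dependencies style (a real parser may attach words differently; for instance, older collapsed Stanford Dependencies express "of" as `prep_of`), and the two rules are hypothetical ones, not taken from any paper. Rule 1 treats a copular wh-question ("Who is the N ...") as saying that the wh-word and N are the same unknown; Rule 2 turns each "N1 of N2" modifier chain into a triple (N1, isN1Of, N2), with proper nouns as constants and other nouns as fresh unknowns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Token:
    idx: int      # 1-based position in the sentence
    text: str
    upos: str     # Universal POS tag
    head: int     # idx of the head token (0 for the root)
    deprel: str   # dependency relation to the head

# Hand-written, UD-style parse of "Who is the wife of the president of the USA?"
# (an assumption; actual parser output may differ).
QUESTION = [
    Token(1, "Who", "PRON", 4, "nsubj"),
    Token(2, "is", "AUX", 4, "cop"),
    Token(3, "the", "DET", 4, "det"),
    Token(4, "wife", "NOUN", 0, "root"),
    Token(5, "of", "ADP", 7, "case"),
    Token(6, "the", "DET", 7, "det"),
    Token(7, "president", "NOUN", 4, "nmod"),
    Token(8, "of", "ADP", 10, "case"),
    Token(9, "the", "DET", 10, "det"),
    Token(10, "USA", "PROPN", 7, "nmod"),
    Token(11, "?", "PUNCT", 4, "punct"),
]

WH_WORDS = {"who", "what", "which"}

def children(tokens: List[Token], head: int, deprel: Optional[str] = None) -> List[Token]:
    return [t for t in tokens if t.head == head and (deprel is None or t.deprel == deprel)]

def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    fresh = iter("xyzuvw")      # names for new unknowns
    terms: Dict[int, str] = {}  # token idx -> constant or "?var"

    def term(tok: Token) -> str:
        # Proper nouns are constants; any other noun or pronoun is an unknown.
        if tok.idx not in terms:
            terms[tok.idx] = tok.text if tok.upos == "PROPN" else "?" + next(fresh)
        return terms[tok.idx]

    root = next(t for t in tokens if t.deprel == "root")
    # Rule 1: in "Who is the N ...", the wh-word and N denote the same unknown.
    subj = children(tokens, root.idx, "nsubj")
    if subj and subj[0].text.lower() in WH_WORDS and children(tokens, root.idx, "cop"):
        terms[root.idx] = term(subj[0])

    # Rule 2: "N1 of N2" (nmod with case "of") yields (N1, isN1Of, N2);
    # follow the chain of such modifiers down from the root.
    triples = []
    stack = [root]
    while stack:
        head = stack.pop()
        for dep in children(tokens, head.idx, "nmod"):
            if any(c.text.lower() == "of" for c in children(tokens, dep.idx, "case")):
                triples.append((term(head), f"is{head.text.capitalize()}Of", term(dep)))
                stack.append(dep)
    return triples

print(extract_triples(QUESTION))
# [('?x', 'isWifeOf', '?y'), ('?y', 'isPresidentOf', 'USA')]
```

Two hand-written rules are enough for this one question, which also shows the weakness: every new question shape (passives, superlatives, "how many", n-ary relations) needs another rule, which is presumably why papers rarely publish their full rule sets.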

Does anyone know of a paper with more details on extracting (subject, predicate, object) triples from the dependency tree of a question?

David Batista
permanganate
  • This is certainly an interesting question, but it's not really on topic here at Stack Overflow. It's too broad, for one (as there are potentially lots and lots of answers), and requests for off-site resources, books, etc., are specifically off-topic: (from the close reasons): "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." – Joshua Taylor Oct 14 '14 at 12:01
  • All that said, how would you (programmatically) determine that "(x,isWifeOf,y) ∧ (y,isPresidentof,USA)" is *the* desired triplification? How do you determine which things should be constants and predicates? Why not (x,isFirstLadyOf,USA)? This will be especially important when you start handling genuinely n-ary relations. – Joshua Taylor Oct 14 '14 at 12:03
  • There is no canonical form for triples (many combinations are possible). In the above example, "(x,isWifeOf,y) ∧ (y,isPresidentof,USA)" is better than "(x,isFirstLadyOf,USA)" because a database (such as DBpedia) is more likely to contain entries for isPresidentof or isWifeOf than for isFirstLadyOf. As a first step, any correct triples would be good... – permanganate Oct 14 '14 at 12:49
  • Yes, I agree; my point was that your *question* doesn't have the technical specifications for what triples you're trying to extract from the given text. That's part of the reason that it's too broad and open ended for Stack Overflow (but quite possibly a question for a forum built for discussion). – Joshua Taylor Oct 14 '14 at 12:52
  • @permanganate Hi, mate, have you found a proper strategy to extract the triplet? Did you end up using typed dependencies or parsed tree or maybe both? – grumpynerd Apr 25 '15 at 12:16
  • Yes, I obtained great results thanks to CoreNLP from Stanford (mostly using the dependency tree) :) My work is part of "Projet Pensées Profondes" (http://projetpp.github.io/); you can find more details in this document: http://projetpp.github.io/documentation/finalReport.pdf (section 5.1). The algorithm (https://github.com/ProjetPP/PPP-QuestionParsing-Grammatical) has been substantially improved since the report was released; I will post a paper presenting the improvements in a few months. – permanganate Apr 25 '15 at 21:38
  • @permanganate thanks mate. – grumpynerd Apr 26 '15 at 09:49
  • You mentioned using either the parse tree OR the dependency parse; in my research I found that it's often useful to use BOTH. I describe my approach here: http://ieeexplore.ieee.org/document/7489041/?tp=&arnumber=7489041 – Josep Valls Apr 17 '17 at 18:03
  • Update: although these are not new ideas, the task you mention is what a method called "Semantic Role Labeling" (SRL) attempts to solve. SRL is an NLP task that aims to "label" the semantic role of each entity retrieved from text: it tries to find "agents", "goals", "results", etc. This could be used to identify roles and fill triplets. Keep in mind that this is still an open field of research, pursued by projects such as FrameNet. For more information, read: https://en.wikipedia.org/wiki/Semantic_role_labeling and https://web.stanford.edu/~jurafsky/slp3/18.pdf – Tiago Duque Jun 06 '19 at 13:23
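The SRL idea from this comment can be sketched as a mapping from predicate-argument frames to triples: agent to subject, predicate to predicate, patient/theme to object. The frame format below is an assumption for illustration (hypothetical PropBank-style dicts keyed by ARG0, V, ARG1), not the output format of any particular SRL tool, and modifier roles such as ARGM-TMP are simply ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical PropBank-style frames, as an SRL system might produce them
# for "Bob married Alice in 1992. Alice smiled."
FRAMES: List[Dict[str, str]] = [
    {"ARG0": "Bob", "V": "married", "ARG1": "Alice", "ARGM-TMP": "in 1992"},
    {"ARG0": "Alice", "V": "smiled"},  # intransitive: no ARG1, so no triple
]

def frames_to_triples(frames: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    triples = []
    for frame in frames:
        # ARG0 (agent) -> subject, V -> predicate, ARG1 (patient/theme) -> object.
        if all(role in frame for role in ("ARG0", "V", "ARG1")):
            triples.append((frame["ARG0"], frame["V"], frame["ARG1"]))
    return triples

print(frames_to_triples(FRAMES))  # [('Bob', 'married', 'Alice')]
```

The appeal over raw dependency rules is that SRL already normalizes surface variation: a passive like "Alice was married by Bob" should, in principle, yield the same ARG0/ARG1 assignment.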

1 Answer


Textacy has a decent implementation of triple extraction. It's built on top of spaCy, a popular NLP library in Python. You seem to be specifically interested in the underlying algorithm for triple extraction, so looking into the source code of their algorithm could give you some inspiration. See here: https://textacy.readthedocs.io/en/stable/_modules/textacy/extract.html#subject_verb_object_triples
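For a feel of what dependency-based subject-verb-object extraction involves, here is a much simplified sketch in plain Python. It is not textacy's implementation (see the linked source for that): the `Tok` tuples are hypothetical stand-ins for parsed tokens, the parse of the example sentence is hand-written, and the only rule is to pair each verb's subject children with its object children, joining any compound modifiers into the noun phrase. The label sets include both UD names (`obj`, `nsubj:pass`) and the names that, as far as I know, spaCy's English models use (`dobj`, `nsubjpass`).

```python
from typing import List, NamedTuple, Tuple

class Tok(NamedTuple):
    idx: int      # 1-based position in the sentence
    text: str
    upos: str     # Universal POS tag
    head: int     # idx of the head token (0 for the root)
    deprel: str   # dependency relation to the head

SUBJ = {"nsubj", "nsubj:pass", "nsubjpass"}
OBJ = {"obj", "dobj"}

def phrase(tokens: List[Tok], tok: Tok) -> str:
    # Join compound modifiers (e.g. "tax" in "tax bill") with the head noun.
    parts = [t for t in tokens if t.head == tok.idx and t.deprel == "compound"] + [tok]
    return " ".join(t.text for t in sorted(parts, key=lambda t: t.idx))

def svo_triples(tokens: List[Tok]) -> List[Tuple[str, str, str]]:
    triples = []
    for verb in (t for t in tokens if t.upos == "VERB"):
        deps = [t for t in tokens if t.head == verb.idx]
        subjects = [phrase(tokens, t) for t in deps if t.deprel in SUBJ]
        objects = [phrase(tokens, t) for t in deps if t.deprel in OBJ]
        triples.extend((s, verb.text, o) for s in subjects for o in objects)
    return triples

# Hand-written parse of "The Senate passed the tax bill." (an assumption).
PARSE = [
    Tok(1, "The", "DET", 2, "det"),
    Tok(2, "Senate", "PROPN", 3, "nsubj"),
    Tok(3, "passed", "VERB", 0, "root"),
    Tok(4, "the", "DET", 6, "det"),
    Tok(5, "tax", "NOUN", 6, "compound"),
    Tok(6, "bill", "NOUN", 3, "obj"),
]

print(svo_triples(PARSE))  # [('Senate', 'passed', 'tax bill')]
```

Note that with the UD-style analysis assumed here, a copular question such as "Who is the wife of the president of the USA?" has no VERB token ("is" is a copula tagged AUX), so this plain SVO pattern finds nothing in it; questions of that shape need extra rules on top of SVO extraction.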

Moritz