Get the K best parses of a sentence with Stanford Parser

Question

I want to get the K best parses of a sentence. I figured out that this can be done with the ExhaustivePCFGParser class, but I don't know how to use it; more precisely, how can I instantiate it? The constructor is ExhaustivePCFGParser(BinaryGrammar bg, UnaryGrammar ug, Lexicon lex, Options op, Index stateIndex, Index wordIndex, Index tagIndex), and I don't know how to supply all these parameters.

Is there an easier way to get the K best parses?

score 2 · Accepted Answer · edited Feb 09 '15 at 03:27

In general, you do things via a LexicalizedParser object, which is a "grammar" that provides all these things (the grammars, lexicon, indices, etc.).

From the command-line, the following will work:

java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt

At the API level, you need to get a LexicalizedParserQuery object. Once you have a LexicalizedParser lp (as in ParserDemo.java) you can do the following:

LexicalizedParser lp = ... // Load / train a model
LexicalizedParserQuery lpq = lp.parserQuery();
lpq.parse(sentence); // sentence must already be tokenized: a List of words, not a raw String
List<ScoredObject<Tree>> kBest = lpq.getKBestPCFGParses(20);

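A LexicalizedParserQuery is roughly equivalent to a Java regex Matcher: the LexicalizedParser plays the role of the reusable Pattern, and each query object holds the state of one parse.

Note: at present, k-best parsing works well only for PCFG grammars, not for factored grammars.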

edited Feb 09 '15 at 03:27

Jon Gauthier


answered Dec 28 '12 at 14:58

Christopher Manning


Thank you Chris, it works :). I just want to point out that the sentence in "lpq.parse(sentence);" has to be a tokenized string. – Amine Jan 30 '13 at 11:45
Agreed, you need to have gotten a List of words first, using either a DocumentPreprocessor or a Tokenizer (as in ParserDemo.java) or using other code of your own that does this. – Christopher Manning Feb 02 '13 at 18:38
@Amine did you get it working? I'm trying to get the k best parse trees of a sentence via the API but I'm getting a NullPointerException at edu.stanford.nlp.parser.lexparser.Debinarizer.transformTreeHelper(Debinarizer.java:34) `if ((!newChild.isLeaf()) && newChild.label().value().indexOf('@') >= 0)` – Josep Valls Jul 29 '13 at 05:57
Just tested again with v.3.2.0 release. Works for me. If you have a reproducible bug, please send it in. – Christopher Manning Nov 03 '13 at 16:58

score 0 · Answer 2 · answered Oct 16 '19 at 15:22

This is a workaround I implemented based on Christopher Manning's answer above, assuming you wish to use Python. The Python wrapper for CoreNLP does not implement "K-best parse trees", so the alternative is to run the terminal command:

java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 20 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt

Note that you need Stanford CoreNLP, with all of its JAR files, downloaded into a directory, as well as the prerequisite Python libraries installed (see the import statements).
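When the command is launched from a script, one option is to assemble it as an argument list instead of a single shell string, so that k and the input file become parameters and no shell quoting is involved. The sketch below is only an illustration of that idea: the helper names build_kbest_command and run_kbest are hypothetical, and the flags and paths are simply the ones from the command above.

```python
import subprocess

PARSER_CLASS = "edu.stanford.nlp.parser.lexparser.LexicalizedParser"
DEFAULT_MODEL = "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz"


def build_kbest_command(k, input_file="data/testsent.txt", model=DEFAULT_MODEL):
    """Return the k-best parser command as an argument list (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [
        "java", "-mx500m",
        "-cp", "*",  # passed literally; the JVM expands the classpath wildcard itself
        PARSER_CLASS,
        "-printPCFGkBest", str(k),
        model,
        input_file,
    ]


def run_kbest(corenlp_dir, k, input_file="data/testsent.txt"):
    """Run the parser from corenlp_dir and return its standard output as text."""
    result = subprocess.run(
        build_kbest_command(k, input_file),
        cwd=corenlp_dir,       # run where the JAR files live, without os.chdir
        capture_output=True,   # needs Python 3.7 or later
        text=True,
        check=True,            # raise CalledProcessError if java exits with an error
    )
    return result.stdout
```

Passing cwd to subprocess.run avoids changing the working directory of the whole Python process, which matters if the script later opens files by relative path.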

import os
import re
import subprocess
from nltk.tree import ParentedTree

ip_sent = "a quick brown fox jumps over the lazy dog."

corenlp_dir = "<Your path>/stanford-corenlp-full-2018-10-05" # Directory where the CoreNLP JAR files are stored
data_path = os.path.join(corenlp_dir, "data", "testsent.txt")
with open(data_path, "w") as file:
    file.write(ip_sent) # Write the sentence to the file that is fed into the LexicalizedParser

os.chdir(corenlp_dir) # Change the working directory so that -cp "*" picks up the JAR files
terminal_op = subprocess.check_output('java -mx500m -cp "*" edu.stanford.nlp.parser.lexparser.LexicalizedParser -printPCFGkBest 5 edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz data/testsent.txt', shell = True) # Run the command via the terminal and capture the output as bytes
op_string = terminal_op.decode('utf-8') # Convert to a string object
parse_set = re.split(r"# Parse \d+ with score -?\d+\.\d+\n", op_string) # Split the output at each "# Parse N with score S" header
parse_set = [p for p in parse_set if p.strip()] # Drop the empty chunk that precedes the first header
print(parse_set)

# Print the parse trees in a pretty_print format
for i in parse_set:
    parsetree = ParentedTree.fromstring(i)
    parsetree.pretty_print()
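The split step above throws away the rank and score printed in each header line. If those are needed, a small parser along the following lines could keep them. This is a sketch under an assumption: the header format "# Parse N with score S" is inferred from the split pattern used above, not taken from the parser's documentation, and the sample text is made up for illustration rather than real parser output. The function name split_kbest_output is hypothetical.

```python
import re

# Assumed header format, inferred from the split pattern used above:
#   "# Parse 1 with score -65.123"
HEADER = re.compile(r"^# Parse (\d+) with score (-?\d+(?:\.\d+)?)[ \t]*$", re.MULTILINE)


def split_kbest_output(output):
    """Return a list of (rank, score, tree_text) tuples from k-best parser output."""
    headers = list(HEADER.finditer(output))
    parses = []
    for i, match in enumerate(headers):
        # The tree text runs from the end of this header to the start of the next one.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(output)
        tree_text = output[match.end():end].strip()
        if tree_text:
            parses.append((int(match.group(1)), float(match.group(2)), tree_text))
    return parses


# Made-up sample in the assumed format (not real parser output).
sample = (
    "# Parse 1 with score -20.5\n"
    "(ROOT (NP (DT a) (NN dog)))\n"
    "\n"
    "# Parse 2 with score -21.25\n"
    "(ROOT (FRAG (NP (DT a) (NN dog))))\n"
)
for rank, score, tree_text in split_kbest_output(sample):
    print(rank, score, tree_text)
```

Each tree_text can then be passed to ParentedTree.fromstring as in the loop above, with the score kept alongside it for ranking or filtering.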

Hope this helps.
