
I'm new to Solr and I'm trying to build a question answering system. I have indexed some Wikipedia pages, for example Nikola Tesla: https://en.wikipedia.org/wiki/Nikola_Tesla

My question is: is it possible to type a query as a question in Solr, and if so, how?

I split the Wikipedia page by its "Contents" entries (each one corresponds to a sectionTitle), so for the query pageTitle:Nikola Tesla my results are:

"response":{"numFound":23,"start":0,"docs":[
{
        "sectionTitle":"First Paragraph",
        "pageTitle":"Nikola Tesla",
        "text":["Born and raised in the Austrian Empire, Tesla received an advanced education in engineering and physics in the 1870s and gained practical experience in the early 1880s working in telephony and at Continental Edison in the new electric power industry."]},
{
        "sectionTitle":"Early years",
        "pageTitle":"Nikola Tesla",
        "text":["Nikola Tesla was born an ethnic Serb in the village Smiljan, Lika county, in the Austrian Empire (present day Croatia), on 10 July [O.S. 28 June] 1856. etc.."]}]
  }}

My schema is the following:

  <field name="id" type="string" indexed="true" required="true" stored="true"/>
  <field name="pageTitle" type="text_en" indexed="true" stored="true"/>
  <field name="sectionTitle" type="text_en" indexed="true" stored="true"/>
  <field name="title" type="text_en" indexed="true" stored="true"/>
  <field name="text" type="text_general" indexed="true" stored="true"/>

Is it possible to type a query as a question, and how can I get back the results that best match the question? For example, using the documents above...

How can I type the query When Nikola Tesla born? and get back the paragraph:

"sectionTitle":"Early years",
"pageTitle":"Nikola Tesla",
"text":["Nikola Tesla was born an ethnic Serb in the village Smiljan, Lika county, in the Austrian Empire (present day Croatia), on 10 July [O.S. 28 June] 1856."]

Or type the query Where Nikola Tesla born? / Where Nikola Tesla raised? and get back:

"Born and raised in the Austrian Empire, Tesla received...."?

Thanks in advance.

Giorgio
  • Did you try pageTitle:"When Nikola Tesla born?" Is it giving you any results? – Abhijit Bashetti May 20 '19 at 09:24
  • To get results similar to the question, you can use Solr's MoreLikeThis feature. – Abhijit Bashetti May 20 '19 at 09:41
  • @Abhijit Bashetti Yes, I tried pageTitle:"When Nikola Tesla born?" and I didn't get any results; same for sectionTitle:"When Nikola Tesla born?". For text:"When Nikola Tesla born?" it shows me results, but from other pageTitles, not from Nikola Tesla. – Giorgio May 20 '19 at 09:53
  • Try defining the mm clause: https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html – Abhijit Bashetti May 20 '19 at 10:09
  • But the word "born" is in the field text. How would it be found if that text is not indexed in the field pageTitle or sectionTitle? – Abhijit Bashetti May 20 '19 at 10:13
  • @Abhijit Bashetti With the mm clause I haven't solved my problem. As you said the word "born" is in the field text so I think my search query should be mostly on the text field, for example `text:Where Nikola Tesla born?` The problem is that for this query documents that talk about something else are returned, I think because they contain the word "Where" in the text field. – Giorgio May 20 '19 at 13:07
  • What is returned? Relevant documents or irrelevant documents? – Abhijit Bashetti May 20 '19 at 14:23
  • @Abhijit Bashetti Unfortunately, irrelevant documents. – Giorgio May 20 '19 at 19:10
  • Check if this is of any help to you: https://medium.com/@pablocastelnovo/if-they-match-i-want-them-to-be-always-first-boosting-documents-in-apache-solr-with-the-boost-362abd36476c – Abhijit Bashetti May 21 '19 at 04:02
  • Also check the Boolean operators supported by the Standard Query Parser, or try Solr's proximity searches. – Abhijit Bashetti May 21 '19 at 05:35
  • Did the answer below help you? – Abhijit Bashetti Jun 06 '19 at 07:30
  • Yes, mine was a problem concerning boolean operators and the question mark "?". Your comments helped me understand what to do, thank you! – Giorgio Jun 06 '19 at 21:10

1 Answer


A proximity search looks for terms that are within a specific distance from one another.

To perform a proximity search, add the tilde character ~ and a numeric value to the end of a search phrase. For example, to search for "apache" and "jakarta" within 10 words of each other in a document, use the search:

"jakarta apache"~10

The distance referred to here is the number of term movements needed to match the specified phrase. In the example above, if "apache" and "jakarta" were 10 spaces apart in a field, but "apache" appeared before "jakarta", more than 10 term movements would be required to move the terms together and position "apache" to the right of "jakarta" with a space in between.

You can also try a proximity search in your case, as below.

text:"Nikola Tesla born?"~10

text:"Austrian Empire engineering"~10

text:"Tesla born?"~10
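
If you want to go from the typed question to a proximity query like the ones above, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the helper name question_to_proximity_query and the QUESTION_WORDS list are hypothetical, not part of Solr. It drops question words such as "When" and "Where" (which otherwise match unrelated documents) and strips the "?" before building the phrase.

```python
import re

# Hypothetical list of question words to drop; extend it for your own data.
QUESTION_WORDS = {"when", "where", "who", "what", "which", "why", "how"}


def question_to_proximity_query(question, field="text", slop=10):
    """Turn a natural-language question into a Solr proximity query string."""
    # Keep only word characters, so "?" and other punctuation never reach
    # the query parser.
    words = re.findall(r"\w+", question)
    # Drop question words such as "When" or "Where".
    terms = [w for w in words if w.lower() not in QUESTION_WORDS]
    if not terms:
        raise ValueError("no searchable terms left in question")
    # Quote the remaining terms as a phrase and append the slop value.
    return '{}:"{}"~{}'.format(field, " ".join(terms), slop)


print(question_to_proximity_query("When Nikola Tesla born?"))
# prints: text:"Nikola Tesla born"~10
```

The field name and the slop of 10 are taken from the queries above; a smaller slop makes the match stricter.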

Please refer to the images from the Solr admin tool.

For text:"Nikola Tesla born?"~10

Solr analysis 1

For text:"Austrian Empire engineering"~10

Solr analysis 2

For text:"Tesla born?"~10 the query is sent as http://localhost:8983/solr/TestCore/select?q=text:"Tesla born?"~10

Solr analysis 3
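
When you send such a query from code rather than from the admin tool, the quotes, the space and the "?" inside the phrase need to be percent-encoded in the URL. A minimal sketch using only the Python standard library; the host and the core name TestCore are taken from the URL above and are assumptions about your setup, and build_select_url is a hypothetical helper name:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed from the example URL above: local Solr, core named "TestCore".
SOLR_SELECT_URL = "http://localhost:8983/solr/TestCore/select"


def build_select_url(query, base_url=SOLR_SELECT_URL):
    """Return a select URL with the query percent-encoded in the q parameter."""
    return base_url + "?" + urlencode({"q": query})


url = build_select_url('text:"Tesla born?"~10')
print(url)

# Decoding the query string gives back the original query unchanged.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```

The second print shows that nothing is lost by the encoding: Solr receives exactly the query string you built.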

Abhijit Bashetti