
I just started working with Solr. I want to index some HTML pages and found this in the documentation:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@/home/binaryplease/workspace/SOLRTest/HTMLPages/hello2.html"

This works as expected: querying afterwards returns the expected results.

How would I do this exact POST inside a java application?

Since I don't know how to do this with HttpClient, I tried the following, but it's not working:

String command = "curl \"http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true\" -F \"myfile=@\"" + f.getAbsoluteFile() + "\"";

try {
    proc = Runtime.getRuntime().exec(command);

    InputStream in = proc.getInputStream();
    InputStream err = proc.getErrorStream();

    System.out.println("Inputstream " + getStringFromInputStream(in));
    System.out.println("Errorstream " + getStringFromInputStream(err));

} catch (IOException e) {
    e.printStackTrace();
}
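On Linux, `Runtime.exec(String)` splits the command string on whitespace and does not interpret shell quoting, so the `\"` characters in the string above reach curl as literal quote characters (and the quote right after `@` is also misplaced, ending the argument before the path). If you want to keep shelling out to curl, a minimal sketch that passes each argument separately via `ProcessBuilder` might look like this. It assumes `curl` is on the `PATH`; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class CurlSolrUpload {

    // Hypothetical helper: one list element per curl argument, so neither the
    // '&' in the URL nor spaces in the path need any shell quoting.
    static List<String> buildCommand(String updateUrl, File file) {
        return Arrays.asList(
                "curl",
                updateUrl,
                "-F",
                "myfile=@" + file.getAbsolutePath());
    }

    // Starts curl and returns its combined stdout/stderr once it exits.
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so a single read drains both
        Process proc = pb.start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = proc.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        proc.waitFor();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        String url = "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true";
        File f = new File("/home/binaryplease/workspace/SOLRTest/HTMLPages/hello2.html");
        System.out.println(run(buildCommand(url, f)));
    }
}
```

Merging stderr into stdout avoids the situation where the child process blocks because one of its two output pipes fills up while Java is only reading the other.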

What would be the correct way to index an HTML file in Solr and run a query from Java? I would appreciate an example.

EDIT: I now have this, which still isn't working:

    HttpClient httpclient = HttpClients.createDefault();
    HttpPost httppost = new HttpPost("http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true");

    // Request parameters and other properties.
    List<NameValuePair> params = new ArrayList<NameValuePair>(2);
    params.add(new BasicNameValuePair("myfile", "@/home/binaryplease/workspace/SOLRTest/HTMLPages/hello3.html"));
    httppost.setEntity(new UrlEncodedFormEntity(params, "UTF-8"));

    //Execute and get the response.
    HttpResponse response = httpclient.execute(httppost);
    HttpEntity entity = response.getEntity();

    if (entity != null) {
        InputStream instream = entity.getContent();
        try {
            System.out.println("Content " + getStringFromInputStream(instream));

        } finally {
            instream.close();
        }
    }
}

What am I doing wrong?

pinpox
  • Have you googled the phrase "sending http post in java"? It might lead you to [this StackOverflow question](http://stackoverflow.com/questions/3324717/sending-http-post-request-in-java) – Ray Toal Jul 06 '14 at 01:07
  • @RayToal see my edit. – pinpox Jul 06 '14 at 01:27
  • What do you mean when you say "not working" - Do you get an error? Or just don't see a desired outcome? Are there logs that you can provide? Can you debug and see if there are any exceptions thrown? It is a challenge for us to understand the full problem without specifics. – Srikanth Venugopalan Jul 06 '14 at 02:59
  • Well, there is no particular error. I get a 200 response, but the file is just not being indexed: if I query for a string that occurs in the HTML file, I get no results. – pinpox Jul 06 '14 at 08:39
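A likely cause of the silent failure: the `@` prefix is curl's own syntax for "read this file". `UrlEncodedFormEntity` knows nothing about it, so the EDIT code sends an ordinary form field `myfile` whose value is the literal text `@/home/...`, and Solr never receives the HTML. Because `commit=true` is in the URL, Solr can apparently treat such a request as a plain commit and still answer 200, which would be consistent with what you see. To send the file the way `curl -F` does, the body has to be `multipart/form-data` (with Apache HttpClient 4.3+, the httpmime module's `MultipartEntityBuilder` builds such a body). A minimal JDK-only sketch with `HttpURLConnection` follows; the class and method names and the fixed boundary string are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class MultipartSolrUpload {

    // Assumed fixed boundary; it must not appear anywhere inside the uploaded file.
    static final String BOUNDARY = "----SolrUploadBoundary7MA4YWxk";

    // Builds a multipart/form-data body with a single file part, which is
    // what curl -F "myfile=@file" sends.
    static byte[] buildBody(String fieldName, String fileName, String contentType, byte[] content) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        write(body, "--" + BOUNDARY + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n");
        body.write(content, 0, content.length);
        write(body, "\r\n--" + BOUNDARY + "--\r\n");
        return body.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    // Reads the file, POSTs it as multipart/form-data and returns the HTTP status.
    static int upload(String url, File file) throws IOException {
        byte[] body = buildBody("myfile", file.getName(), "text/html",
                Files.readAllBytes(file.toPath()));
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + BOUNDARY);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        File f = new File("/home/binaryplease/workspace/SOLRTest/HTMLPages/hello3.html");
        System.out.println("HTTP " + upload(
                "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true", f));
    }
}
```

The key difference from the EDIT code is that the file's bytes end up in the request body, inside a part that carries a `filename`, instead of the path string being sent as a form value.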

1 Answer

You should be using the SolrJ client to access Solr from Java, which will likely be much easier for you than going through the HTTP interface:

SolrJ is an API that makes it easy for Java applications to talk to Solr. SolrJ hides a lot of the details of connecting to Solr and allows your application to interact with Solr with simple high-level methods.

The center of SolrJ is the org.apache.solr.client.solrj package, which contains just five main classes. Begin by creating a SolrServer, which represents the Solr instance you want to use. Then send SolrRequests or SolrQuerys and get back SolrResponses.

SolrServer is abstract, so to connect to a remote Solr instance, you'll actually create an instance of HttpSolrServer, which knows how to use HTTP to talk to Solr.

https://cwiki.apache.org/confluence/display/solr/Using+SolrJ

The setup is pretty easy:

String urlString = "http://localhost:8983/solr";
SolrServer solr = new HttpSolrServer(urlString);

And so are queries:

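import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocumentList;

SolrQuery parameters = new SolrQuery();
parameters.set("q", mQueryString); // mQueryString: the text to search for

QueryResponse response = solr.query(parameters);

SolrDocumentList list = response.getResults();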

Same thing with indexing:

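import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

// SolrJ 4.x API. SolrJ 5.0 renamed these classes to SolrClient/HttpSolrClient,
// and the old SolrServer classes no longer exist in SolrJ 6.x.
String urlString = "http://localhost:8983/solr";
SolrServer solr = new HttpSolrServer(urlString);
SolrInputDocument document = new SolrInputDocument();
document.addField("id", "552199");
document.addField("name", "Gouda cheese wheel");
document.addField("price", "49.99");
UpdateResponse response = solr.add(document);

// Remember to commit your changes!
solr.commit();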
John Petrone
  • I tried this and it works fine, but how would I replace the `document.addField("id", "552199");` call with an HTML file from disk, so that I can search for any string that occurs in it? – pinpox Jul 06 '14 at 08:38
  • Use ContentStreamUpdateRequest in SolrJ. See http://wiki.apache.org/solr/ContentStreamUpdateRequestExample for an example. – MatsLindh Jul 06 '14 at 11:06
  • @fiskfisk that works nicely, but do I have to save the HTML page to a file (like in the example you gave), or is there a way to index a string containing all the HTML? – pinpox Jul 10 '14 at 17:38
  • Even though I have added the jar file `solr-solrj-6.4.1.jar` and added the dependency to my `pom.xml`, I'm still not able to add `import org.apache.solr.client.solrj.SolrServer;`, nor am I able to use the `setParser()` function. Any reason why? – HelloIT Mar 08 '17 at 16:50
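As for indexing a string of HTML without saving it to a file first: as far as I know, the extracting handler also accepts the document as the raw POST body, as long as the `Content-Type` header is a non-form type such as `text/html`. A minimal JDK-only sketch under that assumption, with a hypothetical class name and the `doc1` id reused from the question:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RawHtmlSolrUpload {

    // Builds the extract URL; the id is URL-encoded in case it contains '&', spaces, etc.
    static String extractUrl(String solrBase, String id) {
        try {
            return solrBase + "/update/extract?literal.id=" + URLEncoder.encode(id, "UTF-8")
                    + "&commit=true";
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Sends the HTML string itself as the request body, so no temporary file is needed.
    static int postHtml(String url, String html) throws IOException {
        byte[] body = html.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/html; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><p>hello solr</p></body></html>";
        System.out.println("HTTP " + postHtml(extractUrl("http://localhost:8983/solr", "doc1"), html));
    }
}
```

Encoding the id matters once ids come from page URLs or titles: an unencoded `&` would otherwise split `literal.id` into two request parameters.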