
We have defined a Solr schema in IBM Watson for a worksheet, modelled as a collection of questions. A few of the schema elements are multivalued fields. We can load and index documents using the Retrieve and Rank service, but when generating the training dataset we get data type conversion errors.

Schema

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="worksheet_number" type="watson_text_en" indexed="true" stored="true" />
    <field name='question_number' type='int' indexed='true' stored='true' multiValued='true' />
    <field name='question_type' type='watson_text_en' indexed='true' stored='true' multiValued='true' />
    <field name='answer' type='watson_text_en' indexed='true' stored='true' multiValued='true' />
    <field name='text' type='watson_text_en' indexed='true' stored='true' multiValued='true' />

Training command using train.py, which throws an exception:

    curl -u "***********":"************" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax15-rank-3108

Unfortunately, the exception gives no indication of which field is causing it.

Java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.Float
  at com.ibm.watson.hector.plugins.ss.FCFeatureGeneratorComponent.toCSV(FCFeatureGeneratorComponent.java:677)
  at com.ibm.watson.hector.plugins.ss.FCFeatureGeneratorComponent.process(FCFeatureGeneratorComponent.java:364)
  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
  at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:651)
  at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
  at org.eclipse.jetty.server.Server.handle(Server.java:499)
  at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
  at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
  at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
  at java.lang.Thread.run(Thread.java:745)\n","code":500}}

There appears to be an issue with generating a training dataset when the schema defines multivalued fields. Multivalued fields let us store multiple questions and texts for a given worksheet number, with any data type, e.g. the integers [1,2,3,4,5] in the question_number field.

When generating the training dataset, the Watson API throws the data type conversion error "java.util.ArrayList cannot be cast to java.lang.Float".

Radiodef
Nik A

1 Answer

This issue is potentially caused by a multi-valued field called "score" in your data. It could be explicitly defined in schema.xml, or it could be a valid dynamic field present in some documents. Could you check whether this is the case? If so, the field needs to be renamed (for example to "my_score"), because "score" conflicts with the name Solr hardcodes for returning scores.
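A quick way to run that check against schema.xml is sketched below. This is only an illustration: the function name `find_reserved_fields` and the `SAMPLE` schema (including the `int` type on its "score" field) are made up for the example and are not taken from the question. It only inspects explicit `<field>` declarations, so a "score" value arriving through a dynamic field would still have to be checked in the documents themselves.

```python
import xml.etree.ElementTree as ET

# "score" is the name Solr uses when returning scores, per the answer above.
RESERVED_NAMES = {"score"}


def find_reserved_fields(schema_xml):
    """Return (name, is_multivalued) for each <field> whose name is reserved."""
    root = ET.fromstring(schema_xml)
    hits = []
    for field in root.iter("field"):
        name = field.get("name", "")
        if name in RESERVED_NAMES:
            # A missing multiValued attribute is treated as "false" here.
            is_multi = field.get("multiValued", "false").lower() == "true"
            hits.append((name, is_multi))
    return hits


# Hypothetical schema fragment, for illustration only.
SAMPLE = """<schema name="example" version="1.5">
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="score" type="int" indexed="true" stored="true" multiValued="true"/>
</schema>"""

if __name__ == "__main__":
    for name, is_multi in find_reserved_fields(SAMPLE):
        print(f"field {name!r} collides with a reserved name (multiValued={is_multi})")
```

To check a real schema, read the file contents and pass them in, e.g. `find_reserved_fields(open("schema.xml").read())`.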

Wenlong
  • Thanks for the support. We do have a 'score' field; I will modify the schema, try the upload again and confirm whether it works. – Nik A Jun 20 '16 at 10:20
  • It is working now. We got past the exception and it is creating the training dataset. – Nik A Jun 21 '16 at 08:57
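Renaming the field in the schema also means renaming it in the documents before they are uploaded again. A minimal sketch of that step, assuming the documents are held as a list of Python dicts; the helper `rename_field` and the sample document are hypothetical and not part of the Retrieve and Rank tooling:

```python
def rename_field(docs, old="score", new="my_score"):
    """Return copies of the documents with the key `old` renamed to `new`."""
    renamed = []
    for doc in docs:
        copy = dict(doc)  # shallow copy so the caller's documents are not modified
        if old in copy:
            copy[new] = copy.pop(old)
        renamed.append(copy)
    return renamed


if __name__ == "__main__":
    # Hypothetical document, for illustration only.
    docs = [{"id": "1", "question_number": [1, 2, 3], "score": [10, 20, 30]}]
    print(rename_field(docs))
```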