We have defined a Solr schema in IBM Watson for worksheets, each modeled as a collection of questions. A few of the schema elements are multivalued fields. We can load and index documents using the Retrieve and Rank service, but when we generate the training dataset, we get data type conversion errors.
Schema
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="worksheet_number" type="watson_text_en" indexed="true" stored="true" />
<field name='question_number' type='int' indexed='true' stored='true' multiValued='true' />
<field name='question_type' type='watson_text_en' indexed='true' stored='true' multiValued='true' />
<field name='answer' type='watson_text_en' indexed='true' stored='true' multiValued='true' />
<field name='text' type='watson_text_en' indexed='true' stored='true' multiValued='true' />
Training command (run through train.py), which throws an exception:
curl -u "***********":"************" "https://gateway.watsonplatform.net/retrieve-and-rank/api/v1/rankers/3b140ax15-rank-3108
Unfortunately, the exception doesn't indicate which field caused it.
java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.Float
at com.ibm.watson.hector.plugins.ss.FCFeatureGeneratorComponent.toCSV(FCFeatureGeneratorComponent.java:677)
at com.ibm.watson.hector.plugins.ss.FCFeatureGeneratorComponent.process(FCFeatureGeneratorComponent.java:364)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:651)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
(The trace was returned in the JSON error response with "code": 500.)
The problem appears to involve the multivalued fields defined in the schema when the training dataset is generated for them. Multivalued fields let us store multiple questions and texts for a given worksheet number, with any field type; for example, the int field question_number holds the values [1,2,3,4,5].
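Because the exception doesn't name the offending field, one way to narrow it down is to list which fields in a document carry list values, since those are the ones that would surface as a java.util.ArrayList. This is a minimal sketch that assumes a document is held client-side as a field-name-to-value map before indexing; the class name, method names, and sample values are hypothetical and not part of the Retrieve and Rank API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one worksheet document as a field-name -> value map.
 * Fields declared multiValued="true" in the schema hold a List here;
 * single-valued fields hold a scalar.
 */
public class WorksheetFields {

    /** Returns the names of fields whose value is a List, in map order. */
    static List<String> listValuedFields(Map<String, Object> doc) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            if (e.getValue() instanceof List) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    /** Builds a sample document with two questions; all values are made up. */
    static Map<String, Object> sampleDocument() {
        Map<String, Object> doc = new LinkedHashMap<>(); // keeps insertion order
        doc.put("id", "ws-1");                           // single-valued string
        doc.put("worksheet_number", "1");                // single-valued text
        doc.put("question_number", List.of(1, 2));       // multivalued int
        doc.put("question_type", List.of("short answer", "multiple choice"));
        doc.put("answer", List.of("42", "B"));
        doc.put("text", List.of("What is 6 x 7?", "Pick the correct option."));
        return doc;
    }

    public static void main(String[] args) {
        // Prints the candidate fields that would arrive as lists.
        System.out.println(listValuedFields(sampleDocument()));
    }
}
```

For the sample document this prints [question_number, question_type, answer, text], which matches the four fields declared with multiValued='true' in the schema above.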
When the training dataset is generated, the Watson API throws the data type conversion error "java.util.ArrayList cannot be cast to java.lang.Float".
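The error message is consistent with code that reads a field value as a plain Object and casts it to Float, while a multivalued field comes back as a java.util.ArrayList. The sketch below reproduces only that message; it is not the Watson FCFeatureGeneratorComponent code, whose internals we can't see. The assumption that a field value is cast directly to Float, and the helper names naiveToFloat and checkedToFloat, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal reproduction of the ClassCastException message.
 * Assumption: a field value is read as an Object and cast to Float;
 * a multivalued field's value is a java.util.ArrayList.
 */
public class CastRepro {

    /** Mimics the assumed unchecked cast; throws ClassCastException for a List. */
    static float naiveToFloat(Object value) {
        return (Float) value;
    }

    /** Hypothetical defensive variant: rejects lists with a message naming the field. */
    static float checkedToFloat(String field, Object value) {
        if (value instanceof List) {
            throw new IllegalArgumentException(
                "field '" + field + "' is multivalued: " + value);
        }
        return ((Number) value).floatValue();
    }

    public static void main(String[] args) {
        Object multi = new ArrayList<>(Arrays.asList(1f, 2f, 3f, 4f, 5f));
        try {
            naiveToFloat(multi);
        } catch (ClassCastException e) {
            // Exact wording differs between JDK versions.
            System.out.println("ClassCastException: " + e.getMessage());
        }
        try {
            checkedToFloat("question_number", multi);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(checkedToFloat("question_number", 3));
    }
}
```

On Java 8 the first line matches the message in the trace above; newer JDKs phrase it as "class java.util.ArrayList cannot be cast to class java.lang.Float" with module details appended.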