
I would like to write a single SPARQL query that finds the k nearest neighbors of each vector in a set of vectors. To find the average label of the 100 nearest neighbors of a single vector, I can use the following query:

PREFIX : <ml://>
PREFIX vector: <ml://vector/>
PREFIX feature: <ml://feature/>

SELECT (AVG(?label) as ?prediction)
WHERE {
  {
    SELECT ?other_vector (COUNT(?common_feature) as ?similarity)
    WHERE {
      vector:0 :has ?common_feature .
      ?other_vector :has ?common_feature .
    }
    GROUP BY ?other_vector
    ORDER BY DESC(?similarity)
    LIMIT 100
  }
  ?other_vector :hasLabel ?label .
}
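To make the semantics of this query concrete, here is a rough Python sketch over an in-memory stand-in for the graph. The dictionaries `HAS` and `LABEL`, the vector and feature names, and the function `predict` are all hypothetical; they only mirror `:has` and `:hasLabel`. Note that, as in the query above, the query vector shares every feature with itself and therefore counts as its own nearest neighbor.

```python
from collections import Counter

# Hypothetical toy data standing in for the RDF graph:
# HAS mirrors `?v :has ?feature`, LABEL mirrors `?v :hasLabel ?label`.
HAS = {
    "v0": {"f1", "f2", "f3"},
    "v1": {"f1", "f2"},
    "v2": {"f3"},
    "v3": {"f4"},
}
LABEL = {"v0": 1.0, "v1": 1.0, "v2": 0.0, "v3": 0.0}


def predict(query, k):
    """Similarity = number of shared features; average the labels of the
    k most similar vectors (the query vector itself is not excluded)."""
    similarity = Counter()
    for other, features in HAS.items():
        shared = len(HAS[query] & features)
        if shared:  # the join only produces vectors with a common feature
            similarity[other] = shared
    # ORDER BY DESC(?similarity) LIMIT k
    nearest = [v for v, _ in similarity.most_common(k)]
    # AVG(?label) over the neighbors that have a label
    labels = [LABEL[v] for v in nearest if v in LABEL]
    return sum(labels) / len(labels) if labels else None
```

For example, `predict("v0", 2)` averages the labels of `v0` itself and `v1`.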

Is there a way to do this for multiple vectors in a single query?

Abraham D Flaxman

1 Answer


Unless I'm overlooking something, you can do this by replacing the URI vector:0 with a variable, like so:

SELECT ?vector (AVG(?label) as ?prediction)
WHERE {
  {
    SELECT ?vector ?other_vector (COUNT(?common_feature) as ?similarity)
    WHERE {
      ?vector :has ?common_feature .
      ?other_vector :has ?common_feature .
      FILTER(?vector != ?other_vector)
    }
    GROUP BY ?other_vector
    ORDER BY DESC(?similarity)
    LIMIT 100
  }
  ?other_vector :hasLabel ?label .
}

I added a filter condition to check that ?vector and ?other_vector are not equal; whether that is necessary is up to you, of course :)

If you need to restrict the list of vectors for which you want to find a match, you can use a VALUES clause to restrict possible bindings for ?vector:

VALUES ?vector { vector:0 vector:1 ... } 
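The following Python sketch mimics what this approach computes once ?vector is a variable. `query_vectors` plays the role of the VALUES clause and skipping `vec == other` plays the role of the FILTER; similarity is computed per (vector, other) pair, i.e. as if grouped by both variables. The toy graph `HAS` and the function names are hypothetical. What it illustrates is that the subquery has only one ORDER BY ... LIMIT, which ranks all (?vector, ?other_vector) pairs together: restricting ?vector shrinks the pool of pairs but does not give each vector its own limit.

```python
from collections import Counter
from itertools import product

# Hypothetical toy graph: vector -> set of features (mirrors `:has`).
HAS = {
    "v": {"f1", "f2", "f3"},
    "a": {"f1", "f2", "f3"},
    "b": {"f1", "f2"},
    "c": {"f1"},
    "u": {"g1", "g2"},
    "d": {"g1"},
}


def pair_similarities(query_vectors):
    """Shared-feature counts for (?vector, ?other_vector) pairs."""
    sims = Counter()
    for vec, other in product(query_vectors, HAS):
        if vec == other:  # FILTER(?vector != ?other_vector)
            continue
        shared = len(HAS[vec] & HAS[other])
        if shared:  # the join only yields pairs with a common feature
            sims[(vec, other)] = shared
    return sims


def global_top(query_vectors, limit):
    """A single ORDER BY DESC(?similarity) LIMIT over all pairs at once."""
    return [pair for pair, _ in pair_similarities(query_vectors).most_common(limit)]
```

With `query_vectors = ["v", "u"]` and a limit of 2, both surviving pairs belong to `v` and `u` gets no neighbors at all.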
Jeen Broekstra
  • Thank you for your help. If I understand this correctly, however, I don't think it solves my problem. Syntactically, ?vector cannot appear in the SELECT since it is not included in the GROUP BY. But just adding it, as in "GROUP BY ?vector ?other_vector", will use the 100 most similar vectors _overall_, while I want the 100 most similar for each vector on my list. – Abraham D Flaxman Jan 28 '13 at 23:14
  • That's not a problem; you can add the VALUES clause to the inner SELECT to restrict ?vector to your list. – Jeen Broekstra Jan 29 '13 at 02:06
  • Alternatively, you can leave the `?vector` variable out of the projection; I just assumed you needed it to be able to correlate the results. – Jeen Broekstra Jan 29 '13 at 04:54
  • I think you have misunderstood the problem, or perhaps I have stated it unclearly. For the k=2 case, if v has neighbors from nearest to furthest (a,b,c), and u has neighbors (d,e,f), I would like results (v,a), (v,b), (u,d), (u,e). But after fixing the syntax in your query, if v is nearer to c than u is to d, the results will be (v,a), (v,b), (v,c), (u,d). In other words, the challenge is to get k neighbors for _each_ vector on the list. – Abraham D Flaxman Jan 29 '13 at 19:40
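One way to get k neighbors for each vector is to rank by counting: keep a pair (vector, other) only if fewer than k candidates for the same vector are strictly more similar. The Python sketch below applies this to hypothetical similarity scores chosen to match the example in the last comment (v is nearer to c than u is to d); `SIMILARITY` and `per_vector_top_k` are illustrative names. In SPARQL 1.1 the same idea is commonly expressed by joining the similarity subquery with a second copy of itself on ?vector inside an OPTIONAL, filtering for higher similarity, and keeping groups with `HAVING (COUNT(?rival) < k)`; treat that translation as an outline rather than a tested query. Pairs with equal similarity share a rank, so more than k pairs can survive when there are ties.

```python
# Hypothetical similarity scores for the comment's example:
# v's neighbors from nearest to furthest are a, b, c; u's are d, e, f;
# and v is nearer to c than u is to d.
SIMILARITY = {
    ("v", "a"): 9, ("v", "b"): 8, ("v", "c"): 7,
    ("u", "d"): 6, ("u", "e"): 5, ("u", "f"): 4,
}


def per_vector_top_k(sims, k):
    """Keep (vector, other) when fewer than k rivals for the same
    vector have a strictly higher similarity."""
    kept = []
    for (vec, other), score in sims.items():
        rivals = sum(
            1
            for (vec2, _), score2 in sims.items()
            if vec2 == vec and score2 > score
        )
        if rivals < k:
            kept.append((vec, other))
    return kept
```

With k = 2 this keeps (v, a), (v, b), (u, d), (u, e), whereas taking the top 4 pairs overall from the same scores gives (v, a), (v, b), (v, c), (u, d).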