Is it possible to get the position of an element in an RDF Collection in SPARQL?

Question

Suppose that I have the following Turtle declaration:

@prefix : <http://example.org#> .

:ls :list (:a :b :c) .

Is there a way to get the positions of the elements in the collection?

For example, with this query:

PREFIX :     <http://example.org#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 

SELECT ?elem WHERE {
 ?x :list ?ls .
 ?ls rdf:rest*/rdf:first ?elem .
}

I get:

--------
| elem |
========
| :a   |
| :b   |
| :c   |
--------

But I would like a query to obtain:

--------------
| elem | pos |
==============
| :a   |  0  |
| :b   |  1  |
| :c   |  2  |
--------------

Is it possible?

I believe all known SPARQL engines return the results of this query in the expected order. So if you don't need the index, maybe that is enough? — Vladimir Alexiev, Nov 22 '18 at 21:07

score 46 · Accepted Answer · edited May 23 '17 at 12:10

A Pure SPARQL 1.1 Solution

I've extended the data to make the problem a little harder. Let's add a duplicate element to the list, e.g., an additional :a at the end:

@prefix : <http://example.org#> .

:ls :list (:a :b :c :a) .

Then we can use a query like this to extract each list node (and its element) along with the position of the node in the list. The idea is that we can match all the individual nodes in the list with a pattern like [] :list/rdf:rest* ?node. The position of each node, though, is the number of intermediate nodes between the head of the list and ?node. We can match each of those intermediate nodes by breaking the pattern down into

[] :list/rdf:rest* ?mid . ?mid rdf:rest* ?node .

Then if we group by ?node, the number of ?mid bindings is one more than the position of ?node in the list, because rdf:rest* also matches zero steps and so ?mid can be ?node itself. Subtracting one gives a zero-based position. Thus we can use the following query (which also grabs the element (the rdf:first) associated with each node) to get the positions of elements in the list:

prefix : <http://example.org#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?element (count(?mid)-1 as ?position) where { 
  [] :list/rdf:rest* ?mid . ?mid rdf:rest* ?node .
  ?node rdf:first ?element .
}
group by ?node ?element

----------------------
| element | position |
======================
| :a      | 0        |
| :b      | 1        |
| :c      | 2        |
| :a      | 3        |
----------------------

This works because the structure of an RDF list is a linked list like this (where ?head is the beginning of the list (the object of :list), and is another binding of ?mid because of the pattern [] :list/rdf:rest* ?mid):

[Figure: graphical representation of the RDF list]
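
The counting argument can also be checked outside SPARQL. The following Python sketch is only a model of the idea, not how any SPARQL engine evaluates the query. It assumes made-up node names n0..n3 for the blank nodes, a hypothetical `first` dictionary for the rdf:first links and a `rest` dictionary for the rdf:rest links, and it mirrors the two rdf:rest* patterns by counting, for each node, how many nodes it can be reached from.

```python
# Model of the RDF list (:a :b :c :a) as cons cells.
# Node names n0..n3 are hypothetical stand-ins for the blank nodes.
NIL = "rdf:nil"
first = {"n0": ":a", "n1": ":b", "n2": ":c", "n3": ":a"}  # rdf:first
rest = {"n0": "n1", "n1": "n2", "n2": "n3", "n3": NIL}    # rdf:rest
head = "n0"                                               # object of :list


def reachable(start):
    """Nodes matched by `start rdf:rest* ?x` (zero or more rdf:rest steps)."""
    nodes = []
    node = start
    while True:
        nodes.append(node)
        if node == NIL:
            return nodes
        node = rest[node]


def positions():
    """Mirror the query: group the (?mid, ?node) pairs by ?node and count ?mid."""
    counts = {}
    for mid in reachable(head):          # [] :list/rdf:rest* ?mid
        for node in reachable(mid):      # ?mid rdf:rest* ?node
            if node in first:            # ?node rdf:first ?element
                counts[node] = counts.get(node, 0) + 1
    # count(?mid) - 1 is the position, because ?mid also matches ?node itself
    return [(first[node], n - 1) for node, n in counts.items()]


print(positions())
```

Grouping by node rather than by element is what keeps the two occurrences of :a apart, just as `group by ?node ?element` does in the query.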

Comparison with Jena ARQ Extensions

The asker of the question also posted an answer that uses Jena's ARQ extensions for working with RDF lists. The solution posted in that answer is

PREFIX :     <http://example.org#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>

SELECT ?elem ?pos WHERE {
 ?x :list ?ls .
 ?ls list:index (?pos ?elem).
}

This answer depends on using Jena's ARQ and enabling the extensions, but it is more concise and transparent. What isn't obvious is whether one of the two performs noticeably better. As it turns out, for small lists the difference isn't particularly significant, but for larger lists the ARQ extensions perform much better. The runtime of the pure SPARQL query quickly becomes prohibitively long, while the runtime of the version using the ARQ extensions barely changes.

-------------------------------------------
| num elements | pure SPARQL | list:index |
===========================================
|      50      |    1.1s     |    0.8s    |
|     100      |    1.5s     |    0.8s    |
|     150      |    2.5s     |    0.8s    |
|     200      |    4.8s     |    0.8s    |
|     250      |    9.7s     |    0.8s    |
-------------------------------------------

These specific values will obviously differ depending on your setup, but the general trend should be observable anywhere. Since things could change in the future, here's the particular version of ARQ I'm using:

$ arq --version
Jena:       VERSION: 2.10.0
Jena:       BUILD_DATE: 2013-02-20T12:04:26+0000
ARQ:        VERSION: 2.10.0
ARQ:        BUILD_DATE: 2013-02-20T12:04:26+0000

As such, if I knew that I had to process lists of non-trivial sizes and that I had ARQ available, I'd use the extension.
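
One plausible contribution to the gap, offered as an assumption rather than something measured here: before grouping, the pure SPARQL pattern produces one solution per (?mid, ?node) pair, which is n(n+1)/2 solutions for a list of n elements, whereas a traversal that tracks an index only needs to visit each node once. The sketch below counts those pairs for the list sizes in the table. `pair_count` is a hypothetical helper, and the real cost depends on how the engine evaluates the property paths, so treat the numbers as a lower bound on intermediate solutions, not as a prediction of runtime.

```python
def pair_count(n):
    """Number of (?mid, ?node) pairs where ?node is reachable from ?mid via
    rdf:rest*, counting only the n list nodes that carry an rdf:first."""
    # The node at position k is reachable from the k + 1 nodes at positions 0..k.
    return sum(k + 1 for k in range(n))


for n in (50, 100, 150, 200, 250):
    # List size, then the number of solutions produced before aggregation
    print(n, pair_count(n))
```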

@Labra I compared the performance of the pure SPARQL query with the one that you proposed that uses the ARQ extensions. The ARQ extensions are _much_ better performance-wise. If you can use Jena, I'd suggest that you keep using the extensions. — Joshua Taylor, Jul 09 '13 at 12:48
FYI, this answer was [cited in the mailing list thread *SPARQL-friendly alternative to rdf:Lists?*](https://lists.w3.org/Archives/Public/semantic-web/2013Oct/0069.html) — unor, Jan 09 '17 at 19:59

score 4 · Answer 2 · answered Jul 08 '13 at 13:46

I have found a way to do it using the property function library in ARQ. As Steve Harris says, this is non-standard.

PREFIX :     <http://example.org#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
PREFIX list: <http://jena.hpl.hp.com/ARQ/list#>

SELECT ?elem ?pos WHERE {
 ?x :list ?ls .
 ?ls list:index (?pos ?elem).
}

Labra

It is non-standard but, as I mentioned in [my answer](http://stackoverflow.com/a/17530689/1281433), it's _much_ faster than the pure SPARQL solution. If it's an option (i.e., if you know that you have ARQ available), it may well be worth using the non-standard extension! – Joshua Taylor Jul 26 '13 at 00:31

score 2 · Answer 3 · answered Jul 08 '13 at 11:04

TL;DR - short answer no with a but, long answer yes with an if.

Short answer

Not without stepping outside the standard, unless your lists have a bounded length, in which case you can do something dirty like:

{ ?x :list (:a) . BIND(1 AS ?length) }
UNION
{ ?x :list ([] :a) . BIND(2 AS ?length) }
UNION
{ ?x :list ([] [] :a) . BIND(3 AS ?length) }
...

etc.

Some RDF query engines have non-standard functions that operate on RDF Lists, but you'd have to consult the documentation for your system.

Long answer

This is a symptom of RDF Lists having a terrible structure and definition. Somehow we ended up with two different ways of representing lists, both of which are horrible to work with!

If you control the data, use some more sensible representation, e.g.

<x> :member [
   rdf:value :a ;
   :ordinal 1 ;
], [
   rdf:value :b ;
   :ordinal 2 ;
], [
   rdf:value :c ;
   :ordinal 3 ;
]
...

then you can query with:

{ <x> :member [ rdf:value :a ; :ordinal ?position ] }
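
If you do control the data, the ordinal representation is easy to generate. The following is a minimal sketch, assuming the `:member`, `rdf:value` and `:ordinal` terms from the example above and prefixes declared elsewhere in the output file. `members_turtle` is a hypothetical helper name, and it does no escaping or validation of the terms it is given.

```python
def members_turtle(subject, elements, start=1):
    """Serialize elements as `subject :member [ rdf:value e ; :ordinal i ]` Turtle.

    `elements` are already-formatted Turtle terms such as ":a"; ordinals start
    at `start` (1 by default, matching the example in the answer).
    """
    if not elements:
        return ""  # nothing to state for an empty list
    nodes = [
        f"[ rdf:value {element} ; :ordinal {ordinal} ]"
        for ordinal, element in enumerate(elements, start=start)
    ]
    return f"{subject} :member\n    " + " ,\n    ".join(nodes) + " ."


print(members_turtle("<x>", [":a", ":b", ":c"]))
```

Passing `start=0` would give zero-based positions like the ones asked for in the question.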

Thanks for your answer. I had the feeling that the answer was no, but I preferred to ask first :). The problem is that I want to have a list with any number of items (so your first solution is not valid), and I didn't want to modify the input data, so your second solution isn't valid either. The second solution is similar to the Ordered List Ontology. — Labra, Jul 08 '13 at 12:14
@Labra Actually, using the aggregate functions available in SPARQL 1.1, you can collect the intermediate nodes in the list and count them, essentially getting the position. I've described this in [my answer](http://stackoverflow.com/a/17530689/1281433). — Joshua Taylor, Jul 08 '13 at 15:42
