SPARQL queries over collections and rdf:containers?

Question

Hi all RDF/SPARQL developers. Here is a question that has been nagging me for a while now, and it seems nobody has answered it accurately since the RDF and SPARQL specifications were released.

To state the case, RDF defines several ways to deal with multi-valued properties of resources, from creating as many triples as needed with the same subject and predicate URIs, to collections or containers. That's all fine, since each pattern has its own characteristics.

But from the SPARQL point of view, it seems to me that querying those structures leads to overly complicated queries that (worse) cannot be transcribed into a sensible result set: you cannot use variables to query arbitrary-length structures, and property paths do not preserve the "natural" order.

Naïvely, in many SELECT or ASK queries, if I want to query or filter on a container's or list's values, most of the time I don't care what the underlying pattern really is (if any). For instance:

<rdf:Description rdf:about="urn:1">
    <rdfs:label>
        <rdf:Alt>
            <rdf:li xml:lang="fr">Exemple n°1</rdf:li>
            <rdf:li xml:lang="en">Example #1</rdf:li>
        </rdf:Alt>
    </rdfs:label>
    <my:release>
        <rdf:Seq>
            <rdf:li>10.0</rdf:li>
            <rdf:li>2.4</rdf:li>
            <rdf:li>1.1.2</rdf:li>
            <rdf:li>0.9</rdf:li>
        </rdf:Seq>
    </my:release>
</rdf:Description>

<rdf:Description rdf:about="urn:2">
    <rdfs:label xml:lang="en">Example #2</rdfs:label>
</rdf:Description>

Obviously I would expect both resources to match the query:

SELECT ?res WHERE { ?res rdfs:label ?label . FILTER ( contains(?label, 'Example'@en) ) }

I would also expect the query:

SELECT ?ver WHERE { <urn:1> my:release ?ver }

to return the rdf:Seq elements (or an rdf:Alt's, for that matter) in their original order, unless another order is explicitly specified through an ORDER BY clause. (For the other patterns it wouldn't matter whether the original order is preserved, so why not keep it anyway?)
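To make the requested behaviour concrete, here is a rough sketch, in Python and outside SPARQL, of what such a query might compute: every value of the property, with any container value replaced by its members in rdf:_n order. The (subject, predicate, object) tuple representation, the "_:seq" blank-node label and the function names are illustrative assumptions, not an existing API.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Container membership properties rdf:_1, rdf:_2, ... (positive integers only).
MEMBER = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)")


def member_index(predicate):
    """Return n for an rdf:_n predicate, or None for any other predicate."""
    match = MEMBER.fullmatch(predicate)
    return int(match.group(1)) if match else None


def expanded_values(triples, subject, predicate):
    """Values of `predicate` on `subject`; a value that has rdf:_n members
    (rdf:Seq, rdf:Alt, rdf:Bag) is replaced by those members, sorted by the
    integer n (so rdf:_10 sorts after rdf:_2)."""
    results = []
    for s, p, o in triples:
        if s != subject or p != predicate:
            continue
        members = sorted(
            (member_index(p2), o2)
            for s2, p2, o2 in triples
            if s2 == o and member_index(p2) is not None
        )
        if members:
            results.extend(value for _, value in members)
        else:
            results.append(o)  # plain value: no container to expand
    return results


# Sample data from the question (language tags omitted), stored out of order.
triples = [
    ("urn:1", "my:release", "_:seq"),
    ("_:seq", RDF + "type", RDF + "Seq"),
    ("_:seq", RDF + "_2", "2.4"),
    ("_:seq", RDF + "_1", "10.0"),
    ("_:seq", RDF + "_4", "0.9"),
    ("_:seq", RDF + "_3", "1.1.2"),
    ("urn:2", "rdfs:label", "Example #2"),
]

print(expanded_values(triples, "urn:1", "my:release"))  # ['10.0', '2.4', '1.1.2', '0.9']
print(expanded_values(triples, "urn:2", "rdfs:label"))  # ['Example #2']
```

The repeated-facts pattern and the container pattern come out of the same function, which is the uniformity the question asks for; only the container case carries a meaningful order.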

Of course, compatibility with the existing behaviour would have to be preserved, so perhaps the property path syntax could be extended with a new operator?

I feel it would greatly simplify day-to-day SPARQL use.

Does this make sense to you? And do you see any reason not to try implementing it?

EDIT: corrected the incorrect rdfs:label value of urn:2 in the example.

See https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/UserGuide/Typed%20Values%20and%20Lists for dotNetRDF's programmatic API for RDF lists — RobV, Apr 26 '13 at 17:11
Thanks for the tip, Rob, but I was mainly talking about the SPARQL level: it would just be nice to be able to query or filter on a property in SPARQL without post-processing, without knowing beforehand whether the object is a container or not, and without resorting to ugly SPARQL like this: SELECT (coalesce(?lit, ?labelObj) AS ?label) { ?s rdfs:label ?labelObj . OPTIONAL { ?labelObj rdf:rest*/rdf:first ?lit FILTER (isLiteral(?lit)) } } or such things... — Max, Apr 26 '13 at 17:28
In fact I was thinking more of an XPath-like extension to property paths, with patterns such as rdfs:label[], rdfs:label[0] or rdfs:label[i..n], which would come in handy and work for containers/lists as well as for the simple repeated-facts pattern (with, of course, no predictable order in the case of the fact pattern...) — Max, Apr 26 '13 at 17:40
@Max I know this is an older question, but I've added an answer that shows how you can get what it sounds like you wanted if you use RDF lists rather than the other RDF containers. — Joshua Taylor, Mar 05 '14 at 23:18
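The rdfs:label[], rdfs:label[0] and rdfs:label[i..n] idea from the comments is not part of SPARQL. As a sketch of what those selectors might mean once a property's values are available as an ordered list, here is a small Python function; the 0-based indexing, the inclusive i..n range and the name select are assumptions, not a proposal from the thread.

```python
import re


def select(values, selector):
    """Apply a hypothetical XPath-like selector to an ordered list of values.

    "[]"     -> every value
    "[i]"    -> the value at 0-based index i (empty if out of range)
    "[i..n]" -> values i through n, inclusive (an assumed convention)
    """
    if selector == "[]":
        return list(values)
    single = re.fullmatch(r"\[(\d+)\]", selector)
    if single:
        i = int(single.group(1))
        return [values[i]] if i < len(values) else []
    span = re.fullmatch(r"\[(\d+)\.\.(\d+)\]", selector)
    if span:
        start, end = int(span.group(1)), int(span.group(2))
        return list(values[start:end + 1])
    raise ValueError(f"unsupported selector: {selector!r}")


releases = ["10.0", "2.4", "1.1.2", "0.9"]  # my:release values in rdf:Seq order
print(select(releases, "[0]"))     # ['10.0']
print(select(releases, "[1..2]"))  # ['2.4', '1.1.2']
```

For the repeated-facts pattern the input list would have no defined order, so only "[]" would give predictable results there, as the comment itself notes.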

score 8 · Answer 1 · edited May 23 '17 at 11:53

I realize that this question already has an answer, but it's worth taking a look at what you can do here if you use RDF lists (collections) as opposed to RDF containers. First, here is the data that you've provided, in Turtle (with namespace declarations added):

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my:    <https://stackoverflow.com/q/16223095/1281433/> .

<urn:2>  rdfs:label  "Example #2"@en .

<urn:1>  rdfs:label  [ a       rdf:Alt ;
                       rdf:_1  "Exemple n°1"@fr ;
                       rdf:_2  "Example #1"@en
                     ] ;
        my:release  [ a       rdf:Seq ;
                      rdf:_1  "10.0" ;
                      rdf:_2  "2.4" ;
                      rdf:_3  "1.1.2" ;
                      rdf:_4  "0.9"
                    ] .

The rdf:_n properties are the difficulty here, since they are the only thing that gives the elements of the sequence any real order. (The order of the alt doesn't really matter, although it still uses rdf:_n properties.) You can get all three labels with a SPARQL property path that makes the rdf:_n properties optional; note that each rdf:_n has to be listed explicitly:

prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?label where {
  ?x rdfs:label/(rdf:_1|rdf:_2|rdf:_3)* ?label
  filter( isLiteral( ?label ))
}

------------------------------
| x       | label            |
==============================
| <urn:1> | "Exemple n°1"@fr |
| <urn:1> | "Example #1"@en  |
| <urn:2> | "Example #2"@en  |
------------------------------
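Because a property path cannot match "any rdf:_n" predicate, the alternation has to name each membership property up to the largest container you expect. If you build such queries programmatically, a helper along these lines can generate the path; the function names and the choice of bound are illustrative assumptions.

```python
def membership_path(max_members):
    """Build (rdf:_1|rdf:_2|...|rdf:_n)* for containers with at most
    max_members members; members beyond that bound are silently missed."""
    if max_members < 1:
        raise ValueError("max_members must be at least 1")
    alternatives = "|".join(f"rdf:_{n}" for n in range(1, max_members + 1))
    return f"({alternatives})*"


def labels_query(max_members):
    """The labels query above, with the alternation generated rather than hand-written."""
    return "\n".join([
        "prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>",
        "prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "",
        "select ?x ?label where {",
        f"  ?x rdfs:label/{membership_path(max_members)} ?label",
        "  filter( isLiteral( ?label ))",
        "}",
    ])


print(membership_path(3))  # (rdf:_1|rdf:_2|rdf:_3)*
```

The bound is the weak point: a container with more members than the generated alternation covers is only partially matched, and nothing in the result tells you so.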

Let's look at what you can do with RDF lists instead. If you use lists, then your data is this:

@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix my:    <https://stackoverflow.com/q/16223095/1281433/> .

<urn:2>  rdfs:label  "Example #2"@en .

<urn:1>  rdfs:label  ( "Exemple n°1"@fr "Example #1"@en ) ;
        my:release  ( "10.0" "2.4" "1.1.2" "0.9" ) .

Now you can get the labels relatively easily:

prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?label where {
  ?x rdfs:label/(rdf:rest*/rdf:first)* ?label
  filter( isLiteral( ?label ))
}

------------------------------
| x       | label            |
==============================
| <urn:1> | "Exemple n°1"@fr |
| <urn:1> | "Example #1"@en  |
| <urn:2> | "Example #2"@en  |
------------------------------
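A SPARQL result set has no inherent order, but the rdf:rest*/rdf:first walk itself is ordered, and outside SPARQL it is a simple loop. A sketch in Python, assuming triples are held as (subject, predicate, object) tuples with blank nodes written as strings like "_:l1" and language tags omitted (a representation chosen only for illustration):

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def list_items(triples, head):
    """Walk an RDF list from `head` to rdf:nil, collecting rdf:first values in order."""
    items, node, seen = [], head, set()
    while node != NIL:
        if node in seen:
            raise ValueError(f"cycle in rdf:rest chain at {node}")
        seen.add(node)
        items.extend(objects(triples, node, FIRST))
        rest = objects(triples, node, REST)
        if len(rest) != 1:
            raise ValueError(f"list node {node} needs exactly one rdf:rest")
        node = rest[0]
    return items


# ( "Exemple n°1"@fr "Example #1"@en ) expanded into its list cells.
triples = [
    ("urn:1", "rdfs:label", "_:l1"),
    ("_:l1", FIRST, "Exemple n°1"),
    ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "Example #1"),
    ("_:l2", REST, NIL),
]

print(list_items(triples, "_:l1"))  # ['Exemple n°1', 'Example #1']
```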

If you want the position of the labels in the list of labels, you can even get that, although it makes the query a bit more complicated:

prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?x ?label (count(?mid)-1 as ?position) where {
  ?x rdfs:label ?y .
  ?y rdf:rest* ?mid . ?mid rdf:rest*/rdf:first? ?label .
  filter(isLiteral(?label))
}
group by ?x ?label

-----------------------------------------
| x       | label            | position |
=========================================
| <urn:1> | "Exemple n°1"@fr | 0        |
| <urn:1> | "Example #1"@en  | 1        |
| <urn:2> | "Example #2"@en  | 0        |
-----------------------------------------

This uses the technique from "Is it possible to get the position of an element in an RDF Collection in SPARQL?" to compute the position of each value in the list that is the object of rdfs:label, starting from 0, and assigning 0 to values that aren't in a list.
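The reason count(?mid)-1 yields the index: for the element at position i, exactly i+1 list cells (the head and each following cell up to the element's own) can reach it via rdf:rest*/rdf:first. A brute-force Python mimic of that counting, over hypothetical (subject, predicate, object) tuples, makes this visible. It covers only the list case (not the zero-length rdf:first? branch that handles plain labels) and, like the query, assumes each value occurs only once in the list.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def rest_star(triples, node):
    """Nodes reachable from `node` via rdf:rest* (zero or more steps)."""
    reached, frontier = {node}, [node]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == REST and o not in reached:
                reached.add(o)
                frontier.append(o)
    return reached


def positions(triples, head):
    """Mimic count(?mid)-1: for each value, count the cells ?mid with
    head rdf:rest* ?mid from which ?mid rdf:rest*/rdf:first reaches it."""
    counts = {}
    for mid in rest_star(triples, head):
        for node in rest_star(triples, mid):
            for value in (o for s, p, o in triples if s == node and p == FIRST):
                counts[value] = counts.get(value, 0) + 1
    return {value: n - 1 for value, n in counts.items()}


triples = [
    ("_:l1", FIRST, "Exemple n°1"), ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "Example #1"), ("_:l2", REST, NIL),
]
print(sorted(positions(triples, "_:l1").items(), key=lambda item: item[1]))
# [('Exemple n°1', 0), ('Example #1', 1)]
```

The nested reachability makes this quadratic in the list length, which is also roughly what the SPARQL query asks the engine to do.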

Joshua, your answer does not address the initial issue, but I must agree that lists are much easier to work with. I considered containers first since they are less "verbose" and the real "type" of list is then bundled within the RDF. In the end, even if they are simpler for updates, containers prove a pain in the a.. to work with query-wise, and I think the actual logic behind those multi-valued properties can still be handled somewhere else... so, to make it short, thanks for the answer ;) — Max, Mar 07 '14 at 13:30

user2313838 · Accepted Answer · 2013-04-26T15:08:48.167

RDF defines a vocabulary for collections and containers, but they hold no special meaning in terms of how graphs containing them should be interpreted. They aren't intended, and aren't really appropriate, for representing multi-valued properties.

In general, saying:

:A :predicate [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .

is not equivalent to:

:A :predicate :B , :C .

Let's say the predicate is owl:sameAs:

:A owl:sameAs [ a rdf:Alt ; rdf:_1 :B ; rdf:_2 :C ] .

The above says that :A names an individual containing :B and :C, whereas:

:A owl:sameAs :B , :C .

says that :A, :B, and :C are the same individual.

SPARQL is agnostic about containers and collections (aside from the syntactic shorthand for rdf:List). If you want a more convenient way of working with collections, many RDF APIs including Jena and rdflib have first-class representations for them.

Addendum

The way to model multi-valued properties (that is, to model that both "Exemple n°1"@fr and "Example #1"@en are labels for urn:1) is simply to state the two facts:

<rdf:Description rdf:about="urn:1">
    <rdfs:label xml:lang="fr">Exemple n°1</rdfs:label>
    <rdfs:label xml:lang="en">Example #1</rdfs:label>
    ...
</rdf:Description>

And the query:

SELECT ?res WHERE { ?res rdfs:label ?label . FILTER ( contains(?label, 'Example'@en) ) }

will match on the English labels for <urn:1> and <urn:2>.

For the my:release property, where you have a multi-valued property with an ordering on its values, it's a little trickier. You could define a new property (e.g. my:releases) whose value is an rdf:List or rdf:Seq: my:release gives the direct relationship, and my:releases an indirect relationship specifying an explicit ordering. With an inferencing store and the appropriate rule, you would only have to provide the latter. Unfortunately, this doesn't make the ordering any easier to use within SPARQL.
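The "appropriate rule" is left unspecified above. One plausible version, assumed here with my:releases pointing to an rdf:Seq, derives a direct my:release triple for every member of the sequence. A naive single-pass sketch in Python over (subject, predicate, object) tuples; an inferencing store would express this in its own rule language instead.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEMBER = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")  # rdf:_1, rdf:_2, ...


def infer_direct_releases(triples):
    """Assumed rule:  ?x my:releases ?seq . ?seq rdf:_n ?v  =>  ?x my:release ?v
    Returns only the inferred triples; the ordering is deliberately dropped."""
    inferred = set()
    for x, p, seq in triples:
        if p != "my:releases":
            continue
        for s, p2, v in triples:
            if s == seq and MEMBER.fullmatch(p2):
                inferred.add((x, "my:release", v))
    return inferred


triples = [
    ("urn:1", "my:releases", "_:seq"),
    ("_:seq", RDF + "type", RDF + "Seq"),
    ("_:seq", RDF + "_1", "10.0"),
    ("_:seq", RDF + "_2", "2.4"),
]
for triple in sorted(infer_direct_releases(triples)):
    print(triple)
```

The inferred triples are a set, which mirrors the point made above: the direct my:release relationship carries no order, so queries that need the order still have to go through my:releases.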

An approach that's easier to work with in SPARQL and non-inferencing stores would be to make the versions themselves objects with properties that define the ordering:

  <rdf:Description rdf:about="urn:1">
    <rdfs:label xml:lang="fr">Exemple n&#xB0;1</rdfs:label>
    <rdfs:label xml:lang="en">Example #1</rdfs:label>
    <my:release>
      <my:Release>
        <dc:issued rdf:datatype="&xsd;date">2008-10-10</dc:issued>
        <my:version>10.0</my:version>
      </my:Release>
    </my:release>
    <my:release>
      <my:Release>
        <my:version>2.4</my:version>
        <dc:issued rdf:datatype="&xsd;date">2007-05-01</dc:issued>
      </my:Release>
    </my:release>
    ...
  </rdf:Description>

In the above, the date can be used to order the results as there is no explicit sequence anymore. The query is only slightly more complex:

SELECT ?ver 
WHERE { <urn:1> my:release [ my:version ?ver ; dc:issued ?date ] }
ORDER BY ?date
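One detail worth checking: ORDER BY ?date sorts ascending, so with the sample dates above 2.4 (issued 2007-05-01) comes before 10.0 (issued 2008-10-10), the reverse of the original rdf:Seq; ORDER BY DESC(?date) restores the newest-first order. The same two orderings in Python, using a hypothetical list of release records:

```python
from datetime import date

# Hypothetical records mirroring the two my:Release nodes in the RDF/XML above.
releases = [
    {"version": "10.0", "issued": date(2008, 10, 10)},
    {"version": "2.4", "issued": date(2007, 5, 1)},
]


def versions_by_date(releases, newest_first=False):
    """Like ORDER BY ?date, or ORDER BY DESC(?date) when newest_first is True."""
    ordered = sorted(releases, key=lambda r: r["issued"], reverse=newest_first)
    return [r["version"] for r in ordered]


print(versions_by_date(releases))                     # ['2.4', '10.0']
print(versions_by_date(releases, newest_first=True))  # ['10.0', '2.4']
```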

OK, thanks for pointing me to a counter-example to my RDF use case. I was not saying that those patterns are equivalent or that containers should be ignored entirely, but that SPARQL should provide a "shortcut" way to query them if needed. Could you then point me to the "best" way to model multi-valued ordered properties, and to the SPARQL for querying them? — Max, Apr 26 '13 at 06:25
From an RDF standpoint you're right, but I was thinking more from the SPARQL perspective, where a SPARQL query doesn't imply anything about the underlying data and schemas. I finally formalized my thoughts on this; see the comments to @RobV above — Max, Apr 26 '13 at 18:06
