
How do I store tree with ordered children in RDF?

Input:

1. Title 1
   Some text under title 1.
2. Title 2
2.1. Title 2.1
     Some text under title 2.1.
2.2. Title 2.2
     Some text under title 2.2.

Titles can be arbitrary and do not necessarily contain numbering.

How can I get all elements back, still in order, in a single query?

Desired output:

|-----------+----------------------------+
| Title     | Content                    |
|-----------+----------------------------+
| Title 1   | Some text under title 1.   |
| Title 2   |                            |
| Title 2.1 | Some text under title 2.1. |
| Title 2.2 | Some text under title 2.2. |
|-----------+----------------------------+

EDIT: "Calculate length of path between nodes?" doesn't answer my question. It discusses unordered nodes. My question is specifically about an ordered collection (a list of lists) and getting the elements back in their original order.

Stanislav Kralin
teksisto

2 Answers


You could model your example data as follows:

@prefix ex: <http://ex.com/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:title1 a ex:Title ;
          rdfs:label "Title 1" ;
          rdfs:comment "some text under title 1" .

ex:title2 a ex:Title ;
          rdfs:label "Title 2" ;
          rdfs:comment "some text under title 2" .

ex:title21 a ex:Title ;
          rdfs:label "Title 2.1" ;
          rdfs:comment "some text under title 2.1" .

ex:title22 a ex:Title ;
          rdfs:label "Title 2.2" ;
          rdfs:comment "some text under title 2.2" .

ex:title2 ex:subtitles (ex:title21 ex:title22) .
ex:titleCollection ex:subtitles (ex:title1 ex:title2) .

A query for all titles could then impose an order with a very basic lexical sort on the title:

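PREFIX ex: <http://ex.com/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?title ?content
where {
    [] ex:subtitles/rdf:rest*/rdf:first [
                      rdfs:label ?title ;
                      rdfs:comment ?content ] .
}
order by ?title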

result:

Evaluating SPARQL query...
+-------------------------------------+-------------------------------------+
| title                               | content                             |
+-------------------------------------+-------------------------------------+
| "Title 1"                           | "some text under title 1"           |
| "Title 2"                           | "some text under title 2"           |
| "Title 2.1"                         | "some text under title 2.1"         |
| "Title 2.2"                         | "some text under title 2.2"         |
+-------------------------------------+-------------------------------------+
4 result(s) (4 ms)

If you don't want to rely on the title itself to provide the correct ordering, you could of course introduce an explicit ordering property with hierarchical numbering and use its value in your order by clause.
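One caveat if such a hierarchical number is stored as a plain string: SPARQL's order by compares simple literals lexically, so "2.10" would sort before "2.2". Below is a minimal JavaScript sketch of a segment-wise comparison, assuming the query results have already been fetched into plain objects. The `order` field, the `compareHierarchical` helper and the "Title 2.10" row are hypothetical and not part of the data above.

```javascript
// Hypothetical rows as a query might return them, with an assumed explicit
// ordering property (called `order` here) holding hierarchical numbers.
const rows = [
  { order: "2.10", title: "Title 2.10" },
  { order: "1", title: "Title 1" },
  { order: "2.2", title: "Title 2.2" },
  { order: "2", title: "Title 2" },
];

// Compare hierarchical numbers segment by segment as integers, so that
// "2.2" sorts before "2.10" (a plain string sort would put "2.10" first).
function compareHierarchical(a, b) {
  const as = a.split(".").map(Number);
  const bs = b.split(".").map(Number);
  for (let i = 0; i < Math.min(as.length, bs.length); i++) {
    if (as[i] !== bs[i]) return as[i] - bs[i];
  }
  // Equal common prefix: the parent ("2") comes before its children ("2.1").
  return as.length - bs.length;
}

const sorted = [...rows].sort((x, y) => compareHierarchical(x.order, y.order));
console.log(sorted.map((r) => r.order).join(" ")); // prints "1 2 2.2 2.10"
```

Zero-padding every segment (e.g. "002.010") would be an alternative that keeps a plain string sort, and therefore order by, usable, at the cost of a fixed maximum width per level.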

Jeen Broekstra
  • Though I appreciate your answer, I think ordering by title is cheating. Aren’t the nodes in a collection ordered already? Is it really necessary to introduce a new property? If one moves a branch to another parent, it would be painful to update this property. – teksisto Aug 14 '18 at 03:49
  • @teksisto well that wasn't really specified in your question - I had to make up the actual data modeling myself. You could possibly do some trick in sparql like in https://stackoverflow.com/questions/17523804, but it seems complicated to get that both correct and performant for not just a list, but a tree of lists, of unknown depth. Personally, I think you're better off solving this by means of multiple queries, or by using an API, or by the "cheat" I outlined in this answer. – Jeen Broekstra Aug 14 '18 at 04:00
  • I’ve updated my question. Sorry, it was misleading. My question is based on the fact that even relational databases can retrieve such a structure in one query using nested sets. I thought SPARQL could deal with graph data better than relational databases. – teksisto Aug 14 '18 at 04:27
  • @teksisto it's not that SPARQL can't retrieve it - it obviously can, just not necessarily ordered as you expect. The problem is that RDF collections are not easy to traverse in order unless you use specific custom functions. Some triplestores have extension functions available for this purpose. In vanilla SPARQL, though, it's a pain. – Jeen Broekstra Aug 14 '18 at 04:34
  • @teksisto, perhaps Jeen Broekstra means Jena ARQ list functions: https://jena.apache.org/documentation/query/library-propfunc.html – Stanislav Kralin Aug 20 '18 at 07:42
  • @StanislavKralin Yeah, I've seen this page, but sadly none of these functions relate to my question. – teksisto Aug 20 '18 at 12:40
  • @teksisto, possibly you could ask on https://community.stardog.com/. Stardog has many graph-related features. – Stanislav Kralin Aug 23 '18 at 07:41
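Jeen's suggestion of doing the ordering outside SPARQL could look roughly like this: fetch the relevant triples once (for example with a CONSTRUCT query), then walk the rdf:first/rdf:rest chains in order on the client. This is a minimal JavaScript sketch with no RDF library; the triples are hard-coded strings following the first answer's model (with no comment on Title 2, as in the question's desired output), and the helpers `objectOf`, `listItems` and `walk` are hypothetical names.

```javascript
// Triples for the first answer's model as [subject, predicate, object]
// strings. _:l1 ... _:l4 stand for the blank-node list cells that a
// Turtle parser creates for the ( ... ) collections.
const triples = [
  ["ex:titleCollection", "ex:subtitles", "_:l1"],
  ["_:l1", "rdf:first", "ex:title1"], ["_:l1", "rdf:rest", "_:l2"],
  ["_:l2", "rdf:first", "ex:title2"], ["_:l2", "rdf:rest", "rdf:nil"],
  ["ex:title2", "ex:subtitles", "_:l3"],
  ["_:l3", "rdf:first", "ex:title21"], ["_:l3", "rdf:rest", "_:l4"],
  ["_:l4", "rdf:first", "ex:title22"], ["_:l4", "rdf:rest", "rdf:nil"],
  ["ex:title1", "rdfs:label", "Title 1"],
  ["ex:title1", "rdfs:comment", "some text under title 1"],
  ["ex:title2", "rdfs:label", "Title 2"],
  ["ex:title21", "rdfs:label", "Title 2.1"],
  ["ex:title21", "rdfs:comment", "some text under title 2.1"],
  ["ex:title22", "rdfs:label", "Title 2.2"],
  ["ex:title22", "rdfs:comment", "some text under title 2.2"],
];

// Return the object of the first triple matching (s, p), or undefined.
const objectOf = (s, p) =>
  triples.find(([ts, tp]) => ts === s && tp === p)?.[2];

// Follow rdf:first / rdf:rest from a list head until rdf:nil, keeping order.
function listItems(head) {
  const items = [];
  for (let cell = head; cell && cell !== "rdf:nil"; cell = objectOf(cell, "rdf:rest")) {
    items.push(objectOf(cell, "rdf:first"));
  }
  return items;
}

// Depth-first (pre-order) walk: emit a node, then its ordered subtitles.
function walk(node, out = []) {
  const label = objectOf(node, "rdfs:label");
  if (label !== undefined) {
    out.push({ title: label, content: objectOf(node, "rdfs:comment") ?? "" });
  }
  const head = objectOf(node, "ex:subtitles");
  if (head) for (const child of listItems(head)) walk(child, out);
  return out;
}

const rows = walk("ex:titleCollection");
for (const r of rows) console.log(`${r.title} | ${r.content}`);
```

Because the walk is pre-order and follows the list cells, the rows come out as Title 1, Title 2, Title 2.1, Title 2.2 regardless of how the labels are spelled, so no extra ordering property is needed.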

Option 1

You could serialize the RDF into flattened JSON-LD and write a simple recursive function in, e.g., JavaScript.

var nquads = `
<http://ex.com/titleCollection> <http://ex.com/subtitles> _:b1 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b2 .
_:b1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b3 .
_:b2 <http://www.w3.org/2000/01/rdf-schema#label> "Title 1" .
_:b2 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.com/Title> .
_:b2 <http://www.w3.org/2000/01/rdf-schema#comment> "some text under title 1" .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b4 .
_:b3 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:b4 <http://ex.com/subtitles> _:b5 .
_:b4 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.com/Title> .
_:b4 <http://www.w3.org/2000/01/rdf-schema#comment> "some text under title 2" .
_:b4 <http://www.w3.org/2000/01/rdf-schema#label> "Title 2" .
_:b5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b6 .
_:b5 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> _:b7 .
_:b6 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.com/Title> .
_:b6 <http://www.w3.org/2000/01/rdf-schema#comment> "some text under title 2.1" .
_:b6 <http://www.w3.org/2000/01/rdf-schema#label> "Title 2.1" .
_:b7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#first> _:b8 .
_:b7 <http://www.w3.org/1999/02/22-rdf-syntax-ns#rest> <http://www.w3.org/1999/02/22-rdf-syntax-ns#nil> .
_:b8 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ex.com/Title> .
_:b8 <http://www.w3.org/2000/01/rdf-schema#comment> "some text under title 2.2" .
_:b8 <http://www.w3.org/2000/01/rdf-schema#label> "Title 2.2" .
`;

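// Convert the N-Quads to JSON-LD (jsonld.js 0.4.x callback API), then
// print the tree starting from the root collection.
jsonld.fromRDF(nquads, {format: 'application/nquads'}, function (err, doc) {
   if (err) return console.error(err)
   print(doc, "http://ex.com/titleCollection")
});

// Print a node's label and comment, then recurse into its subtitles
// in list order (RDF collections become "@list" arrays in JSON-LD).
function print(doc, id) {
   var what = get(doc, id)
   var label = what['http://www.w3.org/2000/01/rdf-schema#label']
   var comment = what['http://www.w3.org/2000/01/rdf-schema#comment']
   var subtitles = what['http://ex.com/subtitles']
   if (label) console.log(label[0]['@value'])
   if (comment) console.log(comment[0]['@value'])
   if (subtitles) {
      for (var i of subtitles[0]['@list']) print(doc, i['@id'])
   }
}

// Look up a top-level node object by its @id.
function get(doc, id) {return doc.find((element) => (element['@id'] === id))}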
<script src="https://cdnjs.cloudflare.com/ajax/libs/jsonld/0.4.12/jsonld.min.js"></script>

The original Turtle was:

@prefix ex: <http://ex.com/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:titleCollection ex:subtitles
    (
        [
        a ex:Title ; rdfs:label "Title 1" ;
        rdfs:comment "some text under title 1" 
        ]
        [
        a ex:Title ; rdfs:label "Title 2" ;
        rdfs:comment "some text under title 2" ;
        ex:subtitles
            (
                [
                a ex:Title ; rdfs:label "Title 2.1" ;
                rdfs:comment "some text under title 2.1" 
                ]
                [
                a ex:Title ; rdfs:label "Title 2.2" ;
                rdfs:comment "some text under title 2.2" 
                ]
            )
        ]
    ) .

Option 2

Another option is to rely on storage order, hoping that items are stored in their order of appearance.

The Turtle syntax for blank node property lists and collections forces the correct "order of appearance".

In GraphDB, after importing the Turtle above, you could write:

PREFIX ex: <http://ex.com/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ent: <http://www.ontotext.com/owlim/entity#>

SELECT ?label ?comment {
    ?s a ex:Title ; rdfs:label ?label ; rdfs:comment ?comment
} ORDER BY ent:id(?s)

Option 3

Another option is to use inferencing.

  1. First, let's invent our own format for ordered trees, e.g. the following one:

    :title0 a :Node; rdfs:label "Book";
            :down :title1 .
    :title1 a :Node; rdfs:label "Title 1";
            :down :title11;
            :right :title2 .
    :title2 a :Node; rdfs:label "Title 2";
            :down :title21;
            :right :title3 .
    :title3 a :Node; rdfs:label "Title 3";
            :down :title31 .

  2. Second, let's restore the initial tree ordering (and close it transitively). In SWRL:

    right(?a, ?b) ^ right(?b, ?c) -> right(?a, ?c)
    down(?a, ?b) ^ right(?b, ?c) -> down(?a, ?c)
    down(?a, ?b) ^ down(?b, ?c) -> down(?a, ?c)
    

    You could use OWL axioms instead, or assert some of the inferred statements explicitly.

  3. Third, let's formulate rules that define an ordering corresponding to depth-first traversal order:

    right(?a, ?b) -> after(?a, ?b)
    down(?a, ?b) -> after(?a, ?b)
    down(?a, ?c) ^ right(?a, ?b) ^ down(?b, ?d) -> after(?c, ?d)
    down(?a, ?c) ^ right(?a, ?b) -> after(?c, ?b)
    right(?a, ?b) ^ down(?b, ?c) -> after(?a, ?c)
    

    I'm not sure this set of rules is minimal or elegant.

  4. Now, your SPARQL query would be:

    SELECT ?s (SAMPLE(?label) AS ?title) (COUNT(?o) AS ?count) {
        ?s a :Node ; rdfs:label ?label .
        OPTIONAL { ?s :after ?o }
    } GROUP BY ?s ORDER BY DESC(?count)
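To check the rules from steps 2 and 3 without a reasoner, here is a minimal JavaScript sketch that applies them by naive forward chaining over a small hypothetical tree (the node names book, t1, t11 and so on are made up), then orders the nodes by their number of `after` facts, as the query in step 4 does. It only illustrates the logic of the rules; it is not how GraphDB or a SWRL engine evaluates them.

```javascript
// Hypothetical tree in the step 1 format: `down` points from a node to its
// first child, `right` from a node to its next sibling.
const down = [["book", "t1"], ["t1", "t11"], ["t2", "t21"]];
const right = [["t1", "t2"], ["t2", "t3"], ["t21", "t22"]];
const nodes = ["book", "t1", "t11", "t2", "t21", "t22", "t3"];

// Store each relation as a Set of "a b" keys so duplicates are ignored.
const key = (a, b) => `${a} ${b}`;
const toSet = (pairs) => new Set(pairs.map(([a, b]) => key(a, b)));
const pairsOf = (set) => [...set].map((k) => k.split(" "));

const R = toSet(right);
const D = toSet(down);

// Step 2: apply the three closure rules until nothing new is derived.
let changed = true;
while (changed) {
  changed = false;
  const add = (set, a, b) => {
    if (!set.has(key(a, b))) { set.add(key(a, b)); changed = true; }
  };
  // right(a,b) ^ right(b,c) -> right(a,c)
  for (const [a, b] of pairsOf(R)) for (const [b2, c] of pairsOf(R)) if (b === b2) add(R, a, c);
  // down(a,b) ^ right(b,c) -> down(a,c)
  for (const [a, b] of pairsOf(D)) for (const [b2, c] of pairsOf(R)) if (b === b2) add(D, a, c);
  // down(a,b) ^ down(b,c) -> down(a,c)
  for (const [a, b] of pairsOf(D)) for (const [b2, c] of pairsOf(D)) if (b === b2) add(D, a, c);
}

// Step 3: derive `after` from the closed relations.
const A = new Set();
for (const [a, b] of pairsOf(R)) A.add(key(a, b)); // right(a,b) -> after(a,b)
for (const [a, b] of pairsOf(D)) A.add(key(a, b)); // down(a,b) -> after(a,b)
for (const [a, c] of pairsOf(D)) {
  for (const [a2, b] of pairsOf(R)) {
    if (a !== a2) continue;
    A.add(key(c, b)); // down(a,c) ^ right(a,b) -> after(c,b)
    // down(a,c) ^ right(a,b) ^ down(b,d) -> after(c,d)
    for (const [b2, d] of pairsOf(D)) if (b === b2) A.add(key(c, d));
  }
}
// right(a,b) ^ down(b,c) -> after(a,c)
for (const [a, b] of pairsOf(R)) for (const [b2, c] of pairsOf(D)) if (b === b2) A.add(key(a, c));

// Step 4: count `after` facts per node and sort descending, like the query.
const count = (n) => pairsOf(A).filter(([x]) => x === n).length;
const order = [...nodes].sort((x, y) => count(y) - count(x));
console.log(order.join(" ")); // prints "book t1 t11 t2 t21 t22 t3"
```

For this tree the counts come out as 6, 5, 4, 3, 2, 1, 0 for book, t1, t11, t2, t21, t22, t3, which matches the depth-first order.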
    
Stanislav Kralin