How can I retrieve the length of a path between two nodes? For instance, given an organizational hierarchy, how can I determine how far apart a parent organization and a descendant organization are? Consider the following scenarios:

  1. OrgA -hasSubOrganization-> OrgB, OrgC

    This is the simplest case, where I want to get all the immediate suborganizations of an entity. Hence the path length is 1.

  2. OrgA -> OrgB -> OrgC

    or the general case

    OrgA -> OrgB - - - - - - - - OrgZ
    

I want to recursively traverse down the graph and find each organization belonging to another organization through the hasSubOrganization property. To get all the sub-organizations recursively I can use property paths, e.g., the + operator:

OrgA hasSubOrganization+ ?subOrg

This will give me all the suborganizations right down to the leaf nodes. However, my ultimate goal is to build the organization hierarchy, and the information about how many nodes/steps/levels/hops away each suborganization is gets lost. This means that I cannot recreate the org structure for a visualization.

How can I capture the "number of nodes away" information in addition to the name of the suborganization?

Joshua Taylor
Chantz
  • It's not a trivial one-line query, but this _can_ be achieved. [Is it possible to get the position of an element in an RDF Collection in SPARQL?](http://stackoverflow.com/q/17523804/1281433) reduces to the same kind of query, and the answer there was a resounding "yes". – Joshua Taylor Sep 24 '13 at 20:23

2 Answers

This is based on the same technique used to compute the position of an element in an RDF list using SPARQL that is described in: Is it possible to get the position of an element in an RDF Collection in SPARQL?

If you have data like this:

@prefix : <http://example.org> .

:orgA :hasSuborganization :orgB, :orgC, :orgD.
:orgB :hasSuborganization :orgE, :orgF.
:orgE :hasSuborganization :orgG.
:orgG :hasSuborganization :orgH.

which describes a hierarchy like this:

[Figure: organization hierarchy]

then you can use a query like this:

prefix : <http://example.org> 

select ?super ?sub (count(?mid) as ?distance) { 
  ?super :hasSuborganization* ?mid .
  ?mid :hasSuborganization+ ?sub .
}
group by ?super ?sub 
order by ?super ?sub

to get results like these:

$ sparql --query query.rq --data subs.n3
----------------------------
| super | sub   | distance |
============================
| :orgA | :orgB | 1        |
| :orgA | :orgC | 1        |
| :orgA | :orgD | 1        |
| :orgA | :orgE | 2        |
| :orgA | :orgF | 2        |
| :orgA | :orgG | 3        |
| :orgA | :orgH | 4        |
| :orgB | :orgE | 1        |
| :orgB | :orgF | 1        |
| :orgB | :orgG | 2        |
| :orgB | :orgH | 3        |
| :orgE | :orgG | 1        |
| :orgE | :orgH | 2        |
| :orgG | :orgH | 1        |
----------------------------

The trick here is to recognize that any path from X to Y can be viewed as a (possibly empty) path from X to some intermediate node Z (an empty path means that you can choose X itself as Z) concatenated with a (nonempty) path from Z to Y. The number of possible ways of picking Z is the length of the path.
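To make the counting argument concrete, here is a minimal Python sketch that mimics the query on the example data, without any SPARQL engine. The dictionary `EDGES` and the helper names are assumptions made up for illustration; only the organization names come from the data above.

```python
from collections import defaultdict

# Edges copied from the example data: parent -> direct suborganizations.
EDGES = {
    "orgA": ["orgB", "orgC", "orgD"],
    "orgB": ["orgE", "orgF"],
    "orgE": ["orgG"],
    "orgG": ["orgH"],
}

def reachable_plus(node, edges):
    """Nodes reachable from `node` via one or more edges (like `p+`)."""
    seen = set()
    stack = list(edges.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, []))
    return seen

def reachable_star(node, edges):
    """Nodes reachable via zero or more edges (like `p*`); includes `node`."""
    return reachable_plus(node, edges) | {node}

def distances_by_counting(edges):
    """For each (super, sub) pair, count the nodes `mid` such that
    super -*-> mid and mid -+-> sub, as the GROUP BY / COUNT query does."""
    nodes = set(edges) | {c for children in edges.values() for c in children}
    counts = defaultdict(int)
    for sup in nodes:
        for mid in reachable_star(sup, edges):
            for sub in reachable_plus(mid, edges):
                counts[(sup, sub)] += 1
    return dict(counts)

distances = distances_by_counting(EDGES)
```

For the pair (orgA, orgH), the qualifying intermediate nodes are orgA, orgB, orgE and orgG, which is why the count is 4.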

Joshua Taylor
  • It's important to note that this will break down if there are multiple paths from X to Y. The count will include all nodes from both paths. – Rob Hall Apr 21 '14 at 13:41
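The caveat about multiple paths can be illustrated with a small sketch. The diamond-shaped graph below is hypothetical (it is not part of the example data), and the helper names are made up for illustration; the function counts intermediate nodes the same way the query does.

```python
# Hypothetical diamond: orgA reaches orgD through both orgB and orgC.
DIAMOND = {"orgA": ["orgB", "orgC"], "orgB": ["orgD"], "orgC": ["orgD"]}

def descendants(node, edges):
    """Nodes reachable from `node` via one or more edges."""
    seen, stack = set(), list(edges.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(edges.get(current, []))
    return seen

def counted_distance(sup, sub, edges):
    """Number of nodes `mid` with sup -*-> mid and mid -+-> sub."""
    mids = descendants(sup, edges) | {sup}
    return sum(1 for mid in mids if sub in descendants(mid, edges))

overcount = counted_distance("orgA", "orgD", DIAMOND)
```

Both paths from orgA to orgD have length 2, but orgA, orgB and orgC all qualify as intermediate nodes, so the count comes out as 3.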
You can't do this using property paths, since the working group specifically chose not to make this information available because it makes implementation much more complex.

If your goal is just to visualise the hierarchy, it will probably be just as efficient to skip property paths altogether and make a series of SPARQL queries, where each query expands one leaf of the hierarchy.
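A rough sketch of that level-by-level expansion, in Python, might look like the following. The function `direct_suborganizations` is a hypothetical stand-in for issuing one SPARQL query per node (something like `SELECT ?sub { <org> :hasSuborganization ?sub }`); here it reads from an in-memory dictionary so the sketch runs on its own.

```python
from collections import deque

# In-memory stand-in for the triple store (assumed data, same shape
# as the hierarchy used in the other answer).
EDGES = {
    "orgA": ["orgB", "orgC", "orgD"],
    "orgB": ["orgE", "orgF"],
    "orgE": ["orgG"],
    "orgG": ["orgH"],
}

def direct_suborganizations(org):
    """Hypothetical placeholder for one query returning the direct children."""
    return EDGES.get(org, [])

def hierarchy_levels(root, fetch_children=direct_suborganizations):
    """Expand the hierarchy breadth-first; returns {organization: hops from root}."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        org = queue.popleft()
        for child in fetch_children(org):
            if child not in depth:  # the first visit is along a shortest path
                depth[child] = depth[org] + 1
                queue.append(child)
    return depth

levels = hierarchy_levels("orgA")
```

Because each node records its depth when it is first reached, the level information needed for a visualization falls out of the traversal itself.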

There may be other approaches using the Jena Ontology API. I'd recommend asking on their mailing list, jena-users@incubator.apache.org, for more expert help.

RobV
  • Thanks for this. I asked the same question at semanticoverflow.com, and somebody there answered with what you have said: http://www.w3.org/TR/sparql11-property-paths/#Outstanding_Issues . I think what I will do is use the property path to get all the suborganizations and then do post-processing to generate the hierarchy. – Chantz Mar 06 '11 at 04:06
  • Sounds like a good approach. Yes I saw your question on SemanticOverflow but somebody else had already answered there so didn't see the point of replicating my answer – RobV Mar 06 '11 at 17:35
  • This _can_ be done, using the same technique that [computes the position of an element in an RDF list](http://stackoverflow.com/q/17523804/1281433). – Joshua Taylor Sep 24 '13 at 20:41