SPARQL Query to create a merged graph from different graphs based on property value comparisons

Question

I have three graph data models with nodes representing the same physical entity in different ways in the three graphs.

Graph G1 where Pump P1 is of type CentrifugalPumpType

Graph G2 where Pump P2 is of type PADIMType

Graph G3 where Pump P3 is of type PumpType

As you can see, the same pump is modelled differently in each of the three graphs. However, there is a way to find out whether they are indeed the same pump. Between the first graph (G1) and the second graph (G2), the comparison can be based on the value of the TagNameAssignmentClass property (from G1) and that of the SignalTag property (from G2); in this example they both have the value "P1612-A". Similarly, between G2 and the third graph (G3), the comparison can be made between the Manufacturer properties of G2 and G3 (in the example both have the value "XYZ") and the respective SerialNumber properties of G2 and G3 (in the example both have the value "1234"). All of these are direct or indirect properties of the node representing the pump (P1, P2 and P3) in the three models. The aim is to merge the nodes representing the pump in the three models into one. The merged graph would then look something like this:

I am a complete newbie to this way of thinking. I went through the basic SPARQL tutorials out there, but the query I am trying to write is too complex for my current understanding of SPARQL. It would be great if someone could help! The string literals above are only there to explain what I mean; I do not want to mention them in my query, but rather compare the properties directly, whatever literal values they hold.

Edit 1: I was asked to create a minimal reproducible example, so here is a try after removing unnecessary properties and simplifying the aim further:

So the Graph G1 dataset is as follows:

@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg1:    <http://www.myexample1.com> .
@prefix :       <http://www.myexample.com/> .   # placeholder namespace for has_property

eg1:PumpP1
    rdf:type eg1:CentrifugalPumpType ;
    :has_property  eg1:DifferentialPressure ;
    :has_property  eg1:TagNameAssignmentClass .

eg1:TagNameAssignmentClass
    rdf:value "P1612-A" .

Graph G2 Dataset is as follows:

@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg2:    <http://www.myexample2.com> .
@prefix :       <http://www.myexample.com/> .   # placeholder namespace for has_property

eg2:PumpP2
    rdf:type   eg2:PADIMType ;
    :has_property  eg2:SignalSet ;
    :has_property  eg2:Manufacturer ;
    :has_property  eg2:SerialNumber .

eg2:Manufacturer
    rdf:value "XYZ" .

eg2:SerialNumber
    rdf:value "1234" .

eg2:SignalSet
    :has_property eg2:SignalS1 .

eg2:SignalS1
    :has_property eg2:SignalTag .

eg2:SignalTag
    rdf:value "P1612-A" .

Graph G3 Dataset may look like:

@prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg3:    <http://www.myexample3.com> .
@prefix :       <http://www.myexample.com/> .   # placeholder namespace for has_property

eg3:PumpP3
    rdf:type   eg3:PumpType ;
    :has_property  eg3:Identification ;
    :has_property  eg3:Ports .

eg3:Identification
    :has_property  eg3:Manufacturer ;
    :has_property  eg3:SerialNumber .

eg3:Manufacturer
    rdf:value "XYZ" .

eg3:SerialNumber
    rdf:value "1234" .

Expected Graph after the merge could look like this:

@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg1:   <http://www.myexample1.com> .
@prefix eg2:   <http://www.myexample2.com> .
@prefix eg3:   <http://www.myexample3.com> .
@prefix mg:    <http://www.mymergeexample.com> .
@prefix :      <http://www.myexample.com/> .   # placeholder namespace for has_property

mg:PumpP
    rdf:type   eg1:CentrifugalPumpType ;
    rdf:type   eg2:PADIMType ;
    rdf:type   eg3:PumpType ;
    :has_property  eg1:DifferentialPressure ;
    :has_property  eg1:TagNameAssignmentClass ;
    :has_property  eg2:Manufacturer ;
    :has_property  eg2:SerialNumber ;
    :has_property  eg2:SignalSet ;
    :has_property  eg3:Identification ;
    :has_property  eg3:Ports .

eg1:TagNameAssignmentClass
    rdf:value "P1612-A" .

eg2:Manufacturer
    rdf:value "XYZ" .

eg2:SerialNumber
    rdf:value "1234" .

eg2:SignalSet
    :has_property eg2:SignalS1 .

eg2:SignalS1
    :has_property eg2:SignalTag .

eg2:SignalTag
    rdf:value "P1612-A" .

eg3:Identification
    :has_property  eg3:Manufacturer ;
    :has_property  eg3:SerialNumber .

eg3:Manufacturer
    rdf:value "XYZ" .

eg3:SerialNumber
    rdf:value "1234" .

Please forgive any syntax errors I might have made.

@StanislavKralin Thank you for your comment, I have added the rdf datasets for the graphs in my example. I hope this helps explain the objective further and more clearly — Ricky, Mar 20 '21 at 15:56
your data is currently just "wrong". Triples like `eg3:PumpP3 rdf:Property eg3:Identification` or `eg2:SerialNumber rdfs:Literal "1234" .` don't make sense. This is not how RDF has to be used. `rdf:Property` or `rdfs:Literal` are RDF classes and thus are only in object position of a triple to denote the type of the subject. If you want to add data about the `eg3:PumpP3`, then you use the property in predicate position, like `eg3:PumpP3 eg2:SerialNumber "1234" .` So currently, your example data is rather unusable, you should fix this first. — UninformedUser, Mar 21 '21 at 09:36
Honestly, I'm not sure what you want to express by e.g. `eg3:PumpP3 rdf:Property eg3:Identification ` and then `eg3:Identification rdf:Property eg3:Manufacturer ; rdf:Property eg3:SerialNumber .` at all? What does this express? It should be more like `eg3:PumpP3 rdf:type eg3:PumpType ; eg3:Identification [ eg3:Manufacturer "XYZ"; eg3:SerialNumber "1234" ] .` — UninformedUser, Mar 21 '21 at 09:39
@UninformedUser thank you so much for your help. Indeed I did it all wrong, now I have corrected it as far as I could grasp the concepts a little better. I hope it is more understandable now. I used the relationships all wrong before, I just meant to say that the resource Identification has 2 properties Manufacturer and SerialNumber, which in turn have values. I hope after my correction it makes more sense :) Thanks a lot for the help! — Ricky, Mar 21 '21 at 11:16
no, it's still wrong. You should have a look at RDF again I'd say. For example now you added some `has_property` - why? This is totally useless and redundant. Why don't you use the property directly? I already showed how to define data about an individual in RDF: `eg3:PumpP3 eg2:Manufacturer "XYZ"` - please check an RDF tutorial. It's nothing more than `subject-predicate-object`, like `John birthPlace London` - no need to state *"John hasProperty birthDate . BirthDate is '2020-10-10'"* - you directly encode the information — UninformedUser, Mar 21 '21 at 12:46
@UninformedUser This is a 1:1 representation of a model that already exists, which is not a graph model. I don't really understand why it would be wrong. Has_Property is a relationship that the two resources share. The question is not about how I have decided to model something. When I used rdf:property incorrectly, I could understand that and have corrected it. However I don't really see anything "wrong" in the current model. I might choose to model things in a particular way. It is a perfectly valid triple if I say Subject has_property Name and Name has_value "abc", isn't it? — Ricky, Mar 21 '21 at 12:59
@UninformedUser I have checked the RDF Tutorial, and they do not really give strict modelling guidelines, that is in fact the modeller's prerogative. As mentioned before the question is not about the modelling choices, rather how to merge different models together that have nodes representing the same entity in different ways, so I would highly appreciate it if the conversation stays about the question and not about a different way (however optimized it may be) of modelling the same thing. — Ricky, Mar 21 '21 at 13:05
ok, if you ignore those facts, I don't care. But let me ask you a question. Given your initial question, I'm assuming each graph contains multiple of those entities, right? So `G1` has multiple pump instances, right? Currently you have `eg1:PumpP1 has_property eg1:TagNameAssignmentClass .` right? And then `eg1:TagNameAssignmentClass rdf:value "P1612-A" .` So, given this modeling, another pump will have the same property, right? So you add `eg1:PumpP2 has_property eg1:TagNameAssignmentClass .` and some other tag `eg1:TagNameAssignmentClass rdf:value "P123456" .` - correct? — UninformedUser, Mar 21 '21 at 13:58
anyways, a starting SPARQL query that matches G1 and G2 is `select ?pumpG1 ?pumpG2 where {graph :g1 {?pumpG1 :has_property ?tagAssignment . ?tagAssignment rdf:value ?tag } graph :g2 {?pumpG2 :has_property ?signalSet . ?signalSet :has_property ?signal . ?signal :has_property ?signalTag . ?signalTag rdf:value ?tag .}}` - good luck — UninformedUser, Mar 21 '21 at 14:06
so basically, just put the different pattern per graph to your query, and reuse the variable name of those values that are the same among the graphs and indicate equality of pumps . — UninformedUser, Mar 21 '21 at 14:14
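Spelled out for all three graphs, the suggestion above might look like the following sketch. It rests on assumptions that are not stated in the question: the three datasets are loaded as named graphs `:g1`, `:g2` and `:g3`, and `has_property` lives in a placeholder namespace `http://www.myexample.com/`. The shared variables `?tag`, `?manufacturer` and `?serial` carry the equality conditions, so no literal value appears in the query.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX eg1: <http://www.myexample1.com>
PREFIX eg2: <http://www.myexample2.com>
PREFIX eg3: <http://www.myexample3.com>
# Assumption: placeholder namespace for has_property and the graph names.
PREFIX :    <http://www.myexample.com/>

SELECT DISTINCT ?pumpG1 ?pumpG2 ?pumpG3
WHERE {
  GRAPH :g1 {
    # G1: the tag name is a direct property of the pump.
    ?pumpG1 :has_property eg1:TagNameAssignmentClass .
    eg1:TagNameAssignmentClass rdf:value ?tag .
  }
  GRAPH :g2 {
    # G2: the signal tag sits two hops below the pump (SignalSet, then Signal).
    ?pumpG2 :has_property ?signalSet , eg2:Manufacturer , eg2:SerialNumber .
    ?signalSet :has_property ?signal .
    ?signal :has_property eg2:SignalTag .
    eg2:SignalTag rdf:value ?tag .              # same ?tag as in G1
    eg2:Manufacturer rdf:value ?manufacturer .
    eg2:SerialNumber rdf:value ?serial .
  }
  GRAPH :g3 {
    # G3: manufacturer and serial number hang off the Identification node.
    ?pumpG3 :has_property ?identification .
    ?identification :has_property eg3:Manufacturer , eg3:SerialNumber .
    eg3:Manufacturer rdf:value ?manufacturer .  # same values as in G2
    eg3:SerialNumber rdf:value ?serial .
  }
}
```

For the sample data this should return a single row pairing `eg1:PumpP1`, `eg2:PumpP2` and `eg3:PumpP3`. Because the property nodes are fixed IRIs in this model, the query cannot tell several pumps in one graph apart, which is the modelling concern raised in the comments.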
@UninformedUser yes indeed there can be multiple of those instances, correct! What will not work for obvious reasons? I didn't get it. Do you mean that having another pumpX instance in the same graph will not work? If yes, then why not? Thank you for the basic query, however I was assuming that it would be a Create query since my goal in the end is to have a merged graph, with all those properties attached to a single pump instance if the matching criteria are met. — Ricky, Mar 21 '21 at 14:31
@UninformedUser ah ok I get what you mean now when you say eg1:TagNameAssignmentClass rdf:value "P123456" . Yes indeed you are absolutely right if the two "TagNameAssignmentClass" nodes have the same id, I just mean to project here that the "TagNameAssignmentClass" is something like a "BrowseName" and not the identifier of the node. But I understand the confusion now. — Ricky, Mar 21 '21 at 15:01
yep, the URI identifies any entity aka node in your graph uniquely (as the name of URI indicates). So you would have to introduce different URIs for each other pump although the property is semantically the same - that is for sure odd. You do not want to have `has_property eg1:TagNameAssignmentClass1 .` and then another `has_property eg1:TagNameAssignmentClass2 .` etc. - that's why I suggested to rethink your modeling. — UninformedUser, Mar 21 '21 at 15:16
In the end it's your decision but you have to live with what RDF does and a SPARQL query would return a wrong result if you do use the same `has_property eg1:TagNameAssignmentClass` for multiple pumps — UninformedUser, Mar 21 '21 at 15:16
@UninformedUser yes I completely got what you were trying to convey. However currently I would be happy to write a Construct query which will merge the three Pump instances into one and attach all the relationships from all three graphs into that merged pump instance. I feel like I need to do a conditional UNION of the three graphs, but I really can't get my head around how to do this. — Ricky, Mar 21 '21 at 15:32
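A CONSTRUCT query along these lines could be one way to get there. This is only a sketch under assumptions that are not stated in the question: the datasets are loaded as named graphs `:g1`, `:g2` and `:g3`, `has_property` lives in a placeholder namespace `http://www.myexample.com/`, and the merged node is always the fixed IRI `mg:PumpP` from the expected graph. It first finds the matching pumps, then copies every triple from the three graphs and swaps the subject for `mg:PumpP` wherever the subject is one of the matched pumps.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX eg1: <http://www.myexample1.com>
PREFIX eg2: <http://www.myexample2.com>
PREFIX eg3: <http://www.myexample3.com>
PREFIX mg:  <http://www.mymergeexample.com>
# Assumption: placeholder namespace for has_property and the graph names.
PREFIX :    <http://www.myexample.com/>

CONSTRUCT {
  ?subject ?p ?o .
}
WHERE {
  # Step 1: find pumps that agree on tag (G1/G2) and on
  # manufacturer plus serial number (G2/G3).
  GRAPH :g1 {
    ?pumpG1 :has_property eg1:TagNameAssignmentClass .
    eg1:TagNameAssignmentClass rdf:value ?tag .
  }
  GRAPH :g2 {
    ?pumpG2 :has_property ?signalSet , eg2:Manufacturer , eg2:SerialNumber .
    ?signalSet :has_property ?signal .
    ?signal :has_property eg2:SignalTag .
    eg2:SignalTag rdf:value ?tag .
    eg2:Manufacturer rdf:value ?manufacturer .
    eg2:SerialNumber rdf:value ?serial .
  }
  GRAPH :g3 {
    ?pumpG3 :has_property ?identification .
    ?identification :has_property eg3:Manufacturer , eg3:SerialNumber .
    eg3:Manufacturer rdf:value ?manufacturer .
    eg3:SerialNumber rdf:value ?serial .
  }

  # Step 2: enumerate every triple of the three graphs.
  { GRAPH :g1 { ?s ?p ?o } }
  UNION
  { GRAPH :g2 { ?s ?p ?o } }
  UNION
  { GRAPH :g3 { ?s ?p ?o } }

  # Step 3: matched pump nodes collapse into the merged node;
  # every other subject is copied unchanged.
  BIND(IF(?s IN (?pumpG1, ?pumpG2, ?pumpG3), mg:PumpP, ?s) AS ?subject)
}
```

If nothing matches in step 1, the result is empty rather than a plain union of the graphs. The sketch also does not rewrite triples that have a pump in object position, and with more than one matching set of pumps the fixed `mg:PumpP` would have to be replaced by an IRI minted per match.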
