
There is probably an easy answer to this, but I can't even figure out how to formulate the Google query to find it.

I'm writing SPARQL construct queries against a dataset that includes blank nodes. So if I do a query like

CONSTRUCT {?x ?y ?z .} WHERE {?x ?y ?z .}

Then one of my results might be:

nm:John nm:owns _:Node

That's a problem if all of the

_:Node nm:has nm:Hats

triples don't also make it into the query result (because some parsers I'm using, like rdflib for Python, really don't like dangling bnodes).

Is there a way to write my original CONSTRUCT query to recursively add all triples attached to any bnode results such that no bnodes are left dangling in my new graph?
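To make "dangling" concrete, here is a minimal Ruby sketch (standard library only) that lists the bnodes which appear as objects in a result graph but have no triples of their own there. It assumes a toy representation where each triple is an array of three strings and any term starting with `_:` is a blank node; `blank?` and `dangling_bnodes` are hypothetical helpers, not part of rdflib or RDF.rb.

```ruby
require "set"

# Toy convention (assumption): a term starting with "_:" is a blank node.
def blank?(term)
  term.start_with?("_:")
end

# A bnode dangles if it is used as an object but none of the triples
# that have it as subject made it into the graph.
def dangling_bnodes(triples)
  subjects = triples.map { |s, _p, _o| s }.to_set
  objects = triples.map { |_s, _p, o| o }
  objects.select { |o| blank?(o) && !subjects.include?(o) }.uniq
end

result = [["nm:John", "nm:owns", "_:Node"]]
p dangling_bnodes(result)  # => ["_:Node"]

fixed = result + [["_:Node", "nm:has", "nm:Hats"]]
p dangling_bnodes(fixed)   # => []
```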

Stanislav Kralin
rogueleaderr
  • If you're doing this in a way where you can run a CONSTRUCT query and then run a SPARQL query against that, you actually *can* get a concise bounded description (CBD) using SPARQL queries. Have a look at [Sparql query to return all triples recursively that make up an rdfs:class definition](http://answers.semanticweb.com/questions/26220/sparql-query-to-return-all-triples-recursively-that-make-up-an-rdfsclass-definition) (and maybe [Implementing Concise Bounded Description in SPARQL](http://answers.semanticweb.com/questions/20361/implementing-concise-bounded-description-in-sparql)). – Joshua Taylor Mar 19 '15 at 14:10
  • The idea behind the answers in the previous comment is that you can create a *new property* that effectively acts as a predicate within a property path. E.g., for each IRI node i, you add a triple "i selfIRI i". Then when you write a path like "?x p/selfIRI ?y", you've ensured that ?y is an IRI node. – Joshua Taylor Mar 19 '15 at 14:15
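The `selfIRI` trick can be sketched outside SPARQL to see why it works. The Ruby below (standard library only) adds an `i selfIRI i` triple for every IRI node and then follows a path one step at a time, so `nm:owns` followed by `selfIRI` can only land on IRIs. The predicate name `ex:selfIRI`, the helpers `with_self_iri` and `follow`, and the string-array triple format (bnodes start with `_:`, literals with `"`) are all assumptions for illustration, not a real engine's API.

```ruby
# Hypothetical helper predicate; any IRI not used elsewhere in the data works.
SELF_IRI = "ex:selfIRI"

# Toy convention (assumption): "_:" starts a bnode, a double quote a literal.
def iri?(term)
  !term.start_with?("_:") && !term.start_with?('"')
end

# Add "i selfIRI i" for every IRI that occurs as a subject or object.
def with_self_iri(triples)
  nodes = triples.flat_map { |s, _p, o| [s, o] }.uniq
  triples + nodes.select { |n| iri?(n) }.map { |n| [n, SELF_IRI, n] }
end

# Evaluate a sequence path such as  x p1/p2 ?y  by walking one predicate at a time.
def follow(triples, x, *path)
  path.reduce([x]) do |frontier, pred|
    frontier.flat_map do |node|
      triples.select { |s, p, _o| s == node && p == pred }.map { |_s, _p, o| o }
    end.uniq
  end
end

g = with_self_iri([
  ["nm:John", "nm:owns", "_:Node"],
  ["nm:John", "nm:owns", "nm:Car"],
  ["_:Node", "nm:has", "nm:Hats"]
])
p follow(g, "nm:John", "nm:owns")            # => ["_:Node", "nm:Car"]
p follow(g, "nm:John", "nm:owns", SELF_IRI)  # => ["nm:Car"]
```

The bnode `_:Node` never gets a `selfIRI` triple, so the extra path step silently filters it out; the same idea with a `selfBlank` property would select only bnodes instead.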

3 Answers


Recursion isn't possible. The closest I can think of is SPARQL 1.1 property paths (note: that version of the spec is out of date), but as far as I know they can't test whether a node is a bnode.

You could just remove the statements with trailing bnodes:

CONSTRUCT {?x ?y ?z .} WHERE 
{
  ?x ?y ?z .
  FILTER (!isBlank(?z))
}

or try your luck fetching the next bit:

CONSTRUCT {?x ?y ?z . ?z ?w ?v } WHERE 
{
  ?x ?y ?z .
  OPTIONAL {
    ?z ?w ?v
    FILTER (isBlank(?z) && !isBlank(?v))
  }
}

(that last query is pretty punishing, btw)
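To see what the two queries above return, here is a rough in-memory emulation in Ruby (standard library only). It assumes triples are arrays of three strings with bnode terms starting with `_:`. The optional `subject` argument is a hypothetical addition that binds `?x` to a single resource, which is the "describe one entity" case; without it, `?x ?y ?z` already matches every triple. Both helpers are illustrative only and say nothing about how a real SPARQL engine evaluates these queries.

```ruby
# Toy convention (assumption): a term starting with "_:" is a blank node.
def blank?(term)
  term.start_with?("_:")
end

# CONSTRUCT {?x ?y ?z} WHERE { ?x ?y ?z FILTER (!isBlank(?z)) }
def without_trailing_bnodes(triples)
  triples.reject { |_s, _p, o| blank?(o) }
end

# CONSTRUCT {?x ?y ?z . ?z ?w ?v} WHERE { ?x ?y ?z
#   OPTIONAL { ?z ?w ?v FILTER (isBlank(?z) && !isBlank(?v)) } }
# When the OPTIONAL has no match, ?w/?v stay unbound and CONSTRUCT just
# skips the second template triple. The result is a set, hence uniq.
def one_level_expansion(triples, subject = nil)
  out = []
  triples.each do |x, y, z|
    next if subject && x != subject  # hypothetical: bind ?x to one resource
    out << [x, y, z]
    next unless blank?(z)
    triples.each do |s, w, v|
      out << [z, w, v] if s == z && !blank?(v)
    end
  end
  out.uniq
end

data = [
  ["nm:John", "nm:owns", "_:Node"],
  ["_:Node", "nm:has", "nm:Hats"],
  ["_:Node", "nm:partOf", "_:Other"],
  ["_:Other", "nm:color", '"red"']
]
p without_trailing_bnodes(data)
p one_level_expansion(data, "nm:John")
# => [["nm:John", "nm:owns", "_:Node"], ["_:Node", "nm:has", "nm:Hats"]]
```

The `!isBlank(?v)` filter drops `_:Node nm:partOf _:Other` entirely, so nothing dangles but that link is lost; and the nested loop in this emulation is quadratic in the number of triples.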

You may be better off with DESCRIBE, which will often skip bnodes.

user205512
  • Thanks user. My current plan is to use a two-level query without worrying about recursion to greater depth. The isBlank filter may help, but FILTERs really seem to slaughter performance, since SPARQL seems to be materializing the entire pre-filter subgraph before applying the filter line by line. So unless the unfiltered subgraph is small, filter queries turn out to be really intensive. – rogueleaderr Mar 20 '12 at 19:22
  • You can't say "SPARQL is materialising the entire pre-filter subgraph ...": different SPARQL engine implementations will have different algorithms with differing strengths and weaknesses. It even varies depending on the version of the library you're using. – Ian Dickinson Mar 21 '12 at 08:03

As user205512 suggests, performing that grab recursively is not possible, and as they point out, using OPTIONALs to go arbitrarily deep into your data getting the nodes is not feasible on anything but trivially sized databases.

Bnodes themselves are locally scoped, either to the result set or to the file. There's no guarantee that a bnode ID you get from parsing or from a result set is the same ID that is used in the database (though some databases do guarantee this for query results). Furthermore, a query like "select ?s where { ?s ?p _:bnodeid1 }" is the same as "select ?s where { ?s ?p ?o }": the bnode label is treated as a variable in that case, not as "the thing with the ID 'bnodeid1'". This quirk of the design makes it difficult to query for bnodes, so if you are in control of the data, I'd suggest not using them. It's not hard to generate names for things that would otherwise be bnodes, and using named resources instead of bnodes will not increase overhead during querying.
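Naming bnodes (usually called skolemization) can be done in one pass over the data. Below is a minimal Ruby sketch (standard library only), assuming the same toy format of string-array triples with bnodes starting with `_:`. `BASE` is a placeholder; RDF 1.1 suggests minting such IRIs under a `/.well-known/genid/` path on a domain you control. The helper name `skolemize` is hypothetical.

```ruby
require "securerandom"

# Placeholder base IRI (assumption); use a domain you actually control.
BASE = "http://example.org/.well-known/genid/"

# Replace every blank node with a fresh IRI. The hash remembers each label,
# so all occurrences of the same bnode get the same new name.
def skolemize(triples, base = BASE)
  names = Hash.new { |h, label| h[label] = "<#{base}#{SecureRandom.uuid}>" }
  rename = ->(term) { term.start_with?("_:") ? names[term] : term }
  # Predicates can't be bnodes in RDF, so only subjects and objects change.
  triples.map { |s, p, o| [rename.call(s), p, rename.call(o)] }
end

data = [
  ["nm:John", "nm:owns", "_:Node"],
  ["_:Node", "nm:has", "nm:Hats"]
]
named = skolemize(data)
named.each { |t| puts t.join(" ") }
```

Since bnode labels are only meaningful within one file or result set, a fresh mapping per call is deliberate: two files that both say `_:Node` end up with two different IRIs.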

That does not help you recurse down and grab data, but for that, I don't recommend such general queries; they don't scale well and usually return more than you want or need. I'd suggest more directed queries instead. Your original CONSTRUCT query will pull down the contents of the entire database, which is generally not what you want.

Lastly, while DESCRIBE can be useful, there's no standard behavior for it; the SPARQL spec doesn't define what it should return, so that is left to the database vendor and can differ between databases. That can make your code less portable if you plan on trying different databases with your application. If you want a specific behavior out of DESCRIBE, you're best off implementing it yourself. Something like the concise bounded description of a resource is an easy piece of code, though you can run into some headaches around bnodes.
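As a rough idea of how small a hand-rolled CBD is, here is a Ruby sketch (standard library only) of the usual rule: take every triple whose subject is the resource, then recursively add the triples of any bnode that appears as an object. It leaves out the reification part of CBD and assumes the same toy format (string-array triples, bnodes starting with `_:`); `cbd` is a hypothetical name. The `seen` set handles one of the bnode headaches: without it, a bnode cycle would loop forever.

```ruby
require "set"

# Toy convention (assumption): a term starting with "_:" is a blank node.
def blank?(term)
  term.start_with?("_:")
end

# Concise bounded description without reification: the resource's own
# triples plus, transitively, the triples of every bnode reached as an object.
def cbd(triples, resource)
  by_subject = triples.group_by(&:first)
  result = []
  seen = Set.new
  queue = [resource]
  until queue.empty?
    node = queue.shift
    next if seen.include?(node)  # already expanded (handles bnode cycles)
    seen << node
    (by_subject[node] || []).each do |s, p, o|
      result << [s, p, o]
      queue << o if blank?(o)
    end
  end
  result
end

data = [
  ["nm:John", "nm:owns", "_:Node"],
  ["_:Node", "nm:has", "nm:Hats"],
  ["_:Node", "nm:partOf", "_:Other"],
  ["_:Other", "nm:partOf", "_:Node"],  # a bnode cycle
  ["nm:Mary", "nm:owns", "nm:Car"]
]
cbd(data, "nm:John").each { |t| puts t.join(" ") }
# prints:
# nm:John nm:owns _:Node
# _:Node nm:has nm:Hats
# _:Node nm:partOf _:Other
# _:Other nm:partOf _:Node
```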

Michael
  • Thanks for the answer, Michael. My example query was a bit imprecise... what I'm actually trying to do is pull out all the information about a given entity in a dataset I've downloaded. But the dataset includes entries like "John was the creator of _:1234". So I suppose my alternatives are to use the two-level query and hope it doesn't crush performance, or to restructure the database to name all the bnodes. It does seem like the SPARQL spec could use stronger support for this, because it doesn't seem like a particularly uncommon issue. – rogueleaderr Mar 20 '12 at 19:20

If you are working with the Ruby RDF.rb library, which allows SPARQL queries and offers significant convenience methods on RDF::Graph objects, the following should expand blank nodes.

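require 'rdf'

# Recursively collect the statements of a blank node (and of any bnodes it
# points to). A method can't see the caller's local `rdf`, so it is passed in;
# `seen` prevents infinite recursion when bnodes form a cycle.
def rdf_expand_blank_nodes(rdf, node, seen = {})
  g = RDF::Graph.new
  return g unless node.node? && !seen[node]
  seen[node] = true
  rdf.query([node, nil, nil]) do |stmt|
    g << stmt
    g << rdf_expand_blank_nodes(rdf, stmt.object, seen) if stmt.object.node?
  end
  g
end

# assumes `rdf` is an RDF::Graph (or repository) already loaded with your data
rdf_type = RDF::SCHEMA.Person # for example
rdf.query([nil, RDF.type, rdf_type]).each_subject do |subject|
  g = RDF::Graph.new
  rdf.query([subject, nil, nil]) do |stmt|
    g << stmt
    g << rdf_expand_blank_nodes(rdf, stmt.object) if stmt.object.node?
  end
  # g now holds the subject's statements plus every bnode they reach
end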
Joshua Taylor
Darren Weber