
I want to extract a triple whose subject contains a word, say "alice". The query I used was:

SELECT ?s ?p ?o  WHERE { ?s ?p ?o .FILTER regex(?s, \"alice\") .}

This doesn't give me any results, in spite of having a triple that satisfies this constraint.

On the other hand, when I use the same query to extract a triple that contains the word "brillant" in its object, it returns only one of the two possible matches.

The query used is:

SELECT ?s ?p ?o  WHERE { ?s ?p ?o .FILTER regex(?o, \"brillant\") .}

Please let me know where I am going wrong and what the reason for this behaviour is.

André Dion
user2335580

1 Answer


I'll assume that the escapes around the quotation marks are just a remnant from copying and pasting. The first argument to regex must be a literal, but literals cannot be the subjects of triples in RDF, so it's not true that you have data that should match this pattern. What you might have, though, is subjects whose URI contains the string "alice", and you can get the string representation of the URI using the str function. E.g.,

SELECT ?s ?p ?o  WHERE { ?s ?p ?o .FILTER regex(str(?s), "alice") .}

To illustrate, let's use the two values <http://example.org> and "string containing example" and filter as you did in your original query:

select ?x where {
  values ?x { <http://example.org> "string containing example" }
  filter( regex(?x, "exam" ))
}
-------------------------------
| x                           |
===============================
| "string containing example" |
-------------------------------

We only got "string containing example" because the other value wasn't a string, and so wasn't a suitable argument to regex. However, if we add the call to str, then it's the string representation of the URI that regex will consider:

select ?x where {
  values ?x { <http://example.org> "string containing example" }
  filter( regex(str(?x), "exam" ))
}
-------------------------------
| x                           |
===============================
| <http://example.org>        |
| "string containing example" |
-------------------------------
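The same reasoning may also account for the second query. Assuming that the "brillant" match that went missing has an IRI rather than a literal as its object (an assumption, since the data isn't shown), applying str to ?o as well should let regex see both matches:

```sparql
# str() turns IRI objects into plain strings so regex can test them;
# literal objects are still matched by their lexical form.
SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  FILTER regex(str(?o), "brillant")
}
```

If instead the missing match differs only in case (say, "Brillant"), regex's optional third argument "i" makes the match case-insensitive: regex(str(?o), "brillant", "i").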
Joshua Taylor
  • Testing your first code fragment on the DBpedia SPARQL endpoint gives both values, instead of only the string. – DieterDP Sep 23 '15 at 11:12
  • @DieterDP DBpedia uses Virtuoso for its endpoint. Its implementation "helpfully" extends **regex** to accept non-strings, even though the standard says that [**regex**](http://www.w3.org/TR/sparql11-query/#func-regex) takes a literal as its argument. I say "helpfully" because while it may make queries simpler with Virtuoso, you end up with non-portable queries that will fail when you bring them to other environments. If you're concerned with portability and adhering to the standard, it can help to test queries at [sparql.org's general purpose query engine](http://sparql.org/sparql.html). – Joshua Taylor Sep 23 '15 at 12:33
  • I thought as much. I hadn't come across the sparql.org engine yet; it looks handy. But I don't really understand what data (if any) it queries. How exactly do you get it to run on a graph (e.g., DBpedia)? – DieterDP Sep 23 '15 at 13:35
  • For a small dataset available online, you can just paste the URL of the dataset into the "Target graph URI" field. For bigger datasets with remote endpoints, you can use the **service** keyword in the query, but that will rely on the remote server performing the query, so you might still get the Virtuoso specific results. – Joshua Taylor Sep 23 '15 at 13:42
  • Just want to mention that for the exact query in the question, `contains()` works the same as, or more correctly than, `regex()` (because it takes literal strings). – phiresky Dec 09 '20 at 13:37
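Following that suggestion, here is a sketch of the first query using the SPARQL 1.1 contains function, which tests for a plain substring instead of interpreting "alice" as a regular expression. Like regex, contains expects string arguments, so str is still applied to the IRI subject:

```sparql
# contains() performs a plain, case-sensitive substring test; str() is
# still needed because the subjects are IRIs, not literals.
SELECT ?s ?p ?o WHERE {
  ?s ?p ?o .
  FILTER contains(str(?s), "alice")
}
```

For a case-insensitive version, lowercase the subject's string form first: contains(lcase(str(?s)), "alice").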