4

I'm looking through some of the triples contained within the Freebase data dump, and some of the date times look like this:

"T12:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>

Which is ingestible by some triplestores, but not by others.

So, is this a valid xsd:dateTime? And if so, why is it valid?

Kristian
  • 4
    Freebase dumps often contain non-legal stuff. See [Jena parsing issue for freebase RDF dump (Jan 2014)](http://stackoverflow.com/q/21274368/1281433). – Joshua Taylor Aug 06 '14 at 17:10
  • 4
    And no, this isn't legal. There's some discussion of the syntax in [Datatype format exception for xsd:dateTime in SPARQL query with Jena?](http://stackoverflow.com/q/24166518/1281433) – Joshua Taylor Aug 06 '14 at 17:11
  • Actually, I'm not sure whether or not it's "legal". It's legal RDF, but it's semantically inconsistent. [RobV's answer](http://stackoverflow.com/a/25176637/1281433) and [my comment](http://stackoverflow.com/questions/25165456/is-this-a-valid-xsddatetime-if-so-why/25168873#comment39215156_25176637) on it add some elaboration. – Joshua Taylor Aug 07 '14 at 14:12

2 Answers

7

It's not a valid xsd:dateTime. It is a syntactically valid RDF literal term, but one that is semantically inconsistent.

First, let's see why T12:00 isn't in the lexical space of xsd:dateTime. The XML Schema definition of xsd:dateTime says:

The lexical space of dateTime consists of finite-length sequences of characters of the form: '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?

T12:00 matches part of that, but it's lacking the year, month, day, and seconds parts.
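As a rough illustration of that check, here is a minimal Python sketch that tests a string against the shape of the production quoted above. The pattern and the function name `has_datetime_shape` are my own, not taken from any library, and the check is shape-only: it does not range-check the fields, so it would accept a month of 13.

```python
import re

# Shape-only pattern for the quoted production:
# '-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
# It does not validate field ranges (e.g. month 13 would pass).
DATETIME_SHAPE = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def has_datetime_shape(lexical_form: str) -> bool:
    """Return True if the whole string has the xsd:dateTime shape."""
    return DATETIME_SHAPE.fullmatch(lexical_form) is not None


print(has_datetime_shape("T12:00"))               # date and seconds missing
print(has_datetime_shape("2014-08-06T12:00:00"))  # all required parts present
```

The first call reports a mismatch because nothing precedes the `T` and the seconds are absent; the second string has every required part.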

However, as RobV pointed out, an RDF literal term is still syntactically valid even if its lexical form isn't in the lexical space of the datatype. In RDF 1.1 Concepts and Abstract Syntax, we have this (note item 2.b):

3.3 Literals

A literal in an RDF graph consists of two or three elements:

  • a lexical form, being a Unicode string, which SHOULD be in Normal Form C,
  • a datatype IRI, being an IRI identifying a datatype that determines how the lexical form maps to a literal value, and
  • if and only if the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, a non-empty language tag as defined by [BCP47]. The language tag MUST be well-formed according to section 2.2.9 of [BCP47].

… The literal value associated with a literal is:

  1. If the literal is a language-tagged string, then the literal value is a pair consisting of its lexical form and its language tag, in that order.
  2. If the literal's datatype IRI is in the set of recognized datatype IRIs, let d be the referent of the datatype IRI.
    • a. If the literal's lexical form is in the lexical space of d, then the literal value is the result of applying the lexical-to-value mapping of d to the lexical form.
    • b. Otherwise, the literal is ill-typed and no literal value can be associated with the literal. Such a case produces a semantic inconsistency but is not syntactically ill-formed. Implementations MUST accept ill-typed literals and produce RDF graphs from them. Implementations MAY produce warnings when encountering ill-typed literals.
  3. If the literal's datatype IRI is not in the set of recognized datatype IRIs, then the literal value is not defined by this specification.

Thus, "T12:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> is an RDF literal term, but it's a semantically inconsistent one. This alone doesn't make the Freebase dump invalid RDF. An implementation must accept it and create an RDF graph from it, but may warn about it. That means that an RDF parser has to be able to process it. I'm not sure whether a triple store counts as "an implementation" or not. If it does, then it should store the resulting literal. If it doesn't, then I guess it's OK for it to store only RDF graphs whose literals are all semantically consistent.

Joshua Taylor
4

As Joshua says, it is not a valid xsd:dateTime; however, it is still a valid RDF literal.

An RDF literal consists of a lexical form (here, T12:00) and an optional datatype or language specifier. In your case it has the datatype xsd:dateTime.

So the difference in behaviour you see between stores comes down to whether a store enforces datatype restrictions on the lexical form of the literal, i.e. whether it requires that the lexical forms of xsd: datatypes match the rules laid out in XML Schema Part 2: Datatypes.

Stores which enforce this will only allow valid values, while those that do not will allow mixtures of valid and invalid values. Some of the strict stores may have options to allow the invalid values; check with your vendor/community as to whether this is the case.
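To make the difference concrete, here is a small Python sketch of the two behaviours. `ToyStore`, its `strict` flag, and `add_literal` are hypothetical names made up for illustration and do not correspond to any real triple store's API; the validity check is a shape-only regex for xsd:dateTime and ignores every other datatype.

```python
import re
import warnings

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Shape-only check for xsd:dateTime; field ranges are not validated.
_DATETIME_SHAPE = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


class ToyStore:
    """Hypothetical store holding (lexical form, datatype IRI) pairs."""

    def __init__(self, strict: bool):
        self.strict = strict
        self.literals = []

    def add_literal(self, lexical_form: str, datatype_iri: str) -> None:
        # Only xsd:dateTime is checked in this sketch.
        ill_typed = (
            datatype_iri == XSD_DATETIME
            and _DATETIME_SHAPE.fullmatch(lexical_form) is None
        )
        if ill_typed and self.strict:
            # Strict stores refuse lexical forms outside the lexical space.
            raise ValueError(f"ill-typed literal rejected: {lexical_form!r}")
        if ill_typed:
            # Lenient stores keep the literal and may warn about it.
            warnings.warn(f"ill-typed literal kept: {lexical_form!r}")
        self.literals.append((lexical_form, datatype_iri))
```

With `ToyStore(strict=False)`, adding `"T12:00"` typed as `XSD_DATETIME` keeps the literal and emits a warning; with `ToyStore(strict=True)`, the same call raises `ValueError` and nothing is stored.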

RobV
  • 2
    Very good point! Stores enforcing that literals have legal lexical forms is interesting, too. Note that [3.3 Literals in RDF 1.1](http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal) says "If the literal's datatype IRI is in the set of recognized datatype IRIs, … [if the lexical form is not in the lexical space, then] the literal is ill-typed and … – Joshua Taylor Aug 07 '14 at 14:08
  • … no literal value can be associated with the literal. Such a case produces a semantic inconsistency but is not syntactically ill-formed. Implementations **must** accept ill-typed literals and produce RDF graphs from them. Implementations **may** produce warnings when encountering ill-typed literals." – Joshua Taylor Aug 07 '14 at 14:08
  • So, it's definitely syntactically valid; I'm not sure whether a triple store is considered "an implementation". I guess the RDF parser has to process them to be conforming, but if the store isn't "an implementation" it could be stated to disallow semantically inconsistent literal terms. – Joshua Taylor Aug 07 '14 at 14:11