
With the new bitemporal features in MarkLogic 8, you can track changes along two time axes: valid time and system time. These features also apply to triples, so you can go back in time along either axis and see what changed. However, because triples are stored in documents and the bitemporal metadata is kept at the document level rather than the triple level, you cannot delete or update an individual triple. In addition, you cannot use the new SPARQL Update features with temporal triples. Here is an example:

On Day 1, we add the following triples which, we assume, are always true:

<temporalTriples>
  <systemStart />
  <systemEnd />
  <validStart>2001-01-01T00:00:00Z</validStart>
  <validEnd>2999-01-01T00:00:00Z</validEnd>
  <sem:triples xmlns:sem="http://marklogic.com/semantics">
    <sem:triple>
      <sem:subject>Denver</sem:subject>
      <sem:predicate>state</sem:predicate>
      <sem:object>CO</sem:object>
    </sem:triple>
    <sem:triple>
      <sem:subject>San Francisco</sem:subject>
      <sem:predicate>state</sem:predicate>
      <sem:object>CA</sem:object>
    </sem:triple>
  </sem:triples>
</temporalTriples>

On Day 2, we add the following triple as we think Luna lives in Denver:

<temporalTriples>
  <systemStart />
  <systemEnd />
  <validStart>{current-dateTime()}</validStart>
  <validEnd>2999-01-01T00:00:00Z</validEnd>
  <sem:triples xmlns:sem="http://marklogic.com/semantics">
    <sem:triple>
      <sem:subject>Luna</sem:subject>
      <sem:predicate>city</sem:predicate>
      <sem:object>Denver</sem:object>
    </sem:triple>
  </sem:triples>
</temporalTriples>

Now on Day 3, we want to change Luna's city to San Francisco, so we have no option but to add another triple:

<temporalTriples>
  <systemStart />
  <systemEnd />
  <validStart>{current-dateTime()}</validStart>
  <validEnd>2999-01-01T00:00:00Z</validEnd>
  <sem:triples xmlns:sem="http://marklogic.com/semantics">
    <sem:triple>
      <sem:subject>Luna</sem:subject>
      <sem:predicate>city</sem:predicate>
      <sem:object>San Francisco</sem:object>
    </sem:triple>
  </sem:triples>
</temporalTriples>

Without a notion of triple update or delete, a couple of issues prevent MarkLogic from answering certain questions correctly:

  • If you ask for all valid triples (along the valid time axis), you will get all triples including <Luna> <city> <Denver>.
  • If you ask for all current triples (along the system time axis), again you will get all triples.
  • If you ask for the latest triples (along both axes), you will get only <Luna> <city> <San Francisco>.

Here is a sample query that gives all valid triples:

sem:sparql('SELECT *
  WHERE {
    ?s ?p ?o .
  }',
    (),
    (),
    sem:store(
      (),
      cts:and-query((
        cts:period-range-query(
          "valid",
          "ALN_CONTAINS",
          cts:period( xs:dateTime("2998-12-31T23:59:59Z") )
         ),
         cts:collection-query("temporalCollection"),
         cts:collection-query("temp/triples.xml")
     ))
   )
)

As a result, the following questions cannot be answered correctly:

  1. If you ask for the valid city and state that Luna lives in now, you will get both Denver and San Francisco and their states.
  2. If you ask for the latest city and state that Luna lives in, you will get nothing because the triples defining the connection between cities and states are not in the latest collection.

Here is the summary of the main issues:

  1. Adding new triples to the DB: This is fully supported by the ML8 bitemporal feature. You can go back in time and see the DB as it was before the addition.
  2. Removing a triple: Not supported. Using temporal:document-delete, you can only remove the most recently inserted triples from the 'latest' collection. The data is still there and can be queried. You may also end up removing triples that you want to keep, because a set of triples is stored in a single document.
  3. Updating a triple (e.g. Luna moves from Denver to San Francisco): Ideally, you should be able to remove the old triple and insert the new one (similar to the ML8 SPARQL Update capability), but since deletion is not supported, you end up with both the new and old triples stored in and returned from the DB.

Is there any workaround for deleting/updating temporal triples so that we could answer the example questions?

  • Could you give an example of one of your SPARQL queries? I'm expecting to see a cts:period-range-query section, for instance. – David Ennis Nov 12 '15 at 00:02
  • @DavidEnnis: Added to the question. – user5552435 Nov 12 '15 at 00:46
  • Hi, this is a pretty complex question -- I think it needs a combination of temporal and non-temporal documents. You are correct that to implement this solution you need to manage all the temporal triples yourself -- so the first insert (always true things) could be done with SPARQL update or xdmp:document-insert(). The second and third inserts look like they are for temporal data. You should use the temporal:document-insert function, and use the same document URI to 'overwrite' the triple from the second insert. Is that what you are doing? – grechaw Nov 12 '15 at 19:06
  • Related to @grechaw's comment, I see that the system timestamps in your documents are empty. If they were inserted using the temporal functions, they would get populated. Are you showing the pre-insert version? – Dave Cassel Nov 12 '15 at 19:35
  • Thanks @grechaw. This is actually a modified version of bitemporal-demo (https://github.com/fxue/bitemporal-demo). Instead of ingesting only related triples in each transaction, I have split the triples. I've broken down the issue into smaller pieces to make it simpler (see the question). One can get all triples, remove or update the target triple, and then insert it as a new document. But that won't be an efficient solution. – user5552435 Nov 13 '15 at 01:52
  • That's right @DaveCassel. This is showing the pre-insert triples. The system time is populated by ML. – user5552435 Nov 13 '15 at 01:57

1 Answer


I think it would make sense to rely on the system axis instead of the valid axis. The system axis identifies when data entered the system and when it expired (the temporal equivalent of deletion). The valid axis says something about the semantic validity of the data: when it became relevant, and when it loses or lost relevance. On top of that, the system axis is managed by MarkLogic, which makes sure that temporal versions don't have overlapping end times; I think that overlap is causing the main issue in the example above.

I'm not sure the described case really requires bitemporal at all. MarkLogic 9 comes with the option to use uni-temporal instead, i.e. using the system axis only. That would make maintenance much easier, as you can just leave out the valid axis properties, and more reliable, as you only need to care about the system start time when (temporally) inserting. Previous versions get expired automatically, since MarkLogic updates the system end time of previous versions accordingly.

I also think it would be easiest to group your triples by subject IRI and create one doc per subject IRI, managed temporally. If you update a triple of a particular subject, you can do a temporal node update to make the appropriate triple change. If you first insert a triple for Luna living in Denver, and then (temporally) update the Luna triple doc to say that Luna lives in SF, the older version gets a new system end time preceding the system start time of the newest version.
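The one-doc-per-subject idea could be sketched roughly as follows. This is only an illustration under assumptions: the temporal collection name "temporalCollection" is taken from the question's query, the URI "/triples/Luna.xml" is hypothetical, and the update is done by re-inserting the whole document at the same URI with temporal:document-insert (as suggested in the comments) rather than by a node-level update.

```xquery
xquery version "1.0-ml";
import module namespace temporal = "http://marklogic.com/xdmp/temporal"
  at "/MarkLogic/temporal.xqy";

(: Hypothetical URI: one temporal document per subject (here, Luna). :)
let $uri := "/triples/Luna.xml"

(: New version of Luna's triples. The system axis elements are left
   empty because MarkLogic populates them on insert. :)
let $doc :=
  <temporalTriples>
    <systemStart/>
    <systemEnd/>
    <validStart>{fn:current-dateTime()}</validStart>
    <validEnd>2999-01-01T00:00:00Z</validEnd>
    <sem:triples xmlns:sem="http://marklogic.com/semantics">
      <sem:triple>
        <sem:subject>Luna</sem:subject>
        <sem:predicate>city</sem:predicate>
        <sem:object>San Francisco</sem:object>
      </sem:triple>
    </sem:triples>
  </temporalTriples>

(: Re-using the URI of the earlier "Denver" insert makes this a new
   temporal version of the same document instead of an unrelated one. :)
return temporal:document-insert("temporalCollection", $uri, $doc)
```

With both axes configured, it is worth checking which versions end up in the 'latest' collection afterwards, since the valid times supplied also influence how MarkLogic splits earlier versions.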

You can then run SPARQL queries constrained to cts:collection-query('latest') to get newest temporal triples only, cts:lsqt-query(..) to get triples from a particular time before LSQT, or do something manually with a date range query on the system start and end time properties.
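As a rough sketch of the first of these options, the query from the question can be narrowed to the 'latest' collection. The collection name "temporalCollection" is assumed from the question; the rest mirrors the question's own call shape.

```xquery
xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics"
  at "/MarkLogic/semantics.xqy";

(: Restrict the triple store to documents that are in the temporal
   collection and also in the "latest" collection. :)
sem:sparql(
  'SELECT * WHERE { ?s ?p ?o . }',
  (),
  (),
  sem:store(
    (),
    cts:and-query((
      cts:collection-query("temporalCollection"),
      cts:collection-query("latest")
    ))
  )
)
```

A cts:period-range-query on the valid axis, as in the question, could be added to the same cts:and-query if valid time still matters for the result.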

HTH!

grtjn