12

I'm interested in downloading some boundary files from statistics.gov.scot, an official repository for sharing statistical data that can be queried with SPARQL.

Background

Statistics.gov.scot provides access to GeoJSON boundaries for a number of administrative and statistical geographies, such as local authority administrative boundaries or health boards. In my particular case I'm interested in downloading a data set with the GeoJSON boundaries of data zones. Data zones are statistical geographies developed for disseminating life outcomes data at a small-area level. When accessed via statistics.gov.scot, a sample data zone looks like this:

Sample data zone

The geography and the related data can be accessed here. The corresponding GeoJSON data is available here.

Problem

Data zones are available in two iterations, one produced in 2004 and another updated recently. I would like to download the first iteration, produced in 2004. Following the information on the statistical entities, I drafted the following query:

PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX boundaries: <http://statistics.gov.scot/boundaries/>

SELECT ?boundary 
    WHERE {
        entity:introduced <http://reference.data.gov.uk/id/day/2004-02-01>
  }

LIMIT 1000

which returns the following error message:

Error There was a syntax error in your query: Encountered " "}" "} "" at line 7,
column 3. Was expecting one of: <IRIref> ... <PNAME_NS> ... <PNAME_LN> ...
<BLANK_NODE_LABEL> ... <VAR1> ... <VAR2> ... "true" ... "false" ... <INTEGER> ...
<DECIMAL> ... <DOUBLE> ... <INTEGER_POSITIVE> ... <DECIMAL_POSITIVE> ...
<DOUBLE_POSITIVE> ... <INTEGER_NEGATIVE> ... <DECIMAL_NEGATIVE> ...
<DOUBLE_NEGATIVE> ... <STRING_LITERAL1> ... <STRING_LITERAL2> ...
<STRING_LITERAL_LONG1> ... <STRING_LITERAL_LONG2> ... "(" ... <NIL> ... "[" ...
<ANON> ... "+" ... "*" ... "/" ... "|" ... "?" ...

when tested via the endpoint: http://statistics.gov.scot/sparql.

Comments

Ideally, I would like to develop other queries that would enable me to source other statistical geographies by using the entity: prefix. This should be possible, as the entity: vocabulary should contain information on the available geographies (name, acronym, date of creation).


The query:

PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX boundaries: <http://statistics.gov.scot/boundaries/>

SELECT DISTINCT ?boundary ?shape WHERE {
  ?shape entity:firstcode ?boundary
}

LIMIT 1000

got me something that looks like a list of the desired geographies, but I'm struggling to source the GeoJSON boundaries.

Konrad
  • It seems that neither statistics.gov.scot nor statistics.data.gov.uk contains data zone boundaries as [WKT](https://en.wikipedia.org/wiki/Well-known_text) or string literals. However, one could easily construct the URIs of the GeoJSON files with the following query. – Stanislav Kralin Oct 09 '17 at 15:10
  • @StanislavKralin Why don't you make it an answer? It seems like a good approach. – Konrad Oct 09 '17 at 15:43

2 Answers

5

The first query is missing the subject. A SPARQL query defines a set of triple patterns - a subject, predicate, and object - to match an RDF graph. To turn your WHERE clause into a SPARQL triple pattern, try:

?boundary entity:introduced <http://reference.data.gov.uk/id/day/2004-02-01>
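
For completeness, the corrected pattern can be dropped back into the full query and sent to the endpoint from a script. The following is a minimal Python sketch, assuming the endpoint accepts a standard SPARQL protocol GET request with a `query` parameter and can return JSON results; `build_request` is a hypothetical helper name, not part of any library.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint named in the question.
ENDPOINT = "http://statistics.gov.scot/sparql"

# The question's query with the missing subject (?boundary) added.
QUERY = """
PREFIX entity: <http://statistics.data.gov.uk/def/statistical-entity#>

SELECT ?boundary
WHERE {
  ?boundary entity:introduced <http://reference.data.gov.uk/id/day/2004-02-01>
}
LIMIT 1000
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build a GET request carrying the query (hypothetical helper)."""
    url = endpoint + "?" + urlencode({"query": query})
    # Assumption: the endpoint honours this Accept header.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)
# To actually run it: urllib.request.urlopen(request), then json.load() the response.
```

The request is only built here, not sent, so the sketch can be inspected without network access.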
scotthenninger
  • Thanks for showing interest in my modest problem. Running this query gave me a table with one row and this value: `http://statistics.gov.scot/id/statistical-entity/S01` . – Konrad Feb 26 '16 at 17:12
  • OK, and if you want to see what information you have about that entity, then add the triple pattern {?boundary ?p ?o}, which gives you all property/object pairs, and you can choose which ones you really want to query for. – scotthenninger Mar 11 '16 at 17:41
1

Neither statistics.gov.scot nor statistics.data.gov.uk contains data zone boundaries as WKT or string literals.

However, with the following query, one could easily construct URLs of the GeoJSON files that are used on resources' pages:

PREFIX pref1: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX pref2: <http://statistics.gov.scot/id/statistical-entity/>
PREFIX pref3: <http://statistics.data.gov.uk/def/boundary-change/>
PREFIX pref4: <http://reference.data.gov.uk/id/day/>
PREFIX pref5: <http://statistics.data.gov.uk/def/statistical-geography#>
PREFIX pref6: <http://statistics.gov.scot/id/statistical-geography/>
PREFIX pref7: <http://statistics.gov.scot/boundaries/>

SELECT ?zone ?name ?json {
   ?zone pref1:code pref2:S01 .
   ?zone pref3:operativedate pref4:2004-02-01
   OPTIONAL { ?zone pref5:officialname ?name }
   BIND (CONCAT(REPLACE(STR(?zone), STR(pref6:), STR(pref7:)), ".json") AS ?json)
} ORDER BY (!bound(?name)) ASC(?name)

After that, one could easily retrieve the GeoJSON files using wget -i or a similar tool.
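
As a rough illustration of that step, the Python sketch below mirrors the BIND expression from the query and turns a SPARQL JSON results document into a plain URL list that wget -i could consume. It assumes the results were requested in the standard SPARQL JSON results format; the helper names, the output file name, and the zone code `S01000001` in the sample are hypothetical and used only for illustration.

```python
# Base URIs taken from the pref6: and pref7: prefixes in the query above.
OLD_BASE = "http://statistics.gov.scot/id/statistical-geography/"
NEW_BASE = "http://statistics.gov.scot/boundaries/"


def geojson_url(zone_uri: str) -> str:
    """Mirror the BIND expression: swap the base URI and append '.json'."""
    return zone_uri.replace(OLD_BASE, NEW_BASE) + ".json"


def urls_from_results(results: dict) -> list:
    """Collect the ?json values from a SPARQL JSON results document."""
    bindings = results["results"]["bindings"]
    return [row["json"]["value"] for row in bindings if "json" in row]


def write_url_list(urls: list, path: str) -> None:
    """Write one URL per line, the layout that `wget -i` reads."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


# Hypothetical sample shaped like a SPARQL JSON results document.
sample_results = {
    "head": {"vars": ["zone", "name", "json"]},
    "results": {"bindings": [
        {"zone": {"type": "uri", "value": OLD_BASE + "S01000001"},
         "json": {"type": "literal", "value": geojson_url(OLD_BASE + "S01000001")}},
    ]},
}
urls = urls_from_results(sample_results)
# write_url_list(urls, "datazones.txt")  # then: wget -i datazones.txt
```

Note that SPARQL's REPLACE treats its pattern as a regular expression, whereas `str.replace` is literal; for these base URIs both give the same result.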

Some explanation

You should use <http://statistics.data.gov.uk/def/boundary-change/operativedate> instead of <http://statistics.data.gov.uk/def/statistical-entity#introduced>. The latter is a property of the entity class (such as S01) rather than of individual zones, as this query shows:

SELECT * WHERE {
    ?S <http://statistics.data.gov.uk/def/statistical-entity#introduced> ?date .
    ?S <http://www.w3.org/2000/01/rdf-schema#label> ?label
}

The second-generation data zones are dated 2014-11-06, as a count of zones per operative date shows:

SELECT ?date (COUNT(?zone) AS ?count) WHERE {
    ?zone
        <http://statistics.data.gov.uk/def/statistical-entity#code>
            <http://statistics.gov.scot/id/statistical-entity/S01> ;
        <http://statistics.data.gov.uk/def/boundary-change/operativedate>
            ?date 
} GROUP BY ?date

Analogously, if you need URLs of corresponding GeoJSON files, your query should be:

SELECT ?zone ?name ?json {
   ?zone pref1:code pref2:S01 .
   ?zone pref3:operativedate pref4:2014-11-06 .
   ?zone pref5:officialname ?name 
   BIND (CONCAT(REPLACE(STR(?zone), STR(pref6:), STR(pref7:)), ".json") AS ?json)
} ORDER BY ASC(?name)

You do not need OPTIONAL, because all second generation data zones have "official names".


This page on data.gov.uk will probably be of interest to you.
There is also opendata.stackexchange.com for questions related to open data.

Update

As of May 2018, one can retrieve data zone boundaries as WKT:

PREFIX pref1: <http://statistics.data.gov.uk/def/statistical-entity#>
PREFIX pref2: <http://statistics.gov.scot/id/statistical-entity/>
PREFIX pref3: <http://statistics.data.gov.uk/def/boundary-change/>
PREFIX pref4: <http://reference.data.gov.uk/id/day/>
PREFIX pref5: <http://statistics.data.gov.uk/def/statistical-geography#>
PREFIX pref6: <http://www.opengis.net/ont/geosparql#>


SELECT ?zone ?name ?geometry {
   ?zone pref1:code pref2:S01 .
   ?zone pref3:operativedate pref4:2014-11-06 .
   ?zone pref5:officialname ?name .
   ?zone pref6:hasGeometry/pref6:asWKT ?geometry .
} ORDER BY ASC(?name)
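
Since the question asks for GeoJSON, the WKT strings returned by this query would still need converting. The Python sketch below handles only the simplest case, a WKT POLYGON (optionally preceded by a GeoSPARQL CRS IRI in angle brackets); it is an assumption that some zones fit this shape, and MULTIPOLYGON or other geometry types are deliberately rejected. `wkt_polygon_to_geojson` is a hypothetical helper, and a dedicated geometry library would be the safer choice for real data.

```python
import re


def wkt_polygon_to_geojson(wkt: str) -> dict:
    """Convert a simple WKT POLYGON string to a GeoJSON geometry dict."""
    # Drop an optional leading CRS IRI such as "<http://...> POLYGON (...)".
    text = re.sub(r"^<[^>]*>\s*", "", wkt.strip())
    match = re.fullmatch(r"POLYGON\s*\(\((.*)\)\)", text,
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("only POLYGON WKT is handled in this sketch")
    rings = []
    # Rings (outer boundary first, then holes) are separated by "), (".
    for ring_text in re.split(r"\)\s*,\s*\(", match.group(1)):
        ring = []
        for pair in ring_text.split(","):
            x, y = pair.split()[:2]  # ignore any third (Z) coordinate
            ring.append([float(x), float(y)])
        rings.append(ring)
    return {"type": "Polygon", "coordinates": rings}


geometry = wkt_polygon_to_geojson("POLYGON ((0 0, 4 0, 4 4, 0 0))")
```

The result is a bare geometry object; wrapping it in a GeoJSON Feature with the zone URI and name as properties would be a natural next step.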
Stanislav Kralin