
What is the difference between DISTINCT and REDUCED in SPARQL?

Donal Fellows

2 Answers


REDUCED is like a 'best effort' DISTINCT. Whereas DISTINCT guarantees no duplicated results, REDUCED may eliminate some, all, or no duplicates.

What's the point? Well, DISTINCT can be expensive; REDUCED can do the straightforward de-duplication work (e.g. removing immediately repeated results) without having to remember every row. In many applications that's good enough.
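
For illustration, here is a minimal sketch of the two query forms over a hypothetical FOAF-style dataset (the prefix and triple pattern are assumptions made for the example, not anything from the question):

```sparql
# Hypothetical example data: people with foaf:name values.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# SELECT DISTINCT ?name  -- would guarantee each ?name appears exactly once,
#                           typically forcing the engine to track every row seen.

# SELECT REDUCED merely *permits* the engine to drop duplicates, e.g. by
# collapsing immediately repeated rows; some duplicate ?name values may
# still come back in the results.
SELECT REDUCED ?name
WHERE {
  ?person foaf:name ?name .
}
```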

Having said that, I've never used REDUCED, I've never seen anyone use it, and I've never seen it mentioned in a talk or tutorial.

user205512
  • Just found this: http://www.franz.com/agraph/support/documentation/current/twinql-tutorial.html#header3-92 says: "If you do not need duplicates to be removed, but you do not need the redundant entries, either — which would be the case if you are relying on counts to be correct, for example — then you can specify REDUCED instead of DISTINCT. **This allows AllegroGraph to discard duplicate values if it's advantageous to do so.**" – Tomalak Jun 07 '10 at 15:04
  • We use REDUCED when dealing with very large result sets where DISTINCT would be too slow but there are a lot of duplicates. It's pretty rare that it's useful, though. – Steve Harris Nov 05 '12 at 10:10
  • [This paper](https://link.springer.com/article/10.1007/s00778-019-00558-9) says REDUCED is extremely rare in queries in the wild. – alexis Jul 22 '20 at 09:47

In my mind (and in my own SPARQL implementation), REDUCED is effectively an optional DISTINCT constraint that is only applied if the engine deems it necessary; i.e. the query engine decides whether or not to eliminate duplicate results based on the query.

In my own implementation, I only eliminate duplicates under REDUCED if OFFSET/LIMIT has also been used.
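
As a rough sketch of the query shape that means (the vocabulary, ordering, and page size here are made up for illustration, not taken from RobV's engine):

```sparql
# Hypothetical example: paging through labels. Under the implementation
# described above, the presence of OFFSET/LIMIT is what makes REDUCED
# actually perform duplicate elimination; without a page window it would
# be treated as a no-op.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT REDUCED ?label
WHERE {
  ?thing rdfs:label ?label .
}
ORDER BY ?label
OFFSET 100
LIMIT 50
```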

RobV